Commit Graph

156793 Commits

Author SHA1 Message Date
Steven Rostedt (Google)
d9bbfbd14f ftrace: Allow subops filtering to be modified
The subops filters use a "manager" ops to enable and disable its filters.
The manager ops can handle more than one subops, and its filter is what
controls what functions get set. Add a ftrace_hash_move_and_update_subops()
function that will update the manager ops when the subops filters change.

Link: https://lore.kernel.org/linux-trace-kernel/20240603190822.673932251@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04 10:36:44 -04:00
Steven Rostedt (Google)
5fccc7552c ftrace: Add subops logic to allow one ops to manage many
There are cases where a single system will use a single function callback
to handle multiple users. For example, to allow function_graph tracer to
have multiple users where each can trace their own set of functions, it is
useful to only have one ftrace_ops registered to ftrace that will call a
function by the function_graph tracer to handle the multiplexing with the
different registered function_graph tracers.

Add a "subop_list" to the ftrace_ops that will hold a list of other
ftrace_ops that the top ftrace_ops will manage.

The function ftrace_startup_subops() takes the manager ftrace_ops and
a subop ftrace_ops it will manage. If there are no subops with the
ftrace_ops yet, it will copy the ftrace_ops subop filters to the manager
ftrace_ops and register that with ftrace_startup(), and add the subop to
its subop_list. If the manager ops already has something registered, it
will then merge the new subop filters with what it has and enable the new
functions so that it covers all the subops it has.

To remove a subop, ftrace_shutdown_subops() is called which will use the
subop_list of the manager ops to rebuild all the functions it needs to
trace, and update the ftrace records to only call the functions it now has
registered. If there are no more functions registered, it will then call
ftrace_shutdown() to disable itself completely.
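
As an illustration only, the merge and rebuild steps described above can be
modeled in user space. This is a minimal sketch, not the kernel code: the
filter is assumed to be a plain bitmask (one bit per traceable function)
instead of an ftrace hash, and every name below (manager_ops, sub_ops,
startup_subops(), shutdown_subops()) is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SUBOPS 8

/* Assumption: a filter is a bitmask, one bit per traceable function. */
typedef uint64_t filter_t;

struct sub_ops {
	filter_t filter;
};

struct manager_ops {
	struct sub_ops *subops[MAX_SUBOPS];	/* models the subop_list */
	size_t nr_subops;
	filter_t filter;			/* union of all subops filters */
	bool registered;			/* models startup/shutdown state */
};

/* Rebuild the manager filter from whatever subops are still attached. */
static void manager_rebuild(struct manager_ops *m)
{
	filter_t merged = 0;

	for (size_t i = 0; i < m->nr_subops; i++)
		merged |= m->subops[i]->filter;

	m->filter = merged;
	m->registered = m->nr_subops > 0;
}

/* Attach a subop: the first one registers the manager, later ones merge. */
static int startup_subops(struct manager_ops *m, struct sub_ops *s)
{
	if (m->nr_subops == MAX_SUBOPS)
		return -1;

	m->subops[m->nr_subops++] = s;
	manager_rebuild(m);
	return 0;
}

/* Detach a subop: rebuild the filter, unregister when none are left. */
static int shutdown_subops(struct manager_ops *m, struct sub_ops *s)
{
	for (size_t i = 0; i < m->nr_subops; i++) {
		if (m->subops[i] == s) {
			m->subops[i] = m->subops[--m->nr_subops];
			manager_rebuild(m);
			return 0;
		}
	}
	return -1;
}
```

The model deliberately ignores details a real filter hash has to deal with,
such as an empty filter meaning "trace everything" and notrace hashes.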

Note, it is up to the manager ops callback to always make sure that the
subops callbacks are called if its filter matches, as there are times in
the update where the callback could be calling more functions than those
that are currently registered.

This could be updated to handle systems other than function_graph,
for example, fprobes could use this (but will need an interface to call
ftrace_startup_subops()).

Link: https://lore.kernel.org/linux-trace-kernel/20240603190822.508431129@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04 10:36:38 -04:00
Steven Rostedt (VMware)
ab6b846303 ftrace: Allow ftrace startup flags to exist without dynamic ftrace
Some of the flags for ftrace_startup() may be exposed even when
CONFIG_DYNAMIC_FTRACE is not configured in. This is fine as the difference
between dynamic ftrace and static ftrace is done within the internals of
ftrace itself. No need to have use cases fail to compile because dynamic
ftrace is disabled.

This change is needed to move some of the logic of what is passed to
ftrace_startup() out of the parameters of ftrace_startup().

Link: https://lore.kernel.org/linux-trace-kernel/171509100890.162236.4362350342549122222.stgit@devnote2
Link: https://lore.kernel.org/linux-trace-kernel/20240603190822.350654104@goodmis.org

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04 10:36:33 -04:00
Steven Rostedt (VMware)
26dda5631d ftrace: Allow function_graph tracer to be enabled in instances
Now that function graph tracing can handle more than one user, allow it to
be enabled in the ftrace instances. Note, the filtering of the functions is
still joined by the top level set_ftrace_filter and friends, as well as the
graph and nograph files.

Co-developed with Masami Hiramatsu:
Link: https://lore.kernel.org/linux-trace-kernel/171509099743.162236.1699959255446248163.stgit@devnote2
Link: https://lore.kernel.org/linux-trace-kernel/20240603190822.190630762@goodmis.org

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04 10:36:28 -04:00
Steven Rostedt (VMware)
37238abe3c ftrace/function_graph: Pass fgraph_ops to function graph callbacks
Pass the fgraph_ops structure to the function graph callbacks. This will
allow callbacks to add a descriptor to a fgraph_ops private field that will
be added in the future and use it for the callbacks. This will be useful
when more than one callback can be registered to the function graph tracer.

Co-developed with Masami Hiramatsu:
Link: https://lore.kernel.org/linux-trace-kernel/171509098588.162236.4787930115997357578.stgit@devnote2
Link: https://lore.kernel.org/linux-trace-kernel/20240603190822.035147698@goodmis.org

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04 10:36:22 -04:00
Steven Rostedt (VMware)
7aa1eaef9f function_graph: Allow multiple users to attach to function graph
Allow for multiple users to attach to function graph tracer at the same
time. Only 16 simultaneous users can attach to the tracer. This is because
there's an array that stores the pointers to the attached fgraph_ops. When
a function being traced is entered, each of the ftrace_ops entryfunc is
called and if it returns non zero, its index into the array will be added
to the shadow stack.

On exit of the function being traced, the shadow stack will contain the
indexes of the ftrace_ops on the array that want their retfunc to be
called.

Because a function may sleep for a long time (if a task sleeps itself),
the return of the function may be literally days later. If the ftrace_ops
is removed, its place on the array is replaced with a ftrace_ops that
contains the stub functions and that will be called when the function
finally returns.

If another ftrace_ops is added that happens to get the same index into the
array, its return function may be called. But that's actually the way
things currently work with the old function graph tracer. If one tracer is
removed and another is added, the new one will get the return calls of the
function traced by the previous one, thus this is not a regression. This
can be fixed by adding a counter that is incremented each time the array
item is updated and saving that on the shadow stack as well, such that it
won't be called if the index saved does not match the index on the array.
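
The slot array, the stub replacement, and the index-reuse caveat described
above can be modeled roughly as follows. This is a simplified user-space
sketch under stated assumptions: the per-call shadow stack entry is reduced
to a 16-bit mask of slot indexes, and all names (gops, slots, on_entry(),
on_return(), and so on) are hypothetical, not the kernel's.

```c
#include <stdint.h>

#define FGRAPH_SLOTS 16	/* the limit of 16 simultaneous users */

struct gops {
	int (*entryfunc)(struct gops *self);
	void (*retfunc)(struct gops *self);
	int entries;	/* bookkeeping for this sketch only */
	int returns;
};

/* Stub callbacks that take the place of a removed user. */
static int stub_entry(struct gops *self) { (void)self; return 0; }
static void stub_ret(struct gops *self) { (void)self; }
static struct gops stub = { stub_entry, stub_ret, 0, 0 };

/* Example user callbacks that only count how often they run. */
static int count_entry(struct gops *self) { self->entries++; return 1; }
static void count_ret(struct gops *self) { self->returns++; }

static struct gops *slots[FGRAPH_SLOTS];

static void slots_init(void)
{
	for (int i = 0; i < FGRAPH_SLOTS; i++)
		slots[i] = &stub;
}

/* Take the first free slot; return its index, or -1 if all 16 are used. */
static int slot_register(struct gops *g)
{
	for (int i = 0; i < FGRAPH_SLOTS; i++) {
		if (slots[i] == &stub) {
			slots[i] = g;
			return i;
		}
	}
	return -1;
}

/* Removal never leaves a hole: the stub is put back in the slot. */
static void slot_unregister(int idx)
{
	slots[idx] = &stub;
}

/* Traced function entry: remember which slots asked for a return call. */
static uint16_t on_entry(void)
{
	uint16_t mask = 0;

	for (int i = 0; i < FGRAPH_SLOTS; i++) {
		if (slots[i]->entryfunc(slots[i]))
			mask |= (uint16_t)(1u << i);
	}
	return mask;
}

/* Traced function exit: call retfunc of whatever occupies each slot now. */
static void on_return(uint16_t mask)
{
	for (int i = 0; i < FGRAPH_SLOTS; i++) {
		if (mask & (1u << i))
			slots[i]->retfunc(slots[i]);
	}
}
```

In this model, unregistering a user between on_entry() and on_return() makes
the stub run on return, and registering a new user in the same slot before
the return makes the new user's retfunc run instead, which is the caveat the
text describes.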

Note, being able to filter functions when both are called is not completely
handled yet, but that shouldn't be too hard to manage.

Co-developed with Masami Hiramatsu:
Link: https://lore.kernel.org/linux-trace-kernel/171509096221.162236.8806372072523195752.stgit@devnote2
Link: https://lore.kernel.org/linux-trace-kernel/20240603190821.555493396@goodmis.org

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04 10:35:58 -04:00
Steven Rostedt (VMware)
42675b723b function_graph: Convert ret_stack to a series of longs
In order to make it possible to have multiple callbacks registered with the
function_graph tracer, the retstack needs to be converted from an array of
ftrace_ret_stack structures to an array of longs. This will allow storing
the list of callbacks on the stack for the return side of the functions.

Link: https://lore.kernel.org/linux-trace-kernel/171509092742.162236.4427737821399314856.stgit@devnote2
Link: https://lore.kernel.org/linux-trace-kernel/20240603190821.073111754@goodmis.org

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04 10:35:09 -04:00
Linus Torvalds
f26ee67a0f Merge tag 'tty-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty fix from Greg KH:
 "Here is a single revert for a much-reported regression in 6.10-rc1
  when it comes to a few older architectures.

  Turns out that the VT ioctls don't work the same across all cpu types
  because of some old compatibility requirements for stuff like alpha and
  powerpc. So revert the change that attempted to have them use the
  _IO() macros and go back to the known-working values instead.

  This has NOT been in linux-next but has had many reports that it fixes
  the issue with 6.10-rc1"

* tag 'tty-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "VT: Use macros to define ioctls"
2024-06-01 08:53:39 -07:00
Greg Kroah-Hartman
7bc4244c88 Revert "VT: Use macros to define ioctls"
This reverts commit 8c467f3300.

Turns out this breaks many architectures as the vt ioctls do not all
match up everywhere due to historical reasons, so the original commit is
invalid for many values.
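
To illustrate why a macro-built ioctl number can differ between
architectures, here is a small sketch that encodes a "no data" ioctl under
two layouts. The layout parameters are assumptions based on the asm-generic
ioctl encoding (14 size bits, "none" direction encoded as 0) and the layout
used by architectures such as powerpc and alpha (13 size bits, "none"
direction encoded as 1); ioc_layout and io_nr() are hypothetical names, not
kernel macros.

```c
#include <stdint.h>

#define NRBITS   8	/* command number field width */
#define TYPEBITS 8	/* type ("magic") field width */

struct ioc_layout {
	unsigned int sizebits;	/* width of the size field */
	unsigned int none;	/* encoding of the "no data" direction */
};

/* Assumed asm-generic style layout. */
static const struct ioc_layout generic_layout = { 14, 0 };
/* Assumed powerpc/alpha style layout. */
static const struct ioc_layout ppc_layout = { 13, 1 };

/* Build a "no data" ioctl number the way an _IO()-style macro would. */
static uint32_t io_nr(const struct ioc_layout *l, unsigned int type,
		      unsigned int nr)
{
	unsigned int dirshift = NRBITS + TYPEBITS + l->sizebits;

	return ((uint32_t)l->none << dirshift) |
	       ((uint32_t)type << NRBITS) | nr;
}
```

With these assumptions, io_nr(&generic_layout, 'V', 0x00) yields 0x5600,
while io_nr(&ppc_layout, 'V', 0x00) yields 0x20005600, so only the first
matches a historical hard-coded value of 0x5600.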

Reported-by: Nick Bowler <nbowler@draconx.ca>
Reported-by: Arnd Bergmann <arnd@kernel.org>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Gladkov <legion@kernel.org>
Link: https://lore.kernel.org/r/ad4e561c-1d49-4f25-882c-7a36c6b1b5c0@draconx.ca
Link: https://lore.kernel.org/r/0da9785e-ba44-4718-9d08-4e96c1ba7ab2@kernel.org
Link: https://lore.kernel.org/all/34d848f4-670b-4493-bf21-130ef862521b@xenosoft.de/
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-01 07:28:21 +02:00
Linus Torvalds
cc8ed4d0a8 Merge tag 'drm-fixes-2024-06-01' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
 "This is the weekly fixes. Lots of small fixes across the board, one
  BUG_ON fix in shmem seems most important, otherwise amdgpu, i915, xe
  mostly with small fixes to all the other drivers.

  shmem:
   - fix BUG_ON in COW handling
   - warn when trying to pin imported objects

  buddy:
   - fix page size handling

  dma-buf:
   - sw-sync: Don't interfere with IRQ handling
   - fix kthreads-handling error path

  i915:
   - fix a race in audio component by registering it later
   - make DPT object unshrinkable to avoid shrinking when framebuffer
     has not shrunk
   - fix CCS id calculation to fix a perf regression
   - fix selftest caching mode
   - fix FIELD_PREP compiler warnings
   - fix indefinite wait for GT wakeref release
   - revert overeager multi-gt pm reference removal

  xe:
   - pcode polling timeout change
   - fix for deadlocks for faulting VMs
   - error-path lock imbalance fix

  amdgpu:
   - RAS fix
   - fix colorspace property for MST connectors
   - fix for PCIe DPM
   - silence UBSAN warning
   - GPUVM robustness fix
   - partition fix
   - drop deprecated I2C_CLASS_SPD

  amdkfd:
   - revert unused changes for certain 11.0.3 devices
   - simplify APU VRAM handling

  lima:
   - fix dma_resv-related deadlock in object pin

  msm:
   - remove build-time dependency on Python 3.9

  nouveau:
   - nvif: Fix possible integer overflow

  panel:
   - lg-sw43408: Select DP helpers; Declare backlight ops as static
   - sitronix-st7789v: Various fixes for jt240mhqs_hwt_ek_e3 panel

  panfrost:
   - fix dma_resv-related deadlock in object pin"

* tag 'drm-fixes-2024-06-01' of https://gitlab.freedesktop.org/drm/kernel: (35 commits)
  drm/msm: remove python 3.9 dependency for compiling msm
  drm/panel: sitronix-st7789v: fix display size for jt240mhqs_hwt_ek_e3 panel
  drm/panel: sitronix-st7789v: tweak timing for jt240mhqs_hwt_ek_e3 panel
  drm/panel: sitronix-st7789v: fix timing for jt240mhqs_hwt_ek_e3 panel
  drm/amd/pm: remove deprecated I2C_CLASS_SPD support from newly added SMU_14_0_2
  drm/amdgpu: Make CPX mode auto default in NPS4
  drm/amdkfd: simplify APU VRAM handling
  Revert "drm/amdkfd: fix gfx_target_version for certain 11.0.3 devices"
  drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent
  drm/amdgpu: silence UBSAN warning
  drm/amdgpu: Adjust logic in amdgpu_device_partner_bandwidth()
  drm/i915: Fix audio component initialization
  drm/i915/dpt: Make DPT object unshrinkable
  drm/i915/gt: Fix CCS id's calculation for CCS mode setting
  drm/panel/lg-sw43408: mark sw43408_backlight_ops as static
  drm/i915/selftests: Set always_coherent to false when reading from CPU
  drm/panel/lg-sw43408: select CONFIG_DRM_DISPLAY_DP_HELPER
  drm/i915/guc: avoid FIELD_PREP warning
  drm/i915/gt: Disarm breadcrumbs if engines are already idle
  Revert "drm/i915: Remove extra multi-gt pm-references"
  ...
2024-05-31 16:26:48 -07:00
Linus Torvalds
0f9a75179d Merge tag 'block-6.10-20240530' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:

 - NVMe fixes via Keith:
      - Removing unused fields (Kanchan)
      - Large folio offsets support (Kundan)
      - Multipath NUMA node initialization fix (Nilay)
      - Multipath IO stats accounting fixes (Keith)
      - Circular lockdep fix (Keith)
      - Target race condition fix (Sagi)
      - Target memory leak fix (Sagi)

 - bcache fixes

 - null_blk fixes (Damien)

 - Fix regression in io.max due to throttle low removal (Waiman)

 - DM limit table fixes (Christoph)

 - SCSI and block limit fixes (Christoph)

 - zone fixes (Damien)

 - Misc fixes (Christoph, Hannes, hexue)

* tag 'block-6.10-20240530' of git://git.kernel.dk/linux: (25 commits)
  blk-throttle: Fix incorrect display of io.max
  block: Fix zone write plugging handling of devices with a runt zone
  block: Fix validation of zoned device with a runt zone
  null_blk: Do not allow runt zone with zone capacity smaller then zone size
  nvmet: fix a possible leak when destroy a ctrl during qp establishment
  nvme: use srcu for iterating namespace list
  bcache: code cleanup in __bch_bucket_alloc_set()
  bcache: call force_wake_up_gc() if necessary in check_should_bypass()
  bcache: allow allocator to invalidate bucket in gc
  block: check for max_hw_sectors underflow
  block: stack max_user_sectors
  sd: also set max_user_sectors when setting max_sectors
  null_blk: Print correct max open zones limit in null_init_zoned_dev()
  block: delete redundant function declaration
  null_blk: Fix return value of nullb_device_power_store()
  dm: make dm_set_zones_restrictions work on the queue limits
  dm: remove dm_check_zoned
  dm: move setting zoned_enabled to dm_table_set_restrictions
  block: remove blk_queue_max_integrity_segments
  nvme: adjust multiples of NVME_CTRL_PAGE_SIZE in offset
  ...
2024-05-31 15:31:27 -07:00
Linus Torvalds
7d88cc8ecc Merge tag 'sound-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "Lots of small fixes:

   - A race fix for debugfs handling in ALSA core

   - A series of corrections for MIDI2 core format conversions

   - ASoC Intel fixes for 16 bit DMIC config

   - Updates for missing module parameters in ASoC code

   - HD-audio quirk, Cirrus codec fix, etc minor fixes"

* tag 'sound-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (26 commits)
  ALSA: seq: ump: Fix swapped song position pointer data
  ASoC: SOF: ipc4-topology: Adjust the params based on DAI formats
  ASoC: SOF: ipc4-topology: Improve readability of sof_ipc4_prepare_dai_copier()
  ASoC: SOF: ipc4-topology/pcm: Rename sof_ipc4_copier_is_single_format()
  ASoC: SOF: ipc4-topology: Print out the channel count in sof_ipc4_dbg_audio_format
  ASoC: SOF: ipc4-topology: Add support for NHLT with 16-bit only DMIC blob
  ALSA: seq: Fix yet another spot for system message conversion
  ALSA: ump: Set default protocol when not given explicitly
  ALSA: ump: Don't accept an invalid UMP protocol number
  ASoC: SOF: ipc4-topology: Fix input format query of process modules without base extension
  ASoC: Intel: sof-sdw: fix missing SPI_MASTER dependency
  ALSA: pcm: fix typo in comment
  ALSA: ump: Don't clear bank selection after sending a program change
  ALSA: seq: Fix incorrect UMP type for system messages
  ALSA/hda: intel-dsp-config: reduce log verbosity
  ALSA: seq: Don't clear bank selection at event -> UMP MIDI2 conversion
  ALSA: seq: Fix missing bank setup between MIDI1/MIDI2 UMP conversion
  ASoC: SOF: add missing MODULE_DESCRIPTION()
  ASoC: SOF: reorder MODULE_ definitions
  ASoC: SOF: AMD: group all module related information
  ...
2024-05-31 12:11:44 -07:00
Damien Le Moal
29459c3eaa block: Fix zone write plugging handling of devices with a runt zone
A zoned device may have a last sequential write required zone that is
smaller than other zones. However, all tests to check if a zone write
plug write offset exceeds the zone capacity use the same capacity
value stored in the gendisk zone_capacity field. This is incorrect for a
zoned device with a last runt (smaller) zone.

Add the new field last_zone_capacity to struct gendisk to store the
capacity of the last zone of the device. blk_revalidate_seq_zone() and
blk_revalidate_conv_zone() are both modified to get this value when
disk_zone_is_last() returns true. Similarly to zone_capacity, the value
is first stored using the last_zone_capacity field of struct
blk_revalidate_zone_args. Once zone revalidation of all zones is done,
this is used to set the gendisk last_zone_capacity field.

The checks to determine if a zone is full or if a sector offset in a
zone exceeds the zone capacity in disk_should_remove_zone_wplug(),
disk_zone_wplug_abort_unaligned(), blk_zone_write_plug_init_request(),
and blk_zone_wplug_prepare_bio() are modified to use the new helper
functions disk_zone_is_full() and disk_zone_wplug_is_full().
disk_zone_is_full() uses the zone index to determine if the zone being
tested is the last one of the disk and uses either the disk
zone_capacity or last_zone_capacity accordingly.
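
The capacity selection described above can be sketched as follows. This is
an illustrative model only: struct disk_model and the helper names are
hypothetical and do not reproduce the kernel's struct gendisk or the exact
signatures of disk_zone_is_last() and disk_zone_is_full().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of the per-disk zone information. */
struct disk_model {
	unsigned int nr_zones;
	uint64_t zone_capacity;		/* capacity of every regular zone */
	uint64_t last_zone_capacity;	/* capacity of the (possibly runt) last zone */
};

static bool zone_is_last(const struct disk_model *d, unsigned int zno)
{
	return zno == d->nr_zones - 1;
}

/* A zone is full once the write offset reaches that zone's own capacity. */
static bool zone_is_full(const struct disk_model *d, unsigned int zno,
			 uint64_t offset_in_zone)
{
	uint64_t capacity = zone_is_last(d, zno) ?
			    d->last_zone_capacity : d->zone_capacity;

	return offset_in_zone >= capacity;
}
```

For a disk whose last zone holds 100 sectors while the others hold 256, an
offset of 100 is "full" only in the last zone, which a single shared
capacity value would get wrong.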

Fixes: dd291d77cc ("block: Introduce zone write plugging")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20240530054035.491497-4-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-30 15:03:52 -06:00
Linus Torvalds
d8ec19857b Merge tag 'net-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf and netfilter.

  Current release - regressions:

   - gro: initialize network_offset in network layer

   - tcp: reduce accepted window in NEW_SYN_RECV state

  Current release - new code bugs:

   - eth: mlx5e: do not use ptp structure for tx ts stats when not
     initialized

   - eth: ice: check for unregistering correct number of devlink params

  Previous releases - regressions:

   - bpf: Allow delete from sockmap/sockhash only if update is allowed

   - sched: taprio: extend minimum interval restriction to entire cycle
     too

   - netfilter: ipset: add list flush to cancel_gc

   - ipv4: fix address dump when IPv4 is disabled on an interface

   - sock_map: avoid race between sock_map_close and sk_psock_put

   - eth: mlx5: use mlx5_ipsec_rx_status_destroy to correctly delete
     status rules

  Previous releases - always broken:

   - core: fix __dst_negative_advice() race

   - bpf:
       - fix multi-uprobe PID filtering logic
       - fix pkt_type override upon netkit pass verdict

   - netfilter: tproxy: bail out if IP has been disabled on the device

   - af_unix: annotate data-race around unix_sk(sk)->addr

   - eth: mlx5e: fix UDP GSO for encapsulated packets

   - eth: idpf: don't enable NAPI and interrupts prior to allocating Rx
     buffers

   - eth: i40e: fully suspend and resume IO operations in EEH case

   - eth: octeontx2-pf: free send queue buffers in case of leaf to inner

   - eth: ipvlan: dont Use skb->sk in ipvlan_process_v{4,6}_outbound"

* tag 'net-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
  netdev: add qstat for csum complete
  ipvlan: Dont Use skb->sk in ipvlan_process_v{4,6}_outbound
  net: ena: Fix redundant device NUMA node override
  ice: check for unregistering correct number of devlink params
  ice: fix 200G PHY types to link speed mapping
  i40e: Fully suspend and resume IO operations in EEH case
  i40e: factoring out i40e_suspend/i40e_resume
  e1000e: move force SMBUS near the end of enable_ulp function
  net: dsa: microchip: fix RGMII error in KSZ DSA driver
  ipv4: correctly iterate over the target netns in inet_dump_ifaddr()
  net: fix __dst_negative_advice() race
  nfc/nci: Add the inconsistency check between the input data length and count
  MAINTAINERS: dwmac: starfive: update Maintainer
  net/sched: taprio: extend minimum interval restriction to entire cycle too
  net/sched: taprio: make q->picos_per_byte available to fill_sched_entry()
  netfilter: nft_fib: allow from forward/input without iif selector
  netfilter: tproxy: bail out if IP has been disabled on the device
  netfilter: nft_payload: skbuff vlan metadata mangle support
  net: ti: icssg-prueth: Fix start counter for ft1 filter
  sock_map: avoid race between sock_map_close and sk_psock_put
  ...
2024-05-30 08:33:04 -07:00
Jakub Kicinski
13c7c941e7 netdev: add qstat for csum complete
Recent commit 0cfe71f45f ("netdev: add queue stats") added
a lot of useful stats, but only those immediately needed by virtio.
Presumably virtio does not support CHECKSUM_COMPLETE,
so a statistic for that form of checksumming wasn't included.
Other drivers will definitely need it, in fact we expect it
to be needed in net-next soon (mlx5). So let's add the definition
of the counter for CHECKSUM_COMPLETE to uAPI in net already,
so that the counters are in a more natural order (all subsequent
counters have not been present in any released kernel, yet).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Fixes: 0cfe71f45f ("netdev: add queue stats")
Link: https://lore.kernel.org/r/20240529163547.3693194-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-30 12:15:56 +02:00
Eric Dumazet
92f1655aa2 net: fix __dst_negative_advice() race
__dst_negative_advice() does not enforce proper RCU rules when
sk->dst_cache must be cleared, leading to possible UAF.

RCU rules are that we must first clear sk->sk_dst_cache,
then call dst_release(old_dst).

Note that sk_dst_reset(sk) is implementing this protocol correctly,
while __dst_negative_advice() uses the wrong order.
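
The ordering rule can be illustrated with a user-space model that uses C11
atomics in place of RCU. This is only a sketch of the "unpublish first,
release second" order: it does not model RCU grace periods, and struct
dst_model, struct sock_model, and both helpers are hypothetical stand-ins
for the kernel objects named above.

```c
#include <stdatomic.h>
#include <stddef.h>

struct dst_model {
	atomic_int refcnt;
};

struct sock_model {
	_Atomic(struct dst_model *) dst_cache;
};

/* Drop one reference; freeing at zero is left out of this sketch. */
static void dst_release_model(struct dst_model *dst)
{
	if (dst)
		atomic_fetch_sub(&dst->refcnt, 1);
}

/*
 * Correct order: clear the cached pointer first so that no new reader
 * can pick it up, and only then drop the reference held by the cache.
 */
static void sk_dst_reset_model(struct sock_model *sk)
{
	struct dst_model *old =
		atomic_exchange(&sk->dst_cache, (struct dst_model *)NULL);

	dst_release_model(old);
}
```

Doing the two steps the other way around leaves a window in which the cache
still points at an entry whose reference has already been dropped, which is
the kind of use-after-free the commit describes.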

Given that ip6_negative_advice() has special logic
against RTF_CACHE, this means each of the three ->negative_advice()
existing methods must perform the sk_dst_reset() themselves.

Note the check against NULL dst is centralized in
__dst_negative_advice(), there is no need to duplicate
it in various callbacks.

Many thanks to Clement Lecigne for tracking this issue.

This old bug became visible after the blamed commit, using UDP sockets.

Fixes: a87cb3e48e ("net: Facility to report route quality of connected sockets")
Reported-by: Clement Lecigne <clecigne@google.com>
Diagnosed-by: Clement Lecigne <clecigne@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240528114353.1794151-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-29 17:34:49 -07:00
Alexandre Belloni
6d40dbc758 ALSA: pcm: fix typo in comment
Fix the typo in the comment for SNDRV_PCM_RATE_KNOT

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20240528191850.63314-1-alexandre.belloni@bootlin.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-05-29 10:40:36 +02:00
Jarkko Sakkinen
f09fc6cee0 tpm: Rename TPM2_OA_TMPL to TPM2_OA_NULL_KEY and make it local
Rename and document TPM2_OA_TMPL, as originally requested in the patch
set review, but left unaddressed without any appropriate reasoning. The
new name is TPM2_OA_NULL_KEY, is documented, and is local only to
tpm2-sessions.c.

Link: https://lore.kernel.org/linux-integrity/ddbeb8111f48a8ddb0b8fca248dff6cc9d7079b2.camel@HansenPartnership.com/
Link: https://lore.kernel.org/linux-integrity/CZCKTWU6ZCC9.2UTEQPEVICYHL@suppilovahvero/
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-05-28 13:14:23 +03:00
Jarkko Sakkinen
f3d7ba9e1b tpm: Open code tpm_buf_parameters()
With only a single call site, this makes no sense (slipped under the
radar during the review). Open code and document the action directly
to the site, to make it more readable.

Fixes: 1b6d7f9eb1 ("tpm: add session encryption protection to tpm2_get_random()")
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-05-28 13:03:57 +03:00
Alexander Lobakin
266aa3b481 page_pool: fix &page_pool_params kdoc issues
After the tagged commit, @netdev got documented twice and the kdoc
script didn't notice that. Remove the second description added later
and move the initial one according to the field position.

After merging commit 5f8e4007c1 ("kernel-doc: fix
struct_group_tagged() parsing"), kdoc requires to describe struct
groups as well. &page_pool_params has 2 struct groups which
generated new warnings, describe them to resolve this.

Fixes: 403f11ac9a ("page_pool: don't use driver-set flags field directly")
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20240524112859.2757403-1-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 17:00:22 -07:00
Eric Dumazet
f4dca95fc0 tcp: reduce accepted window in NEW_SYN_RECV state
Jason's commit made checks against ACK sequence less strict
and can be exploited by attackers to establish spoofed flows
with fewer probes.

Innocent users might use tcp_rmem[1] == 1,000,000,000,
or something more reasonable.

An attacker can use a regular TCP connection to learn the server
initial tp->rcv_wnd, and use it to optimize the attack.

If we make sure that only the announced window (smaller than 65535)
is used for ACK validation, we force an attacker to use
65537 packets to complete the 3WHS (assuming server ISN is unknown)

Fixes: 378979e94e ("tcp: remove 64 KByte limit for initial tp->rcv_wnd value")
Link: https://datatracker.ietf.org/meeting/119/materials/slides-119-tcpm-ghost-acks-00
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20240523130528.60376-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 16:47:23 -07:00
Jakub Kicinski
2786ae339e Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2024-05-27

We've added 15 non-merge commits during the last 7 day(s) which contain
a total of 18 files changed, 583 insertions(+), 55 deletions(-).

The main changes are:

1) Fix broken BPF multi-uprobe PID filtering logic which filtered by thread
   while the promise was to filter by process, from Andrii Nakryiko.

2) Fix the recent influx of syzkaller reports to sockmap which triggered
   a locking rule violation by performing a map_delete, from Jakub Sitnicki.

3) Fixes to netkit driver in particular on skb->pkt_type override upon pass
   verdict, from Daniel Borkmann.

4) Fix an integer overflow in resolve_btfids which can wrongly trigger build
   failures, from Friedrich Vock.

5) Follow-up fixes for ARC JIT reported by static analyzers,
   from Shahab Vahedi.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Cover verifier checks for mutating sockmap/sockhash
  Revert "bpf, sockmap: Prevent lock inversion deadlock in map delete elem"
  bpf: Allow delete from sockmap/sockhash only if update is allowed
  selftests/bpf: Add netkit test for pkt_type
  selftests/bpf: Add netkit tests for mac address
  netkit: Fix pkt_type override upon netkit pass verdict
  netkit: Fix setting mac address in l2 mode
  ARC, bpf: Fix issues reported by the static analyzers
  selftests/bpf: extend multi-uprobe tests with USDTs
  selftests/bpf: extend multi-uprobe tests with child thread case
  libbpf: detect broken PID filtering logic for multi-uprobe
  bpf: remove unnecessary rcu_read_{lock,unlock}() in multi-uprobe attach logic
  bpf: fix multi-uprobe PID filtering logic
  bpf: Fix potential integer overflow in resolve_btfids
  MAINTAINERS: Add myself as reviewer of ARM64 BPF JIT
====================

Link: https://lore.kernel.org/r/20240527203551.29712-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-27 16:26:30 -07:00
Christoph Hellwig
80e4e17ac9 block: remove blk_queue_max_integrity_segments
This is unused now that all the atomic queue limit conversions are
merged.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240521221606.393040-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-27 09:16:22 -06:00
Linus Torvalds
e4c07ec89e Merge tag 'vfs-6.10-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:

 - Fix io_uring based write-through after converting cifs to use the
   netfs library

 - Fix aio error handling when doing write-through via netfs library

 - Fix performance regression in iomap when used with non-large folio
   mappings

 - Fix signalfd error code

 - Remove obsolete comment in signalfd code

 - Fix async request indication in netfs_perform_write() by raising
   BDP_ASYNC when IOCB_NOWAIT is set

 - Yield swap device immediately to prevent spurious EBUSY errors

 - Don't cross a .backup mountpoint from backup volumes in afs to avoid
   infinite loops

 - Fix a race between umount and async request completion in 9p after 9p
   was converted to use the netfs library

* tag 'vfs-6.10-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  netfs, 9p: Fix race between umount and async request completion
  afs: Don't cross .backup mountpoint from backup volume
  swap: yield device immediately
  netfs: Fix setting of BDP_ASYNC from iocb flags
  signalfd: drop an obsolete comment
  signalfd: fix error return code
  iomap: fault in smaller chunks for non-large folio mappings
  filemap: add helper mapping_max_folio_size()
  netfs: Fix AIO error handling when doing write-through
  netfs: Fix io_uring based write-through
2024-05-27 08:09:12 -07:00
David Howells
f89ea63f1c netfs, 9p: Fix race between umount and async request completion
There's a problem in 9p's interaction with netfslib whereby a crash occurs
because the 9p_fid structs get forcibly destroyed during client teardown
(without paying attention to their refcounts) before netfslib has finished
with them.  However, it's not a simple case of deferring the clunking that
p9_fid_put() does as that requires the p9_client record to still be
present.

The problem is that netfslib has to unlock pages and clear the IN_PROGRESS
flag before destroying the objects involved - including the fid - and, in
any case, nothing checks to see if writeback completed barring looking at
the page flags.

Fix this by keeping a count of outstanding I/O requests (of any type) and
waiting for it to quiesce during inode eviction.

Reported-by: syzbot+df038d463cca332e8414@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/0000000000005be0aa061846f8d6@google.com/
Reported-by: syzbot+d7c7a495a5e466c031b6@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/000000000000b86c5e06130da9c6@google.com/
Reported-by: syzbot+1527696d41a634cc1819@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/000000000000041f960618206d7e@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/755891.1716560771@warthog.procyon.org.uk
Tested-by: syzbot+d7c7a495a5e466c031b6@syzkaller.appspotmail.com
Reviewed-by: Dominique Martinet <asmadeus@codewreck.org>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Hillf Danton <hdanton@sina.com>
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Reported-and-tested-by: syzbot+d7c7a495a5e466c031b6@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-27 13:12:13 +02:00
Dave Airlie
3e049b6b8f Merge tag 'drm-misc-fixes-2024-05-23' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Short summary of fixes pull:

buddy:
- stop using PAGE_SIZE

shmem-helper:
- avoid kernel panic in mmap()

tests:
- buddy: fix PAGE_SIZE dependency

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240523184745.GA11363@localhost.localdomain
2024-05-27 13:47:14 +10:00
Kent Overstreet
9b0abe7948 mm: percpu: Include smp.h in alloc_tag.h
percpu.h depends on smp.h, but doesn't include it directly because of
circular header dependency issues; percpu.h is needed in a bunch of low
level headers.

This fixes a randconfig build error on mips:

  include/linux/alloc_tag.h: In function '__alloc_tag_ref_set':
  include/asm-generic/percpu.h:31:40: error: implicit declaration of function 'raw_smp_processor_id' [-Werror=implicit-function-declaration]

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 24e44cc22a ("mm: percpu: enable per-cpu allocation tagging")
Closes: https://lore.kernel.org/oe-kbuild-all/202405210052.DIrMXJNz-lkp@intel.com/
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-05-26 14:40:39 -07:00
Linus Torvalds
c13320499b Merge tag '6.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:

 - two important netfs integration fixes - including for a data
   corruption and also fixes for multiple xfstests

 - reenable swap support over SMB3

* tag '6.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix missing set of remote_i_size
  cifs: Fix smb3_insert_range() to move the zero_point
  cifs: update internal version number
  smb3: reenable swapfiles over SMB3 mounts
2024-05-25 22:33:10 -07:00
Linus Torvalds
9b62e02e63 Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "16 hotfixes, 11 of which are cc:stable.

  A few nilfs2 fixes, the remainder are for MM: a couple of selftests
  fixes, various singletons fixing various issues in various parts"

* tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/ksm: fix possible UAF of stable_node
  mm/memory-failure: fix handling of dissolved but not taken off from buddy pages
  mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again
  nilfs2: fix potential hang in nilfs_detach_log_writer()
  nilfs2: fix unexpected freezing of nilfs_segctor_sync()
  nilfs2: fix use-after-free of timer for log writer thread
  selftests/mm: fix build warnings on ppc64
  arm64: patching: fix handling of execmem addresses
  selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
  selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
  selftests/mm: compaction_test: fix bogus test success on Aarch64
  mailmap: update email address for Satya Priya
  mm/huge_memory: don't unpoison huge_zero_folio
  kasan, fortify: properly rename memintrinsics
  lib: add version into /proc/allocinfo output
  mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
2024-05-25 15:10:33 -07:00
Linus Torvalds
3a390f24b7 Merge tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:

 - Fix regressions of the new x86 CPU VFM (vendor/family/model)
   enumeration/matching code

 - Fix crash kernel detection on buggy firmware with
   non-compliant ACPI MADT tables

 - Address Kconfig warning

* tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL
  crypto: x86/aes-xts - switch to new Intel CPU model defines
  x86/topology: Handle bogus ACPI tables correctly
  x86/kconfig: Select ARCH_WANT_FRAME_POINTERS again when UNWINDER_FRAME_POINTER=y
2024-05-25 14:40:09 -07:00
Daniel Borkmann
3998d18426 netkit: Fix pkt_type override upon netkit pass verdict
When running Cilium connectivity test suite with netkit in L2 mode, we
found that compared to tcx a few tests were failing which pushed traffic
into an L7 proxy sitting in host namespace. The problem in particular is
around the invocation of eth_type_trans() in netkit.

In case of tcx, this is run before the tcx ingress is triggered inside
host namespace and thus if the BPF program uses the bpf_skb_change_type()
helper the newly set type is retained. However, in case of netkit, the
late eth_type_trans() invocation overrides the earlier decision from the
BPF program which eventually leads to the test failure.

Instead of eth_type_trans(), split out the relevant parts, meaning, reset
of mac header and call to eth_skb_pkt_type() before the BPF program is run
in order to have the same behavior as with tcx, and refactor a small helper
called eth_skb_pull_mac() which is run in case it's passed up the stack
where the mac header must be pulled. With this all connectivity tests pass.

Fixes: 35dfaad718 ("netkit, bpf: Add bpf programmable net device")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240524163619.26001-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-25 10:48:57 -07:00
Linus Torvalds
56fb6f9285 Merge tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
 "Some fixes for the end of the merge window, mostly amdgpu and panthor,
  with one nouveau uAPI change that fixes a bad decision we made a few
  months back.

  nouveau:
   - fix bo metadata uAPI for vm bind

  panthor:
   - Fixes for panthor's heap logical block.
   - Reset on unrecoverable fault
   - Fix VM references.
   - Reset fix.

  xlnx:
   - xlnx compile and doc fixes.

  amdgpu:
   - Handle vbios table integrated info v2.3

  amdkfd:
   - Handle duplicate BOs in reserve_bo_and_cond_vms
   - Handle memory limitations on small APUs

  dp/mst:
   - MST null deref fix.

  bridge:
   - Don't let next bridge create connector in adv7511 to make probe
     work"

* tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel:
  drm/amdgpu/atomfirmware: add intergrated info v2.3 table
  drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2
  drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs
  drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms
  drm/bridge: adv7511: Attach next bridge without creating connector
  drm/buddy: Fix the warn on's during force merge
  drm/nouveau: use tile_mode and pte_kind for VM_BIND bo allocations
  drm/panthor: Call panthor_sched_post_reset() even if the reset failed
  drm/panthor: Reset the FW VM to NULL on unplug
  drm/panthor: Keep a ref to the VM at the panthor_kernel_bo level
  drm/panthor: Force an immediate reset on unrecoverable faults
  drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity constraints
  drm/panthor: Fix an off-by-one in the heap context retrieval logic
  drm/panthor: Relax the constraints on the tiler chunk size
  drm/panthor: Make sure the tiler initial/max chunks are consistent
  drm/panthor: Fix tiler OOM handling to allow incremental rendering
  drm: xlnx: zynqmp_dpsub: Fix compilation error
  drm: xlnx: zynqmp_dpsub: Fix few function comments
2024-05-24 17:28:02 -07:00
Linus Torvalds
0b32d436c0 Merge tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more mm updates from Andrew Morton:
 "Jeff Xu's implementation of the mseal() syscall"

* tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  selftest mm/mseal read-only elf memory segment
  mseal: add documentation
  selftest mm/mseal memory sealing
  mseal: add mseal syscall
  mseal: wire up mseal syscall
2024-05-24 12:47:28 -07:00
Andrey Konovalov
2e577732e8 kasan, fortify: properly rename memintrinsics
After commit 69d4c0d321 ("entry, kasan, x86: Disallow overriding mem*()
functions") and the follow-up fixes, with CONFIG_FORTIFY_SOURCE enabled,
even though the compiler instruments memintrinsics by generating calls to
__asan/__hwasan_ prefixed functions, FORTIFY_SOURCE still uses
uninstrumented memset/memmove/memcpy as the underlying functions.

As a result, KASAN cannot detect bad accesses in memset/memmove/memcpy. 
This also makes KASAN tests corrupt kernel memory and cause crashes.

To fix this, use __asan_/__hwasan_memset/memmove/memcpy as the underlying
functions whenever appropriate.  Do this only for the instrumented code
(as indicated by __SANITIZE_ADDRESS__).

Link: https://lkml.kernel.org/r/20240517130118.759301-1-andrey.konovalov@linux.dev
Fixes: 69d4c0d321 ("entry, kasan, x86: Disallow overriding mem*() functions")
Fixes: 51287dcb00 ("kasan: emit different calls for instrumentable memintrinsics")
Fixes: 36be5cba99 ("kasan: treat meminstrinsic as builtins in uninstrumented files")
Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Reported-by: Nico Pache <npache@redhat.com>
Closes: https://lore.kernel.org/all/20240501144156.17e65021@outsider.home/
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Nico Pache <npache@redhat.com>
Acked-by: Nico Pache <npache@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24 11:55:05 -07:00
Linus Torvalds
f1f9984fdc Merge tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:

 - The compression format used for boot images is now configurable at
   build time, and these formats are shown in `make help`

 - access_ok() has been optimized

 - A pair of performance bugs have been fixed in the uaccess handlers

 - Various fixes and cleanups, including one for the IMSIC build failure
   and one for the early-boot ftrace illegal NOPs bug

* tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fix early ftrace nop patching
  irqchip: riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
  riscv: selftests: Add signal handling vector tests
  riscv: mm: accelerate pagefault when badaccess
  riscv: uaccess: Relax the threshold for fast path
  riscv: uaccess: Allow the last potential unrolled copy
  riscv: typo in comment for get_f64_reg
  Use bool value in set_cpu_online()
  riscv: selftests: Add hwprobe binaries to .gitignore
  riscv: stacktrace: fixed walk_stackframe()
  ftrace: riscv: move from REGS to ARGS
  riscv: do not select MODULE_SECTIONS by default
  riscv: show help string for riscv-specific targets
  riscv: make image compression configurable
  riscv: cpufeature: Fix extension subset checking
  riscv: cpufeature: Fix thead vector hwcap removal
  riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context
  riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled
  riscv: Define TASK_SIZE_MAX for __access_ok()
  riscv: Remove PGDIR_SIZE_L3 and TASK_SIZE_MIN
2024-05-24 10:46:35 -07:00
Linus Torvalds
041c9f71a4 Merge tag 'sound-fix-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "A collection of small fixes for 6.10-rc1. Most of changes are various
  device-specific fixes and quirks, while there are a few small changes
  in ALSA core timer and module / built-in fixes"

* tag 'sound-fix-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek: fix mute/micmute LEDs don't work for ProBook 440/460 G11.
  ALSA: core: Enable proc module when CONFIG_MODULES=y
  ALSA: core: Fix NULL module pointer assignment at card init
  ALSA: hda/realtek: Enable headset mic of JP-IK LEAP W502 with ALC897
  ASoC: dt-bindings: stm32: Ensure compatible pattern matches whole string
  ASoC: tas2781: Fix wrong loading calibrated data sequence
  ASoC: tas2552: Add TX path for capturing AUDIO-OUT data
  ALSA: usb-audio: Fix for sampling rates support for Mbox3
  Documentation: sound: Fix trailing whitespaces
  ALSA: timer: Set lower bound of start tick time
  ASoC: codecs: ES8326: solve hp and button detect issue
  ASoC: rt5645: mic-in detection threshold modification
  ASoC: Intel: sof_sdw_rt_sdca_jack_common: Use name_prefix for `-sdca` detection
2024-05-24 08:48:51 -07:00
Gal Pressman
1b9f86c6d5 net/mlx5: Fix MTMP register capability offset in MCAM register
The MTMP register (0x900a) capability offset is off-by-one, move it to
the right place.

Fixes: 1f507e80c7 ("net/mlx5: Expose NIC temperature via hardware monitoring kernel API")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 13:27:07 +01:00
Xu Yang
79c1374548 filemap: add helper mapping_max_folio_size()
Add mapping_max_folio_size() to get the maximum folio size for this
pagecache mapping.

Fixes: 5d8edfb900 ("iomap: Copy larger chunks from userspace")
Cc: stable@vger.kernel.org
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240521114939.2541461-1-xu.yang_2@nxp.com
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-24 13:34:06 +02:00
Matt Jan
06e785aeb9 connector: Fix invalid conversion in cn_proc.h
The implicit conversion from unsigned int to enum
proc_cn_event is invalid, so explicitly cast it
for compilation in a C++ compiler.
/usr/include/linux/cn_proc.h: In function 'proc_cn_event valid_event(proc_cn_event)':
/usr/include/linux/cn_proc.h:72:17: error: invalid conversion from 'unsigned int' to 'proc_cn_event' [-fpermissive]
   72 |         ev_type &= PROC_EVENT_ALL;
      |                 ^
      |                 |
      |                 unsigned int

Signed-off-by: Matt Jan <zoo868e@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-24 10:36:55 +01:00
Jeff Xu
8be7258aad mseal: add mseal syscall
The new mseal() is a syscall on 64-bit CPUs, with the following signature:

int mseal(void *addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.

mseal() blocks the following operations for the given memory range.

1> Unmapping, moving to another location, and shrinking the size,
   via munmap() and mremap(). These can leave an empty space, which can
   then be replaced with a VMA with a new set of attributes.

2> Moving or expanding a different VMA into the current location,
   via mremap().

3> Modifying a VMA via mmap(MAP_FIXED).

4> Size expansion, via mremap(), does not appear to pose any specific
   risks to sealed VMAs. It is included anyway because the use case is
   unclear. In any case, users can rely on merging to expand a sealed VMA.

5> mprotect() and pkey_mprotect().

6> Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous
   memory, when users don't have write permission to the memory. Those
   behaviors can alter region contents by discarding pages, effectively a
   memset(0) for anonymous memory.
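
A minimal userspace sketch of the behavior described in item 5, assuming
addr is a pointer to the start of a page-aligned range. mseal_range()
and seal_demo() are hypothetical names, and the fallback syscall number
462 is an assumption for headers that do not define __NR_mseal yet.
The raw syscall() is used because a libc wrapper may not be available.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
#define __NR_mseal 462	/* assumption: number used by kernels with mseal() */
#endif

/* Hypothetical thin wrapper around the raw syscall. */
static int mseal_range(void *addr, size_t len, unsigned long flags)
{
	return (int)syscall(__NR_mseal, addr, len, flags);
}

/*
 * Seal one anonymous read-only page, then try to make it writable.
 * Returns 0 if the sealed page refused mprotect() with EPERM,
 * 1 if mseal() is not usable on this system, -1 on anything else.
 */
int seal_demo(void)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	assert(page > 0);
	p = mmap(NULL, (size_t)page, PROT_READ,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	if (mseal_range(p, (size_t)page, 0) != 0) {
		/* Older kernel, 32-bit build, or a sandbox blocking the call. */
		munmap(p, (size_t)page);
		return 1;
	}

	/* The page is sealed now; it stays mapped until the process exits. */
	if (mprotect(p, (size_t)page, PROT_READ | PROT_WRITE) == 0)
		return -1;
	return errno == EPERM ? 0 : -1;
}
```

The sketch treats any mseal() failure as "not available" so that it
also runs on kernels without the syscall.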

The following input received during the RFC is incorporated into this patch:

Jann Horn: raising awareness and providing valuable insights on the
destructive madvise operations.
Linus Torvalds: assisting in defining system call signature and scope.
Liam R. Howlett: perf optimization.
Theo de Raadt: sharing the experiences and insight gained from
  implementing mimmutable() in OpenBSD.

Finally, the idea that inspired this patch comes from Stephen Röttger's
work in Chrome V8 CFI.

[jeffxu@chromium.org: add branch prediction hint, per Pedro]
  Link: https://lkml.kernel.org/r/20240423192825.1273679-2-jeffxu@chromium.org
Link: https://lkml.kernel.org/r/20240415163527.626541-3-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-23 19:40:26 -07:00
Jeff Xu
ff388fe5c4 mseal: wire up mseal syscall
Patch series "Introduce mseal", v10.

This patchset proposes a new mseal() syscall for the Linux kernel.

In a nutshell, mseal() protects the VMAs of a given virtual memory range
against modifications, such as changes to their permission bits.

Modern CPUs support memory permissions, such as the read/write (RW) and
no-execute (NX) bits.  Linux has supported NX since the release of kernel
version 2.6.8 in August 2004 [1].  The memory permission feature improves
the security stance on memory corruption bugs, as an attacker cannot
simply write to arbitrary memory and point the code to it.  The memory
must be marked with the X bit, or else an exception will occur. 
Internally, the kernel maintains the memory permissions in a data
structure called VMA (vm_area_struct).  mseal() additionally protects the
VMA itself against modifications of the selected seal type.

Memory sealing is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system.  For example,
such an attacker primitive can break control-flow integrity guarantees
since read-only memory that is supposed to be trusted can become writable
or .text pages can get remapped.  Memory sealing can automatically be
applied by the runtime loader to seal .text and .rodata pages and
applications can additionally seal security critical data at runtime.  A
similar feature already exists in the XNU kernel with the
VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the mimmutable syscall
[4].  Also, Chrome wants to adopt this feature for their CFI work [2] and
this patchset has been designed to be compatible with the Chrome use case.

Two system calls are involved in sealing the map:  mmap() and mseal().

The new mseal() is a syscall on 64-bit CPUs, with the following signature:

int mseal(void *addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.

mseal() blocks the following operations for the given memory range.

1> Unmapping, moving to another location, and shrinking the size,
   via munmap() and mremap(). These can leave an empty space, which can
   then be replaced with a VMA with a new set of attributes.

2> Moving or expanding a different VMA into the current location,
   via mremap().

3> Modifying a VMA via mmap(MAP_FIXED).

4> Size expansion, via mremap(), does not appear to pose any specific
   risks to sealed VMAs. It is included anyway because the use case is
   unclear. In any case, users can rely on merging to expand a sealed VMA.

5> mprotect() and pkey_mprotect().

6> Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous
   memory, when users don't have write permission to the memory. Those
   behaviors can alter region contents by discarding pages, effectively a
   memset(0) for anonymous memory.

The idea that inspired this patch comes from Stephen Röttger’s work in
V8 CFI [5].  Chrome browser in ChromeOS will be the first user of this
API.

Indeed, the Chrome browser has very specific requirements for sealing,
which are distinct from those of most applications.  For example, in the
case of libc, sealing is only applied to read-only (RO) or read-execute
(RX) memory segments (such as .text and .RELRO) to prevent them from
becoming writable; the lifetime of those mappings is tied to the lifetime
of the process.

Chrome wants to seal two large address space reservations that are managed
by different allocators.  The memory is mapped RW- and RWX respectively
but write access to it is restricted using pkeys (or in the future ARM
permission overlay extensions).  The lifetime of those mappings is not
tied to the lifetime of the process; therefore, while the memory is
sealed, the allocators still need to free or discard the unused memory,
for example with madvise(DONTNEED).

However, always allowing madvise(DONTNEED) on this range poses a security
risk.  For example if a jump instruction crosses a page boundary and the
second page gets discarded, it will overwrite the target bytes with zeros
and change the control flow.  Checking write-permission before the discard
operation allows us to control when the operation is valid.  In this case,
the madvise will only succeed if the executing thread has PKEY write
permissions and PKRU changes are protected in software by control-flow
integrity.

Although the initial version of this patch series is targeting the Chrome
browser as its first user, it became evident during upstream discussions
that we would also want to ensure that the patch set eventually is a
complete solution for memory sealing and compatible with other use cases. 
The specific scenario currently in mind is glibc's use case of loading and
sealing ELF executables.  To this end, Stephen is working on a change to
glibc to add sealing support to the dynamic linker, which will seal all
non-writable segments at startup.  Once this work is completed, all
applications will be able to automatically benefit from these new
protections.

In closing, I would like to formally acknowledge the valuable
contributions received during the RFC process, which were instrumental in
shaping this patch:

Jann Horn: raising awareness and providing valuable insights on the
  destructive madvise operations.
Liam R. Howlett: perf optimization.
Linus Torvalds: assisting in defining system call signature and scope.
Theo de Raadt: sharing the experiences and insight gained from
  implementing mimmutable() in OpenBSD.

MM perf benchmarks
==================
This patch adds a loop to mprotect/munmap/madvise(DONTNEED) that
checks the VMAs’ sealing flag, so that no partial update can be made
when any segment within the given memory range is sealed.

To measure the performance impact of this loop, two tests were
developed [8].

The first is measuring the time taken for a particular system call,
by using clock_gettime(CLOCK_MONOTONIC). The second is using
PERF_COUNT_HW_REF_CPU_CYCLES (exclude user space). Both tests have
similar results.

The tests roughly follow the sequence below:
for (i = 0; i < 1000; i++)
    create 1000 mappings (1 page per VMA)
    start the sampling
    for (j = 0; j < 1000; j++)
        mprotect one mapping
    stop and save the sample
    delete 1000 mappings
calculate the results from all samples.
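
The inner part of that sequence can be sketched in C as follows. This
is not the benchmark from [8]; time_mprotect_ns() is a hypothetical
helper that takes a single sample. It keeps each page in its own VMA by
leaving an inaccessible page between neighbours, which is an assumption
about how "1 page per VMA" might be arranged.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Nanoseconds elapsed between two CLOCK_MONOTONIC readings. */
static int64_t elapsed_ns(const struct timespec *start,
			  const struct timespec *end)
{
	return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL +
	       (end->tv_nsec - start->tv_nsec);
}

/*
 * Time nr_vmas single-page mprotect() calls (one sample).
 * Returns the elapsed nanoseconds, or -1 on failure.
 */
int64_t time_mprotect_ns(size_t nr_vmas)
{
	struct timespec start, end;
	size_t page, len, i;
	int64_t ns = -1;
	char *base;

	assert(nr_vmas > 0);
	page = (size_t)sysconf(_SC_PAGESIZE);
	len = 2 * nr_vmas * page;

	/* Reserve 2 * nr_vmas pages with no access. */
	base = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	/* Setup: make every other page readable so each gets its own VMA. */
	for (i = 0; i < nr_vmas; i++) {
		if (mprotect(base + 2 * i * page, page, PROT_READ) != 0)
			goto out;
	}

	/* Sampled region: one mprotect() per single-page VMA. */
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < nr_vmas; i++) {
		if (mprotect(base + 2 * i * page, page,
			     PROT_READ | PROT_WRITE) != 0)
			goto out;
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	ns = elapsed_ns(&start, &end);
out:
	munmap(base, len);
	return ns;
}
```

A caller would repeat this many times and average the samples, as the
outer loop in the sequence does.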

The tests below were performed on an Intel(R) Pentium(R) Gold 7505 @ 2.00GHz,
4G memory, Chromebook.

Based on the latest upstream code:
The first test (measuring time)
syscall__	vmas	t	t_mseal	delta_ns	per_vma	%
munmap__  	1	909	944	35	35	104%
munmap__  	2	1398	1502	104	52	107%
munmap__  	4	2444	2594	149	37	106%
munmap__  	8	4029	4323	293	37	107%
munmap__  	16	6647	6935	288	18	104%
munmap__  	32	11811	12398	587	18	105%
mprotect	1	439	465	26	26	106%
mprotect	2	1659	1745	86	43	105%
mprotect	4	3747	3889	142	36	104%
mprotect	8	6755	6969	215	27	103%
mprotect	16	13748	14144	396	25	103%
mprotect	32	27827	28969	1142	36	104%
madvise_	1	240	262	22	22	109%
madvise_	2	366	442	76	38	121%
madvise_	4	623	751	128	32	121%
madvise_	8	1110	1324	215	27	119%
madvise_	16	2127	2451	324	20	115%
madvise_	32	4109	4642	534	17	113%

The second test (measuring cpu cycle)
syscall__	vmas	cpu	cmseal	delta_cpu	per_vma	%
munmap__	1	1790	1890	100	100	106%
munmap__	2	2819	3033	214	107	108%
munmap__	4	4959	5271	312	78	106%
munmap__	8	8262	8745	483	60	106%
munmap__	16	13099	14116	1017	64	108%
munmap__	32	23221	24785	1565	49	107%
mprotect	1	906	967	62	62	107%
mprotect	2	3019	3203	184	92	106%
mprotect	4	6149	6569	420	105	107%
mprotect	8	9978	10524	545	68	105%
mprotect	16	20448	21427	979	61	105%
mprotect	32	40972	42935	1963	61	105%
madvise_	1	434	497	63	63	115%
madvise_	2	752	899	147	74	120%
madvise_	4	1313	1513	200	50	115%
madvise_	8	2271	2627	356	44	116%
madvise_	16	4312	4883	571	36	113%
madvise_	32	8376	9319	943	29	111%

Based on the results, for the 6.8 kernel, the sealing check adds
20-40 nanoseconds, or around 50-100 CPU cycles, per VMA.
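
The derived columns in the tables can be recomputed from the two
measured ones. The sketch below assumes delta is the sealed measurement
minus the baseline, per_vma is delta divided by the VMA count, and % is
the sealed measurement relative to the baseline, each rounded to the
nearest integer. The struct and function names are hypothetical.

```c
#include <assert.h>

struct seal_overhead {
	long delta;	/* sealed measurement minus baseline */
	long per_vma;	/* delta / vmas, rounded to nearest */
	long percent;	/* sealed as a percentage of baseline, rounded */
};

/* Integer division rounded to nearest; den must be positive. */
static long div_round(long num, long den)
{
	if (num < 0)
		return -((-num + den / 2) / den);
	return (num + den / 2) / den;
}

struct seal_overhead seal_overhead(long vmas, long base, long sealed)
{
	struct seal_overhead o;

	assert(vmas > 0 && base > 0);
	o.delta = sealed - base;
	o.per_vma = div_round(o.delta, vmas);
	o.percent = div_round(100 * sealed, base);
	return o;
}
```

For the munmap row with 2 VMAs in the first 6.8 table (1398 and 1502),
this yields a delta of 104, 52 per VMA, and 107%.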

In addition, I applied the sealing to the 5.10 kernel:
The first test (measuring time)
syscall__	vmas	t	tmseal	delta_ns	per_vma	%
munmap__	1	357	390	33	33	109%
munmap__	2	442	463	21	11	105%
munmap__	4	614	634	20	5	103%
munmap__	8	1017	1137	120	15	112%
munmap__	16	1889	2153	263	16	114%
munmap__	32	4109	4088	-21	-1	99%
mprotect	1	235	227	-7	-7	97%
mprotect	2	495	464	-30	-15	94%
mprotect	4	741	764	24	6	103%
mprotect	8	1434	1437	2	0	100%
mprotect	16	2958	2991	33	2	101%
mprotect	32	6431	6608	177	6	103%
madvise_	1	191	208	16	16	109%
madvise_	2	300	324	24	12	108%
madvise_	4	450	473	23	6	105%
madvise_	8	753	806	53	7	107%
madvise_	16	1467	1592	125	8	108%
madvise_	32	2795	3405	610	19	122%
					
The second test (measuring cpu cycle)
syscall__	nbr_vma	cpu	cmseal	delta_cpu	per_vma	%
munmap__	1	684	715	31	31	105%
munmap__	2	861	898	38	19	104%
munmap__	4	1183	1235	51	13	104%
munmap__	8	1999	2045	46	6	102%
munmap__	16	3839	3816	-23	-1	99%
munmap__	32	7672	7887	216	7	103%
mprotect	1	397	443	46	46	112%
mprotect	2	738	788	50	25	107%
mprotect	4	1221	1256	35	9	103%
mprotect	8	2356	2429	72	9	103%
mprotect	16	4961	4935	-26	-2	99%
mprotect	32	9882	10172	291	9	103%
madvise_	1	351	380	29	29	108%
madvise_	2	565	615	49	25	109%
madvise_	4	872	933	61	15	107%
madvise_	8	1508	1640	132	16	109%
madvise_	16	3078	3323	245	15	108%
madvise_	32	5893	6704	811	25	114%

For the 5.10 kernel, the sealing check adds 0-15 ns in time, or 10-30
CPU cycles, per VMA; in some cases the measured cost even decreases.

It might be interesting to compare the 5.10 and 6.8 kernels:
The first test (measuring time)
syscall__	vmas	t_5_10	t_6_8	delta_ns	per_vma	%
munmap__	1	357	909	552	552	254%
munmap__	2	442	1398	956	478	316%
munmap__	4	614	2444	1830	458	398%
munmap__	8	1017	4029	3012	377	396%
munmap__	16	1889	6647	4758	297	352%
munmap__	32	4109	11811	7702	241	287%
mprotect	1	235	439	204	204	187%
mprotect	2	495	1659	1164	582	335%
mprotect	4	741	3747	3006	752	506%
mprotect	8	1434	6755	5320	665	471%
mprotect	16	2958	13748	10790	674	465%
mprotect	32	6431	27827	21397	669	433%
madvise_	1	191	240	49	49	125%
madvise_	2	300	366	67	33	122%
madvise_	4	450	623	173	43	138%
madvise_	8	753	1110	357	45	147%
madvise_	16	1467	2127	660	41	145%
madvise_	32	2795	4109	1314	41	147%

The second test (measuring CPU cycles)
syscall__	vmas	cpu_5_10	cpu_6_8	delta_cpu	per_vma	%
munmap__	1	684	1790	1106	1106	262%
munmap__	2	861	2819	1958	979	327%
munmap__	4	1183	4959	3776	944	419%
munmap__	8	1999	8262	6263	783	413%
munmap__	16	3839	13099	9260	579	341%
munmap__	32	7672	23221	15549	486	303%
mprotect	1	397	906	509	509	228%
mprotect	2	738	3019	2281	1140	409%
mprotect	4	1221	6149	4929	1232	504%
mprotect	8	2356	9978	7622	953	423%
mprotect	16	4961	20448	15487	968	412%
mprotect	32	9882	40972	31091	972	415%
madvise_	1	351	434	82	82	123%
madvise_	2	565	752	186	93	133%
madvise_	4	872	1313	442	110	151%
madvise_	8	1508	2271	763	95	151%
madvise_	16	3078	4312	1234	77	140%
madvise_	32	5893	8376	2483	78	142%

From 5.10 to 6.8:
munmap: added 250-550 ns in time, or 500-1100 CPU cycles, per VMA.
mprotect: added 200-750 ns in time, or 500-1200 CPU cycles, per VMA.
madvise: added 33-50 ns in time, or 70-110 CPU cycles, per VMA.

In comparison to mseal, which adds 20-40 ns or 50-100 CPU cycles, the
increase from 5.10 to 6.8 is significantly larger, approximately ten times
greater for munmap and mprotect.
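
The "approximately ten times" comparison can be checked with a small sketch
that recomputes the per-VMA increase from the 5.10 and 6.8 time columns above
and sets it against the 20-40 ns quoted for mseal.  Comparing the midpoints of
the two ranges is an assumption made for this example (the text does not say
how the factor was derived), and the function names are hypothetical.

```python
# Per-VMA sealing cost quoted in the text (ns); taken as given, not re-measured.
MSEAL_NS = (20, 40)

# (vmas, t_5_10, t_6_8) rows copied from the time table above, in ns.
MUNMAP_NS = [(1, 357, 909), (2, 442, 1398), (4, 614, 2444),
             (8, 1017, 4029), (16, 1889, 6647), (32, 4109, 11811)]
MPROTECT_NS = [(1, 235, 439), (2, 495, 1659), (4, 741, 3747),
               (8, 1434, 6755), (16, 2958, 13748), (32, 6431, 27827)]


def per_vma_range(rows):
    """Return (min, max) of the per-VMA increase from 5.10 to 6.8."""
    deltas = [(new - old) / vmas for vmas, old, new in rows]
    return min(deltas), max(deltas)


def midpoint_ratio(rows, reference=MSEAL_NS):
    """Midpoint of the per-VMA range divided by the midpoint of reference."""
    lo, hi = per_vma_range(rows)
    return ((lo + hi) / 2) / (sum(reference) / 2)


for name, rows in (("munmap", MUNMAP_NS), ("mprotect", MPROTECT_NS)):
    lo, hi = per_vma_range(rows)
    print(f"{name}: {lo:.0f}-{hi:.0f} ns per VMA, "
          f"about {midpoint_ratio(rows):.0f}x the mseal midpoint")
```

Under this midpoint assumption the factor comes out at roughly 13 for munmap
and 16 for mprotect, the same order of magnitude as the estimate above.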

When I discussed mm performance with Brian Makin, an engineer who has
worked on performance, he pointed out that such performance benchmarks,
which measure millions of mm syscalls in a tight loop, may not accurately
reflect real-world scenarios, such as that of a database service.  Also,
this was tested on a single hardware platform running ChromeOS; the data
from other hardware or distributions might be different.  It might be best
to take this data with a grain of salt.


This patch (of 5):

Wire up mseal syscall for all architectures.

Link: https://lkml.kernel.org/r/20240415163527.626541-1-jeffxu@chromium.org
Link: https://lkml.kernel.org/r/20240415163527.626541-2-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com> [Bug #2]
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-23 19:40:26 -07:00
Linus Torvalds
6d69b6c12f Merge tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
 "Stable fixes:
   - nfs: fix undefined behavior in nfs_block_bits()
   - NFSv4.2: Fix READ_PLUS when server doesn't support OP_READ_PLUS

  Bugfixes:
   - Fix mixing of the lock/nolock and local_lock mount options
   - NFSv4: Fixup smatch warning for ambiguous return
   - NFSv3: Fix remount when using the legacy binary mount api
   - SUNRPC: Fix the handling of expired RPCSEC_GSS contexts
   - SUNRPC: fix the NFSACL RPC retries when soft mounts are enabled
   - rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL

  Features and cleanups:
   - NFSv3: Use the atomic_open API to fix open(O_CREAT|O_TRUNC)
   - pNFS/filelayout: Specify the layout segment range in LAYOUTGET
   - pNFS: rework pnfs_generic_pg_check_layout to check IO range
   - NFSv2: Turn off enabling of NFS v2 by default"

* tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: fix undefined behavior in nfs_block_bits()
  pNFS: rework pnfs_generic_pg_check_layout to check IO range
  pNFS/filelayout: check layout segment range
  pNFS/filelayout: fixup pNfs allocation modes
  rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
  NFS: Don't enable NFS v2 by default
  NFS: Fix READ_PLUS when server doesn't support OP_READ_PLUS
  sunrpc: fix NFSACL RPC retry on soft mount
  SUNRPC: fix handling expired GSS context
  nfs: keep server info for remounts
  NFSv4: Fixup smatch warning for ambiguous return
  NFS: make sure lock/nolock overriding local_lock mount option
  NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly.
  pNFS/filelayout: Specify the layout segment range in LAYOUTGET
  pNFS/filelayout: Remove the whole file layout requirement
2024-05-23 13:51:09 -07:00
Linus Torvalds
b4d88a60fe Merge tag 'block-6.10-20240523' of git://git.kernel.dk/linux
Pull more block updates from Jens Axboe:
 "Followup block updates, mostly due to NVMe being a bit late to the
  party. But nothing major in there, so not a big deal.

  In detail, this contains:

   - NVMe pull request via Keith:
       - Fabrics connection retries (Daniel, Hannes)
       - Fabrics logging enhancements (Tokunori)
       - RDMA delete optimization (Sagi)

   - ublk DMA alignment fix (me)

   - null_blk sparse warning fixes (Bart)

   - Discard support for brd (Keith)

   - blk-cgroup list corruption fixes (Ming)

   - blk-cgroup stat propagation fix (Waiman)

   - Regression fix for plugging stall with md (Yu)

   - Misc fixes or cleanups (David, Jeff, Justin)"

* tag 'block-6.10-20240523' of git://git.kernel.dk/linux: (24 commits)
  null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues'
  blk-throttle: remove unused struct 'avg_latency_bucket'
  block: fix lost bio for plug enabled bio based device
  block: t10-pi: add MODULE_DESCRIPTION()
  blk-mq: add helper for checking if one CPU is mapped to specified hctx
  blk-cgroup: Properly propagate the iostat update up the hierarchy
  blk-cgroup: fix list corruption from reorder of WRITE ->lqueued
  blk-cgroup: fix list corruption from resetting io stat
  cdrom: rearrange last_media_change check to avoid unintentional overflow
  nbd: Fix signal handling
  nbd: Remove a local variable from nbd_send_cmd()
  nbd: Improve the documentation of the locking assumptions
  nbd: Remove superfluous casts
  nbd: Use NULL to represent a pointer
  brd: implement discard support
  null_blk: Fix two sparse warnings
  ublk_drv: set DMA alignment mask to 3
  nvme-rdma, nvme-tcp: include max reconnects for reconnect logging
  nvmet-rdma: Avoid o(n^2) loop in delete_ctrl
  nvme: do not retry authentication failures
  ...
2024-05-23 13:44:47 -07:00
Linus Torvalds
c2c80ecdb4 Merge tag 'regulator-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
 "A bunch of fixes that came in during the merge window.

  Matti found several issues with some of the more complexly configured
  Rohm regulators and the helpers they use and there were some errors in
  the specification of tps6594 when regulators are grouped together"

* tag 'regulator-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: tps6594-regulator: Correct multi-phase configuration
  regulator: tps6287x: Force writing VSEL bit
  regulator: pickable ranges: don't always cache vsel
  regulator: rohm-regulator: warn if unsupported voltage is set
  regulator: bd71828: Don't overwrite runtime voltages
2024-05-23 13:39:42 -07:00
Linus Torvalds
d6a326d694 Merge tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing cleanup from Steven Rostedt:
 "Remove second argument of __assign_str()

  The __assign_str() macro logic of the TRACE_EVENT() macro was
  optimized so that it no longer needs the second argument. The
  __assign_str() is always matched with __string() field that takes a
  field name and the source for that field:

    __string(field, source)

  The TRACE_EVENT() macro logic will save off the source value and then
  use that value to copy into the ring buffer via the __assign_str().

  Before commit c1fa617cae ("tracing: Rework __assign_str() and
  __string() to not duplicate getting the string"), the __assign_str()
  needed the second argument which would perform the same logic as the
  __string() source parameter did. Not only did this add overhead, it
  was also error prone: if the __assign_str() source produced something
  different, there might not have been enough space allocated for the
  string in the ring buffer (as the __string() source was used to
  determine how much to allocate).

  Now that the __assign_str() just uses the same string that was used in
  __string() it no longer needs the source parameter. It can now be
  removed"

* tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/treewide: Remove second parameter of __assign_str()
2024-05-23 12:28:01 -07:00
Linus Torvalds
2ef32ad224 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
 "Several new features here:

   - virtio-net is finally supported in vduse

   - virtio (balloon and mem) interaction with suspend is improved

   - vhost-scsi now handles signals better/faster

  And fixes, cleanups all over the place"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits)
  virtio-pci: Check if is_avq is NULL
  virtio: delete vq in vp_find_vqs_msix() when request_irq() fails
  MAINTAINERS: add Eugenio Pérez as reviewer
  vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API
  vp_vdpa: don't allocate unused msix vectors
  sound: virtio: drop owner assignment
  fuse: virtio: drop owner assignment
  scsi: virtio: drop owner assignment
  rpmsg: virtio: drop owner assignment
  nvdimm: virtio_pmem: drop owner assignment
  wifi: mac80211_hwsim: drop owner assignment
  vsock/virtio: drop owner assignment
  net: 9p: virtio: drop owner assignment
  net: virtio: drop owner assignment
  net: caif: virtio: drop owner assignment
  misc: nsm: drop owner assignment
  iommu: virtio: drop owner assignment
  drm/virtio: drop owner assignment
  gpio: virtio: drop owner assignment
  firmware: arm_scmi: virtio: drop owner assignment
  ...
2024-05-23 12:04:36 -07:00
Takashi Iwai
d001e978c1 Merge tag 'asoc-fix-v6.10-merge-window' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.10

A bunch of fixes that came in during the merge window, all driver
specific and none of them especially remarkable.
2024-05-23 13:29:27 +02:00
Dave Airlie
32a0bb7ef2 Merge tag 'drm-misc-next-fixes-2024-05-23' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next-fixes for v6.10-rc1:
- MST null deref fix.
- Don't let next bridge create connector in adv7511 to make probe work.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/f171b14a-ed6b-4124-893b-802a336dbe2b@linux.intel.com
2024-05-23 16:01:39 +10:00
Linus Torvalds
c760b3725e Merge tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more non-mm updates from Andrew Morton:

 - A series ("kbuild: enable more warnings by default") from Arnd
   Bergmann which enables a number of additional build-time warnings. We
   fixed all the fallout we could find, but there may still be a few
   stragglers.

 - Samuel Holland has developed the series "Unified cross-architecture
   kernel-mode FPU API". This does a lot of consolidation of
   per-architecture kernel-mode FPU usage and enables the use of newer
   AMD GPUs on RISC-V.

 - Tao Su has fixed some selftests build warnings in the series
   "Selftests: Fix compilation warnings due to missing _GNU_SOURCE
   definition".

 - This pull also includes a nilfs2 fixup from Ryusuke Konishi.

* tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
  nilfs2: make block erasure safe in nilfs_finish_roll_forward()
  selftests/harness: use 1024 in place of LINE_MAX
  Revert "selftests/harness: remove use of LINE_MAX"
  selftests/fpu: allow building on other architectures
  selftests/fpu: move FP code to a separate translation unit
  drm/amd/display: use ARCH_HAS_KERNEL_FPU_SUPPORT
  drm/amd/display: only use hard-float, not altivec on powerpc
  riscv: add support for kernel-mode FPU
  x86: implement ARCH_HAS_KERNEL_FPU_SUPPORT
  powerpc: implement ARCH_HAS_KERNEL_FPU_SUPPORT
  LoongArch: implement ARCH_HAS_KERNEL_FPU_SUPPORT
  lib/raid6: use CC_FLAGS_FPU for NEON CFLAGS
  arm64: crypto: use CC_FLAGS_FPU for NEON CFLAGS
  arm64: implement ARCH_HAS_KERNEL_FPU_SUPPORT
  ARM: crypto: use CC_FLAGS_FPU for NEON CFLAGS
  ARM: implement ARCH_HAS_KERNEL_FPU_SUPPORT
  arch: add ARCH_HAS_KERNEL_FPU_SUPPORT
  x86/fpu: fix asm/fpu/types.h include guard
  kbuild: enable -Wcast-function-type-strict unconditionally
  kbuild: enable -Wformat-truncation on clang
  ...
2024-05-22 18:59:29 -07:00
Linus Torvalds
5c6f4d68e2 Merge tag 'mm-stable-2024-05-22-17-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more mm updates from Andrew Morton:
 "A series from Dave Chinner which cleans up and fixes the handling of
  nested allocations within stackdepot and page-owner"

* tag 'mm-stable-2024-05-22-17-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/page-owner: use gfp_nested_mask() instead of open coded masking
  stackdepot: use gfp_nested_mask() instead of open coded masking
  mm: lift gfp_kmemleak_mask() to gfp.h
2024-05-22 17:32:04 -07:00