Commit Graph

1324233 Commits

Author SHA1 Message Date
Danilo Krummrich
ea7e18289f rust: implement generic driver registration
Implement the generic `Registration` type and the `RegistrationOps`
trait.

The `Registration` structure is the common type that represents a driver
registration and is typically bound to the lifetime of a module. However,
it doesn't implement actual calls to the kernel's driver core to register
drivers itself.

Instead the `RegistrationOps` trait is provided to subsystems, which have
to implement `RegistrationOps::register` and
`RegistrationOps::unregister`. Subsystems have to provide an
implementation for both of those methods where the subsystem specific
variants to register / unregister a driver have to implemented.

For instance, the PCI subsystem would call __pci_register_driver() from
`RegistrationOps::register` and pci_unregister_driver() from
`DrvierOps::unregister`.

Co-developed-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Tested-by: Dirk Behme <dirk.behme@de.bosch.com>
Tested-by: Fabien Parent <fabien.parent@linaro.org>
Link: https://lore.kernel.org/r/20241219170425.12036-3-dakr@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-20 17:19:25 +01:00
Danilo Krummrich
a790265c7f rust: module: add trait ModuleMetadata
In order to access static metadata of a Rust kernel module, add the
`ModuleMetadata` trait.

In particular, this trait provides the name of a Rust kernel module as
specified by the `module!` macro.

Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Tested-by: Dirk Behme <dirk.behme@de.bosch.com>
Tested-by: Fabien Parent <fabien.parent@linaro.org>
Link: https://lore.kernel.org/r/20241219170425.12036-2-dakr@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-20 17:19:25 +01:00
Alice Ryhl
5bcc8bfe84 rust: miscdevice: add fops->show_fdinfo() hook
File descriptors should generally provide a fops->show_fdinfo() hook for
debugging purposes. Thus, add such a hook to the miscdevice
abstractions.

Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20241203-miscdevice-showfdinfo-v1-1-7e990732d430@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-16 16:12:57 +01:00
Lee Jones
8d9b095b8f samples: rust_misc_device: Provide an example C program to exercise functionality
Here is the expected output (manually spliced together):

USERSPACE: Opening /dev/rust-misc-device for reading and writing
KERNEL: rust_misc_device: Opening Rust Misc Device Sample
USERSPACE: Calling Hello
KERNEL: rust_misc_device: IOCTLing Rust Misc Device Sample
KERNEL: rust_misc_device: -> Hello from the Rust Misc Device
USERSPACE: Fetching initial value
KERNEL: rust_misc_device: IOCTLing Rust Misc Device Sample
KERNEL: rust_misc_device: -> Copying data to userspace (value: 0)
USERSPACE: Submitting new value (1)
KERNEL: rust_misc_device: IOCTLing Rust Misc Device Sample
KERNEL: rust_misc_device: -> Copying data from userspace (value: 1)
USERSPACE: Fetching new value
KERNEL: rust_misc_device: IOCTLing Rust Misc Device Sample
KERNEL: rust_misc_device: -> Copying data to userspace (value: 1)
USERSPACE: Attempting to call in to an non-existent IOCTL
KERNEL: rust_misc_device: IOCTLing Rust Misc Device Sample
KERNEL: rust_misc_device: -> IOCTL not recognised: 20992
USERSPACE: ioctl: Succeeded to fail - this was expected: Inappropriate ioctl for device
USERSPACE: Closing /dev/rust-misc-device
KERNEL: rust_misc_device: Exiting the Rust Misc Device Sample
USERSPACE: Success

Signed-off-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/r/20241213134715.601415-6-lee@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-16 16:12:56 +01:00
Lee Jones
ff9feb0567 MAINTAINERS: Add Rust Misc Sample to MISC entry
Signed-off-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/r/20241213134715.601415-5-lee@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-16 16:12:55 +01:00
Lee Jones
42523ceba5 samples: rust_misc_device: Demonstrate additional get/set value functionality
Expand the complexity of the sample driver by providing the ability to
get and set an integer.  The value is protected by a mutex.

Signed-off-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/r/20241213134715.601415-4-lee@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-16 16:12:53 +01:00
Lee Jones
fdb1ac6c30 samples: rust: Provide example using the new Rust MiscDevice abstraction
This sample driver demonstrates the following basic operations:

* Register a Misc Device
* Create /dev/rust-misc-device
* Provide open call-back for the aforementioned character device
* Operate on the character device via a simple ioctl()
* Provide close call-back for the character device

Signed-off-by: Lee Jones <lee@kernel.org>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/r/20241213134715.601415-3-lee@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-16 16:12:45 +01:00
Lee Jones
4a9ce18874 Documentation: ioctl-number: Carve out some identifiers for use by sample drivers
32 IDs should be plenty (at yet, not too greedy) since a lot of sample
drivers will use their own subsystem allocations.

Sample drivers predominately reside in <KERNEL_ROOT>/samples, but there
should be no issue with in-place example drivers using them.

Signed-off-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/r/20241213134715.601415-2-lee@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-16 16:12:44 +01:00
Lee Jones
284ae0be4d rust: miscdevice: Provide accessor to pull out miscdevice::this_device
There are situations where a pointer to a `struct device` will become
necessary (e.g. for calling into dev_*() functions).  This accessor
allows callers to pull this out from the `struct miscdevice`.

Signed-off-by: Lee Jones <lee@kernel.org>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Tested-by: Lee Jones <lee@kernel.org>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/r/20241210-miscdevice-file-param-v3-3-b2a79b666dc5@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-16 16:12:34 +01:00
Alice Ryhl
88441d5c6d rust: miscdevice: access the struct miscdevice from fops->open()
Providing access to the underlying `struct miscdevice` is useful for
various reasons. For example, this allows you access the miscdevice's
internal `struct device` for use with the `dev_*` printing macros.

Note that since the underlying `struct miscdevice` could get freed at
any point after the fops->open() call (if misc_deregister is called),
only the open call is given access to it. To use `dev_*` printing macros
from other fops hooks, take a refcount on `miscdevice->this_device` to
keep it alive. See the linked thread for further discussion on the
lifetime of `struct miscdevice`.

Link: https://lore.kernel.org/r/2024120951-botanist-exhale-4845@gregkh
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Lee Jones <lee@kernel.org>
Tested-by: Lee Jones <lee@kernel.org>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/r/20241210-miscdevice-file-param-v3-2-b2a79b666dc5@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-16 16:12:30 +01:00
Alice Ryhl
0d8a7c7bf4 rust: miscdevice: access file in fops
This allows fops to access information about the underlying struct file
for the miscdevice. For example, the Binder driver needs to inspect the
O_NONBLOCK flag inside the fops->ioctl() hook.

Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Lee Jones <lee@kernel.org>
Tested-by: Lee Jones <lee@kernel.org>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/r/20241210-miscdevice-file-param-v3-1-b2a79b666dc5@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-16 16:12:21 +01:00
Linus Torvalds
78d4f34e21 Linux 6.13-rc3 v6.13-rc3 2024-12-15 15:58:23 -08:00
Linus Torvalds
42a19aa170 Merge tag 'arc-6.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:

 - Sundry build and misc fixes

* tag 'arc-6.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: build: Try to guess GCC variant of cross compiler
  ARC: bpf: Correct conditional check in 'check_jmp_32'
  ARC: dts: Replace deprecated snps,nr-gpios property for snps,dw-apb-gpio-port devices
  ARC: build: Use __force to suppress per-CPU cmpxchg warnings
  ARC: fix reference of dependency for PAE40 config
  ARC: build: disallow invalid PAE40 + 4K page config
  arc: rename aux.h to arc_aux.h
2024-12-15 15:38:12 -08:00
Linus Torvalds
7031a38ab7 Merge tag 'efi-fixes-for-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:

 - Limit EFI zboot to GZIP and ZSTD before it comes in wider use

 - Fix inconsistent error when looking up a non-existent file in
   efivarfs with a name that does not adhere to the NAME-GUID format

 - Drop some unused code

* tag 'efi-fixes-for-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi/esrt: remove esre_attribute::store()
  efivarfs: Fix error on non-existent file
  efi/zboot: Limit compression options to GZIP and ZSTD
2024-12-15 15:33:41 -08:00
Linus Torvalds
151167d85a Merge tag 'i2c-for-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "i2c host fixes: PNX used the wrong unit for timeouts, Nomadik was
  missing a sentinel, and RIIC was missing rounding up"

* tag 'i2c-for-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: riic: Always round-up when calculating bus period
  i2c: nomadik: Add missing sentinel to match table
  i2c: pnx: Fix timeout in wait functions
2024-12-15 15:29:07 -08:00
Linus Torvalds
dccbe2047a Merge tag 'edac_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fix from Borislav Petkov:

 - Make sure amd64_edac loads successfully on certain Zen4 memory
   configurations

* tag 'edac_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/amd64: Simplify ECC check on unified memory controllers
2024-12-15 10:01:10 -08:00
Linus Torvalds
f7c7a1ba22 Merge tag 'irq_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:

 - Disable the secure programming interface of the GIC500 chip in the
   RK3399 SoC to fix interrupt priority assignment and even make a dead
   machine boot again when the gic-v3 driver enables pseudo NMIs

 - Correct the declaration of a percpu variable to fix several sparse
   warnings

* tag 'irq_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3: Work around insecure GIC integrations
  irqchip/gic: Correct declaration of *percpu_base pointer in union gic_base
2024-12-15 09:58:27 -08:00
Linus Torvalds
acd855a949 Merge tag 'sched_urgent_for_v6.13_rc3-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:

 - Prevent incorrect dequeueing of the deadline dlserver helper task and
   fix its time accounting

 - Properly track the CFS runqueue runnable stats

 - Check the total number of all queued tasks in a sched fair's runqueue
   hierarchy before deciding to stop the tick

 - Fix the scheduling of the task that got woken last (NEXT_BUDDY) by
   preventing those from being delayed

* tag 'sched_urgent_for_v6.13_rc3-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/dlserver: Fix dlserver time accounting
  sched/dlserver: Fix dlserver double enqueue
  sched/eevdf: More PELT vs DELAYED_DEQUEUE
  sched/fair: Fix sched_can_stop_tick() for fair tasks
  sched/fair: Fix NEXT_BUDDY
2024-12-15 09:38:03 -08:00
Linus Torvalds
81576a9a27 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "ARM64:

   - Fix confusion with implicitly-shifted MDCR_EL2 masks breaking
     SPE/TRBE initialization

   - Align nested page table walker with the intended memory attribute
     combining rules of the architecture

   - Prevent userspace from constraining the advertised ASID width,
     avoiding horrors of guest TLBIs not matching the intended context
     in hardware

   - Don't leak references on LPIs when insertion into the translation
     cache fails

  RISC-V:

   - Replace csr_write() with csr_set() for HVIEN PMU overflow bit

  x86:

   - Cache CPUID.0xD XSTATE offsets+sizes during module init

     On Intel's Emerald Rapids CPUID costs hundreds of cycles and there
     are a lot of leaves under 0xD. Getting rid of the CPUIDs during
     nested VM-Enter and VM-Exit is planned for the next release, for
     now just cache them: even on Skylake that is 40% faster"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Cache CPUID.0xD XSTATE offsets+sizes during module init
  RISC-V: KVM: Fix csr_write -> csr_set for HVIEN PMU overflow bit
  KVM: arm64: vgic-its: Add error handling in vgic_its_cache_translation
  KVM: arm64: Do not allow ID_AA64MMFR0_EL1.ASIDbits to be overridden
  KVM: arm64: Fix S1/S2 combination when FWB==1 and S2 has Device memory type
  arm64: Fix usage of new shifted MDCR_EL2 values
2024-12-15 09:26:13 -08:00
Linus Torvalds
2d8308bf5b Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
 "Single one-line fix in the ufs driver"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: core: Update compl_time_stamp_local_clock after completing a cqe
2024-12-14 15:53:02 -08:00
Linus Torvalds
35f301dd45 Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Daniel Borkmann:

 - Fix a bug in the BPF verifier to track changes to packet data
   property for global functions (Eduard Zingerman)

 - Fix a theoretical BPF prog_array use-after-free in RCU handling of
   __uprobe_perf_func (Jann Horn)

 - Fix BPF tracing to have an explicit list of tracepoints and their
   arguments which need to be annotated as PTR_MAYBE_NULL (Kumar
   Kartikeya Dwivedi)

 - Fix a logic bug in the bpf_remove_insns code where a potential error
   would have been wrongly propagated (Anton Protopopov)

 - Avoid deadlock scenarios caused by nested kprobe and fentry BPF
   programs (Priya Bala Govindasamy)

 - Fix a bug in BPF verifier which was missing a size check for
   BTF-based context access (Kumar Kartikeya Dwivedi)

 - Fix a crash found by syzbot through an invalid BPF prog_array access
   in perf_event_detach_bpf_prog (Jiri Olsa)

 - Fix several BPF sockmap bugs including a race causing a refcount
   imbalance upon element replace (Michal Luczaj)

 - Fix a use-after-free from mismatching BPF program/attachment RCU
   flavors (Jann Horn)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (23 commits)
  bpf: Avoid deadlock caused by nested kprobe and fentry bpf programs
  selftests/bpf: Add tests for raw_tp NULL args
  bpf: Augment raw_tp arguments with PTR_MAYBE_NULL
  bpf: Revert "bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"
  selftests/bpf: Add test for narrow ctx load for pointer args
  bpf: Check size for BTF-based ctx access of pointer members
  selftests/bpf: extend changes_pkt_data with cases w/o subprograms
  bpf: fix null dereference when computing changes_pkt_data of prog w/o subprogs
  bpf: Fix theoretical prog_array UAF in __uprobe_perf_func()
  bpf: fix potential error return
  selftests/bpf: validate that tail call invalidates packet pointers
  bpf: consider that tail calls invalidate packet pointers
  selftests/bpf: freplace tests for tracking of changes_packet_data
  bpf: check changes_pkt_data property for extension programs
  selftests/bpf: test for changing packet data from global functions
  bpf: track changes_pkt_data property for global functions
  bpf: refactor bpf_helper_changes_pkt_data to use helper number
  bpf: add find_containing_subprog() utility function
  bpf,perf: Fix invalid prog_array access in perf_event_detach_bpf_prog
  bpf: Fix UAF via mismatching bpf_prog/attachment RCU flavors
  ...
2024-12-14 12:58:14 -08:00
Priya Bala Govindasamy
c83508da56 bpf: Avoid deadlock caused by nested kprobe and fentry bpf programs
BPF program types like kprobe and fentry can cause deadlocks in certain
situations. If a function takes a lock and one of these bpf programs is
hooked to some point in the function's critical section, and if the
bpf program tries to call the same function and take the same lock it will
lead to deadlock. These situations have been reported in the following
bug reports.

In percpu_freelist -
Link: https://lore.kernel.org/bpf/CAADnVQLAHwsa+2C6j9+UC6ScrDaN9Fjqv1WjB1pP9AzJLhKuLQ@mail.gmail.com/T/
Link: https://lore.kernel.org/bpf/CAPPBnEYm+9zduStsZaDnq93q1jPLqO-PiKX9jy0MuL8LCXmCrQ@mail.gmail.com/T/
In bpf_lru_list -
Link: https://lore.kernel.org/bpf/CAPPBnEajj+DMfiR_WRWU5=6A7KKULdB5Rob_NJopFLWF+i9gCA@mail.gmail.com/T/
Link: https://lore.kernel.org/bpf/CAPPBnEZQDVN6VqnQXvVqGoB+ukOtHGZ9b9U0OLJJYvRoSsMY_g@mail.gmail.com/T/
Link: https://lore.kernel.org/bpf/CAPPBnEaCB1rFAYU7Wf8UxqcqOWKmRPU1Nuzk3_oLk6qXR7LBOA@mail.gmail.com/T/

Similar bugs have been reported by syzbot.
In queue_stack_maps -
Link: https://lore.kernel.org/lkml/0000000000004c3fc90615f37756@google.com/
Link: https://lore.kernel.org/all/20240418230932.2689-1-hdanton@sina.com/T/
In lpm_trie -
Link: https://lore.kernel.org/linux-kernel/00000000000035168a061a47fa38@google.com/T/
In ringbuf -
Link: https://lore.kernel.org/bpf/20240313121345.2292-1-hdanton@sina.com/T/

Prevent kprobe and fentry bpf programs from attaching to these critical
sections by removing CC_FLAGS_FTRACE for percpu_freelist.o,
bpf_lru_list.o, queue_stack_maps.o, lpm_trie.o, ringbuf.o files.

The bugs reported by syzbot are due to tracepoint bpf programs being
called in the critical sections. This patch does not aim to fix deadlocks
caused by tracepoint programs. However, it does prevent deadlocks from
occurring in similar situations due to kprobe and fentry programs.

Signed-off-by: Priya Bala Govindasamy <pgovind2@uci.edu>
Link: https://lore.kernel.org/r/CAPPBnEZpjGnsuA26Mf9kYibSaGLm=oF6=12L21X1GEQdqjLnzQ@mail.gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-14 09:49:27 -08:00
Linus Torvalds
a0e3919a2d Merge tag 'usb-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
 "Here are some small USB driver fixes for some reported issues.
  Included in here are:

   - typec driver bugfixes

   - u_serial gadget driver bugfix for much reported and discussed issue

   - dwc2 bugfixes

   - midi gadget driver bugfix

   - ehci-hcd driver bugfix

   - other small bugfixes

  All of these have been in linux-next for over a week with no reported
  issues"

* tag 'usb-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: ucsi: Fix connector status writing past buffer size
  usb: typec: ucsi: Fix completion notifications
  usb: dwc2: Fix HCD port connection race
  usb: dwc2: hcd: Fix GetPortStatus & SetPortFeature
  usb: dwc2: Fix HCD resume
  usb: gadget: u_serial: Fix the issue that gs_start_io crashed due to accessing null pointer
  usb: misc: onboard_usb_dev: skip suspend/resume sequence for USB5744 SMBus support
  usb: dwc3: xilinx: make sure pipe clock is deselected in usb2 only mode
  usb: core: hcd: only check primary hcd skip_phy_initialization
  usb: gadget: midi2: Fix interpretation of is_midi1 bits
  usb: dwc3: imx8mp: fix software node kernel dump
  usb: typec: anx7411: fix OF node reference leaks in anx7411_typec_switch_probe()
  usb: typec: anx7411: fix fwnode_handle reference leak
  usb: host: max3421-hcd: Correctly abort a USB request.
  dt-bindings: phy: imx8mq-usb: correct reference to usb-switch.yaml
  usb: ehci-hcd: fix call balance of clocks handling routines
2024-12-14 09:35:22 -08:00
Linus Torvalds
636110be62 Merge tag 'tty-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull serial driver fixes from Greg KH:
 "Here are two small serial driver fixes for 6.13-rc3.  They are:

   - ioport build fallout fix for the 8250 port driver that should
     resolve Guenter's runtime problems

   - sh-sci driver bugfix for a reported problem

  Both of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: serial: Work around warning backtrace in serial8250_set_defaults
  serial: sh-sci: Check if TX data was written to device in .tx_empty()
2024-12-14 09:31:19 -08:00
Linus Torvalds
3de4f6d919 Merge tag 'staging-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
 "Here are some small staging gpib driver build and bugfixes for issues
  that have been much-reported (should finally fix Guenter's build
  issues). There are more of these coming in later -rc releases, but for
  now this should fix the majority of the reported problems.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'staging-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: gpib: Fix i386 build issue
  staging: gpib: Fix faulty workaround for assignment in if
  staging: gpib: Workaround for ppc build failure
  staging: gpib: Make GPIB_NI_PCI_ISA depend on HAS_IOPORT
2024-12-14 09:27:07 -08:00
Linus Torvalds
ec2092915d Merge tag 'v6.13-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "Fix a regression in rsassa-pkcs1 as well as a buffer overrun in
  hisilicon/debugfs"

* tag 'v6.13-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: hisilicon/debugfs - fix the struct pointer incorrectly offset problem
  crypto: rsassa-pkcs1 - Copy source data for SG list
2024-12-14 09:15:49 -08:00
Linus Torvalds
7824850768 Merge tag 'rust-fixes-6.13' of https://github.com/Rust-for-Linux/linux
Pull rust fixes from Miguel Ojeda:
 "Toolchain and infrastructure:

   - Set bindgen's Rust target version to prevent issues when
     pairing older rustc releases with newer bindgen releases,
     such as bindgen >= 0.71.0 and rustc < 1.82 due to
     unsafe_extern_blocks.

  drm/panic:

   - Remove spurious empty line detected by a new Clippy warning"

* tag 'rust-fixes-6.13' of https://github.com/Rust-for-Linux/linux:
  rust: kbuild: set `bindgen`'s Rust target version
  drm/panic: remove spurious empty line to clean warning
2024-12-14 08:51:43 -08:00
Linus Torvalds
115c0cc251 Merge tag 'iommu-fixes-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:

 - Per-domain device-list locking fixes for the AMD IOMMU driver

 - Fix incorrect use of smp_processor_id() in the NVidia-specific part
   of the ARM-SMMU-v3 driver

 - Intel IOMMU driver fixes:
     - Remove cache tags before disabling ATS
     - Avoid draining PRQ in sva mm release path
     - Fix qi_batch NULL pointer with nested parent domain

* tag 'iommu-fixes-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommu/vt-d: Avoid draining PRQ in sva mm release path
  iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
  iommu/vt-d: Remove cache tags before disabling ATS
  iommu/amd: Add lockdep asserts for domain->dev_list
  iommu/amd: Put list_add/del(dev_data) back under the domain->lock
  iommu/tegra241-cmdqv: do not use smp_processor_id in preemptible context
2024-12-14 08:41:40 -08:00
Linus Torvalds
5d97859e17 Merge tag 'ata-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fix from Damien Le Moal:

 - Fix an OF node reference leak in the sata_highbank driver

* tag 'ata-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: sata_highbank: fix OF node reference leak in highbank_initialize_phys()
2024-12-14 08:40:05 -08:00
Wolfram Sang
5b6b08af1f Merge tag 'i2c-host-fixes-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
i2c-host-fixes for v6.13-rc3

- Replaced jiffies with msec for timeout calculations.
- Added a sentinel to the 'of_device_id' array in Nomadik.
- Rounded up bus period calculation in RIIC.
2024-12-14 10:01:46 +01:00
Linus Torvalds
a446e965a1 Merge tag '6.13-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:

 - fix rmmod leak

 - two minor cleanups

 - fix for unlink/rename with pending i/o

* tag '6.13-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: destroy cfid_put_wq on module exit
  cifs: Use str_yes_no() helper in cifs_ses_add_channel()
  cifs: Fix rmdir failure due to ongoing I/O on deleted file
  smb3: fix compiler warning in reparse code
2024-12-13 17:36:02 -08:00
Linus Torvalds
f86135613a Merge tag 'spi-fix-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
 "A few fairly small fixes for v6.13, the most substatial one being
  disabling STIG mode for Cadence QSPI controllers on Altera SoCFPGA
  platforms since it doesn't work"

* tag 'spi-fix-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spi-cadence-qspi: Disable STIG mode for Altera SoCFPGA.
  spi: rockchip: Fix PM runtime count on no-op cs
  spi: aspeed: Fix an error handling path in aspeed_spi_[read|write]_user()
2024-12-13 17:29:19 -08:00
Linus Torvalds
4e1b4861a2 Merge tag 'regulator-fix-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
 "A couple of additional changes, one ensuring we give AXP717 enough
  time to stabilise after changing voltages which fixes serious
  stability issues on some platforms and another documenting the DT
  support required for the Qualcomm WCN6750"

* tag 'regulator-fix-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: axp20x: AXP717: set ramp_delay
  regulator: dt-bindings: qcom,qca6390-pmu: document wcn6750-pmu
2024-12-13 17:18:28 -08:00
Linus Torvalds
e72da82d5a Merge tag 'drm-fixes-2024-12-14' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
 "This is the weekly fixes pull for drm. Just has i915, xe and amdgpu
  changes in it. Nothing too major in here:

  i915:
   - Don't use indexed register writes needlessly [dsb]
   - Stop using non-posted DSB writes for legacy LUT [color]
   - Fix NULL pointer dereference in capture_engine
   - Fix memory leak by correcting cache object name in error handler

  xe:
   - Fix a KUNIT test error message (Mirsad Todorovac)
   - Fix an invalidation fence PM ref leak (Daniele)
   - Fix a register pool UAF (Lucas)

  amdgpu:
   - ISP hw init fix
   - SR-IOV fixes
   - Fix contiguous VRAM mapping for UVD on older GPUs
   - Fix some regressions due to drm scheduler changes
   - Workload profile fixes
   - Cleaner shader fix

  amdkfd:
   - Fix DMA map direction for migration
   - Fix a potential null pointer dereference
   - Cacheline size fixes
   - Runtime PM fix"

* tag 'drm-fixes-2024-12-14' of https://gitlab.freedesktop.org/drm/kernel:
  drm/xe/reg_sr: Remove register pool
  drm/xe: Call invalidation_fence_fini for PT inval fences in error state
  drm/xe: fix the ERR_PTR() returned on failure to allocate tiny pt
  drm/amdkfd: pause autosuspend when creating pdd
  drm/amdgpu: fix when the cleaner shader is emitted
  drm/amdgpu: Fix ISP HW init issue
  drm/amdkfd: hard-code MALL cacheline size for gfx11, gfx12
  drm/amdkfd: hard-code cacheline size for gfx11
  drm/amdkfd: Dereference null return value
  drm/i915: Fix memory leak by correcting cache object name in error handler
  drm/i915: Fix NULL pointer dereference in capture_engine
  drm/i915/color: Stop using non-posted DSB writes for legacy LUT
  drm/i915/dsb: Don't use indexed register writes needlessly
  drm/amdkfd: Correct the migration DMA map direction
  drm/amd/pm: Set SMU v13.0.7 default workload type
  drm/amd/pm: Initialize power profile mode
  amdgpu/uvd: get ring reference from rq scheduler
  drm/amdgpu: fix UVD contiguous CS mapping problem
  drm/amdgpu: use sjt mec fw on gfx943 for sriov
  Revert "drm/amdgpu: Fix ISP hw init issue"
2024-12-13 16:58:39 -08:00
Linus Torvalds
974acf9974 Merge tag 'pm-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management documentation fix from Rafael Wysocki:
 "Fix a runtime PM documentation mistake that may mislead someone into
  making a coding mistake (Paul Barker)"

* tag 'pm-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Documentation: PM: Clarify pm_runtime_resume_and_get() return value
2024-12-13 16:55:43 -08:00
Linus Torvalds
c810e8df96 Merge tag 'acpi-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
 "These fix two coding mistakes, one in the ACPI resources handling code
  and one in ACPICA:

   - Relocate the addr->info.mem.caching check in acpi_decode_space() to
     only execute it if the resource is of the correct type (Ilpo
     Järvinen)

   - Don't release a context_mutex that was never acquired in
     acpi_remove_address_space_handler() (Daniil Tatianin)"

* tag 'acpi-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPICA: events/evxfregn: don't release the ContextMutex that was never acquired
  ACPI: resource: Fix memory resource type union access
2024-12-13 16:51:56 -08:00
Alexei Starovoitov
a8e1a3ddf7 Merge branch 'explicit-raw_tp-null-arguments'
Kumar Kartikeya Dwivedi says:

====================
Explicit raw_tp NULL arguments

This set reverts the raw_tp masking changes introduced in commit
cb4158ce8e ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL") and
replaces it wwith an explicit list of tracepoints and their arguments
which need to be annotated as PTR_MAYBE_NULL. More context on the
fallout caused by the masking fix and subsequent discussions can be
found in [0].

To remedy this, we implement a solution of explicitly defined tracepoint
and define which args need to be marked NULL or scalar (for IS_ERR
case). The commit logs describes the details of this approach in detail.

We will follow up this solution an approach Eduard is working on to
perform automated analysis of NULL-ness of tracepoint arguments. The
current PoC is available here:

- LLVM branch with the analysis:
  https://github.com/eddyz87/llvm-project/tree/nullness-for-tracepoint-params
- Python script for merging of analysis results:
  https://gist.github.com/eddyz87/e47c164466a60e8d49e6911cff146f47

The idea is to infer a tri-state verdict for each tracepoint parameter:
definitely not null, can be null, unknown (in which case no assumptions
should be made).

Using this information, the verifier in most cases will be able to
precisely determine the state of the tracepoint parameter without any
human effort. At that point, the table maintained manually in this set
can be dropped and replace with this automated analysis tool's result.
This will be kept up to date with each kernel release.

  [0]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com

Changelog:
----------
v2 -> v3:
v2: https://lore.kernel.org/bpf/20241213175127.2084759-1-memxor@gmail.com

 * Address Eduard's nits, add Reviewed-by

v1 -> v2:
v1: https://lore.kernel.org/bpf/20241211020156.18966-1-memxor@gmail.com

 * Address comments from Jiri
   * Mark module tracepoints args NULL by default
   * Add more sunrpc tracepoints
   * Unify scalar or null handling
 * Address comments from Alexei
   * Use bitmask approach suggested in review
   * Unify scalar or null handling
   * Drop most tests that rely on CONFIG options
   * Drop scripts to generate tests
====================

Link: https://patch.msgid.link/20241213221929.3495062-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-13 16:24:54 -08:00
Kumar Kartikeya Dwivedi
0da1955b5b selftests/bpf: Add tests for raw_tp NULL args
Add tests to ensure that arguments are correctly marked based on their
specified positions, and whether they get marked correctly as maybe
null. For modules, all tracepoint parameters should be marked
PTR_MAYBE_NULL by default.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241213221929.3495062-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-13 16:24:53 -08:00
Kumar Kartikeya Dwivedi
838a10bd2e bpf: Augment raw_tp arguments with PTR_MAYBE_NULL
Arguments to a raw tracepoint are tagged as trusted, which carries the
semantics that the pointer will be non-NULL.  However, in certain cases,
a raw tracepoint argument may end up being NULL. More context about this
issue is available in [0].

Thus, there is a discrepancy between the reality, that raw_tp arguments can
actually be NULL, and the verifier's knowledge, that they are never NULL,
causing explicit NULL check branch to be dead code eliminated.

A previous attempt [1], i.e. the second fixed commit, was made to
simulate symbolic execution as if in most accesses, the argument is a
non-NULL raw_tp, except for conditional jumps.  This tried to suppress
branch prediction while preserving compatibility, but surfaced issues
with production programs that were difficult to solve without increasing
verifier complexity. A more complete discussion of issues and fixes is
available at [2].

Fix this by maintaining an explicit list of tracepoints where the
arguments are known to be NULL, and mark the positional arguments as
PTR_MAYBE_NULL. Additionally, capture the tracepoints where arguments
are known to be ERR_PTR, and mark these arguments as scalar values to
prevent potential dereference.

Each hex digit is used to encode NULL-ness (0x1) or ERR_PTR-ness (0x2),
shifted by the zero-indexed argument number x 4. This can be represented
as follows:
1st arg: 0x1
2nd arg: 0x10
3rd arg: 0x100
... and so on (likewise for ERR_PTR case).

In the future, an automated pass will be used to produce such a list, or
insert __nullable annotations automatically for tracepoints. Each
compilation unit will be analyzed and results will be collated to find
whether a tracepoint pointer is definitely not null, maybe null, or an
unknown state where verifier conservatively marks it PTR_MAYBE_NULL.
A proof of concept of this tool from Eduard is available at [3].

Note that in case we don't find a specification in the raw_tp_null_args
array and the tracepoint belongs to a kernel module, we will
conservatively mark the arguments as PTR_MAYBE_NULL. This is because
unlike for in-tree modules, out-of-tree module tracepoints may pass NULL
freely to the tracepoint. We don't protect against such tracepoints
passing ERR_PTR (which is uncommon anyway), lest we mark all such
arguments as SCALAR_VALUE.

While we are it, let's adjust the test raw_tp_null to not perform
dereference of the skb->mark, as that won't be allowed anymore, and make
it more robust by using inline assembly to test the dead code
elimination behavior, which should still stay the same.

  [0]: https://lore.kernel.org/bpf/ZrCZS6nisraEqehw@jlelli-thinkpadt14gen4.remote.csb
  [1]: https://lore.kernel.org/all/20241104171959.2938862-1-memxor@gmail.com
  [2]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com
  [3]: https://github.com/eddyz87/llvm-project/tree/nullness-for-tracepoint-params

Reported-by: Juri Lelli <juri.lelli@redhat.com> # original bug
Reported-by: Manu Bretelle <chantra@meta.com> # bugs in masking fix
Fixes: 3f00c52393 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs")
Fixes: cb4158ce8e ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL")
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Co-developed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241213221929.3495062-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-13 16:24:53 -08:00
Kumar Kartikeya Dwivedi
c00d738e16 bpf: Revert "bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"
This patch reverts commit
cb4158ce8e ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"). The
patch was well-intended and meant to be as a stop-gap fixing branch
prediction when the pointer may actually be NULL at runtime. Eventually,
it was supposed to be replaced by an automated script or compiler pass
detecting possibly NULL arguments and marking them accordingly.

However, it caused two main issues observed for production programs and
failed to preserve backwards compatibility. First, programs relied on
the verifier not exploring == NULL branch when pointer is not NULL, thus
they started failing with a 'dereference of scalar' error.  Next,
allowing raw_tp arguments to be modified surfaced the warning in the
verifier that warns against reg->off when PTR_MAYBE_NULL is set.

More information, context, and discusson on both problems is available
in [0]. Overall, this approach had several shortcomings, and the fixes
would further complicate the verifier's logic, and the entire masking
scheme would have to be removed eventually anyway.

Hence, revert the patch in preparation of a better fix avoiding these
issues to replace this commit.

  [0]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com

Reported-by: Manu Bretelle <chantra@meta.com>
Fixes: cb4158ce8e ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241213221929.3495062-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-13 16:24:53 -08:00
Linus Torvalds
c30c65f3fe Merge tag 'block-6.13-20241213' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:

 - Series from Damien fixing issues with the zoned write plugging

 - Fix for a potential UAF in block cgroups

 - Fix deadlock around queue freezing and the sysfs lock

 - Various little cleanups and fixes

* tag 'block-6.13-20241213' of git://git.kernel.dk/linux:
  block: Fix potential deadlock while freezing queue and acquiring sysfs_lock
  block: Fix queue_iostats_passthrough_show()
  blk-mq: Clean up blk_mq_requeue_work()
  mq-deadline: Remove a local variable
  blk-iocost: Avoid using clamp() on inuse in __propagate_weights()
  block: Make bio_iov_bvec_set() accept pointer to const iov_iter
  block: get wp_offset by bdev_offset_from_zone_start
  blk-cgroup: Fix UAF in blkcg_unpin_online()
  MAINTAINERS: update Coly Li's email address
  block: Prevent potential deadlocks in zone write plug error recovery
  dm: Fix dm-zoned-reclaim zone write pointer alignment
  block: Ignore REQ_NOWAIT for zone reset and zone finish operations
  block: Use a zone write plug BIO work for REQ_NOWAIT BIOs
2024-12-13 15:10:59 -08:00
Linus Torvalds
2a816e4f6a Merge tag 'io_uring-6.13-20241213' of git://git.kernel.dk/linux
Pull io_uring fix from Jens Axboe:
 "A single fix for a regression introduced in the 6.13 merge window"

* tag 'io_uring-6.13-20241213' of git://git.kernel.dk/linux:
  io_uring/rsrc: don't put/free empty buffers
2024-12-13 15:08:55 -08:00
Linus Torvalds
1c021e7908 Merge tag 'libnvdimm-fixes-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fix from Ira Weiny:

 - sysbot fix for out of bounds access

* tag 'libnvdimm-fixes-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  acpi: nfit: vmalloc-out-of-bounds Read in acpi_nfit_ctl
2024-12-13 15:07:22 -08:00
Leon Romanovsky
824927e884 ARC: build: Try to guess GCC variant of cross compiler
ARC GCC compiler is packaged starting from Fedora 39i and the GCC
variant of cross compile tools has arc-linux-gnu- prefix and not
arc-linux-. This is causing that CROSS_COMPILE variable is left unset.

This change allows builds without need to supply CROSS_COMPILE argument
if distro package is used.

Before this change:
$ make -j 128 ARCH=arc W=1 drivers/infiniband/hw/mlx4/
  gcc: warning: ‘-mcpu=’ is deprecated; use ‘-mtune=’ or ‘-march=’ instead
  gcc: error: unrecognized command-line option ‘-mmedium-calls’
  gcc: error: unrecognized command-line option ‘-mlock’
  gcc: error: unrecognized command-line option ‘-munaligned-access’

[1] https://packages.fedoraproject.org/pkgs/cross-gcc/gcc-arc-linux-gnu/index.html
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Vineet Gupta <vgupta@kernel.org>
2024-12-13 14:39:24 -08:00
Linus Torvalds
4800575d8c Merge tag 'xfs-fixes-6.13-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:

 - Fixes for scrub subsystem

 - Fix quota crashes

 - Fix sb_spino_align checons on large fsblock sizes

 - Fix discarded superblock updates

 - Fix stuck unmount due to a locked inode

* tag 'xfs-fixes-6.13-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (28 commits)
  xfs: port xfs_ioc_start_commit to multigrain timestamps
  xfs: return from xfs_symlink_verify early on V4 filesystems
  xfs: fix zero byte checking in the superblock scrubber
  xfs: check pre-metadir fields correctly
  xfs: don't crash on corrupt /quotas dirent
  xfs: don't move nondir/nonreg temporary repair files to the metadir namespace
  xfs: fix sb_spino_align checks for large fsblock sizes
  xfs: convert quotacheck to attach dquot buffers
  xfs: attach dquot buffer to dquot log item buffer
  xfs: clean up log item accesses in xfs_qm_dqflush{,_done}
  xfs: separate dquot buffer reads from xfs_dqflush
  xfs: don't lose solo dquot update transactions
  xfs: don't lose solo superblock counter update transactions
  xfs: avoid nested calls to __xfs_trans_commit
  xfs: only run precommits once per transaction object
  xfs: unlock inodes when erroring out of xfs_trans_alloc_dir
  xfs: fix scrub tracepoints when inode-rooted btrees are involved
  xfs: update btree keys correctly when _insrec splits an inode root block
  xfs: fix error bailout in xfs_rtginode_create
  xfs: fix null bno_hint handling in xfs_rtallocate_rtg
  ...
2024-12-13 14:27:40 -08:00
Linus Torvalds
01af50af76 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:

 - arm64 stacktrace: address some fallout from the recent changes to
   unwinding across exception boundaries

 - Ensure the arm64 signal delivery failure is recoverable - only
   override the return registers after all the user accesses took place

 - Fix the arm64 kselftest access to SVCR - only when SME is detected

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  kselftest/arm64: abi: fix SVCR detection
  arm64: signal: Ensure signal delivery failure is recoverable
  arm64: stacktrace: Don't WARN when unwinding other tasks
  arm64: stacktrace: Skip reporting LR at exception boundaries
2024-12-13 14:23:09 -08:00
Linus Torvalds
a5b3d5dfce Merge tag 'riscv-for-linus-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:

 - avoid taking a mutex while resolving jump_labels in the mutex
   implementation

 - avoid trying to resolve the early boot DT pointer via the linear map

 - avoid trying to IPI kfence TLB flushes, as kfence might flush with
   IRQs disabled

 - avoid calling PMD destructors on PMDs that were never constructed

* tag 'riscv-for-linus-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: mm: Do not call pmd dtor on vmemmap page table teardown
  riscv: Fix IPIs usage in kfence_protect_page()
  riscv: Fix wrong usage of __pa() on a fixmap address
  riscv: Fixup boot failure when CONFIG_DEBUG_RT_MUTEXES=y
2024-12-13 14:18:18 -08:00
Rafael J. Wysocki
e14d5ae28e Merge branch 'acpica'
Merge an ACPICA fix for 6.13-rc3:

 - Don't release a context_mutex that was never acquired in
   acpi_remove_address_space_handler() (Daniil Tatianin).

* acpica:
  ACPICA: events/evxfregn: don't release the ContextMutex that was never acquired
2024-12-13 21:32:06 +01:00
Paolo Bonzini
3522c41975 Merge tag 'kvm-riscv-fixes-6.13-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv fixes for 6.13, take #1

- Replace csr_write() with csr_set() for HVIEN PMU overflow bit
2024-12-13 13:59:20 -05:00
Sean Christopherson
1201f226c8 KVM: x86: Cache CPUID.0xD XSTATE offsets+sizes during module init
Snapshot the output of CPUID.0xD.[1..n] during kvm.ko initiliaization to
avoid the overead of CPUID during runtime.  The offset, size, and metadata
for CPUID.0xD.[1..n] sub-leaves does not depend on XCR0 or XSS values, i.e.
is constant for a given CPU, and thus can be cached during module load.

On Intel's Emerald Rapids, CPUID is *wildly* expensive, to the point where
recomputing XSAVE offsets and sizes results in a 4x increase in latency of
nested VM-Enter and VM-Exit (nested transitions can trigger
xstate_required_size() multiple times per transition), relative to using
cached values.  The issue is easily visible by running `perf top` while
triggering nested transitions: kvm_update_cpuid_runtime() shows up at a
whopping 50%.

As measured via RDTSC from L2 (using KVM-Unit-Test's CPUID VM-Exit test
and a slightly modified L1 KVM to handle CPUID in the fastpath), a nested
roundtrip to emulate CPUID on Skylake (SKX), Icelake (ICX), and Emerald
Rapids (EMR) takes:

  SKX 11650
  ICX 22350
  EMR 28850

Using cached values, the latency drops to:

  SKX 6850
  ICX 9000
  EMR 7900

The underlying issue is that CPUID itself is slow on ICX, and comically
slow on EMR.  The problem is exacerbated on CPUs which support XSAVES
and/or XSAVEC, as KVM invokes xstate_required_size() twice on each
runtime CPUID update, and because there are more supported XSAVE features
(CPUID for supported XSAVE feature sub-leafs is significantly slower).

 SKX:
  CPUID.0xD.2  = 348 cycles
  CPUID.0xD.3  = 400 cycles
  CPUID.0xD.4  = 276 cycles
  CPUID.0xD.5  = 236 cycles
  <other sub-leaves are similar>

 EMR:
  CPUID.0xD.2  = 1138 cycles
  CPUID.0xD.3  = 1362 cycles
  CPUID.0xD.4  = 1068 cycles
  CPUID.0xD.5  = 910 cycles
  CPUID.0xD.6  = 914 cycles
  CPUID.0xD.7  = 1350 cycles
  CPUID.0xD.8  = 734 cycles
  CPUID.0xD.9  = 766 cycles
  CPUID.0xD.10 = 732 cycles
  CPUID.0xD.11 = 718 cycles
  CPUID.0xD.12 = 734 cycles
  CPUID.0xD.13 = 1700 cycles
  CPUID.0xD.14 = 1126 cycles
  CPUID.0xD.15 = 898 cycles
  CPUID.0xD.16 = 716 cycles
  CPUID.0xD.17 = 748 cycles
  CPUID.0xD.18 = 776 cycles

Note, updating runtime CPUID information multiple times per nested
transition is itself a flaw, especially since CPUID is a mandotory
intercept on both Intel and AMD.  E.g. KVM doesn't need to ensure emulated
CPUID state is up-to-date while running L2.  That flaw will be fixed in a
future patch, as deferring runtime CPUID updates is more subtle than it
appears at first glance, the benefits aren't super critical to have once
the XSAVE issue is resolved, and caching CPUID output is desirable even if
KVM's updates are deferred.

Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20241211013302.1347853-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-13 13:58:10 -05:00