Commit Graph

103338 Commits

Author SHA1 Message Date
Simona Vetter
098456f314 Merge tag 'drm-misc-next-2025-10-21' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.19:

UAPI Changes:

amdxdna:
- Support reading last hardware error

Cross-subsystem Changes:

dma-buf:
- heaps: Create heap per CMA reserved location; Improve user-space documentation

Core Changes:

atomic:
- Clean up and improve state-handling interfaces, update drivers

bridge:
- Improve ref counting

buddy:
- Optimize block management

Driver Changes:

amdxdna:
- Fix runtime power management
- Support firmware debug output

ast:
- Set quirks for each chip model

atmel-hlcdc:
- Set LCDC_ATTRE register in plane disable
- Set correct values for plane scaler

bochs:
- Use vblank timer

bridge:
- synopsis: Support CEC; Init timer with correct frequency

cirrus-qemu:
- Use vblank timer

imx:
- Clean up

ivu:
- Update JSM API to 3.33.0
- Reset engine on more job errors
- Return correct error codes for jobs

komeda:
- Use drm_ logging functions

panel:
- edp: Support AUO B116XAN02.0

panfrost:
- Embed struct drm_driver in Panfrost device
- Improve error handling
- Clean up job handling

panthor:
- Support custom ASN_HASH for mt8196

renesas:
- rz-du: Fix dependencies

rockchip:
- dsi: Add support for RK3368
- Fix LUT size for RK3386

sitronix:
- Fix output position when clearing screens

qaic:
- Support dma-buf exports
- Support new firmware's READ_DATA implementation
- Replace kcalloc with memdup
- Replace snprintf() with sysfs_emit()
- Avoid overflows in arithmetics
- Clean up
- Fixes

qxl:
- Use vblank timer

rockchip:
- Clean up mode-setting code

vgem:
- Fix fence timer deadlock

virtgpu:
- Use vblank timer

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20251021111837.GA40643@linux.fritz.box
2025-10-24 13:25:20 +02:00
Linus Torvalds
2953fb6548 Merge tag 'hid-for-linus-2025101701' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:

 - fix for sticky fingers handling in hid-multitouch (Benjamin
   Tissoires)

 - fix for reporting of 0 battery levels (Dmitry Torokhov)

 - build fix for hid-haptic in certain configurations (Jonathan Denose)

 - improved probe and avoiding spamming kernel log by hid-nintendo
   (Vicki Pfau)

 - fix for OOB in hid-cp2112 (Deepak Sharma)

 - interrupt handling fix for intel-thc-hid (Even Xu)

 - a couple of new device IDs and device-specific quirks

* tag 'hid-for-linus-2025101701' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: logitech-hidpp: Add HIDPP_QUIRK_RESET_HI_RES_SCROLL
  selftests/hid: add tests for missing release on the Dell Synaptics
  HID: multitouch: fix sticky fingers
  HID: multitouch: fix name of Stylus input devices
  HID: hid-input: only ignore 0 battery events for digitizers
  HID: hid-debug: Fix spelling mistake "Rechargable" -> "Rechargeable"
  HID: Kconfig: Fix build error from CONFIG_HID_HAPTIC
  HID: nintendo: Rate limit IMU compensation message
  HID: nintendo: Wait longer for initial probe
  HID: core: Add printk_ratelimited variants to hid_warn() etc
  HID: quirks: Add ALWAYS_POLL quirk for VRS R295 steering wheel
  HID: quirks: avoid Cooler Master MM712 dongle wakeup bug
  HID: cp2112: Add parameter validation to data length
  HID: intel-thc-hid: intel-quickspi: Add ARL PCI Device Id's
  HID: intel-thc-hid: Intel-quickspi: switch first interrupt from level to edge detection
  HID: intel-thc-hid: intel-quicki2c: Fix wrong type casting
2025-10-18 08:18:18 -10:00
Linus Torvalds
d303caf5ca Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:

 - Replace bpf_map_kmalloc_node() with kmalloc_nolock() to fix kmemleak
   imbalance in tracking of bpf_async_cb structures (Alexei Starovoitov)

 - Make selftests/bpf arg_parsing.c more robust to errors (Andrii
   Nakryiko)

 - Fix redefinition of 'off' as different kind of symbol when I40E
   driver is builtin (Brahmajit Das)

 - Do not disable preemption in bpf_test_run (Sahil Chandna)

 - Fix memory leak in __lookup_instance error path (Shardul Bankar)

 - Ensure test data is flushed to disk before reading it (Xing Guo)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Fix redefinition of 'off' as different kind of symbol
  bpf: Do not disable preemption in bpf_test_run().
  bpf: Fix memory leak in __lookup_instance error path
  selftests: arg_parsing: Ensure data is flushed to disk before reading.
  bpf: Replace bpf_map_kmalloc_node() with kmalloc_nolock() to allocate bpf_async_cb structures.
  selftests/bpf: make arg_parsing.c more robust to crashes
  bpf: test_run: Fix ctx leak in bpf_prog_test_run_xdp error path
2025-10-18 08:00:43 -10:00
Linus Torvalds
2d07c6c209 Merge tag 'nfs-for-6.18-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:

 - Fix for FlexFiles mirror->dss allocation

 - Apply delay_retrans to async operations

 - Check if suid/sgid is cleared after a write when needed

 - Fix setting the state renewal timer for early mounts after a reboot

* tag 'nfs-for-6.18-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS4: Fix state renewals missing after boot
  NFS: check if suid/sgid was cleared after a write as needed
  NFS4: Apply delay_retrans to async operations
  NFSv4/flexfiles: fix to allocate mirror->dss before use
2025-10-18 07:18:48 -10:00
Linus Torvalds
02e5f74ef0 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Fix the handling of ZCR_EL2 in NV VMs

   - Pick the correct translation regime when doing a PTW on the back of
     a SEA

   - Prevent userspace from injecting an event into a vcpu that isn't
     initialised yet

   - Move timer save/restore to the sysreg handling code, fixing EL2
     timer access in the process

   - Add FGT-based trapping of MDSCR_EL1 to reduce the overhead of debug

   - Fix trapping configuration when the host isn't GICv3

   - Improve the detection of HCR_EL2.E2H being RES1

   - Drop a spurious 'break' statement in the S1 PTW

   - Don't try to access SPE when owned by EL3

  Documentation updates:

   - Document the failure modes of event injection

   - Document that a GICv3 guest can be created on a GICv5 host with
     FEAT_GCIE_LEGACY

  Selftest improvements:

   - Add a selftest for the effective value of HCR_EL2.AMO

   - Address build warning in the timer selftest when building with
     clang

   - Teach irqfd selftests about non-x86 architectures

   - Add missing sysregs to the set_id_regs selftest

   - Fix vcpu allocation in the vgic_lpi_stress selftest

   - Correctly enable interrupts in the vgic_lpi_stress selftest

  x86:

   - Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test
     for the bug fixed by commit 3ccbf6f470 ("KVM: x86/mmu: Return
     -EAGAIN if userspace deletes/moves memslot during prefault")

   - Don't try to get PMU capabilities from perf when running a CPU with
     hybrid CPUs/PMUs, as perf will rightly WARN.

  guest_memfd:

   - Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a
     more generic KVM_CAP_GUEST_MEMFD_FLAGS

   - Add a guest_memfd INIT_SHARED flag and require userspace to
     explicitly set said flag to initialize memory as SHARED,
     irrespective of MMAP.

     The behavior merged in 6.18 is that enabling mmap() implicitly
     initializes memory as SHARED, which would result in an ABI
     collision for x86 CoCo VMs as their memory is currently always
     initialized PRIVATE.

   - Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with
     private memory, to enable testing such setups, i.e. to hopefully
     flush out any other lurking ABI issues before 6.18 is officially
     released.

   - Add testcases to the guest_memfd selftest to cover guest_memfd
     without MMAP, and host userspace accesses to mmap()'d private
     memory"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (46 commits)
  arm64: Revamp HCR_EL2.E2H RES1 detection
  KVM: arm64: nv: Use FGT write trap of MDSCR_EL1 when available
  KVM: arm64: Compute per-vCPU FGTs at vcpu_load()
  KVM: arm64: selftests: Fix misleading comment about virtual timer encoding
  KVM: arm64: selftests: Add an E2H=0-specific configuration to get_reg_list
  KVM: arm64: selftests: Make dependencies on VHE-specific registers explicit
  KVM: arm64: Kill leftovers of ad-hoc timer userspace access
  KVM: arm64: Fix WFxT handling of nested virt
  KVM: arm64: Move CNT*CT_EL0 userspace accessors to generic infrastructure
  KVM: arm64: Move CNT*_CVAL_EL0 userspace accessors to generic infrastructure
  KVM: arm64: Move CNT*_CTL_EL0 userspace accessors to generic infrastructure
  KVM: arm64: Add timer UAPI workaround to sysreg infrastructure
  KVM: arm64: Make timer_set_offset() generally accessible
  KVM: arm64: Replace timer context vcpu pointer with timer_id
  KVM: arm64: Introduce timer_context_to_vcpu() helper
  KVM: arm64: Hide CNTHV_*_EL2 from userspace for nVHE guests
  Documentation: KVM: Update GICv3 docs for GICv5 hosts
  KVM: arm64: gic-v3: Only set ICH_HCR traps for v2-on-v3 or v3 guests
  KVM: arm64: selftests: Actually enable IRQs in vgic_lpi_stress
  KVM: arm64: selftests: Allocate vcpus with correct size
  ...
2025-10-18 07:07:14 -10:00
Maxime Ripard
8b5690d5c9 dma-buf: heaps: cma: Register list of CMA regions at boot
In order to create a CMA heap instance for each CMA region found in the
system, we need to register each of these instances.

While it would appear trivial, the CMA regions are created super early
in the kernel boot process, before most of the subsystems are
initialized. Thus, we can't just create an exported function to create a
heap from the CMA region being initialized.

What we can do however is create a two-step process, where we collect
all the CMA regions into an array early on, and then when we initialize
the heaps we iterate over that array and create the heaps from the CMA
regions we collected.

Reviewed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Link: https://lore.kernel.org/r/20251013-dma-buf-ecc-heap-v8-2-04ce150ea3d9@kernel.org
2025-10-18 21:31:21 +05:30
Paolo Bonzini
4361f5aa8b Merge tag 'kvm-x86-fixes-6.18-rc2' of https://github.com/kvm-x86/linux into HEAD
KVM x86 fixes for 6.18:

 - Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the
   bug fixed by commit 3ccbf6f470 ("KVM: x86/mmu: Return -EAGAIN if userspace
   deletes/moves memslot during prefault")

 - Don't try to get PMU capabbilities from perf when running a CPU with hybrid
   CPUs/PMUs, as perf will rightly WARN.

 - Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more
   generic KVM_CAP_GUEST_MEMFD_FLAGS

 - Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set
   said flag to initialize memory as SHARED, irrespective of MMAP.  The
   behavior merged in 6.18 is that enabling mmap() implicitly initializes
   memory as SHARED, which would result in an ABI collision for x86 CoCo VMs
   as their memory is currently always initialized PRIVATE.

 - Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private
   memory, to enable testing such setups, i.e. to hopefully flush out any
   other lurking ABI issues before 6.18 is officially released.

 - Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP,
   and host userspace accesses to mmap()'d private memory.
2025-10-18 10:25:43 +02:00
Linus Torvalds
cf1ea8854e Merge tag 'mmc-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull mmc cleanup from Ulf Hansson:
 "Move rpmb_frame struct and constants to rpmb common header

  This helps us to avoid sharing an immutable branch between our git
  trees. I was planning to send it before rc1, but I didn't make it"

* tag 'mmc-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  rpmb: move rpmb_frame struct and constants to common header
2025-10-17 08:22:20 -07:00
Linus Torvalds
634ec1fc79 Merge tag 'net-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from CAN

  Current release - regressions:

    - udp: do not use skb_release_head_state() before
      skb_attempt_defer_free()

    - gro_cells: use nested-BH locking for gro_cell

    - dpll: zl3073x: increase maximum size of flash utility

  Previous releases - regressions:

    - core: fix lockdep splat on device unregister

    - tcp: fix tcp_tso_should_defer() vs large RTT

    - tls:
        - don't rely on tx_work during send()
        - wait for pending async decryptions if tls_strp_msg_hold fails

    - can: j1939: add missing calls in NETDEV_UNREGISTER notification
      handler

    - eth: lan78xx: fix lost EEPROM write timeout in
      lan78xx_write_raw_eeprom

  Previous releases - always broken:

    - ip6_tunnel: prevent perpetual tunnel growth

    - dpll: zl3073x: handle missing or corrupted flash configuration

    - can: m_can: fix pm_runtime and CAN state handling

    - eth:
        - ixgbe: fix too early devlink_free() in ixgbe_remove()
        - ixgbevf: fix mailbox API compatibility
        - gve: Check valid ts bit on RX descriptor before hw timestamping
        - idpf: cleanup remaining SKBs in PTP flows
        - r8169: fix packet truncation after S4 resume on RTL8168H/RTL8111H"

* tag 'net-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
  udp: do not use skb_release_head_state() before skb_attempt_defer_free()
  net: usb: lan78xx: fix use of improperly initialized dev->chipid in lan78xx_reset
  netdevsim: set the carrier when the device goes up
  selftests: tls: add test for short splice due to full skmsg
  selftests: net: tls: add tests for cmsg vs MSG_MORE
  tls: don't rely on tx_work during send()
  tls: wait for pending async decryptions if tls_strp_msg_hold fails
  tls: always set record_type in tls_process_cmsg
  tls: wait for async encrypt in case of error during latter iterations of sendmsg
  tls: trim encrypted message to match the plaintext on short splice
  tg3: prevent use of uninitialized remote_adv and local_adv variables
  MAINTAINERS: new entry for IPv6 IOAM
  gve: Check valid ts bit on RX descriptor before hw timestamping
  net: core: fix lockdep splat on device unregister
  MAINTAINERS: add myself as maintainer for b53
  selftests: net: check jq command is supported
  net: airoha: Take into account out-of-order tx completions in airoha_dev_xmit()
  tcp: fix tcp_tso_should_defer() vs large RTT
  r8152: add error handling in rtl8152_driver_init
  usbnet: Fix using smp_processor_id() in preemptible code warnings
  ...
2025-10-16 09:41:21 -07:00
Alexei Starovoitov
5fb750e8a9 bpf: Replace bpf_map_kmalloc_node() with kmalloc_nolock() to allocate bpf_async_cb structures.
The following kmemleak splat:

[    8.105530] kmemleak: Trying to color unknown object at 0xff11000100e918c0 as Black
[    8.106521] Call Trace:
[    8.106521]  <TASK>
[    8.106521]  dump_stack_lvl+0x4b/0x70
[    8.106521]  kvfree_call_rcu+0xcb/0x3b0
[    8.106521]  ? hrtimer_cancel+0x21/0x40
[    8.106521]  bpf_obj_free_fields+0x193/0x200
[    8.106521]  htab_map_update_elem+0x29c/0x410
[    8.106521]  bpf_prog_cfc8cd0f42c04044_overwrite_cb+0x47/0x4b
[    8.106521]  bpf_prog_8c30cd7c4db2e963_overwrite_timer+0x65/0x86
[    8.106521]  bpf_prog_test_run_syscall+0xe1/0x2a0

happens due to the combination of features and fixes, but mainly due to
commit 6d78b4473c ("bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()")
It's using __GFP_HIGH, which instructs slub/kmemleak internals to skip
kmemleak_alloc_recursive() on allocation, so subsequent kfree_rcu()->
kvfree_call_rcu()->kmemleak_ignore() complains with the above splat.

To fix this imbalance, replace bpf_map_kmalloc_node() with
kmalloc_nolock() and kfree_rcu() with call_rcu() + kfree_nolock() to
make sure that the objects allocated with kmalloc_nolock() are freed
with kfree_nolock() rather than the implicit kfree() that kfree_rcu()
uses internally.

Note, the kmalloc_nolock() happens under bpf_spin_lock_irqsave(), so
it will always fail in PREEMPT_RT. This is not an issue at the moment,
since bpf_timers are disabled in PREEMPT_RT. In the future
bpf_spin_lock will be replaced with state machine similar to
bpf_task_work.

Fixes: 6d78b4473c ("bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-mm@kvack.org
Link: https://lore.kernel.org/bpf/20251015000700.28988-1-alexei.starovoitov@gmail.com
2025-10-15 12:22:22 +02:00
Vicki Pfau
1d64624243 HID: core: Add printk_ratelimited variants to hid_warn() etc
hid_warn_ratelimited() is needed. Add the others as part of the block.

Signed-off-by: Vicki Pfau <vi@endrift.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
2025-10-14 11:57:40 +02:00
Kamil Horák - 2N
e4d0c909bf net: phy: bcm54811: Fix GMII/MII/MII-Lite selection
The Broadcom bcm54811 is hardware-strapped to select among RGMII and
GMII/MII/MII-Lite modes. However, the corresponding bit, RGMII Enable
in Miscellaneous Control Register must be also set to select desired
RGMII or MII(-lite)/GMII mode.

Fixes: 3117a11fff ("net: phy: bcm54811: PHY initialization")
Signed-off-by: Kamil Horák - 2N <kamilh@axis.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251009130656.1308237-2-kamilh@axis.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-13 17:36:20 -07:00
Joshua Watt
7a84394f02 NFS4: Apply delay_retrans to async operations
The setting of delay_retrans is applied to synchronous RPC operations
because the retransmit count is stored in same struct nfs4_exception
that is passed each time an error is checked. However, for asynchronous
operations (READ, WRITE, LOCKU, CLOSE, DELEGRETURN), a new struct
nfs4_exception is made on the stack each time the task callback is
invoked. This means that the retransmit count is always zero and thus
delay_retrans never takes effect.

Apply delay_retrans to these operations by tracking and updating their
retransmit count.

Change-Id: Ieb33e046c2b277cb979caa3faca7f52faf0568c9
Signed-off-by: Joshua Watt <jpewhacker@gmail.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-10-13 14:33:00 -04:00
Bean Huo
7e8242405b rpmb: move rpmb_frame struct and constants to common header
Move struct rpmb_frame and RPMB operation constants from MMC block
driver to include/linux/rpmb.h for reuse across different RPMB
implementations (UFS, NVMe, etc.).

Signed-off-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Avri Altman <avri.altman@sandisk.com>
Acked-by: Jens Wiklander <jens.wiklander@linaro.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-10-13 13:18:03 +02:00
Damien Le Moal
12d724f285 ata: libata-core: relax checks in ata_read_log_directory()
Commit 6d4405b16d ("ata: libata-core: Cache the general purpose log
directory") introduced caching of a device general purpose log directory
to avoid repeated access to this log page during device scan. This
change also added a check on this log page to verify that the log page
version is 0x0001 as mandated by the ACS specifications.

And it turns out that some devices do not bother reporting this version,
instead reporting a version 0, resulting in error messages such as:

ata6.00: Invalid log directory version 0x0000

and to the device being marked as not supporting the general purpose log
directory log page.

Since before commit 6d4405b16d the log page version check did not
exist and things were still working correctly for these devices, relax
ata_read_log_directory() version check and only warn about the invalid
log page version number without disabling access to the log directory
page.

Fixes: 6d4405b16d ("ata: libata-core: Cache the general purpose log directory")
Cc: stable@vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220635
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
2025-10-13 09:12:36 +02:00
Wolfram Sang
a8482d2c90 Revert "i2c: boardinfo: Annotate code used in init phase only"
This reverts commit 1a2b423be6 because we
got a regression report and need time to find out the details.

Reported-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Closes: https://lore.kernel.org/r/29ec0082-4dd4-4120-acd2-44b35b4b9487@oss.qualcomm.com
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2025-10-11 23:57:33 +02:00
Linus Torvalds
9591fdb061 Merge tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more x86 updates from Borislav Petkov:

 - Remove a bunch of asm implementing condition flags testing in KVM's
   emulator in favor of int3_emulate_jcc() which is written in C

 - Replace KVM fastops with C-based stubs which avoids problems with the
   fastop infra related to latter not adhering to the C ABI due to their
   special calling convention and, more importantly, bypassing compiler
   control-flow integrity checking because they're written in asm

 - Remove wrongly used static branches and other ugliness accumulated
   over time in hyperv's hypercall implementation with a proper static
   function call to the correct hypervisor call variant

 - Add some fixes and modifications to allow running FRED-enabled
   kernels in KVM even on non-FRED hardware

 - Add kCFI improvements like validating indirect calls and prepare for
   enabling kCFI with GCC. Add cmdline params documentation and other
   code cleanups

 - Use the single-byte 0xd6 insn as the official #UD single-byte
   undefined opcode instruction as agreed upon by both x86 vendors

 - Other smaller cleanups and touchups all over the place

* tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86,retpoline: Optimize patch_retpoline()
  x86,ibt: Use UDB instead of 0xEA
  x86/cfi: Remove __noinitretpoline and __noretpoline
  x86/cfi: Add "debug" option to "cfi=" bootparam
  x86/cfi: Standardize on common "CFI:" prefix for CFI reports
  x86/cfi: Document the "cfi=" bootparam options
  x86/traps: Clarify KCFI instruction layout
  compiler_types.h: Move __nocfi out of compiler-specific header
  objtool: Validate kCFI calls
  x86/fred: KVM: VMX: Always use FRED for IRQs when CONFIG_X86_FRED=y
  x86/fred: Play nice with invoking asm_fred_entry_from_kvm() on non-FRED hardware
  x86/fred: Install system vector handlers even if FRED isn't fully enabled
  x86/hyperv: Use direct call to hypercall-page
  x86/hyperv: Clean up hv_do_hypercall()
  KVM: x86: Remove fastops
  KVM: x86: Convert em_salc() to C
  KVM: x86: Introduce EM_ASM_3WCL
  KVM: x86: Introduce EM_ASM_1SRC2
  KVM: x86: Introduce EM_ASM_2CL
  KVM: x86: Introduce EM_ASM_2W
  ...
2025-10-11 11:19:16 -07:00
Linus Torvalds
ae13bd2310 Merge tag 'mm-nonmm-stable-2025-10-10-15-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more updates from Andrew Morton:
 "Just one series here - Mike Rappoport has taught KEXEC handover to
  preserve vmalloc allocations across handover"

* tag 'mm-nonmm-stable-2025-10-10-15-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  lib/test_kho: use kho_preserve_vmalloc instead of storing addresses in fdt
  kho: add support for preserving vmalloc allocations
  kho: replace kho_preserve_phys() with kho_preserve_pages()
  kho: check if kho is finalized in __kho_preserve_order()
  MAINTAINERS, .mailmap: update Umang's email address
2025-10-11 10:27:52 -07:00
Linus Torvalds
971370a88c Merge tag 'mm-hotfixes-stable-2025-10-10-15-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "7 hotfixes.  All 7 are cc:stable and all 7 are for MM.

  All singletons, please see the changelogs for details"

* tag 'mm-hotfixes-stable-2025-10-10-15-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: hugetlb: avoid soft lockup when mprotect to large memory area
  fsnotify: pass correct offset to fsnotify_mmap_perm()
  mm/ksm: fix flag-dropping behavior in ksm_madvise
  mm/damon/vaddr: do not repeat pte_offset_map_lock() until success
  mm/rmap: fix soft-dirty and uffd-wp bit loss when remapping zero-filled mTHP subpage to shared zeropage
  mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
  memcg: skip cgroup_file_notify if spinning is not allowed
2025-10-11 10:14:55 -07:00
Sean Christopherson
44c6cb9fe9 KVM: guest_memfd: Allow mmap() on guest_memfd for x86 VMs with private memory
Allow mmap() on guest_memfd instances for x86 VMs with private memory as
the need to track private vs. shared state in the guest_memfd instance is
only pertinent to INIT_SHARED.  Doing mmap() on private memory isn't
terrible useful (yet!), but it's now possible, and will be desirable when
guest_memfd gains support for other VMA-based syscalls, e.g. mbind() to
set NUMA policy.

Lift the restriction now, before MMAP support is officially released, so
that KVM doesn't need to add another capability to enumerate support for
mmap() on private memory.

Fixes: 3d3a04fad2 ("KVM: Allow and advertise support for host mmap() on guest_memfd files")
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20251003232606.4070510-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-10 14:25:25 -07:00
Linus Torvalds
f76b1683d1 Merge tag 'devicetree-fixes-for-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:

 - Allow child nodes on renesas-bsc bus binding

 - Drop node name pattern on allwinner,sun50i-a64-de2 bus binding

 - Switch DT patchwork to kernel.org from ozlabs.org

 - Fix some typos in docs and bindings

 - Fix reference count in PCI node unittest

* tag 'devicetree-fixes-for-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: bus: renesas-bsc: allow additional properties
  dt-bindings: bus: allwinner,sun50i-a64-de2: don't check node names
  MAINTAINERS: Move DT patchwork to kernel.org
  of: unittest: Fix device reference count leak in of_unittest_pci_node_verify
  of: doc: Fix typo in doc comments.
  dt-bindings: mmc: Correct typo "upto" to "up to"
2025-10-10 13:05:40 -07:00
Linus Torvalds
8bd9238e51 Merge tag 'ceph-for-6.18-rc1' of https://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:

 - some messenger improvements (Eric and Max)

 - address an issue (also affected userspace) of incorrect permissions
   being granted to users who have access to multiple different CephFS
   instances within the same cluster (Kotresh)

 - a bunch of assorted CephFS fixes (Slava)

* tag 'ceph-for-6.18-rc1' of https://github.com/ceph/ceph-client:
  ceph: add bug tracking system info to MAINTAINERS
  ceph: fix multifs mds auth caps issue
  ceph: cleanup in ceph_alloc_readdir_reply_buffer()
  ceph: fix potential NULL dereference issue in ceph_fill_trace()
  libceph: add empty check to ceph_con_get_out_msg()
  libceph: pass the message pointer instead of loading con->out_msg
  libceph: make ceph_con_get_out_msg() return the message pointer
  ceph: fix potential race condition on operations with CEPH_I_ODIRECT flag
  ceph: refactor wake_up_bit() pattern of calling
  ceph: fix potential race condition in ceph_ioctl_lazyio()
  ceph: fix overflowed constant issue in ceph_do_objects_copy()
  ceph: fix wrong sizeof argument issue in register_session()
  ceph: add checking of wait_for_completion_killable() return value
  ceph: make ceph_start_io_*() killable
  libceph: Use HMAC-SHA256 library instead of crypto_shash
2025-10-10 11:30:19 -07:00
Linus Torvalds
1b1391b9c4 Merge tag 'block-6.18-20251009' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:

 - Don't include __GFP_NOWARN for loop worker allocation, as it already
   uses GFP_NOWAIT which has __GFP_NOWARN set already

 - Small series cleaning up the recent bio_iov_iter_get_pages() changes

 - loop fix for leaking the backing reference file, if validation fails

 - Update of a comment pertaining to disk/partition stat locking

* tag 'block-6.18-20251009' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  loop: remove redundant __GFP_NOWARN flag
  block: move bio_iov_iter_get_bdev_pages to block/fops.c
  iomap: open code bio_iov_iter_get_bdev_pages
  block: rename bio_iov_iter_get_pages_aligned to bio_iov_iter_get_pages
  block: remove bio_iov_iter_get_pages
  block: Update a comment of disk statistics
  loop: fix backing file reference leak on validation error
2025-10-10 10:37:13 -07:00
Linus Torvalds
aac3190332 Merge tag 'i2c-for-6.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull more i2c updates from Wolfram Sang:

 - Second part of rtl9300 updates since dependencies are in now:
    - general cleanups
    - implement block read/write support
    - add RTL9310 support

 - DT schema conversion of hix5hd2 binding

 - namespace cleanup for i2c-algo-pca

 - minor simplification for mt65xx

* tag 'i2c-for-6.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  dt-bindings: i2c: hisilicon,hix5hd2: convert to DT schema
  i2c: mt65xx: convert set_speed function to void
  i2c: rename wait_for_completion callback to wait_for_completion_cb
  i2c: rtl9300: add support for RTL9310 I2C controller
  dt-bindings: i2c: realtek,rtl9301-i2c: extend for RTL9310 support
  i2c: rtl9300: use scoped guard instead of explicit lock/unlock
  i2c: rtl9300: separate xfer configuration and execution
  i2c: rtl9300: do not set read mode on every transfer
  i2c: rtl9300: move setting SCL frequency to config_io
  i2c: rtl9300: rename internal sda_pin to sda_num
  dt-bindings: i2c: realtek,rtl9301-i2c: fix wording and typos
  i2c: rtl9300: use regmap fields and API for registers
  i2c: rtl9300: Implement I2C block read and write
2025-10-10 09:13:11 -07:00
Jarkko Sakkinen
207696b17f tpm: use a map for tpm2_calc_ordinal_duration()
The current shenanigans for duration calculation introduce too much
complexity for a trivial problem, and further the code is hard to patch and
maintain.

Address these issues with a flat look-up table, which is easy to understand
and patch. If leaf driver specific patching is required in future, it is
easy enough to make a copy of this table during driver initialization and
add the chip parameter back.

'chip->duration' is retained for TPM 1.x.

As the first entry for this new behavior address TCG spec update mentioned
in this issue:

https://github.com/raspberrypi/linux/issues/7054

Therefore, for TPM_SelfTest the duration is set to 3000 ms.

This does not categorize a as bug, given that this is introduced to the
spec after the feature was originally made.

Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-10-10 08:21:45 +03:00
Linus Torvalds
18a7e218cf Merge tag 'net-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull  networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  Current release - regressions:

   - mlx5: fix pre-2.40 binutils assembler error

  Current release - new code bugs:

   - net: psp: don't assume reply skbs will have a socket

   - eth: fbnic: fix missing programming of the default descriptor

  Previous releases - regressions:

   - page_pool: fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches

   - tcp:
       - take care of zero tp->window_clamp in tcp_set_rcvlowat()
       - don't call reqsk_fastopen_remove() in tcp_conn_request()

   - eth:
       - ice: release xa entry on adapter allocation failure
       - usb: asix: hold PM usage ref to avoid PM/MDIO + RTNL deadlock

  Previous releases - always broken:

   - netfilter: validate objref and objrefmap expressions

   - sctp: fix a null dereference in sctp_disposition sctp_sf_do_5_1D_ce()

   - eth:
       - mlx4: prevent potential use after free in mlx4_en_do_uc_filter()
       - mlx5: prevent tunnel mode conflicts between FDB and NIC IPsec tables
       - ocelot: fix use-after-free caused by cyclic delayed work

  Misc:

   -  add support for MediaTek PCIe 5G HP DRMR-H01"

* tag 'net-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits)
  net: airoha: Fix loopback mode configuration for GDM2 port
  selftests: drv-net: pp_alloc_fail: add necessary optoins to config
  selftests: drv-net: pp_alloc_fail: lower traffic expectations
  selftests: drv-net: fix linter warnings in pp_alloc_fail
  eth: fbnic: fix reporting of alloc_failed qstats
  selftests: drv-net: xdp: add test for interface level qstats
  selftests: drv-net: xdp: rename netnl to ethnl
  eth: fbnic: fix saving stats from XDP_TX rings on close
  eth: fbnic: fix accounting of XDP packets
  eth: fbnic: fix missing programming of the default descriptor
  selftests: netfilter: query conntrack state to check for port clash resolution
  selftests: netfilter: nft_fib.sh: fix spurious test failures
  bridge: br_vlan_fill_forward_path_pvid: use br_vlan_group_rcu()
  netfilter: nft_objref: validate objref and objrefmap expressions
  net: pse-pd: tps23881: Fix current measurement scaling
  net/mlx5: fix pre-2.40 binutils assembler error
  net/mlx5e: Do not fail PSP init on missing caps
  net/mlx5e: Prevent tunnel reformat when tunnel mode not allowed
  net/mlx5: Prevent tunnel mode conflicts between FDB and NIC IPsec tables
  net: usb: asix: hold PM usage ref to avoid PM/MDIO + RTNL deadlock
  ...
2025-10-09 11:13:08 -07:00
Max Kellermann
7399212dcf libceph: pass the message pointer instead of loading con->out_msg
This pointer is in a register anyway, so let's use that instead of
reloading from memory everywhere.

[ idryomov: formatting ]

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2025-10-08 23:30:46 +02:00
Max Kellermann
59699a5a71 libceph: make ceph_con_get_out_msg() return the message pointer
The caller in messenger_v1.c loads it anyway, so let's keep the
pointer in the register instead of reloading it from memory.  This
eliminates a tiny bit of unnecessary overhead.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2025-10-08 23:30:46 +02:00
Eric Biggers
27c0a7b05d libceph: Use HMAC-SHA256 library instead of crypto_shash
Use the HMAC-SHA256 library functions instead of crypto_shash.  This is
simpler and faster.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2025-10-08 23:30:45 +02:00
Linus Torvalds
ed4d6e9246 Merge tag 'vfio-v6.18-rc1-pt2' of https://github.com/awilliam/linux-vfio
Pull more VFIO updates from Alex Williamson:

 - Optimizations for DMA map and unmap opertions through the type1 vfio
   IOMMU backend.

   This uses various means of batching and hints from the mm structures
   to improve efficiency and therefore performance, resulting in a
   significant speedup for huge page use cases (Li Zhe)

 - Expose supported device migration features through debugfs (Cédric Le
   Goater)

* tag 'vfio-v6.18-rc1-pt2' of https://github.com/awilliam/linux-vfio:
  vfio: Dump migration features under debugfs
  vfio/type1: optimize vfio_unpin_pages_remote()
  vfio/type1: introduce a new member has_rsvd for struct vfio_dma
  vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote()
  vfio/type1: optimize vfio_pin_pages_remote()
  mm: introduce num_pages_contiguous()
2025-10-08 11:22:27 -07:00
Linus Torvalds
99cedb6b8f Merge tag 'input-for-v6.18-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:

 - Conversions to yaml/json schema and fixes for input-related device
   tree bindings

 - New drivers:
     - Awinic AW86927 haptic chip
     - Hynitron CST816x series controller
     - Himax HX852x(ES) touchscreen controller

 - Fix uinput to not leak kernel memory via a gap in
   uinput_ff_upload_compat structure

 - Prevent overflow in pressure calculation in tsc2007 driver causing
   phantom touches

 - Make the Atmel maxTouch driver support generic touchscreen
   configuration (flip, rotate, etc)

 - Drop support for platform data in tca8418_keypad, pxa27x-keypad,
   spear-keyboard and twl4030_keypad drivers, they all now rely on
   generic device properties for configuration

 - Other assorted changes and fixes

* tag 'input-for-v6.18-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (50 commits)
  Input: atmel_mxt_ts - allow reset GPIO to sleep
  Input: aw86927 - fix error code in probe()
  Input: psxpad-spi - add a check for the return value of spi_setup()
  Input: uinput - zero-initialize uinput_ff_upload_compat to avoid info leak
  Input: aw86927 - add driver for Awinic AW86927
  dt-bindings: input: Add Awinic AW86927
  dt-bindings: touchscreen: remove touchscreen.txt
  dt-bindings: arm: bcm: raspberrypi,bcm2835-firmware: Add touchscreen child node
  dt-bindings: touchscreen: convert eeti bindings to json schema
  Input: pm8941-pwrkey - disable wakeup for resin by default
  dt-bindings: input: pm8941-pwrkey: Document wakeup-source property
  Input: add driver for Hynitron CST816x series
  dt-bindings: input: touchscreen: add hynitron cst816x series
  Input: imx6ul_tsc - set glitch threshold by DTS property
  dt-bindings: touchscreen: fsl,imx6ul-tsc: support glitch thresold
  dt-bindings: touchscreen: add debounce-delay-us property
  Input: ps2-gpio - fix typo
  Input: atmel_mxt_ts - add support for generic touchscreen configurations
  dt-bindings: input: maxtouch: add common touchscreen properties
  dt-bindings: touchscreen: convert zet6223 bindings to json schema
  ...
2025-10-08 09:44:38 -07:00
Jakub Acs
f04aad36a0 mm/ksm: fix flag-dropping behavior in ksm_madvise
syzkaller discovered the following crash: (kernel BUG)

[   44.607039] ------------[ cut here ]------------
[   44.607422] kernel BUG at mm/userfaultfd.c:2067!
[   44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[   44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
[   44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[   44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460

<snip other registers, drop unreliable trace>

[   44.617726] Call Trace:
[   44.617926]  <TASK>
[   44.619284]  userfaultfd_release+0xef/0x1b0
[   44.620976]  __fput+0x3f9/0xb60
[   44.621240]  fput_close_sync+0x110/0x210
[   44.622222]  __x64_sys_close+0x8f/0x120
[   44.622530]  do_syscall_64+0x5b/0x2f0
[   44.622840]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   44.623244] RIP: 0033:0x7f365bb3f227

Kernel panics because it detects UFFD inconsistency during
userfaultfd_release_all().  Specifically, a VMA which has a valid pointer
to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.

The inconsistency is caused in ksm_madvise(): when user calls madvise()
with MADV_UNMEARGEABLE on a VMA that is registered for UFFD in MINOR mode,
it accidentally clears all flags stored in the upper 32 bits of
vma->vm_flags.

Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int and
int are 32-bit wide.  This setup causes the following mishap during the &=
~VM_MERGEABLE assignment.

VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000. 
After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
promoted to unsigned long before the & operation.  This promotion fills
upper 32 bits with leading 0s, as we're doing unsigned conversion (and
even for a signed conversion, this wouldn't help as the leading bit is 0).
& operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
the upper 32-bits of its value.

Fix it by changing `VM_MERGEABLE` constant to unsigned long, using the
BIT() macro.

Note: other VM_* flags are not affected: This only happens to the
VM_MERGEABLE flag, as the other VM_* flags are all constants of type int
and after ~ operation, they end up with leading 1 and are thus converted
to unsigned long with leading 1s.

Note 2:
After commit 31defc3b01 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
no longer a kernel BUG, but a WARNING at the same place:

[   45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067

but the root-cause (flag-drop) remains the same.

[akpm@linux-foundation.org: rust bindgen wasn't able to handle BIT(), from Miguel]
  Link: https://lore.kernel.org/oe-kbuild-all/202510030449.VfSaAjvd-lkp@intel.com/
Link: https://lkml.kernel.org/r/20251001090353.57523-2-acsjakub@amazon.de
Fixes: 7677f7fd8b ("userfaultfd: add minor fault registration mode")
Signed-off-by: Jakub Acs <acsjakub@amazon.de>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: SeongJae Park <sj@kernel.org>
Tested-by: Alice Ryhl <aliceryhl@google.com>
Tested-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Xu Xin <xu.xin16@zte.com.cn>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-07 14:01:12 -07:00
Shakeel Butt
fcc0669c5a memcg: skip cgroup_file_notify if spinning is not allowed
Generally memcg charging is allowed from all the contexts including NMI
where even spinning on spinlock can cause locking issues.  However one
call chain was missed during the addition of memcg charging from any
context support.  That is try_charge_memcg() -> memcg_memory_event() ->
cgroup_file_notify().

The possible function call tree under cgroup_file_notify() can acquire
many different spin locks in spinning mode.  Some of them are
cgroup_file_kn_lock, kernfs_notify_lock, pool_workqeue's lock.  So, let's
just skip cgroup_file_notify() from memcg charging if the context does not
allow spinning.

Alternative approach was also explored where instead of skipping
cgroup_file_notify(), we defer the memcg event processing to irq_work [1].
However it adds complexity and it was decided to keep things simple until
we need more memcg events with !allow_spinning requirement.

Link: https://lore.kernel.org/all/5qi2llyzf7gklncflo6gxoozljbm4h3tpnuv4u4ej4ztysvi6f@x44v7nz2wdzd/ [1]
Link: https://lkml.kernel.org/r/20250922220203.261714-1-shakeel.butt@linux.dev
Fixes: 3ac4638a73 ("memcg: make memcg_rstat_updated nmi safe")
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Closes: https://lore.kernel.org/all/20250905061919.439648-1-yepeilin@google.com/
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peilin Ye <yepeilin@google.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-07 14:01:11 -07:00
Mike Rapoport (Microsoft)
a667300bd5 kho: add support for preserving vmalloc allocations
A vmalloc allocation is preserved using binary structure similar to global
KHO memory tracker.  It's a linked list of pages where each page is an
array of physical address of pages in vmalloc area.

kho_preserve_vmalloc() hands out the physical address of the head page to
the caller.  This address is used as the argument to kho_vmalloc_restore()
to restore the mapping in the vmalloc address space and populate it with
the preserved pages.

[pasha.tatashin@soleen.com: free chunks using free_page() not kfree()]
  Link: https://lkml.kernel.org/r/mafs0a52idbeg.fsf@kernel.org
[akpm@linux-foundation.org: coding-style cleanups]
Link: https://lkml.kernel.org/r/20250921054458.4043761-4-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Changyuan Lyu <changyuanl@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-07 13:48:55 -07:00
Mike Rapoport (Microsoft)
8375b76517 kho: replace kho_preserve_phys() with kho_preserve_pages()
to make it clear that KHO operates on pages rather than on a random
physical address.

The kho_preserve_pages() will be also used in upcoming support for vmalloc
preservation.

Link: https://lkml.kernel.org/r/20250921054458.4043761-3-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Changyuan Lyu <changyuanl@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-07 13:48:55 -07:00
Linus Torvalds
abdf766d14 Merge tag 'pm-6.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
 "These are cpufreq fixes and cleanups on top of the material merged
  previously, a power management core code fix and updates of the
  runtime PM framework including unit tests, documentation updates and
  introduction of auto-cleanup macros for runtime PM "resume and get"
  and "get without resuming" operations.

  Specifics:

   - Make cpufreq drivers setting the default CPU transition latency to
     CPUFREQ_ETERNAL specify a proper default transition latency value
     instead which addresses a regression introduced during the 6.6
     cycle that broke CPUFREQ_ETERNAL handling (Rafael Wysocki)

   - Make the cpufreq CPPC driver use a proper transition delay value
     when CPUFREQ_ETERNAL is returned by cppc_get_transition_latency()
     to indicate an error condition (Rafael Wysocki)

   - Make cppc_get_transition_latency() return a negative error code to
     indicate error conditions instead of using CPUFREQ_ETERNAL for this
     purpose and drop CPUFREQ_ETERNAL that has no other users (Rafael
     Wysocki, Gopi Krishna Menon)

   - Fix device leak in the mediatek cpufreq driver (Johan Hovold)

   - Set target frequency on all CPUs sharing a policy during frequency
     updates in the tegra186 cpufreq driver and make it initialize all
     cores to max frequencies (Aaron Kling)

   - Rust cpufreq helper cleanup (Thorsten Blum)

   - Make pm_runtime_put*() family of functions return 1 when the given
     device is already suspended which is consistent with the
     documentation (Brian Norris)

   - Add basic kunit tests for runtime PM API contracts and update
     return values in kerneldoc comments for the runtime PM API (Brian
     Norris, Dan Carpenter)

   - Add auto-cleanup macros for runtime PM "resume and get" and "get
     without resume" operations, use one of them in the PCI core and
     drop the existing "free" macro introduced for similar purpose, but
     somewhat cumbersome to use (Rafael Wysocki)

   - Make the core power management code avoid waiting on device links
     marked as SYNC_STATE_ONLY which is consistent with the handling of
     those device links elsewhere (Pin-yen Lin)"

* tag 'pm-6.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  docs/zh_CN: Fix malformed table
  docs/zh_TW: Fix malformed table
  PM: runtime: Fix error checking for kunit_device_register()
  PM: runtime: Introduce one more usage counter guard
  cpufreq: Drop unused symbol CPUFREQ_ETERNAL
  ACPI: CPPC: Do not use CPUFREQ_ETERNAL as an error value
  cpufreq: CPPC: Avoid using CPUFREQ_ETERNAL as transition delay
  cpufreq: Make drivers using CPUFREQ_ETERNAL specify transition latency
  PM: runtime: Drop DEFINE_FREE() for pm_runtime_put()
  PCI/sysfs: Use runtime PM guard macro for auto-cleanup
  PM: runtime: Add auto-cleanup macros for "resume and get" operations
  cpufreq: tegra186: Initialize all cores to max frequencies
  cpufreq: tegra186: Set target frequency for all cpus in policy
  rust: cpufreq: streamline find_supply_names
  cpufreq: mediatek: fix device leak on probe failure
  PM: sleep: Do not wait on SYNC_STATE_ONLY device links
  PM: runtime: Update kerneldoc return codes
  PM: runtime: Make put{,_sync}() return 1 when already suspended
  PM: runtime: Add basic kunit tests for API contracts
2025-10-07 09:39:51 -07:00
Linus Torvalds
522ba450b5 Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
 "There's a bunch of patches here across drivers/clk/ to migrate drivers
  to use struct clk_ops::determine_rate() instead of the round_rate()
  one so that we can remove the round_rate clk_op entirely. Brian has
  taken up that task which nobody else has wanted to do for close to a
  decade. Thanks Brian!

  This is all prerequisite work to get to the real task of improving the
  clk rate setting process. Once we have determine_rate() used
  everywhere, we'll be able to do things like chain the rate request
  structs in linked lists to order the rate setting operations or add
  more parameters without having to change every clk driver in
  existence. It's also nice to not have multiple ways to do something
  which just causes confusion for clk driver authors. Overall I'm glad
  this is getting done.

  Beyond this change we also have a tweak to the clk_lookup() function
  in the core framework to use hashing on the clk name instead of a clk
  tree walk with string comparisons. We _still_ rely on the clk name to
  be unique, because historically we've used globally unique strings to
  describe the clk tree topology. This tree walk becomes increasingly
  slow as more clks are added to the system. Searching from the roots
  for a duplicate is simple but pretty dumb and it wastes boot time so
  we're using a hash table as an improvement. Ideally we wouldn't rely
  on the strings to be unique at all, relegating them to simply debug
  information, but that is future work that will likely require some
  sort of Kconfig knob indicating strings aren't used for topology
  description.

  Outside of the core framework changes we have the usual new SoC
  support and fixes to clk drivers for things that were discovered once
  the clks were used by consumer drivers. Nothing in particular is
  jumping out at me in the "misc" pile, except maybe the Amlogic driver
  that has gone through a refactoring. That series got a fix from
  testing in -next though so it seems likely that things have been
  getting good test coverage for a couple weeks already"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (299 commits)
  clk: microchip: core: remove duplicate roclk_determine_rate()
  reset: aspeed: register AST2700 reset auxiliary bus device
  dt-bindings: clock: ast2700: modify soc0/1 clock define
  clk: tegra: do not overallocate memory for bpmp clocks
  clk: ep93xx: Use int type to store negative error codes
  clk: nxp: Fix pll0 rate check condition in LPC18xx CGU driver
  clk: loongson2: Add clock definitions for Loongson-2K0300 SoC
  clk: loongson2: Avoid hardcoding firmware name of the reference clock
  clk: loongson2: Allow zero divisors for dividers
  clk: loongson2: Support scale clocks with an alternative mode
  clk: loongson2: Allow specifying clock flags for gate clock
  dt-bindings: clock: loongson2: Add Loongson-2K0300 compatible
  clk: clocking-wizard: Fix output clock register offset for Versal platforms
  clk: xilinx: Optimize divisor search in clk_wzrd_get_divisors_ver()
  clk: mmp: pxa1908: Instantiate power driver through auxiliary bus
  clk: s2mps11: add support for S2MPG10 PMIC clock
  dt-bindings: clock: samsung,s2mps11: add s2mpg10
  dt-bindings: stm32: cosmetic fixes for STM32MP25 clock and reset bindings
  clk: stm32: introduce clocks for STM32MP21 platform
  dt-bindings: stm32: add STM32MP21 clocks and reset bindings
  ...
2025-10-07 09:28:37 -07:00
Linus Torvalds
2215336295 Merge tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:

 - Unify guest entry code for KVM and MSHV (Sean Christopherson)

 - Switch Hyper-V MSI domain to use msi_create_parent_irq_domain()
   (Nam Cao)

 - Add CONFIG_HYPERV_VMBUS and limit the semantics of CONFIG_HYPERV
   (Mukesh Rathor)

 - Add kexec/kdump support on Azure CVMs (Vitaly Kuznetsov)

 - Deprecate hyperv_fb in favor of Hyper-V DRM driver (Prasanna
   Kumar T S M)

 - Miscellaneous enhancements, fixes and cleanups (Abhishek Tiwari,
   Alok Tiwari, Nuno Das Neves, Wei Liu, Roman Kisel, Michael Kelley)

* tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  hyperv: Remove the spurious null directive line
  MAINTAINERS: Mark hyperv_fb driver Obsolete
  fbdev/hyperv_fb: deprecate this in favor of Hyper-V DRM driver
  Drivers: hv: Make CONFIG_HYPERV bool
  Drivers: hv: Add CONFIG_HYPERV_VMBUS option
  Drivers: hv: vmbus: Fix typos in vmbus_drv.c
  Drivers: hv: vmbus: Fix sysfs output format for ring buffer index
  Drivers: hv: vmbus: Clean up sscanf format specifier in target_cpu_store()
  x86/hyperv: Switch to msi_create_parent_irq_domain()
  mshv: Use common "entry virt" APIs to do work in root before running guest
  entry: Rename "kvm" entry code assets to "virt" to genericize APIs
  entry/kvm: KVM: Move KVM details related to signal/-EINTR into KVM proper
  mshv: Handle NEED_RESCHED_LAZY before transferring to guest
  x86/hyperv: Add kexec/kdump support on Azure CVMs
  Drivers: hv: Simplify data structures for VMBus channel close message
  Drivers: hv: util: Cosmetic changes for hv_utils_transport.c
  mshv: Add support for a new parent partition configuration
  clocksource: hyper-v: Skip unnecessary checks for the root partition
  hyperv: Add missing field to hv_output_map_device_interrupt
2025-10-07 08:40:15 -07:00
Christoph Hellwig
506aa235f6 block: move bio_iov_iter_get_bdev_pages to block/fops.c
Keep bio_iov_iter_get_bdev_pages local with the callers, as blindly
looking at the bdev logical block size is often not the best idea
unless on a block device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-10-07 08:05:44 -06:00
Christoph Hellwig
82dd5d763c block: rename bio_iov_iter_get_pages_aligned to bio_iov_iter_get_pages
Now that the bio_iov_iter_get_pages is free again, use it instead of
the more complicated now.  Also drop the unused export.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-10-07 08:05:44 -06:00
Christoph Hellwig
1ed06c8350 block: remove bio_iov_iter_get_pages
Switch the only caller to bio_iov_iter_get_pages, and explain why it does
not have any alignment requirements.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-10-07 08:05:44 -06:00
Rafael J. Wysocki
53d4d315d4 Merge branch 'pm-cpufreq'
Merge cpufreq fixes and cleanups, mostly on top of those fixes, for
6.18-rc1:

 - Make cpufreq drivers setting the default CPU transition latency to
   CPUFREQ_ETERNAL specify a proper default transition latency value
   instead which addresses a regression introduced during the 6.6 cycle
   that broke CPUFREQ_ETERNAL handling (Rafael Wysocki)

 - Make the cpufreq CPPC driver use a proper transition delay value
   when CPUFREQ_ETERNAL is returned by cppc_get_transition_latency() to
   indicate an error condition (Rafael Wysocki)

 - Make cppc_get_transition_latency() return a negative error code to
   indicate error conditions instead of using CPUFREQ_ETERNAL for this
   purpose and drop CPUFREQ_ETERNAL that has no other users (Rafael
   Wysocki, Gopi Krishna Menon)

 - Fix device leak in the mediatek cpufreq driver (Johan Hovold)

 - Set target frequency on all CPUs sharing a policy during frequency
   updates in the tegra186 cpufreq driver and make it initialize all
   cores to max frequencies (Aaron Kling)

 - Rust cpufreq helper cleanup (Thorsten Blum)

* pm-cpufreq:
  docs/zh_CN: Fix malformed table
  docs/zh_TW: Fix malformed table
  cpufreq: Drop unused symbol CPUFREQ_ETERNAL
  ACPI: CPPC: Do not use CPUFREQ_ETERNAL as an error value
  cpufreq: CPPC: Avoid using CPUFREQ_ETERNAL as transition delay
  cpufreq: Make drivers using CPUFREQ_ETERNAL specify transition latency
  cpufreq: tegra186: Initialize all cores to max frequencies
  cpufreq: tegra186: Set target frequency for all cpus in policy
  rust: cpufreq: streamline find_supply_names
  cpufreq: mediatek: fix device leak on probe failure
2025-10-07 12:31:46 +02:00
Rafael J. Wysocki
05f084d24e Merge branches 'pm-core' and 'pm-runtime'
Merge runtime PM framework updates and a core power management code fix
for 6.18-rc1:

 - Make pm_runtime_put*() family of functions return 1 when the
   given device is already suspended which is consistent with the
   documentation (Brian Norris)

 - Add basic kunit tests for runtime PM API contracts and update return
   values in kerneldoc coments for the runtime PM API (Brian Norris,
   Dan Carpenter)

 - Add auto-cleanup macros for runtime PM "resume and get" and "get
   without resume" operations, use one of them in the PCI core and
   drop the existing "free" macro introduced for similar purpose, but
   somewhat cumbersome to use (Rafael Wysocki)

 - Make the core power management code avoid waiting on device links
   marked as SYNC_STATE_ONLY which is consistent with the handling of
   those device links elsewhere (Pin-yen Lin)

* pm-core:
  PM: sleep: Do not wait on SYNC_STATE_ONLY device links

* pm-runtime:
  PM: runtime: Fix error checking for kunit_device_register()
  PM: runtime: Introduce one more usage counter guard
  PM: runtime: Drop DEFINE_FREE() for pm_runtime_put()
  PCI/sysfs: Use runtime PM guard macro for auto-cleanup
  PM: runtime: Add auto-cleanup macros for "resume and get" operations
  PM: runtime: Update kerneldoc return codes
  PM: runtime: Make put{,_sync}() return 1 when already suspended
  PM: runtime: Add basic kunit tests for API contracts
2025-10-07 12:20:36 +02:00
Linus Torvalds
81538c8e42 Merge tag 'nfsd-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
 "Mike Snitzer has prototyped a mechanism for disabling I/O caching in
  NFSD. This is introduced in v6.18 as an experimental feature. This
  enables scaling NFSD in /both/ directions:

   - NFS service can be supported on systems with small memory
     footprints, such as low-cost cloud instances

   - Large NFS workloads will be less likely to force the eviction of
     server-local activity, helping it avoid thrashing

  Jeff Layton contributed a number of fixes to the new attribute
  delegation implementation (based on a pending Internet RFC) that we
  hope will make attribute delegation reliable enough to enable by
  default, as it is on the Linux NFS client.

  The remaining patches in this pull request are clean-ups and minor
  optimizations. Many thanks to the contributors, reviewers, testers,
  and bug reporters who participated during the v6.18 NFSD development
  cycle"

* tag 'nfsd-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (42 commits)
  nfsd: discard nfserr_dropit
  SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it
  NFSD: Add io_cache_{read,write} controls to debugfs
  NFSD: Do the grace period check in ->proc_layoutget
  nfsd: delete unnecessary NULL check in __fh_verify()
  NFSD: Allow layoutcommit during grace period
  NFSD: Disallow layoutget during grace period
  sunrpc: fix "occurence"->"occurrence"
  nfsd: Don't force CRYPTO_LIB_SHA256 to be built-in
  nfsd: nfserr_jukebox in nlm_fopen should lead to a retry
  NFSD: Reduce DRC bucket size
  NFSD: Delay adding new entries to LRU
  SUNRPC: Move the svc_rpcb_cleanup() call sites
  NFS: Remove rpcbind cleanup for NFSv4.0 callback
  nfsd: unregister with rpcbind when deleting a transport
  NFSD: Drop redundant conversion to bool
  sunrpc: eliminate return pointer in svc_tcp_sendmsg()
  sunrpc: fix pr_notice in svc_tcp_sendto() to show correct length
  nfsd: decouple the xprtsec policy check from check_nfsd_access()
  NFSD: Fix destination buffer size in nfsd4_ssc_setup_dul()
  ...
2025-10-06 13:22:21 -07:00
Linus Torvalds
256e341706 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull x86 kvm updates from Paolo Bonzini:
 "Generic:

   - Rework almost all of KVM's exports to expose symbols only to KVM's
     x86 vendor modules (kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko

  x86:

   - Rework almost all of KVM x86's exports to expose symbols only to
     KVM's vendor modules, i.e. to kvm-{amd,intel}.ko

   - Add support for virtualizing Control-flow Enforcement Technology
     (CET) on Intel (Shadow Stacks and Indirect Branch Tracking) and AMD
     (Shadow Stacks).

     It is worth noting that while SHSTK and IBT can be enabled
     separately in CPUID, it is not really possible to virtualize them
     separately. Therefore, Intel processors will really allow both
     SHSTK and IBT under the hood if either is made visible in the
     guest's CPUID. The alternative would be to intercept
     XSAVES/XRSTORS, which is not feasible for performance reasons

   - Fix a variety of fuzzing WARNs all caused by checking L1 intercepts
     when completing userspace I/O. KVM has already committed to
     allowing L2 to to perform I/O at that point

   - Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the
     MSR is supposed to exist for v2 PMUs

   - Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs

   - Add support for the immediate forms of RDMSR and WRMSRNS, sans full
     emulator support (KVM should never need to emulate the MSRs outside
     of forced emulation and other contrived testing scenarios)

   - Clean up the MSR APIs in preparation for CET and FRED
     virtualization, as well as mediated vPMU support

   - Clean up a pile of PMU code in anticipation of adding support for
     mediated vPMUs

   - Reject in-kernel IOAPIC/PIT for TDX VMs, as KVM can't obtain EOI
     vmexits needed to faithfully emulate an I/O APIC for such guests

   - Many cleanups and minor fixes

   - Recover possible NX huge pages within the TDP MMU under read lock
     to reduce guest jitter when restoring NX huge pages

   - Return -EAGAIN during prefault if userspace concurrently
     deletes/moves the relevant memslot, to fix an issue where
     prefaulting could deadlock with the memslot update

  x86 (AMD):

   - Enable AVIC by default for Zen4+ if x2AVIC (and other prereqs) is
     supported

   - Require a minimum GHCB version of 2 when starting SEV-SNP guests
     via KVM_SEV_INIT2 so that invalid GHCB versions result in immediate
     errors instead of latent guest failures

   - Add support for SEV-SNP's CipherText Hiding, an opt-in feature that
     prevents unauthorized CPU accesses from reading the ciphertext of
     SNP guest private memory, e.g. to attempt an offline attack. This
     feature splits the shared SEV-ES/SEV-SNP ASID space into separate
     ranges for SEV-ES and SEV-SNP guests, therefore a new module
     parameter is needed to control the number of ASIDs that can be used
     for VMs with CipherText Hiding vs. how many can be used to run
     SEV-ES guests

   - Add support for Secure TSC for SEV-SNP guests, which prevents the
     untrusted host from tampering with the guest's TSC frequency, while
     still allowing the the VMM to configure the guest's TSC frequency
     prior to launch

   - Validate the XCR0 provided by the guest (via the GHCB) to avoid
     bugs resulting from bogus XCR0 values

   - Save an SEV guest's policy if and only if LAUNCH_START fully
     succeeds to avoid leaving behind stale state (thankfully not
     consumed in KVM)

   - Explicitly reject non-positive effective lengths during SNP's
     LAUNCH_UPDATE instead of subtly relying on guest_memfd to deal with
     them

   - Reload the pre-VMRUN TSC_AUX on #VMEXIT for SEV-ES guests, not the
     host's desired TSC_AUX, to fix a bug where KVM was keeping a
     different vCPU's TSC_AUX in the host MSR until return to userspace

  KVM (Intel):

   - Preparation for FRED support

   - Don't retry in TDX's anti-zero-step mitigation if the target
     memslot is invalid, i.e. is being deleted or moved, to fix a
     deadlock scenario similar to the aforementioned prefaulting case

   - Misc bugfixes and minor cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits)
  KVM: x86: Export KVM-internal symbols for sub-modules only
  KVM: x86: Drop pointless exports of kvm_arch_xxx() hooks
  KVM: x86: Move kvm_intr_is_single_vcpu() to lapic.c
  KVM: Export KVM-internal symbols for sub-modules only
  KVM: s390/vfio-ap: Use kvm_is_gpa_in_memslot() instead of open coded equivalent
  KVM: VMX: Make CR4.CET a guest owned bit
  KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported
  KVM: selftests: Add coverage for KVM-defined registers in MSRs test
  KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test
  KVM: selftests: Extend MSRs test to validate vCPUs without supported features
  KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test
  KVM: selftests: Add an MSR test to exercise guest/host and read/write
  KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors
  KVM: x86: Define Control Protection Exception (#CP) vector
  KVM: x86: Add human friendly formatting for #XM, and #VE
  KVM: SVM: Enable shadow stack virtualization for SVM
  KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
  KVM: SVM: Pass through shadow stack MSRs as appropriate
  KVM: SVM: Update dump_vmcb with shadow stack save area additions
  KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02
  ...
2025-10-06 12:37:34 -07:00
Toke Høiland-Jørgensen
95920c2ed0 page_pool: Fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches
Helge reported that the introduction of PP_MAGIC_MASK let to crashes on
boot on his 32-bit parisc machine. The cause of this is the mask is set
too wide, so the page_pool_page_is_pp() incurs false positives which
crashes the machine.

Just disabling the check in page_pool_is_pp() will lead to the page_pool
code itself malfunctioning; so instead of doing this, this patch changes
the define for PP_DMA_INDEX_BITS to avoid mistaking arbitrary kernel
pointers for page_pool-tagged pages.

The fix relies on the kernel pointers that alias with the pp_magic field
always being above PAGE_OFFSET. With this assumption, we can use the
lowest bit of the value of PAGE_OFFSET as the upper bound of the
PP_DMA_INDEX_MASK, which should avoid the false positives.

Because we cannot rely on PAGE_OFFSET always being a compile-time
constant, nor on it always being >0, we fall back to disabling the
dma_index storage when there are not enough bits available. This leaves
us in the situation we were in before the patch in the Fixes tag, but
only on a subset of architecture configurations. This seems to be the
best we can do until the transition to page types in complete for
page_pool pages.

v2:
- Make sure there's at least 8 bits available and that the PAGE_OFFSET
  bit calculation doesn't wrap

Link: https://lore.kernel.org/all/aMNJMFa5fDalFmtn@p100/
Fixes: ee62ce7a1d ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool")
Cc: stable@vger.kernel.org # 6.15+
Tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Tested-by: Helge Deller <deller@gmx.de>
Link: https://patch.msgid.link/20250930114331.675412-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-06 12:14:04 -07:00
Stephen Boyd
112104e2b7 Merge branch 'clk-determine-rate' into clk-next
* clk-determine-rate: (120 commits)
  clk: microchip: core: remove duplicate roclk_determine_rate()
  clk: nxp: Fix pll0 rate check condition in LPC18xx CGU driver
  clk: scmi: migrate round_rate() to determine_rate()
  clk: ti: fapll: convert from round_rate() to determine_rate()
  clk: ti: dra7-atl: convert from round_rate() to determine_rate()
  clk: ti: divider: convert from round_rate() to determine_rate()
  clk: ti: composite: convert from round_rate() to determine_rate()
  clk: ti: dpll: convert from round_rate() to determine_rate()
  clk: ti: dpll: change error return from ~0 to -EINVAL
  clk: ti: dpll: remove round_rate() in favor of determine_rate()
  clk: tegra: tegra210-emc: convert from round_rate() to determine_rate()
  clk: tegra: super: convert from round_rate() to determine_rate()
  clk: tegra: pll: convert from round_rate() to determine_rate()
  clk: tegra: periph: divider: convert from round_rate() to determine_rate()
  clk: tegra: divider: convert from round_rate() to determine_rate()
  clk: tegra: audio-sync: convert from round_rate() to determine_rate()
  clk: fixed-factor: drop round_rate() clk ops
  clk: divider: remove round_rate() in favor of determine_rate()
  clk: visconti: pll: convert from round_rate() to determine_rate()
  clk: versatile: vexpress-osc: convert from round_rate() to determine_rate()
  ...
2025-10-06 13:02:50 -05:00
Linus Torvalds
2f2c725493 Merge tag 'pci-v6.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Add PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() macros that
     take config space accessor functions.

     Implement pci_find_capability(), pci_find_ext_capability(), and
     dwc, dwc endpoint, and cadence capability search interfaces with
     them (Hans Zhang)

   - Leave parent unit address 0 in 'interrupt-map' so that when we
     build devicetree nodes to describe PCI functions that contain
     multiple peripherals, we can build this property even when
     interrupt controllers lack 'reg' properties (Lorenzo Pieralisi)

   - Add a Xeon 6 quirk to disable Extended Tags and limit Max Read
     Request Size to 128B to avoid a performance issue (Ilpo Järvinen)

   - Add sysfs 'serial_number' file to expose the Device Serial Number
     (Matthew Wood)

   - Fix pci_acpi_preserve_config() memory leak (Nirmoy Das)

  Resource management:

   - Align m68k pcibios_enable_device() with other arches (Ilpo
     Järvinen)

   - Remove sparc pcibios_enable_device() implementations that don't do
     anything beyond what pci_enable_resources() does (Ilpo Järvinen)

   - Remove mips pcibios_enable_resources() and use
     pci_enable_resources() instead (Ilpo Järvinen)

   - Clean up bridge window sizing and assignment (Ilpo Järvinen),
     including:

       - Leave non-claimed bridge windows disabled

       - Enable bridges even if a window wasn't assigned because not all
         windows are required by downstream devices

       - Preserve bridge window type when releasing the resource, since
         the type is needed for reassignment

       - Consolidate selection of bridge windows into two new
         interfaces, pbus_select_window() and
         pbus_select_window_for_type(), so this is done consistently

       - Compute bridge window start and end earlier to avoid logging
         stale information

  MSI:

   - Add quirk to disable MSI on RDC PCI to PCIe bridges (Marcos Del Sol
     Vives)

  Error handling:

   - Align AER with EEH by allowing drivers to request a Bus Reset on
     Non-Fatal Errors (in addition to the reset on Fatal Errors that we
     already do) (Lukas Wunner)

   - If error recovery fails, emit FAILED_RECOVERY uevents for the
     devices, not for the bridge leading to them.

     This makes them correspond to BEGIN_RECOVERY uevents (Lukas Wunner)

   - Align AER with EEH by calling err_handler.error_detected()
     callbacks to notify drivers if error recovery fails (Lukas Wunner)

   - Align AER with EEH by restoring device error_state to
     pci_channel_io_normal before the err_handler.slot_reset() callback.

     This is earlier than before the err_handler.resume() callback
     (Lukas Wunner)

   - Emit a BEGIN_RECOVERY uevent when driver's
     err_handler.error_detected() requests a reset, as well as when it
     says recovery is complete or can be done without a reset (Niklas
     Schnelle)

   - Align s390 with AER and EEH by emitting uevents during error
     recovery (Niklas Schnelle)

   - Align EEH with AER and s390 by emitting BEGIN_RECOVERY,
     SUCCESSFUL_RECOVERY, or FAILED_RECOVERY uevents depending on the
     result of err_handler.error_detected() (Niklas Schnelle)

   - Fix a NULL pointer dereference in aer_ratelimit() when ACPI GHES
     error information identifies a device without an AER Capability
     (Breno Leitao)

   - Update error decoding and TLP Log printing for new errors in
     current PCIe base spec (Lukas Wunner)

   - Update error recovery documentation to match the current code
     and use consistent nomenclature (Lukas Wunner)

  ASPM:

   - Enable all ClockPM and ASPM states for devicetree platforms, since
     there's typically no firmware that enables ASPM

     This is a risky change that may uncover hardware or configuration
     defects at boot-time rather than when users enable ASPM via sysfs
     later. Booting with "pcie_aspm=off" prevents this enabling
     (Manivannan Sadhasivam)

   - Remove the qcom code that enabled ASPM (Manivannan Sadhasivam)

  Power management:

   - If a device has already been disconnected, e.g., by a hotplug
     removal, don't bother trying to resume it to D0 when detaching the
     driver.

     This avoids annoying "Unable to change power state from D3cold to
     D0" messages (Mario Limonciello)

   - Ensure devices are powered up before config reads for
     'max_link_width', 'current_link_speed', 'current_link_width',
     'secondary_bus_number', and 'subordinate_bus_number' sysfs files.

     This prevents using invalid data (~0) in drivers or lspci and,
     depending on how the PCIe controller reports errors, may avoid
     error interrupts or crashes (Brian Norris)

  Virtualization:

   - Add rescan/remove locking when enabling/disabling SR-IOV, which
     avoids list corruption on s390, where disabling SR-IOV also
     generates hotplug events (Niklas Schnelle)

  Peer-to-peer DMA:

   - Free struct p2p_pgmap, not a member within it, in the
     pci_p2pdma_add_resource() error path (Sungho Kim)

  Endpoint framework:

   - Document sysfs interface for BAR assignment of vNTB endpoint
     functions (Jerome Brunet)

   - Fix array underflow in endpoint BAR test case (Dan Carpenter)

   - Skip endpoint IRQ test if the IRQ is out of range to avoid false
     errors (Christian Bruel)

   - Fix endpoint test case for controllers with fixed-size BARs smaller
     than requested by the test (Marek Vasut)

   - Restore inbound translation when disabling doorbell so the endpoint
     doorbell test case can be run more than once (Niklas Cassel)

   - Avoid a NULL pointer dereference when releasing DMA channels in
     endpoint DMA test case (Shin'ichiro Kawasaki)

   - Convert tegra194 interrupt number to MSI vector to fix endpoint
     Kselftest MSI_TEST test case (Niklas Cassel)

   - Reset tegra194 BARs when running in endpoint mode so the BAR tests
     don't overwrite the ATU settings in BAR4 (Niklas Cassel)

   - Handle errors in tegra194 BPMP transactions so we don't mistakenly
     skip future PERST# assertion (Vidya Sagar)

  AMD MDB PCIe controller driver:

   - Update DT binding example to separate PERST# to a Root Port stanza
     to make multiple Root Ports possible in the future (Sai Krishna
     Musham)

   - Add driver support for PERST# being described in a Root Port
     stanza, falling back to the host bridge if not found there (Sai
     Krishna Musham)

  Freescale i.MX6 PCIe controller driver:

   - Enable the 3.3V Vaux supply if available so devices can request
     wakeup with either Beacon or WAKE# (Richard Zhu)

  MediaTek PCIe Gen3 controller driver:

   - Add optional sys clock ready time setting to avoid sys_clk_rdy
     signal glitching in MT6991 and MT8196 (AngeloGioacchino Del Regno)

   - Add DT binding and driver support for MT6991 and MT8196
     (AngeloGioacchino Del Regno)

  NVIDIA Tegra PCIe controller driver:

   - When asserting PERST#, disable the controller instead of mistakenly
     disabling the PLL twice (Nagarjuna Kristam)

   - Convert struct tegra_msi mask_lock to raw spinlock to avoid a lock
     nesting error (Marek Vasut)

  Qualcomm PCIe controller driver:

   - Select PCI Power Control Slot driver so slot voltage rails can be
     turned on/off if described in Root Port devicetree node (Qiang Yu)

   - Parse only PCI bridge child nodes in devicetree, skipping unrelated
     nodes such as OPP (Operating Performance Points), which caused
     probe failures (Krishna Chaitanya Chundru)

   - Add 8.0 GT/s and 32.0 GT/s equalization settings (Ziyue Zhang)

   - Consolidate Root Port 'phy' and 'reset' properties in struct
     qcom_pcie_port, regardless of whether we got them from the Root
     Port node or the host bridge node (Manivannan Sadhasivam)

   - Fetch and map the ELBI register space in the DWC core rather than
     in each driver individually (Krishna Chaitanya Chundru)

   - Enable ECAM mechanism in DWC core by setting up iATU with 'CFG
     Shift Feature' and use this in the qcom driver (Krishna Chaitanya
     Chundru)

   - Add SM8750 compatible to qcom,pcie-sm8550.yaml (Krishna Chaitanya
     Chundru)

   - Update qcom,pcie-x1e80100.yaml to allow fifth PCIe host on Qualcomm
     Glymur, which is compatible with X1E80100 but doesn't have the
     cnoc_sf_axi clock (Qiang Yu)

  Renesas R-Car PCIe controller driver:

   - Fix a typo that prevented correct PHY initialization (Marek Vasut)

   - Add a missing 1ms delay after PWR reset assertion as required by
     the V4H manual (Marek Vasut)

   - Assure reset has completed before DBI access to avoid SError (Marek
     Vasut)

   - Fix inverted PHY initialization check, which sometimes led to
     timeouts and failure to start the controller (Marek Vasut)

   - Pass the correct IRQ domain to generic_handle_domain_irq() to fix a
     regression when converting to msi_create_parent_irq_domain()
     (Claudiu Beznea)

   - Drop the spinlock protecting the PMSR register - it's no longer
     required since pci_lock already serializes accesses (Marek Vasut)

   - Convert struct rcar_msi mask_lock to raw spinlock to avoid a lock
     nesting error (Marek Vasut)

  SOPHGO PCIe controller driver:

   - Check for existence of struct cdns_pcie.ops before using it to
     allow Cadence drivers that don't need to supply ops (Chen Wang)

   - Add DT binding and driver for the SOPHGO SG2042 PCIe controller
     (Chen Wang)

  STMicroelectronics STM32MP25 PCIe controller driver:

   - Update pinctrl documentation of initial states and use in runtime
     suspend/resume (Christian Bruel)

   - Add pinctrl_pm_select_init_state() for use by stm32 driver, which
     needs it during resume (Christian Bruel)

   - Add devicetree bindings and drivers for the STMicroelectronics
     STM32MP25 in host and endpoint modes (Christian Bruel)

  Synopsys DesignWare PCIe controller driver:

   - Add support for x16 in devicetree 'num-lanes' property (Konrad
     Dybcio)

   - Verify that if DT specifies a single IRQ for all eDMA channels, it
     is named 'dma' (Niklas Cassel)

  TI J721E PCIe driver:

   - Add MODULE_DEVICE_TABLE() so driver can be autoloaded (Siddharth
     Vadapalli)

   - Power controller off before configuring the glue layer so the
     controller latches the correct values on power-on (Siddharth
     Vadapalli)

  TI Keystone PCIe controller driver:

   - Use devm_request_irq() so 'ks-pcie-error-irq' is freed when driver
     exits with error (Siddharth Vadapalli)

   - Add Peripheral Virtualization Unit (PVU), which restricts DMA from
     PCIe devices to specific regions of host memory, to the ti,am65
     binding (Jan Kiszka)

  Xilinx NWL PCIe controller driver:

   - Clear bootloader E_ECAM_CONTROL before merging in the new driver
     value to avoid writing invalid values (Jani Nurminen)"

* tag 'pci-v6.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (141 commits)
  PCI/AER: Avoid NULL pointer dereference in aer_ratelimit()
  MAINTAINERS: Add entry for ST STM32MP25 PCIe drivers
  PCI: stm32-ep: Add PCIe Endpoint support for STM32MP25
  dt-bindings: PCI: Add STM32MP25 PCIe Endpoint bindings
  PCI: stm32: Add PCIe host support for STM32MP25
  PCI: xilinx-nwl: Fix ECAM programming
  PCI: j721e: Fix incorrect error message in probe()
  PCI: keystone: Use devm_request_irq() to free "ks-pcie-error-irq" on exit
  dt-bindings: PCI: qcom,pcie-x1e80100: Set clocks minItems for the fifth Glymur PCIe Controller
  PCI: dwc: Support 16-lane operation
  PCI: Add lockdep assertion in pci_stop_and_remove_bus_device()
  PCI/IOV: Add PCI rescan-remove locking when enabling/disabling SR-IOV
  PCI: rcar-host: Convert struct rcar_msi mask_lock into raw spinlock
  PCI: tegra194: Rename 'root_bus' to 'root_port_bus' in tegra_pcie_downstream_dev_to_D0()
  PCI: tegra: Convert struct tegra_msi mask_lock into raw spinlock
  PCI: rcar-gen4: Fix inverted break condition in PHY initialization
  PCI: rcar-gen4: Assure reset occurs before DBI access
  PCI: rcar-gen4: Add missing 1ms delay after PWR reset assertion
  PCI: Set up bridge resources earlier
  PCI: rcar-host: Drop PMSR spinlock
  ...
2025-10-06 10:41:03 -07:00
Linus Torvalds
e4c0fdd5af Merge tag 'dmaengine-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine updates from Vinod Koul:
 "A couple of new device support and small driver updates for this
  round.

  New support:
   - Intel idxd Wildcat Lake family support
   - SpacemiT K1 PDMA controller support
   - Renesas RZ/G3E family support

  Updates:
   - Xilinx shutdown support and dma client properties update
   - Designware edma callback_result support"

* tag 'dmaengine-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
  dt-bindings: dma: rz-dmac: Document RZ/G3E family of SoCs
  dmaengine: dw-edma: Set status for callback_result
  dmaengine: mv_xor: match alloc_wc and free_wc
  dmaengine: mmp_pdma: Add SpacemiT K1 PDMA support with 64-bit addressing
  dmaengine: mmp_pdma: Add operations structure for controller abstraction
  dmaengine: mmp_pdma: Add reset controller support
  dmaengine: mmp_pdma: Add clock support
  dt-bindings: dma: Add SpacemiT K1 PDMA controller
  dt-bindings: dmaengine: xilinx_dma: Remove DMA client properties
  dmaengine: Fix dma_async_tx_descriptor->tx_submit documentation
  dmaengine: xilinx_dma: Support descriptor setup from dma_vecs
  dmaengine: sh: setup_xref error handling
  dmaengine: Replace zero-length array with flexible-array
  dmaengine: ppc4xx: Remove space before newline
  dmaengine: idxd: Add a new IAA device ID for Wildcat Lake family platforms
  dmaengine: idxd: Replace memset(0) + strscpy() with strscpy_pad()
  dt-bindings: dma: nvidia,tegra20-apbdma: Add undocumented compatibles and "clock-names"
  dmaengine: zynqmp_dma: Add shutdown operation support
2025-10-06 10:37:06 -07:00
Li Zhe
929bf010e0 mm: introduce num_pages_contiguous()
Let's add a simple helper for determining the number of contiguous pages
that represent contiguous PFNs.

In an ideal world, this helper would be simpler or not even required.
Unfortunately, on some configs we still have to maintain (SPARSEMEM
without VMEMMAP), the memmap is allocated per memory section, and we might
run into weird corner cases of false positives when blindly testing for
contiguous pages only.

One example of such false positives would be a memory section-sized hole
that does not have a memmap. The surrounding memory sections might get
"struct pages" that are contiguous, but the PFNs are actually not.

This helper will, for example, be useful for determining contiguous PFNs
in a GUP result, to batch further operations across returned "struct
page"s. VFIO will utilize this interface to accelerate the VFIO DMA map
process.

Implementation based on Linus' suggestions to avoid new usage of
nth_page() where avoidable.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20250814064714.56485-2-lizhe.67@bytedance.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-10-06 11:21:26 -06:00