Commit Graph

6656 Commits

Author SHA1 Message Date
Linus Torvalds
d348c22394 Merge tag 'pm-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
 "There are quite a few interesting things here, including new hardware
  support, new features, some bug fixes and documentation updates. In
  addition, there are a usual bunch of minor fixes and cleanups all
  over.

  In the new hardware support category, there are intel_pstate and
  intel_rapl driver updates to support new processors, Panther Lake,
  Wildcat Lake, Noval Lake, and Diamond Rapids in the OOB mode, OPP and
  bandwidth allocation support in the tegra186 cpufreq driver, and
  JH7110S SOC support in dt-platdev cpufreq.

  The new features are the PM QoS CPU latency limit for suspend-to-idle,
  the netlink support for the energy model management, support for
  terminating system suspend via a wakeup event during the sync of file
  systems, configurable number of hibernation compression threads, the
  runtime PM auto-cleanup macros, and the "poweroff" PM event that is
  expected to be used during system shutdown.

  Bugs are mostly fixed in cpuidle governors, but there are also fixes
  elsewhere, like in the amd-pstate cpufreq driver.

  Documentation updates include, but are not limited to, a new doc on
  debugging shutdown hangs, cross-referencing fixes and cleanups in the
  intel_pstate documentation, and updates of comments in the core
  hibernation code.

  Specifics:

   - Introduce and document a QoS limit on CPU exit latency during
     wakeup from suspend-to-idle (Ulf Hansson)

   - Add support for building libcpupower statically (Zuo An)

   - Add support for sending netlink notifications to user space on
     energy model updates (Changwoo Mini, Peng Fan)

   - Minor improvements to the Rust OPP interface (Tamir Duberstein)

   - Fixes to scope-based pointers in the OPP library (Viresh Kumar)

   - Use residency threshold in polling state override decisions in the
     menu cpuidle governor (Aboorva Devarajan)

   - Add sanity check for exit latency and target residency in the
     cpufreq core (Rafael Wysocki)

   - Use this_cpu_ptr() where possible in the teo governor (Christian
     Loehle)

   - Rework the handling of tick wakeups in the teo cpuidle governor to
     increase the likelihood of stopping the scheduler tick in the cases
     when tick wakeups can be counted as non-timer ones (Rafael Wysocki)

   - Fix a reverse condition in the teo cpuidle governor and drop a
     misguided target residency check from it (Rafael Wysocki)

   - Clean up multiple minor defects in the teo cpuidle governor (Rafael
     Wysocki)

   - Update header inclusion to make it follow the Include What You Use
     principle (Andy Shevchenko)

   - Enable MSR-based RAPL PMU support in the intel_rapl power capping
     driver and arrange for using it on the Panther Lake and Wildcat
     Lake processors (Kuppuswamy Sathyanarayanan)

   - Add support for Nova Lake and Wildcat Lake processors to the
     intel_rapl power capping driver (Kaushlendra Kumar, Srinivas
     Pandruvada)

   - Add OPP and bandwidth support for Tegra186 (Aaron Kling)

   - Optimizations for parameter array handling in the amd-pstate
     cpufreq driver (Mario Limonciello)

   - Fix for mode changes with offline CPUs in the amd-pstate cpufreq
     driver (Gautham Shenoy)

   - Preserve freq_table_sorted across suspend/hibernate in the cpufreq
     core (Zihuan Zhang)

   - Adjust energy model rules for Intel hybrid platforms in the
     intel_pstate cpufreq driver and improve printing of debug messages
     in it (Rafael Wysocki)

   - Replace deprecated strcpy() in cpufreq_unregister_governor()
     (Thorsten Blum)

   - Fix duplicate hyperlink target errors in the intel_pstate cpufreq
     driver documentation and use :ref: directive for internal linking
     in it (Swaraj Gaikwad, Bagas Sanjaya)

   - Add Diamond Rapids OOB mode support to the intel_pstate cpufreq
     driver (Kuppuswamy Sathyanarayanan)

   - Use mutex guard for driver locking in the intel_pstate driver and
     eliminate some code duplication from it (Rafael Wysocki)

   - Replace udelay() with usleep_range() in ACPI cpufreq (Kaushlendra
     Kumar)

   - Minor improvements to various cpufreq drivers (Christian Marangi,
     Hal Feng, Jie Zhan, Marco Crivellari, Miaoqian Lin, and Shuhao Fu)

   - Replace snprintf() with scnprintf() in show_trace_dev_match()
     (Kaushlendra Kumar)

   - Fix memory allocation error handling in pm_vt_switch_required()
     (Malaya Kumar Rout)

   - Introduce CALL_PM_OP() macro and use it to simplify code in generic
     PM operations (Kaushlendra Kumar)

   - Add module param to backtrace all CPUs in the device power
     management watchdog (Sergey Senozhatsky)

   - Rework message printing in swsusp_save() (Rafael Wysocki)

   - Make it possible to change the number of hibernation compression
     threads (Xueqin Luo)

   - Clarify that only cgroup1 freezer uses PM freezer (Tejun Heo)

   - Add document on debugging shutdown hangs to PM documentation and
     correct a mistaken configuration option in it (Mario Limonciello)

   - Shut down wakeup source timer before removing the wakeup source
     from the list (Kaushlendra Kumar, Rafael Wysocki)

   - Introduce new PMSG_POWEROFF event for system shutdown handling with
     the help of PM device callbacks (Mario Limonciello)

   - Make pm_test delay interruptible by wakeup events (Riwen Lu)

   - Clean up kernel-doc comment style usage in the core hibernation
     code and remove unuseful comments from it (Sunday Adelodun, Rafael
     Wysocki)

   - Add support for handling wakeup events and aborting the suspend
     process while it is syncing file systems (Samuel Wu, Rafael
     Wysocki)

   - Add WQ_UNBOUND to pm_wq workqueue (Marco Crivellari)

   - Add runtime PM wrapper macros for ACQUIRE()/ACQUIRE_ERR() and use
     them in the PCI core and the ACPI TAD driver (Rafael Wysocki)

   - Improve runtime PM in the ACPI TAD driver (Rafael Wysocki)

   - Update pm_runtime_allow/forbid() documentation (Rafael Wysocki)

   - Fix typos in runtime.c comments (Malaya Kumar Rout)

   - Move governor.h from devfreq under include/linux/ and rename to
     devfreq-governor.h to allow devfreq governor definitions in out of
     drivers/devfreq/ (Dmitry Baryshkov)

   - Use min() to improve readability in tegra30-devfreq.c (Thorsten
     Blum)

   - Fix potential use-after-free issue of OPP handling in
     hisi_uncore_freq.c (Pengjie Zhang)

   - Fix typo in DFSO_DOWNDIFFERENTIAL macro name in
     governor_simpleondemand.c in devfreq (Riwen Lu)"

* tag 'pm-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (96 commits)
  PM / devfreq: Fix typo in DFSO_DOWNDIFFERENTIAL macro name
  cpuidle: Warn instead of bailing out if target residency check fails
  cpuidle: Update header inclusion
  Documentation: power/cpuidle: Document the CPU system wakeup latency QoS
  cpuidle: Respect the CPU system wakeup QoS limit for cpuidle
  sched: idle: Respect the CPU system wakeup QoS limit for s2idle
  pmdomain: Respect the CPU system wakeup QoS limit for cpuidle
  pmdomain: Respect the CPU system wakeup QoS limit for s2idle
  PM: QoS: Introduce a CPU system wakeup QoS limit
  cpuidle: governors: teo: Add missing space to the description
  PM: hibernate: Extra cleanup of comments in swap handling code
  PM / devfreq: tegra30: use min to simplify actmon_cpu_to_emc_rate
  PM / devfreq: hisi: Fix potential UAF in OPP handling
  PM / devfreq: Move governor.h to a public header location
  powercap: intel_rapl: Enable MSR-based RAPL PMU support
  powercap: intel_rapl: Prepare read_raw() interface for atomic-context callers
  cpufreq: qcom-nvmem: fix compilation warning for qcom_cpufreq_ipq806x_match_list
  PM: sleep: Call pm_sleep_fs_sync() instead of ksys_sync_helper()
  PM: sleep: Add support for wakeup during filesystem sync
  cpufreq: ACPI: Replace udelay() with usleep_range()
  ...
2025-12-02 17:31:22 -08:00
Linus Torvalds
2547f79b0b Merge tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:

 - Provide a new interface for dynamic configuration and deconfiguration
   of hotplug memory, allowing with and without memmap_on_memory
   support. This makes the way memory hotplug is handled on s390 much
   more similar to other architectures

 - Remove compat support. There shouldn't be any compat user space
   around anymore, therefore get rid of a lot of code which also doesn't
   need to be tested anymore

 - Add stackprotector support. GCC 16 will get new compiler options,
   which allow to generate code required for kernel stackprotector
   support

 - Merge pai_crypto and pai_ext PMU drivers into a new driver. This
   removes a lot of duplicated code. The new driver is also extendable
   and allows to support new PMUs

 - Add driver override support for AP queues

 - Rework and extend zcrypt and AP trace events to allow for tracing of
   crypto requests

 - Support block sizes larger than 65535 bytes for CCW tape devices

 - Since the rework of the virtual kernel address space the module area
   and the kernel image are within the same 4GB area. This eliminates
   the need of weak per cpu variables. Get rid of
   ARCH_MODULE_NEEDS_WEAK_PER_CPU

 - Various other small improvements and fixes

* tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (92 commits)
  watchdog: diag288_wdt: Remove KMSG_COMPONENT macro
  s390/entry: Use lay instead of aghik
  s390/vdso: Get rid of -m64 flag handling
  s390/vdso: Rename vdso64 to vdso
  s390: Rename head64.S to head.S
  s390/vdso: Use common STABS_DEBUG and DWARF_DEBUG macros
  s390: Add stackprotector support
  s390/modules: Simplify module_finalize() slightly
  s390: Remove KMSG_COMPONENT macro
  s390/percpu: Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU
  s390/ap: Restrict driver_override versus apmask and aqmask use
  s390/ap: Rename mutex ap_perms_mutex to ap_attr_mutex
  s390/ap: Support driver_override for AP queue devices
  s390/ap: Use all-bits-one apmask/aqmask for vfio in_use() checks
  s390/debug: Update description of resize operation
  s390/syscalls: Switch to generic system call table generation
  s390/syscalls: Remove system call table pointer from thread_struct
  s390/uapi: Remove 31 bit support from uapi header files
  s390: Remove compat support
  tools: Remove s390 compat support
  ...
2025-12-02 16:37:00 -08:00
Linus Torvalds
6863c8385c Merge tag 'irq-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq core updates from Thomas Gleixner:
 "Updates for the interrupt core and treewide cleanups:

   - Rework of the Per Processor Interrupt (PPI) management on ARM[64]

     PPI support was built under the assumption that the systems are
     homogenous so that the same CPU local device types are connected to
     them. That's unfortunately wishful thinking and created horrible
     workarounds.

     This rework provides affinity management for PPIs so that they can
     be individually configured in the firmware tables and mops up the
     related drivers all over the place.

   - Prevent CPUSET/isolation changes to arbitrarily affine interrupt
     threads to random CPUs, which ignores user or driver settings.

   - Plug a harmless race in the interrupt affinity proc interface,
     which allows to see a half updated mask

   - Adjust the priority of secondary interrupt threads on RT, so that
     the combination of primary and secondary thread emulates the
     hardware interrupt plus thread scenario. Having them at the same
     priority can cause starvation issues in some drivers"

* tag 'irq-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  genirq: Remove cpumask availability check on kthread affinity setting
  genirq: Fix interrupt threads affinity vs. cpuset isolated partitions
  genirq: Prevent early spurious wake-ups of interrupt threads
  genirq: Use raw_spinlock_irq() in irq_set_affinity_notifier()
  genirq/manage: Reduce priority of forced secondary interrupt handler
  genirq/proc: Fix race in show_irq_affinity()
  genirq: Fix percpu_devid irq affinity documentation
  perf: arm_pmu: Kill last use of per-CPU cpu_armpmu pointer
  irqdomain: Kill of_node_to_fwnode() helper
  genirq: Kill irq_{g,s}et_percpu_devid_partition()
  irqchip: Kill irq-partition-percpu
  irqchip/apple-aic: Drop support for custom PMU irq partitions
  irqchip/gic-v3: Drop support for custom PPI partitions
  coresight: trbe: Request specific affinities for per CPU interrupts
  perf: arm_spe_pmu: Request specific affinities for per CPU interrupts
  perf: arm_pmu: Request specific affinities for per CPU NMIs/interrupts
  genirq: Add request_percpu_irq_affinity() helper
  genirq: Allow per-cpu interrupt sharing for non-overlapping affinities
  genirq: Update request_percpu_nmi() to take an affinity
  genirq: Add affinity to percpu_devid interrupt requests
  ...
2025-12-02 09:14:26 -08:00
Linus Torvalds
db74a7d02a Merge tag 'vfs-6.19-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull directory delegations update from Christian Brauner:
 "This contains the work for recall-only directory delegations for
  knfsd.

  Add support for simple, recallable-only directory delegations. This
  was decided at the fall NFS Bakeathon where the NFS client and server
  maintainers discussed how to merge directory delegation support.

  The approach starts with recallable-only delegations for several reasons:

   1. RFC8881 has gaps that are being addressed in RFC8881bis. In
      particular, it requires directory position information for
      CB_NOTIFY callbacks, which is difficult to implement properly
      under Linux. The spec is being extended to allow that information
      to be omitted.

   2. Client-side support for CB_NOTIFY still lags. The client side
      involves heuristics about when to request a delegation.

   3. Early indication shows simple, recallable-only delegations can
      help performance. Anna Schumaker mentioned seeing a multi-minute
      speedup in xfstests runs with them enabled.

  With these changes, userspace can also request a read lease on a
  directory that will be recalled on conflicting accesses. This may be
  useful for applications like Samba. Users can disable leases
  altogether via the fs.leases-enable sysctl if needed.

  VFS changes:

   - Dedicated Type for Delegations

     Introduce struct delegated_inode to track inodes that may have
     delegations that need to be broken. This replaces the previous
     approach of passing raw inode pointers through the delegation
     breaking code paths, providing better type safety and clearer
     semantics for the delegation machinery.

   - Break parent directory delegations in open(..., O_CREAT) codepath

   - Allow mkdir to wait for delegation break on parent

   - Allow rmdir to wait for delegation break on parent

   - Add try_break_deleg calls for parents to vfs_link(), vfs_rename(),
     and vfs_unlink()

   - Make vfs_create(), vfs_mknod(), and vfs_symlink() break delegations
     on parent directory

   - Clean up argument list for vfs_create()

   - Expose delegation support to userland

  Filelock changes:

   - Make lease_alloc() take a flags argument

   - Rework the __break_lease API to use flags

   - Add struct delegated_inode

   - Push the S_ISREG check down to ->setlease handlers

   - Lift the ban on directory leases in generic_setlease

  NFSD changes:

   - Allow filecache to hold S_IFDIR files

   - Allow DELEGRETURN on directories

   - Wire up GET_DIR_DELEGATION handling

  Fixes:

   - Fix kernel-doc warnings in __fcntl_getlease

   - Add needed headers for new struct delegation definition"

* tag 'vfs-6.19-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  vfs: add needed headers for new struct delegation definition
  filelock: __fcntl_getlease: fix kernel-doc warnings
  vfs: expose delegation support to userland
  nfsd: wire up GET_DIR_DELEGATION handling
  nfsd: allow DELEGRETURN on directories
  nfsd: allow filecache to hold S_IFDIR files
  filelock: lift the ban on directory leases in generic_setlease
  vfs: make vfs_symlink break delegations on parent dir
  vfs: make vfs_mknod break delegations on parent directory
  vfs: make vfs_create break delegations on parent directory
  vfs: clean up argument list for vfs_create()
  vfs: break parent dir delegations in open(..., O_CREAT) codepath
  vfs: allow rmdir to wait for delegation break on parent
  vfs: allow mkdir to wait for delegation break on parent
  vfs: add try_break_deleg calls for parents to vfs_{link,rename,unlink}
  filelock: push the S_ISREG check down to ->setlease handlers
  filelock: add struct delegated_inode
  filelock: rework the __break_lease API to use flags
  filelock: make lease_alloc() take a flags argument
2025-12-01 15:34:41 -08:00
Linus Torvalds
1d18101a64 Merge tag 'kernel-6.19-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull cred guard updates from Christian Brauner:
 "This contains substantial credential infrastructure improvements
  adding guard-based credential management that simplifies code and
  eliminates manual reference counting in many subsystems.

  Features:

   - Kernel Credential Guards

     Add with_kernel_creds() and scoped_with_kernel_creds() guards that
     allow using the kernel credentials without allocating and copying
     them. This was requested by Linus after seeing repeated
     prepare_kernel_creds() calls that duplicate the kernel credentials
     only to drop them again later.

     The new guards completely avoid the allocation and never expose the
     temporary variable to hold the kernel credentials anywhere in
     callers.

   - Generic Credential Guards

     Add scoped_with_creds() guards for the common override_creds() and
     revert_creds() pattern. This builds on earlier work that made
     override_creds()/revert_creds() completely reference count free.

   - Prepare Credential Guards

     Add prepare credential guards for the more complex pattern of
     preparing a new set of credentials and overriding the current
     credentials with them:
      - prepare_creds()
      - modify new creds
      - override_creds()
      - revert_creds()
      - put_cred()

  Cleanups:

   - Make init_cred static since it should not be directly accessed

   - Add kernel_cred() helper to properly access the kernel credentials

   - Fix scoped_class() macro that was introduced two cycles ago

   - coredump: split out do_coredump() from vfs_coredump() for cleaner
     credential handling

   - coredump: move revert_cred() before coredump_cleanup()

   - coredump: mark struct mm_struct as const

   - coredump: pass struct linux_binfmt as const

   - sev-dev: use guard for path"

* tag 'kernel-6.19-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits)
  trace: use override credential guard
  trace: use prepare credential guard
  coredump: use override credential guard
  coredump: use prepare credential guard
  coredump: split out do_coredump() from vfs_coredump()
  coredump: mark struct mm_struct as const
  coredump: pass struct linux_binfmt as const
  coredump: move revert_cred() before coredump_cleanup()
  sev-dev: use override credential guards
  sev-dev: use prepare credential guard
  sev-dev: use guard for path
  cred: add prepare credential guard
  net/dns_resolver: use credential guards in dns_query()
  cgroup: use credential guards in cgroup_attach_permissions()
  act: use credential guards in acct_write_process()
  smb: use credential guards in cifs_get_spnego_key()
  nfs: use credential guards in nfs_idmap_get_key()
  nfs: use credential guards in nfs_local_call_write()
  nfs: use credential guards in nfs_local_call_read()
  erofs: use credential guards
  ...
2025-12-01 13:45:41 -08:00
Rafael J. Wysocki
f086594adb Merge branch 'pm-sleep'
Merge updates related to system suspend and hibernation for 6.19-rc1:

 - Replace snprintf() with scnprintf() in show_trace_dev_match()
   (Kaushlendra Kumar)

 - Fix memory allocation error handling in pm_vt_switch_required()
   (Malaya Kumar Rout)

 - Introduce CALL_PM_OP() macro and use it to simplify code in
   generic PM operations (Kaushlendra Kumar)

 - Add module param to backtrace all CPUs in the device power management
   watchdog (Sergey Senozhatsky)

 - Rework message printing in swsusp_save() (Rafael Wysocki)

 - Make it possible to change the number of hibernation compression
   threads (Xueqin Luo)

 - Clarify that only cgroup1 freezer uses PM freezer (Tejun Heo)

 - Add document on debugging shutdown hangs to PM documentation and
   correct a mistaken configuration option in it (Mario Limonciello)

 - Shut down wakeup source timer before removing the wakeup source from
   the list (Kaushlendra Kumar, Rafael Wysocki)

 - Introduce new PMSG_POWEROFF event for system shutdown handling with
   the help of PM device callbacks (Mario Limonciello)

 - Make pm_test delay interruptible by wakeup events (Riwen Lu)

 - Clean up kernel-doc comment style usage in the core hibernation
   code and remove unuseful comments from it (Sunday Adelodun, Rafael
   Wysocki)

 - Add support for handling wakeup events and aborting the suspend
   process while it is syncing file systems (Samuel Wu, Rafael Wysocki)

* pm-sleep: (21 commits)
  PM: hibernate: Extra cleanup of comments in swap handling code
  PM: sleep: Call pm_sleep_fs_sync() instead of ksys_sync_helper()
  PM: sleep: Add support for wakeup during filesystem sync
  PM: hibernate: Clean up kernel-doc comment style usage
  PM: suspend: Make pm_test delay interruptible by wakeup events
  usb: sl811-hcd: Add PM_EVENT_POWEROFF into suspend callbacks
  scsi: Add PM_EVENT_POWEROFF into suspend callbacks
  PM: Introduce new PMSG_POWEROFF event
  PM: wakeup: Update after recent wakeup source removal ordering change
  PM: wakeup: Delete timer before removing wakeup source from list
  Documentation: power: Correct a mistaken configuration option
  Documentation: power: Add document on debugging shutdown hangs
  freezer: Clarify that only cgroup1 freezer uses PM freezer
  PM: hibernate: add sysfs interface for hibernate_compression_threads
  PM: hibernate: make compression threads configurable
  PM: hibernate: dynamically allocate crc->unc_len/unc for configurable threads
  PM: hibernate: Rework message printing in swsusp_save()
  PM: dpm_watchdog: add module param to backtrace all CPUs
  PM: sleep: Introduce CALL_PM_OP() macro to simplify code
  PM: console: Fix memory allocation error handling in pm_vt_switch_required()
  ...
2025-11-28 16:01:13 +01:00
Rafael J. Wysocki
60d69a7ed1 Merge branches 'pm-core' and 'pm-runtime'
Merge a core power management update and runtime PM framework updates
for 6.19-rc1:

 - Add WQ_UNBOUND to pm_wq workqueue (Marco Crivellari)

 - Add runtime PM wrapper macros for ACQUIRE()/ACQUIRE_ERR() and use
   them in the PCI core and the ACPI TAD driver (Rafael Wysocki)

 - Improve runtime PM in the ACPI TAD driver (Rafael Wysocki)

 - Update pm_runtime_allow/forbid() documentation (Rafael Wysocki)

 - Fix typos in runtime.c comments (Malaya Kumar Rout)

* pm-core:
  PM: WQ_UNBOUND added to pm_wq workqueue

* pm-runtime:
  PCI/sysfs: Use PM_RUNTIME_ACQUIRE()/PM_RUNTIME_ACQUIRE_ERR()
  ACPI: TAD: Use PM_RUNTIME_ACQUIRE()/PM_RUNTIME_ACQUIRE_ERR()
  PM: runtime: Wrapper macros for ACQUIRE()/ACQUIRE_ERR()
  PM: runtime: fix typos in runtime.c comments
  ACPI: TAD: Improve runtime PM using guard macros
  ACPI: TAD: Rearrange runtime PM operations in acpi_tad_remove()
  PM: runtime: docs: Update pm_runtime_allow/forbid() documentation
2025-11-28 15:56:09 +01:00
Rafael J. Wysocki
a857b530b3 Merge back material related to system sleep for 6.19 2025-11-20 22:28:23 +01:00
Rafael J. Wysocki
f384497a76 PM: sleep: core: Fix runtime PM enabling in device_resume_early()
Runtime PM should only be enabled in device_resume_early() if it has
been disabled for the given device by device_suspend_late().  Otherwise,
it may cause runtime PM callbacks to run prematurely in some cases
which leads to further functional issues.

Make two changes to address this problem.

First, reorder device_suspend_late() to only disable runtime PM for a
device when it is going to look for the device's callback or if the
device is a "syscore" one.  In all of the other cases, disabling runtime
PM for the device is not in fact necessary.  However, if the device's
callback returns an error and the power.is_late_suspended flag is not
going to be set, enable runtime PM so it only remains disabled when
power.is_late_suspended is set.

Second, make device_resume_early() only enable runtime PM for the
devices with the power.is_late_suspended flag set.

Fixes: 443046d1ad ("PM: sleep: Make suspend of devices more asynchronous")
Reported-by: Rose Wu <ya-jou.wu@mediatek.com>
Closes: https://lore.kernel.org/linux-pm/70b25dca6f8c2756d78f076f4a7dee7edaaffc33.camel@mediatek.com/
Cc: 6.16+ <stable@vger.kernel.org> # 6.16+
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/12784270.O9o76ZdvQC@rafael.j.wysocki
2025-11-18 15:47:55 +01:00
Rafael J. Wysocki
37d6d92fe0 Merge back earlier material related to system sleep for 6.19 2025-11-17 16:55:55 +01:00
Mario Limonciello (AMD)
0ca04993da PM: Introduce new PMSG_POWEROFF event
PMSG_POWEROFF will be used for the PM core to allow differentiating between
a hibernation or shutdown sequence when re-using callbacks for common code.

Hibernation is started by writing a hibernation method (such as 'platform'
'shutdown', or 'reboot') to use into /sys/power/disk and writing 'disk' to
/sys/power/state.

Shutdown is initiated with the reboot() syscall with arguments on whether
to halt the system or power it off.

Tested-by: Eric Naim <dnaim@cachyos.org>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/20251112224025.2051702-2-superm1@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-14 17:05:53 +01:00
Rafael J. Wysocki
bdfacf441b Merge back earlier runtime PM changes for 6.19 2025-11-14 16:56:40 +01:00
Rafael J. Wysocki
9cf02802d6 PM: wakeup: Update after recent wakeup source removal ordering change
After a recent change, wakeup_source_activate() will warn that the given
wakeup source is "unregistered" after its timer has been shut down
in wakeup_source_remove() which may be somewhat confusing, so change
the warning message to say that the wakeup source is "unusable".

Accordingly, rename wakeup_source_not_registered() to
wakeup_source_not_usable() and update the comment in it
to also mention the removal of the wakeup source.

Also restore the comment in wakeup_source_remove() regarding the warning
in wakeup_source_activate() that may trigger after shutting down the
wakeup source timer.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/12788103.O9o76ZdvQC@rafael.j.wysocki
2025-11-12 20:56:25 +01:00
Jeff Layton
e8960c1b2e vfs: make vfs_mknod break delegations on parent directory
In order to add directory delegation support, we need to break
delegations on the parent whenever there is going to be a change in the
directory.

Add a new delegated_inode pointer to vfs_mknod() and have the
appropriate callers wait when there is an outstanding delegation. All
other callers just set the pointer to NULL.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-11-52f3feebb2f2@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12 09:38:36 +01:00
Jeff Layton
4fa76319cd vfs: allow rmdir to wait for delegation break on parent
In order to add directory delegation support, we need to break
delegations on the parent whenever there is going to be a change in the
directory.

Add a delegated_inode struct to vfs_rmdir() and populate that
pointer with the parent inode if it's non-NULL. Most existing in-kernel
callers pass in a NULL pointer.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-7-52f3feebb2f2@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12 09:38:35 +01:00
Jeff Layton
e12d203b8c vfs: allow mkdir to wait for delegation break on parent
In order to add directory delegation support, we need to break
delegations on the parent whenever there is going to be a change in the
directory.

Add a new delegated_inode parameter to vfs_mkdir. All of the existing
callers set that to NULL for now, except for do_mkdirat which will
properly block until the lease is gone.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-6-52f3feebb2f2@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12 09:38:35 +01:00
Kaushlendra Kumar
352899fd91 PM: wakeup: Delete timer before removing wakeup source from list
Replace timer_delete_sync() with timer_shutdown_sync() and move
it before list_del_rcu() in wakeup_source_remove() to improve the
cleanup ordering and code clarity.

This ensures that the timer is stopped before removing the wakeup
source from the events list, providing a more logical cleanup
sequence.

While the current ordering is functionally correct, stopping the
timer first makes the cleanup flow more intuitive and follows the
general pattern of disabling active components before removing data
structures.

Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
[ rjw: Subject and changelog edits ]
Link: https://patch.msgid.link/20251027044127.2456365-1-kaushlendra.kumar@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-08 12:17:28 +01:00
Christian Brauner
b9e3594e70 firmware: don't copy kernel creds
No need to copy kernel credentials.

Link: https://patch.msgid.link/20251103-work-creds-init_cred-v1-5-cb3ec8711a6a@kernel.org
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-04 12:36:10 +01:00
Linus Torvalds
963bf16194 Merge tag 'regmap-fix-v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fixes from Mark Brown:
 "One documentation fix and a fix for a problem with the slimbus regmap
  which was uncovered by some changes in one of the drivers"

* tag 'regmap-fix-v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: irq: Correct documentation of wake_invert flag
  regmap: slimbus: fix bus_context pointer in regmap init calls
2025-11-01 10:45:39 -07:00
Malaya Kumar Rout
4e48e7baa3 PM: runtime: fix typos in runtime.c comments
Fix several typos in comments:
- "timesptamp" -> "timestamp"
- "involed" -> "involved"
- "nonero" -> "nonzero"

Fix typos in comments to improve code documentation clarity.

Signed-off-by: Malaya Kumar Rout <mrout@redhat.com>
Link: https://patch.msgid.link/20251026170527.262003-1-mrout@redhat.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-29 19:58:58 +01:00
Marc Zyngier
0d5daa938c platform: Add firmware-agnostic irq and affinity retrieval interface
Expand platform_get_irq_optional() to also return an affinity if available,
renaming it to platform_get_irq_affinity() in the process.

platform_get_irq_optional() is preserved with its current semantics by
calling into the new helper with a NULL affinity pointer.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251020122944.3074811-5-maz@kernel.org
2025-10-27 17:16:32 +01:00
Alexey Klimov
434f7349a1 regmap: slimbus: fix bus_context pointer in regmap init calls
Commit 4e65bda827 ("ASoC: wcd934x: fix error handling in
wcd934x_codec_parse_data()") revealed the problem in the slimbus regmap.
That commit breaks audio playback, for instance, on sdm845 Thundercomm
Dragonboard 845c board:

 Unable to handle kernel paging request at virtual address ffff8000847cbad4
 ...
 CPU: 5 UID: 0 PID: 776 Comm: aplay Not tainted 6.18.0-rc1-00028-g7ea30958b305 #11 PREEMPT
 Hardware name: Thundercomm Dragonboard 845c (DT)
 ...
 Call trace:
  slim_xfer_msg+0x24/0x1ac [slimbus] (P)
  slim_read+0x48/0x74 [slimbus]
  regmap_slimbus_read+0x18/0x24 [regmap_slimbus]
  _regmap_raw_read+0xe8/0x174
  _regmap_bus_read+0x44/0x80
  _regmap_read+0x60/0xd8
  _regmap_update_bits+0xf4/0x140
  _regmap_select_page+0xa8/0x124
  _regmap_raw_write_impl+0x3b8/0x65c
  _regmap_bus_raw_write+0x60/0x80
  _regmap_write+0x58/0xc0
  regmap_write+0x4c/0x80
  wcd934x_hw_params+0x494/0x8b8 [snd_soc_wcd934x]
  snd_soc_dai_hw_params+0x3c/0x7c [snd_soc_core]
  __soc_pcm_hw_params+0x22c/0x634 [snd_soc_core]
  dpcm_be_dai_hw_params+0x1d4/0x38c [snd_soc_core]
  dpcm_fe_dai_hw_params+0x9c/0x17c [snd_soc_core]
  snd_pcm_hw_params+0x124/0x464 [snd_pcm]
  snd_pcm_common_ioctl+0x110c/0x1820 [snd_pcm]
  snd_pcm_ioctl+0x34/0x4c [snd_pcm]
  __arm64_sys_ioctl+0xac/0x104
  invoke_syscall+0x48/0x104
  el0_svc_common.constprop.0+0x40/0xe0
  do_el0_svc+0x1c/0x28
  el0_svc+0x34/0xec
  el0t_64_sync_handler+0xa0/0xf0
  el0t_64_sync+0x198/0x19c

The __devm_regmap_init_slimbus() started to be used instead of
__regmap_init_slimbus() after the commit mentioned above and turns out
the incorrect bus_context pointer (3rd argument) was used in
__devm_regmap_init_slimbus(). It should be just "slimbus" (which is equal
to &slimbus->dev). Correct it. The wcd934x codec seems to be the only or
the first user of devm_regmap_init_slimbus() but we should fix it till
the point where __devm_regmap_init_slimbus() was introduced therefore
two "Fixes" tags.

While at this, also correct the same argument in __regmap_init_slimbus().

Fixes: 4e65bda827 ("ASoC: wcd934x: fix error handling in wcd934x_codec_parse_data()")
Fixes: 7d6f7fb053 ("regmap: add SLIMbus support")
Cc: stable@vger.kernel.org
Cc: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Cc: Ma Ke <make24@iscas.ac.cn>
Cc: Steev Klimaszewski <steev@kali.org>
Cc: Srinivas Kandagatla <srini@kernel.org>
Reviewed-by: Abel Vesa <abel.vesa@linaro.org>
Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://patch.msgid.link/20251022201013.1740211-1-alexey.klimov@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2025-10-23 15:19:58 +01:00
Rafael J. Wysocki
cea54f8e34 PM: runtime: docs: Update pm_runtime_allow/forbid() documentation
Drop confusing descriptions of pm_runtime_allow() and pm_runtime_forbid()
from Documentation/power/runtime_pm.rst and update the kerneldoc comments
of these functions to better explain their purpose.

Link: https://lore.kernel.org/linux-pm/08976178-298f-79d9-1d63-cff5a4e56cc3@linux.intel.com/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/12780841.O9o76ZdvQC@rafael.j.wysocki
2025-10-23 16:13:33 +02:00
Kaushlendra Kumar
2eead19334 arch_topology: Fix incorrect error check in topology_parse_cpu_capacity()
Fix incorrect use of PTR_ERR_OR_ZERO() in topology_parse_cpu_capacity()
which causes the code to proceed with NULL clock pointers. The current
logic uses !PTR_ERR_OR_ZERO(cpu_clk) which evaluates to true for both
valid pointers and NULL, leading to potential NULL pointer dereference
in clk_get_rate().

Per include/linux/err.h documentation, PTR_ERR_OR_ZERO(ptr) returns:
"The error code within @ptr if it is an error pointer; 0 otherwise."

This means PTR_ERR_OR_ZERO() returns 0 for both valid pointers AND NULL
pointers. Therefore !PTR_ERR_OR_ZERO(cpu_clk) evaluates to true (proceed)
when cpu_clk is either valid or NULL, causing clk_get_rate(NULL) to be
called when of_clk_get() returns NULL.

Replace with !IS_ERR_OR_NULL(cpu_clk) which only proceeds for valid
pointers, preventing potential NULL pointer dereference in clk_get_rate().

Cc: stable <stable@kernel.org>
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Fixes: b8fe128dad ("arch_topology: Adjust initial CPU capacities with current freq")
Link: https://patch.msgid.link/20250923174308.1771906-1-kaushlendra.kumar@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-22 08:06:28 +02:00
Sergey Senozhatsky
a67818f745 PM: dpm_watchdog: add module param to backtrace all CPUs
Add dpm_watchdog_all_cpu_backtrace module parameter which
controls all CPU backtrace dump before the DPM watchdog panics
the system.

This is expected to help understand what might have caused device
timeout.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Link: https://patch.msgid.link/20251007063551.3147937-1-senozhatsky@chromium.org
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-20 20:07:02 +02:00
Kaushlendra Kumar
5a151c2328 PM: sleep: Introduce CALL_PM_OP() macro to simplify code
Add CALL_PM_OP() macro to eliminate a repetitive code pattern in
power management generic operations.

Replace analogous driver PM callback invocation logic across all
pm_generic_*() functions with a single macro that handles the NULL
pointer checks and function calls.

This reduces code size while maintaining the same functionality and
improving code maintainability.

Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Link: https://patch.msgid.link/20250919124437.3075016-1-kaushlendra.kumar@intel.com
[ rjw: Subject and changelog edits, adjust white space ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-20 19:54:25 +02:00
Maarten Lankhorst
a91c809659 devcoredump: Fix circular locking dependency with devcd->mutex.
The original code causes a circular locking dependency found by lockdep.

======================================================
WARNING: possible circular locking dependency detected
6.16.0-rc6-lgci-xe-xe-pw-151626v3+ #1 Tainted: G S   U
------------------------------------------------------
xe_fault_inject/5091 is trying to acquire lock:
ffff888156815688 ((work_completion)(&(&devcd->del_wk)->work)){+.+.}-{0:0}, at: __flush_work+0x25d/0x660

but task is already holding lock:

ffff888156815620 (&devcd->mutex){+.+.}-{3:3}, at: dev_coredump_put+0x3f/0xa0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&devcd->mutex){+.+.}-{3:3}:
       mutex_lock_nested+0x4e/0xc0
       devcd_data_write+0x27/0x90
       sysfs_kf_bin_write+0x80/0xf0
       kernfs_fop_write_iter+0x169/0x220
       vfs_write+0x293/0x560
       ksys_write+0x72/0xf0
       __x64_sys_write+0x19/0x30
       x64_sys_call+0x2bf/0x2660
       do_syscall_64+0x93/0xb60
       entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #1 (kn->active#236){++++}-{0:0}:
       kernfs_drain+0x1e2/0x200
       __kernfs_remove+0xae/0x400
       kernfs_remove_by_name_ns+0x5d/0xc0
       remove_files+0x54/0x70
       sysfs_remove_group+0x3d/0xa0
       sysfs_remove_groups+0x2e/0x60
       device_remove_attrs+0xc7/0x100
       device_del+0x15d/0x3b0
       devcd_del+0x19/0x30
       process_one_work+0x22b/0x6f0
       worker_thread+0x1e8/0x3d0
       kthread+0x11c/0x250
       ret_from_fork+0x26c/0x2e0
       ret_from_fork_asm+0x1a/0x30
-> #0 ((work_completion)(&(&devcd->del_wk)->work)){+.+.}-{0:0}:
       __lock_acquire+0x1661/0x2860
       lock_acquire+0xc4/0x2f0
       __flush_work+0x27a/0x660
       flush_delayed_work+0x5d/0xa0
       dev_coredump_put+0x63/0xa0
       xe_driver_devcoredump_fini+0x12/0x20 [xe]
       devm_action_release+0x12/0x30
       release_nodes+0x3a/0x120
       devres_release_all+0x8a/0xd0
       device_unbind_cleanup+0x12/0x80
       device_release_driver_internal+0x23a/0x280
       device_driver_detach+0x14/0x20
       unbind_store+0xaf/0xc0
       drv_attr_store+0x21/0x50
       sysfs_kf_write+0x4a/0x80
       kernfs_fop_write_iter+0x169/0x220
       vfs_write+0x293/0x560
       ksys_write+0x72/0xf0
       __x64_sys_write+0x19/0x30
       x64_sys_call+0x2bf/0x2660
       do_syscall_64+0x93/0xb60
       entry_SYSCALL_64_after_hwframe+0x76/0x7e
other info that might help us debug this:
Chain exists of: (work_completion)(&(&devcd->del_wk)->work) --> kn->active#236 --> &devcd->mutex
 Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&devcd->mutex);
                               lock(kn->active#236);
                               lock(&devcd->mutex);
  lock((work_completion)(&(&devcd->del_wk)->work));
 *** DEADLOCK ***
5 locks held by xe_fault_inject/5091:
 #0: ffff8881129f9488 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x72/0xf0
 #1: ffff88810c755078 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x123/0x220
 #2: ffff8881054811a0 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x55/0x280
 #3: ffff888156815620 (&devcd->mutex){+.+.}-{3:3}, at: dev_coredump_put+0x3f/0xa0
 #4: ffffffff8359e020 (rcu_read_lock){....}-{1:2}, at: __flush_work+0x72/0x660
stack backtrace:
CPU: 14 UID: 0 PID: 5091 Comm: xe_fault_inject Tainted: G S   U              6.16.0-rc6-lgci-xe-xe-pw-151626v3+ #1 PREEMPT_{RT,(lazy)}
Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
Hardware name: Micro-Star International Co., Ltd. MS-7D25/PRO Z690-A DDR4(MS-7D25), BIOS 1.10 12/13/2021
Call Trace:
 <TASK>
 dump_stack_lvl+0x91/0xf0
 dump_stack+0x10/0x20
 print_circular_bug+0x285/0x360
 check_noncircular+0x135/0x150
 ? register_lock_class+0x48/0x4a0
 __lock_acquire+0x1661/0x2860
 lock_acquire+0xc4/0x2f0
 ? __flush_work+0x25d/0x660
 ? mark_held_locks+0x46/0x90
 ? __flush_work+0x25d/0x660
 __flush_work+0x27a/0x660
 ? __flush_work+0x25d/0x660
 ? trace_hardirqs_on+0x1e/0xd0
 ? __pfx_wq_barrier_func+0x10/0x10
 flush_delayed_work+0x5d/0xa0
 dev_coredump_put+0x63/0xa0
 xe_driver_devcoredump_fini+0x12/0x20 [xe]
 devm_action_release+0x12/0x30
 release_nodes+0x3a/0x120
 devres_release_all+0x8a/0xd0
 device_unbind_cleanup+0x12/0x80
 device_release_driver_internal+0x23a/0x280
 ? bus_find_device+0xa8/0xe0
 device_driver_detach+0x14/0x20
 unbind_store+0xaf/0xc0
 drv_attr_store+0x21/0x50
 sysfs_kf_write+0x4a/0x80
 kernfs_fop_write_iter+0x169/0x220
 vfs_write+0x293/0x560
 ksys_write+0x72/0xf0
 __x64_sys_write+0x19/0x30
 x64_sys_call+0x2bf/0x2660
 do_syscall_64+0x93/0xb60
 ? __f_unlock_pos+0x15/0x20
 ? __x64_sys_getdents64+0x9b/0x130
 ? __pfx_filldir64+0x10/0x10
 ? do_syscall_64+0x1a2/0xb60
 ? clear_bhb_loop+0x30/0x80
 ? clear_bhb_loop+0x30/0x80
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x76e292edd574
Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d d5 ea 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
RSP: 002b:00007fffe247a828 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000076e292edd574
RDX: 000000000000000c RSI: 00006267f6306063 RDI: 000000000000000b
RBP: 000000000000000c R08: 000076e292fc4b20 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 00006267f6306063
R13: 000000000000000b R14: 00006267e6859c00 R15: 000076e29322a000
 </TASK>
xe 0000:03:00.0: [drm] Xe device coredump has been deleted.

Fixes: 01daccf748 ("devcoredump : Serialize devcd_del work")
Cc: Mukesh Ojha <quic_mojha@quicinc.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # v6.1+
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Cc: Matthew Brost <matthew.brost@intel.com>
Acked-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250723142416.1020423-1-dev@lankhorst.se
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-17 09:47:40 +02:00
Ulf Hansson
74b84d1be0 driver core: fw_devlink: Don't warn about sync_state() pending
Due to the wider deployment of the ->sync_state() support, for PM domains
for example, we are receiving reports about the sync_state() pending
message that is being logged in fw_devlink_dev_sync_state(). In particular
as it's printed at the warning level, which is questionable.

Even if it certainly is useful to know that the ->sync_state() condition
could not be met, there may be nothing wrong with it. For example, a driver
may be built as module and are still waiting to be initialized/probed. For
this reason let's move to the info level for now.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: Sebin Francis <sebin.francis@ti.com>
Reported-by: Diederik de Haas <didi.debian@cknow.org>
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Acked-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Sebin Francis <sebin.francis@ti.com>
Tested-by: Sebin Francis <sebin.francis@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-17 09:47:40 +02:00
Sumanth Korikkar
300709fbef mm/memory_hotplug: Remove MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers
MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers were introduced
to prepare the transition of memory to and from a physically accessible
state. This enhancement was crucial for implementing the "memmap on memory"
feature for s390.

With introduction of dynamic (de)configuration of hotpluggable memory,
memory can be brought to accessible state before add_memory(). Memory
can be brought to inaccessible state before remove_memory(). Hence,
there is no need of MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory
notifiers anymore.

This basically reverts commit
c5f1e2d189 ("mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers")
Additionally, apply minor adjustments to the function parameters of
move_pfn_range_to_zone() and mhp_supports_memmap_on_memory() to ensure
compatibility with the latest branch.

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-14 14:24:53 +02:00
Kaushlendra Kumar
67434ce57c PM: sleep: Replace snprintf() with scnprintf() in show_trace_dev_match()
Replace snprintf() with scnprintf() in show_trace_dev_match() to simplify
buffer length handling. The scnprintf() function returns the number of
characters actually written (excluding the null terminator), which
eliminates the need for manual length checking and clamping.

This change removes the redundant size check since scnprintf() guarantees
that the return value will never exceed the buffer size, making the code
cleaner and less error-prone.

Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Link: https://patch.msgid.link/20250922055231.3523680-1-kaushlendra.kumar@intel.com
[ rjw: Subject adjustment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-13 21:19:12 +02:00
Linus Torvalds
abdf766d14 Merge tag 'pm-6.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
 "These are cpufreq fixes and cleanups on top of the material merged
  previously, a power management core code fix and updates of the
  runtime PM framework including unit tests, documentation updates and
  introduction of auto-cleanup macros for runtime PM "resume and get"
  and "get without resuming" operations.

  Specifics:

   - Make cpufreq drivers setting the default CPU transition latency to
     CPUFREQ_ETERNAL specify a proper default transition latency value
     instead which addresses a regression introduced during the 6.6
     cycle that broke CPUFREQ_ETERNAL handling (Rafael Wysocki)

   - Make the cpufreq CPPC driver use a proper transition delay value
     when CPUFREQ_ETERNAL is returned by cppc_get_transition_latency()
     to indicate an error condition (Rafael Wysocki)

   - Make cppc_get_transition_latency() return a negative error code to
     indicate error conditions instead of using CPUFREQ_ETERNAL for this
     purpose and drop CPUFREQ_ETERNAL that has no other users (Rafael
     Wysocki, Gopi Krishna Menon)

   - Fix device leak in the mediatek cpufreq driver (Johan Hovold)

   - Set target frequency on all CPUs sharing a policy during frequency
     updates in the tegra186 cpufreq driver and make it initialize all
     cores to max frequencies (Aaron Kling)

   - Rust cpufreq helper cleanup (Thorsten Blum)

   - Make pm_runtime_put*() family of functions return 1 when the given
     device is already suspended which is consistent with the
     documentation (Brian Norris)

   - Add basic kunit tests for runtime PM API contracts and update
     return values in kerneldoc comments for the runtime PM API (Brian
     Norris, Dan Carpenter)

   - Add auto-cleanup macros for runtime PM "resume and get" and "get
     without resume" operations, use one of them in the PCI core and
     drop the existing "free" macro introduced for similar purpose, but
     somewhat cumbersome to use (Rafael Wysocki)

   - Make the core power management code avoid waiting on device links
     marked as SYNC_STATE_ONLY which is consistent with the handling of
     those device links elsewhere (Pin-yen Lin)"

* tag 'pm-6.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  docs/zh_CN: Fix malformed table
  docs/zh_TW: Fix malformed table
  PM: runtime: Fix error checking for kunit_device_register()
  PM: runtime: Introduce one more usage counter guard
  cpufreq: Drop unused symbol CPUFREQ_ETERNAL
  ACPI: CPPC: Do not use CPUFREQ_ETERNAL as an error value
  cpufreq: CPPC: Avoid using CPUFREQ_ETERNAL as transition delay
  cpufreq: Make drivers using CPUFREQ_ETERNAL specify transition latency
  PM: runtime: Drop DEFINE_FREE() for pm_runtime_put()
  PCI/sysfs: Use runtime PM guard macro for auto-cleanup
  PM: runtime: Add auto-cleanup macros for "resume and get" operations
  cpufreq: tegra186: Initialize all cores to max frequencies
  cpufreq: tegra186: Set target frequency for all cpus in policy
  rust: cpufreq: streamline find_supply_names
  cpufreq: mediatek: fix device leak on probe failure
  PM: sleep: Do not wait on SYNC_STATE_ONLY device links
  PM: runtime: Update kerneldoc return codes
  PM: runtime: Make put{,_sync}() return 1 when already suspended
  PM: runtime: Add basic kunit tests for API contracts
2025-10-07 09:39:51 -07:00
Rafael J. Wysocki
05f084d24e Merge branches 'pm-core' and 'pm-runtime'
Merge runtime PM framework updates and a core power management code fix
for 6.18-rc1:

 - Make pm_runtime_put*() family of functions return 1 when the
   given device is already suspended which is consistent with the
   documentation (Brian Norris)

 - Add basic kunit tests for runtime PM API contracts and update return
   values in kerneldoc coments for the runtime PM API (Brian Norris,
   Dan Carpenter)

 - Add auto-cleanup macros for runtime PM "resume and get" and "get
   without resume" operations, use one of them in the PCI core and
   drop the existing "free" macro introduced for similar purpose, but
   somewhat cumbersome to use (Rafael Wysocki)

 - Make the core power management code avoid waiting on device links
   marked as SYNC_STATE_ONLY which is consistent with the handling of
   those device links elsewhere (Pin-yen Lin)

* pm-core:
  PM: sleep: Do not wait on SYNC_STATE_ONLY device links

* pm-runtime:
  PM: runtime: Fix error checking for kunit_device_register()
  PM: runtime: Introduce one more usage counter guard
  PM: runtime: Drop DEFINE_FREE() for pm_runtime_put()
  PCI/sysfs: Use runtime PM guard macro for auto-cleanup
  PM: runtime: Add auto-cleanup macros for "resume and get" operations
  PM: runtime: Update kerneldoc return codes
  PM: runtime: Make put{,_sync}() return 1 when already suspended
  PM: runtime: Add basic kunit tests for API contracts
2025-10-07 12:20:36 +02:00
Linus Torvalds
7a405dbb0f Merge tag 'mm-stable-2025-10-03-16-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
"Only two patch series in this pull request:

   - "mm/memory_hotplug: fixup crash during uevent handling" from Hannes
     Reinecke fixes a race that was causing udev to trigger a crash in
     the memory hotplug code

   - "mm_slot: following fixup for usage of mm_slot_entry()" from Wei
     Yang adds some touchups to the just-merged mm_slot changes"

* tag 'mm-stable-2025-10-03-16-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/khugepaged: use KMEM_CACHE()
  mm/ksm: cleanup mm_slot_entry() invocation
  Documentation/mm: drop pxx_mkdevmap() descriptions from page table helpers
  mm: clean up is_guard_pte_marker()
  drivers/base: move memory_block_add_nid() into the caller
  mm/memory_hotplug: activate node before adding new memory blocks
  drivers/base/memory: add node id parameter to add_memory_block()
2025-10-05 12:11:07 -07:00
Linus Torvalds
d104e3d17f Merge tag 'cxl-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL updates from Dave Jiang:
 "The changes include adding poison injection support, fixing CXL access
  coordinates when onlining CXL memory, and delaing the enumeration of
  downstream switch ports for CXL hierarchy to ensure that the CXL link
  is established at the time of enumeration to address a few issues
  observed on AMD and Intel platforms.

  Misc changes:
   - Use str_plural() instead of open code for emitting strings.
   - Use str_enabled_disabled() instead of ternary operator
   - Fix emit of type resource_size_t argument for
     validate_region_offset()
   - Typo fixup in CXL driver-api documentation
   - Rename CFMWS coherency restriction defines
   - Add convention doc describe dealing with x86 low memory hole
     and CXL

  Poison Inject support:
   - Move hpa_to_spa callback to new reoot decoder ops structure
   - Define a SPA to HPA callback for interleave calculation with
     XOR math
   - Add support for SPA to DPA address translation with XOR
   - Add locked variants of poison inject and clear functions
   - Add inject and clear poison support by region offset

  CXL access coordinates update fix:
   - A comment update for hotplug memory callback prority defines
   - Add node_update_perf_attrs() for updating perf attrs on a node
   - Update cxl_access_coordinates() to use the new node update function
   - Remove hmat_update_target_coordinates() and related code

  CXL delayed downstream port enumeration and initialization:
   - Add helper to detect top of CXL device topology and remove
     open coding
   - Add helper to delete single dport
   - Add a cached copy of target_map to cxl_decoder
   - Refactor decoder setup to reduce cxl_test burden
   - Defer dport allocation for switch ports
   - Add mock version of devm_cxl_add_dport_by_dev() for cxl_test
   - Adjust the mock version of devm_cxl_switch_port_decoders_setup()
     due to cxl core usage
   - Setup target_map for cxl_test decoder initialization
   - Change SSLBIS handler to handle single dport
   - Move port register setup to when first dport appears"

* tag 'cxl-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (25 commits)
  cxl: Move port register setup to when first dport appear
  cxl: Change sslbis handler to only handle single dport
  cxl/test: Setup target_map for cxl_test decoder initialization
  cxl/test: Adjust the mock version of devm_cxl_switch_port_decoders_setup()
  cxl/test: Add mock version of devm_cxl_add_dport_by_dev()
  cxl: Defer dport allocation for switch ports
  cxl/test: Refactor decoder setup to reduce cxl_test burden
  cxl: Add a cached copy of target_map to cxl_decoder
  cxl: Add helper to delete dport
  cxl: Add helper to detect top of CXL device topology
  cxl: Documentation/driver-api/cxl: Describe the x86 Low Memory Hole solution
  cxl/acpi: Rename CFMW coherency restrictions
  Documentation/driver-api: Fix typo error in cxl
  acpi/hmat: Remove now unused hmat_update_target_coordinates()
  cxl, acpi/hmat: Update CXL access coordinates directly instead of through HMAT
  drivers/base/node: Add a helper function node_update_perf_attrs()
  mm/memory_hotplug: Update comment for hotplug memory callback priorities
  cxl: Fix emit of type resource_size_t argument for validate_region_offset()
  cxl/region: Add inject and clear poison by region offset
  cxl/core: Add locked variants of the poison inject and clear funcs
  ...
2025-10-04 12:02:50 -07:00
Linus Torvalds
86bcf7be1e Merge tag 'riscv-for-linus-6.18-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Paul Walmsley:

 - Support for the RISC-V-standardized RPMI interface.

   RPMI is a platform management communication mechanism between OSes
   running on application processors, and a remote platform management
   processor. Similar to ARM SCMI, TI SCI, etc. This includes irqchip,
   mailbox, and clk changes.

 - Support for the RISC-V-standardized MPXY SBI extension.

   MPXY is a RISC-V-specific standard implementing a shared memory
   mailbox between S-mode operating systems (e.g., Linux) and M-mode
   firmware (e.g., OpenSBI). It is part of this PR since one of its use
   cases is to enable M-mode firmware to act as a single RPMI client for
   all RPMI activity on a core (including S-mode RPMI activity).
   Includes a mailbox driver.

 - Some ACPI-related updates to enable the use of RPMI and MPXY.

 - The addition of Linux-wide memcpy_{from,to}_le32() static inline
   functions, for RPMI use.

 - An ACPI Kconfig change to enable boot logos on any ACPI-using
   architecture (including RISC-V)

 - A RISC-V defconfig change to add GPIO keyboard and event device
   support, for front panel shutdown or reboot buttons

* tag 'riscv-for-linus-6.18-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (26 commits)
  clk: COMMON_CLK_RPMI should depend on RISCV
  ACPI: support BGRT table on RISC-V
  MAINTAINERS: Add entry for RISC-V RPMI and MPXY drivers
  RISC-V: Enable GPIO keyboard and event device in RV64 defconfig
  irqchip/riscv-rpmi-sysmsi: Add ACPI support
  mailbox/riscv-sbi-mpxy: Add ACPI support
  irqchip/irq-riscv-imsic-early: Export imsic_acpi_get_fwnode()
  ACPI: RISC-V: Add RPMI System MSI to GSI mapping
  ACPI: RISC-V: Add support to update gsi range
  ACPI: RISC-V: Create interrupt controller list in sorted order
  ACPI: scan: Update honor list for RPMI System MSI
  ACPI: Add support for nargs_prop in acpi_fwnode_get_reference_args()
  ACPI: property: Refactor acpi_fwnode_get_reference_args() to support nargs_prop
  irqchip: Add driver for the RPMI system MSI service group
  dt-bindings: Add RPMI system MSI interrupt controller bindings
  dt-bindings: Add RPMI system MSI message proxy bindings
  clk: Add clock driver for the RISC-V RPMI clock service group
  dt-bindings: clock: Add RPMI clock service controller bindings
  dt-bindings: clock: Add RPMI clock service message proxy bindings
  mailbox: Add RISC-V SBI message proxy (MPXY) based mailbox driver
  ...
2025-10-04 10:36:22 -07:00
Hannes Reinecke
0a947c14e4 drivers/base: move memory_block_add_nid() into the caller
Now the node id only needs to be set for early memory, so move
memory_block_add_nid() into the caller and rename it into
memory_block_add_nid_early().  This allows us to further simplify the code
by dropping the 'context' argument to
do_register_memory_block_under_node().

Link: https://lkml.kernel.org/r/20250729064637.51662-4-hare@kernel.org
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Hannes Reinecke <hare@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-03 16:42:43 -07:00
Hannes Reinecke
b8179af120 mm/memory_hotplug: activate node before adding new memory blocks
The sysfs attributes for memory blocks require the node ID to be set and
initialized, so move the node activation before adding new memory blocks. 
This also has the nice side effect that the BUG_ON() can be converted into
a WARN_ON() as we now can handle registration errors.

Link: https://lkml.kernel.org/r/20250729064637.51662-3-hare@kernel.org
Fixes: b9ff036082 ("mm/memory_hotplug.c: make add_memory_resource use __try_online_node")
Signed-off-by: Hannes Reinecke <hare@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-03 16:42:43 -07:00
Hannes Reinecke
c6a809363a drivers/base/memory: add node id parameter to add_memory_block()
Patch series "mm/memory_hotplug: fixup crash during uevent handling", v4.

we have some udev rules trying to read the sysfs attribute 'valid_zones'
during an memory 'add' event, causing a crash in zone_for_pfn_range(). 
Debugging found that mem->nid was set to NUMA_NO_NODE, which crashed in
NODE_DATA(nid).  Further analysis revealed that we're running into a race
with udev event processing: add_memory_resource() has this function calls:

1) __try_online_node()
2) arch_add_memory()
3) create_memory_block_devices()
  -> calls device_register() -> memory 'add' event
4) node_set_online()/__register_one_node()
  -> calls device_register() -> node 'add' event
5) register_memory_blocks_under_node()
  -> sets mem->nid

Which, to the uninitated, is ... weird ...

Why do we try to online the node in 1), but only register the node in 4)
_after_ we have created the memory blocks in 3) ?  And why do we set the
'nid' value in 5), when the uevent (which might need to see the correct
'nid' value) is sent out in 3) ?  There must be a reason, I'm sure ...

So here's a small patchset to fixup uevent ordering.  The first patch adds
a 'nid' parameter to add_memory_blocks() (to avoid mem->nid being
initialized with NUMA_NO_NODE), and the second patch reshuffles the code
in add_memory_resource() to fully initialize the node prior to calling
create_memory_block_devices() so that the node is valid at that time and
uevent processing will see correct values in sysfs.


This patch (of 3):

We have some udev rules trying to read the sysfs attribute 'valid_zones'
during an memory 'add' event, causing a crash in zone_for_pfn_range(). 
Debugging found that mem->nid was set to NUMA_NO_NODE, which crashed in
NODE_DATA(nid).  Further analysis revealed that we're running into a race
with udev event processing: add_memory_resource() has this function calls:

1) __try_online_node()
2) arch_add_memory()
3) create_memory_block_devices()
  -> calls device_register() -> memory 'add' event
4) node_set_online()/__register_one_node()
  -> calls device_register() -> node 'add' event
5) register_memory_blocks_under_node()
  -> sets mem->nid

Which, to the uninitated, is ... weird ...

Why do we try to online the node in 1), but only register the node in 4)
_after_ we have created the memory blocks in 3) ?  And why do we set the
'nid' value in 5), when the uevent (which might need to see the correct
'nid' value) is sent out in 3) ?  There must be a reason, I'm sure ...

So here's a small patchset to fixup uevent ordering.  The first patch adds
a 'nid' parameter to add_memory_blocks() (to avoid mem->nid being
initialized with NUMA_NO_NODE), and the second patch reshuffles the code
in add_memory_resource() to fully initialize the node prior to calling
create_memory_block_devices() so that the node is valid at that time and
uevent processing will see correct values in sysfs.


This patch (of 3):

Add a 'nid' parameter to add_memory_block() to initialize the memory block
with the correct node id.

Link: https://lkml.kernel.org/r/20250729064637.51662-1-hare@kernel.org
Link: https://lkml.kernel.org/r/20250729064637.51662-2-hare@kernel.org
Signed-off-by: Hannes Reinecke <hare@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-03 16:42:43 -07:00
Dan Carpenter
92158fae2e PM: runtime: Fix error checking for kunit_device_register()
The kunit_device_register() function never returns NULL, it returns
error pointers.  Update the assertions to use
KUNIT_ASSERT_NOT_ERR_OR_NULL() instead of checking for NULL.

Fixes: 7f7acd193b ("PM: runtime: Add basic kunit tests for API contracts")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-03 21:10:48 +02:00
Linus Torvalds
8804d970fa Merge tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - "mm, swap: improve cluster scan strategy" from Kairui Song improves
   performance and reduces the failure rate of swap cluster allocation

 - "support large align and nid in Rust allocators" from Vitaly Wool
   permits Rust allocators to set NUMA node and large alignment when
   perforning slub and vmalloc reallocs

 - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend
   DAMOS_STAT's handling of the DAMON operations sets for virtual
   address spaces for ops-level DAMOS filters

 - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren
   Baghdasaryan reduces mmap_lock contention during reads of
   /proc/pid/maps

 - "mm/mincore: minor clean up for swap cache checking" from Kairui Song
   performs some cleanup in the swap code

 - "mm: vm_normal_page*() improvements" from David Hildenbrand provides
   code cleanup in the pagemap code

 - "add persistent huge zero folio support" from Pankaj Raghav provides
   a block layer speedup by optionalls making the
   huge_zero_pagepersistent, instead of releasing it when its refcount
   falls to zero

 - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to
   the recently added Kexec Handover feature

 - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo
   Stoakes turns mm_struct.flags into a bitmap. To end the constant
   struggle with space shortage on 32-bit conflicting with 64-bit's
   needs

 - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap
   code

 - "selftests/mm: Fix false positives and skip unsupported tests" from
   Donet Tom fixes a few things in our selftests code

 - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised"
   from David Hildenbrand "allows individual processes to opt-out of
   THP=always into THP=madvise, without affecting other workloads on the
   system".

   It's a long story - the [1/N] changelog spells out the considerations

 - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on
   the memdesc project. Please see

      https://kernelnewbies.org/MatthewWilcox/Memdescs and
      https://blogs.oracle.com/linux/post/introducing-memdesc

 - "Tiny optimization for large read operations" from Chi Zhiling
   improves the efficiency of the pagecache read path

 - "Better split_huge_page_test result check" from Zi Yan improves our
   folio splitting selftest code

 - "test that rmap behaves as expected" from Wei Yang adds some rmap
   selftests

 - "remove write_cache_pages()" from Christoph Hellwig removes that
   function and converts its two remaining callers

 - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD
   selftests issues

 - "introduce kernel file mapped folios" from Boris Burkov introduces
   the concept of "kernel file pages". Using these permits btrfs to
   account its metadata pages to the root cgroup, rather than to the
   cgroups of random inappropriate tasks

 - "mm/pageblock: improve readability of some pageblock handling" from
   Wei Yang provides some readability improvements to the page allocator
   code

 - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON
   to understand arm32 highmem

 - "tools: testing: Use existing atomic.h for vma/maple tests" from
   Brendan Jackman performs some code cleanups and deduplication under
   tools/testing/

 - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes
   a couple of 32-bit issues in tools/testing/radix-tree.c

 - "kasan: unify kasan_enabled() and remove arch-specific
   implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific
   initialization code into a common arch-neutral implementation

 - "mm: remove zpool" from Johannes Weiner removes zspool - an
   indirection layer which now only redirects to a single thing
   (zsmalloc)

 - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a
   couple of cleanups in the fork code

 - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of
   adjustments at various nth_page() callsites, eventually permitting
   the removal of that undesirable helper function

 - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun
   creates a KASAN read-only mode for ARM, using that architecture's
   memory tagging feature. It is felt that a read-only mode KASAN is
   suitable for use in production systems rather than debug-only

 - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does
   some tidying in the hugetlb folio allocation code

 - "mm: establish const-correctness for pointer parameters" from Max
   Kellermann makes quite a number of the MM API functions more accurate
   about the constness of their arguments. This was getting in the way
   of subsystems (in this case CEPH) when they attempt to improving
   their own const/non-const accuracy

 - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of
   code sites which were confused over when to use free_pages() vs
   __free_pages()

 - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the
   mapletree code accessible to Rust. Required by nouveau and by its
   forthcoming successor: the new Rust Nova driver

 - "selftests/mm: split_huge_page_test: split_pte_mapped_thp
   improvements" from David Hildenbrand adds a fix and some cleanups to
   the thp selftesting code

 - "mm, swap: introduce swap table as swap cache (phase I)" from Chris
   Li and Kairui Song is the first step along the path to implementing
   "swap tables" - a new approach to swap allocation and state tracking
   which is expected to yield speed and space improvements. This
   patchset itself yields a 5-20% performance benefit in some situations

 - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc
   layer to clean up the ptdesc code a little

 - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some
   issues in our 5-level pagetable selftesting code

 - "Minor fixes for memory allocation profiling" from Suren Baghdasaryan
   addresses a couple of minor issues in relatively new memory
   allocation profiling feature

 - "Small cleanups" from Matthew Wilcox has a few cleanups in
   preparation for more memdesc work

 - "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from
   Quanmin Yan makes some changes to DAMON in furtherance of supporting
   arm highmem

 - "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad
   Anjum adds that compiler check to selftests code and fixes the
   fallout, by removing dead code

 - "Improvements to Victim Process Thawing and OOM Reaper Traversal
   Order" from zhongjinji makes a number of improvements in the OOM
   killer: mainly thawing a more appropriate group of victim threads so
   they can release resources

 - "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park
   is a bunch of small and unrelated fixups for DAMON

 - "mm/damon: define and use DAMON initialization check function" from
   SeongJae Park implement reliability and maintainability improvements
   to a recently-added bug fix

 - "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from
   SeongJae Park provides additional transparency to userspace clients
   of the DAMON_STAT information

 - "Expand scope of khugepaged anonymous collapse" from Dev Jain removes
   some constraints on khubepaged's collapsing of anon VMAs. It also
   increases the success rate of MADV_COLLAPSE against an anon vma

 - "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()"
   from Lorenzo Stoakes moves us further towards removal of
   file_operations.mmap(). This patchset concentrates upon clearing up
   the treatment of stacked filesystems

 - "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau
   provides some fixes and improvements to mlock's tracking of large
   folios. /proc/meminfo's "Mlocked" field became more accurate

 - "mm/ksm: Fix incorrect accounting of KSM counters during fork" from
   Donet Tom fixes several user-visible KSM stats inaccuracies across
   forks and adds selftest code to verify these counters

 - "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses
   some potential but presently benign issues in KSM's mm_slot handling

* tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits)
  mm: swap: check for stable address space before operating on the VMA
  mm: convert folio_page() back to a macro
  mm/khugepaged: use start_addr/addr for improved readability
  hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
  alloc_tag: fix boot failure due to NULL pointer dereference
  mm: silence data-race in update_hiwater_rss
  mm/memory-failure: don't select MEMORY_ISOLATION
  mm/khugepaged: remove definition of struct khugepaged_mm_slot
  mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL
  hugetlb: increase number of reserving hugepages via cmdline
  selftests/mm: add fork inheritance test for ksm_merging_pages counter
  mm/ksm: fix incorrect KSM counter handling in mm_struct during fork
  drivers/base/node: fix double free in register_one_node()
  mm: remove PMD alignment constraint in execmem_vmalloc()
  mm/memory_hotplug: fix typo 'esecially' -> 'especially'
  mm/rmap: improve mlock tracking for large folios
  mm/filemap: map entire large folio faultaround
  mm/fault: try to map the entire file folio in finish_fault()
  mm/rmap: mlock large folios in try_to_unmap_one()
  mm/rmap: fix a mlock race condition in folio_referenced_one()
  ...
2025-10-02 18:18:33 -07:00
Linus Torvalds
991053178e Merge tag 'pm-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
 "The majority of these are cpufreq changes, which has been a recurring
  pattern for a few recent cycles.

  Those changes include new hardware support (AN7583 SoC support in the
  airoha cpufreq driver, ipq5424 support in the qcom-nvmem cpufreq
  driver, MT8196 support in the mediatek cpufreq driver, AM62D2 support
  in the ti cpufreq driver), DT bindings and Rust code updates, cleanups
  of the core and governors, and multiple driver fixes and cleanups.

  Beyond that, there are hibernation fixes (some remaining 6.16 cycle
  fallout and an issue related to hybrid suspend in the amdgpu driver),
  cleanups of the PM core code, runtime PM documentation update, cpuidle
  and power capping cleanups, and tooling updates.

  Specifics:

   - Rearrange variable declarations involving __free() in the cpufreq
     core and intel_pstate driver to follow common coding style (Rafael
     Wysocki)

   - Fix object lifecycle issue in update_qos_request(), rearrange freq
     QoS updates using __free(), and adjust frequency percentage
     computations in the intel_pstate driver (Rafael Wysocki)

   - Update intel_pstate to allow it to enable HWP without EPP if the
     new DEC (Dynamic Efficiency Control) HW feature is enabled (Rafael
     Wysocki)

   - Use on_each_cpu_mask() in drv_write() in the ACPI cpufreq driver to
     simplify the code (Rafael Wysocki)

   - Use likely() optimization in intel_pstate_sample() (Yaxiong Tian)

   - Remove dead EPB-related code from intel_pstate (Srinivas
     Pandruvada)

   - Use scope-based cleanup for cpufreq policy references in multiple
     cpufreq drivers (Zihuan Zhang)

   - Avoid calling get_governor() for the first policy in the cpufreq
     core to simplify the initial policy path (Zihuan Zhang)

   - Clean up the cpufreq core in multiple places (Zihuan Zhang)

   - Use int type to store negative error codes in the cpufreq core and
     update the speedstep-lib to use int for error codes (Qianfeng Rong)

   - Update the efficient idle check for Intel extended Families in the
     ondemand cpufreq governor (Sohil Mehta)

   - Replace sscanf() with kstrtouint() in the conservative cpufreq
     governor (Kaushlendra Kumar)

   - Rename CpumaskVar::as[_mut]_ref to from_raw[_mut] in the cpumask
     Rust code and mark CpumaskVar as transparent (Alice Ryhl, Baptiste
     Lepers)

   - Update ARef and AlwaysRefCounted imports from sync::aref in the OPP
     Rust code (Shankari Anand)

   - Add support for AN7583 SoC to the airoha cpufreq driver (Christian
     Marangi)

   - Enable cpufreq for ipq5424 in the qcom-nvmem cpufreq driver (Md
     Sadre Alam)

   - Add support for MT8196 to the mediatek-hw cpufreq driver, refactor
     that driver and add mediatek,mt8196-cpufreq-hw DT binding (Nicolas
     Frattaroli)

   - Avoid redundant conditions in the mediatek cpufreq driver (Liao
     Yuanhong)

   - Add support for AM62D2 to the ti cpufreq driver and blocklist
     ti,am62d2 SoC in dt-platdev (Paresh Bhagat)

   - Support more speed grades on AM62Px SoC in the ti cpufreq driver,
     allow all silicon revisions to support OPPs in it, and fix
     supported hardware for 1GHz OPP (Judith Mendez)

   - Add QCS615 compatible to DT bindings for cpufreq-qcom-hw (Taniya
     Das)

   - Minor assorted updates of the scmi, longhaul, CPPC, and armada-37xx
     cpufreq drivers (Akhilesh Patil, BowenYu, Dennis Beier, and Florian
     Fainelli)

   - Remove outdated cpufreq-dt.txt (Frank Li)

   - Fix python gnuplot package names in the amd_pstate_tracer utility
     (Kuan-Wei Chiu)

   - Saravana Kannan will maintain the virtual-cpufreq driver (Saravana
     Kannan)

   - Prevent CPU capacity updates after registering a perf domain from
     failing on a first CPU that is not present (Christian Loehle)

   - Add support for the cases in which frequency alone is not
     sufficient to uniquely identify an OPP (Krishna Chaitanya Chundru)

   - Use to_result() for OPP error handling in Rust (Onur Özkan)

   - Add support for LPDDR5 on Rockhip RK3588 SoC to rockchip-dfi
     devfreq driver (Nicolas Frattaroli)

   - Fix an issue where DDR cycle counts on RK3588/RK3528 with LPDDR4(X)
     are reported as half by adding a cycle multiplier to the DFI driver
     in rockchip-dfi devfreq-event driver (Nicolas Frattaroli)

   - Fix missing error pointer dereference check of regulator instance
     in the mtk-cci devfreq driver probe and remove a redundant
     condition from an if () statement in that driver (Dan Carpenter,
     Liao Yuanhong)

   - Fail cpuidle device registration if there is one already to avoid
     sysfs-related issues (Rafael Wysocki)

   - Use sysfs_emit()/sysfs_emit_at() instead of sprintf()/scnprintf()
     in cpuidle (Vivek Yadav)

   - Fix device and OF node leaks at probe in the qcom-spm cpuidle
     driver and drop unnecessary initialisations from it (Johan Hovold)

   - Remove unnecessary address-of operators from the intel_idle cpuidle
     driver (Kaushlendra Kumar)

   - Rearrange main loop in menu_select() to make the code in that
     funtion easier to follow (Rafael Wysocki)

   - Convert values in microseconds to ktime using us_to_ktime() where
     applicable in the intel_idle power capping driver (Xichao Zhao)

   - Annotate loops walking device links in the power management core
     code as _srcu and add macros for walking device links to reduce the
     likelihood of coding mistakes related to them (Rafael Wysocki)

   - Document time units for *_time functions in the runtime PM API
     (Brian Norris)

   - Clear power.must_resume in noirq suspend error path to avoid
     resuming a dependant device under a suspended parent or supplier
     (Rafael Wysocki)

   - Fix GFP mask handling during hybrid suspend and make the amdgpu
     driver handle hybrid suspend correctly (Mario Limonciello, Rafael
     Wysocki)

   - Fix GFP mask handling after aborted hibernation in platform mode
     and combine exit paths in power_down() to avoid code duplication
     (Rafael Wysocki)

   - Use vmalloc_array() and vcalloc() in the hibernation core to avoid
     open-coded size computations (Qianfeng Rong)

   - Fix typo in hibernation core code comment (Li Jun)

   - Call pm_wakeup_clear() in the same place where other functions that
     do bookkeeping prior to suspend_prepare() are called (Samuel Wu)

   - Fix and clean up the x86_energy_perf_policy utility and update its
     documentation (Len Brown, Kaushlendra Kumar)

   - Fix incorrect sorting of PMT telemetry in turbostat (Kaushlendra
     Kumar)

   - Fix incorrect size in cpuidle_state_disable() and the error return
     value of cpupower_write_sysfs() in cpupower (Kaushlendra Kumar)"

* tag 'pm-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (86 commits)
  PM: hibernate: Combine return paths in power_down()
  PM: hibernate: Restrict GFP mask in power_down()
  PM: hibernate: Fix pm_hibernation_mode_is_suspend() build breakage
  PM: runtime: Documentation: ABI: Document time units for *_time
  tools/power x86_energy_perf_policy.8: Emphasize preference for SW interfaces
  tools/power x86_energy_perf_policy: Add make snapshot target
  tools/power x86_energy_perf_policy: Prefer driver HWP limits
  tools/power x86_energy_perf_policy: EPB access is only via sysfs
  tools/power x86_energy_perf_policy: Prepare for MSR/sysfs refactoring
  tools/power x86_energy_perf_policy: Enhance HWP enable
  tools/power x86_energy_perf_policy: Enhance HWP enabled check
  tools/power x86_energy_perf_policy: Fix incorrect fopen mode usage
  tools/power turbostat: Fix incorrect sorting of PMT telemetry
  drm/amd: Fix hybrid sleep
  PM: hibernate: Add pm_hibernation_mode_is_suspend()
  PM: hibernate: Fix hybrid-sleep
  tools/cpupower: Fix incorrect size in cpuidle_state_disable()
  tools/power/x86/amd_pstate_tracer: Fix python gnuplot package names
  cpufreq: Replace pointer subtraction with iteration macro
  cpuidle: Fail cpuidle device registration if there is one already
  ...
2025-10-01 16:08:37 -07:00
Linus Torvalds
5fb0249319 Merge tag 'pinctrl-v6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control updates from Linus Walleij:
 "We have GPIO awareness in the pin control core and an interesting
  AAEON driver.

  Core changes:

   - Allow pins to be identified/marked as GPIO mode with a special
     callback.

     The pin controller core is now "aware" if a pin is in GPIO mode if
     the callback is implemented in the driver, and can thus be marked
     as "strict", i.e. disallowing simultaneous use of a line as GPIO
     and another function such as I2C.

     This is enabled in the Qualcomm TLMM driver and also implemeted
     from day 1 in the new Broadcom STB driver

   - Rename the pin config option PIN_CONFIG_OUTPUT to PIN_CONFIG_LEVEL
     to better describe what the config is doing, as well as making it
     more intuitive what shall be returned when reading this property

  New drivers:

   - Qualcomm SDM660 LPASS LPI TLMM pin controller subdriver

   - Qualcomm Glymur family pin controller driver

   - Broadcom STB family pin controller driver

   - Tegra186 pin controller driver

   - AAEON UP pin controller support.

     This is some special pin controller that works as an external
     advanced line MUX and amplifier for signals from an Intel SoC. A
     cooperative effort with the GPIO maintainer was needed to reach a
     solution where we reuse code from the GPIO aggregator/forwarder
     driver

   - Renesas RZ/T2H and RZ/N2H pin controller support

   - Axis ARTPEC-8 subdriver for the Samsung pin controller driver

  Improvements:

   - Output enable (OEN) support in the Renesas RZG2L driver

   - Properly support bias pull up/down in the pinctrl-single driver

   - Move over all GPIO portions using generic MMIO GPIO to the new
     generic GPIO chip management which has a nice and separate API

   - Proper DT bindings for some older Broadcom SoCs

   - External GPIO (EGPIO) support in the Qualcomm SM8250

  Deleted code:

   - Dropped the now unused Samsung S3C24xx drivers"

* tag 'pinctrl-v6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (75 commits)
  pinctrl: use more common syntax for compound literals
  pinctrl: Simplify printks with pOF format
  pinctrl: qcom: Add SDM660 LPASS LPI TLMM
  dt-bindings: pinctrl: qcom: Add SDM660 LPI pinctrl
  pinctrl: qcom: lpass-lpi: Add ability to use custom pin offsets
  pinctrl: qcom: Add glymur pinctrl driver
  dt-bindings: pinctrl: qcom: Add Glymur pinctrl
  pinctrl: qcom: sm8250: Add egpio support
  pinctrl: generic: rename PIN_CONFIG_OUTPUT to LEVEL
  pinctrl: keembay: fix double free in keembay_build_functions()
  pinctrl: spacemit: fix typo in PRI_TDI pin name
  pinctrl: eswin: Fix regulator error check and Kconfig dependency
  pinctrl: bcm: Add STB family pin controller driver
  dt-bindings: pinctrl: Add support for Broadcom STB pin controller
  pinctrl: qcom: make the pinmuxing strict
  pinctrl: qcom: mark the `gpio` and `egpio` pins function as non-strict functions
  pinctrl: qcom: add infrastructure for marking pin functions as GPIOs
  pinctrl: allow to mark pin functions as requestable GPIOs
  pinctrl: qcom: use generic pin function helpers
  pinctrl: make struct pinfunction a pointer in struct function_desc
  ...
2025-10-01 13:14:48 -07:00
Linus Torvalds
c252b8cf12 Merge tag 'regmap-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
 "This just contains a few small fixes, there's been no substantial
  development on regmap this release cycle"

* tag 'regmap-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: use int type to store negative error codes
  regmap: Remove superfluous check for !config in __regmap_init()
  regmap: mmio: Add missing MODULE_DESCRIPTION()
2025-10-01 11:41:51 -07:00
Linus Torvalds
eb3289fc47 Merge tag 'driver-core-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core updates from Danilo Krummrich:
 "Auxiliary:
   - Drop call to dev_pm_domain_detach() in auxiliary_bus_probe()
   - Optimize logic of auxiliary_match_id()

  Rust:
   - Auxiliary:
      - Use primitive C types from prelude

   - DebugFs:
      - Add debugfs support for simple read/write files and custom
        callbacks through a File-type-based and directory-scope-based
        API
      - Sample driver code for the File-type-based API
      - Sample module code for the directory-scope-based API

   - I/O:
      - Add io::poll module and implement Rust specific
        read_poll_timeout() helper

   - IRQ:
      - Implement support for threaded and non-threaded device IRQs
        based on (&Device<Bound>, IRQ number) tuples (IrqRequest)
      - Provide &Device<Bound> cookie in IRQ handlers

   - PCI:
      - Support IRQ requests from IRQ vectors for a specific
        pci::Device<Bound>
      - Implement accessors for subsystem IDs, revision, devid and
        resource start
      - Provide dedicated pci::Vendor and pci::Class types for vendor
        and class ID numbers
      - Implement Display to print actual vendor and class names; Debug
        to print the raw ID numbers
      - Add pci::DeviceId::from_class_and_vendor() helper
      - Use primitive C types from prelude
      - Various minor inline and (safety) comment improvements

   - Platform:
      - Support IRQ requests from IRQ vectors for a specific
        platform::Device<Bound>

   - Nova:
      - Use pci::DeviceId::from_class_and_vendor() to avoid probing
        non-display/compute PCI functions

   - Misc:
      - Add helper for cpu_relax()
      - Update ARef import from sync::aref

  sysfs:
   - Remove bin_attrs_new field from struct attribute_group
   - Remove read_new() and write_new() from struct bin_attribute

  Misc:
   - Document potential race condition in get_dev_from_fwnode()
   - Constify node_group argument in software node registration
     functions
   - Fix order of kernel-doc parameters in various functions
   - Set power.no_pm flag for faux devices
   - Set power.no_callbacks flag along with the power.no_pm flag
   - Constify the pmu_bus bus type
   - Minor spelling fixes"

* tag 'driver-core-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (43 commits)
  rust: pci: display symbolic PCI vendor names
  rust: pci: display symbolic PCI class names
  rust: pci: fix incorrect platform reference in PCI driver probe doc comment
  rust: pci: fix incorrect platform reference in PCI driver unbind doc comment
  perf: make pmu_bus const
  samples: rust: Add scoped debugfs sample driver
  rust: debugfs: Add support for scoped directories
  samples: rust: Add debugfs sample driver
  rust: debugfs: Add support for callback-based files
  rust: debugfs: Add support for writable files
  rust: debugfs: Add support for read-only files
  rust: debugfs: Add initial support for directories
  driver core: auxiliary bus: Optimize logic of auxiliary_match_id()
  driver core: auxiliary bus: Drop dev_pm_domain_detach() call
  driver core: Fix order of the kernel-doc parameters
  driver core: get_dev_from_fwnode(): document potential race
  drivers: base: fix "publically"->"publicly"
  driver core/PM: Set power.no_callbacks along with power.no_pm
  driver core: faux: Set power.no_pm for faux devices
  rust: pci: inline several tiny functions
  ...
2025-10-01 08:39:23 -07:00
Linus Torvalds
449c2b302c Merge tag 'vfs-6.18-rc1.async' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs async directory updates from Christian Brauner:
 "This contains further preparatory changes for the asynchronous directory
  locking scheme:

   - Add lookup_one_positive_killable() which allows overlayfs to
     perform lookup that won't block on a fatal signal

   - Unify the mount idmap handling in struct renamedata as a rename can
     only happen within a single mount

   - Introduce kern_path_parent() for audit which sets the path to the
     parent and returns a dentry for the target without holding any
     locks on return

   - Rename kern_path_locked() as it is only used to prepare for the
     removal of an object from the filesystem:

	kern_path_locked()    => start_removing_path()
	kern_path_create()    => start_creating_path()
	user_path_create()    => start_creating_user_path()
	user_path_locked_at() => start_removing_user_path_at()
	done_path_create()    => end_creating_path()
	NA                    => end_removing_path()"

* tag 'vfs-6.18-rc1.async' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  debugfs: rename start_creating() to debugfs_start_creating()
  VFS: rename kern_path_locked() and related functions.
  VFS/audit: introduce kern_path_parent() for audit
  VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
  VFS: discard err2 in filename_create()
  VFS/ovl: add lookup_one_positive_killable()
2025-09-29 11:55:15 -07:00
Rafael J. Wysocki
9a0abc3945 PM: runtime: Add auto-cleanup macros for "resume and get" operations
It is generally useful to be able to automatically drop a device's
runtime PM usage counter incremented by runtime PM operations that
resume a device and bump up its usage counter [1].

To that end, add guard definition macros allowing pm_runtime_put()
and pm_runtime_put_autosuspend() to be used for the auto-cleanup in
those cases.

Simply put, a piece of code like below:

	pm_runtime_get_sync(dev);
	.....
	pm_runtime_put(dev);
	return 0;

can be transformed with guard() like:

	guard(pm_runtime_active)(dev);
	.....
	return 0;

(see the pm_runtime_put() call is gone).

However, it is better to do proper error handling in the majority of
cases, so doing something like this instead of the above is recommended:

	ACQUIRE(pm_runtime_active_try, pm)(dev);
	if (ACQUIRE_ERR(pm_runtime_active_try, &pm))
		return -ENXIO;
	.....
	return 0;

In all of the cases in which runtime PM is known to be enabled for the
given device or the device can be regarded as operational (and so it can
be accessed) with runtime PM disabled, a piece of code like:

	ret = pm_runtime_resume_and_get(dev);
	if (ret < 0)
		return ret;
	.....
	pm_runtime_put(dev);
	return 0;

can be changed as follows:

	ACQUIRE(pm_runtime_active_try, pm)(dev);
	ret = ACQUIRE_ERR(pm_runtime_active_try, &pm);
	if (ret < 0)
		return ret;
	.....
	return 0;

(again, see the pm_runtime_put() call is gone).

Still, if the device cannot be accessed unless runtime PM has been
enabled for it, the pm_runtime_active_try_enabled guard variant
needs to be used, that is (in the context of the example above):

	ACQUIRE(pm_runtime_active_try_enabled, pm)(dev);
	ret = ACQUIRE_ERR(pm_runtime_active_try_enabled, &pm);
	if (ret < 0)
		return ret;
	.....
	return 0;

When the original code calls pm_runtime_put_autosuspend(), use one
of the "auto" guard variants, pm_runtime_active_auto/_try/_enabled,
so for example, a piece of code like:

	ret = pm_runtime_resume_and_get(dev);
	if (ret < 0)
		return ret;
	.....
	pm_runtime_put_autosuspend(dev);
	return 0;

will become:

	ACQUIRE(pm_runtime_active_auto_try_enabled, pm)(dev);
	ret = ACQUIRE_ERR(pm_runtime_active_auto_try_enabled, &pm);
	if (ret < 0)
		return ret;
	.....
	return 0;

Note that the cases in which the return value of pm_runtime_get_sync()
is checked can also be handled with the help of the new guard macros.
For example, a piece of code like:

	ret = pm_runtime_get_sync(dev);
	if (ret < 0) {
		pm_runtime_put(dev);
		return ret;
	}
	.....
	pm_runtime_put(dev);
	return 0;

can be rewritten as:

	ACQUIRE(pm_runtime_active_auto_try_enabled, pm)(dev);
	ret = ACQUIRE_ERR(pm_runtime_active_auto_try_enabled, &pm);
	if (ret < 0)
		return ret;
	.....
	return 0;

or pm_runtime_get_active_try can be used if transparent handling of
disabled runtime PM is desirable.

Link: https://lore.kernel.org/linux-pm/878qimv24u.wl-tiwai@suse.de/ [1]
Link: https://lore.kernel.org/linux-pm/20250926150613.000073a4@huawei.com/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/2238241.irdbgypaU6@rafael.j.wysocki
[ rjw: Fixed leftovers from the previous version in the changelog ]
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-09-29 17:00:21 +02:00
Rafael J. Wysocki
f58f86df6a Merge branches 'pm-core', 'pm-runtime' and 'pm-sleep'
Merge changes related to system sleep and runtime PM framework for
6.18-rc1:

 - Annotate loops walking device links in the power management core
   code as _srcu and add macros for walking device links to reduce the
   likelihood of coding mistakes related to them (Rafael Wysocki)

 - Document time units for *_time functions in the runtime PM API (Brian
   Norris)

 - Clear power.must_resume in noirq suspend error path to avoid resuming
   a dependant device under a suspended parent or supplier (Rafael
   Wysocki)

 - Fix GFP mask handling during hybrid suspend and make the amdgpu
   driver handle hybrid suspend correctly (Mario Limonciello, Rafael
   Wysocki)

 - Fix GFP mask handling after aborted hibernation in platform mode and
   combine exit paths in power_down() to avoid code duplication (Rafael
   Wysocki)

 - Use vmalloc_array() and vcalloc() in the hibernation core to avoid
   open-coded size computations (Qianfeng Rong)

 - Fix typo in hibernation core code comment (Li Jun)

 - Call pm_wakeup_clear() in the same place where other functions that do
   bookkeeping prior to suspend_prepare() are called (Samuel Wu)

* pm-core:
  PM: core: Add two macros for walking device links
  PM: core: Annotate loops walking device links as _srcu

* pm-runtime:
  PM: runtime: Documentation: ABI: Document time units for *_time

* pm-sleep:
  PM: hibernate: Combine return paths in power_down()
  PM: hibernate: Restrict GFP mask in power_down()
  PM: hibernate: Fix pm_hibernation_mode_is_suspend() build breakage
  drm/amd: Fix hybrid sleep
  PM: hibernate: Add pm_hibernation_mode_is_suspend()
  PM: hibernate: Fix hybrid-sleep
  PM: sleep: core: Clear power.must_resume in noirq suspend error path
  PM: sleep: Make pm_wakeup_clear() call more clear
  PM: hibernate: Fix typo in memory bitmaps description comment
  PM: hibernate: Use vmalloc_array() and vcalloc() to improve code
2025-09-29 12:54:01 +02:00
Donet Tom
0efdedfa53 drivers/base/node: fix double free in register_one_node()
When device_register() fails in register_node(), it calls
put_device(&node->dev).  This triggers node_device_release(), which calls
kfree(to_node(dev)), thereby freeing the entire node structure.

As a result, when register_node() returns an error, the node memory has
already been freed.  Calling kfree(node) again in register_one_node()
leads to a double free.

This patch removes the redundant kfree(node) from register_one_node() to
prevent the double free.

Link: https://lkml.kernel.org/r/20250918054144.58980-1-donettom@linux.ibm.com
Fixes: 786eb990cf ("drivers/base/node: handle error properly in register_one_node()")
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Chris Mason <clm@meta.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-28 11:51:31 -07:00
Pin-yen Lin
632d31067b PM: sleep: Do not wait on SYNC_STATE_ONLY device links
Device links with DL_FLAG_SYNC_STATE_ONLY should not affect system
suspend and resume, and functions like device_reorder_to_tail() and
device_link_add() don't try to reorder the consumers with that flag.

However, dpm_wait_for_consumers() and dpm_wait_for_suppliers() don't
check thas flag before triggering dpm_wait(), leading to potential hang
during suspend/resume.

This can be reproduced on MT8186 Corsola Chromebook with devicetree like:

usb-a-connector {
        compatible = "usb-a-connector";
        port {
                usb_a_con: endpoint {
                        remote-endpoint = <&usb_hs>;
                };
        };
};

usb_host {
        compatible = "mediatek,mt8186-xhci", "mediatek,mtk-xhci";
        port {
                usb_hs: endpoint {
                        remote-endpoint = <&usb_a_con>;
                };
        };
};

In this case, the two nodes form a cycle and a SYNC_STATE_ONLY devlink
between usb_host (supplier) and usb-a-connector (consumer) is created.

Address this by exporting device_link_flag_is_sync_state_only() and
making dpm_wait_for_consumers() and dpm_wait_for_suppliers() use it
when deciding if dpm_wait() should be called.

Fixes: 05ef983e0d ("driver core: Add device link support for SYNC_STATE_ONLY flag")
Signed-off-by: Pin-yen Lin <treapking@chromium.org>
Link: https://patch.msgid.link/20250926102320.4053167-1-treapking@chromium.org
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-09-27 14:10:51 +02:00
Brian Norris
d0b8651a02 PM: runtime: Make put{,_sync}() return 1 when already suspended
The pm_runtime.h docs say pm_runtime_put() and pm_runtime_put_sync()
return 1 when already suspended, but this is not true -- they return
-EAGAIN. On the other hand, pm_runtime_put_sync_suspend() and
pm_runtime_put_sync_autosuspend() *do* return 1.

This is an artifact of the fact that the former are built on rpm_idle(),
whereas the latter are built on rpm_suspend().

There are precious few pm_runtime_put()/pm_runtime_put_sync() callers
that check the return code at all, but most of them only log errors, and
usually only for negative error codes. None of them should be treating
this as an error, so:

 * at best, this may fix some case where a driver treats this condition
   as an error, when it shouldn't;

 * at worst, this should make no effect; and

 * somewhere in between, we could potentially clear up non-fatal log
   messages.

Fix the pm_runtime_already_suspended_test() while tweaking the behavior.
The test makes a lot more sense when these all return 1 when the device
is already suspended:

    pm_runtime_put_sync(dev);
    pm_runtime_suspend(dev);
    pm_runtime_autosuspend(dev);
    pm_request_autosuspend(dev);
    pm_runtime_put_sync_autosuspend(dev);

Notably, I've avoided testing the return codes for these, since they
really should be ignored by callers, and we may make them 'void'
altogether:

    pm_runtime_put(dev);
    pm_runtime_put_autosuspend(dev);

Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-09-27 13:41:47 +02:00