Merge tag 'pm-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "As is tradition, cpufreq is the part with the largest number of
  updates that include core fixes and cleanups as well as updates of
  several assorted drivers, but there are also quite a few updates
  related to system sleep, mostly focused on asynchronous suspend and
  resume of devices and on making the integration of system suspend
  and resume with runtime PM easier.

  Runtime PM is also updated to allow some code duplication in drivers
  to be eliminated going forward and to work more consistently overall
  in some cases.

  Apart from that, there are some driver core updates related to PM
  domains that should help to address ordering issues with devm_ cleanup
  routines relying on PM domains, some assorted devfreq updates
  including core fixes and cleanups, tooling updates, and documentation
  and MAINTAINERS updates.

  Specifics:

   - Fix two initialization ordering issues in the cpufreq core and a
     governor initialization error path in it, and clean it up (Lifeng
     Zheng)

   - Add Granite Rapids support in no-HWP mode to the intel_pstate
     cpufreq driver (Li RongQing)

   - Make intel_pstate always use HWP_DESIRED_PERF when operating in the
     passive mode (Rafael Wysocki)

   - Allow building the tegra124 cpufreq driver as a module (Aaron
     Kling)

   - Do minor cleanups for Rust cpufreq and cpumask APIs and fix
     MAINTAINERS entry for cpu.rs (Abhinav Ananthu, Ritvik Gupta, Lukas
     Bulwahn)

   - Clean up assorted cpufreq drivers (Arnd Bergmann, Dan Carpenter,
     Krzysztof Kozlowski, Sven Peter, Svyatoslav Ryhel, Lifeng Zheng)

   - Add the NEED_UPDATE_LIMITS flag to the CPPC cpufreq driver
     (Prashant Malani)

   - Fix minimum performance state label error in the amd-pstate driver
     documentation (Shouye Liu)

   - Add the CPUFREQ_GOV_STRICT_TARGET flag to the userspace cpufreq
     governor and explain HW coordination influence on it in the
     documentation (Shashank Balaji)

   - Fix opencoded for_each_cpu() in idle_state_valid() in the DT
     cpuidle driver (Yury Norov)

   - Remove info about non-existing QoS interfaces from the PM QoS
     documentation (Ulf Hansson)

   - Use c_* types via kernel prelude in Rust for OPP (Abhinav Ananthu)

   - Add HiSilicon uncore frequency scaling driver to devfreq (Jie Zhan)

   - Allow devfreq drivers to add custom sysfs ABIs (Jie Zhan)

   - Simplify the sun8i-a33-mbus devfreq driver by using more devm
     functions (Uwe Kleine-König)

   - Fix an index typo in trans_stat() in devfreq (Chanwoo Choi)

   - Check devfreq governor before using governor->name (Lifeng Zheng)

   - Remove a redundant devfreq_get_freq_range() call from
     devfreq_add_device() (Lifeng Zheng)

   - Limit max_freq with scaling_min_freq in devfreq (Lifeng Zheng)

   - Replace sscanf() with kstrtoul() in set_freq_store() (Lifeng Zheng)

   - Extend the asynchronous suspend and resume of devices to handle
     suppliers like parents and consumers like children (Rafael Wysocki)

   - Make pm_runtime_force_resume() work for drivers that set the
     DPM_FLAG_SMART_SUSPEND flag and allow PCI drivers and drivers that
     collaborate with the general ACPI PM domain to set it (Rafael
     Wysocki)

   - Add kernel parameter to disable asynchronous suspend/resume of
     devices (Tudor Ambarus)

   - Drop redundant might_sleep() calls from some functions in the
     device suspend/resume core code (Zhongqiu Han)

   - Fix the handling of monitors connected right before waking up the
     system from sleep (tuhaowen)

   - Clean up MAINTAINERS entries for suspend and hibernation (Rafael
     Wysocki)

   - Fix error code path in the KEXEC_JUMP flow and drop a redundant
     pm_restore_gfp_mask() call from it (Rafael Wysocki)

   - Rearrange suspend/resume error handling in the core device suspend
     and resume code (Rafael Wysocki)

   - Fix up white space that does not follow coding style in the
     hibernation core code (Darshan Rathod)

   - Document return values of suspend-related API functions in the
     runtime PM framework (Sakari Ailus)

   - Mark last busy stamp in multiple autosuspend-related functions in
     the runtime PM framework and update its documentation (Sakari
     Ailus)

   - Take active children into account in pm_runtime_get_if_in_use() for
     consistency (Rafael Wysocki)

   - Fix NULL pointer dereference in get_pd_power_uw() in the dtpm_cpu
     power capping driver (Sivan Zohar-Kotzer)

   - Add support for the Bartlett Lake platform to the Intel RAPL power
     capping driver (Qiao Wei)

   - Add PL4 support for Panther Lake to the intel_rapl_msr power
     capping driver (Zhang Rui)

   - Update contact information in the PM ABI docs and maintainer
     information in the power domains DT binding (Rafael Wysocki)

   - Update PM header inclusions to follow the IWYU (Include What You
     Use) principle (Andy Shevchenko)

   - Add flags to specify power on attach/detach for PM domains, make
     the driver core detach PM domains in device_unbind_cleanup(), and
     drop the dev_pm_domain_detach() call from the platform bus type
     (Claudiu Beznea)

   - Improve Python binding's Makefile for cpupower (John B. Wyatt IV)

   - Fix printing of CORE, CPU fields in cpupower-monitor (Gautham
     Shenoy)"

* tag 'pm-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (75 commits)
  cpufreq: CPPC: Mark driver with NEED_UPDATE_LIMITS flag
  PM: docs: Use my kernel.org address in ABI docs and DT bindings
  PM: hibernate: Fix up white space that does not follow coding style
  PM: sleep: Rearrange suspend/resume error handling in the core
  Documentation: amd-pstate:fix minimum performance state label error
  PM: runtime: Take active children into account in pm_runtime_get_if_in_use()
  kexec_core: Drop redundant pm_restore_gfp_mask() call
  kexec_core: Fix error code path in the KEXEC_JUMP flow
  PM: sleep: Clean up MAINTAINERS entries for suspend and hibernation
  drivers: cpufreq: add Tegra114 support
  rust: cpumask: Replace `MaybeUninit` and `mem::zeroed` with `Opaque` APIs
  cpufreq: Exit governor when failed to start old governor
  cpufreq: Move the check of cpufreq_driver->get into cpufreq_verify_current_freq()
  cpufreq: Init policy->rwsem before it may be possibly used
  cpufreq: Initialize cpufreq-based frequency-invariance later
  cpufreq: Remove duplicate check in __cpufreq_offline()
  cpufreq: Contain scaling_cur_freq.attr in cpufreq_attrs
  cpufreq: intel_pstate: Add Granite Rapids support in no-HWP mode
  cpufreq: intel_pstate: Always use HWP_DESIRED_PERF in passive mode
  PM / devfreq: Add HiSilicon uncore frequency scaling driver
  ...
64 changed files with 1405 additions and 457 deletions

@@ -132,3 +132,12 @@ Description:
A list of governors that support the node:
- simple_ondemand
What: /sys/class/devfreq/.../related_cpus
Date: June 2025
Contact: Linux power management list <linux-pm@vger.kernel.org>
Description: The list of CPUs whose performance is closely related to the
frequency of this devfreq domain.
This file is only present if a specific devfreq device is
closely associated with a subset of CPUs.


@@ -1,6 +1,6 @@
What: /sys/devices/.../power/
Date: January 2009
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/devices/.../power directory contains attributes
allowing the user space to check and modify some power
@@ -8,7 +8,7 @@ Description:
What: /sys/devices/.../power/wakeup
Date: January 2009
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/devices/.../power/wakeup attribute allows the user
space to check if the device is enabled to wake up the system
@@ -34,7 +34,7 @@ Description:
What: /sys/devices/.../power/control
Date: January 2009
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/devices/.../power/control attribute allows the user
space to control the run-time power management of the device.
@@ -53,7 +53,7 @@ Description:
What: /sys/devices/.../power/async
Date: January 2009
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/devices/.../async attribute allows the user space to
enable or disable the device's suspend and resume callbacks to
@@ -79,7 +79,7 @@ Description:
What: /sys/devices/.../power/wakeup_count
Date: September 2010
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/devices/.../wakeup_count attribute contains the number
of signaled wakeup events associated with the device. This
@@ -90,7 +90,7 @@ Description:
What: /sys/devices/.../power/wakeup_active_count
Date: September 2010
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/devices/.../wakeup_active_count attribute contains the
number of times the processing of wakeup events associated with
@@ -102,7 +102,7 @@ Description:
What: /sys/devices/.../power/wakeup_abort_count
Date: February 2012
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/devices/.../wakeup_abort_count attribute contains the
number of times the processing of a wakeup event associated with
@@ -114,7 +114,7 @@ Description:
What: /sys/devices/.../power/wakeup_expire_count
Date: February 2012
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/devices/.../wakeup_expire_count attribute contains the
number of times a wakeup event associated with the device has
@@ -126,7 +126,7 @@ Description:
What: /sys/devices/.../power/wakeup_active
Date: September 2010
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/devices/.../wakeup_active attribute contains either 1,
or 0, depending on whether or not a wakeup event associated with
@@ -138,7 +138,7 @@ Description:
What: /sys/devices/.../power/wakeup_total_time_ms
Date: September 2010
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/devices/.../wakeup_total_time_ms attribute contains
the total time of processing wakeup events associated with the
@@ -149,7 +149,7 @@ Description:
What: /sys/devices/.../power/wakeup_max_time_ms
Date: September 2010
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/devices/.../wakeup_max_time_ms attribute contains
the maximum time of processing a single wakeup event associated
@@ -161,7 +161,7 @@ Description:
What: /sys/devices/.../power/wakeup_last_time_ms
Date: September 2010
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/devices/.../wakeup_last_time_ms attribute contains
the value of the monotonic clock corresponding to the time of
@@ -173,7 +173,7 @@ Description:
What: /sys/devices/.../power/wakeup_prevent_sleep_time_ms
Date: February 2012
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/devices/.../wakeup_prevent_sleep_time_ms attribute
contains the total time the device has been preventing
@@ -203,7 +203,7 @@ Description:
What: /sys/devices/.../power/pm_qos_resume_latency_us
Date: March 2012
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/devices/.../power/pm_qos_resume_latency_us attribute
contains the PM QoS resume latency limit for the given device,
@@ -223,7 +223,7 @@ Description:
What: /sys/devices/.../power/pm_qos_latency_tolerance_us
Date: January 2014
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/devices/.../power/pm_qos_latency_tolerance_us attribute
contains the PM QoS active state latency tolerance limit for the
@@ -248,7 +248,7 @@ Description:
What: /sys/devices/.../power/pm_qos_no_power_off
Date: September 2012
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/devices/.../power/pm_qos_no_power_off attribute
is used for manipulating the PM QoS "no power off" flag. If
@@ -263,7 +263,7 @@ Description:
What: /sys/devices/.../power/runtime_status
Date: April 2010
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/devices/.../power/runtime_status attribute contains
the current runtime PM status of the device, which may be


@@ -1,6 +1,6 @@
What: /sys/power/
Date: August 2006
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/power directory will contain files that will
provide a unified interface to the power management
@@ -8,7 +8,7 @@ Description:
What: /sys/power/state
Date: November 2016
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/power/state file controls system sleep states.
Reading from this file returns the available sleep state
@@ -23,7 +23,7 @@ Description:
What: /sys/power/mem_sleep
Date: November 2016
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/power/mem_sleep file controls the operating mode of
system suspend. Reading from it returns the available modes
@@ -41,7 +41,7 @@ Description:
What: /sys/power/disk
Date: September 2006
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/power/disk file controls the operating mode of the
suspend-to-disk mechanism. Reading from this file returns
@@ -90,7 +90,7 @@ Description:
What: /sys/power/image_size
Date: August 2006
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/power/image_size file controls the size of the image
created by the suspend-to-disk mechanism. It can be written a
@@ -107,7 +107,7 @@ Description:
What: /sys/power/pm_trace
Date: August 2006
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/power/pm_trace file controls the code which saves the
last PM event point in the RTC across reboots, so that you can
@@ -156,7 +156,7 @@ Description:
What: /sys/power/pm_async
Date: January 2009
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/power/pm_async file controls the switch allowing the
user space to enable or disable asynchronous suspend and resume
@@ -169,7 +169,7 @@ Description:
What: /sys/power/wakeup_count
Date: July 2010
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/power/wakeup_count file allows user space to put the
system into a sleep state while taking into account the
@@ -184,7 +184,7 @@ Description:
What: /sys/power/reserved_size
Date: May 2011
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/power/reserved_size file allows user space to control
the amount of memory reserved for allocations made by device
@@ -198,7 +198,7 @@ Description:
What: /sys/power/autosleep
Date: April 2012
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/power/autosleep file can be written one of the strings
returned by reads from /sys/power/state. If that happens, a
@@ -215,7 +215,7 @@ Description:
What: /sys/power/wake_lock
Date: February 2012
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/power/wake_lock file allows user space to create
wakeup source objects and activate them on demand (if one of
@@ -242,7 +242,7 @@ Description:
What: /sys/power/wake_unlock
Date: February 2012
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/power/wake_unlock file allows user space to deactivate
wakeup sources created with the help of /sys/power/wake_lock.
@@ -283,7 +283,7 @@ Description:
What: /sys/power/pm_debug_messages
Date: July 2017
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Contact: Rafael J. Wysocki <rafael@kernel.org>
Description:
The /sys/power/pm_debug_messages file controls the printing
of debug messages from the system suspend/hibernation


@@ -5000,6 +5000,18 @@
that number, otherwise (e.g., 'pmu_override=on'), MMCR1
remains 0.
pm_async= [PM]
Format: off
This parameter sets the initial value of the
/sys/power/pm_async sysfs knob at boot time.
If set to "off", disables asynchronous suspend and
resume of devices during system-wide power transitions.
This can be useful on platforms where device
dependencies are not well-defined, or for debugging
power management issues. Asynchronous operations are
enabled by default.
pm_debug_messages [SUSPEND,KNL]
Enable suspend/resume debug messages during boot up.


@@ -72,7 +72,7 @@ to manage each performance update behavior. ::
Lowest non- | | | |
linear perf ------>+-----------------------+ +-----------------------+
| | | |
| | Lowest perf ---->| |
| | Min perf ---->| |
| | | |
Lowest perf ------>+-----------------------+ +-----------------------+
| | | |


@@ -398,7 +398,9 @@ policy limits change after that.
This governor does not do anything by itself. Instead, it allows user space
to set the CPU frequency for the policy it is attached to by writing to the
``scaling_setspeed`` attribute of that policy.
``scaling_setspeed`` attribute of that policy. Though the intention may be to
set an exact frequency for the policy, the actual frequency may vary depending
on hardware coordination, thermal and power limits, and other factors.
``schedutil``
-------------

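For illustration only, a minimal user-space sketch of driving the userspace governor through scaling_setspeed as described above; the CPU number, sysfs path and frequency value are assumptions, and, per the updated documentation, the value written is a request that the hardware may not honor exactly.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Hypothetical policy and frequency, chosen for the example only. */
	const char *path = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed";
	const char *freq_khz = "1200000";
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open scaling_setspeed");
		return 1;
	}
	if (write(fd, freq_khz, strlen(freq_khz)) < 0)
		perror("write scaling_setspeed");
	close(fd);
	return 0;
}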

@@ -7,7 +7,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
title: Generic PM domains
maintainers:
- Rafael J. Wysocki <rjw@rjwysocki.net>
- Rafael J. Wysocki <rafael@kernel.org>
- Kevin Hilman <khilman@kernel.org>
- Ulf Hansson <ulf.hansson@linaro.org>


@@ -52,13 +52,6 @@ int cpu_latency_qos_request_active(handle):
Returns if the request is still active, i.e. it has not been removed from the
CPU latency QoS list.
int cpu_latency_qos_add_notifier(notifier):
Adds a notification callback function to the CPU latency QoS. The callback is
called when the aggregated value for the CPU latency QoS is changed.
int cpu_latency_qos_remove_notifier(notifier):
Removes the notification callback function from the CPU latency QoS.
From user space:


@@ -154,11 +154,9 @@ suspending the device are satisfied) and to queue up a suspend request for the
device in that case. If there is no idle callback, or if the callback returns
0, then the PM core will attempt to carry out a runtime suspend of the device,
also respecting devices configured for autosuspend. In essence this means a
call to pm_runtime_autosuspend() (do note that drivers needs to update the
device last busy mark, pm_runtime_mark_last_busy(), to control the delay under
this circumstance). To prevent this (for example, if the callback routine has
started a delayed suspend), the routine must return a non-zero value. Negative
error return codes are ignored by the PM core.
call to pm_runtime_autosuspend(). To prevent this (for example, if the callback
routine has started a delayed suspend), the routine must return a non-zero
value. Negative error return codes are ignored by the PM core.
The helper functions provided by the PM core, described in Section 4, guarantee
that the following constraints are met with respect to runtime PM callbacks for
@@ -330,10 +328,9 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
'power.disable_depth' is different from 0
`int pm_runtime_autosuspend(struct device *dev);`
- same as pm_runtime_suspend() except that the autosuspend delay is taken
into account; if pm_runtime_autosuspend_expiration() says the delay has
not yet expired then an autosuspend is scheduled for the appropriate time
and 0 is returned
- same as pm_runtime_suspend() except that a call to
pm_runtime_mark_last_busy() is made and an autosuspend is scheduled for
the appropriate time and 0 is returned
`int pm_runtime_resume(struct device *dev);`
- execute the subsystem-level resume callback for the device; returns 0 on
@@ -357,9 +354,9 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
success or error code if the request has not been queued up
`int pm_request_autosuspend(struct device *dev);`
- schedule the execution of the subsystem-level suspend callback for the
device when the autosuspend delay has expired; if the delay has already
expired then the work item is queued up immediately
- Call pm_runtime_mark_last_busy() and schedule the execution of the
subsystem-level suspend callback for the device when the autosuspend delay
expires
`int pm_schedule_suspend(struct device *dev, unsigned int delay);`
- schedule the execution of the subsystem-level suspend callback for the
@@ -411,8 +408,9 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
pm_request_idle(dev) and return its result
`int pm_runtime_put_autosuspend(struct device *dev);`
- does the same as __pm_runtime_put_autosuspend() for now, but in the
future, will also call pm_runtime_mark_last_busy() as well, DO NOT USE!
- set the power.last_busy field to the current time and decrement the
device's usage counter; if the result is 0 then run
pm_request_autosuspend(dev) and return its result
`int __pm_runtime_put_autosuspend(struct device *dev);`
- decrement the device's usage counter; if the result is 0 then run
@@ -427,7 +425,8 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
pm_runtime_suspend(dev) and return its result
`int pm_runtime_put_sync_autosuspend(struct device *dev);`
- decrement the device's usage counter; if the result is 0 then run
- set the power.last_busy field to the current time and decrement the
device's usage counter; if the result is 0 then run
pm_runtime_autosuspend(dev) and return its result
`void pm_runtime_enable(struct device *dev);`
@@ -870,11 +869,9 @@ device is automatically suspended (the subsystem or driver still has to call
the appropriate PM routines); rather it means that runtime suspends will
automatically be delayed until the desired period of inactivity has elapsed.
Inactivity is determined based on the power.last_busy field. Drivers should
call pm_runtime_mark_last_busy() to update this field after carrying out I/O,
typically just before calling __pm_runtime_put_autosuspend(). The desired
length of the inactivity period is a matter of policy. Subsystems can set this
length initially by calling pm_runtime_set_autosuspend_delay(), but after device
Inactivity is determined based on the power.last_busy field. The desired length
of the inactivity period is a matter of policy. Subsystems can set this length
initially by calling pm_runtime_set_autosuspend_delay(), but after device
registration the length should be controlled by user space, using the
/sys/devices/.../power/autosuspend_delay_ms attribute.
@@ -885,12 +882,13 @@ instead of the non-autosuspend counterparts::
Instead of: pm_runtime_suspend use: pm_runtime_autosuspend;
Instead of: pm_schedule_suspend use: pm_request_autosuspend;
Instead of: pm_runtime_put use: __pm_runtime_put_autosuspend;
Instead of: pm_runtime_put use: pm_runtime_put_autosuspend;
Instead of: pm_runtime_put_sync use: pm_runtime_put_sync_autosuspend.
Drivers may also continue to use the non-autosuspend helper functions; they
will behave normally, which means sometimes taking the autosuspend delay into
account (see pm_runtime_idle).
account (see pm_runtime_idle). The autosuspend variants of the functions also
call pm_runtime_mark_last_busy().
Under some circumstances a driver or subsystem may want to prevent a device
from autosuspending immediately, even though the usage counter is zero and the
@@ -922,12 +920,10 @@ Here is a schematic pseudo-code example::
foo_io_completion(struct foo_priv *foo, void *req)
{
lock(&foo->private_lock);
if (--foo->num_pending_requests == 0) {
pm_runtime_mark_last_busy(&foo->dev);
__pm_runtime_put_autosuspend(&foo->dev);
} else {
if (--foo->num_pending_requests == 0)
pm_runtime_put_autosuspend(&foo->dev);
else
foo_process_next_request(foo);
}
unlock(&foo->private_lock);
/* Send req result back to the user ... */
}

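A short sketch, assuming a hypothetical "foo" driver, of the autosuspend setup the updated text describes; with this series the *_autosuspend helpers update power.last_busy themselves, so an explicit pm_runtime_mark_last_busy() before the put is no longer required.

#include <linux/pm_runtime.h>

static void foo_pm_setup(struct device *dev)
{
	pm_runtime_set_autosuspend_delay(dev, 2000);	/* ms, a policy choice */
	pm_runtime_use_autosuspend(dev);
	pm_runtime_enable(dev);
}

static int foo_do_io(struct device *dev)
{
	int ret = pm_runtime_resume_and_get(dev);

	if (ret < 0)
		return ret;

	/* ... carry out the I/O ... */

	pm_runtime_put_autosuspend(dev);	/* also marks the last-busy time */
	return 0;
}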

@@ -6257,7 +6257,7 @@ F: include/linux/cpuhotplug.h
F: include/linux/smpboot.h
F: kernel/cpu.c
F: kernel/smpboot.*
F: rust/helper/cpu.c
F: rust/helpers/cpu.c
F: rust/kernel/cpu.rs
CPU IDLE TIME MANAGEMENT FRAMEWORK
@@ -9789,7 +9789,7 @@ F: fs/freevxfs/
FREEZER
M: "Rafael J. Wysocki" <rafael@kernel.org>
M: Pavel Machek <pavel@kernel.org>
R: Pavel Machek <pavel@kernel.org>
L: linux-pm@vger.kernel.org
S: Supported
F: Documentation/power/freezing-of-tasks.rst
@@ -10663,7 +10663,7 @@ F: drivers/video/fbdev/hgafb.c
HIBERNATION (aka Software Suspend, aka swsusp)
M: "Rafael J. Wysocki" <rafael@kernel.org>
M: Pavel Machek <pavel@kernel.org>
R: Pavel Machek <pavel@kernel.org>
L: linux-pm@vger.kernel.org
S: Supported
B: https://bugzilla.kernel.org
@@ -23938,8 +23938,8 @@ F: drivers/sh/
SUSPEND TO RAM
M: "Rafael J. Wysocki" <rafael@kernel.org>
M: Len Brown <len.brown@intel.com>
M: Pavel Machek <pavel@kernel.org>
R: Len Brown <lenb@kernel.org>
R: Pavel Machek <pavel@kernel.org>
L: linux-pm@vger.kernel.org
S: Supported
B: https://bugzilla.kernel.org


@@ -1119,6 +1119,8 @@ int acpi_subsys_prepare(struct device *dev)
{
struct acpi_device *adev = ACPI_COMPANION(dev);
dev_pm_set_strict_midlayer(dev, true);
if (dev->driver && dev->driver->pm && dev->driver->pm->prepare) {
int ret = dev->driver->pm->prepare(dev);
@@ -1147,6 +1149,8 @@ void acpi_subsys_complete(struct device *dev)
*/
if (pm_runtime_suspended(dev) && pm_resume_via_firmware())
pm_request_resume(dev);
dev_pm_set_strict_midlayer(dev, false);
}
EXPORT_SYMBOL_GPL(acpi_subsys_complete);

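As a rough sketch only: another middle layer could mirror the ACPI change above by bracketing the system transition with dev_pm_set_strict_midlayer(). The "foo" PM domain callbacks below are hypothetical and the header providing the new helper is an assumption.

#include <linux/pm.h>
#include <linux/pm_runtime.h>	/* assumed home of dev_pm_set_strict_midlayer() */

static int foo_domain_prepare(struct device *dev)
{
	/*
	 * Ask the runtime PM core to invoke the driver callbacks directly
	 * from pm_runtime_force_suspend()/pm_runtime_force_resume() instead
	 * of going through this middle layer.
	 */
	dev_pm_set_strict_midlayer(dev, true);

	return pm_generic_prepare(dev);
}

static void foo_domain_complete(struct device *dev)
{
	pm_generic_complete(dev);

	dev_pm_set_strict_midlayer(dev, false);
}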

@@ -138,7 +138,7 @@ static int amba_read_periphid(struct amba_device *dev)
void __iomem *tmp;
int i, ret;
ret = dev_pm_domain_attach(&dev->dev, true);
ret = dev_pm_domain_attach(&dev->dev, PD_FLAG_ATTACH_POWER_ON);
if (ret) {
dev_dbg(&dev->dev, "can't get PM domain: %d\n", ret);
goto err_out;
@@ -291,7 +291,7 @@ static int amba_probe(struct device *dev)
if (ret < 0)
break;
ret = dev_pm_domain_attach(dev, true);
ret = dev_pm_domain_attach(dev, PD_FLAG_ATTACH_POWER_ON);
if (ret)
break;


@@ -217,7 +217,7 @@ static int auxiliary_bus_probe(struct device *dev)
struct auxiliary_device *auxdev = to_auxiliary_dev(dev);
int ret;
ret = dev_pm_domain_attach(dev, true);
ret = dev_pm_domain_attach(dev, PD_FLAG_ATTACH_POWER_ON);
if (ret) {
dev_warn(dev, "Failed to attach to PM Domain : %d\n", ret);
return ret;


@@ -25,6 +25,7 @@
#include <linux/kthread.h>
#include <linux/wait.h>
#include <linux/async.h>
#include <linux/pm_domain.h>
#include <linux/pm_runtime.h>
#include <linux/pinctrl/devinfo.h>
#include <linux/slab.h>
@@ -552,6 +553,7 @@ static void device_unbind_cleanup(struct device *dev)
dev->dma_range_map = NULL;
device_set_driver(dev, NULL);
dev_set_drvdata(dev, NULL);
dev_pm_domain_detach(dev, dev->power.detach_power_off);
if (dev->pm_domain && dev->pm_domain->dismiss)
dev->pm_domain->dismiss(dev);
pm_runtime_reinit(dev);


@@ -1396,15 +1396,13 @@ static int platform_probe(struct device *_dev)
if (ret < 0)
return ret;
ret = dev_pm_domain_attach(_dev, true);
ret = dev_pm_domain_attach(_dev, PD_FLAG_ATTACH_POWER_ON |
PD_FLAG_DETACH_POWER_OFF);
if (ret)
goto out;
if (drv->probe) {
if (drv->probe)
ret = drv->probe(dev);
if (ret)
dev_pm_domain_detach(_dev, true);
}
out:
if (drv->prevent_deferred_probe && ret == -EPROBE_DEFER) {
@@ -1422,7 +1420,6 @@ static void platform_remove(struct device *_dev)
if (drv->remove)
drv->remove(dev);
dev_pm_domain_detach(_dev, true);
}
static void platform_shutdown(struct device *_dev)


@@ -83,7 +83,7 @@ EXPORT_SYMBOL_GPL(dev_pm_put_subsys_data);
/**
* dev_pm_domain_attach - Attach a device to its PM domain.
* @dev: Device to attach.
* @power_on: Used to indicate whether we should power on the device.
* @flags: indicate whether we should power on/off the device on attach/detach
*
* The @dev may only be attached to a single PM domain. By iterating through
* the available alternatives we try to find a valid PM domain for the device.
@@ -100,17 +100,20 @@ EXPORT_SYMBOL_GPL(dev_pm_put_subsys_data);
* Returns 0 on successfully attached PM domain, or when it is found that the
* device doesn't need a PM domain, else a negative error code.
*/
int dev_pm_domain_attach(struct device *dev, bool power_on)
int dev_pm_domain_attach(struct device *dev, u32 flags)
{
int ret;
if (dev->pm_domain)
return 0;
ret = acpi_dev_pm_attach(dev, power_on);
ret = acpi_dev_pm_attach(dev, !!(flags & PD_FLAG_ATTACH_POWER_ON));
if (!ret)
ret = genpd_dev_pm_attach(dev);
if (dev->pm_domain)
dev->power.detach_power_off = !!(flags & PD_FLAG_DETACH_POWER_OFF);
return ret < 0 ? ret : 0;
}
EXPORT_SYMBOL_GPL(dev_pm_domain_attach);

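A minimal sketch of a bus ->probe() path using the flags-based dev_pm_domain_attach() defined above; "foo_bus_probe" is hypothetical. With PD_FLAG_DETACH_POWER_OFF the driver core's device_unbind_cleanup() now performs the detach, so no explicit dev_pm_domain_detach() is needed on the error path.

#include <linux/device.h>
#include <linux/pm_domain.h>

static int foo_bus_probe(struct device *dev)
{
	int ret;

	/*
	 * Power the PM domain on for probe and ask for it to be powered
	 * off again when the device is detached on unbind.
	 */
	ret = dev_pm_domain_attach(dev, PD_FLAG_ATTACH_POWER_ON |
					PD_FLAG_DETACH_POWER_OFF);
	if (ret)
		return ret;

	/* ... invoke the device driver's probe routine here ... */

	return 0;
}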

@@ -647,14 +647,27 @@ static void dpm_async_resume_children(struct device *dev, async_func_t func)
/*
* Start processing "async" children of the device unless it's been
* started already for them.
*
* This could have been done for the device's "async" consumers too, but
* they either need to wait for their parents or the processing has
* already started for them after their parents were processed.
*/
device_for_each_child(dev, func, dpm_async_with_cleanup);
}
static void dpm_async_resume_subordinate(struct device *dev, async_func_t func)
{
struct device_link *link;
int idx;
dpm_async_resume_children(dev, func);
idx = device_links_read_lock();
/* Start processing the device's "async" consumers. */
list_for_each_entry_rcu(link, &dev->links.consumers, s_node)
if (READ_ONCE(link->status) != DL_STATE_DORMANT)
dpm_async_with_cleanup(link->consumer, func);
device_links_read_unlock(idx);
}
static void dpm_clear_async_state(struct device *dev)
{
reinit_completion(&dev->power.completion);
@@ -663,7 +676,14 @@ static void dpm_clear_async_state(struct device *dev)
static bool dpm_root_device(struct device *dev)
{
return !dev->parent;
lockdep_assert_held(&dpm_list_mtx);
/*
* Since this function is required to run under dpm_list_mtx, the
* list_empty() below will only return true if the device's list of
* consumers is actually empty before calling it.
*/
return !dev->parent && list_empty(&dev->links.suppliers);
}
static void async_resume_noirq(void *data, async_cookie_t cookie);
@@ -747,12 +767,12 @@ static void device_resume_noirq(struct device *dev, pm_message_t state, bool asy
TRACE_RESUME(error);
if (error) {
async_error = error;
WRITE_ONCE(async_error, error);
dpm_save_failed_dev(dev_name(dev));
pm_dev_err(dev, state, async ? " async noirq" : " noirq", error);
}
dpm_async_resume_children(dev, async_resume_noirq);
dpm_async_resume_subordinate(dev, async_resume_noirq);
}
static void async_resume_noirq(void *data, async_cookie_t cookie)
@@ -804,7 +824,7 @@ static void dpm_noirq_resume_devices(pm_message_t state)
mutex_unlock(&dpm_list_mtx);
async_synchronize_full();
dpm_show_time(starttime, state, 0, "noirq");
if (async_error)
if (READ_ONCE(async_error))
dpm_save_failed_step(SUSPEND_RESUME_NOIRQ);
trace_suspend_resume(TPS("dpm_resume_noirq"), state.event, false);
@@ -890,12 +910,12 @@ static void device_resume_early(struct device *dev, pm_message_t state, bool asy
complete_all(&dev->power.completion);
if (error) {
async_error = error;
WRITE_ONCE(async_error, error);
dpm_save_failed_dev(dev_name(dev));
pm_dev_err(dev, state, async ? " async early" : " early", error);
}
dpm_async_resume_children(dev, async_resume_early);
dpm_async_resume_subordinate(dev, async_resume_early);
}
static void async_resume_early(void *data, async_cookie_t cookie)
@@ -951,7 +971,7 @@ void dpm_resume_early(pm_message_t state)
mutex_unlock(&dpm_list_mtx);
async_synchronize_full();
dpm_show_time(starttime, state, 0, "early");
if (async_error)
if (READ_ONCE(async_error))
dpm_save_failed_step(SUSPEND_RESUME_EARLY);
trace_suspend_resume(TPS("dpm_resume_early"), state.event, false);
@@ -1066,12 +1086,12 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
TRACE_RESUME(error);
if (error) {
async_error = error;
WRITE_ONCE(async_error, error);
dpm_save_failed_dev(dev_name(dev));
pm_dev_err(dev, state, async ? " async" : "", error);
}
dpm_async_resume_children(dev, async_resume);
dpm_async_resume_subordinate(dev, async_resume);
}
static void async_resume(void *data, async_cookie_t cookie)
@@ -1095,7 +1115,6 @@ void dpm_resume(pm_message_t state)
ktime_t starttime = ktime_get();
trace_suspend_resume(TPS("dpm_resume"), state.event, true);
might_sleep();
pm_transition = state;
async_error = 0;
@@ -1131,7 +1150,7 @@ void dpm_resume(pm_message_t state)
mutex_unlock(&dpm_list_mtx);
async_synchronize_full();
dpm_show_time(starttime, state, 0, NULL);
if (async_error)
if (READ_ONCE(async_error))
dpm_save_failed_step(SUSPEND_RESUME);
cpufreq_resume();
@@ -1198,7 +1217,6 @@ void dpm_complete(pm_message_t state)
struct list_head list;
trace_suspend_resume(TPS("dpm_complete"), state.event, true);
might_sleep();
INIT_LIST_HEAD(&list);
mutex_lock(&dpm_list_mtx);
@@ -1258,10 +1276,15 @@ static bool dpm_leaf_device(struct device *dev)
return false;
}
return true;
/*
* Since this function is required to run under dpm_list_mtx, the
* list_empty() below will only return true if the device's list of
* consumers is actually empty before calling it.
*/
return list_empty(&dev->links.consumers);
}
static void dpm_async_suspend_parent(struct device *dev, async_func_t func)
static bool dpm_async_suspend_parent(struct device *dev, async_func_t func)
{
guard(mutex)(&dpm_list_mtx);
@@ -1273,11 +1296,31 @@ static void dpm_async_suspend_parent(struct device *dev, async_func_t func)
* deleted before it.
*/
if (!device_pm_initialized(dev))
return;
return false;
/* Start processing the device's parent if it is "async". */
if (dev->parent)
dpm_async_with_cleanup(dev->parent, func);
return true;
}
static void dpm_async_suspend_superior(struct device *dev, async_func_t func)
{
struct device_link *link;
int idx;
if (!dpm_async_suspend_parent(dev, func))
return;
idx = device_links_read_lock();
/* Start processing the device's "async" suppliers. */
list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
if (READ_ONCE(link->status) != DL_STATE_DORMANT)
dpm_async_with_cleanup(link->supplier, func);
device_links_read_unlock(idx);
}
static void dpm_async_suspend_complete_all(struct list_head *device_list)
@@ -1344,7 +1387,7 @@ static void async_suspend_noirq(void *data, async_cookie_t cookie);
* The driver of @dev will not receive interrupts while this function is being
* executed.
*/
static int device_suspend_noirq(struct device *dev, pm_message_t state, bool async)
static void device_suspend_noirq(struct device *dev, pm_message_t state, bool async)
{
pm_callback_t callback = NULL;
const char *info = NULL;
@@ -1355,7 +1398,7 @@ static int device_suspend_noirq(struct device *dev, pm_message_t state, bool asy
dpm_wait_for_subordinate(dev, async);
if (async_error)
if (READ_ONCE(async_error))
goto Complete;
if (dev->power.syscore || dev->power.direct_complete)
@@ -1388,7 +1431,7 @@ static int device_suspend_noirq(struct device *dev, pm_message_t state, bool asy
Run:
error = dpm_run_callback(callback, dev, state, info);
if (error) {
async_error = error;
WRITE_ONCE(async_error, error);
dpm_save_failed_dev(dev_name(dev));
pm_dev_err(dev, state, async ? " async noirq" : " noirq", error);
goto Complete;
@@ -1414,12 +1457,10 @@ static int device_suspend_noirq(struct device *dev, pm_message_t state, bool asy
complete_all(&dev->power.completion);
TRACE_SUSPEND(error);
if (error || async_error)
return error;
if (error || READ_ONCE(async_error))
return;
dpm_async_suspend_parent(dev, async_suspend_noirq);
return 0;
dpm_async_suspend_superior(dev, async_suspend_noirq);
}
static void async_suspend_noirq(void *data, async_cookie_t cookie)
@@ -1434,7 +1475,7 @@ static int dpm_noirq_suspend_devices(pm_message_t state)
{
ktime_t starttime = ktime_get();
struct device *dev;
int error = 0;
int error;
trace_suspend_resume(TPS("dpm_suspend_noirq"), state.event, true);
@@ -1465,13 +1506,13 @@ static int dpm_noirq_suspend_devices(pm_message_t state)
mutex_unlock(&dpm_list_mtx);
error = device_suspend_noirq(dev, state, false);
device_suspend_noirq(dev, state, false);
put_device(dev);
mutex_lock(&dpm_list_mtx);
if (error || async_error) {
if (READ_ONCE(async_error)) {
dpm_async_suspend_complete_all(&dpm_late_early_list);
/*
* Move all devices to the target list to resume them
@@ -1485,9 +1526,8 @@ static int dpm_noirq_suspend_devices(pm_message_t state)
mutex_unlock(&dpm_list_mtx);
async_synchronize_full();
if (!error)
error = async_error;
error = READ_ONCE(async_error);
if (error)
dpm_save_failed_step(SUSPEND_SUSPEND_NOIRQ);
@@ -1542,7 +1582,7 @@ static void async_suspend_late(void *data, async_cookie_t cookie);
*
* Runtime PM is disabled for @dev while this function is being executed.
*/
static int device_suspend_late(struct device *dev, pm_message_t state, bool async)
static void device_suspend_late(struct device *dev, pm_message_t state, bool async)
{
pm_callback_t callback = NULL;
const char *info = NULL;
@@ -1559,11 +1599,11 @@ static int device_suspend_late(struct device *dev, pm_message_t state, bool asyn
dpm_wait_for_subordinate(dev, async);
if (async_error)
if (READ_ONCE(async_error))
goto Complete;
if (pm_wakeup_pending()) {
async_error = -EBUSY;
WRITE_ONCE(async_error, -EBUSY);
goto Complete;
}
@@ -1597,7 +1637,7 @@ static int device_suspend_late(struct device *dev, pm_message_t state, bool asyn
Run:
error = dpm_run_callback(callback, dev, state, info);
if (error) {
async_error = error;
WRITE_ONCE(async_error, error);
dpm_save_failed_dev(dev_name(dev));
pm_dev_err(dev, state, async ? " async late" : " late", error);
goto Complete;
@@ -1611,12 +1651,10 @@ static int device_suspend_late(struct device *dev, pm_message_t state, bool asyn
TRACE_SUSPEND(error);
complete_all(&dev->power.completion);
if (error || async_error)
return error;
if (error || READ_ONCE(async_error))
return;
dpm_async_suspend_parent(dev, async_suspend_late);
return 0;
dpm_async_suspend_superior(dev, async_suspend_late);
}
static void async_suspend_late(void *data, async_cookie_t cookie)
@@ -1635,7 +1673,7 @@ int dpm_suspend_late(pm_message_t state)
{
ktime_t starttime = ktime_get();
struct device *dev;
int error = 0;
int error;
trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
@@ -1668,13 +1706,13 @@ int dpm_suspend_late(pm_message_t state)
mutex_unlock(&dpm_list_mtx);
error = device_suspend_late(dev, state, false);
device_suspend_late(dev, state, false);
put_device(dev);
mutex_lock(&dpm_list_mtx);
if (error || async_error) {
if (READ_ONCE(async_error)) {
dpm_async_suspend_complete_all(&dpm_suspended_list);
/*
* Move all devices to the target list to resume them
@@ -1688,9 +1726,8 @@ int dpm_suspend_late(pm_message_t state)
mutex_unlock(&dpm_list_mtx);
async_synchronize_full();
if (!error)
error = async_error;
error = READ_ONCE(async_error);
if (error) {
dpm_save_failed_step(SUSPEND_SUSPEND_LATE);
dpm_resume_early(resume_event(state));
@@ -1779,7 +1816,7 @@ static void async_suspend(void *data, async_cookie_t cookie);
* @state: PM transition of the system being carried out.
* @async: If true, the device is being suspended asynchronously.
*/
static int device_suspend(struct device *dev, pm_message_t state, bool async)
static void device_suspend(struct device *dev, pm_message_t state, bool async)
{
pm_callback_t callback = NULL;
const char *info = NULL;
@@ -1791,7 +1828,7 @@ static int device_suspend(struct device *dev, pm_message_t state, bool async)
dpm_wait_for_subordinate(dev, async);
if (async_error) {
if (READ_ONCE(async_error)) {
dev->power.direct_complete = false;
goto Complete;
}
@@ -1811,7 +1848,7 @@ static int device_suspend(struct device *dev, pm_message_t state, bool async)
if (pm_wakeup_pending()) {
dev->power.direct_complete = false;
async_error = -EBUSY;
WRITE_ONCE(async_error, -EBUSY);
goto Complete;
}
@@ -1895,7 +1932,7 @@ static int device_suspend(struct device *dev, pm_message_t state, bool async)
Complete:
if (error) {
async_error = error;
WRITE_ONCE(async_error, error);
dpm_save_failed_dev(dev_name(dev));
pm_dev_err(dev, state, async ? " async" : "", error);
}
@@ -1903,12 +1940,10 @@ static int device_suspend(struct device *dev, pm_message_t state, bool async)
complete_all(&dev->power.completion);
TRACE_SUSPEND(error);
if (error || async_error)
return error;
if (error || READ_ONCE(async_error))
return;
dpm_async_suspend_parent(dev, async_suspend);
return 0;
dpm_async_suspend_superior(dev, async_suspend);
}
static void async_suspend(void *data, async_cookie_t cookie)
@@ -1927,7 +1962,7 @@ int dpm_suspend(pm_message_t state)
{
ktime_t starttime = ktime_get();
struct device *dev;
int error = 0;
int error;
trace_suspend_resume(TPS("dpm_suspend"), state.event, true);
might_sleep();
@@ -1962,13 +1997,13 @@ int dpm_suspend(pm_message_t state)
mutex_unlock(&dpm_list_mtx);
error = device_suspend(dev, state, false);
device_suspend(dev, state, false);
put_device(dev);
mutex_lock(&dpm_list_mtx);
if (error || async_error) {
if (READ_ONCE(async_error)) {
dpm_async_suspend_complete_all(&dpm_prepared_list);
/*
* Move all devices to the target list to resume them
@@ -1982,9 +2017,8 @@ int dpm_suspend(pm_message_t state)
mutex_unlock(&dpm_list_mtx);
async_synchronize_full();
if (!error)
error = async_error;
error = READ_ONCE(async_error);
if (error)
dpm_save_failed_step(SUSPEND_SUSPEND);
@@ -2129,7 +2163,6 @@ int dpm_prepare(pm_message_t state)
int error = 0;
trace_suspend_resume(TPS("dpm_prepare"), state.event, true);
might_sleep();
/*
* Give a chance for the known devices to complete their probes, before


@@ -19,10 +19,24 @@
typedef int (*pm_callback_t)(struct device *);
static inline pm_callback_t get_callback_ptr(const void *start, size_t offset)
{
return *(pm_callback_t *)(start + offset);
}
static pm_callback_t __rpm_get_driver_callback(struct device *dev,
size_t cb_offset)
{
if (dev->driver && dev->driver->pm)
return get_callback_ptr(dev->driver->pm, cb_offset);
return NULL;
}
static pm_callback_t __rpm_get_callback(struct device *dev, size_t cb_offset)
{
pm_callback_t cb;
const struct dev_pm_ops *ops;
pm_callback_t cb = NULL;
if (dev->pm_domain)
ops = &dev->pm_domain->ops;
@@ -36,12 +50,10 @@ static pm_callback_t __rpm_get_callback(struct device *dev, size_t cb_offset)
ops = NULL;
if (ops)
cb = *(pm_callback_t *)((void *)ops + cb_offset);
else
cb = NULL;
cb = get_callback_ptr(ops, cb_offset);
if (!cb && dev->driver && dev->driver->pm)
cb = *(pm_callback_t *)((void *)dev->driver->pm + cb_offset);
if (!cb)
cb = __rpm_get_driver_callback(dev, cb_offset);
return cb;
}
@@ -1191,10 +1203,12 @@ EXPORT_SYMBOL_GPL(__pm_runtime_resume);
*
* Return -EINVAL if runtime PM is disabled for @dev.
*
* Otherwise, if the runtime PM status of @dev is %RPM_ACTIVE and either
* @ign_usage_count is %true or the runtime PM usage counter of @dev is not
* zero, increment the usage counter of @dev and return 1. Otherwise, return 0
* without changing the usage counter.
* Otherwise, if its runtime PM status is %RPM_ACTIVE and (1) @ign_usage_count
* is set, or (2) @dev is not ignoring children and its active child count is
* nonzero, or (3) the runtime PM usage counter of @dev is not zero, increment
* the usage counter of @dev and return 1.
*
* Otherwise, return 0 without changing the usage counter.
*
* If @ign_usage_count is %true, this function can be used to prevent suspending
* the device when its runtime PM status is %RPM_ACTIVE.
@@ -1216,7 +1230,8 @@ static int pm_runtime_get_conditional(struct device *dev, bool ign_usage_count)
retval = -EINVAL;
} else if (dev->power.runtime_status != RPM_ACTIVE) {
retval = 0;
} else if (ign_usage_count) {
} else if (ign_usage_count || (!dev->power.ignore_children &&
atomic_read(&dev->power.child_count) > 0)) {
retval = 1;
atomic_inc(&dev->power.usage_count);
} else {
@@ -1249,10 +1264,16 @@ EXPORT_SYMBOL_GPL(pm_runtime_get_if_active);
* @dev: Target device.
*
* Increment the runtime PM usage counter of @dev if its runtime PM status is
* %RPM_ACTIVE and its runtime PM usage counter is greater than 0, in which case
* it returns 1. If the device is in a different state or its usage_count is 0,
* 0 is returned. -EINVAL is returned if runtime PM is disabled for the device,
* in which case also the usage_count will remain unmodified.
* %RPM_ACTIVE and its runtime PM usage counter is greater than 0 or it is not
* ignoring children and its active child count is nonzero. 1 is returned in
* this case.
*
* If @dev is in a different state or it is not in use (that is, its usage
* counter is 0, or it is ignoring children, or its active child count is 0),
* 0 is returned.
*
* -EINVAL is returned if runtime PM is disabled for the device, in which case
* also the usage counter of @dev is not updated.
*/
int pm_runtime_get_if_in_use(struct device *dev)
{
@@ -1827,7 +1848,7 @@ void pm_runtime_init(struct device *dev)
dev->power.request_pending = false;
dev->power.request = RPM_REQ_NONE;
dev->power.deferred_resume = false;
dev->power.needs_force_resume = 0;
dev->power.needs_force_resume = false;
INIT_WORK(&dev->power.work, pm_runtime_work);
dev->power.timer_expires = 0;
@@ -1854,6 +1875,11 @@ void pm_runtime_reinit(struct device *dev)
pm_runtime_put(dev->parent);
}
}
/*
* Clear power.needs_force_resume in case it has been set by
* pm_runtime_force_suspend() invoked from a driver remove callback.
*/
dev->power.needs_force_resume = false;
}
/**
@@ -1941,13 +1967,23 @@ void pm_runtime_drop_link(struct device_link *link)
pm_request_idle(link->supplier);
}
bool pm_runtime_need_not_resume(struct device *dev)
static pm_callback_t get_callback(struct device *dev, size_t cb_offset)
{
return atomic_read(&dev->power.usage_count) <= 1 &&
(atomic_read(&dev->power.child_count) == 0 ||
dev->power.ignore_children);
/*
* Setting power.strict_midlayer means that the middle layer
* code does not want its runtime PM callbacks to be invoked via
* pm_runtime_force_suspend() and pm_runtime_force_resume(), so
* return a direct pointer to the driver callback in that case.
*/
if (dev_pm_strict_midlayer_is_set(dev))
return __rpm_get_driver_callback(dev, cb_offset);
return __rpm_get_callback(dev, cb_offset);
}
#define GET_CALLBACK(dev, callback) \
get_callback(dev, offsetof(struct dev_pm_ops, callback))
/**
* pm_runtime_force_suspend - Force a device into suspend state if needed.
* @dev: Device to suspend.
@@ -1964,10 +2000,6 @@ bool pm_runtime_need_not_resume(struct device *dev)
* sure the device is put into low power state and it should only be used during
* system-wide PM transitions to sleep states. It assumes that the analogous
* pm_runtime_force_resume() will be used to resume the device.
*
* Do not use with DPM_FLAG_SMART_SUSPEND as this can lead to an inconsistent
* state where this function has called the ->runtime_suspend callback but the
* PM core marks the driver as runtime active.
*/
int pm_runtime_force_suspend(struct device *dev)
{
@@ -1975,10 +2007,10 @@ int pm_runtime_force_suspend(struct device *dev)
int ret;
pm_runtime_disable(dev);
if (pm_runtime_status_suspended(dev))
if (pm_runtime_status_suspended(dev) || dev->power.needs_force_resume)
return 0;
callback = RPM_GET_CALLBACK(dev, runtime_suspend);
callback = GET_CALLBACK(dev, runtime_suspend);
dev_pm_enable_wake_irq_check(dev, true);
ret = callback ? callback(dev) : 0;
@@ -1990,15 +2022,16 @@ int pm_runtime_force_suspend(struct device *dev)
/*
* If the device can stay in suspend after the system-wide transition
* to the working state that will follow, drop the children counter of
* its parent, but set its status to RPM_SUSPENDED anyway in case this
* function will be called again for it in the meantime.
* its parent and the usage counters of its suppliers. Otherwise, set
* power.needs_force_resume to let pm_runtime_force_resume() know that
* the device needs to be taken care of and to prevent this function
* from handling the device again in case the device is passed to it
* once more subsequently.
*/
if (pm_runtime_need_not_resume(dev)) {
if (pm_runtime_need_not_resume(dev))
pm_runtime_set_suspended(dev);
} else {
__update_runtime_status(dev, RPM_SUSPENDED);
dev->power.needs_force_resume = 1;
}
else
dev->power.needs_force_resume = true;
return 0;
@@ -2009,33 +2042,37 @@ int pm_runtime_force_suspend(struct device *dev)
}
EXPORT_SYMBOL_GPL(pm_runtime_force_suspend);
#ifdef CONFIG_PM_SLEEP
/**
* pm_runtime_force_resume - Force a device into resume state if needed.
* @dev: Device to resume.
*
* Prior invoking this function we expect the user to have brought the device
* into low power state by a call to pm_runtime_force_suspend(). Here we reverse
* those actions and bring the device into full power, if it is expected to be
* used on system resume. In the other case, we defer the resume to be managed
* via runtime PM.
* This function expects that either pm_runtime_force_suspend() has put the
* device into a low-power state prior to calling it, or the device had been
* runtime-suspended before the preceding system-wide suspend transition and it
* was left in suspend during that transition.
*
* Typically this function may be invoked from a system resume callback.
* The actions carried out by pm_runtime_force_suspend(), or by a runtime
* suspend in general, are reversed and the device is brought back into full
* power if it is expected to be used on system resume, which is the case when
* its needs_force_resume flag is set or when its smart_suspend flag is set and
* its runtime PM status is "active".
*
* In other cases, the resume is deferred to be managed via runtime PM.
*
* Typically, this function may be invoked from a system resume callback.
*/
int pm_runtime_force_resume(struct device *dev)
{
int (*callback)(struct device *);
int ret = 0;
if (!dev->power.needs_force_resume)
if (!dev->power.needs_force_resume && (!dev_pm_smart_suspend(dev) ||
pm_runtime_status_suspended(dev)))
goto out;
/*
* The value of the parent's children counter is correct already, so
* just update the status of the device.
*/
__update_runtime_status(dev, RPM_ACTIVE);
callback = RPM_GET_CALLBACK(dev, runtime_resume);
callback = GET_CALLBACK(dev, runtime_resume);
dev_pm_disable_wake_irq_check(dev, false);
ret = callback ? callback(dev) : 0;
@@ -2046,9 +2083,30 @@ int pm_runtime_force_resume(struct device *dev)
}
pm_runtime_mark_last_busy(dev);
out:
dev->power.needs_force_resume = 0;
/*
* The smart_suspend flag can be cleared here because it is not going
* to be necessary until the next system-wide suspend transition that
* will update it again.
*/
dev->power.smart_suspend = false;
/*
* Also clear needs_force_resume to make this function skip devices that
* have been seen by it once.
*/
dev->power.needs_force_resume = false;
pm_runtime_enable(dev);
return ret;
}
EXPORT_SYMBOL_GPL(pm_runtime_force_resume);
bool pm_runtime_need_not_resume(struct device *dev)
{
return atomic_read(&dev->power.usage_count) <= 1 &&
(atomic_read(&dev->power.child_count) == 0 ||
dev->power.ignore_children);
}
#endif /* CONFIG_PM_SLEEP */

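For context, a hedged sketch of the driver pattern these core changes target: a hypothetical "foo" driver reuses its runtime PM callbacks for system sleep through pm_runtime_force_suspend()/pm_runtime_force_resume(), which (per the changes above) now also cooperate with DPM_FLAG_SMART_SUSPEND; such a driver may additionally set that flag from probe via dev_pm_set_driver_flags().

#include <linux/pm.h>
#include <linux/pm_runtime.h>

static int foo_runtime_suspend(struct device *dev)
{
	/* ... put the device into a low-power state ... */
	return 0;
}

static int foo_runtime_resume(struct device *dev)
{
	/* ... bring the device back to full power ... */
	return 0;
}

static const struct dev_pm_ops foo_pm_ops = {
	SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
				pm_runtime_force_resume)
	SET_RUNTIME_PM_OPS(foo_runtime_suspend, foo_runtime_resume, NULL)
};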

@@ -111,7 +111,7 @@ static int qcom_apcs_sdx55_clk_probe(struct platform_device *pdev)
* driver, there seems to be no better place to do this. So do it here!
*/
cpu_dev = get_cpu_device(0);
ret = dev_pm_domain_attach(cpu_dev, true);
ret = dev_pm_domain_attach(cpu_dev, PD_FLAG_ATTACH_POWER_ON);
if (ret) {
dev_err_probe(dev, ret, "can't get PM domain: %d\n", ret);
goto err;


@@ -28,7 +28,6 @@ config ARM_APPLE_SOC_CPUFREQ
tristate "Apple Silicon SoC CPUFreq support"
depends on ARCH_APPLE || (COMPILE_TEST && 64BIT)
select PM_OPP
default ARCH_APPLE
help
This adds the CPUFreq driver for Apple Silicon machines
(e.g. Apple M1).
@@ -238,7 +237,7 @@ config ARM_TEGRA20_CPUFREQ
This adds the CPUFreq driver support for Tegra20/30 SOCs.
config ARM_TEGRA124_CPUFREQ
bool "Tegra124 CPUFreq support"
tristate "Tegra124 CPUFreq support"
depends on ARCH_TEGRA || COMPILE_TEST
depends on CPUFREQ_DT
default ARCH_TEGRA


@@ -103,7 +103,7 @@ static void armada_8k_cpufreq_free_table(struct freq_table *freq_tables)
{
int opps_index, nb_cpus = num_possible_cpus();
for (opps_index = 0 ; opps_index <= nb_cpus; opps_index++) {
for (opps_index = 0 ; opps_index < nb_cpus; opps_index++) {
int i;
/* If cpu_dev is NULL then we reached the end of the array */
@@ -132,7 +132,7 @@ static int __init armada_8k_cpufreq_init(void)
int ret = 0, opps_index = 0, cpu, nb_cpus;
struct freq_table *freq_tables;
struct device_node *node;
static struct cpumask cpus;
static struct cpumask cpus, shared_cpus;
node = of_find_matching_node_and_match(NULL, armada_8k_cpufreq_of_match,
NULL);
@@ -154,7 +154,6 @@ static int __init armada_8k_cpufreq_init(void)
* divisions of it).
*/
for_each_cpu(cpu, &cpus) {
struct cpumask shared_cpus;
struct device *cpu_dev;
struct clk *clk;


@@ -765,7 +765,7 @@ static void brcm_avs_cpufreq_remove(struct platform_device *pdev)
}
static const struct of_device_id brcm_avs_cpufreq_match[] = {
{ .compatible = BRCM_AVS_CPU_DATA },
{ .compatible = "brcm,avs-cpu-data-mem" },
{ }
};
MODULE_DEVICE_TABLE(of, brcm_avs_cpufreq_match);


@@ -26,14 +26,6 @@
#include <acpi/cppc_acpi.h>
/*
* This list contains information parsed from per CPU ACPI _CPC and _PSD
* structures: e.g. the highest and lowest supported performance, capabilities,
* desired performance, level requested etc. Depending on the share_type, not
* all CPUs will have an entry in the list.
*/
static LIST_HEAD(cpu_data_list);
static struct cpufreq_driver cppc_cpufreq_driver;
#ifdef CONFIG_ACPI_CPPC_CPUFREQ_FIE
@@ -352,7 +344,6 @@ static unsigned int cppc_cpufreq_get_transition_delay_us(unsigned int cpu)
#if defined(CONFIG_ARM64) && defined(CONFIG_ENERGY_MODEL)
static DEFINE_PER_CPU(unsigned int, efficiency_class);
static void cppc_cpufreq_register_em(struct cpufreq_policy *policy);
/* Create an artificial performance state every CPPC_EM_CAP_STEP capacity unit. */
#define CPPC_EM_CAP_STEP (20)
@@ -488,7 +479,19 @@ static int cppc_get_cpu_cost(struct device *cpu_dev, unsigned long KHz,
return 0;
}
static int populate_efficiency_class(void)
static void cppc_cpufreq_register_em(struct cpufreq_policy *policy)
{
struct cppc_cpudata *cpu_data;
struct em_data_callback em_cb =
EM_ADV_DATA_CB(cppc_get_cpu_power, cppc_get_cpu_cost);
cpu_data = policy->driver_data;
em_dev_register_perf_domain(get_cpu_device(policy->cpu),
get_perf_level_count(policy), &em_cb,
cpu_data->shared_cpu_map, 0);
}
static void populate_efficiency_class(void)
{
struct acpi_madt_generic_interrupt *gicc;
DECLARE_BITMAP(used_classes, 256) = {};
@@ -503,7 +506,7 @@ static int populate_efficiency_class(void)
if (bitmap_weight(used_classes, 256) <= 1) {
pr_debug("Efficiency classes are all equal (=%d). "
"No EM registered", class);
return -EINVAL;
return;
}
/*
@@ -520,26 +523,11 @@ static int populate_efficiency_class(void)
index++;
}
cppc_cpufreq_driver.register_em = cppc_cpufreq_register_em;
return 0;
}
static void cppc_cpufreq_register_em(struct cpufreq_policy *policy)
{
struct cppc_cpudata *cpu_data;
struct em_data_callback em_cb =
EM_ADV_DATA_CB(cppc_get_cpu_power, cppc_get_cpu_cost);
cpu_data = policy->driver_data;
em_dev_register_perf_domain(get_cpu_device(policy->cpu),
get_perf_level_count(policy), &em_cb,
cpu_data->shared_cpu_map, 0);
}
#else
static int populate_efficiency_class(void)
static void populate_efficiency_class(void)
{
return 0;
}
#endif
@@ -567,8 +555,6 @@ static struct cppc_cpudata *cppc_cpufreq_get_cpu_data(unsigned int cpu)
goto free_mask;
}
list_add(&cpu_data->node, &cpu_data_list);
return cpu_data;
free_mask:
@@ -583,7 +569,6 @@ static void cppc_cpufreq_put_cpu_data(struct cpufreq_policy *policy)
{
struct cppc_cpudata *cpu_data = policy->driver_data;
list_del(&cpu_data->node);
free_cpumask_var(cpu_data->shared_cpu_map);
kfree(cpu_data);
policy->driver_data = NULL;
@@ -925,7 +910,7 @@ static struct freq_attr *cppc_cpufreq_attr[] = {
};
static struct cpufreq_driver cppc_cpufreq_driver = {
.flags = CPUFREQ_CONST_LOOPS,
.flags = CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
.verify = cppc_verify_policy,
.target = cppc_cpufreq_set_target,
.get = cppc_cpufreq_get_rate,
@@ -954,24 +939,10 @@ static int __init cppc_cpufreq_init(void)
return ret;
}
static inline void free_cpu_data(void)
{
struct cppc_cpudata *iter, *tmp;
list_for_each_entry_safe(iter, tmp, &cpu_data_list, node) {
free_cpumask_var(iter->shared_cpu_map);
list_del(&iter->node);
kfree(iter);
}
}
static void __exit cppc_cpufreq_exit(void)
{
cpufreq_unregister_driver(&cppc_cpufreq_driver);
cppc_freq_invariance_exit();
free_cpu_data();
}
module_exit(cppc_cpufreq_exit);


@@ -143,6 +143,7 @@ static const struct of_device_id blocklist[] __initconst = {
{ .compatible = "nvidia,tegra20", },
{ .compatible = "nvidia,tegra30", },
{ .compatible = "nvidia,tegra114", },
{ .compatible = "nvidia,tegra124", },
{ .compatible = "nvidia,tegra210", },
{ .compatible = "nvidia,tegra234", },


@@ -329,6 +329,17 @@ static struct platform_driver dt_cpufreq_platdrv = {
};
module_platform_driver(dt_cpufreq_platdrv);
struct platform_device *cpufreq_dt_pdev_register(struct device *dev)
{
struct platform_device_info cpufreq_dt_devinfo = {};
cpufreq_dt_devinfo.name = "cpufreq-dt";
cpufreq_dt_devinfo.parent = dev;
return platform_device_register_full(&cpufreq_dt_devinfo);
}
EXPORT_SYMBOL_GPL(cpufreq_dt_pdev_register);
MODULE_ALIAS("platform:cpufreq-dt");
MODULE_AUTHOR("Viresh Kumar <viresh.kumar@linaro.org>");
MODULE_AUTHOR("Shawn Guo <shawn.guo@linaro.org>");


@@ -22,4 +22,6 @@ struct cpufreq_dt_platform_data {
int (*resume)(struct cpufreq_policy *policy);
};
struct platform_device *cpufreq_dt_pdev_register(struct device *dev);
#endif /* __CPUFREQ_DT_H__ */
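The cpufreq_dt_pdev_register() helper declared above replaces the platform_device_info boilerplate that SoC glue drivers used to open-code. A minimal caller, sketched from the declaration and from the tegra124 conversion later in this series (error handling abbreviated), would look roughly like:

    struct platform_device *cpufreq_pdev;

    cpufreq_pdev = cpufreq_dt_pdev_register(&pdev->dev);
    if (IS_ERR(cpufreq_pdev))
        return PTR_ERR(cpufreq_pdev);

On the remove path the device is dropped again with platform_device_unregister(), as the tegra124 driver below does.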


@@ -109,6 +109,8 @@ void disable_cpufreq(void)
{
off = 1;
}
EXPORT_SYMBOL_GPL(disable_cpufreq);
static DEFINE_MUTEX(cpufreq_governor_mutex);
bool have_governor_per_policy(void)
@@ -967,6 +969,7 @@ static struct attribute *cpufreq_attrs[] = {
&cpuinfo_min_freq.attr,
&cpuinfo_max_freq.attr,
&cpuinfo_transition_latency.attr,
&scaling_cur_freq.attr,
&scaling_min_freq.attr,
&scaling_max_freq.attr,
&affected_cpus.attr,
@@ -1095,10 +1098,6 @@ static int cpufreq_add_dev_interface(struct cpufreq_policy *policy)
return ret;
}
ret = sysfs_create_file(&policy->kobj, &scaling_cur_freq.attr);
if (ret)
return ret;
if (cpufreq_driver->bios_limit) {
ret = sysfs_create_file(&policy->kobj, &bios_limit.attr);
if (ret)
@@ -1284,6 +1283,8 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
goto err_free_real_cpus;
}
init_rwsem(&policy->rwsem);
freq_constraints_init(&policy->constraints);
policy->nb_min.notifier_call = cpufreq_notifier_min;
@@ -1306,7 +1307,6 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
}
INIT_LIST_HEAD(&policy->policy_list);
init_rwsem(&policy->rwsem);
spin_lock_init(&policy->transition_lock);
init_waitqueue_head(&policy->transition_wait);
INIT_WORK(&policy->update, handle_update);
@@ -1694,14 +1694,13 @@ static void __cpufreq_offline(unsigned int cpu, struct cpufreq_policy *policy)
return;
}
if (has_target())
if (has_target()) {
strscpy(policy->last_governor, policy->governor->name,
CPUFREQ_NAME_LEN);
else
policy->last_policy = policy->policy;
if (has_target())
cpufreq_exit_governor(policy);
} else {
policy->last_policy = policy->policy;
}
/*
* Perform the ->offline() during light-weight tear-down, as
@@ -1803,6 +1802,9 @@ static unsigned int cpufreq_verify_current_freq(struct cpufreq_policy *policy, b
{
unsigned int new_freq;
if (!cpufreq_driver->get)
return 0;
new_freq = cpufreq_driver->get(policy->cpu);
if (!new_freq)
return 0;
@@ -1925,10 +1927,7 @@ unsigned int cpufreq_get(unsigned int cpu)
guard(cpufreq_policy_read)(policy);
if (cpufreq_driver->get)
return __cpufreq_get(policy);
return 0;
return __cpufreq_get(policy);
}
EXPORT_SYMBOL(cpufreq_get);
@@ -2482,8 +2481,7 @@ int cpufreq_start_governor(struct cpufreq_policy *policy)
pr_debug("%s: for CPU %u\n", __func__, policy->cpu);
if (cpufreq_driver->get)
cpufreq_verify_current_freq(policy, false);
cpufreq_verify_current_freq(policy, false);
if (policy->governor->start) {
ret = policy->governor->start(policy);
@@ -2715,10 +2713,12 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
pr_debug("starting governor %s failed\n", policy->governor->name);
if (old_gov) {
policy->governor = old_gov;
if (cpufreq_init_governor(policy))
if (cpufreq_init_governor(policy)) {
policy->governor = NULL;
else
cpufreq_start_governor(policy);
} else if (cpufreq_start_governor(policy)) {
cpufreq_exit_governor(policy);
policy->governor = NULL;
}
}
return ret;
@@ -2944,15 +2944,6 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data)
cpufreq_driver = driver_data;
write_unlock_irqrestore(&cpufreq_driver_lock, flags);
/*
* Mark support for the scheduler's frequency invariance engine for
* drivers that implement target(), target_index() or fast_switch().
*/
if (!cpufreq_driver->setpolicy) {
static_branch_enable_cpuslocked(&cpufreq_freq_invariance);
pr_debug("supports frequency invariance");
}
if (driver_data->setpolicy)
driver_data->flags |= CPUFREQ_CONST_LOOPS;
@@ -2983,6 +2974,15 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data)
hp_online = ret;
ret = 0;
/*
* Mark support for the scheduler's frequency invariance engine for
* drivers that implement target(), target_index() or fast_switch().
*/
if (!cpufreq_driver->setpolicy) {
static_branch_enable_cpuslocked(&cpufreq_freq_invariance);
pr_debug("supports frequency invariance");
}
pr_debug("driver %s up and running\n", driver_data->name);
goto out;


@@ -134,6 +134,7 @@ static struct cpufreq_governor cpufreq_gov_userspace = {
.store_setspeed = cpufreq_set,
.show_setspeed = show_speed,
.owner = THIS_MODULE,
.flags = CPUFREQ_GOV_STRICT_TARGET,
};
MODULE_AUTHOR("Dominik Brodowski <linux@brodo.de>, "


@@ -2775,6 +2775,8 @@ static const struct x86_cpu_id intel_pstate_cpu_ids[] = {
X86_MATCH(INTEL_TIGERLAKE, core_funcs),
X86_MATCH(INTEL_SAPPHIRERAPIDS_X, core_funcs),
X86_MATCH(INTEL_EMERALDRAPIDS_X, core_funcs),
X86_MATCH(INTEL_GRANITERAPIDS_D, core_funcs),
X86_MATCH(INTEL_GRANITERAPIDS_X, core_funcs),
{}
};
MODULE_DEVICE_TABLE(x86cpu, intel_pstate_cpu_ids);
@@ -3249,8 +3251,8 @@ static int intel_cpufreq_update_pstate(struct cpufreq_policy *policy,
int max_pstate = policy->strict_target ?
target_pstate : cpu->max_perf_ratio;
intel_cpufreq_hwp_update(cpu, target_pstate, max_pstate, 0,
fast_switch);
intel_cpufreq_hwp_update(cpu, target_pstate, max_pstate,
target_pstate, fast_switch);
} else if (target_pstate != old_pstate) {
intel_cpufreq_perf_ctl_update(cpu, target_pstate, fast_switch);
}


@@ -16,6 +16,10 @@
#include <linux/pm_opp.h>
#include <linux/types.h>
#include "cpufreq-dt.h"
static struct platform_device *tegra124_cpufreq_pdev;
struct tegra124_cpufreq_priv {
struct clk *cpu_clk;
struct clk *pllp_clk;
@@ -55,7 +59,6 @@ static int tegra124_cpufreq_probe(struct platform_device *pdev)
struct device_node *np __free(device_node) = of_cpu_device_node_get(0);
struct tegra124_cpufreq_priv *priv;
struct device *cpu_dev;
struct platform_device_info cpufreq_dt_devinfo = {};
int ret;
if (!np)
@@ -95,11 +98,7 @@ static int tegra124_cpufreq_probe(struct platform_device *pdev)
if (ret)
goto out_put_pllp_clk;
cpufreq_dt_devinfo.name = "cpufreq-dt";
cpufreq_dt_devinfo.parent = &pdev->dev;
priv->cpufreq_dt_pdev =
platform_device_register_full(&cpufreq_dt_devinfo);
priv->cpufreq_dt_pdev = cpufreq_dt_pdev_register(&pdev->dev);
if (IS_ERR(priv->cpufreq_dt_pdev)) {
ret = PTR_ERR(priv->cpufreq_dt_pdev);
goto out_put_pllp_clk;
@@ -173,6 +172,21 @@ static int __maybe_unused tegra124_cpufreq_resume(struct device *dev)
return err;
}
static void tegra124_cpufreq_remove(struct platform_device *pdev)
{
struct tegra124_cpufreq_priv *priv = dev_get_drvdata(&pdev->dev);
if (!IS_ERR(priv->cpufreq_dt_pdev)) {
platform_device_unregister(priv->cpufreq_dt_pdev);
priv->cpufreq_dt_pdev = ERR_PTR(-ENODEV);
}
clk_put(priv->pllp_clk);
clk_put(priv->pllx_clk);
clk_put(priv->dfll_clk);
clk_put(priv->cpu_clk);
}
static const struct dev_pm_ops tegra124_cpufreq_pm_ops = {
SET_SYSTEM_SLEEP_PM_OPS(tegra124_cpufreq_suspend,
tegra124_cpufreq_resume)
@@ -182,15 +196,16 @@ static struct platform_driver tegra124_cpufreq_platdrv = {
.driver.name = "cpufreq-tegra124",
.driver.pm = &tegra124_cpufreq_pm_ops,
.probe = tegra124_cpufreq_probe,
.remove = tegra124_cpufreq_remove,
};
static int __init tegra_cpufreq_init(void)
{
int ret;
struct platform_device *pdev;
if (!(of_machine_is_compatible("nvidia,tegra124") ||
of_machine_is_compatible("nvidia,tegra210")))
if (!(of_machine_is_compatible("nvidia,tegra114") ||
of_machine_is_compatible("nvidia,tegra124") ||
of_machine_is_compatible("nvidia,tegra210")))
return -ENODEV;
/*
@@ -201,15 +216,25 @@ static int __init tegra_cpufreq_init(void)
if (ret)
return ret;
pdev = platform_device_register_simple("cpufreq-tegra124", -1, NULL, 0);
if (IS_ERR(pdev)) {
tegra124_cpufreq_pdev = platform_device_register_simple("cpufreq-tegra124", -1, NULL, 0);
if (IS_ERR(tegra124_cpufreq_pdev)) {
platform_driver_unregister(&tegra124_cpufreq_platdrv);
return PTR_ERR(pdev);
return PTR_ERR(tegra124_cpufreq_pdev);
}
return 0;
}
module_init(tegra_cpufreq_init);
static void __exit tegra_cpufreq_module_exit(void)
{
if (!IS_ERR_OR_NULL(tegra124_cpufreq_pdev))
platform_device_unregister(tegra124_cpufreq_pdev);
platform_driver_unregister(&tegra124_cpufreq_platdrv);
}
module_exit(tegra_cpufreq_module_exit);
MODULE_AUTHOR("Tuomas Tynkkynen <ttynkkynen@nvidia.com>");
MODULE_DESCRIPTION("cpufreq driver for NVIDIA Tegra124");
MODULE_LICENSE("GPL");


@@ -98,7 +98,6 @@ static bool idle_state_valid(struct device_node *state_node, unsigned int idx,
{
int cpu;
struct device_node *cpu_node, *curr_state_node;
bool valid = true;
/*
* Compare idle state phandles for index idx on all CPUs in the
@@ -107,20 +106,17 @@ static bool idle_state_valid(struct device_node *state_node, unsigned int idx,
* retrieved from. If a mismatch is found bail out straight
* away since we certainly hit a firmware misconfiguration.
*/
for (cpu = cpumask_next(cpumask_first(cpumask), cpumask);
cpu < nr_cpu_ids; cpu = cpumask_next(cpu, cpumask)) {
cpu = cpumask_first(cpumask) + 1;
for_each_cpu_from(cpu, cpumask) {
cpu_node = of_cpu_device_node_get(cpu);
curr_state_node = of_get_cpu_state_node(cpu_node, idx);
if (state_node != curr_state_node)
valid = false;
of_node_put(curr_state_node);
of_node_put(cpu_node);
if (!valid)
break;
if (state_node != curr_state_node)
return false;
}
return valid;
return true;
}
/**


@@ -90,6 +90,17 @@ config ARM_EXYNOS_BUS_DEVFREQ
and adjusts the operating frequencies and voltages with OPP support.
This does not yet operate with optimal voltages.
config ARM_HISI_UNCORE_DEVFREQ
tristate "HiSilicon uncore DEVFREQ Driver"
depends on ACPI && ACPI_PPTT && PCC
select DEVFREQ_GOV_PERFORMANCE
select DEVFREQ_GOV_USERSPACE
help
This adds a DEVFREQ driver that manages uncore frequency scaling for
HiSilicon Kunpeng SoCs. This enables runtime management of uncore
frequency scaling from kernel and userspace. The uncore domain
contains system interconnects and L3 cache.
config ARM_IMX_BUS_DEVFREQ
tristate "i.MX Generic Bus DEVFREQ Driver"
depends on ARCH_MXC || COMPILE_TEST


@@ -9,6 +9,7 @@ obj-$(CONFIG_DEVFREQ_GOV_PASSIVE) += governor_passive.o
# DEVFREQ Drivers
obj-$(CONFIG_ARM_EXYNOS_BUS_DEVFREQ) += exynos-bus.o
obj-$(CONFIG_ARM_HISI_UNCORE_DEVFREQ) += hisi_uncore_freq.o
obj-$(CONFIG_ARM_IMX_BUS_DEVFREQ) += imx-bus.o
obj-$(CONFIG_ARM_IMX8M_DDRC_DEVFREQ) += imx8m-ddrc.o
obj-$(CONFIG_ARM_MEDIATEK_CCI_DEVFREQ) += mtk-cci-devfreq.o


@@ -152,11 +152,8 @@ void devfreq_get_freq_range(struct devfreq *devfreq,
(unsigned long)HZ_PER_KHZ * qos_max_freq);
/* Apply constraints from OPP interface */
*min_freq = max(*min_freq, devfreq->scaling_min_freq);
*max_freq = min(*max_freq, devfreq->scaling_max_freq);
if (*min_freq > *max_freq)
*min_freq = *max_freq;
*max_freq = clamp(*max_freq, devfreq->scaling_min_freq, devfreq->scaling_max_freq);
*min_freq = clamp(*min_freq, devfreq->scaling_min_freq, *max_freq);
}
EXPORT_SYMBOL(devfreq_get_freq_range);
@@ -807,7 +804,6 @@ struct devfreq *devfreq_add_device(struct device *dev,
{
struct devfreq *devfreq;
struct devfreq_governor *governor;
unsigned long min_freq, max_freq;
int err = 0;
if (!dev || !profile || !governor_name) {
@@ -835,6 +831,7 @@ struct devfreq *devfreq_add_device(struct device *dev,
mutex_lock(&devfreq->lock);
devfreq->dev.parent = dev;
devfreq->dev.class = devfreq_class;
devfreq->dev.groups = profile->dev_groups;
devfreq->dev.release = devfreq_dev_release;
INIT_LIST_HEAD(&devfreq->node);
devfreq->profile = profile;
@@ -875,8 +872,6 @@ struct devfreq *devfreq_add_device(struct device *dev,
goto err_dev;
}
devfreq_get_freq_range(devfreq, &min_freq, &max_freq);
devfreq->suspend_freq = dev_pm_opp_get_suspend_opp_freq(dev);
devfreq->opp_table = dev_pm_opp_get_opp_table(dev);
if (IS_ERR(devfreq->opp_table))
@@ -1382,15 +1377,11 @@ int devfreq_remove_governor(struct devfreq_governor *governor)
int ret;
struct device *dev = devfreq->dev.parent;
if (!devfreq->governor)
continue;
if (!strncmp(devfreq->governor->name, governor->name,
DEVFREQ_NAME_LEN)) {
/* we should have a devfreq governor! */
if (!devfreq->governor) {
dev_warn(dev, "%s: Governor %s NOT present\n",
__func__, governor->name);
continue;
/* Fall through */
}
ret = devfreq->governor->event_handler(devfreq,
DEVFREQ_GOV_STOP, NULL);
if (ret) {
@@ -1743,7 +1734,7 @@ static ssize_t trans_stat_show(struct device *dev,
for (i = 0; i < max_state; i++) {
if (len >= PAGE_SIZE - 1)
break;
if (df->freq_table[2] == df->previous_freq)
if (df->freq_table[i] == df->previous_freq)
len += sysfs_emit_at(buf, len, "*");
else
len += sysfs_emit_at(buf, len, " ");
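To make the clamp-based devfreq_get_freq_range() rewrite above concrete: with, say, scaling_min_freq = 200 MHz, scaling_max_freq = 800 MHz and a user or PM QoS cap of 100 MHz, the old max()/min() sequence returned min_freq = max_freq = 100 MHz, below the lowest available OPP, whereas the clamped version limits max_freq to the 200-800 MHz window first and then clamps min_freq against that result, so both come out as 200 MHz.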


@@ -9,6 +9,7 @@
#include <linux/slab.h>
#include <linux/device.h>
#include <linux/devfreq.h>
#include <linux/kstrtox.h>
#include <linux/pm.h>
#include <linux/mutex.h>
#include <linux/module.h>
@@ -39,10 +40,13 @@ static ssize_t set_freq_store(struct device *dev, struct device_attribute *attr,
unsigned long wanted;
int err = 0;
err = kstrtoul(buf, 0, &wanted);
if (err)
return err;
mutex_lock(&devfreq->lock);
data = devfreq->governor_data;
sscanf(buf, "%lu", &wanted);
data->user_frequency = wanted;
data->valid = true;
err = update_devfreq(devfreq);


@@ -0,0 +1,658 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* HiSilicon uncore frequency scaling driver
*
* Copyright (c) 2025 HiSilicon Co., Ltd
*/
#include <linux/acpi.h>
#include <linux/bits.h>
#include <linux/cleanup.h>
#include <linux/devfreq.h>
#include <linux/device.h>
#include <linux/dev_printk.h>
#include <linux/errno.h>
#include <linux/iopoll.h>
#include <linux/kernel.h>
#include <linux/ktime.h>
#include <linux/mailbox_client.h>
#include <linux/module.h>
#include <linux/mod_devicetable.h>
#include <linux/mutex.h>
#include <linux/platform_device.h>
#include <linux/pm_opp.h>
#include <linux/property.h>
#include <linux/topology.h>
#include <linux/units.h>
#include <acpi/pcc.h>
#include "governor.h"
struct hisi_uncore_pcc_data {
u16 status;
u16 resv;
u32 data;
};
struct hisi_uncore_pcc_shmem {
struct acpi_pcct_shared_memory head;
struct hisi_uncore_pcc_data pcc_data;
};
enum hisi_uncore_pcc_cmd_type {
HUCF_PCC_CMD_GET_CAP = 0,
HUCF_PCC_CMD_GET_FREQ,
HUCF_PCC_CMD_SET_FREQ,
HUCF_PCC_CMD_GET_MODE,
HUCF_PCC_CMD_SET_MODE,
HUCF_PCC_CMD_GET_PLAT_FREQ_NUM,
HUCF_PCC_CMD_GET_PLAT_FREQ_BY_IDX,
HUCF_PCC_CMD_MAX = 256
};
static int hisi_platform_gov_usage;
static DEFINE_MUTEX(hisi_platform_gov_usage_lock);
enum hisi_uncore_freq_mode {
HUCF_MODE_PLATFORM = 0,
HUCF_MODE_OS,
HUCF_MODE_MAX
};
#define HUCF_CAP_PLATFORM_CTRL BIT(0)
/**
* struct hisi_uncore_freq - hisi uncore frequency scaling device data
* @dev: device of this frequency scaling driver
* @cl: mailbox client object
* @pchan: PCC mailbox channel
* @chan_id: PCC channel ID
* @last_cmd_cmpl_time: timestamp of the last completed PCC command
* @pcc_lock: PCC channel lock
* @devfreq: devfreq data of this hisi_uncore_freq device
* @related_cpus: CPUs whose performance is majorly affected by this
* uncore frequency domain
* @cap: capability flag
*/
struct hisi_uncore_freq {
struct device *dev;
struct mbox_client cl;
struct pcc_mbox_chan *pchan;
int chan_id;
ktime_t last_cmd_cmpl_time;
struct mutex pcc_lock;
struct devfreq *devfreq;
struct cpumask related_cpus;
u32 cap;
};
/* PCC channel timeout = PCC nominal latency * NUM */
#define HUCF_PCC_POLL_TIMEOUT_NUM 1000
#define HUCF_PCC_POLL_INTERVAL_US 5
/* Default polling interval in ms for devfreq governors */
#define HUCF_DEFAULT_POLLING_MS 100
static void hisi_uncore_free_pcc_chan(struct hisi_uncore_freq *uncore)
{
guard(mutex)(&uncore->pcc_lock);
pcc_mbox_free_channel(uncore->pchan);
uncore->pchan = NULL;
}
static void devm_hisi_uncore_free_pcc_chan(void *data)
{
hisi_uncore_free_pcc_chan(data);
}
static int hisi_uncore_request_pcc_chan(struct hisi_uncore_freq *uncore)
{
struct device *dev = uncore->dev;
struct pcc_mbox_chan *pcc_chan;
uncore->cl = (struct mbox_client) {
.dev = dev,
.tx_block = false,
.knows_txdone = true,
};
pcc_chan = pcc_mbox_request_channel(&uncore->cl, uncore->chan_id);
if (IS_ERR(pcc_chan))
return dev_err_probe(dev, PTR_ERR(pcc_chan),
"Failed to request PCC channel %u\n", uncore->chan_id);
if (!pcc_chan->shmem_base_addr) {
pcc_mbox_free_channel(pcc_chan);
return dev_err_probe(dev, -EINVAL,
"Invalid PCC shared memory address\n");
}
if (pcc_chan->shmem_size < sizeof(struct hisi_uncore_pcc_shmem)) {
pcc_mbox_free_channel(pcc_chan);
return dev_err_probe(dev, -EINVAL,
"Invalid PCC shared memory size (%lluB)\n",
pcc_chan->shmem_size);
}
uncore->pchan = pcc_chan;
return devm_add_action_or_reset(uncore->dev,
devm_hisi_uncore_free_pcc_chan, uncore);
}
static acpi_status hisi_uncore_pcc_reg_scan(struct acpi_resource *res,
void *ctx)
{
struct acpi_resource_generic_register *reg;
struct hisi_uncore_freq *uncore;
if (!res || res->type != ACPI_RESOURCE_TYPE_GENERIC_REGISTER)
return AE_OK;
reg = &res->data.generic_reg;
if (reg->space_id != ACPI_ADR_SPACE_PLATFORM_COMM)
return AE_OK;
if (!ctx)
return AE_ERROR;
uncore = ctx;
/* PCC subspace ID stored in Access Size */
uncore->chan_id = reg->access_size;
return AE_CTRL_TERMINATE;
}
static int hisi_uncore_init_pcc_chan(struct hisi_uncore_freq *uncore)
{
acpi_handle handle = ACPI_HANDLE(uncore->dev);
acpi_status status;
int rc;
uncore->chan_id = -1;
status = acpi_walk_resources(handle, METHOD_NAME__CRS,
hisi_uncore_pcc_reg_scan, uncore);
if (ACPI_FAILURE(status) || uncore->chan_id < 0)
return dev_err_probe(uncore->dev, -ENODEV,
"Failed to get a PCC channel\n");
rc = devm_mutex_init(uncore->dev, &uncore->pcc_lock);
if (rc)
return rc;
return hisi_uncore_request_pcc_chan(uncore);
}
static int hisi_uncore_cmd_send(struct hisi_uncore_freq *uncore,
u8 cmd, u32 *data)
{
struct hisi_uncore_pcc_shmem __iomem *addr;
struct hisi_uncore_pcc_shmem shmem;
struct pcc_mbox_chan *pchan;
unsigned int mrtt;
s64 time_delta;
u16 status;
int rc;
guard(mutex)(&uncore->pcc_lock);
pchan = uncore->pchan;
if (!pchan)
return -ENODEV;
addr = (struct hisi_uncore_pcc_shmem __iomem *)pchan->shmem;
if (!addr)
return -EINVAL;
/* Handle the Minimum Request Turnaround Time (MRTT) */
mrtt = pchan->min_turnaround_time;
time_delta = ktime_us_delta(ktime_get(), uncore->last_cmd_cmpl_time);
if (mrtt > time_delta)
udelay(mrtt - time_delta);
/* Copy data */
shmem.head = (struct acpi_pcct_shared_memory) {
.signature = PCC_SIGNATURE | uncore->chan_id,
.command = cmd,
};
shmem.pcc_data.data = *data;
memcpy_toio(addr, &shmem, sizeof(shmem));
/* Ring doorbell */
rc = mbox_send_message(pchan->mchan, &cmd);
if (rc < 0) {
dev_err(uncore->dev, "Failed to send mbox message, %d\n", rc);
return rc;
}
/* Wait status */
rc = readw_poll_timeout(&addr->head.status, status,
status & (PCC_STATUS_CMD_COMPLETE |
PCC_STATUS_ERROR),
HUCF_PCC_POLL_INTERVAL_US,
pchan->latency * HUCF_PCC_POLL_TIMEOUT_NUM);
if (rc) {
dev_err(uncore->dev, "PCC channel response timeout, cmd=%u\n", cmd);
} else if (status & PCC_STATUS_ERROR) {
dev_err(uncore->dev, "PCC cmd error, cmd=%u\n", cmd);
rc = -EIO;
}
uncore->last_cmd_cmpl_time = ktime_get();
/* Copy data back */
memcpy_fromio(data, &addr->pcc_data.data, sizeof(*data));
/* Clear mailbox active req */
mbox_client_txdone(pchan->mchan, rc);
return rc;
}
static int hisi_uncore_target(struct device *dev, unsigned long *freq,
u32 flags)
{
struct hisi_uncore_freq *uncore = dev_get_drvdata(dev);
struct dev_pm_opp *opp;
u32 data;
if (WARN_ON(!uncore || !uncore->pchan))
return -ENODEV;
opp = devfreq_recommended_opp(dev, freq, flags);
if (IS_ERR(opp)) {
dev_err(dev, "Failed to get opp for freq %lu hz\n", *freq);
return PTR_ERR(opp);
}
dev_pm_opp_put(opp);
data = (u32)(dev_pm_opp_get_freq(opp) / HZ_PER_MHZ);
return hisi_uncore_cmd_send(uncore, HUCF_PCC_CMD_SET_FREQ, &data);
}
static int hisi_uncore_get_dev_status(struct device *dev,
struct devfreq_dev_status *stat)
{
/* Not used */
return 0;
}
static int hisi_uncore_get_cur_freq(struct device *dev, unsigned long *freq)
{
struct hisi_uncore_freq *uncore = dev_get_drvdata(dev);
u32 data = 0;
int rc;
if (WARN_ON(!uncore || !uncore->pchan))
return -ENODEV;
rc = hisi_uncore_cmd_send(uncore, HUCF_PCC_CMD_GET_FREQ, &data);
/*
* Upon a failure, 'data' remains 0 and 'freq' is set to 0 rather than a
* random value. devfreq shouldn't use 'freq' in that case though.
*/
*freq = data * HZ_PER_MHZ;
return rc;
}
static void devm_hisi_uncore_remove_opp(void *data)
{
struct hisi_uncore_freq *uncore = data;
dev_pm_opp_remove_all_dynamic(uncore->dev);
}
static int hisi_uncore_init_opp(struct hisi_uncore_freq *uncore)
{
struct device *dev = uncore->dev;
unsigned long freq_mhz;
u32 num, index;
u32 data = 0;
int rc;
rc = hisi_uncore_cmd_send(uncore, HUCF_PCC_CMD_GET_PLAT_FREQ_NUM,
&data);
if (rc)
return dev_err_probe(dev, rc, "Failed to get plat freq num\n");
num = data;
for (index = 0; index < num; index++) {
data = index;
rc = hisi_uncore_cmd_send(uncore,
HUCF_PCC_CMD_GET_PLAT_FREQ_BY_IDX,
&data);
if (rc) {
dev_pm_opp_remove_all_dynamic(dev);
return dev_err_probe(dev, rc,
"Failed to get plat freq at index %u\n", index);
}
freq_mhz = data;
/* We don't care about the OPP voltage, so take 1V as a default */
rc = dev_pm_opp_add(dev, freq_mhz * HZ_PER_MHZ, 1000000);
if (rc) {
dev_pm_opp_remove_all_dynamic(dev);
return dev_err_probe(dev, rc,
"Add OPP %lu failed\n", freq_mhz);
}
}
return devm_add_action_or_reset(dev, devm_hisi_uncore_remove_opp,
uncore);
}
static int hisi_platform_gov_func(struct devfreq *df, unsigned long *freq)
{
/*
* Platform-controlled mode doesn't care about the frequency issued from
* devfreq, so just pick the max freq.
*/
*freq = DEVFREQ_MAX_FREQ;
return 0;
}
static int hisi_platform_gov_handler(struct devfreq *df, unsigned int event,
void *val)
{
struct hisi_uncore_freq *uncore = dev_get_drvdata(df->dev.parent);
int rc = 0;
u32 data;
if (WARN_ON(!uncore || !uncore->pchan))
return -ENODEV;
switch (event) {
case DEVFREQ_GOV_START:
data = HUCF_MODE_PLATFORM;
rc = hisi_uncore_cmd_send(uncore, HUCF_PCC_CMD_SET_MODE, &data);
if (rc)
dev_err(uncore->dev, "Failed to set platform mode (%d)\n", rc);
break;
case DEVFREQ_GOV_STOP:
data = HUCF_MODE_OS;
rc = hisi_uncore_cmd_send(uncore, HUCF_PCC_CMD_SET_MODE, &data);
if (rc)
dev_err(uncore->dev, "Failed to set os mode (%d)\n", rc);
break;
default:
break;
}
return rc;
}
/*
* In the platform-controlled mode, the platform decides the uncore frequency
* and ignores the frequency issued from the driver.
* Thus, create a pseudo 'hisi_platform' governor that stops devfreq monitor
* from working so as to save meaningless overhead.
*/
static struct devfreq_governor hisi_platform_governor = {
.name = "hisi_platform",
/*
* Set interrupt_driven to skip the devfreq monitor mechanism, though
* this governor is not interrupt-driven.
*/
.flags = DEVFREQ_GOV_FLAG_IRQ_DRIVEN,
.get_target_freq = hisi_platform_gov_func,
.event_handler = hisi_platform_gov_handler,
};
static void hisi_uncore_remove_platform_gov(struct hisi_uncore_freq *uncore)
{
u32 data = HUCF_MODE_PLATFORM;
int rc;
if (!(uncore->cap & HUCF_CAP_PLATFORM_CTRL))
return;
guard(mutex)(&hisi_platform_gov_usage_lock);
if (--hisi_platform_gov_usage == 0) {
rc = devfreq_remove_governor(&hisi_platform_governor);
if (rc)
dev_err(uncore->dev, "Failed to remove hisi_platform gov (%d)\n", rc);
}
/*
* Set to the platform-controlled mode on exit if supported, so as to
* have a certain behaviour when the driver is detached.
*/
rc = hisi_uncore_cmd_send(uncore, HUCF_PCC_CMD_SET_MODE, &data);
if (rc)
dev_err(uncore->dev, "Failed to set platform mode on exit (%d)\n", rc);
}
static void devm_hisi_uncore_remove_platform_gov(void *data)
{
hisi_uncore_remove_platform_gov(data);
}
static int hisi_uncore_add_platform_gov(struct hisi_uncore_freq *uncore)
{
if (!(uncore->cap & HUCF_CAP_PLATFORM_CTRL))
return 0;
guard(mutex)(&hisi_platform_gov_usage_lock);
if (hisi_platform_gov_usage == 0) {
int rc = devfreq_add_governor(&hisi_platform_governor);
if (rc)
return rc;
}
hisi_platform_gov_usage++;
return devm_add_action_or_reset(uncore->dev,
devm_hisi_uncore_remove_platform_gov,
uncore);
}
/*
* Returns:
* 0 if success, uncore->related_cpus is set.
* -EINVAL if property not found, or property found but without elements in it,
* or invalid arguments received in any of the subroutines.
* Other error codes if it goes wrong.
*/
static int hisi_uncore_mark_related_cpus(struct hisi_uncore_freq *uncore,
char *property, int (*get_topo_id)(int cpu),
const struct cpumask *(*get_cpumask)(int cpu))
{
unsigned int i, cpu;
size_t len;
int rc;
rc = device_property_count_u32(uncore->dev, property);
if (rc < 0)
return rc;
if (rc == 0)
return -EINVAL;
len = rc;
u32 *num __free(kfree) = kcalloc(len, sizeof(*num), GFP_KERNEL);
if (!num)
return -ENOMEM;
rc = device_property_read_u32_array(uncore->dev, property, num, len);
if (rc)
return rc;
for (i = 0; i < len; i++) {
for_each_possible_cpu(cpu) {
if (get_topo_id(cpu) != num[i])
continue;
cpumask_or(&uncore->related_cpus,
&uncore->related_cpus, get_cpumask(cpu));
break;
}
}
return 0;
}
static int get_package_id(int cpu)
{
return topology_physical_package_id(cpu);
}
static const struct cpumask *get_package_cpumask(int cpu)
{
return topology_core_cpumask(cpu);
}
static int get_cluster_id(int cpu)
{
return topology_cluster_id(cpu);
}
static const struct cpumask *get_cluster_cpumask(int cpu)
{
return topology_cluster_cpumask(cpu);
}
static int hisi_uncore_mark_related_cpus_wrap(struct hisi_uncore_freq *uncore)
{
int rc;
cpumask_clear(&uncore->related_cpus);
rc = hisi_uncore_mark_related_cpus(uncore, "related-package",
get_package_id,
get_package_cpumask);
/* Success, or firmware probably broken */
if (!rc || rc != -EINVAL)
return rc;
/* Try another property name if rc == -EINVAL */
return hisi_uncore_mark_related_cpus(uncore, "related-cluster",
get_cluster_id,
get_cluster_cpumask);
}
static ssize_t related_cpus_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct hisi_uncore_freq *uncore = dev_get_drvdata(dev->parent);
return cpumap_print_to_pagebuf(true, buf, &uncore->related_cpus);
}
static DEVICE_ATTR_RO(related_cpus);
static struct attribute *hisi_uncore_freq_attrs[] = {
&dev_attr_related_cpus.attr,
NULL
};
ATTRIBUTE_GROUPS(hisi_uncore_freq);
static int hisi_uncore_devfreq_register(struct hisi_uncore_freq *uncore)
{
struct devfreq_dev_profile *profile;
struct device *dev = uncore->dev;
unsigned long freq;
u32 data;
int rc;
rc = hisi_uncore_get_cur_freq(dev, &freq);
if (rc)
return dev_err_probe(dev, rc, "Failed to get plat init freq\n");
profile = devm_kzalloc(dev, sizeof(*profile), GFP_KERNEL);
if (!profile)
return -ENOMEM;
*profile = (struct devfreq_dev_profile) {
.initial_freq = freq,
.polling_ms = HUCF_DEFAULT_POLLING_MS,
.timer = DEVFREQ_TIMER_DELAYED,
.target = hisi_uncore_target,
.get_dev_status = hisi_uncore_get_dev_status,
.get_cur_freq = hisi_uncore_get_cur_freq,
.dev_groups = hisi_uncore_freq_groups,
};
rc = hisi_uncore_cmd_send(uncore, HUCF_PCC_CMD_GET_MODE, &data);
if (rc)
return dev_err_probe(dev, rc, "Failed to get operate mode\n");
if (data == HUCF_MODE_PLATFORM)
uncore->devfreq = devm_devfreq_add_device(dev, profile,
hisi_platform_governor.name, NULL);
else
uncore->devfreq = devm_devfreq_add_device(dev, profile,
DEVFREQ_GOV_PERFORMANCE, NULL);
if (IS_ERR(uncore->devfreq))
return dev_err_probe(dev, PTR_ERR(uncore->devfreq),
"Failed to add devfreq device\n");
return 0;
}
static int hisi_uncore_freq_probe(struct platform_device *pdev)
{
struct hisi_uncore_freq *uncore;
struct device *dev = &pdev->dev;
u32 cap;
int rc;
uncore = devm_kzalloc(dev, sizeof(*uncore), GFP_KERNEL);
if (!uncore)
return -ENOMEM;
uncore->dev = dev;
platform_set_drvdata(pdev, uncore);
rc = hisi_uncore_init_pcc_chan(uncore);
if (rc)
return dev_err_probe(dev, rc, "Failed to init PCC channel\n");
rc = hisi_uncore_init_opp(uncore);
if (rc)
return dev_err_probe(dev, rc, "Failed to init OPP\n");
rc = hisi_uncore_cmd_send(uncore, HUCF_PCC_CMD_GET_CAP, &cap);
if (rc)
return dev_err_probe(dev, rc, "Failed to get capability\n");
uncore->cap = cap;
rc = hisi_uncore_add_platform_gov(uncore);
if (rc)
return dev_err_probe(dev, rc, "Failed to add hisi_platform governor\n");
rc = hisi_uncore_mark_related_cpus_wrap(uncore);
if (rc)
return dev_err_probe(dev, rc, "Failed to mark related cpus\n");
rc = hisi_uncore_devfreq_register(uncore);
if (rc)
return dev_err_probe(dev, rc, "Failed to register devfreq\n");
return 0;
}
static const struct acpi_device_id hisi_uncore_freq_acpi_match[] = {
{ "HISI04F1", },
{ }
};
MODULE_DEVICE_TABLE(acpi, hisi_uncore_freq_acpi_match);
static struct platform_driver hisi_uncore_freq_drv = {
.probe = hisi_uncore_freq_probe,
.driver = {
.name = "hisi_uncore_freq",
.acpi_match_table = hisi_uncore_freq_acpi_match,
},
};
module_platform_driver(hisi_uncore_freq_drv);
MODULE_DESCRIPTION("HiSilicon uncore frequency scaling driver");
MODULE_AUTHOR("Jie Zhan <zhanjie9@hisilicon.com>");
MODULE_LICENSE("GPL");


@@ -360,7 +360,7 @@ static int sun8i_a33_mbus_probe(struct platform_device *pdev)
if (IS_ERR(priv->reg_mbus))
return PTR_ERR(priv->reg_mbus);
priv->clk_bus = devm_clk_get(dev, "bus");
priv->clk_bus = devm_clk_get_enabled(dev, "bus");
if (IS_ERR(priv->clk_bus))
return dev_err_probe(dev, PTR_ERR(priv->clk_bus),
"failed to get bus clock\n");
@@ -375,24 +375,15 @@ static int sun8i_a33_mbus_probe(struct platform_device *pdev)
return dev_err_probe(dev, PTR_ERR(priv->clk_mbus),
"failed to get mbus clock\n");
ret = clk_prepare_enable(priv->clk_bus);
if (ret)
return dev_err_probe(dev, ret,
"failed to enable bus clock\n");
/* Lock the DRAM clock rate to keep priv->nominal_bw in sync. */
ret = clk_rate_exclusive_get(priv->clk_dram);
if (ret) {
err = "failed to lock dram clock rate\n";
goto err_disable_bus;
}
ret = devm_clk_rate_exclusive_get(dev, priv->clk_dram);
if (ret)
return dev_err_probe(dev, ret, "failed to lock dram clock rate\n");
/* Lock the MBUS clock rate to keep MBUS_TMR_PERIOD in sync. */
ret = clk_rate_exclusive_get(priv->clk_mbus);
if (ret) {
err = "failed to lock mbus clock rate\n";
goto err_unlock_dram;
}
ret = devm_clk_rate_exclusive_get(dev, priv->clk_mbus);
if (ret)
return dev_err_probe(dev, ret, "failed to lock mbus clock rate\n");
priv->gov_data.upthreshold = 10;
priv->gov_data.downdifferential = 5;
@@ -405,10 +396,8 @@ static int sun8i_a33_mbus_probe(struct platform_device *pdev)
priv->profile.max_state = max_state;
ret = devm_pm_opp_set_clkname(dev, "dram");
if (ret) {
err = "failed to add OPP table\n";
goto err_unlock_mbus;
}
if (ret)
return dev_err_probe(dev, ret, "failed to add OPP table\n");
base_freq = clk_get_rate(clk_get_parent(priv->clk_dram));
for (i = 0; i < max_state; ++i) {
@@ -448,12 +437,6 @@ static int sun8i_a33_mbus_probe(struct platform_device *pdev)
err_remove_opps:
dev_pm_opp_remove_all_dynamic(dev);
err_unlock_mbus:
clk_rate_exclusive_put(priv->clk_mbus);
err_unlock_dram:
clk_rate_exclusive_put(priv->clk_dram);
err_disable_bus:
clk_disable_unprepare(priv->clk_bus);
return dev_err_probe(dev, ret, err);
}
@@ -472,9 +455,6 @@ static void sun8i_a33_mbus_remove(struct platform_device *pdev)
dev_warn(dev, "failed to restore DRAM frequency: %d\n", ret);
dev_pm_opp_remove_all_dynamic(dev);
clk_rate_exclusive_put(priv->clk_mbus);
clk_rate_exclusive_put(priv->clk_dram);
clk_disable_unprepare(priv->clk_bus);
}
static const struct sun8i_a33_mbus_variant sun50i_a64_mbus = {


@@ -57,7 +57,7 @@ static int dp_aux_ep_probe(struct device *dev)
container_of(aux_ep, struct dp_aux_ep_device_with_data, aux_ep);
int ret;
ret = dev_pm_domain_attach(dev, true);
ret = dev_pm_domain_attach(dev, PD_FLAG_ATTACH_POWER_ON);
if (ret)
return dev_err_probe(dev, ret, "Failed to attach to PM Domain\n");


@@ -573,7 +573,7 @@ static int i2c_device_probe(struct device *dev)
goto err_clear_wakeup_irq;
do_power_on = !i2c_acpi_waive_d0_probe(dev);
status = dev_pm_domain_attach(&client->dev, do_power_on);
status = dev_pm_domain_attach(&client->dev, do_power_on ? PD_FLAG_ATTACH_POWER_ON : 0);
if (status)
goto err_clear_wakeup_irq;


@@ -161,7 +161,7 @@ static int sdio_bus_probe(struct device *dev)
if (!id)
return -ENODEV;
ret = dev_pm_domain_attach(dev, false);
ret = dev_pm_domain_attach(dev, 0);
if (ret)
return ret;


@@ -708,6 +708,8 @@ static int pci_pm_prepare(struct device *dev)
struct pci_dev *pci_dev = to_pci_dev(dev);
const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
dev_pm_set_strict_midlayer(dev, true);
if (pm && pm->prepare) {
int error = pm->prepare(dev);
if (error < 0)
@@ -749,6 +751,8 @@ static void pci_pm_complete(struct device *dev)
if (pci_dev->current_state < pre_sleep_state)
pm_request_resume(dev);
}
dev_pm_set_strict_midlayer(dev, false);
}
#else /* !CONFIG_PM_SLEEP */


@@ -96,6 +96,8 @@ static u64 get_pd_power_uw(struct dtpm *dtpm)
int i;
pd = em_cpu_get(dtpm_cpu->cpu);
if (!pd)
return 0;
pd_mask = em_span_cpus(pd);


@@ -1277,6 +1277,7 @@ static const struct x86_cpu_id rapl_ids[] __initconst = {
X86_MATCH_VFM(INTEL_RAPTORLAKE, &rapl_defaults_core),
X86_MATCH_VFM(INTEL_RAPTORLAKE_P, &rapl_defaults_core),
X86_MATCH_VFM(INTEL_RAPTORLAKE_S, &rapl_defaults_core),
X86_MATCH_VFM(INTEL_BARTLETTLAKE, &rapl_defaults_core),
X86_MATCH_VFM(INTEL_METEORLAKE, &rapl_defaults_core),
X86_MATCH_VFM(INTEL_METEORLAKE_L, &rapl_defaults_core),
X86_MATCH_VFM(INTEL_SAPPHIRERAPIDS_X, &rapl_defaults_spr_server),


@@ -150,6 +150,7 @@ static const struct x86_cpu_id pl4_support_ids[] = {
X86_MATCH_VFM(INTEL_METEORLAKE_L, NULL),
X86_MATCH_VFM(INTEL_ARROWLAKE_U, NULL),
X86_MATCH_VFM(INTEL_ARROWLAKE_H, NULL),
X86_MATCH_VFM(INTEL_PANTHERLAKE_L, NULL),
{}
};


@@ -479,7 +479,7 @@ static int rpmsg_dev_probe(struct device *dev)
struct rpmsg_endpoint *ept = NULL;
int err;
err = dev_pm_domain_attach(dev, true);
err = dev_pm_domain_attach(dev, PD_FLAG_ATTACH_POWER_ON);
if (err)
goto out;


@@ -101,7 +101,7 @@ static int sdw_drv_probe(struct device *dev)
/*
* attach to power domain but don't turn on (last arg)
*/
ret = dev_pm_domain_attach(dev, false);
ret = dev_pm_domain_attach(dev, 0);
if (ret)
return ret;


@@ -427,7 +427,7 @@ static int spi_probe(struct device *dev)
if (spi->irq < 0)
spi->irq = 0;
ret = dev_pm_domain_attach(dev, true);
ret = dev_pm_domain_attach(dev, PD_FLAG_ATTACH_POWER_ON);
if (ret)
return ret;


@@ -399,7 +399,7 @@ static int serdev_drv_probe(struct device *dev)
const struct serdev_device_driver *sdrv = to_serdev_device_driver(dev->driver);
int ret;
ret = dev_pm_domain_attach(dev, true);
ret = dev_pm_domain_attach(dev, PD_FLAG_ATTACH_POWER_ON);
if (ret)
return ret;


@@ -139,7 +139,6 @@ struct cppc_perf_fb_ctrs {
/* Per CPU container for runtime CPPC management. */
struct cppc_cpudata {
struct list_head node;
struct cppc_perf_caps perf_caps;
struct cppc_perf_ctrls perf_ctrls;
struct cppc_perf_fb_ctrs perf_fb_ctrs;


@@ -103,6 +103,8 @@ struct devfreq_dev_status {
*
* @is_cooling_device: A self-explanatory boolean giving the device a
* cooling effect property.
* @dev_groups: Optional device-specific sysfs attribute groups to be
* attached to the devfreq device.
*/
struct devfreq_dev_profile {
unsigned long initial_freq;
@@ -119,6 +121,8 @@ struct devfreq_dev_profile {
unsigned int max_state;
bool is_cooling_device;
const struct attribute_group **dev_groups;
};
/**
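The new dev_groups member lets a devfreq driver attach its own sysfs attributes to the devfreq device without patching the core. The HiSilicon uncore driver earlier in this series uses it; the pattern, condensed from that driver (the related_cpus_show() callback is omitted here), is roughly:

    static DEVICE_ATTR_RO(related_cpus);

    static struct attribute *hisi_uncore_freq_attrs[] = {
        &dev_attr_related_cpus.attr,
        NULL
    };
    ATTRIBUTE_GROUPS(hisi_uncore_freq);

    /* then, in the devfreq_dev_profile: */
    .dev_groups = hisi_uncore_freq_groups,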


@@ -879,6 +879,33 @@ static inline bool dev_pm_smart_suspend(struct device *dev)
#endif
}
/*
* dev_pm_set_strict_midlayer - Update the device's power.strict_midlayer flag
* @dev: Target device.
* @val: New flag value.
*
* When set, power.strict_midlayer means that the middle layer power management
* code (typically, a bus type or a PM domain) does not expect its runtime PM
* suspend callback to be invoked at all during system-wide PM transitions and
* it does not expect its runtime PM resume callback to be invoked at any point
* when runtime PM is disabled for the device during system-wide PM transitions.
*/
static inline void dev_pm_set_strict_midlayer(struct device *dev, bool val)
{
#ifdef CONFIG_PM_SLEEP
dev->power.strict_midlayer = val;
#endif
}
static inline bool dev_pm_strict_midlayer_is_set(struct device *dev)
{
#ifdef CONFIG_PM_SLEEP
return dev->power.strict_midlayer;
#else
return false;
#endif
}
static inline void device_lock(struct device *dev)
{
mutex_lock(&dev->mutex);


@@ -8,14 +8,15 @@
#ifndef _LINUX_PM_H
#define _LINUX_PM_H
#include <linux/export.h>
#include <linux/list.h>
#include <linux/workqueue.h>
#include <linux/spinlock.h>
#include <linux/wait.h>
#include <linux/timer.h>
#include <linux/hrtimer.h>
#include <linux/completion.h>
#include <linux/export.h>
#include <linux/hrtimer_types.h>
#include <linux/mutex.h>
#include <linux/spinlock.h>
#include <linux/types.h>
#include <linux/util_macros.h>
#include <linux/wait.h>
#include <linux/workqueue_types.h>
/*
* Callbacks for platform drivers to implement.
@@ -683,6 +684,7 @@ struct dev_pm_info {
bool smart_suspend:1; /* Owned by the PM core */
bool must_resume:1; /* Owned by the PM core */
bool may_skip_resume:1; /* Set by subsystems */
bool strict_midlayer:1;
#else
bool should_wakeup:1;
#endif
@@ -720,6 +722,7 @@ struct dev_pm_info {
struct pm_subsys_data *subsys_data; /* Owned by the subsystem. */
void (*set_latency_tolerance)(struct device *, s32);
struct dev_pm_qos *qos;
bool detach_power_off:1; /* Owned by the driver core */
};
extern int dev_pm_get_subsys_data(struct device *dev);


@@ -36,10 +36,16 @@
* isn't specified, the index just follows the
* index for the attached PM domain.
*
* PD_FLAG_ATTACH_POWER_ON: Power on the domain during attach.
*
* PD_FLAG_DETACH_POWER_OFF: Power off the domain during detach.
*
*/
#define PD_FLAG_NO_DEV_LINK BIT(0)
#define PD_FLAG_DEV_LINK_ON BIT(1)
#define PD_FLAG_REQUIRED_OPP BIT(2)
#define PD_FLAG_ATTACH_POWER_ON BIT(3)
#define PD_FLAG_DETACH_POWER_OFF BIT(4)
struct dev_pm_domain_attach_data {
const char * const *pd_names;
@@ -501,7 +507,7 @@ struct generic_pm_domain *of_genpd_remove_last(struct device_node *np)
#endif /* CONFIG_PM_GENERIC_DOMAINS_OF */
#ifdef CONFIG_PM
int dev_pm_domain_attach(struct device *dev, bool power_on);
int dev_pm_domain_attach(struct device *dev, u32 flags);
struct device *dev_pm_domain_attach_by_id(struct device *dev,
unsigned int index);
struct device *dev_pm_domain_attach_by_name(struct device *dev,
@@ -518,7 +524,7 @@ int dev_pm_domain_start(struct device *dev);
void dev_pm_domain_set(struct device *dev, struct dev_pm_domain *pd);
int dev_pm_domain_set_performance_state(struct device *dev, unsigned int state);
#else
static inline int dev_pm_domain_attach(struct device *dev, bool power_on)
static inline int dev_pm_domain_attach(struct device *dev, u32 flags)
{
return 0;
}
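With the flag-based prototype above, callers no longer pass a bool. A probe path that wants the PM domain powered on across attach, and powered off again when the device is detached, would do something along these lines (a sketch only; the error message is made up for illustration):

    ret = dev_pm_domain_attach(dev, PD_FLAG_ATTACH_POWER_ON |
                                    PD_FLAG_DETACH_POWER_OFF);
    if (ret)
        return dev_err_probe(dev, ret, "failed to attach PM domain\n");

Passing 0 keeps the old dev_pm_domain_attach(dev, false) behaviour, as the sdio and soundwire hunks earlier show.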


@@ -66,9 +66,7 @@ static inline bool queue_pm_work(struct work_struct *work)
extern int pm_generic_runtime_suspend(struct device *dev);
extern int pm_generic_runtime_resume(struct device *dev);
extern bool pm_runtime_need_not_resume(struct device *dev);
extern int pm_runtime_force_suspend(struct device *dev);
extern int pm_runtime_force_resume(struct device *dev);
extern int __pm_runtime_idle(struct device *dev, int rpmflags);
extern int __pm_runtime_suspend(struct device *dev, int rpmflags);
@@ -257,9 +255,7 @@ static inline bool queue_pm_work(struct work_struct *work) { return false; }
static inline int pm_generic_runtime_suspend(struct device *dev) { return 0; }
static inline int pm_generic_runtime_resume(struct device *dev) { return 0; }
static inline bool pm_runtime_need_not_resume(struct device *dev) {return true; }
static inline int pm_runtime_force_suspend(struct device *dev) { return 0; }
static inline int pm_runtime_force_resume(struct device *dev) { return 0; }
static inline int __pm_runtime_idle(struct device *dev, int rpmflags)
{
@@ -330,6 +326,18 @@ static inline void pm_runtime_release_supplier(struct device_link *link) {}
#endif /* !CONFIG_PM */
#ifdef CONFIG_PM_SLEEP
bool pm_runtime_need_not_resume(struct device *dev);
int pm_runtime_force_resume(struct device *dev);
#else /* !CONFIG_PM_SLEEP */
static inline bool pm_runtime_need_not_resume(struct device *dev) {return true; }
static inline int pm_runtime_force_resume(struct device *dev) { return -ENXIO; }
#endif /* CONFIG_PM_SLEEP */
/**
* pm_runtime_idle - Conditionally set up autosuspend of a device or suspend it.
* @dev: Target device.
@@ -337,6 +345,20 @@ static inline void pm_runtime_release_supplier(struct device_link *link) {}
* Invoke the "idle check" callback of @dev and, depending on its return value,
* set up autosuspend of @dev or suspend it (depending on whether or not
* autosuspend has been enabled for it).
*
* Return:
* * 0: Success.
* * -EINVAL: Runtime PM error.
* * -EACCES: Runtime PM disabled.
* * -EAGAIN: Runtime PM usage_count non-zero, Runtime PM status change ongoing
* or device not in %RPM_ACTIVE state.
* * -EBUSY: Runtime PM child_count non-zero.
* * -EPERM: Device PM QoS resume latency 0.
* * -EINPROGRESS: Suspend already in progress.
* * -ENOSYS: CONFIG_PM not enabled.
* * 1: Device already suspended.
* Other values and conditions for the above values are possible as returned by
* Runtime PM idle and suspend callbacks.
*/
static inline int pm_runtime_idle(struct device *dev)
{
@@ -346,6 +368,18 @@ static inline int pm_runtime_idle(struct device *dev)
/**
* pm_runtime_suspend - Suspend a device synchronously.
* @dev: Target device.
*
* Return:
* * 0: Success.
* * -EINVAL: Runtime PM error.
* * -EACCES: Runtime PM disabled.
* * -EAGAIN: Runtime PM usage_count non-zero or Runtime PM status change ongoing.
* * -EBUSY: Runtime PM child_count non-zero.
* * -EPERM: Device PM QoS resume latency 0.
* * -ENOSYS: CONFIG_PM not enabled.
* * 1: Device already suspended.
* Other values and conditions for the above values are possible as returned by
* Runtime PM suspend callbacks.
*/
static inline int pm_runtime_suspend(struct device *dev)
{
@@ -353,14 +387,29 @@ static inline int pm_runtime_suspend(struct device *dev)
}
/**
* pm_runtime_autosuspend - Set up autosuspend of a device or suspend it.
* pm_runtime_autosuspend - Update the last access time and set up autosuspend
* of a device.
* @dev: Target device.
*
* Set up autosuspend of @dev or suspend it (depending on whether or not
* autosuspend is enabled for it) without engaging its "idle check" callback.
* First update the last access time, then set up autosuspend of @dev or suspend
* it (depending on whether or not autosuspend is enabled for it) without
* engaging its "idle check" callback.
*
* Return:
* * 0: Success.
* * -EINVAL: Runtime PM error.
* * -EACCES: Runtime PM disabled.
* * -EAGAIN: Runtime PM usage_count non-zero or Runtime PM status change ongoing.
* * -EBUSY: Runtime PM child_count non-zero.
* * -EPERM: Device PM QoS resume latency 0.
* * -ENOSYS: CONFIG_PM not enabled.
* * 1: Device already suspended.
* Other values and conditions for the above values are possible as returned by
* Runtime PM suspend callbacks.
*/
static inline int pm_runtime_autosuspend(struct device *dev)
{
pm_runtime_mark_last_busy(dev);
return __pm_runtime_suspend(dev, RPM_AUTO);
}
@@ -379,6 +428,18 @@ static inline int pm_runtime_resume(struct device *dev)
*
* Queue up a work item to run an equivalent of pm_runtime_idle() for @dev
* asynchronously.
*
* Return:
* * 0: Success.
* * -EINVAL: Runtime PM error.
* * -EACCES: Runtime PM disabled.
* * -EAGAIN: Runtime PM usage_count non-zero, Runtime PM status change ongoing
* or device not in %RPM_ACTIVE state.
* * -EBUSY: Runtime PM child_count non-zero.
* * -EPERM: Device PM QoS resume latency 0.
* * -EINPROGRESS: Suspend already in progress.
* * -ENOSYS: CONFIG_PM not enabled.
* * 1: Device already suspended.
*/
static inline int pm_request_idle(struct device *dev)
{
@@ -395,14 +456,27 @@ static inline int pm_request_resume(struct device *dev)
}
/**
* pm_request_autosuspend - Queue up autosuspend of a device.
* pm_request_autosuspend - Update the last access time and queue up autosuspend
* of a device.
* @dev: Target device.
*
* Queue up a work item to run an equivalent pm_runtime_autosuspend() for @dev
* asynchronously.
* Update the last access time of a device and queue up a work item to run an
* equivalent pm_runtime_autosuspend() for @dev asynchronously.
*
* Return:
* * 0: Success.
* * -EINVAL: Runtime PM error.
* * -EACCES: Runtime PM disabled.
* * -EAGAIN: Runtime PM usage_count non-zero or Runtime PM status change ongoing.
* * -EBUSY: Runtime PM child_count non-zero.
* * -EPERM: Device PM QoS resume latency 0.
* * -EINPROGRESS: Suspend already in progress.
* * -ENOSYS: CONFIG_PM not enabled.
* * 1: Device already suspended.
*/
static inline int pm_request_autosuspend(struct device *dev)
{
pm_runtime_mark_last_busy(dev);
return __pm_runtime_suspend(dev, RPM_ASYNC | RPM_AUTO);
}
@@ -464,6 +538,17 @@ static inline int pm_runtime_resume_and_get(struct device *dev)
*
* Decrement the runtime PM usage counter of @dev and if it turns out to be
* equal to 0, queue up a work item for @dev like in pm_request_idle().
*
* Return:
* * 0: Success.
* * -EINVAL: Runtime PM error.
* * -EACCES: Runtime PM disabled.
* * -EAGAIN: Runtime PM usage_count non-zero or Runtime PM status change ongoing.
* * -EBUSY: Runtime PM child_count non-zero.
* * -EPERM: Device PM QoS resume latency 0.
* * -EINPROGRESS: Suspend already in progress.
* * -ENOSYS: CONFIG_PM not enabled.
* * 1: Device already suspended.
*/
static inline int pm_runtime_put(struct device *dev)
{
@@ -478,6 +563,17 @@ DEFINE_FREE(pm_runtime_put, struct device *, if (_T) pm_runtime_put(_T))
*
* Decrement the runtime PM usage counter of @dev and if it turns out to be
* equal to 0, queue up a work item for @dev like in pm_request_autosuspend().
*
* Return:
* * 0: Success.
* * -EINVAL: Runtime PM error.
* * -EACCES: Runtime PM disabled.
* * -EAGAIN: Runtime PM usage_count non-zero or Runtime PM status change ongoing.
* * -EBUSY: Runtime PM child_count non-zero.
* * -EPERM: Device PM QoS resume latency 0.
* * -EINPROGRESS: Suspend already in progress.
* * -ENOSYS: CONFIG_PM not enabled.
* * 1: Device already suspended.
*/
static inline int __pm_runtime_put_autosuspend(struct device *dev)
{
@@ -485,16 +581,29 @@ static inline int __pm_runtime_put_autosuspend(struct device *dev)
}
/**
* pm_runtime_put_autosuspend - Drop device usage counter and queue autosuspend if 0.
* pm_runtime_put_autosuspend - Update the last access time of a device, drop
* its usage counter and queue autosuspend if the usage counter becomes 0.
* @dev: Target device.
*
* Decrement the runtime PM usage counter of @dev and if it turns out to be
* equal to 0, queue up a work item for @dev like in pm_request_autosuspend().
* Update the last access time of @dev, decrement runtime PM usage counter of
* @dev and if it turns out to be equal to 0, queue up a work item for @dev like
* in pm_request_autosuspend().
*
* Return:
* * 0: Success.
* * -EINVAL: Runtime PM error.
* * -EACCES: Runtime PM disabled.
* * -EAGAIN: Runtime PM usage_count non-zero or Runtime PM status change ongoing.
* * -EBUSY: Runtime PM child_count non-zero.
* * -EPERM: Device PM QoS resume latency 0.
* * -EINPROGRESS: Suspend already in progress.
* * -ENOSYS: CONFIG_PM not enabled.
* * 1: Device already suspended.
*/
static inline int pm_runtime_put_autosuspend(struct device *dev)
{
return __pm_runtime_suspend(dev,
RPM_GET_PUT | RPM_ASYNC | RPM_AUTO);
pm_runtime_mark_last_busy(dev);
return __pm_runtime_put_autosuspend(dev);
}
/**
@@ -506,9 +615,20 @@ static inline int pm_runtime_put_autosuspend(struct device *dev)
* return value, set up autosuspend of @dev or suspend it (depending on whether
* or not autosuspend has been enabled for it).
*
* The possible return values of this function are the same as for
* pm_runtime_idle() and the runtime PM usage counter of @dev remains
* decremented in all cases, even if it returns an error code.
* The runtime PM usage counter of @dev remains decremented in all cases, even
* if it returns an error code.
*
* Return:
* * 0: Success.
* * -EINVAL: Runtime PM error.
* * -EACCES: Runtime PM disabled.
* * -EAGAIN: Runtime PM usage_count non-zero or Runtime PM status change ongoing.
* * -EBUSY: Runtime PM child_count non-zero.
* * -EPERM: Device PM QoS resume latency 0.
* * -ENOSYS: CONFIG_PM not enabled.
* * 1: Device already suspended.
* Other values and conditions for the above values are possible as returned by
* Runtime PM suspend callbacks.
*/
static inline int pm_runtime_put_sync(struct device *dev)
{
@@ -522,9 +642,21 @@ static inline int pm_runtime_put_sync(struct device *dev)
* Decrement the runtime PM usage counter of @dev and if it turns out to be
* equal to 0, carry out runtime-suspend of @dev synchronously.
*
* The possible return values of this function are the same as for
* pm_runtime_suspend() and the runtime PM usage counter of @dev remains
* decremented in all cases, even if it returns an error code.
* The runtime PM usage counter of @dev remains decremented in all cases, even
* if it returns an error code.
*
* Return:
* * 0: Success.
* * -EINVAL: Runtime PM error.
* * -EACCES: Runtime PM disabled.
* * -EAGAIN: Runtime PM usage_count non-zero or Runtime PM status change ongoing.
* * -EBUSY: Runtime PM child_count non-zero.
* * -EPERM: Device PM QoS resume latency 0.
* * -ENOSYS: CONFIG_PM not enabled.
* * 1: Device already suspended.
* Other values and conditions for the above values are possible as returned by
* Runtime PM suspend callbacks.
*/
static inline int pm_runtime_put_sync_suspend(struct device *dev)
{
@@ -532,19 +664,34 @@ static inline int pm_runtime_put_sync_suspend(struct device *dev)
}
/**
* pm_runtime_put_sync_autosuspend - Drop device usage counter and autosuspend if 0.
* pm_runtime_put_sync_autosuspend - Update the last access time of a device,
* drop device usage counter and autosuspend if 0.
* @dev: Target device.
*
* Decrement the runtime PM usage counter of @dev and if it turns out to be
* equal to 0, set up autosuspend of @dev or suspend it synchronously (depending
* on whether or not autosuspend has been enabled for it).
* Update the last access time of @dev, decrement the runtime PM usage counter
* of @dev and if it turns out to be equal to 0, set up autosuspend of @dev or
* suspend it synchronously (depending on whether or not autosuspend has been
* enabled for it).
*
* The possible return values of this function are the same as for
* pm_runtime_autosuspend() and the runtime PM usage counter of @dev remains
* decremented in all cases, even if it returns an error code.
* The runtime PM usage counter of @dev remains decremented in all cases, even
* if it returns an error code.
*
* Return:
* * 0: Success.
* * -EINVAL: Runtime PM error.
* * -EACCES: Runtime PM disabled.
* * -EAGAIN: Runtime PM usage_count non-zero or Runtime PM status change ongoing.
* * -EBUSY: Runtime PM child_count non-zero.
* * -EPERM: Device PM QoS resume latency 0.
* * -EINPROGRESS: Suspend already in progress.
* * -ENOSYS: CONFIG_PM not enabled.
* * 1: Device already suspended.
* Other values and conditions for the above values are possible as returned by
* Runtime PM suspend callbacks.
*/
static inline int pm_runtime_put_sync_autosuspend(struct device *dev)
{
pm_runtime_mark_last_busy(dev);
return __pm_runtime_suspend(dev, RPM_GET_PUT | RPM_AUTO);
}
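Since pm_runtime_autosuspend(), pm_request_autosuspend(), pm_runtime_put_autosuspend() and pm_runtime_put_sync_autosuspend() now update the last-busy timestamp themselves, as the kerneldoc above spells out, the usual driver idle path can drop one call; a sketch of the simplification this enables:

    /* before */
    pm_runtime_mark_last_busy(dev);
    pm_runtime_put_autosuspend(dev);

    /* now sufficient */
    pm_runtime_put_autosuspend(dev);

Drivers that must not update the timestamp can call __pm_runtime_put_autosuspend() instead, which still only drops the usage counter and queues the autosuspend.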


@@ -1080,7 +1080,7 @@ int kernel_kexec(void)
console_suspend_all();
error = dpm_suspend_start(PMSG_FREEZE);
if (error)
goto Resume_console;
goto Resume_devices;
/*
* dpm_suspend_end() must be called after dpm_suspend_start()
* to complete the transition, like in the hibernation flows
@@ -1135,8 +1135,6 @@ int kernel_kexec(void)
dpm_resume_start(PMSG_RESTORE);
Resume_devices:
dpm_resume_end(PMSG_RESTORE);
Resume_console:
pm_restore_gfp_mask();
console_resume_all();
thaw_processes();
Restore_console:


@@ -16,6 +16,7 @@
#define SUSPEND_CONSOLE (MAX_NR_CONSOLES-1)
static int orig_fgconsole, orig_kmsg;
static bool vt_switch_done;
static DEFINE_MUTEX(vt_switch_mutex);
@@ -136,17 +137,21 @@ void pm_prepare_console(void)
if (orig_fgconsole < 0)
return;
vt_switch_done = true;
orig_kmsg = vt_kmsg_redirect(SUSPEND_CONSOLE);
return;
}
void pm_restore_console(void)
{
if (!pm_vt_switch())
if (!pm_vt_switch() && !vt_switch_done)
return;
if (orig_fgconsole >= 0) {
vt_move_to_console(orig_fgconsole, 0);
vt_kmsg_redirect(orig_kmsg);
}
vt_switch_done = false;
}


@@ -8,6 +8,7 @@
#include <linux/acpi.h>
#include <linux/export.h>
#include <linux/init.h>
#include <linux/kobject.h>
#include <linux/string.h>
#include <linux/pm-trace.h>
@@ -112,6 +113,14 @@ int pm_notifier_call_chain(unsigned long val)
/* If set, devices may be suspended and resumed asynchronously. */
int pm_async_enabled = 1;
static int __init pm_async_setup(char *str)
{
if (!strcmp(str, "off"))
pm_async_enabled = 0;
return 1;
}
__setup("pm_async=", pm_async_setup);
static ssize_t pm_async_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buf)
{
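For reference, the new early parameter above is meant for the kernel command line: booting with pm_async=off clears pm_async_enabled, so the PM core suspends and resumes devices strictly synchronously, while the existing /sys/power/pm_async attribute, whose read side appears just above, still controls the same flag at run time.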


@@ -1536,7 +1536,7 @@ static unsigned long copy_data_pages(struct memory_bitmap *copy_bm,
memory_bm_position_reset(orig_bm);
memory_bm_position_reset(copy_bm);
copy_pfn = memory_bm_next_pfn(copy_bm);
for(;;) {
for (;;) {
pfn = memory_bm_next_pfn(orig_bm);
if (unlikely(pfn == BM_END_OF_MAP))
break;
@@ -2161,13 +2161,13 @@ static const char *check_image_kernel(struct swsusp_info *info)
{
if (info->version_code != LINUX_VERSION_CODE)
return "kernel version";
if (strcmp(info->uts.sysname,init_utsname()->sysname))
if (strcmp(info->uts.sysname, init_utsname()->sysname))
return "system type";
if (strcmp(info->uts.release,init_utsname()->release))
if (strcmp(info->uts.release, init_utsname()->release))
return "kernel release";
if (strcmp(info->uts.version,init_utsname()->version))
if (strcmp(info->uts.version, init_utsname()->version))
return "version";
if (strcmp(info->uts.machine,init_utsname()->machine))
if (strcmp(info->uts.machine, init_utsname()->machine))
return "machine";
return NULL;
}
@@ -2361,7 +2361,7 @@ static int unpack_orig_pfns(unsigned long *buf, struct memory_bitmap *bm,
struct memory_bitmap *zero_bm)
{
unsigned long decoded_pfn;
bool zero;
bool zero;
int j;
for (j = 0; j < PAGE_SIZE / sizeof(long); j++) {


@@ -1061,7 +1061,7 @@ impl<T: Driver> Registration<T> {
///
/// - This function may only be called from the cpufreq C infrastructure.
/// - The pointer arguments must be valid pointers.
unsafe extern "C" fn init_callback(ptr: *mut bindings::cpufreq_policy) -> kernel::ffi::c_int {
unsafe extern "C" fn init_callback(ptr: *mut bindings::cpufreq_policy) -> c_int {
from_result(|| {
// SAFETY: The `ptr` is guaranteed to be valid by the contract with the C code for the
// lifetime of `policy`.
@@ -1094,7 +1094,7 @@ impl<T: Driver> Registration<T> {
///
/// - This function may only be called from the cpufreq C infrastructure.
/// - The pointer arguments must be valid pointers.
unsafe extern "C" fn online_callback(ptr: *mut bindings::cpufreq_policy) -> kernel::ffi::c_int {
unsafe extern "C" fn online_callback(ptr: *mut bindings::cpufreq_policy) -> c_int {
from_result(|| {
// SAFETY: The `ptr` is guaranteed to be valid by the contract with the C code for the
// lifetime of `policy`.
@@ -1109,9 +1109,7 @@ impl<T: Driver> Registration<T> {
///
/// - This function may only be called from the cpufreq C infrastructure.
/// - The pointer arguments must be valid pointers.
unsafe extern "C" fn offline_callback(
ptr: *mut bindings::cpufreq_policy,
) -> kernel::ffi::c_int {
unsafe extern "C" fn offline_callback(ptr: *mut bindings::cpufreq_policy) -> c_int {
from_result(|| {
// SAFETY: The `ptr` is guaranteed to be valid by the contract with the C code for the
// lifetime of `policy`.
@@ -1126,9 +1124,7 @@ impl<T: Driver> Registration<T> {
///
/// - This function may only be called from the cpufreq C infrastructure.
/// - The pointer arguments must be valid pointers.
unsafe extern "C" fn suspend_callback(
ptr: *mut bindings::cpufreq_policy,
) -> kernel::ffi::c_int {
unsafe extern "C" fn suspend_callback(ptr: *mut bindings::cpufreq_policy) -> c_int {
from_result(|| {
// SAFETY: The `ptr` is guaranteed to be valid by the contract with the C code for the
// lifetime of `policy`.
@@ -1143,7 +1139,7 @@ impl<T: Driver> Registration<T> {
///
/// - This function may only be called from the cpufreq C infrastructure.
/// - The pointer arguments must be valid pointers.
unsafe extern "C" fn resume_callback(ptr: *mut bindings::cpufreq_policy) -> kernel::ffi::c_int {
unsafe extern "C" fn resume_callback(ptr: *mut bindings::cpufreq_policy) -> c_int {
from_result(|| {
// SAFETY: The `ptr` is guaranteed to be valid by the contract with the C code for the
// lifetime of `policy`.
@@ -1171,9 +1167,7 @@ impl<T: Driver> Registration<T> {
///
/// - This function may only be called from the cpufreq C infrastructure.
/// - The pointer arguments must be valid pointers.
unsafe extern "C" fn verify_callback(
ptr: *mut bindings::cpufreq_policy_data,
) -> kernel::ffi::c_int {
unsafe extern "C" fn verify_callback(ptr: *mut bindings::cpufreq_policy_data) -> c_int {
from_result(|| {
// SAFETY: The `ptr` is guaranteed to be valid by the contract with the C code for the
// lifetime of `policy`.
@@ -1188,9 +1182,7 @@ impl<T: Driver> Registration<T> {
///
/// - This function may only be called from the cpufreq C infrastructure.
/// - The pointer arguments must be valid pointers.
unsafe extern "C" fn setpolicy_callback(
ptr: *mut bindings::cpufreq_policy,
) -> kernel::ffi::c_int {
unsafe extern "C" fn setpolicy_callback(ptr: *mut bindings::cpufreq_policy) -> c_int {
from_result(|| {
// SAFETY: The `ptr` is guaranteed to be valid by the contract with the C code for the
// lifetime of `policy`.
@@ -1207,9 +1199,9 @@ impl<T: Driver> Registration<T> {
/// - The pointer arguments must be valid pointers.
unsafe extern "C" fn target_callback(
ptr: *mut bindings::cpufreq_policy,
-target_freq: u32,
-relation: u32,
-) -> kernel::ffi::c_int {
+target_freq: c_uint,
+relation: c_uint,
+) -> c_int {
from_result(|| {
// SAFETY: The `ptr` is guaranteed to be valid by the contract with the C code for the
// lifetime of `policy`.
@@ -1226,8 +1218,8 @@ impl<T: Driver> Registration<T> {
/// - The pointer arguments must be valid pointers.
unsafe extern "C" fn target_index_callback(
ptr: *mut bindings::cpufreq_policy,
-index: u32,
-) -> kernel::ffi::c_int {
+index: c_uint,
+) -> c_int {
from_result(|| {
// SAFETY: The `ptr` is guaranteed to be valid by the contract with the C code for the
// lifetime of `policy`.
@@ -1249,8 +1241,8 @@ impl<T: Driver> Registration<T> {
/// - The pointer arguments must be valid pointers.
unsafe extern "C" fn fast_switch_callback(
ptr: *mut bindings::cpufreq_policy,
-target_freq: u32,
-) -> kernel::ffi::c_uint {
+target_freq: c_uint,
+) -> c_uint {
// SAFETY: The `ptr` is guaranteed to be valid by the contract with the C code for the
// lifetime of `policy`.
let policy = unsafe { Policy::from_raw_mut(ptr) };
@@ -1263,10 +1255,10 @@ impl<T: Driver> Registration<T> {
///
/// - This function may only be called from the cpufreq C infrastructure.
unsafe extern "C" fn adjust_perf_callback(
-cpu: u32,
-min_perf: usize,
-target_perf: usize,
-capacity: usize,
+cpu: c_uint,
+min_perf: c_ulong,
+target_perf: c_ulong,
+capacity: c_ulong,
) {
// SAFETY: The C API guarantees that `cpu` refers to a valid CPU number.
let cpu_id = unsafe { CpuId::from_u32_unchecked(cpu) };
@@ -1284,8 +1276,8 @@ impl<T: Driver> Registration<T> {
/// - The pointer arguments must be valid pointers.
unsafe extern "C" fn get_intermediate_callback(
ptr: *mut bindings::cpufreq_policy,
-index: u32,
-) -> kernel::ffi::c_uint {
+index: c_uint,
+) -> c_uint {
// SAFETY: The `ptr` is guaranteed to be valid by the contract with the C code for the
// lifetime of `policy`.
let policy = unsafe { Policy::from_raw_mut(ptr) };
@@ -1305,8 +1297,8 @@ impl<T: Driver> Registration<T> {
/// - The pointer arguments must be valid pointers.
unsafe extern "C" fn target_intermediate_callback(
ptr: *mut bindings::cpufreq_policy,
-index: u32,
-) -> kernel::ffi::c_int {
+index: c_uint,
+) -> c_int {
from_result(|| {
// SAFETY: The `ptr` is guaranteed to be valid by the contract with the C code for the
// lifetime of `policy`.
@@ -1325,7 +1317,7 @@ impl<T: Driver> Registration<T> {
/// # Safety
///
/// - This function may only be called from the cpufreq C infrastructure.
unsafe extern "C" fn get_callback(cpu: u32) -> kernel::ffi::c_uint {
unsafe extern "C" fn get_callback(cpu: c_uint) -> c_uint {
// SAFETY: The C API guarantees that `cpu` refers to a valid CPU number.
let cpu_id = unsafe { CpuId::from_u32_unchecked(cpu) };
@@ -1351,7 +1343,7 @@ impl<T: Driver> Registration<T> {
///
/// - This function may only be called from the cpufreq C infrastructure.
/// - The pointer arguments must be valid pointers.
unsafe extern "C" fn bios_limit_callback(cpu: i32, limit: *mut u32) -> kernel::ffi::c_int {
unsafe extern "C" fn bios_limit_callback(cpu: c_int, limit: *mut c_uint) -> c_int {
// SAFETY: The C API guarantees that `cpu` refers to a valid CPU number.
let cpu_id = unsafe { CpuId::from_i32_unchecked(cpu) };
@@ -1371,8 +1363,8 @@ impl<T: Driver> Registration<T> {
/// - The pointer arguments must be valid pointers.
unsafe extern "C" fn set_boost_callback(
ptr: *mut bindings::cpufreq_policy,
-state: i32,
-) -> kernel::ffi::c_int {
+state: c_int,
+) -> c_int {
from_result(|| {
// SAFETY: The `ptr` is guaranteed to be valid by the contract with the C code for the
// lifetime of `policy`.

@@ -14,9 +14,6 @@
#[cfg(CONFIG_CPUMASK_OFFSTACK)]
use core::ptr::{self, NonNull};
-#[cfg(not(CONFIG_CPUMASK_OFFSTACK))]
-use core::mem::MaybeUninit;
use core::ops::{Deref, DerefMut};
/// A CPU Mask.
@@ -239,10 +236,7 @@ pub fn new_zero(_flags: Flags) -> Result<Self, AllocError> {
},
#[cfg(not(CONFIG_CPUMASK_OFFSTACK))]
-// SAFETY: FFI type is valid to be zero-initialized.
-//
-// INVARIANT: The associated memory is freed when the `CpumaskVar` goes out of scope.
-mask: unsafe { core::mem::zeroed() },
+mask: Cpumask(Opaque::zeroed()),
})
}
@@ -266,10 +260,7 @@ pub unsafe fn new(_flags: Flags) -> Result<Self, AllocError> {
NonNull::new(ptr.cast()).ok_or(AllocError)?
},
#[cfg(not(CONFIG_CPUMASK_OFFSTACK))]
-// SAFETY: Guaranteed by the safety requirements of the function.
-//
-// INVARIANT: The associated memory is freed when the `CpumaskVar` goes out of scope.
-mask: unsafe { MaybeUninit::uninit().assume_init() },
+mask: Cpumask(Opaque::uninit()),
})
}

@@ -514,9 +514,9 @@ extern "C" fn config_clks(
dev: *mut bindings::device,
opp_table: *mut bindings::opp_table,
opp: *mut bindings::dev_pm_opp,
-_data: *mut kernel::ffi::c_void,
+_data: *mut c_void,
scaling_down: bool,
-) -> kernel::ffi::c_int {
+) -> c_int {
from_result(|| {
// SAFETY: 'dev' is guaranteed by the C code to be valid.
let dev = unsafe { Device::get_device(dev) };
@@ -540,8 +540,8 @@ extern "C" fn config_regulators(
old_opp: *mut bindings::dev_pm_opp,
new_opp: *mut bindings::dev_pm_opp,
regulators: *mut *mut bindings::regulator,
-count: kernel::ffi::c_uint,
-) -> kernel::ffi::c_int {
+count: c_uint,
+) -> c_int {
from_result(|| {
// SAFETY: 'dev' is guaranteed by the C code to be valid.
let dev = unsafe { Device::get_device(dev) };

@@ -4,20 +4,22 @@
# This Makefile expects you have already run `make install-lib` in the lib
# directory for the bindings to be created.
-CC := gcc
+CC ?= gcc
+# CFLAGS ?=
+LDFLAGS ?= -lcpupower
HAVE_SWIG := $(shell if which swig >/dev/null 2>&1; then echo 1; else echo 0; fi)
HAVE_PYCONFIG := $(shell if which python-config >/dev/null 2>&1; then echo 1; else echo 0; fi)
-PY_INCLUDE = $(firstword $(shell python-config --includes))
-INSTALL_DIR = $(shell python3 -c "import site; print(site.getsitepackages()[0])")
+PY_INCLUDE ?= $(firstword $(shell python-config --includes))
+INSTALL_DIR ?= $(shell python3 -c "import site; print(site.getsitepackages()[0])")
all: _raw_pylibcpupower.so
_raw_pylibcpupower.so: raw_pylibcpupower_wrap.o
-$(CC) -shared -lcpupower raw_pylibcpupower_wrap.o -o _raw_pylibcpupower.so
+$(CC) -shared $(LDFLAGS) raw_pylibcpupower_wrap.o -o _raw_pylibcpupower.so
raw_pylibcpupower_wrap.o: raw_pylibcpupower_wrap.c
-$(CC) -fPIC -c raw_pylibcpupower_wrap.c $(PY_INCLUDE)
+$(CC) $(CFLAGS) $(PY_INCLUDE) -fPIC -c raw_pylibcpupower_wrap.c
raw_pylibcpupower_wrap.c: raw_pylibcpupower.swg
ifeq ($(HAVE_SWIG),0)

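With the conditional assignments and the compile/link rules honoring CFLAGS and LDFLAGS, the toolchain and Python paths can now be supplied from the environment or the make command line (for example, make CC=clang LDFLAGS="-L/usr/local/lib -lcpupower") instead of being hard-coded in the Makefile.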
@@ -121,10 +121,8 @@ void print_header(int topology_depth)
switch (topology_depth) {
case TOPOLOGY_DEPTH_PKG:
printf(" PKG|");
-break;
case TOPOLOGY_DEPTH_CORE:
printf("CORE|");
-break;
case TOPOLOGY_DEPTH_CPU:
printf(" CPU|");
break;
@@ -167,10 +165,8 @@ void print_results(int topology_depth, int cpu)
switch (topology_depth) {
case TOPOLOGY_DEPTH_PKG:
printf("%4d|", cpu_top.core_info[cpu].pkg);
-break;
case TOPOLOGY_DEPTH_CORE:
printf("%4d|", cpu_top.core_info[cpu].core);
-break;
case TOPOLOGY_DEPTH_CPU:
printf("%4d|", cpu_top.core_info[cpu].cpu);
break;
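With the break statements dropped, both switches now fall through deliberately: selecting a coarser topology level prints its own column followed by every finer-grained one (PKG, then CORE, then CPU). A standalone illustration of the pattern (hypothetical example, not cpupower code):

#include <stdio.h>

enum topo_depth { DEPTH_PKG, DEPTH_CORE, DEPTH_CPU };

static void print_columns(enum topo_depth depth)
{
	switch (depth) {
	case DEPTH_PKG:
		printf(" PKG|");
		/* fall through */
	case DEPTH_CORE:
		printf("CORE|");
		/* fall through */
	case DEPTH_CPU:
		printf(" CPU|");
		break;
	}
	printf("\n");
}

int main(void)
{
	print_columns(DEPTH_PKG);	/* prints " PKG|CORE| CPU|" */
	print_columns(DEPTH_CPU);	/* prints " CPU|" */
	return 0;
}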

@@ -240,9 +240,9 @@ static int mperf_stop(void)
int cpu;
for (cpu = 0; cpu < cpu_count; cpu++) {
-mperf_measure_stats(cpu);
-mperf_get_tsc(&tsc_at_measure_end[cpu]);
clock_gettime(CLOCK_REALTIME, &time_end[cpu]);
+mperf_get_tsc(&tsc_at_measure_end[cpu]);
+mperf_measure_stats(cpu);
}
return 0;
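Reordering the stop-side reads (wall-clock time first, then TSC, then the MPERF/APERF statistics) presumably mirrors the order used when the measurement window is opened in mperf_start(), so each quantity is sampled over an interval of roughly the same length and the derived ratios stay consistent.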