Commit ac4e04d9e3 ("cpufreq: intel_pstate: Unchecked MSR aceess in
legacy mode") introduced a check for feature X86_FEATURE_IDA to verify
turbo mode support. Although this is the correct way to check for turbo
mode support, it causes issues on some platforms that disable turbo
during OS boot, but enable it later [1]. Before adding this feature
check, users were able to get turbo mode frequencies by writing 0 to
/sys/devices/system/cpu/intel_pstate/no_turbo post-boot.
To restore the old behavior on the affected systems while still
addressing the unchecked MSR issue on some Skylake-X systems, check
X86_FEATURE_IDA only immediately before updates of MSR_IA32_PERF_CTL
that may involve setting the Turbo Engage Bit (bit 32).
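For illustration, a minimal sketch of the intended shape of the check (simplified,
not the literal patch; perf_ctl_value() is a hypothetical helper standing in for
whatever the driver computes for the rest of the register value):

    u64 val = perf_ctl_value(pstate);	/* non-turbo bits of MSR_IA32_PERF_CTL */

    /*
     * Setting the Turbo Engage Bit (bit 32) faults on CPUs without
     * X86_FEATURE_IDA, so only consider setting it when the feature is present.
     */
    if (boot_cpu_has(X86_FEATURE_IDA) && global.no_turbo)
            val |= BIT_ULL(32);

    wrmsrl(MSR_IA32_PERF_CTL, val);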
Fixes: ac4e04d9e3 ("cpufreq: intel_pstate: Unchecked MSR aceess in legacy mode")
Reported-by: Aaron Rainbolt <arainbolt@kfocus.org>
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2122531 [1]
Tested-by: Aaron Rainbolt <arainbolt@kfocus.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject adjustment, changelog edits ]
Link: https://patch.msgid.link/20251111010840.141490-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some debug messages generated by intel_pstate on a given hybrid system
are only printed for some CPUs, which is confusing, so modify the driver
to print them for all CPUs. Also change those messages to avoid
printing local variable names in them.
Moreover, some debug messages printed by intel_pstate are quite hard
to understand without looking at the code printing them, so make them
somewhat clearer while at it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/8609836.T7Z3S40VBb@rafael.j.wysocki
Instead of using HWP-to-frequency scaling factors for computing cost
coefficients in the energy model used on hybrid systems, which is
fragile, rely on CPU type information that is easily accessible now and
the information on whether or not L3 cache is present for this purpose.
This also allows the cost coefficients for P-cores to be adjusted so
that they start to be populated somewhat earlier (that is, before
E-cores are loaded up to their full capacity).
In addition to the above, replace an inaccurate comment regarding the
reason why the freq value is added to the cost in hybrid_get_cost().
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
Link: https://patch.msgid.link/5932894.DvuYhMxLoT@rafael.j.wysocki
The comment above the condition `if (cpu->last_sample_time)` clearly
indicates that the branch is taken for the vast majority of invocations
after the first sample in a cycle. The first sample is a one-time
initialization case.
Add likely() hint to the condition to improve branch prediction for
this performance-critical path in intel_pstate_sample().
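The change itself is tiny; roughly (excerpt-style sketch, with the body of the
taken branch abridged):

    if (likely(cpu->last_sample_time)) {
            /* Vast majority of invocations: a previous sample exists. */
            /* ... compute deltas against the previous sample ... */
            return true;
    }

    /* First sample in a cycle: one-time initialization case. */
    return false;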
Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
So far, HWP has never been enabled without EPP (Energy-Performance
Preference) interface support, since the lack of the latter indicates an
incomplete implementation of HWP, which was the case on early development
vehicle platforms. However, HWP can be expected to work if DEC (Dynamic
Efficiency Control) is enabled as indicated by setting bit 27 in
MSR_IA32_POWER_CTL (DEC enable bit).
Accordingly, allow HWP to be enabled if the EPP interface is not
supported so long as DEC is enabled in the processor.
Still, the EPP control sysfs interface is useless when EPP is not
supported, so do not expose it in that case.
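A minimal sketch of the extra condition, assuming a small helper for reading
MSR_IA32_POWER_CTL is added (the helper name is illustrative):

    static bool dec_enabled(void)		/* illustrative name */
    {
            u64 power_ctl;

            rdmsrl(MSR_IA32_POWER_CTL, power_ctl);
            return power_ctl & BIT_ULL(27);	/* DEC enable bit */
    }

    /* HWP without the EPP interface is acceptable only when DEC is enabled. */
    if (!boot_cpu_has(X86_FEATURE_HWP_EPP) && !dec_enabled())
            return false;			/* do not enable HWP */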
Link: https://lore.kernel.org/linux-pm/20250904000608.260817-2-srinivas.pandruvada@linux.intel.com/
Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The intel_pstate driver manages CPU capacity changes itself and it does
not need an update of the capacity of all CPUs in the system to be
carried out after registering a PD.
Moreover, in some configurations (for instance, an SMT-capable
hybrid x86 system booted with nosmt in the kernel command line) the
em_check_capacity_update() call at the end of em_dev_register_perf_domain()
always fails and reschedules itself to run once again in 1 s, so
effectively it runs in vain every 1 s forever.
To address this, introduce a new variant of em_dev_register_perf_domain(),
called em_dev_register_pd_no_update(), that does not invoke
em_check_capacity_update(), and make intel_pstate use it instead of the
original.
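Conceptually, the new entry point is the existing registration path minus the
capacity update kick; a simplified sketch (the shared inner helper below is a
placeholder for however the EM core factors its common code):

    int em_dev_register_pd_no_update(struct device *dev, unsigned int nr_states,
                                     const struct em_data_callback *cb,
                                     const cpumask_t *cpus, bool microwatts)
    {
            /*
             * Same as em_dev_register_perf_domain(), but without the trailing
             * em_check_capacity_update() invocation.
             */
            return __em_register_perf_domain(dev, nr_states, cb, cpus, microwatts);
    }

intel_pstate then calls the _no_update() variant in the places where it used to
call em_dev_register_perf_domain().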
Fixes: 7b010f9b90 ("cpufreq: intel_pstate: EAS support for hybrid platforms")
Closes: https://lore.kernel.org/linux-pm/40212796-734c-4140-8a85-854f72b8144d@panix.com/
Reported-by: Kenneth R. Crudup <kenny@panix.com>
Tested-by: Kenneth R. Crudup <kenny@panix.com>
Cc: 6.16+ <stable@vger.kernel.org> # 6.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpufreq_cpu_put() call in update_qos_request() takes place too early
because the latter subsequently calls freq_qos_update_request() that
indirectly accesses the policy object in question through the QoS request
object passed to it.
Fortunately, update_qos_request() is called under intel_pstate_driver_lock,
so this issue does not matter for changing the intel_pstate operation
mode, but it theoretically can cause a crash to occur on CPU device hot
removal (which currently can only happen in virt, but it is formally
supported nevertheless).
Address this issue by modifying update_qos_request() to drop the
reference to the policy later.
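Schematically, the fix is just a reordering within update_qos_request()
(simplified, with error handling and the request lookup omitted):

    policy = cpufreq_cpu_get(cpu);
    if (!policy)
            return;

    /* ... find the QoS request and compute the new frequency value ... */

    freq_qos_update_request(req, freq);	/* may still reach into the policy */
    cpufreq_cpu_put(policy);		/* drop the reference only afterwards */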
Fixes: da5c504c7a ("cpufreq: intel_pstate: Implement QoS supported freq constraints")
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zihuan Zhang <zhangzihuan@kylinos.cn>
Link: https://patch.msgid.link/2255671.irdbgypaU6@rafael.j.wysocki
The intel_pstate driver does not enable HWP mode when CPUID.06H:EAX[10]
is not set, indicating that EPP (Energy Performance Preference) is not
supported by the hardware.
When EPP is unavailable, the system falls back to using EPB (Energy
Performance Bias) if the feature is supported. However, since the
intel_pstate driver will not enable HWP in this scenario, any EPB-related
code becomes unreachable and irrelevant. Remove the EPB handling code
paths, simplifying the driver logic and reducing code size.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20250904000608.260817-1-srinivas.pandruvada@linux.intel.com
[ rjw: Subject adjustment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In the passive mode, intel_cpufreq_update_pstate() sets HWP_MIN_PERF in
accordance with the target frequency to ensure delivering adequate
performance, but it sets HWP_DESIRED_PERF to 0, so the processor has no
indication that the desired performance level is actually equal to the
floor one. This may cause it to choose a performance point way above
the desired level.
Moreover, this is inconsistent with intel_cpufreq_adjust_perf() which
actually sets HWP_DESIRED_PERF in accordance with the target performance
value.
Address this by adjusting intel_cpufreq_update_pstate() to pass
target_pstate as both the minimum and the desired performance levels
to intel_cpufreq_hwp_update().
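In terms of the call site in intel_cpufreq_update_pstate(), the change amounts
to the following (diff-style sketch; argument names approximate the driver code):

    -	intel_cpufreq_hwp_update(cpu, target_pstate, max_pstate, 0, fast_switch);
    +	intel_cpufreq_hwp_update(cpu, target_pstate, max_pstate, target_pstate,
    +				 fast_switch);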
Fixes: a365ab6b9d ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Shashank Balaji <shashank.mahadasyam@sony.com>
Link: https://patch.msgid.link/6173276.lOV4Wx5bFT@rjwysocki.net
Pull power management updates from Rafael Wysocki:
"Once again, the changes are dominated by cpufreq updates, but this
time the majority of them are cpufreq core changes, mostly related to
the introduction of policy locking guards and __free() usage, and
fixes related to boost handling.
Still, there is also a significant update of the intel_pstate driver
making it register an energy model when running on a hybrid platform
which is used for enabling energy-aware scheduling (EAS) if the driver
operates in the passive mode (and schedutil is used as the cpufreq
governor for all CPUs, which is the passive mode default).
There are some amd-pstate driver updates too, for good measure,
including the "Requested CPU Min frequency" BIOS option support and
new online/offline callbacks.
In the cpuidle space, the most significant change is the addition of a
C1 demotion on/off sysfs knob to intel_idle which should help some
users to configure their systems more precisely. There is also the
conversion of the PSCI cpuidle driver to a faux device one and there
are two small updates of cpuidle governors.
Device power management is also modified quite a bit, especially the
handling of devices with asynchronous suspend and resume enabled
during system transitions. They are now going to be handled more
asynchronously during suspend transitions and somewhat less
aggressively during resume transitions.
Apart from the above, the operating performance points (OPP) library
is now going to use mutex locking guards and scope-based cleanup
helpers and there is the usual bunch of assorted fixes and code
cleanups.
Specifics:
- Fix potential division-by-zero error in em_compute_costs() (Yaxiong
Tian)
- Fix typos in energy model documentation and example driver code
(Moon Hee Lee, Atul Kumar Pant)
- Rearrange the energy model management code and add a new function
for adjusting a CPU energy model after adjusting the capacity of
the given CPU to it (Rafael Wysocki)
- Refactor cpufreq_online(), add and use cpufreq policy locking
guards, use __free() in policy reference counting, and clean up
core cpufreq code on top of that (Rafael Wysocki)
- Fix boost handling on CPU suspend/resume and sysfs updates (Viresh
Kumar)
- Fix des_perf clamping with max_perf in amd_pstate_update()
(Dhananjay Ugwekar)
- Add offline, online and suspend callbacks to the amd-pstate driver,
rename and use the existing amd_pstate_epp callbacks in it
(Dhananjay Ugwekar)
- Add support for the "Requested CPU Min frequency" BIOS option to
the amd-pstate driver (Dhananjay Ugwekar)
- Reset amd-pstate driver mode after running selftests (Swapnil
Sapkal)
- Avoid shadowing ret in amd_pstate_ut_check_driver() (Nathan
Chancellor)
- Add helper for governor checks to the schedutil cpufreq governor
and move cpufreq-specific EAS checks to cpufreq (Rafael Wysocki)
- Populate the cpu_capacity sysfs entries from the intel_pstate
driver after registering asym capacity support (Ricardo Neri)
- Add support for enabling Energy-aware scheduling (EAS) to the
intel_pstate driver when operating in the passive mode on a hybrid
platform (Rafael Wysocki)
- Drop redundant cpus_read_lock() from store_local_boost() in the
cpufreq core (Seyediman Seyedarab)
- Replace sscanf() with kstrtouint() in the cpufreq code and use a
symbol instead of a raw number in it (Bowen Yu)
- Add support for autonomous CPU performance state selection to the
CPPC cpufreq driver (Lifeng Zheng)
- OPP: Add dev_pm_opp_set_level() (Praveen Talari)
- Introduce scope-based cleanup headers and mutex locking guards in
OPP core (Viresh Kumar)
- Switch OPP to use kmemdup_array() (Zhang Enpei)
- Optimize bucket assignment when next_timer_ns equals KTIME_MAX in
the menu cpuidle governor (Zhongqiu Han)
- Convert the cpuidle PSCI driver to a faux device one (Sudeep Holla)
- Add C1 demotion on/off sysfs knob to the intel_idle driver (Artem
Bityutskiy)
- Fix typos in two comments in the teo cpuidle governor (Atul Kumar
Pant)
- Fix denying of auto suspend in pm_suspend_timer_fn() (Charan Teja
Kalla)
- Move debug runtime PM attributes to runtime_attrs[] (Rafael
Wysocki)
- Add new devm_ functions for enabling runtime PM and runtime PM
reference counting (Bence Csókás)
- Remove size arguments from strscpy() calls in the hibernation core
code (Thorsten Blum)
- Adjust the handling of devices with asynchronous suspend enabled
during system suspend and resume to start resuming them immediately
after resuming their parents and to start suspending such a device
immediately after suspending its first child (Rafael Wysocki)
- Adjust messages printed during tasks freezing to avoid using
pr_cont() (Andrew Sayers, Paul Menzel)
- Clean up unnecessary usage of !! in pm_print_times_init() (Zihuan
Zhang)
- Add missing wakeup source attribute relax_count to sysfs and remove
the space character at the end of the string produced by
pm_show_wakelocks() (Zijun Hu)
- Add configurable pm_test delay for hibernation (Zihuan Zhang)
- Disable asynchronous suspend in ucsi_ccg_probe() to prevent the
cypd4226 device on Tegra boards from suspending prematurely (Jon
Hunter)
- Unbreak printing PM debug messages during hibernation and clean up
some related code (Rafael Wysocki)
- Add a systemd service to run cpupower and change cpupower binding's
Makefile to use -lcpupower (John B. Wyatt IV, Francesco Poli)"
* tag 'pm-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (72 commits)
cpufreq: CPPC: Add support for autonomous selection
cpufreq: Update sscanf() to kstrtouint()
cpufreq: Replace magic number
OPP: switch to use kmemdup_array()
PM: freezer: Rewrite restarting tasks log to remove stray *done.*
PM: runtime: fix denying of auto suspend in pm_suspend_timer_fn()
cpufreq: drop redundant cpus_read_lock() from store_local_boost()
cpupower: do not install files to /etc/default/
cpupower: do not call systemctl at install time
cpupower: do not write DESTDIR to cpupower.service
PM: sleep: Introduce pm_sleep_transition_in_progress()
cpufreq/amd-pstate: Avoid shadowing ret in amd_pstate_ut_check_driver()
cpufreq: intel_pstate: Document hybrid processor support
cpufreq: intel_pstate: EAS: Increase cost for CPUs using L3 cache
cpufreq: intel_pstate: EAS support for hybrid platforms
PM: EM: Introduce em_adjust_cpu_capacity()
PM: EM: Move CPU capacity check to em_adjust_new_capacity()
PM: EM: Documentation: Fix typos in example driver code
cpufreq: Drop policy locking from cpufreq_policy_is_good_for_eas()
PM: sleep: Introduce pm_suspend_in_progress()
...
On some hybrid platforms some efficient CPUs (E-cores) are not connected
to the L3 cache, but there are no other differences between them and the
other E-cores that use L3. In that case, it is generally more efficient
to run "light" workloads on the E-cores that do not use L3 and allow all
of the cores using L3, including P-cores, to go into idle states.
For this reason, slightly increase the cost for all CPUs sharing the L3
cache to make EAS prefer CPUs that do not use it to the other CPUs of
the same type (if any).
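The shape of the adjustment in the EM cost callback is roughly the following
(the cache lookup helper and the offset constant are illustrative, not the
driver's exact values):

    *cost = base_cost_for_cpu_type(cpu);	/* type-dependent base cost */
    if (cpu_shares_l3(cpu))			/* illustrative cacheinfo check */
            *cost += L3_COST_BUMP;		/* small fixed increase */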
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2032776.usQuhbGJ8B@rjwysocki.net
Modify intel_pstate to register EM perf domains for CPUs on hybrid
platforms without SMT, which causes EAS to be enabled on them when
schedutil is used as the cpufreq governor (which requires intel_pstate
to operate in the passive mode).
This change is targeting platforms (for example, Lunar Lake) where the
"little" CPUs (E-cores) are always more energy-efficient than the "big"
or "performance" CPUs (P-cores) when run at the same HWP performance
level, so it is sufficient to tell EAS that E-cores are always preferred
(so long as there is enough spare capacity on one of them to run the
given task). However, migrating tasks between CPUs of the same type
too often is not desirable because it may hurt both performance and
energy efficiency due to leaving warm caches behind.
For this reason, register a separate perf domain for each CPU and choose
the cost values for them so that the cost mostly depends on the CPU type,
but there is also a small component of it depending on the performance
level (utilization) which helps to balance the load between CPUs of the
same type.
The cost component related to the CPU type is computed with the help of
the observation that the IPC metric value for a given CPU is inversely
proportional to its performance-to-frequency scaling factor and the cost
of running code on it can be assumed to be roughly proportional to that
IPC ratio (in principle, the higher the IPC ratio, the more resources
are utilized when running at a given frequency, so the cost should be
higher).
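A hedged sketch of what such a per-CPU cost callback could look like (helper
names and constants are illustrative, not the actual driver values):

    static int hybrid_get_cost(struct device *dev, unsigned long freq,
                               unsigned long *cost)
    {
            unsigned int cpu = dev->id;

            /*
             * Type-dependent base: P-cores have a higher IPC ratio than
             * E-cores, so running code on them costs more at a given
             * frequency.
             */
            *cost = cpu_is_pcore(cpu) ? PCORE_BASE_COST : ECORE_BASE_COST;

            /*
             * Small performance-level-dependent component so that load is
             * balanced between CPUs of the same type instead of piling up
             * on one of them.
             */
            *cost += freq / FREQ_COST_DIVISOR;

            return 0;
    }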
For all CPUs that are online at the system initialization time, EM perf
domains are registered when the driver starts up, after asymmetric
capacity support has been enabled. For the CPUs that become online
later, EM perf domains are registered after setting the asymmetric
capacity for them.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://patch.msgid.link/6057101.MhkbZ0Pkbq@rjwysocki.net
Intel hybrid processors have CPUs of different capacity. Populate the
interface /sys/devices/system/cpu/cpuN/cpu_capacity.
This interface uses the per-CPU variable `cpu_scale`. On x86 this
variable has no other use besides feeding the sysfs entries. Initialize
it when setting CPU capacity for the scheduler and scale-invariant code.
Feed it with arch_scale_cpu_capacity() as it gives capacity normalized
to the interval [0, SCHED_CAPACITY_SCALE].
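Schematically (simplified sketch; the helper name below is illustrative, and the
update happens wherever the scheduler-visible capacity is set on x86):

    /* Keep the sysfs-facing per-CPU variable in sync with the scheduler view. */
    static void set_cpu_capacity_sysfs(int cpu)	/* illustrative name */
    {
            /* Already normalized to [0, SCHED_CAPACITY_SCALE]. */
            per_cpu(cpu_scale, cpu) = arch_scale_cpu_capacity(cpu);
    }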
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When turbo mode is unavailable on a Skylake-X system, executing the
command:
# echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
results in an unchecked MSR access error:
WRMSR to 0x199 (attempted to write 0x0000000100001300).
This issue was reproduced on an OEM (Original Equipment Manufacturer)
system and is not a common problem across all Skylake-X systems.
This error occurs because the MSR 0x199 Turbo Engage Bit (bit 32) is set
when turbo mode is disabled. The issue arises when intel_pstate fails to
detect that turbo mode is disabled. Here intel_pstate relies on
MSR_IA32_MISC_ENABLE bit 38 to determine the status of turbo mode.
However, on this system, bit 38 is not set even when turbo mode is
disabled.
According to the Intel Software Developer's Manual (SDM), the BIOS sets
this bit during platform initialization to enable or disable
opportunistic processor performance operations. Logically, this bit
should be set in such cases. However, the SDM also specifies that "OS
and applications must use CPUID leaf 06H to detect processors with
opportunistic processor performance operations enabled."
Therefore, in addition to checking MSR_IA32_MISC_ENABLE bit 38, verify
that CPUID.06H:EAX[1] is 0 to accurately determine if turbo mode is
disabled.
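A sketch of the combined check (the function name and structure approximate the
driver's turbo-disabled test):

    static bool turbo_is_disabled(void)
    {
            u64 misc_en;

            /* CPUID.06H:EAX[1] clear: no opportunistic performance at all. */
            if (!boot_cpu_has(X86_FEATURE_IDA))
                    return true;

            rdmsrl(MSR_IA32_MISC_ENABLE, misc_en);
            return !!(misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE);
    }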
Fixes: 4521e1a0ce ("cpufreq: intel_pstate: Reflect current no_turbo state correctly")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since cpufreq_update_limits() obtains a cpufreq policy pointer for the
given CPU and reference counts the corresponding policy object, it may
as well pass the policy pointer to the cpufreq driver's ->update_limits()
callback which allows that callback to avoid invoking cpufreq_cpu_get()
for the same CPU.
Accordingly, redefine ->update_limits() to take a policy pointer instead
of a CPU number and update both drivers implementing it, intel_pstate
and amd-pstate, as needed.
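Conceptually, the callback prototype in struct cpufreq_driver changes as
follows:

    /* Before */
    void	(*update_limits)(unsigned int cpu);

    /* After: the core has already looked up and reference-counted the policy. */
    void	(*update_limits)(struct cpufreq_policy *policy);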
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://patch.msgid.link/8560367.NyiUUSuA9g@rjwysocki.net
Pull power management updates from Rafael Wysocki:
"These are dominated by cpufreq updates which in turn are dominated by
updates related to boost support in the core and drivers and
amd-pstate driver optimizations.
Apart from the above, there are some cpuidle updates including a
rework of the most recent idle intervals handling in the venerable
menu governor that leads to significant improvements in some
performance benchmarks, as the governor is now more likely to predict
a shorter idle duration in some cases, and there are updates of the
core device power management code, mostly related to system suspend
and resume, that should help to avoid potential issues arising when
the drivers of devices depending on one another want to use different
optimizations.
There is also a usual collection of assorted fixes and cleanups,
including removal of some unused code.
Specifics:
- Manage sysfs attributes and boost frequencies efficiently from
cpufreq core to reduce boilerplate code in drivers (Viresh Kumar)
- Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider,
Dhananjay Ugwekar, Imran Shaik, zuoqian)
- Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky
Bai)
- cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski)
- Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng)
- Optimize the amd-pstate driver to avoid cases where call paths end
up calling the same writes multiple times and needlessly caching
variables through code reorganization, locking overhaul and tracing
adjustments (Mario Limonciello, Dhananjay Ugwekar)
- Make it possible to avoid enabling capacity-aware scheduling (CAS)
in the intel_pstate driver and relocate a check for out-of-band
(OOB) platform handling in it to make it detect OOB before checking
HWP availability (Rafael Wysocki)
- Fix dbs_update() to avoid inadvertent conversions of negative
integer values to unsigned int which causes CPU frequency selection
to be inaccurate in some cases when the "conservative" cpufreq
governor is in use (Jie Zhan)
- Update the handling of the most recent idle intervals in the menu
cpuidle governor to prevent useful information from being discarded
by it in some cases and improve the prediction accuracy (Rafael
Wysocki)
- Make it possible to tell the intel_idle driver to ignore its
built-in table of idle states for the given processor, clean up the
handling of auto-demotion disabling on Baytrail and Cherrytrail
chips in it, and update its MAINTAINERS entry (David Arcari, Artem
Bityutskiy, Rafael Wysocki)
- Make some cpuidle drivers use for_each_present_cpu() instead of
for_each_possible_cpu() during initialization to avoid issues
occurring when nosmp or maxcpus=0 are used (Jacky Bai)
- Clean up the Energy Model handling code somewhat (Rafael Wysocki)
- Use kfree_rcu() to simplify the handling of runtime Energy Model
updates (Li RongQing)
- Add an entry for the Energy Model framework to MAINTAINERS as
properly maintained (Lukasz Luba)
- Address RCU-related sparse warnings in the Energy Model code
(Rafael Wysocki)
- Remove ENERGY_MODEL dependency on SMP and allow it to be selected
when DEVFREQ is set without CPUFREQ so it can be used on a wider
range of systems (Jeson Gao)
- Unify error handling during runtime suspend and runtime resume in
the core to help drivers to implement more consistent runtime PM
error handling (Rafael Wysocki)
- Drop a redundant check from pm_runtime_force_resume() and rearrange
documentation related to __pm_runtime_disable() (Rafael Wysocki)
- Rework the handling of the "smart suspend" driver flag in the PM
core to avoid issues that may occur when drivers using it depend on
some other drivers and clean up the related PM core code (Rafael
Wysocki, Colin Ian King)
- Fix the handling of devices with the power.direct_complete flag set
if device_suspend() returns an error for at least one device to
avoid situations in which some of them may not be resumed (Rafael
Wysocki)
- Use mutex_trylock() in hibernate_compressor_param_set() to avoid a
possible deadlock that may occur if the "compressor" hibernation
module parameter is accessed during the registration of a new
ieee80211 device (Lizhi Xu)
- Suppress sleeping parent warning in device_pm_add() in the case
when new children are added under a device with the
power.direct_complete set after it has been processed by
device_resume() (Xu Yang)
- Remove needless return in three void functions related to system
wakeup (Zijun Hu)
- Replace deprecated kmap_atomic() with kmap_local_page() in the
hibernation core code (David Reaver)
- Remove unused helper functions related to system sleep (David Alan
Gilbert)
- Clean up s2idle_enter() so it does not lock and unlock CPU offline
in vain and update comments in it (Ulf Hansson)
- Clean up broken white space in dpm_wait_for_children() (Geert
Uytterhoeven)
- Update the cpupower utility to fix lib version-ing in it and memory
leaks in error legs, remove hard-coded values, and implement CPU
physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah
Khan, Yiwei Lin, Zhongqiu Han)"
* tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (139 commits)
PM: sleep: Fix bit masking operation
dt-bindings: cpufreq: cpufreq-qcom-hw: Narrow properties on SDX75, SA8775p and SM8650
dt-bindings: cpufreq: cpufreq-qcom-hw: Drop redundant minItems:1
dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing constraint for interrupt-names
dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCS8300 compatible
cpufreq: Init cpufreq only for present CPUs
PM: sleep: Fix handling devices with direct_complete set on errors
cpuidle: Init cpuidle only for present CPUs
PM: clk: Remove unused pm_clk_remove()
PM: sleep: core: Fix indentation in dpm_wait_for_children()
PM: s2idle: Extend comment in s2idle_enter()
PM: s2idle: Drop redundant locks when entering s2idle
PM: sleep: Remove unused pm_generic_ wrappers
cpufreq: tegra186: Share policy per cluster
cpupower: Make lib versioning scheme more obvious and fix version link
PM: EM: Rework the depends on for CONFIG_ENERGY_MODEL
PM: EM: Address RCU-related sparse warnings
cpupower: Implement CPU physical core querying
pm: cpupower: remove hard-coded topology depth values
pm: cpupower: Fix cmd_monitor() error legs to free cpu_topology
...
Move the invocation of intel_pstate_platform_pwr_mgmt_exists() before
checking whether or not HWP is enabled, because it does not depend on
any code running before it except for the vendor check. Moreover, if CPU
performance scaling is going to be carried out by the platform, all of
the code that runs before that function (again, except for the vendor
check) is redundant.
This is not expected to alter any functionality except for the ordering
of messages printed by intel_pstate_init() when it is going to return an
error before attempting to register the driver.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/2776745.mvXUDI8C0e@rjwysocki.net
Capacity-aware scheduling (CAS) is enabled by default by intel_pstate on
hybrid systems without SMT, but in some usage scenarios it may be more
attractive to place tasks for maximum CPU performance regardless of the
extra cost in terms of energy, which is the case on such systems when
CAS is not enabled. Therefore, introduce a command line option to prevent
intel_pstate from enabling CAS.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/2781262.mvXUDI8C0e@rjwysocki.net
Since HYBRID_SCALING_FACTOR_MTL is not going to be suitable for Arrow
Lake in general, drop it from the "known hybrid scaling factors" list of
platforms, so the scaling factor for it will be determined with the
help of information provided by the platform firmware via CPPC.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2307515.iZASKD2KPV@rjwysocki.net
The perf-to-frequency scaling factors are used by intel_pstate on hybrid
platforms to cast performance levels to frequency on different types of
CPUs which is needed because the generic cpufreq sysfs interface works
in the frequency domain.
For some hybrid platforms already in the field, the scaling factors are
known, but for others (including some upcoming ones) they most likely
will be different and the only way to get them that scales is to use
information provided by the platform firmware. In this particular case,
the requisite information can be obtained via CPPC.
If the P-core hybrid scaling factor for the given processor model is not
known, use CPPC to compute hybrid scaling factors for all CPUs.
Since the current default hybrid scaling factor is only suitable for a
few early hybrid platforms, add intel_hybrid_scaling_factor[] entries
for them and initialize the scaling factor to zero ("unknown") by
default.
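Roughly, the CPPC-based computation boils down to the ratio of the nominal
frequency to the nominal performance level reported by the firmware for the
given CPU (sketch; unit handling and fallbacks are simplified):

    struct cppc_perf_caps caps;

    if (cppc_get_perf_caps(cpu, &caps) || !caps.nominal_perf || !caps.nominal_freq)
            return 0;	/* "unknown": fall back to the default scaling factor */

    /* nominal_freq is in MHz per the ACPI spec; the result is kHz per perf unit. */
    return div_u64((u64)caps.nominal_freq * 1000, caps.nominal_perf);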
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/8476313.T7Z3S40VBb@rjwysocki.net
Notice that hybrid_init_cpu_capacity_scaling() only needs to hold
hybrid_capacity_lock around __hybrid_init_cpu_capacity_scaling()
calls, so introduce a "locked" wrapper around the latter and call
it from the former. This makes it possible to drop a local variable
and a label that are no longer needed.
Also, rename __hybrid_init_cpu_capacity_scaling() to
__hybrid_refresh_cpu_capacity_scaling() for consistency.
Interestingly enough, this fixes a locking issue introduced by commit
929ebc93cc ("cpufreq: intel_pstate: Set asymmetric CPU capacity on
hybrid systems") that put an arch_enable_hybrid_capacity_scale() call
under hybrid_capacity_lock, which was a mistake because the latter is
acquired in CPU hotplug paths and so it cannot be held around
cpus_read_lock() calls.
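Schematically (sketch; the callers' bodies are abridged):

    static void hybrid_refresh_cpu_capacity_scaling(void)
    {
            /*
             * Hold the lock only around the refresh itself, so that
             * arch_enable_hybrid_capacity_scale(), which ends up in
             * cpus_read_lock(), can stay outside of it in the caller.
             */
            mutex_lock(&hybrid_capacity_lock);
            __hybrid_refresh_cpu_capacity_scaling();
            mutex_unlock(&hybrid_capacity_lock);
    }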
Link: https://lore.kernel.org/linux-pm/SJ1PR11MB6129EDBF22F8A90FC3A3EDC8B9582@SJ1PR11MB6129.namprd11.prod.outlook.com/
Fixes: 929ebc93cc ("cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-by: "Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com>
Link: https://patch.msgid.link/12554508.O9o76ZdvQC@rjwysocki.net
[ rjw: Changelog update ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>