Fix inconsistent error handling for sscanf() return value check.
Implicit boolean conversion is used instead of explicit return
value checks. The code checks if (!sscanf(...)) which is incorrect
because:
1. sscanf returns the number of successfully parsed items
2. On success, it returns 1 (one item passed)
3. On failure, it returns 0 or EOF
4. The check 'if (!sscanf(...))' is wrong because it treats
success (1) as failure
All occurrences of sscanf() now uses explicit return value check.
With this behavior it returns '-EINVAL' when parsing fails (returns
0 or EOF), and continues when parsing succeeds (returns 1).
Signed-off-by: Sumeet Pawnikar <sumeet4linux@gmail.com>
[ rjw: Subject and changelog edits ]
Link: https://patch.msgid.link/20251207151549.202452-1-sumeet4linux@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The device becomes visible to userspace via device_register()
even before it fully initialized by idr_init(). If userspace
or another thread tries to register a zone immediately after
device_register(), the control_type_valid() will fail because
the control_type is not yet in the list. The IDR is not yet
initialized, so this race condition causes zone registration
failure.
Move idr_init() and list addition before device_register()
fix the race condition.
Signed-off-by: Sumeet Pawnikar <sumeet4linux@gmail.com>
[ rjw: Subject adjustment, empty line added ]
Link: https://patch.msgid.link/20251205190216.5032-1-sumeet4linux@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull devicetree updates from Rob Herring:
"DT bindings:
- Convert lattice,ice40-fpga-mgr, apm,xgene-storm-dma,
brcm,sr-thermal, amazon,al-thermal, brcm,ocotp, mt8173-mdp, Actions
Owl SPS, Marvell AP80x System Controller, Marvell CP110 System
Controller, cznic,moxtet, and apm,xgene-slimpro-mbox to DT schema
format
- Add i.MX95 fsl,irqsteer, MT8365 Mali Bifrost GPU, Anvo ANV32C81W
EEPROM, and Microchip pic64gx PLIC
- Add missing LGE, AMD Seattle, and APM X-Gene SoC platform
compatibles
- Updates to brcm,bcm2836-l1-intc, brcm,bcm2835-hvs, and bcm2711-hdmi
bindings to fix warnings on BCM2712 platforms
- Drop obsolete db8500-thermal.txt
- Treewide clean-up of extra blank lines and inconsistent quoting
- Ensure all .dtbo targets are applied to a base .dtb
- Speed up dt_binding_check by skipping running validation on empty
examples
DT core:
- Add of_machine_device_match() and of_machine_get_match_data()
helpers and convert users treewide
- Fix bounds checking of address properties in FDT code. Rework the
code to have a single implementation of the bounds checks.
- Rework of_irq_init() to ignore any implicit interrupt-parent (i.e.
in a parent node) on nodes without an interrupt. This matches the
spec description and fixes some RISC-V platforms.
- Avoid a spurious message on overlay removal
- Skip DT kunit tests on RISCV+ACPI"
* tag 'devicetree-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (55 commits)
dt-bindings: kbuild: Skip validating empty examples
dt-bindings: interrupt-controller: brcm,bcm2836-l1-intc: Drop interrupt-controller requirement
dt-bindings: display: Fix brcm,bcm2835-hvs bindings for BCM2712
dt-bindings: display: bcm2711-hdmi: Add interrupt details for BCM2712
of: Skip devicetree kunit tests when RISCV+ACPI doesn't populate root node
soc: tegra: Simplify with of_machine_device_match()
soc: qcom: ubwc: Simplify with of_machine_get_match_data()
powercap: dtpm: Simplify with of_machine_get_match_data()
platform: surface: Simplify with of_machine_get_match_data()
irqchip/atmel-aic: Simplify with of_machine_get_match_data()
firmware: qcom: scm: Simplify with of_machine_device_match()
cpuidle: big_little: Simplify with of_machine_device_match()
cpufreq: sun50i: Simplify with of_machine_device_match()
cpufreq: mediatek: Simplify with of_machine_get_match_data()
cpufreq: dt-platdev: Simplify with of_machine_get_match_data()
of: Add wrappers to match root node with OF device ID tables
dt-bindings: eeprom: at25: Add Anvo ANV32C81W
of/reserved_mem: Simplify the logic of __reserved_mem_alloc_size()
of/reserved_mem: Simplify the logic of fdt_scan_reserved_mem_reg_nodes()
of/reserved_mem: Simplify the logic of __reserved_mem_reserve_reg()
...
Currently, RAPL PMU support requires adding CPU model entries to
arch/x86/events/rapl.c for each new generation. However, RAPL MSRs are
not architectural and require platform-specific customization, making
arch/x86 an inappropriate location for this functionality.
The powercap subsystem already handles RAPL functionality and is the
natural place to consolidate all RAPL features. The powercap RAPL
driver already includes PMU support for TPMI-based RAPL interfaces,
making it straightforward to extend this support to MSR-based RAPL
interfaces as well.
This consolidation eliminates the need to maintain RAPL support in
multiple subsystems and provides a unified approach for both TPMI and
MSR-based RAPL implementations.
The MSR-based PMU support includes the following updates:
1. Register MSR-based PMU support for the supported platforms
and unregister it when no online CPUs remain in the package.
2. Remove existing checks that restrict RAPL PMU support to TPMI-based
interfaces and extend the logic to allow MSR-based RAPL interfaces.
3. Define a CPU model list to determine which processors should
register RAPL PMU interface through the powercap driver for
MSR-based RAPL, excluding those that support TPMI interface.
This list prevents conflicts with existing arch/x86 PMU code
that already registers RAPL PMU for some processors. Add
Panther Lake & Wildcat Lake to the CPU models list.
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Changelog edits ]
Link: https://patch.msgid.link/20251121000539.386069-3-sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The current read_raw() implementation of the TPMI, MMIO and MSR
interfaces does not distinguish between atomic and non-atomic callers.
rapl_msr_read_raw() uses rdmsrq_safe_on_cpu(), which can sleep and
issue cross CPU calls. When MSR-based RAPL PMU support is enabled, PMU
event handlers can invoke this function from atomic context where
sleeping or rescheduling is not allowed. In atomic context, the caller
is already executing on the target CPU, so a direct rdmsrq() is
sufficient.
To support such usage, introduce an atomic flag to the read_raw()
interface to allow callers pass the context information. Modify the
common RAPL code to propagate this flag, and set the flag to reflect
the calling contexts.
Utilize the atomic flag in rapl_msr_read_raw() to perform direct MSR
read with rdmsrq() when running in atomic context, and a sanity check
to ensure target CPU matches the current CPU for such use cases.
The TPMI and MMIO implementations do not require special atomic
handling, so the flag is ignored in those paths.
This is a preparatory patch for adding MSR-based RAPL PMU support.
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/20251121000539.386069-2-sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The get_pd_power_uw() function can crash with a NULL pointer dereference
when em_cpu_get() returns NULL. This occurs when a CPU becomes impossible
during runtime, causing get_cpu_device() to return NULL, which propagates
through em_cpu_get() and leads to a crash when em_span_cpus() dereferences
the NULL pointer.
Add a NULL check after em_cpu_get() and return 0 if unavailable,
matching the existing fallback behavior in __dtpm_cpu_setup().
Fixes: eb82bace89 ("powercap/drivers/dtpm: Scale the power with the load")
Signed-off-by: Sivan Zohar-Kotzer <sivany32@gmail.com>
Link: https://patch.msgid.link/20250701221355.96916-1-sivany32@gmail.com
[ rjw: Drop an excess empty code line ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The TPMI platform information provides a mapping of OOBMSM PCI devices to
logical CPUs. Since this mapping is consistent across all OOBMSM features
(e.g., TPMI, PMT, SDSi), it can be leveraged by multiple drivers. To
facilitate reuse, relocate the struct intel_tpmi_plat_info to intel_vsec.h,
renaming it to struct oobmsm_plat_info, making it accessible to other
features. While modifying headers, place them in alphabetical order.
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20250703022832.1302928-11-david.e.box@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Add Bartlett Lake to the list of supported processors in the RAPL
common driver.
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Qiao Wei <wei.qiao@intel.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
PL1 cannot be disabled on some platforms. The ENABLE bit is still set
after software clears it. This behavior leads to a scenario where, upon
user request to disable the Power Limit through the powercap sysfs, the
ENABLE bit remains set while the CLAMPING bit is inadvertently cleared.
According to the Intel Software Developer's Manual, the CLAMPING bit,
"When set, allows the processor to go below the OS requested P states in
order to maintain the power below specified Platform Power Limit value."
Thus this means the system may operate at higher power levels than
intended on such platforms.
Enhance the code to check ENABLE bit after writing to it, and stop
further processing if ENABLE bit cannot be changed.
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Fixes: 2d281d8196 ("PowerCap: Introduce Intel RAPL power capping driver")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Link: https://patch.msgid.link/20250619071340.384782-1-rui.zhang@intel.com
[ rjw: Use str_enabled_disabled() instead of open-coded equivalent ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull power management updates from Rafael Wysocki:
"These are dominated by cpufreq updates which in turn are dominated by
updates related to boost support in the core and drivers and
amd-pstate driver optimizations.
Apart from the above, there are some cpuidle updates including a
rework of the most recent idle intervals handling in the venerable
menu governor that leads to significant improvements in some
performance benchmarks, as the governor is now more likely to predict
a shorter idle duration in some cases, and there are updates of the
core device power management code, mostly related to system suspend
and resume, that should help to avoid potential issues arising when
the drivers of devices depending on one another want to use different
optimizations.
There is also a usual collection of assorted fixes and cleanups,
including removal of some unused code.
Specifics:
- Manage sysfs attributes and boost frequencies efficiently from
cpufreq core to reduce boilerplate code in drivers (Viresh Kumar)
- Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider,
Dhananjay Ugwekar, Imran Shaik, zuoqian)
- Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky
Bai)
- cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski)
- Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng)
- Optimize the amd-pstate driver to avoid cases where call paths end
up calling the same writes multiple times and needlessly caching
variables through code reorganization, locking overhaul and tracing
adjustments (Mario Limonciello, Dhananjay Ugwekar)
- Make it possible to avoid enabling capacity-aware scheduling (CAS)
in the intel_pstate driver and relocate a check for out-of-band
(OOB) platform handling in it to make it detect OOB before checking
HWP availability (Rafael Wysocki)
- Fix dbs_update() to avoid inadvertent conversions of negative
integer values to unsigned int which causes CPU frequency selection
to be inaccurate in some cases when the "conservative" cpufreq
governor is in use (Jie Zhan)
- Update the handling of the most recent idle intervals in the menu
cpuidle governor to prevent useful information from being discarded
by it in some cases and improve the prediction accuracy (Rafael
Wysocki)
- Make it possible to tell the intel_idle driver to ignore its
built-in table of idle states for the given processor, clean up the
handling of auto-demotion disabling on Baytrail and Cherrytrail
chips in it, and update its MAINTAINERS entry (David Arcari, Artem
Bityutskiy, Rafael Wysocki)
- Make some cpuidle drivers use for_each_present_cpu() instead of
for_each_possible_cpu() during initialization to avoid issues
occurring when nosmp or maxcpus=0 are used (Jacky Bai)
- Clean up the Energy Model handling code somewhat (Rafael Wysocki)
- Use kfree_rcu() to simplify the handling of runtime Energy Model
updates (Li RongQing)
- Add an entry for the Energy Model framework to MAINTAINERS as
properly maintained (Lukasz Luba)
- Address RCU-related sparse warnings in the Energy Model code
(Rafael Wysocki)
- Remove ENERGY_MODEL dependency on SMP and allow it to be selected
when DEVFREQ is set without CPUFREQ so it can be used on a wider
range of systems (Jeson Gao)
- Unify error handling during runtime suspend and runtime resume in
the core to help drivers to implement more consistent runtime PM
error handling (Rafael Wysocki)
- Drop a redundant check from pm_runtime_force_resume() and rearrange
documentation related to __pm_runtime_disable() (Rafael Wysocki)
- Rework the handling of the "smart suspend" driver flag in the PM
core to avoid issues hat may occur when drivers using it depend on
some other drivers and clean up the related PM core code (Rafael
Wysocki, Colin Ian King)
- Fix the handling of devices with the power.direct_complete flag set
if device_suspend() returns an error for at least one device to
avoid situations in which some of them may not be resumed (Rafael
Wysocki)
- Use mutex_trylock() in hibernate_compressor_param_set() to avoid a
possible deadlock that may occur if the "compressor" hibernation
module parameter is accessed during the registration of a new
ieee80211 device (Lizhi Xu)
- Suppress sleeping parent warning in device_pm_add() in the case
when new children are added under a device with the
power.direct_complete set after it has been processed by
device_resume() (Xu Yang)
- Remove needless return in three void functions related to system
wakeup (Zijun Hu)
- Replace deprecated kmap_atomic() with kmap_local_page() in the
hibernation core code (David Reaver)
- Remove unused helper functions related to system sleep (David Alan
Gilbert)
- Clean up s2idle_enter() so it does not lock and unlock CPU offline
in vain and update comments in it (Ulf Hansson)
- Clean up broken white space in dpm_wait_for_children() (Geert
Uytterhoeven)
- Update the cpupower utility to fix lib version-ing in it and memory
leaks in error legs, remove hard-coded values, and implement CPU
physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah
Khan, Yiwei Lin, Zhongqiu Han)"
* tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (139 commits)
PM: sleep: Fix bit masking operation
dt-bindings: cpufreq: cpufreq-qcom-hw: Narrow properties on SDX75, SA8775p and SM8650
dt-bindings: cpufreq: cpufreq-qcom-hw: Drop redundant minItems:1
dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing constraint for interrupt-names
dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCS8300 compatible
cpufreq: Init cpufreq only for present CPUs
PM: sleep: Fix handling devices with direct_complete set on errors
cpuidle: Init cpuidle only for present CPUs
PM: clk: Remove unused pm_clk_remove()
PM: sleep: core: Fix indentation in dpm_wait_for_children()
PM: s2idle: Extend comment in s2idle_enter()
PM: s2idle: Drop redundant locks when entering s2idle
PM: sleep: Remove unused pm_generic_ wrappers
cpufreq: tegra186: Share policy per cluster
cpupower: Make lib versioning scheme more obvious and fix version link
PM: EM: Rework the depends on for CONFIG_ENERGY_MODEL
PM: EM: Address RCU-related sparse warnings
cpupower: Implement CPU physical core querying
pm: cpupower: remove hard-coded topology depth values
pm: cpupower: Fix cmd_monitor() error legs to free cpu_topology
...
Pull timer cleanups from Thomas Gleixner:
"A treewide hrtimer timer cleanup
hrtimers are initialized with hrtimer_init() and a subsequent store to
the callback pointer. This turned out to be suboptimal for the
upcoming Rust integration and is obviously a silly implementation to
begin with.
This cleanup replaces the hrtimer_init(T); T->function = cb; sequence
with hrtimer_setup(T, cb);
The conversion was done with Coccinelle and a few manual fixups.
Once the conversion has completely landed in mainline, hrtimer_init()
will be removed and the hrtimer::function becomes a private member"
* tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits)
wifi: rt2x00: Switch to use hrtimer_update_function()
io_uring: Use helper function hrtimer_update_function()
serial: xilinx_uartps: Use helper function hrtimer_update_function()
ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup()
RDMA: Switch to use hrtimer_setup()
virtio: mem: Switch to use hrtimer_setup()
drm/vmwgfx: Switch to use hrtimer_setup()
drm/xe/oa: Switch to use hrtimer_setup()
drm/vkms: Switch to use hrtimer_setup()
drm/msm: Switch to use hrtimer_setup()
drm/i915/request: Switch to use hrtimer_setup()
drm/i915/uncore: Switch to use hrtimer_setup()
drm/i915/pmu: Switch to use hrtimer_setup()
drm/i915/perf: Switch to use hrtimer_setup()
drm/i915/gvt: Switch to use hrtimer_setup()
drm/i915/huc: Switch to use hrtimer_setup()
drm/amdgpu: Switch to use hrtimer_setup()
stm class: heartbeat: Switch to use hrtimer_setup()
i2c: Switch to use hrtimer_setup()
iio: Switch to use hrtimer_setup()
...
Now not only CPUs can use energy efficiency models, but GPUs
can also use. On the other hand, even with only one CPU, we can also
use energy_model to align control in thermal.
So remove the dependence of SMP, and add the DEVFREQ.
Signed-off-by: Jeson Gao <jeson.gao@unisoc.com>
[Added missing SMP config option in DTPM_CPU dependency]
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20250307132649.4056210-1-lukasz.luba@arm.com
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We are going to apply a new series that conflicts with pending
work in x86/mm, so merge in x86/mm to avoid it, and also to
refresh the x86/cpu branch with fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fix a possible memory leak in the power capping subsystem (Joe Hattori).
* pm-powercap:
powercap: call put_device() on an error path in powercap_register_control_type()
powercap_register_control_type() calls device_register(), but does not
release the refcount of the device when it fails.
Call put_device() before returning an error to balance the refcount.
Since the kfree(control_type) will be done by powercap_release(), remove
the lines in powercap_register_control_type() before returning the error.
This bug was found by an experimental verifier that I am developing.
Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp>
Link: https://patch.msgid.link/20250110010554.1583411-1-joe@pf.is.s.u-tokyo.ac.jp
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The continual trickle of small conversion patches is grating on me, and
is really not helping. Just get rid of the 'remove_new' member
function, which is just an alias for the plain 'remove', and had a
comment to that effect:
/*
* .remove_new() is a relic from a prototype conversion of .remove().
* New drivers are supposed to implement .remove(). Once all drivers are
* converted to not use .remove_new any more, it will be dropped.
*/
This was just a tree-wide 'sed' script that replaced '.remove_new' with
'.remove', with some care taken to turn a subsequent tab into two tabs
to make things line up.
I did do some minimal manual whitespace adjustment for places that used
spaces to line things up.
Then I just removed the old (sic) .remove_new member function, and this
is the end result. No more unnecessary conversion noise.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The caller of the function dev_pm_qos_add_request() checks again a non
zero value but dev_pm_qos_add_request() can return '1' if the request
already exists. Therefore, the setup function fails while the QoS
request actually did not failed.
Fix that by changing the check against a negative value like all the
other callers of the function.
Fixes: e446556173 ("powercap/drivers/dtpm: Add dtpm devfreq with energy model support")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20241018021205.46460-1-yuancan@huawei.com
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The hardware definition of every TPMI feature contains a major and minor
version. When there is a change in the MMIO offset or change in the
definition of a field, hardware will change major version. For addition
of new fields without modifying existing MMIO offsets or fields, only
the minor version is changed.
If the driver has not been updated to recognize a new hardware major
version, it cannot provide the RAPL interface to users due to possible
register layout incompatibilities. However, the driver does not need to
be updated every time the hardware minor version changes because in that
case it will just miss some new functionality exposed by the hardware.
The current implementation causes the driver to refuse to work for any
hardware version change which is unnecessarily restrictive.
If there is a minor version mismatch, log an information message and
continue, but if there is a major version mismatch, log a warning and
exit (as before).
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20240930081801.28502-4-rui.zhang@intel.com
Fixes: 9eef7f9da9 ("powercap: intel_rapl: Introduce RAPL TPMI interface driver")
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"),
on AMD processors that support extended CPUID leaf 0x80000026, the
topology_logical_die_id() macros, no longer returns package id, instead it
returns the CCD (Core Complex Die) id. This leads to the energy-pkg
event scope to be modified to CCD instead of package.
For more historical context, please refer to commit 32fb480e0a
("powercap/intel_rapl: Support multi-die/package"), which initially changed
the RAPL scope from package to die for all systems, as Intel systems
with Die enumeration have RAPL scope as die, and those without die
enumeration are not affected. So, all systems(Intel, AMD, Hygon), worked
correctly with topology_logical_die_id() until recently, but this changed
after the "0x80000026 leaf" commit mentioned above.
Future multi-die Intel systems will have package scope RAPL counters,
but they will be using TPMI RAPL interface, which is not affected by
this change.
Replacing topology_logical_die_id() with topology_physical_package_id()
conditionally only for AMD and Hygon fixes the energy-pkg event.
On an AMD 2 socket 8 CCD Zen4 server:
Before:
linux$ ls /sys/class/powercap/
intel-rapl intel-rapl:4 intel-rapl:8:0 intel-rapl:d
intel-rapl:0 intel-rapl:4:0 intel-rapl:9 intel-rapl:d:0
intel-rapl:0:0 intel-rapl:5 intel-rapl:9:0 intel-rapl:e
intel-rapl:1 intel-rapl:5:0 intel-rapl:a intel-rapl:e:0
intel-rapl:1:0 intel-rapl:6 intel-rapl🅰️0 intel-rapl:f
intel-rapl:2 intel-rapl:6:0 intel-rapl:b intel-rapl:f:0
intel-rapl:2:0 intel-rapl:7 intel-rapl🅱️0
intel-rapl:3 intel-rapl:7:0 intel-rapl:c
intel-rapl:3:0 intel-rapl:8 intel-rapl:c:0
After:
linux$ ls /sys/class/powercap/
intel-rapl intel-rapl:0 intel-rapl:0:0 intel-rapl:1 intel-rapl:1:0
Only one sysfs entry per-event per-package is created after this change.
Fixes: 63edbaa48a ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf")
Reported-by: Michael Larabel <michael@michaellarabel.com>
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Link: https://patch.msgid.link/20240730044917.4680-3-Dhananjay.Ugwekar@amd.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The if condition !A || A && B can be simplified to !A || B.
Fixes the following Coccinelle/coccicheck warning reported by
excluded_middle.cocci:
WARNING !A || A && B is equivalent to !A || B
Compile-tested only.
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Introduce two new APIs rapl_package_add_pmu()/rapl_package_remove_pmu().
RAPL driver can invoke these APIs to expose its supported energy
counters via perf PMU. The new RAPL PMU is fully compatible with current
MSR RAPL PMU, including using the same PMU name and events
name/id/unit/scale, etc.
For example, use below command
perf stat -e power/energy-pkg/ -e power/energy-ram/ FOO
to get the energy consumption if power/energy-pkg/ and power/energy-ram/
events are available in the "perf list" output.
This does not introduce any conflict because TPMI RAPL is the only user
of these APIs currently, and it never co-exists with MSR RAPL.
Note that RAPL Packages can be probed/removed dynamically, and the
events supported by each TPMI RAPL device can be different. Thus the
RAPL PMU support is done on demand, which means
1. PMU is registered only if it is needed by a RAPL Package. PMU events
for unsupported counters are not exposed.
2. PMU is unregistered and registered when a new RAPL Package is probed
and supports new counters that are not supported by current PMU.
For example, on a dual-package system using TPMI RAPL, it is possible
that Package 1 behaves as TPMI domain root and supports Psys domain.
In this case, register PMU without Psys event when probing Package 0,
and re-register the PMU with Psys event when probing Package 1.
3. PMU is unregistered when all registered RAPL Packages don't need PMU.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.
Use cpumask_weight_and() to avoid the need for a temporary cpumask on
the stack.
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>