Devfreq cooling needs to now the correct status of the device in order
to operate. Devfreq framework can change the device status in the
background. To mitigate issues make a copy of the status structure and use
it for internal calculations.
In addition this patch adds normalization function, which also makes sure
that whatever data comes from the device, the load will be in range from 1
to 1024.
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201210143014.24685-3-lukasz.luba@arm.com
Added processor thermal device mail box interface for workload hints
setting. These hints will give indication to hardware to better manage
power and thermals. The supported hints are:
idle
semi_active
burusty
sustained
battery_life
For example when the system is on battery, the hardware can be less
aggressive in power ramp up.
This will create an attribute group at
/sys/bus/pci/devices/0000:00:04.0/workload_request
This folder contains two attributes:
workload_available_types : (RO): This shows available workload types
workload_type: (RW) : Allows to set and get current workload type
setting
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201126171829.945969-4-srinivas.pandruvada@linux.intel.com
Add support for RFIM (Radio Frequency Interference Mitigation) support
via processor thermal PCI device. This drivers allows adjustment of
FIVR (Fully Integrated Voltage Regulator) and DDR (Double Data Rate)
frequencies to avoid RF interference with WiFi and 5G.
Switching voltage regulators (VR) generate radiated EMI or RFI at the
fundamental frequency and its harmonics. Some harmonics may interfere
with very sensitive wireless receivers such as Wi-Fi and cellular that
are integrated into host systems like notebook PCs. One of mitigation
methods is requesting SOC integrated VR (IVR) switching frequency to a
small % and shift away the switching noise harmonic interference from
radio channels. OEM or ODMs can use the driver to control SOC IVR
operation within the range where it does not impact IVR performance.
DRAM devices of DDR IO interface and their power plane can generate EMI
at the data rates. Similar to IVR control mechanism, Intel offers a
mechanism by which DDR data rates can be changed if several conditions
are met: there is strong RFI interference because of DDR; CPU power
management has no other restriction in changing DDR data rates;
PC ODMs enable this feature (real time DDR RFI Mitigation referred to as
DDR-RFIM) for Wi-Fi from BIOS.
This change exports two folders under /sys/bus/pci/devices/0000:00:04.0.
One folder "fivr" contains all attributes exposed for controling FIVR
features. The other folder "dvfs" contains all attributes for DDR
features.
Changes done to implement:
- New module for rfim interfaces
- Two new per processor features for DDR and FIVR
- Enable feature for Tiger Lake (FIVR only) and Alder Lake
The attributes exposed and explanation:
FIVR attributes
vco_ref_code_lo (RW): The VCO reference code is an 11-bit field and
controls the FIVR switching frequency. This is the 3-bit LSB field.
vco_ref_code_hi (RW): The VCO reference code is an 11-bit field and
controls the FIVR switching frequency. This is the 8-bit MSB field.
spread_spectrum_pct (RW): Set the FIVR spread spectrum clocking
percentage
spread_spectrum_clk_enable (RW): Enable/disable of the FIVR spread
spectrum clocking feature
rfi_vco_ref_code (RW): This field is a read only status register which
reflects the current FIVR switching frequency
fivr_fffc_rev (RW): This field indicated the revision of the FIVR HW.
DVFS attributes
rfi_restriction_run_busy (RW): Request the restriction of specific DDR
data rate and set this value 1. Self reset to 0 after operation.
rfi_restriction_err_code (RW): Values: 0 :Request is accepted, 1:Feature
disabled, 2: the request restricts more points than it is allowed
rfi_restriction_data_rate_Delta (RW): Restricted DDR data rate for RFI
protection: Lower Limit
rfi_restriction_data_rate_Base (RW): Restricted DDR data rate for RFI
protection: Upper Limit
ddr_data_rate_point_0 (RO): DDR data rate selection 1st point
ddr_data_rate_point_1 (RO): DDR data rate selection 2nd point
ddr_data_rate_point_2 (RO): DDR data rate selection 3rd point
ddr_data_rate_point_3 (RO): DDR data rate selection 4th point
rfi_disable (RW): Disable DDR rate change feature
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201126171829.945969-3-srinivas.pandruvada@linux.intel.com
The Processor Thermal PCI device supports multiple features. Currently
we export only RAPL. But we need more features from this device exposed
for Tiger Lake and Alder Lake based platforms. So re-structure the
current MMIO interface, so that more features can be added cleanly.
No functional changes are expected with this change.
Changes done in this patch:
- Using PCI_DEVICE_DATA(), hence names of defines changed
- Move RAPL MMIO code to its own module
- Move the RAPL MMIO offsets to RAPL MMIO module
- Adjust Kconfig dependency of PROC_THERMAL_MMIO_RAPL
- Per processor driver data now contains the supported features
- Moved all the common data structures and defines to a common header
file
- This new header file contains all the processor_thermal_* interfaces
- Based on the features supported the module interface is called
- Each module atleast provides one add and one remove function
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201126171829.945969-1-srinivas.pandruvada@linux.intel.com
Currently the code checks the interval value when the temperature is
read which is bad for two reasons:
- checking and setting the interval in the get_temp callback is
inaccurate and awful, that can be done when changing the value.
- Changing the thermal zone structure internals is an abuse of the
exported structure, moreover no lock is taken here.
The goal of this patch is to solve the first item by using the 'set'
function called when changing the interval. The check is done there
and removed from the get_temp function. If the thermal zone was not
initialized yet, the interval is not updated in this case as that will
happen in the init function when registering the thermal zone device.
I don't have any hardware to test the changes.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter Kaestle <peter@piie.net>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20201203071738.2363701-2-daniel.lezcano@linaro.org
The sustainable power value might come from the Device Tree or can be
estimated in run time. The sustainable power might be updated by the user
via sysfs interface, which should trigger new estimation of PID
coefficients. There is no need to estimate it every time when the
governor is called and temperature is high. Instead, store the estimated
value and make it available via standard sysfs interface, so it can be
checked from the user-space.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201124161025.27694-3-lukasz.luba@arm.com
Intelligent Power Allocation (IPA) is built around the PID controller
concept. The initialization code tries to setup the environment based on
the information available in DT or estimate the value based on minimum
power reported by each of the cooling device. The estimation will have an
impact on the PID controller behaviour via the related 'k_po', 'k_pu',
'k_i' coefficients and also on the power budget calculation.
This change prevents the situation when 'k_i' is relatively big compared
to 'k_po' and 'k_pu' values. This might happen when the estimation for
'sustainable_power' returned small value, thus 'k_po' and 'k_pu' are
small.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201124161025.27694-2-lukasz.luba@arm.com
When system tries to enter S0ix suspend state, just after active load
scenarios, it fails due to PCH current temperature is higher than set
threshold.
This patch introduces delay loop mechanism that allows PCH temperature
to go down below threshold during suspend so it won't fail to enter S0ix.
Add delay loop timeout and count as module parameters for user to tune it,
if required based on system design. This change notifies the different
warning messages like when PCH temperature above the threshold and
executing delay loop. Also, notify the messages when it success or
failure for S0ix entry.
Previously out of 1000 runs around 3 to 5 times it might fail to enter
S0ix just after heavy workload. With this change, S0ix failures reduced
as PCH cools down below threshold.
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201106170633.20838-1-sumeet.r.pawnikar@intel.com
On RT or even on mainline with 'threadirqs' on the command line all
interrupts which are not explicitly requested with IRQF_NO_THREAD
run their handlers in thread context. The same applies to soft interrupts.
That means they are subject to the normal scheduler rules and no other
code is going to acquire that lock from hard interrupt context either,
so the irqsave() here is pointless in all cases.
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/1603760790-37748-1-git-send-email-tiantao6@hisilicon.com
Since the power actor section has one function power_actor_set_power()
move it into Intelligent Power Allocation (IPA). There is no other user
of that helper function. It would also allow to remove the check of
cdev_is_power_actor() because the code which calls it in IPA already does
the needed check. Make the function static since only IPA use it.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201015112441.4056-5-lukasz.luba@arm.com
The thermal cooling device specified in DT might be instantiated for
a thermal zone trip point with a limited set of OPPs to operate on. This
configuration should be supported by Intelligent Power Allocation (IPA),
since it is a standard for other governors. Change the code and allow IPA
to get power value of lower and upper bound set for a given cooling
device.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201015112441.4056-3-lukasz.luba@arm.com
tid_addr is not a "pointer to (pointer to int in userspace)"; it is in
fact a "pointer to (pointer to int in userspace) in userspace". So
sparse rightfully complains about passing a kernel pointer to
put_user().
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 453431a549 ("mm, treewide: rename kzfree() to
kfree_sensitive()") renamed kzfree() to kfree_sensitive(),
but it left a compatibility definition of kzfree() to avoid
being too disruptive.
Since then a few more instances of kzfree() have slipped in.
Just get rid of them and remove the compatibility definition
once and for all.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull timer fixes from Thomas Gleixner:
"A time namespace fix and a matching selftest. The futex absolute
timeouts which are based on CLOCK_MONOTONIC require time namespace
corrected. This was missed in the original time namesapce support"
* tag 'timers-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/timens: Add a test for futex()
futex: Adjust absolute futex timeouts with per time namespace offset
Pull scheduler fixes from Thomas Gleixner:
"Two scheduler fixes:
- A trivial build fix for sched_feat() to compile correctly with
CONFIG_JUMP_LABEL=n
- Replace a zero lenght array with a flexible array"
* tag 'sched-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/features: Fix !CONFIG_JUMP_LABEL case
sched: Replace zero-length array with flexible-array
Pull perf fix from Thomas Gleixner:
"A single fix to compute the field offset of the SNOOPX bit in the data
source bitmask of perf events correctly"
* tag 'perf-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: correct SNOOPX field offset
Pull locking fix from Thomas Gleixner:
"Just a trivial fix for kernel-doc warnings"
* tag 'locking-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/seqlocks: Fix kernel-doc warnings
Pull NTB fixes from Jon Mason.
* tag 'ntb-5.10' of git://github.com/jonmason/ntb:
NTB: Use struct_size() helper in devm_kzalloc()
ntb: intel: Fix memleak in intel_ntb_pci_probe
NTB: hw: amd: fix an issue about leak system resources
Pull i2c fix from Wolfram Sang:
"Regression fix for rc1 and stable kernels as well"
* 'i2c/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: core: Restore acpi_walk_dep_device_list() getting called after registering the ACPI i2c devs
Pull more cifs updates from Steve French:
"Add support for stat of various special file types (WSL reparse points
for char, block, fifo)"
* tag '5.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module version number
smb3: add some missing definitions from MS-FSCC
smb3: remove two unused variables
smb3: add support for stat of WSL reparse points for special file types