During the probe, driver enables clocks necessary to access registers
(in get_temp()) and then registers thermal zone with managed-resources
(devm) interface. Removal of device is not done in reversed order,
because:
1. Clock will be disabled in driver remove() callback - thermal zone is
still registered and accessible to users,
2. devm interface will unregister thermal zone.
This leaves short window between (1) and (2) for accessing the
get_temp() callback with disabled clock.
Fix this by enabling clock also via devm-interface, so entire cleanup
path will be in proper, reversed order.
Fixes: 8454c8c09c ("thermal/drivers/bcm2835: Remove buggy call to thermal_of_zone_unregister")
Cc: stable@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-1-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
It is possible to have thermal zones which don't have any trip points.
These zones in effect simply represent a temperature sensor without any
action associated with it. While the schema has always required a
'trips' node, users have existed for a long time without it. Update the
schema to match reality.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240709150154.3272825-1-robh@kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
On some pre-production hardware, the SoCs do not contain calibration
data for the thermal sensors. The downstream drivers provide default
values that sort of work, instead of having the thermal sensors not
work at all.
Port the default values to the upstream driver. These values are from
the ChromeOS kernels, which sadly do not cover the MT7988.
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20240620092306.2352606-1-wenst@chromium.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Fix thermal zone names for consistency with the other SoCs:
- GPU0 must be used as the first GPU item.
- SOCx deal with audio DSP, video, and infra subsystems.
The naming must be fixed "atomically" so compilation does not break.
As a result, the change is made in the dt-bindings and in the LVTS
driver within a single commit, despite the checkpatch warning.
The definitions can be safely modified here because they are used only
in the LVTS driver, which is modified accordingly, and have not yet
been included in a released kernel.
Fixes: 78c88534e5 ("dt-bindings: thermal: mediatek: Add LVTS thermal controller definition for MT8188")
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Julien Panis <jpanis@baylibre.com>
Link: https://lore.kernel.org/r/20240603-mtk-thermal-mt818x-dtsi-v7-2-8c8e3c7a3643@baylibre.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Fix a thermal zone name for consistency with the other SoCs:
MFG contains GPU, the latter is more specific and must be used here.
The naming must be fixed "atomically" so compilation does not break.
As a result, the change is made in the dt-bindings and in the LVTS
driver within a single commit, despite the checkpatch warning.
The definition can be safely modified here because it is used only
in the LVTS driver, which is modified accordingly, and has not yet
been included in a released kernel.
Fixes: a2ca202350 ("dt-bindings: thermal: mediatek: Add LVTS thermal controller definition for MT8186")
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Julien Panis <jpanis@baylibre.com>
Link: https://lore.kernel.org/r/20240603-mtk-thermal-mt818x-dtsi-v7-1-8c8e3c7a3643@baylibre.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
This add suspend-to-ram support.
The derived_table is kept-as is, so the resume is only about
pm_runtime_* calls and restoring the same registers as the probe.
Extract the hardware initialization procedure to a function called at
both probe-time & resume-time.
The probe-time loop is split in two to ensure doing the hardware
initialization before registering thermal zones. That ensures our
callbacks cannot be called while in bad state.
The 100ms delay in the hardware initialization sequence was removed.
It was initially added to be sure the thresholds are programmed before
enabling the interrupt, but in fact it's not needed (tested on J7200
platform).
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Acked-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Thomas Richard <thomas.richard@bootlin.com>
Link: https://lore.kernel.org/r/20240425153238.498750-1-thomas.richard@bootlin.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The order in which lists are sorted in __thermal_zone_device_update()
is reverse with respect to what it should be due to a mistake in
thermal_trip_notify_cmp().
Fix it and observe that it is not necessary to sort the lists in
different orders. They can both be sorted in ascending order if
way_down_list is walked in reverse order which allows the code to
be slightly more straightforward (and less prone to silly mistakes).
Fixes: 7454f2c42c ("thermal: core: Sort trip point crossing notifications by temperature")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/12481676.O9o76ZdvQC@rjwysocki.net
Commit 202aa0d4bb ("thermal: core: Do not call handle_thermal_trip()
if zone temperature is invalid") caused __thermal_zone_device_update()
to return early if the current thermal zone temperature was invalid.
This was done to avoid running handle_thermal_trip() and governor
callbacks in that case which led to confusion. However, it went too
far because monitor_thermal_zone() still needs to be called even when
the zone temperature is invalid to ensure that it will be updated
eventually in case thermal polling is enabled and the driver has no
other means to notify the core of zone temperature changes (for example,
it does not register an interrupt handler or ACPI notifier).
Also if the .set_trips() zone callback is expected to set up monitoring
interrupts for a thermal zone, it has to be provided with valid
boundaries and that can only happen if the zone temperature is known.
Accordingly, to ensure that __thermal_zone_device_update() will
run again after a failing zone temperature check, make it call
monitor_thermal_zone() regardless of whether or not the zone
temperature is valid and make the latter schedule a thermal zone
temperature update if the zone temperature is invalid even if
polling is not enabled for the thermal zone.
Fixes: 202aa0d4bb ("thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid")
Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2764814.mvXUDI8C0e@rjwysocki.net
[ rjw: Changed THERMAL_RECHECK_DELAY_MS to 250 ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull ata fixes from Niklas Cassel:
- Add NOLPM quirk for for all Crucial BX SSD1 models.
Considering that we now have had bug reports for 3 different BX SSD1
variants from Crucial with the same product name, make the quirk more
inclusive, to catch more device models from the same generation.
- Fix a trivial NULL pointer dereference in the error path for
ata_host_release().
- Create a ata_port_free(), so that we don't miss freeing ata_port
struct members when freeing a struct ata_port.
- Fix a trivial double free in the error path for ata_host_alloc().
- Ensure that we remove the libata "remapped NVMe device count" sysfs
entry on .probe() error.
* tag 'ata-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: ahci: Clean up sysfs file on error
ata: libata-core: Fix double free on error
ata,scsi: libata-core: Do not leak memory for ata_port struct members
ata: libata-core: Fix null pointer dereference on error
ata: libata-core: Add ATA_HORKAGE_NOLPM for all Crucial BX SSD1 models
.probe() (ahci_init_one()) calls sysfs_add_file_to_group(), however,
if probe() fails after this call, we currently never call
sysfs_remove_file_from_group().
(The sysfs_remove_file_from_group() call in .remove() (ahci_remove_one())
does not help, as .remove() is not called on .probe() error.)
Thus, if probe() fails after the sysfs_add_file_to_group() call, the next
time we insmod the module we will get:
sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:04.0/remapped_nvme'
CPU: 11 PID: 954 Comm: modprobe Not tainted 6.10.0-rc5 #43
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x5d/0x80
sysfs_warn_dup.cold+0x17/0x23
sysfs_add_file_mode_ns+0x11a/0x130
sysfs_add_file_to_group+0x7e/0xc0
ahci_init_one+0x31f/0xd40 [ahci]
Fixes: 894fba7f43 ("ata: ahci: Add sysfs attribute to show remapped NVMe device count")
Cc: stable@vger.kernel.org
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240629124210.181537-10-cassel@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
libsas is currently not freeing all the struct ata_port struct members,
e.g. ncq_sense_buf for a driver supporting Command Duration Limits (CDL).
Add a function, ata_port_free(), that is used to free a ata_port,
including its struct members. It makes sense to keep the code related to
freeing a ata_port in its own function, which will also free all the
struct members of struct ata_port.
Fixes: 18bd7718b5 ("scsi: ata: libata: Handle completion of CDL commands using policy 0xD")
Reviewed-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240629124210.181537-8-cassel@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>