Convert the bus_register() and bus_unregister() functions to use
bus_to_subsys() and not use the back-pointer to the private structure.
Because bus_add_groups() and bus_remove_groups() were only called in one
place, remove those one-line-wrapper functions and call the real sysfs
group function where it is needed instead, saving another layer of
indirection.
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230208111330.439504-8-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
bus_create_file() and bus_remove_file() can be made to take a constant
bus pointer, as it should not be modifying anything in the bus
structure. Make this change and move the functions to use the internal
subsys_get/put() logic as well, to prevent the use of the back-pointer
in struct bus_type.
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230208111330.439504-5-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the quest to make 'struct bus_type' constant and in read-only memory,
we need to stop using the private pointer to the subsys_private
structure. First step in doing this is to create a helper function that
turns a 'struct bus_type' into 'struct subsys_private' called
bus_to_subsys().
bus_to_subsys() walks the list of registered busses in the system and
finds the matching one based on the pointer to the bus_type itself. As
this is a short list, and this function is not on any fast path, it
should not be noticable.
Implement bus_get() and bus_put() using this new helper function.
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230208111330.439504-3-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fw_devlink could only detect a single and simple cycle because it relied
mainly on device link cycle detection code that only checked for cycles
between devices. The expectation was that the firmware wouldn't have
complicated cycles and multiple cycles between devices. That expectation
has been proven to be wrong.
For example, fw_devlink could handle:
+-+ +-+
|A+------> |B+
+-+ +++
^ |
| |
+----------+
But it couldn't handle even something as "simple" as:
+---------------------+
| |
v |
+-+ +-+ +++
|A+------> |B+------> |C|
+-+ +++ +-+
^ |
| |
+----------+
But firmware has even more complicated cycles like:
+---------------------+
| |
v |
+-+ +---+ +++
+--+A+------>| B +-----> |C|<--+
| +-+ ++--+ +++ |
| ^ | ^ | |
| | | | | |
| +---------+ +---------+ |
| |
+------------------------------+
And this is without including parent child dependencies or nodes in the
cycle that are just firmware nodes that'll never have a struct device
created for them.
The proper way to treat these devices it to not force any probe ordering
between them, while still enforce dependencies between node in the
cycles (A, B and C) and their consumers.
So this patch goes all out and just deals with all types of cycles. It
does this by:
1. Following dependencies across device links, parent-child and fwnode
links.
2. When it find cycles, it mark the device links and fwnode links as
such instead of just deleting them or making the indistinguishable
from proxy SYNC_STATE_ONLY device links.
This way, when new nodes get added, we can immediately find and mark any
new cycles whether the new node is a device or firmware node.
Fixes: 2de9d8e0d2 ("driver core: fw_devlink: Improve handling of cyclic dependencies")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Tested-by: Colin Foster <colin.foster@in-advantage.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Luca Weiss <luca.weiss@fairphone.com> # qcom/sm7225-fairphone-fp4
Link: https://lore.kernel.org/r/20230207014207.1678715-9-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fw_devlink uses DL_FLAG_SYNC_STATE_ONLY device link flag for two
purposes:
1. To allow a parent device to proxy its child device's dependency on a
supplier so that the supplier doesn't get its sync_state() callback
before the child device/consumer can be added and probed. In this
usage scenario, we need to ignore cycles for ensure correctness of
sync_state() callbacks.
2. When there are dependency cycles in firmware, we don't know which of
those dependencies are valid. So, we have to ignore them all wrt
probe ordering while still making sure the sync_state() callbacks
come correctly.
However, when detecting dependency cycles, there can be multiple
dependency cycles between two devices that we need to detect. For
example:
A -> B -> A and A -> C -> B -> A.
To detect multiple cycles correct, we need to be able to differentiate
DL_FLAG_SYNC_STATE_ONLY device links used for (1) vs (2) above.
To allow this differentiation, add a DL_FLAG_CYCLE that can be use to
mark use case (2). We can then use the DL_FLAG_CYCLE to decide which
DL_FLAG_SYNC_STATE_ONLY device links to follow when looking for
dependency cycles.
Fixes: 2de9d8e0d2 ("driver core: fw_devlink: Improve handling of cyclic dependencies")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Tested-by: Colin Foster <colin.foster@in-advantage.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Luca Weiss <luca.weiss@fairphone.com> # qcom/sm7225-fairphone-fp4
Link: https://lore.kernel.org/r/20230207014207.1678715-6-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a device X is bound successfully to a driver, if it has a child
firmware node Y that doesn't have a struct device created by then, we
delete fwnode links where the child firmware node Y is the supplier. We
did this to avoid blocking the consumers of the child firmware node Y
from deferring probe indefinitely.
While that a step in the right direction, it's better to make the
consumers of the child firmware node Y to be consumers of the device X
because device X is probably implementing whatever functionality is
represented by child firmware node Y. By doing this, we capture the
device dependencies more accurately and ensure better
probe/suspend/resume ordering.
Signed-off-by: Saravana Kannan <saravanak@google.com>
Tested-by: Colin Foster <colin.foster@in-advantage.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Luca Weiss <luca.weiss@fairphone.com> # qcom/sm7225-fairphone-fp4
Link: https://lore.kernel.org/r/20230207014207.1678715-2-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move the lock_class_key structure out of struct bus_type and into the
dynamic structure we create already for all bus_types registered with
the kernel. This saves on static space and removes one more writable
field in struct bus_type.
In the future, the same field can be moved out of the struct class logic
because it shares this same private structure.
Most everyone will never notice this change, as lockdep is not enabled
in real systems so no memory or logic changes are happening for them.
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20230201083349.4038660-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
__platform_driver_probe() pokes around in some bus and driver private
lists and locks in a way that is not needed at all. The code only wants
to know if a device was bound to the driver that was registered, so walk
all devices on the bus to see if there was a match. If there is not a
match, return an error. This is the same logic as was originally
present, but just done in a simpler and more obvious way that is not a
layering violation.
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230131082459.301603-2-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Clear the class private pointer if __class_register() fails for it, so
as to allow its users to verify that the class is usable by checking
the value of that pointer.
For consistency, clear that pointer before freeing the object pointed
to by it in class_release().
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/4463268.LvFx2qVVIh@kreacher
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The normal call sequence of using transport class is:
Add path:
transport_setup_device()
transport_setup_classdev() // call sas_host_setup() here
transport_add_device() // if fails, need call transport_destroy_device()
transport_configure_device()
Remove path:
transport_remove_device()
transport_remove_classdev // call sas_host_remove() here
transport_destroy_device()
If transport_add_device() fails, need call transport_destroy_device()
to free memory, but in this case, ->remove() is not called, and the
resources allocated in ->setup() are leaked. So fix these leaks by
calling ->remove() in transport_add_class_device() if it returns error.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20221115031638.3816551-1-yangyingliang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When calling kobject_add() failed in device_add(), it will call
cleanup_glue_dir() to free resource. But in kobject_add(),
dev->kobj.parent has been set to NULL. This will cause resource leak.
The process is as follows:
device_add()
get_device_parent()
class_dir_create_and_add()
kobject_add() //kobject_get()
...
dev->kobj.parent = kobj;
...
kobject_add() //failed, but set dev->kobj.parent = NULL
...
glue_dir = get_glue_dir(dev) //glue_dir = NULL, and goto
//"Error" label
...
cleanup_glue_dir() //becaues glue_dir is NULL, not call
//kobject_put()
The preceding problem may cause insmod mac80211_hwsim.ko to failed.
sysfs: cannot create duplicate filename '/devices/virtual/mac80211_hwsim'
Call Trace:
<TASK>
dump_stack_lvl+0x8e/0xd1
sysfs_warn_dup.cold+0x1c/0x29
sysfs_create_dir_ns+0x224/0x280
kobject_add_internal+0x2aa/0x880
kobject_add+0x135/0x1a0
get_device_parent+0x3d7/0x590
device_add+0x2aa/0x1cb0
device_create_groups_vargs+0x1eb/0x260
device_create+0xdc/0x110
mac80211_hwsim_new_radio+0x31e/0x4790 [mac80211_hwsim]
init_mac80211_hwsim+0x48d/0x1000 [mac80211_hwsim]
do_one_initcall+0x10f/0x630
do_init_module+0x19f/0x5e0
load_module+0x64b7/0x6eb0
__do_sys_finit_module+0x140/0x200
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
</TASK>
kobject_add_internal failed for mac80211_hwsim with -EEXIST, don't try to
register things with the same name in the same directory.
Fixes: cebf8fd169 ("driver core: fix race between creating/querying glue dir and its cleanup")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20221123012042.335252-1-shaozhengchao@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to 'admin-guide/mm/memory-hotplug.rst', the memory block ID,
instead of the section index, is shown by '/sys/devices/system/memory/
memoryX/phys_index'.
Fix the comments to match with 'admin-guide/mm/memory-hotplug.rst'.
Besides, use the existing helper memory_block_id() to convert the section
index to the memory block index.
No functional change intended.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Link: https://lore.kernel.org/r/20230120055727.355483-2-gshan@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sudeep writes:
"cacheinfo and arch_topology updates for v6.3
The main change is to build the cache topology information for all
the CPUs from the primary CPU. Currently the cacheinfo for secondary CPUs
is created during the early boot on the respective CPU itself. Preemption
and interrupts are disabled at this stage. On PREEMPT_RT kernels, allocating
memory and even parsing the PPTT table for ACPI based systems triggers a:
'BUG: sleeping function called from invalid context'
To prevent this bug, the cacheinfo is now allocated from the primary CPU
when preemption and interrupts are enabled and before booting secondary
CPUs. The cache levels/leaves are computed from DT/ACPI PPTT information
only, without relying on any architecture specific mechanism if done so
early.
The other minor change included here is to handle shared caches at
different levels when not all the CPUs on the system have the same
cache hierarchy."
* tag 'archtopo-cacheinfo-updates-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
cacheinfo: Fix shared_cpu_map to handle shared caches at different levels
arch_topology: Build cacheinfo from primary CPU
ACPI: PPTT: Update acpi_find_last_cache_level() to acpi_get_cache_info()
ACPI: PPTT: Remove acpi_find_cache_levels()
cacheinfo: Check 'cache-unified' property to count cache leaves
cacheinfo: Return error code in init_of_cache_level()
cacheinfo: Use RISC-V's init_cache_level() as generic OF implementation
In test_async_probe_init, second set of asynchronous devices are saved
in sync_dev[sync_id], which should be async_dev[async_id].
This makes these devices not unregistered when exit.
> modprobe test_async_driver_probe && \
> modprobe -r test_async_driver_probe && \
> modprobe test_async_driver_probe
...
> sysfs: cannot create duplicate filename '/devices/platform/test_async_driver.4'
> kobject_add_internal failed for test_async_driver.4 with -EEXIST,
don't try to register things with the same name in the same directory.
Fixes: 57ea974fb8 ("driver core: Rewrite test_async_driver_probe to cover serialization and NUMA affinity")
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Link: https://lore.kernel.org/r/20221125063541.241328-1-chenzhongjin@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The cacheinfo sets up the shared_cpu_map by checking whether the caches
with the same index are shared between CPUs. However, this will trigger
slab-out-of-bounds access if the CPUs do not have the same cache hierarchy.
Another problem is the mismatched shared_cpu_map when the shared cache does
not have the same index between CPUs.
CPU0 I D L3
index 0 1 2 x
^ ^ ^ ^
index 0 1 2 3
CPU1 I D L2 L3
This patch checks each cache is shared with all caches on other CPUs.
Reviewed-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Link: https://lore.kernel.org/r/20230117105133.4445-2-yongxuan.wang@sifive.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
commit 3fcbf1c77d ("arch_topology: Fix cache attributes detection
in the CPU hotplug path")
adds a call to detect_cache_attributes() to populate the cacheinfo
before updating the siblings mask. detect_cache_attributes() allocates
memory and can take the PPTT mutex (on ACPI platforms). On PREEMPT_RT
kernels, on secondary CPUs, this triggers a:
'BUG: sleeping function called from invalid context' [1]
as the code is executed with preemption and interrupts disabled.
The primary CPU was previously storing the cache information using
the now removed (struct cpu_topology).llc_id:
commit 5b8dc787ce ("arch_topology: Drop LLC identifier stash from
the CPU topology")
allocate_cache_info() tries to build the cacheinfo from the primary
CPU prior secondary CPUs boot, if the DT/ACPI description
contains cache information.
If allocate_cache_info() fails, then fallback to the current state
for the cacheinfo allocation. [1] will be triggered in such case.
When unplugging a CPU, the cacheinfo memory cannot be freed. If it
was, then the memory would be allocated early by the re-plugged
CPU and would trigger [1].
Note that populate_cache_leaves() might be called multiple times
due to populate_leaves being moved up. This is required since
detect_cache_attributes() might be called with per_cpu_cacheinfo(cpu)
being allocated but not populated.
[1]:
| BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
| in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/111
| preempt_count: 1, expected: 0
| RCU nest depth: 1, expected: 1
| 3 locks held by swapper/111/0:
| #0: (&pcp->lock){+.+.}-{3:3}, at: get_page_from_freelist+0x218/0x12c8
| #1: (rcu_read_lock){....}-{1:3}, at: rt_spin_trylock+0x48/0xf0
| #2: (&zone->lock){+.+.}-{3:3}, at: rmqueue_bulk+0x64/0xa80
| irq event stamp: 0
| hardirqs last enabled at (0): 0x0
| hardirqs last disabled at (0): copy_process+0x5dc/0x1ab8
| softirqs last enabled at (0): copy_process+0x5dc/0x1ab8
| softirqs last disabled at (0): 0x0
| Preemption disabled at:
| migrate_enable+0x30/0x130
| CPU: 111 PID: 0 Comm: swapper/111 Tainted: G W 6.0.0-rc4-rt6-[...]
| Call trace:
| __kmalloc+0xbc/0x1e8
| detect_cache_attributes+0x2d4/0x5f0
| update_siblings_masks+0x30/0x368
| store_cpu_topology+0x78/0xb8
| secondary_start_kernel+0xd0/0x198
| __secondary_switched+0xb0/0xb4
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230104183033.755668-7-pierre.gondois@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Make init_of_cache_level() return an error code when the cache
information parsing fails to help detecting missing information.
init_of_cache_level() is only called for riscv. Returning an error
code instead of 0 will prevent detect_cache_attributes() to allocate
memory if an incomplete DT is parsed.
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230104183033.755668-3-pierre.gondois@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>