linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-15 19:01:45 -04:00

Author	SHA1	Message	Date
Florian Westphal	b2f742c846	netfilter: nf_tables: restart set lookup on base_seq change The hash, hash_fast, rhash and bitwise sets may indicate no result even though a matching element exists during a short time window while other cpu is finalizing the transaction. This happens when the hash lookup/bitwise lookup function has picked up the old genbit, right before it was toggled by nf_tables_commit(), but then the same cpu managed to unlink the matching old element from the hash table: cpu0 cpu1 has added new elements to clone has marked elements as being inactive in new generation perform lookup in the set enters commit phase: A) observes old genbit increments base_seq I) increments the genbit II) removes old element from the set B) finds matching element C) returns no match: found element is not valid in old generation Next lookup observes new genbit and finds matching e2. Consider a packet matching element e1, e2. cpu0 processes following transaction: 1. remove e1 2. adds e2, which has same key as e1. P matches both e1 and e2. Therefore, cpu1 should always find a match for P. Due to above race, this is not the case: cpu1 observed the old genbit. e2 will not be considered once it is found. The element e1 is not found anymore if cpu0 managed to unlink it from the hlist before cpu1 found it during list traversal. The situation only occurs for a brief time period, lookups happening after I) observe new genbit and return e2. This problem exists in all set types except nft_set_pipapo, so fix it once in nft_lookup rather than each set ops individually. Sample the base sequence counter, which gets incremented right before the genbit is changed. Then, if no match is found, retry the lookup if the base sequence was altered in between. If the base sequence hasn't changed: - No update took place: no-match result is expected. This is the common case. or: - nf_tables_commit() hasn't progressed to genbit update yet. Old elements were still visible and nomatch result is expected, or: - nf_tables_commit updated the genbit: We picked up the new base_seq, so the lookup function also picked up the new genbit, no-match result is expected. If the old genbit was observed, then nft_lookup also picked up the old base_seq: nft_lookup_should_retry() returns true and relookup is performed in the new generation. This problem was added when the unconditional synchronize_rcu() call that followed the current/next generation bit toggle was removed. Thanks to Pablo Neira Ayuso for reviewing an earlier version of this patchset, for suggesting re-use of existing base_seq and placement of the restart loop in nft_set_do_lookup(). Fixes: `0cbc06b3fa` ("netfilter: nf_tables: remove synchronize_rcu in commit phase") Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-10 20:30:37 +02:00
Florian Westphal	11fe5a82e5	netfilter: nf_tables: make nft_set_do_lookup available unconditionally This function was added for retpoline mitigation and is replaced by a static inline helper if mitigations are not enabled. Enable this helper function unconditionally so next patch can add a lookup restart mechanism to fix possible false negatives while transactions are in progress. Adding lookup restarts in nft_lookup_eval doesn't work as nft_objref would then need the same copypaste loop. This patch is separate to ease review of the actual bug fix. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-10 20:30:37 +02:00
Florian Westphal	64102d9bbc	netfilter: nf_tables: place base_seq in struct net This will soon be read from packet path around same time as the gencursor. Both gencursor and base_seq get incremented almost at the same time, so it makes sense to place them in the same structure. This doesn't increase struct net size on 64bit due to padding. Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-10 20:30:37 +02:00
Florian Westphal	a60f7bf4a1	netfilter: nft_set_rbtree: continue traversal if element is inactive When the rbtree lookup function finds a match in the rbtree, it sets the range start interval to a potentially inactive element. Then, after tree lookup, if the matching element is inactive, it returns NULL and suppresses a matching result. This is wrong and leads to false negative matches when a transaction has already entered the commit phase. cpu0 cpu1 has added new elements to clone has marked elements as being inactive in new generation perform lookup in the set enters commit phase: I) increments the genbit A) observes new genbit B) finds matching range C) returns no match: found range invalid in new generation II) removes old elements from the tree C New nft_lookup happening now will find matching element, because it is no longer obscured by old, inactive one. Consider a packet matching range r1-r2: cpu0 processes following transaction: 1. remove r1-r2 2. add r1-r3 P is contained in both ranges. Therefore, cpu1 should always find a match for P. Due to above race, this is not the case: cpu1 does find r1-r2, but then ignores it due to the genbit indicating the range has been removed. It does NOT test for further matches. The situation persists for all lookups until after cpu0 hits II) after which r1-r3 range start node is tested for the first time. Move the "interval start is valid" check ahead so that tree traversal continues if the starting interval is not valid in this generation. Thanks to Stefan Hanreich for providing an initial reproducer for this bug. Reported-by: Stefan Hanreich <s.hanreich@proxmox.com> Fixes: `c1eda3c639` ("netfilter: nft_rbtree: ignore inactive matching element with no descendants") Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-10 20:30:37 +02:00
Florian Westphal	c4eaca2e10	netfilter: nft_set_pipapo: don't check genbit from packetpath lookups The pipapo set type is special in that it has two copies of its datastructure: one live copy containing only valid elements and one on-demand clone used during transaction where adds/deletes happen. This clone is not visible to the datapath. This is unlike all other set types in nftables, those all link new elements into their live hlist/tree. For those sets, the lookup functions must skip the new elements while the transaction is ongoing to ensure consistency. As the clone is shallow, removal does have an effect on the packet path: once the transaction enters the commit phase the 'gencursor' bit that determines which elements are active and which elements should be ignored (because they are no longer valid) is flipped. This causes the datapath lookup to ignore these elements if they are found during lookup. This opens up a small race window where pipapo has an inconsistent view of the dataset from when the transaction-cpu flipped the genbit until the transaction-cpu calls nft_pipapo_commit() to swap live/clone pointers: cpu0 cpu1 has added new elements to clone has marked elements as being inactive in new generation perform lookup in the set enters commit phase: I) increments the genbit A) observes new genbit removes elements from the clone so they won't be found anymore B) lookup in datastructure can't see new elements yet, but old elements are ignored -> Only matches elements that were not changed in the transaction II) calls nft_pipapo_commit(), clone and live pointers are swapped. C New nft_lookup happening now will find matching elements. Consider a packet matching range r1-r2: cpu0 processes following transaction: 1. remove r1-r2 2. add r1-r3 P is contained in both ranges. Therefore, cpu1 should always find a match for P. Due to above race, this is not the case: cpu1 does find r1-r2, but then ignores it due to the genbit indicating the range has been removed. At the same time, r1-r3 is not visible yet, because it can only be found in the clone. The situation persists for all lookups until after cpu0 hits II). The fix is easy: Don't check the genbit from pipapo lookup functions. This is possible because unlike the other set types, the new elements are not reachable from the live copy of the dataset. The clone/live pointer swap is enough to avoid matching on old elements while at the same time all new elements are exposed in one go. After this change, step B above returns a match in r1-r2. This is fine: r1-r2 only becomes truly invalid the moment they get freed. This happens after a synchronize_rcu() call and rcu read lock is held via netfilter hook traversal (nf_hook_slow()). Cc: Stefano Brivio <sbrivio@redhat.com> Fixes: `3c4287f620` ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-10 20:30:37 +02:00
Florian Westphal	5e13f2c491	netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation Running new 'set_flush_add_atomic_bitmap' test case for nftables.git with CONFIG_PROVE_RCU_LIST=y yields: net/netfilter/nft_set_bitmap.c:231 RCU-list traversed in non-reader section!! rcu_scheduler_active = 2, debug_locks = 1 1 lock held by nft/4008: #0: ffff888147f79cd8 (&nft_net->commit_mutex){+.+.}-{4:4}, at: nf_tables_valid_genid+0x2f/0xd0 lockdep_rcu_suspicious+0x116/0x160 nft_bitmap_walk+0x22d/0x240 nf_tables_delsetelem+0x1010/0x1a00 .. This is a false positive, the list cannot be altered while the transaction mutex is held, so pass the relevant argument to the iterator. Fixes tag intentionally wrong; no point in picking this up if earlier false-positive-fixups were not applied. Fixes: `28b7a6b84c` ("netfilter: nf_tables: avoid false-positive lockdep splats in set walker") Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-10 20:28:24 +02:00
Geert Uytterhoeven	5c793afa07	can: rcar_can: rcar_can_resume(): fix s2ram with PSCI On R-Car Gen3 using PSCI, s2ram powers down the SoC. After resume, the CAN interface no longer works, until it is brought down and up again. Fix this by calling rcar_can_start() from the PM resume callback, to fully initialize the controller instead of just restarting it. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/699b2f7fcb60b31b6f976a37f08ce99c5ffccb31.1755165227.git.geert+renesas@glider.be Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2025-09-10 17:12:05 +02:00
Anssi Hannula	ef79f00be7	can: xilinx_can: xcan_write_frame(): fix use-after-free of transmitted SKB can_put_echo_skb() takes ownership of the SKB and it may be freed during or after the call. However, xilinx_can xcan_write_frame() keeps using SKB after the call. Fix that by only calling can_put_echo_skb() after the code is done touching the SKB. The tx_lock is held for the entire xcan_write_frame() execution and also on the can_get_echo_skb() side so the order of operations does not matter. An earlier fix commit `3d3c817c3a` ("can: xilinx_can: Fix usage of skb memory") did not move the can_put_echo_skb() call far enough. Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi> Fixes: `1598efe57b` ("can: xilinx_can: refactor code in preparation for CAN FD support") Link: https://patch.msgid.link/20250822095002.168389-1-anssi.hannula@bitwise.fi [mkl: add "commit" in front of sha1 in patch description] [mkl: fix indention] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2025-09-10 17:12:05 +02:00
Tetsuo Handa	06e02da29f	can: j1939: j1939_local_ecu_get(): undo increment when j1939_local_ecu_get() fails Since j1939_sk_bind() and j1939_sk_release() call j1939_local_ecu_put() when J1939_SOCK_BOUND was already set, but the error handling path for j1939_sk_bind() will not set J1939_SOCK_BOUND when j1939_local_ecu_get() fails, j1939_local_ecu_get() needs to undo priv->ents[sa].nusers++ when j1939_local_ecu_get() returns an error. Fixes: `9d71dd0c70` ("can: add support of SAE J1939 protocol") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/e7f80046-4ff7-4ce2-8ad8-7c3c678a42c9@I-love.SAKURA.ne.jp Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2025-09-10 17:12:05 +02:00
Tetsuo Handa	f214744c8a	can: j1939: j1939_sk_bind(): call j1939_priv_put() immediately when j1939_local_ecu_get() failed Commit `25fe97cb76` ("can: j1939: move j1939_priv_put() into sk_destruct callback") expects that a call to j1939_priv_put() can be unconditionally delayed until j1939_sk_sock_destruct() is called. But a refcount leak will happen when j1939_sk_bind() is called again after j1939_local_ecu_get() from previous j1939_sk_bind() call returned an error. We need to call j1939_priv_put() before j1939_sk_bind() returns an error. Fixes: `25fe97cb76` ("can: j1939: move j1939_priv_put() into sk_destruct callback") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/4f49a1bc-a528-42ad-86c0-187268ab6535@I-love.SAKURA.ne.jp Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2025-09-10 17:12:04 +02:00
Tetsuo Handa	7fcbe5b2c6	can: j1939: implement NETDEV_UNREGISTER notification handler syzbot is reporting unregister_netdevice: waiting for vcan0 to become free. Usage count = 2 problem, for j1939 protocol did not have NETDEV_UNREGISTER notification handler for undoing changes made by j1939_sk_bind(). Commit `25fe97cb76` ("can: j1939: move j1939_priv_put() into sk_destruct callback") expects that a call to j1939_priv_put() can be unconditionally delayed until j1939_sk_sock_destruct() is called. But we need to call j1939_priv_put() against an extra ref held by j1939_sk_bind() call (as a part of undoing changes made by j1939_sk_bind()) as soon as NETDEV_UNREGISTER notification fires (i.e. before j1939_sk_sock_destruct() is called via j1939_sk_release()). Otherwise, the extra ref on "struct j1939_priv" held by j1939_sk_bind() call prevents "struct net_device" from dropping the usage count to 1; making it impossible for unregister_netdevice() to continue. Reported-by: syzbot <syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84 Tested-by: syzbot <syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com> Fixes: `9d71dd0c70` ("can: add support of SAE J1939 protocol") Fixes: `25fe97cb76` ("can: j1939: move j1939_priv_put() into sk_destruct callback") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/ac9db9a4-6c30-416e-8b94-96e6559d55b2@I-love.SAKURA.ne.jp [mkl: remove space in front of label] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2025-09-10 17:12:04 +02:00
Davide Caratti	d013ebc349	selftests: can: enable CONFIG_CAN_VCAN as a module A proper kernel configuration for running kselftest can be obtained with: $ yes \| make kselftest-merge Build of 'vcan' driver is currently missing, while the other required knobs are already there because of net/link_netns.py [1]. Add a config file in selftests/net/can to store the minimum set of kconfig needed for CAN selftests. [1] https://patch.msgid.link/20250219125039.18024-14-shaw.leon@gmail.com Fixes: `77442ffa83` ("selftests: can: Import tst-filter from can-tests") Reviewed-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://patch.msgid.link/fa4c0ea262ec529f25e5f5aa9269d84764c67321.1757516009.git.dcaratti@redhat.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2025-09-10 17:11:50 +02:00
Mario Limonciello (AMD)	7df7b728c3	DRM: Add a new 'boot_display' attribute On systems with multiple GPUs there can be uncertainty which GPU is the primary one used to drive the display at bootup. In some desktop environments this can lead to increased power consumption because secondary GPUs may be used for rendering and never go to a low power state. In order to disambiguate this add a new sysfs attribute 'boot_display' that uses the output of video_is_primary_device() to populate whether the PCI device was used for driving the display. Suggested-by: Manivannan Sadhasivam <mani@kernel.org> Acked-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/issues/23 Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20250811162606.587759-5-superm1@kernel.org Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>	2025-09-10 09:35:33 -05:00
Mario Limonciello (AMD)	ad90860bd1	fbcon: Use screen info to find primary device On systems with non VGA GPUs fbcon can't find the primary GPU because video_is_primary_device() only checks the VGA arbiter. Add a screen info check to video_is_primary_device() so that callers can get accurate data on such systems. Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Suggested-by: Thomas Zimmermann <tzimmermann@suse.de> Suggested-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20250811162606.587759-4-superm1@kernel.org Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>	2025-09-10 09:35:33 -05:00
Mario Limonciello (AMD)	337bf13aa9	PCI/VGA: Replace vga_is_firmware_default() with a screen info check vga_is_firmware_default() checks firmware resources to find the owner framebuffer resources to find the firmware PCI device. This is an open coded implementation of screen_info_pci_dev(). Switch to using screen_info_pci_dev() instead. Acked-by: Bjorn Helgaas <bhelgaas@google.com> Suggested-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20250811162606.587759-3-superm1@kernel.org Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>	2025-09-10 09:35:33 -05:00
Mario Limonciello (AMD)	6e490dea61	Fix access to video_is_primary_device() when compiled without CONFIG_VIDEO When compiled without CONFIG_VIDEO the architecture specific implementations of video_is_primary_device() include prototypes and assume that video-common.c will be linked. Guard against this so that the fallback inline implementation that returns false will be used when compiled without CONFIG_VIDEO. Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506221312.49Fy1aNA-lkp@intel.com/ Link: https://lore.kernel.org/r/20250811162606.587759-2-superm1@kernel.org Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>	2025-09-10 09:35:33 -05:00
Kuniyuki Iwashima	a3967baad4	tcp_bpf: Call sk_msg_free() when tcp_bpf_send_verdict() fails to allocate psock->cork. syzbot reported the splat below. [0] The repro does the following: 1. Load a sk_msg prog that calls bpf_msg_cork_bytes(msg, cork_bytes) 2. Attach the prog to a SOCKMAP 3. Add a socket to the SOCKMAP 4. Activate fault injection 5. Send data less than cork_bytes At 5., the data is carried over to the next sendmsg() as it is smaller than the cork_bytes specified by bpf_msg_cork_bytes(). Then, tcp_bpf_send_verdict() tries to allocate psock->cork to hold the data, but this fails silently due to fault injection + __GFP_NOWARN. If the allocation fails, we need to revert the sk->sk_forward_alloc change done by sk_msg_alloc(). Let's call sk_msg_free() when tcp_bpf_send_verdict fails to allocate psock->cork. The "copied" also needs to be updated such that a proper error can be returned to the caller, sendmsg. It fails to allocate psock->cork. Nothing has been corked so far, so this patch simply sets "copied" to 0. [0]: WARNING: net/ipv4/af_inet.c:156 at inet_sock_destruct+0x623/0x730 net/ipv4/af_inet.c:156, CPU#1: syz-executor/5983 Modules linked in: CPU: 1 UID: 0 PID: 5983 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025 RIP: 0010:inet_sock_destruct+0x623/0x730 net/ipv4/af_inet.c:156 Code: 0f 0b 90 e9 62 fe ff ff e8 7a db b5 f7 90 0f 0b 90 e9 95 fe ff ff e8 6c db b5 f7 90 0f 0b 90 e9 bb fe ff ff e8 5e db b5 f7 90 <0f> 0b 90 e9 e1 fe ff ff 89 f9 80 e1 07 80 c1 03 38 c1 0f 8c 9f fc RSP: 0018:ffffc90000a08b48 EFLAGS: 00010246 RAX: ffffffff8a09d0b2 RBX: dffffc0000000000 RCX: ffff888024a23c80 RDX: 0000000000000100 RSI: 0000000000000fff RDI: 0000000000000000 RBP: 0000000000000fff R08: ffff88807e07c627 R09: 1ffff1100fc0f8c4 R10: dffffc0000000000 R11: ffffed100fc0f8c5 R12: ffff88807e07c380 R13: dffffc0000000000 R14: ffff88807e07c60c R15: 1ffff1100fc0f872 FS: 00005555604c4500(0000) GS:ffff888125af1000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005555604df5c8 CR3: 0000000032b06000 CR4: 00000000003526f0 Call Trace: <IRQ> __sk_destruct+0x86/0x660 net/core/sock.c:2339 rcu_do_batch kernel/rcu/tree.c:2605 [inline] rcu_core+0xca8/0x1770 kernel/rcu/tree.c:2861 handle_softirqs+0x286/0x870 kernel/softirq.c:579 __do_softirq kernel/softirq.c:613 [inline] invoke_softirq kernel/softirq.c:453 [inline] __irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:680 irq_exit_rcu+0x9/0x30 kernel/softirq.c:696 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1052 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1052 </IRQ> Fixes: `4f738adba3` ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data") Reported-by: syzbot+4cabd1d2fa917a456db8@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68c0b6b5.050a0220.3c6139.0013.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250909232623.4151337-1-kuniyu@google.com	2025-09-10 06:53:56 -07:00
Danilo Krummrich	87b90cee22	MAINTAINERS: drm-misc: fix X: entries for nova/nouveau Nouveau patches usually flow through the drm-misc tree, while nova (and nova-core) are maintained through a dedicated driver tree and soon through drm-rust. Hence, fix up the corresponding X: entries to list nova instead of nouveau. Reported-by: Maxime Ripard <mripard@kernel.org> Closes: https://lore.kernel.org/dri-devel/enuksb2qk5wyrilz3l2vnog45lghgmplrav5to6pd5k5owi36h@pxdq6y5dpgpt/ Acked-by: Maxime Ripard <mripard@kernel.org> Link: https://lore.kernel.org/r/20250902190247.435340-1-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2025-09-10 15:52:25 +02:00
James Guan	9c600589e1	wifi: virt_wifi: Fix page fault on connect This patch prevents page fault in __cfg80211_connect_result()[1] when connecting a virt_wifi device, while ensuring that virt_wifi can connect properly. [1] https://lore.kernel.org/linux-wireless/20250909063213.1055024-1-guan_yufei@163.com/ Closes: https://lore.kernel.org/linux-wireless/20250909063213.1055024-1-guan_yufei@163.com/ Signed-off-by: James Guan <guan_yufei@163.com> Link: https://patch.msgid.link/20250910111929.137049-1-guan_yufei@163.com [remove irrelevant network-manager instructions] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-09-10 15:01:07 +02:00
Johan Hovold	9ba2556cef	drm/mediatek: clean up driver data initialisation The platform and drm devices are only used to look up the drm device and its driver data respectively when initialising the driver data during bind(). Drop the reference counts as soon as they have been used to make the code more readable. Note that the crtc count is never incremented on lookup failures. Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: CK Hu <ck.hu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20250829090345.21075-3-johan@kernel.org/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>	2025-09-10 12:52:59 +00:00
Johan Hovold	4de37a48b6	drm/mediatek: fix potential OF node use-after-free The for_each_child_of_node() helper drops the reference it takes to each node as it iterates over children and an explicit of_node_put() is only needed when exiting the loop early. Drop the recently introduced bogus additional reference count decrement at each iteration that could potentially lead to a use-after-free. Fixes: `1f403699c4` ("drm/mediatek: Fix device/node reference count leaks in mtk_drm_get_all_drm_priv") Cc: Ma Ke <make24@iscas.ac.cn> Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: CK Hu <ck.hu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20250829090345.21075-2-johan@kernel.org/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>	2025-09-10 12:49:37 +00:00
Rodrigo Vivi	702fdf3513	Merge drm/drm-next into drm-intel-next Catching up with some display dependencies. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-10 08:01:42 -04:00
Rafael J. Wysocki	e042354147	PM: EM: Add function for registering a PD without capacity update The intel_pstate driver manages CPU capacity changes itself and it does not need an update of the capacity of all CPUs in the system to be carried out after registering a PD. Moreover, in some configurations (for instance, an SMT-capable hybrid x86 system booted with nosmt in the kernel command line) the em_check_capacity_update() call at the end of em_dev_register_perf_domain() always fails and reschedules itself to run once again in 1 s, so effectively it runs in vain every 1 s forever. To address this, introduce a new variant of em_dev_register_perf_domain(), called em_dev_register_pd_no_update(), that does not invoke em_check_capacity_update(), and make intel_pstate use it instead of the original. Fixes: `7b010f9b90` ("cpufreq: intel_pstate: EAS support for hybrid platforms") Closes: https://lore.kernel.org/linux-pm/40212796-734c-4140-8a85-854f72b8144d@panix.com/ Reported-by: Kenneth R. Crudup <kenny@panix.com> Tested-by: Kenneth R. Crudup <kenny@panix.com> Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-09-10 12:03:19 +02:00
Jacek Lawrynowicz	7acbe30813	MAINTAINERS: Remove Jacek Lawrynowicz as intel_vpu maintainer Remove myself from the intel_vpu driver maintainer list as I'm moving to another company. Time to let someone else deal with the NPU bugs while I pretend to know what I'm doing elsewhere! Thanks to everyone for the great collaboration (and for putting up with my creative interpretations of what "minor fixes" means). Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://lore.kernel.org/r/20250910085526.230458-1-jacek.lawrynowicz@linux.intel.com	2025-09-10 11:14:41 +02:00
Danilo Krummrich	d4dc08c530	Merge drm-misc-next-2025-08-21 into drm-rust-next We need the DRM Rust changes that went into drm-misc before the existence of the drm-rust tree in here as well. Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2025-09-10 11:07:05 +02:00
Thomas Hellström	844150c255	drm/xe: Convert pinned suspend eviction for exhaustive eviction Pinned suspend eviction and preparation for eviction validates system memory for eviction buffers. Do that under a validation exclusive lock to avoid interfering with other processes validating system graphics memory. v2: - Avoid gotos from within xe_validation_guard(). - Adapt to signature change of xe_validation_guard(). Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-14-thomas.hellstrom@linux.intel.com	2025-09-10 09:16:10 +02:00
Thomas Hellström	1f1541720f	drm/xe: Rework instances of variants of xe_bo_create_locked() A common pattern is to create a locked bo, pin it without mapping and then unlock it. Add a function to do that, which internally uses xe_validation_guard(). With that we can remove xe_bo_create_locked_range() and add exhaustive eviction to stolen, pf_provision_vf_lmem and psmi_alloc_object. v4: - New patch after reorganization. v5: - Replace DRM_XE_GEM_CPU_CACHING_WB with 0. (CI) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-13-thomas.hellstrom@linux.intel.com	2025-09-10 09:16:09 +02:00
Thomas Hellström	59eabff2a3	drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction Introduce an xe_bo_create_pin_map_novm() function that does not take the drm_exec paramenter to simplify the conversion of many callsites. For the rest, ensure that the same drm_exec context that was used for locking the vm is passed down to validation. Use xe_validation_guard() where appropriate. v2: - Avoid gotos from within xe_validation_guard(). (Matt Brost) - Break out the change to pf_provision_vf_lmem8 to a separate patch. - Adapt to signature change of xe_validation_guard(). Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-12-thomas.hellstrom@linux.intel.com	2025-09-10 09:16:06 +02:00
Thomas Hellström	e6108eade1	drm/xe: Convert xe_bo_create_pin_map_at() for exhaustive eviction Most users of xe_bo_create_pin_map_at() and xe_bo_create_pin_map_at_aligned() are not using the vm parameter, and that simplifies conversion. Introduce an xe_bo_create_pin_map_at_novm() function and make the _aligned() version static. Use xe_validation_guard() for conversion. v2: - Adapt to signature change of xe_validation_guard(). (Matt Brost) - Fix up documentation. v4: - Postpone the change to i915_gem_stolen_insert_node_in_range() to a later patch. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-11-thomas.hellstrom@linux.intel.com	2025-09-10 09:16:05 +02:00
Thomas Hellström	550a42a8da	drm/xe: Rename ___xe_bo_create_locked() Don't start external function names with underscores. Rename to xe_bo_init_locked(). Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-10-thomas.hellstrom@linux.intel.com	2025-09-10 09:16:04 +02:00
Thomas Hellström	eb289a5f6c	drm/xe: Convert xe_dma_buf.c for exhaustive eviction Convert dma-buf migration to XE_PL_TT and dma-buf import to support exhaustive eviction, using xe_validation_guard(). It seems unlikely that the import would result in an -ENOMEM, but convert import anyway for completeness. The dma-buf map_attachment() functionality unfortunately doesn't support passing a drm_exec, which means that foreign devices validating a dma-buf that we exported will not, unless they are xeKMD devices, participate in the exhaustive eviction scheme. v2: - Avoid gotos from within xe_validation_guard(). (Matt Brost) - Adapt to signature change of xe_validation_guard(). (Matt Brost) - Remove an unneeded (void)ret. (Matt Brost) - Fix up an error path. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-9-thomas.hellstrom@linux.intel.com	2025-09-10 09:16:03 +02:00
Thomas Hellström	7bcb6e38c1	drm/xe/display: Convert __xe_pin_fb_vma() Convert __xe_pin_fb_vma() for exhaustive eviction using xe_validation_guard(). v2: - Avoid gotos from within xe_validation_guard(). (Matt Brost) - Adapt to signature change of xe_validation_guard(). (Matt Brost) - Use interruptible waiting, since xe_bo_migrate() already does that. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-8-thomas.hellstrom@linux.intel.com	2025-09-10 09:16:02 +02:00
Thomas Hellström	c2ae94cf8c	drm/xe: Convert the CPU fault handler for exhaustive eviction The CPU fault handler may populate bos and migrate, and in doing so might interfere with other tasks validating. Rework the CPU fault handler completely into a fastpath and a slowpath. The fastpath trylocks only the validation lock in read-mode. If that fails, there's a fallback to the slowpath, where we do a full validation transaction. This mandates open-coding of bo locking, bo idling and bo populating, but we still call into TTM for fault finalizing. v2: - Rework the CPU fault handler to actually take part in the exhaustive eviction scheme (Matthew Brost). v3: - Don't return anything but VM_FAULT_RETRY if we've dropped the mmap_lock. Not even if a signal is pending. - Rebase on gpu_madvise() and split out fault migration. - Wait for idle after migration. - Check whether the resource manager uses tts to determine whether to map the tt or iomem. - Add a number of asserts. - Allow passing a ttm_operation_ctx to xe_bo_migrate() so that it's possible to try non-blocking migration. - Don't fall through to TTM on migration / population error Instead remove the gfp_retry_mayfail in mode 2 where we must succeed. (Matthew Brost) v5: - Don't allow faulting in the imported bo case (Matthew Brost) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthews Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-7-thomas.hellstrom@linux.intel.com	2025-09-10 09:16:01 +02:00
Thomas Hellström	8f25e5abcb	drm/xe: Convert existing drm_exec transactions for exhaustive eviction Convert existing drm_exec transactions, like GT pagefault validation, non-LR exec() IOCTL and the rebind worker to support exhaustive eviction using the xe_validation_guard(). v2: - Adapt to signature change in xe_validation_guard() (Matt Brost) - Avoid gotos from within xe_validation_guard() (Matt Brost) - Check error return from xe_validation_guard() v3: - Rebase on gpu_madvise() Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1 Link: https://lore.kernel.org/r/20250908101246.65025-6-thomas.hellstrom@linux.intel.com	2025-09-10 09:16:00 +02:00
Thomas Hellström	1710cd5c8c	drm/xe: Convert SVM validation for exhaustive eviction Convert SVM validation to support exhaustive eviction, using xe_validation_guard(). v2: - Wrap also xe_vm_range_rebind (Matt Brost) - Adapt to argument changes of xe_validation_guard(). v5: - Rebase on SVM stats. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-5-thomas.hellstrom@linux.intel.com	2025-09-10 09:15:59 +02:00
Thomas Hellström	a2f2453c2c	drm/xe: Convert xe_bo_create_user() for exhaustive eviction Use the xe_validation_guard() to convert xe_bo_create_user() for exhaustive eviction. v2: - Adapt to argument changes of xe_validation_guard() Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1 Link: https://lore.kernel.org/r/20250908101246.65025-4-thomas.hellstrom@linux.intel.com	2025-09-10 09:15:57 +02:00
Thomas Hellström	c460bc2311	drm/xe: Introduce an xe_validation wrapper around drm_exec Introduce a validation wrapper xe_validation_guard() as a helper intended to be used around drm_exec transactions what perform validations. Once TTM can handle exhaustive eviction we could remove this wrapper or make it mostly a NO-OP unless other functionality is added to it. Currently the wrapper takes a read lock upon entry and if the transaction hits an OOM, all locks are released and the transaction is retried with a write-lock. If all other validations participate in this scheme, the transaction with the write lock will be the only transaction validating and should have access to all available non-pinned memory. There is currently a problem in that TTM converts -EDEADLOCKS to -ENOMEM, and with ww_mutex slowpath error injections, we can hit -ENOMEMs without having actually ran out of memory. We abuse ww_mutex internals to detect such situations until TTM is fixes to not convert the error code. In the meantime, injecting ww_mutex slowpath -EDEADLOCKs is a good way to test the implementation in the absence of real OOMs. Just introduce the wrapper in this commit. It will be hooked up to the driver in following commits. v2: - Mark class_xe_validation conditional so that the loop is skipped on initialization error. - Argument sanitation (Matt Brost) - Fix conditional execution of xe_validation_ctx_fini() (Matt Brost) - Add a no_block mode for upcoming use in the CPU fault handler. v4: - Update kerneldoc. (Xe CI). Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-3-thomas.hellstrom@linux.intel.com	2025-09-10 09:15:56 +02:00
Thomas Hellström	0131514f97	drm/xe: Pass down drm_exec context to validation We want all validation (potential backing store allocation) to be part of a drm_exec transaction. Therefore add a drm_exec pointer argument to xe_bo_validate() and ___xe_bo_create_locked(). Upcoming patches will deal with making all (or nearly all) calls to these functions part of a drm_exec transaction. In the meantime, define special values of the drm_exec pointer: XE_VALIDATION_UNIMPLEMENTED: Implementation of the drm_exec transaction has not been done yet. XE_VALIDATION_UNSUPPORTED: Some Middle-layers (dma-buf) doesn't allow the drm_exec context to be passed down to map_attachment where validation takes place. XE_VALIDATION_OPT_OUT: May be used only for kunit tests where exhaustive eviction isn't crucial and the ROI of converting those is very small. For XE_VALIDATION_UNIMPLEMENTED and XE_VALIDATION_OPT_OUT there is also a lockdep check that a drm_exec transaction can indeed start at the location where the macro is expanded. This is to encourage developers to take this into consideration early in the code development process. v2: - Fix xe_vm_set_validation_exec() imbalance. Add an assert that hopefully catches future instances of this (Matt Brost) v3: - Extend to psmi_alloc_object Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v3 Link: https://lore.kernel.org/r/20250908101246.65025-2-thomas.hellstrom@linux.intel.com	2025-09-10 09:15:52 +02:00
Nithyanantham Paramasivam	8cc71fc3b8	wifi: cfg80211: Fix "no buffer space available" error in nl80211_get_station() for MLO Currently, nl80211_get_station() allocates a fixed buffer size using NLMSG_DEFAULT_SIZE. In multi-link scenarios - particularly when the number of links exceeds two - this buffer size is often insufficient to accommodate complete station statistics, resulting in "no buffer space available" errors. To address this, modify nl80211_get_station() to return only accumulated station statistics and exclude per link stats. Pass a new flag (link_stats) to nl80211_send_station() to control the inclusion of per link statistics. This allows retaining detailed output with per link data in dump commands, while excluding it from other commands where it is not needed. This change modifies the handling of per link stats introduced in commit `82d7f841d9` ("wifi: cfg80211: extend to embed link level statistics in NL message") to enable them only for nl80211_dump_station(). Apply the same fix to cfg80211_del_sta_sinfo() by skipping per link stats to avoid buffer issues. cfg80211_new_sta() doesn't include stats and is therefore not impacted. Fixes: `82d7f841d9` ("wifi: cfg80211: extend to embed link level statistics in NL message") Signed-off-by: Nithyanantham Paramasivam <nithyanantham.paramasivam@oss.qualcomm.com> Link: https://patch.msgid.link/20250905124800.1448493-1-nithyanantham.paramasivam@oss.qualcomm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-09-10 09:11:35 +02:00
Johannes Berg	a814c36cc6	Merge tag 'iwlwifi-fixes-2025-09-09' of https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next Miri Korenblit says: ==================== iwlwifi fix ==================== Which is a fix for (old) 130/1030 devices to work again. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-09-10 09:10:11 +02:00
Johannes Berg	bda605962c	Merge tag 'ath-current-20250909' of git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath Jeff Johnson says: ================== ath.git update for v6.17-rc6 ================== There's a firmware API alignment fix, and a fix for powersave, both for ath12k. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-09-10 09:08:44 +02:00
Yuezhang Mo	181993bb0d	erofs: fix runtime warning on truncate_folio_batch_exceptionals() Commit 0e2f80afcfa6("fs/dax: ensure all pages are idle prior to filesystem unmount") introduced the WARN_ON_ONCE to capture whether the filesystem has removed all DAX entries or not and applied the fix to xfs and ext4. Apply the missed fix on erofs to fix the runtime warning: [ 5.266254] ------------[ cut here ]------------ [ 5.266274] WARNING: CPU: 6 PID: 3109 at mm/truncate.c:89 truncate_folio_batch_exceptionals+0xff/0x260 [ 5.266294] Modules linked in: [ 5.266999] CPU: 6 UID: 0 PID: 3109 Comm: umount Tainted: G S 6.16.0+ #6 PREEMPT(voluntary) [ 5.267012] Tainted: [S]=CPU_OUT_OF_SPEC [ 5.267017] Hardware name: Dell Inc. OptiPlex 5000/05WXFV, BIOS 1.5.1 08/24/2022 [ 5.267024] RIP: 0010:truncate_folio_batch_exceptionals+0xff/0x260 [ 5.267076] Code: 00 00 41 39 df 7f 11 eb 78 83 c3 01 49 83 c4 08 41 39 df 74 6c 48 63 f3 48 83 fe 1f 0f 83 3c 01 00 00 43 f6 44 26 08 01 74 df <0f> 0b 4a 8b 34 22 4c 89 ef 48 89 55 90 e8 ff 54 1f 00 48 8b 55 90 [ 5.267083] RSP: 0018:ffffc900013f36c8 EFLAGS: 00010202 [ 5.267095] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 5.267101] RDX: ffffc900013f3790 RSI: 0000000000000000 RDI: ffff8882a1407898 [ 5.267108] RBP: ffffc900013f3740 R08: 0000000000000000 R09: 0000000000000000 [ 5.267113] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 5.267119] R13: ffff8882a1407ab8 R14: ffffc900013f3888 R15: 0000000000000001 [ 5.267125] FS: 00007aaa8b437800(0000) GS:ffff88850025b000(0000) knlGS:0000000000000000 [ 5.267132] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5.267138] CR2: 00007aaa8b3aac10 CR3: 000000024f764000 CR4: 0000000000f52ef0 [ 5.267144] PKRU: 55555554 [ 5.267150] Call Trace: [ 5.267154] <TASK> [ 5.267181] truncate_inode_pages_range+0x118/0x5e0 [ 5.267193] ? save_trace+0x54/0x390 [ 5.267296] truncate_inode_pages_final+0x43/0x60 [ 5.267309] evict+0x2a4/0x2c0 [ 5.267339] dispose_list+0x39/0x80 [ 5.267352] evict_inodes+0x150/0x1b0 [ 5.267376] generic_shutdown_super+0x41/0x180 [ 5.267390] kill_block_super+0x1b/0x50 [ 5.267402] erofs_kill_sb+0x81/0x90 [erofs] [ 5.267436] deactivate_locked_super+0x32/0xb0 [ 5.267450] deactivate_super+0x46/0x60 [ 5.267460] cleanup_mnt+0xc3/0x170 [ 5.267475] __cleanup_mnt+0x12/0x20 [ 5.267485] task_work_run+0x5d/0xb0 [ 5.267499] exit_to_user_mode_loop+0x144/0x170 [ 5.267512] do_syscall_64+0x2b9/0x7c0 [ 5.267523] ? __lock_acquire+0x665/0x2ce0 [ 5.267535] ? __lock_acquire+0x665/0x2ce0 [ 5.267560] ? lock_acquire+0xcd/0x300 [ 5.267573] ? find_held_lock+0x31/0x90 [ 5.267582] ? mntput_no_expire+0x97/0x4e0 [ 5.267606] ? mntput_no_expire+0xa1/0x4e0 [ 5.267625] ? mntput+0x24/0x50 [ 5.267634] ? path_put+0x1e/0x30 [ 5.267647] ? do_faccessat+0x120/0x2f0 [ 5.267677] ? do_syscall_64+0x1a2/0x7c0 [ 5.267686] ? from_kgid_munged+0x17/0x30 [ 5.267703] ? from_kuid_munged+0x13/0x30 [ 5.267711] ? __do_sys_getuid+0x3d/0x50 [ 5.267724] ? do_syscall_64+0x1a2/0x7c0 [ 5.267732] ? irqentry_exit+0x77/0xb0 [ 5.267743] ? clear_bhb_loop+0x30/0x80 [ 5.267752] ? clear_bhb_loop+0x30/0x80 [ 5.267765] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 5.267772] RIP: 0033:0x7aaa8b32a9fb [ 5.267781] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 83 0d 00 f7 d8 [ 5.267787] RSP: 002b:00007ffd7c4c9468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 5.267796] RAX: 0000000000000000 RBX: 00005a61592a8b00 RCX: 00007aaa8b32a9fb [ 5.267802] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00005a61592b2080 [ 5.267806] RBP: 00007ffd7c4c9540 R08: 00007aaa8b403b20 R09: 0000000000000020 [ 5.267812] R10: 0000000000000001 R11: 0000000000000246 R12: 00005a61592a8c00 [ 5.267817] R13: 0000000000000000 R14: 00005a61592b2080 R15: 00005a61592a8f10 [ 5.267849] </TASK> [ 5.267854] irq event stamp: 4721 [ 5.267859] hardirqs last enabled at (4727): [<ffffffff814abf50>] __up_console_sem+0x90/0xa0 [ 5.267873] hardirqs last disabled at (4732): [<ffffffff814abf35>] __up_console_sem+0x75/0xa0 [ 5.267884] softirqs last enabled at (3044): [<ffffffff8132adb3>] kernel_fpu_end+0x53/0x70 [ 5.267895] softirqs last disabled at (3042): [<ffffffff8132b5f4>] kernel_fpu_begin_mask+0xc4/0x120 [ 5.267905] ---[ end trace 0000000000000000 ]--- Fixes: `bde708f1a6` ("fs/dax: always remove DAX page-cache entries when breaking layouts") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Friendy Su <friendy.su@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2025-09-10 14:11:06 +08:00
Jakub Kicinski	78dd8ad62c	Merge branch 'mptcp-misc-fixes-for-v6-17-rc6' Matthieu Baerts says: ==================== mptcp: misc fixes for v6.17-rc6 Here are various unrelated fixes: - Patch 1: Fix a wrong attribute type in the MPTCP Netlink specs. A fix for v6.7. - Patch 2: Avoid mentioning a deprecated MPTCP sysctl knob in the doc. A fix for v6.15. - Patch 3: Handle new warnings from ShellCheck v0.11.0. This prevents some warnings reported by some CIs. If it is not a good material for 'net', please drop. ==================== Link: https://patch.msgid.link/20250908-net-mptcp-misc-fixes-6-17-rc5-v1-0-5f2168a66079@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-09 18:39:55 -07:00
Matthieu Baerts (NGI0)	ef1bd93b3b	selftests: mptcp: shellcheck: support v0.11.0 This v0.11.0 version introduces SC2329: Warn when (non-escaping) functions are never invoked. Except that, similar to SC2317, ShellCheck is currently unable to figure out functions that are invoked via trap, or indirectly, when calling functions via variables. It is then needed to disable this new SC2329. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250908-net-mptcp-misc-fixes-6-17-rc5-v1-3-5f2168a66079@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-09 18:39:52 -07:00
Matthieu Baerts (NGI0)	6f021e95d0	doc: mptcp: net.mptcp.pm_type is deprecated The net.mptcp.pm_type sysctl knob has been deprecated in v6.15, net.mptcp.path_manager should be used instead. Adapt the section about path managers to suggest using the new sysctl knob instead of the deprecated one. Fixes: `595c26d122` ("mptcp: sysctl: set path manager by name") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250908-net-mptcp-misc-fixes-6-17-rc5-v1-2-5f2168a66079@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-09 18:39:52 -07:00
Matthieu Baerts (NGI0)	7094b84863	netlink: specs: mptcp: fix if-idx attribute type This attribute is used as a signed number in the code in pm_netlink.c: nla_put_s32(skb, MPTCP_ATTR_IF_IDX, ssk->sk_bound_dev_if)) The specs should then reflect that. Note that other 'if-idx' attributes from the same .yaml file use a signed number as well. Fixes: `bc8aeb2045` ("Documentation: netlink: add a YAML spec for mptcp") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250908-net-mptcp-misc-fixes-6-17-rc5-v1-1-5f2168a66079@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-09 18:39:52 -07:00
Krister Johansen	648de37416	mptcp: sockopt: make sync_socket_options propagate SOCK_KEEPOPEN Users reported a scenario where MPTCP connections that were configured with SO_KEEPALIVE prior to connect would fail to enable their keepalives if MTPCP fell back to TCP mode. After investigating, this affects keepalives for any connection where sync_socket_options is called on a socket that is in the closed or listening state. Joins are handled properly. For connects, sync_socket_options is called when the socket is still in the closed state. The tcp_set_keepalive() function does not act on sockets that are closed or listening, hence keepalive is not immediately enabled. Since the SO_KEEPOPEN flag is absent, it is not enabled later in the connect sequence via tcp_finish_connect. Setting the keepalive via sockopt after connect does work, but would not address any subsequently created flows. Fortunately, the fix here is straight-forward: set SOCK_KEEPOPEN on the subflow when calling sync_socket_options. The fix was valdidated both by using tcpdump to observe keepalive packets not being sent before the fix, and being sent after the fix. It was also possible to observe via ss that the keepalive timer was not enabled on these sockets before the fix, but was enabled afterwards. Fixes: `1b3e7ede13` ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY") Cc: stable@vger.kernel.org Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Reviewed-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/aL8dYfPZrwedCIh9@templeofstupid.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-09 18:38:06 -07:00
Stanislav Fomichev	0f82c3ba66	macsec: sync features on RTM_NEWLINK Syzkaller managed to lock the lower device via ETHTOOL_SFEATURES: netdev_lock include/linux/netdevice.h:2761 [inline] netdev_lock_ops include/net/netdev_lock.h:42 [inline] netdev_sync_lower_features net/core/dev.c:10649 [inline] __netdev_update_features+0xcb1/0x1be0 net/core/dev.c:10819 netdev_update_features+0x6d/0xe0 net/core/dev.c:10876 macsec_notify+0x2f5/0x660 drivers/net/macsec.c:4533 notifier_call_chain+0x1b3/0x3e0 kernel/notifier.c:85 call_netdevice_notifiers_extack net/core/dev.c:2267 [inline] call_netdevice_notifiers net/core/dev.c:2281 [inline] netdev_features_change+0x85/0xc0 net/core/dev.c:1570 __dev_ethtool net/ethtool/ioctl.c:3469 [inline] dev_ethtool+0x1536/0x19b0 net/ethtool/ioctl.c:3502 dev_ioctl+0x392/0x1150 net/core/dev_ioctl.c:759 It happens because lower features are out of sync with the upper: __dev_ethtool (real_dev) netdev_lock_ops(real_dev) ETHTOOL_SFEATURES __netdev_features_change netdev_sync_upper_features disable LRO on the lower if (old_features != dev->features) netdev_features_change fires NETDEV_FEAT_CHANGE macsec_notify NETDEV_FEAT_CHANGE netdev_update_features (for each macsec dev) netdev_sync_lower_features if (upper_features != lower_features) netdev_lock_ops(lower) # lower == real_dev stuck ... netdev_unlock_ops(real_dev) Per commit `af5f54b0ef` ("net: Lock lower level devices when updating features"), we elide the lock/unlock when the upper and lower features are synced. Makes sure the lower (real_dev) has proper features after the macsec link has been created. This makes sure we never hit the situation where we need to sync upper flags to the lower. Reported-by: syzbot+7e0f89fb6cae5d002de0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7e0f89fb6cae5d002de0 Fixes: `7e4d784f58` ("net: hold netdev instance lock during rtnetlink operations") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20250908173614.3358264-1-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-09 18:28:27 -07:00
Carolina Jubran	686cab5a18	net: dev_ioctl: take ops lock in hwtstamp lower paths ndo hwtstamp callbacks are expected to run under the per-device ops lock. Make the lower get/set paths consistent with the rest of ndo invocations. Kernel log: WARNING: CPU: 13 PID: 51364 at ./include/net/netdev_lock.h:70 __netdev_update_features+0x4bd/0xe60 ... RIP: 0010:__netdev_update_features+0x4bd/0xe60 ... Call Trace: <TASK> netdev_update_features+0x1f/0x60 mlx5_hwtstamp_set+0x181/0x290 [mlx5_core] mlx5e_hwtstamp_set+0x19/0x30 [mlx5_core] dev_set_hwtstamp_phylib+0x9f/0x220 dev_set_hwtstamp_phylib+0x9f/0x220 dev_set_hwtstamp+0x13d/0x240 dev_ioctl+0x12f/0x4b0 sock_ioctl+0x171/0x370 __x64_sys_ioctl+0x3f7/0x900 ? __sys_setsockopt+0x69/0xb0 do_syscall_64+0x6f/0x2e0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 ... </TASK> .... ---[ end trace 0000000000000000 ]--- Note that the mlx5_hwtstamp_set and mlx5e_hwtstamp_set functions shown in the trace come from an in progress patch converting the legacy ioctl to ndo_hwtstamp_get/set and are not present in mainline. Fixes: `ffb7ed19ac` ("net: hold netdev instance lock during ioctl operations") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://patch.msgid.link/20250907080821.2353388-1-cjubran@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-09 18:13:36 -07:00
Paulo Alcantara	c5ea306558	smb: client: fix data loss due to broken rename(2) Rename of open files in SMB2+ has been broken for a very long time, resulting in data loss as the CIFS client would fail the rename(2) call with -ENOENT and then removing the target file. Fix this by implementing ->rename_pending_delete() for SMB2+, which will rename busy files to random filenames (e.g. silly rename) during unlink(2) or rename(2), and then marking them to delete-on-close. Besides, introduce a FIND_WR_NO_PENDING_DELETE flag to prevent open(2) from reusing open handles that had been marked as delete pending. Handle it in cifs_get_readable_path() as well. Reported-by: Jean-Baptiste Denis <jbdenis@pasteur.fr> Closes: https://marc.info/?i=16aeb380-30d4-4551-9134-4e7d1dc833c0@pasteur.fr Reviewed-by: David Howells <dhowells@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Cc: Frank Sorenson <sorenson@redhat.com> Cc: Olga Kornievskaia <okorniev@redhat.com> Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Scott Mayhew <smayhew@redhat.com> Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2025-09-09 18:39:58 -05:00

... 6 7 8 9 10 ...

1384752 Commits