linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-19 00:01:01 -04:00

Author	SHA1	Message	Date
Greg Jumper	ebf71dd4af	net/rds: Restrict use of RDS/IB to the initial network namespace Prevent using RDS/IB in network namespaces other than the initial one. The existing RDS/IB code will not work properly in non-initial network namespaces. Fixes: `d5a8ac28a7` ("RDS-TCP: Make RDS-TCP work correctly when it is set up in a netns other than init_net") Reported-by: syzbot+da8e060735ae02c8f3d1@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=da8e060735ae02c8f3d1 Signed-off-by: Greg Jumper <greg.jumper@oracle.com> Signed-off-by: Allison Henderson <achender@kernel.org> Link: https://patch.msgid.link/20260408080420.540032-3-achender@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 13:33:19 -07:00
Håkon Bugge	236f718ac8	net/rds: Optimize rds_ib_laddr_check rds_ib_laddr_check() creates a CM_ID and attempts to bind the address in question to it. This in order to qualify the allegedly local address as a usable IB/RoCE address. In the field, ExaWatcher runs rds-ping to all ports in the fabric from all local ports. This using all active ToS'es. In a full rack system, we have 14 cell servers and eight db servers. Typically, 6 ToS'es are used. This implies 528 rds-ping invocations per ExaWatcher's "RDSinfo" interval. Adding to this, each rds-ping invocation creates eight sockets and binds the local address to them: socket(AF_RDS, SOCK_SEQPACKET, 0) = 3 bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.36.2")}, 16) = 0 socket(AF_RDS, SOCK_SEQPACKET, 0) = 4 bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.36.2")}, 16) = 0 socket(AF_RDS, SOCK_SEQPACKET, 0) = 5 bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.36.2")}, 16) = 0 socket(AF_RDS, SOCK_SEQPACKET, 0) = 6 bind(6, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.36.2")}, 16) = 0 socket(AF_RDS, SOCK_SEQPACKET, 0) = 7 bind(7, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.36.2")}, 16) = 0 socket(AF_RDS, SOCK_SEQPACKET, 0) = 8 bind(8, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.36.2")}, 16) = 0 socket(AF_RDS, SOCK_SEQPACKET, 0) = 9 bind(9, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.36.2")}, 16) = 0 socket(AF_RDS, SOCK_SEQPACKET, 0) = 10 bind(10, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.36.2")}, 16) = 0 So, at every interval ExaWatcher executes rds-ping's, 4224 CM_IDs are allocated, considering this full-rack system. After the a CM_ID has been allocated, rdma_bind_addr() is called, with the port number being zero. This implies that the CMA will attempt to search for an un-used ephemeral port. Simplified, the algorithm is to start at a random position in the available port space, and then if needed, iterate until an un-used port is found. The book-keeping of used ports uses the idr system, which again uses slab to allocate new struct idr_layer's. The size is 2092 bytes and slab tries to reduce the wasted space. Hence, it chooses an order:3 allocation, for which 15 idr_layer structs will fit and only 1388 bytes are wasted per the 32KiB order:3 chunk. Although this order:3 allocation seems like a good space/speed trade-off, it does not resonate well with how it used by the CMA. The combination of the randomized starting point in the port space (which has close to zero spatial locality) and the close proximity in time of the 4224 invocations of the rds-ping's, creates a memory hog for order:3 allocations. These costly allocations may need reclaims and/or compaction. At worst, they may fail and produce a stack trace such as (from uek4): [<ffffffff811a72d5>] __inc_zone_page_state+0x35/0x40 [<ffffffff811c2e97>] page_add_file_rmap+0x57/0x60 [<ffffffffa37ca1df>] remove_migration_pte+0x3f/0x3c0 [ksplice_6cn872bt_vmlinux_new] [<ffffffff811c3de8>] rmap_walk+0xd8/0x340 [<ffffffff811e8860>] remove_migration_ptes+0x40/0x50 [<ffffffff811ea83c>] migrate_pages+0x3ec/0x890 [<ffffffff811afa0d>] compact_zone+0x32d/0x9a0 [<ffffffff811b00ed>] compact_zone_order+0x6d/0x90 [<ffffffff811b03b2>] try_to_compact_pages+0x102/0x270 [<ffffffff81190e56>] __alloc_pages_direct_compact+0x46/0x100 [<ffffffff8119165b>] __alloc_pages_nodemask+0x74b/0xaa0 [<ffffffff811d8411>] alloc_pages_current+0x91/0x110 [<ffffffff811e3b0b>] new_slab+0x38b/0x480 [<ffffffffa41323c7>] __slab_alloc+0x3b7/0x4a0 [ksplice_s0dk66a8_vmlinux_new] [<ffffffff811e42ab>] kmem_cache_alloc+0x1fb/0x250 [<ffffffff8131fdd6>] idr_layer_alloc+0x36/0x90 [<ffffffff8132029c>] idr_get_empty_slot+0x28c/0x3d0 [<ffffffff813204ad>] idr_alloc+0x4d/0xf0 [<ffffffffa051727d>] cma_alloc_port+0x4d/0xa0 [rdma_cm] [<ffffffffa0517cbe>] rdma_bind_addr+0x2ae/0x5b0 [rdma_cm] [<ffffffffa09d8083>] rds_ib_laddr_check+0x83/0x2c0 [ksplice_6l2xst5i_rds_rdma_new] [<ffffffffa05f892b>] rds_trans_get_preferred+0x5b/0xa0 [rds] [<ffffffffa05f09f2>] rds_bind+0x212/0x280 [rds] [<ffffffff815b4016>] SYSC_bind+0xe6/0x120 [<ffffffff815b4d3e>] SyS_bind+0xe/0x10 [<ffffffff816b031a>] system_call_fastpath+0x18/0xd4 To avoid these excessive calls to rdma_bind_addr(), we optimize rds_ib_laddr_check() by simply checking if the address in question has been used before. The rds_rdma module keeps track of addresses associated with IB devices, and the function rds_ib_get_device() is used to determine if the address already has been qualified as a valid local address. If not found, we call the legacy rds_ib_laddr_check(), now renamed to rds_ib_laddr_check_cm(). Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> Signed-off-by: Allison Henderson <achender@kernel.org> Link: https://patch.msgid.link/20260408080420.540032-2-achender@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 13:33:19 -07:00
Mashiro Chen	2835750dd6	net: rose: reject truncated CLEAR_REQUEST frames in state machines All five ROSE state machines (states 1-5) handle ROSE_CLEAR_REQUEST by reading the cause and diagnostic bytes directly from skb->data[3] and skb->data[4] without verifying that the frame is long enough: rose_disconnect(sk, ..., skb->data[3], skb->data[4]); The entry-point check in rose_route_frame() only enforces ROSE_MIN_LEN (3 bytes), so a remote peer on a ROSE network can send a syntactically valid but truncated CLEAR_REQUEST (3 or 4 bytes) while a connection is open in any state. Processing such a frame causes a one- or two-byte out-of-bounds read past the skb data, leaking uninitialized heap content as the cause/diagnostic values returned to user space via getsockopt(ROSE_GETCAUSE). Add a single length check at the rose_process_rx_frame() dispatch point, before any state machine is entered, to drop frames that carry the CLEAR_REQUEST type code but are too short to contain the required cause and diagnostic fields. Signed-off-by: Mashiro Chen <mashiro.chen@mailbox.org> Link: https://patch.msgid.link/20260408172551.281486-1-mashiro.chen@mailbox.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 13:09:45 -07:00
Jiayuan Chen	10f86a2a5c	bpf: Fix same-register dst/src OOB read and pointer leak in sock_ops When a BPF sock_ops program accesses ctx fields with dst_reg == src_reg, the SOCK_OPS_GET_SK() and SOCK_OPS_GET_FIELD() macros fail to zero the destination register in the !fullsock / !locked_tcp_sock path. Both macros borrow a temporary register to check is_fullsock / is_locked_tcp_sock when dst_reg == src_reg, because dst_reg holds the ctx pointer. When the check is false (e.g., TCP_NEW_SYN_RECV state with a request_sock), dst_reg should be zeroed but is not, leaving the stale ctx pointer: - SOCK_OPS_GET_SK: dst_reg retains the ctx pointer, passes NULL checks as PTR_TO_SOCKET_OR_NULL, and can be used as a bogus socket pointer, leading to stack-out-of-bounds access in helpers like bpf_skc_to_tcp6_sock(). - SOCK_OPS_GET_FIELD: dst_reg retains the ctx pointer which the verifier believes is a SCALAR_VALUE, leaking a kernel pointer. Fix both macros by: - Changing JMP_A(1) to JMP_A(2) in the fullsock path to skip the added instruction. - Adding BPF_MOV64_IMM(si->dst_reg, 0) after the temp register restore in the !fullsock path, placed after the restore because dst_reg == src_reg means we need src_reg intact to read ctx->temp. Fixes: `fd09af0107` ("bpf: sock_ops ctx access may stomp registers in corner case") Fixes: `84f44df664` ("bpf: sock_ops sk access may stomp registers when dst_reg = src_reg") Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn> Reported-by: Yinhao Hu <dddddd@hust.edu.cn> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn> Reported-by: Dongliang Mu <dzm91@hust.edu.cn> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Closes: https://lore.kernel.org/bpf/6fe1243e-149b-4d3b-99c7-fcc9e2f75787@std.uestc.edu.cn/T/#u Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260407022720.162151-2-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 12:28:05 -07:00
Greg Kroah-Hartman	46ce8be2ce	NFC: digital: Bounds check NFC-A cascade depth in SDD response handler The NFC-A anti-collision cascade in digital_in_recv_sdd_res() appends 3 or 4 bytes to target->nfcid1 on each round, but the number of cascade rounds is controlled entirely by the peer device. The peer sets the cascade tag in the SDD_RES (deciding 3 vs 4 bytes) and the cascade-incomplete bit in the SEL_RES (deciding whether another round follows). ISO 14443-3 limits NFC-A to three cascade levels and target->nfcid1 is sized accordingly (NFC_NFCID1_MAXSIZE = 10), but nothing in the driver actually enforces this. This means a malicious peer can keep the cascade running, writing past the heap-allocated nfc_target with each round. Fix this by rejecting the response when the accumulated UID would exceed the buffer. Commit `e329e71013` ("NFC: nci: Bounds check struct nfc_target arrays") fixed similar missing checks against the same field on the NCI path. Cc: Simon Horman <horms@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: Thierry Escande <thierry.escande@linux.intel.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Fixes: `2c66daecc4` ("NFC Digital: Add NFC-A technology support") Cc: stable <stable@kernel.org> Assisted-by: gregkh_clanker_t1000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/2026040913-figure-seducing-bd3f@gregkh Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:40:45 -07:00
Junxi Qian	2b5dd46329	nfc: llcp: add missing return after LLCP_CLOSED checks In nfc_llcp_recv_hdlc() and nfc_llcp_recv_disc(), when the socket state is LLCP_CLOSED, the code correctly calls release_sock() and nfc_llcp_sock_put() but fails to return. Execution falls through to the remainder of the function, which calls release_sock() and nfc_llcp_sock_put() again. This results in a double release_sock() and a refcount underflow via double nfc_llcp_sock_put(), leading to a use-after-free. Add the missing return statements after the LLCP_CLOSED branches in both functions to prevent the fall-through. Fixes: `d646960f79` ("NFC: Initial LLCP support") Signed-off-by: Junxi Qian <qjx1298677004@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260408081006.3723-1-qjx1298677004@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:12:44 -07:00
Jamal Hadi Salim	f462dca0c8	net/sched: act_ct: Only release RCU read lock after ct_ft When looking up a flow table in act_ct in tcf_ct_flow_table_get(), rhashtable_lookup_fast() internally opens and closes an RCU read critical section before returning ct_ft. The tcf_ct_flow_table_cleanup_work() can complete before refcount_inc_not_zero() is invoked on the returned ct_ft resulting in a UAF on the already freed ct_ft object. This vulnerability can lead to privilege escalation. Analysis from zdi-disclosures@trendmicro.com: When initializing act_ct, tcf_ct_init() is called, which internally triggers tcf_ct_flow_table_get(). static int tcf_ct_flow_table_get(struct net net, struct tcf_ct_params params) { struct zones_ht_key key = { .net = net, .zone = params->zone }; struct tcf_ct_flow_table ct_ft; int err = -ENOMEM; mutex_lock(&zones_mutex); ct_ft = rhashtable_lookup_fast(&zones_ht, &key, zones_params); // [1] if (ct_ft && refcount_inc_not_zero(&ct_ft->ref)) // [2] goto out_unlock; ... } static __always_inline void rhashtable_lookup_fast( struct rhashtable ht, const void key, const struct rhashtable_params params) { void obj; rcu_read_lock(); obj = rhashtable_lookup(ht, key, params); rcu_read_unlock(); return obj; } At [1], rhashtable_lookup_fast() looks up and returns the corresponding ct_ft from zones_ht . The lookup is performed within an RCU read critical section through rcu_read_lock() / rcu_read_unlock(), which prevents the object from being freed. However, at the point of function return, rcu_read_unlock() has already been called, and there is nothing preventing ct_ft from being freed before reaching refcount_inc_not_zero(&ct_ft->ref) at [2]. This interval becomes the race window, during which ct_ft can be freed. Free Process: tcf_ct_flow_table_put() is executed through the path tcf_ct_cleanup() call_rcu() tcf_ct_params_free_rcu() tcf_ct_params_free() tcf_ct_flow_table_put(). static void tcf_ct_flow_table_put(struct tcf_ct_flow_table ct_ft) { if (refcount_dec_and_test(&ct_ft->ref)) { rhashtable_remove_fast(&zones_ht, &ct_ft->node, zones_params); INIT_RCU_WORK(&ct_ft->rwork, tcf_ct_flow_table_cleanup_work); // [3] queue_rcu_work(act_ct_wq, &ct_ft->rwork); } } At [3], tcf_ct_flow_table_cleanup_work() is scheduled as RCU work static void tcf_ct_flow_table_cleanup_work(struct work_struct work) { struct tcf_ct_flow_table ct_ft; struct flow_block *block; ct_ft = container_of(to_rcu_work(work), struct tcf_ct_flow_table, rwork); nf_flow_table_free(&ct_ft->nf_ft); block = &ct_ft->nf_ft.flow_block; down_write(&ct_ft->nf_ft.flow_block_lock); WARN_ON(!list_empty(&block->cb_list)); up_write(&ct_ft->nf_ft.flow_block_lock); kfree(ct_ft); // [4] module_put(THIS_MODULE); } tcf_ct_flow_table_cleanup_work() frees ct_ft at [4]. When this function executes between [1] and [2], UAF occurs. This race condition has a very short race window, making it generally difficult to trigger. Therefore, to trigger the vulnerability an msleep(100) was inserted after[1] Fixes: `138470a9b2` ("net/sched: act_ct: fix lockdep splat in tcf_ct_flow_table_get") Reported-by: zdi-disclosures@trendmicro.com Tested-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260410111627.46611-1-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 09:26:15 -07:00
Jakub Kicinski	372169fd95	Merge tag 'linux-can-fixes-for-7.0-20260409' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2026-04-09 Johan Hovold's patch fixes the a devres lifetime in the ucan driver. The last patch is by Samuel Page and fixes a use-after-free in raw_rcv() in the CAN_RAW protocol. * tag 'linux-can-fixes-for-7.0-20260409' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: raw: fix ro->uniq use-after-free in raw_rcv() can: ucan: fix devres lifetime ==================== Link: https://patch.msgid.link/20260409165942.588421-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 09:24:16 -07:00
Davide Caratti	65782b2db7	net/sched: cls_fw: fix NULL dereference of "old" filters before change() Like pointed out by Sashiko [1], since commit `ed76f5edcc` ("net: sched: protect filter_chain list with filter_chain_lock mutex") TC filters are added to a shared block and published to datapath before their ->change() function is called. This is a problem for cls_fw: an invalid filter created with the "old" method can still classify some packets before it is destroyed by the validation logic added by Xiang. Therefore, insisting with repeated runs of the following script: # ip link add dev crash0 type dummy # ip link set dev crash0 up # mausezahn crash0 -c 100000 -P 10 \ > -A 4.3.2.1 -B 1.2.3.4 -t udp "dp=1234" -q & # sleep 1 # tc qdisc add dev crash0 egress_block 1 clsact # tc filter add block 1 protocol ip prio 1 matchall \ > action skbedit mark 65536 continue # tc filter add block 1 protocol ip prio 2 fw # ip link del dev crash0 can still make fw_classify() hit the WARN_ON() in [2]: WARNING: ./include/net/pkt_cls.h:88 at fw_classify+0x244/0x250 [cls_fw], CPU#18: mausezahn/1399 Modules linked in: cls_fw(E) act_skbedit(E) CPU: 18 UID: 0 PID: 1399 Comm: mausezahn Tainted: G E 7.0.0-rc6-virtme #17 PREEMPT(full) Tainted: [E]=UNSIGNED_MODULE Hardware name: Red Hat KVM, BIOS 1.16.3-2.el9 04/01/2014 RIP: 0010:fw_classify+0x244/0x250 [cls_fw] Code: 5c 49 c7 45 00 00 00 00 00 41 5d 41 5e 41 5f 5d c3 cc cc cc cc 5b b8 ff ff ff ff 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc 90 <0f> 0b 90 eb a0 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 RSP: 0018:ffffd1b7026bf8a8 EFLAGS: 00010202 RAX: ffff8c5ac9c60800 RBX: ffff8c5ac99322c0 RCX: 0000000000000004 RDX: 0000000000000001 RSI: ffff8c5b74d7a000 RDI: ffff8c5ac8284f40 RBP: ffffd1b7026bf8d0 R08: 0000000000000000 R09: ffffd1b7026bf9b0 R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000010000 R13: ffffd1b7026bf930 R14: ffff8c5ac8284f40 R15: 0000000000000000 FS: 00007fca40c37740(0000) GS:ffff8c5b74d7a000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fca40e822a0 CR3: 0000000005ca0001 CR4: 0000000000172ef0 Call Trace: <TASK> tcf_classify+0x17d/0x5c0 tc_run+0x9d/0x150 __dev_queue_xmit+0x2ab/0x14d0 ip_finish_output2+0x340/0x8f0 ip_output+0xa4/0x250 raw_sendmsg+0x147d/0x14b0 __sys_sendto+0x1cc/0x1f0 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x126/0xf80 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fca40e822ba Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007ffc248a42c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 000055ef233289d0 RCX: 00007fca40e822ba RDX: 000000000000001e RSI: 000055ef23328c30 RDI: 0000000000000003 RBP: 000055ef233289d0 R08: 00007ffc248a42d0 R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000001e R13: 00000000000186a0 R14: 0000000000000000 R15: 00007fca41043000 </TASK> irq event stamp: 1045778 hardirqs last enabled at (1045784): [<ffffffff864ec042>] __up_console_sem+0x52/0x60 hardirqs last disabled at (1045789): [<ffffffff864ec027>] __up_console_sem+0x37/0x60 softirqs last enabled at (1045426): [<ffffffff874d48c7>] __alloc_skb+0x207/0x260 softirqs last disabled at (1045434): [<ffffffff874fe8f8>] __dev_queue_xmit+0x78/0x14d0 Then, because of the value in the packet's mark, dereference on 'q->handle' with NULL 'q' occurs: BUG: kernel NULL pointer dereference, address: 0000000000000038 [...] RIP: 0010:fw_classify+0x1fe/0x250 [cls_fw] [...] Skip "old-style" classification on shared blocks, so that the NULL dereference is fixed and WARN_ON() is not hit anymore in the short lifetime of invalid cls_fw "old-style" filters. [1] https://sashiko.dev/#/patchset/20260331050217.504278-1-xmei5%40asu.edu [2] https://elixir.bootlin.com/linux/v7.0-rc6/source/include/net/pkt_cls.h#L86 Fixes: `faeea8bbf6` ("net/sched: cls_fw: fix NULL pointer dereference on shared blocks") Fixes: `ed76f5edcc` ("net: sched: protect filter_chain list with filter_chain_lock mutex") Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://patch.msgid.link/e39cbd3103a337f1e515d186fe697b4459d24757.1775661704.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 08:49:13 -07:00
Samuel Page	a535a9217c	can: raw: fix ro->uniq use-after-free in raw_rcv() raw_release() unregisters raw CAN receive filters via can_rx_unregister(), but receiver deletion is deferred with call_rcu(). This leaves a window where raw_rcv() may still be running in an RCU read-side critical section after raw_release() frees ro->uniq, leading to a use-after-free of the percpu uniq storage. Move free_percpu(ro->uniq) out of raw_release() and into a raw-specific socket destructor. can_rx_unregister() takes an extra reference to the socket and only drops it from the RCU callback, so freeing uniq from sk_destruct ensures the percpu area is not released until the relevant callbacks have drained. Fixes: `514ac99c64` ("can: fix multiple delivery of a single CAN frame for overlapping CAN filters") Cc: stable@vger.kernel.org # v4.1+ Assisted-by: Bynario AI Signed-off-by: Samuel Page <sam@bynar.io> Link: https://patch.msgid.link/26ec626d-cae7-4418-9782-7198864d070c@bynar.io Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> [mkl: applied manually] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2026-04-09 18:51:42 +02:00
Linus Torvalds	a55f7f5f29	Merge tag 'net-7.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter, IPsec and wireless. This is again considerably bigger than the old average. No known outstanding regressions. Current release - regressions: - net: increase IP_TUNNEL_RECURSION_LIMIT to 5 - eth: ice: fix PTP timestamping broken by SyncE code on E825C Current release - new code bugs: - eth: stmmac: dwmac-motorcomm: fix eFUSE MAC address read failure Previous releases - regressions: - core: fix cross-cache free of KFENCE-allocated skb head - sched: act_csum: validate nested VLAN headers - rxrpc: fix call removal to use RCU safe deletion - xfrm: - wait for RCU readers during policy netns exit - fix refcount leak in xfrm_migrate_policy_find - wifi: rt2x00usb: fix devres lifetime - mptcp: fix slab-use-after-free in __inet_lookup_established - ipvs: fix NULL deref in ip_vs_add_service error path - eth: - airoha: fix memory leak in airoha_qdma_rx_process() - lan966x: fix use-after-free and leak in lan966x_fdma_reload() Previous releases - always broken: - ipv6: ioam: fix potential NULL dereferences in __ioam6_fill_trace_data() - ipv4: nexthop: avoid duplicate NHA_HW_STATS_ENABLE on nexthop group dump - bridge: guard local VLAN-0 FDB helpers against NULL vlan group - xsk: tailroom reservation and MTU validation - rxrpc: - fix to request an ack if window is limited - fix RESPONSE authenticator parser OOB read - netfilter: nft_ct: fix use-after-free in timeout object destroy - batman-adv: hold claim backbone gateways by reference - eth: - stmmac: fix PTP ref clock for Tegra234 - idpf: fix PREEMPT_RT raw/bh spinlock nesting for async VC handling - ipa: fix GENERIC_CMD register field masks for IPA v5.0+" * tag 'net-7.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (104 commits) net: lan966x: fix use-after-free and leak in lan966x_fdma_reload() net: lan966x: fix page pool leak in error paths net: lan966x: fix page_pool error handling in lan966x_fdma_rx_alloc_page_pool() nfc: pn533: allocate rx skb before consuming bytes l2tp: Drop large packets with UDP encap net: ipa: fix event ring index not programmed for IPA v5.0+ net: ipa: fix GENERIC_CMD register field masks for IPA v5.0+ MAINTAINERS: Add Prashanth as additional maintainer for amd-xgbe driver devlink: Fix incorrect skb socket family dumping af_unix: read UNIX_DIAG_VFS data under unix_state_lock Revert "mptcp: add needs_id for netlink appending addr" mptcp: fix slab-use-after-free in __inet_lookup_established net: txgbe: leave space for null terminators on property_entry net: ioam6: fix OOB and missing lock rxrpc: proc: size address buffers for %pISpc output rxrpc: only handle RESPONSE during service challenge rxrpc: Fix buffer overread in rxgk_do_verify_authenticator() rxrpc: Fix leak of rxgk context in rxgk_verify_response() rxrpc: Fix integer overflow in rxgk_verify_response() rxrpc: Fix missing error checks for rxkad encryption/decryption failure ...	2026-04-09 08:39:25 -07:00
Alice Mikityanska	ebe560ea5f	l2tp: Drop large packets with UDP encap syzbot reported a WARN on my patch series [1]. The actual issue is an overflow of 16-bit UDP length field, and it exists in the upstream code. My series added a debug WARN with an overflow check that exposed the issue, that's why syzbot tripped on my patches, rather than on upstream code. syzbot's repro: r0 = socket$pppl2tp(0x18, 0x1, 0x1) r1 = socket$inet6_udp(0xa, 0x2, 0x0) connect$inet6(r1, &(0x7f00000000c0)={0xa, 0x0, 0x0, @loopback, 0xfffffffc}, 0x1c) connect$pppl2tp(r0, &(0x7f0000000240)=@pppol2tpin6={0x18, 0x1, {0x0, r1, 0x4, 0x0, 0x0, 0x0, {0xa, 0x4e22, 0xffff, @ipv4={'\x00', '\xff\xff', @empty}}}}, 0x32) writev(r0, &(0x7f0000000080)=[{&(0x7f0000000000)="ee", 0x34000}], 0x1) It basically sends an oversized (0x34000 bytes) PPPoL2TP packet with UDP encapsulation, and l2tp_xmit_core doesn't check for overflows when it assigns the UDP length field. The value gets trimmed to 16 bites. Add an overflow check that drops oversized packets and avoids sending packets with trimmed UDP length to the wire. syzbot's stack trace (with my patch applied): len >= 65536u WARNING: ./include/linux/udp.h:38 at udp_set_len_short include/linux/udp.h:38 [inline], CPU#1: syz.0.17/5957 WARNING: ./include/linux/udp.h:38 at l2tp_xmit_core net/l2tp/l2tp_core.c:1293 [inline], CPU#1: syz.0.17/5957 WARNING: ./include/linux/udp.h:38 at l2tp_xmit_skb+0x1204/0x18d0 net/l2tp/l2tp_core.c:1327, CPU#1: syz.0.17/5957 Modules linked in: CPU: 1 UID: 0 PID: 5957 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 RIP: 0010:udp_set_len_short include/linux/udp.h:38 [inline] RIP: 0010:l2tp_xmit_core net/l2tp/l2tp_core.c:1293 [inline] RIP: 0010:l2tp_xmit_skb+0x1204/0x18d0 net/l2tp/l2tp_core.c:1327 Code: 0f 0b 90 e9 21 f9 ff ff e8 e9 05 ec f6 90 0f 0b 90 e9 8d f9 ff ff e8 db 05 ec f6 90 0f 0b 90 e9 cc f9 ff ff e8 cd 05 ec f6 90 <0f> 0b 90 e9 de fa ff ff 44 89 f1 80 e1 07 80 c1 03 38 c1 0f 8c 4f RSP: 0018:ffffc90003d67878 EFLAGS: 00010293 RAX: ffffffff8ad985e3 RBX: ffff8881a6400090 RCX: ffff8881697f0000 RDX: 0000000000000000 RSI: 0000000000034010 RDI: 000000000000ffff RBP: dffffc0000000000 R08: 0000000000000003 R09: 0000000000000004 R10: dffffc0000000000 R11: fffff520007acf00 R12: ffff8881baf20900 R13: 0000000000034010 R14: ffff8881a640008e R15: ffff8881760f7000 FS: 000055557e81f500(0000) GS:ffff8882a9467000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000200000033000 CR3: 00000001612f4000 CR4: 00000000000006f0 Call Trace: <TASK> pppol2tp_sendmsg+0x40a/0x5f0 net/l2tp/l2tp_ppp.c:302 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg net/socket.c:742 [inline] sock_write_iter+0x503/0x550 net/socket.c:1195 do_iter_readv_writev+0x619/0x8c0 fs/read_write.c:-1 vfs_writev+0x33c/0x990 fs/read_write.c:1059 do_writev+0x154/0x2e0 fs/read_write.c:1105 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f636479c629 Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffffd4241c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 00007f6364a15fa0 RCX: 00007f636479c629 RDX: 0000000000000001 RSI: 0000200000000080 RDI: 0000000000000003 RBP: 00007f6364832b39 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f6364a15fac R14: 00007f6364a15fa0 R15: 00007f6364a15fa0 </TASK> [1]: https://lore.kernel.org/all/20260226201600.222044-1-alice.kernel@fastmail.im/ Fixes: `3557baabf2` ("[L2TP]: PPP over L2TP driver core") Reported-by: syzbot+ci3edea60a44225dec@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/69a1dfba.050a0220.3a55be.0026.GAE@google.com/ Signed-off-by: Alice Mikityanska <alice@isovalent.com> Link: https://patch.msgid.link/20260403174949.843941-1-alice.kernel@fastmail.im Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-09 10:19:05 +02:00
Li RongQing	0006c6f109	devlink: Fix incorrect skb socket family dumping The devlink_fmsg_dump_skb function was incorrectly using the socket type (sk->sk_type) instead of the socket family (sk->sk_family) when filling the "family" field in the fast message dump. This patch fixes this to properly display the socket family. Fixes: `3dbfde7f6b` ("devlink: add devlink_fmsg_dump_skb() function") Signed-off-by: Li RongQing <lirongqing@baidu.com> Link: https://patch.msgid.link/20260407022730.2393-1-lirongqing@baidu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:34:38 -07:00
Jiexun Wang	39897df386	af_unix: read UNIX_DIAG_VFS data under unix_state_lock Exact UNIX diag lookups hold a reference to the socket, but not to u->path. Meanwhile, unix_release_sock() clears u->path under unix_state_lock() and drops the path reference after unlocking. Read the inode and device numbers for UNIX_DIAG_VFS while holding unix_state_lock(), then emit the netlink attribute after dropping the lock. This keeps the VFS data stable while the reply is being built. Fixes: `5f7b056946` ("unix_diag: Unix inode info NLA") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260407080015.1744197-1-n05ec@lzu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:33:52 -07:00
Matthieu Baerts (NGI0)	8e2760eaab	Revert "mptcp: add needs_id for netlink appending addr" This commit was originally adding the ability to add MPTCP endpoints with ID 0 by accident. The in-kernel PM, handling MPTCP endpoints at the net namespace level, is not supposed to handle endpoints with such ID, because this ID 0 is reserved to the initial subflow, as mentioned in the MPTCPv1 protocol [1], a per-connection setting. Note that 'ip mptcp endpoint add id 0' stops early with an error, but other tools might still request the in-kernel PM to create MPTCP endpoints with this restricted ID 0. In other words, it was wrong to call the mptcp_pm_has_addr_attr_id helper to check whether the address ID attribute is set: if it was set to 0, a new MPTCP endpoint would be created with ID 0, which is not expected, and might cause various issues later. Fixes: `584f389426` ("mptcp: add needs_id for netlink appending addr") Cc: stable@vger.kernel.org Link: https://datatracker.ietf.org/doc/html/rfc8684#section-3.2-9 [1] Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260407-net-mptcp-revert-pm-needs-id-v2-1-7a25cbc324f8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:31:16 -07:00
Jiayuan Chen	9b55b25390	mptcp: fix slab-use-after-free in __inet_lookup_established The ehash table lookups are lockless and rely on SLAB_TYPESAFE_BY_RCU to guarantee socket memory stability during RCU read-side critical sections. Both tcp_prot and tcpv6_prot have their slab caches created with this flag via proto_register(). However, MPTCP's mptcp_subflow_init() copies tcpv6_prot into tcpv6_prot_override during inet_init() (fs_initcall, level 5), before inet6_init() (module_init/device_initcall, level 6) has called proto_register(&tcpv6_prot). At that point, tcpv6_prot.slab is still NULL, so tcpv6_prot_override.slab remains NULL permanently. This causes MPTCP v6 subflow child sockets to be allocated via kmalloc (falling into kmalloc-4k) instead of the TCPv6 slab cache. The kmalloc-4k cache lacks SLAB_TYPESAFE_BY_RCU, so when these sockets are freed without SOCK_RCU_FREE (which is cleared for child sockets by design), the memory can be immediately reused. Concurrent ehash lookups under rcu_read_lock can then access freed memory, triggering a slab-use-after-free in __inet_lookup_established. Fix this by splitting the IPv6-specific initialization out of mptcp_subflow_init() into a new mptcp_subflow_v6_init(), called from mptcp_proto_v6_init() before protocol registration. This ensures tcpv6_prot_override.slab correctly inherits the SLAB_TYPESAFE_BY_RCU slab cache. Fixes: `b19bc2945b` ("mptcp: implement delegated actions") Cc: stable@vger.kernel.org Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260406031512.189159-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:30:08 -07:00
Justin Iurman	b30b1675aa	net: ioam6: fix OOB and missing lock When trace->type.bit6 is set: if (trace->type.bit6) { ... queue = skb_get_tx_queue(dev, skb); qdisc = rcu_dereference(queue->qdisc); This code can lead to an out-of-bounds access of the dev->_tx[] array when is_input is true. In such a case, the packet is on the RX path and skb->queue_mapping contains the RX queue index of the ingress device. If the ingress device has more RX queues than the egress device (dev) has TX queues, skb_get_queue_mapping(skb) will exceed dev->num_tx_queues. Add a check to avoid this situation since skb_get_tx_queue() does not clamp the index. This issue has also revealed that per queue visibility cannot be accurate and will be replaced later as a new feature. While at it, add missing lock around qdisc_qstats_qlen_backlog(). The function __ioam6_fill_trace_data() is called from both softirq and process contexts, hence the use of spin_lock_bh() here. Fixes: `b63c5478e9` ("ipv6: ioam: Support for Queue depth data field") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20260403214418.2233266-2-kuba@kernel.org/ Signed-off-by: Justin Iurman <justin.iurman@gmail.com> Link: https://patch.msgid.link/20260404134137.24553-1-justin.iurman@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:08:56 -07:00
Jakub Kicinski	d65b175cfa	Merge tag 'wireless-2026-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== A few last-minute fixes: - rfkill: prevent boundless event list - rt2x00: fix USB resource management - brcmfmac: validate firmware IDs - brcmsmac: fix DMA free size * tag 'wireless-2026-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: net: rfkill: prevent unlimited numbers of rfkill events from being created wifi: rt2x00usb: fix devres lifetime wifi: brcmfmac: validate bsscfg indices in IF events wifi: brcmsmac: Fix dma_free_coherent() size ==================== Link: https://patch.msgid.link/20260408081802.111623-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:56:17 -07:00
Jakub Kicinski	84ac9a922d	Merge tag 'ipsec-2026-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2026-04-08 1) Clear trailing padding in build_polexpire() to prevent leaking unititialized memory. From Yasuaki Torimaru. 2) Fix aevent size calculation when XFRMA_IF_ID is used. From Keenan Dong. 3) Wait for RCU readers during policy netns exit before freeing the policy hash tables. 4) Fix dome too eaerly dropped references on the netdev when uding transport mode. From Qi Tang. 5) Fix refcount leak in xfrm_migrate_policy_find(). From Kotlyarov Mihail. 6) Fix two fix info leaks in build_report() and in build_mapping(). From Greg Kroah-Hartman. 7) Zero aligned sockaddr tail in PF_KEY exports. From Zhengchuan Liang. * tag 'ipsec-2026-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: net: af_key: zero aligned sockaddr tail in PF_KEY exports xfrm_user: fix info leak in build_report() xfrm_user: fix info leak in build_mapping() xfrm: fix refcount leak in xfrm_migrate_policy_find xfrm: hold dev ref until after transport_finish NF_HOOK xfrm: Wait for RCU readers during policy netns exit xfrm: account XFRMA_IF_ID in aevent size calculation xfrm: clear trailing padding in build_polexpire() ==================== Link: https://patch.msgid.link/20260408095925.253681-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:54:32 -07:00
Jakub Kicinski	d614e0186b	Merge tag 'batadv-net-pullrequest-20260408' of https://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here are two batman-adv bugfixes: - reject oversized global TT response buffers, by Ruide Cao - hold claim backbone gateways by reference, by Haoze Xie * tag 'batadv-net-pullrequest-20260408' of https://git.open-mesh.org/linux-merge: batman-adv: hold claim backbone gateways by reference batman-adv: reject oversized global TT response buffers ==================== Link: https://patch.msgid.link/20260408110255.976389-1-sw@simonwunderlich.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:50:27 -07:00
Jakub Kicinski	1ee3b19a26	Merge tag 'nf-26-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter updates for net I only included crash fixes, as we're closer to a release, rest will be handled via -next. 1) Fix a NULL pointer dereference in ip_vs_add_service error path, from Weiming Shi, bug added in 6.2 development cycle. 2) Don't leak kernel data bytes from allocator to userspace: nfnetlink_log needs to init the trailing NLMSG_DONE terminator. From Xiang Mei. 3) xt_multiport match lacks range validation, bogus userspace request will cause out-of-bounds read. From Ren Wei. 4) ip6t_eui64 match must reject packets with invalid mac header before calling eth_hdr. Make existing check unconditional. From Zhengchuan Liang. 5) nft_ct timeout policies are free'd via kfree() while they may still be reachable by other cpus that process a conntrack object that uses such a timeout policy. Existing reaping of entries is not sufficient because it doesn't wait for a grace period. Use kfree_rcu(). From Tuan Do. 6/7) Make nfnetlink_queue hash table per queue. As-is we can hit a page fault in case underlying page of removed element was free'd. Per-queue hash prevents parallel lookups. This comes with a test case that demonstrates the bug, from Fernando Fernandez Mancera. * tag 'nf-26-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: selftests: nft_queue.sh: add a parallel stress test netfilter: nfnetlink_queue: make hash table per queue netfilter: nft_ct: fix use-after-free in timeout object destroy netfilter: ip6t_eui64: reject invalid MAC header for all packets netfilter: xt_multiport: validate range encoding in checkentry netfilter: nfnetlink_log: initialize nfgenmsg in NLMSG_DONE terminator ipvs: fix NULL deref in ip_vs_add_service error path ==================== Link: https://patch.msgid.link/20260408163512.30537-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:48:44 -07:00
Pengpeng Hou	a44ce6aa2e	rxrpc: proc: size address buffers for %pISpc output The AF_RXRPC procfs helpers format local and remote socket addresses into fixed 50-byte stack buffers with "%pISpc". That is too small for the longest current-tree IPv6-with-port form the formatter can produce. In lib/vsprintf.c, the compressed IPv6 path uses a dotted-quad tail not only for v4mapped addresses, but also for ISATAP addresses via ipv6_addr_is_isatap(). As a result, a case such as [ffff:ffff:ffff:ffff:0:5efe:255.255.255.255]:65535 is possible with the current formatter. That is 50 visible characters, so 51 bytes including the trailing NUL, which does not fit in the existing char[50] buffers used by net/rxrpc/proc.c. Size the buffers from the formatter's maximum textual form and switch the call sites to scnprintf(). Changes since v1: - correct the changelog to cite the actual maximum current-tree case explicitly - frame the proof around the ISATAP formatting path instead of the earlier mapped-v4 example Fixes: `75b54cb57c` ("rxrpc: Add IPv6 support") Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Anderson Nascimento <anderson@allelesecurity.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-22-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:45:32 -07:00
Wang Jie	c43ffdcfdb	rxrpc: only handle RESPONSE during service challenge Only process RESPONSE packets while the service connection is still in RXRPC_CONN_SERVICE_CHALLENGING. Check that state under state_lock before running response verification and security initialization, then use a local secured flag to decide whether to queue the secured-connection work after the state transition. This keeps duplicate or late RESPONSE packets from re-running the setup path and removes the unlocked post-transition state test. Fixes: `17926a7932` ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Jie Wang <jiewang2024@lzu.edu.cn> Signed-off-by: Yang Yang <n05ec@lzu.edu.cn> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-21-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:45:05 -07:00
David Howells	f564af387c	rxrpc: Fix buffer overread in rxgk_do_verify_authenticator() Fix rxgk_do_verify_authenticator() to check the buffer size before checking the nonce. Fixes: `9d1d2b5934` ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)") Closes: https://sashiko.dev/#/patchset/20260401105614.1696001-10-dhowells@redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-20-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:34 -07:00
David Howells	7e1876caa8	rxrpc: Fix leak of rxgk context in rxgk_verify_response() Fix rxgk_verify_response() to clean up the rxgk context it creates. Fixes: `9d1d2b5934` ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)") Closes: https://sashiko.dev/#/patchset/20260401105614.1696001-10-dhowells@redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-19-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:34 -07:00
David Howells	699e52180f	rxrpc: Fix integer overflow in rxgk_verify_response() In rxgk_verify_response(), there's a potential integer overflow due to rounding up token_len before checking it, thereby allowing the length check to be bypassed. Fix this by checking the unrounded value against len too (len is limited as the response must fit in a single UDP packet). Fixes: `9d1d2b5934` ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)") Closes: https://sashiko.dev/#/patchset/20260401105614.1696001-10-dhowells@redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-18-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:34 -07:00
David Howells	f93af41b9f	rxrpc: Fix missing error checks for rxkad encryption/decryption failure Add error checking for failure of crypto_skcipher_en/decrypt() to various rxkad function as the crypto functions can fail with ENOMEM at least. Fixes: `17926a7932` ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Closes: https://sashiko.dev/#/patchset/20260401105614.1696001-10-dhowells@redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-17-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:34 -07:00
David Howells	2afd86ccbb	rxrpc: Fix key/keyring checks in setsockopt(RXRPC_SECURITY_KEY/KEYRING) An AF_RXRPC socket can be both client and server at the same time. When sending new calls (ie. it's acting as a client), it uses rx->key to set the security, and when accepting incoming calls (ie. it's acting as a server), it uses rx->securities. setsockopt(RXRPC_SECURITY_KEY) sets rx->key to point to an rxrpc-type key and setsockopt(RXRPC_SECURITY_KEYRING) sets rx->securities to point to a keyring of rxrpc_s-type keys. Now, it should be possible to use both rx->key and rx->securities on the same socket - but for userspace AF_RXRPC sockets rxrpc_setsockopt() prevents that. Fix this by: (1) Remove the incorrect check rxrpc_setsockopt(RXRPC_SECURITY_KEYRING) makes on rx->key. (2) Move the check that rxrpc_setsockopt(RXRPC_SECURITY_KEY) makes on rx->key down into rxrpc_request_key(). (3) Remove rxrpc_request_key()'s check on rx->securities. This (in combination with a previous patch) pushes the checks down into the functions that set those pointers and removes the cross-checks that prevent both key and keyring being set. Fixes: `17926a7932` ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Closes: https://sashiko.dev/#/patchset/20260401105614.1696001-10-dhowells@redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Anderson Nascimento <anderson@allelesecurity.com> cc: Luxiao Xu <rakukuip@gmail.com> cc: Yuan Tan <yuantan098@gmail.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-16-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:34 -07:00
Luxiao Xu	f125846ee7	rxrpc: fix reference count leak in rxrpc_server_keyring() This patch fixes a reference count leak in rxrpc_server_keyring() by checking if rx->securities is already set. Fixes: `17926a7932` ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Luxiao Xu <rakukuip@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-15-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:33 -07:00
Keenan Dong	a2567217ad	rxrpc: fix oversized RESPONSE authenticator length check rxgk_verify_response() decodes auth_len from the packet and is supposed to verify that it fits in the remaining bytes. The existing check is inverted, so oversized RESPONSE authenticators are accepted and passed to rxgk_decrypt_skb(), which can later reach skb_to_sgvec() with an impossible length and hit BUG_ON(len). Decoded from the original latest-net reproduction logs with scripts/decode_stacktrace.sh: RIP: __skb_to_sgvec() [net/core/skbuff.c:5285 (discriminator 1)] Call Trace: skb_to_sgvec() [net/core/skbuff.c:5305] rxgk_decrypt_skb() [net/rxrpc/rxgk_common.h:81] rxgk_verify_response() [net/rxrpc/rxgk.c:1268] rxrpc_process_connection() [net/rxrpc/conn_event.c:266 net/rxrpc/conn_event.c:364 net/rxrpc/conn_event.c:386] process_one_work() [kernel/workqueue.c:3281] worker_thread() [kernel/workqueue.c:3353 kernel/workqueue.c:3440] kthread() [kernel/kthread.c:436] ret_from_fork() [arch/x86/kernel/process.c:164] Reject authenticator lengths that exceed the remaining packet payload. Fixes: `9d1d2b5934` ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)") Signed-off-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: Willy Tarreau <w@1wt.eu> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-14-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:33 -07:00
Keenan Dong	3e31380078	rxrpc: fix RESPONSE authenticator parser OOB read rxgk_verify_authenticator() copies auth_len bytes into a temporary buffer and then passes p + auth_len as the parser limit to rxgk_do_verify_authenticator(). Since p is a __be32 *, that inflates the parser end pointer by a factor of four and lets malformed RESPONSE authenticators read past the kmalloc() buffer. Decoded from the original latest-net reproduction logs with scripts/decode_stacktrace.sh: BUG: KASAN: slab-out-of-bounds in rxgk_verify_response() Call Trace: dump_stack_lvl() [lib/dump_stack.c:123] print_report() [mm/kasan/report.c:379 mm/kasan/report.c:482] kasan_report() [mm/kasan/report.c:597] rxgk_verify_response() [net/rxrpc/rxgk.c:1103 net/rxrpc/rxgk.c:1167 net/rxrpc/rxgk.c:1274] rxrpc_process_connection() [net/rxrpc/conn_event.c:266 net/rxrpc/conn_event.c:364 net/rxrpc/conn_event.c:386] process_one_work() [kernel/workqueue.c:3281] worker_thread() [kernel/workqueue.c:3353 kernel/workqueue.c:3440] kthread() [kernel/kthread.c:436] ret_from_fork() [arch/x86/kernel/process.c:164] Allocated by task 54: rxgk_verify_response() [include/linux/slab.h:954 net/rxrpc/rxgk.c:1155 net/rxrpc/rxgk.c:1274] rxrpc_process_connection() [net/rxrpc/conn_event.c:266 net/rxrpc/conn_event.c:364 net/rxrpc/conn_event.c:386] Convert the byte count to __be32 units before constructing the parser limit. Fixes: `9d1d2b5934` ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)") Signed-off-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: Willy Tarreau <w@1wt.eu> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-13-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:33 -07:00
Yuqi Xu	fe4447cd95	rxrpc: reject undecryptable rxkad response tickets rxkad_decrypt_ticket() decrypts the RXKAD response ticket and then parses the buffer as plaintext without checking whether crypto_skcipher_decrypt() succeeded. A malformed RESPONSE can therefore use a non-block-aligned ticket length, make the decrypt operation fail, and still drive the ticket parser with attacker-controlled bytes. Check the decrypt result and abort the connection with RXKADBADTICKET when ticket decryption fails. Fixes: `17926a7932` ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Yuqi Xu <xuyuqiabc@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-12-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:33 -07:00
Douya Le	6331f1b24a	rxrpc: Only put the call ref if one was acquired rxrpc_input_packet_on_conn() can process a to-client packet after the current client call on the channel has already been torn down. In that case chan->call is NULL, rxrpc_try_get_call() returns NULL and there is no reference to drop. The client-side implicit-end error path does not account for that and unconditionally calls rxrpc_put_call(). This turns a protocol error path into a kernel crash instead of rejecting the packet. Only drop the call reference if one was actually acquired. Keep the existing protocol error handling unchanged. Fixes: `5e6ef4f101` ("rxrpc: Make the I/O thread take over the call and local processor work") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Signed-off-by: Douya Le <ldy3087146292@gmail.com> Co-developed-by: Yuan Tan <tanyuan98@gmail.com> Signed-off-by: Yuan Tan <tanyuan98@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Ao Zhou <n05ec@lzu.edu.cn> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-11-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:33 -07:00
Marc Dionne	0cd3e3f3f2	rxrpc: Fix to request an ack if window is limited Peers may only send immediate acks for every 2 UDP packets received. When sending a jumbogram, it is important to check that there is sufficient window space to send another same sized jumbogram following the current one, and request an ack if there isn't. Failure to do so may cause the call to stall waiting for an ack until the resend timer fires. Where jumbograms are in use this causes a very significant drop in performance. Fixes: `fe24a54943` ("rxrpc: Send jumbo DATA packets") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-10-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:33 -07:00
Anderson Nascimento	d666540d21	rxrpc: Fix key reference count leak from call->key When creating a client call in rxrpc_alloc_client_call(), the code obtains a reference to the key. This is never cleaned up and gets leaked when the call is destroyed. Fix this by freeing call->key in rxrpc_destroy_call(). Before the patch, it shows the key reference counter elevated: $ cat /proc/keys \| grep afs@54321 1bffe9cd I--Q--i 8053480 4169w 3b010000 1000 1000 rxrpc afs@54321: ka $ After the patch, the invalidated key is removed when the code exits: $ cat /proc/keys \| grep afs@54321 $ Fixes: `f3441d4125` ("rxrpc: Copy client call parameters into rxrpc_call earlier") Signed-off-by: Anderson Nascimento <anderson@allelesecurity.com> Co-developed-by: David Howells <dhowells@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-9-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:32 -07:00
Alok Tiwari	65b3ffe097	rxrpc: Fix rack timer warning to report unexpected mode rxrpc_rack_timer_expired() clears call->rack_timer_mode to OFF before the switch. The default case warning therefore always prints OFF and doesn't identify the unexpected timer mode. Log the saved mode value instead so the warning reports the actual unexpected rack timer mode. Fixes: `7c48266593` ("rxrpc: Implement RACK/TLP to deal with transmission stalls [RFC8985]") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-8-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:32 -07:00
Alok Tiwari	b33f5741bb	rxrpc: Fix use of wrong skb when comparing queued RESP challenge serial In rxrpc_post_response(), the code should be comparing the challenge serial number from the cached response before deciding to switch to a newer response, but looks at the newer packet private data instead, rendering the comparison always false. Fix this by switching to look at the older packet. Fix further[1] to substitute the new packet in place of the old one if newer and also to release whichever we don't use. Fixes: `5800b1cf3f` ("rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://sashiko.dev/#/patchset/20260319150150.4189381-1-dhowells%40redhat.com [1] Link: https://patch.msgid.link/20260408121252.2249051-7-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:32 -07:00
Oleh Konko	d179a868dd	rxrpc: Fix RxGK token loading to check bounds rxrpc_preparse_xdr_yfs_rxgk() reads the raw key length and ticket length from the XDR token as u32 values and passes each through round_up(x, 4) before using the rounded value for validation and allocation. When the raw length is >= 0xfffffffd, round_up() wraps to 0, so the bounds check and kzalloc both use 0 while the subsequent memcpy still copies the original ~4 GiB value, producing a heap buffer overflow reachable from an unprivileged add_key() call. Fix this by: (1) Rejecting raw key lengths above AFSTOKEN_GK_KEY_MAX and raw ticket lengths above AFSTOKEN_GK_TOKEN_MAX before rounding, consistent with the caps that the RxKAD path already enforces via AFSTOKEN_RK_TIX_MAX. (2) Sizing the flexible-array allocation from the validated raw key length via struct_size_t() instead of the rounded value. (3) Caching the raw lengths so that the later field assignments and memcpy calls do not re-read from the token, eliminating a class of TOCTOU re-parse. The control path (valid token with lengths within bounds) is unaffected. Fixes: `0ca100ff4d` ("rxrpc: Add YFS RxGK (GSSAPI) security class") Signed-off-by: Oleh Konko <security@1seal.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-6-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:32 -07:00
David Howells	146d4ab94c	rxrpc: Fix call removal to use RCU safe deletion Fix rxrpc call removal from the rxnet->calls list to use list_del_rcu() rather than list_del_init() to prevent stuffing up reading /proc/net/rxrpc/calls from potentially getting into an infinite loop. This, however, means that list_empty() no longer works on an entry that's been deleted from the list, making it harder to detect prior deletion. Fix this by: Firstly, make rxrpc_destroy_all_calls() only dump the first ten calls that are unexpectedly still on the list. Limiting the number of steps means there's no need to call cond_resched() or to remove calls from the list here, thereby eliminating the need for rxrpc_put_call() to check for that. rxrpc_put_call() can then be fixed to unconditionally delete the call from the list as it is the only place that the deletion occurs. Fixes: `2baec2c3f8` ("rxrpc: Support network namespacing") Closes: https://sashiko.dev/#/patchset/20260319150150.4189381-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-5-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:32 -07:00
David Howells	6a59d84b4f	rxrpc: Fix anonymous key handling In rxrpc_new_client_call_for_sendmsg(), a key with no payload is meant to be substituted for a NULL key pointer, but the variable this is done with is subsequently not used. Fix this by using "key" rather than "rx->key" when filling in the connection parameters. Note that this only affects direct use of AF_RXRPC; the kAFS filesystem doesn't use sendmsg() directly and so bypasses the issue. Further, AF_RXRPC passes a NULL key in if no key is set, so using an anonymous key in that manner works. Since this hasn't been noticed to this point, it might be better just to remove the "key" variable and the code that sets it - and, arguably, rxrpc_init_client_call_security() would be a better place to handle it. Fixes: `19ffa01c9c` ("rxrpc: Use structs to hold connection params and protocol info") Closes: https://sashiko.dev/#/patchset/20260319150150.4189381-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-4-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:31 -07:00
David Howells	b555912b9b	rxrpc: Fix key parsing memleak In rxrpc_preparse_xdr_yfs_rxgk(), the memory attached to token->rxgk can be leaked in a few error paths after it's allocated. Fix this by freeing it in the "reject_token:" case. Fixes: `0ca100ff4d` ("rxrpc: Add YFS RxGK (GSSAPI) security class") Closes: https://sashiko.dev/#/patchset/20260319150150.4189381-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:31 -07:00
David Howells	bdbfead6d3	rxrpc: Fix key quota calculation for multitoken keys In the rxrpc key preparsing, every token extracted sets the proposed quota value, but for multitoken keys, this will overwrite the previous proposed quota, losing it. Fix this by adding to the proposed quota instead. Fixes: `8a7a3eb4dd` ("KEYS: RxRPC: Use key preparsing") Closes: https://sashiko.dev/#/patchset/20260319150150.4189381-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260408121252.2249051-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:44:31 -07:00
Florian Westphal	936206e3f6	netfilter: nfnetlink_queue: make hash table per queue Sharing a global hash table among all queues is tempting, but it can cause crash: BUG: KASAN: slab-use-after-free in nfqnl_recv_verdict+0x11ac/0x15e0 [nfnetlink_queue] [..] nfqnl_recv_verdict+0x11ac/0x15e0 [nfnetlink_queue] nfnetlink_rcv_msg+0x46a/0x930 kmem_cache_alloc_node_noprof+0x11e/0x450 struct nf_queue_entry is freed via kfree, but parallel cpu can still encounter such an nf_queue_entry when walking the list. Alternative fix is to free the nf_queue_entry via kfree_rcu() instead, but as we have to alloc/free for each skb this will cause more mem pressure. Cc: Scott Mitchell <scott.k.mitch1@gmail.com> Fixes: `e19079adcd` ("netfilter: nfnetlink_queue: optimize verdict lookup with hash table") Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 13:34:51 +02:00
Tuan Do	f8dca15a1b	netfilter: nft_ct: fix use-after-free in timeout object destroy nft_ct_timeout_obj_destroy() frees the timeout object with kfree() immediately after nf_ct_untimeout(), without waiting for an RCU grace period. Concurrent packet processing on other CPUs may still hold RCU-protected references to the timeout object obtained via rcu_dereference() in nf_ct_timeout_data(). Add an rcu_head to struct nf_ct_timeout and use kfree_rcu() to defer freeing until after an RCU grace period, matching the approach already used in nfnetlink_cttimeout.c. KASAN report: BUG: KASAN: slab-use-after-free in nf_conntrack_tcp_packet+0x1381/0x29d0 Read of size 4 at addr ffff8881035fe19c by task exploit/80 Call Trace: nf_conntrack_tcp_packet+0x1381/0x29d0 nf_conntrack_in+0x612/0x8b0 nf_hook_slow+0x70/0x100 __ip_local_out+0x1b2/0x210 tcp_sendmsg_locked+0x722/0x1580 __sys_sendto+0x2d8/0x320 Allocated by task 75: nft_ct_timeout_obj_init+0xf6/0x290 nft_obj_init+0x107/0x1b0 nf_tables_newobj+0x680/0x9c0 nfnetlink_rcv_batch+0xc29/0xe00 Freed by task 26: nft_obj_destroy+0x3f/0xa0 nf_tables_trans_destroy_work+0x51c/0x5c0 process_one_work+0x2c4/0x5a0 Fixes: `7e0b2b57f0` ("netfilter: nft_ct: add ct timeout support") Cc: stable@vger.kernel.org Signed-off-by: Tuan Do <tuan@calif.io> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 13:34:16 +02:00
Zhengchuan Liang	fdce0b3590	netfilter: ip6t_eui64: reject invalid MAC header for all packets `eui64_mt6()` derives a modified EUI-64 from the Ethernet source address and compares it with the low 64 bits of the IPv6 source address. The existing guard only rejects an invalid MAC header when `par->fragoff != 0`. For packets with `par->fragoff == 0`, `eui64_mt6()` can still reach `eth_hdr(skb)` even when the MAC header is not valid. Fix this by removing the `par->fragoff != 0` condition so that packets with an invalid MAC header are rejected before accessing `eth_hdr(skb)`. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Zhengchuan Liang <zcliangcn@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 13:33:56 +02:00
Ren Wei	ff64c5bfef	netfilter: xt_multiport: validate range encoding in checkentry ports_match_v1() treats any non-zero pflags entry as the start of a port range and unconditionally consumes the next ports[] element as the range end. The checkentry path currently validates protocol, flags and count, but it does not validate the range encoding itself. As a result, malformed rules can mark the last slot as a range start or place two range starts back to back, leaving ports_match_v1() to step past the last valid ports[] element while interpreting the rule. Reject malformed multiport v1 rules in checkentry by validating that each range start has a following element and that the following element is not itself marked as another range start. Fixes: `a89ecb6a2e` ("[NETFILTER]: x_tables: unify IPv4/IPv6 multiport match") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Yuhang Zheng <z1652074432@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 13:33:38 +02:00
Xiang Mei	1f3083aec8	netfilter: nfnetlink_log: initialize nfgenmsg in NLMSG_DONE terminator When batching multiple NFLOG messages (inst->qlen > 1), __nfulnl_send() appends an NLMSG_DONE terminator with sizeof(struct nfgenmsg) payload via nlmsg_put(), but never initializes the nfgenmsg bytes. The nlmsg_put() helper only zeroes alignment padding after the payload, not the payload itself, so four bytes of stale kernel heap data are leaked to userspace in the NLMSG_DONE message body. Use nfnl_msg_put() to build the NLMSG_DONE terminator, which initializes the nfgenmsg payload via nfnl_fill_hdr(), consistent with how __build_packet_message() already constructs NFULNL_MSG_PACKET headers. Fixes: `29c5d4afba` ("[NETFILTER]: nfnetlink_log: fix sending of multipart messages") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 13:33:36 +02:00
Weiming Shi	9a91797e61	ipvs: fix NULL deref in ip_vs_add_service error path When ip_vs_bind_scheduler() succeeds in ip_vs_add_service(), the local variable sched is set to NULL. If ip_vs_start_estimator() subsequently fails, the out_err cleanup calls ip_vs_unbind_scheduler(svc, sched) with sched == NULL. ip_vs_unbind_scheduler() passes the cur_sched NULL check (because svc->scheduler was set by the successful bind) but then dereferences the NULL sched parameter at sched->done_service, causing a kernel panic at offset 0x30 from NULL. Oops: general protection fault, [..] [#1] PREEMPT SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037] RIP: 0010:ip_vs_unbind_scheduler (net/netfilter/ipvs/ip_vs_sched.c:69) Call Trace: <TASK> ip_vs_add_service.isra.0 (net/netfilter/ipvs/ip_vs_ctl.c:1500) do_ip_vs_set_ctl (net/netfilter/ipvs/ip_vs_ctl.c:2809) nf_setsockopt (net/netfilter/nf_sockopt.c:102) [..] Fix by simply not clearing the local sched variable after a successful bind. ip_vs_unbind_scheduler() already detects whether a scheduler is installed via svc->scheduler, and keeping sched non-NULL ensures the error path passes the correct pointer to both ip_vs_unbind_scheduler() and ip_vs_scheduler_put(). While the bug is older, the problem popups in more recent kernels (6.2), when the new error path is taken after the ip_vs_start_estimator() call. Fixes: `705dd34440` ("ipvs: use kthreads for stats estimation") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Acked-by: Simon Horman <horms@kernel.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 13:33:28 +02:00
Andrea Mayer	c3812651b5	seg6: separate dst_cache for input and output paths in seg6 lwtunnel The seg6 lwtunnel uses a single dst_cache per encap route, shared between seg6_input_core() and seg6_output_core(). These two paths can perform the post-encap SID lookup in different routing contexts (e.g., ip rules matching on the ingress interface, or VRF table separation). Whichever path runs first populates the cache, and the other reuses it blindly, bypassing its own lookup. Fix this by splitting the cache into cache_input and cache_output, so each path maintains its own cached dst independently. Fixes: `6c8702c60b` ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Cc: stable@vger.kernel.org Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Justin Iurman <justin.iurman@gmail.com> Link: https://patch.msgid.link/20260404004405.4057-2-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-07 20:20:56 -07:00
Jakub Kicinski	944b3b734c	net: avoid nul-deref trying to bind mp to incapable device Sashiko points out that we use qops in __net_mp_open_rxq() but never validate they are null. This was introduced when check was moved from netdev_rx_queue_restart(). Look at ops directly instead of the locking config. qops imply netdev_need_ops_lock(). We used netdev_need_ops_lock() initially to signify that the real_num_rx_queues check below is safe without rtnl_lock, but I'm not sure if this is actually clear to most people, anyway. Fixes: `da7772a2b4` ("net: move mp->rx_page_size validation to __net_mp_open_rxq()") Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20260404001938.2425670-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-07 18:57:56 -07:00

1 2 3 4 5 ...

83503 Commits