linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-13 11:48:58 -04:00

Author	SHA1	Message	Date
Matthieu Baerts (NGI0)	14e22b43df	selftests: mptcp: connect: catch IO errors on listen side IO errors were correctly printed to stderr, and propagated up to the main loop for the server side, but the returned value was ignored. As a consequence, the program for the listener side was no longer exiting with an error code in case of IO issues. Because of that, some issues might not have been seen. But very likely, most issues either had an effect on the client side, or the file transfer was not the expected one, e.g. the connection got reset before the end. Still, it is better to fix this. The main consequence of this issue is the error that was reported by the selftests: the received and sent files were different, and the MIB counters were not printed. Also, when such errors happened during the 'disconnect' tests, the program tried to continue until the timeout. Now when an IO error is detected, the program exits directly with an error. Fixes: `05be5e273c` ("selftests: mptcp: add disconnect tests") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-2-d40e77cbbf02@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-15 18:10:36 -07:00
Matthieu Baerts (NGI0)	f755be0b1f	mptcp: propagate shutdown to subflows when possible When the MPTCP DATA FIN have been ACKed, there is no more MPTCP related metadata to exchange, and all subflows can be safely shutdown. Before this patch, the subflows were actually terminated at 'close()' time. That's certainly fine most of the time, but not when the userspace 'shutdown()' a connection, without close()ing it. When doing so, the subflows were staying in LAST_ACK state on one side -- and consequently in FIN_WAIT2 on the other side -- until the 'close()' of the MPTCP socket. Now, when the DATA FIN have been ACKed, all subflows are shutdown. A consequence of this is that the TCP 'FIN' flag can be set earlier now, but the end result is the same. This affects the packetdrill tests looking at the end of the MPTCP connections, but for a good reason. Note that tcp_shutdown() will check the subflow state, so no need to do that again before calling it. Fixes: `3721b9b646` ("mptcp: Track received DATA_FIN sequence number and add related helpers") Cc: stable@vger.kernel.org Fixes: `16a9a9da17` ("mptcp: Add helper to process acks of DATA_FIN") Reviewed-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-1-d40e77cbbf02@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-15 18:10:36 -07:00
Hangbin Liu	71379e1c95	selftests: bonding: add fail_over_mac testing Add a test to check each value of bond fail_over_mac option. Also fix a minor garp_test print issue. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20250910024336.400253-2-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-15 17:48:42 -07:00
Hangbin Liu	35ae4e8629	bonding: set random address only when slaves already exist After commit `5c3bf6cba7` ("bonding: assign random address if device address is same as bond"), bonding will erroneously randomize the MAC address of the first interface added to the bond if fail_over_mac = follow. Correct this by additionally testing for the bond being empty before randomizing the MAC. Fixes: `5c3bf6cba7` ("bonding: assign random address if device address is same as bond") Reported-by: Qiuling Ren <qren@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20250910024336.400253-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-15 17:48:42 -07:00
Håkon Bugge	4351ca3fcb	rds: ib: Increment i_fastreg_wrs before bailing out We need to increment i_fastreg_wrs before we bail out from rds_ib_post_reg_frmr(). We have a fixed budget of how many FRWR operations that can be outstanding using the dedicated QP used for memory registrations and de-registrations. This budget is enforced by the atomic_t i_fastreg_wrs. If we bail out early in rds_ib_post_reg_frmr(), we will "leak" the possibility of posting an FRWR operation, and if that accumulates, no FRWR operation can be carried out. Fixes: `1659185fb4` ("RDS: IB: Support Fastreg MR (FRMR) memory registration mode") Fixes: `3a2886cca7` ("net/rds: Keep track of and wait for FRWR segments in use upon shutdown") Cc: stable@vger.kernel.org Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Link: https://patch.msgid.link/20250911133336.451212-1-haakon.bugge@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-15 16:47:53 -07:00
Jakub Kicinski	2e5fb2ff31	Merge branch 'net-dst_metadata-fix-df-flag-extraction-on-tunnel-rx' Ilya Maximets says: ==================== net: dst_metadata: fix DF flag extraction on tunnel rx Two patches here, first fixes the issue where tunnel core doesn't actually extract DF bit from the outer IP header, even though both OVS and TC flower allow matching on it. More details in the commit message. The second is a selftest for openvswitch that reproduces the issue, but also just adds some basic coverage for the tunnel metadata extraction and related openvswitch uAPI. ==================== Link: https://patch.msgid.link/20250909165440.229890-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-14 14:28:15 -07:00
Ilya Maximets	6cafb93c1f	selftests: openvswitch: add a simple test for tunnel metadata This test ensures that upon receiving decapsulated packets from a tunnel interface in openvswitch, the tunnel metadata fields are properly populated. This partially covers interoperability of the kernel tunnel ports and openvswitch tunnels (LWT) and parsing and formatting of the tunnel metadata fields of the openvswitch netlink uAPI. Doing so, this test also ensures that fields and flags are properly extracted during decapsulation by the tunnel core code, serving as a regression test for the previously fixed issue with the DF bit not being extracted from the outer IP header. The ovs-dpctl.py script already supports all that is necessary for the tunnel ports for this test, so we only need to adjust the ovs_add_if() function to pass the '-t' port type argument in order to be able to create tunnel ports in the openvswitch datapath. Reviewed-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Link: https://patch.msgid.link/20250909165440.229890-3-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-14 14:28:13 -07:00
Ilya Maximets	a9888628cb	net: dst_metadata: fix IP_DF bit not extracted from tunnel headers Both OVS and TC flower allow extracting and matching on the DF bit of the outer IP header via OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT in the OVS_KEY_ATTR_TUNNEL and TCA_FLOWER_KEY_FLAGS_TUNNEL_DONT_FRAGMENT in the TCA_FLOWER_KEY_ENC_FLAGS respectively. Flow dissector extracts this information as FLOW_DIS_F_TUNNEL_DONT_FRAGMENT from the tunnel info key. However, the IP_TUNNEL_DONT_FRAGMENT_BIT in the tunnel key is never actually set, because the tunneling code doesn't actually extract it from the IP header. OAM and CRIT_OPT are extracted by the tunnel implementation code, same code also sets the KEY flag, if present. UDP tunnel core takes care of setting the CSUM flag if the checksum is present in the UDP header, but the DONT_FRAGMENT is not handled at any layer. Fix that by checking the bit and setting the corresponding flag while populating the tunnel info in the IP layer where it belongs. Not using __assign_bit as we don't really need to clear the bit in a just initialized field. It also doesn't seem like using __assign_bit will make the code look better. Clearly, users didn't rely on this functionality for anything very important until now. The reason why this doesn't break OVS logic is that it only matches on what kernel previously parsed out and if kernel consistently reports this bit as zero, OVS will only match on it to be zero, which sort of works. But it is still a bug that the uAPI reports and allows matching on the field that is not actually checked in the packet. And this is causing misleading -df reporting in OVS datapath flows, while the tunnel traffic actually has the bit set in most cases. This may also cause issues if a hardware properly implements support for tunnel flag matching as it will disagree with the implementation in a software path of TC flower. Fixes: `7d5437c709` ("openvswitch: Add tunneling interface.") Fixes: `1d17568e74` ("net/sched: cls_flower: add support for matching tunnel control flags") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250909165440.229890-2-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-14 14:28:12 -07:00
Jamie Bainbridge	56c0a2a9dd	qed: Don't collect too many protection override GRC elements In the protection override dump path, the firmware can return far too many GRC elements, resulting in attempting to write past the end of the previously-kmalloc'ed dump buffer. This will result in a kernel panic with reason: BUG: unable to handle kernel paging request at ADDRESS where "ADDRESS" is just past the end of the protection override dump buffer. The start address of the buffer is: p_hwfn->cdev->dbg_features[DBG_FEATURE_PROTECTION_OVERRIDE].dump_buf and the size of the buffer is buf_size in the same data structure. The panic can be arrived at from either the qede Ethernet driver path: [exception RIP: qed_grc_dump_addr_range+0x108] qed_protection_override_dump at ffffffffc02662ed [qed] qed_dbg_protection_override_dump at ffffffffc0267792 [qed] qed_dbg_feature at ffffffffc026aa8f [qed] qed_dbg_all_data at ffffffffc026b211 [qed] qed_fw_fatal_reporter_dump at ffffffffc027298a [qed] devlink_health_do_dump at ffffffff82497f61 devlink_health_report at ffffffff8249cf29 qed_report_fatal_error at ffffffffc0272baf [qed] qede_sp_task at ffffffffc045ed32 [qede] process_one_work at ffffffff81d19783 or the qedf storage driver path: [exception RIP: qed_grc_dump_addr_range+0x108] qed_protection_override_dump at ffffffffc068b2ed [qed] qed_dbg_protection_override_dump at ffffffffc068c792 [qed] qed_dbg_feature at ffffffffc068fa8f [qed] qed_dbg_all_data at ffffffffc0690211 [qed] qed_fw_fatal_reporter_dump at ffffffffc069798a [qed] devlink_health_do_dump at ffffffff8aa95e51 devlink_health_report at ffffffff8aa9ae19 qed_report_fatal_error at ffffffffc0697baf [qed] qed_hw_err_notify at ffffffffc06d32d7 [qed] qed_spq_post at ffffffffc06b1011 [qed] qed_fcoe_destroy_conn at ffffffffc06b2e91 [qed] qedf_cleanup_fcport at ffffffffc05e7597 [qedf] qedf_rport_event_handler at ffffffffc05e7bf7 [qedf] fc_rport_work at ffffffffc02da715 [libfc] process_one_work at ffffffff8a319663 Resolve this by clamping the firmware's return value to the maximum number of legal elements the firmware should return. Fixes: `d52c89f120` ("qed*: Utilize FW 8.37.2.0") Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Link: https://patch.msgid.link/f8e1182934aa274c18d0682a12dbaf347595469c.1757485536.git.jamie.bainbridge@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-14 14:25:03 -07:00
Kamal Heib	af82e857df	octeon_ep: Validate the VF ID Add a helper to validate the VF ID and use it in the VF ndo ops to prevent accessing out-of-range entries. Without this check, users can run commands such as: # ip link show dev enp135s0 2: enp135s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000 link/ether 00:00:00:01:01:00 brd ff:ff:ff:ff:ff:ff vf 0 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state enable, trust off vf 1 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state enable, trust off # ip link set dev enp135s0 vf 4 mac 00:00:00:00:00:14 # echo $? 0 even though VF 4 does not exist, which results in silent success instead of returning an error. Fixes: `8a241ef9b9` ("octeon_ep: add ndo ops for VFs in PF driver") Signed-off-by: Kamal Heib <kheib@redhat.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250911223610.1803144-1-kheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-14 13:12:03 -07:00
David Howells	2429a19764	rxrpc: Fix untrusted unsigned subtract Fix the following Smatch static checker warning: net/rxrpc/rxgk_app.c:65 rxgk_yfs_decode_ticket() warn: untrusted unsigned subtract. 'ticket_len - 10 * 4' by prechecking the length of what we're trying to extract in two places in the token and decoding for a response packet. Also use sizeof() on the struct we're extracting rather specifying the size numerically to be consistent with the other related statements. Fixes: `9d1d2b5934` ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lists.infradead.org/pipermail/linux-afs/2025-September/010135.html Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/2039268.1757631977@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-14 13:05:22 -07:00
David Howells	64863f4ca4	rxrpc: Fix unhandled errors in rxgk_verify_packet_integrity() rxgk_verify_packet_integrity() may get more errors than just -EPROTO from rxgk_verify_mic_skb(). Pretty much anything other than -ENOMEM constitutes an unrecoverable error. In the case of -ENOMEM, we can just drop the packet and wait for a retransmission. Similar happens with rxgk_decrypt_skb() and its callers. Fix rxgk_decrypt_skb() or rxgk_verify_mic_skb() to return a greater variety of abort codes and fix their callers to abort the connection on any error apart from -ENOMEM. Also preclear the variables used to hold the abort code returned from rxgk_decrypt_skb() or rxgk_verify_mic_skb() to eliminate uninitialised variable warnings. Fixes: `9d1d2b5934` ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lists.infradead.org/pipermail/linux-afs/2025-April/009739.html Closes: https://lists.infradead.org/pipermail/linux-afs/2025-April/009740.html Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/2038804.1757631496@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-14 13:05:22 -07:00
Ivan Vecera	70d99623d5	dpll: fix clock quality level reporting The DPLL_CLOCK_QUALITY_LEVEL_ITU_OPT1_EPRC is not reported via netlink due to bug in dpll_msg_add_clock_quality_level(). The usage of DPLL_CLOCK_QUALITY_LEVEL_MAX for both DECLARE_BITMAP() and for_each_set_bit() is not correct because these macros requires bitmap size and not the highest valid bit in the bitmap. Use correct bitmap size to fix this issue. Fixes: `a1afb959ad` ("dpll: add clock quality level attribute and op") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Link: https://patch.msgid.link/20250912093331.862333-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-14 13:03:40 -07:00
Anderson Nascimento	2e7bba0892	net/tcp: Fix a NULL pointer dereference when using TCP-AO with TCP_REPAIR A NULL pointer dereference can occur in tcp_ao_finish_connect() during a connect() system call on a socket with a TCP-AO key added and TCP_REPAIR enabled. The function is called with skb being NULL and attempts to dereference it on tcp_hdr(skb)->seq without a prior skb validation. Fix this by checking if skb is NULL before dereferencing it. The commentary is taken from bpf_skops_established(), which is also called in the same flow. Unlike the function being patched, bpf_skops_established() validates the skb before dereferencing it. int main(void){ struct sockaddr_in sockaddr; struct tcp_ao_add tcp_ao; int sk; int one = 1; memset(&sockaddr,'\0',sizeof(sockaddr)); memset(&tcp_ao,'\0',sizeof(tcp_ao)); sk = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); sockaddr.sin_family = AF_INET; memcpy(tcp_ao.alg_name,"cmac(aes128)",12); memcpy(tcp_ao.key,"ABCDEFGHABCDEFGH",16); tcp_ao.keylen = 16; memcpy(&tcp_ao.addr,&sockaddr,sizeof(sockaddr)); setsockopt(sk, IPPROTO_TCP, TCP_AO_ADD_KEY, &tcp_ao, sizeof(tcp_ao)); setsockopt(sk, IPPROTO_TCP, TCP_REPAIR, &one, sizeof(one)); sockaddr.sin_family = AF_INET; sockaddr.sin_port = htobe16(123); inet_aton("127.0.0.1", &sockaddr.sin_addr); connect(sk,(struct sockaddr *)&sockaddr,sizeof(sockaddr)); return 0; } $ gcc tcp-ao-nullptr.c -o tcp-ao-nullptr -Wall $ unshare -Urn BUG: kernel NULL pointer dereference, address: 00000000000000b6 PGD 1f648d067 P4D 1f648d067 PUD 1982e8067 PMD 0 Oops: Oops: 0000 [#1] SMP NOPTI Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 RIP: 0010:tcp_ao_finish_connect (net/ipv4/tcp_ao.c:1182) Fixes: `7c2ffaf21b` ("net/tcp: Calculate TCP-AO traffic keys") Signed-off-by: Anderson Nascimento <anderson@allelesecurity.com> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250911230743.2551-3-anderson@allelesecurity.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-14 12:49:53 -07:00
Russell King (Oracle)	201825fb42	net: ethtool: handle EOPNOTSUPP from ethtool get_ts_info() method Network drivers sometimes return -EOPNOTSUPP from their get_ts_info() method, and this should not cause the reporting of PHY timestamping information to be prohibited. Handle this error code, and also arrange for ethtool_net_get_ts_info_by_phc() to return -EOPNOTSUPP when the method is not implemented. This allows e.g. PHYs connected to DSA switches which support timestamping to report their timestamping capabilities. Fixes: `b9e3f7dc9e` ("net: ethtool: tsinfo: Enhance tsinfo to support several hwtstamp by net topology") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/E1uwiW3-00000004jRF-3CnC@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-12 17:09:10 -07:00
Ioana Ciornei	2690cb0895	dpaa2-switch: fix buffer pool seeding for control traffic Starting with commit `c50e747596` ("dpaa2-switch: Fix error checking in dpaa2_switch_seed_bp()"), the probing of a second DPSW object errors out like below. fsl_dpaa2_switch dpsw.1: fsl_mc_driver_probe failed: -12 fsl_dpaa2_switch dpsw.1: probe with driver fsl_dpaa2_switch failed with error -12 The aforementioned commit brought to the surface the fact that seeding buffers into the buffer pool destined for control traffic is not successful and an access violation recoverable error can be seen in the MC firmware log: [E, qbman_rec_isr:391, QBMAN] QBMAN recoverable event 0x1000000 This happens because the driver incorrectly used the ID of the DPBP object instead of the hardware buffer pool ID when trying to release buffers into it. This is because any DPSW object uses two buffer pools, one managed by the Linux driver and destined for control traffic packet buffers and the other one managed by the MC firmware and destined only for offloaded traffic. And since the buffer pool managed by the MC firmware does not have an external facing DPBP equivalent, any subsequent DPBP objects created after the first DPSW will have a DPBP id different to the underlying hardware buffer ID. The issue was not caught earlier because these two numbers can be identical when all DPBP objects are created before the DPSW objects are. This is the case when the DPL file is used to describe the entire DPAA2 object layout and objects are created at boot time and it's also true for the first DPSW being created dynamically using ls-addsw. Fix this by using the buffer pool ID instead of the DPBP id when releasing buffers into the pool. Fixes: `2877e4f7e1` ("staging: dpaa2-switch: setup buffer pool and RX path rings") Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://patch.msgid.link/20250910144825.2416019-1-ioana.ciornei@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-11 18:51:25 -07:00
Li Tian	5577352b55	net/mlx5: Not returning mlx5_link_info table when speed is unknown Because mlx5e_link_info and mlx5e_ext_link_info have holes e.g. Azure mlx5 reports PTYS 19. Do not return it unless speed is retrieved successfully. Fixes: `65a5d35571` ("net/mlx5: Refactor link speed handling with mlx5_link_info struct") Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Li Tian <litian@redhat.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250910003732.5973-1-litian@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-11 18:07:42 -07:00
Samiullah Khawaja	247981eecd	net: Use NAPI_* in test_bit when stopping napi kthread napi_stop_kthread waits for the NAPI_STATE_SCHED_THREADED to be unset before stopping the kthread. But it uses test_bit with the NAPIF_STATE_SCHED_THREADED and that might stop the kthread early before the flag is unset. Use the NAPI_* variant of the NAPI state bits in test_bit instead. Tested: ./tools/testing/selftests/net/nl_netdev.py TAP version 13 1..7 ok 1 nl_netdev.empty_check ok 2 nl_netdev.lo_check ok 3 nl_netdev.page_pool_check ok 4 nl_netdev.napi_list_check ok 5 nl_netdev.dev_set_threaded ok 6 nl_netdev.napi_set_threaded ok 7 nl_netdev.nsim_rxq_reset_down # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0 ./tools/testing/selftests/drivers/net/napi_threaded.py TAP version 13 1..2 ok 1 napi_threaded.change_num_queues ok 2 napi_threaded.enable_dev_threaded_disable_napi_threaded # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0 Fixes: `689883de94` ("net: stop napi kthreads when THREADED napi is disabled") Signed-off-by: Samiullah Khawaja <skhawaja@google.com> Link: https://patch.msgid.link/20250910203716.1016546-1-skhawaja@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-11 17:52:42 -07:00
Linus Torvalds	db87bd2ad1	Merge tag 'net-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from CAN, netfilter and wireless. We have an IPv6 routing regression with the relevant fix still a WiP. This includes a last-minute revert to avoid more problems. Current release - new code bugs: - wifi: nl80211: completely disable per-link stats for now Previous releases - regressions: - dev_ioctl: take ops lock in hwtstamp lower paths - netfilter: - fix spurious set lookup failures - fix lockdep splat due to missing annotation - genetlink: fix genl_bind() invoking bind() after -EPERM - phy: transfer phy_config_inband() locking responsibility to phylink - can: xilinx_can: fix use-after-free of transmitted SKB - hsr: fix lock warnings - eth: - igb: fix NULL pointer dereference in ethtool loopback test - i40e: fix Jumbo Frame support after iPXE boot - macsec: sync features on RTM_NEWLINK Previous releases - always broken: - tunnels: reset the GSO metadata before reusing the skb - mptcp: make sync_socket_options propagate SOCK_KEEPOPEN - can: j1939: implement NETDEV_UNREGISTER notification hanidler - wifi: ath12k: fix WMI TLV header misalignment" * tag 'net-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits) Revert "net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups" hsr: hold rcu and dev lock for hsr_get_port_ndev hsr: use hsr_for_each_port_rtnl in hsr_port_get_hsr hsr: use rtnl lock when iterating over ports wifi: nl80211: completely disable per-link stats for now net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups net: ethtool: fix wrong type used in struct kernel_ethtool_ts_info MAINTAINERS: add Phil as netfilter reviewer netfilter: nf_tables: restart set lookup on base_seq change netfilter: nf_tables: make nft_set_do_lookup available unconditionally netfilter: nf_tables: place base_seq in struct net netfilter: nft_set_rbtree: continue traversal if element is inactive netfilter: nft_set_pipapo: don't check genbit from packetpath lookups netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation can: rcar_can: rcar_can_resume(): fix s2ram with PSCI can: xilinx_can: xcan_write_frame(): fix use-after-free of transmitted SKB can: j1939: j1939_local_ecu_get(): undo increment when j1939_local_ecu_get() fails can: j1939: j1939_sk_bind(): call j1939_priv_put() immediately when j1939_local_ecu_get() failed can: j1939: implement NETDEV_UNREGISTER notification handler selftests: can: enable CONFIG_CAN_VCAN as a module ...	2025-09-11 08:54:42 -07:00
Linus Torvalds	e59a039119	Merge tag 's390-6.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Alexander Gordeev: - ptep_modify_prot_start() may be called in a loop, which might lead to the preempt_count overflow due to the unnecessary preemption disabling. Do not disable preemption to prevent the overflow - Events of type PERF_TYPE_HARDWARE are not tested for sampling and return -EOPNOTSUPP eventually. Instead, deny all sampling events by CPUMF counter facility and return -ENOENT to allow other PMUs to be tried - The PAI PMU driver returns -EINVAL if an event out of its range. That aborts a search for an alternative PMU driver. Instead, return -ENOENT to allow other PMUs to be tried * tag 's390-6.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/cpum_cf: Deny all sampling events by counter PMU s390/pai: Deny all events not handled by this PMU s390/mm: Prevent possible preempt_count overflow	2025-09-11 08:46:30 -07:00
Linus Torvalds	a1228f048a	Merge tag 'pm-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix a nasty hibernation regression introduced during the 6.16 cycle, an issue related to energy model management occurring on Intel hybrid systems where some CPUs are offline to start with, and two regressions in the amd-pstate driver: - Restore a pm_restrict_gfp_mask() call in hibernation_snapshot() that was removed incorrectly during the 6.16 development cycle (Rafael Wysocki) - Introduce a function for registering a perf domain without triggering a system-wide CPU capacity update and make the intel_pstate driver use it to avoid reocurring unsuccessful attempts to update capacities of all CPUs in the system (Rafael Wysocki) - Fix setting of CPPC.min_perf in the active mode with performance governor in the amd-pstate driver to restore its expected behavior changed recently (Gautham Shenoy) - Avoid mistakenly setting EPP to 0 in the amd-pstate driver after system resume as a result of recent code changes (Mario Limonciello)" * tag 'pm-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: hibernate: Restrict GFP mask in hibernation_snapshot() PM: EM: Add function for registering a PD without capacity update cpufreq/amd-pstate: Fix a regression leading to EPP 0 after resume cpufreq/amd-pstate: Fix setting of CPPC.min_perf in active mode for performance governor	2025-09-11 08:11:16 -07:00
Linus Torvalds	b10c31b70b	Merge tag 'for-6.17-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix delayed inode tracking in xarray, eviction can race with insertion and leave behind a disconnected inode - on systems with large page (64K) and small block size (4K) fix compression read that can return partially filled folio - slightly relax compression option format for backward compatibility, allow to specify level for LZO although there's only one - fix simple quota accounting of compressed extents - validate minimum device size in 'device add' - update maintainers' entry * tag 'for-6.17-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: don't allow adding block device of less than 1 MB MAINTAINERS: update btrfs entry btrfs: fix subvolume deletion lockup caused by inodes xarray race btrfs: fix corruption reading compressed range when block size is smaller than page size btrfs: accept and ignore compression level for lzo btrfs: fix squota compressed stats leak	2025-09-11 08:01:18 -07:00
Linus Torvalds	02ffd6f89c	Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Pull bpf fixes from Alexei Starovoitov: "A number of fixes accumulated due to summer vacations - Fix out-of-bounds dynptr write in bpf_crypto_crypt() kfunc which was misidentified as a security issue (Daniel Borkmann) - Update the list of BPF selftests maintainers (Eduard Zingerman) - Fix selftests warnings with icecc compiler (Ilya Leoshkevich) - Disable XDP/cpumap direct return optimization (Jesper Dangaard Brouer) - Fix unexpected get_helper_proto() result in unusual configuration BPF_SYSCALL=y and BPF_EVENTS=n (Jiri Olsa) - Allow fallback to interpreter when JIT support is limited (KaFai Wan) - Fix rqspinlock and choose trylock fallback for NMI waiters. Pick the simplest fix. More involved fix is targeted bpf-next (Kumar Kartikeya Dwivedi) - Fix cleanup when tcp_bpf_send_verdict() fails to allocate psock->cork (Kuniyuki Iwashima) - Disallow bpf_timer in PREEMPT_RT for now. Proper solution is being discussed for bpf-next. (Leon Hwang) - Fix XSK cq descriptor production (Maciej Fijalkowski) - Tell memcg to use allow_spinning=false path in bpf_timer_init() to avoid lockup in cgroup_file_notify() (Peilin Ye) - Fix bpf_strnstr() to handle suffix match cases (Rong Tao)" * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Skip timer cases when bpf_timer is not supported bpf: Reject bpf_timer for PREEMPT_RT tcp_bpf: Call sk_msg_free() when tcp_bpf_send_verdict() fails to allocate psock->cork. bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init() bpf: Allow fall back to interpreter for programs with stack size <= 512 rqspinlock: Choose trylock fallback for NMI waiters xsk: Fix immature cq descriptor production bpf: Update the list of BPF selftests maintainers selftests/bpf: Add tests for bpf_strnstr selftests/bpf: Fix "expression result unused" warnings with icecc bpf: Fix bpf_strnstr() to handle suffix match cases better selftests/bpf: Extend crypto_sanity selftest with invalid dst buffer bpf: Fix out-of-bounds dynptr write in bpf_crypto_crypt bpf: Check the helper function is valid in get_helper_proto bpf, cpumap: Disable page_pool direct xdp_return need larger scope	2025-09-11 07:54:16 -07:00
Paolo Abeni	63a796558b	Revert "net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups" This reverts commit `5537a46794` ("net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups"), it breaks operation of asix ethernet usb dongle after system suspend-resume cycle. Link: https://lore.kernel.org/all/b5ea8296-f981-445d-a09a-2f389d7f6fdd@samsung.com/ Fixes: `5537a46794` ("net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/2945b9dbadb8ee1fee058b19554a5cb14f1763c1.1757601118.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-11 16:46:04 +02:00
Rafael J. Wysocki	bddce1c7a5	Merge branches 'pm-sleep' and 'pm-em' Merge a hibernation regression fix and an fix related to energy model management for 6.17-rc6 * pm-sleep: PM: hibernate: Restrict GFP mask in hibernation_snapshot() * pm-em: PM: EM: Add function for registering a PD without capacity update	2025-09-11 14:22:35 +02:00
Paolo Abeni	62e1de1d33	Merge tag 'wireless-2025-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Some more fixes: - iwlwifi: fix 130/1030 devices - ath12k: fix alignment, power save - virt_wifi: fix crash - cfg80211: disable per-link stats due to buffer size issues * tag 'wireless-2025-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: nl80211: completely disable per-link stats for now wifi: virt_wifi: Fix page fault on connect wifi: cfg80211: Fix "no buffer space available" error in nl80211_get_station() for MLO wifi: iwlwifi: fix 130/1030 configs wifi: ath12k: fix WMI TLV header misalignment wifi: ath12k: Fix missing station power save configuration ==================== Link: https://patch.msgid.link/20250911100345.20025-3-johannes@sipsolutions.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-11 12:49:53 +02:00
Paolo Abeni	9b1fbd3539	Merge branch 'hsr-fix-lock-warnings' Hangbin Liu says: ==================== hsr: fix lock warnings hsr_for_each_port is called in many places without holding the RCU read lock, this may trigger warnings on debug kernels like: [ 40.457015] [ T201] WARNING: suspicious RCU usage [ 40.457020] [ T201] 6.17.0-rc2-virtme #1 Not tainted [ 40.457025] [ T201] ----------------------------- [ 40.457029] [ T201] net/hsr/hsr_main.c:137 RCU-list traversed in non-reader section!! [ 40.457036] [ T201] other info that might help us debug this: [ 40.457040] [ T201] rcu_scheduler_active = 2, debug_locks = 1 [ 40.457045] [ T201] 2 locks held by ip/201: [ 40.457050] [ T201] #0: ffffffff93040a40 (&ops->srcu){.+.+}-{0:0}, at: rtnl_link_ops_get+0xf2/0x280 [ 40.457080] [ T201] #1: ffffffff92e7f968 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x5e1/0xb20 [ 40.457102] [ T201] stack backtrace: [ 40.457108] [ T201] CPU: 2 UID: 0 PID: 201 Comm: ip Not tainted 6.17.0-rc2-virtme #1 PREEMPT(full) [ 40.457114] [ T201] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 40.457117] [ T201] Call Trace: [ 40.457120] [ T201] <TASK> [ 40.457126] [ T201] dump_stack_lvl+0x6f/0xb0 [ 40.457136] [ T201] lockdep_rcu_suspicious.cold+0x4f/0xb1 [ 40.457148] [ T201] hsr_port_get_hsr+0xfe/0x140 [ 40.457158] [ T201] hsr_add_port+0x192/0x940 [ 40.457167] [ T201] ? __pfx_hsr_add_port+0x10/0x10 [ 40.457176] [ T201] ? lockdep_init_map_type+0x5c/0x270 [ 40.457189] [ T201] hsr_dev_finalize+0x4bc/0xbf0 [ 40.457204] [ T201] hsr_newlink+0x3c3/0x8f0 [ 40.457212] [ T201] ? __pfx_hsr_newlink+0x10/0x10 [ 40.457222] [ T201] ? rtnl_create_link+0x173/0xe40 [ 40.457233] [ T201] rtnl_newlink_create+0x2cf/0x750 [ 40.457243] [ T201] ? __pfx_rtnl_newlink_create+0x10/0x10 [ 40.457247] [ T201] ? __dev_get_by_name+0x12/0x50 [ 40.457252] [ T201] ? rtnl_dev_get+0xac/0x140 [ 40.457259] [ T201] ? __pfx_rtnl_dev_get+0x10/0x10 [ 40.457285] [ T201] __rtnl_newlink+0x22c/0xa50 [ 40.457305] [ T201] rtnl_newlink+0x637/0xb20 Adding rcu_read_lock() for all hsr_for_each_port() looks confusing. Introduce a new helper, hsr_for_each_port_rtnl(), that assumes the RTNL lock is held. This allows callers in suitable contexts to iterate ports safely without explicit RCU locking. Other code paths that rely on RCU protection continue to use hsr_for_each_port() with rcu_read_lock(). ==================== Link: https://patch.msgid.link/20250905091533.377443-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-11 11:49:29 +02:00
Hangbin Liu	847748fc66	hsr: hold rcu and dev lock for hsr_get_port_ndev hsr_get_port_ndev calls hsr_for_each_port, which need to hold rcu lock. On the other hand, before return the port device, we need to hold the device reference to avoid UaF in the caller function. Suggested-by: Paolo Abeni <pabeni@redhat.com> Fixes: `9c10dd8eed` ("net: hsr: Create and export hsr_get_port_ndev()") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250905091533.377443-4-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-11 11:49:19 +02:00
Hangbin Liu	393c841fe4	hsr: use hsr_for_each_port_rtnl in hsr_port_get_hsr hsr_port_get_hsr() iterates over ports using hsr_for_each_port(), but many of its callers do not hold the required RCU lock. Switch to hsr_for_each_port_rtnl(), since most callers already hold the rtnl lock. After review, all callers are covered by either the rtnl lock or the RCU lock, except hsr_dev_xmit(). Fix this by adding an RCU read lock there. Fixes: `c5a7591172` ("net/hsr: Use list_head (and rcu) instead of array for slave devices.") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250905091533.377443-3-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-11 11:49:19 +02:00
Hangbin Liu	8884c69399	hsr: use rtnl lock when iterating over ports hsr_for_each_port is called in many places without holding the RCU read lock, this may trigger warnings on debug kernels. Most of the callers are actually hold rtnl lock. So add a new helper hsr_for_each_port_rtnl to allow callers in suitable contexts to iterate ports safely without explicit RCU locking. This patch only fixed the callers that is hold rtnl lock. Other caller issues will be fixed in later patches. Fixes: `c5a7591172` ("net/hsr: Use list_head (and rcu) instead of array for slave devices.") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250905091533.377443-2-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-11 11:49:19 +02:00
Johannes Berg	c3f8d13357	wifi: nl80211: completely disable per-link stats for now After commit `8cc71fc3b8` ("wifi: cfg80211: Fix "no buffer space available" error in nl80211_get_station() for MLO"), the per-link data is only included in station dumps, where the size limit is somewhat less of an issue. However, it's still an issue, depending on how many links a station has and how much per-link data there is. Thus, for now, disable per-link statistics entirely. A complete fix will need to take this into account, make it opt-in by userspace, and change the dump format to be able to split a single station's data across multiple netlink dump messages, which all together is too much development for a fix. Fixes: `82d7f841d9` ("wifi: cfg80211: extend to embed link level statistics in NL message") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2025-09-11 08:50:31 +02:00
Linus Torvalds	4f553c1e2c	Merge tag 'mm-hotfixes-stable-2025-09-10-20-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "20 hotfixes. 15 are cc:stable and the remainder address post-6.16 issues or aren't considered necessary for -stable kernels. 14 of these fixes are for MM. This includes - kexec fixes from Breno for a recently introduced use-uninitialized bug - DAMON fixes from Quanmin Yan to avoid div-by-zero crashes which can occur if the operator uses poorly-chosen insmod parameters and misc singleton fixes" * tag 'mm-hotfixes-stable-2025-09-10-20-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: MAINTAINERS: add tree entry to numa memblocks and emulation block mm/damon/sysfs: fix use-after-free in state_show() proc: fix type confusion in pde_set_flags() compiler-clang.h: define __SANITIZE_*__ macros only when undefined mm/vmalloc, mm/kasan: respect gfp mask in kasan_populate_vmalloc() ocfs2: fix recursive semaphore deadlock in fiemap call mm/memory-failure: fix VM_BUG_ON_PAGE(PagePoisoned(page)) when unpoison memory mm/mremap: fix regression in vrm->new_addr check percpu: fix race on alloc failed warning limit mm/memory-failure: fix redundant updates for already poisoned pages s390: kexec: initialize kexec_buf struct riscv: kexec: initialize kexec_buf struct arm64: kexec: initialize kexec_buf struct in load_other_segments() mm/damon/reclaim: avoid divide-by-zero in damon_reclaim_apply_parameters() mm/damon/lru_sort: avoid divide-by-zero in damon_lru_sort_apply_parameters() mm/damon/core: set quota->charged_from to jiffies at first charge window mm/hugetlb: add missing hugetlb_lock in __unmap_hugepage_range() init/main.c: fix boot time tracing crash mm/memory_hotplug: fix hwpoisoned large folio handling in do_migrate_range() mm/khugepaged: fix the address passed to notifier on testing young	2025-09-10 21:19:34 -07:00
Linus Torvalds	223ba8ee0a	Merge tag 'vmscape-for-linus-20250904' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull vmescape mitigation fixes from Dave Hansen: "Mitigate vmscape issue with indirect branch predictor flushes. vmscape is a vulnerability that essentially takes Spectre-v2 and attacks host userspace from a guest. It particularly affects hypervisors like QEMU. Even if a hypervisor may not have any sensitive data like disk encryption keys, guest-userspace may be able to attack the guest-kernel using the hypervisor as a confused deputy. There are many ways to mitigate vmscape using the existing Spectre-v2 defenses like IBRS variants or the IBPB flushes. This series focuses solely on IBPB because it works universally across vendors and all vulnerable processors. Further work doing vendor and model-specific optimizations can build on top of this if needed / wanted. Do the normal issue mitigation dance: - Add the CPU bug boilerplate - Add a list of vulnerable CPUs - Use IBPB to flush the branch predictors after running guests" * tag 'vmscape-for-linus-20250904' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vmscape: Add old Intel CPUs to affected list x86/vmscape: Warn when STIBP is disabled with SMT x86/bugs: Move cpu_bugs_smt_update() down x86/vmscape: Enable the mitigation x86/vmscape: Add conditional IBPB mitigation x86/vmscape: Enumerate VMSCAPE bug Documentation/hw-vuln: Add VMSCAPE documentation	2025-09-10 20:52:16 -07:00
Jakub Kicinski	3a1a66d124	Merge tag 'nf-25-09-10-v2' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westpha says: ==================== netfilter pull request nf-25-09-10 First patch adds a lockdep annotation for a false-positive splat. Last patch adds formal reviewer tag for Phil Sutter to MAINTAINERS. Rest of the patches resolve spurious false negative results during set lookups while another CPU is processing a transaction. This has been broken at least since v4.18 when an unconditional synchronize_rcu call was removed from the commit phase of nf_tables. Quoting from Stefan Hanreichs original report: It seems like we've found an issue with atomicity when reloading nftables rulesets. Sometimes there is a small window where rules containing sets do not seem to apply to incoming traffic, due to the set apparently being empty for a short amount of time when flushing / adding elements. Exanple ruleset: table ip filter { set match { type ipv4_addr flags interval elements = { 0.0.0.0-192.168.2.19, 192.168.2.21-255.255.255.255 } } chain pre { type filter hook prerouting priority filter; policy accept; ip saddr @match accept counter comment "must never match" } } Reproducer transaction: while true: nft -f -<<EOF flush set ip filter match create element ip filter match { \ 0.0.0.0-192.168.2.19, 192.168.2.21-255.255.255.255 } EOF done Then create traffic. to/from e.g. 192.168.2.1 to 192.168.3.10. Once in a while the counter will increment even though the 'ip saddr @match' rule should have accepted the packet. See individual patches for details. Thanks to Stefan Hanreich for an initial description and reproducer for this bug and to Pablo Neira Ayuso for reviewing earlier iterations of the patchset. * tag 'nf-25-09-10-v2' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: MAINTAINERS: add Phil as netfilter reviewer netfilter: nf_tables: restart set lookup on base_seq change netfilter: nf_tables: make nft_set_do_lookup available unconditionally netfilter: nf_tables: place base_seq in struct net netfilter: nft_set_rbtree: continue traversal if element is inactive netfilter: nft_set_pipapo: don't check genbit from packetpath lookups netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation ==================== Link: https://patch.msgid.link/20250910190308.13356-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-10 19:36:49 -07:00
Jakub Kicinski	ccf78f7f05	Merge tag 'linux-can-fixes-for-6.17-20250910' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2025-09-10 The 1st patch is by Alex Tran and fixes the Documentation of the struct bcm_msg_head. Davide Caratti's patch enabled the VCAN driver as a module for the Linux self tests. Tetsuo Handa contributes 3 patches that fix various problems in the CAN j1939 protocol. Anssi Hannula's patch fixes a potential use-after-free in the xilinx_can driver. Geert Uytterhoeven's patch fixes the rcan_can's suspend to RAM on R-Car Gen3 using PSCI. * tag 'linux-can-fixes-for-6.17-20250910' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: rcar_can: rcar_can_resume(): fix s2ram with PSCI can: xilinx_can: xcan_write_frame(): fix use-after-free of transmitted SKB can: j1939: j1939_local_ecu_get(): undo increment when j1939_local_ecu_get() fails can: j1939: j1939_sk_bind(): call j1939_priv_put() immediately when j1939_local_ecu_get() failed can: j1939: implement NETDEV_UNREGISTER notification handler selftests: can: enable CONFIG_CAN_VCAN as a module docs: networking: can: change bcm_msg_head frames member to support flexible array ==================== Link: https://patch.msgid.link/20250910162907.948454-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-10 19:29:40 -07:00
Jakub Kicinski	a2ddf8a51c	Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-09-09 (igb, i40e) For igb: Tianyu Xu removes passing of, no longer needed, NAPI id to avoid NULL pointer dereference on ethtool loopback testing. Kohei Enju corrects reporting/testing of link state when interface is down. For i40e: Michal Schmidt corrects value being passed to free_irq(). Jake sets hardware maximum frame size on probe to ensure expected/consistent state. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: i40e: fix Jumbo Frame support after iPXE boot i40e: fix IRQ freeing in i40e_vsi_request_irq_msix error path igb: fix link test skipping when interface is admin down igb: Fix NULL pointer dereference in ethtool loopback test ==================== Link: https://patch.msgid.link/20250909203236.3603960-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-10 19:21:11 -07:00
Oleksij Rempel	5537a46794	net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups Drop phylink_{suspend,resume}() from ax88772 PM callbacks. MDIO bus accesses have their own runtime-PM handling and will try to wake the device if it is suspended. Such wake attempts must not happen from PM callbacks while the device PM lock is held. Since phylink {sus\|re}sume may trigger MDIO, it must not be called in PM context. No extra phylink PM handling is required for this driver: - .ndo_open/.ndo_stop control the phylink start/stop lifecycle. - ethtool/phylib entry points run in process context, not PM. - phylink MAC ops program the MAC on link changes after resume. Fixes: `e0bffe3e68` ("net: asix: ax88772: migrate to phylink") Reported-by: Hubert Wiśniewski <hubert.wisniewski.25632@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Tested-by: Hubert Wiśniewski <hubert.wisniewski.25632@gmail.com> Tested-by: Xu Yang <xu.yang_2@nxp.com> Link: https://patch.msgid.link/20250908112619.2900723-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-10 17:49:09 -07:00
Russell King (Oracle)	6fef6ae764	net: ethtool: fix wrong type used in struct kernel_ethtool_ts_info In C, enumerated types do not have a defined size, apart from being compatible with one of the standard types. This allows an ABI / compiler to choose the type of an enum depending on the values it needs to store, and storing larger values in it can lead to undefined behaviour. The tx_type and rx_filters members of struct kernel_ethtool_ts_info are defined as enumerated types, but are bit arrays, where each bit is defined by the enumerated type. This means they typically store values in excess of the maximum value of the enumerated type, in fact (1 << max_value) and thus must not be declared using the enumated type. Fix both of these to use u32, as per the corresponding __u32 UAPI type. Fixes: `2111375b85` ("net: Add struct kernel_ethtool_ts_info") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/E1uvMEK-00000003Amd-2pWR@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-10 17:43:28 -07:00
Linus Torvalds	7aac71907b	Merge tag 'nfs-for-6.17-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client fixes from Trond Myklebust: "Stable patches: - Revert "SUNRPC: Don't allow waiting for exiting tasks" as it is breaking ltp tests Bugfixes: - Another set of fixes to the tracking of NFSv4 server capabilities when crossing filesystem boundaries - Localio fix to restore credentials and prevent triggering a BUG_ON() - Fix to prevent flapping of the localio on/off trigger - Protections against 'eof page pollution' as demonstrated in xfstests generic/363 - Series of patches to ensure correct ordering of O_DIRECT i/o and truncate, fallocate and copy functions - Fix a NULL pointer check in flexfiles reads that regresses 6.17 - Correct a typo that breaks flexfiles layout segment processing" * tag 'nfs-for-6.17-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4/flexfiles: Fix layout merge mirror check. SUNRPC: call xs_sock_process_cmsg for all cmsg Revert "SUNRPC: Don't allow waiting for exiting tasks" NFS: Fix the marking of the folio as up to date NFS: nfs_invalidate_folio() must observe the offset and size arguments NFSv4.2: Serialise O_DIRECT i/o and copy range NFSv4.2: Serialise O_DIRECT i/o and clone range NFSv4.2: Serialise O_DIRECT i/o and fallocate() NFS: Serialise O_DIRECT i/o and truncate() NFSv4.2: Protect copy offload and clone against 'eof page pollution' NFS: Protect against 'eof page pollution' flexfiles/pNFS: fix NULL checks on result of ff_layout_choose_ds_for_read nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local() nfs/localio: restore creds before releasing pageio data NFSv4: Clear the NFS_CAP_XATTR flag if not supported by the server NFSv4: Clear NFS_CAP_OPEN_XOR and NFS_CAP_DELEGTIME if not supported NFSv4: Clear the NFS_CAP_FS_LOCATIONS flag if it is not set NFSv4: Don't clear capabilities that won't be reset	2025-09-10 12:38:41 -07:00
Alexei Starovoitov	91f34aaae0	Merge branch 'bpf-reject-bpf_timer-for-preempt_rt' Leon Hwang says: ==================== bpf: Reject bpf_timer for PREEMPT_RT While running './test_progs -t timer' to validate the test case from "selftests/bpf: Introduce experimental bpf_in_interrupt()"[0] for PREEMPT_RT, I encountered a kernel warning: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 To address this, reject bpf_timer usage in the verifier when PREEMPT_RT is enabled, and skip the corresponding timer selftests. Changes: v2 -> v3: * Drop skipping test case 'timer_interrupt'. * Address comments from Alexei: * Respin targeting bpf tree. * Trim commit log. v1 -> v2: * Skip test case 'timer_interrupt'. Links: [0] https://lore.kernel.org/bpf/20250903140438.59517-1-leon.hwang@linux.dev/ ==================== Link: https://patch.msgid.link/20250910125740.52172-1-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-10 12:34:09 -07:00
Leon Hwang	fbdd61c94b	selftests/bpf: Skip timer cases when bpf_timer is not supported When enable CONFIG_PREEMPT_RT, verifier will reject bpf_timer with returning -EOPNOTSUPP. Therefore, skip test cases when errno is EOPNOTSUPP. cd tools/testing/selftests/bpf ./test_progs -t timer 125 free_timer:SKIP 456 timer:SKIP 457/1 timer_crash/array:SKIP 457/2 timer_crash/hash:SKIP 457 timer_crash:SKIP 458 timer_lockup:SKIP 459 timer_mim:SKIP Summary: 5/0 PASSED, 6 SKIPPED, 0 FAILED Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20250910125740.52172-3-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-10 12:34:09 -07:00
Leon Hwang	e25ddfb388	bpf: Reject bpf_timer for PREEMPT_RT When enable CONFIG_PREEMPT_RT, the kernel will warn when run timer selftests by './test_progs -t timer': BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 In order to avoid such warning, reject bpf_timer in verifier when PREEMPT_RT is enabled. Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20250910125740.52172-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-10 12:34:09 -07:00
Linus Torvalds	1b5d4661c7	Merge tag 'trace-v6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Remove redundant __GFP_NOWARN flag is kmalloc As now __GFP_NOWARN is part of __GFP_NOWAIT, it can be removed from kmalloc as it is redundant. - Use copy_from_user_nofault() instead of _inatomic() for trace markers The trace_marker files are written to to allow user space to quickly write into the tracing ring buffer. Back in 2016, the get_user_pages_fast() and the kmap() logic was replaced by a __copy_from_user_inatomic(), but didn't properly disable page faults around it. Since the time this was added, copy_from_user_nofault() was added which does the required page fault disabling for us. - Fix the assembly markup in the ftrace direct sample code The ftrace direct sample code (which is also used for selftests), had the size directive between the "leave" and the "ret" instead of after the ret. This caused objtool to think the code was unreachable. - Only call unregister_pm_notifier() on outer most fgraph registration There was an error path in register_ftrace_graph() that did not call unregister_pm_notifier() on error, so it was added in the error path. The problem with that fix, is that register_pm_notifier() is only called by the initial user of fgraph. If that succeeds, but another fgraph registration were to fail, then unregister_pm_notifier() would be called incorrectly. - Fix a crash in osnoise when zero size cpumask is passed in If a zero size CPU mask is passed in, the kmalloc() would return ZERO_SIZE_PTR which is not checked, and the code would continue thinking it had real memory and crash. If zero is passed in as the size of the write, simply return 0. - Fix possible warning in trace_pid_write() If while processing a series of numbers passed to the "set_event_pid" file, and one of the updates fails to allocate (triggered by a fault injection), it can cause a warning to trigger. Check the return value of the call to trace_pid_list_set() and break out early with an error code if it fails. * tag 'trace-v6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Silence warning when chunk allocation fails in trace_pid_write tracing/osnoise: Fix null-ptr-deref in bitmap_parselist() trace/fgraph: Fix error handling ftrace/samples: Fix function size computation tracing: Fix tracing_marker may trigger page fault during preempt_disable trace: Remove redundant __GFP_NOWARN	2025-09-10 12:03:47 -07:00
Rafael J. Wysocki	449c9c0253	PM: hibernate: Restrict GFP mask in hibernation_snapshot() Commit `12ffc3b151` ("PM: Restrict swap use to later in the suspend sequence") incorrectly removed a pm_restrict_gfp_mask() call from hibernation_snapshot(), so memory allocations involving swap are not prevented from being carried out in this code path any more which may lead to serious breakage. The symptoms of such breakage have become visible after adding a shrink_shmem_memory() call to hibernation_snapshot() in commit `2640e81947` ("PM: hibernate: shrink shmem pages after dev_pm_ops.prepare()") which caused this problem to be much more likely to manifest itself. However, since commit `2640e81947` was initially present in the DRM tree that did not include commit `12ffc3b151`, the symptoms of this issue were not visible until merge commit `260f6f4fda` ("Merge tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel") that exposed it through an entirely reasonable merge conflict resolution. Fixes: `12ffc3b151` ("PM: Restrict swap use to later in the suspend sequence") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220555 Reported-by: Todd Brandt <todd.e.brandt@linux.intel.com> Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com> Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>	2025-09-10 20:36:43 +02:00
Florian Westphal	37a9675e61	MAINTAINERS: add Phil as netfilter reviewer Phil has contributed to netfilter with features, fixes and patch reviews for a long time. Make this more formal and add Reviewer tag. Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-10 20:32:46 +02:00
Florian Westphal	b2f742c846	netfilter: nf_tables: restart set lookup on base_seq change The hash, hash_fast, rhash and bitwise sets may indicate no result even though a matching element exists during a short time window while other cpu is finalizing the transaction. This happens when the hash lookup/bitwise lookup function has picked up the old genbit, right before it was toggled by nf_tables_commit(), but then the same cpu managed to unlink the matching old element from the hash table: cpu0 cpu1 has added new elements to clone has marked elements as being inactive in new generation perform lookup in the set enters commit phase: A) observes old genbit increments base_seq I) increments the genbit II) removes old element from the set B) finds matching element C) returns no match: found element is not valid in old generation Next lookup observes new genbit and finds matching e2. Consider a packet matching element e1, e2. cpu0 processes following transaction: 1. remove e1 2. adds e2, which has same key as e1. P matches both e1 and e2. Therefore, cpu1 should always find a match for P. Due to above race, this is not the case: cpu1 observed the old genbit. e2 will not be considered once it is found. The element e1 is not found anymore if cpu0 managed to unlink it from the hlist before cpu1 found it during list traversal. The situation only occurs for a brief time period, lookups happening after I) observe new genbit and return e2. This problem exists in all set types except nft_set_pipapo, so fix it once in nft_lookup rather than each set ops individually. Sample the base sequence counter, which gets incremented right before the genbit is changed. Then, if no match is found, retry the lookup if the base sequence was altered in between. If the base sequence hasn't changed: - No update took place: no-match result is expected. This is the common case. or: - nf_tables_commit() hasn't progressed to genbit update yet. Old elements were still visible and nomatch result is expected, or: - nf_tables_commit updated the genbit: We picked up the new base_seq, so the lookup function also picked up the new genbit, no-match result is expected. If the old genbit was observed, then nft_lookup also picked up the old base_seq: nft_lookup_should_retry() returns true and relookup is performed in the new generation. This problem was added when the unconditional synchronize_rcu() call that followed the current/next generation bit toggle was removed. Thanks to Pablo Neira Ayuso for reviewing an earlier version of this patchset, for suggesting re-use of existing base_seq and placement of the restart loop in nft_set_do_lookup(). Fixes: `0cbc06b3fa` ("netfilter: nf_tables: remove synchronize_rcu in commit phase") Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-10 20:30:37 +02:00
Florian Westphal	11fe5a82e5	netfilter: nf_tables: make nft_set_do_lookup available unconditionally This function was added for retpoline mitigation and is replaced by a static inline helper if mitigations are not enabled. Enable this helper function unconditionally so next patch can add a lookup restart mechanism to fix possible false negatives while transactions are in progress. Adding lookup restarts in nft_lookup_eval doesn't work as nft_objref would then need the same copypaste loop. This patch is separate to ease review of the actual bug fix. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-10 20:30:37 +02:00
Florian Westphal	64102d9bbc	netfilter: nf_tables: place base_seq in struct net This will soon be read from packet path around same time as the gencursor. Both gencursor and base_seq get incremented almost at the same time, so it makes sense to place them in the same structure. This doesn't increase struct net size on 64bit due to padding. Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-10 20:30:37 +02:00
Florian Westphal	a60f7bf4a1	netfilter: nft_set_rbtree: continue traversal if element is inactive When the rbtree lookup function finds a match in the rbtree, it sets the range start interval to a potentially inactive element. Then, after tree lookup, if the matching element is inactive, it returns NULL and suppresses a matching result. This is wrong and leads to false negative matches when a transaction has already entered the commit phase. cpu0 cpu1 has added new elements to clone has marked elements as being inactive in new generation perform lookup in the set enters commit phase: I) increments the genbit A) observes new genbit B) finds matching range C) returns no match: found range invalid in new generation II) removes old elements from the tree C New nft_lookup happening now will find matching element, because it is no longer obscured by old, inactive one. Consider a packet matching range r1-r2: cpu0 processes following transaction: 1. remove r1-r2 2. add r1-r3 P is contained in both ranges. Therefore, cpu1 should always find a match for P. Due to above race, this is not the case: cpu1 does find r1-r2, but then ignores it due to the genbit indicating the range has been removed. It does NOT test for further matches. The situation persists for all lookups until after cpu0 hits II) after which r1-r3 range start node is tested for the first time. Move the "interval start is valid" check ahead so that tree traversal continues if the starting interval is not valid in this generation. Thanks to Stefan Hanreich for providing an initial reproducer for this bug. Reported-by: Stefan Hanreich <s.hanreich@proxmox.com> Fixes: `c1eda3c639` ("netfilter: nft_rbtree: ignore inactive matching element with no descendants") Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-10 20:30:37 +02:00
Florian Westphal	c4eaca2e10	netfilter: nft_set_pipapo: don't check genbit from packetpath lookups The pipapo set type is special in that it has two copies of its datastructure: one live copy containing only valid elements and one on-demand clone used during transaction where adds/deletes happen. This clone is not visible to the datapath. This is unlike all other set types in nftables, those all link new elements into their live hlist/tree. For those sets, the lookup functions must skip the new elements while the transaction is ongoing to ensure consistency. As the clone is shallow, removal does have an effect on the packet path: once the transaction enters the commit phase the 'gencursor' bit that determines which elements are active and which elements should be ignored (because they are no longer valid) is flipped. This causes the datapath lookup to ignore these elements if they are found during lookup. This opens up a small race window where pipapo has an inconsistent view of the dataset from when the transaction-cpu flipped the genbit until the transaction-cpu calls nft_pipapo_commit() to swap live/clone pointers: cpu0 cpu1 has added new elements to clone has marked elements as being inactive in new generation perform lookup in the set enters commit phase: I) increments the genbit A) observes new genbit removes elements from the clone so they won't be found anymore B) lookup in datastructure can't see new elements yet, but old elements are ignored -> Only matches elements that were not changed in the transaction II) calls nft_pipapo_commit(), clone and live pointers are swapped. C New nft_lookup happening now will find matching elements. Consider a packet matching range r1-r2: cpu0 processes following transaction: 1. remove r1-r2 2. add r1-r3 P is contained in both ranges. Therefore, cpu1 should always find a match for P. Due to above race, this is not the case: cpu1 does find r1-r2, but then ignores it due to the genbit indicating the range has been removed. At the same time, r1-r3 is not visible yet, because it can only be found in the clone. The situation persists for all lookups until after cpu0 hits II). The fix is easy: Don't check the genbit from pipapo lookup functions. This is possible because unlike the other set types, the new elements are not reachable from the live copy of the dataset. The clone/live pointer swap is enough to avoid matching on old elements while at the same time all new elements are exposed in one go. After this change, step B above returns a match in r1-r2. This is fine: r1-r2 only becomes truly invalid the moment they get freed. This happens after a synchronize_rcu() call and rcu read lock is held via netfilter hook traversal (nf_hook_slow()). Cc: Stefano Brivio <sbrivio@redhat.com> Fixes: `3c4287f620` ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Florian Westphal <fw@strlen.de>	2025-09-10 20:30:37 +02:00

1 2 3 4 5 ...

1383165 Commits