linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-07 23:20:32 -04:00

Author	SHA1	Message	Date
Linus Torvalds	d72052ac09	Merge tag 'sched_ext-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: - Fix a subtle bug during SCX enabling where a dead task skips init but doesn't skip sched class switch leading to invalid task state transition warning - Cosmetic fix in selftests * tag 'sched_ext-for-6.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: selftests/sched_ext: Remove duplicate sched.h header sched/ext: Fix invalid task state transitions on class switch	2025-08-21 16:02:35 -04:00
Linus Torvalds	6439a0e64c	Merge tag 'net-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from Bluetooth. Current release - fix to a fix: - usb: asix_devices: fix PHY address mask in MDIO bus initialization Current release - regressions: - Bluetooth: fixes for the split between BIS_LINK and PA_LINK - Revert "net: cadence: macb: sama7g5_emac: Remove USARIO CLKEN flag", breaks compatibility with some existing device tree blobs - dsa: b53: fix reserved register access in b53_fdb_dump() Current release - new code bugs: - sched: dualpi2: run probability update timer in BH to avoid deadlock - eth: libwx: fix the size in RSS hash key population - pse-pd: pd692x0: improve power budget error paths and handling Previous releases - regressions: - tls: fix handling of zero-length records on the rx_list - hsr: reject HSR frame if skb can't hold tag - bonding: fix negotiation flapping in 802.3ad passive mode Previous releases - always broken: - gso: forbid IPv6 TSO with extensions on devices with only IPV6_CSUM - sched: make cake_enqueue return NET_XMIT_CN when past buffer_limit, avoid packet drops with low buffer_limit, remove unnecessary WARN() - sched: fix backlog accounting after modifying config of a qdisc in the middle of the hierarchy - mptcp: improve handling of skb extension allocation failures - eth: mlx5: - fixes for the "HW Steering" flow management method - fixes for QoS and device buffer management" * tag 'net-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits) netfilter: nf_reject: don't leak dst refcount for loopback packets net/mlx5e: Preserve shared buffer capacity during headroom updates net/mlx5e: Query FW for buffer ownership net/mlx5: Restore missing scheduling node cleanup on vport enable failure net/mlx5: Fix QoS reference leak in vport enable error path net/mlx5: Destroy vport QoS element when no configuration remains net/mlx5e: Preserve tc-bw during parent changes net/mlx5: Remove default QoS group and attach vports directly to root TSAR net/mlx5: Base ECVF devlink port attrs from 0 net: pse-pd: pd692x0: Skip power budget configuration when undefined net: pse-pd: pd692x0: Fix power budget leak in manager setup error path Octeontx2-af: Skip overlap check for SPI field selftests: tls: add tests for zero-length records tls: fix handling of zero-length records on the rx_list net: airoha: ppe: Do not invalid PPE entries in case of SW hash collision selftests: bonding: add test for passive LACP mode bonding: send LACPDUs periodically in passive mode after receiving partner's LACPDU bonding: update LACP activity flag after setting lacp_active Revert "net: cadence: macb: sama7g5_emac: Remove USARIO CLKEN flag" ipv6: sr: Fix MAC comparison to be constant-time ...	2025-08-21 13:51:15 -04:00
Florian Westphal	91a79b7922	netfilter: nf_reject: don't leak dst refcount for loopback packets recent patches to add a WARN() when replacing skb dst entry found an old bug: WARNING: include/linux/skbuff.h:1165 skb_dst_check_unset include/linux/skbuff.h:1164 [inline] WARNING: include/linux/skbuff.h:1165 skb_dst_set include/linux/skbuff.h:1210 [inline] WARNING: include/linux/skbuff.h:1165 nf_reject_fill_skb_dst+0x2a4/0x330 net/ipv4/netfilter/nf_reject_ipv4.c:234 [..] Call Trace: nf_send_unreach+0x17b/0x6e0 net/ipv4/netfilter/nf_reject_ipv4.c:325 nft_reject_inet_eval+0x4bc/0x690 net/netfilter/nft_reject_inet.c:27 expr_call_ops_eval net/netfilter/nf_tables_core.c:237 [inline] .. This is because blamed commit forgot about loopback packets. Such packets already have a dst_entry attached, even at PRE_ROUTING stage. Instead of checking hook just check if the skb already has a route attached to it. Fixes: `f53b9b0bdc` ("netfilter: introduce support for reject at prerouting stage") Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20250820123707.10671-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-21 10:02:00 -07:00
Jakub Kicinski	1b78236a05	Merge branch 'mlx5-misx-fixes-2025-08-20' Mark Bloch says: ==================== mlx5 misx fixes 2025-08-20 This patchset provides misc bug fixes from the team to the mlx5 core and Eth drivers. v1: https://lore.kernel.org/1755095476-414026-1-git-send-email-tariqt@nvidia.com ==================== Link: https://patch.msgid.link/20250820133209.389065-1-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-21 07:58:36 -07:00
Armen Ratner	8b0587a885	net/mlx5e: Preserve shared buffer capacity during headroom updates When port buffer headroom changes, port_update_shared_buffer() recalculates the shared buffer size and splits it in a 3:1 ratio (lossy:lossless) - Currently, the calculation is: lossless = shared / 4; lossy = (shared / 4) * 3; Meaning, the calculation dropped the remainder of shared % 4 due to integer division, unintentionally reducing the total shared buffer by up to three cells on each update. Over time, this could shrink the buffer below usable size. Fix it by changing the calculation to: lossless = shared / 4; lossy = shared - lossless; This retains all buffer cells while still approximating the intended 3:1 split, preventing capacity loss over time. While at it, perform headroom calculations in units of cells rather than in bytes for more accurate calculations avoiding extra divisions. Fixes: `a440030d89` ("net/mlx5e: Update shared buffer along with device buffer changes") Signed-off-by: Armen Ratner <armeng@nvidia.com> Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Alexei Lazar <alazar@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/20250820133209.389065-9-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-21 07:58:33 -07:00
Alexei Lazar	451d2849ea	net/mlx5e: Query FW for buffer ownership The SW currently saves local buffer ownership when setting the buffer. This means that the SW assumes it has ownership of the buffer after the command is set. If setting the buffer fails and we remain in FW ownership, the local buffer ownership state incorrectly remains as SW-owned. This leads to incorrect behavior in subsequent PFC commands, causing failures. Instead of saving local buffer ownership in SW, query the FW for buffer ownership when setting the buffer. This ensures that the buffer ownership state is accurately reflected, avoiding the issues caused by incorrect ownership states. Fixes: `ecdf2dadee` ("net/mlx5e: Receive buffer support for DCBX") Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250820133209.389065-8-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-21 07:58:32 -07:00
Carolina Jubran	51b17c98e3	net/mlx5: Restore missing scheduling node cleanup on vport enable failure Restore the __esw_qos_free_node() call removed by the offending commit. Fixes: `97733d1e00` ("net/mlx5: Add traffic class scheduling support for vport QoS") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250820133209.389065-7-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-21 07:58:32 -07:00
Carolina Jubran	3c114fb2af	net/mlx5: Fix QoS reference leak in vport enable error path Add missing esw_qos_put() call when __esw_qos_alloc_node() fails in mlx5_esw_qos_vport_enable(). Fixes: `be034baba8` ("net/mlx5: Make vport QoS enablement more flexible for future extensions") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250820133209.389065-6-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-21 07:58:31 -07:00
Carolina Jubran	b697ef4d1d	net/mlx5: Destroy vport QoS element when no configuration remains If a VF has been configured and the user later clears all QoS settings, the vport element remains in the firmware QoS tree. This leads to inconsistent behavior compared to VFs that were never configured, since the FW assumes that unconfigured VFs are outside the QoS hierarchy. As a result, the bandwidth share across VFs may differ, even though none of them appear to have any configuration. Align the driver behavior with the FW expectation by destroying the vport QoS element when all configurations are removed. Fixes: `c9497c9890` ("net/mlx5: Add support for setting VF min rate") Fixes: `cf7e73770d` ("net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/20250820133209.389065-5-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-21 07:58:31 -07:00
Carolina Jubran	e8f973576c	net/mlx5e: Preserve tc-bw during parent changes When changing parent of a node/leaf with tc-bw configured, the code saves and restores tc-bw values. However, it was reading the converted hardware bw_share values (where 0 becomes 1) instead of the original user values, causing incorrect tc-bw calculations after parent change. Store original tc-bw values in the node structure and use them directly for save/restore operations. Fixes: `cf7e73770d` ("net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250820133209.389065-4-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-21 07:58:31 -07:00
Carolina Jubran	330f0f6713	net/mlx5: Remove default QoS group and attach vports directly to root TSAR Currently, the driver creates a default group (`node0`) and attaches all vports to it unless the user explicitly sets a parent group. As a result, when a user configures tx_share on a group and tx_share on a VF, the expectation is for the group and the VF to share bandwidth relatively. However, since the VF is not connected to the same parent (but to the default node), the proportional share logic is not applied correctly. To fix this, remove the default group (`node0`) and instead connect vports directly to the root TSAR when no parent is specified. This ensures that vports and groups share the same root scheduler and their tx_share values are compared directly under the same hierarchy. Fixes: `0fe132eac3` ("net/mlx5: E-switch, Allow to add vports to rate groups") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250820133209.389065-3-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-21 07:58:30 -07:00
Daniel Jurgens	bc17455bc8	net/mlx5: Base ECVF devlink port attrs from 0 Adjust the vport number by the base ECVF vport number so the port attributes start at 0. Previously the port attributes would start 1 after the maximum number of host VFs. Fixes: `dc13180824` ("net/mlx5: Enable devlink port for embedded cpu VF vports") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250820133209.389065-2-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-21 07:58:30 -07:00
Kory Maincent	7ef353879f	net: pse-pd: pd692x0: Skip power budget configuration when undefined If the power supply's power budget is not defined in the device tree, the current code still requests power and configures the PSE manager with a 0W power limit, which is undesirable behavior. Skip power budget configuration entirely when the budget is zero, avoiding unnecessary power requests and preventing invalid 0W limits from being set on the PSE manager. Fixes: `359754013e` ("net: pse-pd: pd692x0: Add support for PSE PI priority feature") Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250820133321.841054-1-kory.maincent@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-21 07:56:08 -07:00
Kory Maincent	1c67f9c54c	net: pse-pd: pd692x0: Fix power budget leak in manager setup error path Fix a resource leak where manager power budgets were freed on both success and error paths during manager setup. Power budgets should only be freed on error paths after regulator registration or during driver removal. Refactor cleanup logic by extracting OF node cleanup and power budget freeing into separate helper functions for better maintainability. Fixes: `359754013e` ("net: pse-pd: pd692x0: Add support for PSE PI priority feature") Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20250820132708.837255-1-kory.maincent@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-21 07:55:40 -07:00
Hariprasad Kelam	8c5d95988c	Octeontx2-af: Skip overlap check for SPI field Octeontx2/CN10K silicon supports generating a 256-bit key per packet. The specific fields to be extracted from a packet for key generation are configurable via a Key Extraction (MKEX) Profile. The AF driver scans the configured extraction profile to ensure that fields from upper layers do not overwrite fields from lower layers in the key. Example Packet Field Layout: LA: DMAC + SMAC LB: VLAN LC: IPv4/IPv6 LD: TCP/UDP Valid MKEX Profile Configuration: LA -> DMAC -> key_offset[0-5] LC -> SIP -> key_offset[20-23] LD -> SPORT -> key_offset[30-31] Invalid MKEX profile configuration: LA -> DMAC -> key_offset[0-5] LC -> SIP -> key_offset[20-23] LD -> SPORT -> key_offset[2-3] // Overlaps with DMAC field In another scenario, if the MKEX profile is configured to extract the SPI field from both AH and ESP headers at the same key offset, the driver rejecting this configuration. In a regular traffic, ipsec packet will be having either AH(LD) or ESP (LE). This patch relaxes the check for the same. Fixes: `12aa0a3b93` ("octeontx2-af: Harden rule validation.") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://patch.msgid.link/20250820063919.1463518-1-hkelam@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-21 07:52:54 -07:00
Jakub Kicinski	a61a3e961b	selftests: tls: add tests for zero-length records Test various combinations of zero-length records. Unfortunately, kernel cannot be coerced into producing those, so hardcode the ciphertext messages in the test. Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20250820021952.143068-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-21 07:52:31 -07:00
Jakub Kicinski	62708b9452	tls: fix handling of zero-length records on the rx_list Each recvmsg() call must process either - only contiguous DATA records (any number of them) - one non-DATA record If the next record has different type than what has already been processed we break out of the main processing loop. If the record has already been decrypted (which may be the case for TLS 1.3 where we don't know type until decryption) we queue the pending record to the rx_list. Next recvmsg() will pick it up from there. Queuing the skb to rx_list after zero-copy decrypt is not possible, since in that case we decrypted directly to the user space buffer, and we don't have an skb to queue (darg.skb points to the ciphertext skb for access to metadata like length). Only data records are allowed zero-copy, and we break the processing loop after each non-data record. So we should never zero-copy and then find out that the record type has changed. The corner case we missed is when the initial record comes from rx_list, and it's zero length. Reported-by: Muhammad Alifa Ramdhan <ramdhan@starlabs.sg> Reported-by: Billy Jheng Bing-Jhong <billy@starlabs.sg> Fixes: `84c61fe1a7` ("tls: rx: do not use the standard strparser") Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20250820021952.143068-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-21 07:52:30 -07:00
Linus Torvalds	1c656b1efd	Merge tag 'loongarch-fixes-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: "Fix a lot of build warnings for LTO-enabled objtool check, increase COMMAND_LINE_SIZE up to 4096, rename a missing GCC_PLUGIN_STACKLEAK to KSTACK_ERASE, and fix some bugs about arch timer, module loading, LBT and KVM" * tag 'loongarch-fixes-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: KVM: Add address alignment check in pch_pic register access LoongArch: KVM: Use kvm_get_vcpu_by_id() instead of kvm_get_vcpu() LoongArch: KVM: Fix stack protector issue in send_ipi_data() LoongArch: KVM: Make function kvm_own_lbt() robust LoongArch: Rename GCC_PLUGIN_STACKLEAK to KSTACK_ERASE LoongArch: Save LBT before FPU in setup_sigcontext() LoongArch: Optimize module load time by optimizing PLT/GOT counting LoongArch: Add cpuhotplug hooks to fix high cpu usage of vCPU threads LoongArch: Increase COMMAND_LINE_SIZE up to 4096 LoongArch: Pass annotate-tablejump option if LTO is enabled objtool/LoongArch: Get table size correctly if LTO is enabled	2025-08-21 10:37:33 -04:00
Linus Torvalds	32b7144f80	Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library fixes from Eric Biggers: "Fix a regression where 'make clean' stopped removing some of the generated assembly files on arm and arm64" * tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: lib/crypto: ensure generated *.S files are removed on make clean lib/crypto: sha: Update Kconfig help for SHA1 and SHA256	2025-08-21 04:54:01 -07:00
Linus Torvalds	eb4a0992dd	Merge tag '6.17-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd Pull smb server fixes from Steve French: - fix refcount issue that can cause memory leak - rate limit repeated connections from IPv6, not just IPv4 addresses - fix potential null pointer access of smb direct work queue * tag '6.17-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: fix refcount leak causing resource not released ksmbd: extend the connection limiting mechanism to support IPv6 smb: server: split ksmbd_rdma_stop_listening() out of ksmbd_rdma_destroy()	2025-08-21 04:48:41 -07:00
Lorenzo Bianconi	9f6b606b6b	net: airoha: ppe: Do not invalid PPE entries in case of SW hash collision SW hash computed by airoha_ppe_foe_get_entry_hash routine (used for foe_flow hlist) can theoretically produce collisions between two different HW PPE entries. In airoha_ppe_foe_insert_entry() if the collision occurs we will mark the second PPE entry in the list as stale (setting the hw hash to 0xffff). Stale entries are no more updated in airoha_ppe_foe_flow_entry_update routine and so they are removed by Netfilter. Fix the problem not marking the second entry as stale in airoha_ppe_foe_insert_entry routine if we have already inserted the brand new entry in the PPE table and let Netfilter remove real stale entries according to their timestamp. Please note this is just a theoretical issue spotted reviewing the code and not faced running the system. Fixes: `cd53f62261` ("net: airoha: Add L2 hw acceleration support") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250818-airoha-en7581-hash-collision-fix-v1-1-d190c4b53d1c@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-21 11:25:11 +02:00
Paolo Abeni	184fa9d704	Merge branch 'bonding-fix-negotiation-flapping-in-802-3ad-passive-mode' Hangbin Liu says: ==================== bonding: fix negotiation flapping in 802.3ad passive mode This patch fixes unstable LACP negotiation when bonding is configured in passive mode (`lacp_active=off`). Previously, the actor would stop sending LACPDUs after initial negotiation succeeded, leading to the partner timing out and restarting the negotiation cycle. This resulted in continuous LACP state flapping. The fix ensures the passive actor starts sending periodic LACPDUs after receiving the first LACPDU from the partner, in accordance with IEEE 802.1AX-2020 section 6.4.1. ==================== Link: https://patch.msgid.link/20250815062000.22220-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-21 09:35:24 +02:00
Hangbin Liu	87951b5664	selftests: bonding: add test for passive LACP mode Add a selftest to verify bonding behavior when `lacp_active` is set to `off`. The test checks the following: - The passive LACP bond should not send LACPDUs before receiving a partner's LACPDU. - The transmitted LACPDUs must not include the active flag. - After transitioning to EXPIRED and DEFAULTED states, the passive side should still not initiate LACPDUs. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20250815062000.22220-4-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-21 09:35:21 +02:00
Hangbin Liu	0599640a21	bonding: send LACPDUs periodically in passive mode after receiving partner's LACPDU When `lacp_active` is set to `off`, the bond operates in passive mode, meaning it only "speaks when spoken to." However, the current kernel implementation only sends an LACPDU in response when the partner's state changes. As a result, once LACP negotiation succeeds, the actor stops sending LACPDUs until the partner times out and sends an "expired" LACPDU. This causes continuous LACP state flapping. According to IEEE 802.1AX-2014, 6.4.13 Periodic Transmission machine. The values of Partner_Oper_Port_State.LACP_Activity and Actor_Oper_Port_State.LACP_Activity determine whether periodic transmissions take place. If either or both parameters are set to Active LACP, then periodic transmissions occur; if both are set to Passive LACP, then periodic transmissions do not occur. To comply with this, we remove the `!bond->params.lacp_active` check in `ad_periodic_machine()`. Instead, we initialize the actor's port's `LACP_STATE_LACP_ACTIVITY` state based on `lacp_active` setting. Additionally, we avoid setting the partner's state to `LACP_STATE_LACP_ACTIVITY` in the EXPIRED state, since we should not assume the partner is active by default. This ensures that in passive mode, the bond starts sending periodic LACPDUs after receiving one from the partner, and avoids flapping due to inactivity. Fixes: `3a755cd8b7` ("bonding: add new option lacp_active") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20250815062000.22220-3-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-21 09:35:20 +02:00
Hangbin Liu	b64d035f77	bonding: update LACP activity flag after setting lacp_active The port's actor_oper_port_state activity flag should be updated immediately after changing the lacp_active option to reflect the current mode correctly. Fixes: `3a755cd8b7` ("bonding: add new option lacp_active") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20250815062000.22220-2-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-08-21 09:35:20 +02:00
Ryan Wanner	c42be53454	Revert "net: cadence: macb: sama7g5_emac: Remove USARIO CLKEN flag" This reverts commit `db400061b5`. This commit can cause a Devicetree ABI break for older DTS files that rely this flag for RMII configuration. Adding this back in ensures that the older DTBs will not break. Fixes: `db400061b5` ("net: cadence: macb: sama7g5_emac: Remove USARIO CLKEN flag") Signed-off-by: Ryan Wanner <Ryan.Wanner@microchip.com> Link: https://patch.msgid.link/20250819163236.100680-1-Ryan.Wanner@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:33:42 -07:00
Eric Biggers	a458b29021	ipv6: sr: Fix MAC comparison to be constant-time To prevent timing attacks, MACs need to be compared in constant time. Use the appropriate helper function for this. Fixes: `bf355b8d2c` ("ipv6: sr: add core files for SR HMAC support") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it> Link: https://patch.msgid.link/20250818202724.15713-1-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:32:30 -07:00
Jakub Acs	7af76e9d18	net, hsr: reject HSR frame if skb can't hold tag Receiving HSR frame with insufficient space to hold HSR tag in the skb can result in a crash (kernel BUG): [ 45.390915] skbuff: skb_under_panic: text:ffffffff86f32cac len:26 put:14 head:ffff888042418000 data:ffff888042417ff4 tail:0xe end:0x180 dev:bridge_slave_1 [ 45.392559] ------------[ cut here ]------------ [ 45.392912] kernel BUG at net/core/skbuff.c:211! [ 45.393276] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI [ 45.393809] CPU: 1 UID: 0 PID: 2496 Comm: reproducer Not tainted 6.15.0 #12 PREEMPT(undef) [ 45.394433] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 45.395273] RIP: 0010:skb_panic+0x15b/0x1d0 <snip registers, remove unreliable trace> [ 45.402911] Call Trace: [ 45.403105] <IRQ> [ 45.404470] skb_push+0xcd/0xf0 [ 45.404726] br_dev_queue_push_xmit+0x7c/0x6c0 [ 45.406513] br_forward_finish+0x128/0x260 [ 45.408483] __br_forward+0x42d/0x590 [ 45.409464] maybe_deliver+0x2eb/0x420 [ 45.409763] br_flood+0x174/0x4a0 [ 45.410030] br_handle_frame_finish+0xc7c/0x1bc0 [ 45.411618] br_handle_frame+0xac3/0x1230 [ 45.413674] __netif_receive_skb_core.constprop.0+0x808/0x3df0 [ 45.422966] __netif_receive_skb_one_core+0xb4/0x1f0 [ 45.424478] __netif_receive_skb+0x22/0x170 [ 45.424806] process_backlog+0x242/0x6d0 [ 45.425116] __napi_poll+0xbb/0x630 [ 45.425394] net_rx_action+0x4d1/0xcc0 [ 45.427613] handle_softirqs+0x1a4/0x580 [ 45.427926] do_softirq+0x74/0x90 [ 45.428196] </IRQ> This issue was found by syzkaller. The panic happens in br_dev_queue_push_xmit() once it receives a corrupted skb with ETH header already pushed in linear data. When it attempts the skb_push() call, there's not enough headroom and skb_push() panics. The corrupted skb is put on the queue by HSR layer, which makes a sequence of unintended transformations when it receives a specific corrupted HSR frame (with incomplete TAG). Fix it by dropping and consuming frames that are not long enough to contain both ethernet and hsr headers. Alternative fix would be to check for enough headroom before skb_push() in br_dev_queue_push_xmit(). In the reproducer, this is injected via AF_PACKET, but I don't easily see why it couldn't be sent over the wire from adjacent network. Further Details: In the reproducer, the following network interface chain is set up: ┌────────────────┐ ┌────────────────┐ │ veth0_to_hsr ├───┤ hsr_slave0 ┼───┐ └────────────────┘ └────────────────┘ │ │ ┌──────┐ ├─┤ hsr0 ├───┐ │ └──────┘ │ ┌────────────────┐ ┌────────────────┐ │ │┌────────┐ │ veth1_to_hsr ┼───┤ hsr_slave1 ├───┘ └┤ │ └────────────────┘ └────────────────┘ ┌┼ bridge │ ││ │ │└────────┘ │ ┌───────┐ │ │ ... ├──────┘ └───────┘ To trigger the events leading up to crash, reproducer sends a corrupted HSR frame with incomplete TAG, via AF_PACKET socket on 'veth0_to_hsr'. The first HSR-layer function to process this frame is hsr_handle_frame(). It and then checks if the protocol is ETH_P_PRP or ETH_P_HSR. If it is, it calls skb_set_network_header(skb, ETH_HLEN + HSR_HLEN), without checking that the skb is long enough. For the crashing frame it is not, and hence the skb->network_header and skb->mac_len fields are set incorrectly, pointing after the end of the linear buffer. I will call this a BUG#1 and it is what is addressed by this patch. In the crashing scenario before the fix, the skb continues to go down the hsr path as follows. hsr_handle_frame() then calls this sequence hsr_forward_skb() fill_frame_info() hsr->proto_ops->fill_frame_info() hsr_fill_frame_info() hsr_fill_frame_info() contains a check that intends to check whether the skb actually contains the HSR header. But the check relies on the skb->mac_len field which was erroneously setup due to BUG#1, so the check passes and the execution continues back in the hsr_forward_skb(): hsr_forward_skb() hsr_forward_do() hsr->proto_ops->get_untagged_frame() hsr_get_untagged_frame() create_stripped_skb_hsr() In create_stripped_skb_hsr(), a copy of the skb is created and is further corrupted by operation that attempts to strip the HSR tag in a call to __pskb_copy(). The skb enters create_stripped_skb_hsr() with ethernet header pushed in linear buffer. The skb_pull(skb_in, HSR_HLEN) thus pulls 6 bytes of ethernet header into the headroom, creating skb_in with a headroom of size 8. The subsequent __pskb_copy() then creates an skb with headroom of just 2 and skb->len of just 12, this is how it looks after the copy: gdb) p skb->len $10 = 12 (gdb) p skb->data $11 = (unsigned char ) 0xffff888041e45382 "\252\252\252\252\252!\210\373", (gdb) p skb->head $12 = (unsigned char ) 0xffff888041e45380 "" It seems create_stripped_skb_hsr() assumes that ETH header is pulled in the headroom when it's entered, because it just pulls HSR header on top. But that is not the case in our code-path and we end up with the corrupted skb instead. I will call this BUG#2 I got confused here because it seems that under no conditions can create_stripped_skb_hsr() work well, the assumption it makes is not true during the processing of hsr frames - since the skb_push() in hsr_handle_frame to skb_pull in hsr_deliver_master(). I wonder whether I missed something here. Next, the execution arrives in hsr_deliver_master(). It calls skb_pull(ETH_HLEN), which just returns NULL - the SKB does not have enough space for the pull (as it only has 12 bytes in total at this point). The skb_pull() here further suggests that ethernet header is meant to be pushed through the whole hsr processing and create_stripped_skb_hsr() should pull it before doing the HSR header pull. hsr_deliver_master() then puts the corrupted skb on the queue, it is then picked up from there by bridge frame handling layer and finally lands in br_dev_queue_push_xmit where it panics. Cc: stable@kernel.org Fixes: `48b491a5cc` ("net: hsr: fix mac_len checks") Reported-by: syzbot+a81f2759d022496b40ab@syzkaller.appspotmail.com Signed-off-by: Jakub Acs <acsjakub@amazon.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250819082842.94378-1-acsjakub@amazon.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:31:25 -07:00
William Liu	2c2192e5f9	net/sched: Remove unnecessary WARNING condition for empty child qdisc in htb_activate The WARN_ON trigger based on !cl->leaf.q->q.qlen is unnecessary in htb_activate. htb_dequeue_tree already accounts for that scenario. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: William Liu <will@willsroot.io> Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io> Link: https://patch.msgid.link/20250819033632.579854-1-will@willsroot.io Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:27:08 -07:00
William Liu	15de71d06a	net/sched: Make cake_enqueue return NET_XMIT_CN when past buffer_limit The following setup can trigger a WARNING in htb_activate due to the condition: !cl->leaf.q->q.qlen tc qdisc del dev lo root tc qdisc add dev lo root handle 1: htb default 1 tc class add dev lo parent 1: classid 1:1 \ htb rate 64bit tc qdisc add dev lo parent 1:1 handle f: \ cake memlimit 1b ping -I lo -f -c1 -s64 -W0.001 127.0.0.1 This is because the low memlimit leads to a low buffer_limit, which causes packet dropping. However, cake_enqueue still returns NET_XMIT_SUCCESS, causing htb_enqueue to call htb_activate with an empty child qdisc. We should return NET_XMIT_CN when packets are dropped from the same tin and flow. I do not believe return value of NET_XMIT_CN is necessary for packet drops in the case of ack filtering, as that is meant to optimize performance, not to signal congestion. Fixes: `046f6fd5da` ("sched: Add Common Applications Kept Enhanced (cake) qdisc") Signed-off-by: William Liu <will@willsroot.io> Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20250819033601.579821-1-will@willsroot.io Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:27:08 -07:00
Tristram Ha	e318cd6714	net: dsa: microchip: Fix KSZ9477 HSR port setup issue ksz9477_hsr_join() is called once to setup the HSR port membership, but the port can be enabled later, or disabled and enabled back and the port membership is not set correctly inside ksz_update_port_member(). The added code always use the correct HSR port membership for HSR port that is enabled. Fixes: `2d61298fdd` ("net: dsa: microchip: Enable HSR offloading for KSZ9477") Reported-by: Frieder Schrempf <frieder.schrempf@kontron.de> Signed-off-by: Tristram Ha <tristram.ha@microchip.com> Reviewed-by: Łukasz Majewski <lukma@nabladev.com> Tested-by: Frieder Schrempf <frieder.schrempf@kontron.de> Reviewed-by: Frieder Schrempf <frieder.schrempf@kontron.de> Link: https://patch.msgid.link/20250819010457.563286-1-Tristram.Ha@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:25:38 -07:00
Jakub Kicinski	f7b0b97c2d	Merge branch 'intel-wired-lan-driver-updates-2025-08-15-ice-ixgbe-igc' Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-08-15 (ice, ixgbe, igc) For ixgbe: Jason Xing corrects a condition in which improper decrement can cause improper budget value. Maciej extends down states in which XDP cannot transmit and excludes XDP rings from Tx hang checks. For igc: VladikSS moves setting of hardware device information to allow for proper check of device ID. v1: https://lore.kernel.org/20250815204205.1407768-1-anthony.l.nguyen@intel.com ==================== Link: https://patch.msgid.link/20250819222000.3504873-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 19:20:50 -07:00
ValdikSS	1468c1f97c	igc: fix disabling L1.2 PCI-E link substate on I226 on init Device ID comparison in igc_is_device_id_i226 is performed before the ID is set, resulting in always failing check on init. Before the patch: * L1.2 is not disabled on init * L1.2 is properly disabled after suspend-resume cycle With the patch: * L1.2 is properly disabled both on init and after suspend-resume How to test: Connect to the 1G link with 300+ mbit/s Internet speed, and run the download speed test, such as: curl -o /dev/null http://speedtest.selectel.ru/1GB Without L1.2 disabled, the speed would be no more than ~200 mbit/s. With L1.2 disabled, the speed would reach 1 gbit/s. Note: it's required that the latency between your host and the remote be around 3-5 ms, the test inside LAN (<1 ms latency) won't trigger the issue. Link: https://lore.kernel.org/intel-wired-lan/15248b4f-3271-42dd-8e35-02bfc92b25e1@intel.com Fixes: `0325143b59` ("igc: disable L1.2 PCI-E link substate to avoid performance issue") Signed-off-by: ValdikSS <iam@valdikss.org.ru> Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250819222000.3504873-6-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 18:46:30 -07:00
Maciej Fijalkowski	f3d9f7fa7f	ixgbe: fix ndo_xdp_xmit() workloads Currently ixgbe driver checks periodically in its watchdog subtask if there is anything to be transmitted (considering both Tx and XDP rings) under state of carrier not being 'ok'. Such event is interpreted as Tx hang and therefore results in interface reset. This is currently problematic for ndo_xdp_xmit() as it is allowed to produce descriptors when interface is going through reset or its carrier is turned off. Furthermore, XDP rings should not really be objects of Tx hang detection. This mechanism is rather a matter of ndo_tx_timeout() being called from dev_watchdog against Tx rings exposed to networking stack. Taking into account issues described above, let us have a two fold fix - do not respect XDP rings in local ixgbe watchdog and do not produce Tx descriptors in ndo_xdp_xmit callback when there is some problem with carrier currently. For now, keep the Tx hang checks in clean Tx irq routine, but adjust it to not execute for XDP rings. Cc: Tobias Böhm <tobias.boehm@hetzner-cloud.de> Reported-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> Closes: https://lore.kernel.org/netdev/eca1880f-253a-4955-afe6-732d7c6926ee@hetzner-cloud.de/ Fixes: `6453073987` ("ixgbe: add initial support for xdp redirect") Fixes: `33fdc82f08` ("ixgbe: add support for XDP_TX action") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250819222000.3504873-5-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 18:46:30 -07:00
Jason Xing	4d4d9ef9df	ixgbe: xsk: resolve the negative overflow of budget in ixgbe_xmit_zc Resolve the budget negative overflow which leads to returning true in ixgbe_xmit_zc even when the budget of descs are thoroughly consumed. Before this patch, when the budget is decreased to zero and finishes sending the last allowed desc in ixgbe_xmit_zc, it will always turn back and enter into the while() statement to see if it should keep processing packets, but in the meantime it unexpectedly decreases the value again to 'unsigned int (0--)', namely, UINT_MAX. Finally, the ixgbe_xmit_zc returns true, showing 'we complete cleaning the budget'. That also means 'clean_complete = true' in ixgbe_poll. The true theory behind this is if that budget number of descs are consumed, it implies that we might have more descs to be done. So we should return false in ixgbe_xmit_zc to tell napi poll to find another chance to start polling to handle the rest of descs. On the contrary, returning true here means job done and we know we finish all the possible descs this time and we don't intend to start a new napi poll. It is apparently against our expectations. Please also see how ixgbe_clean_tx_irq() handles the problem: it uses do..while() statement to make sure the budget can be decreased to zero at most and the negative overflow never happens. The patch adds 'likely' because we rarely would not hit the loop condition since the standard budget is 256. Fixes: `8221c5eba8` ("ixgbe: add AF_XDP zero-copy Tx support") Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Priya Singh <priyax.singh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250819222000.3504873-4-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-08-20 18:46:30 -07:00
Linus Torvalds	068a56e56f	Merge tag 'probes-fixes-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fix from Masami Hiramatsu: "Sanitize wildcard for fprobe event name Fprobe event accepts wildcards for the target functions, but unless the user specifies its event name, it makes an event with the wildcards. Replace the wildcard '' with the underscore '_'" tag 'probes-fixes-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: fprobe-event: Sanitize wildcard for fprobe event name	2025-08-20 16:29:30 -07:00
Linus Torvalds	43f981b7a7	Merge tag 'bootconfig-fixes-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull bootconfig fix from Masami Hiramatsu: "Fix negative seeks on 32-bit with LFS enabled On 32bit architecture, -BOOTCONFIG_FOOTER_SIZE (size_t, 32bit) becomes a positive value when it is passed to lseek() because it is cast to off_t (64bit). Thus, add type casts" * tag 'bootconfig-fixes-v6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: bootconfig: Fix negative seeks on 32-bit with LFS enabled	2025-08-20 16:27:38 -07:00
Ben Hutchings	729dc340a4	bootconfig: Fix negative seeks on 32-bit with LFS enabled Commit `26dda57695` "tools/bootconfig: Cleanup bootconfig footer size calculations" replaced some expressions of type int with the BOOTCONFIG_FOOTER_SIZE macro, which expands to an expression of type size_t, which is unsigned. On 32-bit architectures with LFS enabled (i.e. off_t is 64-bit), the seek offset of -BOOTCONFIG_FOOTER_SIZE now turns into a positive value. Fix this by casting the size to off_t before negating it. Just in case someone changes BOOTCONFIG_MAGIC_LEN to have type size_t later, do the same thing to the seek offset of -BOOTCONFIG_MAGIC_LEN. Link: https://lore.kernel.org/all/aKHlevxeg6Y7UQrz@decadent.org.uk/ Fixes: `26dda57695` ("tools/bootconfig: Cleanup bootconfig footer size calculations") Signed-off-by: Ben Hutchings <benh@debian.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2025-08-21 08:16:31 +09:00
Linus Torvalds	41cd3fd152	Merge tag 'pci-v6.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fixes from Bjorn Helgaas: - Remove vmd restriction on children using MSI-X because VMD does in fact support both MSI and MSI-X for children (Nam Cao) - Fix a NULL pointer dereference in the xilinx interrupt handler (Nam Cao) * tag 'pci-v6.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI: vmd: Remove MSI-X check on child devices PCI: xilinx: Fix NULL pointer dereference in xilinx_pcie_intr_handler()	2025-08-20 13:26:33 -07:00
Bibo Mao	538c06e396	LoongArch: KVM: Add address alignment check in pch_pic register access With pch_pic device, its register is based on MMIO address space, different access size 1/2/4/8 is supported. And base address should be naturally aligned with its access size, here add alignment check in its register access emulation function. Cc: stable@vger.kernel.org Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-08-20 22:51:15 +08:00
Song Gao	0dfd9ea7bf	LoongArch: KVM: Use kvm_get_vcpu_by_id() instead of kvm_get_vcpu() Since using kvm_get_vcpu() may fail to retrieve the vCPU context, kvm_get_vcpu_by_id() should be used instead. Fixes: `8e3054261b` ("LoongArch: KVM: Add IPI user mode read and write function") Fixes: `3956a52bc0` ("LoongArch: KVM: Add EIOINTC read and write functions") Reviewed-by: Yanteng Si <siyanteng@cqsoftware.com.cm> Signed-off-by: Song Gao <gaosong@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-08-20 22:51:15 +08:00
Bibo Mao	5c68549c81	LoongArch: KVM: Fix stack protector issue in send_ipi_data() Function kvm_io_bus_read() is called in function send_ipi_data(), buffer size of parameter val should be at least 8 bytes. Since some emulation functions like loongarch_ipi_readl() and kvm_eiointc_read() will write the buffer val with 8 bytes signed extension regardless parameter len. Otherwise there will be buffer overflow issue when CONFIG_STACKPROTECTOR is enabled. The bug report is shown as follows: Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: send_ipi_data+0x194/0x1a0 [kvm] CPU: 11 UID: 107 PID: 2692 Comm: CPU 0/KVM Not tainted 6.17.0-rc1+ #102 PREEMPT(full) Stack : 9000000005901568 0000000000000000 9000000003af371c 900000013c68c000 900000013c68f850 900000013c68f858 0000000000000000 900000013c68f998 900000013c68f990 900000013c68f990 900000013c68f6c0 fffffffffffdb058 fffffffffffdb0e0 900000013c68f858 911e1d4d39cf0ec2 9000000105657a00 0000000000000001 fffffffffffffffe 0000000000000578 282049464555206e 6f73676e6f6f4c20 0000000000000001 00000000086b4000 0000000000000000 0000000000000000 0000000000000000 9000000005709968 90000000058f9000 900000013c68fa68 900000013c68fab4 90000000029279f0 900000010153f940 900000010001f360 0000000000000000 9000000003af3734 000000004390000c 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d ... Call Trace: [<9000000003af3734>] show_stack+0x5c/0x180 [<9000000003aed168>] dump_stack_lvl+0x6c/0x9c [<9000000003ad0ab0>] vpanic+0x108/0x2c4 [<9000000003ad0ca8>] panic+0x3c/0x40 [<9000000004eb0a1c>] __stack_chk_fail+0x14/0x18 [<ffff8000023473f8>] send_ipi_data+0x190/0x1a0 [kvm] [<ffff8000023313e4>] __kvm_io_bus_write+0xa4/0xe8 [kvm] [<ffff80000233147c>] kvm_io_bus_write+0x54/0x90 [kvm] [<ffff80000233f9f8>] kvm_emu_iocsr+0x180/0x310 [kvm] [<ffff80000233fe08>] kvm_handle_gspr+0x280/0x478 [kvm] [<ffff8000023443e8>] kvm_handle_exit+0xc0/0x130 [kvm] Cc: stable@vger.kernel.org Fixes: `daee2f9cae` ("LoongArch: KVM: Add IPI read and write function") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-08-20 22:51:15 +08:00
Bibo Mao	4be8cefc13	LoongArch: KVM: Make function kvm_own_lbt() robust Add the flag KVM_LARCH_LBT checking in function kvm_own_lbt(), so that it can be called safely rather than duplicated enabling again. Cc: stable@vger.kernel.org Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-08-20 22:51:14 +08:00
Masami Hiramatsu (Google)	ec879e1a0b	tracing: fprobe-event: Sanitize wildcard for fprobe event name Fprobe event accepts wildcards for the target functions, but unless user specifies its event name, it makes an event with the wildcards. /sys/kernel/tracing # echo 'f mutex' >> dynamic_events /sys/kernel/tracing # cat dynamic_events f:fprobes/mutex__entry mutex* /sys/kernel/tracing # ls events/fprobes/ enable filter mutex__entry To fix this, replace the wildcard ('') with an underscore. Link: https://lore.kernel.org/all/175535345114.282990.12294108192847938710.stgit@devnote2/ Fixes: `334e5519c3` ("tracing/probes: Add fprobe events for tracing function entry and exit.") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: stable@vger.kernel.org	2025-08-20 23:41:58 +09:00
Huacai Chen	0078e94a47	LoongArch: Rename GCC_PLUGIN_STACKLEAK to KSTACK_ERASE Commit `57fbad15c2` ("stackleak: Rename STACKLEAK to KSTACK_ERASE") misses the stackframe.h part for LoongArch, so fix it. Fixes: `57fbad15c2` ("stackleak: Rename STACKLEAK to KSTACK_ERASE") Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-08-20 22:23:44 +08:00
Huacai Chen	112ca94f6c	LoongArch: Save LBT before FPU in setup_sigcontext() Now if preemption happens between protected_save_fpu_context() and protected_save_lbt_context(), FTOP context is lost. Because FTOP is saved by protected_save_lbt_context() but protected_save_fpu_context() disables TM before that. So save LBT before FPU in setup_sigcontext() to avoid this potential risk. Signed-off-by: Hanlu Li <lihanlu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-08-20 22:23:44 +08:00
Kanglong Wang	63dbd8fb2a	LoongArch: Optimize module load time by optimizing PLT/GOT counting When enabling CONFIG_KASAN, CONFIG_PREEMPT_VOLUNTARY_BUILD and CONFIG_PREEMPT_VOLUNTARY at the same time, there will be soft deadlock, the relevant logs are as follows: rcu: INFO: rcu_sched self-detected stall on CPU ... Call Trace: [<900000000024f9e4>] show_stack+0x5c/0x180 [<90000000002482f4>] dump_stack_lvl+0x94/0xbc [<9000000000224544>] rcu_dump_cpu_stacks+0x1fc/0x280 [<900000000037ac80>] rcu_sched_clock_irq+0x720/0xf88 [<9000000000396c34>] update_process_times+0xb4/0x150 [<90000000003b2474>] tick_nohz_handler+0xf4/0x250 [<9000000000397e28>] __hrtimer_run_queues+0x1d0/0x428 [<9000000000399b2c>] hrtimer_interrupt+0x214/0x538 [<9000000000253634>] constant_timer_interrupt+0x64/0x80 [<9000000000349938>] __handle_irq_event_percpu+0x78/0x1a0 [<9000000000349a78>] handle_irq_event_percpu+0x18/0x88 [<9000000000354c00>] handle_percpu_irq+0x90/0xf0 [<9000000000348c74>] handle_irq_desc+0x94/0xb8 [<9000000001012b28>] handle_cpu_irq+0x68/0xa0 [<9000000001def8c0>] handle_loongarch_irq+0x30/0x48 [<9000000001def958>] do_vint+0x80/0xd0 [<9000000000268a0c>] kasan_mem_to_shadow.part.0+0x2c/0x2a0 [<90000000006344f4>] __asan_load8+0x4c/0x120 [<900000000025c0d0>] module_frob_arch_sections+0x5c8/0x6b8 [<90000000003895f0>] load_module+0x9e0/0x2958 [<900000000038b770>] __do_sys_init_module+0x208/0x2d0 [<9000000001df0c34>] do_syscall+0x94/0x190 [<900000000024d6fc>] handle_syscall+0xbc/0x158 After analysis, this is because the slow speed of loading the amdgpu module leads to the long time occupation of the cpu and then the soft deadlock. When loading a module, module_frob_arch_sections() tries to figure out the number of PLTs/GOTs that will be needed to handle all the RELAs. It will call the count_max_entries() to find in an out-of-order date which counting algorithm has O(n^2) complexity. To make it faster, we sort the relocation list by info and addend. That way, to check for a duplicate relocation, it just needs to compare with the previous entry. This reduces the complexity of the algorithm to O(n log n), as done in commit `d4e0340919` ("arm64/module: Optimize module load time by optimizing PLT counting"). This gives sinificant reduction in module load time for modules with large number of relocations. After applying this patch, the soft deadlock problem has been solved, and the kernel starts normally without "Call Trace". Using the default configuration to test some modules, the results are as follows: Module Size ip_tables 36K fat 143K radeon 2.5MB amdgpu 16MB Without this patch: Module Module load time (ms) Count(PLTs/GOTs) ip_tables 18 59/6 fat 0 162/14 radeon 54 1221/84 amdgpu 1411 4525/1098 With this patch: Module Module load time (ms) Count(PLTs/GOTs) ip_tables 18 59/6 fat 0 162/14 radeon 22 1221/84 amdgpu 45 4525/1098 Fixes: `fcdfe9d22b` ("LoongArch: Add ELF and module support") Signed-off-by: Kanglong Wang <wangkanglong@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-08-20 22:23:44 +08:00
Xianglai Li	8ef7f3132e	LoongArch: Add cpuhotplug hooks to fix high cpu usage of vCPU threads When the CPU is offline, the timer of LoongArch is not correctly closed. This is harmless for real machines, but resulting in an excessively high cpu usage rate of the offline vCPU thread in the virtual machines. To correctly close the timer, we have made the following modifications: Register the cpu hotplug event (CPUHP_AP_LOONGARCH_ARCH_TIMER_STARTING) for LoongArch. This event's hooks will be called to close the timer when the CPU is offline. Clear the timer interrupt when the timer is turned off. Since before the timer is turned off, there may be a timer interrupt that has already been in the pending state due to the interruption of the disabled, which also affects the halt state of the offline vCPU. Signed-off-by: Xianglai Li <lixianglai@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-08-20 22:23:44 +08:00
Ming Wang	f7794a4d92	LoongArch: Increase COMMAND_LINE_SIZE up to 4096 The default COMMAND_LINE_SIZE of 512, inherited from asm-generic, is too small for modern use cases. For example, kdump configurations or extensive debugging parameters can easily exceed this limit. Therefore, increase the command line size to 4096 bytes, aligning LoongArch with the MIPS architecture. This change follows a broader trend among architectures to raise this limit to support modern needs; for instance, PowerPC increased its value for similar reasons in the commit `a5980d064f` ("powerpc: Bump COMMAND_LINE_SIZE to 2048"). Similar to the change made for RISC-V in the commit `61fc1ee8be` ("riscv: Bump COMMAND_LINE_SIZE value to 1024"), this is considered a safe change. The broader kernel community has reached a consensus that modifying COMMAND_LINE_SIZE from UAPI headers does not constitute a uABI breakage, as well-behaved userspace applications should not rely on this macro. Suggested-by: Huang Cun <cunhuang@tencent.com> Signed-off-by: Ming Wang <wangming01@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-08-20 22:23:16 +08:00
Tiezhu Yang	5dfea6644d	LoongArch: Pass annotate-tablejump option if LTO is enabled When compiling with LLVM and CONFIG_LTO_CLANG is set, there exist many objtool warnings "sibling call from callable instruction with modified stack frame". For this special case, the related object file shows that there is no generated relocation section '.rela.discard.tablejump_annotate' for the table jump instruction jirl, thus objtool can not know that what is the actual destination address. It needs to do something on the LLVM side to make sure that there is the relocation section '.rela.discard.tablejump_annotate' if LTO is enabled, but in order to maintain compatibility for the current LLVM compiler, this can be done in the kernel Makefile for now. Ensure it is aware of linker with LTO, '--loongarch-annotate-tablejump' needs to be passed via '-mllvm' to ld.lld. Note that it should also pass the compiler option -mannotate-tablejump rather than only pass '-mllvm --loongarch-annotate-tablejump' to ld.lld if LTO is enabled, otherwise there are no jump info for some table jump instructions. Fixes: `e20ab7d454` ("LoongArch: Enable jump table for objtool") Closes: https://lore.kernel.org/loongarch/20250731175655.GA1455142@ax162/ Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Co-developed-by: WANG Rui <wangrui@loongson.cn> Signed-off-by: WANG Rui <wangrui@loongson.cn> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2025-08-20 22:23:15 +08:00

1 2 3 4 5 ...

1382090 Commits