RSS configuration should be updated with queue number. In particular, it
should be updated when (1) rss enabled and (2) default rss configuration
is used without user modification.
During rss command processing, device updates queue_pairs using
rss.max_tx_vq. That is, the device updates queue_pairs together with
rss, so we can skip the sperate queue_pairs update
(VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET below) and return directly.
Also remove the `vi->has_rss ?` check when setting vi->rss.max_tx_vq,
because this is not used in the other hash_report case.
Fixes: c7114b1249 ("drivers/net/virtio_net: Added basic RSS support.")
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
During virtnet_probe, default rss configuration is initialized, but was
not committed to the device. This patch fix this by sending rss command
after device ready in virtnet_probe. Otherwise, the actual rss
configuration used by device can be different with that read by user
from driver, which may confuse the user.
If the command committing fails, driver rss will be disabled.
Fixes: c7114b1249 ("drivers/net/virtio_net: Added basic RSS support.")
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Joe Damato <jdamato@fastly.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When reading/writing virtio_net_ctrl_rss, we get the indirection table
size from vi->rss_indir_table_size, which is initialized in
virtnet_probe(). However, the actual size of indirection_table was set
as VIRTIO_NET_RSS_MAX_TABLE_LEN=128. This collision may cause issues if
the vi->rss_indir_table_size exceeds 128.
This patch instead uses dynamic indirection table, allocated with
vi->rss after vi->rss_indir_table_size initialized. And free it in
virtnet_remove().
In virtnet_commit_rss_command(), sgs for rss is initialized differently
with hash_report. So indirection_table is not used if !vi->has_rss, and
then we don't need to alloc indirection_table for hash_report only uses.
Fixes: c7114b1249 ("drivers/net/virtio_net: Added basic RSS support.")
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Joe Damato <jdamato@fastly.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Commit a23aa04042 ("net: stmmac: ethtool: Fixed calltrace caused by
unbalanced disable_irq_wake calls") introduced checks to prevent
unbalanced enable and disable IRQ wake calls. However it only
initialized the auxiliary variable on one of the paths,
stmmac_request_irq_multi_msi(), missing the other,
stmmac_request_irq_single().
Add the same initialization on stmmac_request_irq_single() to prevent
"Unbalanced IRQ <x> wake disable" warnings from being printed the first
time disable_irq_wake() is called on platforms that run on that code
path.
Fixes: a23aa04042 ("net: stmmac: ethtool: Fixed calltrace caused by unbalanced disable_irq_wake calls")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241101-stmmac-unbalanced-wake-single-fix-v1-1-5952524c97f0@collabora.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-11-04 (ice, idpf, i40e, e1000e)
For ice:
Marcin adjusts ordering of calls in ice_eswitch_detach() to resolve a
use after free issue.
Mateusz corrects variable type for Flow Director queue to fix issues
related to drop actions.
For idpf:
Pavan resolves issues related to reset on idpf; avoiding use of freed
vport and correctly unrolling the mailbox task.
For i40e:
Aleksandr fixes a race condition involving addition and deletion of VF
MAC filters.
For e1000e:
Vitaly reverts workaround for Meteor Lake causing regressions in power
management flows.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
e1000e: Remove Meteor Lake SMBUS workarounds
i40e: fix race condition by adding filter's intermediate sync state
idpf: fix idpf_vc_core_init error path
idpf: avoid vport access in idpf_get_link_ksettings
ice: change q_index variable type to s16 to store -1 value
ice: Fix use after free during unload with ports in bridge
====================
Link: https://patch.msgid.link/20241104223639.2801097-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts says:
====================
mptcp: pm: fix wrong perm and sock kfree
Two small fixes related to the MPTCP path-manager:
- Patch 1: remove an accidental restriction to admin users to list MPTCP
endpoints. A regression from v6.7.
- Patch 2: correctly use sock_kfree_s() instead of kfree() in the
userspace PM. A fix for another fix introduced in v6.4 and
backportable up to v5.19.
====================
Link: https://patch.msgid.link/20241104-net-mptcp-misc-6-12-v1-0-c13f2ff1656f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During the switch to YNL, the command to list all endpoints has been
accidentally restricted to users with admin permissions.
It looks like there are no reasons to have this restriction which makes
it harder for a user to quickly check if the endpoint list has been
correctly populated by an automated tool. Best to go back to the
previous behaviour then.
mptcp_pm_gen.c has been modified using ynl-gen-c.py:
$ ./tools/net/ynl/ynl-gen-c.py --mode kernel \
--spec Documentation/netlink/specs/mptcp_pm.yaml --source \
-o net/mptcp/mptcp_pm_gen.c
The header file doesn't need to be regenerated.
Fixes: 1d0507f468 ("net: mptcp: convert netlink from small_ops to ops")
Cc: stable@vger.kernel.org
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241104-net-mptcp-misc-6-12-v1-1-c13f2ff1656f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
DP83848 datasheet (section 4.7.2) indicates that the reset pin should be
toggled after the clocks are running. Add the PHY_RST_AFTER_CLK_EN to
make sure that this indication is respected.
In my experience not having this flag enabled would lead to, on some
boots, the wrong MII mode being selected if the PHY was initialized on
the bootloader and was receiving data during Linux boot.
Signed-off-by: Diogo Silva <diogompaissilva@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 34e45ad937 ("net: phy: dp83848: Add TI DP83848 Ethernet PHY")
Link: https://patch.msgid.link/20241102151504.811306-1-paissilva@ld-100007.ds1.internal
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Roger Quadros says:
====================
net: ethernet: ti: am65-cpsw: Fixes to multi queue RX feature
On J7 platforms, setting up multiple RX flows was failing
as the RX free descriptor ring 0 is shared among all flows
and we did not allocate enough elements in the RX free descriptor
ring 0 to accommodate for all RX flows. Patch 1 fixes this.
The second patch fixes a warning if there was any error in
am65_cpsw_nuss_init_rx_chns() and am65_cpsw_nuss_cleanup_rx_chns()
was called after that.
Signed-off-by: Roger Quadros <rogerq@kernel.org>
====================
Link: https://patch.msgid.link/20241101-am65-cpsw-multi-rx-j7-fix-v3-0-338fdd6a55da@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
flow->irq is initialized to 0 which is a valid IRQ. Set it to -EINVAL
in error path of am65_cpsw_nuss_init_rx_chns() so we do not try
to free an unallocated IRQ in am65_cpsw_nuss_remove_rx_chns().
If user tried to change number of RX queues and am65_cpsw_nuss_init_rx_chns()
failed due to any reason, the warning will happen if user tries to change
the number of RX queues after the error condition.
root@am62xx-evm:~# ethtool -L eth0 rx 3
[ 40.385293] am65-cpsw-nuss 8000000.ethernet: set new flow-id-base 19
[ 40.393211] am65-cpsw-nuss 8000000.ethernet: Failed to init rx flow2
netlink error: Invalid argument
root@am62xx-evm:~# ethtool -L eth0 rx 2
[ 82.306427] ------------[ cut here ]------------
[ 82.311075] WARNING: CPU: 0 PID: 378 at kernel/irq/devres.c:144 devm_free_irq+0x84/0x90
[ 82.469770] Call trace:
[ 82.472208] devm_free_irq+0x84/0x90
[ 82.475777] am65_cpsw_nuss_remove_rx_chns+0x6c/0xac [ti_am65_cpsw_nuss]
[ 82.482487] am65_cpsw_nuss_update_tx_rx_chns+0x2c/0x9c [ti_am65_cpsw_nuss]
[ 82.489442] am65_cpsw_set_channels+0x30/0x4c [ti_am65_cpsw_nuss]
[ 82.495531] ethnl_set_channels+0x224/0x2dc
[ 82.499713] ethnl_default_set_doit+0xb8/0x1b8
[ 82.504149] genl_family_rcv_msg_doit+0xc0/0x124
[ 82.508757] genl_rcv_msg+0x1f0/0x284
[ 82.512409] netlink_rcv_skb+0x58/0x130
[ 82.516239] genl_rcv+0x38/0x50
[ 82.519374] netlink_unicast+0x1d0/0x2b0
[ 82.523289] netlink_sendmsg+0x180/0x3c4
[ 82.527205] __sys_sendto+0xe4/0x158
[ 82.530779] __arm64_sys_sendto+0x28/0x38
[ 82.534782] invoke_syscall+0x44/0x100
[ 82.538526] el0_svc_common.constprop.0+0xc0/0xe0
[ 82.543221] do_el0_svc+0x1c/0x28
[ 82.546528] el0_svc+0x28/0x98
[ 82.549578] el0t_64_sync_handler+0xc0/0xc4
[ 82.553752] el0t_64_sync+0x190/0x194
[ 82.557407] ---[ end trace 0000000000000000 ]---
Fixes: da70d184a8 ("net: ethernet: ti: am65-cpsw: Introduce multi queue Rx")
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
On J7 platforms, setting up multiple RX flows was failing
as the RX free descriptor ring 0 is shared among all flows
and we did not allocate enough elements in the RX free descriptor
ring 0 to accommodate for all RX flows.
This issue is not present on AM62 as separate pair of
rings are used for free and completion rings for each flow.
Fix this by allocating enough elements for RX free descriptor
ring 0.
However, we can no longer rely on desc_idx (descriptor based
offsets) to identify the pages in the respective flows as
free descriptor ring includes elements for all flows.
To solve this, introduce a new swdata data structure to store
flow_id and page. This can be used to identify which flow (page_pool)
and page the descriptor belonged to when popped out of the
RX rings.
Fixes: da70d184a8 ("net: ethernet: ti: am65-cpsw: Introduce multi queue Rx")
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When the driver is uninstalled and the VF is disabled concurrently, a
kernel crash occurs. The reason is that the two actions call function
pci_disable_sriov(). The num_VFs is checked to determine whether to
release the corresponding resources. During the second calling, num_VFs
is not 0 and the resource release function is called. However, the
corresponding resource has been released during the first invoking.
Therefore, the problem occurs:
[15277.839633][T50670] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
...
[15278.131557][T50670] Call trace:
[15278.134686][T50670] klist_put+0x28/0x12c
[15278.138682][T50670] klist_del+0x14/0x20
[15278.142592][T50670] device_del+0xbc/0x3c0
[15278.146676][T50670] pci_remove_bus_device+0x84/0x120
[15278.151714][T50670] pci_stop_and_remove_bus_device+0x6c/0x80
[15278.157447][T50670] pci_iov_remove_virtfn+0xb4/0x12c
[15278.162485][T50670] sriov_disable+0x50/0x11c
[15278.166829][T50670] pci_disable_sriov+0x24/0x30
[15278.171433][T50670] hnae3_unregister_ae_algo_prepare+0x60/0x90 [hnae3]
[15278.178039][T50670] hclge_exit+0x28/0xd0 [hclge]
[15278.182730][T50670] __se_sys_delete_module.isra.0+0x164/0x230
[15278.188550][T50670] __arm64_sys_delete_module+0x1c/0x30
[15278.193848][T50670] invoke_syscall+0x50/0x11c
[15278.198278][T50670] el0_svc_common.constprop.0+0x158/0x164
[15278.203837][T50670] do_el0_svc+0x34/0xcc
[15278.207834][T50670] el0_svc+0x20/0x30
For details, see the following figure.
rmmod hclge disable VFs
----------------------------------------------------
hclge_exit() sriov_numvfs_store()
... device_lock()
pci_disable_sriov() hns3_pci_sriov_configure()
pci_disable_sriov()
sriov_disable()
sriov_disable() if !num_VFs :
if !num_VFs : return;
return; sriov_del_vfs()
sriov_del_vfs() ...
... klist_put()
klist_put() ...
... num_VFs = 0;
num_VFs = 0; device_unlock();
In this patch, when driver is removing, we get the device_lock()
to protect num_VFs, just like sriov_numvfs_store().
Fixes: 0dd8a25f35 ("net: hns3: disable sriov before unload hclge layer")
Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241101091507.3644584-1-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This reverts commit d80a309130, reversing
changes made to 637f414763:
2cf2461435 ("net: hns3: fix kernel crash when 1588 is sent on HIP08 devices")
3e22b7de34 ("net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue")
d1c2e2961a ("net: hns3: initialize reset_timer before hclgevf_misc_irq_init()")
5f62009ff1 ("net: hns3: don't auto enable misc vector")
2758f18a83 ("net: hns3: Resolved the issue that the debugfs query result is inconsistent.")
662ecfc466 ("net: hns3: fix missing features due to dev->features configuration too early")
3e0f7cc887 ("net: hns3: fixed reset failure issues caused by the incorrect reset type")
f2c14899ca ("net: hns3: add sync command to sync io-pgtable")
e6ab19443b ("net: hns3: default enable tx bounce buffer when smmu enabled")
The series is making the driver poke into IOMMU internals instead of
implementing appropriate IOMMU workarounds.
Link: https://lore.kernel.org/069c9838-b781-4012-934a-d2626fa78212@arm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marc Kleine-Budde says:
====================
pull-request: can 2024-11-04
Alexander Hölzl contributes a patch to fix an error in the CAN j1939
documentation.
Thomas Mühlbacher's patch allows building of the {cc770,sja1000}_isa
drivers on x86_64 again.
A patch by me targets the m_can driver and limits the call to
free_irq() to devices with IRQs.
Dario Binacchi's patch fixes the RX and TX error counters in the c_can
driver.
The next 2 patches target the rockchip_canfd driver. Geert
Uytterhoeven's patch lets the driver depend on ARCH_ROCKCHIP. Jean
Delvare's patch drops the obsolete dependency on COMPILE_TEST.
The last 2 patches are by me and fix 2 regressions in the mcp251xfd
driver: fix broken coalescing configuration when switching CAN modes
and fix the length calculation of the Transmit Event FIFO (TEF) on
full TEF.
* tag 'linux-can-fixes-for-6.12-20241104' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: mcp251xfd: mcp251xfd_get_tef_len(): fix length calculation
can: mcp251xfd: mcp251xfd_ring_alloc(): fix coalescing configuration when switching CAN modes
can: rockchip_canfd: Drop obsolete dependency on COMPILE_TEST
can: rockchip_canfd: CAN_ROCKCHIP_CANFD should depend on ARCH_ROCKCHIP
can: c_can: fix {rx,tx}_errors statistics
can: m_can: m_can_close(): don't call free_irq() for IRQ-less devices
can: {cc770,sja1000}_isa: allow building on x86_64
can: j1939: fix error in J1939 documentation.
====================
Link: https://patch.msgid.link/20241104200120.393312-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is a partial revert to commit 76a0a3f9cc ("e1000e: fix force smbus
during suspend flow"). That commit fixed a sporadic PHY access issue but
introduced a regression in runtime suspend flows.
The original issue on Meteor Lake systems was rare in terms of the
reproduction rate and the number of the systems affected.
After the integration of commit 0a6ad4d9e1 ("e1000e: avoid failing the
system during pm_suspend"), PHY access loss can no longer cause a
system-level suspend failure. As it only occurs when the LAN cable is
disconnected, and is recovered during system resume flow. Therefore, its
functional impact is low, and the priority is given to stabilizing
runtime suspend.
Fixes: 76a0a3f9cc ("e1000e: fix force smbus during suspend flow")
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix a race condition in the i40e driver that leads to MAC/VLAN filters
becoming corrupted and leaking. Address the issue that occurs under
heavy load when multiple threads are concurrently modifying MAC/VLAN
filters by setting mac and port VLAN.
1. Thread T0 allocates a filter in i40e_add_filter() within
i40e_ndo_set_vf_port_vlan().
2. Thread T1 concurrently frees the filter in __i40e_del_filter() within
i40e_ndo_set_vf_mac().
3. Subsequently, i40e_service_task() calls i40e_sync_vsi_filters(), which
refers to the already freed filter memory, causing corruption.
Reproduction steps:
1. Spawn multiple VFs.
2. Apply a concurrent heavy load by running parallel operations to change
MAC addresses on the VFs and change port VLANs on the host.
3. Observe errors in dmesg:
"Error I40E_AQ_RC_ENOSPC adding RX filters on VF XX,
please set promiscuous on manually for VF XX".
Exact code for stable reproduction Intel can't open-source now.
The fix involves implementing a new intermediate filter state,
I40E_FILTER_NEW_SYNC, for the time when a filter is on a tmp_add_list.
These filters cannot be deleted from the hash list directly but
must be removed using the full process.
Fixes: 278e7d0b9d ("i40e: store MAC/VLAN filters in a hash with the MAC Address as key")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In an event where the platform running the device control plane
is rebooted, reset is detected on the driver. It releases
all the resources and waits for the reset to complete. Once the
reset is done, it tries to build the resources back. At this
time if the device control plane is not yet started, then
the driver timeouts on the virtchnl message and retries to
establish the mailbox again.
In the retry flow, mailbox is deinitialized but the mailbox
workqueue is still alive and polling for the mailbox message.
This results in accessing the released control queue leading to
null-ptr-deref. Fix it by unrolling the work queue cancellation
and mailbox deinitialization in the reverse order which they got
initialized.
Fixes: 4930fbf419 ("idpf: add core init and interrupt request")
Fixes: 34c21fa894 ("idpf: implement virtchnl transaction manager")
Cc: stable@vger.kernel.org # 6.9+
Reviewed-by: Tarun K Singh <tarun.k.singh@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When the device control plane is removed or the platform
running device control plane is rebooted, a reset is detected
on the driver. On driver reset, it releases the resources and
waits for the reset to complete. If the reset fails, it takes
the error path and releases the vport lock. At this time if the
monitoring tools tries to access link settings, it call traces
for accessing released vport pointer.
To avoid it, move link_speed_mbps to netdev_priv structure
which removes the dependency on vport pointer and the vport lock
in idpf_get_link_ksettings. Also use netif_carrier_ok()
to check the link status and adjust the offsetof to use link_up
instead of link_speed_mbps.
Fixes: 02cbfba1ad ("idpf: add ethtool callbacks")
Cc: stable@vger.kernel.org # 6.7+
Reviewed-by: Tarun K Singh <tarun.k.singh@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix Flow Director not allowing to re-map traffic to 0th queue when action
is configured to drop (and vice versa).
The current implementation of ethtool callback in the ice driver forbids
change Flow Director action from 0 to -1 and from -1 to 0 with an error,
e.g:
# ethtool -U eth2 flow-type tcp4 src-ip 1.1.1.1 loc 1 action 0
# ethtool -U eth2 flow-type tcp4 src-ip 1.1.1.1 loc 1 action -1
rmgr: Cannot insert RX class rule: Invalid argument
We set the value of `u16 q_index = 0` at the beginning of the function
ice_set_fdir_input_set(). In case of "drop traffic" action (which is
equal to -1 in ethtool) we store the 0 value. Later, when want to change
traffic rule to redirect to queue with index 0 it returns an error
caused by duplicate found.
Fix this behaviour by change of the type of field `q_index` from u16 to s16
in `struct ice_fdir_fltr`. This allows to store -1 in the field in case
of "drop traffic" action. What is more, change the variable type in the
function ice_set_fdir_input_set() and assign at the beginning the new
`#define ICE_FDIR_NO_QUEUE_IDX` which is -1. Later, if the action is set
to another value (point specific queue index) the variable value is
overwritten in the function.
Fixes: cac2a27cd9 ("ice: Support IPv4 Flow Director filters")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Unloading the ice driver while switchdev port representors are added to
a bridge can lead to kernel panic. Reproducer:
modprobe ice
devlink dev eswitch set $PF1_PCI mode switchdev
ip link add $BR type bridge
ip link set $BR up
echo 2 > /sys/class/net/$PF1/device/sriov_numvfs
sleep 2
ip link set $PF1 master $BR
ip link set $VF1_PR master $BR
ip link set $VF2_PR master $BR
ip link set $PF1 up
ip link set $VF1_PR up
ip link set $VF2_PR up
ip link set $VF1 up
rmmod irdma ice
When unloading the driver, ice_eswitch_detach() is eventually called as
part of VF freeing. First, it removes a port representor from xarray,
then unregister_netdev() is called (via repr->ops.rem()), finally
representor is deallocated. The problem comes from the bridge doing its
own deinit at the same time. unregister_netdev() triggers a notifier
chain, resulting in ice_eswitch_br_port_deinit() being called. It should
set repr->br_port = NULL, but this does not happen since repr has
already been removed from xarray and is not found. Regardless, it
finishes up deallocating br_port. At this point, repr is still not freed
and an fdb event can happen, in which ice_eswitch_br_fdb_event_work()
takes repr->br_port and tries to use it, which causes a panic (use after
free).
Note that this only happens with 2 or more port representors added to
the bridge, since with only one representor port, the bridge deinit is
slightly different (ice_eswitch_br_port_deinit() is called via
ice_eswitch_br_ports_flush(), not ice_eswitch_br_port_unlink()).
Trace:
Oops: general protection fault, probably for non-canonical address 0xf129010fd1a93284: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: maybe wild-memory-access in range [0x8948287e8d499420-0x8948287e8d499427]
(...)
Workqueue: ice_bridge_wq ice_eswitch_br_fdb_event_work [ice]
RIP: 0010:__rht_bucket_nested+0xb4/0x180
(...)
Call Trace:
(...)
ice_eswitch_br_fdb_find+0x3fa/0x550 [ice]
? __pfx_ice_eswitch_br_fdb_find+0x10/0x10 [ice]
ice_eswitch_br_fdb_event_work+0x2de/0x1e60 [ice]
? __schedule+0xf60/0x5210
? mutex_lock+0x91/0xe0
? __pfx_ice_eswitch_br_fdb_event_work+0x10/0x10 [ice]
? ice_eswitch_br_update_work+0x1f4/0x310 [ice]
(...)
A workaround is available: brctl setageing $BR 0, which stops the bridge
from adding fdb entries altogether.
Change the order of operations in ice_eswitch_detach(): move the call to
unregister_netdev() before removing repr from xarray. This way
repr->br_port will be correctly set to NULL in
ice_eswitch_br_port_deinit(), preventing a panic.
Fixes: fff292b47a ("ice: add VF representors one by one")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Since commit 50ea5449c5 ("can: mcp251xfd: fix ring configuration
when switching from CAN-CC to CAN-FD mode"), the current ring and
coalescing configuration is passed to can_ram_get_layout(). That fixed
the issue when switching between CAN-CC and CAN-FD mode with
configured ring (rx, tx) and/or coalescing parameters (rx-frames-irq,
tx-frames-irq).
However 50ea5449c5 ("can: mcp251xfd: fix ring configuration when
switching from CAN-CC to CAN-FD mode"), introduced a regression when
switching CAN modes with disabled coalescing configuration: Even if
the previous CAN mode has no coalescing configured, the new mode is
configured with active coalescing. This leads to delayed receiving of
CAN-FD frames.
This comes from the fact, that ethtool uses usecs = 0 and max_frames =
1 to disable coalescing, however the driver uses internally
priv->{rx,tx}_obj_num_coalesce_irq = 0 to indicate disabled
coalescing.
Fix the regression by assigning struct ethtool_coalesce
ec->{rx,tx}_max_coalesced_frames_irq = 1 if coalescing is disabled in
the driver as can_ram_get_layout() expects this.
Reported-by: https://github.com/vdh-robothania
Closes: https://github.com/raspberrypi/linux/issues/6407
Fixes: 50ea5449c5 ("can: mcp251xfd: fix ring configuration when switching from CAN-CC to CAN-FD mode")
Cc: stable@vger.kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241025-mcp251xfd-fix-coalesing-v1-1-9d11416de1df@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
In commit b382380c0d ("can: m_can: Add hrtimer to generate software
interrupt") support for IRQ-less devices was added. Instead of an
interrupt, the interrupt routine is called by a hrtimer-based polling
loop.
That patch forgot to change free_irq() to be only called for devices
with IRQs. Fix this, by calling free_irq() conditionally only if an
IRQ is available for the device (and thus has been requested
previously).
Fixes: b382380c0d ("can: m_can: Add hrtimer to generate software interrupt")
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com>
Link: https://patch.msgid.link/20240930-m_can-cleanups-v1-1-001c579cdee4@pengutronix.de
Cc: <stable@vger.kernel.org> # v6.6+
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The ISA variable is only defined if X86_32 is also defined. However,
these drivers are still useful and in use on at least some modern 64-bit
x86 industrial systems as well. With the correct module parameters, they
work as long as IO port communication is possible, despite their name
having ISA in them.
Fixes: a29689e60e ("net: handle HAS_IOPORT dependencies")
Signed-off-by: Thomas Mühlbacher <tmuehlbacher@posteo.net>
Link: https://patch.msgid.link/20240919174151.15473-2-tmuehlbacher@posteo.net
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Enqueue packets in dql after dma engine starts causes race condition.
Tx transfer starts once dma engine is started and may execute dql dequeue
in completion before it gets queued. It results in following kernel crash
while running iperf stress test:
kernel BUG at lib/dynamic_queue_limits.c:99!
<snip>
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
pc : dql_completed+0x238/0x248
lr : dql_completed+0x3c/0x248
Call trace:
dql_completed+0x238/0x248
axienet_dma_tx_cb+0xa0/0x170
xilinx_dma_do_tasklet+0xdc/0x290
tasklet_action_common+0xf8/0x11c
tasklet_action+0x30/0x3c
handle_softirqs+0xf8/0x230
<snip>
Start dmaengine after enqueue in dql fixes the crash.
Fixes: 6a91b846af ("net: axienet: Introduce dmaengine support")
Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
Link: https://patch.msgid.link/20241030062533.2527042-2-suraj.gupta2@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the previous implementation, vf_state is allocated memory only when VF
is enabled. However, net_device_ops::ndo_set_vf_mac() may be called before
VF is enabled to configure the MAC address of VF. If this is the case,
enetc_pf_set_vf_mac() will access vf_state, resulting in access to a null
pointer. The simplified error log is as follows.
root@ls1028ardb:~# ip link set eno0 vf 1 mac 00:0c:e7:66:77:89
[ 173.543315] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004
[ 173.637254] pc : enetc_pf_set_vf_mac+0x3c/0x80 Message from sy
[ 173.641973] lr : do_setlink+0x4a8/0xec8
[ 173.732292] Call trace:
[ 173.734740] enetc_pf_set_vf_mac+0x3c/0x80
[ 173.738847] __rtnl_newlink+0x530/0x89c
[ 173.742692] rtnl_newlink+0x50/0x7c
[ 173.746189] rtnetlink_rcv_msg+0x128/0x390
[ 173.750298] netlink_rcv_skb+0x60/0x130
[ 173.754145] rtnetlink_rcv+0x18/0x24
[ 173.757731] netlink_unicast+0x318/0x380
[ 173.761665] netlink_sendmsg+0x17c/0x3c8
Fixes: d4fd0404c1 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241031060247.1290941-2-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A size validation fix similar to that in Commit 50619dbf8d ("sctp: add
size validation when walking chunks") is also required in sctp_sf_ootb()
to address a crash reported by syzbot:
BUG: KMSAN: uninit-value in sctp_sf_ootb+0x7f5/0xce0 net/sctp/sm_statefuns.c:3712
sctp_sf_ootb+0x7f5/0xce0 net/sctp/sm_statefuns.c:3712
sctp_do_sm+0x181/0x93d0 net/sctp/sm_sideeffect.c:1166
sctp_endpoint_bh_rcv+0xc38/0xf90 net/sctp/endpointola.c:407
sctp_inq_push+0x2ef/0x380 net/sctp/inqueue.c:88
sctp_rcv+0x3831/0x3b20 net/sctp/input.c:243
sctp4_rcv+0x42/0x50 net/sctp/protocol.c:1159
ip_protocol_deliver_rcu+0xb51/0x13d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x336/0x500 net/ipv4/ip_input.c:233
Reported-by: syzbot+f0cbb34d39392f2746ca@syzkaller.appspotmail.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/a29ebb6d8b9f8affd0f9abb296faafafe10c17d8.1730223981.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sparse warns:
note: in included file (through ../include/trace/trace_events.h,
../include/trace/define_trace.h,
../drivers/net/ethernet/freescale/dpaa/dpaa_eth_trace.h):
warning: incorrect type in assignment (different base types)
expected unsigned int [usertype] fd_status
got restricted __be32 const [usertype] status
We take struct qm_fd :: status, store it and print it as an u32,
though it is a big endian field. We should print the FD status in
CPU endianness for ease of debug and consistency between PowerPC and
Arm systems.
Though it is a not often used debug feature, it is best to treat it as
a bug and backport the format change to all supported stable kernels,
for consistency.
Fixes: eb11ddf36e ("dpaa_eth: add trace points")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Link: https://patch.msgid.link/20241029163105.44135-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The MAC address of VF can be configured through the mailbox mechanism of
ENETC, but the previous implementation forgot to set the MAC address in
net_device, resulting in the SMAC of the sent frames still being the old
MAC address. Since the MAC address in the hardware has been changed, Rx
cannot receive frames with the DMAC address as the new MAC address. The
most obvious phenomenon is that after changing the MAC address, we can
see that the MAC address of eno0vf0 has not changed through the "ifconfig
eno0vf0" command and the IP address cannot be obtained .
root@ls1028ardb:~# ifconfig eno0vf0 down
root@ls1028ardb:~# ifconfig eno0vf0 hw ether 00:04:9f:3a:4d:56 up
root@ls1028ardb:~# ifconfig eno0vf0
eno0vf0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
ether 66:36:2c:3b:87:76 txqueuelen 1000 (Ethernet)
RX packets 794 bytes 69239 (69.2 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 11 bytes 2226 (2.2 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Fixes: beb74ac878 ("enetc: Add vf to pf messaging support")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://patch.msgid.link/20241029090406.841836-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull bpf fixes from Daniel Borkmann:
- Fix BPF verifier to force a checkpoint when the program's jump
history becomes too long (Eduard Zingerman)
- Add several fixes to the BPF bits iterator addressing issues like
memory leaks and overflow problems (Hou Tao)
- Fix an out-of-bounds write in trie_get_next_key (Byeonguk Jeong)
- Fix BPF test infra's LIVE_FRAME frame update after a page has been
recycled (Toke Høiland-Jørgensen)
- Fix BPF verifier and undo the 40-bytes extra stack space for
bpf_fastcall patterns due to various bugs (Eduard Zingerman)
- Fix a BPF sockmap race condition which could trigger a NULL pointer
dereference in sock_map_link_update_prog (Cong Wang)
- Fix tcp_bpf_recvmsg_parser to retrieve seq_copied from tcp_sk under
the socket lock (Jiayuan Chen)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, test_run: Fix LIVE_FRAME frame update after a page has been recycled
selftests/bpf: Add three test cases for bits_iter
bpf: Use __u64 to save the bits in bits iterator
bpf: Check the validity of nr_words in bpf_iter_bits_new()
bpf: Add bpf_mem_alloc_check_size() helper
bpf: Free dynamically allocated bits in bpf_iter_bits_destroy()
bpf: disallow 40-bytes extra stack for bpf_fastcall patterns
selftests/bpf: Add test for trie_get_next_key()
bpf: Fix out-of-bounds write in trie_get_next_key()
selftests/bpf: Test with a very short loop
bpf: Force checkpoint when jmp history is too long
bpf: fix filed access without lock
sock_map: fix a NULL pointer dereference in sock_map_link_update_prog()
Pull networking fixes from Paolo Abeni:
"Including fixes from WiFi, bluetooth and netfilter.
No known new regressions outstanding.
Current release - regressions:
- wifi: mt76: do not increase mcu skb refcount if retry is not
supported
Current release - new code bugs:
- wifi:
- rtw88: fix the RX aggregation in USB 3 mode
- mac80211: fix memory corruption bug in struct ieee80211_chanctx
Previous releases - regressions:
- sched:
- stop qdisc_tree_reduce_backlog on TC_H_ROOT
- sch_api: fix xa_insert() error path in tcf_block_get_ext()
- wifi:
- revert "wifi: iwlwifi: remove retry loops in start"
- cfg80211: clear wdev->cqm_config pointer on free
- netfilter: fix potential crash in nf_send_reset6()
- ip_tunnel: fix suspicious RCU usage warning in ip_tunnel_find()
- bluetooth: fix null-ptr-deref in hci_read_supported_codecs
- eth: mlxsw: add missing verification before pushing Tx header
- eth: hns3: fixed hclge_fetch_pf_reg accesses bar space out of
bounds issue
Previous releases - always broken:
- wifi: mac80211: do not pass a stopped vif to the driver in
.get_txpower
- netfilter: sanitize offset and length before calling skb_checksum()
- core:
- fix crash when config small gso_max_size/gso_ipv4_max_size
- skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension
- mptcp: protect sched with rcu_read_lock
- eth: ice: fix crash on probe for DPLL enabled E810 LOM
- eth: macsec: fix use-after-free while sending the offloading packet
- eth: stmmac: fix unbalanced DMA map/unmap for non-paged SKB data
- eth: hns3: fix kernel crash when 1588 is sent on HIP08 devices
- eth: mtk_wed: fix path of MT7988 WO firmware"
* tag 'net-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
net: hns3: fix kernel crash when 1588 is sent on HIP08 devices
net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue
net: hns3: initialize reset_timer before hclgevf_misc_irq_init()
net: hns3: don't auto enable misc vector
net: hns3: Resolved the issue that the debugfs query result is inconsistent.
net: hns3: fix missing features due to dev->features configuration too early
net: hns3: fixed reset failure issues caused by the incorrect reset type
net: hns3: add sync command to sync io-pgtable
net: hns3: default enable tx bounce buffer when smmu enabled
netfilter: nft_payload: sanitize offset and length before calling skb_checksum()
net: ethernet: mtk_wed: fix path of MT7988 WO firmware
selftests: forwarding: Add IPv6 GRE remote change tests
mlxsw: spectrum_ipip: Fix memory leak when changing remote IPv6 address
mlxsw: pci: Sync Rx buffers for device
mlxsw: pci: Sync Rx buffers for CPU
mlxsw: spectrum_ptp: Add missing verification before pushing Tx header
net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension
Bluetooth: hci: fix null-ptr-deref in hci_read_supported_codecs
netfilter: nf_reject_ipv6: fix potential crash in nf_send_reset6()
netfilter: Fix use-after-free in get_info()
...
Pull sound fixes from Takashi Iwai:
"Here we see slightly more commits than wished, but basically all are
small and mostly trivial fixes.
The only core change is the workaround for __counted_by() usage in
ASoC DAPM code, while the rest are device-specific fixes for Intel
Baytrail devices, Cirrus and wcd937x codecs, and HD-audio / USB-audio
devices"
* tag 'sound-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Fix headset mic on TUXEDO Stellaris 16 Gen6 mb1
ALSA: hda/realtek: Fix headset mic on TUXEDO Gemini 17 Gen3
ALSA: usb-audio: Add quirks for Dell WD19 dock
ASoC: codecs: wcd937x: relax the AUX PDM watchdog
ASoC: codecs: wcd937x: add missing LO Switch control
ASoC: dt-bindings: rockchip,rk3308-codec: add port property
ALSA: hda/realtek: Add subwoofer quirk for Infinix ZERO BOOK 13
ASoC: dapm: fix bounds checker error in dapm_widget_list_create
ASoC: Intel: sst: Fix used of uninitialized ctx to log an error
ASoC: cs42l51: Fix some error handling paths in cs42l51_probe()
ASoC: Intel: sst: Support LPE0F28 ACPI HID
ALSA: hda/realtek: Limit internal Mic boost on Dell platform
ASoC: Intel: bytcr_rt5640: Add DMI quirk for Vexia Edu Atla 10 tablet
ASoC: Intel: bytcr_rt5640: Add support for non ACPI instantiated codec
ASoC: codecs: rt5640: Always disable IRQs from rt5640_cancel_work()
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter fixes for net:
1) Remove unused parameters in conntrack_dump_flush.c used by
selftests, from Liu Jing.
2) Fix possible UaF when removing xtables module via getsockopt()
interface, from Dong Chenchen.
3) Fix potential crash in nf_send_reset6() reported by syzkaller.
From Eric Dumazet
4) Validate offset and length before calling skb_checksum()
in nft_payload, otherwise hitting BUG() is possible.
netfilter pull request 24-10-31
* tag 'nf-24-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_payload: sanitize offset and length before calling skb_checksum()
netfilter: nf_reject_ipv6: fix potential crash in nf_send_reset6()
netfilter: Fix use-after-free in get_info()
selftests: netfilter: remove unused parameter
====================
Link: https://patch.msgid.link/
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci: fix null-ptr-deref in hci_read_supported_codecs
* tag 'for-net-2024-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci: fix null-ptr-deref in hci_read_supported_codecs
====================
Link: https://patch.msgid.link/20241030192205.38298-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The TQP BAR space is divided into two segments. TQPs 0-1023 and TQPs
1024-1279 are in different BAR space addresses. However,
hclge_fetch_pf_reg does not distinguish the tqp space information when
reading the tqp space information. When the number of TQPs is greater
than 1024, access bar space overwriting occurs.
The problem of different segments has been considered during the
initialization of tqp.io_base. Therefore, tqp.io_base is directly used
when the queue is read in hclge_fetch_pf_reg.
The error message:
Unable to handle kernel paging request at virtual address ffff800037200000
pc : hclge_fetch_pf_reg+0x138/0x250 [hclge]
lr : hclge_get_regs+0x84/0x1d0 [hclge]
Call trace:
hclge_fetch_pf_reg+0x138/0x250 [hclge]
hclge_get_regs+0x84/0x1d0 [hclge]
hns3_get_regs+0x2c/0x50 [hns3]
ethtool_get_regs+0xf4/0x270
dev_ethtool+0x674/0x8a0
dev_ioctl+0x270/0x36c
sock_do_ioctl+0x110/0x2a0
sock_ioctl+0x2ac/0x530
__arm64_sys_ioctl+0xa8/0x100
invoke_syscall+0x4c/0x124
el0_svc_common.constprop.0+0x140/0x15c
do_el0_svc+0x30/0xd0
el0_svc+0x1c/0x2c
el0_sync_handler+0xb0/0xb4
el0_sync+0x168/0x180
Fixes: 939ccd107f ("net: hns3: move dump regs function to a separate file")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently the misc irq is initialized before reset_timer setup. But
it will access the reset_timer in the irq handler. So initialize
the reset_timer earlier.
Fixes: ff200099d2 ("net: hns3: remove unnecessary work in hclgevf_main")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently, there is a time window between misc irq enabled
and service task inited. If an interrupte is reported at
this time, it will cause warning like below:
[ 16.324639] Call trace:
[ 16.324641] __queue_delayed_work+0xb8/0xe0
[ 16.324643] mod_delayed_work_on+0x78/0xd0
[ 16.324655] hclge_errhand_task_schedule+0x58/0x90 [hclge]
[ 16.324662] hclge_misc_irq_handle+0x168/0x240 [hclge]
[ 16.324666] __handle_irq_event_percpu+0x64/0x1e0
[ 16.324667] handle_irq_event+0x80/0x170
[ 16.324670] handle_fasteoi_edge_irq+0x110/0x2bc
[ 16.324671] __handle_domain_irq+0x84/0xfc
[ 16.324673] gic_handle_irq+0x88/0x2c0
[ 16.324674] el1_irq+0xb8/0x140
[ 16.324677] arch_cpu_idle+0x18/0x40
[ 16.324679] default_idle_call+0x5c/0x1bc
[ 16.324682] cpuidle_idle_call+0x18c/0x1c4
[ 16.324684] do_idle+0x174/0x17c
[ 16.324685] cpu_startup_entry+0x30/0x6c
[ 16.324687] secondary_start_kernel+0x1a4/0x280
[ 16.324688] ---[ end trace 6aa0bff672a964aa ]---
So don't auto enable misc vector when request irq..
Fixes: 7be1b9f3e9 ("net: hns3: make hclge_service use delayed workqueue")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>