The "bec" struct isn't necessarily always initialized. For example, the
mcp251xfd_get_berr_counter() function doesn't initialize anything if the
interface is down.
Fixes: 52c793f240 ("can: netlink support for bus-error reporting and counters")
Link: https://lore.kernel.org/r/YAkaRdRJncsJO8Ve@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
fec_restart() does a hard reset of the MAC module when the link status
changes to up. This temporarily resets the R_CNTRL register which controls
the MII mode of the ENET_OUT clock. In the case of RMII, the clock
frequency momentarily drops from 50MHz to 25MHz until the register is
reconfigured. Some link partners do not tolerate this glitch and
invalidate the link causing failure to establish a stable link when using
PHY polling mode. Since as per IEEE802.3 the criteria for link validity
are PHY-specific, what the partner should tolerate cannot be assumed, so
avoid resetting the MII clock by using software reset instead of hardware
reset when the link is up. This is generally relevant only if the SoC
provides the clock to an external PHY and the PHY is configured for RMII.
Signed-off-by: Laurent Badel <laurentbadel@eaton.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Function __team_compute_features() is protected by team->lock
mutex when it is called from team_compute_features() used when
features of an underlying device is changed. This causes
a deadlock when NETDEV_FEAT_CHANGE notifier for underlying device
is fired due to change propagated from team driver (e.g. MTU
change). It's because callbacks like team_change_mtu() or
team_vlan_rx_{add,del}_vid() protect their port list traversal
by team->lock mutex.
Example (r8169 case where this driver disables TSO for certain MTU
values):
...
[ 6391.348202] __mutex_lock.isra.6+0x2d0/0x4a0
[ 6391.358602] team_device_event+0x9d/0x160 [team]
[ 6391.363756] notifier_call_chain+0x47/0x70
[ 6391.368329] netdev_update_features+0x56/0x60
[ 6391.373207] rtl8169_change_mtu+0x14/0x50 [r8169]
[ 6391.378457] dev_set_mtu_ext+0xe1/0x1d0
[ 6391.387022] dev_set_mtu+0x52/0x90
[ 6391.390820] team_change_mtu+0x64/0xf0 [team]
[ 6391.395683] dev_set_mtu_ext+0xe1/0x1d0
[ 6391.399963] do_setlink+0x231/0xf50
...
In fact team_compute_features() called from team_device_event()
does not need to be protected by team->lock mutex and rcu_read_lock()
is sufficient there for port list traversal.
Fixes: 3d249d4ca7 ("net: introduce ethernet teaming device")
Cc: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20210125074416.4056484-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If a non nat tuple entry is inserted just to the regular tuples
rhashtable (ct_tuples_ht) and not to natted tuples rhashtable
(ct_nat_tuples_ht). Commit bc562be967 ("net/mlx5e: CT: Save ct entries
tuples in hashtables") mixed up the return labels and names sot that on
cleanup or failure we still try to remove for the natted tuples rhashtable.
Fix that by correctly checking if a natted tuples insertion
before removing it. While here make it more readable.
Fixes: bc562be967 ("net/mlx5e: CT: Save ct entries tuples in hashtables")
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Sometimes, channel params are changed without recreating the channels.
It happens in two basic cases: when the channels are closed, and when
the parameter being changed doesn't affect how channels are configured.
Such changes invoke a hardware command that might fail. The whole
operation should be reverted in such cases, but the code that restores
the parameters' values in the driver was missing. This commit adds this
handling.
Fixes: 2e20a15120 ("net/mlx5e: Fail safe mtu and lro setting")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Trust state may be changed without recreating the channels. It happens
when the channels are closed, and when channel parameters (min inline
mode) stay the same after changing the trust state. Changing the trust
state is a hardware command that may fail. The current code didn't
restore the channel parameters to their old values if an error happened
and the channels were closed. This commit adds handling for this case.
Fixes: 6e0504c698 ("net/mlx5e: Change inline mode correctly when changing trust state")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This commit addresses two issues related to changing the number of
queues when the channels are closed:
1. Missing call to mlx5e_num_channels_changed to update
real_num_tx_queues when the number of TCs is changed.
2. When mlx5e_num_channels_changed returns an error, the channel
parameters must be reverted.
Two Fixes: tags correspond to the first commits where these two issues
were introduced.
Fixes: 3909a12e79 ("net/mlx5e: Fix configuration of XPS cpumasks and netdev queues in corner cases")
Fixes: fa3748775b ("net/mlx5e: Handle errors from netif_set_real_num_{tx,rx}_queues")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, if a neighbour isn't valid when offloading tunnel encap rules,
we offload the original match and replace the original action with
"goto slow path" action. For this we use a temporary flow attribute based
on the original flow attribute and then change the action. Flow flags,
which among those is the CT flag, are still shared for the slow path rule
offload, so we end up parsing this flow as a CT + goto slow path rule.
Besides being unnecessary, CT action offload saves extra information in
the passed flow attribute, such as created ct_flow and mod_hdr, which
is lost onces the temporary flow attribute is freed.
When a neigh is updated and is valid, we offload the original CT rule
with original CT action, which again creates a ct_flow and mod_hdr
and saves it in the flow's original attribute. Then we delete the slow
path rule with a temporary flow attribute based on original updated
flow attribute, and we free the relevant ct_flow and mod_hdr.
Then when tc deletes this flow, we try to free the ct_flow and mod_hdr
on the flow's attribute again.
To fix the issue, skip all furture proccesing (CT/Sample/Split rules)
in offload/unoffload of slow path rules.
Call trace:
[ 758.850525] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000218
[ 758.952987] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 758.964170] Modules linked in: act_csum(E) act_pedit(E) act_tunnel_key(E) act_ct(E) nf_flow_table(E) xt_nat(E) ip6table_filter(E) ip6table_nat(E) xt_comment(E) ip6_tables(E) xt_conntrack(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) xt_addrtype(E) iptable_filter(E) iptable_nat(E) bpfilter(E) br_netfilter(E) bridge(E) stp(E) llc(E) xfrm_user(E) overlay(E) act_mirred(E) act_skbedit(E) rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) esp6_offload(E) esp6(E) esp4_offload(E) esp4(E) xfrm_algo(E) mlx5_ib(OE) ib_uverbs(OE) geneve(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetlink_cttimeout(E) nfnetlink(E) mlx5_core(OE) act_gact(E) cls_flower(E) sch_ingress(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat(E) mlxfw(OE) psample(E) nf_conntrack(E) nf_defrag_ipv4(E) vfio_mdev(E) mdev(E) ib_core(OE) mlx_compat(OE) crct10dif_ce(E) uio_pdrv_genirq(E) uio(E) i2c_mlx(E) mlxbf_pmc(E) sbsa_gwdt(E) mlxbf_gige(E) gpio_mlxbf2(E) mlxbf_pka(E) mlx_trio(E) mlx_bootctl(E) bluefield_edac(E) knem(O)
[ 758.964225] ip_tables(E) mlxbf_tmfifo(E) ipv6(E) crc_ccitt(E) nf_defrag_ipv6(E)
[ 759.154186] CPU: 5 PID: 122 Comm: kworker/u16:1 Tainted: G OE 5.4.60-mlnx.52.gde81e85 #1
[ 759.172870] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS BlueField:3.5.0-2-gc1b5d64 Jan 4 2021
[ 759.195466] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core]
[ 759.207344] pstate: a0000005 (NzCv daif -PAN -UAO)
[ 759.217003] pc : mlx5_del_flow_rules+0x5c/0x160 [mlx5_core]
[ 759.228229] lr : mlx5_del_flow_rules+0x34/0x160 [mlx5_core]
[ 759.405858] Call trace:
[ 759.410804] mlx5_del_flow_rules+0x5c/0x160 [mlx5_core]
[ 759.421337] __mlx5_eswitch_del_rule.isra.43+0x5c/0x1c8 [mlx5_core]
[ 759.433963] mlx5_eswitch_del_offloaded_rule_ct+0x34/0x40 [mlx5_core]
[ 759.446942] mlx5_tc_rule_delete_ct+0x68/0x74 [mlx5_core]
[ 759.457821] mlx5_tc_ct_delete_flow+0x160/0x21c [mlx5_core]
[ 759.469051] mlx5e_tc_unoffload_fdb_rules+0x158/0x168 [mlx5_core]
[ 759.481325] mlx5e_tc_encap_flows_del+0x140/0x26c [mlx5_core]
[ 759.492901] mlx5e_rep_update_flows+0x11c/0x1ec [mlx5_core]
[ 759.504127] mlx5e_rep_neigh_update+0x160/0x200 [mlx5_core]
[ 759.515314] process_one_work+0x178/0x400
[ 759.523350] worker_thread+0x58/0x3e8
[ 759.530685] kthread+0x100/0x12c
[ 759.537152] ret_from_fork+0x10/0x18
[ 759.544320] Code: 97ffef55 51000673 3100067f 54ffff41 (b9421ab3)
[ 759.556548] ---[ end trace fab818bb1085832d ]---
Fixes: 4c3844d9e9 ("net/mlx5e: CT: Introduce connection tracking")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The cited commit introduce new CONFIG_MLX5_CLS_ACT kconfig variable
to control compilation of TC hardware offloads implementation.
When this configuration is disabled the driver is still wrongly
reports in ethtool that hw-tc-offload is supported.
Fixed by reporting hw-tc-offload is supported only when
CONFIG_MLX5_CLS_ACT is enabled.
Fixes: d956873f90 ("net/mlx5e: Introduce kconfig var for TC support")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Pages for the host PF and ECPF were stored in the same tree, so the ECPF
pages were being freed along with the host PF's when the host driver
unloaded.
Combine the function ID and ECPF flag to use as an index into the
x-array containing the trees to get a different tree for the host PF and
ECPF.
Fixes: c6168161f6 ("net/mlx5: Add support for release all pages event")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When IPSEC offload isn't active, the number of stats is not zero, but
the strings are not filled, leading to exposing stats with empty names.
Fix this by using the same condition for NUM_STATS and FILL_STRS.
Fixes: 0aab3e1b04 ("net/mlx5e: IPSec, Expose IPsec HW stat only for supporting HW")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
"Unsupported key used:" appears in kernel log when flows with
unsupported key are used, arp fields for example.
OpenVSwitch was changed to match on arp fields by default that
caused this warning to appear in kernel log for every arp rule, which
can be a lot.
Fix by lowering print level from warning to debug.
Fixes: e3a2b7ed01 ("net/mlx5e: Support offload cls_flower with drop action")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Instead of directly return, goto the error handling label to free
allocated page.
Fixes: 5f29458b77 ("net/mlx5e: Support dump callback in TX reporter")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
rate_bytes_ps is a 64-bit field. It passed as 32-bit field to
apply_police_params(). Due to this when police rate is higher
than 4Gbps, 32-bit calculation ignores the carry. This results
in incorrect rate configurationn the device.
Fix it by performing 64-bit calculation.
Fixes: fcb64c0f56 ("net/mlx5: E-Switch, add ingress rate support")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When we create the ft object we also init rhltable in ft->fgs_hash.
So in error flow before kfree of ft we need to destroy that rhltable.
Fixes: 693c6883bb ("net/mlx5: Add hash table for flow groups in flow table")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Kalle Valo says:
====================
wireless-drivers fixes for v5.11
Second set of fixes for v5.11. Like in last time we again have more
fixes than usual Actually a bit too much for my liking in this state
of the cycle, but due to unrelated challenges I was only able to
submit them now.
We have few important crash fixes, iwlwifi modifying read-only data
being the most reported issue, and also smaller fixes to iwlwifi.
mt76
* fix a clang warning about enum usage
* fix rx buffer refcounting crash
mt7601u
* fix rx buffer refcounting crash
* fix crash when unbplugging the device
iwlwifi
* fix a crash where we were modifying read-only firmware data
* lots of smaller fixes all over the driver
* tag 'wireless-drivers-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers: (24 commits)
mt7601u: fix kernel crash unplugging the device
iwlwifi: queue: bail out on invalid freeing
iwlwifi: mvm: guard against device removal in reprobe
iwlwifi: Fix IWL_SUBDEVICE_NO_160 macro to use the correct bit.
iwlwifi: mvm: clear IN_D3 after wowlan status cmd
iwlwifi: pcie: add rules to match Qu with Hr2
iwlwifi: mvm: invalidate IDs of internal stations at mvm start
iwlwifi: mvm: fix the return type for DSM functions 1 and 2
iwlwifi: pcie: reschedule in long-running memory reads
iwlwifi: pcie: use jiffies for memory read spin time limit
iwlwifi: pcie: fix context info memory leak
iwlwifi: pcie: add a NULL check in iwl_pcie_txq_unmap
iwlwifi: pcie: set LTR on more devices
iwlwifi: queue: don't crash if txq->entries is NULL
iwlwifi: fix the NMI flow for old devices
iwlwifi: pnvm: don't try to load after failures
iwlwifi: pnvm: don't skip everything when not reloading
iwlwifi: pcie: avoid potential PNVM leaks
iwlwifi: mvm: take mutex for calling iwl_mvm_get_sync_time()
iwlwifi: mvm: skip power command when unbinding vif during CSA
...
====================
Link: https://lore.kernel.org/r/20210126092202.6A367C433CA@smtp.codeaurora.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link speed advertising in igc has two problems:
- When setting the advertisement via ethtool, the link speed is converted
to the legacy 32 bit representation for the intel PHY code.
This inadvertently drops ETHTOOL_LINK_MODE_2500baseT_Full_BIT (being
beyond bit 31). As a result, any call to `ethtool -s ...' drops the
2500Mbit/s link speed from the PHY settings. Only reloading the driver
alleviates that problem.
Fix this by converting the ETHTOOL_LINK_MODE_2500baseT_Full_BIT to the
Intel PHY ADVERTISE_2500_FULL bit explicitly.
- Rather than checking the actual PHY setting, the .get_link_ksettings
function always fills link_modes.advertising with all link speeds
the device is capable of.
Fix this by checking the PHY autoneg_advertised settings and report
only the actually advertised speeds up to ethtool.
Fixes: 8c5ad0dae9 ("igc: Add ethtool support")
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This change simplifies the VF initialization check and also minimizes
the delay between acquiring the VSI pointer and using it. As known by
the commit being fixed, there is a risk of the VSI pointer getting
changed. Therefore minimize the delay between getting and using the
pointer.
Fixes: 9889707b06 ("i40e: Fix crash caused by stress setting of VF MAC addresses")
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The current MSI-X enablement logic tries to enable best-case MSI-X
vectors and if that fails we only support a bare-minimum set. This
includes a single MSI-X for 1 Tx and 1 Rx queue and a single MSI-X
for the OICR interrupt. Unfortunately, the driver fails to load when we
don't get as many MSI-X as requested for a couple reasons.
First, the code to allocate MSI-X in the driver tries to allocate
num_online_cpus() MSI-X for LAN traffic without caring about the number
of MSI-X actually enabled/requested from the kernel for LAN traffic.
So, when calling ice_get_res() for the PF VSI, it returns failure
because the number of available vectors is less than requested. Fix
this by not allowing the PF VSI to allocation more than
pf->num_lan_msix MSI-X vectors and pf->num_lan_msix Rx/Tx queues.
Limiting the number of queues is done because we don't want more than
1 Tx/Rx queue per interrupt due to performance conerns.
Second, the driver assigns pf->num_lan_msix = 2, to account for LAN
traffic and the OICR. However, pf->num_lan_msix is only meant for LAN
MSI-X. This is causing a failure when the PF VSI tries to
allocate/reserve the minimum pf->num_lan_msix because the OICR MSI-X has
already been reserved, so there may not be enough MSI-X vectors left.
Fix this by setting pf->num_lan_msix = 1 for the failure case. Then the
ICE_MIN_MSIX accounts for the LAN MSI-X and the OICR MSI-X needed for
the failure case.
Update the related defines used in ice_ena_msix_range() to align with
the above behavior and remove the unused RDMA defines because RDMA is
currently not supported. Also, remove the now incorrect comment.
Fixes: 152b978a1f ("ice: Rework ice_ena_msix_range")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently users could create more channels than LAN MSI-X available.
This is happening because there is no check against pf->num_lan_msix
when checking the max allowed channels and will cause performance issues
if multiple Tx and Rx queues are tied to a single MSI-X. Fix this by not
allowing more channels than LAN MSI-X available in pf->num_lan_msix.
Fixes: 87324e747f ("ice: Implement ethtool ops for channels")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix the driver to copy the MAC address configured in ndo_set_mac_address
into dev_addr, even if the MAC filter already exists in HW. In some
situations (e.g. bonding) the netdev's dev_addr could have been modified
outside of the driver, with no change to the HW filter, so the driver
cannot assume that they match.
Fixes: 757976ab16 ("ice: Fix check for removing/adding mac filters")
Signed-off-by: Nick Nunley <nicholas.d.nunley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This patch is based on a similar change to i40e by Slawomir Laba:
"i40e: Implement flow for IPv6 next header (extension header)".
When a packet contains an IPv6 header with next header which is
an extension header and not a protocol one, the kernel function
skb_transport_header called with such sk_buff will return a
pointer to the extension header and not to the TCP one.
The above explained call caused a problem with packet processing
for skb with encapsulation for tunnel with ICE_TX_CTX_EIPT_IPV6.
The extension header was not skipped at all.
The ipv6_skip_exthdr function does check if next header of the IPV6
header is an extension header and doesn't modify the l4_proto pointer
if it points to a protocol header value so its safe to omit the
comparison of exthdr and l4.hdr pointers. The ipv6_skip_exthdr can
return value -1. This means that the skipping process failed
and there is something wrong with the packet so it will be dropped.
Fixes: a4e82a81f5 ("ice: Add support for tunnel offloads")
Signed-off-by: Nick Nunley <nicholas.d.nunley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The packet classifier would occasionally misrecognize an IPv6 training
packet when the next protocol field was 0. The correct value for
unspecified protocol is IPPROTO_NONE.
Fixes: 165d80d6ad ("ice: Support IPv6 Flow Director filters")
Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In D3 resume flow, avoid the following race where sending
packets before updating the sequence number (sequence
number received from the wowlan status command response):
Thread 1:
__iwl_mvm_resume clears IWL_MVM_STATUS_IN_D3 and is cut
by thread 2 before reaching iwl_mvm_query_wakeup_reasons.
Thread 2:
iwl_mvm_mac_itxq_xmit calls iwl_mvm_tx_skb since
IWL_MVM_STATUS_IN_D3 is not set using a wrong sequence number.
Thread 1:
__iwl_mvm_resume continues and calls iwl_mvm_query_wakeup_reasons
updating the sequence number received from the firmware.
The next packet that will be sent now will cause sysassert 0x1096.
Fix the bug by moving 'clear IWL_MVM_STATUS_IN_D3' to after
sending the wowlan status command and updating the sequence
number.
Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210122144849.fe927ec939c6.I103d3321fb55da7e6c6c51582cfadf94eb8b6c58@changeid
If we spin for a long time in memory reads that (for some reason in
hardware) take a long time, then we'll eventually get messages such
as
watchdog: BUG: soft lockup - CPU#2 stuck for 24s! [kworker/2:2:272]
This is because the reading really does take a very long time, and
we don't schedule, so we're hogging the CPU with this task, at least
if CONFIG_PREEMPT is not set, e.g. with CONFIG_PREEMPT_VOLUNTARY=y.
Previously I misinterpreted the situation and thought that this was
only going to happen if we had interrupts disabled, and then fixed
this (which is good anyway, however), but that didn't always help;
looking at it again now I realized that the spin unlock will only
reschedule if CONFIG_PREEMPT is used.
In order to avoid this issue, change the code to cond_resched() if
we've been spinning for too long here.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fixes: 04516706bb ("iwlwifi: pcie: limit memory read spin time")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210115130253.217a9d6a6a12.If964cb582ab0aaa94e81c4ff3b279eaafda0fd3f@changeid
I noticed that the flow that triggers an NMI on the firmware
for old devices (tested on 7265) doesn't work.
Apparently, the firmware / device is still in low power when
we write the register that triggers the NMI. We call the
"grab_nic_access" function to make sure the device is awake
but that wasn't enough. I played with this and noticed that
if we wait 1 ms after the device reports it is awake before
we write to the NMI register, the device always sees our
write and the firmware gets properly asserted.
Triggering an NMI to the firmware can be done with the
debugfs hook:
echo 1 > /sys/kernel/debug/iwlwifi/0000\:00\:03.0/iwlmvm/fw_nmi
What happened before is that the firmware would just stall
without running its NMI routine. Because of that the driver
wouldn't get the "firmware crashed" interrupt. After a while
the driver would notice that the firmware is not responding
to some command and it would read the error data from the
firmware, but this data is populated in the NMI service
routine in the firmware which was not called. So in the logs
it looked like:
iwlwifi 0000:00:03.0: Error sending REPLY_ERROR: time out after 2000ms.
iwlwifi 0000:00:03.0: Current CMD queue read_ptr 33 write_ptr 34
iwlwifi 0000:00:03.0: Loaded firmware version: 29.09bd31e1.0 7265D-29.ucode
iwlwifi 0000:00:03.0: 0x00000000 | ADVANCED_SYSASSERT
iwlwifi 0000:00:03.0: 0x00000000 | trm_hw_status0
iwlwifi 0000:00:03.0: 0x00000000 | trm_hw_status1
iwlwifi 0000:00:03.0: 0x00000000 | branchlink2
iwlwifi 0000:00:03.0: 0x00000000 | interruptlink1
iwlwifi 0000:00:03.0: 0x00000000 | interruptlink2
iwlwifi 0000:00:03.0: 0x00000000 | data1
iwlwifi 0000:00:03.0: 0x00000000 | data2
iwlwifi 0000:00:03.0: 0x00000000 | data3
iwlwifi 0000:00:03.0: 0x00000000 | beacon time
iwlwifi 0000:00:03.0: 0x00000000 | tsf low
...
With this fix, immediately after we trigger the NMI to the
firmware, we get the expected:
iwlwifi 0000:00:03.0: Microcode SW error detected. Restarting 0x2000000.
iwlwifi 0000:00:03.0: Start IWL Error Log Dump:
iwlwifi 0000:00:03.0: Status: 0x00000040, count: 6
iwlwifi 0000:00:03.0: Loaded firmware version: 29.09bd31e1.0 7265D-29.ucode
iwlwifi 0000:00:03.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN
iwlwifi 0000:00:03.0: 0x000002F1 | trm_hw_status0
iwlwifi 0000:00:03.0: 0x00000000 | trm_hw_status1
iwlwifi 0000:00:03.0: 0x00043D6C | branchlink2
iwlwifi 0000:00:03.0: 0x0004AFD6 | interruptlink1
iwlwifi 0000:00:03.0: 0x000008C4 | interruptlink2
iwlwifi 0000:00:03.0: 0x00000000 | data1
iwlwifi 0000:00:03.0: 0x00000080 | data2
iwlwifi 0000:00:03.0: 0x07030000 | data3
iwlwifi 0000:00:03.0: 0x003FD4C3 | beacon time
iwlwifi 0000:00:03.0: 0x00C22AC3 | tsf low
iwlwifi 0000:00:03.0: 0x00000000 | tsf hi
iwlwifi 0000:00:03.0: 0x00000000 | time gp1
iwlwifi 0000:00:03.0: 0x00C22AC3 | time gp2
iwlwifi 0000:00:03.0: 0x00000001 | uCode revision type
iwlwifi 0000:00:03.0: 0x0000001D | uCode version major
Notice the first line: "Microcode SW error detected:" which
is printed in the driver's ISR, which means that the driver
actually got an interrupt from the firmware saying that it
crashed. And then we have the properly populated error data.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210115130252.70e67cc75d88.I6615cad4361862e7f3c9f2d3cafb6a8c61e16781@changeid
The octeontx2 hardware needs the buffer to be 128 byte aligned.
But in the current implementation of napi_alloc_frag(), it can't
guarantee the return address is 128 byte aligned even the request size
is a multiple of 128 bytes, so we have to request an extra 128 bytes and
use the PTR_ALIGN() to make sure that the buffer is aligned correctly.
Fixes: 7a36e4918e ("octeontx2-pf: Use the napi_alloc_frag() to alloc the pool buffers")
Reported-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Tested-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://lore.kernel.org/r/20210121070906.25380-1-haokexin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>