Currently, mlxsw_afk_picker() chooses which blocks will be used for a
given list of elements, and fills the blocks during the searching - when a
key block is found with most hits, it adds it and removes the elements from
the count of hits. This should be changed as we want to be able to choose
which blocks will be placed in blocks 0 to 5.
To separate between choosing blocks and filling blocks, several pre-changes
are required. The indexes of the chosen blocks should be saved, so then
the relevant blocks will be filled at the end of search.
Allocate a bitmap for chosen blocks, when a block is found with most
hits, set the relevant bit in the bitmap. This bitmap will be used in a
following patch.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For 12 key blocks in the A-TCAM, rules are split into two records, which
constitute two lookups. The two records are linked using a
"large entry key ID".
Due to a Spectrum-4 hardware issue, KVD entries that correspond to key
blocks 0 to 5 of 12 key blocks A-TCAM entries will be placed in the same
KVD pipe if they only differ in their "large entry key ID", as it is
ignored. This results in a reduced scale. To reduce the probability of this
issue, we can place key blocks with high entropy in blocks 0 to 5. The idea
is to place blocks that are changed often in blocks 0 to 5, for
example, key blocks that match on IPv4 addresses or the LSBs of IPv6
addresses. Such placement will reduce the probability of these blocks to be
same.
Mark several blocks with 'high_entropy' flag, so later we will take into
account this flag and place them in blocks 0 to 5.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree says:
====================
sfc: conntrack offload for tunnels
This series adds support for offloading TC flower rules which require
both connection tracking and tunnel decapsulation. Depending on the
match keys required, the left-hand-side rule may go in either the
Outer Rule table or the Action Rule table.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When a foreign LHS rule (TC rule from a tunnel netdev which requests
conntrack lookup) matches on inner headers or enc_key_id, these matches
cannot be performed by the Outer Rule table, as the keys are only
available after the tunnel type has been identified (by the OR lookup)
and the rest of the headers parsed accordingly.
Offload such rules with an Action Rule, using the LOOKUP_CONTROL section
of the AR response to specify the conntrack and/or recirculation actions,
combined with an Outer Rule which performs only the usual Encap Match
duties.
This processing flow, as it requires two AR lookups per packet, is less
performant than OR-CT-AR, so only use it where necessary.
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There were a few places where no extack error message was set, or the
extack was not forwarded to callees, potentially resulting in a return
of -EOPNOTSUPP with no additional information.
Make sure to populate the error message in these cases. In practice
this does us no good as TC indirect block callbacks don't come with an
extack to fill in; but maybe they will someday and when debugging it's
possible to provide a fake extack and emit its message to the console.
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Normally, if a TC filter on a tunnel netdev does not match on any
encap fields, we decline to offload it, as it cannot meet our
requirement for a <sip,dip,dport> tuple for the encap match.
However, if the rule has a nonzero chain_index, then for a packet to
reach the rule, it must already have matched a LHS rule which will
have included an encap match and determined the tunnel type, so in
that case we can offload the right-hand-side rule.
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow a tunnel netdevice (such as a vxlan) to offload conntrack lookups,
in much the same way as efx netdevs.
To ensure this rule does not overlap with other tunnel rules on the same
sip,dip,dport tuple, register a pseudo encap match of a new type
(EFX_TC_EM_PSEUDO_OR), which unlike PSEUDO_MASK may only be referenced
once (because an actual Outer Rule in hardware exists, although its
fw_id is not recorded in the encap match entry).
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King says:
====================
Rework tx fault fixups
This series reworks the tx-fault fixup and then improves the Nokia GPON
workaround to also ignore the RX LOS signal as well. We do this by
introducing a mask of hardware pin states that should be ignored,
converting the tx-fault fixup to use that, and then augmenting it for
RX LOS.
====================
Link: https://lore.kernel.org/r/ZRwYJXRizvkhm83M@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Improve the Nokia GPON fixup - we need to ignore not only the hardware
LOS signal, but also the software implementation as well. Do this by
using the new state_ignore_mask to indicate that we should ignore not
only the hardware RX_LOS signal, and also clear the LOS bits in the
option field.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://lore.kernel.org/r/E1qnfXh-008UDe-F9@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Jakub Kicinski:
"Including fixes from Bluetooth, netfilter, BPF and WiFi.
I didn't collect precise data but feels like we've got a lot of 6.5
fixes here. WiFi fixes are most user-awaited.
Current release - regressions:
- Bluetooth: fix hci_link_tx_to RCU lock usage
Current release - new code bugs:
- bpf: mprog: fix maximum program check on mprog attachment
- eth: ti: icssg-prueth: fix signedness bug in prueth_init_tx_chns()
Previous releases - regressions:
- ipv6: tcp: add a missing nf_reset_ct() in 3WHS handling
- vringh: don't use vringh_kiov_advance() in vringh_iov_xfer(), it
doesn't handle zero length like we expected
- wifi:
- cfg80211: fix cqm_config access race, fix crashes with brcmfmac
- iwlwifi: mvm: handle PS changes in vif_cfg_changed
- mac80211: fix mesh id corruption on 32 bit systems
- mt76: mt76x02: fix MT76x0 external LNA gain handling
- Bluetooth: fix handling of HCI_QUIRK_STRICT_DUPLICATE_FILTER
- l2tp: fix handling of transhdrlen in __ip{,6}_append_data()
- dsa: mv88e6xxx: avoid EEPROM timeout when EEPROM is absent
- eth: stmmac: fix the incorrect parameter after refactoring
Previous releases - always broken:
- net: replace calls to sock->ops->connect() with kernel_connect(),
prevent address rewrite in kernel_bind(); otherwise BPF hooks may
modify arguments, unexpectedly to the caller
- tcp: fix delayed ACKs when reads and writes align with MSS
- bpf:
- verifier: unconditionally reset backtrack_state masks on global
func exit
- s390: let arch_prepare_bpf_trampoline return program size, fix
struct_ops offsets
- sockmap: fix accounting of available bytes in presence of PEEKs
- sockmap: reject sk_msg egress redirects to non-TCP sockets
- ipv4/fib: send netlink notify when delete source address routes
- ethtool: plca: fix width of reads when parsing netlink commands
- netfilter: nft_payload: rebuild vlan header on h_proto access
- Bluetooth: hci_codec: fix leaking memory of local_codecs
- eth: intel: ice: always add legacy 32byte RXDID in supported_rxdids
- eth: stmmac:
- dwmac-stm32: fix resume on STM32 MCU
- remove buggy and unneeded stmmac_poll_controller, depend on NAPI
- ibmveth: always recompute TCP pseudo-header checksum, fix use of
the driver with Open vSwitch
- wifi:
- rtw88: rtw8723d: fix MAC address offset in EEPROM
- mt76: fix lock dependency problem for wed_lock
- mwifiex: sanity check data reported by the device
- iwlwifi: ensure ack flag is properly cleared
- iwlwifi: mvm: fix a memory corruption due to bad pointer arithm
- iwlwifi: mvm: fix incorrect usage of scan API
Misc:
- wifi: mac80211: work around Cisco AP 9115 VHT MPDU length"
* tag 'net-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits)
MAINTAINERS: update Matthieu's email address
mptcp: userspace pm allow creating id 0 subflow
mptcp: fix delegated action races
net: stmmac: remove unneeded stmmac_poll_controller
net: lan743x: also select PHYLIB
net: ethernet: mediatek: disable irq before schedule napi
net: mana: Fix oversized sge0 for GSO packets
net: mana: Fix the tso_bytes calculation
net: mana: Fix TX CQE error handling
netlink: annotate data-races around sk->sk_err
sctp: update hb timer immediately after users change hb_interval
sctp: update transport state when processing a dupcook packet
tcp: fix delayed ACKs for MSS boundary condition
tcp: fix quick-ack counting to count actual ACKs of new data
page_pool: fix documentation typos
tipc: fix a potential deadlock on &tx->lock
net: stmmac: dwmac-stm32: fix resume on STM32 MCU
ipv4: Set offload_failed flag in fibmatch results
netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure
netfilter: nf_tables: Deduplicate nft_register_obj audit logs
...
Pull integrity fixes from Mimi Zohar:
"Two additional patches to fix the removal of the deprecated
IMA_TRUSTED_KEYRING Kconfig"
* tag 'integrity-v6.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: rework CONFIG_IMA dependency block
ima: Finish deprecation of IMA_TRUSTED_KEYRING Kconfig
Pull LED fix from Lee Jones:
"Just the one bug-fix:
- Fix regression affecting LED_COLOR_ID_MULTI users"
* tag 'leds-fixes-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds:
leds: Drop BUG_ON check for LED_COLOR_ID_MULTI
Pull MFD fixes from Lee Jones:
"A couple of small fixes:
- Potential build failure in CS42L43
- Device Tree bindings clean-up for a superseded patch"
* tag 'mfd-fixes-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
dt-bindings: mfd: Revert "dt-bindings: mfd: maxim,max77693: Add USB connector"
mfd: cs42l43: Fix MFD_CS42L43 dependency on REGMAP_IRQ
Pull overlayfs fixes from Amir Goldstein:
- Fix for file reference leak regression
- Fix for NULL pointer deref regression
- Fixes for RCU-walk race regressions:
Two of the fixes were taken from Al's RCU pathwalk race fixes series
with his consent [1].
Note that unlike most of Al's series, these two patches are not about
racing with ->kill_sb() and they are also very recent regressions
from v6.5, so I think it's worth getting them into v6.5.y.
There is also a fix for an RCU pathwalk race with ->kill_sb(), which
may have been solved in vfs generic code as you suggested, but it
also rids overlayfs from a nasty hack, so I think it's worth anyway.
Link: https://lore.kernel.org/linux-fsdevel/20231003204749.GA800259@ZenIV/ [1]
* tag 'ovl-fixes-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: fix NULL pointer defer when encoding non-decodable lower fid
ovl: make use of ->layers safe in rcu pathwalk
ovl: fetch inode once in ovl_dentry_revalidate_common()
ovl: move freeing ovl_entry past rcu delay
ovl: fix file reference leak when submitting aio
Mat Martineau says:
====================
mptcp: Fixes and maintainer email update for v6.6
Patch 1 addresses a race condition in MPTCP "delegated actions"
infrastructure. Affects v5.19 and later.
Patch 2 removes an unnecessary restriction that did not allow additional
outgoing subflows using the local address of the initial MPTCP subflow.
v5.16 and later.
Patch 3 updates Matthieu's email address.
====================
Link: https://lore.kernel.org/r/20231004-send-net-20231004-v1-0-28de4ac663ae@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The delegated action infrastructure is prone to the following
race: different CPUs can try to schedule different delegated
actions on the same subflow at the same time.
Each of them will check different bits via mptcp_subflow_delegate(),
and will try to schedule the action on the related per-cpu napi
instance.
Depending on the timing, both can observe an empty delegated list
node, causing the same entry to be added simultaneously on two different
lists.
The root cause is that the delegated actions infra does not provide
a single synchronization point. Address the issue reserving an additional
bit to mark the subflow as scheduled for delegation. Acquiring such bit
guarantee the caller to own the delegated list node, and being able to
safely schedule the subflow.
Clear such bit only when the subflow scheduling is completed, ensuring
proper barrier in place.
Additionally swap the meaning of the delegated_action bitmask, to allow
the usage of the existing helper to set multiple bit at once.
Fixes: bcd9773431 ("mptcp: use delegate action to schedule 3rd ack retrans")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231004-send-net-20231004-v1-1-28de4ac663ae@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Using netconsole netpoll_poll_dev could be called from interrupt
context, thus using disable_irq() would cause the following kernel
warning with CONFIG_DEBUG_ATOMIC_SLEEP enabled:
BUG: sleeping function called from invalid context at kernel/irq/manage.c:137
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 10, name: ksoftirqd/0
CPU: 0 PID: 10 Comm: ksoftirqd/0 Tainted: G W 5.15.42-00075-g816b502b2298-dirty #117
Hardware name: aml (r1) (DT)
Call trace:
dump_backtrace+0x0/0x270
show_stack+0x14/0x20
dump_stack_lvl+0x8c/0xac
dump_stack+0x18/0x30
___might_sleep+0x150/0x194
__might_sleep+0x64/0xbc
synchronize_irq+0x8c/0x150
disable_irq+0x2c/0x40
stmmac_poll_controller+0x140/0x1a0
netpoll_poll_dev+0x6c/0x220
netpoll_send_skb+0x308/0x390
netpoll_send_udp+0x418/0x760
write_msg+0x118/0x140 [netconsole]
console_unlock+0x404/0x500
vprintk_emit+0x118/0x250
dev_vprintk_emit+0x19c/0x1cc
dev_printk_emit+0x90/0xa8
__dev_printk+0x78/0x9c
_dev_warn+0xa4/0xbc
ath10k_warn+0xe8/0xf0 [ath10k_core]
ath10k_htt_txrx_compl_task+0x790/0x7fc [ath10k_core]
ath10k_pci_napi_poll+0x98/0x1f4 [ath10k_pci]
__napi_poll+0x58/0x1f4
net_rx_action+0x504/0x590
_stext+0x1b8/0x418
run_ksoftirqd+0x74/0xa4
smpboot_thread_fn+0x210/0x3c0
kthread+0x1fc/0x210
ret_from_fork+0x10/0x20
Since [0] .ndo_poll_controller is only needed if driver doesn't or
partially use NAPI. Because stmmac does so, stmmac_poll_controller
can be removed fixing the above warning.
[0] commit ac3d9dd034 ("netpoll: make ndo_poll_controller() optional")
Cc: <stable@vger.kernel.org> # 5.15.x
Fixes: 47dd7a540b ("net: add support for STMicroelectronics Ethernet controllers")
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/1c156a6d8c9170bd6a17825f2277115525b4d50f.1696429960.git.repk@triplefau.lt
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
While searching for possible refactor of napi_schedule_prep and
__napi_schedule it was notice that the mtk eth driver disable the
interrupt for rx and tx AFTER napi is scheduled.
While this is a very hard to repro case it might happen to have
situation where the interrupt is disabled and never enabled again as the
napi completes and the interrupt is enabled before.
This is caused by the fact that a napi driven by interrupt expect a
logic with:
1. interrupt received. napi prepared -> interrupt disabled -> napi
scheduled
2. napi triggered. ring cleared -> interrupt enabled -> wait for new
interrupt
To prevent this case, disable the interrupt BEFORE the napi is
scheduled.
Fixes: 656e705243 ("net-next: mediatek: add support for MT7623 ethernet")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Link: https://lore.kernel.org/r/20231002140805.568-1-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Before Google adopted FQ for its production servers,
we had to ensure AF4 packets would get a higher share
than BE1 ones.
As discussed this week in Netconf 2023 in Paris, it is time
to upstream this for public use.
After this patch FQ can replace pfifo_fast, with the following
differences :
- FQ uses WRR instead of strict prio, to avoid starvation of
low priority packets.
- We make sure each band/prio tracks its own usage against sch->limit.
This was done to make sure flood of low priority packets would not
prevent AF4 packets to be queued. Contributed by Willem.
- priomap can be changed, if needed (default value are the ones
coming from pfifo_fast).
In this patch, we set default band weights so that :
- high prio (band=0) packets get 90% of the bandwidth
if they compete with low prio (band=2) packets.
- high prio packets get 75% of the bandwidth
if they compete with medium prio (band=1) packets.
Following patch in this series adds the possibility to tune
the per-band weights.
As we added many fields in 'struct fq_sched_data', we had
to make sure to have the first cache line read-mostly, and
avoid wasting precious cache lines.
More optimizations are possible but will be sent separately.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
pfifo_fast prio2band[] is renamed to sch_default_prio2band[]
and exported because we want to share it in FQ.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Now that both enqueue() and dequeue() need to use ktime_get_ns(),
there is no point wasting 8 bytes in struct fq_sched_data.
This makes room for future fields. ;)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Handle the case when GSO SKB linear length is too large.
MANA NIC requires GSO packets to put only the header part to SGE0,
otherwise the TX queue may stop at the HW level.
So, use 2 SGEs for the skb linear part which contains more than the
packet header.
Fixes: ca9c54d2d6 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
For an unknown TX CQE error type (probably from a newer hardware),
still free the SKB, update the queue tail, etc., otherwise the
accounting will be wrong.
Also, TX errors can be triggered by injecting corrupted packets, so
replace the WARN_ONCE to ratelimited error logging.
Cc: stable@vger.kernel.org
Fixes: ca9c54d2d6 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pull rtla fixes from Daniel Bristot de Oliveira:
"rtla (Real-Time Linux Analysis) tool fixes.
Timerlat auto-analysis:
- Timerlat is reporting thread interference time without thread noise
events occurrence. It was caused because the thread interference
variable was not reset after the analysis of a timerlat activation
that did not hit the threshold.
- The IRQ handler delay is estimated from the delta of the IRQ
latency reported by timerlat, and the timestamp from IRQ handler
start event. If the delta is near-zero, the drift from the external
clock and the trace event and/or the overhead can cause the value
to be negative. If the value is negative, print a zero-delay.
- IRQ handlers happening after the timerlat thread event but before
the stop tracing were being reported as IRQ that happened before
the *current* IRQ occurrence. Ignore Previous IRQ noise in this
condition because they are valid only for the *next* timerlat
activation.
Timerlat user-space:
- Timerlat is stopping all user-space thread if a CPU becomes
offline. Do not stop the entire tool if a CPU is/become offline,
but only the thread of the unavailable CPU. Stop the tool only, if
all threads leave because the CPUs become/are offline.
man-pages:
- Fix command line example in timerlat hist man page"
* tag 'rtla-v6.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux:
rtla: fix a example in rtla-timerlat-hist.rst
rtla/timerlat: Do not stop user-space if a cpu is offline
rtla/timerlat_aa: Fix previous IRQ delay for IRQs that happens after thread sample
rtla/timerlat_aa: Fix negative IRQ delay
rtla/timerlat_aa: Zero thread sum after every sample analysis
Jakub Kicinski says:
====================
ynl Makefile cleanup
While catching up on recent changes I noticed unexpected
changes to Makefiles in YNL. Indeed they were not working
as intended but the fixes put in place were not what I had
in mind :)
====================
Link: https://lore.kernel.org/r/20231003153416.2479808-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As far as I can tell the normal Makefile dependency tracking
works, generated files get re-generated if the YAML was updated.
Let make do its job, don't force the re-generation.
make hardclean can be used to force regeneration.
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20231003153416.2479808-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
During the 4-way handshake, the transport's state is set to ACTIVE in
sctp_process_init() when processing INIT_ACK chunk on client or
COOKIE_ECHO chunk on server.
In the collision scenario below:
192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408]
192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885]
192.168.1.2 > 192.168.1.1: sctp (1) [INIT ACK] [init tag: 3922216408]
192.168.1.1 > 192.168.1.2: sctp (1) [COOKIE ECHO]
192.168.1.2 > 192.168.1.1: sctp (1) [COOKIE ACK]
192.168.1.1 > 192.168.1.2: sctp (1) [INIT ACK] [init tag: 3914796021]
when processing COOKIE_ECHO on 192.168.1.2, as it's in COOKIE_WAIT state,
sctp_sf_do_dupcook_b() is called by sctp_sf_do_5_2_4_dupcook() where it
creates a new association and sets its transport to ACTIVE then updates
to the old association in sctp_assoc_update().
However, in sctp_assoc_update(), it will skip the transport update if it
finds a transport with the same ipaddr already existing in the old asoc,
and this causes the old asoc's transport state not to move to ACTIVE
after the handshake.
This means if DATA retransmission happens at this moment, it won't be able
to enter PF state because of the check 'transport->state == SCTP_ACTIVE'
in sctp_do_8_2_transport_strike().
This patch fixes it by updating the transport in sctp_assoc_update() with
sctp_assoc_add_peer() where it updates the transport state if there is
already a transport with the same ipaddr exists in the old asoc.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/r/fd17356abe49713ded425250cc1ae51e9f5846c6.1696172325.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-10-03 (i40e, iavf)
This series contains updates to i40e and iavf drivers.
Yajun Deng aligns reporting of buffer exhaustion statistics to follow
documentation for i40e.
Jake removes undesired 'inline' from functions in iavf.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
iavf: remove "inline" functions from iavf_txrx.c
i40e: Add rx_missed_errors for buffer exhaustion
====================
Link: https://lore.kernel.org/r/20231003223610.2004976-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>