ct_state validation should check not only the mask bit but also
mask_bit & key_bit.
Take +new+est as an example: a rule is invalid only when the 'new' and
'est' bits are set in both state_mask and the state flags. With a
mask-only check, the -new-est case is also rejected by the kernel.
Consider Open vSwitch with two flows:
ct_state=+trk+new,action=commit,forward
ct_state=+trk+est,action=forward
When a packet goes through the kernel and its conntrack state is invalid,
the ct_state will be +trk-inv. After the upcall to ovs-vswitchd, the
final datapath action will be drop with -new-est+trk.
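The difference between the two checks can be sketched in plain C. This is only an illustration: the flag names and bit values below are placeholders for the example, not the kernel's definitions, and both helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; names and values are assumptions for this sketch. */
#define CT_NEW (1u << 0)
#define CT_EST (1u << 1)
#define CT_TRK (1u << 3)

/* Mask-only check: rejects any rule that merely mentions both bits. */
bool ct_state_valid_mask_only(uint16_t state, uint16_t state_mask)
{
	(void)state;
	return !((state_mask & CT_NEW) && (state_mask & CT_EST));
}

/* Mask-and-key check: rejects only rules that require both bits to be set. */
bool ct_state_valid(uint16_t state, uint16_t state_mask)
{
	uint16_t required = state & state_mask;

	return !((required & CT_NEW) && (required & CT_EST));
}
```

With these helpers, a -new-est+trk rule (mask covers new, est and trk; key has only trk) passes ct_state_valid() but is wrongly refused by the mask-only variant.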
Fixes: 1bcc51ac07 ("net/sched: cls_flower: Reject invalid ct_state flags rules")
Fixes: 3aed8b6333 ("net/sched: cls_flower: validate ct_state for invalid and reply flags")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We were linearizing non-TSO skbs that had too many frags, but
we weren't checking number of frags on TSO skbs. This could
lead to a bad page reference when we received a TSO skb with
more frags than the Tx descriptor could support.
v2: use gso_segs rather than yet another division
don't rework the check on the nr_frags
Fixes: 0f3154e6bc ("ionic: Add Tx and Rx handling")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the Spidernet network driver from supported to
maintained, add the linuxppc-dev ML, and add myself as
a 'maintainer'.
Cc: Ishizaki Kou <kou.ishizaki@toshiba.co.jp>
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman says:
====================
Fixes for nfp pre_tunnel code
Louis Peens says:
The following set of patches fixes up a few bugs in the pre_tun
decap code paths which have been hiding for a while.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
pre_tun_rule flows do not follow the usual add-flow path; instead
they are used to update the pre_tun table on the firmware. This means
that if the mask-id gets allocated here the firmware will never see the
"NFP_FL_META_FLAG_MANAGE_MASK" flag for the specific mask id, which
triggers the allocation on the firmware side. This leads to the firmware
mask being corrupted and causing all sorts of strange behaviour.
Fixes: f12725d98c ("nfp: flower: offload pre-tunnel rules")
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Differentiate between ipv4 and ipv6 flows when configuring the pre_tunnel
table to prevent them trampling each other in the table.
Fixes: 783461604f ("nfp: flower: update flow merge code to support IPv6 tunnels")
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some pre_tunnel flow combinations are incorrectly being offloaded
without proper support. Fix these:
- Matching on MPLS is not supported for pre_tun.
- Match on IPv4/IPv6 layer must be present.
- Destination MAC address must match pre_tun.dev MAC
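A minimal sketch of the three checks listed above, assuming a simplified match description. The struct, the function name and the choice of -EOPNOTSUPP as the error code are hypothetical and are not taken from the nfp driver.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of what a pre_tun rule matches on. */
struct pre_tun_match {
	bool has_mpls;
	bool has_ipv4;
	bool has_ipv6;
	uint8_t dst_mac[6];
};

/* Returns 0 if the rule may be offloaded, a negative errno otherwise. */
int pre_tun_check(const struct pre_tun_match *m, const uint8_t dev_mac[6])
{
	if (m->has_mpls)
		return -EOPNOTSUPP;	/* MPLS matching is not supported */
	if (!m->has_ipv4 && !m->has_ipv6)
		return -EOPNOTSUPP;	/* an IP layer match must be present */
	if (memcmp(m->dst_mac, dev_mac, 6) != 0)
		return -EOPNOTSUPP;	/* dst MAC must be the device MAC */
	return 0;
}
```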
Fixes: 120ffd84a9 ("nfp: flower: verify pre-tunnel rules")
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merely enabling compile-testing should not enable additional code.
To fix this, restrict the automatic enabling of BCM4908_ENET to
ARCH_BCM4908.
Fixes: 4feffeadbc ("net: broadcom: bcm4908enet: add BCM4908 controller driver")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
When Open vSwitch offloads conntrack with the act_ct action, the first
rule performs conntrack in act_ct in the tc subsystem. If the packet then
misses the next rule in tc and falls back to the OVS datapath, the
post_ct flag is not set, which leaves the ct_state_key with the -trk flag.
Fixes: 7baf2429a1 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Kleine-Budde says:
====================
pull-request: can 2021-03-16
this is a pull request of 11 patches for net/master.
The first patch is by Martin Willi and fixes the deletion of network
name spaces with physical CAN interfaces in them.
The next two patches are by me and fix the ISOTP protocol, to ensure
that unused flags in classical CAN frames are properly initialized to
zero.
Stephane Grosjean contributes a patch for the pcan_usb_fd driver,
which adds MODULE_SUPPORTED_DEVICE lines for two supported devices.
Angelo Dureghello's patch for the flexcan driver fixes a potential div
by zero, if the bitrate is not set during driver probe.
Jimmy Assarsson's patch for the kvaser_pciefd disables bus load
reporting in the device, if it was previously enabled by the vendor's
out of tree driver. A patch for the kvaser_usb adds support for a new
device, by adding the appropriate USB product ID.
Tong Zhang contributes two patches for the c_can driver. First a
use-after-free in the c_can_pci driver is fixed, in the second patch
the runtime PM for the c_can_pci is fixed by moving the runtime PM
enable/disable from the core driver to the platform driver.
The last two patches are by Torin Cooper-Bennun for the m_can driver.
First an extraneous msg loss warning is removed, then he fixes the
RX-path, which might be blocked by errors.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The ena.rst documentation referred to end_start_xmit() when it should refer
to ena_start_xmit(). Fix the typo.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For M_CAN peripherals, m_can_rx_handler() was called with quota = 1,
which caused any error handling to block RX from taking place until
the next time the IRQ handler is called. This had been observed to
cause RX to be blocked indefinitely in some cases.
This is fixed by calling m_can_rx_handler with a sensibly high quota.
Fixes: f524f829b7 ("can: m_can: Create a m_can platform framework")
Link: https://lore.kernel.org/r/20210303144350.4093750-1-torin@maxiluxsystems.com
Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Torin Cooper-Bennun <torin@maxiluxsystems.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Message loss from RX FIFO 0 is already handled in
m_can_handle_lost_msg(), with netdev output included.
Removing this warning also improves driver performance under heavy
load, where m_can_do_rx_poll() may be called many times before this
interrupt is cleared, causing this message to be output many
times (thanks Mariusz Madej for this report).
Fixes: e0d1f4816f ("can: m_can: add Bosch M_CAN controller support")
Link: https://lore.kernel.org/r/20210303103151.3760532-1-torin@maxiluxsystems.com
Reported-by: Mariusz Madej <mariusz.madej@xtrack.com>
Signed-off-by: Torin Cooper-Bennun <torin@maxiluxsystems.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Under certain circumstances, when switching from Kvaser's linuxcan driver
(kvpciefd) to the SocketCAN driver (kvaser_pciefd), the bus load reporting
is not disabled.
This is flooding the kernel log with prints like:
[3485.574677] kvaser_pciefd 0000:02:00.0: Received unexpected packet type 0x00000009
Always put the controller in the expected state, instead of assuming that
bus load reporting is inactive.
Note: If bus load reporting is enabled when the driver is loaded, you will
still get a number of bus load packages (and printouts), before it is
disabled.
Fixes: 26ad340e58 ("can: kvaser_pciefd: Add driver for Kvaser PCIEcan devices")
Link: https://lore.kernel.org/r/20210309091724.31262-1-jimmyassarsson@gmail.com
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
When flexcan is built-in, the bitrate is not yet set at registration
time, so flexcan_chip_freeze() generates:
[ 1.860000] *** ZERO DIVIDE *** FORMAT=4
[ 1.860000] Current process id is 1
[ 1.860000] BAD KERNEL TRAP: 00000000
[ 1.860000] PC: [<402e70c8>] flexcan_chip_freeze+0x1a/0xa8
To allow chip freeze, use a hardcoded timeout when the bitrate is not
yet set.
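The idea can be sketched as follows. The helper name, the fallback constant and the "ten bit times" formula are assumptions made for this illustration. Only the guard against a zero bitrate, written with an if rather than the ? operator, reflects the text above.

```c
#include <stdint.h>

/* Assumed fallback used while no bitrate has been configured yet. */
#define FALLBACK_TIMEOUT_US 100u

/*
 * Returns a freeze timeout in microseconds. When a bitrate is known the
 * timeout is derived from it (here: ten bit times, an assumption); when
 * the bitrate is still 0 the division is skipped entirely.
 */
uint32_t freeze_timeout_us(uint32_t bitrate)
{
	if (bitrate)
		return 1000u * 1000u * 10u / bitrate;

	return FALLBACK_TIMEOUT_US;
}
```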
Fixes: ec15e27cc8 ("can: flexcan: enable RX FIFO after FRZ/HALT valid")
Link: https://lore.kernel.org/r/20210315231510.650593-1-angelo@kernel-space.org
Signed-off-by: Angelo Dureghello <angelo@kernel-space.org>
[mkl: use if instead of ? operator]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The previous patch ensures that the TX flags (struct
can_isotp_ll_options::tx_flags) are 0 for classic CAN frames or a user
configured value for CAN-FD frames.
This patch sets the CAN frame flags unconditionally to the ISO-TP TX
flags, so that they are initialized to a proper value. Otherwise,
running "candump -x" on a classical CAN ISO-TP stream shows wrongly
set "B" and "E" flags.
| $ candump any,0:0,#FFFFFFFF -extA
| [...]
| can0 TX B E 713 [8] 2B 0A 0B 0C 0D 0E 0F 00
| can0 TX B E 713 [8] 2C 01 02 03 04 05 06 07
| can0 TX B E 713 [8] 2D 08 09 0A 0B 0C 0D 0E
| can0 TX B E 713 [8] 2E 0F 00 01 02 03 04 05
Fixes: e057dd3fc2 ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/r/20210218215434.1708249-2-mkl@pengutronix.de
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
When a non-initial netns is destroyed, the usual policy is to delete
all virtual network interfaces contained, but move physical interfaces
back to the initial netns. This keeps the physical interface visible
on the system.
CAN devices are somewhat special, as they define rtnl_link_ops even
if they are physical devices. If a CAN interface is moved into a
non-initial netns, destroying that netns lets the interface vanish
instead of moving it back to the initial netns. default_device_exit()
skips CAN interfaces due to having rtnl_link_ops set. Reproducer:
ip netns add foo
ip link set can0 netns foo
ip netns delete foo
WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60
CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1
Workqueue: netns cleanup_net
[<c010e700>] (unwind_backtrace) from [<c010a1d8>] (show_stack+0x10/0x14)
[<c010a1d8>] (show_stack) from [<c086dc10>] (dump_stack+0x94/0xa8)
[<c086dc10>] (dump_stack) from [<c086b938>] (__warn+0xb8/0x114)
[<c086b938>] (__warn) from [<c086ba10>] (warn_slowpath_fmt+0x7c/0xac)
[<c086ba10>] (warn_slowpath_fmt) from [<c0629f20>] (ops_exit_list+0x38/0x60)
[<c0629f20>] (ops_exit_list) from [<c062a5c4>] (cleanup_net+0x230/0x380)
[<c062a5c4>] (cleanup_net) from [<c0142c20>] (process_one_work+0x1d8/0x438)
[<c0142c20>] (process_one_work) from [<c0142ee4>] (worker_thread+0x64/0x5a8)
[<c0142ee4>] (worker_thread) from [<c0148a98>] (kthread+0x148/0x14c)
[<c0148a98>] (kthread) from [<c0100148>] (ret_from_fork+0x14/0x2c)
To properly restore physical CAN devices to the initial netns on owning
netns exit, introduce a flag on rtnl_link_ops that can be set by drivers.
For CAN devices setting this flag, default_device_exit() considers them
non-virtual, applying the usual namespace move.
The issue was introduced in the commit mentioned below, as at that time
CAN devices did not have a dellink() operation.
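A minimal sketch of the resulting decision on netns exit. The struct layouts and the flag name netns_refund are placeholders for this example. The text above only says that a flag on rtnl_link_ops is introduced.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for rtnl_link_ops and net_device. */
struct link_ops {
	bool netns_refund;	/* assumed name: device wants to go back */
};

struct net_dev {
	const struct link_ops *link_ops;	/* NULL for plain physical devices */
};

/* True if the device is moved back to the initial netns on netns exit. */
bool moves_back_to_init_netns(const struct net_dev *dev)
{
	if (!dev->link_ops)
		return true;	/* no link ops: treated as physical */

	/* Link ops present: only devices that set the flag are moved. */
	return dev->link_ops->netns_refund;
}
```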
Fixes: e008b5fc8d ("net: Simplfy default_device_exit and improve batching.")
Link: https://lore.kernel.org/r/20210302122423.872326-1-martin@strongswan.org
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Currently, Linux computes the HMAC contained in ADD_ADDR sub-option using
the Address Id and the IP Address, and hardcodes a destination port equal
to zero. This is not ok for ADD_ADDR with port: ensure to account for the
endpoint port when computing the HMAC, in compliance with RFC8684 §3.4.1.
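For illustration, the HMAC input for an IPv4 ADD_ADDR can be laid out as Address ID, IP address, then port, with the port in network byte order. If no port is present in the option, two zero octets are still included. The helper below is a userspace sketch with a hypothetical name. It only builds the message and does not compute the HMAC itself.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Builds the HMAC message for an IPv4 ADD_ADDR:
 *   addr_id (1 octet) | IPv4 address (4 octets) | port (2 octets)
 * addr_be is already in network byte order; port is in host byte order
 * and is 0 when the option carries no port.
 */
size_t add_addr_hmac_msg_v4(uint8_t out[7], uint8_t addr_id,
			    uint32_t addr_be, uint16_t port)
{
	uint16_t port_be = htons(port);

	out[0] = addr_id;
	memcpy(&out[1], &addr_be, sizeof(addr_be));
	memcpy(&out[5], &port_be, sizeof(port_be));
	return 7;
}
```

Hardcoding the last two octets to zero gives the same message for every port, which is the behaviour being fixed.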
Fixes: 22fb85ffae ("mptcp: add port support for ADD_ADDR suboption writing")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently tcp_check_req can be called with an obsolete req socket for which
the big socket has already been created (because of a CPU race or early demux
assigning the req socket to multiple packets in a GRO batch).
Commit e0f9759f53 ("tcp: try to keep packet if SYN_RCV race
is lost") added a retry for the case when tcp_check_req is called for a PSH|ACK packet.
But if the client sends RST+ACK immediately after the connection is
established (when performing a health check, for example), the retry does not
occur. In that case tcp_check_req tries to close the req socket,
leaving the big socket active.
Fixes: e0f9759f53 ("tcp: try to keep packet if SYN_RCV race is lost")
Signed-off-by: Alexander Ovechkin <ovov@yandex-team.ru>
Reported-by: Oleg Senin <olegsenin@yandex-team.ru>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If pl->mac_ops->mac_finish() fails, phylink_err should report
"mac_finish" instead of "mac_prepare".
Fixes: b7ad14c2fe ("net: phylink: re-implement interface configuration with PCS")
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"x25_close" is called by "hdlc_close" in "hdlc.c", which is called by
hardware drivers' "ndo_stop" function.
"x25_xmit" is called by "hdlc_start_xmit" in "hdlc.c", which is hardware
drivers' "ndo_start_xmit" function.
"x25_rx" is called by "hdlc_rcv" in "hdlc.c", which receives HDLC frames
from "net/core/dev.c".
"x25_close" races with "x25_xmit" and "x25_rx" because their callers race.
However, we need to ensure that the LAPB APIs called in "x25_xmit" and
"x25_rx" are called before "lapb_unregister" is called in "x25_close".
This patch adds locking to ensure when "x25_xmit" and "x25_rx" are doing
their work, "lapb_unregister" is not yet called in "x25_close".
Reasons for not solving the racing between "x25_close" and "x25_xmit" by
calling "netif_tx_disable" in "x25_close":
1. We still need to solve the racing between "x25_close" and "x25_rx";
2. The design of the HDLC subsystem assumes the HDLC hardware drivers
have full control over the TX queue, and the HDLC protocol drivers (like
this driver) have no control. Controlling the queue here in the protocol
driver may interfere with hardware drivers' control of the queue.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
flow_dissector_key_icmp::id is of type u16 (CPU byte order), whereas
the ICMP header carries its ID field in network byte order.
Sparse says:
net/core/flow_dissector.c:178:43: warning: restricted __be16 degrades to integer
Convert ID value to CPU byteorder when storing it into
flow_dissector_key_icmp.
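A userspace sketch of the conversion, using ntohs() where the kernel would use its own byte-order helpers. The two structs are simplified stand-ins for the ICMP header and the dissector key, not the kernel definitions.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Simplified ICMP echo header; id and seq are in network byte order. */
struct icmp_echo_hdr {
	uint8_t type;
	uint8_t code;
	uint16_t checksum;
	uint16_t id;
	uint16_t seq;
};

/* Simplified key; id is kept in CPU byte order. */
struct icmp_key {
	uint8_t type;
	uint8_t code;
	uint16_t id;
};

/* Copies the header fields into the key, converting the id on the way. */
void icmp_key_from_hdr(struct icmp_key *key, const struct icmp_echo_hdr *hdr)
{
	key->type = hdr->type;
	key->code = hdr->code;
	key->id = ntohs(hdr->id);
}
```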
Fixes: 5dec597e5c ("flow_dissector: extract more ICMP information")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two issues in the error handling of com20020pci_probe():
1. priv might not be initialized yet when com20020pci_remove() is called
from com20020pci_probe(): priv is set at the very end, but the probe can
jump to error handling midway, leaving priv NULL.
2. Memory leak: the net device is allocated in alloc_arcdev but not
properly released if an error happens in the middle of the big for loop.
[ 1.529110] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 1.531447] RIP: 0010:com20020pci_remove+0x15/0x60 [com20020_pci]
[ 1.536805] Call Trace:
[ 1.536939] com20020pci_probe+0x3f2/0x48c [com20020_pci]
[ 1.537226] local_pci_probe+0x48/0x80
[ 1.539918] com20020pci_init+0x3f/0x1000 [com20020_pci]
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a QMI handle is initialized, an array of message handler
structures is provided, defining how any received message should
be handled based on its type and message ID. The QMI core code
traverses this array when a message arrives and calls the function
associated with the (type, msg_id) found in the array.
The array is supposed to be terminated with an empty (all zero)
entry though. Without it, an unsupported message will cause
the QMI core code to go past the end of the array.
Fix this bug, by properly terminating the message handler arrays
provided when QMI handles are set up by the IPA driver.
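The traversal and the role of the terminating entry can be sketched as below. The types and names are simplified placeholders rather than the QMI core's definitions. The lookup stops at the first entry whose handler function is NULL, so an array without the all-zero entry would be walked past its end.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*msg_fn)(void);

/* Simplified handler entry, keyed by (type, msg_id). */
struct msg_handler {
	uint8_t type;
	uint16_t msg_id;
	msg_fn fn;
};

static void on_ready(void)
{
}

/* Example table; the final all-zero entry terminates the array. */
const struct msg_handler demo_handlers[] = {
	{ .type = 1, .msg_id = 0x20, .fn = on_ready },
	{ 0 },
};

/* Returns the matching entry, or NULL once the sentinel is reached. */
const struct msg_handler *find_handler(const struct msg_handler *handlers,
				       uint8_t type, uint16_t msg_id)
{
	const struct msg_handler *h;

	for (h = handlers; h->fn; h++) {
		if (h->type == type && h->msg_id == msg_id)
			return h;
	}
	return NULL;
}
```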
Fixes: 530f9216a9 ("soc: qcom: ipa: AP/modem communications")
Reported-by: Sujit Kautkar <sujitka@chromium.org>
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-03-12
This series contains updates to ice, i40e, ixgbe and igb drivers.
Magnus adjusts the return value for xsk allocation for ice. This fixes
reporting of napi work done and matches the behavior of other Intel NIC
drivers for xsk allocations.
Maciej moves storing of the rx_offset value to after the build_skb flag
is set as this flag affects the offset value for ice, i40e, and ixgbe.
Li RongQing resolves an issue where an Rx buffer can be reused
prematurely with XDP redirect for igb.
====================
Tom wrote most of the driver code and his experience is valuable to us.
Add him as a Reviewer so that patches will be Cc'ed and reviewed by him.
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The join self tests previously used the '-c' command line option to
enable creation of pcap files for the tests that run, but the change to
allow running a subset of the join tests made overlapping use of that
option.
Restore the capture functionality with '-c' and move the syncookie test
option to '-k'.
Fixes: 1002b89f23 ("selftests: mptcp: add command line arguments for mptcp_join.sh")
Acked-and-tested-by: Geliang Tang <geliangtang@gmail.com>
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A recent change to MIPS ralink reset logic made it so mt7530 actually
resets the switch on platforms such as mt7621 (where bit 2 is the reset
line for the switch). That exposed an issue where the switch would not
function properly in TRGMII mode after a reset.
Reconfigure core clock in TRGMII mode to fix the issue.
Tested on Ubiquiti ER-X (MT7621) with TRGMII mode enabled.
Fixes: 3f9ef7785a ("MIPS: ralink: manage low reset lines")
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PHY driver entry for BCM50160 and BCM50610M calls
bcm54xx_config_init() but does not call bcm54xx_config_clock_delay() in
order to configure appropriate clock delays on the PHY. Fix that.
Fixes: 733336262b ("net: phy: Allow BCM5481x PHYs to setup internal TX/RX clock delay")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The interrupt handler may set the flag to reset the mac in the future,
but that flag is not cleared once the reset has occurred.
Fixes: 10cbd64076 ("ftgmac100: Rework NAPI & interrupts handling")
Signed-off-by: Dylan Hung <dylan_hung@aspeedtech.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver did not always clean up all allocated resources when probe
failed. Fix the probe cleanup path to clean up everything that was
allocated.
Fixes: 57baf8cc70 ("net: axienet: Handle deferred probe on clock properly")
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The "backlog" argument of listen() specifies
the maximum length of the queue of pending connections,
so the accept queue should be considered full
if it holds exactly "backlog" elements.
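Expressed as a predicate, the change amounts to using >= rather than > in the comparison. The function and parameter names below are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * The queue is full as soon as it holds "backlog" pending connections.
 * A strict '>' comparison would let one extra connection in.
 */
bool acceptq_is_full(uint32_t pending, uint32_t backlog)
{
	return pending >= backlog;
}
```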
Signed-off-by: liuyacan <yacanliu@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 2055a99da8.
This change rejects legitimate configurations.
A slave doesn't need to exist nor implement ndo_slave_setup.
Signed-off-by: David S. Miller <davem@davemloft.net>
igb needs a fix similar to commit 75aab4e10a ("i40e: avoid
premature Rx buffer reuse").
The page recycle code incorrectly assumed that a page fragment
could not be freed inside xdp_do_redirect(). Because of this assumption,
page fragments that are still used by the stack/XDP redirect can be
reused and overwritten.
To avoid this, store the page count prior to invoking xdp_do_redirect().
Longer explanation:
Intel NICs have a recycle mechanism. The main idea is that a page is
split into two parts. One part is owned by the driver, one part might
be owned by someone else, such as the stack.
t0: Page is allocated, and put on the Rx ring
              +---------------
used by NIC ->| upper buffer
(rx_buffer)   +---------------
              | lower buffer
              +---------------
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX
t1: Buffer is received, and passed to the stack (e.g.)
              +---------------
              | upper buff (skb)
              +---------------
used by NIC ->| lower buffer
(rx_buffer)   +---------------
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX - 1
t2: Buffer is received, and redirected
              +---------------
              | upper buff (skb)
              +---------------
used by NIC ->| lower buffer
(rx_buffer)   +---------------
Now, prior to calling xdp_do_redirect():
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX - 2
This means that buffer *cannot* be flipped/reused, because the skb is
still using it.
The problem arises when xdp_do_redirect() actually frees the
segment. Then we get:
page count == USHRT_MAX - 1
rx_buffer->pagecnt_bias == USHRT_MAX - 2
From a recycle perspective, the buffer can be flipped and reused,
which means that the skb data area is passed to the Rx HW ring!
To work around this, the page count is stored prior to calling
xdp_do_redirect().
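The effect of taking the page count before or after the redirect can be shown with the numbers from the t2 step. This is a simplified model with hypothetical names. It assumes a page may be reused only if at most one reference exists beyond the driver's bias.

```c
#include <limits.h>
#include <stdbool.h>

/* Simplified Rx buffer bookkeeping for this sketch. */
struct rx_buffer {
	unsigned int pagecnt_bias;
};

/*
 * pgcnt is the page reference count as seen by the caller. Reuse is
 * allowed only when nobody except the driver still holds the page.
 */
bool can_reuse_page(const struct rx_buffer *buf, int pgcnt)
{
	return (pgcnt - (int)buf->pagecnt_bias) <= 1;
}
```

With pagecnt_bias == USHRT_MAX - 2, a count sampled before the redirect (USHRT_MAX) forbids reuse, while a count sampled after the segment was freed (USHRT_MAX - 1) would wrongly allow it.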
Fixes: 9cbc948b5a ("igb: add XDP support")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
ixgbe_rx_offset(), that is supposed to initialize the Rx buffer headroom,
relies on __IXGBE_RX_BUILD_SKB_ENABLED flag.
Currently, the callsite of mentioned function is placed incorrectly
within ixgbe_setup_rx_resources() where Rx ring's build skb flag is not
set yet. This causes the XDP_REDIRECT to be partially broken due to
inability to create xdp_frame in the headroom space, as the headroom is
0.
Fix this by moving ixgbe_rx_offset() to ixgbe_configure_rx_ring() after
the flag setting, which happens to be set in ixgbe_set_rx_buffer_len.
Fixes: c0d4e9d223 ("ixgbe: store the result of ixgbe_rx_offset() onto ixgbe_ring")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
ice_rx_offset(), that is supposed to initialize the Rx buffer headroom,
relies on ICE_RX_FLAGS_RING_BUILD_SKB flag as well as XDP prog presence.
Currently, the callsite of mentioned function is placed incorrectly
within ice_setup_rx_ring() where Rx ring's build skb flag is not
set yet. This causes the XDP_REDIRECT to be partially broken due to
inability to create xdp_frame in the headroom space, as the headroom is
0.
Fix this by moving ice_rx_offset() to ice_setup_rx_ctx() after the flag
setting.
Fixes: f1b1f409bf ("ice: store the result of ice_rx_offset() onto ice_ring")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
i40e_rx_offset(), that is supposed to initialize the Rx buffer headroom,
relies on I40E_RXR_FLAGS_BUILD_SKB_ENABLED flag.
Currently, the callsite of mentioned function is placed incorrectly
within i40e_setup_rx_descriptors() where Rx ring's build skb flag is not
set yet. This causes the XDP_REDIRECT to be partially broken due to
inability to create xdp_frame in the headroom space, as the headroom is
0.
For the record, below is the call graph:
i40e_vsi_open
  i40e_vsi_setup_rx_resources
    i40e_setup_rx_descriptors
      i40e_rx_offset() <-- sets offset to 0 as build_skb flag is set below
  i40e_vsi_configure_rx
    i40e_configure_rx_ring
      set_ring_build_skb_enabled(ring) <-- set build_skb flag
Fix this by moving i40e_rx_offset() to i40e_configure_rx_ring() after
the flag setting.
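The ordering problem can be reproduced in a few lines. The ring struct, the headroom value and the function names below are placeholders for the sketch and are not the driver's code.

```c
#include <stdbool.h>

/* Assumed headroom used when build_skb is enabled. */
#define RX_HEADROOM 256u

struct rx_ring {
	bool build_skb_enabled;
	unsigned int rx_offset;
};

/* The offset depends on the build_skb flag at the time of the call. */
unsigned int rx_offset(const struct rx_ring *ring)
{
	return ring->build_skb_enabled ? RX_HEADROOM : 0;
}

/* Old order: offset is cached before the flag is set, so it stays 0. */
void ring_bringup_old(struct rx_ring *ring)
{
	ring->rx_offset = rx_offset(ring);
	ring->build_skb_enabled = true;
}

/* New order: the flag is set first, then the offset is cached. */
void ring_bringup_new(struct rx_ring *ring)
{
	ring->build_skb_enabled = true;
	ring->rx_offset = rx_offset(ring);
}
```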
Fixes: f7bb0d71d6 ("i40e: store the result of i40e_rx_offset() onto i40e_ring")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix the wrong napi work done reporting in the xsk path of the ice
driver. The code in the main Rx processing loop was written to assume
that the buffer allocation code returns true if all allocations were
successful and false if not. In contrast with all other Intel NIC xsk
drivers, the ice_alloc_rx_bufs_zc() has the inverted logic messing up
the work done reporting in the napi loop.
This can be fixed either by inverting the return value from
ice_alloc_rx_bufs_zc() in the function that uses this in an incorrect
way, or by changing the return value of ice_alloc_rx_bufs_zc(). We
chose the latter as it makes all the xsk allocation functions for
Intel NICs behave in the same way. My guess is that it was this
unexpected discrepancy that gave rise to this bug in the first place.
Fixes: 5bb0c4b5eb ("ice, xsk: Move Rx allocation out of while-loop")
Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Maxim Mikityanskiy says:
====================
Bugfixes for HTB
The HTB offload feature introduced a few bugs in HTB. One affects the
non-offload mode, preventing attaching qdiscs to HTB classes, and the
other affects the error flow, when the netdev doesn't support the
offload, but it was requested. This short series fixes them.
====================
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
htb_init may fail to do the offload if it's not supported or if a
runtime error happens when allocating direct qdiscs. In those cases
TC_HTB_CREATE command is not sent to the driver, however, htb_destroy
gets called anyway and attempts to send TC_HTB_DESTROY.
It shouldn't happen, because the driver didn't receive TC_HTB_CREATE,
and also because the driver may not support ndo_setup_tc at all, while
q->offload is true, and htb_destroy mistakenly thinks the offload is
supported. Trying to call ndo_setup_tc in the latter case will lead to a
NULL pointer dereference.
This commit fixes the issues with htb_destroy by deferring assignment of
q->offload until after the TC_HTB_CREATE command. The necessary cleanup
of the offload entities is already done in htb_init.
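A toy model of the deferred assignment, assuming a qdisc-like object with an offload flag. All names are hypothetical, and the counter only stands in for sending a destroy command to the driver.

```c
#include <stdbool.h>

struct sched_like {
	bool offload;		/* set only after a successful create */
	int destroy_cmds_sent;	/* stand-in for commands sent to the driver */
};

/*
 * create_ok models whether the create command reached the driver and
 * succeeded. On failure the offload flag is left untouched.
 */
int sched_init(struct sched_like *q, bool offload_requested, bool create_ok)
{
	if (!offload_requested)
		return 0;
	if (!create_ok)
		return -1;

	q->offload = true;	/* assigned only after create succeeded */
	return 0;
}

/* Destroy talks to the driver only if the create was actually done. */
void sched_destroy(struct sched_like *q)
{
	if (q->offload)
		q->destroy_cmds_sent++;
}
```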
Reported-by: syzbot+b53a709f04722ca12a3c@syzkaller.appspotmail.com
Fixes: d03b195b5a ("sch_htb: Hierarchical QoS hardware offload")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>