The top syzbot report for networking (#14 for the entire kernel)
is the queue timeout splat. We kept it around for a long time,
because in real life it provides pretty strong signal that
something is wrong with the driver or the device.
Removing it is also likely to break monitoring for those who
track it as a kernel warning.
Nevertheless, WARN()ings are best suited for catching kernel
programming bugs. If a Tx queue gets starved due to a pause
storm, priority configuration, or other weirdness - that's
obviously a problem, but not a problem we can fix at
the kernel level.
Bite the bullet and convert the WARN() to a print.
Before:
NETDEV WATCHDOG: eni1np1 (netdevsim): transmit queue 0 timed out 1975 ms
WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x39e/0x3b0
[... completely pointless stack trace of a timer follows ...]
Now:
netdevsim netdevsim1 eni1np1: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 1769 ms
Alternatively we could mark the drivers which syzbot has
learned to abuse as "print-instead-of-WARN" selectively.
Reported-by: syzbot+d55372214aff0faa1f1f@syzkaller.appspotmail.com
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan says:
====================
bnxt_en: TX path improvements
All patches in this patchset are related to improving the TX path.
There are 2 areas of improvements:
1. The TX interrupt logic currently counts the number of TX completions
to determine the number of TX SKBs to free. We now change it so that
the TX completion will now contain the hardware consumer index
information. The driver will keep track of the latest hardware
consumer index from the last TX completion and clean up all TX SKBs
up to that index. This scheme aligns better with future chips and
allows xmit_more code path to be more optimized.
2. The current driver logic requires an additional MSIX for each
additional MQPRIO TX ring. This scheme uses too many MSIX vectors if
the user enables a large number of MQPRIO TCs. We now use a new scheme
that will use the same MSIX for all the MQPRIO TX rings for each
ethtool channel. Each ethtool TX channel can have up to 8 MQPRIO
TX rings and now they all will share the same MSIX.
v2: Rebased
v1 posted on Oct 27 2023 right before the close of net-next:
https://lore.kernel.org/netdev/20231027232252.36111-1-michael.chan@broadcom.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we use the cumulative consumer index scheme for TX completion,
we don't need to have one TX completion per TX packet in the xmit_more
code path. Set the TX_BD_FLAGS_NO_CMPL flag if xmit_more is true.
Fallback to one interrupt per packet if the ring is filled beyond
bp->tx_wake_thresh.
Also, move the wmb() to bnxt_txr_db_kick(). When xmit_more is true,
we'll skip the bnxt_txr_db_kick() call and there is no need to call
wmb() to sync. the TX BD data.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can now fully support sharing the same MSIX for all mqprio TX rings
belonging to the same ethtool channel with the new infrastructure:
1. Allocate the proper entries for cp_ring_arr in struct bnxt_cp_ring_info
to support the additional TX rings.
2. Populate the tx_ring array in struct bnxt_napi for all TX rings
sharing the same NAPI.
3. bnxt_num_tx_to_cp() returns the proper NQ/completion rings to support
the TX rings in the input.
4. Adjust bnxt_get_num_ring_stats() for the reduced number of ring
counters with the new scheme.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add 3 macros that handle to conversions between TC numbers and TX
ring numbers. These will help to clarify the existing logic and the
new logic in the next patch.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up until now, each TX ring always requires a completion ring/NQ/MSIX.
bnxt_trim_rings() and the assignment of bp->cp_nr_rings always make
this assumption. This will no longer be true in the next patches, so
we refactor and add helper functions to determine the proper relationship
between TX rings and the required completion ring/NQ/MSIX. This patch
does not change the 1:1 relationship yet.
Note that on P5 chips, each RX and TX ring still requires a completion
ring. Only the number of NQs has been reduced. We should no longer call
bnxt_trim_rings() to adjust the RX and TX rings on P5 chips. Replace with
simple logic to check that RX + TX < CP and adjust accordingly.
bnxt_check_rings() should call _bnxt_get_max_rings() to get the raw
number of rings instead of bnxt_get_max_rings(). If we are about to
create TCs, bnxt_get_max_rings() would not be able to calculate the max
rings correctly.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For each mqprio TC, we allocate a set of TX rings to map to the new
hardware CoS queue. Expand the tx_ring pointer in struct bnxt_napi
to an array of 8 to support up to 8 TX rings, one for each TC.
Only array entry 0 is used at this time. The rest of the array
entries will be used in later patches.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add 2 helper functions to set coalescing for each RX and TX rings. This
will make it easier to expand the number of TX rings per MSIX in the
next patches.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to support multiple TX rings on the same MSIX, we'll use the
upper byte of the TX opaque field to store the ring index in the new
tx_napi_idx field. This tx_napi_idx field is currently always 0 until
more infrastructure is added in later patches.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_tx_int() processes the only one TX ring from the bnxt_napi pointer.
To prepare for more TX rings associated with the bnxt_napi structure,
add a new __bnxt_tx_int() function that takes the bnxt_tx_ring_info
pointer to process that one TX ring. No functional change.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These 2 constants were used for the one RX and one TX completion
ring pointer in the cpr->cp_ring_arr fixed array. Now that we've
changed to allocating the array for the exact number of entries to
support more TX rings, we no longer use these constants.
The array index as well as the type of completion ring (RX/TX) are
now encoded in the handle for the completion ring. This will allow
us to locate the completion ring during NAPI for any number of
completion rings sharing the same MSIX. In the following patches,
we'll be adding support for more TX rings associated with the same
MSIX vector.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From the TX or RX ring structure, we need to find the corresponding
completion ring during initialization. On P5 chips, we use the MSIX/napi
entry to locate the completion ring because there is only one RX/TX
ring per MSIX. To allow multiple TX rings for each MSIX, we need
to add a direct pointer from the TX ring and RX ring structures.
This also simplifies the existing logic.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The cp_ring_arr is currently a fixed array of 2 pointers for the
TX and RX completion rings. These pointers are allocated during
ring initialization. Currntly, we support up to 2 completion rings
for each MSIX. In order to support more completion rings, we change
this fixed array to a pointer and allocate the required entries
during ring initialization. This patch keeps the current scheme of
allocating only 2 entries when needed. Later patches will expand
and allocate more entries when required.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From the TX or RX ring structure, we need to find the corresponding
completion ring during initialization. On P5 chips, we use the MSIX/napi
entry to locate the completion ring because there is only one RX/TX
ring per MSIX. To allow multiple TX rings for each MSIX, we need
to add a direct pointer from the TX ring and RX ring structures.
This also simplifies the existing logic.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the opaque field in the TX BD is only used for debugging.
The TX completion logic relies on getting one TX completion for each
packet and they always complete in order.
Improve this scheme by putting the producer information (ring index plus
number of BDs for the packet) in the opaque field. This way, we can
handle TX completion processing by looking at the last TX completion
instead of counting the number of completions.
Since we no longer need to count the exact number of completions, we can
optimize xmit_more by disabling TX completion when the xmit_more
condition is true. This will be done in later patches.
This patch is only initializing the opaque field in the TX BD and is
not changing the driver's TX completion logic yet.
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-11-13 (i40e)
This series contains updates to i40e driver only.
Justin Bronder increases number of allowable descriptors for XL710
devices.
Su Hui adds error check, and unroll, for RSS configuration.
Andrii changes module read error message to debug and makes it more
verbose.
Ivan Vecera performs numerous clean-ups and refactors to driver such as
removing unused defines and fields, converting use of flags to bitmaps,
adding helpers, re-organizing code, etc.
====================
Link: https://lore.kernel.org/r/20231113231047.548659-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert flags field in i40e_hw from u64 to bitmap and its usage
to use bit access functions and rename the field to 'caps' as
this field describes capabilities that are set once on init and
read many times later.
Changes:
- Convert "hw_ptr->flags & FLAG" to "test_bit(FLAG, ...)"
- Convert "hw_ptr->flags |= FLAG" to "set_bit(FLAG, ...)"
- Convert "hw_ptr->flags &= ~FLAG" to "clear_bit(FLAG, ...)"
- Rename i40e_hw.flags to i40e_hw.caps
- Rename i40e_set_hw_flags() to i40e_set_hw_caps()
- Adjust caps names so they are prefixed by I40E_HW_CAP_ and existing
_CAPABLE suffixes are stripped
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20231113231047.548659-8-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert flags and hw_features fields from i40e_pf from u32 to
bitmaps and their usage to use bit access functions.
Changes:
- Convert "pf_ptr->(flags|hw_features) & FL" to "test_bit(FL, ...)"
- Convert "pf_ptr->(flags|hw_features) |= FL" to "set_bit(FL, ...)"
- Convert "pf_ptr->(flags|hw_features) &= ~FL" to "clear_bit(FL, ...)"
- Rename flag field to bitno in i40e_priv_flags and adjust ethtool
callbacks to work with flags bitmap
- Rename flag names where '_ENABLED'->'_ENA' and '_DISABLED'->'_DIS'
like in ice driver
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20231113231047.548659-7-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter and bpf.
Current release - regressions:
- sched: fix SKB_NOT_DROPPED_YET splat under debug config
Current release - new code bugs:
- tcp:
- fix usec timestamps with TCP fastopen
- fix possible out-of-bounds reads in tcp_hash_fail()
- fix SYN option room calculation for TCP-AO
- tcp_sigpool: fix some off by one bugs
- bpf: fix compilation error without CGROUPS
- ptp:
- ptp_read() should not release queue
- fix tsevqs corruption
Previous releases - regressions:
- llc: verify mac len before reading mac header
Previous releases - always broken:
- bpf:
- fix check_stack_write_fixed_off() to correctly spill imm
- fix precision tracking for BPF_ALU | BPF_TO_BE | BPF_END
- check map->usercnt after timer->timer is assigned
- dsa: lan9303: consequently nested-lock physical MDIO
- dccp/tcp: call security_inet_conn_request() after setting IP addr
- tg3: fix the TX ring stall due to incorrect full ring handling
- phylink: initialize carrier state at creation
- ice: fix direction of VF rules in switchdev mode
Misc:
- fill in a bunch of missing MODULE_DESCRIPTION()s, more to come"
* tag 'net-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
net: ti: icss-iep: fix setting counter value
ptp: fix corrupted list in ptp_open
ptp: ptp_read should not release queue
net_sched: sch_fq: better validate TCA_FQ_WEIGHTS and TCA_FQ_PRIOMAP
net: kcm: fill in MODULE_DESCRIPTION()
net/sched: act_ct: Always fill offloading tuple iifidx
netfilter: nat: fix ipv6 nat redirect with mapped and scoped addresses
netfilter: xt_recent: fix (increase) ipv6 literal buffer length
ipvs: add missing module descriptions
netfilter: nf_tables: remove catchall element in GC sync path
netfilter: add missing module descriptions
drivers/net/ppp: use standard array-copy-function
net: enetc: shorten enetc_setup_xdp_prog() error message to fit NETLINK_MAX_FMTMSG_LEN
virtio/vsock: Fix uninit-value in virtio_transport_recv_pkt()
r8169: respect userspace disabling IFF_MULTICAST
selftests/bpf: get trusted cgrp from bpf_iter__cgroup directly
bpf: Let verifier consider {task,cgroup} is trusted in bpf_iter_reg
net: phylink: initialize carrier state at creation
test/vsock: add dobule bind connect test
test/vsock: refactor vsock_accept
...
Pull crypto fixes from Herbert Xu:
"This fixes a regression in ahash and hides the Kconfig sub-options for
the jitter RNG"
* tag 'v6.7-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ahash - Set using_shash for cloned ahash wrapper over shash
crypto: jitterentropy - Hide esoteric Kconfig options under FIPS and EXPERT
Pull input updates from Dmitry Torokhov:
- a number of input drivers has been converted to use facilities
provided by the device core to instantiate driver-specific attributes
instead of using devm_device_add_group() and similar APIs
- platform input devices have been converted to use remove() callback
returning void
- a fix for use-after-free when tearing down a Synaptics RMI device
- a few flexible arrays in input structures have been annotated with
__counted_by to help hardening efforts
- handling of vddio supply in cyttsp5 driver
- other miscellaneous fixups
* tag 'input-for-v6.7-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (86 commits)
Input: walkera0701 - use module_parport_driver macro to simplify the code
Input: synaptics-rmi4 - fix use after free in rmi_unregister_function()
dt-bindings: input: fsl,scu-key: Document wakeup-source
Input: cyttsp5 - add handling for vddio regulator
dt-bindings: input: cyttsp5: document vddio-supply
Input: tegra-kbc - use device_get_match_data()
Input: Annotate struct ff_device with __counted_by
Input: axp20x-pek - avoid needless newline removal
Input: mt - annotate struct input_mt with __counted_by
Input: leds - annotate struct input_leds with __counted_by
Input: evdev - annotate struct evdev_client with __counted_by
Input: synaptics-rmi4 - replace deprecated strncpy
Input: wm97xx-core - convert to platform remove callback returning void
Input: wm831x-ts - convert to platform remove callback returning void
Input: ti_am335x_tsc - convert to platform remove callback returning void
Input: sun4i-ts - convert to platform remove callback returning void
Input: stmpe-ts - convert to platform remove callback returning void
Input: pcap_ts - convert to platform remove callback returning void
Input: mc13783_ts - convert to platform remove callback returning void
Input: mainstone-wm97xx - convert to platform remove callback returning void
...
Pull more i2c updates from Wolfram Sang:
"This contains one patch which slipped through the cracks (iproc), a
core sanitizing improvement as the new memdup_array_user() helper went
upstream (i2c-dev), and two driver bugfixes (designware, cp2615)"
* tag 'for-6.7-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: cp2615: Fix 'assignment to __be16' warning
i2c: dev: copy userspace array safely
i2c: designware: Disable TX_EMPTY irq while waiting for block length byte
i2c: iproc: handle invalid slave state
Pull watchdog updates from Wim Van Sebroeck:
- add support for Amlogic C3 and S4 SoCs
- add IT8613 ID
- add MSM8226 and MSM8974 compatibles
- other small fixes and improvements
* tag 'linux-watchdog-6.7-rc1' of git://www.linux-watchdog.org/linux-watchdog: (24 commits)
dt-bindings: watchdog: Add support for Amlogic C3 and S4 SoCs
watchdog: mlx-wdt: Parameter desctiption warning fix
watchdog: aspeed: Add support for aspeed,reset-mask DT property
dt-bindings: watchdog: aspeed-wdt: Add aspeed,reset-mask property
watchdog: apple: Deactivate on suspend
dt-bindings: watchdog: qcom-wdt: Add MSM8226 and MSM8974 compatibles
dt-bindings: watchdog: fsl-imx7ulp-wdt: Add 'fsl,ext-reset-output'
wdog: imx7ulp: Enable wdog int_en bit for watchdog any reset
drivers: watchdog: marvell_gti: Program the max_hw_heartbeat_ms
drivers: watchdog: marvell_gti: fix zero pretimeout handling
watchdog: marvell_gti: Replace of_platform.h with explicit includes
watchdog: imx_sc_wdt: continue if the wdog already enabled
watchdog: st_lpc: Use device_get_match_data()
watchdog: wdat_wdt: Add timeout value as a param in ping method
watchdog: gpio_wdt: Make use of device properties
sbsa_gwdt: Calculate timeout with 64-bit math
watchdog: ixp4xx: Make sure restart always works
watchdog: it87_wdt: add IT8613 ID
watchdog: marvell_gti_wdt: Fix error code in probe()
Watchdog: marvell_gti_wdt: Remove redundant dev_err_probe() for platform_get_irq()
...
Pull pwm updates from Thierry Reding:
"This contains a few fixes and a bunch of cleanups, a lot of which is
in preparation for Uwe's character device support that may be ready in
time for the next merge window"
* tag 'pwm/for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (37 commits)
pwm: samsung: Document new member .channel in struct samsung_pwm_chip
pwm: bcm2835: Add support for suspend/resume
pwm: brcmstb: Checked clk_prepare_enable() return value
pwm: brcmstb: Utilize appropriate clock APIs in suspend/resume
pwm: pxa: Explicitly include correct DT includes
pwm: cros-ec: Simplify using devm_pwmchip_add() and dev_err_probe()
pwm: samsung: Consistently use the same name for driver data
pwm: vt8500: Simplify using devm functions
pwm: sprd: Simplify using devm_pwmchip_add() and dev_err_probe()
pwm: sprd: Provide a helper to cast a chip to driver data
pwm: spear: Simplify using devm functions
pwm: mtk-disp: Simplify using devm_pwmchip_add()
pwm: imx-tpm: Simplify using devm functions
pwm: brcmstb: Simplify using devm functions
pwm: bcm2835: Simplify using devm functions
pwm: bcm-iproc: Simplify using devm functions
pwm: Adapt sysfs API documentation to reality
pwm: dwc: add PWM bit unset in get_state call
pwm: dwc: make timer clock configurable
pwm: dwc: split pci out of core driver
...
Pull iommu updates from Joerg Roedel:
"Core changes:
- Make default-domains mandatory for all IOMMU drivers
- Remove group refcounting
- Add generic_single_device_group() helper and consolidate drivers
- Cleanup map/unmap ops
- Scaling improvements for the IOVA rcache depot
- Convert dart & iommufd to the new domain_alloc_paging()
ARM-SMMU:
- Device-tree binding update:
- Add qcom,sm7150-smmu-v2 for Adreno on SM7150 SoC
- SMMUv2:
- Support for Qualcomm SDM670 (MDSS) and SM7150 SoCs
- SMMUv3:
- Large refactoring of the context descriptor code to move the CD
table into the master, paving the way for '->set_dev_pasid()'
support on non-SVA domains
- Minor cleanups to the SVA code
Intel VT-d:
- Enable debugfs to dump domain attached to a pasid
- Remove an unnecessary inline function
AMD IOMMU:
- Initial patches for SVA support (not complete yet)
S390 IOMMU:
- DMA-API conversion and optimized IOTLB flushing
And some smaller fixes and improvements"
* tag 'iommu-updates-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (102 commits)
iommu/dart: Remove the force_bypass variable
iommu/dart: Call apple_dart_finalize_domain() as part of alloc_paging()
iommu/dart: Convert to domain_alloc_paging()
iommu/dart: Move the blocked domain support to a global static
iommu/dart: Use static global identity domains
iommufd: Convert to alloc_domain_paging()
iommu/vt-d: Use ops->blocked_domain
iommu/vt-d: Update the definition of the blocking domain
iommu: Move IOMMU_DOMAIN_BLOCKED global statics to ops->blocked_domain
Revert "iommu/vt-d: Remove unused function"
iommu/amd: Remove DMA_FQ type from domain allocation path
iommu: change iommu_map_sgtable to return signed values
iommu/virtio: Add __counted_by for struct viommu_request and use struct_size()
iommu/vt-d: debugfs: Support dumping a specified page table
iommu/vt-d: debugfs: Create/remove debugfs file per {device, pasid}
iommu/vt-d: debugfs: Dump entry pointing to huge page
iommu/vt-d: Remove unused function
iommu/arm-smmu-v3-sva: Remove bond refcount
iommu/arm-smmu-v3-sva: Remove unused iommu_sva handle
iommu/arm-smmu-v3: Rename cdcfg to cd_table
...
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-11-06 (ice)
This series contains updates to ice driver only.
Dave removes SR-IOV LAG attribute for only the interface being disabled
to allow for proper unwinding of all interfaces.
Michal Schmidt changes some LAG allocations from GFP_KERNEL to GFP_ATOMIC
due to non-allowed sleeping.
Aniruddha and Marcin fix redirection and drop rules for switchdev by
properly setting and marking egress/ingress type.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Fix VF-VF direction matching in drop rule in switchdev
ice: Fix VF-VF filter rules in switchdev mode
ice: lag: in RCU, use atomic allocation
ice: Fix SRIOV LAG disable on non-compliant aggregate
====================
Link: https://lore.kernel.org/r/20231107004844.655549-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-11-06 (i40e)
This series contains updates to i40e driver only.
Ivan Vecera resolves a couple issues with devlink; removing a call to
devlink_port_type_clear() and ensuring devlink port is unregistered
after the net device.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: Fix devlink port unregistering
i40e: Do not call devlink_port_type_clear()
====================
Link: https://lore.kernel.org/r/20231107003600.653796-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Add missing netfilter modules description to fix W=1, from Florian Westphal.
2) Fix catch-all element GC with timeout when use with the pipapo set
backend, this remained broken since I tried to fix it this summer,
then another attempt to fix it recently.
3) Add missing IPVS modules descriptions to fix W=1, also from Florian.
4) xt_recent allocated a too small buffer to store an IPv4-mapped IPv6
address which can be parsed by in6_pton(), from Maciej Zenczykowski.
Broken for many releases.
5) Skip IPv4-mapped IPv6, IPv4-compat IPv6, site/link local scoped IPv6
addressses to set up IPv6 NAT redirect, also from Florian. This is
broken since 2012.
* tag 'nf-23-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nat: fix ipv6 nat redirect with mapped and scoped addresses
netfilter: xt_recent: fix (increase) ipv6 literal buffer length
ipvs: add missing module descriptions
netfilter: nf_tables: remove catchall element in GC sync path
netfilter: add missing module descriptions
====================
Link: https://lore.kernel.org/r/20231108155802.84617-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf 2023-11-08
We've added 16 non-merge commits during the last 6 day(s) which contain
a total of 30 files changed, 341 insertions(+), 130 deletions(-).
The main changes are:
1) Fix a BPF verifier issue in precision tracking for BPF_ALU | BPF_TO_BE |
BPF_END where the source register was incorrectly marked as precise,
from Shung-Hsi Yu.
2) Fix a concurrency issue in bpf_timer where the former could still have
been alive after an application releases or unpins the map, from Hou Tao.
3) Fix a BPF verifier issue where immediates are incorrectly cast to u32
before being spilled and therefore losing sign information, from Hao Sun.
4) Fix a misplaced BPF_TRACE_ITER in check_css_task_iter_allowlist which
incorrectly compared bpf_prog_type with bpf_attach_type, from Chuyi Zhou.
5) Add __bpf_hook_{start,end} as well as __bpf_kfunc_{start,end}_defs macros,
migrate all BPF-related __diag callsites over to it, and add a new
__diag_ignore_all for -Wmissing-declarations to the macros to address
recent build warnings, from Dave Marchevsky.
6) Fix broken BPF selftest build of xdp_hw_metadata test on architectures
where char is not signed, from Björn Töpel.
7) Fix test_maps selftest to properly use LIBBPF_OPTS() macro to initialize
the bpf_map_create_opts, from Andrii Nakryiko.
8) Fix bpffs selftest to avoid unmounting /sys/kernel/debug as it may have
been mounted and used by other applications already, from Manu Bretelle.
9) Fix a build issue without CONFIG_CGROUPS wrt css_task open-coded
iterators, from Matthieu Baerts.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: get trusted cgrp from bpf_iter__cgroup directly
bpf: Let verifier consider {task,cgroup} is trusted in bpf_iter_reg
selftests/bpf: Fix broken build where char is unsigned
selftests/bpf: precision tracking test for BPF_NEG and BPF_END
bpf: Fix precision tracking for BPF_ALU | BPF_TO_BE | BPF_END
selftests/bpf: Add test for using css_task iter in sleepable progs
selftests/bpf: Add tests for css_task iter combining with cgroup iter
bpf: Relax allowlist for css_task iter
selftests/bpf: fix test_maps' use of bpf_map_create_opts
bpf: Check map->usercnt after timer->timer is assigned
bpf: Add __bpf_hook_{start,end} macros
bpf: Add __bpf_kfunc_{start,end}_defs macros
selftests/bpf: fix test_bpffs
selftests/bpf: Add test for immediate spilled to stack
bpf: Fix check_stack_write_fixed_off() to correctly spill imm
bpf: fix compilation error without CGROUPS
====================
Link: https://lore.kernel.org/r/20231108132448.1970-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Referenced commit doesn't always set iifidx when offloading the flow to
hardware. Fix the following cases:
- nf_conn_act_ct_ext_fill() is called before extension is created with
nf_conn_act_ct_ext_add() in tcf_ct_act(). This can cause rule offload with
unspecified iifidx when connection is offloaded after only single
original-direction packet has been processed by tc data path. Always fill
the new nf_conn_act_ct_ext instance after creating it in
nf_conn_act_ct_ext_add().
- Offloading of unidirectional UDP NEW connections is now supported, but ct
flow iifidx field is not updated when connection is promoted to
bidirectional which can result reply-direction iifidx to be zero when
refreshing the connection. Fill in the extension and update flow iifidx
before calling flow_offload_refresh().
Fixes: 9795ded7f9 ("net/sched: act_ct: Fill offloading tuple iifidx")
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes: 6a9bad0069 ("net/sched: act_ct: offload UDP NEW connections")
Link: https://lore.kernel.org/r/20231103151410.764271-1-vladbu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull NFS client updates from Trond Myklebust:
"Bugfixes:
- SUNRPC:
- re-probe the target RPC port after an ECONNRESET error
- handle allocation errors from rpcb_call_async()
- fix a use-after-free condition in rpc_pipefs
- fix up various checks for timeouts
- NFSv4.1:
- Handle NFS4ERR_DELAY errors during session trunking
- fix SP4_MACH_CRED protection for pnfs IO
- NFSv4:
- Ensure that we test all delegations when the server notifies
us that it may have revoked some of them
Features:
- Allow knfsd processes to break out of NFS4ERR_DELAY loops when
re-exporting NFSv4.x by setting appropriate values for the
'delay_retrans' module parameter
- nfs: Convert nfs_symlink() to use a folio"
* tag 'nfs-for-6.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: Convert nfs_symlink() to use a folio
SUNRPC: Fix RPC client cleaned up the freed pipefs dentries
NFSv4.1: fix SP4_MACH_CRED protection for pnfs IO
SUNRPC: Add an IS_ERR() check back to where it was
NFSv4.1: fix handling NFS4ERR_DELAY when testing for session trunking
nfs41: drop dependency between flexfiles layout driver and NFSv3 modules
NFSv4: fairly test all delegations on a SEQ4_ revocation
SUNRPC: SOFTCONN tasks should time out when on the sending list
SUNRPC: Force close the socket when a hard error is reported
SUNRPC: Don't skip timeout checks in call_connect_status()
SUNRPC: ECONNRESET might require a rebind
NFSv4/pnfs: Allow layoutget to return EAGAIN for softerr mounts
NFSv4: Add a parameter to limit the number of retries after NFS4ERR_DELAY