linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-05 15:49:42 -04:00

Author	SHA1	Message	Date
Joe Damato	888634377f	ena: Link queues to NAPIs Link queues to NAPIs using the netdev-genl API so this information is queryable. $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 2}' [{'id': 0, 'ifindex': 2, 'napi-id': 8201, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8202, 'type': 'rx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8203, 'type': 'rx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8204, 'type': 'rx'}, {'id': 4, 'ifindex': 2, 'napi-id': 8205, 'type': 'rx'}, {'id': 5, 'ifindex': 2, 'napi-id': 8206, 'type': 'rx'}, {'id': 6, 'ifindex': 2, 'napi-id': 8207, 'type': 'rx'}, {'id': 7, 'ifindex': 2, 'napi-id': 8208, 'type': 'rx'}, {'id': 0, 'ifindex': 2, 'napi-id': 8201, 'type': 'tx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8202, 'type': 'tx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8203, 'type': 'tx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8204, 'type': 'tx'}, {'id': 4, 'ifindex': 2, 'napi-id': 8205, 'type': 'tx'}, {'id': 5, 'ifindex': 2, 'napi-id': 8206, 'type': 'tx'}, {'id': 6, 'ifindex': 2, 'napi-id': 8207, 'type': 'tx'}, {'id': 7, 'ifindex': 2, 'napi-id': 8208, 'type': 'tx'}] Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20241002001331.65444-3-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 16:13:47 -07:00
Joe Damato	989867846f	ena: Link IRQs to NAPI instances Link IRQs to NAPI instances with netif_napi_set_irq. This information can be queried with the netdev-genl API. Note that the ENA device appears to allocate an IRQ for management purposes which does not have a NAPI associated with it; this commit takes this into consideration to accurately construct a map between IRQs and NAPI instances. Compare the output of /proc/interrupts for my ena device with the output of netdev-genl after applying this patch: $ cat /proc/interrupts \| grep enp55s0 \| cut -f1 --delimiter=':' 94 95 96 97 98 99 100 101 $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 2}' [{'id': 8208, 'ifindex': 2, 'irq': 101}, {'id': 8207, 'ifindex': 2, 'irq': 100}, {'id': 8206, 'ifindex': 2, 'irq': 99}, {'id': 8205, 'ifindex': 2, 'irq': 98}, {'id': 8204, 'ifindex': 2, 'irq': 97}, {'id': 8203, 'ifindex': 2, 'irq': 96}, {'id': 8202, 'ifindex': 2, 'irq': 95}, {'id': 8201, 'ifindex': 2, 'irq': 94}] Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: David Arinzon <darinzon@amazon.com> Link: https://patch.msgid.link/20241002001331.65444-2-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 16:13:46 -07:00
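The two ena entries above expose queue-to-NAPI and NAPI-to-IRQ links separately. A minimal sketch of how the two dumps could be joined into a queue-to-IRQ map follows. It assumes the records have already been parsed into Python dicts; the sample records are abridged from the output quoted above, and `queue_irq_map` is a hypothetical helper, not part of the ynl tooling.

```python
# Records abridged from the queue-get and napi-get dumps quoted in the two
# ena commit messages above (only the first two NAPIs are kept).
queues = [
    {"id": 0, "ifindex": 2, "napi-id": 8201, "type": "rx"},
    {"id": 1, "ifindex": 2, "napi-id": 8202, "type": "rx"},
    {"id": 0, "ifindex": 2, "napi-id": 8201, "type": "tx"},
    {"id": 1, "ifindex": 2, "napi-id": 8202, "type": "tx"},
]
napis = [
    {"id": 8202, "ifindex": 2, "irq": 95},
    {"id": 8201, "ifindex": 2, "irq": 94},
]


def queue_irq_map(queues, napis):
    """Hypothetical helper: map (queue type, queue id) to an IRQ number.

    A queue with no linked NAPI, or a NAPI with no linked IRQ, maps to None.
    """
    irq_by_napi = {napi["id"]: napi.get("irq") for napi in napis}
    return {
        (queue["type"], queue["id"]): irq_by_napi.get(queue.get("napi-id"))
        for queue in queues
    }


mapping = queue_irq_map(queues, napis)
for (qtype, qid), irq in sorted(mapping.items()):
    print(f"{qtype} queue {qid} -> IRQ {irq}")
```

In the quoted output the rx and tx queues with the same id share one NAPI, so they resolve to the same IRQ.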
Jakub Kicinski	d07dceb91a	Merge branch 'packing-various-improvements-and-kunit-tests' Jacob Keller says: ==================== packing: various improvements and KUnit tests This series contains a handful of improvements and fixes for the packing library, including the addition of KUnit tests. There are two major changes which might be considered bug fixes: 1) The library is updated to handle arbitrary buffer lengths, fixing undefined behavior when operating on buffers which are not a multiple of 4 bytes. 2) The behavior of QUIRK_MSB_ON_THE_RIGHT is fixed to match the intended behavior when operating on packings that are not byte aligned. These are not sent to net because no driver currently depends on this behavior. For (1), the existing users of the packing API all operate on buffers which are multiples of 4-bytes. For (2), no driver currently uses QUIRK_MSB_ON_THE_RIGHT. The incorrect behavior was found while writing KUnit tests. This series also includes a handful of minor cleanups from Vladimir, as well as a change to introduce a separated pack() and unpack() API. This API is not (yet) used by a driver, but is the first step in implementing pack_fields() and unpack_fields() which will be used in future changes for the ice driver and changes Vladimir has in progress for other drivers using the packing API. This series is part 1 of a 2-part series for implementing use of lib/packing in the ice driver. The 2nd part includes a new pack_fields() and unpack_fields() implementation inspired by the ice driver's existing bit packing code. It is built on top of the split pack() and unpack() code. Additionally, the KUnit tests are built on top of pack() and unpack(), based on original selftests written by Vladimir. Fitting the entire library changes and drivers changes into a single series exceeded the usual series limits. 
v1: https://lore.kernel.org/r/20240930-packing-kunit-tests-and-split-pack-unpack-v1-0-94b1f04aca85@intel.com ==================== Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-0-8373e551eae3@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 15:32:07 -07:00
Vladimir Oltean	46e784e94b	lib: packing: use GENMASK() for box_mask This is an u8, so using GENMASK_ULL() for unsigned long long is unnecessary. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-10-8373e551eae3@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 15:32:04 -07:00
Vladimir Oltean	fb02c7c8a5	lib: packing: use BITS_PER_BYTE instead of 8 This helps clarify what the 8 is for. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-9-8373e551eae3@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 15:32:04 -07:00
Jacob Keller	e7fdf5dddc	lib: packing: fix QUIRK_MSB_ON_THE_RIGHT behavior The QUIRK_MSB_ON_THE_RIGHT quirk is intended to modify pack() and unpack() so that the most significant bit of each byte in the packed layout is on the right. The way the quirk is currently implemented is broken whenever the packing code packs or unpacks any value that is not exactly a full byte. The broken behavior can occur when packing any values smaller than one byte, when packing any value that is not exactly a whole number of bytes, or when the packing is not aligned to a byte boundary. This quirk is documented in the following way: 1. Normally (no quirks), we would do it like this: :: 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 7 6 5 4 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 3 2 1 0 <snip> 2. If QUIRK_MSB_ON_THE_RIGHT is set, we do it like this: :: 56 57 58 59 60 61 62 63 48 49 50 51 52 53 54 55 40 41 42 43 44 45 46 47 32 33 34 35 36 37 38 39 7 6 5 4 24 25 26 27 28 29 30 31 16 17 18 19 20 21 22 23 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 3 2 1 0 That is, QUIRK_MSB_ON_THE_RIGHT does not affect byte positioning, but inverts bit offsets inside a byte. Essentially, the mapping for physical bit offsets should be reserved for a given byte within the payload. This reversal should be fixed to the bytes in the packing layout. The logic to implement this quirk is handled within the adjust_for_msb_right_quirk() function. This function does not work properly when dealing with the bytes that contain only a partial amount of data. In particular, consider trying to pack or unpack the range 53-44. We should always be mapping the bits from the logical ordering to their physical ordering in the same way, regardless of what sequence of bits we are unpacking. 
Thus, we should grab the following logical bits: Logical: 55 54 53 52 51 50 49 48 47 45 44 43 42 41 40 39 ^ ^ ^ ^ ^ ^ ^ ^ ^ And pack them into the physical bits: Physical: 48 49 50 51 52 53 54 55 40 41 42 43 44 45 46 47 Logical: 48 49 50 51 52 53 44 45 46 47 ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ The current logic in adjust_for_msb_right_quirk is broken. I believe it intends to map according to the following: Physical: 48 49 50 51 52 53 54 55 40 41 42 43 44 45 46 47 Logical: 48 49 50 51 52 53 44 45 46 47 ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ That is, it tries to keep the bits at the start and end of a packing together. This is wrong, as it makes the packing change what bit is being mapped to what based on which bits you're currently packing or unpacking. Worse, the actual calculations within adjust_for_msb_right_quirk don't make sense. Consider the case when packing the last byte of an unaligned packing. It might have a start bit of 7 and an end bit of 5. This would have a width of 3 bits. The new_start_bit will be calculated as the width - the box_end_bit - 1. This will underflow and produce a negative value, which will ultimately result in generating a new box_mask of all 0s. For any other values, the result of the calculations of the new_box_end_bit, new_box_start_bit, and the new box_mask will result in the exact same values for the box_end_bit, box_start_bit, and box_mask. This makes the calculations completely irrelevant. If box_end_bit is 0, and box_start_bit is 7, then the entire function of adjust_for_msb_right_quirk will boil down to just: to_write = bitrev8(to_write) The other adjustments are attempting (incorrectly) to keep the bits in the same place but just reversed. This is not the right behavior even if implemented correctly, as it leaves the mapping dependent on the bit values being packed or unpacked. Remove adjust_for_msb_right_quirk() and just use bitrev8 to reverse the bit order within each byte when interacting with the packed data. 
In particular, for packing, we need to reverse both the box_mask and the physical value being packed. This is done after shifting the value by box_end_bit so that the reversed mapping is always aligned to the physical buffer byte boundary. The box_mask is reversed as we're about to use it to clear any stale bits in the physical buffer at this block. For unpacking, we need to reverse the contents of the physical buffer before masking with the box_mask. This is critical, as the box_mask is a logical mask of the bit layout before handling the QUIRK_MSB_ON_THE_RIGHT. Add several new tests which cover this behavior. These tests will fail without the fix and pass afterwards. Note that no current drivers make use of QUIRK_MSB_ON_THE_RIGHT. I suspect this is why there have been no reports of this inconsistency before. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-8-8373e551eae3@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 15:32:04 -07:00
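The fix described in the entry above (reverse both the mask and the shifted value when packing; reverse the buffer byte before masking when unpacking) can be sketched for a single buffer byte. This is an illustrative Python model under stated assumptions, not the kernel code: `bitrev8` here is a plain reimplementation of the bit reversal that the kernel helper of the same name performs, and `genmask`, `pack_box` and `unpack_box` are hypothetical names.

```python
def bitrev8(byte):
    """Reverse the bit order of an 8-bit value."""
    result = 0
    for i in range(8):
        if byte & (1 << i):
            result |= 1 << (7 - i)
    return result


def genmask(high, low):
    """Mask with bits high..low (inclusive) set."""
    return ((1 << (high - low + 1)) - 1) << low


def pack_box(old_byte, value, box_start_bit, box_end_bit, msb_on_the_right):
    """Write value into logical bits box_start_bit..box_end_bit of one byte."""
    box_mask = genmask(box_start_bit, box_end_bit)
    # Shift first, so the reversal is aligned to the byte boundary.
    to_write = (value << box_end_bit) & box_mask
    if msb_on_the_right:
        to_write = bitrev8(to_write)
        box_mask = bitrev8(box_mask)  # the mask clears stale physical bits
    return (old_byte & ~box_mask & 0xFF) | to_write


def unpack_box(byte, box_start_bit, box_end_bit, msb_on_the_right):
    """Read logical bits box_start_bit..box_end_bit back out of one byte."""
    if msb_on_the_right:
        byte = bitrev8(byte)  # back to logical order before masking
    return (byte & genmask(box_start_bit, box_end_bit)) >> box_end_bit


# The "start bit 7, end bit 5" case from the commit message: a 3-bit field.
packed = pack_box(0x00, 0b101, 7, 5, msb_on_the_right=True)
print(hex(packed))
print(unpack_box(packed, 7, 5, msb_on_the_right=True))
```

Because the reversal is always applied to the whole byte, the logical-to-physical mapping no longer depends on which bits happen to be packed, which is the property the commit message asks for.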
Jacob Keller	fcd6dd91d0	lib: packing: add additional KUnit tests While reviewing the initial KUnit tests for lib/packing, Przemek pointed out that the test values have duplicate bytes in the input sequence. In addition, I noticed that the unit tests pack and unpack on a byte boundary, instead of crossing bytes. Thus, we lack good coverage of the corner cases of the API. Add additional unit tests to cover packing and unpacking byte buffers which do not have duplicate bytes in the unpacked value, and which pack and unpack to an unaligned offset. A careful reviewer may note the lack of tests for QUIRK_MSB_ON_THE_RIGHT. This is because I found issues with that quirk during test implementation. This quirk will be fixed and the tests will be included in a future change. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-7-8373e551eae3@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 15:32:04 -07:00
Jacob Keller	e9502ea6db	lib: packing: add KUnit tests adapted from selftests Add 24 simple KUnit tests for the lib/packing.c pack() and unpack() APIs. The first 16 tests exercise all combinations of quirks with a simple magic number value on a 16-byte buffer. The remaining 8 tests cover non-multiple-of-4 buffer sizes. These tests were originally written by Vladimir as simple selftest functions. I adapted them to KUnit, refactoring them into a table driven approach. This will aid in adding additional tests in the future. Co-developed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-6-8373e551eae3@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 15:32:04 -07:00
Vladimir Oltean	28aec9ca29	lib: packing: duplicate pack() and unpack() implementations packing() is now used in some hot paths, and it would be good to get rid of some ifs and buts that depend on "op", to speed things up a little bit. With the main implementations now taking size_t endbit, we no longer have to check for negative values. Update the local integer variables to also be size_t to match. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-5-8373e551eae3@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 15:32:04 -07:00
Vladimir Oltean	7263f64e16	lib: packing: add pack() and unpack() wrappers over packing() Geert Uytterhoeven described packing() as "really bad API" because of not being able to enforce const correctness. The same function is used both when "pbuf" is input and "uval" is output, as in the other way around. Create 2 wrapper functions where const correctness can be ensured. Do ugly type casts inside, to be able to reuse packing() as currently implemented - which will _not_ modify the input argument. Also, take the opportunity to change the type of startbit and endbit to size_t - an unsigned type - in these new function prototypes. When int, an extra check for negative values is necessary. Hopefully, when packing() goes away completely, that check can be dropped. My concern is that code which does rely on the conditional directionality of packing() is harder to refactor without blowing up in size. So it may take a while to completely eliminate packing(). But let's make alternatives available for those who do not need that. Link: https://lore.kernel.org/netdev/20210223112003.2223332-1-geert+renesas@glider.be/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-4-8373e551eae3@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 15:32:04 -07:00
Vladimir Oltean	816ad8f1e4	lib: packing: remove kernel-doc from header file It is not necessary to have the kernel-doc duplicated both in the header and in the implementation. It is better to have it near the implementation of the function, since in C, a function can have N declarations, but only one definition. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-3-8373e551eae3@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 15:32:03 -07:00
Vladimir Oltean	a636ba5e86	lib: packing: adjust definitions and implementation for arbitrary buffer lengths Jacob Keller has a use case for packing() in the intel/ice networking driver, but it cannot be used as-is. Simply put, the API quirks for LSW32_IS_FIRST and LITTLE_ENDIAN are naively implemented with the undocumented assumption that the buffer length must be a multiple of 4. All calculations of group offsets and offsets of bytes within groups assume that this is the case. But in the ice case, this does not hold true. For example, packing into a buffer of 22 bytes would yield wrong results, but pretending it was a 24 byte buffer would work. Rather than requiring such hacks, and leaving a big question mark when it comes to discontinuities in the accessible bit fields of such buffer, we should extend the packing API to support this use case. It turns out that we can keep the design in terms of groups of 4 bytes, but also make it work if the total length is not a multiple of 4. Just like before, imagine the buffer as a big number, and its most significant bytes (the ones that would make up to a multiple of 4) are missing. Thus, with a big endian (no quirks) interpretation of the buffer, those most significant bytes would be absent from the beginning of the buffer, and with a LSW32_IS_FIRST interpretation, they would be absent from the end of the buffer. The LITTLE_ENDIAN quirk, in the packing() API world, only affects byte ordering within groups of 4. Thus, it does not change which bytes are missing. Only the significance of the remaining bytes within the (smaller) group. No change intended for buffer sizes which are multiples of 4. Tested with the sja1105 driver and with downstream unit tests. 
Link: https://lore.kernel.org/netdev/a0338310-e66c-497c-bc1f-a597e50aa3ff@intel.com/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-2-8373e551eae3@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 15:32:03 -07:00
Vladimir Oltean	8b3e26677b	lib: packing: refuse operating on bit indices which exceed size of buffer While reworking the implementation, it became apparent that this check does not exist. There is no functional issue yet, because at call sites, "startbit" and "endbit" are always hardcoded to correct values, and never come from the user. Even with the upcoming support of arbitrary buffer lengths, the "startbit >= 8 * pbuflen" check will remain correct. This is because we intend to always interpret the packed buffer in a way that avoids discontinuities in the available bit indices. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-1-8373e551eae3@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 15:32:03 -07:00
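The bounds check named in the entry above ("startbit >= 8 * pbuflen") can be illustrated with a short sketch. It assumes, as in the "range 53-44" example elsewhere in this log, that startbit is the higher bit index; `bit_range_fits` is a hypothetical name and the function models only the one check quoted in the commit message.

```python
BITS_PER_BYTE = 8


def bit_range_fits(startbit, pbuflen):
    """Hypothetical model of the check: True if startbit lies inside the buffer.

    pbuflen is the packed buffer length in bytes, so valid bit indices are
    0 .. BITS_PER_BYTE * pbuflen - 1.
    """
    return startbit < BITS_PER_BYTE * pbuflen


# A 22-byte buffer (the size mentioned for the ice use case) has 176 bits.
print(bit_range_fits(175, 22))
print(bit_range_fits(176, 22))
```

Since endbit is not above startbit under this assumption, checking startbit alone is enough to keep the whole range inside the buffer.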
Jakub Kicinski	f66ebf37d6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts and no adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-03 10:05:55 -07:00
Linus Torvalds	8c245fe7dd	Merge tag 'net-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from ieee802154, bluetooth and netfilter. Current release - regressions: - eth: mlx5: fix wrong reserved field in hca_cap_2 in mlx5_ifc - eth: am65-cpsw: fix forever loop in cleanup code Current release - new code bugs: - eth: mlx5: HWS, fixed double-free in error flow of creating SQ Previous releases - regressions: - core: avoid potential underflow in qdisc_pkt_len_init() with UFO - core: test for not too small csum_start in virtio_net_hdr_to_skb() - vrf: revert "vrf: remove unnecessary RCU-bh critical section" - bluetooth: - fix uaf in l2cap_connect - fix possible crash on mgmt_index_removed - dsa: improve shutdown sequence - eth: mlx5e: SHAMPO, fix overflow of hd_per_wq - eth: ip_gre: fix drops of small packets in ipgre_xmit Previous releases - always broken: - core: fix gso_features_check to check for both dev->gso_{ipv4_,}max_size - core: fix tcp fraglist segmentation after pull from frag_list - netfilter: nf_tables: prevent nf_skb_duplicated corruption - sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start - mac802154: fix potential RCU dereference issue in mac802154_scan_worker - eth: fec: restart PPS after link state change" * tag 'net-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (48 commits) sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start dt-bindings: net: xlnx,axi-ethernet: Add missing reg minItems doc: net: napi: Update documentation for napi_schedule_irqoff net/ncsi: Disable the ncsi work before freeing the associated structure net: phy: qt2025: Fix warning: unused import DeviceId gso: fix udp gso fraglist segmentation after pull from frag_list bridge: mcast: Fail MDB get request on empty entry vrf: revert "vrf: Remove unnecessary RCU-bh critical section" net: ethernet: ti: am65-cpsw: Fix forever loop in cleanup code 
net: phy: realtek: Check the index value in led_hw_control_get ppp: do not assume bh is held in ppp_channel_bridge_input() selftests: rds: move include.sh to TEST_FILES net: test for not too small csum_start in virtio_net_hdr_to_skb() net: gso: fix tcp fraglist segmentation after pull from frag_list ipv4: ip_gre: Fix drops of small packets in ipgre_xmit net: stmmac: dwmac4: extend timeout for VLAN Tag register busy bit check net: add more sanity checks to qdisc_pkt_len_init() net: avoid potential underflow in qdisc_pkt_len_init() with UFO net: ethernet: ti: cpsw_ale: Fix warning on some platforms net: microchip: Make FDMA config symbol invisible ...	2024-10-03 09:44:00 -07:00
Linus Torvalds	9c02404b52	Merge tag 'v6.12-rc1-ksmbd-fixes' of git://git.samba.org/ksmbd Pull smb server fixes from Steve French: - small cleanup patches leveraging struct size to improve access bounds checking * tag 'v6.12-rc1-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: Use struct_size() to improve smb_direct_rdma_xmit() ksmbd: Annotate struct copychunk_ioctl_req with __counted_by_le() ksmbd: Use struct_size() to improve get_file_alternate_info()	2024-10-03 09:38:16 -07:00
Linus Torvalds	20c2474fa5	Merge tag 'vfs-6.12-rc2.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "vfs: - Ensure that iter_folioq_get_pages() advances to the next slot otherwise it will end up using the same folio with an out-of-bound offset. iomap: - Don't unshare delalloc extents which can't be reflinked, and thus can't be shared. - Constrain the file range passed to iomap_file_unshare() directly in iomap instead of requiring the callers to do it. netfs: - Use folioq_count instead of folioq_nr_slot to prevent an uninitialized value warning in netfs_clear_buffer(). - Fix missing wakeup after issuing writes by scheduling the write collector only if all the subrequest queues are empty and thus no writes are pending. - Fix two minor documentation bugs" * tag 'vfs-6.12-rc2.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: constrain the file range passed to iomap_file_unshare iomap: don't bother unsharing delalloc extents netfs: Fix missing wakeup after issuing writes Documentation: add missing folio_queue entry folio_queue: fix documentation netfs: Fix a KMSAN uninit-value error in netfs_clear_buffer iov_iter: fix advancing slot in iter_folioq_get_pages()	2024-10-03 09:22:50 -07:00
Erni Sri Satya Vennela	c30a3f54e6	net: mana: Add get_link and get_link_ksettings in ethtool Add support for the ethtool get_link and get_link_ksettings operations. Display standard port information using ethtool. Before the change: $ethtool enP30832s1 > No data available After the change: $ethtool enP30832s1 > Settings for enP30832s1: Supported ports: [ ] Supported link modes: Not reported Supported pause frame use: No Supports auto-negotiation: No Supported FEC modes: Not reported Advertised link modes: Not reported Advertised pause frame use: No Advertised auto-negotiation: No Advertised FEC modes: Not reported Speed: Unknown! Duplex: Full Auto-negotiation: off Port: Other PHYAD: 0 Transceiver: internal Link detected: yes Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1727674934-12130-1-git-send-email-ernis@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-03 13:47:26 +02:00
Xin Long	8beee4d8de	sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start In sctp_listen_start() invoked by sctp_inet_listen(), it should set the sk_state back to CLOSED if sctp_autobind() fails due to whatever reason. Otherwise, next time when calling sctp_inet_listen(), if sctp_sk(sk)->reuse is already set via setsockopt(SCTP_REUSE_PORT), sctp_sk(sk)->bind_hash will be dereferenced as sk_state is LISTENING, which causes a crash as bind_hash is NULL. KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:sctp_inet_listen+0x7f0/0xa20 net/sctp/socket.c:8617 Call Trace: <TASK> __sys_listen_socket net/socket.c:1883 [inline] __sys_listen+0x1b7/0x230 net/socket.c:1894 __do_sys_listen net/socket.c:1902 [inline] Fixes: `5e8f3f703a` ("sctp: simplify sctp listening code") Reported-by: syzbot+f4e0f821e3a3b7cee51d@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://patch.msgid.link/a93e655b3c153dc8945d7a812e6d8ab0d52b7aa0.1727729391.git.lucien.xin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-03 12:18:29 +02:00
Ravikanth Tuniki	c6929644c1	dt-bindings: net: xlnx,axi-ethernet: Add missing reg minItems Add missing reg minItems as based on current binding document only ethernet MAC IO space is a supported configuration. There is a bug in schema, current examples contain 64-bit addressing as well as 32-bit addressing. The schema validation does pass incidentally considering one 64-bit reg address as two 32-bit reg address entries. If we change axi_ethernet_eth1 example node reg addressing to 32-bit schema validation reports: Documentation/devicetree/bindings/net/xlnx,axi-ethernet.example.dtb: ethernet@40000000: reg: [[1073741824, 262144]] is too short To fix it add missing reg minItems constraints and to make things clearer stick to 32-bit addressing in examples. Fixes: `cbb1ca6d5f` ("dt-bindings: net: xlnx,axi-ethernet: convert bindings document to yaml") Signed-off-by: Ravikanth Tuniki <ravikanth.tuniki@amd.com> Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/1727723615-2109795-1-git-send-email-radhey.shyam.pandey@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-03 12:15:04 +02:00
Sean Anderson	b63ad06ddd	doc: net: napi: Update documentation for napi_schedule_irqoff Since commit `8380c81d5c` ("net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT"), napi_schedule_irqoff will do the right thing if IRQs are threaded. Therefore, there is no need to use IRQF_NO_THREAD. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240930153955.971657-1-sean.anderson@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-03 12:07:29 +02:00
Paolo Abeni	1127c73a8d	Merge tag 'nf-24-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix incorrect documentation in uapi/linux/netfilter/nf_tables.h regarding flowtable hooks, from Phil Sutter. 2) Fix nft_audit.sh selftests with newer nft binaries, due to different (valid) audit output, also from Phil. 3) Disable BH when duplicating packets via nf_dup infrastructure, otherwise race on nf_skb_duplicated for locally generated traffic. From Eric. 4) Missing return in callback of selftest C program, from zhang jiao. netfilter pull request 24-10-02 * tag 'nf-24-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: selftests: netfilter: Add missing return value netfilter: nf_tables: prevent nf_skb_duplicated corruption selftests: netfilter: Fix nft_audit.sh for newer nft binaries netfilter: uapi: NFTA_FLOWTABLE_HOOK is NLA_NESTED ==================== Link: https://patch.msgid.link/20241002202421.1281311-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-03 12:01:05 +02:00
Shradha Gupta	e26a0c5d82	net: mana: Increase the DEF_RX_BUFFERS_PER_QUEUE to 1024 Through some experiments, we found out that increasing the default RX buffers count from 512 to 1024, gives slightly better throughput and significantly reduces the no_wqe_rx errs on the receiver side. Along with these, other parameters like cpu usage, retrans seg etc also show some improvement with 1024 value. Following are some snippets from the experiments ntttcp tests with 512 Rx buffers --------------------------------------- connections\| throughput\| no_wqe errs\| --------------------------------------- 1 \| 40.93Gbps \| 123,211 \| 16 \| 180.15Gbps \| 190,120 \| 128 \| 180.20Gbps \| 173,508 \| 256 \| 180.27Gbps \| 189,884 \| ntttcp tests with 1024 Rx buffers --------------------------------------- connections\| throughput\| no_wqe errs\| --------------------------------------- 1 \| 44.22Gbps \| 19,864 \| 16 \| 180.19Gbps \| 4,430 \| 128 \| 180.21Gbps \| 2,560 \| 256 \| 180.29Gbps \| 1,529 \| So, increasing the default RX buffers per queue count to 1024 Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/1727667875-29908-1-git-send-email-shradhagupta@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-03 11:37:24 +02:00
zhang jiao	7c2f1c2690	selftests/net: Add missing va_end. There is no va_end after va_copy, just add it. Signed-off-by: zhang jiao <zhangjiao2@cmss.chinamobile.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240927040050.7851-1-zhangjiao2@cmss.chinamobile.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-03 10:43:05 +02:00
Darrick J. Wong	a311a08a42	iomap: constrain the file range passed to iomap_file_unshare File contents can only be shared (i.e. reflinked) below EOF, so it makes no sense to try to unshare ranges beyond EOF. Constrain the file range parameters here so that we don't have to do that in the callers. Fixes: `5f4e5752a8` ("fs: add iomap_file_dirty") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20241002150213.GC21853@frogsfrogsfrogs Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-10-03 10:22:28 +02:00
Darrick J. Wong	f7a4874d97	iomap: don't bother unsharing delalloc extents If unshare encounters a delalloc reservation in the srcmap, that means that the file range isn't shared because delalloc reservations cannot be reflinked. Therefore, don't try to unshare them. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20241002150040.GB21853@frogsfrogsfrogs Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-10-03 10:22:25 +02:00
Eddie James	a0ffa68c70	net/ncsi: Disable the ncsi work before freeing the associated structure The work function can run after the ncsi device is freed, resulting in use-after-free bugs or kernel panic. Fixes: `2d283bdd07` ("net/ncsi: Resource management") Signed-off-by: Eddie James <eajames@linux.ibm.com> Link: https://patch.msgid.link/20240925155523.1017097-1-eajames@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-03 10:14:14 +02:00
FUJITA Tomonori	fa7dfeae04	net: phy: qt2025: Fix warning: unused import DeviceId Fix the following warning when the driver is compiled as built-in: warning: unused import: `DeviceId` --> drivers/net/phy/qt2025.rs:18:5 \| 18 \| DeviceId, Driver, \| ^^^^^^^^ \| = note: `#[warn(unused_imports)]` on by default device_table in module_phy_driver macro is defined only when the driver is built as a module. Use phy::DeviceId in the macro instead of importing `DeviceId` since `phy` is always used. Fixes: `fd3eaad826` ("net: phy: add Applied Micro QT2025 PHY driver") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202409190717.i135rfVo-lkp@intel.com/ Reviewed-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Trevor Gross <tmgross@umich.edu> Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Reviewed-by: Fiona Behrens <me@kloenk.dev> Acked-by: Miguel Ojeda <ojeda@kernel.org> Link: https://patch.msgid.link/20240926121404.242092-1-fujita.tomonori@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:44:00 -07:00
Jakub Kicinski	6b67e098c9	Merge branch 'net-pcs-xpcs-cleanups-batch-1' Russell King says: ==================== net: pcs: xpcs: cleanups batch 1 First, sorry for the bland series subject - this is the first in a number of cleanup series to the XPCS driver. This series has some functional changes beyond merely cleanups, notably the first patch. This series starts off with a patch that moves the PCS reset from the xpcs_create() family of calls to when phylink first configures the PHY. The motivation for this change is to get rid of the interface argument to the xpcs_create() functions, which I see as unnecessary complexity. This patch should be tested on Wangxun and STMMAC drivers. Patch 2 removes the now unnecessary interface argument from the internal xpcs_create() and xpcs_init_iface() functions. With this, xpcs_init_iface() becomes a misnamed function, but patch 3 removes this function, moving its now meager contents to xpcs_create(). Patch 4 adds xpcs_destroy_pcs() and xpcs_create_pcs_mdiodev() functions which return and take a phylink_pcs, allowing SJA1105 and Wangxun drivers to be converted to using the phylink_pcs structure internally. Patches 5 through 8 convert both these drivers to that end. Patch 9 drops the interface argument from the remaining xpcs_create*() functions, addressing the only remaining caller of these functions, that being the STMMAC driver. As patch 7 removed the direct calls to the XPCS config/link-up functions, the last patch makes these functions static. ==================== Link: https://patch.msgid.link/ZvwdKIp3oYSenGdH@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:32:03 -07:00
Russell King (Oracle)	faefc9730d	net: pcs: xpcs: make xpcs_do_config() and xpcs_link_up() internal As nothing outside pcs-xpcs.c calls either xpcs_do_config() or xpcs_link_up(), remove their exports and prototypes. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1svfMv-005ZIv-2M@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:32:01 -07:00
Russell King (Oracle)	bf5a61645b	net: pcs: xpcs: drop interface argument from xpcs_create*() The XPCS sub-driver no longer uses the "interface" argument to the xpcs_create_mdiodev() and xpcs_create_fwnode() functions. Remove this now unnecessary argument, updating the stmmac driver appropriately. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1svfMp-005ZIp-UX@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:32:01 -07:00
Russell King (Oracle)	41bf58314b	net: dsa: sja1105: use phylink_pcs internally Use xpcs_create_pcs_mdiodev() to create the XPCS instance, storing and using the phylink_pcs pointer internally, rather than dw_xpcs. Use xpcs_destroy_pcs() to destroy the XPCS instance when we've finished with it. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1svfMk-005ZIj-R3@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:32:00 -07:00
Russell King (Oracle)	907476c66d	net: dsa: sja1105: call PCS config/link_up via pcs_ops structure Call the PCS operations through the ops structure, which avoids needing to export xpcs internal functions. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1svfMf-005ZId-Mx@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:32:00 -07:00
Russell King (Oracle)	a18891b557	net: dsa: sja1105: simplify static configuration reload The static configuration reload saves the port speed in the static configuration tables by first converting it from the internal representation to the SPEED_xxx ethtool representation, and then converts it back to restore the setting. This is because sja1105_adjust_port_config() takes the speed as SPEED_xxx. However, this is unnecessarily complex. If we split sja1105_adjust_port_config() up, we can simply save and restore the mac[port].speed member in the static configuration tables. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1svfMa-005ZIX-If@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:32:00 -07:00
Russell King (Oracle)	155c499ffd	net: wangxun: txgbe: use phylink_pcs internally Use xpcs_create_pcs_mdiodev() to create the XPCS instance, storing and using the phylink_pcs pointer internally, rather than dw_xpcs. Use xpcs_destroy_pcs() to destroy the XPCS instance when we've finished with it. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1svfMV-005ZIR-FE@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:32:00 -07:00
Russell King (Oracle)	bedea1539a	net: pcs: xpcs: add xpcs_destroy_pcs() and xpcs_create_pcs_mdiodev() Provide xpcs create/destroy functions that return and take a phylink_pcs pointer instead of an xpcs pointer. This will be used by drivers that have been converted to use phylink_pcs pointers internally, rather than dw_xpcs pointers. As xpcs_create_mdiodev() no longer makes use of its interface argument, pass PHY_INTERFACE_MODE_NA into xpcs_create_mdiodev() until it is removed later in the series. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1svfMQ-005ZIL-Bi@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:32:00 -07:00
Russell King (Oracle)	a487c9e7cf	net: pcs: xpcs: get rid of xpcs_init_iface() xpcs_init_iface() no longer does anything with the interface mode, and now merely does configuration related to the PMA ID. Move this back into xpcs_create() as it doesn't warrant being a separate function anymore. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1svfML-005ZIF-84@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:32:00 -07:00
Russell King (Oracle)	92fb898608	net: pcs: xpcs: drop interface argument from internal functions Now that we no longer use the "interface" argument when creating the XPCS sub-driver, remove it from xpcs_create() and xpcs_init_iface(). Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1svfMG-005ZI9-3k@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:31:59 -07:00
Russell King (Oracle)	277b339c4b	net: pcs: xpcs: move PCS reset to .pcs_pre_config() Move the PCS reset to .pcs_pre_config() rather than at creation time, which means we call the reset function with the interface that we're actually going to be using to talk to the downstream device. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # sja1105 Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: for them? Link: https://patch.msgid.link/E1svfMA-005ZI3-Va@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:31:59 -07:00
Willem de Bruijn	a1e40ac5b5	gso: fix udp gso fraglist segmentation after pull from frag_list Detect gso fraglist skbs with corrupted geometry (see below) and pass these to skb_segment instead of skb_segment_list, as the first can segment them correctly. Valid SKB_GSO_FRAGLIST skbs - consist of two or more segments - the head_skb holds the protocol headers plus first gso_size - one or more frag_list skbs hold exactly one segment - all but the last must be gso_size Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data) can modify these skbs, breaking these invariants. In extreme cases they pull all data into skb linear. For UDP, this causes a NULL ptr deref in __udpv4_gso_segment_list_csum at udp_hdr(seg->next)->dest. Detect invalid geometry due to pull, by checking head_skb size. Don't just drop, as this may blackhole a destination. Convert to be able to pass to regular skb_segment. Link: https://lore.kernel.org/netdev/20240428142913.18666-1-shiming.cheng@mediatek.com/ Fixes: `9fd1ff5d2a` ("udp: Support UDP fraglist GRO/GSO.") Signed-off-by: Willem de Bruijn <willemb@google.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20241001171752.107580-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:29:31 -07:00
Daniel Golle	78997e9a5e	net: phy: mxl-gpy: add basic LED support Add basic support for LEDs connected to MaxLinear GPY2xx and GPY115 PHYs. The PHYs allow up to 4 LEDs to be connected. Implement controlling LEDs in software as well as netdev trigger offloading and LED polarity setup. The hardware claims to support 16 PWM brightness levels but there is no documentation on how to use that feature, hence this is not supported. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/b6ec9050339f8244ff898898a1cecc33b13a48fc.1727741563.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:28:25 -07:00
Ido Schimmel	555f45d24b	bridge: mcast: Fail MDB get request on empty entry When user space deletes a port from an MDB entry, the port is removed synchronously. If this was the last port in the entry and the entry is not joined by the host itself, then the entry is scheduled for deletion via a timer. The above means that it is possible for the MDB get netlink request to retrieve an empty entry which is scheduled for deletion. This is problematic as after deleting the last port in an entry, user space cannot rely on a non-zero return code from the MDB get request as an indication that the port was successfully removed. Fix by returning an error when the entry's port list is empty and the entry is not joined by the host. Fixes: `68b380a395` ("bridge: mcast: Add MDB get support") Reported-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Closes: https://lore.kernel.org/netdev/c92569919307749f879b9482b0f3e125b7d9d2e3.1726480066.git.jamie.bainbridge@gmail.com/ Tested-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20240929123640.558525-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:26:57 -07:00
Willem de Bruijn	b04c4d9eb4	vrf: revert "vrf: Remove unnecessary RCU-bh critical section" This reverts commit `504fc6f4f7`. dev_queue_xmit_nit is expected to be called with BH disabled. __dev_queue_xmit has the following: /* Disable soft irqs for various locks below. Also * stops preemption for RCU. */ rcu_read_lock_bh(); VRF must follow this invariant. The referenced commit removed this protection, which triggered a lockdep warning: ================================ WARNING: inconsistent lock state 6.11.0 #1 Tainted: G W -------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. btserver/134819 [HC0[0]:SC0[0]:HE1:SE1] takes: ffff8882da30c118 (rlock-AF_PACKET){+.?.}-{2:2}, at: tpacket_rcv+0x863/0x3b30 {IN-SOFTIRQ-W} state was registered at: lock_acquire+0x19a/0x4f0 _raw_spin_lock+0x27/0x40 packet_rcv+0xa33/0x1320 __netif_receive_skb_core.constprop.0+0xcb0/0x3a90 __netif_receive_skb_list_core+0x2c9/0x890 netif_receive_skb_list_internal+0x610/0xcc0 [...] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(rlock-AF_PACKET); <Interrupt> lock(rlock-AF_PACKET); *** DEADLOCK *** Call Trace: <TASK> dump_stack_lvl+0x73/0xa0 mark_lock+0x102e/0x16b0 __lock_acquire+0x9ae/0x6170 lock_acquire+0x19a/0x4f0 _raw_spin_lock+0x27/0x40 tpacket_rcv+0x863/0x3b30 dev_queue_xmit_nit+0x709/0xa40 vrf_finish_direct+0x26e/0x340 [vrf] vrf_l3_out+0x5f4/0xe80 [vrf] __ip_local_out+0x51e/0x7a0 [...] 
Fixes: `504fc6f4f7` ("vrf: Remove unnecessary RCU-bh critical section") Link: https://lore.kernel.org/netdev/20240925185216.1990381-1-greearb@candelatech.com/ Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Cc: stable@vger.kernel.org Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20240929061839.1175300-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:26:11 -07:00
Dan Carpenter	3c97fe4f9f	net: ethernet: ti: am65-cpsw: Fix forever loop in cleanup code This error handling has a typo. It should be i++ instead of i--. In the original code the error handling will loop until it crashes. Fixes: `da70d184a8` ("net: ethernet: ti: am65-cpsw: Introduce multi queue Rx") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Link: https://patch.msgid.link/8e7960cc-415d-48d7-99ce-f623022ec7b5@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:25:32 -07:00
Hui Wang	c283782fc5	net: phy: realtek: Check the index value in led_hw_control_get Just like rtl8211f_led_hw_is_supported() and rtl8211f_led_hw_control_set(), rtl8211f_led_hw_control_get() also needs to check the index value, otherwise the caller is likely to get incorrect rules. Fixes: `17784801d8` ("net: phy: realtek: Add support for PHY LEDs on RTL8211F") Signed-off-by: Hui Wang <hui.wang@canonical.com> Reviewed-by: Marek Vasut <marex@denx.de> Link: https://patch.msgid.link/20240927114610.1278935-1-hui.wang@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:24:56 -07:00
Eric Dumazet	aec7291003	ppp: do not assume bh is held in ppp_channel_bridge_input() The networking receive path is usually handled from a BH handler. However, some protocols need to acquire the socket lock, and packets might be stored in the socket backlog if the socket was owned by a user process. In this case, release_sock(), __release_sock(), and sk_backlog_rcv() might call the sk->sk_backlog_rcv() handler in process context. syzbot caught that ppp was not considering this case in ppp_channel_bridge_input(): WARNING: inconsistent lock state 6.11.0-rc7-syzkaller-g5f5673607153 #0 Not tainted -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. ksoftirqd/1/24 [HC0[0]:SC1[1]:HE1:SE0] takes: ffff0000db7f11e0 (&pch->downl){+.?.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff0000db7f11e0 (&pch->downl){+.?.}-{2:2}, at: ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2272 [inline] ffff0000db7f11e0 (&pch->downl){+.?.}-{2:2}, at: ppp_input+0x16c/0x854 drivers/net/ppp/ppp_generic.c:2304 {SOFTIRQ-ON-W} state was registered at: lock_acquire+0x240/0x728 kernel/locking/lockdep.c:5759 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x48/0x60 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2272 [inline] ppp_input+0x16c/0x854 drivers/net/ppp/ppp_generic.c:2304 pppoe_rcv_core+0xfc/0x314 drivers/net/ppp/pppoe.c:379 sk_backlog_rcv include/net/sock.h:1111 [inline] __release_sock+0x1a8/0x3d8 net/core/sock.c:3004 release_sock+0x68/0x1b8 net/core/sock.c:3558 pppoe_sendmsg+0xc8/0x5d8 drivers/net/ppp/pppoe.c:903 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] __sys_sendto+0x374/0x4f4 net/socket.c:2204 __do_sys_sendto net/socket.c:2216 [inline] __se_sys_sendto net/socket.c:2212 [inline] __arm64_sys_sendto+0xd8/0xf8 net/socket.c:2212 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] 
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151 el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598 irq event stamp: 282914 hardirqs last enabled at (282914): [<ffff80008b42e30c>] __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:151 [inline] hardirqs last enabled at (282914): [<ffff80008b42e30c>] _raw_spin_unlock_irqrestore+0x38/0x98 kernel/locking/spinlock.c:194 hardirqs last disabled at (282913): [<ffff80008b42e13c>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline] hardirqs last disabled at (282913): [<ffff80008b42e13c>] _raw_spin_lock_irqsave+0x2c/0x7c kernel/locking/spinlock.c:162 softirqs last enabled at (282904): [<ffff8000801f8e88>] softirq_handle_end kernel/softirq.c:400 [inline] softirqs last enabled at (282904): [<ffff8000801f8e88>] handle_softirqs+0xa3c/0xbfc kernel/softirq.c:582 softirqs last disabled at (282909): [<ffff8000801fbdf8>] run_ksoftirqd+0x70/0x158 kernel/softirq.c:928 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&pch->downl); <Interrupt> lock(&pch->downl); *** DEADLOCK *** 1 lock held by ksoftirqd/1/24: #0: ffff80008f74dfa0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x10/0x4c include/linux/rcupdate.h:325 stack backtrace: CPU: 1 UID: 0 PID: 24 Comm: ksoftirqd/1 Not tainted 6.11.0-rc7-syzkaller-g5f5673607153 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 Call trace: dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:319 show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:326 __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0xe4/0x150 lib/dump_stack.c:119 dump_stack+0x1c/0x28 lib/dump_stack.c:128 print_usage_bug+0x698/0x9ac kernel/locking/lockdep.c:4000 
mark_lock_irq+0x980/0xd2c mark_lock+0x258/0x360 kernel/locking/lockdep.c:4677 __lock_acquire+0xf48/0x779c kernel/locking/lockdep.c:5096 lock_acquire+0x240/0x728 kernel/locking/lockdep.c:5759 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x48/0x60 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2272 [inline] ppp_input+0x16c/0x854 drivers/net/ppp/ppp_generic.c:2304 ppp_async_process+0x98/0x150 drivers/net/ppp/ppp_async.c:495 tasklet_action_common+0x318/0x3f4 kernel/softirq.c:785 tasklet_action+0x68/0x8c kernel/softirq.c:811 handle_softirqs+0x2e4/0xbfc kernel/softirq.c:554 run_ksoftirqd+0x70/0x158 kernel/softirq.c:928 smpboot_thread_fn+0x4b0/0x90c kernel/smpboot.c:164 kthread+0x288/0x310 kernel/kthread.c:389 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860 Fixes: `4cf476ced4` ("ppp: add PPPIOCBRIDGECHAN and PPPIOCUNBRIDGECHAN ioctls") Reported-by: syzbot+bd8d55ee2acd0a71d8ce@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/66f661e2.050a0220.38ace9.000f.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Parkin <tparkin@katalix.com> Cc: James Chapman <jchapman@katalix.com> Link: https://patch.msgid.link/20240927074553.341910-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:24:10 -07:00
Hangbin Liu	8ed7cf66f4	selftests: rds: move include.sh to TEST_FILES The include.sh file is generated for inclusion and should not be executable. Otherwise, it will be added to kselftest-list.txt. Additionally, add the executable bit for test.py at the same time to ensure proper functionality. Fixes: `3ade6ce125` ("selftests: rds: add testing infrastructure") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20240927041349.81216-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:22:49 -07:00
Eric Dumazet	49d14b54a5	net: test for not too small csum_start in virtio_net_hdr_to_skb() syzbot was able to trigger this warning [1], after injecting a malicious packet through af_packet, setting skb->csum_start and thus the transport header to an incorrect value. We can at least make sure the transport header is after the end of the network header (with a estimated minimal size). [1] [ 67.873027] skb len=4096 headroom=16 headlen=14 tailroom=0 mac=(-1,-1) mac_len=0 net=(16,-6) trans=10 shinfo(txflags=0 nr_frags=1 gso(size=0 type=0 segs=0)) csum(0xa start=10 offset=0 ip_summed=3 complete_sw=0 valid=0 level=0) hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=0 priority=0x0 mark=0x0 alloc_cpu=10 vlan_all=0x0 encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0) [ 67.877172] dev name=veth0_vlan feat=0x000061164fdd09e9 [ 67.877764] sk family=17 type=3 proto=0 [ 67.878279] skb linear: 00000000: 00 00 10 00 00 00 00 00 0f 00 00 00 08 00 [ 67.879128] skb frag: 00000000: 0e 00 07 00 00 00 28 00 08 80 1c 00 04 00 00 02 [ 67.879877] skb frag: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.880647] skb frag: 00000020: 00 00 02 00 00 00 08 00 1b 00 00 00 00 00 00 00 [ 67.881156] skb frag: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.881753] skb frag: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.882173] skb frag: 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.882790] skb frag: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.883171] skb frag: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.883733] skb frag: 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.884206] skb frag: 00000090: 00 00 00 00 00 00 00 00 00 00 69 70 76 6c 61 6e [ 67.884704] skb frag: 000000a0: 31 00 00 00 00 00 00 00 00 00 2b 00 00 00 00 00 [ 67.885139] skb frag: 000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.885677] skb frag: 000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 [ 67.886042] skb frag: 000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.886408] skb frag: 000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.887020] skb frag: 000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.887384] skb frag: 00000100: 00 00 [ 67.887878] ------------[ cut here ]------------ [ 67.887908] offset (-6) >= skb_headlen() (14) [ 67.888445] WARNING: CPU: 10 PID: 2088 at net/core/dev.c:3332 skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.889353] Modules linked in: macsec macvtap macvlan hsr wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 dummy bridge sr_mod cdrom evdev pcspkr i2c_piix4 9pnet_virtio 9p 9pnet netfs [ 67.890111] CPU: 10 UID: 0 PID: 2088 Comm: b363492833 Not tainted 6.11.0-virtme #1011 [ 67.890183] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 67.890309] RIP: 0010:skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891043] Call Trace: [ 67.891173] <TASK> [ 67.891274] ? __warn (kernel/panic.c:741) [ 67.891320] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891333] ? report_bug (lib/bug.c:180 lib/bug.c:219) [ 67.891348] ? handle_bug (arch/x86/kernel/traps.c:239) [ 67.891363] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1)) [ 67.891372] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) [ 67.891388] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891399] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891416] ip_do_fragment (net/ipv4/ip_output.c:777 (discriminator 1)) [ 67.891448] ? __ip_local_out (./include/linux/skbuff.h:1146 ./include/net/l3mdev.h:196 ./include/net/l3mdev.h:213 net/ipv4/ip_output.c:113) [ 67.891459] ? __pfx_ip_finish_output2 (net/ipv4/ip_output.c:200) [ 67.891470] ? 
ip_route_output_flow (./arch/x86/include/asm/preempt.h:84 (discriminator 13) ./include/linux/rcupdate.h:96 (discriminator 13) ./include/linux/rcupdate.h:871 (discriminator 13) net/ipv4/route.c:2625 (discriminator 13) ./include/net/route.h:141 (discriminator 13) net/ipv4/route.c:2852 (discriminator 13)) [ 67.891484] ipvlan_process_v4_outbound (drivers/net/ipvlan/ipvlan_core.c:445 (discriminator 1)) [ 67.891581] ipvlan_queue_xmit (drivers/net/ipvlan/ipvlan_core.c:542 drivers/net/ipvlan/ipvlan_core.c:604 drivers/net/ipvlan/ipvlan_core.c:670) [ 67.891596] ipvlan_start_xmit (drivers/net/ipvlan/ipvlan_main.c:227) [ 67.891607] dev_hard_start_xmit (./include/linux/netdevice.h:4916 ./include/linux/netdevice.h:4925 net/core/dev.c:3588 net/core/dev.c:3604) [ 67.891620] __dev_queue_xmit (net/core/dev.h:168 (discriminator 25) net/core/dev.c:4425 (discriminator 25)) [ 67.891630] ? skb_copy_bits (./include/linux/uaccess.h:233 (discriminator 1) ./include/linux/uaccess.h:260 (discriminator 1) ./include/linux/highmem-internal.h:230 (discriminator 1) net/core/skbuff.c:3018 (discriminator 1)) [ 67.891645] ? __pskb_pull_tail (net/core/skbuff.c:2848 (discriminator 4)) [ 67.891655] ? skb_partial_csum_set (net/core/skbuff.c:5657) [ 67.891666] ? virtio_net_hdr_to_skb.constprop.0 (./include/linux/skbuff.h:2791 (discriminator 3) ./include/linux/skbuff.h:2799 (discriminator 3) ./include/linux/virtio_net.h:109 (discriminator 3)) [ 67.891684] packet_sendmsg (net/packet/af_packet.c:3145 (discriminator 1) net/packet/af_packet.c:3177 (discriminator 1)) [ 67.891700] ? 
_raw_spin_lock_bh (./arch/x86/include/asm/atomic.h:107 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:1302 (discriminator 4) ./include/asm-generic/qspinlock.h:111 (discriminator 4) ./include/linux/spinlock.h:187 (discriminator 4) ./include/linux/spinlock_api_smp.h:127 (discriminator 4) kernel/locking/spinlock.c:178 (discriminator 4)) [ 67.891716] __sys_sendto (net/socket.c:730 (discriminator 1) net/socket.c:745 (discriminator 1) net/socket.c:2210 (discriminator 1)) [ 67.891734] ? do_sock_setsockopt (net/socket.c:2335) [ 67.891747] ? __sys_setsockopt (./include/linux/file.h:34 net/socket.c:2355) [ 67.891761] __x64_sys_sendto (net/socket.c:2222 (discriminator 1) net/socket.c:2218 (discriminator 1) net/socket.c:2218 (discriminator 1)) [ 67.891772] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1)) [ 67.891785] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) Fixes: `9181d6f8a2` ("net: add more sanity check in virtio_net_hdr_to_skb()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240926165836.3797406-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:21:59 -07:00
Felix Fietkau	17bd3bd82f	net: gso: fix tcp fraglist segmentation after pull from frag_list Detect tcp gso fraglist skbs with corrupted geometry (see below) and pass these to skb_segment instead of skb_segment_list, as the first can segment them correctly. Valid SKB_GSO_FRAGLIST skbs - consist of two or more segments - the head_skb holds the protocol headers plus first gso_size - one or more frag_list skbs hold exactly one segment - all but the last must be gso_size Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data) can modify these skbs, breaking these invariants. In extreme cases they pull all data into skb linear. For TCP, this causes a NULL ptr deref in __tcpv4_gso_segment_list_csum at tcp_hdr(seg->next). Detect invalid geometry due to pull, by checking head_skb size. Don't just drop, as this may blackhole a destination. Convert to be able to pass to regular skb_segment. Approach and description based on a patch by Willem de Bruijn. Link: https://lore.kernel.org/netdev/20240428142913.18666-1-shiming.cheng@mediatek.com/ Link: https://lore.kernel.org/netdev/20240922150450.3873767-1-willemdebruijn.kernel@gmail.com/ Fixes: `bee88cd5bd` ("net: add support for segmenting TCP fraglist GSO packets") Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240926085315.51524-1-nbd@nbd.name Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:21:47 -07:00
Jakub Kicinski	854e9bf5c5	Merge tag 'mlx5-fixes-2024-09-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2024-09-25 * tag 'mlx5-fixes-2024-09-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Fix crash caused by calling __xfrm_state_delete() twice net/mlx5e: SHAMPO, Fix overflow of hd_per_wq net/mlx5: HWS, changed E2BIG error to a negative return code net/mlx5: HWS, fixed double-free in error flow of creating SQ net/mlx5: Fix wrong reserved field in hca_cap_2 in mlx5_ifc net/mlx5e: Fix NULL deref in mlx5e_tir_builder_alloc() net/mlx5: Added cond_resched() to crdump collection net/mlx5: Fix error path in multi-packet WQE transmit ==================== Link: https://patch.msgid.link/20240925202013.45374-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-02 17:14:53 -07:00

1308908 Commits