Commit Graph

154021 Commits

Author SHA1 Message Date
Eric Dumazet
0139806eeb net: move tcpv4_offload and tcpv6_offload to net_hotdata
These are used in TCP fast paths.

Move them into net_hotdata for better cache locality.

v2: tcpv6_offload definition depends on CONFIG_INET

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:42 -08:00
Eric Dumazet
61a0be1a53 net: move ip_packet_offload and ipv6_packet_offload to net_hotdata
These structures are used in GRO and GSO paths.

v2: ipv6_packet_offload definition depends on CONFIG_INET

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:42 -08:00
Eric Dumazet
edbc666cdc net: move netdev_max_backlog to net_hotdata
netdev_max_backlog is used in rx fat path.

Move it to net_hodata for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:42 -08:00
Eric Dumazet
0b91fa4bfb net: move ptype_all into net_hotdata
ptype_all is used in rx/tx fast paths.

Move it to net_hotdata for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:41 -08:00
Eric Dumazet
f59b5416c3 net: move netdev_tstamp_prequeue into net_hotdata
netdev_tstamp_prequeue is used in rx path.

Move it to net_hotdata for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:41 -08:00
Eric Dumazet
ae6e22f7b7 net: move netdev_budget and netdev_budget to net_hotdata
netdev_budget and netdev_budget are used in rx path (net_rx_action())

Move them into net_hotdata for better cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:41 -08:00
Eric Dumazet
2658b5a8a4 net: introduce struct net_hotdata
Instead of spreading networking critical fields
all over the places, add a custom net_hotdata
structure so that we can precisely control its layout.

In this first patch, move :

- gro_normal_batch used in rx (GRO stack)
- offload_base used in rx and tx (GRO and TSO stacks)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306160031.874438-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 21:12:41 -08:00
Jakub Kicinski
c3874bbec9 Merge tag 'rxrpc-iothread-20240305' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:

====================
Here are some changes to AF_RXRPC:

 (1) Cache the transmission serial number of ACK and DATA packets in the
     rxrpc_txbuf struct and log this in the retransmit tracepoint.

 (2) Don't use atomics on rxrpc_txbuf::flags[*] and cache the intended wire
     header flags there too to avoid duplication.

 (3) Cache the wire checksum in rxrpc_txbuf to make it easier to create
     jumbo packets in future (which will require altering the wire header
     to a jumbo header and restoring it back again for retransmission).

 (4) Fix the protocol names in the wire ACK trailer struct.

 (5) Strip all the barriers and atomics out of the call timer tracking[*].

 (6) Remove atomic handling from call->tx_transmitted and
     call->acks_prev_seq[*].

 (7) Don't bother resetting the DF flag after UDP packet transmission.  To
     change it, we now call directly into UDP code, so it's quick just to
     set it every time.

 (8) Merge together the DF/non-DF branches of the DATA transmission to
     reduce duplication in the code.

 (9) Add a kvec array into rxrpc_txbuf and start moving things over to it.
     This paves the way for using page frags.

(10) Split (sub)packet preparation and timestamping out of the DATA
     transmission function.  This helps pave the way for future jumbo
     packet generation.

(11) In rxkad, don't pick values out of the wire header stored in
     rxrpc_txbuf, buf rather find them elsewhere so we can remove the wire
     header from there.

(12) Move rxrpc_send_ACK() to output.c so that it can be merged with
     rxrpc_send_ack_packet().

(13) Use rxrpc_txbuf::kvec[0] to access the wire header for the packet
     rather than directly accessing the copy in rxrpc_txbuf.  This will
     allow that to be removed to a page frag.

(14) Switch from keeping the transmission buffers in rxrpc_txbuf allocated
     in the slab to allocating them using page fragment allocators.  There
     are separate allocators for DATA packets (which persist for a while)
     and control packets (which are discarded immediately).

     We can then turn on MSG_SPLICE_PAGES when transmitting DATA and ACK
     packets.

     We can also get rid of the RCU cleanup on rxrpc_txbufs, preferring
     instead to release the page frags as soon as possible.

(15) Parse received packets before handling timeouts as the former may
     reset the latter.

(16) Make sure we don't retransmit DATA packets after all the packets have
     been ACK'd.

(17) Differentiate traces for PING ACK transmission.

(18) Switch to keeping timeouts as ktime_t rather than a number of jiffies
     as the latter is too coarse a granularity.  Only set the call timer at
     the end of the call event function from the aggregate of all the
     timeouts, thereby reducing the number of timer calls made.  In future,
     it might be possible to reduce the number of timers from one per call
     to one per I/O thread and to use a high-precision timer.

(19) Record RTT probes after successful transmission rather than recording
     it before and then cancelling it after if unsuccessful[*].  This
     allows a number of calls to get the current time to be removed.

(20) Clean up the resend algorithm as there's now no need to walk the
     transmission buffer under lock[*].  DATA packets can be retransmitted
     as soon as they're found rather than being queued up and transmitted
     when the locked is dropped.

(21) When initially parsing a received ACK packet, extract some of the
     fields from the ack info to the skbuff private data.  This makes it
     easier to do path MTU discovery in the future when the call to which a
     PING RESPONSE ACK refers has been deallocated.

[*] Possible with the move of almost all code from softirq context to the
    I/O thread.

Link: https://lore.kernel.org/r/20240301163807.385573-1-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20240304084322.705539-1-dhowells@redhat.com/ # v2

* tag 'rxrpc-iothread-20240305' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (21 commits)
  rxrpc: Extract useful fields from a received ACK to skb priv data
  rxrpc: Clean up the resend algorithm
  rxrpc: Record probes after transmission and reduce number of time-gets
  rxrpc: Use ktimes for call timeout tracking and set the timer lazily
  rxrpc: Differentiate PING ACK transmission traces.
  rxrpc: Don't permit resending after all Tx packets acked
  rxrpc: Parse received packets before dealing with timeouts
  rxrpc: Do zerocopy using MSG_SPLICE_PAGES and page frags
  rxrpc: Use rxrpc_txbuf::kvec[0] instead of rxrpc_txbuf::wire
  rxrpc: Move rxrpc_send_ACK() to output.c with rxrpc_send_ack_packet()
  rxrpc: Don't pick values out of the wire header when setting up security
  rxrpc: Split up the DATA packet transmission function
  rxrpc: Add a kvec[] to the rxrpc_txbuf struct
  rxrpc: Merge together DF/non-DF branches of data Tx function
  rxrpc: Do lazy DF flag resetting
  rxrpc: Remove atomic handling on some fields only used in I/O thread
  rxrpc: Strip barriers and atomics off of timer tracking
  rxrpc: Fix the names of the fields in the ACK trailer struct
  rxrpc: Note cksum in txbuf
  rxrpc: Convert rxrpc_txbuf::flags into a mask and don't use atomics
  ...
====================

Link: https://lore.kernel.org/r/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:59:58 -08:00
Jakub Kicinski
e3afe5dd3a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

net/core/page_pool_user.c
  0b11b1c5c3 ("netdev: let netlink core handle -EMSGSIZE errors")
  429679dcf7 ("page_pool: fix netlink dump stop/resume")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 10:29:36 -08:00
Linus Torvalds
df4793505a Merge tag 'net-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf, ipsec and netfilter.

  No solution yet for the stmmac issue mentioned in the last PR, but it
  proved to be a lockdep false positive, not a blocker.

  Current release - regressions:

   - dpll: move all dpll<>netdev helpers to dpll code, fix build
     regression with old compilers

  Current release - new code bugs:

   - page_pool: fix netlink dump stop/resume

  Previous releases - regressions:

   - bpf: fix verifier to check bpf_func_state->callback_depth when
     pruning states as otherwise unsafe programs could get accepted

   - ipv6: avoid possible UAF in ip6_route_mpath_notify()

   - ice: reconfig host after changing MSI-X on VF

   - mlx5:
       - e-switch, change flow rule destination checking
       - add a memory barrier to prevent a possible null-ptr-deref
       - switch to using _bh variant of of spinlock where needed

  Previous releases - always broken:

   - netfilter: nf_conntrack_h323: add protection for bmp length out of
     range

   - bpf: fix to zero-initialise xdp_rxq_info struct before running XDP
     program in CPU map which led to random xdp_md fields

   - xfrm: fix UDP encapsulation in TX packet offload

   - netrom: fix data-races around sysctls

   - ice:
       - fix potential NULL pointer dereference in ice_bridge_setlink()
       - fix uninitialized dplls mutex usage

   - igc: avoid returning frame twice in XDP_REDIRECT

   - i40e: disable NAPI right after disabling irqs when handling
     xsk_pool

   - geneve: make sure to pull inner header in geneve_rx()

   - sparx5: fix use after free inside sparx5_del_mact_entry

   - dsa: microchip: fix register write order in ksz8_ind_write8()

  Misc:

   - selftests: mptcp: fixes for diag.sh"

* tag 'net-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits)
  net: pds_core: Fix possible double free in error handling path
  netrom: Fix data-races around sysctl_net_busy_read
  netrom: Fix a data-race around sysctl_netrom_link_fails_count
  netrom: Fix a data-race around sysctl_netrom_routing_control
  netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout
  netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size
  netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
  netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay
  netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
  netrom: Fix a data-race around sysctl_netrom_transport_timeout
  netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
  netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser
  netrom: Fix a data-race around sysctl_netrom_default_path_quality
  netfilter: nf_conntrack_h323: Add protection for bmp length out of range
  netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout
  netfilter: nft_ct: fix l3num expectations with inet pseudo family
  netfilter: nf_tables: reject constant set with timeout
  netfilter: nf_tables: disallow anonymous set with timeout flag
  net/rds: fix WARNING in rds_conn_connect_if_down
  net: dsa: microchip: fix register write order in ksz8_ind_write8()
  ...
2024-03-07 09:23:33 -08:00
Jason Xing
0ab544b6f0 tcp: add tracing of skbaddr in tcp_event_skb class
Use the existing parameter and print the address of skbaddr
as other trace functions do.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 15:29:15 +01:00
Jason Xing
4e441bb8ac tcp: add tracing of skb/skaddr in tcp_event_sk_skb class
Printing the addresses can help us identify the exact skb/sk
for those system in which it's not that easy to run BPF program.
As we can see, it already fetches those, then use it directly
and it will print like below:

...tcp_retransmit_skb: skbaddr=XXX skaddr=XXX family=AF_INET...

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 15:29:15 +01:00
Maxime Chevallier
68ac1e4642 net: phylink: clean the pcs_get_state documentation
commit 4d72c3bb60 ("net: phylink: strip out pre-March 2020 legacy code")
dropped the mac_pcs_get_state ops in phylink_mac_ops in favor of
dedicated PCS operation pcs_get_state. However, the documentation for
the pcs_get_state ops was incorrectly converted and now self-references.

Drop the extra comment.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07 15:27:05 +01:00
Linus Torvalds
67be068d31 Merge tag 'vfs-6.8-release.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:

 - Get rid of copy_mc flag in iov_iter which really only makes sense for
   the core dumping code so move it out of the generic iov iter code and
   make it coredump's problem. See the detailed commit description.

 - Revert fs/aio: Make io_cancel() generate completions again

   The initial fix here was predicated on the assumption that calling
   ki_cancel() didn't complete aio requests. However, that turned out to
   be wrong since the two drivers that actually make use of this set a
   cancellation function that performs the cancellation correctly. So
   revert this change.

 - Ensure that the test for IOCB_AIO_RW always happens before the read
   from ki_ctx.

* tag 'vfs-6.8-release.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  iov_iter: get rid of 'copy_mc' flag
  fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion
  Revert "fs/aio: Make io_cancel() generate completions again"
2024-03-06 08:12:27 -08:00
Juntong Deng
eeb78df406 inet: Add getsockopt support for IP_ROUTER_ALERT and IPV6_ROUTER_ALERT
Currently getsockopt does not support IP_ROUTER_ALERT and
IPV6_ROUTER_ALERT, and we are unable to get the values of these two
socket options through getsockopt.

This patch adds getsockopt support for IP_ROUTER_ALERT and
IPV6_ROUTER_ALERT.

Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-06 12:37:06 +00:00
Linus Torvalds
a50026bdb8 iov_iter: get rid of 'copy_mc' flag
This flag is only set by one single user: the magical core dumping code
that looks up user pages one by one, and then writes them out using
their kernel addresses (by using a BVEC_ITER).

That actually ends up being a huge problem, because while we do use
copy_mc_to_kernel() for this case and it is able to handle the possible
machine checks involved, nothing else is really ready to handle the
failures caused by the machine check.

In particular, as reported by Tong Tiangen, we don't actually support
fault_in_iov_iter_readable() on a machine check area.

As a result, the usual logic for writing things to a file under a
filesystem lock, which involves doing a copy with page faults disabled
and then if that fails trying to fault pages in without holding the
locks with fault_in_iov_iter_readable() does not work at all.

We could decide to always just make the MC copy "succeed" (and filling
the destination with zeroes), and that would then create a core dump
file that just ignores any machine checks.

But honestly, this single special case has been problematic before, and
means that all the normal iov_iter code ends up slightly more complex
and slower.

See for example commit c9eec08bac ("iov_iter: Don't deal with
iter->copy_mc in memcpy_from_iter_mc()") where David Howells
re-organized the code just to avoid having to check the 'copy_mc' flags
inside the inner iov_iter loops.

So considering that we have exactly one user, and that one user is a
non-critical special case that doesn't actually ever trigger in real
life (Tong found this with manual error injection), the sane solution is
to just decide that the onus on handling the machine check lines on that
user instead.

Ergo, do the copy_mc_to_kernel() in the core dump logic itself, copying
the user data to a stable kernel page before writing it out.

Fixes: f1982740f5 ("iov_iter: Convert iterate*() to inline funcs")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Link: https://lore.kernel.org/r/20240305133336.3804360-1-tongtiangen@huawei.com
Link: https://lore.kernel.org/all/4e80924d-9c85-f13a-722a-6a5d2b1c225a@huawei.com/
Tested-by: David Howells <dhowells@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reported-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-06 10:52:12 +01:00
Andrew Lunn
49168d1980 net: phy: Add phy_support_eee() indicating MAC support EEE
In order for EEE to operate, both the MAC and the PHY need to support
it, similar to how pause works. With some exception - a number of PHYs
have SmartEEE or AutoGrEEEn support in order to provide some EEE-like
power savings with non-EEE capable MACs.

Copy the pause concept and add the call phy_support_eee() which the MAC
makes after connecting the PHY to indicate it supports EEE. phylib will
then advertise EEE when auto-neg is performed.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20240302195306.3207716-6-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-05 19:21:17 -08:00
Andrew Lunn
fe0d4fd928 net: phy: Keep track of EEE configuration
Have phylib keep track of the EEE configuration. This simplifies the
MAC drivers, in that they don't need to store it.

Future patches to phylib will also make use of this information to
further simplify the MAC drivers.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20240302195306.3207716-4-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-05 19:21:17 -08:00
Andrew Lunn
e3b6876ab8 net: phy: Add phydev->enable_tx_lpi to simplify adjust link callbacks
MAC drivers which support EEE need to know the results of the EEE
auto-neg in order to program the hardware to perform EEE or not.  The
oddly named phy_init_eee() can be used to determine this, it returns 0
if EEE should be used, or a negative error code,
e.g. -EOPPROTONOTSUPPORT if the PHY does not support EEE or negotiate
resulted in it not being used.

However, many MAC drivers get this wrong. Add phydev->enable_tx_lpi
which indicates the result of the autoneg for EEE, including if EEE is
administratively disabled with ethtool. The MAC driver can then access
this in the same way as link speed and duplex in the adjust link
callback. If enable_tx_lpi is true, the MAC should send low power
indications and does not need to consider anything else with respect
to EEE.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20240302195306.3207716-3-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-05 19:21:17 -08:00
Russell King
6f2fc8584a net: add helpers for EEE configuration
Add helpers that phylib and phylink can use to manage EEE configuration
and determine whether the MAC should be permitted to use LPI based on
that configuration.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20240302195306.3207716-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-05 19:21:17 -08:00
Jakub Kicinski
289e922582 dpll: move all dpll<>netdev helpers to dpll code
Older versions of GCC really want to know the full definition
of the type involved in rcu_assign_pointer().

struct dpll_pin is defined in a local header, net/core can't
reach it. Move all the netdev <> dpll code into dpll, where
the type is known. Otherwise we'd need multiple function calls
to jump between the compilation units.

This is the same problem the commit under fixes was trying to address,
but with rcu_assign_pointer() not rcu_dereference().

Some of the exports are not needed, networking core can't
be a module, we only need exports for the helpers used by
drivers.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/all/35a869c8-52e8-177-1d4d-e57578b99b6@linux-m68k.org/
Fixes: 640f41ed33 ("dpll: fix build failure due to rcu_dereference_check() on unknown type")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240305013532.694866-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-05 18:36:42 -08:00
David Howells
153f90a066 rxrpc: Use ktimes for call timeout tracking and set the timer lazily
Track the call timeouts as ktimes rather than jiffies as the latter's
granularity is too high and only set the timer at the end of the event
handling function.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
2024-03-05 23:35:25 +00:00
David Howells
12a66e77c4 rxrpc: Differentiate PING ACK transmission traces.
There are three points that transmit PING ACKs and all of them use the same
trace string.  Change two of them to use different strings.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
2024-03-05 23:35:25 +00:00
Linus Torvalds
1c46d04a0d Merge tag 'hyperv-fixes-signed-20240303' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:

 - Multiple fixes, cleanups and documentations for Hyper-V core code and
   drivers

* tag 'hyperv-fixes-signed-20240303' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  Drivers: hv: vmbus: make hv_bus const
  x86/hyperv: Allow 15-bit APIC IDs for VTL platforms
  x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad()
  x86/mm: Regularize set_memory_p() parameters and make non-static
  x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback
  Documentation: hyperv: Add overview of PCI pass-thru device support
  Drivers: hv: vmbus: Update indentation in create_gpadl_header()
  Drivers: hv: vmbus: Remove duplication and cleanup code in create_gpadl_header()
  fbdev/hyperv_fb: Fix logic error for Gen2 VMs in hvfb_getmem()
  Drivers: hv: vmbus: Calculate ring buffer size for more efficient use of memory
  hv_utils: Allow implicit ICTIMESYNCFLAG_SYNC
2024-03-05 12:38:50 -08:00
Ricardo B. Marliere
e556001169 nfc: core: make nfc_class constant
Since commit 43a7206b09 ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the nfc_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.

Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240302-class_cleanup-net-next-v1-6-8fa378595b93@marliere.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-05 11:21:18 -08:00
Abhishek Chauhan
885c36e59f net: Re-use and set mono_delivery_time bit for userspace tstamp packets
Bridge driver today has no support to forward the userspace timestamp
packets and ends up resetting the timestamp. ETF qdisc checks the
packet coming from userspace and encounters to be 0 thereby dropping
time sensitive packets. These changes will allow userspace timestamps
packets to be forwarded from the bridge to NIC drivers.

Setting the same bit (mono_delivery_time) to avoid dropping of
userspace tstamp packets in the forwarding path.

Existing functionality of mono_delivery_time remains unaltered here,
instead just extended with userspace tstamp support for bridge
forwarding path.

Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240301201348.2815102-1-quic_abchauha@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-05 13:41:16 +01:00
Eric Dumazet
c7583e9f76 net: gro: enable fast path for more cases
Currently the so-called GRO fast path is only enabled for
napi_frags_skb() callers.

After the prior patch, we no longer have to clear frag0 whenever
we pulled bytes to skb->head.

We therefore can initialize frag0 to skb->data so that GRO
fast path can be used in the following additional cases:

- Drivers using header split (populating skb->data with headers,
  and having payload in one or more page fragments).

- Drivers not using any page frag (entire packet is in skb->data)

Add a likely() in skb_gro_may_pull() to help the compiler
to generate better code if possible.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-05 13:30:11 +01:00
Eric Dumazet
bd56a29c7a net: gro: change skb_gro_network_header()
Change skb_gro_network_header() to accept a const sk_buff
and to no longer check if frag0 is NULL or not.

This allows to remove skb_gro_frag0_invalidate()
which is seen in profiles when header-split is enabled.

sk_buff parameter is constified for skb_gro_header_fast(),
inet_gro_compute_pseudo() and ip6_gro_compute_pseudo().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-05 13:30:11 +01:00
Eric Dumazet
93e16ea025 net: gro: rename skb_gro_header_hard()
skb_gro_header_hard() is renamed to skb_gro_may_pull() to match
the convention used by common helpers like pskb_may_pull().

This means the condition is inverted:

	if (skb_gro_header_hard(skb, hlen))
		slow_path();

becomes:

	if (!skb_gro_may_pull(skb, hlen))
		slow_path();

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-05 13:30:11 +01:00
Yunsheng Lin
a0727489ac net: introduce page_frag_cache_drain()
When draining a page_frag_cache, most user are doing
the similar steps, so introduce an API to avoid code
duplication.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-05 11:38:14 +01:00
Yunsheng Lin
411c5f3680 mm/page_alloc: modify page_frag_alloc_align() to accept align as an argument
napi_alloc_frag_align() and netdev_alloc_frag_align() accept
align as an argument, and they are thin wrappers around the
__napi_alloc_frag_align() and __netdev_alloc_frag_align() APIs
doing the alignment checking and align mask conversion, in order
to call page_frag_alloc_align() directly. The intention here is
to keep the alignment checking and the alignmask conversion in
in-line wrapper to avoid those kind of operations during execution
time since it can usually be handled during compile time.

We are going to use page_frag_alloc_align() in vhost_net.c, it
need the same kind of alignment checking and alignmask conversion,
so split up page_frag_alloc_align into an inline wrapper doing the
above operation, and add __page_frag_alloc_align() which is passed
with the align mask the original function expected as suggested by
Alexander.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Alexander Duyck <alexander.duyck@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-05 11:38:14 +01:00
Jakub Kicinski
4daa873133 Merge tag 'mlx5-fixes-2024-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5 fixes 2024-03-01

This series provides bug fixes to mlx5 driver.
Please pull and let me know if there is any problem.

* tag 'mlx5-fixes-2024-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: Switch to using _bh variant of of spinlock API in port timestamping NAPI poll context
  net/mlx5e: Use a memory barrier to enforce PTP WQ xmit submission tracking occurs after populating the metadata_map
  net/mlx5e: Fix MACsec state loss upon state update in offload path
  net/mlx5e: Change the warning when ignore_flow_level is not supported
  net/mlx5: Check capability for fw_reset
  net/mlx5: Fix fw reporter diagnose output
  net/mlx5: E-switch, Change flow rule destination checking
  Revert "net/mlx5e: Check the number of elements before walk TC rhashtable"
  Revert "net/mlx5: Block entering switchdev mode with ns inconsistency"
====================

Link: https://lore.kernel.org/r/20240302070318.62997-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-04 21:00:28 -08:00
Eric Dumazet
345a6e2631 tcp: align tcp_sock_write_rx group
Stephen Rothwell and kernel test robot reported that some arches
(parisc, hexagon) and/or compilers would not like blamed commit.

Lets make sure tcp_sock_write_rx group does not start with a hole.

While we are at it, correct tcp_sock_write_tx CACHELINE_ASSERT_GROUP_SIZE()
since after the blamed commit, we went to 105 bytes.

Fixes: 9912362205 ("tcp: remove some holes in struct tcp_sock")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/netdev/20240301121108.5d39e4f9@canb.auug.org.au/
Closes: https://lore.kernel.org/oe-kbuild-all/202403011451.csPYOS3C-lkp@intel.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Link: https://lore.kernel.org/r/20240301171945.2958176-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-04 20:46:18 -08:00
Steven Rostedt (Google)
51270d573a tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string
I'm updating __assign_str() and will be removing the second parameter. To
make sure that it does not break anything, I make sure that it matches the
__string() field, as that is where the string is actually going to be
saved in. To make sure there's nothing that breaks, I added a WARN_ON() to
make sure that what was used in __string() is the same that is used in
__assign_str().

In doing this change, an error was triggered as __assign_str() now expects
the string passed in to be a char * value. I instead had the following
warning:

include/trace/events/qdisc.h: In function ‘trace_event_raw_event_qdisc_reset’:
include/trace/events/qdisc.h:91:35: error: passing argument 1 of 'strcmp' from incompatible pointer type [-Werror=incompatible-pointer-types]
   91 |                 __assign_str(dev, qdisc_dev(q));

That's because the qdisc_enqueue() and qdisc_reset() pass in qdisc_dev(q)
to __assign_str() and to __string(). But that function returns a pointer
to struct net_device and not a string.

It appears that these events are just saving the pointer as a string and
then reading it as a string as well.

Use qdisc_dev(q)->name to save the device instead.

Fixes: a34dac0b90 ("net_sched: add tracepoints for qdisc_reset() and qdisc_destroy()")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-04 09:35:54 +00:00
Eric Dumazet
cc15bd10e7 net: adopt skb_network_header_len() more broadly
(skb_transport_header(skb) - skb_network_header(skb))
can be replaced by skb_network_header_len(skb)

Add a DEBUG_NET_WARN_ON_ONCE() in skb_network_header_len()
to catch cases were the transport_header was not set.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-04 08:47:06 +00:00
Jakub Kicinski
4b2765ae41 Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2024-02-29

We've added 119 non-merge commits during the last 32 day(s) which contain
a total of 150 files changed, 3589 insertions(+), 995 deletions(-).

The main changes are:

1) Extend the BPF verifier to enable static subprog calls in spin lock
   critical sections, from Kumar Kartikeya Dwivedi.

2) Fix confusing and incorrect inference of PTR_TO_CTX argument type
   in BPF global subprogs, from Andrii Nakryiko.

3) Larger batch of riscv BPF JIT improvements and enabling inlining
   of the bpf_kptr_xchg() for RV64, from Pu Lehui.

4) Allow skeleton users to change the values of the fields in struct_ops
   maps at runtime, from Kui-Feng Lee.

5) Extend the verifier's capabilities of tracking scalars when they
   are spilled to stack, especially when the spill or fill is narrowing,
   from Maxim Mikityanskiy & Eduard Zingerman.

6) Various BPF selftest improvements to fix errors under gcc BPF backend,
   from Jose E. Marchesi.

7) Avoid module loading failure when the module trying to register
   a struct_ops has its BTF section stripped, from Geliang Tang.

8) Annotate all kfuncs in .BTF_ids section which eventually allows
   for automatic kfunc prototype generation from bpftool, from Daniel Xu.

9) Several updates to the instruction-set.rst IETF standardization
   document, from Dave Thaler.

10) Shrink the size of struct bpf_map resp. bpf_array,
    from Alexei Starovoitov.

11) Initial small subset of BPF verifier prepwork for sleepable bpf_timer,
    from Benjamin Tissoires.

12) Fix bpftool to be more portable to musl libc by using POSIX's
    basename(), from Arnaldo Carvalho de Melo.

13) Add libbpf support to gcc in CORE macro definitions,
    from Cupertino Miranda.

14) Remove a duplicate type check in perf_event_bpf_event,
    from Florian Lehner.

15) Fix bpf_spin_{un,}lock BPF helpers to actually annotate them
    with notrace correctly, from Yonghong Song.

16) Replace the deprecated bpf_lpm_trie_key 0-length array with flexible
    array to fix build warnings, from Kees Cook.

17) Fix resolve_btfids cross-compilation to non host-native endianness,
    from Viktor Malik.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits)
  selftests/bpf: Test if shadow types work correctly.
  bpftool: Add an example for struct_ops map and shadow type.
  bpftool: Generated shadow variables for struct_ops maps.
  libbpf: Convert st_ops->data to shadow type.
  libbpf: Set btf_value_type_id of struct bpf_map for struct_ops.
  bpf: Replace bpf_lpm_trie_key 0-length array with flexible array
  bpf, arm64: use bpf_prog_pack for memory management
  arm64: patching: implement text_poke API
  bpf, arm64: support exceptions
  arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT
  bpf: add is_async_callback_calling_insn() helper
  bpf: introduce in_sleepable() helper
  bpf: allow more maps in sleepable bpf programs
  selftests/bpf: Test case for lacking CFI stub functions.
  bpf: Check cfi_stubs before registering a struct_ops type.
  bpf: Clarify batch lookup/lookup_and_delete semantics
  bpf, docs: specify which BPF_ABS and BPF_IND fields were zero
  bpf, docs: Fix typos in instruction-set.rst
  selftests/bpf: update tcp_custom_syncookie to use scalar packet offset
  bpf: Shrink size of struct bpf_map/bpf_array.
  ...
====================

Link: https://lore.kernel.org/r/20240301001625.8800-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-02 20:50:59 -08:00
Ming Lei
7838b46561 block: define bvec_iter as __packed __aligned(4)
In commit 19416123ab ("block: define 'struct bvec_iter' as packed"),
what we need is to save the 4byte padding, and avoid `bio` to spread on
one extra cache line.

It is enough to define it as '__packed __aligned(4)', as '__packed'
alone means byte aligned, and can cause compiler to generate horrible
code on architectures that don't support unaligned access in case that
bvec_iter is embedded in other structures.

Cc: Mikulas Patocka <mpatocka@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 19416123ab ("block: define 'struct bvec_iter' as packed")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-03-02 09:15:01 -08:00
Moshe Shemesh
5e6107b499 net/mlx5: Check capability for fw_reset
Functions which can't access MFRL (Management Firmware Reset Level)
register, have no use of fw_reset structures or events. Remove fw_reset
structures allocation and registration for fw reset events notifications
for these functions.

Having the devlink param enable_remote_dev_reset on functions that don't
have this capability is misleading as these functions are not allowed to
influence the reset flow. Hence, this patch removes this parameter for
such functions.

In addition, return not supported on devlink reload action fw_activate
for these functions.

Fixes: 38b9f903f2 ("net/mlx5: Handle sync reset request event")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-01 23:02:26 -08:00
Linus Torvalds
fbf9e3b697 Merge tag 'sound-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "The amount of changes wasn't as small as wished, but all reasonably
  small fixes. There is a PCM core API change, which is for correcting
  the behavior change we took in 6.8. The rest are device-specific fixes
  for ASoC AMD, Qualcomm, Cirrus codecs, HD-audio quirks & co"

* tag 'sound-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ASoC: amd: yc: Fix non-functional mic on Lenovo 21J2
  ASoC: amd: yc: add new YC platform variant (0x63) support
  ALSA: hda/realtek - ALC285 reduce pop noise from Headphone port
  ASoC: amd: yc: Add Lenovo ThinkBook 21J0 into DMI quirk table
  ALSA: hda/realtek: Add special fixup for Lenovo 14IRP8
  ASoC: soc-card: Fix missing locking in snd_soc_card_get_kcontrol()
  ALSA: hda/realtek: tas2781: enable subwoofer volume control
  ALSA: pcm: clarify and fix default msbits value for all formats
  ASoC: qcom: Fix uninitialized pointer dmactl
  ALSA: hda/realtek: fix mute/micmute LED For HP mt440
  ALSA: Drop leftover snd-rtctimer stuff from Makefile
  ALSA: ump: Fix the discard error code from snd_ump_legacy_open()
  ALSA: hda/realtek: Enable Mute LED on HP 840 G8 (MB 8AB8)
  ASoC: cs35l56: Must clear HALO_STATE before issuing SYSTEM_RESET
  ALSA: hda/realtek: Fix top speaker connection on Dell Inspiron 16 Plus 7630
  ALSA: firewire-lib: fix to check cycle continuity
2024-03-01 11:32:41 -08:00
Linus Torvalds
7187ea0978 Merge tag 'drm-fixes-2024-03-01' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
 "Bunch of fixes, xe, amdgpu, nouveau and tegra all have a few. Then
  drm/bridge including some drivers/soc fallout fixes. The biggest thing
  in here is a new unit test for some buddy allocator fixes, otherwise a
  misc fbcon, ttm unit test and one msm revert.

  Seems pretty normal for this stage.

  buddy:
   - two allocation fixes + unit test

  fbcon:
   - font restore syzkaller fix

  ttm:
   - kunit test fix

  bridge:
   - fix aux-hpd leaks
   - fix aux-hpd registration
   - fix use after free in soc/qcom
   - fix boot on soc/qcom

  xe:
   - A couple of tracepoint updates from Priyanka and Lucas
   - Make sure BINDs are completed before accepting UNBINDs on LR vms
   - Don't arbitrarily restrict max number of batched binds
   - Add uapi for dumpable bos (agreed on IRC)
   - Remove unused uapi flags and a leftover comment
   - A couple of fixes related to the execlist backend

  msm:
   - DP: Revert a change which was causing a HDP regression

  amdgpu:
   - Fix potential buffer overflow
   - Fix power min cap
   - Suspend/resume fix
   - SI PM fix
   - eDP fix

  nouveau:
   - fix a misreported VRAM sizing
   - fix a regression in suspend/resume due to freeing

  tegra:
   - host1x reset fix
   - only remove existing driver if display is possible"

* tag 'drm-fixes-2024-03-01' of https://gitlab.freedesktop.org/drm/kernel: (32 commits)
  drm/nouveau: keep DMA buffers required for suspend/resume
  nouveau: report byte usage in VRAM usage
  drm/xe/xe_trace: Add move_lacks_source detail to xe_bo_move trace
  drm/xe: Deny unbinds if uapi ufence pending
  drm/xe: Expose user fence from xe_sync_entry
  drm/xe: Use pointers in trace events
  drm/xe/xe_bo_move: Enhance xe_bo_move trace
  drm/xe/mmio: fix build warning for BAR resize on 32-bit
  drm/xe: get rid of MAX_BINDS
  drm/xe: Use vmalloc for array of bind allocation in bind IOCTL
  drm/xe: Don't support execlists in xe_gt_tlb_invalidation layer
  drm/xe: Fix execlist splat
  drm/xe/uapi: Remove unused flags
  drm/xe/uapi: Remove DRM_XE_VM_BIND_FLAG_ASYNC comment left over
  drm/xe: Add uapi for dumpable bos
  drm/amd/display: Add monitor patch for specific eDP
  Revert "drm/msm/dp: use drm_bridge_hpd_notify() to report HPD status changes"
  drm/tests/drm_buddy: add alloc_range_bias test
  drm/buddy: check range allocation matches alignment
  drm/buddy: fix range bias
  ...
2024-03-01 11:23:08 -08:00
Arnd Bergmann
eb2c11b27c net: bql: fix building with BQL disabled
It is now possible to disable BQL, but that causes the cpsw driver to break:

drivers/net/ethernet/ti/am65-cpsw-nuss.c:297:28: error: no member named 'dql' in 'struct netdev_queue'
  297 |                    dql_avail(&netif_txq->dql),

There is already a helper function in net/sch_generic.h that could
be used to help here. Move its implementation into the common
linux/netdevice.h along with the other bql interfaces and change
both users over to the new interface.

Fixes: ea7f3cfaa5 ("net: bql: allow the config to be disabled")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 08:46:15 +00:00
Geert Uytterhoeven
f29f9199c2 Simplify net_dbg_ratelimited() dummy
There is no need to wrap calls to the no_printk() helper inside an
always-false check, as no_printk() already does that internally.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 08:44:54 +00:00
Eric Dumazet
fca34cc075 ipv6: annotate data-races around idev->cnf.ignore_routes_with_linkdown
idev->cnf.ignore_routes_with_linkdown can be used without any locks,
add appropriate annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 08:42:32 +00:00
Eric Dumazet
32f754176e ipv6: annotate data-races around cnf.forwarding
idev->cnf.forwarding and net->ipv6.devconf_all->forwarding
might be read locklessly, add appropriate READ_ONCE()
and WRITE_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 08:42:31 +00:00
Eric Dumazet
e7135f4849 ipv6: annotate data-races around cnf.mtu6
idev->cnf.mtu6 might be read locklessly, add appropriate READ_ONCE()
and WRITE_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 08:42:31 +00:00
Eric Dumazet
096361b155 ipv6: add ipv6_devconf_read_txrx cacheline_group
IPv6 TX and RX fast path use the following fields:

- disable_ipv6
- hop_limit
- mtu6
- forwarding
- disable_policy
- proxy_ndp

Place them in a group to increase data locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 08:42:30 +00:00
Michael Kelley
b820954429 Drivers: hv: vmbus: Calculate ring buffer size for more efficient use of memory
The VMBUS_RING_SIZE macro adds space for a ring buffer header to the
requested ring buffer size.  The header size is always 1 page, and so
its size varies based on the PAGE_SIZE for which the kernel is built.
If the requested ring buffer size is a large power-of-2 size and the header
size is small, the resulting size is inefficient in its use of memory.
For example, a 512 Kbyte ring buffer with a 4 Kbyte page size results in
a 516 Kbyte allocation, which is rounded to up 1 Mbyte by the memory
allocator, and wastes 508 Kbytes of memory.

In such situations, the exact size of the ring buffer isn't that important,
and it's OK to allocate the 4 Kbyte header at the beginning of the 512
Kbytes, leaving the ring buffer itself with just 508 Kbytes. The memory
allocation can be 512 Kbytes instead of 1 Mbyte and nothing is wasted.

Update VMBUS_RING_SIZE to implement this approach for "large" ring buffer
sizes.  "Large" is somewhat arbitrarily defined as 8 times the size of
the ring buffer header (which is of size PAGE_SIZE).  For example, for
4 Kbyte PAGE_SIZE, ring buffers of 32 Kbytes and larger use the first
4 Kbytes as the ring buffer header.  For 64 Kbyte PAGE_SIZE, ring buffers
of 512 Kbytes and larger use the first 64 Kbytes as the ring buffer
header.  In both cases, smaller sizes add space for the header so
the ring size isn't reduced too much by using part of the space for
the header.  For example, with a 64 Kbyte page size, we don't want
a 128 Kbyte ring buffer to be reduced to 64 Kbytes by allocating half
of the space for the header.  In such a case, the memory allocation
is less efficient, but it's the best that can be done.

While the new algorithm slightly changes the amount of space allocated
for ring buffers by drivers that use VMBUS_RING_SIZE, the devices aren't
known to be sensitive to small changes in ring buffer size, so there
shouldn't be any effect.

Fixes: c1135c7fd0 ("Drivers: hv: vmbus: Introduce types of GPADL")
Fixes: 6941f67ad3 ("hv_netvsc: Calculate correct ring size when PAGE_SIZE is not 4 Kbytes")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218502
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
Link: https://lore.kernel.org/r/20240229004533.313662-1-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240229004533.313662-1-mhklinux@outlook.com>
2024-03-01 08:19:06 +00:00
Dave Airlie
aa5fe428d5 Merge tag 'drm-xe-fixes-2024-02-29' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
UAPI Changes:
- A couple of tracepoint updates from Priyanka and Lucas.
- Make sure BINDs are completed before accepting UNBINDs on LR vms.
- Don't arbitrarily restrict max number of batched binds.
- Add uapi for dumpable bos (agreed on IRC).
- Remove unused uapi flags and a leftover comment.

Driver Changes:
- A couple of fixes related to the execlist backend.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZeCBg4MA2hd1oggN@fedora
2024-03-01 13:46:59 +10:00
Dave Airlie
45046af3d0 Merge tag 'drm-misc-fixes-2024-02-29' of https://anongit.freedesktop.org/git/drm/drm-misc into drm-fixes
A reset fix for host1x, a resource leak fix and a probe fix for aux-hpd,
a use-after-free fix and a boot fix for a pmic_glink qcom driver in
drivers/soc, a fix for the simpledrm/tegra transition, a kunit fix for
the TTM tests, a font handling fix for fbcon, two allocation fixes and a
kunit test to cover them for drm/buddy

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240229-angelic-adorable-teal-fbfabb@houat
2024-03-01 13:13:06 +10:00
Jakub Kicinski
65f5dd4f02 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

net/mptcp/protocol.c
  adf1bb78da ("mptcp: fix snd_wnd initialization for passive socket")
  9426ce476a ("mptcp: annotate lockless access for RX path fields")
https://lore.kernel.org/all/20240228103048.19255709@canb.auug.org.au/

Adjacent changes:

drivers/dpll/dpll_core.c
  0d60d8df6f ("dpll: rely on rcu for netdev_dpll_pin()")
  e7f8df0e81 ("dpll: move xa_erase() call in to match dpll_pin_alloc() error path order")

drivers/net/veth.c
  1ce7d306ea ("veth: try harder when allocating queue memory")
  0bef512012 ("net: add netdev_lockdep_set_classes() to virtual drivers")

drivers/net/wireless/intel/iwlwifi/mvm/d3.c
  8c9bef26e9 ("wifi: iwlwifi: mvm: d3: implement suspend with MLO")
  78f65fbf42 ("wifi: iwlwifi: mvm: ensure offloading TID queue exists")

net/wireless/nl80211.c
  f78c137533 ("wifi: nl80211: reject iftype change with mesh ID change")
  414532d8aa ("wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriately")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29 14:24:56 -08:00