linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-09 04:21:03 -04:00

Author	SHA1	Message	Date
Jakub Kicinski	fd5f4d7da2	Merge branch 'splice-net-rewrite-splice-to-socket-fix-splice_f_more-and-handle-msg_splice_pages-in-af_tls' David Howells says: ==================== splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS Here are patches to do the following: (1) Block MSG_SENDPAGE_* flags from leaking into ->sendmsg() from userspace, whilst allowing splice_to_socket() to pass them in. (2) Allow MSG_SPLICE_PAGES to be passed into tls_*_sendmsg(). Until support is added, it will be ignored and a splice-driven sendmsg() will be treated like a normal sendmsg(). TCP, UDP, AF_UNIX and Chelsio-TLS already handle the flag in net-next. (3) Replace a chain of functions to splice-to-sendpage with a single function to splice via sendmsg() with MSG_SPLICE_PAGES. This allows a bunch of pages to be spliced from a pipe in a single call using a bio_vec[] and pushes the main processing loop down into the bowels of the protocol driver rather than repeatedly calling in with a page at a time. (4) Provide a ->splice_eof() op[2] that allows splice to signal to its output that the input observed a premature EOF and that the caller didn't flag SPLICE_F_MORE, thereby allowing a corked socket to be flushed. This attempts to maintain the current behaviour. It is also not called if we didn't manage to read any data and so didn't call the actor function. This needs routing through several layers to get it down to the network protocol. [!] Note that I chose not to pass in any flags - I'm not sure it's particularly useful to pass in the splice flags; I also elected not to return any error code - though we might actually want to do that. (5) Provide tls_{device,sw}_splice_eof() to flush a pending TLS record if there is one. (6) Provide splice_eof() for UDP, TCP, Chelsio-TLS and AF_KCM. AF_UNIX doesn't seem to pay attention to the MSG_MORE or MSG_SENDPAGE_NOTLAST flags. 
(7) Alter the behaviour of sendfile() and fix SPLICE_F_MORE/MSG_MORE signalling[1] such that SPLICE_F_MORE is always signalled until we have read sufficient data to finish the request. If we get a zero-length read before we've managed to splice sufficient data, we now leave the socket expecting more data and leave it to userspace to deal with it. (8) Make AF_TLS handle the MSG_SPLICE_PAGES internal sendmsg flag. MSG_SPLICE_PAGES is an internal hint that tells the protocol that it should splice the pages supplied if it can. Its sendpage implementations are then turned into wrappers around that. Link: https://lore.kernel.org/r/499791.1685485603@warthog.procyon.org.uk/ [1] Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ [2] Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=51c78a4d532efe9543a4df019ff405f05c6157f6 # part 1 Link: https://lore.kernel.org/r/20230524153311.3625329-1-dhowells@redhat.com/ # v1 ==================== Link: https://lore.kernel.org/r/20230607181920.2294972-1-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:40:33 -07:00
David Howells	3dc8976c7a	tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES Convert tls_device_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather than directly splicing in the pages itself. With that, the tls_iter_offset union is no longer necessary and can be replaced with an iov_iter pointer and the zc_page argument to tls_push_data() can also be removed. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jakub Kicinski <kuba@kernel.org> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:40:31 -07:00
David Howells	24763c9c09	tls/device: Support MSG_SPLICE_PAGES Make TLS's device sendmsg() support MSG_SPLICE_PAGES. This causes pages to be spliced from the source iterator if possible. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:40:31 -07:00
David Howells	45e5be844a	tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES Convert tls_sw_sendpage() and tls_sw_sendpage_locked() to use sendmsg() with MSG_SPLICE_PAGES rather than directly splicing in the pages itself. [!] Note that tls_sw_sendpage_locked() appears to have the wrong locking upstream. I think the caller will only hold the socket lock, but it should hold tls_ctx->tx_lock too. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:40:31 -07:00
David Howells	fe1e81d4f7	tls/sw: Support MSG_SPLICE_PAGES Make TLS's sendmsg() support MSG_SPLICE_PAGES. This causes pages to be spliced from the source iterator if possible. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells <dhowells@redhat.com> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:40:31 -07:00
David Howells	219d92056b	splice, net: Fix SPLICE_F_MORE signalling in splice_direct_to_actor() splice_direct_to_actor() doesn't manage SPLICE_F_MORE correctly[1] - and, as a result, it incorrectly signals/fails to signal MSG_MORE when splicing to a socket. The problem I'm seeing happens when a short splice occurs because we got a short read due to hitting the EOF on a file: as the length read (read_len) is less than the remaining size to be spliced (len), SPLICE_F_MORE (and thus MSG_MORE) is set. The issue is that, for the moment, we have no way to know why the short read occurred and so can't make a good decision on whether we should keep MSG_MORE set. MSG_SENDPAGE_NOTLAST was added to work around this, but that is also set incorrectly under some circumstances - for example if a short read fills a single pipe_buffer, but the next read would return more (seqfile can do this). This was observed with the multi_chunk_sendfile tests in the tls kselftest program. Some of those tests would hang and time out when the last chunk of file was less than the sendfile request size: build/kselftest/net/tls -r tls.12_aes_gcm.multi_chunk_sendfile This has been observed before[2] and worked around in AF_TLS[3]. Fix this by making splice_direct_to_actor() always signal SPLICE_F_MORE if we haven't yet hit the requested operation size. SPLICE_F_MORE remains signalled if the user passed it in to splice() but otherwise gets cleared when we've read sufficient data to fulfill the request. If, however, we get a premature EOF from ->splice_read(), have sent at least one byte and SPLICE_F_MORE was not set by the caller, ->splice_eof() will be invoked. 
Signed-off-by: David Howells <dhowells@redhat.com> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: Jens Axboe <axboe@kernel.dk> cc: Christoph Hellwig <hch@lst.de> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Matthew Wilcox <willy@infradead.org> cc: Jan Kara <jack@suse.cz> cc: Jeff Layton <jlayton@kernel.org> cc: David Hildenbrand <david@redhat.com> cc: Christian Brauner <brauner@kernel.org> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/499791.1685485603@warthog.procyon.org.uk/ [1] Link: https://lore.kernel.org/r/1591392508-14592-1-git-send-email-pooja.trivedi@stackpath.com/ [2] Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d452d48b9f8b1a7f8152d33ef52cfd7fe1735b0a [3] Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:40:31 -07:00
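The SPLICE_F_MORE rule described in this fix can be sketched in user-space C. This is a minimal model under stated assumptions, not the code in fs/splice.c. The names `struct sketch_splice_desc` and `sketch_update_more()`, and the placeholder value of `SKETCH_SPLICE_F_MORE`, are all hypothetical. The flag state persists across passes, as it would in a splice descriptor. A hint set on an early short read is therefore cleared again on the pass that completes the request, unless the caller asked for it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder value for this sketch; the real SPLICE_F_MORE comes from
 * the kernel UAPI headers. */
#define SKETCH_SPLICE_F_MORE 0x04u

/* Hypothetical stand-in for the splice descriptor's flag state. */
struct sketch_splice_desc {
	unsigned int flags;	/* flags handed to the actor; persists across passes */
	bool user_more;		/* true if the caller itself set SPLICE_F_MORE */
};

/*
 * Update the flags after ->splice_read() returned read_len > 0 bytes,
 * where remaining is how much of the request was outstanding before
 * this read.
 */
void sketch_update_more(struct sketch_splice_desc *sd, size_t read_len,
			size_t remaining)
{
	if (read_len < remaining)
		sd->flags |= SKETCH_SPLICE_F_MORE;	/* request not yet satisfied */
	else if (!sd->user_more)
		sd->flags &= ~SKETCH_SPLICE_F_MORE;	/* final chunk: drop our own hint */
}
```

Take a 16-byte request. A first read of 8 bytes sets SPLICE_F_MORE (8 < 16). A second 8-byte read completes the request and clears the flag again, because the caller did not set it. If that second read had instead hit EOF, the flag would have stayed set, and the flush would be left to ->splice_eof().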
David Howells	951ace9951	kcm: Use splice_eof() to flush Allow splice to undo the effects of MSG_MORE after prematurely ending a splice/sendfile due to getting an EOF condition (->splice_read() returned 0) after splice had called sendmsg() with MSG_MORE set when the user didn't set MSG_MORE. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ Signed-off-by: David Howells <dhowells@redhat.com> cc: Tom Herbert <tom@herbertland.com> cc: Tom Herbert <tom@quantonium.net> cc: Cong Wang <cong.wang@bytedance.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:40:31 -07:00
David Howells	c289a1601a	chelsio/chtls: Use splice_eof() to flush Allow splice to end a Chelsio TLS record after prematurely ending a splice/sendfile due to getting an EOF condition (->splice_read() returned 0) after splice had called sendmsg() with MSG_MORE set when the user didn't set MSG_MORE. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ Signed-off-by: David Howells <dhowells@redhat.com> cc: Ayush Sawal <ayush.sawal@chelsio.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:40:31 -07:00
David Howells	1d7e4538a5	ipv4, ipv6: Use splice_eof() to flush Allow splice to undo the effects of MSG_MORE after prematurely ending a splice/sendfile due to getting an EOF condition (->splice_read() returned 0) after splice had called sendmsg() with MSG_MORE set when the user didn't set MSG_MORE. For UDP, a pending packet will not be emitted if the socket is closed before it is flushed; with this change, it will be flushed by ->splice_eof(). For TCP, it's not clear that MSG_MORE is actually effective. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ Signed-off-by: David Howells <dhowells@redhat.com> cc: Kuniyuki Iwashima <kuniyu@amazon.com> cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> cc: David Ahern <dsahern@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:40:30 -07:00
David Howells	d4c1e80b0d	tls/device: Use splice_eof() to flush Allow splice to end a TLS record after prematurely ending a splice/sendfile due to getting an EOF condition (->splice_read() returned 0) after splice had called TLS with a sendmsg() with MSG_MORE set when the user didn't set MSG_MORE. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:40:30 -07:00
David Howells	df720d288d	tls/sw: Use splice_eof() to flush Allow splice to end a TLS record after prematurely ending a splice/sendfile due to getting an EOF condition (->splice_read() returned 0) after splice had called TLS with a sendmsg() with MSG_MORE set when the user didn't set MSG_MORE. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:40:30 -07:00
David Howells	2bfc668509	splice, net: Add a splice_eof op to file-ops and socket-ops Add an optional method, ->splice_eof(), to allow splice to indicate the premature termination of a splice to struct file_operations and struct proto_ops. This is called if sendfile() or splice() encounters all of the following conditions inside splice_direct_to_actor(): (1) the user did not set SPLICE_F_MORE (splice only), and (2) an EOF condition occurred (->splice_read() returned 0), and (3) we haven't read enough to fulfill the request (ie. len > 0 still), and (4) we have already spliced at least one byte. A further patch will modify the behaviour of SPLICE_F_MORE to always be passed to the actor if either the user set it or we haven't yet read sufficient data to fulfill the request. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Christoph Hellwig <hch@lst.de> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Matthew Wilcox <willy@infradead.org> cc: Jan Kara <jack@suse.cz> cc: Jeff Layton <jlayton@kernel.org> cc: David Hildenbrand <david@redhat.com> cc: Christian Brauner <brauner@kernel.org> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: linux-mm@kvack.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:40:30 -07:00
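The four conditions listed in this commit amount to a single predicate. The sketch below is an illustration only. `should_call_splice_eof()` and `SKETCH_SPLICE_F_MORE` are hypothetical names, the flag value is a placeholder, and the parameters stand in for the state that splice_direct_to_actor() tracks.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_SPLICE_F_MORE 0x04u	/* placeholder flag value */

/*
 * True when a splice_direct_to_actor()-style loop should call
 * ->splice_eof().
 *   user_flags: flags the caller passed in (always 0 for sendfile())
 *   read_ret:   return value of the last ->splice_read(); 0 means EOF
 *   remaining:  bytes of the request not yet read ("len" in the commit)
 *   spliced:    bytes already pushed to the output
 */
bool should_call_splice_eof(unsigned int user_flags, long read_ret,
			    size_t remaining, size_t spliced)
{
	return !(user_flags & SKETCH_SPLICE_F_MORE) &&	/* (1) */
	       read_ret == 0 &&				/* (2) */
	       remaining > 0 &&				/* (3) */
	       spliced > 0;				/* (4) */
}
```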
David Howells	2dc334f1a6	splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage() Replace generic_splice_sendpage() + splice_from_pipe + pipe_to_sendpage() with a net-specific handler, splice_to_socket(), that calls sendmsg() with MSG_SPLICE_PAGES set instead of calling ->sendpage(). MSG_MORE is used to indicate if the sendmsg() is expected to be followed with more data. This allows multiple pipe-buffer pages to be passed in a single call in a BVEC iterator, allowing the processing to be pushed down to a loop in the protocol driver. This helps pave the way for passing multipage folios down too. Protocols that haven't been converted to handle MSG_SPLICE_PAGES yet should just ignore it and do a normal sendmsg() for now - although that may be a bit slower as it may copy everything. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:40:30 -07:00
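The batching idea behind splice_to_socket() can be modelled in user space by replacing pages with buffer lengths. This is a hedged sketch, not the kernel function. `SKETCH_MAX_BVECS`, `struct sketch_call` and `sketch_splice_to_socket()` are hypothetical, as is the batch size of 4. Here MSG_MORE is requested when the caller set SPLICE_F_MORE or when pipe buffers are still queued after the batch. That follows the commit's description of MSG_MORE as "more data will follow".

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_BVECS 4	/* hypothetical batch size */

/* One simulated sendmsg(MSG_SPLICE_PAGES) call. */
struct sketch_call {
	size_t nr_segs;	/* pipe buffers gathered into one bio_vec[]-like batch */
	size_t bytes;	/* total bytes in the batch */
	bool more;	/* whether MSG_MORE would accompany this call */
};

/*
 * Walk pipe buffers of the given lengths, gathering up to
 * SKETCH_MAX_BVECS of them per call. Returns the number of calls
 * written to out[], which must have room for all of them.
 */
size_t sketch_splice_to_socket(const size_t *buf_len, size_t nr_bufs,
			       bool caller_more, struct sketch_call *out)
{
	size_t calls = 0;
	size_t i = 0;

	while (i < nr_bufs) {
		struct sketch_call c = { 0, 0, false };

		while (i < nr_bufs && c.nr_segs < SKETCH_MAX_BVECS) {
			c.bytes += buf_len[i++];
			c.nr_segs++;
		}
		c.more = caller_more || i < nr_bufs;
		out[calls++] = c;
	}
	return calls;
}
```

Six queued buffers therefore go out in two calls rather than six. Only the last call drops MSG_MORE, unless the caller asked to keep it.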
David Howells	81840b3b91	tls: Allow MSG_SPLICE_PAGES but treat it as normal sendmsg Allow MSG_SPLICE_PAGES to be specified to sendmsg() but treat it as normal sendmsg for now. This means the data will just be copied until MSG_SPLICE_PAGES is handled. Signed-off-by: David Howells <dhowells@redhat.com> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:40:30 -07:00
David Howells	4fe38acdac	net: Block MSG_SENDPAGE_* from being passed to sendmsg() by userspace It is necessary to allow MSG_SENDPAGE_* to be passed into ->sendmsg() to allow sendmsg(MSG_SPLICE_PAGES) to replace ->sendpage(). Unblocking them in the network protocol, however, allows these flags to be passed in by userspace too[1]. Fix this by marking MSG_SENDPAGE_NOPOLICY, MSG_SENDPAGE_NOTLAST and MSG_SENDPAGE_DECRYPTED as internal flags, which causes sendmsg() to object if they are passed to sendmsg() by userspace. Network protocol ->sendmsg() implementations can then allow them through. Note that it should be possible to remove MSG_SENDPAGE_NOTLAST once sendpage is removed as a whole slew of pages will be passed in in one go by splice through sendmsg, with MSG_MORE being set if it has more data waiting in the pipe. Signed-off-by: David Howells <dhowells@redhat.com> cc: Chuck Lever <chuck.lever@oracle.com> cc: Boris Pismenny <borisp@nvidia.com> cc: John Fastabend <john.fastabend@gmail.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Link: https://lore.kernel.org/r/20230526181338.03a99016@kernel.org/ [1] Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:40:30 -07:00
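The split between "internal" flags and flags user space may pass can be sketched as a mask check at the syscall boundary. This sketch has limits. The bit values are placeholders, not the real MSG_* constants, and `sketch_user_sendmsg_flags_ok()` is a hypothetical helper. Modelling "sendmsg() objects" as an -EINVAL return is this sketch's assumption about how the objection is surfaced.

```c
#include <errno.h>

/* Placeholder bit values for this sketch only; the real MSG_* constants
 * live in the kernel's socket headers and differ. */
#define SK_MSG_MORE			0x0001u
#define SK_MSG_SENDPAGE_NOPOLICY	0x0100u
#define SK_MSG_SENDPAGE_NOTLAST		0x0200u
#define SK_MSG_SENDPAGE_DECRYPTED	0x0400u

/* Flags only in-kernel callers such as splice may set. */
#define SK_INTERNAL_FLAGS (SK_MSG_SENDPAGE_NOPOLICY | \
			   SK_MSG_SENDPAGE_NOTLAST | \
			   SK_MSG_SENDPAGE_DECRYPTED)

/* Check applied to flags arriving from user space. */
int sketch_user_sendmsg_flags_ok(unsigned int flags)
{
	return (flags & SK_INTERNAL_FLAGS) ? -EINVAL : 0;
}
```

Because the check sits at the user-space entry point, protocol ->sendmsg() implementations can accept these bits from in-kernel callers without re-checking them.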
Eric Dumazet	736013292e	tcp: let tcp_mtu_probe() build headless packets tcp_mtu_probe() is still copying payload from skbs in the write queue, using skb_copy_bits(), ignoring potential errors. Modern TCP stack wants to only deal with payload found in page frags, as this is a prereq for TCPDirect (host stack might not have access to the payload) Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230607214113.1992947-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:31:06 -07:00
Jakub Kicinski	f84ad5cffd	Merge tag 'mlx5-updates-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-06-06 1) Support 4 ports VF LAG, part 2/2 2) Few extra trivial cleanup patches Shay Drory Says: ================ Support 4 ports VF LAG, part 2/2 This series continues the series[1] "Support 4 ports VF LAG, part1/2". This series adds support for 4 ports VF LAG (single FDB E-Switch). This series of patches refactors LAG code that assumes VF LAG supports only two ports, and then enables 4 ports VF LAG. Patch 1: - Fix for ib rep code Patches 2-5: - Refactors LAG layer. Patches 6-7: - Block LAG types which don't support 4 ports. Patch 8: - Enable 4 ports VF LAG. This series specifically allows HCAs with 4 ports to create a VF LAG with only 4 ports. It is not possible to create a VF LAG with 2 or 3 ports using HCAs that have 4 ports. Currently, the Merged E-Switch feature only supports HCAs with 2 ports. However, upcoming patches will introduce support for HCAs with 4 ports. In order to activate VF LAG, a user can execute: devlink dev eswitch set pci/0000:08:00.0 mode switchdev devlink dev eswitch set pci/0000:08:00.1 mode switchdev devlink dev eswitch set pci/0000:08:00.2 mode switchdev devlink dev eswitch set pci/0000:08:00.3 mode switchdev ip link add name bond0 type bond ip link set dev bond0 type bond mode 802.3ad ip link set dev eth2 master bond0 ip link set dev eth3 master bond0 ip link set dev eth4 master bond0 ip link set dev eth5 master bond0 Where eth2, eth3, eth4 and eth5 are net-interfaces of pci/0000:08:00.0 pci/0000:08:00.1 pci/0000:08:00.2 pci/0000:08:00.3 respectively. 
User can verify LAG state and type via debugfs: /sys/kernel/debug/mlx5/0000\:08\:00.0/lag/state /sys/kernel/debug/mlx5/0000\:08\:00.0/lag/type [1] https://lore.kernel.org/netdev/20230601060118.154015-1-saeed@kernel.org/T/#mf1d2083780970ba277bfe721554d4925f03f36d1 ================ * tag 'mlx5-updates-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: simplify condition after napi budget handling change mlx5/core: E-Switch, Allocate ECPF vport if it's an eswitch manager net/mlx5: Skip inline mode check after mlx5_eswitch_enable_locked() failure net/mlx5e: TC, refactor access to hash key net/mlx5e: Remove RX page cache leftovers net/mlx5e: Expose catastrophic steering error counters net/mlx5: Enable 4 ports VF LAG net/mlx5: LAG, block multiport eswitch LAG in case ldev have more than 2 ports net/mlx5: LAG, block multipath LAG in case ldev have more than 2 ports net/mlx5: LAG, change mlx5_shared_fdb_supported() to static net/mlx5: LAG, generalize handling of shared FDB net/mlx5: LAG, check if all eswitches are paired for shared FDB {net/RDMA}/mlx5: introduce lag_for_each_peer RDMA/mlx5: Free second uplink ib port ==================== Link: https://lore.kernel.org/r/20230607210410.88209-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:28:21 -07:00
Justin Chen	55b24334c0	ethtool: ioctl: improve error checking for set_wol The netlink version of set_wol checks for not supported wolopts and avoids setting wol when the correct wolopt is already set. If we do the same with the ioctl version then we can remove these checks from the driver layer. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Justin Chen <justin.chen@broadcom.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/1686179653-29750-1-git-send-email-justin.chen@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:24:54 -07:00
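The two checks this commit borrows from the netlink path can be sketched as follows. `struct sketch_wol` is a cut-down, hypothetical stand-in for struct ethtool_wolinfo. The return convention, where 1 means "call the driver to apply it", is this sketch's own. Comparing the SecureOn password as well as the option bits is also an assumption, made so that a password-only change is not skipped.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for struct ethtool_wolinfo. */
struct sketch_wol {
	uint32_t supported;	/* WoL modes the device can do */
	uint32_t wolopts;	/* WoL modes requested / currently enabled */
	uint8_t sopass[6];	/* SecureOn password */
};

/*
 * Returns -EINVAL if any requested option is unsupported, 0 if the
 * request matches the current configuration (nothing to do), or 1 if
 * the driver should be asked to apply it.
 */
int sketch_check_set_wol(const struct sketch_wol *cur,
			 const struct sketch_wol *req)
{
	if (req->wolopts & ~cur->supported)
		return -EINVAL;
	if (req->wolopts == cur->wolopts &&
	    memcmp(req->sopass, cur->sopass, sizeof(req->sopass)) == 0)
		return 0;
	return 1;
}
```

With both checks done in the core, a driver's set_wol() callback only sees requests that are supported and actually change something.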
Jakub Kicinski	68bd67b43f	Merge branch 'complete-lynx-mdio-device-handling' Russell King says: ==================== complete Lynx mdio device handling This series completes the mdio device lifetime handling for Lynx PCS users which do not create their own mdio device, but instead fetch it using a firmware description - namely the DPAA2 and FMAN_MEMAC drivers. In a previous patch set, lynx_pcs_create() was modified to increase the mdio device refcount, and lynx_pcs_destroy() to drop that refcount. The first two patches change these two drivers to put the reference which they hold immediately after lynx_pcs_create(), effectively handing the responsibility for maintaining the refcount to the Lynx PCS driver. A side effect of the first two patches is that lynx_get_mdio_device() is no longer used, so patch 3 removes it. Patch 4 adds a new helper - lynx_pcs_create_fwnode(), which creates a Lynx PCS instance from the fwnode. Patches 5 and 6 convert the two drivers to make use of this new helper, which simply has to find the mdio device, and then create the Lynx PCS from that. With those conversions done, lynx_pcs_create() is no longer required outside pcs-lynx.c, so patch 7 removes it from public view. Patch 8 changes lynx_pcs_create() to return an error-pointer rather than NULL, which brings consistency to the return style and means that we can remove the NULL-to-error-pointer conversion from both lynx_pcs_create_fwnode() and lynx_pcs_create_mdiodev(). Patch 9 adds a check for the fwnode being available, and returns an -ENODEV error pointer if unavailable. Patch 10 removes this check from DPAA2, detecting the error pointer value to continue printing the helpful message. Patch 11 removes this check from fman_memac, and in doing so fixes a bug where if the node is unavailable, the reference count is not dropped. ==================== Link: https://lore.kernel.org/r/ZIBwuw+IuGQo5yV8@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:19:52 -07:00
Russell King (Oracle)	32fc30353f	net: fman_memac: use pcs-lynx's check for fwnode availability Use pcs-lynx's check rather than our own when determining if the device is available. This fixes a bug where the reference gained by of_parse_phandle() is not dropped if the device is not available. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:19:50 -07:00
Russell King (Oracle)	8c1d0b339d	net: dpaa2: use pcs-lynx's check for fwnode availability Use pcs-lynx's check rather than our own when determining if the device is available. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:19:50 -07:00
Russell King (Oracle)	d143898c6d	net: pcs: lynx: check that the fwnode is available prior to use Check that the fwnode is marked as available prior to trying to lookup the PCS device, and return -ENODEV if unavailable. Document the return codes from lynx_pcs_create_fwnode(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:19:50 -07:00
Russell King (Oracle)	05b606b884	net: pcs: lynx: change lynx_pcs_create() to return error-pointers Change lynx_pcs_create() to return an error-pointer on failure to allocate memory, rather than returning NULL. This allows the removal of the conversion in lynx_pcs_create_fwnode() and lynx_pcs_create_mdiodev(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:19:50 -07:00
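The error-pointer convention adopted here can be reproduced in user space. It shows why callers need only one IS_ERR() test instead of a separate NULL check. The `sketch_*` helpers below are hypothetical re-creations of the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), written for illustration. They rely on the same assumption the kernel makes: the top 4095 addresses never hold a valid object. `sketch_pcs_create()` is a hypothetical creator, not the Lynx PCS API. It is modelled on the -ENODEV behaviour described for an unavailable fwnode.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* The top SKETCH_MAX_ERRNO addresses are treated as encoded -errno values. */
#define SKETCH_MAX_ERRNO 4095

static inline void *sketch_err_ptr(long err) { return (void *)err; }
static inline long sketch_ptr_err(const void *p) { return (long)p; }
static inline bool sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct sketch_pcs { int id; };

/* Never returns NULL: failures come back as encoded error pointers. */
struct sketch_pcs *sketch_pcs_create(bool node_available, int id)
{
	struct sketch_pcs *pcs;

	if (!node_available)
		return sketch_err_ptr(-ENODEV);
	pcs = malloc(sizeof(*pcs));
	if (!pcs)
		return sketch_err_ptr(-ENOMEM);
	pcs->id = id;
	return pcs;
}
```

A caller checks `sketch_is_err(p)` once and, on failure, recovers the reason with `sketch_ptr_err(p)`.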
Russell King (Oracle)	84e476b876	net: pcs: lynx: make lynx_pcs_create() static We no longer need to export lynx_pcs_create() for drivers to use as we now have all the functionality we need in the two new creation helpers. Remove the export and prototype, and make it static. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:19:50 -07:00
Russell King (Oracle)	929a629c21	net: fman_memac: use lynx_pcs_create_fwnode() Use lynx_pcs_create_fwnode() to create a lynx PCS from a fwnode handle. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:19:50 -07:00
Russell King (Oracle)	595fa7634d	net: dpaa2-mac: use lynx_pcs_create_fwnode() Use lynx_pcs_create_fwnode() to create a lynx PCS from a fwnode handle. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:19:50 -07:00
Russell King (Oracle)	6e1a12821d	net: pcs: lynx: add lynx_pcs_create_fwnode() Add a helper to create a lynx PCS from a fwnode handle. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:19:50 -07:00
Russell King (Oracle)	b3b984dc0b	net: pcs: lynx: remove lynx_get_mdio_device() lynx_get_mdio_device() is no longer necessary, let's remove it so the lynx PCS code is always managing the lifetime of the mdiodev. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:19:50 -07:00
Russell King (Oracle)	d7b6ea1a14	net: fman_memac: allow lynx PCS to handle mdiodev lifetime Put the mdiodev after lynx_pcs_create() so that the Lynx PCS driver can manage the lifetime of the mdiodev it is using. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:19:50 -07:00
Russell King (Oracle)	6c79a9c8b1	net: dpaa2-mac: allow lynx PCS to manage mdiodev lifetime Put the mdiodev after lynx_pcs_create() so that the Lynx PCS driver can manage the lifetime of the mdiodev it is using. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:19:50 -07:00
Jiaxun Yang	c8cc2ae229	net: pch_gbe: Allow build on MIPS_GENERIC kernel The MIPS Boston board, which uses a MIPS_GENERIC kernel, has an EG20T PCH and thus needs this driver. The dependency of PCH_GBE, PTP_1588_CLOCK_PCH, is also fixed for MIPS_GENERIC. Note that CONFIG_PCH_GBE has been selected in arch/mips/configs/generic/board-boston.config for a while, but somehow it was never wired up in Kconfig. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230607055953.34110-1-jiaxun.yang@flygoat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 19:18:32 -07:00
Ido Schimmel	37ff78e977	mlxsw: spectrum_nve_vxlan: Fix unsupported flag regression The recently added 'VXLAN_F_LOCALBYPASS' flag is set by default on VXLAN devices and denotes a behavior that is irrelevant for the hardware data path. Add it to the lists of IPv4 and IPv6 supported flags to avoid rejecting offload of VXLAN devices which have this flag set. Fixes: `69474a8a58` ("net: vxlan: Add nolocalbypass option to vxlan.") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/5533e63643bf719bbe286fef60f749c9cad35005.1686139716.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 18:56:02 -07:00
Jakub Kicinski	392c108bce	Merge branch 'tools-ynl-generate-code-for-the-devlink-family' Jakub Kicinski says: ==================== tools: ynl: generate code for the devlink family Another chunk of changes to support more capabilities in the YNL code gen. Devlink brings in deep nesting and directional messages (requests and responses have different IDs). We need a healthy dose of codegen changes to support those (I wasn't planning to support code gen for "directional" families initially, but the importance of devlink and ethtool is undeniable). ==================== Link: https://lore.kernel.org/r/20230607202403.1089925-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 14:01:12 -07:00
Jakub Kicinski	fff8660b54	tools: ynl: add sample for devlink Add a sample to show off how to issue basic devlink requests. For added testing issue get requests while walking a dump. $ ./devlink netdevsim/netdevsim1: driver: netdevsim running fw: fw.mgmt: 10.20.30 ... netdevsim/netdevsim2: driver: netdevsim running fw: fw.mgmt: 10.20.30 ... Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 14:01:10 -07:00
Jakub Kicinski	5d1a30eb98	tools: ynl: generate code for the devlink family Admittedly the devlink.yaml spec is fairly limited; it only covers basic device get and info-get ops. That's sufficient to be useful (monitoring FW versions in the fleet). Plus it gives us a chance to exercise deep nesting and directional messaging in YNL. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 14:01:10 -07:00
Jakub Kicinski	0a94712196	tools: ynl-gen: don't generate forward declarations for policies - regen Regenerate code after dropping forward declarations for policies. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 14:01:10 -07:00
Jakub Kicinski	168dea20ec	tools: ynl-gen: don't generate forward declarations for policies Now that all nested types have structs and are sorted topologically there should be no need to generate forward declarations for policies. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 14:01:10 -07:00
Jakub Kicinski	eae7af21bd	tools: ynl-gen: walk nested types in depth So far we had only created structures for nested types nested directly in messages (second level of attrs so to speak). Walk types in depth to support deeper nesting. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 14:01:10 -07:00
Jakub Kicinski	37487f93b1	tools: ynl-gen: inherit struct use info We only render parse and netlink generation helpers as needed, to avoid generating dead code. Propagate the information from first- and second-layer attribute sets onto all children. Otherwise devlink won't work, it has a lot more levels of nesting. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 14:01:10 -07:00
Jakub Kicinski	6afaa0ef9b	tools: ynl-gen: try to sort the types more intelligently We need to sort the structures to avoid the need for forward declarations. While at it remove the sort of structs when rendering, it doesn't do anything. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 14:01:10 -07:00
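Sorting nested types so that each struct is emitted after the structs it contains is a topological sort. ynl-gen itself is written in Python, and the commit does not say which algorithm it uses. The C sketch below shows one common approach, a depth-first post-order walk. It runs over a hypothetical adjacency matrix of "type i nests type j" edges and assumes the nesting graph is acyclic.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_TYPES 16

/* deps[i][j] is true when type i nests type j. */
struct sketch_graph {
	size_t n;
	bool deps[SKETCH_MAX_TYPES][SKETCH_MAX_TYPES];
};

static void sketch_visit(const struct sketch_graph *g, size_t i, bool *done,
			 size_t *order, size_t *len)
{
	if (done[i])
		return;
	done[i] = true;		/* mark first; assumes no nesting cycles */
	for (size_t j = 0; j < g->n; j++)
		if (g->deps[i][j])
			sketch_visit(g, j, done, order, len);
	order[(*len)++] = i;	/* emit only after everything it nests */
}

/* Fill order[] so every type appears after all the types it nests. */
size_t sketch_topo_order(const struct sketch_graph *g, size_t *order)
{
	bool done[SKETCH_MAX_TYPES] = { false };
	size_t len = 0;

	for (size_t i = 0; i < g->n; i++)
		sketch_visit(g, i, done, order, &len);
	return len;
}
```

Emitting types in this order means every struct definition is already complete when it is used, so no forward declarations are needed.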
Jakub Kicinski	ff6db4b58c	tools: ynl-gen: enable code gen for directional specs I think that user space code gen for directional specs works after recent changes. Let them through. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 14:01:10 -07:00
Jakub Kicinski	6f115d4575	tools: ynl-gen: refactor strmap helper generation Move generating strmap lookup function to a helper. No functional changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 14:01:10 -07:00
Jakub Kicinski	9858bfc271	tools: ynl-gen: use enum names in op strmap more carefully In preparation for supporting families which use different msg ids to and from the kernel - make sure the ids in op strmap are correct. The map is expected to be used mostly for notifications, don't generate a separate map for the "to kernel" direction. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 14:01:10 -07:00
Jakub Kicinski	8947e50373	netlink: specs: devlink: fill in some details important for C Python YNL is much more forgiving than the C code gen in terms of the spec completeness. Fill in a handful of devlink details to make the spec usable in C. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 14:01:10 -07:00
Jakub Kicinski	449f6bc17a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: net/sched/sch_taprio.c `d636fc5dd6` ("net: sched: add rcu annotations around qdisc->qdisc_sleeping") `dced11ef84` ("net/sched: taprio: don't overwrite "sch" variable in taprio_dump_class_stats()") net/ipv4/sysctl_net_ipv4.c `e209fee411` ("net/ipv4: ping_group_range: allow GID from 2147483648 to 4294967294") `ccce324dab` ("tcp: make the first N SYN RTO backoffs linear") https://lore.kernel.org/all/20230605100816.08d41a7b@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-08 11:35:14 -07:00
Linus Torvalds	25041a4c02	Merge tag 'net-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from can, wifi, netfilter, bluetooth and ebpf. Current release - regressions: - bpf: sockmap: avoid potential NULL dereference in sk_psock_verdict_data_ready() - wifi: iwlwifi: fix -Warray-bounds bug in iwl_mvm_wait_d3_notif() - phylink: actually fix ksettings_set() ethtool call - eth: dwmac-qcom-ethqos: fix a regression on EMAC < 3 Current release - new code bugs: - wifi: mt76: fix possible NULL pointer dereference in mt7996_mac_write_txwi() Previous releases - regressions: - netfilter: fix NULL pointer dereference in nf_confirm_cthelper - wifi: rtw88/rtw89: correct PS calculation for SUPPORTS_DYNAMIC_PS - openvswitch: fix upcall counter access before allocation - bluetooth: - fix use-after-free in hci_remove_ltk/hci_remove_irk - fix l2cap_disconnect_req deadlock - nic: bnxt_en: prevent kernel panic when receiving unexpected PHC_UPDATE event Previous releases - always broken: - core: annotate rfs lockless accesses - sched: fq_pie: ensure reasonable TCA_FQ_PIE_QUANTUM values - netfilter: add null check for nla_nest_start_noflag() in nft_dump_basechain_hook() - bpf: fix UAF in task local storage - ipv4: ping_group_range: allow GID from 2147483648 to 4294967294 - ipv6: rpl: fix route of death. 
- tcp: gso: really support BIG TCP - mptcp: fixes for user-space PM address advertisement - smc: avoid to access invalid RMBs' MRs in SMCRv1 ADD LINK CONT - can: avoid possible use-after-free when j1939_can_rx_register fails - batman-adv: fix UaF while rescheduling delayed work - eth: qede: fix scheduling while atomic - eth: ice: make writes to /dev/gnssX synchronous" * tag 'net-6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits) bnxt_en: Implement .set_port / .unset_port UDP tunnel callbacks bnxt_en: Prevent kernel panic when receiving unexpected PHC_UPDATE event bnxt_en: Skip firmware fatal error recovery if chip is not accessible bnxt_en: Query default VLAN before VNIC setup on a VF bnxt_en: Don't issue AP reset during ethtool's reset operation bnxt_en: Fix bnxt_hwrm_update_rss_hash_cfg() net: bcmgenet: Fix EEE implementation eth: ixgbe: fix the wake condition eth: bnxt: fix the wake condition lib: cpu_rmap: Fix potential use-after-free in irq_cpu_rmap_release() bpf: Add extra path pointer check to d_path helper net: sched: fix possible refcount leak in tc_chain_tmplt_add() net: sched: act_police: fix sparse errors in tcf_police_dump() net: openvswitch: fix upcall counter access before allocation net: sched: move rtm_tca_policy declaration to include file ice: make writes to /dev/gnssX synchronous net: sched: add rcu annotations around qdisc->qdisc_sleeping rfs: annotate lockless accesses to RFS sock flow table rfs: annotate lockless accesses to sk->sk_rxhash virtio_net: use control_buf for coalesce params ...	2023-06-08 09:27:19 -07:00
Linus Torvalds	79b6fad546	Merge tag 'xfs-6.4-rc5-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Dave Chinner: "These are a set of regression fixes discovered on recent kernels. I was hoping to send this to you a week and a half ago, but events out of my control delayed finalising the changes until early this week. Whilst the diffstat looks large for this stage of the merge window, a large chunk of it comes from moving the guts of one function from one file to another i.e. it's the same code, it is just run in a different context where it is safe to hold a specific lock. Otherwise the individual changes are relatively small and straightforward. Summary: - Propagate unlinked inode list corruption back up to log recovery (regression fix) - improve corruption detection for AGFL entries, AGFL indexes and XEFI extents (syzkaller fuzzer oops report) - Avoid double perag reference release (regression fix) - Improve extent merging detection in scrub (regression fix) - Fix a new undefined high bit shift (regression fix) - Fix for AGF vs inode cluster buffer deadlock (regression fix)" * tag 'xfs-6.4-rc5-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: collect errors from inodegc for unlinked inode recovery xfs: validate block number being freed before adding to xefi xfs: validity check agbnos on the AGFL xfs: fix agf/agfl verification on v4 filesystems xfs: fix double xfs_perag_rele() in xfs_filestream_pick_ag() xfs: fix broken logic when detecting mergeable bmap records xfs: Fix undefined behavior of shift into sign bit xfs: fix AGF vs inode cluster buffer deadlock xfs: defered work could create precommits xfs: restore allocation trylock iteration xfs: buffer pins need to hold a buffer reference	2023-06-08 08:46:58 -07:00
Paolo Abeni	bfd019d10f	Merge branch 'crypto-splice-net-make-af_alg-handle-sendmsg-msg_splice_pages' David Howells says: ==================== crypto, splice, net: Make AF_ALG handle sendmsg(MSG_SPLICE_PAGES) Here are patches to make AF_ALG handle the MSG_SPLICE_PAGES internal sendmsg flag. MSG_SPLICE_PAGES is an internal hint that tells the protocol that it should splice the pages supplied if it can. The sendpage functions are then turned into wrappers around that. This set consists of the following parts: (1) Move netfs_extract_iter_to_sg() to somewhere more general and rename it to drop the "netfs" prefix. We use this to extract directly from an iterator into a scatterlist. (2) Make AF_ALG use iov_iter_extract_pages(). This has the additional effect of pinning pages obtained from userspace rather than taking refs on them. Pages from kernel-backed iterators would not be pinned, but AF_ALG isn't really meant for use by kernel services. (3) Change AF_ALG still further to use extract_iter_to_sg(). (4) Make af_alg_sendmsg() support MSG_SPLICE_PAGES and make af_alg_sendpage() just a wrapper around sendmsg(). This has to take refs on the pages pinned for the moment. (5) Make hash_sendmsg() support MSG_SPLICE_PAGES by simply ignoring it. hash_sendpage() is left untouched to be removed later, after the splice core has been changed to call sendmsg(). ==================== Link: https://lore.kernel.org/r/20230606130856.1970660-1-dhowells@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-06-08 13:42:54 +02:00
David Howells	c662b043cd	crypto: af_alg/hash: Support MSG_SPLICE_PAGES Make AF_ALG sendmsg() support MSG_SPLICE_PAGES in the hashing code. This causes pages to be spliced from the source iterator if possible. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells <dhowells@redhat.com> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: linux-crypto@vger.kernel.org cc: netdev@vger.kernel.org Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-06-08 13:42:34 +02:00
David Howells	fb800fa4c1	crypto: af_alg: Convert af_alg_sendpage() to use MSG_SPLICE_PAGES Convert af_alg_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather than directly splicing in the pages itself. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells <dhowells@redhat.com> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: linux-crypto@vger.kernel.org cc: netdev@vger.kernel.org Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-06-08 13:42:34 +02:00

1187353 Commits