David Howells says:
====================
splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS
Here are patches to do the following:
(1) Block MSG_SENDPAGE_* flags from leaking into ->sendmsg() from
userspace, whilst allowing splice_to_socket() to pass them in.
(2) Allow MSG_SPLICE_PAGES to be passed into tls_*_sendmsg(). Until
support is added, it will be ignored and a splice-driven sendmsg()
will be treated like a normal sendmsg(). TCP, UDP, AF_UNIX and
Chelsio-TLS already handle the flag in net-next.
(3) Replace a chain of functions to splice-to-sendpage with a single
function to splice via sendmsg() with MSG_SPLICE_PAGES. This allows a
bunch of pages to be spliced from a pipe in a single call using a
bio_vec[] and pushes the main processing loop down into the bowels of
the protocol driver rather than repeatedly calling in with a page at a
time.
(4) Provide a ->splice_eof() op[2] that allows splice to signal to its
output that the input observed a premature EOF and that the caller
didn't flag SPLICE_F_MORE, thereby allowing a corked socket to be
flushed. This attempts to maintain the current behaviour. It is also
not called if we didn't manage to read any data and so didn't called
the actor function.
This needs routing though several layers to get it down to the network
protocol.
[!] Note that I chose not to pass in any flags - I'm not sure it's
particularly useful to pass in the splice flags; I also elected
not to return any error code - though we might actually want to do
that.
(5) Provide tls_{device,sw}_splice_eof() to flush a pending TLS record if
there is one.
(6) Provide splice_eof() for UDP, TCP, Chelsio-TLS and AF_KCM. AF_UNIX
doesn't seem to pay attention to the MSG_MORE or MSG_SENDPAGE_NOTLAST
flags.
(7) Alter the behaviour of sendfile() and fix SPLICE_F_MORE/MSG_MORE
signalling[1] such SPLICE_F_MORE is always signalled until we have
read sufficient data to finish the request. If we get a zero-length
before we've managed to splice sufficient data, we now leave the
socket expecting more data and leave it to userspace to deal with it.
(8) Make AF_TLS handle the MSG_SPLICE_PAGES internal sendmsg flag.
MSG_SPLICE_PAGES is an internal hint that tells the protocol that it
should splice the pages supplied if it can. Its sendpage
implementations are then turned into wrappers around that.
Link: https://lore.kernel.org/r/499791.1685485603@warthog.procyon.org.uk/ [1]
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ [2]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=51c78a4d532efe9543a4df019ff405f05c6157f6 # part 1
Link: https://lore.kernel.org/r/20230524153311.3625329-1-dhowells@redhat.com/ # v1
====================
Link: https://lore.kernel.org/r/20230607181920.2294972-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert tls_device_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather
than directly splicing in the pages itself. With that, the tls_iter_offset
union is no longer necessary and can be replaced with an iov_iter pointer
and the zc_page argument to tls_push_data() can also be removed.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make TLS's device sendmsg() support MSG_SPLICE_PAGES. This causes pages to
be spliced from the source iterator if possible.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert tls_sw_sendpage() and tls_sw_sendpage_locked() to use sendmsg()
with MSG_SPLICE_PAGES rather than directly splicing in the pages itself.
[!] Note that tls_sw_sendpage_locked() appears to have the wrong locking
upstream. I think the caller will only hold the socket lock, but it
should hold tls_ctx->tx_lock too.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make TLS's sendmsg() support MSG_SPLICE_PAGES. This causes pages to be
spliced from the source iterator if possible.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
splice_direct_to_actor() doesn't manage SPLICE_F_MORE correctly[1] - and,
as a result, it incorrectly signals/fails to signal MSG_MORE when splicing
to a socket. The problem I'm seeing happens when a short splice occurs
because we got a short read due to hitting the EOF on a file: as the length
read (read_len) is less than the remaining size to be spliced (len),
SPLICE_F_MORE (and thus MSG_MORE) is set.
The issue is that, for the moment, we have no way to know *why* the short
read occurred and so can't make a good decision on whether we *should* keep
MSG_MORE set.
MSG_SENDPAGE_NOTLAST was added to work around this, but that is also set
incorrectly under some circumstances - for example if a short read fills a
single pipe_buffer, but the next read would return more (seqfile can do
this).
This was observed with the multi_chunk_sendfile tests in the tls kselftest
program. Some of those tests would hang and time out when the last chunk
of file was less than the sendfile request size:
build/kselftest/net/tls -r tls.12_aes_gcm.multi_chunk_sendfile
This has been observed before[2] and worked around in AF_TLS[3].
Fix this by making splice_direct_to_actor() always signal SPLICE_F_MORE if
we haven't yet hit the requested operation size. SPLICE_F_MORE remains
signalled if the user passed it in to splice() but otherwise gets cleared
when we've read sufficient data to fulfill the request.
If, however, we get a premature EOF from ->splice_read(), have sent at
least one byte and SPLICE_F_MORE was not set by the caller, ->splice_eof()
will be invoked.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jan Kara <jack@suse.cz>
cc: Jeff Layton <jlayton@kernel.org>
cc: David Hildenbrand <david@redhat.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/499791.1685485603@warthog.procyon.org.uk/ [1]
Link: https://lore.kernel.org/r/1591392508-14592-1-git-send-email-pooja.trivedi@stackpath.com/ [2]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d452d48b9f8b1a7f8152d33ef52cfd7fe1735b0a [3]
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace generic_splice_sendpage() + splice_from_pipe + pipe_to_sendpage()
with a net-specific handler, splice_to_socket(), that calls sendmsg() with
MSG_SPLICE_PAGES set instead of calling ->sendpage().
MSG_MORE is used to indicate if the sendmsg() is expected to be followed
with more data.
This allows multiple pipe-buffer pages to be passed in a single call in a
BVEC iterator, allowing the processing to be pushed down to a loop in the
protocol driver. This helps pave the way for passing multipage folios down
too.
Protocols that haven't been converted to handle MSG_SPLICE_PAGES yet should
just ignore it and do a normal sendmsg() for now - although that may be a
bit slower as it may copy everything.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It is necessary to allow MSG_SENDPAGE_* to be passed into ->sendmsg() to
allow sendmsg(MSG_SPLICE_PAGES) to replace ->sendpage(). Unblocking them
in the network protocol, however, allows these flags to be passed in by
userspace too[1].
Fix this by marking MSG_SENDPAGE_NOPOLICY, MSG_SENDPAGE_NOTLAST and
MSG_SENDPAGE_DECRYPTED as internal flags, which causes sendmsg() to object
if they are passed to sendmsg() by userspace. Network protocol ->sendmsg()
implementations can then allow them through.
Note that it should be possible to remove MSG_SENDPAGE_NOTLAST once
sendpage is removed as a whole slew of pages will be passed in in one go by
splice through sendmsg, with MSG_MORE being set if it has more data waiting
in the pipe.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Link: https://lore.kernel.org/r/20230526181338.03a99016@kernel.org/ [1]
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
tcp_mtu_probe() is still copying payload from skbs in the write queue,
using skb_copy_bits(), ignoring potential errors.
Modern TCP stack wants to only deal with payload found in page frags,
as this is a prereq for TCPDirect (host stack might not have access
to the payload)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230607214113.1992947-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Saeed Mahameed says:
====================
mlx5-updates-2023-06-06
1) Support 4 ports VF LAG, part 2/2
2) Few extra trivial cleanup patches
Shay Drory Says:
================
Support 4 ports VF LAG, part 2/2
This series continues the series[1] "Support 4 ports VF LAG, part1/2".
This series adds support for 4 ports VF LAG (single FDB E-Switch).
This series of patches refactoring LAG code that make assumptions
about VF LAG supporting only two ports and then enable 4 ports VF LAG.
Patch 1:
- Fix for ib rep code
Patches 2-5:
- Refactors LAG layer.
Patches 6-7:
- Block LAG types which doesn't support 4 ports.
Patch 8:
- Enable 4 ports VF LAG.
This series specifically allows HCAs with 4 ports to create a VF LAG
with only 4 ports. It is not possible to create a VF LAG with 2 or 3
ports using HCAs that have 4 ports.
Currently, the Merged E-Switch feature only supports HCAs with 2 ports.
However, upcoming patches will introduce support for HCAs with 4 ports.
In order to activate VF LAG a user can execute:
devlink dev eswitch set pci/0000:08:00.0 mode switchdev
devlink dev eswitch set pci/0000:08:00.1 mode switchdev
devlink dev eswitch set pci/0000:08:00.2 mode switchdev
devlink dev eswitch set pci/0000:08:00.3 mode switchdev
ip link add name bond0 type bond
ip link set dev bond0 type bond mode 802.3ad
ip link set dev eth2 master bond0
ip link set dev eth3 master bond0
ip link set dev eth4 master bond0
ip link set dev eth5 master bond0
Where eth2, eth3, eth4 and eth5 are net-interfaces of pci/0000:08:00.0
pci/0000:08:00.1 pci/0000:08:00.2 pci/0000:08:00.3 respectively.
User can verify LAG state and type via debugfs:
/sys/kernel/debug/mlx5/0000\:08\:00.0/lag/state
/sys/kernel/debug/mlx5/0000\:08\:00.0/lag/type
[1]
https://lore.kernel.org/netdev/20230601060118.154015-1-saeed@kernel.org/T/#mf1d2083780970ba277bfe721554d4925f03f36d1
================
* tag 'mlx5-updates-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: simplify condition after napi budget handling change
mlx5/core: E-Switch, Allocate ECPF vport if it's an eswitch manager
net/mlx5: Skip inline mode check after mlx5_eswitch_enable_locked() failure
net/mlx5e: TC, refactor access to hash key
net/mlx5e: Remove RX page cache leftovers
net/mlx5e: Expose catastrophic steering error counters
net/mlx5: Enable 4 ports VF LAG
net/mlx5: LAG, block multiport eswitch LAG in case ldev have more than 2 ports
net/mlx5: LAG, block multipath LAG in case ldev have more than 2 ports
net/mlx5: LAG, change mlx5_shared_fdb_supported() to static
net/mlx5: LAG, generalize handling of shared FDB
net/mlx5: LAG, check if all eswitches are paired for shared FDB
{net/RDMA}/mlx5: introduce lag_for_each_peer
RDMA/mlx5: Free second uplink ib port
====================
Link: https://lore.kernel.org/r/20230607210410.88209-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King says:
====================
complete Lynx mdio device handling
This series completes the mdio device lifetime handling for Lynx PCS
users which do not create their own mdio device, but instead fetch
it using a firmware description - namely the DPAA2 and FMAN_MEMAC
drivers.
In a previous patch set, lynx_pcs_create() was modified to increase
the mdio device refcount, and lynx_pcs_destroy() to drop that
refcount.
The first two patches change these two drivers to put the reference
which they hold immediately after lynx_pcs_create(), effectively
handing the responsibility for maintaining the refcount to the Lynx
PCS driver.
A side effect of the first two patches is that lynx_get_mdio_device()
is no longer used, so patch 3 removes it.
Patch 4 adds a new helper - lynx_pcs_create_fwnode(), which creates
a Lynx PCS instance from the fwnode.
Patch 5 and 6 convert the two drivers to make use of this new helper,
which simply has to find the mdio device, and then create the Lynx
PCS from that.
With those conversions done, lynx_pcs_create() is no longer required
outside pcs-lynx.c, so remove it from public view.
Patch 8 we changes lynx_pcs_create() to return an error-pointer rather
than NULL to bring consistency to the return style, and means that we
can remove the NULL-to-error-pointer conversion from both
lynx_pcs_create_fwnode() and lynx_pcs_create_mdiodev().
Patch 9 adds a check for the fwnode being available, and returns an
-ENODEV error pointer if unavailable.
Patch 10 removes this check from DPAA2, detecting the error pointer
value to continue printing the helpful message.
Patch 11 removes this check from fman_memac, and in doing so fixes a
bug where if the node is unavailable, the reference count is not
dropped.
====================
Link: https://lore.kernel.org/r/ZIBwuw+IuGQo5yV8@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use pcs-lynx's check rather than our own when determining if the device
is available. This fixes a bug where the reference gained by
of_parse_phandle() is not dropped if the device is not available.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use pcs-lynx's check rather than our own when determining if the device
is available.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Check that the fwnode is marked as available prior to trying to lookup
the PCS device, and return -ENODEV if unavailable. Document the return
codes from lynx_pcs_create_fwnode().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Change lynx_pcs_create() to return an error-pointer on failure to
allocate memory, rather than returning NULL. This allows the removal
of the conversion in lynx_pcs_create_fwnode() and
lynx_pcs_create_mdiodev().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We no longer need to export lynx_pcs_create() for drivers to use as we
now have all the functionality we need in the two new creation helpers.
Remove the export and prototype, and make it static.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use lynx_pcs_create_fwnode() to create a lynx PCS from a fwnode handle.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use lynx_pcs_create_fwnode() to create a lynx PCS from a fwnode handle.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a helper to create a lynx PCS from a fwnode handle.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
lynx_get_mdio_device() is no longer necessary, let's remove it so the
lynx PCS code is always managing the lifetime of the mdiodev.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Put the mdiodev after lynx_pcs_create() so that the Lynx PCS driver
can manage the lifetime of the mdiodev its using.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Put the mdiodev after lynx_pcs_create() so that the Lynx PCS driver
can manage the lifetime of the mdiodev its using.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MIPS Boston board, which is using MIPS_GENERIC kernel is using
EG20T PCH and thus need this driver.
Dependency of PCH_GBE, PTP_1588_CLOCK_PCH is also fixed for
MIPS_GENERIC.
Note that CONFIG_PCH_GBE is selected in arch/mips/configs/generic/
board-boston.config for a while, some how it's never wired up
in Kconfig.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230607055953.34110-1-jiaxun.yang@flygoat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski says:
====================
tools: ynl: generate code for the devlink family
Another chunk of changes to support more capabilities in the YNL
code gen. Devlink brings in deep nesting and directional messages
(requests and responses have different IDs). We need a healthy
dose of codegen changes to support those (I wasn't planning to
support code gen for "directional" families initially, but
the importance of devlink and ethtool is undeniable).
====================
Link: https://lore.kernel.org/r/20230607202403.1089925-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a sample to show off how to issue basic devlink requests.
For added testing issue get requests while walking a dump.
$ ./devlink
netdevsim/netdevsim1:
driver: netdevsim
running fw:
fw.mgmt: 10.20.30
...
netdevsim/netdevsim2:
driver: netdevsim
running fw:
fw.mgmt: 10.20.30
...
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Admittedly the devlink.yaml spec is fairly limitted,
it only covers basic device get and info-get ops.
That's sufficient to be useful (monitoring FW versions
in the fleet). Plus it gives us a chance to exercise
deep nesting and directional messaging in YNL.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that all nested types have structs and are sorted topologically
there should be no need to generate forward declarations for policies.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
So far we had only created structures for nested types nested
directly in messages (second level of attrs so to speak).
Walk types in depth to support deeper nesting.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We only render parse and netlink generation helpers as needed,
to avoid generating dead code. Propagate the information from
first- and second-layer attribute sets onto all children.
Otherwise devlink won't work, it has a lot more levels of nesting.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We need to sort the structures to avoid the need for forward
declarations. While at it remove the sort of structs when
rendering, it doesn't do anything.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for supporting families which use different msg
ids to and from the kernel - make sure the ids in op strmap
are correct. The map is expected to be used mostly for notifications,
don't generate a separate map for the "to kernel" direction.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Python YNL is much more forgiving than the C code gen in terms
of the spec completeness. Fill in a handful of devlink details
to make the spec usable in C.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-merge networking fixes after downstream PR.
Conflicts:
net/sched/sch_taprio.c
d636fc5dd6 ("net: sched: add rcu annotations around qdisc->qdisc_sleeping")
dced11ef84 ("net/sched: taprio: don't overwrite "sch" variable in taprio_dump_class_stats()")
net/ipv4/sysctl_net_ipv4.c
e209fee411 ("net/ipv4: ping_group_range: allow GID from 2147483648 to 4294967294")
ccce324dab ("tcp: make the first N SYN RTO backoffs linear")
https://lore.kernel.org/all/20230605100816.08d41a7b@canb.auug.org.au/
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull xfs fixes from Dave Chinner:
"These are a set of regression fixes discovered on recent kernels. I
was hoping to send this to you a week and half ago, but events out of
my control delayed finalising the changes until early this week.
Whilst the diffstat looks large for this stage of the merge window, a
large chunk of it comes from moving the guts of one function from one
file to another i.e. it's the same code, it is just run in a different
context where it is safe to hold a specific lock. Otherwise the
individual changes are relatively small and straigtht forward.
Summary:
- Propagate unlinked inode list corruption back up to log recovery
(regression fix)
- improve corruption detection for AGFL entries, AGFL indexes and
XEFI extents (syzkaller fuzzer oops report)
- Avoid double perag reference release (regression fix)
- Improve extent merging detection in scrub (regression fix)
- Fix a new undefined high bit shift (regression fix)
- Fix for AGF vs inode cluster buffer deadlock (regression fix)"
* tag 'xfs-6.4-rc5-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: collect errors from inodegc for unlinked inode recovery
xfs: validate block number being freed before adding to xefi
xfs: validity check agbnos on the AGFL
xfs: fix agf/agfl verification on v4 filesystems
xfs: fix double xfs_perag_rele() in xfs_filestream_pick_ag()
xfs: fix broken logic when detecting mergeable bmap records
xfs: Fix undefined behavior of shift into sign bit
xfs: fix AGF vs inode cluster buffer deadlock
xfs: defered work could create precommits
xfs: restore allocation trylock iteration
xfs: buffer pins need to hold a buffer reference
David Howells says:
====================
crypto, splice, net: Make AF_ALG handle sendmsg(MSG_SPLICE_PAGES)
Here are patches to make AF_ALG handle the MSG_SPLICE_PAGES internal
sendmsg flag. MSG_SPLICE_PAGES is an internal hint that tells the protocol
that it should splice the pages supplied if it can. The sendpage functions
are then turned into wrappers around that.
This set consists of the following parts:
(1) Move netfs_extract_iter_to_sg() to somewhere more general and rename
it to drop the "netfs" prefix. We use this to extract directly from
an iterator into a scatterlist.
(2) Make AF_ALG use iov_iter_extract_pages(). This has the additional
effect of pinning pages obtained from userspace rather than taking
refs on them. Pages from kernel-backed iterators would not be pinned,
but AF_ALG isn't really meant for use by kernel services.
(3) Change AF_ALG still further to use extract_iter_to_sg().
(4) Make af_alg_sendmsg() support MSG_SPLICE_PAGES support and make
af_alg_sendpage() just a wrapper around sendmsg(). This has to take
refs on the pages pinned for the moment.
(5) Make hash_sendmsg() support MSG_SPLICE_PAGES by simply ignoring it.
hash_sendpage() is left untouched to be removed later, after the
splice core has been changed to call sendmsg().
====================
Link: https://lore.kernel.org/r/20230606130856.1970660-1-dhowells@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>