Commit Graph

1429404 Commits

Olga Kornievskaia
b0bf14546b nfsd: fix GET_DIR_DELEGATION when VFS leases are disabled
When leases are disabled on the server, running xfstest generic/309 leads
to an error because GET_DIR_DELEGATION returns EINVAL.

nfsd_get_dir_deleg() can fail in several ways: for example, a memory
allocation may fail, or a lease cannot be acquired because leases are
disabled or one is already held. Currently only the "already held"
condition is translated into the directory-delegation-is-unavailable
error. However, the other failure conditions are likely temporary and
should result in the same kind of error.

Fixes: 8b99f6a8c1 ("nfsd: wire up GET_DIR_DELEGATION handling")
Cc: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-04-26 21:05:49 -04:00
Randy Dunlap
d644a698de NFSD: Docs: clean up pnfs server timeout docs
Make various changes to the documentation formatting to avoid docs
build errors and otherwise improve the produced output format:

- use bullets for lists
- don't use a '.' at the end of echo commands
- fix indentation

Documentation/admin-guide/nfs/pnfs-block-server.rst:55: ERROR: Unexpected indentation. [docutils]
Documentation/admin-guide/nfs/pnfs-scsi-server.rst:37: ERROR: Unexpected indentation. [docutils]

Fixes: 6a97f70b45e7 ("NFSD: Enforce timeout on layout recall and integrate lease manager fencing")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-04-03 09:29:32 -04:00
Joseph Salisbury
124f9af22c nfsd: fix comment typo in nfsxdr
The file contains a spelling error in a source comment (occured).

Typos in comments reduce readability and make text searches less reliable
for developers and maintainers.

Replace 'occured' with 'occurred' in the affected comment. This is a
comment-only cleanup and does not change behavior.

Signed-off-by: Joseph Salisbury <joseph.salisbury@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-04-03 09:29:07 -04:00
Joseph Salisbury
fa6966fd05 nfsd: fix comment typo in nfs3xdr
The file contains a spelling error in a source comment (occured).

Typos in comments reduce readability and make text searches less reliable
for developers and maintainers.

Replace 'occured' with 'occurred' in the affected comment. This is a
comment-only cleanup and does not change behavior.

Signed-off-by: Joseph Salisbury <joseph.salisbury@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-04-03 09:28:47 -04:00
Dai Ngo
42cc139959 NFSD: convert callback RPC program to per-net namespace
The callback channel's rpc_program, rpc_version, rpc_stat,
and per-procedure counts are declared as file-scope statics in
nfs4callback.c, shared across all network namespaces.
Forechannel RPC statistics are already maintained per-netns
(via nfsd_svcstats in struct nfsd_net); the backchannel
has no such separation. When backchannel statistics are
eventually surfaced to userspace, the global counters would
expose cross-namespace data.

Allocate per-netns copies of these structures through a new
opaque struct nfsd_net_cb, managed by nfsd_net_cb_init()
and nfsd_net_cb_shutdown(). The struct definition is private
to nfs4callback.c; struct nfsd_net holds only a pointer.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-04-03 09:27:52 -04:00
Chuck Lever
39bd1bfe92 NFSD: use per-operation statidx for callback procedures
The callback RPC procedure table uses NFSPROC4_CB_##call for
p_statidx, which maps CB_NULL to index 0 and every
compound-based callback (CB_RECALL, CB_LAYOUT, CB_OFFLOAD,
etc.) to index 1. All compound callback operations therefore
share a single statistics counter, making per-operation
accounting impossible.

Assign p_statidx from the NFSPROC4_CLNT_##proc enum instead,
giving each callback operation its own counter slot. The
counts array is already sized by ARRAY_SIZE(nfs4_cb_procedures),
so no allocation change is needed.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-04-03 09:27:15 -04:00
Chuck Lever
18755b8c2f svcrdma: Use contiguous pages for RDMA Read sink buffers
svc_rdma_build_read_segment() constructs RDMA Read sink
buffers by consuming pages one-at-a-time from rq_pages[]
and building one bvec per page. A 64KB NFS READ payload
produces 16 separate bvecs, 16 DMA mappings, and
potentially multiple RDMA Read WRs (on platforms with
4KB pages).

A single higher-order allocation followed by split_page()
yields physically contiguous memory while preserving
per-page refcounts. A single bvec spanning the contiguous
range causes rdma_rw_ctx_init_bvec() to take the
rdma_rw_init_single_wr_bvec() fast path: one DMA mapping,
one SGE, one WR.

The split sub-pages replace the original rq_pages[] entries,
so all downstream page tracking, completion handling, and
xdr_buf assembly remain unchanged.

Allocation uses __GFP_NORETRY | __GFP_NOWARN and falls back
through decreasing orders. If even order-1 fails, the
existing per-page path handles the segment.

When nr_pages is not a power of two, get_order() rounds up
and the allocation yields more pages than needed. The extra
split pages replace existing rq_pages[] entries (freed via
put_page() first), so there is no net increase in per-
request page consumption. Successive segments reuse the
same padding slots, preventing accumulation. The
rq_maxpages guard rejects any allocation that would
overrun the array, falling back to the per-page path.
Under memory pressure, __GFP_NORETRY causes the higher-
order allocation to fail without stalling.

The contiguous path is attempted when the segment starts
page-aligned (rc_pageoff == 0) and spans at least two
pages. NFS WRITE segments carry application-modified byte
ranges of arbitrary length, so the optimization is not
restricted to power-of-two page counts.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-04-03 09:26:31 -04:00
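The order rounding and fallback behavior described in that commit can be modeled in plain C. This is a hypothetical user-space sketch: order_for_pages() mirrors how get_order() rounds a non-power-of-two page count up, and try_alloc()/alloc_contig() are illustrative stand-ins for alloc_pages(__GFP_NORETRY | __GFP_NOWARN) failing under memory pressure, not the kernel implementation.

```c
#include <stdbool.h>

/* Smallest order 'o' such that (1 << o) >= nr_pages; mirrors how
 * get_order() rounds a non-power-of-two page count up. */
static int order_for_pages(unsigned int nr_pages)
{
	int order = 0;

	while ((1u << order) < nr_pages)
		order++;
	return order;
}

/* Hypothetical allocator that fails above 'max_ok_order', standing in
 * for a higher-order allocation failing under memory pressure. */
static bool try_alloc(int order, int max_ok_order)
{
	return order <= max_ok_order;
}

/* Fall back through decreasing orders; return the order that succeeded,
 * or -1 when even order-1 fails and the per-page path must be used. */
static int alloc_contig(unsigned int nr_pages, int max_ok_order)
{
	for (int order = order_for_pages(nr_pages); order >= 1; order--)
		if (try_alloc(order, max_ok_order))
			return order;
	return -1;
}
```

A 13-page segment rounds up to order 4 (16 pages); the 3 extra split pages replace existing rq_pages[] entries, as the commit message notes.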
Chuck Lever
4e2866b2ba SUNRPC: Add svc_rqst_page_release() helper
svc_rqst_replace_page() releases displaced pages through a
per-rqst folio batch, but exposes the add-or-flush sequence
directly. svc_tcp_restore_pages() releases displaced pages
individually with put_page().

Introduce svc_rqst_page_release() to encapsulate the
batched release mechanism. Convert svc_rqst_replace_page()
and svc_tcp_restore_pages() to use it. The latter now
benefits from the same batched release that
svc_rqst_replace_page() already uses.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-04-03 09:26:17 -04:00
Randy Dunlap
3603bf9906 SUNRPC: xdr.h: fix all kernel-doc warnings
Correct a function parameter name (s/page/folio/) and add function
return value sections for multiple functions to eliminate
kernel-doc warnings:

Warning: include/linux/sunrpc/xdr.h:298 function parameter 'folio' not
 described in 'xdr_set_scratch_folio'
Warning: include/linux/sunrpc/xdr.h:337 No description found for return
 value of 'xdr_stream_remaining'
Warning: include/linux/sunrpc/xdr.h:357 No description found for return
 value of 'xdr_align_size'
Warning: include/linux/sunrpc/xdr.h:374 No description found for return
 value of 'xdr_pad_size'
Warning: include/linux/sunrpc/xdr.h:387 No description found for return
 value of 'xdr_stream_encode_item_present'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
2239535fb0 svcrdma: Factor out WR chain linking into helper
svc_rdma_prepare_write_chunk() and svc_rdma_prepare_reply_chunk()
contain identical code for linking RDMA R/W work requests onto a
Send context's WR chain. This duplication increases maintenance
burden and risks divergent bug fixes.

Introduce svc_rdma_cc_link_wrs() to consolidate the WR chain
linking logic. The helper walks the chunk context's rwctxts list,
chains each WR via rdma_rw_ctx_wrs(), and updates the Send
context's chain head and SQE count. Completion signaling is
requested only for the tail WR (posted first).

No functional change.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
d16f060f3e svcrdma: Add Write chunk WRs to the RPC's Send WR chain
Previously, Write chunk RDMA Writes were posted via a separate
ib_post_send() call with their own completion handler. Each Write
chunk incurred a doorbell and generated a completion event.

Link Write chunk WRs onto the RPC Reply's Send WR chain so that a
single ib_post_send() call posts both the RDMA Writes and the Send
WR. A single completion event signals that all operations have
finished. This reduces both doorbell rate and completion rate, as
well as eliminating the latency of a round-trip between the Write
chunk completion and the subsequent Send WR posting.

The lifecycle of Write chunk resources changes: previously, the
svc_rdma_write_done() completion handler released Write chunk
resources when RDMA Writes completed. With WR chaining, resources
remain live until the Send completion. A new sc_write_info_list
tracks Write chunk metadata attached to each Send context, and
svc_rdma_write_chunk_release() frees these resources when the
Send context is released.

The svc_rdma_write_done() handler now handles only error cases.
On success it returns immediately since the Send completion handles
resource release. On failure (WR flush), it closes the connection
to signal to the client that the RPC Reply is incomplete.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
c553983efa svcrdma: Clean up use of rdma->sc_pd->device
I can't think of a reason why svcrdma is using the PD's device. Most
other consumers of the IB DMA API use the ib_device pointer from the
connection's rdma_cm_id.

I don't think there's any functional difference between the two, but
it is a little confusing to see some uses of rdma_cm_id and some of
ib_pd.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
a5f2087f37 svcrdma: Clean up use of rdma->sc_pd->device in Receive paths
I can't think of a reason why svcrdma is using the PD's device. Most
other consumers of the IB DMA API use the ib_device pointer from the
connection's rdma_cm_id.

I don't believe there's any functional difference between the two,
but it is a little confusing to see some uses of rdma_cm_id->device
and some of ib_pd->device.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
ccc89b9d1e svcrdma: Add fair queuing for Send Queue access
When the Send Queue fills, multiple threads may wait for SQ slots.
The previous implementation had no ordering guarantee, allowing
starvation when one thread repeatedly acquires slots while others
wait indefinitely.

Introduce a ticket-based fair queuing system. Each waiter takes a
ticket number and is served in FIFO order. This ensures forward
progress for all waiters when SQ capacity is constrained.

The implementation has two phases:
1. Fast path: attempt to reserve SQ slots without waiting
2. Slow path: take a ticket, wait for turn, then wait for slots

The ticket system adds two atomic counters to the transport:
- sc_sq_ticket_head: next ticket to issue
- sc_sq_ticket_tail: ticket currently being served

A dedicated wait queue (sc_sq_ticket_wait) handles ticket
ordering, separate from sc_send_wait which handles SQ capacity.
This separation ensures that send completions (the high-frequency
wake source) wake only the current ticket holder rather than all
queued waiters. Ticket handoff wakes only the ticket wait queue,
and each ticket holder that exits via connection close propagates
the wake to the next waiter in line.

When a waiter successfully reserves slots, it advances the tail
counter and wakes the next waiter. This creates an orderly handoff
that prevents starvation while maintaining good throughput on the
fast path when contention is low.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
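The two-phase ticket scheme above can be sketched in user-space C with C11 atomics. This is a simplified, hypothetical model (struct transport, sq_try_reserve, and sq_reserve_slow are illustrative names): busy-wait loops stand in for sleeping on sc_sq_ticket_wait and sc_send_wait, and the wake-up machinery is omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the transport's SQ accounting. */
struct transport {
	atomic_int sq_avail;      /* free Send Queue slots */
	atomic_uint ticket_head;  /* next ticket to issue  */
	atomic_uint ticket_tail;  /* ticket being served   */
};

/* Fast path: try to reserve 'n' slots without taking a ticket. */
static bool sq_try_reserve(struct transport *t, int n)
{
	int avail = atomic_load(&t->sq_avail);

	while (avail >= n)
		if (atomic_compare_exchange_weak(&t->sq_avail, &avail,
						 avail - n))
			return true;
	return false;
}

/* Slow path: take a ticket, wait for our turn, then wait for slots. */
static void sq_reserve_slow(struct transport *t, int n)
{
	unsigned int me = atomic_fetch_add(&t->ticket_head, 1);

	while (atomic_load(&t->ticket_tail) != me)
		;	/* kernel code would sleep on the ticket waitqueue */

	while (!sq_try_reserve(t, n))
		;	/* kernel code would sleep on the SQ-capacity queue */

	/* Orderly handoff: advance the tail to serve the next waiter. */
	atomic_fetch_add(&t->ticket_tail, 1);
}
```

Because only the thread whose ticket equals ticket_tail can pass the first loop, waiters are served strictly in FIFO order, which is the starvation-freedom property the commit describes.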
Chuck Lever
d7f3efd9ff SUNRPC: Optimize rq_respages allocation in svc_alloc_arg
svc_alloc_arg() invokes alloc_pages_bulk() with the full rq_maxpages
count (~259 for 1MB messages) for the rq_respages array, causing a
full-array scan despite most slots holding valid pages.

svc_rqst_release_pages() NULLs only the range

  [rq_respages, rq_next_page)

after each RPC, so only that range contains NULL entries. Limit the
rq_respages fill in svc_alloc_arg() to that range instead of
scanning the full array.

svc_init_buffer() initializes rq_next_page to span the entire
rq_respages array, so the first svc_alloc_arg() call fills all
slots.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
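The range-limited refill can be illustrated with a small C sketch. This is a hypothetical stand-in (refill_respages and fresh_page are invented for illustration, and indices model the rq_respages/rq_next_page pointers), not the svc_alloc_arg() code itself.

```c
#include <stddef.h>

/* Stand-in for a freshly allocated page. */
static int fresh_page = 42;

/* Refill only the slots released by the previous RPC: the range
 * [respages, next_page) is the only part of the array that can hold
 * NULL entries, so scanning the full array is unnecessary. */
static int refill_respages(int **pages, size_t respages, size_t next_page)
{
	int filled = 0;

	for (size_t i = respages; i < next_page; i++) {
		if (!pages[i]) {
			pages[i] = &fresh_page;  /* stand-in for page alloc */
			filled++;
		}
	}
	return filled;
}
```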
Chuck Lever
7ed7504287 SUNRPC: Track consumed rq_pages entries
The rq_pages array holds pages allocated for incoming RPC requests.
Two transport receive paths NULL entries in rq_pages to prevent
svc_rqst_release_pages() from freeing pages that the transport has
taken ownership of:

- svc_tcp_save_pages() moves partial request data pages to
  svsk->sk_pages during multi-fragment TCP reassembly.

- svc_rdma_clear_rqst_pages() moves request data pages to
  head->rc_pages because they are targets of active RDMA Read WRs.

A new rq_pages_nfree field in struct svc_rqst records how many
entries were NULLed. svc_alloc_arg() uses it to refill only those
entries rather than scanning the full rq_pages array. In steady
state, the transport NULLs a handful of entries per RPC, so the
allocator visits only those entries instead of the full ~259 slots
(for 1MB messages).

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
26c8e6eb75 svcrdma: preserve rq_next_page in svc_rdma_save_io_pages
svc_rdma_save_io_pages() transfers response pages to the send
context and sets those slots to NULL. It then resets rq_next_page to
equal rq_respages, hiding the NULL region from
svc_rqst_release_pages().

Now that svc_rqst_release_pages() handles NULL entries, this reset
is no longer necessary. Removing it preserves the invariant that the
range [rq_respages, rq_next_page) accurately describes how many
response pages were consumed, enabling a subsequent optimization in
svc_alloc_arg() that refills only the consumed range.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
22cc2ba5c2 SUNRPC: Handle NULL entries in svc_rqst_release_pages
svc_rqst_release_pages() releases response pages between rq_respages
and rq_next_page. It currently passes the entire range to
release_pages(), which does not expect NULL entries.

A subsequent patch preserves the rq_next_page pointer in
svc_rdma_save_io_pages() so that it accurately records how many
response pages were consumed. After that change, the range

  [rq_respages, rq_next_page)

can contain NULL entries where pages have already been transferred
to a send context.

Iterate through the range entry by entry, skipping NULLs, to handle
this case correctly.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
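The entry-by-entry release loop is simple enough to sketch directly. This is a hypothetical user-space model (release_one and the counter are stand-ins for put_page()-style release), showing only the NULL-skipping behavior the commit adds.

```c
#include <stddef.h>

/* Stand-in release count so the behavior is observable in a test. */
static int released;

static void release_one(int *page)
{
	(void)page;
	released++;	/* stand-in for releasing the page reference */
}

/* Walk [respages, next_page) entry by entry, skipping NULL slots left
 * behind where pages were already transferred to a send context. */
static void release_range(int **start, int **end)
{
	for (int **p = start; p < end; p++) {
		if (*p) {
			release_one(*p);
			*p = NULL;
		}
	}
}
```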
Chuck Lever
ee66b9e3e1 SUNRPC: Allocate a separate Reply page array
struct svc_rqst uses a single dynamically-allocated page array
(rq_pages) for both the incoming RPC Call message and the outgoing
RPC Reply message. rq_respages is a sliding pointer into rq_pages
that each transport receive path must compute based on how many
pages the Call consumed. This boundary tracking is a source of
confusion and bugs, and prevents an RPC transaction from having
both a large Call and a large Reply simultaneously.

Allocate rq_respages as its own page array, eliminating the boundary
arithmetic. This decouples Call and Reply buffer lifetimes,
following the precedent set by rq_bvec (a separate dynamically-
allocated array for I/O vectors).

Each svc_rqst now pins twice as many pages as before. For a server
running 16 threads with a 1MB maximum payload, the additional cost
is roughly 16MB of pinned memory. The new dynamic svc thread count
facility keeps this overhead minimal on an idle server. A subsequent
patch in this series limits per-request repopulation to only the
pages released during the previous RPC, avoiding a full-array scan
on each call to svc_alloc_arg().

Note: We've considered several alternatives to maintaining a full
second array. Each alternative reintroduces either boundary logic
complexity or I/O-path allocation pressure.

rq_next_page is initialized in svc_alloc_arg() and svc_process()
during Reply construction, and in svc_rdma_recvfrom() as a
precaution on error paths. Transport receive paths no longer compute
it from the Call size.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
46ca8dd244 SUNRPC: Tighten bounds checking in svc_rqst_replace_page
svc_rqst_replace_page() builds the Reply buffer by advancing
rq_next_page through the response page range. The bounds
check validates rq_next_page against the full rq_pages array,
but the valid range for rq_next_page is

  [rq_respages, rq_page_end].

Use those bounds instead.

This is correct today because rq_respages and rq_page_end
both point into rq_pages, and it prepares for a subsequent
change that separates the Reply page array from rq_pages.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Benjamin Coddington
2a83ffc557 NFSD: Sign filehandles
NFS clients may bypass restrictive directory permissions by using
open_by_handle() (or other available OS system call) to guess the
filehandles for files below that directory.

In order to harden knfsd servers against this attack, create a method to
sign and verify filehandles using SipHash-2-4 as a MAC (Message
Authentication Code).  According to
https://cr.yp.to/siphash/siphash-20120918.pdf, SipHash can be used as a
MAC, and our use of SipHash-2-4 provides a low 1 in 2^64 chance of forgery.

Filehandles that have been signed cannot be tampered with, nor can
clients reasonably guess correct filehandles and hashes that may exist in
parts of the filesystem they cannot access due to directory permissions.

Append the 8 byte SipHash to encoded filehandles for exports that have set
the "sign_fh" export option.  Filehandles received from clients are
verified by comparing the appended hash to the expected hash.  If the MAC
does not match the server responds with NFS error _STALE.  If unsigned
filehandles are received for an export with "sign_fh" they are rejected
with NFS error _STALE.

Signed-off-by: Benjamin Coddington <bcodding@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
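The sign-then-verify shape of this scheme can be sketched in a few lines of C. This is a hypothetical illustration only: toy_mac() below is a trivial keyed mix standing in for SipHash-2-4 (the real patch's MAC, and the part that makes forgery hard), and fh_sign()/fh_verify_mac() are invented names, not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FH_MAC_LEN 8

/* Toy keyed 64-bit mix; the kernel patch uses SipHash-2-4 here,
 * which is what makes the MAC cryptographically sound. */
static uint64_t toy_mac(const uint8_t *data, size_t len,
			const uint64_t key[2])
{
	uint64_t h = key[0] ^ 0x9e3779b97f4a7c15ull;

	for (size_t i = 0; i < len; i++) {
		h ^= data[i];
		h *= 0x100000001b3ull;
	}
	return h ^ key[1];
}

/* Append the 8-byte MAC to an encoded filehandle. Returns the new
 * length; 'buf' must have room for FH_MAC_LEN extra bytes. */
static size_t fh_sign(uint8_t *buf, size_t len, const uint64_t key[2])
{
	uint64_t mac = toy_mac(buf, len, key);

	memcpy(buf + len, &mac, FH_MAC_LEN);
	return len + FH_MAC_LEN;
}

/* Verify a signed filehandle; a mismatch or a too-short (unsigned)
 * handle is the case the server answers with a stale-handle error. */
static bool fh_verify_mac(const uint8_t *buf, size_t len,
			  const uint64_t key[2])
{
	uint64_t want, got;

	if (len < FH_MAC_LEN)
		return false;
	want = toy_mac(buf, len - FH_MAC_LEN, key);
	memcpy(&got, buf + len - FH_MAC_LEN, FH_MAC_LEN);
	return want == got;	/* real code should compare in constant time */
}
```

The verifier recomputes the MAC over the filehandle body and compares it with the appended bytes, so neither the handle nor the hash can be altered independently without detection.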
Benjamin Coddington
a002ad8a9b NFSD/export: Add sign_fh export option
In order to signal that filehandles on this export should be signed, add a
"sign_fh" export option.  Filehandle signing can help the server defend
against certain filehandle guessing attacks.

Setting the "sign_fh" export option sets NFSEXP_SIGN_FH.  In a future patch
NFSD uses this signal to append a MAC onto filehandles for that export.

While we're in here, tidy a few stray expflags to more closely align to the
export flag order.

Link: https://lore.kernel.org/linux-nfs/cover.1772022373.git.bcodding@hammerspace.com
Signed-off-by: Benjamin Coddington <bcodding@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Benjamin Coddington
62346217fd NFSD: Add a key for signing filehandles
A future patch will enable NFSD to sign filehandles by appending a Message
Authentication Code (MAC).  To do this, NFSD requires a secret 128-bit key
that can persist across reboots.  A persisted key allows the server to
accept filehandles after a restart.  Enable NFSD to be configured with this
key via the netlink interface.

Link: https://lore.kernel.org/linux-nfs/cover.1772022373.git.bcodding@hammerspace.com
Signed-off-by: Benjamin Coddington <bcodding@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
116b6b7acd nfsd: use dynamic allocation for oversized NFSv4.0 replay cache
Commit 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK
replay cache") capped the replay cache copy at NFSD4_REPLAY_ISIZE
to prevent a heap overflow, but set rp_buflen to zero when the
encoded response exceeded the inline buffer. A retransmitted LOCK
reaching the replay path then produced only a status code with no
operation body, resulting in a malformed XDR response.

When the encoded response exceeds the 112-byte inline rp_ibuf, a
buffer is kmalloc'd to hold it. If the allocation fails, rp_buflen
remains zero, preserving the behavior from the capped-copy fix.
The buffer is freed when the stateowner is released or when a
subsequent operation's response fits in the inline buffer.

Fixes: 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK replay cache")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Jeff Layton
8be12e0cf2 nfsd: convert global state_lock to per-net deleg_lock
Replace the global state_lock spinlock with a per-nfsd_net deleg_lock.
The state_lock was only used to protect delegation lifecycle operations
(the del_recall_lru list and delegation hash/unhash), all of which are
scoped to a single network namespace. Making the lock per-net removes
a source of unnecessary contention between containers.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Jeff Layton
facc4e3c80 sunrpc: split cache_detail queue into request and reader lists
Replace the single interleaved queue (which mixed cache_request and
cache_reader entries distinguished by a ->reader flag) with two
dedicated lists: cd->requests for upcall requests and cd->readers
for open file handles.

Readers now track their position via a monotonically increasing
sequence number (next_seqno) rather than by their position in the
shared list. Each cache_request is assigned a seqno when enqueued,
and a new cache_next_request() helper finds the next request at or
after a given seqno.

This eliminates the cache_queue wrapper struct entirely, simplifies
the reader-skipping loops in cache_read/cache_poll/cache_ioctl/
cache_release, and makes the data flow easier to reason about.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
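The seqno-based lookup at the heart of this change can be sketched with a minimal singly-linked model. This is a hypothetical stand-in for the cache_next_request() helper (the kernel uses its own list primitives and a richer cache_request struct): a reader holding position next_seqno gets the first request at or after that sequence number.

```c
#include <stddef.h>

/* Minimal model of a queued upcall request. */
struct creq {
	unsigned long seqno;
	struct creq *next;
};

/* Find the first queued request whose sequence number is at or past
 * the reader's position. Requests are enqueued with monotonically
 * increasing seqnos, so the list is already in seqno order. */
static struct creq *next_request(struct creq *head, unsigned long seqno)
{
	for (struct creq *rq = head; rq; rq = rq->next)
		if (rq->seqno >= seqno)
			return rq;
	return NULL;
}
```

Because position is a number rather than a list node, readers no longer need placeholder entries interleaved with the requests, which is what lets the cache_queue wrapper struct disappear.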
Jeff Layton
552d0e17ea sunrpc: convert queue_wait from global to per-cache-detail waitqueue
The queue_wait waitqueue is currently a file-scoped global, so a
wake_up for one cache_detail wakes pollers on all caches. Convert it
to a per-cache-detail field so that only pollers on the relevant cache
are woken.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Jeff Layton
17c1d66579 sunrpc: convert queue_lock from global spinlock to per-cache-detail lock
The global queue_lock serializes all upcall queue operations across
every cache_detail instance. Convert it to a per-cache-detail spinlock
so that different caches (e.g. auth.unix.ip vs nfsd.fh) no longer
contend with each other on queue operations.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
6b4f16a532 sunrpc: Add XPT flags missing from SVC_XPRT_FLAG_LIST
Commit eccbbc7c00 ("nfsd: don't use sv_nrthreads in connection
limiting calculations.") and commit 898374fdd7 ("nfsd: unregister
with rpcbind when deleting a transport") added new XPT flags but
neglected to update the show_svc_xprt_flags() macro.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
4f406a2c1e lockd: Remove dead code from fs/lockd/xdr4.c
Now that all NLMv4 server-side procedures use XDR encoder and
decoder functions generated by xdrgen, the hand-written code in
fs/lockd/xdr4.c is no longer needed. This file contained the
original XDR processing logic that has been systematically
replaced throughout this series.

Remove the file and its Makefile reference to eliminate the
dead code. The helper function nlm4svc_set_file_lock_range()
is still needed by the generated code, so move it to xdr4.h
as an inline function where it remains accessible.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
b131a424b0 lockd: Remove C macros that are no longer used
The conversion of all NLMv4 procedures to xdrgen-generated
XDR functions is complete. The hand-rolled XDR size
calculation macros (Ck, No, St, Rg) and the nlm_void
structure definition served only the older implementations
and are now unused.

Also removes NLMDBG_FACILITY, which was set to the client
debug flag in server-side code but never referenced, and
corrects a comment to specify "NLMv4 Server procedures".

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
515788fa98 lockd: Add LOCKD_SHARE_SVID constant for DOS sharing mode
Replace the magic value ~(u32)0 with a named constant. This value
is used as a synthetic svid when looking up lockowners for DOS
share operations, which have no real process ID associated with
them.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
b201ce7af2 lockd: Use xdrgen XDR functions for the NLMv4 FREE_ALL procedure
With all other NLMv4 procedures now converted to xdrgen-generated
XDR functions, the FREE_ALL procedure can be converted as well.
This conversion allows the removal of nlm4svc_retrieve_args(),
a 79-line helper function that was used only by FREE_ALL to
retrieve client information from lockd's internal data
structures.

Replace the NLMPROC4_FREE_ALL entry in the nlm_procedures4
array with an entry that uses xdrgen-built XDR decoders and
encoders. The procedure handler is updated to use the new
wrapper structure (nlm4_notify_wrapper) and call
nlm4svc_lookup_host() directly, eliminating the need for the
now-removed helper function.

The .pc_argzero field is set to zero because xdrgen decoders
fully initialize all fields in argp->xdrgen, making the early
defensive memset unnecessary. The remaining argp fields that
fall outside the xdrgen structures are cleared explicitly as
needed.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
eedae44301 lockd: Use xdrgen XDR functions for the NLMv4 NM_LOCK procedure
Now that nlm4svc_do_lock() has been introduced to handle both
monitored and non-monitored lock requests, the NLMv4 NM_LOCK
procedure can be converted to use xdrgen-generated XDR
functions. This conversion allows the removal of
__nlm4svc_proc_lock(), a helper function that was previously
shared between the LOCK and NM_LOCK procedures.

Replace the NLMPROC4_NM_LOCK entry in the nlm_procedures4
array with an entry that uses xdrgen-built XDR decoders and
encoders. The procedure handler is updated to call
nlm4svc_do_lock() directly and access arguments through the
argp->xdrgen hierarchy.

The .pc_argzero field is set to zero because xdrgen decoders
fully initialize all fields in argp->xdrgen, making the early
defensive memset unnecessary. The remaining argp fields that
fall outside the xdrgen structures are cleared explicitly as
needed.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
985d022db3 lockd: Use xdrgen XDR functions for the NLMv4 UNSHARE procedure
Now that the share helpers have been decoupled from the
NLMv3-specific struct nlm_args and file_lock initialization
has been hoisted into the procedure handler, the NLMv4 UNSHARE
procedure can be converted to use xdrgen-generated XDR
functions.

Replace the NLMPROC4_UNSHARE entry in the nlm_procedures4
array with an entry that uses xdrgen-built XDR decoders and
encoders. The procedure handler is updated to use the new
wrapper structures (nlm4_shareargs_wrapper and
nlm4_shareres_wrapper) and access arguments through the
argp->xdrgen hierarchy.

The .pc_argzero field is set to zero because xdrgen decoders
fully initialize all fields in argp->xdrgen, making the early
defensive memset unnecessary. The remaining argp fields that
fall outside the xdrgen structures are cleared explicitly as
needed.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
57c7bb3bd2 lockd: Use xdrgen XDR functions for the NLMv4 SHARE procedure
Now that the share helpers have been decoupled from the
NLMv3-specific struct nlm_args and file_lock initialization
has been hoisted into the procedure handler, the NLMv4 SHARE
procedure can be converted to use xdrgen-generated XDR
functions.

Replace the NLMPROC4_SHARE entry in the nlm_procedures4 array
with an entry that uses xdrgen-built XDR decoders and encoders.
The procedure handler is updated to use the new wrapper
structures (nlm4_shareargs_wrapper and nlm4_shareres_wrapper)
and access arguments through the argp->xdrgen hierarchy.

The .pc_argzero field is set to zero because xdrgen decoders
fully initialize all fields in argp->xdrgen, making the early
defensive memset unnecessary. The remaining argp fields that
fall outside the xdrgen structures are cleared explicitly as
needed.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
4e6814b175 lockd: Prepare share helpers for xdrgen conversion
In order to convert the NLMv4 server-side XDR functions to use
xdrgen, the internal share helpers need to be decoupled from the
NLMv3-specific struct nlm_args. NLMv4 procedures will use
different argument structures once they are converted.

Refactor nlmsvc_share_file() and nlmsvc_unshare_file() to accept
individual arguments (oh, access, mode) instead of the common
struct nlm_args. This allows both protocol versions to call these
helpers without forcing a common argument structure.

While here, add kdoc comments to both functions and fix a comment
typo in the unshare path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
bb2a70b610 lockd: Hoist file_lock init out of nlm4svc_decode_shareargs()
The xdrgen-generated XDR decoders cannot initialize the
file_lock structure because it is an internal kernel type,
not part of the wire protocol. To prepare for converting
SHARE and UNSHARE procedures to use xdrgen, the file_lock
initialization must be moved from nlm4svc_decode_shareargs()
into the procedure handlers themselves.

This change removes one more dependency on the "struct
nlm_lock::fl" field in fs/lockd/xdr4.c, allowing the XDR
decoder to focus solely on unmarshalling wire data.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
5eae0e00dc lockd: Convert server-side undefined procedures to xdrgen
The NLMv4 protocol defines several procedure slots that are
not implemented. These undefined procedures need proper
handling to return rpc_proc_unavail to clients that
mistakenly invoke them.

This patch converts the three undefined procedure entries
(slots 17, 18, and 19) to use xdrgen functions
nlm4_svc_decode_void and nlm4_svc_encode_void. The
nlm4svc_proc_unused function is also moved earlier in the
file to follow the convention of placing procedure
implementations before the procedure table.

The pc_argsize, pc_ressize, and pc_argzero fields are now
correctly set to zero since no arguments or results are
processed. The pc_xdrressize field is updated to XDR_void
to accurately reflect the response size.

This conversion completes the migration of all NLMv4
server-side procedures to use xdrgen-generated XDR
functions, improving type safety and eliminating
hand-written XDR code.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
16099e1002 lockd: Use xdrgen XDR functions for the NLMv4 SM_NOTIFY procedure
Convert the SM_NOTIFY procedure to use xdrgen functions
nlm4_svc_decode_nlm4_notifyargs and nlm4_svc_encode_void.
SM_NOTIFY is a private callback from statd to notify lockd
when a remote host has rebooted.

This patch introduces struct nlm4_notifyargs_wrapper to
bridge between the xdrgen-generated nlm4_notifyargs and
the nlm_reboot structure expected by nlm_host_rebooted().
The wrapper contains both the xdrgen-decoded arguments
and a reboot field for the existing API.

The pc_argzero field is set to zero because xdrgen decoders
reliably initialize all arguments, making the early
defensive memset unnecessary.

This change also corrects the pc_xdrressize field, which
previously contained a placeholder value.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
5076fff93c lockd: Use xdrgen XDR functions for the NLMv4 GRANTED_RES procedure
Convert the GRANTED_RES procedure to use xdrgen functions
nlm4_svc_decode_nlm4_res and nlm4_svc_encode_void.
GRANTED_RES is a callback procedure where the client sends
granted lock results back to the server after an async
GRANTED request.

The pc_argzero field is set to zero because xdrgen decoders
reliably initialize all arguments, making the early
defensive memset unnecessary.

This change also corrects the pc_xdrressize field, which
previously contained a placeholder value.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
d4fc8bc100 lockd: Use xdrgen XDR functions for the NLMv4 UNLOCK_RES procedure
Update the NLMPROC4_UNLOCK_RES entry in nlm_procedures4 to invoke
xdrgen-generated XDR functions.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
f0eec0eb50 lockd: Use xdrgen XDR functions for the NLMv4 CANCEL_RES procedure
Convert the CANCEL_RES procedure to use xdrgen functions
nlm4_svc_decode_nlm4_res and nlm4_svc_encode_void.
CANCEL_RES is a callback procedure where the client sends
cancel results back to the server after an async CANCEL
request.

The pc_argzero field is set to zero because xdrgen decoders
reliably initialize all arguments, making the early
defensive memset unnecessary.

This change also corrects the pc_xdrressize field, which
previously contained a placeholder value.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
50976ab979 lockd: Use xdrgen XDR functions for the NLMv4 LOCK_RES procedure
Convert the LOCK_RES procedure to use xdrgen functions
nlm4_svc_decode_nlm4_res and nlm4_svc_encode_void.
LOCK_RES is a callback procedure where the client sends
lock results back to the server after an async LOCK
request.

The pc_argzero field is set to zero because xdrgen decoders
reliably initialize all arguments, making the early
defensive memset unnecessary.

This change also corrects the pc_xdrressize field, which
previously contained a placeholder value.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
4764124811 lockd: Use xdrgen XDR functions for the NLMv4 TEST_RES procedure
Convert the TEST_RES procedure to use xdrgen functions
nlm4_svc_decode_nlm4_testres and nlm4_svc_encode_void.
TEST_RES is a callback procedure where the client sends
test lock results back to the server after an async TEST
request.

The pc_argzero field is set to zero because xdrgen decoders
reliably initialize all arguments, making the early
defensive memset unnecessary.

This change also corrects the pc_xdrressize field, which
previously contained a placeholder value.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
62721885e8 lockd: Use xdrgen XDR functions for the NLMv4 GRANTED_MSG procedure
Convert the GRANTED_MSG procedure to use xdrgen functions
nlm4_svc_decode_nlm4_testargs and nlm4_svc_encode_void.
The procedure handler uses the nlm4_testargs_wrapper
structure that bridges between xdrgen types and the legacy
nlm_lock representation.

The pc_argzero field is set to zero because xdrgen decoders
reliably initialize all arguments, making the early
defensive memset unnecessary.

The NLM async callback mechanism uses client-side functions
that continue to take legacy struct nlm_res, preventing
GRANTED and GRANTED_MSG from sharing code for now.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
eff7d82f89 lockd: Use xdrgen XDR functions for the NLMv4 UNLOCK_MSG procedure
Convert the UNLOCK_MSG procedure to use xdrgen functions
nlm4_svc_decode_nlm4_unlockargs and nlm4_svc_encode_void.
The procedure handler uses the nlm4_unlockargs_wrapper
structure that bridges between xdrgen types and the legacy
nlm_lock representation.

The pc_argzero field is set to zero because xdrgen decoders
reliably initialize all arguments, making the early
defensive memset unnecessary.

The NLM async callback mechanism uses client-side functions
that continue to take legacy struct nlm_res, preventing
UNLOCK and UNLOCK_MSG from sharing code for now.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
dea5b7ac0e lockd: Use xdrgen XDR functions for the NLMv4 CANCEL_MSG procedure
The CANCEL_MSG procedure is part of NLM's asynchronous lock
request flow, where clients send CANCEL_MSG to cancel pending
lock requests.

Continuing the xdrgen migration, convert the CANCEL_MSG
procedure to use the functions nlm4_svc_decode_nlm4_cancargs
and nlm4_svc_encode_void generated from the NLM version 4
protocol specification. The procedure handler uses xdrgen
types through the nlm4_cancargs_wrapper structure that bridges
between generated code and the legacy nlm_lock representation.

The pc_argzero field is set to zero because xdrgen decoders
reliably initialize all arguments in the argp->xdrgen field,
making the early defensive memset unnecessary. Remaining argp
fields are cleared as needed.

The NLM async callback mechanism uses client-side functions
that continue to take legacy results like struct nlm_res,
preventing CANCEL and CANCEL_MSG from sharing code for now.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
b2be4e28c2 lockd: Use xdrgen XDR functions for the NLMv4 LOCK_MSG procedure
The LOCK_MSG procedure is part of NLM's asynchronous lock
request flow, where clients send LOCK_MSG to request locks
that may block.

Continuing the xdrgen migration, convert the LOCK_MSG
procedure to use the functions nlm4_svc_decode_nlm4_lockargs
and nlm4_svc_encode_void generated from the NLM version 4
protocol specification. The procedure handler uses xdrgen
types through the nlm4_lockargs_wrapper structure that
bridges between generated code and the legacy nlm_lock
representation.

The pc_argzero field is set to zero because xdrgen decoders
reliably initialize all arguments in the argp->xdrgen field,
making the early defensive memset unnecessary. Remaining
argp fields are cleared as needed.

The NLM async callback mechanism uses client-side functions
that continue to take legacy results like struct nlm_res,
preventing LOCK and LOCK_MSG from sharing code for now.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00
Chuck Lever
331f2b6acb lockd: Use xdrgen XDR functions for the NLMv4 TEST_MSG procedure
The TEST_MSG procedure is part of NLM's asynchronous lock
request flow, where clients send TEST_MSG to check lock
availability without blocking.

Continuing the xdrgen migration, convert the TEST_MSG
procedure to use the functions nlm4_svc_decode_nlm4_testargs
and nlm4_svc_encode_void generated from the NLM version 4
protocol specification. The procedure handler uses xdrgen
types through the nlm4_testargs_wrapper structure that
bridges between generated code and the legacy nlm_lock
representation.

The pc_argzero field is set to zero because xdrgen decoders
reliably initialize all arguments in the argp->xdrgen field,
making the early defensive memset unnecessary. Remaining argp
fields are cleared as needed.

The NLM async callback mechanism uses client-side functions
that continue to take legacy results like struct nlm_res,
preventing TEST and TEST_MSG from sharing code for now.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-03-29 21:25:09 -04:00