From 8a7348a9ed70bda1c1f51d3f1815bcbdf9f3b38c Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Wed, 8 Oct 2025 09:52:25 -0400 Subject: [PATCH 1/7] nfsd: fix refcount leak in nfsd_set_fh_dentry() nfsd exports a "pseudo root filesystem" which is used by NFSv4 to find the various exported filesystems using LOOKUP requests from a known root filehandle. NFSv3 uses the MOUNT protocol to find those exported filesystems and so is not given access to the pseudo root filesystem. If a v3 (or v2) client uses a filehandle from that filesystem, nfsd_set_fh_dentry() will report an error, but still stores the export in "struct svc_fh" even though it also drops the reference (exp_put()). This means that when fh_put() is called an extra reference will be dropped which can lead to use-after-free and possible denial of service. Normal NFS usage will not provide a pseudo-root filehandle to a v3 client. This bug can only be triggered by the client synthesising an incorrect filehandle. To fix this we move the assignments to the svc_fh later, after all possible error cases have been detected. Reported-and-tested-by: tianshuo han Fixes: ef7f6c4904d0 ("nfsd: move V4ROOT version check to nfsd_set_fh_dentry()") Signed-off-by: NeilBrown Reviewed-by: Jeff Layton Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever --- fs/nfsd/nfsfh.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index 3edccc38db42..bd9acfdc7b01 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -269,9 +269,6 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct net *net, dentry); } - fhp->fh_dentry = dentry; - fhp->fh_export = exp; - switch (fhp->fh_maxsize) { case NFS4_FHSIZE: if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOATOMIC_ATTR) @@ -293,6 +290,9 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct net *net, goto out; } + fhp->fh_dentry = dentry; + fhp->fh_export = exp; + return 0; out: exp_put(exp); From 4d3dbc2386fe051e44efad663e0ec828b98ab53f Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia Date: Thu, 9 Oct 2025 16:37:59 -0400 Subject: [PATCH 2/7] nfsd: add missing FATTR4_WORD2_CLONE_BLKSIZE from supported attributes RFC 7862 Section 4.1.2 says that if the server supports CLONE it MUST support clone_blksize attribute. Fixes: d6ca7d2643ee ("NFSD: Implement FATTR4_CLONE_BLKSIZE attribute") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever --- fs/nfsd/nfsd.h | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index f19320018639..b752433c3c2c 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -458,6 +458,7 @@ enum { #define NFSD4_2_SUPPORTED_ATTRS_WORD2 \ (NFSD4_1_SUPPORTED_ATTRS_WORD2 | \ FATTR4_WORD2_MODE_UMASK | \ + FATTR4_WORD2_CLONE_BLKSIZE | \ NFSD4_2_SECURITY_ATTRS | \ FATTR4_WORD2_XATTR_SUPPORT | \ FATTR4_WORD2_TIME_DELEG_ACCESS | \ From 4aa17144d5abc3c756883e3a010246f0dba8b468 Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia Date: Tue, 14 Oct 2025 13:59:59 -0400 Subject: [PATCH 3/7] NFSD: free copynotify stateid in nfs4_free_ol_stateid() Typically copynotify stateid is freed either when parent's stateid is being close/freed or in nfsd4_laundromat if the stateid hasn't been used in a lease period. However, in case when the server got an OPEN (which created a parent stateid), followed by a COPY_NOTIFY using that stateid, followed by a client reboot. New client instance while doing CREATE_SESSION would force expire previous state of this client. It leads to the open state being freed thru release_openowner-> nfs4_free_ol_stateid() and it finds that it still has copynotify stateid associated with it. We currently print a warning and is triggerred WARNING: CPU: 1 PID: 8858 at fs/nfsd/nfs4state.c:1550 nfs4_free_ol_stateid+0xb0/0x100 [nfsd] This patch, instead, frees the associated copynotify stateid here. If the parent stateid is freed (without freeing the copynotify stateids associated with it), it leads to the list corruption when laundromat ends up freeing the copynotify state later. [ 1626.839430] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP [ 1626.842828] Modules linked in: nfnetlink_queue nfnetlink_log bluetooth cfg80211 rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd nfs_acl lockd grace nfs_localio ext4 crc16 mbcache jbd2 overlay uinput snd_seq_dummy snd_hrtimer qrtr rfkill vfat fat uvcvideo snd_hda_codec_generic videobuf2_vmalloc videobuf2_memops snd_hda_intel uvc snd_intel_dspcfg videobuf2_v4l2 videobuf2_common snd_hda_codec snd_hda_core videodev snd_hwdep snd_seq mc snd_seq_device snd_pcm snd_timer snd soundcore sg loop auth_rpcgss vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock xfs 8021q garp stp llc mrp nvme ghash_ce e1000e nvme_core sr_mod nvme_keyring nvme_auth cdrom vmwgfx drm_ttm_helper ttm sunrpc dm_mirror dm_region_hash dm_log iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse dm_multipath dm_mod nfnetlink [ 1626.855594] CPU: 2 UID: 0 PID: 199 Comm: kworker/u24:33 Kdump: loaded Tainted: G B W 6.17.0-rc7+ #22 PREEMPT(voluntary) [ 1626.857075] Tainted: [B]=BAD_PAGE, [W]=WARN [ 1626.857573] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.24006586.BA64.2406042154 06/04/2024 [ 1626.858724] Workqueue: nfsd4 laundromat_main [nfsd] [ 1626.859304] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 1626.860010] pc : __list_del_entry_valid_or_report+0x148/0x200 [ 1626.860601] lr : __list_del_entry_valid_or_report+0x148/0x200 [ 1626.861182] sp : ffff8000881d7a40 [ 1626.861521] x29: ffff8000881d7a40 x28: 0000000000000018 x27: ffff0000c2a98200 [ 1626.862260] x26: 0000000000000600 x25: 0000000000000000 x24: ffff8000881d7b20 [ 1626.862986] x23: ffff0000c2a981e8 x22: 1fffe00012410e7d x21: ffff0000920873e8 [ 1626.863701] x20: ffff0000920873e8 x19: ffff000086f22998 x18: 0000000000000000 [ 1626.864421] x17: 20747562202c3839 x16: 3932326636383030 x15: 3030666666662065 [ 1626.865092] x14: 6220646c756f6873 x13: 0000000000000001 x12: ffff60004fd9e4a3 [ 1626.865713] x11: 1fffe0004fd9e4a2 x10: ffff60004fd9e4a2 x9 : dfff800000000000 [ 1626.866320] x8 : 00009fffb0261b5e x7 : ffff00027ecf2513 x6 : 0000000000000001 [ 1626.866938] x5 : ffff00027ecf2510 x4 : ffff60004fd9e4a3 x3 : 0000000000000000 [ 1626.867553] x2 : 0000000000000000 x1 : ffff000096069640 x0 : 000000000000006d [ 1626.868167] Call trace: [ 1626.868382] __list_del_entry_valid_or_report+0x148/0x200 (P) [ 1626.868876] _free_cpntf_state_locked+0xd0/0x268 [nfsd] [ 1626.869368] nfs4_laundromat+0x6f8/0x1058 [nfsd] [ 1626.869813] laundromat_main+0x24/0x60 [nfsd] [ 1626.870231] process_one_work+0x584/0x1050 [ 1626.870595] worker_thread+0x4c4/0xc60 [ 1626.870893] kthread+0x2f8/0x398 [ 1626.871146] ret_from_fork+0x10/0x20 [ 1626.871422] Code: aa1303e1 aa1403e3 910e8000 97bc55d7 (d4210000) [ 1626.871892] SMP: stopping secondary CPUs Reported-by: rtm@csail.mit.edu Closes: https://lore.kernel.org/linux-nfs/d8f064c1-a26f-4eed-b4f0-1f7f608f415f@oracle.com/T/#t Fixes: 624322f1adc5 ("NFSD add COPY_NOTIFY operation") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia Signed-off-by: Chuck Lever --- fs/nfsd/nfs4state.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index c1b54322c412..de763e6d9b58 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1542,7 +1542,8 @@ static void nfs4_free_ol_stateid(struct nfs4_stid *stid) release_all_access(stp); if (stp->st_stateowner) nfs4_put_stateowner(stp->st_stateowner); - WARN_ON(!list_empty(&stid->sc_cp_list)); + if (!list_empty(&stid->sc_cp_list)) + nfs4_free_cpntf_statelist(stid->sc_client->net, stid); kmem_cache_free(stateid_slab, stid); } From ff8141e49cf70d2d093a5228f5299ce188de6142 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 16 Oct 2025 09:49:55 -0400 Subject: [PATCH 4/7] NFSD: Skip close replay processing if XDR encoding fails The replay logic added by commit 9411b1d4c7df ("nfsd4: cleanup handling of nfsv4.0 closed stateid's") cannot be done if encoding failed due to a short send buffer; there's no guarantee that the operation encoder has actually encoded the data that is being copied to the replay cache. Reported-by: rtm@csail.mit.edu Closes: https://lore.kernel.org/linux-nfs/c3628d57-94ae-48cf-8c9e-49087a28cec9@oracle.com/T/#t Fixes: 9411b1d4c7df ("nfsd4: cleanup handling of nfsv4.0 closed stateid's") Reviewed-by: Jeff Layton Reviewed-by: NeilBrown Signed-off-by: Chuck Lever --- fs/nfsd/nfs4xdr.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 6040a6145dad..cf5df0f50208 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -5925,8 +5925,7 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) */ warn_on_nonidempotent_op(op); xdr_truncate_encode(xdr, op_status_offset + XDR_UNIT); - } - if (so) { + } else if (so) { int len = xdr->buf->len - (op_status_offset + XDR_UNIT); so->so_replay.rp_status = op->status; From c96573c0d75db3f8478000d0d392a9cdb95adbed Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 16 Oct 2025 09:49:56 -0400 Subject: [PATCH 5/7] NFSD: Never cache a COMPOUND when the SEQUENCE operation fails RFC 8881 normatively mandates that operations where the initial SEQUENCE operation in a compound fails must not modify the slot's replay cache. nfsd4_cache_this() doesn't prevent such caching. So when SEQUENCE fails, cstate.data_offset is not set, allowing read_bytes_from_xdr_buf() to access uninitialized memory. Reported-by: rtm@csail.mit.edu Closes: https://lore.kernel.org/linux-nfs/c3628d57-94ae-48cf-8c9e-49087a28cec9@oracle.com/T/#t Fixes: 468de9e54a90 ("nfsd41: expand solo sequence check") Reviewed-by: NeilBrown Signed-off-by: Chuck Lever --- fs/nfsd/nfs4state.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index de763e6d9b58..c8c326679dca 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3487,7 +3487,20 @@ nfsd4_store_cache_entry(struct nfsd4_compoundres *resp) struct nfsd4_slot *slot = resp->cstate.slot; unsigned int base; - dprintk("--> %s slot %p\n", __func__, slot); + /* + * RFC 5661 Section 2.10.6.1.2: + * + * Any time SEQUENCE ... returns an error ... [t]he replier MUST NOT + * modify the reply cache entry for the slot whenever an error is + * returned from SEQUENCE ... + * + * Because nfsd4_store_cache_entry is called only by + * nfsd4_sequence_done(), nfsd4_store_cache_entry() is called only + * when a SEQUENCE operation was part of the COMPOUND. + * nfs41_check_op_ordering() ensures SEQUENCE is the first op. + */ + if (resp->opcnt == 1 && resp->cstate.status != nfs_ok) + return; slot->sl_flags |= NFSD4_SLOT_INITIALIZED; slot->sl_opcnt = resp->opcnt; From 1cff14b7fc7f31363c39d0269563ce75c714f7ae Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Thu, 16 Oct 2025 09:49:57 -0400 Subject: [PATCH 6/7] nfsd: ensure SEQUENCE replay sends a valid reply. nfsd4_enc_sequence_replay() uses nfsd4_encode_operation() to encode a new SEQUENCE reply when replaying a request from the slot cache - only ops after the SEQUENCE are replayed from the cache in ->sl_data. However it does this in nfsd4_replay_cache_entry() which is called *before* nfsd4_sequence() has filled in reply fields. This means that in the replayed SEQUENCE reply: maxslots will be whatever the client sent target_maxslots will be -1 (assuming init to zero, and nfsd4_encode_sequence() subtracts 1) status_flags will be zero The incorrect maxslots value, in particular, can cause the client to think the slot table has been reduced in size so it can discard its knowledge of current sequence number of the later slots, though the server has not discarded those slots. When the client later wants to use a later slot, it can get NFS4ERR_SEQ_MISORDERED from the server. This patch moves the setup of the reply into a new helper function and call it *before* nfsd4_replay_cache_entry() is called. Only one of the updated fields was used after this point - maxslots. So the nfsd4_sequence struct has been extended to have separate maxslots for the request and the response. Reported-by: Olga Kornievskaia Closes: https://lore.kernel.org/linux-nfs/20251010194449.10281-1-okorniev@redhat.com/ Tested-by: Olga Kornievskaia Signed-off-by: NeilBrown Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever --- fs/nfsd/nfs4state.c | 50 ++++++++++++++++++++++++++++++--------------- fs/nfsd/nfs4xdr.c | 2 +- fs/nfsd/xdr4.h | 3 ++- 3 files changed, 36 insertions(+), 19 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index c8c326679dca..8a6960500217 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4363,6 +4363,36 @@ static bool replay_matches_cache(struct svc_rqst *rqstp, return true; } +/* + * Note that the response is constructed here both for the case + * of a new SEQUENCE request and for a replayed SEQUENCE request. + * We do not cache SEQUENCE responses as SEQUENCE is idempotent. + */ +static void nfsd4_construct_sequence_response(struct nfsd4_session *session, + struct nfsd4_sequence *seq) +{ + struct nfs4_client *clp = session->se_client; + + seq->maxslots_response = max(session->se_target_maxslots, + seq->maxslots); + seq->target_maxslots = session->se_target_maxslots; + + switch (clp->cl_cb_state) { + case NFSD4_CB_DOWN: + seq->status_flags = SEQ4_STATUS_CB_PATH_DOWN; + break; + case NFSD4_CB_FAULT: + seq->status_flags = SEQ4_STATUS_BACKCHANNEL_FAULT; + break; + default: + seq->status_flags = 0; + } + if (!list_empty(&clp->cl_revoked)) + seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED; + if (atomic_read(&clp->cl_admin_revoked)) + seq->status_flags |= SEQ4_STATUS_ADMIN_STATE_REVOKED; +} + __be32 nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, union nfsd4_op_u *u) @@ -4412,6 +4442,9 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, dprintk("%s: slotid %d\n", __func__, seq->slotid); trace_nfsd_slot_seqid_sequence(clp, seq, slot); + + nfsd4_construct_sequence_response(session, seq); + status = check_slot_seqid(seq->seqid, slot->sl_seqid, slot->sl_flags); if (status == nfserr_replay_cache) { status = nfserr_seq_misordered; @@ -4509,23 +4542,6 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, } out: - seq->maxslots = max(session->se_target_maxslots, seq->maxslots); - seq->target_maxslots = session->se_target_maxslots; - - switch (clp->cl_cb_state) { - case NFSD4_CB_DOWN: - seq->status_flags = SEQ4_STATUS_CB_PATH_DOWN; - break; - case NFSD4_CB_FAULT: - seq->status_flags = SEQ4_STATUS_BACKCHANNEL_FAULT; - break; - default: - seq->status_flags = 0; - } - if (!list_empty(&clp->cl_revoked)) - seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED; - if (atomic_read(&clp->cl_admin_revoked)) - seq->status_flags |= SEQ4_STATUS_ADMIN_STATE_REVOKED; trace_nfsd_seq4_status(rqstp, seq); out_no_session: if (conn) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index cf5df0f50208..67bb9c0b9fcb 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -5073,7 +5073,7 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr, return nfserr; /* Note slotid's are numbered from zero: */ /* sr_highest_slotid */ - nfserr = nfsd4_encode_slotid4(xdr, seq->maxslots - 1); + nfserr = nfsd4_encode_slotid4(xdr, seq->maxslots_response - 1); if (nfserr != nfs_ok) return nfserr; /* sr_target_highest_slotid */ diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index ee0570cbdd9e..1ce8e12ae335 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -574,8 +574,9 @@ struct nfsd4_sequence { struct nfs4_sessionid sessionid; /* request/response */ u32 seqid; /* request/response */ u32 slotid; /* request/response */ - u32 maxslots; /* request/response */ + u32 maxslots; /* request */ u32 cachethis; /* request */ + u32 maxslots_response; /* response */ u32 target_maxslots; /* response */ u32 status_flags; /* response */ }; From 324be6dcbf09133a322db16977a84fbb45c16129 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 16 Oct 2025 11:09:13 -0400 Subject: [PATCH 7/7] Revert "SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it" Geert reports: > This is now commit d8e97cc476e33037 ("SUNRPC: Make RPCSEC_GSS_KRB5 > select CRYPTO instead of depending on it") in v6.18-rc1. > As RPCSEC_GSS_KRB5 defaults to "y", CRYPTO is now auto-enabled in > defconfigs that didn't enable it before. Revert while we work out a proper solution and then test it. Reported-by: Geert Uytterhoeven Closes: https://lore.kernel.org/linux-nfs/b97cea29-4ab7-4fb6-85ba-83f9830e524f@kernel.org/T/#t Signed-off-by: Chuck Lever --- net/sunrpc/Kconfig | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/sunrpc/Kconfig b/net/sunrpc/Kconfig index 33aafdc8392e..2d8b67dac7b5 100644 --- a/net/sunrpc/Kconfig +++ b/net/sunrpc/Kconfig @@ -18,10 +18,9 @@ config SUNRPC_SWAP config RPCSEC_GSS_KRB5 tristate "Secure RPC: Kerberos V mechanism" - depends on SUNRPC + depends on SUNRPC && CRYPTO default y select SUNRPC_GSS - select CRYPTO select CRYPTO_SKCIPHER select CRYPTO_HASH help