linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-02-17 05:28:58 -05:00

Author	SHA1	Message	Date
Linus Torvalds	6fa9041b71	Merge tag 'nfsd-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Address recently reported issues or issues found at the recent NFS bake-a-thon held in Raleigh, NC. Issues reported with v6.18-rc: - Address a kernel build issue - Reorder SEQUENCE processing to avoid spurious NFS4ERR_SEQ_MISORDERED Issues that need expedient stable backports: - Close a refcount leak exposure - Report support for NFSv4.2 CLONE correctly - Fix oops during COPY_NOTIFY processing - Prevent rare crash after XDR encoding failure - Prevent crash due to confused or malicious NFSv4.1 client" * tag 'nfsd-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: Revert "SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it" nfsd: ensure SEQUENCE replay sends a valid reply. NFSD: Never cache a COMPOUND when the SEQUENCE operation fails NFSD: Skip close replay processing if XDR encoding fails NFSD: free copynotify stateid in nfs4_free_ol_stateid() nfsd: add missing FATTR4_WORD2_CLONE_BLKSIZE from supported attributes nfsd: fix refcount leak in nfsd_set_fh_dentry()	2025-11-12 18:41:01 -08:00
NeilBrown	1cff14b7fc	nfsd: ensure SEQUENCE replay sends a valid reply. nfsd4_enc_sequence_replay() uses nfsd4_encode_operation() to encode a new SEQUENCE reply when replaying a request from the slot cache - only ops after the SEQUENCE are replayed from the cache in ->sl_data. However it does this in nfsd4_replay_cache_entry() which is called before nfsd4_sequence() has filled in reply fields. This means that in the replayed SEQUENCE reply: maxslots will be whatever the client sent target_maxslots will be -1 (assuming init to zero, and nfsd4_encode_sequence() subtracts 1) status_flags will be zero The incorrect maxslots value, in particular, can cause the client to think the slot table has been reduced in size so it can discard its knowledge of current sequence number of the later slots, though the server has not discarded those slots. When the client later wants to use a later slot, it can get NFS4ERR_SEQ_MISORDERED from the server. This patch moves the setup of the reply into a new helper function and call it before nfsd4_replay_cache_entry() is called. Only one of the updated fields was used after this point - maxslots. So the nfsd4_sequence struct has been extended to have separate maxslots for the request and the response. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Closes: https://lore.kernel.org/linux-nfs/20251010194449.10281-1-okorniev@redhat.com/ Tested-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-11-10 09:31:52 -05:00
Chuck Lever	c96573c0d7	NFSD: Never cache a COMPOUND when the SEQUENCE operation fails RFC 8881 normatively mandates that operations where the initial SEQUENCE operation in a compound fails must not modify the slot's replay cache. nfsd4_cache_this() doesn't prevent such caching. So when SEQUENCE fails, cstate.data_offset is not set, allowing read_bytes_from_xdr_buf() to access uninitialized memory. Reported-by: rtm@csail.mit.edu Closes: https://lore.kernel.org/linux-nfs/c3628d57-94ae-48cf-8c9e-49087a28cec9@oracle.com/T/#t Fixes: `468de9e54a` ("nfsd41: expand solo sequence check") Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-11-10 09:31:52 -05:00
Chuck Lever	ff8141e49c	NFSD: Skip close replay processing if XDR encoding fails The replay logic added by commit `9411b1d4c7` ("nfsd4: cleanup handling of nfsv4.0 closed stateid's") cannot be done if encoding failed due to a short send buffer; there's no guarantee that the operation encoder has actually encoded the data that is being copied to the replay cache. Reported-by: rtm@csail.mit.edu Closes: https://lore.kernel.org/linux-nfs/c3628d57-94ae-48cf-8c9e-49087a28cec9@oracle.com/T/#t Fixes: `9411b1d4c7` ("nfsd4: cleanup handling of nfsv4.0 closed stateid's") Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-11-10 09:31:52 -05:00
Olga Kornievskaia	4aa17144d5	NFSD: free copynotify stateid in nfs4_free_ol_stateid() Typically copynotify stateid is freed either when parent's stateid is being close/freed or in nfsd4_laundromat if the stateid hasn't been used in a lease period. However, in case when the server got an OPEN (which created a parent stateid), followed by a COPY_NOTIFY using that stateid, followed by a client reboot. New client instance while doing CREATE_SESSION would force expire previous state of this client. It leads to the open state being freed thru release_openowner-> nfs4_free_ol_stateid() and it finds that it still has copynotify stateid associated with it. We currently print a warning and is triggerred WARNING: CPU: 1 PID: 8858 at fs/nfsd/nfs4state.c:1550 nfs4_free_ol_stateid+0xb0/0x100 [nfsd] This patch, instead, frees the associated copynotify stateid here. If the parent stateid is freed (without freeing the copynotify stateids associated with it), it leads to the list corruption when laundromat ends up freeing the copynotify state later. [ 1626.839430] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP [ 1626.842828] Modules linked in: nfnetlink_queue nfnetlink_log bluetooth cfg80211 rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd nfs_acl lockd grace nfs_localio ext4 crc16 mbcache jbd2 overlay uinput snd_seq_dummy snd_hrtimer qrtr rfkill vfat fat uvcvideo snd_hda_codec_generic videobuf2_vmalloc videobuf2_memops snd_hda_intel uvc snd_intel_dspcfg videobuf2_v4l2 videobuf2_common snd_hda_codec snd_hda_core videodev snd_hwdep snd_seq mc snd_seq_device snd_pcm snd_timer snd soundcore sg loop auth_rpcgss vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock xfs 8021q garp stp llc mrp nvme ghash_ce e1000e nvme_core sr_mod nvme_keyring nvme_auth cdrom vmwgfx drm_ttm_helper ttm sunrpc dm_mirror dm_region_hash dm_log iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse dm_multipath dm_mod nfnetlink [ 1626.855594] CPU: 2 UID: 0 PID: 199 Comm: kworker/u24:33 Kdump: loaded Tainted: G B W 6.17.0-rc7+ #22 PREEMPT(voluntary) [ 1626.857075] Tainted: [B]=BAD_PAGE, [W]=WARN [ 1626.857573] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.24006586.BA64.2406042154 06/04/2024 [ 1626.858724] Workqueue: nfsd4 laundromat_main [nfsd] [ 1626.859304] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 1626.860010] pc : __list_del_entry_valid_or_report+0x148/0x200 [ 1626.860601] lr : __list_del_entry_valid_or_report+0x148/0x200 [ 1626.861182] sp : ffff8000881d7a40 [ 1626.861521] x29: ffff8000881d7a40 x28: 0000000000000018 x27: ffff0000c2a98200 [ 1626.862260] x26: 0000000000000600 x25: 0000000000000000 x24: ffff8000881d7b20 [ 1626.862986] x23: ffff0000c2a981e8 x22: 1fffe00012410e7d x21: ffff0000920873e8 [ 1626.863701] x20: ffff0000920873e8 x19: ffff000086f22998 x18: 0000000000000000 [ 1626.864421] x17: 20747562202c3839 x16: 3932326636383030 x15: 3030666666662065 [ 1626.865092] x14: 6220646c756f6873 x13: 0000000000000001 x12: ffff60004fd9e4a3 [ 1626.865713] x11: 1fffe0004fd9e4a2 x10: ffff60004fd9e4a2 x9 : dfff800000000000 [ 1626.866320] x8 : 00009fffb0261b5e x7 : ffff00027ecf2513 x6 : 0000000000000001 [ 1626.866938] x5 : ffff00027ecf2510 x4 : ffff60004fd9e4a3 x3 : 0000000000000000 [ 1626.867553] x2 : 0000000000000000 x1 : ffff000096069640 x0 : 000000000000006d [ 1626.868167] Call trace: [ 1626.868382] __list_del_entry_valid_or_report+0x148/0x200 (P) [ 1626.868876] _free_cpntf_state_locked+0xd0/0x268 [nfsd] [ 1626.869368] nfs4_laundromat+0x6f8/0x1058 [nfsd] [ 1626.869813] laundromat_main+0x24/0x60 [nfsd] [ 1626.870231] process_one_work+0x584/0x1050 [ 1626.870595] worker_thread+0x4c4/0xc60 [ 1626.870893] kthread+0x2f8/0x398 [ 1626.871146] ret_from_fork+0x10/0x20 [ 1626.871422] Code: aa1303e1 aa1403e3 910e8000 97bc55d7 (d4210000) [ 1626.871892] SMP: stopping secondary CPUs Reported-by: rtm@csail.mit.edu Closes: https://lore.kernel.org/linux-nfs/d8f064c1-a26f-4eed-b4f0-1f7f608f415f@oracle.com/T/#t Fixes: `624322f1ad` ("NFSD add COPY_NOTIFY operation") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-11-10 09:31:52 -05:00
Olga Kornievskaia	4d3dbc2386	nfsd: add missing FATTR4_WORD2_CLONE_BLKSIZE from supported attributes RFC 7862 Section 4.1.2 says that if the server supports CLONE it MUST support clone_blksize attribute. Fixes: `d6ca7d2643` ("NFSD: Implement FATTR4_CLONE_BLKSIZE attribute") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-11-04 11:02:31 -05:00
NeilBrown	8a7348a9ed	nfsd: fix refcount leak in nfsd_set_fh_dentry() nfsd exports a "pseudo root filesystem" which is used by NFSv4 to find the various exported filesystems using LOOKUP requests from a known root filehandle. NFSv3 uses the MOUNT protocol to find those exported filesystems and so is not given access to the pseudo root filesystem. If a v3 (or v2) client uses a filehandle from that filesystem, nfsd_set_fh_dentry() will report an error, but still stores the export in "struct svc_fh" even though it also drops the reference (exp_put()). This means that when fh_put() is called an extra reference will be dropped which can lead to use-after-free and possible denial of service. Normal NFS usage will not provide a pseudo-root filehandle to a v3 client. This bug can only be triggered by the client synthesising an incorrect filehandle. To fix this we move the assignments to the svc_fh later, after all possible error cases have been detected. Reported-and-tested-by: tianshuo han <hantianshuo233@gmail.com> Fixes: `ef7f6c4904` ("nfsd: move V4ROOT version check to nfsd_set_fh_dentry()") Signed-off-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-11-04 11:02:31 -05:00
Linus Torvalds	8eefed8f65	Merge tag 'nfsd-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Regression fixes: - Revert the patch that removed the cap on MAX_OPS_PER_COMPOUND - Address a kernel build issue Stable fixes: - Fix crash when a client queries new attributes on forechannel - Fix rare NFSD crash when tracing is enabled" * tag 'nfsd-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: Revert "NFSD: Remove the cap on number of operations per NFSv4 COMPOUND" nfsd: Avoid strlen conflict in nfsd4_encode_components_esc() NFSD: Fix crash in nfsd4_read_release() NFSD: Define actions for the new time_deleg FATTR4 attributes	2025-10-28 12:13:20 -07:00
Chuck Lever	3e7f011c25	Revert "NFSD: Remove the cap on number of operations per NFSv4 COMPOUND" I've found that pynfs COMP6 now leaves the connection or lease in a strange state, which causes CLOSE9 to hang indefinitely. I've dug into it a little, but I haven't been able to root-cause it yet. However, I bisected to commit `48aab1606f` ("NFSD: Remove the cap on number of operations per NFSv4 COMPOUND"). Tianshuo Han also reports a potential vulnerability when decoding an NFSv4 COMPOUND. An attacker can place an arbitrarily large op count in the COMPOUND header, which results in: [ 51.410584] nfsd: vmalloc error: size 1209533382144, exceeds total pages, mode:0xdc0(GFP_KERNEL\|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0 when NFSD attempts to allocate the COMPOUND op array. Let's restore the operation-per-COMPOUND limit, but increased to 200 for now. Reported-by: tianshuo han <hantianshuo233@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Tested-by: Tianshuo Han <hantianshuo233@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-21 11:03:50 -04:00
Nathan Chancellor	29cdfb4950	nfsd: Avoid strlen conflict in nfsd4_encode_components_esc() There is an error building nfs4xdr.c with CONFIG_SUNRPC_DEBUG_TRACE=y and CONFIG_FORTIFY_SOURCE=n due to the local variable strlen conflicting with the function strlen(): In file included from include/linux/cpumask.h:11, from arch/x86/include/asm/paravirt.h:21, from arch/x86/include/asm/irqflags.h:102, from include/linux/irqflags.h:18, from include/linux/spinlock.h:59, from include/linux/mmzone.h:8, from include/linux/gfp.h:7, from include/linux/slab.h:16, from fs/nfsd/nfs4xdr.c:37: fs/nfsd/nfs4xdr.c: In function 'nfsd4_encode_components_esc': include/linux/kernel.h:321:46: error: called object 'strlen' is not a function or function pointer 321 \| __trace_puts(_THIS_IP_, str, strlen(str)); \ \| ^~~~~~ include/linux/kernel.h:265:17: note: in expansion of macro 'trace_puts' 265 \| trace_puts(fmt); \ \| ^~~~~~~~~~ include/linux/sunrpc/debug.h:34:41: note: in expansion of macro 'trace_printk' 34 \| # define __sunrpc_printk(fmt, ...) trace_printk(fmt, ##__VA_ARGS__) \| ^~~~~~~~~~~~ include/linux/sunrpc/debug.h:42:17: note: in expansion of macro '__sunrpc_printk' 42 \| __sunrpc_printk(fmt, ##__VA_ARGS__); \ \| ^~~~~~~~~~~~~~~ include/linux/sunrpc/debug.h:25:9: note: in expansion of macro 'dfprintk' 25 \| dfprintk(FACILITY, fmt, ##__VA_ARGS__) \| ^~~~~~~~ fs/nfsd/nfs4xdr.c:2646:9: note: in expansion of macro 'dprintk' 2646 \| dprintk("nfsd4_encode_components(%s)\n", components); \| ^~~~~~~ fs/nfsd/nfs4xdr.c:2643:13: note: declared here 2643 \| int strlen, count=0; \| ^~~~~~ This dprintk() instance is not particularly useful, so just remove it altogether to get rid of the immediate strlen() conflict. At the same time, eliminate the local strlen variable to avoid potential conflicts with strlen() in the future. Fixes: `ec7d8e68ef` ("sunrpc: add a Kconfig option to redirect dfprintk() output to trace buffer") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-21 11:03:19 -04:00
Chuck Lever	abb1f08a21	NFSD: Fix crash in nfsd4_read_release() When tracing is enabled, the trace_nfsd_read_done trace point crashes during the pynfs read.testNoFh test. Fixes: `15a8b55dbb` ("nfsd: call op_release, even when op_func returns an error") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-21 11:03:19 -04:00
Chuck Lever	4f76435fd5	NFSD: Define actions for the new time_deleg FATTR4 attributes NFSv4 clients won't send legitimate GETATTR requests for these new attributes because they are intended to be used only with CB_GETATTR and SETATTR. But NFSD has to do something besides crashing if it ever sees a GETATTR request that queries these attributes. RFC 8881 Section 18.7.3 states: > The server MUST return a value for each attribute that the client > requests if the attribute is supported by the server for the > target file system. If the server does not support a particular > attribute on the target file system, then it MUST NOT return the > attribute value and MUST NOT set the attribute bit in the result > bitmap. The server MUST return an error if it supports an > attribute on the target but cannot obtain its value. In that case, > no attribute values will be returned. Further, RFC 9754 Section 5 states: > These new attributes are invalid to be used with GETATTR, VERIFY, > and NVERIFY, and they can only be used with CB_GETATTR and SETATTR > by a client holding an appropriate delegation. Thus there does not appear to be a specific server response mandated by specification. Taking the guidance that querying these attributes via GETATTR is "invalid", NFSD will return nfserr_inval, failing the request entirely. Reported-by: Robert Morris <rtm@csail.mit.edu> Closes: https://lore.kernel.org/linux-nfs/7819419cf0cb50d8130dc6b747765d2b8febc88a.camel@kernel.org/T/#t Fixes: `51c0d4f7e3` ("nfsd: add support for FATTR4_OPEN_ARGUMENTS") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-21 11:03:19 -04:00
Linus Torvalds	9b332cece9	Merge tag 'nfsd-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Fix a crasher reported by rtm@csail.mit.edu * tag 'nfsd-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: Define a proc_layoutcommit for the FlexFiles layout type	2025-10-14 09:28:12 -07:00
Chuck Lever	4b47a8601b	NFSD: Define a proc_layoutcommit for the FlexFiles layout type Avoid a crash if a pNFS client should happen to send a LAYOUTCOMMIT operation on a FlexFiles layout. Reported-by: Robert Morris <rtm@csail.mit.edu> Closes: https://lore.kernel.org/linux-nfs/152f99b2-ba35-4dec-93a9-4690e625dccd@oracle.com/T/#t Cc: Thomas Haynes <loghyr@hammerspace.com> Cc: stable@vger.kernel.org Fixes: `9b9960a0ca` ("nfsd: Add a super simple flex file server") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-10 12:53:50 -04:00
Linus Torvalds	81538c8e42	Merge tag 'nfsd-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd updates from Chuck Lever: "Mike Snitzer has prototyped a mechanism for disabling I/O caching in NFSD. This is introduced in v6.18 as an experimental feature. This enables scaling NFSD in /both/ directions: - NFS service can be supported on systems with small memory footprints, such as low-cost cloud instances - Large NFS workloads will be less likely to force the eviction of server-local activity, helping it avoid thrashing Jeff Layton contributed a number of fixes to the new attribute delegation implementation (based on a pending Internet RFC) that we hope will make attribute delegation reliable enough to enable by default, as it is on the Linux NFS client. The remaining patches in this pull request are clean-ups and minor optimizations. Many thanks to the contributors, reviewers, testers, and bug reporters who participated during the v6.18 NFSD development cycle" * tag 'nfsd-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (42 commits) nfsd: discard nfserr_dropit SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it NFSD: Add io_cache_{read,write} controls to debugfs NFSD: Do the grace period check in ->proc_layoutget nfsd: delete unnecessary NULL check in __fh_verify() NFSD: Allow layoutcommit during grace period NFSD: Disallow layoutget during grace period sunrpc: fix "occurence"->"occurrence" nfsd: Don't force CRYPTO_LIB_SHA256 to be built-in nfsd: nfserr_jukebox in nlm_fopen should lead to a retry NFSD: Reduce DRC bucket size NFSD: Delay adding new entries to LRU SUNRPC: Move the svc_rpcb_cleanup() call sites NFS: Remove rpcbind cleanup for NFSv4.0 callback nfsd: unregister with rpcbind when deleting a transport NFSD: Drop redundant conversion to bool sunrpc: eliminate return pointer in svc_tcp_sendmsg() sunrpc: fix pr_notice in svc_tcp_sendto() to show correct length nfsd: decouple the xprtsec policy check from check_nfsd_access() NFSD: Fix destination buffer size in nfsd4_ssc_setup_dul() ...	2025-10-06 13:22:21 -07:00
Linus Torvalds	50647a1176	Merge tag 'pull-f_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull file->f_path constification from Al Viro: "Only one thing was modifying ->f_path of an opened file - acct(2). Massaging that away and constifying a bunch of struct path * arguments in functions that might be given &file->f_path ends up with the situation where we can turn ->f_path into an anon union of const struct path f_path and struct path __f_path, the latter modified only in a few places in fs/{file_table,open,namei}.c, all for struct file instances that are yet to be opened" * tag 'pull-f_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (23 commits) Have cc(1) catch attempts to modify ->f_path kernel/acct.c: saner struct file treatment configfs:get_target() - release path as soon as we grab configfs_item reference apparmor/af_unix: constify struct path * arguments ovl_is_real_file: constify realpath argument ovl_sync_file(): constify path argument ovl_lower_dir(): constify path argument ovl_get_verity_digest(): constify path argument ovl_validate_verity(): constify {meta,data}path arguments ovl_ensure_verity_loaded(): constify datapath argument ksmbd_vfs_set_init_posix_acl(): constify path argument ksmbd_vfs_inherit_posix_acl(): constify path argument ksmbd_vfs_kern_path_unlock(): constify path argument ksmbd_vfs_path_lookup_locked(): root_share_path can be const struct path * check_export(): constify path argument export_operations->open(): constify path argument rqst_exp_get_by_name(): constify path argument nfs: constify path argument of __vfs_getattr() bpf...d_path(): constify path argument done_path_create(): constify path argument ...	2025-10-03 16:32:36 -07:00
Linus Torvalds	070a542f08	Merge tag 'nfs-for-6.18-1' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client updates from Anna Schumaker: "New Features: - Add a Kconfig option to redirect dfprintk() to the trace buffer - Enable use of the RWF_DONTCACHE flag on the NFS client - Add striped layout handling to pNFS flexfiles - Add proper localio handling for READ and WRITE O_DIRECT Bugfixes: - Handle NFS4ERR_GRACE errors during delegation recall - Fix NFSv4.1 backchannel max_resp_sz verification check - Fix mount hang after CREATE_SESSION failure - Fix d_parent->d_inode locking in nfs4_setup_readdir() Other Cleanups and Improvements: - Improvements to write handling tracepoints - Fix a few trivial spelling mistakes - Cleanups to the rpcbind cleanup call sites - Convert the SUNRPC xdr_buf to use a scratch folio instead of scratch page - Remove unused NFS_WBACK_BUSY() macro - Remove __GFP_NOWARN flags - Unexport rpc_malloc() and rpc_free()" * tag 'nfs-for-6.18-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (46 commits) NFS: add basic STATX_DIOALIGN and STATX_DIO_READ_ALIGN support nfs/localio: add tracepoints for misaligned DIO READ and WRITE support nfs/localio: add proper O_DIRECT support for READ and WRITE nfs/localio: refactor iocb initialization nfs/localio: refactor iocb and iov_iter_bvec initialization nfs/localio: avoid issuing misaligned IO using O_DIRECT nfs/localio: make trace_nfs_local_open_fh more useful NFSD: filecache: add STATX_DIOALIGN and STATX_DIO_READ_ALIGN support sunrpc: unexport rpc_malloc() and rpc_free() NFSv4/flexfiles: Add support for striped layouts NFSv4/flexfiles: Update layout stats & error paths for striped layouts NFSv4/flexfiles: Write path updates for striped layouts NFSv4/flexfiles: Commit path updates for striped layouts NFSv4/flexfiles: Read path updates for striped layouts NFSv4/flexfiles: Update low level helper functions to be DS stripe aware. NFSv4/flexfiles: Add data structure support for striped layouts NFSv4/flexfiles: Use ds_commit_idx when marking a write commit NFSv4/flexfiles: Remove cred local variable dependency nfs4_setup_readdir(): insufficient locking for ->d_parent->d_inode dereferencing NFS: Enable use of the RWF_DONTCACHE flag on the NFS client ...	2025-10-03 14:20:40 -07:00
Linus Torvalds	867e4513fe	Merge tag 'pull-nfsctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull nfsctl updates from Al Viro: "nfsctl cleanups and a fix" * tag 'pull-nfsctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: nfsd_get_inode(): lift setting ->i_{,f}op to callers. nfsdfs_create_files(): switch to simple_start_creating() _nfsd_symlink(): switch to simple_start_creating() nfsd_mkdir(): switch to simple_start_creating() nfsctl: symlink has no business bumping link count of parent directory	2025-10-03 10:55:35 -07:00
NeilBrown	73cc6ec1a8	nfsd: discard nfserr_dropit nfserr_dropit hasn't been used for over a decade, since rq_dropme and the RQ_DROPME were introduced. Time to get rid of it completely. Signed-off-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-01 15:54:01 -04:00
Mike Snitzer	6304affe45	NFSD: Add io_cache_{read,write} controls to debugfs Add 'io_cache_read' to NFSD's debugfs interface so that any data read by NFSD will either be: - cached using page cache (NFSD_IO_BUFFERED=0) - cached but removed from the page cache upon completion (NFSD_IO_DONTCACHE=1). io_cache_read may be set by writing to: /sys/kernel/debug/nfsd/io_cache_read Add 'io_cache_write' to NFSD's debugfs interface so that any data written by NFSD will either be: - cached using page cache (NFSD_IO_BUFFERED=0) - cached but removed from the page cache upon completion (NFSD_IO_DONTCACHE=1). io_cache_write may be set by writing to: /sys/kernel/debug/nfsd/io_cache_write The default value for both settings is NFSD_IO_BUFFERED, which is NFSD's existing behavior for both read and write. Changes to these settings take immediate effect for all exports and NFS versions. Currently only xfs and ext4 implement RWF_DONTCACHE. For file systems that do not implement RWF_DONTCACHE, NFSD use only buffered I/O when the io_cache setting is NFSD_IO_DONTCACHE. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-01 15:54:01 -04:00
Chuck Lever	d6e80d48f9	NFSD: Do the grace period check in ->proc_layoutget RFC 8881 Section 18.43.3 states: > If the metadata server is in a grace period, and does not persist > layouts and device ID to device address mappings, then it MUST > return NFS4ERR_GRACE (see Section 8.4.2.1). Jeff observed that this suggests the grace period check is better done by the individual layout type implementations, because checking for the server grace period is unnecessary for some layout types. Suggested-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/linux-nfs/7h5p5ktyptyt37u6jhpbjfd5u6tg44lriqkdc7iz7czeeabrvo@ijgxz27dw4sg/T/#t Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-01 15:54:01 -04:00
Dan Carpenter	eafdd7e949	nfsd: delete unnecessary NULL check in __fh_verify() In commit 4a0de50a44bb ("nfsd: decouple the xprtsec policy check from check_nfsd_access()") we added a NULL check on "rqstp" to earlier in the function. This check is no longer required so delete it. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-01 15:54:01 -04:00
Sergey Bashirov	e0963ce53b	NFSD: Allow layoutcommit during grace period If the loca_reclaim field is set to TRUE, this indicates that the client is attempting to commit changes to a layout after the restart of the metadata server during the metadata server's recovery grace period. This type of request may be necessary when the client has uncommitted writes to provisionally allocated byte-ranges of a file that were sent to the storage devices before the restart of the metadata server. See RFC 8881, section 18.42.3. Without this, the client is not able to increase the file size and commit preallocated extents when the block/scsi layout server is restarted during a write and is in a grace period. And when the grace period ends, the client also cannot perform layoutcommit because the old layout state becomes invalid, resulting in file corruption. Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-01 15:54:01 -04:00
Mike Snitzer	25ba2b84c3	nfs/localio: avoid issuing misaligned IO using O_DIRECT Add nfsd_file_dio_alignment and use it to avoid issuing misaligned IO using O_DIRECT. Any misaligned DIO falls back to using buffered IO. Because misaligned DIO is now handled safely, remove the nfs modparam 'localio_O_DIRECT_semantics' that was added to require users opt-in to the requirement that all O_DIRECT be properly DIO-aligned. Also, introduce nfs_iov_iter_aligned_bvec() which is a variant of iov_iter_aligned_bvec() that also verifies the offset associated with an iov_iter is DIO-aligned. NOTE: in a parallel effort, iov_iter_aligned_bvec() is being removed along with iov_iter_is_aligned(). Lastly, add pr_info_ratelimited if underlying filesystem returns -EINVAL because it was made to try O_DIRECT for IO that is not DIO-aligned (shouldn't happen, so its best to be louder if it does). Fixes: `3feec68563` ("nfs/localio: add direct IO enablement with sync and async IO support") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-30 16:10:29 -04:00
Mike Snitzer	d11f6cd1bb	NFSD: filecache: add STATX_DIOALIGN and STATX_DIO_READ_ALIGN support Use STATX_DIOALIGN and STATX_DIO_READ_ALIGN to get DIO alignment attributes from the underlying filesystem and store them in the associated nfsd_file. This is done when the nfsd_file is first opened for each regular file. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-30 16:10:05 -04:00
Linus Torvalds	449c2b302c	Merge tag 'vfs-6.18-rc1.async' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs async directory updates from Christian Brauner: "This contains further preparatory changes for the asynchronous directory locking scheme: - Add lookup_one_positive_killable() which allows overlayfs to perform lookup that won't block on a fatal signal - Unify the mount idmap handling in struct renamedata as a rename can only happen within a single mount - Introduce kern_path_parent() for audit which sets the path to the parent and returns a dentry for the target without holding any locks on return - Rename kern_path_locked() as it is only used to prepare for the removal of an object from the filesystem: kern_path_locked() => start_removing_path() kern_path_create() => start_creating_path() user_path_create() => start_creating_user_path() user_path_locked_at() => start_removing_user_path_at() done_path_create() => end_creating_path() NA => end_removing_path()" * tag 'vfs-6.18-rc1.async' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: debugfs: rename start_creating() to debugfs_start_creating() VFS: rename kern_path_locked() and related functions. VFS/audit: introduce kern_path_parent() for audit VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata VFS: discard err2 in filename_create() VFS/ovl: add lookup_one_positive_killable()	2025-09-29 11:55:15 -07:00
Linus Torvalds	b786405685	Merge tag 'vfs-6.18-rc1.workqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs workqueue updates from Christian Brauner: "This contains various workqueue changes affecting the filesystem layer. Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This replaces the use of system_wq and system_unbound_wq. system_wq is a per-CPU workqueue which isn't very obvious from the name and system_unbound_wq is to be used when locality is not required. So this renames system_wq to system_percpu_wq, and system_unbound_wq to system_dfl_wq. This also adds a new WQ_PERCPU flag to allow the fs subsystem users to explicitly request the use of per-CPU behavior. Both WQ_UNBOUND and WQ_PERCPU flags coexist for one release cycle to allow callers to transition their calls. WQ_UNBOUND will be removed in a next release cycle" * tag 'vfs-6.18-rc1.workqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: WQ_PERCPU added to alloc_workqueue users fs: replace use of system_wq with system_percpu_wq fs: replace use of system_unbound_wq with system_dfl_wq	2025-09-29 10:27:17 -07:00
Sergey Bashirov	db155b7c7c	NFSD: Disallow layoutget during grace period When the server is recovering from a reboot and is in a grace period, any operation that may result in deletion or reallocation of block extents should not be allowed. See RFC 8881, section 18.43.3. If multiple clients write data to the same file, rebooting the server during writing may result in file corruption. In the worst case, the exported XFS may also become corrupted. Observed this behavior while testing pNFS block volume setup. Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-25 10:01:24 -04:00
Chuck Lever	62c0c0e749	SUNRPC: Move the svc_rpcb_cleanup() call sites Clean up: because svc_rpcb_cleanup() and svc_xprt_destroy_all() are always invoked in pairs, we can deduplicate code by moving the svc_rpcb_cleanup() call sites into svc_xprt_destroy_all(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-23 13:28:19 -04:00
NeilBrown	d7fb2c4102	VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata A rename operation can only rename within a single mount. Callers of vfs_rename() must and do ensure this is the case. So there is no point in having two mnt_idmaps in renamedata as they are always the same. Only one of them is passed to ->rename in any case. This patch replaces both with a single "mnt_idmap" and changes all callers. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-09-23 12:37:35 +02:00
Eric Biggers	13289ed501	nfsd: Don't force CRYPTO_LIB_SHA256 to be built-in Now that nfsd is accessing SHA-256 via the library API instead of via crypto_shash, there is a direct symbol dependency on the SHA-256 code and there is no benefit to be gained from forcing it to be built-in. Therefore, select CRYPTO_LIB_SHA256 from NFSD (conditional on NFSD_V4) instead of from NFSD_V4, so that it can be 'm' if NFSD is 'm'. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Olga Kornievskaia	a082e4b4d0	nfsd: nfserr_jukebox in nlm_fopen should lead to a retry When v3 NLM request finds a conflicting delegation, it triggers a delegation recall and nfsd_open fails with EAGAIN. nfsd_open then translates EAGAIN into nfserr_jukebox. In nlm_fopen, instead of returning nlm_failed for when there is a conflicting delegation, drop this NLM request so that the client retries. Once delegation is recalled and if a local lock is claimed, a retry would lead to nfsd returning a nlm_lck_blocked error or a successful nlm lock. Fixes: `d343fce148` ("[PATCH] knfsd: Allow lockd to drop replies as appropriate") Cc: stable@vger.kernel.org # v6.6 Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Chuck Lever	8ddd06be9a	NFSD: Reduce DRC bucket size The common case is that a DRC lookup will not find the XID in the bucket. Reduce the amount of pointer chasing during the lookup by keeping fewer entries in each hash bucket. Changing the bucket size constant forces the size of the DRC hash table to increase, and the height of each bucket r-b tree to be reduced. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Chuck Lever	fb340bfd48	NFSD: Delay adding new entries to LRU Neil Brown observes: > I would not include RC_INPROG entries in the lru at all - they are > always ignored, and will be added when they are switched to > RCU_DONE. I also removed a stale comment. Suggested-by: NeilBrown <neil@brown.name> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Chuck Lever	d73d06dac6	SUNRPC: Move the svc_rpcb_cleanup() call sites Clean up: because svc_rpcb_cleanup() and svc_xprt_destroy_all() are always invoked in pairs, we can deduplicate code by moving the svc_rpcb_cleanup() call sites into svc_xprt_destroy_all(). Tested-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Xichao Zhao	f64397e04b	NFSD: Drop redundant conversion to bool The result of integer comparison already evaluates to bool. No need for explicit conversion. Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Scott Mayhew	e4f574ca9c	nfsd: decouple the xprtsec policy check from check_nfsd_access() A while back I had reported that an NFSv3 client could successfully mount using '-o xprtsec=none' an export that had been exported with 'xprtsec=tls:mtls'. By "successfully" I mean that the mount command would succeed and the mount would show up in /proc/mount. Attempting to do anything futher with the mount would be met with NFS3ERR_ACCES. This was fixed (albeit accidentally) by commit `bb4f07f240` ("nfsd: Fix NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT") and was subsequently re-broken by commit `0813c5f012` ("nfsd: fix access checking for NLM under XPRTSEC policies"). Transport Layer Security isn't an RPC security flavor or pseudo-flavor, so we shouldn't be conflating them when determining whether the access checks can be bypassed. Split check_nfsd_access() into two helpers, and have __fh_verify() call the helpers directly since __fh_verify() has logic that allows one or both of the checks to be skipped. All other sites will continue to call check_nfsd_access(). Link: https://lore.kernel.org/linux-nfs/ZjO3Qwf_G87yNXb2@aion/ Fixes: `9280c57743` ("NFSD: Handle new xprtsec= export option") Cc: stable@vger.kernel.org Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Thorsten Blum	ab1c282c01	NFSD: Fix destination buffer size in nfsd4_ssc_setup_dul() Commit `5304877936` ("NFSD: Fix strncpy() fortify warning") replaced strncpy(,, sizeof(..)) with strlcpy(,, sizeof(..) - 1), but strlcpy() already guaranteed NUL-termination of the destination buffer and subtracting one byte potentially truncated the source string. The incorrect size was then carried over in commit `72f78ae00a` ("NFSD: move from strlcpy with unused retval to strscpy") when switching from strlcpy() to strscpy(). Fix this off-by-one error by using the full size of the destination buffer again. Cc: stable@vger.kernel.org Fixes: `5304877936` ("NFSD: Fix strncpy() fortify warning") Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Eric Biggers	9ebcd022a3	nfsd: Eliminate an allocation in nfs4_make_rec_clidname() Since MD5 digests are fixed-size, make nfs4_make_rec_clidname() store the digest in a stack buffer instead of a dynamically allocated buffer. Use MD5_DIGEST_SIZE instead of a hard-coded value, both in nfs4_make_rec_clidname() and in the definition of HEXDIR_LEN. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Eric Biggers	17695d72d0	nfsd: Replace open-coded conversion of bytes to hex Since the Linux kernel's sprintf() has conversion to hex built-in via "%*phN", delete md5_to_hex() and just use that. Also add an explicit array bound to the dname parameter of nfs4_make_rec_clidname() to make its size clear. No functional change. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Jeff Layton	e5e9b24ab8	nfsd: freeze c/mtime updates with outstanding WRITE_ATTRS delegation Instead of allowing the ctime to roll backward with a WRITE_ATTRS delegation, set FMODE_NOCMTIME on the file and have it skip mtime and ctime updates. It is possible that the client will never send a SETATTR to set the times before returning the delegation. Add two new bools to struct nfs4_delegation: dl_written: tracks whether the file has been written since the delegation was granted. This is set in the WRITE and LAYOUTCOMMIT handlers. dl_setattr: tracks whether the client has sent at least one valid mtime that can also update the ctime in a SETATTR. When unlocking the lease for the delegation, clear FMODE_NOCMTIME. If the file has been written, but no setattr for the delegated mtime and ctime has been done, update the timestamps to current_time(). Suggested-by: NeilBrown <neil@brown.name> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Jeff Layton	b40b1ba37a	nfsd: fix timestamp updates in CB_GETATTR When updating the local timestamps from CB_GETATTR, the updated values are not being properly vetted. Compare the update times vs. the saved times in the delegation rather than the current times in the inode. Also, ensure that the ctime is properly vetted vs. its original value. Fixes: `6ae30d6eb2` ("nfsd: add support for delegated timestamps") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Jeff Layton	3952f1cbcb	nfsd: fix SETATTR updates for delegated timestamps SETATTRs containing delegated timestamp updates are currently not being vetted properly. Since we no longer need to compare the timestamps vs. the current timestamps, move the vetting of delegated timestamps wholly into nfsd. Rename the set_cb_time() helper to nfsd4_vet_deleg_time(), and make it non-static. Add a new vet_deleg_attrs() helper that is called from nfsd4_setattr that uses nfsd4_vet_deleg_time() to properly validate the all the timestamps. If the validation indicates that the update should be skipped, unset the appropriate flags in ia_valid. Fixes: `7e13f4f8d2` ("nfsd: handle delegated timestamps in SETATTR") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Jeff Layton	7663e963a5	nfsd: track original timestamps in nfs4_delegation As Trond points out [1], the "original time" mentioned in RFC 9754 refers to the timestamps on the files at the time that the delegation was granted, and not the current timestamp of the file on the server. Store the current timestamps for the file in the nfs4_delegation when granting one. Add STATX_ATIME and STATX_MTIME to the request mask in nfs4_delegation_stat(). When granting OPEN_DELEGATE_READ_ATTRS_DELEG, do a nfs4_delegation_stat() and save the correct atime. If the stat() fails for any reason, fall back to granting a normal read deleg. [1]: https://lore.kernel.org/linux-nfs/47a4e40310e797f21b5137e847b06bb203d99e66.camel@kernel.org/ Fixes: `7e13f4f8d2` ("nfsd: handle delegated timestamps in SETATTR") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Jeff Layton	c066ff58e5	nfsd: use ATTR_CTIME_SET for delegated ctime updates Ensure that notify_change() doesn't clobber a delegated ctime update with current_time() by setting ATTR_CTIME_SET for those updates. Don't bother setting the timestamps in cb_getattr_update_times() in the non-delegated case. notify_change() will do that itself. Fixes: `7e13f4f8d2` ("nfsd: handle delegated timestamps in SETATTR") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Jeff Layton	5affb498e7	nfsd: ignore ATTR_DELEG when checking ia_valid before notify_change() If the only flag left is ATTR_DELEG, then there are no changes to be made. Fixes: `7e13f4f8d2` ("nfsd: handle delegated timestamps in SETATTR") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Jeff Layton	2990b5a479	nfsd: fix assignment of ia_ctime.tv_nsec on delegated mtime update The ia_ctime.tv_nsec field should be set to modify.nseconds. Fixes: `7e13f4f8d2` ("nfsd: handle delegated timestamps in SETATTR") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Sergey Bashirov	d68886bae7	NFSD: Fix last write offset handling in layoutcommit The data type of loca_last_write_offset is newoffset4 and is switched on a boolean value, no_newoffset, that indicates if a previous write occurred or not. If no_newoffset is FALSE, an offset is not given. This means that client does not try to update the file size. Thus, server should not try to calculate new file size and check if it fits into the segment range. See RFC 8881, section 12.5.4.2. Sometimes the current incorrect logic may cause clients to hang when trying to sync an inode. If layoutcommit fails, the client marks the inode as dirty again. Fixes: `9cf514ccfa` ("nfsd: implement pNFS operations") Cc: stable@vger.kernel.org Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Sergey Bashirov	f963cf2b91	NFSD: Implement large extent array support in pNFS When pNFS client in the block or scsi layout mode sends layoutcommit to MDS, a variable length array of modified extents is supplied within the request. This patch allows the server to accept such extent arrays if they do not fit within single memory page. The issue can be reproduced when writing to a 1GB file using FIO with O_DIRECT, 4K block and large I/O depth without preallocation of the file. In this case, the server returns NFSERR_BADXDR to the client. Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00
Sergey Bashirov	6bf1be3399	NFSD: Minor cleanup in layoutcommit decoding Use the appropriate xdr function to decode the lc_newoffset field, which is a boolean value. See RFC 8881, section 18.42.1. Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-09-21 19:24:50 -04:00

1 2 3 4 5 ...

4392 Commits