linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-21 13:45:53 -04:00

Author	SHA1	Message	Date
Jaegeuk Kim	4e715744bf	f2fs: add missing dput() when printing the donation list We missed to call dput() on the grabbed dentry. Fixes: `f1a49c1b11` ("f2fs: show the list of donation files") Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2025-10-03 03:16:10 +00:00
Linus Torvalds	e406d57be7	Merge tag 'mm-nonmm-stable-2025-10-02-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "ida: Remove the ida_simple_xxx() API" from Christophe Jaillet completes the removal of this legacy IDR API - "panic: introduce panic status function family" from Jinchao Wang provides a number of cleanups to the panic code and its various helpers, which were rather ad-hoc and scattered all over the place - "tools/delaytop: implement real-time keyboard interaction support" from Fan Yu adds a few nice user-facing usability changes to the delaytop monitoring tool - "efi: Fix EFI boot with kexec handover (KHO)" from Evangelos Petrongonas fixes a panic which was happening with the combination of EFI and KHO - "Squashfs: performance improvement and a sanity check" from Phillip Lougher teaches squashfs's lseek() about SEEK_DATA/SEEK_HOLE. A mere 150x speedup was measured for a well-chosen microbenchmark - plus another 50-odd singleton patches all over the place * tag 'mm-nonmm-stable-2025-10-02-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (75 commits) Squashfs: reject negative file sizes in squashfs_read_inode() kallsyms: use kmalloc_array() instead of kmalloc() MAINTAINERS: update Sibi Sankar's email address Squashfs: add SEEK_DATA/SEEK_HOLE support Squashfs: add additional inode sanity checking lib/genalloc: fix device leak in of_gen_pool_get() panic: remove CONFIG_PANIC_ON_OOPS_VALUE ocfs2: fix double free in user_cluster_connect() checkpatch: suppress strscpy warnings for userspace tools cramfs: fix incorrect physical page address calculation kernel: prevent prctl(PR_SET_PDEATHSIG) from racing with parent process exit Squashfs: fix uninit-value in squashfs_get_parent kho: only fill kimage if KHO is finalized ocfs2: avoid extra calls to strlen() after ocfs2_sprintf_system_inode_name() kernel/sys.c: fix the racy usage of task_lock(tsk->group_leader) in sys_prlimit64() paths sched/task.h: fix the wrong comment on task_lock() nesting with tasklist_lock coccinelle: platform_no_drv_owner: handle also built-in drivers coccinelle: of_table: handle SPI device ID tables lib/decompress: use designated initializers for struct compress_format efi: support booting with kexec handover (KHO) ...	2025-10-02 18:44:54 -07:00
Linus Torvalds	8804d970fa	Merge tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation - "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps - "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code - "mm: vm_normal_page() improvements" from David Hildenbrand provides code cleanup in the pagemap code - "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionalls making the huge_zero_pagepersistent, instead of releasing it when its refcount falls to zero - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code - "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system". It's a long story - the [1/N] changelog spells out the considerations - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc - "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path - "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code - "test that rmap behaves as expected" from Wei Yang adds some rmap selftests - "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues - "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks - "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem - "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/ - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c - "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation - "mm: remove zpool" from Johannes Weiner removes zspool - an indirection layer which now only redirects to a single thing (zsmalloc) - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN read-only mode for ARM, using that architecture's memory tagging feature. It is felt that a read-only mode KASAN is suitable for use in production systems rather than debug-only - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code - "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improving their own const/non-const accuracy - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages() - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver - "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code - "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. This patchset itself yields a 5-20% performance benefit in some situations - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code - "Minor fixes for memory allocation profiling" from Suren Baghdasaryan addresses a couple of minor issues in relatively new memory allocation profiling feature - "Small cleanups" from Matthew Wilcox has a few cleanups in preparation for more memdesc work - "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in furtherance of supporting arm highmem - "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad Anjum adds that compiler check to selftests code and fixes the fallout, by removing dead code - "Improvements to Victim Process Thawing and OOM Reaper Traversal Order" from zhongjinji makes a number of improvements in the OOM killer: mainly thawing a more appropriate group of victim threads so they can release resources - "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park is a bunch of small and unrelated fixups for DAMON - "mm/damon: define and use DAMON initialization check function" from SeongJae Park implement reliability and maintainability improvements to a recently-added bug fix - "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from SeongJae Park provides additional transparency to userspace clients of the DAMON_STAT information - "Expand scope of khugepaged anonymous collapse" from Dev Jain removes some constraints on khubepaged's collapsing of anon VMAs. It also increases the success rate of MADV_COLLAPSE against an anon vma - "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards removal of file_operations.mmap(). This patchset concentrates upon clearing up the treatment of stacked filesystems - "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau provides some fixes and improvements to mlock's tracking of large folios. /proc/meminfo's "Mlocked" field became more accurate - "mm/ksm: Fix incorrect accounting of KSM counters during fork" from Donet Tom fixes several user-visible KSM stats inaccuracies across forks and adds selftest code to verify these counters - "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses some potential but presently benign issues in KSM's mm_slot handling tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits) mm: swap: check for stable address space before operating on the VMA mm: convert folio_page() back to a macro mm/khugepaged: use start_addr/addr for improved readability hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list alloc_tag: fix boot failure due to NULL pointer dereference mm: silence data-race in update_hiwater_rss mm/memory-failure: don't select MEMORY_ISOLATION mm/khugepaged: remove definition of struct khugepaged_mm_slot mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL hugetlb: increase number of reserving hugepages via cmdline selftests/mm: add fork inheritance test for ksm_merging_pages counter mm/ksm: fix incorrect KSM counter handling in mm_struct during fork drivers/base/node: fix double free in register_one_node() mm: remove PMD alignment constraint in execmem_vmalloc() mm/memory_hotplug: fix typo 'esecially' -> 'especially' mm/rmap: improve mlock tracking for large folios mm/filemap: map entire large folio faultaround mm/fault: try to map the entire file folio in finish_fault() mm/rmap: mlock large folios in try_to_unmap_one() mm/rmap: fix a mlock race condition in folio_referenced_one() ...	2025-10-02 18:18:33 -07:00
Linus Torvalds	e1b1d03cee	Merge tag 'for-6.18/block-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block updates from Jens Axboe: - NVMe pull request via Keith: - FC target fixes (Daniel) - Authentication fixes and updates (Martin, Chris) - Admin controller handling (Kamaljit) - Target lockdep assertions (Max) - Keep-alive updates for discovery (Alastair) - Suspend quirk (Georg) - MD pull request via Yu: - Add support for a lockless bitmap. A key feature for the new bitmap are that the IO fastpath is lockless. If a user issues lots of write IO to the same bitmap bit in a short time, only the first write has additional overhead to update bitmap bit, no additional overhead for the following writes. By supporting only resync or recover written data, means in the case creating new array or replacing with a new disk, there is no need to do a full disk resync/recovery. - Switch ->getgeo() and ->bios_param() to using struct gendisk rather than struct block_device. - Rust block changes via Andreas. This series adds configuration via configfs and remote completion to the rnull driver. The series also includes a set of changes to the rust block device driver API: a few cleanup patches, and a few features supporting the rnull changes. The series removes the raw buffer formatting logic from `kernel::block` and improves the logic available in `kernel::string` to support the same use as the removed logic. - floppy arch cleanups - Reduce the number of dereferencing needed for ublk commands - Restrict supported sockets for nbd. Mostly done to eliminate a class of issues perpetually reported by syzbot, by using nonsensical socket setups. - A few s390 dasd block fixes - Fix a few issues around atomic writes - Improve DMA interation for integrity requests - Improve how iovecs are treated with regards to O_DIRECT aligment constraints. We used to require each segment to adhere to the constraints, now only the request as a whole needs to. - Clean up and improve p2p support, enabling use of p2p for metadata payloads - Improve locking of request lookup, using SRCU where appropriate - Use page references properly for brd, avoiding very long RCU sections - Fix ordering of recursively submitted IOs - Clean up and improve updating nr_requests for a live device - Various fixes and cleanups * tag 'for-6.18/block-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (164 commits) s390/dasd: enforce dma_alignment to ensure proper buffer validation s390/dasd: Return BLK_STS_INVAL for EINVAL from do_dasd_request ublk: remove redundant zone op check in ublk_setup_iod() nvme: Use non zero KATO for persistent discovery connections nvmet: add safety check for subsys lock nvme-core: use nvme_is_io_ctrl() for I/O controller check nvme-core: do ioccsz/iorcsz validation only for I/O controllers nvme-core: add method to check for an I/O controller blk-cgroup: fix possible deadlock while configuring policy blk-mq: fix null-ptr-deref in blk_mq_free_tags() from error path blk-mq: Fix more tag iteration function documentation selftests: ublk: fix behavior when fio is not installed ublk: don't access ublk_queue in ublk_unmap_io() ublk: pass ublk_io to __ublk_complete_rq() ublk: don't access ublk_queue in ublk_need_complete_req() ublk: don't access ublk_queue in ublk_check_commit_and_fetch() ublk: don't pass ublk_queue to ublk_fetch() ublk: don't access ublk_queue in ublk_config_io_buf() ublk: don't access ublk_queue in ublk_check_fetch_buf() ublk: pass q_id and tag to __ublk_check_and_get_req() ...	2025-10-02 10:16:56 -07:00
Linus Torvalds	5832d26433	Merge tag 'for-6.18/io_uring-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring updates from Jens Axboe: - Store ring provided buffers locally for the users, rather than stuff them into struct io_kiocb. These types of buffers must always be fully consumed or recycled in the current context, and leaving them in struct io_kiocb is hence not a good ideas as that struct has a vastly different life time. Basically just an architecture cleanup that can help prevent issues with ring provided buffers in the future. - Support for mixed CQE sizes in the same ring. Before this change, a CQ ring either used the default 16b CQEs, or it was setup with 32b CQE using IORING_SETUP_CQE32. For use cases where a few 32b CQEs were needed, this caused everything else to use big CQEs. This is wasteful both in terms of memory usage, but also memory bandwidth for the posted CQEs. With IORING_SETUP_CQE_MIXED, applications may use request types that post both normal 16b and big 32b CQEs on the same ring. - Add helpers for async data management, to make it harder for opcode handlers to mess it up. - Add support for multishot for uring_cmd, which ublk can use. This helps improve efficiency, by providing a persistent request type that can trigger multiple CQEs. - Add initial support for ring feature querying. We had basic support for probe operations, but the API isn't great. Rather than expand that, add support for QUERY which is easily expandable and can cover a lot more cases than the existing probe support. This will help applications get a better idea of what operations are supported on a given host. - zcrx improvements from Pavel: - Improve refill entry alignment for better caching - Various cleanups, especially around deduplicating normal memory vs dmabuf setup. - Generalisation of the niov size (Patch 12). It's still hard coded to PAGE_SIZE on init, but will let the user to specify the rx buffer length on setup. - Syscall / synchronous bufer return. It'll be used as a slow fallback path for returning buffers when the refill queue is full. Useful for tolerating slight queue size misconfiguration or with inconsistent load. - Accounting more memory to cgroups. - Additional independent cleanups that will also be useful for mutli-area support. - Various fixes and cleanups * tag 'for-6.18/io_uring-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (68 commits) io_uring/cmd: drop unused res2 param from io_uring_cmd_done() io_uring: fix nvme's 32b cqes on mixed cq io_uring/query: cap number of queries io_uring/query: prevent infinite loops io_uring/zcrx: account niov arrays to cgroup io_uring/zcrx: allow synchronous buffer return io_uring/zcrx: introduce io_parse_rqe() io_uring/zcrx: don't adjust free cache space io_uring/zcrx: use guards for the refill lock io_uring/zcrx: reduce netmem scope in refill io_uring/zcrx: protect netdev with pp_lock io_uring/zcrx: rename dma lock io_uring/zcrx: make niov size variable io_uring/zcrx: set sgt for umem area io_uring/zcrx: remove dmabuf_offset io_uring/zcrx: deduplicate area mapping io_uring/zcrx: pass ifq to io_zcrx_alloc_fallback() io_uring/zcrx: check all niovs filled with dma addresses io_uring/zcrx: move area reg checks into io_import_area io_uring/zcrx: don't pass slot to io_zcrx_create_area ...	2025-10-02 09:56:23 -07:00
Rajasi Mandal	37e263e68c	cifs: client: force multichannel=off when max_channels=1 Previously, specifying both multichannel and max_channels=1 as mount options would leave multichannel enabled, even though it is not meaningful when only one channel is allowed. This led to confusion and inconsistent behavior, as the client would advertise multichannel capability but never establish secondary channels. Fix this by forcing multichannel to false whenever max_channels=1, ensuring the mount configuration is consistent and matches user intent. This prevents the client from advertising or attempting multichannel support when it is not possible. Signed-off-by: Rajasi Mandal <rajasimandal@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-10-01 22:42:15 -05:00
Bharath SM	aa12118dbc	smb client: fix bug with newly created file in cached dir Test generic/637 spotted a problem with create of a new file in a cached directory (by the same client) could cause cases where the new file does not show up properly in ls on that client until the lease times out. Fixes: `037e1bae58` ("smb: client: use ParentLeaseKey in cifs_do_create") Cc: stable@vger.kernel.org Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-10-01 22:42:15 -05:00
Henrique Carvalho	316025335a	smb: client: short-circuit negative lookups when parent dir is fully cached When the parent directory has a valid and complete cached enumeration we can assume that negative dentries are not present in the directory, thus we can return without issuing a request. This reduces traffic for common ENOENT when the directory entries are cached. Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-10-01 22:42:15 -05:00
Henrique Carvalho	55580ad027	smb: client: short-circuit in open_cached_dir_by_dentry() if !dentry When dentry is NULL, the current code acquires the spinlock and traverses the entire list, but the condition (dentry && cfid->dentry == dentry) ensures no match will ever be found. Return -ENOENT early in this case, avoiding unnecessary lock acquisition and list traversal. Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-10-01 22:42:15 -05:00
Henrique Carvalho	2f6a4af028	smb: client: remove pointless cfid->has_lease check open_cached_dir() will only return a valid cfid, which has both has_lease = true and time != 0. Remove the pointless check of cfid->has_lease right after open_cached_dir() returns no error. Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-10-01 22:42:04 -05:00
Fiona Ebner	6c7fd18423	smb: client: transport: minor indentation style fix Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-10-01 22:25:25 -05:00
Fiona Ebner	00be6f26a2	smb: client: transport: avoid reconnects triggered by pending task work When io_uring is used in the same task as CIFS, there might be unnecessary reconnects, causing issues in user-space applications like QEMU with a log like: > CIFS: VFS: \\10.10.100.81 Error -512 sending data on socket to server Certain io_uring completions might be added to task_work with notify_method being TWA_SIGNAL and thus TIF_NOTIFY_SIGNAL is set for the task. In __smb_send_rqst(), signals are masked before calling smb_send_kvec(), but the masking does not apply to TIF_NOTIFY_SIGNAL. If sk_stream_wait_memory() is reached via sock_sendmsg() while TIF_NOTIFY_SIGNAL is set, signal_pending(current) will evaluate to true there, and -EINTR will be propagated all the way from sk_stream_wait_memory() to sock_sendmsg() in smb_send_kvec(). Afterwards, __smb_send_rqst() will see that not everything was written and reconnect. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-10-01 22:25:21 -05:00
Henrique Carvalho	17ef15fa80	smb: client: remove unused fid_lock The fid_lock in struct cached_fid does not currently provide any real synchronization. Previously, it had the intention to prevent a double release of the dentry, but every change to cfid->dentry is already protected either by cfid_list_lock (while the entry is in the list) or happens after the cfid has been removed (so no other thread should find it). Since there is no scenario in which fid_lock prevents any race, it is vestigial and can be removed along with its associated spin_lock()/spin_unlock() calls. Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-10-01 22:05:19 -05:00
Henrique Carvalho	5676398315	smb: client: update cfid->last_access_time in open_cached_dir_by_dentry() open_cached_dir_by_dentry() was missing an update of cfid->last_access_time to jiffies, similar to what open_cached_dir() has. Add it to the function. Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-10-01 22:01:03 -05:00
Steve French	a365f2c049	smb: client: ensure open_cached_dir_by_dentry() only returns valid cfid open_cached_dir_by_dentry() was exposing an invalid cached directory to callers. The validity check outside the function was exclusively based on cfid->time. Add validity check before returning success and introduce is_valid_cached_dir() helper for consistent checks across the code. Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com> Reviwed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-10-01 21:49:59 -05:00
Bharath SM	63eb8bd6c8	smb: client: account smb directory cache usage and per-tcon totals Add lightweight accounting for directory lease cache usage to aid debugging and limiting cache size in future. Track per-directory entry/byte counts and maintain per-tcon aggregates. Also expose the totals in /proc/fs/cifs/open_dirs. Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-10-01 21:49:53 -05:00
Bharath SM	dde6667fa3	smb: client: add drop_dir_cache module parameter to invalidate cached dirents Add write-only /sys/module/cifs/parameters/drop_dir_cache. Writing a non-zero value iterates all tcons and calls invalidate_all_cached_dirs() to drop cached directory entries. This is useful to force a dirent cache drop across mounts for debugging and testing purpose. Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-10-01 20:36:23 -05:00
NeilBrown	73cc6ec1a8	nfsd: discard nfserr_dropit nfserr_dropit hasn't been used for over a decade, since rq_dropme and the RQ_DROPME were introduced. Time to get rid of it completely. Signed-off-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-01 15:54:01 -04:00
Mike Snitzer	6304affe45	NFSD: Add io_cache_{read,write} controls to debugfs Add 'io_cache_read' to NFSD's debugfs interface so that any data read by NFSD will either be: - cached using page cache (NFSD_IO_BUFFERED=0) - cached but removed from the page cache upon completion (NFSD_IO_DONTCACHE=1). io_cache_read may be set by writing to: /sys/kernel/debug/nfsd/io_cache_read Add 'io_cache_write' to NFSD's debugfs interface so that any data written by NFSD will either be: - cached using page cache (NFSD_IO_BUFFERED=0) - cached but removed from the page cache upon completion (NFSD_IO_DONTCACHE=1). io_cache_write may be set by writing to: /sys/kernel/debug/nfsd/io_cache_write The default value for both settings is NFSD_IO_BUFFERED, which is NFSD's existing behavior for both read and write. Changes to these settings take immediate effect for all exports and NFS versions. Currently only xfs and ext4 implement RWF_DONTCACHE. For file systems that do not implement RWF_DONTCACHE, NFSD use only buffered I/O when the io_cache setting is NFSD_IO_DONTCACHE. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-01 15:54:01 -04:00
Chuck Lever	d6e80d48f9	NFSD: Do the grace period check in ->proc_layoutget RFC 8881 Section 18.43.3 states: > If the metadata server is in a grace period, and does not persist > layouts and device ID to device address mappings, then it MUST > return NFS4ERR_GRACE (see Section 8.4.2.1). Jeff observed that this suggests the grace period check is better done by the individual layout type implementations, because checking for the server grace period is unnecessary for some layout types. Suggested-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/linux-nfs/7h5p5ktyptyt37u6jhpbjfd5u6tg44lriqkdc7iz7czeeabrvo@ijgxz27dw4sg/T/#t Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-01 15:54:01 -04:00
Dan Carpenter	eafdd7e949	nfsd: delete unnecessary NULL check in __fh_verify() In commit 4a0de50a44bb ("nfsd: decouple the xprtsec policy check from check_nfsd_access()") we added a NULL check on "rqstp" to earlier in the function. This check is no longer required so delete it. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-01 15:54:01 -04:00
Sergey Bashirov	e0963ce53b	NFSD: Allow layoutcommit during grace period If the loca_reclaim field is set to TRUE, this indicates that the client is attempting to commit changes to a layout after the restart of the metadata server during the metadata server's recovery grace period. This type of request may be necessary when the client has uncommitted writes to provisionally allocated byte-ranges of a file that were sent to the storage devices before the restart of the metadata server. See RFC 8881, section 18.42.3. Without this, the client is not able to increase the file size and commit preallocated extents when the block/scsi layout server is restarted during a write and is in a grace period. And when the grace period ends, the client also cannot perform layoutcommit because the old layout state becomes invalid, resulting in file corruption. Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2025-10-01 15:54:01 -04:00
Bharath SM	ac3ad9845b	smb: client: show lease state as R/H/W (or NONE) in open_files Print the lease/oplock caching state for each open file as a compact string of letters: R (read), H (handle), W (write). Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-10-01 12:41:46 -05:00
Linus Torvalds	eb3289fc47	Merge tag 'driver-core-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core updates from Danilo Krummrich: "Auxiliary: - Drop call to dev_pm_domain_detach() in auxiliary_bus_probe() - Optimize logic of auxiliary_match_id() Rust: - Auxiliary: - Use primitive C types from prelude - DebugFs: - Add debugfs support for simple read/write files and custom callbacks through a File-type-based and directory-scope-based API - Sample driver code for the File-type-based API - Sample module code for the directory-scope-based API - I/O: - Add io::poll module and implement Rust specific read_poll_timeout() helper - IRQ: - Implement support for threaded and non-threaded device IRQs based on (&Device<Bound>, IRQ number) tuples (IrqRequest) - Provide &Device<Bound> cookie in IRQ handlers - PCI: - Support IRQ requests from IRQ vectors for a specific pci::Device<Bound> - Implement accessors for subsystem IDs, revision, devid and resource start - Provide dedicated pci::Vendor and pci::Class types for vendor and class ID numbers - Implement Display to print actual vendor and class names; Debug to print the raw ID numbers - Add pci::DeviceId::from_class_and_vendor() helper - Use primitive C types from prelude - Various minor inline and (safety) comment improvements - Platform: - Support IRQ requests from IRQ vectors for a specific platform::Device<Bound> - Nova: - Use pci::DeviceId::from_class_and_vendor() to avoid probing non-display/compute PCI functions - Misc: - Add helper for cpu_relax() - Update ARef import from sync::aref sysfs: - Remove bin_attrs_new field from struct attribute_group - Remove read_new() and write_new() from struct bin_attribute Misc: - Document potential race condition in get_dev_from_fwnode() - Constify node_group argument in software node registration functions - Fix order of kernel-doc parameters in various functions - Set power.no_pm flag for faux devices - Set power.no_callbacks flag along with the power.no_pm flag - Constify the pmu_bus bus type - Minor spelling fixes" * tag 'driver-core-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (43 commits) rust: pci: display symbolic PCI vendor names rust: pci: display symbolic PCI class names rust: pci: fix incorrect platform reference in PCI driver probe doc comment rust: pci: fix incorrect platform reference in PCI driver unbind doc comment perf: make pmu_bus const samples: rust: Add scoped debugfs sample driver rust: debugfs: Add support for scoped directories samples: rust: Add debugfs sample driver rust: debugfs: Add support for callback-based files rust: debugfs: Add support for writable files rust: debugfs: Add support for read-only files rust: debugfs: Add initial support for directories driver core: auxiliary bus: Optimize logic of auxiliary_match_id() driver core: auxiliary bus: Drop dev_pm_domain_detach() call driver core: Fix order of the kernel-doc parameters driver core: get_dev_from_fwnode(): document potential race drivers: base: fix "publically"->"publicly" driver core/PM: Set power.no_callbacks along with power.no_pm driver core: faux: Set power.no_pm for faux devices rust: pci: inline several tiny functions ...	2025-10-01 08:39:23 -07:00
Nathan Chancellor	4335c4496b	btrfs: fix PAGE_SIZE format specifier in open_ctree() There is an instance of -Wformat when targeting 32-bit architectures due to using a 'size_t' specifier (which is 'unsigned int' for 32-bit platforms) to print PAGE_SIZE: In file included from fs/btrfs/compression.h:17, from fs/btrfs/extent_io.h:15, from fs/btrfs/locking.h:13, from fs/btrfs/ctree.h:19, from fs/btrfs/disk-io.c:22: fs/btrfs/disk-io.c: In function 'open_ctree': include/linux/kern_levels.h:5:25: error: format '%zu' expects argument of type 'size_t', but argument 4 has type 'long unsigned int' [-Werror=format=] ... fs/btrfs/disk-io.c:3398:17: note: in expansion of macro 'btrfs_warn' 3398 \| btrfs_warn(fs_info, \| ^~~~~~~~~~ PAGE_SIZE is consistently defined as an 'unsigned long' in include/vsdo/page.h so use '%lu' to clear up the warning. Fixes: `98077f7f21` ("btrfs: enable experimental bs > ps support") Reviewed-by: Qu Wenruo <wqu@suse.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-10-01 16:27:28 +02:00
Namjae Jeon	e28c5bc456	ksmbd: increase session and share hash table bits Increases the number of bits for the hash table from 3 to 12. The thousands of sessions and shares can be connected. So the current 3-bit size can lead to frequent hash collisions. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-09-30 21:37:55 -05:00
Namjae Jeon	0bcc831be5	ksmbd: replace connection list with hash table Replace connection list with hash table to improve lookup performance. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-09-30 21:37:55 -05:00
Namjae Jeon	6747885781	ksmbd: add an error print when maximum IP connections limit is reached This change introduces an error print using pr_info_ratelimited() to prevent excessive logging. This message will inform the user that the limit for maximum IP connections has been hit and what that current count is, which can be useful for debugging and monitoring connection limits. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-09-30 21:37:54 -05:00
Namjae Jeon	d8b6dc9256	ksmbd: add max ip connections parameter This parameter set the maximum number of connections per ip address. The default is 8. Cc: stable@vger.kernel.org Fixes: `c0d41112f1` ("ksmbd: extend the connection limiting mechanism to support IPv6") Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-09-30 21:37:54 -05:00
Matvey Kovalev	88daf2f448	ksmbd: fix error code overwriting in smb2_get_info_filesystem() If client doesn't negotiate with SMB3.1.1 POSIX Extensions, then proper error code won't be returned due to overwriting. Return error immediately. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: `e2f34481b2` ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Signed-off-by: Matvey Kovalev <matvey.kovalev@ispras.ru> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-09-30 21:37:54 -05:00
Namjae Jeon	c20988c217	ksmbd: copy overlapped range within the same file cifs.ko request to copy overlapped range within the same file. ksmbd is using vfs_copy_file_range for this, vfs_copy_file_range() does not allow overlapped copying within the same file. This patch use do_splice_direct() if offset and length are overlapped. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-09-30 21:37:54 -05:00
Namjae Jeon	3677ca67b9	ksmbd: use sock_create_kern interface to create kernel socket we should use sock_create_kern() if the socket resides in kernel space. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-09-30 21:37:54 -05:00
Namjae Jeon	5da92a251e	ksmbd: make ksmbd thread names distinct by client IP This patch makes ksmbd thread names distinct by client IP address. 100943 ? S 0:00 [ksmbd:::ffff:10.177.110.57] or 101752 ? S 0:00 [ksmbd:10.177.110.57] Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-09-30 21:37:54 -05:00
Yunseong Kim	305853cce3	ksmbd: Fix race condition in RPC handle list access The 'sess->rpc_handle_list' XArray manages RPC handles within a ksmbd session. Access to this list is intended to be protected by 'sess->rpc_lock' (an rw_semaphore). However, the locking implementation was flawed, leading to potential race conditions. In ksmbd_session_rpc_open(), the code incorrectly acquired only a read lock before calling xa_store() and xa_erase(). Since these operations modify the XArray structure, a write lock is required to ensure exclusive access and prevent data corruption from concurrent modifications. Furthermore, ksmbd_session_rpc_method() accessed the list using xa_load() without holding any lock at all. This could lead to reading inconsistent data or a potential use-after-free if an entry is concurrently removed and the pointer is dereferenced. Fix these issues by: 1. Using down_write() and up_write() in ksmbd_session_rpc_open() to ensure exclusive access during XArray modification, and ensuring the lock is correctly released on error paths. 2. Adding down_read() and up_read() in ksmbd_session_rpc_method() to safely protect the lookup. Fixes: `a1f46c99d9` ("ksmbd: fix use-after-free in ksmbd_session_rpc_open") Fixes: `b685757c7b` ("ksmbd: Implements sess->rpc_handle_list as xarray") Cc: stable@vger.kernel.org Signed-off-by: Yunseong Kim <ysk@kzalloc.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-09-30 21:37:54 -05:00
Linus Torvalds	2cb8eeaf00	Merge tag 'x86_cache_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: "Add support on AMD for assigning QoS bandwidth counters to resources (RMIDs) with the ability for those resources to be tracked by the counters as long as they're assigned to them. Previously, due to hw limitations, bandwidth counts from untracked resources would get lost when those resources are not tracked. Refactor the code and user interfaces to be able to also support other, similar features on ARM, for example" * tag 'x86_cache_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits) fs/resctrl: Fix counter auto-assignment on mkdir with mbm_event enabled MAINTAINERS: resctrl: Add myself as reviewer x86/resctrl: Configure mbm_event mode if supported fs/resctrl: Introduce the interface to switch between monitor modes fs/resctrl: Disable BMEC event configuration when mbm_event mode is enabled fs/resctrl: Introduce the interface to modify assignments in a group fs/resctrl: Introduce mbm_L3_assignments to list assignments in a group fs/resctrl: Auto assign counters on mkdir and clean up on group removal fs/resctrl: Introduce mbm_assign_on_mkdir to enable assignments on mkdir fs/resctrl: Provide interface to update the event configurations fs/resctrl: Add event configuration directory under info/L3_MON/ fs/resctrl: Support counter read/reset with mbm_event assignment mode x86/resctrl: Implement resctrl_arch_reset_cntr() and resctrl_arch_cntr_read() x86/resctrl: Refactor resctrl_arch_rmid_read() fs/resctrl: Introduce counter ID read, reset calls in mbm_event mode fs/resctrl: Pass struct rdtgroup instead of individual members fs/resctrl: Add the functionality to unassign MBM events fs/resctrl: Add the functionality to assign MBM events x86,fs/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC fs/resctrl: Introduce event configuration field in struct mon_evt ...	2025-09-30 13:29:42 -07:00
Mike Snitzer	1f0d4ab0f5	NFS: add basic STATX_DIOALIGN and STATX_DIO_READ_ALIGN support NFS doesn't have DIO alignment constraints, so have NFS respond with accommodating DIO alignment attributes (rather than plumb in GETATTR support for STATX_DIOALIGN and STATX_DIO_READ_ALIGN). The most coarse-grained dio_offset_align is the most accommodating (e.g. PAGE_SIZE, in future larger may be supported). Now that NFS has support, NFS reexport will now handle unaligned DIO (NFSD's NFSD_IO_DIRECT support requires the underlying filesystem support STATX_DIOALIGN and/or STATX_DIO_READ_ALIGN). Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-30 16:10:30 -04:00
Mike Snitzer	cda94457c2	nfs/localio: add tracepoints for misaligned DIO READ and WRITE support Add nfs_local_dio_class and use it to create nfs_local_dio_read, nfs_local_dio_write and nfs_local_dio_misaligned trace events. These trace events show how NFS LOCALIO splits a given misaligned IO into a mix of misaligned head and/or tail extents and a DIO-aligned middle extent. The misaligned head and/or tail extents are issued using buffered IO and the DIO-aligned middle is issued using O_DIRECT. This combination of trace events is useful for LOCALIO DIO READs: echo 1 > /sys/kernel/tracing/events/nfs/nfs_local_dio_read/enable echo 1 > /sys/kernel/tracing/events/nfs/nfs_local_dio_misaligned/enable echo 1 > /sys/kernel/tracing/events/nfs/nfs_initiate_read/enable echo 1 > /sys/kernel/tracing/events/nfs/nfs_readpage_done/enable echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_read/enable This combination of trace events is useful for LOCALIO DIO WRITEs: echo 1 > /sys/kernel/tracing/events/nfs/nfs_local_dio_write/enable echo 1 > /sys/kernel/tracing/events/nfs/nfs_local_dio_misaligned/enable echo 1 > /sys/kernel/tracing/events/nfs/nfs_initiate_write/enable echo 1 > /sys/kernel/tracing/events/nfs/nfs_writeback_done/enable echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_write/enable Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-30 16:10:30 -04:00
Mike Snitzer	c817248fc8	nfs/localio: add proper O_DIRECT support for READ and WRITE Because the NFS client will already happily handle misaligned O_DIRECT IO (by sending it out to NFSD via RPC) this commit's new capabilities are for the benefit of LOCALIO. LOCALIO will make best effort to transform misaligned IO to DIO-aligned extents when possible. LOCALIO's READ and WRITE DIO that is misaligned will be split into as many as 3 component IOs (@start, @middle and @end) as needed -- IFF the @middle extent is verified to be DIO-aligned, and then the @start and/or @end are misaligned (due to each being a partial page). Otherwise if the @middle isn't DIO-aligned the code will fallback to issuing only a single contiguous buffered IO. The @middle is only DIO-aligned if both the memory and on-disk offsets for the IO are aligned relative to the underlying local filesystem's block device limits (@dma_alignment and @logical_block_size respectively). The misaligned @start and/or @end extents are issued using buffered IO and the DIO-aligned @middle is issued using O_DIRECT. The @start and @end IOs are issued first using buffered IO with IOCB_SYNC and then the @middle is issued last using direct IO with async completion (AIO). This out of order IO completion means that LOCALIO's IO completion code (nfs_local_read_done and nfs_local_write_done) is only called for the IO's last associated iov_iter completion. And in the case of DIO-aligned @middle it completes last using AIO. nfs_local_pgio_done() is updated to handle piece-wise partial completion of each iov_iter. This implementation for LOCALIO's misaligned DIO handling uses 3 iov_iter that share the same backing pages in their bio_vecs (so unfortunately 'struct nfs_local_kiocb' has 3 instead of only 1). [Reducing LOCALIO's per-IO (struct nfs_local_kiocb) memory use can be explored in the future. One logical progression to improve this code, and eliminate explicit loops over up to 3 iov_iter, is by extending 'struct iov_iter' to support iov_iter_clone() and iov_iter_chain() interfaces that are comparable to what 'struct bio' is able to support in the block layer. But even that wouldn't avoid the need to allocate/use up to 3 iov_iter] Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-30 16:10:30 -04:00
Mike Snitzer	e43e9a3a3d	nfs/localio: refactor iocb initialization The goal of this commit's various refactoring is to have LOCALIO's per IO initialization occur in process context so that we don't get into a situation where IO fails to be issued from workqueue (e.g. due to lack of memory, etc). Better to have LOCALIO's iocb initialization fail early. There isn't immediate need but this commit makes it possible for LOCALIO to fallback to NFS pagelist code in process context to allow for immediate retry over RPC. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-30 16:10:30 -04:00
Mike Snitzer	091bdcfcec	nfs/localio: refactor iocb and iov_iter_bvec initialization nfs_local_iter_init() is updated to follow the same pattern to initializing LOCALIO's iov_iter_bvec as was established by nfsd_iter_read(). Other LOCALIO iocb initialization refactoring in this commit offers incremental cleanup that will be taken further by the next commit. No functional change. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-30 16:10:30 -04:00
Mike Snitzer	25ba2b84c3	nfs/localio: avoid issuing misaligned IO using O_DIRECT Add nfsd_file_dio_alignment and use it to avoid issuing misaligned IO using O_DIRECT. Any misaligned DIO falls back to using buffered IO. Because misaligned DIO is now handled safely, remove the nfs modparam 'localio_O_DIRECT_semantics' that was added to require users opt-in to the requirement that all O_DIRECT be properly DIO-aligned. Also, introduce nfs_iov_iter_aligned_bvec() which is a variant of iov_iter_aligned_bvec() that also verifies the offset associated with an iov_iter is DIO-aligned. NOTE: in a parallel effort, iov_iter_aligned_bvec() is being removed along with iov_iter_is_aligned(). Lastly, add pr_info_ratelimited if underlying filesystem returns -EINVAL because it was made to try O_DIRECT for IO that is not DIO-aligned (shouldn't happen, so its best to be louder if it does). Fixes: `3feec68563` ("nfs/localio: add direct IO enablement with sync and async IO support") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-30 16:10:29 -04:00
Mike Snitzer	fd6d93c2b7	nfs/localio: make trace_nfs_local_open_fh more useful Always trigger trace event when LOCALIO opens a file. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-30 16:10:29 -04:00
Mike Snitzer	d11f6cd1bb	NFSD: filecache: add STATX_DIOALIGN and STATX_DIO_READ_ALIGN support Use STATX_DIOALIGN and STATX_DIO_READ_ALIGN to get DIO alignment attributes from the underlying filesystem and store them in the associated nfsd_file. This is done when the nfsd_file is first opened for each regular file. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-30 16:10:05 -04:00
Linus Torvalds	f3827213ab	Merge tag 'for-6.18-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "There are no new features, the changes are in the core code, notably tree-log error handling and reporting improvements, and initial support for block size > page size. Performance improvements: - search data checksums in the commit root (previous transaction) to avoid locking contention, this improves parallelism of read heavy/low write workloads, and also reduces transaction commit time; on real and reproducer workload the sync time went from minutes to tens of seconds (workload and numbers are in the changelog) Core: - tree-log updates: - error handling improvements, transaction aborts - add new error state 'O' (printed in status messages) when log replay fails and is aborted - reduced number of btrfs_path allocations when traversing the tree - 'block size > page size' support - basic implementation with limitations, under experimental build - limitations: no direct io, raid56, encoded read (standalone and in send ioctl), encoded write - preparatory work for compression, removing implicit assumptions of page and block sizes - compression workspaces are now per-filesystem, we cannot assume common block size for work memory among different filesystems - tree-checker now verifies INODE_EXTREF item (which is implementing hardlinks) - tree leaf pretty printer updates, there were missing data from items, keys/items - move config option CONFIG_BTRFS_REF_VERIFY to CONFIG_BTRFS_DEBUG, it's a debugging feature and not needed to be enabled separately - more struct btrfs_path auto free updates - use ref_tracker API for tracking delayed inodes, enabled by mount option 'ref_verify', allowing to better pinpoint leaking references - in zoned mode, avoid selecting data relocation zoned for ordinary data block groups - updated and enhanced error messages - lots of cleanups and refactoring" * tag 'for-6.18-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (113 commits) btrfs: use smp_mb__after_atomic() when forcing COW in create_pending_snapshot() btrfs: add unlikely annotations to branches leading to transaction abort btrfs: add unlikely annotations to branches leading to EIO btrfs: add unlikely annotations to branches leading to EUCLEAN btrfs: more trivial BTRFS_PATH_AUTO_FREE conversions btrfs: zoned: don't fail mount needlessly due to too many active zones btrfs: use kmalloc_array() for open-coded arithmetic in kmalloc() btrfs: enable experimental bs > ps support btrfs: add extra ASSERT()s to catch unaligned bios btrfs: fix symbolic link reading when bs > ps btrfs: prepare scrub to support bs > ps cases btrfs: prepare zlib to support bs > ps cases btrfs: prepare lzo to support bs > ps cases btrfs: prepare zstd to support bs > ps cases btrfs: prepare compression folio alloc/free for bs > ps cases btrfs: fix the incorrect max_bytes value for find_lock_delalloc_range() btrfs: remove pointless key offset setup in create_pending_snapshot() btrfs: annotate btrfs_is_testing() as unlikely and make it return bool btrfs: make the rule checking more readable for should_cow_block() btrfs: simplify inline extent end calculation at replay_one_extent() ...	2025-09-30 08:14:49 -07:00
Thorsten Blum	11f6bce77e	fs/orangefs: Replace kzalloc + copy_from_user with memdup_user_nul Replace kzalloc() followed by copy_from_user() with memdup_user_nul() to simplify and improve orangefs_debug_write(). Allocate only 'count' bytes instead of the maximum size ORANGEFS_MAX_DEBUG_STRING_LEN, and set 'buf' to NULL to ensure kfree(buf) still works. No functional changes intended. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Mike Marshall <hubcap@omnibond.com>	2025-09-30 10:23:24 -04:00
Mike Marshall	025e880759	orangefs: fix xattr related buffer overflow... Willy Tarreau <w@1wt.eu> forwarded me a message from Disclosure <disclosure@aisle.com> with the following warning: > The helper `xattr_key()` uses the pointer variable in the loop condition > rather than dereferencing it. As `key` is incremented, it remains non-NULL > (until it runs into unmapped memory), so the loop does not terminate on > valid C strings and will walk memory indefinitely, consuming CPU or hanging > the thread. I easily reproduced this with setfattr and getfattr, causing a kernel oops, hung user processes and corrupted orangefs files. Disclosure sent along a diff (not a patch) with a suggested fix, which I based this patch on. After xattr_key started working right, xfstest generic/069 exposed an xattr related memory leak that lead to OOM. xattr_key returns a hashed key. When adding xattrs to the orangefs xattr cache, orangefs used hash_add, a kernel hashing macro. hash_add also hashes the key using hash_log which resulted in additions to the xattr cache going to the wrong hash bucket. generic/069 tortures a single file and orangefs does a getattr for the xattr "security.capability" every time. Orangefs negative caches on xattrs which includes a kmalloc. Since adds to the xattr cache were going to the wrong bucket, every getattr for "security.capability" resulted in another kmalloc, none of which were ever freed. I changed the two uses of hash_add to hlist_add_head instead and the memory leak ceased and generic/069 quit throwing furniture. Signed-off-by: Mike Marshall <hubcap@omnibond.com> Reported-by: Stanislav Fort of Aisle Research <stanislav.fort@aisle.com>	2025-09-30 10:23:20 -04:00
Zhen Ni	3dffadfa99	orangefs: Remove unused type in macro fill_default_sys_attrs Remove the unused type parameter from the macro definition and all its callers, making the interface consistent with its actual usage. Fixes: `69a23de2f3` ("orangefs: clean up fill_default_sys_attrs") Signed-off-by: Zhen Ni <zhen.ni@easystack.cn> Signed-off-by: Mike Marshall <hubcap@omnibond.com>	2025-09-30 10:23:03 -04:00
Ethan Ferguson	d01579d590	exfat: Add support for FS_IOC_{GET,SET}FSLABEL Add support for reading / writing to the exfat volume label from the FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls Co-developed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Ethan Ferguson <ethan.ferguson@zetier.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2025-09-30 13:49:31 +09:00
Sang-Heon Jeon	29c063658d	exfat: combine iocharset and utf8 option setup Currently, exfat utf8 mount option depends on the iocharset option value. After exfat remount, utf8 option may become inconsistent with iocharset option. If the options are inconsistent; (specifically, iocharset=utf8 but utf8=0) readdir may reference uninitalized NLS, leading to a null pointer dereference. Extract and combine utf8/iocharset setup logic into exfat_set_iocharset(). Then Replace iocharset setup logic to exfat_set_iocharset to prevent utf8/iocharset option inconsistentcy after remount. Reported-by: syzbot+3e9cb93e3c5f90d28e19@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=3e9cb93e3c5f90d28e19 Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com> Fixes: acab02ffcd6b ("exfat: support modifying mount options via remount") Tested-by: syzbot+3e9cb93e3c5f90d28e19@syzkaller.appspotmail.com Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2025-09-30 13:41:22 +09:00
Yuezhang Mo	e6fd5d3a43	exfat: support modifying mount options via remount Before this commit, all exfat-defined mount options could not be modified dynamically via remount, and no error was returned. After this commit, these three exfat-defined mount options (discard, zero_size_dir, and errors) can be modified dynamically via remount. While other exfat-defined mount options cannot be modified dynamically via remount because their old settings are cached in inodes or dentries, modifying them will be rejected with an error. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2025-09-30 13:34:43 +09:00

... 4 5 6 7 8 ...

102029 Commits