Commit Graph

98043 Commits

Author SHA1 Message Date
Linus Torvalds
302deb109d Merge tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:

 - Fix a nonsensical Kconfig combination

 - Remove an unnecessary rseq-notification

* tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rseq: Eliminate useless task_work on execve
  sched/isolation: Make CONFIG_CPU_ISOLATION depend on CONFIG_SMP
2025-04-06 10:44:58 -07:00
Thomas Gleixner
8fa7292fee treewide: Switch/rename to timer_delete[_sync]()
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.

Conversion was done with coccinelle plus manual fixups where necessary.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-05 10:30:12 +02:00
Linus Torvalds
9f867ba24d Merge tag '6.15-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull more smb client updates from Steve French:

 - reconnect fixes: three for updating rsize/wsize and an SMB1 reconnect
   fix

 - RFC1001 fixes: fixing connections to nonstandard ports, and negprot
   retries

 - fix mfsymlinks to old servers

 - make mapping of open flags for SMB1 more accurate

 - permission fixes: adding retry on open for write, and one for stat to
   workaround unexpected access denied

 - add two new xattrs, one for retrieving SACL and one for retrieving
   owner (without having to retrieve the whole ACL)

 - fix mount parm validation for echo_interval

 - minor cleanup (including removing now unneeded cifs_truncate_page)

* tag '6.15-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal version number
  cifs: Implement is_network_name_deleted for SMB1
  cifs: Remove cifs_truncate_page() as it should be superfluous
  cifs: Do not add FILE_READ_ATTRIBUTES when using GENERIC_READ/EXECUTE/ALL
  cifs: Improve SMB2+ stat() to work also without FILE_READ_ATTRIBUTES
  cifs: Add fallback for SMB2 CREATE without FILE_READ_ATTRIBUTES
  cifs: Fix querying and creating MF symlinks over SMB1
  cifs: Fix access_flags_to_smbopen_mode
  cifs: Fix negotiate retry functionality
  cifs: Improve handling of NetBIOS packets
  cifs: Allow to disable or force initialization of NetBIOS session
  cifs: Add a new xattr system.smb3_ntsd_owner for getting or setting owner
  cifs: Add a new xattr system.smb3_ntsd_sacl for getting or setting SACLs
  smb: client: Update IO sizes after reconnection
  smb: client: Store original IO parameters and prevent zero IO sizes
  smb:client: smb: client: Add reverse mapping from tcon to superblocks
  cifs: remove unreachable code in cifs_get_tcp_session()
  cifs: fix integer overflow in match_server()
2025-04-04 15:27:43 -07:00
Linus Torvalds
06a22366d6 Merge tag 'v6.15rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
 "Four ksmbd SMB3 server fixes, all also for stable"

* tag 'v6.15rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix null pointer dereference in alloc_preauth_hash()
  ksmbd: validate zero num_subauth before sub_auth is accessed
  ksmbd: fix overflow in dacloffset bounds check
  ksmbd: fix session use-after-free in multichannel connection
2025-04-03 16:18:06 -07:00
Christian Brauner
c0dbd11ada fs: actually hold the namespace semaphore
Don't use a scoped guard that only protects the next statement.

Use a regular guard to make sure that the namespace semaphore is held
across the whole function.

Signed-off-by: Christian Brauner <brauner@kernel.org>
Reported-by: Leon Romanovsky <leon@kernel.org>
Link: https://lore.kernel.org/all/20250401170715.GA112019@unreal/
Fixes: db04662e2f ("fs: allow detached mounts in clone_private_mount()")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-03 15:45:35 -07:00
Linus Torvalds
56770e24f6 Merge tag 'bcachefs-2025-04-03' of git://evilpiepirate.org/bcachefs
Pull more bcachefs updates from Kent Overstreet:
 "More notable fixes:

   - Fix for striping behaviour on tiering filesystems where replicas
     exceeds durability on destination target

   - Fix a race in device removal where deleting alloc info races with
     the discard worker

   - Some small stack usage improvements: this is just enough for KMSAN
     builds to not blow the stack, more is queued up for 6.16"

* tag 'bcachefs-2025-04-03' of git://evilpiepirate.org/bcachefs:
  bcachefs: Fix "journal stuck" during recovery
  bcachefs: backpointer_get_key: check for null from peek_slot()
  bcachefs: Fix null ptr deref in invalidate_one_bucket()
  bcachefs: Fix check_snapshot_exists() restart handling
  bcachefs: use nonblocking variant of print_string_as_lines in error path
  bcachefs: Fix scheduling while atomic from logging changes
  bcachefs: Add error handling for zlib_deflateInit2()
  bcachefs: add missing selection of XARRAY_MULTI
  bcachefs: bch_dev_usage_full
  bcachefs: Kill btree_iter.trans
  bcachefs: do_trace_key_cache_fill()
  bcachefs: Split up bch_dev.io_ref
  bcachefs: fix ref leak in btree_node_read_all_replicas
  bcachefs: Fix null ptr deref in bch2_write_endio()
  bcachefs: Fix field spanning write warning
  bcachefs: Fix striping behaviour
2025-04-03 15:39:47 -07:00
Linus Torvalds
bdafff62ae Merge tag '9p-for-6.15-rc1' of https://github.com/martinetd/linux
Pull 9p updates from Dominique Martinet:

 - fix handling of bogus (negative/too long) replies

 - fix crash on mkdir with ACLs (... looks like nobody is using ACLs
   with semi-recent kernels...)

 - ipv6 support for trans=tcp

 - minor concurrency fix to make syzbot happy

 - minor cleanup

* tag '9p-for-6.15-rc1' of https://github.com/martinetd/linux:
  docs: fs/9p: Add missing "not" in cache documentation
  9p: Use hashtable.h for hash_errmap
  Documentation/fs/9p: fix broken link
  9p/trans_fd: mark concurrent read and writes to p9_conn->err
  9p/net: return error on bogus (longer than requested) replies
  9p/net: fix improper handling of bogus negative read/write replies
  fs/9p: fix NULL pointer dereference on mkdir
  net/9p/fd: support ipv6 for trans=tcp
2025-04-03 15:35:46 -07:00
Linus Torvalds
204e9a18f1 Merge tag 'mm-hotfixes-stable-2025-04-02-21-57' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM hotfixes from Andrew Morton:
 "Five hotfixes. Three are cc:stable and the remainder address post-6.14
  issues or aren't considered necessary for -stable kernels.

  All patches are for MM"

* tag 'mm-hotfixes-stable-2025-04-02-21-57' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: zswap: fix crypto_free_acomp() deadlock in zswap_cpu_comp_dead()
  mm/hugetlb: move hugetlb_sysctl_init() to the __init section
  mm: page_isolation: avoid calling folio_hstate() without hugetlb_lock
  mm/hugetlb_vmemmap: fix memory loads ordering
  mm/userfaultfd: fix release hang over concurrent GUP
2025-04-03 10:47:47 -07:00
Kent Overstreet
77ad1df82b bcachefs: Fix "journal stuck" during recovery
If we crash when the journal pin fifo is completely full - i.e. we're
at the maximum number of dirty journal entries - that may put us in a
sticky situation in recovery, as journal replay will need to be able to
open new journal entries in order to get going.

bch2_fs_journal_start() already had provisions for resizing the journal
pin fifo if needed, but it needs a fudge factor to ensure there's room
for journal replay.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03 12:11:43 -04:00
Kent Overstreet
2581f89ac8 bcachefs: backpointer_get_key: check for null from peek_slot()
peek_slot() doesn't normally return bkey_s_c_null - except when we ask
for a key at a btree level that doesn't exist, which can happen here.

We might want to revisit this, but we'll have to look over all the
places where we use peek_slot() on interior nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03 12:11:43 -04:00
Kent Overstreet
39ebd74864 bcachefs: Fix null ptr deref in invalidate_one_bucket()
bch2_backpointer_get_key() returns bkey_s_c_null when the target isn't
found.

backpointer_get_key() flags the error, so there's nothing else to do
here - just skip it and move on.

Link: https://github.com/koverstreet/bcachefs/issues/847
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03 12:11:43 -04:00
Kent Overstreet
83d539b1b0 bcachefs: Fix check_snapshot_exists() restart handling
Codepaths that create entries in the snapshots btree currently call
bch2_mark_snapshot(), which updates the in-memory snapshot table, before
transaction commit.

This is because bch2_mark_snapshot() is an atomic trigger, run with
btree write locks held, and isn't allowed to fail - but it might need to
reallocate the table, hence we call it early when we're still allowed to
fail.

This is generally harmless - if we fail, we'll have left an entry in the
snapshots table around, but nothing will reference it and it'll get
overwritten if reused by another transaction.

But check_snapshot_exists(), which reconstructs snapshots when the
snapshots btree has been corrupted or lost, was erronously rechecking if
the snapshot exists inside the transaction commit loop - so on
transaction restart (in this case mem_realloced), the second iteration
would return without repairing.

This code needs some cleanup: splitting out a "maybe realloc snapshots
table" helper would have avoided this, that will be in the next patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03 12:11:43 -04:00
Bharadwaj Raju
570f5050bb bcachefs: use nonblocking variant of print_string_as_lines in error path
The inconsistency error path calls print_string_as_lines, which calls
console_lock, which is a potentially-sleeping function and so can't be
called in an atomic context.

Replace calls to it with the nonblocking variant which is safe to call.

Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03 12:11:42 -04:00
Kent Overstreet
b2ffadcc7f bcachefs: Fix scheduling while atomic from logging changes
Two fixes from the recent logging changes:

bch2_inconsistent(), bch2_fs_inconsistent() be called from interrupt
context, or with rcu_read_lock() held.

The one syzbot found is in
  bch2_bkey_pick_read_device
  bch2_dev_rcu
  bch2_fs_inconsistent

We're starting to switch to lift the printbufs up to higher levels so we
can emit better log messages and print them all in one go (avoid
garbling), so that conversion will help with spotting these in the
future; when we declare a printbuf it must be flagged if we're in an
atomic context.

Secondly, in btree_node_write_endio:

00085 BUG: sleeping function called from invalid context at include/linux/sched/mm.h:321
00085 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 618, name: bch-reclaim/fa6
00085 preempt_count: 10001, expected: 0
00085 RCU nest depth: 0, expected: 0
00085 4 locks held by bch-reclaim/fa6/618:
00085  #0: ffffff80d7ccad68 (&j->reclaim_lock){+.+.}-{4:4}, at: bch2_journal_reclaim_thread+0x84/0x198
00085  #1: ffffff80d7c84218 (&c->btree_trans_barrier){.+.+}-{0:0}, at: __bch2_trans_get+0x1c0/0x440
00085  #2: ffffff80cd3f8140 (bcachefs_btree){+.+.}-{0:0}, at: __bch2_trans_get+0x22c/0x440
00085  #3: ffffff80c3823c20 (&vblk->vqs[i].lock){-.-.}-{3:3}, at: virtblk_done+0x58/0x130
00085 irq event stamp: 328
00085 hardirqs last  enabled at (327): [<ffffffc080073a14>] finish_task_switch.isra.0+0xbc/0x2a0
00085 hardirqs last disabled at (328): [<ffffffc080971a10>] el1_interrupt+0x20/0x60
00085 softirqs last  enabled at (0): [<ffffffc08002f920>] copy_process+0x7c8/0x2118
00085 softirqs last disabled at (0): [<0000000000000000>] 0x0
00085 Preemption disabled at:
00085 [<ffffffc08003ada0>] irq_enter_rcu+0x18/0x90
00085 CPU: 8 UID: 0 PID: 618 Comm: bch-reclaim/fa6 Not tainted 6.14.0-rc6-ktest-g04630bde23e8 #18798
00085 Hardware name: linux,dummy-virt (DT)
00085 Call trace:
00085  show_stack+0x1c/0x30 (C)
00085  dump_stack_lvl+0x84/0xc0
00085  dump_stack+0x14/0x20
00085  __might_resched+0x180/0x288
00085  __might_sleep+0x4c/0x88
00085  __kmalloc_node_track_caller_noprof+0x34c/0x3e0
00085  krealloc_noprof+0x1a0/0x2d8
00085  bch2_printbuf_make_room+0x9c/0x120
00085  bch2_prt_printf+0x60/0x1b8
00085  btree_node_write_endio+0x1b0/0x2d8
00085  bio_endio+0x138/0x1f0
00085  btree_node_write_endio+0xe8/0x2d8
00085  bio_endio+0x138/0x1f0
00085  blk_update_request+0x220/0x4c0
00085  blk_mq_end_request+0x28/0x148
00085  virtblk_request_done+0x64/0xe8
00085  blk_mq_complete_request+0x34/0x40
00085  virtblk_done+0x78/0x130
00085  vring_interrupt+0x6c/0xb0
00085  __handle_irq_event_percpu+0x8c/0x2e0
00085  handle_irq_event+0x50/0xb0
00085  handle_fasteoi_irq+0xc4/0x250
00085  handle_irq_desc+0x44/0x60
00085  generic_handle_domain_irq+0x20/0x30
00085  gic_handle_irq+0x54/0xc8
00085  call_on_irq_stack+0x24/0x40

Reported-by: syzbot+c82cd2906e2f192410bb@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03 12:11:42 -04:00
Wentao Liang
9364f17ba4 bcachefs: Add error handling for zlib_deflateInit2()
In attempt_compress(), the return value of zlib_deflateInit2() needs to be
checked. A proper implementation can be found in  pstore_compress().

Add an error check and return 0 immediately if the initialzation fails.

Fixes: 986e9842fb ("bcachefs: Compression levels")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03 12:11:42 -04:00
Mathieu Desnoyers
169eae7711 rseq: Eliminate useless task_work on execve
Eliminate a useless task_work on execve by moving the call to
rseq_set_notify_resume() from sched_mm_cid_after_execve() to the error
path of bprm_execve().

The call to rseq_set_notify_resume() from sched_mm_cid_after_execve() is
pointless in the success case, because rseq_execve() will clear the rseq
pointer before returning to userspace.

sched_mm_cid_after_execve() is called from both the success and error
paths of bprm_execve(). The call to rseq_set_notify_resume() is needed
on error because the mm_cid may have changed.

Also move the rseq_execve() to right after sched_mm_cid_after_execve()
in bprm_execve().

[ mingo: Merged to a recent upstream kernel, extended the changelog. ]

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250327132945.1558783-1-mathieu.desnoyers@efficios.com
2025-04-03 13:10:47 +02:00
Steve French
827a1bd9af cifs: update internal version number
To 2.52

Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-02 20:01:14 -05:00
Pali Rohár
28753e4304 cifs: Implement is_network_name_deleted for SMB1
This change allows Linux SMB1 client to autoreconnect the share when it is
modified on server by admin operation which removes and re-adds it.

Implementation is reused from SMB2+ is_network_name_deleted callback. There
are just adjusted checks for error codes and access to struct smb_hdr.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-02 20:01:14 -05:00
David Howells
f83e10a233 cifs: Remove cifs_truncate_page() as it should be superfluous
The calls to cifs_truncate_page() should be superfluous as the places that
call it also call truncate_setsize() or cifs_setsize() and therefore
truncate_pagecache() which should also clear the tail part of the folio
containing the EOF marker.

Further, smb3_simple_falloc() calls both cifs_setsize() and
truncate_setsize() in addition to cifs_truncate_page().

Remove the superfluous calls.

This gets rid of another function referring to struct page.

[Should cifs_setsize() also set inode->i_blocks?]

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-02 20:00:39 -05:00
Linus Torvalds
94d471a4f4 Merge tag 'nfs-for-6.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
 "Bugfixes:

   - Three fixes for looping in the NFSv4 state manager delegation code

   - Fix for the NFSv4 state XDR code (Neil Brown)

   - Fix a leaked reference in nfs_lock_and_join_requests()

   - Fix a use-after-free in the delegation return code

  Features:

   - Implement the NFSv4.2 copy offload OFFLOAD_STATUS operation to
     allow monitoring of an in-progress copy

   - Add a mount option to force NFSv3/NFSv4 to use READDIRPLUS in a
     getdents() call

   - SUNRPC now allows some basic management of an existing RPC client's
     connections using sysfs

   - Improvements to the automated teardown of a NFS client when the
     container it was initiated from gets killed

   - Improvements to prevent tasks from getting stuck in a killable wait
     state after calling exit_signals()"

* tag 'nfs-for-6.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (29 commits)
  nfs: Add missing release on error in nfs_lock_and_join_requests()
  NFSv4: Check for delegation validity in nfs_start_delegation_return_locked()
  NFS: Don't allow waiting for exiting tasks
  SUNRPC: Don't allow waiting for exiting tasks
  NFSv4: Treat ENETUNREACH errors as fatal for state recovery
  NFSv4: clp->cl_cons_state < 0 signifies an invalid nfs_client
  NFSv4: Further cleanups to shutdown loops
  NFS: Shut down the nfs_client only after all the superblocks
  SUNRPC: rpc_clnt_set_transport() must not change the autobind setting
  SUNRPC: rpcbind should never reset the port to the value '0'
  pNFS/flexfiles: Report ENETDOWN as a connection error
  pNFS/flexfiles: Treat ENETUNREACH errors as fatal in containers
  NFS: Treat ENETUNREACH errors as fatal in containers
  NFS: Add a mount option to make ENETUNREACH errors fatal
  sunrpc: Add a sysfs file for one-step xprt deletion
  sunrpc: Add a sysfs file for adding a new xprt
  sunrpc: Add a sysfs files for rpc_clnt information
  sunrpc: Add a sysfs attr for xprtsec
  NFS: Add implid to sysfs
  NFS: Extend rdirplus mount option with "force|none"
  ...
2025-04-02 17:06:31 -07:00
Linus Torvalds
5e17b5c717 Merge tag 'fuse-update-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:

 - Allow connection to server to time out (Joanne Koong)

 - If server doesn't support creating a hard link, return EPERM rather
   than ENOSYS (Matt Johnston)

 - Allow file names longer than 1024 chars (Bernd Schubert)

 - Fix a possible race if request on io_uring queue is interrupted
   (Bernd Schubert)

 - Misc fixes and cleanups

* tag 'fuse-update-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: remove unneeded atomic set in uring creation
  fuse: fix uring race condition for null dereference of fc
  fuse: Increase FUSE_NAME_MAX to PATH_MAX
  fuse: Allocate only namelen buf memory in fuse_notify_
  fuse: add default_request_timeout and max_request_timeout sysctls
  fuse: add kernel-enforced timeout option for requests
  fuse: optmize missing FUSE_LINK support
  fuse: Return EPERM rather than ENOSYS from link()
  fuse: removed unused function fuse_uring_create() from header
  fuse: {io-uring} Fix a possible req cancellation race
2025-04-02 16:36:59 -07:00
Linus Torvalds
0cc5543fad Merge tag 'ntfs3_for_6.15' of https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 updates from Konstantin Komarov:

 - Fix integer overflows on 32-bit systems and in hdr_first_de()

 - Fix 'proc_info_root' leak on NTFS initialization failure

 - Remove unused functions ni_load_attr, ntfs_sb_read, ntfs_flush_inodes

 - update inode->i_mapping->a_ops on compression state

 - ensure atomicity of write operations

 - refactor ntfs_{create/remove}_{procdir,proc_root}()

* tag 'ntfs3_for_6.15' of https://github.com/Paragon-Software-Group/linux-ntfs3:
  fs/ntfs3: Remove unused ntfs_flush_inodes
  fs/ntfs3: Remove unused ntfs_sb_read
  fs/ntfs3: Remove unused ni_load_attr
  fs/ntfs3: Prevent integer overflow in hdr_first_de()
  fs/ntfs3: Fix a couple integer overflows on 32bit systems
  fs/ntfs3: Update inode->i_mapping->a_ops on compression state
  fs/ntfs3: Fix WARNING in ntfs_extend_initialized_size
  fs/ntfs3: Fix 'proc_info_root' leak when init ntfs failed
  fs/ntfs3: Factor out ntfs_{create/remove}_proc_root()
  fs/ntfs3: Factor out ntfs_{create/remove}_procdir()
  fs/ntfs3: Keep write operations atomic
2025-04-02 16:30:02 -07:00
Linus Torvalds
4b06c990c1 Merge tag 'vfs-6.15-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:

 - Add a new maintainer for configfs

 - Fix exportfs module description

 - Place flexible array memeber at the end of an internal struct in the
   mount code

 - Add new maintainer for netfslib as Jeff Layton is stepping down as
   current co-maintainer

 - Fix error handling in cachefiles_get_directory()

 - Cleanup do_notify_pidfd()

 - Fix syscall number definitions in pidfd selftests

 - Fix racy usage of fs_struct->in exec during multi-threaded exec

 - Ensure correct exit code is reported when pidfs_exit() is called from
   release_task() for a delayed thread-group leader exit

 - Fix conflicting iomap flag definitions

* tag 'vfs-6.15-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  iomap: Fix conflicting values of iomap flags
  fs: namespace: Avoid -Wflex-array-member-not-at-end warning
  MAINTAINERS: configfs: add Andreas Hindborg as maintainer
  exportfs: add module description
  exit: fix the usage of delay_group_leader->exit_code in do_notify_parent() and pidfs_exit()
  netfs: add Paulo as maintainer and remove myself as Reviewer
  cachefiles: Fix oops in vfs_mkdir from cachefiles_get_directory
  exec: fix the racy usage of fs_struct->in_exec
  selftests/pidfd: fixes syscall number defines
  pidfs: cleanup the usage of do_notify_pidfd()
2025-04-02 16:05:21 -07:00
Linus Torvalds
8a6b94032e Merge tag 'uml-for-linux-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML updates from Johannes Berg:

 - proper nofault accesses and read-only rodata

 - hostfs fix for host inode number reuse

 - fixes for host errno handling

 - various cleanups/small fixes

* tag 'uml-for-linux-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
  um: Rewrite the sigio workaround based on epoll and tgkill
  um: Prohibit the VM_CLONE flag in run_helper_thread()
  um: Switch to the pthread-based helper in sigio workaround
  um: ubd: Switch to the pthread-based helper
  um: Add pthread-based helper support
  um: x86: clean up elf specific definitions
  um: Store full CSGSFS and SS register from mcontext
  um: virt-pci: Refactor virtio_pcidev into its own module
  um: work around sched_yield not yielding in time-travel mode
  um/locking: Remove semicolon from "lock" prefix
  um: Update min_low_pfn to match changes in uml_reserved
  um: use str_yes_no() to remove hardcoded "yes" and "no"
  um: hostfs: avoid issues on inode number reuse by host
  um: Allocate vdso page pointer statically
  um: remove copy_from_kernel_nofault_allowed
  um: mark rodata read-only and implement _nofault accesses
  um: Pass the correct Rust target and options with gcc
2025-04-02 12:25:03 -07:00
Eric Biggers
a07c43e6c2 bcachefs: add missing selection of XARRAY_MULTI
When CONFIG_XARRAY_MULTI is not set, reading from a bcachefs file hits
the 'BUG_ON(order > 0);' in xas_set_order(), because it tries to insert
a large folio in the page cache.  Fix this by making bcachefs select
XARRAY_MULTI.

Fixes: be212d86b1 ("bcachefs: bs > ps support")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02 10:24:34 -04:00
Kent Overstreet
955ba7b5ea bcachefs: bch_dev_usage_full
All the fastpaths that need device usage don't need the sector totals or
fragmentation, just bucket counts.

Split bch_dev_usage up into two different versions, the normal one with
just bucket counts.

This is also a stack usage improvement, since we have a bch_dev_usage on
the stack in the allocation path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02 10:24:34 -04:00
Kent Overstreet
9180ad2e16 bcachefs: Kill btree_iter.trans
This was planned to be done ages ago, now finally completed; there are
places where we have quite a few btree_trans objects on the stack, so
this reduces stack usage somewhat.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02 10:24:34 -04:00
Kent Overstreet
1c8f4587d2 bcachefs: do_trace_key_cache_fill()
Reducing stack frame usage; this moves the printbuf out of the main
stack frame.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02 10:24:34 -04:00
Kent Overstreet
dcffc3b1ae bcachefs: Split up bch_dev.io_ref
We now have separate per device io_refs for read and write access.

This fixes a device removal bug where the discard workers were still
running while we're removing alloc info for that device.

It's also a bit of hardening; we no longer allow writes to devices that
are read-only.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02 10:24:34 -04:00
Dan Carpenter
8e5419d654 nfs: Add missing release on error in nfs_lock_and_join_requests()
Call nfs_release_request() on this error path before returning.

Fixes: c3f2235782 ("nfs: fold nfs_folio_find_and_lock_request into nfs_lock_and_join_requests")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/3aaaa3d5-1c8a-41e4-98c7-717801ddd171@stanley.mountain
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-04-02 09:53:16 -04:00
Namjae Jeon
c8b5b7c5da ksmbd: fix null pointer dereference in alloc_preauth_hash()
The Client send malformed smb2 negotiate request. ksmbd return error
response. Subsequently, the client can send smb2 session setup even
thought conn->preauth_info is not allocated.
This patch add KSMBD_SESS_NEED_SETUP status of connection to ignore
session setup request if smb2 negotiate phase is not complete.

Cc: stable@vger.kernel.org
Tested-by: Steve French <stfrench@microsoft.com>
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-26505
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-01 23:02:20 -05:00
Peter Xu
fe4cdc2c4e mm/userfaultfd: fix release hang over concurrent GUP
This patch should fix a possible userfaultfd release() hang during
concurrent GUP.

This problem was initially reported by Dimitris Siakavaras in July 2023
[1] in a firecracker use case.  Firecracker has a separate process
handling page faults remotely, and when the process releases the
userfaultfd it can race with a concurrent GUP from KVM trying to fault in
a guest page during the secondary MMU page fault process.

A similar problem was reported recently again by Jinjiang Tu in March 2025
[2], even though the race happened this time with a mlockall() operation,
which does GUP in a similar fashion.

In 2017, commit 656710a60e ("userfaultfd: non-cooperative: closing the
uffd without triggering SIGBUS") was trying to fix this issue.  AFAIU,
that fixes well the fault paths but may not work yet for GUP.  In GUP, the
issue is NOPAGE will be almost treated the same as "page fault resolved"
in faultin_page(), then the GUP will follow page again, seeing page
missing, and it'll keep going into a live lock situation as reported.

This change makes core mm return RETRY instead of NOPAGE for both the GUP
and fault paths, proactively releasing the mmap read lock.  This should
guarantee the other release thread make progress on taking the write lock
and avoid the live lock even for GUP.

When at it, rearrange the comments to make sure it's uptodate.

[1] https://lore.kernel.org/r/79375b71-db2e-3e66-346b-254c90d915e2@cslab.ece.ntua.gr
[2] https://lore.kernel.org/r/20250307072133.3522652-1-tujinjiang@huawei.com

Link: https://lkml.kernel.org/r/20250312145131.1143062-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Jinjiang Tu <tujinjiang@huawei.com>
Cc: Dimitris Siakavaras <jimsiak@cslab.ece.ntua.gr>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:14:42 -07:00
Linus Torvalds
2cd5769fb0 Merge tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updatesk from Greg KH:
 "Here is the big set of driver core updates for 6.15-rc1. Lots of stuff
  happened this development cycle, including:

   - kernfs scaling changes to make it even faster thanks to rcu

   - bin_attribute constify work in many subsystems

   - faux bus minor tweaks for the rust bindings

   - rust binding updates for driver core, pci, and platform busses,
     making more functionaliy available to rust drivers. These are all
     due to people actually trying to use the bindings that were in
     6.14.

   - make Rafael and Danilo full co-maintainers of the driver core
     codebase

   - other minor fixes and updates"

* tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (52 commits)
  rust: platform: require Send for Driver trait implementers
  rust: pci: require Send for Driver trait implementers
  rust: platform: impl Send + Sync for platform::Device
  rust: pci: impl Send + Sync for pci::Device
  rust: platform: fix unrestricted &mut platform::Device
  rust: pci: fix unrestricted &mut pci::Device
  rust: device: implement device context marker
  rust: pci: use to_result() in enable_device_mem()
  MAINTAINERS: driver core: mark Rafael and Danilo as co-maintainers
  rust/kernel/faux: mark Registration methods inline
  driver core: faux: only create the device if probe() succeeds
  rust/faux: Add missing parent argument to Registration::new()
  rust/faux: Drop #[repr(transparent)] from faux::Registration
  rust: io: fix devres test with new io accessor functions
  rust: io: rename `io::Io` accessors
  kernfs: Move dput() outside of the RCU section.
  efi: rci2: mark bin_attribute as __ro_after_init
  rapidio: constify 'struct bin_attribute'
  firmware: qemu_fw_cfg: constify 'struct bin_attribute'
  powerpc/perf/hv-24x7: Constify 'struct bin_attribute'
  ...
2025-04-01 11:02:03 -07:00
Linus Torvalds
d6b02199cd Merge tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:

 - The series "powerpc/crash: use generic crashkernel reservation" from
   Sourabh Jain changes powerpc's kexec code to use more of the generic
   layers.

 - The series "get_maintainer: report subsystem status separately" from
   Vlastimil Babka makes some long-requested improvements to the
   get_maintainer output.

 - The series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the
   ucount code.

 - The series "reboot: support runtime configuration of emergency
   hw_protection action" from Ahmad Fatoum improves the ability for a
   driver to perform an emergency system shutdown or reboot.

 - The series "Converge on using secs_to_jiffies() part two" from Easwar
   Hariharan performs further migrations from msecs_to_jiffies() to
   secs_to_jiffies().

 - The series "lib/interval_tree: add some test cases and cleanup" from
   Wei Yang permits more userspace testing of kernel library code, adds
   some more tests and performs some cleanups.

 - The series "hung_task: Dump the blocking task stacktrace" from Masami
   Hiramatsu arranges for the hung_task detector to dump the stack of
   the blocking task and not just that of the blocked task.

 - The series "resource: Split and use DEFINE_RES*() macros" from Andy
   Shevchenko provides some cleanups to the resource definition macros.

 - Plus the usual shower of singleton patches - please see the
   individual changelogs for details.

* tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
  mailmap: consolidate email addresses of Alexander Sverdlin
  fs/procfs: fix the comment above proc_pid_wchan()
  relay: use kasprintf() instead of fixed buffer formatting
  resource: replace open coded variant of DEFINE_RES()
  resource: replace open coded variants of DEFINE_RES_*_NAMED()
  resource: replace open coded variant of DEFINE_RES_NAMED_DESC()
  resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED()
  samples: add hung_task detector mutex blocking sample
  hung_task: show the blocker task if the task is hung on mutex
  kexec_core: accept unaccepted kexec segments' destination addresses
  watchdog/perf: optimize bytes copied and remove manual NUL-termination
  lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap()
  lib/interval_tree: skip the check before go to the right subtree
  lib/interval_tree: add test case for span iteration
  lib/interval_tree: add test case for interval_tree_iter_xxx() helpers
  lib/rbtree: add random seed
  lib/rbtree: split tests
  lib/rbtree: enable userland test suite for rbtree related data structure
  checkpatch: describe --min-conf-desc-length
  scripts/gdb/symbols: determine KASLR offset on s390
  ...
2025-04-01 10:06:52 -07:00
Linus Torvalds
eb0ece1602 Merge tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - The series "Enable strict percpu address space checks" from Uros
   Bizjak uses x86 named address space qualifiers to provide
   compile-time checking of percpu area accesses.

   This has caused a small amount of fallout - two or three issues were
   reported. In all cases the calling code was found to be incorrect.

 - The series "Some cleanup for memcg" from Chen Ridong implements some
   relatively monir cleanups for the memcontrol code.

 - The series "mm: fixes for device-exclusive entries (hmm)" from David
   Hildenbrand fixes a boatload of issues which David found then using
   device-exclusive PTE entries when THP is enabled. More work is
   needed, but this makes thins better - our own HMM selftests now
   succeed.

 - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed
   remove the z3fold and zbud implementations. They have been deprecated
   for half a year and nobody has complained.

 - The series "mm: further simplify VMA merge operation" from Lorenzo
   Stoakes implements numerous simplifications in this area. No runtime
   effects are anticipated.

 - The series "mm/madvise: remove redundant mmap_lock operations from
   process_madvise()" from SeongJae Park rationalizes the locking in the
   madvise() implementation. Performance gains of 20-25% were observed
   in one MADV_DONTNEED microbenchmark.

 - The series "Tiny cleanup and improvements about SWAP code" from
   Baoquan He contains a number of touchups to issues which Baoquan
   noticed when working on the swap code.

 - The series "mm: kmemleak: Usability improvements" from Catalin
   Marinas implements a couple of improvements to the kmemleak
   user-visible output.

 - The series "mm/damon/paddr: fix large folios access and schemes
   handling" from Usama Arif provides a couple of fixes for DAMON's
   handling of large folios.

 - The series "mm/damon/core: fix wrong and/or useless damos_walk()
   behaviors" from SeongJae Park fixes a few issues with the accuracy of
   kdamond's walking of DAMON regions.

 - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo
   Stoakes changes the interaction between framebuffer deferred-io and
   core MM. No functional changes are anticipated - this is preparatory
   work for the future removal of page structure fields.

 - The series "mm/damon: add support for hugepage_size DAMOS filter"
   from Usama Arif adds a DAMOS filter which permits the filtering by
   huge page sizes.

 - The series "mm: permit guard regions for file-backed/shmem mappings"
   from Lorenzo Stoakes extends the guard region feature from its
   present "anon mappings only" state. The feature now covers shmem and
   file-backed mappings.

 - The series "mm: batched unmap lazyfree large folios during
   reclamation" from Barry Song cleans up and speeds up the unmapping
   for pte-mapped large folios.

 - The series "reimplement per-vma lock as a refcount" from Suren
   Baghdasaryan puts the vm_lock back into the vma. Our reasons for
   pulling it out were largely bogus and that change made the code more
   messy. This patchset provides small (0-10%) improvements on one
   microbenchmark.

 - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and
   improves" from SeongJae Park does some maintenance work on the DAMON
   docs.

 - The series "hugetlb/CMA improvements for large systems" from Frank
   van der Linden addresses a pile of issues which have been observed
   when using CMA on large machines.

 - The series "mm/damon: introduce DAMOS filter type for unmapped pages"
   from SeongJae Park enables users of DMAON/DAMOS to filter my the
   page's mapped/unmapped status.

 - The series "zsmalloc/zram: there be preemption" from Sergey
   Senozhatsky teaches zram to run its compression and decompression
   operations preemptibly.

 - The series "selftests/mm: Some cleanups from trying to run them" from
   Brendan Jackman fixes a pile of unrelated issues which Brendan
   encountered while runnimg our selftests.

 - The series "fs/proc/task_mmu: add guard region bit to pagemap" from
   Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
   determine whether a particular page is a guard page.

 - The series "mm, swap: remove swap slot cache" from Kairui Song
   removes the swap slot cache from the allocation path - it simply
   wasn't being effective.

 - The series "mm: cleanups for device-exclusive entries (hmm)" from
   David Hildenbrand implements a number of unrelated cleanups in this
   code.

 - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual
   implements a number of preparatoty cleanups to the GENERIC_PTDUMP
   Kconfig logic.

 - The series "mm/damon: auto-tune aggregation interval" from SeongJae
   Park implements a feedback-driven automatic tuning feature for
   DAMON's aggregation interval tuning.

 - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in
   powerpc, sparc and x86 lazy MMU implementations. Ryan did this in
   preparation for implementing lazy mmu mode for arm64 to optimize
   vmalloc.

 - The series "mm/page_alloc: Some clarifications for migratetype
   fallback" from Brendan Jackman reworks some commentary to make the
   code easier to follow.

 - The series "page_counter cleanup and size reduction" from Shakeel
   Butt cleans up the page_counter code and fixes a size increase which
   we accidentally added late last year.

 - The series "Add a command line option that enables control of how
   many threads should be used to allocate huge pages" from Thomas
   Prescher does that. It allows the careful operator to significantly
   reduce boot time by tuning the parallalization of huge page
   initialization.

 - The series "Fix calculations in trace_balance_dirty_pages() for cgwb"
   from Tang Yizhou fixes the tracing output from the dirty page
   balancing code.

 - The series "mm/damon: make allow filters after reject filters useful
   and intuitive" from SeongJae Park improves the handling of allow and
   reject filters. Behaviour is made more consistent and the documention
   is updated accordingly.

 - The series "Switch zswap to object read/write APIs" from Yosry Ahmed
   updates zswap to the new object read/write APIs and thus permits the
   removal of some legacy code from zpool and zsmalloc.

 - The series "Some trivial cleanups for shmem" from Baolin Wang does as
   it claims.

 - The series "fs/dax: Fix ZONE_DEVICE page reference counts" from
   Alistair Popple regularizes the weird ZONE_DEVICE page refcount
   handling in DAX, permittig the removal of a number of special-case
   checks.

 - The series "refactor mremap and fix bug" from Lorenzo Stoakes is a
   preparatoty refactoring and cleanup of the mremap() code.

 - The series "mm: MM owner tracking for large folios (!hugetlb) +
   CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
   which we determine whether a large folio is known to be mapped
   exclusively into a single MM.

 - The series "mm/damon: add sysfs dirs for managing DAMOS filters based
   on handling layers" from SeongJae Park adds a couple of new sysfs
   directories to ease the management of DAMON/DAMOS filters.

 - The series "arch, mm: reduce code duplication in mem_init()" from
   Mike Rapoport consolidates many per-arch implementations of
   mem_init() into code generic code, where that is practical.

 - The series "mm/damon/sysfs: commit parameters online via
   damon_call()" from SeongJae Park continues the cleaning up of sysfs
   access to DAMON internal data.

 - The series "mm: page_ext: Introduce new iteration API" from Luiz
   Capitulino reworks the page_ext initialization to fix a boot-time
   crash which was observed with an unusual combination of compile and
   cmdline options.

 - The series "Buddy allocator like (or non-uniform) folio split" from
   Zi Yan reworks the code to split a folio into smaller folios. The
   main benefit is lessened memory consumption: fewer post-split folios
   are generated.

 - The series "Minimize xa_node allocation during xarry split" from Zi
   Yan reduces the number of xarray xa_nodes which are generated during
   an xarray split.

 - The series "drivers/base/memory: Two cleanups" from Gavin Shan
   performs some maintenance work on the drivers/base/memory code.

 - The series "Add tracepoints for lowmem reserves, watermarks and
   totalreserve_pages" from Martin Liu adds some more tracepoints to the
   page allocator code.

 - The series "mm/madvise: cleanup requests validations and
   classifications" from SeongJae Park cleans up some warts which
   SeongJae observed during his earlier madvise work.

 - The series "mm/hwpoison: Fix regressions in memory failure handling"
   from Shuai Xue addresses two quite serious regressions which Shuai
   has observed in the memory-failure implementation.

 - The series "mm: reliable huge page allocator" from Johannes Weiner
   makes huge page allocations cheaper and more reliable by reducing
   fragmentation.

 - The series "Minor memcg cleanups & prep for memdescs" from Matthew
   Wilcox is preparatory work for the future implementation of memdescs.

 - The series "track memory used by balloon drivers" from Nico Pache
   introduces a way to track memory used by our various balloon drivers.

 - The series "mm/damon: introduce DAMOS filter type for active pages"
   from Nhat Pham permits users to filter for active/inactive pages,
   separately for file and anon pages.

 - The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia
   separates the proactive reclaim statistics from the direct reclaim
   statistics.

 - The series "mm/vmscan: don't try to reclaim hwpoison folio" from
   Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim
   code.

* tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits)
  mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex()
  x86/mm: restore early initialization of high_memory for 32-bits
  mm/vmscan: don't try to reclaim hwpoison folio
  mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper
  cgroup: docs: add pswpin and pswpout items in cgroup v2 doc
  mm: vmscan: split proactive reclaim statistics from direct reclaim statistics
  selftests/mm: speed up split_huge_page_test
  selftests/mm: uffd-unit-tests support for hugepages > 2M
  docs/mm/damon/design: document active DAMOS filter type
  mm/damon: implement a new DAMOS filter type for active pages
  fs/dax: don't disassociate zero page entries
  MM documentation: add "Unaccepted" meminfo entry
  selftests/mm: add commentary about 9pfs bugs
  fork: use __vmalloc_node() for stack allocation
  docs/mm: Physical Memory: Populate the "Zones" section
  xen: balloon: update the NR_BALLOON_PAGES state
  hv_balloon: update the NR_BALLOON_PAGES state
  balloon_compaction: update the NR_BALLOON_PAGES state
  meminfo: add a per node counter for balloon drivers
  mm: remove references to folio in __memcg_kmem_uncharge_page()
  ...
2025-04-01 09:29:18 -07:00
Pali Rohár
e97aec7889 cifs: Do not add FILE_READ_ATTRIBUTES when using GENERIC_READ/EXECUTE/ALL
Individual bits GENERIC_READ, GENERIC_EXECUTE and GENERIC_ALL have meaning
which includes also access right for FILE_READ_ATTRIBUTES. So specifying
FILE_READ_ATTRIBUTES bit together with one of those GENERIC (except
GENERIC_WRITE) does not do anything.

This change prevents calling additional (fallback) code and sending more
requests without FILE_READ_ATTRIBUTES when the primary request fails on
-EACCES, as it is not needed at all.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-01 04:58:09 -05:00
Pali Rohár
b07687edee cifs: Improve SMB2+ stat() to work also without FILE_READ_ATTRIBUTES
If SMB2_OP_QUERY_INFO (called when POSIX extensions are not used) failed
with STATUS_ACCESS_DENIED then it means that caller does not have
permission to open the path with FILE_READ_ATTRIBUTES access and therefore
cannot issue SMB2_OP_QUERY_INFO command.

This will result in the -EACCES error from stat() sycall.

There is an alternative way how to query limited information about path but
still suitable for stat() syscall. SMB2 OPEN/CREATE operation returns in
its successful response subset of query information.

So try to open the path without FILE_READ_ATTRIBUTES but with
MAXIMUM_ALLOWED access which will grant the maximum possible access to the
file and the response will contain required query information for stat()
syscall.

This will improve smb2_query_path_info() to query also files which do not
grant FILE_READ_ATTRIBUTES access to caller.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-01 04:58:05 -05:00
Pali Rohár
e255612b5e cifs: Add fallback for SMB2 CREATE without FILE_READ_ATTRIBUTES
Some operations, like WRITE, does not require FILE_READ_ATTRIBUTES access.

So when FILE_READ_ATTRIBUTES is not explicitly requested for
smb2_open_file() then first try to do SMB2 CREATE with FILE_READ_ATTRIBUTES
access (like it was before) and then fallback to SMB2 CREATE without
FILE_READ_ATTRIBUTES access (less common case).

This change allows to complete WRITE operation to a file when it does not
grant FILE_READ_ATTRIBUTES permission and its parent directory does not
grant READ_DATA permission (parent directory READ_DATA is implicit grant of
child FILE_READ_ATTRIBUTES permission).

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-01 04:58:00 -05:00
Pali Rohár
4236ac9fe5 cifs: Fix querying and creating MF symlinks over SMB1
Old SMB1 servers without CAP_NT_SMBS do not support CIFS_open() function
and instead SMBLegacyOpen() needs to be used. This logic is already handled
in cifs_open_file() function, which is server->ops->open callback function.

So for querying and creating MF symlinks use open callback function instead
of CIFS_open() function directly.

This change fixes querying and creating new MF symlinks on Windows 98.
Currently cifs_query_mf_symlink() is not able to detect MF symlink and
cifs_create_mf_symlink() is failing with EIO error.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-01 01:54:17 -05:00
Pali Rohár
6aa9f1c9cd cifs: Fix access_flags_to_smbopen_mode
When converting access_flags to SMBOPEN mode, check for all possible access
flags, not only GENERIC_READ and GENERIC_WRITE flags.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-01 01:54:17 -05:00
Pali Rohár
e94e882a6d cifs: Fix negotiate retry functionality
SMB negotiate retry functionality in cifs_negotiate() is currently broken
and does not work when doing socket reconnect. Caller of this function,
which is cifs_negotiate_protocol() requires that tcpStatus after successful
execution of negotiate callback stay in CifsInNegotiate. But if the
CIFSSMBNegotiate() called from cifs_negotiate() fails due to connection
issues then tcpStatus is changed as so repeated CIFSSMBNegotiate() call
does not help.

Fix this problem by moving retrying code from negotiate callback (which is
either cifs_negotiate() or smb2_negotiate()) to cifs_negotiate_protocol()
which is caller of those callbacks. This allows to properly handle and
implement correct transistions between tcpStatus states as function
cifs_negotiate_protocol() already handles it.

With this change, cifs_negotiate_protocol() now handles also -EAGAIN error
set by the RFC1002_NEGATIVE_SESSION_RESPONSE processing after reconnecting
with NetBIOS session.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-01 01:54:17 -05:00
Pali Rohár
665e187948 cifs: Improve handling of NetBIOS packets
Now all NetBIOS session logic is handled in ip_rfc1001_connect() function,
so cleanup is_smb_response() function which contains generic handling of
incoming SMB packets. Note that function is_smb_response() is not used
directly or indirectly (e.g. over cifs_demultiplex_thread() by
ip_rfc1001_connect() function.

Except the Negative Session Response and the Session Keep Alive packet, the
cifs_demultiplex_thread() should not receive any NetBIOS session packets.
And Session Keep Alive packet may be received only when the NetBIOS session
was established by ip_rfc1001_connect() function. So treat any such packet
as error and schedule reconnect.

Negative Session Response packet is returned from Windows SMB server (from
Windows 98 and also from Windows Server 2022) if client sent over port 139
SMB negotiate request without previously establishing a NetBIOS session.
The common scenario is that Negative Session Response packet is returned
for the SMB negotiate packet, which is the first one which SMB client
sends (if it is not establishing a NetBIOS session).

Note that server port 139 may be forwarded and mapped between virtual
machines to different number. And Linux SMB client do not call function
ip_rfc1001_connect() when prot is not 139. So nowadays when using port
mapping or port forwarding between VMs, it is not so uncommon to see this
error.

Currently the logic on Negative Session Response packet changes server port
to 445 and force reconnection. But this logic does not work when using
non-standard port numbers and also does not help if the server on specified
port is requiring establishing a NetBIOS session.

Fix this Negative Session Response logic and instead of changing server
port (on which server does not have to listen), force reconnection with
establishing a NetBIOS session.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-01 01:54:17 -05:00
Pali Rohár
7d14dd683b cifs: Allow to disable or force initialization of NetBIOS session
Currently SMB client always tries to initialize NetBIOS session when the
server port is 139. This is useful for default cases, but nowadays when
using non-standard routing or testing between VMs, it is common that
servers are listening on non-standard ports.

So add a new mount option -o nbsessinit and -o nonbsessinit which either
forces initialization or disables initialization regardless of server port
number.

This allows Linux SMB client to connect to older SMB1 server listening on
non-standard port, which requires initialization of NetBIOS session, by
using additional mount options -o port= and -o nbsessinit.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-01 01:54:17 -05:00
Pali Rohár
b1a37df6ba cifs: Add a new xattr system.smb3_ntsd_owner for getting or setting owner
Changing owner is controlled by DACL permission WRITE_OWNER. Changing DACL
itself is controlled by DACL permisssion WRITE_DAC. Owner of the file has
implicit WRITE_DAC permission even when it is not explicitly granted for
owner by DACL.

Reading DACL or owner is controlled only by one permission READ_CONTROL.
WRITE_OWNER permission can be bypassed by the SeTakeOwnershipPrivilege,
which is by default available for local administrators.

So if the local administrator wants to access some file to which does not
have access, it is required to first change owner to ourself and then
change DACL permissions.

Currently Linux SMB client does not support this because client does not
provide a way to change owner without touching DACL permissions.

Fix this problem by introducing a new xattr "system.smb3_ntsd_owner" for
setting/changing only owner part of the security descriptor.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-01 01:54:17 -05:00
Pali Rohár
bf782ada45 cifs: Add a new xattr system.smb3_ntsd_sacl for getting or setting SACLs
Access to SACL part of SMB security descriptor is granted by SACL privilege
which by default is accessible only for local administrator. But it can be
granted to any other user by local GPO or AD. SACL access is not granted by
DACL permissions and therefore is it possible that some user would not have
access to DACLs of some file, but would have access to SACLs of all files.
So it means that for accessing SACLs (either getting or setting) in some
cases requires not touching or asking for DACLs.

Currently Linux SMB client does not allow to get or set SACLs without
touching DACLs. Which means that user without DACL access is not able to
get or set SACLs even if it has access to SACLs.

Fix this problem by introducing a new xattr "system.smb3_ntsd_sacl" for
accessing only SACLs part of the security descriptor (therefore without
DACLs and OWNER/GROUP).

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-01 01:54:17 -05:00
Kent Overstreet
f1350c2c74 bcachefs: fix ref leak in btree_node_read_all_replicas
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-01 02:47:41 -04:00
Norbert Szetei
bf21e29d78 ksmbd: validate zero num_subauth before sub_auth is accessed
Access psid->sub_auth[psid->num_subauth - 1] without checking
if num_subauth is non-zero leads to an out-of-bounds read.
This patch adds a validation step to ensure num_subauth != 0
before sub_auth is accessed.

Cc: stable@vger.kernel.org
Signed-off-by: Norbert Szetei <norbert@doyensec.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-01 00:04:21 -05:00
Norbert Szetei
beff0bc9d6 ksmbd: fix overflow in dacloffset bounds check
The dacloffset field was originally typed as int and used in an
unchecked addition, which could overflow and bypass the existing
bounds check in both smb_check_perm_dacl() and smb_inherit_dacl().

This could result in out-of-bounds memory access and a kernel crash
when dereferencing the DACL pointer.

This patch converts dacloffset to unsigned int and uses
check_add_overflow() to validate access to the DACL.

Cc: stable@vger.kernel.org
Signed-off-by: Norbert Szetei <norbert@doyensec.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-01 00:04:21 -05:00
Namjae Jeon
fa4cdb8cbc ksmbd: fix session use-after-free in multichannel connection
There is a race condition between session setup and
ksmbd_sessions_deregister. The session can be freed before the connection
is added to channel list of session.
This patch check reference count of session before freeing it.

Cc: stable@vger.kernel.org
Reported-by: Sean Heelan <seanheelan@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-01 00:04:21 -05:00
Wang Zhaolong
764da2fff3 smb: client: Update IO sizes after reconnection
When a SMB connection is reset and reconnected, the negotiated IO
parameters (rsize/wsize) can become out of sync with the server's
current capabilities. This can lead to suboptimal performance or
even IO failures if the server's limits have changed.

This patch implements automatic IO size renegotiation:
1. Adds cifs_renegotiate_iosize() function to update all superblocks
   associated with a tree connection
2. Updates each mount's rsize/wsize based on current server capabilities
3. Calls this function after successful tree connection reconnection

With this change, all mount points will automatically maintain optimal
and reliable IO parameters after network disruptions, using the
bidirectional mapping added in previous patches.

This completes the series improving connection resilience by keeping
mount parameters synchronized with server capabilities.

Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2025-03-31 21:12:31 -05:00