linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-16 07:51:31 -04:00

Author	SHA1	Message	Date
Tejun Heo	4fda9f0e7c	sched_ext: Guard scx_dsq_move() against NULL kit->dsq after failed iter_new bpf_iter_scx_dsq_new() clears kit->dsq on failure and bpf_iter_scx_dsq_{next,destroy}() guard against that. scx_dsq_move() doesn't - it dereferences kit->dsq immediately, so a BPF program that calls scx_bpf_dsq_move[_vtime]() after a failed iter_new oopses the kernel. Return false if kit->dsq is NULL. Fixes: `4c30f5ce4f` ("sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()") Cc: stable@vger.kernel.org # v6.12+ Reported-by: Chris Mason <clm@meta.com> Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>	2026-04-24 14:31:35 -10:00
Tejun Heo	411d3ef1a7	sched_ext: Unregister sub_kset on scheduler disable When ops.sub_attach is set, scx_alloc_and_add_sched() creates sub_kset as a child of &sch->kobj, which pins the parent with its own reference. The disable paths never call kset_unregister(), so the final kobject_put() in bpf_scx_unreg() leaves a stale reference and scx_kobj_release() never runs, leaking the whole struct scx_sched on every load/unload cycle. Unregister sub_kset in scx_root_disable() and scx_sub_disable() before kobject_del(&sch->kobj). Fixes: `ebeca1f930` ("sched_ext: Introduce cgroup sub-sched support") Reported-by: Chris Mason <clm@meta.com> Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>	2026-04-24 14:31:35 -10:00
Tejun Heo	bd2d76455b	sched_ext: Defer scx_hardlockup() out of NMI scx_hardlockup() runs from NMI and eventually calls scx_claim_exit(), which takes scx_sched_lock. scx_sched_lock isn't NMI-safe and grabbing it from NMI context can lead to deadlocks. The hardlockup handler is best-effort recovery and the disable path it triggers runs off of irq_work anyway. Move the handle_lockup() call into an irq_work so it runs in IRQ context. Fixes: `ebeca1f930` ("sched_ext: Introduce cgroup sub-sched support") Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>	2026-04-24 14:13:22 -10:00
Richard Cheng	510a270554	sched_ext: sync disable_irq_work in bpf_scx_unreg() When unregistering my self-written scx scheduler, the following panic occurs: [ 229.923133] Kernel text patching generated an invalid instruction at 0xffff80009bc2c1f8! [ 229.923146] Internal error: Oops - BRK: 00000000f2000100 [#1] SMP [ 230.077871] CPU: 48 UID: 0 PID: 1760 Comm: kworker/u583:7 Not tainted 7.0.0+ #3 PREEMPT(full) [ 230.086677] Hardware name: NVIDIA GB200 NVL/P3809-BMC, BIOS 02.05.12 20251107 [ 230.093972] Workqueue: events_unbound bpf_map_free_deferred [ 230.099675] Sched_ext: invariant_0.1.0_aarch64_unknown_linux_gnu_debug (disabling), task: runnable_at=-174ms [ 230.116843] pc : 0xffff80009bc2c1f8 [ 230.120406] lr : dequeue_task_scx+0x270/0x2d0 [ 230.217749] Call trace: [ 230.228515] 0xffff80009bc2c1f8 (P) [ 230.232077] dequeue_task+0x84/0x188 [ 230.235728] sched_change_begin+0x1dc/0x250 [ 230.240000] __set_cpus_allowed_ptr_locked+0x17c/0x240 [ 230.245250] __set_cpus_allowed_ptr+0x74/0xf0 [ 230.249701] ___migrate_enable+0x4c/0xa0 [ 230.253707] bpf_map_free_deferred+0x1a4/0x1b0 [ 230.258246] process_one_work+0x184/0x540 [ 230.262342] worker_thread+0x19c/0x348 [ 230.266170] kthread+0x13c/0x150 [ 230.269465] ret_from_fork+0x10/0x20 [ 230.281393] Code: d4202000 d4202000 d4202000 d4202000 (d4202000) [ 230.287621] ---[ end trace 0000000000000000 ]--- [ 231.160046] Kernel panic - not syncing: Oops - BRK: Fatal exception in interrupt. The root cause is that the JIT page backing ops->quiescent() is freed before all callers of that function have stopped. The expected ordering during teardown is: bitmap_zero(sch->has_op) + synchronize_rcu() -> guarantees no CPU will ever call sch->ops.* again -> only THEN free the BPF struct_ops JIT page. bpf_scx_unreg() is supposed to enforce this order, but after commit `f4a6c506d1` ("sched_ext: Always bounce scx_disable() through irq_work"), disable_work is no longer queued directly, causing kthread_flush_work() to be a noop. Thus, the caller drops the struct_ops map too early and the JIT page is poisoned with AARCH64_BREAK_FAULT before disable_workfn ever executes. So the subsequent dequeue_task() still sees SCX_HAS_OP(sch, quiescent) as true and calls ops.quiescent, which hits the poisoned page and triggers a BRK panic. Add a helper scx_flush_disable_work() so that future use cases that want to flush disable_work can use it. Also amend the calls in scx_root_enable_workfn() and scx_sub_enable_workfn(), which have a similar pattern in the error path. Fixes: `f4a6c506d1` ("sched_ext: Always bounce scx_disable() through irq_work") Signed-off-by: Richard Cheng <icheng@nvidia.com> Reviewed-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Cheng-Yang Chou <yphbchou0911@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-04-24 07:26:48 -10:00
zhidao su	4e3d7c89e1	sched_ext: Fix local_dsq_post_enq() to use task's scheduler in sub-sched local_dsq_post_enq() calls call_task_dequeue() with scx_root instead of the scheduler instance actually managing the task. When CONFIG_EXT_SUB_SCHED is enabled, tasks may be managed by a sub-scheduler whose ops.dequeue() callback differs from root's. Using scx_root causes the wrong scheduler's ops.dequeue() to be consulted: sub-sched tasks dispatched to a local DSQ via scx_bpf_dsq_move_to_local() will have SCX_TASK_IN_CUSTODY cleared but the sub-scheduler's ops.dequeue() is never invoked, violating the custody exit semantics. Fix by adding a 'struct scx_sched *sch' parameter to local_dsq_post_enq() and move_local_task_to_local_dsq(), and propagating the correct scheduler from their callers dispatch_enqueue(), move_task_between_dsqs(), and consume_dispatch_q(). This is consistent with dispatch_enqueue()'s non-local path which already passes 'sch' directly to call_task_dequeue() for global/bypass DSQs. Fixes: `ebf1ccff79` ("sched_ext: Fix ops.dequeue() semantics") Signed-off-by: zhidao su <suzhidao@xiaomi.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-04-23 06:36:56 -10:00
Tejun Heo	05909810a9	tools/sched_ext: scx_qmap: Silence task_ctx lookup miss scx_fork() dispatches ops.init_task to exactly one scheduler - the one owning the forking task's cgroup. A task forked inside a sub-scheduler's cgroup is init'd into the sub only; the root scheduler has no task_ctx entry for it. When that task later appears as @prev in the root's qmap_dispatch() (or flows through core-sched comparison via task_qdist), the bpf_task_storage_get() legitimately misses. qmap treated those misses as fatal via scx_bpf_error("task_ctx lookup failed") and aborted the scheduler as soon as the first cross-sched task hit the root. Drop the error in the sites where the miss is legitimate: lookup_task_ctx() (helper; callers already check for NULL), qmap_dispatch()'s @prev branch (bookkeeping-only), task_qdist() (returns 0 which makes the comparison a no-op), and qmap_select_cpu() (returns prev_cpu as a no-op fallback instead of -ESRCH). The existing scx_error was a paranoid guard from the pre-sub-sched world where every task was owned by the one and only scheduler. v2: qmap_select_cpu() returns prev_cpu on NULL instead of -ESRCH, so the root scheduler doesn't error on cross-sched tasks that pass through it (Andrea Righi). Fixes: `4f8b122848` ("sched_ext: Add basic building blocks for nested sub-scheduler dispatching") Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>	2026-04-21 06:18:58 -10:00
Tejun Heo	4fe9852927	rhashtable: Bounce deferred worker kick through irq_work Inserts past 75% load call schedule_work(&ht->run_work) to kick an async resize. If a caller holds a raw spinlock (e.g. an insecure_elasticity user), schedule_work() under that lock records: caller_lock -> pool->lock -> pi_lock -> rq->__lock. A cycle forms if any of these locks is acquired in the reverse direction elsewhere. sched_ext, the only current insecure_elasticity user, hits this: it holds scx_sched_lock across rhashtable inserts of sub-schedulers, while scx_bypass() takes rq->__lock -> scx_sched_lock. Exercising the resize path produces: Chain exists of: &pool->lock --> &rq->__lock --> scx_sched_lock. Bounce the kick from the insert paths through irq_work so schedule_work() runs from hard IRQ context with the caller's lock no longer held. rht_deferred_worker()'s self-rearm on error stays on schedule_work(&ht->run_work) - the worker runs in process context with no caller lock held, and keeping the self-requeue on @run_work lets cancel_work_sync() in rhashtable_free_and_destroy() drain it. v3: Keep rht_deferred_worker()'s self-rearm on schedule_work(&run_work). Routing it through irq_work in v2 broke cancel_work_sync()'s self-requeue handling - an irq_work queued after irq_work_sync() returned but while cancel_work_sync() was still waiting could fire post-teardown. v2: Bounce unconditionally instead of gating on insecure_elasticity, as suggested by Herbert. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>	2026-04-20 20:10:50 -10:00
Cheng-Yang Chou	5897ca15d2	selftests/sched_ext: Add non_scx_kfunc_deny test Verify that the BPF verifier rejects a non-SCX struct_ops program (tcp_congestion_ops) that attempts to call an SCX kfunc (scx_bpf_kick_cpu). The test expects the load to fail with -EACCES from scx_kfunc_context_filter. Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-04-20 07:57:29 -10:00
Cheng-Yang Chou	2d2b026c3e	sched_ext: Deny SCX kfuncs to non-SCX struct_ops programs scx_kfunc_context_filter() currently allows non-SCX struct_ops programs (e.g. tcp_congestion_ops) to call SCX unlocked kfuncs. This is wrong for two reasons: - It is semantically incorrect: a TCP congestion control program has no business calling SCX kfuncs such as scx_bpf_kick_cpu(). - With CONFIG_EXT_SUB_SCHED=y, kfuncs like scx_bpf_kick_cpu() call scx_prog_sched(aux), which invokes bpf_prog_get_assoc_struct_ops(aux) and casts the result to struct sched_ext_ops * before reading ops->priv. For a non-SCX struct_ops program the returned pointer is the kdata of that struct_ops type, which is far smaller than sched_ext_ops, making the read an out-of-bounds access (confirmed with KASAN). Extend the filter to cover scx_kfunc_set_any and scx_kfunc_set_idle as well, and deny all SCX kfuncs for any struct_ops program that is not the SCX struct_ops. This addresses both issues: the semantic contract is enforced at the verifier level, and the runtime out-of-bounds access becomes unreachable. Fixes: `d1d3c1c6ae` ("sched_ext: Add verifier-time kfunc context filter") Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-04-20 07:57:29 -10:00
Tejun Heo	87019cb6c2	sched_ext: Mark scx_sched_hash insecure_elasticity scx_sched_hash is inserted into under scx_sched_lock (raw_spinlock_irq) in scx_link_sched(). rhashtable's sync grow path calls get_random_u32() and does a GFP_ATOMIC allocation; both acquire regular spinlocks, which is unsafe under raw_spinlock_t. Set insecure_elasticity to skip the sync grow. v2: - Dropped dsq_hash changes. Insertion is not under raw_spin_lock. - Switched from no_sync_grow flag to insecure_elasticity. Fixes: `25037af712` ("sched_ext: Add rhashtable lookup for sub-schedulers") Signed-off-by: Tejun Heo <tj@kernel.org>	2026-04-19 05:47:28 -10:00
Herbert Xu	73bd122778	rhashtable: Restore insecure_elasticity toggle Some users of rhashtable cannot handle insertion failures, and are happy to accept the consequences of a hash table having very long chains. Restore the insecure_elasticity toggle for these users. In addition to disabling the chain length checks, this also removes the emergency resize that would otherwise occur when the hash table occupancy hits 100% (an async resize is still scheduled at 75%). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-04-19 05:47:21 -10:00
Linus Torvalds	3cd8b194bf	Merge tag 'v7.1-rc-part1-smbdirect-fixes' of git://git.samba.org/ksmbd Pull smbdirect updates from Steve French: "Move smbdirect server and client code to common directory: - temporary use of smbdirect_all_c_files.c to allow micro steps - factor out common functions into a smbdirect.ko. - convert cifs.ko to use smbdirect.ko - convert ksmbd.ko to use smbdirect.ko - let smbdirect.ko use global workqueues - move ib_client logic from ksmbd.ko into smbdirect.ko - remove smbdirect_all_c_files.c hack again - some locking and teardown related fixes on top" * tag 'v7.1-rc-part1-smbdirect-fixes' of git://git.samba.org/ksmbd: (145 commits) smb: smbdirect: let smbdirect_connection_deregister_mr_io unlock while waiting smb: smbdirect: fix the logic in smbdirect_socket_destroy_sync() without an error smb: smbdirect: fix copyright header of smbdirect.h smb: smbdirect: change smbdirect_socket_parameters.{initiator_depth,responder_resources} to __u16 smb: smbdirect: remove unused SMBDIRECT_USE_INLINE_C_FILES logic smb: server: no longer use smbdirect_socket_set_custom_workqueue() smb: client: no longer use smbdirect_socket_set_custom_workqueue() smb: smbdirect: introduce global workqueues smb: smbdirect: prepare use of dedicated workqueues for different steps smb: smbdirect: remove unused smbdirect_connection_mr_io_recovery_work() smb: smbdirect: wrap rdma_disconnect() in rdma_[un]lock_handler() smb: server: make use of smbdirect_netdev_rdma_capable_mode_type() smb: smbdirect: introduce smbdirect_netdev_rdma_capable_mode_type() smb: server: make use of smbdirect.ko smb: server: remove unused ksmbd_transport_ops.prepare() smb: server: make use of smbdirect_socket_{listen,accept}() smb: server: only use public smbdirect functions smb: server: make use of smbdirect_socket_create_accepting()/smbdirect_socket_release() smb: server: make use of smbdirect_{socket_init_accepting,connection_wait_for_connected}() smb: server: make use of smbdirect_connection_send_iter() and related functions ...	2026-04-16 08:25:04 -07:00
Linus Torvalds	d3d9443f8b	Merge tag 'livepatching-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching Pull livepatching updates from Petr Mladek: - Add two new selftests * tag 'livepatching-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching: selftests/livepatch: add test for module function patching selftests: livepatch: test-ftrace: livepatch a traced function	2026-04-16 08:13:27 -07:00
Linus Torvalds	090748e62f	Merge tag 'm68k-for-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k updates from Geert Uytterhoeven: - Add support for QEMU virt-ctrl, and use it for system reset and power off on the virt platform - defconfig updates - Miscellaneous fixes and improvements * tag 'm68k-for-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: virt: Switch to qemu-virt-ctrl driver power: reset: Add QEMU virt-ctrl driver m68k: defconfig: Update defconfigs for v7.0-rc1 m68k: emu: Replace unbounded sprintf() in nfhd_init_one() m68k: uapi: Add ucontext.h m68k: defconfig: hp300: Enable monochrome and 16-color linux logos m68k: q40: Remove commented out code	2026-04-16 08:11:01 -07:00
Linus Torvalds	948ef73f7e	Merge tag 'efi-next-for-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: "Again not a busy cycle for EFI, just some minor tweaks and bug fixes: - Enable boot graphics resource table (BGRT) on Xen/x86 - Correct a misguided assumption in the memory attributes table sanity check - Start tagging efi_mem_reserve()'d regions as MEMBLOCK_RSRV_KERN - Some other minor fixes and cleanups" * tag 'efi-next-for-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi/capsule-loader: fix incorrect sizeof in phys array reallocation efi: Tag memblock reservations of boot services regions as RSRV_KERN memblock: Permit existing reserved regions to be marked RSRV_KERN efi/memattr: Fix thinko in table size sanity check efi: libstub: fix type of fdt 32 and 64bit variables efi: Drop unused efi_range_is_wc() function efi: Enable BGRT loading under Xen efi: make efi_mem_type() and efi_mem_attributes() work on Xen PV	2026-04-16 08:06:25 -07:00
Linus Torvalds	f0bf3eac92	Merge tag 'vfio-v7.1-rc1' of https://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: - Update QAT vfio-pci variant driver for Gen 5, 420xx devices (Vijay Sundar Selvamani, Suman Kumar Chakraborty, Giovanni Cabiddu) - Fix vfio selftest MMIO DMA mapping selftest (Alex Mastro) - Conversions to const struct class in support of class_create() deprecation (Jori Koolstra) - Improve selftest compiler compatibility by avoiding initializer on variable-length array (Manish Honap) - Define new uAPI for drivers supporting migration to advise user-space of new initial data for reducing target startup latency. Implemented for mlx5 vfio-pci variant driver (Yishai Hadas) - Enable vfio selftests on aarch64, not just cross-compiles reporting arm64 (Ted Logan) - Update vfio selftest driver support to include additional DSA devices (Yi Lai) - Unconditionally include debugfs root pointer in vfio device struct, avoiding a build failure seen in hisi_acc variant driver without debugfs otherwise (Arnd Bergmann) - Add support for the s390 ISM (Internal Shared Memory) device via a new variant driver. The device is unique in the size of its BAR space (256TiB) and lack of mmap support (Julian Ruess) - Enforce that vfio-pci drivers implement a name in their ops structure for use in sequestering SR-IOV VFs (Alex Williamson) - Prune leftover group notifier code (Paolo Bonzini) - Fix Xe vfio-pci variant driver to avoid migration support as a dependency in the reset path and missing release call (Michał Winiarski) * tag 'vfio-v7.1-rc1' of https://github.com/awilliam/linux-vfio: (23 commits) vfio/xe: Add a missing vfio_pci_core_release_dev() vfio/xe: Reorganize the init to decouple migration from reset vfio: remove dead notifier code vfio/pci: Require vfio_device_ops.name MAINTAINERS: add VFIO ISM PCI DRIVER section vfio/ism: Implement vfio_pci driver for ISM devices vfio/pci: Rename vfio_config_do_rw() to vfio_pci_config_rw_single() and export it vfio: unhide vdev->debug_root vfio/qat: add support for Intel QAT 420xx VFs vfio: selftests: Support DMR and GNR-D DSA devices vfio: selftests: Build tests on aarch64 vfio/mlx5: Add REINIT support to VFIO_MIG_GET_PRECOPY_INFO vfio/mlx5: consider inflight SAVE during PRE_COPY net/mlx5: Add IFC bits for migration state vfio: Adapt drivers to use the core helper vfio_check_precopy_ioctl vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 vfio: Define uAPI for re-init initial bytes during the PRE_COPY phase vfio: selftests: Fix VLA initialisation in vfio_pci_irq_set() vfio: uapi: fix comment typo vfio: mdev: replace mtty_dev->vd_class with a const struct class ...	2026-04-16 08:01:16 -07:00
Petr Mladek	448c0f8cb7	Merge branch 'for-7.1/module-function-test' into for-linus	2026-04-16 10:33:43 +02:00
Stefan Metzmacher	d09a040c18	smb: smbdirect: let smbdirect_connection_deregister_mr_io unlock while waiting We should not hold a mutex locked during wait_for_completion(); holding a reference is enough. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Henrique Carvalho <henrique.carvalho@suse.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	25c2e34931	smb: smbdirect: fix the logic in smbdirect_socket_destroy_sync() without an error If smbdirect_socket_destroy_sync() is called and sc->first_error was not set, we should set -ESHUTDOWN; that's a better condition than doing it only implicitly with the sc->status < SMBDIRECT_SOCKET_DISCONNECTING check. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Henrique Carvalho <henrique.carvalho@suse.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	3892007f2b	smb: smbdirect: fix copyright header of smbdirect.h Everything in smbdirect.h was taken from my out of tree prototype. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Henrique Carvalho <henrique.carvalho@suse.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	735610d0ce	smb: smbdirect: change smbdirect_socket_parameters.{initiator_depth,responder_resources} to __u16 We still limit this to U8_MAX as the rdma api only uses __u8 and that's also the limit for Infiniband and RoCE*, while iWarp would be able to support larger values at the protocol level. As struct smbdirect_socket_parameters will be part of the uapi for IPPROTO_SMBDIRECT in future, change it now even if userspace sockets won't be supported yet. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Acked-by: Henrique Carvalho <henrique.carvalho@suse.com> Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	aa43bb2c0f	smb: smbdirect: remove unused SMBDIRECT_USE_INLINE_C_FILES logic We always build as standalone module (or as part of the core kernel). This also removes unused elements from struct smbdirect_socket and unused exports. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	649c47559a	smb: server: no longer use smbdirect_socket_set_custom_workqueue() smbdirect.ko has global workqueues now, so we should use these default ones. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	73dc52d294	smb: client: no longer use smbdirect_socket_set_custom_workqueue() smbdirect.ko has global workqueues now, so we should use these default ones. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	1adde16a9e	smb: smbdirect: introduce global workqueues These will be used in future and callers should no longer use smbdirect_socket_set_custom_workqueue(). Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	e4ce1fca04	smb: smbdirect: prepare use of dedicated workqueues for different steps This is a preparation in order to have global workqueues in the smbdirect module instead of having the caller provide one. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	00ac2a4fe0	smb: smbdirect: remove unused smbdirect_connection_mr_io_recovery_work() This would actually never be used as we only move to SMBDIRECT_MR_ERROR when we directly call smbdirect_socket_schedule_cleanup(). Doing an ib_dereg_mr/ib_alloc_mr dance on a working connection is not needed and it's also pointless on a broken connection as we don't reuse any ib_pd. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	a40e6f0166	smb: smbdirect: wrap rdma_disconnect() in rdma_[un]lock_handler() This might not be needed, but it controls the order of ib_drain_qp() and rdma_disconnect(). Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	33b2894e8d	smb: server: make use of smbdirect_netdev_rdma_capable_mode_type() This removes what is basically the same logic. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	81a7a3a0fa	smb: smbdirect: introduce smbdirect_netdev_rdma_capable_mode_type() This is basically a copy of ksmbd_rdma_capable_netdev() in the server, but this also prints a message when a device is renamed. The differences are: - It uses rdma_for_each_port() instead of implementing the same logic again. - It returns RDMA_NODE_{UNSPECIFIED,IB_CA,RNIC} values instead of bool Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	50bdab9ae4	smb: server: make use of smbdirect.ko This means we no longer inline the common smbdirect .c files and use the exported functions from the module instead. Note the connection specific logging is still redirected to ksmbd.ko functions via smbdirect_socket_set_logging(). We still don't use a real socket layer, but we're very close... Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	98bdc5fda9	smb: server: remove unused ksmbd_transport_ops.prepare() This is no longer needed for smbdirect. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	2eff5e51f9	smb: server: make use of smbdirect_socket_{listen,accept}() We no longer need the custom rdma listener. The code logic is very similar to transport_tcp.c now using a kernel thread that loops over smbdirect_socket_accept(). This is the first step in the direction of using IPPROTO_SMBDIRECT sockets in future. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	1b2d94a3c9	smb: server: only use public smbdirect functions Also remove a lot of unused includes... Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	ff7673f6fd	smb: server: make use of smbdirect_socket_create_accepting()/smbdirect_socket_release() With this we no longer embed struct smbdirect_socket, which will allow us to make it private in the following commits. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	9460416487	smb: server: make use of smbdirect_{socket_init_accepting,connection_wait_for_connected}() This means we finally only use common functions in the server. We still use the embedded struct smbdirect_socket and are able to access internals, but they will be removed in the next commits as well. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	4b4c21a7d2	smb: server: make use of smbdirect_connection_send_iter() and related functions This makes use of common code for sending messages, which will allow making more use of common code in the next commits. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:24 -05:00
Stefan Metzmacher	c6b077efbc	smb: server: let smb_direct_post_send_data() return data_length This makes it easier to move to common code shared with the client. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:23 -05:00
Stefan Metzmacher	08ffdf0c41	smb: server: split out smb_direct_send_iter() out of smb_direct_writev() This will help to move to common code in future. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:23 -05:00
Stefan Metzmacher	da20536c50	smb: server: let smbdirect_map_sges_from_iter() truncate the message boundary smbdirect_map_sges_from_iter() already handles the case that only a limited number of sges are available. Its return value is data_length and the remaining bytes in the iter are remaining_data_length. This is now much easier and will allow us to share more code with the client soon. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:23 -05:00
Stefan Metzmacher	0af87a0a31	smb: server: inline smb_direct_create_header() into smb_direct_post_send_data() The point is that ib_dma_map_single() is done first, but the 'Fill in the packet header' will be done after smbdirect_map_sges_from_iter(). This will simplify further changes in order to share common code with the client. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:23 -05:00
Stefan Metzmacher	0184d2b386	smb: server: move iov_iter_kvec() out of smb_direct_post_send_data() This will allow us to make the code more generic in order to move it to common with the client. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:23 -05:00
Stefan Metzmacher	1421d50ea9	smb: server: make use of smbdirect_connection_request_keep_alive() This will help to share more common code soon. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:23 -05:00
Stefan Metzmacher	0a1702e931	smb: server: make use of smbdirect_connection_grant_recv_credits() This is already used by the client too and will help to share more common code. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:23 -05:00
Stefan Metzmacher	73489efdda	smb: server: make use of smbdirect_connection_recvmsg() This is basically the same logic, it just operates on iov_iter_kvec() instead of a raw buffer pointer. This allows us to use common code between client and server. We keep returning -EINTR instead of -ERESTARTSYS if wait_event_interruptible() fails. I don't know if this is required, but changing it is a task for another patch. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:23 -05:00
Stefan Metzmacher	a3bf9bfee8	smb: server: make use of smbdirect_socket_destroy_sync() This is basically the same logic as before, but we now use common code, which will also be used by the server soon. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:23 -05:00
Stefan Metzmacher	21a72d0900	smb: server: make use of functions from smbdirect_rw.c The copied code only got new names, some indentation/formatting changes, some variable names are changed too. They also only use struct smbdirect_socket instead of struct smb_direct_transport. But the logic is still the same. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:23 -05:00
Stefan Metzmacher	0911d32ba2	smb: server: make use of smbdirect_socket_wait_for_credits() This will allow us to share more common code between client and server soon. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:23 -05:00
Stefan Metzmacher	be0ac9f59f	smb: server: make use of smbdirect_get_buf_page_count() This will allow us to move code into common code between client and server soon. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:23 -05:00
Stefan Metzmacher	8d55169a57	smb: server: make use of smbdirect_connection_recv_io_refill[_work]() This is basically a copy of smb_direct_post_recv_credits(), but there are several improvements compared to the existing function: 1. We calculate the number of missing posted buffers by getting the difference between recv_io.credits.target and recv_io.posted.count, instead of the difference between recv_io.credits.target and recv_io.credits.count, because recv_io.credits.count is only updated once a message is sent to the peer. It was not really a problem before, because we have a fixed number of smbdirect_recv_io buffers, so the loop terminated when smbdirect_connection_get_recv_io() returned NULL. But using recv_io.posted.count makes it easier to understand. 2. In order to tell the peer about the newly posted buffer and grant the credits, we only trigger the send immediate when we're not granting only the last possible credit. This is mostly a difference relative to the server's smb_direct_post_recv_credits() implementation, which should avoid useless ping pong messages. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-15 21:58:23 -05:00
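
Point 1 of commit 8d55169a57 contrasts two ways of counting how many receive buffers still need posting. The sketch below is only an illustration of that arithmetic, not the kernel code: struct recv_io_sketch and both helper functions are hypothetical, and the field names merely mirror the counters named in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified counters; not the real struct smbdirect_socket. */
struct recv_io_sketch {
	int credits_target; /* recv_io.credits.target: buffers we want posted */
	int credits_count;  /* recv_io.credits.count: updated only when a message is sent */
	int posted_count;   /* recv_io.posted.count: buffers currently posted */
};

/* Calculation described for the existing function: it lags behind
 * until the next message is sent to the peer. */
static int missing_by_credits(const struct recv_io_sketch *r)
{
	assert(r != NULL);
	int missing = r->credits_target - r->credits_count;
	return missing > 0 ? missing : 0;
}

/* Calculation described for the common code: it counts what is
 * actually posted right now. */
static int missing_by_posted(const struct recv_io_sketch *r)
{
	assert(r != NULL);
	int missing = r->credits_target - r->posted_count;
	return missing > 0 ? missing : 0;
}
```

With a target of 10, 10 buffers already posted and a credits count of 7 (no message sent yet), the first helper still reports 3 missing buffers while the second reports 0, which is the mismatch the commit message says was previously hidden by the fixed buffer pool.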

1438645 Commits