If either of the calls to dm_bufio_client_create() in verity_fec_ctr()
fails, then dm_bufio_client_destroy() is later called with an ERR_PTR()
argument. That causes a crash. Fix this.
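The failure mode can be modelled in userspace as follows. This is only a
sketch of one common way to avoid handing an error pointer to a teardown
routine; it is not taken from the patch. All model_* names are
hypothetical stand-ins, and the 4095 limit mirrors the kernel's MAX_ERRNO
convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR() helpers. */
#define MODEL_MAX_ERRNO 4095

static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical container for a client pointer, loosely like the FEC state. */
struct model_fec {
	void *bufio;
};

/*
 * Store the client only on success. On failure the field is left NULL,
 * so a later teardown path never sees an error pointer.
 */
static long model_set_client(struct model_fec *f, void *client)
{
	assert(f != NULL);
	if (model_is_err(client)) {
		f->bufio = NULL;
		return (long)(intptr_t)client;
	}
	f->bufio = client;
	return 0;
}

/* Teardown only has work to do when a real client was stored. */
static int model_needs_destroy(const struct model_fec *f)
{
	return f->bufio != NULL;
}
```

With this shape, a destructor that only checks for NULL stays safe even
when the constructor failed part-way through.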
Fixes: a739ff3f54 ("dm verity: add support for forward error correction")
Cc: stable@vger.kernel.org
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
verity_fec_is_enabled() is very short and is called in quite a few
places, so make it an inline function.
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Since verity_fec_decode() has a !CONFIG_DM_VERITY_FEC stub, it can just
be called unconditionally, similar to the other calls in the same file.
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Make verity_finish_io() call verity_fec_finish_io() unconditionally,
instead of skipping it when 'in_bh' is true.
Although FEC can't have been done when 'in_bh' is true,
verity_fec_finish_io() is a no-op when FEC wasn't done. An earlier
change also made verity_fec_finish_io() very lightweight when FEC wasn't
done. So it should just be called unconditionally.
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
When correcting a data block, the FEC code performs optimally when it
has enough buffers to hold all the needed RS blocks. That number of
buffers is '1 << (v->data_dev_block_bits - DM_VERITY_FEC_BUF_RS_BITS)'.
However, since v->data_dev_block_bits isn't a compile-time constant, the
code actually used PAGE_SHIFT instead.
With the traditional PAGE_SIZE == data_block_size == 4096, this was
fine. However, when PAGE_SIZE > data_block_size, this wastes space.
E.g., with data_block_size == 4096 && PAGE_SIZE == 16384, struct
dm_verity_fec_io is 9240 bytes, when in fact only 3096 bytes are needed.
Fix this by making dm_verity_fec_io::bufs a variable-length array.
This makes the macros DM_VERITY_FEC_BUF_MAX and
fec_for_each_extra_buffer() no longer apply, so remove them. For
consistency, and because DM_VERITY_FEC_BUF_PREALLOC is fixed at 1 and
was already assumed to be 1 (considering that mempool_alloc() shouldn't
be called in a loop), also remove the related macros
DM_VERITY_FEC_BUF_PREALLOC and fec_for_each_prealloc_buffer().
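The sizes quoted above can be checked with a little arithmetic. The
sketch below assumes DM_VERITY_FEC_BUF_RS_BITS is 4 and that each entry
of the bufs array is an 8-byte pointer; the helper names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value of DM_VERITY_FEC_BUF_RS_BITS. */
#define FEC_BUF_RS_BITS 4

/* Number of buffers needed to hold all RS blocks for one block size. */
static size_t fec_num_bufs(unsigned int block_bits)
{
	assert(block_bits >= FEC_BUF_RS_BITS);
	return (size_t)1 << (block_bits - FEC_BUF_RS_BITS);
}

/* Bytes taken by the array of buffer pointers, assuming 8-byte pointers. */
static size_t fec_bufs_array_bytes(unsigned int block_bits)
{
	return fec_num_bufs(block_bits) * 8;
}
```

Under these assumptions, a shift of 12 gives 256 buffers (2048 bytes of
pointers) and a shift of 14 gives 1024 buffers (8192 bytes). The
difference, 6144 bytes, matches the gap between 9240 and 3096 bytes.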
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Currently, struct dm_verity_fec_io is allocated in the front padding of
struct bio using dm_target::per_io_data_size. Unfortunately, struct
dm_verity_fec_io is very large: 3096 bytes when CONFIG_64BIT=y &&
PAGE_SIZE == 4096, or 9240 bytes when CONFIG_64BIT=y && PAGE_SIZE ==
16384. This makes the bio size very large.
Moreover, most of dm_verity_fec_io gets iterated over up to three times,
even on I/O requests that don't require any error correction:
1. To zero the memory on allocation, if init_on_alloc=1. (This happens
when the bio is allocated, not in dm-verity itself.)
2. To zero the buffers array in verity_fec_init_io().
3. To free the buffers in verity_fec_finish_io().
Fix all of these inefficiencies by moving dm_verity_fec_io to a mempool.
Replace the embedded dm_verity_fec_io with a pointer
dm_verity_io::fec_io. verity_fec_init_io() initializes it to NULL,
verity_fec_decode() allocates it on the first call, and
verity_fec_finish_io() cleans it up. The normal case is that the
pointer simply stays NULL, so the overhead becomes negligible.
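The lifecycle described above can be sketched in userspace like this.
calloc() and free() stand in for the mempool, and all *_model names are
hypothetical; the real code differs in detail.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the large FEC state; the size is only illustrative. */
struct fec_io_model {
	unsigned char scratch[3096];
};

/* Stand-in for dm_verity_io, holding just the lazily set pointer. */
struct verity_io_model {
	struct fec_io_model *fec_io;
};

/* Per-I/O init: nothing is allocated or zeroed, the pointer is cleared. */
static void model_init_io(struct verity_io_model *io)
{
	io->fec_io = NULL;
}

/* First decode call allocates the state; later calls reuse it. */
static int model_decode(struct verity_io_model *io)
{
	assert(io != NULL);
	if (!io->fec_io) {
		io->fec_io = calloc(1, sizeof(*io->fec_io));
		if (!io->fec_io)
			return -1;
	}
	return 0;
}

/* Finish: a cheap NULL check when no error correction was needed. */
static void model_finish_io(struct verity_io_model *io)
{
	if (!io->fec_io)
		return;
	free(io->fec_io);
	io->fec_io = NULL;
}
```

An I/O that never calls the decode path pays only for one pointer store
and one NULL test.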
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
The clone target already exposes both source and destination devices via
clone_iterate_devices(), so dm-table's device_area_is_invalid() helper
ensures that the mapping does not extend past either underlying block
device.
The manual comparisons between ti->len and the source/destination device
sizes in parse_source_dev() and parse_dest_dev() are therefore
redundant. Remove these checks and rely on the core validation instead.
This changes the error strings reported when the devices are too small,
but preserves the failure behaviour.
Signed-off-by: Li Chen <me@linux.beauty>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
The cache target already exposes the origin device through
cache_iterate_devices(), which allows dm-table to call
device_area_is_invalid() and verify that the mapping fits inside the
underlying block device.
The explicit ti->len > origin_sectors test in parse_origin_dev() is
therefore redundant. Drop this check and rely on the core device
validation instead. This changes the user-visible error string when the
origin is too small, but preserves the failure behaviour.
Signed-off-by: Li Chen <me@linux.beauty>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Currently, the max_hw_discard_sectors of a stripe target is set to the
minimum max_hw_discard_sectors among all sub devices. When the discard
bio is larger than max_hw_discard_sectors, this may cause the stripe
device to split discard bios unnecessarily, because the value of
max_hw_discard_sectors affects max_discard_sectors, which is equal to
min(max_hw_discard_sectors, max_user_discard_sectors).
For example:
root@vm:~# echo '0 33554432 striped 2 256 /dev/vdd 0 /dev/vde 0' | dmsetup create stripe_dev
root@vm:~# cat /sys/block/dm-1/queue/discard_max_bytes
536870912
root@vm:~# cat /sys/block/dm-1/slaves/vdd/queue/discard_max_bytes
536870912
root@vm:~# blkdiscard -o 0 -l 1073741824 -p 1073741824 /dev/mapper/stripe_dev
dm-1 is the stripe device, and its discard_max_bytes is equal to
each sub device’s discard_max_bytes. Since the requested discard
length exceeds discard_max_bytes, the block layer splits the discard bio:
block_bio_queue: 252,1 DS 0 + 2097152 [blkdiscard]
block_split: 252,1 DS 0 / 1048576 [blkdiscard]
block_rq_issue: 253,48 DS 268435456 () 0 + 524288 be,0,4 [blkdiscard]
block_bio_queue: 253,64 DS 524288 + 524288 [blkdiscard]
However, both vdd and vde can actually handle a discard bio of 536870912
bytes, so this split is not necessary.
This patch sets the stripe target’s q->limits.max_hw_discard_sectors to
the minimum max_hw_discard_sectors of the sub devices multiplied by the
number of stripe devices. The result is rounded down to a multiple of the
chunk size times the number of stripe devices, so that no discard bio
larger than max_hw_discard_sectors is issued to a sub device.
This patch enables the stripe device to handle larger discard bios
without incurring unnecessary splitting.
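The limit calculation can be sketched as below. This is a userspace
illustration of the arithmetic only, not the patch itself; the function
name and parameter names are hypothetical, and all values are in
512-byte sectors.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale the smallest sub-device limit by the number of stripes, then
 * round down to a whole number of full stripes (chunk size * stripes)
 * so that each sub device receives at most its own limit.
 */
static uint64_t stripe_max_hw_discard_sectors(uint64_t min_sub_dev_sectors,
					      uint32_t stripes,
					      uint32_t chunk_sectors)
{
	uint64_t total, unit;

	assert(stripes > 0 && chunk_sectors > 0);
	total = min_sub_dev_sectors * stripes;
	unit = (uint64_t)chunk_sectors * stripes;
	return total - (total % unit);
}
```

For the example table above (2 stripes, chunk size 256 sectors, and
536870912 bytes == 1048576 sectors per sub device), this yields 2097152
sectors, which is the size of the 1 GiB discard issued by blkdiscard.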
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
The -EEXIST error code is reserved by the module loading infrastructure
to indicate that a module is already loaded. When a module's init
function returns -EEXIST, userspace tools like kmod interpret this as
"module already loaded" and treat the operation as successful, returning
0 to the user even though the module initialization actually failed.
This follows the precedent set by commit 54416fd767 ("netfilter:
conntrack: helper: Replace -EEXIST by -EBUSY") which fixed the same
issue in nf_conntrack_helper_register().
Affected modules:
* dm_cache dm_clone dm_integrity dm_mirror dm_multipath dm_pcache
* dm_vdo dm-ps-round-robin dm_historical_service_time dm_io_affinity
* dm_queue_length dm_service_time dm_snapshot
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Since commit 15f73f5b3e ("blk-mq: move failure injection out of
blk_mq_complete_request"), drivers are responsible for calling
blk_should_fake_timeout() at appropriate code paths and opportunities.
However, the dm driver does not implement its own timeout handler and
relies on the timeout handling of its slave devices.
If an io-timeout-fail error is injected to a dm device, the request
will be leaked and never completed, causing tasks to hang indefinitely.
Reproduce:
1. prepare dm which has iscsi slave device
2. inject io-timeout-fail to dm
echo 1 >/sys/class/block/dm-0/io-timeout-fail
echo 100 >/sys/kernel/debug/fail_io_timeout/probability
echo 10 >/sys/kernel/debug/fail_io_timeout/times
3. read/write dm
4. iscsiadm -m node -u
Result: a hung task like the one below
[ 862.243768] INFO: task kworker/u514:2:151 blocked for more than 122 seconds.
[ 862.244133] Tainted: G E 6.19.0-rc1+ #51
[ 862.244337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 862.244718] task:kworker/u514:2 state:D stack:0 pid:151 tgid:151 ppid:2 task_flags:0x4288060 flags:0x00080000
[ 862.245024] Workqueue: iscsi_ctrl_3:1 __iscsi_unbind_session [scsi_transport_iscsi]
[ 862.245264] Call Trace:
[ 862.245587] <TASK>
[ 862.245814] __schedule+0x810/0x15c0
[ 862.246557] schedule+0x69/0x180
[ 862.246760] blk_mq_freeze_queue_wait+0xde/0x120
[ 862.247688] elevator_change+0x16d/0x460
[ 862.247893] elevator_set_none+0x87/0xf0
[ 862.248798] blk_unregister_queue+0x12e/0x2a0
[ 862.248995] __del_gendisk+0x231/0x7e0
[ 862.250143] del_gendisk+0x12f/0x1d0
[ 862.250339] sd_remove+0x85/0x130 [sd_mod]
[ 862.250650] device_release_driver_internal+0x36d/0x530
[ 862.250849] bus_remove_device+0x1dd/0x3f0
[ 862.251042] device_del+0x38a/0x930
[ 862.252095] __scsi_remove_device+0x293/0x360
[ 862.252291] scsi_remove_target+0x486/0x760
[ 862.252654] __iscsi_unbind_session+0x18a/0x3e0 [scsi_transport_iscsi]
[ 862.252886] process_one_work+0x633/0xe50
[ 862.253101] worker_thread+0x6df/0xf10
[ 862.253647] kthread+0x36d/0x720
[ 862.254533] ret_from_fork+0x2a6/0x470
[ 862.255852] ret_from_fork_asm+0x1a/0x30
[ 862.256037] </TASK>
Remove the blk_should_fake_timeout() check from dm, as dm has no
native timeout handling and should not attempt to fake timeouts.
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
There is no function advance_compression_stage(). But
advance_data_vio_compression_stage() does iterate through
the values of the data_vio_compression_stage enum, so it
seems to be what was intended.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Pull USB fixes from Greg KH:
"Here are some small USB fixes, and a bunch of reverts for 6.19-rc3.
Included in here are:
- reverts of some typec ucsi driver changes that had a lot of
regression reports after -rc1. Let's just revert it all for now and
it will come back in a way that is better tested.
- other typec bugfixes
- usb-storage quirk fixups
- dwc3 driver fix
- other minor USB fixes for reported problems.
All of these have passed 0-day testing and individual testing"
* tag 'usb-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (22 commits)
Revert "usb: typec: ucsi: Update UCSI structure to have message in and message out fields"
Revert "usb: typec: ucsi: Add support for message out data structure"
Revert "usb: typec: ucsi: Enable debugfs for message_out data structure"
Revert "usb: typec: ucsi: Add support for SET_PDOS command"
Revert "usb: typec: ucsi: Fix null pointer dereference in ucsi_sync_control_common"
Revert "usb: typec: ucsi: Get connector status after enable notifications"
usb: ohci-nxp: clean up probe error labels
usb: gadget: lpc32xx_udc: clean up probe error labels
usb: ohci-nxp: fix device leak on probe failure
usb: phy: isp1301: fix non-OF device reference imbalance
usb: gadget: lpc32xx_udc: fix clock imbalance in error path
usb: typec: ucsi: Get connector status after enable notifications
usb: usb-storage: Maintain minimal modifications to the bcdDevice range.
usb: dwc3: of-simple: fix clock resource leak in dwc3_of_simple_probe
usb: typec: ucsi: Fix null pointer dereference in ucsi_sync_control_common
USB: lpc32xx_udc: Fix error handling in probe
usb: typec: altmodes/displayport: Drop the device reference in dp_altmode_probe()
usb: phy: fsl-usb: Fix use-after-free in delayed work during device removal
usb: renesas_usbhs: Fix a resource leak in usbhs_pipe_malloc()
usb: typec: ucsi: huawei-gaokin: add DRM dependency
...
Pull serial driver fixes from Greg KH:
"Here are some small serial driver fixes for some reported issues.
Included in here are:
- serial sysfs fwnode fix that was much reported
- sh-sci driver fix
- serial device init bugfix
- 8250 bugfix
- xilinx_uartps bugfix
All of these have passed 0-day testing and individual testing"
* tag 'tty-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: xilinx_uartps: fix rs485 delay_rts_after_send
serial: sh-sci: Check that the DMA cookie is valid
serial: core: Fix serial device initialization
serial: 8250: longson: Fix NULL vs IS_ERR() bug in probe
serial: core: Restore sysfs fwnode information
Pull firewire fix from Takashi Sakamoto:
"A fix for the PCI driver for the Texas Instruments PCILynx series.
The driver had a bug where it allocated a DMA-coherent buffer of 16 KB
but released it using PAGE_SIZE. This disproportion was reported in
2020, but the fix was never merged. It is finally resolved"
* tag 'firewire-fixes-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: nosy: Fix dma_free_coherent() size
Pull RISC-V updates from Paul Walmsley:
"Nothing exotic here; these are the cleanup and new ISA extension
probing patches (not including CFI):
- Add probing and userspace reporting support for the standard RISC-V
ISA extensions Zilsd and Zclsd, which implement load/store dual
instructions on RV32
- Abstract the register saving code in setup_sigcontext() so it can
be used for stateful RISC-V ISA extensions beyond the vector
extension
- Add the SBI extension ID and some initial data structure
definitions for the RISC-V standard SBI debug trigger extension
- Clean up some code slightly: change some page table functions to
avoid atomic operations on !SMP and to avoid unnecessary casts to
atomic_long_t; and use the existing RISCV_FULL_BARRIER macro in
place of some open-coded 'fence rw,rw' instructions"
* tag 'riscv-for-linus-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Add SBI debug trigger extension and function ids
riscv/atomic.h: use RISCV_FULL_BARRIER in _arch_atomic* function.
riscv: hwprobe: export Zilsd and Zclsd ISA extensions
riscv: add ISA extension parsing for Zilsd and Zclsd
dt-bindings: riscv: add Zilsd and Zclsd extension descriptions
riscv: mm: use xchg() on non-atomic_long_t variables, not atomic_long_xchg()
riscv: mm: ptep_get_and_clear(): avoid atomic ops when !CONFIG_SMP
riscv: mm: pmdp_huge_get_and_clear(): avoid atomic ops when !CONFIG_SMP
riscv: signal: abstract header saving for setup_sigcontext
Pull powerpc fixes from Madhavan Srinivasan:
- Fix for kexec warning due to SMT disable or partial SMT enabled
- Handle font bitmap pointer with reloc_offset to fix boot crash
- Fix to enable cpuidle state for Power11
- Couple of misc fixes
Thanks to Aboorva Devarajan, Aditya Bodkhe, Cedar Maxwell, Christian
Zigotzky, Christophe Leroy, Christophe Leroy (CS GROUP), Finn Thain,
Gopi Krishna Menon, Guenter Roeck, Jan Stancek, Joe Lawrence, Josh
Poimboeuf, Justin M. Forbes, Madadi Vineeth Reddy, Naveen N Rao (AMD),
Nysal Jan K.A., Sachin P Bappalige, Samir M, Sourabh Jain, Srikar
Dronamraju, and Stan Johnson
* tag 'powerpc-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/32: Restore disabling of interrupts at interrupt/syscall exit
powerpc/powernv: Enable cpuidle state detection for POWER11
powerpc: Add reloc_offset() to font bitmap pointer used for bootx_printf()
powerpc/tools: drop `-o pipefail` in gcc check scripts
selftests/powerpc/pmu/: Add check_extended_reg_test to .gitignore
powerpc/kexec: Enable SMT before waking offline CPUs
Pull spi fixes from Mark Brown:
"We've got more fixes here for the Cadence QSPI controller, this time
fixing some issues that come up when working with slower flashes on
some platforms plus a general race condition.
We also add support for the Allwinner A523, this is just some new
compatibles"
* tag 'spi-fix-v6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: cadence-quadspi: Improve CQSPI_SLOW_SRAM quirk if flash is slow
spi: cadence-quadspi: Prevent lost complete() call during indirect read
spi: sun6i: Support A523's SPI controllers
spi: dt-bindings: sun6i: Add compatibles for A523's SPI controllers
Pull regulator fixes from Mark Brown:
"A couple of fixes from Thomas, making the UAPI headers more robustly
correct and ensuring they are covered by checkpatch, and one from
Andreas fixing an update for a change to the DT bindings that I missed
was requested during bindings review in the newly added fp9931 driver"
* tag 'regulator-fix-v6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: fp9931: fix regulator node pointer
regulator: Add UAPI headers to MAINTAINERS
regulator: uapi: Use UAPI integer type
Pull SCSI fixes from James Bottomley:
"Three HBA driver fixes and one upper level driver (sg) fix.
The sg change is the largest, but that results mostly from moving code
to avoid the described race condition"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Add ufshcd_update_evt_hist() for UFS suspend error
scsi: sg: Fix occasional bogus elapsed time that exceeds timeout
scsi: mpi3mr: Read missing IOCFacts flag for reply queue full overflow
scsi: scsi_debug: Fix atomic write enable module param description
Pull smb client fix from Steve French:
- Fix potential memory leak
* tag 'v6.19-rc2-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix memory and information leak in smb3_reconfigure()
Pull driver core fixes from Danilo Krummrich:
- Introduce DMA Rust helpers to avoid build errors when !CONFIG_HAS_DMA
- Remove unnecessary (and hence incorrect) endian conversion in the
Rust PCI driver sample code
- Fix memory leak in the unwind path of debugfs_change_name()
- Support non-const struct software_node pointers in
SOFTWARE_NODE_REFERENCE(), after introducing _Generic()
- Avoid NULL pointer dereference in the unwind path of
simple_xattrs_free()
* tag 'driver-core-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
fs/kernfs: null-ptr deref in simple_xattrs_free()
software node: Also support referencing non-constant software nodes
debugfs: Fix memleak in debugfs_change_name().
samples: rust: fix endianness issue in rust_driver_pci
rust: dma: add helpers for architectures without CONFIG_HAS_DMA
Pull EFI fixes from Ard Biesheuvel:
"A couple of fixes for EFI regressions introduced this cycle:
- Make EDID handling in the EFI stub mixed mode safe
- Ensure that efi_mm.user_ns has a sane value - this is needed now
that EFI runtime calls are preemptible on arm64"
* tag 'efi-fixes-for-v6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
kthread: Warn if mm_struct lacks user_ns in kthread_use_mm()
arm64: efi: Fix NULL pointer dereference by initializing user_ns
efi/libstub: gop: Fix EDID support in mixed-mode
Pull block fixes from Jens Axboe:
- Fix for a signedness issue introduced in this kernel release for rnbd
- Fix up user copy references for ublk when the server exits
* tag 'block-6.19-20251226' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
block: rnbd-clt: Fix signedness bug in init_dev()
ublk: clean up user copy references on ublk server exit
Pull io_uring fix from Jens Axboe:
"Just a single fix for a bug that can cause a leak of the filename with
IORING_OP_OPENAT, if direct descriptors are asked for and O_CLOEXEC
has been set in the request flags"
* tag 'io_uring-6.19-20251226' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring: fix filename leak in __io_openat_prep()
Pull virtio fixes from Michael Tsirkin:
"Just a bunch of fixes, mostly trivial ones in tools/virtio"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost/vsock: improve RCU read sections around vhost_vsock_get()
tools/virtio: add device, device_driver stubs
tools/virtio: fix up oot build
virtio_features: make it self-contained
tools/virtio: switch to kernel's virtio_config.h
tools/virtio: stub might_sleep and synchronize_rcu
tools/virtio: add struct cpumask to cpumask.h
tools/virtio: pass KCFLAGS to module build
tools/virtio: add ucopysize.h stub
tools/virtio: add dev_WARN_ONCE and is_vmalloc_addr stubs
tools/virtio: stub DMA mapping functions
tools/virtio: add struct module forward declaration
tools/virtio: use kernel's virtio.h
virtio: make it self-contained
tools/virtio: fix up compiler.h stub
Pull smb server fixes from Steve French:
- Fix parsing of SMB1 negotiate request by adjusting offsets affected
by the removal of the RFC1002 length field from the SMB header
- Update minimum PDU size macros for both SMB1 and SMB2
- Rename smb2_get_msg function to smb_get_msg to better reflect its
role in handling both SMB1 and SMB2 requests
* tag 'v6.19-rc2-smb3-server-fixes' of git://git.samba.org/ksmbd:
smb/server: fix minimum SMB2 PDU size
smb/server: fix minimum SMB1 PDU size
ksmbd: rename smb2_get_msg to smb_get_msg
ksmbd: Fix to handle removal of rfc1002 header from smb_hdr
__io_openat_prep() allocates a struct filename using getname(). However,
for the condition of the file being installed in the fixed file table as
well as having O_CLOEXEC flag set, the function returns early. At that
point, the request doesn't have REQ_F_NEED_CLEANUP flag set. Due to this,
the memory for the newly allocated struct filename is not cleaned up,
causing a memory leak.
Fix this by setting the REQ_F_NEED_CLEANUP for the request just after the
successful getname() call, so that when the request is torn down, the
filename will be cleaned up, along with other resources needing cleanup.
Reported-by: syzbot+00e61c43eb5e4740438f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=00e61c43eb5e4740438f
Tested-by: syzbot+00e61c43eb5e4740438f@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Prithvi Tambewagh <activprithvi@gmail.com>
Fixes: b9445598d8 ("io_uring: openat directly into fixed fd table")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a WARN_ON_ONCE() check to detect mm_struct instances that are
missing user_ns initialization when passed to kthread_use_mm().
When a kthread adopts an mm via kthread_use_mm(), LSM hooks and
capability checks may access current->mm->user_ns for credential
validation. If user_ns is NULL, this leads to a NULL pointer
dereference crash.
This was observed with efi_mm on arm64, where commit a5baf582f4
("arm64/efi: Call EFI runtime services without disabling preemption")
introduced kthread_use_mm(&efi_mm), but efi_mm lacked user_ns
initialization, causing crashes during /proc access.
Adding this warning helps catch similar bugs early during development
rather than waiting for hard-to-debug NULL pointer crashes in
production.
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Linux 6.19-rc2 (9448598b22 ("Linux 6.19-rc2")) is crashing with a NULL
pointer dereference on arm64 hosts:
Unable to handle kernel NULL pointer dereference at virtual address 00000000000000c8
pc : cap_capable (security/commoncap.c:82 security/commoncap.c:128)
Call trace:
cap_capable (security/commoncap.c:82 security/commoncap.c:128) (P)
security_capable (security/security.c:?)
ns_capable_noaudit (kernel/capability.c:342 kernel/capability.c:381)
__ptrace_may_access (./include/linux/rcupdate.h:895 kernel/ptrace.c:326)
ptrace_may_access (kernel/ptrace.c:353)
do_task_stat (fs/proc/array.c:467)
proc_tgid_stat (fs/proc/array.c:673)
proc_single_show (fs/proc/base.c:803)
I've bisected the problem to commit a5baf582f4 ("arm64/efi: Call EFI
runtime services without disabling preemption").
From my analysis, the crash occurs because efi_mm lacks a user_ns field
initialization. This was previously harmless, but commit a5baf582f4
("arm64/efi: Call EFI runtime services without disabling preemption")
changed the EFI runtime call path to use kthread_use_mm(&efi_mm), which
temporarily adopts efi_mm as the current mm for the calling kthread.
When a thread has an active mm, LSM hooks like cap_capable() expect
mm->user_ns to be valid for credential checks. With efi_mm.user_ns being
NULL, capability checks during possible /proc access dereference the
NULL pointer and crash.
Fix by initializing efi_mm.user_ns to &init_user_ns.
Fixes: a5baf582f4 ("arm64/efi: Call EFI runtime services without disabling preemption")
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The efi_edid_discovered_protocol and efi_edid_active_protocol have mixed
mode fields. So all their attributes should be accessed through
the efi_table_attr() helper.
Doing so fixes the upper 32 bits of the 64 bit gop_edid pointer getting
set to random values (followed by a crash at boot) when booting a x86_64
kernel on a machine with 32 bit UEFI like the Asus T100TA.
Fixes: 17029cdd8f ("efi/libstub: gop: Add support for reading EDID")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Hans de Goede <johannes.goede@oss.qualcomm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Pull nfsd fixes from Chuck Lever:
"A set of NFSD fixes that arrived just a bit late for the 6.19 merge
window.
Regression fixes:
- Mark variable __maybe_unused to avoid W=1 build break
Stable fixes:
- NFSv4 file creation neglects setting ACL
- Clear TIME_DELEG in the suppattr_exclcreat bitmap
- Clear SECLABEL in the suppattr_exclcreat bitmap
- Fix memory leak in nfsd_create_serv error paths
- Bound check rq_pages index in inline path
- Return 0 on success from svc_rdma_copy_inline_range
- Use rc_pageoff for memcpy byte offset
- Avoid NULL deref on zero length gss_token in gss_read_proxy_verf"
* tag 'nfsd-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: NFSv4 file creation neglects setting ACL
NFSD: Clear TIME_DELEG in the suppattr_exclcreat bitmap
NFSD: Clear SECLABEL in the suppattr_exclcreat bitmap
nfsd: fix memory leak in nfsd_create_serv error paths
nfsd: Mark variable __maybe_unused to avoid W=1 build break
svcrdma: bound check rq_pages index in inline path
svcrdma: return 0 on success from svc_rdma_copy_inline_range
svcrdma: use rc_pageoff for memcpy byte offset
SUNRPC: svcauth_gss: avoid NULL deref on zero length gss_token in gss_read_proxy_verf
Pull erofs fix from Gao Xiang:
"Junbeom reported that synchronous reads could hit unintended EIOs
under memory pressure due to incorrect error propagation in
z_erofs_decompress_queue(), where earlier physical clusters in the
same decompression queue may be served for another readahead.
This addresses the issue by decompressing each physical cluster
independently as long as disk I/Os succeed, rather than being impacted
by the error status of previous physical clusters in the same queue.
Summary:
- Fix unexpected EIOs under memory pressure caused by recent
incorrect error propagation logic"
* tag 'erofs-for-6.19-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix unexpected EIO under memory pressure
In smb3_reconfigure(), if smb3_sync_session_ctx_passwords() fails, the
function returns immediately without freeing and erasing the newly
allocated new_password and new_password2. This causes both a memory leak
and a potential information leak.
Fix this by calling kfree_sensitive() on both password buffers before
returning in this error case.
Fixes: 0f0e357902 ("cifs: during remount, make sure passwords are in sync")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
The refill_buf function uses snprintf to append to a fixed-size buffer.
snprintf returns the length that would have been written, which can
exceed the remaining buffer size. If this happens, ptr advances beyond
the buffer and rem becomes negative. In the second iteration, rem is
treated as a large unsigned integer, causing snprintf to write out of
bounds.
While this behavior is technically mitigated by num_perfcntrs being
locked at 5, it is still unsafe if num_perfcntrs were ever to change or
a second source were added.
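A bounded append that avoids this pattern can be sketched in userspace
as follows. append_bounded() is a hypothetical helper, not the function
from the patch; it clamps the advance to the space that is actually
left, so the remaining size can never wrap around.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Append formatted text at *ptr without ever advancing past the space
 * left in *rem. Returns the number of characters actually stored, not
 * counting the terminating NUL.
 */
static int append_bounded(char **ptr, size_t *rem, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(ptr != NULL && rem != NULL);
	if (*rem == 0)
		return 0;

	va_start(ap, fmt);
	n = vsnprintf(*ptr, *rem, fmt, ap);
	va_end(ap);

	if (n < 0)
		return 0;
	/* vsnprintf() reports the untruncated length; clamp it. */
	if ((size_t)n >= *rem)
		n = (int)(*rem - 1);

	*ptr += n;
	*rem -= (size_t)n;
	return n;
}
```

Once the buffer is full, *rem stays at 1 (room for the NUL only) and
further calls store nothing.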
Signed-off-by: Evan Lambert <veyga@veygax.dev>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/696358/
Link: https://lore.kernel.org/r/20251224124254.17920-3-veyga@veygax.dev
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
vhost_vsock_get() uses hash_for_each_possible_rcu() to find the
`vhost_vsock` associated with the `guest_cid`. hash_for_each_possible_rcu()
should only be called within an RCU read section, as mentioned in the
following comment in include/linux/rculist.h:
/**
* hlist_for_each_entry_rcu - iterate over rcu list of given type
* @pos: the type * to use as a loop cursor.
* @head: the head for your list.
* @member: the name of the hlist_node within the struct.
* @cond: optional lockdep expression if called from non-RCU protection.
*
* This list-traversal primitive may safely run concurrently with
* the _rcu list-mutation primitives such as hlist_add_head_rcu()
* as long as the traversal is guarded by rcu_read_lock().
*/
Currently, all calls to vhost_vsock_get() are between rcu_read_lock()
and rcu_read_unlock() except for calls in vhost_vsock_set_cid() and
vhost_vsock_reset_orphans(). In both cases, the current code is safe,
but we can make improvements to make it more robust.
About vhost_vsock_set_cid(), when building the kernel with
CONFIG_PROVE_RCU_LIST enabled, we get the following RCU warning when the
user space issues `ioctl(dev, VHOST_VSOCK_SET_GUEST_CID, ...)` :
WARNING: suspicious RCU usage
6.18.0-rc7 #62 Not tainted
-----------------------------
drivers/vhost/vsock.c:74 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by rpc-libvirtd/3443:
#0: ffffffffc05032a8 (vhost_vsock_mutex){+.+.}-{4:4}, at: vhost_vsock_dev_ioctl+0x2ff/0x530 [vhost_vsock]
stack backtrace:
CPU: 2 UID: 0 PID: 3443 Comm: rpc-libvirtd Not tainted 6.18.0-rc7 #62 PREEMPT(none)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-7.fc42 06/10/2025
Call Trace:
<TASK>
dump_stack_lvl+0x75/0xb0
dump_stack+0x14/0x1a
lockdep_rcu_suspicious.cold+0x4e/0x97
vhost_vsock_get+0x8f/0xa0 [vhost_vsock]
vhost_vsock_dev_ioctl+0x307/0x530 [vhost_vsock]
__x64_sys_ioctl+0x4f2/0xa00
x64_sys_call+0xed0/0x1da0
do_syscall_64+0x73/0xfa0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
...
</TASK>
This is not a real problem, because the vhost_vsock_get() caller, i.e.
vhost_vsock_set_cid(), holds the `vhost_vsock_mutex` used by the hash
table writers. Anyway, to prevent that warning, add lockdep_is_held()
condition to hash_for_each_possible_rcu() to verify that either the
caller is in an RCU read section or `vhost_vsock_mutex` is held when
CONFIG_PROVE_RCU_LIST is enabled; and also clarify the comment for
vhost_vsock_get() to better describe the locking requirements and the
scope of the returned pointer validity.
About vhost_vsock_reset_orphans(), currently this function is only
called via vsock_for_each_connected_socket(), which holds the
`vsock_table_lock` spinlock (which is also an RCU read-side critical
section). However, add an explicit RCU read lock there to make the code
more robust and explicit about the RCU requirements, and to prevent
issues if the calling context changes in the future or if
vhost_vsock_reset_orphans() is called from other contexts.
Fixes: 834e772c8d ("vhost/vsock: fix use-after-free in network stack callers")
Cc: stefanha@redhat.com
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20251126133826.142496-1-sgarzare@redhat.com>
Message-ID: <20251126210313.GA499503@fedora>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>