I encountered a hung task issue, but not a performance one. I run DIO
on a device (need lba continuous, for example open channel ssd), maybe
hungtask in below case:
DIO: Checkpoint:
get addr A(at boundary), merge into BIO,
no submit because boundary missing
flush dirty data(get addr A+1), wait IO(A+1)
writeback timeout, because DIO(A) didn't submit
get addr A+2 fail, because checkpoint is doing
dio_send_cur_page() may clear sdio->boundary, so prevent it from missing
a boundary.
Link: https://lkml.kernel.org/r/20210322042253.38312-1-jack.qiu@huawei.com
Fixes: b1058b9812 ("direct-io: submit bio after boundary buffer is added to it")
Signed-off-by: Jack Qiu <jack.qiu@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ia64 has two stacks:
- memory stack (or stack), pointed at by by r12
- register backing store (register stack), pointed at by
ar.bsp/ar.bspstore with complications around dirty
register frame on CPU.
In [1] Dmitry noticed that PTRACE_GET_SYSCALL_INFO returns the register
stack instead memory stack.
The bug comes from the fact that user_stack_pointer() and
current_user_stack_pointer() don't return the same register:
ulong user_stack_pointer(struct pt_regs *regs) { return regs->ar_bspstore; }
#define current_user_stack_pointer() (current_pt_regs()->r12)
The change gets both back in sync.
I think ptrace(PTRACE_GET_SYSCALL_INFO) is the only affected user by
this bug on ia64.
The change fixes 'rt_sigreturn.gen.test' strace test where it was
observed initially.
Link: https://bugs.gentoo.org/769614 [1]
Link: https://lkml.kernel.org/r/20210331084447.2561532-1-slyfox@gentoo.org
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Reported-by: Dmitry V. Levin <ldv@altlinux.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following deadlock is detected:
truncate -> setattr path is waiting for pending direct IO to be done (inode->i_dio_count become zero) with inode->i_rwsem held (down_write).
PID: 14827 TASK: ffff881686a9af80 CPU: 20 COMMAND: "ora_p005_hrltd9"
#0 __schedule at ffffffff818667cc
#1 schedule at ffffffff81866de6
#2 inode_dio_wait at ffffffff812a2d04
#3 ocfs2_setattr at ffffffffc05f322e [ocfs2]
#4 notify_change at ffffffff812a5a09
#5 do_truncate at ffffffff812808f5
#6 do_sys_ftruncate.constprop.18 at ffffffff81280cf2
#7 sys_ftruncate at ffffffff81280d8e
#8 do_syscall_64 at ffffffff81003949
#9 entry_SYSCALL_64_after_hwframe at ffffffff81a001ad
dio completion path is going to complete one direct IO (decrement
inode->i_dio_count), but before that it hung at locking inode->i_rwsem:
#0 __schedule+700 at ffffffff818667cc
#1 schedule+54 at ffffffff81866de6
#2 rwsem_down_write_failed+536 at ffffffff8186aa28
#3 call_rwsem_down_write_failed+23 at ffffffff8185a1b7
#4 down_write+45 at ffffffff81869c9d
#5 ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2]
#6 ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2]
#7 dio_complete+140 at ffffffff812c873c
#8 dio_aio_complete_work+25 at ffffffff812c89f9
#9 process_one_work+361 at ffffffff810b1889
#10 worker_thread+77 at ffffffff810b233d
#11 kthread+261 at ffffffff810b7fd5
#12 ret_from_fork+62 at ffffffff81a0035e
Thus above forms ABBA deadlock. The same deadlock was mentioned in
upstream commit 28f5a8a7c0 ("ocfs2: should wait dio before inode lock
in ocfs2_setattr()"). It seems that that commit only removed the
cluster lock (the victim of above dead lock) from the ABBA deadlock
party.
End-user visible effects: Process hang in truncate -> ocfs2_setattr path
and other processes hang at ocfs2_dio_end_io_write path.
This is to fix the deadlock itself. It removes inode_lock() call from
dio completion path to remove the deadlock and add ip_alloc_sem lock in
setattr path to synchronize the inode modifications.
[wen.gang.wang@oracle.com: remove the "had_alloc_lock" as suggested]
Link: https://lkml.kernel.org/r/20210402171344.1605-1-wen.gang.wang@oracle.com
Link: https://lkml.kernel.org/r/20210331203654.3911-1-wen.gang.wang@oracle.com
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we do coredump for user process signal, this may be an SIGBUS signal
with BUS_MCEERR_AR or BUS_MCEERR_AO code, which means this signal is
resulted from ECC memory fail like SRAR or SRAO, we expect the memory
recovery work is finished correctly, then the get_dump_page() will not
return the error page as its process pte is set invalid by
memory_failure().
But memory_failure() may fail, and the process's related pte may not be
correctly set invalid, for current code, we will return the poison page,
get it dumped, and then lead to system panic as its in kernel code.
So check the poison status in get_dump_page(), and if TRUE, return NULL.
There maybe other scenario that is also better to check the posion status
and not to panic, so make a wrapper for this check, Thanks to David's
suggestion(<david@redhat.com>).
[akpm@linux-foundation.org: s/0/false/]
[yaoaili@kingsoft.com: is_page_poisoned() arg cannot be null, per Matthew]
Link: https://lkml.kernel.org/r/20210322115233.05e4e82a@alex-virtual-machine
Link: https://lkml.kernel.org/r/20210319104437.6f30e80d@alex-virtual-machine
Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Aili Yao <yaoaili@kingsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull cifs fixes from Steve French:
"Three cifs/smb3 fixes, two for stable: a reconnect fix and a fix for
display of devnames with special characters"
* tag '5.12-rc6-smb3' of git://git.samba.org/sfrench/cifs-2.6:
cifs: escape spaces in share names
fs: cifs: Remove unnecessary struct declaration
cifs: On cifs_reconnect, resolve the hostname again.
Pull rdma fixes from Jason Gunthorpe:
"Nothing very exciting here, just a few small bug fixes. No red flags
for this release have shown up.
- Regression from the last pull request in cxgb4 related to the ipv6
fixes
- KASAN crasher in rtrs
- oops in hfi1 related to a buggy BIOS
- Userspace could oops qedr's XRC support
- Uninitialized memory when parsing a LS_NLA_TYPE_DGID netlink
message"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/addr: Be strict with gid size
RDMA/qedr: Fix kernel panic when trying to access recv_cq
IB/hfi1: Fix probe time panic when AIP is enabled with a buggy BIOS
RDMA/cxgb4: check for ipv6 address properly while destroying listener
RDMA/rtrs-clt: Close rtrs client conn before destroying rtrs clt session files
Pull s390 fixes from Heiko Carstens:
- fix incorrect dereference of the ext_params2 external interrupt
parameter, which leads to an instant kernel crash if a pfault
interrupt occurs.
- add forgotten stack unwinder support, and fix memory leak for the
new machine check handler stack.
- fix inline assembly register clobbering due to KASAN code
instrumentation.
* tag 's390-5.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/setup: use memblock_free_late() to free old stack
s390/irq: fix reading of ext_params2 field from lowcore
s390/unwind: add machine check handler stack
s390/cpcmd: fix inline assembly register clobbering
Pull sound fixes from Takashi Iwai:
"This batch became unexpectedly bigger due to the pending ASoC patches,
but all look small and fine device-specific fixes.
Many of the commits are for ASoC Intel drivers, while the rest are for
ASoC small codec/platform fixes and HD-audio quirks"
* tag 'sound-5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (21 commits)
ALSA: hda/realtek: Fix speaker amp setup on Acer Aspire E1
ALSA: aloop: Fix initialization of controls
ALSA: hda/conexant: Apply quirk for another HP ZBook G5 model
ASoC: fsl_esai: Fix TDM slot setup for I2S mode
ASoC: codecs: lpass-rx-macro: set npl clock rate correctly
ASoC: codecs: lpass-tx-macro: set npl clock rate correctly
ASoC: sunxi: sun4i-codec: fill ASoC card owner
ASoC: cygnus: fix for_each_child.cocci warnings
ASoC: max98373: Added 30ms turn on/off time delay
ASoC: max98373: Changed amp shutdown register as volatile
ASoC: intel: atom: Remove 44100 sample-rate from the media and deep-buffer DAI descriptions
ASoC: intel: atom: Stop advertising non working S24LE support
ASoC: wm8960: Fix wrong bclk and lrclk with pll enabled for some chips
ASoC: SOF: Intel: move ELH chip info
ASoC: SOF: Intel: APL: set shutdown callback to hda_dsp_shutdown
ASoC: SOF: Intel: CNL: set shutdown callback to hda_dsp_shutdown
ASoC: SOF: Intel: ICL: set shutdown callback to hda_dsp_shutdown
ASoC: SOF: Intel: TGL: set shutdown callback to hda_dsp_shutdown
ASoC: SOF: Intel: TGL: fix EHL ops
ASoC: SOF: core: harden shutdown helper
...
Pull kvm fix from Paolo Bonzini:
"A lone x86 patch, for a bug found while developing a backport to
stable versions"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/mmu: preserve pending TLB flush across calls to kvm_tdp_mmu_zap_sp
Pull close_range() fix from Christian Brauner:
"Syzbot reported a bug in close_range.
Debugging this showed we didn't recalculate the current maximum fd
number for CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC after we unshared
the file descriptors table. As a result, max_fd could exceed the
current fdtable maximum causing us to set excessive bits.
As a concrete example, let's say the user requested everything from fd
4 to ~0UL to be closed and their current fdtable size is 256 with
their highest open fd being 4. With CLOSE_RANGE_UNSHARE the caller
will end up with a new fdtable which has room for 64 file descriptors
since that is the lowest fdtable size we accept. But now max_fd will
still point to 255 and needs to be adjusted. Fix this by retrieving
the correct maximum fd value in __range_cloexec().
I've carried this fix for a little while but since there was no
linux-next release over easter I waited until now.
With this change close_range() can be further simplified but imho we
are in no hurry to do that and so I'll defer this for the 5.13 merge
window"
* tag 'for-linus-2021-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
file: fix close_range() for unshare+cloexec
Pull umount fix from Al Viro:
"Brown paperbag time: dumb braino in the series that went into 5.7
broke the 'don't step into ->d_weak_revalidate() when umount(2) looks
the victim up' behaviour.
Spotted only now - saw
if (!err && unlikely(nd->flags & LOOKUP_MOUNTPOINT)) {
err = handle_lookup_down(nd);
nd->flags &= ~LOOKUP_JUMPED; // no d_weak_revalidate(), please...
}
and went "why do we clear that flag here - nothing below that point is
going to check it anyway" / "wait a minute, what is it doing *after*
complete_walk() (which is where we check that flag and call
->d_weak_revalidate())" / "how could that possibly _not_ break?",
followed by reproducing the breakage and verifying that the obvious
fix of that braino does, indeed, fix it.
The reproducer is (assuming that $DIR exists and is exported r/w to
localhost)
mkdir $DIR/a
mkdir /tmp/foo
mount --bind /tmp/foo /tmp/foo
mkdir /tmp/foo/a
mkdir /tmp/foo/b
mount -t nfs4 localhost:$DIR/a /tmp/foo/a
mount -t nfs4 localhost:$DIR /tmp/foo/b
rmdir /tmp/foo/b/a
umount /tmp/foo/b
umount /tmp/foo/a
umount -l /tmp/foo # will get everything under /tmp/foo, no matter what
Correct behaviour is successful umount; broken kernels (5.7-rc1 and
later) get
umount.nfs4: /tmp/foo/a: Stale file handle
Note that bind mount is there to be able to recover - on broken
kernels we'd get stuck with impossible-to-umount filesystem if not for
that.
FWIW, that braino had been posted for review back then, at least
twice. Unfortunately, the call of complete_walk() was outside of diff
context, so the bogosity hadn't been immediately obvious from the
patch alone ;-/"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
LOOKUP_MOUNTPOINT: we are cleaning "jumped" flag too late
Right now, if a call to kvm_tdp_mmu_zap_sp returns false, the caller
will skip the TLB flush, which is wrong. There are two ways to fix
it:
- since kvm_tdp_mmu_zap_sp will not yield and therefore will not flush
the TLB itself, we could change the call to kvm_tdp_mmu_zap_sp to
use "flush |= ..."
- or we can chain the flush argument through kvm_tdp_mmu_zap_sp down
to __kvm_tdp_mmu_zap_gfn_range. Note that kvm_tdp_mmu_zap_sp will
neither yield nor flush, so flush would never go from true to
false.
This patch does the former to simplify application to stable kernels,
and to make it further clearer that kvm_tdp_mmu_zap_sp will not flush.
Cc: seanjc@google.com
Fixes: 048f49809c ("KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping")
Cc: <stable@vger.kernel.org> # 5.10.x: 048f49809c: KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping
Cc: <stable@vger.kernel.org> # 5.10.x: 33a3164161: KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages
Cc: <stable@vger.kernel.org>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 653a5efb84 ("cifs: update super_operations to show_devname")
introduced the display of devname for cifs mounts. However, when mounting
a share which has a whitespace in the name, that exact share name is also
displayed in mountinfo. Make sure that all whitespace is escaped.
Signed-off-by: Maciek Borzecki <maciek.borzecki@gmail.com>
CC: <stable@vger.kernel.org> # 5.11+
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
struct cifs_readdata is declared twice. One is declared
at 208th line.
And struct cifs_readdata is defined blew.
The declaration here is not needed. Remove the duplicate.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
On cifs_reconnect, make sure that DNS resolution happens again.
It could be the cause of connection to go dead in the first place.
This also contains the fix for a build issue identified by Intel bot.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: <stable@vger.kernel.org> # 5.11+
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull ARC fixlets from Vineet Gupta:
"A few straggler fixes for ARC"
* tag 'arc-5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: treewide: avoid the pointer addition with NULL pointer
arc: kernel: Return -EFAULT if copy_to_user() fails
ARC: haps: bump memory to 1 GB
ipv6 bit is wrongly set by the below which causes fatal adapter lookup
engine errors for ipv4 connections while destroying a listener. Fix it to
properly check the local address for ipv6.
Fixes: 3408be145a ("RDMA/cxgb4: Fix adapter LE hash errors while destroying ipv6 listening server")
Link: https://lore.kernel.org/r/20210331135715.30072-1-bharat@chelsio.com
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Pull ARM SoC fixes from Arnd Bergmann:
"Most of the changes again are devicetree fixes, but there are also
five trivial build fixes for issues I found when test building with
gcc-11 or when running 'make W=1', and some OMAP platform specific
code fixups.
Broadcom:
- One revert for a Raspberry pi interrupt controller change that
caused a regression.
TI OMAP:
- Remove unused duplicate sha2md5_fck clock node that can race with
the OMAP4_SHA2MD5_CLKCTRL clock node for disable for unused clocks
- Add aliases for omap4/5 mmc to put the slots back into the right
order again
- Fix typo for bionic voltage controllers that accidentally use mpu
for all instances instead of mpu, core and iva
- Fix random hangs for droid4 caused by missing fix from TI Android
kernel tree to do a dummy smc call on cpuidle wakeup path
NXP i.MX:
- Fix a system failure on imx6qdl-phytec-pfla02 board when booting
from SD, by adding missing vmmc supply for SD interfaces.
- Fix address typo in i.MX8MM/Q IOMUXC_SD1_DATA0_GPIO2_IO2
definition.
Marvell mvebu:
- Fix storm interrupt on Turris Omnia
- Enable hardware buffer management as it should be
... and build fixes for PXA, Freescale, Marvell, OMAP1 and Keystone"
* tag 'arm-fixes-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: dts: turris-omnia: configure LED[2]/INTn pin as interrupt pin
ARM: dts: turris-omnia: fix hardware buffer management
Revert "arm64: dts: marvell: armada-cp110: Switch to per-port SATA interrupts"
ARM: mvebu: avoid clang -Wtautological-constant warning
ARM: pxa: mainstone: avoid -Woverride-init warning
ARM: omap1: fix building with clang IAS
soc/fsl: qbman: fix conflicting alignment attributes
ARM: keystone: fix integer overflow warning
ARM: dts: imx6: pbab01: Set vmmc supply for both SD interfaces
arm64: dts: imx8mm/q: Fix pad control of SD1_DATA0
ARM: OMAP4: PM: update ROM return address for OSWR and OFF
ARM: OMAP4: Fix PMIC voltage domains for bionic
ARM: dts: Fix moving mmc devices with aliases for omap4 & 5
ARM: dts: Drop duplicate sha2md5_fck to fix clk_disable race
Revert "ARM: dts: bcm2711: Add the BSC interrupt controller"
Pull parisc fixes from Helge Deller:
"One link error fix found by the kernel test robot, one sparse warning
fix, remove a duplicate declaration and some spelling fixes"
* 'parisc-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: math-emu: Few spelling fixes in the file fpu.h
parisc: avoid a warning on u8 cast for cmpxchg on u8 pointers
parisc: parisc-agp requires SBA IOMMU driver
parisc: Remove duplicate struct task_struct declaration
Pull x86 platform driver fix from Hans de Goede:
"A single bugfix to fix spurious wakeups from suspend caused by recent
intel-hid driver changes"
* tag 'platform-drivers-x86-v5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: intel-hid: Fix spurious wakeups caused by tablet-mode events during suspend
Pull regulator fixes from Mark Brown:
"bd9571mwv regulator fixes for v5.12.
A set of driver specific fixes here, the main one is a fix to not try
to set unsupported voltages on this device. The other two patches
clean up the error handling and eliminate the possibility that we
could overflow the page when writing sysfs output (which AFAICT wasn't
an issue but better to be sure)"
* tag 'regulator-fix-v5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: bd9571mwv: Convert device attribute to sysfs_emit()
regulator: bd9571mwv: Fix regulator name printed on registration failure
regulator: bd9571mwv: Fix AVS and DVFS voltage range
ASoC: Fixes for v5.12
A fairly small batch of driver specific fixes, mainly for various x86
systems with the biggest set being fixes to power down DSPs properly on
x86 SOF systems.
Use memblock_free_late() to free the old machine check stack to the
buddy allocator instead of leaking it.
Fixes: b61b159512 ("s390: add stack for machine check handler")
Cc: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
That (and traversals in case of umount .) should be done before
complete_walk(). Either a braino or mismerge damage on queue
reorders - either way, I should've spotted that much earlier.
Fucked-up-by: Al Viro <viro@zeniv.linux.org.uk>
X-Paperbag: Brown
Fixes: 161aff1d93 "LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()"
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
mvebu fixes for 5.12 (part 1)
2 fixes on on turris-omnia (Armada 38x based:)
- Fix storm interrupt
- Enable hardware buffer management as it should be
Unbreak AHCI on all Marvell Armada 7k8k / CN913x platforms
* tag 'mvebu-fixes-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu:
ARM: dts: turris-omnia: configure LED[2]/INTn pin as interrupt pin
ARM: dts: turris-omnia: fix hardware buffer management
Revert "arm64: dts: marvell: armada-cp110: Switch to per-port SATA interrupts"
Link: https://lore.kernel.org/r/87a6qgctit.fsf@BL-laptop
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull fs fixes from Al Viro:
"Fairly old hostfs bug (in setups that are not used by anyone,
apparently) + fix for this cycle regression: extra dput/mntput in
LOOKUP_CACHED failure handling"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Make sure nd->path.mnt and nd->path.dentry are always valid pointers
hostfs: fix memory handling in follow_link()
Initialize them in set_nameidata() and make sure that terminate_walk() clears them
once the pointers become potentially invalid (i.e. we leave RCU mode or drop them
in non-RCU one). Currently we have "path_init() always initializes them and nobody
accesses them outside of path_init()/terminate_walk() segments", which is asking
for trouble.
With that change we would have nd->path.{mnt,dentry}
1) always valid - NULL or pointing to currently allocated objects.
2) non-NULL while we are successfully walking
3) NULL when we are not walking at all
4) contributing to refcounts whenever non-NULL outside of RCU mode.
Fixes: 6c6ec2b0a3 ("fs: add support for LOOKUP_CACHED")
Reported-by: syzbot+c88a7030da47945a3cc3@syzkaller.appspotmail.com
Tested-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
struct task_struct is declared twice. One has been declared
at 154th line. Remove the duplicate.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Pull workqueue fixes from Tejun Heo:
"Two workqueue fixes.
One is around debugobj and poses no risk. The other is to prevent the
stall watchdog from firing spuriously in certain conditions. Not as
trivial as debugobj change but is still fairly low risk"
* 'for-5.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue/watchdog: Make unbound workqueues aware of touch_softlockup_watchdog() 84;0;0c84;0;0c There are two workqueue-specific watchdog timestamps:
workqueue: Move the position of debug_work_activate() in __queue_work()
For each device, the nosy driver allocates a pcilynx structure.
A use-after-free might happen in the following scenario:
1. Open nosy device for the first time and call ioctl with command
NOSY_IOC_START, then a new client A will be malloced and added to
doubly linked list.
2. Open nosy device for the second time and call ioctl with command
NOSY_IOC_START, then a new client B will be malloced and added to
doubly linked list.
3. Call ioctl with command NOSY_IOC_START for client A, then client A
will be readded to the doubly linked list. Now the doubly linked
list is messed up.
4. Close the first nosy device and nosy_release will be called. In
nosy_release, client A will be unlinked and freed.
5. Close the second nosy device, and client A will be referenced,
resulting in UAF.
The root cause of this bug is that the element in the doubly linked list
is reentered into the list.
Fix this bug by adding a check before inserting a client. If a client
is already in the linked list, don't insert it.
The following KASAN report reveals it:
BUG: KASAN: use-after-free in nosy_release+0x1ea/0x210
Write of size 8 at addr ffff888102ad7360 by task poc
CPU: 3 PID: 337 Comm: poc Not tainted 5.12.0-rc5+ #6
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
Call Trace:
nosy_release+0x1ea/0x210
__fput+0x1e2/0x840
task_work_run+0xe8/0x180
exit_to_user_mode_prepare+0x114/0x120
syscall_exit_to_user_mode+0x1d/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
Allocated by task 337:
nosy_open+0x154/0x4d0
misc_open+0x2ec/0x410
chrdev_open+0x20d/0x5a0
do_dentry_open+0x40f/0xe80
path_openat+0x1cf9/0x37b0
do_filp_open+0x16d/0x390
do_sys_openat2+0x11d/0x360
__x64_sys_open+0xfd/0x1a0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
Freed by task 337:
kfree+0x8f/0x210
nosy_release+0x158/0x210
__fput+0x1e2/0x840
task_work_run+0xe8/0x180
exit_to_user_mode_prepare+0x114/0x120
syscall_exit_to_user_mode+0x1d/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
The buggy address belongs to the object at ffff888102ad7300 which belongs to the cache kmalloc-128 of size 128
The buggy address is located 96 bytes inside of 128-byte region [ffff888102ad7300, ffff888102ad7380)
[ Modified to use 'list_empty()' inside proper lock - Linus ]
Link: https://lore.kernel.org/lkml/1617433116-5930-1-git-send-email-zheyuma97@gmail.com/
Reported-and-tested-by: 马哲宇 (Zheyu Ma) <zheyuma97@gmail.com>
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>