Pull SIGCHLD fix from Eric Biederman:
"Christof Meerwald reported that do_notify_parent has not been
successfully populating si_pid and si_uid for multi-threaded
processes.
This is the one-liner fix. Strictly speaking a one-liner plus
comment"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signal: Avoid corrupting si_pid and si_uid in do_notify_parent
Pull PCI fixes from Bjorn Helgaas:
- Workaround Apex TPU class code issue that prevents resource
assignment (Bjorn Helgaas)
- Update MAINTAINERS to add Rob Herring for native PCI controller
drivers (Lorenzo Pieralisi)
* tag 'pci-v5.7-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
MAINTAINERS: Add Rob Herring and remove Andy Murray as PCI reviewers
PCI: Move Apex Edge TPU class quirk to fix BAR assignment
Pull ARM SoC fixes from Arnd Bergmann:
"A few smaller fixes for v5.7-rc3: The majority are fixes for bugs I
found after restarting my randconfig build testing that had been
dormant for a while.
On the Nokia N950/N9 phone, a DT fix is required to address a boot
regression.
For the bcm283x (Raspberry Pi), two DT fixes address minor issues"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
soc: imx8: select SOC_BUS
soc: tegra: fix tegra_pmc_get_suspend_mode definition
soc: fsl: dpio: avoid stack usage warning
soc: fsl: dpio: fix incorrect pointer conversions
ARM: imx: provide v7_cpu_resume() only on ARM_CPU_SUSPEND=y
ARM: dts: bcm283x: Disable dsi0 node
firmware: xilinx: make firmware_debugfs_root static
drivers: soc: xilinx: fix firmware driver Kconfig dependency
ARM: dts: bcm283x: Add cells encoding format to firmware bus
ARM: dts: OMAP3: disable RNG on N950/N9
Pull nfsd fixes from Chuck Lever:
"The first set of 5.7-rc fixes for NFS server issues.
These were all unresolved at the time the 5.7 window opened, and
needed some additional time to ensure they were correctly addressed.
They are ready now.
At the moment I know of one more urgent issue regarding the NFS
server. A fix has been tested and is under review. I expect to send
one more pull request, containing this fix (which now consists of 3
patches).
Fixes:
- Address several use-after-free and memory leak bugs
- Prevent a backchannel livelock"
* tag 'nfsd-5.7-rc-1' of git://git.linux-nfs.org/projects/cel/cel-2.6:
svcrdma: Fix leak of svc_rdma_recv_ctxt objects
svcrdma: Fix trace point use-after-free race
SUNRPC: Fix backchannel RPC soft lockups
SUNRPC/cache: Fix unsafe traverse caused double-free in cache_purge
nfsd: memory corruption in nfsd4_lock()
Pull exfat fixes from Namjae Jeon:
- several bug fixes(broken mount discard option, remount failure,
memory leak)
- add missing MODULE_ALIAS_FS for automatically loading exfat module.
- set s_time_gran and truncate atime with exfat timestamp granularity.
* tag 'for-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: truncate atimes to 2s granularity
exfat: properly set s_time_gran
exfat: remove 'bps' mount-option
exfat: Unify access to the boot sector
exfat: add missing MODULE_ALIAS_FS()
exfat: Fix discard support
Pull remoteproc fixes from Bjorn Andersson:
"This fixes a regression in the probe error path of the Qualcomm modem
remoteproc driver and a mix up of phy_addr_t and dma_addr_t in the
Mediatek SCP control driver"
* tag 'rproc-v5.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
remoteproc: mtk_scp: use dma_addr_t for DMA API
remoteproc: qcom_q6v5_mss: fix q6v5_probe() error paths
remoteproc: qcom_q6v5_mss: fix a bug in q6v5_probe()
Pull audit fix from Paul Moore:
"One small audit patch fix, fixing a missing length check on input from
userspace, nothing crazy"
* tag 'audit-pr-20200422' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: check the length of userspace generated audit records
This pull request contains Broadcom ARM-based SoCs Device Tree fixes for
5.7, please pull the following:
- Nicolas provides a fix for 55c7c06210 ("ARM: dts: bcm283x: Fix vc4's
firmware bus DMA limitations") which missed adding proper
#address-cells and #size-cells properties and he also disables the DSI
node which should have been disabled by default but was not.
* tag 'arm-soc/for-5.7/devicetree-fixes' of https://github.com/Broadcom/stblinux:
ARM: dts: bcm283x: Disable dsi0 node
ARM: dts: bcm283x: Add cells encoding format to firmware bus
Link: https://lore.kernel.org/r/20200417171725.1084-1-f.fainelli@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull kselftest fixes from Shuah Khan:
"This consists of fixes to runner scripts and individual test run-time
bugs. Includes fixes to tpm2 and memfd test run-time regressions"
* tag 'linux-kselftest-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/ipc: Fix test failure seen after initial test run
Revert "Kernel selftests: tpm2: check for tpm support"
selftests/ftrace: Add CONFIG_SAMPLE_FTRACE_DIRECT=m kconfig
selftests/seccomp: allow clock_nanosleep instead of nanosleep
kselftest/runner: allow to properly deliver signals to tests
selftests/harness: fix spelling mistake "SIGARLM" -> "SIGALRM"
selftests: Fix memfd test run-time regression
selftests: vm: Fix 64-bit test builds for powerpc64le
selftests: vm: Do not override definition of ARCH
The timestamp for access_time has double seconds granularity(There is no
10msIncrement field for access_time unlike create/modify_time).
exfat's atimes are restricted to only 2s granularity so after
we set an atime, round it down to the nearest 2s and set the
sub-second component of the timestamp to 0.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
The s_time_gran superblock field indicates the on-disk nanosecond
granularity of timestamps, and for exfat that seems to be 10ms, so
set s_time_gran to 10000000ns. Without this, in-memory timestamps
change when they get re-read from disk.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Unify access to boot sector via 'sbi->pbr_bh'.
This fixes vol_flags inconsistency at read failed in fs_set_vol_flags(),
and buffer_head leak in __exfat_fill_super().
Signed-off-by: Tetsuhiro Kohada <Kohada.Tetsuhiro@dc.MitsubishiElectric.co.jp>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
This adds the necessary MODULE_ALIAS_FS() to exfat so the module gets
automatically loaded when an exfat filesystem is mounted.
Signed-off-by: Thomas Backlund <tmb@mageia.org>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Discard support was always unconditionally disabled. Now it is disabled
only in the case when blk_queue_discard() returns false.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Merge misc fixes from Andrew Morton:
"15 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
tools/vm: fix cross-compile build
coredump: fix null pointer dereference on coredump
mm: shmem: disable interrupt when acquiring info->lock in userfaultfd_copy path
shmem: fix possible deadlocks on shmlock_user_lock
vmalloc: fix remap_vmalloc_range() bounds checks
mm/shmem: fix build without THP
mm/ksm: fix NULL pointer dereference when KSM zero page is enabled
tools/build: tweak unused value workaround
checkpatch: fix a typo in the regex for $allocFunctions
mm, gup: return EINTR when gup is interrupted by fatal signals
mm/hugetlb: fix a addressing exception caused by huge_pte_offset
MAINTAINERS: add an entry for kfifo
mm/userfaultfd: disable userfaultfd-wp on x86_32
slub: avoid redzone when choosing freepointer location
sh: fix build error in mm/init.c
Pull kvm fixes from Paolo Bonzini:
"Bugfixes, and a few cleanups to the newly-introduced assembly language
vmentry code for AMD"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PPC: Book3S HV: Handle non-present PTEs in page fault functions
kvm: Disable objtool frame pointer checking for vmenter.S
MAINTAINERS: add a reviewer for KVM/s390
KVM: s390: Fix PV check in deliverable_irqs()
kvm: Handle reads of SandyBridge RAPL PMU MSRs rather than injecting #GP
KVM: Remove CREATE_IRQCHIP/SET_PIT2 race
KVM: SVM: Fix __svm_vcpu_run declaration.
KVM: SVM: Do not setup frame pointer in __svm_vcpu_run
KVM: SVM: Fix build error due to missing release_pages() include
KVM: SVM: Do not mark svm_vcpu_run with STACK_FRAME_NON_STANDARD
kvm: nVMX: match comment with return type for nested_vmx_exit_reflected
kvm: nVMX: reflect MTF VM-exits if injected by L1
KVM: s390: Return last valid slot if approx index is out-of-bounds
KVM: Check validity of resolved slot when searching memslots
KVM: VMX: Enable machine check support for 32bit targets
KVM: SVM: move more vmentry code to assembly
KVM: SVM: fix compilation with modular PSP and non-modular KVM
Pull virtio fixes and cleanups from Michael Tsirkin:
- Some bug fixes
- Cleanup a couple of issues that surfaced meanwhile
- Disable vhost on ARM with OABI for now - to be fixed fully later in
the cycle or in the next release.
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (24 commits)
vhost: disable for OABI
virtio: drop vringh.h dependency
virtio_blk: add a missing include
virtio-balloon: Avoid using the word 'report' when referring to free page hinting
virtio-balloon: make virtballoon_free_page_report() static
vdpa: fix comment of vdpa_register_device()
vdpa: make vhost, virtio depend on menu
vdpa: allow a 32 bit vq alignment
drm/virtio: fix up for include file changes
remoteproc: pull in slab.h
rpmsg: pull in slab.h
virtio_input: pull in slab.h
remoteproc: pull in slab.h
virtio-rng: pull in slab.h
virtgpu: pull in uaccess.h
tools/virtio: make asm/barrier.h self contained
tools/virtio: define aligned attribute
virtio/test: fix up after IOTLB changes
vhost: Create accessors for virtqueues private_data
vdpasim: Return status in vdpasim_get_status
...
Pull tpm fixes from Jarkko Sakkinen:
"A few bug fixes"
* tag 'tpmdd-next-20200421' of git://git.infradead.org/users/jjs/linux-tpmdd:
tpm/tpm_tis: Free IRQ if probing fails
tpm: fix wrong return value in tpm_pcr_extend
tpm: ibmvtpm: retry on H_CLOSED in tpm_ibmvtpm_send()
tpm: Export tpm2_get_cc_attrs_tbl for ibmvtpm driver as module
Pull clang-format fixlets from Miguel Ojeda:
"Two trivial clang-format changes:
- Don't indent C++ namespaces (Ian Rogers)
- The usual clang-format macro list update (Miguel Ojeda)"
* tag 'clang-format-for-linus-v5.7-rc3' of git://github.com/ojeda/linux:
clang-format: Update with the latest for_each macro list
clang-format: don't indent namespaces
Syzbot reported the below lockdep splat:
WARNING: possible irq lock inversion dependency detected
5.6.0-rc7-syzkaller #0 Not tainted
--------------------------------------------------------
syz-executor.0/10317 just changed the state of lock:
ffff888021d16568 (&(&info->lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline]
ffff888021d16568 (&(&info->lock)->rlock){+.+.}, at: shmem_mfill_atomic_pte+0x1012/0x21c0 mm/shmem.c:2407
but this lock was taken by another, SOFTIRQ-safe lock in the past:
(&(&xa->xa_lock)->rlock#5){..-.}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&(&info->lock)->rlock);
local_irq_disable();
lock(&(&xa->xa_lock)->rlock#5);
lock(&(&info->lock)->rlock);
<Interrupt>
lock(&(&xa->xa_lock)->rlock#5);
*** DEADLOCK ***
The full report is quite lengthy, please see:
https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2004152007370.13597@eggly.anvils/T/#m813b412c5f78e25ca8c6c7734886ed4de43f241d
It is because CPU 0 held info->lock with IRQ enabled in userfaultfd_copy
path, then CPU 1 is splitting a THP which held xa_lock and info->lock in
IRQ disabled context at the same time. If softirq comes in to acquire
xa_lock, the deadlock would be triggered.
The fix is to acquire/release info->lock with *_irq version instead of
plain spin_{lock,unlock} to make it softirq safe.
Fixes: 4c27fe4c4c ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Reported-by: syzbot+e27980339d305f2dbfd9@syzkaller.appspotmail.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: syzbot+e27980339d305f2dbfd9@syzkaller.appspotmail.com
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Link: http://lkml.kernel.org/r/1587061357-122619-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recent commit 71725ed10c ("mm: huge tmpfs: try to split_huge_page()
when punching hole") has allowed syzkaller to probe deeper, uncovering a
long-standing lockdep issue between the irq-unsafe shmlock_user_lock,
the irq-safe xa_lock on mapping->i_pages, and shmem inode's info->lock
which nests inside xa_lock (or tree_lock) since 4.8's shmem_uncharge().
user_shm_lock(), servicing SysV shmctl(SHM_LOCK), wants
shmlock_user_lock while its caller shmem_lock() holds info->lock with
interrupts disabled; but hugetlbfs_file_setup() calls user_shm_lock()
with interrupts enabled, and might be interrupted by a writeback endio
wanting xa_lock on i_pages.
This may not risk an actual deadlock, since shmem inodes do not take
part in writeback accounting, but there are several easy ways to avoid
it.
Requiring interrupts disabled for shmlock_user_lock would be easy, but
it's a high-level global lock for which that seems inappropriate.
Instead, recall that the use of info->lock to guard info->flags in
shmem_lock() dates from pre-3.1 days, when races with SHMEM_PAGEIN and
SHMEM_TRUNCATE could occur: nowadays it serves no purpose, the only flag
added or removed is VM_LOCKED itself, and calls to shmem_lock() an inode
are already serialized by the caller.
Take info->lock out of the chain and the possibility of deadlock or
lockdep warning goes away.
Fixes: 4595ef88d1 ("shmem: make shmem_inode_info::lock irq-safe")
Reported-by: syzbot+c8a8197c8852f566b9d9@syzkaller.appspotmail.com
Reported-by: syzbot+40b71e145e73f78f81ad@syzkaller.appspotmail.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2004161707410.16322@eggly.anvils
Link: https://lore.kernel.org/lkml/000000000000e5838c05a3152f53@google.com/
Link: https://lore.kernel.org/lkml/0000000000003712b305a331d3b1@google.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
remap_vmalloc_range() has had various issues with the bounds checks it
promises to perform ("This function checks that addr is a valid
vmalloc'ed area, and that it is big enough to cover the vma") over time,
e.g.:
- not detecting pgoff<<PAGE_SHIFT overflow
- not detecting (pgoff<<PAGE_SHIFT)+usize overflow
- not checking whether addr and addr+(pgoff<<PAGE_SHIFT) are the same
vmalloc allocation
- comparing a potentially wildly out-of-bounds pointer with the end of
the vmalloc region
In particular, since commit fc9702273e ("bpf: Add mmap() support for
BPF_MAP_TYPE_ARRAY"), unprivileged users can cause kernel null pointer
dereferences by calling mmap() on a BPF map with a size that is bigger
than the distance from the start of the BPF map to the end of the
address space.
This could theoretically be used as a kernel ASLR bypass, by using
whether mmap() with a given offset oopses or returns an error code to
perform a binary search over the possible address range.
To allow remap_vmalloc_range_partial() to verify that addr and
addr+(pgoff<<PAGE_SHIFT) are in the same vmalloc region, pass the offset
to remap_vmalloc_range_partial() instead of adding it to the pointer in
remap_vmalloc_range().
In remap_vmalloc_range_partial(), fix the check against
get_vm_area_size() by using size comparisons instead of pointer
comparisons, and add checks for pgoff.
Fixes: 833423143c ("[PATCH] mm: introduce remap_vmalloc_range()")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@chromium.org>
Link: http://lkml.kernel.org/r/20200415222312.236431-1-jannh@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Our machine encountered a panic(addressing exception) after run for a
long time and the calltrace is:
RIP: hugetlb_fault+0x307/0xbe0
RSP: 0018:ffff9567fc27f808 EFLAGS: 00010286
RAX: e800c03ff1258d48 RBX: ffffd3bb003b69c0 RCX: e800c03ff1258d48
RDX: 17ff3fc00eda72b7 RSI: 00003ffffffff000 RDI: e800c03ff1258d48
RBP: ffff9567fc27f8c8 R08: e800c03ff1258d48 R09: 0000000000000080
R10: ffffaba0704c22a8 R11: 0000000000000001 R12: ffff95c87b4b60d8
R13: 00005fff00000000 R14: 0000000000000000 R15: ffff9567face8074
FS: 00007fe2d9ffb700(0000) GS:ffff956900e40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffd3bb003b69c0 CR3: 000000be67374000 CR4: 00000000003627e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
follow_hugetlb_page+0x175/0x540
__get_user_pages+0x2a0/0x7e0
__get_user_pages_unlocked+0x15d/0x210
__gfn_to_pfn_memslot+0x3c5/0x460 [kvm]
try_async_pf+0x6e/0x2a0 [kvm]
tdp_page_fault+0x151/0x2d0 [kvm]
...
kvm_arch_vcpu_ioctl_run+0x330/0x490 [kvm]
kvm_vcpu_ioctl+0x309/0x6d0 [kvm]
do_vfs_ioctl+0x3f0/0x540
SyS_ioctl+0xa1/0xc0
system_call_fastpath+0x22/0x27
For 1G hugepages, huge_pte_offset() wants to return NULL or pudp, but it
may return a wrong 'pmdp' if there is a race. Please look at the
following code snippet:
...
pud = pud_offset(p4d, addr);
if (sz != PUD_SIZE && pud_none(*pud))
return NULL;
/* hugepage or swap? */
if (pud_huge(*pud) || !pud_present(*pud))
return (pte_t *)pud;
pmd = pmd_offset(pud, addr);
if (sz != PMD_SIZE && pmd_none(*pmd))
return NULL;
/* hugepage or swap? */
if (pmd_huge(*pmd) || !pmd_present(*pmd))
return (pte_t *)pmd;
...
The following sequence would trigger this bug:
- CPU0: sz = PUD_SIZE and *pud = 0 , continue
- CPU0: "pud_huge(*pud)" is false
- CPU1: calling hugetlb_no_page and set *pud to xxxx8e7(PRESENT)
- CPU0: "!pud_present(*pud)" is false, continue
- CPU0: pmd = pmd_offset(pud, addr) and maybe return a wrong pmdp
However, we want CPU0 to return NULL or pudp in this case.
We must make sure there is exactly one dereference of pud and pmd.
Signed-off-by: Longpeng <longpeng2@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200413010342.771-1-longpeng2@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christof Meerwald <cmeerw@cmeerw.org> writes:
> Hi,
>
> this is probably related to commit
> 7a0cf09494 (signal: Correct namespace
> fixups of si_pid and si_uid).
>
> With a 5.6.5 kernel I am seeing SIGCHLD signals that don't include a
> properly set si_pid field - this seems to happen for multi-threaded
> child processes.
>
> A simple test program (based on the sample from the signalfd man page):
>
> #include <sys/signalfd.h>
> #include <signal.h>
> #include <unistd.h>
> #include <spawn.h>
> #include <stdlib.h>
> #include <stdio.h>
>
> #define handle_error(msg) \
> do { perror(msg); exit(EXIT_FAILURE); } while (0)
>
> int main(int argc, char *argv[])
> {
> sigset_t mask;
> int sfd;
> struct signalfd_siginfo fdsi;
> ssize_t s;
>
> sigemptyset(&mask);
> sigaddset(&mask, SIGCHLD);
>
> if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
> handle_error("sigprocmask");
>
> pid_t chldpid;
> char *chldargv[] = { "./sfdclient", NULL };
> posix_spawn(&chldpid, "./sfdclient", NULL, NULL, chldargv, NULL);
>
> sfd = signalfd(-1, &mask, 0);
> if (sfd == -1)
> handle_error("signalfd");
>
> for (;;) {
> s = read(sfd, &fdsi, sizeof(struct signalfd_siginfo));
> if (s != sizeof(struct signalfd_siginfo))
> handle_error("read");
>
> if (fdsi.ssi_signo == SIGCHLD) {
> printf("Got SIGCHLD %d %d %d %d\n",
> fdsi.ssi_status, fdsi.ssi_code,
> fdsi.ssi_uid, fdsi.ssi_pid);
> return 0;
> } else {
> printf("Read unexpected signal\n");
> }
> }
> }
>
>
> and a multi-threaded client to test with:
>
> #include <unistd.h>
> #include <pthread.h>
>
> void *f(void *arg)
> {
> sleep(100);
> }
>
> int main()
> {
> pthread_t t[8];
>
> for (int i = 0; i != 8; ++i)
> {
> pthread_create(&t[i], NULL, f, NULL);
> }
> }
>
> I tried to do a bit of debugging and what seems to be happening is
> that
>
> /* From an ancestor pid namespace? */
> if (!task_pid_nr_ns(current, task_active_pid_ns(t))) {
>
> fails inside task_pid_nr_ns because the check for "pid_alive" fails.
>
> This code seems to be called from do_notify_parent and there we
> actually have "tsk != current" (I am assuming both are threads of the
> current process?)
I instrumented the code with a warning and received the following backtrace:
> WARNING: CPU: 0 PID: 777 at kernel/pid.c:501 __task_pid_nr_ns.cold.6+0xc/0x15
> Modules linked in:
> CPU: 0 PID: 777 Comm: sfdclient Not tainted 5.7.0-rc1userns+ #2924
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> RIP: 0010:__task_pid_nr_ns.cold.6+0xc/0x15
> Code: ff 66 90 48 83 ec 08 89 7c 24 04 48 8d 7e 08 48 8d 74 24 04 e8 9a b6 44 00 48 83 c4 08 c3 48 c7 c7 59 9f ac 82 e8 c2 c4 04 00 <0f> 0b e9 3fd
> RSP: 0018:ffffc9000042fbf8 EFLAGS: 00010046
> RAX: 000000000000000c RBX: 0000000000000000 RCX: ffffc9000042faf4
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81193d29
> RBP: ffffc9000042fc18 R08: 0000000000000000 R09: 0000000000000001
> R10: 000000100f938416 R11: 0000000000000309 R12: ffff8880b941c140
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff8880b941c140
> FS: 0000000000000000(0000) GS:ffff8880bca00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2e8c0a32e0 CR3: 0000000002e10000 CR4: 00000000000006f0
> Call Trace:
> send_signal+0x1c8/0x310
> do_notify_parent+0x50f/0x550
> release_task.part.21+0x4fd/0x620
> do_exit+0x6f6/0xaf0
> do_group_exit+0x42/0xb0
> get_signal+0x13b/0xbb0
> do_signal+0x2b/0x670
> ? __audit_syscall_exit+0x24d/0x2b0
> ? rcu_read_lock_sched_held+0x4d/0x60
> ? kfree+0x24c/0x2b0
> do_syscall_64+0x176/0x640
> ? trace_hardirqs_off_thunk+0x1a/0x1c
> entry_SYSCALL_64_after_hwframe+0x49/0xb3
The immediate problem is as Christof noticed that "pid_alive(current) == false".
This happens because do_notify_parent is called from the last thread to exit
in a process after that thread has been reaped.
The bigger issue is that do_notify_parent can be called from any
process that manages to wait on a thread of a multi-threaded process
from wait_task_zombie. So any logic based upon current for
do_notify_parent is just nonsense, as current can be pretty much
anything.
So change do_notify_parent to call __send_signal directly.
Inspecting the code it appears this problem has existed since the pid
namespace support started handling this case in 2.6.30. This fix only
backports to 7a0cf09494 ("signal: Correct namespace fixups of si_pid and si_uid")
where the problem logic was moved out of __send_signal and into send_signal.
Cc: stable@vger.kernel.org
Fixes: 6588c1e3ff ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary")
Ref: 921cf9f630 ("signals: protect cinit from unblocked SIG_DFL signals")
Link: https://lore.kernel.org/lkml/20200419201336.GI22017@edge.cmeerw.net/
Reported-by: Christof Meerwald <cmeerw@cmeerw.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Since cd758a9b57 "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT
page fault handler", it's been possible in fairly rare circumstances to
load a non-present PTE in kvmppc_book3s_hv_page_fault() when running a
guest on a POWER8 host.
Because that case wasn't checked for, we could misinterpret the non-present
PTE as being a cache-inhibited PTE. That could mismatch with the
corresponding hash PTE, which would cause the function to fail with -EFAULT
a little further down. That would propagate up to the KVM_RUN ioctl()
generally causing the KVM userspace (usually qemu) to fall over.
This addresses the problem by catching that case and returning to the guest
instead.
For completeness, this fixes the radix page fault handler in the same
way. For radix this didn't cause any obvious misbehaviour, because we
ended up putting the non-present PTE into the guest's partition-scoped
page tables, leading immediately to another hypervisor data/instruction
storage interrupt, which would go through the page fault path again
and fix things up.
Fixes: cd758a9b57 "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler"
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1820402
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Frame pointers are completely broken by vmenter.S because it clobbers
RBP:
arch/x86/kvm/svm/vmenter.o: warning: objtool: __svm_vcpu_run()+0xe4: BP used as a scratch register
That's unavoidable, so just skip checking that file when frame pointers
are configured in.
On the other hand, ORC can handle that code just fine, so leave objtool
enabled in the !FRAME_POINTER case.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Message-Id: <01fae42917bacad18be8d2cbc771353da6603473.1587398610.git.jpoimboe@redhat.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Fixes: 199cd1d7b5 ("KVM: SVM: Split svm_vcpu_run inline assembly to separate file")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 7561252892 ("audit: always check the netlink payload length
in audit_receive_msg()") fixed a number of missing message length
checks, but forgot to check the length of userspace generated audit
records. The good news is that you need CAP_AUDIT_WRITE to submit
userspace audit records, which is generally only given to trusted
processes, so the impact should be limited.
Cc: stable@vger.kernel.org
Fixes: 7561252892 ("audit: always check the netlink payload length in audit_receive_msg()")
Reported-by: syzbot+49e69b4d71a420ceda3e@syzkaller.appspotmail.com
Signed-off-by: Paul Moore <paul@paul-moore.com>
Call disable_interrupts() if we have to revert to polling in order not to
unnecessarily reserve the IRQ for the life-cycle of the driver.
Cc: stable@vger.kernel.org # 4.5.x
Reported-by: Hans de Goede <hdegoede@redhat.com>
Fixes: e3837e74a0 ("tpm_tis: Refactor the interrupt setup")
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
tpm_ibmvtpm_send() can fail during PowerVM Live Partition Mobility resume
with an H_CLOSED return from ibmvtpm_send_crq(). The PAPR says, 'The
"partner partition suspended" transport event disables the associated CRQ
such that any H_SEND_CRQ hcall() to the associated CRQ returns H_Closed
until the CRQ has been explicitly enabled using the H_ENABLE_CRQ hcall.'
This patch adds a check in tpm_ibmvtpm_send() for an H_CLOSED return from
ibmvtpm_send_crq() and in that case calls tpm_ibmvtpm_resume() and
retries the ibmvtpm_send_crq() once.
Cc: stable@vger.kernel.org # 3.7.x
Fixes: 132f762947 ("drivers/char/tpm: Add new device driver to support IBM vTPM")
Reported-by: Linh Pham <phaml@us.ibm.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: George Wilson <gcwilson@linux.ibm.com>
Tested-by: Linh Pham <phaml@us.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
This patch fixes the following problem when the ibmvtpm driver
is built as a module:
ERROR: modpost: "tpm2_get_cc_attrs_tbl" [drivers/char/tpm/tpm_ibmvtpm.ko] undefined!
make[1]: *** [scripts/Makefile.modpost:94: __modpost] Error 1
make: *** [Makefile:1298: modules] Error 2
Fixes: 18b3670d79 ("tpm: ibmvtpm: Add support for TPM2")
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Boot regression fix for N950/N9
We need to tag RNG as disabled for N950/N9 as it blocked by the secure
mode. We have a similar change done for N900, but I missed adding it
for N950/N9 with the recent RNG changes.
* tag 'omap-for-v5.6/fixes-rc7-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: OMAP3: disable RNG on N950/N9
Link: https://lore.kernel.org/r/pull-1585340588-558327@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
vhost is currently broken on the some ARM configs.
The reason is that the ring element addresses are passed between
components with different alignments assumptions. Thus, if
guest selects a pointer and host then gets and dereferences
it, then alignment assumed by the host's compiler might be
greater than the actual alignment of the pointer.
compiler on the host from assuming pointer is aligned.
This actually triggers on ARM with -mabi=apcs-gnu - which is a
deprecated configuration. With this OABI, compiler assumes that
all structures are 4 byte aligned - which is stronger than
virtio guarantees for available and used rings, which are
merely 2 bytes. Thus a guest without -mabi=apcs-gnu running
on top of host with -mabi=apcs-gnu will be broken.
The correct fix is to force alignment of structures - however
that is an intrusive fix that's best deferred until the next release.
We didn't previously support such ancient systems at all - this surfaced
after vdpa support prompted removing dependency of vhost on
VIRTULIZATION. So for now, let's just add something along the lines of
depends on !ARM || AEABI
to the virtio Kconfig declaration, and add a comment that it has to do
with struct member alignment.
Note: we can't make VHOST and VHOST_RING themselves have
a dependency since these are selected. Add a new symbol for that.
We should be able to drop this dependency down the road.
Fixes: 20c384f1ea ("vhost: refine vhost and vringh kconfig")
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Suggested-by: Richard Earnshaw <Richard.Earnshaw@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>