Commit Graph

966066 Commits

Author SHA1 Message Date
Yanan Wang
5c646b7e1d KVM: arm64: Fix memory leak on stage2 update of a valid PTE
When installing a new leaf PTE onto an invalid ptep, we need to
get_page(ptep) to account for the new mapping.

However, simply updating a valid PTE shouldn't result in any
additional refcounting, as there is new mapping. This otherwise
results in a page being forever wasted.

Address this by fixing-up the refcount in stage2_map_walker_try_leaf()
if the PTE was already valid, balancing out the later get_page()
in stage2_map_walk_leaf().

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
[maz: update commit message, add comment in the code]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201201201034.116760-2-wangyanan55@huawei.com
2020-12-02 09:42:24 +00:00
Zenghui Yu
23bde34771 KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace
It was recently reported that if GICR_TYPER is accessed before the RD base
address is set, we'll suffer from the unset @rdreg dereferencing. Oops...

	gpa_t last_rdist_typer = rdreg->base + GICR_TYPER +
			(rdreg->free_index - 1) * KVM_VGIC_V3_REDIST_SIZE;

It's "expected" that users will access registers in the redistributor if
the RD has been properly configured (e.g., the RD base address is set). But
it hasn't yet been covered by the existing documentation.

Per discussion on the list [1], the reporting of the GICR_TYPER.Last bit
for userspace never actually worked. And it's difficult for us to emulate
it correctly given that userspace has the flexibility to access it any
time. Let's just drop the reporting of the Last bit for userspace for now
(userspace should have full knowledge about it anyway) and it at least
prevents kernel from panic ;-)

[1] https://lore.kernel.org/kvmarm/c20865a267e44d1e2c0d52ce4e012263@kernel.org/

Fixes: ba7b3f1275 ("KVM: arm/arm64: Revisit Redistributor TYPER last bit computation")
Reported-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20201117151629.1738-1-yuzenghui@huawei.com
Cc: stable@vger.kernel.org
2020-11-17 18:51:09 +00:00
Jamie Iles
7bab16a607 KVM: arm64: Correctly align nVHE percpu data
The nVHE percpu data is partially linked but the nVHE linker script did
not align the percpu section.  The PERCPU_INPUT macro would then align
the data to a page boundary:

  #define PERCPU_INPUT(cacheline)					\
  	__per_cpu_start = .;						\
  	*(.data..percpu..first)						\
  	. = ALIGN(PAGE_SIZE);						\
  	*(.data..percpu..page_aligned)					\
  	. = ALIGN(cacheline);						\
  	*(.data..percpu..read_mostly)					\
  	. = ALIGN(cacheline);						\
  	*(.data..percpu)						\
  	*(.data..percpu..shared_aligned)				\
  	PERCPU_DECRYPTED_SECTION					\
  	__per_cpu_end = .;

but then when the final vmlinux linking happens the hypervisor percpu
data is included after page alignment and so the offsets potentially
don't match.  On my build I saw that the .hyp.data..percpu section was
at address 0x20 and then the percpu data would begin at 0x1000 (because
of the page alignment in PERCPU_INPUT), but when linked into vmlinux,
everything would be shifted down by 0x20 bytes.

This manifests as one of the CPUs getting lost when running
kvm-unit-tests or starting any VM and subsequent soft lockup on a Cortex
A72 device.

Fixes: 30c953911c ("kvm: arm64: Set up hyp percpu data for nVHE")
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Brazdil <dbrazdil@google.com>
Cc: David Brazdil <dbrazdil@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201113150406.14314-1-jamie@nuviainc.com
2020-11-16 09:30:42 +00:00
Marc Zyngier
ed4ffaf49b KVM: arm64: Handle SCXTNUM_ELx traps
As the kernel never sets HCR_EL2.EnSCXT, accesses to SCXTNUM_ELx
will trap to EL2. Let's handle that as gracefully as possible
by injecting an UNDEF exception into the guest. This is consistent
with the guest's view of ID_AA64PFR0_EL1.CSV2 being at most 1.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201110141308.451654-4-maz@kernel.org
2020-11-12 21:22:46 +00:00
Marc Zyngier
338b17933a KVM: arm64: Unify trap handlers injecting an UNDEF
A large number of system register trap handlers only inject an
UNDEF exeption, and yet each class of sysreg seems to provide its
own, identical function.

Let's unify them all, saving us introducing yet another one later.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201110141308.451654-3-maz@kernel.org
2020-11-12 21:22:45 +00:00
Marc Zyngier
23711a5e66 KVM: arm64: Allow setting of ID_AA64PFR0_EL1.CSV2 from userspace
We now expose ID_AA64PFR0_EL1.CSV2=1 to guests running on hosts
that are immune to Spectre-v2, but that don't have this field set,
most likely because they predate the specification.

However, this prevents the migration of guests that have started on
a host the doesn't fake this CSV2 setting to one that does, as KVM
rejects the write to ID_AA64PFR0_EL2 on the grounds that it isn't
what is already there.

In order to fix this, allow userspace to set this field as long as
this doesn't result in a promising more than what is already there
(setting CSV2 to 0 is acceptable, but setting it to 1 when it is
already set to 0 isn't).

Fixes: e1026237f9 ("KVM: arm64: Set CSV2 for guests on hardware unaffected by Spectre-v2")
Reported-by: Peng Liang <liangpeng10@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201110141308.451654-2-maz@kernel.org
2020-11-12 21:22:22 +00:00
Marc Zyngier
4f6b838c37 Merge tag 'v5.10-rc1' into kvmarm-master/next
Linux 5.10-rc1

Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-11-12 21:20:43 +00:00
Andrew Jones
c512298eed KVM: arm64: Remove AA64ZFR0_EL1 accessors
The AA64ZFR0_EL1 accessors are just the general accessors with
its visibility function open-coded. It also skips the if-else
chain in read_id_reg, but there's no reason not to go there.
Indeed consolidating ID register accessors and removing lines
of code make it worthwhile.

Remove the AA64ZFR0_EL1 accessors, replacing them with the
general accessors for sanitized ID registers.

No functional change intended.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201105091022.15373-5-drjones@redhat.com
2020-11-06 16:00:29 +00:00
Andrew Jones
912dee5726 KVM: arm64: Check RAZ visibility in ID register accessors
The instruction encodings of ID registers are preallocated. Until an
encoding is assigned a purpose the register is RAZ. KVM's general ID
register accessor functions already support both paths, RAZ or not.
If for each ID register we can determine if it's RAZ or not, then all
ID registers can build on the general functions. The register visibility
function allows us to check whether a register should be completely
hidden or not, extending it to also report when the register should
be RAZ or not allows us to use it for ID registers as well.

Check for RAZ visibility in the ID register accessor functions,
allowing the RAZ case to be handled in a generic way for all system
registers.

The new REG_RAZ flag will be used in a later patch. This patch has
no intended functional change.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201105091022.15373-4-drjones@redhat.com
2020-11-06 16:00:29 +00:00
Andrew Jones
01fe5ace92 KVM: arm64: Consolidate REG_HIDDEN_GUEST/USER
REG_HIDDEN_GUEST and REG_HIDDEN_USER are always used together.
Consolidate them into a single REG_HIDDEN flag. We can always
add another flag later if some register needs to expose itself
differently to the guest than it does to userspace.

No functional change intended.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201105091022.15373-3-drjones@redhat.com
2020-11-06 16:00:29 +00:00
Andrew Jones
f81cb2c3ad KVM: arm64: Don't hide ID registers from userspace
ID registers are RAZ until they've been allocated a purpose, but
that doesn't mean they should be removed from the KVM_GET_REG_LIST
list. So far we only have one register, SYS_ID_AA64ZFR0_EL1, that
is hidden from userspace when its function, SVE, is not present.

Expose SYS_ID_AA64ZFR0_EL1 to userspace as RAZ when SVE is not
implemented. Removing the userspace visibility checks is enough
to reexpose it, as it will already return zero to userspace when
SVE is not present. The register already behaves as RAZ for the
guest when SVE is not present.

Fixes: 73433762fc ("KVM: arm64/sve: System register context switch and access support")
Reported-by: 张东旭 <xu910121@sina.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org#v5.2+
Link: https://lore.kernel.org/r/20201105091022.15373-2-drjones@redhat.com
2020-11-06 16:00:29 +00:00
Gavin Shan
faf000397e KVM: arm64: Fix build error in user_mem_abort()
The PUD and PMD are folded into PGD when the following options are
enabled. In that case, PUD_SHIFT is equal to PMD_SHIFT and we fail
to build with the indicated errors:

   CONFIG_ARM64_VA_BITS_42=y
   CONFIG_ARM64_PAGE_SHIFT=16
   CONFIG_PGTABLE_LEVELS=3

   arch/arm64/kvm/mmu.c: In function ‘user_mem_abort’:
   arch/arm64/kvm/mmu.c:798:2: error: duplicate case value
     case PMD_SHIFT:
     ^~~~
   arch/arm64/kvm/mmu.c:791:2: note: previously used here
     case PUD_SHIFT:
     ^~~~

This fixes the issue by skipping the check on PUD huge page when PUD
and PMD are folded into PGD.

Fixes: 2f40c46021 ("KVM: arm64: Use fallback mapping sizes for contiguous huge page sizes")
Reported-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201103003009.32955-1-gshan@redhat.com
2020-11-06 16:00:29 +00:00
Qais Yousef
22f553842b KVM: arm64: Handle Asymmetric AArch32 systems
On a system without uniform support for AArch32 at EL0, it is possible
for the guest to force run AArch32 at EL0 and potentially cause an
illegal exception if running on a core without AArch32. Add an extra
check so that if we catch the guest doing that, then we prevent it from
running again by resetting vcpu->arch.target and return
ARM_EXCEPTION_IL.

We try to catch this misbehaviour as early as possible and not rely on
an illegal exception occuring to signal the problem. Attempting to run a
32bit app in the guest will produce an error from QEMU if the guest
exits while running in AArch32 EL0.

Tested on Juno by instrumenting the host to fake asym aarch32 and
instrumenting KVM to make the asymmetry visible to the guest.

[will: Incorporated feedback from Marc]

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201021104611.2744565-2-qais.yousef@arm.com
Link: https://lore.kernel.org/r/20201027215118.27003-2-will@kernel.org
2020-10-30 16:06:22 +00:00
Mark Rutland
d86de40dec arm64: cpufeature: upgrade hyp caps to final
We finalize caps before initializing kvm hyp code, and any use of
cpus_have_const_cap() in kvm hyp code generates redundant and
potentially unsound code to read the cpu_hwcaps array.

A number of helper functions used in both hyp context and regular kernel
context use cpus_have_const_cap(), as some regular kernel code runs
before the capabilities are finalized. It's tedious and error-prone to
write separate copies of these for hyp and non-hyp code.

So that we can avoid the redundant code, let's automatically upgrade
cpus_have_const_cap() to cpus_have_final_cap() when used in hyp context.
With this change, there's never a reason to access to cpu_hwcaps array
from hyp code, and we don't need to create an NVHE alias for this.

This should have no effect on non-hyp code.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: David Brazdil <dbrazdil@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201026134931.28246-4-mark.rutland@arm.com
2020-10-30 08:53:10 +00:00
Mark Rutland
dfc4e3f089 arm64: cpufeature: reorder cpus_have_{const, final}_cap()
In a subsequent patch we'll modify cpus_have_const_cap() to call
cpus_have_final_cap(), and hence we need to define cpus_have_final_cap()
first.

To make subsequent changes easier to follow, this patch reorders the two
without making any other changes.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: David Brazdil <dbrazdil@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201026134931.28246-3-mark.rutland@arm.com
2020-10-30 08:53:10 +00:00
Mark Rutland
e9a33caec9 KVM: arm64: Factor out is_{vhe,nvhe}_hyp_code()
Currently has_vhe() detects whether it is being compiled for VHE/NVHE
hyp code based on preprocessor definitions, and uses this knowledge to
avoid redundant runtime checks.

There are other cases where we'd like to use this knowledge, so let's
factor the preprocessor checks out into separate helpers.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: David Brazdil <dbrazdil@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201026134931.28246-2-mark.rutland@arm.com
2020-10-30 08:52:51 +00:00
Santosh Shukla
91a2c34b7d KVM: arm64: Force PTE mapping on fault resulting in a device mapping
VFIO allows a device driver to resolve a fault by mapping a MMIO
range. This can be subsequently result in user_mem_abort() to
try and compute a huge mapping based on the MMIO pfn, which is
a sure recipe for things to go wrong.

Instead, force a PTE mapping when the pfn faulted in has a device
mapping.

Fixes: 6d674e28f6 ("KVM: arm/arm64: Properly handle faulting of device mappings")
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Santosh Shukla <sashukla@nvidia.com>
[maz: rewritten commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1603711447-11998-2-git-send-email-sashukla@nvidia.com
2020-10-29 20:41:04 +00:00
Gavin Shan
2f40c46021 KVM: arm64: Use fallback mapping sizes for contiguous huge page sizes
Although huge pages can be created out of multiple contiguous PMDs
or PTEs, the corresponding sizes are not supported at Stage-2 yet.

Instead of failing the mapping, fall back to the nearer supported
mapping size (CONT_PMD to PMD and CONT_PTE to PTE respectively).

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
[maz: rewritten commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201025230626.18501-1-gshan@redhat.com
2020-10-29 20:39:46 +00:00
Will Deacon
e2fc6a9f68 KVM: arm64: Fix masks in stage2_pte_cacheable()
stage2_pte_cacheable() tries to figure out whether the mapping installed
in its 'pte' parameter is cacheable or not. Unfortunately, it fails
miserably because it extracts the memory attributes from the entry using
FIELD_GET(), which returns the attributes shifted down to bit 0, but then
compares this with the unshifted value generated by the PAGE_S2_MEMATTR()
macro.

A direct consequence of this bug is that cache maintenance is silently
skipped, which in turn causes 32-bit guests to crash early on when their
set/way maintenance is trapped but not emulated correctly.

Fix the broken masks by avoiding the use of FIELD_GET() altogether.

Fixes: 6d9d2115c4 ("KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table")
Reported-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20201029144716.30476-1-will@kernel.org
2020-10-29 19:49:03 +00:00
Marc Zyngier
4a1c2c7f63 KVM: arm64: Fix AArch32 handling of DBGD{CCINT,SCRext} and DBGVCR
The DBGD{CCINT,SCRext} and DBGVCR register entries in the cp14 array
are missing their target register, resulting in all accesses being
targetted at the guard sysreg (indexed by __INVALID_SYSREG__).

Point the emulation code at the actual register entries.

Fixes: bdfb4b389c ("arm64: KVM: add trap handlers for AArch32 debug registers")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201029172409.2768336-1-maz@kernel.org
2020-10-29 19:49:03 +00:00
Will Deacon
7efe8ef274 KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT
For consistency with the rest of the stage-2 page-table page allocations
(performing using a kvm_mmu_memory_cache), ensure that __GFP_ACCOUNT is
included in the GFP flags for the PGD pages.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20201026144423.24683-1-will@kernel.org
2020-10-29 19:49:03 +00:00
Marc Zyngier
d2782505fb KVM: arm64: Drop useless PAN setting on host EL1 to EL2 transition
Setting PSTATE.PAN when entering EL2 on nVHE doesn't make much
sense as this bit only means something for translation regimes
that include EL0. This obviously isn't the case in the nVHE case,
so let's drop this setting.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Link: https://lore.kernel.org/r/20201026095116.72051-4-maz@kernel.org
2020-10-29 19:49:03 +00:00
Marc Zyngier
b6d6db4de8 KVM: arm64: Remove leftover kern_hyp_va() in nVHE TLB invalidation
The new calling convention says that pointers coming from the SMCCC
interface are turned into their HYP version in the host HVC handler.
However, there is still a stray kern_hyp_va() in the TLB invalidation
code, which could result in a corrupted pointer.

Drop the spurious conversion.

Fixes: a071261d93 ("KVM: arm64: nVHE: Fix pointers during SMCCC convertion")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201026095116.72051-3-maz@kernel.org
2020-10-29 19:49:03 +00:00
Marc Zyngier
28e81c6270 KVM: arm64: Don't corrupt tpidr_el2 on failed HVC call
The hyp-init code starts by stashing a register in TPIDR_EL2
in in order to free a register. This happens no matter if the
HVC call is legal or not.

Although nothing wrong seems to come out of it, it feels odd
to alter the EL2 state for something that eventually returns
an error.

Instead, use the fact that we know exactly which bits of the
__kvm_hyp_init call are non-zero to perform the check with
a series of EOR/ROR instructions, combined with a build-time
check that the value is the one we expect.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201026095116.72051-2-maz@kernel.org
2020-10-29 19:49:02 +00:00
Linus Torvalds
3650b228f8 Linux 5.10-rc1 v5.10-rc1 2020-10-25 15:14:11 -07:00
Joe Perches
33def8498f treewide: Convert macro and uses of __section(foo) to __section("foo")
Use a more generic form for __section that requires quotes to avoid
complications with clang and gcc differences.

Remove the quote operator # from compiler_attributes.h __section macro.

Convert all unquoted __section(foo) uses to quoted __section("foo").
Also convert __attribute__((section("foo"))) uses to __section("foo")
even if the __attribute__ has multiple list entry forms.

Conversion done using the script at:

    https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-25 14:51:49 -07:00
Rasmus Villemoes
986b9eacb2 kernel/sys.c: fix prototype of prctl_get_tid_address()
tid_addr is not a "pointer to (pointer to int in userspace)"; it is in
fact a "pointer to (pointer to int in userspace) in userspace".  So
sparse rightfully complains about passing a kernel pointer to
put_user().

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-25 11:44:16 -07:00
Eric Biggers
23224e4500 mm: remove kzfree() compatibility definition
Commit 453431a549 ("mm, treewide: rename kzfree() to
kfree_sensitive()") renamed kzfree() to kfree_sensitive(),
but it left a compatibility definition of kzfree() to avoid
being too disruptive.

Since then a few more instances of kzfree() have slipped in.

Just get rid of them and remove the compatibility definition
once and for all.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-25 11:39:02 -07:00
Joe Perches
0f7f635b06 checkpatch: enable GIT_DIR environment use to set git repository location
If set, use the environment variable GIT_DIR to change the default .git
location of the kernel git tree.

If GIT_DIR is unset, keep using the current ".git" default.

Link: https://lkml.kernel.org/r/c5e23b45562373d632fccb8bc04e563abba4dd1d.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-25 11:36:33 -07:00
Linus Torvalds
672f887126 Merge tag 'timers-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "A time namespace fix and a matching selftest. The futex absolute
  timeouts which are based on CLOCK_MONOTONIC require time namespace
  corrected. This was missed in the original time namesapce support"

* tag 'timers-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests/timens: Add a test for futex()
  futex: Adjust absolute futex timeouts with per time namespace offset
2020-10-25 11:28:49 -07:00
Linus Torvalds
87702a337f Merge tag 'sched-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
 "Two scheduler fixes:

   - A trivial build fix for sched_feat() to compile correctly with
     CONFIG_JUMP_LABEL=n

   - Replace a zero lenght array with a flexible array"

* tag 'sched-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/features: Fix !CONFIG_JUMP_LABEL case
  sched: Replace zero-length array with flexible-array
2020-10-25 11:25:16 -07:00
Linus Torvalds
a3d1b31213 Merge tag 'perf-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Thomas Gleixner:
 "A single fix to compute the field offset of the SNOOPX bit in the data
  source bitmask of perf events correctly"

* tag 'perf-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: correct SNOOPX field offset
2020-10-25 11:22:59 -07:00
Linus Torvalds
1c84550f47 Merge tag 'locking-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Thomas Gleixner:
 "Just a trivial fix for kernel-doc warnings"

* tag 'locking-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/seqlocks: Fix kernel-doc warnings
2020-10-25 11:14:54 -07:00
Linus Torvalds
e5acf0ed8a Merge tag 'ntb-5.10' of git://github.com/jonmason/ntb
Pull NTB fixes from Jon Mason.

* tag 'ntb-5.10' of git://github.com/jonmason/ntb:
  NTB: Use struct_size() helper in devm_kzalloc()
  ntb: intel: Fix memleak in intel_ntb_pci_probe
  NTB: hw: amd: fix an issue about leak system resources
2020-10-25 11:12:31 -07:00
Linus Torvalds
0746c4a9f3 Merge branch 'i2c/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
 "Regression fix for rc1 and stable kernels as well"

* 'i2c/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: core: Restore acpi_walk_dep_device_list() getting called after registering the ACPI i2c devs
2020-10-25 11:10:23 -07:00
Linus Torvalds
c10037f832 Merge tag '5.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull more cifs updates from Steve French:
 "Add support for stat of various special file types (WSL reparse points
  for char, block, fifo)"

* tag '5.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal module version number
  smb3: add some missing definitions from MS-FSCC
  smb3: remove two unused variables
  smb3: add support for stat of WSL reparse points for special file types
2020-10-25 11:05:04 -07:00
Linus Torvalds
f9c25d9864 Merge branch 'parisc-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull more parisc updates from Helge Deller:

 - During this merge window O_NONBLOCK was changed to become 000200000,
   but we missed that the syscalls timerfd_create(), signalfd4(),
   eventfd2(), pipe2(), inotify_init1() and userfaultfd() do a strict
   bit-wise check of the flags parameter.

   To provide backward compatibility with existing userspace we
   introduce parisc specific wrappers for those syscalls which filter
   out the old O_NONBLOCK value and replaces it with the new one.

 - Prevent HIL bus driver to get stuck when keyboard or mouse isn't
   attached

 - Improve error return codes when setting rtc time

 - Minor documentation fix in pata_ns87415.c

* 'parisc-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  ata: pata_ns87415.c: Document support on parisc with superio chip
  parisc: Add wrapper syscalls to fix O_NONBLOCK flag usage
  hil/parisc: Disable HIL driver when it gets stuck
  parisc: Improve error return codes when setting rtc time
2020-10-25 10:59:34 -07:00
Linus Torvalds
bd6aabc7ca Merge tag 'for-linus-5.10b-rc1c-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull more xen updates from Juergen Gross:

 - a series for the Xen pv block drivers adding module parameters for
   better control of resource usge

 - a cleanup series for the Xen event driver

* tag 'for-linus-5.10b-rc1c-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  Documentation: add xen.fifo_events kernel parameter description
  xen/events: unmask a fifo event channel only if it was masked
  xen/events: only register debug interrupt for 2-level events
  xen/events: make struct irq_info private to events_base.c
  xen: remove no longer used functions
  xen-blkfront: Apply changed parameter name to the document
  xen-blkfront: add a parameter for disabling of persistent grants
  xen-blkback: add a parameter for disabling of persistent grants
2020-10-25 10:55:35 -07:00
Linus Torvalds
81ecf91eab Merge tag 'safesetid-5.10' of git://github.com/micah-morton/linux
Pull SafeSetID updates from Micah Morton:
 "The changes are mostly contained to within the SafeSetID LSM, with the
  exception of a few 1-line changes to change some ns_capable() calls to
  ns_capable_setid() -- causing a flag (CAP_OPT_INSETID) to be set that
  is examined by SafeSetID code and nothing else in the kernel.

  The changes to SafeSetID internally allow for setting up GID
  transition security policies, as already existed for UIDs"

* tag 'safesetid-5.10' of git://github.com/micah-morton/linux:
  LSM: SafeSetID: Fix warnings reported by test bot
  LSM: SafeSetID: Add GID security policy handling
  LSM: Signal to SafeSetID when setting group IDs
2020-10-25 10:45:26 -07:00
Linus Torvalds
91f28da8c9 Merge tag '20201024-v4-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wtarreau/prandom
Pull random32 updates from Willy Tarreau:
 "Make prandom_u32() less predictable.

  This is the cleanup of the latest series of prandom_u32
  experimentations consisting in using SipHash instead of Tausworthe to
  produce the randoms used by the network stack.

  The changes to the files were kept minimal, and the controversial
  commit that used to take noise from the fast_pool (f227e3ec3b) was
  reverted. Instead, a dedicated "net_rand_noise" per_cpu variable is
  fed from various sources of activities (networking, scheduling) to
  perturb the SipHash state using fast, non-trivially predictable data,
  instead of keeping it fully deterministic. The goal is essentially to
  make any occasional memory leakage or brute-force attempt useless.

  The resulting code was verified to be very slightly faster on x86_64
  than what is was with the controversial commit above, though this
  remains barely above measurement noise. It was also tested on i386 and
  arm, and build- tested only on arm64"

Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/

* tag '20201024-v4-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wtarreau/prandom:
  random32: add a selftest for the prandom32 code
  random32: add noise from network and scheduling activity
  random32: make prandom_u32() output unpredictable
2020-10-25 10:40:08 -07:00
Hans de Goede
8058d69905 i2c: core: Restore acpi_walk_dep_device_list() getting called after registering the ACPI i2c devs
Commit 21653a4181 ("i2c: core: Call i2c_acpi_install_space_handler()
before i2c_acpi_register_devices()")'s intention was to only move the
acpi_install_address_space_handler() call to the point before where
the ACPI declared i2c-children of the adapter where instantiated by
i2c_acpi_register_devices().

But i2c_acpi_install_space_handler() had a call to
acpi_walk_dep_device_list() hidden (that is I missed it) at the end
of it, so as an unwanted side-effect now acpi_walk_dep_device_list()
was also being called before i2c_acpi_register_devices().

Move the acpi_walk_dep_device_list() call to the end of
i2c_acpi_register_devices(), so that it is once again called *after*
the i2c_client-s hanging of the adapter have been created.

This fixes the Microsoft Surface Go 2 hanging at boot.

Fixes: 21653a4181 ("i2c: core: Call i2c_acpi_install_space_handler() before i2c_acpi_register_devices()")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209627
Reported-by: Rainer Finke <rainer@finke.cc>
Reported-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Suggested-by: Maximilian Luz <luzmaximilian@gmail.com>
Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2020-10-25 13:33:54 +01:00
Linus Torvalds
d769139081 Merge tag 'block-5.10-2020-10-24' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:

 - NVMe pull request from Christoph
     - rdma error handling fixes (Chao Leng)
     - fc error handling and reconnect fixes (James Smart)
     - fix the qid displace when tracing ioctl command (Keith Busch)
     - don't use BLK_MQ_REQ_NOWAIT for passthru (Chaitanya Kulkarni)
     - fix MTDT for passthru (Logan Gunthorpe)
     - blacklist Write Same on more devices (Kai-Heng Feng)
     - fix an uninitialized work struct (zhenwei pi)"

 - lightnvm out-of-bounds fix (Colin)

 - SG allocation leak fix (Doug)

 - rnbd fixes (Gioh, Guoqing, Jack)

 - zone error translation fixes (Keith)

 - kerneldoc markup fix (Mauro)

 - zram lockdep fix (Peter)

 - Kill unused io_context members (Yufen)

 - NUMA memory allocation cleanup (Xianting)

 - NBD config wakeup fix (Xiubo)

* tag 'block-5.10-2020-10-24' of git://git.kernel.dk/linux-block: (27 commits)
  block: blk-mq: fix a kernel-doc markup
  nvme-fc: shorten reconnect delay if possible for FC
  nvme-fc: wait for queues to freeze before calling update_hr_hw_queues
  nvme-fc: fix error loop in create_hw_io_queues
  nvme-fc: fix io timeout to abort I/O
  null_blk: use zone status for max active/open
  nvmet: don't use BLK_MQ_REQ_NOWAIT for passthru
  nvmet: cleanup nvmet_passthru_map_sg()
  nvmet: limit passthru MTDS by BIO_MAX_PAGES
  nvmet: fix uninitialized work for zero kato
  nvme-pci: disable Write Zeroes on Sandisk Skyhawk
  nvme: use queuedata for nvme_req_qid
  nvme-rdma: fix crash due to incorrect cqe
  nvme-rdma: fix crash when connect rejected
  block: remove unused members for io_context
  blk-mq: remove the calling of local_memory_node()
  zram: Fix __zram_bvec_{read,write}() locking order
  skd_main: remove unused including <linux/version.h>
  sgl_alloc_order: fix memory leak
  lightnvm: fix out-of-bounds write to array devices->info[]
  ...
2020-10-24 12:46:42 -07:00
Linus Torvalds
af0041875c Merge tag 'io_uring-5.10-2020-10-24' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:

 - fsize was missed in previous unification of work flags

 - Few fixes cleaning up the flags unification creds cases (Pavel)

 - Fix NUMA affinities for completely unplugged/replugged node for io-wq

 - Two fallout fixes from the set_fs changes. One local to io_uring, one
   for the splice entry point that io_uring uses.

 - Linked timeout fixes (Pavel)

 - Removal of ->flush() ->files work-around that we don't need anymore
   with referenced files (Pavel)

 - Various cleanups (Pavel)

* tag 'io_uring-5.10-2020-10-24' of git://git.kernel.dk/linux-block:
  splice: change exported internal do_splice() helper to take kernel offset
  io_uring: make loop_rw_iter() use original user supplied pointers
  io_uring: remove req cancel in ->flush()
  io-wq: re-set NUMA node affinities if CPUs come online
  io_uring: don't reuse linked_timeout
  io_uring: unify fsize with def->work_flags
  io_uring: fix racy REQ_F_LINK_TIMEOUT clearing
  io_uring: do poll's hash_node init in common code
  io_uring: inline io_poll_task_handler()
  io_uring: remove extra ->file check in poll prep
  io_uring: make cached_cq_overflow non atomic_t
  io_uring: inline io_fail_links()
  io_uring: kill ref get/drop in personality init
  io_uring: flags-based creds init in queue
2020-10-24 12:40:18 -07:00
Linus Torvalds
cb6b2897b9 Merge tag 'libata-5.10-2020-10-24' of git://git.kernel.dk/linux-block
Pull libata fixes from Jens Axboe:
 "Two minor libata fixes:

   - Fix a DMA boundary mask regression for sata_rcar (Geert)

   - kerneldoc markup fix (Mauro)"

* tag 'libata-5.10-2020-10-24' of git://git.kernel.dk/linux-block:
  ata: fix some kernel-doc markups
  ata: sata_rcar: Fix DMA boundary mask
2020-10-24 12:36:24 -07:00
Linus Torvalds
0eac1102e9 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "Assorted stuff all over the place (the largest group here is
  Christoph's stat cleanups)"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: remove KSTAT_QUERY_FLAGS
  fs: remove vfs_stat_set_lookup_flags
  fs: move vfs_fstatat out of line
  fs: implement vfs_stat and vfs_lstat in terms of vfs_fstatat
  fs: remove vfs_statx_fd
  fs: omfs: use kmemdup() rather than kmalloc+memcpy
  [PATCH] reduce boilerplate in fsid handling
  fs: Remove duplicated flag O_NDELAY occurring twice in VALID_OPEN_FLAGS
  selftests: mount: add nosymfollow tests
  Add a "nosymfollow" mount option.
2020-10-24 12:26:05 -07:00
Linus Torvalds
1b307ac870 Merge tag 'dma-mapping-5.10-1' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:

 - document the new dma_{alloc,free}_pages() API

 - two fixups for the dma-mapping.h split

* tag 'dma-mapping-5.10-1' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: document dma_{alloc,free}_pages
  dma-mapping: move more functions to dma-map-ops.h
  ARM/sa1111: add a missing include of dma-map-ops.h
2020-10-24 12:17:05 -07:00
Linus Torvalds
9bf8d8bcf3 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
 "Two fixes for this merge window, and an unrelated bugfix for a host
  hang"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: ioapic: break infinite recursion on lazy EOI
  KVM: vmx: rename pi_init to avoid conflict with paride
  KVM: x86/mmu: Avoid modulo operator on 64-bit value to fix i386 build
2020-10-24 12:09:22 -07:00
Linus Torvalds
c51ae12472 Merge tag 'x86_seves_fixes_for_v5.10_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV-ES fixes from Borislav Petkov:
 "Three fixes to SEV-ES to correct setting up the new early pagetable on
  5-level paging machines, to always map boot_params and the kernel
  cmdline, and disable stack protector for ../compressed/head{32,64}.c.
  (Arvind Sankar)"

* tag 'x86_seves_fixes_for_v5.10_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/64: Explicitly map boot_params and command line
  x86/head/64: Disable stack protection for head$(BITS).o
  x86/boot/64: Initialize 5-level paging variables earlier
2020-10-24 11:49:32 -07:00
Willy Tarreau
c6e169bc14 random32: add a selftest for the prandom32 code
Given that this code is new, let's add a selftest for it as well.
It doesn't rely on fixed sets, instead it picks 1024 numbers and
verifies that they're not more correlated than desired.

Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
Cc: George Spelvin <lkml@sdf.org>
Cc: Amit Klein <aksecurity@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2020-10-24 20:21:57 +02:00
Willy Tarreau
3744741ada random32: add noise from network and scheduling activity
With the removal of the interrupt perturbations in previous random32
change (random32: make prandom_u32() output unpredictable), the PRNG
has become 100% deterministic again. While SipHash is expected to be
way more robust against brute force than the previous Tausworthe LFSR,
there's still the risk that whoever has even one temporary access to
the PRNG's internal state is able to predict all subsequent draws till
the next reseed (roughly every minute). This may happen through a side
channel attack or any data leak.

This patch restores the spirit of commit f227e3ec3b ("random32: update
the net random state on interrupt and activity") in that it will perturb
the internal PRNG's statee using externally collected noise, except that
it will not pick that noise from the random pool's bits nor upon
interrupt, but will rather combine a few elements along the Tx path
that are collectively hard to predict, such as dev, skb and txq
pointers, packet length and jiffies values. These ones are combined
using a single round of SipHash into a single long variable that is
mixed with the net_rand_state upon each invocation.

The operation was inlined because it produces very small and efficient
code, typically 3 xor, 2 add and 2 rol. The performance was measured
to be the same (even very slightly better) than before the switch to
SipHash; on a 6-core 12-thread Core i7-8700k equipped with a 40G NIC
(i40e), the connection rate dropped from 556k/s to 555k/s while the
SYN cookie rate grew from 5.38 Mpps to 5.45 Mpps.

Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
Cc: George Spelvin <lkml@sdf.org>
Cc: Amit Klein <aksecurity@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2020-10-24 20:21:57 +02:00