Commit Graph

1264555 Commits

Author SHA1 Message Date
Vitaly Chikunov
eb5739a1ef crypto: ecrdsa - Fix module auto-load on add_key
Add module alias with the algorithm cra_name similar to what we have for
RSA-related and other algorithms.

The kernel attempts to modprobe asymmetric algorithms using the names
"crypto-$cra_name" and "crypto-$cra_name-all." However, since these
aliases are currently missing, the modules are not loaded. For instance,
when using the `add_key` function, the hash algorithm is typically
loaded automatically, but the asymmetric algorithm is not.

Steps to test:

1. Cert is generated usings ima-evm-utils test suite with
   `gen-keys.sh`, example cert is provided below:

  $ base64 -d >test-gost2012_512-A.cer <<EOF
  MIIB/DCCAWagAwIBAgIUK8+whWevr3FFkSdU9GLDAM7ure8wDAYIKoUDBwEBAwMFADARMQ8wDQYD
  VQQDDAZDQSBLZXkwIBcNMjIwMjAxMjIwOTQxWhgPMjA4MjEyMDUyMjA5NDFaMBExDzANBgNVBAMM
  BkNBIEtleTCBoDAXBggqhQMHAQEBAjALBgkqhQMHAQIBAgEDgYQABIGALXNrTJGgeErBUOov3Cfo
  IrHF9fcj8UjzwGeKCkbCcINzVUbdPmCopeJRHDJEvQBX1CQUPtlwDv6ANjTTRoq5nCk9L5PPFP1H
  z73JIXHT0eRBDVoWy0cWDRz1mmQlCnN2HThMtEloaQI81nTlKZOcEYDtDpi5WODmjEeRNQJMdqCj
  UDBOMAwGA1UdEwQFMAMBAf8wHQYDVR0OBBYEFCwfOITMbE9VisW1i2TYeu1tAo5QMB8GA1UdIwQY
  MBaAFCwfOITMbE9VisW1i2TYeu1tAo5QMAwGCCqFAwcBAQMDBQADgYEAmBfJCMTdC0/NSjz4BBiQ
  qDIEjomO7FEHYlkX5NGulcF8FaJW2jeyyXXtbpnub1IQ8af1KFIpwoS2e93LaaofxpWlpQLlju6m
  KYLOcO4xK3Whwa2hBAz9YbpUSFjvxnkS2/jpH2MsOSXuUEeCruG/RkHHB3ACef9umG6HCNQuAPY=
  EOF

2. Optionally, trace module requests with: trace-cmd stream -e module &

3. Trigger add_key call for the cert:

  # keyctl padd asymmetric "" @u <test-gost2012_512-A.cer
  939910969
  # lsmod | head -3
  Module                  Size  Used by
  ecrdsa_generic         16384  0
  streebog_generic       28672  0

Repored-by: Paul Wolneykien <manowar@altlinux.org>
Cc: stable@vger.kernel.org
Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
Tested-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:49:38 +08:00
Li Zhijian
90d012fbbf hwrng: core - Convert sprintf/snprintf to sysfs_emit
Per filesystems/sysfs.rst, show() should only use sysfs_emit()
or sysfs_emit_at() when formatting the value to be returned to user space.

coccinelle complains that there are still a couple of functions that use
snprintf(). Convert them to sysfs_emit().

sprintf() will be converted as weel if they have.

Generally, this patch is generated by
make coccicheck M=<path/to/file> MODE=patch \
COCCI=scripts/coccinelle/api/device_attr_show.cocci

No functional change intended

CC: Olivia Mackall <olivia@selenic.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: linux-crypto@vger.kernel.org
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:49:38 +08:00
Luca Weiss
355577ef84 dt-bindings: crypto: ice: Document sc7280 inline crypto engine
Document the compatible used for the inline crypto engine found on
SC7280.

Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:49:38 +08:00
Eric Biggers
29ce50e078 crypto: remove CONFIG_CRYPTO_STATS
Remove support for the "Crypto usage statistics" feature
(CONFIG_CRYPTO_STATS).  This feature does not appear to have ever been
used, and it is harmful because it significantly reduces performance and
is a large maintenance burden.

Covering each of these points in detail:

1. Feature is not being used

Since these generic crypto statistics are only readable using netlink,
it's fairly straightforward to look for programs that use them.  I'm
unable to find any evidence that any such programs exist.  For example,
Debian Code Search returns no hits except the kernel header and kernel
code itself and translations of the kernel header:
https://codesearch.debian.net/search?q=CRYPTOCFGA_STAT&literal=1&perpkg=1

The patch series that added this feature in 2018
(https://lore.kernel.org/linux-crypto/1537351855-16618-1-git-send-email-clabbe@baylibre.com/)
said "The goal is to have an ifconfig for crypto device."  This doesn't
appear to have happened.

It's not clear that there is real demand for crypto statistics.  Just
because the kernel provides other types of statistics such as I/O and
networking statistics and some people find those useful does not mean
that crypto statistics are useful too.

Further evidence that programs are not using CONFIG_CRYPTO_STATS is that
it was able to be disabled in RHEL and Fedora as a bug fix
(https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2947).

Even further evidence comes from the fact that there are and have been
bugs in how the stats work, but they were never reported.  For example,
before Linux v6.7 hash stats were double-counted in most cases.

There has also never been any documentation for this feature, so it
might be hard to use even if someone wanted to.

2. CONFIG_CRYPTO_STATS significantly reduces performance

Enabling CONFIG_CRYPTO_STATS significantly reduces the performance of
the crypto API, even if no program ever retrieves the statistics.  This
primarily affects systems with a large number of CPUs.  For example,
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2039576 reported
that Lustre client encryption performance improved from 21.7GB/s to
48.2GB/s by disabling CONFIG_CRYPTO_STATS.

It can be argued that this means that CONFIG_CRYPTO_STATS should be
optimized with per-cpu counters similar to many of the networking
counters.  But no one has done this in 5+ years.  This is consistent
with the fact that the feature appears to be unused, so there seems to
be little interest in improving it as opposed to just disabling it.

It can be argued that because CONFIG_CRYPTO_STATS is off by default,
performance doesn't matter.  But Linux distros tend to error on the side
of enabling options.  The option is enabled in Ubuntu and Arch Linux,
and until recently was enabled in RHEL and Fedora (see above).  So, even
just having the option available is harmful to users.

3. CONFIG_CRYPTO_STATS is a large maintenance burden

There are over 1000 lines of code associated with CONFIG_CRYPTO_STATS,
spread among 32 files.  It significantly complicates much of the
implementation of the crypto API.  After the initial submission, many
fixes and refactorings have consumed effort of multiple people to keep
this feature "working".  We should be spending this effort elsewhere.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:49:38 +08:00
Gustavo A. R. Silva
1e6b251ce1 crypto: nx - Avoid -Wflex-array-member-not-at-end warning
-Wflex-array-member-not-at-end is coming in GCC-14, and we are getting
ready to enable it globally. So, we are deprecating flexible-array
members in the middle of another structure.

There is currently an object (`header`) in `struct nx842_crypto_ctx`
that contains a flexible structure (`struct nx842_crypto_header`):

struct nx842_crypto_ctx {
	...
        struct nx842_crypto_header header;
        struct nx842_crypto_header_group group[NX842_CRYPTO_GROUP_MAX];
	...
};

So, in order to avoid ending up with a flexible-array member in the
middle of another struct, we use the `struct_group_tagged()` helper to
separate the flexible array from the rest of the members in the flexible
structure:

struct nx842_crypto_header {
	struct_group_tagged(nx842_crypto_header_hdr, hdr,

		... the rest of the members

	);
        struct nx842_crypto_header_group group[];
} __packed;

With the change described above, we can now declare an object of the
type of the tagged struct, without embedding the flexible array in the
middle of another struct:

struct nx842_crypto_ctx {
	...
        struct nx842_crypto_header_hdr header;
        struct nx842_crypto_header_group group[NX842_CRYPTO_GROUP_MAX];
	...
 } __packed;

We also use `container_of()` whenever we need to retrieve a pointer to
the flexible structure, through which we can access the flexible
array if needed.

So, with these changes, fix the following warning:

In file included from drivers/crypto/nx/nx-842.c:55:
drivers/crypto/nx/nx-842.h:174:36: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
  174 |         struct nx842_crypto_header header;
      |                                    ^~~~~~

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:49:38 +08:00
Jia Jie Ho
7467147ef9 crypto: starfive - Use dma for aes requests
Convert AES module to use dma for data transfers to reduce cpu load and
compatible with future variants.

Signed-off-by: Jia Jie Ho <jiajie.ho@starfivetech.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:49:38 +08:00
Jia Jie Ho
a05c821e42 crypto: starfive - Skip unneeded key free
Skip unneeded kfree_sensitive if RSA module is using falback algo.

Signed-off-by: Jia Jie Ho <jiajie.ho@starfivetech.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:49:38 +08:00
Jia Jie Ho
b6e9eb69a1 crypto: starfive - Update hash dma usage
Current hash uses sw fallback for non-word aligned input scatterlists.
Add support for unaligned cases utilizing the data valid mask for dma.

Signed-off-by: Jia Jie Ho <jiajie.ho@starfivetech.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:49:38 +08:00
Jia Jie Ho
2ccf7a5d9c dt-bindings: crypto: starfive: Add jh8100 support
Add compatible string and additional interrupt for StarFive JH8100
crypto engine.

Signed-off-by: Jia Jie Ho <jiajie.ho@starfivetech.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:49:38 +08:00
Tom Zanussi
43698cd6c0 crypto: iaa - Change iaa statistics to atomic64_t
Change all the iaa statistics to use atomic64_t instead of the current
u64, to avoid potentially inconsistent counts.

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:49:38 +08:00
Tom Zanussi
c21fb22df6 crypto: iaa - Add global_stats file and remove individual stat files
Currently, the wq_stats output also includes the global stats, while
the individual global stats are also available as separate debugfs
files.  Since these are all read-only, there's really no reason to
have them as separate files, especially since we already display them
as global stats in the wq_stats.  It makes more sense to just add a
separate global_stats file to display those, and remove them from the
wq_stats, as well as removing the individual stats files.

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:49:38 +08:00
Tom Zanussi
956cb8a370 crypto: iaa - Remove comp/decomp delay statistics
As part of the simplification/cleanup of the iaa statistics, remove
the comp/decomp delay statistics.

They're actually not really useful and can be/are being more flexibly
generated using standard kernel tracing infrastructure.

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:49:38 +08:00
Tom Zanussi
19b0ed5ddc crypto: iaa - fix decomp_bytes_in stats
Decomp stats should use slen, not dlen.  Change both the global and
per-wq stats to use the correct value.

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:49:38 +08:00
Xin Zeng
f0bbfc391a crypto: qat - implement interface for live migration
Add logic to implement the interface for live migration defined in
qat/qat_mig_dev.h. This is specific for QAT GEN4 Virtual Functions
(VFs).

This introduces a migration data manager which is used to handle the
device state during migration. The manager ensures that the device state
is stored in a format that can be restored in the destination node.

The VF state is organized into a hierarchical structure that includes a
preamble, a general state section, a MISC bar section and an ETR bar
section. The latter contains the state of the 4 ring pairs contained on
a VF. Here is a graphical representation of the state:

    preamble | general state section | leaf state
             | MISC bar state section| leaf state
             | ETR bar state section | bank0 state section | leaf state
                                     | bank1 state section | leaf state
                                     | bank2 state section | leaf state
                                     | bank3 state section | leaf state

In addition to the implementation of the qat_migdev_ops interface and
the state manager framework, add a mutex in pfvf to avoid pf2vf messages
during migration.

Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:47:43 +08:00
Xin Zeng
0fce55e533 crypto: qat - add interface for live migration
Extend the driver with a new interface to be used for VF live migration.
This allows to create and destroy a qat_mig_dev object that contains
a set of methods to allow to save and restore the state of QAT VF.
This interface will be used by the qat-vfio-pci module.

Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:47:43 +08:00
Siming Wan
bbfdde7d19 crypto: qat - add bank save and restore flows
Add logic to save, restore, quiesce and drain a ring bank for QAT GEN4
devices.
This allows to save and restore the state of a Virtual Function (VF) and
will be used to implement VM live migration.

Signed-off-by: Siming Wan <siming.wan@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:47:43 +08:00
Siming Wan
3fa1057e35 crypto: qat - expand CSR operations for QAT GEN4 devices
Extend the CSR operations for QAT GEN4 devices to allow saving and
restoring the rings state.

The new operations will be used as a building block for implementing the
state save and restore of Virtual Functions necessary for VM live
migration.

This adds the following operations:
 - read ring status register
 - read ring underflow/overflow status register
 - read ring nearly empty status register
 - read ring nearly full status register
 - read ring full status register
 - read ring complete status register
 - read ring exception status register
 - read/write ring exception interrupt mask register
 - read ring configuration register
 - read ring base register
 - read/write ring interrupt enable register
 - read ring interrupt flag register
 - read/write ring interrupt source select register
 - read ring coalesced interrupt enable register
 - read ring coalesced interrupt control register
 - read ring flag and coalesced interrupt enable register
 - read ring service arbiter enable register
 - get ring coalesced interrupt control enable mask

Signed-off-by: Siming Wan <siming.wan@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:47:43 +08:00
Siming Wan
84058ffb91 crypto: qat - rename get_sla_arr_of_type()
The function get_sla_arr_of_type() returns a pointer to an SLA type
specific array.
Rename it and expose it as it will be used externally to this module.

This does not introduce any functional change.

Signed-off-by: Siming Wan <siming.wan@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:47:43 +08:00
Giovanni Cabiddu
680302d191 crypto: qat - relocate CSR access code
As the common hw_data files are growing and the adf_hw_csr_ops is going
to be extended with new operations, move all logic related to ring CSRs
to the newly created adf_gen[2|4]_hw_csr_data.[c|h] files.

This does not introduce any functional change.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:47:43 +08:00
Xin Zeng
867e801005 crypto: qat - move PFVF compat checker to a function
Move the code that implements VF version compatibility on the PF side to
a separate function so that it can be reused when doing VM live
migration.

This does not introduce any functional change.

Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:47:43 +08:00
Xin Zeng
1f8d6a163c crypto: qat - relocate and rename 4xxx PF2VM definitions
Move and rename ADF_4XXX_PF2VM_OFFSET and ADF_4XXX_VM2PF_OFFSET to
ADF_GEN4_PF2VM_OFFSET and ADF_GEN4_VM2PF_OFFSET respectively.
These definitions are moved from adf_gen4_pfvf.c to adf_gen4_hw_data.h
as they are specific to GEN4 and not just to qat_4xxx.

This change is made in anticipation of their use in live migration.

This does not introduce any functional change.

Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:47:43 +08:00
Giovanni Cabiddu
1894cb1de6 crypto: qat - adf_get_etr_base() helper
Add and use the new helper function adf_get_etr_base() which retrieves
the virtual address of the ring bar.

This will be used extensively when adding support for Live Migration.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:47:43 +08:00
Linus Torvalds
174fdc93a2 Merge tag 'v6.9-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes a regression that broke iwd as well as a divide by zero in
  iaa"

* tag 'v6.9-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: iaa - Fix nr_cpus < nr_iaa case
  Revert "crypto: pkcs7 - remove sha1 support"
2024-03-25 10:48:23 -07:00
Linus Torvalds
4cece76496 Linux 6.9-rc1 v6.9-rc1 2024-03-24 14:10:05 -07:00
Linus Torvalds
ab8de2dbfc Merge tag 'efi-fixes-for-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:

 - Fix logic that is supposed to prevent placement of the kernel image
   below LOAD_PHYSICAL_ADDR

 - Use the firmware stack in the EFI stub when running in mixed mode

 - Clear BSS only once when using mixed mode

 - Check efi.get_variable() function pointer for NULL before trying to
   call it

* tag 'efi-fixes-for-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: fix panic in kdump kernel
  x86/efistub: Don't clear BSS twice in mixed mode
  x86/efistub: Call mixed mode boot services on the firmware's stack
  efi/libstub: fix efi_random_alloc() to allocate memory at alloc_min or higher address
2024-03-24 13:54:06 -07:00
Linus Torvalds
5e74df2f8f Merge tag 'x86-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:

 - Ensure that the encryption mask at boot is properly propagated on
   5-level page tables, otherwise the PGD entry is incorrectly set to
   non-encrypted, which causes system crashes during boot.

 - Undo the deferred 5-level page table setup as it cannot work with
   memory encryption enabled.

 - Prevent inconsistent XFD state on CPU hotplug, where the MSR is reset
   to the default value but the cached variable is not, so subsequent
   comparisons might yield the wrong result and as a consequence the
   result prevents updating the MSR.

 - Register the local APIC address only once in the MPPARSE enumeration
   to prevent triggering the related WARN_ONs() in the APIC and topology
   code.

 - Handle the case where no APIC is found gracefully by registering a
   fake APIC in the topology code. That makes all related topology
   functions work correctly and does not affect the actual APIC driver
   code at all.

 - Don't evaluate logical IDs during early boot as the local APIC IDs
   are not yet enumerated and the invoked function returns an error
   code. Nothing requires the logical IDs before the final CPUID
   enumeration takes place, which happens after the enumeration.

 - Cure the fallout of the per CPU rework on UP which misplaced the
   copying of boot_cpu_data to per CPU data so that the final update to
   boot_cpu_data got lost which caused inconsistent state and boot
   crashes.

 - Use copy_from_kernel_nofault() in the kprobes setup as there is no
   guarantee that the address can be safely accessed.

 - Reorder struct members in struct saved_context to work around another
   kmemleak false positive

 - Remove the buggy code which tries to update the E820 kexec table for
   setup_data as that is never passed to the kexec kernel.

 - Update the resource control documentation to use the proper units.

 - Fix a Kconfig warning observed with tinyconfig

* tag 'x86-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/64: Move 5-level paging global variable assignments back
  x86/boot/64: Apply encryption mask to 5-level pagetable update
  x86/cpu: Add model number for another Intel Arrow Lake mobile processor
  x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD
  Documentation/x86: Document that resctrl bandwidth control units are MiB
  x86/mpparse: Register APIC address only once
  x86/topology: Handle the !APIC case gracefully
  x86/topology: Don't evaluate logical IDs during early boot
  x86/cpu: Ensure that CPU info updates are propagated on UP
  kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe address
  x86/pm: Work around false positive kmemleak report in msr_build_context()
  x86/kexec: Do not update E820 kexec table for setup_data
  x86/config: Fix warning for 'make ARCH=x86_64 tinyconfig'
2024-03-24 11:13:56 -07:00
Linus Torvalds
b136f68eb0 Merge tag 'sched-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler doc clarification from Thomas Gleixner:
 "A single update for the documentation of the base_slice_ns tunable to
  clarify that any value which is less than the tick slice has no effect
  because the scheduler tick is not guaranteed to happen within the set
  time slice"

* tag 'sched-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/doc: Update documentation for base_slice_ns and CONFIG_HZ relation
2024-03-24 11:11:05 -07:00
Linus Torvalds
864ad046c1 Merge tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
 "This has a set of swiotlb alignment fixes for sometimes very long
  standing bugs from Will. We've been discussion them for a while and
  they should be solid now"

* tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE
  iommu/dma: Force swiotlb_max_mapping_size on an untrusted device
  swiotlb: Fix alignment checks when both allocation and DMA masks are present
  swiotlb: Honour dma_alloc_coherent() alignment in swiotlb_alloc()
  swiotlb: Enforce page alignment in swiotlb_alloc()
  swiotlb: Fix double-allocation of slots due to broken alignment handling
2024-03-24 10:45:31 -07:00
Oleksandr Tymoshenko
62b71cd73d efi: fix panic in kdump kernel
Check if get_next_variable() is actually valid pointer before
calling it. In kdump kernel this method is set to NULL that causes
panic during the kexec-ed kernel boot.

Tested with QEMU and OVMF firmware.

Fixes: bad267f9e1 ("efi: verify that variable services are supported")
Signed-off-by: Oleksandr Tymoshenko <ovt@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-03-24 09:28:33 +01:00
Ard Biesheuvel
df7ecce842 x86/efistub: Don't clear BSS twice in mixed mode
Clearing BSS should only be done once, at the very beginning.
efi_pe_entry() is the entrypoint from the firmware, which may not clear
BSS and so it is done explicitly. However, efi_pe_entry() is also used
as an entrypoint by the mixed mode startup code, in which case BSS will
already have been cleared, and doing it again at this point will corrupt
global variables holding the firmware's GDT/IDT and segment selectors.

So make the memset() conditional on whether the EFI stub is running in
native mode.

Fixes: b3810c5a2c ("x86/efistub: Clear decompressor BSS in native EFI entrypoint")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-03-24 09:28:33 +01:00
Ard Biesheuvel
cefcd4fe2e x86/efistub: Call mixed mode boot services on the firmware's stack
Normally, the EFI stub calls into the EFI boot services using the stack
that was live when the stub was entered. According to the UEFI spec,
this stack needs to be at least 128k in size - this might seem large but
all asynchronous processing and event handling in EFI runs from the same
stack and so quite a lot of space may be used in practice.

In mixed mode, the situation is a bit different: the bootloader calls
the 32-bit EFI stub entry point, which calls the decompressor's 32-bit
entry point, where the boot stack is set up, using a fixed allocation
of 16k. This stack is still in use when the EFI stub is started in
64-bit mode, and so all calls back into the EFI firmware will be using
the decompressor's limited boot stack.

Due to the placement of the boot stack right after the boot heap, any
stack overruns have gone unnoticed. However, commit

  5c4feadb0011983b ("x86/decompressor: Move global symbol references to C code")

moved the definition of the boot heap into C code, and now the boot
stack is placed right at the base of BSS, where any overruns will
corrupt the end of the .data section.

While it would be possible to work around this by increasing the size of
the boot stack, doing so would affect all x86 systems, and mixed mode
systems are a tiny (and shrinking) fraction of the x86 installed base.

So instead, record the firmware stack pointer value when entering from
the 32-bit firmware, and switch to this stack every time a EFI boot
service call is made.

Cc: <stable@kernel.org> # v6.1+
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-03-24 09:28:32 +01:00
Tom Lendacky
9843231c97 x86/boot/64: Move 5-level paging global variable assignments back
Commit 63bed96604 ("x86/startup_64: Defer assignment of 5-level paging
global variables") moved assignment of 5-level global variables to later
in the boot in order to avoid having to use RIP relative addressing in
order to set them. However, when running with 5-level paging and SME
active (mem_encrypt=on), the variables are needed as part of the page
table setup needed to encrypt the kernel (using pgd_none(), p4d_offset(),
etc.). Since the variables haven't been set, the page table manipulation
is done as if 4-level paging is active, causing the system to crash on
boot.

While only a subset of the assignments that were moved need to be set
early, move all of the assignments back into check_la57_support() so that
these assignments aren't spread between two locations. Instead of just
reverting the fix, this uses the new RIP_REL_REF() macro when assigning
the variables.

Fixes: 63bed96604 ("x86/startup_64: Defer assignment of 5-level paging global variables")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/2ca419f4d0de719926fd82353f6751f717590a86.1711122067.git.thomas.lendacky@amd.com
2024-03-24 05:00:36 +01:00
Tom Lendacky
4d0d7e7852 x86/boot/64: Apply encryption mask to 5-level pagetable update
When running with 5-level page tables, the kernel mapping PGD entry is
updated to point to the P4D table. The assignment uses _PAGE_TABLE_NOENC,
which, when SME is active (mem_encrypt=on), results in a page table
entry without the encryption mask set, causing the system to crash on
boot.

Change the assignment to use _PAGE_TABLE instead of _PAGE_TABLE_NOENC so
that the encryption mask is set for the PGD entry.

Fixes: 533568e06b ("x86/boot/64: Use RIP_REL_REF() to access early_top_pgt[]")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/8f20345cda7dbba2cf748b286e1bc00816fe649a.1711122067.git.thomas.lendacky@amd.com
2024-03-24 05:00:35 +01:00
Tony Luck
8a8a9c9047 x86/cpu: Add model number for another Intel Arrow Lake mobile processor
This one is the regular laptop CPU.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240322161725.195614-1-tony.luck@intel.com
2024-03-24 04:08:10 +01:00
Adamos Ttofari
10e4b5166d x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD
Commit 672365477a ("x86/fpu: Update XFD state where required") and
commit 8bf26758ca ("x86/fpu: Add XFD state to fpstate") introduced a
per CPU variable xfd_state to keep the MSR_IA32_XFD value cached, in
order to avoid unnecessary writes to the MSR.

On CPU hotplug MSR_IA32_XFD is reset to the init_fpstate.xfd, which
wipes out any stale state. But the per CPU cached xfd value is not
reset, which brings them out of sync.

As a consequence a subsequent xfd_update_state() might fail to update
the MSR which in turn can result in XRSTOR raising a #NM in kernel
space, which crashes the kernel.

To fix this, introduce xfd_set_state() to write xfd_state together
with MSR_IA32_XFD, and use it in all places that set MSR_IA32_XFD.

Fixes: 672365477a ("x86/fpu: Update XFD state where required")
Signed-off-by: Adamos Ttofari <attofari@amazon.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240322230439.456571-1-chang.seok.bae@intel.com

Closes: https://lore.kernel.org/lkml/20230511152818.13839-1-attofari@amazon.de
2024-03-24 04:03:54 +01:00
Tony Luck
a8ed59a3a8 Documentation/x86: Document that resctrl bandwidth control units are MiB
The memory bandwidth software controller uses 2^20 units rather than
10^6. See mbm_bw_count() which computes bandwidth using the "SZ_1M"
Linux define for 0x00100000.

Update the documentation to use MiB when describing this feature.
It's too late to fix the mount option "mba_MBps" as that is now an
established user interface.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240322182016.196544-1-tony.luck@intel.com
2024-03-24 03:58:43 +01:00
Linus Torvalds
70293240c5 Merge tag 'timers-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "Two regression fixes for the timer and timer migration code:

   - Prevent endless timer requeuing which is caused by two CPUs racing
     out of idle. This happens when the last CPU goes idle and therefore
     has to ensure to expire the pending global timers and some other
     CPU come out of idle at the same time and the other CPU wins the
     race and expires the global queue. This causes the last CPU to
     chase ghost timers forever and reprogramming it's clockevent device
     endlessly.

     Cure this by re-evaluating the wakeup time unconditionally.

   - The split into local (pinned) and global timers in the timer wheel
     caused a regression for NOHZ full as it broke the idle tracking of
     global timers. On NOHZ full this prevents an self IPI being sent
     which in turn causes the timer to be not programmed and not being
     expired on time.

     Restore the idle tracking for the global timer base so that the
     self IPI condition for NOHZ full is working correctly again"

* tag 'timers-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Fix removed self-IPI on global timer's enqueue in nohz_full
  timers/migration: Fix endless timer requeue after idle interrupts
2024-03-23 14:49:25 -07:00
Linus Torvalds
00164f477f Merge tag 'timers-core-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more clocksource updates from Thomas Gleixner:
 "A set of updates for clocksource and clockevent drivers:

   - A fix for the prescaler of the ARM global timer where the prescaler
     mask define only covered 4 bits while it is actully 8 bits wide.
     This obviously restricted the possible range of prescaler
     adjustments

   - A fix for the RISC-V timer which prevents a timer interrupt being
     raised while the timer is initialized

   - A set of device tree updates to support new system on chips in
     various drivers

   - Kernel-doc and other cleanups all over the place"

* tag 'timers-core-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers/timer-riscv: Clear timer interrupt on timer initialization
  dt-bindings: timer: Add support for cadence TTC PWM
  clocksource/drivers/arm_global_timer: Simplify prescaler register access
  clocksource/drivers/arm_global_timer: Guard against division by zero
  clocksource/drivers/arm_global_timer: Make gt_target_rate unsigned long
  dt-bindings: timer: add Ralink SoCs system tick counter
  clocksource: arm_global_timer: fix non-kernel-doc comment
  clocksource/drivers/arm_global_timer: Remove stray tab
  clocksource/drivers/arm_global_timer: Fix maximum prescaler value
  clocksource/drivers/imx-sysctr: Add i.MX95 support
  clocksource/drivers/imx-sysctr: Drop use global variables
  dt-bindings: timer: nxp,sysctr-timer: support i.MX95
  dt-bindings: timer: renesas: ostm: Document RZ/Five SoC
  dt-bindings: timer: renesas,tmu: Document input capture interrupt
  clocksource/drivers/ti-32K: Fix misuse of "/**" comment
  clocksource/drivers/stm32: Fix all kernel-doc warnings
  dt-bindings: timer: exynos4210-mct: Add google,gs101-mct compatible
  clocksource/drivers/imx: Fix -Wunused-but-set-variable warning
2024-03-23 14:42:45 -07:00
Linus Torvalds
1a39193137 Merge tag 'irq-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 "A series of fixes for the Renesas RZG21 interrupt chip driver to
  prevent spurious and misrouted interrupts.

   - Ensure that posted writes are flushed in the eoi() callback

   - Ensure that interrupts are masked at the chip level when the
     trigger type is changed

   - Clear the interrupt status register when setting up edge type
     trigger modes

   - Ensure that the trigger type and routing information is set before
     the interrupt is enabled"

* tag 'irq-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/renesas-rzg2l: Do not set TIEN and TINT source at the same time
  irqchip/renesas-rzg2l: Prevent spurious interrupts when setting trigger type
  irqchip/renesas-rzg2l: Rename rzg2l_irq_eoi()
  irqchip/renesas-rzg2l: Rename rzg2l_tint_eoi()
  irqchip/renesas-rzg2l: Flush posted write in irq_eoi()
2024-03-23 14:30:38 -07:00
Linus Torvalds
976b029d06 Merge tag 'core-entry-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core entry fix from Thomas Gleixner:
 "A single fix for the generic entry code:

  The trace_sys_enter() tracepoint can modify the syscall number via
  kprobes or BPF in pt_regs, but that requires that the syscall number
  is re-evaluted from pt_regs after the tracepoint.

  A seccomp fix in that area removed the re-evaluation so the change
  does not take effect as the code just uses the locally cached number.

  Restore the original behaviour by re-evaluating the syscall number
  after the tracepoint"

* tag 'core-entry-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  entry: Respect changes to system call number by trace_sys_enter()
2024-03-23 14:17:37 -07:00
Linus Torvalds
484193fecd Merge tag 'powerpc-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull more powerpc updates from Michael Ellerman:

 - Handle errors in mark_rodata_ro() and mark_initmem_nx()

 - Make struct crash_mem available without CONFIG_CRASH_DUMP

Thanks to Christophe Leroy and Hari Bathini.

* tag 'powerpc-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency
  powerpc/kexec: split CONFIG_KEXEC_FILE and CONFIG_CRASH_DUMP
  kexec/kdump: make struct crash_mem available without CONFIG_CRASH_DUMP
  powerpc: Handle error in mark_rodata_ro() and mark_initmem_nx()
2024-03-23 09:21:26 -07:00
Linus Torvalds
02fb638bed Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:

 - remove a misuse of kernel-doc comment

 - use "Call trace:" for backtraces like other architectures

 - implement copy_from_kernel_nofault_allowed() to fix a LKDTM test

 - add a "cut here" line for prefetch aborts

 - remove unnecessary Kconfing entry for FRAME_POINTER

 - remove iwmmxy support for PJ4/PJ4B cores

 - use bitfield helpers in ptrace to improve readabililty

 - check if folio is reserved before flushing

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 9359/1: flush: check if the folio is reserved for no-mapping addresses
  ARM: 9354/1: ptrace: Use bitfield helpers
  ARM: 9352/1: iwmmxt: Remove support for PJ4/PJ4B cores
  ARM: 9353/1: remove unneeded entry for CONFIG_FRAME_POINTER
  ARM: 9351/1: fault: Add "cut here" line for prefetch aborts
  ARM: 9350/1: fault: Implement copy_from_kernel_nofault_allowed()
  ARM: 9349/1: unwind: Add missing "Call trace:" line
  ARM: 9334/1: mm: init: remove misuse of kernel-doc comment
2024-03-23 09:17:03 -07:00
Linus Torvalds
b71871395c Merge tag 'hardening-v6.9-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull more hardening updates from Kees Cook:

 - CONFIG_MEMCPY_SLOW_KUNIT_TEST is no longer needed (Guenter Roeck)

 - Fix needless UTF-8 character in arch/Kconfig (Liu Song)

 - Improve __counted_by warning message in LKDTM (Nathan Chancellor)

 - Refactor DEFINE_FLEX() for default use of __counted_by

 - Disable signed integer overflow sanitizer on GCC < 8

* tag 'hardening-v6.9-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  lkdtm/bugs: Improve warning message for compilers without counted_by support
  overflow: Change DEFINE_FLEX to take __counted_by member
  Revert "kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST"
  arch/Kconfig: eliminate needless UTF-8 character in Kconfig help
  ubsan: Disable signed integer overflow sanitizer on GCC < 8
2024-03-23 08:43:21 -07:00
Thomas Gleixner
f2208aa12c x86/mpparse: Register APIC address only once
The APIC address is registered twice. First during the early detection and
afterwards when actually scanning the table for APIC IDs. The APIC and
topology core warn about the second attempt.

Restrict it to the early detection call.

Fixes: 81287ad65d ("x86/apic: Sanitize APIC address setup")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240322185305.297774848@linutronix.de
2024-03-23 12:41:48 +01:00
Thomas Gleixner
5e25eb25da x86/topology: Handle the !APIC case gracefully
If there is no local APIC enumerated and registered then the topology
bitmaps are empty. Therefore, topology_init_possible_cpus() will die with
a division by zero exception.

Prevent this by registering a fake APIC id to populate the topology
bitmap. This also allows to use all topology query interfaces
unconditionally. It does not affect the actual APIC code because either
the local APIC address was not registered or no local APIC could be
detected.

Fixes: f1f758a805 ("x86/topology: Add a mechanism to track topology via APIC IDs")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240322185305.242709302@linutronix.de
2024-03-23 12:35:56 +01:00
Thomas Gleixner
7af541cee1 x86/topology: Don't evaluate logical IDs during early boot
The local APICs have not yet been enumerated so the logical ID evaluation
from the topology bitmaps does not work and would return an error code.

Skip the evaluation during the early boot CPUID evaluation and only apply
it on the final run.

Fixes: 380414be78 ("x86/cpu/topology: Use topology logical mapping mechanism")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240322185305.186943142@linutronix.de
2024-03-23 12:28:06 +01:00
Thomas Gleixner
c90399fbd7 x86/cpu: Ensure that CPU info updates are propagated on UP
The boot sequence evaluates CPUID information twice:

  1) During early boot

  2) When finalizing the early setup right before
     mitigations are selected and alternatives are patched.

In both cases the evaluation is stored in boot_cpu_data, but on UP the
copying of boot_cpu_data to the per CPU info of the boot CPU happens
between #1 and #2. So any update which happens in #2 is never propagated to
the per CPU info instance.

Consolidate the whole logic and copy boot_cpu_data right before applying
alternatives as that's the point where boot_cpu_data is in it's final
state and not supposed to change anymore.

This also removes the voodoo mb() from smp_prepare_cpus_common() which
had absolutely no purpose.

Fixes: 71eb4893cf ("x86/percpu: Cure per CPU madness on UP")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240322185305.127642785@linutronix.de
2024-03-23 12:22:04 +01:00
Nathan Chancellor
231dc3f0c9 lkdtm/bugs: Improve warning message for compilers without counted_by support
The current message for telling the user that their compiler does not
support the counted_by attribute in the FAM_BOUNDS test does not make
much sense either grammatically or semantically. Fix it to make it
correct in both aspects.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20240321-lkdtm-improve-lack-of-counted_by-msg-v1-1-0fbf7481a29c@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-22 16:25:31 -07:00
Kees Cook
d8e45f2929 overflow: Change DEFINE_FLEX to take __counted_by member
The norm should be flexible array structures with __counted_by
annotations, so DEFINE_FLEX() is updated to expect that. Rename
the non-annotated version to DEFINE_RAW_FLEX(), and update the
few existing users. Additionally add selftests for the macros.

Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20240306235128.it.933-kees@kernel.org
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-22 16:25:31 -07:00
Linus Torvalds
bfa8f18691 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
 "The vfs has long had a write lifetime hint mechanism that gives the
  expected longevity on storage of the data being written. f2fs was the
  original consumer of this and used the hint for flash data placement
  (mostly to avoid write amplification by placing objects with similar
  lifetimes in the same erase block).

  More recently the SCSI based UFS (Universal Flash Storage) drivers
  have wanted to take advantage of this as well, for the same reasons as
  f2fs, necessitating plumbing the write hints through the block layer
  and then adding it to the SCSI core.

  The vfs write_hints already taken plumbs this as far as block and this
  completes the SCSI core enabling based on a recently agreed reuse of
  the old write command group number. The additions to the scsi_debug
  driver are for emulating this property so we can run tests on it in
  the absence of an actual UFS device"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: scsi_debug: Maintain write statistics per group number
  scsi: scsi_debug: Implement GET STREAM STATUS
  scsi: scsi_debug: Implement the IO Advice Hints Grouping mode page
  scsi: scsi_debug: Allocate the MODE SENSE response from the heap
  scsi: scsi_debug: Rework subpage code error handling
  scsi: scsi_debug: Rework page code error handling
  scsi: scsi_debug: Support the block limits extension VPD page
  scsi: scsi_debug: Reduce code duplication
  scsi: sd: Translate data lifetime information
  scsi: scsi_proto: Add structures and constants related to I/O groups and streams
  scsi: core: Query the Block Limits Extension VPD page
2024-03-22 13:31:07 -07:00