Commit Graph

19238 Commits

Author SHA1 Message Date
Pawan Gupta
6613d82e61 x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
The VERW mitigation at exit-to-user is enabled via a static branch
mds_user_clear. This static branch is never toggled after boot, and can
be safely replaced with an ALTERNATIVE() which is convenient to use in
asm.

Switch to ALTERNATIVE() to use the VERW mitigation late in exit-to-user
path. Also remove the now redundant VERW in exc_nmi() and
arch_exit_to_user_mode().

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20240213-delay-verw-v8-4-a6216d83edb7%40linux.intel.com
2024-02-19 16:31:49 -08:00
Linus Torvalds
c021e191cf Merge tag 'x86_urgent_for_v6.8_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - Correct the minimum CPU family for Transmeta Crusoe in Kconfig so
   that such hw can boot again

 - Do not take into accout XSTATE buffer size info supplied by userspace
   when constructing a sigreturn frame

 - Switch get_/put_user* to EX_TYPE_UACCESS exception handling when an
   MCE is encountered so that it can be properly recovered from instead
   of simply panicking

* tag 'x86_urgent_for_v6.8_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/Kconfig: Transmeta Crusoe is CPU family 5, not 6
  x86/fpu: Stop relying on userspace for info to fault in xsave buffer
  x86/lib: Revert to _ASM_EXTABLE_UA() for {get,put}_user() fixups
2024-02-11 11:41:51 -08:00
Linus Torvalds
5c24ba2055 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "x86 guest:

   - Avoid false positive for check that only matters on AMD processors

  x86:

   - Give a hint when Win2016 might fail to boot due to XSAVES &&
     !XSAVEC configuration

   - Do not allow creating an in-kernel PIT unless an IOAPIC already
     exists

  RISC-V:

   - Allow ISA extensions that were enabled for bare metal in 6.8 (Zbc,
     scalar and vector crypto, Zfh[min], Zihintntl, Zvfh[min], Zfa)

  S390:

   - fix CC for successful PQAP instruction

   - fix a race when creating a shadow page"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  x86/coco: Define cc_vendor without CONFIG_ARCH_HAS_CC_PLATFORM
  x86/kvm: Fix SEV check in sev_map_percpu_data()
  KVM: x86: Give a hint when Win2016 might fail to boot due to XSAVES erratum
  KVM: x86: Check irqchip mode before create PIT
  KVM: riscv: selftests: Add Zfa extension to get-reg-list test
  RISC-V: KVM: Allow Zfa extension for Guest/VM
  KVM: riscv: selftests: Add Zvfh[min] extensions to get-reg-list test
  RISC-V: KVM: Allow Zvfh[min] extensions for Guest/VM
  KVM: riscv: selftests: Add Zihintntl extension to get-reg-list test
  RISC-V: KVM: Allow Zihintntl extension for Guest/VM
  KVM: riscv: selftests: Add Zfh[min] extensions to get-reg-list test
  RISC-V: KVM: Allow Zfh[min] extensions for Guest/VM
  KVM: riscv: selftests: Add vector crypto extensions to get-reg-list test
  RISC-V: KVM: Allow vector crypto extensions for Guest/VM
  KVM: riscv: selftests: Add scaler crypto extensions to get-reg-list test
  RISC-V: KVM: Allow scalar crypto extensions for Guest/VM
  KVM: riscv: selftests: Add Zbc extension to get-reg-list test
  RISC-V: KVM: Allow Zbc extension for Guest/VM
  KVM: s390: fix cc for successful PQAP
  KVM: s390: vsie: fix race during shadow creation
2024-02-07 17:52:16 +00:00
Kirill A. Shutemov
a9ef277488 x86/kvm: Fix SEV check in sev_map_percpu_data()
The function sev_map_percpu_data() checks if it is running on an SEV
platform by checking the CC_ATTR_GUEST_MEM_ENCRYPT attribute. However,
this attribute is also defined for TDX.

To avoid false positives, add a cc_vendor check.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: 4d96f91091 ("x86/sev: Replace occurrences of sev_active() with cc_platform_has()")
Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: David Rientjes <rientjes@google.com>
Message-Id: <20240124130317.495519-1-kirill.shutemov@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-01-31 16:21:01 -05:00
Andrei Vagin
d877550eaf x86/fpu: Stop relying on userspace for info to fault in xsave buffer
Before this change, the expected size of the user space buffer was
taken from fx_sw->xstate_size. fx_sw->xstate_size can be changed
from user-space, so it is possible construct a sigreturn frame where:

 * fx_sw->xstate_size is smaller than the size required by valid bits in
   fx_sw->xfeatures.
 * user-space unmaps parts of the sigrame fpu buffer so that not all of
   the buffer required by xrstor is accessible.

In this case, xrstor tries to restore and accesses the unmapped area
which results in a fault. But fault_in_readable succeeds because buf +
fx_sw->xstate_size is within the still mapped area, so it goes back and
tries xrstor again. It will spin in this loop forever.

Instead, fault in the maximum size which can be touched by XRSTOR (taken
from fpstate->user_size).

[ dhansen: tweak subject / changelog ]

Fixes: fcb3635f50 ("x86/fpu/signal: Handle #PF in the direct restore path")
Reported-by: Konstantin Bogomolov <bogomolov@google.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240130063603.3392627-1-avagin%40google.com
2024-01-30 07:25:48 -08:00
Mario Limonciello
b9328fd636 x86/CPU/AMD: Add more models to X86_FEATURE_ZEN5
Add model ranges starting at 0x20, 0x40 and 0x70 to the synthetic
feature flag X86_FEATURE_ZEN5.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240124220749.2983-1-mario.limonciello@amd.com
2024-01-25 12:26:21 +01:00
Borislav Petkov (AMD)
3e4147f33f x86/CPU/AMD: Add X86_FEATURE_ZEN5
Add a synthetic feature flag for Zen5.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240104201138.5072-1-bp@alien8.de
2024-01-23 11:06:18 +01:00
Juergen Gross
f7cfe7017b x86/paravirt: Make BUG_func() usable by non-GPL modules
Several inlined functions subject to paravirt patching are referencing
BUG_func() after the recent switch to the alternative patching
mechanism.

As those functions can legally be used by non-GPL modules, BUG_func()
must be usable by those modules, too. So use EXPORT_SYMBOL() when
exporting BUG_func().

Fixes: 9824b00c2b ("x86/paravirt: Move some functions and defines to alternative.c")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240109082232.22657-1-jgross@suse.com
2024-01-22 15:51:07 +01:00
Linus Torvalds
378de6df19 Merge tag 'rtc-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
 "There are three new drivers this cycle. Also the cmos driver is
  getting fixes for longstanding wakeup issues on AMD.

  New drivers:
   - Analog Devices MAX31335
   - Nuvoton ma35d1
   - Texas Instrument TPS6594 PMIC RTC

  Drivers:
   - cmos: use ACPI alarm instead of HPET on recent AMD platforms
   - nuvoton: add NCT3015Y-R and NCT3018Y-R support
   - rv8803: proper suspend/resume and wakeup-source support"

* tag 'rtc-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (26 commits)
  rtc: nuvoton: Compatible with NCT3015Y-R and NCT3018Y-R
  rtc: da9063: Use dev_err_probe()
  rtc: da9063: Use device_get_match_data()
  rtc: da9063: Make IRQ as optional
  rtc: max31335: Fix comparison in max31335_volatile_reg()
  rtc: max31335: use regmap_update_bits_check
  rtc: max31335: remove unecessary locking
  rtc: max31335: add driver support
  dt-bindings: rtc: max31335: add max31335 bindings
  rtc: rv8803: add wakeup-source support
  rtc: ac100: remove misuses of kernel-doc
  rtc: class: Remove usage of the deprecated ida_simple_xx() API
  rtc: MAINTAINERS: drop Alessandro Zummo
  rtc: ma35d1: remove hardcoded UIE support
  dt-bindings: rtc: qcom-pm8xxx: fix inconsistent example
  rtc: rv8803: Add power management support
  rtc: ds3232: avoid unused-const-variable warning
  rtc: lpc24xx: add missing dependency
  rtc: tps6594: Add driver for TPS6594 RTC
  rtc: Add driver for Nuvoton ma35d1 rtc controller
  ...
2024-01-18 17:25:39 -08:00
Linus Torvalds
0dde2bf67b Merge tag 'iommu-updates-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
 "Core changes:
   - Fix race conditions in device probe path
   - Retire IOMMU bus_ops
   - Support for passing custom allocators to page table drivers
   - Clean up Kconfig around IOMMU_SVA
   - Support for sharing SVA domains with all devices bound to a mm
   - Firmware data parsing cleanup
   - Tracing improvements for iommu-dma code
   - Some smaller fixes and cleanups

  ARM-SMMU drivers:
   - Device-tree binding updates:
      - Add additional compatible strings for Qualcomm SoCs
      - Document Adreno clocks for Qualcomm's SM8350 SoC
   - SMMUv2:
      - Implement support for the ->domain_alloc_paging() callback
      - Ensure Secure context is restored following suspend of Qualcomm
        SMMU implementation
   - SMMUv3:
      - Disable stalling mode for the "quiet" context descriptor
      - Minor refactoring and driver cleanups

  Intel VT-d driver:
   - Cleanup and refactoring

  AMD IOMMU driver:
   - Improve IO TLB invalidation logic
   - Small cleanups and improvements

  Rockchip IOMMU driver:
   - DT binding update to add Rockchip RK3588

  Apple DART driver:
   - Apple M1 USB4/Thunderbolt DART support
   - Cleanups

  Virtio IOMMU driver:
   - Add support for iotlb_sync_map
   - Enable deferred IO TLB flushes"

* tag 'iommu-updates-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (66 commits)
  iommu: Don't reserve 0-length IOVA region
  iommu/vt-d: Move inline helpers to header files
  iommu/vt-d: Remove unused vcmd interfaces
  iommu/vt-d: Remove unused parameter of intel_pasid_setup_pass_through()
  iommu/vt-d: Refactor device_to_iommu() to retrieve iommu directly
  iommu/sva: Fix memory leak in iommu_sva_bind_device()
  dt-bindings: iommu: rockchip: Add Rockchip RK3588
  iommu/dma: Trace bounce buffer usage when mapping buffers
  iommu/arm-smmu: Convert to domain_alloc_paging()
  iommu/arm-smmu: Pass arm_smmu_domain to internal functions
  iommu/arm-smmu: Implement IOMMU_DOMAIN_BLOCKED
  iommu/arm-smmu: Convert to a global static identity domain
  iommu/arm-smmu: Reorganize arm_smmu_domain_add_master()
  iommu/arm-smmu-v3: Remove ARM_SMMU_DOMAIN_NESTED
  iommu/arm-smmu-v3: Master cannot be NULL in arm_smmu_write_strtab_ent()
  iommu/arm-smmu-v3: Add a type for the STE
  iommu/arm-smmu-v3: disable stall for quiet_cd
  iommu/qcom: restore IOMMU state if needed
  iommu/arm-smmu-qcom: Add QCM2290 MDSS compatible
  iommu/arm-smmu-qcom: Add missing GMU entry to match table
  ...
2024-01-18 15:16:57 -08:00
Linus Torvalds
b4442cadca Merge tag 'x86_tdx_for_6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 TDX updates from Dave Hansen:
 "This contains the initial support for host-side TDX support so that
  KVM can run TDX-protected guests. This does not include the actual
  KVM-side support which will come from the KVM folks. The TDX host
  interactions with kexec also needs to be ironed out before this is
  ready for prime time, so this code is currently Kconfig'd off when
  kexec is on.

  The majority of the code here is the kernel telling the TDX module
  which memory to protect and handing some additional memory over to it
  to use to store TDX module metadata. That sounds pretty simple, but
  the TDX architecture is rather flexible and it takes quite a bit of
  back-and-forth to say, "just protect all memory, please."

  There is also some code tacked on near the end of the series to handle
  a hardware erratum. The erratum can make software bugs such as a
  kernel write to TDX-protected memory cause a machine check and
  masquerade as a real hardware failure. The erratum handling watches
  out for these and tries to provide nicer user errors"

* tag 'x86_tdx_for_6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86/virt/tdx: Make TDX host depend on X86_MCE
  x86/virt/tdx: Disable TDX host support when kexec is enabled
  Documentation/x86: Add documentation for TDX host support
  x86/mce: Differentiate real hardware #MCs from TDX erratum ones
  x86/cpu: Detect TDX partial write machine check erratum
  x86/virt/tdx: Handle TDX interaction with sleep and hibernation
  x86/virt/tdx: Initialize all TDMRs
  x86/virt/tdx: Configure global KeyID on all packages
  x86/virt/tdx: Configure TDX module with the TDMRs and global KeyID
  x86/virt/tdx: Designate reserved areas for all TDMRs
  x86/virt/tdx: Allocate and set up PAMTs for TDMRs
  x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions
  x86/virt/tdx: Add placeholder to construct TDMRs to cover all TDX memory regions
  x86/virt/tdx: Get module global metadata for module initialization
  x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory
  x86/virt/tdx: Add skeleton to enable TDX on demand
  x86/virt/tdx: Add SEAMCALL error printing for module initialization
  x86/virt/tdx: Handle SEAMCALL no entropy error in common code
  x86/virt/tdx: Make INTEL_TDX_HOST depend on X86_X2APIC
  x86/virt/tdx: Define TDX supported page sizes as macros
  ...
2024-01-18 13:41:48 -08:00
Linus Torvalds
80955ae955 Merge tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
 "Here are the set of driver core and kernfs changes for 6.8-rc1.
  Nothing major in here this release cycle, just lots of small cleanups
  and some tweaks on kernfs that in the very end, got reverted and will
  come back in a safer way next release cycle.

  Included in here are:

   - more driver core 'const' cleanups and fixes

   - fw_devlink=rpm is now the default behavior

   - kernfs tiny changes to remove some string functions

   - cpu handling in the driver core is updated to work better on many
     systems that add topologies and cpus after booting

   - other minor changes and cleanups

  All of the cpu handling patches have been acked by the respective
  maintainers and are coming in here in one series. Everything has been
  in linux-next for a while with no reported issues"

* tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (51 commits)
  Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock"
  kernfs: convert kernfs_idr_lock to an irq safe raw spinlock
  class: fix use-after-free in class_register()
  PM: clk: make pm_clk_add_notifier() take a const pointer
  EDAC: constantify the struct bus_type usage
  kernfs: fix reference to renamed function
  driver core: device.h: fix Excess kernel-doc description warning
  driver core: class: fix Excess kernel-doc description warning
  driver core: mark remaining local bus_type variables as const
  driver core: container: make container_subsys const
  driver core: bus: constantify subsys_register() calls
  driver core: bus: make bus_sort_breadthfirst() take a const pointer
  kernfs: d_obtain_alias(NULL) will do the right thing...
  driver core: Better advertise dev_err_probe()
  kernfs: Convert kernfs_path_from_node_locked() from strlcpy() to strscpy()
  kernfs: Convert kernfs_name_locked() from strlcpy() to strscpy()
  kernfs: Convert kernfs_walk_ns() from strlcpy() to strscpy()
  initramfs: Expose retained initrd as sysfs file
  fs/kernfs/dir: obey S_ISGID
  kernel/cgroup: use kernfs_create_dir_ns()
  ...
2024-01-18 09:48:40 -08:00
Linus Torvalds
e1aa9df440 Merge tag 'pci-v6.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Reserve ECAM so we don't assign it to PCI BARs; this works around
     bugs where BIOS included ECAM in a PNP0A03 host bridge window,
     didn't reserve it via a PNP0C02 motherboard device, and didn't
     allocate space for SR-IOV VF BARs (Bjorn Helgaas)

   - Add MMCONFIG/ECAM debug logging (Bjorn Helgaas)

   - Rename 'MMCONFIG' to 'ECAM' to match spec usage (Bjorn Helgaas)

   - Log device type (Root Port, Switch Port, etc) during enumeration
     (Bjorn Helgaas)

   - Log bridges before downstream devices so the dmesg order is more
     logical (Bjorn Helgaas)

   - Log resource names (BAR 0, VF BAR 0, bridge window, etc)
     consistently instead of a mix of names and "reg 0x10" (Puranjay
     Mohan, Bjorn Helgaas)

   - Fix 64GT/s effective data rate calculation to use 1b/1b encoding
     rather than the 8b/10b or 128b/130b used by lower rates (Ilpo
     Järvinen)

   - Use PCI_HEADER_TYPE_* instead of literals in x86, powerpc, SCSI
     lpfc (Ilpo Järvinen)

   - Clean up open-coded PCIBIOS return code mangling (Ilpo Järvinen)

  Resource management:

   - Restructure pci_dev_for_each_resource() to avoid computing the
     address of an out-of-bounds array element (the bounds check was
     performed later so the element was never actually *read*, but it's
     nicer to avoid even computing an out-of-bounds address) (Andy
     Shevchenko)

  Driver binding:

   - Convert pci-host-common.c platform .remove() callback to
     .remove_new() returning 'void' since it's not useful to return
     error codes here (Uwe Kleine-König)

   - Convert exynos, keystone, kirin from .remove() to .remove_new(),
     which returns void instead of int (Uwe Kleine-König)

   - Drop unused struct pci_driver.node member (Mathias Krause)

  Virtualization:

   - Add ACS quirk for more Zhaoxin Root Ports (LeoLiuoc)

  Error handling:

   - Log AER errors as "Correctable" (not "Corrected") or
     "Uncorrectable" to match spec terminology (Bjorn Helgaas)

   - Decode Requester ID when no error info found instead of printing
     the raw hex value (Bjorn Helgaas)

  Endpoint framework:

   - Use a unique test pattern for each BAR in the pci_endpoint_test to
     make it easier to debug address translation issues (Niklas Cassel)

  Broadcom STB PCIe controller driver:

   - Add DT property "brcm,clkreq-mode" and driver support for different
     CLKREQ# modes to make ASPM L1.x states possible (Jim Quinlan)

  Freescale Layerscape PCIe controller driver:

   - Add suspend/resume support for Layerscape LS1043a and LS1021a,
     including software-managed PME_Turn_Off and transitions between L0,
     L2/L3_Ready Link states (Frank Li)

  MediaTek PCIe controller driver:

   - Clear MSI interrupt status before handler to avoid missing MSIs
     that occur after the handler (qizhong cheng)

  MediaTek PCIe Gen3 controller driver:

   - Update mediatek-gen3 translation window setup to handle MMIO space
     that is not a power of two in size (Jianjun Wang)

  Qualcomm PCIe controller driver:

   - Increase qcom iommu-map maxItems to accommodate SDX55 (five
     entries) and SDM845 (sixteen entries) (Krzysztof Kozlowski)

   - Describe qcom,pcie-sc8180x clocks and resets accurately (Krzysztof
     Kozlowski)

   - Describe qcom,pcie-sm8150 clocks and resets accurately (Krzysztof
     Kozlowski)

   - Correct the qcom "reset-name" property, previously incorrectly
     called "reset-names" (Krzysztof Kozlowski)

   - Document qcom,pcie-sm8650, based on qcom,pcie-sm8550 (Neil
     Armstrong)

  Renesas R-Car PCIe controller driver:

   - Replace of_device.h with explicit of.h include to untangle header
     usage (Rob Herring)

   - Add DT and driver support for optional miniPCIe 1.5v and 3.3v
     regulators on KingFisher (Wolfram Sang)

  SiFive FU740 PCIe controller driver:

   - Convert fu740 CONFIG_PCIE_FU740 dependency from SOC_SIFIVE to
     ARCH_SIFIVE (Conor Dooley)

  Synopsys DesignWare PCIe controller driver:

   - Align iATU mapping for endpoint MSI-X (Niklas Cassel)

   - Drop "host_" prefix from struct dw_pcie_host_ops members (Yoshihiro
     Shimoda)

   - Drop "ep_" prefix from struct dw_pcie_ep_ops members (Yoshihiro
     Shimoda)

   - Rename struct dw_pcie_ep_ops.func_conf_select() to
     .get_dbi_offset() to be more descriptive (Yoshihiro Shimoda)

   - Add Endpoint DBI accessors to encapsulate offset lookups (Yoshihiro
     Shimoda)

  TI J721E PCIe driver:

   - Add j721e DT and driver support for 'num-lanes' for devices that
     support x1, x2, or x4 Links (Matt Ranostay)

   - Add j721e DT compatible strings and driver support for j784s4 (Matt
     Ranostay)

   - Make TI J721E Kconfig depend on ARCH_K3 since the hardware is
     specific to those TI SoC parts (Peter Robinson)

  TI Keystone PCIe controller driver:

   - Hold power management references to all PHYs while enabling them to
     avoid a race when one provides clocks to others (Siddharth
     Vadapalli)

  Xilinx XDMA PCIe controller driver:

   - Remove redundant dev_err(), since platform_get_irq() and
     platform_get_irq_byname() already log errors (Yang Li)

   - Fix uninitialized symbols in xilinx_pl_dma_pcie_setup_irq()
     (Krzysztof Wilczyński)

   - Fix xilinx_pl_dma_pcie_init_irq_domain() error return when
     irq_domain_add_linear() fails (Harshit Mogalapalli)

  MicroSemi Switchtec management driver:

   - Do dma_mrpc cleanup during switchtec_pci_remove() to match its devm
     ioremapping in switchtec_pci_probe(). Previously the cleanup was
     done in stdev_release(), which used stale pointers if stdev->cdev
     happened to be open when the PCI device was removed (Daniel
     Stodden)

  Miscellaneous:

   - Convert interrupt terminology from "legacy" to "INTx" to be more
     specific and match spec terminology (Damien Le Moal)

   - In dw-xdata-pcie, pci_endpoint_test, and vmd, replace usage of
     deprecated ida_simple_*() API with ida_alloc() and ida_free()
     (Christophe JAILLET)"

* tag 'pci-v6.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (97 commits)
  PCI: Fix kernel-doc issues
  PCI: brcmstb: Configure HW CLKREQ# mode appropriate for downstream device
  dt-bindings: PCI: brcmstb: Add property "brcm,clkreq-mode"
  PCI: mediatek-gen3: Fix translation window size calculation
  PCI: mediatek: Clear interrupt status before dispatching handler
  PCI: keystone: Fix race condition when initializing PHYs
  PCI: xilinx-xdma: Fix error code in xilinx_pl_dma_pcie_init_irq_domain()
  PCI: xilinx-xdma: Fix uninitialized symbols in xilinx_pl_dma_pcie_setup_irq()
  PCI: rcar-gen4: Fix -Wvoid-pointer-to-enum-cast error
  PCI: iproc: Fix -Wvoid-pointer-to-enum-cast warning
  PCI: dwc: Add dw_pcie_ep_{read,write}_dbi[2] helpers
  PCI: dwc: Rename .func_conf_select to .get_dbi_offset in struct dw_pcie_ep_ops
  PCI: dwc: Rename .ep_init to .init in struct dw_pcie_ep_ops
  PCI: dwc: Drop host prefix from struct dw_pcie_host_ops members
  misc: pci_endpoint_test: Use a unique test pattern for each BAR
  PCI: j721e: Make TI J721E depend on ARCH_K3
  PCI: j721e: Add TI J784S4 PCIe configuration
  PCI/AER: Use explicit register sizes for struct members
  PCI/AER: Decode Requester ID when no error info found
  PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
  ...
2024-01-17 16:23:17 -08:00
Linus Torvalds
09d1c6a80f Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "Generic:

   - Use memdup_array_user() to harden against overflow.

   - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all
     architectures.

   - Clean up Kconfigs that all KVM architectures were selecting

   - New functionality around "guest_memfd", a new userspace API that
     creates an anonymous file and returns a file descriptor that refers
     to it. guest_memfd files are bound to their owning virtual machine,
     cannot be mapped, read, or written by userspace, and cannot be
     resized. guest_memfd files do however support PUNCH_HOLE, which can
     be used to switch a memory area between guest_memfd and regular
     anonymous memory.

   - New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
     per-page attributes for a given page of guest memory; right now the
     only attribute is whether the guest expects to access memory via
     guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
     TDX or ARM64 pKVM is checked by firmware or hypervisor that
     guarantees confidentiality (AMD PSP, Intel TDX module, or EL2 in
     the case of pKVM).

  x86:

   - Support for "software-protected VMs" that can use the new
     guest_memfd and page attributes infrastructure. This is mostly
     useful for testing, since there is no pKVM-like infrastructure to
     provide a meaningfully reduced TCB.

   - Fix a relatively benign off-by-one error when splitting huge pages
     during CLEAR_DIRTY_LOG.

   - Fix a bug where KVM could incorrectly test-and-clear dirty bits in
     non-leaf TDP MMU SPTEs if a racing thread replaces a huge SPTE with
     a non-huge SPTE.

   - Use more generic lockdep assertions in paths that don't actually
     care about whether the caller is a reader or a writer.

   - let Xen guests opt out of having PV clock reported as "based on a
     stable TSC", because some of them don't expect the "TSC stable" bit
     (added to the pvclock ABI by KVM, but never set by Xen) to be set.

   - Revert a bogus, made-up nested SVM consistency check for
     TLB_CONTROL.

   - Advertise flush-by-ASID support for nSVM unconditionally, as KVM
     always flushes on nested transitions, i.e. always satisfies flush
     requests. This allows running bleeding edge versions of VMware
     Workstation on top of KVM.

   - Sanity check that the CPU supports flush-by-ASID when enabling SEV
     support.

   - On AMD machines with vNMI, always rely on hardware instead of
     intercepting IRET in some cases to detect unmasking of NMIs

   - Support for virtualizing Linear Address Masking (LAM)

   - Fix a variety of vPMU bugs where KVM fail to stop/reset counters
     and other state prior to refreshing the vPMU model.

   - Fix a double-overflow PMU bug by tracking emulated counter events
     using a dedicated field instead of snapshotting the "previous"
     counter. If the hardware PMC count triggers overflow that is
     recognized in the same VM-Exit that KVM manually bumps an event
     count, KVM would pend PMIs for both the hardware-triggered overflow
     and for KVM-triggered overflow.

   - Turn off KVM_WERROR by default for all configs so that it's not
     inadvertantly enabled by non-KVM developers, which can be
     problematic for subsystems that require no regressions for W=1
     builds.

   - Advertise all of the host-supported CPUID bits that enumerate
     IA32_SPEC_CTRL "features".

   - Don't force a masterclock update when a vCPU synchronizes to the
     current TSC generation, as updating the masterclock can cause
     kvmclock's time to "jump" unexpectedly, e.g. when userspace
     hotplugs a pre-created vCPU.

   - Use RIP-relative address to read kvm_rebooting in the VM-Enter
     fault paths, partly as a super minor optimization, but mostly to
     make KVM play nice with position independent executable builds.

   - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
     CONFIG_HYPERV as a minor optimization, and to self-document the
     code.

   - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV
     "emulation" at build time.

  ARM64:

   - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base
     granule sizes. Branch shared with the arm64 tree.

   - Large Fine-Grained Trap rework, bringing some sanity to the
     feature, although there is more to come. This comes with a prefix
     branch shared with the arm64 tree.

   - Some additional Nested Virtualization groundwork, mostly
     introducing the NV2 VNCR support and retargetting the NV support to
     that version of the architecture.

   - A small set of vgic fixes and associated cleanups.

  Loongarch:

   - Optimization for memslot hugepage checking

   - Cleanup and fix some HW/SW timer issues

   - Add LSX/LASX (128bit/256bit SIMD) support

  RISC-V:

   - KVM_GET_REG_LIST improvement for vector registers

   - Generate ISA extension reg_list using macros in get-reg-list
     selftest

   - Support for reporting steal time along with selftest

  s390:

   - Bugfixes

  Selftests:

   - Fix an annoying goof where the NX hugepage test prints out garbage
     instead of the magic token needed to run the test.

   - Fix build errors when a header is delete/moved due to a missing
     flag in the Makefile.

   - Detect if KVM bugged/killed a selftest's VM and print out a helpful
     message instead of complaining that a random ioctl() failed.

   - Annotate the guest printf/assert helpers with __printf(), and fix
     the various bugs that were lurking due to lack of said annotation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (185 commits)
  x86/kvm: Do not try to disable kvmclock if it was not enabled
  KVM: x86: add missing "depends on KVM"
  KVM: fix direction of dependency on MMU notifiers
  KVM: introduce CONFIG_KVM_COMMON
  KVM: arm64: Add missing memory barriers when switching to pKVM's hyp pgd
  KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache
  RISC-V: KVM: selftests: Add get-reg-list test for STA registers
  RISC-V: KVM: selftests: Add steal_time test support
  RISC-V: KVM: selftests: Add guest_sbi_probe_extension
  RISC-V: KVM: selftests: Move sbi_ecall to processor.c
  RISC-V: KVM: Implement SBI STA extension
  RISC-V: KVM: Add support for SBI STA registers
  RISC-V: KVM: Add support for SBI extension registers
  RISC-V: KVM: Add SBI STA info to vcpu_arch
  RISC-V: KVM: Add steal-update vcpu request
  RISC-V: KVM: Add SBI STA extension skeleton
  RISC-V: paravirt: Implement steal-time support
  RISC-V: Add SBI STA extension definitions
  RISC-V: paravirt: Add skeleton for pv-time support
  RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr()
  ...
2024-01-17 13:03:37 -08:00
Linus Torvalds
3e7aeb78ab Merge tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
 "The most interesting thing is probably the networking structs
  reorganization and a significant amount of changes is around
  self-tests.

  Core & protocols:

   - Analyze and reorganize core networking structs (socks, netdev,
     netns, mibs) to optimize cacheline consumption and set up build
     time warnings to safeguard against future header changes

     This improves TCP performances with many concurrent connections up
     to 40%

   - Add page-pool netlink-based introspection, exposing the memory
     usage and recycling stats. This helps indentify bad PP users and
     possible leaks

   - Refine TCP/DCCP source port selection to no longer favor even
     source port at connect() time when IP_LOCAL_PORT_RANGE is set. This
     lowers the time taken by connect() for hosts having many active
     connections to the same destination

   - Refactor the TCP bind conflict code, shrinking related socket
     structs

   - Refactor TCP SYN-Cookie handling, as a preparation step to allow
     arbitrary SYN-Cookie processing via eBPF

   - Tune optmem_max for 0-copy usage, increasing the default value to
     128KB and namespecifying it

   - Allow coalescing for cloned skbs coming from page pools, improving
     RX performances with some common configurations

   - Reduce extension header parsing overhead at GRO time

   - Add bridge MDB bulk deletion support, allowing user-space to
     request the deletion of matching entries

   - Reorder nftables struct members, to keep data accessed by the
     datapath first

   - Introduce TC block ports tracking and use. This allows supporting
     multicast-like behavior at the TC layer

   - Remove UAPI support for retired TC qdiscs (dsmark, CBQ and ATM) and
     classifiers (RSVP and tcindex)

   - More data-race annotations

   - Extend the diag interface to dump TCP bound-only sockets

   - Conditional notification of events for TC qdisc class and actions

   - Support for WPAN dynamic associations with nearby devices, to form
     a sub-network using a specific PAN ID

   - Implement SMCv2.1 virtual ISM device support

   - Add support for Batman-avd mulicast packet type

  BPF:

   - Tons of verifier improvements:
       - BPF register bounds logic and range support along with a large
         test suite
       - log improvements
       - complete precision tracking support for register spills
       - track aligned STACK_ZERO cases as imprecise spilled registers.
         This improves the verifier "instructions processed" metric from
         single digit to 50-60% for some programs
       - support for user's global BPF subprogram arguments with few
         commonly requested annotations for a better developer
         experience
       - support tracking of BPF_JNE which helps cases when the compiler
         transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the
         like
       - several fixes

   - Add initial TX metadata implementation for AF_XDP with support in
     mlx5 and stmmac drivers. Two types of offloads are supported right
     now, that is, TX timestamp and TX checksum offload

   - Fix kCFI bugs in BPF all forms of indirect calls from BPF into
     kernel and from kernel into BPF work with CFI enabled. This allows
     BPF to work with CONFIG_FINEIBT=y

   - Change BPF verifier logic to validate global subprograms lazily
     instead of unconditionally before the main program, so they can be
     guarded using BPF CO-RE techniques

   - Support uid/gid options when mounting bpffs

   - Add a new kfunc which acquires the associated cgroup of a task
     within a specific cgroup v1 hierarchy where the latter is
     identified by its id

   - Extend verifier to allow bpf_refcount_acquire() of a map value
     field obtained via direct load which is a use-case needed in
     sched_ext

   - Add BPF link_info support for uprobe multi link along with bpftool
     integration for the latter

   - Support for VLAN tag in XDP hints

   - Remove deprecated bpfilter kernel leftovers given the project is
     developed in user-space (https://github.com/facebook/bpfilter)

  Misc:

   - Support for parellel TC self-tests execution

   - Increase MPTCP self-tests coverage

   - Updated the bridge documentation, including several so-far
     undocumented features

   - Convert all the net self-tests to run in unique netns, to avoid
     random failures due to conflict and allow concurrent runs

   - Add TCP-AO self-tests

   - Add kunit tests for both cfg80211 and mac80211

   - Autogenerate Netlink families documentation from YAML spec

   - Add yml-gen support for fixed headers and recursive nests, the tool
     can now generate user-space code for all genetlink families for
     which we have specs

   - A bunch of additional module descriptions fixes

   - Catch incorrect freeing of pages belonging to a page pool

  Driver API:

   - Rust abstractions for network PHY drivers; do not cover yet the
     full C API, but already allow implementing functional PHY drivers
     in rust

   - Introduce queue and NAPI support in the netdev Netlink interface,
     allowing complete access to the device <> NAPIs <> queues
     relationship

   - Introduce notifications filtering for devlink to allow control
     application scale to thousands of instances

   - Improve PHY validation, requesting rate matching information for
     each ethtool link mode supported by both the PHY and host

   - Add support for ethtool symmetric-xor RSS hash

   - ACPI based Wifi band RFI (WBRF) mitigation feature for the AMD
     platform

   - Expose pin fractional frequency offset value over new DPLL generic
     netlink attribute

   - Convert older drivers to platform remove callback returning void

   - Add support for PHY package MMD read/write

  New hardware / drivers:

   - Ethernet:
       - Octeon CN10K devices
       - Broadcom 5760X P7
       - Qualcomm SM8550 SoC
       - Texas Instrument DP83TG720S PHY

   - Bluetooth:
       - IMC Networks Bluetooth radio

  Removed:

   - WiFi:
       - libertas 16-bit PCMCIA support
       - Atmel at76c50x drivers
       - HostAP ISA/PCMCIA style 802.11b driver
       - zd1201 802.11b USB dongles
       - Orinoco ISA/PCMCIA 802.11b driver
       - Aviator/Raytheon driver
       - Planet WL3501 driver
       - RNDIS USB 802.11b driver

  Driver updates:

   - Ethernet high-speed NICs:
       - Intel (100G, ice, idpf):
          - allow one by one port representors creation and removal
          - add temperature and clock information reporting
          - add get/set for ethtool's header split ringparam
          - add again FW logging
          - adds support switchdev hardware packet mirroring
          - iavf: implement symmetric-xor RSS hash
          - igc: add support for concurrent physical and free-running
            timers
          - i40e: increase the allowable descriptors
       - nVidia/Mellanox:
          - Preparation for Socket-Direct multi-dev netdev. That will
            allow in future releases combining multiple PFs devices
            attached to different NUMA nodes under the same netdev
       - Broadcom (bnxt):
          - TX completion handling improvements
          - add basic ntuple filter support
          - reduce MSIX vectors usage for MQPRIO offload
          - add VXLAN support, USO offload and TX coalesce completion
            for P7
       - Marvell Octeon EP:
          - xmit-more support
          - add PF-VF mailbox support and use it for FW notifications
            for VFs
       - Wangxun (ngbe/txgbe):
          - implement ethtool functions to operate pause param, ring
            param, coalesce channel number and msglevel
       - Netronome/Corigine (nfp):
          - add flow-steering support
          - support UDP segmentation offload

   - Ethernet NICs embedded, slower, virtual:
       - Xilinx AXI: remove duplicate DMA code adopting the dma engine
         driver
       - stmmac: add support for HW-accelerated VLAN stripping
       - TI AM654x sw: add mqprio, frame preemption & coalescing
       - gve: add support for non-4k page sizes.
       - virtio-net: support dynamic coalescing moderation

   - nVidia/Mellanox Ethernet datacenter switches:
       - allow firmware upgrade without a reboot
       - more flexible support for bridge flooding via the compressed
         FID flooding mode

   - Ethernet embedded switches:
       - Microchip:
          - fine-tune flow control and speed configurations in KSZ8xxx
          - KSZ88X3: enable setting rmii reference
       - Renesas:
          - add jumbo frames support
       - Marvell:
          - 88E6xxx: add "eth-mac" and "rmon" stats support

   - Ethernet PHYs:
       - aquantia: add firmware load support
       - at803x: refactor the driver to simplify adding support for more
         chip variants
       - NXP C45 TJA11xx: Add MACsec offload support

   - Wifi:
       - MediaTek (mt76):
          - NVMEM EEPROM improvements
          - mt7996 Extremely High Throughput (EHT) improvements
          - mt7996 Wireless Ethernet Dispatcher (WED) support
          - mt7996 36-bit DMA support
       - Qualcomm (ath12k):
          - support for a single MSI vector
          - WCN7850: support AP mode
       - Intel (iwlwifi):
          - new debugfs file fw_dbg_clear
          - allow concurrent P2P operation on DFS channels

   - Bluetooth:
       - QCA2066: support HFP offload
       - ISO: more broadcast-related improvements
       - NXP: better recovery in case receiver/transmitter get out of sync"

* tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1714 commits)
  lan78xx: remove redundant statement in lan78xx_get_eee
  lan743x: remove redundant statement in lan743x_ethtool_get_eee
  bnxt_en: Fix RCU locking for ntuple filters in bnxt_rx_flow_steer()
  bnxt_en: Fix RCU locking for ntuple filters in bnxt_srxclsrldel()
  bnxt_en: Remove unneeded variable in bnxt_hwrm_clear_vnic_filter()
  tcp: Revert no longer abort SYN_SENT when receiving some ICMP
  Revert "mlx5 updates 2023-12-20"
  Revert "net: stmmac: Enable Per DMA Channel interrupt"
  ipvlan: Remove usage of the deprecated ida_simple_xx() API
  ipvlan: Fix a typo in a comment
  net/sched: Remove ipt action tests
  net: stmmac: Use interrupt mode INTM=1 for per channel irq
  net: stmmac: Add support for TX/RX channel interrupt
  net: stmmac: Make MSI interrupt routine generic
  dt-bindings: net: snps,dwmac: per channel irq
  net: phy: at803x: make read_status more generic
  net: phy: at803x: add support for cdt cross short test for qca808x
  net: phy: at803x: refactor qca808x cable test get status function
  net: phy: at803x: generalize cdt fault length function
  net: ethernet: cortina: Drop TSO support
  ...
2024-01-11 10:07:29 -08:00
Linus Torvalds
c299010061 Merge tag 'asm-generic-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic cleanups from Arnd Bergmann:
 "A series from Baoquan He cleans up the asm-generic/io.h to remove the
  ioremap_uc() definition from everything except x86, which still needs
  it for pre-PAT systems. This series notably contains a patch from
  Jiaxun Yang that converts MIPS to use asm-generic/io.h like every
  other architecture does, enabling future cleanups.

  Some of my own patches fix -Wmissing-prototype warnings in
  architecture specific code across several architectures. This is now
  needed as the warning is enabled by default. There are still some
  remaining warnings in minor platforms, but the series should catch
  most of the widely used ones make them more consistent with one
  another.

  David McKay fixes a bug in __generic_cmpxchg_local() when this is used
  on 64-bit architectures. This could currently only affect parisc64 and
  sparc64.

  Additional cleanups address from Linus Walleij, Uwe Kleine-König,
  Thomas Huth, and Kefeng Wang help reduce unnecessary inconsistencies
  between architectures"

* tag 'asm-generic-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic: Fix 32 bit __generic_cmpxchg_local
  Hexagon: Make pfn accessors statics inlines
  ARC: mm: Make virt_to_pfn() a static inline
  mips: remove extraneous asm-generic/iomap.h include
  sparc: Use $(kecho) to announce kernel images being ready
  arm64: vdso32: Define BUILD_VDSO32_64 to correct prototypes
  csky: fix arch_jump_label_transform_static override
  arch: add do_page_fault prototypes
  arch: add missing prepare_ftrace_return() prototypes
  arch: vdso: consolidate gettime prototypes
  arch: include linux/cpu.h for trap_init() prototype
  arch: fix asm-offsets.c building with -Wmissing-prototypes
  arch: consolidate arch_irq_work_raise prototypes
  hexagon: Remove CONFIG_HEXAGON_ARCH_VERSION from uapi header
  asm/io: remove unnecessary xlate_dev_mem_ptr() and unxlate_dev_mem_ptr()
  mips: io: remove duplicated codes
  arch/*/io.h: remove ioremap_uc in some architectures
  mips: add <asm-generic/io.h> including
2024-01-10 18:13:44 -08:00
Linus Torvalds
78273df7f6 Merge tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefs
Pull header cleanups from Kent Overstreet:
 "The goal is to get sched.h down to a type only header, so the main
  thing happening in this patchset is splitting out various _types.h
  headers and dependency fixups, as well as moving some things out of
  sched.h to better locations.

  This is prep work for the memory allocation profiling patchset which
  adds new sched.h interdepencencies"

* tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefs: (51 commits)
  Kill sched.h dependency on rcupdate.h
  kill unnecessary thread_info.h include
  Kill unnecessary kernel.h include
  preempt.h: Kill dependency on list.h
  rseq: Split out rseq.h from sched.h
  LoongArch: signal.c: add header file to fix build error
  restart_block: Trim includes
  lockdep: move held_lock to lockdep_types.h
  sem: Split out sem_types.h
  uidgid: Split out uidgid_types.h
  seccomp: Split out seccomp_types.h
  refcount: Split out refcount_types.h
  uapi/linux/resource.h: fix include
  x86/signal: kill dependency on time.h
  syscall_user_dispatch.h: split out *_types.h
  mm_types_task.h: Trim dependencies
  Split out irqflags_types.h
  ipc: Kill bogus dependency on spinlock.h
  shm: Slim down dependencies
  workqueue: Split out workqueue_types.h
  ...
2024-01-10 16:43:55 -08:00
Linus Torvalds
9f2a635235 Merge tag 'mm-nonmm-stable-2024-01-09-10-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
 "Quite a lot of kexec work this time around. Many singleton patches in
  many places. The notable patch series are:

   - nilfs2 folio conversion from Matthew Wilcox in 'nilfs2: Folio
     conversions for file paths'.

   - Additional nilfs2 folio conversion from Ryusuke Konishi in 'nilfs2:
     Folio conversions for directory paths'.

   - IA64 remnant removal in Heiko Carstens's 'Remove unused code after
     IA-64 removal'.

   - Arnd Bergmann has enabled the -Wmissing-prototypes warning
     everywhere in 'Treewide: enable -Wmissing-prototypes'. This had
     some followup fixes:

      - Nathan Chancellor has cleaned up the hexagon build in the series
        'hexagon: Fix up instances of -Wmissing-prototypes'.

      - Nathan also addressed some s390 warnings in 's390: A couple of
        fixes for -Wmissing-prototypes'.

      - Arnd Bergmann addresses the same warnings for MIPS in his series
        'mips: address -Wmissing-prototypes warnings'.

   - Baoquan He has made kexec_file operate in a top-down-fitting manner
     similar to kexec_load in the series 'kexec_file: Load kernel at top
     of system RAM if required'

   - Baoquan He has also added the self-explanatory 'kexec_file: print
     out debugging message if required'.

   - Some checkstack maintenance work from Tiezhu Yang in the series
     'Modify some code about checkstack'.

   - Douglas Anderson has disentangled the watchdog code's logging when
     multiple reports are occurring simultaneously. The series is
     'watchdog: Better handling of concurrent lockups'.

   - Yuntao Wang has contributed some maintenance work on the crash code
     in 'crash: Some cleanups and fixes'"

* tag 'mm-nonmm-stable-2024-01-09-10-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (157 commits)
  crash_core: fix and simplify the logic of crash_exclude_mem_range()
  x86/crash: use SZ_1M macro instead of hardcoded value
  x86/crash: remove the unused image parameter from prepare_elf_headers()
  kdump: remove redundant DEFAULT_CRASH_KERNEL_LOW_SIZE
  scripts/decode_stacktrace.sh: strip unexpected CR from lines
  watchdog: if panicking and we dumped everything, don't re-enable dumping
  watchdog/hardlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
  watchdog/softlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
  watchdog/hardlockup: adopt softlockup logic avoiding double-dumps
  kexec_core: fix the assignment to kimage->control_page
  x86/kexec: fix incorrect end address passed to kernel_ident_mapping_init()
  lib/trace_readwrite.c:: replace asm-generic/io with linux/io
  nilfs2: cpfile: fix some kernel-doc warnings
  stacktrace: fix kernel-doc typo
  scripts/checkstack.pl: fix no space expression between sp and offset
  x86/kexec: fix incorrect argument passed to kexec_dprintk()
  x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs
  nilfs2: add missing set_freezable() for freezable kthread
  kernel: relay: remove relay_file_splice_read dead code, doesn't work
  docs: submit-checklist: remove all of "make namespacecheck"
  ...
2024-01-09 11:46:20 -08:00
Linus Torvalds
aac4de465a Merge tag 'perf-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:

 - Add branch stack counters ABI extension to better capture the growing
   amount of information the PMU exposes via branch stack sampling.
   There's matching tooling support.

 - Fix race when creating the nr_addr_filters sysfs file

 - Add Intel Sierra Forest and Grand Ridge intel/cstate PMU support

 - Add Intel Granite Rapids, Sierra Forest and Grand Ridge uncore PMU
   support

 - Misc cleanups & fixes

* tag 'perf-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Factor out topology_gidnid_map()
  perf/x86/intel/uncore: Fix NULL pointer dereference issue in upi_fill_topology()
  perf/x86/amd: Reject branch stack for IBS events
  perf/x86/intel/uncore: Support Sierra Forest and Grand Ridge
  perf/x86/intel/uncore: Support IIO free-running counters on GNR
  perf/x86/intel/uncore: Support Granite Rapids
  perf/x86/uncore: Use u64 to replace unsigned for the uncore offsets array
  perf/x86/intel/uncore: Generic uncore_get_uncores and MMIO format of SPR
  perf: Fix the nr_addr_filters fix
  perf/x86/intel/cstate: Add Grand Ridge support
  perf/x86/intel/cstate: Add Sierra Forest support
  x86/smp: Export symbol cpu_clustergroup_mask()
  perf/x86/intel/cstate: Cleanup duplicate attr_groups
  perf/core: Fix narrow startup race when creating the perf nr_addr_filters sysfs file
  perf/x86/intel: Support branch counters logging
  perf/x86/intel: Reorganize attrs and is_visible
  perf: Add branch_sample_call_stack
  perf/x86: Add PERF_X86_EVENT_NEEDS_BRANCH_STACK flag
  perf: Add branch stack counters
2024-01-08 19:37:20 -08:00
Linus Torvalds
b51cc5d028 Merge tag 'x86-cleanups-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:

 - Change global variables to local

 - Add missing kernel-doc function parameter descriptions

 - Remove unused parameter from a macro

 - Remove obsolete Kconfig entry

 - Fix comments

 - Fix typos, mostly scripted, manually reviewed

and a micro-optimization got misplaced as a cleanup:

 - Micro-optimize the asm code in secondary_startup_64_no_verify()

* tag 'x86-cleanups-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arch/x86: Fix typos
  x86/head_64: Use TESTB instead of TESTL in secondary_startup_64_no_verify()
  x86/docs: Remove reference to syscall trampoline in PTI
  x86/Kconfig: Remove obsolete config X86_32_SMP
  x86/io: Remove the unused 'bw' parameter from the BUILDIO() macro
  x86/mtrr: Document missing function parameters in kernel-doc
  x86/setup: Make relocated_ramdisk a local variable of relocate_initrd()
2024-01-08 17:23:32 -08:00
Linus Torvalds
106b88d7a9 Merge tag 'x86-asm-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "Replace magic numbers in GDT descriptor definitions & handling:

   - Introduce symbolic names via macros for descriptor
     types/fields/flags, and then use these symbolic names.

   - Clean up definitions a bit, such as GDT_ENTRY_INIT()

   - Fix/clean up details that became visibly inconsistent after the
     symbol-based code was introduced:

      - Unify accessed flag handling

      - Set the D/B size flag consistently & according to the HW
        specification"

* tag 'x86-asm-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Add DB flag to 32-bit percpu GDT entry
  x86/asm: Always set A (accessed) flag in GDT descriptors
  x86/asm: Replace magic numbers in GDT descriptors, script-generated change
  x86/asm: Replace magic numbers in GDT descriptors, preparations
  x86/asm: Provide new infrastructure for GDT descriptors
2024-01-08 17:02:57 -08:00
Linus Torvalds
33034c4f94 Merge tag 'x86-apic-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Ingo Molnar:

 - Clean up 'struct apic':
    - Drop ::delivery_mode
    - Drop 'enum apic_delivery_modes'
    - Drop 'struct local_apic'

 - Fix comments

* tag 'x86-apic-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioapic: Remove unfinished sentence from comment
  x86/apic: Drop struct local_apic
  x86/apic: Drop enum apic_delivery_modes
  x86/apic: Drop apic::delivery_mode
2024-01-08 16:46:41 -08:00
Linus Torvalds
3edbe8afb6 Merge tag 'ras_core_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Borislav Petkov:

 - Convert the hw error storm handling into a finer-grained, per-bank
   solution which allows for more timely detection and reporting of
   errors

 - Start a documentation section which will hold down relevant RAS
   features description and how they should be used

 - Add new AMD error bank types

 - Slim down and remove error type descriptions from the kernel side of
   error decoding to rasdaemon which can be used from now on to decode
   hw errors on AMD

 - Mark pages containing uncorrectable errors as poison so that kdump
   can avoid them and thus not cause another panic

 - The usual cleanups and fixlets

* tag 'ras_core_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Handle Intel threshold interrupt storms
  x86/mce: Add per-bank CMCI storm mitigation
  x86/mce: Remove old CMCI storm mitigation code
  Documentation: Begin a RAS section
  x86/MCE/AMD: Add new MA_LLC, USR_DP, and USR_CP bank types
  EDAC/mce_amd: Remove SMCA Extended Error code descriptions
  x86/mce/amd, EDAC/mce_amd: Move long names to decoder module
  x86/mce/inject: Clear test status value
  x86/mce: Remove redundant check from mce_device_create()
  x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel
2024-01-08 16:03:00 -08:00
Linus Torvalds
bef91c28f2 Merge tag 'x86_cpu_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu feature updates from Borislav Petkov:

 - Add synthetic X86_FEATURE flags for the different AMD Zen generations
   and use them everywhere instead of ad-hoc family/model checks. Drop
   an ancient AMD errata checking facility as a result

 - Fix a fragile initcall ordering in intel_epb

 - Do not issue the MFENCE+LFENCE barrier for the TSC deadline and
   X2APIC MSRs on AMD as it is not needed there

* tag 'x86_cpu_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/CPU/AMD: Add X86_FEATURE_ZEN1
  x86/CPU/AMD: Drop now unused CPU erratum checking function
  x86/CPU/AMD: Get rid of amd_erratum_1485[]
  x86/CPU/AMD: Get rid of amd_erratum_400[]
  x86/CPU/AMD: Get rid of amd_erratum_383[]
  x86/CPU/AMD: Get rid of amd_erratum_1054[]
  x86/CPU/AMD: Move the DIV0 bug detection to the Zen1 init function
  x86/CPU/AMD: Move Zenbleed check to the Zen2 init function
  x86/CPU/AMD: Rename init_amd_zn() to init_amd_zen_common()
  x86/CPU/AMD: Call the spectral chicken in the Zen2 init function
  x86/CPU/AMD: Move erratum 1076 fix into the Zen1 init function
  x86/CPU/AMD: Move the Zen3 BTC_NO detection to the Zen3 init function
  x86/CPU/AMD: Carve out the erratum 1386 fix
  x86/CPU/AMD: Add ZenX generations flags
  x86/cpu/intel_epb: Don't rely on link order
  x86/barrier: Do not serialize MSR accesses on AMD
2024-01-08 15:45:31 -08:00
Linus Torvalds
e900042f04 Merge tag 'x86_sev_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:

 - Convert the sev-guest plaform ->remove callback to return void

 - Move the SEV C-bit verification to the BSP as it needs to happen only
   once and not on every AP

* tag 'x86_sev_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  virt: sev-guest: Convert to platform remove callback returning void
  x86/sev: Do the C-bit verification only on the BSP
2024-01-08 15:42:52 -08:00
Linus Torvalds
fc5e5c5923 Merge tag 'x86_paravirt_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 paravirt updates from Borislav Petkov:

 - Replace the paravirt patching functionality using the alternatives
   infrastructure and remove the former

 - Misc other improvements

* tag 'x86_paravirt_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/alternative: Correct feature bit debug output
  x86/paravirt: Remove no longer needed paravirt patching code
  x86/paravirt: Switch mixed paravirt/alternative calls to alternatives
  x86/alternative: Add indirect call patching
  x86/paravirt: Move some functions and defines to alternative.c
  x86/paravirt: Introduce ALT_NOT_XEN
  x86/paravirt: Make the struct paravirt_patch_site packed
  x86/paravirt: Use relative reference for the original instruction offset
2024-01-08 13:41:42 -08:00
Linus Torvalds
6e0b939180 Merge tag 'x86_microcode_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode updates from Borislav Petkov:

 - Correct minor issues after the microcode revision reporting
   sanitization

* tag 'x86_microcode_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/intel: Set new revision only after a successful update
  x86/microcode/intel: Remove redundant microcode late updated message
2024-01-08 13:24:30 -08:00
Kirill A. Shutemov
1c6d984f52 x86/kvm: Do not try to disable kvmclock if it was not enabled
kvm_guest_cpu_offline() tries to disable kvmclock regardless if it is
present in the VM. It leads to write to a MSR that doesn't exist on some
configurations, namely in TDX guest:

	unchecked MSR access error: WRMSR to 0x12 (tried to write 0x0000000000000000)
	at rIP: 0xffffffff8110687c (kvmclock_disable+0x1c/0x30)

kvmclock enabling is gated by CLOCKSOURCE and CLOCKSOURCE2 KVM paravirt
features.

Do not disable kvmclock if it was not enabled.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: c02027b574 ("x86/kvm: Disable kvmclock on all CPUs on shutdown")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Message-Id: <20231205004510.27164-6-kirill.shutemov@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-01-08 12:14:41 -05:00
Yuntao Wang
61bb219f9d x86/crash: use SZ_1M macro instead of hardcoded value
Use SZ_1M macro instead of hardcoded 1<<20 to make code more readable.

Link: https://lkml.kernel.org/r/20240102144905.110047-3-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:45:25 -08:00
Yuntao Wang
83d4a42a91 x86/crash: remove the unused image parameter from prepare_elf_headers()
Patch series "crash: Some cleanups and fixes", v2.

This patchset includes two cleanups and one fix.


This patch (of 3):

The image parameter is no longer in use, remove it.  Also, tidy up the
code formatting.

Link: https://lkml.kernel.org/r/20240102144905.110047-1-ytcoode@gmail.com
Link: https://lkml.kernel.org/r/20240102144905.110047-2-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:45:25 -08:00
Jakub Kicinski
e63c1822ac Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/broadcom/bnxt/bnxt.c
  e009b2efb7 ("bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()")
  0f2b214779 ("bnxt_en: Fix compile error without CONFIG_RFS_ACCEL")
https://lore.kernel.org/all/20240105115509.225aa8a2@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 18:06:46 -08:00
Jinghao Jia
f5d03da48d x86/kprobes: fix incorrect return address calculation in kprobe_emulate_call_indirect
kprobe_emulate_call_indirect currently uses int3_emulate_call to emulate
indirect calls. However, int3_emulate_call always assumes the size of
the call to be 5 bytes when calculating the return address. This is
incorrect for register-based indirect calls in x86, which can be either
2 or 3 bytes depending on whether REX prefix is used. At kprobe runtime,
the incorrect return address causes control flow to land onto the wrong
place after return -- possibly not a valid instruction boundary. This
can lead to a panic like the following:

[    7.308204][    C1] BUG: unable to handle page fault for address: 000000000002b4d8
[    7.308883][    C1] #PF: supervisor read access in kernel mode
[    7.309168][    C1] #PF: error_code(0x0000) - not-present page
[    7.309461][    C1] PGD 0 P4D 0
[    7.309652][    C1] Oops: 0000 [#1] SMP
[    7.309929][    C1] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.7.0-rc5-trace-for-next #6
[    7.310397][    C1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[    7.311068][    C1] RIP: 0010:__common_interrupt+0x52/0xc0
[    7.311349][    C1] Code: 01 00 4d 85 f6 74 39 49 81 fe 00 f0 ff ff 77 30 4c 89 f7 4d 8b 5e 68 41 ba 91 76 d8 42 45 03 53 fc 74 02 0f 0b cc ff d3 65 48 <8b> 05 30 c7 ff 7e 65 4c 89 3d 28 c7 ff 7e 5b 41 5c 41 5e 41 5f c3
[    7.312512][    C1] RSP: 0018:ffffc900000e0fd0 EFLAGS: 00010046
[    7.312899][    C1] RAX: 0000000000000001 RBX: 0000000000000023 RCX: 0000000000000001
[    7.313334][    C1] RDX: 00000000000003cd RSI: 0000000000000001 RDI: ffff888100d302a4
[    7.313702][    C1] RBP: 0000000000000001 R08: 0ef439818636191f R09: b1621ff338a3b482
[    7.314146][    C1] R10: ffffffff81e5127b R11: ffffffff81059810 R12: 0000000000000023
[    7.314509][    C1] R13: 0000000000000000 R14: ffff888100d30200 R15: 0000000000000000
[    7.314951][    C1] FS:  0000000000000000(0000) GS:ffff88813bc80000(0000) knlGS:0000000000000000
[    7.315396][    C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.315691][    C1] CR2: 000000000002b4d8 CR3: 0000000003028003 CR4: 0000000000370ef0
[    7.316153][    C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    7.316508][    C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    7.316948][    C1] Call Trace:
[    7.317123][    C1]  <IRQ>
[    7.317279][    C1]  ? __die_body+0x64/0xb0
[    7.317482][    C1]  ? page_fault_oops+0x248/0x370
[    7.317712][    C1]  ? __wake_up+0x96/0xb0
[    7.317964][    C1]  ? exc_page_fault+0x62/0x130
[    7.318211][    C1]  ? asm_exc_page_fault+0x22/0x30
[    7.318444][    C1]  ? __cfi_native_send_call_func_single_ipi+0x10/0x10
[    7.318860][    C1]  ? default_idle+0xb/0x10
[    7.319063][    C1]  ? __common_interrupt+0x52/0xc0
[    7.319330][    C1]  common_interrupt+0x78/0x90
[    7.319546][    C1]  </IRQ>
[    7.319679][    C1]  <TASK>
[    7.319854][    C1]  asm_common_interrupt+0x22/0x40
[    7.320082][    C1] RIP: 0010:default_idle+0xb/0x10
[    7.320309][    C1] Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 b8 0c 67 40 a5 66 90 0f 00 2d 09 b9 3b 00 fb f4 <fa> c3 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 b8 0c 67 40 a5 e9
[    7.321449][    C1] RSP: 0018:ffffc9000009bee8 EFLAGS: 00000256
[    7.321808][    C1] RAX: ffff88813bca8b68 RBX: 0000000000000001 RCX: 000000000001ef0c
[    7.322227][    C1] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000001ef0c
[    7.322656][    C1] RBP: ffffc9000009bef8 R08: 8000000000000000 R09: 00000000000008c2
[    7.323083][    C1] R10: 0000000000000000 R11: ffffffff81058e70 R12: 0000000000000000
[    7.323530][    C1] R13: ffff8881002b30c0 R14: 0000000000000000 R15: 0000000000000000
[    7.323948][    C1]  ? __cfi_lapic_next_deadline+0x10/0x10
[    7.324239][    C1]  default_idle_call+0x31/0x50
[    7.324464][    C1]  do_idle+0xd3/0x240
[    7.324690][    C1]  cpu_startup_entry+0x25/0x30
[    7.324983][    C1]  start_secondary+0xb4/0xc0
[    7.325217][    C1]  secondary_startup_64_no_verify+0x179/0x17b
[    7.325498][    C1]  </TASK>
[    7.325641][    C1] Modules linked in:
[    7.325906][    C1] CR2: 000000000002b4d8
[    7.326104][    C1] ---[ end trace 0000000000000000 ]---
[    7.326354][    C1] RIP: 0010:__common_interrupt+0x52/0xc0
[    7.326614][    C1] Code: 01 00 4d 85 f6 74 39 49 81 fe 00 f0 ff ff 77 30 4c 89 f7 4d 8b 5e 68 41 ba 91 76 d8 42 45 03 53 fc 74 02 0f 0b cc ff d3 65 48 <8b> 05 30 c7 ff 7e 65 4c 89 3d 28 c7 ff 7e 5b 41 5c 41 5e 41 5f c3
[    7.327570][    C1] RSP: 0018:ffffc900000e0fd0 EFLAGS: 00010046
[    7.327910][    C1] RAX: 0000000000000001 RBX: 0000000000000023 RCX: 0000000000000001
[    7.328273][    C1] RDX: 00000000000003cd RSI: 0000000000000001 RDI: ffff888100d302a4
[    7.328632][    C1] RBP: 0000000000000001 R08: 0ef439818636191f R09: b1621ff338a3b482
[    7.329223][    C1] R10: ffffffff81e5127b R11: ffffffff81059810 R12: 0000000000000023
[    7.329780][    C1] R13: 0000000000000000 R14: ffff888100d30200 R15: 0000000000000000
[    7.330193][    C1] FS:  0000000000000000(0000) GS:ffff88813bc80000(0000) knlGS:0000000000000000
[    7.330632][    C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.331050][    C1] CR2: 000000000002b4d8 CR3: 0000000003028003 CR4: 0000000000370ef0
[    7.331454][    C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    7.331854][    C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    7.332236][    C1] Kernel panic - not syncing: Fatal exception in interrupt
[    7.332730][    C1] Kernel Offset: disabled
[    7.333044][    C1] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

The relevant assembly code is (from objdump, faulting address
highlighted):

ffffffff8102ed9d:       41 ff d3                  call   *%r11
ffffffff8102eda0:       65 48 <8b> 05 30 c7 ff    mov    %gs:0x7effc730(%rip),%rax

The emulation incorrectly sets the return address to be ffffffff8102ed9d
+ 0x5 = ffffffff8102eda2, which is the 8b byte in the middle of the next
mov. This in turn causes incorrect subsequent instruction decoding and
eventually triggers the page fault above.

Instead of invoking int3_emulate_call, perform push and jmp emulation
directly in kprobe_emulate_call_indirect. At this point we can obtain
the instruction size from p->ainsn.size so that we can calculate the
correct return address.

Link: https://lore.kernel.org/all/20240102233345.385475-1-jinghao7@illinois.edu/

Fixes: 6256e668b7 ("x86/kprobes: Use int3 instead of debug trap for single-step")
Cc: stable@vger.kernel.org
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-01-04 14:10:59 +09:00
Bjorn Helgaas
54aa699e80 arch/x86: Fix typos
Fix typos, most reported by "codespell arch/x86".  Only touches comments,
no code changes.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20240103004011.1758650-1-helgaas@kernel.org
2024-01-03 11:46:22 +01:00
Joerg Roedel
75f74f85a4 Merge branches 'apple/dart', 'arm/rockchip', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd' and 'core' into next 2024-01-03 09:59:32 +01:00
Borislav Petkov (AMD)
7991ed4358 x86/alternative: Correct feature bit debug output
In

  https://lore.kernel.org/r/20231206110636.GBZXBVvCWj2IDjVk4c@fat_crate.local

I wanted to adjust the alternative patching debug output to the new
changes introduced by

  da0fe6e68e ("x86/alternative: Add indirect call patching")

but removed the '*' which denotes the ->x86_capability word. The correct
output should be, for example:

  [    0.230071] SMP alternatives: feat: 11*32+15, old: (entry_SYSCALL_64_after_hwframe+0x5a/0x77 (ffffffff81c000c2) len: 16), repl: (ffffffff89ae896a, len: 5) flags: 0x0

while the incorrect one says "... 1132+15" currently.

Add back the '*'.

Fixes: da0fe6e68e ("x86/alternative: Add indirect call patching")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231206110636.GBZXBVvCWj2IDjVk4c@fat_crate.local
2023-12-30 12:25:55 +01:00
Yuntao Wang
3177e6315b x86/kexec: fix incorrect end address passed to kernel_ident_mapping_init()
kernel_ident_mapping_init() takes an exclusive memory range [pstart, pend)
where pend is not included in the range, while res represents an inclusive
memory range [start, end] where end is considered part of the range.

Passing [start, end] rather than [start, end+1) to
kernel_ident_mapping_init() may result in the identity mapping for the
end address not being set up.

For example, when res->start is equal to res->end,
kernel_ident_mapping_init() will not establish any identity mapping. 
Similarly, when the value of res->end is a multiple of 2M and the page
table maps 2M pages, kernel_ident_mapping_init() will also not set up
identity mapping for res->end.

Therefore, passing res->end directly to kernel_ident_mapping_init() is
incorrect, the correct end address should be `res->end + 1`.

Link: https://lkml.kernel.org/r/20231221101702.20956-1-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29 12:22:29 -08:00
Yuntao Wang
22bb6bcd4c x86/kexec: fix incorrect argument passed to kexec_dprintk()
kexec_dprintk() expects the last argument to be kbuf.memsz, but the actual
argument being passed is kbuf.bufsz.

Although these two values are currently equal, it is better to pass the
correct one, in case these two values become different in the future.

Link: https://lkml.kernel.org/r/20231220154105.215610-1-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29 12:22:28 -08:00
Yuntao Wang
43132282d8 x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs
When detecting an error, the current code uses kexec_dprintk() to output
log message. This is not quite appropriate as kexec_dprintk() is mainly
used for outputting debugging messages, rather than error messages.

Replace kexec_dprintk() with pr_err(). This also makes the output method
for this error log align with the output method for other error logs in
this function.

Additionally, the last return statement in set_page_address() is
unnecessary, remove it.

Link: https://lkml.kernel.org/r/20231220030124.149160-1-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29 12:22:28 -08:00
Kent Overstreet
932562a604 rseq: Split out rseq.h from sched.h
We're trying to get sched.h down to more or less just types only, not
code - rseq can live in its own header.

This helps us kill the dependency on preempt.h in sched.h.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-27 11:49:56 -05:00
Yuntao Wang
8474f82ade x86/kexec: simplify the logic of mem_region_callback()
The expression `mstart + resource_size(res) - 1` is actually equivalent to
`res->end`, simplify the logic of this function to improve readability.

Link: https://lkml.kernel.org/r/20231212150506.31711-1-ytcoode@gmail.com
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-20 15:02:58 -08:00
Baoquan He
e687b2fabd kexec_file, x86: print out debugging message if required
Then when specifying '-d' for kexec_file_load interface, loaded locations
of kernel/initrd/cmdline etc can be printed out to help debug.

Here replace pr_debug() with the newly added kexec_dprintk() in kexec_file
loading related codes.

And also print out e820 memmap passed to 2nd kernel just as kexec_load
interface has been doing.

Link: https://lkml.kernel.org/r/20231213055747.61826-4-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Conor Dooley <conor@kernel.org>
Cc: Joe Perches <joe@perches.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-20 15:02:57 -08:00
Kent Overstreet
a5a0abfdb3 x86: fix missing includes/forward declarations
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-20 16:40:30 -05:00
Vegard Nossum
bc90aefa99 x86/asm: Add DB flag to 32-bit percpu GDT entry
The D/B size flag for the 32-bit percpu GDT entry was not set.

The Intel manual (vol 3, section 3.4.5) only specifies the meaning of
this flag for three cases:

 1) code segments used for %cs -- doesn't apply here

 2) stack segments used for %ss -- doesn't apply

 3) expand-down data segments -- but we don't have the expand-down flag
    set, so it also doesn't apply here

The flag likely doesn't do anything here, although the manual does also
say: "This flag should always be set to 1 for 32-bit code and data
segments [...]" so we should probably do it anyway.

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231219151200.2878271-6-vegard.nossum@oracle.com
2023-12-20 10:57:51 +01:00
Vegard Nossum
3b184b71df x86/asm: Always set A (accessed) flag in GDT descriptors
We have no known use for having the CPU track whether GDT descriptors
have been accessed or not.

Simplify the code by adding the flag to the common flags and removing
it everywhere else.

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231219151200.2878271-5-vegard.nossum@oracle.com
2023-12-20 10:57:51 +01:00
Vegard Nossum
1445f6e15f x86/asm: Replace magic numbers in GDT descriptors, script-generated change
Actually replace the numeric values by the new symbolic values.

I used this to find all the existing users of the GDT_ENTRY*() macros:

  $ git grep -P 'GDT_ENTRY(_INIT)?\('

Some of the lines will exceed 80 characters, but some of them will be
shorter again in the next couple of patches.

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231219151200.2878271-4-vegard.nossum@oracle.com
2023-12-20 10:57:38 +01:00
Vegard Nossum
41ef75c848 x86/asm: Replace magic numbers in GDT descriptors, preparations
We'd like to replace all the magic numbers in various GDT descriptors
with new, semantically meaningful, symbolic values.

In order to be able to verify that the change doesn't cause any actual
changes to the compiled binary code, I've split the change into two
patches:

 - Part 1 (this commit): everything _but_ actually replacing the numbers
 - Part 2 (the following commit): _only_ replacing the numbers

The reason we need this split for verification is that including new
headers causes some spurious changes to the object files, mostly line
number changes in the debug info but occasionally other subtle codegen
changes.

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231219151200.2878271-3-vegard.nossum@oracle.com
2023-12-20 10:57:20 +01:00
Jakub Kicinski
c49b292d03 Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2023-12-18

This PR is larger than usual and contains changes in various parts
of the kernel.

The main changes are:

1) Fix kCFI bugs in BPF, from Peter Zijlstra.

End result: all forms of indirect calls from BPF into kernel
and from kernel into BPF work with CFI enabled. This allows BPF
to work with CONFIG_FINEIBT=y.

2) Introduce BPF token object, from Andrii Nakryiko.

It adds an ability to delegate a subset of BPF features from privileged
daemon (e.g., systemd) through special mount options for userns-bound
BPF FS to a trusted unprivileged application. The design accommodates
suggestions from Christian Brauner and Paul Moore.

Example:
$ sudo mkdir -p /sys/fs/bpf/token
$ sudo mount -t bpf bpffs /sys/fs/bpf/token \
             -o delegate_cmds=prog_load:MAP_CREATE \
             -o delegate_progs=kprobe \
             -o delegate_attachs=xdp

3) Various verifier improvements and fixes, from Andrii Nakryiko, Andrei Matei.

 - Complete precision tracking support for register spills
 - Fix verification of possibly-zero-sized stack accesses
 - Fix access to uninit stack slots
 - Track aligned STACK_ZERO cases as imprecise spilled registers.
   It improves the verifier "instructions processed" metric from single
   digit to 50-60% for some programs.
 - Fix verifier retval logic

4) Support for VLAN tag in XDP hints, from Larysa Zaremba.

5) Allocate BPF trampoline via bpf_prog_pack mechanism, from Song Liu.

End result: better memory utilization and lower I$ miss for calls to BPF
via BPF trampoline.

6) Fix race between BPF prog accessing inner map and parallel delete,
from Hou Tao.

7) Add bpf_xdp_get_xfrm_state() kfunc, from Daniel Xu.

It allows BPF interact with IPSEC infra. The intent is to support
software RSS (via XDP) for the upcoming ipsec pcpu work.
Experiments on AWS demonstrate single tunnel pcpu ipsec reaching
line rate on 100G ENA nics.

8) Expand bpf_cgrp_storage to support cgroup1 non-attach, from Yafang Shao.

9) BPF file verification via fsverity, from Song Liu.

It allows BPF progs get fsverity digest.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (164 commits)
  bpf: Ensure precise is reset to false in __mark_reg_const_zero()
  selftests/bpf: Add more uprobe multi fail tests
  bpf: Fail uprobe multi link with negative offset
  selftests/bpf: Test the release of map btf
  s390/bpf: Fix indirect trampoline generation
  selftests/bpf: Temporarily disable dummy_struct_ops test on s390
  x86/cfi,bpf: Fix bpf_exception_cb() signature
  bpf: Fix dtor CFI
  cfi: Add CFI_NOSEAL()
  x86/cfi,bpf: Fix bpf_struct_ops CFI
  x86/cfi,bpf: Fix bpf_callback_t CFI
  x86/cfi,bpf: Fix BPF JIT call
  cfi: Flip headers
  selftests/bpf: Add test for abnormal cnt during multi-kprobe attachment
  selftests/bpf: Don't use libbpf_get_error() in kprobe_multi_test
  selftests/bpf: Add test for abnormal cnt during multi-uprobe attachment
  bpf: Limit the number of kprobes when attaching program to multiple kprobes
  bpf: Limit the number of uprobes when attaching program to multiple uprobes
  bpf: xdp: Register generic_kfunc_set with XDP programs
  selftests/bpf: utilize string values for delegate_xxx mount options
  ...
====================

Link: https://lore.kernel.org/r/20231219000520.34178-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 16:46:08 -08:00
Thomas Gleixner
d5a10b976e x86/acpi: Handle bogus MADT APIC tables gracefully
The recent fix to ignore invalid x2APIC entries inadvertently broke
systems with creative MADT APIC tables. The affected systems have APIC
MADT tables where all entries have invalid APIC IDs (0xFF), which means
they register exactly zero CPUs.

But the condition to ignore the entries of APIC IDs < 255 in the X2APIC
MADT table is solely based on the count of MADT APIC table entries.

As a consequence, the affected machines enumerate no secondary CPUs at
all because the APIC table has entries and therefore the X2APIC table
entries with APIC IDs < 255 are ignored.

Change the condition so that the APIC table preference for APIC IDs <
255 only becomes effective when the APIC table has valid APIC ID
entries.

IOW, an APIC table full of invalid APIC IDs is considered to be empty
which in consequence enables the X2APIC table entries with a APIC ID
< 255 and restores the expected behaviour.

Fixes: ec9aedb2aa ("x86/acpi: Ignore invalid x2APIC entries")
Reported-by: John Sperbeck <jsperbeck@google.com>
Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/169953729188.3135.6804572126118798018.tip-bot2@tip-bot2
2023-12-18 14:21:44 +01:00
Mario Limonciello
cef9ecc8e9 rtc: Extend timeout for waiting for UIP to clear to 1s
Specs don't say anything about UIP being cleared within 10ms. They
only say that UIP won't occur for another 244uS. If a long NMI occurs
while UIP is still updating it might not be possible to get valid
data in 10ms.

This has been observed in the wild that around s2idle some calls can
take up to 480ms before UIP is clear.

Adjust callers from outside an interrupt context to wait for up to a
1s instead of 10ms.

Cc:  <stable@vger.kernel.org> # 6.1.y
Fixes: ec5895c0f2 ("rtc: mc146818-lib: extract mc146818_avoid_UIP")
Reported-by: Carsten Hatger <xmb8dsv4@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217626
Tested-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Reviewed-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Acked-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20231128053653.101798-5-mario.limonciello@amd.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2023-12-17 22:33:55 +01:00
Mario Limonciello
120931db07 rtc: Add support for configuring the UIP timeout for RTC reads
The UIP timeout is hardcoded to 10ms for all RTC reads, but in some
contexts this might not be enough time. Add a timeout parameter to
mc146818_get_time() and mc146818_get_time_callback().

If UIP timeout is configured by caller to be >=100 ms and a call
takes this long, log a warning.

Make all callers use 10ms to ensure no functional changes.

Cc:  <stable@vger.kernel.org> # 6.1.y
Fixes: ec5895c0f2 ("rtc: mc146818-lib: extract mc146818_avoid_UIP")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Reviewed-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Acked-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Link: https://lore.kernel.org/r/20231128053653.101798-4-mario.limonciello@amd.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2023-12-17 22:33:55 +01:00