Commit Graph

117 Commits

Author SHA1 Message Date
Linus Torvalds
3381d7b2b3 Merge tag 'irq-msi-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull MSI updates from Thomas Gleixner:
 "Updates for the [PCI] MSI subsystem:

   - Add interrupt redirection infrastructure

     Some PCI controllers use a single demultiplexing interrupt for the
     MSI interrupts of subordinate devices.

     This prevents setting the interrupt affinity of device interrupts,
     which causes device interrupts to be delivered to a single CPU.
     That obviously is counterproductive for multi-queue devices and
     interrupt balancing.

     To work around this limitation the new infrastructure installs a
     dummy irq_set_affinity() callback which captures the affinity mask
     and picks a redirection target CPU out of the mask.

     When the PCI controller demultiplexes the interrupts it invokes a
     new handling function in the core, which either runs the interrupt
     handler in the context of the target CPU or delegates it to
     irq_work on the target CPU.

   - Utilize the interrupt redirection mechanism in the PCI DWC host
     controller driver.

     This allows affinity control for the subordinate device MSI
     interrupts instead of being randomly executed on the CPU which runs
     the demultiplex handler.

   - Replace the binary 64-bit MSI flag with a DMA mask

     Some PCI devices have PCI_MSI_FLAGS_64BIT in the MSI capability,
     but implement less than 64 address bits. This breaks on platforms
     where such a device is assigned an MSI address higher than what's
     supported.

     With the binary 64-bit flag there is no other choice than disabling
     64-bit MSI support which leaves the device disfunctional.

     By using a DMA mask the address limit of a device can be described
     correctly which provides support for the above scenario.

   - Make use of the DMA mask based address limit in the hda/intel and
     radeon drivers to enable them on affected platforms

   - The usual small cleanups and improvements"

* tag 'irq-msi-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ALSA: hda/intel: Make MSI address limit based on the device DMA limit
  drm/radeon: Make MSI address limit based on the device DMA limit
  PCI/MSI: Check the device specific address mask in msi_verify_entries()
  PCI/MSI: Convert the boolean no_64bit_msi flag to a DMA address mask
  genirq/redirect: Prevent writing MSI message on affinity change
  PCI/MSI: Unmap MSI-X region on error
  genirq: Update effective affinity for redirected interrupts
  PCI: dwc: Enable MSI affinity support
  PCI: dwc: Code cleanup
  genirq: Add interrupt redirection infrastructure
  genirq/msi: Correct kernel-doc in <linux/msi.h>
2026-02-10 16:30:29 -08:00
Vivian Wang
52f0d862f5 PCI/MSI: Check the device specific address mask in msi_verify_entries()
Instead of a 32-bit/64-bit dichotomy, check the MSI address against
the device specific address mask.

This allows platforms with an MSI doorbell address above the 32-bit limit
to work with devices without full 64-bit MSI address support, as long as
the doorbell is within the addressable range of the device.

[ tglx: Massaged changelog ]

Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260129-pci-msi-addr-mask-v4-2-70da998f2750@iscas.ac.cn
2026-01-31 01:11:48 +01:00
Vivian Wang
386ced19e9 PCI/MSI: Convert the boolean no_64bit_msi flag to a DMA address mask
Some PCI devices have PCI_MSI_FLAGS_64BIT in the MSI capability, but
implement less than 64 address bits. This breaks on platforms where such
a device is assigned an MSI address higher than what's supported.

Currently, no_64bit_msi bit is set for these devices, meaning that only
32-bit MSI addresses are allowed for them. However, on some platforms the
MSI doorbell address is above the 32-bit limit but within the addressable
range of the device.

As a first step to enable MSI on those combinations of devices and
platforms, convert the boolean no_64bit_msi flag to a DMA mask and fixup
the affected usage sites:

  - no_64bit_msi = 1    ->    msi_addr_mask = DMA_BIT_MASK(32)
  - no_64bit_msi = 0    ->    msi_addr_mask = DMA_BIT_MASK(64)
  - if (no_64bit_msi)   ->    if (msi_addr_mask < DMA_BIT_MASK(64))

Since no values other than DMA_BIT_MASK(32) and DMA_BIT_MASK(64) are used,
this is functionally equivalent.

This prepares for changing the binary decision between 32 and 64 bit to a
DMA mask based decision which allows to support systems which have a DMA
address space less than 64bit but a MSI doorbell address above the 32-bit
limit.

[ tglx: Massaged changelog ]

Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Brett Creeley <brett.creeley@amd.com> # ionic
Reviewed-by: Thomas Gleixner <tglx@kernel.org>
Acked-by: Takashi Iwai <tiwai@suse.de> # sound
Link: https://patch.msgid.link/20260129-pci-msi-addr-mask-v4-1-70da998f2750@iscas.ac.cn
2026-01-31 01:11:48 +01:00
Lorenzo Pieralisi
a97efa5ba5 irqchip/gic-v5: Add ACPI ITS probing
On ACPI ARM64 systems the GICv5 ITS configuration and translate frames
are described in the MADT table.

Refactor the current GICv5 ITS driver code to share common functions
between ACPI and OF and implement ACPI probing in the GICv5 ITS driver.

Add iort_msi_xlate() to map a device ID and retrieve an MSI controller
fwnode node for ACPI systems and update pci_msi_map_rid_ctlr_node() to
use it in its ACPI code path.

Add the required functions to IORT code for deviceID retrieval and IRQ
domain registration and look-up so that the GICv5 ITS driver in an
ACPI based system can be successfully probed.

Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260115-gicv5-host-acpi-v3-5-c13a9a150388@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-01-27 15:31:42 +01:00
Lorenzo Pieralisi
a08df2fbba PCI/MSI: Make the pci_msi_map_rid_ctlr_node() interface firmware agnostic
To support booting with OF and ACPI seamlessly, GIC ITS parent code
requires the PCI/MSI irqdomain layer to implement a function to retrieve
an MSI controller fwnode and map an RID in a firmware agnostic way
(ie pci_msi_map_rid_ctlr_node()).

Convert pci_msi_map_rid_ctlr_node() to an OF agnostic interface
(fwnode_handle based) and update the GIC ITS MSI parent code to reflect
the pci_msi_map_rid_ctlr_node() change.

Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260115-gicv5-host-acpi-v3-2-c13a9a150388@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-01-27 15:31:42 +01:00
Haoxiang Li
1a8d4c6ecb PCI/MSI: Unmap MSI-X region on error
msix_capability_init() fails to unmap the MSI-X region if
msix_setup_interrupts() fails.

Add the missing iounmap() for that error path.

[ tglx: Massaged change log ]

Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260125144452.2103812-1-lihaoxiang@isrc.iscas.ac.cn
2026-01-26 09:46:48 +01:00
Nam Cao
dce7450093 PCI/MSI: Delete pci_msi_create_irq_domain()
pci_msi_create_irq_domain() is now unused. Delete it.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
2025-10-16 21:09:52 +02:00
Linus Torvalds
9792d660a4 Merge tag 'devicetree-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
 "DT core:

   - Update dtc to upstream version v1.7.2-35-g52f07dcca47c

   - Add stub for of_get_next_child_with_prefix()

   - Convert of_msi_map_id() callers to of_msi_xlate()

 DT bindings:

   - Convert multiple text board bindings to DT schema format

   - Add bindings for synaptics,synaptics_i2c touchscreen controller,
     innolux,n133hse-ea1 and nlt,nl12880bc20-spwg-24 displays, and NXP
     vf610 reboot controller

   - Add new Arm Cortex-A320/A520AE/A720AE and C1-Nano/Pro/Premium/Ultra
     CPUs. Add missing Applied Micro CPU compatibles. Add pu-supply and
     fsl,soc-operating-points properties for CPU nodes.

   - Add QCom Glymur PDC and tegra264-agic interrupt controllers

   - Add samsung,exynos8890-mali GPU to Arm Mali Midgard

   - Drop Samsung S3C2410 display related bindings

   - Allow separate DP lane and AUX connections in dp-connector

   - Add some missing, undocumented vendor prefixes

   - Add missing '#address-cells' properties in interrupt controller
     bindings which dtc now warns about

   - Drop duplicate socfpga-sdram-edac.txt, moxa,moxart-watchdog.txt,
     fsl/mpic.txt, ti,opa362.txt, and cavium-thunder2.txt legacy text
     bindings which are already covered by existing schemas.

   - Various binding fixes for Mediatek platforms in mailbox, regulator,
     pinctrl, timer, and display

   - Drop work-around for yamllint quoting of values containing ','

   - Various spelling, typo, grammar, and duplicated words fixes in DT
     bindings and docs

   - Add binding guidelines for defining properties at top level of
     schemas, lack of node name ABI, and usage of simple-mfd"

* tag 'devicetree-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (81 commits)
  dt-bindings: arm: altera: Drop socfpga-sdram-edac.txt
  dt-bindings: gpu: Convert nvidia,gk20a to DT schema
  dt-bindings: rng: sparc_sun_oracle_rng: convert to DT schema
  dt-bindings: vendor-prefixes: update regex for properties without a prefix
  dt-bindings: display: bridge: convert megachips-stdpxxxx-ge-b850v3-fw.txt to yaml
  scripts: dt_to_config: fix grammar and a typo in --help text
  dt-bindings: fix spelling, typos, grammar, duplicated words
  docs: dt: fix grammar and spelling
  of: base: Add of_get_next_child_with_prefix() stub
  dt-bindings: trivial-devices: Add compatible string synaptics,synaptics_i2c
  dt-bindings: soc: mediatek: pwrap: Add power-domains property
  dt-bindings: pinctrl: mt65xx: Allow gpio-line-names
  dt-bindings: media: Convert MediaTek mt8173-vpu bindings to DT schema
  dt-bindings: arm: mediatek: Support mt8183-audiosys variant
  dt-bindings: mailbox: mediatek,gce-mailbox: Make clock-names optional
  dt-bindings: regulator: mediatek,mt6331: Add missing compatible
  dt-bindings: regulator: mediatek,mt6331: Fix various regulator names
  dt-bindings: regulator: mediatek,mt6332-regulator: Add missing compatible
  dt-bindings: pinctrl: mediatek,mt7622-pinctrl: Add missing base reg
  dt-bindings: pinctrl: mediatek,mt7622-pinctrl: Add missing pwm_ch7_2
  ...
2025-10-01 16:58:24 -07:00
Thomas Gleixner
ba9d484ed3 PCI/MSI: Remove the conditional parent [un]mask logic
Now that msi_lib_init_dev_msi_info() overwrites the irq_[un]mask()
callbacks when the MSI_FLAG_PCI_MSI_MASK_PARENT flag is set by the parent
domain, the conditional [un]mask logic is obsolete.

Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20250903135433.444329373@linutronix.de
2025-09-09 14:44:30 +02:00
Lorenzo Pieralisi
a576a849d5 of/irq: Convert of_msi_map_id() callers to of_msi_xlate()
With the introduction of the of_msi_xlate() function, the OF layer
provides an API to map a device ID and retrieve the MSI controller
node the ID is mapped to with a single call.

of_msi_map_id() is currently used to map a deviceID to a specific
MSI controller node; of_msi_xlate() can be used for that purpose
too, there is no need to keep the two functions.

Convert of_msi_map_id() to of_msi_xlate() calls and update the
of_msi_xlate() documentation to describe how the struct device_node
pointer passed in should be set-up to either provide the MSI controller
node target or receive its pointer upon mapping completion.

Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20250805133443.936955-1-lpieralisi@kernel.org
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2025-09-03 09:01:26 -05:00
Inochi Amaoto
727e914bbf PCI/MSI: Check MSI_FLAG_PCI_MSI_MASK_PARENT in cond_[startup|shutdown]_parent()
For MSI controllers which only support MSI_FLAG_PCI_MSI_MASK_PARENT, the
newly added callback irq_startup() and irq_shutdown() for
pci_msi[x]_template will not unmask or mask the interrupt when startup()
resp.  shutdown() is invoked. This prevents the interrupt from being
enabled resp. disabled.

Invoke irq_[un]mask_parent() in cond_[startup|shutdown]_parent(), when the
interrupt has the MSI_FLAG_PCI_MSI_MASK_PARENT flag set.

Fixes: 54f45a30c0 ("PCI/MSI: Add startup/shutdown for per device domains")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Wei Fang <wei.fang@nxp.com>
Tested-by: Chen Wang <unicorn_wang@outlook.com> # Pioneerbox/SG2042
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250827230943.17829-1-inochiama@gmail.com
Closes: https://lore.kernel.org/regressions/aK4O7Hl8NCVEMznB@monster/
Closes: https://lore.kernel.org/regressions/20250826220959.GA4119563@ax162/
Closes: https://lore.kernel.org/all/20250827093911.1218640-1-wei.fang@nxp.com/
2025-09-02 14:42:09 +02:00
Inochi Amaoto
54f45a30c0 PCI/MSI: Add startup/shutdown for per device domains
As the RISC-V PLIC cannot apply affinity settings without invoking
irq_enable(), it will make the interrupt unavailble when used as an
underlying interrupt chip for the MSI controller.

Implement the irq_startup() and irq_shutdown() callbacks for the PCI MSI
and MSI-X templates.

For chips that specify MSI_FLAG_PCI_MSI_STARTUP_PARENT, the parent startup
and shutdown functions are invoked. That allows the interrupt on the parent
chip to be enabled if the interrupt has not been enabled during
allocation. This is necessary for MSI controllers which use PLIC as
underlying parent interrupt chip.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Chen Wang <unicorn_wang@outlook.com> # Pioneerbox
Reviewed-by: Chen Wang <unicorn_wang@outlook.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250813232835.43458-3-inochiama@gmail.com
2025-08-23 21:21:13 +02:00
Linus Torvalds
0bd0a41a51 Merge tag 'pci-v6.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI updates from Bjorn Helgaas:
 "Enumeration:

   - Allow built-in drivers, not just modular drivers, to use async
     initial probing (Lukas Wunner)

   - Support Immediate Readiness even on devices with no PM Capability
     (Sean Christopherson)

   - Consolidate definition of PCIE_RESET_CONFIG_WAIT_MS (100ms), the
     required delay between a reset and sending config requests to a
     device (Niklas Cassel)

   - Add pci_is_display() to check for "Display" base class and use it
     in ALSA hda, vfio, vga_switcheroo, vt-d (Mario Limonciello)

   - Allow 'isolated PCI functions' (multi-function devices without a
     function 0) for LoongArch, similar to s390 and jailhouse (Huacai
     Chen)

  Power control:

   - Add ability to enable optional slot clock for cases where the PCIe
     host controller and the slot are supplied by different clocks
     (Marek Vasut)

  PCIe native device hotplug:

   - Fix runtime PM ref imbalance on Hot-Plug Capable ports caused by
     misinterpreting a config read failure after a device has been
     removed (Lukas Wunner)

   - Avoid creating a useless PCIe port service device for pciehp if the
     slot is handled by the ACPI hotplug driver (Lukas Wunner)

   - Ignore ACPI hotplug slots when calculating depth of pciehp hotplug
     ports (Lukas Wunner)

  Virtualization:

   - Save VF resizable BAR state and restore it after reset (Michał
     Winiarski)

   - Allow IOV resources (VF BARs) to be resized (Michał Winiarski)

   - Add pci_iov_vf_bar_set_size() so drivers can control VF BAR size
     (Michał Winiarski)

  Endpoint framework:

   - Add RC-to-EP doorbell support using platform MSI controller,
     including a test case (Frank Li)

   - Allow BAR assignment via configfs so platforms have flexibility in
     determining BAR usage (Jerome Brunet)

  Native PCIe controller drivers:

   - Convert amazon,al-alpine-v[23]-pcie, apm,xgene-pcie,
     axis,artpec6-pcie, marvell,armada-3700-pcie, st,spear1340-pcie to
     DT schema format (Rob Herring)

   - Use dev_fwnode() instead of of_fwnode_handle() to remove OF
     dependency in altera (fixes an unused variable), designware-host,
     mediatek, mediatek-gen3, mobiveil, plda, xilinx, xilinx-dma,
     xilinx-nwl (Jiri Slaby, Arnd Bergmann)

   - Convert aardvark, altera, brcmstb, designware-host, iproc,
     mediatek, mediatek-gen3, mobiveil, plda, rcar-host, vmd, xilinx,
     xilinx-dma, xilinx-nwl from using pci_msi_create_irq_domain() to
     using msi_create_parent_irq_domain() instead; this makes the
     interrupt controller per-PCI device, allows dynamic allocation of
     vectors after initialization, and allows support of IMS (Nam Cao)

  APM X-Gene PCIe controller driver:

   - Rewrite MSI handling to MSI CPU affinity, drop useless CPU hotplug
     bits, use device-managed memory allocations, and clean things up
     (Marc Zyngier)

   - Probe xgene-msi as a standard platform driver rather than a
     subsys_initcall (Marc Zyngier)

  Broadcom STB PCIe controller driver:

   - Add optional DT 'num-lanes' property and if present, use it to
     override the Maximum Link Width advertised in Link Capabilities
     (Jim Quinlan)

  Cadence PCIe controller driver:

   - Use PCIe Message routing types from the PCI core rather than
     defining private ones (Hans Zhang)

  Freescale i.MX6 PCIe controller driver:

   - Add IMX8MQ_EP third 64-bit BAR in epc_features (Richard Zhu)

   - Add IMX8MM_EP and IMX8MP_EP fixed 256-byte BAR 4 in epc_features
     (Richard Zhu)

   - Configure LUT for MSI/IOMMU in Endpoint mode so Root Complex can
     trigger doorbel on Endpoint (Frank Li)

   - Remove apps_reset (LTSSM_EN) from
     imx_pcie_{assert,deassert}_core_reset(), which fixes a hotplug
     regression on i.MX8MM (Richard Zhu)

   - Delay Endpoint link start until configfs 'start' written (Richard
     Zhu)

  Intel VMD host bridge driver:

   - Add Intel Panther Lake (PTL)-H/P/U Vendor ID (George D Sworo)

  Qualcomm PCIe controller driver:

   - Add DT binding and driver support for SA8255p, which supports ECAM
     for Configuration Space access (Mayank Rana)

   - Update DT binding and driver to describe PHYs and per-Root Port
     resets in a Root Port stanza and deprecate describing them in the
     host bridge; this makes it possible to support multiple Root Ports
     in the future (Krishna Chaitanya Chundru)

   - Add Qualcomm QCS615 to SM8150 DT binding (Ziyue Zhang)

   - Add Qualcomm QCS8300 to SA8775p DT binding (Ziyue Zhang)

   - Drop TBU and ref clocks from Qualcomm SM8150 and SC8180x DT
     bindings (Konrad Dybcio)

   - Document 'link_down' reset in Qualcomm SA8775P DT binding (Ziyue
     Zhang)

   - Add required PCIE_RESET_CONFIG_WAIT_MS delay after Link up IRQ
     (Niklas Cassel)

  Rockchip PCIe controller driver:

   - Drop unused PCIe Message routing and code definitions (Hans Zhang)

   - Remove several unused header includes (Hans Zhang)

   - Use standard PCIe config register definitions instead of
     rockchip-specific redefinitions (Geraldo Nascimento)

   - Set Target Link Speed to 5.0 GT/s before retraining so we have a
     chance to train at a higher speed (Geraldo Nascimento)

  Rockchip DesignWare PCIe controller driver:

   - Prevent race between link training and register update via DBI by
     inhibiting link training after hot reset and link down (Wilfred
     Mallawa)

   - Add required PCIE_RESET_CONFIG_WAIT_MS delay after Link up IRQ
     (Niklas Cassel)

  Sophgo PCIe controller driver:

   - Add DT binding and driver for Sophgo SG2044 PCIe controller driver
     in Root Complex mode (Inochi Amaoto)

  Synopsys DesignWare PCIe controller driver:

   - Add required PCIE_RESET_CONFIG_WAIT_MS after waiting for Link up on
     Ports that support > 5.0 GT/s. Slower Ports still rely on the
     not-quite-correct PCIE_LINK_WAIT_SLEEP_MS 90ms default delay while
     waiting for the Link (Niklas Cassel)"

* tag 'pci-v6.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (116 commits)
  dt-bindings: PCI: qcom,pcie-sa8775p: Document 'link_down' reset
  dt-bindings: PCI: Remove 83xx-512x-pci.txt
  dt-bindings: PCI: Convert amazon,al-alpine-v[23]-pcie to DT schema
  dt-bindings: PCI: Convert marvell,armada-3700-pcie to DT schema
  dt-bindings: PCI: Convert apm,xgene-pcie to DT schema
  dt-bindings: PCI: Convert axis,artpec6-pcie to DT schema
  dt-bindings: PCI: Convert st,spear1340-pcie to DT schema
  PCI: Move is_pciehp check out of pciehp_is_native()
  PCI: pciehp: Use is_pciehp instead of is_hotplug_bridge
  PCI/portdrv: Use is_pciehp instead of is_hotplug_bridge
  PCI/ACPI: Fix runtime PM ref imbalance on Hot-Plug Capable ports
  selftests: pci_endpoint: Add doorbell test case
  misc: pci_endpoint_test: Add doorbell test case
  PCI: endpoint: pci-epf-test: Add doorbell test support
  PCI: endpoint: Add pci_epf_align_inbound_addr() helper for inbound address alignment
  PCI: endpoint: pci-ep-msi: Add checks for MSI parent and mutability
  PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller
  PCI: dwc: Add Sophgo SG2044 PCIe controller driver in Root Complex mode
  PCI: vmd: Switch to msi_create_parent_irq_domain()
  PCI: vmd: Convert to lock guards
  ...
2025-08-01 13:59:07 -07:00
Linus Torvalds
63eb28bb14 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Host driver for GICv5, the next generation interrupt controller for
     arm64, including support for interrupt routing, MSIs, interrupt
     translation and wired interrupts

   - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
     GICv5 hardware, leveraging the legacy VGIC interface

   - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
     userspace to disable support for SGIs w/o an active state on
     hardware that previously advertised it unconditionally

   - Map supporting endpoints with cacheable memory attributes on
     systems with FEAT_S2FWB and DIC where KVM no longer needs to
     perform cache maintenance on the address range

   - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the
     guest hypervisor to inject external aborts into an L2 VM and take
     traps of masked external aborts to the hypervisor

   - Convert more system register sanitization to the config-driven
     implementation

   - Fixes to the visibility of EL2 registers, namely making VGICv3
     system registers accessible through the VGIC device instead of the
     ONE_REG vCPU ioctls

   - Various cleanups and minor fixes

  LoongArch:

   - Add stat information for in-kernel irqchip

   - Add tracepoints for CPUCFG and CSR emulation exits

   - Enhance in-kernel irqchip emulation

   - Various cleanups

  RISC-V:

   - Enable ring-based dirty memory tracking

   - Improve perf kvm stat to report interrupt events

   - Delegate illegal instruction trap to VS-mode

   - MMU improvements related to upcoming nested virtualization

  s390x

   - Fixes

  x86:

   - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O
     APIC, PIC, and PIT emulation at compile time

   - Share device posted IRQ code between SVM and VMX and harden it
     against bugs and runtime errors

   - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups
     O(1) instead of O(n)

   - For MMIO stale data mitigation, track whether or not a vCPU has
     access to (host) MMIO based on whether the page tables have MMIO
     pfns mapped; using VFIO is prone to false negatives

   - Rework the MSR interception code so that the SVM and VMX APIs are
     more or less identical

   - Recalculate all MSR intercepts from scratch on MSR filter changes,
     instead of maintaining shadow bitmaps

   - Advertise support for LKGS (Load Kernel GS base), a new instruction
     that's loosely related to FRED, but is supported and enumerated
     independently

   - Fix a user-triggerable WARN that syzkaller found by setting the
     vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting
     the vCPU into VMX Root Mode (post-VMXON). Trying to detect every
     possible path leading to architecturally forbidden states is hard
     and even risks breaking userspace (if it goes from valid to valid
     state but passes through invalid states), so just wait until
     KVM_RUN to detect that the vCPU state isn't allowed

   - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling
     interception of APERF/MPERF reads, so that a "properly" configured
     VM can access APERF/MPERF. This has many caveats (APERF/MPERF
     cannot be zeroed on vCPU creation or saved/restored on suspend and
     resume, or preserved over thread migration let alone VM migration)
     but can be useful whenever you're interested in letting Linux
     guests see the effective physical CPU frequency in /proc/cpuinfo

   - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been
     created, as there's no known use case for changing the default
     frequency for other VM types and it goes counter to the very reason
     why the ioctl was added to the vm file descriptor. And also, there
     would be no way to make it work for confidential VMs with a
     "secure" TSC, so kill two birds with one stone

   - Dynamically allocation the shadow MMU's hashed page list, and defer
     allocating the hashed list until it's actually needed (the TDP MMU
     doesn't use the list)

   - Extract many of KVM's helpers for accessing architectural local
     APIC state to common x86 so that they can be shared by guest-side
     code for Secure AVIC

   - Various cleanups and fixes

  x86 (Intel):

   - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest.
     Failure to honor FREEZE_IN_SMM can leak host state into guests

   - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to
     prevent L1 from running L2 with features that KVM doesn't support,
     e.g. BTF

  x86 (AMD):

   - WARN and reject loading kvm-amd.ko instead of panicking the kernel
     if the nested SVM MSRPM offsets tracker can't handle an MSR (which
     is pretty much a static condition and therefore should never
     happen, but still)

   - Fix a variety of flaws and bugs in the AVIC device posted IRQ code

   - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
     supports) instead of rejecting vCPU creation

   - Extend enable_ipiv module param support to SVM, by simply leaving
     IsRunning clear in the vCPU's physical ID table entry

   - Disable IPI virtualization, via enable_ipiv, if the CPU is affected
     by erratum #1235, to allow (safely) enabling AVIC on such CPUs

   - Request GA Log interrupts if and only if the target vCPU is
     blocking, i.e. only if KVM needs a notification in order to wake
     the vCPU

   - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to
     the vCPU's CPUID model

   - Accept any SNP policy that is accepted by the firmware with respect
     to SMT and single-socket restrictions. An incompatible policy
     doesn't put the kernel at risk in any way, so there's no reason for
     KVM to care

   - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and
     use WBNOINVD instead of WBINVD when possible for SEV cache
     maintenance

   - When reclaiming memory from an SEV guest, only do cache flushes on
     CPUs that have ever run a vCPU for the guest, i.e. don't flush the
     caches for CPUs that can't possibly have cache lines with dirty,
     encrypted data

  Generic:

   - Rework irqbypass to track/match producers and consumers via an
     xarray instead of a linked list. Using a linked list leads to
     O(n^2) insertion times, which is hugely problematic for use cases
     that create large numbers of VMs. Such use cases typically don't
     actually use irqbypass, but eliminating the pointless registration
     is a future problem to solve as it likely requires new uAPI

   - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a
     "void *", to avoid making a simple concept unnecessarily difficult
     to understand

   - Decouple device posted IRQs from VFIO device assignment, as binding
     a VM to a VFIO group is not a requirement for enabling device
     posted IRQs

   - Clean up and document/comment the irqfd assignment code

   - Disallow binding multiple irqfds to an eventfd with a priority
     waiter, i.e. ensure an eventfd is bound to at most one irqfd
     through the entire host, and add a selftest to verify eventfd:irqfd
     bindings are globally unique

   - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues
     related to private <=> shared memory conversions

   - Drop guest_memfd's .getattr() implementation as the VFS layer will
     call generic_fillattr() if inode_operations.getattr is NULL

   - Fix issues with dirty ring harvesting where KVM doesn't bound the
     processing of entries in any way, which allows userspace to keep
     KVM in a tight loop indefinitely

   - Kill off kvm_arch_{start,end}_assignment() and x86's associated
     tracking, now that KVM no longer uses assigned_device_count as a
     heuristic for either irqbypass usage or MDS mitigation

  Selftests:

   - Fix a comment typo

   - Verify KVM is loaded when getting any KVM module param so that
     attempting to run a selftest without kvm.ko loaded results in a
     SKIP message about KVM not being loaded/enabled (versus some random
     parameter not existing)

   - Skip tests that hit EACCES when attempting to access a file, and
     print a "Root required?" help message. In most cases, the test just
     needs to be run with elevated permissions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits)
  Documentation: KVM: Use unordered list for pre-init VGIC registers
  RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map()
  RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs
  RISC-V: perf/kvm: Add reporting of interrupt events
  RISC-V: KVM: Enable ring-based dirty memory tracking
  RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap
  RISC-V: KVM: Delegate illegal instruction fault to VS mode
  RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs
  RISC-V: KVM: Factor-out g-stage page table management
  RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence
  RISC-V: KVM: Introduce struct kvm_gstage_mapping
  RISC-V: KVM: Factor-out MMU related declarations into separate headers
  RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect()
  RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range()
  RISC-V: KVM: Don't flush TLB when PTE is unchanged
  RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH
  RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize()
  RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init()
  RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value
  KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list
  ...
2025-07-30 17:14:01 -07:00
Linus Torvalds
8be4d31cb8 Merge tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Wrap datapath globals into net_aligned_data, to avoid false sharing

   - Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container)

   - Add SO_INQ and SCM_INQ support to AF_UNIX

   - Add SIOCINQ support to AF_VSOCK

   - Add TCP_MAXSEG sockopt to MPTCP

   - Add IPv6 force_forwarding sysctl to enable forwarding per interface

   - Make TCP validation of whether packet fully fits in the receive
     window and the rcv_buf more strict. With increased use of HW
     aggregation a single "packet" can be multiple 100s of kB

   - Add MSG_MORE flag to optimize large TCP transmissions via sockmap,
     improves latency up to 33% for sockmap users

   - Convert TCP send queue handling from tasklet to BH workque

   - Improve BPF iteration over TCP sockets to see each socket exactly
     once

   - Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code

   - Support enabling kernel threads for NAPI processing on per-NAPI
     instance basis rather than a whole device. Fully stop the kernel
     NAPI thread when threaded NAPI gets disabled. Previously thread
     would stick around until ifdown due to tricky synchronization

   - Allow multicast routing to take effect on locally-generated packets

   - Add output interface argument for End.X in segment routing

   - MCTP: add support for gateway routing, improve bind() handling

   - Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink

   - Add a new neighbor flag ("extern_valid"), which cedes refresh
     responsibilities to userspace. This is needed for EVPN multi-homing
     where a neighbor entry for a multi-homed host needs to be synced
     across all the VTEPs among which the host is multi-homed

   - Support NUD_PERMANENT for proxy neighbor entries

   - Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM

   - Add sequence numbers to netconsole messages. Unregister
     netconsole's console when all net targets are removed. Code
     refactoring. Add a number of selftests

   - Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol
     should be used for an inbound SA lookup

   - Support inspecting ref_tracker state via DebugFS

   - Don't force bonding advertisement frames tx to ~333 ms boundaries.
     Add broadcast_neighbor option to send ARP/ND on all bonded links

   - Allow providing upcall pid for the 'execute' command in openvswitch

   - Remove DCCP support from Netfilter's conntrack

   - Disallow multiple packet duplications in the queuing layer

   - Prevent use of deprecated iptables code on PREEMPT_RT

  Driver API:

   - Support RSS and hashing configuration over ethtool Netlink

   - Add dedicated ethtool callbacks for getting and setting hashing
     fields

   - Add support for power budget evaluation strategy in PSE /
     Power-over-Ethernet. Generate Netlink events for overcurrent etc

   - Support DPLL phase offset monitoring across all device inputs.
     Support providing clock reference and SYNC over separate DPLL
     inputs

   - Support traffic classes in devlink rate API for bandwidth
     management

   - Remove rtnl_lock dependency from UDP tunnel port configuration

  Device drivers:

   - Add a new Broadcom driver for 800G Ethernet (bnge)

   - Add a standalone driver for Microchip ZL3073x DPLL

   - Remove IBM's NETIUCV device driver

   - Ethernet high-speed NICs:
      - Broadcom (bnxt):
         - support zero-copy Tx of DMABUF memory
         - take page size into account for page pool recycling rings
      - Intel (100G, ice, idpf):
         - idpf: XDP and AF_XDP support preparations
         - idpf: add flow steering
         - add link_down_events statistic
         - clean up the TSPLL code
         - preparations for live VM migration
      - nVidia/Mellanox:
         - support zero-copy Rx/Tx interfaces (DMABUF and io_uring)
         - optimize context memory usage for matchers
         - expose serial numbers in devlink info
         - support PCIe congestion metrics
      - Meta (fbnic):
         - add 25G, 50G, and 100G link modes to phylink
         - support dumping FW logs
      - Marvell/Cavium:
         - support for CN20K generation of the Octeon chips
      - Amazon:
         - add HW clock (without timestamping, just hypervisor time access)

   - Ethernet virtual:
      - VirtIO net:
         - support segmentation of UDP-tunnel-encapsulated packets
      - Google (gve):
         - support packet timestamping and clock synchronization
      - Microsoft vNIC:
         - add handler for device-originated servicing events
         - allow dynamic MSI-X vector allocation
         - support Tx bandwidth clamping

   - Ethernet NICs consumer, and embedded:
      - AMD:
         - amd-xgbe: hardware timestamping and PTP clock support
      - Broadcom integrated MACs (bcmgenet, bcmasp):
         - use napi_complete_done() return value to support NAPI polling
         - add support for re-starting auto-negotiation
      - Broadcom switches (b53):
         - support BCM5325 switches
         - add bcm63xx EPHY power control
      - Synopsys (stmmac):
         - lots of code refactoring and cleanups
      - TI:
         - icssg-prueth: read firmware-names from device tree
         - icssg: PRP offload support
      - Microchip:
         - lan78xx: convert to PHYLINK for improved PHY and MAC management
         - ksz: add KSZ8463 switch support
      - Intel:
         - support similar queue priority scheme in multi-queue and
           time-sensitive networking (taprio)
         - support packet pre-emption in both
      - RealTek (r8169):
         - enable EEE at 5Gbps on RTL8126
      - Airoha:
         - add PPPoE offload support
         - MDIO bus controller for Airoha AN7583

   - Ethernet PHYs:
      - support for the IPQ5018 internal GE PHY
      - micrel KSZ9477 switch-integrated PHYs:
         - add MDI/MDI-X control support
         - add RX error counters
         - add cable test support
         - add Signal Quality Indicator (SQI) reporting
      - dp83tg720: improve reset handling and reduce link recovery time
      - support bcm54811 (and its MII-Lite interface type)
      - air_en8811h: support resume/suspend
      - support PHY counters for QCA807x and QCA808x
      - support WoL for QCA807x

   - CAN drivers:
      - rcar_canfd: support for Transceiver Delay Compensation
      - kvaser: report FW versions via devlink dev info

   - WiFi:
      - extended regulatory info support (6 GHz)
      - add statistics and beacon monitor for Multi-Link Operation (MLO)
      - support S1G aggregation, improve S1G support
      - add Radio Measurement action fields
      - support per-radio RTS threshold
      - some work around how FIPS affects wifi, which was wrong (RC4 is
        used by TKIP, not only WEP)
      - improvements for unsolicited probe response handling

   - WiFi drivers:
      - RealTek (rtw88):
         - IBSS mode for SDIO devices
      - RealTek (rtw89):
         - BT coexistence for MLO/WiFi7
         - concurrent station + P2P support
         - support for USB devices RTL8851BU/RTL8852BU
      - Intel (iwlwifi):
         - use embedded PNVM in (to be released) FW images to fix
           compatibility issues
         - many cleanups (unused FW APIs, PCIe code, WoWLAN)
         - some FIPS interoperability
      - MediaTek (mt76):
         - firmware recovery improvements
         - more MLO work
      - Qualcomm/Atheros (ath12k):
         - fix scan on multi-radio devices
         - more EHT/Wi-Fi 7 features
         - encapsulation/decapsulation offload
      - Broadcom (brcm80211):
         - support SDIO 43751 device

   - Bluetooth:
      - hci_event: add support for handling LE BIG Sync Lost event
      - ISO: add socket option to report packet seqnum via CMSG
      - ISO: support SCM_TIMESTAMPING for ISO TS

   - Bluetooth drivers:
      - intel_pcie: support Function Level Reset
      - nxpuart: add support for 4M baudrate
      - nxpuart: implement powerup sequence, reset, FW dump, and FW loading"

* tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits)
  dpll: zl3073x: Fix build failure
  selftests: bpf: fix legacy netfilter options
  ipv6: annotate data-races around rt->fib6_nsiblings
  ipv6: fix possible infinite loop in fib6_info_uses_dev()
  ipv6: prevent infinite loop in rt6_nlmsg_size()
  ipv6: add a retry logic in net6_rt_notify()
  vrf: Drop existing dst reference in vrf_ip6_input_dst
  net/sched: taprio: align entry index attr validation with mqprio
  net: fsl_pq_mdio: use dev_err_probe
  selftests: rtnetlink.sh: remove esp4_offload after test
  vsock: remove unnecessary null check in vsock_getname()
  igb: xsk: solve negative overflow of nb_pkts in zerocopy mode
  stmmac: xsk: fix negative overflow of budget in zerocopy mode
  dt-bindings: ieee802154: Convert at86rf230.txt yaml format
  net: dsa: microchip: Disable PTP function of KSZ8463
  net: dsa: microchip: Setup fiber ports for KSZ8463
  net: dsa: microchip: Write switch MAC address differently for KSZ8463
  net: dsa: microchip: Use different registers for KSZ8463
  net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver
  dt-bindings: net: dsa: microchip: Add KSZ8463 switch support
  ...
2025-07-30 08:58:55 -07:00
Linus Torvalds
5623870d9b Merge tag 'irq-msi-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull MSI update from Thomas Gleixner:
 "A trivial cleanup in the PCI/MSI code to remove a duplicated back and
  forth conversion"

* tag 'irq-msi-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  PCI/MSI: Remove duplicated to_pci_dev() conversion
2025-07-29 12:52:06 -07:00
Paolo Bonzini
314b40b3b6 Merge tag 'kvmarm-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, round #1

 - Host driver for GICv5, the next generation interrupt controller for
   arm64, including support for interrupt routing, MSIs, interrupt
   translation and wired interrupts.

 - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
   GICv5 hardware, leveraging the legacy VGIC interface.

 - Userspace control of the 'nASSGIcap' GICv3 feature, allowing
   userspace to disable support for SGIs w/o an active state on hardware
   that previously advertised it unconditionally.

 - Map supporting endpoints with cacheable memory attributes on systems
   with FEAT_S2FWB and DIC where KVM no longer needs to perform cache
   maintenance on the address range.

 - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest
   hypervisor to inject external aborts into an L2 VM and take traps of
   masked external aborts to the hypervisor.

 - Convert more system register sanitization to the config-driven
   implementation.

 - Fixes to the visibility of EL2 registers, namely making VGICv3 system
   registers accessible through the VGIC device instead of the ONE_REG
   vCPU ioctls.

 - Various cleanups and minor fixes.
2025-07-29 12:27:40 -04:00
Bjorn Helgaas
50fcd1c14e PCI: Fix typos
Fix typos.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20250722213743.2822761-1-helgaas@kernel.org
2025-07-23 13:12:38 -05:00
Jakub Kicinski
af2d6148d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.16-rc7).

Conflicts:

Documentation/netlink/specs/ovpn.yaml
  880d43ca9a ("netlink: specs: clean up spaces in brackets")
  af52020fc5 ("ovpn: reject unexpected netlink attributes")

drivers/net/phy/phy_device.c
  a44312d58e ("net: phy: Don't register LEDs for genphy")
  f0f2b992d8 ("net: phy: Don't register LEDs for genphy")
https://lore.kernel.org/20250710114926.7ec3a64f@kernel.org

drivers/net/wireless/intel/iwlwifi/fw/regulatory.c
drivers/net/wireless/intel/iwlwifi/mld/regulatory.c
  5fde0fcbd7 ("wifi: iwlwifi: mask reserved bits in chan_state_active_bitmap")
  ea045a0de3 ("wifi: iwlwifi: add support for accepting raw DSM tables by firmware")

net/ipv6/mcast.c
  ae3264a25a ("ipv6: mcast: Delay put pmc->idev in mld_del_delrec()")
  a8594c956c ("ipv6: mcast: Avoid a duplicate pointer check in mld_del_delrec()")
https://lore.kernel.org/8cc52891-3653-4b03-a45e-05464fe495cf@kernel.org

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17 11:00:33 -07:00
Himanshu Madhani
68ea85df15 PCI/MSI: Prevent recursive locking in pci_msix_write_tph_tag()
pci_msix_write_tph_tag() takes the per device MSI descriptor mutex and then
invokes msi_domain_get_virq(), which takes the same mutex again. That
obviously results in a system hang which is exposed by a softlockup or
lockdep warning.

Move the lock guard after the invocation of msi_domain_get_virq() to fix
this.

[ tglx: Massage changelog by adding a proper explanation and removing the
  	not really useful stacktrace ]

Fixes: d5124a9957 ("PCI/MSI: Provide a sane mechanism for TPH")
Reported-by: Jorge Lopez <jorge.jo.lopez@oracle.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jorge Lopez <jorge.jo.lopez@oracle.com>
Link: https://lore.kernel.org/all/20250708222530.1041477-1-himanshu.madhani@oracle.com
2025-07-10 23:41:08 +02:00
Lorenzo Pieralisi
cd0ec59aff PCI/MSI: Add pci_msi_map_rid_ctlr_node() helper function
IRQchip drivers need a PCI/MSI function to map a RID to a MSI
controller deviceID namespace and at the same time retrieve the
struct device_node pointer of the MSI controller the RID is mapped
to.

Add pci_msi_map_rid_ctlr_node() to achieve this purpose.

Cc  Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-25-12e71f1b3528@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-08 18:35:52 +01:00
Chris Li
4a089c0b3f PCI/MSI: Remove duplicated to_pci_dev() conversion
In pci_msi_update_mask(), "lock = &to_pci_dev()" does the to_pci_dev()
lookup, and there's another one buried inside msi_desc_to_pci_dev().

Introduce a local variable to remove that duplication.

Signed-off-by: Chris Li <chrisl@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/all/20250617-pci-msi-avoid-dup-pcidev-v1-1-ed75b0419023@kernel.org
2025-06-18 20:50:04 +02:00
Shradha Gupta
5da8a8b809 PCI/MSI: Export pci_msix_prepare_desc() for dynamic MSI-X allocations
For supporting dynamic MSI-X vector allocation by PCI controllers, enabling
the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN is not enough, msix_prepare_msi_desc()
to prepare the MSI descriptor is also needed.

Export pci_msix_prepare_desc() to allow PCI controllers to support dynamic
MSI-X vector allocation.

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
2025-06-17 06:15:14 +00:00
Marc Zyngier
9cc82d99b1 PCI/MSI: Size device MSI domain with the maximum number of vectors
Zenghui reports that since 1396e89e09 ("genirq/msi: Move prepare() call
to per-device allocation"), his Multi-MSI capable device isn't working
anymore.

This is a consequence of 15c72f824b ("PCI/MSI: Add support for per device
MSI[X] domains"), which always creates a MSI domain of size 1, even in the
presence of Multi-MSI.

While this was somehow working until then, moving the .prepare() call ends
up sizing the ITS table with a tiny value for this device, and making the
endpoint driver unhappy.

Instead, always create the domain and call the .prepare() helper with the
maximum expected size.

Fixes: 1396e89e09 ("genirq/msi: Move prepare() call to per-device allocation")
Fixes: 15c72f824b ("PCI/MSI: Add support for per device MSI[X] domains")
Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Link: https://lore.kernel.org/all/20250603141801.915305-1-maz@kernel.org
Closes: https://lore.kernel.org/r/0b1d7aec-1eac-a9cd-502a-339e216e08a1@huawei.com
2025-06-04 11:19:25 +02:00
Linus Torvalds
44ed0f35df Merge tag 'irq-msi-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull MSI updates from Thomas Gleixner:
 "Updates for the MSI subsystem (core code and PCI):

   - Switch the MSI descriptor locking to lock guards

   - Replace a broken and naive implementation of PCI/MSI-X control word
     updates in the PCI/TPH driver with a properly serialized variant in
     the PCI/MSI core code.

   - Remove the MSI descriptor abuse in the SCCI/UFS/QCOM driver by
     replacing the direct access to the MSI descriptors with the proper
     API function calls. People will never understand that APIs exist
     for a reason...

   - Provide core infrastructre for the upcoming PCI endpoint library
     extensions. Currently limited to ARM GICv3+, but in theory
     extensible to other architectures.

   - Provide a MSI domain::teardown() callback, which allows drivers to
     undo the effects of the prepare() callback.

   - Move the MSI domain::prepare() callback invocation to domain
     creation time to avoid redundant (and in case of ARM/GIC-V3-ITS
     confusing) invocations on every allocation.

     In combination with the new teardown callback this removes some
     ugly hacks in the GIC-V3-ITS driver, which pretended to work around
     the short comings of the core code so far. With this update the
     code is correct by design and implementation.

   - Make the irqchip MSI library globally available, provide a MSI
     parent domain creation helper and convert a bunch of (PCI/)MSI
     drivers over to the modern MSI parent mechanism. This is the first
     step to get rid of at least one incarnation of the three PCI/MSI
     management schemes.

   - The usual small cleanups and improvements"

* tag 'irq-msi-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  PCI/MSI: Use bool for MSI enable state tracking
  PCI: tegra: Convert to MSI parent infrastructure
  PCI: xgene: Convert to MSI parent infrastructure
  PCI: apple: Convert to MSI parent infrastructure
  irqchip/msi-lib: Honour the MSI_FLAG_NO_AFFINITY flag
  irqchip/mvebu: Convert to msi_create_parent_irq_domain() helper
  irqchip/gic: Convert to msi_create_parent_irq_domain() helper
  genirq/msi: Add helper for creating MSI-parent irq domains
  irqchip: Make irq-msi-lib.h globally available
  irqchip/gic-v3-its: Use allocation size from the prepare call
  genirq/msi: Engage the .msi_teardown() callback on domain removal
  genirq/msi: Move prepare() call to per-device allocation
  irqchip/gic-v3-its: Implement .msi_teardown() callback
  genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare()
  irqchip/gic-v3-its: Add support for device tree msi-map and msi-mask
  dt-bindings: PCI: pci-ep: Add support for iommu-map and msi-map
  irqchip/gic-v3-its: Set IRQ_DOMAIN_FLAG_MSI_IMMUTABLE for ITS
  irqdomain: Add IRQ_DOMAIN_FLAG_MSI_IMMUTABLE and irq_domain_is_msi_immutable()
  platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
  genirq/msi: Rename msi_[un]lock_descs()
  ...
2025-05-27 08:15:26 -07:00
Hans Zhang
4e7bca76e3 PCI/MSI: Use bool for MSI enable state tracking
Convert pci_msi_enable and pci_msi_enabled() to use bool type for clarity.
No functional changes, only code cleanup.

Signed-off-by: Hans Zhang <hans.zhang@cixtech.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250516165223.125083-2-18255117159@163.com
2025-05-21 21:28:53 +02:00
Jonathan Currier
cf761e3dac PCI/MSI: Add an option to write MSIX ENTRY_DATA before any reads
Commit 7d5ec3d361 ("PCI/MSI: Mask all unused MSI-X entries") introduced a
readl() from ENTRY_VECTOR_CTRL before the writel() to ENTRY_DATA.

This is correct, however some hardware, like the Sun Neptune chips, the NIU
module, will cause an error and/or fatal trap if any MSIX table entry is
read before the corresponding ENTRY_DATA field is written to.

Add an optional early writel() in msix_prepare_msi_desc().

Fixes: 7d5ec3d361 ("PCI/MSI: Mask all unused MSI-X entries")
Signed-off-by: Jonathan Currier <dullfire@yahoo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20241117234843.19236-2-dullfire@yahoo.com
2025-04-15 08:32:18 +02:00
Thomas Gleixner
d5124a9957 PCI/MSI: Provide a sane mechanism for TPH
The PCI/TPH driver fiddles with the MSI-X control word of an active
interrupt completely unserialized against concurrent operations issued
from the interrupt core. It also brings the PCI/MSI-X internal cached
control word out of sync.

Provide a function, which has the required serialization and keeps the
control word cache in sync.

Unfortunately this requires to look up and lock the interrupt descriptor,
which should be only done in the interrupt core code. But confining this
particular oddity in the PCI/MSI core is the lesser of all evil. A
interrupt core implementation would require a larger pile of infrastructure
and indirections for dubious value.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.683663807@linutronix.de
2025-04-09 20:47:30 +02:00
Thomas Gleixner
891146645e PCI/MSI: Switch msix_capability_init() to guard(msi_desc_lock)
Split the lock protected functionality of msix_capability_init() out into a
helper function and use guard(msi_desc_lock) to replace the lock/unlock
pair.

Simplify the error path in the helper function by utilizing a custom
cleanup to get rid of the remaining gotos.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.564105011@linutronix.de
2025-04-09 20:47:30 +02:00
Thomas Gleixner
f11cc2af8f PCI/MSI: Switch msi_capability_init() to guard(msi_desc_lock)
Split the lock protected functionality of msi_capability_init() out into a
helper function and use guard(msi_desc_lock) to replace the lock/unlock
pair.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.504992208@linutronix.de
2025-04-09 20:47:30 +02:00
Thomas Gleixner
5c0ba4f9d2 PCI/MSI: Use __free() for affinity masks
Let cleanup handle the freeing of the affinity mask. That prepares for
switching the MSI descriptor locking to a guard().

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.444764312@linutronix.de
2025-04-09 20:47:30 +02:00
Thomas Gleixner
b0c44a5ec3 PCI/MSI: Set pci_dev:: Msi_enabled late
The comment claiming that pci_dev::msi_enabled has to be set across setup
is a leftover from ancient code versions. Nothing in the setup code
requires the flag to be set anymore.

Set it in the success path and remove the extra goto label.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.383222333@linutronix.de
2025-04-09 20:47:29 +02:00
Thomas Gleixner
497f68cff6 PCI/MSI: Use guard(msi_desc_lock) where applicable
Convert the trivial cases of msi_desc_lock/unlock() pairs.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250319105506.322536126@linutronix.de
2025-04-09 20:47:29 +02:00
Linus Torvalds
7d06015d93 Merge tag 'pci-v6.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Enable Configuration RRS SV, which makes device readiness visible,
     early instead of during child bus scanning (Bjorn Helgaas)

   - Log debug messages about reset methods being used (Bjorn Helgaas)

   - Avoid reset when it has been disabled via sysfs (Nishanth
     Aravamudan)

   - Add common pci-ep-bus.yaml schema for exporting several peripherals
     of a single PCI function via devicetree (Andrea della Porta)

   - Create DT nodes for PCI host bridges to enable loading device tree
     overlays to create platform devices for PCI devices that have
     several features that require multiple drivers (Herve Codina)

  Resource management:

   - Enlarge devres table[] to accommodate bridge windows, ROM, IOV
     BARs, etc., and validate BAR index in devres interfaces (Philipp
     Stanner)

   - Fix typo that repeatedly distributed resources to a bridge instead
     of iterating over subordinate bridges, which resulted in too little
     space to assign some BARs (Kai-Heng Feng)

   - Relax bridge window tail sizing for optional resources, e.g., IOV
     BARs, to avoid failures when removing and re-adding devices (Ilpo
     Järvinen)

   - Allow drivers to enable devices even if we haven't assigned
     optional IOV resources to them (Ilpo Järvinen)

   - Rework handling of optional resources (IOV BARs, ROMs) to reduce
     failures if we can't allocate them (Ilpo Järvinen)

   - Fix a NULL dereference in the SR-IOV VF creation error path (Shay
     Drory)

   - Fix s390 mmio_read/write syscalls, which didn't cause page faults
     in some cases, which broke vfio-pci lazy mapping on first access
     (Niklas Schnelle)

   - Add pdev->non_mappable_bars to replace CONFIG_VFIO_PCI_MMAP, which
     was disabled only for s390 (Niklas Schnelle)

   - Support mmap of PCI resources on s390 except for ISM devices
     (Niklas Schnelle)

  ASPM:

   - Delay pcie_link_state deallocation to avoid dangling pointers that
     cause invalid references during hot-unplug (Daniel Stodden)

  Power management:

   - Allow PCI bridges to go to D3Hot when suspending on all non-x86
     systems (Manivannan Sadhasivam)

  Power control:

   - Create pwrctrl devices in pci_scan_device() to make it more
     symmetric with pci_pwrctrl_unregister() and make pwrctrl devices
     for PCI bridges possible (Manivannan Sadhasivam)

   - Unregister pwrctrl devices in pci_destroy_dev() so DOE, ASPM, etc.
     can still access devices after pci_stop_dev() (Manivannan
     Sadhasivam)

   - If there's a pwrctrl device for a PCI device, skip scanning it
     because the pwrctrl core will rescan the bus after the device is
     powered on (Manivannan Sadhasivam)

   - Add a pwrctrl driver for PCI slots based on voltage regulators
     described via devicetree (Manivannan Sadhasivam)

  Bandwidth control:

   - Add set_pcie_speed.sh to TEST_PROGS to fix issue when executing the
     set_pcie_cooling_state.sh test case (Yi Lai)

   - Avoid a NULL pointer dereference when we run out of bus numbers to
     assign for a bridge secondary bus (Lukas Wunner)

  Hotplug:

   - Drop superfluous pci_hotplug_slot_list, try_module_get() calls, and
     NULL pointer checks (Lukas Wunner)

   - Drop shpchp module init/exit logging, replace shpchp dbg() with
     ctrl_dbg(), and remove unused dbg(), err(), info(), warn() wrappers
     (Ilpo Järvinen)

   - Drop 'shpchp_debug' module parameter in favor of standard dynamic
     debugging (Ilpo Järvinen)

   - Drop unused cpcihp .get_power(), .set_power() function pointers
     (Guilherme Giacomo Simoes)

   - Disable hotplug interrupts in portdrv only when pciehp is not
     enabled to avoid issuing two hotplug commands too close together
     (Feng Tang)

   - Skip pciehp 'device replaced' check if the device has been removed
     to address a deadlock when resuming after a device was removed
     during system sleep (Lukas Wunner)

   - Don't enable pciehp hotplug interupt when resuming in poll mode
     (Ilpo Järvinen)

  Virtualization:

   - Fix bugs in 'pci=config_acs=' kernel command line parameter (Tushar
     Dave)

  DOE:

   - Expose supported DOE features via sysfs (Alistair Francis)

   - Allow DOE support to be enabled even if CXL isn't enabled (Alistair
     Francis)

  Endpoint framework:

   - Convert PCI device data so pci-epf-test works correctly on
     big-endian endpoint systems (Niklas Cassel)

   - Add BAR_RESIZABLE type to endpoint framework and add DWC core
     support for EPF drivers to set BAR_RESIZABLE type and size (Niklas
     Cassel)

   - Fix pci-epf-test double free that causes an oops if the host
     reboots and PERST# deassertion restarts endpoint BAR allocation
     (Christian Bruel)

   - Fix endpoint BAR testing so tests can skip disabled BARs instead of
     reporting them as failures (Niklas Cassel)

   - Widen endpoint test BAR size variable to accommodate BARs larger
     than INT_MAX (Niklas Cassel)

   - Remove unused tools 'pci' build target left over after moving tests
     to tools/testing/selftests/pci_endpoint (Jianfeng Liu)

  Altera PCIe controller driver:

   - Add DT binding and driver support for Agilex family (P-Tile,
     F-Tile, R-Tile) (Matthew Gerlach and D M, Sharath Kumar)

  AMD MDB PCIe controller driver:

   - Add DT binding and driver for AMD MDB (Multimedia DMA Bridge)
     (Thippeswamy Havalige)

  Broadcom STB PCIe controller driver:

   - Add BCM2712 MSI-X DT binding and interrupt controller drivers and
     add softdep on irq_bcm2712_mip driver to ensure that it is loaded
     first (Stanimir Varbanov)

   - Expand inbound window map to 64GB so it can accommodate BCM2712
     (Stanimir Varbanov)

   - Add BCM2712 support and DT updates (Stanimir Varbanov)

   - Apply link speed restriction before bringing link up, not after
     (Jim Quinlan)

   - Update Max Link Speed in Link Capabilities via the internal
     writable register, not the read-only config register (Jim Quinlan)

   - Handle regulator_bulk_get() error to avoid panic when we call
     regulator_bulk_free() later (Jim Quinlan)

   - Disable regulators only when removing the bus immediately below a
     Root Port because we don't support regulators deeper in the
     hierarchy (Jim Quinlan)

   - Make const read-only arrays static (Colin Ian King)

  Cadence PCIe endpoint driver:

   - Correct MSG TLP generation so endpoints can generate INTx messages
     (Hans Zhang)

  Freescale i.MX6 PCIe controller driver:

   - Identify the second controller on i.MX8MQ based on devicetree
     'linux,pci-domain' instead of DBI 'reg' address (Richard Zhu)

   - Remove imx_pcie_cpu_addr_fixup() since dwc core can now derive the
     ATU input address (using parent_bus_offset) from devicetree (Frank
     Li)

  Freescale Layerscape PCIe controller driver:

   - Drop deprecated 'num-ib-windows' and 'num-ob-windows' and
     unnecessary 'status' from example (Krzysztof Kozlowski)

   - Correct the syscon_regmap_lookup_by_phandle_args("fsl,pcie-scfg")
     arg_count to fix probe failure on LS1043A (Ioana Ciornei)

  HiSilicon STB PCIe controller driver:

   - Call phy_exit() to clean up if histb_pcie_probe() fails (Christophe
     JAILLET)

  Intel Gateway PCIe controller driver:

   - Remove intel_pcie_cpu_addr() since dwc core can now derive the ATU
     input address (using parent_bus_offset) from devicetree (Frank Li)

  Intel VMD host bridge driver:

   - Convert vmd_dev.cfg_lock from spinlock_t to raw_spinlock_t so
     pci_ops.read() will never sleep, even on PREEMPT_RT where
     spinlock_t becomes a sleepable lock, to avoid calling a sleeping
     function from invalid context (Ryo Takakura)

  MediaTek PCIe Gen3 controller driver:

   - Remove leftover mac_reset assert for Airoha EN7581 SoC (Lorenzo
     Bianconi)

   - Add EN7581 PBUS controller 'mediatek,pbus-csr' DT property and
     program host bridge memory aperture to this syscon node (Lorenzo
     Bianconi)

  Qualcomm PCIe controller driver:

   - Add qcom,pcie-ipq5332 binding (Varadarajan Narayanan)

   - Add qcom i.MX8QM and i.MX8QXP/DXP optional DMA interrupt (Alexander
     Stein)

   - Add optional dma-coherent DT property for Qualcomm SA8775P (Dmitry
     Baryshkov)

   - Make DT iommu property required for SA8775P and prohibited for
     SDX55 (Dmitry Baryshkov)

   - Add DT IOMMU and DMA-related properties for Qualcomm SM8450 (Dmitry
     Baryshkov)

   - Add endpoint DT properties for SAR2130P and enable endpoint mode in
     driver (Dmitry Baryshkov)

   - Describe endpoint BAR0 and BAR2 as 64-bit only and BAR1 and BAR3 as
     RESERVED (Manivannan Sadhasivam)

  Rockchip DesignWare PCIe controller driver:

   - Describe rk3568 and rk3588 BARs as Resizable, not Fixed (Niklas
     Cassel)

  Synopsys DesignWare PCIe controller driver:

   - Add debugfs-based Silicon Debug, Error Injection, Statistical
     Counter support for DWC (Shradha Todi)

   - Add debugfs property to expose LTSSM status of DWC PCIe link (Hans
     Zhang)

   - Add Rockchip support for DWC debugfs features (Niklas Cassel)

   - Add dw_pcie_parent_bus_offset() to look up the parent bus address
     of a specified 'reg' property and return the offset from the CPU
     physical address (Frank Li)

   - Use dw_pcie_parent_bus_offset() to derive CPU -> ATU addr offset
     via 'reg[config]' for host controllers and 'reg[addr_space]' for
     endpoint controllers (Frank Li)

   - Apply struct dw_pcie.parent_bus_offset in ATU users to remove use
     of .cpu_addr_fixup() when programming ATU (Frank Li)

  TI J721E PCIe driver:

   - Correct the 'link down' interrupt bit for J784S4 (Siddharth
     Vadapalli)

  TI Keystone PCIe controller driver:

   - Describe AM65x BARs 2 and 5 as Resizable (not Fixed) and reduce
     alignment requirement from 1MB to 64KB (Niklas Cassel)

  Xilinx Versal CPM PCIe controller driver:

   - Free IRQ domain in probe error path to avoid leaking it
     (Thippeswamy Havalige)

   - Add DT .compatible "xlnx,versal-cpm5nc-host" and driver support for
     Versal Net CPM5NC Root Port controller (Thippeswamy Havalige)

   - Add driver support for CPM5_HOST1 (Thippeswamy Havalige)

  Miscellaneous:

   - Convert fsl,mpc83xx-pcie binding to YAML (J. Neuschäfer)

   - Use for_each_available_child_of_node_scoped() to simplify apple,
     kirin, mediatek, mt7621, tegra drivers (Zhang Zekun)"

* tag 'pci-v6.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (197 commits)
  PCI: layerscape: Fix arg_count to syscon_regmap_lookup_by_phandle_args()
  PCI: j721e: Fix the value of .linkdown_irq_regfield for J784S4
  misc: pci_endpoint_test: Add support for PCITEST_IRQ_TYPE_AUTO
  PCI: endpoint: pci-epf-test: Expose supported IRQ types in CAPS register
  PCI: dw-rockchip: Endpoint mode cannot raise INTx interrupts
  PCI: endpoint: Add intx_capable to epc_features struct
  dt-bindings: PCI: Add common schema for devices accessible through PCI BARs
  PCI: intel-gw: Remove intel_pcie_cpu_addr()
  PCI: imx6: Remove imx_pcie_cpu_addr_fixup()
  PCI: dwc: Use parent_bus_offset to remove need for .cpu_addr_fixup()
  PCI: dwc: ep: Ensure proper iteration over outbound map windows
  PCI: dwc: ep: Use devicetree 'reg[addr_space]' to derive CPU -> ATU addr offset
  PCI: dwc: ep: Consolidate devicetree handling in dw_pcie_ep_get_resources()
  PCI: dwc: ep: Call epc_create() early in dw_pcie_ep_init()
  PCI: dwc: Use devicetree 'reg[config]' to derive CPU -> ATU addr offset
  PCI: dwc: Add dw_pcie_parent_bus_offset() checking and debug
  PCI: dwc: Add dw_pcie_parent_bus_offset()
  PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion
  PCI: xilinx-cpm: Add cpm_csr register mapping for CPM5_HOST1 variant
  PCI: brcmstb: Make const read-only arrays static
  ...
2025-03-28 19:36:53 -07:00
Linus Torvalds
112e43e9fd Revert "Merge tag 'irq-msi-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"
This reverts commit 36f5f026df, reversing
changes made to 43a7eec035.

Thomas says:
 "I just noticed that for some incomprehensible reason, probably sheer
  incompetemce when trying to utilize b4, I managed to merge an outdated
  _and_ buggy version of that series.

  Can you please revert that merge completely?"

Done.

Requested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-28 11:22:54 -07:00
Thomas Gleixner
3ece3e8e59 PCI/MSI: Handle the NOMASK flag correctly for all PCI/MSI backends
The conversion of the XEN specific global variable pci_msi_ignore_mask to a
MSI domain flag, missed the facts that:

    1) Legacy architectures do not provide a interrupt domain
    2) Parent MSI domains do not necessarily have a domain info attached
   
Both cases result in an unconditional NULL pointer dereference. This was
unfortunatly missed in review and testing revealed it late.

Cure this by using the existing pci_msi_domain_supports() helper, which
handles all possible cases correctly.

Fixes: c3164d2e0d ("PCI/MSI: Convert pci_msi_ignore_mask to per MSI domain flag")
Reported-by: Daniel Gomez <da.gomez@kernel.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Daniel Gomez <da.gomez@kernel.org>
Link: https://lore.kernel.org/all/87iknwyp2o.ffs@tglx
Closes: https://lore.kernel.org/all/qn7fzggcj6qe6r6gdbwcz23pzdz2jx64aldccmsuheabhmjgrt@tawf5nfwuvw7
2025-03-26 15:28:43 +01:00
Linus Torvalds
dce3ab4c57 Merge tag 'for-linus-6.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:

 - cleanup: remove an used function

 - add support for a XenServer specific virtual PCI device

 - fix the handling of a sparse Xen hypervisor symbol table

 - avoid warnings when building the kernel with gcc 15

 - fix use of devices behind a VMD bridge when running as a Xen PV dom0

* tag 'for-linus-6.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  PCI/MSI: Convert pci_msi_ignore_mask to per MSI domain flag
  PCI: vmd: Disable MSI remapping bypass under Xen
  xen/pci: Do not register devices with segments >= 0x10000
  xen/pciback: Remove unused pcistub_get_pci_dev
  xenfs/xensyms: respect hypervisor's "next" indication
  xen/mcelog: Add __nonstring annotations for unterminated strings
  xen: Add support for XenServer 6.1 platform device
2025-03-25 14:33:32 -07:00
Roger Pau Monne
c3164d2e0d PCI/MSI: Convert pci_msi_ignore_mask to per MSI domain flag
Setting pci_msi_ignore_mask inhibits the toggling of the mask bit for both
MSI and MSI-X entries globally, regardless of the IRQ chip they are using.
Only Xen sets the pci_msi_ignore_mask when routing physical interrupts over
event channels, to prevent PCI code from attempting to toggle the maskbit,
as it's Xen that controls the bit.

However, the pci_msi_ignore_mask being global will affect devices that use
MSI interrupts but are not routing those interrupts over event channels
(not using the Xen pIRQ chip).  One example is devices behind a VMD PCI
bridge.  In that scenario the VMD bridge configures MSI(-X) using the
normal IRQ chip (the pIRQ one in the Xen case), and devices behind the
bridge configure the MSI entries using indexes into the VMD bridge MSI
table.  The VMD bridge then demultiplexes such interrupts and delivers to
the destination device(s).  Having pci_msi_ignore_mask set in that scenario
prevents (un)masking of MSI entries for devices behind the VMD bridge.

Move the signaling of no entry masking into the MSI domain flags, as that
allows setting it on a per-domain basis.  Set it for the Xen MSI domain
that uses the pIRQ chip, while leaving it unset for the rest of the
cases.

Remove pci_msi_ignore_mask at once, since it was only used by Xen code, and
with Xen dropping usage the variable is unneeded.

This fixes using devices behind a VMD bridge on Xen PV hardware domains.

Albeit Devices behind a VMD bridge are not known to Xen, that doesn't mean
Linux cannot use them.  By inhibiting the usage of
VMD_FEAT_CAN_BYPASS_MSI_REMAP and the removal of the pci_msi_ignore_mask
bodge devices behind a VMD bridge do work fine when use from a Linux Xen
hardware domain.  That's the whole point of the series.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Message-ID: <20250219092059.90850-4-roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2025-03-21 08:57:57 +01:00
Thomas Gleixner
b9db8df433 PCI/MSI: Provide a sane mechanism for TPH
The PCI/TPH driver fiddles with the MSI-X control word of an active
interrupt completely unserialized against concurrent operations issued
from the interrupt core. It also brings the PCI/MSI-X internal cached
control word out of sync.

Provide a function, which has the required serialization and keeps the
control word cache in sync.

Unfortunately this requires to look up and lock the interrupt descriptor,
which should be only done in the interrupt core code. But confining this
particular oddity in the PCI/MSI core is the lesser of all evil. A
interrupt core implementation would require a larger pile of infrastructure
and indirections for dubious value.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250313130321.822790423@linutronix.de
2025-03-13 18:58:00 +01:00
Thomas Gleixner
1bc7e262a2 PCI/MSI: Switch to MSI descriptor locking to guard()
Convert the code to use the new guard(msi_descs_lock).

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/all/20250313130321.695027112@linutronix.de
2025-03-13 18:58:00 +01:00
Bjorn Helgaas
f4e026f454 PCI: Fix typos
Fix typos and whitespace errors.

Link: https://lore.kernel.org/r/20250307231715.438518-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-08 15:08:45 -06:00
Thomas Gleixner
a60b990798 PCI/MSI: Handle lack of irqdomain gracefully
Alexandre observed a warning emitted from pci_msi_setup_msi_irqs() on a
RISCV platform which does not provide PCI/MSI support:

 WARNING: CPU: 1 PID: 1 at drivers/pci/msi/msi.h:121 pci_msi_setup_msi_irqs+0x2c/0x32
 __pci_enable_msix_range+0x30c/0x596
 pci_msi_setup_msi_irqs+0x2c/0x32
 pci_alloc_irq_vectors_affinity+0xb8/0xe2

RISCV uses hierarchical interrupt domains and correctly does not implement
the legacy fallback. The warning triggers from the legacy fallback stub.

That warning is bogus as the PCI/MSI layer knows whether a PCI/MSI parent
domain is associated with the device or not. There is a check for MSI-X,
which has a legacy assumption. But that legacy fallback assumption is only
valid when legacy support is enabled, but otherwise the check should simply
return -ENOTSUPP.

Loongarch tripped over the same problem and blindly enabled legacy support
without implementing the legacy fallbacks. There are weak implementations
which return an error, so the problem was papered over.

Correct pci_msi_domain_supports() to evaluate the legacy mode and add
the missing supported check into the MSI enable path to complete it.

Fixes: d2a463b297 ("PCI/MSI: Reject multi-MSI early")
Reported-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/87ed2a8ow5.ffs@tglx
2024-12-16 10:59:47 +01:00
Linus Torvalds
66ebbdfdeb Merge tag 'irq-msi-2024-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull MSI interrupt updates from Thomas Gleixner:
 "Switch ARM/ARM64 over to the modern per device MSI domains.

  This simplifies the handling of platform MSI and wire to MSI
  controllers and removes about 500 lines of legacy code.

  Aside of that it paves the way for ARM/ARM64 to utilize the dynamic
  allocation of PCI/MSI interrupts and to support the upcoming non
  standard IMS (Interrupt Message Store) mechanism on PCIe devices"

* tag 'irq-msi-2024-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  irqchip/gic-v3-its: Correctly fish out the DID for platform MSI
  irqchip/gic-v3-its: Correctly honor the RID remapping
  genirq/msi: Move msi_device_data to core
  genirq/msi: Remove platform MSI leftovers
  irqchip/irq-mvebu-icu: Remove platform MSI leftovers
  irqchip/irq-mvebu-sei: Switch to MSI parent
  irqchip/mvebu-odmi: Switch to parent MSI
  irqchip/mvebu-gicp: Switch to MSI parent
  irqchip/irq-mvebu-icu: Prepare for real per device MSI
  irqchip/imx-mu-msi: Switch to MSI parent
  irqchip/gic-v2m: Switch to device MSI
  irqchip/gic_v3_mbi: Switch over to parent domain
  genirq/msi: Remove platform_msi_create_device_domain()
  irqchip/mbigen: Remove platform_msi_create_device_domain() fallback
  irqchip/gic-v3-its: Switch platform MSI to MSI parent
  irqchip/irq-msi-lib: Prepare for DOMAIN_BUS_WIRED_TO_MSI
  irqchip/mbigen: Prepare for real per device MSI
  irqchip/irq-msi-lib: Prepare for DEVICE MSI to replace platform MSI
  irqchip/gic-v3-its: Provide MSI parent for PCI/MSI[-X]
  irqchip/irq-msi-lib: Prepare for PCI MSI/MSIX
  ...
2024-07-22 14:02:19 -07:00
Shivamurthy Shastri
7d189c7710 PCI/MSI: Provide MSI_FLAG_PCI_MSI_MASK_PARENT
Most ARM(64) PCI/MSI domains mask and unmask in the parent domain after or
before the PCI mask/unmask operation takes place. So there are more than a
dozen of the same wrapper implementation all over the place.

Don't make the same mistake with the new per device PCI/MSI domains and
provide a new MSI feature flag, which lets the domain implementation
enable this sequence in the PCI/MSI code.

Signed-off-by: Shivamurthy Shastri <shivamurthy.shastri@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/87ed8j34pj.ffs@tglx
2024-07-18 20:31:19 +02:00
Mostafa Saleh
9eee533065 PCI/MSI: Fix UAF in msi_capability_init
KFENCE reports the following UAF:

 BUG: KFENCE: use-after-free read in __pci_enable_msi_range+0x2c0/0x488

 Use-after-free read at 0x0000000024629571 (in kfence-#12):
  __pci_enable_msi_range+0x2c0/0x488
  pci_alloc_irq_vectors_affinity+0xec/0x14c
  pci_alloc_irq_vectors+0x18/0x28

 kfence-#12: 0x0000000008614900-0x00000000e06c228d, size=104, cache=kmalloc-128

 allocated by task 81 on cpu 7 at 10.808142s:
  __kmem_cache_alloc_node+0x1f0/0x2bc
  kmalloc_trace+0x44/0x138
  msi_alloc_desc+0x3c/0x9c
  msi_domain_insert_msi_desc+0x30/0x78
  msi_setup_msi_desc+0x13c/0x184
  __pci_enable_msi_range+0x258/0x488
  pci_alloc_irq_vectors_affinity+0xec/0x14c
  pci_alloc_irq_vectors+0x18/0x28

 freed by task 81 on cpu 7 at 10.811436s:
  msi_domain_free_descs+0xd4/0x10c
  msi_domain_free_locked.part.0+0xc0/0x1d8
  msi_domain_alloc_irqs_all_locked+0xb4/0xbc
  pci_msi_setup_msi_irqs+0x30/0x4c
  __pci_enable_msi_range+0x2a8/0x488
  pci_alloc_irq_vectors_affinity+0xec/0x14c
  pci_alloc_irq_vectors+0x18/0x28

Descriptor allocation done in:
__pci_enable_msi_range
    msi_capability_init
        msi_setup_msi_desc
            msi_insert_msi_desc
                msi_domain_insert_msi_desc
                    msi_alloc_desc
                        ...

Freed in case of failure in __msi_domain_alloc_locked()
__pci_enable_msi_range
    msi_capability_init
        pci_msi_setup_msi_irqs
            msi_domain_alloc_irqs_all_locked
                msi_domain_alloc_locked
                    __msi_domain_alloc_locked => fails
                    msi_domain_free_locked
                        ...

That failure propagates back to pci_msi_setup_msi_irqs() in
msi_capability_init() which accesses the descriptor for unmasking in the
error exit path.

Cure it by copying the descriptor and using the copy for the error exit path
unmask operation.

[ tglx: Massaged change log ]

Fixes: bf6e054e0e ("genirq/msi: Provide msi_device_populate/destroy_sysfs()")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Heelgas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240624203729.1094506-1-smostafa@google.com
2024-06-24 23:33:38 +02:00
Bjorn Helgaas
375a99fd86 Merge branch 'pci/ims-removal'
- Remove unused Interrupt Message Store (IMS) support (Bjorn Helgaas)

* pci/ims-removal:
  Revert "genirq/msi: Provide constants for PCI/IMS support"
  Revert "x86/apic/msi: Enable PCI/IMS"
  Revert "iommu/vt-d: Enable PCI/IMS"
  Revert "iommu/amd: Enable PCI/IMS"
  Revert "PCI/MSI: Provide IMS (Interrupt Message Store) support"
  Revert "PCI/MSI: Provide pci_ims_alloc/free_irq()"
  Revert "PCI/MSI: Provide stubs for IMS functions"
2024-05-16 18:14:14 -05:00
Bjorn Helgaas
ed11a28cb7 Merge branch 'pci/msi'
- Update coding style to "mainline is normal path, errors are the
  exceptions" (Andy Shevchenko)

* pci/msi:
  PCI/MSI: Make error path handling follow the standard pattern
2024-05-16 18:14:11 -05:00
Bjorn Helgaas
b966b11028 Revert "PCI/MSI: Provide IMS (Interrupt Message Store) support"
This reverts commit 0194425af0.

IMS (Interrupt Message Store) support appeared in v6.2, but there are no
users yet.

Remove it for now.  We can add it back when a user comes along.  If this is
re-added later, the relevant part of 41efa43124 ("PCI/MSI: Provide stubs
for IMS functions") should be squashed into it.

Link: https://lore.kernel.org/r/20240410221307.2162676-4-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2024-05-15 17:01:54 -05:00
Bjorn Helgaas
1794808fb1 Revert "PCI/MSI: Provide pci_ims_alloc/free_irq()"
This reverts commit c9e5bea273.

IMS (Interrupt Message Store) support appeared in v6.2, but there are no
users yet.

Remove it for now.  We can add it back when a user comes along.  If this is
re-added later, the relevant part of 41efa43124 ("PCI/MSI: Provide stubs
for IMS functions") should be squashed into it.

Link: https://lore.kernel.org/r/20240410221307.2162676-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2024-05-15 17:01:51 -05:00
Andy Shevchenko
0ba5cd94bb PCI/MSI: Make error path handling follow the standard pattern
Make error path handling follow the standard pattern, i.e.  checking for
errors first. This makes code much easier to read and understand despite
being a bit longer.

Link: https://lore.kernel.org/r/20240426144039.557907-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-04-26 17:04:20 -05:00