772 Commits

Author SHA1 Message Date
Linus Torvalds
249872f53d Merge tag 'tsm-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm
Pull PCIe Link Encryption and Device Authentication from Dan Williams:
 "New PCI infrastructure and one architecture implementation for PCIe
  link encryption establishment via platform firmware services.

  This work is the result of multiple vendors coming to consensus on
  some core infrastructure (thanks Alexey, Yilun, and Aneesh!), and
  three vendor implementations, although only one is included in this
  pull. The PCI core changes have an ack from Bjorn, the crypto/ccp/
  changes have an ack from Tom, and the iommu/amd/ changes have an ack
  from Joerg.

  PCIe link encryption is made possible by the soup of acronyms
  mentioned in the shortlog below. Link Integrity and Data Encryption
  (IDE) is a protocol for installing keys in the transmitter and
  receiver at each end of a link. That protocol is transported over Data
  Object Exchange (DOE) mailboxes using PCI configuration requests.

  The aspect that makes this a "platform firmware service" is that the
  key provisioning and protocol is coordinated through a Trusted
  Execution Envrionment (TEE) Security Manager (TSM). That is either
  firmware running in a coprocessor (AMD SEV-TIO), or quasi-hypervisor
  software (Intel TDX Connect / ARM CCA) running in a protected CPU
  mode.

  Now, the only reason to ask a TSM to run this protocol and install the
  keys rather than have a Linux driver do the same is so that later, a
  confidential VM can ask the TSM directly "can you certify this
  device?".

  That precludes host Linux from provisioning its own keys, because host
  Linux is outside the trust domain for the VM. It also turns out that
  all architectures, save for one, do not publish a mechanism for an OS
  to establish keys in the root port. So "TSM-established link
  encryption" is the only cross-architecture path for this capability
  for the foreseeable future.

  This unblocks the other arch implementations to follow in v6.20/v7.0,
  once they clear some other dependencies, and it unblocks the next
  phase of work to implement the end-to-end flow of confidential device
  assignment. The PCIe specification calls this end-to-end flow Trusted
  Execution Environment (TEE) Device Interface Security Protocol
  (TDISP).

  In the meantime, Linux gets a link encryption facility which has
  practical benefits along the same lines as memory encryption. It
  authenticates devices via certificates and may protect against
  interposer attacks trying to capture clear-text PCIe traffic.

  Summary:

   - Introduce the PCI/TSM core for the coordination of device
     authentication, link encryption and establishment (IDE), and later
     management of the device security operational states (TDISP).
     Notify the new TSM core layer of PCI device arrival and departure

   - Add a low level TSM driver for the link encryption establishment
     capabilities of the AMD SEV-TIO architecture

   - Add a library of helpers TSM drivers to use for IDE establishment
     and the DOE transport

   - Add skeleton support for 'bind' and 'guest_request' operations in
     support of TDISP"

* tag 'tsm-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm: (23 commits)
  crypto/ccp: Fix CONFIG_PCI=n build
  virt: Fix Kconfig warning when selecting TSM without VIRT_DRIVERS
  crypto/ccp: Implement SEV-TIO PCIe IDE (phase1)
  iommu/amd: Report SEV-TIO support
  psp-sev: Assign numbers to all status codes and add new
  ccp: Make snp_reclaim_pages and __sev_do_cmd_locked public
  PCI/TSM: Add 'dsm' and 'bound' attributes for dependent functions
  PCI/TSM: Add pci_tsm_guest_req() for managing TDIs
  PCI/TSM: Add pci_tsm_bind() helper for instantiating TDIs
  PCI/IDE: Initialize an ID for all IDE streams
  PCI/IDE: Add Address Association Register setup for downstream MMIO
  resource: Introduce resource_assigned() for discerning active resources
  PCI/TSM: Drop stub for pci_tsm_doe_transfer()
  drivers/virt: Drop VIRT_DRIVERS build dependency
  PCI/TSM: Report active IDE streams
  PCI/IDE: Report available IDE streams
  PCI/IDE: Add IDE establishment helpers
  PCI: Establish document for PCI host bridge sysfs attributes
  PCI: Add PCIe Device 3 Extended Capability enumeration
  PCI/TSM: Establish Secure Sessions and Link Encryption
  ...
2025-12-06 10:15:41 -08:00
Linus Torvalds
43dfc13ca9 Merge tag 'pci-v6.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI updates from Bjorn Helgaas:
 "Enumeration:

   - Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms (Dan
     Williams)

   - Switch vmd from custom domain number allocator to the common
     allocator to prevent a potential race with new non-VMD buses (Dan
     Williams)

   - Enable Precision Time Measurement (PTM) only if device advertises
     support for a relevant role, to prevent invalid PTM Requests that
     cause ACS violations that are reported as AER Uncorrectable
     Non-Fatal errors (Mika Westerberg)

  Resource management:

   - Prevent resource tree corruption when BAR resize fails (Ilpo
     Järvinen)

   - Restore BARs to the original size if a BAR resize fails (Ilpo
     Järvinen)

   - Remove BAR release from BAR resize attempts by the xe, i915, and
     amdgpu drivers so the PCI core can restore BARs if the resize fails
     (Ilpo Järvinen)

   - Move Resizable BAR code to rebar.c (Ilpo Järvinen)

   - Add pci_rebar_size_supported() and use it in i915 and xe (Ilpo
     Järvinen)

   - Add pci_rebar_get_max_size() and use it in xe and amdgpu (Ilpo
     Järvinen)

  Power management and error handling:

   - For drivers using PCI legacy suspend, save config state at suspend
     so that state (not any earlier state from enumeration, probe, or
     error recovery) will be restored when resuming (Lukas Wunner)

   - For devices with no driver or a driver that lacks power management,
     save config state at hibernate so that state (not any earlier state
     from enumeration, probe, or error recovery) will be restored when
     resuming (Lukas Wunner)

   - Save device config space on device addition, before driver binding,
     so error recovery works more reliably (Lukas Wunner)

   - Drop pci_save_state() from several drivers that no longer need it
     since the PCI core always does it and pci_restore_state() no longer
     invalidates the saved state (Lukas Wunner)

   - Document use of pci_save_state() by drivers to capture the state
     they want restored during error recovery (Lukas Wunner)

  Power control:

   - Add a struct pci_ops.assert_perst() function pointer to
     assert/deassert PCIe PERST# and implement it for the qcom driver
     (Krishna Chaitanya Chundru)

   - Add DT binding and pwrctrl driver for the Toshiba TC9563 PCIe
     switch, which must be held in reset after poweron so the pwrctrl
     driver can configure the switch via I2C before bringing up the
     links (Krishna Chaitanya Chundru)

  Endpoint framework:

   - Convert the endpoint doorbell test to use a threaded IRQ to fix a
     'sleeping while atomic' issue (Bhanu Seshu Kumar Valluri)

   - Add endpoint VNTB MSI doorbell support to reduce latency between
     host and endpoint (Frank Li)

  New native PCIe controller drivers:

   - Add CIX Sky1 host controller DT binding and driver (Hans Zhang)

   - Add NXP S32G host controller DT binding and driver (Vincent
     Guittot)

   - Add Renesas RZ/G3S host controller DT binding and driver (Claudiu
     Beznea)

   - Add SpacemiT K1 host controller DT binding and driver (Alex Elder)

  Amlogic Meson PCIe controller driver:

   - Update DT binding to name DBI region 'dbi', not 'elbi', and update
     driver to support both (Manivannan Sadhasivam)

  Apple PCIe controller driver:

   - Move struct pci_host_bridge allocation from pci_host_common_init()
     to callers, which significantly simplifies pcie-apple (Marc
     Zyngier)

  Broadcom STB PCIe controller driver:

   - Disable advertising ASPM L0s support correctly (Jim Quinlan)

   - Add a panic/die handler to print diagnostic info in case PCIe
     caused an unrecoverable abort (Jim Quinlan)

  Cadence PCIe controller driver:

   - Add module support for Cadence platform host and endpoint
     controller driver (Manikandan K Pillai)

   - Split headers into 'legacy' (LGA) and 'high perf' (HPA) to prepare
     for new CIX Sky1 driver (Manikandan K Pillai)

  MediaTek PCIe controller driver:

   - Convert DT binding to YAML schema (Christian Marangi)

   - Add Airoha AN7583 DT compatible and driver support (Christian
     Marangi)

  Qualcomm PCIe controller driver:

   - Add Qualcomm Kaanapali to SM8550 DT binding (Qiang Yu)

   - Add required 'power-domains' and 'resets' to qcom sa8775p, sc7280,
     sc8280xp, sm8150, sm8250, sm8350, sm8450, sm8550, x1e80100 DT
     schemas (Krzysztof Kozlowski)

   - Look up OPP using both frequency and data rate (not just frequency)
     so RPMh votes can account for both (Krishna Chaitanya Chundru)

  Rockchip DesignWare PCIe controller driver:

   - Add Rockchip RK3528 compatible strings in DT binding (Yao Zi)

  STMicroelectronics STM32MP25 PCIe controller driver:

   - Fix a race between link training and endpoint register
     initialization (Christian Bruel)

   - Align endpoint allocations to match the ATU requirements (Christian
     Bruel)

  Synopsys DesignWare PCIe controller driver:

   - Clear L1 PM Substate Capability 'Supported' bits unless glue driver
     says it's supported, which prevents users from enabling non-working
     L1SS. Currently only qcom and tegra194 support L1SS (Bjorn Helgaas)

   - Remove now-superfluous L1SS disable code from tegra194 (Bjorn
     Helgaas)

   - Configure L1SS support in dw-rockchip when DT says
     'supports-clkreq' (Shawn Lin)

  TI Keystone PCIe controller driver:

   - Fail the probe instead of silently succeeding if ks_pcie_of_data
     didn't specify Root Complex or Endpoint mode (Siddharth Vadapalli)

   - Make keystone buildable as a loadable module, except on ARM32 where
     hook_fault_code() is __init (Siddharth Vadapalli)"

* tag 'pci-v6.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (100 commits)
  MAINTAINERS: Add Manivannan Sadhasivam as PCI/pwrctrl maintainer
  MAINTAINERS: Add CIX Sky1 PCIe controller driver maintainer
  PCI: sky1: Add PCIe host support for CIX Sky1
  dt-bindings: PCI: Add CIX Sky1 PCIe Root Complex bindings
  PCI: cadence: Add support for High Perf Architecture (HPA) controller
  MAINTAINERS: Add NXP S32G PCIe controller driver maintainer
  PCI: s32g: Add NXP S32G PCIe controller driver (RC)
  PCI: dwc: Add register and bitfield definitions
  dt-bindings: PCI: s32g: Add NXP S32G PCIe controller
  PCI: Add Renesas RZ/G3S host controller driver
  PCI: host-generic: Move bridge allocation outside of pci_host_common_init()
  dt-bindings: PCI: Add Renesas RZ/G3S PCIe controller binding
  PCI: Validate pci_rebar_size_supported() input
  Documentation: PCI: Amend error recovery doc with pci_save_state() rules
  treewide: Drop pci_save_state() after pci_restore_state()
  PCI/ERR: Ensure error recoverability at all times
  PCI/PM: Stop needlessly clearing state_saved on enumeration and thaw
  PCI/PM: Reinstate clearing state_saved in legacy and !PM codepaths
  PCI: dw-rockchip: Configure L1SS support
  PCI: tegra194: Remove unnecessary L1SS disable code
  ...
2025-12-04 17:29:41 -08:00
Bjorn Helgaas
cd6b7c82b6 Merge branch 'pci/misc'
- Use max() instead of max_t() to ease static analysis (David Laight)

- Add Manivannan Sadhasivam as PCI/pwrctrl maintainer (Bartosz Golaszewski)

* pci/misc:
  MAINTAINERS: Add Manivannan Sadhasivam as PCI/pwrctrl maintainer
  PCI: Use max() instead of max_t() to ease static analysis
2025-12-03 14:18:46 -06:00
Bjorn Helgaas
5c5b8751e5 Merge branch 'pci/err'
- For drivers using PCI legacy suspend, save config state at suspend so
  that state (not any earlier state from enumeration, probe, or error
  recovery) will be restored when resuming (Lukas Wunner)

- For devices with no driver or a driver that lacks PM, save config state
  at hibernate so that state (not any earlier state from enumeration,
  probe, or error recovery) will be restored when resuming (Lukas Wunner)

- Save device config space on device addition, before driver binding, so
  error recovery works more reliably (Lukas Wunner)

- Drop pci_save_state() from several drivers that no longer need it since
  the PCI core always does it and pci_restore_state() no longer invalidates
  the saved state (Lukas Wunner)

- Document use of pci_save_state() by drivers to capture the state they
  want restored during error recovery (Lukas Wunner)

* pci/err:
  Documentation: PCI: Amend error recovery doc with pci_save_state() rules
  treewide: Drop pci_save_state() after pci_restore_state()
  PCI/ERR: Ensure error recoverability at all times
  PCI/PM: Stop needlessly clearing state_saved on enumeration and thaw
  PCI/PM: Reinstate clearing state_saved in legacy and !PM codepaths
2025-12-03 14:18:31 -06:00
Lukas Wunner
be9edde43d PCI/PM: Stop needlessly clearing state_saved on enumeration and thaw
The state_saved flag tells the PCI core whether a driver assumes
responsibility to save Config Space and put the device into a low power
state on suspend.

The flag is currently initialized to false on enumeration, even though it
already is false (because struct pci_dev is zeroed by kzalloc()) and even
though it is set to false before commencing the suspend sequence (the only
code path where it's relevant).

The flag is also set to false in pci_pm_thaw(), i.e. on resume, when it's
no longer relevant.

Drop these two superfluous flag assignments for simplicity.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/fd167945bd7852e1ca08cd4b202130659eea2c2f.1763483367.git.lukas@wunner.de
2025-11-24 16:58:20 -06:00
David Laight
e2378e6115 PCI: Use max() instead of max_t() to ease static analysis
In this code:

  used_buses = max_t(unsigned int, available_buses,
                     pci_hotplug_bus_size - 1);

max_t() casts the 'unsigned long' pci_hotplug_bus_size (either 32 or 64
bits) to 'unsigned int' (32 bits) result type, so there's a potential of
discarding significant bits.

Instead, use max(a, b), which casts 'unsigned int' to 'unsigned long' and
cannot discard significant bits.

In this case, pci_hotplug_bus_size is constrained to <= 0xff by pci_setup()
so this doesn't fix a bug, but it makes static analysis easier.

Signed-off-by: David Laight <david.laight.linux@gmail.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20251119224140.8616-26-david.laight.linux@gmail.com
2025-11-24 14:37:13 -06:00
Linus Torvalds
7a0892d283 Merge tag 'pci-v6.18-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:

 - Cache the ASPM L0s/L1 Supported bits early so quirks can override
   them if necessary (Bjorn Helgaas)

 - Add quirks for PA Semi and Freescale Root Ports and a HiSilicon Wi-Fi
   device that are reported to have broken L0s and L1 (Shawn Lin, Bjorn
   Helgaas)

* tag 'pci-v6.18-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI/ASPM: Avoid L0s and L1 on Hi1105 [19e5:1105] Wi-Fi
  PCI/ASPM: Avoid L0s and L1 on PA Semi [1959:a002] Root Ports
  PCI/ASPM: Avoid L0s and L1 on Freescale [1957:0451] Root Ports
  PCI/ASPM: Convert quirks to override advertised link states
  PCI/ASPM: Add pcie_aspm_remove_cap() to override advertised link states
  PCI/ASPM: Cache L0s/L1 Supported so advertised link states can be overridden
2025-11-14 15:45:31 -08:00
Bjorn Helgaas
4495bffd86 PCI/ASPM: Cache L0s/L1 Supported so advertised link states can be overridden
Defective devices sometimes advertise support for ASPM L0s or L1 states
even if they don't work correctly.

Cache the L0s Supported and L1 Supported bits early in enumeration so
HEADER quirks can override the ASPM states advertised in Link Capabilities
before pcie_aspm_cap_init() enables ASPM.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Link: https://patch.msgid.link/20251110222929.2140564-2-helgaas@kernel.org
2025-11-12 18:47:16 -06:00
Dan Williams
9ddaf9c3ed PCI/IDE: Report available IDE streams
The limited number of link-encryption (IDE) streams that a given set of
host bridges supports is a platform specific detail. Provide
pci_ide_init_nr_streams() as a generic facility for either platform TSM
drivers, or PCI core native IDE, to report the number available streams.
After invoking pci_ide_init_nr_streams() an "available_secure_streams"
attribute appears in PCI host bridge sysfs to convey that count.

Introduce a device-type, @pci_host_bridge_type, now that both a release
method and sysfs attribute groups are being specified for all 'struct
pci_host_bridge' instances.

Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Samuel Ortiz <sameo@rivosinc.com>
Cc: Alexey Kardashevskiy <aik@amd.com>
Cc: Xu Yilun <yilun.xu@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251031212902.2256310-9-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-11-03 19:27:41 -08:00
Dan Williams
1e4d2ff3ae PCI/IDE: Add IDE establishment helpers
There are two components to establishing an encrypted link, provisioning
the stream in Partner Port config-space, and programming the keys into
the link layer via IDE_KM (IDE Key Management). This new library,
drivers/pci/ide.c, enables the former. IDE_KM, via a TSM low-level
driver, is saved for later.

With the platform TSM implementations of SEV-TIO and TDX Connect in mind
this library abstracts small differences in those implementations. For
example, TDX Connect handles Root Port register setup while SEV-TIO
expects System Software to update the Root Port registers. This is the
rationale for fine-grained 'setup' + 'enable' verbs.

The other design detail for TSM-coordinated IDE establishment is that
the TSM may manage allocation of Stream IDs, this is why the Stream ID
value is passed in to pci_ide_stream_setup().

The flow is:

pci_ide_stream_alloc():
    Allocate a Selective IDE Stream Register Block in each Partner Port
    (Endpoint + Root Port), and reserve a host bridge / platform stream
    slot. Gather Partner Port specific stream settings like Requester ID.

pci_ide_stream_register():
    Publish the stream in sysfs after allocating a Stream ID. In the TSM
    case the TSM allocates the Stream ID for the Partner Port pair.

pci_ide_stream_setup():
    Program the stream settings to a Partner Port. Caller is responsible
    for optionally calling this for the Root Port as well if the TSM
    implementation requires it.

pci_ide_stream_enable():
    Enable the stream after IDE_KM.

In support of system administrators auditing where platform, Root Port,
and Endpoint IDE stream resources are being spent, the allocated stream
is reflected as a symlink from the host bridge to the endpoint with the
name:

    stream%d.%d.%d

Where the tuple of integers reflects the allocated platform, Root Port,
and Endpoint stream index (Selective IDE Stream Register Block) values.

Thanks to Wu Hao for a draft implementation of this infrastructure.

Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Samuel Ortiz <sameo@rivosinc.com>
Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Co-developed-by: Xu Yilun <yilun.xu@linux.intel.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251031212902.2256310-8-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-11-03 19:27:41 -08:00
Dan Williams
c0c1262fbf PCI: Add PCIe Device 3 Extended Capability enumeration
PCIe r7.0 Section 7.7.9 Device 3 Extended Capability Structure, defines the
canonical location for determining the Flit Mode of a device. This status
is a dependency for PCIe IDE enabling. Add a new fm_enabled flag to 'struct
pci_dev'.

Cc: Lukas Wunner <lukas@wunner.de>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Samuel Ortiz <sameo@rivosinc.com>
Cc: Alexey Kardashevskiy <aik@amd.com>
Cc: Xu Yilun <yilun.xu@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251031212902.2256310-6-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-11-03 19:27:41 -08:00
Dan Williams
3225f52cde PCI/TSM: Establish Secure Sessions and Link Encryption
The PCIe 7.0 specification, section 11, defines the Trusted Execution
Environment (TEE) Device Interface Security Protocol (TDISP).  This
protocol definition builds upon Component Measurement and Authentication
(CMA), and link Integrity and Data Encryption (IDE). It adds support for
assigning devices (PCI physical or virtual function) to a confidential VM
such that the assigned device is enabled to access guest private memory
protected by technologies like Intel TDX, AMD SEV-SNP, RISCV COVE, or ARM
CCA.

The "TSM" (TEE Security Manager) is a concept in the TDISP specification
of an agent that mediates between a "DSM" (Device Security Manager) and
system software in both a VMM and a confidential VM. A VMM uses TSM ABIs
to setup link security and assign devices. A confidential VM uses TSM
ABIs to transition an assigned device into the TDISP "RUN" state and
validate its configuration. From a Linux perspective the TSM abstracts
many of the details of TDISP, IDE, and CMA. Some of those details leak
through at times, but for the most part TDISP is an internal
implementation detail of the TSM.

CONFIG_PCI_TSM adds an "authenticated" attribute and "tsm/" subdirectory
to pci-sysfs. Consider that the TSM driver may itself be a PCI driver.
Userspace can watch for the arrival of a "TSM" device,
/sys/class/tsm/tsm0/uevent KOBJ_CHANGE, to know when the PCI core has
initialized TSM services.

The operations that can be executed against a PCI device are split into
two mutually exclusive operation sets, "Link" and "Security" (struct
pci_tsm_{link,security}_ops). The "Link" operations manage physical link
security properties and communication with the device's Device Security
Manager firmware. These are the host side operations in TDISP. The
"Security" operations coordinate the security state of the assigned
virtual device (TDI). These are the guest side operations in TDISP.

Only "link" (Secure Session and physical Link Encryption) operations are
defined at this stage. There are placeholders for the device security
(Trusted Computing Base entry / exit) operations.

The locking allows for multiple devices to be executing commands
simultaneously, one outstanding command per-device and an rwsem
synchronizes the implementation relative to TSM registration/unregistration
events.

Thanks to Wu Hao for his work on an early draft of this support.

Cc: Lukas Wunner <lukas@wunner.de>
Cc: Samuel Ortiz <sameo@rivosinc.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
Co-developed-by: Xu Yilun <yilun.xu@linux.intel.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
Link: https://patch.msgid.link/20251031212902.2256310-5-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-11-03 19:27:41 -08:00
Dan Williams
f16469ee73 PCI/IDE: Enumerate Selective Stream IDE capabilities
Link encryption is a new PCIe feature enumerated by "PCIe r7.0 section
7.9.26 IDE Extended Capability".

It is both a standalone port + endpoint capability, and a building block
for the security protocol defined by "PCIe r7.0 section 11 TEE Device
Interface Security Protocol (TDISP)". That protocol coordinates device
security setup between a platform TSM (TEE Security Manager) and a
device DSM (Device Security Manager). While the platform TSM can
allocate resources like Stream ID and manage keys, it still requires
system software to manage the IDE capability register block.

Add register definitions and basic enumeration in preparation for
Selective IDE Stream establishment. A follow on change selects the new
CONFIG_PCI_IDE symbol. Note that while the IDE specification defines
both a point-to-point "Link Stream" and a Root Port to endpoint
"Selective Stream", only "Selective Stream" is considered for Linux as
that is the predominant mode expected by Trusted Execution Environment
Security Managers (TSMs), and it is the security model that limits the
number of PCI components within the TCB in a PCIe topology with
switches.

Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Co-developed-by: Xu Yilun <yilun.xu@linux.intel.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Link: https://patch.msgid.link/20251031212902.2256310-3-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-11-03 19:27:40 -08:00
Dan Williams
bcce8c74f1 PCI: Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms
The ability to emulate a host bridge is useful not only for hardware PCI
controllers like CONFIG_VMD, or virtual PCI controllers like
CONFIG_PCI_HYPERV, but also for test and development scenarios like
CONFIG_SAMPLES_DEVSEC [1].

One stumbling block for defining CONFIG_SAMPLES_DEVSEC, a sample
implementation of a platform TSM for PCI Device Security, is the need to
accommodate PCI_DOMAINS_GENERIC architectures alongside x86 [2].

In support of supplementing the existing CONFIG_PCI_BRIDGE_EMUL
infrastructure for host bridges:

* Introduce pci_bus_find_emul_domain_nr() as a common way to find a free
  PCI domain number whether that is to reuse the existing dynamic
  allocation code in the !ACPI case, or to assign an unused domain above
  the last ACPI segment.

* Convert pci-hyperv to the new allocator so that the PCI core can
  unconditionally assume that bridge->domain_nr != PCI_DOMAIN_NR_NOT_SET
  is the dynamically allocated case.

A follow on patch can also convert vmd to the new scheme. Currently vmd is
limited to CONFIG_PCI_DOMAINS_GENERIC=n (x86) so, unlike pci-hyperv, it
does not immediately conflict with this new pci_bus_find_emul_domain_nr()
mechanism.

Link: http://lore.kernel.org/174107249038.1288555.12362100502109498455.stgit@dwillia2-xfh.jf.intel.com [1]
Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Closes: http://lore.kernel.org/20250311144601.145736-3-suzuki.poulose@arm.com [2]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: Rob Herring <robh@kernel.org>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Link: https://patch.msgid.link/20251024224622.1470555-2-dan.j.williams@intel.com
2025-10-28 12:36:34 -05:00
Ilpo Järvinen
469276c06a PCI: Revert early bridge resource set up
The commit a43ac325c7 ("PCI: Set up bridge resources earlier") moved
bridge window resources set up earlier than before. The change was
necessary to support another change that got pulled on the last minute
due to breaking s390 and other systems.

The presence of valid bridge window resources earlier than before allows
pci_assign_unassigned_root_bus_resources() call from pci_host_probe()
assign the bridge windows. Some host bridges, however, have to wait first
for the link up event before they can enumerate successfully (see e.g.
qcom_pcie_global_irq_thread()) and thus the bus has not been enumerated yet
while calling pci_host_probe().

Calling pci_assign_unassigned_root_bus_resources() without results from
enumeration can result in sizing bridge windows with too small sizes which
cannot be later corrected after the enumeration has completed because
bridge windows have become pinned in place by the other resources.

Interestingly, it seems pci_read_bridge_bases() is not called at all in the
problematic case and the bridge window resource type setup is done by
pci_bridge_check_ranges() and sizing by the usual resource fitting logic.

The root problem behind all this looks pretty generic. If resource fitting
is called too early, the hotplug reservation and old size lower bounding
cause the bridge windows to be assigned without children but with non-zero
size, which leads to these pinning problems. As such, this can likely be
solved on the general level but the solution does not look trivial.

As the commit a43ac325c7 ("PCI: Set up bridge resources earlier") was
prequisite for other change that did not end up into kernel yet, revert it
to resolve the resource assignment failures and give time to code and test
a generic solution.

Fixes: a43ac325c7 ("PCI: Set up bridge resources earlier")
Reported-by: Val Packett <val@packett.cool>
Link: https://lore.kernel.org/r/017ff8df-511c-4da8-b3cf-edf2cb7f1a67@packett.cool
Reported-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/df266709-a9b3-4fd8-af3a-c22eb3c9523a@roeck-us.net
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20251014163602.17138-1-ilpo.jarvinen@linux.intel.com
2025-10-14 15:36:07 -05:00
Bjorn Helgaas
3d56c86318 Merge branch 'pci/virtualization'
- Add rescan/remove locking when enabling/disabling SR-IOV, which solves
  list corruption on s390, where disabling SR-IOV also generates hotplug
  events (Niklas Schnelle)

- Add lockdep assertion in pci_stop_and_remove_bus_device() to catch
  device removal without appropriate locking (Niklas Schnelle)

* pci/virtualization:
  PCI: Add lockdep assertion in pci_stop_and_remove_bus_device()
  PCI/IOV: Add PCI rescan-remove locking when enabling/disabling SR-IOV
2025-10-03 12:13:12 -05:00
Bjorn Helgaas
fead6a0b15 Merge branch 'pci/resource'
- Ensure relaxed tail alignment does not increase min_align when computing
  bridge window size, to fix a regression (Ilpo Järvinen)

- Fix bridge window size computation to fix a regression for devices with
  undefined PCI class, e.g., Samsung [144d:a5a5] (Ilpo Järvinen)

- Fix error handling during resource resize to fix a regression in amdgpu
  (Ilpo Järvinen)

- Align m68k pcibios_enable_device() with other arches (Ilpo Järvinen)

- Remove several sparc pcibios_enable_device() implementations that don't
  do anything beyond what pci_enable_resources() does (Ilpo Järvinen)

- Remove mips pcibios_enable_resources() and use pci_enable_resources()
  instead (Ilpo Järvinen)

- Refactor and simplify find_bus_resource_of_type() (Ilpo Järvinen)

- Claim bridge windows before setting them up (Ilpo Järvinen)

- Disable non-claimed bridge windows so the kernel's view matches the
  hardware configuration (Ilpo Järvinen)

- Use pci_release_resource() instead of release_resource() to reduce code
  duplication and increase consistency (Ilpo Järvinen)

- Enable bridges even if bridge window assignment fails (Ilpo Järvinen)

- Preserve bridge window resource type flags when assignment fails because
  we may need it later (Ilpo Järvinen)

- Add bridge window selection functions to make the selection consistent
  across the several places that do this (Ilpo Järvinen)

- Warn if bridge window cannot be released when resizing BAR (Ilpo
  Järvinen)

- Set up bridge resources before enumerating children so we can check
  whether child resources are inside bridge windows (Ilpo Järvinen)

* pci/resource:
  PCI: Set up bridge resources earlier
  PCI: Don't print stale information about resource
  PCI: Alter misleading recursion to pci_bus_release_bridge_resources()
  PCI: Pass bridge window to pci_bus_release_bridge_resources()
  PCI: Add pci_setup_one_bridge_window()
  PCI: Refactor remove_dev_resources() to use pbus_select_window()
  PCI: Refactor distributing available memory to use loops
  PCI: Use pbus_select_window_for_type() during mem window sizing
  PCI: Use pbus_select_window() in space available checker
  PCI: Rename resource variable from r to res
  PCI: Use pbus_select_window_for_type() during IO window sizing
  PCI: Use pbus_select_window() during BAR resize
  PCI: Warn if bridge window cannot be released when resizing BAR
  PCI: Fix finding bridge window in pci_reassign_bridge_resources()
  PCI: Add bridge window selection functions
  PCI: Add defines for bridge window indexing
  PCI: Preserve bridge window resource type flags
  PCI: Enable bridge even if bridge window fails to assign
  PCI: Use pci_release_resource() instead of release_resource()
  PCI: Disable non-claimed bridge window
  PCI: Always claim bridge window before its setup
  PCI: Refactor find_bus_resource_of_type() logic checks
  PCI: Move find_bus_resource_of_type() earlier
  MIPS: PCI: Use pci_enable_resources()
  sparc/PCI: Remove pcibios_enable_device() as they do nothing extra
  m68k/PCI: Use pci_enable_resources() in pcibios_enable_device()
  PCI: Fix failure detection during resource resize
  PCI: Fix pdev_resources_assignable() disparity
  PCI: Ensure relaxed tail alignment does not increase min_align
2025-10-03 12:13:12 -05:00
Bjorn Helgaas
b365c0a769 Merge branch 'pci/pwrctrl'
- Fix a double cleanup of regulators if devm_add_action_or_reset() fails
  (Geert Uytterhoeven)

* pci/pwrctrl:
  PCI/pwrctrl: Fix device leak at device stop
  PCI/pwrctrl: Fix device and OF node leak at bus scan
  PCI/pwrctrl: Fix device leak at registration
  PCI/pwrctrl: Fix double cleanup on devm_add_action_or_reset() failure
2025-10-03 12:13:11 -05:00
Niklas Schnelle
60e7b5aa85 PCI: Add lockdep assertion in pci_stop_and_remove_bus_device()
Removing a PCI devices requires holding pci_rescan_remove_lock. Prompted by
this being missed in sriov_disable() and going unnoticed since its
inception, add a lockdep assert so this doesn't get missed again in the
future.

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
Link: https://patch.msgid.link/20250826-pci_fix_sriov_disable-v1-2-2d0bc938f2a3@linux.ibm.com
2025-09-26 16:01:17 -05:00
Ilpo Järvinen
a43ac325c7 PCI: Set up bridge resources earlier
Bridge windows are read twice from PCI Config Space, the first time from
pci_read_bridge_windows(), which does not set up the device's resources.
This causes problems down the road as child resources of the bridge cannot
check whether they reside within the bridge window or not.

Set up the bridge windows already in pci_read_bridge_windows().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250924134228.1663-2-ilpo.jarvinen@linux.intel.com
2025-09-25 16:15:22 -05:00
Ilpo Järvinen
e4934832c5 PCI: Add defines for bridge window indexing
include/linux/pci.h provides PCI_BRIDGE_{IO,MEM,PREF_MEM}_WINDOW defines,
however, they're based on the resource array indexing in the pci_dev
struct. The struct pci_bus also has pointers to those same resources but
they start from zeroth index.

Add PCI_BUS_BRIDGE_{IO,MEM,PREF_MEM}_WINDOW defines to get rid of literal
indexing.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-12-ilpo.jarvinen@linux.intel.com
2025-09-16 11:19:24 -05:00
Ilpo Järvinen
8278c69143 PCI: Preserve bridge window resource type flags
When a bridge window is found unused or fails to assign, the flags of the
associated resource are cleared. Clearing flags is problematic as it also
removes the type information of the resource which is needed later.

Thus, always preserve the bridge window type flags and use IORESOURCE_UNSET
and IORESOURCE_DISABLED to indicate the status of the bridge window. Also,
when initializing resources, make sure all valid bridge windows do get
their type flags set.

Change various places that relied on resource flags being cleared to check
for IORESOURCE_UNSET and IORESOURCE_DISABLED to allow bridge window
resource to retain their type flags. Add pdev_resource_assignable() and
pdev_resource_should_fit() helpers to filter out disabled bridge windows
during resource fitting; the latter combines more common checks into the
helper.

When reading the bridge windows from the registers, instead of leaving the
resource flags cleared for bridge windows that are not enabled, always
set up the flags and set IORESOURCE_UNSET | IORESOURCE_DISABLED as needed.

When resource fitting or assignment fails for a bridge window resource, or
the bridge window is not needed, mark the resource with IORESOURCE_UNSET or
IORESOURCE_DISABLED, respectively.

Use dummy zero resource in resource_show() for backwards compatibility as
lspci will otherwise misrepresent disabled bridge windows.

This change fixes an issue which highlights the importance of keeping the
resource type flags intact:

  At the end of __assign_resources_sorted(), reset_resource() is called,
  previously clearing the flags. Later, pci_prepare_next_assign_round()
  attempted to release bridge resources using
  pci_bus_release_bridge_resources() that calls into
  pci_bridge_release_resources() that assumes type flags are still present.
  As type flags were cleared, IORESOURCE_MEM_64 was not set leading to
  resources under an incorrect bridge window to be released (idx = 1
  instead of idx = 2). While the assignments performed later covered this
  problem so that the wrongly released resources got assigned in the end,
  it was still causing extra release+assign pairs.

There are other reasons why the resource flags should be retained in
upcoming changes too.

Removing the flag reset for non-bridge window resource is left as future
work, in part because it has a much higher regression potential due to
pci_enable_resources() that will start to work also for those resources
then and due to what endpoint drivers might assume about resources.

Despite the Fixes tag, backporting this (at least any time soon) is highly
discouraged. The issue fixed is borderline cosmetic as the later
assignments normally cover the problem entirely. Also there might be
non-obvious dependencies.

Fixes: 5b28541552 ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20250829131113.36754-11-ilpo.jarvinen@linux.intel.com
2025-09-16 11:19:18 -05:00
Johan Hovold
e24bbbe078 PCI/pwrctrl: Fix device and OF node leak at bus scan
Make sure to drop the references to the pwrctrl OF node and device taken by
of_pci_find_child_device() and of_find_device_by_node() respectively when
scanning the bus.

Fixes: 957f40d039 ("PCI/pwrctrl: Move creation of pwrctrl devices to pci_scan_device()")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Cc: stable@vger.kernel.org	# v6.15
Link: https://patch.msgid.link/20250721153609.8611-3-johan+linaro@kernel.org
2025-08-27 14:32:11 -05:00
Ilpo Järvinen
c763fae8c4 PCI: Clean up pci_scan_child_bus_extend() loop
pci_scan_child_bus_extend() open-codes device number iteration in the for
loop. Convert to use PCI_DEVFN() and add PCI_MAX_NR_DEVS (there seems to be
no pre-existing define for this purpose).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250610105820.7126-3-ilpo.jarvinen@linux.intel.com
2025-08-11 15:00:51 -05:00
Ilpo Järvinen
aa84931ba7 PCI: Clean up early_dump_pci_device()
Convert 256 to PCI_CFG_SPACE_SIZE and 4 to sizeof(u32) and avoid i / 4
construct by changing the iteration.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250610105820.7126-2-ilpo.jarvinen@linux.intel.com
2025-08-11 15:00:51 -05:00
Ilpo Järvinen
24b2d3c452 PCI: Use header type defines in pci_setup_device()
Replace literals with PCI_HEADER_TYPE_* defines in pci_setup_device().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250610105820.7126-1-ilpo.jarvinen@linux.intel.com
2025-08-11 15:00:51 -05:00
Linus Torvalds
0bd0a41a51 Merge tag 'pci-v6.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI updates from Bjorn Helgaas:
 "Enumeration:

   - Allow built-in drivers, not just modular drivers, to use async
     initial probing (Lukas Wunner)

   - Support Immediate Readiness even on devices with no PM Capability
     (Sean Christopherson)

   - Consolidate definition of PCIE_RESET_CONFIG_WAIT_MS (100ms), the
     required delay between a reset and sending config requests to a
     device (Niklas Cassel)

   - Add pci_is_display() to check for "Display" base class and use it
     in ALSA hda, vfio, vga_switcheroo, vt-d (Mario Limonciello)

   - Allow 'isolated PCI functions' (multi-function devices without a
     function 0) for LoongArch, similar to s390 and jailhouse (Huacai
     Chen)

  Power control:

   - Add ability to enable optional slot clock for cases where the PCIe
     host controller and the slot are supplied by different clocks
     (Marek Vasut)

  PCIe native device hotplug:

   - Fix runtime PM ref imbalance on Hot-Plug Capable ports caused by
     misinterpreting a config read failure after a device has been
     removed (Lukas Wunner)

   - Avoid creating a useless PCIe port service device for pciehp if the
     slot is handled by the ACPI hotplug driver (Lukas Wunner)

   - Ignore ACPI hotplug slots when calculating depth of pciehp hotplug
     ports (Lukas Wunner)

  Virtualization:

   - Save VF resizable BAR state and restore it after reset (Michał
     Winiarski)

   - Allow IOV resources (VF BARs) to be resized (Michał Winiarski)

   - Add pci_iov_vf_bar_set_size() so drivers can control VF BAR size
     (Michał Winiarski)

  Endpoint framework:

   - Add RC-to-EP doorbell support using platform MSI controller,
     including a test case (Frank Li)

   - Allow BAR assignment via configfs so platforms have flexibility in
     determining BAR usage (Jerome Brunet)

  Native PCIe controller drivers:

   - Convert amazon,al-alpine-v[23]-pcie, apm,xgene-pcie,
     axis,artpec6-pcie, marvell,armada-3700-pcie, st,spear1340-pcie to
     DT schema format (Rob Herring)

   - Use dev_fwnode() instead of of_fwnode_handle() to remove OF
     dependency in altera (fixes an unused variable), designware-host,
     mediatek, mediatek-gen3, mobiveil, plda, xilinx, xilinx-dma,
     xilinx-nwl (Jiri Slaby, Arnd Bergmann)

   - Convert aardvark, altera, brcmstb, designware-host, iproc,
     mediatek, mediatek-gen3, mobiveil, plda, rcar-host, vmd, xilinx,
     xilinx-dma, xilinx-nwl from using pci_msi_create_irq_domain() to
     using msi_create_parent_irq_domain() instead; this makes the
     interrupt controller per-PCI device, allows dynamic allocation of
     vectors after initialization, and allows support of IMS (Nam Cao)

  APM X-Gene PCIe controller driver:

   - Rewrite MSI handling to MSI CPU affinity, drop useless CPU hotplug
     bits, use device-managed memory allocations, and clean things up
     (Marc Zyngier)

   - Probe xgene-msi as a standard platform driver rather than a
     subsys_initcall (Marc Zyngier)

  Broadcom STB PCIe controller driver:

   - Add optional DT 'num-lanes' property and if present, use it to
     override the Maximum Link Width advertised in Link Capabilities
     (Jim Quinlan)

  Cadence PCIe controller driver:

   - Use PCIe Message routing types from the PCI core rather than
     defining private ones (Hans Zhang)

  Freescale i.MX6 PCIe controller driver:

   - Add IMX8MQ_EP third 64-bit BAR in epc_features (Richard Zhu)

   - Add IMX8MM_EP and IMX8MP_EP fixed 256-byte BAR 4 in epc_features
     (Richard Zhu)

   - Configure LUT for MSI/IOMMU in Endpoint mode so Root Complex can
     trigger doorbel on Endpoint (Frank Li)

   - Remove apps_reset (LTSSM_EN) from
     imx_pcie_{assert,deassert}_core_reset(), which fixes a hotplug
     regression on i.MX8MM (Richard Zhu)

   - Delay Endpoint link start until configfs 'start' written (Richard
     Zhu)

  Intel VMD host bridge driver:

   - Add Intel Panther Lake (PTL)-H/P/U Vendor ID (George D Sworo)

  Qualcomm PCIe controller driver:

   - Add DT binding and driver support for SA8255p, which supports ECAM
     for Configuration Space access (Mayank Rana)

   - Update DT binding and driver to describe PHYs and per-Root Port
     resets in a Root Port stanza and deprecate describing them in the
     host bridge; this makes it possible to support multiple Root Ports
     in the future (Krishna Chaitanya Chundru)

   - Add Qualcomm QCS615 to SM8150 DT binding (Ziyue Zhang)

   - Add Qualcomm QCS8300 to SA8775p DT binding (Ziyue Zhang)

   - Drop TBU and ref clocks from Qualcomm SM8150 and SC8180x DT
     bindings (Konrad Dybcio)

   - Document 'link_down' reset in Qualcomm SA8775P DT binding (Ziyue
     Zhang)

   - Add required PCIE_RESET_CONFIG_WAIT_MS delay after Link up IRQ
     (Niklas Cassel)

  Rockchip PCIe controller driver:

   - Drop unused PCIe Message routing and code definitions (Hans Zhang)

   - Remove several unused header includes (Hans Zhang)

   - Use standard PCIe config register definitions instead of
     rockchip-specific redefinitions (Geraldo Nascimento)

   - Set Target Link Speed to 5.0 GT/s before retraining so we have a
     chance to train at a higher speed (Geraldo Nascimento)

  Rockchip DesignWare PCIe controller driver:

   - Prevent race between link training and register update via DBI by
     inhibiting link training after hot reset and link down (Wilfred
     Mallawa)

   - Add required PCIE_RESET_CONFIG_WAIT_MS delay after Link up IRQ
     (Niklas Cassel)

  Sophgo PCIe controller driver:

   - Add DT binding and driver for Sophgo SG2044 PCIe controller driver
     in Root Complex mode (Inochi Amaoto)

  Synopsys DesignWare PCIe controller driver:

   - Add required PCIE_RESET_CONFIG_WAIT_MS after waiting for Link up on
     Ports that support > 5.0 GT/s. Slower Ports still rely on the
     not-quite-correct PCIE_LINK_WAIT_SLEEP_MS 90ms default delay while
     waiting for the Link (Niklas Cassel)"

* tag 'pci-v6.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (116 commits)
  dt-bindings: PCI: qcom,pcie-sa8775p: Document 'link_down' reset
  dt-bindings: PCI: Remove 83xx-512x-pci.txt
  dt-bindings: PCI: Convert amazon,al-alpine-v[23]-pcie to DT schema
  dt-bindings: PCI: Convert marvell,armada-3700-pcie to DT schema
  dt-bindings: PCI: Convert apm,xgene-pcie to DT schema
  dt-bindings: PCI: Convert axis,artpec6-pcie to DT schema
  dt-bindings: PCI: Convert st,spear1340-pcie to DT schema
  PCI: Move is_pciehp check out of pciehp_is_native()
  PCI: pciehp: Use is_pciehp instead of is_hotplug_bridge
  PCI/portdrv: Use is_pciehp instead of is_hotplug_bridge
  PCI/ACPI: Fix runtime PM ref imbalance on Hot-Plug Capable ports
  selftests: pci_endpoint: Add doorbell test case
  misc: pci_endpoint_test: Add doorbell test case
  PCI: endpoint: pci-epf-test: Add doorbell test support
  PCI: endpoint: Add pci_epf_align_inbound_addr() helper for inbound address alignment
  PCI: endpoint: pci-ep-msi: Add checks for MSI parent and mutability
  PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller
  PCI: dwc: Add Sophgo SG2044 PCIe controller driver in Root Complex mode
  PCI: vmd: Switch to msi_create_parent_irq_domain()
  PCI: vmd: Convert to lock guards
  ...
2025-08-01 13:59:07 -07:00
Bjorn Helgaas
0e142889f4 Merge branch 'pci/hotplug'
- Fix runtime PM ref imbalance on Hot-Plug Capable ports caused by
  misinterpreting a config read failure after a device has been removed
  (Lukas Wunner)

- Avoid creating a useless PCIe port service device for pciehp if the slot
  is handled by the ACPI hotplug driver (Lukas Wunner)

- Ignore ACPI hotplug slots when calculating depth of pciehp hotplug ports
  (Lukas Wunner)

- Simplify pci_bridge_d3_possible() and clarify comments (Lukas Wunner)

* pci/hotplug:
  PCI: Move is_pciehp check out of pciehp_is_native()
  PCI: pciehp: Use is_pciehp instead of is_hotplug_bridge
  PCI/portdrv: Use is_pciehp instead of is_hotplug_bridge
  PCI/ACPI: Fix runtime PM ref imbalance on Hot-Plug Capable ports
2025-07-31 16:11:42 -05:00
Lukas Wunner
6cff20ce3b PCI/ACPI: Fix runtime PM ref imbalance on Hot-Plug Capable ports
pci_bridge_d3_possible() is called from both pcie_portdrv_probe() and
pcie_portdrv_remove() to determine whether runtime power management shall
be enabled (on probe) or disabled (on remove) on a PCIe port.

The underlying assumption is that pci_bridge_d3_possible() always returns
the same value, else a runtime PM reference imbalance would occur.  That
assumption is not given if the PCIe port is inaccessible on remove due to
hot-unplug:  pci_bridge_d3_possible() calls pciehp_is_native(), which
accesses Config Space to determine whether the port is Hot-Plug Capable.
An inaccessible port returns "all ones", which is converted to "all
zeroes" by pcie_capability_read_dword().  Hence the port no longer seems
Hot-Plug Capable on remove even though it was on probe.

The resulting runtime PM ref imbalance causes warning messages such as:

  pcieport 0000:02:04.0: Runtime PM usage count underflow!

Avoid the Config Space access (and thus the runtime PM ref imbalance) by
caching the Hot-Plug Capable bit in struct pci_dev.

The struct already contains an "is_hotplug_bridge" flag, which however is
not only set on Hot-Plug Capable PCIe ports, but also Conventional PCI
Hot-Plug bridges and ACPI slots.  The flag identifies bridges which are
allocated additional MMIO and bus number resources to allow for hierarchy
expansion.

The kernel is somewhat sloppily using "is_hotplug_bridge" in a number of
places to identify Hot-Plug Capable PCIe ports, even though the flag
encompasses other devices.  Subsequent commits replace these occurrences
with the new flag to clearly delineate Hot-Plug Capable PCIe ports from
other kinds of hotplug bridges.

Document the existing "is_hotplug_bridge" and the new "is_pciehp" flag
and document the (non-obvious) requirement that pci_bridge_d3_possible()
always returns the same value across the entire lifetime of a bridge,
including its hot-removal.

Fixes: 5352a44a56 ("PCI: pciehp: Make pciehp_is_native() stricter")
Reported-by: Laurent Bigonville <bigon@bigon.be>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220216
Reported-by: Mario Limonciello <mario.limonciello@amd.com>
Closes: https://lore.kernel.org/r/20250609020223.269407-3-superm1@kernel.org/
Link: https://lore.kernel.org/all/20250620025535.3425049-3-superm1@kernel.org/T/#u
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Cc: stable@vger.kernel.org # v4.18+
Link: https://patch.msgid.link/fe5dcc3b2e62ee1df7905d746bde161eb1b3291c.1752390101.git.lukas@wunner.de
2025-07-29 11:45:10 -05:00
Sean Christopherson
5c0d0ee36f PCI: Support Immediate Readiness on devices without PM capabilities
Query support for Immediate Readiness irrespective of whether or not the
device supports PM capabilities, as nothing in the PCIe spec suggests that
Immediate Readiness is in any way dependent on PM functionality.

Fixes: d6112f8def ("PCI: Add support for Immediate Readiness")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: David Matlack <dmatlack@google.com>
Cc: Vipin Sharma <vipinsh@google.com>
Cc: Aaron Lewis <aaronlewis@google.com>
Link: https://patch.msgid.link/20250722155926.352248-1-seanjc@google.com
2025-07-22 18:02:44 -05:00
Manivannan Sadhasivam
8c493cc91f PCI/pwrctrl: Create pwrctrl devices only when CONFIG_PCI_PWRCTRL is enabled
If devicetree describes power supplies related to a PCI device, we
unnecessarily created a pwrctrl device even if CONFIG_PCI_PWRCTL was not
enabled.

We only need pci_pwrctrl_create_device() when CONFIG_PCI_PWRCTRL is
enabled.  Compile it out when CONFIG_PCI_PWRCTRL is not enabled.

When pci_pwrctrl_create_device() creates and returns a pwrctrl device,
pci_scan_device() doesn't enumerate the PCI device. It assumes the pwrctrl
core will rescan the bus after turning on the power. However, if
CONFIG_PCI_PWRCTRL is not enabled, the rescan never happens, which breaks
PCI enumeration on any system that describes power supplies in devicetree
but does not use pwrctrl.

Jim reported that some brcmstb platforms break this way.  The brcmstb
driver is still broken if CONFIG_PCI_PWRCTRL is enabled, but this commit at
least allows brcmstb to work when it's NOT enabled.

Fixes: 957f40d039 ("PCI/pwrctrl: Move creation of pwrctrl devices to pci_scan_device()")
Reported-by: Jim Quinlan <james.quinlan@broadcom.com>
Link: https://lore.kernel.org/r/CA+-6iNwgaByXEYD3j=-+H_PKAxXRU78svPMRHDKKci8AGXAUPg@mail.gmail.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org	# v6.15
Link: https://patch.msgid.link/20250701064731.52901-1-manivannan.sadhasivam@linaro.org
2025-07-22 13:36:04 -05:00
Lukas Wunner
ce45dc4bb2 PCI: Limit visibility of match_driver flag to PCI core
Since commit 58d9a38f6f ("PCI: Skip attaching driver in device_add()"),
PCI enumeration is split into two steps:  In the first step, all devices
are published in sysfs with device_add().  In the second step, drivers are
bound to the devices with device_attach().  To delay driver binding until
the second step, a "bool match_driver" in struct pci_dev is used.

Instead of a bool, use a bit in the "unsigned long priv_flags" to shrink
struct pci_dev a little and prevent use of the bool outside the PCI core
(as has happened with commit cbbc00be2c ("iommu/amd: Prevent binding
other PCI drivers to IOMMU PCI devices")).

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://patch.msgid.link/d22a9e5b81d6bd8dd1837607d6156679b3b1199c.1745572340.git.lukas@wunner.de
2025-05-15 13:40:52 +00:00
Ilpo Järvinen
5fe8d08139 PCI: Use PCI_STD_NUM_BARS instead of 6
pci_read_bases() is given literal 6 that means PCI_STD_NUM_BARS.  Replace
the literal with the define to annotate the code better.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20250416100239.6958-1-ilpo.jarvinen@linux.intel.com
2025-04-16 13:21:28 -05:00
Linus Torvalds
7d06015d93 Merge tag 'pci-v6.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Enable Configuration RRS SV, which makes device readiness visible,
     early instead of during child bus scanning (Bjorn Helgaas)

   - Log debug messages about reset methods being used (Bjorn Helgaas)

   - Avoid reset when it has been disabled via sysfs (Nishanth
     Aravamudan)

   - Add common pci-ep-bus.yaml schema for exporting several peripherals
     of a single PCI function via devicetree (Andrea della Porta)

   - Create DT nodes for PCI host bridges to enable loading device tree
     overlays to create platform devices for PCI devices that have
     several features that require multiple drivers (Herve Codina)

  Resource management:

   - Enlarge devres table[] to accommodate bridge windows, ROM, IOV
     BARs, etc., and validate BAR index in devres interfaces (Philipp
     Stanner)

   - Fix typo that repeatedly distributed resources to a bridge instead
     of iterating over subordinate bridges, which resulted in too little
     space to assign some BARs (Kai-Heng Feng)

   - Relax bridge window tail sizing for optional resources, e.g., IOV
     BARs, to avoid failures when removing and re-adding devices (Ilpo
     Järvinen)

   - Allow drivers to enable devices even if we haven't assigned
     optional IOV resources to them (Ilpo Järvinen)

   - Rework handling of optional resources (IOV BARs, ROMs) to reduce
     failures if we can't allocate them (Ilpo Järvinen)

   - Fix a NULL dereference in the SR-IOV VF creation error path (Shay
     Drory)

   - Fix s390 mmio_read/write syscalls, which didn't cause page faults
     in some cases, which broke vfio-pci lazy mapping on first access
     (Niklas Schnelle)

   - Add pdev->non_mappable_bars to replace CONFIG_VFIO_PCI_MMAP, which
     was disabled only for s390 (Niklas Schnelle)

   - Support mmap of PCI resources on s390 except for ISM devices
     (Niklas Schnelle)

  ASPM:

   - Delay pcie_link_state deallocation to avoid dangling pointers that
     cause invalid references during hot-unplug (Daniel Stodden)

  Power management:

   - Allow PCI bridges to go to D3Hot when suspending on all non-x86
     systems (Manivannan Sadhasivam)

  Power control:

   - Create pwrctrl devices in pci_scan_device() to make it more
     symmetric with pci_pwrctrl_unregister() and make pwrctrl devices
     for PCI bridges possible (Manivannan Sadhasivam)

   - Unregister pwrctrl devices in pci_destroy_dev() so DOE, ASPM, etc.
     can still access devices after pci_stop_dev() (Manivannan
     Sadhasivam)

   - If there's a pwrctrl device for a PCI device, skip scanning it
     because the pwrctrl core will rescan the bus after the device is
     powered on (Manivannan Sadhasivam)

   - Add a pwrctrl driver for PCI slots based on voltage regulators
     described via devicetree (Manivannan Sadhasivam)

  Bandwidth control:

   - Add set_pcie_speed.sh to TEST_PROGS to fix issue when executing the
     set_pcie_cooling_state.sh test case (Yi Lai)

   - Avoid a NULL pointer dereference when we run out of bus numbers to
     assign for a bridge secondary bus (Lukas Wunner)

  Hotplug:

   - Drop superfluous pci_hotplug_slot_list, try_module_get() calls, and
     NULL pointer checks (Lukas Wunner)

   - Drop shpchp module init/exit logging, replace shpchp dbg() with
     ctrl_dbg(), and remove unused dbg(), err(), info(), warn() wrappers
     (Ilpo Järvinen)

   - Drop 'shpchp_debug' module parameter in favor of standard dynamic
     debugging (Ilpo Järvinen)

   - Drop unused cpcihp .get_power(), .set_power() function pointers
     (Guilherme Giacomo Simoes)

   - Disable hotplug interrupts in portdrv only when pciehp is not
     enabled to avoid issuing two hotplug commands too close together
     (Feng Tang)

   - Skip pciehp 'device replaced' check if the device has been removed
     to address a deadlock when resuming after a device was removed
     during system sleep (Lukas Wunner)

   - Don't enable pciehp hotplug interupt when resuming in poll mode
     (Ilpo Järvinen)

  Virtualization:

   - Fix bugs in 'pci=config_acs=' kernel command line parameter (Tushar
     Dave)

  DOE:

   - Expose supported DOE features via sysfs (Alistair Francis)

   - Allow DOE support to be enabled even if CXL isn't enabled (Alistair
     Francis)

  Endpoint framework:

   - Convert PCI device data so pci-epf-test works correctly on
     big-endian endpoint systems (Niklas Cassel)

   - Add BAR_RESIZABLE type to endpoint framework and add DWC core
     support for EPF drivers to set BAR_RESIZABLE type and size (Niklas
     Cassel)

   - Fix pci-epf-test double free that causes an oops if the host
     reboots and PERST# deassertion restarts endpoint BAR allocation
     (Christian Bruel)

   - Fix endpoint BAR testing so tests can skip disabled BARs instead of
     reporting them as failures (Niklas Cassel)

   - Widen endpoint test BAR size variable to accommodate BARs larger
     than INT_MAX (Niklas Cassel)

   - Remove unused tools 'pci' build target left over after moving tests
     to tools/testing/selftests/pci_endpoint (Jianfeng Liu)

  Altera PCIe controller driver:

   - Add DT binding and driver support for Agilex family (P-Tile,
     F-Tile, R-Tile) (Matthew Gerlach and D M, Sharath Kumar)

  AMD MDB PCIe controller driver:

   - Add DT binding and driver for AMD MDB (Multimedia DMA Bridge)
     (Thippeswamy Havalige)

  Broadcom STB PCIe controller driver:

   - Add BCM2712 MSI-X DT binding and interrupt controller drivers and
     add softdep on irq_bcm2712_mip driver to ensure that it is loaded
     first (Stanimir Varbanov)

   - Expand inbound window map to 64GB so it can accommodate BCM2712
     (Stanimir Varbanov)

   - Add BCM2712 support and DT updates (Stanimir Varbanov)

   - Apply link speed restriction before bringing link up, not after
     (Jim Quinlan)

   - Update Max Link Speed in Link Capabilities via the internal
     writable register, not the read-only config register (Jim Quinlan)

   - Handle regulator_bulk_get() error to avoid panic when we call
     regulator_bulk_free() later (Jim Quinlan)

   - Disable regulators only when removing the bus immediately below a
     Root Port because we don't support regulators deeper in the
     hierarchy (Jim Quinlan)

   - Make const read-only arrays static (Colin Ian King)

  Cadence PCIe endpoint driver:

   - Correct MSG TLP generation so endpoints can generate INTx messages
     (Hans Zhang)

  Freescale i.MX6 PCIe controller driver:

   - Identify the second controller on i.MX8MQ based on devicetree
     'linux,pci-domain' instead of DBI 'reg' address (Richard Zhu)

   - Remove imx_pcie_cpu_addr_fixup() since dwc core can now derive the
     ATU input address (using parent_bus_offset) from devicetree (Frank
     Li)

  Freescale Layerscape PCIe controller driver:

   - Drop deprecated 'num-ib-windows' and 'num-ob-windows' and
     unnecessary 'status' from example (Krzysztof Kozlowski)

   - Correct the syscon_regmap_lookup_by_phandle_args("fsl,pcie-scfg")
     arg_count to fix probe failure on LS1043A (Ioana Ciornei)

  HiSilicon STB PCIe controller driver:

   - Call phy_exit() to clean up if histb_pcie_probe() fails (Christophe
     JAILLET)

  Intel Gateway PCIe controller driver:

   - Remove intel_pcie_cpu_addr() since dwc core can now derive the ATU
     input address (using parent_bus_offset) from devicetree (Frank Li)

  Intel VMD host bridge driver:

   - Convert vmd_dev.cfg_lock from spinlock_t to raw_spinlock_t so
     pci_ops.read() will never sleep, even on PREEMPT_RT where
     spinlock_t becomes a sleepable lock, to avoid calling a sleeping
     function from invalid context (Ryo Takakura)

  MediaTek PCIe Gen3 controller driver:

   - Remove leftover mac_reset assert for Airoha EN7581 SoC (Lorenzo
     Bianconi)

   - Add EN7581 PBUS controller 'mediatek,pbus-csr' DT property and
     program host bridge memory aperture to this syscon node (Lorenzo
     Bianconi)

  Qualcomm PCIe controller driver:

   - Add qcom,pcie-ipq5332 binding (Varadarajan Narayanan)

   - Add qcom i.MX8QM and i.MX8QXP/DXP optional DMA interrupt (Alexander
     Stein)

   - Add optional dma-coherent DT property for Qualcomm SA8775P (Dmitry
     Baryshkov)

   - Make DT iommu property required for SA8775P and prohibited for
     SDX55 (Dmitry Baryshkov)

   - Add DT IOMMU and DMA-related properties for Qualcomm SM8450 (Dmitry
     Baryshkov)

   - Add endpoint DT properties for SAR2130P and enable endpoint mode in
     driver (Dmitry Baryshkov)

   - Describe endpoint BAR0 and BAR2 as 64-bit only and BAR1 and BAR3 as
     RESERVED (Manivannan Sadhasivam)

  Rockchip DesignWare PCIe controller driver:

   - Describe rk3568 and rk3588 BARs as Resizable, not Fixed (Niklas
     Cassel)

  Synopsys DesignWare PCIe controller driver:

   - Add debugfs-based Silicon Debug, Error Injection, Statistical
     Counter support for DWC (Shradha Todi)

   - Add debugfs property to expose LTSSM status of DWC PCIe link (Hans
     Zhang)

   - Add Rockchip support for DWC debugfs features (Niklas Cassel)

   - Add dw_pcie_parent_bus_offset() to look up the parent bus address
     of a specified 'reg' property and return the offset from the CPU
     physical address (Frank Li)

   - Use dw_pcie_parent_bus_offset() to derive CPU -> ATU addr offset
     via 'reg[config]' for host controllers and 'reg[addr_space]' for
     endpoint controllers (Frank Li)

   - Apply struct dw_pcie.parent_bus_offset in ATU users to remove use
     of .cpu_addr_fixup() when programming ATU (Frank Li)

  TI J721E PCIe driver:

   - Correct the 'link down' interrupt bit for J784S4 (Siddharth
     Vadapalli)

  TI Keystone PCIe controller driver:

   - Describe AM65x BARs 2 and 5 as Resizable (not Fixed) and reduce
     alignment requirement from 1MB to 64KB (Niklas Cassel)

  Xilinx Versal CPM PCIe controller driver:

   - Free IRQ domain in probe error path to avoid leaking it
     (Thippeswamy Havalige)

   - Add DT .compatible "xlnx,versal-cpm5nc-host" and driver support for
     Versal Net CPM5NC Root Port controller (Thippeswamy Havalige)

   - Add driver support for CPM5_HOST1 (Thippeswamy Havalige)

  Miscellaneous:

   - Convert fsl,mpc83xx-pcie binding to YAML (J. Neuschäfer)

   - Use for_each_available_child_of_node_scoped() to simplify apple,
     kirin, mediatek, mt7621, tegra drivers (Zhang Zekun)"

* tag 'pci-v6.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (197 commits)
  PCI: layerscape: Fix arg_count to syscon_regmap_lookup_by_phandle_args()
  PCI: j721e: Fix the value of .linkdown_irq_regfield for J784S4
  misc: pci_endpoint_test: Add support for PCITEST_IRQ_TYPE_AUTO
  PCI: endpoint: pci-epf-test: Expose supported IRQ types in CAPS register
  PCI: dw-rockchip: Endpoint mode cannot raise INTx interrupts
  PCI: endpoint: Add intx_capable to epc_features struct
  dt-bindings: PCI: Add common schema for devices accessible through PCI BARs
  PCI: intel-gw: Remove intel_pcie_cpu_addr()
  PCI: imx6: Remove imx_pcie_cpu_addr_fixup()
  PCI: dwc: Use parent_bus_offset to remove need for .cpu_addr_fixup()
  PCI: dwc: ep: Ensure proper iteration over outbound map windows
  PCI: dwc: ep: Use devicetree 'reg[addr_space]' to derive CPU -> ATU addr offset
  PCI: dwc: ep: Consolidate devicetree handling in dw_pcie_ep_get_resources()
  PCI: dwc: ep: Call epc_create() early in dw_pcie_ep_init()
  PCI: dwc: Use devicetree 'reg[config]' to derive CPU -> ATU addr offset
  PCI: dwc: Add dw_pcie_parent_bus_offset() checking and debug
  PCI: dwc: Add dw_pcie_parent_bus_offset()
  PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion
  PCI: xilinx-cpm: Add cpm_csr register mapping for CPM5_HOST1 variant
  PCI: brcmstb: Make const read-only arrays static
  ...
2025-03-28 19:36:53 -07:00
Bjorn Helgaas
a1aed6b34f Merge branch 'pci/devtree-create'
- Add device_add_of_node() to set dev->of_node and dev->fwnode only if they
  haven't been set already (Herve Codina)

- Allow of_pci_set_address() to set the DT address property for root bus
  nodes, where there is no PCI bridge to supply the PCI bus/device/function
  part of the property (Herve Codina)

- Create DT nodes for PCI host bridges to enable loading device tree
  overlays to create platform devices for PCI devices that have several
  features that require multiple drivers (Herve Codina)

* pci/devtree-create:
  PCI: of: Create device tree PCI host bridge node
  PCI: of_property: Constify parameter in of_pci_get_addr_flags()
  PCI: of_property: Add support for NULL pdev in of_pci_set_address()
  PCI: of: Use device_{add,remove}_of_node() to attach of_node to existing device
  driver core: Introduce device_{add,remove}_of_node()
2025-03-27 13:14:45 -05:00
Bjorn Helgaas
55d25a101d Merge branch 'pci/pwrctrl'
- Create pwrctrl devices in pci_scan_device() to make it more symmetric
  with pci_pwrctrl_unregister() and make pwrctrl devices for PCI bridges
  possible (Manivannan Sadhasivam)

- Unregister pwrctrl devices in pci_destroy_dev() so DOE, ASPM, etc. can
  still access devices after pci_stop_dev() (Manivannan Sadhasivam)

- If there's a pwrctrl device for a PCI device, skip scanning it because
  the pwrctrl core will rescan the bus after the device is powered on
  (Manivannan Sadhasivam)

- Add a pwrctrl driver for PCI slots based on voltage regulators described
  via devicetree (Manivannan Sadhasivam)

* pci/pwrctrl:
  PCI/pwrctrl: Add pwrctrl driver for PCI slots
  dt-bindings: vendor-prefixes: Document the 'pciclass' prefix
  PCI/pwrctrl: Skip scanning for the device further if pwrctrl device is created
  PCI/pwrctrl: Move pci_pwrctrl_unregister() to pci_destroy_dev()
  PCI/pwrctrl: Move creation of pwrctrl devices to pci_scan_device()
2025-03-27 13:14:44 -05:00
Bjorn Helgaas
e9e224dadd Merge branch 'pci/enumeration'
- Enable Configuration RRS SV early instead of during child bus scanning
  (Bjorn Helgaas)

- Cache offset of Resizable BAR capability to avoid redundant searches for
  it (Bjorn Helgaas)

- Fix reference leaks in pci_register_host_bridge() and
  pci_alloc_child_bus() (Ma Ke)

- Drop put_device() in pci_register_host_bridge() left over from converting
  device_register() to device_add() (Dan Carpenter)

* pci/enumeration:
  PCI: Remove stray put_device() in pci_register_host_bridge()
  PCI: Fix reference leak in pci_alloc_child_bus()
  PCI: Fix reference leak in pci_register_host_bridge()
  PCI: Cache offset of Resizable BAR capability
  PCI: Enable Configuration RRS SV early
2025-03-27 13:14:43 -05:00
Bjorn Helgaas
651aa9052c Merge branch 'pci/doe'
- Rename DOE 'protocol' to 'feature' to follow spec terminology (Alistair
  Francis)

- Expose supported DOE features via sysfs (Alistair Francis)

- Allow DOE support to be enabled even if CXL isn't enabled (Alistair
  Francis)

* pci/doe:
  PCI/DOE: Allow enabling DOE without CXL
  PCI/DOE: Expose DOE features via sysfs
  PCI/DOE: Rename Discovery Response Data Object Contents to type
  PCI/DOE: Rename DOE protocol to feature
2025-03-27 13:14:43 -05:00
Alistair Francis
2311ab1820 PCI/DOE: Expose DOE features via sysfs
PCIe r6.0 added support for Data Object Exchange (DOE).  When DOE is
supported, the DOE Discovery Feature must be implemented per PCIe r6.1, sec
6.30.1.1. DOE allows a requester to obtain information about the other DOE
features supported by the device.

The kernel already queries the DOE features supported and caches the
values.  Expose the values in sysfs to allow user space to determine which
DOE features are supported by the PCIe device.

By exposing the information to userspace, tools like lspci can relay the
information to users. By listing all of the supported features we can allow
userspace to parse the list, which might include vendor specific features
as well as yet to be supported features.

As the DOE Discovery feature must always be supported we treat it as a
special named attribute case. This allows the usual PCI attribute_group
handling to correctly create the doe_features directory when registering
pci_doe_sysfs_group (otherwise it doesn't and sysfs_add_file_to_group()
will seg fault).

After this patch is supported you can see something like this when
attaching a DOE device:

  $ ls /sys/devices/pci0000:00/0000:00:02.0//doe*
  0001:01        0001:02        doe_discovery

Link: https://lore.kernel.org/r/20250306075211.1855177-3-alistair@alistair23.me
Signed-off-by: Alistair Francis <alistair@alistair23.me>
[bhelgaas: drop pci_doe_sysfs_init() stub return, make
DEVICE_ATTR_RO(doe_discovery) static]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-03-21 16:36:01 -05:00
Dan Carpenter
6e8d06e509 PCI: Remove stray put_device() in pci_register_host_bridge()
This put_device() was accidentally left over from when we changed the code
from using device_register() to calling device_add().  Delete it.

Link: https://lore.kernel.org/r/55b24870-89fb-4c91-b85d-744e35db53c2@stanley.mountain
Fixes: 9885440b16 ("PCI: Fix pci_host_bridge struct device release/free handling")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-10 13:41:48 -05:00
Ma Ke
1f2768b6a3 PCI: Fix reference leak in pci_alloc_child_bus()
If device_register(&child->dev) fails, call put_device() to explicitly
release child->dev, per the comment at device_register().

Found by code review.

Link: https://lore.kernel.org/r/20250202062357.872971-1-make24@iscas.ac.cn
Fixes: 4f535093cf ("PCI: Put pci_dev in device tree as early as possible")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: stable@vger.kernel.org
2025-03-10 13:41:48 -05:00
Ma Ke
804443c1f2 PCI: Fix reference leak in pci_register_host_bridge()
If device_register() fails, call put_device() to give up the reference to
avoid a memory leak, per the comment at device_register().

Found by code review.

Link: https://lore.kernel.org/r/20250225021440.3130264-1-make24@iscas.ac.cn
Fixes: 37d6a0a6f4 ("PCI: Add pci_register_host_bridge() interface")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
[bhelgaas: squash Dan Carpenter's double free fix from
https://lore.kernel.org/r/db806a6c-a91b-4e5a-a84b-6b7e01bdac85@stanley.mountain]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
2025-03-10 13:41:48 -05:00
Bjorn Helgaas
a7eb9124d9 PCI: Cache offset of Resizable BAR capability
Previously most resizable BAR interfaces (pci_rebar_get_possible_sizes(),
pci_rebar_set_size(), etc) as well as pci_restore_state() searched config
space for a Resizable BAR capability.  Most devices don't have such a
capability, so this is wasted effort, especially for pci_restore_state().

Search for a Resizable BAR capability once at enumeration-time and cache
the offset so we don't have to search every time we need it.  No functional
change intended.

Link: https://lore.kernel.org/r/20250215000301.175097-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-03-10 13:41:47 -05:00
Bjorn Helgaas
3f8c4959fc PCI: Enable Configuration RRS SV early
Following a reset, a Function may respond to Config Requests with Request
Retry Status (RRS) Completion Status to indicate that it is temporarily
unable to process the Request, but will be able to process the Request in
the future (PCIe r6.0, sec 2.3.1).

If the Configuration RRS Software Visibility feature is enabled and a Root
Complex receives RRS for a config read of the Vendor ID, the Root Complex
completes the Request to the host by returning PCI_VENDOR_ID_PCI_SIG,
0x0001 (sec 2.3.2).

The Config RRS SV feature applies only to Root Ports and is not directly
related to pci_scan_bridge_extend().  Move the RRS SV enable to
set_pcie_port_type() where we handle other PCIe-specific configuration.

Link: https://lore.kernel.org/r/20250303210217.199504-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-10 13:41:39 -05:00
Herve Codina
1f34072441 PCI: of: Create device tree PCI host bridge node
PCI devices device tree nodes can be already created. This was introduced
by commit 407d1a5192 ("PCI: Create device tree node for bridge").

In order to have device tree nodes related to PCI devices attached on their
PCI root bus (the PCI bus handled by the PCI host bridge), a PCI root bus
device tree node is needed. This root bus node will be used as the parent
node of the first level devices scanned on the bus. On device tree based
systems, this PCI root bus device tree node is set to the node of the
related PCI host bridge. The PCI host bridge node is available in the
device tree used to describe the hardware passed at boot.

On non device tree based system (such as ACPI), a device tree node for the
PCI host bridge or for the root bus does not exist. Indeed, the PCI host
bridge is not described in a device tree used at boot simply because no
device tree is passed at boot.

The device tree PCI host bridge node creation needs to be done at runtime.
This is done in the same way as for the creation of the PCI device nodes.
I.e. node and properties are created based on computed information done by
the PCI core. Also, as is done on device tree based systems, this PCI host
bridge node is used for the PCI root bus.

With this done, hardware available in a PCI device that doesn't follow the
PCI model consisting in one PCI function handled by one driver can be
described by a device tree overlay loaded by the PCI device driver on non
device tree based systems. Those PCI devices provide a single PCI function
that includes several functionalities that require different drivers. The
device tree overlay describes the internal devices and their relationships.
It allows to load drivers needed by those different devices in order to
have functionalities handled.

Link: https://lore.kernel.org/r/20250224141356.36325-6-herve.codina@bootlin.com
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
2025-02-28 15:13:51 -06:00
Ilpo Järvinen
79c731e20d PCI: Track Flit Mode Status & print it with link status
PCIe r6.0 added Flit mode, which mainly alters HW behavior, but there are
some OS visible changes. The OS visible changes include differences in the
layout of some capabilities and interpretation of the TLP headers (in
diagnostics situations).

To be able to determine which mode the PCIe Link is using, store the Flit
Mode Status (PCIe r6.1 sec 7.5.3.20) information in addition to the Link
speed into struct pci_bus in pcie_update_link_speed().

Link: https://lore.kernel.org/r/20250207161836.2755-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: use unsigned int:1 instead of bool, update flit_mode setting]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-02-21 17:31:35 -06:00
Manivannan Sadhasivam
2489eeb777 PCI/pwrctrl: Skip scanning for the device further if pwrctrl device is created
The pwrctrl core will rescan the bus once the device is powered on. So
there is no need to continue scanning for the device further.

Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250116-pci-pwrctrl-slot-v3-3-827473c8fbf4@linaro.org
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-02-20 10:59:10 +00:00
Manivannan Sadhasivam
957f40d039 PCI/pwrctrl: Move creation of pwrctrl devices to pci_scan_device()
Current way of creating pwrctrl devices requires iterating through the
child devicetree nodes of the PCI bridge in pci_pwrctrl_create_devices().
Even though it works, it creates confusion as there is no symmetry between
this and pci_pwrctrl_unregister() function that removes the pwrctrl
devices.

So to make these two functions symmetric, move the creation of pwrctrl
devices to pci_scan_device(). During the scan of each device in a slot,
the devicetree node (if exists) for the PCI device will be checked. If it
has the supplies populated, then the pwrctrl device will be created.

Since the PCI device scan happens so early, there would be no "struct
pci_dev" available for the device. So the host bridge is used as the
parent of all pwrctrl devices.

One nice side effect of this move is that, it is now possible to have
pwrctrl devices for PCI bridges as well (to control the supplies of PCI
slots).

Suggested-by: Lukas Wunner <lukas@wunner.de>
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250116-pci-pwrctrl-slot-v3-1-827473c8fbf4@linaro.org
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-02-20 10:59:02 +00:00
Alex Williamson
472ff48e2c PCI: Fix BUILD_BUG_ON usage for old gcc
As reported in the below link, it seems older versions of gcc cannot
determine that the howmany variable is known for all callers.  Include
a test so that newer compilers can enforce this sanity check and older
compilers can still work.  Add __always_inline attribute to give the
compiler an even better chance to know the inputs.

Link: https://lore.kernel.org/r/20250212185337.293023-1-alex.williamson@redhat.com
Fixes: 4453f36086 ("PCI: Batch BAR sizing operations")
Reported-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/all/20250209154512.GA18688@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Mitchell Augustin <mitchell.augustin@canonical.com>
2025-02-12 16:16:21 -06:00
Bjorn Helgaas
f4a09274c5 Merge branch 'pci/err'
- Unexport pcie_read_tlp_log() to encourage drivers to use PCI core logging
  rather than building their own (Ilpo Järvinen)

- Move TLP Log handling to its own file (Ilpo Järvinen)

- Add #defines for TLP Header/Prefix log sizes (Ilpo Järvinen)

- Store number of supported End-End TLP Prefixes always so we can read the
  correct number of DWORDs from the TLP Prefix Log (Ilpo Järvinen)

- Read TLP Prefixes in addition to the Header Log in pcie_read_tlp_log()
  (Ilpo Järvinen)

- Add pcie_print_tlp_log() to consolidate printing of TLP Header and Prefix
  Log (Ilpo Järvinen)

* pci/err:
  PCI: Add pcie_print_tlp_log() to print TLP Header and Prefix Log
  PCI: Add TLP Prefix reading to pcie_read_tlp_log()
  PCI: Store number of supported End-End TLP Prefixes
  PCI: Use unsigned int i in pcie_read_tlp_log()
  PCI: Use same names in pcie_read_tlp_log() prototype and definition
  PCI: Add defines for TLP Header/Prefix log sizes
  PCI: Move TLP Log handling to its own file
  PCI: Don't expose pcie_read_tlp_log() outside PCI subsystem
2025-01-23 13:04:50 -06:00