After recovering from a PCI error through reset, affected devices are in
D0_uninitialized state and need to be brought into D0_active state by
re-initializing their Config Space registers (PCIe r7.0 sec 5.3.1.1).
To facilitate that, the PCI core provides pci_restore_state() and
pci_save_state() helpers. Document rules governing their usage.
As Bjorn notes, so far no file in "Documentation/ includes anything about
the idea of a driver using pci_save_state() to capture the state it wants
to restore after an error", even though it is a common pattern in drivers.
So that's obviously a gap that should be closed.
Reported-by: Bjorn Helgaas <helgaas@kernel.org>
Closes: https://lore.kernel.org/r/20251113161556.GA2284238@bhelgaas/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/077596ba70202be0e43fdad3bb9b93d356cbe4ec.1763746079.git.lukas@wunner.de
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Add PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() macros that
take config space accessor functions.
Implement pci_find_capability(), pci_find_ext_capability(), and
dwc, dwc endpoint, and cadence capability search interfaces with
them (Hans Zhang)
- Leave parent unit address 0 in 'interrupt-map' so that when we
build devicetree nodes to describe PCI functions that contain
multiple peripherals, we can build this property even when
interrupt controllers lack 'reg' properties (Lorenzo Pieralisi)
- Add a Xeon 6 quirk to disable Extended Tags and limit Max Read
Request Size to 128B to avoid a performance issue (Ilpo Järvinen)
- Add sysfs 'serial_number' file to expose the Device Serial Number
(Matthew Wood)
- Fix pci_acpi_preserve_config() memory leak (Nirmoy Das)
Resource management:
- Align m68k pcibios_enable_device() with other arches (Ilpo
Järvinen)
- Remove sparc pcibios_enable_device() implementations that don't do
anything beyond what pci_enable_resources() does (Ilpo Järvinen)
- Remove mips pcibios_enable_resources() and use
pci_enable_resources() instead (Ilpo Järvinen)
- Clean up bridge window sizing and assignment (Ilpo Järvinen),
including:
- Leave non-claimed bridge windows disabled
- Enable bridges even if a window wasn't assigned because not all
windows are required by downstream devices
- Preserve bridge window type when releasing the resource, since
the type is needed for reassignment
- Consolidate selection of bridge windows into two new
interfaces, pbus_select_window() and
pbus_select_window_for_type(), so this is done consistently
- Compute bridge window start and end earlier to avoid logging
stale information
MSI:
- Add quirk to disable MSI on RDC PCI to PCIe bridges (Marcos Del Sol
Vives)
Error handling:
- Align AER with EEH by allowing drivers to request a Bus Reset on
Non-Fatal Errors (in addition to the reset on Fatal Errors that we
already do) (Lukas Wunner)
- If error recovery fails, emit FAILED_RECOVERY uevents for the
devices, not for the bridge leading to them.
This makes them correspond to BEGIN_RECOVERY uevents (Lukas Wunner)
- Align AER with EEH by calling err_handler.error_detected()
callbacks to notify drivers if error recovery fails (Lukas Wunner)
- Align AER with EEH by restoring device error_state to
pci_channel_io_normal before the err_handler.slot_reset() callback.
This is earlier than before the err_handler.resume() callback
(Lukas Wunner)
- Emit a BEGIN_RECOVERY uevent when driver's
err_handler.error_detected() requests a reset, as well as when it
says recovery is complete or can be done without a reset (Niklas
Schnelle)
- Align s390 with AER and EEH by emitting uevents during error
recovery (Niklas Schnelle)
- Align EEH with AER and s390 by emitting BEGIN_RECOVERY,
SUCCESSFUL_RECOVERY, or FAILED_RECOVERY uevents depending on the
result of err_handler.error_detected() (Niklas Schnelle)
- Fix a NULL pointer dereference in aer_ratelimit() when ACPI GHES
error information identifies a device without an AER Capability
(Breno Leitao)
- Update error decoding and TLP Log printing for new errors in
current PCIe base spec (Lukas Wunner)
- Update error recovery documentation to match the current code
and use consistent nomenclature (Lukas Wunner)
ASPM:
- Enable all ClockPM and ASPM states for devicetree platforms, since
there's typically no firmware that enables ASPM
This is a risky change that may uncover hardware or configuration
defects at boot-time rather than when users enable ASPM via sysfs
later. Booting with "pcie_aspm=off" prevents this enabling
(Manivannan Sadhasivam)
- Remove the qcom code that enabled ASPM (Manivannan Sadhasivam)
Power management:
- If a device has already been disconnected, e.g., by a hotplug
removal, don't bother trying to resume it to D0 when detaching the
driver.
This avoids annoying "Unable to change power state from D3cold to
D0" messages (Mario Limonciello)
- Ensure devices are powered up before config reads for
'max_link_width', 'current_link_speed', 'current_link_width',
'secondary_bus_number', and 'subordinate_bus_number' sysfs files.
This prevents using invalid data (~0) in drivers or lspci and,
depending on how the PCIe controller reports errors, may avoid
error interrupts or crashes (Brian Norris)
Virtualization:
- Add rescan/remove locking when enabling/disabling SR-IOV, which
avoids list corruption on s390, where disabling SR-IOV also
generates hotplug events (Niklas Schnelle)
Peer-to-peer DMA:
- Free struct p2p_pgmap, not a member within it, in the
pci_p2pdma_add_resource() error path (Sungho Kim)
Endpoint framework:
- Document sysfs interface for BAR assignment of vNTB endpoint
functions (Jerome Brunet)
- Fix array underflow in endpoint BAR test case (Dan Carpenter)
- Skip endpoint IRQ test if the IRQ is out of range to avoid false
errors (Christian Bruel)
- Fix endpoint test case for controllers with fixed-size BARs smaller
than requested by the test (Marek Vasut)
- Restore inbound translation when disabling doorbell so the endpoint
doorbell test case can be run more than once (Niklas Cassel)
- Avoid a NULL pointer dereference when releasing DMA channels in
endpoint DMA test case (Shin'ichiro Kawasaki)
- Convert tegra194 interrupt number to MSI vector to fix endpoint
Kselftest MSI_TEST test case (Niklas Cassel)
- Reset tegra194 BARs when running in endpoint mode so the BAR tests
don't overwrite the ATU settings in BAR4 (Niklas Cassel)
- Handle errors in tegra194 BPMP transactions so we don't mistakenly
skip future PERST# assertion (Vidya Sagar)
AMD MDB PCIe controller driver:
- Update DT binding example to separate PERST# to a Root Port stanza
to make multiple Root Ports possible in the future (Sai Krishna
Musham)
- Add driver support for PERST# being described in a Root Port
stanza, falling back to the host bridge if not found there (Sai
Krishna Musham)
Freescale i.MX6 PCIe controller driver:
- Enable the 3.3V Vaux supply if available so devices can request
wakeup with either Beacon or WAKE# (Richard Zhu)
MediaTek PCIe Gen3 controller driver:
- Add optional sys clock ready time setting to avoid sys_clk_rdy
signal glitching in MT6991 and MT8196 (AngeloGioacchino Del Regno)
- Add DT binding and driver support for MT6991 and MT8196
(AngeloGioacchino Del Regno)
NVIDIA Tegra PCIe controller driver:
- When asserting PERST#, disable the controller instead of mistakenly
disabling the PLL twice (Nagarjuna Kristam)
- Convert struct tegra_msi mask_lock to raw spinlock to avoid a lock
nesting error (Marek Vasut)
Qualcomm PCIe controller driver:
- Select PCI Power Control Slot driver so slot voltage rails can be
turned on/off if described in Root Port devicetree node (Qiang Yu)
- Parse only PCI bridge child nodes in devicetree, skipping unrelated
nodes such as OPP (Operating Performance Points), which caused
probe failures (Krishna Chaitanya Chundru)
- Add 8.0 GT/s and 32.0 GT/s equalization settings (Ziyue Zhang)
- Consolidate Root Port 'phy' and 'reset' properties in struct
qcom_pcie_port, regardless of whether we got them from the Root
Port node or the host bridge node (Manivannan Sadhasivam)
- Fetch and map the ELBI register space in the DWC core rather than
in each driver individually (Krishna Chaitanya Chundru)
- Enable ECAM mechanism in DWC core by setting up iATU with 'CFG
Shift Feature' and use this in the qcom driver (Krishna Chaitanya
Chundru)
- Add SM8750 compatible to qcom,pcie-sm8550.yaml (Krishna Chaitanya
Chundru)
- Update qcom,pcie-x1e80100.yaml to allow fifth PCIe host on Qualcomm
Glymur, which is compatible with X1E80100 but doesn't have the
cnoc_sf_axi clock (Qiang Yu)
Renesas R-Car PCIe controller driver:
- Fix a typo that prevented correct PHY initialization (Marek Vasut)
- Add a missing 1ms delay after PWR reset assertion as required by
the V4H manual (Marek Vasut)
- Assure reset has completed before DBI access to avoid SError (Marek
Vasut)
- Fix inverted PHY initialization check, which sometimes led to
timeouts and failure to start the controller (Marek Vasut)
- Pass the correct IRQ domain to generic_handle_domain_irq() to fix a
regression when converting to msi_create_parent_irq_domain()
(Claudiu Beznea)
- Drop the spinlock protecting the PMSR register - it's no longer
required since pci_lock already serializes accesses (Marek Vasut)
- Convert struct rcar_msi mask_lock to raw spinlock to avoid a lock
nesting error (Marek Vasut)
SOPHGO PCIe controller driver:
- Check for existence of struct cdns_pcie.ops before using it to
allow Cadence drivers that don't need to supply ops (Chen Wang)
- Add DT binding and driver for the SOPHGO SG2042 PCIe controller
(Chen Wang)
STMicroelectronics STM32MP25 PCIe controller driver:
- Update pinctrl documentation of initial states and use in runtime
suspend/resume (Christian Bruel)
- Add pinctrl_pm_select_init_state() for use by stm32 driver, which
needs it during resume (Christian Bruel)
- Add devicetree bindings and drivers for the STMicroelectronics
STM32MP25 in host and endpoint modes (Christian Bruel)
Synopsys DesignWare PCIe controller driver:
- Add support for x16 in devicetree 'num-lanes' property (Konrad
Dybcio)
- Verify that if DT specifies a single IRQ for all eDMA channels, it
is named 'dma' (Niklas Cassel)
TI J721E PCIe driver:
- Add MODULE_DEVICE_TABLE() so driver can be autoloaded (Siddharth
Vadapalli)
- Power controller off before configuring the glue layer so the
controller latches the correct values on power-on (Siddharth
Vadapalli)
TI Keystone PCIe controller driver:
- Use devm_request_irq() so 'ks-pcie-error-irq' is freed when driver
exits with error (Siddharth Vadapalli)
- Add Peripheral Virtualization Unit (PVU), which restricts DMA from
PCIe devices to specific regions of host memory, to the ti,am65
binding (Jan Kiszka)
Xilinx NWL PCIe controller driver:
- Clear bootloader E_ECAM_CONTROL before merging in the new driver
value to avoid writing invalid values (Jani Nurminen)"
* tag 'pci-v6.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (141 commits)
PCI/AER: Avoid NULL pointer dereference in aer_ratelimit()
MAINTAINERS: Add entry for ST STM32MP25 PCIe drivers
PCI: stm32-ep: Add PCIe Endpoint support for STM32MP25
dt-bindings: PCI: Add STM32MP25 PCIe Endpoint bindings
PCI: stm32: Add PCIe host support for STM32MP25
PCI: xilinx-nwl: Fix ECAM programming
PCI: j721e: Fix incorrect error message in probe()
PCI: keystone: Use devm_request_irq() to free "ks-pcie-error-irq" on exit
dt-bindings: PCI: qcom,pcie-x1e80100: Set clocks minItems for the fifth Glymur PCIe Controller
PCI: dwc: Support 16-lane operation
PCI: Add lockdep assertion in pci_stop_and_remove_bus_device()
PCI/IOV: Add PCI rescan-remove locking when enabling/disabling SR-IOV
PCI: rcar-host: Convert struct rcar_msi mask_lock into raw spinlock
PCI: tegra194: Rename 'root_bus' to 'root_port_bus' in tegra_pcie_downstream_dev_to_D0()
PCI: tegra: Convert struct tegra_msi mask_lock into raw spinlock
PCI: rcar-gen4: Fix inverted break condition in PHY initialization
PCI: rcar-gen4: Assure reset occurs before DBI access
PCI: rcar-gen4: Add missing 1ms delay after PWR reset assertion
PCI: Set up bridge resources earlier
PCI: rcar-host: Drop PMSR spinlock
...
- Check for errors returned from pci_epc_get(), which returns IS_ERR(), not
NULL on error (Dan Carpenter)
- Fix pci_endpoint_test_ioctl() array underflow (Dan Carpenter)
- Document sysfs interface for BAR assignment of vNTB endpoint functions
(Jerome Brunet)
- Drop superfluous pci_epc_features initialization for unsupported
features; we only have to mention features that *are* supported (Niklas
Cassel)
- Skip IRQ tests if the IRQ is out of range (Christian Bruel)
- Fix pci-epf-test for controllers with fixed-size BARs smaller than
requested by the test (Marek Vasut)
- Restore inbound translation when disabling doorbell so the doorbell test
case can be run more than once (Niklas Cassel)
- Check for NULL before releasing DMA channels to avoid a NULL pointer
dereference (Shin'ichiro Kawasaki)
- Convert tegra194 interrupt number to MSI vector to fix endpoint Kselftest
MSI_TEST test case (Niklas Cassel)
- Set tegra_pcie_epc_features.msi_capable so the pci_endpoint_test can use
the optimal IRQ type (Niklas Cassel)
- Reset tegra194 BARs when running in endpoint mode so the BAR tests don't
overwrite the ATU settings in BAR4 (Niklas Cassel)
- Handle errors in tegra194 BPMP transactions so we don't mistakenly skip
future PERST# assertion (Vidya Sagar)
* pci/endpoint:
PCI: tegra194: Handle errors in BPMP response
PCI: tegra194: Reset BARs when running in PCIe endpoint mode
PCI: tegra194: Set pci_epc_features::msi_capable to true
PCI: tegra194: Fix broken tegra_pcie_ep_raise_msi_irq()
PCI: endpoint: pci-epf-test: Add NULL check for DMA channels before release
PCI: endpoint: pci-epf-test: Fix doorbell test support
PCI: endpoint: pci-epf-test: Limit PCIe BAR size for fixed BARs
selftests: pci_endpoint: Skip IRQ test if IRQ is out of range.
misc: pci_endpoint_test: Cleanup extra 0 initialization
misc: pci_endpoint_test: Skip IRQ tests if irq is out of range
PCI: endpoint: Drop superfluous pci_epc_features initialization
Documentation: PCI: endpoint: Document BAR assignment
misc: pci_endpoint_test: Fix array underflow in pci_endpoint_test_ioctl()
PCI: endpoint: pci-ep-msi: Fix NULL vs IS_ERR() check in pci_epf_write_msi_msg()
Amend the documentation on PCI error recovery with specifics about
Downstream Port Containment and Advanced Error Reporting:
* Explain that with DPC, devices are inaccessible upon an error (similar
to EEH on powerpc) and do not become accessible until the link is
re-enabled.
* Explain that with AER, although devices may already be accessible in the
->error_detected() callback, accesses should be deferred to the
->mmio_enabled() callback for compatibility with EEH on powerpc and with
s390.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/61d8eeadb20ee71c3a852f44c863bfe0209c454d.1757942121.git.lukas@wunner.de
Amend the documentation on PCI error recovery to fix minor inaccuracies
vis-à-vis the actual code:
* The documentation claims that a missing ->resume() or ->mmio_enabled()
callback always leads to recovery through reset. But none of the
implementations do this (pcie_do_recovery(), eeh_handle_normal_event(),
zpci_event_do_error_state_clear()).
Drop the claim to align the documentation with the code.
* The documentation does not list PCI_ERS_RESULT_RECOVERED as a valid
return value from ->error_detected(). But none of the implementations
forbid this and some drivers are returning it, e.g.:
drivers/bus/mhi/host/pci_generic.c
drivers/infiniband/hw/hfi1/pcie.c
Further down in the documentation it is implied that the return value is
in fact allowed:
"The platform will call the resume() callback on all affected device
drivers if all drivers on the segment have returned
PCI_ERS_RESULT_RECOVERED from one of the 3 previous callbacks."
The "3 previous callbacks" being ->error_detected(), ->mmio_enabled()
and ->slot_reset().
Add it to the valid return values for consistency.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/ed3c3385499775fcc25f1ee66f395e212919f94a.1757942121.git.lukas@wunner.de
The PCIe Advanced Error Reporting driver has evolved over the years but
its documentation hasn't. Catch up with past code changes:
* The documentation claims that Correctable Errors are logged with
KERN_INFO severity, but the code uses KERN_WARN.
It had used KERN_WARN from the beginning with commit 6c2b374d74
("PCI-Express AER implemetation: AER core and aerdriver"). In 2013,
commit 2cced2d959 ("aerdrv: Cleanup log output for AER") switched to
KERN_ERR, until 2020 when it was reverted back to KERN_WARN by commit
e83e2ca3c3 ("PCI/AER: Log correctable errors as warning, not error").
* An example log message in the documentation uses the term "Uncorrected",
but the code uses "Uncorrectable" since commit 02a06f5f1a ("PCI/AER:
Use 'Correctable' and 'Uncorrectable' spec terms for errors").
* The example contains the Requester ID "id=0500", which is omitted since
commit 010caed4cc ("PCI/AER: Decode Error Source Requester ID").
* The example contains the error name "Unsupported Request", which is
instead reported as "UnsupReq" since commit bd237801fe ("PCI/AER:
Adopt lspci names for AER error decoding").
* The example doesn't prepend "0x" to hex values from the TLP Header Log,
as introduced by commit f68ea779d9 ("PCI: Add pcie_print_tlp_log() to
print TLP Header and Prefix Log").
* The documentation refers to a reset_link callback which was removed by
commit b6cf1a42f9 ("PCI/ERR: Remove service dependency in
pcie_do_recovery()").
* Commit 5790862255 ("PCI/ERR: Recover from RCiEP AER errors") added
support to recover Root Complex Integrated Endpoints by applying a
Function Level Reset, alternatively to the Secondary Bus Reset which is
applied otherwise.
* On non-fatal errors, a reset was previously never performed. But the
AER driver has just been amended to allow drivers to opt in to a reset.
* The documentation claims that a warning message is logged if a driver
lacks pci_error_handlers. But the message has been informational
(logged with KERN_INFO severity) since its introduction with commit
01daacfb90 ("PCI/AER: Log which device prevents error recovery").
The documentation claims that the message is only logged for fatal
errors, which is incorrect. Moreover it refers to "section 3", even
though the documentation no longer contains section numbers since commit
4e37f055a9 ("Documentation: PCI: convert pcieaer-howto.txt to reST").
Section 3 is titled "Developer Guide". That's the same section where
the reference is located, so it is self-referential and can be dropped.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/7501bfc5b9920193a25998a3cbcf72c47674ec63.1757942121.git.lukas@wunner.de
- Describe endpoint BAR 4 as being fixed size (Jerome Brunet)
- Document how to obtain R-Car V4H (r8a779g0) controller firmware
(Yoshihiro Shimoda)
* pci/controller/rcar-gen4:
PCI: rcar-gen4: Document how to obtain platform firmware
PCI: rcar-gen4: set ep BAR4 fixed size
Allow userspace to read/write log ratelimits per device (including
enable/disable). Create aer/ sysfs directory to store them and any
future AER configs.
The new sysfs files are:
/sys/bus/pci/devices/*/aer/correctable_ratelimit_burst
/sys/bus/pci/devices/*/aer/correctable_ratelimit_interval_ms
/sys/bus/pci/devices/*/aer/nonfatal_ratelimit_burst
/sys/bus/pci/devices/*/aer/nonfatal_ratelimit_interval_ms
The default values are ratelimit_burst=10, ratelimit_interval_ms=5000, so
if we try to emit more than 10 messages in a 5 second period, some are
suppressed.
Update AER sysfs ABI filename to reflect the broader scope of AER sysfs
attributes (e.g. stats and ratelimits).
Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats ->
sysfs-bus-pci-devices-aer
Tested using aer-inject[1]. Configured correctable log ratelimit to 5.
Sent 6 AER errors. Observed 5 errors logged while AER stats
(cat /sys/bus/pci/devices/<dev>/aer_dev_correctable) shows 6.
Disabled ratelimiting and sent 6 more AER errors. Observed all 6 errors
logged and accounted in AER stats (12 total errors).
[1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git
[bhelgaas: note fatal errors are not ratelimited, "aer_report" ->
"aer_info", replace ratelimit_log_enable toggle with *_ratelimit_interval_ms]
Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
Signed-off-by: Jon Pan-Doh <pandoh@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20250522232339.1525671-21-helgaas@kernel.org
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Batch sizing of multiple BARs while memory decoding is disabled
instead of disabling/enabling decoding for each BAR individually;
this optimizes virtualized environments where toggling decoding
enable is expensive (Alex Williamson)
- Add host bridge .enable_device() and .disable_device() hooks for
bridges that need to configure things like Requester ID to StreamID
mapping when enabling devices (Frank Li)
- Extend struct pci_ecam_ops with .enable_device() and
.disable_device() hooks so drivers that use pci_host_common_probe()
instead of their own .probe() have a way to set the
.enable_device() callbacks (Marc Zyngier)
- Drop 'No bus range found' message so we don't complain when DTs
don't specify the default 'bus-range = <0x00 0xff>' (Bjorn Helgaas)
- Rename the drivers/pci/of_property.c struct of_pci_range to
of_pci_range_entry to avoid confusion with the global of_pci_range
in include/linux/of_address.h (Bjorn Helgaas)
Driver binding:
- Update resource request API documentation to encourage callers to
supply a driver name when requesting resources (Philipp Stanner)
- Export pci_intx_unmanaged() and pcim_intx() (always managed) so
callers of pci_intx() (which is sometimes managed) can explicitly
choose the one they need (Philipp Stanner)
- Convert drivers from pci_intx() to always-managed pcim_intx() or
never-managed pci_intx_unmanaged(): amd_sfh, ata (ahci, ata_piix,
pata_rdc, sata_sil24, sata_sis, sata_uli, sata_vsc), bnx2x, bna,
ntb, qtnfmac, rtsx, tifm_7xx1, vfio, xen-pciback (Philipp Stanner)
- Remove pci_intx_unmanaged() since pci_intx() is now always
unmanaged and pcim_intx() is always managed (Philipp Stanner)
Error handling:
- Unexport pcie_read_tlp_log() to encourage drivers to use PCI core
logging rather than building their own (Ilpo Järvinen)
- Move TLP Log handling to its own file (Ilpo Järvinen)
- Store number of supported End-End TLP Prefixes always so we can
read the correct number of DWORDs from the TLP Prefix Log (Ilpo
Järvinen)
- Read TLP Prefixes in addition to the Header Log in
pcie_read_tlp_log() (Ilpo Järvinen)
- Add pcie_print_tlp_log() to consolidate printing of TLP Header and
Prefix Log (Ilpo Järvinen)
- Quirk the Intel Raptor Lake-P PIO log size to accommodate vendor
BIOSes that don't configure it correctly (Takashi Iwai)
ASPM:
- Save parent L1 PM Substates config so when we restore it along with
an endpoint's config, the parent info isn't junk (Jian-Hong Pan)
Power management:
- Avoid D3 for Root Ports on TUXEDO Sirius Gen1 with old BIOS because
the system can't wake up from suspend (Werner Sembach)
Endpoint framework:
- Destroy the EPC device in devm_pci_epc_destroy(), which previously
didn't call devres_release() (Zijun Hu)
- Finish virtual EP removal in pci_epf_remove_vepf(), which
previously caused a subsequent pci_epf_add_vepf() to fail with
-EBUSY (Zijun Hu)
- Write BAR_MASK before iATU registers in pci_epc_set_bar() so we
don't depend on the BAR_MASK reset value being larger than the
requested BAR size (Niklas Cassel)
- Prevent changing BAR size/flags in pci_epc_set_bar() to prevent
reads from bypassing the iATU if we reduced the BAR size (Niklas
Cassel)
- Verify address alignment when programming iATU so we don't attempt
to write bits that are read-only because of the BAR size, which
could lead to directing accesses to the wrong address (Niklas
Cassel)
- Implement artpec6 pci_epc_features so we can rely on all drivers
supporting it so we can use it in EPC core code (Niklas Cassel)
- Check for BARs of fixed size to prevent endpoint drivers from
trying to change their size (Niklas Cassel)
- Verify that requested BAR size is a power of two when endpoint
driver sets the BAR (Niklas Cassel)
Endpoint framework tests:
- Clear pci-epf-test dma_chan_rx, not dma_chan_tx, after freeing
dma_chan_rx (Mohamed Khalfella)
- Correct the DMA MEMCPY test so it doesn't fail if the Endpoint
supports both DMA_PRIVATE and DMA_MEMCPY (Manivannan Sadhasivam)
- Add pci-epf-test and pci_endpoint_test support for capabilities
(Niklas Cassel)
- Add Endpoint test for consecutive BARs (Niklas Cassel)
- Remove redundant comparison from Endpoint BAR test because a > 1MB
BAR can always be exactly covered by iterating with a 1MB buffer
(Hans Zhang)
- Move and convert PCI Endpoint tests from tools/pci to Kselftests
(Manivannan Sadhasivam)
Apple PCIe controller driver:
- Convert StreamID mapping configuration from a bus notifier to the
.enable_device() and .disable_device() callbacks (Marc Zyngier)
Freescale i.MX6 PCIe controller driver:
- Add Requester ID to StreamID mapping configuration when enabling
devices (Frank Li)
- Use DWC core suspend/resume functions for imx6 (Frank Li)
- Add suspend/resume support for i.MX8MQ, i.MX8Q, and i.MX95 (Richard
Zhu)
- Add DT compatible string 'fsl,imx8q-pcie-ep' and driver support for
i.MX8Q series (i.MX8QM, i.MX8QXP, and i.MX8DXL) Endpoints (Frank
Li)
- Add DT binding for optional i.MX95 Refclk and driver support to
enable it if the platform hasn't enabled it (Richard Zhu)
- Configure PHY based on controller being in Root Complex or Endpoint
mode (Frank Li)
- Rely on dbi2 and iATU base addresses from DT via
dw_pcie_get_resources() instead of hardcoding them (Richard Zhu)
- Deassert apps_reset in imx_pcie_deassert_core_reset() since it is
asserted in imx_pcie_assert_core_reset() (Richard Zhu)
- Add missing reference clock enable or disable logic for IMX6SX,
IMX7D, IMX8MM (Richard Zhu)
- Remove redundant imx7d_pcie_init_phy() since
imx7d_pcie_enable_ref_clk() does the same thing (Richard Zhu)
Freescale Layerscape PCIe controller driver:
- Simplify by using syscon_regmap_lookup_by_phandle_args() instead
of syscon_regmap_lookup_by_phandle() followed by
of_property_read_u32_array() (Krzysztof Kozlowski)
Marvell MVEBU PCIe controller driver:
- Add MODULE_DEVICE_TABLE() to enable module autoloading (Liao Chen)
MediaTek PCIe Gen3 controller driver:
- Use clk_bulk_prepare_enable() instead of separate
clk_bulk_prepare() and clk_bulk_enable() (Lorenzo Bianconi)
- Rearrange reset assert/deassert so they're both done in the
*_power_up() callbacks (Lorenzo Bianconi)
- Document that Airoha EN7581 requires PHY init and power-on before
PHY reset deassert, unlike other MediaTek Gen3 controllers (Lorenzo
Bianconi)
- Move Airoha EN7581 post-reset delay from the en7581 clock .enable()
method to mtk_pcie_en7581_power_up() (Lorenzo Bianconi)
- Sleep instead of delay during Airoha EN7581 power-up, since this is
a non-atomic context (Lorenzo Bianconi)
- Skip PERST# assertion on Airoha EN7581 during probe and
suspend/resume to avoid a hardware defect (Lorenzo Bianconi)
- Enable async probe to reduce system startup time (Douglas Anderson)
Microchip PolarFlare PCIe controller driver:
- Set up the inbound address translation based on whether the
platform allows coherent or non-coherent DMA (Daire McNamara)
- Update DT binding such that platforms are DMA-coherent by default
and must specify 'dma-noncoherent' if needed (Conor Dooley)
Mobiveil PCIe controller driver:
- Convert mobiveil-pcie.txt to YAML and update 'interrupt-names'
and 'reg-names' (Frank Li)
Qualcomm PCIe controller driver:
- Add DT SM8550 and SM8650 optional 'global' interrupt for link
events (Neil Armstrong)
- Add DT 'compatible' strings for IPQ5424 PCIe controller (Manikanta
Mylavarapu)
- If 'global' IRQ is supported for detection of Link Up events, tell
DWC core not to wait for link up (Krishna chaitanya chundru)
Renesas R-Car PCIe controller driver:
- Avoid passing stack buffer as resource name (King Dix)
Rockchip PCIe controller driver:
- Simplify clock and reset handling by using bulk interfaces (Anand
Moon)
- Pass typed rockchip_pcie (not void) pointer to
rockchip_pcie_disable_clocks() (Anand Moon)
- Return -ENOMEM, not success, when pci_epc_mem_alloc_addr() fails
(Dan Carpenter)
Rockchip DesignWare PCIe controller driver:
- Use dll_link_up IRQ to detect Link Up and enumerate devices so
users don't have to manually rescan (Niklas Cassel)
- Tell DWC core not to wait for link up since the 'sys' interrupt is
required and detects Link Up events (Niklas Cassel)
Synopsys DesignWare PCIe controller driver:
- Don't wait for link up in DWC core if driver can detect Link Up
event (Krishna chaitanya chundru)
- Update ICC and OPP votes after Link Up events (Krishna chaitanya
chundru)
- Always stop link in dw_pcie_suspend_noirq(), which is required at
least for i.MX8QM to re-establish link on resume (Richard Zhu)
- Drop racy and unnecessary LTSSM state check before sending
PME_TURN_OFF message in dw_pcie_suspend_noirq() (Richard Zhu)
- Add struct of_pci_range.parent_bus_addr for devices that need their
immediate parent bus address, not the CPU address, e.g., to program
an internal Address Translation Unit (iATU) (Frank Li)
TI DRA7xx PCIe controller driver:
- Simplify by using syscon_regmap_lookup_by_phandle_args() instead of
syscon_regmap_lookup_by_phandle() followed by
of_parse_phandle_with_fixed_args() or of_property_read_u32_index()
(Krzysztof Kozlowski)
Xilinx Versal CPM PCIe controller driver:
- Add DT binding and driver support for Xilinx Versal CPM5
(Thippeswamy Havalige)
MicroSemi Switchtec management driver:
- Add Microchip PCI100X device IDs (Rakesh Babu Saladi)
Miscellaneous:
- Move reset related sysfs code from pci.c to pci-sysfs.c where other
similar code lives (Ilpo Järvinen)
- Simplify reset_method_store() memory management by using __free()
instead of explicit kfree() cleanup (Ilpo Järvinen)
- Constify struct bin_attribute for sysfs, VPD, P2PDMA, and the IBM
ACPI hotplug driver (Thomas Weißschuh)
- Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT (Dongdong
Zhang)
- Correct documentation of the 'config_acs=' kernel parameter
(Akihiko Odaki)"
* tag 'pci-v6.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (111 commits)
PCI: Batch BAR sizing operations
dt-bindings: PCI: microchip,pcie-host: Allow dma-noncoherent
PCI: microchip: Set inbound address translation for coherent or non-coherent mode
Documentation: Fix pci=config_acs= example
PCI: Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT
PCI: Don't include 'pm_wakeup.h' directly
selftests: pci_endpoint: Migrate to Kselftest framework
selftests: Move PCI Endpoint tests from tools/pci to Kselftests
misc: pci_endpoint_test: Fix IOCTL return value
dt-bindings: PCI: qcom: Document the IPQ5424 PCIe controller
dt-bindings: PCI: qcom,pcie-sm8550: Document 'global' interrupt
dt-bindings: PCI: mobiveil: Convert mobiveil-pcie.txt to YAML
PCI: switchtec: Add Microchip PCI100X device IDs
misc: pci_endpoint_test: Remove redundant 'remainder' test
misc: pci_endpoint_test: Add consecutive BAR test
misc: pci_endpoint_test: Add support for capabilities
PCI: endpoint: pci-epf-test: Add support for capabilities
PCI: endpoint: pci-epf-test: Fix check for DMA MEMCPY test
PCI: endpoint: pci-epf-test: Set dma_chan_rx pointer to NULL on error
PCI: dwc: Simplify config resource lookup
...
Add a documentation file
(Documentation/nvme/nvme-pci-endpoint-target.rst) for the new NVMe PCI
endpoint target driver. This provides an overview of the driver
requirements, capabilities and limitations. A user guide describing how
to setup a NVMe PCI endpoint device using this driver is also provided.
This document is made accessible also from the PCI endpoint
documentation using a link. Furthermore, since the existing nvme
documentation was not accessible from the top documentation index, an
index file is added to Documentation/nvme and this index listed as
"NVMe Subsystem" in the "Storage interfaces" section of the subsystem
API index.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
- Add pci_epc_function_is_valid() to avoid repeating common validation
checks (Damien Le Moal)
- Skip attempts to allocate from endpoint controller memory window if the
requested size is larger than the window (Damien Le Moal)
- Add and document pci_epc_mem_map() and pci_epc_mem_unmap() to handle
controller-specific size and alignment constraints, and add test cases to
the endpoint test driver (Damien Le Moal)
- Implement dwc pci_epc_ops.align_addr() so pci_epc_mem_map() can observe
DWC-specific alignment requirements (Damien Le Moal)
- Synchronously cancel command handler work in endpoint test before
cleaning up DMA and BARs (Damien Le Moal)
- Respect endpoint page size in dw_pcie_ep_align_addr() (Niklas Cassel)
- Use dw_pcie_ep_align_addr() in dw_pcie_ep_raise_msi_irq() and
dw_pcie_ep_raise_msix_irq() instead of open coding the equivalent (Niklas
Cassel)
- Remove superfluous 'return' from pci_epf_test_clean_dma_chan() (Wang
Jiang)
- Avoid NULL dereference if Modem Host Interface Endpoint lacks 'mmio' DT
property (Zhongqiu Han)
- Release PCI domain ID of Endpoint controller parent (not controller
itself) and before unregistering the controller, to avoid use-after-free
(Zijun Hu)
- Clear secondary (not primary) EPC in pci_epc_remove_epf() when removing
the secondary controller associated with an NTB (Zijun Hu)
- Fix pci_epc_map map_size kerneldoc (Rick Wertenbroek)
* pci/endpoint:
PCI: endpoint: Fix pci_epc_map map_size kerneldoc string
PCI: endpoint: Clear secondary (not primary) EPC in pci_epc_remove_epf()
PCI: endpoint: Fix PCI domain ID release in pci_epc_destroy()
PCI: endpoint: epf-mhi: Avoid NULL dereference if DT lacks 'mmio'
PCI: endpoint: Remove surplus return statement from pci_epf_test_clean_dma_chan()
PCI: dwc: ep: Use align addr function for dw_pcie_ep_raise_{msi,msix}_irq()
PCI: endpoint: test: Synchronously cancel command handler work
PCI: dwc: endpoint: Implement the pci_epc_ops::align_addr() operation
PCI: endpoint: test: Use pci_epc_mem_map/unmap()
PCI: endpoint: Update documentation
PCI: endpoint: Introduce pci_epc_mem_map()/unmap()
PCI: endpoint: Improve pci_epc_mem_alloc_addr()
PCI: endpoint: Introduce pci_epc_function_is_valid()
- Add and document TLP Processing Hints (TPH) support so drivers can enable
and disable TPH and the kernel can save/restore TPH configuration (Wei
Huang)
- Add TPH Steering Tag support so drivers can retrieve Steering Tag values
associated with specific CPUs via an ACPI _DSM to direct DMA writes
closer to their consumers (Wei Huang)
* pci/tph:
PCI/TPH: Add TPH documentation
PCI/TPH: Add Steering Tag support
PCI: Add TLP Processing Hints (TPH) support
- Remove unused struct 'acpi_handle_node' (Dr. David Alan Gilbert)
- Use array notation for portdrv .id_table consistently (Masahiro Yamada)
- Switch to new Intel CPU model defines (Tony Luck)
- Add missing MODULE_DESCRIPTION() macros (Jeff Johnson)
* pci/misc:
PCI: controller: Add missing MODULE_DESCRIPTION() macros
PCI: Add missing MODULE_DESCRIPTION() macros
PCI/PM: Switch to new Intel CPU model defines
PCI: Use array for .id_table consistently
ACPI: PCI: Remove unused struct 'acpi_handle_node'
- Clear bridge Secondary Status errors after enumeration since enumeration
causes many errors (Vidya Sagar)
- Wait for Link Training==0 before starting Link retrain to avoid a race;
this was done previously but broken by a faulty merge (Ilpo Järvinen)
- Rename PCI_IRQ_LEGACY to PCI_IRQ_INTX to be more specific about what
"LEGACY" means (Damien Le Moal)
- Update return types of pci_find_capability() stubs to match the extern
declarations for the actual implementations (Bjorn Helgaas)
- Drop unnecessary pci_enable_device_io() from pata_cs5520 (Heiner
Kallweit)
- Drop unused pci_enable_device_io() (Heiner Kallweit)
- On 2016 and newer BIOSes, skip early E820 check for ECAM regions
described in ACPI MCFG; there's no spec requirement for E820
reservations, and some machines don't provide them (Bjorn Helgaas)
- If devices were disconnected while suspended, don't wait for them when
resuming (Ilpo Järvinen)
* pci/enumeration:
PCI: Do not wait for disconnected devices when resuming
x86/pci: Skip early E820 check for ECAM region
PCI: Remove unused pci_enable_device_io()
ata: pata_cs5520: Remove unnecessary call to pci_enable_device_io()
PCI: Update pci_find_capability() stub return types
PCI: Remove PCI_IRQ_LEGACY
scsi: vmw_pvscsi: Do not use PCI_IRQ_LEGACY instead of PCI_IRQ_LEGACY
scsi: pmcraid: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
scsi: mpt3sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
scsi: megaraid_sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
scsi: ipr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
scsi: hpsa: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
scsi: arcmsr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
wifi: rtw89: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
wifi: rtw88: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
wifi: ath10k: Refer to INTX instead of LEGACY
net: wangxun: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
r8169: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
net: alx: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
net: atlantic: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
net: amd-xgbe: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
VMCI: Use PCI_IRQ_ALL_TYPES to remove PCI_IRQ_LEGACY use
RDMA/vmw_pvrdma: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
IB/qib: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
drm/amdgpu: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
mfd: intel-lpss: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
ntb: idt: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
platform/x86: intel_ips: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
tty: 8250_pci: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
usb: hcd-pci: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
ASoC: Intel: avs: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
Documentation: PCI: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
PCI/portdrv: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
PCI/MSI: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
PCI: Clarify intent of LT wait
PCI: Wait for Link Training==0 before starting Link retrain
PCI: Clear Secondary Status errors after enumeration
- Reorder struct pci_dev to avoid holes and reduce size (Christophe
JAILLET)
- Change pdev->rom_attr_enabled to single bit since it's only a boolean
value (Christophe JAILLET)
- Use struct_size() in pirq_convert_irt_table() instead of hand-writing it
(Christophe JAILLET)
- Explicitly include correct DT includes to untangle headers (Rob Herring)
- Fix a DOE race between destroy_work_on_stack() and the stack-allocated
task->work struct going out of scope in pci_doe() (Ira Weiny)
- Use pci_dev_id() when possible instead of manually composing ID from
dev->bus->number and dev->devfn (Xiongfeng Wang, Zheng Zengkai)
- Move pci_create_resource_files() declarations to linux/pci.h for alpha
build warnings (Arnd Bergmann)
- Remove unused hotplug function declarations (Yue Haibing)
- Remove unused mvebu struct mvebu_pcie.busn (Pali Rohár)
- Unexport pcie_port_bus_type (Bjorn Helgaas)
- Remove unnecessary sysfs ID local variable initialization (Bjorn Helgaas)
- Fix BAR value printk formatting to accommodate 32-bit values (Bjorn
Helgaas)
- Use consistent pointer types for config access syscall get_user() and
put_user() uses (Bjorn Helgaas)
- Simplify AER_RECOVER_RING_SIZE definition (Bjorn Helgaas)
- Simplify pci_pio_to_address() (Bjorn Helgaas)
- Simplify pci_dev_driver() (Bjorn Helgaas)
- Fix pci_bus_resetable(), pci_slot_resetable() name typos (Bjorn Helgaas)
- Fix code and doc typos and code formatting (Bjorn Helgaas)
- Tidy config space save/restore messages (Bjorn Helgaas)
* pci/misc:
PCI: Tidy config space save/restore messages
PCI: Fix code formatting inconsistencies
PCI: Fix typos in docs and comments
PCI: Fix pci_bus_resetable(), pci_slot_resetable() name typos
PCI: Simplify pci_dev_driver()
PCI: Simplify pci_pio_to_address()
PCI/AER: Simplify AER_RECOVER_RING_SIZE definition
PCI: Use consistent put_user() pointer types
PCI: Fix printk field formatting
PCI: Remove unnecessary initializations
PCI: Unexport pcie_port_bus_type
PCI: mvebu: Remove unused busn member
PCI: Remove unused function declarations
PCI/sysfs: Move declarations to linux/pci.h
PCI/P2PDMA: Use pci_dev_id() to simplify the code
PCI/IOV: Use pci_dev_id() to simplify the code
PCI/AER: Use pci_dev_id() to simplify the code
PCI: apple: Use pci_dev_id() to simplify the code
PCI/DOE: Fix destroy_work_on_stack() race
PCI: Explicitly include correct DT includes
x86/PCI: Use struct_size() in pirq_convert_irt_table()
PCI: Change pdev->rom_attr_enabled to single bit
PCI: Reorder pci_dev fields to reduce holes
- Change "PCI Endpoint Virtual NTB driver" Kconfig prompt to be different
from "PCI Endpoint NTB driver" (Shunsuke Mie)
- Automatically create a function specific attributes group for endpoint
drivers to avoid reference counting issues (Damien Le Moal)
- Move and unexport pci_epf_type_add_cfs() (Damien Le Moal)
- Reinitialize EPF test DMA transfer completion before submitting it to
avoid losing the completion notification (Damien Le Moal)
- Fix EPF test DMA transfer completion detection (Damien Le Moal)
- Submit EPF test DMA transfers with dmaengine_submit(), not tx_submit()
(Damien Le Moal)
- Simplify EPF test read/write/copy functions (Damien Le Moal)
- Simplify EPF test "raise IRQ" interface (Damien Le Moal)
- Simplify EPF test IRQ command execution (Damien Le Moal)
- Improve EPF test command/status register handling (Damien Le Moal)
- Free IRQs before removing device (Damien Le Moal)
- Reinitialize IRQ completions for every test (Damien Le Moal)
- Don't write status in IRQ handler to avoid race (Damien Le Moal)
- Fix dma_chan direction in data transfer test (Yoshihiro Shimoda)
- Return pci_epf_type_add_cfs() error if EPF has no driver (Damien Le Moal)
- Add kernel-doc for pci_epc_raise_irq() and pci_epc_map_msi_irq() MSI
vector parameters (Manivannan Sadhasivam)
- Pass EPF device ID to driver probe functions (Manivannan Sadhasivam)
- Return -EALREADY if EPC has already been started/stopped (Manivannan
Sadhasivam)
- Add linkdown notifier support and use it in qcom-ep (Manivannan
Sadhasivam)
- Add Bus Master Enable event support and use it in qcom-ep (Manivannan
Sadhasivam)
- Add Qualcomm Modem Host Interface (MHI) endpoint driver (Manivannan
Sadhasivam)
- Add Layerscape PME interrupt handling to manage link-up notification
(Frank Li)
* pci/controller/endpoint:
PCI: layerscape: Add the endpoint linkup notifier support
PCI: endpoint: pci-epf-vntb: Fix typo in comments
MAINTAINERS: Add PCI MHI endpoint function driver under MHI bus
PCI: endpoint: Add PCI Endpoint function driver for MHI bus
PCI: qcom-ep: Add support for BME notification
PCI: qcom-ep: Add support for Link down notification
PCI: endpoint: Add BME notifier support
PCI: endpoint: Add linkdown notifier support
PCI: endpoint: Return error if EPC is started/stopped multiple times
PCI: endpoint: Pass EPF device ID to the probe function
PCI: endpoint: Add missing documentation about the MSI/MSI-X range
PCI: endpoint: Improve pci_epf_type_add_cfs()
PCI: endpoint: functions/pci-epf-test: Fix dma_chan direction
misc: pci_endpoint_test: Simplify pci_endpoint_test_msi_irq()
misc: pci_endpoint_test: Do not write status in IRQ handler
misc: pci_endpoint_test: Re-init completion for every test
misc: pci_endpoint_test: Free IRQs before removing the device
PCI: epf-test: Simplify transfers result print
PCI: epf-test: Simplify DMA support checks
PCI: epf-test: Cleanup request result handling
PCI: epf-test: Cleanup pci_epf_test_cmd_handler()
PCI: epf-test: Improve handling of command and status registers
PCI: epf-test: Simplify IRQ test commands execution
PCI: epf-test: Simplify pci_epf_test_raise_irq()
PCI: epf-test: Simplify read/write/copy test functions
PCI: epf-test: Use dmaengine_submit() to initiate DMA transfer
PCI: epf-test: Fix DMA transfer completion detection
PCI: epf-test: Fix DMA transfer completion initialization
PCI: endpoint: Move pci_epf_type_add_cfs() code
PCI: endpoint: Automatically create a function specific attributes group
PCI: endpoint: Fix a Kconfig prompt of vNTB driver
A PCI endpoint function driver can define function specific attributes
under its function configfs directory using the add_cfs() endpoint driver
operation. This is done by tying up the mkdir operation for the function
configfs directory to a call to the add_cfs() operation. However, there
are no checks preventing the user from repeatedly creating function
specific attribute directories with different names, resulting in the same
endpoint specific attributes group being added multiple times, which also
result in an invalid reference counting for the attribute groups. E.g.,
using the pci-epf-ntb function driver as an example, the user creates the
function as follows:
$ modprobe pci-epf-ntb
$ cd /sys/kernel/config/pci_ep/functions/pci_epf_ntb
$ mkdir func0
$ tree func0
func0/
|-- baseclass_code
|-- cache_line_size
|-- ...
`-- vendorid
$ mkdir func0/attrs
$ tree func0
func0/
|-- attrs
| |-- db_count
| |-- mw1
| |-- mw2
| |-- mw3
| |-- mw4
| |-- num_mws
| `-- spad_count
|-- baseclass_code
|-- cache_line_size
|-- ...
`-- vendorid
At this point, the function can be started by linking the EP controller.
However, if the user mistakenly creates again a directory:
$ mkdir func0/attrs2
$ tree func0
func0/
|-- attrs
| |-- db_count
| |-- mw1
| |-- mw2
| |-- mw3
| |-- mw4
| |-- num_mws
| `-- spad_count
|-- attrs2
| |-- db_count
| |-- mw1
| |-- mw2
| |-- mw3
| |-- mw4
| |-- num_mws
| `-- spad_count
|-- baseclass_code
|-- cache_line_size
|-- ...
`-- vendorid
The endpoint function specific attributes are duplicated and cause a crash
when the endpoint function device is torn down:
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 2 PID: 834 at lib/refcount.c:25 refcount_warn_saturate+0xc8/0x144
CPU: 2 PID: 834 Comm: rmdir Not tainted 6.3.0-rc1 #1
Hardware name: Pine64 RockPro64 v2.1 (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
...
Call trace:
refcount_warn_saturate+0xc8/0x144
config_item_get+0x7c/0x80
configfs_rmdir+0x17c/0x30c
vfs_rmdir+0x8c/0x204
do_rmdir+0x158/0x184
__arm64_sys_unlinkat+0x64/0x80
invoke_syscall+0x48/0x114
...
Fix this by modifying pci_epf_cfs_work() to execute the new function
pci_ep_cfs_add_type_group() which itself calls pci_epf_type_add_cfs() to
obtain the function specific attribute group and the group name (directory
name) from the endpoint function driver. If the function driver defines an
attribute group, pci_ep_cfs_add_type_group() then proceeds to register this
group using configfs_register_group(), thus automatically exposing the
function type specific configfs attributes to the user. E.g.:
$ modprobe pci-epf-ntb
$ cd /sys/kernel/config/pci_ep/functions/pci_epf_ntb
$ mkdir func0
$ tree func0
func0/
|-- baseclass_code
|-- cache_line_size
|-- ...
|-- pci_epf_ntb.0
| |-- db_count
| |-- mw1
| |-- mw2
| |-- mw3
| |-- mw4
| |-- num_mws
| `-- spad_count
|-- primary
|-- ...
`-- vendorid
With this change, there is no need for the user to create or delete
directories in the endpoint function attributes directory. The
pci_epf_type_group_ops group operations are thus removed.
Also update the documentation for the pci-epf-ntb and pci-epf-vntb function
drivers to reflect this change, removing the explanations showing the need
to manually create the sub-directory for the function specific attributes.
Link: https://lore.kernel.org/r/20230415023542.77601-2-dlemoal@kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Since f26e58bf6f ("PCI/AER: Enable error reporting when AER is native"),
the PCI core enables PCIe device error reporting for all devices during
enumeration, so drivers don't need to do it.
Remove the recommendation for drivers to configure AER and call
pci_enable_pcie_error_reporting() themselves.
Also remove the suggestion that drivers may change AER mask and severity
registers. Ownership of these registers is negotiated between the OS and
platform firmware. If firmware owns these registers, the OS must not
change them.
Link: https://lore.kernel.org/r/20230609222500.1267795-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Stefan Roese <sr@denx.de>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
There are likely no users of this driver as the hardware has been
discontinued since 2010. Remove the driver and all references to it
in documentation.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull cxl updates from Dan Williams:
"Compute Express Link (CXL) updates for 6.2.
While it may seem backwards, the CXL update this time around includes
some focus on CXL 1.x enabling where the work to date had been with
CXL 2.0 (VH topologies) in mind.
First generation CXL can mostly be supported via BIOS, similar to DDR,
however it became clear there are use cases for OS native CXL error
handling and some CXL 3.0 endpoint features can be deployed on CXL 1.x
hosts (Restricted CXL Host (RCH) topologies). So, this update brings
RCH topologies into the Linux CXL device model.
In support of the ongoing CXL 2.0+ enabling two new core kernel
facilities are added.
One is the ability for the kernel to flag collisions between userspace
access to PCI configuration registers and kernel accesses. This is
brought on by the PCIe Data-Object-Exchange (DOE) facility, a hardware
mailbox over config-cycles.
The other is a cpu_cache_invalidate_memregion() API that maps to
wbinvd_on_all_cpus() on x86. To prevent abuse it is disabled in guest
VMs and architectures that do not support it yet. The CXL paths that
need it, dynamic memory region creation and security commands (erase /
unlock), are disabled when it is not present.
As for the CXL 2.0+ this cycle the subsystem gains support Persistent
Memory Security commands, error handling in response to PCIe AER
notifications, and support for the "XOR" host bridge interleave
algorithm.
Summary:
- Add the cpu_cache_invalidate_memregion() API for cache flushing in
response to physical memory reconfiguration, or memory-side data
invalidation from operations like secure erase or memory-device
unlock.
- Add a facility for the kernel to warn about collisions between
kernel and userspace access to PCI configuration registers
- Add support for Restricted CXL Host (RCH) topologies (formerly CXL
1.1)
- Add handling and reporting of CXL errors reported via the PCIe AER
mechanism
- Add support for CXL Persistent Memory Security commands
- Add support for the "XOR" algorithm for CXL host bridge interleave
- Rework / simplify CXL to NVDIMM interactions
- Miscellaneous cleanups and fixes"
* tag 'cxl-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (71 commits)
cxl/region: Fix memdev reuse check
cxl/pci: Remove endian confusion
cxl/pci: Add some type-safety to the AER trace points
cxl/security: Drop security command ioctl uapi
cxl/mbox: Add variable output size validation for internal commands
cxl/mbox: Enable cxl_mbox_send_cmd() users to validate output size
cxl/security: Fix Get Security State output payload endian handling
cxl: update names for interleave ways conversion macros
cxl: update names for interleave granularity conversion macros
cxl/acpi: Warn about an invalid CHBCR in an existing CHBS entry
tools/testing/cxl: Require cache invalidation bypass
cxl/acpi: Fail decoder add if CXIMS for HBIG is missing
cxl/region: Fix spelling mistake "memergion" -> "memregion"
cxl/regs: Fix sparse warning
cxl/acpi: Set ACPI's CXL _OSC to indicate RCD mode support
tools/testing/cxl: Add an RCH topology
cxl/port: Add RCD endpoint port enumeration
cxl/mem: Move devm_cxl_add_endpoint() from cxl_core to cxl_mem
tools/testing/cxl: Add XOR Math support to cxl_test
cxl/acpi: Support CXL XOR Interleave Math (CXIMS)
...