Commit Graph

766526 Commits

Author SHA1 Message Date
Bjorn Helgaas
a8bcb5e596 Merge branch 'pci/enumeration'
- Work around IDT switch ACS Source Validation erratum (James
    Puthukattukaran)

  - Emit diagnostics for all cases of PCIe Link downtraining (Links
    operating slower than they're capable of) (Alexandru Gagniuc)

  - Skip VFs when configuring Max Payload Size (Myron Stowe)

  - Reduce Root Port Max Payload Size if necessary when hot-adding a device
    below it (Myron Stowe)

* pci/enumeration:
  PCI: Match Root Port's MPS to endpoint's MPSS as necessary
  PCI: Skip MPS logic for Virtual Functions (VFs)
  PCI: Check for PCIe Link downtraining
  PCI: Workaround IDT switch ACS Source Validation erratum
2018-08-15 14:58:52 -05:00
Bjorn Helgaas
1ca358a8e3 Merge branch 'pci/dpc'
- Defer DPC event handling to work queue (Keith Busch)

  - Use threaded IRQ for DPC bottom half (Keith Busch)

  - Print AER status while handling DPC events (Keith Busch)

* pci/dpc:
  PCI/DPC: Remove indirection waiting for inactive link
  PCI/DPC: Use threaded IRQ for bottom half handling
  PCI/DPC: Print AER status in DPC event handling
  PCI/DPC: Remove rp_pio_status from dpc struct
  PCI/DPC: Defer event handling to work queue
  PCI/DPC: Leave interrupts enabled while handling event
2018-08-15 14:58:51 -05:00
Bjorn Helgaas
187dacce19 Merge branch 'pci/aspm'
- Use sysfs_match_string() to simplify ASPM sysfs parsing (Andy
    Shevchenko)

  - Remove unnecessary includes of <linux/pci-aspm.h> (Bjorn Helgaas)

* pci/aspm:
  PCI: Remove unnecessary include of <linux/pci-aspm.h>
  iwlwifi: Remove unnecessary include of <linux/pci-aspm.h>
  ath9k: Remove unnecessary include of <linux/pci-aspm.h>
  igb: Remove unnecessary include of <linux/pci-aspm.h>
  PCI/ASPM: Convert to use sysfs_match_string() helper
2018-08-15 14:58:46 -05:00
Bjorn Helgaas
3c3ab37f4c Merge branch 'pci/aer'
- Decode AER errors with names similar to "lspci" (Tyler Baicar)

  - Expose AER statistics in sysfs (Rajat Jain)

  - Clear AER status bits selectively based on the type of recovery (Oza
    Pawandeep)

  - Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST (Alexandru
    Gagniuc)

  - Don't clear AER status bits if we're using the "Firmware-First"
    strategy where firmware owns the registers (Alexandru Gagniuc)

* pci/aer:
  PCI/AER: Don't clear AER bits if error handling is Firmware-First
  PCI/AER: Remove duplicate PCI_EXP_AER_FLAGS definition
  PCI/portdrv: Remove pcie_portdrv_err_handler.slot_reset
  PCI/AER: Clear device status bits during ERR_COR handling
  PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL
  PCI/AER: Remove ERR_FATAL code from ERR_NONFATAL path
  PCI/AER: Factor out ERR_NONFATAL status bit clearing
  PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery
  PCI/AER: Clear only ERR_FATAL status bits during fatal recovery
  PCI/AER: Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST
  PCI/AER: Add sysfs attributes for rootport cumulative stats
  PCI/AER: Add sysfs attributes to provide AER stats and breakdown
  PCI/AER: Define aer_stats structure for AER capable devices
  PCI/AER: Move internal declarations to drivers/pci/pci.h
  PCI/AER: Adopt lspci names for AER error decoding
  PCI/AER: Expose internal API for obtaining AER information

# Conflicts:
#	drivers/pci/pci.h
2018-08-15 14:58:45 -05:00
Bjorn Helgaas
af863d18a1 Merge branch 'for-linus'
* for-linus:
  PCI: Fix is_added/is_busmaster race condition
  PCI: mobiveil: Avoid integer overflow in IB_WIN_SIZE
  PCI/AER: Work around use-after-free in pcie_do_fatal_recovery()
  PCI: v3-semi: Fix I/O space page leak
  PCI: mediatek: Fix I/O space page leak
  PCI: faraday: Fix I/O space page leak
  PCI: aardvark: Fix I/O space page leak
  PCI: designware: Fix I/O space page leak
  PCI: versatile: Fix I/O space page leak
  PCI: xgene: Fix I/O space page leak
  PCI: OF: Fix I/O space page leak
  PCI: endpoint: Fix NULL pointer dereference error when CONFIGFS is disabled
  PCI: hv: Disable/enable IRQs rather than BH in hv_compose_msi_msg()
  nfp: stop limiting VFs to 0
  PCI/IOV: Reset total_VFs limit after detaching PF driver
  PCI: faraday: Add missing of_node_put()
  PCI: xilinx-nwl: Add missing of_node_put()
  PCI: xilinx: Add missing of_node_put()
  PCI: endpoint: Use after free in pci_epf_unregister_driver()
  PCI: controller: dwc: Do not let PCIE_DW_PLAT_HOST default to yes
  PCI: rcar: Clean up PHY init on failure
  PCI: rcar: Shut the PHY down in failpath
  PCI: controller: Move PCI_DOMAINS selection to arch Kconfig
  PCI: Initialize endpoint library before controllers
  PCI: shpchp: Manage SHPC unconditionally on non-ACPI systems
2018-08-15 14:58:43 -05:00
Alexandru Gagniuc
45687f96c1 PCI/AER: Don't clear AER bits if error handling is Firmware-First
If the platform requests Firmware-First error handling, firmware is
responsible for reading and clearing AER status bits.  If OSPM also clears
them, we may miss errors.  See ACPI v6.2, sec 18.3.2.5 and 18.4.

This race is mostly of theoretical significance, as it is not easy to
reasonably demonstrate it in testing.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
[bhelgaas: add similar guards to pci_cleanup_aer_uncorrect_error_status()
and pci_aer_clear_fatal_status()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-08-15 14:35:40 -05:00
Myron Stowe
9f0e893597 PCI: Match Root Port's MPS to endpoint's MPSS as necessary
In commit 27d868b5e6 ("PCI: Set MPS to match upstream bridge"), we made
sure every device's MPS setting matches its upstream bridge, making it more
likely that a hot-added device will work in a system with an optimized MPS
configuration.

Recently I've started encountering systems where the endpoint device's MPSS
capability is less than its Root Port's current MPS value, thus the
endpoint is not capable of matching its upstream bridge's MPS setting (see:
bugzilla via "Link:" below).  This leaves the system vulnerable - the
upstream Root Port could respond with larger TLPs than the device can
handle, and the device will consider them to be 'Malformed'.

One could use the "pci=pcie_bus_safe" kernel parameter to work around the
issue, but that forces a user to supply a kernel parameter to get the
system to function reliably and may end up limiting MPS settings of other
unrelated, sub-topologies which could benefit from maintaining their larger
values.

Augment Keith's approach to include tuning down a Root Port's MPS setting
when its hot-added endpoint device is not capable of matching it.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200527
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jon Mason <jdmason@kudzu.us>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Sinan Kaya <okaya@kernel.org>
Cc: Dongdong Liu <liudongdong3@huawei.com>
2018-08-14 09:09:33 -05:00
Myron Stowe
3dbe97efe8 PCI: Skip MPS logic for Virtual Functions (VFs)
PCIe r4.0, sec 9.3.5.4, "Device Control Register", shows both
Max_Payload_Size (MPS) and Max_Read_request_Size (MRRS) to be 'RsvdP' for
VFs.  Just prior to the table it states:

  "PF and VF functionality is defined in Section 7.5.3.4 except where
   noted in Table 9-16.  For VF fields marked 'RsvdP', the PF setting
   applies to the VF."

All of which implies that with respect to Max_Payload_Size Supported
(MPSS), MPS, and MRRS values, we should not be paying any attention to the
VF's fields, but rather only to the PF's.  Only looking at the PF's fields
also logically makes sense as it's the sole physical interface to the PCIe
bus.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200527
Fixes: 27d868b5e6 ("PCI: Set MPS to match upstream bridge")
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # 4.3+
Cc: Keith Busch <keith.busch@intel.com>
Cc: Sinan Kaya <okaya@kernel.org>
Cc: Dongdong Liu <liudongdong3@huawei.com>
Cc: Jon Mason <jdmason@kudzu.us>
2018-08-14 09:07:31 -05:00
Alexandru Gagniuc
2d1ce5ec21 PCI: Check for PCIe Link downtraining
When both ends of a PCIe Link are capable of a higher bandwidth than is
currently in use, the Link is said to be "downtrained".  A downtrained Link
may indicate hardware or configuration problems in the system, but it's
hard to identify such Links from userspace.

Refactor pcie_print_link_status() so it continues to always print PCIe
bandwidth information, as several NIC drivers desire.

Add a new internal __pcie_print_link_status() to emit a message only when a
device's bandwidth is constrained by the fabric and call it from the PCI
core for all devices, which identifies all downtrained Links.  It also
emits messages for a few cases that are technically not downtrained, such
as a x4 device in an open-ended x1 slot.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
[bhelgaas: changelog, move __pcie_print_link_status() declaration to
drivers/pci/, rename pcie_check_upstream_link() to
pcie_report_downtraining()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-08-10 12:29:04 -05:00
Bjorn Helgaas
ce29af2a50 PCI: Remove unnecessary include of <linux/pci-aspm.h>
Several PCI core files include pci-aspm.h even though they don't need
anything provided by that file.  Remove the unnecessary includes of it.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
2018-08-06 14:32:22 -05:00
Bjorn Helgaas
2b2654b892 iwlwifi: Remove unnecessary include of <linux/pci-aspm.h>
This part of the iwlwifi driver doesn't need anything provided by
pci-aspm.h, so remove the unnecessary include of it.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Acked-by: Luca Coelho <luciano.coelho@intel.com>
2018-08-06 14:32:21 -05:00
Bjorn Helgaas
a78613c082 ath9k: Remove unnecessary include of <linux/pci-aspm.h>
The ath9k driver doesn't need anything provided by pci-aspm.h, so remove
the unnecessary include of it.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
2018-08-06 14:32:21 -05:00
Bjorn Helgaas
f5ddcf71e6 igb: Remove unnecessary include of <linux/pci-aspm.h>
The igb driver doesn't need anything provided by pci-aspm.h, so remove
the unnecessary include of it.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-08-06 14:32:21 -05:00
Andy Shevchenko
36131ce9a0 PCI/ASPM: Convert to use sysfs_match_string() helper
The sysfs_match_string() helper returns index of the matching string in an
array.  Use it in pcie_aspm_set_policy() to simplify the code.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[bhelgaas: squash sysfs_match_string() fix into original patch for issue
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-08-06 14:30:34 -05:00
Bjorn Helgaas
944d58595b PCI/AER: Remove duplicate PCI_EXP_AER_FLAGS definition
PCI_EXP_AER_FLAGS was defined twice (with identical definitions), once
under #ifdef CONFIG_ACPI_APEI, and again at the top level.  This looks like
my merge error from these commits:

  fd3362cb73 ("PCI/AER: Squash aerdrv_core.c into aerdrv.c")
  41cbc9eb1a ("PCI/AER: Squash ecrc.c into aerdrv.c")

Remove the duplicate PCI_EXP_AER_FLAGS definition.

Fixes: 41cbc9eb1a ("PCI/AER: Squash ecrc.c into aerdrv.c")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Oza Pawandeep <poza@codeaurora.org>
2018-07-31 16:26:09 -05:00
Hari Vyas
44bda4b7d2 PCI: Fix is_added/is_busmaster race condition
When a PCI device is detected, pdev->is_added is set to 1 and proc and
sysfs entries are created.

When the device is removed, pdev->is_added is checked for one and then
device is detached with clearing of proc and sys entries and at end,
pdev->is_added is set to 0.

is_added and is_busmaster are bit fields in pci_dev structure sharing same
memory location.

A strange issue was observed with multiple removal and rescan of a PCIe
NVMe device using sysfs commands where is_added flag was observed as zero
instead of one while removing device and proc,sys entries are not cleared.
This causes issue in later device addition with warning message
"proc_dir_entry" already registered.

Debugging revealed a race condition between the PCI core setting the
is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue
setting the is_busmaster bit in pci_set_master().  As these fields are not
handled atomically, that clears the is_added bit.

Move the is_added bit to a separate private flag variable and use atomic
functions to set and retrieve the device addition state.  This avoids the
race because is_added no longer shares a memory location with is_busmaster.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283
Signed-off-by: Hari Vyas <hari.vyas@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 11:27:54 -05:00
Dan Carpenter
a54e43f993 PCI: mobiveil: Avoid integer overflow in IB_WIN_SIZE
IB_WIN_SIZE is larger than INT_MAX so we need to cast it to u64.

Fixes: 9af6bcb11e ("PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-27 17:10:39 -05:00
Thomas Tai
bd91b56cb3 PCI/AER: Work around use-after-free in pcie_do_fatal_recovery()
When an fatal error is received by a non-bridge device, the device is
removed, and pci_stop_and_remove_bus_device() deallocates the device
structure.  The freed device structure is used by subsequent code to send
uevents and print messages.

Hold a reference on the device until we're finished using it.  This is not
an ideal fix because pcie_do_fatal_recovery() should not use the device at
all after removing it, but that's too big a project for right now.

Fixes: 7e9084b367 ("PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices")
Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
[bhelgaas: changelog, reduce get/put coverage]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-26 12:13:04 -05:00
Oza Pawandeep
89e1f5cb1e PCI/portdrv: Remove pcie_portdrv_err_handler.slot_reset
The pci_error_handlers.slot_reset() callback is only used for non-bridge
devices (see broadcast_error_message()).  Since portdrv only binds to
bridges, we don't need pcie_portdrv_slot_reset(), so remove it.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: changelog, remove pcie_portdrv_slot_reset() completely]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-20 15:27:13 -05:00
Oza Pawandeep
10d790d99d PCI/AER: Clear device status bits during ERR_COR handling
In case of correctable error, the Correctable Error Detected bit in the
Device Status register is set.  Clear it after handling the error.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-20 15:27:12 -05:00
Oza Pawandeep
ec752f5d54 PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL
Clear the device status bits while handling both ERR_FATAL and ERR_NONFATAL
cases.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: rename to pci_aer_clear_device_status(), declare internal to PCI
core instead of exposing it everywhere]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-20 15:27:11 -05:00
Oza Pawandeep
43ec03a9e5 PCI/AER: Remove ERR_FATAL code from ERR_NONFATAL path
broadcast_error_message() is only used for ERR_NONFATAL events, when the
state is always pci_channel_io_normal, so remove the unused alternate path.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-20 15:27:10 -05:00
Oza Pawandeep
5b6c09660d PCI/AER: Factor out ERR_NONFATAL status bit clearing
aer_error_resume() clears all ERR_NONFATAL error status bits.  This is
exactly what pci_cleanup_aer_uncorrect_error_status(), so use that instead
of duplicating the code.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: split to separate patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-20 15:27:09 -05:00
Oza Pawandeep
e7b0b847de PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery
pci_cleanup_aer_uncorrect_error_status() is called by driver .slot_reset()
methods when handling ERR_NONFATAL errors.  Previously this cleared *all*
the bits, including ERR_FATAL bits.

Since we're only handling ERR_NONFATAL errors, clear only the ERR_NONFATAL
error status bits.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: split to separate patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-20 15:27:08 -05:00
Bjorn Helgaas
7ab92e89bf PCI/AER: Clear only ERR_FATAL status bits during fatal recovery
During recovery from fatal errors, we previously called
pci_cleanup_aer_uncorrect_error_status(), which cleared *all* uncorrectable
error status bits (both ERR_FATAL and ERR_NONFATAL).

Instead, call a new pci_aer_clear_fatal_status() that clears only the
ERR_FATAL bits (as indicated by the PCI_ERR_UNCOR_SEVER register).

Based-on-patch-by: Oza Pawandeep <poza@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-20 15:27:07 -05:00
Keith Busch
e77b8216a2 PCI/DPC: Remove indirection waiting for inactive link
Simplify waiting for the contained link to become inactive, removing the
indirection to a unnecessary DPC-specific handler.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Reviewed-by: Oza Pawandeep <poza@codeaurora.org>
2018-07-19 16:21:01 -05:00
Keith Busch
738c4e411d PCI/DPC: Use threaded IRQ for bottom half handling
Remove the work struct that was being used to handle a DPC event and use a
threaded IRQ instead.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Reviewed-by: Oza Pawandeep <poza@codeaurora.org>
2018-07-19 16:21:01 -05:00
Keith Busch
8aefa9b0d9 PCI/DPC: Print AER status in DPC event handling
A DPC enabled device suppresses ERR_(NON)FATAL messages, preventing the AER
handler from reporting error details.  If the DPC trigger reason says the
downstream port detected the error, collect the AER uncorrectable status
for logging, then clear the status.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Reviewed-by: Oza Pawandeep <poza@codeaurora.org>
2018-07-19 16:21:01 -05:00
Keith Busch
f1d16b1756 PCI/DPC: Remove rp_pio_status from dpc struct
We don't need to save the rp pio status across multiple contexts as all
DPC event handling occurs in a single work queue context.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Reviewed-by: Oza Pawandeep <poza@codeaurora.org>
2018-07-19 16:21:01 -05:00
Keith Busch
0c27e28f77 PCI/DPC: Defer event handling to work queue
Move all event handling to the existing work queue, which will
make it simpler to pass event information to the handler.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Reviewed-by: Oza Pawandeep <poza@codeaurora.org>
2018-07-19 16:21:01 -05:00
Keith Busch
f8d46c89c8 PCI/DPC: Leave interrupts enabled while handling event
Now that the DPC driver clears the interrupt status before exiting the
IRQ handler, we don't need to abuse the DPC control register to know if
a shared interrupt is for a new DPC event: a DPC port can not trigger
a second interrupt until the host clears the trigger status later in the
work queue handler.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Reviewed-by: Oza Pawandeep <poza@codeaurora.org>
2018-07-19 16:20:59 -05:00
Alexandru Gagniuc
7af02fcd84 PCI/AER: Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST
According to the documentation, "pcie_ports=native", linux should use
native AER and DPC services.  While that is true for the _OSC method
parsing, this is not the only place that is checked.  Should the HEST
list PCIe ports as firmware-first, linux will not use native services.

This happens because aer_acpi_firmware_first() doesn't take 'pcie_ports'
into account.  This is wrong.  DPC uses the same logic when it decides
whether to load or not, so fixing this also fixes DPC not loading.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
[bhelgaas: return "false" from bool function (from kbuild robot)]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-19 16:19:53 -05:00
Rajat Jain
12833017e5 PCI/AER: Add sysfs attributes for rootport cumulative stats
Add sysfs attributes for rootport statistics (that are cumulative of all
the ERR_* messages seen on this PCI hierarchy).

Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-19 16:19:52 -05:00
Rajat Jain
81aa5206f9 PCI/AER: Add sysfs attributes to provide AER stats and breakdown
Add sysfs attributes to provide total and breakdown of the AERs seen,
into different type of correctable, fatal and nonfatal errors:

  /sys/bus/pci/devices/<dev>/aer_dev_correctable
  /sys/bus/pci/devices/<dev>/aer_dev_fatal
  /sys/bus/pci/devices/<dev>/aer_dev_nonfatal

Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-19 16:19:51 -05:00
Rajat Jain
db89ccbe52 PCI/AER: Define aer_stats structure for AER capable devices
Define a structure to hold the AER statistics.  There are 2 groups of
statistics: dev_* counters that are to be collected for all AER capable
devices and rootport_* counters that are collected for all (AER capable)
rootports only.  Allocate and free this structure when device is added or
released (thus counters survive the lifetime of the device).

Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-19 16:17:03 -05:00
Rajat Jain
60ed982a4e PCI/AER: Move internal declarations to drivers/pci/pci.h
Since pci_aer_init() and pci_no_aer() are used only internally, move their
declarations to the PCI internal header file.  Also, no one cares about
return value of pci_aer_init(), so make it void.

Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-19 16:17:03 -05:00
Tyler Baicar
bd237801fe PCI/AER: Adopt lspci names for AER error decoding
lspci uses abbreviated naming for AER error strings.  Adopt the same naming
convention for the AER printing so they match.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Oza Pawandeep <poza@codeaurora.org>
2018-07-19 16:17:01 -05:00
Keith Busch
1e4511604d PCI/AER: Expose internal API for obtaining AER information
Export some common AER functions and structures for other PCI core drivers
to use.  Since this is making the function externally visible inside the
PCI core, prepend "aer_" to the function name.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: move AER declarations from linux/aer.h to drivers/pci/pci.h]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Reviewed-by: Oza Pawandeep <poza@codeaurora.org>
2018-07-19 16:16:55 -05:00
Sergei Shtylyov
270ed733e6 PCI: v3-semi: Fix I/O space page leak
When testing the R-Car PCIe driver on the Condor board, if the PCIe PHY
driver was left disabled, the kernel crashed with this BUG:

  kernel BUG at lib/ioremap.c:72!
  Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 39 Comm: kworker/0:1 Not tainted 4.17.0-dirty #1092
  Hardware name: Renesas Condor board based on r8a77980 (DT)
  Workqueue: events deferred_probe_work_func
  pstate: 80000005 (Nzcv daif -PAN -UAO)
  pc : ioremap_page_range+0x370/0x3c8
  lr : ioremap_page_range+0x40/0x3c8
  sp : ffff000008da39e0
  x29: ffff000008da39e0 x28: 00e8000000000f07
  x27: ffff7dfffee00000 x26: 0140000000000000
  x25: ffff7dfffef00000 x24: 00000000000fe100
  x23: ffff80007b906000 x22: ffff000008ab8000
  x21: ffff000008bb1d58 x20: ffff7dfffef00000
  x19: ffff800009c30fb8 x18: 0000000000000001
  x17: 00000000000152d0 x16: 00000000014012d0
  x15: 0000000000000000 x14: 0720072007200720
  x13: 0720072007200720 x12: 0720072007200720
  x11: 0720072007300730 x10: 00000000000000ae
  x9 : 0000000000000000 x8 : ffff7dffff000000
  x7 : 0000000000000000 x6 : 0000000000000100
  x5 : 0000000000000000 x4 : 000000007b906000
  x3 : ffff80007c61a880 x2 : ffff7dfffeefffff
  x1 : 0000000040000000 x0 : 00e80000fe100f07
  Process kworker/0:1 (pid: 39, stack limit = 0x        (ptrval))
  Call trace:
   ioremap_page_range+0x370/0x3c8
   pci_remap_iospace+0x7c/0xac
   pci_parse_request_of_pci_ranges+0x13c/0x190
   rcar_pcie_probe+0x4c/0xb04
   platform_drv_probe+0x50/0xbc
   driver_probe_device+0x21c/0x308
   __device_attach_driver+0x98/0xc8
   bus_for_each_drv+0x54/0x94
   __device_attach+0xc4/0x12c
   device_initial_probe+0x10/0x18
   bus_probe_device+0x90/0x98
   deferred_probe_work_func+0xb0/0x150
   process_one_work+0x12c/0x29c
   worker_thread+0x200/0x3fc
   kthread+0x108/0x134
   ret_from_fork+0x10/0x18
  Code: f9004ba2 54000080 aa0003fb 17ffff48 (d4210000)

It turned out that pci_remap_iospace() wasn't undone when the driver's
probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER,
the probe was retried,  finally causing the BUG due to trying to remap
already remapped pages.

The V3 Semiconductor PCI driver has the same issue.
Replace devm_pci_remap_iospace() with its devm_ managed version to fix
the bug.

Fixes: 68a15eb7bd ("PCI: v3-semi: Add V3 Semiconductor PCI host driver")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
[lorenzo.pieralisi@arm.com: updated the commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
2018-07-18 17:02:13 -05:00
Sergei Shtylyov
438477b9a0 PCI: mediatek: Fix I/O space page leak
When testing the R-Car PCIe driver on the Condor board, if the PCIe PHY
driver was left disabled, the kernel crashed with this BUG:

  kernel BUG at lib/ioremap.c:72!
  Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 39 Comm: kworker/0:1 Not tainted 4.17.0-dirty #1092
  Hardware name: Renesas Condor board based on r8a77980 (DT)
  Workqueue: events deferred_probe_work_func
  pstate: 80000005 (Nzcv daif -PAN -UAO)
  pc : ioremap_page_range+0x370/0x3c8
  lr : ioremap_page_range+0x40/0x3c8
  sp : ffff000008da39e0
  x29: ffff000008da39e0 x28: 00e8000000000f07
  x27: ffff7dfffee00000 x26: 0140000000000000
  x25: ffff7dfffef00000 x24: 00000000000fe100
  x23: ffff80007b906000 x22: ffff000008ab8000
  x21: ffff000008bb1d58 x20: ffff7dfffef00000
  x19: ffff800009c30fb8 x18: 0000000000000001
  x17: 00000000000152d0 x16: 00000000014012d0
  x15: 0000000000000000 x14: 0720072007200720
  x13: 0720072007200720 x12: 0720072007200720
  x11: 0720072007300730 x10: 00000000000000ae
  x9 : 0000000000000000 x8 : ffff7dffff000000
  x7 : 0000000000000000 x6 : 0000000000000100
  x5 : 0000000000000000 x4 : 000000007b906000
  x3 : ffff80007c61a880 x2 : ffff7dfffeefffff
  x1 : 0000000040000000 x0 : 00e80000fe100f07
  Process kworker/0:1 (pid: 39, stack limit = 0x        (ptrval))
  Call trace:
   ioremap_page_range+0x370/0x3c8
   pci_remap_iospace+0x7c/0xac
   pci_parse_request_of_pci_ranges+0x13c/0x190
   rcar_pcie_probe+0x4c/0xb04
   platform_drv_probe+0x50/0xbc
   driver_probe_device+0x21c/0x308
   __device_attach_driver+0x98/0xc8
   bus_for_each_drv+0x54/0x94
   __device_attach+0xc4/0x12c
   device_initial_probe+0x10/0x18
   bus_probe_device+0x90/0x98
   deferred_probe_work_func+0xb0/0x150
   process_one_work+0x12c/0x29c
   worker_thread+0x200/0x3fc
   kthread+0x108/0x134
   ret_from_fork+0x10/0x18
  Code: f9004ba2 54000080 aa0003fb 17ffff48 (d4210000)

It turned out that pci_remap_iospace() wasn't undone when the driver's
probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER,
the probe was retried, finally causing the BUG due to trying to remap
already remapped pages.

The MediaTek PCIe driver has the same issue.

Replace devm_pci_remap_iospace() with its devm_ managed counterpart
to fix the bug.

Fixes: 637cfacae9 ("PCI: mediatek: Add MediaTek PCIe host controller support")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
[lorenzo.pieralisi@arm.com: updated the commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
2018-07-18 17:01:36 -05:00
Sergei Shtylyov
e30609454b PCI: faraday: Fix I/O space page leak
When testing the R-Car PCIe driver on the Condor board, if
the PCIe PHY driver was left disabled, the kernel crashed with this BUG:

  kernel BUG at lib/ioremap.c:72!
  Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 39 Comm: kworker/0:1 Not tainted 4.17.0-dirty #1092
  Hardware name: Renesas Condor board based on r8a77980 (DT)
  Workqueue: events deferred_probe_work_func
  pstate: 80000005 (Nzcv daif -PAN -UAO)
  pc : ioremap_page_range+0x370/0x3c8
  lr : ioremap_page_range+0x40/0x3c8
  sp : ffff000008da39e0
  x29: ffff000008da39e0 x28: 00e8000000000f07
  x27: ffff7dfffee00000 x26: 0140000000000000
  x25: ffff7dfffef00000 x24: 00000000000fe100
  x23: ffff80007b906000 x22: ffff000008ab8000
  x21: ffff000008bb1d58 x20: ffff7dfffef00000
  x19: ffff800009c30fb8 x18: 0000000000000001
  x17: 00000000000152d0 x16: 00000000014012d0
  x15: 0000000000000000 x14: 0720072007200720
  x13: 0720072007200720 x12: 0720072007200720
  x11: 0720072007300730 x10: 00000000000000ae
  x9 : 0000000000000000 x8 : ffff7dffff000000
  x7 : 0000000000000000 x6 : 0000000000000100
  x5 : 0000000000000000 x4 : 000000007b906000
  x3 : ffff80007c61a880 x2 : ffff7dfffeefffff
  x1 : 0000000040000000 x0 : 00e80000fe100f07
  Process kworker/0:1 (pid: 39, stack limit = 0x        (ptrval))
  Call trace:
   ioremap_page_range+0x370/0x3c8
   pci_remap_iospace+0x7c/0xac
   pci_parse_request_of_pci_ranges+0x13c/0x190
   rcar_pcie_probe+0x4c/0xb04
   platform_drv_probe+0x50/0xbc
   driver_probe_device+0x21c/0x308
   __device_attach_driver+0x98/0xc8
   bus_for_each_drv+0x54/0x94
   __device_attach+0xc4/0x12c
   device_initial_probe+0x10/0x18
   bus_probe_device+0x90/0x98
   deferred_probe_work_func+0xb0/0x150
   process_one_work+0x12c/0x29c
   worker_thread+0x200/0x3fc
   kthread+0x108/0x134
   ret_from_fork+0x10/0x18
  Code: f9004ba2 54000080 aa0003fb 17ffff48 (d4210000)

It turned out that pci_remap_iospace() wasn't undone when the driver's
probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER,
the probe was retried, finally causing the BUG due to trying to remap
already remapped pages.

The Faraday PCI driver has the same issue. Replace pci_remap_iospace()
with its devm_ managed version to fix the bug.

Fixes: d3c68e0a7e ("PCI: faraday: Add Faraday Technology FTPCI100 PCI Host Bridge driver")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
[lorenzo.pieralisi@arm.com: updated the commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
2018-07-18 17:01:14 -05:00
Sergei Shtylyov
1df3e5b3fe PCI: aardvark: Fix I/O space page leak
When testing the R-Car PCIe driver on the Condor board, if the PCIe PHY
driver was left disabled, the kernel crashed with this BUG:

  kernel BUG at lib/ioremap.c:72!
  Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 39 Comm: kworker/0:1 Not tainted 4.17.0-dirty #1092
  Hardware name: Renesas Condor board based on r8a77980 (DT)
  Workqueue: events deferred_probe_work_func
  pstate: 80000005 (Nzcv daif -PAN -UAO)
  pc : ioremap_page_range+0x370/0x3c8
  lr : ioremap_page_range+0x40/0x3c8
  sp : ffff000008da39e0
  x29: ffff000008da39e0 x28: 00e8000000000f07
  x27: ffff7dfffee00000 x26: 0140000000000000
  x25: ffff7dfffef00000 x24: 00000000000fe100
  x23: ffff80007b906000 x22: ffff000008ab8000
  x21: ffff000008bb1d58 x20: ffff7dfffef00000
  x19: ffff800009c30fb8 x18: 0000000000000001
  x17: 00000000000152d0 x16: 00000000014012d0
  x15: 0000000000000000 x14: 0720072007200720
  x13: 0720072007200720 x12: 0720072007200720
  x11: 0720072007300730 x10: 00000000000000ae
  x9 : 0000000000000000 x8 : ffff7dffff000000
  x7 : 0000000000000000 x6 : 0000000000000100
  x5 : 0000000000000000 x4 : 000000007b906000
  x3 : ffff80007c61a880 x2 : ffff7dfffeefffff
  x1 : 0000000040000000 x0 : 00e80000fe100f07
  Process kworker/0:1 (pid: 39, stack limit = 0x        (ptrval))
  Call trace:
   ioremap_page_range+0x370/0x3c8
   pci_remap_iospace+0x7c/0xac
   pci_parse_request_of_pci_ranges+0x13c/0x190
   rcar_pcie_probe+0x4c/0xb04
   platform_drv_probe+0x50/0xbc
   driver_probe_device+0x21c/0x308
   __device_attach_driver+0x98/0xc8
   bus_for_each_drv+0x54/0x94
   __device_attach+0xc4/0x12c
   device_initial_probe+0x10/0x18
   bus_probe_device+0x90/0x98
   deferred_probe_work_func+0xb0/0x150
   process_one_work+0x12c/0x29c
   worker_thread+0x200/0x3fc
   kthread+0x108/0x134
   ret_from_fork+0x10/0x18
  Code: f9004ba2 54000080 aa0003fb 17ffff48 (d4210000)

It turned out that pci_remap_iospace() wasn't undone when the driver's
probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER,
the probe was retried, finally causing the BUG due to trying to remap
already remapped pages.

The Aardvark PCI controller driver has the same issue.
Replace pci_remap_iospace() with its devm_ managed version to fix the bug.

Fixes: 8c39d71036 ("PCI: aardvark: Add Aardvark PCI host controller driver")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
[lorenzo.pieralisi@arm.com: updated the commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
2018-07-18 17:00:54 -05:00
Sergei Shtylyov
fd07f5e19c PCI: designware: Fix I/O space page leak
When testing the R-Car PCIe driver on the Condor board, if the PCIe PHY
driver is left disabled, the kernel crashed with this BUG:

  kernel BUG at lib/ioremap.c:72!
  Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 39 Comm: kworker/0:1 Not tainted 4.17.0-dirty #1092
  Hardware name: Renesas Condor board based on r8a77980 (DT)
  Workqueue: events deferred_probe_work_func
  pstate: 80000005 (Nzcv daif -PAN -UAO)
  pc : ioremap_page_range+0x370/0x3c8
  lr : ioremap_page_range+0x40/0x3c8
  sp : ffff000008da39e0
  x29: ffff000008da39e0 x28: 00e8000000000f07
  x27: ffff7dfffee00000 x26: 0140000000000000
  x25: ffff7dfffef00000 x24: 00000000000fe100
  x23: ffff80007b906000 x22: ffff000008ab8000
  x21: ffff000008bb1d58 x20: ffff7dfffef00000
  x19: ffff800009c30fb8 x18: 0000000000000001
  x17: 00000000000152d0 x16: 00000000014012d0
  x15: 0000000000000000 x14: 0720072007200720
  x13: 0720072007200720 x12: 0720072007200720
  x11: 0720072007300730 x10: 00000000000000ae
  x9 : 0000000000000000 x8 : ffff7dffff000000
  x7 : 0000000000000000 x6 : 0000000000000100
  x5 : 0000000000000000 x4 : 000000007b906000
  x3 : ffff80007c61a880 x2 : ffff7dfffeefffff
  x1 : 0000000040000000 x0 : 00e80000fe100f07
  Process kworker/0:1 (pid: 39, stack limit = 0x        (ptrval))
  Call trace:
   ioremap_page_range+0x370/0x3c8
   pci_remap_iospace+0x7c/0xac
   pci_parse_request_of_pci_ranges+0x13c/0x190
   rcar_pcie_probe+0x4c/0xb04
   platform_drv_probe+0x50/0xbc
   driver_probe_device+0x21c/0x308
   __device_attach_driver+0x98/0xc8
   bus_for_each_drv+0x54/0x94
   __device_attach+0xc4/0x12c
   device_initial_probe+0x10/0x18
   bus_probe_device+0x90/0x98
   deferred_probe_work_func+0xb0/0x150
   process_one_work+0x12c/0x29c
   worker_thread+0x200/0x3fc
   kthread+0x108/0x134
   ret_from_fork+0x10/0x18
  Code: f9004ba2 54000080 aa0003fb 17ffff48 (d4210000)

It turned out that pci_remap_iospace() wasn't undone when the driver's
probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER,
the probe was retried, finally causing the BUG due to trying to remap
already remapped pages.

The DesignWare PCIe controller driver has the same issue.

Replace devm_pci_remap_iospace() with a devm_ managed version to fix the
bug.

Fixes: cbce790059 ("PCI: designware: Make driver arch-agnostic")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
[lorenzo.pieralisi@arm.com: updated the commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jingoo Han <jingoohan1@gmail.com>
2018-07-18 17:00:29 -05:00
Sergei Shtylyov
0018b265ad PCI: versatile: Fix I/O space page leak
When testing the R-Car PCIe driver on the Condor board, if the PCIe PHY
driver was left disabled, the kernel crashed with this BUG:

  kernel BUG at lib/ioremap.c:72!
  Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 39 Comm: kworker/0:1 Not tainted 4.17.0-dirty #1092
  Hardware name: Renesas Condor board based on r8a77980 (DT)
  Workqueue: events deferred_probe_work_func
  pstate: 80000005 (Nzcv daif -PAN -UAO)
  pc : ioremap_page_range+0x370/0x3c8
  lr : ioremap_page_range+0x40/0x3c8
  sp : ffff000008da39e0
  x29: ffff000008da39e0 x28: 00e8000000000f07
  x27: ffff7dfffee00000 x26: 0140000000000000
  x25: ffff7dfffef00000 x24: 00000000000fe100
  x23: ffff80007b906000 x22: ffff000008ab8000
  x21: ffff000008bb1d58 x20: ffff7dfffef00000
  x19: ffff800009c30fb8 x18: 0000000000000001
  x17: 00000000000152d0 x16: 00000000014012d0
  x15: 0000000000000000 x14: 0720072007200720
  x13: 0720072007200720 x12: 0720072007200720
  x11: 0720072007300730 x10: 00000000000000ae
  x9 : 0000000000000000 x8 : ffff7dffff000000
  x7 : 0000000000000000 x6 : 0000000000000100
  x5 : 0000000000000000 x4 : 000000007b906000
  x3 : ffff80007c61a880 x2 : ffff7dfffeefffff
  x1 : 0000000040000000 x0 : 00e80000fe100f07
  Process kworker/0:1 (pid: 39, stack limit = 0x        (ptrval))
  Call trace:
   ioremap_page_range+0x370/0x3c8
   pci_remap_iospace+0x7c/0xac
   pci_parse_request_of_pci_ranges+0x13c/0x190
   rcar_pcie_probe+0x4c/0xb04
   platform_drv_probe+0x50/0xbc
   driver_probe_device+0x21c/0x308
   __device_attach_driver+0x98/0xc8
   bus_for_each_drv+0x54/0x94
   __device_attach+0xc4/0x12c
   device_initial_probe+0x10/0x18
   bus_probe_device+0x90/0x98
   deferred_probe_work_func+0xb0/0x150
   process_one_work+0x12c/0x29c
   worker_thread+0x200/0x3fc
   kthread+0x108/0x134
   ret_from_fork+0x10/0x18
  Code: f9004ba2 54000080 aa0003fb 17ffff48 (d4210000)

It turned out that pci_remap_iospace() wasn't undone when the driver's
probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER,
the probe was retried, finally causing the BUG due to trying to remap
already remapped pages.

The Versatile PCI controller driver has the same issue.
Replace pci_remap_iospace() with the devm_ managed version to fix the bug.

Fixes: b7e78170ef ("PCI: versatile: Add DT-based ARM Versatile PB PCIe host driver")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
[lorenzo.pieralisi@arm.com: updated the commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
2018-07-18 17:00:11 -05:00
Sergei Shtylyov
925652d035 PCI: xgene: Fix I/O space page leak
When testing the R-Car PCIe driver on the Condor board, if the PCIe PHY
driver was left disabled, the kernel crashed with this BUG:

  kernel BUG at lib/ioremap.c:72!
  Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 39 Comm: kworker/0:1 Not tainted 4.17.0-dirty #1092
  Hardware name: Renesas Condor board based on r8a77980 (DT)
  Workqueue: events deferred_probe_work_func
  pstate: 80000005 (Nzcv daif -PAN -UAO)
  pc : ioremap_page_range+0x370/0x3c8
  lr : ioremap_page_range+0x40/0x3c8
  sp : ffff000008da39e0
  x29: ffff000008da39e0 x28: 00e8000000000f07
  x27: ffff7dfffee00000 x26: 0140000000000000
  x25: ffff7dfffef00000 x24: 00000000000fe100
  x23: ffff80007b906000 x22: ffff000008ab8000
  x21: ffff000008bb1d58 x20: ffff7dfffef00000
  x19: ffff800009c30fb8 x18: 0000000000000001
  x17: 00000000000152d0 x16: 00000000014012d0
  x15: 0000000000000000 x14: 0720072007200720
  x13: 0720072007200720 x12: 0720072007200720
  x11: 0720072007300730 x10: 00000000000000ae
  x9 : 0000000000000000 x8 : ffff7dffff000000
  x7 : 0000000000000000 x6 : 0000000000000100
  x5 : 0000000000000000 x4 : 000000007b906000
  x3 : ffff80007c61a880 x2 : ffff7dfffeefffff
  x1 : 0000000040000000 x0 : 00e80000fe100f07
  Process kworker/0:1 (pid: 39, stack limit = 0x        (ptrval))
  Call trace:
   ioremap_page_range+0x370/0x3c8
   pci_remap_iospace+0x7c/0xac
   pci_parse_request_of_pci_ranges+0x13c/0x190
   rcar_pcie_probe+0x4c/0xb04
   platform_drv_probe+0x50/0xbc
   driver_probe_device+0x21c/0x308
   __device_attach_driver+0x98/0xc8
   bus_for_each_drv+0x54/0x94
   __device_attach+0xc4/0x12c
   device_initial_probe+0x10/0x18
   bus_probe_device+0x90/0x98
   deferred_probe_work_func+0xb0/0x150
   process_one_work+0x12c/0x29c
   worker_thread+0x200/0x3fc
   kthread+0x108/0x134
   ret_from_fork+0x10/0x18
  Code: f9004ba2 54000080 aa0003fb 17ffff48 (d4210000)

It turned out that pci_remap_iospace() wasn't undone when the driver's
probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER,
the probe was retried, finally causing the BUG due to trying to remap
already remapped pages.

The X-Gene PCI controller driver has the same issue.
Replace pci_remap_iospace() with the devm_ managed version so that the
pages get unmapped automagically on any probe failure.

Fixes: 5f6b6ccdbe ("PCI: xgene: Add APM X-Gene PCIe driver")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
[lorenzo.pieralisi@arm.com: updated the commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
2018-07-18 16:59:40 -05:00
Sergei Shtylyov
a5fb9fb023 PCI: OF: Fix I/O space page leak
When testing the R-Car PCIe driver on the Condor board, if the PCIe PHY
driver was left disabled, the kernel crashed with this BUG:

  kernel BUG at lib/ioremap.c:72!
  Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 39 Comm: kworker/0:1 Not tainted 4.17.0-dirty #1092
  Hardware name: Renesas Condor board based on r8a77980 (DT)
  Workqueue: events deferred_probe_work_func
  pstate: 80000005 (Nzcv daif -PAN -UAO)
  pc : ioremap_page_range+0x370/0x3c8
  lr : ioremap_page_range+0x40/0x3c8
  sp : ffff000008da39e0
  x29: ffff000008da39e0 x28: 00e8000000000f07
  x27: ffff7dfffee00000 x26: 0140000000000000
  x25: ffff7dfffef00000 x24: 00000000000fe100
  x23: ffff80007b906000 x22: ffff000008ab8000
  x21: ffff000008bb1d58 x20: ffff7dfffef00000
  x19: ffff800009c30fb8 x18: 0000000000000001
  x17: 00000000000152d0 x16: 00000000014012d0
  x15: 0000000000000000 x14: 0720072007200720
  x13: 0720072007200720 x12: 0720072007200720
  x11: 0720072007300730 x10: 00000000000000ae
  x9 : 0000000000000000 x8 : ffff7dffff000000
  x7 : 0000000000000000 x6 : 0000000000000100
  x5 : 0000000000000000 x4 : 000000007b906000
  x3 : ffff80007c61a880 x2 : ffff7dfffeefffff
  x1 : 0000000040000000 x0 : 00e80000fe100f07
  Process kworker/0:1 (pid: 39, stack limit = 0x        (ptrval))
  Call trace:
   ioremap_page_range+0x370/0x3c8
   pci_remap_iospace+0x7c/0xac
   pci_parse_request_of_pci_ranges+0x13c/0x190
   rcar_pcie_probe+0x4c/0xb04
   platform_drv_probe+0x50/0xbc
   driver_probe_device+0x21c/0x308
   __device_attach_driver+0x98/0xc8
   bus_for_each_drv+0x54/0x94
   __device_attach+0xc4/0x12c
   device_initial_probe+0x10/0x18
   bus_probe_device+0x90/0x98
   deferred_probe_work_func+0xb0/0x150
   process_one_work+0x12c/0x29c
   worker_thread+0x200/0x3fc
   kthread+0x108/0x134
   ret_from_fork+0x10/0x18
  Code: f9004ba2 54000080 aa0003fb 17ffff48 (d4210000)

It turned out that pci_remap_iospace() wasn't undone when the driver's
probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER,
the probe was retried, finally causing the BUG due to trying to remap
already remapped pages.

Introduce the devm_pci_remap_iospace() managed API and replace the
pci_remap_iospace() call with it to fix the bug.

Fixes: dbf9826d57 ("PCI: generic: Convert to DT resource parsing API")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
[lorenzo.pieralisi@arm.com: split commit/updated the commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
2018-07-18 15:40:26 -05:00
James Puthukattukaran
aa667c6408 PCI: Workaround IDT switch ACS Source Validation erratum
Some IDT switches incorrectly flag an ACS Source Validation error on
completions for config read requests even though PCIe r4.0, sec 6.12.1.1,
says that completions are never affected by ACS Source Validation.  Here's
the text of IDT 89H32H8G3-YC, erratum #36:

  Item #36 - Downstream port applies ACS Source Validation to Completions
  Section 6.12.1.1 of the PCI Express Base Specification 3.1 states that
  completions are never affected by ACS Source Validation.  However,
  completions received by a downstream port of the PCIe switch from a
  device that has not yet captured a PCIe bus number are incorrectly
  dropped by ACS Source Validation by the switch downstream port.

  Workaround: Issue a CfgWr1 to the downstream device before issuing the
  first CfgRd1 to the device.  This allows the downstream device to capture
  its bus number; ACS Source Validation no longer stops completions from
  being forwarded by the downstream port.  It has been observed that
  Microsoft Windows implements this workaround already; however, some
  versions of Linux and other operating systems may not.

When doing the first config read to probe for a device, if the device is
behind an IDT switch with this erratum:

  1. Disable ACS Source Validation if enabled
  2. Wait for device to become ready to accept config accesses (by using
     the Config Request Retry Status mechanism)
  3. Do a config write to the endpoint
  4. Enable ACS Source Validation (if it was enabled to begin with)

The workaround suggested by IDT is basically only step 3, but we don't know
when the device is ready to accept config requests.  That means we need to
do config reads until we receive a non-Config Request Retry Status, which
means we need to disable ACS SV temporarily.

Signed-off-by: James Puthukattukaran <james.puthukattukaran@oracle.com>
[bhelgaas: changelog, clean up whitespace, fold in unused variable fix
from Anders Roxell <anders.roxell@linaro.org>]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2018-07-12 16:54:35 -05:00
Kishon Vijay Abraham I
a83a217344 PCI: endpoint: Fix NULL pointer dereference error when CONFIGFS is disabled
commit ef1433f717 ("PCI: endpoint: Create configfs entry for each
pci_epf_device_id table entry") while adding configfs entry for each
pci_epf_device_id table entry introduced a NULL pointer dereference error
when CONFIG_PCI_ENDPOINT_CONFIGFS is not enabled.

Fix it here.

Fixes: ef1433f717 ("PCI: endpoint: Create configfs entry for each
pci_epf_device_id table entry")
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
[lorenzo.pieralisi: updated commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-09 15:19:51 -05:00
Dexuan Cui
35a88a18d7 PCI: hv: Disable/enable IRQs rather than BH in hv_compose_msi_msg()
Commit de0aa7b2f9 ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")
uses local_bh_disable()/enable(), because hv_pci_onchannelcallback() can
also run in tasklet context as the channel event callback, so bottom halves
should be disabled to prevent a race condition.

With CONFIG_PROVE_LOCKING=y in the recent mainline, or old kernels that
don't have commit f71b74bca6 ("irq/softirqs: Use lockdep to assert IRQs
are disabled/enabled"), when the upper layer IRQ code calls
hv_compose_msi_msg() with local IRQs disabled, we'll see a warning at the
beginning of __local_bh_enable_ip():

  IRQs not enabled as expected
    WARNING: CPU: 0 PID: 408 at kernel/softirq.c:162 __local_bh_enable_ip

The warning exposes an issue in de0aa7b2f9: local_bh_enable() can
potentially call do_softirq(), which is not supposed to run when local IRQs
are disabled. Let's fix this by using local_irq_save()/restore() instead.

Note: hv_pci_onchannelcallback() is not a hot path because it's only called
when the PCI device is hot added and removed, which is infrequent.

Fixes: de0aa7b2f9 ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Cc: stable@vger.kernel.org
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
2018-07-09 13:16:07 -05:00
Jakub Kicinski
83235822b8 nfp: stop limiting VFs to 0
Before 8d85a7a4f2 ("PCI/IOV: Allow PF drivers to limit total_VFs to 0"),
pci_sriov_set_totalvfs(pdev, 0) meant "we can enable TotalVFs virtual
functions".  After 8d85a7a4f2, it means "we can't enable *any* VFs".

That broke this scenario where nfp intends to remove any limit on the
number of VFs that can be enabled:

  nfp_pci_probe
    nfp_pcie_sriov_read_nfd_limit
      nfp_rtsym_read_le("nfd_vf_cfg_max_vfs", &err)
      pci_sriov_set_totalvfs(pf->pdev, 0)   # if FW didn't expose a limit

  ...
  # userspace writes N to sysfs "sriov_numvfs":
  sriov_numvfs_store
    pci_sriov_get_totalvfs                  # now returns 0
    return -ERANGE

Prior to 8d85a7a4f2, pci_sriov_get_totalvfs() returned TotalVFs, but it
now returns 0.

Remove the pci_sriov_set_totalvfs(pdev, 0) calls so we don't limit the
number of VFs that can be enabled.

Fixes: 8d85a7a4f2 ("PCI/IOV: Allow PF drivers to limit total_VFs to 0")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-06-29 15:09:00 -05:00