Pull iommufd updates from Jason Gunthorpe:
- The iova_bitmap logic for efficiently reporting dirty pages back to
userspace has a few more tricky corner case bugs that have been
resolved and backed with new tests.
The revised version has simpler logic.
- Shared branch with iommu for handle support when doing domain attach.
Handles allow the domain owner to include additional private data on
a per-device basis.
- IO Page Fault Reporting to userspace via iommufd. Page faults can be
generated on fault capable HWPTs when a translation is not present.
Routing them to userspace allows a VMM to virtualize
them into an emulated vIOMMU. This is the next step to fully enabling
vSVA support.
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (26 commits)
iommufd: Put constants for all the uAPI enums
iommufd: Fix error pointer checking
iommufd: Add check on user response code
iommufd: Remove IOMMUFD_PAGE_RESP_FAILURE
iommufd: Require drivers to supply the cache_invalidate_user ops
iommufd/selftest: Add coverage for IOPF test
iommufd/selftest: Add IOPF support for mock device
iommufd: Associate fault object with iommufd_hw_pgtable
iommufd: Fault-capable hwpt attach/detach/replace
iommufd: Add iommufd fault object
iommufd: Add fault and response message definitions
iommu: Extend domain attach group with handle support
iommu: Add attach handle to struct iopf_group
iommu: Remove sva handle list
iommu: Introduce domain attachment handle
iommufd/iova_bitmap: Remove iterator logic
iommufd/iova_bitmap: Dynamic pinning on iova_bitmap_set()
iommufd/iova_bitmap: Consolidate iova_bitmap_set exit conditionals
iommufd/iova_bitmap: Move initial pinning to iova_bitmap_for_each()
iommufd/iova_bitmap: Cache mapped length in iova_bitmap_map struct
...
Lu Baolu says:
====================
This series implements the functionality of delivering IO page faults to
user space through the IOMMUFD framework. One feasible use case is the
nested translation. Nested translation is a hardware feature that supports
two-stage translation tables for IOMMU. The second-stage translation table
is managed by the host VMM, while the first-stage translation table is
owned by user space. This allows user space to control the IOMMU mappings
for its devices.
When an IO page fault occurs on the first-stage translation table, the
IOMMU hardware can deliver the page fault to user space through the
IOMMUFD framework. User space can then handle the page fault and respond
to the device top-down through the IOMMUFD. This allows user space to
implement its own IO page fault handling policies.
A user space application that is capable of handling IO page faults should
allocate a fault object and bind it to any domain whose faults it is
willing to handle. On successful fault object allocation, the user can
retrieve and respond to page faults by reading from or writing to the
file descriptor (FD) returned.
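The read-then-respond flow described above can be sketched in plain C. This is only an illustration under stated assumptions: the message layouts, the EX_* response codes, and the policy function below are hypothetical stand-ins, not the iommufd uAPI definitions, and the buffers stand in for data read from and written to the fault FD.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message layouts; the real ones are defined by the uAPI. */
struct ex_fault_msg {
	uint32_t dev_id;	/* device that raised the fault */
	uint32_t cookie;	/* echoed back so the kernel can match the response */
	uint64_t addr;		/* faulting IO virtual address */
};

struct ex_fault_resp {
	uint32_t cookie;
	uint32_t code;
};

/* Hypothetical response codes. */
enum { EX_RESP_SUCCESS = 0, EX_RESP_INVALID = 1 };

/* A user-defined policy: nonzero means the fault could be resolved. */
typedef int (*ex_fault_policy)(const struct ex_fault_msg *msg);

/* Example policy (an assumption): only addresses below 4 GiB are backed. */
static int ex_policy_below_4g(const struct ex_fault_msg *msg)
{
	return msg->addr < (1ULL << 32);
}

/*
 * Turn 'n' fault messages (as if read from the fault FD) into 'n'
 * responses (to be written back to the same FD). Returns 'n'.
 */
static size_t ex_handle_faults(const struct ex_fault_msg *in, size_t n,
			       struct ex_fault_resp *out,
			       ex_fault_policy policy)
{
	size_t i;

	for (i = 0; i < n; i++) {
		out[i].cookie = in[i].cookie;
		out[i].code = policy(&in[i]) ? EX_RESP_SUCCESS
					     : EX_RESP_INVALID;
	}
	return n;
}
```

Keeping the policy behind a callback mirrors the point made above: user space, not the kernel, decides how each fault is handled.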
The iommu selftest framework has been updated to test the IO page fault
delivery and response functionality.
====================
* iommufd_pri:
iommufd/selftest: Add coverage for IOPF test
iommufd/selftest: Add IOPF support for mock device
iommufd: Associate fault object with iommufd_hw_pgtable
iommufd: Fault-capable hwpt attach/detach/replace
iommufd: Add iommufd fault object
iommufd: Add fault and response message definitions
iommu: Extend domain attach group with handle support
iommu: Add attach handle to struct iopf_group
iommu: Remove sva handle list
iommu: Introduce domain attachment handle
Link: https://lore.kernel.org/all/20240702063444.105814-1-baolu.lu@linux.intel.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently, when attaching a domain to a device or its PASID, the domain is
stored within the iommu group. It could be retrieved for use during the
window between attachment and detachment.
With new features introduced, there's a need to store more information
than just a domain pointer. This information essentially represents the
association between a domain and a device. For example, the SVA code
already has a custom struct iommu_sva which represents a bond between
sva domain and a PASID of a device. Looking forward, the IOMMUFD needs
a place to store the iommufd_device pointer in the core, so that the
device object ID could be quickly retrieved in the critical fault handling
path.
Introduce domain attachment handle that explicitly represents the
attachment relationship between a domain and a device or its PASID.
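As a rough userspace sketch of the idea (not the kernel code itself): the core stores only a small handle, and the domain owner embeds that handle in its own structure to carry private per-device data. The types below are simplified stand-ins, and owner_attach_data and handle_to_dev_id() are hypothetical names used for illustration.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's domain type. */
struct iommu_domain {
	int id;
};

/* The handle represents one domain <-> device (or PASID) attachment. */
struct iommu_attach_handle {
	struct iommu_domain *domain;
};

/* Hypothetical owner structure embedding the handle plus private data. */
struct owner_attach_data {
	struct iommu_attach_handle handle;
	unsigned int dev_obj_id;	/* e.g. an ID needed in the fault path */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the owner's private data from the handle kept by the core. */
static unsigned int handle_to_dev_id(struct iommu_attach_handle *h)
{
	struct owner_attach_data *data =
		container_of(h, struct owner_attach_data, handle);

	return data->dev_obj_id;
}
```

With this layout the fault path needs only the handle pointer to reach the owner's data, with no extra lookup.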
Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240702063444.105814-2-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
e3f02f32a0 ("ionic: fix kernel panic due to multi-buffer handling")
d9c0420999 ("ionic: Mark error paths in the data path as unlikely")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kbuild does not support having a source file compiled multiple times
and linked into distinct modules, or built-in and modular at the
same time. For fsl-edma, there are two common components that are
linked into the fsl-edma.ko for Arm and PowerPC, plus the mcf-edma.ko
module on Coldfire. This violates the rule for compile-testing:
scripts/Makefile.build:236: drivers/dma/Makefile: fsl-edma-common.o is added to multiple modules: fsl-edma mcf-edma
scripts/Makefile.build:236: drivers/dma/Makefile: fsl-edma-trace.o is added to multiple modules: fsl-edma mcf-edma
I tried splitting out the common parts into a separate module, but
that adds back the complexity that a cleanup patch removed, and it
gets harder with the addition of the tracepoints.
As a minimal workaround, address it at the Kconfig level, by disallowing
the broken configurations.
Link: https://lore.kernel.org/lkml/20240110232255.1099757-1-arnd@kernel.org/
Fixes: 66aac8ea0a ("dmaengine: fsl-edma: clean up EXPORT_SYMBOL_GPL in fsl-edma-common.c")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peng Fan <peng.fan@nxp.com>
Link: https://lore.kernel.org/r/20240528115440.2965975-1-arnd@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Use list_for_each_entry_safe() to allow iterating through the list and
deleting entries during the iteration. The descriptor is freed via
idxd_desc_complete(), and there is a slight chance this may cause issues
for the list iterator if the descriptor is reused by another thread
before it has been deleted from the list.
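The principle behind the _safe iterator can be shown with a hand-rolled userspace list. This is a sketch only: struct desc, desc_push() and complete_all() are hypothetical names, and the kernel macro operates on struct list_head rather than on a bare next pointer.

```c
#include <stdlib.h>

struct desc {
	int id;
	struct desc *next;
};

/* Push a new descriptor on the front of the list; returns the new head. */
static struct desc *desc_push(struct desc *head, int id)
{
	struct desc *d = malloc(sizeof(*d));

	if (!d)
		return head;
	d->id = id;
	d->next = head;
	return d;
}

/*
 * Complete (free) every descriptor and return how many were completed.
 * The next pointer is saved before the current entry is released, which
 * is the property the _safe iterator provides.
 */
static int complete_all(struct desc **head)
{
	struct desc *d = *head, *n;
	int completed = 0;

	while (d) {
		n = d->next;	/* read before d may be freed or reused */
		free(d);
		completed++;
		d = n;
	}
	*head = NULL;
	return completed;
}
```

Reading d->next after free(d) would be a use-after-free, which is the class of problem the commit avoids.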
Fixes: 16e19e1122 ("dmaengine: idxd: Fix list corruption in description completion")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240603012444.11902-1-lirongqing@baidu.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Currently the k3_udma_glue_rx_get_irq() function returns either negative
error codes or zero on error. Generally, in the kernel, zero means
success so this can be confusing and has caused bugs in the past. Also the
"tx" version of this function only returns negative error codes. Let's
clean this "rx" function so both functions match.
This patch has no effect on runtime.
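A minimal sketch of the convention being adopted, assuming a made-up lookup function (example_rx_get_irq() and its error choices are hypothetical, not the driver's code): failure is always a negative errno, so callers need a single check.

```c
#include <errno.h>

/* Hypothetical lookup: returns a positive IRQ number or a negative errno. */
static int example_rx_get_irq(const int *irqs, int count, int idx)
{
	if (idx < 0 || idx >= count)
		return -EINVAL;
	if (irqs[idx] <= 0)
		return -ENXIO;	/* never report failure as 0 */
	return irqs[idx];
}
```

A caller then writes `if (irq < 0) return irq;` and never has to treat zero as a special case.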
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Acked-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull ARM updates from Russell King:
- Updates to AMBA bus subsystem to drop .owner struct device_driver
initialisations, moving that to code instead.
- Add LPAE privileged-access-never support
- Add support for Clang CFI
- clkdev: report over-sized device or connection strings
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux: (36 commits)
ARM: 9398/1: Fix userspace enter on LPAE with CC_OPTIMIZE_FOR_SIZE=y
clkdev: report over-sized strings when creating clkdev entries
ARM: 9393/1: mm: Use conditionals for CFI branches
ARM: 9392/2: Support CLANG CFI
ARM: 9391/2: hw_breakpoint: Handle CFI breakpoints
ARM: 9390/2: lib: Annotate loop delay instructions for CFI
ARM: 9389/2: mm: Define prototypes for all per-processor calls
ARM: 9388/2: mm: Type-annotate all per-processor assembly routines
ARM: 9387/2: mm: Rewrite cacheflush vtables in CFI safe C
ARM: 9386/2: mm: Use symbol alias for cache functions
ARM: 9385/2: mm: Type-annotate all cache assembly routines
ARM: 9384/2: mm: Make tlbflush routines CFI safe
ARM: 9382/1: ftrace: Define ftrace_stub_graph
ARM: 9358/2: Implement PAN for LPAE by TTBR0 page table walks disablement
ARM: 9357/2: Reduce the number of #ifdef CONFIG_CPU_SW_DOMAIN_PAN
ARM: 9356/2: Move asm statements accessing TTBCR into C functions
ARM: 9355/2: Add TTBCR_* definitions to pgtable-3level-hwdef.h
ARM: 9379/1: coresight: tpda: drop owner assignment
ARM: 9378/1: coresight: etm4x: drop owner assignment
ARM: 9377/1: hwrng: nomadik: drop owner assignment
...
After the patch to restrict the use of mmap() to CAP_SYS_RAWIO for
the currently existing devices, most applications can no longer make
use of the accelerators as in production "you don't run things as root".
To keep the DSA and IAA accelerators usable, hook up a write() method
so that applications can still submit work. In the write method,
sufficient input validation is performed to avoid the security issue
that required the mmap CAP_SYS_RAWIO check.
One complication is that the DSA device allows for indirect ("batched")
descriptors. There is no reasonable way to do the input validation
on these indirect descriptors so the write() method will not allow these
to be submitted to the hardware on affected hardware, and the sysfs
enumeration of support for the opcode is also removed.
Early performance data shows that the performance delta for most common
cases is within the noise.
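The validation idea can be sketched as follows. Everything here is an assumption made for illustration: the opcode values, the 64-byte descriptor layout, and the ex_* names are hypothetical and do not reproduce the driver's checks.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes and descriptor layout, for illustration only. */
enum ex_opcode {
	EX_OPCODE_NOOP = 0,
	EX_OPCODE_BATCH = 1,	/* indirect: points at further descriptors */
	EX_OPCODE_MEMMOVE = 3,
};

struct ex_desc {
	uint8_t opcode;
	uint8_t payload[63];
};

/* Reject what the kernel cannot validate on affected hardware. */
static int ex_validate_desc(const struct ex_desc *desc, bool hw_affected)
{
	if (hw_affected && desc->opcode == EX_OPCODE_BATCH)
		return -EINVAL;
	return 0;
}

/*
 * Model of a write() handler: 'len' must be a whole number of descriptors.
 * Returns the number of bytes accepted, or a negative errno if the first
 * descriptor is rejected.
 */
static long ex_write(const struct ex_desc *descs, size_t len, bool hw_affected)
{
	size_t i, count = len / sizeof(*descs);

	if (!len || len % sizeof(*descs))
		return -EINVAL;

	for (i = 0; i < count; i++) {
		int rc = ex_validate_desc(&descs[i], hw_affected);

		if (rc)
			return i ? (long)(i * sizeof(*descs)) : rc;
		/* a real driver would submit descs[i] to the hardware here */
	}
	return (long)len;
}
```

Returning a short count when a later descriptor is rejected follows the usual write() semantics, so the caller can tell how much work was accepted.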
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
On Sapphire Rapids and related platforms, the DSA and IAA devices have an
erratum that causes direct access (for example, by using the ENQCMD or
MOVDIR64 instructions) from untrusted applications to be a security problem.
To solve this, add a flag to the PCI device enumeration and device structures
to indicate the presence/absence of this security exposure. In the mmap()
method of the device, this flag is then used to enforce that the user
has the CAP_SYS_RAWIO capability.
In a future patch, a write() based method will be added that allows untrusted
applications to submit work to the accelerator, where the kernel can do
sanity checking on the user input to ensure secure operation of the accelerator.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Due to an erratum with the SPR_DSA and SPR_IAX devices, it is not secure to assign
these devices to virtual machines. Add the PCI IDs of these devices to the VFIO
denylist to ensure that this is handled appropriately by the VFIO subsystem.
The SPR_DSA and SPR_IAX devices are on-SOC devices for the Sapphire Rapids
(and related) family of products that perform data movement and compression.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
file_ida is allocated during cdev open and is freed accordingly
during cdev release. This sequence is guaranteed by driver file
operations. Therefore, there is no need to destroy an already empty
file_ida when the WQ cdev is removed.
Worse, ida_free() in cdev release may happen after destruction of
file_ida per WQ cdev. This can lead to accessing an id in file_ida
after it has been destroyed, resulting in a kernel panic.
Remove ida_destroy(&file_ida) to address these issues.
Fixes: e6fd6d7e5f ("dmaengine: idxd: add a device to represent the file opened")
Signed-off-by: Lijun Pan <lijun.pan@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240130013954.2024231-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The macros SDMA_SCRIPT_ADDRS_ARRAY_SIZE_V<n> are actually related to the
struct sdma_script_start_addrs.
struct sdma_script_start_addrs {
...
/* End of v1 array */
...
/* End of v2 array */
...
/* End of v3 array */
...
/* End of v4 array */
};
When adding a new field to sdma_script_start_addrs, it is easy to forget
to update SDMA_SCRIPT_ADDRS_ARRAY_SIZE_V<n>.
Employ offsetof for the SDMA_SCRIPT_ADDRS_ARRAY_SIZE_V<n> macros instead of
hardcoding numbers. The compiler will then calculate the size for each
version automatically.
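A small sketch of the technique, using a shortened stand-in structure (the struct name, field names, and EX_* macros are hypothetical, not the driver's): the size of each older array version is the offset of the first field added after it, divided by the element size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, shortened stand-in for the script address table. */
struct ex_script_start_addrs {
	int32_t v1_a;
	int32_t v1_b;
	int32_t v1_c;
	/* End of v1 array */
	int32_t v2_a;
	int32_t v2_b;
	/* End of v2 array */
	int32_t v3_a;
	/* End of v3 array */
};

/* Number of int32_t entries that precede 'member' in the table. */
#define EX_ADDRS_ARRAY_SIZE(member) \
	(offsetof(struct ex_script_start_addrs, member) / sizeof(int32_t))

/* Each version ends where the next version's first field begins. */
#define EX_ADDRS_ARRAY_SIZE_V1 EX_ADDRS_ARRAY_SIZE(v2_a)
#define EX_ADDRS_ARRAY_SIZE_V2 EX_ADDRS_ARRAY_SIZE(v3_a)
```

Adding a field inside the v1 or v2 region now changes the computed sizes without anyone having to edit a number.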
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20240419150729.1071904-2-Frank.Li@nxp.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The DT support in hidma has been broken since commit 37fa4905d2
("dmaengine: qcom_hidma: simplify DT resource parsing") in 2018. The
issue is the of_address_to_resource() calls bail out on success rather
than failure. This driver is for a defunct QCom server platform where
DT use was limited to start with. As it seems no one has noticed the
breakage, just remove the DT support altogether.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://lore.kernel.org/r/20240423161413.481670-1-robh@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Update the DPDMAI interfaces to support MC firmware up to 10.1x.x, whose
major change is the addition of dpaa domain id support. The user space MC
controller tool can create different dpaa domains for different virtual
environments. DMA queues can map to different service priorities.
The MC commands are basically compatible with the original ones. The new
commands use previously reserved fields.
- Add queue number for dpdmai_get_tx(rx)_queue().
- Unified rx(tx)_queue_attr.
- Update pad/reserved field of struct dpdmai_rsp_get_attributes and
struct dpdmai_cmd_queue for new API.
- Update command DPDMAI_SET(GET)_RX_QUEUE and DPDMAI_CMDID_GET_TX_QUEUE
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20240409163630.1996052-1-Frank.Li@nxp.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The helper function chan2parent is not used and has never been
used since the first commit to the code back in 2010. The function
is redundant and can be removed.
Cleans up clang scan build warning:
drivers/dma/pch_dma.c:158:30: warning: unused function 'chan2parent' [-Wunused-function]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20240308134750.2058556-1-colin.i.king@gmail.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
During the removal of the idxd driver, the registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:
BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
Fix the issue by preventing the migration of the perf context to an
invalid target.
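The guard can be modelled in userspace as below. The ex_* helpers are hypothetical; the only kernel behaviour relied on is that cpumask_any_but() returns a value at or above nr_cpu_ids when no other CPU qualifies.

```c
#include <stdbool.h>

/*
 * Hypothetical model of "any online CPU except 'cpu'". Like the kernel's
 * cpumask_any_but(), it returns a value >= nr_cpus when no CPU qualifies.
 */
static unsigned int ex_any_online_but(const bool *online, unsigned int nr_cpus,
				      unsigned int cpu)
{
	unsigned int i;

	for (i = 0; i < nr_cpus; i++)
		if (i != cpu && online[i])
			return i;
	return nr_cpus;
}

/*
 * Offline callback sketch: returns the CPU the context was migrated to,
 * or -1 when there is no valid target and migration is skipped.
 */
static int ex_cpu_offline(const bool *online, unsigned int nr_cpus,
			  unsigned int cpu)
{
	unsigned int target = ex_any_online_but(online, nr_cpus, cpu);

	if (target >= nr_cpus)
		return -1;	/* no valid target: leave the context alone */

	/* a real driver would migrate the perf context to 'target' here */
	return (int)target;
}
```

On a system with a single online CPU the early return is taken and no migration is attempted.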
Fixes: 81dd4d4d61 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
I have a use case where nr_buffers = 3 and in which each descriptor is composed of 3
segments, so the DMA channel's descs_allocated is 9. Since axi_desc_put()
handles the hw_desc considering the descs_allocated, this scenario would result in a
kernel panic (hw_desc array will be overrun).
To fix this, the proposal is to add a new member to the axi_dma_desc structure,
where we keep the number of allocated hw_descs (axi_desc_alloc()) and use it in
axi_desc_put() to handle the hw_desc array correctly.
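A simplified userspace model of the proposal, with hypothetical ex_* types and functions: each descriptor records how many hw_descs it owns, and the put path uses that count instead of the channel-wide total.

```c
#include <stdlib.h>

struct ex_hw_desc {
	int in_use;
};

struct ex_dma_desc {
	struct ex_hw_desc *hw_desc;
	unsigned int nr_hw_descs;	/* recorded at allocation time */
};

struct ex_dma_chan {
	unsigned int descs_allocated;	/* total across all descriptors */
};

/* Allocate a descriptor with 'num' hw_descs ('num' is assumed nonzero). */
static struct ex_dma_desc *ex_desc_alloc(struct ex_dma_chan *chan,
					 unsigned int num)
{
	struct ex_dma_desc *desc = calloc(1, sizeof(*desc));

	if (!desc)
		return NULL;
	desc->hw_desc = calloc(num, sizeof(*desc->hw_desc));
	if (!desc->hw_desc) {
		free(desc);
		return NULL;
	}
	desc->nr_hw_descs = num;
	chan->descs_allocated += num;
	return desc;
}

/* Release a descriptor; returns the number of hw_descs it owned. */
static unsigned int ex_desc_put(struct ex_dma_chan *chan,
				struct ex_dma_desc *desc)
{
	/* Use the per-descriptor count, not chan->descs_allocated. */
	unsigned int i, count = desc->nr_hw_descs;

	for (i = 0; i < count; i++)
		desc->hw_desc[i].in_use = 0;
	free(desc->hw_desc);
	free(desc);
	chan->descs_allocated -= count;
	return count;
}
```

With three descriptors of three segments each, the channel total is 9, but each put walks only its own 3 entries, so the array is never overrun.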
Additionally I propose to remove the axi_chan_start_first_queued() call after completing
the transfer, since it was identified that an imbalance can occur (started descriptors can
be interrupted and the transfer ignored due to the DMA channel not being enabled).
Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Link: https://lore.kernel.org/r/1711536564-12919-1-git-send-email-jpinto@synopsys.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The current xdma_synchronize method does not properly wait for the last
transfer to be done. Due to limitations of the XDMA engine, it is not
possible to stop a transfer in the middle of a descriptor. In other
words, if a stop is requested at the end of descriptor "N" and the OS
is fast enough, the DMA controller will effectively stop immediately.
However, if the OS is slightly too slow to request the stop and the DMA
engine starts descriptor "N+1", the N+1 transfer will be performed until
its end. This means that after a terminate_all, the last descriptor must
remain valid and the synchronization must wait for this last descriptor to
be terminated.
Fixes: 855c2e1d18 ("dmaengine: xilinx: xdma: Rework xdma_terminate_all()")
Fixes: f5c392d106 ("dmaengine: xilinx: xdma: Add terminate_all/synchronize callbacks")
Cc: stable@vger.kernel.org
Suggested-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
Link: https://lore.kernel.org/r/20240327-digigram-xdma-fixes-v1-2-45f4a52c0283@bootlin.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>