Commit Graph

1169240 Commits

Author SHA1 Message Date
Alvaro Karsz
db6c4dee4c PCI: Add SolidRun vendor ID
Add SolidRun vendor ID to pci_ids.h

The vendor ID is used in 2 different source files, the SNET vDPA driver
and PCI quirks.

Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Message-Id: <20230110165638.123745-2-alvaro.karsz@solid-run.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Eugenio Pérez
d8b3832a78 vdpa_sim_net: Offer VIRTIO_NET_F_STATUS
VIRTIO_NET_S_LINK_UP is already returned in config reads since vdpasim
creation, but the feature bit was not offered to the driver.

Tested modifying VIRTIO_NET_S_LINK_UP and different values of "status"
in qemu virtio-net options, using vhost_vdpa.

Not considered a fix, because no driver should rely on this config
read without the feature flag.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20221117155502.1394700-1-eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
46fc0917bb vDPA/ifcvf: implement features provisioning
This commit implements features provisioning for ifcvf, that means:
1)check whether the provisioned features are supported by
the management device
2)vDPA device only presents selected feature bits

Examples:
a)The management device supported features:
$ vdpa mgmtdev show pci/0000:01:00.5
pci/0000:01:00.5:
  supported_classes net
  max_supported_vqs 9
  dev_features MTU MAC MRG_RXBUF CTRL_VQ MQ ANY_LAYOUT VERSION_1 ACCESS_PLATFORM

b)Provision a vDPA device with all supported features:
$ vdpa dev add name vdpa0 mgmtdev pci/0000:01:00.5
$ vdpa/vdpa dev config show vdpa0
vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false max_vq_pairs 4 mtu 1500
  negotiated_features MRG_RXBUF CTRL_VQ MQ VERSION_1 ACCESS_PLATFORM

c)Provision a vDPA device with a subset of the supported features:
$ vdpa dev add name vdpa0 mgmtdev pci/0000:01:00.5 device_features 0x300020020
$ vdpa dev config show vdpa0
mac 00:e8:ca:11:be:05 link up link_announce false
  negotiated_features CTRL_VQ VERSION_1 ACCESS_PLATFORM
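
As a rough illustration of example c), the device_features mask can be decoded bit by bit. This is a minimal sketch, not part of the ifcvf driver: the bit positions are the ones defined by the virtio specification, while decode_features and is_provisionable are hypothetical helper names introduced here.

```python
# Bit positions from the virtio specification; only the features named
# in the examples above are listed.
FEATURE_BITS = {
    3: "MTU",
    5: "MAC",
    15: "MRG_RXBUF",
    17: "CTRL_VQ",
    22: "MQ",
    27: "ANY_LAYOUT",
    32: "VERSION_1",
    33: "ACCESS_PLATFORM",
}

# Features reported by "vdpa mgmtdev show" in example a).
SUPPORTED = sum(1 << bit for bit in FEATURE_BITS)


def decode_features(mask: int) -> list[str]:
    """Return the names of the set bits, lowest bit first."""
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            # Unknown bits are reported by position instead of being dropped.
            names.append(FEATURE_BITS.get(bit, f"bit{bit}"))
        bit += 1
    return names


def is_provisionable(requested: int, supported: int) -> bool:
    """Hypothetical check: every requested bit must also be supported."""
    return requested & ~supported == 0


if __name__ == "__main__":
    print(decode_features(0x300020020))
    print(is_provisionable(0x300020020, SUPPORTED))
```

With the mask from example c), the set bits are 5, 17, 32 and 33, all of which are in the management device's dev_features list.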

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20221125145724.1129962-13-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
267000e980 vDPA/ifcvf: retire ifcvf_private_to_vf
This commit retires ifcvf_private_to_vf, because
the vf is already a member of the adapter,
so it could be easily addressed by adapter->vf.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20221125145724.1129962-12-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
93139037b5 vDPA/ifcvf: allocate the adapter in dev_add()
The adapter is the container of the vdpa_device.
This commit allocates the adapter in dev_add()
rather than in probe(), so that the vdpa_device
can be re-created when userspace creates
the vdpa device, and freed in dev_del().

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-11-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
6a3b2f179b vDPA/ifcvf: manage ifcvf_hw in the mgmt_dev
This commit allocates the hw structure in the
management device structure. So the hardware
can be initialized once the management device
is allocated in probe.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-10-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
7cfd36b7e8 vDPA/ifcvf: ifcvf_request_irq works on ifcvf_hw
All ifcvf_request_irq's callees are refactored
to work on ifcvf_hw, so it should be decoupled
from the adapter as well

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-9-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
a70d833e69 vDPA/ifcvf: decouple config/dev IRQ requester and vectors allocator from the adapter
This commit decouples the config irq requester, the device
shared irq requester and the MSI vectors allocator from
the adapter, so that they can be safely invoked from probe
onward, before the adapter is allocated.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-8-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:55 -05:00
Zhu Lingshan
f9a9ffb2e4 vDPA/ifcvf: decouple vq irq requester from the adapter
This commit decouples the vq irq requester from the adapter,
so that these functions can be invoked from probe onward.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-7-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Zhu Lingshan
23dac55cec vDPA/ifcvf: decouple config IRQ releaser from the adapter
This commit decouples the config IRQ releaser from the adapter,
so that it can be invoked from probe onward or in error handlers.
ifcvf_free_irq() works on ifcvf_hw in this commit.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-6-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Zhu Lingshan
004cbcabab vDPA/ifcvf: decouple vq IRQ releasers from the adapter
This commit decouples the IRQ releasers from the
adapter, so that these functions can be
safely invoked from probe onward.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-5-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Zhu Lingshan
66e3970b16 vDPA/ifcvf: alloc the mgmt_dev before the adapter
This commit reverses the order of allocating the
management device and the adapter, so that it is
possible to move the allocation of the adapter
to dev_add().

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-4-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Zhu Lingshan
af8eb69a62 vDPA/ifcvf: decouple config space ops from the adapter
This commit decouples the config space ops from the
adapter layer, so these functions can be invoked
once the device is probed.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-3-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Zhu Lingshan
d59f633dd0 vDPA/ifcvf: decouple hw features manipulators from the adapter
This commit gets rid of ifcvf_adapter in the hw features related
functions in ifcvf_base. These functions are then more robust
and decoupled from the ifcvf_adapter layer, so they
can be invoked once the device is probed, even
before the adapter is allocated.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Cc: stable@vger.kernel.org
Message-Id: <20221125145724.1129962-2-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Eli Cohen
0a59975088 vdpa/mlx5: Add RX counters to debugfs
For each interface, either VLAN tagged or untagged, add two hardware
counters: one for unicast and another for multicast. The counters count
RX packets and bytes and can be read through debugfs:

$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/untagged/mcast/packets
$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/untagged/ucast/bytes

This feature is controlled via the config option
MLX5_VDPA_STEERING_DEBUG. It is off by default as it may have some
impact on performance.
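
The debugfs layout shown above can be wrapped in a small reader. This is a minimal sketch under stated assumptions: rx_counter_path and read_rx_counter are hypothetical names, only the untagged paths given in the commit message are built, and each file is assumed to hold a single integer (decimal or 0x-prefixed).

```python
from pathlib import Path

# Root of the mlx5 debugfs tree, as used in the commands above.
DEBUGFS_ROOT = Path("/sys/kernel/debug/mlx5")


def rx_counter_path(core_dev, vdpa_dev, cast, counter, root=DEBUGFS_ROOT):
    """Build the path of an untagged RX counter file."""
    if cast not in ("ucast", "mcast"):
        raise ValueError(f"unknown cast type: {cast}")
    if counter not in ("packets", "bytes"):
        raise ValueError(f"unknown counter: {counter}")
    return Path(root) / core_dev / vdpa_dev / "rx" / "untagged" / cast / counter


def read_rx_counter(core_dev, vdpa_dev, cast, counter, root=DEBUGFS_ROOT):
    """Read one counter; assumes the file contains a single integer."""
    text = rx_counter_path(core_dev, vdpa_dev, cast, counter, root).read_text()
    # Base 0 accepts both plain decimal and 0x-prefixed values.
    return int(text.strip(), 0)
```

For example, rx_counter_path("mlx5_core.sf.1", "vdpa-0", "mcast", "packets") yields the first path from the commands above. Reading debugfs normally requires root privileges and a kernel built with MLX5_VDPA_STEERING_DEBUG.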

Includes a fixup by Yang Yingliang <yangyingliang@huawei.com>:

vdpa/mlx5: fix check wrong pointer in mlx5_vdpa_add_mac_vlan_rules()

The local variable 'rule' is not used anymore, fix return value
check after calling mlx5_add_flow_rules().

Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-9-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Message-Id: <20230104074418.1737510-1-yangyingliang@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eli Cohen <elic@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2023-02-20 19:26:54 -05:00
Eli Cohen
2942210043 vdpa/mlx5: Add debugfs subtree
Add debugfs subtree and expose flow table ID and TIR number. This
information can be used by external tools to do extended
troubleshooting.

The information can be retrieved like so:
$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/table_id
$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/tirn

Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-8-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Eli Cohen
72c67e9b90 vdpa/mlx5: Move some definitions to a new header file
Move some definitions from mlx5_vnet.c to newly added header file
mlx5_vnet.h. We need these definitions for the following patches that
add debugfs tree to expose information vital for debug.

Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20221114131759.57883-7-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-20 19:26:54 -05:00
Stefan Schmidt
195d6cc9c3 MAINTAINERS: Add Miquel Raynal as additional maintainer for ieee802154
We are growing the maintainer team for ieee802154 to spread the load for
review and general maintenance. Miquel has been driving the subsystem
forward over the last year and we would like to welcome him as a
maintainer.

Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230218211317.284889-4-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20 16:18:53 -08:00
Stefan Schmidt
6b44177285 MAINTAINERS: Switch maintenance for mrf24j40 driver over
Alan Ott has not been actively working on the driver or reviewing
patches for several years. I have been taking odd fixes in through the
wpan/ieee802154 tree. Update the MAINTAINERS file to reflect this
reality. I wanted to thank Alan for his work on the driver.

Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Link: https://lore.kernel.org/r/20230218211317.284889-3-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20 16:18:53 -08:00
Stefan Schmidt
d1b4b4117f MAINTAINERS: Switch maintenance for mcr20a driver over
Xue Liu has not been actively working on the driver or reviewing
patches for several years. I have been taking odd fixes in through the
wpan/ieee802154 tree. Update the MAINTAINERS file to reflect this
reality. I wanted to thank Xue Liu for his work on the driver.

Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Link: https://lore.kernel.org/r/20230218211317.284889-2-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20 16:18:53 -08:00
Stefan Schmidt
c551c569e3 MAINTAINERS: Switch maintenance for cc2520 driver over
Varka Bhadram has not been actively working on the driver or reviewing
patches for several years. I have been taking odd fixes in through the
wpan/ieee802154 tree. Update the MAINTAINERS file to reflect this
reality. I wanted to thank Varka for his work on the driver.

Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Link: https://lore.kernel.org/r/20230218211317.284889-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20 16:18:53 -08:00
Dave Airlie
fec67d1896 Merge tag 'amd-drm-next-6.3-2023-02-17' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.3-2023-02-17:

amdgpu:
- GC 11 fixes
- Display fixes
- Backlight cleanup
- SMU13 fixes
- SMU7 regression fix
- GFX9 sw queues fix
- AGP fix for GMC 11
- W1 warning fixes
- S/G display fixes
- Misc spelling fixes
- Driver unload fix
- DCN 3.1.4 fixes
- Display code reorg fixes
- Rotation fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230217230930.64821-1-alexander.deucher@amd.com
2023-02-21 10:14:52 +10:00
Linus Torvalds
db77b8502a Merge tag 'asm-generic-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic cleanups from Arnd Bergmann:
 "Only three minor changes: a cross-platform series from Mike Rapoport
  to consolidate asm/agp.h between architectures, and a correctness
  change for __generic_cmpxchg_local() from Matt Evans"

* tag 'asm-generic-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  char/agp: introduce asm-generic/agp.h
  char/agp: consolidate {alloc,free}_gatt_pages()
  locking/atomic: cmpxchg: Make __generic_cmpxchg_local compare against zero-extended 'old' value
2023-02-20 15:55:47 -08:00
Quanfa Fu
c96abaec78 tracing/eprobe: no need to check for negative ret value for snprintf
No need to check for negative return value from snprintf() as the
code does not return negative values.

Link: https://lore.kernel.org/all/20230109040625.3259642-1-quanfafu@gmail.com/

Signed-off-by: Quanfa Fu <quanfafu@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-21 08:52:42 +09:00
Masami Hiramatsu (Google)
1fcd09fd4f test_kprobes: Add recursed kprobe test case
Add a recursed kprobe test case to the KUnit test module for kprobes.
This will probe a function which is called from the pre_handler and
post_handler itself. If the kprobe is correctly implemented, the recursed
kprobe handlers will be skipped and the number of skipped kprobe will
be counted on kprobe::nmissed.
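
The skip-and-count behaviour described above can be pictured with a toy model. This is only a sketch of the idea, not the kernel implementation or the KUnit test: ProbeEngine, Kprobe and run_demo are hypothetical names, and the model tracks a single "handler is running" marker.

```python
class Kprobe:
    """Toy probe: optional handlers plus a missed-hit counter."""

    def __init__(self, pre_handler=None, post_handler=None):
        self.pre_handler = pre_handler
        self.post_handler = post_handler
        self.nmissed = 0


class ProbeEngine:
    def __init__(self):
        self.probes = {}      # function name -> Kprobe
        self.current = None   # probe whose handler is running, if any

    def register(self, func_name, probe):
        self.probes[func_name] = probe

    def _run_handler(self, probe, handler):
        if handler is None:
            return
        self.current = probe
        try:
            handler()
        finally:
            self.current = None

    def call(self, func_name, func):
        probe = self.probes.get(func_name)
        if probe is None:
            return func()
        if self.current is not None:
            # Hit from inside another handler: skip handlers, count a miss.
            probe.nmissed += 1
            return func()
        self._run_handler(probe, probe.pre_handler)
        result = func()
        self._run_handler(probe, probe.post_handler)
        return result


def run_demo():
    engine = ProbeEngine()
    inner = Kprobe()
    engine.register("helper", inner)

    def handler():
        # The outer handlers call a function that is itself probed.
        engine.call("helper", lambda: None)

    engine.register("target", Kprobe(pre_handler=handler, post_handler=handler))
    engine.call("target", lambda: None)
    return inner.nmissed
```

In this model the inner probe is hit once from the pre-handler and once from the post-handler, so run_demo() returns 2.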

Link: https://lore.kernel.org/all/167414238758.2301956.258548940194352895.stgit@devnote3/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-21 08:52:42 +09:00
Donglin Peng
8478cca1e3 tracing/probe: add a char type to show the character value of traced arguments
There are cases where we want to show the character value of traced
arguments rather than a decimal, hexadecimal, or string value, for debugging
convenience. Add a new type named 'char' for this, and a new test case
file named 'kprobe_args_char.tc' as a selftest for the char type.

For example:

If the function to be traced is 'void demo_func(char type, char *name);', we
can add a kprobe event as follows to show the argument values as we want:

echo  'p:myprobe demo_func $arg1:char +0($arg2):char[5]' > kprobe_events

we will get the following trace log:

... myprobe: (demo_func+0x0/0x29) arg1='A' arg2={'b','p','f','1',''}
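
The formatting in the log line above can be mimicked with a few lines of Python. This is a rough model of the output only, not the kernel's print routine: format_char and format_char_array are hypothetical names, and rendering a NUL byte as an empty pair of quotes is an assumption taken from the sample line.

```python
def format_char(value: int) -> str:
    """Render one byte the way the 'char' type appears in the sample log."""
    # Assumption: a NUL byte shows up as '' in the trace output.
    ch = "" if value == 0 else chr(value)
    return f"'{ch}'"


def format_char_array(data: bytes) -> str:
    """Render a char[N] argument as a brace-enclosed list."""
    return "{" + ",".join(format_char(b) for b in data) + "}"


if __name__ == "__main__":
    print("arg1=" + format_char(ord("A")), "arg2=" + format_char_array(b"bpf1\x00"))
```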

Link: https://lore.kernel.org/all/20221219110613.367098-1-dolinux.peng@gmail.com/

Signed-off-by: Donglin Peng <dolinux.peng@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-21 08:52:42 +09:00
Linus Torvalds
950b6662e2 Merge tag 'soc-dt-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC DT updates from Arnd Bergmann:
 "About a quarter of the changes are for 32-bit arm, mostly filling in
  device support for existing machines and adding minor cleanups, mostly
  for Qualcomm and Samsung based machines.

  Two new 32-bit SoCs are added, both are quad-core Cortex-A7 chips from
  Rockchips that have been around for a while but were lacking kernel
  support so far: RV1126 is a Vision SoC with an NPU and is used in the
  Edgeble Neural Compute Module 2(Neu2) board, while RK3128 is designed
  for TV boxes and so far only comes with a dts for its reference
  design.

  The other 32-bit boards that were added are two ASpeed AST2600 based
  BMC boards, the Microchip sam9x60_curiosity development board (Armv5
  based!), the Enclustra PE1 FPGA-SoM baseboard, and a few more boards
  for i.MX53 and i.MX6ULL.

  On the RISC-V side, there are fewer patches, but a total of ten new
  single-board computers based on variations of the Allwinner D1/T113
  chip, plus one more board based on Microchip Polarfire.

  As usual, arm64 has by far the most changes here, with over 700
  non-merge changesets, among them over 400 alone for Qualcomm. The
  newly added SoCs this time are all recent high-end embedded SoCs for
  various markets, each one comes with support for its reference board:

   - Qualcomm SM8550 (Snapdragon 8 Gen 2) for mobile phones
   - Qualcomm QDU1000/QRU1000 5G RAN platform
   - Rockchips RK3588/RK3588s for tablets, chromebooks and SBCs
   - TI J784S4 for industrial and automotive applications

  In total, there are 46 new arm64 machines:
   - Reference platforms for each of the five new SoCs
   - Three Amlogic based development boards
   - Six embedded machines based on NXP i.MX8MM and i.MX8MP
   - The Mediatek mt7986a based Banana Pi R3 router
   - Six tablets based on Qualcomm MSM8916 (Snapdragon 410), SM6115
     (Snapdragon 662) and SM8250 (Snapdragon 865)
   - Two LTE dongles, also based on MSM8916
   - Seven mobile phones, based on Qualcomm MSM8953 (Snapdragon 610),
     SDM450 and SDM632
   - Three chromebooks based on Qualcomm SC7280 (Snapdragon 7c)
   - Nine development boards based on Rockchips RK3588, RK3568, RK3566
     and RK3328.
   - Five development machines based on TI K3 (AM642/AM654/AM68/AM69)

  The cleanup of dtc warnings continues across all platforms, adding to
  the total number of changes"

* tag 'soc-dt-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (1035 commits)
  dt-bindings: riscv: correct starfive visionfive 2 compatibles
  ARM: dts: socfpga: Add enclustra PE1 devicetree
  dt-bindings: altera: Add enclustra mercury PE1
  arm64: dts: qcom: msm8996: align RPM G-Link clock-controller node with bindings
  arm64: dts: qcom: qcs404: align RPM G-Link node with bindings
  arm64: dts: qcom: ipq6018: align RPM G-Link node with bindings
  arm64: dts: qcom: sm8550: remove invalid interconnect property from cryptobam
  arm64: dts: qcom: sc7280: Adjust zombie PWM frequency
  arm64: dts: qcom: sc8280xp-pmics: Specify interrupt parent explicitly
  arm64: dts: qcom: sm7225-fairphone-fp4: enable remaining i2c busses
  arm64: dts: qcom: sm7225-fairphone-fp4: move status property down
  arm64: dts: qcom: pmk8350: Use the correct PON compatible
  arm64: dts: qcom: sc8280xp-x13s: Enable external display
  arm64: dts: qcom: sc8280xp-crd: Introduce pmic_glink
  arm64: dts: qcom: sc8280xp: Add USB-C-related DP blocks
  arm64: dts: qcom: sm8350-hdk: enable GPU
  arm64: dts: qcom: sm8350: add GPU, GMU, GPU CC and SMMU nodes
  arm64: dts: qcom: sm8350: finish reordering nodes
  arm64: dts: qcom: sm8350: move more nodes to correct place
  arm64: dts: qcom: sm8350: reorder device nodes
  ...
2023-02-20 15:49:56 -08:00
Masami Hiramatsu (Google)
96cd93af79 selftests/ftrace: Fix probepoint testcase to ignore __pfx_* symbols
Fix kprobe probepoint testcase to ignore __pfx_* prefix symbols. Those are
introduced by commit b341b20d64 ("x86: Add prefix symbols for function
padding") for identifying PADDING_BYTES of NOPs. Since kprobe events can
not probe these prefix symbols, this testcase has to skip those symbols.

Link: https://lore.kernel.org/all/167309835609.640500.9664678940260305746.stgit@devnote3/

Fixes: b341b20d64 ("x86: Add prefix symbols for function padding")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
2023-02-21 08:49:16 +09:00
Masami Hiramatsu (Google)
a457e944df selftests/ftrace: Fix eprobe syntax test case to check filter support
Fix eprobe syntax test case to check whether the kernel supports the filter
on eprobe for filter syntax test command. Without this fix, this test case
will fail if the kernel supports eprobe but doesn't support the filter on
eprobe.

Link: https://lore.kernel.org/all/167309834742.640500.379128668288448035.stgit@devnote3/

Fixes: 9e14bae7d0 ("selftests/ftrace: Add eprobe syntax error testcase")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
2023-02-21 08:49:16 +09:00
Masami Hiramatsu (Google)
133921530c tracing/eprobe: Fix to add filter on eprobe description in README file
Fix to add a description of the filter on eprobe in README file. This
is required to identify whether the kernel supports the filter on eprobe or not.

Link: https://lore.kernel.org/all/167309833728.640500.12232259238201433587.stgit@devnote3/

Fixes: 752be5c5c9 ("tracing/eprobe: Add eprobe filter support")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-21 08:49:16 +09:00
Yang Jihong
f1c97a1b4e x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range
When arch_prepare_optimized_kprobe calculates the jump destination address,
it copies the original instructions from the jmp-optimized kprobe (see
__recover_optprobed_insn), and the calculation is based on the length of the
original instruction.

arch_check_optimized_kprobe does not check KPROBE_FLAG_OPTIMIZED when
checking whether a jmp-optimized kprobe exists.
As a result, setup_detour_execution may jump to a range that has been
overwritten by the jump destination address, resulting in an invalid opcode error.

For example, assume that two kprobes are registered at addresses
<func+9> and <func+11> in the "func" function.
The original code of the "func" function is as follows:

   0xffffffff816cb5e9 <+9>:     push   %r12
   0xffffffff816cb5eb <+11>:    xor    %r12d,%r12d
   0xffffffff816cb5ee <+14>:    test   %rdi,%rdi
   0xffffffff816cb5f1 <+17>:    setne  %r12b
   0xffffffff816cb5f5 <+21>:    push   %rbp

1. Register the kprobe for <func+11>; call it kp1, and its corresponding optimized_kprobe op1.
  After the optimization, the "func" code changes to:

   0xffffffff816cc079 <+9>:     push   %r12
   0xffffffff816cc07b <+11>:    jmp    0xffffffffa0210000
   0xffffffff816cc080 <+16>:    incl   0xf(%rcx)
   0xffffffff816cc083 <+19>:    xchg   %eax,%ebp
   0xffffffff816cc084 <+20>:    (bad)
   0xffffffff816cc085 <+21>:    push   %rbp

Now op1->flags == KPROBE_FLAG_OPTIMIZED;

2. Register the kprobe for <func+9>; call it kp2, and its corresponding optimized_kprobe op2.

register_kprobe(kp2)
  register_aggr_kprobe
    alloc_aggr_kprobe
      __prepare_optimized_kprobe
        arch_prepare_optimized_kprobe
          __recover_optprobed_insn    // copy original bytes from kp1->optinsn.copied_insn,
                                      // jump address = <func+14>

3. Disable kp1:

disable_kprobe(kp1)
  __disable_kprobe
    ...
    if (p == orig_p || aggr_kprobe_disabled(orig_p)) {
      ret = disarm_kprobe(orig_p, true)       // add op1 in unoptimizing_list, not unoptimized
      orig_p->flags |= KPROBE_FLAG_DISABLED;  // op1->flags == KPROBE_FLAG_OPTIMIZED | KPROBE_FLAG_DISABLED
    ...

4. Unregister kp2:
__unregister_kprobe_top
  ...
  if (!kprobe_disabled(ap) && !kprobes_all_disarmed) {
    optimize_kprobe(op)
      ...
      if (arch_check_optimized_kprobe(op) < 0) // op1 has KPROBE_FLAG_DISABLED, so this does not return
        return;
      p->kp.flags |= KPROBE_FLAG_OPTIMIZED;   //  now op2 has KPROBE_FLAG_OPTIMIZED
  }

"func" code now is:

   0xffffffff816cc079 <+9>:     int3
   0xffffffff816cc07a <+10>:    push   %rsp
   0xffffffff816cc07b <+11>:    jmp    0xffffffffa0210000
   0xffffffff816cc080 <+16>:    incl   0xf(%rcx)
   0xffffffff816cc083 <+19>:    xchg   %eax,%ebp
   0xffffffff816cc084 <+20>:    (bad)
   0xffffffff816cc085 <+21>:    push   %rbp

5. When "func" is called, the int3 handler calls setup_detour_execution:

  if (p->flags & KPROBE_FLAG_OPTIMIZED) {
    ...
    regs->ip = (unsigned long)op->optinsn.insn + TMPL_END_IDX;
    ...
  }

The code for the destination address is

   0xffffffffa021072c:  push   %r12
   0xffffffffa021072e:  xor    %r12d,%r12d
   0xffffffffa0210731:  jmp    0xffffffff816cb5ee <func+14>

However, <func+14> is not a valid start instruction address. As a result, an error occurs.
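
The range check at the heart of this scenario can be modelled in a few lines. This is an illustrative sketch, not the actual patch: the function names are hypothetical, probes are reduced to an offset-to-flags mapping, and the stricter variant is just one way to express "a probe that is still OPTIMIZED must block the new optimization". The flag values and the 5-byte jump size match the kernel's definitions.

```python
KPROBE_FLAG_DISABLED = 2
KPROBE_FLAG_OPTIMIZED = 4
JMP32_INSN_SIZE = 5  # size of the jump that replaces the probed bytes


def probes_in_jump_range(addr, probes):
    """Flags of probes located at addr+1 .. addr+JMP32_INSN_SIZE-1."""
    return [flags for a, flags in probes.items()
            if addr < a < addr + JMP32_INSN_SIZE]


def may_optimize_ignoring_disabled(addr, probes):
    """Model of the problematic check: disabled probes are ignored."""
    return all(flags & KPROBE_FLAG_DISABLED
               for flags in probes_in_jump_range(addr, probes))


def may_optimize_respecting_optimized(addr, probes):
    """Stricter variant: a still-optimized probe blocks optimization."""
    for flags in probes_in_jump_range(addr, probes):
        if not flags & KPROBE_FLAG_DISABLED or flags & KPROBE_FLAG_OPTIMIZED:
            return False
    return True


if __name__ == "__main__":
    # kp1 at <func+11> after step 3: disabled but still marked optimized.
    probes = {11: KPROBE_FLAG_OPTIMIZED | KPROBE_FLAG_DISABLED}
    print(may_optimize_ignoring_disabled(9, probes))
    print(may_optimize_respecting_optimized(9, probes))
```

With kp2 at offset 9, the 5-byte jump would cover offsets 9 to 13, so kp1 at offset 11 falls inside the range. The first check lets the optimization through, and the second rejects it.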

Link: https://lore.kernel.org/all/20230216034247.32348-3-yangjihong1@huawei.com/

Fixes: f66c0447cc ("kprobes: Set unoptimized flag after unoptimizing code")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-21 08:49:16 +09:00
Yang Jihong
868a6fc0ca x86/kprobes: Fix __recover_optprobed_insn check optimizing logic
Since the following commit:

  commit f66c0447cc ("kprobes: Set unoptimized flag after unoptimizing code")

modified the update timing of the KPROBE_FLAG_OPTIMIZED, an optimized_kprobe
may be in the optimizing or unoptimizing state when op->kp.flags
has KPROBE_FLAG_OPTIMIZED and op->list is not empty.

The __recover_optprobed_insn check logic is incorrect: a kprobe in the
unoptimizing state may be incorrectly determined as unoptimized.
As a result, incorrect instructions are copied.

The optprobe_queued_unopt function needs to be exported so that it can be
invoked from the arch directory.

Link: https://lore.kernel.org/all/20230216034247.32348-2-yangjihong1@huawei.com/

Fixes: f66c0447cc ("kprobes: Set unoptimized flag after unoptimizing code")
Cc: stable@vger.kernel.org
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-21 08:49:16 +09:00
Masami Hiramatsu (Google)
4fbd2f83fd kprobes: Fix to handle forcibly unoptimized kprobes on freeing_list
Since forcibly unoptimized kprobes will be put on the freeing_list directly
in the unoptimize_kprobe(), do_unoptimize_kprobes() must continue to check
the freeing_list even if unoptimizing_list is empty.

This bug can happen if a kprobe is put on an instruction which is in the
middle of the jump-replaced instruction sequence of an optprobe, *and* the
optprobe was recently unregistered and queued on unoptimizing_list.
In this case, the optprobe will be unoptimized forcibly (that is, immediately)
and put on the freeing_list, expecting that the optprobe will be handled in
do_unoptimize_kprobes().
But if there are no other optprobes on the unoptimizing_list, the current code
returns from do_unoptimize_kprobes() early and does not handle the
optprobe which is on the freeing_list. Then the optprobe will hit the
WARN_ON_ONCE() in do_free_cleaned_kprobes(), because it is not handled
in the latter loop of do_unoptimize_kprobes().

To solve this issue, do not return from do_unoptimize_kprobes() immediately
even if unoptimizing_list is empty.
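
The effect of the early return can be seen in a toy model. This is a sketch under simplifying assumptions, not the kernel code: optprobes are plain dicts, the two lists are Python lists, and both function names are hypothetical.

```python
def _process_freeing_list(unoptimizing_list, freeing_list):
    """Move pending probes to freeing_list, then handle every entry on it."""
    # Stand-in for the arch step that moves unoptimized probes over.
    freeing_list.extend(unoptimizing_list)
    unoptimizing_list.clear()
    handled = []
    for op in freeing_list:
        op["optimized"] = False  # switch from detour code back to origin
        handled.append(op["name"])
    return handled


def do_unoptimize_early_return(unoptimizing_list, freeing_list):
    """Old behaviour: bail out when nothing is queued for unoptimizing."""
    if not unoptimizing_list:
        return []
    return _process_freeing_list(unoptimizing_list, freeing_list)


def do_unoptimize_always_check(unoptimizing_list, freeing_list):
    """Behaviour described in this commit: still walk freeing_list."""
    return _process_freeing_list(unoptimizing_list, freeing_list)


if __name__ == "__main__":
    forced = {"name": "op1", "optimized": True}  # forcibly unoptimized probe
    print(do_unoptimize_early_return([], [forced]), forced["optimized"])
    print(do_unoptimize_always_check([], [forced]), forced["optimized"])
```

With an empty unoptimizing_list, the early-return version leaves the forcibly unoptimized probe untouched on freeing_list, while the other version handles it.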

Moreover, this change affects another case. kill_optimized_kprobes() expects
kprobe_optimizer() will just free the optprobe on freeing_list.
So I changed it to just do list_move() to freeing_list if optprobes are on
the unoptimizing list. And do_unoptimize_kprobes() will skip
arch_disarm_kprobe() if the probe on freeing_list has the gone flag.

Link: https://lore.kernel.org/all/Y8URdIfVr3pq2X8w@xpf.sh.intel.com/
Link: https://lore.kernel.org/all/167448024501.3253718.13037333683110512967.stgit@devnote3/

Fixes: e4add24778 ("kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-21 08:49:16 +09:00
Dave Airlie
ef04277600 Merge tag 'drm-misc-next-fixes-2023-02-16' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
Short summary of fixes pull:

Contains fixes for DP MST and the panel orientation on a Lenovo
IdeaPad model.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/Y+4H4C4E6cZcM9+J@linux-uq9g
2023-02-21 09:44:21 +10:00
Linus Torvalds
c72e04c26f Merge tag 'soc-defconfig-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM defconfigs updates from Arnd Bergmann:
 "As usual, this contains all the patches to enable options for newly
  added device drivers in the 32-bit and 64-bit defconfig files.

  I have sorted the files according to the changes to Kconfig files,
  to make it easier to check what has changed compared to the 'make
  savedefconfig' output.

  The most notable change this time is a series from Mark Brown to add
  a 'virtconfig' target for arm64, which is for the moment the same as
  the 'defconfig' target but disables all the top-level SoC specific
  options in order to have a smaller and faster kernel build"

* tag 'soc-defconfig-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (39 commits)
  arm64: defconfig: enable drivers required by the Qualcomm SA8775P platform
  arm64: defconfig: Enable DisplayPort on SC8280XP laptops
  arm64: configs: Add virtconfig
  kbuild: Provide a version of merge_into_defconfig without override warnings
  scripts: merge_config: Add option to suppress warning on overrides
  ARM: reorder defconfig files
  arm64: reorder defconfig
  arm64: defconfig: enable Qualcomm SDAM nvmem driver
  arm64: defconfig: enable SM8450 DISPCC clock driver
  ARM: defconfig: Add IOSCHED_BFQ to the default configs
  ARM: configs: multi_v7: enable NVMEM driver for STM32
  ARM: Add wpcm450_defconfig for Nuvoton WPCM450
  arm64: defconfig: Enable DMA_RESTRICTED_POOL
  arm64: defconfig: Enable missing configs for mt8192-asurada
  riscv: defconfig: Enable the Allwinner D1 platform and drivers
  ARM: imx_v6_v7_defconfig: Don't enable PROVE_LOCKING
  ARM: multi_v7_defconfig: Add GXP Fan and SPI support
  ARM: add multi_v7_lpae_defconfig
  kbuild: Add config fragment merge functionality
  ARM: multi_v7_defconfig: Add options to support TQMLS102xA series
  ...
2023-02-20 15:43:36 -08:00
Linus Torvalds
b32c6e029c Merge tag 'arm-soc-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC updates from Arnd Bergmann:
 "The majority of the changes are for the OMAP2 platform, mostly
  removing some dead code that got left behind from previous cleanups.

  Aside from that, there are very minor updates and correctness fixes
  for Zynq, i.MX, Samsung, Broadcom, AT91, ep93xx, and OMAP1"

* tag 'arm-soc-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (26 commits)
  dt-bindings: soc: samsung: exynos-pmu: allow phys as child
  ARM: imx: mach-imx6ul: add imx6ulz support
  ARM: imx: Call ida_simple_remove() for ida_simple_get
  arm64: drop redundant "ARMv8" from Kconfig option title
  ARM: ep93xx: Convert to use descriptors for GPIO LEDs
  ARM: s3c: fix s3c64xx_set_timer_source prototype
  ARM: OMAP2+: Fix spelling typos in comment
  ARM: OMAP2+: Remove unneeded #include <linux/pinctrl/machine.h>
  ARM: OMAP2+: Remove unneeded #include <linux/pinctrl/pinmux.h>
  ARM: OMAP1: call platform_device_put() in error case in omap1_dm_timer_init()
  ARM: BCM63xx: remove useless goto statement
  ARM: omap2: make functions static
  ARM: omap2: remove unused omap2_pm_init
  ARM: omap2: remove unused declarations
  ARM: omap2: remove unused functions
  ARM: omap2: smartreflex: remove on_init control
  ARM: omap2: remove APLL control
  ARM: omap2: simplify clock2xxx header
  ARM: omap2: remove unused omap_hwmod_reset.c
  ARM: omap2: remove unused headers
  ...
2023-02-20 15:36:37 -08:00
Linus Torvalds
ff0c7e1862 Merge tag 'arm-boardfile-remove-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC boardfile updates from Arnd Bergmann:
 "Unused boardfile removal for 6.3

  This is a follow-up to the deprecation of most of the old-style board
  files that was merged in linux-6.0, removing them for good.

  This branch is almost exclusively dead code removal based on those
  annotations. Some device driver removals went through separate
  subsystem trees, but the majority is in the same branch, in order to
  better handle dependencies between the patches and avoid breaking
  bisection.

  Unfortunately that leads to merge conflicts against other changes in
  the subsystem trees, but they should all be trivial to resolve by
  removing the files.

  See commit 7d0d3fa733 ("Merge tag 'arm-boardfiles-6.0' of
  git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc") for the
  description of which machines were marked unused and are now removed.

  The only removals that got postponed are Terastation WXL (mv78xx0) and
  Jornada720 (StrongARM1100), which turned out to still have potential
  users"

* tag 'arm-boardfile-remove-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (91 commits)
  mmc: omap: drop TPS65010 dependency
  ARM: pxa: restore mfp-pxa320.h
  usb: ohci-omap: avoid unused-variable warning
  ARM: debug: remove references in DEBUG_UART_8250_SHIFT to removed configs
  ARM: s3c: remove obsolete s3c-cpu-freq header
  MAINTAINERS: adjust SAMSUNG SOC CLOCK DRIVERS after s3c24xx support removal
  MAINTAINERS: update file entries after arm multi-platform rework and mach-pxa removal
  ARM: remove CONFIG_UNUSED_BOARD_FILES
  mfd: remove htc-pasic3 driver
  w1: remove ds1wm driver
  usb: remove ohci-tmio driver
  fbdev: remove w100fb driver
  fbdev: remove tmiofb driver
  mmc: remove tmio_mmc driver
  mfd: remove ucb1400 support
  mfd: remove toshiba tmio drivers
  rtc: remove v3020 driver
  power: remove pda_power supply driver
  ASoC: pxa: remove unused board support
  pcmcia: remove unused pxa/sa1100 drivers
  ...
2023-02-20 15:28:57 -08:00
David Howells
16541195c6 cifs: Add a function to read into an iter from a socket
Add a helper function to read data from a socket into the given iterator.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org

Link: https://lore.kernel.org/r/164928617874.457102.10021662143234315566.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165211419563.3154751.18431990381145195050.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165348879662.2106726.16881134187242702351.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165364826398.3334034.12541600783145647319.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/166126395495.708021.12328677373159554478.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/166697258876.61150.3530237818849429372.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732031039.3186319.10691316510079412635.stgit@warthog.procyon.org.uk/ # rfc
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20 17:25:44 -06:00
David Howells
b8713c4dbf cifs: Add some helper functions
Add some helper functions to manipulate the folio marks by iterating
through a list of folios held in an xarray rather than using a page list.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org

Link: https://lore.kernel.org/r/164928616583.457102.15157033997163988344.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165211418840.3154751.3090684430628501879.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165348878940.2106726.204291614267188735.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165364825674.3334034.3356201708659748648.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/166126394799.708021.10637797063862600488.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/166697258147.61150.9940790486999562110.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732030314.3186319.9209944805565413627.stgit@warthog.procyon.org.uk/ # rfc
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20 17:25:44 -06:00
David Howells
39bc58203f cifs: Add a function to Hash the contents of an iterator
Add a function to push the contents of a BVEC-, KVEC- or XARRAY-type
iterator into a synchronous hash algorithm.

UBUF- and IOBUF-type iterators are not supported on the assumption that
either we're doing buffered I/O, in which case we won't see them, or we're
doing direct I/O, in which case the iterator will have been extracted into
a BVEC-type iterator higher up.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-crypto@vger.kernel.org

Link: https://lore.kernel.org/r/166697257423.61150.12070648579830206483.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732029577.3186319.17162612653237909961.stgit@warthog.procyon.org.uk/ # rfc
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20 17:25:44 -06:00
David Howells
e5fbdde430 cifs: Add a function to build an RDMA SGE list from an iterator
Add a function to add elements onto an RDMA SGE list representing page
fragments extracted from a BVEC-, KVEC- or XARRAY-type iterator and DMA
mapped until the maximum number of elements is reached.

Nothing is done to make sure the pages remain present - that must be done
by the caller.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Tom Talpey <tom@talpey.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-rdma@vger.kernel.org

Link: https://lore.kernel.org/r/166697256704.61150.17388516338310645808.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732028840.3186319.8512284239779728860.stgit@warthog.procyon.org.uk/ # rfc
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20 17:25:44 -06:00
David Howells
0185846975 netfs: Add a function to extract an iterator into a scatterlist
Provide a function for filling in a scatterlist from the list of pages
contained in an iterator.

If the iterator is UBUF- or IOBUF-type, the pages have a pin taken on them
(as FOLL_PIN).

If the iterator is BVEC-, KVEC- or XARRAY-type, no pin is taken on the
pages and it is left to the caller to manage their lifetime.  It cannot be
assumed that a ref can be validly taken, particularly in the case of a KVEC
iterator.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: linux-cachefs@redhat.com
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20 17:25:43 -06:00
David Howells
85dd2c8ff3 netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator
Add a function to extract the pages from a user-space supplied iterator
(UBUF- or IOVEC-type) into a BVEC-type iterator, retaining the pages by
getting a pin on them (as FOLL_PIN) as we go.

This is useful in three situations:

 (1) A userspace thread may have a sibling that unmaps or remaps the
     process's VM during the operation, changing the assignment of the
     pages and potentially causing an error.  Retaining the pages keeps
     some pages around, even if this occurs; further, we find out at the
     point of extraction if EFAULT is going to be incurred.

 (2) Pages might get swapped out/discarded if not retained, so we want to
     retain them to avoid the reload causing a deadlock due to a DIO
     from/to an mmapped region on the same file.

 (3) The iterator may get passed to sendmsg() by the filesystem.  If a
     fault occurs, we may get a short write to a TCP stream that's then
     tricky to recover from.

We don't deal with other types of iterator here, leaving it to other
mechanisms to retain the pages (e.g. PG_locked, PG_writeback and the pipe
lock).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: linux-cachefs@redhat.com
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20 17:25:43 -06:00
David Howells
4e260a8fd7 cifs: Implement splice_read to pass down ITER_BVEC not ITER_PIPE
Provide cifs_splice_read() to use a bvec rather than a pipe iterator as
the latter cannot so easily be split and advanced, which is necessary to
pass an iterator down to the bottom levels.  Upstream cifs gets around this
problem by using iov_iter_get_pages() to prefill the pipe and then passing
the list of pages down.

This is done by:

 (1) Bulk-allocate a bunch of pages to carry as much of the requested
     amount of data as possible, but without overrunning the available
     slots in the pipe and add them to an ITER_BVEC.

 (2) Synchronously call ->read_iter() to read into the buffer.

 (3) Discard any unused pages.

 (4) Load the remaining pages into the pipe in order and advance the head
     pointer.
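
The four steps above can be sketched in userspace C roughly as follows.
This is only a model under stated assumptions, not the cifs code: the
model_* names, the fixed 16-slot pipe, the 4096-byte page size and the
read callback are all hypothetical stand-ins for the kernel's pipe,
bvec and ->read_iter() machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE  4096u	/* assumed page size */
#define MODEL_PIPE_SLOTS 16u	/* assumed pipe capacity */

/* Hypothetical stand-in for a pipe: an array of page-sized buffers. */
struct model_pipe {
	void *slot[MODEL_PIPE_SLOTS];
	size_t len[MODEL_PIPE_SLOTS];
	unsigned int head;		/* number of occupied slots */
};

/* Hypothetical read callback: fills bufs[] in order, returns bytes read. */
typedef size_t (*model_read_fn)(void **bufs, size_t nbufs, size_t len);

/* Example reader that pretends the file holds only 5000 bytes. */
static size_t model_read_5000(void **bufs, size_t nbufs, size_t len)
{
	size_t total = len < 5000 ? len : 5000, done = 0;

	for (size_t i = 0; i < nbufs && done < total; i++) {
		size_t n = total - done;

		if (n > MODEL_PAGE_SIZE)
			n = MODEL_PAGE_SIZE;
		memset(bufs[i], 0xAB, n);
		done += n;
	}
	return done;
}

static size_t model_splice_read(struct model_pipe *pipe, size_t len,
				model_read_fn read_fn)
{
	void *bufs[MODEL_PIPE_SLOTS];
	size_t free_slots = MODEL_PIPE_SLOTS - pipe->head;
	size_t want = (len + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
	size_t npages = want < free_slots ? want : free_slots;
	size_t i, got, used;

	/* (1) Bulk-allocate pages without overrunning the free pipe slots. */
	for (i = 0; i < npages; i++) {
		bufs[i] = calloc(1, MODEL_PAGE_SIZE);
		if (!bufs[i])
			break;
	}
	npages = i;
	if (len > npages * MODEL_PAGE_SIZE)
		len = npages * MODEL_PAGE_SIZE;

	/* (2) Synchronously read into the buffer. */
	got = npages ? read_fn(bufs, npages, len) : 0;
	assert(got <= len);

	/* (3) Discard any pages the read did not touch. */
	used = (got + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
	for (i = used; i < npages; i++)
		free(bufs[i]);

	/* (4) Load the remaining pages in order and advance the head. */
	for (i = 0; i < used; i++) {
		size_t part = got - i * MODEL_PAGE_SIZE;

		pipe->slot[pipe->head] = bufs[i];
		pipe->len[pipe->head] = part < MODEL_PAGE_SIZE ?
					part : MODEL_PAGE_SIZE;
		pipe->head++;
	}
	return got;
}
```

In this model a three-page request against the 5000-byte reader leaves
two buffers in the pipe and frees the third before it is ever queued.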

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: linux-cifs@vger.kernel.org

Link: https://lore.kernel.org/r/166732028113.3186319.1793644937097301358.stgit@warthog.procyon.org.uk/ # rfc
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20 17:25:43 -06:00
David Howells
7c8e01ebf2 splice: Export filemap/direct_splice_read()
filemap_splice_read() and direct_splice_read() should be exported.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: David Hildenbrand <david@redhat.com>
cc: John Hubbard <jhubbard@nvidia.com>
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20 17:25:43 -06:00
David Howells
7d58fe7310 iov_iter: Add a function to extract a page list from an iterator
Add a function, iov_iter_extract_pages(), to extract a list of pages from
an iterator.  The pages may be returned with a pin added or nothing,
depending on the type of iterator.

Add a second function, iov_iter_extract_will_pin(), to determine how the
cleanup should be done.

There are two cases:

 (1) ITER_IOVEC or ITER_UBUF iterator.

     Extracted pages will have pins (FOLL_PIN) obtained on them so that a
     concurrent fork() will forcibly copy the page so that DMA is done
     to/from the parent's buffer and is unavailable to/unaffected by the
     child process.

     iov_iter_extract_will_pin() will return true for this case.  The
     caller should use something like unpin_user_page() to dispose of the
     page.

 (2) Any other sort of iterator.

     No refs or pins are obtained on the page; the assumption is made that
     the caller will manage page retention.

     iov_iter_extract_will_pin() will return false.  The pages don't need
     additional disposal.
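
As a rough illustration of the two cases, the cleanup rule can be
modelled in userspace C as below.  Everything here is a hypothetical
stand-in (the model_* names, the pincount field, the enum); it is not
the kernel implementation and only mirrors the rule that user-backed
iterators pin and all others do not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; the kernel types differ. */
enum model_iter_type {
	MODEL_ITER_IOVEC,
	MODEL_ITER_UBUF,
	MODEL_ITER_BVEC,
	MODEL_ITER_KVEC,
	MODEL_ITER_XARRAY,
};

struct model_page {
	int pincount;	/* stands in for the FOLL_PIN accounting */
};

/* Case (1) versus case (2): only user-backed iterators pin. */
static bool model_extract_will_pin(enum model_iter_type type)
{
	return type == MODEL_ITER_IOVEC || type == MODEL_ITER_UBUF;
}

/* "Extract" nr pages, taking a pin on each one only in case (1). */
static void model_extract_pages(enum model_iter_type type,
				struct model_page *pages, size_t nr)
{
	if (!model_extract_will_pin(type))
		return;
	for (size_t i = 0; i < nr; i++)
		pages[i].pincount++;
}

/* Cleanup is chosen by the same predicate that extraction used. */
static void model_release_pages(enum model_iter_type type,
				struct model_page *pages, size_t nr)
{
	if (!model_extract_will_pin(type))
		return;		/* case (2): nothing to dispose of */
	for (size_t i = 0; i < nr; i++) {
		assert(pages[i].pincount > 0);
		pages[i].pincount--;
	}
}
```

The point of the sketch is that the caller asks the predicate once and
uses the answer for both extraction and release, so the two can never
disagree about whether a pin is held.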

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: John Hubbard <jhubbard@nvidia.com>
cc: David Hildenbrand <david@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20 17:25:43 -06:00
David Howells
f62e52d127 iov_iter: Define flags to qualify page extraction.
Define flags to qualify page extraction to pass into iov_iter_*_pages*()
rather than passing in FOLL_* flags.

For now only a flag to allow peer-to-peer DMA is supported.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20 17:25:43 -06:00
David Howells
33b3b04154 splice: Add a func to do a splice from an O_DIRECT file without ITER_PIPE
Implement a function, direct_file_splice(), that deals with this by using
an ITER_BVEC iterator instead of an ITER_PIPE iterator as the former won't
free its buffers when reverted.  The function bulk allocates all the
buffers it thinks it is going to use in advance, does the read
synchronously and only then trims the buffer down.  The pages we did use
get pushed into the pipe.

This fixes a problem with the upcoming iov_iter_extract_pages() function,
whereby pages extracted from a non-user-backed iterator such as ITER_PIPE
aren't pinned.  __iomap_dio_rw(), however, calls iov_iter_revert() to
shorten the iterator to just the bufferage it is going to use - which has
the side-effect of freeing the excess pipe buffers, even though they're
attached to a bio and may get written to by DMA (thanks to Hillf Danton for
spotting this[1]).

This then causes memory corruption that is particularly noticeable when the
syzbot test[2] is run.  The test boils down to:

	out = creat(argv[1], 0666);
	ftruncate(out, 0x800);
	lseek(out, 0x200, SEEK_SET);
	in = open(argv[1], O_RDONLY | O_DIRECT | O_NOFOLLOW);
	sendfile(out, in, NULL, 0x1dd00);

run repeatedly in parallel.  What I think is happening is that ftruncate()
occasionally shortens the DIO read that's about to be made by sendfile's
splice core by reducing i_size.

This should be more efficient for DIO read by virtue of doing a bulk page
allocation, but slightly less efficient by ignoring any partial page in the
pipe.

Reported-by: syzbot+a440341a59e3b7142895@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: David Hildenbrand <david@redhat.com>
cc: John Hubbard <jhubbard@nvidia.com>
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20230207094731.1390-1-hdanton@sina.com/ [1]
Link: https://lore.kernel.org/r/000000000000b0b3c005f3a09383@google.com/ [2]
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20 17:25:43 -06:00
David Howells
07073eb01c splice: Add a func to do a splice from a buffered file without ITER_PIPE
Provide a function to do splice read from a buffered file, pulling the
folios out of the pagecache directly by calling filemap_get_pages() to do
any required reading and then pasting the returned folios into the pipe.

A helper function is provided to do the actual folio pasting and will
handle multipage folios by splicing as many of the relevant subpages as
will fit into the pipe.
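
The subpage splitting can be sketched as follows.  This is a userspace
model under assumptions, not the helper added by this patch: the
model_* names, the 4096-byte page size and the flat buffer array are
hypothetical, and a real pipe buffer would also hold a page reference.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size */

/* One pipe buffer: a subpage index within the folio plus a byte range. */
struct model_pipe_buf {
	size_t subpage;
	size_t offset;
	size_t len;
};

/*
 * Splice up to size bytes starting at offset within a folio of
 * folio_size bytes, using at most free_slots pipe buffers.  Each
 * buffer covers at most one subpage.  Returns the bytes spliced and
 * stores the number of buffers used in *nr_used.
 */
static size_t model_splice_folio(struct model_pipe_buf *bufs,
				 size_t free_slots, size_t *nr_used,
				 size_t folio_size, size_t offset,
				 size_t size)
{
	size_t spliced = 0, n = 0;
	size_t subpage, in_page;

	assert(folio_size % MODEL_PAGE_SIZE == 0);
	assert(offset < folio_size);

	/* Never run past the end of the folio. */
	if (size > folio_size - offset)
		size = folio_size - offset;

	subpage = offset / MODEL_PAGE_SIZE;
	in_page = offset % MODEL_PAGE_SIZE;

	while (spliced < size && n < free_slots) {
		size_t part = MODEL_PAGE_SIZE - in_page;

		if (part > size - spliced)
			part = size - spliced;
		bufs[n].subpage = subpage++;
		bufs[n].offset = in_page;
		bufs[n].len = part;
		n++;
		spliced += part;
		in_page = 0;	/* later subpages start at their beginning */
	}
	*nr_used = n;
	return spliced;
}
```

When the pipe fills before the folio is exhausted, the return value is
short and the caller can decide whether to wait or stop.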

The code is loosely based on filemap_read() and might belong in
mm/filemap.c with that as it needs to use filemap_get_pages().

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: David Hildenbrand <david@redhat.com>
cc: John Hubbard <jhubbard@nvidia.com>
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20 17:25:43 -06:00
David Howells
dd5b9d003e mm: Pass info, not iter, into filemap_get_pages()
filemap_get_pages() and a number of functions that it calls take an
iterator to provide two things: the number of bytes to be got from the file
specified and whether partially uptodate pages are allowed.  Change these
functions so that this information is passed in directly.  This allows it
to be called without having an iterator to hand.
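
The shape of the change can be illustrated with a small userspace
model.  The names and types below are hypothetical and the real
function signatures are not reproduced here; the sketch only shows a
helper taking the byte count and an uptodate flag directly, with an
iterator-based wrapper for callers that still hold one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size */

/* Hypothetical iterator carrying the two facts the helper needs. */
struct model_iter {
	size_t count;		/* bytes wanted */
	bool need_uptodate;	/* partially uptodate pages not allowed */
};

/* Page index range [first, last) covering the request. */
struct model_range {
	unsigned long long first;
	unsigned long long last;
	bool need_uptodate;
};

/* "After" form: the information is passed in directly. */
static struct model_range model_get_range(unsigned long long pos,
					  size_t count, bool need_uptodate)
{
	struct model_range r;

	assert(count > 0);
	r.first = pos / MODEL_PAGE_SIZE;
	r.last = (pos + count + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
	r.need_uptodate = need_uptodate;
	return r;
}

/* "Before" form: a caller must build an iterator just to ask. */
static struct model_range model_get_range_iter(unsigned long long pos,
					       const struct model_iter *iter)
{
	return model_get_range(pos, iter->count, iter->need_uptodate);
}
```

A caller with no iterator, such as a splice path working on folios, can
use the direct form; existing iterator-based callers lose nothing.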

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Matthew Wilcox <willy@infradead.org>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: David Hildenbrand <david@redhat.com>
cc: John Hubbard <jhubbard@nvidia.com>
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20 17:25:43 -06:00