linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-07 20:58:14 -04:00

Author	SHA1	Message	Date
Maxim Mikityanskiy	fb5cd0ce70	selftests/bpf: Add selftests for raw syncookie helpers This commit adds selftests for the new BPF helpers: bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6}. xdp_synproxy_kern.c is a BPF program that generates SYN cookies on allowed TCP ports and sends SYNACKs to clients, accelerating synproxy iptables module. xdp_synproxy.c is a userspace control application that allows to configure the following options in runtime: list of allowed ports, MSS, window scale, TTL. A selftest is added to prog_tests that leverages the above programs to test the functionality of the new helpers. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220615134847.3753567-5-maximmi@nvidia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 21:20:30 -07:00
Maxim Mikityanskiy	33bf988504	bpf: Add helpers to issue and check SYN cookies in XDP The new helpers bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6} allow an XDP program to generate SYN cookies in response to TCP SYN packets and to check those cookies upon receiving the first ACK packet (the final packet of the TCP handshake). Unlike bpf_tcp_{gen,check}_syncookie these new helpers don't need a listening socket on the local machine, which allows to use them together with synproxy to accelerate SYN cookie generation. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220615134847.3753567-4-maximmi@nvidia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 21:20:30 -07:00
Maxim Mikityanskiy	508362ac66	bpf: Allow helpers to accept pointers with a fixed size Before this commit, the BPF verifier required ARG_PTR_TO_MEM arguments to be followed by ARG_CONST_SIZE holding the size of the memory region. The helpers had to check that size in runtime. There are cases where the size expected by a helper is a compile-time constant. Checking it in runtime is an unnecessary overhead and waste of BPF registers. This commit allows helpers to accept pointers to memory without the corresponding ARG_CONST_SIZE, given that they define the memory region size in struct bpf_func_proto and use ARG_PTR_TO_FIXED_SIZE_MEM type. arg_size is unionized with arg_btf_id to reduce the kernel image size, and it's valid because they are used by different argument types. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220615134847.3753567-3-maximmi@nvidia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 21:20:29 -07:00
Maxim Mikityanskiy	ac80287a6a	bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie bpf_tcp_gen_syncookie expects the full length of the TCP header (with all options), and bpf_tcp_check_syncookie accepts lengths bigger than sizeof(struct tcphdr). Fix the documentation that says these lengths should be exactly sizeof(struct tcphdr). While at it, fix a typo in the name of struct ipv6hdr. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220615134847.3753567-2-maximmi@nvidia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 21:20:29 -07:00
Jakub Kicinski	e8b03391b6	Merge branch 'net-lan743x-pci11010-pci11414-devices-enhancements' Raju Lakkaraju says: ==================== net: lan743x: PCI11010 / PCI11414 devices Enhancements This patch series continues with the addition of supported features for the Ethernet function of the PCI11010 / PCI11414 devices to the LAN743x driver. ==================== Link: https://lore.kernel.org/r/20220616041226.26996-1-Raju.Lakkaraju@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:45:52 -07:00
Raju Lakkaraju	311abcdddc	net: phy: add support to get Master-Slave configuration Add support to Master-Slave configuration and state Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:45:48 -07:00
Raju Lakkaraju	46b777ad9a	net: lan743x: Add support to SGMII 1G and 2.5G Add SGMII access read and write functions Add support to SGMII 1G and 2.5G for PCI11010/PCI11414 chips Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:45:48 -07:00
Raju Lakkaraju	6b3768ac8e	net: lan743x: Add support to Secure-ON WOL Add support to Magic Packet Detection with Secure-ON for PCI11010/PCI11414 chips Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:45:48 -07:00
Raju Lakkaraju	9aeb87d2b5	net: lan743x: Add support to LAN743x register dump Add support to LAN743x common register dump Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:45:48 -07:00
Jakub Kicinski	f050272436	Merge branch 'net-dsa-realtek-rtl8365mb-improve-handling-of-phy-modes' Alvin Šipraga says: ==================== net: dsa: realtek: rtl8365mb: improve handling of PHY modes This series introduces some minor cleanup of the driver and improves the handling of PHY interface modes to break the assumption that CPU ports are always over an external interface, and the assumption that user ports are always using an internal PHY. ==================== Link: https://lore.kernel.org/r/20220615225116.432283-1-alvin@pqrs.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:45:07 -07:00
Alvin Šipraga	a48b6e44a9	net: dsa: realtek: rtl8365mb: handle PHY interface modes correctly Realtek switches in the rtl8365mb family always have at least one port with a so-called external interface, supporting PHY interface modes such as RGMII or SGMII. The purpose of this patch is to improve the driver's handling of these ports. A new struct rtl8365mb_chip_info is introduced together with a static array of such structs. An instance of this struct is added for each supported switch, distinguished by its chip ID and version. Embedded in each chip_info struct is an array of struct rtl8365mb_extint, describing the external interfaces available. This is more specific than the old rtl8365mb_extint_port_map, which was only valid for switches with up to 6 ports. The struct rtl8365mb_extint also contains a bitmask of supported PHY interface modes, which allows the driver to distinguish which ports support RGMII. This corrects a previous mistake in the driver whereby it was assumed that any port with an external interface supports RGMII. This is not actually the case: for example, the RTL8367S has two external interfaces, only the second of which supports RGMII. The first supports only SGMII and HSGMII. This new design will make it easier to add support for other interface modes. Finally, rtl8365mb_phylink_get_caps() is fixed up to return supported capabilities based on the external interface properties described above. This addresses Vladimir's point in the linked thread that the capabilities are not actually a function of the DSA port type: Although most typical applications will treat the ports with internal PHY as user ports, there is no actual hardware limitation preventing one from using them as a CPU port. Equally, ports with external interface(s) may well be treated as user ports, even though it is typical to use those ports as CPU ports. Link: https://lore.kernel.org/netdev/20220510192301.5djdt3ghoavxulhl@bang-olufsen.dk/ Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:47 -07:00
Alvin Šipraga	b3456030f5	net: dsa: realtek: rtl8365mb: remove learn_limit_max private data member The variable is just assigned the value of a macro, so it can be removed. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:47 -07:00
Alvin Šipraga	ca5ecd4246	net: dsa: realtek: rtl8365mb: correct the max number of ports The maximum number of ports is actually 11, according to two observations: 1. The highest port ID used in the vendor driver is 10. Since port IDs are indexed from 0, and since DSA follows the same numbering system, this means up to 11 ports are to be presumed. 2. The registers with port mask fields always amount to a maximum port mask of 0x7FF, corresponding to a maximum 11 ports. In view of this, I also deleted the comment. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:47 -07:00
Alvin Šipraga	b325159d00	net: dsa: realtek: rtl8365mb: remove port_mask private data member There is no real need for this variable: the line change interrupt mask is sufficiently masked out when getting linkup_ind and linkdown_ind in the interrupt handler. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:46 -07:00
Alvin Šipraga	5eb1a23840	net: dsa: realtek: rtl8365mb: rename macro RTL8367RB -> RTL8367RB_VB The official name of this switch is RTL8367RB-VB, not RTL8367RB. There is also an RTL8367RB-VC which is rather different. Change the name of the CHIP_ID/_VER macros for reasons of consistency. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:46 -07:00
Jakub Kicinski	821c7733d2	Merge branch 'net-ipa-more-multi-channel-event-ring-work' Alex Elder says: ==================== net: ipa: more multi-channel event ring work This series makes a little more progress toward supporting multiple channels with a single event ring. The first removes the assumption that consecutive events are associated with the same RX channel. The second derives the channel associated with an event from the event itself, and the next does a small cleanup enabled by that. The fourth causes updates to occur for every event processed (rather once). And the final patch does a little more rework to make TX completion have more in common with RX completion. ==================== Link: https://lore.kernel.org/r/20220615165929.5924-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:08 -07:00
Alex Elder	81765eeac1	net: ipa: move more code out of gsi_channel_update() Move the processing done for TX channels in gsi_channel_update() into gsi_evt_ring_rx_update(). The called function is called for both RX and TX channels, so rename it to be gsi_evt_ring_update(). As a result, this code no longer assumes events in an event ring are associated with just one channel. Because all events in a ring are handled in that function, we can move the call to gsi_trans_move_complete() there, and can ring the event ring doorbell there as well after all new events in the ring have been processed. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:04 -07:00
Alex Elder	9f1c3ad654	net: ipa: call gsi_evt_ring_rx_update() unconditionally When an RX transaction completes, we update the trans->len field to contain the actual number of bytes received. This is done in a loop in gsi_evt_ring_rx_update(). Change that function so it checks the data transfer direction recorded in the transaction, and only updates trans->len for RX transfers. Then call it unconditionally. This means events for TX endpoints will run through the loop without otherwise doing anything, but this will change shortly. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:04 -07:00
Alex Elder	2f48fb0edc	net: ipa: pass GSI pointer to gsi_evt_ring_rx_update() The only reason the event ring's channel pointer is needed in gsi_evt_ring_rx_update() is so we can get at its GSI pointer. We can pass the GSI pointer as an argument, along with the event ring ID, and thereby avoid using the event ring channel pointer. This is another step toward no longer assuming an event ring services a single channel. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:03 -07:00
Alex Elder	8eec783195	net: ipa: don't pass channel when mapping transaction Change gsi_channel_trans_map() so it derives the channel used from the transaction. Pass the index of the first TRE used by the transaction, and have the called function account for the fact that the last one used is what's important. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:03 -07:00
Alex Elder	dd5a046cbb	net: ipa: don't assume one channel per event ring In gsi_evt_ring_rx_update(), use gsi_event_trans() repeatedly to find the transaction associated with an event, rather than assuming consecutive events are associated with the same channel. This removes the only caller of gsi_trans_pool_next(), so get rid of it. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:44:03 -07:00
Jakub Kicinski	6c0d09d937	Merge branch 'dt-bindings-dp83867-add-binding-for-io_impedance_ctrl-nvmem-cell' Rasmus Villemoes says: ==================== dt-bindings: dp83867: add binding for io_impedance_ctrl nvmem cell We have a board where measurements indicate that the current three options - leaving IO_IMPEDANCE_CTRL at the reset value (which is factory calibrated to a value corresponding to approximately 50 ohms) or using one of the two boolean properties to set it to the min/max value - are too coarse. This series adds a device tree binding for an nvmem cell which can be populated during production with a suitable value calibrated for each board, and corresponding support in the driver. The second patch adds a trivial phy wrapper for dev_err_probe(), used in the third. ==================== Link: https://lore.kernel.org/r/20220614084612.325229-1-linux@rasmusvillemoes.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:29:08 -07:00
Rasmus Villemoes	5c2d0a6a07	net: phy: dp83867: implement support for io_impedance_ctrl nvmem cell We have a board where measurements indicate that the current three options - leaving IO_IMPEDANCE_CTRL at the (factory calibrated) reset value or using one of the two boolean properties to set it to the min/max value - are too coarse. Implement support for the newly added binding allowing device tree to specify an nvmem cell containing an appropriate value for this specific board. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:29:06 -07:00
Rasmus Villemoes	a793679827	linux/phy.h: add phydev_err_probe() wrapper for dev_err_probe() The dev_err_probe() function is quite useful to avoid boilerplate related to -EPROBE_DEFER handling. Add a phydev_err_probe() helper to simplify making use of that from phy drivers which otherwise use the phydev_* helpers. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:29:06 -07:00
Rasmus Villemoes	ab1e9de84a	dt-bindings: dp83867: add binding for io_impedance_ctrl nvmem cell We have a board where measurements indicate that the current three options - leaving IO_IMPEDANCE_CTRL at the reset value (which is factory calibrated to a value corresponding to approximately 50 ohms) or using one of the two boolean properties to set it to the min/max value - are too coarse. There is no fixed mapping from register values to values in the range 35-70 ohms; it varies from chip to chip, and even that target range is approximate. So add a DT binding for an nvmem cell which can be populated during production with a value suitable for each specific board. Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:28:53 -07:00
Claudiu Manoil	9b7fd1670a	phy: aquantia: Fix AN when higher speeds than 1G are not advertised Even when the eth port is resticted to work with speeds not higher than 1G, and so the eth driver is requesting the phy (via phylink) to advertise up to 1000BASET support, the aquantia phy device is still advertising for 2.5G and 5G speeds. Clear these advertising defaults when requested. Cc: Ondrej Spacek <ondrej.spacek@nxp.com> Fixes: `09c4c57f7b` ("net: phy: aquantia: add support for auto-negotiation configuration") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20220610084037.7625-1-claudiu.manoil@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:25:55 -07:00
Jakub Kicinski	9cbc991126	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-16 20:13:52 -07:00
Alexei Starovoitov	a4a8b2eea4	Merge branch 'bpf: Fix cookie values for kprobe multi' Jiri Olsa says: ==================== hi, there's bug in kprobe_multi link that makes cookies misplaced when using symbols to attach. The reason is that we sort symbols by name but not adjacent cookie values. Current test did not find it because bpf_fentry_test* are already sorted by name. v3 changes: - fixed kprobe_multi bench test to filter out invalid entries from available_filter_functions v2 changes: - rebased on top of bpf/master - checking if cookies are defined later in swap function [Andrii] - added acks thanks, jirka ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 19:42:22 -07:00
Jiri Olsa	730067022c	selftest/bpf: Fix kprobe_multi bench test With [1] the available_filter_functions file contains records starting with __ftrace_invalid_address___ and marking disabled entries. We need to filter them out for the bench test to pass only resolvable symbols to kernel. [1] commit `b39181f7c6` ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function") Fixes: `b39181f7c6` ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220615112118.497303-5-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 19:42:21 -07:00
Jiri Olsa	eb5fb03256	bpf: Force cookies array to follow symbols sorting When user specifies symbols and cookies for kprobe_multi link interface it's very likely the cookies will be misplaced and returned to wrong functions (via get_attach_cookie helper). The reason is that to resolve the provided functions we sort them before passing them to ftrace_lookup_symbols, but we do not do the same sort on the cookie values. Fixing this by using sort_r function with custom swap callback that swaps cookie values as well. Fixes: `0236fec57a` ("bpf: Resolve symbols with ftrace_lookup_symbols for kprobe multi link") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220615112118.497303-4-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 19:42:21 -07:00
Jiri Olsa	eb1b2985fe	ftrace: Keep address offset in ftrace_lookup_symbols We want to store the resolved address on the same index as the symbol string, because that's the user (bpf kprobe link) code assumption. Also making sure we don't store duplicates that might be present in kallsyms. Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Fixes: `bed0d9a50d` ("ftrace: Add ftrace_lookup_symbols function") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220615112118.497303-3-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 19:42:21 -07:00
Jiri Olsa	ad8848535e	selftests/bpf: Shuffle cookies symbols in kprobe multi test There's a kernel bug that causes cookies to be misplaced and the reason we did not catch this with this test is that we provide bpf_fentry_test* functions already sorted by name. Shuffling function bpf_fentry_test2 deeper in the list and keeping the current cookie values as before will trigger the bug. The kernel fix is coming in following changes. Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220615112118.497303-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 19:42:21 -07:00
Alexei Starovoitov	88bf185813	Merge branch 'sleepable uprobe support' Delyan Kratunov says: ==================== This series implements support for sleepable uprobe programs. Key work is in patches 2 and 3, the rest is plumbing and tests. The main observation is that the only obstacle in the way of sleepable uprobe programs is not the uprobe infrastructure, which already runs in a user-like context, but the rcu usage around bpf_prog_array. Details are in patch 2 but the tl;dr is that we chain trace_tasks and normal rcu grace periods when releasing to array to accommodate users of either rcu type. This introduces latency for non-sleepable users (kprobe, tp) but that's deemed acceptable, given recent benchmarks by Andrii [1]. We're a couple of orders of magnitude under the rate of bpf_prog_array churn that would raise flags (~1MM/s per Paul). [1]: https://lore.kernel.org/bpf/CAEf4BzbpjN6ca7D9KOTiFPOoBYkciYvTz0UJNp5c-_3ptm=Mrg@mail.gmail.com/ v3 -> v4: * Fix kdoc and inline issues * Rebase v2 -> v3: * Inline uprobe_call_bpf into trace_uprobe.c, it's just a bpf_prog_run_array_sleepable call now. * Do not disable preemption for uprobe non-sleepable programs. * Add acks. v1 -> v2: * Fix lockdep annotations in bpf_prog_run_array_sleepable * Chain rcu grace periods only for perf_event-attached programs. This limits the additional latency on the free path to use cases where we know it won't be a problem. * Add tests calling helpers only available in sleepable programs. * Remove kprobe.s support from libbpf. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 19:27:37 -07:00
Delyan Kratunov	cb3f4a4a46	selftests/bpf: add tests for sleepable (uk)probes Add tests that ensure sleepable uprobe programs work correctly. Add tests that ensure sleepable kprobe programs cannot attach. Also add tests that attach both sleepable and non-sleepable uprobe programs to the same location (i.e. same bpf_prog_array). Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Delyan Kratunov <delyank@fb.com> Link: https://lore.kernel.org/r/c744e5bb7a5c0703f05444dc41f2522ba3579a48.1655248076.git.delyank@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 19:27:30 -07:00
Delyan Kratunov	c4cac71fc8	libbpf: add support for sleepable uprobe programs Add section mappings for u(ret)probe.s programs. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Delyan Kratunov <delyank@fb.com> Link: https://lore.kernel.org/r/aedbc3b74f3523f00010a7b0df8f3388cca59f16.1655248076.git.delyank@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 19:27:30 -07:00
Delyan Kratunov	64ad7556c7	bpf: allow sleepable uprobe programs to attach uprobe and kprobe programs have the same program type, KPROBE, which is currently not allowed to load sleepable programs. To avoid adding a new UPROBE type, instead allow sleepable KPROBE programs to load and defer the is-it-actually-a-uprobe-program check to attachment time, where there's already validation of the corresponding perf_event. A corollary of this patch is that you can now load a sleepable kprobe program but cannot attach it. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Delyan Kratunov <delyank@fb.com> Link: https://lore.kernel.org/r/fcd44a7cd204f372f6bb03ef794e829adeaef299.1655248076.git.delyank@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 19:27:29 -07:00
Delyan Kratunov	8c7dcb84e3	bpf: implement sleepable uprobes by chaining gps uprobes work by raising a trap, setting a task flag from within the interrupt handler, and processing the actual work for the uprobe on the way back to userspace. As a result, uprobe handlers already execute in a might_fault/_sleep context. The primary obstacle to sleepable bpf uprobe programs is therefore on the bpf side. Namely, the bpf_prog_array attached to the uprobe is protected by normal rcu. In order for uprobe bpf programs to become sleepable, it has to be protected by the tasks_trace rcu flavor instead (and kfree() called after a corresponding grace period). Therefore, the free path for bpf_prog_array now chains a tasks_trace and normal grace periods one after the other. Users who iterate under tasks_trace read section would be safe, as would users who iterate under normal read sections (from non-sleepable locations). The downside is that the tasks_trace latency affects all perf_event-attached bpf programs (and not just uprobe ones). This is deemed safe given the possible attach rates for kprobe/uprobe/tp programs. Separately, non-sleepable programs need access to dynamically sized rcu-protected maps, so bpf_run_prog_array_sleepables now conditionally takes an rcu read section, in addition to the overarching tasks_trace section. Signed-off-by: Delyan Kratunov <delyank@fb.com> Link: https://lore.kernel.org/r/ce844d62a2fd0443b08c5ab02e95bc7149f9aeb1.1655248076.git.delyank@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 19:27:29 -07:00
Delyan Kratunov	d687f621c5	bpf: move bpf_prog to bpf.h In order to add a version of bpf_prog_run_array which accesses the bpf_prog->aux member, bpf_prog needs to be more than a forward declaration inside bpf.h. Given that filter.h already includes bpf.h, this merely reorders the type declarations for filter.h users. bpf.h users now have access to bpf_prog internals. Signed-off-by: Delyan Kratunov <delyank@fb.com> Link: https://lore.kernel.org/r/3ed7824e3948f22d84583649ccac0ff0d38b6b58.1655248076.git.delyank@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-06-16 19:27:29 -07:00
Tyrel Datwyler	aeaadcde1a	scsi: ibmvfc: Store vhost pointer during subcrq allocation Currently the back pointer from a queue to the vhost adapter isn't set until after subcrq interrupt registration. The value is available when a queue is first allocated and can/should be also set for primary and async queues as well as subcrqs. This fixes a crash observed during kexec/kdump on Power 9 with legacy XICS interrupt controller where a pending subcrq interrupt from the previous kernel can be replayed immediately upon IRQ registration resulting in dereference of a garbage backpointer in ibmvfc_interrupt_scsi(). Kernel attempted to read user page (58) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000058 Faulting instruction address: 0xc008000003216a08 Oops: Kernel access of bad area, sig: 11 [#1] ... NIP [c008000003216a08] ibmvfc_interrupt_scsi+0x40/0xb0 [ibmvfc] LR [c0000000082079e8] __handle_irq_event_percpu+0x98/0x270 Call Trace: [c000000047fa3d80] [c0000000123e6180] 0xc0000000123e6180 (unreliable) [c000000047fa3df0] [c0000000082079e8] __handle_irq_event_percpu+0x98/0x270 [c000000047fa3ea0] [c000000008207d18] handle_irq_event+0x98/0x188 [c000000047fa3ef0] [c00000000820f564] handle_fasteoi_irq+0xc4/0x310 [c000000047fa3f40] [c000000008205c60] generic_handle_irq+0x50/0x80 [c000000047fa3f60] [c000000008015c40] __do_irq+0x70/0x1a0 [c000000047fa3f90] [c000000008016d7c] __do_IRQ+0x9c/0x130 [c000000014622f60] [0000000020000000] 0x20000000 [c000000014622ff0] [c000000008016e50] do_IRQ+0x40/0xa0 [c000000014623020] [c000000008017044] replay_soft_interrupts+0x194/0x2f0 [c000000014623210] [c0000000080172a8] arch_local_irq_restore+0x108/0x170 [c000000014623240] [c000000008eb1008] _raw_spin_unlock_irqrestore+0x58/0xb0 [c000000014623270] [c00000000820b12c] __setup_irq+0x49c/0x9f0 [c000000014623310] [c00000000820b7c0] request_threaded_irq+0x140/0x230 [c000000014623380] [c008000003212a50] ibmvfc_register_scsi_channel+0x1e8/0x2f0 [ibmvfc] [c000000014623450] [c008000003213d1c] ibmvfc_init_sub_crqs+0xc4/0x1f0 [ibmvfc] [c0000000146234d0] [c0080000032145a8] ibmvfc_reset_crq+0x150/0x210 [ibmvfc] [c000000014623550] [c0080000032147c8] ibmvfc_init_crq+0x160/0x280 [ibmvfc] [c0000000146235f0] [c00800000321a9cc] ibmvfc_probe+0x2a4/0x530 [ibmvfc] Link: https://lore.kernel.org/r/20220616191126.1281259-2-tyreld@linux.ibm.com Fixes: `3034ebe263` ("scsi: ibmvfc: Add alloc/dealloc routines for SCSI Sub-CRQ Channels") Cc: stable@vger.kernel.org Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-06-16 21:42:04 -04:00
Tyrel Datwyler	72ea7fe0db	scsi: ibmvfc: Allocate/free queue resource only during probe/remove Currently, the sub-queues and event pool resources are allocated/freed for every CRQ connection event such as reset and LPM. This exposes the driver to a couple issues. First the inefficiency of freeing and reallocating memory that can simply be resued after being sanitized. Further, a system under memory pressue runs the risk of allocation failures that could result in a crippled driver. Finally, there is a race window where command submission/compeletion can try to pull/return elements from/to an event pool that is being deleted or already has been deleted due to the lack of host state around freeing/allocating resources. The following is an example of list corruption following a live partition migration (LPM): Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: vfat fat isofs cdrom ext4 mbcache jbd2 nft_counter nft_compat nf_tables nfnetlink rpadlpar_io rpaphp xsk_diag nfsv3 nfs_acl nfs lockd grace fscache netfs rfkill bonding tls sunrpc pseries_rng drm drm_panel_orientation_quirks xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc scsi_transport_fc ibmveth vmx_crypto dm_multipath dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse CPU: 0 PID: 2108 Comm: ibmvfc_0 Kdump: loaded Not tainted 5.14.0-70.9.1.el9_0.ppc64le #1 NIP: c0000000007c4bb0 LR: c0000000007c4bac CTR: 00000000005b9a10 REGS: c00000025c10b760 TRAP: 0700 Not tainted (5.14.0-70.9.1.el9_0.ppc64le) MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 2800028f XER: 0000000f CFAR: c0000000001f55bc IRQMASK: 0 GPR00: c0000000007c4bac c00000025c10ba00 c000000002a47c00 000000000000004e GPR04: c0000031e3006f88 c0000031e308bd00 c00000025c10b768 0000000000000027 GPR08: 0000000000000000 c0000031e3009dc0 00000031e0eb0000 0000000000000000 GPR12: c0000031e2ffffa8 c000000002dd0000 c000000000187108 c00000020fcee2c0 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 c008000002f81300 GPR24: 5deadbeef0000100 5deadbeef0000122 c000000263ba6910 c00000024cc88000 GPR28: 000000000000003c c0000002430a0000 c0000002430ac300 000000000000c300 NIP [c0000000007c4bb0] __list_del_entry_valid+0x90/0x100 LR [c0000000007c4bac] __list_del_entry_valid+0x8c/0x100 Call Trace: [c00000025c10ba00] [c0000000007c4bac] __list_del_entry_valid+0x8c/0x100 (unreliable) [c00000025c10ba60] [c008000002f42284] ibmvfc_free_queue+0xec/0x210 [ibmvfc] [c00000025c10bb10] [c008000002f4246c] ibmvfc_deregister_scsi_channel+0xc4/0x160 [ibmvfc] [c00000025c10bba0] [c008000002f42580] ibmvfc_release_sub_crqs+0x78/0x130 [ibmvfc] [c00000025c10bc20] [c008000002f4f6cc] ibmvfc_do_work+0x5c4/0xc70 [ibmvfc] [c00000025c10bce0] [c008000002f4fdec] ibmvfc_work+0x74/0x1e8 [ibmvfc] [c00000025c10bda0] [c0000000001872b8] kthread+0x1b8/0x1c0 [c00000025c10be10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64 Instruction dump: 40820034 38600001 38210060 4e800020 7c0802a6 7c641b78 3c62fe7a 7d254b78 3863b590 f8010070 4ba309cd 60000000 <0fe00000> 7c0802a6 3c62fe7a 3863b640 ---[ end trace 11a2b65a92f8b66c ]--- ibmvfc 30000003: Send warning. Receive queue closed, will retry. Add registration/deregistration helpers that are called instead during connection resets to sanitize and reconfigure the queues. Link: https://lore.kernel.org/r/20220616191126.1281259-3-tyreld@linux.ibm.com Fixes: `3034ebe263` ("scsi: ibmvfc: Add alloc/dealloc routines for SCSI Sub-CRQ Channels") Cc: stable@vger.kernel.org Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-06-16 21:40:10 -04:00
Saurabh Sengar	1d3e098078	scsi: storvsc: Correct reporting of Hyper-V I/O size limits Current code is based on the idea that the max number of SGL entries also determines the max size of an I/O request. While this idea was true in older versions of the storvsc driver when SGL entry length was limited to 4 Kbytes, commit `3d9c3dcc58` ("scsi: storvsc: Enable scatterlist entry lengths > 4Kbytes") removed that limitation. It's now theoretically possible for the block layer to send requests that exceed the maximum size supported by Hyper-V. This problem doesn't currently happen in practice because the block layer defaults to a 512 Kbyte maximum, while Hyper-V in Azure supports 2 Mbyte I/O sizes. But some future configuration of Hyper-V could have a smaller max I/O size, and the block layer could exceed that max. Fix this by correctly setting max_sectors as well as sg_tablesize to reflect the maximum I/O size that Hyper-V reports. While allowing I/O sizes larger than the block layer default of 512 Kbytes doesn’t provide any noticeable performance benefit in the tests we ran, it's still appropriate to report the correct underlying Hyper-V capabilities to the Linux block layer. Also tweak the virt_boundary_mask to reflect that the required alignment derives from Hyper-V communication using a 4 Kbyte page size, and not on the guest page size, which might be bigger (eg. ARM64). Link: https://lore.kernel.org/r/1655190355-28722-1-git-send-email-ssengar@linux.microsoft.com Fixes: `3d9c3dcc58` ("scsi: storvsc: Enable scatter list entry lengths > 4Kbytes") Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-06-16 21:36:03 -04:00
Dave Airlie	65cf7c02cf	Merge tag 'exynos-drm-fixes-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes two regression fixups - Check a null pointer instead of IS_ERR(). - Rework initialization code of Exynos MIC driver. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Inki Dae <inki.dae@samsung.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220614141336.88614-1-inki.dae@samsung.com	2022-06-17 11:32:35 +10:00
Bart Van Assche	2acd76e7b8	scsi: ufs: Fix a race between the interrupt handler and the reset handler Prevent that both the interrupt handler and the reset handler try to complete a request at the same time. This patch is the result of an analysis of the following crash: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000120 CPU: 0 PID: 0 Comm: swapper/0 Tainted: G OE 5.10.107-android13-4-00051-g1e48e8970cca-ab8664745 #1 pc : ufshcd_release_scsi_cmd+0x30/0x46c lr : __ufshcd_transfer_req_compl+0x4fc/0x9c0 Call trace: ufshcd_release_scsi_cmd+0x30/0x46c __ufshcd_transfer_req_compl+0x4fc/0x9c0 ufshcd_poll+0xf0/0x208 ufshcd_sl_intr+0xb8/0xf0 ufshcd_intr+0x168/0x2f4 __handle_irq_event_percpu+0xa0/0x30c handle_irq_event+0x84/0x178 handle_fasteoi_irq+0x150/0x2e8 __handle_domain_irq+0x114/0x1e4 gic_handle_irq.31846+0x58/0x300 el1_irq+0xe4/0x1c0 cpuidle_enter_state+0x3ac/0x8c4 do_idle+0x2fc/0x55c cpu_startup_entry+0x84/0x90 kernel_init+0x0/0x310 start_kernel+0x0/0x608 start_kernel+0x4ec/0x608 Link: https://lore.kernel.org/r/20220613214442.212466-4-bvanassche@acm.org Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-06-16 21:32:09 -04:00
Bart Van Assche	d1a7644648	scsi: ufs: Support clearing multiple commands at once Modify ufshcd_clear_cmd() such that it supports clearing multiple commands at once instead of one command at a time. This change will be used in a later patch to reduce the time spent in the reset handler. Link: https://lore.kernel.org/r/20220613214442.212466-3-bvanassche@acm.org Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-06-16 21:32:09 -04:00
Bart Van Assche	da8badd7d3	scsi: ufs: Simplify ufshcd_clear_cmd() Remove the local variable 'err'. This patch does not change any functionality. Link: https://lore.kernel.org/r/20220613214442.212466-2-bvanassche@acm.org Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-06-16 21:32:09 -04:00
Dave Airlie	d08227a8b1	Merge tag 'amd-drm-fixes-5.19-2022-06-15' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-5.19-2022-06-15: amdgpu: - Fix regression in GTT size reporting - OLED backlight fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220615205609.28763-1-alexander.deucher@amd.com	2022-06-17 11:17:37 +10:00
Dave Airlie	3f0acf259a	Merge tag 'drm-intel-fixes-2022-06-16' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v5.19-rc3: - Fix page fault on error state read - Fix memory leaks in per-gt sysfs - Fix multiple fence handling - Remove accidental static from a local variable Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/8735g5xd25.fsf@intel.com	2022-06-17 10:24:42 +10:00
Mikulas Patocka	85e123c27d	dm mirror log: round up region bitmap size to BITS_PER_LONG The code in dm-log rounds up bitset_size to 32 bits. It then uses find_next_zero_bit_le on the allocated region. find_next_zero_bit_le accesses the bitmap using unsigned long pointers. So, on 64-bit architectures, it may access 4 bytes beyond the allocated size. Fix this bug by rounding up bitset_size to BITS_PER_LONG. This bug was found by running the lvm2 testsuite with kasan. Fixes: `29121bd0b0` ("[PATCH] dm mirror log: bitset_size fix") Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>	2022-06-16 19:39:29 -04:00
Mikulas Patocka	1ee88de395	dm: fix narrow race for REQ_NOWAIT bios being issued despite no support Starting with the commit 63a225c9fd20, device mapper has an optimization that it will take cheaper table lock (dm_get_live_table_fast instead of dm_get_live_table) if the bio has REQ_NOWAIT. The bios with REQ_NOWAIT must not block in the target request routine, if they did, we would be blocking while holding rcu_read_lock, which is prohibited. The targets that are suitable for REQ_NOWAIT optimization (and that don't block in the map routine) have the flag DM_TARGET_NOWAIT set. Device mapper will test if all the targets and all the devices in a table support nowait (see the function dm_table_supports_nowait) and it will set or clear the QUEUE_FLAG_NOWAIT flag on its request queue according to this check. There's a test in submit_bio_noacct: "if ((bio->bi_opf & REQ_NOWAIT) && !blk_queue_nowait(q)) goto not_supported" - this will make sure that REQ_NOWAIT bios can't enter a request queue that doesn't support them. This mechanism works to prevent REQ_NOWAIT bios from reaching dm targets that don't support the REQ_NOWAIT flag (and that may block in the map routine) - except that there is a small race condition: submit_bio_noacct checks if the queue has the QUEUE_FLAG_NOWAIT without holding any locks. Immediatelly after this check, the device mapper table may be reloaded with a table that doesn't support REQ_NOWAIT (for example, if we start moving the logical volume or if we activate a snapshot). However the REQ_NOWAIT bio that already passed the check in submit_bio_noacct would be sent to device mapper, where it could be redirected to a dm target that doesn't support REQ_NOWAIT - the result is sleeping while we hold rcu_read_lock. In order to fix this race, we double-check if the target supports REQ_NOWAIT while we hold the table lock (so that the table can't change under us). Fixes: `563a225c9f` ("dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>	2022-06-16 19:39:02 -04:00
Mikulas Patocka	5d7362d0d5	dm: fix use-after-free in dm_put_live_table_bio dm_put_live_table_bio is called from the end of dm_submit_bio. However, at this point, the bio may be already finished and the caller may have freed the bio. Consequently, dm_put_live_table_bio accesses the stale "bio" pointer. Fix this bug by loading the bi_opf value and passing it to dm_get_live_table_bio and dm_put_live_table_bio instead of the bio. This bug was found by running the lvm2 testsuite with kasan. Fixes: `563a225c9f` ("dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>	2022-06-16 19:38:49 -04:00

... 8 9 10 11 12 ...

1106036 Commits