linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-07 16:17:41 -04:00

Author	SHA1	Message	Date
Amit Cohen	80dfeafd64	mlxsw: spectrum_switchdev: Add support of QinQ traffic 802.1ad, also known as QinQ is an extension to the 802.1q standard, which is concerned with passing possibly 802.1q-tagged packets through another VLAN-like tunnel. The format of 802.1ad tag is the same as 802.1q, except it uses the EtherType of 0x88a8, unlike 802.1q's 0x8100. Add support for 802.1ad protocol. Most of the configuration is the same as 802.1q. The difference is that before a port joins an 802.1ad bridge it needs to be configured to recognize 802.1ad packets as tagged and other packets (e.g., 802.1q) as untagged. VXLAN is not currently supported with 802.1ad bridge, so return an error with an appropriate extack message. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:21:12 -08:00
Amit Cohen	773ce33a48	mlxsw: spectrum_switchdev: Create common functions for VLAN-aware bridge The code in mlxsw_sp_bridge_8021q_port_{join, leave}() can be used also for 802.1ad bridge. Move the code to functions called mlxsw_sp_bridge_vlan_aware_port_{join, leave}() and call them from mlxsw_sp_bridge_8021q_port_{join, leave}() respectively to enable code reuse. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:21:12 -08:00
Amit Cohen	3ae7a65b64	mlxsw: Make EtherType configurable when pushing VLAN at ingress Currently, when pushing a PVID at ingress, mlxsw always uses 802.1q EtherType. Make this EtherType configurable by extending mlxsw_sp_port_pvid_set() with an EtherType argument. This is a preparation for QinQ support, that needs to push a PVID with 802.1ad EtherType. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:21:12 -08:00
Amit Cohen	a2ef3ae158	mlxsw: spectrum: Only treat 802.1q packets as tagged packets By default, the device considers both 802.1ad and 802.1q packets as tagged, but this is not supported by the driver. It only supports VLAN and bridge devices that use 802.1q protocol. Instead, configure the device to only treat 802.1q packets as tagged packets. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:21:12 -08:00
Amit Cohen	2a5a290d6d	mlxsw: reg: Add et_vlan field to SPVID register et_vlan field is used to configure which EtherType is used when VLAN is pushed at ingress (for untagged packets or for QinQ push mode). It will be used to configure tagging with ether_type1 (i.e., 0x88A8) for QinQ mode. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:21:12 -08:00
Amit Cohen	7e9a6620d5	mlxsw: reg: Add Switch Port VLAN Classification Register SPVC configures the port to identify packets as untagged / single tagged / double tagged packets based on the packet EtherTypes. It will be used to classify 802.1q packets as untagged and 802.1ad packets as tagged when received by ports member in a 802.1ad bridge. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:21:12 -08:00
Jakub Kicinski	ac6e918554	Merge branch 'net-hns3-updates-for-next' Huazhong Tan says: ==================== net: hns3: updates for -next This series includes some updates for the HNS3 ethernet driver. #1~#6: add some updates related to the checksum offload. #7: add support for multiple TCs' MAC pauce mode. ==================== Link: https://lore.kernel.org/r/1606535510-44346-1-git-send-email-tanhuazhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:17:01 -08:00
Yonglong Liu	d78e5b6a67	net: hns3: keep MAC pause mode when multiple TCs are enabled Bellow HNAE3_DEVICE_VERSION_V3, MAC pause mode just support one TC, when enabled multiple TCs, force enable PFC mode. HNAE3_DEVICE_VERSION_V3 can support MAC pause mode on multiple TCs, so when enable multiple TCs, just keep MAC pause mode, and enable PFC mode just according to the user settings. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:16:31 -08:00
Huazhong Tan	ade36ccef1	net: hns3: add a check for devcie's verion in hns3_tunnel_csum_bug() For the device whose version is above V3(include V3), the hardware can do checksum offload for the non-tunnel udp packet, who has a dest port as the IANA assigned. So add a check for devcie's verion in hns3_tunnel_csum_bug(). Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:16:31 -08:00
Huazhong Tan	b1533ada74	net: hns3: add more info to hns3_dbg_bd_info() Since TX hardware checksum and RX completion checksum have been supported now, so add related information in hns3_dbg_bd_info(). Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:16:31 -08:00
Huazhong Tan	3e2816219d	net: hns3: add udp tunnel checksum segmentation support For the device who has the capability to handle udp tunnel checksum segmentation, add support for it. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:16:31 -08:00
Huazhong Tan	57e72c121c	net: hns3: remove unsupported NETIF_F_GSO_UDP_TUNNEL_CSUM Currently, device V1 and V2 do not support segmentation offload for UDP based tunnel packet who needs outer UDP checksum offload, so there is a workaround in the driver to set the checksum of the outer UDP checksum as zero. This is not what the user wants, so remove this feature for device V1 and V2, add support for it later(when the device has the ability to do that). Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:16:31 -08:00
Huazhong Tan	66d52f3bf3	net: hns3: add support for TX hardware checksum offload For the device that supports TX hardware checksum, the hardware can calculate the checksum from the start and fill the checksum to the offset position, which reduces the operations of calculating the type and header length of L3/L4. So add this feature for the HNS3 ethernet driver. The previous simple BD description is unsuitable, rename it as HW TX CSUM. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:16:31 -08:00
Huazhong Tan	4b2fe769aa	net: hns3: add support for RX completion checksum In some cases (for example ip fragment), hardware will calculate the checksum of whole packet in RX, and setup the HNS3_RXD_L2_CSUM_B flag in the descriptor, so add support to utilize this checksum. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:16:31 -08:00
Dominique Martinet	960f4f8a4e	fs: 9p: add generic splice_write file operation The default splice operations got removed recently, add it back to 9p with iter_file_splice_write like many other filesystems do. Link: http://lkml.kernel.org/r/1606837496-21717-1-git-send-email-asmadeus@codewreck.org Fixes: `36e2c7421f` ("fs: don't allow splice read/write without explicit ops") Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2020-12-01 21:40:47 +01:00
Linus Torvalds	f43691b59f	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull vhost fixes from Michael Tsirkin: "A couple of minor fixes" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost-vdpa: fix page pinning leakage in error path (rework) vringh: fix vringh_iov_push_*() documentation vhost scsi: fix lun reset completion handling	2020-12-01 12:11:09 -08:00
Randy Dunlap	14483cbf04	net: broadcom CNIC: requires MMU The CNIC kconfig symbol selects UIO and UIO depends on MMU. Since 'select' does not follow dependency chains, add the same MMU dependency to CNIC. Quietens this kconfig warning: WARNING: unmet direct dependencies detected for UIO Depends on [n]: MMU [=n] Selected by [m]: - CNIC [=m] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_BROADCOM [=y] && PCI [=y] && (IPV6 [=m] \|\| IPV6 [=m]=n) Fixes: `adfc5217e9` ("broadcom: Move the Broadcom drivers") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Rasesh Mody <rmody@marvell.com> Cc: GR-Linux-NIC-Dev@marvell.com Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Link: https://lore.kernel.org/r/20201129070843.3859-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 11:44:02 -08:00
Lukas Bulwahn	9e39394fae	net/ipv6: propagate user pointer annotation For IPV6_2292PKTOPTIONS, do_ipv6_getsockopt() stores the user pointer optval in the msg_control field of the msghdr. Hence, sparse rightfully warns at ./net/ipv6/ipv6_sockglue.c:1151:33: warning: incorrect type in assignment (different address spaces) expected void msg_control got char [noderef] __user optval Since commit `1f466e1f15` ("net: cleanly handle kernel vs user buffers for ->msg_control"), user pointers shall be stored in the msg_control_user field, and kernel pointers in the msg_control field. This allows to propagate __user annotations nicely through this struct. Store optval in msg_control_user to properly record and propagate the memory space annotation of this pointer. Note that msg_control_is_user is set to true, so the key invariant, i.e., use msg_control_user if and only if msg_control_is_user is true, holds. The msghdr is further used in the six alternative put_cmsg() calls, with msg_control_is_user being true, put_cmsg() picks msg_control_user preserving the __user annotation and passes that properly to copy_to_user(). No functional change. No change in object code. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20201127093421.21673-1-lukas.bulwahn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 11:42:33 -08:00
Marco Elver	fa69ee5aa4	net: switch to storing KCOV handle directly in sk_buff It turns out that usage of skb extensions can cause memory leaks. Ido Schimmel reported: "[...] there are instances that blindly overwrite 'skb->extensions' by invoking skb_copy_header() after __alloc_skb()." Therefore, give up on using skb extensions for KCOV handle, and instead directly store kcov_handle in sk_buff. Fixes: `6370cc3bbd` ("net: add kcov handle to skb extensions") Fixes: `85ce50d337` ("net: kcov: don't select SKB_EXTENSIONS when there is no NET") Fixes: `97f53a08cb` ("net: linux/skbuff.h: combine SKB_EXTENSIONS + KCOV handling") Link: https://lore.kernel.org/linux-wireless/20201121160941.GA485907@shredder.lan/ Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Marco Elver <elver@google.com> Link: https://lore.kernel.org/r/20201125224840.2014773-1-elver@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 11:26:19 -08:00
Vlad Buslov	0fca55ed98	net: sched: remove redundant 'rtnl_held' argument Functions tfilter_notify_chain() and tcf_get_next_proto() are always called with rtnl lock held in current implementation. Moreover, attempting to call them without rtnl lock would cause a warning down the call chain in function __tcf_get_next_proto() that requires the lock to be held by callers. Remove the 'rtnl_held' argument in order to simplify the code and make rtnl lock requirement explicit. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Link: https://lore.kernel.org/r/20201127151205.23492-1-vladbu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 11:24:56 -08:00
David S. Miller	de7b3f8164	Merge branch 'ibmvnic-Bug-fixes-for-queue-descriptor-processing' Thomas Falcon says: ==================== ibmvnic: Bug fixes for queue descriptor processing This series resolves a few issues in the ibmvnic driver's RX buffer and TX completion processing. The first patch includes memory barriers to synchronize queue descriptor reads. The second patch fixes a memory leak that could occur if the device returns a TX completion with an error code in the descriptor, in which case the respective socket buffer and other relevant data structures may not be freed or updated properly. v3: Correct length of Fixes tags, requested by Jakub Kicinski v2: Provide more detailed comments explaining specifically what reads are being ordered, suggested by Michael Ellerman ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-01 10:11:15 -08:00
Thomas Falcon	ba246c1751	ibmvnic: Fix TX completion error handling TX completions received with an error return code are not being processed properly. When an error code is seen, do not proceed to the next completion before cleaning up the existing entry's data structures. Fixes: `032c5e8284` ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-01 10:09:04 -08:00
Thomas Falcon	b71ec95223	ibmvnic: Ensure that SCRQ entry reads are correctly ordered Ensure that received Subordinate Command-Response Queue (SCRQ) entries are properly read in order by the driver. These queues are used in the ibmvnic device to process RX buffer and TX completion descriptors. dma_rmb barriers have been added after checking for a pending descriptor to ensure the correct descriptor entry is checked and after reading the SCRQ descriptor to ensure the entire descriptor is read before processing. Fixes: `032c5e8284` ("Driver for IBM System i/p VNIC protocol") Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-12-01 10:09:04 -08:00
Toke Høiland-Jørgensen	cf03f316ad	fs: 9p: add generic splice_read file operations The v9fs file operations were missing the splice_read operations, which breaks sendfile() of files on such a filesystem. I discovered this while trying to load an eBPF program using iproute2 inside a 'virtme' environment which uses 9pfs for the virtual file system. iproute2 relies on sendfile() with an AF_ALG socket to hash files, which was erroring out in the virtual environment. Since generic_file_splice_read() seems to just implement splice_read in terms of the read_iter operation, I simply added the generic implementation to the file operations, which fixed the error I was seeing. A quick grep indicates that this is what most other file systems do as well. Link: http://lkml.kernel.org/r/20201201135409.55510-1-toke@redhat.com Fixes: `36e2c7421f` ("fs: don't allow splice read/write without explicit ops") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2020-12-01 17:53:49 +01:00
Daniel Borkmann	ba0581749f	net, xdp, xsk: fix __sk_mark_napi_id_once napi_id error Stephen reported the following build error for !CONFIG_NET_RX_BUSY_POLL built kernels: In file included from fs/select.c:32: include/net/busy_poll.h: In function 'sk_mark_napi_id_once': include/net/busy_poll.h:150:36: error: 'const struct sk_buff' has no member named 'napi_id' 150 \| __sk_mark_napi_id_once_xdp(sk, skb->napi_id); \| ^~ Fix it by wrapping a CONFIG_NET_RX_BUSY_POLL around the helpers. Fixes: `b02e5a0ebb` ("xsk: Propagate napi_id to XDP socket Rx path") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/linux-next/20201201190746.7d3357fb@canb.auug.org.au	2020-12-01 15:51:19 +01:00
Masami Hiramatsu	05227490c5	docs: bootconfig: Add the endianness of fields Add a description about the endianness of the size and the checksum fields. Those must be stored as le32 instead of u32. This will allow us to apply bootconfig to the cross build initrd without caring the endianness. Link: https://lkml.kernel.org/r/160583936246.547349.10964204130590955409.stgit@devnote2 Reported-by: Steven Rostedt <rostedt@goodmis.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2020-11-30 23:22:11 -05:00
Masami Hiramatsu	e86843580d	tools/bootconfig: Store size and checksum in footer as le32 Store the size and the checksum fields in the footer as le32 instead of u32. This will allow us to apply bootconfig to the cross build initrd without caring the endianness. Link: https://lkml.kernel.org/r/160583935332.547349.5897811300636587426.stgit@devnote2 Reported-by: Steven Rostedt <rostedt@goodmis.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2020-11-30 23:22:11 -05:00
Masami Hiramatsu	24aed09451	bootconfig: Load size and checksum in the footer as le32 Load the size and the checksum fields in the footer as le32 instead of u32. This will allow us to apply bootconfig to the cross build initrd without caring the endianness. Link: https://lkml.kernel.org/r/160583934457.547349.10504070298990791074.stgit@devnote2 Reported-by: Steven Rostedt <rostedt@goodmis.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2020-11-30 23:22:11 -05:00
Steven Rostedt (VMware)	68e10d5ff5	ring-buffer: Always check to put back before stamp when crossing pages The current ring buffer logic checks to see if the updating of the event buffer was interrupted, and if it is, it will try to fix up the before stamp with the write stamp to make them equal again. This logic is flawed, because if it is not interrupted, the two are guaranteed to be different, as the current event just updated the before stamp before allocation. This guarantees that the next event (this one or another interrupting one) will think it interrupted the time updates of a previous event and inject an absolute time stamp to compensate. The correct logic is to always update the timestamps when traversing to a new sub buffer. Cc: stable@vger.kernel.org Fixes: `a389d86f7f` ("ring-buffer: Have nested events still record running time stamp") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2020-11-30 23:21:51 -05:00
Jakub Kicinski	237f977ab9	Merge tag 'linux-can-fixes-for-5.10-20201130' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2020-11-30 The first patch is by me an target the tcan4x5x bindings for the m_can driver. It fixes the error path in the tcan4x5x_can_probe() function. The next two patches are by Jeroen Hofstee and makes the lost of arbitration error counters of sja1000 and the sun4i drivers consistent with the other drivers. Zhang Qilong contributes two patch that clean up the error path in the c_can and kvaser_pciefd drivers. * tag 'linux-can-fixes-for-5.10-20201130' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: kvaser_pciefd: kvaser_pciefd_open(): fix error handling can: c_can: c_can_power_up(): fix error handling can: sun4i_can: sun4i_can_err(): don't count arbitration lose as an error can: sja1000: sja1000_err(): don't count arbitration lose as an error can: m_can: tcan4x5x_can_probe(): fix error path: remove erroneous clk_disable_unprepare() ==================== Link: https://lore.kernel.org/r/20201130125307.218258-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 19:03:49 -08:00
Jakub Kicinski	cb7fb043e6	Merge tag 'linux-can-next-for-5.11-20201130' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2020-11-30 Gustavo A. R. Silva's patch for the pcan_usb driver fixes fall-through warnings for Clang. The next 5 patches target the mcp251xfd driver and are by Ursula Maplehurst and me. They optimizie the TEF and RX path by reducing number of SPI core requests to set the UINC bit. The remaining 8 patches target the m_can driver. The first 4 are various cleanups for the SPI binding driver (tcan4x5x) by Sean Nyekjaer, Dan Murphy and me. Followed by 4 cleanup patches by me for the m_can and m_can_platform driver. * tag 'linux-can-next-for-5.11-20201130' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: can: m_can: m_can_class_unregister(): move right after m_can_class_register() can: m_can: m_can_plat_remove(): remove unneeded platform_set_drvdata() can: m_can: remove not used variable struct m_can_classdev::freq can: m_can: Kconfig: convert the into menu can: tcan4x5x: tcan4x5x_can_probe(): remove probe failed error message can: tcan4x5x: remove mram_start and reg_offset from struct tcan4x5x_priv can: tcan4x5x: rename parse_config() function can: tcan4x5x: tcan4x5x_clear_interrupts(): remove redundant return statement can: mcp251xfd: tef-path: reduce number of SPI core requests to set UINC bit can: mcp251xfd: move struct mcp251xfd_tef_ring definition can: mcp251xfd: struct mcp251xfd_priv::tef to array of length 1 can: mcp25xxfd: rx-path: reduce number of SPI core requests to set UINC bit can: mcp251xfd: mcp25xxfd_ring_alloc(): add define instead open coding the maximum number of RX objects can: pcan_usb_core: fix fall-through warnings for Clang ==================== Link: https://lore.kernel.org/r/20201130141432.278219-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 19:03:32 -08:00
Tom Rix	76810ed840	net: wan: remove trailing semicolon in macro definition The macro use will already have a semicolon. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20201127165734.2694693-1-trix@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 18:58:31 -08:00
Naveen N. Rao	49a962c075	ftrace: Fix DYNAMIC_FTRACE_WITH_DIRECT_CALLS dependency DYNAMIC_FTRACE_WITH_DIRECT_CALLS should depend on DYNAMIC_FTRACE_WITH_REGS since we need ftrace_regs_caller(). Link: https://lkml.kernel.org/r/fc4b257ea8689a36f086d2389a9ed989496ca63a.1606412433.git.naveen.n.rao@linux.vnet.ibm.com Cc: stable@vger.kernel.org Fixes: `763e34e74b` ("ftrace: Add register_ftrace_direct()") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2020-11-30 21:43:08 -05:00
Naveen N. Rao	4c75b0ff4e	ftrace: Fix updating FTRACE_FL_TRAMP On powerpc, kprobe-direct.tc triggered FTRACE_WARN_ON() in ftrace_get_addr_new() followed by the below message: Bad trampoline accounting at: 000000004222522f (wake_up_process+0xc/0x20) (f0000001) The set of steps leading to this involved: - modprobe ftrace-direct-too - enable_probe - modprobe ftrace-direct - rmmod ftrace-direct <-- trigger The problem turned out to be that we were not updating flags in the ftrace record properly. From the above message about the trampoline accounting being bad, it can be seen that the ftrace record still has FTRACE_FL_TRAMP set though ftrace-direct module is going away. This happens because we are checking if any ftrace_ops has the FTRACE_FL_TRAMP flag set _before_ updating the filter hash. The fix for this is to look for any _other_ ftrace_ops that also needs FTRACE_FL_TRAMP. Link: https://lkml.kernel.org/r/56c113aa9c3e10c19144a36d9684c7882bf09af5.1606412433.git.naveen.n.rao@linux.vnet.ibm.com Cc: stable@vger.kernel.org Fixes: `a124692b69` ("ftrace: Enable trampoline when rec count returns back to one") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2020-11-30 21:43:07 -05:00
Minchan Kim	8fa655a3a0	tracing: Fix alignment of static buffer With 5.9 kernel on ARM64, I found ftrace_dump output was broken but it had no problem with normal output "cat /sys/kernel/debug/tracing/trace". With investigation, it seems coping the data into temporal buffer seems to break the align binary printf expects if the static buffer is not aligned with 4-byte. IIUC, get_arg in bstr_printf expects that args has already right align to be decoded and seq_buf_bprintf says ``the arguments are saved in a 32bit word array that is defined by the format string constraints``. So if we don't keep the align under copy to temporal buffer, the output will be broken by shifting some bytes. This patch fixes it. Link: https://lkml.kernel.org/r/20201125225654.1618966-1-minchan@kernel.org Cc: <stable@vger.kernel.org> Fixes: `8e99cf91b9` ("tracing: Do not allocate buffer in trace_find_next_entry() in atomic") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2020-11-30 21:43:07 -05:00
Vasily Averin	310e3a4b5a	tracing: Remove WARN_ON in start_thread() This patch reverts commit `978defee11` ("tracing: Do a WARN_ON() if start_thread() in hwlat is called when thread exists") .start hook can be legally called several times if according tracer is stopped screen window 1 [root@localhost ~]# echo 1 > /sys/kernel/tracing/events/kmem/kfree/enable [root@localhost ~]# echo 1 > /sys/kernel/tracing/options/pause-on-trace [root@localhost ~]# less -F /sys/kernel/tracing/trace screen window 2 [root@localhost ~]# cat /sys/kernel/debug/tracing/tracing_on 0 [root@localhost ~]# echo hwlat > /sys/kernel/debug/tracing/current_tracer [root@localhost ~]# echo 1 > /sys/kernel/debug/tracing/tracing_on [root@localhost ~]# cat /sys/kernel/debug/tracing/tracing_on 0 [root@localhost ~]# echo 2 > /sys/kernel/debug/tracing/tracing_on triggers warning in dmesg: WARNING: CPU: 3 PID: 1403 at kernel/trace/trace_hwlat.c:371 hwlat_tracer_start+0xc9/0xd0 Link: https://lkml.kernel.org/r/bd4d3e70-400d-9c82-7b73-a2d695e86b58@virtuozzo.com Cc: Ingo Molnar <mingo@redhat.com> Cc: stable@vger.kernel.org Fixes: `978defee11` ("tracing: Do a WARN_ON() if start_thread() in hwlat is called when thread exists") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2020-11-30 21:43:07 -05:00
Sami Tolvanen	983df5f269	samples/ftrace: Mark my_tramp[12]? global my_tramp[12]? are declared as global functions in C, but they are not marked global in the inline assembly definition. This mismatch confuses Clang's Control-Flow Integrity checking. Fix the definitions by adding .globl. Link: https://lkml.kernel.org/r/20201113183414.1446671-1-samitolvanen@google.com Fixes: `9d907f1ae8` ("ftrace/samples: Add a sample module that implements modify_ftrace_direct()") Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2020-11-30 21:42:48 -05:00
Sven Eckelmann	a5e74021e8	vxlan: Copy needed_tailroom from lowerdev While vxlan doesn't need any extra tailroom, the lowerdev might need it. In that case, copy it over to reduce the chance for additional (re)allocations in the transmit path. Signed-off-by: Sven Eckelmann <sven@narfation.org> Link: https://lore.kernel.org/r/20201126125247.1047977-2-sven@narfation.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 18:10:12 -08:00
Sven Eckelmann	0a35dc41fe	vxlan: Add needed_headroom for lower device It was observed that sending data via batadv over vxlan (on top of wireguard) reduced the performance massively compared to raw ethernet or batadv on raw ethernet. A check of perf data showed that the vxlan_build_skb was calling all the time pskb_expand_head to allocate enough headroom for: min_headroom = LL_RESERVED_SPACE(dst->dev) + dst->header_len + VXLAN_HLEN + iphdr_len; But the vxlan_config_apply only requested needed headroom for: lowerdev->hard_header_len + VXLAN6_HEADROOM or VXLAN_HEADROOM So it completely ignored the needed_headroom of the lower device. The first caller of net_dev_xmit could therefore never make sure that enough headroom was allocated for the rest of the transmit path. Cc: Annika Wickert <annika.wickert@exaring.de> Signed-off-by: Sven Eckelmann <sven@narfation.org> Tested-by: Annika Wickert <aw@awlnx.space> Link: https://lore.kernel.org/r/20201126125247.1047977-1-sven@narfation.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 18:10:12 -08:00
Jakub Kicinski	5f3e915c36	Merge branch 'mptcp-avoid-workqueue-usage-for-data' Paolo Abeni says: ==================== mptcp: avoid workqueue usage for data The current locking schema used to protect the MPTCP data-path requires the usage of the MPTCP workqueue to process the incoming data, depending on trylock result. The above poses scalability limits and introduces random delays in MPTCP-level acks. With this series we use a single spinlock to protect the MPTCP data-path, removing the need for workqueue and delayed ack usage. This additionally reduces the number of atomic operations required per packet and cleans-up considerably the poll/wake-up code. ==================== Link: https://lore.kernel.org/r/cover.1606413118.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 17:55:26 -08:00
Paolo Abeni	6e628cd3a8	mptcp: use mptcp release_cb for delayed tasks We have some tasks triggered by the subflow receive path which require to access the msk socket status, specifically: mptcp_clean_una() and mptcp_push_pending() We have almost everything in place to defer to the msk release_cb such tasks when the msk sock is owned. Since the worker is no more used to clean the acked data, for fallback sockets we need to explicitly flush them. As an added bonus we can move the wake-up code in __mptcp_clean_una(), simplify a lot mptcp_poll() and move the timer update under the data lock. The worker is now used only to process and send DATA_FIN packets and do the mptcp-level retransmissions. Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 17:55:23 -08:00
Paolo Abeni	7439d687b7	mptcp: avoid a few atomic ops in the rx path Extending the data_lock scope in mptcp_incoming_option we can use that to protect both snd_una and wnd_end. In the typical case, we will have a single atomic op instead of 2 Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 17:55:23 -08:00
Paolo Abeni	724cfd2ee8	mptcp: allocate TX skbs in msk context Move the TX skbs allocation in mptcp_sendmsg() scope, and tentatively pre-allocate a skbs number proportional to the sendmsg() length. Use the ssk tx skb cache to prevent the subflow allocation. This allows removing the msk skb extension cache and will make possible the later patches. Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 17:55:23 -08:00
Paolo Abeni	879526030c	mptcp: protect the rx path with the msk socket spinlock Such spinlock is currently used only to protect the 'owned' flag inside the socket lock itself. With this patch, we extend its scope to protect the whole msk receive path and sk_forward_memory. Given the above, we can always move data into the msk receive queue (and OoO queue) from the subflow. We leverage the previous commit, so that we need to acquire the spinlock in the tx path only when moving fwd memory. recvmsg() must now explicitly acquire the socket spinlock when moving skbs out of sk_receive_queue. To reduce the number of lock operations required we use a second rx queue and splice the first into the latter in mptcp_lock_sock(). Additionally rmem allocated memory is bulk-freed via release_cb() Acked-by: Florian Westphal <fw@strlen.de> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 17:55:23 -08:00
Paolo Abeni	e93da92896	mptcp: implement wmem reservation This leverages the previous commit to reserve the wmem required for the sendmsg() operation when the msk socket lock is first acquired. Some heuristics are used to get a reasonable [over] estimation of the whole memory required. If we can't forward alloc such amount fallback to a reasonable small chunk, otherwise enter the wait for memory path. When sendmsg() needs more memory it looks at wmem_reserved first and if that is exhausted, move more space from sk_forward_alloc. The reserved memory is not persistent and is released at the next socket unlock via the release_cb(). Overall this will simplify the next patch. Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 17:55:23 -08:00
Paolo Abeni	ad80b0fc6e	mptcp: open code mptcp variant for lock_sock This allows invoking an additional callback under the socket spin lock. Will be used by the next patches to avoid additional spin lock contention. Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 17:55:23 -08:00
Vinay Kumar Yadav	e3d5e971d2	chelsio/chtls: fix panic during unload reload chtls there is kernel panic in inet_twsk_free() while chtls module unload when socket is in TIME_WAIT state because sk_prot_creator was not preserved on connection socket. Fixes: `cc35c88ae4` ("crypto : chtls - CPL handler definition") Signed-off-by: Udai Sharma <udai.sharma@chelsio.com> Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com> Link: https://lore.kernel.org/r/20201125214913.16938-1-vinay.yadav@chelsio.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 17:36:19 -08:00
Jakub Kicinski	be5724240b	Merge branch 'dpaa_eth-add-xdp-support' Camelia Groza says: ==================== dpaa_eth: add XDP support Enable XDP support for the QorIQ DPAA1 platforms. Implement all the current actions (DROP, ABORTED, PASS, TX, REDIRECT). No Tx batching is added at this time. Additional XDP_PACKET_HEADROOM bytes are reserved in each frame's headroom. After transmit, a reference to the xdp_frame is saved in the buffer for clean-up on confirmation in a newly created structure for software annotations. DPAA_TX_PRIV_DATA_SIZE bytes are reserved in the buffer for storing this structure and the XDP program is restricted from accessing them. The driver shares the egress frame queues used for XDP with the network stack. The DPAA driver is a LLTX driver so no explicit locking is required on transmission. ==================== Link: https://lore.kernel.org/r/cover.1606322126.git.camelia.groza@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 17:33:24 -08:00
Camelia Groza	ae680bcbd0	dpaa_eth: implement the A050385 erratum workaround for XDP For XDP TX, even tough we start out with correctly aligned buffers, the XDP program might change the data's alignment. For REDIRECT, we have no control over the alignment either. Create a new workaround for xdp_frame structures to verify the erratum conditions and move the data to a fresh buffer if necessary. Create a new xdp_frame for managing the new buffer and free the old one using the XDP API. Due to alignment constraints, all frames have a 256 byte headroom that is offered fully to XDP under the erratum. If the XDP program uses all of it, the data needs to be move to make room for the xdpf backpointer. Disable the metadata support since the information can be lost. Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: Camelia Groza <camelia.groza@nxp.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 17:33:23 -08:00
Camelia Groza	d7af04486d	dpaa_eth: rename current skb A050385 erratum workaround Explicitly point that the current workaround addresses skbs. This change is in preparation for adding a workaround for XDP scenarios. Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: Camelia Groza <camelia.groza@nxp.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-30 17:33:23 -08:00

... 4 5 6 7 8 ...

969790 Commits