linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-01-03 15:54:42 -05:00

Author	SHA1	Message	Date
Paolo Abeni	5a369ca643	tcp: propagate MPTCP skb extensions on xmit splits When the TCP stack splits a packet on the write queue, the tail half currently lose the associated skb extensions, and will not carry the DSM on the wire. The above does not cause functional problems and is allowed by the RFC, but interact badly with GRO and RX coalescing, as possible candidates for aggregation will carry different TCP options. This change tries to improve the MPTCP behavior, propagating the skb extensions on split. Additionally, we must prevent the MPTCP stack from updating the mapping after the split occur: that will both violate the RFC and fool the reader. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-04 17:45:53 -08:00
Sebastian Andrzej Siewior	f84754dbc5	soc/fsl/qbman: Add an argument to signal if NAPI processing is required. dpaa_eth_napi_schedule() and caam_qi_napi_schedule() schedule NAPI if invoked from: - Hard interrupt context - Any context which is not serving soft interrupts Any context which is not serving soft interrupts includes hard interrupts so the in_irq() check is redundant. caam_qi_napi_schedule() has a comment about this: /* * In case of threaded ISR, for RT kernels in_irq() does not return * appropriate value, so use in_serving_softirq to distinguish between * softirq and irq contexts. / if (in_irq() \|\| !in_serving_softirq()) This has nothing to do with RT. Even on a non RT kernel force threaded interrupts run obviously in thread context and therefore in_irq() returns false when invoked from the handler. The extension of the in_irq() check with !in_serving_softirq() was there when the drivers were added, but in the out of tree FSL BSP the original condition was in_irq() which got extended due to failures on RT. The usage of in_xxx() in drivers is phased out and Linus clearly requested that code which changes behaviour depending on context should either be separated or the context be conveyed in an argument passed by the caller, which usually knows the context. Right he is, the above construct is clearly showing why. The following callchains have been analyzed to end up in dpaa_eth_napi_schedule(): qman_p_poll_dqrr() __poll_portal_fast() fq->cb.dqrr() dpaa_eth_napi_schedule() portal_isr() __poll_portal_fast() fq->cb.dqrr() dpaa_eth_napi_schedule() Both need to schedule NAPI. The crypto part has another code path leading up to this: kill_fq() empty_retired_fq() qman_p_poll_dqrr() __poll_portal_fast() fq->cb.dqrr() dpaa_eth_napi_schedule() kill_fq() is called from task context and ends up scheduling NAPI, but that's pointless and an unintended side effect of the !in_serving_softirq() check. The code path: caam_qi_poll() -> qman_p_poll_dqrr() is invoked from NAPI and I assume* from crypto's NAPI device and not from qbman's NAPI device. I guess it is okay to skip scheduling NAPI (because this is what happens now) but could be changed if it is wrong due to `budget' handling. Add an argument to __poll_portal_fast() which is true if NAPI needs to be scheduled. This requires propagating the value to the caller including `qman_cb_dqrr' typedef which is used by the dpaa and the crypto driver. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Aymen Sghaier <aymen.sghaier@nxp.com> Cc: Herbert XS <herbert@gondor.apana.org.au> Cc: Li Yang <leoyang.li@nxp.com> Reviewed-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Tested-by: Camelia Groza <camelia.groza@nxp.com>	2020-11-03 17:41:03 -08:00
Alexander Lobakin	2e4ef10f58	net: add GSO UDP L4 and GSO fraglists to the list of software-backed types Commit `e20cf8d3f1` ("udp: implement GRO for plain UDP sockets.") and commit `9fd1ff5d2a` ("udp: Support UDP fraglist GRO/GSO.") made UDP L4 and fraglisted GRO/GSO fully supported by the software fallback mode. We can safely add them to NETIF_F_GSO_SOFTWARE to allow logical/virtual netdevs to forward these types of skbs up to the real drivers. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-03 16:53:55 -08:00
Guillaume Nault	0992d67bc2	mpls: drop skb's dst in mpls_forward() Commit `394de110a7` ("net: Added pointer check for dst->ops->neigh_lookup in dst_neigh_lookup_skb") added a test in dst_neigh_lookup_skb() to avoid a NULL pointer dereference. The root cause was the MPLS forwarding code, which doesn't call skb_dst_drop() on incoming packets. That is, if the packet is received from a collect_md device, it has a metadata_dst attached to it that doesn't implement any dst_ops function. To align the MPLS behaviour with IPv4 and IPv6, let's drop the dst in mpls_forward(). This way, dst_neigh_lookup_skb() doesn't need to test ->neigh_lookup any more. Let's keep a WARN condition though, to document the precondition and to ease detection of such problems in the future. Signed-off-by: Guillaume Nault <gnault@redhat.com> Link: https://lore.kernel.org/r/f8c2784c13faa54469a2aac339470b1049ca6b63.1604102750.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-03 12:55:53 -08:00
Aleksandr Nogikh	6370cc3bbd	net: add kcov handle to skb extensions Remote KCOV coverage collection enables coverage-guided fuzzing of the code that is not reachable during normal system call execution. It is especially helpful for fuzzing networking subsystems, where it is common to perform packet handling in separate work queues even for the packets that originated directly from the user space. Enable coverage-guided frame injection by adding kcov remote handle to skb extensions. Default initialization in __alloc_skb and __build_skb_around ensures that no socket buffer that was generated during a system call will be missed. Code that is of interest and that performs packet processing should be annotated with kcov_remote_start()/kcov_remote_stop(). An alternative approach is to determine kcov_handle solely on the basis of the device/interface that received the specific socket buffer. However, in this case it would be impossible to distinguish between packets that originated during normal background network processes or were intentionally injected from the user space. Signed-off-by: Aleksandr Nogikh <nogikh@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-02 18:01:34 -08:00
Yuchung Cheng	7e901ee7b6	tcp: avoid slow start during fast recovery on new losses During TCP fast recovery, the congestion control in charge is by default the Proportional Rate Reduction (PRR) unless the congestion control module specified otherwise (e.g. BBR). Previously when tcp_packets_in_flight() is below snd_ssthresh PRR would slow start upon receiving an ACK that 1) cumulatively acknowledges retransmitted data and 2) does not detect further lost retransmission Such conditions indicate the repair is in good steady progress after the first round trip of recovery. Otherwise PRR adopts the packet conservation principle to send only the amount that was newly delivered (indicated by this ACK). This patch generalizes the previous design principle to include also the newly sent data beside retransmission: as long as the delivery is making good progress, both retransmission and new data should be accounted to make PRR more cautious in slow starting. Suggested-by: Matt Mathis <mattmathis@google.com> Suggested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201031013412.1973112-1-ycheng@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-02 17:17:40 -08:00
Vladimir Oltean	2f0402fedf	net: mscc: ocelot: deny changing the native VLAN from the prepare phase Put the preparation phase of switchdev VLAN objects to some good use, and move the check we already had, for preventing the existence of more than one egress-untagged VLAN per port, to the preparation phase of the addition. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-02 17:09:07 -08:00
Vladimir Oltean	e2b2e83e52	net: mscc: ocelot: add a "valid" boolean to struct ocelot_vlan Currently we are checking in some places whether the port has a native VLAN on egress or not, by comparing the ocelot_port->vid value with zero. That works, because VID 0 can never be a native VLAN configured by the bridge, but now we want to make similar checks for the pvid. That won't work, because there are cases when we do have the pvid set to 0 (not by the bridge, by ourselves, but still.. it's confusing). And we can't encode a negative value into an u16, so add a bool to the structure. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-02 17:09:06 -08:00
Vladimir Oltean	c3e58a750e	net: mscc: ocelot: transform the pvid and native vlan values into a structure This is a mechanical patch only. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-02 17:09:06 -08:00
Heiner Kallweit	81b01894d7	net: core: add devm_netdev_alloc_pcpu_stats We have netdev_alloc_pcpu_stats(), and we have devm_alloc_percpu(). Add a managed version of netdev_alloc_pcpu_stats, e.g. for allocating the per-cpu stats in the probe() callback of a driver. It needs to be a macro for dealing properly with the type argument. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-31 10:22:49 -07:00
Heiner Kallweit	d3fd65484c	net: core: add dev_sw_netstats_tx_add Add dev_sw_netstats_tx_add(), complementing already existing dev_sw_netstats_rx_add(). Other than dev_sw_netstats_rx_add allow to pass the number of packets as function argument. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-31 10:21:31 -07:00
Vladimir Oltean	e5d1f896fd	net: mscc: ocelot: support L2 multicast entries There is one main difference in mscc_ocelot between IP multicast and L2 multicast. With IP multicast, destination ports are encoded into the upper bytes of the multicast MAC address. Example: to deliver the address 01:00:5E:11:22:33 to ports 3, 8, and 9, one would need to program the address of 00:03:08:11:22:33 into hardware. Whereas for L2 multicast, the MAC table entry points to a Port Group ID (PGID), and that PGID contains the port mask that the packet will be forwarded to. As to why it is this way, no clue. My guess is that not all port combinations can be supported simultaneously with the limited number of PGIDs, and this was somehow an issue for IP multicast but not for L2 multicast. Anyway. Prior to this change, the raw L2 multicast code was bogus, due to the fact that there wasn't really any way to test it using the bridge code. There were 2 issues: - A multicast PGID was allocated for each MDB entry, but it wasn't in fact programmed to hardware. It was dummy. - In fact we don't want to reserve a multicast PGID for every single MDB entry. That would be odd because we can only have ~60 PGIDs, but thousands of MDB entries. So instead, we want to reserve a multicast PGID for every single port combination for multicast traffic. And since we can have 2 (or more) MDB entries delivered to the same port group (and therefore PGID), we need to reference-count the PGIDs. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-30 18:25:56 -07:00
Nikolay Aleksandrov	955062b03f	net: bridge: mcast: add support for raw L2 multicast groups Extend the bridge multicast control and data path to configure routes for L2 (non-IP) multicast groups. The uapi struct br_mdb_entry union u is extended with another variant, mac_addr, which does not change the structure size, and which is valid when the proto field is zero. To be compatible with the forwarding code that is already in place, which acts as an IGMP/MLD snooping bridge with querier capabilities, we need to declare that for L2 MDB entries (for which there exists no such thing as IGMP/MLD snooping/querying), that there is always a querier. Otherwise, these entries would be flooded to all bridge ports and not just to those that are members of the L2 multicast group. Needless to say, only permanent L2 multicast groups can be installed on a bridge port. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20201028233831.610076-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-30 17:49:19 -07:00
Robert Hancock	1887023a5e	net: phy: marvell: add special handling of Finisar modules with 88E1111 The Finisar FCLF8520P2BTL 1000BaseT SFP module uses a Marvel 88E1111 PHY with a modified PHY ID. Add support for this ID using the 88E1111 methods. By default these modules do not have 1000BaseX auto-negotiation enabled, which is not generally desirable with Linux networking drivers. Add handling to enable 1000BaseX auto-negotiation when these modules are used in 1000BaseX mode. Also, some special handling is required to ensure that 1000BaseT auto-negotiation is enabled properly when desired. Based on existing handling in the AMD xgbe driver and the information in the Finisar FAQ: https://www.finisar.com/sites/default/files/resources/an-2036_1000base-t_sfp_faqreve1.pdf Signed-off-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20201028171540.1700032-1-robert.hancock@calian.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-30 17:11:44 -07:00
Xin Long	e38d86b354	sctp: add the error cause for new encapsulation port restart This patch is to add the function to make the abort chunk with the error cause for new encapsulation port restart, defined on Section 4.4 in draft-tuexen-tsvwg-sctp-udp-encaps-cons-03. v1->v2: - no change. v2->v3: - no need to call htons() when setting nep.cur_port/new_port. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-30 15:24:40 -07:00
Xin Long	f1bfe8b541	sctp: add udphdr to overhead when udp_port is set sctp_mtu_payload() is for calculating the frag size before making chunks from a msg. So we should only add udphdr size to overhead when udp socks are listening, as only then sctp can handle the incoming sctp over udp packets and outgoing sctp over udp packets will be possible. Note that we can't do this according to transport->encap_port, as different transports may be set to different values, while the chunks were made before choosing the transport, we could not be able to meet all rfc6951#section-5.6 recommends. v1->v2: - Add udp_port for sctp_sock to avoid a potential race issue, it will be used in xmit path in the next patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-30 15:24:21 -07:00
Xin Long	a1dd2cf2f1	sctp: allow changing transport encap_port by peer packets As rfc6951#section-5.4 says: "After finding the SCTP association (which includes checking the verification tag), the UDP source port MUST be stored as the encapsulation port for the destination address the SCTP packet is received from (see Section 5.1). When a non-encapsulated SCTP packet is received by the SCTP stack, the encapsulation of outgoing packets belonging to the same association and the corresponding destination address MUST be disabled." transport encap_port should be updated by a validated incoming packet's udp src port. We save the udp src port in sctp_input_cb->encap_port, and then update the transport in two places: 1. right after vtag is verified, which is required by RFC, and this allows the existent transports to be updated by the chunks that can only be processed on an asoc. 2. right before processing the 'init' where the transports are added, and this allows building a sctp over udp connection by client with the server not knowing the remote encap port. 3. when processing ootb_pkt and creating the temporary transport for the reply pkt. Note that sctp_input_cb->header is removed, as it's not used any more in sctp. v1->v2: - Change encap_port as __be16 for sctp_input_cb. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-30 15:24:16 -07:00
Xin Long	8dba29603b	sctp: add SCTP_REMOTE_UDP_ENCAPS_PORT sockopt This patch is to implement: rfc6951#section-6.1: Get or Set the Remote UDP Encapsulation Port Number with the param of the struct: struct sctp_udpencaps { sctp_assoc_t sue_assoc_id; struct sockaddr_storage sue_address; uint16_t sue_port; }; the encap_port of sock, assoc or transport can be changed by users, which also means it allows the different transports of the same asoc to have different encap_port value. v1->v2: - no change. v2->v3: - fix the endian warning when setting values between encap_port and sue_port. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-30 15:24:11 -07:00
Xin Long	e8a3001c21	sctp: add encap_port for netns sock asoc and transport encap_port is added as per netns/sock/assoc/transport, and the latter one's encap_port inherits the former one's by default. The transport's encap_port value would mostly decide if one packet should go out with udp encapsulated or not. This patch also allows users to set netns' encap_port by sysctl. v1->v2: - Change to define encap_port as __be16 for sctp_sock, asoc and transport. v2->v3: - No change. v3->v4: - Add 'encap_port' entry in ip-sysctl.rst. v4->v5: - Improve the description of encap_port in ip-sysctl.rst. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-30 15:24:06 -07:00
Xin Long	9d6ba260a0	sctp: create udp6 sock and set its encap_rcv This patch is to add the udp6 sock part in sctp_udp_sock_start/stop(). udp_conf.use_udp6_rx_checksums is set to true, as: "The SCTP checksum MUST be computed for IPv4 and IPv6, and the UDP checksum SHOULD be computed for IPv4 and IPv6" says in rfc6951#section-5.3. v1->v2: - Add pr_err() when fails to create udp v6 sock. - Add #if IS_ENABLED(CONFIG_IPV6) not to create v6 sock when ipv6 is disabled. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-30 15:23:56 -07:00
Xin Long	965ae44412	sctp: create udp4 sock and add its encap_rcv This patch is to add the functions to create/release udp4 sock, and set the sock's encap_rcv to process the incoming udp encap sctp packets. In sctp_udp_rcv(), as we can see, all we need to do is fix the transport header for sctp_rcv(), then it would implement the part of rfc6951#section-5.4: "When an encapsulated packet is received, the UDP header is removed. Then, the generic lookup is performed, as done by an SCTP stack whenever a packet is received, to find the association for the received SCTP packet" Note that these functions will be called in the last patch of this patchset when enabling this feature. v1->v2: - Add pr_err() when fails to create udp v4 sock. v2->v3: - Add 'select NET_UDP_TUNNEL' in sctp Kconfig. v3->v4: - No change. v4->v5: - Change to set udp_port to 0 by default. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-30 15:23:52 -07:00
Jakub Kicinski	8911097fbf	Merge tag 'wimax-staging' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Arnd Bergmann says: ==================== wimax: move to staging After I sent a fix for what appeared to be a harmless warning in the wimax user interface code, the conclusion was that the whole thing has most likely not been used in a very long time, and the user interface possibly been broken since `b61a5eea59` ("wimax: use genl_register_family_with_ops()"). Using a shared branch between net-next and staging should help coordinate patches getting submitted against it. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-30 09:08:33 -07:00
Henrik Bjoernlund	e77824d81d	bridge: cfm: Netlink GET status Interface. This is the implementation of CFM netlink status get information interface. Add new nested netlink attributes. These attributes are used by the user space to get status information. GETLINK: Request filter RTEXT_FILTER_CFM_STATUS: Indicating that CFM status information must be delivered. IFLA_BRIDGE_CFM: Points to the CFM information. IFLA_BRIDGE_CFM_MEP_STATUS_INFO: This indicate that the MEP instance status are following. IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO: This indicate that the peer MEP status are following. CFM nested attribute has the following attributes in next level. GETLINK RTEXT_FILTER_CFM_STATUS: IFLA_BRIDGE_CFM_MEP_STATUS_INSTANCE: The MEP instance number of the delivered status. The type is u32. IFLA_BRIDGE_CFM_MEP_STATUS_OPCODE_UNEXP_SEEN: The MEP instance received CFM PDU with unexpected Opcode. The type is u32 (bool). IFLA_BRIDGE_CFM_MEP_STATUS_VERSION_UNEXP_SEEN: The MEP instance received CFM PDU with unexpected version. The type is u32 (bool). IFLA_BRIDGE_CFM_MEP_STATUS_RX_LEVEL_LOW_SEEN: The MEP instance received CCM PDU with MD level lower than configured level. This frame is discarded. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_PEER_STATUS_INSTANCE: The MEP instance number of the delivered status. The type is u32. IFLA_BRIDGE_CFM_CC_PEER_STATUS_PEER_MEPID: The added Peer MEP ID of the delivered status. The type is u32. IFLA_BRIDGE_CFM_CC_PEER_STATUS_CCM_DEFECT: The CCM defect status. The type is u32 (bool). True means no CCM frame is received for 3.25 intervals. IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL. IFLA_BRIDGE_CFM_CC_PEER_STATUS_RDI: The last received CCM PDU RDI. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_PEER_STATUS_PORT_TLV_VALUE: The last received CCM PDU Port Status TLV value field. The type is u8. IFLA_BRIDGE_CFM_CC_PEER_STATUS_IF_TLV_VALUE: The last received CCM PDU Interface Status TLV value field. The type is u8. IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEEN: A CCM frame has been received from Peer MEP. The type is u32 (bool). This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. IFLA_BRIDGE_CFM_CC_PEER_STATUS_TLV_SEEN: A CCM frame with TLV has been received from Peer MEP. The type is u32 (bool). This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEQ_UNEXP_SEEN: A CCM frame with unexpected sequence number has been received from Peer MEP. The type is u32 (bool). When a sequence number is not one higher than previously received then it is unexpected. This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:44 -07:00
Henrik Bjoernlund	5e312fc0e7	bridge: cfm: Netlink GET configuration Interface. This is the implementation of CFM netlink configuration get information interface. Add new nested netlink attributes. These attributes are used by the user space to get configuration information. GETLINK: Request filter RTEXT_FILTER_CFM_CONFIG: Indicating that CFM configuration information must be delivered. IFLA_BRIDGE_CFM: Points to the CFM information. IFLA_BRIDGE_CFM_MEP_CREATE_INFO: This indicate that MEP instance create parameters are following. IFLA_BRIDGE_CFM_MEP_CONFIG_INFO: This indicate that MEP instance config parameters are following. IFLA_BRIDGE_CFM_CC_CONFIG_INFO: This indicate that MEP instance CC functionality parameters are following. IFLA_BRIDGE_CFM_CC_RDI_INFO: This indicate that CC transmitted CCM PDU RDI parameters are following. IFLA_BRIDGE_CFM_CC_CCM_TX_INFO: This indicate that CC transmitted CCM PDU parameters are following. IFLA_BRIDGE_CFM_CC_PEER_MEP_INFO: This indicate that the added peer MEP IDs are following. CFM nested attribute has the following attributes in next level. GETLINK RTEXT_FILTER_CFM_CONFIG: IFLA_BRIDGE_CFM_MEP_CREATE_INSTANCE: The created MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CREATE_DOMAIN: The created MEP domain. The type is u32 (br_cfm_domain). It must be BR_CFM_PORT. This means that CFM frames are transmitted and received directly on the port - untagged. Not in a VLAN. IFLA_BRIDGE_CFM_MEP_CREATE_DIRECTION: The created MEP direction. The type is u32 (br_cfm_mep_direction). It must be BR_CFM_MEP_DIRECTION_DOWN. This means that CFM frames are transmitted and received on the port. Not in the bridge. IFLA_BRIDGE_CFM_MEP_CREATE_IFINDEX: The created MEP residence port ifindex. The type is u32 (ifindex). IFLA_BRIDGE_CFM_MEP_DELETE_INSTANCE: The deleted MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CONFIG_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CONFIG_UNICAST_MAC: The configured MEP unicast MAC address. The type is 6u8 (array). This is used as SMAC in all transmitted CFM frames. IFLA_BRIDGE_CFM_MEP_CONFIG_MDLEVEL: The configured MEP unicast MD level. The type is u32. It must be in the range 1-7. No CFM frames are passing through this MEP on lower levels. IFLA_BRIDGE_CFM_MEP_CONFIG_MEPID: The configured MEP ID. The type is u32. It must be in the range 0-0x1FFF. This MEP ID is inserted in any transmitted CCM frame. IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE: The Continuity Check (CC) functionality is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL: The CC expected receive interval of CCM frames. The type is u32 (br_cfm_ccm_interval). This is also the transmission interval of CCM frames when enabled. IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID: The CC expected receive MAID in CCM frames. The type is CFM_MAID_LENGTHu8. This is MAID is also inserted in transmitted CCM frames. IFLA_BRIDGE_CFM_CC_PEER_MEP_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_PEER_MEPID: The CC Peer MEP ID added. The type is u32. When a Peer MEP ID is added and CC is enabled it is expected to receive CCM frames from that Peer MEP. IFLA_BRIDGE_CFM_CC_RDI_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_RDI_RDI: The RDI that is inserted in transmitted CCM PDU. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_CCM_TX_DMAC: The transmitted CCM frame destination MAC address. The type is 6*u8 (array). This is used as DMAC in all transmitted CFM frames. IFLA_BRIDGE_CFM_CC_CCM_TX_SEQ_NO_UPDATE: The transmitted CCM frame update (increment) of sequence number is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_PERIOD: The period of time where CCM frame are transmitted. The type is u32. The time is given in seconds. SETLINK IFLA_BRIDGE_CFM_CC_CCM_TX must be done before timeout to keep transmission alive. When period is zero any ongoing CCM frame transmission will be stopped. IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV: The transmitted CCM frame update with Interface Status TLV is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV_VALUE: The transmitted Interface Status TLV value field. The type is u8. IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV: The transmitted CCM frame update with Port Status TLV is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV_VALUE: The transmitted Port Status TLV value field. The type is u8. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:43 -07:00
Henrik Bjoernlund	2be665c394	bridge: cfm: Netlink SET configuration Interface. This is the implementation of CFM netlink configuration set information interface. Add new nested netlink attributes. These attributes are used by the user space to create/delete/configure CFM instances. SETLINK: IFLA_BRIDGE_CFM: Indicate that the following attributes are CFM. IFLA_BRIDGE_CFM_MEP_CREATE: This indicate that a MEP instance must be created. IFLA_BRIDGE_CFM_MEP_DELETE: This indicate that a MEP instance must be deleted. IFLA_BRIDGE_CFM_MEP_CONFIG: This indicate that a MEP instance must be configured. IFLA_BRIDGE_CFM_CC_CONFIG: This indicate that a MEP instance Continuity Check (CC) functionality must be configured. IFLA_BRIDGE_CFM_CC_PEER_MEP_ADD: This indicate that a CC Peer MEP must be added. IFLA_BRIDGE_CFM_CC_PEER_MEP_REMOVE: This indicate that a CC Peer MEP must be removed. IFLA_BRIDGE_CFM_CC_CCM_TX: This indicate that the CC transmitted CCM PDU must be configured. IFLA_BRIDGE_CFM_CC_RDI: This indicate that the CC transmitted CCM PDU RDI must be configured. CFM nested attribute has the following attributes in next level. SETLINK RTEXT_FILTER_CFM_CONFIG: IFLA_BRIDGE_CFM_MEP_CREATE_INSTANCE: The created MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CREATE_DOMAIN: The created MEP domain. The type is u32 (br_cfm_domain). It must be BR_CFM_PORT. This means that CFM frames are transmitted and received directly on the port - untagged. Not in a VLAN. IFLA_BRIDGE_CFM_MEP_CREATE_DIRECTION: The created MEP direction. The type is u32 (br_cfm_mep_direction). It must be BR_CFM_MEP_DIRECTION_DOWN. This means that CFM frames are transmitted and received on the port. Not in the bridge. IFLA_BRIDGE_CFM_MEP_CREATE_IFINDEX: The created MEP residence port ifindex. The type is u32 (ifindex). IFLA_BRIDGE_CFM_MEP_DELETE_INSTANCE: The deleted MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CONFIG_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CONFIG_UNICAST_MAC: The configured MEP unicast MAC address. The type is 6u8 (array). This is used as SMAC in all transmitted CFM frames. IFLA_BRIDGE_CFM_MEP_CONFIG_MDLEVEL: The configured MEP unicast MD level. The type is u32. It must be in the range 1-7. No CFM frames are passing through this MEP on lower levels. IFLA_BRIDGE_CFM_MEP_CONFIG_MEPID: The configured MEP ID. The type is u32. It must be in the range 0-0x1FFF. This MEP ID is inserted in any transmitted CCM frame. IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE: The Continuity Check (CC) functionality is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL: The CC expected receive interval of CCM frames. The type is u32 (br_cfm_ccm_interval). This is also the transmission interval of CCM frames when enabled. IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID: The CC expected receive MAID in CCM frames. The type is CFM_MAID_LENGTHu8. This is MAID is also inserted in transmitted CCM frames. IFLA_BRIDGE_CFM_CC_PEER_MEP_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_PEER_MEPID: The CC Peer MEP ID added. The type is u32. When a Peer MEP ID is added and CC is enabled it is expected to receive CCM frames from that Peer MEP. IFLA_BRIDGE_CFM_CC_RDI_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_RDI_RDI: The RDI that is inserted in transmitted CCM PDU. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_CCM_TX_DMAC: The transmitted CCM frame destination MAC address. The type is 6*u8 (array). This is used as DMAC in all transmitted CFM frames. IFLA_BRIDGE_CFM_CC_CCM_TX_SEQ_NO_UPDATE: The transmitted CCM frame update (increment) of sequence number is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_PERIOD: The period of time where CCM frame are transmitted. The type is u32. The time is given in seconds. SETLINK IFLA_BRIDGE_CFM_CC_CCM_TX must be done before timeout to keep transmission alive. When period is zero any ongoing CCM frame transmission will be stopped. IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV: The transmitted CCM frame update with Interface Status TLV is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV_VALUE: The transmitted Interface Status TLV value field. The type is u8. IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV: The transmitted CCM frame update with Port Status TLV is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV_VALUE: The transmitted Port Status TLV value field. The type is u8. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:43 -07:00
Henrik Bjoernlund	dc32cbb3db	bridge: cfm: Kernel space implementation of CFM. CCM frame RX added. This is the third commit of the implementation of the CFM protocol according to 802.1Q section 12.14. Functionality is extended with CCM frame reception. The MEP instance now contains CCM based status information. Most important is the CCM defect status indicating if correct CCM frames are received with the expected interval. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:43 -07:00
Henrik Bjoernlund	a806ad8ee2	bridge: cfm: Kernel space implementation of CFM. CCM frame TX added. This is the second commit of the implementation of the CFM protocol according to 802.1Q section 12.14. Functionality is extended with CCM frame transmission. Interface is extended with these functions: br_cfm_cc_rdi_set() br_cfm_cc_ccm_tx() br_cfm_cc_config_set() A MEP Continuity Check feature can be configured by br_cfm_cc_config_set() The Continuity Check parameters can be configured to be used when transmitting CCM. A MEP can be configured to start or stop transmission of CCM frames by br_cfm_cc_ccm_tx() The CCM will be transmitted for a selected period in seconds. Must call this function before timeout to keep transmission alive. A MEP transmitting CCM can be configured with inserted RDI in PDU by br_cfm_cc_rdi_set() Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:43 -07:00
Henrik Bjoernlund	86a14b79e1	bridge: cfm: Kernel space implementation of CFM. MEP create/delete. This is the first commit of the implementation of the CFM protocol according to 802.1Q section 12.14. It contains MEP instance create, delete and configuration. Connectivity Fault Management (CFM) comprises capabilities for detecting, verifying, and isolating connectivity failures in Virtual Bridged Networks. These capabilities can be used in networks operated by multiple independent organizations, each with restricted management access to each others equipment. CFM functions are partitioned as follows: - Path discovery - Fault detection - Fault verification and isolation - Fault notification - Fault recovery Interface consists of these functions: br_cfm_mep_create() br_cfm_mep_delete() br_cfm_mep_config_set() br_cfm_cc_config_set() br_cfm_cc_peer_mep_add() br_cfm_cc_peer_mep_remove() A MEP instance is created by br_cfm_mep_create() -It is the Maintenance association End Point described in 802.1Q section 19.2. -It is created on a specific level (1-7) and is assuring that no CFM frames are passing through this MEP on lower levels. -It initiates and validates CFM frames on its level. -It can only exist on a port that is related to a bridge. -Attributes given cannot be changed until the instance is deleted. A MEP instance can be deleted by br_cfm_mep_delete(). A created MEP instance has attributes that can be configured by br_cfm_mep_config_set(). A MEP Continuity Check feature can be configured by br_cfm_cc_config_set() The Continuity Check Receiver state machine can be enabled and disabled. According to 802.1Q section 19.2.8 A MEP can have Peer MEPs added and removed by br_cfm_cc_peer_mep_add() and br_cfm_cc_peer_mep_remove() The Continuity Check feature can maintain connectivity status on each added Peer MEP. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:43 -07:00
Henrik Bjoernlund	fbaedb4129	bridge: uapi: cfm: Added EtherType used by the CFM protocol. This EtherType is used by all CFM protocal frames transmitted according to 802.1Q section 12.14. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:43 -07:00
Linus Torvalds	07e0887302	Merge tag 'fallthrough-fixes-clang-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull fallthrough fix from Gustavo A. R. Silva: "This fixes a ton of fall-through warnings when building with Clang 12.0.0 and -Wimplicit-fallthrough" * tag 'fallthrough-fixes-clang-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: include: jhash/signal: Fix fall-through warnings for Clang	2020-10-29 13:02:52 -07:00
Linus Torvalds	b9c0f4bd5b	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma fixes from Jason Gunthorpe: "The good news is people are testing rc1 in the RDMA world - the bad news is testing of the for-next area is not as good as I had hoped, as we really should have caught at least the rdma_connect_locked() issue before now. Notable merge window regressions that didn't get caught/fixed in time for rc1: - Fix in kernel users of rxe, they were broken by the rapid fix to undo the uABI breakage in rxe from another patch - EFA userspace needs to read the GID table but was broken with the new GID table logic - Fix user triggerable deadlock in mlx5 using devlink reload - Fix deadlock in several ULPs using rdma_connect from the CM handler callbacks - Memory leak in qedr" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/qedr: Fix memory leak in iWARP CM RDMA: Add rdma_connect_locked() RDMA/uverbs: Fix false error in query gid IOCTL RDMA/mlx5: Fix devlink deadlock on net namespace deletion RDMA/rxe: Fix small problem in network_type patch	2020-10-29 11:50:59 -07:00
Arnd Bergmann	f54ec58fee	wimax: move out to staging There are no known users of this driver as of October 2020, and it will be removed unless someone turns out to still need it in future releases. According to https://en.wikipedia.org/wiki/List_of_WiMAX_networks, there have been many public wimax networks, but it appears that many of these have migrated to LTE or discontinued their service altogether. As most PCs and phones lack WiMAX hardware support, the remaining networks tend to use standalone routers. These almost certainly run Linux, but not a modern kernel or the mainline wimax driver stack. NetworkManager appears to have dropped userspace support in 2015 https://bugzilla.gnome.org/show_bug.cgi?id=747846, the www.linuxwimax.org site had already shut down earlier. WiMax is apparently still being deployed on airport campus networks ("AeroMACS"), but in a frequency band that was not supported by the old Intel 2400m (used in Sandy Bridge laptops and earlier), which is the only driver using the kernel's wimax stack. Move all files into drivers/staging/wimax, including the uapi header files and documentation, to make it easier to remove it when it gets to that. Only minimal changes are made to the source files, in order to make it possible to port patches across the move. Also remove the MAINTAINERS entry that refers to a broken mailing list and website. Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-By: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Suggested-by: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2020-10-29 19:27:45 +01:00
Gustavo A. R. Silva	4169e889e5	include: jhash/signal: Fix fall-through warnings for Clang In preparation to enable -Wimplicit-fallthrough for Clang, explicitly add break statements instead of letting the code fall through to the next case. This patch adds four break statements that, together, fix almost 40,000 warnings when building Linux 5.10-rc1 with Clang 12.0.0 and this[1] change reverted. Notice that in order to enable -Wimplicit-fallthrough for Clang, such change[1] is meant to be reverted at some point. So, this patch helps to move in that direction. Something important to mention is that there is currently a discrepancy between GCC and Clang when dealing with switch fall-through to empty case statements or to cases that only contain a break/continue/return statement[2][3][4]. Now that the -Wimplicit-fallthrough option has been globally enabled[5], any compiler should really warn on missing either a fallthrough annotation or any of the other case-terminating statements (break/continue/return/ goto) when falling through to the next case statement. Making exceptions to this introduces variation in case handling which may continue to lead to bugs, misunderstandings, and a general lack of robustness. The point of enabling options like -Wimplicit-fallthrough is to prevent human error and aid developers in spotting bugs before their code is even built/ submitted/committed, therefore eliminating classes of bugs. So, in order to really accomplish this, we should, and can, move in the direction of addressing any error-prone scenarios and get rid of the unintentional fallthrough bug-class in the kernel, entirely, even if there is some minor redundancy. Better to have explicit case-ending statements than continue to have exceptions where one must guess as to the right result. The compiler will eliminate any actual redundancy. [1] commit `e2079e93f5` ("kbuild: Do not enable -Wimplicit-fallthrough for clang for now") [2] https://github.com/ClangBuiltLinux/linux/issues/636 [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91432 [4] https://godbolt.org/z/xgkvIh [5] commit `a035d552a9` ("Makefile: Globally enable fall-through warning") Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>	2020-10-29 13:17:58 -05:00
Linus Torvalds	598a597636	Merge tag 'afs-fixes-20201029' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS fixes from David Howells: - Fix copy_file_range() to an afs file now returning EINVAL if the splice_write file op isn't supplied. - Fix a deref-before-check in afs_unuse_cell(). - Fix a use-after-free in afs_xattr_get_acl(). - Fix afs to not try to clear PG_writeback when laundering a page. - Fix afs to take a ref on a page that it sets PG_private on and to drop that ref when clearing PG_private. This is done through recently added helpers. - Fix a page leak if write_begin() fails. - Fix afs_write_begin() to not alter the dirty region info stored in page->private, but rather do this in afs_write_end() instead when we know what we actually changed. - Fix afs_invalidatepage() to alter the dirty region info on a page when partial page invalidation occurs so that we don't inadvertantly include a span of zeros that will get written back if a page gets laundered due to a remote 3rd-party induced invalidation. We mustn't, however, reduce the dirty region if the page has been seen to be mapped (ie. we got called through the page_mkwrite vector) as the page might still be mapped and we might lose data if the file is extended again. - Fix the dirty region info to have a lower resolution if the size of the page is too large for this to be encoded (e.g. powerpc32 with 64K pages). Note that this might not be the ideal way to handle this, since it may allow some leakage of undirtied zero bytes to the server's copy in the case of a 3rd-party conflict. To aid the last two fixes, two additional changes: - Wrap the manipulations of the dirty region info stored in page->private into helper functions. - Alter the encoding of the dirty region so that the region bounds can be stored with one fewer bit, making a bit available for the indication of mappedness. * tag 'afs-fixes-20201029' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix dirty-region encoding on ppc32 with 64K pages afs: Fix afs_invalidatepage to adjust the dirty region afs: Alter dirty range encoding in page->private afs: Wrap page->private manipulations in inline functions afs: Fix where page->private is set during write afs: Fix page leak on afs_write_begin() failure afs: Fix to take ref on page when PG_private is set afs: Fix afs_launder_page to not clear PG_writeback afs: Fix a use after free in afs_xattr_get_acl() afs: Fix tracing deref-before-check afs: Fix copy_file_range()	2020-10-29 10:13:09 -07:00
Linus Torvalds	58130a6cd0	Merge tag 'ext4_for_linus_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Bug fixes for the new ext4 fast commit feature, plus a fix for the 'data=journal' bug fix. Also use the generic casefolding support which has now landed in fs/libfs.c for 5.10" * tag 'ext4_for_linus_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: indicate that fast_commit is available via /sys/fs/ext4/feature/... ext4: use generic casefolding support ext4: do not use extent after put_bh ext4: use IS_ERR() for error checking of path ext4: fix mmap write protection for data=journal mode jbd2: fix a kernel-doc markup ext4: use s_mount_flags instead of s_mount_state for fast commit state ext4: make num of fast commit blocks configurable ext4: properly check for dirty state in ext4_inode_datasync_dirty() ext4: fix double locking in ext4_fc_commit_dentry_updates()	2020-10-29 09:36:11 -07:00
David Howells	f86726a69d	afs: Fix afs_invalidatepage to adjust the dirty region Fix afs_invalidatepage() to adjust the dirty region recorded in page->private when truncating a page. If the dirty region is entirely removed, then the private data is cleared and the page dirty state is cleared. Without this, if the page is truncated and then expanded again by truncate, zeros from the expanded, but no-longer dirty region may get written back to the server if the page gets laundered due to a conflicting 3rd-party write. It mustn't, however, shorten the dirty region of the page if that page is still mmapped and has been marked dirty by afs_page_mkwrite(), so a flag is stored in page->private to record this. Fixes: `4343d00872` ("afs: Get rid of the afs_writeback record") Signed-off-by: David Howells <dhowells@redhat.com>	2020-10-29 13:53:04 +00:00
David Howells	185f0c7073	afs: Wrap page->private manipulations in inline functions The afs filesystem uses page->private to store the dirty range within a page such that in the event of a conflicting 3rd-party write to the server, we write back just the bits that got changed locally. However, there are a couple of problems with this: (1) I need a bit to note if the page might be mapped so that partial invalidation doesn't shrink the range. (2) There aren't necessarily sufficient bits to store the entire range of data altered (say it's a 32-bit system with 64KiB pages or transparent huge pages are in use). So wrap the accesses in inline functions so that future commits can change how this works. Also move them out of the tracing header into the in-directory header. There's not really any need for them to be in the tracing header. Signed-off-by: David Howells <dhowells@redhat.com>	2020-10-29 13:53:04 +00:00
Mauro Carvalho Chehab	ea4b01d9b8	jbd2: fix a kernel-doc markup The kernel-doc markup that documents _fc_replay_callback is missing an asterisk, causing this warning: ../include/linux/jbd2.h:1271: warning: Function parameter or member 'j_fc_replay_callback' not described in 'journal_s' When building the docs. Fixes: 609f928af48f ("jbd2: fast commit recovery path") Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/6055927ada2015b55b413cdd2670533bdc9a8da2.1603791716.git.mchehab+huawei@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2020-10-28 13:42:36 -04:00
Harshad Shirwadkar	e029c5f279	ext4: make num of fast commit blocks configurable This patch reserves a field in the jbd2 superblock for number of fast commit blocks. When this value is non-zero, Ext4 uses this field to set the number of fast commit blocks. Fixes: `6866d7b3f2` ("ext4/jbd2: add fast commit initialization") Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201027044915.2553163-2-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2020-10-28 13:42:03 -04:00
Jason Gunthorpe	071ba4cc55	RDMA: Add rdma_connect_locked() There are two flows for handling RDMA_CM_EVENT_ROUTE_RESOLVED, either the handler triggers a completion and another thread does rdma_connect() or the handler directly calls rdma_connect(). In all cases rdma_connect() needs to hold the handler_mutex, but when handler's are invoked this is already held by the core code. This causes ULPs using the 2nd method to deadlock. Provide a rdma_connect_locked() and have all ULPs call it from their handlers. Link: https://lore.kernel.org/r/0-v2-53c22d5c1405+33-rdma_connect_locking_jgg@nvidia.com Reported-and-tested-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Fixes: `2a7cec5381` ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state") Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-28 09:14:49 -03:00
Kees Cook	3e6631485f	vmlinux.lds.h: Keep .ctors.* with .ctors Under some circumstances, the compiler generates .ctors.* sections. This is seen doing a cross compile of x86_64 from a powerpc64el host: x86_64-linux-gnu-ld: warning: orphan section `.ctors.65435' from `kernel/trace/trace_clock.o' being placed in section `.ctors.65435' x86_64-linux-gnu-ld: warning: orphan section `.ctors.65435' from `kernel/trace/ftrace.o' being placed in section `.ctors.65435' x86_64-linux-gnu-ld: warning: orphan section `.ctors.65435' from `kernel/trace/ring_buffer.o' being placed in section `.ctors.65435' Include these orphans along with the regular .ctors section. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: `83109d5d5f` ("x86/build: Warn on orphan section placement") Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20201005025720.2599682-1-keescook@chromium.org	2020-10-27 11:13:41 -07:00
Parav Pandit	fbdd0049d9	RDMA/mlx5: Fix devlink deadlock on net namespace deletion When a mlx5 core devlink instance is reloaded in different net namespace, its associated IB device is deleted and recreated. Example sequence is: $ ip netns add foo $ devlink dev reload pci/0000:00:08.0 netns foo $ ip netns del foo mlx5 IB device needs to attach and detach the netdevice to it through the netdev notifier chain during load and unload sequence. A below call graph of the unload flow. cleanup_net() down_read(&pernet_ops_rwsem); <- first sem acquired ops_pre_exit_list() pre_exit() devlink_pernet_pre_exit() devlink_reload() mlx5_devlink_reload_down() mlx5_unload_one() [...] mlx5_ib_remove() mlx5_ib_unbind_slave_port() mlx5_remove_netdev_notifier() unregister_netdevice_notifier() down_write(&pernet_ops_rwsem);<- recurrsive lock Hence, when net namespace is deleted, mlx5 reload results in deadlock. When deadlock occurs, devlink mutex is also held. This not only deadlocks the mlx5 device under reload, but all the processes which attempt to access unrelated devlink devices are deadlocked. Hence, fix this by mlx5 ib driver to register for per net netdev notifier instead of global one, which operats on the net namespace without holding the pernet_ops_rwsem. Fixes: `4383cfcc65` ("net/mlx5: Add devlink reload") Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-26 19:18:19 -03:00
Joe Perches	33def8498f	treewide: Convert macro and uses of __section(foo) to __section("foo") Use a more generic form for __section that requires quotes to avoid complications with clang and gcc differences. Remove the quote operator # from compiler_attributes.h __section macro. Convert all unquoted __section(foo) uses to quoted __section("foo"). Also convert __attribute__((section("foo"))) uses to __section("foo") even if the __attribute__ has multiple list entry forms. Conversion done using the script at: https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-10-25 14:51:49 -07:00
Eric Biggers	23224e4500	mm: remove kzfree() compatibility definition Commit `453431a549` ("mm, treewide: rename kzfree() to kfree_sensitive()") renamed kzfree() to kfree_sensitive(), but it left a compatibility definition of kzfree() to avoid being too disruptive. Since then a few more instances of kzfree() have slipped in. Just get rid of them and remove the compatibility definition once and for all. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-10-25 11:39:02 -07:00
Linus Torvalds	a3d1b31213	Merge tag 'perf-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Thomas Gleixner: "A single fix to compute the field offset of the SNOOPX bit in the data source bitmask of perf events correctly" * tag 'perf-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: correct SNOOPX field offset	2020-10-25 11:22:59 -07:00
Linus Torvalds	1c84550f47	Merge tag 'locking-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Thomas Gleixner: "Just a trivial fix for kernel-doc warnings" * tag 'locking-urgent-2020-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/seqlocks: Fix kernel-doc warnings	2020-10-25 11:14:54 -07:00
Linus Torvalds	f9c25d9864	Merge branch 'parisc-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull more parisc updates from Helge Deller: - During this merge window O_NONBLOCK was changed to become 000200000, but we missed that the syscalls timerfd_create(), signalfd4(), eventfd2(), pipe2(), inotify_init1() and userfaultfd() do a strict bit-wise check of the flags parameter. To provide backward compatibility with existing userspace we introduce parisc specific wrappers for those syscalls which filter out the old O_NONBLOCK value and replaces it with the new one. - Prevent HIL bus driver to get stuck when keyboard or mouse isn't attached - Improve error return codes when setting rtc time - Minor documentation fix in pata_ns87415.c * 'parisc-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: ata: pata_ns87415.c: Document support on parisc with superio chip parisc: Add wrapper syscalls to fix O_NONBLOCK flag usage hil/parisc: Disable HIL driver when it gets stuck parisc: Improve error return codes when setting rtc time	2020-10-25 10:59:34 -07:00
Linus Torvalds	bd6aabc7ca	Merge tag 'for-linus-5.10b-rc1c-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull more xen updates from Juergen Gross: - a series for the Xen pv block drivers adding module parameters for better control of resource usge - a cleanup series for the Xen event driver * tag 'for-linus-5.10b-rc1c-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: Documentation: add xen.fifo_events kernel parameter description xen/events: unmask a fifo event channel only if it was masked xen/events: only register debug interrupt for 2-level events xen/events: make struct irq_info private to events_base.c xen: remove no longer used functions xen-blkfront: Apply changed parameter name to the document xen-blkfront: add a parameter for disabling of persistent grants xen-blkback: add a parameter for disabling of persistent grants	2020-10-25 10:55:35 -07:00
Linus Torvalds	91f28da8c9	Merge tag '20201024-v4-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wtarreau/prandom Pull random32 updates from Willy Tarreau: "Make prandom_u32() less predictable. This is the cleanup of the latest series of prandom_u32 experimentations consisting in using SipHash instead of Tausworthe to produce the randoms used by the network stack. The changes to the files were kept minimal, and the controversial commit that used to take noise from the fast_pool (`f227e3ec3b`) was reverted. Instead, a dedicated "net_rand_noise" per_cpu variable is fed from various sources of activities (networking, scheduling) to perturb the SipHash state using fast, non-trivially predictable data, instead of keeping it fully deterministic. The goal is essentially to make any occasional memory leakage or brute-force attempt useless. The resulting code was verified to be very slightly faster on x86_64 than what is was with the controversial commit above, though this remains barely above measurement noise. It was also tested on i386 and arm, and build- tested only on arm64" Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/ * tag '20201024-v4-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wtarreau/prandom: random32: add a selftest for the prandom32 code random32: add noise from network and scheduling activity random32: make prandom_u32() output unpredictable	2020-10-25 10:40:08 -07:00
Linus Torvalds	d769139081	Merge tag 'block-5.10-2020-10-24' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: - NVMe pull request from Christoph - rdma error handling fixes (Chao Leng) - fc error handling and reconnect fixes (James Smart) - fix the qid displace when tracing ioctl command (Keith Busch) - don't use BLK_MQ_REQ_NOWAIT for passthru (Chaitanya Kulkarni) - fix MTDT for passthru (Logan Gunthorpe) - blacklist Write Same on more devices (Kai-Heng Feng) - fix an uninitialized work struct (zhenwei pi)" - lightnvm out-of-bounds fix (Colin) - SG allocation leak fix (Doug) - rnbd fixes (Gioh, Guoqing, Jack) - zone error translation fixes (Keith) - kerneldoc markup fix (Mauro) - zram lockdep fix (Peter) - Kill unused io_context members (Yufen) - NUMA memory allocation cleanup (Xianting) - NBD config wakeup fix (Xiubo) * tag 'block-5.10-2020-10-24' of git://git.kernel.dk/linux-block: (27 commits) block: blk-mq: fix a kernel-doc markup nvme-fc: shorten reconnect delay if possible for FC nvme-fc: wait for queues to freeze before calling update_hr_hw_queues nvme-fc: fix error loop in create_hw_io_queues nvme-fc: fix io timeout to abort I/O null_blk: use zone status for max active/open nvmet: don't use BLK_MQ_REQ_NOWAIT for passthru nvmet: cleanup nvmet_passthru_map_sg() nvmet: limit passthru MTDS by BIO_MAX_PAGES nvmet: fix uninitialized work for zero kato nvme-pci: disable Write Zeroes on Sandisk Skyhawk nvme: use queuedata for nvme_req_qid nvme-rdma: fix crash due to incorrect cqe nvme-rdma: fix crash when connect rejected block: remove unused members for io_context blk-mq: remove the calling of local_memory_node() zram: Fix __zram_bvec_{read,write}() locking order skd_main: remove unused including <linux/version.h> sgl_alloc_order: fix memory leak lightnvm: fix out-of-bounds write to array devices->info[] ...	2020-10-24 12:46:42 -07:00

1 2 3 4 5 ...

124808 Commits