Commit Graph

1123098 Commits

Author SHA1 Message Date
Suman Ghosh
99c969a83d octeontx2-pf: Add egress PFC support
As of now all transmit queues transmit packets out of same scheduler
queue hierarchy. Due to this PFC frames sent by peer are not handled
properly, either all transmit queues are backpressured or none.
To fix this when user enables PFC for a given priority map relavant
transmit queue to a different scheduler queue hierarcy, so that
backpressure is applied only to the traffic egressing out of that TXQ.

Signed-off-by: Suman Ghosh <sumang@marvell.com>
Link: https://lore.kernel.org/r/20220830120304.158060-1-sumang@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-01 09:37:47 +02:00
Zhengchao Shao
a102c8973d net: sched: remove redundant NULL check in change hook function
Currently, the change function can be called by two ways. The one way is
that qdisc_change() will call it. Before calling change function,
qdisc_change() ensures tca[TCA_OPTIONS] is not empty. The other way is
that .init() will call it. The opt parameter is also checked before
calling change function in .init(). Therefore, it's no need to check the
input parameter opt in change function.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://lore.kernel.org/r/20220829071219.208646-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-01 08:06:45 +02:00
Jakub Kicinski
0b4f688d53 Revert "sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb"
This reverts commit 90fabae8a2.

Patch was applied hastily, revert and let the v2 be reviewed.

Fixes: 90fabae8a2 ("sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb")
Link: https://lore.kernel.org/all/87wnao2ha3.fsf@toke.dk/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 20:02:28 -07:00
Jakub Kicinski
a3daac631e Merge branch 'tcp-tcp-challenge-ack-fixes'
Eric Dumazet says:

====================
tcp: tcp challenge ack fixes

syzbot found a typical data-race addressed in the first patch.

While we are at it, second patch makes the global rate limit
per net-ns and disabled by default.
====================

Link: https://lore.kernel.org/r/20220830185656.268523-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 19:56:54 -07:00
Eric Dumazet
79e3602caa tcp: make global challenge ack rate limitation per net-ns and default disabled
Because per host rate limiting has been proven problematic (side channel
attacks can be based on it), per host rate limiting of challenge acks ideally
should be per netns and turned off by default.

This is a long due followup of following commits:

083ae30828 ("tcp: enable per-socket rate limiting of all 'challenge acks'")
f2b2c582e8 ("tcp: mitigate ACK loops for connections as tcp_sock")
75ff39ccc1 ("tcp: make challenge acks less predictable")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 19:56:48 -07:00
Eric Dumazet
8c70521238 tcp: annotate data-race around challenge_timestamp
challenge_timestamp can be read an written by concurrent threads.

This was expected, but we need to annotate the race to avoid potential issues.

Following patch moves challenge_timestamp and challenge_count
to per-netns storage to provide better isolation.

Fixes: 354e4aa391 ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 19:56:48 -07:00
Kurt Kanzenbach
52267ce25f net: dsa: hellcreek: Print warning only once
In case the source port cannot be decoded, print the warning only once. This
still brings attention to the user and does not spam the logs at the same time.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220830163448.8921-1-kurt@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 19:54:04 -07:00
Richard Gobert
0e4d354762 net-next: Fix IP_UNICAST_IF option behavior for connected sockets
The IP_UNICAST_IF socket option is used to set the outgoing interface
for outbound packets.

The IP_UNICAST_IF socket option was added as it was needed by the
Wine project, since no other existing option (SO_BINDTODEVICE socket
option, IP_PKTINFO socket option or the bind function) provided the
needed characteristics needed by the IP_UNICAST_IF socket option. [1]
The IP_UNICAST_IF socket option works well for unconnected sockets,
that is, the interface specified by the IP_UNICAST_IF socket option
is taken into consideration in the route lookup process when a packet
is being sent. However, for connected sockets, the outbound interface
is chosen when connecting the socket, and in the route lookup process
which is done when a packet is being sent, the interface specified by
the IP_UNICAST_IF socket option is being ignored.

This inconsistent behavior was reported and discussed in an issue
opened on systemd's GitHub project [2]. Also, a bug report was
submitted in the kernel's bugzilla [3].

To understand the problem in more detail, we can look at what happens
for UDP packets over IPv4 (The same analysis was done separately in
the referenced systemd issue).
When a UDP packet is sent the udp_sendmsg function gets called and
the following happens:

1. The oif member of the struct ipcm_cookie ipc (which stores the
output interface of the packet) is initialized by the ipcm_init_sk
function to inet->sk.sk_bound_dev_if (the device set by the
SO_BINDTODEVICE socket option).

2. If the IP_PKTINFO socket option was set, the oif member gets
overridden by the call to the ip_cmsg_send function.

3. If no output interface was selected yet, the interface specified
by the IP_UNICAST_IF socket option is used.

4. If the socket is connected and no destination address is
specified in the send function, the struct ipcm_cookie ipc is not
taken into consideration and the cached route, that was calculated in
the connect function is being used.

Thus, for a connected socket, the IP_UNICAST_IF sockopt isn't taken
into consideration.

This patch corrects the behavior of the IP_UNICAST_IF socket option
for connect()ed sockets by taking into consideration the
IP_UNICAST_IF sockopt when connecting the socket.

In order to avoid reconnecting the socket, this option is still
ignored when applied on an already connected socket until connect()
is called again by the Richard Gobert.

Change the __ip4_datagram_connect function, which is called during
socket connection, to take into consideration the interface set by
the IP_UNICAST_IF socket option, in a similar way to what is done in
the udp_sendmsg function.

[1] https://lore.kernel.org/netdev/1328685717.4736.4.camel@edumazet-laptop/T/
[2] https://github.com/systemd/systemd/issues/11935#issuecomment-618691018
[3] https://bugzilla.kernel.org/show_bug.cgi?id=210255

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220829111554.GA1771@debian
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 19:51:06 -07:00
Nicolas Dichtel
eb55dc09b5 ip: fix triggering of 'icmp redirect'
__mkroute_input() uses fib_validate_source() to trigger an icmp redirect.
My understanding is that fib_validate_source() is used to know if the src
address and the gateway address are on the same link. For that,
fib_validate_source() returns 1 (same link) or 0 (not the same network).
__mkroute_input() is the only user of these positive values, all other
callers only look if the returned value is negative.

Since the below patch, fib_validate_source() didn't return anymore 1 when
both addresses are on the same network, because the route lookup returns
RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right.
Let's adapat the test to return 1 again when both addresses are on the same
link.

CC: stable@vger.kernel.org
Fixes: 747c143072 ("ip: fix dflt addr selection for connected nexthop")
Reported-by: kernel test robot <yujie.liu@intel.com>
Reported-by: Heng Qi <hengqi@linux.alibaba.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220829100121.3821-1-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 19:50:36 -07:00
Jakub Kicinski
744ccd5c64 Merge branch 'net-sched-remove-unused-variables'
Zhengchao Shao says:

====================
net: sched: remove unused variables

The variable "other" is unused, remove it.
====================

Link: https://lore.kernel.org/r/20220830092255.281330-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 19:39:56 -07:00
Zhengchao Shao
4516c873e3 net: sched: gred/red: remove unused variables in struct red_stats
The variable "other" in the struct red_stats is not used. Remove it.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 19:39:53 -07:00
Zhengchao Shao
38af11717b net: sched: choke: remove unused variables in struct choke_sched_data
The variable "other" in the struct choke_sched_data is not used. Remove it.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 19:39:53 -07:00
Robert Hancock
cb45a8bf46 net: axienet: Switch to 64-bit RX/TX statistics
The RX and TX byte/packet statistics in this driver could be overflowed
relatively quickly on a 32-bit platform. Switch these stats to use the
u64_stats infrastructure to avoid this.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Link: https://lore.kernel.org/r/20220829233901.3429419-1-robert.hancock@calian.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 19:18:20 -07:00
Linus Walleij
a60511cf15 net/rds: Pass a pointer to virt_to_page()
Functions that work on a pointer to virtual memory such as
virt_to_pfn() and users of that function such as
virt_to_page() are supposed to pass a pointer to virtual
memory, ideally a (void *) or other pointer. However since
many architectures implement virt_to_pfn() as a macro,
this function becomes polymorphic and accepts both a
(unsigned long) and a (void *).

If we instead implement a proper virt_to_pfn(void *addr)
function the following happens (occurred on arch/arm):

net/rds/message.c:357:56: warning: passing argument 1
  of 'virt_to_pfn' makes pointer from integer without a
  cast [-Wint-conversion]

Fix this with an explicit cast.

Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: rds-devel@oss.oracle.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20220829132001.114858-1-linus.walleij@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 19:12:32 -07:00
Jann Horn
2555283eb4 mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse
anon_vma->degree tracks the combined number of child anon_vmas and VMAs
that use the anon_vma as their ->anon_vma.

anon_vma_clone() then assumes that for any anon_vma attached to
src->anon_vma_chain other than src->anon_vma, it is impossible for it to
be a leaf node of the VMA tree, meaning that for such VMAs ->degree is
elevated by 1 because of a child anon_vma, meaning that if ->degree
equals 1 there are no VMAs that use the anon_vma as their ->anon_vma.

This assumption is wrong because the ->degree optimization leads to leaf
nodes being abandoned on anon_vma_clone() - an existing anon_vma is
reused and no new parent-child relationship is created.  So it is
possible to reuse an anon_vma for one VMA while it is still tied to
another VMA.

This is an issue because is_mergeable_anon_vma() and its callers assume
that if two VMAs have the same ->anon_vma, the list of anon_vmas
attached to the VMAs is guaranteed to be the same.  When this assumption
is violated, vma_merge() can merge pages into a VMA that is not attached
to the corresponding anon_vma, leading to dangling page->mapping
pointers that will be dereferenced during rmap walks.

Fix it by separately tracking the number of child anon_vmas and the
number of VMAs using the anon_vma as their ->anon_vma.

Fixes: 7a3ef208e6 ("mm: prevent endless growth of anon_vma hierarchy")
Cc: stable@kernel.org
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-08-31 15:45:10 -07:00
Sven van Ashbrook
7305b78ae4 r8152: allow userland to disable multicast
The rtl8152 driver does not disable multicasting when userspace asks
it to. For example:
 $ ifconfig eth0 -multicast -allmulti
 $ tcpdump -p -i eth0  # will still capture multicast frames

Fix by clearing the device multicast filter table when multicast and
allmulti are both unset.

Tested as follows:
- Set multicast on eth0 network interface
- verify that multicast packets are coming in:
  $ tcpdump -p -i eth0
- Clear multicast and allmulti on eth0 network interface
- verify that no more multicast packets are coming in:
  $ tcpdump -p -i eth0

Signed-off-by: Sven van Ashbrook <svenva@chromium.org>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/20220830045923.net-next.v1.1.I4fee0ac057083d4f848caf0fa3a9fd466fc374a0@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 14:55:50 -07:00
Toke Høiland-Jørgensen
90fabae8a2 sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb
When the GSO splitting feature of sch_cake is enabled, GSO superpackets
will be broken up and the resulting segments enqueued in place of the
original skb. In this case, CAKE calls consume_skb() on the original skb,
but still returns NET_XMIT_SUCCESS. This can confuse parent qdiscs into
assuming the original skb still exists, when it really has been freed. Fix
this by adding the __NET_XMIT_STOLEN flag to the return value in this case.

Fixes: 0c850344d3 ("sch_cake: Conditionally split GSO segments")
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-18231
Link: https://lore.kernel.org/r/20220831092103.442868-1-toke@toke.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 14:20:08 -07:00
Wolfram Sang
f029c781dd net: ethernet: move from strlcpy with unused retval to strscpy
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Petr Machata <petrm@nvidia.com> # For drivers/net/ethernet/mellanox/mlxsw
Acked-by: Geoff Levand <geoff@infradead.org> # For ps3_gelic_net and spider_net_ethtool
Acked-by: Tom Lendacky <thomas.lendacky@amd.com> # For drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
Acked-by: Marcin Wojtas <mw@semihalf.com> # For drivers/net/ethernet/marvell/mvpp2
Reviewed-by: Leon Romanovsky <leonro@nvidia.com> # For drivers/net/ethernet/mellanox/mlx{4|5}
Reviewed-by: Shay Agroskin <shayagr@amazon.com> # For drivers/net/ethernet/amazon/ena
Acked-by: Krzysztof Hałasa <khalasa@piap.pl> # For IXP4xx Ethernet
Link: https://lore.kernel.org/r/20220830201457.7984-3-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 14:11:26 -07:00
Wolfram Sang
fb3ceec187 net: move from strlcpy with unused retval to strscpy
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN
Link: https://lore.kernel.org/r/20220830201457.7984-1-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 14:11:07 -07:00
Martin KaFai Lau
c9ae8c966f Merge branch 'fixes for concurrent htab updates'
Hou Tao says:

====================

From: Hou Tao <houtao1@huawei.com>

Hi,

The patchset aims to fix the issues found during investigating the
syzkaller problem reported in [0]. It seems that the concurrent updates
to the same hash-table bucket may fail as shown in patch 1.

Patch 1 uses preempt_disable() to fix the problem for
htab_use_raw_lock() case. For !htab_use_raw_lock() case, the problem is
left to "BPF specific memory allocator" patchset [1] in which
!htab_use_raw_lock() will be removed.

Patch 2 fixes the out-of-bound memory read problem reported in [0]. The
problem has the root cause as patch 1 and it is fixed by handling -EBUSY
from htab_lock_bucket() correctly.

Patch 3 add two cases for hash-table update: one for the reentrancy of
bpf_map_update_elem(), and another one for concurrent updates of the
same hash-table bucket.

Comments are always welcome.

Regards,
Tao

[0]: https://lore.kernel.org/bpf/CACkBjsbuxaR6cv0kXJoVnBfL9ZJXjjoUcMpw_Ogc313jSrg14A@mail.gmail.com/
[1]: https://lore.kernel.org/bpf/20220819214232.18784-1-alexei.starovoitov@gmail.com/

Change Log:

v4:
 * rebased on bpf-next
 * add htab_update to DENYLIST.s390x

v3: https://lore.kernel.org/bpf/20220829023709.1958204-1-houtao@huaweicloud.com/
 * patch 1: update commit message and add Fixes tag
 * patch 2: add Fixes tag
 * patch 3: elaborate the description of test cases

v2: https://lore.kernel.org/bpf/bd60ef93-1c6a-2db2-557d-b09b92ad22bd@huaweicloud.com/
 * Note the fix is for CONFIG_PREEMPT case in commit message and add
   Reviewed-by tag for patch 1
 * Drop patch "bpf: Allow normally concurrent map updates for !htab_use_raw_lock() case"

v1: https://lore.kernel.org/bpf/20220821033223.2598791-1-houtao@huaweicloud.com/
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-08-31 14:10:01 -07:00
Hou Tao
1c636b6277 selftests/bpf: Add test cases for htab update
One test demonstrates the reentrancy of hash map update on the same
bucket should fail, and another one shows concureently updates of
the same hash map bucket should succeed and not fail due to
the reentrancy checking for bucket lock.

There is no trampoline support on s390x, so move htab_update to
denylist.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20220831042629.130006-4-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-08-31 14:10:01 -07:00
Hou Tao
66a7a92e4d bpf: Propagate error from htab_lock_bucket() to userspace
In __htab_map_lookup_and_delete_batch() if htab_lock_bucket() returns
-EBUSY, it will go to next bucket. Going to next bucket may not only
skip the elements in current bucket silently, but also incur
out-of-bound memory access or expose kernel memory to userspace if
current bucket_cnt is greater than bucket_size or zero.

Fixing it by stopping batch operation and returning -EBUSY when
htab_lock_bucket() fails, and the application can retry or skip the busy
batch as needed.

Fixes: 20b6cc34ea ("bpf: Avoid hashtab deadlock with map_locked")
Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20220831042629.130006-3-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-08-31 14:10:01 -07:00
Hou Tao
2775da2162 bpf: Disable preemption when increasing per-cpu map_locked
Per-cpu htab->map_locked is used to prohibit the concurrent accesses
from both NMI and non-NMI contexts. But since commit 74d862b682
("sched: Make migrate_disable/enable() independent of RT"),
migrate_disable() is also preemptible under CONFIG_PREEMPT case, so now
map_locked also disallows concurrent updates from normal contexts
(e.g. userspace processes) unexpectedly as shown below:

process A                      process B

htab_map_update_elem()
  htab_lock_bucket()
    migrate_disable()
    /* return 1 */
    __this_cpu_inc_return()
    /* preempted by B */

                               htab_map_update_elem()
                                 /* the same bucket as A */
                                 htab_lock_bucket()
                                   migrate_disable()
                                   /* return 2, so lock fails */
                                   __this_cpu_inc_return()
                                   return -EBUSY

A fix that seems feasible is using in_nmi() in htab_lock_bucket() and
only checking the value of map_locked for nmi context. But it will
re-introduce dead-lock on bucket lock if htab_lock_bucket() is re-entered
through non-tracing program (e.g. fentry program).

One cannot use preempt_disable() to fix this issue as htab_use_raw_lock
being false causes the bucket lock to be a spin lock which can sleep and
does not work with preempt_disable().

Therefore, use migrate_disable() when using the spinlock instead of
preempt_disable() and defer fixing concurrent updates to when the kernel
has its own BPF memory allocator.

Fixes: 74d862b682 ("sched: Make migrate_disable/enable() independent of RT")
Reviewed-by: Hao Luo <haoluo@google.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20220831042629.130006-2-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-08-31 14:10:01 -07:00
Martin KaFai Lau
197072945a selftest/bpf: Ensure no module loading in bpf_setsockopt(TCP_CONGESTION)
This patch adds a test to ensure bpf_setsockopt(TCP_CONGESTION, "not_exist")
will not trigger the kernel module autoload.

Before the fix:

  [   40.535829] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274
  [...]
  [   40.552134]  tcp_ca_find_autoload.constprop.0+0xcb/0x200
  [   40.552689]  tcp_set_congestion_control+0x99/0x7b0
  [   40.553203]  do_tcp_setsockopt+0x3ed/0x2240
  [...]
  [   40.556041]  __bpf_setsockopt+0x124/0x640

Signed-off-by: Martin KaFai Lau <martin.lau@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220830231953.792412-1-martin.lau@linux.dev
2022-08-31 22:22:29 +02:00
Martin KaFai Lau
84e5a0f208 bpf, net: Avoid loading module when calling bpf_setsockopt(TCP_CONGESTION)
When bpf prog changes tcp-cc by calling bpf_setsockopt(TCP_CONGESTION),
it should not try to load module which may be a blocking operation.
This details was correct in the v1 [0] but missed by mistake in the
later revision in commit cb388e7ee3 ("bpf: net: Change do_tcp_setsockopt()
to use the sockopt's lock_sock() and capable()"). This patch fixes it by
checking the has_current_bpf_ctx().

  [0] https://lore.kernel.org/bpf/20220727060921.2373314-1-kafai@fb.com/

Fixes: cb388e7ee3 ("bpf: net: Change do_tcp_setsockopt() to use the sockopt's lock_sock() and capable()")
Signed-off-by: Martin KaFai Lau <martin.lau@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220830231946.791504-1-martin.lau@linux.dev
2022-08-31 22:21:45 +02:00
Axel Rasmussen
5a3a599810 selftests: net: sort .gitignore file
This is the result of `sort tools/testing/selftests/net/.gitignore`, but
preserving the comment at the top.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Link: https://lore.kernel.org/r/20220829184748.1535580-1-axelrasmussen@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 12:47:57 -07:00
Randy Dunlap
404a5ad720 Documentation: networking: correct possessive "its"
Change occurrences of "it's" that are possessive to "its"
so that they don't read as "it is".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220829235414.17110-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 12:36:08 -07:00
Heiner Kallweit
8af1a9afe1 net: phy: smsc: use device-managed clock API
Simplify the code by using the device-managed clock API.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/b222be68-ba7e-999d-0a07-eca0ecedf74e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 12:19:18 -07:00
Cong Wang
8fc29ff391 kcm: fix strp_init() order and cleanup
strp_init() is called just a few lines above this csk->sk_user_data
check, it also initializes strp->work etc., therefore, it is
unnecessary to call strp_done() to cancel the freshly initialized
work.

And if sk_user_data is already used by KCM, psock->strp should not be
touched, particularly strp->work state, so we need to move strp_init()
after the csk->sk_user_data check.

This also makes a lockdep warning reported by syzbot go away.

Reported-and-tested-by: syzbot+9fc084a4348493ef65d2@syzkaller.appspotmail.com
Reported-by: syzbot+e696806ef96cdd2d87cd@syzkaller.appspotmail.com
Fixes: e557124023 ("kcm: Check if sk_user_data already set in kcm_attach")
Fixes: dff8baa261 ("kcm: Call strp_stop before strp_done in kcm_attach")
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/r/20220827181314.193710-1-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 12:16:44 -07:00
David Thompson
3a1a274e93 mlxbf_gige: compute MDIO period based on i1clk
This patch adds logic to compute the MDIO period based on
the i1clk, and thereafter write the MDIO period into the YU
MDIO config register. The i1clk resource from the ACPI table
is used to provide addressing to YU bootrecord PLL registers.
The values in these registers are used to compute MDIO period.
If the i1clk resource is not present in the ACPI table, then
the current default hardcorded value of 430Mhz is used.
The i1clk clock value of 430MHz is only accurate for boards
with BF2 mid bin and main bin SoCs. The BF2 high bin SoCs
have i1clk = 500MHz, but can support a slower MDIO period.

Fixes: f92e1869d7 ("Add Mellanox BlueField Gigabit Ethernet driver")
Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com>
Signed-off-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/20220826155916.12491-1-davthompson@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 12:13:46 -07:00
James Hilliard
14e5ce7994 libbpf: Add GCC support for bpf_tail_call_static
The bpf_tail_call_static function is currently not defined unless
using clang >= 8.

To support bpf_tail_call_static on GCC we can check if __clang__ is
not defined to enable bpf_tail_call_static.

We need to use GCC assembly syntax when the compiler does not define
__clang__ as LLVM inline assembly is not fully compatible with GCC.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220829210546.755377-1-james.hilliard1@gmail.com
2022-08-31 20:54:23 +02:00
Linus Torvalds
c5e4d5e991 Merge tag 'fscache-fixes-20220831' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull fscache/cachefiles fixes from David Howells:

 - Fix kdoc on fscache_use/unuse_cookie().

 - Fix the error returned by cachefiles_ondemand_copen() from an upcall
   result.

 - Fix the distribution of requests in on-demand mode in cachefiles to
   be fairer by cycling through them rather than picking the one with
   the lowest ID each time (IDs being reused).

* tag 'fscache-fixes-20220831' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  cachefiles: make on-demand request distribution fairer
  cachefiles: fix error return code in cachefiles_ondemand_copen()
  fscache: fix misdocumented parameter
2022-08-31 10:13:34 -07:00
Linus Torvalds
a1f764268f Merge tag 'for-linus-2022083101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:

 - NULL pointer dereference fix for Steam driver (Lee Jones)

 - memory leak fix for hidraw (Karthik Alapati)

 - regression fix for functionality of some UCLogic tables (Benjamin
   Tissoires)

 - a few new device IDs and device-specific quirks

* tag 'for-linus-2022083101' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: nintendo: fix rumble worker null pointer deref
  HID: intel-ish-hid: ipc: Add Meteor Lake PCI device ID
  HID: input: fix uclogic tablets
  HID: Add Apple Touchbar on T2 Macs in hid_have_special_driver list
  HID: add Lenovo Yoga C630 battery quirk
  HID: AMD_SFH: Add a DMI quirk entry for Chromebooks
  HID: thrustmaster: Add sparco wheel and fix array length
  hid: intel-ish-hid: ishtp: Fix ishtp client sending disordered message
  HID: ishtp-hid-clientHID: ishtp-hid-client: Fix comment typo
  HID: asus: ROG NKey: Ignore portion of 0x5a report
  HID: hidraw: fix memory leak in hidraw_release()
  HID: steam: Prevent NULL pointer dereference in steam_{recv,send}_report
2022-08-31 09:54:14 -07:00
Linus Torvalds
2361d3841f Merge tag 'v6.0-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
 "Fix a boot performance regression due to an unnecessary dependency on
  XOR_BLOCKS"

* tag 'v6.0-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: lib - remove unneeded selection of XOR_BLOCKS
2022-08-31 09:47:06 -07:00
Linus Torvalds
9c9d1896fa Merge tag 'lsm-pr-20220829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM support for IORING_OP_URING_CMD from Paul Moore:
 "Add SELinux and Smack controls to the io_uring IORING_OP_URING_CMD.

  These are necessary as without them the IORING_OP_URING_CMD remains
  outside the purview of the LSMs (Luis' LSM patch, Casey's Smack patch,
  and my SELinux patch). They have been discussed at length with the
  io_uring folks, and Jens has given his thumbs-up on the relevant
  patches (see the commit descriptions).

  There is one patch that is not strictly necessary, but it makes
  testing much easier and is very trivial: the /dev/null
  IORING_OP_URING_CMD patch."

* tag 'lsm-pr-20220829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  Smack: Provide read control for io_uring_cmd
  /dev/null: add IORING_OP_URING_CMD support
  selinux: implement the security_uring_cmd() LSM hook
  lsm,io_uring: add LSM hooks for the new uring_cmd file op
2022-08-31 09:23:16 -07:00
Xin Yin
1122f40072 cachefiles: make on-demand request distribution fairer
For now, enqueuing and dequeuing on-demand requests all start from
idx 0, this makes request distribution unfair. In the weighty
concurrent I/O scenario, the request stored in higher idx will starve.

Searching requests cyclically in cachefiles_ondemand_daemon_read,
makes distribution fairer.

Fixes: c838305450 ("cachefiles: notify the user daemon when looking up cookie")
Reported-by: Yongqing Li <liyongqing@bytedance.com>
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220817065200.11543-1-yinxin.x@bytedance.com/ # v1
Link: https://lore.kernel.org/r/20220825020945.2293-1-yinxin.x@bytedance.com/ # v2
2022-08-31 16:41:10 +01:00
Sun Ke
c93ccd63b1 cachefiles: fix error return code in cachefiles_ondemand_copen()
The cache_size field of copen is specified by the user daemon.
If cache_size < 0, then the OPEN request is expected to fail,
while copen itself shall succeed. However, returning 0 is indeed
unexpected when cache_size is an invalid error code.

Fix this by returning error when cache_size is an invalid error code.

Changes
=======
v4: update the code suggested by Dan
v3: update the commit log suggested by Jingbo.

Fixes: c838305450 ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Sun Ke <sunke32@huawei.com>
Suggested-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20220818111935.1683062-1-sunke32@huawei.com/ # v2
Link: https://lore.kernel.org/r/20220818125038.2247720-1-sunke32@huawei.com/ # v3
Link: https://lore.kernel.org/r/20220826023515.3437469-1-sunke32@huawei.com/ # v4
2022-08-31 16:41:10 +01:00
Khalid Masum
ec1bd37123 fscache: fix misdocumented parameter
This patch fixes two warnings generated by make docs. The functions
fscache_use_cookie and fscache_unuse_cookie, both have a parameter
named cookie. But they are documented with the name "object" with
unclear description. Which generates the warning when creating docs.

This commit will replace the currently misdocumented parameter names
with the correct ones while adding proper descriptions.

CC: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Khalid Masum <khalid.masum.92@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20220521142446.4746-1-khalid.masum.92@gmail.com/ # v1
Link: https://lore.kernel.org/r/20220818040738.12036-1-khalid.masum.92@gmail.com/ # v2
Link: https://lore.kernel.org/r/880d7d25753fb326ee17ac08005952112fcf9bdb.1657360984.git.mchehab@kernel.org/ # Mauro's version
2022-08-31 14:57:28 +01:00
David S. Miller
6edd302a1c Merge branch 'hns3-next'
Guangbin Huang says:

====================
net: hns3: updates for -next

This series includes some updates for the HNS3 ethernet driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-31 14:10:50 +01:00
Guangbin Huang
08aa17a0c1 net: hns3: net: hns3: add querying and setting fec off mode from firmware
For some new devices, the FEC mode can not be set to OFF in speed 200G.
In order to flexibly adapt to all types of devices, driver queries
fec ability from firmware to decide whether OFF mode can be supported.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-31 14:10:49 +01:00
Hao Lan
5c4f72842d net: hns3: add querying and setting fec llrs mode from firmware
This patch supports llrs fec mode in speed 200G for some new devices, and
suppoprts querying llrs fec ability from firmware.

Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-31 14:10:49 +01:00
Guangbin Huang
eaf83ae59e net: hns3: add querying fec ability from firmware
For some new devices, driver can queries fec ability from firmware to
decide which FEC mode can be supported.

If devices of old version which not support querying fec ability, driver
sets fixed ability according to current speed.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-31 14:10:49 +01:00
Guangbin Huang
507e46ae26 net: hns3: add getting capabilities of gro offload and fd from firmware
As some new devices may not support GRO offload and flow table director,
to support these devices, driver needs to querying capabilities of GRO
offload and flow table director from firmware. Whether the driver
supports these two features depends on capabilities.

For old device of version HNAE3_DEVICE_VERSION_V2, driver sets their
capabilities of these two features to fixed value.

Setting default features of netdev and debugfs also need to identify
whether support these two features.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-31 14:10:49 +01:00
David S. Miller
39a7d7261a Merge branch 'thunderbolt-end-to-end-flow-control'
Mika Westerberg says:

====================
thunderbolt: net: Enable full end-to-end flow control

Thunderbolt/USB4 host controllers support full end-to-end flow control
that prevents dropping packets if there are not enough hardware receive
buffers. So far it has not been enabled for the networking driver yet
but this series changes that. There is one snag though: the second
generation (Intel Falcon Ridge) had a bug that needs special quirk to
get it working. We had that in the early stages of the Thunderbolt/USB4
driver but it got dropped because it was not needed at the time. Now we
add it back as a quirk for the host controller (NHI).

The first patch of this series is a bugfix that I'm planning to push for
v6.0-rc. Rest are v6.1 material. This also includes a patch that shows
the XDomain link type in sysfs the same way we do for USB4 routers and
updates the networking driver module description.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-31 14:05:12 +01:00
Mika Westerberg
e550ed4b87 net: thunderbolt: Update module description with mention of USB4
It is Thunderbolt/USB4 now so reflect that in the module description too
to avoid any confusion. No functional changes.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-31 14:05:12 +01:00
Mika Westerberg
8bdc25cf62 net: thunderbolt: Enable full end-to-end flow control
USB4NET protocol allows the networking drivers to take advantage of
end-to-end flow control supported by the USB4 host interface. This
should prevent the receiving side from dropping network packets.

In adddition add a module parameter that can be used to turn this off
just in case it causes problems.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-31 14:05:12 +01:00
Mika Westerberg
54669e2f17 thunderbolt: Add back Intel Falcon Ridge end-to-end flow control workaround
As we are now enabling full end-to-end flow control to the Thunderbolt
networking driver, in order for it to work properly on second generation
Thunderbolt hardware (Falcon Ridge), we need to add back the workaround
that was removed with commit 53f13319d1 ("thunderbolt: Get rid of E2E
workaround"). However, this time we only apply it for Falcon Ridge
controllers as a form of an additional quirk. For non-Falcon Ridge this
does nothing.

While there fix a typo 'reqister' -> 'register' in the comment.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-31 14:05:12 +01:00
Mika Westerberg
f9cad07b84 thunderbolt: Show link type for XDomain connections too
Following what we do for routers already, extend this to XDomain
connections as well. This will show in sysfs whether the link is in USB4
or Thunderbolt mode.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-31 14:05:12 +01:00
Mika Westerberg
ff7cd07f30 net: thunderbolt: Enable DMA paths only after rings are enabled
If the other host starts sending packets early on it is possible that we
are still in the middle of populating the initial Rx ring packets to the
ring. This causes the tbnet_poll() to mess over the queue and causes
list corruption. This happens specifically when connected with macOS as
it seems start sending various IP discovery packets as soon as its side
of the paths are configured.

To prevent this we move the DMA path enabling to happen after we have
primed the Rx ring. This makes sure no incoming packets can arrive
before we are ready to handle them.

Fixes: e69b6c02b4 ("net: Add support for networking over Thunderbolt cable")
Cc: stable@vger.kernel.org
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-31 14:05:11 +01:00
Duoming Zhou
c0955bf957 ethernet: rocker: fix sleep in atomic context bug in neigh_timer_handler
The function neigh_timer_handler() is a timer handler that runs in an
atomic context. When used by rocker, neigh_timer_handler() calls
"kzalloc(.., GFP_KERNEL)" that may sleep. As a result, the sleep in
atomic context bug will happen. One of the processes is shown below:

ofdpa_fib4_add()
 ...
 neigh_add_timer()

(wait a timer)

neigh_timer_handler()
 neigh_release()
  neigh_destroy()
   rocker_port_neigh_destroy()
    rocker_world_port_neigh_destroy()
     ofdpa_port_neigh_destroy()
      ofdpa_port_ipv4_neigh()
       kzalloc(sizeof(.., GFP_KERNEL) //may sleep

This patch changes the gfp_t parameter of kzalloc() from GFP_KERNEL to
GFP_ATOMIC in order to mitigate the bug.

Fixes: 00fc0c51e3 ("rocker: Change world_ops API and implementation to be switchdev independant")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-31 14:01:29 +01:00