Yan-Hsuan has been away since 2021 and Ping has been the de facto maintainer
the past year, actively reviewing patches and doing all other maintainer
duties. So fix the MAINTAINERS file to show the current situation.
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230724104547.3061709-2-kvalo@kernel.org
Kuniyuki Iwashima says:
====================
net: Support STP on bridge in non-root netns.
Currently, STP does not work in non-root netns as llc_rcv() drops
packets from non-root netns.
This series fixes it by making some protocol handlers netns-aware,
which are called from llc_rcv() as follows:
llc_rcv()
|
|- sap->rcv_func : registered by llc_sap_open()
|
| * functions : regsitered by register_8022_client()
| -> No in-kernel user call register_8022_client()
|
| * snap_rcv()
| |
| `- proto->rcvfunc() : registered by register_snap_client()
|
| * aarp_rcv() : drop packets from non-root netns
| * atalk_rcv() : drop packets from non-root netns
|
| * stp_pdu_rcv()
| |
| `- garp_protos[]->rcv() : registered by stp_proto_register()
|
| * garp_pdu_rcv() : netns-aware
| * br_stp_rcv() : netns-aware
|
|- llc_type_handlers[llc_pdu_type(skb) - 1]
|
| * llc_sap_handler() : NOT netns-aware (Patch 1)
| * llc_conn_handler() : NOT netns-aware (Patch 2)
|
`- llc_station_handler
* llc_station_rcv() : netns-aware
Patch 1 & 2 convert not-netns-aware functions and Patch 3 remove the
netns restriction in llc_rcv().
Note this series does not namespacify AF_LLC so that these patches
can be backported to stable without conflicts (at least to 4.14.y).
Another series that adds netns support for AF_LLC will be targeted
to net-next later.
====================
Link: https://lore.kernel.org/r/20230718174152.57408-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This reverts commit 56a16035bb.
Since the previous commit, STP works on bridge in netns.
# unshare -n
# ip link add br0 type bridge
# ip link add veth0 type veth peer name veth1
# ip link set veth0 master br0 up
[ 50.558135] br0: port 1(veth0) entered blocking state
[ 50.558366] br0: port 1(veth0) entered disabled state
[ 50.558798] veth0: entered allmulticast mode
[ 50.564401] veth0: entered promiscuous mode
# ip link set veth1 master br0 up
[ 54.215487] br0: port 2(veth1) entered blocking state
[ 54.215657] br0: port 2(veth1) entered disabled state
[ 54.215848] veth1: entered allmulticast mode
[ 54.219577] veth1: entered promiscuous mode
# ip link set br0 type bridge stp_state 1
# ip link set br0 up
[ 61.960726] br0: port 2(veth1) entered blocking state
[ 61.961097] br0: port 2(veth1) entered listening state
[ 61.961495] br0: port 1(veth0) entered blocking state
[ 61.961653] br0: port 1(veth0) entered listening state
[ 63.998835] br0: port 2(veth1) entered blocking state
[ 77.437113] br0: port 1(veth0) entered learning state
[ 86.653501] br0: received packet on veth0 with own address as source address (addr:6e:0f:e7:6f:5f:5f, vlan:0)
[ 92.797095] br0: port 1(veth0) entered forwarding state
[ 92.797398] br0: topology change detected, propagating
Let's remove the warning.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Now these upper layer protocol handlers can be called from llc_rcv()
as sap->rcv_func(), which is registered by llc_sap_open().
* function which is passed to register_8022_client()
-> no in-kernel user calls register_8022_client().
* snap_rcv()
`- proto->rcvfunc() : registered by register_snap_client()
-> aarp_rcv() and atalk_rcv() drop packets from non-root netns
* stp_pdu_rcv()
`- garp_protos[]->rcv() : registered by stp_proto_register()
-> garp_pdu_rcv() and br_stp_rcv() are netns-aware
So, we can safely remove the netns restriction in llc_rcv().
Fixes: e730c15519 ("[NET]: Make packet reception network namespace safe")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We will remove this restriction in llc_rcv() in the following patch,
which means that the protocol handler must be aware of netns.
if (!net_eq(dev_net(dev), &init_net))
goto drop;
llc_rcv() fetches llc_type_handlers[llc_pdu_type(skb) - 1] and calls it
if not NULL.
If the PDU type is LLC_DEST_CONN, llc_conn_handler() is called to pass
skb to corresponding sockets. Then, we must look up a proper socket in
the same netns with skb->dev.
llc_conn_handler() calls __llc_lookup() to look up a established or
litening socket by __llc_lookup_established() and llc_lookup_listener().
Both functions iterate on a list and call llc_estab_match() or
llc_listener_match() to check if the socket is the correct destination.
However, these functions do not check netns.
Also, bind() and connect() call llc_establish_connection(), which
finally calls __llc_lookup_established(), to check if there is a
conflicting socket.
Let's test netns in llc_estab_match() and llc_listener_match().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We will remove this restriction in llc_rcv() soon, which means that the
protocol handler must be aware of netns.
if (!net_eq(dev_net(dev), &init_net))
goto drop;
llc_rcv() fetches llc_type_handlers[llc_pdu_type(skb) - 1] and calls it
if not NULL.
If the PDU type is LLC_DEST_SAP, llc_sap_handler() is called to pass skb
to corresponding sockets. Then, we must look up a proper socket in the
same netns with skb->dev.
If the destination is a multicast address, llc_sap_handler() calls
llc_sap_mcast(). It calculates a hash based on DSAP and skb->dev->ifindex,
iterates on a socket list, and calls llc_mcast_match() to check if the
socket is the correct destination. Then, llc_mcast_match() checks if
skb->dev matches with llc_sk(sk)->dev. So, we need not check netns here.
OTOH, if the destination is a unicast address, llc_sap_handler() calls
llc_lookup_dgram() to look up a socket, but it does not check the netns.
Therefore, we need to add netns check in llc_lookup_dgram().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This reverts commit 3f4ca5fafc.
Commit 3f4ca5fafc ("tcp: avoid the lookup process failing to get sk in
ehash table") reversed the order in how a socket is inserted into ehash
to fix an issue that ehash-lookup could fail when reqsk/full sk/twsk are
swapped. However, it introduced another lookup failure.
The full socket in ehash is allocated from a slab with SLAB_TYPESAFE_BY_RCU
and does not have SOCK_RCU_FREE, so the socket could be reused even while
it is being referenced on another CPU doing RCU lookup.
Let's say a socket is reused and inserted into the same hash bucket during
lookup. After the blamed commit, a new socket is inserted at the end of
the list. If that happens, we will skip sockets placed after the previous
position of the reused socket, resulting in ehash lookup failure.
As described in Documentation/RCU/rculist_nulls.rst, we should insert a
new socket at the head of the list to avoid such an issue.
This issue, the swap-lookup-failure, and another variant reported in [0]
can all be handled properly by adding a locked ehash lookup suggested by
Eric Dumazet [1].
However, this issue could occur for every packet, thus more likely than
the other two races, so let's revert the change for now.
Link: https://lore.kernel.org/netdev/20230606064306.9192-1-duanmuquan@baidu.com/ [0]
Link: https://lore.kernel.org/netdev/CANn89iK8snOz8TYOhhwfimC7ykYA78GA3Nyv8x06SZYa1nKdyA@mail.gmail.com/ [1]
Fixes: 3f4ca5fafc ("tcp: avoid the lookup process failing to get sk in ehash table")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230717215918.15723-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexei Starovoitov says:
====================
pull-request: bpf 2023-07-19
We've added 4 non-merge commits during the last 1 day(s) which contain
a total of 3 files changed, 55 insertions(+), 10 deletions(-).
The main changes are:
1) Fix stack depth check in presence of async callbacks,
from Kumar Kartikeya Dwivedi.
2) Fix BTI type used for freplace attached functions,
from Alexander Duyck.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, arm64: Fix BTI type used for freplace attached functions
selftests/bpf: Add more tests for check_max_stack_depth bug
bpf: Repeat check_max_stack_depth for async callbacks
bpf: Fix subprog idx logic in check_max_stack_depth
====================
Link: https://lore.kernel.org/r/20230719174502.74023-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
goto free_skb if an unexpected result is returned by pskb_tirm()
in erspan_xmit().
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
goto err_free_skb if an unexpected result is returned by pskb_tirm()
in erspan_fb_xmit().
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ocelot_fdma_receive_skb should return false if an unexpected
value is returned by pskb_trim.
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
in emac_tso_csum(), return an error code if an unexpected value
is returned by pskb_trim().
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
goto tx_err if an unexpected result is returned by pskb_tirm()
in ip6erspan_tunnel_xmit().
Fixes: 5a963eb61b ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
key might contain private part of the key, so better use
kfree_sensitive to free it.
Fixes: 38320c70d2 ("[IPSEC]: Use crypto_aead and authenc in ESP")
Signed-off-by: Wang Ming <machel@vivo.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-07-17 (iavf)
This series contains updates to iavf driver only.
Ding Hui fixes use-after-free issue by calling netif_napi_del() for all
allocated q_vectors. He also resolves out-of-bounds issue by not
updating to new values when timeout is encountered.
Marcin and Ahmed change the way resets are handled so that the callback
operating under the RTNL lock will wait for the reset to finish, the
rtnl_lock sensitive functions in reset flow will schedule the netdev update
for later in order to remove circular dependency with the critical lock.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: fix reset task race with iavf_remove()
iavf: fix a deadlock caused by rtnl and driver's lock circular dependencies
Revert "iavf: Do not restart Tx queues after reset task failure"
Revert "iavf: Detach device during reset task"
iavf: Wait for reset in callbacks which trigger it
iavf: use internal state to free traffic IRQs
iavf: Fix out-of-bounds when setting channels on remove
iavf: Fix use-after-free in free_netdev
====================
Link: https://lore.kernel.org/r/20230717175205.3217774-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hardware generated encryption and ICV tags are found to
be wrong when tested with IEEE MACSEC test vectors.
This is because as per the HRM, the hash key (derived by
AES-ECB block encryption of an all 0s block with the SAK)
has to be programmed by the software in
MCSX_RS_MCS_CPM_TX_SLAVE_SA_PLCY_MEM_4X register.
Hence fix this by generating hash key in software and
configuring in hardware.
Fixes: c54ffc7360 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://lore.kernel.org/r/1689574603-28093-1-git-send-email-sbhatta@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In normal operation, each populated queue item has
next_to_watch pointing to the last TX desc of the packet,
while each cleaned item has it set to 0. In particular,
next_to_use that points to the next (necessarily clean)
item to use has next_to_watch set to 0.
When the TX queue is used both by an application using
AF_XDP with ZEROCOPY as well as a second non-XDP application
generating high traffic, the queue pointers can get in
an invalid state where next_to_use points to an item
where next_to_watch is NOT set to 0.
However, the implementation assumes at several places
that this is never the case, so if it does hold,
bad things happen. In particular, within the loop inside
of igc_clean_tx_irq(), next_to_clean can overtake next_to_use.
Finally, this prevents any further transmission via
this queue and it never gets unblocked or signaled.
Secondly, if the queue is in this garbled state,
the inner loop of igc_clean_tx_ring() will never terminate,
completely hogging a CPU core.
The reason is that igc_xdp_xmit_zc() reads next_to_use
before acquiring the lock, and writing it back
(potentially unmodified) later. If it got modified
before locking, the outdated next_to_use is written
pointing to an item that was already used elsewhere
(and thus next_to_watch got written).
Fixes: 9acf59a752 ("igc: Enable TX via AF_XDP zero-copy")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230717175444.3217831-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marc Kleine-Budde says:
====================
pull-request: can 2023-07-17
The 1st patch is by Ziyang Xuan and fixes a possible memory leak in
the receiver handling in the CAN RAW protocol.
YueHaibing contributes a use after free in bcm_proc_show() of the
Broad Cast Manager (BCM) CAN protocol.
The next 2 patches are by me and fix a possible null pointer
dereference in the RX path of the gs_usb driver with activated
hardware timestamps and the candlelight firmware.
The last patch is by Fedor Ross, Marek Vasut and me and targets the
mcp251xfd driver. The polling timeout of __mcp251xfd_chip_set_mode()
is increased to fix bus joining on busy CAN buses and very low bit
rate.
* tag 'linux-can-fixes-for-6.5-20230717' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: mcp251xfd: __mcp251xfd_chip_set_mode(): increase poll timeout
can: gs_usb: fix time stamp counter initialization
can: gs_usb: gs_can_open(): improve error handling
can: bcm: Fix UAF in bcm_proc_show()
can: raw: fix receiver memory leak
====================
Link: https://lore.kernel.org/r/20230717180938.230816-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When running an freplace attached bpf program on an arm64 system w were
seeing the following issue:
Unhandled 64-bit el1h sync exception on CPU47, ESR 0x0000000036000003 -- BTI
After a bit of work to track it down I determined that what appeared to be
happening is that the 'bti c' at the start of the program was somehow being
reached after a 'br' instruction. Further digging pointed me toward the
fact that the function was attached via freplace. This in turn led me to
build_plt which I believe is invoking the long jump which is triggering
this error.
To resolve it we can replace the 'bti c' with 'bti jc' and add a comment
explaining why this has to be modified as such.
Fixes: b2ad54e153 ("bpf, arm64: Implement bpf_arch_text_poke() for arm64")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/168926677665.316237.9953845318337455525.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Kumar Kartikeya Dwivedi says:
====================
Two more fixes for check_max_stack_depth
I noticed two more bugs while reviewing the code, description and
examples available in the patches.
One leads to incorrect subprog index to be stored in the frame stack
maintained by the function (leading to incorrect tail_call_reachable
marks, among other things).
The other problem is missing exploration pass of other async callbacks
when they are not called from the main prog. Call chains rooted at them
can thus bypass the stack limits (32 call frames * max permitted stack
depth per function).
Changelog:
----------
v1 -> v2
v1: https://lore.kernel.org/bpf/20230713003118.1327943-1-memxor@gmail.com
* Fix commit message for patch 2 (Alexei)
====================
Link: https://lore.kernel.org/r/20230717161530.1238-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Another test which now exercies the path of the verifier where it will
explore call chains rooted at the async callback. Without the prior
fixes, this program loads successfully, which is incorrect.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230717161530.1238-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
While the check_max_stack_depth function explores call chains emanating
from the main prog, which is typically enough to cover all possible call
chains, it doesn't explore those rooted at async callbacks unless the
async callback will have been directly called, since unlike non-async
callbacks it skips their instruction exploration as they don't
contribute to stack depth.
It could be the case that the async callback leads to a callchain which
exceeds the stack depth, but this is never reachable while only
exploring the entry point from main subprog. Hence, repeat the check for
the main subprog *and* all async callbacks marked by the symbolic
execution pass of the verifier, as execution of the program may begin at
any of them.
Consider functions with following stack depths:
main: 256
async: 256
foo: 256
main:
rX = async
bpf_timer_set_callback(...)
async:
foo()
Here, async is not descended as it does not contribute to stack depth of
main (since it is referenced using bpf_pseudo_func and not
bpf_pseudo_call). However, when async is invoked asynchronously, it will
end up breaching the MAX_BPF_STACK limit by calling foo.
Hence, in addition to main, we also need to explore call chains
beginning at all async callback subprogs in a program.
Fixes: 7ddc80a476 ("bpf: Teach stack depth check about async callbacks.")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230717161530.1238-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The assignment to idx in check_max_stack_depth happens once we see a
bpf_pseudo_call or bpf_pseudo_func. This is not an issue as the rest of
the code performs a few checks and then pushes the frame to the frame
stack, except the case of async callbacks. If the async callback case
causes the loop iteration to be skipped, the idx assignment will be
incorrect on the next iteration of the loop. The value stored in the
frame stack (as the subprogno of the current subprog) will be incorrect.
This leads to incorrect checks and incorrect tail_call_reachable
marking. Save the target subprog in a new variable and only assign to
idx once we are done with the is_async_cb check which may skip pushing
of frame to the frame stack and subsequent stack depth checks and tail
call markings.
Fixes: 7ddc80a476 ("bpf: Teach stack depth check about async callbacks.")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230717161530.1238-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>