mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2026-04-15 13:02:46 -04:00
1db080cbdbab28752bbb1c86d64daf96253a5da1
73122 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
1db080cbdb |
can: j1939: recvmsg(): allow MSG_CMSG_COMPAT flag
The control message provided by J1939 support MSG_CMSG_COMPAT but
blocked recvmsg() syscalls that have set this flag, i.e. on 32bit user
space on 64 bit kernels.
Link: https://github.com/hartkopp/can-isotp/issues/59
Cc: Oleksij Rempel <o.rempel@pengutronix.de>
Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Fixes:
|
||
|
|
db2773d65b |
can: isotp: recvmsg(): allow MSG_CMSG_COMPAT flag
The control message provided by isotp support MSG_CMSG_COMPAT but
blocked recvmsg() syscalls that have set this flag, i.e. on 32bit user
space on 64 bit kernels.
Link: https://github.com/hartkopp/can-isotp/issues/59
Cc: Oleksij Rempel <o.rempel@pengutronix.de>
Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Fixes:
|
||
|
|
35a089b5d7 |
tipc: check the bearer min mtu properly when setting it by netlink
Checking the bearer min mtu with tipc_udp_mtu_bad() only works for
IPv4 UDP bearer, and IPv6 UDP bearer has a different value for the
min mtu. This patch checks with encap_hlen + TIPC_MIN_BEARER_MTU
for min mtu, which works for both IPv4 and IPv6 UDP bearer.
Note that tipc_udp_mtu_bad() is still used to check media min mtu
in __tipc_nl_media_set(), as m->mtu currently is only used by the
IPv4 UDP bearer as its default mtu value.
Fixes:
|
||
|
|
56077b56cd |
tipc: do not update mtu if msg_max is too small in mtu negotiation
When doing link mtu negotiation, a malicious peer may send Activate msg
with a very small mtu, e.g. 4 in Shuang's testing, without checking for
the minimum mtu, l->mtu will be set to 4 in tipc_link_proto_rcv(), then
n->links[bearer_id].mtu is set to 4294967228, which is a overflow of
'4 - INT_H_SIZE - EMSG_OVERHEAD' in tipc_link_mss().
With tipc_link.mtu = 4, tipc_link_xmit() kept printing the warning:
tipc: Too large msg, purging xmit list 1 5 0 40 4!
tipc: Too large msg, purging xmit list 1 15 0 60 4!
And with tipc_link_entry.mtu 4294967228, a huge skb was allocated in
named_distribute(), and when purging it in tipc_link_xmit(), a crash
was even caused:
general protection fault, probably for non-canonical address 0x2100001011000dd: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 6.3.0.neta #19
RIP: 0010:kfree_skb_list_reason+0x7e/0x1f0
Call Trace:
<IRQ>
skb_release_data+0xf9/0x1d0
kfree_skb_reason+0x40/0x100
tipc_link_xmit+0x57a/0x740 [tipc]
tipc_node_xmit+0x16c/0x5c0 [tipc]
tipc_named_node_up+0x27f/0x2c0 [tipc]
tipc_node_write_unlock+0x149/0x170 [tipc]
tipc_rcv+0x608/0x740 [tipc]
tipc_udp_recv+0xdc/0x1f0 [tipc]
udp_queue_rcv_one_skb+0x33e/0x620
udp_unicast_rcv_skb.isra.72+0x75/0x90
__udp4_lib_rcv+0x56d/0xc20
ip_protocol_deliver_rcu+0x100/0x2d0
This patch fixes it by checking the new mtu against tipc_bearer_min_mtu(),
and not updating mtu if it is too small.
Fixes:
|
||
|
|
3ae6d66b60 |
tipc: add tipc_bearer_min_mtu to calculate min mtu
As different media may requires different min mtu, and even the same media with different net family requires different min mtu, add tipc_bearer_min_mtu() to calculate min mtu accordingly. This API will be used to check the new mtu when doing the link mtu negotiation in the next patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
c83b49383b |
net: nsh: Use correct mac_offset to unwind gso skb in nsh_gso_segment()
As the call trace shows, skb_panic was caused by wrong skb->mac_header
in nsh_gso_segment():
invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 3 PID: 2737 Comm: syz Not tainted 6.3.0-next-20230505 #1
RIP: 0010:skb_panic+0xda/0xe0
call Trace:
skb_push+0x91/0xa0
nsh_gso_segment+0x4f3/0x570
skb_mac_gso_segment+0x19e/0x270
__skb_gso_segment+0x1e8/0x3c0
validate_xmit_skb+0x452/0x890
validate_xmit_skb_list+0x99/0xd0
sch_direct_xmit+0x294/0x7c0
__dev_queue_xmit+0x16f0/0x1d70
packet_xmit+0x185/0x210
packet_snd+0xc15/0x1170
packet_sendmsg+0x7b/0xa0
sock_sendmsg+0x14f/0x160
The root cause is:
nsh_gso_segment() use skb->network_header - nhoff to reset mac_header
in skb_gso_error_unwind() if inner-layer protocol gso fails.
However, skb->network_header may be reset by inner-layer protocol
gso function e.g. mpls_gso_segment. skb->mac_header reset by the
inaccurate network_header will be larger than skb headroom.
nsh_gso_segment
nhoff = skb->network_header - skb->mac_header;
__skb_pull(skb,nsh_len)
skb_mac_gso_segment
mpls_gso_segment
skb_reset_network_header(skb);//skb->network_header+=nsh_len
return -EINVAL;
skb_gso_error_unwind
skb_push(skb, nsh_len);
skb->mac_header = skb->network_header - nhoff;
// skb->mac_header > skb->headroom, cause skb_push panic
Use correct mac_offset to restore mac_header and get rid of nhoff.
Fixes:
|
||
|
|
d80fc101d2 |
erspan: get the proto with the md version for collect_md
In commit |
||
|
|
1e306ec49a |
tcp: fix possible sk_priority leak in tcp_v4_send_reset()
When tcp_v4_send_reset() is called with @sk == NULL,
we do not change ctl_sk->sk_priority, which could have been
set from a prior invocation.
Change tcp_v4_send_reset() to set sk_priority and sk_mark
fields before calling ip_send_unicast_reply().
This means tcp_v4_send_reset() and tcp_v4_send_ack()
no longer have to clear ctl_sk->sk_mark after
their call to ip_send_unicast_reply().
Fixes:
|
||
|
|
6d4486efe9 |
vsock: avoid to close connected socket after the timeout
When client and server establish a connection through vsock,
the client send a request to the server to initiate the connection,
then start a timer to wait for the server's response. When the server's
RESPONSE message arrives, the timer also times out and exits. The
server's RESPONSE message is processed first, and the connection is
established. However, the client's timer also times out, the original
processing logic of the client is to directly set the state of this vsock
to CLOSE and return ETIMEDOUT. It will not notify the server when the port
is released, causing the server port remain.
when client's vsock_connect timeout,it should check sk state is
ESTABLISHED or not. if sk state is ESTABLISHED, it means the connection
is established, the client should not set the sk state to CLOSE
Note: I encountered this issue on kernel-4.18, which can be fixed by
this patch. Then I checked the latest code in the community
and found similar issue.
Fixes:
|
||
|
|
ef1148d448 |
ipv6: remove nexthop_fib6_nh_bh()
After blamed commit, nexthop_fib6_nh_bh() and nexthop_fib6_nh()
are the same.
Delete nexthop_fib6_nh_bh(), and convert /proc/net/ipv6_route
to standard rcu to avoid this splat:
[ 5723.180080] WARNING: suspicious RCU usage
[ 5723.180083] -----------------------------
[ 5723.180084] include/net/nexthop.h:516 suspicious rcu_dereference_check() usage!
[ 5723.180086]
other info that might help us debug this:
[ 5723.180087]
rcu_scheduler_active = 2, debug_locks = 1
[ 5723.180089] 2 locks held by cat/55856:
[ 5723.180091] #0: ffff9440a582afa8 (&p->lock){+.+.}-{3:3}, at: seq_read_iter (fs/seq_file.c:188)
[ 5723.180100] #1: ffffffffaac07040 (rcu_read_lock_bh){....}-{1:2}, at: rcu_lock_acquire (include/linux/rcupdate.h:326)
[ 5723.180109]
stack backtrace:
[ 5723.180111] CPU: 14 PID: 55856 Comm: cat Tainted: G S I 6.3.0-dbx-DEV #528
[ 5723.180115] Call Trace:
[ 5723.180117] <TASK>
[ 5723.180119] dump_stack_lvl (lib/dump_stack.c:107)
[ 5723.180124] dump_stack (lib/dump_stack.c:114)
[ 5723.180126] lockdep_rcu_suspicious (include/linux/context_tracking.h:122)
[ 5723.180132] ipv6_route_seq_show (include/net/nexthop.h:?)
[ 5723.180135] ? ipv6_route_seq_next (net/ipv6/ip6_fib.c:2605)
[ 5723.180140] seq_read_iter (fs/seq_file.c:272)
[ 5723.180145] seq_read (fs/seq_file.c:163)
[ 5723.180151] proc_reg_read (fs/proc/inode.c:316 fs/proc/inode.c:328)
[ 5723.180155] vfs_read (fs/read_write.c:468)
[ 5723.180160] ? up_read (kernel/locking/rwsem.c:1617)
[ 5723.180164] ksys_read (fs/read_write.c:613)
[ 5723.180168] __x64_sys_read (fs/read_write.c:621)
[ 5723.180170] do_syscall_64 (arch/x86/entry/common.c:?)
[ 5723.180174] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 5723.180177] RIP: 0033:0x7fa455677d2a
Fixes:
|
||
|
|
e93c9378e3 |
devlink: change per-devlink netdev notifier to static one
The commit |
||
|
|
cceac92678 |
Merge tag 'nf-23-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says: ==================== Netfilter updates for net The following patchset contains Netfilter fixes for net: 1) Fix UAF when releasing netnamespace, from Florian Westphal. 2) Fix possible BUG_ON when nf_conntrack is enabled with enable_hooks, from Florian Westphal. 3) Fixes for nft_flowtable.sh selftest, from Boris Sukholitko. 4) Extend nft_flowtable.sh selftest to cover integration with ingress/egress hooks, from Florian Westphal. * tag 'nf-23-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: selftests: nft_flowtable.sh: check ingress/egress chain too selftests: nft_flowtable.sh: monitor result file sizes selftests: nft_flowtable.sh: wait for specific nc pids selftests: nft_flowtable.sh: no need for ps -x option selftests: nft_flowtable.sh: use /proc for pid checking netfilter: conntrack: fix possible bug_on with enable_hooks=1 netfilter: nf_tables: always release netdev hooks from notifier ==================== Link: https://lore.kernel.org/r/20230510083313.152961-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
e1d09c2c2f |
af_unix: Fix data races around sk->sk_shutdown.
KCSAN found a data race around sk->sk_shutdown where unix_release_sock() and unix_shutdown() update it under unix_state_lock(), OTOH unix_poll() and unix_dgram_poll() read it locklessly. We need to annotate the writes and reads with WRITE_ONCE() and READ_ONCE(). BUG: KCSAN: data-race in unix_poll / unix_release_sock write to 0xffff88800d0f8aec of 1 bytes by task 264 on cpu 0: unix_release_sock+0x75c/0x910 net/unix/af_unix.c:631 unix_release+0x59/0x80 net/unix/af_unix.c:1042 __sock_release+0x7d/0x170 net/socket.c:653 sock_close+0x19/0x30 net/socket.c:1397 __fput+0x179/0x5e0 fs/file_table.c:321 ____fput+0x15/0x20 fs/file_table.c:349 task_work_run+0x116/0x1a0 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297 do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x72/0xdc read to 0xffff88800d0f8aec of 1 bytes by task 222 on cpu 1: unix_poll+0xa3/0x2a0 net/unix/af_unix.c:3170 sock_poll+0xcf/0x2b0 net/socket.c:1385 vfs_poll include/linux/poll.h:88 [inline] ep_item_poll.isra.0+0x78/0xc0 fs/eventpoll.c:855 ep_send_events fs/eventpoll.c:1694 [inline] ep_poll fs/eventpoll.c:1823 [inline] do_epoll_wait+0x6c4/0xea0 fs/eventpoll.c:2258 __do_sys_epoll_wait fs/eventpoll.c:2270 [inline] __se_sys_epoll_wait fs/eventpoll.c:2265 [inline] __x64_sys_epoll_wait+0xcc/0x190 fs/eventpoll.c:2265 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc value changed: 0x00 -> 0x03 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 222 Comm: dbus-broker Not tainted 6.3.0-rc7-02330-gca6270c12e20 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Fixes: |
||
|
|
679ed006d4 |
af_unix: Fix a data race of sk->sk_receive_queue->qlen.
KCSAN found a data race of sk->sk_receive_queue->qlen where recvmsg()
updates qlen under the queue lock and sendmsg() checks qlen under
unix_state_sock(), not the queue lock, so the reader side needs
READ_ONCE().
BUG: KCSAN: data-race in __skb_try_recv_from_queue / unix_wait_for_peer
write (marked) to 0xffff888019fe7c68 of 4 bytes by task 49792 on cpu 0:
__skb_unlink include/linux/skbuff.h:2347 [inline]
__skb_try_recv_from_queue+0x3de/0x470 net/core/datagram.c:197
__skb_try_recv_datagram+0xf7/0x390 net/core/datagram.c:263
__unix_dgram_recvmsg+0x109/0x8a0 net/unix/af_unix.c:2452
unix_dgram_recvmsg+0x94/0xa0 net/unix/af_unix.c:2549
sock_recvmsg_nosec net/socket.c:1019 [inline]
____sys_recvmsg+0x3a3/0x3b0 net/socket.c:2720
___sys_recvmsg+0xc8/0x150 net/socket.c:2764
do_recvmmsg+0x182/0x560 net/socket.c:2858
__sys_recvmmsg net/socket.c:2937 [inline]
__do_sys_recvmmsg net/socket.c:2960 [inline]
__se_sys_recvmmsg net/socket.c:2953 [inline]
__x64_sys_recvmmsg+0x153/0x170 net/socket.c:2953
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
read to 0xffff888019fe7c68 of 4 bytes by task 49793 on cpu 1:
skb_queue_len include/linux/skbuff.h:2127 [inline]
unix_recvq_full net/unix/af_unix.c:229 [inline]
unix_wait_for_peer+0x154/0x1a0 net/unix/af_unix.c:1445
unix_dgram_sendmsg+0x13bc/0x14b0 net/unix/af_unix.c:2048
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x148/0x160 net/socket.c:747
____sys_sendmsg+0x20e/0x620 net/socket.c:2503
___sys_sendmsg+0xc6/0x140 net/socket.c:2557
__sys_sendmmsg+0x11d/0x370 net/socket.c:2643
__do_sys_sendmmsg net/socket.c:2672 [inline]
__se_sys_sendmmsg net/socket.c:2669 [inline]
__x64_sys_sendmmsg+0x58/0x70 net/socket.c:2669
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
value changed: 0x0000000b -> 0x00000001
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 49793 Comm: syz-executor.0 Not tainted 6.3.0-rc7-02330-gca6270c12e20 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Fixes:
|
||
|
|
5bca1d081f |
net: datagram: fix data-races in datagram_poll()
datagram_poll() runs locklessly, we should add READ_ONCE()
annotations while reading sk->sk_err, sk->sk_shutdown and sk->sk_state.
Fixes:
|
||
|
|
e14cadfd80 |
tcp: add annotations around sk->sk_shutdown accesses
Now sk->sk_shutdown is no longer a bitfield, we can add
standard READ_ONCE()/WRITE_ONCE() annotations to silence
KCSAN reports like the following:
BUG: KCSAN: data-race in tcp_disconnect / tcp_poll
write to 0xffff88814588582c of 1 bytes by task 3404 on cpu 1:
tcp_disconnect+0x4d6/0xdb0 net/ipv4/tcp.c:3121
__inet_stream_connect+0x5dd/0x6e0 net/ipv4/af_inet.c:715
inet_stream_connect+0x48/0x70 net/ipv4/af_inet.c:727
__sys_connect_file net/socket.c:2001 [inline]
__sys_connect+0x19b/0x1b0 net/socket.c:2018
__do_sys_connect net/socket.c:2028 [inline]
__se_sys_connect net/socket.c:2025 [inline]
__x64_sys_connect+0x41/0x50 net/socket.c:2025
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff88814588582c of 1 bytes by task 3374 on cpu 0:
tcp_poll+0x2e6/0x7d0 net/ipv4/tcp.c:562
sock_poll+0x253/0x270 net/socket.c:1383
vfs_poll include/linux/poll.h:88 [inline]
io_poll_check_events io_uring/poll.c:281 [inline]
io_poll_task_func+0x15a/0x820 io_uring/poll.c:333
handle_tw_list io_uring/io_uring.c:1184 [inline]
tctx_task_work+0x1fe/0x4d0 io_uring/io_uring.c:1246
task_work_run+0x123/0x160 kernel/task_work.c:179
get_signal+0xe64/0xff0 kernel/signal.c:2635
arch_do_signal_or_restart+0x89/0x2a0 arch/x86/kernel/signal.c:306
exit_to_user_mode_loop+0x6f/0xe0 kernel/entry/common.c:168
exit_to_user_mode_prepare+0x6c/0xb0 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x26/0x140 kernel/entry/common.c:297
do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x03 -> 0x00
Fixes:
|
||
|
|
4063384ef7 |
net: add vlan_get_protocol_and_depth() helper
Before blamed commit, pskb_may_pull() was used instead of skb_header_pointer() in __vlan_get_protocol() and friends. Few callers depended on skb->head being populated with MAC header, syzbot caught one of them (skb_mac_gso_segment()) Add vlan_get_protocol_and_depth() to make the intent clearer and use it where sensible. This is a more generic fix than commit |
||
|
|
d0ac89f6f9 |
net: deal with most data-races in sk_wait_event()
__condition is evaluated twice in sk_wait_event() macro.
First invocation is lockless, and reads can race with writes,
as spotted by syzbot.
BUG: KCSAN: data-race in sk_stream_wait_connect / tcp_disconnect
write to 0xffff88812d83d6a0 of 4 bytes by task 9065 on cpu 1:
tcp_disconnect+0x2cd/0xdb0
inet_shutdown+0x19e/0x1f0 net/ipv4/af_inet.c:911
__sys_shutdown_sock net/socket.c:2343 [inline]
__sys_shutdown net/socket.c:2355 [inline]
__do_sys_shutdown net/socket.c:2363 [inline]
__se_sys_shutdown+0xf8/0x140 net/socket.c:2361
__x64_sys_shutdown+0x31/0x40 net/socket.c:2361
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff88812d83d6a0 of 4 bytes by task 9040 on cpu 0:
sk_stream_wait_connect+0x1de/0x3a0 net/core/stream.c:75
tcp_sendmsg_locked+0x2e4/0x2120 net/ipv4/tcp.c:1266
tcp_sendmsg+0x30/0x50 net/ipv4/tcp.c:1484
inet6_sendmsg+0x63/0x80 net/ipv6/af_inet6.c:651
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
__sys_sendto+0x246/0x300 net/socket.c:2142
__do_sys_sendto net/socket.c:2154 [inline]
__se_sys_sendto net/socket.c:2150 [inline]
__x64_sys_sendto+0x78/0x90 net/socket.c:2150
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x00000000 -> 0x00000068
Fixes:
|
||
|
|
e05a5f510f |
net: annotate sk->sk_err write from do_recvmmsg()
do_recvmmsg() can write to sk->sk_err from multiple threads.
As said before, many other points reading or writing sk_err
need annotations.
Fixes:
|
||
|
|
a939d14919 |
netlink: annotate accesses to nlk->cb_running
Both netlink_recvmsg() and netlink_native_seq_show() read
nlk->cb_running locklessly. Use READ_ONCE() there.
Add corresponding WRITE_ONCE() to netlink_dump() and
__netlink_dump_start()
syzbot reported:
BUG: KCSAN: data-race in __netlink_dump_start / netlink_recvmsg
write to 0xffff88813ea4db59 of 1 bytes by task 28219 on cpu 0:
__netlink_dump_start+0x3af/0x4d0 net/netlink/af_netlink.c:2399
netlink_dump_start include/linux/netlink.h:308 [inline]
rtnetlink_rcv_msg+0x70f/0x8c0 net/core/rtnetlink.c:6130
netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2577
rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6192
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
sock_write_iter+0x1aa/0x230 net/socket.c:1138
call_write_iter include/linux/fs.h:1851 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x463/0x760 fs/read_write.c:584
ksys_write+0xeb/0x1a0 fs/read_write.c:637
__do_sys_write fs/read_write.c:649 [inline]
__se_sys_write fs/read_write.c:646 [inline]
__x64_sys_write+0x42/0x50 fs/read_write.c:646
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff88813ea4db59 of 1 bytes by task 28222 on cpu 1:
netlink_recvmsg+0x3b4/0x730 net/netlink/af_netlink.c:2022
sock_recvmsg_nosec+0x4c/0x80 net/socket.c:1017
____sys_recvmsg+0x2db/0x310 net/socket.c:2718
___sys_recvmsg net/socket.c:2762 [inline]
do_recvmmsg+0x2e5/0x710 net/socket.c:2856
__sys_recvmmsg net/socket.c:2935 [inline]
__do_sys_recvmmsg net/socket.c:2958 [inline]
__se_sys_recvmmsg net/socket.c:2951 [inline]
__x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x00 -> 0x01
Fixes:
|
||
|
|
e72eeab542 |
netfilter: conntrack: fix possible bug_on with enable_hooks=1
I received a bug report (no reproducer so far) where we trip over
712 rcu_read_lock();
713 ct_hook = rcu_dereference(nf_ct_hook);
714 BUG_ON(ct_hook == NULL); // here
In nf_conntrack_destroy().
First turn this BUG_ON into a WARN. I think it was triggered
via enable_hooks=1 flag.
When this flag is turned on, the conntrack hooks are registered
before nf_ct_hook pointer gets assigned.
This opens a short window where packets enter the conntrack machinery,
can have skb->_nfct set up and a subsequent kfree_skb might occur
before nf_ct_hook is set.
Call nf_conntrack_init_end() to set nf_ct_hook before we register the
pernet ops.
Fixes:
|
||
|
|
dc1c9fd4a8 |
netfilter: nf_tables: always release netdev hooks from notifier
This reverts "netfilter: nf_tables: skip netdev events generated on netns removal".
The problem is that when a veth device is released, the veth release
callback will also queue the peer netns device for removal.
Its possible that the peer netns is also slated for removal. In this
case, the device memory is already released before the pre_exit hook of
the peer netns runs:
BUG: KASAN: slab-use-after-free in nf_hook_entry_head+0x1b8/0x1d0
Read of size 8 at addr ffff88812c0124f0 by task kworker/u8:1/45
Workqueue: netns cleanup_net
Call Trace:
nf_hook_entry_head+0x1b8/0x1d0
__nf_unregister_net_hook+0x76/0x510
nft_netdev_unregister_hooks+0xa0/0x220
__nft_release_hook+0x184/0x490
nf_tables_pre_exit_net+0x12f/0x1b0
..
Order is:
1. First netns is released, veth_dellink() queues peer netns device
for removal
2. peer netns is queued for removal
3. peer netns device is released, unreg event is triggered
4. unreg event is ignored because netns is going down
5. pre_exit hook calls nft_netdev_unregister_hooks but device memory
might be free'd already.
Fixes:
|
||
|
|
424f8416bb |
net: skb_partial_csum_set() fix against transport header magic value
skb->transport_header uses the special 0xFFFF value
to mark if the transport header was set or not.
We must prevent callers to accidentaly set skb->transport_header
to 0xFFFF. Note that only fuzzers can possibly do this today.
syzbot reported:
WARNING: CPU: 0 PID: 2340 at include/linux/skbuff.h:2847 skb_transport_offset include/linux/skbuff.h:2956 [inline]
WARNING: CPU: 0 PID: 2340 at include/linux/skbuff.h:2847 virtio_net_hdr_to_skb+0xbcc/0x10c0 include/linux/virtio_net.h:103
Modules linked in:
CPU: 0 PID: 2340 Comm: syz-executor.0 Not tainted 6.3.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/14/2023
RIP: 0010:skb_transport_header include/linux/skbuff.h:2847 [inline]
RIP: 0010:skb_transport_offset include/linux/skbuff.h:2956 [inline]
RIP: 0010:virtio_net_hdr_to_skb+0xbcc/0x10c0 include/linux/virtio_net.h:103
Code: 41 39 df 0f 82 c3 04 00 00 48 8b 7c 24 10 44 89 e6 e8 08 6e 59 ff 48 85 c0 74 54 e8 ce 36 7e fc e9 37 f8 ff ff e8 c4 36 7e fc <0f> 0b e9 93 f8 ff ff 44 89 f7 44 89 e6 e8 32 38 7e fc 45 39 e6 0f
RSP: 0018:ffffc90004497880 EFLAGS: 00010293
RAX: ffffffff84fea55c RBX: 000000000000ffff RCX: ffff888120be2100
RDX: 0000000000000000 RSI: 000000000000ffff RDI: 000000000000ffff
RBP: ffffc90004497990 R08: ffffffff84fe9de5 R09: 0000000000000034
R10: ffffea00048ebd80 R11: 0000000000000034 R12: ffff88811dc2d9c8
R13: dffffc0000000000 R14: ffff88811dc2d9ae R15: 1ffff11023b85b35
FS: 00007f9211a59700(0000) GS:ffff8881f6c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000200002c0 CR3: 00000001215a5000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
packet_snd net/packet/af_packet.c:3076 [inline]
packet_sendmsg+0x4590/0x61a0 net/packet/af_packet.c:3115
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
__sys_sendto+0x472/0x630 net/socket.c:2144
__do_sys_sendto net/socket.c:2156 [inline]
__se_sys_sendto net/socket.c:2152 [inline]
__x64_sys_sendto+0xe5/0x100 net/socket.c:2152
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2f/0x50 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f9210c8c169
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f9211a59168 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f9210dabf80 RCX: 00007f9210c8c169
RDX: 000000000000ffed RSI: 00000000200000c0 RDI: 0000000000000003
RBP: 00007f9210ce7ca1 R08: 0000000020000540 R09: 0000000000000014
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe135d65cf R14: 00007f9211a59300 R15: 0000000000022000
Fixes:
|
||
|
|
ed23734c23 |
Merge tag 'net-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter.
Current release - regressions:
- sched: act_pedit: free pedit keys on bail from offset check
Current release - new code bugs:
- pds_core:
- Kconfig fixes (DEBUGFS and AUXILIARY_BUS)
- fix mutex double unlock in error path
Previous releases - regressions:
- sched: cls_api: remove block_cb from driver_list before freeing
- nf_tables: fix ct untracked match breakage
- eth: mtk_eth_soc: drop generic vlan rx offload
- sched: flower: fix error handler on replace
Previous releases - always broken:
- tcp: fix skb_copy_ubufs() vs BIG TCP
- ipv6: fix skb hash for some RST packets
- af_packet: don't send zero-byte data in packet_sendmsg_spkt()
- rxrpc: timeout handling fixes after moving client call connection
to the I/O thread
- ixgbe: fix panic during XDP_TX with > 64 CPUs
- igc: RMW the SRRCTL register to prevent losing timestamp config
- dsa: mt7530: fix corrupt frames using TRGMII on 40 MHz XTAL MT7621
- r8152:
- fix flow control issue of RTL8156A
- fix the poor throughput for 2.5G devices
- move setting r8153b_rx_agg_chg_indicate() to fix coalescing
- enable autosuspend
- ncsi: clear Tx enable mode when handling a Config required AEN
- octeontx2-pf: macsec: fixes for CN10KB ASIC rev
Misc:
- 9p: remove INET dependency"
* tag 'net-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
net: bcmgenet: Remove phy_stop() from bcmgenet_netif_stop()
pds_core: fix mutex double unlock in error path
net/sched: flower: fix error handler on replace
Revert "net/sched: flower: Fix wrong handle assignment during filter change"
net/sched: flower: fix filter idr initialization
net: fec: correct the counting of XDP sent frames
bonding: add xdp_features support
net: enetc: check the index of the SFI rather than the handle
sfc: Add back mailing list
virtio_net: suppress cpu stall when free_unused_bufs
ice: block LAN in case of VF to VF offload
net: dsa: mt7530: fix network connectivity with multiple CPU ports
net: dsa: mt7530: fix corrupt frames using trgmii on 40 MHz XTAL MT7621
9p: Remove INET dependency
netfilter: nf_tables: fix ct untracked match breakage
af_packet: Don't send zero-byte data in packet_sendmsg_spkt().
igc: read before write to SRRCTL register
pds_core: add AUXILIARY_BUS and NET_DEVLINK to Kconfig
pds_core: remove CONFIG_DEBUG_FS from makefile
ionic: catch failure from devlink_alloc
...
|
||
|
|
644bca1d48 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
There's a fix which landed in net-next, pull it in along with the couple of minor cleanups. Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
fd741f0d9f |
net/sched: flower: fix error handler on replace
When replacing a filter (i.e. 'fold' pointer is not NULL) the insertion of
new filter to idr is postponed until later in code since handle is already
provided by the user. However, the error handling code in fl_change()
always assumes that the new filter had been inserted into idr. If error
handler is reached when replacing existing filter it may remove it from idr
therefore making it unreachable for delete or dump afterwards. Fix the
issue by verifying that 'fold' argument wasn't provided by caller before
calling idr_remove().
Fixes:
|
||
|
|
5110f3ff6d |
Revert "net/sched: flower: Fix wrong handle assignment during filter change"
This reverts commit
|
||
|
|
dd4f6bbfa6 |
net/sched: flower: fix filter idr initialization
The cited commit moved idr initialization too early in fl_change() which
allows concurrent users to access the filter that is still being
initialized and is in inconsistent state, which, in turn, can cause NULL
pointer dereference [0]. Since there is no obvious way to fix the ordering
without reverting the whole cited commit, alternative approach taken to
first insert NULL pointer into idr in order to allocate the handle but
still cause fl_get() to return NULL and prevent concurrent users from
seeing the filter while providing miss-to-action infrastructure with valid
handle id early in fl_change().
[ 152.434728] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN
[ 152.436163] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[ 152.437269] CPU: 4 PID: 3877 Comm: tc Not tainted 6.3.0-rc4+ #5
[ 152.438110] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 152.439644] RIP: 0010:fl_dump_key+0x8b/0x1d10 [cls_flower]
[ 152.440461] Code: 01 f2 02 f2 c7 40 08 04 f2 04 f2 c7 40 0c 04 f3 f3 f3 65 48 8b 04 25 28 00 00 00 48 89 84 24 00 01 00 00 48 89 c8 48 c1 e8 03 <0f> b6 04 10 84 c0 74 08 3c 03 0f 8e 98 19 00 00 8b 13 85 d2 74 57
[ 152.442885] RSP: 0018:ffff88817a28f158 EFLAGS: 00010246
[ 152.443851] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 152.444826] RDX: dffffc0000000000 RSI: ffffffff8500ae80 RDI: ffff88810a987900
[ 152.445791] RBP: ffff888179d88240 R08: ffff888179d8845c R09: ffff888179d88240
[ 152.446780] R10: ffffed102f451e48 R11: 00000000fffffff2 R12: ffff88810a987900
[ 152.447741] R13: ffffffff8500ae80 R14: ffff88810a987900 R15: ffff888149b3c738
[ 152.448756] FS: 00007f5eb2a34800(0000) GS:ffff88881ec00000(0000) knlGS:0000000000000000
[ 152.449888] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 152.450685] CR2: 000000000046ad19 CR3: 000000010b0bd006 CR4: 0000000000370ea0
[ 152.451641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 152.452628] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 152.453588] Call Trace:
[ 152.454032] <TASK>
[ 152.454447] ? netlink_sendmsg+0x7a1/0xcb0
[ 152.455109] ? sock_sendmsg+0xc5/0x190
[ 152.455689] ? ____sys_sendmsg+0x535/0x6b0
[ 152.456320] ? ___sys_sendmsg+0xeb/0x170
[ 152.456916] ? do_syscall_64+0x3d/0x90
[ 152.457529] ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 152.458321] ? ___sys_sendmsg+0xeb/0x170
[ 152.458958] ? __sys_sendmsg+0xb5/0x140
[ 152.459564] ? do_syscall_64+0x3d/0x90
[ 152.460122] ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 152.460852] ? fl_dump_key_options.part.0+0xea0/0xea0 [cls_flower]
[ 152.461710] ? _raw_spin_lock+0x7a/0xd0
[ 152.462299] ? _raw_read_lock_irq+0x30/0x30
[ 152.462924] ? nla_put+0x15e/0x1c0
[ 152.463480] fl_dump+0x228/0x650 [cls_flower]
[ 152.464112] ? fl_tmplt_dump+0x210/0x210 [cls_flower]
[ 152.464854] ? __kmem_cache_alloc_node+0x1a7/0x330
[ 152.465592] ? nla_put+0x15e/0x1c0
[ 152.466160] tcf_fill_node+0x515/0x9a0
[ 152.466766] ? tc_setup_offload_action+0xf0/0xf0
[ 152.467463] ? __alloc_skb+0x13c/0x2a0
[ 152.468067] ? __build_skb_around+0x330/0x330
[ 152.468814] ? fl_get+0x107/0x1a0 [cls_flower]
[ 152.469503] tc_del_tfilter+0x718/0x1330
[ 152.470115] ? is_bpf_text_address+0xa/0x20
[ 152.470765] ? tc_ctl_chain+0xee0/0xee0
[ 152.471335] ? __kernel_text_address+0xe/0x30
[ 152.471948] ? unwind_get_return_address+0x56/0xa0
[ 152.472639] ? __thaw_task+0x150/0x150
[ 152.473218] ? arch_stack_walk+0x98/0xf0
[ 152.473839] ? __stack_depot_save+0x35/0x4c0
[ 152.474501] ? stack_trace_save+0x91/0xc0
[ 152.475119] ? security_capable+0x51/0x90
[ 152.475741] rtnetlink_rcv_msg+0x2c1/0x9d0
[ 152.476387] ? rtnl_calcit.isra.0+0x2b0/0x2b0
[ 152.477042] ? __sys_sendmsg+0xb5/0x140
[ 152.477664] ? do_syscall_64+0x3d/0x90
[ 152.478255] ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 152.479010] ? __stack_depot_save+0x35/0x4c0
[ 152.479679] ? __stack_depot_save+0x35/0x4c0
[ 152.480346] netlink_rcv_skb+0x12c/0x360
[ 152.480929] ? rtnl_calcit.isra.0+0x2b0/0x2b0
[ 152.481517] ? do_syscall_64+0x3d/0x90
[ 152.482061] ? netlink_ack+0x1550/0x1550
[ 152.482612] ? rhashtable_walk_peek+0x170/0x170
[ 152.483262] ? kmem_cache_alloc_node+0x1af/0x390
[ 152.483875] ? _copy_from_iter+0x3d6/0xc70
[ 152.484528] netlink_unicast+0x553/0x790
[ 152.485168] ? netlink_attachskb+0x6a0/0x6a0
[ 152.485848] ? unwind_next_frame+0x11cc/0x1a10
[ 152.486538] ? arch_stack_walk+0x61/0xf0
[ 152.487169] netlink_sendmsg+0x7a1/0xcb0
[ 152.487799] ? netlink_unicast+0x790/0x790
[ 152.488355] ? iovec_from_user.part.0+0x4d/0x220
[ 152.488990] ? _raw_spin_lock+0x7a/0xd0
[ 152.489598] ? netlink_unicast+0x790/0x790
[ 152.490236] sock_sendmsg+0xc5/0x190
[ 152.490796] ____sys_sendmsg+0x535/0x6b0
[ 152.491394] ? import_iovec+0x7/0x10
[ 152.491964] ? kernel_sendmsg+0x30/0x30
[ 152.492561] ? __copy_msghdr+0x3c0/0x3c0
[ 152.493160] ? do_syscall_64+0x3d/0x90
[ 152.493706] ___sys_sendmsg+0xeb/0x170
[ 152.494283] ? may_open_dev+0xd0/0xd0
[ 152.494858] ? copy_msghdr_from_user+0x110/0x110
[ 152.495541] ? __handle_mm_fault+0x2678/0x4ad0
[ 152.496205] ? copy_page_range+0x2360/0x2360
[ 152.496862] ? __fget_light+0x57/0x520
[ 152.497449] ? mas_find+0x1c0/0x1c0
[ 152.498026] ? sockfd_lookup_light+0x1a/0x140
[ 152.498703] __sys_sendmsg+0xb5/0x140
[ 152.499306] ? __sys_sendmsg_sock+0x20/0x20
[ 152.499951] ? do_user_addr_fault+0x369/0xd80
[ 152.500595] do_syscall_64+0x3d/0x90
[ 152.501185] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 152.501917] RIP: 0033:0x7f5eb294f887
[ 152.502494] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[ 152.505008] RSP: 002b:00007ffd2c708f78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 152.506152] RAX: ffffffffffffffda RBX: 00000000642d9472 RCX: 00007f5eb294f887
[ 152.507134] RDX: 0000000000000000 RSI: 00007ffd2c708fe0 RDI: 0000000000000003
[ 152.508113] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[ 152.509119] R10: 00007f5eb2808708 R11: 0000000000000246 R12: 0000000000000001
[ 152.510068] R13: 0000000000000000 R14: 00007ffd2c70d1b8 R15: 0000000000485400
[ 152.511031] </TASK>
[ 152.511444] Modules linked in: cls_flower sch_ingress openvswitch nsh mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter overlay zram zsmalloc fuse [last unloaded: mlx5_core]
[ 152.515720] ---[ end trace 0000000000000000 ]---
Fixes:
|
||
|
|
8e15605be8 |
Merge tag '9p-6.4-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
Pull 9p updates from Eric Van Hensbergen: "This includes a number of patches that didn't quite make the cut last merge window while we addressed some outstanding issues and review comments. It includes some new caching modes for those that only want readahead caches and reworks how we do writeback caching so we are not keeping extra references around which both causes performance problems and uses lots of additional resources on the server. It also includes a new flag to force disabling of xattrs which can also cause major performance issues, particularly if the underlying filesystem on the server doesn't support them. Finally it adds a couple of additional mount options to better support directio and enabling caches when the server doesn't support qid.version. There was one late-breaking bug report that has also been included as its own patch where I forgot to propagate an embarassing bit-logic fix to the various variations of open" * tag '9p-6.4-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: fs/9p: Fix bit operation logic error fs/9p: Rework cache modes and add new options to Documentation fs/9p: remove writeback fid and fix per-file modes fs/9p: Add new mount modes 9p: Add additional debug flags and open modes fs/9p: allow disable of xattr support on mount fs/9p: Remove unnecessary superblock flags fs/9p: Consolidate file operations and add readahead and writeback |
||
|
|
d7385ba137 |
9p: Remove INET dependency
9pfs can run over assorted transports, so it doesn't have an INET dependency. Drop it and remove the includes of linux/inet.h. NET_9P_FD/trans_fd.o builds without INET or UNIX and is usable over plain file descriptors. However, tcp and unix functionality is still built and would generate runtime failures if used. Add imply INET and UNIX to NET_9P_FD, so functionality is enabled by default but can still be explicitly disabled. This allows configuring 9pfs over Xen with INET and UNIX disabled. Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
f057b63bc1 |
netfilter: nf_tables: fix ct untracked match breakage
"ct untracked" no longer works properly due to erroneous NFT_BREAK.
We have to check ctinfo enum first.
Fixes:
|
||
|
|
6a341729fb |
af_packet: Don't send zero-byte data in packet_sendmsg_spkt().
syzkaller reported a warning below [0].
We can reproduce it by sending 0-byte data from the (AF_PACKET,
SOCK_PACKET) socket via some devices whose dev->hard_header_len
is 0.
struct sockaddr_pkt addr = {
.spkt_family = AF_PACKET,
.spkt_device = "tun0",
};
int fd;
fd = socket(AF_PACKET, SOCK_PACKET, 0);
sendto(fd, NULL, 0, 0, (struct sockaddr *)&addr, sizeof(addr));
We have a similar fix for the (AF_PACKET, SOCK_RAW) socket as
commit
|
||
|
|
9ad685dbfe |
ethtool: Fix uninitialized number of lanes
It is not possible to set the number of lanes when setting link modes
using the legacy IOCTL ethtool interface. Since 'struct
ethtool_link_ksettings' is not initialized in this path, drivers receive
an uninitialized number of lanes in 'struct
ethtool_link_ksettings::lanes'.
When this information is later queried from drivers, it results in the
ethtool code making decisions based on uninitialized memory, leading to
the following KMSAN splat [1]. In practice, this most likely only
happens with the tun driver that simply returns whatever it got in the
set operation.
As far as I can tell, this uninitialized memory is not leaked to user
space thanks to the 'ethtool_ops->cap_link_lanes_supported' check in
linkmodes_prepare_data().
Fix by initializing the structure in the IOCTL path. Did not find any
more call sites that pass an uninitialized structure when calling
'ethtool_ops::set_link_ksettings()'.
[1]
BUG: KMSAN: uninit-value in ethnl_update_linkmodes net/ethtool/linkmodes.c:273 [inline]
BUG: KMSAN: uninit-value in ethnl_set_linkmodes+0x190b/0x19d0 net/ethtool/linkmodes.c:333
ethnl_update_linkmodes net/ethtool/linkmodes.c:273 [inline]
ethnl_set_linkmodes+0x190b/0x19d0 net/ethtool/linkmodes.c:333
ethnl_default_set_doit+0x88d/0xde0 net/ethtool/netlink.c:640
genl_family_rcv_msg_doit net/netlink/genetlink.c:968 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x141a/0x14c0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x3f8/0x750 net/netlink/af_netlink.c:2577
genl_rcv+0x40/0x60 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0xf41/0x1270 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x127d/0x1430 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0xa24/0xe40 net/socket.c:2501
___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2555
__sys_sendmsg net/socket.c:2584 [inline]
__do_sys_sendmsg net/socket.c:2593 [inline]
__se_sys_sendmsg net/socket.c:2591 [inline]
__x64_sys_sendmsg+0x36b/0x540 net/socket.c:2591
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Uninit was stored to memory at:
tun_get_link_ksettings+0x37/0x60 drivers/net/tun.c:3544
__ethtool_get_link_ksettings+0x17b/0x260 net/ethtool/ioctl.c:441
ethnl_set_linkmodes+0xee/0x19d0 net/ethtool/linkmodes.c:327
ethnl_default_set_doit+0x88d/0xde0 net/ethtool/netlink.c:640
genl_family_rcv_msg_doit net/netlink/genetlink.c:968 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x141a/0x14c0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x3f8/0x750 net/netlink/af_netlink.c:2577
genl_rcv+0x40/0x60 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0xf41/0x1270 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x127d/0x1430 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0xa24/0xe40 net/socket.c:2501
___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2555
__sys_sendmsg net/socket.c:2584 [inline]
__do_sys_sendmsg net/socket.c:2593 [inline]
__se_sys_sendmsg net/socket.c:2591 [inline]
__x64_sys_sendmsg+0x36b/0x540 net/socket.c:2591
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Uninit was stored to memory at:
tun_set_link_ksettings+0x37/0x60 drivers/net/tun.c:3553
ethtool_set_link_ksettings+0x600/0x690 net/ethtool/ioctl.c:609
__dev_ethtool net/ethtool/ioctl.c:3024 [inline]
dev_ethtool+0x1db9/0x2a70 net/ethtool/ioctl.c:3078
dev_ioctl+0xb07/0x1270 net/core/dev_ioctl.c:524
sock_do_ioctl+0x295/0x540 net/socket.c:1213
sock_ioctl+0x729/0xd90 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl+0x222/0x400 fs/ioctl.c:856
__x64_sys_ioctl+0x96/0xe0 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Local variable link_ksettings created at:
ethtool_set_link_ksettings+0x54/0x690 net/ethtool/ioctl.c:577
__dev_ethtool net/ethtool/ioctl.c:3024 [inline]
dev_ethtool+0x1db9/0x2a70 net/ethtool/ioctl.c:3078
Fixes:
|
||
|
|
c1592a8994 |
netfilter: nf_tables: deactivate anonymous set from preparation phase
Toggle deleted anonymous sets as inactive in the next generation, so users cannot perform any update on it. Clear the generation bitmask in case the transaction is aborted. The following KASAN splat shows a set element deletion for a bound anonymous set that has been already removed in the same transaction. [ 64.921510] ================================================================== [ 64.923123] BUG: KASAN: wild-memory-access in nf_tables_commit+0xa24/0x1490 [nf_tables] [ 64.924745] Write of size 8 at addr dead000000000122 by task test/890 [ 64.927903] CPU: 3 PID: 890 Comm: test Not tainted 6.3.0+ #253 [ 64.931120] Call Trace: [ 64.932699] <TASK> [ 64.934292] dump_stack_lvl+0x33/0x50 [ 64.935908] ? nf_tables_commit+0xa24/0x1490 [nf_tables] [ 64.937551] kasan_report+0xda/0x120 [ 64.939186] ? nf_tables_commit+0xa24/0x1490 [nf_tables] [ 64.940814] nf_tables_commit+0xa24/0x1490 [nf_tables] [ 64.942452] ? __kasan_slab_alloc+0x2d/0x60 [ 64.944070] ? nf_tables_setelem_notify+0x190/0x190 [nf_tables] [ 64.945710] ? kasan_set_track+0x21/0x30 [ 64.947323] nfnetlink_rcv_batch+0x709/0xd90 [nfnetlink] [ 64.948898] ? nfnetlink_rcv_msg+0x480/0x480 [nfnetlink] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> |
||
|
|
8509f62b0b |
netfilter: nf_tables: hit ENOENT on unexisting chain/flowtable update with missing attributes
If user does not specify hook number and priority, then assume this is
a chain/flowtable update. Therefore, report ENOENT which provides a
better hint than EINVAL. Set on extended netlink error report to refer
to the chain name.
Fixes:
|
||
|
|
db099c625b |
rxrpc: Fix timeout of a call that hasn't yet been granted a channel
afs_make_call() calls rxrpc_kernel_begin_call() to begin a call (which may
get stalled in the background waiting for a connection to become
available); it then calls rxrpc_kernel_set_max_life() to set the timeouts -
but that starts the call timer so the call timer might then expire before
we get a connection assigned - leading to the following oops if the call
stalled:
BUG: kernel NULL pointer dereference, address: 0000000000000000
...
CPU: 1 PID: 5111 Comm: krxrpcio/0 Not tainted 6.3.0-rc7-build3+ #701
RIP: 0010:rxrpc_alloc_txbuf+0xc0/0x157
...
Call Trace:
<TASK>
rxrpc_send_ACK+0x50/0x13b
rxrpc_input_call_event+0x16a/0x67d
rxrpc_io_thread+0x1b6/0x45f
? _raw_spin_unlock_irqrestore+0x1f/0x35
? rxrpc_input_packet+0x519/0x519
kthread+0xe7/0xef
? kthread_complete_and_exit+0x1b/0x1b
ret_from_fork+0x22/0x30
Fix this by noting the timeouts in struct rxrpc_call when the call is
created. The timer will be started when the first packet is transmitted.
It shouldn't be possible to trigger this directly from userspace through
AF_RXRPC as sendmsg() will return EBUSY if the call is in the
waiting-for-conn state if it dropped out of the wait due to a signal.
Fixes:
|
||
|
|
0eb362d254 |
rxrpc: Make it so that a waiting process can be aborted
When sendmsg() creates an rxrpc call, it queues it to wait for a connection
and channel to be assigned and then waits before it can start shovelling
data as the encrypted DATA packet content includes a summary of the
connection parameters.
However, sendmsg() may get interrupted before a connection gets assigned
and further sendmsg() calls will fail with EBUSY until an assignment is
made.
Fix this so that the call can at least be aborted without failing on
EBUSY. We have to be careful here as sendmsg() mustn't be allowed to start
the call timer if the call doesn't yet have a connection assigned as an
oops may follow shortly thereafter.
Fixes:
|
||
|
|
0d098d83c5 |
rxrpc: Fix hard call timeout units
The hard call timeout is specified in the RXRPC_SET_CALL_TIMEOUT cmsg in
seconds, so fix the point at which sendmsg() applies it to the call to
convert to jiffies from seconds, not milliseconds.
Fixes:
|
||
|
|
526f28bd0f |
net/sched: act_mirred: Add carrier check
There are cases where the device is adminstratively UP, but operationally
down. For example, we have a physical device (Nvidia ConnectX-6 Dx, 25Gbps)
who's cable was pulled out, here is its ip link output:
5: ens2f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
link/ether b8:ce:f6:4b:68:35 brd ff:ff:ff:ff:ff:ff
altname enp179s0f1np1
As you can see, it's administratively UP but operationally down.
In this case, sending a packet to this port caused a nasty kernel hang (so
nasty that we were unable to capture it). Aborting a transmit based on
operational status (in addition to administrative status) fixes the issue.
Fixes:
|
||
|
|
4e1c80ae5c |
Merge tag 'nfsd-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever: "The big ticket item for this release is that support for RPC-with-TLS [RFC 9289] has been added to the Linux NFS server. The goal is to provide a simple-to-deploy, low-overhead in-transit confidentiality and peer authentication mechanism. It can supplement NFS Kerberos and it can protect the use of legacy non-cryptographic user authentication flavors such as AUTH_SYS. The TLS Record protocol is handled entirely by kTLS, meaning it can use either software encryption or offload encryption to smart NICs. Aside from that, work continues on improving NFSD's open file cache. Among the many clean-ups in that area is a patch to convert the rhashtable to use the list-hashing version of that data structure" * tag 'nfsd-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (31 commits) NFSD: Handle new xprtsec= export option SUNRPC: Support TLS handshake in the server-side TCP socket code NFSD: Clean up xattr memory allocation flags NFSD: Fix problem of COMMIT and NFS4ERR_DELAY in infinite loop SUNRPC: Clear rq_xid when receiving a new RPC Call SUNRPC: Recognize control messages in server-side TCP socket code SUNRPC: Be even lazier about releasing pages SUNRPC: Convert svc_xprt_release() to the release_pages() API SUNRPC: Relocate svc_free_res_pages() nfsd: simplify the delayed disposal list code SUNRPC: Ignore return value of ->xpo_sendto SUNRPC: Ensure server-side sockets have a sock->file NFSD: Watch for rq_pages bounds checking errors in nfsd_splice_actor() sunrpc: simplify two-level sysctl registration for svcrdma_parm_table SUNRPC: return proper error from get_expiry() lockd: add some client-side tracepoints nfs: move nfs_fhandle_hash to common include file lockd: server should unlock lock if client rejects the grant lockd: fix races in client GRANTED_MSG wait logic lockd: move struct nlm_wait to lockd.h ... |
||
|
|
0127f25b5d |
Merge tag 'nfs-for-6.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker: "New Features: - Convert the readdir path to use folios - Convert the NFS fscache code to use netfs Bugfixes and Cleanups: - Always send a RECLAIM_COMPLETE after establishing a lease - Simplify sysctl registrations and other cleanups - Handle out-of-order write replies on NFS v3 - Have sunrpc call_bind_status use standard hard/soft task semantics - Other minor cleanups" * tag 'nfs-for-6.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4.2: Rework scratch handling for READ_PLUS NFS: Cleanup unused rpc_clnt variable NFS: set varaiable nfs_netfs_debug_id storage-class-specifier to static SUNRPC: remove the maximum number of retries in call_bind_status NFS: Convert readdir page array functions to use a folio NFS: Convert the readdir array-of-pages into an array-of-folios NFSv3: handle out-of-order write replies. NFS: Remove fscache specific trace points and NFS_INO_FSCACHE bit NFS: Remove all NFSIOS_FSCACHE counters due to conversion to netfs API NFS: Convert buffered read paths to use netfs when fscache is enabled NFS: Configure support for netfs when NFS fscache is configured NFS: Rename readpage_async_filler to nfs_read_add_folio sunrpc: simplify one-level sysctl registration for debug_table sunrpc: move sunrpc_table and proc routines above sunrpc: simplify one-level sysctl registration for xs_tunables_table sunrpc: simplify one-level sysctl registration for xr_tunables_table nfs: simplify two-level sysctl registration for nfs_cb_sysctls nfs: simplify two-level sysctl registration for nfs4_cb_sysctls lockd: simplify two-level sysctl registration for nlm_sysctls NFSv4.1: Always send a RECLAIM_COMPLETE after establishing lease |
||
|
|
dc6456e938 |
net: ipv6: fix skb hash for some RST packets
The skb hash comes from sk->sk_txhash when using TCP, except for some
IPv6 RST packets. This is because in tcp_v6_send_reset when not in
TIME_WAIT the hash is taken from sk->sk_hash, while it should come from
sk->sk_txhash as those two hashes are not computed the same way.
Packetdrill script to test the above,
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0 > (flowlabel 0x1) S 0:0(0) <...>
// Wrong ack seq, trigger a rst.
+0 < S. 0:0(0) ack 0 win 4000
// Check the flowlabel matches prior one from SYN.
+0 > (flowlabel 0x1) R 0:0(0) <...>
Fixes:
|
||
|
|
c88f8d5cd9 |
sit: update dev->needed_headroom in ipip6_tunnel_bind_dev()
When a tunnel device is bound with the underlying device, its dev->needed_headroom needs to be updated properly. IPv4 tunnels already do the same in ip_tunnel_bind_dev(). Otherwise we may not have enough header room for skb, especially after commit |
||
|
|
da94a7781f |
net/sched: cls_api: remove block_cb from driver_list before freeing
Error handler of tcf_block_bind() frees the whole bo->cb_list on error.
However, by that time the flow_block_cb instances are already in the driver
list because driver ndo_setup_tc() callback is called before that up the
call chain in tcf_block_offload_cmd(). This leaves dangling pointers to
freed objects in the list and causes use-after-free[0]. Fix it by also
removing flow_block_cb instances from driver_list before deallocating them.
[0]:
[ 279.868433] ==================================================================
[ 279.869964] BUG: KASAN: slab-use-after-free in flow_block_cb_setup_simple+0x631/0x7c0
[ 279.871527] Read of size 8 at addr ffff888147e2bf20 by task tc/2963
[ 279.873151] CPU: 6 PID: 2963 Comm: tc Not tainted 6.3.0-rc6+ #4
[ 279.874273] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 279.876295] Call Trace:
[ 279.876882] <TASK>
[ 279.877413] dump_stack_lvl+0x33/0x50
[ 279.878198] print_report+0xc2/0x610
[ 279.878987] ? flow_block_cb_setup_simple+0x631/0x7c0
[ 279.879994] kasan_report+0xae/0xe0
[ 279.880750] ? flow_block_cb_setup_simple+0x631/0x7c0
[ 279.881744] ? mlx5e_tc_reoffload_flows_work+0x240/0x240 [mlx5_core]
[ 279.883047] flow_block_cb_setup_simple+0x631/0x7c0
[ 279.884027] tcf_block_offload_cmd.isra.0+0x189/0x2d0
[ 279.885037] ? tcf_block_setup+0x6b0/0x6b0
[ 279.885901] ? mutex_lock+0x7d/0xd0
[ 279.886669] ? __mutex_unlock_slowpath.constprop.0+0x2d0/0x2d0
[ 279.887844] ? ingress_init+0x1c0/0x1c0 [sch_ingress]
[ 279.888846] tcf_block_get_ext+0x61c/0x1200
[ 279.889711] ingress_init+0x112/0x1c0 [sch_ingress]
[ 279.890682] ? clsact_init+0x2b0/0x2b0 [sch_ingress]
[ 279.891701] qdisc_create+0x401/0xea0
[ 279.892485] ? qdisc_tree_reduce_backlog+0x470/0x470
[ 279.893473] tc_modify_qdisc+0x6f7/0x16d0
[ 279.894344] ? tc_get_qdisc+0xac0/0xac0
[ 279.895213] ? mutex_lock+0x7d/0xd0
[ 279.896005] ? __mutex_lock_slowpath+0x10/0x10
[ 279.896910] rtnetlink_rcv_msg+0x5fe/0x9d0
[ 279.897770] ? rtnl_calcit.isra.0+0x2b0/0x2b0
[ 279.898672] ? __sys_sendmsg+0xb5/0x140
[ 279.899494] ? do_syscall_64+0x3d/0x90
[ 279.900302] ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 279.901337] ? kasan_save_stack+0x2e/0x40
[ 279.902177] ? kasan_save_stack+0x1e/0x40
[ 279.903058] ? kasan_set_track+0x21/0x30
[ 279.903913] ? kasan_save_free_info+0x2a/0x40
[ 279.904836] ? ____kasan_slab_free+0x11a/0x1b0
[ 279.905741] ? kmem_cache_free+0x179/0x400
[ 279.906599] netlink_rcv_skb+0x12c/0x360
[ 279.907450] ? rtnl_calcit.isra.0+0x2b0/0x2b0
[ 279.908360] ? netlink_ack+0x1550/0x1550
[ 279.909192] ? rhashtable_walk_peek+0x170/0x170
[ 279.910135] ? kmem_cache_alloc_node+0x1af/0x390
[ 279.911086] ? _copy_from_iter+0x3d6/0xc70
[ 279.912031] netlink_unicast+0x553/0x790
[ 279.912864] ? netlink_attachskb+0x6a0/0x6a0
[ 279.913763] ? netlink_recvmsg+0x416/0xb50
[ 279.914627] netlink_sendmsg+0x7a1/0xcb0
[ 279.915473] ? netlink_unicast+0x790/0x790
[ 279.916334] ? iovec_from_user.part.0+0x4d/0x220
[ 279.917293] ? netlink_unicast+0x790/0x790
[ 279.918159] sock_sendmsg+0xc5/0x190
[ 279.918938] ____sys_sendmsg+0x535/0x6b0
[ 279.919813] ? import_iovec+0x7/0x10
[ 279.920601] ? kernel_sendmsg+0x30/0x30
[ 279.921423] ? __copy_msghdr+0x3c0/0x3c0
[ 279.922254] ? import_iovec+0x7/0x10
[ 279.923041] ___sys_sendmsg+0xeb/0x170
[ 279.923854] ? copy_msghdr_from_user+0x110/0x110
[ 279.924797] ? ___sys_recvmsg+0xd9/0x130
[ 279.925630] ? __perf_event_task_sched_in+0x183/0x470
[ 279.926656] ? ___sys_sendmsg+0x170/0x170
[ 279.927529] ? ctx_sched_in+0x530/0x530
[ 279.928369] ? update_curr+0x283/0x4f0
[ 279.929185] ? perf_event_update_userpage+0x570/0x570
[ 279.930201] ? __fget_light+0x57/0x520
[ 279.931023] ? __switch_to+0x53d/0xe70
[ 279.931846] ? sockfd_lookup_light+0x1a/0x140
[ 279.932761] __sys_sendmsg+0xb5/0x140
[ 279.933560] ? __sys_sendmsg_sock+0x20/0x20
[ 279.934436] ? fpregs_assert_state_consistent+0x1d/0xa0
[ 279.935490] do_syscall_64+0x3d/0x90
[ 279.936300] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 279.937311] RIP: 0033:0x7f21c814f887
[ 279.938085] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[ 279.941448] RSP: 002b:00007fff11efd478 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 279.942964] RAX: ffffffffffffffda RBX: 0000000064401979 RCX: 00007f21c814f887
[ 279.944337] RDX: 0000000000000000 RSI: 00007fff11efd4e0 RDI: 0000000000000003
[ 279.945660] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[ 279.947003] R10: 00007f21c8008708 R11: 0000000000000246 R12: 0000000000000001
[ 279.948345] R13: 0000000000409980 R14: 000000000047e538 R15: 0000000000485400
[ 279.949690] </TASK>
[ 279.950706] Allocated by task 2960:
[ 279.951471] kasan_save_stack+0x1e/0x40
[ 279.952338] kasan_set_track+0x21/0x30
[ 279.953165] __kasan_kmalloc+0x77/0x90
[ 279.954006] flow_block_cb_setup_simple+0x3dd/0x7c0
[ 279.955001] tcf_block_offload_cmd.isra.0+0x189/0x2d0
[ 279.956020] tcf_block_get_ext+0x61c/0x1200
[ 279.956881] ingress_init+0x112/0x1c0 [sch_ingress]
[ 279.957873] qdisc_create+0x401/0xea0
[ 279.958656] tc_modify_qdisc+0x6f7/0x16d0
[ 279.959506] rtnetlink_rcv_msg+0x5fe/0x9d0
[ 279.960392] netlink_rcv_skb+0x12c/0x360
[ 279.961216] netlink_unicast+0x553/0x790
[ 279.962044] netlink_sendmsg+0x7a1/0xcb0
[ 279.962906] sock_sendmsg+0xc5/0x190
[ 279.963702] ____sys_sendmsg+0x535/0x6b0
[ 279.964534] ___sys_sendmsg+0xeb/0x170
[ 279.965343] __sys_sendmsg+0xb5/0x140
[ 279.966132] do_syscall_64+0x3d/0x90
[ 279.966908] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 279.968407] Freed by task 2960:
[ 279.969114] kasan_save_stack+0x1e/0x40
[ 279.969929] kasan_set_track+0x21/0x30
[ 279.970729] kasan_save_free_info+0x2a/0x40
[ 279.971603] ____kasan_slab_free+0x11a/0x1b0
[ 279.972483] __kmem_cache_free+0x14d/0x280
[ 279.973337] tcf_block_setup+0x29d/0x6b0
[ 279.974173] tcf_block_offload_cmd.isra.0+0x226/0x2d0
[ 279.975186] tcf_block_get_ext+0x61c/0x1200
[ 279.976080] ingress_init+0x112/0x1c0 [sch_ingress]
[ 279.977065] qdisc_create+0x401/0xea0
[ 279.977857] tc_modify_qdisc+0x6f7/0x16d0
[ 279.978695] rtnetlink_rcv_msg+0x5fe/0x9d0
[ 279.979562] netlink_rcv_skb+0x12c/0x360
[ 279.980388] netlink_unicast+0x553/0x790
[ 279.981214] netlink_sendmsg+0x7a1/0xcb0
[ 279.982043] sock_sendmsg+0xc5/0x190
[ 279.982827] ____sys_sendmsg+0x535/0x6b0
[ 279.983703] ___sys_sendmsg+0xeb/0x170
[ 279.984510] __sys_sendmsg+0xb5/0x140
[ 279.985298] do_syscall_64+0x3d/0x90
[ 279.986076] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 279.987532] The buggy address belongs to the object at ffff888147e2bf00
which belongs to the cache kmalloc-192 of size 192
[ 279.989747] The buggy address is located 32 bytes inside of
freed 192-byte region [ffff888147e2bf00, ffff888147e2bfc0)
[ 279.992367] The buggy address belongs to the physical page:
[ 279.993430] page:00000000550f405c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147e2a
[ 279.995182] head:00000000550f405c order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 279.996713] anon flags: 0x200000000010200(slab|head|node=0|zone=2)
[ 279.997878] raw: 0200000000010200 ffff888100042a00 0000000000000000 dead000000000001
[ 279.999384] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
[ 280.000894] page dumped because: kasan: bad access detected
[ 280.002386] Memory state around the buggy address:
[ 280.003338] ffff888147e2be00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 280.004781] ffff888147e2be80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 280.006224] >ffff888147e2bf00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 280.007700] ^
[ 280.008592] ffff888147e2bf80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 280.010035] ffff888147e2c000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 280.011564] ==================================================================
Fixes:
|
||
|
|
7e692df393 |
tcp: fix skb_copy_ubufs() vs BIG TCP
David Ahern reported crashes in skb_copy_ubufs() caused by TCP tx zerocopy
using hugepages, and skb length bigger than ~68 KB.
skb_copy_ubufs() assumed it could copy all payload using up to
MAX_SKB_FRAGS order-0 pages.
This assumption broke when BIG TCP was able to put up to 512 KB per skb.
We did not hit this bug at Google because we use CONFIG_MAX_SKB_FRAGS=45
and limit gso_max_size to 180000.
A solution is to use higher order pages if needed.
v2: add missing __GFP_COMP, or we leak memory.
Fixes:
|
||
|
|
6f75cd166a |
net/ncsi: clear Tx enable mode when handling a Config required AEN
ncsi_channel_is_tx() determines whether a given channel should be
used for Tx or not. However, when reconfiguring the channel by
handling a Configuration Required AEN, there is a misjudgment that
the channel Tx has already been enabled, which results in the Enable
Channel Network Tx command not being sent.
Clear the channel Tx enable flag before reconfiguring the channel to
avoid the misjudgment.
Fixes:
|
||
|
|
7fa8a8ee94 |
Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread.
- More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
Raghav.
- zsmalloc performance improvements from Sergey Senozhatsky.
- Yue Zhao has found and fixed some data race issues around the
alteration of memcg userspace tunables.
- VFS rationalizations from Christoph Hellwig:
- removal of most of the callers of write_one_page()
- make __filemap_get_folio()'s return value more useful
- Luis Chamberlain has changed tmpfs so it no longer requires swap
backing. Use `mount -o noswap'.
- Qi Zheng has made the slab shrinkers operate locklessly, providing
some scalability benefits.
- Keith Busch has improved dmapool's performance, making part of its
operations O(1) rather than O(n).
- Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
permitting userspace to wr-protect anon memory unpopulated ptes.
- Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
rather than exclusive, and has fixed a bunch of errors which were
caused by its unintuitive meaning.
- Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
which causes minor faults to install a write-protected pte.
- Vlastimil Babka has done some maintenance work on vma_merge():
cleanups to the kernel code and improvements to our userspace test
harness.
- Cleanups to do_fault_around() by Lorenzo Stoakes.
- Mike Rapoport has moved a lot of initialization code out of various
mm/ files and into mm/mm_init.c.
- Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
DRM, but DRM doesn't use it any more.
- Lorenzo has also coverted read_kcore() and vread() to use iterators
and has thereby removed the use of bounce buffers in some cases.
- Lorenzo has also contributed further cleanups of vma_merge().
- Chaitanya Prakash provides some fixes to the mmap selftesting code.
- Matthew Wilcox changes xfs and afs so they no longer take sleeping
locks in ->map_page(), a step towards RCUification of pagefaults.
- Suren Baghdasaryan has improved mmap_lock scalability by switching to
per-VMA locking.
- Frederic Weisbecker has reworked the percpu cache draining so that it
no longer causes latency glitches on cpu isolated workloads.
- Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
logic.
- Liu Shixin has changed zswap's initialization so we no longer waste a
chunk of memory if zswap is not being used.
- Yosry Ahmed has improved the performance of memcg statistics
flushing.
- David Stevens has fixed several issues involving khugepaged,
userfaultfd and shmem.
- Christoph Hellwig has provided some cleanup work to zram's IO-related
code paths.
- David Hildenbrand has fixed up some issues in the selftest code's
testing of our pte state changing.
- Pankaj Raghav has made page_endio() unneeded and has removed it.
- Peter Xu contributed some rationalizations of the userfaultfd
selftests.
- Yosry Ahmed has fixed an issue around memcg's page recalim
accounting.
- Chaitanya Prakash has fixed some arm-related issues in the
selftests/mm code.
- Longlong Xia has improved the way in which KSM handles hwpoisoned
pages.
- Peter Xu fixes a few issues with uffd-wp at fork() time.
- Stefan Roesch has changed KSM so that it may now be used on a
per-process and per-cgroup basis.
* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
shmem: restrict noswap option to initial user namespace
mm/khugepaged: fix conflicting mods to collapse_file()
sparse: remove unnecessary 0 values from rc
mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
maple_tree: fix allocation in mas_sparse_area()
mm: do not increment pgfault stats when page fault handler retries
zsmalloc: allow only one active pool compaction context
selftests/mm: add new selftests for KSM
mm: add new KSM process and sysfs knobs
mm: add new api to enable ksm per process
mm: shrinkers: fix debugfs file permissions
mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
migrate_pages_batch: fix statistics for longterm pin retry
userfaultfd: use helper function range_in_vma()
lib/show_mem.c: use for_each_populated_zone() simplify code
mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
fs/buffer: convert create_page_buffers to folio_create_buffers
fs/buffer: add folio_create_empty_buffers helper
...
|
||
|
|
b6a7828502 |
Merge tag 'modules-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull module updates from Luis Chamberlain:
"The summary of the changes for this pull requests is:
- Song Liu's new struct module_memory replacement
- Nick Alcock's MODULE_LICENSE() removal for non-modules
- My cleanups and enhancements to reduce the areas where we vmalloc
module memory for duplicates, and the respective debug code which
proves the remaining vmalloc pressure comes from userspace.
Most of the changes have been in linux-next for quite some time except
the minor fixes I made to check if a module was already loaded prior
to allocating the final module memory with vmalloc and the respective
debug code it introduces to help clarify the issue. Although the
functional change is small it is rather safe as it can only *help*
reduce vmalloc space for duplicates and is confirmed to fix a bootup
issue with over 400 CPUs with KASAN enabled. I don't expect stable
kernels to pick up that fix as the cleanups would have also had to
have been picked up. Folks on larger CPU systems with modules will
want to just upgrade if vmalloc space has been an issue on bootup.
Given the size of this request, here's some more elaborate details:
The functional change change in this pull request is the very first
patch from Song Liu which replaces the 'struct module_layout' with a
new 'struct module_memory'. The old data structure tried to put
together all types of supported module memory types in one data
structure, the new one abstracts the differences in memory types in a
module to allow each one to provide their own set of details. This
paves the way in the future so we can deal with them in a cleaner way.
If you look at changes they also provide a nice cleanup of how we
handle these different memory areas in a module. This change has been
in linux-next since before the merge window opened for v6.3 so to
provide more than a full kernel cycle of testing. It's a good thing as
quite a bit of fixes have been found for it.
Jason Baron then made dynamic debug a first class citizen module user
by using module notifier callbacks to allocate / remove module
specific dynamic debug information.
Nick Alcock has done quite a bit of work cross-tree to remove module
license tags from things which cannot possibly be module at my request
so to:
a) help him with his longer term tooling goals which require a
deterministic evaluation if a piece a symbol code could ever be
part of a module or not. But quite recently it is has been made
clear that tooling is not the only one that would benefit.
Disambiguating symbols also helps efforts such as live patching,
kprobes and BPF, but for other reasons and R&D on this area is
active with no clear solution in sight.
b) help us inch closer to the now generally accepted long term goal
of automating all the MODULE_LICENSE() tags from SPDX license tags
In so far as a) is concerned, although module license tags are a no-op
for non-modules, tools which would want create a mapping of possible
modules can only rely on the module license tag after the commit
|
||
|
|
b3cbf98e2f |
SUNRPC: Support TLS handshake in the server-side TCP socket code
This patch adds opportunitistic RPC-with-TLS to the Linux in-kernel NFS server. If the client requests RPC-with-TLS and the user space handshake agent is running, the server will set up a TLS session. There are no policy settings yet. For example, the server cannot yet require the use of RPC-with-TLS to access its data. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> |
||
|
|
695bc1f32c |
SUNRPC: Clear rq_xid when receiving a new RPC Call
This is an eye-catcher for tracepoints that record the XID: it means svc_rqst() has not received a full RPC Call with an XID yet. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> |