Pull networking fixes from Jakub Kicinski:
"A little calmer than usual, probably just the timing of sub-tree PRs.
Including fixes from netfilter.
Current release - regressions:
- inet: bring NLM_DONE out to a separate recv() again, fix user space
which assumes multiple recv()s will happen and gets blocked forever
- drv: mlx5:
- restore mistakenly dropped parts in register devlink flow
- use channel mdev reference instead of global mdev instance for
coalescing
- acquire RTNL lock before RQs/SQs activation/deactivation
Previous releases - regressions:
- net: change maximum number of UDP segments to 128, fix virtio
compatibility with Windows peers
- usb: ax88179_178a: avoid writing the mac address before first
reading
Previous releases - always broken:
- sched: fix mirred deadlock on device recursion
- netfilter:
- br_netfilter: skip conntrack input hook for promisc packets
- fixes removal of duplicate elements in the pipapo set backend
- various fixes for abort paths and error handling
- af_unix: don't peek OOB data without MSG_OOB
- drv: flower: fix fragment flags handling in multiple drivers
- drv: ravb: fix jumbo frames and packet stats accounting
Misc:
- kselftest_harness: fix Clang warning about zero-length format
- tun: limit printing rate when illegal packet received by tun dev"
* tag 'net-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (46 commits)
net: ethernet: ti: am65-cpsw-nuss: cleanup DMA Channels before using them
net: usb: ax88179_178a: avoid writing the mac address before first reading
net: ravb: Fix RX byte accounting for jumbo packets
net: ravb: Fix GbEth jumbo packet RX checksum handling
net: ravb: Allow RX loop to move past DMA mapping errors
net: ravb: Count packets instead of descriptors in R-Car RX path
net: ethernet: mtk_eth_soc: fix WED + wifi reset
net:usb:qmi_wwan: support Rolling modules
selftests: kselftest_harness: fix Clang warning about zero-length format
net/sched: Fix mirred deadlock on device recursion
netfilter: nf_tables: fix memleak in map from abort path
netfilter: nf_tables: restore set elements when delete set fails
netfilter: nf_tables: missing iterator type in lookup walk
s390/ism: Properly fix receive message buffer allocation
net: dsa: mt7530: fix port mirroring for MT7988 SoC switch
net: dsa: mt7530: fix mirroring frames received on local port
tun: limit printing rate when illegal packet received by tun dev
ice: Fix checking for unsupported keys on non-tunnel device
ice: tc: allow zero flags in parsing tc flower
ice: tc: check src_vsi in case of traffic from VF
...
On my new laptop with packages from nixos-unstable, gcc 12.3.0 produces
> lib/setup.c: In function ‘__test_msg’:
> lib/setup.c:20:9: error: format not a string literal and no format arguments [-Werror=format-security]
> 20 | ksft_print_msg(buf);
> | ^~~~~~~~~~~~~~
> lib/setup.c: In function ‘__test_ok’:
> lib/setup.c:26:9: error: format not a string literal and no format arguments [-Werror=format-security]
> 26 | ksft_test_result_pass(buf);
> | ^~~~~~~~~~~~~~~~~~~~~
> lib/setup.c: In function ‘__test_fail’:
> lib/setup.c:32:9: error: format not a string literal and no format arguments [-Werror=format-security]
> 32 | ksft_test_result_fail(buf);
> | ^~~~~~~~~~~~~~~~~~~~~
> lib/setup.c: In function ‘__test_xfail’:
> lib/setup.c:38:9: error: format not a string literal and no format arguments [-Werror=format-security]
> 38 | ksft_test_result_xfail(buf);
> | ^~~~~~~~~~~~~~~~~~~~~~
> lib/setup.c: In function ‘__test_error’:
> lib/setup.c:44:9: error: format not a string literal and no format arguments [-Werror=format-security]
> 44 | ksft_test_result_error(buf);
> | ^~~~~~~~~~~~~~~~~~~~~~
> lib/setup.c: In function ‘__test_skip’:
> lib/setup.c:50:9: error: format not a string literal and no format arguments [-Werror=format-security]
> 50 | ksft_test_result_skip(buf);
> | ^~~~~~~~~~~~~~~~~~~~~
> cc1: some warnings being treated as errors
As the buffer was already pre-printed into, print it as a string
rather than a format-string.
Fixes: cfbab37b3d ("selftests/net: Add TCP-AO library")
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
On my new laptop with packages from nixos-unstable, gcc 12.3.0 produces:
> lib/proc.c: In function ‘netstat_read_type’:
> lib/proc.c:89:9: error: format not a string literal and no format arguments [-Werror=format-security]
> 89 | if (fscanf(fnetstat, type->header_name) == EOF)
> | ^~
> cc1: some warnings being treated as errors
Here the selftests lib parses header name, while expectes non-space word
ending with a column.
Fixes: cfbab37b3d ("selftests/net: Add TCP-AO library")
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The structure is on the stack and has to be zero-initialized as
the kernel checks for:
> if (in.reserved != 0 || in.reserved2 != 0)
> return -EINVAL;
Fixes: b26660531c ("selftests/net: Add test for TCP-AO add setsockopt() command")
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently, "active reset" cases are flaky, because select() is called
for 3 sockets, while only 2 are expected to receive RST.
The idea of the third socket was to get into request_sock_queue,
but the test mistakenly attempted to connect() after the listener
socket was shut down.
Repair this test, it's important to check the different kernel
code-paths for signing RST TCP-AO segments.
Fixes: c6df7b2361 ("selftests/net: Add TCP-AO RST test")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pull kselftest fixes from Shuah Khan:
"A fix to kselftest harness to prevent infinite loop triggered in an
assert in FIXTURE_TEARDOWN and a fix to a problem seen in being able
to stop subsystem-enable tests when sched events are being traced"
* tag 'linux_kselftest-fixes-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/harness: Prevent infinite loop due to Assert in FIXTURE_TEARDOWN
selftests/ftrace: Limit length in subsystem-enable tests
The commit fc8b2a6194
("net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation")
adds check of potential number of UDP segments vs
UDP_MAX_SEGMENTS in linux/virtio_net.h.
After this change certification test of USO guest-to-guest
transmit on Windows driver for virtio-net device fails,
for example with packet size of ~64K and mss of 536 bytes.
In general the USO should not be more restrictive than TSO.
Indeed, in case of unreasonably small mss a lot of segments
can cause queue overflow and packet loss on the destination.
Limit of 128 segments is good for any practical purpose,
with minimal meaningful mss of 536 the maximal UDP packet will
be divided to ~120 segments.
The number of segments for UDP packets is validated vs
UDP_MAX_SEGMENTS also in udp.c (v4,v6), this does not affect
quest-to-guest path but does affect packets sent to host, for
example.
It is important to mention that UDP_MAX_SEGMENTS is kernel-only
define and not available to user mode socket applications.
In order to request MSS smaller than MTU the applications
just uses setsockopt with SOL_UDP and UDP_SEGMENT and there is
no limitations on socket API level.
Fixes: fc8b2a6194 ("net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation")
Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Building with clang results in the following warning:
posix_timers.c:69:6: warning: absolute value function 'abs' given an
argument of type 'long long' but has parameter of type 'int' which may
cause truncation of value [-Wabsolute-value]
if (abs(diff - DELAY * USECS_PER_SEC) > USECS_PER_SEC / 2) {
^
So switch to using llabs() instead.
Fixes: 0bc4b0cf15 ("selftests: add basic posix timers selftests")
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240410232637.4135564-3-jstultz@google.com
After commit 6d029c25b7 ("selftests/timers/posix_timers: Reimplement
check_timer_distribution()"), clang warns:
tools/testing/selftests/timers/../kselftest.h:398:6: warning: variable 'major' is used uninitialized whenever '||' condition is true [-Wsometimes-uninitialized]
398 | if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
| ^~~~~~~~~~~~
tools/testing/selftests/timers/../kselftest.h:401:9: note: uninitialized use occurs here
401 | return major > min_major || (major == min_major && minor >= min_minor);
| ^~~~~
tools/testing/selftests/timers/../kselftest.h:398:6: note: remove the '||' if its condition is always false
398 | if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
| ^~~~~~~~~~~~~~~
tools/testing/selftests/timers/../kselftest.h:395:20: note: initialize the variable 'major' to silence this warning
395 | unsigned int major, minor;
| ^
| = 0
This is a false positive because if uname() fails, ksft_exit_fail_msg()
will be called, which unconditionally calls exit(), a noreturn function.
However, clang does not know that ksft_exit_fail_msg() will call exit() at
the point in the pipeline that the warning is emitted because inlining has
not occurred, so it assumes control flow will resume normally after
ksft_exit_fail_msg() is called.
Make it clear to clang that all of the functions that call exit()
unconditionally in kselftest.h are noreturn transitively by marking them
explicitly with '__attribute__((__noreturn__))', which clears up the
warning above and any future warnings that may appear for the same reason.
Fixes: 6d029c25b7 ("selftests/timers/posix_timers: Reimplement check_timer_distribution()")
Reported-by: John Stultz <jstultz@google.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240411-mark-kselftest-exit-funcs-noreturn-v1-1-b027c948f586@kernel.org
Closes: https://lore.kernel.org/all/20240410232637.4135564-2-jstultz@google.com/
Pull cxl fixes from Dave Jiang:
- Fix index of Clear Event Record handles in cxl_clear_event_record()
- Fix use before init of map->reg_type in cxl_decode_regblock()
- Fix initialization of mbox_cmd.size_out in cxl_mem_get_records_log()
- Fix CXL path access_coordinate computation:
- Remove unneded check of iter in loop
- Fix of retrieving of access_coordinate in PCI topology walk
- Fix of incorrect region access_coordinate data calculation
- Consolidate of access_coordinates attached to downstream port
context
- Add check to validate access_coordinate validity to prevent
incorrect data being exposed via sysfs
* tag 'cxl-fixes-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl: Add checks to access_coordinate calculation to fail missing data
cxl: Consolidate dport access_coordinate ->hb_coord and ->sw_coord into ->coord
cxl: Fix incorrect region perf data calculation
cxl: Fix retrieving of access_coordinates in PCIe path
cxl: Remove checking of iter in cxl_endpoint_get_perf_coordinates()
cxl/core: Fix initialization of mbox_cmd.size_out in get event
cxl/core/regs: Fix usage of map->reg_type in cxl_decode_regblock() before assigned
cxl/mem: Fix for the index of Clear Event Record Handle
Pull turbostat updates from Len Brown:
- Use of the CPU MSR driver is now optional
- Perf is now preferred for many counters
- Non-root users can now execute turbostat, though with limited
functionality
- Add counters for some new GFX hardware
- Minor fixes
* tag 'turbostat-2024.04.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (26 commits)
tools/power turbostat: v2024.04.10
tools/power/turbostat: Add support for Xe sysfs knobs
tools/power/turbostat: Add support for new i915 sysfs knobs
tools/power/turbostat: Introduce BIC_SAM_mc6/BIC_SAMMHz/BIC_SAMACTMHz
tools/power/turbostat: Fix uncore frequency file string
tools/power/turbostat: Unify graphics sysfs snapshots
tools/power/turbostat: Cache graphics sysfs path
tools/power/turbostat: Enable MSR_CORE_C1_RES support for ICX
tools/power turbostat: Add selftests
tools/power turbostat: read RAPL counters via perf
tools/power turbostat: Add proper re-initialization for perf file descriptors
tools/power turbostat: Clear added counters when in no-msr mode
tools/power turbostat: add early exits for permission checks
tools/power turbostat: detect and disable unavailable BICs at runtime
tools/power turbostat: Add reading aperf and mperf via perf API
tools/power turbostat: Add --no-perf option
tools/power turbostat: Add --no-msr option
tools/power turbostat: enhance -D (debug counter dump) output
tools/power turbostat: Fix warning upon failed /dev/cpu_dma_latency read
tools/power turbostat: Read base_hz and bclk from CPUID.16H if available
...
The struct adjtimex freq field takes a signed value who's units are in
shifted (<<16) parts-per-million.
Unfortunately for negative adjustments, the straightforward use of:
freq = ppm << 16 trips undefined behavior warnings with clang:
valid-adjtimex.c:66:6: warning: shifting a negative signed value is undefined [-Wshift-negative-value]
-499<<16,
~~~~^
valid-adjtimex.c:67:6: warning: shifting a negative signed value is undefined [-Wshift-negative-value]
-450<<16,
~~~~^
..
Fix it by using a multiply by (1 << 16) instead of shifting negative values
in the valid-adjtimex test case. Align the values for better readability.
Reported-by: Lee Jones <joneslee@google.com>
Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Link: https://lore.kernel.org/r/20240409202222.2830476-1-jstultz@google.com
Link: https://lore.kernel.org/lkml/0c6d4f0d-2064-4444-986b-1d1ed782135f@collabora.com/
Much of turbostat can now run with perf, rather than using the MSR driver
Some of turbostat can now run as a regular non-root user.
Add some new output columns for some new GFX hardware.
[This patch updates the version, but otherwise changes no function;
it touches up some checkpatch issues from previous patches]
Signed-off-by: Len Brown <len.brown@intel.com>
check_timer_distribution() runs ten threads in a busy loop and tries to
test that the kernel distributes a process posix CPU timer signal to every
thread over time.
There is not guarantee that this is true even after commit bcb7ee7902
("posix-timers: Prefer delivery of signals to the current thread") because
that commit only avoids waking up the sleeping process leader thread, but
that has nothing to do with the actual signal delivery.
As the signal is process wide the first thread which observes sigpending
and wins the race to lock sighand will deliver the signal. Testing shows
that this hangs on a regular base because some threads never win the race.
The comment "This primarily tests that the kernel does not favour any one."
is wrong. The kernel does favour a thread which hits the timer interrupt
when CLOCK_PROCESS_CPUTIME_ID expires.
Rewrite the test so it only checks that the group leader sleeping in join()
never receives SIGALRM and the thread which burns CPU cycles receives all
signals.
In older kernels which do not have commit bcb7ee7902 ("posix-timers:
Prefer delivery of signals to the current thread") the test-case fails
immediately, the very 1st tick wakes the leader up. Otherwise it quickly
succeeds after 100 ticks.
CI testing wants to use newer selftest versions on stable kernels. In this
case the test is guaranteed to fail.
So check in the failure case whether the kernel version is less than v6.3
and skip the test result in that case.
[ tglx: Massaged change log, renamed the version check helper ]
Fixes: e797203fb3 ("selftests/timers/posix_timers: Test delivery of signals across threads")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240409133802.GD29396@redhat.com
Pull misc fixes from Andrew Morton:
"8 hotfixes, 3 are cc:stable
There are a couple of fixups for this cycle's vmalloc changes and one
for the stackdepot changes. And a fix for a very old x86 PAT issue
which can cause a warning splat"
* tag 'mm-hotfixes-stable-2024-04-05-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
stackdepot: rename pool_index to pool_index_plus_1
x86/mm/pat: fix VM_PAT handling in COW mappings
MAINTAINERS: change vmware.com addresses to broadcom.com
selftests/mm: include strings.h for ffsl
mm: vmalloc: fix lockdep warning
mm: vmalloc: bail out early in find_vmap_area() if vmap is not init
init: open output files from cpio unpacking with O_LARGEFILE
mm/secretmem: fix GUP-fast succeeding on secretmem folios
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, bluetooth and bpf.
Fairly usual collection of driver and core fixes. The large selftest
accompanying one of the fixes is also becoming a common occurrence.
Current release - regressions:
- ipv6: fix infinite recursion in fib6_dump_done()
- net/rds: fix possible null-deref in newly added error path
Current release - new code bugs:
- net: do not consume a full cacheline for system_page_pool
- bpf: fix bpf_arena-related file descriptor leaks in the verifier
- drv: ice: fix freeing uninitialized pointers, fixing misuse of the
newfangled __free() auto-cleanup
Previous releases - regressions:
- x86/bpf: fixes the BPF JIT with retbleed=stuff
- xen-netfront: add missing skb_mark_for_recycle, fix page pool
accounting leaks, revealed by recently added explicit warning
- tcp: fix bind() regression for v6-only wildcard and v4-mapped-v6
non-wildcard addresses
- Bluetooth:
- replace "hci_qca: Set BDA quirk bit if fwnode exists in DT" with
better workarounds to un-break some buggy Qualcomm devices
- set conn encrypted before conn establishes, fix re-connecting to
some headsets which use slightly unusual sequence of msgs
- mptcp:
- prevent BPF accessing lowat from a subflow socket
- don't account accept() of non-MPC client as fallback to TCP
- drv: mana: fix Rx DMA datasize and skb_over_panic
- drv: i40e: fix VF MAC filter removal
Previous releases - always broken:
- gro: various fixes related to UDP tunnels - netns crossing
problems, incorrect checksum conversions, and incorrect packet
transformations which may lead to panics
- bpf: support deferring bpf_link dealloc to after RCU grace period
- nf_tables:
- release batch on table validation from abort path
- release mutex after nft_gc_seq_end from abort path
- flush pending destroy work before exit_net release
- drv: r8169: skip DASH fw status checks when DASH is disabled"
* tag 'net-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
netfilter: validate user input for expected length
net/sched: act_skbmod: prevent kernel-infoleak
net: usb: ax88179_178a: avoid the interface always configured as random address
net: dsa: sja1105: Fix parameters order in sja1110_pcs_mdio_write_c45()
net: ravb: Always update error counters
net: ravb: Always process TX descriptor ring
netfilter: nf_tables: discard table flag update with pending basechain deletion
netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
netfilter: nf_tables: reject new basechain after table flag update
netfilter: nf_tables: flush pending destroy work before exit_net release
netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
netfilter: nf_tables: release batch on table validation from abort path
Revert "tg3: Remove residual error handling in tg3_suspend"
tg3: Remove residual error handling in tg3_suspend
net: mana: Fix Rx DMA datasize and skb_over_panic
net/sched: fix lockdep splat in qdisc_tree_reduce_backlog()
net: phy: micrel: lan8814: Fix when enabling/disabling 1-step timestamping
net: stmmac: fix rx queue priority assignment
net: txgbe: fix i2c dev name cannot match clkdev
net: fec: Set mac_managed_pm during probe
...
This patch addresses an issue in the selftests/harness where an
assertion within FIXTURE_TEARDOWN could trigger an infinite loop.
The problem arises because the teardown procedure is meant to
execute once, but the presence of failing assertions (ASSERT_EQ(0, 1))
leads to repeated attempts to execute teardown due to
the long jump mechanism used by the harness for handling assertions.
To resolve this, the patch ensures that the teardown process
runs only once, regardless of assertion outcomes, preventing
the infinite loop and allowing tests to fail.
A simple test demo(test.c):
#include "kselftest_harness.h"
FIXTURE(f)
{
int fd;
};
FIXTURE_SETUP(f)
{
self->fd = 0;
}
FIXTURE_TEARDOWN(f)
{
TH_LOG("TEARDOWN");
ASSERT_EQ(0, 1);
self->fd = -1;
}
TEST_F(f, open_close)
{
ASSERT_NE(self->fd, 1);
}
TEST_HARNESS_MAIN
will always output the following output due to a dead loop until timeout:
# test.c:15:open_close:TEARDOWN
# test.c:16:open_close:Expected 0 (0) == 1 (1)
# test.c:15:open_close:TEARDOWN
# test.c:16:open_close:Expected 0 (0) == 1 (1)
...
But here's what we should and expect to get:
TAP version 13
1..1
# Starting 1 tests from 2 test cases.
# RUN f.open_close ...
# test.c:15:open_close:TEARDOWN
# test.c:16:open_close:Expected 0 (0) == 1 (1)
# open_close: Test terminated by assertion
# FAIL f.open_close
not ok 1 f.open_close
# FAILED: 0 / 1 tests passed.
# Totals: pass:0 fail:1 xfail:0 xpass:0 skip:0 error:0
also this is related to the issue mentioned in this patch
https://patchwork.kernel.org/project/linux-kselftest/patch/e2ba3f8c-80e6-477d-9cea-1c9af820e0ed@alu.unizg.hr/
Signed-off-by: Shengyu Li <shengyu.li.evgeny@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
While sched* events being traced and sched* events continuously happen,
"[xx] event tracing - enable/disable with subsystem level files" would
not stop as on some slower systems it seems to take forever.
Select the first 100 lines of output would be enough to judge whether
there are more than 3 types of sched events.
Fixes: 815b18ea66 ("ftracetest: Add basic event tracing test cases")
Cc: stable@vger.kernel.org
Signed-off-by: Yuanhe Shu <xiangzao@linux.alibaba.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Pull KVM fixes from Paolo Bonzini:
"ARM:
- Ensure perf events programmed to count during guest execution are
actually enabled before entering the guest in the nVHE
configuration
- Restore out-of-range handler for stage-2 translation faults
- Several fixes to stage-2 TLB invalidations to avoid stale
translations, possibly including partial walk caches
- Fix early handling of architectural VHE-only systems to ensure E2H
is appropriately set
- Correct a format specifier warning in the arch_timer selftest
- Make the KVM banner message correctly handle all of the possible
configurations
RISC-V:
- Remove redundant semicolon in num_isa_ext_regs()
- Fix APLIC setipnum_le/be write emulation
- Fix APLIC in_clrip[x] read emulation
x86:
- Fix a bug in KVM_SET_CPUID{2,} where KVM looks at the wrong CPUID
entries (old vs. new) and ultimately neglects to clear PV_UNHALT
from vCPUs with HLT-exiting disabled
- Documentation fixes for SEV
- Fix compat ABI for KVM_MEMORY_ENCRYPT_OP
- Fix a 14-year-old goof in a declaration shared by host and guest;
the enabled field used by Linux when running as a guest pushes the
size of "struct kvm_vcpu_pv_apf_data" from 64 to 68 bytes. This is
really unconsequential because KVM never consumes anything beyond
the first 64 bytes, but the resulting struct does not match the
documentation
Selftests:
- Fix spelling mistake in arch_timer selftest"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
KVM: arm64: Rationalise KVM banner output
arm64: Fix early handling of FEAT_E2H0 not being implemented
KVM: arm64: Ensure target address is granule-aligned for range TLBI
KVM: arm64: Use TLBI_TTL_UNKNOWN in __kvm_tlb_flush_vmid_range()
KVM: arm64: Don't pass a TLBI level hint when zapping table entries
KVM: arm64: Don't defer TLB invalidation when zapping table entries
KVM: selftests: Fix __GUEST_ASSERT() format warnings in ARM's arch timer test
KVM: arm64: Fix out-of-IPA space translation fault handling
KVM: arm64: Fix host-programmed guest events in nVHE
RISC-V: KVM: Fix APLIC in_clrip[x] read emulation
RISC-V: KVM: Fix APLIC setipnum_le/be write emulation
RISC-V: KVM: Remove second semicolon
KVM: selftests: Fix spelling mistake "trigged" -> "triggered"
Documentation: kvm/sev: clarify usage of KVM_MEMORY_ENCRYPT_OP
Documentation: kvm/sev: separate description of firmware
KVM: SEV: fix compat ABI for KVM_MEMORY_ENCRYPT_OP
KVM: selftests: Check that PV_UNHALT is cleared when HLT exiting is disabled
KVM: x86: Use actual kvm_cpuid.base for clearing KVM_FEATURE_PV_UNHALT
KVM: x86: Introduce __kvm_get_hypervisor_cpuid() helper
KVM: SVM: Return -EINVAL instead of -EBUSY on attempt to re-init SEV/SEV-ES
...
KVM/arm64 fixes for 6.9, part #1
- Ensure perf events programmed to count during guest execution
are actually enabled before entering the guest in the nVHE
configuration.
- Restore out-of-range handler for stage-2 translation faults.
- Several fixes to stage-2 TLB invalidations to avoid stale
translations, possibly including partial walk caches.
- Fix early handling of architectural VHE-only systems to ensure E2H is
appropriately set.
- Correct a format specifier warning in the arch_timer selftest.
- Make the KVM banner message correctly handle all of the possible
configurations.
Current MPTCP servers increment MPTcpExtMPCapableFallbackACK when they
accept non-MPC connections. As reported by Christoph, this is "surprising"
because the counter might become greater than MPTcpExtMPCapableSYNRX.
MPTcpExtMPCapableFallbackACK counter's name suggests it should only be
incremented when a connection was seen using MPTCP options, then a
fallback to TCP has been done. Let's do that by incrementing it when
the subflow context of an inbound MPC connection attempt is dropped.
Also, update mptcp_connect.sh kselftest, to ensure that the
above MIB does not increment in case a pure TCP client connects to a
MPTCP server.
Fixes: fc518953bc ("mptcp: add and use MIB counter infrastructure")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/449
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-1-324a8981da48@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The netdev CI runs in a VM and captures serial, so stdout and
stderr get combined. Because there's a missing new line in
stderr the test ends up corrupting KTAP:
# Successok 1 selftests: net: reuseaddr_conflict
which should have been:
# Success
ok 1 selftests: net: reuseaddr_conflict
Fixes: 422d8dc6fd ("selftest: add a reuseaddr test")
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Link: https://lore.kernel.org/r/20240329160559.249476-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull kselftest fixes from Shuah Khan:
"Fixes to seccomp and ftrace tests and a change to add config file for
dmabuf-heap test to increase coverage"
* tag 'linux_kselftest-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: dmabuf-heap: add config file for the test
selftests/seccomp: Try to fit runtime of benchmark into timeout
selftests/ftrace: Fix event filter target_func selection
Pull KUnit fixes from Shuah Khan:
"One urgent fix for --alltests build failure related to renaming of
CONFIG_DAMON_DBGFS to DAMON_DBGFS_DEPRECATED to the missing config
option"
* tag 'linux_kselftest-kunit-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: configs: Enable CONFIG_DAMON_DBGFS_DEPRECATED for --alltests
This patch adds two tests using SO_REUSEADDR and SO_REUSEPORT and
defines errno for each test case.
SO_REUSEADDR/SO_REUSEPORT is set for the per-fixture two bind()
calls.
The notable pattern is the pair of v6only [::] and plain [::].
The two sockets are put into the same tb2, where per-bucket v6only
flag would be useless to detect bind() conflict.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240326204251.51301-9-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In addtition to the two addresses defined in the fixtures, this patch
add 6 more bind calls():
* 0.0.0.0
* 127.0.0.1
* ::
* ::1
* ::ffff:0.0.0.0
* ::ffff:127.0.0.1
The first two per-fixture bind() calls control how inet_bind2_bucket
is created, and the rest 6 bind() calls cover as many conflicting
patterns as possible.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240326204251.51301-7-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, bind_wildcard.c calls bind() twice for two addresses and
checks the pre-defined errno against the 2nd call. Also, the two
bind() calls are swapped to cover various patterns how bind buckets
are created.
However, only testing two addresses is insufficient to detect regression.
So, we will add more bind() calls, and then, we need to define different
errno for each bind() per test case.
As a prepartion, let's define the reverse order bind() test cases as
fixtures.
No functional changes are intended.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240326204251.51301-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, bind_wildcard.c tests only (IPv4, IPv6) pairs, but we will
add more tests for the same protocol pairs.
This patch makes it possible by changing the address pointer to void.
No functional changes are intended.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240326204251.51301-4-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The config fragment enlists all the config options needed for the test.
This config is merged into the kernel's config on which this test is
run.
Fixed whitespace errors during commit:
Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The seccomp benchmark runs five scenarios, one calibration run with no
seccomp filters enabled then four further runs each adding a filter. The
calibration run times itself for 15s and then each additional run executes
for the same number of times.
Currently the seccomp tests, including the benchmark, run with an extended
120s timeout but this is not sufficient to robustly run the tests on a lot
of platforms. Sample timings from some recent runs:
Platform Run 1 Run 2 Run 3 Run 4
--------- ----- ----- ----- -----
PowerEdge R200 16.6s 16.6s 31.6s 37.4s
BBB (arm) 20.4s 20.4s 54.5s
Synquacer (arm64) 20.7s 23.7s 40.3s
The x86 runs from the PowerEdge are quite marginal and routinely fail, for
the successful run reported here the timed portions of the run are at
117.2s leaving less than 3s of margin which is frequently breached. The
added overhead of adding filters on the other platforms is such that there
is no prospect of their runs fitting into the 120s timeout, especially
on 32 bit arm where there is no BPF JIT.
While we could lower the time we calibrate for I'm also already seeing the
currently completing runs reporting issues with the per filter overheads
not matching expectations:
Let's instead raise the timeout to 180s which is only a 50% increase on the
current timeout which is itself not *too* large given that there's only two
tests in this suite.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The event filter function test has been failing in our internal test
farm:
| # not ok 33 event filter function - test event filtering on functions
Running the test in verbose mode indicates that this is because the test
erroneously determines that kmem_cache_free() is the most common caller
of kmem_cache_free():
# # + cut -d: -f3 trace
# # + sed s/call_site=([^+]*)+0x.*/1/
# # + sort
# # + uniq -c
# # + sort
# # + tail -n 1
# # + sed s/^[ 0-9]*//
# # + target_func=kmem_cache_free
... and as kmem_cache_free() doesn't call itself, setting this as the
filter function for kmem_cache_free() results in no hits, and
consequently the test fails:
# # + grep kmem_cache_free trace
# # + grep kmem_cache_free
# # + wc -l
# # + hitcnt=0
# # + grep kmem_cache_free trace
# # + grep -v kmem_cache_free
# # + wc -l
# # + misscnt=0
# # + [ 0 -eq 0 ]
# # + exit_fail
This seems to be because the system in question has tasks with ':' in
their name (which a number of kernel worker threads have). These show up
in the trace, e.g.
test:.sh-1299 [004] ..... 2886.040608: kmem_cache_free: call_site=putname+0xa4/0xc8 ptr=000000000f4d22f4 name=names_cache
... and so when we try to extact the call_site with:
cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/'
... the 'cut' command will extrace the column containing
'kmem_cache_free' rather than the column containing 'call_site=...', and
the 'sed' command will leave this unchanged. Consequently, the test will
decide to use 'kmem_cache_free' as the filter function, resulting in the
failure seen above.
Fix this by matching the 'call_site=<func>' part specifically to extract
the function name.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-trace-kernel@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
UDP tunnel packets can't be GRO in-between their endpoints as this
causes different issues. The UDP GRO fwd vxlan tests were relying on
this and their expectations have to be fixed.
We keep both vxlan tests and expected no GRO from happening. The vxlan
UDP GRO bench test was removed as it's not providing any valuable
information now.
Fixes: a062260a9d ("selftests: net: add UDP GRO forwarding self-tests")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf, WiFi and netfilter.
Current release - regressions:
- ipv6: fix address dump when IPv6 is disabled on an interface
Current release - new code bugs:
- bpf: temporarily disable atomic operations in BPF arena
- nexthop: fix uninitialized variable in nla_put_nh_group_stats()
Previous releases - regressions:
- bpf: protect against int overflow for stack access size
- hsr: fix the promiscuous mode in offload mode
- wifi: don't always use FW dump trig
- tls: adjust recv return with async crypto and failed copy to
userspace
- tcp: properly terminate timers for kernel sockets
- ice: fix memory corruption bug with suspend and rebuild
- at803x: fix kernel panic with at8031_probe
- qeth: handle deferred cc1
Previous releases - always broken:
- bpf: fix bug in BPF_LDX_MEMSX
- netfilter: reject table flag and netdev basechain updates
- inet_defrag: prevent sk release while still in use
- wifi: pick the version of SESSION_PROTECTION_NOTIF
- wwan: t7xx: split 64bit accesses to fix alignment issues
- mlxbf_gige: call request_irq() after NAPI initialized
- hns3: fix kernel crash when devlink reload during pf
initialization"
* tag 'net-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
inet: inet_defrag: prevent sk release while still in use
Octeontx2-af: fix pause frame configuration in GMP mode
net: lan743x: Add set RFE read fifo threshold for PCI1x1x chips
net: bcmasp: Remove phy_{suspend/resume}
net: bcmasp: Bring up unimac after PHY link up
net: phy: qcom: at803x: fix kernel panic with at8031_probe
netfilter: arptables: Select NETFILTER_FAMILY_ARP when building arp_tables.c
netfilter: nf_tables: skip netdev hook unregistration if table is dormant
netfilter: nf_tables: reject table flag and netdev basechain updates
netfilter: nf_tables: reject destroy command to remove basechain hooks
bpf: update BPF LSM designated reviewer list
bpf: Protect against int overflow for stack access size
bpf: Check bloom filter map value size
bpf: fix warning for crash_kexec
selftests: netdevsim: set test timeout to 10 minutes
net: wan: framer: Add missing static inline qualifiers
mlxbf_gige: call request_irq() after NAPI initialized
tls: get psock ref after taking rxlock to avoid leak
selftests: tls: add test with a partially invalid iov
tls: adjust recv return with async crypto and failed copy to userspace
...
This is required, as CONFIG_DAMON_DEBUGFS is enabled, and --alltests UML
builds will fail due to the missing config option otherwise.
Fixes: f4cba4bf67 ("mm/damon: rename CONFIG_DAMON_DBGFS to DAMON_DBGFS_DEPRECATED")
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Rae Moar <rmoar@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Alexei Starovoitov says:
====================
pull-request: bpf 2024-03-27
The following pull-request contains BPF updates for your *net* tree.
We've added 4 non-merge commits during the last 1 day(s) which contain
a total of 5 files changed, 26 insertions(+), 3 deletions(-).
The main changes are:
1) Fix bloom filter value size validation and protect the verifier
against such mistakes, from Andrei.
2) Fix build due to CONFIG_KEXEC_CORE/CRASH_DUMP split, from Hari.
3) Update bpf_lsm maintainers entry, from Matt.
* tag 'for-net' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: update BPF LSM designated reviewer list
bpf: Protect against int overflow for stack access size
bpf: Check bloom filter map value size
bpf: fix warning for crash_kexec
====================
Link: https://lore.kernel.org/r/20240328012938.24249-1-alexei.starovoitov@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Kalle Valo says:
====================
wireless fixes for v6.9-rc2
The first fixes for v6.9. Ping-Ke Shih now maintains a separate tree
for Realtek drivers, document that in the MAINTAINERS. Plenty of fixes
for both to stack and iwlwifi. Our kunit tests were working only on um
architecture but that's fixed now.
* tag 'wireless-2024-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (21 commits)
MAINTAINERS: wifi: mwifiex: add Francesco as reviewer
kunit: fix wireless test dependencies
wifi: iwlwifi: mvm: include link ID when releasing frames
wifi: iwlwifi: mvm: handle debugfs names more carefully
wifi: iwlwifi: mvm: guard against invalid STA ID on removal
wifi: iwlwifi: read txq->read_ptr under lock
wifi: iwlwifi: fw: don't always use FW dump trig
wifi: iwlwifi: mvm: rfi: fix potential response leaks
wifi: mac80211: correctly set active links upon TTLM
wifi: iwlwifi: mvm: Configure the link mapping for non-MLD FW
wifi: iwlwifi: mvm: consider having one active link
wifi: iwlwifi: mvm: pick the version of SESSION_PROTECTION_NOTIF
wifi: mac80211: fix prep_connection error path
wifi: cfg80211: fix rdev_dump_mpp() arguments order
wifi: iwlwifi: mvm: disable MLO for the time being
wifi: cfg80211: add a flag to disable wireless extensions
wifi: mac80211: fix ieee80211_bss_*_flags kernel-doc
wifi: mac80211: check/clear fast rx for non-4addr sta VLAN changes
wifi: mac80211: fix mlme_link_id_dbg()
MAINTAINERS: wifi: add git tree for Realtek WiFi drivers
...
====================
Link: https://lore.kernel.org/r/20240327191346.1A1EAC433C7@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull misc fixes from Andrew Morton:
"Various hotfixes. About half are cc:stable and the remainder address
post-6.8 issues or aren't considered suitable for backporting.
zswap figures prominently in the post-6.8 issues - folloup against the
large amount of changes we have just made to that code.
Apart from that, all over the map"
* tag 'mm-hotfixes-stable-2024-03-27-11-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
crash: use macro to add crashk_res into iomem early for specific arch
mm: zswap: fix data loss on SWP_SYNCHRONOUS_IO devices
selftests/mm: fix ARM related issue with fork after pthread_create
hexagon: vmlinux.lds.S: handle attributes section
userfaultfd: fix deadlock warning when locking src and dst VMAs
tmpfs: fix race on handling dquot rbtree
selftests/mm: sigbus-wp test requires UFFD_FEATURE_WP_HUGETLBFS_SHMEM
mm: zswap: fix writeback shinker GFP_NOIO/GFP_NOFS recursion
ARM: prctl: reject PR_SET_MDWE on pre-ARMv6
prctl: generalize PR_SET_MDWE support check to be per-arch
MAINTAINERS: remove incorrect M: tag for dm-devel@lists.linux.dev
mm: zswap: fix kernel BUG in sg_init_one
selftests: mm: restore settings from only parent process
tools/Makefile: remove cgroup target
mm: cachestat: fix two shmem bugs
mm: increase folio batch size
mm,page_owner: fix recursion
mailmap: update entry for Leonard Crestez
init: open /initrd.image with O_LARGEFILE
selftests/mm: Fix build with _FORTIFY_SOURCE
...
Pull execve fixes from Kees Cook:
- Fix selftests to conform to the TAP output format (Muhammad Usama
Anjum)
- Fix NOMMU linux_binprm::exec pointer in auxv (Max Filippov)
- Replace deprecated strncpy usage (Justin Stitt)
- Replace another /bin/sh instance in selftests
* tag 'execve-v6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt: replace deprecated strncpy
exec: Fix NOMMU linux_binprm::exec in transfer_args_to_stack()
selftests/exec: Convert remaining /bin/sh to /bin/bash
selftests/exec: execveat: Improve debug reporting
selftests/exec: recursion-depth: conform test to TAP format output
selftests/exec: load_address: conform test to TAP format output
selftests/exec: binfmt_script: Add the overall result line according to TAP
This patch adds a missing check to bloom filter creating, rejecting
values above KMALLOC_MAX_SIZE. This brings the bloom map in line with
many other map types.
The lack of this protection can cause kernel crashes for value sizes
that overflow int's. Such a crash was caught by syzkaller. The next
patch adds more guard-rails at a lower level.
Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240327024245.318299-2-andreimatei1@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>