Reuse the long sequence test to max out the GRO contexts.
Repeat for a single queue, 8 queues, and default number
of queues but flow steering to just one.
The SW GRO's capacity should be around 64 per queue
(8 buckets, up to 8 skbs in a chain).
Link: https://patch.msgid.link/20260318033819.1469350-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test accuracy of GRO stats. We want to cover two potentially tricky
cases:
- single segment GRO
- packets which were eligible but didn't get GRO'd
The first case is trivial, teach gro.c to send one packet, and check
GRO stats didn't move.
Second case requires gro.c to send a lot of flows expecting the NIC
to run out of GRO flow capacity.
To avoid system traffic noise we steer the packets to a dedicated
queue and operate on qstat.
Link: https://patch.msgid.link/20260318033819.1469350-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Longer packet sequence tests are quite flaky when the test is run
over a real network. Try to avoid at least the jitter on the sender
side by scheduling all the packets to be sent at once using SO_TXTIME.
Use hardcoded tx time of 5msec in the future. In my test increasing
this time past 2msec makes no difference so 5msec is plenty of margin.
Since we now expect more output buffering make sure to raise SNDBUF.
Note that this is an opportunistic reliability improvement which
will only work if the qdisc can schedule Tx time for us (fq).
Fiddling with qdisc config was deemed too complex, so it's not
part of the patch.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20260318033819.1469350-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There are transient failures for devices which update stats
periodically, especially if it's the FW DMA'ing the stats
rather than host periodic work querying the FW. Wait 25%
longer than strictly necessary.
For devices which don't report stats-block-usecs we retain
25 msec as the default wait time (0.025sec == 20,000usec * 1.25).
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260318033819.1469350-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Antonio Quartulli says:
====================
Included features:
* use bitops.h API when possible
* send netlink notification in case of client float event
* implement support for asymmetric peer IDs
* consolidate memory allocations during crypto operations
* add netlink notification check in selftests
* add FW mark check in selftest
* tag 'ovpn-net-next-20260317' of https://github.com/OpenVPN/ovpn-net-next:
ovpn: consolidate crypto allocations in one chunk
selftests: ovpn: add test for the FW mark feature
selftests: ovpn: check asymmetric peer-id
ovpn: add support for asymmetric peer IDs
selftests: ovpn: add notification parsing and matching
ovpn: notify userspace on client float event
ovpn: pktid: use bitops.h API
ovpn: use correct array size to parse nested attributes in ovpn_nl_key_swap_doit
selftests: ovpn: allow compiling ovpn-cli.c with mbedtls3
====================
Link: https://patch.msgid.link/20260317104023.192548-1-antonio@openvpn.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When running vmtest.sh inside a nested VM the running kernel may not be
installed on the filesystem at the standard /boot/ or /usr/lib/modules/
paths.
Previously, this would cause vng to fail with "does not exist" since it
could not find the kernel image. Instead, this patch uses --dry-run to
detect if the kernel is available. If not, then we fall back to the
kernel in the kernel source tree. If that fails, then we die.
This way runners, like NIPA, can use vng --run arch/x86/boot/bzImage to
setup an outer VM, and vmtest.sh will still do the right thing setting
up the inner VM.
Due to job control issues in vng, a workaround is used to prevent 'make
kselftest TARGETS=vsock' from hanging until test timeout. A PR has been
placed upstream to solve the issue in vng:
https://github.com/arighi/virtme-ng/pull/453
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260316-vsock-vmtest-autodetect-kernel-v2-1-5eec7b4831f8@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Donald points out that the current naive implementation using dicts
breaks if policy is recursive (child nest uses policy idx already
used by its parent).
Lean more into the NlPolicy class. This lets us "render" the policy
on demand, when user accesses it. If someone wants to do an infinite
walk that's on them :) Show policy info as attributes of the class
and use dict format to descend into sub-policies for extra neatness.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260313232047.2068518-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull HID fixes from Jiri Kosina:
- various fixes dealing with (intentionally) broken devices in HID
core, logitech-hidpp and multitouch drivers (Lee Jones)
- fix for OOB in wacom driver (Benoît Sevens)
- fix for potentialy HID-bpf-induced buffer overflow in () (Benjamin
Tissoires)
- various other small fixes and device ID / quirk additions
* tag 'hid-for-linus-2026031701' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: multitouch: Check to ensure report responses match the request
HID: logitech-hidpp: Prevent use-after-free on force feedback initialisation failure
HID: bpf: prevent buffer overflow in hid_hw_request
selftests/hid: fix compilation when bpf_wq and hid_device are not exported
HID: core: Mitigate potential OOB by removing bogus memset()
HID: intel-thc-hid: Set HID_PHYS with PCI BDF
HID: appletb-kbd: add .resume method in PM
HID: logitech-hidpp: Enable MX Master 4 over bluetooth
HID: input: Add HID_BATTERY_QUIRK_DYNAMIC for Elan touchscreens
HID: input: Drop Asus UX550* touchscreen ignore battery quirks
HID: asus: add xg mobile 2022 external hardware support
HID: wacom: fix out-of-bounds read in wacom_intuos_bt_irq
As commit bbf4a17ad9 ("ipv6: Fix ECMP sibling count mismatch when
clearing RTF_ADDRCONF") pointed out, RA routes are not elegible for ECMP
merging.
Add a test scenario mixing RA and static routes with gateway to check
that they are not getting merged.
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260313124827.3945-1-fmancera@suse.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add a selftest to verify that the FW mark socket option is correctly
supported and its value propagated by ovpn.
The test adds and removes nftables DROP rules based on the mark value,
and checks that the rule counter aligns with the number of lost ping
packets.
Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
Cc: horms@kernel.org
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Extend the base test to verify that the correct peer-id is set in data
packet headers. This is done by capturing ping packets with tcpdump during
the initial exchange and matching the first portion of the header
against the expected sequence for every connection.
Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
Cc: horms@kernel.org
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
To verify that netlink notifications are correctly emitted and contain
the expected fields, this commit uses the tools/net/ynl/pyynl/cli.py
script to create multicast listeners. These listeners record the
captured notifications to a JSON file, which is later compared to the
expected output.
Cc: linux-kselftest@vger.kernel.org
Cc: shuah@kernel.org
Cc: horms@kernel.org
Signed-off-by: Ralf Lici <ralf@mandelbit.com>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
mbedtls 3 installs headers and calls the shared object
differently than version 2, therefore we must now rely
on pkgconfig to fill the right C/LDFLAGS.
Moreover the mbedtls3 library expects any base64 file to
have their content on one line.
Since this change does no break older versions,
let's change the sample key file format and make mbedtls3
happy.
Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
Cc: horms@kernel.org
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
After commit under Fixes debug runners in the CI hit the following:
# subprocess.TimeoutExpired: Command '['bpftrace', '-f', 'json', '-q', '-e', 'kprobe:netpoll_poll_dev { @hits = count(); } interval:s:10 { exit(); }']' timed out after 15 seconds
# # Exception| net.lib.py.ksft.KsftFailEx: bpftrace failed to run!?: {}
in netpoll_basic.py >10% of the time. Let's give bpftool more time
to start, it can take a while on a debug kernel.
Fixes: 82562972b8 ("selftests: net: pass bpftrace timeout to cmd()")
Reviewed-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20260315160038.3187730-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull kvm fixes from Paolo Bonzini:
"Quite a large pull request, partly due to skipping last week and
therefore having material from ~all submaintainers in this one. About
a fourth of it is a new selftest, and a couple more changes are large
in number of files touched (fixing a -Wflex-array-member-not-at-end
compiler warning) or lines changed (reformatting of a table in the API
documentation, thanks rST).
But who am I kidding---it's a lot of commits and there are a lot of
bugs being fixed here, some of them on the nastier side like the
RISC-V ones.
ARM:
- Correctly handle deactivation of interrupts that were activated
from LRs. Since EOIcount only denotes deactivation of interrupts
that are not present in an LR, start EOIcount deactivation walk
*after* the last irq that made it into an LR
- Avoid calling into the stubs to probe for ICH_VTR_EL2.TDS when pKVM
is already enabled -- not only thhis isn't possible (pKVM will
reject the call), but it is also useless: this can only happen for
a CPU that has already booted once, and the capability will not
change
- Fix a couple of low-severity bugs in our S2 fault handling path,
affecting the recently introduced LS64 handling and the even more
esoteric handling of hwpoison in a nested context
- Address yet another syzkaller finding in the vgic initialisation,
where we would end-up destroying an uninitialised vgic with nasty
consequences
- Address an annoying case of pKVM failing to boot when some of the
memblock regions that the host is faulting in are not page-aligned
- Inject some sanity in the NV stage-2 walker by checking the limits
against the advertised PA size, and correctly report the resulting
faults
PPC:
- Fix a PPC e500 build error due to a long-standing wart that was
exposed by the recent conversion to kmalloc_obj(); rip out all the
ugliness that led to the wart
RISC-V:
- Prevent speculative out-of-bounds access using array_index_nospec()
in APLIC interrupt handling, ONE_REG regiser access, AIA CSR
access, float register access, and PMU counter access
- Fix potential use-after-free issues in kvm_riscv_gstage_get_leaf(),
kvm_riscv_aia_aplic_has_attr(), and kvm_riscv_aia_imsic_has_attr()
- Fix potential null pointer dereference in
kvm_riscv_vcpu_aia_rmw_topei()
- Fix off-by-one array access in SBI PMU
- Skip THP support check during dirty logging
- Fix error code returned for Smstateen and Ssaia ONE_REG interface
- Check host Ssaia extension when creating AIA irqchip
x86:
- Fix cases where CPUID mitigation features were incorrectly marked
as available whenever the kernel used scattered feature words for
them
- Validate _all_ GVAs, rather than just the first GVA, when
processing a range of GVAs for Hyper-V's TLB flush hypercalls
- Fix a brown paper bug in add_atomic_switch_msr()
- Use hlist_for_each_entry_srcu() when traversing mask_notifier_list,
to fix a lockdep warning; KVM doesn't hold RCU, just irq_srcu
- Ensure AVIC VMCB fields are initialized if the VM has an in-kernel
local APIC (and AVIC is enabled at the module level)
- Update CR8 write interception when AVIC is (de)activated, to fix a
bug where the guest can run in perpetuity with the CR8 intercept
enabled
- Add a quirk to skip the consistency check on FREEZE_IN_SMM, i.e. to
allow L1 hypervisors to set FREEZE_IN_SMM. This reverts (by
default) an unintentional tightening of userspace ABI in 6.17, and
provides some amount of backwards compatibility with hypervisors
who want to freeze PMCs on VM-Entry
- Validate the VMCS/VMCB on return to a nested guest from SMM,
because either userspace or the guest could stash invalid values in
memory and trigger the processor's consistency checks
Generic:
- Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from
being unnecessary and confusing, triggered compiler warnings due to
-Wflex-array-member-not-at-end
- Document that vcpu->mutex is take outside of kvm->slots_lock and
kvm->slots_arch_lock, which is intentional and desirable despite
being rather unintuitive
Selftests:
- Increase the maximum number of NUMA nodes in the guest_memfd
selftest to 64 (from 8)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits)
KVM: selftests: Verify SEV+ guests can read and write EFER, CR0, CR4, and CR8
Documentation: kvm: fix formatting of the quirks table
KVM: x86: clarify leave_smm() return value
selftests: kvm: add a test that VMX validates controls on RSM
selftests: kvm: extract common functionality out of smm_test.c
KVM: SVM: check validity of VMCB controls when returning from SMM
KVM: VMX: check validity of VMCS controls when returning from SMM
KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC
KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM
KVM: x86: Fix SRCU list traversal in kvm_fire_mask_notifiers()
KVM: VMX: Fix a wrong MSR update in add_atomic_switch_msr()
KVM: x86: hyper-v: Validate all GVAs during PV TLB flush
KVM: x86: synthesize CPUID bits only if CPU capability is set
KVM: PPC: e500: Rip out "struct tlbe_ref"
KVM: PPC: e500: Fix build error due to using kmalloc_obj() with wrong type
KVM: selftests: Increase 'maxnode' for guest_memfd tests
KVM: arm64: pkvm: Don't reprobe for ICH_VTR_EL2.TDS on CPU hotplug
KVM: arm64: vgic: Pick EOIcount deactivations from AP-list tail
KVM: arm64: Remove the redundant ISB in __kvm_at_s1e2()
...
Pull powerpc fixes from Madhavan Srinivasan:
- Fix KUAP warning in VMX usercopy path
- Fix lockdep warning during PCI enumeration
- Fix to move CMA reservations to arch_mm_preinit
- Fix to check current->mm is alive before getting user callchain
Thanks to Aboorva Devarajan, Christophe Leroy (CS GROUP), Dan Horák,
Nicolin Chen, Nilay Shroff, Qiao Zhao, Ritesh Harjani (IBM), Saket Kumar
Bhaskar, Sayali Patil, Shrikanth Hegde, Venkat Rao Bagalkote, and Viktor
Malik.
* tag 'powerpc-7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/iommu: fix lockdep warning during PCI enumeration
powerpc/selftests/copyloops: extend selftest to exercise __copy_tofrom_user_power7_vmx
powerpc: fix KUAP warning in VMX usercopy path
powerpc, perf: Check that current->mm is alive before getting user callchain
powerpc/mem: Move CMA reservations to arch_mm_preinit
By default, the Linux TCP implementation does not shrink the
advertised window (RFC 7323 calls this "window retraction") with the
following exceptions:
- When an incoming segment cannot be added due to the receive buffer
running out of memory. Since commit 8c670bdfa5 ("tcp: correct
handling of extreme memory squeeze") a zero window will be
advertised in this case. It turns out that reaching the required
memory pressure is easy when window scaling is in use. In the
simplest case, sending a sufficient number of segments smaller than
the scale factor to a receiver that does not read data is enough.
- Commit b650d953cd ("tcp: enforce receive buffer memory limits by
allowing the tcp window to shrink") addressed the "eating memory"
problem by introducing a sysctl knob that allows shrinking the
window before running out of memory.
However, RFC 7323 does not only state that shrinking the window is
necessary in some cases, it also formulates requirements for TCP
implementations when doing so (Section 2.4).
This commit addresses the receiver-side requirements: After retracting
the window, the peer may have a snd_nxt that lies within a previously
advertised window but is now beyond the retracted window. This means
that all incoming segments (including pure ACKs) will be rejected
until the application happens to read enough data to let the peer's
snd_nxt be in window again (which may be never).
To comply with RFC 7323, the receiver MUST honor any segment that
would have been in window for any ACK sent by the receiver and, when
window scaling is in effect, SHOULD track the maximum window sequence
number it has advertised. This patch tracks that maximum window
sequence number rcv_mwnd_seq throughout the connection and uses it in
tcp_sequence() when deciding whether a segment is acceptable.
rcv_mwnd_seq is updated together with rcv_wup and rcv_wnd in
tcp_select_window(). If we count tcp_sequence() as fast path, it is
read in the fast path. Therefore, rcv_mwnd_seq is put into rcv_wnd's
cacheline group.
The logic for handling received data in tcp_data_queue() is already
sufficient and does not need to be updated.
Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260309-tcp_rfc7323_retract_wnd_rfc-v3-1-4c7f96b1ec69@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull sched_ext fixes from Tejun Heo:
- Fix data races flagged by KCSAN: add missing READ_ONCE()/WRITE_ONCE()
annotations for lock-free accesses to module parameters and dsq->seq
- Fix silent truncation of upper 32 enqueue flags (SCX_ENQ_PREEMPT and
above) when passed through the int sched_class interface
- Documentation updates: scheduling class precedence, task ownership
state machine, example scheduler descriptions, config list cleanup
- Selftest fix for format specifier and buffer length in
file_write_long()
* tag 'sched_ext-for-7.0-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Use WRITE_ONCE() for the write side of scx_enable helper pointer
sched_ext: Fix enqueue_task_scx() truncation of upper enqueue flags
sched_ext: Documentation: Update sched-ext.rst
sched_ext: Use READ_ONCE() for scx_slice_bypass_us in scx_bypass()
sched_ext: Documentation: Mention scheduling class precedence
sched_ext: Document task ownership state machine
sched_ext: Use READ_ONCE() for lock-free reads of module param variables
sched_ext/selftests: Fix format specifier and buffer length in file_write_long()
sched_ext: Use WRITE_ONCE() for the write side of dsq->seq update
Add validation for the nlctrl family, accessing family info and
dumping policies.
TAP version 13
1..4
ok 1 nl_nlctrl.getfamily_do
ok 2 nl_nlctrl.getfamily_dump
ok 3 nl_nlctrl.getpolicy_dump
ok 4 nl_nlctrl.getpolicy_by_op
# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
Link: https://patch.msgid.link/20260311032839.417748-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add "do no harm" testing of EFER, CR0, CR4, and CR8 for SEV+ guests to
verify that the guest can read and write the registers, without hitting
e.g. a #VC on SEV-ES guests due to KVM incorrectly trying to intercept a
register.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260310211841.2552361-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The new PowerPC VMX fast path (__copy_tofrom_user_power7_vmx) is not
exercised by existing copyloops selftests. This patch updates
the selftest to exercise the VMX variant, ensuring the VMX copy path
is validated.
Changes include:
- COPY_LOOP=test___copy_tofrom_user_power7_vmx with -D VMX_TEST is used
in existing selftest build targets.
- Inclusion of ../utils.c to provide get_auxv_entry() for hardware
feature detection.
- At runtime, the test skips execution if Altivec is not available.
- Copy sizes above VMX_COPY_THRESHOLD are used to ensure the VMX
path is taken.
This enables validation of the VMX fast path without affecting systems
that do not support Altivec.
Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260304122201.153049-2-sayalip@linux.ibm.com
The cited commit refactored the hardcoded timeout=5 into a parameter,
but dropped the keyword from the communicate() call.
Since Popen.communicate()'s first positional argument is 'input' (not
'timeout'), the timeout value is silently treated as stdin input and the
call never enforces a timeout.
Pass timeout as a keyword argument to restore the intended behavior.
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20260310115803.2521050-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The bpftrace() helper configures an interval based exit timer but does
not propagate the timeout to the cmd object, which defaults to 5
seconds. Since the default BPFTRACE_TIMEOUT is 10 seconds, cmd.process()
always raises a TimeoutExpired exception before bpftrace has a chance to
exit gracefully.
Pass timeout+5 to cmd() to allow bpftrace to complete gracefully.
Note: this issue is masked by a bug in the way cmd() passes timeout,
this is fixed in the next commit.
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20260310115803.2521050-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add tests to local_termination.sh to verify that link-local frames
arrive. On some switches the DSA driver uses bridges to connect the
user ports to their CPU ports. More "intelligent" switches typically
don't forward link-local frames, but may trap them to an internal
microcontroller. The driver may have to change trapping rules, so
link-local frames end up on the DSA CPU ports instead of being
silently dropped or trapped to the internal microcontroller of the
switch.
Add two tests which help to validate this has been done correctly:
- Link-local STP BPDU should arrive at the Linux netdev when the
bridge has STP disabled (BR_NO_STP), in which case the bridge
forwards them rather than consuming them in the control plane
- Link-local LLDP should arrive at standalone ports (and the test
should be skipped on bridged ports similar to how it is done
for the IEEE1588v2/PTP tests)
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/1a67081b2ede1e6d2d32f7dd54ae9688f3566152.1773166131.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a test checking that invalid eVMCS contents are validated after an
RSM instruction is emulated.
The failure mode is simply that the RSM succeeds, because KVM virtualizes
NMIs anyway while running L2; the two pin-based execution controls used
by the test are entirely handled by KVM and not by the processor.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Increase 'maxnode' when using 'get_mempolicy' syscall in guest_memfd
mmap and NUMA policy tests to fix a failure on one Intel GNR platform.
On a CXL-capable platform, the memory affinity of CXL memory regions may
not be covered by the SRAT. Since each CXL memory region is enumerated
via a CFMWS table, at early boot the kernel parses all CFMWS tables to
detect all CXL memory regions and assigns a 'faked' NUMA node for each
of them, starting from the highest NUMA node ID enumerated via the SRAT.
This increases the 'nr_node_ids'. E.g., on the aforementioned Intel GNR
platform which has 4 NUMA nodes and 18 CFMWS tables, it increases to 22.
This results in the 'get_mempolicy' syscall failure on that platform,
because currently 'maxnode' is hard-coded to 8 but the 'get_mempolicy'
syscall requires the 'maxnode' to be not smaller than the 'nr_node_ids'.
Increase the 'maxnode' to the number of bits of 'nodemask', which is
'unsigned long', to fix this.
This may not cover all systems. Perhaps a better way is to always set
the 'nodemask' and 'maxnode' based on the actual maximum NUMA node ID on
the system, but for now just do the simple way.
Reported-by: Yi Lai <yi1.lai@intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221014
Closes: https://lore.kernel.org/all/bug-221014-28872@https.bugzilla.kernel.org%2F
Signed-off-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yuan Yao <yaoyuan@linux.alibaba.com>
Link: https://patch.msgid.link/20260302205158.178058-1-kai.huang@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The format of Netlink policy dump is a bit curious with messages
in the same dump carrying both attrs and mapping info. Plus each
message carries a single piece of the puzzle the caller must then
reassemble.
I need to do this reassembly for a test, but I think it's generally
useful. So let's add proper support to YnlFamily to return more
user-friendly representation. See the various docs in the patch
for more details.
Link: https://patch.msgid.link/20260310005337.3594225-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test generates 16 flows, and verifies that traffic is distributed
across two queues via the NICs RSS indirection table. The likelihood of the
flows skewing to a single queue is high, so we retry sending traffic up to
3 times.
Alternatively, we could increase the number of generated flows. But
debug kernels may struggle to ramp this many flows.
During manual testing, the test passed for 10,000 consecutive runs.
Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20260309204215.2110486-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net/rds/test.py sees a segfault in tcpdump when executed through the
ksft runner.
[ 21.903713] tcpdump[1469]: segfault at 0 ip 000072100e99126d
sp 00007ffccf740fd0 error 4
[ 21.903721] in libc.so.6[16a26d,7798b149a000+188000]
[ 21.905074] in libc.so.6[16a26d,72100e84f000+188000] likely on
CPU 5 (core 5, socket 0)
[ 21.905084] Code: 00 0f 85 a0 00 00 00 48 83 c4 38 89 d8 5b 41 5c
41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 48 8b 05 91 8b 09 00 8b 4d ac
64 89 08 <41> 0f b6 07 83 e8 2b a8 fd 0f 84 54 ff ff ff 49 8b 36 4c 89
ff e8
[ 21.906760] likely on CPU 9 (core 9, socket 0)
[ 21.913469] Code: 00 0f 85 a0 00 00 00 48 83 c4 38 89 d8 5b 41 5c 41
5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 48 8b 05 91 8b 09 00 8b 4d ac 64 89
08 <41> 0f b6 07 83 e8 2b a8 fd 0f 84 54 ff ff ff 49 8b 36 4c 89 ff e8
The os.fork() call creates extra complexity because it forks the entire
process including the python interpreter. ip() then calls cmd() which
creates a subprocess.Popen. We can avoid the extra layering by simply
calling subprocess.Popen directly. Track the process handles directly
and terminate them at cleanup rather than relying on killall. Further
tcpdump's -Z flag attempts to change savefile ownership, which is not
supported by the 9p protocol. Fix this by writing pcap captures to
"/tmp" during the test and move them to the log directory after tcpdump
exits.
Signed-off-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260308055835.1338257-4-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
fib6_nexthop() retrieves the link-local address for two interfaces used
in the test. However, both lldummy and llv1 are obtained from dummy0.
llv1 is expected to be retrieved from veth1, which is the interface used
later in the test. The subsequent check and error message also expect
the address to be retrieved from veth1.
Fix this by retrieving llv1 from veth1.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20260306180830.2329477-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>