Commit Graph

12491 Commits

Author SHA1 Message Date
Linus Torvalds
6fda0bb806 Merge tag 'mm-hotfixes-stable-2023-04-07-16-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM fixes from Andrew Morton:
 "28 hotfixes.

  23 are cc:stable and the other five address issues which were
  introduced during this merge cycle.

  20 are for MM and the remainder are for other subsystems"

* tag 'mm-hotfixes-stable-2023-04-07-16-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits)
  maple_tree: fix a potential concurrency bug in RCU mode
  maple_tree: fix get wrong data_end in mtree_lookup_walk()
  mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
  nilfs2: fix sysfs interface lifetime
  mm: take a page reference when removing device exclusive entries
  mm: vmalloc: avoid warn_alloc noise caused by fatal signal
  nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field
  nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
  zsmalloc: document freeable stats
  zsmalloc: document new fullness grouping
  fsdax: force clear dirty mark if CoW
  mm/hugetlb: fix uffd wr-protection for CoW optimization path
  mm: enable maple tree RCU mode by default
  maple_tree: add RCU lock checking to rcu callback functions
  maple_tree: add smp_rmb() to dead node detection
  maple_tree: fix write memory barrier of nodes once dead for RCU mode
  maple_tree: remove extra smp_wmb() from mas_dead_leaves()
  maple_tree: fix freeing of nodes in rcu mode
  maple_tree: detect dead nodes in mas_start()
  maple_tree: be more cautious about dead nodes
  ...
2023-04-08 10:51:12 -07:00
Jakub Kicinski
029294d019 Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2023-04-08

We've added 4 non-merge commits during the last 11 day(s) which contain
a total of 5 files changed, 39 insertions(+), 6 deletions(-).

The main changes are:

1) Fix BPF TCP socket iterator to use correct helper for dropping
   socket's refcount, that is, sock_gen_put instead of sock_put,
   from Martin KaFai Lau.

2) Fix a BTI exception splat in BPF trampoline-generated code on arm64,
   from Xu Kuohai.

3) Fix a LongArch JIT error from missing BPF_NOSPEC no-op, from George Guo.

4) Fix dynamic XDP feature detection of veth in xdp_redirect selftest,
   from Lorenzo Bianconi.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: fix xdp_redirect xdp-features selftest for veth driver
  bpf, arm64: Fixed a BTI error on returning to patched function
  LoongArch, bpf: Fix jit to skip speculation barrier opcode
  bpf: tcp: Use sock_gen_put instead of sock_put in bpf_iter_tcp
====================

Link: https://lore.kernel.org/r/20230407224642.30906-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07 18:23:37 -07:00
Eduard Zingerman
5855b0999d selftests/bpf: Prevent infinite loop in veristat when base file is too short
The following example forces veristat to loop indefinitely:

$ cat two-ok
file_name,prog_name,verdict,total_states
file-a,a,success,12
file-b,b,success,67

$ cat add-failure
file_name,prog_name,verdict,total_states
file-a,a,success,12
file-b,b,success,67
file-b,c,failure,32

$ veristat -C two-ok add-failure
  <does not return>

The loop is caused by handle_comparison_mode() not checking if `base`
variable points to `fallback_stats` prior advancing joined results
using `base`.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230407154125.896927-1-eddyz87@gmail.com
2023-04-07 15:30:31 -07:00
Song Liu
3ebf5212bf selftests/bpf: Use PERF_COUNT_HW_CPU_CYCLES event for get_branch_snapshot
perf_event with type=PERF_TYPE_RAW and config=0x1b00 turned out to be not
reliable in ensuring LBR is active. Thus, test_progs:get_branch_snapshot is
not reliable in some systems. Replace it with PERF_COUNT_HW_CPU_CYCLES
event, which gives more consistent results.

Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230407190130.2093736-1-song@kernel.org
2023-04-07 15:24:43 -07:00
Hangbin Liu
2e825f8acc selftests: bonding: add arp validate test
This patch add bonding arp validate tests with mode active backup,
monitor arp_ip_target and ns_ip6_target. It also checks mii_status
to make sure all slaves are UP.

Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-07 08:47:20 +01:00
Hangbin Liu
481b56e039 selftests: bonding: re-format bond option tests
To improve the testing process for bond options, A new bond topology lib
is added to our testing setup. The current option_prio.sh file will be
renamed to bond_options.sh so that all bonding options can be tested here.
Specifically, for priority testing, we will run all tests using modes
1, 5, and 6. These changes will help us streamline the testing process
and ensure that our bond options are rigorously evaluated.

Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-07 08:47:20 +01:00
Petr Machata
a9fda7a0b0 selftests: forwarding: hw_stats_l3: Detect failure to install counters
Running this test makes little sense if the enabled l3_stats are not
actually reported as "used". This can signify a failure of a driver to
install the necessary counters, or simply lack of support for enabling
in-HW counters on a given netdevice. It is generally impossible to tell
from the outside which it is. But more likely than not, if somebody is
running this on veth pairs, they do not intend to actually test that a
certain piece of HW can install in-HW counters for the veth. It is more
likely they are e.g. running the test by mistake.

Therefore detect that the counter has not been actually installed. In that
case, if the netdevice is one end of a veth pair, SKIP. Otherwise FAIL.

Suggested-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/a86817961903cca5cb0aebf2b2a06294b8aa7dea.1680704172.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-06 19:06:17 -07:00
Yonghong Song
23a88fae9f selftests/bpf: Add verifier tests for code pattern '<const> <cond_op> <non_const>'
Add various tests for code pattern '<const> <cond_op> <non_const>' to
exercise the previous verifier patch.

The following are veristat changed number of processed insns stat
comparing the previous patch vs. this patch:

File                                                   Program                                               Insns (A)  Insns (B)  Insns  (DIFF)
-----------------------------------------------------  ----------------------------------------------------  ---------  ---------  -------------
test_seg6_loop.bpf.linked3.o                           __add_egr_x                                               12423      12314  -109 (-0.88%)

Only one program is affected with minor change.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230406164510.1047757-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-06 15:26:08 -07:00
Yonghong Song
953d9f5bea bpf: Improve handling of pattern '<const> <cond_op> <non_const>' in verifier
Currently, the verifier does not handle '<const> <cond_op> <non_const>' well.
For example,
  ...
  10: (79) r1 = *(u64 *)(r10 -16)       ; R1_w=scalar() R10=fp0
  11: (b7) r2 = 0                       ; R2_w=0
  12: (2d) if r2 > r1 goto pc+2
  13: (b7) r0 = 0
  14: (95) exit
  15: (65) if r1 s> 0x1 goto pc+3
  16: (0f) r0 += r1
  ...
At insn 12, verifier decides both true and false branch are possible, but
actually only false branch is possible.

Currently, the verifier already supports patterns '<non_const> <cond_op> <const>.
Add support for patterns '<const> <cond_op> <non_const>' in a similar way.

Also fix selftest 'verifier_bounds_mix_sign_unsign/bounds checks mixing signed and unsigned, variant 10'
due to this change.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230406164505.1046801-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-06 15:26:08 -07:00
Yonghong Song
aec08d677b selftests/bpf: Add tests for non-constant cond_op NE/EQ bound deduction
Add various tests for code pattern '<non-const> NE/EQ <const>' implemented
in the previous verifier patch. Without the verifier patch, these new
tests will fail.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230406164500.1045715-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-06 15:26:08 -07:00
Jakub Kicinski
d9c960675a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

drivers/net/ethernet/google/gve/gve.h
  3ce9345580 ("gve: Secure enough bytes in the first TX desc for all TCP pkts")
  75eaae158b ("gve: Add XDP DROP and TX support for GQI-QPL format")
https://lore.kernel.org/all/20230406104927.45d176f5@canb.auug.org.au/
https://lore.kernel.org/all/c5872985-1a95-0bc8-9dcc-b6f23b439e9d@tessares.net/

Adjacent changes:

net/can/isotp.c
  051737439e ("can: isotp: fix race between isotp_sendsmg() and isotp_release()")
  96d1c81e6a ("can: isotp: add module parameter for maximum pdu size")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-06 12:01:20 -07:00
Linus Torvalds
f2afccfefe Merge tag 'net-6.3-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireless and can.

  Current release - regressions:

   - wifi: mac80211:
      - fix potential null pointer dereference
      - fix receiving mesh packets in forwarding=0 networks
      - fix mesh forwarding

  Current release - new code bugs:

   - virtio/vsock: fix leaks due to missing skb owner

  Previous releases - regressions:

   - raw: fix NULL deref in raw_get_next().

   - sctp: check send stream number after wait_for_sndbuf

   - qrtr:
      - fix a refcount bug in qrtr_recvmsg()
      - do not do DEL_SERVER broadcast after DEL_CLIENT

   - wifi: brcmfmac: fix SDIO suspend/resume regression

   - wifi: mt76: fix use-after-free in fw features query.

   - can: fix race between isotp_sendsmg() and isotp_release()

   - eth: mtk_eth_soc: fix remaining throughput regression

   - eth: ice: reset FDIR counter in FDIR init stage

  Previous releases - always broken:

   - core: don't let netpoll invoke NAPI if in xmit context

   - icmp: guard against too small mtu

   - ipv6: fix an uninit variable access bug in __ip6_make_skb()

   - wifi: mac80211: fix the size calculation of
     ieee80211_ie_len_eht_cap()

   - can: fix poll() to not report false EPOLLOUT events

   - eth: gve: secure enough bytes in the first TX desc for all TCP
     pkts"

* tag 'net-6.3-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
  net: stmmac: check fwnode for phy device before scanning for phy
  net: stmmac: Add queue reset into stmmac_xdp_open() function
  selftests: net: rps_default_mask.sh: delete veth link specifically
  net: fec: make use of MDIO C45 quirk
  can: isotp: fix race between isotp_sendsmg() and isotp_release()
  can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
  can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos
  can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
  gve: Secure enough bytes in the first TX desc for all TCP pkts
  netlink: annotate lockless accesses to nlk->max_recvmsg_len
  ethtool: reset #lanes when lanes is omitted
  ping: Fix potentail NULL deref for /proc/net/icmp.
  raw: Fix NULL deref in raw_get_next().
  ice: Reset FDIR counter in FDIR init stage
  ice: fix wrong fallback logic for FDIR
  net: stmmac: fix up RX flow hash indirection table when setting channels
  net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probe
  wifi: mt76: ignore key disable commands
  wifi: ath11k: reduce the MHI timeout to 20s
  ipv6: Fix an uninit variable access bug in __ip6_make_skb()
  ...
2023-04-06 11:39:07 -07:00
Linus Torvalds
8f2e1a855b Merge tag 'linux-kselftest-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
 "One single fix to mount_setattr_test build failure"

* tag 'linux-kselftest-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests mount: Fix mount_setattr_test builds failed
2023-04-06 11:34:18 -07:00
Kees Cook
74522fea27 ACPICA: actbl2: Replace 1-element arrays with flexible arrays
ACPICA commit 44f1af0664599e87bebc3a1260692baa27b2f264

Similar to "Replace one-element array with flexible-array", replace the
1-element array with a proper flexible array member as defined by C99.

This allows the code to operate without tripping compile-time and run-
time bounds checkers (e.g. via __builtin_object_size(), -fsanitize=bounds,
and/or -fstrict-flex-arrays=3).

The sizeof() uses with struct acpi_nfit_flush_address and struct
acpi_nfit_smbios have been adjusted to drop the open-coded subtraction
of the trailing single element. The result is no binary differences in
.text nor .data sections.

Link: https://github.com/acpica/acpica/commit/44f1af06
Signed-off-by: Bob Moore <robert.moore@intel.com>
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-06 20:29:11 +02:00
Kal Conley
c0801598e5 selftests: xsk: Add test UNALIGNED_INV_DESC_4K1_FRAME_SIZE
Add unaligned descriptor test for frame size of 4001. Using an odd frame
size ensures that the end of the UMEM is not near a page boundary. This
allows testing descriptors that staddle the end of the UMEM but not a
page.

This test used to fail without the previous commit ("xsk: Fix unaligned
descriptor validation").

Signed-off-by: Kal Conley <kal.conley@dectris.com>
Link: https://lore.kernel.org/r/20230405235920.7305-3-kal.conley@dectris.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-04-06 10:50:45 -07:00
Lorenzo Bianconi
919e659ed1 selftests/bpf: fix xdp_redirect xdp-features selftest for veth driver
xdp-features supported by veth driver are no more static, but they
depends on veth configuration (e.g. if GRO is enabled/disabled or
TX/RX queue configuration). Take it into account in xdp_redirect
xdp-features selftest for veth driver.

Fixes: fccca038f3 ("veth: take into account device reconfiguration for xdp_features flag")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/bc35455cfbb1d4f7f52536955ded81ad47d8dc54.1680777371.git.lorenzo@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-04-06 09:35:09 -07:00
Eric Dumazet
905a9eb5f6 selftests/net: fix typo in tcp_mmap
kernel test robot reported the following warning:

All warnings (new ones prefixed by >>):

   tcp_mmap.c: In function 'child_thread':
>> tcp_mmap.c:211:61: warning: 'lu' may be used uninitialized in this function [-Wmaybe-uninitialized]
     211 |                         zc.length = min(chunk_size, FILE_SZ - lu);

We want to read FILE_SZ bytes, so the correct expression
should be (FILE_SZ - total)

Fixes: 5c5945dc69 ("selftests/net: Add SHA256 computation over data sent in tcp_mmap")
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304042104.UFIuevBp-lkp@intel.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Xiaoyan Li <lixiaoyan@google.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230405071556.1019623-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-06 13:19:25 +02:00
Tobias Klauser
d95debbdc5 selftests/clone3: fix number of tests in ksft_set_plan
Commit 515bddf0ec ("selftests/clone3: test clone3 with CLONE_NEWTIME")
added an additional test, so the number passed to ksft_set_plan needs to
be bumped accordingly.

Also use ksft_finished() to print results and exit. This will catch future
mismatches between ksft_set_plan() and the number of tests being run.

Fixes: 515bddf0ec ("selftests/clone3: test clone3 with CLONE_NEWTIME")
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-04-06 11:57:28 +02:00
Chaitanya S Prakash
d6c2789778 selftests/mm: set overcommit_policy as OVERCOMMIT_ALWAYS
The kernel's default behaviour is to obstruct the allocation of high
virtual address as it handles memory overcommit in a heuristic manner. 
Setting the parameter as OVERCOMMIT_ALWAYS, ensures kernel isn't
susceptible to the availability of a platform's physical memory when
denying a memory allocation request.

Link: https://lkml.kernel.org/r/20230323060121.1175830-4-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 19:42:59 -07:00
Chaitanya S Prakash
3f9bea2b8a selftests/mm: change NR_CHUNKS_HIGH for aarch64
Although there is a provision for 52 bit VA on arm64 platform, it remains
unutilised and higher addresses are not allocated.  In order to
accommodate 4PB [2^52] virtual address space where supported,
NR_CHUNKS_HIGH is changed accordingly.

Array holding addresses is changed from static allocation to dynamic
allocation to accommodate its voluminous nature which otherwise might
overflow the stack.

Link: https://lkml.kernel.org/r/20230323060121.1175830-3-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 19:42:59 -07:00
Chaitanya S Prakash
3cce258ea4 selftests/mm: change MAP_CHUNK_SIZE
Patch series "selftests: Fix virtual address range for arm64", v2.

When the virtual address range selftest is run on arm64 and x86 platforms,
it is observed that both the low and high VA range iterations are skipped
when the MAP_CHUNK_SIZE is set to 16GB.  The MAP_CHUNK_SIZE is changed to
1GB to resolve this issue, following which support for arm64 platform is
added by changing the NR_CHUNKS_HIGH for aarch64 to accommodate up to 4PB
of virtual address space allocation requests.  Dynamic memory allocation
of array holding addresses is introduced to prevent overflow of the stack.
Finally, the overcommit_policy is set as OVERCOMMIT_ALWAYS to prevent the
kernel from denying a memory allocation request based on a platform's
physical memory availability.


This patch (of 3):

mmap() fails to allocate 16GB virtual space chunk, skipping both low and
high VA range iterations.  Hence, reduce MAP_CHUNK_SIZE to 1GB and update
relevant macros as required.

Link: https://lkml.kernel.org/r/20230323060121.1175830-1-chaitanyas.prakash@arm.com
Link: https://lkml.kernel.org/r/20230323060121.1175830-2-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 19:42:58 -07:00
Axel Rasmussen
0289184476 mm: userfaultfd: add UFFDIO_CONTINUE_MODE_WP to install WP PTEs
UFFDIO_COPY already has UFFDIO_COPY_MODE_WP, so when installing a new PTE
to resolve a missing fault, one can install a write-protected one.  This
is useful when using UFFDIO_REGISTER_MODE_{MISSING,WP} in combination.

This was motivated by testing HugeTLB HGM [1], and in particular its
interaction with userfaultfd features.  Existing userfaultfd code supports
using WP and MINOR modes together (i.e.  you can register an area with
both enabled), but without this CONTINUE flag the combination is in
practice unusable.

So, add an analogous UFFDIO_CONTINUE_MODE_WP, which does the same thing as
UFFDIO_COPY_MODE_WP, but for *minor* faults.

Update the selftest to do some very basic exercising of the new flag.

Update Documentation/ to describe how these flags are used (neither the
COPY nor the new CONTINUE versions of this mode flag were described there
before).

[1]: https://patchwork.kernel.org/project/linux-mm/cover/20230218002819.1486479-1-jthoughton@google.com/

Link: https://lkml.kernel.org/r/20230314221250.682452-5-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nadav Amit <namit@vmware.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 19:42:48 -07:00
Kirill A. Shutemov
23baf831a3 mm, treewide: redefine MAX_ORDER sanely
MAX_ORDER currently defined as number of orders page allocator supports:
user can ask buddy allocator for page order between 0 and MAX_ORDER-1.

This definition is counter-intuitive and lead to number of bugs all over
the kernel.

Change the definition of MAX_ORDER to be inclusive: the range of orders
user can ask from buddy allocator is 0..MAX_ORDER now.

[kirill@shutemov.name: fix min() warning]
  Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box
[akpm@linux-foundation.org: fix another min_t warning]
[kirill@shutemov.name: fixups per Zi Yan]
  Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name
[akpm@linux-foundation.org: fix underlining in docs]
  Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/
Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 19:42:46 -07:00
Peter Xu
47fba2b6d5 selftests/mm: smoke test UFFD_FEATURE_WP_UNPOPULATED
Enable it by default on the stress test, and add some smoke tests for the
pte markers on anonymous.

Link: https://lkml.kernel.org/r/20230309223711.823547-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Paul Gofman <pgofman@codeweavers.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 19:42:44 -07:00
Hangbin Liu
38e058cc7d selftests: net: rps_default_mask.sh: delete veth link specifically
When deleting the netns and recreating a new one while re-adding the
veth interface, there is a small window of time during which the old
veth interface has not yet been removed. This can cause the new addition
to fail. To resolve this issue, we can either wait for a short while to
ensure that the old veth interface is deleted, or we can specifically
remove the veth interface.

Before this patch:
  # ./rps_default_mask.sh
  empty rps_default_mask                                      [ ok ]
  changing rps_default_mask dont affect existing devices      [ ok ]
  changing rps_default_mask dont affect existing netns        [ ok ]
  changing rps_default_mask affect newly created devices      [ ok ]
  changing rps_default_mask don't affect newly child netns[II][ ok ]
  rps_default_mask is 0 by default in child netns             [ ok ]
  RTNETLINK answers: File exists
  changing rps_default_mask in child ns don't affect the main one[ ok ]
  cat: /sys/class/net/vethC11an1/queues/rx-0/rps_cpus: No such file or directory
  changing rps_default_mask in child ns affects new childns devices./rps_default_mask.sh: line 36: [: -eq: unary operator expected
  [fail] expected 1 found
  changing rps_default_mask in child ns don't affect existing devices[ ok ]

After this patch:
  # ./rps_default_mask.sh
  empty rps_default_mask                                      [ ok ]
  changing rps_default_mask dont affect existing devices      [ ok ]
  changing rps_default_mask dont affect existing netns        [ ok ]
  changing rps_default_mask affect newly created devices      [ ok ]
  changing rps_default_mask don't affect newly child netns[II][ ok ]
  rps_default_mask is 0 by default in child netns             [ ok ]
  changing rps_default_mask in child ns don't affect the main one[ ok ]
  changing rps_default_mask in child ns affects new childns devices[ ok ]
  changing rps_default_mask in child ns don't affect existing devices[ ok ]

Fixes: 3a7d84eae0 ("self-tests: more rps self tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20230404072411.879476-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-05 18:59:32 -07:00
Liam R. Howlett
c13af03de4 maple_tree: fix write memory barrier of nodes once dead for RCU mode
During the development of the maple tree, the strategy of freeing multiple
nodes changed and, in the process, the pivots were reused to store
pointers to dead nodes.  To ensure the readers see accurate pivots, the
writers need to mark the nodes as dead and call smp_wmb() to ensure any
readers can identify the node as dead before using the pivot values.

There were two places where the old method of marking the node as dead
without smp_wmb() were being used, which resulted in RCU readers seeing
the wrong pivot value before seeing the node was dead.  Fix this race
condition by using mte_set_node_dead() which has the smp_wmb() call to
ensure the race is closed.

Add a WARN_ON() to the ma_free_rcu() call to ensure all nodes being freed
are marked as dead to ensure there are no other call paths besides the two
updated paths.

This is necessary for the RCU mode of the maple tree.

Link: https://lkml.kernel.org/r/20230227173632.3292573-6-surenb@google.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 18:06:21 -07:00
YiFei Zhu
5af607a861 selftests/bpf: Wait for receive in cg_storage_multi test
In some cases the loopback latency might be large enough, causing
the assertion on invocations to be run before ingress prog getting
executed. The assertion would fail and the test would flake.

This can be reliably reproduced by arbitrarily increasing the
loopback latency (thanks to [1]):
  tc qdisc add dev lo root handle 1: htb default 12
  tc class add dev lo parent 1:1 classid 1:12 htb rate 20kbps ceil 20kbps
  tc qdisc add dev lo parent 1:12 netem delay 100ms

Fix this by waiting on the receive end, instead of instantly
returning to the assert. The call to read() will wait for the
default SO_RCVTIMEO timeout of 3 seconds provided by
start_server().

[1] https://gist.github.com/kstevens715/4598301

Reported-by: Martin KaFai Lau <martin.lau@linux.dev>
Link: https://lore.kernel.org/bpf/9c5c8b7e-1d89-a3af-5400-14fde81f4429@linux.dev/
Fixes: 3573f38401 ("selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress")
Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Link: https://lore.kernel.org/r/20230405193354.1956209-1-zhuyifei@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-04-05 14:44:07 -07:00
Kal Conley
68e7322142 selftests: xsk: Deflakify STATS_RX_DROPPED test
Fix flaky STATS_RX_DROPPED test. The receiver calls getsockopt after
receiving the last (valid) packet which is not the final packet sent in
the test (valid and invalid packets are sent in alternating fashion with
the final packet being invalid). Since the last packet may or may not
have been dropped already, both outcomes must be allowed.

This issue could also be fixed by making sure the last packet sent is
valid. This alternative is left as an exercise to the reader (or the
benevolent maintainers of this file).

This problem was quite visible on certain setups. On one machine this
failure was observed 50% of the time.

Also, remove a redundant assignment of pkt_stream->nb_pkts. This field
is already initialized by __pkt_stream_alloc.

Fixes: 27e934bec3 ("selftests: xsk: make stat tests not spin on getsockopt")
Signed-off-by: Kal Conley <kal.conley@dectris.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230403120400.31018-1-kal.conley@dectris.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-04-05 12:43:52 -07:00
Kal Conley
f2b50f1726 selftests: xsk: Disable IPv6 on VETH1
This change fixes flakiness in the BIDIRECTIONAL test:

    # [is_pkt_valid] expected length [60], got length [90]
    not ok 1 FAIL: SKB BUSY-POLL BIDIRECTIONAL

When IPv6 is enabled, the interface will periodically send MLDv1 and
MLDv2 packets. These packets can cause the BIDIRECTIONAL test to fail
since it uses VETH0 for RX.

For other tests, this was not a problem since they only receive on VETH1
and IPv6 was already disabled on VETH0.

Fixes: a89052572e ("selftests/bpf: Xsk selftests framework")
Signed-off-by: Kal Conley <kal.conley@dectris.com>
Link: https://lore.kernel.org/r/20230405082905.6303-1-kal.conley@dectris.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-04-05 12:22:59 -07:00
Geert Uytterhoeven
8110a3cab0 kunit: tool: Add support for SH under QEMU
Add basic support to run SH under QEMU via kunit_tool using the
virtualized r2d platform.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-04-05 12:51:16 -06:00
Geert Uytterhoeven
5ffb8629b1 kunit: tool: Add support for overriding the QEMU serial port
On some platforms, the console is not the first serial port.  To make
this work, the first serial port in QEMU must be set to "null".

Add support for this by adding an optional "serial" parameter, which
defaults to "stdio", and can be overridden by platform-specific
configuration.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-04-05 12:51:06 -06:00
Kal Conley
ccd1b2933f selftests: xsk: Add test case for packets at end of UMEM
Add test case to testapp_invalid_desc for valid packets at the end of
the UMEM.

Signed-off-by: Kal Conley <kal.conley@dectris.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230403145047.33065-3-kal.conley@dectris.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-04-05 11:29:34 -07:00
Kal Conley
7a2050df24 selftests: xsk: Use correct UMEM size in testapp_invalid_desc
Avoid UMEM_SIZE macro in testapp_invalid_desc which is incorrect when
the frame size is not XSK_UMEM__DEFAULT_FRAME_SIZE. Also remove the
macro since it's no longer being used.

Fixes: 909f0e2820 ("selftests: xsk: Add tests for 2K frame size")
Signed-off-by: Kal Conley <kal.conley@dectris.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230403145047.33065-2-kal.conley@dectris.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-04-05 11:29:34 -07:00
Kal Conley
9af8716694 selftests: xsk: Add xskxceiver.h dependency to Makefile
xskxceiver depends on xskxceiver.h so tell make about it.

Signed-off-by: Kal Conley <kal.conley@dectris.com>
Link: https://lore.kernel.org/r/20230403130151.31195-1-kal.conley@dectris.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-04-05 11:18:09 -07:00
Joel Fernandes (Google)
8ae9985774 Merge branches 'rcu/staging-core', 'rcu/staging-docs' and 'rcu/staging-kfree', remote-tracking branches 'paul/srcu-cf.2023.04.04a', 'fbq/rcu/lockdep.2023.03.27a' and 'fbq/rcu/rcutorture.2023.03.20a' into rcu/staging 2023-04-05 13:50:37 +00:00
Paul E. McKenney
e035e8876e rcu: Remove CONFIG_SRCU
Now that all references to CONFIG_SRCU have been removed, it is time to
remove CONFIG_SRCU itself.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2023-04-05 13:47:41 +00:00
Alexei Starovoitov
69f41a7877 selftests/bpf: Add tracing tests for walking skb and req.
Add tracing tests for walking skb->sk and req->sk.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230404045029.82870-9-alexei.starovoitov@gmail.com
2023-04-04 16:57:27 -07:00
Ilya Leoshkevich
8fc59c26d2 selftests/bpf: Add RESOLVE_BTFIDS dependency to bpf_testmod.ko
bpf_testmod.ko sometimes fails to build from a clean checkout:

    BTF [M] linux/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.ko
    /bin/sh: 1: linux-build//tools/build/resolve_btfids/resolve_btfids: not found

The reason is that RESOLVE_BTFIDS may not yet be built. Fix by adding a
dependency.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230403172935.1553022-1-iii@linux.ibm.com
2023-04-04 16:16:42 -07:00
Jason Gunthorpe
62e37c86bf iommufd/selftest: Cover domain unmap with huge pages and access
Inspired by the syzkaller reproducer check the batch carry path with a
simple test.

Link: https://lore.kernel.org/r/4-v1-ceab6a4d7d7a+94-iommufd_syz_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-04 13:11:24 -03:00
Jason Gunthorpe
692d42d411 Merge branch 'iommufd/for-rc' into for-next
The following selftest patch requires both the bug fixes and the
improvements of the selftest framework.

* iommufd/for-rc:
  iommufd: Do not corrupt the pfn list when doing batch carry
  iommufd: Fix unpinning of pages when an access is present
  iommufd: Check for uptr overflow
  Linux 6.3-rc5

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-04 11:04:30 -03:00
Arseniy Krasnov
b5d54eb589 vsock/test: update expected return values
This updates expected return values for invalid buffer test. Now such
values are returned from transport, not from af_vsock.c.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-04 12:46:24 +02:00
Greg Kroah-Hartman
cd8fe5b6db Merge 6.3-rc5 into driver-core-next
We need the fixes in here for testing, as well as the driver core
changes for documentation updates to build on.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-03 09:33:30 +02:00
David Vernet
f85671c6ef bpf: Remove now-defunct task kfuncs
In commit 22df776a9a ("tasks: Extract rcu_users out of union"), the
'refcount_t rcu_users' field was extracted out of a union with the
'struct rcu_head rcu' field. This allows us to safely perform a
refcount_inc_not_zero() on task->rcu_users when acquiring a reference on
a task struct. A prior patch leveraged this by making struct task_struct
an RCU-protected object in the verifier, and by bpf_task_acquire() to
use the task->rcu_users field for synchronization.

Now that we can use RCU to protect tasks, we no longer need
bpf_task_kptr_get(), or bpf_task_acquire_not_zero(). bpf_task_kptr_get()
is truly completely unnecessary, as we can just use RCU to get the
object. bpf_task_acquire_not_zero() is now equivalent to
bpf_task_acquire().

In addition to these changes, this patch also updates the associated
selftests to no longer use these kfuncs.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230331195733.699708-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-01 09:07:20 -07:00
David Vernet
d02c48fa11 bpf: Make struct task_struct an RCU-safe type
struct task_struct objects are a bit interesting in terms of how their
lifetime is protected by refcounts. task structs have two refcount
fields:

1. refcount_t usage: Protects the memory backing the task struct. When
   this refcount drops to 0, the task is immediately freed, without
   waiting for an RCU grace period to elapse. This is the field that
   most callers in the kernel currently use to ensure that a task
   remains valid while it's being referenced, and is what's currently
   tracked with bpf_task_acquire() and bpf_task_release().

2. refcount_t rcu_users: A refcount field which, when it drops to 0,
   schedules an RCU callback that drops a reference held on the 'usage'
   field above (which is acquired when the task is first created). This
   field therefore provides a form of RCU protection on the task by
   ensuring that at least one 'usage' refcount will be held until an RCU
   grace period has elapsed. The qualifier "a form of" is important
   here, as a task can remain valid after task->rcu_users has dropped to
   0 and the subsequent RCU gp has elapsed.

In terms of BPF, we want to use task->rcu_users to protect tasks that
function as referenced kptrs, and to allow tasks stored as referenced
kptrs in maps to be accessed with RCU protection.

Let's first determine whether we can safely use task->rcu_users to
protect tasks stored in maps. All of the bpf_task* kfuncs can only be
called from tracepoint, struct_ops, or BPF_PROG_TYPE_SCHED_CLS, program
types. For tracepoint and struct_ops programs, the struct task_struct
passed to a program handler will always be trusted, so it will always be
safe to call bpf_task_acquire() with any task passed to a program.
Note, however, that we must update bpf_task_acquire() to be KF_RET_NULL,
as it is possible that the task has exited by the time the program is
invoked, even if the pointer is still currently valid because the main
kernel holds a task->usage refcount. For BPF_PROG_TYPE_SCHED_CLS, tasks
should never be passed as an argument to the any program handlers, so it
should not be relevant.

The second question is whether it's safe to use RCU to access a task
that was acquired with bpf_task_acquire(), and stored in a map. Because
bpf_task_acquire() now uses task->rcu_users, it follows that if the task
is present in the map, that it must have had at least one
task->rcu_users refcount by the time the current RCU cs was started.
Therefore, it's safe to access that task until the end of the current
RCU cs.

With all that said, this patch makes struct task_struct is an
RCU-protected object. In doing so, we also change bpf_task_acquire() to
be KF_ACQUIRE | KF_RCU | KF_RET_NULL, and adjust any selftests as
necessary. A subsequent patch will remove bpf_task_kptr_get(), and
bpf_task_acquire_not_zero() respectively.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230331195733.699708-2-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-01 09:07:20 -07:00
Andrii Nakryiko
ebf390c9d0 veristat: small fixed found in -O2 mode
Fix few potentially unitialized variables uses, found while building
veristat.c in release (-O2) mode.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230331222405.3468634-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-01 09:05:57 -07:00
Andrii Nakryiko
e3b65c0c1a veristat: avoid using kernel-internal headers
Drop linux/compiler.h include, which seems to be needed for ARRAY_SIZE
macro only. Redefine own version of ARRAY_SIZE instead.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230331222405.3468634-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-01 09:05:57 -07:00
Andrii Nakryiko
71c8c39f51 veristat: improve version reporting
For packaging version of the tool is important, so add a simple way to
specify veristat version for upstream mirror at Github.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230331222405.3468634-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-01 09:05:57 -07:00
Andrii Nakryiko
3ed85ae802 veristat: relicense veristat.c as dual GPL-2.0-only or BSD-2-Clause licensed
Dual-license veristat.c to dual GPL-2.0-only or BSD-2-Clause license.
This is needed to mirror it to Github to make it convenient for distro
packagers to package veristat as a separate package.

Veristat grew into a useful tool by itself, and there are already
a bunch of users relying on veristat as generic BPF loading and
verification helper tool. So making it easy to packagers by providing
Github mirror just like we do for bpftool and libbpf is the next step to
get veristat into the hands of users.

Apart from few typo fixes, I'm the sole contributor to veristat.c so
far, so no extra Acks should be needed for relicensing.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230331222405.3468634-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-01 09:05:56 -07:00
James Hilliard
9af0f555ae selftests/bpf: Fix conflicts with built-in functions in bench_local_storage_create
The fork function in gcc is considered a built in function due to
being used by libgcov when building with gnu extensions.

Rename fork to sched_process_fork to prevent this conflict.

See details:
d1c3882392
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82457

Fixes the following error:

In file included from progs/bench_local_storage_create.c:6:
progs/bench_local_storage_create.c:43:14: error: conflicting types for
built-in function 'fork'; expected 'int(void)'
[-Werror=builtin-declaration-mismatch]
   43 | int BPF_PROG(fork, struct task_struct *parent, struct
task_struct *child)
      |              ^~~~

Fixes: cbe9d93d58 ("selftests/bpf: Add bench for task storage creation")
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230331075848.1642814-1-james.hilliard1@gmail.com
2023-03-31 11:36:18 -07:00
Jiri Olsa
dcc46f51d7 selftests/bpf: Replace extract_build_id with read_build_id
Replacing extract_build_id with read_build_id that parses out
build id directly from elf without using readelf tool.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230331093157.1749137-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-31 09:40:16 -07:00