Commit Graph

1233453 Commits

Author SHA1 Message Date
Andreas Gruenbacher
703df114f9 gfs2: Two quota=account mode fixes
Make sure we don't skip accounting for quota changes with the
quota=account mount option.

Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-11-02 20:09:38 +01:00
Linus Torvalds
4652b8e4f3 Merge tag '6.7-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server updates from Steve French:
 "Seven ksmbd server fixes:

   - logoff improvement for multichannel bound connections

   - unicode fix for surrogate pairs

   - RDMA (smbdirect) fix for IB devices

   - fix locking deadlock in kern_path_create during rename

   - iov memory allocation fix

   - two minor cleanup patches (doc cleanup, and unused variable)"

* tag '6.7-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: no need to wait for binded connection termination at logoff
  ksmbd: add support for surrogate pair conversion
  ksmbd: fix missing RDMA-capable flag for IPoIB device in ksmbd_rdma_capable_netdev()
  ksmbd: fix recursive locking in vfs helpers
  ksmbd: fix kernel-doc comment of ksmbd_vfs_setxattr()
  ksmbd: reorganize ksmbd_iov_pin_rsp()
  ksmbd: Remove unused field in ksmbd_user struct
2023-11-02 08:32:07 -10:00
Linus Torvalds
71fb7b320b Merge tag 'fsnotify_for_v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify update from Jan Kara:
 "This time just one tiny cleanup for fsnotify"

* tag 'fsnotify_for_v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fanotify: delete useless parenthesis in FANOTIFY_INLINE_FH macro
2023-11-02 08:27:04 -10:00
Linus Torvalds
5efad0a765 Merge tag 'fs_for_v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, udf, and quota updates from Jan Kara:

 - conversion of ext2 directory code to use folios

 - cleanups in UDF declarations

 - bugfix for quota interaction with file encryption

* tag 'fs_for_v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: Convert ext2_prepare_chunk and ext2_commit_chunk to folios
  ext2: Convert ext2_make_empty() to use a folio
  ext2: Convert ext2_unlink() and ext2_rename() to use folios
  ext2: Convert ext2_delete_entry() to use folios
  ext2: Convert ext2_empty_dir() to use a folio
  ext2: Convert ext2_add_link() to use a folio
  ext2: Convert ext2_readdir to use a folio
  ext2: Add ext2_get_folio()
  ext2: Convert ext2_check_page to ext2_check_folio
  highmem: Add folio_release_kmap()
  udf: Avoid unneeded variable length array in struct fileIdentDesc
  udf: Annotate struct udf_bitmap with __counted_by
  quota: explicitly forbid quota files from being encrypted
2023-11-02 08:19:51 -10:00
Linus Torvalds
e9806ff8a0 Merge tag 'jfs-6.7' of https://github.com/kleikamp/linux-shaggy
Pull jfs updates from Dave Kleikamp:
 "Minor stability improvements"

* tag 'jfs-6.7' of https://github.com/kleikamp/linux-shaggy:
  jfs: define xtree root and page independently
  jfs: fix array-index-out-of-bounds in diAlloc
  jfs: fix array-index-out-of-bounds in dbFindLeaf
  fs/jfs: Add validity check for db_maxag and db_agpref
  fs/jfs: Add check for negative db_l2nbperpage
2023-11-02 08:08:28 -10:00
Linus Torvalds
dc737f11c2 Merge tag 'exfat-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat updates from Namjae Jeon:

 - Add ioctls to get and set file attribute that is used in
   the fatattr util

 - Add zero_size_dir mount option to avoid allocating a cluster
   when creating a directory

* tag 'exfat-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: support create zero-size directory
  exfat: support handle zero-size directory
  exfat: add ioctls for accessing attributes
2023-11-02 08:00:53 -10:00
Linus Torvalds
87a201b43b Merge tag 'erofs-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
 "Nothing exciting lands for this cycle, since we're still busying in
  developing support for sub-page blocks and large-folios of compressed
  data for new scenarios on Android.

  In this cycle, MicroLZMA format is marked as stable, and there are
  minor cleanups around documentation and codebase. In addition, it also
  fixes incorrect lockref usage in erofs_insert_workgroup().

  Summary:

   - Fix inode metadata space layout documentation

   - Avoid warning for MicroLZMA format anymore

   - Fix erofs_insert_workgroup() lockref usage

   - Some cleanups"

* tag 'erofs-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix erofs_insert_workgroup() lockref usage
  erofs: tidy up redundant includes
  erofs: get rid of ROOT_NID()
  erofs: simplify compression configuration parser
  erofs: don't warn MicroLZMA format anymore
  erofs: fix inode metadata space layout description in documentation
2023-11-02 07:53:57 -10:00
Linus Torvalds
57aff99745 Merge tag 'ext4_for_linus-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "Cleanup ext4's multi-block allocator, including adding some unit
  tests, as well as cleaning how we update the backup superblock after
  online resizes or updating the label or uuid.

  Optimize handling of released data blocks in ext4's commit machinery
  to avoid a potential lock contention on s_md_lock spinlock.

  Fix a number of ext4 bugs:

   - fix race between writepages and remount

   - fix racy may inline data check in dio write

   - add missed brelse in an error path in update_backups

   - fix umask handling when ACL support is disabled

   - fix lost EIO error when a journal commit races with a fsync of the
     blockdev

   - fix potential improper i_size when there is a crash right after an
     O_SYNC direct write.

   - check extent node for validity before potentially using what might
     be an invalid pointer

   - fix potential stale data exposure when writing to an unwritten
     extent and the file system is nearly out of space

   - fix potential accounting error around block reservations when
     writing partial delayed allocation writes to a bigalloc cluster

   - avoid memory allocation failure when tracking partial delayed
     allocation writes to a bigalloc cluster

   - fix various debugging print messages"

* tag 'ext4_for_linus-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (41 commits)
  ext4: properly sync file size update after O_SYNC direct IO
  ext4: fix racy may inline data check in dio write
  ext4: run mballoc test with different layouts setting
  ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc
  ext4: add some kunit stub for mballoc kunit test
  ext4: call ext4_mb_mark_context in ext4_group_add_blocks()
  ext4: Separate block bitmap and buddy bitmap freeing in ext4_group_add_blocks()
  ext4: call ext4_mb_mark_context in ext4_mb_clear_bb
  ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb()
  ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used
  ext4: extend ext4_mb_mark_context to support allocation under journal
  ext4: call ext4_mb_mark_context in ext4_free_blocks_simple
  ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb
  ext4: make state in ext4_mb_mark_bb to be bool
  jbd2: fix potential data lost in recovering journal raced with synchronizing fs bdev
  ext4: apply umask if ACL support is disabled
  ext4: mark buffer new if it is unwritten to avoid stale data exposure
  ext4: move 'ix' sanity check to corrent position
  jbd2: fix printk format type for 'io_block' in do_one_pass()
  jbd2: print io_block if check data block checksum failed when do recovery
  ...
2023-11-02 07:45:14 -10:00
Linus Torvalds
91a683cdf6 Merge tag 'dlm-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
 "This set of patches has some minor fixes for message handling, some
  misc cleanups, and updates the maintainers entry"

* tag 'dlm-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  MAINTAINERS: Update dlm maintainer and web page
  dlm: slow down filling up processing queue
  dlm: fix no ack after final message
  dlm: be sure we reset all nodes at forced shutdown
  dlm: fix remove member after close call
  dlm: fix creating multiple node structures
  fs: dlm: Remove some useless memset()
  fs: dlm: Fix the size of a buffer in dlm_create_debug_file()
  fs: dlm: Simplify buffer size computation in dlm_create_debug_file()
2023-11-02 07:40:07 -10:00
Andrea Righi
17fc8084aa module/decompress: use kvmalloc() consistently
We consistently switched from kmalloc() to vmalloc() in module
decompression to prevent potential memory allocation failures with large
modules, however vmalloc() is not as memory-efficient and fast as
kmalloc().

Since we don't know in general the size of the workspace required by the
decompression algorithm, it is more reasonable to use kvmalloc()
consistently, also considering that we don't have special memory
requirements here.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-11-02 07:35:39 -10:00
Linus Torvalds
ca219be012 Merge tag 'integrity-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
 "Four integrity changes: two IMA-overlay updates, an integrity Kconfig
  cleanup, and a secondary keyring update"

* tag 'integrity-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  ima: detect changes to the backing overlay file
  certs: Only allow certs signed by keys on the builtin keyring
  integrity: fix indentation of config attributes
  ima: annotate iint mutex to avoid lockdep false positive warnings
2023-11-02 06:53:22 -10:00
Björn Töpel
d84b139f53 selftests/bpf: Fix broken build where char is unsigned
There are architectures where char is not signed. If so, the following
error is triggered:

  | xdp_hw_metadata.c:435:42: error: result of comparison of constant -1 \
  |   with expression of type 'char' is always true \
  |   [-Werror,-Wtautological-constant-out-of-range-compare]
  |   435 |         while ((opt = getopt(argc, argv, "mh")) != -1) {
  |       |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^  ~~
  | 1 error generated.

Correct by changing the char to int.

Fixes: bb6a88885f ("selftests/bpf: Add options and frags to xdp_hw_metadata")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Larysa Zaremba <larysa.zaremba@intel.com>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Link: https://lore.kernel.org/r/20231102103537.247336-1-bjorn@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-02 07:57:21 -07:00
Shyam Prasad N
d9a6d78096 cifs: force interface update before a fresh session setup
During a session reconnect, it is possible that the
server moved to another physical server (happens in case
of Azure files). So at this time, force a query of server
interfaces again (in case of multichannel session), such
that the secondary channels connect to the right
IP addresses (possibly updated now).

Cc: stable@vger.kernel.org
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-02 08:06:06 -05:00
Shyam Prasad N
6e5e64c947 cifs: do not reset chan_max if multichannel is not supported at mount
If the mount command has specified multichannel as a mount option,
but multichannel is found to be unsupported by the server at the time
of mount, we set chan_max to 1. Which means that the user needs to
remount the share if the server starts supporting multichannel.

This change removes this reset. What it means is that if the user
specified multichannel or max_channels during mount, and at this
time, multichannel is not supported, but the server starts supporting
it at a later point, the client will be capable of scaling out the
number of channels.

Cc: stable@vger.kernel.org
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-02 08:06:06 -05:00
Shyam Prasad N
c3326a61cd cifs: reconnect helper should set reconnect for the right channel
We introduced a helper function to be used by non-cifsd threads to
mark the connection for reconnect. For multichannel, when only
a particular channel needs to be reconnected, this had a bug.

This change fixes that by marking that particular channel
for reconnect.

Fixes: dca65818c8 ("cifs: use a different reconnect helper for non-cifsd threads")
Cc: stable@vger.kernel.org
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-02 08:06:06 -05:00
Paulo Alcantara
5c86919455 smb: client: fix use-after-free in smb2_query_info_compound()
The following UAF was triggered when running fstests generic/072 with
KASAN enabled against Windows Server 2022 and mount options
'multichannel,max_channels=2,vers=3.1.1,mfsymlinks,noperm'

  BUG: KASAN: slab-use-after-free in smb2_query_info_compound+0x423/0x6d0 [cifs]
  Read of size 8 at addr ffff888014941048 by task xfs_io/27534

  CPU: 0 PID: 27534 Comm: xfs_io Not tainted 6.6.0-rc7 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
  rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  Call Trace:
   dump_stack_lvl+0x4a/0x80
   print_report+0xcf/0x650
   ? srso_alias_return_thunk+0x5/0x7f
   ? srso_alias_return_thunk+0x5/0x7f
   ? __phys_addr+0x46/0x90
   kasan_report+0xda/0x110
   ? smb2_query_info_compound+0x423/0x6d0 [cifs]
   ? smb2_query_info_compound+0x423/0x6d0 [cifs]
   smb2_query_info_compound+0x423/0x6d0 [cifs]
   ? __pfx_smb2_query_info_compound+0x10/0x10 [cifs]
   ? srso_alias_return_thunk+0x5/0x7f
   ? __stack_depot_save+0x39/0x480
   ? kasan_save_stack+0x33/0x60
   ? kasan_set_track+0x25/0x30
   ? ____kasan_slab_free+0x126/0x170
   smb2_queryfs+0xc2/0x2c0 [cifs]
   ? __pfx_smb2_queryfs+0x10/0x10 [cifs]
   ? __pfx___lock_acquire+0x10/0x10
   smb311_queryfs+0x210/0x220 [cifs]
   ? __pfx_smb311_queryfs+0x10/0x10 [cifs]
   ? srso_alias_return_thunk+0x5/0x7f
   ? __lock_acquire+0x480/0x26c0
   ? lock_release+0x1ed/0x640
   ? srso_alias_return_thunk+0x5/0x7f
   ? do_raw_spin_unlock+0x9b/0x100
   cifs_statfs+0x18c/0x4b0 [cifs]
   statfs_by_dentry+0x9b/0xf0
   fd_statfs+0x4e/0xb0
   __do_sys_fstatfs+0x7f/0xe0
   ? __pfx___do_sys_fstatfs+0x10/0x10
   ? srso_alias_return_thunk+0x5/0x7f
   ? lockdep_hardirqs_on_prepare+0x136/0x200
   ? srso_alias_return_thunk+0x5/0x7f
   do_syscall_64+0x3f/0x90
   entry_SYSCALL_64_after_hwframe+0x6e/0xd8

  Allocated by task 27534:
   kasan_save_stack+0x33/0x60
   kasan_set_track+0x25/0x30
   __kasan_kmalloc+0x8f/0xa0
   open_cached_dir+0x71b/0x1240 [cifs]
   smb2_query_info_compound+0x5c3/0x6d0 [cifs]
   smb2_queryfs+0xc2/0x2c0 [cifs]
   smb311_queryfs+0x210/0x220 [cifs]
   cifs_statfs+0x18c/0x4b0 [cifs]
   statfs_by_dentry+0x9b/0xf0
   fd_statfs+0x4e/0xb0
   __do_sys_fstatfs+0x7f/0xe0
   do_syscall_64+0x3f/0x90
   entry_SYSCALL_64_after_hwframe+0x6e/0xd8

  Freed by task 27534:
   kasan_save_stack+0x33/0x60
   kasan_set_track+0x25/0x30
   kasan_save_free_info+0x2b/0x50
   ____kasan_slab_free+0x126/0x170
   slab_free_freelist_hook+0xd0/0x1e0
   __kmem_cache_free+0x9d/0x1b0
   open_cached_dir+0xff5/0x1240 [cifs]
   smb2_query_info_compound+0x5c3/0x6d0 [cifs]
   smb2_queryfs+0xc2/0x2c0 [cifs]

This is a race between open_cached_dir() and cached_dir_lease_break()
where the cache entry for the open directory handle receives a lease
break while creating it.  And before returning from open_cached_dir(),
we put the last reference of the new @cfid because of
!@cfid->has_lease.

Besides the UAF, while running xfstests a lot of missed lease breaks
have been noticed in tests that run several concurrent statfs(2) calls
on those cached fids

  CIFS: VFS: \\w22-root1.gandalf.test No task to wake, unknown frame...
  CIFS: VFS: \\w22-root1.gandalf.test Cmd: 18 Err: 0x0 Flags: 0x1...
  CIFS: VFS: \\w22-root1.gandalf.test smb buf 00000000715bfe83 len 108
  CIFS: VFS: Dump pending requests:
  CIFS: VFS: \\w22-root1.gandalf.test No task to wake, unknown frame...
  CIFS: VFS: \\w22-root1.gandalf.test Cmd: 18 Err: 0x0 Flags: 0x1...
  CIFS: VFS: \\w22-root1.gandalf.test smb buf 000000005aa7316e len 108
  ...

To fix both, in open_cached_dir() ensure that @cfid->has_lease is set
right before sending out compounded request so that any potential
lease break will be get processed by demultiplex thread while we're
still caching @cfid.  And, if open failed for some reason, re-check
@cfid->has_lease to decide whether or not put lease reference.

Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-02 08:06:06 -05:00
Paulo Alcantara
c37ed2d7d0 smb: client: remove extra @chan_count check in __cifs_put_smb_ses()
If @ses->chan_count <= 1, then for-loop body will not be executed so
no need to check it twice.

Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-02 08:05:45 -05:00
Patrick Thompson
efa5f1311c net: r8169: Disable multicast filter for RTL8168H and RTL8107E
RTL8168H and RTL8107E ethernet adapters erroneously filter unicast
eapol packets unless allmulti is enabled. These devices correspond to
RTL_GIGA_MAC_VER_46 and VER_48. Add an exception for VER_46 and VER_48
in the same way that VER_35 has an exception.

Fixes: 6e1d0b8988 ("r8169:add support for RTL8168H and RTL8107E")
Signed-off-by: Patrick Thompson <ptf@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/20231030205031.177855-1-ptf@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-02 13:07:14 +01:00
Petr Mladek
2966bd3698 Merge branch 'rework/nbcon-base' into for-linus 2023-11-02 13:04:59 +01:00
Petr Mladek
86098bcdde Merge branch 'rework/misc-cleanups' into for-linus 2023-11-02 13:00:34 +01:00
Petr Mladek
adb982ad4b Merge branch 'for-6.7' into for-linus 2023-11-02 13:00:20 +01:00
Paolo Abeni
ff2c051fdc Merge branch 'dccp-tcp-relocate-security_inet_conn_request'
Kuniyuki Iwashima says:

====================
dccp/tcp: Relocate security_inet_conn_request().

security_inet_conn_request() reads reqsk's remote address, but it's not
initialised in some places.

Let's make sure the address is set before security_inet_conn_request().
====================

Link: https://lore.kernel.org/r/20231030201042.32885-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-02 12:56:03 +01:00
Kuniyuki Iwashima
23be1e0e2a dccp/tcp: Call security_inet_conn_request() after setting IPv6 addresses.
Initially, commit 4237c75c0a ("[MLSXFRM]: Auto-labeling of child
sockets") introduced security_inet_conn_request() in some functions
where reqsk is allocated.  The hook is added just after the allocation,
so reqsk's IPv6 remote address was not initialised then.

However, SELinux/Smack started to read it in netlbl_req_setattr()
after commit e1adea9270 ("calipso: Allow request sockets to be
relabelled by the lsm.").

Commit 284904aa79 ("lsm: Relocate the IPv4 security_inet_conn_request()
hooks") fixed that kind of issue only in TCPv4 because IPv6 labeling was
not supported at that time.  Finally, the same issue was introduced again
in IPv6.

Let's apply the same fix on DCCPv6 and TCPv6.

Fixes: e1adea9270 ("calipso: Allow request sockets to be relabelled by the lsm.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-02 12:56:03 +01:00
Kuniyuki Iwashima
fa2df45af1 dccp: Call security_inet_conn_request() after setting IPv4 addresses.
Initially, commit 4237c75c0a ("[MLSXFRM]: Auto-labeling of child
sockets") introduced security_inet_conn_request() in some functions
where reqsk is allocated.  The hook is added just after the allocation,
so reqsk's IPv4 remote address was not initialised then.

However, SELinux/Smack started to read it in netlbl_req_setattr()
after the cited commits.

This bug was partially fixed by commit 284904aa79 ("lsm: Relocate
the IPv4 security_inet_conn_request() hooks").

This patch fixes the last bug in DCCPv4.

Fixes: 389fb800ac ("netlabel: Label incoming TCP connections correctly in SELinux")
Fixes: 07feee8f81 ("netlabel: Cleanup the Smack/NetLabel code to fix incoming TCP connections")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-02 12:55:42 +01:00
Gerd Bayer
a1602d7490 net/smc: fix documentation of buffer sizes
Since commit 833bac7ec3 ("net/smc: Fix setsockopt and sysctl to
specify same buffer size again") the SMC protocol uses its own
default values for the smc.rmem and smc.wmem sysctl variables
which are no longer derived from the TCP IPv4 buffer sizes.

Fixup the kernel documentation to reflect this change, too.

Fixes: 833bac7ec3 ("net/smc: Fix setsockopt and sysctl to specify same buffer size again")
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231030170343.748097-1-gbayer@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-02 12:50:17 +01:00
Jian Shen
8ffbd1669e net: page_pool: add missing free_percpu when page_pool_init fail
When ptr_ring_init() returns failure in page_pool_init(), free_percpu()
is not called to free pool->recycle_stats, which may cause memory
leak.

Fixes: ad6fa1e1ab ("page_pool: Add recycle stats")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/r/20231030091256.2915394-1-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-02 12:27:53 +01:00
Andrew Lunn
f55d8e60f1 net: ethtool: Fix documentation of ethtool_sprintf()
This function takes a pointer to a pointer, unlike sprintf() which is
passed a plain pointer. Fix up the documentation to make this clear.

Fixes: 7888fe53b7 ("ethtool: Add common function for filling out strings")
Cc: Alexander Duyck <alexanderduyck@fb.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20231028192511.100001-1-andrew@lunn.ch
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-02 12:17:17 +01:00
Alexander Sverdlin
5a22fbcc10 net: dsa: lan9303: consequently nested-lock physical MDIO
When LAN9303 is MDIO-connected two callchains exist into
mdio->bus->write():

1. switch ports 1&2 ("physical" PHYs):

virtual (switch-internal) MDIO bus (lan9303_switch_ops->phy_{read|write})->
  lan9303_mdio_phy_{read|write} -> mdiobus_{read|write}_nested

2. LAN9303 virtual PHY:

virtual MDIO bus (lan9303_phy_{read|write}) ->
  lan9303_virt_phy_reg_{read|write} -> regmap -> lan9303_mdio_{read|write}

If the latter functions just take
mutex_lock(&sw_dev->device->bus->mdio_lock) it triggers a LOCKDEP
false-positive splat. It's false-positive because the first
mdio_lock in the second callchain above belongs to virtual MDIO bus, the
second mdio_lock belongs to physical MDIO bus.

Consequent annotation in lan9303_mdio_{read|write} as nested lock
(similar to lan9303_mdio_phy_{read|write}, it's the same physical MDIO bus)
prevents the following splat:

WARNING: possible circular locking dependency detected
5.15.71 #1 Not tainted
------------------------------------------------------
kworker/u4:3/609 is trying to acquire lock:
ffff000011531c68 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}, at: regmap_lock_mutex
but task is already holding lock:
ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&bus->mdio_lock){+.+.}-{3:3}:
       lock_acquire
       __mutex_lock
       mutex_lock_nested
       lan9303_mdio_read
       _regmap_read
       regmap_read
       lan9303_probe
       lan9303_mdio_probe
       mdio_probe
       really_probe
       __driver_probe_device
       driver_probe_device
       __device_attach_driver
       bus_for_each_drv
       __device_attach
       device_initial_probe
       bus_probe_device
       deferred_probe_work_func
       process_one_work
       worker_thread
       kthread
       ret_from_fork
-> #0 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}:
       __lock_acquire
       lock_acquire.part.0
       lock_acquire
       __mutex_lock
       mutex_lock_nested
       regmap_lock_mutex
       regmap_read
       lan9303_phy_read
       dsa_slave_phy_read
       __mdiobus_read
       mdiobus_read
       get_phy_device
       mdiobus_scan
       __mdiobus_register
       dsa_register_switch
       lan9303_probe
       lan9303_mdio_probe
       mdio_probe
       really_probe
       __driver_probe_device
       driver_probe_device
       __device_attach_driver
       bus_for_each_drv
       __device_attach
       device_initial_probe
       bus_probe_device
       deferred_probe_work_func
       process_one_work
       worker_thread
       kthread
       ret_from_fork
other info that might help us debug this:
 Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&bus->mdio_lock);
                               lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock);
                               lock(&bus->mdio_lock);
  lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock);
*** DEADLOCK ***
5 locks held by kworker/u4:3/609:
 #0: ffff000002842938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work
 #1: ffff80000bacbd60 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work
 #2: ffff000007645178 (&dev->mutex){....}-{3:3}, at: __device_attach
 #3: ffff8000096e6e78 (dsa2_mutex){+.+.}-{3:3}, at: dsa_register_switch
 #4: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read
stack backtrace:
CPU: 1 PID: 609 Comm: kworker/u4:3 Not tainted 5.15.71 #1
Workqueue: events_unbound deferred_probe_work_func
Call trace:
 dump_backtrace
 show_stack
 dump_stack_lvl
 dump_stack
 print_circular_bug
 check_noncircular
 __lock_acquire
 lock_acquire.part.0
 lock_acquire
 __mutex_lock
 mutex_lock_nested
 regmap_lock_mutex
 regmap_read
 lan9303_phy_read
 dsa_slave_phy_read
 __mdiobus_read
 mdiobus_read
 get_phy_device
 mdiobus_scan
 __mdiobus_register
 dsa_register_switch
 lan9303_probe
 lan9303_mdio_probe
...

Cc: stable@vger.kernel.org
Fixes: dc70058315 ("net: dsa: LAN9303: add MDIO managed mode support")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20231027065741.534971-1-alexander.sverdlin@siemens.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-02 10:48:09 +01:00
Ratheesh Kannoth
7aeeb2cb7a octeontx2-pf: Fix holes in error code
Error code strings are not getting printed properly
due to holes. Print error code as well.

Fixes: 51afe9026d ("octeontx2-pf: NIX TX overwrites SQ_CTX_HW_S[SQ_INT]")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Link: https://lore.kernel.org/r/20231027021953.1819959-2-rkannoth@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-02 10:40:15 +01:00
Ratheesh Kannoth
96b9a68d1a octeontx2-pf: Fix error codes
Some of error codes were wrong. Fix the same.

Fixes: 51afe9026d ("octeontx2-pf: NIX TX overwrites SQ_CTX_HW_S[SQ_INT]")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Link: https://lore.kernel.org/r/20231027021953.1819959-1-rkannoth@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-02 10:40:15 +01:00
Masami Hiramatsu
63f1ee2061 locking/atomic: sh: Use generic_cmpxchg_local for arch_cmpxchg_local()
Use __generic_cmpxchg_local() for arch_cmpxchg_local() implementation
on SH architecture because it does not implement arch_cmpxchg_local().

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202310241310.Ir5uukOG-lkp@intel.com/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/r/169824660459.24340.14614817132696360531.stgit@devnote2
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2023-11-02 10:13:12 +01:00
Eric Dumazet
1726483b79 inet: shrink struct flowi_common
I am looking at syzbot reports triggering kernel stack overflows
involving a cascade of ipvlan devices.

We can save 8 bytes in struct flowi_common.

This patch alone will not fix the issue, but is a start.

Fixes: 24ba14406c ("route: Add multipath_hash in flowi_common to make user-define hash")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: wenxu <wenxu@ucloud.cn>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20231025141037.3448203-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-02 09:31:02 +01:00
Linus Torvalds
21e80f3841 Merge tag 'modules-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull modules updates from Luis Chamberlain:
 "The only thing worth highligthing is that gzip moves to use vmalloc()
  instead of kmalloc just as we had a fix for this for zstd on v6.6-rc1.

  The rest is regular house keeping, keeping things neat, tidy, and
  boring"

[ The kmalloc -> vmalloc conversion is not the right approach.

  Unless you know you need huge areas or know you need to use virtual
  mappings for some reason (playing with protection bits or whatever),
  you should use kvmalloc()/kvfree, which automatically picks the right
  allocation model    - Linus ]

* tag 'modules-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  module: Annotate struct module_notes_attrs with __counted_by
  module: Fix comment typo
  module: Make is_valid_name() return bool
  module: Make is_mapping_symbol() return bool
  module/decompress: use vmalloc() for gzip decompression workspace
  MAINTAINERS: add include/linux/module*.h to modules
  module: Clarify documentation of module_param_call()
2023-11-01 21:09:37 -10:00
Linus Torvalds
426ee5196d Merge tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull sysctl updates from Luis Chamberlain:
 "To help make the move of sysctls out of kernel/sysctl.c not incur a
  size penalty sysctl has been changed to allow us to not require the
  sentinel, the final empty element on the sysctl array. Joel Granados
  has been doing all this work. On the v6.6 kernel we got the major
  infrastructure changes required to support this. For v6.7-rc1 we have
  all arch/ and drivers/ modified to remove the sentinel. Both arch and
  driver changes have been on linux-next for a bit less than a month. It
  is worth re-iterating the value:

   - this helps reduce the overall build time size of the kernel and run
     time memory consumed by the kernel by about ~64 bytes per array

   - the extra 64-byte penalty is no longer inncurred now when we move
     sysctls out from kernel/sysctl.c to their own files

  For v6.8-rc1 expect removal of all the sentinels and also then the
  unneeded check for procname == NULL.

  The last two patches are fixes recently merged by Krister Johansen
  which allow us again to use softlockup_panic early on boot. This used
  to work but the alias work broke it. This is useful for folks who want
  to detect softlockups super early rather than wait and spend money on
  cloud solutions with nothing but an eventual hung kernel. Although
  this hadn't gone through linux-next it's also a stable fix, so we
  might as well roll through the fixes now"

* tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (23 commits)
  watchdog: move softlockup_panic back to early_param
  proc: sysctl: prevent aliased sysctls from getting passed to init
  intel drm: Remove now superfluous sentinel element from ctl_table array
  Drivers: hv: Remove now superfluous sentinel element from ctl_table array
  raid: Remove now superfluous sentinel element from ctl_table array
  fw loader: Remove the now superfluous sentinel element from ctl_table array
  sgi-xp: Remove the now superfluous sentinel element from ctl_table array
  vrf: Remove the now superfluous sentinel element from ctl_table array
  char-misc: Remove the now superfluous sentinel element from ctl_table array
  infiniband: Remove the now superfluous sentinel element from ctl_table array
  macintosh: Remove the now superfluous sentinel element from ctl_table array
  parport: Remove the now superfluous sentinel element from ctl_table array
  scsi: Remove now superfluous sentinel element from ctl_table array
  tty: Remove now superfluous sentinel element from ctl_table array
  xen: Remove now superfluous sentinel element from ctl_table array
  hpet: Remove now superfluous sentinel element from ctl_table array
  c-sky: Remove now superfluous sentinel element from ctl_talbe array
  powerpc: Remove now superfluous sentinel element from ctl_table arrays
  riscv: Remove now superfluous sentinel element from ctl_table array
  x86/vdso: Remove now superfluous sentinel element from ctl_table array
  ...
2023-11-01 20:51:41 -10:00
Alexei Starovoitov
94e88b8a3e Merge branch 'bpf-fix-precision-tracking-for-bpf_alu-bpf_to_be-bpf_end'
Shung-Hsi Yu says:

====================
bpf: Fix precision tracking for BPF_ALU | BPF_TO_BE | BPF_END

Changes since v1:
- add test for negation and bswap (Alexei, Eduard)
- add test for BPF_TO_LE as well to cover all types of BPF_END opcode
- remove vals map and trigger backtracking with jump instead, based of
  Eduard's code
- v1 at https://lore.kernel.org/bpf/20231030132145.20867-1-shung-hsi.yu@suse.com

This patchset fixes and adds selftest for the issue reported by Mohamed
Mahmoud and Toke Høiland-Jørgensen where the kernel can run into a
verifier bug during backtracking of BPF_ALU | BPF_TO_BE | BPF_END
instruction[0]. As seen in the verifier log below, r0 was incorrectly
marked as precise even tough its value was not being used.

Patch 1 fixes the issue based on Andrii's analysis, and patch 2 adds a
selftest for such case using inline assembly. Please see individual
patch for detail.

    ...
	mark_precise: frame2: regs=r2 stack= before 1891: (77) r2 >>= 56
	mark_precise: frame2: regs=r2 stack= before 1890: (dc) r2 = be64 r2
	mark_precise: frame2: regs=r0,r2 stack= before 1889: (73) *(u8 *)(r1 +47) = r3
	...
	mark_precise: frame2: regs=r0 stack= before 212: (85) call pc+1617
	BUG regs 1
	processed 5112 insns (limit 1000000) max_states_per_insn 4 total_states 92 peak_states 90 mark_read 20

0: https://lore.kernel.org/r/87jzrrwptf.fsf@toke.dk

Shung-Hsi Yu (2):
  bpf: Fix precision tracking for BPF_ALU | BPF_TO_BE | BPF_END
  selftests/bpf: precision tracking test for BPF_NEG and BPF_END

 kernel/bpf/verifier.c                         |  7 +-
 .../selftests/bpf/prog_tests/verifier.c       |  2 +
 .../selftests/bpf/progs/verifier_precision.c  | 93 +++++++++++++++++++
 3 files changed, 101 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_precision.c

base-commit: c17cda15cc
====================

Link: https://lore.kernel.org/r/20231102053913.12004-1-shung-hsi.yu@suse.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-01 22:54:28 -07:00
Shung-Hsi Yu
3c41971550 selftests/bpf: precision tracking test for BPF_NEG and BPF_END
As seen from previous commit that fix backtracking for BPF_ALU | BPF_TO_BE
| BPF_END, both BPF_NEG and BPF_END require special handling. Add tests
written with inline assembly to check that the verifier does not incorrecly
use the src_reg field of BPF_NEG and BPF_END (including bswap added in v4).

Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Link: https://lore.kernel.org/r/20231102053913.12004-4-shung-hsi.yu@suse.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-01 22:54:28 -07:00
Shung-Hsi Yu
291d044fd5 bpf: Fix precision tracking for BPF_ALU | BPF_TO_BE | BPF_END
BPF_END and BPF_NEG has a different specification for the source bit in
the opcode compared to other ALU/ALU64 instructions, and is either
reserved or use to specify the byte swap endianness. In both cases the
source bit does not encode source operand location, and src_reg is a
reserved field.

backtrack_insn() currently does not differentiate BPF_END and BPF_NEG
from other ALU/ALU64 instructions, which leads to r0 being incorrectly
marked as precise when processing BPF_ALU | BPF_TO_BE | BPF_END
instructions. This commit teaches backtrack_insn() to correctly mark
precision for such case.

While precise tracking of BPF_NEG and other BPF_END instructions are
correct and does not need fixing, this commit opt to process all BPF_NEG
and BPF_END instructions within the same if-clause to better align with
current convention used in the verifier (e.g. check_alu_op).

Fixes: b5dc0163d8 ("bpf: precise scalar_value tracking")
Cc: stable@vger.kernel.org
Reported-by: Mohamed Mahmoud <mmahmoud@redhat.com>
Closes: https://lore.kernel.org/r/87jzrrwptf.fsf@toke.dk
Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
Tested-by: Tao Lyu <tao.lyu@epfl.ch>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Link: https://lore.kernel.org/r/20231102053913.12004-2-shung-hsi.yu@suse.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-01 22:54:27 -07:00
Alexei Starovoitov
698b8c5e3b Merge branch 'relax-allowlist-for-open-coded-css_task-iter'
Chuyi Zhou says:

====================
Relax allowlist for open-coded css_task iter

Hi,
The patchset aims to relax the allowlist for open-coded css_task iter
suggested by Alexei[1].

Please see individual patches for more details. And comments are always
welcome.

Patch summary:
 * Patch #1: Relax the allowlist and let css_task iter can be used in
   bpf iters and any sleepable progs.
 * Patch #2: Add a test in cgroup_iters.c which demonstrates how
   css_task iters can be combined with cgroup iter.
 * Patch #3: Add a test to prove css_task iter can be used in normal
 * sleepable progs.
link[1]:https://lore.kernel.org/lkml/CAADnVQKafk_junRyE=-FVAik4hjTRDtThymYGEL8hGTuYoOGpA@mail.gmail.com/
---

Changes in v2:
 * Fix the incorrect logic in check_css_task_iter_allowlist. Use
   expected_attach_type to check whether we are using bpf_iters.
 * Link to v1:https://lore.kernel.org/bpf/20231022154527.229117-1-zhouchuyi@bytedance.com/T/#m946f9cde86b44a13265d9a44c5738a711eb578fd
Changes in v3:
 * Add a testcase to prove css_task can be used in fentry.s
 * Link to v2:https://lore.kernel.org/bpf/20231024024240.42790-1-zhouchuyi@bytedance.com/T/#m14a97041ff56c2df21bc0149449abd275b73f6a3
Changes in v4:
 * Add Yonghong's ack for patch #1 and patch #2.
 * Solve Yonghong's comments for patch #2
 * Move prog 'iter_css_task_for_each_sleep' from iters_task_failure.c to
   iters_css_task.c. Use RUN_TESTS to prove we can load this prog.
 * Link to v3:https://lore.kernel.org/bpf/20231025075914.30979-1-zhouchuyi@bytedance.com/T/#m3200d8ad29af4ffab97588e297361d0a45d7585d

---
====================

Link: https://lore.kernel.org/r/20231031050438.93297-1-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-01 22:49:20 -07:00
Chuyi Zhou
d8234d47c4 selftests/bpf: Add test for using css_task iter in sleepable progs
This Patch add a test to prove css_task iter can be used in normal
sleepable progs.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20231031050438.93297-4-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-01 22:49:20 -07:00
Chuyi Zhou
f49843afde selftests/bpf: Add tests for css_task iter combining with cgroup iter
This patch adds a test which demonstrates how css_task iter can be combined
with cgroup iter and it won't cause deadlock, though cgroup iter is not
sleepable.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20231031050438.93297-3-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-01 22:49:20 -07:00
Chuyi Zhou
3091b66749 bpf: Relax allowlist for css_task iter
The newly added open-coded css_task iter would try to hold the global
css_set_lock in bpf_iter_css_task_new, so the bpf side has to be careful in
where it allows to use this iter. The mainly concern is dead locking on
css_set_lock. check_css_task_iter_allowlist() in verifier enforced css_task
can only be used in bpf_lsm hooks and sleepable bpf_iter.

This patch relax the allowlist for css_task iter. Any lsm and any iter
(even non-sleepable) and any sleepable are safe since they would not hold
the css_set_lock before entering BPF progs context.

This patch also fixes the misused BPF_TRACE_ITER in
check_css_task_iter_allowlist which compared bpf_prog_type with
bpf_attach_type.

Fixes: 9c66dc94b6 ("bpf: Introduce css_task open-coded iterator kfuncs")
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20231031050438.93297-2-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-01 22:49:20 -07:00
Andrii Nakryiko
9af3775962 selftests/bpf: fix test_maps' use of bpf_map_create_opts
Use LIBBPF_OPTS() macro to properly initialize bpf_map_create_opts in
test_maps' tests.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20231029011509.2479232-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-01 22:42:38 -07:00
Hou Tao
fd381ce60a bpf: Check map->usercnt after timer->timer is assigned
When there are concurrent uref release and bpf timer init operations,
the following sequence diagram is possible. It will break the guarantee
provided by bpf_timer: bpf_timer will still be alive after userspace
application releases or unpins the map. It also will lead to kmemleak
for old kernel version which doesn't release bpf_timer when map is
released.

bpf program X:

bpf_timer_init()
  lock timer->lock
    read timer->timer as NULL
    read map->usercnt != 0

                process Y:

                close(map_fd)
                  // put last uref
                  bpf_map_put_uref()
                    atomic_dec_and_test(map->usercnt)
                      array_map_free_timers()
                        bpf_timer_cancel_and_free()
                          // just return
                          read timer->timer is NULL

    t = bpf_map_kmalloc_node()
    timer->timer = t
  unlock timer->lock

Fix the problem by checking map->usercnt after timer->timer is assigned,
so when there are concurrent uref release and bpf timer init, either
bpf_timer_cancel_and_free() from uref release reads a no-NULL timer
or the newly-added atomic64_read() returns a zero usercnt.

Because atomic_dec_and_test(map->usercnt) and READ_ONCE(timer->timer)
in bpf_timer_cancel_and_free() are not protected by a lock, so add
a memory barrier to guarantee the order between map->usercnt and
timer->timer. Also use WRITE_ONCE(timer->timer, x) to match the lockless
read of timer->timer in bpf_timer_cancel_and_free().

Reported-by: Hsin-Wei Hung <hsinweih@uci.edu>
Closes: https://lore.kernel.org/bpf/CABcoxUaT2k9hWsS1tNgXyoU3E-=PuOgMn737qK984fbFmfYixQ@mail.gmail.com
Fixes: b00628b1c7 ("bpf: Introduce bpf timers.")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231030063616.1653024-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-01 22:37:31 -07:00
Varadarajan Narayanan
5b5b5806f2 cpufreq: qcom-nvmem: Introduce cpufreq for ipq95xx
IPQ95xx SoCs have different OPPs available for the CPU based on
the SoC variant. This can be determined from an eFuse register
present in the silicon.

Added support for ipq95xx on nvmem driver which helps to
determine OPPs at runtime based on the eFuse register which
has the CPU frequency limits. opp-supported-hw dt binding
can be used to indicate the available OPPs for each limit.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Praveenkumar I <ipkumar@codeaurora.org>
Signed-off-by: Kathiravan T <quic_kathirav@quicinc.com>
Signed-off-by: Varadarajan Narayanan <quic_varada@quicinc.com>
[ Viresh: Fixed subject ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2023-11-02 11:04:53 +05:30
Varadarajan Narayanan
ba5a61a08d cpufreq: qcom-nvmem: Enable cpufreq for ipq53xx
IPQ53xx have different OPPs available for the CPU based on
SoC variant. This can be determined through use of an eFuse
register present in the silicon.

Added support for ipq53xx on nvmem driver which helps to
determine OPPs at runtime based on the eFuse register which
has the CPU frequency limits. opp-supported-hw dt binding
can be used to indicate the available OPPs for each limit.

nvmem driver also creates the "cpufreq-dt" platform_device after
passing the version matching data to the OPP framework so that the
cpufreq-dt handles the actual cpufreq implementation.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kathiravan T <quic_kathirav@quicinc.com>
Signed-off-by: Varadarajan Narayanan <quic_varada@quicinc.com>
[ Viresh: Fixed subject ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2023-11-02 11:04:53 +05:30
Robert Marko
0b9cd94913 cpufreq: qcom-nvmem: add support for IPQ8074
IPQ8074 comes in 3 families:
* IPQ8070A/IPQ8071A (Acorn) up to 1.4GHz
* IPQ8172/IPQ8173/IPQ8174 (Oak) up to 1.4GHz
* IPQ8072A/IPQ8074A/IPQ8076A/IPQ8078A (Hawkeye) up to 2.2GHz

So, in order to be able to share one OPP table lets add support for IPQ8074
family based of SMEM SoC ID-s as speedbin fuse is always 0 on IPQ8074.

IPQ8074 compatible is blacklisted from DT platdev as the cpufreq device
will get created by NVMEM CPUFreq driver.

Signed-off-by: Robert Marko <robimarko@gmail.com>
Acked-by: Konrad Dybcio <konrad.dybcio@linaro.org>
[ Viresh: Fixed rebase conflict. ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2023-11-02 11:04:53 +05:30
Viresh Kumar
477348dbfd Merge branch 'cpufreq/arm/qcom-nvmem' into HEAD
Merge base changes for cpufreq support for IPQ8074.
2023-11-02 11:04:33 +05:30
Dave Marchevsky
15fb6f2b6c bpf: Add __bpf_hook_{start,end} macros
Not all uses of __diag_ignore_all(...) in BPF-related code in order to
suppress warnings are wrapping kfunc definitions. Some "hook point"
definitions - small functions meant to be used as attach points for
fentry and similar BPF progs - need to suppress -Wmissing-declarations.

We could use __bpf_kfunc_{start,end}_defs added in the previous patch in
such cases, but this might be confusing to someone unfamiliar with BPF
internals. Instead, this patch adds __bpf_hook_{start,end} macros,
currently having the same effect as __bpf_kfunc_{start,end}_defs, then
uses them to suppress warnings for two hook points in the kernel itself
and some bpf_testmod hook points as well.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20231031215625.2343848-2-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-01 22:33:53 -07:00
Dave Marchevsky
391145ba2a bpf: Add __bpf_kfunc_{start,end}_defs macros
BPF kfuncs are meant to be called from BPF programs. Accordingly, most
kfuncs are not called from anywhere in the kernel, which the
-Wmissing-prototypes warning is unhappy about. We've peppered
__diag_ignore_all("-Wmissing-prototypes", ... everywhere kfuncs are
defined in the codebase to suppress this warning.

This patch adds two macros meant to bound one or many kfunc definitions.
All existing kfunc definitions which use these __diag calls to suppress
-Wmissing-prototypes are migrated to use the newly-introduced macros.
A new __diag_ignore_all - for "-Wmissing-declarations" - is added to the
__bpf_kfunc_start_defs macro based on feedback from Andrii on an earlier
version of this patch [0] and another recent mailing list thread [1].

In the future we might need to ignore different warnings or do other
kfunc-specific things. This change will make it easier to make such
modifications for all kfunc defs.

  [0]: https://lore.kernel.org/bpf/CAEf4BzaE5dRWtK6RPLnjTW-MW9sx9K3Fn6uwqCTChK2Dcb1Xig@mail.gmail.com/
  [1]: https://lore.kernel.org/bpf/ZT+2qCc%2FaXep0%2FLf@krava/

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20231031215625.2343848-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-01 22:33:53 -07:00
Manu Bretelle
cd60f410dd selftests/bpf: fix test_bpffs
Currently this tests tries to umount /sys/kernel/debug (TDIR) but the
system it is running on may have mounts below.

For example, danobi/vmtest [0] VMs have
    mount -t tracefs tracefs /sys/kernel/debug/tracing
as part of their init.

This change instead creates a "random" directory under /tmp and uses this
as TDIR.
If the directory already exists, ignore the error and keep moving on.

Test:

Originally:

    $ vmtest -k $KERNEL_REPO/arch/x86_64/boot/bzImage "./test_progs -vv -a test_bpffs"
    => bzImage
    ===> Booting
    ===> Setting up VM
    ===> Running command
    [    2.138818] bpf_testmod: loading out-of-tree module taints kernel.
    [    2.140913] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
    bpf_testmod.ko is already unloaded.
    Loading bpf_testmod.ko...
    Successfully loaded bpf_testmod.ko.
    test_test_bpffs:PASS:clone 0 nsec
    fn:PASS:unshare 0 nsec
    fn:PASS:mount / 0 nsec
    fn:FAIL:umount /sys/kernel/debug unexpected error: -1 (errno 16)
    bpf_testmod.ko is already unloaded.
    Loading bpf_testmod.ko...
    Successfully loaded bpf_testmod.ko.
    test_test_bpffs:PASS:clone 0 nsec
    test_test_bpffs:PASS:waitpid 0 nsec
    test_test_bpffs:FAIL:bpffs test  failed 255#282     test_bpffs:FAIL
    Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
    Successfully unloaded bpf_testmod.ko.
    Command failed with exit code: 1

After this change:

    $ vmtest -k $(make image_name) 'cd tools/testing/selftests/bpf && ./test_progs -vv -a test_bpffs'
    => bzImage
    ===> Booting
    ===> Setting up VM
    ===> Running command
    [    2.295696] bpf_testmod: loading out-of-tree module taints kernel.
    [    2.296468] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
    bpf_testmod.ko is already unloaded.
    Loading bpf_testmod.ko...
    Successfully loaded bpf_testmod.ko.
    test_test_bpffs:PASS:clone 0 nsec
    fn:PASS:unshare 0 nsec
    fn:PASS:mount / 0 nsec
    fn:PASS:mount tmpfs 0 nsec
    fn:PASS:mkdir /tmp/test_bpffs_testdir/fs1 0 nsec
    fn:PASS:mkdir /tmp/test_bpffs_testdir/fs2 0 nsec
    fn:PASS:mount bpffs /tmp/test_bpffs_testdir/fs1 0 nsec
    fn:PASS:mount bpffs /tmp/test_bpffs_testdir/fs2 0 nsec
    fn:PASS:reading /tmp/test_bpffs_testdir/fs1/maps.debug 0 nsec
    fn:PASS:reading /tmp/test_bpffs_testdir/fs2/progs.debug 0 nsec
    fn:PASS:creating /tmp/test_bpffs_testdir/fs1/a 0 nsec
    fn:PASS:creating /tmp/test_bpffs_testdir/fs1/a/1 0 nsec
    fn:PASS:creating /tmp/test_bpffs_testdir/fs1/b 0 nsec
    fn:PASS:create_map(ARRAY) 0 nsec
    fn:PASS:pin map 0 nsec
    fn:PASS:stat(/tmp/test_bpffs_testdir/fs1/a) 0 nsec
    fn:PASS:renameat2(/fs1/a, /fs1/b, RENAME_EXCHANGE) 0 nsec
    fn:PASS:stat(/tmp/test_bpffs_testdir/fs1/b) 0 nsec
    fn:PASS:b should have a's inode 0 nsec
    fn:PASS:access(/tmp/test_bpffs_testdir/fs1/b/1) 0 nsec
    fn:PASS:stat(/tmp/test_bpffs_testdir/fs1/map) 0 nsec
    fn:PASS:renameat2(/fs1/c, /fs1/b, RENAME_EXCHANGE) 0 nsec
    fn:PASS:stat(/tmp/test_bpffs_testdir/fs1/b) 0 nsec
    fn:PASS:b should have c's inode 0 nsec
    fn:PASS:access(/tmp/test_bpffs_testdir/fs1/c/1) 0 nsec
    fn:PASS:renameat2(RENAME_NOREPLACE) 0 nsec
    fn:PASS:access(/tmp/test_bpffs_testdir/fs1/b) 0 nsec
    bpf_testmod.ko is already unloaded.
    Loading bpf_testmod.ko...
    Successfully loaded bpf_testmod.ko.
    test_test_bpffs:PASS:clone 0 nsec
    test_test_bpffs:PASS:waitpid 0 nsec
    test_test_bpffs:PASS:bpffs test  0 nsec
    #282     test_bpffs:OK
    Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
    Successfully unloaded bpf_testmod.ko.

[0] https://github.com/danobi/vmtest

This is a follow-up of https://lore.kernel.org/bpf/20231024201852.1512720-1-chantr4@gmail.com/T/

v1 -> v2:
  - use a TDIR name that is related to test
  - use C-style comments

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20231031223606.2927976-1-chantr4@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-11-01 22:31:41 -07:00