Commit Graph

1336438 Commits

Author SHA1 Message Date
Linus Torvalds
f34b580514 Merge tag 'nfsd-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
 "Jeff Layton contributed an implementation of NFSv4.2+ attribute
  delegation, as described here:

    https://www.ietf.org/archive/id/draft-ietf-nfsv4-delstid-08.html

  This interoperates with similar functionality introduced into the
  Linux NFS client in v6.11. An attribute delegation permits an NFS
  client to manage a file's mtime, rather than flushing dirty data to
  the NFS server so that the file's mtime reflects the last write, which
  is considerably slower.

  Neil Brown contributed dynamic NFSv4.1 session slot table resizing.
  This facility enables NFSD to increase or decrease the number of slots
  per NFS session depending on server memory availability. More session
  slots means greater parallelism.

  Chuck Lever fixed a long-standing latent bug where NFSv4 COMPOUND
  encoding screws up when crossing a page boundary in the encoding
  buffer. This is a zero-day bug, but hitting it is rare and depends on
  the NFS client implementation. The Linux NFS client does not happen to
  trigger this issue.

  A variety of bug fixes and other incremental improvements fill out the
  list of commits in this release. Great thanks to all contributors,
  reviewers, testers, and bug reporters who participated during this
  development cycle"

* tag 'nfsd-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (42 commits)
  sunrpc: Remove gss_{de,en}crypt_xdr_buf deadcode
  sunrpc: Remove gss_generic_token deadcode
  sunrpc: Remove unused xprt_iter_get_xprt
  Revert "SUNRPC: Reduce thread wake-up rate when receiving large RPC messages"
  nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION
  nfsd: handle delegated timestamps in SETATTR
  nfsd: add support for delegated timestamps
  nfsd: rework NFS4_SHARE_WANT_* flag handling
  nfsd: add support for FATTR4_OPEN_ARGUMENTS
  nfsd: prepare delegation code for handing out *_ATTRS_DELEG delegations
  nfsd: rename NFS4_SHARE_WANT_* constants to OPEN4_SHARE_ACCESS_WANT_*
  nfsd: switch to autogenerated definitions for open_delegation_type4
  nfs_common: make include/linux/nfs4.h include generated nfs4_1.h
  nfsd: fix handling of delegated change attr in CB_GETATTR
  SUNRPC: Document validity guarantees of the pointer returned by reserve_space
  NFSD: Insulate nfsd4_encode_fattr4() from page boundaries in the encode buffer
  NFSD: Insulate nfsd4_encode_secinfo() from page boundaries in the encode buffer
  NFSD: Refactor nfsd4_do_encode_secinfo() again
  NFSD: Insulate nfsd4_encode_readlink() from page boundaries in the encode buffer
  NFSD: Insulate nfsd4_encode_read_plus_data() from page boundaries in the encode buffer
  ...
2025-01-27 17:27:24 -08:00
Linus Torvalds
7d6e5b5258 Merge tag 'drm-next-2025-01-27' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Simona Vetter:
 "cgroup:
   - fix Koncfig fallout from new dmem controller

  Driver Changes:
   - v3d NULL pointer regression fix in fence signalling race
   - virtio: uaf in dma_buf free path
   - xlnx: fix kerneldoc
   - bochs: fix double-free on driver removal
   - zynqmp: add missing locking to DP bridge driver
   - amdgpu fixes all over:
       - documentation, display, sriov, various hw block drivers
       - use drm/sched helper
       - mark some debug module options as unsafe
       - amdkfd: mark some debug module options as unsafe, trap handler
         updates, fix partial migration handling

  DRM core:
   - fix fbdev Kconfig select rules, improve tiled-based display
     support"

* tag 'drm-next-2025-01-27' of https://gitlab.freedesktop.org/drm/kernel: (40 commits)
  drm/amd/display: Optimize cursor position updates
  drm/amd/display: Add hubp cache reset when powergating
  drm/amd/amdgpu: Enable scratch data dump for mes 12
  drm/amd: Clarify kdoc for amdgpu.gttsize
  drm/amd/amdgpu: Prevent null pointer dereference in GPU bandwidth calculation
  drm/amd/display: Fix error pointers in amdgpu_dm_crtc_mem_type_changed
  drm/amdgpu: fix ring timeout issue in gfx10 sr-iov environment
  drm/amd/pm: Fix smu v13.0.6 caps initialization
  drm/amd/pm: Refactor SMU 13.0.6 SDMA reset firmware version checks
  revert "drm/amdgpu/pm: add definition PPSMC_MSG_ResetSDMA2"
  revert "drm/amdgpu/pm: Implement SDMA queue reset for different asic"
  drm/amd/pm: Add capability flags for SMU v13.0.6
  drm/amd/display: fix SUBVP DC_DEBUG_MASK documentation
  drm/amd/display: fix CEC DC_DEBUG_MASK documentation
  drm/amdgpu: fix the PCIe lanes reporting in the INFO IOCTL
  drm/amdgpu: cache gpu pcie link width
  drm/amd/display: mark static functions noinline_for_stack
  drm/amdkfd: Clear MODE.VSKIP in gfx9 trap handler
  drm/amdgpu: Refine ip detection log message
  drm/amdgpu: Add handler for SDMA context empty
  ...
2025-01-27 17:17:16 -08:00
Linus Torvalds
9629d83f05 Merge tag 'for-6.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mikulas Patocka:

 - fix a spelling error in dm-raid

 - change kzalloc to kcalloc

 - remove useless test in alloc_multiple_bios

 - disable REQ_NOWAIT for flushes

 - dm-transaction-manager: use red-black trees instead of linear lists

 - atomic writes support for dm-linear, dm-stripe and dm-mirror

 - dm-crypt: code cleanups and two bugfixes

* tag 'for-6.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm-crypt: track tag_offset in convert_context
  dm-crypt: don't initialize cc_sector again
  dm-crypt: don't update io->sector after kcryptd_crypt_write_io_submit()
  dm-crypt: use bi_sector in bio when initialize integrity seed
  dm-crypt: fully initialize clone->bi_iter in crypt_alloc_buffer()
  dm-crypt: set atomic as false when calling crypt_convert() in kworker
  dm-mirror: Support atomic writes
  dm-io: Warn on creating multiple atomic write bios for a region
  dm-stripe: Enable atomic writes
  dm-linear: Enable atomic writes
  dm: Ensure cloned bio is same length for atomic write
  dm-table: atomic writes support
  dm-transaction-manager: use red-black trees instead of linear lists
  dm: disable REQ_NOWAIT for flushes
  dm: remove useless test in alloc_multiple_bios
  dm: change kzalloc to kcalloc
  dm raid: fix spelling errors in raid_ctr()
2025-01-27 17:06:42 -08:00
Linus Torvalds
13845bdc86 Merge tag 'char-misc-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull Char/Misc/IIO driver updates from Greg KH:
 "Here is the "big" set of char/misc/iio and other smaller driver
  subsystem updates for 6.14-rc1. Loads of different things in here this
  development cycle, highlights are:

   - ntsync "driver" to handle Windows locking types enabling Wine to
     work much better on many workloads (i.e. games). The driver
     framework was in 6.13, but now it's enabled and fully working
     properly. Should make many SteamOS users happy. Even comes with
     tests!

   - Large IIO driver updates and bugfixes

   - FPGA driver updates

   - Coresight driver updates

   - MHI driver updates

   - PPS driver updatesa

   - const bin_attribute reworking for many drivers

   - binder driver updates

   - smaller driver updates and fixes

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (311 commits)
  ntsync: Fix reference leaks in the remaining create ioctls.
  spmi: hisi-spmi-controller: Drop duplicated OF node assignment in spmi_controller_probe()
  spmi: Set fwnode for spmi devices
  ntsync: fix a file reference leak in drivers/misc/ntsync.c
  scripts/tags.sh: Don't tag usages of DECLARE_BITMAP
  dt-bindings: interconnect: qcom,msm8998-bwmon: Add SM8750 CPU BWMONs
  dt-bindings: interconnect: OSM L3: Document sm8650 OSM L3 compatible
  dt-bindings: interconnect: qcom-bwmon: Document QCS615 bwmon compatibles
  interconnect: sm8750: Add missing const to static qcom_icc_desc
  memstick: core: fix kernel-doc notation
  intel_th: core: fix kernel-doc warnings
  binder: log transaction code on failure
  iio: dac: ad3552r-hs: clear reset status flag
  iio: dac: ad3552r-common: fix ad3541/2r ranges
  iio: chemical: bme680: Fix uninitialized variable in __bme680_read_raw()
  misc: fastrpc: Fix copy buffer page size
  misc: fastrpc: Fix registered buffer page address
  misc: fastrpc: Deregister device nodes properly in error scenarios
  nvmem: core: improve range check for nvmem_cell_write()
  nvmem: qcom-spmi-sdam: Set size in struct nvmem_config
  ...
2025-01-27 16:51:51 -08:00
Linus Torvalds
125ca74546 Merge tag 'staging-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver updates from Greg KH:
 "Here's the pretty small staging driver tree update for 6.14-rc1. Not
  much happened this development cycle:

   - deleted some unused ioctl code from the rtl8723bs driver

   - gpib driver cleanups and fixes

   - other tiny minor coding style fixes.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'staging-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (38 commits)
  staging: gpib: Agilent usb code cleanup
  staging: gpib: Fix NULL pointer dereference in detach
  staging: gpib: Fix inadvertent negative shift
  staging: gpib: fix prefixing 0x with decimal output
  staging: gpib: Use C99 syntax and make static
  staging: gpib: Avoid plain integers as NULL pointers
  staging: gpib: Use __user for user space pointers
  staging: gpib: Use __iomem attribute for io addresses
  staging: gpib: Add missing mutex unlock in ni usb driver
  staging: gpib: Add missing mutex unlock in agilent usb driver
  staging: gpib: Modernize gpib_interface_t initialization and make static
  staging: gpib: Remove commented-out debug code
  staging: rtl8723bs: Remove ioctl interface
  staging: gpib: tnt4882: Handle gpib_register_driver() errors
  staging: gpib: pc2: Handle gpib_register_driver() errors
  staging: gpib: ni_usb: Handle gpib_register_driver() errors
  staging: gpib: lpvo_usb: Return error value from gpib_register_driver()
  staging: gpib: ines: Handle gpib_register_driver() errors
  staging: gpib: hp_82341: Handle gpib_register_driver() errors
  staging: gpib: hp_82335: Return error value from gpib_register_driver()
  ...
2025-01-27 16:43:27 -08:00
Linus Torvalds
cc8b10fa70 Merge tag 'usb-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt driver updates from Greg KH:
 "Here is the USB and Thunderbolt driver updates for 6.14-rc1. Nothing
  huge in here, just lots of new hardware support and updates for
  existing drivers. Changes here are:

   - big gadget f_tcm driver update

   - other gadget driver updates and fixes

   - thunderbolt driver updates for new hardware and capabilities and
     lots more debugging functionality to handle it when things aren't
     working well.

   - xhci driver updates

   - new USB-serial device updates

   - typec driver updates, including a chrome platform driver (acked by
     the subsystem maintainers)

   - other small driver updates

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (123 commits)
  usb: hcd: Bump local buffer size in rh_string()
  Revert "usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null"
  usb: typec: tcpci: Prevent Sink disconnection before vPpsShutdown in SPR PPS
  usb: xhci: tegra: Fix OF boolean read warning
  usb: host: xhci-plat: add support compatible ID PNP0D15
  usb: typec: ucsi: Add a macro definition for UCSI v1.0
  usb: dwc3: core: Defer the probe until USB power supply ready
  usbip: Correct format specifier for seqnum from %d to %u
  usbip: Fix seqnum sign extension issue in vhci_tx_urb
  dt-bindings: usb: snps,dwc3: Split core description
  usb: quirks: Add NO_LPM quirk for TOSHIBA TransMemory-Mx device
  usb: dwc3: gadget: Reinitiate stream for all host NoStream behavior
  USB: Use str_enable_disable-like helpers
  USB: gadget: Use str_enable_disable-like helpers
  USB: phy: Use str_enable_disable-like helpers
  USB: typec: Use str_enable_disable-like helpers
  USB: host: Use str_enable_disable-like helpers
  USB: Replace own str_plural with common one
  USB: serial: quatech2: fix null-ptr-deref in qt2_process_read_urb()
  usb: phy: Remove API devm_usb_put_phy()
  ...
2025-01-27 16:29:16 -08:00
Al Viro
c1feab95e0 add a string-to-qstr constructor
Quite a few places want to build a struct qstr by given string;
it would be convenient to have a primitive doing that, rather
than open-coding it via QSTR_INIT().

The closest approximation was in bcachefs, but that expands to
initializer list - {.len = strlen(string), .name = string}.
It would be more useful to have it as compound literal -
(struct qstr){.len = strlen(string), .name = string}.

Unlike initializer list it's a valid expression.  What's more,
it's a valid lvalue - it's an equivalent of anonymous local
variable with such initializer, so the things like
	path->dentry = d_alloc_pseudo(mnt->mnt_sb, &QSTR(name));
are valid.  It can also be used as initializer, with identical
effect -
	struct qstr x = (struct qstr){.name = s, .len = strlen(s)};
is equivalent to
	struct qstr anon_variable = {.name = s, .len = strlen(s)};
	struct qstr x = anon_variable;
	// anon_variable is never used after that point
and any even remotely sane compiler will manage to collapse that
into
	struct qstr x = {.name = s, .len = strlen(s)};

What compound literals can't be used for is initialization of
global variables, but those are covered by QSTR_INIT().

This commit lifts definition(s) of QSTR() into linux/dcache.h,
converts it to compound literal (all bcachefs users are fine
with that) and converts assorted open-coded instances to using
that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:45 -05:00
Al Viro
30d61efe11 9p: fix ->rename_sem exclusion
9p wants to be able to build a path from given dentry to fs root and keep
it valid over a blocking operation.

->s_vfs_rename_mutex would be a natural candidate, but there are places
where we need that and where we have no way to tell if ->s_vfs_rename_mutex
is already held deeper in callchain.  Moreover, it's only held for
cross-directory renames; name changes within the same directory happen
without it.

Solution:
	* have d_move() done in ->rename() rather than in its caller
	* maintain a 9p-private rwsem (per-filesystem)
	* hold it exclusive over the relevant part of ->rename()
	* hold it shared over the places where we want the path.

That almost works.  FS_RENAME_DOES_D_MOVE is enough to put all d_move()
and d_exchange() calls under filesystem's control.  However, there's
also __d_unalias(), which isn't covered by any of that.

If ->lookup() hits a directory inode with preexisting dentry elsewhere
(due to e.g. rename done on server behind our back), d_splice_alias()
called by ->lookup() will move/rename that alias.

Add a couple of optional methods, so that __d_unalias() would do
	if alias->d_op->d_unalias_trylock != NULL
		if (!alias->d_op->d_unalias_trylock(alias))
			fail (resulting in -ESTALE from lookup)
	__d_move(...)
	if alias->d_op->d_unalias_unlock != NULL
		alias->d_unalias_unlock(alias)
where it currently does __d_move().  9p instances do down_write_trylock()
and up_write() of ->rename_mutex.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:24 -05:00
Al Viro
90341f22c3 orangefs_d_revalidate(): use stable parent inode and name passed by caller
->d_name use is a UAF if the userland side of things can be slowed down
by attacker.

Tested-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:24 -05:00
Al Viro
9640fe5b5e ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller
theoretically, ->d_name use in there is a UAF, but only if you are messing with
tracepoints...

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:24 -05:00
Al Viro
ffeeaada2b nfs: fix ->d_revalidate() UAF on ->d_name accesses
Pass the stable name all the way down to ->rpc_ops->lookup() instances.

Note that passing &dentry->d_name is safe in e.g. nfs_lookup() - it *is*
stable there, as it is in ->create() et.al.

dget_parent() in nfs_instantiate() should be redundant - it'd better be
stable there; if it's not, we have more trouble, since ->d_name would
also be unsafe in such case.

nfs_submount() and nfs4_submount() may or may not require fixes - if
they ever get moved on server with fhandle preserved, we are in trouble
there...

UAF window is fairly narrow here and exfiltration requires the ability
to watch the traffic.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:24 -05:00
Al Viro
39f644a266 nfs{,4}_lookup_validate(): use stable parent inode passed by caller
we can't kill __nfs_lookup_revalidate() completely, but ->d_parent boilerplate
in it is gone

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:24 -05:00
Al Viro
eab2a11e5b gfs2_drevalidate(): use stable parent inode and name passed by caller
No need to mess with dget_parent() for the former; for the latter we really should
not rely upon ->d_name.name remaining stable.  Theoretically a UAF, but it's
hard to exfiltrate the information...

Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:24 -05:00
Al Viro
19e1dbdc6b fuse_dentry_revalidate(): use stable parent inode and name passed by caller
No need to mess with dget_parent() for the former; for the latter we really should
not rely upon ->d_name.name remaining stable - it's a real-life UAF.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:24 -05:00
Al Viro
464361133e vfat_revalidate{,_ci}(): use stable parent inode passed by caller
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:24 -05:00
Al Viro
14d02c3dcf exfat_d_revalidate(): use stable parent inode passed by caller
... no need to bother with ->d_lock and ->d_parent->d_inode.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:23 -05:00
Al Viro
c4a9fe6319 fscrypt_d_revalidate(): use stable parent inode passed by caller
The only thing it's using is parent directory inode and we are already
given a stable reference to that - no need to bother with boilerplate.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:23 -05:00
Al Viro
541795cb0b ceph_d_revalidate(): propagate stable name down into request encoding
Currently get_fscrypt_altname() requires ->r_dentry->d_name to be stable
and it gets that in almost all cases.  The only exception is ->d_revalidate(),
where we have a stable name, but it's passed separately - dentry->d_name
is not stable there.

Propagate it down to get_fscrypt_altname() as a new field of struct
ceph_mds_request - ->r_dname, to be used instead ->r_dentry->d_name
when non-NULL.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:23 -05:00
Al Viro
bf636ed4a9 ceph_d_revalidate(): use stable parent inode passed by caller
No need to mess with the boilerplate for obtaining what we already
have.  Note that ceph is one of the "will want a path from filesystem
root if we want to talk to server" cases, so the name of the last
component is of little use - it is passed to fscrypt_d_revalidate()
and it's used to deal with (also crypt-related) case in request
marshalling, when encrypted name turns out to be too long.  The former
is not a problem, but the latter is racy; that part will be handled
in the next commit.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:23 -05:00
Al Viro
da192022f8 afs_d_revalidate(): use stable name and parent inode passed by caller
No need to bother with boilerplate for obtaining the latter and for
the former we really should not count upon ->d_name.name remaining
stable under us.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:23 -05:00
Al Viro
5be1fa8abd Pass parent directory inode and expected name to ->d_revalidate()
->d_revalidate() often needs to access dentry parent and name; that has
to be done carefully, since the locking environment varies from caller
to caller.  We are not guaranteed that dentry in question will not be
moved right under us - not unless the filesystem is such that nothing
on it ever gets renamed.

It can be dealt with, but that results in boilerplate code that isn't
even needed - the callers normally have just found the dentry via dcache
lookup and want to verify that it's in the right place; they already
have the values of ->d_parent and ->d_name stable.  There is a couple
of exceptions (overlayfs and, to less extent, ecryptfs), but for the
majority of calls that song and dance is not needed at all.

It's easier to make ecryptfs and overlayfs find and pass those values if
there's a ->d_revalidate() instance to be called, rather than doing that
in the instances.

This commit only changes the calling conventions; making use of supplied
values is left to followups.

NOTE: some instances need more than just the parent - things like CIFS
may need to build an entire path from filesystem root, so they need
more precautions than the usual boilerplate.  This series doesn't
do anything to that need - these filesystems have to keep their locking
mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem
a-la v9fs).

One thing to keep in mind when using name is that name->name will normally
point into the pathname being resolved; the filename in question occupies
name->len bytes starting at name->name, and there is NUL somewhere after it,
but it the next byte might very well be '/' rather than '\0'.  Do not
ignore name->len.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:23 -05:00
Al Viro
4c43ab1b10 generic_ci_d_compare(): use shortname_storage
... and check the "name might be unstable" predicate
the right way.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:24:43 -05:00
Al Viro
7e3270165a ext4 fast_commit: make use of name_snapshot primitives
... rather than open-coding them.  As a bonus, that avoids the pointless
work with extra allocations, etc. for long names.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:24:43 -05:00
Al Viro
95a4ccbbe5 dissolve external_name.u into separate members
... and document the constraints on the layout.  Kept separate from
the previous commit to keep the noise separate from actual changes.
The reason for explicit __aligned() on ->name[] rather than relying
upon the alignment of the previous field is that the previous iteration
of that commit tried to save 4 bytes on 64bit by eliminating a hole
in there, which broke the assumptions in dentry_string_cmp().
Better spell it out and avoid the temptation for the future...

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:17:22 -05:00
Jakub Kicinski
b2aec4efe8 Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-01-24 (idpf, ice, iavf)

For idpf:

Emil adds memory barrier when accessing control queue descriptors and
restores call to idpf_vc_xn_shutdown() when resetting.

Manoj Vishwanathan expands transaction lock to properly protect xn->salt
value and adds additional debugging information.

Marco Leogrande converts workqueues to be unbound.

For ice:

Przemek fixes incorrect size use for array.

Mateusz removes reporting of invalid parameter and value.

For iavf:

Michal adjusts some VLAN changes to occur without a PF call to avoid
timing issues with the calls.

* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  iavf: allow changing VLAN state without calling PF
  ice: remove invalid parameter of equalizer
  ice: fix ice_parser_rt::bst_key array size
  idpf: add more info during virtchnl transaction timeout/salt mismatch
  idpf: convert workqueues to unbound
  idpf: Acquire the lock before accessing the xn->salt
  idpf: fix transaction timeouts on reset
  idpf: add read memory barrier when checking descriptor done bit
====================

Link: https://patch.msgid.link/20250124213213.1328775-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 16:16:32 -08:00
Ian Rogers
bde4ccfd5a perf annotate: Use an array for the disassembler preference
Prior to this change a string was used which could cause issues with
an unrecognized disassembler in symbol__disassembler. Change to
initializing an array of perf_disassembler enum values. If a value
already exists then adding it a second time is ignored to avoid array
out of bounds problems present in the previous code, it also allows a
statically sized array and removes memory allocation needs. Errors in
the disassembler string are reported when the config is parsed during
perf annotate or perf top start up. If the array is uninitialized
after processing the config file the default llvm, capstone then
objdump values are added but without a need to parse a string.

Fixes: a6e8a58de6 ("perf disasm: Allow configuring what disassemblers to use")
Closes: https://lore.kernel.org/lkml/CAP-5=fUdfCyxmEiTpzS2uumUp3-SyQOseX2xZo81-dQtWXj6vA@mail.gmail.com/
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250124043856.1177264-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-27 15:58:01 -08:00
Linus Torvalds
078eac2b5b Merge tag 'pwm/for-6.14-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux
Pull pwm fixes from Uwe Kleine-König:
 "Two fixes.

  Conor Dooley found and fixed a problem in the pwm-microchip-core
  driver that existed since the driver's birth in v6.5-rc1. It's about a
  corner case that only happens if two pwm devices of the same chip are
  set to the same long period.

  The other problem is about the new pwm API that currently is only
  supported by two hardware drivers. The fix prevents a NULL pointer
  exception if one of the new functions is called for a pwm device with
  a driver that only provides the old callbacks"

* tag 'pwm/for-6.14-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
  pwm: Ensure callbacks exist before calling them
  pwm: microchip-core: fix incorrect comparison with max period
2025-01-27 15:45:29 -08:00
Linus Torvalds
f28f489045 Merge tag 'for-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull power supply and reset updates from Sebastian Reichel:
 "Power-supply core:

   - introduce power supply extensions, which allows adding properties
     to a power supply device from a separate driver. This will be used
     initially to extend the generic ACPI charger/battery driver with
     vendor extensions for charge thresholds.

   - convert all drivers from power_supply_for_each_device to new
     power_supply_for_each_psy(), which avoids lots of casting being
     done in the drivers.

   - avoid LED trigger like values in uevent for
     POWER_SUPPLY_PROP_CHARGE_BEHAVIOUR

   - introduce POWER_SUPPLY_PROP_CHARGE_TYPES, which is similar to the
     POWER_SUPPLY_PROP_CHARGE_TYPE property, but also lists the
     available options on the specific platform

  Power-supply drivers

   - dell-laptop: use new power_supply_charge_types_show/_parse helpers

   - stc3117: new driver for equally named fuel gauge chip

   - bq24190: add support for new POWER_SUPPLY_PROP_CHARGE_TYPES

   - bq24190: add BQ24297 support

   - bq27xxx: add voltage min design for bq27000/bq27200

   - cros_charge-control: convert to new power supply extension API

   - multiple drivers: constify 'struct bin_attribute'

   - ds2782: convert to device managed resources

   - max1720x: add charge full property

   - max1720x: support extra thermistor temperatures

   - max17042: add max77705 support

   - ip5xxx-power: add support for IP5306

   - ltc4162-l-charger: add ltc4162-f/s and ltc4015 support

   - gpio-charger: support for default charge current limit

   - misc small cleanups and fixes

  Reset drivers:

   - at91-poweroff: add sam9x7 support"

* tag 'for-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (77 commits)
  power: supply: max1720x: add support for reading internal and thermistor temperatures
  power: supply: ltc4162l: Use GENMASK macro in bitmask operation
  power: supply: max17042: add max77705 fuel gauge support
  dt-bindings: power: supply: max17042: add max77705 support
  power: supply: add undervoltage health status property
  power: supply: max17042: add platform driver variant
  power: supply: max17042: make interrupt shared
  power: reset: keystone: Use syscon_regmap_lookup_by_phandle_args
  power: supply: Use str_enable_disable-like helpers
  platform/x86: dell-laptop: Use power_supply_charge_types_show/_parse() helpers
  power: supply: bq2415x_charger: Immediately reschedule delayed work on notifier events
  power: supply: Add STC3117 fuel gauge unit driver
  dt-bindings: power: supply: Add STC3117 Fuel Gauge
  power: supply: ug3105_battery: Let the core handle POWER_SUPPLY_PROP_TECHNOLOGY
  power: supply: gpio-charger: add support for default charge current limit
  dt-bindings: power: supply: gpio-charger: add support for default charge current limit
  power: supply: Use power_supply_external_power_changed() in __power_supply_changed_work()
  power: supply: core: fix build of extension sysfs group if CONFIG_SYSFS=n
  power: supply: bq2415x_charger: report charging state changes to userspace
  bq27xxx: add voltage min design for bq27000 and bq27200
  ...
2025-01-27 15:37:16 -08:00
Linus Torvalds
deee7487f5 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
 "A small number of improvements all over the place:

   - vdpa/octeon support for multiple interrupts

   - virtio-pci support for error recovery

   - vp_vdpa support for notification with data

   - vhost/net fix to set num_buffers for spec compliance

   - virtio-mem now works with kdump on s390

  And small cleanups all over the place"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (23 commits)
  virtio_blk: Add support for transport error recovery
  virtio_pci: Add support for PCIe Function Level Reset
  vhost/net: Set num_buffers for virtio 1.0
  vdpa/octeon_ep: read vendor-specific PCI capability
  virtio-pci: define type and header for PCI vendor data
  vdpa/octeon_ep: handle device config change events
  vdpa/octeon_ep: enable support for multiple interrupts per device
  vdpa: solidrun: Replace deprecated PCI functions
  s390/kdump: virtio-mem kdump support (CONFIG_PROC_VMCORE_DEVICE_RAM)
  virtio-mem: support CONFIG_PROC_VMCORE_DEVICE_RAM
  virtio-mem: remember usable region size
  virtio-mem: mark device ready before registering callbacks in kdump mode
  fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel
  fs/proc/vmcore: factor out freeing a list of vmcore ranges
  fs/proc/vmcore: factor out allocating a vmcore range and adding it to a list
  fs/proc/vmcore: move vmcore definitions out of kcore.h
  fs/proc/vmcore: prefix all pr_* with "vmcore:"
  fs/proc/vmcore: disallow vmcore modifications while the vmcore is open
  fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex
  fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex
  ...
2025-01-27 15:26:06 -08:00
Jakub Kicinski
463ec95a16 Merge tag 'ipsec-2025-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2025-01-27

1) Fix incrementing the upper 32 bit sequence numbers for GSO skbs.
   From Jianbo Liu.

2) Fix an out-of-bounds read on xfrm state lookup.
   From Florian Westphal.

3) Fix secpath handling on packet offload mode.
   From Alexandre Cassen.

4) Fix the usage of skb->sk in the xfrm layer.

5) Don't disable preemption while looking up cache state
   to fix PREEMPT_RT.
   From Sebastian Sewior.

* tag 'ipsec-2025-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: Don't disable preemption while looking up cache state.
  xfrm: Fix the usage of skb->sk
  xfrm: delete intermediate secpath entry in packet offload mode
  xfrm: state: fix out-of-bounds read during lookup
  xfrm: replay: Fix the update of replay_esn->oseq_hi for GSO
====================

Link: https://patch.msgid.link/20250127060757.3946314-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 15:15:12 -08:00
Jakub Kicinski
0154b949a1 Merge branch 'mptcp-fixes-addressing-syzbot-reports'
Matthieu Baerts says:

====================
mptcp: fixes addressing syzbot reports

Recently, a few issues linked to MPTCP have been reported by syzbot. All
the remaining ones are addressed in this series.

- Patch 1: Address "KMSAN: uninit-value in mptcp_incoming_options (2)".
  A fix for v5.11.

- Patch 2: Address "WARNING in mptcp_pm_nl_set_flags (2)". A fix for
  v5.18.

- Patch 3: Address "WARNING in __mptcp_clean_una (2)". A fix for v6.4,
  backported up to v6.1.
====================

Link: https://patch.msgid.link/20250123-net-mptcp-syzbot-issues-v1-0-af73258a726f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 15:07:09 -08:00
Paolo Abeni
619af16b3b mptcp: handle fastopen disconnect correctly
Syzbot was able to trigger a data stream corruption:

  WARNING: CPU: 0 PID: 9846 at net/mptcp/protocol.c:1024 __mptcp_clean_una+0xddb/0xff0 net/mptcp/protocol.c:1024
  Modules linked in:
  CPU: 0 UID: 0 PID: 9846 Comm: syz-executor351 Not tainted 6.13.0-rc2-syzkaller-00059-g00a5acdbf398 #0
  Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 11/25/2024
  RIP: 0010:__mptcp_clean_una+0xddb/0xff0 net/mptcp/protocol.c:1024
  Code: fa ff ff 48 8b 4c 24 18 80 e1 07 fe c1 38 c1 0f 8c 8e fa ff ff 48 8b 7c 24 18 e8 e0 db 54 f6 e9 7f fa ff ff e8 e6 80 ee f5 90 <0f> 0b 90 4c 8b 6c 24 40 4d 89 f4 e9 04 f5 ff ff 44 89 f1 80 e1 07
  RSP: 0018:ffffc9000c0cf400 EFLAGS: 00010293
  RAX: ffffffff8bb0dd5a RBX: ffff888033f5d230 RCX: ffff888059ce8000
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffffc9000c0cf518 R08: ffffffff8bb0d1dd R09: 1ffff110170c8928
  R10: dffffc0000000000 R11: ffffed10170c8929 R12: 0000000000000000
  R13: ffff888033f5d220 R14: dffffc0000000000 R15: ffff8880592b8000
  FS:  00007f6e866496c0(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f6e86f491a0 CR3: 00000000310e6000 CR4: 00000000003526f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   __mptcp_clean_una_wakeup+0x7f/0x2d0 net/mptcp/protocol.c:1074
   mptcp_release_cb+0x7cb/0xb30 net/mptcp/protocol.c:3493
   release_sock+0x1aa/0x1f0 net/core/sock.c:3640
   inet_wait_for_connect net/ipv4/af_inet.c:609 [inline]
   __inet_stream_connect+0x8bd/0xf30 net/ipv4/af_inet.c:703
   mptcp_sendmsg_fastopen+0x2a2/0x530 net/mptcp/protocol.c:1755
   mptcp_sendmsg+0x1884/0x1b10 net/mptcp/protocol.c:1830
   sock_sendmsg_nosec net/socket.c:711 [inline]
   __sock_sendmsg+0x1a6/0x270 net/socket.c:726
   ____sys_sendmsg+0x52a/0x7e0 net/socket.c:2583
   ___sys_sendmsg net/socket.c:2637 [inline]
   __sys_sendmsg+0x269/0x350 net/socket.c:2669
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  RIP: 0033:0x7f6e86ebfe69
  Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 b1 1f 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007f6e86649168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
  RAX: ffffffffffffffda RBX: 00007f6e86f491b8 RCX: 00007f6e86ebfe69
  RDX: 0000000030004001 RSI: 0000000020000080 RDI: 0000000000000003
  RBP: 00007f6e86f491b0 R08: 00007f6e866496c0 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6e86f491bc
  R13: 000000000000006e R14: 00007ffe445d9420 R15: 00007ffe445d9508
   </TASK>

The root cause is the bad handling of disconnect() generated internally
by the MPTCP protocol in case of connect FASTOPEN errors.

Address the issue increasing the socket disconnect counter even on such
a case, to allow other threads waiting on the same socket lock to
properly error out.

Fixes: c2b2ae3925 ("mptcp: handle correctly disconnect() failures")
Cc: stable@vger.kernel.org
Reported-by: syzbot+ebc0b8ae5d3590b2c074@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/67605870.050a0220.37aaf.0137.GAE@google.com
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/537
Tested-by: syzbot+ebc0b8ae5d3590b2c074@syzkaller.appspotmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250123-net-mptcp-syzbot-issues-v1-3-af73258a726f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 15:07:02 -08:00
Matthieu Baerts (NGI0)
1bb0d13485 mptcp: pm: only set fullmesh for subflow endp
With the in-kernel path-manager, it is possible to change the 'fullmesh'
flag. The code in mptcp_pm_nl_fullmesh() expects to change it only on
'subflow' endpoints, to recreate more or less subflows using the linked
address.

Unfortunately, the set_flags() hook was a bit more permissive, and
allowed 'implicit' endpoints to get the 'fullmesh' flag while it is not
allowed before.

That's what syzbot found, triggering the following warning:

  WARNING: CPU: 0 PID: 6499 at net/mptcp/pm_netlink.c:1496 __mark_subflow_endp_available net/mptcp/pm_netlink.c:1496 [inline]
  WARNING: CPU: 0 PID: 6499 at net/mptcp/pm_netlink.c:1496 mptcp_pm_nl_fullmesh net/mptcp/pm_netlink.c:1980 [inline]
  WARNING: CPU: 0 PID: 6499 at net/mptcp/pm_netlink.c:1496 mptcp_nl_set_flags net/mptcp/pm_netlink.c:2003 [inline]
  WARNING: CPU: 0 PID: 6499 at net/mptcp/pm_netlink.c:1496 mptcp_pm_nl_set_flags+0x974/0xdc0 net/mptcp/pm_netlink.c:2064
  Modules linked in:
  CPU: 0 UID: 0 PID: 6499 Comm: syz.1.413 Not tainted 6.13.0-rc5-syzkaller-00172-gd1bf27c4e176 #0
  Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
  RIP: 0010:__mark_subflow_endp_available net/mptcp/pm_netlink.c:1496 [inline]
  RIP: 0010:mptcp_pm_nl_fullmesh net/mptcp/pm_netlink.c:1980 [inline]
  RIP: 0010:mptcp_nl_set_flags net/mptcp/pm_netlink.c:2003 [inline]
  RIP: 0010:mptcp_pm_nl_set_flags+0x974/0xdc0 net/mptcp/pm_netlink.c:2064
  Code: 01 00 00 49 89 c5 e8 fb 45 e8 f5 e9 b8 fc ff ff e8 f1 45 e8 f5 4c 89 f7 be 03 00 00 00 e8 44 1d 0b f9 eb a0 e8 dd 45 e8 f5 90 <0f> 0b 90 e9 17 ff ff ff 89 d9 80 e1 07 38 c1 0f 8c c9 fc ff ff 48
  RSP: 0018:ffffc9000d307240 EFLAGS: 00010293
  RAX: ffffffff8bb72e03 RBX: 0000000000000000 RCX: ffff88807da88000
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffffc9000d307430 R08: ffffffff8bb72cf0 R09: 1ffff1100b842a5e
  R10: dffffc0000000000 R11: ffffed100b842a5f R12: ffff88801e2e5ac0
  R13: ffff88805c214800 R14: ffff88805c2152e8 R15: 1ffff1100b842a5d
  FS:  00005555619f6500(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000020002840 CR3: 00000000247e6000 CR4: 00000000003526f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
   genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
   genl_rcv_msg+0xb14/0xec0 net/netlink/genetlink.c:1210
   netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2542
   genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
   netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
   netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1347
   netlink_sendmsg+0x8e4/0xcb0 net/netlink/af_netlink.c:1891
   sock_sendmsg_nosec net/socket.c:711 [inline]
   __sock_sendmsg+0x221/0x270 net/socket.c:726
   ____sys_sendmsg+0x52a/0x7e0 net/socket.c:2583
   ___sys_sendmsg net/socket.c:2637 [inline]
   __sys_sendmsg+0x269/0x350 net/socket.c:2669
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  RIP: 0033:0x7f5fe8785d29
  Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007fff571f5558 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
  RAX: ffffffffffffffda RBX: 00007f5fe8975fa0 RCX: 00007f5fe8785d29
  RDX: 0000000000000000 RSI: 0000000020000480 RDI: 0000000000000007
  RBP: 00007f5fe8801b08 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
  R13: 00007f5fe8975fa0 R14: 00007f5fe8975fa0 R15: 00000000000011f4
   </TASK>

Here, syzbot managed to set the 'fullmesh' flag on an 'implicit' and
used -- according to 'id_avail_bitmap' -- endpoint, causing the PM to
try decrement the local_addr_used counter which is only incremented for
the 'subflow' endpoint.

Note that 'no type' endpoints -- not 'subflow', 'signal', 'implicit' --
are fine, because their ID will not be marked as used in the 'id_avail'
bitmap, and setting 'fullmesh' can help forcing the creation of subflow
when receiving an ADD_ADDR.

Fixes: 73c762c1f0 ("mptcp: set fullmesh flag in pm_netlink")
Cc: stable@vger.kernel.org
Reported-by: syzbot+cd16e79c1e45f3fe0377@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/6786ac51.050a0220.216c54.00a6.GAE@google.com
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/540
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250123-net-mptcp-syzbot-issues-v1-2-af73258a726f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 15:07:02 -08:00
Paolo Abeni
c86b000782 mptcp: consolidate suboption status
MPTCP maintains the received sub-options status is the bitmask carrying
the received suboptions and in several bitfields carrying per suboption
additional info.

Zeroing the bitmask before parsing is not enough to ensure a consistent
status, and the MPTCP code has to additionally clear some bitfiled
depending on the actually parsed suboption.

The above schema is fragile, and syzbot managed to trigger a path where
a relevant bitfield is not cleared/initialized:

  BUG: KMSAN: uninit-value in __mptcp_expand_seq net/mptcp/options.c:1030 [inline]
  BUG: KMSAN: uninit-value in mptcp_expand_seq net/mptcp/protocol.h:864 [inline]
  BUG: KMSAN: uninit-value in ack_update_msk net/mptcp/options.c:1060 [inline]
  BUG: KMSAN: uninit-value in mptcp_incoming_options+0x2036/0x3d30 net/mptcp/options.c:1209
   __mptcp_expand_seq net/mptcp/options.c:1030 [inline]
   mptcp_expand_seq net/mptcp/protocol.h:864 [inline]
   ack_update_msk net/mptcp/options.c:1060 [inline]
   mptcp_incoming_options+0x2036/0x3d30 net/mptcp/options.c:1209
   tcp_data_queue+0xb4/0x7be0 net/ipv4/tcp_input.c:5233
   tcp_rcv_established+0x1061/0x2510 net/ipv4/tcp_input.c:6264
   tcp_v4_do_rcv+0x7f3/0x11a0 net/ipv4/tcp_ipv4.c:1916
   tcp_v4_rcv+0x51df/0x5750 net/ipv4/tcp_ipv4.c:2351
   ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
   ip_local_deliver_finish+0x336/0x500 net/ipv4/ip_input.c:233
   NF_HOOK include/linux/netfilter.h:314 [inline]
   ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
   dst_input include/net/dst.h:460 [inline]
   ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:447
   NF_HOOK include/linux/netfilter.h:314 [inline]
   ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:567
   __netif_receive_skb_one_core net/core/dev.c:5704 [inline]
   __netif_receive_skb+0x319/0xa00 net/core/dev.c:5817
   process_backlog+0x4ad/0xa50 net/core/dev.c:6149
   __napi_poll+0xe7/0x980 net/core/dev.c:6902
   napi_poll net/core/dev.c:6971 [inline]
   net_rx_action+0xa5a/0x19b0 net/core/dev.c:7093
   handle_softirqs+0x1a0/0x7c0 kernel/softirq.c:561
   __do_softirq+0x14/0x1a kernel/softirq.c:595
   do_softirq+0x9a/0x100 kernel/softirq.c:462
   __local_bh_enable_ip+0x9f/0xb0 kernel/softirq.c:389
   local_bh_enable include/linux/bottom_half.h:33 [inline]
   rcu_read_unlock_bh include/linux/rcupdate.h:919 [inline]
   __dev_queue_xmit+0x2758/0x57d0 net/core/dev.c:4493
   dev_queue_xmit include/linux/netdevice.h:3168 [inline]
   neigh_hh_output include/net/neighbour.h:523 [inline]
   neigh_output include/net/neighbour.h:537 [inline]
   ip_finish_output2+0x187c/0x1b70 net/ipv4/ip_output.c:236
   __ip_finish_output+0x287/0x810
   ip_finish_output+0x4b/0x600 net/ipv4/ip_output.c:324
   NF_HOOK_COND include/linux/netfilter.h:303 [inline]
   ip_output+0x15f/0x3f0 net/ipv4/ip_output.c:434
   dst_output include/net/dst.h:450 [inline]
   ip_local_out net/ipv4/ip_output.c:130 [inline]
   __ip_queue_xmit+0x1f2a/0x20d0 net/ipv4/ip_output.c:536
   ip_queue_xmit+0x60/0x80 net/ipv4/ip_output.c:550
   __tcp_transmit_skb+0x3cea/0x4900 net/ipv4/tcp_output.c:1468
   tcp_transmit_skb net/ipv4/tcp_output.c:1486 [inline]
   tcp_write_xmit+0x3b90/0x9070 net/ipv4/tcp_output.c:2829
   __tcp_push_pending_frames+0xc4/0x380 net/ipv4/tcp_output.c:3012
   tcp_send_fin+0x9f6/0xf50 net/ipv4/tcp_output.c:3618
   __tcp_close+0x140c/0x1550 net/ipv4/tcp.c:3130
   __mptcp_close_ssk+0x74e/0x16f0 net/mptcp/protocol.c:2496
   mptcp_close_ssk+0x26b/0x2c0 net/mptcp/protocol.c:2550
   mptcp_pm_nl_rm_addr_or_subflow+0x635/0xd10 net/mptcp/pm_netlink.c:889
   mptcp_pm_nl_rm_subflow_received net/mptcp/pm_netlink.c:924 [inline]
   mptcp_pm_flush_addrs_and_subflows net/mptcp/pm_netlink.c:1688 [inline]
   mptcp_nl_flush_addrs_list net/mptcp/pm_netlink.c:1709 [inline]
   mptcp_pm_nl_flush_addrs_doit+0xe10/0x1630 net/mptcp/pm_netlink.c:1750
   genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
   genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
   genl_rcv_msg+0x1214/0x12c0 net/netlink/genetlink.c:1210
   netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2542
   genl_rcv+0x40/0x60 net/netlink/genetlink.c:1219
   netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
   netlink_unicast+0xf52/0x1260 net/netlink/af_netlink.c:1347
   netlink_sendmsg+0x10da/0x11e0 net/netlink/af_netlink.c:1891
   sock_sendmsg_nosec net/socket.c:711 [inline]
   __sock_sendmsg+0x30f/0x380 net/socket.c:726
   ____sys_sendmsg+0x877/0xb60 net/socket.c:2583
   ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2637
   __sys_sendmsg net/socket.c:2669 [inline]
   __do_sys_sendmsg net/socket.c:2674 [inline]
   __se_sys_sendmsg net/socket.c:2672 [inline]
   __x64_sys_sendmsg+0x212/0x3c0 net/socket.c:2672
   x64_sys_call+0x2ed6/0x3c30 arch/x86/include/generated/asm/syscalls_64.h:47
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

  Uninit was stored to memory at:
   mptcp_get_options+0x2c0f/0x2f20 net/mptcp/options.c:397
   mptcp_incoming_options+0x19a/0x3d30 net/mptcp/options.c:1150
   tcp_data_queue+0xb4/0x7be0 net/ipv4/tcp_input.c:5233
   tcp_rcv_established+0x1061/0x2510 net/ipv4/tcp_input.c:6264
   tcp_v4_do_rcv+0x7f3/0x11a0 net/ipv4/tcp_ipv4.c:1916
   tcp_v4_rcv+0x51df/0x5750 net/ipv4/tcp_ipv4.c:2351
   ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
   ip_local_deliver_finish+0x336/0x500 net/ipv4/ip_input.c:233
   NF_HOOK include/linux/netfilter.h:314 [inline]
   ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
   dst_input include/net/dst.h:460 [inline]
   ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:447
   NF_HOOK include/linux/netfilter.h:314 [inline]
   ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:567
   __netif_receive_skb_one_core net/core/dev.c:5704 [inline]
   __netif_receive_skb+0x319/0xa00 net/core/dev.c:5817
   process_backlog+0x4ad/0xa50 net/core/dev.c:6149
   __napi_poll+0xe7/0x980 net/core/dev.c:6902
   napi_poll net/core/dev.c:6971 [inline]
   net_rx_action+0xa5a/0x19b0 net/core/dev.c:7093
   handle_softirqs+0x1a0/0x7c0 kernel/softirq.c:561
   __do_softirq+0x14/0x1a kernel/softirq.c:595

  Uninit was stored to memory at:
   put_unaligned_be32 include/linux/unaligned.h:68 [inline]
   mptcp_write_options+0x17f9/0x3100 net/mptcp/options.c:1417
   mptcp_options_write net/ipv4/tcp_output.c:465 [inline]
   tcp_options_write+0x6d9/0xe90 net/ipv4/tcp_output.c:759
   __tcp_transmit_skb+0x294b/0x4900 net/ipv4/tcp_output.c:1414
   tcp_transmit_skb net/ipv4/tcp_output.c:1486 [inline]
   tcp_write_xmit+0x3b90/0x9070 net/ipv4/tcp_output.c:2829
   __tcp_push_pending_frames+0xc4/0x380 net/ipv4/tcp_output.c:3012
   tcp_send_fin+0x9f6/0xf50 net/ipv4/tcp_output.c:3618
   __tcp_close+0x140c/0x1550 net/ipv4/tcp.c:3130
   __mptcp_close_ssk+0x74e/0x16f0 net/mptcp/protocol.c:2496
   mptcp_close_ssk+0x26b/0x2c0 net/mptcp/protocol.c:2550
   mptcp_pm_nl_rm_addr_or_subflow+0x635/0xd10 net/mptcp/pm_netlink.c:889
   mptcp_pm_nl_rm_subflow_received net/mptcp/pm_netlink.c:924 [inline]
   mptcp_pm_flush_addrs_and_subflows net/mptcp/pm_netlink.c:1688 [inline]
   mptcp_nl_flush_addrs_list net/mptcp/pm_netlink.c:1709 [inline]
   mptcp_pm_nl_flush_addrs_doit+0xe10/0x1630 net/mptcp/pm_netlink.c:1750
   genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
   genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
   genl_rcv_msg+0x1214/0x12c0 net/netlink/genetlink.c:1210
   netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2542
   genl_rcv+0x40/0x60 net/netlink/genetlink.c:1219
   netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
   netlink_unicast+0xf52/0x1260 net/netlink/af_netlink.c:1347
   netlink_sendmsg+0x10da/0x11e0 net/netlink/af_netlink.c:1891
   sock_sendmsg_nosec net/socket.c:711 [inline]
   __sock_sendmsg+0x30f/0x380 net/socket.c:726
   ____sys_sendmsg+0x877/0xb60 net/socket.c:2583
   ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2637
   __sys_sendmsg net/socket.c:2669 [inline]
   __do_sys_sendmsg net/socket.c:2674 [inline]
   __se_sys_sendmsg net/socket.c:2672 [inline]
   __x64_sys_sendmsg+0x212/0x3c0 net/socket.c:2672
   x64_sys_call+0x2ed6/0x3c30 arch/x86/include/generated/asm/syscalls_64.h:47
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

  Uninit was stored to memory at:
   mptcp_pm_add_addr_signal+0x3d7/0x4c0
   mptcp_established_options_add_addr net/mptcp/options.c:666 [inline]
   mptcp_established_options+0x1b9b/0x3a00 net/mptcp/options.c:884
   tcp_established_options+0x2c4/0x7d0 net/ipv4/tcp_output.c:1012
   __tcp_transmit_skb+0x5b7/0x4900 net/ipv4/tcp_output.c:1333
   tcp_transmit_skb net/ipv4/tcp_output.c:1486 [inline]
   tcp_write_xmit+0x3b90/0x9070 net/ipv4/tcp_output.c:2829
   __tcp_push_pending_frames+0xc4/0x380 net/ipv4/tcp_output.c:3012
   tcp_send_fin+0x9f6/0xf50 net/ipv4/tcp_output.c:3618
   __tcp_close+0x140c/0x1550 net/ipv4/tcp.c:3130
   __mptcp_close_ssk+0x74e/0x16f0 net/mptcp/protocol.c:2496
   mptcp_close_ssk+0x26b/0x2c0 net/mptcp/protocol.c:2550
   mptcp_pm_nl_rm_addr_or_subflow+0x635/0xd10 net/mptcp/pm_netlink.c:889
   mptcp_pm_nl_rm_subflow_received net/mptcp/pm_netlink.c:924 [inline]
   mptcp_pm_flush_addrs_and_subflows net/mptcp/pm_netlink.c:1688 [inline]
   mptcp_nl_flush_addrs_list net/mptcp/pm_netlink.c:1709 [inline]
   mptcp_pm_nl_flush_addrs_doit+0xe10/0x1630 net/mptcp/pm_netlink.c:1750
   genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
   genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
   genl_rcv_msg+0x1214/0x12c0 net/netlink/genetlink.c:1210
   netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2542
   genl_rcv+0x40/0x60 net/netlink/genetlink.c:1219
   netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
   netlink_unicast+0xf52/0x1260 net/netlink/af_netlink.c:1347
   netlink_sendmsg+0x10da/0x11e0 net/netlink/af_netlink.c:1891
   sock_sendmsg_nosec net/socket.c:711 [inline]
   __sock_sendmsg+0x30f/0x380 net/socket.c:726
   ____sys_sendmsg+0x877/0xb60 net/socket.c:2583
   ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2637
   __sys_sendmsg net/socket.c:2669 [inline]
   __do_sys_sendmsg net/socket.c:2674 [inline]
   __se_sys_sendmsg net/socket.c:2672 [inline]
   __x64_sys_sendmsg+0x212/0x3c0 net/socket.c:2672
   x64_sys_call+0x2ed6/0x3c30 arch/x86/include/generated/asm/syscalls_64.h:47
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

  Uninit was stored to memory at:
   mptcp_pm_add_addr_received+0x95f/0xdd0 net/mptcp/pm.c:235
   mptcp_incoming_options+0x2983/0x3d30 net/mptcp/options.c:1169
   tcp_data_queue+0xb4/0x7be0 net/ipv4/tcp_input.c:5233
   tcp_rcv_state_process+0x2a38/0x49d0 net/ipv4/tcp_input.c:6972
   tcp_v4_do_rcv+0xbf9/0x11a0 net/ipv4/tcp_ipv4.c:1939
   tcp_v4_rcv+0x51df/0x5750 net/ipv4/tcp_ipv4.c:2351
   ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
   ip_local_deliver_finish+0x336/0x500 net/ipv4/ip_input.c:233
   NF_HOOK include/linux/netfilter.h:314 [inline]
   ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
   dst_input include/net/dst.h:460 [inline]
   ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:447
   NF_HOOK include/linux/netfilter.h:314 [inline]
   ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:567
   __netif_receive_skb_one_core net/core/dev.c:5704 [inline]
   __netif_receive_skb+0x319/0xa00 net/core/dev.c:5817
   process_backlog+0x4ad/0xa50 net/core/dev.c:6149
   __napi_poll+0xe7/0x980 net/core/dev.c:6902
   napi_poll net/core/dev.c:6971 [inline]
   net_rx_action+0xa5a/0x19b0 net/core/dev.c:7093
   handle_softirqs+0x1a0/0x7c0 kernel/softirq.c:561
   __do_softirq+0x14/0x1a kernel/softirq.c:595

  Local variable mp_opt created at:
   mptcp_incoming_options+0x119/0x3d30 net/mptcp/options.c:1127
   tcp_data_queue+0xb4/0x7be0 net/ipv4/tcp_input.c:5233

The current schema is too fragile; address the issue grouping all the
state-related data together and clearing the whole group instead of
just the bitmask. This also cleans-up the code a bit, as there is no
need to individually clear "random" bitfield in a couple of places
any more.

Fixes: 84dfe3677a ("mptcp: send out dedicated ADD_ADDR packet")
Cc: stable@vger.kernel.org
Reported-by: syzbot+23728c2df58b3bd175ad@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/6786ac51.050a0220.216c54.00a7.GAE@google.com
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/541
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250123-net-mptcp-syzbot-issues-v1-1-af73258a726f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 15:07:02 -08:00
Chenyuan Yang
19e65c45a1 net: davicom: fix UAF in dm9000_drv_remove
dm is netdev private data and it cannot be
used after free_netdev() call. Using dm after free_netdev()
can cause UAF bug. Fix it by moving free_netdev() at the end of the
function.

This is similar to the issue fixed in commit
ad297cd2db ("net: qcom/emac: fix UAF in emac_remove").

This bug is detected by our static analysis tool.

Fixes: cf9e60aa69 ("net: davicom: Fix regulator not turned off on driver removal")
Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com>
CC: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://patch.msgid.link/20250123214213.623518-1-chenyuan0y@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 15:05:37 -08:00
Milos Reljin
bd1bbab717 net: phy: c45-tjaxx: add delay between MDIO write and read in soft_reset
In application note (AN13663) for TJA1120, on page 30, there's a figure
with average PHY startup timing values following software reset.
The time it takes for SMI to become operational after software reset
ranges roughly from 500 us to 1500 us.

This commit adds 2000 us delay after MDIO write which triggers software
reset. Without this delay, soft_reset function returns an error and
prevents successful PHY init.

Cc: stable@vger.kernel.org
Fixes: b050f2f15e ("phy: nxp-c45: add driver for tja1103")
Signed-off-by: Milos Reljin <milos_reljin@outlook.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/AM8P250MB0124D258E5A71041AF2CC322E1E32@AM8P250MB0124.EURP250.PROD.OUTLOOK.COM
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 15:02:51 -08:00
Len Brown
b32c36975d tools/power turbostat: Fix forked child affinity regression
In "one-shot" mode, turbostat
1. takes a counter snapshot
2. forks and waits for a child
3. takes the end counter snapshot and prints the result.

But turbostat counter snapshots currently use affinity to travel
around the system so that counter reads are "local", and this
affinity must be cleared between #1 and #2 above.

The offending commit removed that reset that allowed the child
to run on cpu_present_set.

Fix that issue, and improve upon the original by using
cpu_possible_set for the child.  This allows the child
to also run on CPUs that hotplug online during its runtime.

Reported-by: Zhang Rui <rui.zhang@intel.com>
Fixes: 7bb3fe27ad ("tools/power/turbostat: Obey allowed CPUs during startup")
Signed-off-by: Len Brown <len.brown@intel.com>
2025-01-27 16:56:51 -06:00
Shigeru Yoshida
5066293b9b vxlan: Fix uninit-value in vxlan_vnifilter_dump()
KMSAN reported an uninit-value access in vxlan_vnifilter_dump() [1].

If the length of the netlink message payload is less than
sizeof(struct tunnel_msg), vxlan_vnifilter_dump() accesses bytes
beyond the message. This can lead to uninit-value access. Fix this by
returning an error in such situations.

[1]
BUG: KMSAN: uninit-value in vxlan_vnifilter_dump+0x328/0x920 drivers/net/vxlan/vxlan_vnifilter.c:422
 vxlan_vnifilter_dump+0x328/0x920 drivers/net/vxlan/vxlan_vnifilter.c:422
 rtnl_dumpit+0xd5/0x2f0 net/core/rtnetlink.c:6786
 netlink_dump+0x93e/0x15f0 net/netlink/af_netlink.c:2317
 __netlink_dump_start+0x716/0xd60 net/netlink/af_netlink.c:2432
 netlink_dump_start include/linux/netlink.h:340 [inline]
 rtnetlink_dump_start net/core/rtnetlink.c:6815 [inline]
 rtnetlink_rcv_msg+0x1256/0x14a0 net/core/rtnetlink.c:6882
 netlink_rcv_skb+0x467/0x660 net/netlink/af_netlink.c:2542
 rtnetlink_rcv+0x35/0x40 net/core/rtnetlink.c:6944
 netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
 netlink_unicast+0xed6/0x1290 net/netlink/af_netlink.c:1347
 netlink_sendmsg+0x1092/0x1230 net/netlink/af_netlink.c:1891
 sock_sendmsg_nosec net/socket.c:711 [inline]
 __sock_sendmsg+0x330/0x3d0 net/socket.c:726
 ____sys_sendmsg+0x7f4/0xb50 net/socket.c:2583
 ___sys_sendmsg+0x271/0x3b0 net/socket.c:2637
 __sys_sendmsg net/socket.c:2669 [inline]
 __do_sys_sendmsg net/socket.c:2674 [inline]
 __se_sys_sendmsg net/socket.c:2672 [inline]
 __x64_sys_sendmsg+0x211/0x3e0 net/socket.c:2672
 x64_sys_call+0x3878/0x3d90 arch/x86/include/generated/asm/syscalls_64.h:47
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xd9/0x1d0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Uninit was created at:
 slab_post_alloc_hook mm/slub.c:4110 [inline]
 slab_alloc_node mm/slub.c:4153 [inline]
 kmem_cache_alloc_node_noprof+0x800/0xe80 mm/slub.c:4205
 kmalloc_reserve+0x13b/0x4b0 net/core/skbuff.c:587
 __alloc_skb+0x347/0x7d0 net/core/skbuff.c:678
 alloc_skb include/linux/skbuff.h:1323 [inline]
 netlink_alloc_large_skb+0xa5/0x280 net/netlink/af_netlink.c:1196
 netlink_sendmsg+0xac9/0x1230 net/netlink/af_netlink.c:1866
 sock_sendmsg_nosec net/socket.c:711 [inline]
 __sock_sendmsg+0x330/0x3d0 net/socket.c:726
 ____sys_sendmsg+0x7f4/0xb50 net/socket.c:2583
 ___sys_sendmsg+0x271/0x3b0 net/socket.c:2637
 __sys_sendmsg net/socket.c:2669 [inline]
 __do_sys_sendmsg net/socket.c:2674 [inline]
 __se_sys_sendmsg net/socket.c:2672 [inline]
 __x64_sys_sendmsg+0x211/0x3e0 net/socket.c:2672
 x64_sys_call+0x3878/0x3d90 arch/x86/include/generated/asm/syscalls_64.h:47
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xd9/0x1d0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

CPU: 0 UID: 0 PID: 30991 Comm: syz.4.10630 Not tainted 6.12.0-10694-gc44daa7e3c73 #29
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014

Fixes: f9c4bb0b24 ("vxlan: vni filtering support on collect metadata device")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250123145746.785768-1-syoshida@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 14:53:13 -08:00
David Howells
79d458c130 rxrpc, afs: Fix peer hash locking vs RCU callback
In its address list, afs now retains pointers to and refs on one or more
rxrpc_peer objects.  The address list is freed under RCU and at this time,
it puts the refs on those peers.

Now, when an rxrpc_peer object runs out of refs, it gets removed from the
peer hash table and, for that, rxrpc has to take a spinlock.  However, it
is now being called from afs's RCU cleanup, which takes place in BH
context - but it is just taking an ordinary spinlock.

The put may also be called from non-BH context, and so there exists the
possibility of deadlock if the BH-based RCU cleanup happens whilst the hash
spinlock is held.  This led to the attached lockdep complaint.

Fix this by changing spinlocks of rxnet->peer_hash_lock back to
BH-disabling locks.

    ================================
    WARNING: inconsistent lock state
    6.13.0-rc5-build2+ #1223 Tainted: G            E
    --------------------------------
    inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
    swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
    ffff88810babe228 (&rxnet->peer_hash_lock){+.?.}-{3:3}, at: rxrpc_put_peer+0xcb/0x180
    {SOFTIRQ-ON-W} state was registered at:
      mark_usage+0x164/0x180
      __lock_acquire+0x544/0x990
      lock_acquire.part.0+0x103/0x280
      _raw_spin_lock+0x2f/0x40
      rxrpc_peer_keepalive_worker+0x144/0x440
      process_one_work+0x486/0x7c0
      process_scheduled_works+0x73/0x90
      worker_thread+0x1c8/0x2a0
      kthread+0x19b/0x1b0
      ret_from_fork+0x24/0x40
      ret_from_fork_asm+0x1a/0x30
    irq event stamp: 972402
    hardirqs last  enabled at (972402): [<ffffffff8244360e>] _raw_spin_unlock_irqrestore+0x2e/0x50
    hardirqs last disabled at (972401): [<ffffffff82443328>] _raw_spin_lock_irqsave+0x18/0x60
    softirqs last  enabled at (972300): [<ffffffff810ffbbe>] handle_softirqs+0x3ee/0x430
    softirqs last disabled at (972313): [<ffffffff810ffc54>] __irq_exit_rcu+0x44/0x110

    other info that might help us debug this:
     Possible unsafe locking scenario:
           CPU0
           ----
      lock(&rxnet->peer_hash_lock);
      <Interrupt>
        lock(&rxnet->peer_hash_lock);

     *** DEADLOCK ***
    1 lock held by swapper/1/0:
     #0: ffffffff83576be0 (rcu_callback){....}-{0:0}, at: rcu_lock_acquire+0x7/0x30

    stack backtrace:
    CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Tainted: G            E      6.13.0-rc5-build2+ #1223
    Tainted: [E]=UNSIGNED_MODULE
    Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
    Call Trace:
     <IRQ>
     dump_stack_lvl+0x57/0x80
     print_usage_bug.part.0+0x227/0x240
     valid_state+0x53/0x70
     mark_lock_irq+0xa5/0x2f0
     mark_lock+0xf7/0x170
     mark_usage+0xe1/0x180
     __lock_acquire+0x544/0x990
     lock_acquire.part.0+0x103/0x280
     _raw_spin_lock+0x2f/0x40
     rxrpc_put_peer+0xcb/0x180
     afs_free_addrlist+0x46/0x90 [kafs]
     rcu_do_batch+0x2d2/0x640
     rcu_core+0x2f7/0x350
     handle_softirqs+0x1ee/0x430
     __irq_exit_rcu+0x44/0x110
     irq_exit_rcu+0xa/0x30
     sysvec_apic_timer_interrupt+0x7f/0xa0
     </IRQ>

Fixes: 72904d7b9b ("rxrpc, afs: Allow afs to pin rxrpc_peer objects")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/2095618.1737622752@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 14:46:18 -08:00
Jan Stancek
9b06d5b956 selftests: net/{lib,openvswitch}: extend CFLAGS to keep options from environment
Package build environments like Fedora rpmbuild introduced hardening
options (e.g. -pie -Wl,-z,now) by passing a -spec option to CFLAGS
and LDFLAGS.

Some Makefiles currently override CFLAGS but not LDFLAGS, which leads
to a mismatch and build failure, for example:
  /usr/bin/ld: /tmp/ccd2apay.o: relocation R_X86_64_32 against
    `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIE
  /usr/bin/ld: failed to set dynamic section sizes: bad value
  collect2: error: ld returned 1 exit status
  make[1]: *** [../../lib.mk:222: tools/testing/selftests/net/lib/csum] Error 1

openvswitch/Makefile CFLAGS currently do not appear to be used, but
fix it anyway for the case when new tests are introduced in future.

Fixes: 1d0dc857b5 ("selftests: drv-net: add checksum tests")
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/3d173603ee258f419d0403363765c9f9494ff79a.1737635092.git.jstancek@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 14:45:27 -08:00
Jan Stancek
23b3a7c4a7 selftests: mptcp: extend CFLAGS to keep options from environment
Package build environments like Fedora rpmbuild introduced hardening
options (e.g. -pie -Wl,-z,now) by passing a -spec option to CFLAGS
and LDFLAGS.

mptcp Makefile currently overrides CFLAGS but not LDFLAGS, which leads
to a mismatch and build failure, for example:
  make[1]: *** [../../lib.mk:222: tools/testing/selftests/net/mptcp/mptcp_sockopt] Error 1
  /usr/bin/ld: /tmp/ccqyMVdb.o: relocation R_X86_64_32 against `.rodata.str1.8' can not be used when making a PIE object; recompile with -fPIE
  /usr/bin/ld: failed to set dynamic section sizes: bad value
  collect2: error: ld returned 1 exit status

Fixes: cc937dad85 ("selftests: centralize -D_GNU_SOURCE= to CFLAGS in lib.mk")
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/7abc701da9df39c2d6cd15bc3cf9e6cee445cb96.1737621162.git.jstancek@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 14:45:19 -08:00
Andrea Righi
1626e5ef0b sched_ext: Fix lock imbalance in dispatch_to_local_dsq()
While performing the rq locking dance in dispatch_to_local_dsq(), we may
trigger the following lock imbalance condition, in particular when
multiple tasks are rapidly changing CPU affinity (i.e., running a
`stress-ng --race-sched 0`):

[   13.413579] =====================================
[   13.413660] WARNING: bad unlock balance detected!
[   13.413729] 6.13.0-virtme #15 Not tainted
[   13.413792] -------------------------------------
[   13.413859] kworker/1:1/80 is trying to release lock (&rq->__lock) at:
[   13.413954] [<ffffffff873c6c48>] dispatch_to_local_dsq+0x108/0x1a0
[   13.414111] but there are no more locks to release!
[   13.414176]
[   13.414176] other info that might help us debug this:
[   13.414258] 1 lock held by kworker/1:1/80:
[   13.414318]  #0: ffff8b66feb41698 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x20/0x90
[   13.414612]
[   13.414612] stack backtrace:
[   13.415255] CPU: 1 UID: 0 PID: 80 Comm: kworker/1:1 Not tainted 6.13.0-virtme #15
[   13.415505] Workqueue:  0x0 (events)
[   13.415567] Sched_ext: dsp_local_on (enabled+all), task: runnable_at=-2ms
[   13.415570] Call Trace:
[   13.415700]  <TASK>
[   13.415744]  dump_stack_lvl+0x78/0xe0
[   13.415806]  ? dispatch_to_local_dsq+0x108/0x1a0
[   13.415884]  print_unlock_imbalance_bug+0x11b/0x130
[   13.415965]  ? dispatch_to_local_dsq+0x108/0x1a0
[   13.416226]  lock_release+0x231/0x2c0
[   13.416326]  _raw_spin_unlock+0x1b/0x40
[   13.416422]  dispatch_to_local_dsq+0x108/0x1a0
[   13.416554]  flush_dispatch_buf+0x199/0x1d0
[   13.416652]  balance_one+0x194/0x370
[   13.416751]  balance_scx+0x61/0x1e0
[   13.416848]  prev_balance+0x43/0xb0
[   13.416947]  __pick_next_task+0x6b/0x1b0
[   13.417052]  __schedule+0x20d/0x1740

This happens because dispatch_to_local_dsq() is racing with
dispatch_dequeue() and, when the latter wins, we incorrectly assume that
the task has been moved to dst_rq.

Fix by properly tracking the currently locked rq.

Fixes: 4d3ca89bdd ("sched_ext: Refactor consume_remote_task()")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-01-27 12:41:12 -10:00
Jakub Kicinski
67e4bb2ced net: page_pool: don't try to stash the napi id
Page ppol tried to cache the NAPI ID in page pool info to avoid
having a dependency on the life cycle of the NAPI instance.
Since commit under Fixes the NAPI ID is not populated until
napi_enable() and there's a good chance that page pool is
created before NAPI gets enabled.

Protect the NAPI pointer with the existing page pool mutex,
the reading path already holds it. napi_id itself we need
to READ_ONCE(), it's protected by netdev_lock() which are
not holding in page pool.

Before this patch napi IDs were missing for mlx5:

 # ./cli.py --spec netlink/specs/netdev.yaml --dump page-pool-get

 [{'id': 144, 'ifindex': 2, 'inflight': 3072, 'inflight-mem': 12582912},
  {'id': 143, 'ifindex': 2, 'inflight': 5568, 'inflight-mem': 22806528},
  {'id': 142, 'ifindex': 2, 'inflight': 5120, 'inflight-mem': 20971520},
  {'id': 141, 'ifindex': 2, 'inflight': 4992, 'inflight-mem': 20447232},
  ...

After:

 [{'id': 144, 'ifindex': 2, 'inflight': 3072, 'inflight-mem': 12582912,
   'napi-id': 565},
  {'id': 143, 'ifindex': 2, 'inflight': 4224, 'inflight-mem': 17301504,
   'napi-id': 525},
  {'id': 142, 'ifindex': 2, 'inflight': 4288, 'inflight-mem': 17563648,
   'napi-id': 524},
  ...

Fixes: 86e25f40aa ("net: napi: Add napi_config")
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250123231620.1086401-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 14:37:41 -08:00
Jakub Kicinski
6db9d3a536 netdevsim: don't assume core pre-populates HDS params on GET
Syzbot reports:

  BUG: KMSAN: uninit-value in nsim_get_ringparam+0xa8/0xe0 drivers/net/netdevsim/ethtool.c:77
   nsim_get_ringparam+0xa8/0xe0 drivers/net/netdevsim/ethtool.c:77
   ethtool_set_ringparam+0x268/0x570 net/ethtool/ioctl.c:2072
   __dev_ethtool net/ethtool/ioctl.c:3209 [inline]
   dev_ethtool+0x126d/0x2a40 net/ethtool/ioctl.c:3398
   dev_ioctl+0xb0e/0x1280 net/core/dev_ioctl.c:759

This is the SET path, where we call GET to either check user request
against max values, or check if any of the settings will change.

The logic in netdevsim is trying to report the default (ENABLED)
if user has not requested any specific setting. The user setting
is recorded in dev->cfg, don't depend on kernel_ringparam being
pre-populated with it.

Fixes: 928459bbda ("net: ethtool: populate the default HDS params in the core")
Reported-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+b3bcd80232d00091e061@syzkaller.appspotmail.com
Tested-by: syzbot+b3bcd80232d00091e061@syzkaller.appspotmail.com
Link: https://patch.msgid.link/20250123221410.1067678-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 14:37:11 -08:00
Jakub Kicinski
3b1af76604 MAINTAINERS: add Paul Fertser as a NC-SI reviewer
Paul has been providing very solid reviews for NC-SI changes
lately, so much so I started CCing him on all NC-SI patches.
Make the designation official.

Reviewed-by: Paul Fertser <fercerpav@gmail.com>
Link: https://patch.msgid.link/20250123155540.943243-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 14:31:46 -08:00
Jakub Kicinski
f17b15d0c2 Merge branch 'eth-fix-calling-napi_enable-in-atomic-context'
Jakub Kicinski says:

====================
eth: fix calling napi_enable() in atomic context

Dan has reported that I missed a lot of drivers which call napi_enable()
in atomic with the naive coccinelle search for spin locks:
https://lore.kernel.org/dcfd56bc-de32-4b11-9e19-d8bd1543745d@stanley.mountain

Fix them. Most of the fixes involve taking the netdev_lock()
before the spin lock. mt76 is special because we can just
move napi_enable() from the BH section.

All patches compile tested only.

v2: https://lore.kernel.org/20250123004520.806855-1-kuba@kernel.org
v1: https://lore.kernel.org/20250121221519.392014-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20250124031841.1179756-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 14:30:52 -08:00
Jakub Kicinski
a60558644e wifi: mt76: move napi_enable() from under BH
mt76 does a lot of:

  local_bh_disable();
  napi_enable(...napi);
  napi_schedule(...napi);
  local_bh_enable();

local_bh_disable() is not a real lock, its most likely taken
because napi_schedule() requires that we invoke softirqs at
some point. napi_enable() needs to take a mutex, so move it
from under the BH protection.

Fixes: 413f0271f3 ("net: protect NAPI enablement with netdev_lock()")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/dcfd56bc-de32-4b11-9e19-d8bd1543745d@stanley.mountain
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250124031841.1179756-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 14:30:50 -08:00
Jakub Kicinski
09a939487f eth: via-rhine: fix calling napi_enable() in atomic context
napi_enable() may sleep now, take netdev_lock() before rp->lock.
napi_enable() is hidden inside init_registers().

Note that this patch orders netdev_lock after rp->task_lock,
to avoid having to take the netdev_lock() around disable path.

Fixes: 413f0271f3 ("net: protect NAPI enablement with netdev_lock()")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/dcfd56bc-de32-4b11-9e19-d8bd1543745d@stanley.mountain
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250124031841.1179756-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 14:30:49 -08:00
Jakub Kicinski
f1d12bc7a5 eth: niu: fix calling napi_enable() in atomic context
napi_enable() may sleep now, take netdev_lock() before np->lock.

Fixes: 413f0271f3 ("net: protect NAPI enablement with netdev_lock()")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/dcfd56bc-de32-4b11-9e19-d8bd1543745d@stanley.mountain
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250124031841.1179756-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 14:30:49 -08:00
Jakub Kicinski
d19e612c47 eth: 8139too: fix calling napi_enable() in atomic context
napi_enable() may sleep now, take netdev_lock() before tp->lock and
tp->rx_lock.

Fixes: 413f0271f3 ("net: protect NAPI enablement with netdev_lock()")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/dcfd56bc-de32-4b11-9e19-d8bd1543745d@stanley.mountain
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Link: https://patch.msgid.link/20250124031841.1179756-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 14:30:49 -08:00