Commit Graph

99234 Commits

Author SHA1 Message Date
Ulf Hansson
a8e9ef4c8f pmdomain: Merge tag regulator-devm-of-get into next
Merge the tag regulator-devm-of-get from
git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator into next.

This introduces devm_of_regulator_get without the _optional suffix, since
that is more sensible for the Rockchip usecase.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-02-28 12:39:40 +01:00
Ulf Hansson
c432bdcf39 pmdomain: Merge tag 'v6.14-rc4' from Linus into next
Linux 6.14-rc4
2025-02-28 12:37:14 +01:00
Sebastian Reichel
0dffacbbf8 regulator: Add (devm_)of_regulator_get()
The Rockchip power-domain controller also plans to make use of
per-domain regulators similar to the MediaTek power-domain controller.
Since existing DTs are missing the regulator information, the kernel
should fallback to the automatically created dummy regulator if
necessary. Thus the version without the _optional suffix is needed.

The Rockchip driver plans to use the managed version, but to be
consistent with existing code the unmanaged version is added at the
same time.

Tested-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Link: https://patch.msgid.link/20250220-rk3588-gpu-pwr-domain-regulator-v6-1-a4f9c24e5b81@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-24 15:26:08 +00:00
Linus Torvalds
8b82c18bf9 Merge tag 'sched-urgent-2025-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull rseq fixes from Ingo Molnar:

 - Fix overly spread-out RSEQ concurrency ID allocation pattern that
   regressed certain workloads

 - Fix RSEQ registration syscall behavior on -EFAULT errors when
   CONFIG_DEBUG_RSEQ=y (This debug option is disabled on most
   distributions)

* tag 'sched-urgent-2025-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rseq: Fix rseq registration with CONFIG_DEBUG_RSEQ
  sched: Compact RSEQ concurrency IDs with reduced threads and affinity
2025-02-22 09:30:04 -08:00
Linus Torvalds
8a61cb6e15 Merge tag 'block-6.14-20250221' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:

 - NVMe pull request via Keith:
      - FC controller state check fixes (Daniel)
      - PCI Endpoint fixes (Damien)
      - TCP connection failure fixe (Caleb)
      - TCP handling C2HTermReq PDU (Maurizio)
      - RDMA queue state check (Ruozhu)
      - Apple controller fixes (Hector)
      - Target crash on disbaled namespace (Hannes)

 - MD pull request via Yu:
      - Fix queue limits error handling for raid0, raid1 and raid10

 - Fix for a NULL pointer deref in request data mapping

 - Code cleanup for request merging

* tag 'block-6.14-20250221' of git://git.kernel.dk/linux:
  nvme: only allow entering LIVE from CONNECTING state
  nvme-fc: rely on state transitions to handle connectivity loss
  apple-nvme: Support coprocessors left idle
  apple-nvme: Release power domains when probe fails
  nvmet: Use enum definitions instead of hardcoded values
  nvme: Cleanup the definition of the controller config register fields
  nvme/ioctl: add missing space in err message
  nvme-tcp: fix connect failure on receiving partial ICResp PDU
  nvme: tcp: Fix compilation warning with W=1
  nvmet: pci-epf: Avoid RCU stalls under heavy workload
  nvmet: pci-epf: Do not uselessly write the CSTS register
  nvmet: pci-epf: Correctly initialize CSTS when enabling the controller
  nvmet-rdma: recheck queue state is LIVE in state lock in recv done
  nvmet: Fix crash when a namespace is disabled
  nvme-tcp: add basic support for the C2HTermReq PDU
  nvme-pci: quirk Acer FA100 for non-uniqueue identifiers
  block: fix NULL pointer dereferenced within __blk_rq_map_sg
  block/merge: remove unnecessary min() with UINT_MAX
  md/raid*: Fix the set_queue_limits implementations
2025-02-21 09:36:28 -08:00
Linus Torvalds
319fc77f8f Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull BPF fixes from Daniel Borkmann:

 - Fix a soft-lockup in BPF arena_map_free on 64k page size kernels
   (Alan Maguire)

 - Fix a missing allocation failure check in BPF verifier's
   acquire_lock_state (Kumar Kartikeya Dwivedi)

 - Fix a NULL-pointer dereference in trace_kfree_skb by adding kfree_skb
   to the raw_tp_null_args set (Kuniyuki Iwashima)

 - Fix a deadlock when freeing BPF cgroup storage (Abel Wu)

 - Fix a syzbot-reported deadlock when holding BPF map's freeze_mutex
   (Andrii Nakryiko)

 - Fix a use-after-free issue in bpf_test_init when eth_skb_pkt_type is
   accessing skb data not containing an Ethernet header (Shigeru
   Yoshida)

 - Fix skipping non-existing keys in generic_map_lookup_batch (Yan Zhai)

 - Several BPF sockmap fixes to address incorrect TCP copied_seq
   calculations, which prevented correct data reads from recv(2) in user
   space (Jiayuan Chen)

 - Two fixes for BPF map lookup nullness elision (Daniel Xu)

 - Fix a NULL-pointer dereference from vmlinux BTF lookup in
   bpf_sk_storage_tracing_allowed (Jared Kangas)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests: bpf: test batch lookup on array of maps with holes
  bpf: skip non exist keys in generic_map_lookup_batch
  bpf: Handle allocation failure in acquire_lock_state
  bpf: verifier: Disambiguate get_constant_map_key() errors
  bpf: selftests: Test constant key extraction on irrelevant maps
  bpf: verifier: Do not extract constant map keys for irrelevant maps
  bpf: Fix softlockup in arena_map_free on 64k page kernel
  net: Add rx_skb of kfree_skb to raw_tp_null_args[].
  bpf: Fix deadlock when freeing cgroup storage
  selftests/bpf: Add strparser test for bpf
  selftests/bpf: Fix invalid flag of recv()
  bpf: Disable non stream socket for strparser
  bpf: Fix wrong copied_seq calculation
  strparser: Add read_sock callback
  bpf: avoid holding freeze_mutex during mmap operation
  bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic
  selftests/bpf: Adjust data size to have ETH_HLEN
  bpf, test_run: Fix use-after-free issue in eth_skb_pkt_type()
  bpf: Remove unnecessary BTF lookups in bpf_sk_storage_tracing_allowed
2025-02-20 15:37:17 -08:00
Linus Torvalds
27eddbf344 Merge tag 'net-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Smaller than usual with no fixes from any subtree.

  Current release - regressions:

   - core: fix race of rtnl_net_lock(dev_net(dev))

  Previous releases - regressions:

   - core: remove the single page frag cache for good

   - flow_dissector: fix handling of mixed port and port-range keys

   - sched: cls_api: fix error handling causing NULL dereference

   - tcp:
       - adjust rcvq_space after updating scaling ratio
       - drop secpath at the same time as we currently drop dst

   - eth: gtp: suppress list corruption splat in gtp_net_exit_batch_rtnl().

  Previous releases - always broken:

   - vsock:
       - fix variables initialization during resuming
       - for connectible sockets allow only connected

   - eth:
       - geneve: fix use-after-free in geneve_find_dev()
       - ibmvnic: don't reference skb after sending to VIOS"

* tag 'net-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
  Revert "net: skb: introduce and use a single page frag cache"
  net: allow small head cache usage with large MAX_SKB_FRAGS values
  nfp: bpf: Add check for nfp_app_ctrl_msg_alloc()
  tcp: drop secpath at the same time as we currently drop dst
  net: axienet: Set mac_managed_pm
  arp: switch to dev_getbyhwaddr() in arp_req_set_public()
  net: Add non-RCU dev_getbyhwaddr() helper
  sctp: Fix undefined behavior in left shift operation
  selftests/bpf: Add a specific dst port matching
  flow_dissector: Fix port range key handling in BPF conversion
  selftests/net/forwarding: Add a test case for tc-flower of mixed port and port-range
  flow_dissector: Fix handling of mixed port and port-range keys
  geneve: Suppress list corruption splat in geneve_destroy_tunnels().
  gtp: Suppress list corruption splat in gtp_net_exit_batch_rtnl().
  dev: Use rtnl_net_dev_lock() in unregister_netdev().
  net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().
  net: Add net_passive_inc() and net_passive_dec().
  net: pse-pd: pd692x0: Fix power limit retrieval
  MAINTAINERS: trim the GVE entry
  gve: set xdp redirect target only when it is available
  ...
2025-02-20 10:19:54 -08:00
Paolo Abeni
6bc7e4eb04 Revert "net: skb: introduce and use a single page frag cache"
After the previous commit is finally safe to revert commit dbae2b0628
("net: skb: introduce and use a single page frag cache"): do it here.

The intended goal of such change was to counter a performance regression
introduced by commit 3226b158e6 ("net: avoid 32 x truesize
under-estimation for tiny skbs").

Unfortunately, the blamed commit introduces another regression for the
virtio_net driver. Such a driver calls napi_alloc_skb() with a tiny
size, so that the whole head frag could fit a 512-byte block.

The single page frag cache uses a 1K fragment for such allocation, and
the additional overhead, under small UDP packets flood, makes the page
allocator a bottleneck.

Thanks to commit bf9f1baa27 ("net: add dedicated kmem_cache for
typical/small skb->head"), this revert does not re-introduce the
original regression. Actually, in the relevant test on top of this
revert, I measure a small but noticeable positive delta, just above
noise level.

The revert itself required some additional mangling due to recent updates
in the affected code.

Suggested-by: Eric Dumazet <edumazet@google.com>
Fixes: dbae2b0628 ("net: skb: introduce and use a single page frag cache")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-20 10:53:25 +01:00
Breno Leitao
4b5a28b38c net: Add non-RCU dev_getbyhwaddr() helper
Add dedicated helper for finding devices by hardware address when
holding rtnl_lock, similar to existing dev_getbyhwaddr_rcu(). This prevents
PROVE_LOCKING warnings when rtnl_lock is held but RCU read lock is not.

Extract common address comparison logic into dev_addr_cmp().

The context about this change could be found in the following
discussion:

Link: https://lore.kernel.org/all/20250206-scarlet-ermine-of-improvement-1fcac5@leitao/

Cc: kuniyu@amazon.com
Cc: ushankar@purestorage.com
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250218-arm_fix_selftest-v5-1-d3d6892db9e1@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19 18:59:29 -08:00
Linus Torvalds
6537cfb395 Merge tag 'sound-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "A slightly large collection of fixes, spread over various drivers.

  Almost all are small and device-specific fixes and quirks in ASoC SOF
  Intel and AMD, Renesas, Cirrus, HD-audio, in addition to a small fix
  for MIDI 2.0"

* tag 'sound-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (41 commits)
  ALSA: seq: Drop UMP events when no UMP-conversion is set
  ALSA: hda/conexant: Add quirk for HP ProBook 450 G4 mute LED
  ALSA: hda/cirrus: Reduce codec resume time
  ALSA: hda/cirrus: Correct the full scale volume set logic
  virtio_snd.h: clarify that `controls` depends on VIRTIO_SND_F_CTLS
  ALSA: hda: Add error check for snd_ctl_rename_id() in snd_hda_create_dig_out_ctls()
  ALSA: hda/tas2781: Fix index issue in tas2781 hda SPI driver
  ASoC: imx-audmix: remove cpu_mclk which is from cpu dai device
  ALSA: hda/realtek: Fixup ALC225 depop procedure
  ALSA: hda/tas2781: Update tas2781 hda SPI driver
  ASoC: cs35l41: Fix acpi_device_hid() not found
  ASoC: SOF: amd: Add branch prediction hint in ACP IRQ handler
  ASoC: SOF: amd: Handle IPC replies before FW_BOOT_COMPLETE
  ASoC: SOF: amd: Drop unused includes from Vangogh driver
  ASoC: SOF: amd: Add post_fw_run_delay ACP quirk
  ASoC: Intel: soc-acpi-intel-ptl-match: revise typo of rt713_vb_l2_rt1320_l13
  ASoC: Intel: soc-acpi-intel-ptl-match: revise typo of rt712_vb + rt1320 support
  ALSA: Switch to use hrtimer_setup()
  ALSA: hda: hda-intel: add Panther Lake-H support
  ASoC: SOF: Intel: pci-ptl: Add support for PTL-H
  ...
2025-02-18 09:00:31 -08:00
Damien Le Moal
d422247d14 nvme: Cleanup the definition of the controller config register fields
Reorganized the enum used to define the fields of the contrller
configuration (CC) register in include/linux/nvme.h to:
1) Group together all the values defined for each field.
2) Add the missing field masks definitions.
3) Add comments to describe the enum and each field.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-02-18 07:39:42 -08:00
Maurizio Lombardi
84e009042d nvme-tcp: add basic support for the C2HTermReq PDU
Previously, the NVMe/TCP host driver did not handle the C2HTermReq PDU,
instead printing "unsupported pdu type (3)" when received. This patch adds
support for processing the C2HTermReq PDU, allowing the driver
to print the Fatal Error Status field.

Example of output:
nvme nvme4: Received C2HTermReq (FES = Invalid PDU Header Field)

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-02-18 07:13:26 -08:00
Mathieu Desnoyers
02d954c0fd sched: Compact RSEQ concurrency IDs with reduced threads and affinity
When a process reduces its number of threads or clears bits in its CPU
affinity mask, the mm_cid allocation should eventually converge towards
smaller values.

However, the change introduced by:

commit 7e019dcc47 ("sched: Improve cache locality of RSEQ concurrency
IDs for intermittent workloads")

adds a per-mm/CPU recent_cid which is never unset unless a thread
migrates.

This is a tradeoff between:

A) Preserving cache locality after a transition from many threads to few
   threads, or after reducing the hamming weight of the allowed CPU mask.

B) Making the mm_cid upper bounds wrt nr threads and allowed CPU mask
   easy to document and understand.

C) Allowing applications to eventually react to mm_cid compaction after
   reduction of the nr threads or allowed CPU mask, making the tracking
   of mm_cid compaction easier by shrinking it back towards 0 or not.

D) Making sure applications that periodically reduce and then increase
   again the nr threads or allowed CPU mask still benefit from good
   cache locality with mm_cid.

Introduce the following changes:

* After shrinking the number of threads or reducing the number of
  allowed CPUs, reduce the value of max_nr_cid so expansion of CID
  allocation will preserve cache locality if the number of threads or
  allowed CPUs increase again.

* Only re-use a recent_cid if it is within the max_nr_cid upper bound,
  else find the first available CID.

Fixes: 7e019dcc47 ("sched: Improve cache locality of RSEQ concurrency IDs for intermittent workloads")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lkml.kernel.org/r/20250210153253.460471-2-gmonaco@redhat.com
2025-02-18 08:50:36 +01:00
Linus Torvalds
2408a807bf Merge tag 'vfs-6.14-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
 "It was reported that the acct(2) system call can be used to trigger a
  NULL deref in cases where it is set to write to a file that triggers
  an internal lookup.

  This can e.g., happen when pointing acct(2) to /sys/power/resume. At
  the point the where the write to this file happens the calling task
  has already exited and called exit_fs() but an internal lookup might
  be triggered through lookup_bdev(). This may trigger a NULL-deref when
  accessing current->fs.

  Reorganize the code so that the the final write happens from the
  workqueue but with the caller's credentials. This preserves the
  (strange) permission model and has almost no regression risk.

  Also block access to kernel internal filesystems as well as procfs and
  sysfs in the first place.

  Various fixes for netfslib:

   - Fix a number of read-retry hangs, including:

      - Incorrect getting/putting of references on subreqs as we retry
        them

      - Failure to track whether a last old subrequest in a retried set
        is superfluous

      - Inconsistency in the usage of wait queues used for subrequests
        (ie. using clear_and_wake_up_bit() whilst waiting on a private
        waitqueue)

   - Add stats counters for retries and publish in /proc/fs/netfs/stats.
     This is not a fix per se, but is useful in debugging and shouldn't
     otherwise change the operation of the code

   - Fix the ordering of queuing subrequests with respect to setting the
     request flag that says we've now queued them all"

* tag 'vfs-6.14-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  netfs: Fix setting NETFS_RREQ_ALL_QUEUED to be after all subreqs queued
  netfs: Add retry stat counters
  netfs: Fix a number of read-retry hangs
  acct: block access to kernel internal filesystems
  acct: perform last write from workqueue
2025-02-17 10:38:25 -08:00
Linus Torvalds
ae5fa8ce7e Merge tag 'driver-core-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core api addition from Greg KH:
 "Here is a driver core new api for 6.14-rc3 that is being added to
  allow platform devices from stop being abused.

  It adds a new 'faux_device' structure and bus and api to allow almost
  a straight or simpler conversion from platform devices that were not
  really a platform device. It also comes with a binding for rust, with
  an example driver in rust showing how it's used.

  I'm adding this now so that the patches that convert the different
  drivers and subsystems can all start flowing into linux-next now
  through their different development trees, in time for 6.15-rc1.

  We have a number that are already reviewed and tested, but adding
  those conversions now doesn't seem right. For now, no one is using
  this, and it passes all build tests from 0-day and linux-next, so all
  should be good"

* tag 'driver-core-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  rust/kernel: Add faux device bindings
  driver core: add a faux bus for use when a simple device/bus is needed
2025-02-16 12:54:42 -08:00
Linus Torvalds
82ff316456 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Large set of fixes for vector handling, especially in the
     interactions between host and guest state.

     This fixes a number of bugs affecting actual deployments, and
     greatly simplifies the FP/SIMD/SVE handling. Thanks to Mark Rutland
     for dealing with this thankless task.

   - Fix an ugly race between vcpu and vgic creation/init, resulting in
     unexpected behaviours

   - Fix use of kernel VAs at EL2 when emulating timers with nVHE

   - Small set of pKVM improvements and cleanups

  x86:

   - Fix broken SNP support with KVM module built-in, ensuring the PSP
     module is initialized before KVM even when the module
     infrastructure cannot be used to order initcalls

   - Reject Hyper-V SEND_IPI hypercalls if the local APIC isn't being
     emulated by KVM to fix a NULL pointer dereference

   - Enter guest mode (L2) from KVM's perspective before initializing
     the vCPU's nested NPT MMU so that the MMU is properly tagged for
     L2, not L1

   - Load the guest's DR6 outside of the innermost .vcpu_run() loop, as
     the guest's value may be stale if a VM-Exit is handled in the
     fastpath"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
  x86/sev: Fix broken SNP support with KVM module built-in
  KVM: SVM: Ensure PSP module is initialized if KVM module is built-in
  crypto: ccp: Add external API interface for PSP module initialization
  KVM: arm64: vgic: Hoist SGI/PPI alloc from vgic_init() to kvm_create_vgic()
  KVM: arm64: timer: Drop warning on failed interrupt signalling
  KVM: arm64: Fix alignment of kvm_hyp_memcache allocations
  KVM: arm64: Convert timer offset VA when accessed in HYP code
  KVM: arm64: Simplify warning in kvm_arch_vcpu_load_fp()
  KVM: arm64: Eagerly switch ZCR_EL{1,2}
  KVM: arm64: Mark some header functions as inline
  KVM: arm64: Refactor exit handlers
  KVM: arm64: Refactor CPTR trap deactivation
  KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN
  KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN
  KVM: arm64: Remove host FPSIMD saving for non-protected KVM
  KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state
  KVM: x86: Load DR6 with guest value only before entering .vcpu_run() loop
  KVM: nSVM: Enter guest mode before initializing nested NPT MMU
  KVM: selftests: Add CPUID tests for Hyper-V features that need in-kernel APIC
  KVM: selftests: Manage CPUID array in Hyper-V CPUID test's core helper
  ...
2025-02-16 10:25:12 -08:00
Paolo Bonzini
d3d0b8dfe0 Merge tag 'kvm-x86-fixes-6.14-rcN' of https://github.com/kvm-x86/linux into HEAD
KVM fixes for 6.14 part 1

 - Reject Hyper-V SEND_IPI hypercalls if the local APIC isn't being emulated
   by KVM to fix a NULL pointer dereference.

 - Enter guest mode (L2) from KVM's perspective before initializing the vCPU's
   nested NPT MMU so that the MMU is properly tagged for L2, not L1.

 - Load the guest's DR6 outside of the innermost .vcpu_run() loop, as the
   guest's value may be stale if a VM-Exit is handled in the fastpath.
2025-02-14 19:08:35 -05:00
Sean Christopherson
435b344a70 crypto: ccp: Add external API interface for PSP module initialization
KVM is dependent on the PSP SEV driver and PSP SEV driver needs to be
loaded before KVM module. In case of module loading any dependent
modules are automatically loaded but in case of built-in modules there
is no inherent mechanism available to specify dependencies between
modules and ensure that any dependent modules are loaded implicitly.

Add a new external API interface for PSP module initialization which
allows PSP SEV driver to be loaded explicitly if KVM is built-in.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-ID: <15279ca0cad56a07cf12834ec544310f85ff5edc.1739226950.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-14 18:39:19 -05:00
Linus Torvalds
c7ab7b2a18 Merge tag 'efi-fixes-for-v6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
 "Take the newly introduced EFI_MEMORY_HOT_PLUGGABLE memory attribute
  into account when placing the kernel image in memory at boot.

  Otherwise, the presence of the kernel image could prevent such a
  memory region from being unplugged at runtime if it was 'cold
  plugged', i.e., already plugged in at boot time (and exposed via the
  EFI memory map).

  This should ensure that the new EFI_MEMORY_HOT_PLUGGABLE memory
  attribute is used consistently by Linux before it ever turns up in
  production, ensuring that we can make meaningful use of it without
  running the risk of regressing existing users"

* tag 'efi-fixes-for-v6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: Use BIT_ULL() constants for memory attributes
  efi: Avoid cold plugged memory for placing the kernel
2025-02-14 13:56:04 -08:00
Linus Torvalds
1b8c8cdad1 Merge tag 'block-6.14-20250214' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:

 - Fix for request rejection for batch addition

 - Fix a few issues for bogus mac partition tables

* tag 'block-6.14-20250214' of git://git.kernel.dk/linux:
  partitions: mac: fix handling of bogus partition table
  block: cleanup and fix batch completion adding conditions
2025-02-14 11:40:59 -08:00
Linus Torvalds
80868f5d3d Merge tag 'cgroup-for-6.14-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:

 - Fix a race window where a newly forked task could escape cgroup.kill

 - Remove incorrectly included steal time from cpu.stat::usage_usec

 - Minor update in selftest

* tag 'cgroup-for-6.14-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Remove steal time from usage_usec
  selftests/cgroup: use bash in test_cpuset_v1_hp.sh
  cgroup: fix race between fork and cgroup.kill
2025-02-14 11:00:42 -08:00
Linus Torvalds
348f968b89 Merge tag 'net-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter, wireless and bluetooth.

  Kalle Valo steps down after serving as the WiFi driver maintainer for
  over a decade.

  Current release - fix to a fix:

   - vsock: orphan socket after transport release, avoid null-deref

   - Bluetooth: L2CAP: fix corrupted list in hci_chan_del

  Current release - regressions:

   - eth:
      - stmmac: correct Rx buffer layout when SPH is enabled
      - iavf: fix a locking bug in an error path

   - rxrpc: fix alteration of headers whilst zerocopy pending

   - s390/qeth: move netif_napi_add_tx() and napi_enable() from under BH

   - Revert "netfilter: flowtable: teardown flow if cached mtu is stale"

  Current release - new code bugs:

   - rxrpc: fix ipv6 path MTU discovery, only ipv4 worked

   - pse-pd: fix deadlock in current limit functions

  Previous releases - regressions:

   - rtnetlink: fix netns refleak with rtnl_setlink()

   - wifi: brcmfmac: use random seed flag for BCM4355 and BCM4364
     firmware

  Previous releases - always broken:

   - add missing RCU protection of struct net throughout the stack

   - can: rockchip: bail out if skb cannot be allocated

   - eth: ti: am65-cpsw: base XDP support fixes

  Misc:

   - ethtool: tsconfig: update the format of hwtstamp flags, changes the
     uAPI but this uAPI was not in any release yet"

* tag 'net-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
  net: pse-pd: Fix deadlock in current limit functions
  rxrpc: Fix ipv6 path MTU discovery
  Reapply "net: skb: introduce and use a single page frag cache"
  s390/qeth: move netif_napi_add_tx() and napi_enable() from under BH
  mlxsw: Add return value check for mlxsw_sp_port_get_stats_raw()
  ipv6: mcast: add RCU protection to mld_newpack()
  team: better TEAM_OPTION_TYPE_STRING validation
  Bluetooth: L2CAP: Fix corrupted list in hci_chan_del
  Bluetooth: btintel_pcie: Fix a potential race condition
  Bluetooth: L2CAP: Fix slab-use-after-free Read in l2cap_send_cmd
  net: ethernet: ti: am65_cpsw: fix tx_cleanup for XDP case
  net: ethernet: ti: am65-cpsw: fix RX & TX statistics for XDP_TX case
  net: ethernet: ti: am65-cpsw: fix memleak in certain XDP cases
  vsock/test: Add test for SO_LINGER null ptr deref
  vsock: Orphan socket after transport release
  MAINTAINERS: Add sctp headers to the general netdev entry
  Revert "netfilter: flowtable: teardown flow if cached mtu is stale"
  iavf: Fix a locking bug in an error path
  rxrpc: Fix alteration of headers whilst zerocopy pending
  net: phylink: make configuring clock-stop dependent on MAC support
  ...
2025-02-13 12:17:04 -08:00
Jakub Kicinski
0892b84031 Reapply "net: skb: introduce and use a single page frag cache"
This reverts commit 011b033590.

Sabrina reports that the revert may trigger warnings due to intervening
changes, especially the ability to rise MAX_SKB_FRAGS. Let's drop it
and revisit once that part is also ironed out.

Fixes: 011b033590 ("Revert "net: skb: introduce and use a single page frag cache"")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/6bf54579233038bc0e76056c5ea459872ce362ab.1739375933.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-13 08:49:44 -08:00
Greg Kroah-Hartman
35fa2d88ca driver core: add a faux bus for use when a simple device/bus is needed
Many drivers abuse the platform driver/bus system as it provides a
simple way to create and bind a device to a driver-specific set of
probe/release functions.  Instead of doing that, and wasting all of the
memory associated with a platform device, here is a "faux" bus that
can be used instead.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/2025021026-atlantic-gibberish-3f0c@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-13 16:58:51 +01:00
Jens Axboe
1f47ed294a block: cleanup and fix batch completion adding conditions
The conditions for whether or not a request is allowed adding to a
completion batch are a bit hard to read, and they also have a few
issues. One is that ioerror may indeed be a random value on passthrough,
and it's being checked unconditionally of whether or not the given
request is a passthrough request or not.

Rewrite the conditions to be separate for easier reading, and only check
ioerror for non-passthrough requests. This fixes an issue with bio
unmapping on passthrough, where it fails getting added to a batch. This
both leads to suboptimal performance, and may trigger a potential
schedule-under-atomic condition for polled passthrough IO.

Fixes: f794f3351f ("block: add support for blk_mq_end_request_batch()")
Link: https://lore.kernel.org/r/20575f0a-656e-4bb3-9d82-dec6c7e3a35c@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-13 08:20:18 -07:00
David Howells
1d0013962d netfs: Fix a number of read-retry hangs
Fix a number of hangs in the netfslib read-retry code, including:

 (1) netfs_reissue_read() doubles up the getting of references on
     subrequests, thereby leaking the subrequest and causing inode eviction
     to wait indefinitely.  This can lead to the kernel reporting a hang in
     the filesystem's evict_inode().

     Fix this by removing the get from netfs_reissue_read() and adding one
     to netfs_retry_read_subrequests() to deal with the one place that
     didn't double up.

 (2) The loop in netfs_retry_read_subrequests() that retries a sequence of
     failed subrequests doesn't record whether or not it retried the one
     that the "subreq" pointer points to when it leaves the loop.  It may
     not if renegotiation/repreparation of the subrequests means that fewer
     subrequests are needed to span the cumulative range of the sequence.

     Because it doesn't record this, the piece of code that discards
     now-superfluous subrequests doesn't know whether it should discard the
     one "subreq" points to - and so it doesn't.

     Fix this by noting whether the last subreq it examines is superfluous
     and if it is, then getting rid of it and all subsequent subrequests.

     If that one one wasn't superfluous, then we would have tried to go
     round the previous loop again and so there can be no further unretried
     subrequests in the sequence.

 (3) netfs_retry_read_subrequests() gets yet an extra ref on any additional
     subrequests it has to get because it ran out of ones it could reuse to
     to renegotiation/repreparation shrinking the subrequests.

     Fix this by removing that extra ref.

 (4) In netfs_retry_reads(), it was using wait_on_bit() to wait for
     NETFS_SREQ_IN_PROGRESS to be cleared on all subrequests in the
     sequence - but netfs_read_subreq_terminated() is now using a wait
     queue on the request instead and so this wait will never finish.

     Fix this by waiting on the wait queue instead.  To make this work, a
     new flag, NETFS_RREQ_RETRYING, is now set around the wait loop to tell
     the wake-up code to wake up the wait queue rather than requeuing the
     request's work item.

     Note that this flag replaces the NETFS_RREQ_NEED_RETRY flag which is
     no longer used.

 (5) Whilst not strictly anything to do with the hang,
     netfs_retry_read_subrequests() was also doubly incrementing the
     subreq_counter and re-setting the debug index, leaving a gap in the
     trace.  This is also fixed.

One of these hangs was observed with 9p and with cifs.  Others were forced
by manual code injection into fs/afs/file.c.  Firstly, afs_prepare_read()
was created to provide an changing pattern of maximum subrequest sizes:

	static int afs_prepare_read(struct netfs_io_subrequest *subreq)
	{
		struct netfs_io_request *rreq = subreq->rreq;
		if (!S_ISREG(subreq->rreq->inode->i_mode))
			return 0;
		if (subreq->retry_count < 20)
			rreq->io_streams[0].sreq_max_len =
				umax(200, 2222 - subreq->retry_count * 40);
		else
			rreq->io_streams[0].sreq_max_len = 3333;
		return 0;
	}

and pointed to by afs_req_ops.  Then the following:

	struct netfs_io_subrequest *subreq = op->fetch.subreq;
	if (subreq->error == 0 &&
	    S_ISREG(subreq->rreq->inode->i_mode) &&
	    subreq->retry_count < 20) {
		subreq->transferred = subreq->already_done;
		__clear_bit(NETFS_SREQ_HIT_EOF, &subreq->flags);
		__set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
		afs_fetch_data_notify(op);
		return;
	}

was inserted into afs_fetch_data_success() at the beginning and struct
netfs_io_subrequest given an extra field, "already_done" that was set to
the value in "subreq->transferred" by netfs_reissue_read().

When reading a 4K file, the subrequests would get gradually smaller, a new
subrequest would be allocated around the 3rd retry and then eventually be
rendered superfluous when the 20th retry was hit and the limit on the first
subrequest was eased.

Fixes: e2d46f2ec3 ("netfs: Change the read result collector to only use one work item")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20250212222402.3618494-2-dhowells@redhat.com
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Steve French <stfrench@microsoft.com>
cc: Ihor Solodrai <ihor.solodrai@pm.me>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: v9fs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-13 16:00:38 +01:00
Ulf Hansson
cd3fa304ba pmdomain: core: Introduce dev_pm_genpd_rpm_always_on()
For some usecases a consumer driver requires its device to remain power-on
from the PM domain perspective during runtime. Using dev PM qos along with
the genpd governors, doesn't work for this case as would potentially
prevent the device from being runtime suspended too.

To support these usecases, let's introduce dev_pm_genpd_rpm_always_on() to
allow consumers drivers to dynamically control the behaviour in genpd for a
device that is attached to it.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/1738736156-119203-4-git-send-email-shawn.lin@rock-chips.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-02-13 14:47:48 +01:00
Pierre-Louis Bossart
a1f7b7ff0e PCI: pci_ids: add INTEL_HDA_PTL_H
Add Intel PTL-H audio Device ID.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20250210081730.22916-2-peter.ujfalusi@linux.intel.com
2025-02-10 09:22:32 +01:00
Linus Torvalds
69b54314c9 Merge tag 'kbuild-fixes-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:

 - Suppress false-positive -Wformat-{overflow,truncation}-non-kprintf
   warnings regardless of the W= option

 - Avoid CONFIG_TRIM_UNUSED_KSYMS dropping symbols passed to symbol_get()

 - Fix a build regression of the Debian linux-headers package

* tag 'kbuild-fixes-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: install-extmod-build: add missing quotation marks for CC variable
  kbuild: fix misspelling in scripts/Makefile.lib
  kbuild: keep symbols for symbol_get() even with CONFIG_TRIM_UNUSED_KSYMS
  scripts/Makefile.extrawarn: Do not show clang's non-kprintf warnings at W=1
2025-02-09 10:05:32 -08:00
Linus Torvalds
954a209f43 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Correctly clean the BSS to the PoC before allowing EL2 to access it
     on nVHE/hVHE/protected configurations

   - Propagate ownership of debug registers in protected mode after the
     rework that landed in 6.14-rc1

   - Stop pretending that we can run the protected mode without a GICv3
     being present on the host

   - Fix a use-after-free situation that can occur if a vcpu fails to
     initialise the NV shadow S2 MMU contexts

   - Always evaluate the need to arm a background timer for fully
     emulated guest timers

   - Fix the emulation of EL1 timers in the absence of FEAT_ECV

   - Correctly handle the EL2 virtual timer, specially when HCR_EL2.E2H==0

  s390:

   - move some of the guest page table (gmap) logic into KVM itself,
     inching towards the final goal of completely removing gmap from the
     non-kvm memory management code.

     As an initial set of cleanups, move some code from mm/gmap into kvm
     and start using __kvm_faultin_pfn() to fault-in pages as needed;
     but especially stop abusing page->index and page->lru to aid in the
     pgdesc conversion.

  x86:

   - Add missing check in the fix to defer starting the huge page
     recovery vhost_task

   - SRSO_USER_KERNEL_NO does not need SYNTHESIZED_F"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (31 commits)
  KVM: x86/mmu: Ensure NX huge page recovery thread is alive before waking
  KVM: remove kvm_arch_post_init_vm
  KVM: selftests: Fix spelling mistake "initally" -> "initially"
  kvm: x86: SRSO_USER_KERNEL_NO is not synthesized
  KVM: arm64: timer: Don't adjust the EL2 virtual timer offset
  KVM: arm64: timer: Correctly handle EL1 timer emulation when !FEAT_ECV
  KVM: arm64: timer: Always evaluate the need for a soft timer
  KVM: arm64: Fix nested S2 MMU structures reallocation
  KVM: arm64: Fail protected mode init if no vgic hardware is present
  KVM: arm64: Flush/sync debug state in protected mode
  KVM: s390: selftests: Streamline uc_skey test to issue iske after sske
  KVM: s390: remove the last user of page->index
  KVM: s390: move PGSTE softbits
  KVM: s390: remove useless page->index usage
  KVM: s390: move gmap_shadow_pgt_lookup() into kvm
  KVM: s390: stop using lists to keep track of used dat tables
  KVM: s390: stop using page->index for non-shadow gmaps
  KVM: s390: move some gmap shadowing functions away from mm/gmap.c
  KVM: s390: get rid of gmap_translate()
  KVM: s390: get rid of gmap_fault()
  ...
2025-02-09 09:41:38 -08:00
Linus Torvalds
9946eaf552 Merge tag 'hardening-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
 "Address a KUnit stack initialization regression that got tickled on
  m68k, and solve a Clang(v14 and earlier) bug found by 0day:

   - Fix stackinit KUnit regression on m68k

   - Use ARRAY_SIZE() for memtostr*()/strtomem*()"

* tag 'hardening-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  string.h: Use ARRAY_SIZE() for memtostr*()/strtomem*()
  compiler.h: Introduce __must_be_byte_array()
  compiler.h: Move C string helpers into C-only kernel section
  stackinit: Fix comment for test_small_end
  stackinit: Keep selftest union size small on m68k
2025-02-08 14:12:17 -08:00
Linus Torvalds
74b5161d57 Merge tag 'i2c-for-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c reverts from Wolfram Sang:
 "It turned out the new mechanism for handling created devices does not
  handle all muxing cases.

  Revert the changes to give a proper solution more time"

* tag 'i2c-for-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  Revert "i2c: Replace list-based mechanism for handling auto-detected clients"
  Revert "i2c: Replace list-based mechanism for handling userspace-created clients"
2025-02-08 13:35:17 -08:00
Paolo Abeni
011b033590 Revert "net: skb: introduce and use a single page frag cache"
This reverts commit dbae2b0628 ("net: skb: introduce and use a single
page frag cache"). The intended goal of such change was to counter a
performance regression introduced by commit 3226b158e6 ("net: avoid
32 x truesize under-estimation for tiny skbs").

Unfortunately, the blamed commit introduces another regression for the
virtio_net driver. Such a driver calls napi_alloc_skb() with a tiny
size, so that the whole head frag could fit a 512-byte block.

The single page frag cache uses a 1K fragment for such allocation, and
the additional overhead, under small UDP packets flood, makes the page
allocator a bottleneck.

Thanks to commit bf9f1baa27 ("net: add dedicated kmem_cache for
typical/small skb->head"), this revert does not re-introduce the
original regression. Actually, in the relevant test on top of this
revert, I measure a small but noticeable positive delta, just above
noise level.

The revert itself required some additional mangling due to the
introduction of the SKB_HEAD_ALIGN() helper and local lock infra in the
affected code.

Suggested-by: Eric Dumazet <edumazet@google.com>
Fixes: dbae2b0628 ("net: skb: introduce and use a single page frag cache")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/e649212fde9f0fdee23909ca0d14158d32bb7425.1738877290.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-07 17:20:22 -08:00
Linus Torvalds
8c67da5bc1 Merge tag 'vfs-6.14-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:

 - Fix fsnotify FMODE_NONOTIFY* handling.

   This also disables fsnotify on all pseudo files by default apart from
   very select exceptions. This carries a regression risk so we need to
   watch out and adapt accordingly. However, it is overall a significant
   improvement over the current status quo where every rando file can
   get fsnotify enabled.

 - Cleanup and simplify lockref_init() after recent lockref changes.

 - Fix vboxfs build with gcc-15.

 - Add an assert into inode_set_cached_link() to catch corrupt links.

 - Allow users to also use an empty string check to detect whether a
   given mount option string was empty or not.

 - Fix how security options were appended to statmount()'s ->mnt_opt
   field.

 - Fix statmount() selftests to always check the returned mask.

 - Fix uninitialized value in vfs_statx_path().

 - Fix pidfs_ioctl() sanity checks to guard against ioctl() overloading
   and preserve extensibility.

* tag 'vfs-6.14-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  vfs: sanity check the length passed to inode_set_cached_link()
  pidfs: improve ioctl handling
  fsnotify: disable pre-content and permission events by default
  selftests: always check mask returned by statmount(2)
  fsnotify: disable notification by default for all pseudo files
  fs: fix adding security options to statmount.mnt_opt
  fsnotify: use accessor to set FMODE_NONOTIFY_*
  lockref: remove count argument of lockref_init
  gfs2: switch to lockref_init(..., 1)
  gfs2: use lockref_init for gl_lockref
  statmount: let unset strings be empty
  vboxsf: fix building with GCC 15
  fs/stat.c: avoid harmless garbage value problem in vfs_statx_path()
2025-02-07 09:22:31 -08:00
Mateusz Guzik
37d11cfc63 vfs: sanity check the length passed to inode_set_cached_link()
This costs a strlen() call when instatianating a symlink.

Preferably it would be hidden behind VFS_WARN_ON (or compatible), but
there is no such facility at the moment. With the facility in place the
call can be patched out in production kernels.

In the meantime, since the cost is being paid unconditionally, use the
result to a fixup the bad caller.

This is not expected to persist in the long run (tm).

Sample splat:
bad length passed for symlink [/tmp/syz-imagegen43743633/file0/file0] (got 131109, expected 37)
[rest of WARN blurp goes here]

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20250204213207.337980-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07 10:29:59 +01:00
Amir Goldstein
95101401bb fsnotify: use accessor to set FMODE_NONOTIFY_*
The FMODE_NONOTIFY_* bits are a 2-bits mode.  Open coding manipulation
of those bits is risky.  Use an accessor file_set_fsnotify_mode() to
set the mode.

Rename file_set_fsnotify_mode() => file_set_fsnotify_mode_from_watchers()
to make way for the simple accessor name.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20250203223205.861346-2-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07 10:27:26 +01:00
Andreas Gruenbacher
bb504b4d64 lockref: remove count argument of lockref_init
All users of lockref_init() now initialize the count to 1, so hardcode
that and remove the count argument.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Link: https://lore.kernel.org/r/20250130135624.1899988-4-agruenba@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07 10:27:25 +01:00
Kees Cook
6270f4deba string.h: Use ARRAY_SIZE() for memtostr*()/strtomem*()
The destination argument of memtostr*() and strtomem*() must be a
fixed-size char array at compile time, so there is no need to use
__builtin_object_size() (which is useful for when an argument is
either a pointer or unknown). Instead use ARRAY_SIZE(), which has the
benefit of working around a bug in Clang (fixed[1] in 15+) that got
__builtin_object_size() wrong sometimes.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501310832.kiAeOt2z-lkp@intel.com/
Suggested-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: d8e0a6d5e9 [1]
Tested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-06 18:48:04 -08:00
Kees Cook
20e5cc26e5 compiler.h: Introduce __must_be_byte_array()
In preparation for adding stricter type checking to the str/mem*()
helpers, provide a way to check that a variable is a byte array
via __must_be_byte_array().

Suggested-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-06 18:48:02 -08:00
Kees Cook
cb7380de9e compiler.h: Move C string helpers into C-only kernel section
The C kernel helpers for evaluating C Strings were positioned where they
were visible to assembly inclusion, which was not intended. Move them
into the kernel and C-only area of the header so future changes won't
confuse the assembler.

Fixes: d7a516c6ee ("compiler.h: Fix undefined BUILD_BUG_ON_ZERO()")
Fixes: 559048d156 ("string: Check for "nonstring" attribute on strscpy() arguments")
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-06 18:48:00 -08:00
Eric Dumazet
482ad2a4ac net: add dev_net_rcu() helper
dev->nd_net can change, readers should either
use rcu_read_lock() or RTNL.

We currently use a generic helper, dev_net() with
no debugging support. We probably have many hidden bugs.

Add dev_net_rcu() helper for callers using rcu_read_lock()
protection.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250205155120.1676781-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06 16:14:14 -08:00
Linus Torvalds
3cf0a98fea Merge tag 'net-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Interestingly the recent kmemleak improvements allowed our CI to catch
  a couple of percpu leaks addressed here.

  We (mostly Jakub, to be accurate) are working to increase review
  coverage over the net code-base tweaking the MAINTAINER entries.

  Current release - regressions:

   - core: harmonize tstats and dstats

   - ipv6: fix dst refleaks in rpl, seg6 and ioam6 lwtunnels

   - eth: tun: revert fix group permission check

   - eth: stmmac: revert "specify hardware capability value when FIFO
     size isn't specified"

  Previous releases - regressions:

   - udp: gso: do not drop small packets when PMTU reduces

   - rxrpc: fix race in call state changing vs recvmsg()

   - eth: ice: fix Rx data path for heavy 9k MTU traffic

   - eth: vmxnet3: fix tx queue race condition with XDP

  Previous releases - always broken:

   - sched: pfifo_tail_enqueue: drop new packet when sch->limit == 0

   - ethtool: ntuple: fix rss + ring_cookie check

   - rxrpc: fix the rxrpc_connection attend queue handling

  Misc:

   - recognize Kuniyuki Iwashima as a maintainer"

* tag 'net-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
  Revert "net: stmmac: Specify hardware capability value when FIFO size isn't specified"
  MAINTAINERS: add a sample ethtool section entry
  MAINTAINERS: add entry for ethtool
  rxrpc: Fix race in call state changing vs recvmsg()
  rxrpc: Fix call state set to not include the SERVER_SECURING state
  net: sched: Fix truncation of offloaded action statistics
  tun: revert fix group permission check
  selftests/tc-testing: Add a test case for qdisc_tree_reduce_backlog()
  netem: Update sch->q.qlen before qdisc_tree_reduce_backlog()
  selftests/tc-testing: Add a test case for pfifo_head_drop qdisc when limit==0
  pfifo_tail_enqueue: Drop new packet when sch->limit == 0
  selftests: mptcp: connect: -f: no reconnect
  net: rose: lock the socket in rose_bind()
  net: atlantic: fix warning during hot unplug
  rxrpc: Fix the rxrpc_connection attend queue handling
  net: harmonize tstats and dstats
  selftests: drv-net: rss_ctx: don't fail reconfigure test if queue offset not supported
  selftests: drv-net: rss_ctx: add missing cleanup in queue reconfigure
  ethtool: ntuple: fix rss + ring_cookie check
  ethtool: rss: fix hiding unsupported fields in dumps
  ...
2025-02-06 09:14:54 -08:00
Masahiro Yamada
4c56eb33e6 kbuild: keep symbols for symbol_get() even with CONFIG_TRIM_UNUSED_KSYMS
Linus observed that the symbol_request(utf8_data_table) call fails when
CONFIG_UNICODE=y and CONFIG_TRIM_UNUSED_KSYMS=y.

symbol_get() relies on the symbol data being present in the ksymtab for
symbol lookups. However, EXPORT_SYMBOL_GPL(utf8_data_table) is dropped
due to CONFIG_TRIM_UNUSED_KSYMS, as no module references it in this case.

Probably, this has been broken since commit dbacb0ef67 ("kconfig option
for TRIM_UNUSED_KSYMS").

This commit addresses the issue by leveraging modpost. Symbol names
passed to symbol_get() are recorded in the special .no_trim_symbol
section, which is then parsed by modpost to forcibly keep such symbols.
The .no_trim_symbol section is discarded by the linker scripts, so there
is no impact on the size of the final vmlinux or modules.

This commit cannot resolve the issue for direct calls to __symbol_get()
because the symbol name is not known at compile-time.

Although symbol_get() may eventually be deprecated, this workaround
should be good enough meanwhile.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-02-06 01:08:58 +09:00
Wolfram Sang
3bfa08fe9e Revert "i2c: Replace list-based mechanism for handling auto-detected clients"
This reverts commit 56a50667cb. Mux
handling is not sufficiently implemented. It needs more time.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2025-02-05 14:22:12 +01:00
Wolfram Sang
c4d3dfd8cc Revert "i2c: Replace list-based mechanism for handling userspace-created clients"
This reverts commit 3cfe39b3a8. Mux
handling is not sufficiently implemented. It needs more time.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2025-02-05 14:21:36 +01:00
Paolo Bonzini
6f61269495 KVM: remove kvm_arch_post_init_vm
The only statement in a kvm_arch_post_init_vm implementation
can be moved into the x86 kvm_arch_init_vm.  Do so and remove all
traces from architecture-independent code.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-04 11:27:45 -05:00
Ard Biesheuvel
bbc4578537 efi: Use BIT_ULL() constants for memory attributes
For legibility, use the existing BIT_ULL() to generate the u64 type EFI
memory attribute macros.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-02-04 14:08:22 +01:00
Ard Biesheuvel
ba69e0750b efi: Avoid cold plugged memory for placing the kernel
UEFI 2.11 introduced EFI_MEMORY_HOT_PLUGGABLE to annotate system memory
regions that are 'cold plugged' at boot, i.e., hot pluggable memory that
is available from early boot, and described as system RAM by the
firmware.

Existing loaders and EFI applications running in the boot context will
happily use this memory for allocating data structures that cannot be
freed or moved at runtime, and this prevents the memory from being
unplugged. Going forward, the new EFI_MEMORY_HOT_PLUGGABLE attribute
should be tested, and memory annotated as such should be avoided for
such allocations.

In the EFI stub, there are a couple of occurrences where, instead of the
high-level AllocatePages() UEFI boot service, a low-level code sequence
is used that traverses the EFI memory map and carves out the requested
number of pages from a free region. This is needed, e.g., for allocating
as low as possible, or for allocating pages at random.

While AllocatePages() should presumably avoid special purpose memory and
cold plugged regions, this manual approach needs to incorporate this
logic itself, in order to prevent the kernel itself from ending up in a
hot unpluggable region, preventing it from being unplugged.

So add the EFI_MEMORY_HOTPLUGGABLE macro definition, and check for it
where appropriate.

Cc: stable@vger.kernel.org
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-02-04 14:08:22 +01:00
Paolo Abeni
d3ed6dee73 net: harmonize tstats and dstats
After the blamed commits below, some UDP tunnel use dstats for
accounting. On the xmit path, all the UDP-base tunnels ends up
using iptunnel_xmit_stats() for stats accounting, and the latter
assumes the relevant (tunnel) network device uses tstats.

The end result is some 'funny' stat report for the mentioned UDP
tunnel, e.g. when no packet is actually dropped and a bunch of
packets are transmitted:

gnv2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue \
		state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ee:7d:09:87:90:ea brd ff:ff:ff:ff:ff:ff
    RX:  bytes packets errors dropped  missed   mcast
         14916      23      0      15       0       0
    TX:  bytes packets errors dropped carrier collsns
             0    1566      0       0       0       0

Address the issue ensuring the same binary layout for the overlapping
fields of dstats and tstats. While this solution is a bit hackish, is
smaller and with no performance pitfall compared to other alternatives
i.e. supporting both dstat and tstat in iptunnel_xmit_stats() or
reverting the blamed commit.

With time we should possibly move all the IP-based tunnel (and virtual
devices) to dstats.

Fixes: c77200c074 ("bareudp: Handle stats using NETDEV_PCPU_STAT_DSTATS.")
Fixes: 6fa6de3022 ("geneve: Handle stats using NETDEV_PCPU_STAT_DSTATS.")
Fixes: be226352e8 ("vxlan: Handle stats using NETDEV_PCPU_STAT_DSTATS.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/2e1c444cf0f63ae472baff29862c4c869be17031.1738432804.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-03 18:39:59 -08:00
Linus Torvalds
f286757b64 Merge tag 'timers-urgent-2025-02-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:

 - Properly cast the input to secs_to_jiffies() to unsigned long as
   otherwise the result uses the data type of the input variable, which
   causes result range checks to fail if the input data type is signed
   and smaller than unsigned long.

 - Handle late armed hrtimers gracefully on CPU hotplug

   There are legitimate cases where a hrtimer is (re)armed on an
   outgoing CPU after the timers have been migrated away. This triggers
   warnings and caused people to implement horrible workarounds in RCU.
   But those workarounds are incomplete and do not cover e.g. the
   scheduler hrtimers.

   Stop this by force moving timer which are enqueued on the current CPU
   after timer migration to be queued on a remote online CPU.

   This allows to undo the workarounds in a seperate step.

 - Demote a warning level printk() to info level in the clocksource
   watchdog code as there is no point to emit a warning level message
   for a purely informational message.

 - Mark a helper function __always_inline and move it into the existing
   #ifdef block to avoid 'unused function' warnings from CLANG

* tag 'timers-urgent-2025-02-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  jiffies: Cast to unsigned long in secs_to_jiffies() conversion
  clocksource: Use pr_info() for "Checking clocksource synchronization" message
  hrtimers: Force migrate away hrtimers queued after CPUHP_AP_HRTIMERS_DYING
  hrtimers: Mark is_migration_base() with __always_inline
2025-02-03 09:10:56 -08:00