Commit Graph

1368207 Commits

Author SHA1 Message Date
Kent Overstreet
ff6369da9a bcachefs: reduce stack usage in alloc_sectors_start()
with typical config options, variables in different inline functions
aren't sharing stack space - and these are slowpaths.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:13 -04:00
Kent Overstreet
eabef52ff8 bcachefs: bch2_alloc_v4_to_text()
Specialize the .to_text() for alloc_v4, to avoid the temporary on the
stack for conversion from old versions.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:13 -04:00
Kent Overstreet
0c34e7ff69 bcachefs: Tweak bch2_data_update_init() for stack usage
- Separate out a slowpath for bkey_nocow_lock()
- Don't call bch2_bkey_ptrs_c() or loop over pointers more than
  necessary

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:13 -04:00
Kent Overstreet
56e5c7f65f bcachefs: kill replicas_sectors arg to __trigger_extent()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:13 -04:00
Kent Overstreet
92caf17189 bcachefs: Don't stack allocate bch_writepage_state
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
cd831a9494 bcachefs: factor out break_cycle_fail()
More stack usage work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
19c0a8aa8a bcachefs: btree_node_missing_err()
Factor out an error path for a small stack usage improvement.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
0d25264ecf bcachefs: Kill bkey_buf in btree_path_down()
Allocate some (smaller) temporary storage in btree_trans for this -
btree_path_down() is in our max-stack call stack.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
99813d88e3 bcachefs: Add missing error logging in delete_dead_inodes()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
f54b2a80d0 bcachefs: Fix misaligned bucket check in journal space calculations
Fix an assertion pop in the tiering_misaligned test: rounding down to
bucket size at the end of the journal space calculations leaves
cur_entry_sectors == 0, which is incorrect with !cur_entry_err.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
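The failure mode described in f54b2a80d0 can be illustrated with a minimal sketch of how rounding down to a bucket boundary yields zero. The helper name `round_down_sectors` and the numbers used with it are hypothetical and are not taken from the bcachefs source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round 'sectors' down to a multiple of 'bucket_size'. */
static uint64_t round_down_sectors(uint64_t sectors, uint64_t bucket_size)
{
    assert(bucket_size != 0);
    return sectors - (sectors % bucket_size);
}
```

With a 128-sector bucket, 100 available sectors round down to 0 even though no error was recorded, which is the inconsistent state the commit message describes.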
Kent Overstreet
813825d241 bcachefs: Fix incorrect multiple dev check in journal write path
It's uncommon to have multiple devices with journalling only on a subset,
but can be specified with the 'data_allowed' option. We need to know if
we're doing data/metadata writes to multiple devices, as that requires
issuing flushes before the journal writes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
327971cef5 bcachefs: Catch data_update_done events in trace_io_move_start_fail
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
c7897b5055 bcachefs: io_move_evacuate_bucket tracepoint, counter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
060ff4b794 bcachefs: trace_io_move_pred
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Kent Overstreet
d6efd42a84 bcachefs: Fix infinite loop in journal_entry_btree_keys_to_text()
Fix an infinite loop when bkey_i->k.u64s is 0.

This only happens in userspace, where 'bcachefs list_journal' can print
the entire contents of the journal, and non-dirty entries aren't
validated.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
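The class of bug fixed in d6efd42a84 can be sketched in plain C, assuming a simplified layout in which each record begins with a word whose low byte is the record length in 64-bit words. The function `count_records` and this layout are illustrative assumptions, not the bcachefs key format or the actual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the records packed into buf[0..nr_u64s). Each record starts with a
 * header word whose low byte is its length in u64s (hypothetical layout).
 * A zero length would never advance 'pos', so the walk stops there instead
 * of spinning; it also stops at a record that would overrun the buffer.
 */
static size_t count_records(const uint64_t *buf, size_t nr_u64s)
{
    size_t pos = 0, n = 0;

    while (pos < nr_u64s) {
        size_t len = buf[pos] & 0xff;

        if (len == 0 || len > nr_u64s - pos)
            break; /* unvalidated input: bail out rather than loop forever */

        pos += len;
        n++;
    }
    return n;
}
```

Without the `len == 0` check, a zero-length record would leave `pos` unchanged and the loop would never terminate, which matters when the input has not been validated beforehand.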
Kent Overstreet
cd04497b10 bcachefs: Journal read error message improvements
- Don't print a checksum error when we first read a journal entry: we
  print a checksum error later if we'll be using the journal entry.

- Continuing with the theme of improving error messages and grouping
  errors into a single log message per error, print a single 'checksum
  error' message per journal entry, and use bch2_journal_ptr_to_text()
  to print out where on the device it was.

- Factor out checksum error messages and checking for missing journal
  entries into helpers; bch2_journal_read() has gotten obnoxiously big.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30 01:21:12 -04:00
Linus Torvalds
f66bc387ef Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
 "Updates to the usual drivers (smartpqi, ufs, lpfc, scsi_debug, target,
  hisi_sas) with the only substantive core change being the removal of
  the stream_status member from the scsi_stream_status_header (to get
  rid of flex array members)"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (77 commits)
  scsi: target: core: Constify struct target_opcode_descriptor
  scsi: target: core: Constify enabled() in struct target_opcode_descriptor
  scsi: hisi_sas: Fix warning detected by sparse
  scsi: mpt3sas: Fix _ctl_get_mpt_mctp_passthru_adapter() to return IOC pointer
  scsi: sg: Remove unnecessary NULL check before unregister_sysctl_table()
  scsi: ufs: mcq: Delete ufshcd_release_scsi_cmd() in ufshcd_mcq_abort()
  scsi: ufs: qcom: dt-bindings: Document the SM8750 UFS Controller
  scsi: mvsas: Fix typos in SAS/SATA VSP register comments
  scsi: fnic: Replace memset() with eth_zero_addr()
  scsi: ufs: core: Support updating device command timeout
  scsi: ufs: core: Change hwq_id type and value
  scsi: ufs: core: Increase the UIC command timeout further
  scsi: zfcp: Simplify workqueue allocation
  scsi: ufs: core: Print error value as hex format in ufshcd_err_handler()
  scsi: sd: Remove the stream_status member from scsi_stream_status_header
  scsi: docs: Clean up some style in scsi_mid_low_api
  scsi: core: Remove unused scsi_dev_info_list_del_keyed()
  scsi: isci: Remove unused sci_remote_device_reset()
  scsi: scsi_debug: Reduce DEF_ATOMIC_WR_MAX_LENGTH
  scsi: smartpqi: Delete a stray tab in pqi_is_parity_write_stream()
  ...
2025-05-29 22:17:52 -07:00
Christian Brauner
21fae34a27 Merge patch series "rust: file: mark LocalFile as repr(transparent)"
Mark files as repr(transparent) to ensure identical layout between C and Rust.

* patches from https://lore.kernel.org/20250527204636.12573-1-pekkarr@protonmail.com:
  rust: file: improve safety comments
  rust: file: mark `LocalFile` as `repr(transparent)`

Link: https://lore.kernel.org/20250527204636.12573-1-pekkarr@protonmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-30 07:12:08 +02:00
Pekka Ristola
946026ba42 rust: file: improve safety comments
Some of the safety comments in `LocalFile`'s methods incorrectly refer to
the `File` type instead of `LocalFile`, so fix them to use the correct
type.

Also add missing Markdown code spans around lifetimes in the safety
comments, i.e. change 'a to `'a`.

Link: https://github.com/Rust-for-Linux/linux/issues/1165
Signed-off-by: Pekka Ristola <pekkarr@protonmail.com>
Link: https://lore.kernel.org/20250527204636.12573-2-pekkarr@protonmail.com
Reviewed-by: Benno Lossin <lossin@kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-30 07:12:05 +02:00
Pekka Ristola
15ecd83dc0 rust: file: mark LocalFile as repr(transparent)
Unsafe code in `LocalFile`'s methods assumes that the type has the same
layout as the inner `bindings::file`. This is not guaranteed by the default
struct representation in Rust, but requires specifying the `transparent`
representation.

The `File` struct (which also wraps `bindings::file`) is already marked as
`repr(transparent)`, so this change makes their layouts equivalent.

Fixes: 851849824b ("rust: file: add Rust abstraction for `struct file`")
Closes: https://github.com/Rust-for-Linux/linux/issues/1165
Signed-off-by: Pekka Ristola <pekkarr@protonmail.com>
Link: https://lore.kernel.org/20250527204636.12573-1-pekkarr@protonmail.com
Reviewed-by: Benno Lossin <lossin@kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-30 07:12:05 +02:00
Alistair Popple
dd59137bfe fs/dax: Fix "don't skip locked entries when scanning entries"
Commit 6be3e21d25 ("fs/dax: don't skip locked entries when scanning
entries") introduced a new function, wait_entry_unlocked_exclusive(),
which waits for the current entry to become unlocked without advancing
the XArray iterator state.

Waiting for the entry to become unlocked requires dropping the XArray
lock. This requires calling xas_pause() prior to dropping the lock
which leaves the xas in a suitable state for the next iteration. However
this has the side-effect of advancing the xas state to the next index.
Normally this isn't an issue because xas_for_each() contains code to
detect this state and thus avoid advancing the index a second time on
the next loop iteration.

However, both wait_entry_unlocked_exclusive() and its callers
subsequently use the xas state to reload the entry. As xas_pause()
updated the state to the next index, this causes the current entry
that is being waited on to be skipped. This caused the following
warning to fire intermittently when running xfstests generic/068 on an XFS
filesystem with FS DAX enabled:

[   35.067397] ------------[ cut here ]------------
[   35.068229] WARNING: CPU: 21 PID: 1640 at mm/truncate.c:89 truncate_folio_batch_exceptionals+0xd8/0x1e0
[   35.069717] Modules linked in: nd_pmem dax_pmem nd_btt nd_e820 libnvdimm
[   35.071006] CPU: 21 UID: 0 PID: 1640 Comm: fstest Not tainted 6.15.0-rc7+ #77 PREEMPT(voluntary)
[   35.072613] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/204
[   35.074845] RIP: 0010:truncate_folio_batch_exceptionals+0xd8/0x1e0
[   35.075962] Code: a1 00 00 00 f6 47 0d 20 0f 84 97 00 00 00 4c 63 e8 41 39 c4 7f 0b eb 61 49 83 c5 01 45 39 ec 7e 58 42 f68
[   35.079522] RSP: 0018:ffffb04e426c7850 EFLAGS: 00010202
[   35.080359] RAX: 0000000000000000 RBX: ffff9d21e3481908 RCX: ffffb04e426c77f4
[   35.081477] RDX: ffffb04e426c79e8 RSI: ffffb04e426c79e0 RDI: ffff9d21e34816e8
[   35.082590] RBP: ffffb04e426c79e0 R08: 0000000000000001 R09: 0000000000000003
[   35.083733] R10: 0000000000000000 R11: 822b53c0f7a49868 R12: 000000000000001f
[   35.084850] R13: 0000000000000000 R14: ffffb04e426c78e8 R15: fffffffffffffffe
[   35.085953] FS:  00007f9134c87740(0000) GS:ffff9d22abba0000(0000) knlGS:0000000000000000
[   35.087346] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   35.088244] CR2: 00007f9134c86000 CR3: 000000040afff000 CR4: 00000000000006f0
[   35.089354] Call Trace:
[   35.089749]  <TASK>
[   35.090168]  truncate_inode_pages_range+0xfc/0x4d0
[   35.091078]  truncate_pagecache+0x47/0x60
[   35.091735]  xfs_setattr_size+0xc7/0x3e0
[   35.092648]  xfs_vn_setattr+0x1ea/0x270
[   35.093437]  notify_change+0x1f4/0x510
[   35.094219]  ? do_truncate+0x97/0xe0
[   35.094879]  do_truncate+0x97/0xe0
[   35.095640]  path_openat+0xabd/0xca0
[   35.096278]  do_filp_open+0xd7/0x190
[   35.096860]  do_sys_openat2+0x8a/0xe0
[   35.097459]  __x64_sys_openat+0x6d/0xa0
[   35.098076]  do_syscall_64+0xbb/0x1d0
[   35.098647]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   35.099444] RIP: 0033:0x7f9134d81fc1
[   35.100033] Code: 75 57 89 f0 25 00 00 41 00 3d 00 00 41 00 74 49 80 3d 2a 26 0e 00 00 74 6d 89 da 48 89 ee bf 9c ff ff ff5
[   35.102993] RSP: 002b:00007ffcd41e0d10 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
[   35.104263] RAX: ffffffffffffffda RBX: 0000000000000242 RCX: 00007f9134d81fc1
[   35.105452] RDX: 0000000000000242 RSI: 00007ffcd41e1200 RDI: 00000000ffffff9c
[   35.106663] RBP: 00007ffcd41e1200 R08: 0000000000000000 R09: 0000000000000064
[   35.107923] R10: 00000000000001a4 R11: 0000000000000202 R12: 0000000000000066
[   35.109112] R13: 0000000000100000 R14: 0000000000100000 R15: 0000000000000400
[   35.110357]  </TASK>
[   35.110769] irq event stamp: 8415587
[   35.111486] hardirqs last  enabled at (8415599): [<ffffffff8d74b562>] __up_console_sem+0x52/0x60
[   35.113067] hardirqs last disabled at (8415610): [<ffffffff8d74b547>] __up_console_sem+0x37/0x60
[   35.114575] softirqs last  enabled at (8415300): [<ffffffff8d6ac625>] handle_softirqs+0x315/0x3f0
[   35.115933] softirqs last disabled at (8415291): [<ffffffff8d6ac811>] __irq_exit_rcu+0xa1/0xc0
[   35.117316] ---[ end trace 0000000000000000 ]---

Fix this by using xas_reset() instead, which is equivalent in
implementation to xas_pause() but does not advance the XArray state.

Fixes: 6be3e21d25 ("fs/dax: don't skip locked entries when scanning entries")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Link: https://lore.kernel.org/20250523043749.1460780-1-apopple@nvidia.com
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-30 07:09:52 +02:00
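The difference that dd59137bfe relies on can be modelled with a toy cursor. This is only a sketch of the behaviour as the commit message describes it: `struct toy_cursor`, `toy_pause()`, `toy_reset()` and `toy_reload()` are hypothetical names, and the real XArray state is considerably more involved (for instance, this model always advances by exactly one slot).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy cursor; NOT the kernel's struct xa_state. */
struct toy_cursor {
    size_t index;  /* slot the cursor refers to */
    bool restart;  /* next walk must start again from the top */
};

/* Modelled on the described xas_pause() behaviour: restart and move on. */
static void toy_pause(struct toy_cursor *c)
{
    c->restart = true;
    c->index++;
}

/* Modelled on the described xas_reset() behaviour: restart, same index. */
static void toy_reset(struct toy_cursor *c)
{
    c->restart = true;
}

/* Re-read whatever slot the cursor currently points at. */
static int toy_reload(struct toy_cursor *c, const int *slots)
{
    c->restart = false;
    return slots[c->index];
}
```

Reloading after `toy_pause()` returns the following slot, so the entry that was being waited on is skipped; reloading after `toy_reset()` returns the same slot again.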
Linus Torvalds
3536049822 Merge tag 'vfio-v6.16-rc1' of https://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:

 - Remove an outdated DMA unmap optimization that relies on a feature
   only implemented in AMDv1 page tables. (Jason Gunthorpe)

 - Fix various migration issues in the hisi_acc_vfio_pci variant driver,
   including use of a wrong DMA address requiring an update to the
   migration data structure, resending task completion interrupt after
   migration to re-sync queues, fixing a write-back cache sequencing
   issue, fixing a driver unload issue, behaving correctly when the
   guest driver is not loaded, and avoiding squashing errors from
   sub-functions. (Longfang Liu)

 - mlx5-vfio-pci variant driver update to make use of the new two-step
   DMA API for migration, using a page array directly rather than using
   a page list mapped across a scatter list. (Leon Romanovsky)

 - Fix an incorrect loop index used when unwinding allocation of dirty
   page bitmaps on error, resulting in temporary failure in freeing
   unused bitmaps. (Li RongQing)

* tag 'vfio-v6.16-rc1' of https://github.com/awilliam/linux-vfio:
  vfio/type1: Fix error unwind in migration dirty bitmap allocation
  vfio/mlx5: Enable the DMA link API
  vfio/mlx5: Rewrite create mkey flow to allow better code reuse
  vfio/mlx5: Explicitly use number of pages instead of allocated length
  hisi_acc_vfio_pci: update function return values.
  hisi_acc_vfio_pci: bugfix live migration function without VF device driver
  hisi_acc_vfio_pci: bugfix the problem of uninstalling driver
  hisi_acc_vfio_pci: bugfix cache write-back issue
  hisi_acc_vfio_pci: add eq and aeq interruption restore
  hisi_acc_vfio_pci: fix XQE dma address error
  vfio/type1: Remove Fine Grained Superpages detection
2025-05-29 22:09:08 -07:00
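The "incorrect loop index when unwinding" item in the VFIO pull above refers to a common error-path pattern. The following is a generic, array-based sketch of a correct unwind, not the vfio/type1 code itself; `alloc_bitmaps` and its `fail_at` parameter (used to simulate an allocation failure) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Allocate one bitmap per region. If an allocation fails (simulated at
 * index 'fail_at'), free only the bitmaps that were actually allocated,
 * i.e. indices [0, i), and leave every slot NULL.
 * Returns 0 on success, -1 on failure.
 */
static int alloc_bitmaps(unsigned char **bitmaps, size_t nr, size_t bytes,
                         size_t fail_at)
{
    size_t i;

    for (i = 0; i < nr; i++) {
        bitmaps[i] = (i == fail_at) ? NULL : calloc(1, bytes);
        if (!bitmaps[i])
            goto unwind;
    }
    return 0;

unwind:
    /* Walk back using the same index that tracked allocation progress. */
    while (i-- > 0) {
        free(bitmaps[i]);
        bitmaps[i] = NULL;
    }
    return -1;
}
```

Unwinding with a different index than the one that tracked progress either skips allocated entries or touches entries that were never set up, which is the kind of mistake the fix addresses.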
Matthew Brost
1a524e8b48 drm/xe: Do not warn on SVM migration failing because of 64k requirements
On platforms which only support 64k VRAM pages, it is expected that 4k
faults will not migrate. Do not warn on this, rather print a debug
message.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250529164338.1745515-1-matthew.brost@intel.com
2025-05-29 21:52:15 -07:00
Linus Torvalds
02897f5e56 Merge tag 'for-linus-6.16-1' of https://github.com/cminyard/linux-ipmi
Pull IPMI updates from Corey Minyard:
 "Restructure the IPMI driver.

  This is a restructure of the IPMI driver, mostly to remove SRCU. The
  locking had issues, and they were not going to be straightforward to
  fix. Plus it used tons of memory and was generally a pain.

  Most of this moves handling of messages out of bh and interrupt
  context and runs it in thread context. Then getting rid of SRCU is
  easy.

  This also has a minor cleanup to remove a warning on newer GCCs and to
  fix some documentation"

* tag 'for-linus-6.16-1' of https://github.com/cminyard/linux-ipmi: (26 commits)
  docs: ipmi: fix spelling and grammar mistakes
  ipmi:msghandler: Fix potential memory corruption in ipmi_create_user()
  ipmi:watchdog: Use the new interface for panic messages
  ipmi:msghandler: Export and fix panic messaging capability
  Documentation:ipmi: Remove comments about interrupt level
  ipmi:ssif: Fix a shutdown race
  ipmi:msghandler: Don't deliver messages to deleted users
  ipmi:si: Rework startup of IPMI devices
  ipmi:msghandler: Add a error return from unhandle LAN cmds
  ipmi:msghandler: Shut down lower layer first at unregister
  ipmi:msghandler: Remove proc_fs.h
  ipmi:msghandler: Don't check for shutdown when returning responses
  ipmi:msghandler: Don't acquire a user refcount for queued messages
  ipmi:msghandler: Fix locking around users and interfaces
  ipmi:msghandler: Remove some user level processing in panic mode
  ipmi: Add a note about the pretimeout callback
  ipmi:watchdog: Change lock to mutex
  ipmi:msghandler: Remove srcu for the ipmi_interfaces list
  ipmi:msghandler: Remove srcu from the ipmi user structure
  ipmi:msghandler: Use the system_wq, not system_bh_wq
  ...
2025-05-29 21:37:11 -07:00
Linus Torvalds
ae5ec8adb8 Merge tag 'tsm-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm
Pull trusted security manager (TSM) updates from Dan Williams:

 - Add a general sysfs scheme for publishing "Measurement" values
   provided by the architecture's TEE Security Manager. Use it to
   publish TDX "Runtime Measurement Registers" ("RTMRs") that either
   maintain a hash of stored values (similar to a TPM PCR) or provide
   statically provisioned data. These measurements are validated by a
   relying party.

 - Reorganize the drivers/virt/coco/ directory for "host" and "guest"
   shared infrastructure.

 - Fix a configfs-tsm-report unregister bug

 - With CONFIG_TSM_MEASUREMENTS joining CONFIG_TSM_REPORTS and in
   anticipation of more shared "TSM" infrastructure arriving, rename the
   maintainer entry to "TRUSTED SECURITY MODULE (TSM) INFRASTRUCTURE".

* tag 'tsm-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm:
  tsm-mr: Fix init breakage after bin_attrs constification by scoping non-const pointers to init phase
  sample/tsm-mr: Fix missing static for sample_report
  virt: tdx-guest: Transition to scoped_cond_guard for mutex operations
  virt: tdx-guest: Refactor and streamline TDREPORT generation
  virt: tdx-guest: Expose TDX MRs as sysfs attributes
  x86/tdx: tdx_mcall_get_report0: Return -EBUSY on TDCALL_OPERAND_BUSY error
  x86/tdx: Add tdx_mcall_extend_rtmr() interface
  tsm-mr: Add tsm-mr sample code
  tsm-mr: Add TVM Measurement Register support
  configfs-tsm-report: Fix NULL dereference of tsm_ops
  coco/guest: Move shared guest CC infrastructure to drivers/virt/coco/guest/
  configfs-tsm: Namespace TSM report symbols
2025-05-29 21:21:11 -07:00
Linus Torvalds
bbd9c366bf Merge tag 'x86_sgx_for_6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull Intel software guard extension (SGX) updates from Dave Hansen:
 "A couple of x86/sgx changes.

  The first one is a no-brainer to use the (simple) SHA-256 library.

  For the second one, some folks doing testing noticed that SGX systems
  under memory pressure were inducing fatal machine checks at pretty
  unnerving rates, despite the SGX code having _some_ awareness of
  memory poison.

  It turns out that the SGX reclaim path was not checking for poison
  _and_ it always accesses memory to copy it around. Make sure that
  poisoned pages are not reclaimed"

* tag 'x86_sgx_for_6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sgx: Prevent attempts to reclaim poisoned pages
  x86/sgx: Use SHA-256 library API instead of crypto_shash API
2025-05-29 21:13:17 -07:00
Linus Torvalds
b78f1293f9 Merge tag 'trace-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:

 - Have module addresses get updated in the persistent ring buffer

   The addresses of the modules from the previous boot are saved in the
   persistent ring buffer. If the same modules are loaded and an address
   in the old buffer points to an address that was both saved in the
   persistent ring buffer and is loaded in memory, shift the address to
   point to the address that is loaded in memory in the trace event.

 - Print function names for irqs off and preempt off callsites

   When ignoring the print fmt of a trace event and just printing the
   fields directly, have the fields for preempt off and irqs off events
   still show the function name (via kallsyms) instead of just showing
   the raw address.

 - Clean ups of the histogram code

   The histogram functions saved over 800 bytes on the stack to process
   events as they come in. Instead, create per-cpu buffers that can hold
   this information and have a separate location for each context level
   (thread, softirq, IRQ and NMI).

   Also add some more comments to the code.

 - Add "common_comm" field for histograms

   Add "common_comm" that uses the current->comm as a field in an event
   histogram and acts like any of the other fields of the event.

 - Show "subops" in the enabled_functions file

   When the function graph infrastructure is used, a subsystem has a
   "subops" that it attaches its callback function to. Instead of the
   enabled_functions just showing a function calling the function that
   calls the subops functions, also show the subops functions that will
   get called for that function too.

 - Add "copy_trace_marker" option to instances

   There are cases where an instance is created for tooling to write
   into, but the old tooling has the top level instance hardcoded into
   the application. New tools want to consume the data from an instance
   and not the top level buffer. By adding a copy_trace_marker option,
   whenever the top instance trace_marker is written into, a copy of it
   is also written into the instance with this option set. This allows
   new tools to read what old tools are writing into the top buffer.

   If this option is cleared by the top instance, then what is written
   into the trace_marker is not written into the top instance. This is a
   way to redirect the trace_marker writes into another instance.

 - Have tracepoints created by DECLARE_TRACE() use trace_<name>_tp()

   If a tracepoint is created by DECLARE_TRACE() instead of
   TRACE_EVENT(), then it will not be exposed via tracefs. Currently
   there's no way to differentiate in the kernel the tracepoint
   functions between those that are exposed via tracefs or not. A
   calling convention has been made manually to append a "_tp" suffix
   for events created by DECLARE_TRACE(). Instead of doing this
   manually, force it so that all DECLARE_TRACE() events have this
   notation.

 - Use __string() for task->comm in some sched events

   Instead of hardcoding the comm to be TASK_COMM_LEN in some of the
   scheduler events use __string() which makes it dynamic. Note, if
   these events are parsed by user space they may break, and the
   event may have to be converted back to the hardcoded size.

 - Have function graph "depth" be unsigned to the user

   Internally to the kernel, the "depth" field of the function graph
   event is signed due to -1 being used for end of boundary. What
   actually gets recorded in the event itself is zero or positive.
   Reflect this to user space by showing "depth" as unsigned int and be
   consistent across all events.

 - Allow an arbitrarily long CPU string to osnoise_cpus_write()

   The filtering of which CPUs to write to can exceed 256 bytes. If a
   machine has 256 CPUs, and the filter is to filter every other CPU,
   the write would take a string larger than 256 bytes. Instead of using
   a fixed size buffer on the stack that is 256 bytes, allocate it to
   handle what is passed in.

 - Stop having ftrace check the per-cpu data "disabled" flag

   The "disabled" flag in the data structure passed to most ftrace
   functions is checked to know if tracing has been disabled or not.
   This flag was added back in 2008 before the ring buffer had its own
   way to disable tracing. The "disabled" flag is now not always set when
   needed, and the ring buffer flag should be used in all locations
   where the "disabled" flag is needed. Since the "disabled" flag is
   redundant and incorrect, stop using it. Fix up some locations that use
   the "disabled" flag to use the ring buffer info.

 - Use a new tracer_tracing_disable/enable() instead of data->disabled
   flag

   There are a few cases that set the data->disabled flag to stop tracing,
   but this flag is not consistently used. It is also an on/off switch
   where if a function sets it and calls another function that sets it,
   the called function may incorrectly re-enable tracing.

   Use a new tracer_tracing_disable() and tracer_tracing_enable() that
   use a counter and can be nested. These use the ring buffer flags,
   which are always checked, making the disabling more consistent.

 - Save the trace clock in the persistent ring buffer

   Save what clock was used for tracing in the persistent ring buffer
   and set it back to that clock after a reboot.

 - Remove unused reference to a per CPU data pointer in mmiotrace
   functions

 - Remove unused buffer_page field from trace_array_cpu structure

 - Remove more strncpy() instances

 - Other minor clean ups and fixes
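The osnoise_cpus_write() item above argues that a CPU filter string can exceed 256 bytes. As a rough check of that arithmetic, the sketch below builds an "every other CPU" list, assuming a plain comma-separated format ("0,2,4,..."); the function `every_other_cpu` is hypothetical and is not the kernel code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build a comma-separated list of every other CPU below nr_cpus
 * ("0,2,4,...") in a heap buffer sized from the input rather than a
 * fixed 256-byte array. Returns a malloc()ed string the caller must
 * free, or NULL on allocation failure.
 */
static char *every_other_cpu(unsigned int nr_cpus)
{
    /* Up to 10 digits per unsigned int plus a separator, plus the NUL. */
    size_t cap = (size_t)nr_cpus * 11 + 1;
    char *buf = malloc(cap);
    size_t len = 0;

    if (!buf)
        return NULL;
    buf[0] = '\0';

    for (unsigned int cpu = 0; cpu < nr_cpus; cpu += 2)
        len += (size_t)snprintf(buf + len, cap - len, "%s%u",
                                cpu ? "," : "", cpu);
    return buf;
}
```

For 256 CPUs this list is 456 characters long (329 digits plus 127 commas), so it would not fit in a 256-byte stack buffer.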

* tag 'trace-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (36 commits)
  tracing: Fix compilation warning on arm32
  tracing: Record trace_clock and recover when reboot
  tracing/sched: Use __string() instead of fixed lengths for task->comm
  tracepoint: Have tracepoints created with DECLARE_TRACE() have _tp suffix
  tracing: Cleanup upper_empty() in pid_list
  tracing: Allow the top level trace_marker to write into another instances
  tracing: Add a helper function to handle the dereference arg in verifier
  tracing: Remove unnecessary "goto out" that simply returns ret is trigger code
  tracing: Fix error handling in event_trigger_parse()
  tracing: Rename event_trigger_alloc() to trigger_data_alloc()
  tracing: Replace deprecated strncpy() with strscpy() for stack_trace_filter_buf
  tracing: Remove unused buffer_page field from trace_array_cpu structure
  tracing: Use atomic_inc_return() for updating "disabled" counter in irqsoff tracer
  tracing: Convert the per CPU "disabled" counter to local from atomic
  tracing: branch: Use trace_tracing_is_on_cpu() instead of "disabled" field
  ring-buffer: Add ring_buffer_record_is_on_cpu()
  tracing: Do not use per CPU array_buffer.data->disabled for cpumask
  ftrace: Do not disabled function graph based on "disabled" field
  tracing: kdb: Use tracer_tracing_on/off() instead of setting per CPU disabled
  tracing: Use tracer_tracing_disable() instead of "disabled" field for ftrace_dump_one()
  ...
2025-05-29 21:04:36 -07:00
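The tracing pull above contrasts an on/off "disabled" flag with a nestable counter. The sketch below shows why the two behave differently under nesting; `struct toy_buffer` and every function in it are hypothetical stand-ins, not the kernel's tracer_tracing_disable()/tracer_tracing_enable() implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-buffer state; not a kernel structure. */
struct toy_buffer {
    bool flag_disabled; /* plain on/off switch */
    int disable_depth;  /* nesting counter */
};

static void flag_disable(struct toy_buffer *b) { b->flag_disabled = true; }
static void flag_enable(struct toy_buffer *b)  { b->flag_disabled = false; }

static void counted_disable(struct toy_buffer *b) { b->disable_depth++; }
static void counted_enable(struct toy_buffer *b)  { b->disable_depth--; }

static bool counted_is_disabled(const struct toy_buffer *b)
{
    return b->disable_depth > 0;
}

/* Inner helpers that disable and re-enable around their own work. */
static void inner_with_flag(struct toy_buffer *b)
{
    flag_disable(b);
    flag_enable(b); /* also undoes the caller's disable */
}

static void inner_with_counter(struct toy_buffer *b)
{
    counted_disable(b);
    counted_enable(b); /* caller's disable is still in effect */
}
```

If an outer function disables and then calls the inner helper, the flag version ends up enabled while the outer function still expects tracing to be off; the counter version stays disabled until the outer function calls its own enable.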
Linus Torvalds
472c5f736b Merge tag 'trace-tools-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing tools updates from Steven Rostedt:

 - Set distinctive value for failed tests

   When running "make check" that performs tests on rtla the failure is
   checked by examining the output. Instead have the tool return an
   error status if it exceeds the threshold.

 - Define __NR_sched_setattr for LoongArch

   Define __NR_sched_setattr to allow this to build for LoongArch.

 - Define _GNU_SOURCE for timerlat_bpf.c

   Due to modifications of struct sched_attr in utils.h when _GNU_SOURCE
   is not defined, this can cause errors for timerlat_bpf_init() and
   breakage in BPF sample collection mode.

* tag 'trace-tools-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rtla: Define _GNU_SOURCE in timerlat_bpf.c
  rtla: Define __NR_sched_setattr for LoongArch
  rtla: Set distinctive exit value for failed tests
2025-05-29 20:59:52 -07:00
Anubhav Shelat
8c56bfe53b perf trace: Set errpid to false for rseq and set_robust_list
The 'rseq' and 'set_robust_list' syscalls don't return a pid, so set
errpid for both to false.

Fixes: 0c1019e346 ("perf trace: Mark the 'rseq' arg in the rseq syscall as coming from user space")
Fixes: 1de5b5dcb8 ("perf trace: Mark the 'head' arg in the set_robust_list syscall as coming from user space")
Signed-off-by: Anubhav Shelat <ashelat@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250529143334.1469669-2-ashelat@redhat.com
[ Remove explicit .errpid = false, omitting its initialization zeroes it, as noted by Namhyung ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-29 22:09:37 -03:00
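The bracketed note in 8c56bfe53b relies on a C rule: members omitted from a designated initializer are zero-initialized, so an explicit `.errpid = false` is redundant. A minimal sketch follows, using a hypothetical, trimmed-down table; `struct toy_syscall_fmt`, `toy_fmts` and `toy_find` are illustrative and do not reproduce perf's actual table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down table entry. */
struct toy_syscall_fmt {
    const char *name;
    bool errpid; /* true if the return value should be shown as a pid */
};

static const struct toy_syscall_fmt toy_fmts[] = {
    { .name = "getpid", .errpid = true, },
    { .name = "rseq", },            /* errpid omitted: zero, i.e. false */
    { .name = "set_robust_list", }, /* errpid omitted: zero, i.e. false */
};

/* Look up an entry by name; returns NULL if it is not in the table. */
static const struct toy_syscall_fmt *toy_find(const char *name)
{
    for (size_t i = 0; i < sizeof(toy_fmts) / sizeof(toy_fmts[0]); i++)
        if (strcmp(toy_fmts[i].name, name) == 0)
            return &toy_fmts[i];
    return NULL;
}
```

Looking up "rseq" or "set_robust_list" in this table yields `errpid == false` without the field ever being written out.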
Sylvan Smit
7a17bbc1d9 rust: list: Fix typo much in arc.rs
Correct the typo (s/much/must) in the ListArc documentation.

Reported-by: Miguel Ojeda <ojeda@kernel.org>
Closes: https://github.com/Rust-for-Linux/linux/issues/1166
Fixes: a48026315c ("rust: list: add tracking for ListArc")
Signed-off-by: Sylvan Smit <sylvan@sylvansmit.com>
Reviewed-by: Benno Lossin <lossin@kernel.org>
Link: https://lore.kernel.org/r/20250529162923.434978-1-sylvan@sylvansmit.com
[ Changed tag to "Reported-by" and sorted. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-05-29 23:35:44 +02:00
Balasubramani Vivekanandan
241cc827c0 drm/xe/mocs: Initialize MOCS index early
MOCS uc_index is used even before it is initialized in the following
callstack
    guc_prepare_xfer()
    __xe_guc_upload()
    xe_guc_min_load_for_hwconfig()
    xe_uc_init_hwconfig()
    xe_gt_init_hwconfig()

Do MOCS index initialization earlier in the device probe.

Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com>
Link: https://lore.kernel.org/r/20250520142445.2792824-1-balasubramani.vivekanandan@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-05-29 14:29:18 -07:00
Tamir Duberstein
b20fbbc08a rust: check type of $ptr in container_of!
Add a compile-time check that `*$ptr` is of the type of `$type->$($f)*`.
Rename those placeholders for clarity.

Given the incorrect usage:

> diff --git a/rust/kernel/rbtree.rs b/rust/kernel/rbtree.rs
> index 8d978c896747..6a7089149878 100644
> --- a/rust/kernel/rbtree.rs
> +++ b/rust/kernel/rbtree.rs
> @@ -329,7 +329,7 @@ fn raw_entry(&mut self, key: &K) -> RawEntry<'_, K, V> {
>          while !(*child_field_of_parent).is_null() {
>              let curr = *child_field_of_parent;
>              // SAFETY: All links fields we create are in a `Node<K, V>`.
> -            let node = unsafe { container_of!(curr, Node<K, V>, links) };
> +            let node = unsafe { container_of!(curr, Node<K, V>, key) };
>
>              // SAFETY: `node` is a non-null node so it is valid by the type invariants.
>              match key.cmp(unsafe { &(*node).key }) {

this patch produces the compilation error:

> error[E0308]: mismatched types
>    --> rust/kernel/lib.rs:220:45
>     |
> 220 |         $crate::assert_same_type(field_ptr, (&raw const (*container_ptr).$($fields)*).cast_mut());
>     |         ------------------------ ---------  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `*mut rb_node`, found `*mut K`
>     |         |                        |
>     |         |                        expected all arguments to be this `*mut bindings::rb_node` type because they need to match the type of this parameter
>     |         arguments to this function are incorrect
>     |
>    ::: rust/kernel/rbtree.rs:270:6
>     |
> 270 | impl<K, V> RBTree<K, V>
>     |      - found this type parameter
> ...
> 332 |             let node = unsafe { container_of!(curr, Node<K, V>, key) };
>     |                                 ------------------------------------ in this macro invocation
>     |
>     = note: expected raw pointer `*mut bindings::rb_node`
>                found raw pointer `*mut K`
> note: function defined here
>    --> rust/kernel/lib.rs:227:8
>     |
> 227 | pub fn assert_same_type<T>(_: T, _: T) {}
>     |        ^^^^^^^^^^^^^^^^ -  ----  ---- this parameter needs to match the `*mut bindings::rb_node` type of parameter #1
>     |                         |  |
>     |                         |  parameter #2 needs to match the `*mut bindings::rb_node` type of this parameter
>     |                         parameter #1 and parameter #2 both reference this parameter `T`
>     = note: this error originates in the macro `container_of` (in Nightly builds, run with -Z macro-backtrace for more info)

[ We decided to go with a variation of v1 [1] that became v4, since it
  seems like the obvious approach, the error messages seem good enough
  and the debug performance should be fine, given the kernel is always
  built with -O2.

  In the future, we may want to make the helper non-hidden, with
  proper documentation, for others to use.

  [1] https://lore.kernel.org/rust-for-linux/CANiq72kQWNfSV0KK6qs6oJt+aGdgY=hXg=wJcmK3zYcokY1LNw@mail.gmail.com/

    - Miguel ]
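
The check described above can be sketched outside the kernel roughly as
follows. This is a minimal illustration under stated assumptions, not the
kernel's macro: the names container_of_checked!, Links, Node and
node_from_links are hypothetical, and stable core::mem::offset_of! and
core::ptr::addr_of! stand in for whatever the kernel helper uses. Only the
assert_same_type signature is taken from the error output quoted above.

```rust
use core::mem::offset_of;
use core::ptr::addr_of;

/// Compiles only if both arguments have exactly the same type.
/// Same signature as the helper shown in the quoted error output.
pub fn assert_same_type<T>(_: T, _: T) {}

/// Hypothetical stand-in for the kernel's `container_of!`.
/// `$field_ptr` must be a `*mut` pointer to the `$field` member of a
/// live `$Container`.
macro_rules! container_of_checked {
    ($field_ptr:expr, $Container:ty, $field:ident) => {{
        let field_ptr = $field_ptr;
        let offset = offset_of!($Container, $field);
        let container_ptr = field_ptr
            .cast::<u8>()
            .wrapping_sub(offset)
            .cast::<$Container>();
        // Type check: this only computes an address, nothing is read.
        // SAFETY (assumed): the caller guarantees that `field_ptr` points
        // at the `$field` member of a live `$Container`.
        let expected = unsafe { addr_of!((*container_ptr).$field) }.cast_mut();
        assert_same_type(field_ptr, expected);
        container_ptr
    }};
}

// Hypothetical example types, loosely modelled on the rbtree case.
#[repr(C)]
pub struct Links {
    pub left: usize,
    pub right: usize,
}

#[repr(C)]
pub struct Node {
    pub key: u64,
    pub links: Links,
}

/// Recover the enclosing `Node` from a pointer to its `links` field.
pub fn node_from_links(ptr: *mut Links) -> *mut Node {
    container_of_checked!(ptr, Node, links)
}
```

With this sketch, changing the last macro argument in node_from_links from
`links` to `key` should fail to compile with a mismatched-types error,
because `*mut Links` and `*mut u64` cannot both be `T`.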

Suggested-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/all/CAH5fLgh6gmqGBhPMi2SKn7mCmMWfOSiS0WP5wBuGPYh9ZTAiww@mail.gmail.com/
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Reviewed-by: Benno Lossin <lossin@kernel.org>
Link: https://lore.kernel.org/r/20250529-b4-container-of-type-check-v4-1-bf3a7ad73cec@gmail.com
[ Added intra-doc link. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-05-29 23:16:38 +02:00
Ahmed Salem
d0b29661a9 ACPICA: Switch back to using strncpy() in acpi_ut_safe_strncpy()
ACPICA commit b90d0d65ec97ff8279ad826f4102e0d31c5f662a

I mistakenly replaced strncpy() with memcpy() in commit ebf2776542
("ACPICA: Replace strncpy() with memcpy()"), not realizing the entire
context behind *why* strncpy() was used.

In this safer implementation of strncpy(), it does not make
sense to use memcpy() only to null-terminate strings passed to
acpi_ut_safe_strncpy() one byte early.

The consequences of doing so are understandably *bad*, as was
evidenced by the kernel test robot reporting problems [1].

Fixes: ebf2776542 ("ACPICA: Replace strncpy() with memcpy()")
Link: https://lore.kernel.org/all/202505081033.50e45ff4-lkp@intel.com [1]
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202505081033.50e45ff4-lkp@intel.com
Link: https://github.com/acpica/acpica/commit/b90d0d65
Signed-off-by: Ahmed Salem <x0rw3ll@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/12685690.O9o76ZdvQC@rjwysocki.net
2025-05-29 21:19:10 +02:00
Linus Torvalds
e0797d3b91 Merge tag 'fs_for_v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2 and isofs updates from Jan Kara:

 - isofs fix of handling of particularly formatted Rock Ridge timestamps

 - Add deprecation notice about support of DAX in ext2 filesystem driver

* tag 'fs_for_v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: Deprecate DAX
  isofs: fix Y2038 and Y2156 issues in Rock Ridge TF entry
2025-05-29 10:38:23 -07:00
Linus Torvalds
db340159f1 Merge tag 'fsnotify_for_v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
 "Two fanotify cleanups and support for watching namespace-owned
  filesystems by namespace admins (most useful for being able to watch
  for new mounts / unmounts happening within a user namespace)"

* tag 'fsnotify_for_v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fanotify: support watching filesystems and mounts inside userns
  fanotify: remove redundant permission checks
  fanotify: Drop use of flex array in fanotify_fh
2025-05-29 10:34:26 -07:00
Linus Torvalds
1193e205db Merge tag 'platform-drivers-x86-v6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform drivers updates from Ilpo Järvinen:
 "The changes are mostly business as usual. Besides pdx86 changes, there
  are a few power supply changes needed for related pdx86 features, move
  of oxpec driver from hwmon (oxp-sensors) to pdx86, and one FW version
  warning to hid-asus.

  Highlights:

   - alienware-wmi-wmax:
       - Add HWMON support
       - Add ABI and admin-guide documentation
       - Expose GPIO debug methods through debug FS
       - Support manual fan control and "custom" thermal profile

   - amd/hsmp:
       - Add sysfs files to show HSMP telemetry
       - Report power readings and limits via hwmon

   - amd/isp4: Add AMD ISP platform config for OV05C10

   - asus-wmi:
       - Refactor Ally suspend/resume to work better with older FW
       - hid-asus: check ROG Ally MCU version and warn about old FW versions

   - dasharo-acpi:
       - Add driver for Dasharo devices supporting fan and temperature
         monitoring

   - dell-ddv:
       - Expose the battery health and manufacture date to userspace
         using power supply extensions
       - Implement the battery matching algorithm

   - dell-pc:
       - Improve error propagation
       - Use faux device

   - int3472:
       - Add delays to avoid GPIO regulator spikes
       - Add handshake pin support
       - Make regulator supply name configurable and allow registering
         more than 1 GPIO regulator
       - Map mt9m114 powerdown pin to powerenable

   - intel/pmc: Add separate SSRAM Telemetry driver

   - intel-uncore-freq: Add attributes to show agent types and die ID

   - ISST:
       - Support SST-TF revision 2 (allows more cores per bucket)
       - Support SST-PP revision 2 (fabric 1 frequencies)
       - Remove unnecessary SST MSRs restore (the package retains MSRs
         despite CPU offlining)

   - mellanox: Add support for SN2201, SN4280, SN5610, and SN5640

   - mellanox: mlxbf-pmc: Support additional PMC blocks

   - oxpec:
       - Add OneXFly variants
       - Add support for charge limit, charge thresholds, and turbo LED
       - Distinguish current X1 variants to avoid unwanted matching to
         new variants
       - Follow hwmon conventions
       - Move from hwmon/oxp-sensors to platform/x86 to match the
         enlarged scope

   - power supply:
       - Add inhibit-charge-awake (needed by oxpec)
       - Add additional battery health status values ("blown fuse" and
         "cell imbalance") (needed by dell-ddv)

   - portwell-ec: Add driver for Portwell EC supporting GPIO and watchdog

   - thinkpad-acpi: Support camera shutter switch hotkey

   - tuxedo: Add virtual LampArray for TUXEDO NB04 devices

   - tools/power/x86/intel-speed-select:
       - Support displaying SST-PP revision 2 fields
       - Skip uncore frequency update on newer generations of CPUs

   - Miscellaneous cleanups / refactoring / improvements"

* tag 'platform-drivers-x86-v6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (112 commits)
  thermal/drivers/acerhdf: Constify struct thermal_zone_device_ops
  platform/x86/amd/hsmp: fix building with CONFIG_HWMON=m
  platform/x86: asus-wmi: fix build without CONFIG_SUSPEND
  docs: ABI: Fix "aassociated" to "associated"
  platform/x86: Add AMD ISP platform config for OV05C10
  Documentation: admin-guide: pm: Add documentation for die_id
  platform/x86/intel-uncore-freq: Add attributes to show die_id
  platform/x86/intel: power-domains: Add interface to get Linux die ID
  Documentation: admin-guide: pm: Add documentation for agent_types
  platform/x86/intel-uncore-freq: Add attributes to show agent types
  platform/x86/tuxedo: Prevent invalid Kconfig state
  platform/x86: dell-ddv: Expose the battery health to userspace
  platform/x86: dell-ddv: Expose the battery manufacture date to userspace
  platform/x86: dell-ddv: Implement the battery matching algorithm
  power: supply: core: Add additional health status values
  platform/x86/amd/hsmp: acpi: Add sysfs files to display HSMP telemetry
  platform/x86/amd/hsmp: Report power via hwmon sensors
  platform/x86/amd/hsmp: Use a single DRIVER_VERSION for all hsmp modules
  platform/mellanox: mlxreg-dpu: Fix smatch warnings
  platform: mellanox: nvsw-sn2200: Fix .items in nvsw_sn2201_busbar_hotplug
  ...
2025-05-29 10:19:22 -07:00
Niranjana Vishwanathapura
fbeaad071a drm/xe: Create LRC BO without VM
Specifying VM during lrc->bo creation requires VM's reference
to be held for the lifetime of lrc->bo as it will use VM's dma
reservation object. Using VM's dma reservation object for
lrc->bo doesn't provide any advantage. Hence do not pass VM
while creating lrc->bo.

v2: Use xe_bo_unpin_map_no_vm (Matthew Brost)

Fixes: 264eecdba2 ("drm/xe: Decouple xe_exec_queue and xe_lrc")
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250529052031.2429120-2-niranjana.vishwanathapura@intel.com
2025-05-29 09:18:31 -07:00
Linus Torvalds
9d230d500b Merge tag 'driver-core-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core updates from Greg KH:
 "Here are the driver core / kernfs changes for 6.16-rc1.

  Not a huge number of changes this development cycle, here's the
  summary of what is included in here:

   - kernfs locking tweaks, pushing some global locks down into a per-fs
     image lock

   - rust driver core and pci device bindings added for new features.

   - sysfs const work for bin_attributes.

     The final churn of switching away from and removing the
     transitional struct members, "read_new", "write_new" and
     "bin_attrs_new" will come after the merge window to avoid
     unnecessary merge conflicts.

   - auxbus device creation helpers added

   - fauxbus fix for creating sysfs files after the probe completed
     properly

   - other tiny updates for driver core things.

  All of these have been in linux-next for over a week with no reported
  issues"

* tag 'driver-core-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
  kernfs: Relax constraint in draining guard
  Documentation: embargoed-hardware-issues.rst: Remove myself
  drivers: hv: fix up const issue with vmbus_chan_bin_attrs
  firmware_loader: use SHA-256 library API instead of crypto_shash API
  docs: debugfs: do not recommend debugfs_remove_recursive
  PM: wakeup: Do not expose 4 device wakeup source APIs
  kernfs: switch global kernfs_rename_lock to per-fs lock
  kernfs: switch global kernfs_idr_lock to per-fs lock
  driver core: auxiliary bus: Fix IS_ERR() vs NULL mixup in __devm_auxiliary_device_create()
  sysfs: constify attribute_group::bin_attrs
  sysfs: constify bin_attribute argument of bin_attribute::read/write()
  software node: Correct a OOB check in software_node_get_reference_args()
  devres: simplify devm_kstrdup() using devm_kmemdup()
  platform: replace magic number with macro PLATFORM_DEVID_NONE
  component: do not try to unbind unbound components
  driver core: auxiliary bus: add device creation helpers
  driver core: faux: Add sysfs groups after probing
2025-05-29 09:11:39 -07:00
Niravkumar L Rabara
e5ef4cd2a4 EDAC/altera: Use correct write width with the INTTEST register
On the SoCFPGA platform, the INTTEST register supports only 16-bit writes.
A 32-bit write triggers an SError to the CPU so do 16-bit accesses only.

  [ bp: AI-massage the commit message. ]

Fixes: c7b4be8db8 ("EDAC, altera: Add Arria10 OCRAM ECC support")
Signed-off-by: Niravkumar L Rabara <niravkumar.l.rabara@intel.com>
Signed-off-by: Matthew Gerlach <matthew.gerlach@altera.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Cc: stable@kernel.org
Link: https://lore.kernel.org/20250527145707.25458-1-matthew.gerlach@altera.com
2025-05-29 17:38:55 +02:00
Rafael J. Wysocki
70523f3357 Revert "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"
Revert commit 96040f7273 ("x86/smp: Eliminate mwait_play_dead_cpuid_hint()")
because it introduced a significant power regression on systems that
start with "nosmt" in the kernel command line.

Namely, on such systems, SMT siblings permanently go offline early,
when cpuidle has not been initialized yet, so after the above commit,
hlt_play_dead() is called for them.  Later on, when the processor
attempts to enter a deep package C-state, including PC10 which is
requisite for reaching minimum power in suspend-to-idle, it is not
able to do that because of the SMT siblings staying in C1 (which
they have been put into by HLT).

As a result, the idle power (including power in suspend-to-idle)
rises quite dramatically on those systems with all of the possible
consequences, which (needless to say) may not be expected by their
users.

This issue is hard to debug and potentially dangerous, so it needs to
be addressed as soon as possible in a way that will work for 6.15.y,
hence the revert.

Of course, after this revert, the issue that commit 96040f7273
attempted to address will be back and it will need to be fixed again
later.

Fixes: 96040f7273 ("x86/smp: Eliminate mwait_play_dead_cpuid_hint()")
Reported-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Cc: 6.15+ <stable@vger.kernel.org> # 6.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/12674167.O9o76ZdvQC@rjwysocki.net
2025-05-29 17:34:18 +02:00
Robert Mader
2271e0a20e drm: drm_fourcc: add 10/12/16bit software decoder YCbCr formats
This adds FOURCCs for 3-plane 10/12/16bit YCbCr formats used by software
decoders like ffmpeg, dav1d and libvpx. The intended use-case is buffer
sharing between decoders and GPUs by allocating buffers with e.g. udmabuf
or dma-heaps, avoiding unnecessary copies and format conversions in
various scenarios.

Unlike formats typically used by hardware decoders, the 10/12bit formats
use an LSB alignment. In order to allow fast implementations in GL
and Vulkan the padding must contain only zeros, so the float
representation can be calculated by multiplying with 2^6=64 or 2^4=16
respectively.
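
The scaling step can be sketched as below. This is an illustration only:
the function names are hypothetical, and it assumes each sample sits in a
16-bit container that the GPU reads as UNORM16 (0..=65535 mapped to
0.0..=1.0), which the text above does not spell out. Under that
assumption, full scale for a 10-bit sample comes out as 1023 * 64 / 65535,
close to but not exactly 1.0.

```rust
/// Interpret a 16-bit container value as UNORM16 (assumed sampling mode).
pub fn unorm16_to_float(raw: u16) -> f32 {
    f32::from(raw) / 65535.0
}

/// Scale factor for an LSB-aligned sample with `bits` significant bits:
/// 2^(16 - bits), i.e. 64 for 10-bit and 16 for 12-bit samples.
pub fn lsb_scale(bits: u32) -> f32 {
    assert!(bits >= 1 && bits <= 16);
    (1u32 << (16 - bits)) as f32
}

/// Convert an LSB-aligned sample to its approximate normalized value.
pub fn lsb_sample_to_float(raw: u16, bits: u32) -> f32 {
    // The padding bits above the sample must be zero for this to hold.
    debug_assert!(bits == 16 || raw >> bits == 0);
    unorm16_to_float(raw) * lsb_scale(bits)
}
```

If the padding bits were not zero, the multiplication would push the
result past 1.0, which is why the zero-padding requirement matters for
this shortcut.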

MRs or branches for Mesa, Vulkan, Gstreamer, Weston and Mutter can be found at:
 - https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34303
 - https://github.com/rmader/Vulkan-Docs/commits/ycbcr-16bit-lsb-formats/
 - https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/8540
 - https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1753
 - https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4348

The naming scheme follows the 'P' and 'Q' formats. The 'S' stands for
'software' and was selected in order to make remembering easy.

The 'Sx16' formats could as well be 'Qx16'. We stick with 'S' as 16bit software
decoders are likely much more common than hardware ones for the foreseeable
future. Note that these formats already have Vulkan equivalents:
 - VK_FORMAT_G16_B16_R16_3PLANE_420_UNORM
 - VK_FORMAT_G16_B16_R16_3PLANE_422_UNORM
 - VK_FORMAT_G16_B16_R16_3PLANE_444_UNORM

Signed-off-by: Robert Mader <robert.mader@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Link: https://lore.kernel.org/r/20250509133535.60330-1-robert.mader@collabora.com
Signed-off-by: Daniel Stone <daniels@collabora.com>
2025-05-29 16:32:58 +01:00
Linus Torvalds
bf373e4c78 Merge tag 'devicetree-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
 "DT Bindings:

   - Convert all remaining interrupt-controller bindings to DT schema

   - Convert Rockchip CDN-DP and Freescale TCON, M4IF, TigerP, LDB, PPC
     PMC, imx-drm, and ftm-quaddec to DT schema

   - Add bindings for fsl,vf610-pit, fsl,ls1021a-wdt, sgx,vz89te,
     maxim,max30208, ti,lp8864, and fairphone,fp5-sndcard

   - Add top-level constraints for renesas,vsp1 and renesas,fcp

   - Add missing constraint in amlogic,pinctrl-a4 'group' nodes

   - Adjust the allowed properties for dwc3-xilinx, sony,imx219,
     pci-iommu, and renesas,dsi

   - Add EcoNet vendor prefix

   - Fix the reserved-memory.yaml in fsl,qman-fqd

   - Drop obsolete numa.txt and cpu-topology.txt which are schemas in
     dtschema now

   - Drop Renesas RZ/N1S bindings

   - Ensure Arm cpu nodes don't allow undocumented properties. Add all
     the properties which are in use and undocumented. Drop the Mediatek
     cpufreq binding which is not a binding, but just what DT properties
     the driver uses.

   - Add compatibles for Renesas RZ/G3E and RZ/V2N Mali Bifrost GPU

   - Update documentation on defining child nodes with separate schemas

   - Add bindings to PSCI MAINTAINERS entry

  DT core:

   - Add new functions to simplify driver handling of 'memory-region'
     properties. Users to be added next cycle.

   - Simplify of_dma_set_restricted_buffer() to use
     of_for_each_phandle()

   - Add missing unlock on error in unittest_data_add()"

* tag 'devicetree-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (87 commits)
  dt-bindings: timer: Add fsl,vf610-pit.yaml
  dt-bindings: gpu: mali-bifrost: Add compatible for RZ/G3E SoC
  ASoC: dt-bindings: qcom,sm8250: Add Fairphone 5 sound card
  dt-bindings: arm/cpus: Allow 2 power-domains entries
  dt-bindings: usb: dwc3-xilinx: allow dma-coherent
  media: dt-bindings: sony,imx219: Allow props from video-interface-devices
  dt-bindings: soundwire: qcom: Document v2.1.0 version of IP block
  dt-bindings: watchdog: fsl-imx-wdt: add compatible string fsl,ls1021a-wdt
  dt-bindings: pinctrl: amlogic,pinctrl-a4: Add missing constraint on allowed 'group' node properties
  dt-bindings: display: rockchip: Convert cdn-dp-rockchip.txt to yaml
  dt-bindings: display: bridge: renesas,dsi: allow properties from dsi-controller
  dt-bindings: trivial-devices: Add VZ89TE to trivial
  media: dt-bindings: renesas,vsp1: add top-level constraints
  media: dt-bindings: renesas,fcp: add top-level constraints
  dt-bindings: trivial-devices: Add Maxim max30208
  dt-bindings: soc: fsl,qman-fqd: Fix reserved-memory.yaml reference
  dt-bindings: interrupt-controller: Convert ti,omap-intc-irq to DT schema
  dt-bindings: interrupt-controller: Convert ti,omap4-wugen-mpu to DT schema
  dt-bindings: interrupt-controller: Convert ti,keystone-irq to DT schema
  dt-bindings: interrupt-controller: Convert technologic,ts4800-irqc to DT schema
  ...
2025-05-29 08:22:07 -07:00
Linus Torvalds
8ca154e491 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:

 - A new virtio RTC driver

 - vhost scsi now logs write descriptors so migration works

 - Some hardening work in virtio core

 - An old spec compliance issue fixed in vhost net

 - A couple of cleanups, fixes in vringh, virtio-pci, vdpa

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio: reject shm region if length is zero
  virtio_rtc: Add RTC class driver
  virtio_rtc: Add Arm Generic Timer cross-timestamping
  virtio_rtc: Add PTP clocks
  virtio_rtc: Add module and driver core
  vringh: use bvec_kmap_local
  vhost: vringh: Use matching allocation type in resize_iovec()
  virtio-pci: Fix result size returned for the admin command completion
  vdpa/octeon_ep: Control PCI dev enabling manually
  vhost-scsi: log event queue write descriptors
  vhost-scsi: log control queue write descriptors
  vhost-scsi: log I/O queue write descriptors
  vhost-scsi: adjust vhost_scsi_get_desc() to log vring descriptors
  vhost: modify vhost_log_write() for broader users
2025-05-29 08:15:35 -07:00
Linus Torvalds
43db111107 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "As far as x86 goes this pull request "only" includes TDX host support.

  Quotes are appropriate because (at 6k lines and 100+ commits) it is
  much bigger than the rest, which will come later this week and
  consists mostly of bugfixes and selftests. s390 changes will also come
  in the second batch.

  ARM:

   - Add large stage-2 mapping (THP) support for non-protected guests
     when pKVM is enabled, clawing back some performance.

   - Enable nested virtualisation support on systems that support it,
     though it is disabled by default.

   - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE
     and protected modes.

   - Large rework of the way KVM tracks architecture features and links
     them with the effects of control bits. While this has no functional
     impact, it ensures correctness of emulation (the data is
     automatically extracted from the published JSON files), and helps
     deal with the evolution of the architecture.

   - Significant changes to the way pKVM tracks ownership of pages,
     avoiding page table walks by storing the state in the hypervisor's
     vmemmap. This in turn enables the THP support described above.

   - New selftest checking the pKVM ownership transition rules

   - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
     even if the host didn't have it.

   - Fixes for the address translation emulation, which happened to be
     rather buggy in some specific contexts.

   - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
     from the number of counters exposed to a guest and addressing a
     number of issues in the process.

   - Add a new selftest for the SVE host state being corrupted by a
     guest.

   - Keep HCR_EL2.xMO set at all times for systems running with the
     kernel at EL2, ensuring that the window for interrupts is slightly
     bigger, and avoiding a pretty bad erratum on the AmpereOne HW.

   - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
     from a pretty bad case of TLB corruption unless accesses to HCR_EL2
     are heavily synchronised.

   - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
     tables in a human-friendly fashion.

   - and the usual random cleanups.

  LoongArch:

   - Don't flush tlb if the host supports hardware page table walks.

   - Add KVM selftests support.

  RISC-V:

   - Add vector registers to get-reg-list selftest

   - VCPU reset related improvements

   - Remove scounteren initialization from VCPU reset

   - Support VCPU reset from userspace using set_mpstate() ioctl

  x86:

   - Initial support for TDX in KVM.

     This finally makes it possible to use the TDX module to run
     confidential guests on Intel processors. This is quite a large
     series, including support for private page tables (managed by the
     TDX module and mirrored in KVM for efficiency), forwarding some
     TDVMCALLs to userspace, and handling several special VM exits from
     the TDX module.

     This has been in the works for literally years and it's not really
     possible to describe everything here, so I'll defer to the various
     merge commits up to and including commit 7bcf7246c4 ('Merge
     branch 'kvm-tdx-finish-initial' into HEAD')"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (248 commits)
  x86/tdx: mark tdh_vp_enter() as __flatten
  Documentation: virt/kvm: remove unreferenced footnote
  RISC-V: KVM: lock the correct mp_state during reset
  KVM: arm64: Fix documentation for vgic_its_iter_next()
  KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
  KVM: arm64: Stage-2 huge mappings for np-guests
  KVM: arm64: Add a range to pkvm_mappings
  KVM: arm64: Convert pkvm_mappings to interval tree
  KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
  KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
  KVM: arm64: Add a range to __pkvm_host_unshare_guest()
  KVM: arm64: Add a range to __pkvm_host_share_guest()
  KVM: arm64: Introduce for_each_hyp_page
  KVM: arm64: Handle huge mappings for np-guest CMOs
  KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
  KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
  KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
  RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
  RISC-V: KVM: Remove scounteren initialization
  KVM: RISC-V: remove unnecessary SBI reset state
  ...
2025-05-29 08:10:01 -07:00
Linus Torvalds
12e9b9e522 Merge tag 'ipe-pr-20250527' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe
Pull IPE update from Fan Wu:
 "A single commit from Jasjiv Singh, that adds an errno field to IPE
  policy load auditing to log failures with error details, not just
  successes.

  This improves the security audit trail and helps diagnose policy
  deployment issues"

* tag 'ipe-pr-20250527' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe:
  ipe: add errno field to IPE policy load auditing
2025-05-29 08:01:53 -07:00
Stephan Gerhold
d0b497df02 mailbox: qcom-apcs-ipc: Assign OF node to clock controller child device
Currently, the child device for the clock controller inside the APCS block
is created without any OF node assigned, so the drivers need to rely on the
parent device for obtaining any resources.

Add support for defining the clock controller inside a "clock-controller"
subnode to break up circular dependencies between the mailbox and required
parent clocks of the clock controller. For backwards compatibility, if the
subnode is not defined, reuse the OF node from the parent device.

Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
2025-05-29 10:01:35 -05:00
Stephan Gerhold
c3c5138714 dt-bindings: mailbox: qcom,apcs: Add separate node for clock-controller
APCS "global" is sort of a "miscellaneous" hardware block that combines
multiple registers inside the application processor subsystem. Two distinct
use cases are currently stuffed together in a single device tree node:

 - Mailbox: to communicate with other remoteprocs in the system.
 - Clock: for controlling the CPU frequency.

These two use cases have unavoidable circular dependencies: the mailbox is
needed as early as possible during boot to start controlling shared
resources like clocks and power domains, while the clock controller needs
one of these shared clocks as its parent. Currently, there is no way to
distinguish these two use cases for generic mechanisms like fw_devlink.

This is currently blocking conversion of the deprecated custom "qcom,ipc"
properties to the standard "mboxes", see e.g. commit d92e9ea2f0
("arm64: dts: qcom: msm8939: revert use of APCS mbox for RPM"):
  1. remoteproc &rpm needs mboxes = <&apcs1_mbox 8>;
  2. The clock controller inside &apcs1_mbox needs
     clocks = <&rpmcc RPM_SMD_XO_CLK_SRC>.
  3. &rpmcc is a child of remoteproc &rpm

The mailbox itself does not need any clocks and should probe early to
unblock the rest of the boot process. The "clocks" are only needed for the
separate clock controller. In Linux, these are already two separate drivers
that can probe independently.

Break up the circular dependency chain in the device tree by separating the
clock controller into a separate child node. Deprecate the old approach of
specifying the clock properties as part of the root node, but keep them for
backwards compatibility.

Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
2025-05-29 10:01:30 -05:00
Lang Yu
30837a49bd drm/amdkfd: Map wptr BO to GART unconditionally
On simulation C models that don't run CP FW, adev->mes.sched_version
is not populated correctly. This causes a NULL dereference in
amdgpu_amdkfd_free_gtt_mem(dev->adev, (void **)&pqn->q->wptr_bo_gart)
and a warning on an unpinned BO in amdgpu_bo_gpu_offset(q->properties.wptr_bo).

Compared with adding version checks here and there,
always mapping the wptr BO to GART simplifies things.

v2: Add NULL check in amdgpu_amdkfd_free_gtt_mem. (Philip)

Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29 10:58:44 -04:00
Alex Deucher
684530526f drm/amdgpu/mes: remove some unused functions
Nothing uses them so remove them.  Leftover from
MES bring up.

Reviewed-by: Michael Chen <michael.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-29 10:58:39 -04:00
Alex Deucher
40f970ba7a drm/amdgpu/mes: add missing locking in helper functions
We need to take the MES lock.

Reviewed-by: Michael Chen <michael.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2025-05-29 10:58:10 -04:00