Commit Graph

1323415 Commits

Author SHA1 Message Date
Daniel Lee
91b587ba79 f2fs: Introduce linear search for dentries
This patch addresses an issue where some files in case-insensitive
directories become inaccessible due to changes in how the kernel function,
utf8_casefold(), generates case-folded strings from the commit 5c26d2f1d3
("unicode: Don't special case ignorable code points").

F2FS uses these case-folded names to calculate hash values for locating
dentries and stores them on disk. Since utf8_casefold() can produce
different output across kernel versions, stored hash values and newly
calculated hash values may differ. This results in affected files no
longer being found via the hash-based lookup.

To resolve this, the patch introduces a linear search fallback.
If the initial hash-based search fails, F2FS will sequentially scan the
directory entries.

Fixes: 5c26d2f1d3 ("unicode: Don't special case ignorable code points")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219586
Signed-off-by: Daniel Lee <chullee@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-08 18:31:45 +00:00
Yi Sun
d217b5cea4 f2fs: add parameter @len to f2fs_invalidate_internal_cache()
New function can process some consecutive blocks at a time.

Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-08 18:31:45 +00:00
Yi Sun
3d56fbb1f0 f2fs: expand f2fs_invalidate_compress_page() to f2fs_invalidate_compress_pages_range()
New function f2fs_invalidate_compress_pages_range() adds the @len
parameter. So it can process some consecutive blocks at a time.

Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-08 18:31:45 +00:00
Dmitry Antipov
76f01376df f2fs: ensure that node info flags are always initialized
Syzbot has reported the following KMSAN splat:

BUG: KMSAN: uninit-value in f2fs_new_node_page+0x1494/0x1630
 f2fs_new_node_page+0x1494/0x1630
 f2fs_new_inode_page+0xb9/0x100
 f2fs_init_inode_metadata+0x176/0x1e90
 f2fs_add_inline_entry+0x723/0xc90
 f2fs_do_add_link+0x48f/0xa70
 f2fs_symlink+0x6af/0xfc0
 vfs_symlink+0x1f1/0x470
 do_symlinkat+0x471/0xbc0
 __x64_sys_symlink+0xcf/0x140
 x64_sys_call+0x2fcc/0x3d90
 do_syscall_64+0xd9/0x1b0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Local variable new_ni created at:
 f2fs_new_node_page+0x9d/0x1630
 f2fs_new_inode_page+0xb9/0x100

So adjust 'f2fs_get_node_info()' to ensure that 'flag'
field of 'struct node_info' is always initialized.

Reported-by: syzbot+5141f6db57a2f7614352@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=5141f6db57a2f7614352
Fixes: e05df3b115 ("f2fs: add node operations")
Suggested-by: Chao Yu <chao@kernel.org>
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:54 +00:00
Yongpeng Yang
e9a844f6e4 f2fs: The GC triggered by ioctl also needs to mark the segno as victim
In SSR mode, the segment selected for allocation might be the same as
the target segment of the GC triggered by ioctl, resulting in the GC
moving the CURSEG_I(sbi, type)->segno.
Thread A				Thread B or Thread A
- f2fs_ioc_gc_range
 - __f2fs_ioc_gc_range(.victim_segno=segno#N)
  - f2fs_gc
   - __get_victim
    - f2fs_get_victim
    : segno#N is valid, return segno#N as source segment of GC
					- f2fs_allocate_data_block
						- need_new_seg
						- get_ssr_segment
						- f2fs_get_victim
						: get segno #N as destination segment
						- change_curseg

Fixes: e066b83c9b ("f2fs: add ioctl to flush data from faster device to cold area")
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:29 +00:00
zangyangyang1
5f65945427 f2fs: cache more dentry pages
While traversing dir entries in dentry page, it's better to refresh current
accessed page in lru list by using FGP_ACCESSED flag, otherwise, such page
may has less chance to survive during memory reclaim, result in causing
additional IO when revisiting dentry page.

Signed-off-by: zangyangyang1 <zangyangyang1@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:28 +00:00
Matthew Wilcox (Oracle)
c910a64bc4 f2fs: Remove calls to folio_file_mapping()
All folios that f2fs sees belong to f2fs and not to the swapcache
so it can dereference folio->mapping directly like all other
filesystems do.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:26 +00:00
Matthew Wilcox (Oracle)
19bbd306dd f2fs: Convert __read_io_type() to take a folio
Remove the last call to page_file_mapping() as both callers can now pass
in a folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:23 +00:00
Matthew Wilcox (Oracle)
f58d864582 f2fs: Use a data folio in f2fs_submit_page_bio()
Remove a call to compound_head().  We can call bio_add_folio_nofail()
here because we just allocated the bio, so we know it can't fail and
thus the error path can never be taken.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:19 +00:00
Matthew Wilcox (Oracle)
0765b3f989 f2fs: Use a folio more in f2fs_submit_page_bio()
Cache the result of page_folio(fio->page) in a local variable so
we don't have to keep calling it.  Saves a couple of calls to
compound_head() and removes an access to page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:16 +00:00
Matthew Wilcox (Oracle)
e0821645dd f2fs: Convert f2fs_finish_read_bio() to use folios
Use bio_for_each_folio_all() to iterate over each folio in the bio.
This lets us use folio_end_read() which saves an atomic operation and
memory barrier compared to marking the folio uptodate and unlocking
it as two separate operations.  This also removes a few hidden calls
to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:13 +00:00
Matthew Wilcox (Oracle)
1cf7460070 f2fs: Add F2FS_F_SB()
This is the folio equivalent of F2FS_P_SB().  Removes a call to
page_file_mapping() as we know folios seen by f2fs are never part of
the swap cache.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:10 +00:00
Matthew Wilcox (Oracle)
87e2a15bc0 f2fs: Convert submit tracepoints to take a folio
Remove accesses to page->index and page->mapping as well as
unnecessary calls to page_file_mapping().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:07 +00:00
Matthew Wilcox (Oracle)
ac866908d7 f2fs: Use a folio in f2fs_write_compressed_pages()
Remove accesses to page->index and an unnecessary reference to
page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:04 +00:00
Matthew Wilcox (Oracle)
1cda5bc0b2 f2fs: Use a folio in f2fs_truncate_partial_cluster()
Convert the incoming page to a folio and use it throughout.
Removes an access to page->index.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:01 +00:00
Matthew Wilcox (Oracle)
ff6c82a934 f2fs: Use a folio in f2fs_compress_write_end()
This removes an access of page->index.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:11:58 +00:00
Matthew Wilcox (Oracle)
a909c17953 f2fs: Use a folio in f2fs_all_cluster_page_ready()
Remove references to page->index and use folio_test_uptodate()
instead of PageUptodate().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:11:51 +00:00
Linus Torvalds
40384c840e Linux 6.13-rc1 v6.13-rc1 2024-12-01 14:28:56 -08:00
Linus Torvalds
a14bf463e7 Merge tag 'i2c-for-6.13-rc1-part3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c component probing support from Wolfram Sang:
 "Add OF component probing.

  Some devices are designed and manufactured with some components having
  multiple drop-in replacement options. These components are often
  connected to the mainboard via ribbon cables, having the same signals
  and pin assignments across all options. These may include the display
  panel and touchscreen on laptops and tablets, and the trackpad on
  laptops. Sometimes which component option is used in a particular
  device can be detected by some firmware provided identifier, other
  times that information is not available, and the kernel has to try to
  probe each device.

  Instead of a delicate dance between drivers and device tree quirks,
  this change introduces a simple I2C component probe function. For a
  given class of devices on the same I2C bus, it will go through all of
  them, doing a simple I2C read transfer and see which one of them
  responds. It will then enable the device that responds"

* tag 'i2c-for-6.13-rc1-part3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  MAINTAINERS: fix typo in I2C OF COMPONENT PROBER
  of: base: Document prefix argument for of_get_next_child_with_prefix()
  i2c: Fix whitespace style issue
  arm64: dts: mediatek: mt8173-elm-hana: Mark touchscreens and trackpads as fail
  platform/chrome: Introduce device tree hardware prober
  i2c: of-prober: Add GPIO support to simple helpers
  i2c: of-prober: Add simple helpers for regulator support
  i2c: Introduce OF component probe function
  of: base: Add for_each_child_of_node_with_prefix()
  of: dynamic: Add of_changeset_update_prop_string
2024-12-01 13:38:24 -08:00
Linus Torvalds
88862eeb47 Merge tag 'trace-printf-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull bprintf() removal from Steven Rostedt:

 - Remove unused bprintf() function, that was added with the rest of the
   "bin-printf" functions.

   These are functions that are used by trace_printk() that allows to
   quickly save the format and arguments into the ring buffer without
   the expensive processing of converting numbers to ASCII. Then on
   output, at a much later time, the ring buffer is read and the string
   processing occurs then. The bprintf() was added for consistency but
   was never used. It can be safely removed.

* tag 'trace-printf-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  printf: Remove unused 'bprintf'
2024-12-01 13:10:51 -08:00
Linus Torvalds
f788b5ef1c Merge tag 'timers_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Borislav Petkov:

 - Fix a case where posix timers with a thread-group-wide target would
   miss signals if some of the group's threads are exiting

 - Fix a hang caused by ndelay() calling the wrong delay function
   __udelay()

 - Fix a wrong offset calculation in adjtimex(2) when using ADJ_MICRO
   (microsecond resolution) and a negative offset

* tag 'timers_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-timers: Target group sigqueue to current task only if not exiting
  delay: Fix ndelay() spuriously treated as udelay()
  ntp: Remove invalid cast in time offset math
2024-12-01 12:41:21 -08:00
Linus Torvalds
63f4993b79 Merge tag 'irq_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:

 - Move the ->select callback to the correct ops structure in
   irq-mvebu-sei to fix some Marvell Armada platforms

 - Add a workaround for Hisilicon ITS erratum 162100801 which can cause
   some virtual interrupts to get lost

 - More platform_driver::remove() conversion

* tag 'irq_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: Switch back to struct platform_driver::remove()
  irqchip/gicv3-its: Add workaround for hip09 ITS erratum 162100801
  irqchip/irq-mvebu-sei: Move misplaced select() callback to SEI CP domain
2024-12-01 12:37:58 -08:00
Linus Torvalds
58ac609b99 Merge tag 'x86_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - Add a terminating zero end-element to the array describing AMD CPUs
   affected by erratum 1386 so that the matching loop actually
   terminates instead of going off into the weeds

 - Update the boot protocol documentation to mention the fact that the
   preferred address to load the kernel to is considered in the
   relocatable kernel case too

 - Flush the memory buffer containing the microcode patch after applying
   microcode on AMD Zen1 and Zen2, to avoid unnecessary slowdowns

 - Make sure the PPIN CPU feature flag is cleared on all CPUs if PPIN
   has been disabled

* tag 'x86_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/CPU/AMD: Terminate the erratum_1386_microcode array
  x86/Documentation: Update algo in init_size description of boot protocol
  x86/microcode/AMD: Flush patch buffer mapping after application
  x86/mm: Carve out INVLPG inline asm for use by others
  x86/cpu: Fix PPIN initialization
2024-12-01 12:35:37 -08:00
Linus Torvalds
9022ed0e7e strscpy: write destination buffer only once
The point behind strscpy() was to once and for all avoid all the
problems with 'strncpy()' and later broken "fixed" versions like
strlcpy() that just made things worse.

So strscpy not only guarantees NUL-termination (unlike strncpy), it also
doesn't do unnecessary padding at the destination.  But at the same time
also avoids byte-at-a-time reads and writes by _allowing_ some extra NUL
writes - within the size, of course - so that the whole copy can be done
with word operations.

It is also stable in the face of a mutable source string: it explicitly
does not read the source buffer multiple times (so an implementation
using "strnlen()+memcpy()" would be wrong), and does not read the source
buffer past the size (like the mis-design that is strlcpy does).

Finally, the return value is designed to be simple and unambiguous: if
the string cannot be copied fully, it returns an actual negative error,
making error handling clearer and simpler (and the caller already knows
the size of the buffer).  Otherwise it returns the string length of the
result.

However, there was one final stability issue that can be important to
callers: the stability of the destination buffer.

In particular, the same way we shouldn't read the source buffer more
than once, we should avoid doing multiple writes to the destination
buffer: first writing a potentially non-terminated string, and then
terminating it with NUL at the end does not result in a stable result
buffer.

Yes, it gives the right result in the end, but if the rule for the
destination buffer was that it is _always_ NUL-terminated even when
accessed concurrently with updates, the final byte of the buffer needs
to always _stay_ as a NUL byte.

[ Note that "final byte is NUL" here is literally about the final byte
  in the destination array, not the terminating NUL at the end of the
  string itself. There is no attempt to try to make concurrent reads and
  writes give any kind of consistent string length or contents, but we
  do want to guarantee that there is always at least that final
  terminating NUL character at the end of the destination array if it
  existed before ]

This is relevant in the kernel for the tsk->comm[] array, for example.
Even without locking (for either readers or writers), we want to know
that while the buffer contents may be garbled, it is always a valid C
string and always has a NUL character at 'comm[TASK_COMM_LEN-1]' (and
never has any "out of thin air" data).

So avoid any "copy possibly non-terminated string, and terminate later"
behavior, and write the destination buffer only once.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-01 12:17:16 -08:00
Dr. David Alan Gilbert
f69e63756f printf: Remove unused 'bprintf'
bprintf() is unused. Remove it. It was added in the commit 4370aa4aa7
("vsprintf: add binary printf") but as far as I can see was never used,
unlike the other two functions in that patch.

Link: https://lore.kernel.org/20241002173147.210107-1-linux@treblig.org
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Acked-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-30 22:41:35 -05:00
Linus Torvalds
bcc8eda6d3 Merge tag 'turbostat-2024.11.30' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:

 - assorted minor bug fixes

 - assorted platform specific tweaks

 - initial RAPL PSYS (SysWatt) support

* tag 'turbostat-2024.11.30' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: 2024.11.30
  tools/power turbostat: Add RAPL psys as a built-in counter
  tools/power turbostat: Fix child's argument forwarding
  tools/power turbostat: Force --no-perf in --dump mode
  tools/power turbostat: Add support for /sys/class/drm/card1
  tools/power turbostat: Cache graphics sysfs file descriptors during probe
  tools/power turbostat: Consolidate graphics sysfs access
  tools/power turbostat: Remove unnecessary fflush() call
  tools/power turbostat: Enhance platform divergence description
  tools/power turbostat: Add initial support for GraniteRapids-D
  tools/power turbostat: Remove PC3 support on Lunarlake
  tools/power turbostat: Rename arl_features to lnl_features
  tools/power turbostat: Add back PC8 support on Arrowlake
  tools/power turbostat: Remove PC7/PC9 support on MTL
  tools/power turbostat: Honor --show CPU, even when even when num_cpus=1
  tools/power turbostat: Fix trailing '\n' parsing
  tools/power turbostat: Allow using cpu device in perf counters on hybrid platforms
  tools/power turbostat: Fix column printing for PMT xtal_time counters
  tools/power turbostat: fix GCC9 build regression
2024-11-30 18:30:22 -08:00
Linus Torvalds
0cb71708c5 Merge tag 'pci-v6.13-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fix from Bjorn Helgaas:

 - When removing a PCI device, only look up and remove a platform device
   if there is an associated device node for which there could be a
   platform device, to fix a merge window regression (Brian Norris)

* tag 'pci-v6.13-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI/pwrctrl: Unregister platform device only if one actually exists
2024-11-30 18:23:05 -08:00
Linus Torvalds
8a6a03ad5b Merge tag 'lsm-pr-20241129' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull ima fix from Paul Moore:
 "One small patch to fix a function parameter / local variable naming
  snafu that went up to you in the current merge window"

* tag 'lsm-pr-20241129' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  ima: uncover hidden variable in ima_match_rules()
2024-11-30 18:14:56 -08:00
Linus Torvalds
cfd47302ac Merge tag 'block-6.13-20242901' of git://git.kernel.dk/linux
Pull more block updates from Jens Axboe:

 - NVMe pull request via Keith:
      - Use correct srcu list traversal (Breno)
      - Scatter-gather support for metadata (Keith)
      - Fabrics shutdown race condition fix (Nilay)
      - Persistent reservations updates (Guixin)

 - Add the required bits for MD atomic write support for raid0/1/10

 - Correct return value for unknown opcode in ublk

 - Fix deadlock with zone revalidation

 - Fix for the io priority request vs bio cleanups

 - Use the correct unsigned int type for various limit helpers

 - Fix for a race in loop

 - Cleanup blk_rq_prep_clone() to prevent uninit-value warning and make
   it easier for actual humans to read

 - Fix potential UAF when iterating tags

 - A few fixes for bfq-iosched UAF issues

 - Fix for brd discard not decrementing the allocated page count

 - Various little fixes and cleanups

* tag 'block-6.13-20242901' of git://git.kernel.dk/linux: (36 commits)
  brd: decrease the number of allocated pages which discarded
  block, bfq: fix bfqq uaf in bfq_limit_depth()
  block: Don't allow an atomic write be truncated in blkdev_write_iter()
  mq-deadline: don't call req_get_ioprio from the I/O completion handler
  block: Prevent potential deadlock in blk_revalidate_disk_zones()
  block: Remove extra part pointer NULLify in blk_rq_init()
  nvme: tuning pr code by using defined structs and macros
  nvme: introduce change ptpl and iekey definition
  block: return bool from get_disk_ro and bdev_read_only
  block: remove a duplicate definition for bdev_read_only
  block: return bool from blk_rq_aligned
  block: return unsigned int from blk_lim_dma_alignment_and_pad
  block: return unsigned int from queue_dma_alignment
  block: return unsigned int from bdev_io_opt
  block: req->bio is always set in the merge code
  block: don't bother checking the data direction for merges
  block: blk-mq: fix uninit-value in blk_rq_prep_clone and refactor
  Revert "block, bfq: merge bfq_release_process_ref() into bfq_put_cooperator()"
  md/raid10: Atomic write support
  md/raid1: Atomic write support
  ...
2024-11-30 15:47:29 -08:00
Linus Torvalds
dd54fcced8 Merge tag 'io_uring-6.13-20242901' of git://git.kernel.dk/linux
Pull more io_uring updates from Jens Axboe:

 - Remove a leftover struct from when the cqwait registered waiting was
   transitioned to regions.

 - Fix for an issue introduced in this merge window, where nop->fd might
   be used uninitialized. Ensure it's always set.

 - Add capping of the task_work run in local task_work mode, to prevent
   bursty and long chains from adding too much latency.

 - Work around xa_store() leaving ->head non-NULL if it encounters an
   allocation error during storing. Just a debug trigger, and can go
   away once xa_store() behaves in a more expected way for this
   condition. Not a major thing as it basically requires fault injection
   to trigger it.

 - Fix a few mapping corner cases

 - Fix KCSAN complaint on reading the table size post unlock. Again not
   a "real" issue, but it's easy to silence by just keeping the reading
   inside the lock that protects it.

* tag 'io_uring-6.13-20242901' of git://git.kernel.dk/linux:
  io_uring/tctx: work around xa_store() allocation error issue
  io_uring: fix corner case forgetting to vunmap
  io_uring: fix task_work cap overshooting
  io_uring: check for overflows in io_pin_pages
  io_uring/nop: ensure nop->fd is always initialized
  io_uring: limit local tw done
  io_uring: add io_local_work_pending()
  io_uring/region: return negative -E2BIG in io_create_region()
  io_uring: protect register tracing
  io_uring: remove io_uring_cqwait_reg_arg
2024-11-30 15:43:02 -08:00
Linus Torvalds
133577cad6 Merge tag 'dma-mapping-6.13-2024-11-30' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:

 - fix physical address calculation for struct dma_debug_entry (Fedor
   Pchelkin)

* tag 'dma-mapping-6.13-2024-11-30' of git://git.infradead.org/users/hch/dma-mapping:
  dma-debug: fix physical address calculation for struct dma_debug_entry
2024-11-30 15:36:17 -08:00
Linus Torvalds
c4bb3a2d64 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more kvm updates from Paolo Bonzini:

 - ARM fixes

 - RISC-V Svade and Svadu (accessed and dirty bit) extension support for
   host and guest

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: riscv: selftests: Add Svade and Svadu Extension to get-reg-list test
  RISC-V: KVM: Add Svade and Svadu Extensions Support for Guest/VM
  dt-bindings: riscv: Add Svade and Svadu Entries
  RISC-V: Add Svade and Svadu Extensions Support
  KVM: arm64: Use MDCR_EL2.HPME to evaluate overflow of hyp counters
  KVM: arm64: Ignore PMCNTENSET_EL0 while checking for overflow status
  KVM: arm64: Mark set_sysreg_masks() as inline to avoid build failure
  KVM: arm64: vgic-its: Add stronger type-checking to the ITS entry sizes
  KVM: arm64: vgic: Kill VGIC_MAX_PRIVATE definition
  KVM: arm64: vgic: Make vgic_get_irq() more robust
  KVM: arm64: vgic-v3: Sanitise guest writes to GICR_INVLPIR
2024-11-30 14:51:08 -08:00
Linus Torvalds
0ff86d8da7 Merge tag 'sh-for-v6.13-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux
Pull sh updates from John Paul Adrian Glaubitz:
 "Two small fixes.

  The first one by Huacai Chen addresses a runtime warning when
  CONFIG_CPUMASK_OFFSTACK and CONFIG_DEBUG_PER_CPU_MAPS are selected
  which occurs because the cpuinfo code on sh incorrectly uses NR_CPUS
  when iterating CPUs instead of the runtime limit nr_cpu_ids.

  A second fix by Dan Carpenter fixes a use-after-free bug in
  register_intc_controller() which occurred as a result of improper
  error handling in the interrupt controller driver code when
  registering an interrupt controller during plat_irq_setup() on sh"

* tag 'sh-for-v6.13-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
  sh: intc: Fix use-after-free bug in register_intc_controller()
  sh: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK
2024-11-30 14:45:29 -08:00
Linus Torvalds
50ee4a6fe3 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:

 - Deselect ARCH_CORRECT_STACKTRACE_ON_KRETPROBE so that tests depending
   on it don't run (and fail) on arm64

 - Fix lockdep assert in the Arm SMMUv3 PMU driver

 - Fix the port and device ID bits setting in the Arm CMN perf driver

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  perf/arm-cmn: Ensure port and device id bits are set properly
  perf/arm-smmuv3: Fix lockdep assert in ->event_init()
  arm64: disable ARCH_CORRECT_STACKTRACE_ON_KRETPROBE tests
2024-11-30 14:33:44 -08:00
Len Brown
86d2377340 tools/power turbostat: 2024.11.30
since 2024.07.26:

assorted minor bug fixes
assorted platform specific tweaks
initial RAPL PSYS (SysWatt) support

Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:48:56 -05:00
Patryk Wlazlyn
e5f687b89b tools/power turbostat: Add RAPL psys as a built-in counter
Introduce the counter as a part of global, platform counters structure.
We open the counter for only one cpu, but otherwise treat it as an
ordinary RAPL counter, allowing for grouped perf read.

The counter is disabled by default, because it's interpretation may
require additional, platform specific information, making it unsuitable
for general use.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:07 -05:00
Patryk Wlazlyn
1da0daf746 tools/power turbostat: Fix child's argument forwarding
Add '+' to optstring when early scanning for --no-msr and --no-perf.
It causes option processing to stop as soon as a nonoption argument is
encountered, effectively skipping child's arguments.

Fixes: 3e4048466c ("tools/power turbostat: Add --no-msr option")
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:07 -05:00
Patryk Wlazlyn
bcfab87108 tools/power turbostat: Force --no-perf in --dump mode
Force the --no-perf early to prevent using it as a source. User asks for
raw values, but perf returns them relative to the opening of the file
descriptor.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:07 -05:00
Zhang Rui
03109e2f0d tools/power turbostat: Add support for /sys/class/drm/card1
On some machines, the graphics device is enumerated as
/sys/class/drm/card1 instead of /sys/class/drm/card0. The current
implementation does not handle this scenario, resulting in the loss of
graphics C6 residency and frequency information.

Add support for /sys/class/drm/card1, ensuring that turbostat can
retrieve and display the graphics columns for these platforms.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:07 -05:00
Zhang Rui
c7538f3385 tools/power turbostat: Cache graphics sysfs file descriptors during probe
Snapshots of the graphics sysfs knobs are taken based on file
descriptors. To optimize this process, open the files and cache the file
descriptors during the graphics probe phase. As a result, the previously
cached pathnames become redundant and are removed.

This change aims to streamline the code without altering its functionality.

No functional change intended.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:07 -05:00
Zhang Rui
d071004e62 tools/power turbostat: Consolidate graphics sysfs access
Currently, there is an inconsistency in how graphics sysfs knobs are
accessed: graphics residency sysfs knobs are opened and closed for each
read, while graphics frequency sysfs knobs are opened once and remain
open until turbostat exits. This inconsistency is confusing and adds
unnecessary code complexity.

Consolidate the access method by opening the sysfs files once and
reusing the file pointers for subsequent accesses. This approach
simplifies the code and ensures a consistent method for accessing
graphics sysfs knobs.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:07 -05:00
Zhang Rui
ba99a4fc8c tools/power turbostat: Remove unnecessary fflush() call
The graphics sysfs knobs are read-only, making the use of fflush()
before reading them redundant.

Remove the unnecessary fflush() call.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:07 -05:00
Zhang Rui
1958f4e168 tools/power turbostat: Enhance platform divergence description
In various generations, platforms often share a majority of features,
diverging only in a few specific aspects. The current approach of using
hardcoded values in 'platform_features' structure fails to effectively
represent these divergences.

To improve the description of platform divergence:
1. Each newly introduced 'platform_features' structure must have a base,
   typically derived from the previous generation.
2. Platform feature values should be inherited from the base structure
   rather than being hardcoded.
This approach ensures a more accurate and maintainable representation of
platform-specific features across different generations.

Converts `adl_features` and `lnl_features` to follow this new scheme.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:06 -05:00
Zhang Rui
d39d586ee4 tools/power turbostat: Add initial support for GraniteRapids-D
Add initial support for GraniteRapids-D. It shares the same features
with SapphireRapids.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:06 -05:00
Zhang Rui
26c57a152b tools/power turbostat: Remove PC3 support on Lunarlake
Lunarlake supports CC1/CC6/CC7/PC2/PC6/PC10.

Remove PC3 support on Lunarlake.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:06 -05:00
Zhang Rui
3ae5f34384 tools/power turbostat: Rename arl_features to lnl_features
As ARL shares the same features with ADL/RPL/MTL, now 'arl_features' is
used by Lunarlake platform only.

Rename 'arl_features' to 'lnl_features'.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:06 -05:00
Zhang Rui
b082e07aec tools/power turbostat: Add back PC8 support on Arrowlake
Similar to ADL/RPL/MTL, ARL supports CC1/CC6/CC7/PC2/PC3/PC6/PC8/PC10.

Add back PC8 support on Arrowlake.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:06 -05:00
Zhang Rui
f5e2cf228f tools/power turbostat: Remove PC7/PC9 support on MTL
Similar to ADL/RPL, MTL support CC1/CC6/CC7/PC2/PC3/PC6/PC8/CP10.

Remove PC7/PC9 support on MTL.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:06 -05:00
Patryk Wlazlyn
c808624e2d tools/power turbostat: Honor --show CPU, even when even when num_cpus=1
Honor --show CPU and --show Core when "topo.num_cpus == 1".
Previously turbostat assumed that on a 1-CPU system, these
columns should never appear.

Honoring these flags makes it easier for several programs
that parse turbostat output.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:06 -05:00
Zhang Rui
fed8511cc8 tools/power turbostat: Fix trailing '\n' parsing
parse_cpu_string() parses the string input either from command line or
from /sys/fs/cgroup/cpuset.cpus.effective to get a list of CPUs that
turbostat can run with.

The cpu string returned by /sys/fs/cgroup/cpuset.cpus.effective contains
a trailing '\n', but strtoul() fails to treat this as an error.

That says, for the code below
	val = ("\n", NULL, 10);
val returns 0, and errno is also not set.

As a result, CPU0 is erroneously considered as allowed CPU and this
causes failures when turbostat tries to run on CPU0.

 get_counters: Could not migrate to CPU 0
 ...
 turbostat: re-initialized with num_cpus 8, allowed_cpus 5
 get_counters: Could not migrate to CPU 0

Add a check to return immediately if '\n' or '\0' is detected.

Fixes: 8c3dd2c9e5 ("tools/power/turbostat: Abstrct function for parsing cpu string")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:06 -05:00