Commit Graph

1323413 Commits

Author SHA1 Message Date
Kalesh AP
ae51cb9821 RDMA/bnxt_re: Remove unnecessary goto in bnxt_re_netdev_event
Return directly in case of error without a goto label as there
is no cleanup actions performed.

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1733888745-30939-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-16 08:22:06 -05:00
Kalesh AP
c7f2cfe81e RDMA/bnxt_re: Remove extra new line in bnxt_re_netdev_event
This is a purely cosmetic change.

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1733888745-30939-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-16 08:22:06 -05:00
Boshi Yu
999a0a2e9b RDMA/erdma: Support UD QPs and UD WRs
The iWARP protocol supports only RC QPs previously. Now we add UD QPs
and UD WRs support for the RoCEv2 protocol.

Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com>
Link: https://patch.msgid.link/20241211020930.68833-9-boshiyu@linux.alibaba.com
Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-16 08:20:05 -05:00
Boshi Yu
1cccbd3eec RDMA/erdma: Add the query_qp command to the cmdq
Certian QP attributes, such as sq_draining, can only be obtained
by querying the hardware on the erdma RoCEv2 device. To address this,
we add the query_qp command to the cmdq and parse the response to
retrieve corresponding QP attributes.

Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com>
Link: https://patch.msgid.link/20241211020930.68833-8-boshiyu@linux.alibaba.com
Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-16 08:20:05 -05:00
Boshi Yu
de5b8008aa RDMA/erdma: Refactor the code of the modify_qp interface
The procedure for modifying QP is similar for both the iWARP and
RoCEv2 protocols. Therefore, we unify the code and provide the
erdma_modify_qp() interface for both protocols.

Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com>
Link: https://patch.msgid.link/20241211020930.68833-7-boshiyu@linux.alibaba.com
Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-16 08:20:05 -05:00
Boshi Yu
9566cf6a77 RDMA/erdma: Add erdma_modify_qp_rocev2() interface
The QP state machines in the RoCEv2 and iWARP protocols are
different. To handle these differences for the erdma RoCEv2
device, we provide the erdma_modify_qp_rocev2() interface,
which transitions the QP state and modifies QP attributes
accordingly.

Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com>
Link: https://patch.msgid.link/20241211020930.68833-6-boshiyu@linux.alibaba.com
Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-16 08:20:05 -05:00
Boshi Yu
41dcaf48ff RDMA/erdma: Add address handle implementation
The address handle contains the necessary information to transmit
messages to a remote peer in the RoCEv2 protocol. This commit
implements the erdma_create_ah(), erdma_destroy_ah(), and
erdma_query_ah() interfaces, which are used to create, destroy,
and query an address handle, respectively.

Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com>
Link: https://patch.msgid.link/20241211020930.68833-5-boshiyu@linux.alibaba.com
Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-16 08:20:05 -05:00
Boshi Yu
14bcf7354a RDMA/erdma: Add the erdma_query_pkey() interface
The erdma_query_pkey() interface queries the PKey at the specified
index. Currently, erdma supports only one partition and returns the
default PKey for each query. Besides, the correct length of the PKey
table can be obtained by calling the erdma_query_port() and
erdma_get_port_immutable() interfaces.

Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com>
Link: https://patch.msgid.link/20241211020930.68833-4-boshiyu@linux.alibaba.com
Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-16 08:20:05 -05:00
Boshi Yu
6edc15abc2 RDMA/erdma: Add GID table management interfaces
The erdma_add_gid() interface inserts a GID entry at the
specified index. The erdma_del_gid() interface deletes the
GID entry at the specified index. Additionally, programs
can invoke the erdma_query_port() and erdma_get_port_immutable()
interfaces to query the GID table length.

Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com>
Link: https://patch.msgid.link/20241211020930.68833-3-boshiyu@linux.alibaba.com
Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-16 08:20:05 -05:00
Boshi Yu
a883e71345 RDMA/erdma: Probe the erdma RoCEv2 device
Currently, the erdma driver supports both the iWARP and RoCEv2 protocols.
The erdma driver reads the ERDMA_REGS_DEV_PROTO_REG register to identify
the protocol used by the erdma device. Since each protocol requires
different ib_device_ops, we introduce the erdma_device_ops_iwarp and
erdma_device_ops_rocev2 for iWARP and RoCEv2 protocols, respectively.

Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com>
Link: https://patch.msgid.link/20241211020930.68833-2-boshiyu@linux.alibaba.com
Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-16 08:20:05 -05:00
Dan Carpenter
bd96a3935e rdma/cxgb4: Prevent potential integer overflow on 32bit
The "gl->tot_len" variable is controlled by the user.  It comes from
process_responses().  On 32bit systems, the "gl->tot_len + sizeof(struct
cpl_pass_accept_req) + sizeof(struct rss_header)" addition could have an
integer wrapping bug.  Use size_add() to prevent this.

Fixes: 1cab775c3e ("RDMA/cxgb4: Fix LE hash collision bug for passive open connection")
Link: https://patch.msgid.link/r/86b404e1-4a75-4a35-a34e-e3054fa554c7@stanley.mountain
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-12-10 11:16:18 -04:00
Chiara Meiohas
fbef60de6c RDMA/mlx5: Extend ODP statistics with operation count
The current ODP counters represent the total number of pages
handled, but it is not enough to understand the effectiveness
of these operations.

Extend the ODP counters to include the number of times page fault
and invalidation events were handled.

Example for a single page fault handling 512 pages:
- page_fault: incremented by 512 (total pages)
- page_fault_handled: incremented by 1 (operation count)

The same example is applicable for page invalidation too.

Previous output:
$ rdma stat mr
dev rocep8s0f0 mrn 8 page_faults 27 page_invalidations 0 page_prefetch 29

New output:
$ rdma stat mr
dev rocep8s0f0 mrn 21 page_faults 512 page_faults_handled 1
page_invalidations 0 page_invalidations_handled 0 page_prefetch 51200

Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Link: https://patch.msgid.link/b18f29ed1392996ade66e9e6c45f018925253f6a.1733234165.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-10 04:09:09 -05:00
Leon Romanovsky
f5afe060b1 RDMA/mlx4: Use DMA iterator to write MTT
Replace an open coding of rdma_umem_for_each_dma_block() with
the proper function.

Link: https://patch.msgid.link/0bf595962c964fb8918743405acf9103a5a85983.1733233299.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-12-10 04:08:12 -05:00
Leon Romanovsky
d31ba16c43 RDMA/mlx4: Use ib_umem_find_best_pgsz() to calculate MTT size
Convert mlx4 to use ib_umem_find_best_pgsz() instead of open-coded
variant to calculate MTT size.

Link: https://patch.msgid.link/c39ec6f5d4664c439a72f2961728ebb5895a9f07.1733233299.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-12-10 04:08:04 -05:00
Leon Romanovsky
1f53d88cbb RDMA/mlx4: Avoid false error about access to uninitialized gids array
Smatch generates the following false error report:
drivers/infiniband/hw/mlx4/main.c:393 mlx4_ib_del_gid() error: uninitialized symbol 'gids'.

Traditionally, we are not changing kernel code and asking people to fix
the tools. However in this case, the fix can be done by simply rearranging
the code to be more clear.

Fixes: e26be1bfef ("IB/mlx4: Implement ib_device callbacks")
Link: https://patch.msgid.link/6a3a1577463da16962463fcf62883a87506e9b62.1733233426.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-12-10 04:07:09 -05:00
Linus Torvalds
40384c840e Linux 6.13-rc1 v6.13-rc1 2024-12-01 14:28:56 -08:00
Linus Torvalds
a14bf463e7 Merge tag 'i2c-for-6.13-rc1-part3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c component probing support from Wolfram Sang:
 "Add OF component probing.

  Some devices are designed and manufactured with some components having
  multiple drop-in replacement options. These components are often
  connected to the mainboard via ribbon cables, having the same signals
  and pin assignments across all options. These may include the display
  panel and touchscreen on laptops and tablets, and the trackpad on
  laptops. Sometimes which component option is used in a particular
  device can be detected by some firmware provided identifier, other
  times that information is not available, and the kernel has to try to
  probe each device.

  Instead of a delicate dance between drivers and device tree quirks,
  this change introduces a simple I2C component probe function. For a
  given class of devices on the same I2C bus, it will go through all of
  them, doing a simple I2C read transfer and see which one of them
  responds. It will then enable the device that responds"

* tag 'i2c-for-6.13-rc1-part3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  MAINTAINERS: fix typo in I2C OF COMPONENT PROBER
  of: base: Document prefix argument for of_get_next_child_with_prefix()
  i2c: Fix whitespace style issue
  arm64: dts: mediatek: mt8173-elm-hana: Mark touchscreens and trackpads as fail
  platform/chrome: Introduce device tree hardware prober
  i2c: of-prober: Add GPIO support to simple helpers
  i2c: of-prober: Add simple helpers for regulator support
  i2c: Introduce OF component probe function
  of: base: Add for_each_child_of_node_with_prefix()
  of: dynamic: Add of_changeset_update_prop_string
2024-12-01 13:38:24 -08:00
Linus Torvalds
88862eeb47 Merge tag 'trace-printf-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull bprintf() removal from Steven Rostedt:

 - Remove unused bprintf() function, that was added with the rest of the
   "bin-printf" functions.

   These are functions that are used by trace_printk() that allows to
   quickly save the format and arguments into the ring buffer without
   the expensive processing of converting numbers to ASCII. Then on
   output, at a much later time, the ring buffer is read and the string
   processing occurs then. The bprintf() was added for consistency but
   was never used. It can be safely removed.

* tag 'trace-printf-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  printf: Remove unused 'bprintf'
2024-12-01 13:10:51 -08:00
Linus Torvalds
f788b5ef1c Merge tag 'timers_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Borislav Petkov:

 - Fix a case where posix timers with a thread-group-wide target would
   miss signals if some of the group's threads are exiting

 - Fix a hang caused by ndelay() calling the wrong delay function
   __udelay()

 - Fix a wrong offset calculation in adjtimex(2) when using ADJ_MICRO
   (microsecond resolution) and a negative offset

* tag 'timers_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-timers: Target group sigqueue to current task only if not exiting
  delay: Fix ndelay() spuriously treated as udelay()
  ntp: Remove invalid cast in time offset math
2024-12-01 12:41:21 -08:00
Linus Torvalds
63f4993b79 Merge tag 'irq_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:

 - Move the ->select callback to the correct ops structure in
   irq-mvebu-sei to fix some Marvell Armada platforms

 - Add a workaround for Hisilicon ITS erratum 162100801 which can cause
   some virtual interrupts to get lost

 - More platform_driver::remove() conversion

* tag 'irq_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: Switch back to struct platform_driver::remove()
  irqchip/gicv3-its: Add workaround for hip09 ITS erratum 162100801
  irqchip/irq-mvebu-sei: Move misplaced select() callback to SEI CP domain
2024-12-01 12:37:58 -08:00
Linus Torvalds
58ac609b99 Merge tag 'x86_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - Add a terminating zero end-element to the array describing AMD CPUs
   affected by erratum 1386 so that the matching loop actually
   terminates instead of going off into the weeds

 - Update the boot protocol documentation to mention the fact that the
   preferred address to load the kernel to is considered in the
   relocatable kernel case too

 - Flush the memory buffer containing the microcode patch after applying
   microcode on AMD Zen1 and Zen2, to avoid unnecessary slowdowns

 - Make sure the PPIN CPU feature flag is cleared on all CPUs if PPIN
   has been disabled

* tag 'x86_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/CPU/AMD: Terminate the erratum_1386_microcode array
  x86/Documentation: Update algo in init_size description of boot protocol
  x86/microcode/AMD: Flush patch buffer mapping after application
  x86/mm: Carve out INVLPG inline asm for use by others
  x86/cpu: Fix PPIN initialization
2024-12-01 12:35:37 -08:00
Linus Torvalds
9022ed0e7e strscpy: write destination buffer only once
The point behind strscpy() was to once and for all avoid all the
problems with 'strncpy()' and later broken "fixed" versions like
strlcpy() that just made things worse.

So strscpy not only guarantees NUL-termination (unlike strncpy), it also
doesn't do unnecessary padding at the destination.  But at the same time
also avoids byte-at-a-time reads and writes by _allowing_ some extra NUL
writes - within the size, of course - so that the whole copy can be done
with word operations.

It is also stable in the face of a mutable source string: it explicitly
does not read the source buffer multiple times (so an implementation
using "strnlen()+memcpy()" would be wrong), and does not read the source
buffer past the size (like the mis-design that is strlcpy does).

Finally, the return value is designed to be simple and unambiguous: if
the string cannot be copied fully, it returns an actual negative error,
making error handling clearer and simpler (and the caller already knows
the size of the buffer).  Otherwise it returns the string length of the
result.

However, there was one final stability issue that can be important to
callers: the stability of the destination buffer.

In particular, the same way we shouldn't read the source buffer more
than once, we should avoid doing multiple writes to the destination
buffer: first writing a potentially non-terminated string, and then
terminating it with NUL at the end does not result in a stable result
buffer.

Yes, it gives the right result in the end, but if the rule for the
destination buffer was that it is _always_ NUL-terminated even when
accessed concurrently with updates, the final byte of the buffer needs
to always _stay_ as a NUL byte.

[ Note that "final byte is NUL" here is literally about the final byte
  in the destination array, not the terminating NUL at the end of the
  string itself. There is no attempt to try to make concurrent reads and
  writes give any kind of consistent string length or contents, but we
  do want to guarantee that there is always at least that final
  terminating NUL character at the end of the destination array if it
  existed before ]

This is relevant in the kernel for the tsk->comm[] array, for example.
Even without locking (for either readers or writers), we want to know
that while the buffer contents may be garbled, it is always a valid C
string and always has a NUL character at 'comm[TASK_COMM_LEN-1]' (and
never has any "out of thin air" data).

So avoid any "copy possibly non-terminated string, and terminate later"
behavior, and write the destination buffer only once.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-01 12:17:16 -08:00
Dr. David Alan Gilbert
f69e63756f printf: Remove unused 'bprintf'
bprintf() is unused. Remove it. It was added in the commit 4370aa4aa7
("vsprintf: add binary printf") but as far as I can see was never used,
unlike the other two functions in that patch.

Link: https://lore.kernel.org/20241002173147.210107-1-linux@treblig.org
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Acked-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-30 22:41:35 -05:00
Linus Torvalds
bcc8eda6d3 Merge tag 'turbostat-2024.11.30' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:

 - assorted minor bug fixes

 - assorted platform specific tweaks

 - initial RAPL PSYS (SysWatt) support

* tag 'turbostat-2024.11.30' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: 2024.11.30
  tools/power turbostat: Add RAPL psys as a built-in counter
  tools/power turbostat: Fix child's argument forwarding
  tools/power turbostat: Force --no-perf in --dump mode
  tools/power turbostat: Add support for /sys/class/drm/card1
  tools/power turbostat: Cache graphics sysfs file descriptors during probe
  tools/power turbostat: Consolidate graphics sysfs access
  tools/power turbostat: Remove unnecessary fflush() call
  tools/power turbostat: Enhance platform divergence description
  tools/power turbostat: Add initial support for GraniteRapids-D
  tools/power turbostat: Remove PC3 support on Lunarlake
  tools/power turbostat: Rename arl_features to lnl_features
  tools/power turbostat: Add back PC8 support on Arrowlake
  tools/power turbostat: Remove PC7/PC9 support on MTL
  tools/power turbostat: Honor --show CPU, even when even when num_cpus=1
  tools/power turbostat: Fix trailing '\n' parsing
  tools/power turbostat: Allow using cpu device in perf counters on hybrid platforms
  tools/power turbostat: Fix column printing for PMT xtal_time counters
  tools/power turbostat: fix GCC9 build regression
2024-11-30 18:30:22 -08:00
Linus Torvalds
0cb71708c5 Merge tag 'pci-v6.13-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fix from Bjorn Helgaas:

 - When removing a PCI device, only look up and remove a platform device
   if there is an associated device node for which there could be a
   platform device, to fix a merge window regression (Brian Norris)

* tag 'pci-v6.13-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI/pwrctrl: Unregister platform device only if one actually exists
2024-11-30 18:23:05 -08:00
Linus Torvalds
8a6a03ad5b Merge tag 'lsm-pr-20241129' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull ima fix from Paul Moore:
 "One small patch to fix a function parameter / local variable naming
  snafu that went up to you in the current merge window"

* tag 'lsm-pr-20241129' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  ima: uncover hidden variable in ima_match_rules()
2024-11-30 18:14:56 -08:00
Linus Torvalds
cfd47302ac Merge tag 'block-6.13-20242901' of git://git.kernel.dk/linux
Pull more block updates from Jens Axboe:

 - NVMe pull request via Keith:
      - Use correct srcu list traversal (Breno)
      - Scatter-gather support for metadata (Keith)
      - Fabrics shutdown race condition fix (Nilay)
      - Persistent reservations updates (Guixin)

 - Add the required bits for MD atomic write support for raid0/1/10

 - Correct return value for unknown opcode in ublk

 - Fix deadlock with zone revalidation

 - Fix for the io priority request vs bio cleanups

 - Use the correct unsigned int type for various limit helpers

 - Fix for a race in loop

 - Cleanup blk_rq_prep_clone() to prevent uninit-value warning and make
   it easier for actual humans to read

 - Fix potential UAF when iterating tags

 - A few fixes for bfq-iosched UAF issues

 - Fix for brd discard not decrementing the allocated page count

 - Various little fixes and cleanups

* tag 'block-6.13-20242901' of git://git.kernel.dk/linux: (36 commits)
  brd: decrease the number of allocated pages which discarded
  block, bfq: fix bfqq uaf in bfq_limit_depth()
  block: Don't allow an atomic write be truncated in blkdev_write_iter()
  mq-deadline: don't call req_get_ioprio from the I/O completion handler
  block: Prevent potential deadlock in blk_revalidate_disk_zones()
  block: Remove extra part pointer NULLify in blk_rq_init()
  nvme: tuning pr code by using defined structs and macros
  nvme: introduce change ptpl and iekey definition
  block: return bool from get_disk_ro and bdev_read_only
  block: remove a duplicate definition for bdev_read_only
  block: return bool from blk_rq_aligned
  block: return unsigned int from blk_lim_dma_alignment_and_pad
  block: return unsigned int from queue_dma_alignment
  block: return unsigned int from bdev_io_opt
  block: req->bio is always set in the merge code
  block: don't bother checking the data direction for merges
  block: blk-mq: fix uninit-value in blk_rq_prep_clone and refactor
  Revert "block, bfq: merge bfq_release_process_ref() into bfq_put_cooperator()"
  md/raid10: Atomic write support
  md/raid1: Atomic write support
  ...
2024-11-30 15:47:29 -08:00
Linus Torvalds
dd54fcced8 Merge tag 'io_uring-6.13-20242901' of git://git.kernel.dk/linux
Pull more io_uring updates from Jens Axboe:

 - Remove a leftover struct from when the cqwait registered waiting was
   transitioned to regions.

 - Fix for an issue introduced in this merge window, where nop->fd might
   be used uninitialized. Ensure it's always set.

 - Add capping of the task_work run in local task_work mode, to prevent
   bursty and long chains from adding too much latency.

 - Work around xa_store() leaving ->head non-NULL if it encounters an
   allocation error during storing. Just a debug trigger, and can go
   away once xa_store() behaves in a more expected way for this
   condition. Not a major thing as it basically requires fault injection
   to trigger it.

 - Fix a few mapping corner cases

 - Fix KCSAN complaint on reading the table size post unlock. Again not
   a "real" issue, but it's easy to silence by just keeping the reading
   inside the lock that protects it.

* tag 'io_uring-6.13-20242901' of git://git.kernel.dk/linux:
  io_uring/tctx: work around xa_store() allocation error issue
  io_uring: fix corner case forgetting to vunmap
  io_uring: fix task_work cap overshooting
  io_uring: check for overflows in io_pin_pages
  io_uring/nop: ensure nop->fd is always initialized
  io_uring: limit local tw done
  io_uring: add io_local_work_pending()
  io_uring/region: return negative -E2BIG in io_create_region()
  io_uring: protect register tracing
  io_uring: remove io_uring_cqwait_reg_arg
2024-11-30 15:43:02 -08:00
Linus Torvalds
133577cad6 Merge tag 'dma-mapping-6.13-2024-11-30' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:

 - fix physical address calculation for struct dma_debug_entry (Fedor
   Pchelkin)

* tag 'dma-mapping-6.13-2024-11-30' of git://git.infradead.org/users/hch/dma-mapping:
  dma-debug: fix physical address calculation for struct dma_debug_entry
2024-11-30 15:36:17 -08:00
Linus Torvalds
c4bb3a2d64 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more kvm updates from Paolo Bonzini:

 - ARM fixes

 - RISC-V Svade and Svadu (accessed and dirty bit) extension support for
   host and guest

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: riscv: selftests: Add Svade and Svadu Extension to get-reg-list test
  RISC-V: KVM: Add Svade and Svadu Extensions Support for Guest/VM
  dt-bindings: riscv: Add Svade and Svadu Entries
  RISC-V: Add Svade and Svadu Extensions Support
  KVM: arm64: Use MDCR_EL2.HPME to evaluate overflow of hyp counters
  KVM: arm64: Ignore PMCNTENSET_EL0 while checking for overflow status
  KVM: arm64: Mark set_sysreg_masks() as inline to avoid build failure
  KVM: arm64: vgic-its: Add stronger type-checking to the ITS entry sizes
  KVM: arm64: vgic: Kill VGIC_MAX_PRIVATE definition
  KVM: arm64: vgic: Make vgic_get_irq() more robust
  KVM: arm64: vgic-v3: Sanitise guest writes to GICR_INVLPIR
2024-11-30 14:51:08 -08:00
Linus Torvalds
0ff86d8da7 Merge tag 'sh-for-v6.13-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux
Pull sh updates from John Paul Adrian Glaubitz:
 "Two small fixes.

  The first one by Huacai Chen addresses a runtime warning when
  CONFIG_CPUMASK_OFFSTACK and CONFIG_DEBUG_PER_CPU_MAPS are selected
  which occurs because the cpuinfo code on sh incorrectly uses NR_CPUS
  when iterating CPUs instead of the runtime limit nr_cpu_ids.

  A second fix by Dan Carpenter fixes a use-after-free bug in
  register_intc_controller() which occurred as a result of improper
  error handling in the interrupt controller driver code when
  registering an interrupt controller during plat_irq_setup() on sh"

* tag 'sh-for-v6.13-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
  sh: intc: Fix use-after-free bug in register_intc_controller()
  sh: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK
2024-11-30 14:45:29 -08:00
Linus Torvalds
50ee4a6fe3 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:

 - Deselect ARCH_CORRECT_STACKTRACE_ON_KRETPROBE so that tests depending
   on it don't run (and fail) on arm64

 - Fix lockdep assert in the Arm SMMUv3 PMU driver

 - Fix the port and device ID bits setting in the Arm CMN perf driver

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  perf/arm-cmn: Ensure port and device id bits are set properly
  perf/arm-smmuv3: Fix lockdep assert in ->event_init()
  arm64: disable ARCH_CORRECT_STACKTRACE_ON_KRETPROBE tests
2024-11-30 14:33:44 -08:00
Len Brown
86d2377340 tools/power turbostat: 2024.11.30
since 2024.07.26:

assorted minor bug fixes
assorted platform specific tweaks
initial RAPL PSYS (SysWatt) support

Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:48:56 -05:00
Patryk Wlazlyn
e5f687b89b tools/power turbostat: Add RAPL psys as a built-in counter
Introduce the counter as a part of global, platform counters structure.
We open the counter for only one cpu, but otherwise treat it as an
ordinary RAPL counter, allowing for grouped perf read.

The counter is disabled by default, because it's interpretation may
require additional, platform specific information, making it unsuitable
for general use.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:07 -05:00
Patryk Wlazlyn
1da0daf746 tools/power turbostat: Fix child's argument forwarding
Add '+' to optstring when early scanning for --no-msr and --no-perf.
It causes option processing to stop as soon as a nonoption argument is
encountered, effectively skipping child's arguments.

Fixes: 3e4048466c ("tools/power turbostat: Add --no-msr option")
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:07 -05:00
Patryk Wlazlyn
bcfab87108 tools/power turbostat: Force --no-perf in --dump mode
Force the --no-perf early to prevent using it as a source. User asks for
raw values, but perf returns them relative to the opening of the file
descriptor.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:07 -05:00
Zhang Rui
03109e2f0d tools/power turbostat: Add support for /sys/class/drm/card1
On some machines, the graphics device is enumerated as
/sys/class/drm/card1 instead of /sys/class/drm/card0. The current
implementation does not handle this scenario, resulting in the loss of
graphics C6 residency and frequency information.

Add support for /sys/class/drm/card1, ensuring that turbostat can
retrieve and display the graphics columns for these platforms.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:07 -05:00
Zhang Rui
c7538f3385 tools/power turbostat: Cache graphics sysfs file descriptors during probe
Snapshots of the graphics sysfs knobs are taken based on file
descriptors. To optimize this process, open the files and cache the file
descriptors during the graphics probe phase. As a result, the previously
cached pathnames become redundant and are removed.

This change aims to streamline the code without altering its functionality.

No functional change intended.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:07 -05:00
Zhang Rui
d071004e62 tools/power turbostat: Consolidate graphics sysfs access
Currently, there is an inconsistency in how graphics sysfs knobs are
accessed: graphics residency sysfs knobs are opened and closed for each
read, while graphics frequency sysfs knobs are opened once and remain
open until turbostat exits. This inconsistency is confusing and adds
unnecessary code complexity.

Consolidate the access method by opening the sysfs files once and
reusing the file pointers for subsequent accesses. This approach
simplifies the code and ensures a consistent method for accessing
graphics sysfs knobs.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:07 -05:00
Zhang Rui
ba99a4fc8c tools/power turbostat: Remove unnecessary fflush() call
The graphics sysfs knobs are read-only, making the use of fflush()
before reading them redundant.

Remove the unnecessary fflush() call.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:07 -05:00
Zhang Rui
1958f4e168 tools/power turbostat: Enhance platform divergence description
In various generations, platforms often share a majority of features,
diverging only in a few specific aspects. The current approach of using
hardcoded values in 'platform_features' structure fails to effectively
represent these divergences.

To improve the description of platform divergence:
1. Each newly introduced 'platform_features' structure must have a base,
   typically derived from the previous generation.
2. Platform feature values should be inherited from the base structure
   rather than being hardcoded.
This approach ensures a more accurate and maintainable representation of
platform-specific features across different generations.

Converts `adl_features` and `lnl_features` to follow this new scheme.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:06 -05:00
Zhang Rui
d39d586ee4 tools/power turbostat: Add initial support for GraniteRapids-D
Add initial support for GraniteRapids-D. It shares the same features
with SapphireRapids.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:06 -05:00
Zhang Rui
26c57a152b tools/power turbostat: Remove PC3 support on Lunarlake
Lunarlake supports CC1/CC6/CC7/PC2/PC6/PC10.

Remove PC3 support on Lunarlake.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:06 -05:00
Zhang Rui
3ae5f34384 tools/power turbostat: Rename arl_features to lnl_features
As ARL shares the same features with ADL/RPL/MTL, now 'arl_features' is
used by Lunarlake platform only.

Rename 'arl_features' to 'lnl_features'.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:06 -05:00
Zhang Rui
b082e07aec tools/power turbostat: Add back PC8 support on Arrowlake
Similar to ADL/RPL/MTL, ARL supports CC1/CC6/CC7/PC2/PC3/PC6/PC8/PC10.

Add back PC8 support on Arrowlake.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:06 -05:00
Zhang Rui
f5e2cf228f tools/power turbostat: Remove PC7/PC9 support on MTL
Similar to ADL/RPL, MTL support CC1/CC6/CC7/PC2/PC3/PC6/PC8/CP10.

Remove PC7/PC9 support on MTL.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:06 -05:00
Patryk Wlazlyn
c808624e2d tools/power turbostat: Honor --show CPU, even when even when num_cpus=1
Honor --show CPU and --show Core when "topo.num_cpus == 1".
Previously turbostat assumed that on a 1-CPU system, these
columns should never appear.

Honoring these flags makes it easier for several programs
that parse turbostat output.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:06 -05:00
Zhang Rui
fed8511cc8 tools/power turbostat: Fix trailing '\n' parsing
parse_cpu_string() parses the string input either from command line or
from /sys/fs/cgroup/cpuset.cpus.effective to get a list of CPUs that
turbostat can run with.

The cpu string returned by /sys/fs/cgroup/cpuset.cpus.effective contains
a trailing '\n', but strtoul() fails to treat this as an error.

That says, for the code below
	val = ("\n", NULL, 10);
val returns 0, and errno is also not set.

As a result, CPU0 is erroneously considered as allowed CPU and this
causes failures when turbostat tries to run on CPU0.

 get_counters: Could not migrate to CPU 0
 ...
 turbostat: re-initialized with num_cpus 8, allowed_cpus 5
 get_counters: Could not migrate to CPU 0

Add a check to return immediately if '\n' or '\0' is detected.

Fixes: 8c3dd2c9e5 ("tools/power/turbostat: Abstrct function for parsing cpu string")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:06 -05:00
Patryk Wlazlyn
ae2cdf8d92 tools/power turbostat: Allow using cpu device in perf counters on hybrid platforms
Intel hybrid platforms expose different perf devices for P and E cores.
Instead of one, "/sys/bus/event_source/devices/cpu" device, there are
"/sys/bus/event_source/devices/{cpu_core,cpu_atom}".

This, however makes it more complicated for the user,
because most of the counters are available on both and had to be
handled manually.

This patch allows users to use "virtual" cpu device that is seemingly
translated to cpu_core and cpu_atom perf devices, depending on the type
of a CPU we are opening the counter for.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:06 -05:00
Patryk Wlazlyn
ea8614c08d tools/power turbostat: Fix column printing for PMT xtal_time counters
If the very first printed column was for a PMT counter of type xtal_time
we would misalign the column header, because we were always printing the
delimiter.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2024-11-30 16:42:06 -05:00