Commit Graph

14892 Commits

Author SHA1 Message Date
Wang Yaxin
f65c64f311 delayacct: add delay min to record delay peak
Delay accounting can now calculate the average delay of processes, detect
the overall system load, and also record the 'delay max' to identify
potential abnormal delays.  However, 'delay min' can help us identify
another useful delay peak.  By comparing the difference between 'delay
max' and 'delay min', we can understand the optimization space for
latency, providing a reference for the optimization of latency
performance.

Use case
=========
bash-4.4# ./getdelays -d -t 242
print delayacct stats ON
TGID    242
CPU         count     real total  virtual total    delay total  delay average      delay max      delay min
               39      156000000      156576579        2111069          0.054ms     0.212296ms     0.031307ms
IO          count    delay total  delay average      delay max      delay min
                0              0          0.000ms     0.000000ms     0.000000ms
SWAP        count    delay total  delay average      delay max      delay min
                0              0          0.000ms     0.000000ms     0.000000ms
RECLAIM     count    delay total  delay average      delay max      delay min
                0              0          0.000ms     0.000000ms     0.000000ms
THRASHING   count    delay total  delay average      delay max      delay min
                0              0          0.000ms     0.000000ms     0.000000ms
COMPACT     count    delay total  delay average      delay max      delay min
                0              0          0.000ms     0.000000ms     0.000000ms
WPCOPY      count    delay total  delay average      delay max      delay min
              156       11215873          0.072ms     0.207403ms     0.033913ms
IRQ         count    delay total  delay average      delay max      delay min
                0              0          0.000ms     0.000000ms     0.000000ms

Link: https://lkml.kernel.org/r/20241220173105906EOdsPhzjMLYNJJBqgz1ga@zte.com.cn
Co-developed-by: Wang Yong <wang.yong12@zte.com.cn>
Signed-off-by: Wang Yong <wang.yong12@zte.com.cn>
Co-developed-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
Co-developed-by: Kun Jiang <jiang.kun2@zte.com.cn>
Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Fan Yu <fan.yu9@zte.com.cn>
Cc: Peilin He <he.peilin@zte.com.cn>
Cc: tuqiang <tu.qiang35@zte.com.cn>
Cc: ye xingchen <ye.xingchen@zte.com.cn>
Cc: Yunkai Zhang <zhang.yunkai@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-12 20:21:16 -08:00
Wang Yaxin
658eb5ab91 delayacct: add delay max to record delay peak
Introduce the use cases of delay max, which can help quickly detect
potential abnormal delays in the system and record the types and specific
details of delay spikes.

Problem
========
Delay accounting can track the average delay of processes to show
system workload. However, when a process experiences a significant
delay, maybe a delay spike, which adversely affects performance,
getdelays can only display the average system delay over a period
of time. Yet, average delay is unhelpful for diagnosing delay peak.
It is not even possible to determine which type of delay has spiked,
as this information might be masked by the average delay.

Solution
=========
the 'delay max' can display delay peak since the system's startup,
which can record potential abnormal delays over time, including
the type of delay and the maximum delay. This is helpful for
quickly identifying crash caused by delay.

Use case
=========
bash# ./getdelays -d -p 244
print delayacct stats ON
PID     244

CPU             count     real total  virtual total    delay total  delay average      delay max
                   68      192000000      213676651         705643          0.010ms     0.306381ms
IO              count    delay total  delay average      delay max
                    0              0          0.000ms     0.000000ms
SWAP            count    delay total  delay average      delay max
                    0              0          0.000ms     0.000000ms
RECLAIM         count    delay total  delay average      delay max
                    0              0          0.000ms     0.000000ms
THRASHING       count    delay total  delay average      delay max
                    0              0          0.000ms     0.000000ms
COMPACT         count    delay total  delay average      delay max
                    0              0          0.000ms     0.000000ms
WPCOPY          count    delay total  delay average      delay max
                  235       15648284          0.067ms     0.263842ms
IRQ             count    delay total  delay average      delay max
                    0              0          0.000ms     0.000000ms

[wang.yaxin@zte.com.cn: update docs and fix some spelling errors]
  Link: https://lkml.kernel.org/r/20241213192700771XKZ8H30OtHSeziGqRVMs0@zte.com.cn
Link: https://lkml.kernel.org/r/20241203164848805CS62CQPQWG9GLdQj2_BxS@zte.com.cn
Co-developed-by: Wang Yong <wang.yong12@zte.com.cn>
Signed-off-by: Wang Yong <wang.yong12@zte.com.cn>
Co-developed-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Co-developed-by: Wang Yaxin <wang.yaxin@zte.com.cn>
Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Fan Yu <fan.yu9@zte.com.cn>
Cc: Peilin He <he.peilin@zte.com.cn>
Cc: tuqiang <tu.qiang35@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: ye xingchen <ye.xingchen@zte.com.cn>
Cc: Yunkai Zhang <zhang.yunkai@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-12 20:20:59 -08:00
Dave Airlie
24c61d5533 Merge tag 'drm-msm-next-2025-01-07' of gitlab.freedesktop.org:drm/msm into drm-next
Updates for v6.14

MDSS:
- properly described UBWC registers
- added SM6150 (aka QCS615) support

MDP4:
- several small fixes

DPU:
- added SM6150 (aka QCS615) support
- enabled wide planes if virtual planes are enabled (by using two SSPPs for a single plane)
- fixed modes filtering for platforms w/o 3DMux
- fixed DSPP DSPP_2 / _3 links on several platforms
- corrected DSPP definitions on SDM670
- added CWB hardware blocks support
- added VBIF to DPU snapshots
- dropped struct dpu_rm_requirements

DP:
- reworked DP audio support

DSI:
- added SM6150 (aka QCS615) support

GPU:
- Print GMU core fw version
- GMU bandwidth voting for a740 and a750
- Expose uche trap base via uapi
- UAPI error reporting

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGsutUu4ff6OpXNXxqf1xaV0rV6oV23VXNRiF0_OEfe72Q@mail.gmail.com
2025-01-13 11:14:07 +10:00
Takashi Iwai
3ab4a3199c ALSA: seq: Notify UMP EP and FB changes
So far we notify the sequencer client and port changes upon UMP FB
changes, but those aren't really corresponding to the UMP updates.
e.g. when a FB info gets updated, it's not notified but done only when
some of sequencer port attribute is changed.  This is no ideal
behavior.

This patch adds the two new sequencer event types for notifying the
UMP EP and FB changes via the announce port.  The new event takes
snd_seq_ev_ump_notify type data, which is compatible with
snd_seq_addr (where the port number is replaced with the block
number).

The events are sent when the EP and FB info gets updated explicitly
via ioctl, or the backend UMP receives the corresponding UMP
messages.

The sequencer protocol version is bumped to 1.0.5 along with it.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20250110155943.31578-9-tiwai@suse.de
2025-01-12 13:12:21 +01:00
Takashi Iwai
7bb49d2e8b ALSA: rawmidi: Bump protocol version to 2.0.5
Bump the protocol version to 2.0.5, as we extended the rawmidi ABI for
the new tied_device info and the substream inactive flag.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20250110155943.31578-4-tiwai@suse.de
2025-01-12 13:12:20 +01:00
Takashi Iwai
b8fefed73a ALSA: rawmidi: Show substream activity in info ioctl
The UMP legacy rawmidi may turn on/off the substream dynamically
depending on the UMP Function Block information.  So far, there was no
direct way to know whether the substream is disabled (inactive) or
not; at most one can take a look at the substream name string or try
to open and get -ENODEV.

This patch extends the rawmidi info ioctl to show the current inactive
state of the given substream.  When the selected substream is
inactive, info flags field contains the new bit flag
SNDRV_RAWMIDI_INFO_STREAM_INACTIVE.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20250110155943.31578-3-tiwai@suse.de
2025-01-12 13:12:20 +01:00
Takashi Iwai
bdf46443f3 ALSA: rawmidi: Expose the tied device number in info ioctl
The UMP legacy rawmidi is derived from the UMP rawmidi, but currently
there is no way to know which device is involved in other side.

This patch extends the rawmidi info ioctl to show the tied device
number.  As default it stores -1, indicating that no tied device.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20250110155943.31578-2-tiwai@suse.de
2025-01-12 13:12:20 +01:00
Anuj Gupta
94d57442e5 io_uring: expose read/write attribute capability
After commit 9a213d3b80c0, we can pass additional attributes along with
read/write. However, userspace doesn't know that. Add a new feature flag
IORING_FEAT_RW_ATTR, to notify the userspace that the kernel has this
ability.

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Tested-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20241205062109.1788-1-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10 17:12:42 -07:00
David S. Miller
7b24f164cf Merge tag 'ipsec-next-2025-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
ipsec-next-2025-01-09

1) Implement the AGGFRAG protocol and basic IP-TFS (RFC9347) functionality.
   From Christian Hopps.

2) Support ESN context update to hardware for TX.
   From Jianbo Liu.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2025-01-10 09:15:17 +00:00
Dave Airlie
f6001870ed Merge tag 'v6.13-rc6' into drm-next
This backmerges Linux 6.13-rc6 this is need for the newer pulls.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-01-10 14:24:17 +10:00
Christoph Hellwig
7ed6cbe0f8 fs: add STATX_DIO_READ_ALIGN
Add a separate dio read align field, as many out of place write
file systems can easily do reads aligned to the device sector size,
but require bigger alignment for writes.

This is usually papered over by falling back to buffered I/O for smaller
writes and doing read-modify-write cycles, but performance for this
sucks, so applications benefit from knowing the actual write alignment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250109083109.1441561-3-hch@lst.de
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 16:23:17 +01:00
Christoph Hellwig
8fc7e23a9b fs: reformat the statx definition
The comments after the declaration are becoming rather unreadable with
long enough comments.  Move them into lines of their own.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250109083109.1441561-2-hch@lst.de
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09 16:23:17 +01:00
Florian Westphal
601731fc7c netfilter: conntrack: add conntrack event timestamp
Nadia Pinaeva writes:
  I am working on a tool that allows collecting network performance
  metrics by using conntrack events.
  Start time of a conntrack entry is used to evaluate seen_reply
  latency, therefore the sooner it is timestamped, the better the
  precision is.
  In particular, when using this tool to compare the performance of the
  same feature implemented using iptables/nftables/OVS it is crucial
  to have the entry timestamped earlier to see any difference.

At this time, conntrack events can only get timestamped at recv time in
userspace, so there can be some delay between the event being generated
and the userspace process consuming the message.

There is sys/net/netfilter/nf_conntrack_timestamp, which adds a
64bit timestamp (ns resolution) that records start and stop times,
but its not suited for this either, start time is the 'hashtable insertion
time', not 'conntrack allocation time'.

There is concern that moving the start-time moment to conntrack
allocation will add overhead in case of flooding, where conntrack
entries are allocated and released right away without getting inserted
into the hashtable.

Also, even if this was changed it would not with events other than
new (start time) and destroy (stop time).

Pablo suggested to add new CTA_TIMESTAMP_EVENT, this adds this feature.
The timestamp is recorded in case both events are requested and the
sys/net/netfilter/nf_conntrack_timestamp toggle is enabled.

Reported-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-09 14:42:16 +01:00
Yuyang Huang
33d97a07b3 netlink: add IPv6 anycast join/leave notifications
This change introduces a mechanism for notifying userspace
applications about changes to IPv6 anycast addresses via netlink. It
includes:

* Addition and deletion of IPv6 anycast addresses are reported using
  RTM_NEWANYCAST and RTM_DELANYCAST.
* A new netlink group (RTNLGRP_IPV6_ACADDR) for subscribing to these
  notifications.

This enables user space applications(e.g. ip monitor) to efficiently
track anycast addresses through netlink messages, improving metrics
collection and system monitoring. It also unlocks the potential for
advanced anycast management in user space, such as hardware offload
control and fine grained network control.

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250107114355.1766086-1-yuyanghuang@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-09 12:54:45 +01:00
Dave Airlie
9cc3e4e9f4 Merge tag 'drm-xe-next-2025-01-07' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
UAPI Changes:
- OA new property: 'unblock after N reports' (Ashutosh)

i915 display Changes:
- UHBR rates for Thunderbolt (Kahola)

Driver Changes:
- IRQ related fixes and improvements (Ilia)
 - Revert some changes that break a mesa debug tool (John)
 - Fix migration issues (Nirmoy)
 - Enable GuC's WA_DUAL_QUEUE for newer platforms (Daniele)
 - Move shrink test out of xe_bo (Nirmoy)
 - SRIOV PF: Use correct function to check LMEM provisioning (Michal)
 - Fix a false-positive "Missing outer runtime PM protection" warning (Rodrigo)
 - Make GSCCS disabling message less alarming (Daniele)
 - Fix DG1 power gate sequence (Rodrigo)
 - Xe files fixes (Lucas)
 - Fix a potential TP_printk UAF (Thomas)
 - OA Fixes (Umesh)
 - Fix tlb invalidation when wedging (Lucas)
 - Documentation fix (Lucas)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Z31579j3V3XCPFaK@intel.com
2025-01-09 18:47:17 +10:00
Karol Wachowski
465a3914b2 accel/ivpu: Add API for command queue create/destroy/submit
Implement support for explicit command queue management.
To allow more flexible control over command queues add capabilities
to create, destroy and submit jobs to specific command queues.

Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250107173238.381120-3-maciej.falkowski@linux.intel.com
2025-01-09 09:35:44 +01:00
Elizabeth Figura
a138179a59 ntsync: Introduce alertable waits.
NT waits can optionally be made "alertable". This is a special channel for
thread wakeup that is mildly similar to SIGIO. A thread has an internal single
bit of "alerted" state, and if a thread is alerted while an alertable wait, the
wait will return a special value, consume the "alerted" state, and will not
consume any of its objects.

Alerts are implemented using events; the user-space NT emulator is expected to
create an internal ntsync event for each thread and pass that event to wait
functions.

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20241213193511.457338-16-zfigura@codeweavers.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08 13:18:11 +01:00
Elizabeth Figura
e864071a63 ntsync: Introduce NTSYNC_IOC_EVENT_READ.
This corresponds to the NT syscall NtQueryEvent().

This returns the signaled state of the event and whether it is manual-reset.

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
Link: https://lore.kernel.org/r/20241213193511.457338-15-zfigura@codeweavers.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08 13:18:11 +01:00
Elizabeth Figura
0b3c31449d ntsync: Introduce NTSYNC_IOC_MUTEX_READ.
This corresponds to the NT syscall NtQueryMutant().

This returns the recursion count, owner, and abandoned state of the mutex.

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
Link: https://lore.kernel.org/r/20241213193511.457338-14-zfigura@codeweavers.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08 13:18:11 +01:00
Elizabeth Figura
a948f4177c ntsync: Introduce NTSYNC_IOC_SEM_READ.
This corresponds to the NT syscall NtQuerySemaphore().

This returns the current count and maximum count of the semaphore.

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
Link: https://lore.kernel.org/r/20241213193511.457338-13-zfigura@codeweavers.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08 13:18:11 +01:00
Elizabeth Figura
12b29d3008 ntsync: Introduce NTSYNC_IOC_EVENT_PULSE.
This corresponds to the NT syscall NtPulseEvent().

This wakes up any waiters as if the event had been set, but does not set the
event, instead resetting it if it had been signalled. Thus, for a manual-reset
event, all waiters are woken, whereas for an auto-reset event, at most one
waiter is woken.

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20241213193511.457338-12-zfigura@codeweavers.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08 13:18:11 +01:00
Elizabeth Figura
bbb9797514 ntsync: Introduce NTSYNC_IOC_EVENT_RESET.
This corresponds to the NT syscall NtResetEvent().

This sets the event to the unsignaled state, and returns its previous state.

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20241213193511.457338-11-zfigura@codeweavers.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08 13:18:11 +01:00
Elizabeth Figura
2dcba6fc15 ntsync: Introduce NTSYNC_IOC_EVENT_SET.
This corresponds to the NT syscall NtSetEvent().

This sets the event to the signaled state, and returns its previous state.

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
Link: https://lore.kernel.org/r/20241213193511.457338-10-zfigura@codeweavers.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08 13:18:11 +01:00
Elizabeth Figura
4c7404b9c2 ntsync: Introduce NTSYNC_IOC_CREATE_EVENT.
This correspond to the NT syscall NtCreateEvent().

An NT event holds a single bit of state denoting whether it is signaled or
unsignaled.

There are two types of events: manual-reset and automatic-reset. When an
automatic-reset event is acquired via a wait function, its state is reset to
unsignaled. Manual-reset events are not affected by wait functions.

Whether the event is manual-reset, and its initial state, are specified at
creation time.

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
Link: https://lore.kernel.org/r/20241213193511.457338-9-zfigura@codeweavers.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08 13:18:11 +01:00
Elizabeth Figura
ecc2ee3614 ntsync: Introduce NTSYNC_IOC_MUTEX_KILL.
This does not correspond to any NT syscall. Rather, when a thread dies, it
should be called by the NT emulator for each mutex, with the TID of the dying
thread.

NT mutexes are robust (in the pthread sense). When an NT thread dies, any
mutexes it owned are immediately released. Acquisition of those mutexes by other
threads will return a special value indicating that the mutex was abandoned,
like EOWNERDEAD returned from pthread_mutex_lock(), and EOWNERDEAD is indeed
used here for that purpose.

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
Link: https://lore.kernel.org/r/20241213193511.457338-8-zfigura@codeweavers.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08 13:18:10 +01:00
Elizabeth Figura
31ca7bb8e8 ntsync: Introduce NTSYNC_IOC_MUTEX_UNLOCK.
This corresponds to the NT syscall NtReleaseMutant().

This syscall decrements the mutex's recursion count by one, and returns the
previous value. If the mutex is not owned by the current task, the function
instead fails and returns -EPERM.

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
Link: https://lore.kernel.org/r/20241213193511.457338-7-zfigura@codeweavers.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08 13:18:10 +01:00
Elizabeth Figura
5bc2479a35 ntsync: Introduce NTSYNC_IOC_CREATE_MUTEX.
This corresponds to the NT syscall NtCreateMutant().

An NT mutex is recursive, with a 32-bit recursion counter. When acquired via
NtWaitForMultipleObjects(), the recursion counter is incremented by one. The OS
records the thread which acquired it.

The OS records the thread which acquired it. However, in order to keep this
driver self-contained, the owning thread ID is managed by user-space, and passed
as a parameter to all relevant ioctls.

The initial owner and recursion count, if any, are specified when the mutex is
created.

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
Link: https://lore.kernel.org/r/20241213193511.457338-6-zfigura@codeweavers.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08 13:18:10 +01:00
Elizabeth Figura
cdbb997822 ntsync: Introduce NTSYNC_IOC_WAIT_ALL.
This is similar to NTSYNC_IOC_WAIT_ANY, but waits until all of the objects are
simultaneously signaled, and then acquires all of them as a single atomic
operation.

Because acquisition of multiple objects is atomic, some complex locking is
required. We cannot simply spin-lock multiple objects simultaneously, as that
may disable preëmption for a problematically long time.

Instead, modifying any object which may be involved in a wait-all operation takes
a device-wide sleeping mutex, "wait_all_lock", instead of the normal object
spinlock.

Because wait-for-all is a rare operation, in order to optimize wait-for-any,
this lock is only taken when necessary. "all_hint" is used to mark objects which
are involved in a wait-for-all operation, and if an object is not, only its
spinlock is taken.

The locking scheme used here was written by Peter Zijlstra.

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
Link: https://lore.kernel.org/r/20241213193511.457338-5-zfigura@codeweavers.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08 13:18:10 +01:00
Elizabeth Figura
b4a7b5fe3f ntsync: Introduce NTSYNC_IOC_WAIT_ANY.
This corresponds to part of the functionality of the NT syscall
NtWaitForMultipleObjects(). Specifically, it implements the behaviour where
the third argument (wait_any) is TRUE, and it does not handle alertable waits.
Those features have been split out into separate patches to ease review.

This patch therefore implements the wait/wake infrastructure which comprises the
core of ntsync's functionality.

NTSYNC_IOC_WAIT_ANY is a vectored wait function similar to poll(). Unlike
poll(), it "consumes" objects when they are signaled. For semaphores, this means
decreasing one from the internal counter. At most one object can be consumed by
this function.

This wait/wake model is fundamentally different from that used anywhere else in
the kernel, and for that reason ntsync does not use any existing infrastructure,
such as futexes, kernel mutexes or semaphores, or wait_event().

Up to 64 objects can be waited on at once. As soon as one is signaled, the
object with the lowest index is consumed, and that index is returned via the
"index" field.

A timeout is supported. The timeout is passed as a u64 nanosecond value, which
represents absolute time measured against either the MONOTONIC or REALTIME clock
(controlled by the flags argument). If U64_MAX is passed, the ioctl waits
indefinitely.

This ioctl validates that all objects belong to the relevant device. This is not
necessary for any technical reason related to NTSYNC_IOC_WAIT_ANY, but will be
necessary for NTSYNC_IOC_WAIT_ALL introduced in the following patch.

Some padding fields are added for alignment and for fields which will be added
in future patches (split out to ease review).

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
Link: https://lore.kernel.org/r/20241213193511.457338-4-zfigura@codeweavers.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08 13:18:10 +01:00
Elizabeth Figura
5ec43d6b03 ntsync: Rename NTSYNC_IOC_SEM_POST to NTSYNC_IOC_SEM_RELEASE.
Use the more common "release" terminology, which is also the term used by NT,
instead of "post" (which is used by POSIX).

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
Link: https://lore.kernel.org/r/20241213193511.457338-3-zfigura@codeweavers.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08 13:18:10 +01:00
Elizabeth Figura
d75abf2f9f ntsync: Return the fd from NTSYNC_IOC_CREATE_SEM.
Simplify the user API a bit by returning the fd as return value from the ioctl
instead of through the argument pointer.

Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com>
Link: https://lore.kernel.org/r/20241213193511.457338-2-zfigura@codeweavers.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08 13:18:10 +01:00
Rodolfo Giometti
86b525bed2 drivers pps: add PPS generators support
Sometimes one needs to be able not only to catch PPS signals but to
produce them also. For example, running a distributed simulation,
which requires computers' clock to be synchronized very tightly.

This patch adds PPS generators class in order to have a well-defined
interface for these devices.

Signed-off-by: Rodolfo Giometti <giometti@enneenne.com>
Link: https://lore.kernel.org/r/20241108073115.759039-2-giometti@enneenne.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08 13:18:09 +01:00
Mark Brown
309caeef43 ASoC: Merge up v6.13-rc6
This helps several of my boards in CI.
2025-01-08 11:58:49 +00:00
Yongji Xie
b989531e0b vduse: relicense under GPL-2.0 OR BSD-3-Clause
Dual-license the vduse kernel header file to dual
GPL-2.0 OR BSD-3-Clause license to make it possible
to ship it with DPDK (under BSD-3-Clause) for older
distros.

Signed-off-by: Yongji Xie <xieyongji@bytedance.com>
Message-Id: <20241119074238.38299-1-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-01-08 06:37:13 -05:00
Jakub Kicinski
a8a6531164 Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2025-01-07

We've added 7 non-merge commits during the last 32 day(s) which contain
a total of 11 files changed, 190 insertions(+), 103 deletions(-).

The main changes are:

1) Migrate the test_xdp_meta.sh BPF selftest into test_progs
   framework, from Bastien Curutchet.

2) Add ability to configure head/tailroom for netkit devices,
   from Daniel Borkmann.

3) Fixes and improvements to the xdp_hw_metadata selftest,
   from Song Yoong Siang.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests/bpf: Extend netkit tests to validate set {head,tail}room
  netkit: Add add netkit {head,tail}room to rt_link.yaml
  netkit: Allow for configuring needed_{head,tail}room
  selftests/bpf: Migrate test_xdp_meta.sh into xdp_context_test_run.c
  selftests/bpf: test_xdp_meta: Rename BPF sections
  selftests/bpf: Enable Tx hwtstamp in xdp_hw_metadata
  selftests/bpf: Actuate tx_metadata_len in xdp_hw_metadata
====================

Link: https://patch.msgid.link/20250107130908.143644-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07 15:39:09 -08:00
Daniel Borkmann
b9ed315d3c netkit: Allow for configuring needed_{head,tail}room
Allow the user to configure needed_{head,tail}room for both netkit
devices. The idea is similar to 163e529200 ("veth: implement
ndo_set_rx_headroom") with the difference that the two parameters
can be specified upon device creation. By default the current behavior
stays as is which is needed_{head,tail}room is 0.

In case of Cilium, for example, the netkit devices are not enslaved
into a bridge or openvswitch device (rather, BPF-based redirection
is used out of tcx), and as such these parameters are not propagated
into the Pod's netns via peer device.

Given Cilium can run in vxlan/geneve tunneling mode (needed_headroom)
and/or be used in combination with WireGuard (needed_{head,tail}room),
allow the Cilium CNI plugin to specify these two upon netkit device
creation.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/bpf/20241220234658.490686-1-daniel@iogearbox.net
2025-01-06 09:48:49 +01:00
Christian Brauner
d2fc0ed52a Merge branch 'vfs-6.14.uncached_buffered_io'
Bring in the VFS changes for uncached buffered io.

Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-04 10:12:18 +01:00
Jens Axboe
af6505e574 fs: add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag
If a file system supports uncached buffered IO, it may set FOP_DONTCACHE
and enable support for RWF_DONTCACHE. If RWF_DONTCACHE is attempted
without the file system supporting it, it'll get errored with -EOPNOTSUPP.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20241220154831.1086649-8-axboe@kernel.dk
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-04 09:36:44 +01:00
Jakub Kicinski
385f186aba Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.13-rc6).

No conflicts.

Adjacent changes:

include/linux/if_vlan.h
  f91a5b8089 ("af_packet: fix vlan_get_protocol_dgram() vs MSG_PEEK")
  3f330db306 ("net: reformat kdoc return statements")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-03 16:29:29 -08:00
Linus Torvalds
aba74e639f Merge tag 'net-6.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireles and netfilter.

  Nothing major here. Over the last two weeks we gathered only around
  two-thirds of our normal weekly fix count, but delaying sending these
  until -rc7 seemed like a really bad idea.

  AFAIK we have no bugs under investigation. One or two reverts for
  stuff for which we haven't gotten a proper fix will likely come in the
  next PR.

  Current release - fix to a fix:

   - netfilter: nft_set_hash: unaligned atomic read on struct
     nft_set_ext

   - eth: gve: trigger RX NAPI instead of TX NAPI in gve_xsk_wakeup

  Previous releases - regressions:

   - net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets

   - mptcp:
      - fix sleeping rcvmsg sleeping forever after bad recvbuffer adjust
      - fix TCP options overflow
      - prevent excessive coalescing on receive, fix throughput

   - net: fix memory leak in tcp_conn_request() if map insertion fails

   - wifi: cw1200: fix potential NULL dereference after conversion to
     GPIO descriptors

   - phy: micrel: dynamically control external clock of KSZ PHY, fix
     suspend behavior

  Previous releases - always broken:

   - af_packet: fix VLAN handling with MSG_PEEK

   - net: restrict SO_REUSEPORT to inet sockets

   - netdev-genl: avoid empty messages in NAPI get

   - dsa: microchip: fix set_ageing_time function on KSZ9477 and LAN937X

   - eth:
      - gve: XDP fixes around transmit, queue wakeup etc.
      - ti: icssg-prueth: fix firmware load sequence to prevent time
        jump which breaks timesync related operations

  Misc:

   - netlink: specs: mptcp: add missing attr and improve documentation"

* tag 'net-6.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
  net: ti: icssg-prueth: Fix clearing of IEP_CMP_CFG registers during iep_init
  net: ti: icssg-prueth: Fix firmware load sequence.
  mptcp: prevent excessive coalescing on receive
  mptcp: don't always assume copied data in mptcp_cleanup_rbuf()
  mptcp: fix recvbuffer adjust on sleeping rcvmsg
  ila: serialize calls to nf_register_net_hooks()
  af_packet: fix vlan_get_protocol_dgram() vs MSG_PEEK
  af_packet: fix vlan_get_tci() vs MSG_PEEK
  net: wwan: iosm: Properly check for valid exec stage in ipc_mmio_init()
  net: restrict SO_REUSEPORT to inet sockets
  net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets
  net: sfc: Correct key_len for efx_tc_ct_zone_ht_params
  net: wwan: t7xx: Fix FSM command timeout issue
  sky2: Add device ID 11ab:4373 for Marvell 88E8075
  mptcp: fix TCP options overflow.
  net: mv643xx_eth: fix an OF node reference leak
  gve: trigger RX NAPI instead of TX NAPI in gve_xsk_wakeup
  eth: bcmsysport: fix call balance of priv->clk handling routines
  net: llc: reset skb->transport_header
  netlink: specs: mptcp: fix missing doc
  ...
2025-01-03 14:36:54 -08:00
Danylo Piliaiev
7a637e5e27 drm/msm: Expose uche trap base via uapi
This adds MSM_PARAM_UCHE_TRAP_BASE that will be used by Mesa
implementation for VK_KHR_shader_clock and GL_ARB_shader_clock.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Patchwork: https://patchwork.freedesktop.org/patch/627036/
Signed-off-by: Rob Clark <robdclark@chromium.org>
2025-01-03 07:20:27 -08:00
Matthieu Baerts (NGI0)
bea87657b5 netlink: specs: mptcp: clearly mention attributes
The rendered version of the MPTCP events [1] looked strange, because the
whole content of the 'doc' was displayed in the same block.

It was then not clear that the first words, not even ended by a period,
were the attributes that are defined when such events are emitted. These
attributes have now been moved to the end, prefixed by 'Attributes:' and
ended with a period. Note that '>-' has been added after 'doc:' to allow
':' in the text below.

The documentation in the UAPI header has been auto-generated by:

  ./tools/net/ynl/ynl-regen.sh

Link: https://docs.kernel.org/networking/netlink_spec/mptcp_pm.html#event-type [1]
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241221-net-mptcp-netlink-specs-pm-doc-fixes-v2-2-e54f2db3f844@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-27 11:16:21 -08:00
Matthieu Baerts (NGI0)
6b830c6a02 netlink: specs: mptcp: add missing 'server-side' attr
This attribute is added with the 'created' and 'established' events, but
the documentation didn't mention it.

The documentation in the UAPI header has been auto-generated by:

  ./tools/net/ynl/ynl-regen.sh

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241221-net-mptcp-netlink-specs-pm-doc-fixes-v2-1-e54f2db3f844@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-27 11:16:21 -08:00
Linus Torvalds
f0bc704f46 Merge tag 'hardening-v6.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fix from Kees Cook:

 - stddef: make __struct_group() UAPI C++-friendly (Alexander Lobakin)

* tag 'hardening-v6.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  stddef: make __struct_group() UAPI C++-friendly
2024-12-27 10:39:05 -08:00
Anuj Gupta
59a7d12a7f io_uring: introduce attributes for read/write and PI support
Add the ability to pass additional attributes along with read/write.
Application can prepare attibute specific information and pass its
address using the SQE field:
	__u64	attr_ptr;

Along with setting a mask indicating attributes being passed:
	__u64	attr_type_mask;

Overall 64 attributes are allowed and currently one attribute
'IORING_RW_ATTR_FLAG_PI' is supported.

With PI attribute, userspace can pass following information:
- flags: integrity check flags IO_INTEGRITY_CHK_{GUARD/APPTAG/REFTAG}
- len: length of PI/metadata buffer
- addr: address of metadata buffer
- seed: seed value for reftag remapping
- app_tag: application defined 16b value

Process this information to prepare uio_meta_descriptor and pass it down
using kiocb->private.

PI attribute is supported only for direct IO.

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20241128112240.8867-7-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23 08:17:16 -07:00
Anuj Gupta
10783d0ba0 fs, iov_iter: define meta io descriptor
Add flags to describe checks for integrity meta buffer. Also, introduce
a  new 'uio_meta' structure that upper layer can use to pass the
meta/integrity information.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241128112240.8867-5-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23 08:17:16 -07:00
Greg Kroah-Hartman
d3571faa1b Merge 6.14-rc4 into usb-next
We need the USB fixes in here as well for testing.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-23 07:59:03 +01:00
Randy Dunlap
135ec43eb2 fiemap: use kernel-doc includes in fiemap docbook
Add some kernel-doc notation to structs in fiemap header files
then pull that into Documentation/filesystems/fiemap.rst
instead of duplicating the header file structs in fiemap.rst.
This helps to future-proof fiemap.rst against struct changes.

Add missing flags documentation from header files into fiemap.rst
for FIEMAP_FLAG_CACHE and FIEMAP_EXTENT_SHARED.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20241121011352.201907-1-rdunlap@infradead.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-22 11:29:50 +01:00
Alexander Lobakin
724c6ce38b stddef: make __struct_group() UAPI C++-friendly
For the most part of the C++ history, it couldn't have type
declarations inside anonymous unions for different reasons. At the
same time, __struct_group() relies on the latters, so when the @TAG
argument is not empty, C++ code doesn't want to build (even under
`extern "C"`):

../linux/include/uapi/linux/pkt_cls.h:25:24: error:
'struct tc_u32_sel::<unnamed union>::tc_u32_sel_hdr,' invalid;
an anonymous union may only have public non-static data members
[-fpermissive]

The safest way to fix this without trying to switch standards (which
is impossible in UAPI anyway) etc., is to disable tag declaration
for that language. This won't break anything since for now it's not
buildable at all.
Use a separate definition for __struct_group() when __cplusplus is
defined to mitigate the error, including the version from tools/.

Fixes: 50d7bd38c3 ("stddef: Introduce struct_group() helper macro")
Reported-by: Christopher Ferris <cferris@google.com>
Closes: https://lore.kernel.org/linux-hardening/Z1HZpe3WE5As8UAz@google.com
Suggested-by: Kees Cook <kees@kernel.org> # __struct_group_tag()
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20241219135734.2130002-1-aleksander.lobakin@intel.com
Signed-off-by: Kees Cook <kees@kernel.org>
2024-12-20 09:05:53 -08:00
Dave Airlie
d678c63534 Merge tag 'drm-misc-next-2024-12-19' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for 6.14:

UAPI Changes:

Cross-subsystem Changes:

Core Changes:
  - connector: Add a mutex to protect ELD access, Add a helper to create
    a connector in two steps

Driver Changes:
  - amdxdna: Add RyzenAI-npu6 Support, various improvements
  - rcar-du: Add r8a779h0 Support
  - rockchip: various improvements
  - zynqmp: Add DP audio support
  - bridges:
    - ti-sn65dsi83: Add ti,lvds-vod-swing optional properties
  - panels:
    - new panels: Tianma TM070JDHG34-00, Multi-Inno Technology MI1010Z1T-1CP11

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241219-truthful-demonic-hound-598f63@houat
2024-12-20 08:24:34 +10:00