Now that snd_pcm_sync_ptr_get_user() and snd_pcm_sync_ptr_put_user()
are converted to user_access_begin/user_access_end(),
snd_pcm_sync_ptr_get_user() is more efficient than a raw get_user()
followed by a copy_from_user(). And because copy_{to/from}_user() are
generic functions focussed on transfer of big data blocks to/from user,
snd_pcm_sync_ptr_put_user() is also more efficient for small amont of
data.
So use snd_pcm_sync_ptr_get_user() and snd_pcm_sync_ptr_put_user() in
snd_pcm_sync_ptr() too.
snd_pcm_ioctl_sync_ptr_buggy() is left as it is because the conversion
wouldn't be straigh-forward due to the workaround it provides.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/6ce6bc4da498ea7ea2be5f279b374370b1613b13.1749883041.git.christophe.leroy@csgroup.eu
To match struct __snd_pcm_mmap_status and enable reuse of
snd_pcm_sync_ptr_get_user() and snd_pcm_sync_ptr_put_user() by
snd_pcm_sync_ptr() replace tstamp_sec and tstamp_nsec fields by
a struct __snd_timespec in struct snd_pcm_mmap_status32.
Do the same with audio_tstamp_sec and audio_tstamp_nsec.
This is possible because struct snd_pcm_mmap_status32 is packed
and __SND_STRUCT_TIME64 is always defined for kernel which means
struct __snd_timespec is always defined as:
struct __snd_timespec {
__s32 tv_sec;
__s32 tv_nsec;
};
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/df8ea1a9aff61c3e358759b1f495bdb9fb8a3e6a.1749883041.git.christophe.leroy@csgroup.eu
With user access protection (Called SMAP on x86 or KUAP on powerpc)
each and every call to get_user() or put_user() performs heavy
operations to unlock and lock kernel access to userspace.
SNDRV_PCM_IOCTL_SYNC_PTR is a hot path which is called really often
and needs to run as fast as possible.
To improve performance, perform user accesses by blocks using
user_access_begin/user_access_end() and unsafe_get_user()/
unsafe_put_user().
Before the patch the 9 calls to put_user() at the end of
snd_pcm_ioctl_sync_ptr_compat() imply the following set of
instructions about 9 times (access_ok - enable user - write - disable
user):
0.00 : c057f858: 3d 20 7f ff lis r9,32767
0.29 : c057f85c: 39 5e 00 14 addi r10,r30,20
0.77 : c057f860: 61 29 ff fc ori r9,r9,65532
0.32 : c057f864: 7c 0a 48 40 cmplw r10,r9
0.36 : c057f868: 41 a1 fb 58 bgt c057f3c0 <snd_pcm_ioctl+0xbb0>
0.30 : c057f86c: 3d 20 dc 00 lis r9,-9216
1.95 : c057f870: 7d 3a c3 a6 mtspr 794,r9
0.33 : c057f874: 92 8a 00 00 stw r20,0(r10)
0.27 : c057f878: 3d 20 de 00 lis r9,-8704
0.28 : c057f87c: 7d 3a c3 a6 mtspr 794,r9
...
A perf profile shows that in total the 9 put_user() represent 36% of
the time spent in snd_pcm_ioctl() and about 80 instructions.
With this patch everything is done in 13 instructions and represent
only 15% of the time spent in snd_pcm_ioctl():
0.57 : c057f5dc: 3d 20 dc 00 lis r9,-9216
0.98 : c057f5e0: 7d 3a c3 a6 mtspr 794,r9
0.16 : c057f5e4: 92 7f 00 04 stw r19,4(r31)
0.63 : c057f5e8: 93 df 00 0c stw r30,12(r31)
0.16 : c057f5ec: 93 9f 00 10 stw r28,16(r31)
4.95 : c057f5f0: 92 9f 00 14 stw r20,20(r31)
0.19 : c057f5f4: 92 5f 00 18 stw r18,24(r31)
0.49 : c057f5f8: 92 bf 00 1c stw r21,28(r31)
0.27 : c057f5fc: 93 7f 00 20 stw r27,32(r31)
5.88 : c057f600: 93 36 00 00 stw r25,0(r22)
0.11 : c057f604: 93 17 00 00 stw r24,0(r23)
0.00 : c057f608: 3d 20 de 00 lis r9,-8704
0.79 : c057f60c: 7d 3a c3 a6 mtspr 794,r9
Note that here the access_ok() in user_write_access_begin() is skipped
because the exact same verification has already been performed at the
beginning of the fonction with the call to user_read_access_begin().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/eccd047f2dfeb550129a1d60035e2233c4401d0c.1749883041.git.christophe.leroy@csgroup.eu
In an effort of optimising SNDRV_PCM_IOCTL_SYNC_PTR ioctl which
is a hot path, lets first refactor the copy from and to user
with macros.
This is done with macros and not static inline fonctions because
types differs between the different versions of snd_pcm_sync_ptr()
like functions.
First step is to refactor only snd_pcm_ioctl_sync_ptr_compat() and
snd_pcm_ioctl_sync_ptr_x32() as it would be a performance
regression for snd_pcm_sync_ptr() and snd_pcm_ioctl_sync_ptr_buggy()
for now. They may be refactored after next patch.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/f8b77932bb9ce96148ae5c3953e7ee44fa2359f8.1749883041.git.christophe.leroy@csgroup.eu
Replace comma between expressions with semicolons.
Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it is seems best to use ';'
unless ',' is intended.
Found by inspection.
No functional change intended.
Compile tested only.
Fixes: 79d561c4ec ("ALSA: usb-audio: Add mixer quirk for Sony DualSense PS5")
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Link: https://patch.msgid.link/20250612060228.1518028-1-nichen@iscas.ac.cn
Signed-off-by: Takashi Iwai <tiwai@suse.de>
At the time being recalculate_boundary() is implemented with a
loop which shows up as costly in a perf profile, as depicted by
the annotate below:
0.00 : c057e934: 3d 40 7f ff lis r10,32767
0.03 : c057e938: 61 4a ff ff ori r10,r10,65535
0.21 : c057e93c: 7d 49 50 50 subf r10,r9,r10
5.39 : c057e940: 7d 3c 4b 78 mr r28,r9
2.11 : c057e944: 55 29 08 3c slwi r9,r9,1
3.04 : c057e948: 7c 09 50 40 cmplw r9,r10
2.47 : c057e94c: 40 81 ff f4 ble c057e940 <snd_pcm_ioctl+0xee0>
Total: 13.2% on that simple loop.
But what the loop does is to multiply the boundary by 2 until it is
over the wanted border. This can be avoided by using fls() to get the
boundary value order and shift it by the appropriate number of bits at
once.
This change provides the following profile:
0.04 : c057f6e8: 3d 20 7f ff lis r9,32767
0.02 : c057f6ec: 61 29 ff ff ori r9,r9,65535
0.34 : c057f6f0: 7d 5a 48 50 subf r10,r26,r9
0.23 : c057f6f4: 7c 1a 50 40 cmplw r26,r10
0.02 : c057f6f8: 41 81 00 20 bgt c057f718 <snd_pcm_ioctl+0xf08>
0.26 : c057f6fc: 7f 47 00 34 cntlzw r7,r26
0.09 : c057f700: 7d 48 00 34 cntlzw r8,r10
0.22 : c057f704: 7d 08 38 50 subf r8,r8,r7
0.04 : c057f708: 7f 5a 40 30 slw r26,r26,r8
0.35 : c057f70c: 7c 0a d0 40 cmplw r10,r26
0.13 : c057f710: 40 80 05 f8 bge c057fd08 <snd_pcm_ioctl+0x14f8>
0.00 : c057f714: 57 5a f8 7e srwi r26,r26,1
Total: 1.7% with that loopless alternative.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://patch.msgid.link/4836e2cde653eebaf2709ebe30eec736bb8c67fd.1749202237.git.christophe.leroy@csgroup.eu
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The Sony DualSense wireless controller (PS5) features an internal mono
speaker, but it also provides a 3.5mm jack socket for headphone output
and headset microphone input.
Since this is a UAC1 device, it doesn't advertise any jack detection
capability. However, the controller is able to report HP & MIC insert
events via HID, i.e. through a dedicated input device managed by the
hid-playstation driver.
Add a quirk to create the jack controls for headphone and headset mic,
respectively, and setup an input handler for each of them in order to
intercept the related hotplug events.
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20250526-dualsense-alsa-jack-v1-9-1a821463b632@collabora.com
Adding a memory barrier before wake_up() in
snd_usb_soundblaster_remote_complete() is supposed to ensure the write
to mixer->rc_code is visible in wait_event_interruptible() from
snd_usb_sbrc_hwdep_read().
However, this is not really necessary, since wake_up() is just a wrapper
over __wake_up() which already executes a full memory barrier before
accessing the state of the task to be waken up.
Drop the redundant call to wmb() and implicitly fix the checkpatch
complaint:
WARNING: memory barrier without comment
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20250526-dualsense-alsa-jack-v1-8-1a821463b632@collabora.com
Address all whitespace & blank line(s) related issues reported by
checkpatch.pl:
ERROR: trailing whitespace
ERROR: space required after that ',' (ctx:VxV)
WARNING: Missing a blank line after declarations
CHECK: Please use a blank line after function/struct/union/enum declarations
CHECK: Please don't use multiple blank lines
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20250526-dualsense-alsa-jack-v1-2-1a821463b632@collabora.com
Pull timer cleanup from Thomas Gleixner:
"The delayed from_timer() API cleanup:
The renaming to the timer_*() namespace was delayed due massive
conflicts against Linux-next. Now that everything is upstream finish
the conversion"
* tag 'timers-cleanups-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
treewide, timers: Rename from_timer() to timer_container_of()
Pull x86 fixes from Thomas Gleixner:
"A small set of x86 fixes:
- Cure IO bitmap inconsistencies
A failed fork cleans up all resources of the newly created thread
via exit_thread(). exit_thread() invokes io_bitmap_exit() which
does the IO bitmap cleanups, which unfortunately assume that the
cleanup is related to the current task, which is obviously bogus.
Make it work correctly
- A lockdep fix in the resctrl code removed the clearing of the
command buffer in two places, which keeps stale error messages
around. Bring them back.
- Remove unused trace events"
* tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
fs/resctrl: Restore the rdt_last_cmd_clear() calls after acquiring rdtgroup_mutex
x86/iopl: Cure TIF_IO_BITMAP inconsistencies
x86/fpu: Remove unused trace events
Pull timer fix from Thomas Gleixner:
"Add the missing seq_file forward declaration in the timer namespace
header"
* tag 'timers-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timens: Add struct seq_file forward declaration
Add initial DMR support, which required smarter RAPL probe
Fix AMD MSR RAPL energy reporting
Add RAPL power limit configuration output
Minor fixes
Signed-off-by: Len Brown <len.brown@intel.com>
For the RAPL package energy status counter, Intel and AMD share the same
perf_subsys and perf_name, but with different MSR addresses.
Both rapl_counter_arch_infos[0] and rapl_counter_arch_infos[1] are
introduced to describe this counter for different Vendors.
As a result, the perf counter is probed twice, and causes a failure in
in get_rapl_counters() because expected_read_size and actual_read_size
don't match.
Fix the problem by skipping the already probed counter.
Note, this is not a perfect fix. For example, if different
vendors/platforms use the same MSR value for different purpose, the code
can be fooled when it probes a rapl_counter_arch_infos[] entry that does
not belong to the running Vendor/Platform.
In a long run, better to put rapl_counter_arch_infos[] into the
platform_features so that this becomes Vendor/Platform specific.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
platform_features->rapl_msrs describes the RAPL MSRs supported. While
RAPL Perf counters can be exposed from different kernel backend drivers,
e.g. RAPL MSR I/F driver, or RAPL TPMI I/F driver.
Thus, turbostat should first blindly probe all the available RAPL Perf
counters, and falls back to the RAPL MSR counters if they are listed in
platform_features->rapl_msrs.
With this, platforms that don't have RAPL MSRs can clear the
platform_features->rapl_msrs bits and use RAPL Perf counters only.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Increase the code readability by moving the no_perf/no_msr flag and the
cai->perf_name/cai->msr sanity checks into the counter probe functions.
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
probe_rapl_msr() is reused for probing RAPL MSR counters, cstate MSR
counters and MPERF/APERF/SMI MSR counters, thus its name is misleading.
Similar to add_perf_counter(), introduce add_msr_counter() to probe a
counter via MSR. Introduce wrapper function add_rapl_msr_counter() at
the same time to add extra check for Zero return value for specified
RAPL counters.
No functional change intended.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
As the only caller of add_msr_perf_counter_(), add_msr_perf_counter()
just gives extra debug output on top. There is no need to keep both
functions.
Remove add_msr_perf_counter_() and move all the logic to
add_msr_perf_counter().
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
As the only caller of add_cstate_perf_counter_(),
add_cstate_perf_counter() just gives extra debug output on top. There is
no need to keep both functions.
Remove add_cstate_perf_counter_() and move all the logic to
add_cstate_perf_counter().
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
As the only caller of add_rapl_perf_counter_(), add_rapl_perf_counter()
just gives extra debug output on top. There is no need to keep both
functions.
Remove add_rapl_perf_counter_() and move all the logic to
add_rapl_perf_counter().
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Quit early for unsupported RAPL counters.
No functional change.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
rapl_joules bit should always be checked even if
platform_features->rapl_msrs is not set or no_msr flag is used.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 05a2f07db8 ("tools/power turbostat: read RAPL counters via
perf") that adds support to read RAPL counters via perf defines the
notion of a RAPL domain_id which is set to physical_core_id on
platforms which support per_core_rapl counters (Eg: AMD processors
Family 17h onwards) and is set to the physical_package_id on all the
other platforms.
However, the physical_core_id is only unique within a package and on
platforms with multiple packages more than one core can have the same
physical_core_id and thus the same domain_id. (For eg, the first cores
of each package have the physical_core_id = 0). This results in all
these cores with the same physical_core_id using the same entry in the
rapl_counter_info_perdomain[]. Since rapl_perf_init() skips the
perf-initialization for cores whose domain_ids have already been
visited, cores that have the same physical_core_id always read the
perf file corresponding to the physical_core_id of the first package
and thus the package-energy is incorrectly reported to be the same
value for different packages.
Note: This issue only arises when RAPL counters are read via perf and
not when they are read via MSRs since in the latter case the MSRs are
read separately on each core.
Fix this issue by associating each CPU with rapl_core_id which is
unique across all the packages in the system.
Fixes: 05a2f07db8 ("tools/power turbostat: read RAPL counters via perf")
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>
It uses /dev/msrN device paths on Android instead of /dev/cpu/N/msr,
updates error messages and permission checks to reflect the Android
device path, and wraps platform-specific code with #if defined(ANDROID)
to ensure correct behavior on both Android and non-Android systems.
These changes improve compatibility and usability of turbostat on
Android devices.
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Pull x86 perf fix from Thomas Gleixner:
"A single fix for the x86 performance counters on Intel CPUs:
The MSR offset calculations for fixed performance counters are stored
at the wrong index in the configuration array causing the general
purpose counter MSR offset to be overwritten, so both the general
purpose and the fixed counters offsets are incorrect.
Correct the array index calculation to fix that"
* tag 'perf-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix incorrect MSR index calculations in intel_pmu_config_acr()
Pull irq fix from Thomas Gleixner:
"A single fix for the PCI/MSI code:
The conversion to per device MSI domains created a MSI domain with
size 1 instead of sizing it to the maximum possible number of MSI
interrupts for the device. This "worked" as the subsequent allocations
resized the domain, but the recent change to move the prepare() call
into the domain creation path broke this works by chance mechanism.
Size the domain properly at creation time"
* tag 'irq-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
PCI/MSI: Size device MSI domain with the maximum number of vectors