mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2026-05-08 08:02:59 -04:00
1b4406d2a88cf65d615b0bcb42525d7a4723eb70
1294557 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
1b4406d2a8 |
perf annotate: Update parameters for reg extract functions to use raw instruction on powerpc
Use the raw instruction code and macros to identify memory instructions, extract register fields and also offset. The implementation addresses the D-form, X-form, DS-form instructions. Adds "mem_ref" field to check whether source/target has memory reference. Add function "get_powerpc_regs" which will set these fields: reg1, reg2, offset depending of where it is source or target ops. Update "parse" callback for "struct ins_ops" to also pass "struct disasm_line" as argument. This is needed in parse functions where opcode is used to determine whether to set multi_regs and other fields Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-7-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
0b971e6bf1 |
perf annotate: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility
Add support to capture and parse raw instruction in powerpc.
Currently, the perf tool infrastructure uses two ways to disassemble
and understand the instruction. One is objdump and other option is
via libcapstone.
Currently, the perf tool infrastructure uses "--no-show-raw-insn" option
with "objdump" while disassemble. Example from powerpc with this option
for an instruction address is:
Snippet from:
objdump --start-address=<address> --stop-address=<address> -d --no-show-raw-insn -C <vmlinux>
c0000000010224b4: lwz r10,0(r9)
This line "lwz r10,0(r9)" is parsed to extract instruction name,
registers names and offset. Also to find whether there is a memory
reference in the operands, "memory_ref_char" field of objdump is used.
For x86, "(" is used as memory_ref_char to tackle instructions of the
form "mov (%rax), %rcx".
In case of powerpc, not all instructions using "(" are the only memory
instructions. Example, above instruction can also be of extended form (X
form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
and extract the source/target registers, patch adds support to use raw
instruction for powerpc. Approach used is to read the raw instruction
directly from the DSO file using "dso__data_read_offset" utility which
is already implemented in perf infrastructure in "util/dso.c".
Example:
38 01 81 e8 ld r4,312(r1)
Here "38 01 81 e8" is the raw instruction representation. In powerpc,
this translates to instruction form: "ld RT,DS(RA)" and binary code
as:
| 58 | RT | RA | DS | |
-------------------------------------
0 6 11 16 30 31
Function "symbol__disassemble_dso" is updated to read raw instruction
directly from DSO using dso__data_read_offset utility. In case of
above example, this captures:
line: 38 01 81 e8
The above works well when 'perf report' is invoked with only sort keys
for data type ie type and typeoff.
Because there is no instruction level annotation needed if only data
type information is requested for.
For annotating sample, along with type and typeoff sort key, "sym" sort
key is also needed. And by default invoking just "perf report" uses sort
key "sym" that displays the symbol information.
With approach changes in powerpc which first reads DSO for raw
instruction, "perf annotate" and "perf report" + a key breaks since
it doesn't do the instruction level disassembly.
Snippet of result from 'perf report':
Samples: 1K of event 'mem-loads', 4000 Hz, Event count (approx.): 937238
do_work /usr/bin/pmlogger [Percent: local period]
Percent│ ea230010
│ 3a550010
│ 3a600000
│ 38f60001
│ 39490008
│ 42400438
51.44 │ 81290008
│ 7d485378
Here, raw instruction is displayed in the output instead of human
readable annotated form.
One way to get the appropriate data is to specify "--objdump path", by
which code annotation will be done. But the default behaviour will be
changed. To fix this breakage, check if "sym" sort key is set. If so
fallback and use the libcapstone/objdump way of disassmbling the sample.
With the changes and "perf report"
Samples: 1K of event 'mem-loads', 4000 Hz, Event count (approx.): 937238
do_work /usr/bin/pmlogger [Percent: local period]
Percent│ ld r17,16(r3)
│ addi r18,r21,16
│ li r19,0
│ 8b0: rldicl r10,r10,63,33
│ addi r10,r10,1
│ mtctr r10
│ ↓ b 8e4
│ 8c0: addi r7,r22,1
│ addi r10,r9,8
│ ↓ bdz d00
51.44 │ lwz r9,8(r9)
│ mr r8,r10
│ cmpw r20,r9
Committer notes:
Just add the extern for 'sort_order' in disasm.c so that we don't end up
breaking the build due to this type colision with capstone and libbpf:
In file included from /usr/include/capstone/capstone.h:325,
from /git/perf-6.10.0/tools/perf/util/print_insn.h:23,
from builtin-script.c:38:
/usr/include/capstone/bpf.h:94:14: error: 'bpf_insn' defined as wrong kind of tag
94 | typedef enum bpf_insn {
I reported this to the bpf mailing list, see one of the links below.
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-6-atrajeev@linux.vnet.ibm.com
Link: https://lore.kernel.org/bpf/ZqOltPk9VQGgJZAA@x1/T/#u
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
06dd4c5a56 |
perf annotate: Add disasm_line__parse() to parse raw instruction for powerpc
Currently, the perf tool infrastructure uses the disasm_line__parse function to parse disassembled line. Example snippet from objdump: objdump --start-address=<address> --stop-address=<address> -d --no-show-raw-insn -C <vmlinux> c0000000010224b4: lwz r10,0(r9) This line "lwz r10,0(r9)" is parsed to extract instruction name, registers names and offset. In powerpc, the approach for data type profiling uses raw instruction instead of result from objdump to identify the instruction category and extract the source/target registers. Example: 38 01 81 e8 ld r4,312(r1) Here "38 01 81 e8" is the raw instruction representation. Add function "disasm_line__parse_powerpc" to handle parsing of raw instruction. Also update "struct disasm_line" to save the binary code/ With the change, function captures: line -> "38 01 81 e8 ld r4,312(r1)" raw instruction "38 01 81 e8" Raw instruction is used later to extract the reg/offset fields. Macros are added to extract opcode and register fields. "struct disasm_line" is updated to carry union of "bytes" and "raw_insn" of 32 bit to carry raw code (raw). Function "disasm_line__parse_powerpc fills the raw instruction hex value and can use macros to get opcode. There is no changes in existing code paths, which parses the disassembled code. The size of raw instruction depends on architecture. In case of powerpc, the parsing the disasm line needs to handle cases for reading binary code directly from DSO as well as parsing the objdump result. Hence adding the logic into separate function instead of updating "disasm_line__parse". The architecture using the instruction name and present approach is not altered. Since this approach targets powerpc, the macro implementation is added for powerpc as of now. Since the disasm_line__parse is used in other cases (perf annotate) and not only data tye profiling, the powerpc callback includes changes to work with binary code as well as mnemonic representation. Also in case if the DSO read fails and libcapstone is not supported, the approach fallback to use objdump as option. Hence as option, patch has changes to ensure objdump option also works well. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-5-atrajeev@linux.vnet.ibm.com [ Add check for strndup() result ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
b1d8d968a7 |
perf annotate: Update TYPE_STATE_MAX_REGS to include max of regs in powerpc
TYPE_STATE_MAX_REGS is arch-dependent. Currently this is defined to be 16. While checking if reg is valid using has_reg_type, max value is checked using TYPE_STATE_MAX_REGS value. Define this conditionally for powerpc. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-4-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
782959ac24 |
perf annotate: Add "update_insn_state" callback function to handle arch specific instruction tracking
Add "update_insn_state" callback to "struct arch" to handle instruction tracking. Currently updating instruction state is handled by static function "update_insn_state_x86" which is defined in "annotate-data.c". Make this as a callback for specific arch and move to archs specific file "arch/x86/annotate/instructions.c" . This will help to add helper function for other platforms in file: "arch/<platform>/annotate/instructions.c" and make changes/updates easier. Define callback "update_insn_state" as part of "struct arch", also make some of the debug functions non-static so that it can be referenced from other places. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-3-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
1d303deedb |
perf annotate: Move the data structures related to register type to header file
Data type profiling uses instruction tracking by checking each instruction and updating the register type state in some data structures. This is useful to find the data type in cases when the register state gets transferred from one reg to another. Example, in x86, "mov" instruction and in powerpc, "mr" instruction. Currently these structures are defined in annotate-data.c and instruction tracking is implemented only for x86. Move these data structures to "annotate-data.h" header file so that other arch implementations can use it in arch specific files as well. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Akanksha J N <akanksha@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/lkml/20240718084358.72242-2-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
e293f4b1e5 |
perf test: Avoid python leak sanitizer test failures
Leak sanitizer will report memory leaks from python and the leak
sanitizer output causes tests to fail. For example:
```
$ perf test 98 -v
98: perf script tests:
--- start ---
test child forked, pid 1272962
DB test
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.046 MB /tmp/perf-test-script.x0EktdCel8/perf.data (8 samples) ]
call_path_table((1, 0, 0, 0)
call_path_table((2, 1, 0, 140339508617447)
call_path_table((3, 2, 2, 0)
call_path_table((4, 3, 3, 0)
call_path_table((5, 4, 4, 0)
call_path_table((6, 5, 5, 0)
call_path_table((7, 6, 6, 0)
call_path_table((8, 7, 7, 0)
call_path_table((9, 8, 8, 0)
call_path_table((10, 9, 9, 0)
call_path_table((11, 10, 10, 0)
call_path_table((12, 11, 11, 0)
call_path_table((13, 12, 1, 0)
sample_table((1, 1, 1, 1, 1, 1, 1, 8, -2058824120, 588306954119000, -1, 0, 0, 0, 0, 1, 0, 0, 128933429281, 0, 0, 13, 0, 0, 0, -1, -1))
sample_table((2, 1, 1, 1, 1, 1, 1, 8, -2058824120, 588306954137053, -1, 0, 0, 0, 0, 1, 0, 0, 128933429281, 0, 0, 13, 0, 0, 0, -1, -1))
sample_table((3, 1, 1, 1, 1, 1, 1, 8, -2058824120, 588306954140089, -1, 0, 0, 0, 0, 9, 0, 0, 128933429281, 0, 0, 13, 0, 0, 0, -1, -1))
sample_table((4, 1, 1, 1, 1, 1, 1, 8, -2058824120, 588306954142376, -1, 0, 0, 0, 0, 155, 0, 0, 128933429281, 0, 0, 13, 0, 0, 0, -1, -1))
sample_table((5, 1, 1, 1, 1, 1, 1, 8, -2058824120, 588306954144045, -1, 0, 0, 0, 0, 2493, 0, 0, 128933429281, 0, 0, 13, 0, 0, 0, -1, -1))
sample_table((6, 1, 1, 1, 1, 1, 12, 77, -2046828595, 588306954145722, -1, 0, 0, 0, 0, 47555, 0, 0, 128933429281, 0, 0, 13, 0, 0, 0, -1, -1))
call_path_table((14, 9, 14, 0)
call_path_table((15, 14, 15, 0)
call_path_table((16, 15, 0, -1040969624)
call_path_table((17, 16, 16, 0)
call_path_table((18, 17, 17, 0)
call_path_table((19, 18, 18, 0)
call_path_table((20, 19, 19, 0)
call_path_table((21, 20, 13, 0)
sample_table((7, 1, 1, 1, 2, 1, 13, 46, -2053700898, 588306954157436, -1, 0, 0, 0, 0, 964078, 0, 0, 128933429281, 0, 0, 21, 0, 0, 0, -1, -1))
call_path_table((22, 1, 21, 0)
call_path_table((23, 22, 22, 0)
call_path_table((24, 23, 23, 0)
call_path_table((25, 24, 24, 0)
call_path_table((26, 25, 25, 0)
call_path_table((27, 26, 26, 0)
call_path_table((28, 27, 27, 0)
call_path_table((29, 28, 28, 0)
call_path_table((30, 29, 29, 0)
call_path_table((31, 30, 30, 0)
call_path_table((32, 31, 31, 0)
call_path_table((33, 32, 32, 0)
call_path_table((34, 33, 33, 0)
call_path_table((35, 34, 20, 0)
sample_table((8, 1, 1, 1, 2, 1, 20, 49, -2046878127, 588306954378624, -1, 0, 0, 0, 0, 2534317, 0, 0, 128933429281, 0, 0, 35, 0, 0, 0, -1, -1))
=================================================================
==1272975==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 13628 byte(s) in 6 object(s) allocated from:
#0 0x56354f60c092 in malloc (/tmp/perf/perf+0x29c092)
#1 0x7ff25c7d02e7 in _PyObject_Malloc /build/python3.11/../Objects/obmalloc.c:2003:11
#2 0x7ff25c7d02e7 in _PyObject_Malloc /build/python3.11/../Objects/obmalloc.c:1996:1
SUMMARY: AddressSanitizer: 13628 byte(s) leaked in 6 allocation(s).
--- Cleaning up ---
---- end(-1) ----
98: perf script tests : FAILED!
```
Disable leak sanitizer when running specific perf+python tests to
avoid this. This causes the tests to pass when run with leak
sanitizer.
Reviewed-by: Aditya Gupta <adityag@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
c3d747134c |
perf trace: Remove arg_fmt->is_enum, we can get that from the BTF type
This is to pave the way for other BTF types, i.e. we try to find BTF type then use things like btf_is_enum(btf_type) that we cached to find the right strtoul and scnprintf routines. For now only enum is supported, all the other types simple return zero for scnprintf which makes it have the same behaviour as when BTF isn't available, i.e. fallback to no pretty printing. Ditto for strtoul. root@x1:~# perf test -v enum 124: perf trace enum augmentation tests : Ok root@x1:~# perf test -v enum 124: perf trace enum augmentation tests : Ok root@x1:~# perf test -v enum 124: perf trace enum augmentation tests : Ok root@x1:~# perf test -v enum 124: perf trace enum augmentation tests : Ok root@x1:~# perf test -v enum 124: perf trace enum augmentation tests : Ok root@x1:~# Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240624181345.124764-9-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
62284329b1 |
perf trace: Introduce trace__btf_scnprintf()
To have a central place that will look at the BTF type and call the right scnprintf routine or return zero, meaning BTF pretty printing isn't available or not implemented for a specific type. Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240624181345.124764-8-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
d66763fed3 |
perf test trace_btf_enum: Add regression test for the BTF augmentation of enums in 'perf trace'
Trace landlock_add_rule syscall to see if the output is desirable.
Trace the non-syscall tracepoint 'timer:hrtimer_init' and
'timer:hrtimer_start', see if the 'mode' argument is augmented,
the 'mode' enum argument has the prefix of 'HRTIMER_MODE_'
in its name.
Committer testing:
root@x1:~# perf test enum
124: perf trace enum augmentation tests : Ok
root@x1:~# perf test -v enum
124: perf trace enum augmentation tests : Ok
root@x1:~# perf trace -e landlock_add_rule perf test -v enum
0.000 ( 0.010 ms): perf/749827 landlock_add_rule(ruleset_fd: 11, rule_type: LANDLOCK_RULE_PATH_BENEATH, rule_attr: 0x7ffd324171d4, flags: 45) = -1 EINVAL (Invalid argument)
0.012 ( 0.002 ms): perf/749827 landlock_add_rule(ruleset_fd: 11, rule_type: LANDLOCK_RULE_NET_PORT, rule_attr: 0x7ffd324171e0, flags: 45) = -1 EINVAL (Invalid argument)
457.821 ( 0.007 ms): perf/749830 landlock_add_rule(ruleset_fd: 11, rule_type: LANDLOCK_RULE_PATH_BENEATH, rule_attr: 0x7ffd4acd31e4, flags: 45) = -1 EINVAL (Invalid argument)
457.832 ( 0.003 ms): perf/749830 landlock_add_rule(ruleset_fd: 11, rule_type: LANDLOCK_RULE_NET_PORT, rule_attr: 0x7ffd4acd31f0, flags: 45) = -1 EINVAL (Invalid argument)
124: perf trace enum augmentation tests : Ok
root@x1:~#
Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/20240619082042.4173621-6-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240624181345.124764-7-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
3656e566cf |
perf test: Add landlock workload
We'll use it to add a regression test for the BTF augmentation of enum
arguments for tracepoints in 'perf trace':
root@x1:~# perf trace -e landlock_add_rule perf test -w landlock
0.000 ( 0.009 ms): perf/747160 landlock_add_rule(ruleset_fd: 11, rule_type: LANDLOCK_RULE_PATH_BENEATH, rule_attr: 0x7ffd8e258594, flags: 45) = -1 EINVAL (Invalid argument)
0.011 ( 0.002 ms): perf/747160 landlock_add_rule(ruleset_fd: 11, rule_type: LANDLOCK_RULE_NET_PORT, rule_attr: 0x7ffd8e2585a0, flags: 45) = -1 EINVAL (Invalid argument)
root@x1:~#
Committer notes:
It was agreed on the discussion (see Link below) to shorten then name of
the workload from 'landlock_add_rule' to 'landlock', and I moved it to a
separate patch.
Also, to address a build failure from Namhyung, I stopped loading
linux/landlock.h and instead added the used defines, enums and types to
make this build in older systems. All we want is to emit the syscall and
intercept it.
Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/CAH0uvohaypdTV6Z7O5QSK+va_qnhZ6BP6oSJ89s1c1E0CjgxDA@mail.gmail.com
Link: https://lore.kernel.org/r/20240624181345.124764-1-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240624181345.124764-6-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
9558658886 |
perf trace: Filter enum arguments with enum names
Before:
perf $ ./perf trace -e timer:hrtimer_start --filter='mode!=HRTIMER_MODE_ABS_PINNED_HARD' --max-events=1
No resolver (strtoul) for "mode" in "timer:hrtimer_start", can't set filter "(mode!=HRTIMER_MODE_ABS_PINNED_HARD) && (common_pid != 281988)"
After:
perf $ ./perf trace -e timer:hrtimer_start --filter='mode!=HRTIMER_MODE_ABS_PINNED_HARD' --max-events=1
0.000 :0/0 timer:hrtimer_start(hrtimer: 0xffff9498a6ca5f18, function: 0xffffffffa77a5be0, expires: 12351248764875, softexpires: 12351248764875, mode: HRTIMER_MODE_ABS)
&& and ||:
perf $ ./perf trace -e timer:hrtimer_start --filter='mode != HRTIMER_MODE_ABS_PINNED_HARD && mode != HRTIMER_MODE_ABS' --max-events=1
0.000 Hyprland/534 timer:hrtimer_start(hrtimer: 0xffff9497801a84d0, function: 0xffffffffc04cdbe0, expires: 12639434638458, softexpires: 12639433638458, mode: HRTIMER_MODE_REL)
perf $ ./perf trace -e timer:hrtimer_start --filter='mode == HRTIMER_MODE_REL || mode == HRTIMER_MODE_PINNED' --max-events=1
0.000 ldlck-test/60639 timer:hrtimer_start(hrtimer: 0xffffb16404ee7bf8, function: 0xffffffffa7790420, expires: 12772614418016, softexpires: 12772614368016, mode: HRTIMER_MODE_REL)
Switching it up, using both enum name and integer value(--filter='mode == HRTIMER_MODE_ABS_PINNED_HARD || mode == 0'):
perf $ ./perf trace -e timer:hrtimer_start --filter='mode == HRTIMER_MODE_ABS_PINNED_HARD || mode == 0' --max-events=3
0.000 :0/0 timer:hrtimer_start(hrtimer: 0xffff9498a6ca5f18, function: 0xffffffffa77a5be0, expires: 12601748739825, softexpires: 12601748739825, mode: HRTIMER_MODE_ABS_PINNED_HARD)
0.036 :0/0 timer:hrtimer_start(hrtimer: 0xffff9498a6ca5f18, function: 0xffffffffa77a5be0, expires: 12518758748124, softexpires: 12518758748124, mode: HRTIMER_MODE_ABS_PINNED_HARD)
0.172 tmux: server/41881 timer:hrtimer_start(hrtimer: 0xffffb164081e7838, function: 0xffffffffa7790420, expires: 12518768255836, softexpires: 12518768205836, mode: HRTIMER_MODE_ABS)
P.S.
perf $ pahole hrtimer_mode
enum hrtimer_mode {
HRTIMER_MODE_ABS = 0,
HRTIMER_MODE_REL = 1,
HRTIMER_MODE_PINNED = 2,
HRTIMER_MODE_SOFT = 4,
HRTIMER_MODE_HARD = 8,
HRTIMER_MODE_ABS_PINNED = 2,
HRTIMER_MODE_REL_PINNED = 3,
HRTIMER_MODE_ABS_SOFT = 4,
HRTIMER_MODE_REL_SOFT = 5,
HRTIMER_MODE_ABS_PINNED_SOFT = 6,
HRTIMER_MODE_REL_PINNED_SOFT = 7,
HRTIMER_MODE_ABS_HARD = 8,
HRTIMER_MODE_REL_HARD = 9,
HRTIMER_MODE_ABS_PINNED_HARD = 10,
HRTIMER_MODE_REL_PINNED_HARD = 11,
};
Committer testing:
root@x1:~# perf trace -e timer:hrtimer_start --filter='mode != HRTIMER_MODE_ABS' --max-events=2
0.000 :0/0 timer:hrtimer_start(hrtimer: 0xffff8d4eff2a5050, function: 0xffffffff9e22ddd0, expires: 241502326000000, softexpires: 241502326000000, mode: HRTIMER_MODE_ABS_PINNED_HARD)
18446744073709.488 :0/0 timer:hrtimer_start(hrtimer: 0xffff8d4eff425050, function: 0xffffffff9e22ddd0, expires: 241501814000000, softexpires: 241501814000000, mode: HRTIMER_MODE_ABS_PINNED_HARD)
root@x1:~# perf trace -e timer:hrtimer_start --filter='mode != HRTIMER_MODE_ABS && mode != HRTIMER_MODE_ABS_PINNED_HARD' --max-events=2
0.000 podman/510644 timer:hrtimer_start(hrtimer: 0xffffa2024f5f7dd0, function: 0xffffffff9e2170c0, expires: 241530497418194, softexpires: 241530497368194, mode: HRTIMER_MODE_REL)
40.251 gnome-shell/2484 timer:hrtimer_start(hrtimer: 0xffff8d48bda17650, function: 0xffffffffc0661550, expires: 241550528619247, softexpires: 241550527619247, mode: HRTIMER_MODE_REL)
root@x1:~# perf trace -v -e timer:hrtimer_start --filter='mode != HRTIMER_MODE_ABS && mode != HRTIMER_MODE_ABS_PINNED_HARD && mode != HRTIMER_MODE_REL' --max-events=2
Using CPUID GenuineIntel-6-BA-3
vmlinux BTF loaded
<SNIP>
0
0xa
0x1
New filter for timer:hrtimer_start: (mode != 0 && mode != 0xa && mode != 0x1) && (common_pid != 524049 && common_pid != 4041)
mmap size 528384B
^Croot@x1:~#
Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/ZnCcliuecJABD5FN@x1
Link: https://lore.kernel.org/r/20240624181345.124764-5-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
607bbdb49c |
perf trace: Augment non-syscall tracepoints with enum arguments with BTF
Before:
perf $ ./perf trace -e timer:hrtimer_start --max-events=1
0.000 :0/0 timer:hrtimer_start(hrtimer: 0xffff974466c25f18, function: 0xffffffff89da5be0, expires: 377432432256753, softexpires: 377432432256753, mode: 10)
After:
perf $ ./perf trace -e timer:hrtimer_start --max-events=1
0.000 :0/0 timer:hrtimer_start(hrtimer: 0xffff9498a6ca5f18, function: 0xffffffffa77a5be0, expires: 4382442895089, softexpires: 4382442895089, mode: HRTIMER_MODE_ABS_PINNED_HARD)
in which HRTIMER_MODE_ABS_PINNED_HARD is:
perf $ pahole hrtimer_mode
enum hrtimer_mode {
HRTIMER_MODE_ABS = 0,
HRTIMER_MODE_REL = 1,
HRTIMER_MODE_PINNED = 2,
HRTIMER_MODE_SOFT = 4,
HRTIMER_MODE_HARD = 8,
HRTIMER_MODE_ABS_PINNED = 2,
HRTIMER_MODE_REL_PINNED = 3,
HRTIMER_MODE_ABS_SOFT = 4,
HRTIMER_MODE_REL_SOFT = 5,
HRTIMER_MODE_ABS_PINNED_SOFT = 6,
HRTIMER_MODE_REL_PINNED_SOFT = 7,
HRTIMER_MODE_ABS_HARD = 8,
HRTIMER_MODE_REL_HARD = 9,
HRTIMER_MODE_ABS_PINNED_HARD = 10,
HRTIMER_MODE_REL_PINNED_HARD = 11,
};
Can also be tested by
./perf trace -e pagemap:mm_lru_insertion,timer:hrtimer_start,timer:hrtimer_init,skb:kfree_skb --max-events=10
(Chose these 4 events because they happen quite frequently.)
However some enum arguments may not be contained in vmlinux BTF. To see
what enum arguments are supported, use:
vmlinux_dir $ bpftool btf dump file /sys/kernel/btf/vmlinux > vmlinux
vmlinux_dir $ while read l; do grep "ENUM '$l'" vmlinux; done < <(grep field:enum /sys/kernel/tracing/events/*/*/format | awk '{print $3}' | sort | uniq) | awk '{print $3}' | sed "s/'\(.*\)'/\1/g"
dev_pm_qos_req_type
error_detector
hrtimer_mode
i2c_slave_event
ieee80211_bss_type
lru_list
migrate_mode
nl80211_auth_type
nl80211_band
nl80211_iftype
numa_vmaskip_reason
pm_qos_req_action
pwm_polarity
skb_drop_reason
thermal_trip_type
xen_lazy_mode
xen_mc_extend_args
xen_mc_flush_reason
zone_type
And what tracepoints have these enum types as their arguments:
vmlinux_dir $ while read l; do grep "ENUM '$l'" vmlinux; done < <(grep field:enum /sys/kernel/tracing/events/*/*/format | awk '{print $3}' | sort | uniq) | awk '{print $3}' | sed "s/'\(.*\)'/\1/g" > good_enums
vmlinux_dir $ cat good_enums
dev_pm_qos_req_type
error_detector
hrtimer_mode
i2c_slave_event
ieee80211_bss_type
lru_list
migrate_mode
nl80211_auth_type
nl80211_band
nl80211_iftype
numa_vmaskip_reason
pm_qos_req_action
pwm_polarity
skb_drop_reason
thermal_trip_type
xen_lazy_mode
xen_mc_extend_args
xen_mc_flush_reason
zone_type
vmlinux_dir $ grep -f good_enums -l /sys/kernel/tracing/events/*/*/format
/sys/kernel/tracing/events/cfg80211/cfg80211_chandef_dfs_required/format
/sys/kernel/tracing/events/cfg80211/cfg80211_ch_switch_notify/format
/sys/kernel/tracing/events/cfg80211/cfg80211_ch_switch_started_notify/format
/sys/kernel/tracing/events/cfg80211/cfg80211_get_bss/format
/sys/kernel/tracing/events/cfg80211/cfg80211_ibss_joined/format
/sys/kernel/tracing/events/cfg80211/cfg80211_inform_bss_frame/format
/sys/kernel/tracing/events/cfg80211/cfg80211_radar_event/format
/sys/kernel/tracing/events/cfg80211/cfg80211_ready_on_channel_expired/format
/sys/kernel/tracing/events/cfg80211/cfg80211_ready_on_channel/format
/sys/kernel/tracing/events/cfg80211/cfg80211_reg_can_beacon/format
/sys/kernel/tracing/events/cfg80211/cfg80211_return_bss/format
/sys/kernel/tracing/events/cfg80211/cfg80211_tx_mgmt_expired/format
/sys/kernel/tracing/events/cfg80211/rdev_add_virtual_intf/format
/sys/kernel/tracing/events/cfg80211/rdev_auth/format
/sys/kernel/tracing/events/cfg80211/rdev_change_virtual_intf/format
/sys/kernel/tracing/events/cfg80211/rdev_channel_switch/format
/sys/kernel/tracing/events/cfg80211/rdev_connect/format
/sys/kernel/tracing/events/cfg80211/rdev_inform_bss/format
/sys/kernel/tracing/events/cfg80211/rdev_libertas_set_mesh_channel/format
/sys/kernel/tracing/events/cfg80211/rdev_mgmt_tx/format
/sys/kernel/tracing/events/cfg80211/rdev_remain_on_channel/format
/sys/kernel/tracing/events/cfg80211/rdev_return_chandef/format
/sys/kernel/tracing/events/cfg80211/rdev_return_int_survey_info/format
/sys/kernel/tracing/events/cfg80211/rdev_set_ap_chanwidth/format
/sys/kernel/tracing/events/cfg80211/rdev_set_monitor_channel/format
/sys/kernel/tracing/events/cfg80211/rdev_set_radar_background/format
/sys/kernel/tracing/events/cfg80211/rdev_start_ap/format
/sys/kernel/tracing/events/cfg80211/rdev_start_radar_detection/format
/sys/kernel/tracing/events/cfg80211/rdev_tdls_channel_switch/format
/sys/kernel/tracing/events/compaction/mm_compaction_defer_compaction/format
/sys/kernel/tracing/events/compaction/mm_compaction_deferred/format
/sys/kernel/tracing/events/compaction/mm_compaction_defer_reset/format
/sys/kernel/tracing/events/compaction/mm_compaction_finished/format
/sys/kernel/tracing/events/compaction/mm_compaction_kcompactd_wake/format
/sys/kernel/tracing/events/compaction/mm_compaction_suitable/format
/sys/kernel/tracing/events/compaction/mm_compaction_wakeup_kcompactd/format
/sys/kernel/tracing/events/error_report/error_report_end/format
/sys/kernel/tracing/events/i2c_slave/i2c_slave/format
/sys/kernel/tracing/events/migrate/mm_migrate_pages/format
/sys/kernel/tracing/events/migrate/mm_migrate_pages_start/format
/sys/kernel/tracing/events/pagemap/mm_lru_insertion/format
/sys/kernel/tracing/events/power/dev_pm_qos_add_request/format
/sys/kernel/tracing/events/power/dev_pm_qos_remove_request/format
/sys/kernel/tracing/events/power/dev_pm_qos_update_request/format
/sys/kernel/tracing/events/power/pm_qos_update_flags/format
/sys/kernel/tracing/events/power/pm_qos_update_target/format
/sys/kernel/tracing/events/pwm/pwm_apply/format
/sys/kernel/tracing/events/pwm/pwm_get/format
/sys/kernel/tracing/events/sched/sched_skip_vma_numa/format
/sys/kernel/tracing/events/skb/kfree_skb/format
/sys/kernel/tracing/events/thermal/thermal_zone_trip/format
/sys/kernel/tracing/events/timer/hrtimer_init/format
/sys/kernel/tracing/events/timer/hrtimer_start/format
/sys/kernel/tracing/events/xen/xen_mc_batch/format
/sys/kernel/tracing/events/xen/xen_mc_extend_args/format
/sys/kernel/tracing/events/xen/xen_mc_flush_reason/format
/sys/kernel/tracing/events/xen/xen_mc_issue/format
Committer testing:
root@x1:~# perf trace -e timer:hrtimer_start --max-events=2
0.000 :0/0 timer:hrtimer_start(hrtimer: 0xffff8d4eff225050, function: 0xffffffff9e22ddd0, expires: 241152380000000, softexpires: 241152380000000, mode: HRTIMER_MODE_ABS)
0.028 :0/0 timer:hrtimer_start(hrtimer: 0xffff8d4eff225050, function: 0xffffffff9e22ddd0, expires: 241153654000000, softexpires: 241153654000000, mode: HRTIMER_MODE_ABS_PINNED_HARD)
root@x1:~#
Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Reviewed-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/20240615032743.112750-1-howardchu95@gmail.com
Link: https://lore.kernel.org/r/20240624181345.124764-4-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
45a0c928e7 |
perf trace: BTF-based enum pretty printing for syscall args
In this patch, BTF is used to turn enum value to the corresponding
name. There is only one system call that uses enum value as its
argument, that is `landlock_add_rule()`.
The vmlinux btf is loaded lazily, when user decided to trace the
`landlock_add_rule` syscall. But if one decide to run `perf trace`
without any arguments, the behaviour is to trace `landlock_add_rule`,
so vmlinux btf will be loaded by default.
The laziest behaviour is to load vmlinux btf when a
`landlock_add_rule` syscall hits. But I think you could lose some
samples when loading vmlinux btf at run time, for it can delay the
handling of other samples. I might need your precious opinions on
this...
before:
```
perf $ ./perf trace -e landlock_add_rule
0.000 ( 0.008 ms): ldlck-test/438194 landlock_add_rule(rule_type: 2) = -1 EBADFD (File descriptor in bad state)
0.010 ( 0.001 ms): ldlck-test/438194 landlock_add_rule(rule_type: 1) = -1 EBADFD (File descriptor in bad state)
```
after:
```
perf $ ./perf trace -e landlock_add_rule
0.000 ( 0.029 ms): ldlck-test/438194 landlock_add_rule(rule_type: LANDLOCK_RULE_NET_PORT) = -1 EBADFD (File descriptor in bad state)
0.036 ( 0.004 ms): ldlck-test/438194 landlock_add_rule(rule_type: LANDLOCK_RULE_PATH_BENEATH) = -1 EBADFD (File descriptor in bad state)
```
Committer notes:
Made it build with NO_LIBBPF=1, simplified btf_enum_fprintf(), see [1]
for the discussion.
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Günther Noack <gnoack@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/20240613022757.3589783-1-howardchu95@gmail.com
Link: https://lore.kernel.org/lkml/ZnXAhFflUl_LV1QY@x1 # [1]
Link: https://lore.kernel.org/r/20240624181345.124764-3-howardchu95@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
e4fc196f5b |
Merge tag 'for-6.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba: - fix regression in extent map rework when handling insertion of overlapping compressed extent - fix unexpected file length when appending to a file using direct io and buffer not faulted in - in zoned mode, fix accounting of unusable space when flipping read-only block group back to read-write - fix page locking when COWing an inline range, assertion failure found by syzbot - fix calculation of space info in debugging print - tree-checker, add validation of data reference item - fix a few -Wmaybe-uninitialized build warnings * tag 'for-6.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: initialize location to fix -Wmaybe-uninitialized in btrfs_lookup_dentry() btrfs: fix corruption after buffer fault in during direct IO append write btrfs: zoned: fix zone_unusable accounting on making block group read-write again btrfs: do not subtract delalloc from avail bytes btrfs: make cow_file_range_inline() honor locked_page on error btrfs: fix corrupt read due to bad offset of a compressed extent map btrfs: tree-checker: validate dref root and objectid |
||
|
|
e254e0c5ba |
Merge tag 'perf-tools-fixes-for-v6.11-2024-07-30' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Namhyung Kim: "Some more build fixes and a random crash fix: - Fix cross-build by setting pkg-config env according to the arch - Fix static build for missing library dependencies - Fix Segfault when callchain has no symbols" * tag 'perf-tools-fixes-for-v6.11-2024-07-30' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf docs: Document cross compilation perf: build: Link lib 'zstd' for static build perf: build: Link lib 'lzma' for static build perf: build: Only link libebl.a for old libdw perf: build: Set Python configuration for cross compilation perf: build: Setup PKG_CONFIG_LIBDIR for cross compilation perf tool: fix dereferencing NULL al->maps |
||
|
|
c91a7dee05 |
Merge tag 'chrome-platform-fixes-for-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome-platform fix from Tzung-Bi Shih: "Fix a race condition that sends multiple host commands at a time" * tag 'chrome-platform-fixes-for-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: platform/chrome: cros_ec_proto: Lock device when updating MKBP version |
||
|
|
22f5468731 |
minmax: improve macro expansion and type checking
This clarifies the rules for min()/max()/clamp() type checking and makes
them a much more efficient macro expansion.
In particular, we now look at the type and range of the inputs to see
whether they work together, generating a mask of acceptable comparisons,
and then just verifying that the inputs have a shared case:
- an expression with a signed type can be used for
(1) signed comparisons
(2) unsigned comparisons if it is statically known to have a
non-negative value
- an expression with an unsigned type can be used for
(3) unsigned comparison
(4) signed comparisons if the type is smaller than 'int' and thus
the C integer promotion rules will make it signed anyway
Here rule (1) and (3) are obvious, and rule (2) is important in order to
allow obvious trivial constants to be used together with unsigned
values.
Rule (4) is not necessarily a good idea, but matches what we used to do,
and we have extant cases of this situation in the kernel. Notably with
bcachefs having an expression like
min(bch2_bucket_sectors_dirty(a), ca->mi.bucket_size)
where bch2_bucket_sectors_dirty() returns an 's64', and
'ca->mi.bucket_size' is of type 'u16'.
Technically that bcachefs comparison is clearly sensible on a C type
level, because the 'u16' will go through the normal C integer promotion,
and become 'int', and then we're comparing two signed values and
everything looks sane.
However, it's not entirely clear that a 'min(s64,u16)' operation makes a
lot of conceptual sense, and it's possible that we will remove rule (4).
After all, the _reason_ we have these complicated type checks is exactly
that the C type promotion rules are not very intuitive.
But at least for now the rule is in place for backwards compatibility.
Also note that rule (2) existed before, but is hugely relaxed by this
commit. It used to be true only for the simplest compile-time
non-negative integer constants. The new macro model will allow cases
where the compiler can trivially see that an expression is non-negative
even if it isn't necessarily a constant.
For example, the amdgpu driver does
min_t(size_t, sizeof(fru_info->serial), pia[addr] & 0x3F));
because our old 'min()' macro would see that 'pia[addr] & 0x3F' is of
type 'int' and clearly not a C constant expression, so doing a 'min()'
with a 'size_t' is a signedness violation.
Our new 'min()' macro still sees that 'pia[addr] & 0x3F' is of type
'int', but is smart enough to also see that it is clearly non-negative,
and thus would allow that case without any complaints.
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: David Laight <David.Laight@aculab.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
b8e947e9f6 |
btrfs: initialize location to fix -Wmaybe-uninitialized in btrfs_lookup_dentry()
Some arch + compiler combinations report a potentially unused variable
location in btrfs_lookup_dentry(). This is a false alert as the variable
is passed by value and always valid or there's an error. The compilers
cannot probably reason about that although btrfs_inode_by_name() is in
the same file.
> + /kisskb/src/fs/btrfs/inode.c: error: 'location.objectid' may be used
+uninitialized in this function [-Werror=maybe-uninitialized]: => 5603:9
> + /kisskb/src/fs/btrfs/inode.c: error: 'location.type' may be used
+uninitialized in this function [-Werror=maybe-uninitialized]: => 5674:5
m68k-gcc8/m68k-allmodconfig
mips-gcc8/mips-allmodconfig
powerpc-gcc5/powerpc-all{mod,yes}config
powerpc-gcc5/ppc64_defconfig
Initialize it to zero, this should fix the warnings and won't change the
behaviour as btrfs_inode_by_name() accepts only a root or inode item
types, otherwise returns an error.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/linux-btrfs/bd4e9928-17b3-9257-8ba7-6b7f9bbb639a@linux-m68k.org/
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
||
|
|
df615907f1 |
platform/chrome: cros_ec_proto: Lock device when updating MKBP version
The cros_ec_get_host_command_version_mask() function requires that the
caller must have ec_dev->lock mutex before calling it. This requirement
was not met and as a result it was possible that two commands were sent
to the device at the same time.
The problem was observed while using UART backend which doesn't use any
additional locks, unlike SPI backend which locks the controller until
response is received.
Fixes:
|
||
|
|
94ede2a3e9 |
profiling: remove stale percpu flip buffer variables
For some reason I didn't see this issue on my arm64 or x86-64 builds, but Stephen Rothwell reports that commit |
||
|
|
6b5faec9f5 |
Merge tag 'for-linus-2024072901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Benjamin Tissoires: - fixes for HID-BPF after the merge with the bpf tree (Arnd Bergmann and Benjamin Tissoires) - some tool type fix for the Wacom driver (Tatsunosuke Tobita) - a reorder of the sensor discovery to ensure the HID AMD SFH is removed when no sensors are available (Basavaraj Natikar) * tag 'for-linus-2024072901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: selftests/hid: add test for attaching multiple time the same struct_ops HID: bpf: prevent the same struct_ops to be attached more than once selftests/hid: disable struct_ops auto-attach selftests/hid: fix bpf_wq new API HID: amd_sfh: Move sensor discovery before HID device initialization hid: bpf: add BPF_JIT dependency HID: wacom: more appropriate tool type categorization HID: wacom: Modify pen IDs |
||
|
|
10826505f5 |
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
"The biggest thing here is the adminq change - but it looks like the
only way to avoid headq blocking causing indefinite stalls.
This fixes three issues:
- Prevent admin commands on one VF blocking another.
This prevents a bad VF from blocking a good one, as well as fixing
a scalability issue with large # of VFs
- Correctly return error on command failure on octeon. We used to
treat failed commands as a success.
- Fix modpost warning when building virtio_dma_buf. Harmless, but the
fix is trivial"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_pci_modern: remove admin queue serialization lock
virtio_pci_modern: use completion instead of busy loop to wait on admin cmd result
virtio_pci_modern: pass cmd as an identification token
virtio_pci_modern: create admin queue of queried size
virtio: create admin queues alongside other virtqueues
virtio_pci: pass vq info as an argument to vp_setup_vq()
virtio: push out code to vp_avq_index()
virtio_pci_modern: treat vp_dev->admin_vq.info.vq pointer as static
virtio_pci: introduce vector allocation fallback for slow path virtqueues
virtio_pci: pass vector policy enum to vp_find_one_vq_msix()
virtio_pci: pass vector policy enum to vp_find_vqs_msix()
virtio_pci: simplify vp_request_msix_vectors() call a bit
virtio_pci: push out single vq find code to vp_find_one_vq_msix()
vdpa/octeon_ep: Fix error code in octep_process_mbox()
virtio: add missing MODULE_DESCRIPTION() macro
|
||
|
|
cec6937dd1 |
task_work: make TWA_NMI_CURRENT handling conditional on IRQ_WORK
The TWA_NMI_CURRENT handling very much depends on IRQ_WORK, but that
isn't universally enabled everywhere.
Maybe the IRQ_WORK infrastructure should just be unconditional - x86
ends up indirectly enabling it through unconditionally enabling
PERF_EVENTS, for example. But it also gets enabled by having SMP
support, or even if you just have PRINTK enabled.
But in the meantime TWA_NMI_CURRENT causes tons of build failures on
various odd minimal configs. Which did show up in linux-next, but
despite that nobody bothered to fix it or even inform me until -rc1 was
out.
Fixes:
|
||
|
|
2accfdb7ef |
profiling: attempt to remove per-cpu profile flip buffer
This is the really old legacy kernel profiling code, which has long since been obviated by "real profiling" (ie 'prof' and company), and mainly remains as a source of syzbot reports. There are anecdotal reports that people still use it for boot-time profiling, but it's unlikely that such use would care about the old NUMA optimizations in this code from 2004 (commit ad02973d42: "profile: 512x Altix timer interrupt livelock fix" in the BK import archive at [1]) So in order to head off future syzbot reports, let's try to simplify this code and get rid of the per-cpu profile buffers that are quite a large portion of the complexity footprint of this thing (including CPU hotplug callbacks etc). It's unlikely anybody will actually notice, or possibly, as Thomas put it: "Only people who indulge in nostalgia will notice :)". That said, if it turns out that this code is actually actively used by somebody, we can always revert this removal. Thus the "attempt" in the summary line. [ Note: in a small nod to "the profiling code can cause NUMA problems", this also removes the "increment the last entry in the profiling array on any unknown hits" logic. That would account any program counter in a module to that single counter location, and might exacerbate any NUMA cacheline bouncing issues ] Link: https://lore.kernel.org/all/CAHk-=wgs52BxT4Zjmjz8aNvHWKxf5_ThBY4bYL1Y6CTaNL2dTw@mail.gmail.com/ Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git [1] Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
7c51f7bbf0 |
profiling: remove prof_cpu_mask
syzbot is reporting uninit-value at profile_hits(), for there is a race
window between
if (!alloc_cpumask_var(&prof_cpu_mask, GFP_KERNEL))
return -ENOMEM;
cpumask_copy(prof_cpu_mask, cpu_possible_mask);
in profile_init() and
cpumask_available(prof_cpu_mask) &&
cpumask_test_cpu(smp_processor_id(), prof_cpu_mask))
in profile_tick(); prof_cpu_mask remains uninitialzed until cpumask_copy()
completes while cpumask_available(prof_cpu_mask) returns true as soon as
alloc_cpumask_var(&prof_cpu_mask) completes.
We could replace alloc_cpumask_var() with zalloc_cpumask_var() and
call cpumask_copy() from create_proc_profile() on only UP kernels, for
profile_online_cpu() calls cpumask_set_cpu() as needed via
cpuhp_setup_state(CPUHP_AP_ONLINE_DYN) on SMP kernels. But this patch
removes prof_cpu_mask because it seems unnecessary.
The cpumask_test_cpu(smp_processor_id(), prof_cpu_mask) test
in profile_tick() is likely always true due to
a CPU cannot call profile_tick() if that CPU is offline
and
cpumask_set_cpu(cpu, prof_cpu_mask) is called when that CPU becomes
online and cpumask_clear_cpu(cpu, prof_cpu_mask) is called when that
CPU becomes offline
. This test could be false during transition between online and offline.
But according to include/linux/cpuhotplug.h , CPUHP_PROFILE_PREPARE
belongs to PREPARE section, which means that the CPU subjected to
profile_dead_cpu() cannot be inside profile_tick() (i.e. no risk of
use-after-free bug) because interrupt for that CPU is disabled during
PREPARE section. Therefore, this test is guaranteed to be true, and
can be removed. (Since profile_hits() checks prof_buffer != NULL, we
don't need to check prof_buffer != NULL here unless get_irq_regs() or
user_mode() is such slow that we want to avoid when prof_buffer == NULL).
do_profile_hits() is called from profile_tick() from timer interrupt
only if cpumask_test_cpu(smp_processor_id(), prof_cpu_mask) is true and
prof_buffer is not NULL. But syzbot is also reporting that sometimes
do_profile_hits() is called while current thread is still doing vzalloc(),
where prof_buffer must be NULL at this moment. This indicates that multiple
threads concurrently tried to write to /sys/kernel/profiling interface,
which caused that somebody else try to re-allocate prof_buffer despite
somebody has already allocated prof_buffer. Fix this by using
serialization.
Reported-by: syzbot <syzbot+b1a83ab2a9eb9321fbdd@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=b1a83ab2a9eb9321fbdd
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+b1a83ab2a9eb9321fbdd@syzkaller.appspotmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
99d3bf5f73 |
Input: MT - limit max slots
syzbot is reporting too large allocation at input_mt_init_slots(), for num_slots is supplied from userspace using ioctl(UI_DEV_CREATE). Since nobody knows possible max slots, this patch chose 1024. Reported-by: syzbot <syzbot+0122fa359a69694395d5@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=0122fa359a69694395d5 Suggested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
3894840a7a |
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux
Pull ARM updates from Russell King: - ftrace: don't assume stack frames are contiguous in memory - remove unused mod_inwind_map structure - spelling fixes - allow use of LD dead code/data elimination - fix callchain_trace() return value - add support for stackleak gcc plugin - correct some reset asm function prototypes for CFI [ Missed the merge window because Russell forgot to push out ] * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux: ARM: 9408/1: mm: CFI: Fix some erroneous reset prototypes ARM: 9407/1: Add support for STACKLEAK gcc plugin ARM: 9406/1: Fix callchain_trace() return value ARM: 9404/1: arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION ARM: 9403/1: Alpine: Spelling s/initialiing/initializing/ ARM: 9402/1: Kconfig: Spelling s/Cortex A-/Cortex-A/ ARM: 9400/1: Remove unused struct 'mod_unwind_map' |
||
|
|
939b656bc8 |
btrfs: fix corruption after buffer fault in during direct IO append write
During an append (O_APPEND write flag) direct IO write if the input buffer
was not previously faulted in, we can corrupt the file in a way that the
final size is unexpected and it includes an unexpected hole.
The problem happens like this:
1) We have an empty file, with size 0, for example;
2) We do an O_APPEND direct IO with a length of 4096 bytes and the input
buffer is not currently faulted in;
3) We enter btrfs_direct_write(), lock the inode and call
generic_write_checks(), which calls generic_write_checks_count(), and
that function sets the iocb position to 0 with the following code:
if (iocb->ki_flags & IOCB_APPEND)
iocb->ki_pos = i_size_read(inode);
4) We call btrfs_dio_write() and enter into iomap, which will end up
calling btrfs_dio_iomap_begin() and that calls
btrfs_get_blocks_direct_write(), where we update the i_size of the
inode to 4096 bytes;
5) After btrfs_dio_iomap_begin() returns, iomap will attempt to access
the page of the write input buffer (at iomap_dio_bio_iter(), with a
call to bio_iov_iter_get_pages()) and fail with -EFAULT, which gets
returned to btrfs at btrfs_direct_write() via btrfs_dio_write();
6) At btrfs_direct_write() we get the -EFAULT error, unlock the inode,
fault in the write buffer and then goto to the label 'relock';
7) We lock again the inode, do all the necessary checks again and call
again generic_write_checks(), which calls generic_write_checks_count()
again, and there we set the iocb's position to 4K, which is the current
i_size of the inode, with the following code pointed above:
if (iocb->ki_flags & IOCB_APPEND)
iocb->ki_pos = i_size_read(inode);
8) Then we go again to btrfs_dio_write() and enter iomap and the write
succeeds, but it wrote to the file range [4K, 8K), leaving a hole in
the [0, 4K) range and an i_size of 8K, which goes against the
expectations of having the data written to the range [0, 4K) and get an
i_size of 4K.
Fix this by not unlocking the inode before faulting in the input buffer,
in case we get -EFAULT or an incomplete write, and not jumping to the
'relock' label after faulting in the buffer - instead jump to a location
immediately before calling iomap, skipping all the write checks and
relocking. This solves this problem and it's fine even in case the input
buffer is memory mapped to the same file range, since only holding the
range locked in the inode's io tree can cause a deadlock, it's safe to
keep the inode lock (VFS lock), as was fixed and described in commit
|
||
|
|
8cd44dd1d1 |
btrfs: zoned: fix zone_unusable accounting on making block group read-write again
When btrfs makes a block group read-only, it adds all free regions in the
block group to space_info->bytes_readonly. That free space excludes
reserved and pinned regions. OTOH, when btrfs makes the block group
read-write again, it moves all the unused regions into the block group's
zone_unusable. That unused region includes reserved and pinned regions.
As a result, it counts too much zone_unusable bytes.
Fortunately (or unfortunately), having erroneous zone_unusable does not
affect the calculation of space_info->bytes_readonly, because free
space (num_bytes in btrfs_dec_block_group_ro) calculation is done based on
the erroneous zone_unusable and it reduces the num_bytes just to cancel the
error.
This behavior can be easily discovered by adding a WARN_ON to check e.g,
"bg->pinned > 0" in btrfs_dec_block_group_ro(), and running fstests test
case like btrfs/282.
Fix it by properly considering pinned and reserved in
btrfs_dec_block_group_ro(). Also, add a WARN_ON and introduce
btrfs_space_info_update_bytes_zone_unusable() to catch a similar mistake.
Fixes:
|
||
|
|
d89c285d28 |
btrfs: do not subtract delalloc from avail bytes
The block group's avail bytes printed when dumping a space info subtract
the delalloc_bytes. However, as shown in btrfs_add_reserved_bytes() and
btrfs_free_reserved_bytes(), it is added or subtracted along with
"reserved" for the delalloc case, which means the "delalloc_bytes" is a
part of the "reserved" bytes. So, excluding it to calculate the avail space
counts delalloc_bytes twice, which can lead to an invalid result.
Fixes:
|
||
|
|
478574370b |
btrfs: make cow_file_range_inline() honor locked_page on error
The btrfs buffered write path runs through __extent_writepage() which
has some tricky return value handling for writepage_delalloc().
Specifically, when that returns 1, we exit, but for other return values
we continue and end up calling btrfs_folio_end_all_writers(). If the
folio has been unlocked (note that we check the PageLocked bit at the
start of __extent_writepage()), this results in an assert panic like
this one from syzbot:
BTRFS: error (device loop0 state EAL) in free_log_tree:3267: errno=-5 IO failure
BTRFS warning (device loop0 state EAL): Skipping commit of aborted transaction.
BTRFS: error (device loop0 state EAL) in cleanup_transaction:2018: errno=-5 IO failure
assertion failed: folio_test_locked(folio), in fs/btrfs/subpage.c:871
------------[ cut here ]------------
kernel BUG at fs/btrfs/subpage.c:871!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 1 PID: 5090 Comm: syz-executor225 Not tainted
6.10.0-syzkaller-05505-gb1bc554e009e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 06/27/2024
RIP: 0010:btrfs_folio_end_all_writers+0x55b/0x610 fs/btrfs/subpage.c:871
Code: e9 d3 fb ff ff e8 25 22 c2 fd 48 c7 c7 c0 3c 0e 8c 48 c7 c6 80 3d
0e 8c 48 c7 c2 60 3c 0e 8c b9 67 03 00 00 e8 66 47 ad 07 90 <0f> 0b e8
6e 45 b0 07 4c 89 ff be 08 00 00 00 e8 21 12 25 fe 4c 89
RSP: 0018:ffffc900033d72e0 EFLAGS: 00010246
RAX: 0000000000000045 RBX: 00fff0000000402c RCX: 663b7a08c50a0a00
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: ffffc900033d73b0 R08: ffffffff8176b98c R09: 1ffff9200067adfc
R10: dffffc0000000000 R11: fffff5200067adfd R12: 0000000000000001
R13: dffffc0000000000 R14: 0000000000000000 R15: ffffea0001cbee80
FS: 0000000000000000(0000) GS:ffff8880b9500000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5f076012f8 CR3: 000000000e134000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__extent_writepage fs/btrfs/extent_io.c:1597 [inline]
extent_write_cache_pages fs/btrfs/extent_io.c:2251 [inline]
btrfs_writepages+0x14d7/0x2760 fs/btrfs/extent_io.c:2373
do_writepages+0x359/0x870 mm/page-writeback.c:2656
filemap_fdatawrite_wbc+0x125/0x180 mm/filemap.c:397
__filemap_fdatawrite_range mm/filemap.c:430 [inline]
__filemap_fdatawrite mm/filemap.c:436 [inline]
filemap_flush+0xdf/0x130 mm/filemap.c:463
btrfs_release_file+0x117/0x130 fs/btrfs/file.c:1547
__fput+0x24a/0x8a0 fs/file_table.c:422
task_work_run+0x24f/0x310 kernel/task_work.c:222
exit_task_work include/linux/task_work.h:40 [inline]
do_exit+0xa2f/0x27f0 kernel/exit.c:877
do_group_exit+0x207/0x2c0 kernel/exit.c:1026
__do_sys_exit_group kernel/exit.c:1037 [inline]
__se_sys_exit_group kernel/exit.c:1035 [inline]
__x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1035
x64_sys_call+0x2634/0x2640
arch/x86/include/generated/asm/syscalls_64.h:232
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5f075b70c9
Code: Unable to access opcode bytes at
0x7f5f075b709f.
I was hitting the same issue by doing hundreds of accelerated runs of
generic/475, which also hits IO errors by design.
I instrumented that reproducer with bpftrace and found that the
undesirable folio_unlock was coming from the following callstack:
folio_unlock+5
__process_pages_contig+475
cow_file_range_inline.constprop.0+230
cow_file_range+803
btrfs_run_delalloc_range+566
writepage_delalloc+332
__extent_writepage # inlined in my stacktrace, but I added it here
extent_write_cache_pages+622
Looking at the bisected-to patch in the syzbot report, Josef realized
that the logic of the cow_file_range_inline error path subtly changing.
In the past, on error, it jumped to out_unlock in cow_file_range(),
which honors the locked_page, so when we ultimately call
folio_end_all_writers(), the folio of interest is still locked. After
the change, we always unlocked ignoring the locked_page, on both success
and error. On the success path, this all results in returning 1 to
__extent_writepage(), which skips the folio_end_all_writers() call,
which makes it OK to have unlocked.
Fix the bug by wiring the locked_page into cow_file_range_inline() and
only setting locked_page to NULL on success.
Reported-by: syzbot+a14d8ac9af3a2a4fd0c8@syzkaller.appspotmail.com
Fixes:
|
||
|
|
dc1c8034e3 |
minmax: simplify min()/max()/clamp() implementation
Now that we no longer have any C constant expression contexts (ie array size declarations or static initializers) that use min() or max(), we can simpify the implementation by not having to worry about the result staying as a C constant expression. So now we can unconditionally just use temporary variables of the right type, and get rid of the excessive expansion that used to come from the use of __builtin_choose_expr(__is_constexpr(...), .. to pick the specialized code for constant expressions. Another expansion simplification is to pass the temporary variables (in addition to the original expression) to our __types_ok() macro. That may superficially look like it complicates the macro, but when we only want the type of the expression, expanding the temporary variable names is much simpler and smaller than expanding the potentially complicated original expression. As a result, on my machine, doing a $ time make drivers/staging/media/atomisp/pci/isp/kernels/ynr/ynr_1.0/ia_css_ynr.host.i goes from real 0m16.621s user 0m15.360s sys 0m1.221s to real 0m2.532s user 0m2.091s sys 0m0.452s because the token expansion goes down dramatically. In particular, the longest line expansion (which was line 71 of that 'ia_css_ynr.host.c' file) shrinks from 23,338kB (yes, 23MB for one single line) to "just" 1,444kB (now "only" 1.4MB). And yes, that line is still the line from hell, because it's doing multiple levels of "min()/max()" expansion thanks to some of them being hidden inside the uDIGIT_FITTING() macro. Lorenzo has a nice cleanup patch that makes that driver use inline functions instead of macros for sDIGIT_FITTING() and uDIGIT_FITTING(), which will fix that line once and for all, but the 16-fold reduction in this case does show why we need to simplify these helpers. Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
cb04e8b1d2 |
minmax: don't use max() in situations that want a C constant expression
We only had a couple of array[] declarations, and changing them to just use 'MAX()' instead of 'max()' fixes the issue. This will allow us to simplify our min/max macros enormously, since they can now unconditionally use temporary variables to avoid using the argument values multiple times. Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
9f499b8c79 |
minmax: scsi: fix mis-use of 'clamp()' in sr.c
While working on simplifying the minmax functions, and avoiding
excessive macro expansion, it turns out that the sr.c use of the
'clamp()' macro has the arguments the wrong way around.
The clamp logic is
val = clamp(in, low, high);
and it returns the input clamped to the low/high limits. But sr.c ddid
speed = clamp(0, speed, 0xffff / 177);
which clamps the value '0' to the range '[speed, 0xffff / 177]' and ends
up being nonsensical.
Happily, I don't think anybody ever cared.
Fixes:
|
||
|
|
1a251f52cf |
minmax: make generic MIN() and MAX() macros available everywhere
This just standardizes the use of MIN() and MAX() macros, with the very traditional semantics. The goal is to use these for C constant expressions and for top-level / static initializers, and so be able to simplify the min()/max() macros. These macro names were used by various kernel code - they are very traditional, after all - and all such users have been fixed up, with a few different approaches: - trivial duplicated macro definitions have been removed Note that 'trivial' here means that it's obviously kernel code that already included all the major kernel headers, and thus gets the new generic MIN/MAX macros automatically. - non-trivial duplicated macro definitions are guarded with #ifndef This is the "yes, they define their own versions, but no, the include situation is not entirely obvious, and maybe they don't get the generic version automatically" case. - strange use case #1 A couple of drivers decided that the way they want to describe their versioning is with #define MAJ 1 #define MIN 2 #define DRV_VERSION __stringify(MAJ) "." __stringify(MIN) which adds zero value and I just did my Alexander the Great impersonation, and rewrote that pointless Gordian knot as #define DRV_VERSION "1.2" instead. - strange use case #2 A couple of drivers thought that it's a good idea to have a random 'MIN' or 'MAX' define for a value or index into a table, rather than the traditional macro that takes arguments. These values were re-written as C enum's instead. The new function-line macros only expand when followed by an open parenthesis, and thus don't clash with enum use. Happily, there weren't really all that many of these cases, and a lot of users already had the pattern of using '#ifndef' guarding (or in one case just using '#undef MIN') before defining their own private version that does the same thing. I left such cases alone. Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
8400291e28 | Linux 6.11-rc1 v6.11-rc1 | ||
|
|
a0c04bd55a |
Merge tag 'kbuild-fixes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada: - Fix RPM package build error caused by an incorrect locale setup - Mark modules.weakdep as ghost in RPM package - Fix the odd combination of -S and -c in stack protector scripts, which is an error with the latest Clang * tag 'kbuild-fixes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: Fix '-S -c' in x86 stack protector scripts kbuild: rpm-pkg: ghost modules.weakdep file kbuild: rpm-pkg: Fix C locale setup |
||
|
|
017fa3e891 |
minmax: simplify and clarify min_t()/max_t() implementation
This simplifies the min_t() and max_t() macros by no longer making them work in the context of a C constant expression. That means that you can no longer use them for static initializers or for array sizes in type definitions, but there were only a couple of such uses, and all of them were converted (famous last words) to use MIN_T/MAX_T instead. Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
4477b39c32 |
minmax: add a few more MIN_T/MAX_T users
Commit
|
||
|
|
7e2d0ba732 |
Merge tag 'ubifs-for-linus-6.11-rc1-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull UBI and UBIFS updates from Richard Weinberger: - Many fixes for power-cut issues by Zhihao Cheng - Another ubiblock error path fix - ubiblock section mismatch fix - Misc fixes all over the place * tag 'ubifs-for-linus-6.11-rc1-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubi: Fix ubi_init() ubiblock_exit() section mismatch ubifs: add check for crypto_shash_tfm_digest ubifs: Fix inconsistent inode size when powercut happens during appendant writing ubi: block: fix null-pointer-dereference in ubiblock_create() ubifs: fix kernel-doc warnings ubifs: correct UBIFS_DFS_DIR_LEN macro definition and improve code clarity mtd: ubi: Restore missing cleanup on ubi_init() failure path ubifs: dbg_orphan_check: Fix missed key type checking ubifs: Fix unattached inode when powercut happens in creating ubifs: Fix space leak when powercut happens in linking tmpfile ubifs: Move ui->data initialization after initializing security ubifs: Fix adding orphan entry twice for the same inode ubifs: Remove insert_dead_orphan from replaying orphan process Revert "ubifs: ubifs_symlink: Fix memleak of inode->i_link in error path" ubifs: Don't add xattr inode into orphan area ubifs: Fix unattached xattr inode if powercut happens after deleting mtd: ubi: avoid expensive do_div() on 32-bit machines mtd: ubi: make ubi_class constant ubi: eba: properly rollback inside self_check_eba |
||
|
|
3415b10a03 |
kbuild: Fix '-S -c' in x86 stack protector scripts
After a recent change in clang to stop consuming all instances of '-S'
and '-c' [1], the stack protector scripts break due to the kernel's use
of -Werror=unused-command-line-argument to catch cases where flags are
not being properly consumed by the compiler driver:
$ echo | clang -o - -x c - -S -c -Werror=unused-command-line-argument
clang: error: argument unused during compilation: '-c' [-Werror,-Wunused-command-line-argument]
This results in CONFIG_STACKPROTECTOR getting disabled because
CONFIG_CC_HAS_SANE_STACKPROTECTOR is no longer set.
'-c' and '-S' both instruct the compiler to stop at different stages of
the pipeline ('-S' after compiling, '-c' after assembling), so having
them present together in the same command makes little sense. In this
case, the test wants to stop before assembling because it is looking at
the textual assembly output of the compiler for either '%fs' or '%gs',
so remove '-c' from the list of arguments to resolve the error.
All versions of GCC continue to work after this change, along with
versions of clang that do or do not contain the change mentioned above.
Cc: stable@vger.kernel.org
Fixes:
|
||
|
|
92a286e902 |
ubi: Fix ubi_init() ubiblock_exit() section mismatch
Since ubiblock_exit() is now called from an init function, the __exit section no longer makes sense. Cc: Ben Hutchings <bwh@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202407131403.wZJpd8n2-lkp@intel.com/ Signed-off-by: Richard Weinberger <richard@nod.at> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> |
||
|
|
e172f1e906 |
Merge tag 'v6.11-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown: - Enable turbostat extensions to add both perf and PMT (Intel Platform Monitoring Technology) counters via the cmdline - Demonstrate PMT access with built-in support for Meteor Lake's Die C6 counter * tag 'v6.11-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: version 2024.07.26 tools/power turbostat: Include umask=%x in perf counter's config tools/power turbostat: Document PMT in turbostat.8 tools/power turbostat: Add MTL's PMT DC6 builtin counter tools/power turbostat: Add early support for PMT counters tools/power turbostat: Add selftests for added perf counters tools/power turbostat: Add selftests for SMI, APERF and MPERF counters tools/power turbostat: Move verbose counter messages to level 2 tools/power turbostat: Move debug prints from stdout to stderr tools/power turbostat: Fix typo in turbostat.8 tools/power turbostat: Add perf added counter example to turbostat.8 tools/power turbostat: Fix formatting in turbostat.8 tools/power turbostat: Extend --add option with perf counters tools/power turbostat: Group SMI counter with APERF and MPERF tools/power turbostat: Add ZERO_ARRAY for zero initializing builtin array tools/power turbostat: Replace enum rapl_source and cstate_source with counter_source tools/power turbostat: Remove anonymous union from rapl_counter_info_t tools/power/turbostat: Switch to new Intel CPU model defines |
||
|
|
e62f81bbd2 |
Merge tag 'cxl-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL updates from Dave Jiang:
"Core:
- A CXL maturity map has been added to the documentation to detail
the current state of CXL enabling.
It provides the status of the current state of various CXL features
to inform current and future contributors of where things are and
which areas need contribution.
- A notifier handler has been added in order for a newly created CXL
memory region to trigger the abstract distance metrics calculation.
This should bring parity for CXL memory to the same level vs
hotplugged DRAM for NUMA abstract distance calculation. The
abstract distance reflects relative performance used for memory
tiering handling.
- An addition for XOR math has been added to address the CXL DPA to
SPA translation.
CXL address translation did not support address interleave math
with XOR prior to this change.
Fixes:
- Fix to address race condition in the CXL memory hotplug notifier
- Add missing MODULE_DESCRIPTION() for CXL modules
- Fix incorrect vendor debug UUID define
Misc:
- A warning has been added to inform users of an unsupported
configuration when mixing CXL VH and RCH/RCD hierarchies
- The ENXIO error code has been replaced with EBUSY for inject poison
limit reached via debugfs and cxl-test support
- Moving the PCI config read in cxl_dvsec_rr_decode() to avoid
unnecessary PCI config reads
- A refactor to a common struct for DRAM and general media CXL
events"
* tag 'cxl-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/core/pci: Move reading of control register to immediately before usage
cxl: Remove defunct code calculating host bridge target positions
cxl/region: Verify target positions using the ordered target list
cxl: Restore XOR'd position bits during address translation
cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa()
cxl/test: Replace ENXIO with EBUSY for inject poison limit reached
cxl/memdev: Replace ENXIO with EBUSY for inject poison limit reached
cxl/acpi: Warn on mixed CXL VH and RCH/RCD Hierarchy
cxl/core: Fix incorrect vendor debug UUID define
Documentation: CXL Maturity Map
cxl/region: Simplify cxl_region_nid()
cxl/region: Support to calculate memory tier abstract distance
cxl/region: Fix a race condition in memory hotplug notifier
cxl: add missing MODULE_DESCRIPTION() macros
cxl/events: Use a common struct for DRAM and General Media events
|
||
|
|
7b5d481889 |
Merge tag 'unicode-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode
Pull unicode update from Gabriel Krisman Bertazi: "Two small fixes to silence the compiler and static analyzers tools from Ben Dooks and Jeff Johnson" * tag 'unicode-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode: unicode: add MODULE_DESCRIPTION() macros unicode: make utf8 test count static |
||
|
|
d01c14074b |
kbuild: rpm-pkg: ghost modules.weakdep file
In the same way as for other similar files, mark as ghost the new file generated by depmod for configured weak dependencies for modules, modules.weakdep, so that although it is not included in the package, claim the ownership on it. Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> |
||
|
|
5437f30d34 |
Merge tag '6.11-rc-smb-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull more smb client updates from Steve French: - fix for potential null pointer use in init cifs - additional dynamic trace points to improve debugging of some common scenarios - two SMB1 fixes (one addressing reconnect with POSIX extensions, one a mount parsing error) * tag '6.11-rc-smb-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: smb3: add dynamic trace point for session setup key expired failures smb3: add four dynamic tracepoints for copy_file_range and reflink smb3: add dynamic tracepoint for reflink errors cifs: mount with "unix" mount option for SMB1 incorrectly handled cifs: fix reconnect with SMB1 UNIX Extensions cifs: fix potential null pointer use in destroy_workqueue in init_cifs error path |
||
|
|
6342649c33 |
Merge tag 'block-6.11-20240726' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Fix request without payloads cleanup (Leon)
- Use new protection information format (Francis)
- Improved debug message for lost pci link (Bart)
- Another apst quirk (Wang)
- Use appropriate sysfs api for printing chars (Markus)
- ublk async device deletion fix (Ming)
- drbd kerneldoc fixups (Simon)
- Fix deadlock between sd removal and release (Yang)
* tag 'block-6.11-20240726' of git://git.kernel.dk/linux:
nvme-pci: add missing condition check for existence of mapped data
ublk: fix UBLK_CMD_DEL_DEV_ASYNC handling
block: fix deadlock between sd_remove & sd_release
drbd: Add peer_device to Kernel doc
nvme-core: choose PIF from QPIF if QPIFS supports and PIF is QTYPE
nvme-pci: Fix the instructions for disabling power management
nvme: remove redundant bdev local variable
nvme-fabrics: Use seq_putc() in __nvmf_concat_opt_tokens()
nvme/pci: Add APST quirk for Lenovo N60z laptop
|
||
|
|
8c93074743 |
Merge tag 'io_uring-6.11-20240726' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe: - Fix a syzbot issue for the msg ring cache added in this release. No ill effects from this one, but it did make KMSAN unhappy (me) - Sanitize the NAPI timeout handling, by unifying the value handling into all ktime_t rather than converting back and forth (Pavel) - Fail NAPI registration for IOPOLL rings, it's not supported (Pavel) - Fix a theoretical issue with ring polling and cancelations (Pavel) - Various little cleanups and fixes (Pavel) * tag 'io_uring-6.11-20240726' of git://git.kernel.dk/linux: io_uring/napi: pass ktime to io_napi_adjust_timeout io_uring/napi: use ktime in busy polling io_uring/msg_ring: fix uninitialized use of target_req->flags io_uring: align iowq and task request error handling io_uring: kill REQ_F_CANCEL_SEQ io_uring: simplify io_uring_cmd return io_uring: fix io_match_task must_hold io_uring: don't allow netpolling with SETUP_IOPOLL io_uring: tighten task exit cancellations |