Commit Graph

1138104 Commits

Author SHA1 Message Date
Tomer Tayar
fe3e88c947 habanalabs/gaudi: fix print for firmware-alive event
Add missing le{32,64}_to_cpu conversions.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:47 +02:00
Tomer Tayar
5f8981d699 habanalabs: fix print for out-of-sync and pkt-failure events
Add missing le32_to_cpu() conversions, and use %d for the value
returned from atomic_read().

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:47 +02:00
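The two print fixes above (fe3e88c947 and 5f8981d699) add little-endian to CPU conversions before values are printed. A minimal userspace sketch of the idea follows. It does not use the kernel's le32_to_cpu(); the helper name le32_decode() is hypothetical and only illustrates how a little-endian field is turned into a host-order value.

```c
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the kernel's le32_to_cpu():
 * assemble a host-order value from four little-endian bytes.
 * The result is the same on little- and big-endian hosts.
 */
static uint32_t le32_decode(const uint8_t bytes[4])
{
	return (uint32_t)bytes[0] |
	       ((uint32_t)bytes[1] << 8) |
	       ((uint32_t)bytes[2] << 16) |
	       ((uint32_t)bytes[3] << 24);
}
```

A field holding 0x12345678 arrives as the bytes 78 56 34 12. Decoding gives 0x12345678 on any host, whereas printing the raw field on a big-endian host would show 0x78563412.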
Dani Liberman
d3027f4a62 habanalabs/gaudi2: add page fault notify event
Each time a page fault happens, besides capturing its data, also notify
the user about it.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:47 +02:00
Ofir Bitton
a63de89bee habanalabs/gaudi2: classify power/thermal events as info
As power and thermal envelope events are purely informative and do not
indicate an error, we reduce the print level to info only.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:46 +02:00
Ohad Sharabi
b829e01025 habanalabs: skip events info ioctl if not supported
Some ASICs haven't yet implemented this functionality and so the
ioctl call should fail and the user should be notified of the reason.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:46 +02:00
farah kassabri
3daa64eea1 habanalabs: fix firmware descriptor copy operation
This is needed to allow adding more data to the lkd_fw_comms_desc
structure.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:46 +02:00
Dani Liberman
413bdb176e habanalabs/gaudi2: add razwi notify event
Each time a razwi (read-only zero, write ignored) event happens, besides
capturing its data, also notify the user about it.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:46 +02:00
Ofir Bitton
91bd822448 habanalabs/gaudi2: implement fp32 not supported event
Due to binning, Gaudi2 does not always support fp32.
We add support for such an event in case fp32 is used by the user
in such a device.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:46 +02:00
Dani Liberman
aff6354afd habanalabs/gaudi: add page fault notify event
Each time a page fault happens, besides capturing its data, also notify
the user about it.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:46 +02:00
Dani Liberman
cd21701cde habanalabs: use single threaded WQ for event handling
Creating event queue workqueue using alloc_workqueue made it run in
multi threaded mode, which caused parallel dumping of events as well as
parallel events notifying to user, causing logs with multiple
events to be out of order.

Fixed by creating event queue workqueue as single threaded work queue.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:46 +02:00
Dani Liberman
cb5fb665f3 habanalabs/gaudi: add razwi notify event
Each time a razwi (read-only zero, write ignored) happens, besides
capturing its data, also notify the user about it.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:46 +02:00
Ofir Bitton
841cd2d765 habanalabs/gaudi2: add PCI revision 2 support
Add support for Gaudi2 Device with PCI revision 2.
Functionality is exactly the same as revision 1, the only difference
is device name exposed to user.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:46 +02:00
Ofir Bitton
306206985a habanalabs: remove redundant gaudi2_sec asic type
As Gaudi2 has a single PCI id, the secured asic type is redundant.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:45 +02:00
Ofir Bitton
bdfef91e7c habanalabs: add warning print upon a PCI error
In order to know if driver catches PCI errors correctly, we need to
print a warning per each error.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:45 +02:00
Tomer Tayar
fc69aa8640 habanalabs: fix PCIe access to SRAM via debugfs
hl_access_sram_dram_region() uses a region base which is set within the
hl_set_dram_bar() function. However, for SRAM access this function is
not called, and we end up with a wrong value of region base and with a
bad calculated address.
Fix it by initializing the region base value independently of whether
hl_set_dram_bar() is called or not.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:45 +02:00
farah kassabri
679e968908 habanalabs: zero ts registration buff when allocated
To avoid memory corruption in kernel memory while using timestamp
registration nodes, zero the kernel buff memory when it's allocated.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:45 +02:00
Tal Cohen
4a9c6e2cdf habanalabs: no consecutive err when user context is enabled
Consecutive error protects a device reset loop from being triggered
due to h/w issues and enters the device into an unavailable state.
When user may cause the error, an unavailable state
will prevent the user from running its workloads.

The commit prevents entering consecutive state when a user context
is enabled.

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:44 +02:00
Tomer Tayar
1b363adc7f habanalabs: use graceful hard reset for CS timeouts
Use graceful hard reset when detecting a CS timeout that requires a
device reset.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:44 +02:00
Tomer Tayar
d1ce7e5ea1 habanalabs/gaudi2: use graceful hard reset for F/W events
Use graceful hard reset for F/W events on Gaudi2 device that require a
device reset.

While at it, do a small refactor of the checks and function calls,
to simplify it and to avoid code duplication.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:43 +02:00
Tomer Tayar
5b8873b39c habanalabs/gaudi: use graceful hard reset for F/W events
Use graceful hard reset for F/W events on Gaudi device that require a
device reset.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:43 +02:00
Tomer Tayar
11669b58fa habanalabs: add an option to control watchdog timeout via debugfs
Add an option to control the timeout value for the driver's watchdog
of the reset process. The timeout represents the amount of time the user
has to close his process once he gets a device reset notification from
the driver.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:43 +02:00
Tomer Tayar
a88a6f5f5c habanalabs: add support for graceful hard reset
Calling hl_device_reset() for a hard reset will lead to a quite
immediate device reset and to killing user process.
For resets that follow errors, it disables the option to debug the
errors on both the device side and the user application side.

This patch adds a 'graceful hard reset' option and a new
hl_device_cond_reset() function.
Under some conditions, mainly if there is no user process or if he is
not registered to driver notifications, this function will execute hard
reset as usual.
Otherwise, the reset will be postponed and a notification will be sent
to user, to let him perform post-error actions and then to release the
device, after which reset will take place.

If device is not released by user in some defined time, a watchdog work
will execute the reset in any case.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:43 +02:00
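The decision described in a88a6f5f5c above can be sketched roughly as follows. This is not the driver's code: every name here (reset_action, user_state, choose_reset_action, watchdog_should_reset) is hypothetical, and the commit says the immediate-reset conditions are "mainly" these two, so the real hl_device_cond_reset() may check more.

```c
#include <stdbool.h>

/* All names below are hypothetical; they only mirror the commit text. */
enum reset_action {
	RESET_IMMEDIATE,  /* hard reset right away, as before the patch */
	RESET_POSTPONED   /* notify the user and arm the watchdog */
};

struct user_state {
	bool process_open;         /* a user process holds the device */
	bool notifier_registered;  /* it registered for driver notifications */
};

/* Pick between an immediate hard reset and a postponed (graceful) one. */
static enum reset_action choose_reset_action(const struct user_state *u)
{
	if (!u->process_open || !u->notifier_registered)
		return RESET_IMMEDIATE;
	return RESET_POSTPONED;
}

/* Watchdog side: force the reset once the timeout has elapsed. */
static bool watchdog_should_reset(bool device_released,
				  unsigned int elapsed_sec,
				  unsigned int timeout_sec)
{
	return !device_released && elapsed_sec >= timeout_sec;
}
```

In this sketch the postponed path gives the user time for post-error actions, while the watchdog check keeps a stuck or unresponsive process from blocking the reset indefinitely.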
Ohad Sharabi
d1e0ac37ed habanalabs: avoid divide by zero in device utilization
Currently there is no verification whether the divisor is legal.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:43 +02:00
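The message of d1e0ac37ed above does not show the actual check, so the following is only an illustration of the general pattern of validating a divisor before use. The function name, its parameters, and the choice to report 0 for an empty measurement window are all assumptions.

```c
#include <stdint.h>

/*
 * Hypothetical helper: percentage of the measured window spent busy.
 * Returns 0 when nothing was measured instead of dividing by zero.
 * Assumes busy_time * 100 fits in 64 bits.
 */
static uint32_t utilization_percent(uint64_t busy_time, uint64_t total_time)
{
	if (total_time == 0)
		return 0;
	return (uint32_t)(busy_time * 100 / total_time);
}
```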
Dani Liberman
6bcb2d05a5 habanalabs: fix user mappings calculation in case of page fault
As there are 2 types of user mappings, pmmu and hmmu, calculate
only the relevant mappings for the requested type.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:43 +02:00
Tomer Tayar
5ad06bb1d2 habanalabs/gaudi2: remove configurations to access the MSI-X doorbell
The virtual MSI-X doorbell is supported now in F/W, so all
configurations to access the PCIE_DBI MSI-X doorbell can be removed.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:43 +02:00
Ohad Sharabi
e325d5dbf3 habanalabs: allow setting HBM BAR to other regions
Up until now the use-case in the driver was that the HBM is accessed
using the HBM BAR, yet the BAR sometimes cannot cover the whole HBM and
so we needed to set the BAR to another HBM offset.
Now we are facing the need to access other PCI memory regions that can
be covered by the HBM BAR.
To answer that we are allowing the caller to determine if the HBM BAR
needs to be set or not, regardless of the PCI memory region.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:43 +02:00
Ohad Sharabi
24fdfb359c habanalabs: fix using freed pointer
The code uses the pointer for trace purposes (without actually
dereferencing it) but still gets a static analysis warning.
This patch eliminates the warning.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:43 +02:00
Dilip Puri
dc8d243cae habanalabs/gaudi2: unsecure CBU_EARLY_BRESP registers
NIC ARCs need to have access to CBU_EARLY_BRESP, hence we unsecure
those registers.

Signed-off-by: Dilip Puri <dilipp@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:42 +02:00
Tal Cohen
27cd39afde habanalabs: verify no zero event is sent
The event notifier mechanism should not raise an empty
event (event equals zero).

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:42 +02:00
Dani Liberman
4f11694f27 habanalabs/gaudi2: capture page fault data
Capture page fault data when it happens.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:42 +02:00
Dani Liberman
15ac503cdc habanalabs/gaudi2: capture RAZWI information
Added function to calculate possible engines which caused
RAZWI (read-only zero, write ignored), from a given router id or
module index.

When getting RAZWI via PSOC IP, first the router id is calculated
and then the possible engines that caused the RAZWI are calculated.

There is a possibility that the RAZWI initiator is not an engine. In
that case, it will not be included in possible engines as it
doesn't have an engine id.

RAZWI information is captured when receiving event from engine or via
PSOC IP.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:41 +02:00
Dani Liberman
17f3f42af2 habanalabs: handle HBM MMU when capturing page fault data
In case of HBM MMU page fault, capture its relevant mappings.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:40 +02:00
Tomer Tayar
1eebb25929 habanalabs: move reset workqueue to be under hl_device
'struct hl_device_reset_work' is used as a wrapper for the reset work
and its parameters, including the reset workqueue on which it runs.
In a future commit, another reset related work with similar parameters
is going to be added, but it won't use the reset workqueue.

As in any case there is a single reset workqueue, and to allow the reuse
of this structure, move the reset workqueue to 'struct hl_device'.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:40 +02:00
Tomer Tayar
51236cd95e habanalabs: allow unregistering eventfd when device non-operational
Unregistering eventfd is for releasing host resources and doesn't
involve an access to the device. As such, there is no reason to disallow
it when device isn't operational.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:40 +02:00
Tomer Tayar
3a83ebc521 habanalabs: skip idle status check if reset on device release
If reset upon device release is enabled, there is no need to check the
device idle status in hpriv_release(), because device is going to be
reset in any case.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:40 +02:00
Tal Cohen
5731b6e6f0 habanalabs/gaudi2: add device unavailable notification
Device unavailable notifies the user that there isn't an option to
retrieve debug information from the device.
When a critical device error occurs and the f/w performs the device
reset, a device unavailable notification shall be sent to the user
process.

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:40 +02:00
Koby Elbaz
16448d6444 habanalabs/gaudi2: remove privileged MME clock configuration
Privileged MME clock configuration is removed as it is done by the f/w.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:40 +02:00
Dafna Hirschfeld
189b203ebb habanalabs: replace 'pf' to 'prefetch'
pf was an abbreviation for prefetch but because pf already stands
for 'physical function', we decided to change it to 'prefetch'.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:40 +02:00
Dani Liberman
dd600db47b habanalabs: add page fault info uapi
Only the first page fault will be saved.
Besides the address which caused the page fault, the driver captures
all of the mmu user mappings.
User can retrieve this data via the new uapi (new opcode in INFO ioctl).

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:40 +02:00
Tomer Tayar
6d1c567f2a habanalabs/gaudi2: fix module ID for RAZWI handling
RAZWI is optionally handled as part of the generic QM SEI error
handling, but it always uses PDMA as the module ID.
Fix it to use the suitable module ID according to the specific event.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:40 +02:00
Bharat Jauhari
0502df9bbe habanalabs: use lower_32_bits()
This fixes sparse warning on doing cast to 32-bits

Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:40 +02:00
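For readers unfamiliar with the helper named in 0502df9bbe above, here is a userspace mirror of the kernel's lower_32_bits() and upper_32_bits() macros, with uint32_t standing in for the kernel's u32. The split_addr() function is a hypothetical usage example, not something taken from the driver.

```c
#include <stdint.h>

/* Userspace mirror of the kernel macros; uint32_t replaces u32. */
#define lower_32_bits(n) ((uint32_t)((n) & 0xffffffff))
#define upper_32_bits(n) ((uint32_t)(((n) >> 16) >> 16))

/* Hypothetical example: split a 64-bit address into two 32-bit halves. */
static void split_addr(uint64_t addr, uint32_t *lo, uint32_t *hi)
{
	*lo = lower_32_bits(addr);
	*hi = upper_32_bits(addr);
}
```

Masking before the cast makes the truncation explicit, which is why the helper is preferred over a bare cast to a 32-bit type.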
Dani Liberman
52d5e54695 habanalabs: refactor razwi event notification
This event notification was compatible only with gaudi, where razwi
and page fault happen together.

To make it compatible with all ASICs, this refactor contains:

1. Razwi notification will only notify about razwi info.
   New notification will be added in future patch, to retrieve data
   about page fault error.

2. Changed razwi info structure to support all ASICs.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:39 +02:00
Oded Gabbay
ea73ef14dd habanalabs: Use simplified API for p2p dist calc
Use the simplified API that calculates distance between two devices.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:39 +02:00
Ofir Bitton
a925d90b36 habanalabs: allow control device open during reset
Monitoring apps would like to query device state at any time so we
should allow it also during reset because it doesn't involve
accessing the h/w.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:39 +02:00
Yang Yingliang
8749c27895 habanalabs: fix return value check in hl_fw_get_sec_attest_data()
If hl_cpu_accessible_dma_pool_alloc() fails, we should check
'req_cpu_addr', fix it.

Fixes: 0c88760f8f ("habanalabs/gaudi2: add secured attestation info uapi")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-11-23 16:13:39 +02:00
Greg Kroah-Hartman
210a671cc3 Merge 6.1-rc6 into char-misc-next
We need the char/misc fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-21 10:05:34 +01:00
Linus Torvalds
eb7081409f Linux 6.1-rc6 v6.1-rc6
2022-11-20 16:02:16 -08:00
Linus Torvalds
c6c67bf9bc Merge tag 'trace-probes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing/probes fixes from Steven Rostedt:

 - Fix possible NULL pointer dereference on trace_event_file in
   kprobe_event_gen_test_exit()

 - Fix NULL pointer dereference for trace_array in
   kprobe_event_gen_test_exit()

 - Fix memory leak of filter string for eprobes

 - Fix a possible memory leak in rethook_alloc()

 - Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case which
   can cause a possible use-after-free

 - Fix warning in eprobe filter creation

 - Fix eprobe filter creation as it picked the wrong event for the
   fields

* tag 'trace-probes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/eprobe: Fix eprobe filter to make a filter correctly
  tracing/eprobe: Fix warning in filter creation
  kprobes: Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case
  rethook: fix a potential memleak in rethook_alloc()
  tracing/eprobe: Fix memory leak of filter string
  tracing: kprobe: Fix potential null-ptr-deref on trace_array in kprobe_event_gen_test_exit()
  tracing: kprobe: Fix potential null-ptr-deref on trace_event_file in kprobe_event_gen_test_exit()
2022-11-20 15:31:20 -08:00
Linus Torvalds
5239ddeb48 Merge tag 'trace-v6.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:

 - Fix polling to block on watermark like the reads do, as user space
   applications get confused when the select says read is available, and
   then the read blocks

 - Fix accounting of ring buffer dropped pages as it is what is used to
   determine if the buffer is empty or not

 - Fix memory leak in tracing_read_pipe()

 - Fix struct trace_array warning about being declared in parameters

 - Fix accounting of ftrace pages used in output at start up.

 - Fix allocation of dyn_ftrace pages by subtracting one from order
   instead of dividing it by 2

 - Static analyzer found a case where a pointer was being used outside of a
   NULL check (rb_head_page_deactivate())

 - Fix possible NULL pointer dereference if kstrdup() fails in
   ftrace_add_mod()

 - Fix memory leak in test_gen_synth_cmd() and test_empty_synth_event()

 - Fix bad pointer dereference in register_synth_event() on error path

 - Remove unused __bad_type_size() method

 - Fix possible NULL pointer dereference of entry in list 'tr->err_log'

 - Fix NULL pointer dereference race if eprobe is called before the event
   setup

* tag 'trace-v6.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix race where eprobes can be called before the event
  tracing: Fix potential null-pointer-access of entry in list 'tr->err_log'
  tracing: Remove unused __bad_type_size() method
  tracing: Fix wild-memory-access in register_synth_event()
  tracing: Fix memory leak in test_gen_synth_cmd() and test_empty_synth_event()
  ftrace: Fix null pointer dereference in ftrace_add_mod()
  ring_buffer: Do not deactivate non-existant pages
  ftrace: Optimize the allocation for mcount entries
  ftrace: Fix the possible incorrect kernel message
  tracing: Fix warning on variable 'struct trace_array'
  tracing: Fix memory leak in tracing_read_pipe()
  ring-buffer: Include dropped pages in counting dirty patches
  tracing/ring-buffer: Have polling block on watermark
2022-11-20 15:25:32 -08:00
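The dyn_ftrace bullet in the merge above turns on the fact that a page-allocation order is a base-2 logarithm: order N means 2^N pages. The arithmetic can be checked with a small sketch. The helper name pages_for_order() is hypothetical, and the reading that the code meant to halve the page count is an inference from the bullet rather than something the message states.

```c
/* Number of pages in an allocation of the given order: 2^order. */
static unsigned long pages_for_order(unsigned int order)
{
	return 1UL << order;
}
```

Starting from order 4 (16 pages), order - 1 gives 8 pages, which is half, while order / 2 gives only 4 pages.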
Steven Rostedt (Google)
94eedf3dde tracing: Fix race where eprobes can be called before the event
The flag that tells the event to call its triggers after reading the event
is set for eprobes after the eprobe is enabled. This leads to a race where
the eprobe may be triggered at the beginning of the event where the record
information is NULL. The eprobe then dereferences the NULL record causing
a NULL kernel pointer bug.

Test for a NULL record to keep this from happening.

Link: https://lore.kernel.org/linux-trace-kernel/20221116192552.1066630-1-rafaelmendsr@gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/20221117214249.2addbe10@gandalf.local.home

Cc: Linux Trace Kernel <linux-trace-kernel@vger.kernel.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 7491e2c442 ("tracing: Add a probe that attaches to trace events")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reported-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-20 14:05:50 -05:00