T.J. Mercier
1c1914d6e8
dma-buf: heaps: Don't track CMA dma-buf pages under RssFile
...
DMA buffers allocated from the CMA dma-buf heap get counted under
RssFile for processes that map them and trigger page faults. In
addition to the incorrect accounting reported to userspace, reclaim
behavior was influenced by the MM_FILEPAGES counter until linux 6.8, but
this memory is not reclaimable. [1] Change the CMA dma-buf heap to set
VM_PFNMAP on the VMA so MM does not poke at the memory managed by this
dma-buf heap, and use vmf_insert_pfn to correct the RSS accounting.
The system dma-buf heap does not suffer from this issue since
remap_pfn_range is used during the mmap of the buffer, which also sets
VM_PFNMAP on the VMA.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/vmscan.c?id=fb46e22a9e3863e08aef8815df9f17d0f4b9aede
Fixes: b61614ec31 ("dma-buf: heaps: Add CMA heap to dmabuf heaps")
Signed-off-by: T.J. Mercier <tjmercier@google.com >
Acked-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org >
Link: https://patchwork.freedesktop.org/patch/msgid/20240117181141.286383-1-tjmercier@google.com
2024-01-31 19:54:58 +05:30
Sebastian Ott
9c64e749ce
drm/virtio: Set segment size for virtio_gpu device
...
Set the segment size of the virtio_gpu device to the value
used by the drm helpers when allocating sg lists to fix the
following complaint from DMA_API debug code:
DMA-API: virtio-pci 0000:07:00.0: mapping sg segment longer than
device claims to support [len=262144] [max=65536]
Cc: stable@vger.kernel.org
Tested-by: Zhenyu Zhang <zhenyzha@redhat.com >
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com >
Signed-off-by: Sebastian Ott <sebott@redhat.com >
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com >
Link: https://patchwork.freedesktop.org/patch/msgid/7258a4cc-da16-5c34-a042-2a23ee396d56@redhat.com
2024-01-29 11:44:34 +03:00
Jacek Lawrynowicz
27d19268cf
accel/ivpu: Improve recovery and reset support
...
- Synchronize job submission with reset/recovery using reset_lock
- Always print recovery reason and call diagnose_failure()
- Don't allow for autosupend during recovery
- Prevent immediate autosuspend after reset/recovery
- Prevent force_recovery for issuing TDR when device is suspended
- Reset VPU instead triggering recovery after changing debugfs params
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Reviewed-by: Wachowski, Karol <karol.wachowski@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240122120945.1150728-4-jacek.lawrynowicz@linux.intel.com
2024-01-25 10:17:37 +01:00
Jacek Lawrynowicz
264b271d12
accel/ivpu: Improve stability of ivpu_submit_ioctl()
...
- Wake up the device as late as possible
- Remove job reference counting in order to simplify the code
- Don't put jobs that are not fully submitted on submitted_jobs_xa in
order to avoid potential races with reset/recovery
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Reviewed-by: Wachowski, Karol <karol.wachowski@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240122120945.1150728-3-jacek.lawrynowicz@linux.intel.com
2024-01-25 10:17:07 +01:00
Jacek Lawrynowicz
f1cc6aceec
accel/ivpu: Fix dev open/close races with unbind
...
- Add context_list_lock to synchronize user context addition/removal
- Use drm_dev_enter() to prevent unbinding the device during ivpu_open()
and vpu address allocation
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Reviewed-by: Wachowski, Karol <karol.wachowski@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240122120945.1150728-2-jacek.lawrynowicz@linux.intel.com
2024-01-25 10:16:56 +01:00
Thomas Zimmermann
d1b163aa07
Revert "drivers/firmware: Move sysfb_init() from device_initcall to subsys_initcall_sync"
...
This reverts commit 60aebc9559 .
Commit 60aebc9559 ("drivers/firmware: Move sysfb_init() from
device_initcall to subsys_initcall_sync") messes up initialization order
of the graphics drivers and leads to blank displays on some systems. So
revert the commit.
To make the display drivers fully independent from initialization
order requires to track framebuffer memory by device and independently
from the loaded drivers. The kernel currently lacks the infrastructure
to do so.
Reported-by: Jaak Ristioja <jaak@ristioja.ee >
Closes: https://lore.kernel.org/dri-devel/ZUnNi3q3yB3zZfTl@P70.localdomain/T/#t
Reported-by: Huacai Chen <chenhuacai@loongson.cn >
Closes: https://lore.kernel.org/dri-devel/20231108024613.2898921-1-chenhuacai@loongson.cn/
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10133
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de >
Cc: Javier Martinez Canillas <javierm@redhat.com >
Cc: Thorsten Leemhuis <regressions@leemhuis.info >
Cc: Jani Nikula <jani.nikula@linux.intel.com >
Cc: stable@vger.kernel.org # v6.5+
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com >
Acked-by: Jani Nikula <jani.nikula@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240123120937.27736-1-tzimmermann@suse.de
2024-01-24 17:03:44 +01:00
Bagas Sanjaya
1a84c21314
drm/dp_mst: Separate @failing_port list in drm_dp_mst_atomic_check_mgr() comment
...
Stephen Rothwell reported htmldocs warnings when merging drm-intel
tree:
Documentation/gpu/drm-kms-helpers:296: drivers/gpu/drm/display/drm_dp_mst_topology.c:5484: ERROR: Unexpected indentation.
Documentation/gpu/drm-kms-helpers:296: drivers/gpu/drm/display/drm_dp_mst_topology.c:5488: WARNING: Block quote ends without a blank line; unexpected unindent.
Separate @failing_port return value list by surrounding it with a
blank line to fix above warnings.
Fixes: 1cd0a5ea42 ("drm/dp_mst: Factor out a helper to check the atomic state of a topology manager")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au >
Closes: https://lore.kernel.org/linux-next/20231114141715.6f435118@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com >
Reviewed-by: Jani Nikula <jani.nikula@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20231114081033.27343-1-bagasdotme@gmail.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com >
2024-01-22 19:20:36 +02:00
Hsin-Yi Wang
4d5b7daa3c
drm/bridge: anx7625: Ensure bridge is suspended in disable()
...
Similar to commit 26db46bc9c ("drm/bridge: parade-ps8640: Ensure bridge
is suspended in .post_disable()"). Add a mutex to ensure that aux transfer
won't race with atomic_disable by holding the PM reference and prevent
the bridge from suspend.
Also we need to use pm_runtime_put_sync_suspend() to suspend the bridge
instead of idle with pm_runtime_put_sync().
Fixes: 3203e497eb ("drm/bridge: anx7625: Synchronously run runtime suspend.")
Fixes: adca62ec37 ("drm/bridge: anx7625: Support reading edid through aux channel")
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org >
Tested-by: Xuxin Xiong <xuxinxiong@huaqin.corp-partner.google.com >
Reviewed-by: Pin-yen Lin <treapking@chromium.org >
Reviewed-by: Douglas Anderson <dianders@chromium.org >
Signed-off-by: Douglas Anderson <dianders@chromium.org >
Link: https://patchwork.freedesktop.org/patch/msgid/20240118015916.2296741-1-hsinyi@chromium.org
2024-01-22 08:53:42 -08:00
Jacek Lawrynowicz
4b5581f112
accel/ivpu: Disable PLL after VPU IP reset during FLR
...
IP reset has to followed by ivpu_pll_disable() to properly enter
reset state.
Fixes: 828d63042a ("accel/ivpu: Don't enter d0i3 during FLR")
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com >
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20231024165353.761507-1-stanislaw.gruszka@linux.intel.com
2024-01-22 13:29:42 +01:00
Wachowski, Karol
b246271d25
accel/ivpu: Deprecate DRM_IVPU_PARAM_CONTEXT_PRIORITY param
...
DRM_IVPU_PARAM_CONTEXT_PRIORITY has been deprecated because it
has been replaced with DRM_IVPU_JOB_PRIORITY levels set with
submit IOCTL and was unused anyway.
Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com >
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240115134434.493839-10-jacek.lawrynowicz@linux.intel.com
2024-01-22 10:31:54 +01:00
Jacek Lawrynowicz
37dee2a2f4
accel/ivpu: Improve buffer object debug logs
...
Make debug logs more readable and consistent:
- don't print handle as it is not always available for all buffers
- use hashed ivpu_bo ptr as main buffer identifier
- remove unused fields from ivpu_bo_print_info()
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Reviewed-by: Wachowski, Karol <karol.wachowski@intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240115134434.493839-9-jacek.lawrynowicz@linux.intel.com
2024-01-22 10:30:56 +01:00
Jacek Lawrynowicz
b7a0e75632
accel/ivpu: Disable buffer sharing among VPU contexts
...
This was not supported properly. A buffer was imported to another VPU
context as a separate buffer object with duplicated sgt.
Both exported and imported buffers could be DMA mapped causing a double
mapping on the same device.
Buffers imported from another VPU context will now just increase
reference count, leaving only a single sgt, fixing the problem above.
Buffers still can't be shared among VPU contexts because each has its
own MMU mapping and ivpu_bo only supports single MMU mappings.
The solution would be to use a mapping list as in panfrost or etnaviv
drivers and it will be implemented in future if required.
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Reviewed-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240115134434.493839-8-jacek.lawrynowicz@linux.intel.com
2024-01-22 10:28:52 +01:00
Jacek Lawrynowicz
a8c099d5d0
accel/ivpu: Free buffer sgt on unbind
...
Call dma_unmap() on all buffers before the VPU is unbinded to avoid
"device driver has pending DMA allocations while released from device"
warning when DMA-API debug is enabled.
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240115134434.493839-7-jacek.lawrynowicz@linux.intel.com
2024-01-22 10:28:46 +01:00
Jacek Lawrynowicz
7f66319927
accel/ivpu: Fix for missing lock around drm_gem_shmem_vmap()
...
drm_gem_shmem_vmap/vunmap requires dma resv lock to be held.
This was missed during conversion to shmem helper.
Fixes: 8d88e4cdce ("accel/ivpu: Use GEM shmem helper for all buffers")
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240115134434.493839-6-jacek.lawrynowicz@linux.intel.com
2024-01-22 10:28:43 +01:00
Wachowski, Karol
2a20b857dd
accel/ivpu: Add diagnostic messages when VPU fails to boot or suspend
...
Make boot/suspend failure debugging easier by dumping FW logs and error
registers.
Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com >
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240115134434.493839-5-jacek.lawrynowicz@linux.intel.com
2024-01-22 10:28:40 +01:00
Wachowski, Karol
8047d36fe5
accel/ivpu: Add debug prints for MMU map/unmap operations
...
It is common need to be able to see IOVA/physical to VPU addresses
mappings. Especially when debugging different kind of memory related
issues. Lack of such logs forces user to modify and recompile KMD manually.
This commit adds those logs under MMU debug mask which can be turned on
dynamically with module param during KMD load.
Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com >
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240115134434.493839-4-jacek.lawrynowicz@linux.intel.com
2024-01-22 10:27:47 +01:00
Wachowski, Karol
929acfb9c5
accel/ivpu: Call diagnose failure in ivpu_mmu_cmdq_sync()
...
Check for possible failure reasons in the buttress.
Some errors (like external abort) should have corresponding buttress errors
registers set indicating the real reason of failure.
Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com >
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240115134434.493839-3-jacek.lawrynowicz@linux.intel.com
2024-01-22 10:27:43 +01:00
Wachowski, Karol
30cf36bb04
accel/ivpu: Dump MMU events in case of VPU boot timeout
...
Add ivpu_mmu_evtq_dump() function that dumps existing MMU events from
MMU event queue. Call this function if VPU boot failed.
Previously MMU events were only checked in interrupt handler, but if VPU
failed to boot due to MMU faults, those faults were missed because of
interrupts not yet being enabled. This will allow checking potential
fault reason of VPU not booting.
Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com >
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240115134434.493839-2-jacek.lawrynowicz@linux.intel.com
2024-01-22 10:27:37 +01:00
Maxime Ripard
cf79f291f9
Merge v6.8-rc1 into drm-misc-fixes
...
Let's kickstart the 6.8 fix cycle.
Signed-off-by: Maxime Ripard <mripard@kernel.org >
2024-01-22 09:44:15 +01:00
Linus Torvalds
6613476e22
Linux 6.8-rc1
v6.8-rc1
2024-01-21 14:11:32 -08:00
Linus Torvalds
35a4474b5c
Merge tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs
...
Pull more bcachefs updates from Kent Overstreet:
"Some fixes, Some refactoring, some minor features:
- Assorted prep work for disk space accounting rewrite
- BTREE_TRIGGER_ATOMIC: after combining our trigger callbacks, this
makes our trigger context more explicit
- A few fixes to avoid excessive transaction restarts on
multithreaded workloads: fstests (in addition to ktest tests) are
now checking slowpath counters, and that's shaking out a few bugs
- Assorted tracepoint improvements
- Starting to break up bcachefs_format.h and move on disk types so
they're with the code they belong to; this will make room to start
documenting the on disk format better.
- A few minor fixes"
* tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs: (46 commits)
bcachefs: Improve inode_to_text()
bcachefs: logged_ops_format.h
bcachefs: reflink_format.h
bcachefs; extents_format.h
bcachefs: ec_format.h
bcachefs: subvolume_format.h
bcachefs: snapshot_format.h
bcachefs: alloc_background_format.h
bcachefs: xattr_format.h
bcachefs: dirent_format.h
bcachefs: inode_format.h
bcachefs; quota_format.h
bcachefs: sb-counters_format.h
bcachefs: counters.c -> sb-counters.c
bcachefs: comment bch_subvolume
bcachefs: bch_snapshot::btime
bcachefs: add missing __GFP_NOWARN
bcachefs: opts->compression can now also be applied in the background
bcachefs: Prep work for variable size btree node buffers
bcachefs: grab s_umount only if snapshotting
...
2024-01-21 14:01:12 -08:00
Linus Torvalds
4fbbed7872
Merge tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
...
Pull timer updates from Thomas Gleixner:
"Updates for time and clocksources:
- A fix for the idle and iowait time accounting vs CPU hotplug.
The time is reset on CPU hotplug which makes the accumulated
systemwide time jump backwards.
- Assorted fixes and improvements for clocksource/event drivers"
* tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug
clocksource/drivers/ep93xx: Fix error handling during probe
clocksource/drivers/cadence-ttc: Fix some kernel-doc warnings
clocksource/drivers/timer-ti-dm: Fix make W=n kerneldoc warnings
clocksource/timer-riscv: Add riscv_clock_shutdown callback
dt-bindings: timer: Add StarFive JH8100 clint
dt-bindings: timer: thead,c900-aclint-mtimer: separate mtime and mtimecmp regs
2024-01-21 11:14:40 -08:00
Linus Torvalds
7b297a5cc9
Merge tag 'powerpc-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
...
Pull powerpc fixes from Aneesh Kumar:
- Increase default stack size to 32KB for Book3S
Thanks to Michael Ellerman.
* tag 'powerpc-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Increase default stack size to 32KB
2024-01-21 11:04:29 -08:00
Kent Overstreet
249f441f83
bcachefs: Improve inode_to_text()
...
Add line breaks - inode_to_text() is now much easier to read.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:11 -05:00
Kent Overstreet
d826cc57c5
bcachefs: logged_ops_format.h
...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:11 -05:00
Kent Overstreet
8d52ba60c4
bcachefs: reflink_format.h
...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:11 -05:00
Kent Overstreet
b2fa1b633b
bcachefs; extents_format.h
...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:11 -05:00
Kent Overstreet
0560eb9abf
bcachefs: ec_format.h
...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:11 -05:00
Kent Overstreet
c6c4ff6507
bcachefs: subvolume_format.h
...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:11 -05:00
Kent Overstreet
8fed323b14
bcachefs: snapshot_format.h
...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:10 -05:00
Kent Overstreet
d455179fce
bcachefs: alloc_background_format.h
...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:10 -05:00
Kent Overstreet
72e0801049
bcachefs: xattr_format.h
...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:10 -05:00
Kent Overstreet
7ffc4daa5f
bcachefs: dirent_format.h
...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:10 -05:00
Kent Overstreet
b36425da71
bcachefs: inode_format.h
...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:10 -05:00
Kent Overstreet
82de6207fb
bcachefs; quota_format.h
...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:10 -05:00
Kent Overstreet
43314801a4
bcachefs: sb-counters_format.h
...
bcachefs_format.h has gotten too big; let's do some organizing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:10 -05:00
Kent Overstreet
3a58dfbc46
bcachefs: counters.c -> sb-counters.c
...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:10 -05:00
Kent Overstreet
12207f49ef
bcachefs: comment bch_subvolume
...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:10 -05:00
Kent Overstreet
d32088f2f2
bcachefs: bch_snapshot::btime
...
Add a field to bch_snapshot for creation time; this will be important
when we start exposing the snapshot tree to userspace.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:10 -05:00
Kent Overstreet
7be0208fc9
bcachefs: add missing __GFP_NOWARN
...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:10 -05:00
Kent Overstreet
d7e77f53e9
bcachefs: opts->compression can now also be applied in the background
...
The "apply this compression method in the background" paths now use the
compression option if background_compression is not set; this means that
setting or changing the compression option will cause existing data to
be compressed accordingly in the background.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:10 -05:00
Kent Overstreet
ec4edd7b9d
bcachefs: Prep work for variable size btree node buffers
...
bcachefs btree nodes are big - typically 256k - and btree roots are
pinned in memory. As we're now up to 18 btrees, we now have significant
memory overhead in mostly empty btree roots.
And in the future we're going to start enforcing that certain btree node
boundaries exist, to solve lock contention issues - analagous to XFS's
AGIs.
Thus, we need to start allocating smaller btree node buffers when we
can. This patch changes code that refers to the filesystem constant
c->opts.btree_node_size to refer to the btree node buffer size -
btree_buf_bytes() - where appropriate.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:10 -05:00
Su Yue
2acc59dd88
bcachefs: grab s_umount only if snapshotting
...
When I was testing mongodb over bcachefs with compression,
there is a lockdep warning when snapshotting mongodb data volume.
$ cat test.sh
prog=bcachefs
$prog subvolume create /mnt/data
$prog subvolume create /mnt/data/snapshots
while true;do
$prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s)
sleep 1s
done
$ cat /etc/mongodb.conf
systemLog:
destination: file
logAppend: true
path: /mnt/data/mongod.log
storage:
dbPath: /mnt/data/
lockdep reports:
[ 3437.452330] ======================================================
[ 3437.452750] WARNING: possible circular locking dependency detected
[ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E
[ 3437.453562] ------------------------------------------------------
[ 3437.453981] bcachefs/35533 is trying to acquire lock:
[ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190
[ 3437.454875]
but task is already holding lock:
[ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.456009]
which lock already depends on the new lock.
[ 3437.456553]
the existing dependency chain (in reverse order) is:
[ 3437.457054]
-> #3 (&type->s_umount_key#48){.+.+}-{3:3}:
[ 3437.457507] down_read+0x3e/0x170
[ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.458206] __x64_sys_ioctl+0x93/0xd0
[ 3437.458498] do_syscall_64+0x42/0xf0
[ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.459155]
-> #2 (&c->snapshot_create_lock){++++}-{3:3}:
[ 3437.459615] down_read+0x3e/0x170
[ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs]
[ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs]
[ 3437.460686] notify_change+0x1f1/0x4a0
[ 3437.461283] do_truncate+0x7f/0xd0
[ 3437.461555] path_openat+0xa57/0xce0
[ 3437.461836] do_filp_open+0xb4/0x160
[ 3437.462116] do_sys_openat2+0x91/0xc0
[ 3437.462402] __x64_sys_openat+0x53/0xa0
[ 3437.462701] do_syscall_64+0x42/0xf0
[ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.463359]
-> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}:
[ 3437.463843] down_write+0x3b/0xc0
[ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs]
[ 3437.464493] vfs_write+0x21b/0x4c0
[ 3437.464653] ksys_write+0x69/0xf0
[ 3437.464839] do_syscall_64+0x42/0xf0
[ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.465231]
-> #0 (sb_writers#10){.+.+}-{0:0}:
[ 3437.465471] __lock_acquire+0x1455/0x21b0
[ 3437.465656] lock_acquire+0xc6/0x2b0
[ 3437.465822] mnt_want_write+0x46/0x1a0
[ 3437.465996] filename_create+0x62/0x190
[ 3437.466175] user_path_create+0x2d/0x50
[ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.466617] __x64_sys_ioctl+0x93/0xd0
[ 3437.466791] do_syscall_64+0x42/0xf0
[ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.467180]
other info that might help us debug this:
[ 3437.469670] 2 locks held by bcachefs/35533:
other info that might help us debug this:
[ 3437.467507] Chain exists of:
sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48
[ 3437.467979] Possible unsafe locking scenario:
[ 3437.468223] CPU0 CPU1
[ 3437.468405] ---- ----
[ 3437.468585] rlock(&type->s_umount_key#48);
[ 3437.468758] lock(&c->snapshot_create_lock);
[ 3437.469030] lock(&type->s_umount_key#48);
[ 3437.469291] rlock(sb_writers#10);
[ 3437.469434]
*** DEADLOCK ***
[ 3437.469670] 2 locks held by bcachefs/35533:
[ 3437.469838] #0 : ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs]
[ 3437.470294] #1 : ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.470744]
stack backtrace:
[ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G E 6.7.0-rc7-custom+ #85
[ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 3437.471694] Call Trace:
[ 3437.471795] <TASK>
[ 3437.471884] dump_stack_lvl+0x57/0x90
[ 3437.472035] check_noncircular+0x132/0x150
[ 3437.472202] __lock_acquire+0x1455/0x21b0
[ 3437.472369] lock_acquire+0xc6/0x2b0
[ 3437.472518] ? filename_create+0x62/0x190
[ 3437.472683] ? lock_is_held_type+0x97/0x110
[ 3437.472856] mnt_want_write+0x46/0x1a0
[ 3437.473025] ? filename_create+0x62/0x190
[ 3437.473204] filename_create+0x62/0x190
[ 3437.473380] user_path_create+0x2d/0x50
[ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.473819] ? lock_acquire+0xc6/0x2b0
[ 3437.474002] ? __fget_files+0x2a/0x190
[ 3437.474195] ? __fget_files+0xbc/0x190
[ 3437.474380] ? lock_release+0xc5/0x270
[ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0
[ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs]
[ 3437.475090] __x64_sys_ioctl+0x93/0xd0
[ 3437.475277] do_syscall_64+0x42/0xf0
[ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.475691] RIP: 0033:0x7f2743c313af
======================================================
In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally
and unlock it at the end of the function. There is a comment
"why do we need this lock?" about the lock coming from
commit 42d237320e ("bcachefs: Snapshot creation, deletion")
The reason is that __bch2_ioctl_subvolume_create() calls
sync_inodes_sb() which enforce locked s_umount to writeback all dirty
nodes before doing snapshot works.
Fix it by read locking s_umount for snapshotting only and unlocking
s_umount after sync_inodes_sb().
Signed-off-by: Su Yue <glass.su@suse.com >
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:10 -05:00
Su Yue
369acf97d6
bcachefs: kvfree bch_fs::snapshots in bch2_fs_snapshots_exit
...
bch_fs::snapshots is allocated by kvzalloc in __snapshot_t_mut.
It should be freed by kvfree not kfree.
Or umount will triger:
[ 406.829178 ] BUG: unable to handle page fault for address: ffffe7b487148008
[ 406.830676 ] #PF: supervisor read access in kernel mode
[ 406.831643 ] #PF: error_code(0x0000) - not-present page
[ 406.832487 ] PGD 0 P4D 0
[ 406.832898 ] Oops: 0000 [#1 ] PREEMPT SMP PTI
[ 406.833512 ] CPU: 2 PID: 1754 Comm: umount Kdump: loaded Tainted: G OE 6.7.0-rc7-custom+ #90
[ 406.834746 ] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 406.835796 ] RIP: 0010:kfree+0x62/0x140
[ 406.836197 ] Code: 80 48 01 d8 0f 82 e9 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 78 9f 1f 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 56 9f 1f 01 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 b0 00 00 00 66 90 48 8b 07 f6
[ 406.837810 ] RSP: 0018:ffffb9d641607e48 EFLAGS: 00010286
[ 406.838213 ] RAX: ffffe7b487148000 RBX: ffffb9d645200000 RCX: ffffb9d641607dc4
[ 406.838738 ] RDX: 000065bb00000000 RSI: ffffffffc0d88b84 RDI: ffffb9d645200000
[ 406.839217 ] RBP: ffff9a4625d00068 R08: 0000000000000001 R09: 0000000000000001
[ 406.839650 ] R10: 0000000000000001 R11: 000000000000001f R12: ffff9a4625d4da80
[ 406.840055 ] R13: ffff9a4625d00000 R14: ffffffffc0e2eb20 R15: 0000000000000000
[ 406.840451 ] FS: 00007f0a264ffb80(0000) GS:ffff9a4e2d500000(0000) knlGS:0000000000000000
[ 406.840851 ] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 406.841125 ] CR2: ffffe7b487148008 CR3: 000000018c4d2000 CR4: 00000000000006f0
[ 406.841464 ] Call Trace:
[ 406.841583 ] <TASK>
[ 406.841682 ] ? __die+0x1f/0x70
[ 406.841828 ] ? page_fault_oops+0x159/0x470
[ 406.842014 ] ? fixup_exception+0x22/0x310
[ 406.842198 ] ? exc_page_fault+0x1ed/0x200
[ 406.842382 ] ? asm_exc_page_fault+0x22/0x30
[ 406.842574 ] ? bch2_fs_release+0x54/0x280 [bcachefs]
[ 406.842842 ] ? kfree+0x62/0x140
[ 406.842988 ] ? kfree+0x104/0x140
[ 406.843138 ] bch2_fs_release+0x54/0x280 [bcachefs]
[ 406.843390 ] kobject_put+0xb7/0x170
[ 406.843552 ] deactivate_locked_super+0x2f/0xa0
[ 406.843756 ] cleanup_mnt+0xba/0x150
[ 406.843917 ] task_work_run+0x59/0xa0
[ 406.844083 ] exit_to_user_mode_prepare+0x197/0x1a0
[ 406.844302 ] syscall_exit_to_user_mode+0x16/0x40
[ 406.844510 ] do_syscall_64+0x4e/0xf0
[ 406.844675 ] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 406.844907 ] RIP: 0033:0x7f0a2664e4fb
Signed-off-by: Su Yue <glass.su@suse.com >
Reviewed-by: Brian Foster <bfoster@redhat.com >
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:10 -05:00
Kent Overstreet
00fff4dd58
bcachefs: bios must be 512 byte algined
...
Fixes: 023f9ac9f7 bcachefs: Delete dio read alignment check
Reported-by: Brian Foster <bfoster@redhat.com >
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:10 -05:00
Colin Ian King
aead3428e8
bcachefs: remove redundant variable tmp
...
The variable tmp is being assigned a value but it isn't being
read afterwards. The assignment is redundant and so tmp can be
removed.
Cleans up clang scan build warning:
warning: Although the value stored to 'ret' is used in the enclosing
expression, the value is never actually read from 'ret'
[deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com >
Reviewed-by: Brian Foster <bfoster@redhat.com >
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:10 -05:00
Kent Overstreet
b97de45365
bcachefs: Improve trace_trans_restart_relock
...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:10 -05:00
Kent Overstreet
46bf2e9cc7
bcachefs: Fix excess transaction restarts in __bchfs_fallocate()
...
drop_locks_do() should not be used in a fastpath without first trying
the do in nonblocking mode - the unlock and relock will cause excessive
transaction restarts and potentially livelocking with other threads that
are contending for the same locks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:10 -05:00
Kent Overstreet
1a5039041b
bcachefs: extents_to_bp_state
...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:10 -05:00
Kent Overstreet
ba96d36ca5
bcachefs: bkey_and_val_eq()
...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
2024-01-21 13:27:09 -05:00