Commit Graph

1235671 Commits

Author SHA1 Message Date
Chuck Lever
2dd6e29a3e svcrdma: Update some svcrdma DMA-related tracepoints
A send/recv_ctxt already records transport-related information
in the cq.id, thus there is no need to record the IP addresses of
the transport endpoints.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:28 -05:00
Chuck Lever
848760a9e7 svcrdma: DMA error tracepoints should report completion IDs
Update the DMA error flow tracepoints to report the completion ID of
the failing context. This ties the wait/failure to a particular
operation or request, which is more useful than knowing only the
failing transport.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:28 -05:00
Chuck Lever
ad3656bd84 svcrdma: SQ error tracepoints should report completion IDs
Update the Send Queue's error flow tracepoints to report the
completion ID of the waiting or failing context. This ties the
wait/failure to a particular operation or request, which is a little
more useful than knowing only the transport that is about to close.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever
be2acb1048 rpcrdma: Introduce a simple cid tracepoint class
De-duplicate some code, making it easier to add new tracepoints that
report only a completion ID.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever
907e34a7d0 svcrdma: Add lockdep class keys for transport locks
Two svcrdma-related transport locks can become quite contended.
Collate their use and make them easy to find in /proc/lock_stat for
better observability.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever
bfb81535c2 svcrdma: Clean up locking
There's no need to protect llist_entry() with a spin lock.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever
f09c36c8df svcrdma: Add an async version of svc_rdma_write_info_free()
DMA unmapping can take quite some time, so it should not be handled
in a single-threaded completion handler. Defer releasing write_info
structs to the recently-added workqueue.

With this patch, DMA unmapping can be handled in parallel, and it
does not cause head-of-queue blocking of Write completions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever
ae225fe27b svcrdma: Add an async version of svc_rdma_send_ctxt_put()
DMA unmapping can take quite some time, so it should not be handled
in a single-threaded completion handler. Defer releasing send_ctxts
to the recently-added workqueue.

With this patch, DMA unmapping can be handled in parallel, and it
does not cause head-of-queue blocking of Send completions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:27 -05:00
Chuck Lever
9c7e1a0658 svcrdma: Add a utility workqueue to svcrdma
To handle work in the background, set up an UNBOUND workqueue for
svcrdma. Subsequent patches will make use of it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:26 -05:00
Chuck Lever
877118c667 svcrdma: Pre-allocate svc_rdma_recv_ctxt objects
The original reason for allocating svc_rdma_recv_ctxt objects during
Receive completion was to ensure the objects were allocated on the
NUMA node closest to the underlying IB device.

Since commit c5d68d25bd ("svcrdma: Clean up allocation of
svc_rdma_recv_ctxt"), however, the device's favored node is
explicitly passed to the memory allocator.

To enable switching Receive completion to soft IRQ context, move
memory allocation out of completion handling, since it can be
costly, and it can sleep.

A limited number of objects is now allocated at "accept" time.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:26 -05:00
Chuck Lever
b541dd554b svcrdma: Eliminate allocation of recv_ctxt objects in backchannel
The svc_rdma_recv_ctxt free list uses a lockless list to avoid the
need for a spin lock in the fast path. llist_del_first(), which is
used by svc_rdma_recv_ctxt_get(), requires serialization, however,
when there are multiple list producers that are unserialized.

I mistakenly thought there was only one caller of
svc_rdma_recv_ctxt_get() (svc_rdma_refresh_recvs()), thus explicit
serialization would not be necessary. But there is another caller:
svc_rdma_bc_sendto(), and these two are not serialized against each
other. I haven't seen ill effects that I could directly ascribe to
a lack of serialization. It's just an observation based on code
audit.

When DMA-mapping before sending a Reply, the passed-in struct
svc_rdma_recv_ctxt is used only for its write and reply PCLs. These
are currently always empty in the backchannel case. So, instead of
passing a full svc_rdma_recv_ctxt object to
svc_rdma_map_reply_msg(), let's pass in just the Write and Reply
PCLs.

This change makes it unnecessary for the backchannel to acquire a
dummy svc_rdma_recv_ctxt object when sending an RPC Call. The need
for svc_rdma_recv_ctxt free list serialization is now completely
avoided.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:26 -05:00
ChenXiaoSong
52e8910075 NFSv4, NFSD: move enum nfs_cb_opnum4 to include/linux/nfs4.h
Callback operations enum is defined in client and server, move it to
common header file.

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:26 -05:00
Dan Carpenter
3c86e615d1 nfsd: remove unnecessary NULL check
We check "state" for NULL on the previous line so it can't be NULL here.
No need to check again.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202312031425.LffZTarR-lkp@intel.com/
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:26 -05:00
Chuck Lever
3587b5c753 SUNRPC: Remove RQ_SPLICE_OK
This flag is no longer used.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:26 -05:00
Chuck Lever
a2c91753a4 NFSD: Modify NFSv4 to use nfsd_read_splice_ok()
Avoid the use of an atomic bitop, and prepare for adding a run-time
switch for using splice reads.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:25 -05:00
Chuck Lever
c21fd7a8e8 NFSD: Replace RQ_SPLICE_OK in nfsd_read()
RQ_SPLICE_OK is a bit of a layering violation. Also, a subsequent
patch is going to provide a mechanism for always disabling splice
reads.

Splicing is an issue only for NFS READs, so refactor nfsd_read() to
check the auth type directly instead of relying on an rq_flag
setting.

The new helper will be added into the NFSv4 read path in a
subsequent patch.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:25 -05:00
Chuck Lever
deb704281f SUNRPC: Add a server-side API for retrieving an RPC's pseudoflavor
NFSD will use this new API to determine whether nfsd_splice_read is
safe to use. This avoids the need to add a dependency to NFSD for
CONFIG_SUNRPC_GSS.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:25 -05:00
Chuck Lever
a853ed5525 NFSD: Document lack of f_pos_lock in nfsd_readdir()
Al Viro notes that normal system calls hold f_pos_lock when calling
->iterate_shared and ->llseek; however nfsd_readdir() does not take
that mutex when calling these methods.

It should be safe however because the struct file acquired by
nfsd_readdir() is not visible to other threads.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:25 -05:00
Chuck Lever
d0ab8b649b NFSD: Remove nfsd_drc_gc() tracepoint
This trace point was for debugging the DRC's garbage collection. In
the field it's just noise.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:25 -05:00
Chuck Lever
ce7df05508 NFSD: Make the file_delayed_close workqueue UNBOUND
workqueue: nfsd_file_delayed_close [nfsd] hogged CPU for >13333us 8
	times, consider switching to WQ_UNBOUND

There's no harm in closing a cached file descriptor on another core.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:25 -05:00
Oleg Nesterov
f3734cc407 NFSD: use read_seqbegin() rather than read_seqbegin_or_lock()
The usage of read_seqbegin_or_lock() in nfsd_copy_write_verifier()
is wrong. "seq" is always even and thus "or_lock" has no effect,
this code can never take ->writeverf_lock for writing.

I guess this is fine, nfsd_copy_write_verifier() just copies 8 bytes
and nfsd_reset_write_verifier() is supposed to be very rare operation
so we do not need the adaptive locking in this case.

Yet the code looks wrong and sub-optimal, it can use read_seqbegin()
without changing the behaviour.

[ cel: Note also that it eliminates this Sparse warning:

fs/nfsd/nfssvc.c:360:6: warning: context imbalance in 'nfsd_copy_write_verifier' -
	different lock contexts for basic block

]

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:24 -05:00
Jeff Layton
74fd48739d nfsd: new Kconfig option for legacy client tracking
We've had a number of attempts at different NFSv4 client tracking
methods over the years, but now nfsdcld has emerged as the clear winner
since the others (recoverydir and the usermodehelper upcall) are
problematic.

As a case in point, the recoverydir backend uses MD5 hashes to encode
long form clientid strings, which means that nfsd repeatedly gets dinged
on FIPS audits, since MD5 isn't considered secure. Its use of MD5 is not
cryptographically significant, so there is no danger there, but allowing
us to compile that out allows us to sidestep the issue entirely.

As a prelude to eventually removing support for these client tracking
methods, add a new Kconfig option that enables them. Mark it deprecated
and make it default to N.

Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-07 17:54:24 -05:00
Linus Torvalds
0dd3ee3112 Linux 6.7 v6.7 2024-01-07 12:18:38 -08:00
Linus Torvalds
52b1853b08 Merge tag 'i2c-for-6.7-final' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "Improve the detection when to run atomic transfer handlers for kernels
  with preemption disabled. This removes some false positive splats a
  number of users were seeing if their driver didn't have support for
  atomic transfers.

  Also, fix a typo in the docs while we are here"

* tag 'i2c-for-6.7-final' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: core: Fix atomic xfer check for non-preempt config
  Documentation/i2c: fix spelling error in i2c-address-translators
2024-01-06 11:35:37 -08:00
Benjamin Bara
a3368e1186 i2c: core: Fix atomic xfer check for non-preempt config
Since commit aa49c90894 ("i2c: core: Run atomic i2c xfer when
!preemptible"), the whole reboot/power off sequence on non-preempt kernels
is using atomic i2c xfer, as !preemptible() always results to 1.

During device_shutdown(), the i2c might be used a lot and not all busses
have implemented an atomic xfer handler. This results in a lot of
avoidable noise, like:

[   12.687169] No atomic I2C transfer handler for 'i2c-0'
[   12.692313] WARNING: CPU: 6 PID: 275 at drivers/i2c/i2c-core.h:40 i2c_smbus_xfer+0x100/0x118
...

Fix this by allowing non-atomic xfer when the interrupts are enabled, as
it was before.

Link: https://lore.kernel.org/r/20231222230106.73f030a5@yea
Link: https://lore.kernel.org/r/20240102150350.3180741-1-mwalle@kernel.org
Link: https://lore.kernel.org/linux-i2c/13271b9b-4132-46ef-abf8-2c311967bb46@mailbox.org/
Fixes: aa49c90894 ("i2c: core: Run atomic i2c xfer when !preemptible")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Benjamin Bara <benjamin.bara@skidata.com>
Tested-by: Michael Walle <mwalle@kernel.org>
Tested-by: Tor Vic <torvic9@mailbox.org>
[wsa: removed a comment which needs more work, code is ok]
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2024-01-06 14:10:10 +01:00
Linus Torvalds
95c8a35f1c Merge tag 'mm-hotfixes-stable-2024-01-05-11-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc mm fixes from Andrew Morton:
 "12 hotfixes.

  Two are cc:stable and the remainder either address post-6.7 issues or
  aren't considered necessary for earlier kernel versions"

* tag 'mm-hotfixes-stable-2024-01-05-11-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info()
  mailmap: add entries for Mathieu Othacehe
  MAINTAINERS: change vmware.com addresses to broadcom.com
  arch/mm/fault: fix major fault accounting when retrying under per-VMA lock
  mm/mglru: skip special VMAs in lru_gen_look_around()
  MAINTAINERS: hand over hwpoison maintainership to Miaohe Lin
  MAINTAINERS: remove hugetlb maintainer Mike Kravetz
  mm: fix unmap_mapping_range high bits shift bug
  mm: memcg: fix split queue list crash when large folio migration
  mm: fix arithmetic for max_prop_frac when setting max_ratio
  mm: fix arithmetic for bdi min_ratio
  mm: align larger anonymous mappings on THP boundaries
2024-01-05 13:46:18 -08:00
Linus Torvalds
0d3ac66ed8 Merge tag 'nfsd-6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:

 - Fix another regression in the NFSD administrative API

* tag 'nfsd-6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: drop the nfsd_put helper
2024-01-05 13:12:29 -08:00
Linus Torvalds
a4ab2706bb Merge tag 'firewire-fixes-6.7-final' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire fix from Takashi Sakamoto:
 "A single patch to suppress unexpected system reboot in AMD Ryzen
  machines with PCIe card consisting of Asmedia ASM1083/1085 and
  VT6306/6307/6308.

  When the 1394 OHCI driver for the card accesses a specific register
  in PCI memory space, the system reboot often occurs.

  The issue affects all versions of Linux kernel as long as the 1394
  OHCI driver is included. The mechanism of unexpected system reboot is
  not clear, so the driver is changed to avoid the access itself when
  detecting the combination of hardware"

* tag 'firewire-fixes-6.7-final' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ASM108x/VT630x PCIe cards
2024-01-05 12:26:26 -08:00
Linus Torvalds
6c23529c08 Merge tag 'mmc-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
 "MMC core:
   - Fix releasing the host by canceling the delayed work
   - Fix pause retune on all RPMB partitions

  MMC host:
   - meson-mx-sdhc: Fix HW hang during card initialization
   - sdhci-sprd: Fix eMMC init failure after HW reset"

* tag 'mmc-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-sprd: Fix eMMC init failure after hw reset
  mmc: core: Cancel delayed work before releasing host
  mmc: rpmb: fixes pause retune on all RPMB partitions.
  mmc: meson-mx-sdhc: Fix initialization frozen issue
2024-01-05 12:12:33 -08:00
Linus Torvalds
2b5bd1498d Merge tag 'drm-fixes-2024-01-05' of git://anongit.freedesktop.org/drm/drm
Pull more drm fixes from Dave Airlie:
 "The amdgpu ones are fairly normal, the one that is a bit large is a
  fix for a newly introduced IP in 6.7 so unlikely to cause regressions.

  The nouveau ones are mostly memory leaks and debugging cleanups from
  the GSP (new nvidia firmware) enablement. There are some GSP changes
  to the message passing code and a subsequent fix for eDP panel turn
  on, that means my laptop can turn on the panel in GSP mode. These are
  fairly low chance of disrupting things since GSP is new in 6.7. The
  final not all in GSP fix is a deadlock seen with i915/nouveau when GSP
  is used where the the fence and irq paths have locking inversions,
  I've pushed some irq enablement out to a workqueue, and this has seen
  some fairly decent testing.

  amdgpu:
   - DP MST fix
   - SMU 13.0.6 fixes
   - fix displays on macbooks using vega12
   - fix VSC and colorimetry on DP/eDP

  nouveau:
   - fix deadlock between fence signalling and irq paths
   - fix GSP memory leaks
   - fix GSP leftover debug
   - hide some GSP callback messages
   - fix GSP display disable path
   - fix GSP ACPI interaction
   - handle errors in ctrl messages
   - use errors info to fix DP link training"

* tag 'drm-fixes-2024-01-05' of git://anongit.freedesktop.org/drm/drm:
  drm/nouveau/dp: Honor GSP link training retry timeouts
  nouveau: push event block/allowing out of the fence context
  nouveau/gsp: always free the alloc messages on r535
  nouveau/gsp: don't free ctrl messages on errors
  nouveau/gsp: convert gsp errors to generic errors
  drm/nouveau/gsp: Fix ACPI MXDM/MXDS method invocations
  nouveau/gsp: free userd allocation.
  nouveau/gsp: free acpi object after use
  nouveau: fix disp disabling with GSP
  nouveau/gsp: drop some acpi related debug
  nouveau/gsp: add three notifier callbacks that we see in normal operation (v2)
  drm/amd/pm: Use gpu_metrics_v1_5 for SMUv13.0.6
  drm/amd/pm: Add gpu_metrics_v1_5
  drm/amd/pm: Add mem_busy_percent for GCv9.4.3 apu
  drm/amd/display: Fix sending VSC (+ colorimetry) packets for DP/eDP displays without PSR
  drm/amdgpu: skip gpu_info fw loading on navi12
  drm/amd/display: add nv12 bounding box
  drm/amd/pm: Update metric table for jpeg/vcn data
  drm/amd/pm: Use separate metric table for APU
  drm/amd/display: pbn_div need be updated for hotplug event
2024-01-05 12:02:20 -08:00
Tetsuo Handa
7fba9420b7 mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info()
syzbot is reporting uninit-value at shrinker_alloc(), for commit
307bececcd ("mm: shrinker: add a secondary array for
shrinker_info::{map, nr_deferred}") which assumed that the ->unit was
allocated with __GFP_ZERO forgot to replace kvmalloc_node() in
expand_one_shrinker_info() with kvzalloc_node().

Link: https://lkml.kernel.org/r/9226cc0a-10e0-4489-80c5-58c3b5b4359c@I-love.SAKURA.ne.jp
Reported-by: syzbot <syzbot+1e0ed05798af62917464@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=1e0ed05798af62917464
Fixes: 307bececcd ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 09:58:32 -08:00
Linus Torvalds
6d0dc8559c Merge tag 'soc-fixes-6.7-3a' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
 "These are two correctness fixes for handing DT input in the
  Allwinner (sunxi) SMP startup code"

* tag 'soc-fixes-6.7-3a' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  ARM: sun9i: smp: fix return code check of of_property_match_string
  ARM: sun9i: smp: Fix array-index-out-of-bounds read in sunxi_mc_smp_init
2024-01-05 09:39:24 -08:00
Linus Torvalds
7987b8b75f Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fix from Paolo Bonzini:

 - Fix boolean logic in intel_guest_get_msrs

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/pmu: fix masking logic for MSR_CORE_PERF_GLOBAL_CTRL
2024-01-05 09:16:15 -08:00
Linus Torvalds
7131c2e9bb Merge tag 'probes-fixes-v6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull kprobes/x86 fix from Masami Hiramatsu:

 - Fix to emulate indirect call which size is not 5 byte.

   Current code expects the indirect call instructions are 5 bytes, but
   that is incorrect. Usually indirect call based on register is shorter
   than that, thus the emulation causes a kernel crash by accessing
   wrong instruction boundary. This uses the instruction size to
   calculate the return address correctly.

* tag 'probes-fixes-v6.7-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  x86/kprobes: fix incorrect return address calculation in kprobe_emulate_call_indirect
2024-01-05 09:07:59 -08:00
Linus Torvalds
3eca89454a Merge tag '6.7-rc8-smb3-mchan-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
 "Three important multichannel smb3 client fixes found in recent
  testing:

   - fix oops due to incorrect refcounting of interfaces after
     disabling multichannel

   - fix possible unrecoverable session state after disabling
     multichannel with active sessions

   - fix two places that were missing use of chan_lock"

* tag '6.7-rc8-smb3-mchan-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: do not depend on release_iface for maintaining iface_list
  cifs: cifs_chan_is_iface_active should be called with chan_lock held
  cifs: after disabling multichannel, mark tcon for reconnect
2024-01-05 08:52:25 -08:00
Takashi Sakamoto
ac9184fbb8 firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ASM108x/VT630x PCIe cards
VIA VT6306/6307/6308 provides PCI interface compliant to 1394 OHCI. When
the hardware is combined with Asmedia ASM1083/1085 PCIe-to-PCI bus bridge,
it appears that accesses to its 'Isochronous Cycle Timer' register (offset
0xf0 on PCI memory space) often causes unexpected system reboot in any
type of AMD Ryzen machine (both 0x17 and 0x19 families). It does not
appears in the other type of machine (AMD pre-Ryzen machine, Intel
machine, at least), or in the other OHCI 1394 hardware (e.g. Texas
Instruments).

The issue explicitly appears at a commit dcadfd7f7c ("firewire: core:
use union for callback of transaction completion") added to v6.5 kernel.
It changed 1394 OHCI driver to access to the register every time to
dispatch local asynchronous transaction. However, the issue exists in
older version of kernel as long as it runs in AMD Ryzen machine, since
the access to the register is required to maintain bus time. It is not
hard to imagine that users experience the unexpected system reboot when
generating bus reset by plugging any devices in, or reading the register
by time-aware application programs; e.g. audio sample processing.

This commit suppresses the unexpected system reboot in the combination of
hardware. It avoids the access itself. As a result, the software stack can
not provide the hardware time anymore to unit drivers, userspace
applications, and nodes in the same IEEE 1394 bus. It brings apparent
disadvantage since time-aware application programs require it, while
time-unaware applications are available again; e.g. sbp2.

Cc: stable@vger.kernel.org
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://bugzilla.suse.com/show_bug.cgi?id=1215436
Reported-by: Mario Limonciello <mario.limonciello@amd.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217994
Reported-by: Tobias Gruetzmacher <tobias-lists@23.gs>
Closes: https://sourceforge.net/p/linux1394/mailman/message/58711901/
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2240973
Closes: https://bugs.launchpad.net/linux/+bug/2043905
Link: https://lore.kernel.org/r/20240102110150.244475-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-01-05 21:28:08 +09:00
Jeff Layton
64e6304169 nfsd: drop the nfsd_put helper
It's not safe to call nfsd_put once nfsd_last_thread has been called, as
that function will zero out the nn->nfsd_serv pointer.

Drop the nfsd_put helper altogether and open-code the svc_put in its
callers instead. That allows us to not be reliant on the value of that
pointer when handling an error.

Fixes: 2a501f55cd ("nfsd: call nfsd_last_thread() before final nfsd_put()")
Reported-by: Zhi Li <yieli@redhat.com>
Cc: NeilBrown <neilb@suse.de>
Signed-off-by: Jeffrey Layton <jlayton@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-04 22:52:27 -05:00
Lyude Paul
eb284f4b37 drm/nouveau/dp: Honor GSP link training retry timeouts
Turns out that one of the ways that Nvidia's driver handles the pre-LT
timeout for eDP panels is by providing a retry timeout in their link
training callbacks that we're expected to wait for. Up until now we didn't
pay any attention to this parameter.

So, start honoring the timeout if link training fails - and retry up to 3
times. The "3 times" bit comes from OpenRM's link training code.

[airlied: this fixes the panel on one of my laptops]

Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-12-airlied@gmail.com
2024-01-05 12:27:53 +10:00
Dave Airlie
eacabb5462 nouveau: push event block/allowing out of the fence context
There is a deadlock between the irq and fctx locks,
the irq handling takes irq then fctx lock
the fence signalling takes fctx then irq lock

This splits the fence signalling path so the code that hits
the irq lock is done in a separate work queue.

This seems to fix crashes/hangs when using nouveau gsp with
i915 primary GPU.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-11-airlied@gmail.com
2024-01-05 12:27:53 +10:00
Dave Airlie
9c9dd22ba5 nouveau/gsp: always free the alloc messages on r535
Fixes a memory leak seen with kmemleak.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-10-airlied@gmail.com
2024-01-05 12:27:53 +10:00
Dave Airlie
4ae3a20102 nouveau/gsp: don't free ctrl messages on errors
It looks like for some messages the upper layers need to get access to the
results of the message so we can interpret it.

Rework the ctrl push interface to not free things and cleanup properly
whereever it errors out.

Requested-by: Lyude
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-9-airlied@gmail.com
2024-01-05 12:27:53 +10:00
Dave Airlie
59f6a3d8db nouveau/gsp: convert gsp errors to generic errors
This should let the upper layers retry as needed on EAGAIN.

There may be other values we will care about in the future, but
this covers our present needs.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-8-airlied@gmail.com
2024-01-05 12:27:53 +10:00
Lyude Paul
cf22fc2846 drm/nouveau/gsp: Fix ACPI MXDM/MXDS method invocations
Currently we get an error from ACPI because both of these arguments expect
a single argument, and we don't provide one. I'm not totally clear on what
that argument does, but we're able to find the missing value from
_acpiCacheMethodData() in src/kernel/platform/acpi_common.c in nvidia's
driver. So, let's add that - which doesn't get eDP displays to power on
quite yet, but gets rid of the argument warning at least.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-7-airlied@gmail.com
2024-01-05 12:27:53 +10:00
Dave Airlie
3108cc0323 nouveau/gsp: free userd allocation.
This was being leaked.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-6-airlied@gmail.com
2024-01-05 12:27:53 +10:00
Dave Airlie
a9b9b42b54 nouveau/gsp: free acpi object after use
This fixes a memory leak for the acpi dod object.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-5-airlied@gmail.com
2024-01-05 12:27:53 +10:00
Dave Airlie
7854ea0e40 nouveau: fix disp disabling with GSP
This func ptr here is normally static allocation, but gsp r535
uses a dynamic pointer, so we need to handle that better.

This fixes a crash with GSP when you use config=disp=0 to avoid
disp problems.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-4-airlied@gmail.com
2024-01-05 12:27:52 +10:00
Dave Airlie
34ce62a51e nouveau/gsp: drop some acpi related debug
These were leftover debug, if we need to bring them back do so
for debugging later.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-3-airlied@gmail.com
2024-01-05 12:27:52 +10:00
Dave Airlie
24ab185d98 nouveau/gsp: add three notifier callbacks that we see in normal operation (v2)
Add NULL callbacks for some things GSP calls that we don't handle, but know about
so we avoid the logging.

v2: Timur suggested allowing null fn.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-2-airlied@gmail.com
2024-01-05 12:27:52 +10:00
Dave Airlie
ed9895d8d4 Merge tag 'amd-drm-fixes-6.7-2024-01-04' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amdgpu:
- DP MST fix
- SMU 13.0.6 fixes
- Fix displays on macbooks using vega12
- Fix VSC and colorimetry on DP/eDP

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240104152139.4931-1-alexander.deucher@amd.com
2024-01-05 12:24:55 +10:00
Linus Torvalds
1f874787ed Merge tag 'net-6.7-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireless and netfilter.

  We haven't accumulated much over the break. If it wasn't for the
  uninterrupted stream of fixes for Intel drivers this PR would be very
  slim. There was a handful of user reports, however, either they stood
  out because of the lower traffic or users have had more time to test
  over the break. The ones which are v6.7-relevant should be wrapped up.

  Current release - regressions:

   - Revert "net: ipv6/addrconf: clamp preferred_lft to the minimum
     required", it caused issues on networks where routers send prefixes
     with preferred_lft=0

   - wifi:
      - iwlwifi: pcie: don't synchronize IRQs from IRQ, prevent deadlock
      - mac80211: fix re-adding debugfs entries during reconfiguration

  Current release - new code bugs:

   - tcp: print AO/MD5 messages only if there are any keys

  Previous releases - regressions:

   - virtio_net: fix missing dma unmap for resize, prevent OOM

  Previous releases - always broken:

   - mptcp: prevent tcp diag from closing listener subflows

   - nf_tables:
      - set transport header offset for egress hook, fix IPv4 mangling
      - skip set commit for deleted/destroyed sets, avoid double deactivation

   - nat: make sure action is set for all ct states, fix openvswitch
     matching on ICMP packets in related state

   - eth: mlxbf_gige: fix receive hang under heavy traffic

   - eth: r8169: fix PCI error on system resume for RTL8168FP

   - net: add missing getsockopt(SO_TIMESTAMPING_NEW) and cmsg handling"

* tag 'net-6.7-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
  net/tcp: Only produce AO/MD5 logs if there are any keys
  net: Implement missing SO_TIMESTAMPING_NEW cmsg support
  bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()
  net: ravb: Wait for operating mode to be applied
  asix: Add check for usbnet_get_endpoints
  octeontx2-af: Re-enable MAC TX in otx2_stop processing
  octeontx2-af: Always configure NIX TX link credits based on max frame size
  net/smc: fix invalid link access in dumping SMC-R connections
  net/qla3xxx: fix potential memleak in ql_alloc_buffer_queues
  virtio_net: fix missing dma unmap for resize
  igc: Fix hicredit calculation
  ice: fix Get link status data length
  i40e: Restore VF MSI-X state during PCI reset
  i40e: fix use-after-free in i40e_aqc_add_filters()
  net: Save and restore msg_namelen in sock_sendmsg
  netfilter: nft_immediate: drop chain reference counter on error
  netfilter: nf_nat: fix action not being set for all ct states
  net: bcmgenet: Fix FCS generation for fragmented skbuffs
  mptcp: prevent tcp diag from closing listener subflows
  MAINTAINERS: add Geliang as reviewer for MPTCP
  ...
2024-01-04 16:34:50 -08:00