We already know that the head buffer and page are empty, so if there is
any data, it is in the tail.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
We can fit the device_addr4 opaque data padding in the pages.
Fixes: cf500bac8f ("SUNRPC: Introduce rpc_prepare_reply_pages()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Use the existing xdr_stream_decode_string_dup() to safely decode into
kmalloced strings.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Ensure that we report the correct netid when using UDP or RDMA
transports to the DSes.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
We want to enable RDMA and UDP as valid transport methods if a
GETDEVICEINFO call specifies it. Do so by adding a parser for the
netid that translates it to an appropriate argument for the RPC
transport layer.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the pNFS metadata server advertises multiple addresses for the same
data server, we should try to connect to just one protocol family and
transport type on the assumption that homogeneity will improve performance.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Switch the mount code to use xprt_find_transport_ident() and to check
the results before allowing the mount to proceed.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
After we've looked up the transport module, we need to ensure it can't
go away until we've finished running the transport setup code.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
According to RFC5666, the correct netid for an IPv6 addressed RDMA
transport is "rdma6", which we've supported as a mount option since
Linux-4.7. The problem is when we try to load the module "xprtrdma6",
that will fail, since there is no modulealias of that name.
Fixes: 181342c5eb ("xprtrdma: Add rdma6 option to support NFS/RDMA IPv6")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the directory is changing, causing the page cache to get invalidated
while we are listing the contents, then the NFS client is currently forced
to read in the entire directory contents from scratch, because it needs
to perform a linear search for the readdir cookie. While this is not
an issue for small directories, it does not scale to directories with
millions of entries.
In order to be able to deal with large directories that are changing,
add a heuristic to ensure that if the page cache is empty, and we are
searching for a cookie that is not the zero cookie, we just default to
performing uncached readdir.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
If we're doing uncached readdir, allocate multiple pages in order to
try to avoid duplicate RPC calls for the same getdents() call.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
If the server is handing out monotonically increasing readdir cookie values,
then we can optimise away searches through pages that contain cookies that
lie outside our search range.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
If the server insists on using the readdir verifiers in order to allow
cookies to expire, then we should ensure that we cache the verifier
with the cookie, so that we can return an error if the application
tries to use the expired cookie.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
If the server returns NFS4ERR_NOT_SAME or tells us that the cookie is
bad in response to a READDIR call, then we should empty the page cache
so that we can fill it from scratch again.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
If we're ever going to allow support for servers that use the readdir
verifier, then that use needs to be managed by the middle layers as
those need to be able to reject cookies from other verifiers.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
Remove the redundant caching of the credential in struct
nfs_open_dir_context.
Pass the buffer size as an argument to nfs_readdir_xdr_filler().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
If a readdir call returns more data than we can fit into one page
cache page, then allocate a new one for that data rather than
discarding the data.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
Ensure that the contents of struct nfs_open_dir_context are consistent
by setting them under the file->f_lock from a private copy (that is
known to be consistent).
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
Currently, the client will always ask for security_labels if the server
returns that it supports that feature regardless of any LSM modules
(such as Selinux) enforcing security policy. This adds performance
penalty to the READDIR operation.
Client adjusts superblock's support of the security_label based on
the server's support but also current client's configuration of the
LSM modules. Thus, prior to using the default bitmask in READDIR,
this patch checks the server's capabilities and then instructs
READDIR to remove FATTR4_WORD2_SECURITY_LABEL from the bitmask.
v5: fixing silly mistakes of the rushed v4
v4: simplifying logic
v3: changing label's initialization per Ondrej's comment
v2: dropping selinux hook and using the sb cap.
Suggested-by: Ondrej Mosnacek <omosnace@redhat.com>
Suggested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Fixes: 2b0143b5c9 ("VFS: normal filesystems (and lustre): d_inode() annotations")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Currently, we wake up the tasks by priority queue ordering, which means
that we ignore the batching that is supposed to help with QoS issues.
Fixes: c049f8ea9a ("SUNRPC: Remove the bh-safe lock requirement on the rpc_wait_queue->lock")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
We need to respect the NFS_MOUNT_SOFTREVAL flag in _nfs4_proc_lookupp,
by timing out if the server is unavailable.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
In order to use the open_by_filehandle() operations on NFSv3, we need
to be able to emulate lookupp() so that nfs_get_parent() can be used
to convert disconnected dentries into connected ones.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
We want to reuse the lookup code in NFSv3 in order to emulate the
NFSv4 lookupp operation.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Since commit b4868b44c5 ("NFSv4: Wait for stateid updates after
CLOSE/OPEN_DOWNGRADE"), every inter server copy operation suffers 5
seconds delay regardless of the size of the copy. The delay is from
nfs_set_open_stateid_locked when the check by nfs_stateid_is_sequential
fails because the seqid in both nfs4_state and nfs4_stateid are 0.
Fix __nfs42_ssc_open to delay setting of NFS_OPEN_STATE in nfs4_state,
until after the call to update_open_stateid, to indicate this is the 1st
open. This fix is part of a 2 patches, the other patch is the fix in the
source server to return the stateid for COPY_NOTIFY request with seqid 1
instead of 0.
Fixes: ce0887ac96 ("NFSD add nfs4 inter ssc to nfsd4_copy")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
By switching to an XFS-backed export, I am able to reproduce the
ibcomp worker crash on my client with xfstests generic/013.
For the failing LISTXATTRS operation, xdr_inline_pages() is called
with page_len=12 and buflen=128.
- When ->send_request() is called, rpcrdma_marshal_req() does not
set up a Reply chunk because buflen is smaller than the inline
threshold. Thus rpcrdma_convert_iovs() does not get invoked at
all and the transport's XDRBUF_SPARSE_PAGES logic is not invoked
on the receive buffer.
- During reply processing, rpcrdma_inline_fixup() tries to copy
received data into rq_rcv_buf->pages because page_len is positive.
But there are no receive pages because rpcrdma_marshal_req() never
allocated them.
The result is that the ibcomp worker faults and dies. Sometimes that
causes a visible crash, and sometimes it results in a transport hang
without other symptoms.
RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct, and
should eventually be fixed or replaced. However, my preference is
that upper-layer operations should explicitly allocate their receive
buffers (using GFP_KERNEL) when possible, rather than relying on
XDRBUF_SPARSE_PAGES.
Reported-by: Olga kornievskaia <kolga@netapp.com>
Suggested-by: Olga kornievskaia <kolga@netapp.com>
Fixes: c10a75145f ("NFSv4.2: add the extended attribute proc functions.")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Olga kornievskaia <kolga@netapp.com>
Reviewed-by: Frank van der Linden <fllinden@amazon.com>
Tested-by: Olga kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
There's no need to defer allocation of pages for the receive buffer.
- This upcall is quite infrequent
- gssp_alloc_receive_pages() can allocate the pages with GFP_KERNEL,
unlike the transport
- gssp_alloc_receive_pages() knows exactly how many pages are needed
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If the flexfiles mirroring is enabled, then the read code expects to be
able to set pgio->pg_mirror_idx to point to the data server that is
being used for this particular read. However it does not change the
pg_mirror_count because we only need to send a single read.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Pull locking fixes from Thomas Gleixner:
"Two more places which invoke tracing from RCU disabled regions in the
idle path.
Similar to the entry path the low level idle functions have to be
non-instrumentable"
* tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
intel_idle: Fix intel_idle() vs tracing
sched/idle: Fix arch_cpu_idle() vs tracing
Pull irq fixes from Thomas Gleixner:
"Two fixes for irqchip drivers:
- Save and restore the GICV3 ITS state unconditionally on
suspend/resume to handle firmware which fails to do so.
- Use the correct index into the fwspec parameters to read the irq
trigger type in the EXIU chip driver"
* tag 'irq-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Unconditionally save/restore the ITS state on suspend
irqchip/exiu: Fix the index of fwspec for IRQ type
Pull EFI fixes from Borislav Petkov:
"More EFI fixes forwarded from Ard Biesheuvel:
- revert efivarfs kmemleak fix again - it was a false positive
- make CONFIG_EFI_EARLYCON depend on CONFIG_EFI explicitly so it does
not pull in other dependencies unnecessarily if CONFIG_EFI is not
set
- defer attempts to load SSDT overrides from EFI vars until after the
efivar layer is up"
* tag 'efi-urgent-for-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: EFI_EARLYCON should depend on EFI
efivarfs: revert "fix memory leak in efivarfs_create()"
efi/efivars: Set generic ops before loading SSDT
Pull x86 fixes from Borislav Petkov:
"A couple of urgent fixes which accumulated this last week:
- Two resctrl fixes to prevent refcount leaks when manipulating the
resctrl fs (Xiaochen Shen)
- Correct prctl(PR_GET_SPECULATION_CTRL) reporting (Anand K Mistry)
- A fix to not lose already seen MCE severity which determines
whether the machine can recover (Gabriele Paoloni)"
* tag 'x86_urgent_for_v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Do not overwrite no_way_out if mce_end() fails
x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb
x86/resctrl: Add necessary kernfs_put() calls to prevent refcount leak
x86/resctrl: Remove superfluous kernfs_get() calls to prevent refcount leak
Pull RISC-V fixes from Palmer Dabbelt:
"I've collected a handful of fixes over the past few weeks:
- A fix to un-break the build-id argument to the vDSO build, which is
necessary for the LLVM linker.
- A fix to initialize the jump label subsystem, without which it (and
all the stuff that uses it) doesn't actually function.
- A fix to include <asm/barrier.h> from <vdso/processor.h>, without
which some drivers won't compile"
* tag 'riscv-for-linus-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: fix barrier() use in <vdso/processor.h>
RISC-V: Add missing jump label initialization
riscv: Explicitly specify the build id style in vDSO Makefile again
Pull perf tool fixes from Arnaldo Carvalho de Melo:
- Fix die_entrypc() when DW_AT_ranges DWARF attribute not available
- Cope with broken DWARF (missing DW_AT_declaration) generated by some
recent gcc versions
- Do not generate CGROUP metadata events when not asked to in 'perf
record'
- Use proper CPU for shadow stats in 'perf stat'
- Update copy of libbpf's hashmap.c, silencing tools/perf build warning
- Fix return value in 'perf diff'
* tag 'perf-tools-fixes-for-v5.10-2020-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf probe: Change function definition check due to broken DWARF
perf probe: Fix to die_entrypc() returns error correctly
perf stat: Use proper cpu for shadow stats
perf record: Synthesize cgroup events only if needed
perf diff: Fix error return value in __cmd_diff()
perf tools: Update copy of libbpf's hashmap.c
Pull USB / PHY driver fixes from Greg KH:
"Here are a few small USB and PHY driver fixes for 5.10-rc6. They
include:
- small PHY driver fixes to resolve reported issues
- USB quirks added for "broken" devices
- typec fixes for reported problems
- USB gadget fixes for small issues
Full details are in the shortlog, nothing major in here and all have
been in linux-next with no reported issues"
* tag 'usb-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: stusb160x: fix power-opmode property with typec-power-opmode
USB: core: Change %pK for __user pointers to %px
USB: core: Fix regression in Hercules audio card
usb: gadget: Fix memleak in gadgetfs_fill_super
usb: gadget: f_midi: Fix memleak in f_midi_alloc
USB: quirks: Add USB_QUIRK_DISCONNECT_SUSPEND quirk for Lenovo A630Z TIO built-in usb-audio card
usb: typec: qcom-pmic-typec: fix builtin build errors
phy: mediatek: fix spelling mistake in Kconfig "veriosn" -> "version"
phy: qualcomm: Fix 28 nm Hi-Speed USB PHY OF dependency
phy: qualcomm: usb: Fix SuperSpeed PHY OF dependency
phy: intel: PHY_INTEL_KEEMBAY_EMMC should depend on ARCH_KEEMBAY
usb: cdns3: gadget: calculate TD_SIZE based on TD
usb: cdns3: gadget: initialize link_trb as NULL
phy: cpcap-usb: Use IRQF_ONESHOT
phy: qcom-qmp: Initialize another pointer to NULL
phy: tegra: xusb: Fix dangling pointer on probe failure
phy: usb: Fix incorrect clearing of tca_drv_sel bit in SETUP reg for 7211