For creating fattrs with the label field already allocated for us. I
also update nfs_free_fattr() to free the label in the end.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Refactor: surface useful show_ macros so they can be shared between
the client and server trace code.
Additional clean up:
- Housekeeping: ensure the correct #include files are pulled in
and add proper TRACE_DEFINE_ENUM where they are missing
- Use a consistent naming scheme for the helpers
- Store values to be displayed symbolically as unsigned long, as
that is the type that the __print_yada() functions take
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Save some space in the nfs_inode by setting up an anonymous union with
the fields that are peculiar to a specific type of filesystem object.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If O_DIRECT bumps the commit_info rpcs_out field, then that could lead
to fsync() hangs. The fix is to ensure that O_DIRECT calls
nfs_commit_end().
Fixes: 723c921e7d ("sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The current range of RPC task PIDs is 0..65535. That's not adequate
for distinguishing tasks across multiple rpc_clnts running high
throughput workloads.
To help relieve this situation and to reduce the bottleneck of
having a single atomic for assigning all RPC task PIDs, assign task
PIDs per rpc_clnt.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If a user is doing 'ls -l', we have a heuristic in GETATTR that tells
the readdir code to try to use READDIRPLUS in order to refresh the inode
attributes. In certain cirumstances, we also try to invalidate the
remaining directory entries in order to ensure this refresh.
If there are multiple readers of the directory, we probably should avoid
invalidating the page cache, since the heuristic breaks down in that
situation anyway.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
The only check for RPC_TASK_RUNNING is the one in rpc_make_runnable(),
which happens under the same spin lock held when we call
rpc_clear_running().
Ditto, the last check for RPC_TASK_QUEUED in rpc_execute() is performed
under the same lock as the one held when we call rpc_clear_queued().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the cached credential exists but doesn't have any expiration callback
then exit early.
Fix up atomicity issues when replacing the credential with a new one
since the existing code could lead to refcount leaks.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Pull driver core fixes from Greg KH:
"Here are some driver core and kernfs fixes for reported issues for
5.15-rc4. These fixes include:
- kernfs positive dentry bugfix
- debugfs_create_file_size error path fix
- cpumask sysfs file bugfix to preserve the user/kernel abi (has been
reported multiple times.)
- devlink fixes for mdiobus devices as reported by the subsystem
maintainers.
Also included in here are some devlink debugging changes to make it
easier for people to report problems when asked. They have already
helped with the mdiobus and other subsystems reporting issues.
All of these have been linux-next for a while with no reported issues"
* tag 'driver-core-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
kernfs: also call kernfs_set_rev() for positive dentry
driver core: Add debug logs when fwnode links are added/deleted
driver core: Create __fwnode_link_del() helper function
driver core: Set deferred probe reason when deferred by driver core
net: mdiobus: Set FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD for mdiobus parents
driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
driver core: fw_devlink: Improve handling of cyclic dependencies
cpumask: Omit terminating null byte in cpumap_print_{list,bitmask}_to_buf
debugfs: debugfs_create_file_size(): use IS_ERR to check for error
Pull scheduler fixes from Borislav Petkov:
- Tell the compiler to always inline is_percpu_thread()
- Make sure tunable_scaling buffer is null-terminated after an update
in sysfs
- Fix LTP named regression due to cgroup list ordering
* tag 'sched_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Always inline is_percpu_thread()
sched/fair: Null terminate buffer when updating tunable_scaling
sched/fair: Add ancestors of unthrottled undecayed cfs_rq
Pull perf fixes from Borislav Petkov:
- Make sure the destroy callback is reset when a event initialization
fails
- Update the event constraints for Icelake
- Make sure the active time of an event is updated even for inactive
events
* tag 'perf_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: fix userpage->time_enabled of inactive events
perf/x86/intel: Update event constraints for ICX
perf/x86: Reset destroy callback on event init failure
Users of rdpmc rely on the mmapped user page to calculate accurate
time_enabled. Currently, userpage->time_enabled is only updated when the
event is added to the pmu. As a result, inactive event (due to counter
multiplexing) does not have accurate userpage->time_enabled. This can
be reproduced with something like:
/* open 20 task perf_event "cycles", to create multiplexing */
fd = perf_event_open(); /* open task perf_event "cycles" */
userpage = mmap(fd); /* use mmap and rdmpc */
while (true) {
time_enabled_mmap = xxx; /* use logic in perf_event_mmap_page */
time_enabled_read = read(fd).time_enabled;
if (time_enabled_mmap > time_enabled_read)
BUG();
}
Fix this by updating userpage for inactive events in merge_sched_in.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reported-and-tested-by: Lucian Grijincu <lucian@fb.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210929194313.2398474-1-songliubraving@fb.com
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from mac80211, netfilter and bpf.
Current release - regressions:
- bpf, cgroup: assign cgroup in cgroup_sk_alloc when called from
interrupt
- mdio: revert mechanical patches which broke handling of optional
resources
- dev_addr_list: prevent address duplication
Previous releases - regressions:
- sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb
(NULL deref)
- Revert "mac80211: do not use low data rates for data frames with no
ack flag", fixing broadcast transmissions
- mac80211: fix use-after-free in CCMP/GCMP RX
- netfilter: include zone id in tuple hash again, minimize collisions
- netfilter: nf_tables: unlink table before deleting it (race -> UAF)
- netfilter: log: work around missing softdep backend module
- mptcp: don't return sockets in foreign netns
- sched: flower: protect fl_walk() with rcu (race -> UAF)
- ixgbe: fix NULL pointer dereference in ixgbe_xdp_setup
- smsc95xx: fix stalled rx after link change
- enetc: fix the incorrect clearing of IF_MODE bits
- ipv4: fix rtnexthop len when RTA_FLOW is present
- dsa: mv88e6xxx: 6161: use correct MAX MTU config method for this
SKU
- e100: fix length calculation & buffer overrun in ethtool::get_regs
Previous releases - always broken:
- mac80211: fix using stale frag_tail skb pointer in A-MSDU tx
- mac80211: drop frames from invalid MAC address in ad-hoc mode
- af_unix: fix races in sk_peer_pid and sk_peer_cred accesses (race
-> UAF)
- bpf, x86: Fix bpf mapping of atomic fetch implementation
- bpf: handle return value of BPF_PROG_TYPE_STRUCT_OPS prog
- netfilter: ip6_tables: zero-initialize fragment offset
- mhi: fix error path in mhi_net_newlink
- af_unix: return errno instead of NULL in unix_create1() when over
the fs.file-max limit
Misc:
- bpf: exempt CAP_BPF from checks against bpf_jit_limit
- netfilter: conntrack: make max chain length random, prevent
guessing buckets by attackers
- netfilter: nf_nat_masquerade: make async masq_inet6_event handling
generic, defer conntrack walk to work queue (prevent hogging RTNL
lock)"
* tag 'net-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
af_unix: fix races in sk_peer_pid and sk_peer_cred accesses
net: stmmac: fix EEE init issue when paired with EEE capable PHYs
net: dev_addr_list: handle first address in __hw_addr_add_ex
net: sched: flower: protect fl_walk() with rcu
net: introduce and use lock_sock_fast_nested()
net: phy: bcm7xxx: Fixed indirect MMD operations
net: hns3: disable firmware compatible features when uninstall PF
net: hns3: fix always enable rx vlan filter problem after selftest
net: hns3: PF enable promisc for VF when mac table is overflow
net: hns3: fix show wrong state when add existing uc mac address
net: hns3: fix mixed flag HCLGE_FLAG_MQPRIO_ENABLE and HCLGE_FLAG_DCB_ENABLE
net: hns3: don't rollback when destroy mqprio fail
net: hns3: remove tc enable checking
net: hns3: do not allow call hns3_nic_net_open repeatedly
ixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup
net: bridge: mcast: Associate the seqcount with its protecting lock.
net: mdio-ipq4019: Fix the error for an optional regs resource
net: hns3: fix hclge_dbg_dump_tm_pg() stack usage
net: mdio: mscc-miim: Fix the mdio controller
af_unix: Return errno instead of NULL in unix_create1().
...
Daniel Borkmann says:
====================
pull-request: bpf 2021-09-28
The following pull-request contains BPF updates for your *net* tree.
We've added 10 non-merge commits during the last 14 day(s) which contain
a total of 11 files changed, 139 insertions(+), 53 deletions(-).
The main changes are:
1) Fix MIPS JIT jump code emission for too large offsets, from Piotr Krysiuk.
2) Fix x86 JIT atomic/fetch emission when dst reg maps to rax, from Johan Almbladh.
3) Fix cgroup_sk_alloc corner case when called from interrupt, from Daniel Borkmann.
4) Fix segfault in libbpf's linker for objects without BTF, from Kumar Kartikeya Dwivedi.
5) Fix bpf_jit_charge_modmem for applications with CAP_BPF, from Lorenz Bauer.
6) Fix return value handling for struct_ops BPF programs, from Hou Tao.
7) Various fixes to BPF selftests, from Jiri Benc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
,
Pull kvm fixes from Paolo Bonzini:
"A bit late... I got sidetracked by back-from-vacation routines and
conferences. But most of these patches are already a few weeks old and
things look more calm on the mailing list than what this pull request
would suggest.
x86:
- missing TLB flush
- nested virtualization fixes for SMM (secure boot on nested
hypervisor) and other nested SVM fixes
- syscall fuzzing fixes
- live migration fix for AMD SEV
- mirror VMs now work for SEV-ES too
- fixes for reset
- possible out-of-bounds access in IOAPIC emulation
- fix enlightened VMCS on Windows 2022
ARM:
- Add missing FORCE target when building the EL2 object
- Fix a PMU probe regression on some platforms
Generic:
- KCSAN fixes
selftests:
- random fixes, mostly for clang compilation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits)
selftests: KVM: Explicitly use movq to read xmm registers
selftests: KVM: Call ucall_init when setting up in rseq_test
KVM: Remove tlbs_dirty
KVM: X86: Synchronize the shadow pagetable before link it
KVM: X86: Fix missed remote tlb flush in rmap_write_protect()
KVM: x86: nSVM: don't copy virt_ext from vmcb12
KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround
KVM: x86: selftests: test simultaneous uses of V_IRQ from L1 and L0
KVM: x86: nSVM: restore int_vector in svm_clear_vintr
kvm: x86: Add AMD PMU MSRs to msrs_to_save_all[]
KVM: x86: nVMX: re-evaluate emulation_required on nested VM exit
KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry
KVM: x86: VMX: synthesize invalid VM exit when emulating invalid guest state
KVM: x86: nSVM: refactor svm_leave_smm and smm_enter_smm
KVM: x86: SVM: call KVM_REQ_GET_NESTED_STATE_PAGES on exit from SMM mode
KVM: x86: reset pdptrs_from_userspace when exiting smm
KVM: x86: nSVM: restore the L1 host state prior to resuming nested guest on SMM exit
KVM: nVMX: Filter out all unsupported controls when eVMCS was activated
KVM: KVM: Use cpumask_available() to check for NULL cpumask when kicking vCPUs
KVM: Clean up benign vcpu->cpu data races when kicking vCPUs
...
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for X86:
- Prevent sending the wrong signal when protection keys are enabled
and the kernel handles a fault in the vsyscall emulation.
- Invoke early_reserve_memory() before invoking e820_memory_setup()
which is required to make the Xen dom0 e820 hooks work correctly.
- Use the correct data type for the SETZ operand in the EMQCMDS
instruction wrapper.
- Prevent undefined behaviour to the potential unaligned accesss in
the instruction decoder library"
* tag 'x86-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/insn, tools/x86: Fix undefined behavior due to potential unaligned accesses
x86/asm: Fix SETZ size enqcmds() build failure
x86/setup: Call early_reserve_memory() earlier
x86/fault: Fix wrong signal when vsyscall fails with pkey
Pull irq fixes from Thomas Gleixner:
"A set of fixes for interrupt chip drivers:
- Work around a bad GIC integration on a Renesas platform which can't
handle byte-sized MMIO access
- Plug a potential memory leak in the GICv4 driver
- Fix a regression in the Armada 370-XP IPI code which was caused by
issuing EOI instack of ACK.
- A couple of small fixes here and there"
* tag 'irq-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic: Work around broken Renesas integration
irqchip/renesas-rza1: Use semicolons instead of commas
irqchip/gic-v3-its: Fix potential VPE leak on error
irqchip/goldfish-pic: Select GENERIC_IRQ_CHIP to fix build
irqchip/mbigen: Repair non-kernel-doc notation
irqdomain: Change the type of 'size' in __irq_domain_add() to be consistent
irqchip/armada-370-xp: Fix ack/eoi breakage
Documentation: Fix irq-domain.rst build warning
Merge misc fixes from Andrew Morton:
"16 patches.
Subsystems affected by this patch series: xtensa, sh, ocfs2, scripts,
lib, and mm (memory-failure, kasan, damon, shmem, tools, pagecache,
debug, and pagemap)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: fix uninitialized use in overcommit_policy_handler
mm/memory_failure: fix the missing pte_unmap() call
kasan: always respect CONFIG_KASAN_STACK
sh: pgtable-3level: fix cast to pointer from integer of different size
mm/debug: sync up latest migrate_reason to migrate_reason_names
mm/debug: sync up MR_CONTIG_RANGE and MR_LONGTERM_PIN
mm: fs: invalidate bh_lrus for only cold path
lib/zlib_inflate/inffast: check config in C to avoid unused function warning
tools/vm/page-types: remove dependency on opt_file for idle page tracking
scripts/sorttable: riscv: fix undeclared identifier 'EM_RISCV' error
ocfs2: drop acl cache for directories too
mm/shmem.c: fix judgment error in shmem_is_huge()
xtensa: increase size of gcc stack frame check
mm/damon: don't use strnlen() with known-bogus source length
kasan: fix Kconfig check of CC_HAS_WORKING_NOSANITIZE_ADDRESS
mm, hwpoison: add is_free_buddy_page() in HWPoisonHandlable()
Pull char/misc driver fixes from Greg KH:
"Here are some small char and misc driver fixes for 5.15-rc3.
Nothing huge in here, just fixes for a number of small issues that
have been reported. These include:
- habanalabs race conditions and other bugs fixed
- binder driver fixes
- fpga driver fixes
- coresight build warning fix
- nvmem driver fix
- comedi memory leak fix
- bcm-vk tty race fix
- other tiny driver fixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
comedi: Fix memory leak in compat_insnlist()
nvmem: NVMEM_NINTENDO_OTP should depend on WII
misc: bcm-vk: fix tty registration race
fpga: dfl: Avoid reads to AFU CSRs during enumeration
fpga: machxo2-spi: Fix missing error code in machxo2_write_complete()
fpga: machxo2-spi: Return an error on failure
habanalabs: expose a single cs seq in staged submissions
habanalabs: fix wait offset handling
habanalabs: rate limit multi CS completion errors
habanalabs/gaudi: fix LBW RR configuration
habanalabs: Fix spelling mistake "FEADBACK" -> "FEEDBACK"
habanalabs: fail collective wait when not supported
habanalabs/gaudi: use direct MSI in single mode
habanalabs: fix kernel OOPs related to staged cs
habanalabs: fix potential race in interrupt wait ioctl
mcb: fix error handling in mcb_alloc_bus()
misc: genwqe: Fixes DMA mask setting
coresight: syscfg: Fix compiler warning
nvmem: core: Add stubs for nvmem_cell_read_variable_le_u32/64 if !CONFIG_NVMEM
binder: make sure fd closes complete
...
Pull USB driver fixes from Greg KH:
"Here are some USB driver fixes and new device ids for 5.15-rc3.
They include:
- usb-storage quirk additions
- usb-serial new device ids
- usb-serial driver fixes
- USB roothub registration bugfix to resolve a long-reported issue
- usb gadget driver fixes for a large number of small things
- dwc2 driver fixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (28 commits)
USB: serial: option: add device id for Foxconn T99W265
USB: serial: cp210x: add ID for GW Instek GDM-834x Digital Multimeter
USB: serial: cp210x: add part-number debug printk
USB: serial: cp210x: fix dropped characters with CP2102
MAINTAINERS: usb, update Peter Korsgaard's entries
usb: musb: tusb6010: uninitialized data in tusb_fifo_write_unaligned()
usb-storage: Add quirk for ScanLogic SL11R-IDE older than 2.6c
Re-enable UAS for LaCie Rugged USB3-FW with fk quirk
USB: serial: option: remove duplicate USB device ID
USB: serial: mos7840: remove duplicated 0xac24 device ID
arm64: dts: qcom: ipq8074: remove USB tx-fifo-resize property
usb: gadget: f_uac2: Populate SS descriptors' wBytesPerInterval
usb: gadget: f_uac2: Add missing companion descriptor for feedback EP
usb: dwc2: gadget: Fix ISOC transfer complete handling for DDMA
usb: core: hcd: Modularize HCD stop configuration in usb_stop_hcd()
xhci: Set HCD flag to defer primary roothub registration
usb: core: hcd: Add support for deferring roothub registration
usb: dwc2: gadget: Fix ISOC flow for BDMA and Slave
usb: dwc3: core: balance phy init and exit
Revert "USB: bcma: Add a check for devm_gpiod_get"
...
Pull irqchip fixes from Marc Zyngier:
- Work around a bad GIC integration on a Renesas platform, where the
interconnect cannot deal with byte-sized MMIO accesses
- Cleanup another Renesas driver abusing the comma operator
- Fix a potential GICv4 memory leak on an error path
- Make the type of 'size' consistent with the rest of the code in
__irq_domain_add()
- Fix a regression in the Armada 370-XP IPI path
- Fix the build for the obviously unloved goldfish-pic
- Some documentation fixes
Link: https://lore.kernel.org/r/20210924090933.2766857-1-maz@kernel.org
Pull rseq fixes from Paolo Bonzini:
"A fix for a bug with restartable sequences and KVM.
KVM's handling of TIF_NOTIFY_RESUME, e.g. for task migration, clears
the flag without informing rseq and leads to stale data in userspace's
rseq struct"
* tag 'for-linus-rseq' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: Remove __NR_userfaultfd syscall fallback
KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs
tools: Move x86 syscall number fallbacks to .../uapi/
entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume()
KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest
Pull networking fixes from Jakub Kicinski:
"Current release - regressions:
- dsa: bcm_sf2: fix array overrun in bcm_sf2_num_active_ports()
Previous releases - regressions:
- introduce a shutdown method to mdio device drivers, and make DSA
switch drivers compatible with masters disappearing on shutdown;
preventing infinite reference wait
- fix issues in mdiobus users related to ->shutdown vs ->remove
- virtio-net: fix pages leaking when building skb in big mode
- xen-netback: correct success/error reporting for the
SKB-with-fraglist
- dsa: tear down devlink port regions when tearing down the devlink
port on error
- nexthop: fix division by zero while replacing a resilient group
- hns3: check queue, vf, vlan ids range before using
Previous releases - always broken:
- napi: fix race against netpoll causing NAPI getting stuck
- mlx4_en: ensure link operstate is updated even if link comes up
before netdev registration
- bnxt_en: fix TX timeout when TX ring size is set to the smallest
- enetc: fix illegal access when reading affinity_hint; prevent oops
on sysfs access
- mtk_eth_soc: avoid creating duplicate offload entries
Misc:
- core: correct the sock::sk_lock.owned lockdep annotations"
* tag 'net-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
atlantic: Fix issue in the pm resume flow.
net/mlx4_en: Don't allow aRFS for encapsulated packets
net: mscc: ocelot: fix forwarding from BLOCKING ports remaining enabled
net: ethernet: mtk_eth_soc: avoid creating duplicate offload entries
nfc: st-nci: Add SPI ID matching DT compatible
MAINTAINERS: remove Guvenc Gulce as net/smc maintainer
nexthop: Fix memory leaks in nexthop notification chain listeners
mptcp: ensure tx skbs always have the MPTCP ext
qed: rdma - don't wait for resources under hw error recovery flow
s390/qeth: fix deadlock during failing recovery
s390/qeth: Fix deadlock in remove_discipline
s390/qeth: fix NULL deref in qeth_clear_working_pool_list()
net: dsa: realtek: register the MDIO bus under devres
net: dsa: don't allocate the slave_mii_bus using devres
Doc: networking: Fox a typo in ice.rst
net: dsa: fix dsa_tree_setup error path
net/smc: fix 'workqueue leaked lock' in smc_conn_abort_work
net/smc: add missing error check in smc_clc_prfx_set()
net: hns3: fix a return value error in hclge_get_reset_status()
net: hns3: check vlan id before using it
...
If a parent device is also a supplier to a child device, fw_devlink=on by
design delays the probe() of the child device until the probe() of the
parent finishes successfully.
However, some drivers of such parent devices (where parent is also a
supplier) expect the child device to finish probing successfully as soon as
they are added using device_add() and before the probe() of the parent
device has completed successfully. One example of such a case is discussed
in the link mentioned below.
Add a flag to make fw_devlink=on not enforce these supplier-consumer
relationships, so these drivers can continue working.
Link: https://lore.kernel.org/netdev/CAGETcx_uj0V4DChME-gy5HGKTYnxLBX=TH2rag29f_p=UcG+Tg@mail.gmail.com/
Fixes: ea718c6990 ("Revert "Revert "driver core: Set fw_devlink=on by default""")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20210915170940.617415-3-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Read vcpu->vcpu_idx directly instead of bouncing through the one-line
wrapper, kvm_vcpu_get_idx(), and drop the wrapper. The wrapper is a
remnant of the original implementation and serves no purpose; remove it
before it gains more users.
Back when kvm_vcpu_get_idx() was added by commit 497d72d80a ("KVM: Add
kvm_vcpu_get_idx to get vcpu index in kvm->vcpus"), the implementation
was more than just a simple wrapper as vcpu->vcpu_idx did not exist and
retrieving the index meant walking over the vCPU array to find the given
vCPU.
When vcpu_idx was introduced by commit 8750e72a79 ("KVM: remember
position in kvm->vcpus array"), the helper was left behind, likely to
avoid extra thrash (but even then there were only two users, the original
arm usage having been removed at some point in the past).
No functional change intended.
Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210910183220.2397812-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Invoke rseq_handle_notify_resume() from tracehook_notify_resume() now
that the two function are always called back-to-back by architectures
that have rseq. The rseq helper is stubbed out for architectures that
don't support rseq, i.e. this is a nop across the board.
Note, tracehook_notify_resume() is horribly named and arguably does not
belong in tracehook.h as literally every line of code in it has nothing
to do with tracing. But, that's been true since commit a42c6ded82
("move key_repace_session_keyring() into tracehook_notify_resume()")
first usurped tracehook_notify_resume() back in 2012. Punt cleaning that
mess up to future patches.
No functional change intended.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210901203030.1292304-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The function __bad_area_nosemaphore() calls kernelmode_fixup_or_oops()
with the parameter @signal being actually @pkey, which will send a
signal numbered with the argument in @pkey.
This bug can be triggered when the kernel fails to access user-given
memory pages that are protected by a pkey, so it can go down the
do_user_addr_fault() path and pass the !user_mode() check in
__bad_area_nosemaphore().
Most cases will simply run the kernel fixup code to make an -EFAULT. But
when another condition current->thread.sig_on_uaccess_err is met, which
is only used to emulate vsyscall, the kernel will generate the wrong
signal.
Add a new parameter @pkey to kernelmode_fixup_or_oops() to fix this.
[ bp: Massage commit message, fix build error as reported by the 0day
bot: https://lkml.kernel.org/r/202109202245.APvuT8BX-lkp@intel.com ]
Fixes: 5042d40a26 ("x86/fault: Bypass no_context() for implicit kernel faults from usermode")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jiashuo Liang <liangjs@pku.edu.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20210730030152.249106-1-liangjs@pku.edu.cn
Russell reported that since 5.13, KVM's probing of the PMU has
started to fail on his HW. As it turns out, there is an implicit
ordering dependency between the architectural PMU probing code and
and KVM's own probing. If, due to probe ordering reasons, KVM probes
before the PMU driver, it will fail to detect the PMU and prevent it
from being advertised to guests as well as the VMM.
Obviously, this is one probing too many, and we should be able to
deal with any ordering.
Add a callback from the PMU code into KVM to advertise the registration
of a host CPU PMU, allowing for any probing order.
Fixes: 5421db1be3 ("KVM: arm64: Divorce the perf code from oprofile helpers")
Reported-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/YUYRKVflRtUytzy5@shell.armlinux.org.uk
Cc: stable@vger.kernel.org
Pull x86 fixes from Borislav Petkov:
- Prevent a infinite loop in the MCE recovery on return to user space,
which was caused by a second MCE queueing work for the same page and
thereby creating a circular work list.
- Make kern_addr_valid() handle existing PMD entries, which are marked
not present in the higher level page table, correctly instead of
blindly dereferencing them.
- Pass a valid address to sanitize_phys(). This was caused by the
mixture of inclusive and exclusive ranges. memtype_reserve() expect
'end' being exclusive, but sanitize_phys() wants it inclusive. This
worked so far, but with end being the end of the physical address
space the fail is exposed.
- Increase the maximum supported GPIO numbers for 64bit. Newer SoCs
exceed the previous maximum.
* tag 'x86_urgent_for_v5.15_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Avoid infinite loop for copy from user recovery
x86/mm: Fix kern_addr_valid() to cope with existing but not present entries
x86/platform: Increase maximum GPIO number for X86_64
x86/pat: Pass valid address to sanitize_phys()
MDIO-attached devices might have interrupts and other things that might
need quiesced when we kexec into a new kernel. Things are even more
creepy when those interrupt lines are shared, and in that case it is
absolutely mandatory to disable all interrupt sources.
Moreover, MDIO devices might be DSA switches, and DSA needs its own
shutdown method to unlink from the DSA master, which is a new
requirement that appeared after commit 2f1e8ea726 ("net: dsa: link
interfaces with the DSA master to get rid of lockdep warnings").
So introduce a ->shutdown method in the MDIO device driver structure.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull io_uring iov_iter retry fixes from Jens Axboe:
"This adds a helper to save/restore iov_iter state, and modifies
io_uring to use it.
After that is done, we can now kill the iter->truncated addition that
we added for this release. The io_uring change is being overly
cautious with the save/restore/advance, but better safe than sorry and
we can always improve that and reduce the overhead if it proves to be
of concern. The only case to be worried about in this regard is huge
IO, where iteration can take a while to iterate segments.
I spent some time writing test cases, and expanded the coverage quite
a bit from the last posting of this. liburing carries this regression
test case now:
https://git.kernel.dk/cgit/liburing/tree/test/file-verify.c
which exercises all of this. It now also supports provided buffers,
and explicitly tests for end-of-file/device truncation as well.
On top of that, Pavel sanitized the IOPOLL retry path to follow the
exact same pattern as normal IO"
* tag 'iov_iter.3-5.15-2021-09-17' of git://git.kernel.dk/linux-block:
io_uring: move iopoll reissue into regular IO path
Revert "iov_iter: track truncated size"
io_uring: use iov_iter state save/restore helpers
iov_iter: add helper to save iov_iter state
NXP Legal insists that the following are not fine:
- Saying "NXP Semiconductors" instead of "NXP", since the company's
registered name is "NXP"
- Putting a "(c)" sign in the copyright string
- Putting a comma in the copyright string
The only accepted copyright string format is "Copyright <year-range> NXP".
This patch changes the copyright headers in the networking files that
were sent by me, or derived from code sent by me.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge absolute_pointer macro series from Guenter Roeck:
"Kernel test builds currently fail for several architectures with error
messages such as the following.
drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe':
arch/m68k/include/asm/string.h:72:25: error:
'__builtin_memcpy' reading 6 bytes from a region of size 0
[-Werror=stringop-overread]
Such warnings may be reported by gcc 11.x for string and memory
operations on fixed addresses if gcc's builtin functions are used for
those operations.
This series introduces absolute_pointer() to fix the problem.
absolute_pointer() disassociates a pointer from its originating symbol
type and context, and thus prevents gcc from making assumptions about
pointers passed to memory operations"
* emailed patches from Guenter Roeck <linux@roeck-us.net>:
alpha: Use absolute_pointer to define COMMAND_LINE
alpha: Move setup.h out of uapi
net: i825xx: Use absolute_pointer for memcpy from fixed memory location
compiler.h: Introduce absolute_pointer macro
absolute_pointer() disassociates a pointer from its originating symbol
type and context. Use it to prevent compiler warnings/errors such as
drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe':
arch/m68k/include/asm/string.h:72:25: error:
'__builtin_memcpy' reading 6 bytes from a region of size 0 [-Werror=stringop-overread]
Such warnings may be reported by gcc 11.x for string and memory
operations on fixed addresses.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 2112ff5ce0.
We no longer need to track the truncation count, the one user that did
need it has been converted to using iov_iter_restore() instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The boot-time allocation interface for memblock is a mess, with
'memblock_alloc()' returning a virtual pointer, but then you are
supposed to free it with 'memblock_free()' that takes a _physical_
address.
Not only is that all kinds of strange and illogical, but it actually
causes bugs, when people then use it like a normal allocation function,
and it fails spectacularly on a NULL pointer:
https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/
or just random memory corruption if the debug checks don't catch it:
https://lore.kernel.org/all/61ab2d0c-3313-aaab-514c-e15b7aa054a0@suse.cz/
I really don't want to apply patches that treat the symptoms, when the
fundamental cause is this horribly confusing interface.
I started out looking at just automating a sane replacement sequence,
but because of this mix or virtual and physical addresses, and because
people have used the "__pa()" macro that can take either a regular
kernel pointer, or just the raw "unsigned long" address, it's all quite
messy.
So this just introduces a new saner interface for freeing a virtual
address that was allocated using 'memblock_alloc()', and that was kept
as a regular kernel pointer. And then it converts a couple of users
that are obvious and easy to test, including the 'xbc_nodes' case in
lib/bootconfig.c that caused problems.
Reported-by: kernel test robot <oliver.sang@intel.com>
Fixes: 40caa127f3 ("init: bootconfig: Remove all bootconfig data when the init memory is removed")
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently if a function ptr in struct_ops has a return value, its
caller will get a random return value from it, because the return
value of related BPF_PROG_TYPE_STRUCT_OPS prog is just dropped.
So adding a new flag BPF_TRAMP_F_RET_FENTRY_RET to tell bpf trampoline
to save and return the return value of struct_ops prog if ret_size of
the function ptr is greater than 0. Also restricting the flag to be
used alone.
Fixes: 85d33df357 ("bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210914023351.3664499-1-houtao1@huawei.com