Synchronize the shared mlx5 branch with net:
- From Jiri: fixe a deadlock in mlx5_ib's netdev notifier unregister.
- From Mark and Patrisious: add IPsec RoCEv2 support.
- From Or: Rely on firmware to get special mkeys
* branch mlx5-next:
RDMA/mlx5: Use query_special_contexts for mkeys
net/mlx5e: Use query_special_contexts for mkeys
net/mlx5: Change define name for 0x100 lkey value
net/mlx5: Expose bits for querying special mkeys
net/mlx5: Configure IPsec steering for egress RoCEv2 traffic
net/mlx5: Configure IPsec steering for ingress RoCEv2 traffic
net/mlx5: Add IPSec priorities in RDMA namespaces
net/mlx5: Implement new destination type TABLE_TYPE
net/mlx5: Introduce new destination type TABLE_TYPE
RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregister
net/mlx5e: Propagate an internal event in case uplink netdev changes
net/mlx5e: Fix trap event handling
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add IPSec flow steering priorities in RDMA namespaces. This allows
adding tables/rules to forward RoCEv2 traffic to the IPSec crypto
tables in NIC_TX domain, and accept RoCEv2 traffic from NIC_RX domain.
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Implement new destination type to support flow transition between
different table types.
e.g. from NIC_RX to RDMA_RX or from RDMA_TX to NIC_TX.
The new destination is described in the tracepoint as follows:
"mlx5_fs_add_rule: rule=00000000d53cd0ed fte=0000000048a8a6ed index=0 sw_action=<> [dst] flow_table_type=7 id:262152"
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
This new destination type supports flow transition between different
table types, e.g. from NIC_RX to RDMA_RX or from RDMA_TX to NIC_TX.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Whenever uplink netdev is set/cleared, propagate newly introduced event
to inform notifier blocks netdev was added/removed.
Move the set() helper to core.c from header, introduce clear() and
netdev_added_event_replay() helpers. The last one is going to be called
from rdma driver, so export it.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Pull NFS client fixes from Trond Myklebust:
- Fix a race in the RPCSEC_GSS upcall code that causes hung RPC calls
- Fix a broken coalescing test in the pNFS file layout driver
- Ensure that the access cache rcu path also applies the login test
- Fix up for a sparse warning
* tag 'nfs-for-6.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix up a sparse warning
NFS: Judge the file access cache's timestamp in rcu path
pNFS/filelayout: Fix coalescing test for single DS
SUNRPC: ensure the matching upcall is in-flight upon downcall
Pull drm fixes from Daniel Vetter:
"Still not much, but more than last week. Dave should be back next week
from the beaching.
drivers:
- i915-gvt fixes
- amdgpu/kfd fixes
- panfrost bo refcounting fix
- meson afbc corruption fix
- imx plane width fix
core:
- drm/sched fixes
- drm/mm kunit test fix
- dma-buf export error handling fixes"
* tag 'drm-fixes-2023-01-06' of git://anongit.freedesktop.org/drm/drm:
Revert "drm/amd/display: Enable Freesync Video Mode by default"
drm/i915/gvt: fix double free bug in split_2MB_gtt_entry
drm/i915/gvt: use atomic operations to change the vGPU status
drm/i915/gvt: fix vgpu debugfs clean in remove
drm/i915/gvt: fix gvt debugfs destroy
drm/i915: unpin on error in intel_vgpu_shadow_mm_pin()
drm/amd/display: Uninitialized variables causing 4k60 UCLK to stay at DPM1 and not DPM0
drm/amdkfd: Fix kernel warning during topology setup
drm/scheduler: Fix lockup in drm_sched_entity_kill()
drm/imx: ipuv3-plane: Fix overlay plane width
drm/scheduler: Fix lockup in drm_sched_entity_kill()
drm/virtio: Fix memory leak in virtio_gpu_object_create()
drm/meson: Reduce the FIFO lines held when AFBC is not used
drm/tests: reduce drm_mm_test stack usage
drm/panfrost: Fix GEM handle creation ref-counting
drm/plane-helper: Add the missing declaration of drm_atomic_state
dma-buf: fix dma_buf_export init order v2
Pull block fixes from Jens Axboe:
"The big change here is obviously the revert of the pktcdvd driver
removal. Outside of that, just minor tweaks. In detail:
- Re-instate the pktcdvd driver, which necessitates adding back
bio_copy_data_iter() and the fops->devnode() hook for now (me)
- Fix for splitting of a bio marked as NOWAIT, causing either nowait
reads or writes to error with EAGAIN even if parts of the IO
completed (me)
- Fix for ublk, punting management commands to io-wq as they can all
easily block for extended periods of time (Ming)
- Removal of SRCU dependency for the block layer (Paul)"
* tag 'block-2023-01-06' of git://git.kernel.dk/linux:
block: Remove "select SRCU"
Revert "pktcdvd: remove driver."
Revert "block: remove devnode callback from struct block_device_operations"
Revert "block: bio_copy_data_iter"
ublk: honor IO_URING_F_NONBLOCK for handling control command
block: don't allow splitting of a REQ_NOWAIT bio
block: handle bio_split_to_limits() NULL return
Pull io_uring fixes from Jens Axboe:
"A few minor fixes that should go into the 6.2 release:
- Fix for a memory leak in io-wq worker creation, if we ultimately
end up canceling the worker creation before it gets created (me)
- lockdep annotations for the CQ locking (Pavel)
- A regression fix for CQ timeout handling (Pavel)
- Ring pinning around deferred task_work fix (Pavel)
- A trivial member move in struct io_ring_ctx, saving us some memory
(me)"
* tag 'io_uring-2023-01-06' of git://git.kernel.dk/linux:
io_uring: fix CQ waiting timeout handling
io_uring: move 'poll_multi_queue' bool in io_ring_ctx
io_uring: lockdep annotate CQ locking
io_uring: pin context while queueing deferred tw
io_uring/io-wq: free worker if task_work creation is canceled
Pull ceph fixes from Ilya Dryomov:
"Two file locking fixes from Xiubo"
* tag 'ceph-for-6.2-rc3' of https://github.com/ceph/ceph-client:
ceph: avoid use-after-free in ceph_fl_release_lock()
ceph: switch to vfs_inode_has_locks() to fix file lock bug
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, wifi, and netfilter.
Current release - regressions:
- bpf: fix nullness propagation for reg to reg comparisons, avoid
null-deref
- inet: control sockets should not use current thread task_frag
- bpf: always use maximal size for copy_array()
- eth: bnxt_en: don't link netdev to a devlink port for VFs
Current release - new code bugs:
- rxrpc: fix a couple of potential use-after-frees
- netfilter: conntrack: fix IPv6 exthdr error check
- wifi: iwlwifi: fw: skip PPAG for JF, avoid FW crashes
- eth: dsa: qca8k: various fixes for the in-band register access
- eth: nfp: fix schedule in atomic context when sync mc address
- eth: renesas: rswitch: fix getting mac address from device tree
- mobile: ipa: use proper endpoint mask for suspend
Previous releases - regressions:
- tcp: add TIME_WAIT sockets in bhash2, fix regression caught by
Jiri / python tests
- net: tc: don't intepret cls results when asked to drop, fix
oob-access
- vrf: determine the dst using the original ifindex for multicast
- eth: bnxt_en:
- fix XDP RX path if BPF adjusted packet length
- fix HDS (header placement) and jumbo thresholds for RX packets
- eth: ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf,
avoid memory corruptions
Previous releases - always broken:
- ulp: prevent ULP without clone op from entering the LISTEN status
- veth: fix race with AF_XDP exposing old or uninitialized
descriptors
- bpf:
- pull before calling skb_postpull_rcsum() (fix checksum support
and avoid a WARN())
- fix panic due to wrong pageattr of im->image (when livepatch and
kretfunc coexist)
- keep a reference to the mm, in case the task is dead
- mptcp: fix deadlock in fastopen error path
- netfilter:
- nf_tables: perform type checking for existing sets
- nf_tables: honor set timeout and garbage collection updates
- ipset: fix hash:net,port,net hang with /0 subnet
- ipset: avoid hung task warning when adding/deleting entries
- selftests: net:
- fix cmsg_so_mark.sh test hang on non-x86 systems
- fix the arp_ndisc_evict_nocarrier test for IPv6
- usb: rndis_host: secure rndis_query check against int overflow
- eth: r8169: fix dmar pte write access during suspend/resume with
WOL
- eth: lan966x: fix configuration of the PCS
- eth: sparx5: fix reading of the MAC address
- eth: qed: allow sleep in qed_mcp_trace_dump()
- eth: hns3:
- fix interrupts re-initialization after VF FLR
- fix handling of promisc when MAC addr table gets full
- refine the handling for VF heartbeat
- eth: mlx5:
- properly handle ingress QinQ-tagged packets on VST
- fix io_eq_size and event_eq_size params validation on big endian
- fix RoCE setting at HCA level if not supported at all
- don't turn CQE compression on by default for IPoIB
- eth: ena:
- fix toeplitz initial hash key value
- account for the number of XDP-processed bytes in interface stats
- fix rx_copybreak value update
Misc:
- ethtool: harden phy stat handling against buggy drivers
- docs: netdev: convert maintainer's doc from FAQ to a normal
document"
* tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
caif: fix memory leak in cfctrl_linkup_request()
inet: control sockets should not use current thread task_frag
net/ulp: prevent ULP without clone op from entering the LISTEN status
qed: allow sleep in qed_mcp_trace_dump()
MAINTAINERS: Update maintainers for ptp_vmw driver
usb: rndis_host: Secure rndis_query check against int overflow
net: dpaa: Fix dtsec check for PCS availability
octeontx2-pf: Fix lmtst ID used in aura free
drivers/net/bonding/bond_3ad: return when there's no aggregator
netfilter: ipset: Rework long task execution when adding/deleting entries
netfilter: ipset: fix hash:net,port,net hang with /0 subnet
net: sparx5: Fix reading of the MAC address
vxlan: Fix memory leaks in error path
net: sched: htb: fix htb_classify() kernel-doc
net: sched: cbq: dont intepret cls results when asked to drop
net: sched: atm: dont intepret cls results when asked to drop
dt-bindings: net: marvell,orion-mdio: Fix examples
dt-bindings: net: sun8i-emac: Add phy-supply property
net: ipa: use proper endpoint mask for suspend
selftests: net: return non-zero for failures reported in arp_ndisc_evict_nocarrier
...
Pull virtio updates from Michael Tsirkin:
"Mostly fixes all over the place, a couple of cleanups"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (32 commits)
virtio_blk: Fix signedness bug in virtblk_prep_rq()
vdpa_sim_net: should not drop the multicast/broadcast packet
vdpasim: fix memory leak when freeing IOTLBs
vdpa: conditionally fill max max queue pair for stats
vdpa/vp_vdpa: fix kfree a wrong pointer in vp_vdpa_remove
vduse: Validate vq_num in vduse_validate_config()
tools/virtio: remove smp_read_barrier_depends()
tools/virtio: remove stray characters
vhost_vdpa: fix the crash in unmap a large memory
virtio: Implementing attribute show with sysfs_emit
virtio-crypto: fix memory leak in virtio_crypto_alg_skcipher_close_session()
tools/virtio: Variable type completion
vdpa_sim: fix vringh initialization in vdpasim_queue_ready()
virtio_blk: use UINT_MAX instead of -1U
vhost-vdpa: fix an iotlb memory leak
vhost: fix range used in translate_desc()
vringh: fix range used in iotlb_translate()
vhost/vsock: Fix error handling in vhost_vsock_init()
vdpa_sim: fix possible memory leak in vdpasim_net_init() and vdpasim_blk_init()
tools: Delete the unneeded semicolon after curly braces
...
The cacheline section holding this variable has two gaps, where one is
caused by this bool not packing well with structs. This causes it to
blow into the next cacheline. Move the variable, shrinking io_ring_ctx
by a full cacheline in size.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Use signed integer in ipv6_skip_exthdr() called from nf_confirm().
Reported by static analysis tooling, patch from Florian Westphal.
2) Missing set type checks in nf_tables: Validate that set declaration
matches the an existing set type, otherwise bail out with EEXIST.
Currently, nf_tables silently accepts the re-declaration with a
different type but it bails out later with EINVAL when the user adds
entries to the set. This fix is relatively large because it requires
two preparation patches that are included in this batch.
3) Do not ignore updates of timeout and gc_interval parameters in
existing sets.
4) Fix a hang when 0/0 subnets is added to a hash:net,port,net type of
ipset. Except hash:net,port,net and hash:net,iface, the set types don't
support 0/0 and the auxiliary functions rely on this fact. So 0/0 needs
a special handling in hash:net,port,net which was missing (hash:net,iface
was not affected by this bug), from Jozsef Kadlecsik.
5) When adding/deleting large number of elements in one step in ipset,
it can take a reasonable amount of time and can result in soft lockup
errors. This patch is a complete rework of the previous version in order
to use a smaller internal batch limit and at the same time removing
the external hard limit to add arbitrary number of elements in one step.
Also from Jozsef Kadlecsik.
Except for patch #1, which fixes a bug introduced in the previous net-next
development cycle, anything else has been broken for several releases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull btrfs fixes from David Sterba:
"First batch of regression and regular fixes:
- regressions:
- fix error handling after conversion to qstr for paths
- fix raid56/scrub recovery caused by uninitialized variable
after conversion to error bitmaps
- restore qgroup backref lookup behaviour after recent
refactoring
- fix leak of device lists at module exit time
- fix resolving backrefs for inline extent followed by prealloc
- reset defrag ioctl buffer on memory allocation error"
* tag 'for-6.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix fscrypt name leak after failure to join log transaction
btrfs: scrub: fix uninitialized return value in recover_scrub_rbio
btrfs: fix resolving backrefs for inline extent followed by prealloc
btrfs: fix trace event name typo for FLUSH_DELAYED_REFS
btrfs: restore BTRFS_SEQ_LAST when looking up qgroup backref lookup
btrfs: fix leak of fs devices after removing btrfs module
btrfs: fix an error handling path in btrfs_defrag_leaves()
btrfs: fix an error handling path in btrfs_rename()
When adding/deleting large number of elements in one step in ipset, it can
take a reasonable amount of time and can result in soft lockup errors. The
patch 5f7b51bf09 ("netfilter: ipset: Limit the maximal range of
consecutive elements to add/delete") tried to fix it by limiting the max
elements to process at all. However it was not enough, it is still possible
that we get hung tasks. Lowering the limit is not reasonable, so the
approach in this patch is as follows: rely on the method used at resizing
sets and save the state when we reach a smaller internal batch limit,
unlock/lock and proceed from the saved state. Thus we can avoid long
continuous tasks and at the same time removed the limit to add/delete large
number of elements in one step.
The nfnl mutex is held during the whole operation which prevents one to
issue other ipset commands in parallel.
Fixes: 5f7b51bf09 ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete")
Reported-by: syzbot+9204e7399656300bf271@syzkaller.appspotmail.com
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When ceph releasing the file_lock it will try to get the inode pointer
from the fl->fl_file, which the memory could already be released by
another thread in filp_close(). Because in VFS layer the fl->fl_file
doesn't increase the file's reference counter.
Will switch to use ceph dedicate lock info to track the inode.
And in ceph_fl_release_lock() we should skip all the operations if the
fl->fl_u.ceph.inode is not set, which should come from the request
file_lock. And we will set fl->fl_u.ceph.inode when inserting it to the
inode lock list, which is when copying the lock.
Link: https://tracker.ceph.com/issues/57986
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Now that phylink no longer calls phy_get_rate_matching with
PHY_INTERFACE_MODE_NA, phys no longer need to support it. Remove the
documentation mandating support.
Fixes: 7642cc28fd ("net: phylink: fix PHY validation with rate adaption")
Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It was discovered that MGMT_DATA2 can contain up to 28 bytes of data
instead of the 12 bytes written in the Documentation by accounting the
limit of 16 bytes declared in Documentation subtracting the first 4 byte
in the packet header.
Update the define with the real world value.
Tested-by: Ronald Wahl <ronald.wahl@raritan.com>
Fixes: c2ee8181fd ("net: dsa: tag_qca: add define for handling mgmt Ethernet packet")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Cc: stable@vger.kernel.org # v5.18+
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull Kbuild fixes from Masahiro Yamada:
- Fix broken BuildID
- Add srcrpm-pkg to the help message
- Fix the option order for modpost built with musl libc
- Fix the build dependency of rpm-pkg for openSUSE
* tag 'kbuild-fixes-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
fixdep: remove unneeded <stdarg.h> inclusion
kbuild: sort single-targets alphabetically again
kbuild: rpm-pkg: add libelf-devel as alternative for BuildRequires
kbuild: Fix running modpost with musl libc
kbuild: add a missing line for help message
.gitignore: ignore *.rpm
arch: fix broken BuildID for arm64 and riscv
kconfig: Add static text for search information in help menu
Pull ACPI fixes from Rafael Wysocki:
"These are new ACPI IRQ override quirks, low-power S0 idle (S0ix)
support adjustments and ACPI backlight handling fixes, mostly for
platforms using AMD chips.
Specifics:
- Add ACPI IRQ override quirks for Asus ExpertBook B2502, Lenovo
14ALC7, and XMG Core 15 (Hans de Goede, Adrian Freund, Erik
Schumacher).
- Adjust ACPI video detection fallback path to prevent
non-operational ACPI backlight devices from being created on
systems where the native driver does not detect a suitable panel
(Mario Limonciello).
- Fix Apple GMUX backlight detection (Hans de Goede).
- Add a low-power S0 idle (S0ix) handling quirk for HP Elitebook 865
and stop using AMD-specific low-power S0 idle code path for systems
with Rembrandt chips and newer (Mario Limonciello)"
* tag 'acpi-6.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: x86: s2idle: Stop using AMD specific codepath for Rembrandt+
ACPI: x86: s2idle: Force AMD GUID/_REV 2 on HP Elitebook 865
ACPI: video: Fix Apple GMUX backlight detection
ACPI: resource: Add Asus ExpertBook B2502 to Asus quirks
ACPI: resource: do IRQ override on Lenovo 14ALC7
ACPI: resource: do IRQ override on XMG Core 15
ACPI: video: Don't enable fallback path for creating ACPI backlight by default
drm/amd/display: Report to ACPI video if no panels were found
ACPI: video: Allow GPU drivers to report no panels
Dennis Gilmore reports that the BuildID is missing in the arm64 vmlinux
since commit 994b7ac169 ("arm64: remove special treatment for the
link order of head.o").
The issue is that the type of .notes section, which contains the BuildID,
changed from NOTES to PROGBITS.
Ard Biesheuvel figured out that whichever object gets linked first gets
to decide the type of a section. The PROGBITS type is the result of the
compiler emitting .note.GNU-stack as PROGBITS rather than NOTE.
While Ard provided a fix for arm64, I want to fix this globally because
the same issue is happening on riscv since commit 2348e6bf44 ("riscv:
remove special treatment for the link order of head.o"). This problem
will happen in general for other architectures if they start to drop
unneeded entries from scripts/head-object-list.txt.
Discard .note.GNU-stack in include/asm-generic/vmlinux.lds.h.
Link: https://lore.kernel.org/lkml/CAABkxwuQoz1CTbyb57n0ZX65eSYiTonFCU8-LCQc=74D=xE=rA@mail.gmail.com/
Fixes: 994b7ac169 ("arm64: remove special treatment for the link order of head.o")
Fixes: 2348e6bf44 ("riscv: remove special treatment for the link order of head.o")
Reported-by: Dennis Gilmore <dennis@ausil.us>
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Jiri Slaby reported regression of bind() with a simple repro. [0]
The repro creates a TIME_WAIT socket and tries to bind() a new socket
with the same local address and port. Before commit 28044fc1d4 ("net:
Add a bhash2 table hashed by port and address"), the bind() failed with
-EADDRINUSE, but now it succeeds.
The cited commit should have put TIME_WAIT sockets into bhash2; otherwise,
inet_bhash2_conflict() misses TIME_WAIT sockets when validating bind()
requests if the address is not a wildcard one.
The straight option is to move sk_bind2_node from struct sock to struct
sock_common to add twsk to bhash2 as implemented as RFC. [1] However, the
binary layout change in the struct sock could affect performances moving
hot fields on different cachelines.
To avoid that, we add another TIME_WAIT list in inet_bind2_bucket and check
it while validating bind().
[0]: https://lore.kernel.org/netdev/6b971a4e-c7d8-411e-1f92-fda29b5b2fb9@kernel.org/
[1]: https://lore.kernel.org/netdev/20221221151258.25748-2-kuniyu@amazon.com/
Fixes: 28044fc1d4 ("net: Add a bhash2 table hashed by port and address")
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull block fixes from Jens Axboe:
"Mostly just NVMe, but also a single fixup for BFQ for a regression
that happened during the merge window. In detail:
- NVMe pull requests via Christoph:
- Fix doorbell buffer value endianness (Klaus Jensen)
- Fix Linux vs NVMe page size mismatch (Keith Busch)
- Fix a potential use memory access beyong the allocation limit
(Keith Busch)
- Fix a multipath vs blktrace NULL pointer dereference (Yanjun
Zhang)
- Fix various problems in handling the Command Supported and
Effects log (Christoph Hellwig)
- Don't allow unprivileged passthrough of commands that don't
transfer data but modify logical block content (Christoph
Hellwig)
- Add a features and quirks policy document (Christoph Hellwig)
- Fix some really nasty code that was correct but made smatch
complain (Sagi Grimberg)
- Use-after-free regression in BFQ from this merge window (Yu)"
* tag 'block-6.2-2022-12-29' of git://git.kernel.dk/linux:
nvme-auth: fix smatch warning complaints
nvme: consult the CSE log page for unprivileged passthrough
nvme: also return I/O command effects from nvme_command_effects
nvmet: don't defer passthrough commands with trivial effects to the workqueue
nvmet: set the LBCC bit for commands that modify data
nvmet: use NVME_CMD_EFFECTS_CSUPP instead of open coding it
nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definition
docs, nvme: add a feature and quirk policy document
nvme-pci: update sqsize when adjusting the queue depth
nvme: fix setting the queue depth in nvme_alloc_io_tag_set
block, bfq: fix uaf for bfqq in bfq_exit_icq_bfqq
nvme: fix multipath crash caused by flush request when blktrace is enabled
nvme-pci: fix page size checks
nvme-pci: fix mempool alloc size
nvme-pci: fix doorbell buffer value endianness
Pull io_uring fixes from Jens Axboe:
- Two fixes for mutex grabbing when the task state is != TASK_RUNNING
(me)
- Check for invalid opcode in io_uring_register() a bit earlier, to
avoid going through the quiesce machinery just to return -EINVAL
later in the process (me)
- Fix for the uapi io_uring header, skipping including time_types.h
when necessary (Stefan)
* tag 'io_uring-6.2-2022-12-29' of git://git.kernel.dk/linux:
uapi:io_uring.h: allow linux/time_types.h to be skipped
io_uring: check for valid register opcode earlier
io_uring/cancel: re-grab ctx mutex after finishing wait
io_uring: finish waiting before flushing overflow entries
Fix SRIOV VST mode behavior to insert cvlan when a guest tag is already
present in the frame. Previous VST mode behavior was to drop packets or
override existing tag, depending on the device version.
In this patch we fix this behavior by correctly building the HW steering
rule with a push vlan action, or for older devices we ask the FW to stack
the vlan when a vlan is already present.
Fixes: 07bab95026 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
Fixes: dfcb1ed3c3 ("net/mlx5: E-Switch, Vport ingress/egress ACLs rules for VST mode")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Commands like Write Zeros can change the contents of a namespaces without
actually transferring data. To protect against this, check the Commands
Supported and Effects log is supported by the controller for any
unprivileg command passthrough and refuse unprivileged passthrough if the
command has any effects that can change data or metadata.
Note: While the Commands Support and Effects log page has only been
mandatory since NVMe 2.0, it is widely supported because Windows requires
it for any command passthrough from userspace.
Fixes: e4fbcf32c8 ("nvme: identify-namespace without CAP_SYS_ADMIN")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
3 << 16 does not generate the correct mask for bits 16, 17 and 18.
Use the GENMASK macro to generate the correct mask instead.
Fixes: 84fef62d13 ("nvme: check admin passthru command effects")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
x86:
* several fixes to nested VMX execution controls
* fixes and clarification to the documentation for Xen emulation
* do not unnecessarily release a pmu event with zero period
* MMU fixes
* fix Coverity warning in kvm_hv_flush_tlb()
selftests:
* fixes for the ucall mechanism in selftests
* other fixes mostly related to compilation with clang
Rudi reports a compilation failure on x86_64 when CONFIG_NET_CLS or
CONFIG_NET_CLS_ACT is not set but CONFIG_RETPOLINE is set.
A misplaced '#endif' was causing the issue.
Fixes: 7f0e810220 ("net/sched: add retpoline wrapper for tc")
Tested-by: Rudi Heitbaum <rudi@heitbaum.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can merge VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES with
VDPA_ATTR_DEV_FEATURES which is functionally equivalent.
While at it, tweak the comment in header file to make
user provioned device features distinguished from those
supported by the parent mgmtdev device: the former of
which can be inherited as a whole from the latter, or
can be a subset of the latter if explicitly specified.
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Message-Id: <1665422823-18364-1-git-send-email-si-wei.liu@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
At the end of rxrpc_recvmsg(), if a call is found, the call is put and then
a trace line is emitted referencing that call in a couple of places - but
the call may have been deallocated by the time those traces happen.
Fix this by stashing the call debug_id in a variable and passing that to
the tracepoint rather than the call pointer.
Fixes: 849979051c ("rxrpc: Add a tracepoint to follow what recvmsg does")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
These are (uint64_t)-1 magic values are a userspace ABI, allowing the
shared info pages and other enlightenments to be disabled. This isn't
a Xen ABI because Xen doesn't let the guest turn these off except with
the full SHUTDOWN_soft_reset mechanism. Under KVM, the userspace VMM is
expected to handle soft reset, and tear down the kernel parts of the
enlightenments accordingly.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20221226120320.1125390-5-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>