In many deployments, we want the option to not learn a neighbor from
garp if the src ip is not in the same subnet as an address configured
on the interface that received the garp message. net.ipv4.arp_accept
sysctl is currently used to control creation of a neigh from a
received garp packet. This patch adds a new option '2' to
net.ipv4.arp_accept which extends option '1' by including the subnet
check.
Signed-off-by: Jaehee Park <jhpark1013@gmail.com>
Suggested-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Saeed Mahameed says:
====================
mlx5-updates-2022-07-13
1) Support 802.1ad for bridge offloads
Vlad Buslov Says:
=================
Current mlx5 bridge VLAN offload implementation only supports 802.1Q VLAN
Ethernet protocol. That protocol type is assumed by default and
SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL notification is ignored.
In order to support dynamically setting VLAN protocol handle
SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL notification by flushing FDB and
re-creating VLAN modify header actions with a new protocol. Implement support
for 802.1ad protocol by saving the current VLAN protocol to per-bridge variable
and re-create the necessary flow groups according to its current value (either
use cvlan or svlan flow fields).
==================
2) debugfs to count ongoing FW commands
3) debugfs to query eswitch vport firmware diagnostic counters
4) Add missing meter configuration in flow action
5) Some misc cleanup
* tag 'mlx5-updates-2022-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Remove the duplicating check for striding RQ when enabling LRO
net/mlx5e: Move the LRO-XSK check to mlx5e_fix_features
net/mlx5e: Extend flower police validation
net/mlx5e: configure meter in flow action
net/mlx5e: Removed useless code in function
net/mlx5: Bridge, implement QinQ support
net/mlx5: Bridge, implement infrastructure for VLAN protocol change
net/mlx5: Bridge, extract VLAN push/pop actions creation
net/mlx5: Bridge, rename filter fg to vlan_filter
net/mlx5: Bridge, refactor groups sizes and indices
net/mlx5: debugfs, Add num of in-use FW command interface slots
net/mlx5: Expose vnic diagnostic counters for eswitch managed vports
net/mlx5: Use software VHCA id when it's supported
net/mlx5: Introduce ifc bits for using software vhca id
net/mlx5: Use the bitmap API to allocate bitmaps
====================
Link: https://lore.kernel.org/r/20220713225859.401241-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull integrity fixes from Mimi Zohar:
"Here are a number of fixes for recently found bugs.
Only 'ima: fix violation measurement list record' was introduced in
the current release. The rest address existing bugs"
* tag 'integrity-v5.19-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: Fix potential memory leak in ima_init_crypto()
ima: force signature verification when CONFIG_KEXEC_SIG is configured
ima: Fix a potential integer overflow in ima_appraise_measurement
ima: fix violation measurement list record
Revert "evm: Fix memleak in init_desc"
Use software VHCA id when it's supported by the firmware.
A unique id is allocated upon mlx5_mdev_init() and freed upon
mlx5_mdev_uninit(), as such it stays the same during the full life cycle
of the device including upon health recovery if occurred.
The conjunction of sw_vhca_id with sw_owner_id will be a global unique
id per function which uses mlx5_core.
The sw_vhca_id is set upon init_hca command and is used to specify the
VHCA that the NIC vport is affiliated with.
This functionality is needed upon migration of VM which is MPV based.
(i.e. multi port device).
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Introduce ifc related stuff to enable using software vhca id
functionality.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Pull cgroup fix from Tejun Heo:
"Fix an old and subtle bug in the migration path.
css_sets are used to track tasks and migrations are tasks moving from
a group of css_sets to another group of css_sets. The migration path
pins all source and destination css_sets in the prep stage.
Unfortunately, it was overloading the same list_head entry to track
sources and destinations, which got confused for migrations which are
partially identity leading to use-after-frees.
Fixed by using dedicated list_heads for tracking sources and
destinations"
* tag 'cgroup-for-5.19-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Use separate src/dst nodes when preloading css_sets for migration
Currently, an unsigned kernel could be kexec'ed when IMA arch specific
policy is configured unless lockdown is enabled. Enforce kernel
signature verification check in the kexec_file_load syscall when IMA
arch specific policy is configured.
Fixes: 99d5cadfde ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE")
Reported-and-suggested-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Johannes Berg says:
====================
A fairly large set of updates for next, highlights:
ath10k
* ethernet frame format support
rtw89
* TDLS support
cfg80211/mac80211
* airtime fairness fixes
* EHT support continued, especially in AP mode
* initial (and still major) rework for multi-link
operation (MLO) from 802.11be/wifi 7
As usual, also many small updates/cleanups/fixes/etc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of the flows invoked by mlx5_devlink_eswitch_mode_set() get to
mlx5_rescan_drivers_locked() which can call mlx5e_probe()/mlx5e_remove
and register/unregister mlx5e driver ports accordingly. This can lead to
deadlock once mlx5_devlink_eswitch_mode_set() will use devlink lock.
Use devl_port_register/unregister() instead of
devlink_port_register/unregister() and add devlink instance locks in the
driver paths to this function to have it locked while calling devl_ API
function.
If remove or probe were called by module init or module cleanup flows,
need to lock devlink just before calling devl_port_register(), otherwise
it is called by attach/detach or register/unregister flow and we can
have the flow locked. Added flag to distinguish between these cases.
This will be used by the downstream patch to invoke
mlx5_devlink_eswitch_mode_set() with devlink locked.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pull x86 retbleed fixes from Borislav Petkov:
"Just when you thought that all the speculation bugs were addressed and
solved and the nightmare is complete, here's the next one: speculating
after RET instructions and leaking privileged information using the
now pretty much classical covert channels.
It is called RETBleed and the mitigation effort and controlling
functionality has been modelled similar to what already existing
mitigations provide"
* tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
x86/speculation: Disable RRSBA behavior
x86/kexec: Disable RET on kexec
x86/bugs: Do not enable IBPB-on-entry when IBPB is not supported
x86/entry: Move PUSH_AND_CLEAR_REGS() back into error_entry
x86/bugs: Add Cannon lake to RETBleed affected CPU list
x86/retbleed: Add fine grained Kconfig knobs
x86/cpu/amd: Enumerate BTC_NO
x86/common: Stamp out the stepping madness
KVM: VMX: Prevent RSB underflow before vmenter
x86/speculation: Fill RSB on vmexit for IBRS
KVM: VMX: Fix IBRS handling after vmexit
KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS
KVM: VMX: Convert launched argument to flags
KVM: VMX: Flatten __vmx_vcpu_run()
objtool: Re-add UNWIND_HINT_{SAVE_RESTORE}
x86/speculation: Remove x86_spec_ctrl_mask
x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit
x86/speculation: Fix SPEC_CTRL write on SMT state change
x86/speculation: Fix firmware entry SPEC_CTRL handling
x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=n
...
Pull hotfixes from Andrew Morton:
"Mainly MM fixes. About half for issues which were introduced after
5.18 and the remainder for longer-term issues"
* tag 'mm-hotfixes-stable-2022-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: split huge PUD on wp_huge_pud fallback
nilfs2: fix incorrect masking of permission flags for symlinks
mm/rmap: fix dereferencing invalid subpage pointer in try_to_migrate_one()
riscv/mm: fix build error while PAGE_TABLE_CHECK enabled without MMU
Documentation: highmem: use literal block for code example in highmem.h comment
mm: sparsemem: fix missing higher order allocation splitting
mm/damon: use set_huge_pte_at() to make huge pte old
sh: convert nommu io{re,un}map() to static inline functions
mm: userfaultfd: fix UFFDIO_CONTINUE on fallocated shmem pages
As Chris explains, the comment above exit_itimers() is not correct,
we can race with proc_timers_seq_ops. Change exit_itimers() to clear
signal->posix_timers with ->siglock held.
Cc: <stable@vger.kernel.org>
Reported-by: chris@accessvector.net
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull char/misc driver fixes from Greg KH:
"Here are four small char/misc driver fixes for 5.19-rc6 to resolve
some reported issues. They only affect two drivers:
- rtsx_usb: fix for of-reported DMA warning error, the driver was
handling memory buffers in odd ways, it has now been fixed up to be
much simpler and correct by Shuah.
- at25 eeprom driver bugfix for reported problem
All of these have been in linux-next for a week with no reported
problems"
* tag 'char-misc-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
misc: rtsx_usb: set return value in rsp_buf alloc err path
misc: rtsx_usb: use separate command and response buffers
misc: rtsx_usb: fix use of dma mapped buffer for usb bulk transfer
eeprom: at25: Rework buggy read splitting
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-07-09
We've added 94 non-merge commits during the last 19 day(s) which contain
a total of 125 files changed, 5141 insertions(+), 6701 deletions(-).
The main changes are:
1) Add new way for performing BTF type queries to BPF, from Daniel Müller.
2) Add inlining of calls to bpf_loop() helper when its function callback is
statically known, from Eduard Zingerman.
3) Implement BPF TCP CC framework usability improvements, from Jörn-Thorben Hinz.
4) Add LSM flavor for attaching per-cgroup BPF programs to existing LSM
hooks, from Stanislav Fomichev.
5) Remove all deprecated libbpf APIs in prep for 1.0 release, from Andrii Nakryiko.
6) Add benchmarks around local_storage to BPF selftests, from Dave Marchevsky.
7) AF_XDP sample removal (given move to libxdp) and various improvements around AF_XDP
selftests, from Magnus Karlsson & Maciej Fijalkowski.
8) Add bpftool improvements for memcg probing and bash completion, from Quentin Monnet.
9) Add arm64 JIT support for BPF-2-BPF coupled with tail calls, from Jakub Sitnicki.
10) Sockmap optimizations around throughput of UDP transmissions which have been
improved by 61%, from Cong Wang.
11) Rework perf's BPF prologue code to remove deprecated functions, from Jiri Olsa.
12) Fix sockmap teardown path to avoid sleepable sk_psock_stop, from John Fastabend.
13) Fix libbpf's cleanup around legacy kprobe/uprobe on error case, from Chuang Wang.
14) Fix libbpf's bpf_helpers.h to work with gcc for the case of its sec/pragma
macro, from James Hilliard.
15) Fix libbpf's pt_regs macros for riscv to use a0 for RC register, from Yixun Lan.
16) Fix bpftool to show the name of type BPF_OBJ_LINK, from Yafang Shao.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (94 commits)
selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n
bpf: Correctly propagate errors up from bpf_core_composites_match
libbpf: Disable SEC pragma macro on GCC
bpf: Check attach_func_proto more carefully in check_return_code
selftests/bpf: Add test involving restrict type qualifier
bpftool: Add support for KIND_RESTRICT to gen min_core_btf command
MAINTAINERS: Add entry for AF_XDP selftests files
selftests, xsk: Rename AF_XDP testing app
bpf, docs: Remove deprecated xsk libbpf APIs description
selftests/bpf: Add benchmark for local_storage RCU Tasks Trace usage
libbpf, riscv: Use a0 for RC register
libbpf: Remove unnecessary usdt_rel_ip assignments
selftests/bpf: Fix few more compiler warnings
selftests/bpf: Fix bogus uninitialized variable warning
bpftool: Remove zlib feature test from Makefile
libbpf: Cleanup the legacy uprobe_event on failed add/attach_event()
libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy()
libbpf: Cleanup the legacy kprobe_event on failed add/attach_event()
selftests/bpf: Add type match test against kernel's task_struct
selftests/bpf: Add nested type to type based tests
...
====================
Link: https://lore.kernel.org/r/20220708233145.32365-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull fscache fixes from David Howells:
- Fix a check in fscache_wait_on_volume_collision() in which the
polarity is reversed. It should complain if a volume is still marked
acquisition-pending after 20s, but instead complains if the mark has
been cleared (ie. the condition has cleared).
Also switch an open-coded test of the ACQUIRE_PENDING volume flag to
use the helper function for consistency.
- Not a fix per se, but neaten the code by using a helper to check for
the DROPPED state.
- Fix cachefiles's support for erofs to only flush requests associated
with a released control file, not all requests.
- Fix a race between one process invalidating an object in the cache
and another process trying to look it up.
* tag 'fscache-fixes-20220708' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
fscache: Fix invalidation/lookup race
cachefiles: narrow the scope of flushed requests when releasing fd
fscache: Introduce fscache_cookie_is_dropped()
fscache: Fix if condition in fscache_wait_on_volume_collision()
Pull ACPI fixes from Rafael Wysocki:
"These fix two recent regressions related to CPPC support.
Specifics:
- Prevent _CPC from being used if the platform firmware does not
confirm CPPC v2 support via _OSC (Mario Limonciello)
- Allow systems with X86_FEATURE_CPPC set to use _CPC even if CPPC
support cannot be agreed on via _OSC (Mario Limonciello)"
* tag 'acpi-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: CPPC: Don't require _OSC if X86_FEATURE_CPPC is supported
ACPI: CPPC: Only probe for _CPC if CPPC v2 is acked
Pull power management fixes from Rafael Wysocki:
"These fix a NULL pointer dereference in a devfreq driver and a runtime
PM framework issue that may cause a supplier device to be suspended
before its consumer.
Specifics:
- Fix NULL pointer dereference related to printing a diagnostic
message in the exynos-bus devfreq driver (Christian Marangi)
- Fix race condition in the runtime PM framework which in some cases
may cause a supplier device to be suspended when its consumer is
still active (Rafael Wysocki)"
* tag 'pm-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / devfreq: exynos-bus: Fix NULL pointer dereference
PM: runtime: Fix supplier device management during consumer probe
PM: runtime: Redefine pm_runtime_release_supplier()
Pull iommu fixes from Joerg Roedel:
- fix device setup failures in the Intel VT-d driver when the PASID
table is shared
- fix Intel VT-d device hot-add failure due to wrong device notifier
order
- remove the old IOMMU mailing list from the MAINTAINERS file now that
it has been retired
* tag 'iommu-fixes-v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
MAINTAINERS: Remove iommu@lists.linux-foundation.org
iommu/vt-d: Fix RID2PASID setup/teardown failure
iommu/vt-d: Fix PCI bus rescan device hot add
Pull block fixes from Jens Axboe:
"NVMe pull request with another id quirk addition, and a tracing fix"
* tag 'block-5.19-2022-07-08' of git://git.kernel.dk/linux-block:
nvme: use struct group for generic command dwords
nvme-pci: phison e16 has bogus namespace ids
We need to prevent that users configure a screen size which is smaller than the
currently selected font size. Otherwise rendering chars on the screen will
access memory outside the graphics memory region.
This patch adds a new function fbcon_modechange_possible() which
implements this check and which later may be extended with other checks
if necessary. The new function is called from the FBIOPUT_VSCREENINFO
ioctl handler in fbmem.c, which will return -EINVAL if userspace asked
for a too small screen size.
Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org # v5.4+
This will allow the trace event to know the full size of the data
intended to be copied and silence read overflow checks.
Reported-by: John Garry <john.garry@huawei.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Since optimisitic decrypt may add extra load in case of retries
require socket owner to explicitly opt-in.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IOMMU driver shares the pasid table for PCI alias devices. When the
RID2PASID entry of the shared pasid table has been filled by the first
device, the subsequent device will encounter the "DMAR: Setup RID2PASID
failed" failure as the pasid entry has already been marked as present.
As the result, the IOMMU probing process will be aborted.
On the contrary, when any alias device is hot-removed from the system,
for example, by writing to /sys/bus/pci/devices/.../remove, the shared
RID2PASID will be cleared without any notifications to other devices.
As the result, any DMAs from those rest devices are blocked.
Sharing pasid table among PCI alias devices could save two memory pages
for devices underneath the PCIe-to-PCI bridges. Anyway, considering that
those devices are rare on modern platforms that support VT-d in scalable
mode and the saved memory is negligible, it's reasonable to remove this
part of immature code to make the driver feasible and stable.
Fixes: ef848b7e5a ("iommu/vt-d: Setup pasid entry for RID2PASID support")
Reported-by: Chenyi Qiang <chenyi.qiang@intel.com>
Reported-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220623065720.727849-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20220625133430.2200315-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Previously the kernel used to ignore whether the firmware masked CPPC
or CPPCv2 and would just pretend that it worked.
When support for the USB4 bit in _OSC was introduced from commit
9e1f561afb ("ACPI: Execute platform _OSC also with query bit clear")
the kernel began to look at the return when the query bit was clear.
This caused regressions that were misdiagnosed and attempted to be solved
as part of commit 2ca8e62852 ("Revert "ACPI: Pass the same capabilities
to the _OSC regardless of the query flag""). This caused a different
regression where non-Intel systems weren't able to negotiate _OSC
properly.
This was reverted in commit 2ca8e62852 ("Revert "ACPI: Pass the same
capabilities to the _OSC regardless of the query flag"") and attempted to
be fixed by commit c42fa24b44 ("ACPI: bus: Avoid using CPPC if not
supported by firmware") but the regression still returned.
These systems with the regression only load support for CPPC from an SSDT
dynamically when _OSC reports CPPC v2. Avoid the problem by not letting
CPPC satisfy the requirement in `acpi_cppc_processor_probe`.
Reported-by: CUI Hao <cuihao.leo@gmail.com>
Reported-by: maxim.novozhilov@gmail.com
Reported-by: lethe.tree@protonmail.com
Reported-by: garystephenwright@gmail.com
Reported-by: galaxyking0419@gmail.com
Fixes: c42fa24b44 ("ACPI: bus: Avoid using CPPC if not supported by firmware")
Fixes: 2ca8e62852 ("Revert "ACPI Pass the same capabilities to the _OSC regardless of the query flag"")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213023
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2075387
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: CUI Hao <cuihao.leo@gmail.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If an NFS file is opened for writing and closed, fscache_invalidate() will
be asked to invalidate the file - however, if the cookie is in the
LOOKING_UP state (or the CREATING state), then request to invalidate
doesn't get recorded for fscache_cookie_state_machine() to do something
with.
Fix this by making __fscache_invalidate() set a flag if it sees the cookie
is in the LOOKING_UP state to indicate that we need to go to invalidation.
Note that this requires a count on the n_accesses counter for the state
machine, which that will release when it's done.
fscache_cookie_state_machine() then shifts to the INVALIDATING state if it
sees the flag.
Without this, an nfs file can get corrupted if it gets modified locally and
then read locally as the cache contents may not get updated.
Fixes: d24af13e2e ("fscache: Implement cookie invalidation")
Reported-by: Max Kellermann <mk@cm4all.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Max Kellermann <mk@cm4all.com>
Link: https://lore.kernel.org/r/YlWWbpW5Foynjllo@rabbit.intern.cm-ag [1]
Add support for BCM53128 internal PHYs. These support interrupts as well as
statistics. Therefore, enable the Broadcom PHY driver for them.
Tested on BCM53128 switch using the mainline b53 DSA driver.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When building htmldocs on Linus's tree, there are inline emphasis warnings
on include/linux/highmem.h:
Documentation/vm/highmem:166: ./include/linux/highmem.h:154: WARNING: Inline emphasis start-string without end-string.
Documentation/vm/highmem:166: ./include/linux/highmem.h:157: WARNING: Inline emphasis start-string without end-string.
These warnings above are due to comments in code example at the mentioned
lines above are enclosed by double dash (--), which confuses Sphinx as
inline markup delimiters instead.
Fix these warnings by indenting the code example with literal block
indentation and making the comments C comments.
Link: https://lkml.kernel.org/r/20220622084546.17745-1-bagasdotme@gmail.com
Fixes: 85a85e7601 ("Documentation/vm: move "Using kmap-atomic" to highmem.h")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Fabio M. De Francesco" <fmdefrancesco@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Looking at the conditional lock acquire functions in the kernel due to
the new sparse support (see commit 4a557a5d1a "sparse: introduce
conditional lock acquire function attribute"), it became obvious that
the lockref code has a couple of them, but they don't match the usual
naming convention for the other ones, and their return value logic is
also reversed.
In the other very similar places, the naming pattern is '*_and_lock()'
(eg 'atomic_put_and_lock()' and 'refcount_dec_and_lock()'), and the
function returns true when the lock is taken.
The lockref code is superficially very similar to the refcount code,
only with the special "atomic wrt the embedded lock" semantics. But
instead of the '*_and_lock()' naming it uses '*_or_lock()'.
And instead of returning true in case it took the lock, it returns true
if it *didn't* take the lock.
Now, arguably the reflock code is quite logical: it really is a "either
decrement _or_ lock" kind of situation - and the return value is about
whether the operation succeeded without any special care needed.
So despite the similarities, the differences do make some sense, and
maybe it's not worth trying to unify the different conditional locking
primitives in this area.
But while looking at this all, it did become obvious that the
'lockref_get_or_lock()' function hasn't actually had any users for
almost a decade.
The only user it ever had was the shortlived 'd_rcu_to_refcount()'
function, and it got removed and replaced with 'lockref_get_not_dead()'
back in 2013 in commits 0d98439ea3 ("vfs: use lockred 'dead' flag to
mark unrecoverably dead dentries") and e5c832d555 ("vfs: fix dentry
RCU to refcounting possibly sleeping dput()")
In fact, that single use was removed less than a week after the whole
function was introduced in commit b3abd80250 ("lockref: add
'lockref_get_or_lock() helper") so this function has been around for a
decade, but only had a user for six days.
Let's just put this mis-designed and unused function out of its misery.
We can think about the naming and semantic oddities of the remaining
'lockref_put_or_lock()' later, but at least that function has users.
And while the naming is different and the return value doesn't match,
that function matches the whole '{atomic,refcount}_dec_and_test()'
pattern much better (ie the magic happens when the count goes down to
zero, not when it is incremented from zero).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kernel tends to try to avoid conditional locking semantics because
it makes it harder to think about and statically check locking rules,
but we do have a few fundamental locking primitives that take locks
conditionally - most obviously the 'trylock' functions.
That has always been a problem for 'sparse' checking for locking
imbalance, and we've had a special '__cond_lock()' macro that we've used
to let sparse know how the locking works:
# define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0)
so that you can then use this to tell sparse that (for example) the
spinlock trylock macro ends up acquiring the lock when it succeeds, but
not when it fails:
#define raw_spin_trylock(lock) __cond_lock(lock, _raw_spin_trylock(lock))
and then sparse can follow along the locking rules when you have code like
if (!spin_trylock(&dentry->d_lock))
return LRU_SKIP;
.. sparse sees that the lock is held here..
spin_unlock(&dentry->d_lock);
and sparse ends up happy about the lock contexts.
However, this '__cond_lock()' use does result in very ugly header files,
and requires you to basically wrap the real function with that macro
that uses '__cond_lock'. Which has made PeterZ NAK things that try to
fix sparse warnings over the years [1].
To solve this, there is now a very experimental patch to sparse that
basically does the exact same thing as '__cond_lock()' did, but using a
function attribute instead. That seems to make PeterZ happy [2].
Note that this does not replace existing use of '__cond_lock()', but
only exposes the new proposed attribute and uses it for the previously
unannotated 'refcount_dec_and_lock()' family of functions.
For existing sparse installations, this will make no difference (a
negative output context was ignored), but if you have the experimental
sparse patch it will make sparse now understand code that uses those
functions, the same way '__cond_lock()' makes sparse understand the very
similar 'atomic_dec_and_lock()' uses that have the old '__cond_lock()'
annotations.
Note that in some cases this will silence existing context imbalance
warnings. But in other cases it may end up exposing new sparse warnings
for code that sparse just didn't see the locking for at all before.
This is a trial, in other words. I'd expect that if it ends up being
successful, and new sparse releases end up having this new attribute,
we'll migrate the old-style '__cond_lock()' users to use the new-style
'__cond_acquires' function attribute.
The actual experimental sparse patch was posted in [3].
Link: https://lore.kernel.org/all/20130930134434.GC12926@twins.programming.kicks-ass.net/ [1]
Link: https://lore.kernel.org/all/Yr60tWxN4P568x3W@worktop.programming.kicks-ass.net/ [2]
Link: https://lore.kernel.org/all/CAHk-=wjZfO9hGqJ2_hGQG3U_XzSh9_XaXze=HgPdvJbgrvASfA@mail.gmail.com/ [3]
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Aring <aahringo@redhat.com>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marc Kleine-Budde says:
====================
pull-request: can-next 2022-07-03
this is a pull request of 15 patches for net-next/master.
The first 2 patches are by Max Staudt and add the can327 serial CAN
driver along with a new line discipline ID.
The next patch is by me an fixes a typo in the ctucanfd driver.
The last 12 patches are by Dario Binacchi and integrate slcan CAN
serial driver better into the existing CAN driver API.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, there are three eswitch modes, none, legacy and switchdev.
None is the default mode. Remove redundant none mode as eswitch mode
should always be either legacy mode or switchdev mode.
With this patch, there are two behavior changes:
1. Legacy becomes the default mode. When querying eswitch mode using
devlink, a valid mode is always returned.
2. When disabling sriov, the eswitch mode will not change, only vfs
are unloaded.
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Most drivers use "skb_transport_offset(skb) + tcp_hdrlen(skb)"
to compute headers length for a TCP packet, but others
use more convoluted (but equivalent) ways.
Add skb_tcp_all_headers() and skb_inner_tcp_all_headers()
helpers to harmonize this a bit.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull power management fixes from Rafael Wysocki:
"These fix some issues in cpufreq drivers and some issues in devfreq:
- Fix error code path issues related PROBE_DEFER handling in devfreq
(Christian Marangi)
- Revert an editing accident in SPDX-License line in the devfreq
passive governor (Lukas Bulwahn)
- Fix refcount leak in of_get_devfreq_events() in the exynos-ppmu
devfreq driver (Miaoqian Lin)
- Use HZ_PER_KHZ macro in the passive devfreq governor (Yicong Yang)
- Fix missing of_node_put for qoriq and pmac32 driver (Liang He)
- Fix issues around throttle interrupt for qcom driver (Stephen Boyd)
- Add MT8186 to cpufreq-dt-platdev blocklist (AngeloGioacchino Del
Regno)
- Make amd-pstate enable CPPC on resume from S3 (Jinzhou Su)"
* tag 'pm-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / devfreq: passive: revert an editing accident in SPDX-License line
PM / devfreq: Fix kernel warning with cpufreq passive register fail
PM / devfreq: Rework freq_table to be local to devfreq struct
PM / devfreq: exynos-ppmu: Fix refcount leak in of_get_devfreq_events
PM / devfreq: passive: Use HZ_PER_KHZ macro in units.h
PM / devfreq: Fix cpufreq passive unregister erroring on PROBE_DEFER
PM / devfreq: Mute warning on governor PROBE_DEFER
PM / devfreq: Fix kernel panic with cpu based scaling to passive gov
cpufreq: Add MT8186 to cpufreq-dt-platdev blocklist
cpufreq: pmac32-cpufreq: Fix refcount leak bug
cpufreq: qcom-hw: Don't do lmh things without a throttle interrupt
drivers: cpufreq: Add missing of_node_put() in qoriq-cpufreq.c
cpufreq: amd-pstate: Add resume and suspend callbacks
Instead of passing an extra bool argument to pm_runtime_release_supplier(),
make its callers take care of triggering a runtime-suspend of the
supplier device as needed.
No expected functional impact.
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
Time-sensitive networking code needs to work with PTP times expressed in
nanoseconds, and with packet transmission times expressed in
picoseconds, since those would be fractional at higher than gigabit
speed when expressed in nanoseconds.
Convert the existing uses in tc-taprio and the ocelot/felix DSA driver
to a PSEC_PER_NSEC macro. This macro is placed in include/linux/time64.h
as opposed to its relatives (PSEC_PER_SEC etc) from include/vdso/time64.h
because the vDSO library does not (yet) need/use it.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for the vDSO parts
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull drm fixes from Dave Airlie:
"Bit quieter this week, the main thing is it pulls in the fixes for the
sysfb resource issue you were seeing. these had been queued for next
so should have had some decent testing.
Otherwise amdgpu, i915 and msm each have a few fixes, and vc4 has one.
fbdev:
- sysfb fixes/conflicting fb fixes
amdgpu:
- GPU recovery fix
- Fix integer type usage in fourcc header for AMD modifiers
- KFD TLB flush fix for gfx9 APUs
- Display fix
i915:
- Fix ioctl argument error return
- Fix d3cold disable to allow PCI upstream bridge D3 transition
- Fix setting cache_dirty for dma-buf objects on discrete
msm:
- Fix to increment vsync_cnt before calling drm_crtc_handle_vblank so
that userspace sees the value *after* it is incremented if waiting
for vblank events
- Fix to reset drm_dev to NULL in dp_display_unbind to avoid a crash
in probe/bind error paths
- Fix to resolve the smatch error of de-referencing before NULL check
in dpu_encoder_phys_wb.c
- Fix error return to userspace if fence-id allocation fails in
submit ioctl
vc4:
- NULL ptr dereference fix"
* tag 'drm-fixes-2022-07-01' of git://anongit.freedesktop.org/drm/drm:
Revert "drm/amdgpu/display: set vblank_disable_immediate for DC"
drm/amdgpu: To flush tlb for MMHUB of RAVEN series
drm/fourcc: fix integer type usage in uapi header
drm/amdgpu: fix adev variable used in amdgpu_device_gpu_recover()
fbdev: Disable sysfb device registration when removing conflicting FBs
firmware: sysfb: Add sysfb_disable() helper function
firmware: sysfb: Make sysfb_create_simplefb() return a pdev pointer
drm/msm/gem: Fix error return on fence id alloc fail
drm/i915: tweak the ordering in cpu_write_needs_clflush
drm/i915/dgfx: Disable d3cold at gfx root port
drm/i915/gem: add missing else
drm/vc4: perfmon: Fix variable dereferenced before check
drm/msm/dpu: Fix variable dereferenced before check
drm/msm/dp: reset drm_dev to NULL at dp_display_unbind()
drm/msm/dpu: Increment vsync_cnt before waking up userspace
drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c
9c5de246c1 ("net: sparx5: mdb add/del handle non-sparx5 devices")
fbb89d02e3 ("net: sparx5: Allow mdb entries to both CPU and ports")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter.
Current release - new code bugs:
- clear msg_get_inq in __sys_recvfrom() and __copy_msghdr_from_user()
- mptcp:
- invoke MP_FAIL response only when needed
- fix shutdown vs fallback race
- consistent map handling on failure
- octeon_ep: use bitwise AND
Previous releases - regressions:
- tipc: move bc link creation back to tipc_node_create, fix NPD
Previous releases - always broken:
- tcp: add a missing nf_reset_ct() in 3WHS handling to prevent socket
buffered skbs from keeping refcount on the conntrack module
- ipv6: take care of disable_policy when restoring routes
- tun: make sure to always disable and unlink NAPI instances
- phy: don't trigger state machine while in suspend
- netfilter: nf_tables: avoid skb access on nf_stolen
- asix: fix "can't send until first packet is send" issue
- usb: asix: do not force pause frames support
- nxp-nci: don't issue a zero length i2c_master_read()
Misc:
- ncsi: allow use of proper "mellanox" DT vendor prefix
- act_api: add a message for user space if any actions were already
flushed before the error was hit"
* tag 'net-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits)
net: dsa: felix: fix race between reading PSFP stats and port stats
selftest: tun: add test for NAPI dismantle
net: tun: avoid disabling NAPI twice
net: sparx5: mdb add/del handle non-sparx5 devices
net: sfp: fix memory leak in sfp_probe()
mlxsw: spectrum_router: Fix rollback in tunnel next hop init
net: rose: fix UAF bugs caused by timer handler
net: usb: ax88179_178a: Fix packet receiving
net: bonding: fix use-after-free after 802.3ad slave unbind
ipv6: fix lockdep splat in in6_dump_addrs()
net: phy: ax88772a: fix lost pause advertisement configuration
net: phy: Don't trigger state machine while in suspend
usbnet: fix memory allocation in helpers
selftests net: fix kselftest net fatal error
NFC: nxp-nci: don't print header length mismatch on i2c error
NFC: nxp-nci: Don't issue a zero length i2c_master_read()
net: tipc: fix possible refcount leak in tipc_sk_create()
nfc: nfcmrvl: Fix irq_of_parse_and_map() return value
net: ipv6: unexport __init-annotated seg6_hmac_net_init()
ipv6/sit: fix ipip6_tunnel_get_prl return value
...
Pull rdma fixes from Jason Gunthorpe:
"Three minor bug fixes:
- qedr not setting the QP timeout properly toward userspace
- Memory leak on error path in ib_cm
- Divide by 0 in RDMA interrupt moderation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
linux/dim: Fix divide by 0 in RDMA DIM
RDMA/cm: Fix memory leak in ib_cm_insert_listen
RDMA/qedr: Fix reporting QP timeout attribute
Pull fanotify fix from Jan Kara:
"A fix for recently added fanotify API to have stricter checks and
refuse some invalid flag combinations to make our life easier in the
future"
* tag 'fsnotify_for_v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: refine the validation checks on non-dir inode mask
add proc_dointvec_ms_jiffies_minmax to fit read msecs value to jiffies
with a limited range of values
Signed-off-by: Yuwei Wang <wangyuweihx@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>