* kvm-arm64/smccc-filtering:
: .
: SMCCC call filtering and forwarding to userspace, courtesy of
: Oliver Upton. From the cover letter:
:
: "The Arm SMCCC is rather prescriptive in regards to the allocation of
: SMCCC function ID ranges. Many of the hypercall ranges have an
: associated specification from Arm (FF-A, PSCI, SDEI, etc.) with some
: room for vendor-specific implementations.
:
: The ever-expanding SMCCC surface leaves a lot of work within KVM for
: providing new features. Furthermore, KVM implements its own
: vendor-specific ABI, with little room for other implementations (like
: Hyper-V, for example). Rather than cramming it all into the kernel we
: should provide a way for userspace to handle hypercalls."
: .
KVM: selftests: Fix spelling mistake "KVM_HYPERCAL_EXIT_SMC" -> "KVM_HYPERCALL_EXIT_SMC"
KVM: arm64: Test that SMC64 arch calls are reserved
KVM: arm64: Prevent userspace from handling SMC64 arch range
KVM: arm64: Expose SMC/HVC width to userspace
KVM: selftests: Add test for SMCCC filter
KVM: selftests: Add a helper for SMCCC calls with SMC instruction
KVM: arm64: Let errors from SMCCC emulation to reach userspace
KVM: arm64: Return NOT_SUPPORTED to guest for unknown PSCI version
KVM: arm64: Introduce support for userspace SMCCC filtering
KVM: arm64: Add support for KVM_EXIT_HYPERCALL
KVM: arm64: Use a maple tree to represent the SMCCC filter
KVM: arm64: Refactor hvc filtering to support different actions
KVM: arm64: Start handling SMCs from EL1
KVM: arm64: Rename SMC/HVC call handler to reflect reality
KVM: arm64: Add vm fd device attribute accessors
KVM: arm64: Add a helper to check if a VM has ran once
KVM: x86: Redefine 'longmode' as a flag for KVM_EXIT_HYPERCALL
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/timer-vm-offsets: (21 commits)
: .
: This series aims at satisfying multiple goals:
:
: - allow a VMM to atomically restore a timer offset for a whole VM
: instead of updating the offset each time a vcpu get its counter
: written
:
: - allow a VMM to save/restore the physical timer context, something
: that we cannot do at the moment due to the lack of offsetting
:
: - provide a framework that is suitable for NV support, where we get
: both global and per timer, per vcpu offsetting, and manage
: interrupts in a less braindead way.
:
: Conflict resolution involves using the new per-vcpu config lock instead
: of the home-grown timer lock.
: .
KVM: arm64: Handle 32bit CNTPCTSS traps
KVM: arm64: selftests: Augment existing timer test to handle variable offset
KVM: arm64: selftests: Deal with spurious timer interrupts
KVM: arm64: selftests: Add physical timer registers to the sysreg list
KVM: arm64: nv: timers: Support hyp timer emulation
KVM: arm64: nv: timers: Add a per-timer, per-vcpu offset
KVM: arm64: Document KVM_ARM_SET_CNT_OFFSETS and co
KVM: arm64: timers: Abstract the number of valid timers per vcpu
KVM: arm64: timers: Fast-track CNTPCT_EL0 trap handling
KVM: arm64: Elide kern_hyp_va() in VHE-specific parts of the hypervisor
KVM: arm64: timers: Move the timer IRQs into arch_timer_vm_data
KVM: arm64: timers: Abstract per-timer IRQ access
KVM: arm64: timers: Rationalise per-vcpu timer init
KVM: arm64: timers: Allow save/restoring of the physical timer
KVM: arm64: timers: Allow userspace to set the global counter offset
KVM: arm64: Expose {un,}lock_all_vcpus() to the rest of KVM
KVM: arm64: timers: Allow physical offset without CNTPOFF_EL2
KVM: arm64: timers: Use CNTPOFF_EL2 to offset the physical timer
arm64: Add HAS_ECV_CNTPOFF capability
arm64: Add CNTPOFF_EL2 register definition
...
Signed-off-by: Marc Zyngier <maz@kernel.org>
As the SMCCC (and related specifications) march towards an 'everything
and the kitchen sink' interface for interacting with a system it becomes
less likely that KVM will support every related feature. We could do
better by letting userspace have a crack at it instead.
Allow userspace to define an 'SMCCC filter' that applies to both HVCs
and SMCs initiated by the guest. Supporting both conduits with this
interface is important for a couple of reasons. Guest SMC usage is table
stakes for a nested guest, as HVCs are always taken to the virtual EL2.
Additionally, guests may want to interact with a service on the secure
side which can now be proxied by userspace.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230404154050.2270077-10-oliver.upton@linux.dev
Maple tree is an efficient B-tree implementation that is intended for
storing non-overlapping intervals. Such a data structure is a good fit
for the SMCCC filter as it is desirable to sparsely allocate the 32 bit
function ID space.
To that end, add a maple tree to kvm_arch and correctly init/teardown
along with the VM. Wire in a test against the hypercall filter for HVCs
which does nothing until the controls are exposed to userspace.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230404154050.2270077-8-oliver.upton@linux.dev
KVM handles SMCCC calls from virtual EL2 that use the SMC instruction
since commit bd36b1a9eb ("KVM: arm64: nv: Handle SMCs taken from
virtual EL2"). Thus, the function name of the handler no longer reflects
reality.
Normalize the name on SMCCC, since that's the only hypercall interface
KVM supports in the first place. No fuctional change intended.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230404154050.2270077-5-oliver.upton@linux.dev
The 'longmode' field is a bit annoying as it blows an entire __u32 to
represent a boolean value. Since other architectures are looking to add
support for KVM_EXIT_HYPERCALL, now is probably a good time to clean it
up.
Redefine the field (and the remaining padding) as a set of flags.
Preserve the existing ABI by using bit 0 to indicate if the guest was in
long mode and requiring that the remaining 31 bits must be zero.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230404154050.2270077-2-oliver.upton@linux.dev
Emulating EL2 also means emulating the EL2 timers. To do so, we expand
our timer framework to deal with at most 4 timers. At any given time,
two timers are using the HW timers, and the two others are purely
emulated.
The role of deciding which is which at any given time is left to a
mapping function which is called every time we need to make such a
decision.
Reviewed-by: Colton Lewis <coltonlewis@google.com>
Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230330174800.2677007-18-maz@kernel.org
Being able to set a global offset isn't enough.
With NV, we also need to a per-vcpu, per-timer offset (for example,
CNTVCT_EL0 being offset by CNTVOFF_EL2).
Use a similar method as the VM-wide offset to have a timer point
to the shadow register that contains the offset value.
Reviewed-by: Colton Lewis <coltonlewis@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230330174800.2677007-17-maz@kernel.org
Having the timer IRQs duplicated into each vcpu isn't great, and
becomes absolutely awful with NV. So let's move these into
the per-VM arch_timer_vm_data structure.
This simplifies a lot of code, but requires us to introduce a
mutex so that we can reason about userspace trying to change
an interrupt number while another vcpu is running, something
that wasn't really well handled so far.
Reviewed-by: Colton Lewis <coltonlewis@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230330174800.2677007-12-maz@kernel.org
And this is the moment you have all been waiting for: setting the
counter offset from userspace.
We expose a brand new capability that reports the ability to set
the offset for both the virtual and physical sides.
In keeping with the architecture, the offset is expressed as
a delta that is substracted from the physical counter value.
Once this new API is used, there is no going back, and the counters
cannot be written to to set the offsets implicitly (the writes
are instead ignored).
Reviewed-by: Colton Lewis <coltonlewis@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230330174800.2677007-8-maz@kernel.org
Instead of accumulating the fractional ns value generated every time
we compute a ns delta in a global variable, use a per-vcpu, per-timer
variable. This keeps the fractional ns local to the timer instead of
contributing to any odd, unrelated timer.
Reviewed-by: Colton Lewis <coltonlewis@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230330174800.2677007-2-maz@kernel.org
Pull core fixes from Borislav Petkov:
- Do the delayed RCU wakeup for kthreads in the proper order so that
former doesn't get ignored
- A noinstr warning fix
* tag 'core_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
entry/rcu: Check TIF_RESCHED _after_ delayed RCU wake-up
entry: Fix noinstr warning in __enter_from_user_mode()
Pull xfs percpu counter fixes from Darrick Wong:
"We discovered a filesystem summary counter corruption problem that was
traced to cpu hot-remove racing with the call to percpu_counter_sum
that sets the free block count in the superblock when writing it to
disk. The root cause is that percpu_counter_sum doesn't cull from
dying cpus and hence misses those counter values if the cpu shutdown
hooks have not yet run to merge the values.
I'm hoping this is a fairly painless fix to the problem, since the
dying cpu mask should generally be empty. It's been in for-next for a
week without any complaints from the bots.
- Fix a race in the percpu counters summation code where the
summation failed to add in the values for any CPUs that were dying
but not yet dead. This fixes some minor discrepancies and incorrect
assertions when running generic/650"
* tag 'xfs-6.3-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
pcpcntr: remove percpu_counter_sum_all()
fork: remove use of percpu_counter_sum_all
pcpcntrs: fix dying cpu summation race
cpumask: introduce for_each_cpu_or
Pull misc fixes from Andrew Morton:
"21 hotfixes, 8 of which are cc:stable. 11 are for MM, the remainder
are for other subsystems"
* tag 'mm-hotfixes-stable-2023-03-24-17-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
mm: mmap: remove newline at the end of the trace
mailmap: add entries for Richard Leitner
kcsan: avoid passing -g for test
kfence: avoid passing -g for test
mm: kfence: fix using kfence_metadata without initialization in show_object()
lib: dhry: fix unstable smp_processor_id(_) usage
mailmap: add entry for Enric Balletbo i Serra
mailmap: map Sai Prakash Ranjan's old address to his current one
mailmap: map Rajendra Nayak's old address to his current one
Revert "kasan: drop skip_kasan_poison variable in free_pages_prepare"
mailmap: add entry for Tobias Klauser
kasan, powerpc: don't rename memintrinsics if compiler adds prefixes
mm/ksm: fix race with VMA iteration and mm_struct teardown
kselftest: vm: fix unused variable warning
mm: fix error handling for map_deny_write_exec
mm: deduplicate error handling for map_deny_write_exec
checksyscalls: ignore fstat to silence build warning on LoongArch
nilfs2: fix kernel-infoleak in nilfs_ioctl_wrap_copy()
test_maple_tree: add more testing for mas_empty_area()
maple_tree: fix mas_skip_node() end slot detection
...
Pull block fixes from Jens Axboe:
- NVMe pull request via Christoph:
- Send Identify with CNS 06h only to I/O controllers (Martin
George)
- Fix nvme_tcp_term_pdu to match spec (Caleb Sander)
- Pass in issue_flags for uring_cmd, so the end_io handlers don't need
to assume what the right context is (me)
- Fix for ublk, marking it as LIVE before adding it to avoid races on
the initial IO (Ming)
* tag 'block-6.3-2023-03-24' of git://git.kernel.dk/linux:
nvme-tcp: fix nvme_tcp_term_pdu to match spec
nvme: send Identify with CNS 06h only to I/O controllers
block/io_uring: pass in issue_flags for uring_cmd task_work handling
block: ublk_drv: mark device as LIVE before adding disk
Pull thermal control fixes from Rafael Wysocki:
"These address two recent regressions related to thermal control.
Specifics:
- Restore the thermal core behavior regarding zero-temperature trip
points to avoid a driver regression (Ido Schimmel)
- Fix a recent regression in the ACPI processor driver preventing it
from changing the number of CPU cooling device states exposed via
sysfs after the given CPU cooling device has been registered
(Rafael Wysocki)"
* tag 'thermal-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: core: Restore behavior regarding invalid trip points
ACPI: processor: thermal: Update CPU cooling devices on cpufreq policy changes
thermal: core: Introduce thermal_cooling_device_update()
thermal: core: Introduce thermal_cooling_device_present()
ACPI: processor: Reorder acpi_processor_driver_init()
Pull EFI fixes from Ard Biesheuvel:
- Set the NX compat flag for arm64 and zboot, to ensure compatibility
with EFI firmware that complies with tightening requirements imposed
across the ecosystem.
- Improve identification of Ampere Altra systems based on SMBIOS data.
- Fix some issues related to the EFI framebuffer that were introduced
as a result from some refactoring related to zboot and the merge with
sysfb.
- Makefile tweak to avoid rebuilding vmlinuz unnecessarily.
- Fix efi_random_alloc() return value on out of memory condition.
* tag 'efi-fixes-for-v6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi/libstub: randomalloc: Return EFI_OUT_OF_RESOURCES on failure
efi/libstub: Use relocated version of kernel's struct screen_info
efi/libstub: zboot: Add compressed image to make targets
efi: sysfb_efi: Add quirk for Lenovo Yoga Book X91F/L
efi: sysfb_efi: Fix DMI quirks not working for simpledrm
efi/libstub: smbios: Drop unused 'recsize' parameter
arm64: efi: Use SMBIOS processor version to key off Ampere quirk
efi/libstub: smbios: Use length member instead of record struct size
efi: earlycon: Reprobe after parsing config tables
arm64: efi: Set NX compat flag in PE/COFF header
efi/libstub: arm64: Remap relocated image with strict permissions
efi/libstub: zboot: Mark zboot EFI application as NX compatible
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, wifi and bluetooth.
Current release - regressions:
- wifi: mt76: mt7915: add back 160MHz channel width support for
MT7915
- libbpf: revert poisoning of strlcpy, it broke uClibc-ng
Current release - new code bugs:
- bpf: improve the coverage of the "allow reads from uninit stack"
feature to fix verification complexity problems
- eth: am65-cpts: reset PPS genf adj settings on enable
Previous releases - regressions:
- wifi: mac80211: serialize ieee80211_handle_wake_tx_queue()
- wifi: mt76: do not run mt76_unregister_device() on unregistered hw,
fix null-deref
- Bluetooth: btqcomsmd: fix command timeout after setting BD address
- eth: igb: revert rtnl_lock() that causes a deadlock
- dsa: mscc: ocelot: fix device specific statistics
Previous releases - always broken:
- xsk: add missing overflow check in xdp_umem_reg()
- wifi: mac80211:
- fix QoS on mesh interfaces
- fix mesh path discovery based on unicast packets
- Bluetooth:
- ISO: fix timestamped HCI ISO data packet parsing
- remove "Power-on" check from Mesh feature
- usbnet: more fixes to drivers trusting packet length
- wifi: iwlwifi: mvm: fix mvmtxq->stopped handling
- Bluetooth: btintel: iterate only bluetooth device ACPI entries
- eth: iavf: fix inverted Rx hash condition leading to disabled hash
- eth: igc: fix the validation logic for taprio's gate list
- dsa: tag_brcm: legacy: fix daisy-chained switches
Misc:
- bpf: adjust insufficient default bpf_jit_limit to account for
growth of BPF use over the last 5 years
- xdp: bpf_xdp_metadata() use EOPNOTSUPP as unique errno indicating
no driver support"
* tag 'net-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
Bluetooth: HCI: Fix global-out-of-bounds
Bluetooth: mgmt: Fix MGMT add advmon with RSSI command
Bluetooth: btsdio: fix use after free bug in btsdio_remove due to unfinished work
Bluetooth: L2CAP: Fix responding with wrong PDU type
Bluetooth: btqcomsmd: Fix command timeout after setting BD address
Bluetooth: btinel: Check ACPI handle for NULL before accessing
net: mdio: thunder: Add missing fwnode_handle_put()
net: dsa: mt7530: move setting ssc_delta to PHY_INTERFACE_MODE_TRGMII case
net: dsa: mt7530: move lowering TRGMII driving to mt7530_setup()
net: dsa: mt7530: move enabling disabling core clock to mt7530_pll_setup()
net: asix: fix modprobe "sysfs: cannot create duplicate filename"
gve: Cache link_speed value from device
tools: ynl: Fix genlmsg header encoding formats
net: enetc: fix aggregate RMON counters not showing the ranges
Bluetooth: Remove "Power-on" check from Mesh feature
Bluetooth: Fix race condition in hci_cmd_sync_clear
Bluetooth: btintel: Iterate only bluetooth device ACPI entries
Bluetooth: ISO: fix timestamped HCI ISO data packet parsing
Bluetooth: btusb: Remove detection of ISO packets over bulk
Bluetooth: hci_core: Detect if an ACL packet is in fact an ISO packet
...
Current flow interates over entire ACPI table entries looking for
Bluetooth Per Platform Antenna Gain(PPAG) entry. This patch iterates
over ACPI entries relvant to Bluetooth device only.
Fixes: c585a92b2f ("Bluetooth: btintel: Set Per Platform Antenna Gain(PPAG)")
Signed-off-by: Kiran K <kiran.k@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Introduce a core thermal API function, thermal_cooling_device_update(),
for updating the max_state value for a cooling device and rearranging
its statistics in sysfs after a possible change of its ->get_max_state()
callback return value.
That callback is now invoked only once, during cooling device
registration, to populate the max_state field in the cooling device
object, so if its return value changes, it needs to be invoked again
and the new return value needs to be stored as max_state. Moreover,
the statistics presented in sysfs need to be rearranged in general,
because there may not be enough room in them to store data for all
of the possible states (in the case when max_state grows).
The new function takes care of that (and some other minor things
related to it), but some extra locking and lockdep annotations are
added in several places too to protect against crashes in the cases
when the statistics are not present or when a stale max_state value
might be used by sysfs attributes.
Note that the actual user of the new function will be added separately.
Link: https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
The FEI field of C2HTermReq/H2CTermReq is 4 bytes but not 4-byte-aligned
in the NVMe/TCP specification (it is located at offset 10 in the PDU).
Split it into two 16-bit integers in struct nvme_tcp_term_pdu
so no padding is inserted. There should also be 10 reserved bytes after.
There are currently no users of this type.
Fixes: fc221d0544 ("nvme-tcp: Add protocol header")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
__enter_from_user_mode() is triggering noinstr warnings with
CONFIG_DEBUG_PREEMPT due to its call of preempt_count_add() via
ct_state().
The preemption disable isn't needed as interrupts are already disabled.
And the context_tracking_enabled() check in ct_state() also isn't needed
as that's already being done by the CT_WARN_ON().
Just use __ct_state() instead.
Fixes the following warnings:
vmlinux.o: warning: objtool: enter_from_user_mode+0xba: call to preempt_count_add() leaves .noinstr.text section
vmlinux.o: warning: objtool: syscall_enter_from_user_mode+0xf9: call to preempt_count_add() leaves .noinstr.text section
vmlinux.o: warning: objtool: syscall_enter_from_user_mode_prepare+0xc7: call to preempt_count_add() leaves .noinstr.text section
vmlinux.o: warning: objtool: irqentry_enter_from_user_mode+0xba: call to preempt_count_add() leaves .noinstr.text section
Fixes: 171476775d ("context_tracking: Convert state to atomic_t")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/d8955fa6d68dc955dda19baf13ae014ae27926f5.1677369694.git.jpoimboe@kernel.org
io_uring_cmd_done() currently assumes that the uring_lock is held
when invoked, and while it generally is, this is not guaranteed.
Pass in the issue_flags associated with it, so that we have
IO_URING_F_UNLOCKED available to be able to lock the CQ ring
appropriately when completing events.
Cc: stable@vger.kernel.org
Fixes: ee692a21e9 ("fs,io_uring: add infrastructure for uring-cmd")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull NFS client fixes from Anna Schumaker:
- Fix /proc/PID/io read_bytes accounting
- Fix setting NLM file_lock start and end during decoding testargs
- Fix timing for setting access cache timestamps
* tag 'nfs-for-6.3-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Correct timing for assigning access cache timestamp
lockd: set file_lock start and end when decoding nlm4 testargs
NFS: Fix /proc/PID/io read_bytes for buffered reads
percpu_counter_sum_all() is now redundant as the race condition it
was invented to handle is now dealt with by percpu_counter_sum()
directly and all users of percpu_counter_sum_all() have been
removed.
Remove it.
This effectively reverts the changes made in f689054aac
("percpu_counter: add percpu_counter_sum_all interface") except for
the cpumask iteration that fixes percpu_counter_sum() made earlier
in this series.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Equivalent of for_each_cpu_and, except it ORs the two masks together
so it iterates all the CPUs present in either mask.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Currently DMA address width is either read from a RO device register
or force set from the platform data. This breaks DMA when the host DMA
address width is <=32it but the device is >32bit.
Right now the driver may decide to use a 2nd DMA descriptor for
another buffer (happens in case of TSO xmit) assuming that 32bit
addressing is used due to platform configuration but the device will
still use both descriptor addresses as one address.
This can be observed with the Intel EHL platform driver that sets
32bit for addr64 but the MAC reports 40bit. The TX queue gets stuck in
case of TCP with iptables NAT configuration on TSO packets.
The logic should be like this: Whatever we do on the host side (memory
allocation GFP flags) should happen with the host DMA width, whenever
we decide how to set addresses on the device registers we must use the
device DMA address width.
This patch renames the platform address width field from addr64 (term
used in device datasheet) to host_addr and uses this value exclusively
for host side operations while all chip operations consider the device
DMA width as read from the device register.
Fixes: 7cfc4486e7 ("stmmac: intel: Configure EHL PSE0 GbE and PSE1 GbE to 32 bits DMA addressing")
Signed-off-by: Jochen Henneberg <jh@henneberg-systemdesign.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bus ownership is wrong when using acpi_mdiobus_register() to register an
mdio bus. That function is not inline, so when it calls
mdiobus_register() the wrong THIS_MODULE value is captured.
CC: Maxime Bizon <mbizon@freebox.fr>
Fixes: 803ca24d2f ("net: mdio: Add ACPI support code for mdio")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bus ownership is wrong when using of_mdiobus_register() to register an mdio
bus. That function is not inline, so when it calls mdiobus_register() the wrong
THIS_MODULE value is captured.
Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Fixes: 90eff9096c ("net: phy: Allow splitting MDIO bus/device support from PHYs")
[florian: fix kdoc, added Fixes tag]
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 8633ef82f1 ("drivers/firmware: consolidate EFI framebuffer setup
for all arches") moved the sysfb_apply_efi_quirks() call in sysfb_init()
from before the [sysfb_]parse_mode() call to after it.
But sysfb_apply_efi_quirks() modifies the global screen_info struct which
[sysfb_]parse_mode() parses, so doing it later is too late.
This has broken all DMI based quirks for correcting wrong firmware efifb
settings when simpledrm is used.
To fix this move the sysfb_apply_efi_quirks() call back to its old place
and split the new setup of the efifb_fwnode (which requires
the platform_device) into its own function and call that at
the place of the moved sysfb_apply_efi_quirks(pd) calls.
Fixes: 8633ef82f1 ("drivers/firmware: consolidate EFI framebuffer setup for all arches")
Cc: stable@vger.kernel.org
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, wifi and ipsec.
A little more changes than usual, but it's pretty normal for us that
the rc3/rc4 PRs are oversized as people start testing in earnest.
Possibly an extra boost from people deploying the 6.1 LTS but that's
more of an unscientific hunch.
Current release - regressions:
- phy: mscc: fix deadlock in phy_ethtool_{get,set}_wol()
- virtio: vsock: don't use skbuff state to account credit
- virtio: vsock: don't drop skbuff on copy failure
- virtio_net: fix page_to_skb() miscalculating the memory size
Current release - new code bugs:
- eth: correct xdp_features after device reconfig
- wifi: nl80211: fix the puncturing bitmap policy
- net/mlx5e: flower:
- fix raw counter initialization
- fix missing error code
- fix cloned flow attribute
- ipa:
- fix some register validity checks
- fix a surprising number of bad offsets
- kill FILT_ROUT_CACHE_CFG IPA register
Previous releases - regressions:
- tcp: fix bind() conflict check for dual-stack wildcard address
- veth: fix use after free in XDP_REDIRECT when skb headroom is small
- ipv4: fix incorrect table ID in IOCTL path
- ipvlan: make skb->skb_iif track skb->dev for l3s mode
- mptcp:
- fix possible deadlock in subflow_error_report
- fix UaFs when destroying unaccepted and listening sockets
- dsa: mv88e6xxx: fix max_mtu of 1492 on 6165, 6191, 6220, 6250, 6290
Previous releases - always broken:
- tcp: tcp_make_synack() can be called from process context, don't
assume preemption is disabled when updating stats
- netfilter: correct length for loading protocol registers
- virtio_net: add checking sq is full inside xdp xmit
- bonding: restore IFF_MASTER/SLAVE flags on bond enslave Ethertype
change
- phy: nxp-c45-tja11xx: fix MII_BASIC_CONFIG_REV bit number
- eth: i40e: fix crash during reboot when adapter is in recovery mode
- eth: ice: avoid deadlock on rtnl lock when auxiliary device
plug/unplug meets bonding
- dsa: mt7530:
- remove now incorrect comment regarding port 5
- set PLL frequency and trgmii only when trgmii is used
- eth: mtk_eth_soc: reset PCS state when changing interface types
Misc:
- ynl: another license adjustment
- move the TCA_EXT_WARN_MSG attribute for tc action"
* tag 'net-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (108 commits)
selftests: bonding: add tests for ether type changes
bonding: restore bond's IFF_SLAVE flag if a non-eth dev enslave fails
bonding: restore IFF_MASTER/SLAVE flags on bond enslave ether type change
net: renesas: rswitch: Fix GWTSDIE register handling
net: renesas: rswitch: Fix the output value of quote from rswitch_rx()
ethernet: sun: add check for the mdesc_grab()
net: ipa: fix some register validity checks
net: ipa: kill FILT_ROUT_CACHE_CFG IPA register
net: ipa: add two missing declarations
net: ipa: reg: include <linux/bug.h>
net: xdp: don't call notifiers during driver init
net/sched: act_api: add specific EXT_WARN_MSG for tc action
Revert "net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy"
net: dsa: microchip: fix RGMII delay configuration on KSZ8765/KSZ8794/KSZ8795
ynl: make the tooling check the license
ynl: broaden the license even more
tools: ynl: make definitions optional again
hsr: ratelimit only when errors are printed
qed/qed_mng_tlv: correctly zero out ->min instead of ->hour
selftests: net: devlink_port_split.py: skip test if no suitable device available
...
Pull block fixes from Jens Axboe:
"A bit bigger than usual, as the NVMe pull request missed last weeks
submission. In detail:
- NVMe pull request via Christoph:
- Avoid potential UAF in nvmet_req_complete (Damien Le Moal)
- More quirks (Elmer Miroslav Mosher Golovin, Philipp Geulen)
- Fix a memory leak in the nvme-pci probe teardown path
(Irvin Cote)
- Repair the MAINTAINERS entry (Lukas Bulwahn)
- Fix handling single range discard request (Ming Lei)
- Show more opcode names in trace events (Minwoo Im)
- Fix nvme-tcp timeout reporting (Sagi Grimberg)
- MD pull request via Song:
- Two fixes for old issues (Neil)
- Resource leak in device stopping (Xiao)
- Bio based device stats fix (Yu)
- Kill unused CONFIG_BLOCK_COMPAT (Lukas)
- sunvdc missing mdesc_grab() failure check (Liang)
- Fix for reversal of request ordering upon issue for certain cases
(Jan)
- null_blk timeout fixes (Damien)
- Loop use-after-free fix (Bart)
- blk-mq SRCU fix for BLK_MQ_F_BLOCKING devices (Chris)"
* tag 'block-6.3-2023-03-16' of git://git.kernel.dk/linux:
block: remove obsolete config BLOCK_COMPAT
md: select BLOCK_LEGACY_AUTOLOAD
block: count 'ios' and 'sectors' when io is done for bio-based device
block: sunvdc: add check for mdesc_grab() returning NULL
nvmet: avoid potential UAF in nvmet_req_complete()
nvme-trace: show more opcode names
nvme-tcp: add nvme-tcp pdu size build protection
nvme-tcp: fix opcode reporting in the timeout handler
nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM620
nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV3000
nvme-pci: fixing memory leak in probe teardown path
nvme: fix handling single range discard request
MAINTAINERS: repair malformed T: entries in NVM EXPRESS DRIVERS
block: null_blk: cleanup null_queue_rq()
block: null_blk: Fix handling of fake timeout request
blk-mq: fix "bad unlock balance detected" on q->srcu in __blk_mq_run_dispatch_ops
loop: Fix use-after-free issues
block: do not reverse request order when flushing plug list
md: avoid signed overflow in slot_store()
md: Free resources in __md_stop
Pull ACPI fixes from Rafael Wysocki:
"These add some new quirks, fix PPTT handling, fix an ACPI utility and
correct a mistake in the ACPI documentation.
Specifics:
- Fix ACPI PPTT handling to avoid sleep in the atomic context when it
is not present (Sudeep Holla)
- Add 'backlight=native' DMI quirk for Dell Vostro 15 3535 to the
ACPI video driver (Chia-Lin Kao)
- Add ACPI quirks for I2C device enumeration on Lenovo Yoga Book X90
and Acer Iconia One 7 B1-750 (Hans de Goede)
- Fix handling of invalid command line option values in the ACPI
pfrut utility (Chen Yu)
- Fix references to I2C device data type in the ACPI documentation
for device enumeration (Andy Shevchenko)"
* tag 'acpi-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: tools: pfrut: Check if the input of level and type is in the right numeric range
ACPI: PPTT: Fix to avoid sleep in the atomic context when PPTT is absent
ACPI: x86: Add skip i2c clients quirk for Lenovo Yoga Book X90
ACPI: x86: Add skip i2c clients quirk for Acer Iconia One 7 B1-750
ACPI: x86: Introduce an acpi_quirk_skip_gpio_event_handlers() helper
ACPI: video: Add backlight=native DMI quirk for Dell Vostro 15 3535
ACPI: docs: enumeration: Correct reference to the I²C device data type
Pull xen fixes from Juergen Gross:
- cleanup for xen time handling
- enable the VGA console in a Xen PVH dom0
- cleanup in the xenfs driver
* tag 'for-linus-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: remove unnecessary (void*) conversions
x86/PVH: obtain VGA console info in Dom0
x86/xen/time: cleanup xen_tsc_safe_clocksource
xen: update arch/x86/include/asm/xen/cpuid.h
Pull s390 fixes from Vasily Gorbik:
- Update defconfigs
- Fix early boot code by adding missing intersection check to prevent
potential overwriting of the ipl report
- Fix a use-after-free issue in s390-specific code related to PCI
resources being retained after hot-unplugging individual functions,
by removing the resources from the PCI bus's resource list and using
the zpci_bar_struct's resource pointer directly
* tag 's390-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: update defconfigs
PCI: s390: Fix use-after-free of PCI resources with per-function hotplug
s390/ipl: add missing intersection check to ipl_report handling
Pull drm fixes from Dave Airlie:
"Seems like a pretty regular rc3, i915 and amdgpu with the usual
selection of fixes, then a scattering of fixes across misc drivers and
other areas:
accel:
- build fix for accel
edid:
- fix info leak in edid
ttm:
- fix NULL ptr deref
- reference counting fix
i915:
- Fix hwmon PL1 power limit enabling
- Fix audio ELD handling for DP MST
- Fix PSR io and wake line calculations
- Fix DG2 HDMI modes with 267.30 and 319.89 MHz pixel clocks
- Fix SSEU subslice out-of-bounds access
- Fix misuse of non-idle barriers as fence trackers
amdgpu:
- SMU 13 update
- RDNA2 suspend/resume fix when overclocking is enabled
- SRIOV VCN fixes
- HDCP suspend/resume fix
- Fix drm polling splat regression
- Fix dirty rectangle tracking for PSR
- Fix vangogh regression on certain BIOSes
- Misc display fixes
- Suspend/resume IOMMU regression fix
amdkfd:
- Fix BO offset for multi-VMA page migration
- Fix a possible double free
- Fix potential use after free
- Fix process cleanup on module exit
bridge:
- fix returned array size name documentation
fbdev:
- ref-counting fix for fbdev deferred I/O
virtio:
- dma sync fix
shmem-helper:
- error path fix
msm:
- shrinker blocking fix
panfrost:
- shrinker rpm fix
chipsfb:
- fix error code
meson:
- fix 1px pink line
- fix regulator interaction
sun4i:
- fix missing component unbind"
* tag 'drm-fixes-2023-03-17' of git://anongit.freedesktop.org/drm/drm: (38 commits)
drm/ttm: drop extra ttm_bo_put in ttm_bo_cleanup_refs
drm/amdgpu: Don't resume IOMMU after incomplete init
drm/amdkfd: Fixed kfd_process cleanup on module exit.
drm/amd/display: disconnect MPCC only on OTG change
drm/amd/display: Fix DP MST sinks removal issue
drm/amd/display: Do not set DRR on pipe Commit
drm/amd/display: Remove OTG DIV register write for Virtual signals.
drm/meson: dw-hdmi: Fix devm_regulator_*get_enable*() conversion again
drm/bridge: Fix returned array size name for atomic_get_input_bus_fmts kdoc
drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
drm/amdgpu/nv: fix codec array for SR_IOV
drm/amd/display: Write to correct dirty_rect
drm/amdgpu: move poll enabled/disable into non DC path
drm/amd/display: Fix HDCP failing to enable after suspend
drm/amdkfd: fix potential kgd_mem UAFs
drm/amdgpu/vcn: custom video info caps for sriov
drm/amd/pm: Fix sienna cichlid incorrect OD volage after resume
drm/amd/pm: bump SMU 13.0.4 driver_if header version
drm/amdkfd: fix a potential double free in pqm_create_queue
drm/amdkfd: Get prange->offset after svm_range_vram_node_new
...
Pull SCSI fixes from James Bottomley:
"Ten patches, eight in drivers and two in the core, which correct a
regression from directory removal and add a no VPD size quirk also to
fix a regression. All pretty small"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: mcq: Use active_reqs to check busy in clock scaling
scsi: core: Fix a procfs host directory removal regression
scsi: core: Add BLIST_NO_VPD_SIZE for some VDASD
scsi: mpi3mr: Fix expander node leak in mpi3mr_remove()
scsi: mpi3mr: Fix memory leaks in mpi3mr_init_ioc()
scsi: mpi3mr: Fix sas_hba.phy memory leak in mpi3mr_remove()
scsi: mpi3mr: Fix mpi3mr_hba_port memory leak in mpi3mr_remove()
scsi: mpi3mr: Fix config page DMA memory leak
scsi: mpi3mr: Fix throttle_groups memory leak
scsi: mpt3sas: Fix NULL pointer access in mpt3sas_transport_port_add()
Merge a new ACPI backlight quirk, new ACPI quirks for I2C device
enumeration on some platforms, a pfrut utility fix and an ACPI
documentation fix for 6.3-rc3:
- Add backlight=native DMI quirk for Dell Vostro 15 3535 to the ACPI
video driver (Chia-Lin Kao).
- Add ACPI quirks for I2C devices enumeration on Lenovo Yoga Book X90
and Acer Iconia One 7 B1-750 (Hans de Goede).
- Fix handling of invalid command line option values in the ACPI pfrut
utility (Chen Yu).
- Fix references to I2C device data type in the ACPI documentation for
device enumeration (Andy Shevchenko).
* acpi-video:
ACPI: video: Add backlight=native DMI quirk for Dell Vostro 15 3535
* acpi-x86:
ACPI: x86: Add skip i2c clients quirk for Lenovo Yoga Book X90
ACPI: x86: Add skip i2c clients quirk for Acer Iconia One 7 B1-750
ACPI: x86: Introduce an acpi_quirk_skip_gpio_event_handlers() helper
* acpi-tools:
ACPI: tools: pfrut: Check if the input of level and type is in the right numeric range
* acpi-docs:
ACPI: docs: enumeration: Correct reference to the I²C device data type
In my previous commit 0349b8779c ("sched: add new attr TCA_EXT_WARN_MSG
to report tc extact message") I didn't notice the tc action use different
enum with filter. So we can't use TCA_EXT_WARN_MSG directly for tc action.
Let's add a TCA_ROOT_EXT_WARN_MSG for tc action specifically and put this
param before going to the TCA_ACT_TAB nest.
Fixes: 0349b8779c ("sched: add new attr TCA_EXT_WARN_MSG to report tc extact message")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I relicensed Netlink spec code to GPL-2.0 OR BSD-3-Clause but
we still put a slightly different license on the uAPI header
than the rest of the code. Use the Linux-syscall-note on all
the specs and all generated code. It's moot for kernel code,
but should not hurt. This way the licenses match everywhere.
Cc: Chuck Lever <chuck.lever@oracle.com>
Fixes: 37d9df224d ("ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause")
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull kvm fixes from Paolo Bonzini:
"ARM64:
- Address a rather annoying bug w.r.t. guest timer offsetting. The
synchronization of timer offsets between vCPUs was broken, leading
to inconsistent timer reads within the VM.
x86:
- New tests for the slow path of the EVTCHNOP_send Xen hypercall
- Add missing nVMX consistency checks for CR0 and CR4
- Fix bug that broke AMD GATag on 512 vCPU machines
Selftests:
- Skip hugetlb tests if huge pages are not available
- Sync KVM exit reasons"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: Sync KVM exit reasons in selftests
KVM: selftests: Add macro to generate KVM exit reason strings
KVM: selftests: Print expected and actual exit reason in KVM exit reason assert
KVM: selftests: Make vCPU exit reason test assertion common
KVM: selftests: Add EVTCHNOP_send slow path test to xen_shinfo_test
KVM: selftests: Use enum for test numbers in xen_shinfo_test
KVM: selftests: Add helpers to make Xen-style VMCALL/VMMCALL hypercalls
KVM: selftests: Move the guts of kvm_hypercall() to a separate macro
KVM: SVM: WARN if GATag generation drops VM or vCPU ID information
KVM: SVM: Modify AVIC GATag to support max number of 512 vCPUs
KVM: SVM: Fix a benign off-by-one bug in AVIC physical table mask
selftests: KVM: skip hugetlb tests if huge pages are not available
KVM: VMX: Use tabs instead of spaces for indentation
KVM: VMX: Fix indentation coding style issue
KVM: nVMX: remove unnecessary #ifdef
KVM: nVMX: add missing consistency checks for CR0 and CR4
KVM: arm64: timers: Convert per-vcpu virtual offset to a global value
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 6.3
- avoid potential UAF in nvmet_req_complete (Damien Le Moal)
- more quirks (Elmer Miroslav Mosher Golovin, Philipp Geulen)
- fix a memory leak in the nvme-pci probe teardown path (Irvin Cote)
- repair the MAINTAINERS entry (Lukas Bulwahn)
- fix handling single range discard request (Ming Lei)
- show more opcode names in trace events (Minwoo Im)
- fix nvme-tcp timeout reporting (Sagi Grimberg)"
* tag 'nvme-6.3-2022-03-16' of git://git.infradead.org/nvme:
nvmet: avoid potential UAF in nvmet_req_complete()
nvme-trace: show more opcode names
nvme-tcp: add nvme-tcp pdu size build protection
nvme-tcp: fix opcode reporting in the timeout handler
nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM620
nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV3000
nvme-pci: fixing memory leak in probe teardown path
nvme: fix handling single range discard request
MAINTAINERS: repair malformed T: entries in NVM EXPRESS DRIVERS
While using iostat for raid, I observed very strange 'await'
occasionally, and turns out it's due to that 'ios' and 'sectors' is
counted in bdev_start_io_acct(), while 'nsecs' is counted in
bdev_end_io_acct(). I'm not sure why they are ccounted like that
but I think this behaviour is obviously wrong because user will get
wrong disk stats.
Fix the problem by counting 'ios' and 'sectors' when io is done, like
what rq-based device does.
Fixes: 394ffa503b ("blk: introduce generic io stat accounting help function")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230223091226.1135678-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We have more commands to show in the trace. Sync up.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>