Pull smb server fixes from Steve French:
- Fix out of bounds write
- Fix for better calculating max output buffers
- Fix memory leaks in SMB2/SMB3 lock
- Fix use after free
- Multichannel fix
* tag 'v7.0-rc5-ksmbd-srv-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix potencial OOB in get_file_all_info() for compound requests
ksmbd: replace hardcoded hdr2_len with offsetof() in smb2_calc_max_out_buf_len()
ksmbd: fix memory leaks and NULL deref in smb2_lock()
ksmbd: fix use-after-free and NULL deref in smb_grant_oplock()
ksmbd: do not expire session on binding failure
Identified resume-probe race condition in kernel v7.0 with the commit
38fa29b01a ("i2c: designware: Combine the init functions"),but this
issue existed from the beginning though not detected.
The amdisp i2c device requires ISP to be in power-on state for probe
to succeed. To meet this requirement, this device is added to genpd
to control ISP power using runtime PM. The pm_runtime_get_sync() called
before i2c_dw_probe() triggers PM resume, which powers on ISP and also
invokes the amdisp i2c runtime resume before the probe completes resulting
in this race condition and a NULL dereferencing issue in v7.0
Fix this race condition by using the genpd APIs directly during probe:
- Call dev_pm_genpd_resume() to Power ON ISP before probe
- Call dev_pm_genpd_suspend() to Power OFF ISP after probe
- Set the device to suspended state with pm_runtime_set_suspended()
- Enable runtime PM only after the device is fully initialized
Fixes: d6263c468a ("i2c: amd-isp: Add ISP i2c-designware driver")
Co-developed-by: Bin Du <bin.du@amd.com>
Signed-off-by: Bin Du <bin.du@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Cc: <stable@vger.kernel.org> # v6.16+
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260320201302.3490570-1-pratap.nirujogi@amd.com
When reading from the I2DR register, right after releasing the bus by
clearing MSTA and MTX, the I2C controller might still generate an
additional clock cycle which can cause devices to misbehave. Ensure to
only read from I2DR after the bus is not busy anymore. Because this
requires polling, the read of the last byte is moved outside of the
interrupt handler.
An example for such a failing transfer is this:
i2ctransfer -y -a 0 w1@0x00 0x02 r1
Error: Sending messages failed: Connection timed out
It does not happen with every device because not all devices react to
the additional clock cycle.
Fixes: 5f5c2d4579 ("i2c: imx: prevent rescheduling in non dma mode")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260218150940.131354-3-eichest@gmail.com
When reading multiple messages, meaning a repeated start is required,
polling the bus busy bit must be avoided. This must only be done for
the last message. Otherwise, the driver will timeout.
Here an example of such a sequence that fails with an error:
i2ctransfer -y -a 0 w1@0x00 0x02 r1 w1@0x00 0x02 r1
Error: Sending messages failed: Connection timed out
Fixes: 5f5c2d4579 ("i2c: imx: prevent rescheduling in non dma mode")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20260218150940.131354-2-eichest@gmail.com
Commit 7d6899fb69 ("ovl: fsync after metadata copy-up") was done to
fix durability of overlayfs copy up on an upper filesystem which does
not enforce ordering on storing of metadata changes (e.g. ubifs).
In an earlier revision of the regressing commit by Lei Lv, the metadata
fsync behavior was opt-in via a new "fsync=strict" mount option.
We were hoping that the opt-in mount option could be avoided, so the
change was only made to depend on metacopy=off, in the hope of not
hurting performance of metadata heavy workloads, which are more likely
to be using metacopy=on.
This hope was proven wrong by a performance regression report from Google
COS workload after upgrade to kernel 6.12.
This is an adaptation of Lei's original "fsync=strict" mount option
to the existing upstream code.
The new mount option is mutually exclusive with the "volatile" mount
option, so the latter is now an alias to the "fsync=volatile" mount
option.
Reported-by: Chenglong Tang <chenglongtang@google.com>
Closes: https://lore.kernel.org/linux-unionfs/CAOdxtTadAFH01Vui1FvWfcmQ8jH1O45owTzUcpYbNvBxnLeM7Q@mail.gmail.com/
Link: https://lore.kernel.org/linux-unionfs/CAOQ4uxgKC1SgjMWre=fUb00v8rxtd6sQi-S+dxR8oDzAuiGu8g@mail.gmail.com/
Fixes: 7d6899fb69 ("ovl: fsync after metadata copy-up")
Depends: 50e638beb6 ("ovl: Use str_on_off() helper in ovl_show_options()")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Fei Lv <feilv@asrmicro.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Setting up the interface when suspended/resumeing fail on this card.
Adding a reset and delay quirk will eliminate this problem.
usb 1-1: new full-speed USB device number 2 using xhci-hcd
usb 1-1: New USB device found, idVendor=001f, idProduct=0b23
usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 1-1: Product: AB17X USB Audio
usb 1-1: Manufacturer: Generic
usb 1-1: SerialNumber: 20241228172028
Signed-off-by: Lianqin Hu <hulianqin@vivo.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/PUZPR06MB6224CA59AD2B26054120B276D249A@PUZPR06MB6224.apcprd06.prod.outlook.com
The HP Pavilion 15-eg0xxx with subsystem ID 0x103c87cb uses a Realtek
ALC287 codec with a mute LED wired to GPIO pin 4 (mask 0x10). The
existing ALC287_FIXUP_HP_GPIO_LED fixup already handles this correctly,
but the subsystem ID was missing from the quirk table.
GPIO pin confirmed via manual hda-verb testing:
hda-verb SET_GPIO_MASK 0x10
hda-verb SET_GPIO_DIRECTION 0x10
hda-verb SET_GPIO_DATA 0x10
Signed-off-by: César Montoya <sprit152009@gmail.com>
Link: https://patch.msgid.link/20260321153603.12771-1-sprit152009@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
On the HP EliteBoard G1a platform (models without a headphone jack).
the speaker mute LED failed to function. The Sysfs ctl-led info showed
empty values because the standard LED registration couldn't correctly
bind to the master switch.
Adding this patch will fix and enable the speaker mute LED feature.
Tested-by: Chris Chiu <chris.chiu@canonical.com>
Signed-off-by: Kailang Yang <kailang@realtek.com>
Link: https://lore.kernel.org/279e929e884849df84687dbd67f20037@realtek.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
ASoC: Fixes for v7.0
This is two week's worth of fixes and quirks so it's a bit larger than
you might expect, there's nothing too exciting individually and nothing
in core code.
__io_uring_show_fdinfo() iterates over pending SQEs and, for 128-byte
SQEs on an IORING_SETUP_SQE_MIXED ring, needs to detect when the second
half of the SQE would be past the end of the sq_sqes array. The current
check tests (++sq_head & sq_mask) == 0, but sq_head is only incremented
when a 128-byte SQE is encountered, not on every iteration. The actual
array index is sq_idx = (i + sq_head) & sq_mask, which can be sq_mask
(the last slot) while the wrap check passes.
Fix by checking sq_idx directly. Keep the sq_head increment so the loop
still skips the second half of the 128-byte SQE on the next iteration.
Fixes: 1cba30bf9f ("io_uring: add support for IORING_SETUP_SQE_MIXED")
Signed-off-by: Nicholas Carlini <nicholas@carlini.com>
Link: https://patch.msgid.link/20260327021823.3138396-1-nicholas@carlini.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull smb client fix from Steve French:
- Fix rebuild of mapping table
* tag 'v7.0-rc5-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
smb/client: ensure smb2_mapping_table rebuild on cmd changes
Pull power management fixes from Rafael Wysocki:
"These fix two cpufreq issues, one in the core and one in the
conservative governor, and two issues related to system sleep:
- Restore the cpufreq core behavior changed inadvertently during the
6.19 development cycle to call cpufreq_frequency_table_cpuinfo()
for cpufreq policies getting re-initialized which ensures that
policy->max and policy->cpuinfo_max_freq will be valid going
forward (Viresh Kumar)
- Adjust the cached requested frequency in the conservative cpufreq
governor on policy limits changes to prevent it from becoming stale
in some cases (Viresh Kumar)
- Prevent pm_restore_gfp_mask() from triggering a WARN_ON() in some
code paths in which it is legitimately called without invoking
pm_restrict_gfp_mask() previously (Youngjun Park)
- Update snapshot_write_finalize() to take trailing zero pages into
account properly which prevents user space restore from failing
subsequently in some cases (Alberto Garcia)"
* tag 'pm-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: sleep: Drop spurious WARN_ON() from pm_restore_gfp_mask()
PM: hibernate: Drain trailing zero pages on userspace restore
cpufreq: conservative: Reset requested_freq on limits change
cpufreq: Don't skip cpufreq_frequency_table_cpuinfo()
Pull thermal control fix from Rafael Wysocki:
"This prevents the int340x thermal driver from taking the power slider
offset parameter into account incorrectly in some cases (Srinivas
Pandruvada)"
* tag 'thermal-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel: int340x: soc_slider: Set offset only for balanced mode
Pull ACPI support fix from Rafael Wysocki:
"Prevent use-after-free from occurring on reduced-hardware ACPI
platforms when -EPROBE_DEFER is returned by ec_install_handlers()
during ACPI EC driver initialization (Weiming Shi)"
* tag 'acpi-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: EC: clean up handlers on probe failure in acpi_ec_setup()
Pull Landlock fixes from Mickaël Salaün:
"This mainly fixes Landlock TSYNC issues related to interrupts and
unexpected task exit.
Other fixes touch documentation and sample, and a new test extends
coverage"
* tag 'landlock-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
landlock: Expand restrict flags example for ABI version 8
selftests/landlock: Test tsync interruption and cancellation paths
landlock: Clean up interrupted thread logic in TSYNC
landlock: Serialize TSYNC thread restriction
samples/landlock: Bump ABI version to 8
landlock: Improve TSYNC types
landlock: Fully release unused TSYNC work entries
landlock: Fix formatting
Merge fixes related to system sleep for 7.0-rc6:
- Prevent pm_restore_gfp_mask() from triggering a WARN_ON() in some
code paths in which it is legitimately called without invoking
pm_restrict_gfp_mask() previously (Youngjun Park)
- Update snapshot_write_finalize() to take trailing zero pages into
account properly which prevents user space restore from failing
subsequently in some cases (Alberto Garcia)
* pm-sleep:
PM: sleep: Drop spurious WARN_ON() from pm_restore_gfp_mask()
PM: hibernate: Drain trailing zero pages on userspace restore
Pull networking fixes from Paolo Abeni:
"Including fixes from Bluetooth, CAN, IPsec and Netfilter.
Notably, this includes the fix for the Bluetooth regression that you
were notified about. I'm not aware of any other pending regressions.
Current release - regressions:
- bluetooth:
- fix stack-out-of-bounds read in l2cap_ecred_conn_req
- fix regressions caused by reusing ident
- netfilter: revisit array resize logic
- eth: ice: set max queues in alloc_etherdev_mqs()
Previous releases - regressions:
- core: correctly handle tunneled traffic on IPV6_CSUM GSO fallback
- bluetooth:
- fix dangling pointer on mgmt_add_adv_patterns_monitor_complete
- fix deadlock in l2cap_conn_del()
- sched: codel: fix stale state for empty flows in fq_codel
- ipv6: remove permanent routes from tb6_gc_hlist when all exceptions expire.
- xfrm: fix skb_put() panic on non-linear skb during reassembly
- openvswitch:
- avoid releasing netdev before teardown completes
- validate MPLS set/set_masked payload length
- eth: iavf: fix out-of-bounds writes in iavf_get_ethtool_stats()
Previous releases - always broken:
- bluetooth: fix null-ptr-deref on l2cap_sock_ready_cb
- udp: fix wildcard bind conflict check when using hash2
- netfilter: fix use of uninitialized rtp_addr in process_sdp
- tls: Purge async_hold in tls_decrypt_async_wait()
- xfrm:
- prevent policy_hthresh.work from racing with netns teardown
- fix skb leak with espintcp and async crypto
- smc: fix double-free of smc_spd_priv when tee() duplicates splice pipe buffer
- can:
- add missing error handling to call can_ctrlmode_changelink()
- fix OOB heap access in cgw_csum_crc8_rel()
- eth:
- mana: fix use-after-free in add_adev() error path
- virtio-net: fix for VIRTIO_NET_F_GUEST_HDRLEN
- bcmasp: fix double free of WoL irq"
* tag 'net-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (90 commits)
net: macb: use the current queue number for stats
netfilter: ctnetlink: use netlink policy range checks
netfilter: nf_conntrack_sip: fix use of uninitialized rtp_addr in process_sdp
netfilter: nf_conntrack_expect: skip expectations in other netns via proc
netfilter: nf_conntrack_expect: store netns and zone in expectation
netfilter: ctnetlink: ensure safe access to master conntrack
netfilter: nf_conntrack_expect: use expect->helper
netfilter: nf_conntrack_expect: honor expectation helper field
netfilter: nft_set_rbtree: revisit array resize logic
netfilter: ip6t_rt: reject oversized addrnr in rt_mt6_check()
netfilter: nfnetlink_log: fix uninitialized padding leak in NFULA_PAYLOAD
tls: Purge async_hold in tls_decrypt_async_wait()
selftests: netfilter: nft_concat_range.sh: add check for flush+reload bug
netfilter: nft_set_pipapo_avx2: don't return non-matching entry on expiry
Bluetooth: btusb: clamp SCO altsetting table indices
Bluetooth: L2CAP: Fix ERTM re-init and zero pdu_len infinite loop
Bluetooth: L2CAP: Fix deadlock in l2cap_conn_del()
Bluetooth: btintel: serialize btintel_hw_error() with hci_req_sync_lock
Bluetooth: L2CAP: Fix send LE flow credits in ACL link
net: mana: fix use-after-free in add_adev() error path
...
Bard Liao <yung-chuan.liao@linux.intel.com> says:
The new rt722 + rt1320 configuration uses the DMIC on the rt1320.
This series adds support for such configurations, where the DMIC is
provided by the rt1320 instead of the rt722.
According to the Intel sof design, it will create the name prefix
appended with amp index for the amp codec only, such as:
rt1318-1, rt1318-2, etc...
But the rt1320 is a codec with amp and mic codec functions, it doesn't
have the amp index in its name prefix as above.
And then it will be hard to identify the codec if in multi-rt1320 case.
So we add a flag to force the amp index to be appended.
Signed-off-by: Derek Fang <derek.fang@realtek.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://patch.msgid.link/20260326075303.1083567-3-yung-chuan.liao@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull pin control fixes from Linus Walleij:
- Implement .get_direction() in the spmi-gpio gpio_chip
Recent changes makes this start to print warnings and it's not nice,
let's just fix it
- Clamp the return value of gpio_get() in the Renesas RZA1 driver
- Add the GPIO_GENERIC dependency to the STM32 HDP driver
- Modify the Mediatek driver to accept devices that do not use external
interrupts (EINT) at all
- Fix flag propagation in the Sunxi driver, so that we can fix an issue
with uninitialized pins in a follow-up patch using said flags
* tag 'pinctrl-v7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: sunxi: fix gpiochip_lock_as_irq() failure when pinmux is unknown
pinctrl: sunxi: pass down flags to pinctrl routines
pinctrl: mediatek: common: Fix probe failure for devices without EINT
pinctrl: stm32: fix HDP driver dependency on GPIO_GENERIC
pinctrl: renesas: rza1: Normalize return value of gpio_get()
pinctrl: qcom: spmi-gpio: implement .get_direction()
pinctrl: renesas: rzt2h: Fix invalid wait context
pinctrl: renesas: rzt2h: Fix device node leak in rzt2h_gpio_register()
Pull dma-mapping fixes from Marek Szyprowski:
"A set of fixes for DMA-mapping subsystem, which resolve false-
positive warnings from KMSAN and DMA-API debug (Shigeru Yoshida
and Leon Romanovsky) as well as a simple build fix (Miguel Ojeda)"
* tag 'dma-mapping-7.0-2026-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
dma-mapping: add missing `inline` for `dma_free_attrs`
mm/hmm: Indicate that HMM requires DMA coherency
RDMA/umem: Tell DMA mapping that UMEM requires coherency
iommu/dma: add support for DMA_ATTR_REQUIRE_COHERENT attribute
dma-direct: prevent SWIOTLB path when DMA_ATTR_REQUIRE_COHERENT is set
dma-mapping: Introduce DMA require coherency attribute
dma-mapping: Clarify valid conditions for CPU cache line overlap
dma-mapping: handle DMA_ATTR_CPU_CACHE_CLEAN in trace output
dma-debug: Allow multiple invocations of overlapping entries
dma: swiotlb: add KMSAN annotations to swiotlb_bounce()
During futex_key_to_node_opt() execution, vma->vm_policy is read under
speculative mmap lock and RCU. Concurrently, mbind() may call
vma_replace_policy() which frees the old mempolicy immediately via
kmem_cache_free().
This creates a race where __futex_key_to_node() dereferences a freed
mempolicy pointer, causing a use-after-free read of mpol->mode.
[ 151.412631] BUG: KASAN: slab-use-after-free in __futex_key_to_node (kernel/futex/core.c:349)
[ 151.414046] Read of size 2 at addr ffff888001c49634 by task e/87
[ 151.415969] Call Trace:
[ 151.416732] __asan_load2 (mm/kasan/generic.c:271)
[ 151.416777] __futex_key_to_node (kernel/futex/core.c:349)
[ 151.416822] get_futex_key (kernel/futex/core.c:374 kernel/futex/core.c:386 kernel/futex/core.c:593)
Fix by adding rcu to __mpol_put().
Fixes: c042c50521 ("futex: Implement FUTEX2_MPOL")
Reported-by: Hao-Yu Yang <naup96721@gmail.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hao-Yu Yang <naup96721@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Link: https://patch.msgid.link/20260324174418.GB1850007@noisy.programming.kicks-ass.net
Nicholas reported that his LLM found it was possible to create a UaF
when sys_futex_requeue() is used with different flags. The initial
motivation for allowing different flags was the variable sized futex,
but since that hasn't been merged (yet), simply mandate the flags are
identical, as is the case for the old style sys_futex() requeue
operations.
Fixes: 0f4b5f9722 ("futex: Add sys_futex_requeue()")
Reported-by: Nicholas Carlini <npc@anthropic.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
A previous commit changed the behaviour of the KVM_S390_VCPU_FAULT
ioctl. The current (wrong) implementation will trigger a guest
addressing exception if the requested address lies outside of a
memslot, unless the VM is UCONTROL.
Restore the previous behaviour by open coding the fault-in logic.
Fixes: 3762e905ec ("KVM: s390: use __kvm_faultin_pfn()")
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
When shadowing, the guest page tables are write-protected, in order to
trap changes and properly unshadow the shadow mapping for the nested
guest. Already shadowed levels are skipped, so that only the needed
levels are write protected.
Currently the levels that get write protected are exactly one level too
deep: the last level (nested guest memory) gets protected in the wrong
way, and will be protected again correctly a few lines afterwards; most
importantly, the highest non-shadowed level does *not* get write
protected.
Moreover, if the nested guest is running in a real address space, there
are no DAT tables to shadow.
Write protect the correct levels, so that all the levels that need to
be protected are protected, and avoid double protecting the last level;
skip attempting to shadow the DAT tables when the nested guest is
running in a real address space.
Fixes: e38c884df9 ("KVM: s390: Switch to new gmap")
Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
If shadowing causes the shadow gmap to get unshadowed, exit early to
prevent an attempt to dereference the parent pointer, which at this
point is NULL.
Opportunistically add some more checks to prevent NULL parents.
Fixes: a2c17f9270 ("KVM: s390: New gmap code")
Fixes: e5f98a6899 ("KVM: s390: Add some helper functions needed for vSIE")
Fixes: e38c884df9 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
In most cases gmap_put() was not called when it should have.
Add the missing gmap_put() in vsie_run().
Fixes: e38c884df9 ("KVM: s390: Switch to new gmap")
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fix _do_shadow_pte() to use the correct pointer (guest pte instead of
nested guest) to set up the new pte.
Add a check to return -EOPNOTSUPP if the mapping for the nested guest
is writeable but the same page in the guest is only read-only.
Fixes: e38c884df9 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Introduce a new special softbit for large pages, like already presend
for normal pages, and use it to mark guest mappings that do not have
struct pages.
Whenever a leaf DAT entry becomes dirty, check the special softbit and
only call SetPageDirty() if there is an actual struct page.
Move the logic to mark pages dirty inside _gmap_ptep_xchg() and
_gmap_crstep_xchg_atomic(), to avoid needlessly duplicating the code.
Fixes: 5a74e3d934 ("KVM: s390: KVM-specific bitfields and helper functions")
Fixes: a2c17f9270 ("KVM: s390: New gmap code")
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
The slow path of the fault handler ultimately called gmap_link(), which
assumed the fault was a major fault, and blindly called dat_link().
In case of minor faults, things were not always handled properly; in
particular the prefix and vsie marker bits were ignored.
Move dat_link() into gmap.c, renaming it accordingly. Once moved, the
new _gmap_link() function will be able to correctly honour the prefix
and vsie markers.
This will cause spurious unshadows in some uncommon cases.
Fixes: 94fd9b16cc ("KVM: s390: KVM page table management functions: lifecycle management")
Fixes: a2c17f9270 ("KVM: s390: New gmap code")
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
When shadowing a nested guest, a check is performed and no shadowing is
attempted if the nested guest is already shadowed.
The existing check was incomplete; fix it by also checking whether the
leaf DAT table entry in the existing shadow gmap has the same protection
as the one specified in the guest DAT entry.
Fixes: e38c884df9 ("KVM: s390: Switch to new gmap")
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
In practice dat_crstep_xchg() is racy and hard to use correctly. Simply
remove it and replace its uses with dat_crstep_xchg_atomic().
This solves some actual races that lead to system hangs / crashes.
Opportunistically fix an alignment issue in _gmap_crstep_xchg_atomic().
Fixes: 589071eaaa ("KVM: s390: KVM page table management functions: clear and replace")
Fixes: 94fd9b16cc ("KVM: s390: KVM page table management functions: lifecycle management")
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
If the guest misbehaves and puts the page tables for its nested guest
inside the memory of the nested guest itself, and the guest and nested
guest are being mapped with large pages, the shadow mapping will
lose synchronization with the actual mapping, since this will cause the
large page with the vsie notification bit to be split, but the
vsie notification bit will not be propagated to the resulting small
pages.
Fix this by propagating the vsie_notif bit from large pages to normal
pages when splitting a large page.
Fixes: 2db149a0a6 ("KVM: s390: KVM page table management functions: walks")
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>