Merge branch 'virtio-mem' into features

David Hildenbrand says:

====================
virtio-mem: s390 support

Let's finally add s390 support for virtio-mem; my last RFC was sent
4 years ago, and a lot changed in the meantime.

The latest QEMU series is available at [1], which contains some more
details and a usage example on s390 (last patch).

There is not too much in here: The biggest part is querying a new diag(500)
STORAGE_LIMIT hypercall to obtain the proper "max_physmem_end".

The last three patches are not strictly required but certainly nice-to-have.

Note that -- in contrast to standby memory -- virtio-mem memory must be
configured to be automatically onlined as soon as hotplugged. The easiest
approach is using the "memhp_default_state=" kernel parameter or by using
proper udev rules. More details can be found at [2].

I have reviving+upstreaming a systemd service to handle configuring
that on my todo list, but for some reason I keep getting distracted ...

I tested various things, including:
 * Various memory hotplug/hotunplug combinations
 * Device hotplug/hotunplug
 * /proc/iomem output
 * reboot
 * kexec
 * kdump: make sure we properly enter the "kdump mode" in the virtio-mem
   driver

kdump support for virtio-mem memory on s390 will be sent out separately.

v2 -> v3:
* "s390/kdump: make is_kdump_kernel() consistently return "true" in kdump
   environments only"
 -> Sent out separately [3]
* "s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory
   devices"
 -> No query function for diag500 for now.
 -> Update comment above setup_ident_map_size().
 -> Optimize/rewrite diag500_storage_limit() [Heiko]
 -> Change handling in detect_physmem_online_ranges [Alexander]
 -> Improve documentation.
* "s390/sparsemem: provide memory_add_physaddr_to_nid() with CONFIG_NUMA"
 -> Added after testing on systems with CONFIG_NUMA=y

v1 -> v2:
* Document the new diag500 subfunction
* Use "s390" instead of "s390x" consistently

[1] https://lkml.kernel.org/r/20241008105455.2302628-1-david@redhat.com
[2] https://virtio-mem.gitlab.io/user-guide/user-guide-linux.html
[3] https://lkml.kernel.org/r/20241023090651.1115507-1-david@redhat.com
====================

Link: https://lore.kernel.org/r/20241025141453.1210600-1-david@redhat.com/
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens
2024-11-07 10:28:10 +01:00
7 changed files with 98 additions and 18 deletions

@@ -35,20 +35,24 @@ DIAGNOSE function codes not specific to KVM, please refer to the
 documentation for the s390 hypervisors defining them.
-DIAGNOSE function code 'X'500' - KVM virtio functions
------------------------------------------------------
+DIAGNOSE function code 'X'500' - KVM functions
+----------------------------------------------
-If the function code specifies 0x500, various virtio-related functions
-are performed.
+If the function code specifies 0x500, various KVM-specific functions
+are performed, including virtio functions.
-General register 1 contains the virtio subfunction code. Supported
-virtio subfunctions depend on KVM's userspace. Generally, userspace
-provides either s390-virtio (subcodes 0-2) or virtio-ccw (subcode 3).
+General register 1 contains the subfunction code. Supported subfunctions
+depend on KVM's userspace. Regarding virtio subfunctions, generally
+userspace provides either s390-virtio (subcodes 0-2) or virtio-ccw
+(subcode 3).
 Upon completion of the DIAGNOSE instruction, general register 2 contains
 the function's return code, which is either a return code or a subcode
 specific value.
+If the specified subfunction is not supported, a SPECIFICATION exception
+will be triggered.
 Subcode 0 - s390-virtio notification and early console printk
 Handled by userspace.
@@ -76,6 +80,23 @@ Subcode 3 - virtio-ccw notification
 See also the virtio standard for a discussion of this hypercall.
+Subcode 4 - storage-limit
+Handled by userspace.
+After completion of the DIAGNOSE call, general register 2 will
+contain the storage limit: the maximum physical address that might be
+used for storage throughout the lifetime of the VM.
+The storage limit does not indicate currently usable storage, it may
+include holes, standby storage and areas reserved for other means, such
+as memory hotplug or virtio-mem devices. Other interfaces for detecting
+actually usable storage, such as SCLP, must be used in conjunction with
+this subfunction.
+Note that the storage limit can be larger, but never smaller than the
+maximum storage address indicated by SCLP via the "maximum storage
+increment" and the "increment size".
 DIAGNOSE function code 'X'501 - KVM breakpoint
 ----------------------------------------------
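To make the subcode 4 semantics from the documentation hunk above concrete, here is a minimal C sketch of how a guest could interpret the value returned in general register 2. It is not the kernel implementation: storage_limit_to_end(), its parameters, and the wrap-around and SCLP consistency checks are assumptions added for illustration. Only the "zero means unusable" check and the inclusive-to-exclusive conversion mirror diag500_storage_limit() from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code.
 *
 * raw_limit - value from general register 2: the inclusive maximum
 *             physical address; 0 is treated as "no usable limit"
 * sclp_end  - exclusive end of storage derived from SCLP (assumed input)
 * end       - receives the exclusive end on success
 *
 * Returns 0 on success, -1 if the limit is unusable or inconsistent.
 */
static int storage_limit_to_end(uint64_t raw_limit, uint64_t sclp_end,
                                uint64_t *end)
{
    assert(end != NULL);

    if (raw_limit == 0)
        return -1;
    /* Assumption: reject a limit whose exclusive end would wrap to 0. */
    if (raw_limit == UINT64_MAX)
        return -1;
    /* Per the documentation, the limit is never below the SCLP maximum. */
    if (raw_limit + 1 < sclp_end)
        return -1;

    /* Convert the inclusive maximum address to an exclusive end. */
    *end = raw_limit + 1;
    return 0;
}
```

A caller would still have to consult SCLP (or another interface) to learn which ranges below the returned end are actually usable, since the limit may cover holes, standby storage and reserved areas.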

@@ -109,6 +109,42 @@ static int diag260(void)
 return 0;
 }
+#define DIAG500_SC_STOR_LIMIT 4
+static int diag500_storage_limit(unsigned long *max_physmem_end)
+{
+unsigned long storage_limit;
+unsigned long reg1, reg2;
+psw_t old;
+asm volatile(
+" mvc 0(16,%[psw_old]),0(%[psw_pgm])\n"
+" epsw %[reg1],%[reg2]\n"
+" st %[reg1],0(%[psw_pgm])\n"
+" st %[reg2],4(%[psw_pgm])\n"
+" larl %[reg1],1f\n"
+" stg %[reg1],8(%[psw_pgm])\n"
+" lghi 1,%[subcode]\n"
+" lghi 2,0\n"
+" diag 2,4,0x500\n"
+"1: mvc 0(16,%[psw_pgm]),0(%[psw_old])\n"
+" lgr %[slimit],2\n"
+: [reg1] "=&d" (reg1),
+[reg2] "=&a" (reg2),
+[slimit] "=d" (storage_limit),
+"=Q" (get_lowcore()->program_new_psw),
+"=Q" (old)
+: [psw_old] "a" (&old),
+[psw_pgm] "a" (&get_lowcore()->program_new_psw),
+[subcode] "i" (DIAG500_SC_STOR_LIMIT)
+: "memory", "1", "2");
+if (!storage_limit)
+return -EINVAL;
+/* Convert inclusive end to exclusive end */
+*max_physmem_end = storage_limit + 1;
+return 0;
+}
 static int tprot(unsigned long addr)
 {
 unsigned long reg1, reg2;
@@ -157,7 +193,9 @@ unsigned long detect_max_physmem_end(void)
 {
 unsigned long max_physmem_end = 0;
-if (!sclp_early_get_memsize(&max_physmem_end)) {
+if (!diag500_storage_limit(&max_physmem_end)) {
+physmem_info.info_source = MEM_DETECT_DIAG500_STOR_LIMIT;
+} else if (!sclp_early_get_memsize(&max_physmem_end)) {
 physmem_info.info_source = MEM_DETECT_SCLP_READ_INFO;
 } else {
 max_physmem_end = search_mem_end();
@@ -170,6 +208,13 @@ void detect_physmem_online_ranges(unsigned long max_physmem_end)
 {
 if (!sclp_early_read_storage_info()) {
 physmem_info.info_source = MEM_DETECT_SCLP_STOR_INFO;
+} else if (physmem_info.info_source == MEM_DETECT_DIAG500_STOR_LIMIT) {
+unsigned long online_end;
+if (!sclp_early_get_memsize(&online_end)) {
+physmem_info.info_source = MEM_DETECT_SCLP_READ_INFO;
+add_physmem_online_range(0, online_end);
+}
 } else if (!diag260()) {
 physmem_info.info_source = MEM_DETECT_DIAG260;
 } else if (max_physmem_end) {
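The probing order that detect_max_physmem_end() follows after this change (diag500 storage limit first, then SCLP, then the search fallback) can be read as a first-success chain. The sketch below shows that pattern in plain user-space C. All names in it (enum source, struct probe, detect_end() and the two stub probes) are hypothetical and exist only for illustration; the kernel code uses an if/else ladder rather than a table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the physmem_info_source values. */
enum source { SRC_NONE, SRC_DIAG500, SRC_SCLP, SRC_SEARCH };

/* A probe returns 0 and fills *end on success, nonzero otherwise. */
typedef int (*probe_fn)(unsigned long *end);

struct probe {
    probe_fn fn;
    enum source src;
};

/* Stub probe: behaves like a hypervisor without the subfunction. */
static int probe_unavailable(unsigned long *end)
{
    (void)end;
    return -1;
}

/* Stub probe: always reports an exclusive end of 2 GiB. */
static int probe_fixed_2g(unsigned long *end)
{
    *end = 0x80000000UL;
    return 0;
}

/*
 * Try each probe in order and report which one succeeded first.
 * If none succeeds, *end is set to 0 and SRC_NONE is returned.
 */
static enum source detect_end(const struct probe *probes, size_t n,
                              unsigned long *end)
{
    assert(end != NULL);

    for (size_t i = 0; i < n; i++) {
        if (probes[i].fn(end) == 0)
            return probes[i].src;
    }
    *end = 0;
    return SRC_NONE;
}
```

Recording which probe succeeded matters in the patch as well: detect_physmem_online_ranges() checks for MEM_DETECT_DIAG500_STOR_LIMIT to decide whether the online range still has to be obtained from SCLP.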

@@ -182,12 +182,15 @@ static void kaslr_adjust_got(unsigned long offset)
 * Merge information from several sources into a single ident_map_size value.
 * "ident_map_size" represents the upper limit of physical memory we may ever
 * reach. It might not be all online memory, but also include standby (offline)
-* memory. "ident_map_size" could be lower then actual standby or even online
+* memory or memory areas reserved for other means (e.g., memory devices such as
+* virtio-mem).
+*
+* "ident_map_size" could be lower then actual standby/reserved or even online
 * memory present, due to limiting factors. We should never go above this limit.
 * It is the size of our identity mapping.
 *
 * Consider the following factors:
-* 1. max_physmem_end - end of physical memory online or standby.
+* 1. max_physmem_end - end of physical memory online, standby or reserved.
 * Always >= end of the last online memory range (get_physmem_online_end()).
 * 2. CONFIG_MAX_PHYSMEM_BITS - the maximum size of physical memory the
 * kernel is able to support.

@@ -9,6 +9,7 @@ enum physmem_info_source {
 MEM_DETECT_NONE = 0,
 MEM_DETECT_SCLP_STOR_INFO,
 MEM_DETECT_DIAG260,
+MEM_DETECT_DIAG500_STOR_LIMIT,
 MEM_DETECT_SCLP_READ_INFO,
 MEM_DETECT_BIN_SEARCH
 };
@@ -107,6 +108,8 @@ static inline const char *get_physmem_info_source(void)
 return "sclp storage info";
 case MEM_DETECT_DIAG260:
 return "diag260";
+case MEM_DETECT_DIAG500_STOR_LIMIT:
+return "diag500 storage limit";
 case MEM_DETECT_SCLP_READ_INFO:
 return "sclp read info";
 case MEM_DETECT_BIN_SEARCH:

@@ -2,7 +2,15 @@
 #ifndef _ASM_S390_SPARSEMEM_H
 #define _ASM_S390_SPARSEMEM_H
-#define SECTION_SIZE_BITS 28
+#define SECTION_SIZE_BITS 27
 #define MAX_PHYSMEM_BITS CONFIG_MAX_PHYSMEM_BITS
+#ifdef CONFIG_NUMA
+static inline int memory_add_physaddr_to_nid(u64 addr)
+{
+return 0;
+}
+#define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
+#endif /* CONFIG_NUMA */
 #endif /* _ASM_S390_SPARSEMEM_H */

@@ -122,7 +122,7 @@ config VIRTIO_BALLOON
 config VIRTIO_MEM
 tristate "Virtio mem driver"
-depends on X86_64 || ARM64 || RISCV
+depends on X86_64 || ARM64 || RISCV || S390
 depends on VIRTIO
 depends on MEMORY_HOTPLUG
 depends on MEMORY_HOTREMOVE
@@ -132,11 +132,11 @@ config VIRTIO_MEM
 This driver provides access to virtio-mem paravirtualized memory
 devices, allowing to hotplug and hotunplug memory.
-This driver currently only supports x86-64 and arm64. Although it
-should compile on other architectures that implement memory
-hot(un)plug, architecture-specific and/or common
-code changes may be required for virtio-mem, kdump and kexec to work as
-expected.
+This driver currently supports x86-64, arm64, riscv and s390.
+Although it should compile on other architectures that implement
+memory hot(un)plug, architecture-specific and/or common
+code changes may be required for virtio-mem, kdump and kexec to
+work as expected.
 If unsure, say M.

@@ -1905,7 +1905,7 @@ config STRICT_DEVMEM
 bool "Filter access to /dev/mem"
 depends on MMU && DEVMEM
 depends on ARCH_HAS_DEVMEM_IS_ALLOWED || GENERIC_LIB_DEVMEM_IS_ALLOWED
-default y if PPC || X86 || ARM64
+default y if PPC || X86 || ARM64 || S390
 help
 If this option is disabled, you allow userspace (root) access to all
 of memory, including kernel and userspace memory. Accidental