asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
Pull UML updates from Richard Weinberger:
- Removal of dead code (TT mode leftovers, etc)
- Fixes for the network vector driver
- Fixes for time-travel mode
* tag 'uml-for-linus-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
um: fix time-travel syscall scheduling hack
um: Remove outdated asm/sysrq.h header
um: Remove the declaration of user_thread function
um: Remove the call to SUBARCH_EXECVE1 macro
um: Remove unused mm_fd field from mm_id
um: Remove unused fields from thread_struct
um: Remove the redundant newpage check in update_pte_range
um: Remove unused kpte_clear_flush macro
um: Remove obsoleted declaration for execute_syscall_skas
user_mode_linux_howto_v2: add VDE vector support in doc
vector_user: add VDE support
um: remove ARCH_NO_PREEMPT_DYNAMIC
um: vector: Fix NAPI budget handling
um: vector: Replace locks guarding queue depth with atomics
um: remove variable stack array in os_rcv_fd_msg()
no_llseek had been defined to NULL two years ago, in commit 868941b144
("fs: remove no_llseek")
To quote that commit,
At -rc1 we'll need do a mechanical removal of no_llseek -
git grep -l -w no_llseek | grep -v porting.rst | while read i; do
sed -i '/\<no_llseek\>/d' $i
done
would do it.
Unfortunately, that hadn't been done. Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
.llseek = no_llseek,
so it's obviously safe.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The schedule() call there really never did anything at
least since the introduction of the EEVDF scheduler,
but now I found a case where we permanently hang in a
loop of -ERESTARTNOINTR (due to locking.) Work around
it by making any syscalls with error return take time
(and then schedule after) so we cannot hang in such a
loop forever.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This header no longer serves a purpose after show_trace was removed
by commit 9d1ee8ce92 ("um: Rewrite show_stack()").
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This function has never been defined since its declaration was
introduced by commit 1da177e4c3 ("Linux-2.6.12-rc2").
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This macro has never been defined by any supported sub-architectures
in tree since it was introduced by commit 1d3468a664 ("[PATCH uml:
move _kern.c files").
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
It's no longer used since the removal of the SKAS3/4 support.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
These fields are no longer used since the removal of tt mode.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
The two checks have been identical since commit ef714f1502 ("um:
remove force_flush_all from fork_handler"). And the inner one isn't
necessary anymore.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This macro has no users, and __flush_tlb_one doesn't exist either.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
The execute_syscall_skas() have been removed since
commit e32dacb9f4 ("[PATCH] uml: system call path cleanup"),
and now it is useless, so remove it.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
There's no such symbol and we currently don't have any of the
mechanisms to make boot-time selection cheap enough, so we can't
have HAVE_PREEMPT_DYNAMIC_CALL or HAVE_PREEMPT_DYNAMIC_KEY.
Remove the select statement.
Reported-by: Lukas Bulwahn <lbulwahn@redhat.com>
Fixes: cd01672d64 ("um: Enable preemption in UML")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
UML vector drivers use ring buffer structures which map
preallocated skbs onto mmsg vectors for use with sendmmsg
and recvmmsg. They are designed around a single consumer,
single producer pattern allowing simultaneous enqueue and
dequeue.
Lock debugging with preemption showed possible races when
locking the queue depth. This patch addresses this by
removing extra locks, adding barriers and making queue
depth inc/dec and access atomic.
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
When generalizing this, I was in the mindset of this being
"userspace" code, but even there we should not use variable
arrays as the kernel is moving away from allowing that.
Simply reserve (but not use) enough space for the maximum
two descriptors we might need now, and return an error if
attempting to receive more than that.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202407041459.3SYg4TEi-lkp@intel.com/
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This just standardizes the use of MIN() and MAX() macros, with the very
traditional semantics. The goal is to use these for C constant
expressions and for top-level / static initializers, and so be able to
simplify the min()/max() macros.
These macro names were used by various kernel code - they are very
traditional, after all - and all such users have been fixed up, with a
few different approaches:
- trivial duplicated macro definitions have been removed
Note that 'trivial' here means that it's obviously kernel code that
already included all the major kernel headers, and thus gets the new
generic MIN/MAX macros automatically.
- non-trivial duplicated macro definitions are guarded with #ifndef
This is the "yes, they define their own versions, but no, the include
situation is not entirely obvious, and maybe they don't get the
generic version automatically" case.
- strange use case #1
A couple of drivers decided that the way they want to describe their
versioning is with
#define MAJ 1
#define MIN 2
#define DRV_VERSION __stringify(MAJ) "." __stringify(MIN)
which adds zero value and I just did my Alexander the Great
impersonation, and rewrote that pointless Gordian knot as
#define DRV_VERSION "1.2"
instead.
- strange use case #2
A couple of drivers thought that it's a good idea to have a random
'MIN' or 'MAX' define for a value or index into a table, rather than
the traditional macro that takes arguments.
These values were re-written as C enum's instead. The new
function-line macros only expand when followed by an open
parenthesis, and thus don't clash with enum use.
Happily, there weren't really all that many of these cases, and a lot of
users already had the pattern of using '#ifndef' guarding (or in one
case just using '#undef MIN') before defining their own private version
that does the same thing. I left such cases alone.
Cc: David Laight <David.Laight@aculab.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull UML updates from Richard Weinberger:
- Support for preemption
- i386 Rust support
- Huge cleanup by Benjamin Berg
- UBSAN support
- Removal of dead code
* tag 'uml-for-linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (41 commits)
um: vector: always reset vp->opened
um: vector: remove vp->lock
um: register power-off handler
um: line: always fill *error_out in setup_one_line()
um: remove pcap driver from documentation
um: Enable preemption in UML
um: refactor TLB update handling
um: simplify and consolidate TLB updates
um: remove force_flush_all from fork_handler
um: Do not flush MM in flush_thread
um: Delay flushing syscalls until the thread is restarted
um: remove copy_context_skas0
um: remove LDT support
um: compress memory related stub syscalls while adding them
um: Rework syscall handling
um: Add generic stub_syscall6 function
um: Create signal stack memory assignment in stub_data
um: Remove stub-data.h include from common-offsets.h
um: time-travel: fix signal blocking race/hang
um: time-travel: remove time_exit()
...
Pull interrupt subsystem updates from Thomas Gleixner:
"Core:
- Provide a new mechanism to create interrupt domains. The existing
interfaces have already too many parameters and it's a pain to
expand any of this for new required functionality.
The new function takes a pointer to a data structure as argument.
The data structure combines all existing parameters and allows for
easy extension.
The first extension for this is to handle the instantiation of
generic interrupt chips at the core level and to allow drivers to
provide extra init/exit callbacks.
This is necessary to do the full interrupt chip initialization
before the new domain is published, so that concurrent usage sites
won't see a half initialized interrupt domain. Similar problems
exist on teardown.
This has turned out to be a real problem due to the deferred and
parallel probing which was added in recent years.
Handling this at the core level allows to remove quite some accrued
boilerplate code in existing drivers and avoids horrible
workarounds at the driver level.
- The usual small improvements all over the place
Drivers:
- Add support for LAN966x OIC and RZ/Five SoC
- Split the STM ExtI driver into a microcontroller and a SMP version
to allow building the latter as a module for multi-platform
kernels
- Enable MSI support for Armada 370XP on platforms which do not
support IPIs
- The usual small fixes and enhancements all over the place"
* tag 'irq-core-2024-07-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
irqdomain: Fix the kernel-doc and plug it into Documentation
genirq: Set IRQF_COND_ONESHOT in request_irq()
irqchip/imx-irqsteer: Handle runtime power management correctly
irqchip/gic-v3: Pass #redistributor-regions to gic_of_setup_kvm_info()
irqchip/bcm2835: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
irqchip/gic-v4: Make sure a VPE is locked when VMAPP is issued
irqchip/gic-v4: Substitute vmovp_lock for a per-VM lock
irqchip/gic-v4: Always configure affinity on VPE activation
Revert "irqchip/dw-apb-ictl: Support building as module"
Revert "Loongarch: Support loongarch avec"
arm64: Kconfig: Allow build irq-stm32mp-exti driver as module
ARM: stm32: Allow build irq-stm32mp-exti driver as module
irqchip/stm32mp-exti: Allow building as module
irqchip/stm32mp-exti: Rename internal symbols
irqchip/stm32-exti: Split MCU and MPU code
arm64: Kconfig: Select STM32MP_EXTI on STM32 platforms
ARM: stm32: Use different EXTI driver on ARMv7m and ARMv7a
irqchip/stm32-exti: Add CONFIG_STM32MP_EXTI
irqchip/dw-apb-ictl: Support building as module
irqchip/riscv-aplic: Simplify the initialization code
...
Pull virtio updates from Michael Tsirkin:
"Several new features here:
- Virtio find vqs API has been reworked (required to fix the
scalability issue we have with adminq, which I hope to merge later
in the cycle)
- vDPA driver for Marvell OCTEON
- virtio fs performance improvement
- mlx5 migration speedups
Fixes, cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (56 commits)
virtio: rename virtio_find_vqs_info() to virtio_find_vqs()
virtio: remove unused virtio_find_vqs() and virtio_find_vqs_ctx() helpers
virtio: convert the rest virtio_find_vqs() users to virtio_find_vqs_info()
virtio_balloon: convert to use virtio_find_vqs_info()
virtiofs: convert to use virtio_find_vqs_info()
scsi: virtio_scsi: convert to use virtio_find_vqs_info()
virtio_net: convert to use virtio_find_vqs_info()
virtio_crypto: convert to use virtio_find_vqs_info()
virtio_console: convert to use virtio_find_vqs_info()
virtio_blk: convert to use virtio_find_vqs_info()
virtio: rename find_vqs_info() op to find_vqs()
virtio: remove the original find_vqs() op
virtio: call virtio_find_vqs_info() from virtio_find_single_vq() directly
virtio: convert find_vqs() op implementations to find_vqs_info()
virtio_pci: convert vp_*find_vqs() ops to find_vqs_info()
virtio: introduce virtio_queue_info struct and find_vqs_info() config op
virtio: make virtio_find_single_vq() call virtio_find_vqs()
virtio: make virtio_find_vqs() call virtio_find_vqs_ctx()
caif_virtio: use virtio_find_single_vq() for single virtqueue finding
vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready()
...
kmsg_dump doesn't forward the panic reason string to the kmsg_dumper
callback.
This patch adds a new struct kmsg_dump_detail, that will hold the
reason and description, and pass it to the dump() callback.
To avoid updating all kmsg_dump() call, it adds a kmsg_dump_desc()
function and a macro for backward compatibility.
I've written this for drm_panic, but it can be useful for other
kmsg_dumper.
It allows to see the panic reason, like "sysrq triggered crash"
or "VFS: Unable to mount root fs on xxxx" on the drm panic screen.
v2:
* Use a struct kmsg_dump_detail to hold the reason and description
pointer, for more flexibility if we want to add other parameters.
(Kees Cook)
* Fix powerpc/nvram_64 build, as I didn't update the forward
declaration of oops_to_nvram()
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Acked-by: Petr Mladek <pmladek@suse.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Kees Cook <kees@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240702122639.248110-1-jfalempe@redhat.com
UML should not be using the architecture native runtime constants, since
that requires also having the appropriate instruction fixups (and all
the linker script details).
Not that using that code would be impossible, but it's not worth it.
Just point UML at the generic version.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Fixes: e3c92e8171 ("runtime constants: add x86 architecture support")
Link: https://lore.kernel.org/all/20240716143644.GA1827132@thelio-3990X/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull asm-generic updates from Arnd Bergmann:
"Most of this is part of my ongoing work to clean up the system call
tables. In this bit, all of the newer architectures are converted to
use the machine readable syscall.tbl format instead in place of
complex macros in include/uapi/asm-generic/unistd.h.
This follows an earlier series that fixed various API mismatches and
in turn is used as the base for planned simplifications.
The other two patches are dead code removal and a warning fix"
* tag 'asm-generic-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
vmlinux.lds.h: catch .bss..L* sections into BSS")
fixmap: Remove unused set_fixmap_offset_io()
riscv: convert to generic syscall table
openrisc: convert to generic syscall table
nios2: convert to generic syscall table
loongarch: convert to generic syscall table
hexagon: use new system call table
csky: convert to generic syscall table
arm64: rework compat syscall macros
arm64: generate 64-bit syscall.tbl
arm64: convert unistd_32.h to syscall.tbl format
arc: convert to generic syscall table
clone3: drop __ARCH_WANT_SYS_CLONE3 macro
kbuild: add syscall table generation to scripts/Makefile.asm-headers
kbuild: verify asm-generic header list
loongarch: avoid generating extra header files
um: don't generate asm/bpf_perf_event.h
csky: drop asm/gpio.h wrapper
syscalls: add generic scripts/syscall.tbl
If we start validating the existence of the asm-generic side of
generated headers, this one causes a warning:
make[3]: *** No rule to make target 'arch/um/include/generated/asm/bpf_perf_event.h', needed by 'all'. Stop.
The problem is that the asm-generic header only exists for the uapi
variant, but arch/um has no uapi headers and instead uses the x86
userspace API.
Add a custom file with an explicit redirect to avoid this.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Conceptually, we want the memory mappings to always be up to date and
represent whatever is in the TLB. To ensure that, we need to sync them
over in the userspace case and for the kernel we need to process the
mappings.
The kernel will call flush_tlb_* if page table entries that were valid
before become invalid. Unfortunately, this is not the case if entries
are added.
As such, change both flush_tlb_* and set_ptes to track the memory range
that has to be synchronized. For the kernel, we need to execute a
flush_tlb_kern_* immediately but we can wait for the first page fault in
case of set_ptes. For userspace in contrast we only store that a range
of memory needs to be synced and do so whenever we switch to that
process.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240703134536.1161108-13-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The HVC update was mostly used to compress consecutive calls into one.
This is mostly relevant for userspace where it is already handled by the
syscall stub code.
Simplify the whole logic and consolidate it for both kernel and
userspace. This does remove the sequential syscall compression for the
kernel, however that shouldn't be the main factor in most runs.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240703134536.1161108-12-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There should be no need to flush the memory in flush_thread. Doing this
likely worked around some issue where memory was still incorrectly
mapped when creating or cloning an MM.
With the removal of the special clone path, that isn't relevant anymore.
However, add the flush into MM initialization so that any new userspace
MM is guaranteed to be clean.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240703134536.1161108-10-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The current LDT code has a few issues that mean it should be redone in a
different way once we always start with a fresh MM even when cloning.
In a new and better world, the kernel would just ensure its own LDT is
clear at startup. At that point, all that is needed is a simple function
to populate the LDT from another MM in arch_dup_mmap combined with some
tracking of the installed LDT entries for each MM.
Note that the old implementation was even incorrect with regard to
reading, as it copied out the LDT entries in the internal format rather
than converting them to the userspace structure.
Removal should be fine as the LDT is not used for thread-local storage
anymore.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240703134536.1161108-7-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Rework syscall handling to be platform independent. Also create a clean
split between queueing of syscalls and flushing them out, removing the
need to keep state in the code that triggers the syscalls.
The code adds syscall_data_len to the global mm_id structure. This will
be used later to allow surrounding code to track whether syscalls still
need to run and if errors occurred.
Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
Link: https://patch.msgid.link/20240703134536.1161108-5-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When we switch to use seccomp, we need both the signal stack and other
data (i.e. syscall information) to co-exist in the stub data. To
facilitate this, start by defining separate memory areas for the stack
and syscall data.
This moves the signal stack onto a new page as the memory area is not
sufficient to hold both signal stack and syscall information.
Only change the signal stack setup for now, as the syscall code will be
reworked later.
Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
Link: https://patch.msgid.link/20240703134536.1161108-3-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When signals are hard-blocked in order to do time-travel
socket processing, we set signals_blocked and then handle
SIGIO signals by setting the SIGIO bit in signals_pending.
When unblocking, we first set signals_blocked to 0, and
then handle all pending signals. We have to set it first,
so that we can again properly block/unblock inside the
unblock, if the time-travel handlers need to be processed.
Unfortunately, this is racy. We can get into this situation:
// signals_pending = SIGIO_MASK
unblock_signals_hard()
signals_blocked = 0;
if (signals_pending && signals_enabled) {
block_signals();
unblock_signals()
...
sig_handler_common(SIGIO, NULL, NULL);
sigio_handler()
...
sigio_reg_handler()
irq_do_timetravel_handler()
reg->timetravel_handler() ==
vu_req_interrupt_comm_handler()
vu_req_read_message()
vhost_user_recv_req()
vhost_user_recv()
vhost_user_recv_header()
// reads 12 bytes header of
// 20 bytes message
<-- receive SIGIO here <--
sig_handler()
int enabled = signals_enabled; // 1
if ((signals_blocked || !enabled) && (sig == SIGIO)) {
if (!signals_blocked && time_travel_mode == TT_MODE_EXTERNAL)
sigio_run_timetravel_handlers()
_sigio_handler()
sigio_reg_handler()
... as above ...
vhost_user_recv_header()
// reads 8 bytes that were message payload
// as if it were header - but aborts since
// it then gets -EAGAIN
...
--> end signal handler -->
// continue in vhost_user_recv()
// full_read() for 8 bytes payload busy loops
// entire process hangs here
Conceptually, to fix this, we need to ensure that the
signal handler cannot run while we hard-unblock signals.
The thing that makes this more complex is that we can be
doing hard-block/unblock while unblocking. Introduce a
new signals_blocked_pending variable that we can keep at
non-zero as long as pending signals are being processed,
then we only need to ensure it's decremented safely and
the signal handler will only increment it if it's already
non-zero (or signals_blocked is set, of course.)
Note also that only the outermost call to hard-unblock is
allowed to decrement signals_blocked_pending, since it
could otherwise reach zero in an inner call, and leave
the same race happening if the timetravel_handler loops,
but that's basically required of it.
Fixes: d6b399a0e0 ("um: time-travel/signals: fix ndelay() in interrupt")
Link: https://patch.msgid.link/20240703110144.28034-2-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
With external time travel, a LOT of message can end up
being exchanged on the socket, taking a significant
amount of time just to do that.
Add a new shared memory optimisation to that, where a
number of changes are made:
- the controller sends a client ID and a shared memory FD
(and a logging FD we don't use) in the ACK message to
the initial START
- the shared memory holds the current time and the
free_until value, so that there's no need to exchange
messages for that
- if the client that's running has shared memory support,
any client (the running one included) can request the
next time it wants to run inside the shared memory,
rather than sending a message, by also updating the
free_until value
- when shared memory is enabled, RUN/WAIT messages no
longer have an ACK, further cutting down on messages
Together, this can reduce the number of messages very
significantly, and reduce overall test/simulation run time.
Co-developed-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Link: https://patch.msgid.link/20240702192118.6ad0a083f574.Ie41206c8ce4507fe26b991937f47e86c24ca7a31@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>