In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.
Use cpumask_first_and_and() to avoid the need for a temporary cpumask on
the stack.
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240416085454.3547175-3-dawei.li@shingroup.cn
The interrupt controller shutdown path does not need to save the mask of
enabled interrupts because the next state the system is going to be in is
akin to a cold boot, or a kexec'd kernel. Saving the mask only makes sense
if the software state needs to preserve the hardware state across a system
suspend/resume cycle.
As an optimization, and given that there are systems with dozens of such
interrupt controller, save a "slow" memory mapped I/O read in the shutdown
path where no saving/restoring is required.
Reported-by: Tim Ross <tim.ross@broadcom.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240424175732.1526531-1-florian.fainelli@broadcom.com
When a CPU goes offline, the interrupts affine to that CPU are
re-configured.
Managed interrupts undergo either migration to other CPUs or shutdown if
all CPUs listed in the affinity are offline. The migration of managed
interrupts is guaranteed on x86 because there are interrupt vectors
reserved.
Regular interrupts are migrated to a still online CPU in the affinity mask
or if there is no online CPU to any online CPU.
This works as long as the still online CPUs in the affinity mask have
interrupt vectors available, but in case that none of those CPUs has a
vector available the migration fails and the device interrupt becomes
stale.
This is not any different from the case where the affinity mask does not
contain any online CPU, but there is no fallback operation for this.
Instead of giving up, retry the migration attempt with the online CPU mask
if the interrupt is not managed, as managed interrupts cannot be affected
by this problem.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240423073413.79625-1-dongli.zhang@oracle.com
irq_restore_affinity_of_irq() restarts managed interrupts unconditionally
when the first CPU in the affinity mask comes online. That's correct during
normal hotplug operations, but not when resuming from S3 because the
drivers are not resumed yet and interrupt delivery is not expected by them.
Skip the startup of suspended interrupts and let resume_device_irqs() deal
with restoring them. This ensures that irqs are not delivered to drivers
during the noirq phase of resuming from S3, after non-boot CPUs are brought
back online.
Signed-off-by: David Stevens <stevensd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240424090341.72236-1-stevensd@chromium.org
Stop using the table inside the EXTI driver and list in DT the mapping
between EXTI events and its parent interrupts.
By switching away from using the internal table, there is no need anymore
to use the specific compatible "st,stm32mp13-exti", which was introduced to
select the proper internal table.
Convert the driver's table for stm32mp131 to the DT property
interrupts-extended.
Switch the compatible string to the generic "st,stm32mp1-exti", in place of
the specific "st,stm32mp13-exti".
Older DT using compatible "st,stm32mp13-exti" will still work as the driver
remains backward compatible.
Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240415134926.1254428-10-antonio.borneo@foss.st.com
EXTI events availability depends on Resource Isolation Framework (RIF)
configuration.
RIF grants access to buses with Compartment ID (CID) filtering, secure and
privilege level. It also assigns EXTI events to one or several processors
(CID, Secure, Privilege).
EXTI events used by Linux must be CID-filtered (EnCIDCFGR.CFEN=1) and
statically assigned to CID1 (EnCIDCFR.CID=CID1).
EXTI events not filling these conditions are marked as reserved and can't
be used by Linux.
Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240415134926.1254428-7-antonio.borneo@foss.st.com
All driver's dependencies for suspend/resume have been fixed long
ago. There are no more reasons to use syscore PM for the part of this
driver related to Cortex-A MPU.
Switch to standard PM using NOIRQ_SYSTEM_SLEEP_PM_OPS, so all the registers
of the interrupt controller get resumed before any irq gets enabled.
A side effect of this change is to drop the only global variable
'stm32_host_data', used to keep the driver's data for syscore_ops. This
makes the driver ready to support multiple EXTI instances.
Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240415134926.1254428-5-antonio.borneo@foss.st.com
The mapping of EXTI events to its parent interrupt controller is both SoC
and instance dependent.
The current implementation requires adding a new mapping table to the
driver's code and a new compatible for each new EXTI instance.
Check for the presence of the optional interrupts-extended property and use
it to map EXTI events to the parent's interrupts.
For old device trees without the optional interrupts-extended property, the
driver's behavior is unchanged, thus keeps backward compatibility.
Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240415134926.1254428-4-antonio.borneo@foss.st.com
After commit 5bb578a0c1 ("ARM: 9298/1: Drop custom mdesc->handle_irq()")
the function icoll_handle_irq() is only used within irq-mxs.c. So declare
it as static to fix the warning about a missing prototype when building
with W=1.
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The current code is using a fixed mapping between the LS7A interrupt source
and the HT interrupt vector. This prevents the utilization of the full
interrupt vector space and therefore limits the number of interrupt source
in a system.
Replace the fixed mapping with a dynamic mapping which allocates a
vector when an interrupt source is set up. This avoids that unused
sources prevent vectors from being used for other devices.
Introduce a mapping table in struct pch_pic, where each interrupt source
will allocate an index as a 'hwirq' number from the table in the order of
application and set table value as interrupt source number. This hwirq
number will be configured as vector in the HT interrupt controller. For an
interrupt source, the validity period of the obtained hwirq will last until
the system reset.
Co-developed-by: Biao Dong <dongbiao@loongson.cn>
Signed-off-by: Biao Dong <dongbiao@loongson.cn>
Co-developed-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Signed-off-by: Baoqi Zhang <zhangbaoqi@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240422093830.27212-1-zhangtianyang@loongson.cn
Currently, the following warning is observed on the QEMU virt machine:
genirq: irq_chip APLIC-MSI-d000000.aplic did not update eff. affinity mask of irq 12
The above warning is because the IMSIC driver does not set the initial
value of effective affinity in the interrupt descriptor. To address this,
initialize the effective affinity in imsic_irq_domain_alloc().
Fixes: 027e125acd ("irqchip/riscv-imsic: Add device MSI domain support for platform devices")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240413065210.315896-1-apatel@ventanamicro.com
When the watchdog determines that the current soft lockup is due to an
interrupt storm based on CPU utilization, reporting the most frequent
interrupts could be good enough for further troubleshooting.
Below is an example of interrupt storm. The call tree does not provide
useful information, but analyzing which interrupt caused the soft lockup by
comparing the counts of interrupts during the lockup period allows to
identify the culprit.
[ 638.870231] watchdog: BUG: soft lockup - CPU#9 stuck for 26s! [swapper/9:0]
[ 638.870825] CPU#9 Utilization every 4s during lockup:
[ 638.871194] #1: 0% system, 0% softirq, 100% hardirq, 0% idle
[ 638.871652] #2: 0% system, 0% softirq, 100% hardirq, 0% idle
[ 638.872107] #3: 0% system, 0% softirq, 100% hardirq, 0% idle
[ 638.872563] #4: 0% system, 0% softirq, 100% hardirq, 0% idle
[ 638.873018] #5: 0% system, 0% softirq, 100% hardirq, 0% idle
[ 638.873494] CPU#9 Detect HardIRQ Time exceeds 50%. Most frequent HardIRQs:
[ 638.873994] #1: 330945 irq#7
[ 638.874236] #2: 31 irq#82
[ 638.874493] #3: 10 irq#10
[ 638.874744] #4: 2 irq#89
[ 638.874992] #5: 1 irq#102
...
[ 638.875313] Call trace:
[ 638.875315] __do_softirq+0xa8/0x364
Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Liu Song <liusong@linux.alibaba.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20240411074134.30922-6-yaoma@linux.alibaba.com
The following softlockup is caused by interrupt storm, but it cannot be
identified from the call tree. Because the call tree is just a snapshot
and doesn't fully capture the behavior of the CPU during the soft lockup.
watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
...
Call trace:
__do_softirq+0xa0/0x37c
__irq_exit_rcu+0x108/0x140
irq_exit+0x14/0x20
__handle_domain_irq+0x84/0xe0
gic_handle_irq+0x80/0x108
el0_irq_naked+0x50/0x58
Therefore, it is necessary to report CPU utilization during the
softlockup_threshold period (report once every sample_period, for a total
of 5 reportings), like this:
watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
CPU#28 Utilization every 4s during lockup:
#1: 0% system, 0% softirq, 100% hardirq, 0% idle
#2: 0% system, 0% softirq, 100% hardirq, 0% idle
#3: 0% system, 0% softirq, 100% hardirq, 0% idle
#4: 0% system, 0% softirq, 100% hardirq, 0% idle
#5: 0% system, 0% softirq, 100% hardirq, 0% idle
...
This is helpful in determining whether an interrupt storm has occurred or
in identifying the cause of the softlockup. The criteria for determination
are as follows:
a. If the hardirq utilization is high, then interrupt storm should be
considered and the root cause cannot be determined from the call tree.
b. If the softirq utilization is high, then the call might not necessarily
point at the root cause.
c. If the system utilization is high, then analyzing the root
cause from the call tree is possible in most cases.
The mechanism requires a considerable amount of global storage space
when configured for the maximum number of CPUs. Therefore, adding a
SOFTLOCKUP_DETECTOR_INTR_STORM Kconfig knob that defaults to "yes"
if the max number of CPUs is <= 128.
Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Liu Song <liusong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240411074134.30922-5-yaoma@linux.alibaba.com
show_interrupts() unconditionally accumulates the per CPU interrupt
statistics to determine whether an interrupt was ever raised.
This can be avoided for all interrupts which are not strictly per CPU
and not of type NMI because those interrupts provide already an
accumulated counter. The required logic is already implemented in
kstat_irqs().
Split the inner access logic out of kstat_irqs() and use it for
kstat_irqs() and show_interrupts() to avoid the accumulation loop
when possible.
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Liu Song <liusong@linux.alibaba.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20240411074134.30922-4-yaoma@linux.alibaba.com
The soft lockup detector lacks a mechanism to identify interrupt storms as
root cause of a lockup. To enable this the detector needs a mechanism to
snapshot the interrupt count statistics on a CPU when the detector observes
a potential lockup scenario and compare that against the interrupt count
when it warns about the lockup later on. The number of interrupts in that
period give a hint whether the lockup might have been caused by an interrupt
storm.
Instead of having extra storage in the lockup detector and accessing the
internals of the interrupt descriptor directly, add a snapshot member to
the per CPU irq_desc::kstat_irq structure and provide interfaces to take a
snapshot of all interrupts on the current CPU and to retrieve the delta of
a specific interrupt later on.
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240411074134.30922-3-yaoma@linux.alibaba.com
The irq_desc::kstat_irqs member is a per-CPU variable of type int, which is
only capable of counting. A snapshot mechanism for interrupt statistics
will be added soon, which requires an additional variable to store the
snapshot.
To facilitate expansion, convert kstat_irqs here to a struct containing
only the count.
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240411074134.30922-2-yaoma@linux.alibaba.com
An interrupt's effective affinity can only be different from its configured
affinity if there are multiple CPUs. Make it clear that this option is only
meaningful when SMP is enabled. Otherwise, there exists "WARNING: unmet
direct dependencies detected for GENERIC_IRQ_EFFECTIVE_AFF_MASK" when make
menuconfig if CONFIG_SMP is not set on LoongArch.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240326121130.16622-3-yangtiezhu@loongson.cn
When alpine_msix_gic_domain_alloc() fails, there is an off-by-one in the
number of interrupts to be freed.
Fix it by passing the number of successfully allocated interrupts, instead
of the relative index of the last allocated one.
Fixes: 3841245e84 ("irqchip/alpine-msi: Fix freeing of interrupts on allocation error path")
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240327142305.1048-1-yuzenghui@huawei.com
It's a bit hard to read the logic since the virq is used before checking it
for 0. Rearrange the code to make it better to understand.
This, in particular, should clearly answer the question whether the caller
needs to perform this check or not, and there are plenty of places for both
variants, confirming a confusion.
Fun fact that the new code is shorter:
Function old new delta
irq_dispose_mapping 278 271 -7
Total: Before=11625, After=11618, chg -0.06%
when compiled by GCC on Debian for x86_64.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240405190105.3932034-1-andriy.shevchenko@linux.intel.com
The RISC-V advanced interrupt architecture (AIA) specification defines
advanced platform-level interrupt controller (APLIC) which has two modes
of operation: 1) Direct mode and 2) MSI mode.
(For more details, refer https://github.com/riscv/riscv-aia)
In APLIC direct-mode, wired interrupts are forwared to CPUs (or HARTs)
as a local external interrupt.
Add a platform irqchip driver for the RISC-V APLIC direct-mode to
support RISC-V platforms having only wired interrupts.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20240307140307.646078-7-apatel@ventanamicro.com
Simplify rzg2l_irqc_irq_{en,dis}able() by moving common code to
rzg2l_tint_irq_endisable().
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull EFI fixes from Ard Biesheuvel:
- Fix logic that is supposed to prevent placement of the kernel image
below LOAD_PHYSICAL_ADDR
- Use the firmware stack in the EFI stub when running in mixed mode
- Clear BSS only once when using mixed mode
- Check efi.get_variable() function pointer for NULL before trying to
call it
* tag 'efi-fixes-for-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: fix panic in kdump kernel
x86/efistub: Don't clear BSS twice in mixed mode
x86/efistub: Call mixed mode boot services on the firmware's stack
efi/libstub: fix efi_random_alloc() to allocate memory at alloc_min or higher address
Pull x86 fixes from Thomas Gleixner:
- Ensure that the encryption mask at boot is properly propagated on
5-level page tables, otherwise the PGD entry is incorrectly set to
non-encrypted, which causes system crashes during boot.
- Undo the deferred 5-level page table setup as it cannot work with
memory encryption enabled.
- Prevent inconsistent XFD state on CPU hotplug, where the MSR is reset
to the default value but the cached variable is not, so subsequent
comparisons might yield the wrong result and as a consequence the
result prevents updating the MSR.
- Register the local APIC address only once in the MPPARSE enumeration
to prevent triggering the related WARN_ONs() in the APIC and topology
code.
- Handle the case where no APIC is found gracefully by registering a
fake APIC in the topology code. That makes all related topology
functions work correctly and does not affect the actual APIC driver
code at all.
- Don't evaluate logical IDs during early boot as the local APIC IDs
are not yet enumerated and the invoked function returns an error
code. Nothing requires the logical IDs before the final CPUID
enumeration takes place, which happens after the enumeration.
- Cure the fallout of the per CPU rework on UP which misplaced the
copying of boot_cpu_data to per CPU data so that the final update to
boot_cpu_data got lost which caused inconsistent state and boot
crashes.
- Use copy_from_kernel_nofault() in the kprobes setup as there is no
guarantee that the address can be safely accessed.
- Reorder struct members in struct saved_context to work around another
kmemleak false positive
- Remove the buggy code which tries to update the E820 kexec table for
setup_data as that is never passed to the kexec kernel.
- Update the resource control documentation to use the proper units.
- Fix a Kconfig warning observed with tinyconfig
* tag 'x86-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/64: Move 5-level paging global variable assignments back
x86/boot/64: Apply encryption mask to 5-level pagetable update
x86/cpu: Add model number for another Intel Arrow Lake mobile processor
x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD
Documentation/x86: Document that resctrl bandwidth control units are MiB
x86/mpparse: Register APIC address only once
x86/topology: Handle the !APIC case gracefully
x86/topology: Don't evaluate logical IDs during early boot
x86/cpu: Ensure that CPU info updates are propagated on UP
kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe address
x86/pm: Work around false positive kmemleak report in msr_build_context()
x86/kexec: Do not update E820 kexec table for setup_data
x86/config: Fix warning for 'make ARCH=x86_64 tinyconfig'