linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-05-16 03:11:11 -04:00

Author	SHA1	Message	Date
Peter Zijlstra	0e98eb1481	entry: Prepare for deferred hrtimer rearming The hrtimer interrupt expires timers and at the end of the interrupt it rearms the clockevent device for the next expiring timer. That's obviously correct, but in the case that a expired timer sets NEED_RESCHED the return from interrupt ends up in schedule(). If HRTICK is enabled then schedule() will modify the hrtick timer, which causes another reprogramming of the hardware. That can be avoided by deferring the rearming to the return from interrupt path and if the return results in a immediate schedule() invocation then it can be deferred until the end of schedule(), which avoids multiple rearms and re-evaluation of the timer wheel. As this is only relevant for interrupt to user return split the work masks up and hand them in as arguments from the relevant exit to user functions, which allows the compiler to optimize the deferred handling out for the syscall exit to user case. Add the rearm checks to the approritate places in the exit to user loop and the interrupt return to kernel path, so that the rearming is always guaranteed. In the return to user space path this is handled in the same way as TIF_RSEQ to avoid extra instructions in the fast path, which are truly hurtful for device interrupt heavy work loads as the extra instructions and conditionals while benign at first sight accumulate quickly into measurable regressions. The return from syscall path is completely unaffected due to the above mentioned split so syscall heavy workloads wont have any extra burden. For now this is just placing empty stubs at the right places which are all optimized out by the compiler until the actual functionality is in place. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163431.066469985@kernel.org	2026-02-27 16:40:13 +01:00
Peter Zijlstra	a43b4856bc	hrtimer: Prepare stubs for deferred rearming The hrtimer interrupt expires timers and at the end of the interrupt it rearms the clockevent device for the next expiring timer. That's obviously correct, but in the case that a expired timer set NEED_RESCHED the return from interrupt ends up in schedule(). If HRTICK is enabled then schedule() will modify the hrtick timer, which causes another reprogramming of the hardware. That can be avoided by deferring the rearming to the return from interrupt path and if the return results in a immediate schedule() invocation then it can be deferred until the end of schedule(). To make this correct the affected code parts need to be made aware of this. Provide empty stubs for the deferred rearming mechanism, so that the relevant code changes for entry, softirq and scheduler can be split up into separate changes independent of the actual enablement in the hrtimer code. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163431.000891171@kernel.org	2026-02-27 16:40:13 +01:00
Thomas Gleixner	9e07a9c980	hrtimer: Rename hrtimer_cpu_base::in_hrtirq to deferred_rearm The upcoming deferred rearming scheme has the same effect as the deferred rearming when the hrtimer interrupt is executing. So it can reuse the in_hrtirq flag, but when it gets deferred beyond the hrtimer interrupt path, then the name does not make sense anymore. Rename it to deferred_rearm upfront to keep the actual functional change separate from the mechanical rename churn. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163430.935623347@kernel.org	2026-02-27 16:40:12 +01:00
Peter Zijlstra	2889243848	hrtimer: Re-arrange hrtimer_interrupt() Rework hrtimer_interrupt() such that reprogramming is split out into an independent function at the end of the interrupt. This prepares for reprogramming getting delayed beyond the end of hrtimer_interrupt(). Notably, this changes the hang handling to always wait 100ms instead of trying to keep it proportional to the actual delay. This simplifies the state, also this really shouldn't be happening. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163430.870639266@kernel.org	2026-02-27 16:40:12 +01:00
Thomas Gleixner	8e10f6b81a	hrtimer: Add hrtimer_rearm tracepoint Analyzing the reprogramming of the clock event device is essential to debug the behaviour of the hrtimer subsystem especially with the upcoming deferred rearming scheme. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163430.803669745@kernel.org	2026-02-27 16:40:12 +01:00
Thomas Gleixner	85a690d1c1	hrtimer: Separate remove/enqueue handling for local timers As the base switch can be avoided completely when the base stays the same the remove/enqueue handling can be more streamlined. Split it out into a separate function which handles both in one go which is way more efficient and makes the code simpler to follow. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163430.737600486@kernel.org	2026-02-27 16:40:11 +01:00
Thomas Gleixner	c939191457	hrtimer: Use NOHZ information for locality The decision to keep a timer which is associated to the local CPU on that CPU does not take NOHZ information into account. As a result there are a lot of hrtimer base switch invocations which end up not switching the base and stay on the local CPU. That's just work for nothing and can be further improved. If the local CPU is part of the NOISE housekeeping mask, then check: 1) Whether the local CPU has the tick running, which means it is either not idle or already expecting a timer soon. 2) Whether the tick is stopped and need_resched() is set, which means the CPU is about to exit idle. This reduces the amount of hrtimer base switch attempts, which end up on the local CPU anyway, significantly and prepares for further optimizations. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163430.673473029@kernel.org	2026-02-27 16:40:11 +01:00
Thomas Gleixner	3288cd4863	hrtimer: Optimize for local timers The decision whether to keep timers on the local CPU or on the CPU they are associated to is suboptimal and causes the expensive switch_hrtimer_base() mechanism to be invoked more than necessary. This is especially true for pinned timers. Rewrite the decision logic so that the current base is kept if: 1) The callback is running on the base 2) The timer is associated to the local CPU and the first expiring timer as that allows to optimize for reprogramming avoidance 3) The timer is associated to the local CPU and pinned 4) The timer is associated to the local CPU and timer migration is disabled. Only #2 was covered by the original code, but especially #3 makes a difference for high frequency rearming timers like the scheduler hrtick timer. If timer migration is disabled, then #4 avoids most of the base switches. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163430.607935269@kernel.org	2026-02-27 16:40:11 +01:00
Thomas Gleixner	22f011be7a	hrtimer: Convert state and properties to boolean All 'u8' flags are true booleans, so make it entirely clear that these can only contain true or false. This is especially true for hrtimer::state, which has a historical leftover of using the state with bitwise operations. That was used in the early hrtimer implementation with several bits, but then converted to a boolean state. But that conversion missed to replace the bit OR and bit check operations all over the place, which creates suboptimal code. As of today 'state' is a misnomer because it's only purpose is to reflect whether the timer is enqueued into the RB-tree or not. Rename it to 'is_queued' and make all operations on it boolean. This reduces text size from 8926 to 8732 bytes. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163430.542427240@kernel.org	2026-02-27 16:40:11 +01:00
Thomas Gleixner	7d27eafe54	hrtimer: Replace the bitfield in hrtimer_cpu_base Use bool for the various flags as that creates better code in the hot path. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163430.475262618@kernel.org	2026-02-27 16:40:10 +01:00
Thomas Gleixner	8ffc9ea881	hrtimer: Evaluate timer expiry only once No point in accessing the timer twice. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163430.409352042@kernel.org	2026-02-27 16:40:10 +01:00
Thomas Gleixner	0c6af0ea51	hrtimer: Cleanup coding style and comments As this code has some major surgery ahead, clean up coding style and bring comments up to date. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163430.342740952@kernel.org	2026-02-27 16:40:10 +01:00
Thomas Gleixner	6abfc2bd5b	hrtimer: Use guards where appropriate Simplify and tidy up the code where possible. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163430.275551488@kernel.org	2026-02-27 16:40:09 +01:00
Thomas Gleixner	f2e388a019	hrtimer: Reduce trace noise in hrtimer_start() hrtimer_start() when invoked with an already armed timer traces like: <comm>-.. [032] d.h2. 5.002263: hrtimer_cancel: hrtimer= .... <comm>-.. [032] d.h1. 5.002263: hrtimer_start: hrtimer= .... Which is incorrect as the timer doesn't get canceled. Just the expiry time changes. The internal dequeue operation which is required for that is not really interesting for trace analysis. But it makes it tedious to keep real cancellations and the above case apart. Remove the cancel tracing in hrtimer_start() and add a 'was_armed' indicator to the hrtimer start tracepoint, which clearly indicates what the state of the hrtimer is when hrtimer_start() is invoked: <comm>-.. [032] d.h1. 6.200103: hrtimer_start: hrtimer= .... was_armed=0 <comm>-.. [032] d.h1. 6.200558: hrtimer_start: hrtimer= .... was_armed=1 Fixes: `c6a2a17702` ("hrtimer: Add tracepoint for hrtimers") Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163430.208491877@kernel.org	2026-02-27 16:40:09 +01:00
Thomas Gleixner	513e744a0a	hrtimer: Add debug object init assertion The debug object coverage in hrtimer_start_range_ns() happens too late to do anything useful. Implement the init assert assertion part and invoke that early in hrtimer_start_range_ns(). Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163430.143098153@kernel.org	2026-02-27 16:40:09 +01:00
Thomas Gleixner	f246ec3478	x86/apic: Enable TSC coupled programming mode The TSC deadline timer is directly coupled to the TSC and setting the next deadline is tedious as the clockevents core code converts the CLOCK_MONOTONIC based absolute expiry time to a relative expiry by reading the current time from the TSC. It converts that delta to cycles and hands the result to lapic_next_deadline(), which then has read to the TSC and add the delta to program the timer. The core code now supports coupled clock event devices and can provide the expiry time in TSC cycles directly without reading the TSC at all. This obviouly works only when the TSC is the current clocksource, but that's the default for all modern CPUs which implement the TSC deadline timer. If the TSC is not the current clocksource (e.g. early boot) then the core code falls back to the relative set_next_event() callback as before. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163430.076565985@kernel.org	2026-02-27 16:40:09 +01:00
Thomas Gleixner	89f951a1e8	clockevents: Provide support for clocksource coupled comparators Some clockevent devices are coupled to the system clocksource by implementing a less than or equal comparator which compares the programmed absolute expiry time against the underlying time counter. The timekeeping core provides a function to convert and absolute CLOCK_MONOTONIC based expiry time to a absolute clock cycles time which can be directly fed into the comparator. That spares two time reads in the next event progamming path, one to convert the absolute nanoseconds time to a delta value and the other to convert the delta value back to a absolute time value suitable for the comparator. Provide a new clocksource callback which takes the absolute cycle value and wire it up in clockevents_program_event(). Similar to clocksources allow architectures to inline the rearm operation. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163430.010425428@kernel.org	2026-02-27 16:40:08 +01:00
Thomas Gleixner	cd38bdb8e6	timekeeping: Provide infrastructure for coupled clockevents Some architectures have clockevent devices which are coupled to the system clocksource by implementing a less than or equal comparator which compares the programmed absolute expiry time against the underlying time counter. Well known examples are TSC/TSC deadline timer and the S390 TOD clocksource/comparator. While the concept is nice it has some downsides: 1) The clockevents core code is strictly based on relative expiry times as that's the most common case for clockevent device hardware. That requires to convert the absolute expiry time provided by the caller (hrtimers, NOHZ code) to a relative expiry time by reading and substracting the current time. The clockevent::set_next_event() callback must then read the counter again to convert the relative expiry back into a absolute one. 2) The conversion factors from nanoseconds to counter clock cycles are set up when the clockevent is registered. When NTP applies corrections then the clockevent conversion factors can deviate from the clocksource conversion substantially which either results in timers firing late or in the worst case early. The early expiry then needs to do a reprogam with a short delta. In most cases this is papered over by the fact that the read in the set_next_event() callback happens after the read which is used to calculate the delta. So the tendency is that timers expire mostly late. All of this can be avoided by providing support for these devices in the core code: 1) The timekeeping core keeps track of the last update to the clocksource by storing the base nanoseconds and the corresponding clocksource counter value. That's used to keep the conversion math for reading the time within 64-bit in the common case. This information can be used to avoid both reads of the underlying clocksource in the clockevents reprogramming path: delta = expiry - base_ns; cycles = base_cycles + ((delta * clockevent::mult) >> clockevent::shift); The resulting cycles value can be directly used to program the comparator. 2) As #1 does not longer provide the "compensation" through the second read the deviation of the clocksource and clockevent conversions caused by NTP become more prominent. This can be cured by letting the timekeeping core compute and store the reverse conversion factors when the clocksource cycles to nanoseconds factors are modified by NTP: CS::MULT (1 << NS_TO_CYC_SHIFT) --------------- = ---------------------- (1 << CS:SHIFT) NS_TO_CYC_MULT Ergo: NS_TO_CYC_MULT = (1 << (CS::SHIFT + NS_TO_CYC_SHIFT)) / CS::MULT The NS_TO_CYC_SHIFT value is calculated when the clocksource is installed so that it aims for a one hour maximum sleep time. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.944763521@kernel.org	2026-02-27 16:40:08 +01:00
Thomas Gleixner	2302828612	x86/apic: Avoid the PVOPS indirection for the TSC deadline timer XEN PV does not emulate the TSC deadline timer, so the PVOPS indirection for writing the deadline MSR can be avoided completely. Use native_wrmsrq() instead. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.877429827@kernel.org	2026-02-27 16:40:08 +01:00
Thomas Gleixner	92d0e753d5	x86/apic: Remove pointless fence in lapic_next_deadline() lapic_next_deadline() contains a fence before the TSC read and the write to the TSC_DEADLINE MSR with a content free and therefore useless comment: /* This MSR is special and need a special fence: */ The MSR is not really special. It is just not a serializing MSR, but that does not matter at all in this context as all of these operations are strictly CPU local. The only thing the fence prevents is that the RDTSC is speculated ahead, but that's not really relevant as the delta is calculated way before based on a previous TSC read and therefore inaccurate by definition. So removing the fence is just making it slightly more inaccurate in the worst case, but that is irrelevant as it's way below the actual system immanent latencies and variations. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.809059527@kernel.org	2026-02-27 16:40:07 +01:00
Thomas Gleixner	b27801189f	x86: Inline TSC reads in timekeeping Avoid the overhead of the indirect call for a single instruction to read the TSC. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.741886362@kernel.org	2026-02-27 16:40:07 +01:00
Thomas Gleixner	2e27beeb66	timekeeping: Allow inlining clocksource::read() On some architectures clocksource::read() boils down to a single instruction, so the indirect function call is just a massive overhead especially with speculative execution mitigations in effect. Allow architectures to enable conditional inlining of that read to avoid that by: - providing a static branch to switch to the inlined variant - disabling the branch before clocksource changes - enabling the branch after a clocksource change, when the clocksource indicates in a feature flag that it is the one which provides the inlined variant This is intentionally not a static call as that would only remove the indirect call, but not the rest of the overhead. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.675151545@kernel.org	2026-02-27 16:40:07 +01:00
Thomas Gleixner	7080280739	clockevents: Remove redundant CLOCK_EVT_FEAT_KTIME The only real usecase for this is the hrtimer based broadcast device. No point in using two different feature flags for this. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.609049777@kernel.org	2026-02-27 16:40:06 +01:00
Thomas Gleixner	adcec6a7f5	tick/sched: Avoid hrtimer_cancel/start() sequence The sequence of cancel and start is inefficient. It has to do the timer lock/unlock twice and in the worst case has to reprogram the underlying clock event device twice. The reason why it is done this way is the usage of hrtimer_forward_now(), which requires the timer to be inactive. But that can be completely avoided as the forward can be done on a variable and does not need any of the overrun accounting provided by hrtimer_forward_now(). Implement a trivial forwarding mechanism and replace the cancel/reprogram sequence with hrtimer_start(..., new_expiry). For the non high resolution case the timer is not actually armed, but used for storage so that code checking for expiry times can unconditially look it up in the timer. So it is safe for that case to set the new expiry time directly. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.542178086@kernel.org	2026-02-27 16:40:06 +01:00
Peter Zijlstra	0abec32a68	sched/hrtick: Mark hrtick timer LAZY_REARM The hrtick timer is frequently rearmed before expiry and most of the time the new expiry is past the armed one. As this happens on every context switch it becomes expensive with scheduling heavy work loads especially in virtual machines as the "hardware" reprogamming implies a VM exit. hrtimer now provide a lazy rearm mode flag which skips the reprogamming if: 1) The timer was the first expiring timer before the rearm 2) The new expiry time is farther out than the armed time This avoids a massive amount of reprogramming operations of the hrtick timer for the price of eventually taking the alredy armed interrupt for nothing. Mark the hrtick timer accordingly. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.475409346@kernel.org	2026-02-27 16:40:06 +01:00
Peter Zijlstra	b7dd64778a	hrtimer: Provide LAZY_REARM mode The hrtick timer is frequently rearmed before expiry and most of the time the new expiry is past the armed one. As this happens on every context switch it becomes expensive with scheduling heavy work loads especially in virtual machines as the "hardware" reprogamming implies a VM exit. Add a lazy rearm mode flag which skips the reprogamming if: 1) The timer was the first expiring timer before the rearm 2) The new expiry time is farther out than the armed time This avoids a massive amount of reprogramming operations of the hrtick timer for the price of eventually taking the alredy armed interrupt for nothing. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.408524456@kernel.org	2026-02-27 16:40:06 +01:00
Thomas Gleixner	c8cdb9b516	sched/hrtick: Avoid tiny hrtick rearms Tiny adjustments to the hrtick expiry time below 5 microseconds are just causing extra work for no real value. Filter them out when restarting the hrtick. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.340593047@kernel.org	2026-02-27 16:40:05 +01:00
Thomas Gleixner	96d1610e0b	sched: Optimize hrtimer handling schedule() provides several mechanisms to update the hrtick timer: 1) When the next task is picked 2) When the balance callbacks are invoked before rq::lock is released Each of them can result in a first expiring timer and cause a reprogram of the clock event device. Solve this by deferring the rearm to the end of schedule() right before releasing rq::lock by setting a flag on entry which tells hrtick_start() to cache the runtime constraint in rq::hrtick_delay without touching the timer itself. Right before releasing rq::lock evaluate the flags and either rearm or cancel the hrtick timer. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.273068659@kernel.org	2026-02-27 16:40:05 +01:00
Thomas Gleixner	c3a92213eb	sched: Use hrtimer_highres_enabled() Use the static branch based variant and thereby avoid following three pointers. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.203610956@kernel.org	2026-02-27 16:40:05 +01:00
Thomas Gleixner	0a93d30861	hrtimer: Provide a static branch based hrtimer_hres_enabled() The scheduler evaluates this via hrtimer_is_hres_active() every time it has to update HRTICK. This needs to follow three pointers, which is expensive. Provide a static branch based mechanism to avoid that. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.136503358@kernel.org	2026-02-27 16:40:04 +01:00
Peter Zijlstra	d19ff16c11	hrtimer: Avoid pointless reprogramming in __hrtimer_start_range_ns() Much like hrtimer_reprogram(), skip programming if the cpu_base is running the hrtimer interrupt. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Reviewed-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260224163429.069535561@kernel.org	2026-02-27 16:40:04 +01:00
Thomas Gleixner	d70c1080a9	sched: Avoid ktime_get() indirection The clock of the hrtick and deadline timers is known to be CLOCK_MONOTONIC. No point in looking it up via hrtimer_cb_get_time(). Just use ktime_get() directly. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.001511662@kernel.org	2026-02-27 16:40:04 +01:00
Peter Zijlstra (Intel)	5d88e424ec	sched/fair: Make hrtick resched hard Since the tick causes hard preemption, the hrtick should too. Letting the hrtick do lazy preemption completely defeats the purpose, since it will then still be delayed until a old tick and be dependent on CONFIG_HZ. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163428.933894105@kernel.org	2026-02-27 16:40:04 +01:00
Peter Zijlstra (Intel)	9701537664	sched/fair: Simplify hrtick_update() hrtick_update() was needed when the slice depended on nr_running, all that code is gone. All that remains is starting the hrtick when nr_running becomes more than 1. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163428.866374835@kernel.org	2026-02-27 16:40:03 +01:00
Peter Zijlstra	558c18d3fb	sched/eevdf: Fix HRTICK duration The nominal duration for an EEVDF task to run is until its deadline. At which point the deadline is moved ahead and a new task selection is done. Try and predict the time 'lost' to higher scheduling classes. Since this is an estimate, the timer can be both early or late. In case it is early task_tick_fair() will take the !need_resched() path and restarts the timer. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Link: https://patch.msgid.link/20260224163428.798198874@kernel.org	2026-02-27 16:40:03 +01:00
Linus Torvalds	6de23f81a5	Linux 7.0-rc1 v7.0-rc1	2026-02-22 13:18:59 -08:00
Linus Torvalds	fbf3380361	Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux Pull fsverity fixes from Eric Biggers: - Fix a build error on parisc - Remove the non-large-folio-aware function fsverity_verify_page() * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux: fsverity: fix build error by adding fsverity_readahead() stub fsverity: remove fsverity_verify_page() f2fs: make f2fs_verify_cluster() partially large-folio-aware f2fs: remove unnecessary ClearPageUptodate in f2fs_verify_cluster()	2026-02-22 13:12:04 -08:00
Linus Torvalds	75e1f66a9e	Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library fix from Eric Biggers: "Fix a big endian specific issue in the PPC64-optimized AES code" * tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: lib/crypto: powerpc/aes: Fix rndkey_from_vsx() on big endian CPUs	2026-02-22 13:09:33 -08:00
Mark Brown	aaf96df959	CREDITS: Add -next to Stephen Rothwell's entry Stephen retired and stepped back from -next maintainership, update his entry in CREDITS to recognise his 18 years of hard work making it what it is today and all the impact it's had on our development process. Also update to his current GnuPG key while we're here. Acked-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: SeongJae Park <sj@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-22 12:11:33 -08:00
Arnd Bergmann	746b9ef5d5	x509: select CONFIG_CRYPTO_LIB_SHA256 The x509 public key code gained a dependency on the sha256 hash implementation, causing a rare link time failure in randconfig builds: arm-linux-gnueabi-ld: crypto/asymmetric_keys/x509_public_key.o: in function `x509_get_sig_params': x509_public_key.c:(.text.x509_get_sig_params+0x12): undefined reference to `sha256' arm-linux-gnueabi-ld: (sha256): Unknown destination type (ARM/Thumb) in crypto/asymmetric_keys/x509_public_key.o x509_public_key.c:(.text.x509_get_sig_params+0x12): dangerous relocation: unsupported relocation Select the necessary library code from Kconfig. Fixes: `2c62068ac8` ("x509: Separately calculate sha256 for blacklist") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-22 12:09:23 -08:00
Haiyue Wang	fd1d6b9d13	xz: fix arm fdt compile error for kmalloc replacement Align to the commit `bf4afc53b7` ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument") update the 'kmalloc_obj' declaration for userspace to fix below compile error: In file included from arch/arm/boot/compressed/../../../../lib/decompress_unxz.c:241, from arch/arm/boot/compressed/decompress.c:56: arch/arm/boot/compressed/../../../../lib/xz/xz_dec_stream.c: In function 'xz_dec_init': arch/arm/boot/compressed/../../../../lib/xz/xz_dec_stream.c:787:28: error: implicit declaration of function 'kmalloc_obj'; did you mean 'kmalloc'? [-Wimplicit-function-declaration] 787 \| struct xz_dec s = kmalloc_obj(s); \| ^~~~~~~~~~~ \| kmalloc Signed-off-by: Haiyue Wang <haiyuewa@163.com> Fixes: `69050f8d6d` ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types") Fixes: `bf4afc53b7` ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument") Reviewed-by: Kees Cook <kees@kernel.org> Acked-by: Lasse Collin <lasse.collin@tukaani.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-22 12:05:31 -08:00
Linus Torvalds	5f2eac7767	Merge tag 'rtc-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC updates from Alexandre Belloni: - loongson: Loongson-2K0300 support - s35390a: nvmem support - zynqmp: rework calibration * tag 'rtc-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: rtc: ds1390: fix number of bytes read from RTC rtc: class: Remove duplicate check for alarm rtc: optee: simplify OP-TEE context match rtc: interface: Alarm race handling should not discard preceding error rtc: s35390a: implement nvmem support rtc: loongson: Add Loongson-2K0300 support dt-bindings: rtc: loongson: Document Loongson-2K0300 compatible dt-bindings: rtc: loongson: Correct Loongson-1C interrupts property dt-bindings: rtc: renesas,rz-rtca3: Add RZ/V2N support dt-bindings: rtc: cpcap: convert to schema rtc: zynqmp: use dynamic max and min offset ranges rtc: zynqmp: rework set_offset rtc: zynqmp: rework read_offset rtc: zynqmp: check calibration max value rtc: zynqmp: correct frequency value rtc: amlogic-a4: Remove IRQF_ONESHOT rtc: pcf8563: use correct of_node for output clock rtc: max31335: use correct CONFIG symbol in IS_REACHABLE() rtc: nvvrs: Add ARCH_TEGRA to the NV VRS RTC driver	2026-02-22 09:43:11 -08:00
Linus Torvalds	1dd419145d	Merge tag 'rust-fixes-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux Pull rust fixes from Miguel Ojeda: "Toolchain and infrastructure: - Pass '-Zunstable-options' flag required by the future Rust 1.95.0 - Fix 'objtool' warning for Rust 1.84.0 'kernel' crate: - 'irq' module: add missing bound detected by the future Rust 1.95.0 - 'list' module: add missing 'unsafe' blocks and placeholder safety comments to macros (an issue for future callers within the crate) 'pin-init' crate: - Clean Clippy warning that changed behavior in the future Rust 1.95.0" * tag 'rust-fixes-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: rust: list: Add unsafe blocks for container_of and safety comments rust: pin-init: replace clippy `expect` with `allow` rust: irq: add `'static` bounds to irq callbacks objtool/rust: add one more `noreturn` Rust function rust: kbuild: pass `-Zunstable-options` for Rust 1.95.0	2026-02-22 08:43:31 -08:00
Linus Torvalds	d2ba6e9c0a	Merge tag 'trace-rv-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull runtime verifier fix from Steven Rostedt: - Fix multiple definition of __pcpu_unique_da_mon_this After refactoring monitors, we used static per-cpu variables with the same names across different per-cpu monitors. This is explicitly disallowed for modules on some architectures (alpha) or if CONFIG_DEBUG_FORCE_WEAK_PER_CPU is enabled (e.g. Fedora's debug kernel). Make sure all those variables have different names to avoid compilation issues. * tag 'trace-rv-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rv: Fix multiple definition of __pcpu_unique_da_mon_this	2026-02-22 08:40:13 -08:00
Kees Cook	189f164e57	Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-22 08:26:33 -08:00
Linus Torvalds	32a92f8c89	Convert more 'alloc_obj' cases to default GFP_KERNEL arguments This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 20:03:00 -08:00
Linus Torvalds	323bbfcf1e	Convert 'alloc_flex' family to use the new default GFP_KERNEL argument This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Linus Torvalds	e19e1b480a	add default_gfp() helper macro and use it in the new alloc_obj() helpers Most simple allocations use GFP_KERNEL, and with the new allocation helpers being introduced, let's just take advantage of that to simplify that default case. It's a numbers game: git grep 'alloc_obj(' \| sed 's/.$GFP_[_A-Z]$./\1/' \| sort \| uniq -c \| sort -n \| tail shows that about 90% of all those new allocator instances just use that standard GFP_KERNEL. Those helpers are already macros, and we can easily just make it be the default case when the gfp argument is missing. And yes, we could do that for all the legacy interfaces too, but let's keep it to just the new ones at least for now, since those all got converted recently anyway, so this is not any "extra" noise outside of that limited conversion. And, in fact, I want to do this before doing the -rc1 release, exactly so that we don't get extra merge conflicts. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:50 -08:00
Linus Torvalds	fa5c82f4d2	slab.h: disable completely broken overflow handling in flex allocations Commit `69050f8d6d` ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types") started using the new allocation helpers, and in the process showed that they were completely non-working. The overflow logic in overflows_flex_counter_type() is completely the wrong way around, and that broke __alloc_flex() completely. By chance, the resulting code was then such a mess that clang generated sufficiently garbage code that objtool warned about it all. Which made it somewhat quicker to narrow things down. While fixing overflows_flex_counter_type() would presumably fix this all, I'm excising the whole broken overflow logic from __alloc_flex(), because we don't want that kind of code in basic allocation functions anyway. That (no longer) broken overflows_flex_counter_type() thing needs to be inserted into the actual __set_flex_counter() logic in the unlikely case that we ever want this at all. And made conditional. Fixes: `81cee9166a` ("compiler_types: Introduce __flex_counter() and family") Fixes: `69050f8d6d` ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types") Cc: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/all/CAHk-=whEd020BYzGTzYrENjD9Z5_82xx6h8HsQvH5xDSnv0=Hw@mail.gmail.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 15:12:09 -08:00

1 2 3 4 5 ...

1426932 Commits