Pull power management updates from Rafael Wysocki:
"There are quite a few interesting things here, including new hardware
support, new features, some bug fixes and documentation updates. In
addition, there is the usual bunch of minor fixes and cleanups all
over.
In the new hardware support category, there are intel_pstate and
intel_rapl driver updates to support new processors, Panther Lake,
Wildcat Lake, Nova Lake, and Diamond Rapids in the OOB mode, OPP and
bandwidth allocation support in the tegra186 cpufreq driver, and
JH7110S SoC support in dt-platdev cpufreq.
The new features are the PM QoS CPU latency limit for suspend-to-idle,
the netlink support for the energy model management, support for
terminating system suspend via a wakeup event during the sync of file
systems, configurable number of hibernation compression threads, the
runtime PM auto-cleanup macros, and the "poweroff" PM event that is
expected to be used during system shutdown.
Bugs are mostly fixed in cpuidle governors, but there are also fixes
elsewhere, like in the amd-pstate cpufreq driver.
Documentation updates include, but are not limited to, a new doc on
debugging shutdown hangs, cross-referencing fixes and cleanups in the
intel_pstate documentation, and updates of comments in the core
hibernation code.
Specifics:
- Introduce and document a QoS limit on CPU exit latency during
wakeup from suspend-to-idle (Ulf Hansson); see the sketch after
this list
- Add support for building libcpupower statically (Zuo An)
- Add support for sending netlink notifications to user space on
energy model updates (Changwoo Min, Peng Fan)
- Minor improvements to the Rust OPP interface (Tamir Duberstein)
- Fixes to scope-based pointers in the OPP library (Viresh Kumar)
- Use residency threshold in polling state override decisions in the
menu cpuidle governor (Aboorva Devarajan)
- Add sanity check for exit latency and target residency in the
cpuidle core (Rafael Wysocki)
- Use this_cpu_ptr() where possible in the teo governor (Christian
Loehle)
- Rework the handling of tick wakeups in the teo cpuidle governor to
increase the likelihood of stopping the scheduler tick in the cases
when tick wakeups can be counted as non-timer ones (Rafael Wysocki)
- Fix a reverse condition in the teo cpuidle governor and drop a
misguided target residency check from it (Rafael Wysocki)
- Clean up multiple minor defects in the teo cpuidle governor (Rafael
Wysocki)
- Update header inclusion to make it follow the Include What You Use
principle (Andy Shevchenko)
- Enable MSR-based RAPL PMU support in the intel_rapl power capping
driver and arrange for using it on the Panther Lake and Wildcat
Lake processors (Kuppuswamy Sathyanarayanan)
- Add support for Nova Lake and Wildcat Lake processors to the
intel_rapl power capping driver (Kaushlendra Kumar, Srinivas
Pandruvada)
- Add OPP and bandwidth support for Tegra186 (Aaron Kling)
- Optimizations for parameter array handling in the amd-pstate
cpufreq driver (Mario Limonciello)
- Fix for mode changes with offline CPUs in the amd-pstate cpufreq
driver (Gautham Shenoy)
- Preserve freq_table_sorted across suspend/hibernate in the cpufreq
core (Zihuan Zhang)
- Adjust energy model rules for Intel hybrid platforms in the
intel_pstate cpufreq driver and improve printing of debug messages
in it (Rafael Wysocki)
- Replace deprecated strcpy() in cpufreq_unregister_governor()
(Thorsten Blum)
- Fix duplicate hyperlink target errors in the intel_pstate cpufreq
driver documentation and use :ref: directive for internal linking
in it (Swaraj Gaikwad, Bagas Sanjaya)
- Add Diamond Rapids OOB mode support to the intel_pstate cpufreq
driver (Kuppuswamy Sathyanarayanan)
- Use mutex guard for driver locking in the intel_pstate driver and
eliminate some code duplication from it (Rafael Wysocki)
- Replace udelay() with usleep_range() in ACPI cpufreq (Kaushlendra
Kumar)
- Minor improvements to various cpufreq drivers (Christian Marangi,
Hal Feng, Jie Zhan, Marco Crivellari, Miaoqian Lin, and Shuhao Fu)
- Replace snprintf() with scnprintf() in show_trace_dev_match()
(Kaushlendra Kumar)
- Fix memory allocation error handling in pm_vt_switch_required()
(Malaya Kumar Rout)
- Introduce CALL_PM_OP() macro and use it to simplify code in generic
PM operations (Kaushlendra Kumar)
- Add module param to backtrace all CPUs in the device power
management watchdog (Sergey Senozhatsky)
- Rework message printing in swsusp_save() (Rafael Wysocki)
- Make it possible to change the number of hibernation compression
threads (Xueqin Luo)
- Clarify that only cgroup1 freezer uses PM freezer (Tejun Heo)
- Add document on debugging shutdown hangs to PM documentation and
correct a mistaken configuration option in it (Mario Limonciello)
- Shut down wakeup source timer before removing the wakeup source
from the list (Kaushlendra Kumar, Rafael Wysocki)
- Introduce new PMSG_POWEROFF event for system shutdown handling with
the help of PM device callbacks (Mario Limonciello)
- Make pm_test delay interruptible by wakeup events (Riwen Lu)
- Clean up kernel-doc comment style usage in the core hibernation
code and remove useless comments from it (Sunday Adelodun, Rafael
Wysocki)
- Add support for handling wakeup events and aborting the suspend
process while it is syncing file systems (Samuel Wu, Rafael
Wysocki)
- Add WQ_UNBOUND to pm_wq workqueue (Marco Crivellari)
- Add runtime PM wrapper macros for ACQUIRE()/ACQUIRE_ERR() and use
them in the PCI core and the ACPI TAD driver (Rafael Wysocki)
- Improve runtime PM in the ACPI TAD driver (Rafael Wysocki)
- Update pm_runtime_allow/forbid() documentation (Rafael Wysocki)
- Fix typos in runtime.c comments (Malaya Kumar Rout)
- Move governor.h from devfreq under include/linux/ and rename it to
devfreq-governor.h so that devfreq governors can be defined outside
of drivers/devfreq/ (Dmitry Baryshkov)
- Use min() to improve readability in tegra30-devfreq.c (Thorsten
Blum)
- Fix potential use-after-free issue of OPP handling in
hisi_uncore_freq.c (Pengjie Zhang)
- Fix typo in DFSO_DOWNDIFFERENTIAL macro name in
governor_simpleondemand.c in devfreq (Riwen Lu)"
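To put the first item of the list above in context: the suspend-to-idle branch of cpuidle_idle_call() in the idle-loop code quoted at the end of this page consumes the new limit roughly as follows (a simplified excerpt of that code; the forced-idle-latency branch and the regular governor path are omitted):

    if (idle_should_enter_s2idle()) {
            /* The QoS limit is in microseconds; cpuidle works in nanoseconds. */
            u64 max_latency_ns = cpu_wakeup_latency_qos_limit() * NSEC_PER_USEC;

            /* Try the dedicated s2idle enter path first. */
            if (call_cpuidle_s2idle(drv, dev, max_latency_ns) > 0)
                    goto exit_idle;

            /* Otherwise pick the deepest state within the latency limit. */
            tick_nohz_idle_stop_tick();
            call_cpuidle(drv, dev,
                         cpuidle_find_deepest_state(drv, dev, max_latency_ns));
    }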
* tag 'pm-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (96 commits)
PM / devfreq: Fix typo in DFSO_DOWNDIFFERENTIAL macro name
cpuidle: Warn instead of bailing out if target residency check fails
cpuidle: Update header inclusion
Documentation: power/cpuidle: Document the CPU system wakeup latency QoS
cpuidle: Respect the CPU system wakeup QoS limit for cpuidle
sched: idle: Respect the CPU system wakeup QoS limit for s2idle
pmdomain: Respect the CPU system wakeup QoS limit for cpuidle
pmdomain: Respect the CPU system wakeup QoS limit for s2idle
PM: QoS: Introduce a CPU system wakeup QoS limit
cpuidle: governors: teo: Add missing space to the description
PM: hibernate: Extra cleanup of comments in swap handling code
PM / devfreq: tegra30: use min to simplify actmon_cpu_to_emc_rate
PM / devfreq: hisi: Fix potential UAF in OPP handling
PM / devfreq: Move governor.h to a public header location
powercap: intel_rapl: Enable MSR-based RAPL PMU support
powercap: intel_rapl: Prepare read_raw() interface for atomic-context callers
cpufreq: qcom-nvmem: fix compilation warning for qcom_cpufreq_ipq806x_match_list
PM: sleep: Call pm_sleep_fs_sync() instead of ksys_sync_helper()
PM: sleep: Add support for wakeup during filesystem sync
cpufreq: ACPI: Replace udelay() with usleep_range()
...
563 lines · 14 KiB · C
// SPDX-License-Identifier: GPL-2.0-only
/*
 * Generic entry points for the idle threads and
 * implementation of the idle task scheduling class.
 *
 * (NOTE: these are not related to SCHED_IDLE batch scheduled
 *        tasks which are handled in sched/fair.c )
 */
#include <linux/cpuidle.h>
#include <linux/suspend.h>
#include <linux/livepatch.h>
#include "sched.h"
#include "smp.h"

/* Linker adds these: start and end of __cpuidle functions */
extern char __cpuidle_text_start[], __cpuidle_text_end[];

/**
 * sched_idle_set_state - Record idle state for the current CPU.
 * @idle_state: State to record.
 */
void sched_idle_set_state(struct cpuidle_state *idle_state)
{
        idle_set_state(this_rq(), idle_state);
}

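/*
 * When non-zero, do_idle() below busy-polls in cpu_idle_poll() instead of
 * entering a low-power idle state. Adjusted by cpu_idle_poll_ctrl() and by
 * the "nohlt"/"hlt" boot parameters.
 */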
static int __read_mostly cpu_idle_force_poll;

void cpu_idle_poll_ctrl(bool enable)
{
        if (enable) {
                cpu_idle_force_poll++;
        } else {
                cpu_idle_force_poll--;
                WARN_ON_ONCE(cpu_idle_force_poll < 0);
        }
}

#ifdef CONFIG_GENERIC_IDLE_POLL_SETUP
static int __init cpu_idle_poll_setup(char *__unused)
{
        cpu_idle_force_poll = 1;

        return 1;
}
__setup("nohlt", cpu_idle_poll_setup);

static int __init cpu_idle_nopoll_setup(char *__unused)
{
        cpu_idle_force_poll = 0;

        return 1;
}
__setup("hlt", cpu_idle_nopoll_setup);
#endif /* CONFIG_GENERIC_IDLE_POLL_SETUP */

static noinline int __cpuidle cpu_idle_poll(void)
{
        instrumentation_begin();
        trace_cpu_idle(0, smp_processor_id());
        stop_critical_timings();
        ct_cpuidle_enter();

        raw_local_irq_enable();
        while (!tif_need_resched() &&
               (cpu_idle_force_poll || tick_check_broadcast_expired()))
                cpu_relax();
        raw_local_irq_disable();

        ct_cpuidle_exit();
        start_critical_timings();
        trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
        local_irq_enable();
        instrumentation_end();

        return 1;
}

/* Weak implementations for optional arch specific functions */
void __weak arch_cpu_idle_prepare(void) { }
void __weak arch_cpu_idle_enter(void) { }
void __weak arch_cpu_idle_exit(void) { }
void __weak __noreturn arch_cpu_idle_dead(void) { while (1); }
void __weak arch_cpu_idle(void)
{
        cpu_idle_force_poll = 1;
}

#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST_IDLE
DEFINE_STATIC_KEY_FALSE(arch_needs_tick_broadcast);

static inline void cond_tick_broadcast_enter(void)
{
        if (static_branch_unlikely(&arch_needs_tick_broadcast))
                tick_broadcast_enter();
}

static inline void cond_tick_broadcast_exit(void)
{
        if (static_branch_unlikely(&arch_needs_tick_broadcast))
                tick_broadcast_exit();
}
#else /* !CONFIG_GENERIC_CLOCKEVENTS_BROADCAST_IDLE: */
static inline void cond_tick_broadcast_enter(void) { }
static inline void cond_tick_broadcast_exit(void) { }
#endif /* !CONFIG_GENERIC_CLOCKEVENTS_BROADCAST_IDLE */

/**
 * default_idle_call - Default CPU idle routine.
 *
 * To use when the cpuidle framework cannot be used.
 */
void __cpuidle default_idle_call(void)
{
        instrumentation_begin();
        if (!current_clr_polling_and_test()) {
                cond_tick_broadcast_enter();
                trace_cpu_idle(1, smp_processor_id());
                stop_critical_timings();

                ct_cpuidle_enter();
                arch_cpu_idle();
                ct_cpuidle_exit();

                start_critical_timings();
                trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
                cond_tick_broadcast_exit();
        }
        local_irq_enable();
        instrumentation_end();
}

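/*
 * Enter a suspend-to-idle state, unless a reschedule is already pending;
 * @max_latency_ns bounds the exit latency of the state that may be used.
 */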
static int call_cpuidle_s2idle(struct cpuidle_driver *drv,
                               struct cpuidle_device *dev,
                               u64 max_latency_ns)
{
        if (current_clr_polling_and_test())
                return -EBUSY;

        return cpuidle_enter_s2idle(drv, dev, max_latency_ns);
}

static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
                        int next_state)
{
        /*
         * The idle task must be rescheduled, so it is pointless to go idle;
         * just record a zero idle residency and return.
         */
        if (current_clr_polling_and_test()) {
                dev->last_residency_ns = 0;
                local_irq_enable();
                return -EBUSY;
        }

        /*
         * Enter the idle state previously returned by the governor decision.
         * This function will block until an interrupt occurs and will take
         * care of re-enabling the local interrupts.
         */
        return cpuidle_enter(drv, dev, next_state);
}

/**
 * cpuidle_idle_call - the main idle function
 *
 * NOTE: no locks or semaphores should be used here
 *
 * On architectures that support TIF_POLLING_NRFLAG, this function is called
 * with polling set, and it returns with polling set. If it ever stops
 * polling, it must clear the polling bit.
 */
static void cpuidle_idle_call(void)
{
        struct cpuidle_device *dev = cpuidle_get_device();
        struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
        int next_state, entered_state;

        /*
         * Check if the idle task must be rescheduled. If it is the
         * case, exit the function after re-enabling the local IRQ.
         */
        if (need_resched()) {
                local_irq_enable();
                return;
        }

        if (cpuidle_not_available(drv, dev)) {
                tick_nohz_idle_stop_tick();

                default_idle_call();
                goto exit_idle;
        }

        /*
         * Suspend-to-idle ("s2idle") is a system state in which all user space
         * has been frozen, all I/O devices have been suspended and the only
         * activity happens here and in interrupts (if any). In that case bypass
         * the cpuidle governor and go straight for the deepest idle state
         * available. Possibly also suspend the local tick and the entire
         * timekeeping to prevent timer interrupts from kicking us out of idle
         * until a proper wakeup interrupt happens.
         */

        if (idle_should_enter_s2idle() || dev->forced_idle_latency_limit_ns) {
                u64 max_latency_ns;

                if (idle_should_enter_s2idle()) {
                        max_latency_ns = cpu_wakeup_latency_qos_limit() *
                                         NSEC_PER_USEC;

                        entered_state = call_cpuidle_s2idle(drv, dev,
                                                            max_latency_ns);
                        if (entered_state > 0)
                                goto exit_idle;
                } else {
                        max_latency_ns = dev->forced_idle_latency_limit_ns;
                }

                tick_nohz_idle_stop_tick();

                next_state = cpuidle_find_deepest_state(drv, dev, max_latency_ns);
                call_cpuidle(drv, dev, next_state);
        } else {
                bool stop_tick = true;

                /*
                 * Ask the cpuidle framework to choose a convenient idle state.
                 */
                next_state = cpuidle_select(drv, dev, &stop_tick);

                if (stop_tick || tick_nohz_tick_stopped())
                        tick_nohz_idle_stop_tick();
                else
                        tick_nohz_idle_retain_tick();

                entered_state = call_cpuidle(drv, dev, next_state);
                /*
                 * Give the governor an opportunity to reflect on the outcome.
                 */
                cpuidle_reflect(dev, entered_state);
        }

exit_idle:
        __current_set_polling();

        /*
         * It is up to the idle functions to re-enable local interrupts.
         */
        if (WARN_ON_ONCE(irqs_disabled()))
                local_irq_enable();
}

/*
 * Generic idle loop implementation
 *
 * Called with polling cleared.
 */
static void do_idle(void)
{
        int cpu = smp_processor_id();

        /*
         * Check if we need to update blocked load.
         */
        nohz_run_idle_balance(cpu);

        /*
         * If the arch has a polling bit, we maintain an invariant:
         *
         * Our polling bit is clear if we're not scheduled (i.e. if rq->curr !=
         * rq->idle). This means that, if rq->idle has the polling bit set,
         * then setting need_resched is guaranteed to cause the CPU to
         * reschedule.
         */

        __current_set_polling();
        tick_nohz_idle_enter();

        while (!need_resched()) {

                /*
                 * Interrupts shouldn't be re-enabled from that point on until
                 * the CPU sleeping instruction is reached. Otherwise an interrupt
                 * may fire and queue a timer that would be ignored until the CPU
                 * wakes from the sleeping instruction. And testing need_resched()
                 * doesn't tell about pending needed timer reprogram.
                 *
                 * Several cases to consider:
                 *
                 * - SLEEP-UNTIL-PENDING-INTERRUPT based instructions such as
                 *   "wfi" or "mwait" are fine because they can be entered with
                 *   interrupts disabled.
                 *
                 * - sti;mwait() couple is fine because the interrupts are
                 *   re-enabled only upon the execution of mwait, leaving no gap
                 *   in-between.
                 *
                 * - ROLLBACK based idle handlers with the sleeping instruction
                 *   called with interrupts enabled are NOT fine. In this scheme
                 *   when the interrupt detects it has interrupted an idle handler,
                 *   it rolls back to its beginning which performs the
                 *   need_resched() check before re-executing the sleeping
                 *   instruction. This can leak a pending needed timer reprogram.
                 *   If such a scheme is really mandatory due to the lack of an
                 *   appropriate CPU sleeping instruction, then a FAST-FORWARD
                 *   must instead be applied: when the interrupt detects it has
                 *   interrupted an idle handler, it must resume to the end of
                 *   this idle handler so that the generic idle loop is iterated
                 *   again to reprogram the tick.
                 */
                local_irq_disable();

                if (cpu_is_offline(cpu)) {
                        cpuhp_report_idle_dead();
                        arch_cpu_idle_dead();
                }

                arch_cpu_idle_enter();
                rcu_nocb_flush_deferred_wakeup();

                /*
                 * In poll mode we re-enable interrupts and spin. Also if we
                 * detected in the wakeup from idle path that the tick
                 * broadcast device expired for us, we don't want to go deep
                 * idle as we know that the IPI is going to arrive right away.
                 */
                if (cpu_idle_force_poll || tick_check_broadcast_expired()) {
                        tick_nohz_idle_restart_tick();
                        cpu_idle_poll();
                } else {
                        cpuidle_idle_call();
                }
                arch_cpu_idle_exit();
        }

        /*
         * Since we fell out of the loop above, we know TIF_NEED_RESCHED must
         * be set, propagate it into PREEMPT_NEED_RESCHED.
         *
         * This is required because for polling idle loops we will not have had
         * an IPI to fold the state for us.
         */
        preempt_set_need_resched();
        tick_nohz_idle_exit();
        __current_clr_polling();

        /*
         * We promise to call sched_ttwu_pending() and reschedule if
         * need_resched() is set while polling is set. That means that clearing
         * polling needs to be visible before doing these things.
         */
        smp_mb__after_atomic();

        /*
         * RCU relies on this call to be done outside of an RCU read-side
         * critical section.
         */
        flush_smp_call_function_queue();
        schedule_idle();

        if (unlikely(klp_patch_pending(current)))
                klp_update_patch_state(current);
}

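/* Return true if @pc lies within the __cpuidle text section. */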
bool cpu_in_idle(unsigned long pc)
{
        return pc >= (unsigned long)__cpuidle_text_start &&
                pc < (unsigned long)__cpuidle_text_end;
}

struct idle_timer {
        struct hrtimer timer;
        int done;
};

static enum hrtimer_restart idle_inject_timer_fn(struct hrtimer *timer)
{
        struct idle_timer *it = container_of(timer, struct idle_timer, timer);

        WRITE_ONCE(it->done, 1);
        set_tsk_need_resched(current);

        return HRTIMER_NORESTART;
}

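/*
 * play_idle_precise - Run the idle loop on this CPU for a fixed period.
 * @duration_ns: Time to spend in the idle loop, in nanoseconds.
 * @latency_ns: Exit-latency limit passed to cpuidle_use_deepest_state().
 *
 * Expects to be called from a SCHED_FIFO kernel thread pinned to a single
 * CPU (see the WARN_ON_ONCE() checks below); a pinned hrtimer marks the end
 * of the injected idle period.
 */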
void play_idle_precise(u64 duration_ns, u64 latency_ns)
{
        struct idle_timer it;

        /*
         * Only FIFO tasks can disable the tick since they don't need the forced
         * preemption.
         */
        WARN_ON_ONCE(current->policy != SCHED_FIFO);
        WARN_ON_ONCE(current->nr_cpus_allowed != 1);
        WARN_ON_ONCE(!(current->flags & PF_KTHREAD));
        WARN_ON_ONCE(!(current->flags & PF_NO_SETAFFINITY));
        WARN_ON_ONCE(!duration_ns);
        WARN_ON_ONCE(current->mm);

        rcu_sleep_check();
        preempt_disable();
        current->flags |= PF_IDLE;
        cpuidle_use_deepest_state(latency_ns);

        it.done = 0;
        hrtimer_setup_on_stack(&it.timer, idle_inject_timer_fn, CLOCK_MONOTONIC,
                               HRTIMER_MODE_REL_HARD);
        hrtimer_start(&it.timer, ns_to_ktime(duration_ns),
                      HRTIMER_MODE_REL_PINNED_HARD);

        while (!READ_ONCE(it.done))
                do_idle();

        cpuidle_use_deepest_state(0);
        current->flags &= ~PF_IDLE;

        preempt_fold_need_resched();
        preempt_enable();
}
EXPORT_SYMBOL_GPL(play_idle_precise);

void cpu_startup_entry(enum cpuhp_state state)
{
        current->flags |= PF_IDLE;
        arch_cpu_idle_prepare();
        cpuhp_online_idle(state);
        while (1)
                do_idle();
}

/*
 * idle-task scheduling class.
 */

static int
select_task_rq_idle(struct task_struct *p, int cpu, int flags)
{
        return task_cpu(p); /* IDLE tasks are never migrated */
}

static int
balance_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
{
        return WARN_ON_ONCE(1);
}

/*
 * Idle tasks are unconditionally rescheduled:
 */
static void wakeup_preempt_idle(struct rq *rq, struct task_struct *p, int flags)
{
        resched_curr(rq);
}

static void update_curr_idle(struct rq *rq);

static void put_prev_task_idle(struct rq *rq, struct task_struct *prev, struct task_struct *next)
{
        update_curr_idle(rq);
        scx_update_idle(rq, false, true);
}

static void set_next_task_idle(struct rq *rq, struct task_struct *next, bool first)
{
        update_idle_core(rq);
        scx_update_idle(rq, true, true);
        schedstat_inc(rq->sched_goidle);
        next->se.exec_start = rq_clock_task(rq);
}

struct task_struct *pick_task_idle(struct rq *rq, struct rq_flags *rf)
{
        scx_update_idle(rq, true, false);
        return rq->idle;
}

/*
 * It is not legal to sleep in the idle task - print a warning
 * message if some code attempts to do it:
 */
static bool
dequeue_task_idle(struct rq *rq, struct task_struct *p, int flags)
{
        raw_spin_rq_unlock_irq(rq);
        printk(KERN_ERR "bad: scheduling from the idle thread!\n");
        dump_stack();
        raw_spin_rq_lock_irq(rq);
        return true;
}

/*
 * scheduler tick hitting a task of our scheduling class.
 *
 * NOTE: This function can be called remotely by the tick offload that
 * goes along full dynticks. Therefore no local assumption can be made
 * and everything must be accessed through the @rq and @curr passed in
 * parameters.
 */
static void task_tick_idle(struct rq *rq, struct task_struct *curr, int queued)
{
        update_curr_idle(rq);
}

static void switching_to_idle(struct rq *rq, struct task_struct *p)
{
        BUG();
}

static void
prio_changed_idle(struct rq *rq, struct task_struct *p, u64 oldprio)
{
        if (p->prio == oldprio)
                return;

        BUG();
}

static void update_curr_idle(struct rq *rq)
{
        struct sched_entity *se = &rq->idle->se;
        u64 now = rq_clock_task(rq);
        s64 delta_exec;

        delta_exec = now - se->exec_start;
        if (unlikely(delta_exec <= 0))
                return;

        se->exec_start = now;

        dl_server_update_idle(&rq->fair_server, delta_exec);
}

/*
 * Simple, special scheduling class for the per-CPU idle tasks:
 */
DEFINE_SCHED_CLASS(idle) = {

        .queue_mask             = 0,

        /* no enqueue/yield_task for idle tasks */

        /* dequeue is not valid, we print a debug message there: */
        .dequeue_task           = dequeue_task_idle,

        .wakeup_preempt         = wakeup_preempt_idle,

        .pick_task              = pick_task_idle,
        .put_prev_task          = put_prev_task_idle,
        .set_next_task          = set_next_task_idle,

        .balance                = balance_idle,
        .select_task_rq         = select_task_rq_idle,
        .set_cpus_allowed       = set_cpus_allowed_common,

        .task_tick              = task_tick_idle,

        .prio_changed           = prio_changed_idle,
        .switching_to           = switching_to_idle,
        .update_curr            = update_curr_idle,
};