During suspend/resume usecases and tests, it is common to see issues
such as lockups either in suspend path or resume path because of the
bugs in the corresponding device driver pm handling code. In such cases,
it is important that watchdog is active to make sure that we either
receive a watchdog pretimeout notification or a bite causing reset
instead of a hang causing us to hard reset the machine.
There are good reasons as to why we need this because:
* We can have a watchdog pretimeout governor set to panic in which
case we can have a backtrace which would help identify the issue
with the particular driver and cause a normal reboot.
* Even in case where there is no pretimeout support, a watchdog
bite is still useful because some firmware has debug support to dump
CPU core context on watchdog bite for post-mortem analysis.
* One more usecase which comes to mind is of warm reboot. In case we
hard reset the target, a cold reboot could be induced resulting in
lose of ddr contents thereby losing all the debug info.
Currently, the watchdog pm callback just invokes the usual suspend
and resume callback which do not have any special ordering in the
sense that a watchdog can be suspended before the buggy device driver
suspend callback and watchdog resume can happen after the buggy device
driver resume callback. This would mean that the watchdog will not be
active when the buggy driver cause the lockups thereby hanging the
system. So to make sure this doesn't happen, move the watchdog pm to
use late/early system pm callbacks which will ensure that the watchdog
is suspended late and resumed early so that it can catch such issues.
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20210310202004.1436-1-saiprakash.ranjan@codeaurora.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
Use the bark interrupt as the pretimeout notifier if available.
When the watchdog timer expires in dual mode, an interrupt will be
triggered first, then the timing restarts. The reset signal will be
initiated when the timer expires again.
The pretimeout notification shall occur at timeout-sec/2.
V2:
- panic() by default if WATCHDOG_PRETIMEOUT_GOV is not enabled.
V3:
- Modify the pretimeout behavior, manually reset after the pretimeout
- is processed and wait until timeout.
V4:
- Remove pretimeout related processing.
- Add dual mode control separately.
V5:
- Fix some formatting and printing problems.
V6:
- Realize pretimeout processing through dualmode.
V7:
- Add set_pretimeout().
V8/V9:
- Fix some formatting problems.
Signed-off-by: Wang Qing <wangqing@vivo.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/1619315527-8171-2-git-send-email-wangqing@vivo.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
The command 'find drivers/watchdog | xargs ./scripts/kernel-doc -none'
reports a number of kernel-doc warnings in the watchdog subsystem.
Address the kernel-doc warnings that were purely syntactic issues with
kernel-doc comments.
The remaining kernel-doc warnings are of type "Excess function parameter"
and "Function parameter or member not described". These warnings would
need to be addressed in a second pass with a bit more insight into the
APIs and purpose of the functions in the watchdog subsystem.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20210322065337.617-1-lukas.bulwahn@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
This driver's remove path calls del_timer(). However, that function
does not wait until the timer handler finishes. This means that the
timer handler may still be running after the driver's remove function
has finished, which would result in a use-after-free.
Fix by calling del_timer_sync(), which makes sure the timer handler
has finished, and unable to re-schedule itself.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Vladimir Zapolskiy <vz@mleia.com>
Link: https://lore.kernel.org/r/1620802676-19701-1-git-send-email-zou_wei@huawei.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
This module's remove path calls del_timer(). However, that function
does not wait until the timer handler finishes. This means that the
timer handler may still be running after the driver's remove function
has finished, which would result in a use-after-free.
Fix by calling del_timer_sync(), which makes sure the timer handler
has finished, and unable to re-schedule itself.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/1620716691-108460-1-git-send-email-zou_wei@huawei.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
This module's remove path calls del_timer(). However, that function
does not wait until the timer handler finishes. This means that the
timer handler may still be running after the driver's remove function
has finished, which would result in a use-after-free.
Fix by calling del_timer_sync(), which makes sure the timer handler
has finished, and unable to re-schedule itself.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/1620716495-108352-1-git-send-email-zou_wei@huawei.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
The mtx-1_wdt driver does not need the au1000.h header file.
Instead, the header file causes build errors, so drop it.
This change fixes multiple build errors, all in au1000.h. E.g.:
In file included from ../drivers/watchdog/mtx-1_wdt.c:44:
../arch/mips/include/asm/mach-au1x00/au1000.h: In function 'alchemy_rdsys':
../arch/mips/include/asm/mach-au1x00/au1000.h:603:36: error: implicit declaration of function 'KSEG1ADDR'; did you mean 'CKSEG1ADDR'? [-Werror=implicit-function-declaration]
603 | void __iomem *b = (void __iomem *)KSEG1ADDR(AU1000_SYS_PHYS_ADDR);
../arch/mips/include/asm/mach-au1x00/au1000.h:603:20: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
603 | void __iomem *b = (void __iomem *)KSEG1ADDR(AU1000_SYS_PHYS_ADDR);
Fixes: da2a68b3eb ("watchdog: Enable COMPILE_TEST where possible")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: linux-watchdog@vger.kernel.org
Cc: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210516211703.25349-1-rdunlap@infradead.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
The pretimeout register has a default reset value. Hence
when a smaller WDT timeout is set which would be lesser than the
default pretimeout, the system behaves abnormally, starts
triggering the pretimeout interrupt even when the WDT is
not enabled, most of the times leading to system crash.
Hence an update in the pre-timeout is also required for the
default timeout that is being configured.
Fixes: fa0f8d51e9 ("watchdog: Add watchdog driver for Intel Keembay Soc")
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Kris Pan <kris.pan@intel.com>
Signed-off-by: Shruthi Sanil <shruthi.sanil@intel.com>
Link: https://lore.kernel.org/r/20210517174953.19404-2-shruthi.sanil@intel.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
Pull perf fixes from Thomas Gleixner:
"Two perf fixes:
- Do not check the LBR_TOS MSR when setting up unrelated LBR MSRs as
this can cause malfunction when TOS is not supported
- Allocate the LBR XSAVE buffers along with the DS buffers upfront
because allocating them when adding an event can deadlock"
* tag 'perf-urgent-2021-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/lbr: Remove cpuc->lbr_xsave allocation from atomic context
perf/x86: Avoid touching LBR_TOS MSR for Arch LBR
Pull locking fixes from Thomas Gleixner:
"Two locking fixes:
- Invoke the lockdep tracepoints in the correct place so the ordering
is correct again
- Don't leave the mutex WAITER bit stale when the last waiter is
dropping out early due to a signal as that forces all subsequent
lock operations needlessly into the slowpath until it's cleaned up
again"
* tag 'locking-urgent-2021-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/mutex: clear MUTEX_FLAGS if wait_list is empty due to signal
locking/lockdep: Correct calling tracepoints
Pull irq fixes from Thomas Gleixner:
"A few fixes for irqchip drivers:
- Allocate interrupt descriptors correctly on Mainstone PXA when
SPARSE_IRQ is enabled; otherwise the interrupt association fails
- Make the APPLE AIC chip driver depend on APPLE
- Remove redundant error output on devm_ioremap_resource() failure"
* tag 'irq-urgent-2021-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: Remove redundant error printing
irqchip/apple-aic: APPLE_AIC should depend on ARCH_APPLE
ARM: PXA: Fix cplds irqdesc allocation when using legacy mode
Pull x86 fixes from Borislav Petkov:
- Fix how SEV handles MMIO accesses by forwarding potential page faults
instead of killing the machine and by using the accessors with the
exact functionality needed when accessing memory.
- Fix a confusion with Clang LTO compiler switches passed to the it
- Handle the case gracefully when VMGEXIT has been executed in
userspace
* tag 'x86_urgent_for_v5.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev-es: Use __put_user()/__get_user() for data accesses
x86/sev-es: Forward page-faults which happen during emulation
x86/sev-es: Don't return NULL from sev_es_get_ghcb()
x86/build: Fix location of '-plugin-opt=' flags
x86/sev-es: Invalidate the GHCB after completing VMGEXIT
x86/sev-es: Move sev_es_put_ghcb() in prep for follow on patch
Pull powerpc fixes from Michael Ellerman:
- Fix breakage of strace (and other ptracers etc.) when using the new
scv ABI (Power9 or later with glibc >= 2.33).
- Fix early_ioremap() on 64-bit, which broke booting on some machines.
Thanks to Dmitry V. Levin, Nicholas Piggin, Alexey Kardashevskiy, and
Christophe Leroy.
* tag 'powerpc-5.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s/syscall: Fix ptrace syscall info with scv syscalls
powerpc/64s/syscall: Use pt_regs.trap to distinguish syscall ABI difference between sc and scv syscalls
powerpc: Fix early setup to make early_ioremap() work
Pull Kbuild fixes from Masahiro Yamada:
- Fix short log indentation for tools builds
- Fix dummy-tools to adjust to the latest stackprotector check
* tag 'kbuild-fixes-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: dummy-tools: adjust to stricter stackprotector check
scripts/jobserver-exec: Fix a typo ("envirnoment")
tools build: Fix quiet cmd indentation
Merge misc fixes from Andrew Morton:
"10 patches.
Subsystems affected by this patch series: mm (pagealloc, gup, kasan,
and userfaultfd), ipc, selftests, watchdog, bitmap, procfs, and lib"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
userfaultfd: hugetlbfs: fix new flag usage in error path
lib: kunit: suppress a compilation warning of frame size
proc: remove Alexey from MAINTAINERS
linux/bits.h: fix compilation error with GENMASK
watchdog: reliable handling of timestamps
kasan: slab: always reset the tag in get_freepointer_safe()
tools/testing/selftests/exec: fix link error
ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry
Revert "mm/gup: check page posion status for coredump."
mm/shuffle: fix section mismatch warning