This commit takes care of the generated signature error CQE generated
by the HW (if happened). The underlying mlx5 driver will handle
signature error completions and will mark the relevant memory region
as dirty.
Once the consumer gets the completion for the transaction, it must
check for signature errors on signature memory region using a new
lightweight verb ib_check_mr_status().
In case the user doesn't check for signature error (i.e. doesn't call
ib_check_mr_status() with status check IB_MR_CHECK_SIG_STATUS), the
memory region cannot be used for another signature operation
(REG_SIG_MR work request will fail).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch implements IB_WR_REG_SIG_MR posted by the user.
Baisically this WR involves 3 WQEs in order to prepare and properly
register the signature layout:
1. post UMR WR to register the sig_mr in one of two possible ways:
* In case the user registered a single MR for data so the UMR data segment
consists of:
- single klm (data MR) passed by the user
- BSF with signature attributes requested by the user.
* In case the user registered 2 MRs, one for data and one for protection,
the UMR consists of:
- strided block format which includes data and protection MRs and
their repetitive block format.
- BSF with signature attributes requested by the user.
2. post SET_PSV in order to set the memory domain initial
signature parameters passed by the user.
SET_PSV is not signaled and solicited CQE.
3. post SET_PSV in order to set the wire domain initial
signature parameters passed by the user.
SET_PSV is not signaled and solicited CQE.
* After this compound WR we place a small fence for next WR to come.
This patch also introduces some helper functions to set the BSF correctly
and determining the signature format selectors.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This will be useful when processing signature errors on a specific
key. The mlx5 driver will lookup the matching mlx5 memory region
structure and mark it as dirty (contains signature errors).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
get_umr_flags helper function might be used for types of access modes
other than ACCESS_MODE_MTT, such as ACCESS_MODE_KLM. So remove it from
helper, and callers will add their own access mode flag.
This commit does not add/change functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
As a preliminary step for signature feature which will require posting
multiple (3) WQEs for a single WR, we break post_send routine WQE
indexing into begin and finish routines.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
If user requested signature enable we initialize relevant mlx5_ib_qp
members. We mark the qp as sig_enable and we increase the effective
SQ size, but still limit the user max_send_wr to original size
computed. We also allow the create_qp routine to accept sig_enable
create flag.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Support create_mr and destroy_mr verbs. Creating ib_mr may be done
for either ib_mr that will register regular page lists like
alloc_fast_reg_mr routine, or indirect ib_mrs that can register other
(pre-registered) ib_mrs in an indirect manner.
In addition user may request signature enable, that will mean that the
created ib_mr may be attached with signature attributes (BSF, PSVs).
Currently we only allow direct/indirect registration modes.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Introduce a verbs interface for signature-related operations. A
signature handover operation configures the layouts of data and
protection attributes both in memory and wire domains.
Signature operations are:
- INSERT:
Generate and insert protection information when handing over
data from input space to output space.
- validate and STRIP:
Validate protection information and remove it when handing over
data from input space to output space.
- validate and PASS:
Validate protection information and pass it when handing over
data from input space to output space.
Once the signature handover opration is done, the HCA will offload
data integrity generation/validation while performing the actual data
transfer.
Additions:
1. HCA signature capabilities in device attributes
Verbs provider supporting signature handover operations fills
relevant fields in device attributes structure returned by
ib_query_device.
2. QP creation flag IB_QP_CREATE_SIGNATURE_EN
Creating a QP that will carry signature handover operations may
require some special preparations from the verbs provider. So we
add QP creation flag IB_QP_CREATE_SIGNATURE_EN to declare that the
created QP may carry out signature handover operations. Expose
signature support to verbs layer (no support for now).
3. New send work request IB_WR_REG_SIG_MR
Signature handover work request. This WR will define the signature
handover properties of the memory/wire domains as well as the
domains layout. The purpose of this work request is to bind all
the needed information for the signature operation:
- data to be transferred: wr->sg_list (ib_sge).
* The raw data, pre-registered to a single MR (normally, before
signature, this MR would have been used directly for the data
transfer)
- data protection guards: sig_handover.prot (ib_sge).
* The data protection buffer, pre-registered to a single MR, which
contains the data integrity guards of the raw data blocks.
Note that it may not always exist, only in cases where the user is
interested in storing protection guards in memory.
- signature operation attributes: sig_handover.sig_attrs.
* Tells the HCA how to validate/generate the protection information.
Once the work request is executed, the memory region that will
describe the signature transaction will be the sig_mr. The
application can now go ahead and send the sig_mr.rkey or use the
sig_mr.lkey for data transfer.
4. New Verb ib_check_mr_status
check_mr_status verb checks the status of the memory region post
transaction. The first check that may be used is
IB_MR_CHECK_SIG_STATUS, which will indicate if any signature
errors are pending for a specific signature-enabled ib_mr. This
verb is a lightwight check and is allowed to be taken from
interrupt context. An application must call this verb after it is
known that the actual data transfer has finished.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This commit introduces verbs for creating/destoying memory
regions which will allow new types of memory key operations such
as protected memory registration.
Indirect memory registration is registering several (one
of more) pre-registered memory regions in a specific layout.
The Indirect region may potentialy describe several regions
and some repitition format between them.
Protected Memory registration is registering a memory region
with various data integrity attributes which will describe protection
schemes that will be handled by the HCA in an offloaded manner.
These memory regions will be applicable for a new REG_SIG_MR
work request introduced later in this patchset.
In the future these routines may replace or implement current memory
regions creation routines existing today:
- ib_reg_user_mr
- ib_alloc_fast_reg_mr
- ib_get_dma_mr
- ib_dereg_mr
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch refactors the IB core umem code and vendor drivers to use a
linear (chained) SG table instead of chunk list. With this change the
relevant code becomes clearer—no need for nested loops to build and
use umem.
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Pull ARM SoC fixes from Olof Johansson:
"A collection of fixes for ARM platforms. Most are fixes for DTS
files, mostly from DT conversion on OMAP which is still finding a few
issues here and there.
There's a couple of small stale code removal patches that we usually
queue for the next release instead, but they seemed harmless enough to
bring in now.
Also, a fix for backlight on some PXA platforms, and a cache
configuration fix for Tegra, etc"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (25 commits)
MAINTAINERS: add additional ARM BCM281xx/BCM11xxx maintainer
ARM: tegra: only run PL310 init on systems with one
ARM: tegra: Add head numbers to display controllers
ARM: imx6: build pm-imx6q.c independently of CONFIG_PM
ARM: tegra: fix RTC0 alias for Cardhu
ARM: dove: dt: revert PMU interrupt controller node
Documentation: dt: OMAP: Update Overo/Tobi
ARM: dts: Add support for both OMAP35xx and OMAP36xx Overo/Tobi
ARM: dts: omap3-tobi: Use the correct vendor prefix
ARM: dts: omap3-tobi: Fix boot with OMAP36xx-based Overo
ARM: OMAP2+: Remove legacy macros for zoom platforms
ARM: OMAP2+: Remove MACH_NOKIA_N800
ARM: dts: N900: add missing compatible property
ARM: dts: N9/N950: fix boot hang with 3.14-rc1
ARM: OMAP1: nokia770: enable tahvo-usb
ARM: OMAP2+: gpmc: fix: DT ONENAND child nodes not probed when MTD_ONENAND is built as module
ARM: OMAP2+: gpmc: fix: DT NAND child nodes not probed when MTD_NAND is built as module
ARM: dts: omap3-gta04: Fix mmc1 properties.
ARM: dts: omap3-gta04: Fix 'aux' gpio key flags.
ARM: OMAP2+: add missing ARCH_HAS_OPP
...
Pull regulator fixes from Mark Brown:
"Mostly unexciting driver fixes, plus one fix to lower the severity of
the log message when we don't use an optional regulator - the fixes
for ACPI system made this come up more often and it was correctly
observed that it was causing undue concern for users"
* tag 'regulator-v3.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: max14577: Fix invalid return value on DT parse success
regulator: core: Change dummy supplies error message to a warning
regulator: s5m8767: Add missing of_node_put
regulator: s5m8767: Use of_get_child_by_name
regulator: da9063: Bug fix when setting max voltage on LDOs 5-11
Pull timer fix from Thomas Gleixner:
"Serialize the registration of a new sched_clock in the currently ARM
only generic sched_clock facilty to avoid sched_clock havoc"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched_clock: Prevent callers from seeing half-updated data
Pull x86 fixes from Thomas Gleixner:
- a bugfix which prevents a divide by 0 panic when the newly introduced
try_msr_calibrate_tsc() fails
- enablement of the Baytrail platform to utilize the newfangled msr
based calibration
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: tsc: Add missing Baytrail frequency to the table
x86, tsc: Fallback to normal calibration if fast MSR calibration fails
Pull irq fixes from Thomas Gleixner:
"Another four fixlets to tame the ARM orion irq chip"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: orion: Fix getting generic chip pointer.
irqchip: orion: clear stale interrupts in irq_startup
irqchip: orion: use handle_edge_irq on bridge irqs
irqchip: orion: clear bridge cause register on init
Pull USB fixes from Greg KH:
"Here are a number of USB fixes for reported issues for 3.14-rc4
The majority of these are for USB gadget, phy, and musb driver issues.
And there's a few new device ids thrown in for good measure"
* tag 'usb-3.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: chipidea: need to mask when writting endptflush and endptprime
usb: musb: correct use of schedule_delayed_work()
usb: phy: msm: fix compilation errors when !CONFIG_PM_SLEEP
usb: gadget: fix NULL pointer dereference
usb: gadget: printer: using gadget_is_otg to check otg support at runtime
phy: let phy_provider_register be the last step in registering PHY
phy-core: Don't allow building phy-core as a module
phy-core: Don't propagate -ENOSUPP from phy_pm_runtime_get_sync to caller
phy-core: phy_get: Leave error logging to the caller
phy,phy-bcm-kona-usb2.c: Add dependency on HAS_IOMEM
usb: musb: correct use of schedule_delayed_work()
usb: musb: do not sleep in atomic context
USB: serial: option: blacklist interface 4 for Cinterion PHS8 and PXS8
USB: EHCI: add delay during suspend to prevent erroneous wakeups
usb: gadget: bcm63xx_udc: fix build failure on DMA channel code
usb: musb: do not sleep in atomic context
usb: gadget: s3c2410_udc: Fix build error
usb: musb: core: Fix remote-wakeup resume
usb: musb: host: Fix SuperSpeed hub enumeration
usb: musb: fix obex in g_nokia.ko causing kernel panic
Pull TTY revert from Greg KH:
"Here is a single commit, a revert of a sysfs file change that ended up
breaking a userspace tool"
* tag 'tty-3.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "tty: Set correct tty name in 'active' sysfs attribute"
Pull staging tree fix from Greg KH:
"Here is a single android driver fix for 3.14-rc4 that fixes a reported
problem in the binder driver"
* tag 'staging-3.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: binder: Fix death notifications
Pull char/misc fix from Greg KH:
"Here is a single commit, to fix a reported problem in the mei driver"
* tag 'char-misc-3.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
mei: set client's read_cb to NULL when flow control fails
Add myself as an additional maintainer for the Broadcom mobile
SoCs.
Signed-off-by: Matt Porter <mporter@linaro.org>
Acked-by: Christian Daudt <bcm@fixthebug.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Pull scheduler fixes from Ingo Molnar:
"Misc fixlets: a fair number of them resulting from the new
SCHED_DEADLINE code"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Remove useless dl_nr_total
sched/deadline: Test for CPU's presence explicitly
sched: Add 'flags' argument to sched_{set,get}attr() syscalls
sched: Fix information leak in sys_sched_getattr()
sched,numa: add cond_resched to task_numa_work
sched/core: Make dl_b->lock IRQ safe
sched/core: Fix sched_rt_global_validate
sched/deadline: Fix overflow to handle period==0 and deadline!=0
sched/deadline: Fix bad accounting of nr_running
Pull hwmon fix from Guenter Roeck:
"Fix writing the minimum temperature in max1668 driver"
* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (max1668) Fix writing the minimum temperature
Pull xfs fixes from Dave Chinner:
"This is the first pull request I've had to do for you, so I'm still
sorting things out. The reason I'm sending this and not Ben should be
obvious from the first commit below - SGI has stepped down from the
XFS maintainership role. As such, I'd like to take another
opportunity to thank them for their many years of effort maintaining
XFS and supporting the XFS community that they developed from the
ground up.
So I haven't had time to work things like signed tags into my
workflows yet, so this is just a repo branch I'm asking you to pull
from. And yes, I named the branch -rc4 because I wanted the fixes in
rc4, not because the branch was for merging into -rc3. Probably not
right, either.
Anyway, I should have everything sorted out by the time the next merge
window comes around. If there's anything that you don't like in the
pull req, feel free to flame me unmercifully.
The changes are fixes for recent regressions and important thinkos in
verification code:
- a log vector buffer alignment issue on ia32
- timestamps on truncate got mangled
- primary superblock CRC validation fixes and error message
sanitisation"
* 'xfs-fixes-for-3.14-rc4' of git://oss.sgi.com/xfs/xfs:
xfs: limit superblock corruption errors to actual corruption
xfs: skip verification on initial "guess" superblock read
MAINTAINERS: SGI no longer maintaining XFS
xfs: xfs_sb_read_verify() doesn't flag bad crcs on primary sb
xfs: ensure correct log item buffer alignment
xfs: ensure correct timestamp updates from truncate
This fixes bug introduced in 667a6b7a (regulator: max14577: Add missing
of_node_put). The DTS parsing function returned number of matched
regulators as success status which then was compared against 0 in probe.
Result was a probe fail after successful parsing the DTS:
max14577-regulator: probe of max14577-regulator failed with error 2
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Reviwed-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Mark Brown <broonie@linaro.org>
irqchip mvebu fixes for v3.14
- orion:
- fixes for clearing bridge cause register, and clearing stale interrupts
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull devicetree fixes from Grant Likely:
"Device tree compatible match order bug fix
This branch contains a bug fix for the way devicetree code identifies
the type of device. Device drivers can contain a list of
of_device_ids, but it more than one entry will match, then the device
driver may choose the wrong one. Commit 105353145e, "match each node
compatible against all given matches first", was queued for v3.14 but
ended up causing other bugs. Commit 06b29e76a7 attempted to fix it
but it had other bugs. Merely reverting the fix and waiting until
v3.15 isn't a good option because there is code in v3.14 that depends
on the revised behaviour to boot.
This branch should finally fixes the problem correctly. This time
instead of just hoping that the patch is correct, this branch also
adds new testcases that validate the behaviour.
The changes in this branch are larger than I would like for a -rc
pull, but moving the test case data out of out of arch/arm so that it
could be validated on other architectures was important"
* tag 'dt-for-linus' of git://git.secretlab.ca/git/linux:
of: Add self test for of_match_node()
of: Move testcase FDT data into drivers/of
of: reimplement the matching method for __of_match_node()
Revert "of: search the best compatible match first in __of_match_node()"
Pull watchdog fix from Wim Van Sebroeck:
"It corrects the error code when no device was found for w83697hf_wdt"
* git://www.linux-watchdog.org/linux-watchdog:
watchdog: w83697hf_wdt: return ENODEV if no device was found
Enabling SPARSE_IRQ shows up a bug in the irq-orion bridge interrupt
handler. The bridge interrupt is implemented using a single generic
chip. Thus the parameter passed to irq_get_domain_generic_chip()
should always be zero.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Fixes: 9dbd90f17e ("irqchip: Add support for Marvell Orion SoCs")
Cc: <stable@vger.kernel.org> # v3.11+
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
When using BTS on Core i7-4*, I get the below kernel warning.
$ perf record -c 1 -e branches:u ls
Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
kernel:[ 438.317893] Uhhuh. NMI received for unknown reason 31 on CPU 2.
Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
kernel:[ 438.317920] Do you have a strange power saving mode enabled?
Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
kernel:[ 438.317945] Dazed and confused, but trying to continue
Make intel_pmu_handle_irq() take the full exit path when returning early.
Cc: eranian@google.com
Cc: peterz@infradead.org
Cc: mingo@kernel.org
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1392425048-5309-1-git-send-email-andi@firstfloor.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ENDPTFLUSH and ENDPTPRIME registers are set by software and clear
by hardware. There is a bit for each endpoint. When we are setting
a bit for an endpoint we should make sure we do not touch other
endpoint bit. There is a race condition if the hardware clear the
bit between the read and the write in hw_write.
Cc: stable <stable@vger.kernel.org> # 3.11+
Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Matthieu CASTET <matthieu.castet@parrot.com>
Tested-by: Michael Grzeschik <mgrzeschik@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We're copying the on-stack structure to userspace, but forgot to give
the right number of bytes to copy. This allows the calling process to
obtain up to PAGE_SIZE bytes from the stack (and possibly adjacent
kernel memory).
This fix copies only as much as we actually have on the stack
(attr->size defaults to the size of the struct) and leaves the rest of
the userspace-provided buffer untouched.
Found using kmemcheck + trinity.
Fixes: d50dde5a10 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI")
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1392585857-10725-1-git-send-email-vegard.nossum@oracle.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix this lockdep warning:
[ 44.804600] =========================================================
[ 44.805746] [ INFO: possible irq lock inversion dependency detected ]
[ 44.805746] 3.14.0-rc2-test+ #14 Not tainted
[ 44.805746] ---------------------------------------------------------
[ 44.805746] bash/3674 just changed the state of lock:
[ 44.805746] (&dl_b->lock){+.....}, at: [<ffffffff8106ad15>] sched_rt_handler+0x132/0x248
[ 44.805746] but this lock was taken by another, HARDIRQ-safe lock in the past:
[ 44.805746] (&rq->lock){-.-.-.}
and interrupts could create inverse lock ordering between them.
[ 44.805746]
[ 44.805746] other info that might help us debug this:
[ 44.805746] Possible interrupt unsafe locking scenario:
[ 44.805746]
[ 44.805746] CPU0 CPU1
[ 44.805746] ---- ----
[ 44.805746] lock(&dl_b->lock);
[ 44.805746] local_irq_disable();
[ 44.805746] lock(&rq->lock);
[ 44.805746] lock(&dl_b->lock);
[ 44.805746] <Interrupt>
[ 44.805746] lock(&rq->lock);
by making dl_b->lock acquiring always IRQ safe.
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1392107067-19907-3-git-send-email-juri.lelli@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
While debugging the crash with the bad nr_running accounting, I hit
another bug where, after running my sched deadline test, I was getting
failures to take a CPU offline. It was giving me a -EBUSY error.
Adding a bunch of trace_printk()s around, I found that the cpu
notifier that called sched_cpu_inactive() was returning a failure. The
overflow value was coming up negative?
Talking this over with Juri, the problem is that the total_bw update was
suppose to be made by dl_overflow() which, during my tests, seemed to
not be called. Adding more trace_printk()s, it wasn't that it wasn't
called, but it exited out right away with the check of new_bw being
equal to p->dl.dl_bw. The new_bw calculates the ratio between period and
runtime. The bug is that if you set a deadline, you do not need to set
a period if you plan on the period being equal to the deadline. That
is, if period is zero and deadline is not, then the system call should
set the period to be equal to the deadline. This is done elsewhere in
the code.
The fix is easy, check if period is set, and if it is not, then use the
deadline.
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140219135335.7e74abd4@gandalf.local.home
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Rostedt writes:
My test suite was locking up hard when enabling mmiotracer. This was due
to the mmiotracer placing all but one CPU offline. I found this out
when I was able to reproduce the bug with just my stress-cpu-hotplug
test. This bug baffled me because it would not always trigger, and
would only trigger on the first run after boot up. The
stress-cpu-hotplug test would crash hard the first run, or never crash
at all. But a new reboot may cause it to crash on the first run again.
I spent all week bisecting this, as I couldn't find a consistent
reproducer. I finally narrowed it down to the sched deadline patches,
and even more peculiar, to the commit that added the sched
deadline boot up self test to the latency tracer. Then it dawned on me
to what the bug was.
All it took was to run a task under sched deadline to screw up the CPU
hot plugging. This explained why it would lock up only on the first run
of the stress-cpu-hotplug test. The bug happened when the boot up self
test of the schedule latency tracer would test a deadline task. The
deadline task would corrupt something that would cause CPU hotplug to
fail. If it didn't corrupt it, the stress test would always work
(there's no other sched deadline tasks that would run to cause
problems). If it did corrupt on boot up, the first test would lockup
hard.
I proved this theory by running my deadline test program on another box,
and then run the stress-cpu-hotplug test, and it would now consistently
lock up. I could run stress-cpu-hotplug over and over with no problem,
but once I ran the deadline test, the next run of the
stress-cpu-hotplug would lock hard.
After adding lots of tracing to the code, I found the cause. The
function tracer showed that migrate_tasks() was stuck in an infinite
loop, where rq->nr_running never equaled 1 to break out of it. When I
added a trace_printk() to see what that number was, it was 335 and
never decrementing!
Looking at the deadline code I found:
static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags) {
dequeue_dl_entity(&p->dl);
dequeue_pushable_dl_task(rq, p);
}
static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags) {
update_curr_dl(rq);
__dequeue_task_dl(rq, p, flags);
dec_nr_running(rq);
}
And this:
if (dl_runtime_exceeded(rq, dl_se)) {
__dequeue_task_dl(rq, curr, 0);
if (likely(start_dl_timer(dl_se, curr->dl.dl_boosted)))
dl_se->dl_throttled = 1;
else
enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);
if (!is_leftmost(curr, &rq->dl))
resched_task(curr);
}
Notice how we call __dequeue_task_dl() and in the else case we
call enqueue_task_dl()? Also notice that dequeue_task_dl() has
underscores where enqueue_task_dl() does not. The enqueue_task_dl()
calls inc_nr_running(rq), but __dequeue_task_dl() does not. This is
where we get nr_running out of sync.
[snip]
Another point where nr_running can get out of sync is when the dl_timer
fires:
dl_se->dl_throttled = 0;
if (p->on_rq) {
enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
if (task_has_dl_policy(rq->curr))
check_preempt_curr_dl(rq, p, 0);
else
resched_task(rq->curr);
This patch does two things:
- correctly accounts for throttled tasks (that are now considered
!running);
- fixes the bug, updating nr_running from {inc,dec}_dl_tasks(),
since we risk to update it twice in some situations (e.g., a
task is dequeued while it has exceeded its budget).
Cc: mingo@redhat.com
Cc: torvalds@linux-foundation.org
Cc: akpm@linux-foundation.org
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1392884379-13744-1-git-send-email-juri.lelli@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Most WDT driver modules return ENODEV during modprobe if
no valid device was found, but w83697hf_wdt returns EIO.
Let w83697hf_wdt return ENODEV.
Signed-off-by: Stanislav Kholmanskikh <stanislav.kholmanskikh@oracle.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Pull sparc fixes from David Miller:
"Three minor fixes from David Howells and Paul Gortmaker"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
Sparc: sparc_cpu_model isn't in asm/system.h any more [ver #2]
sparc32: make copy_to/from_user_page() usable from modular code
sparc32: fix build failure for arch_jump_label_transform