In blk-cgroup, operations on blkg objects are protected with the
request_queue lock. This is no more the lock that protects
I/O-scheduler operations in blk-mq. In fact, the latter are now
protected with a finer-grained per-scheduler-instance lock. As a
consequence, although blkg lookups are also rcu-protected, blk-mq I/O
schedulers may see inconsistent data when they access blkg and
blkg-related objects. BFQ does access these objects, and does incur
this problem, in the following case.
The blkg_lookup performed in bfq_get_queue, being protected (only)
through rcu, may happen to return the address of a copy of the
original blkg. If this is the case, then the blkg_get performed in
bfq_get_queue, to pin down the blkg, is useless: it does not prevent
blk-cgroup code from destroying both the original blkg and all objects
directly or indirectly referred by the copy of the blkg. BFQ accesses
these objects, which typically causes a crash for NULL-pointer
dereference of memory-protection violation.
Some additional protection mechanism should be added to blk-cgroup to
address this issue. In the meantime, this commit provides a quick
temporary fix for BFQ: cache (when safe) blkg data that might
disappear right after a blkg_lookup.
In particular, this commit exploits the following facts to achieve its
goal without introducing further locks. Destroy operations on a blkg
invoke, as a first step, hooks of the scheduler associated with the
blkg. And these hooks are executed with bfqd->lock held for BFQ. As a
consequence, for any blkg associated with the request queue an
instance of BFQ is attached to, we are guaranteed that such a blkg is
not destroyed, and that all the pointers it contains are consistent,
while that instance is holding its bfqd->lock. A blkg_lookup performed
with bfqd->lock held then returns a fully consistent blkg, which
remains consistent until this lock is held. In more detail, this holds
even if the returned blkg is a copy of the original one.
Finally, also the object describing a group inside BFQ needs to be
protected from destruction on the blkg_free of the original blkg
(which invokes bfq_pd_free). This commit adds private refcounting for
this object, to let it disappear only after no bfq_queue refers to it
any longer.
This commit also removes or updates some stale comments on locking
issues related to blk-cgroup operations.
Reported-by: Tomas Konir <tomas.konir@gmail.com>
Reported-by: Lee Tibbert <lee.tibbert@gmail.com>
Reported-by: Marco Piazza <mpiazza@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Tested-by: Tomas Konir <tomas.konir@gmail.com>
Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
Tested-by: Marco Piazza <mpiazza@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Christoph writes:
"A few NVMe fixes for 4.12-rc, PCIe reset fixes and APST fixes, a
RDMA reconnect fix, two FC fixes and a general controller removal fix."
While installing SLES-12 (based on v4.4), I found that the installer
will stall for 60+ seconds during LVM disk scan. The root cause was
determined to be the removal of a bound device check in loop_flush()
by commit b5dd2f6047 ("block: loop: improve performance via blk-mq").
Restoring this check, examining ->lo_state as set by loop_set_fd()
eliminates the bad behavior.
Test method:
modprobe loop max_loop=64
dd if=/dev/zero of=disk bs=512 count=200K
for((i=0;i<4;i++))do losetup -f disk; done
mkfs.ext4 -F /dev/loop0
for((i=0;i<4;i++))do mkdir t$i; mount /dev/loop$i t$i;done
for f in `ls /dev/loop[0-9]*|sort`; do \
echo $f; dd if=$f of=/dev/null bs=512 count=1; \
done
Test output: stock patched
/dev/loop0 18.1217e-05 8.3842e-05
/dev/loop1 6.1114e-05 0.000147979
/dev/loop10 0.414701 0.000116564
/dev/loop11 0.7474 6.7942e-05
/dev/loop12 0.747986 8.9082e-05
/dev/loop13 0.746532 7.4799e-05
/dev/loop14 0.480041 9.3926e-05
/dev/loop15 1.26453 7.2522e-05
Note that from loop10 onward, the device is not mounted, yet the
stock kernel consumes several orders of magnitude more wall time
than it does for a mounted device.
(Thanks for Mike Galbraith <efault@gmx.de>, give a changelog review.)
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: James Wang <jnwang@suse.com>
Fixes: b5dd2f6047 ("block: loop: improve performance via blk-mq")
Signed-off-by: Jens Axboe <axboe@fb.com>
hard disk IO latency varies a lot depending on spindle move. The latency
range could be from several microseconds to several milliseconds. It's
pretty hard to get the baseline latency used by io.low.
We will use a different stragety here. The idea is only using IO with
spindle move to determine if cgroup IO is in good state. For HD, if io
latency is small (< 1ms), we ignore the IO. Such IO is likely from
sequential IO, and is helpless to help determine if a cgroup's IO is
impacted by other cgroups. With this, we only account IO with big
latency. Then we can choose a hardcoded baseline latency for HD (4ms,
which is typical IO latency with seek). With all these settings, the
io.low latency works for both HD and SSD.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
I have encountered a NULL pointer dereference in
throtl_schedule_pending_timer:
[ 413.735396] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
[ 413.735535] IP: [<ffffffff812ebbbf>] throtl_schedule_pending_timer+0x3f/0x210
[ 413.735643] PGD 22c8cf067 PUD 22cb34067 PMD 0
[ 413.735713] Oops: 0000 [#1] SMP
......
This is caused by the following case:
blk_throtl_bio
throtl_schedule_next_dispatch <= sq is top level one without parent
throtl_schedule_pending_timer
sq_to_tg(sq)->td->throtl_slice <= sq_to_tg(sq) returns NULL
Fix it by using sq_to_td instead of sq_to_tg(sq)->td, which will always
return a valid td.
Fixes: 297e3d8547 ("blk-throttle: make throtl_slice tunable")
Signed-off-by: Joseph Qi <qijiang.qj@alibaba-inc.com>
Reviewed-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Christoph Hellwig suggests we should to make APST work out of the box.
Hence relax the the default max latency to make them able to enter
deepest power state on default.
Here are id-ctrl excerpts from two high latency NVMes:
vid : 0x14a4
ssvid : 0x1b4b
mn : CX2-GB1024-Q11 NVMe LITEON 1024GB
ps 3 : mp:0.1000W non-operational enlat:5000 exlat:5000 rrt:3 rrl:3
rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0100W non-operational enlat:50000 exlat:100000 rrt:4 rrl:4
rwt:4 rwl:4 idle_power:- active_power:-
vid : 0x15b7
ssvid : 0x1b4b
mn : A400 NVMe SanDisk 512GB
ps 3 : mp:0.0500W non-operational enlat:51000 exlat:10000 rrt:0 rrl:0
rwt:0 rwl:0 idle_power:- active_power:-
ps 4 : mp:0.0055W non-operational enlat:1000000 exlat:100000 rrt:0 rrl:0
rwt:0 rwl:0 idle_power:- active_power:-
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When a NVMe is in non-op states, the latency is exlat.
The latency will be enlat + exlat only when the NVMe tries to transit
from operational state right atfer it begins to transit to
non-operational state, which should be a rare case.
Therefore, as Andy Lutomirski suggests, use exlat only when deciding power
states to trainsit to.
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The failure case, of a create controller request, called
nvme_uninit_ctrl() but didn't do a put to allow the nvme
controller to be deleted.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Per FC-NVME, when lldd or transport detects an i/o error, the
connection must be terminated, which in turn requires the association
to be termianted. Currently the transport simply creates a nvme
completion status of transport error and returns the io. The FC-NVME
spec makes the mandate as initiator and host, depending on the error,
can get out of sync on outstanding io counts (sqhd/sqtail).
Implement the association teardown on lldd or transport detected
errors.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
When we encounter an transport/controller errors, error recovery
kicks in which performs:
1. stops io/admin queues
2. moves transport queues out of LIVE state
3. fast fail pending io
4. schedule periodic reconnects.
But we also need to fast fail incoming IO taht enters after we
already scheduled. Given that our queue is not LIVE anymore, simply
restart the request queues to fail in .queue_rq
Reported-by: Alex Turin <alex@vastdata.com>
Reported-by: shahar.salzman <shahar.salzman@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
We need to start admin queues too in nvme_kill_queues()
for avoiding hang in remove path[1].
This patch is very similar with 806f026f9b901eaf(nvme: use
blk_mq_start_hw_queues() in nvme_kill_queues()).
[1] hang stack trace
[<ffffffff813c9716>] blk_execute_rq+0x56/0x80
[<ffffffff815cb6e9>] __nvme_submit_sync_cmd+0x89/0xf0
[<ffffffff815ce7be>] nvme_set_features+0x5e/0x90
[<ffffffff815ce9f6>] nvme_configure_apst+0x166/0x200
[<ffffffff815cef45>] nvme_set_latency_tolerance+0x35/0x50
[<ffffffff8157bd11>] apply_constraint+0xb1/0xc0
[<ffffffff8157cbb4>] dev_pm_qos_constraints_destroy+0xf4/0x1f0
[<ffffffff8157b44a>] dpm_sysfs_remove+0x2a/0x60
[<ffffffff8156d951>] device_del+0x101/0x320
[<ffffffff8156db8a>] device_unregister+0x1a/0x60
[<ffffffff8156dc4c>] device_destroy+0x3c/0x50
[<ffffffff815cd295>] nvme_uninit_ctrl+0x45/0xa0
[<ffffffff815d4858>] nvme_remove+0x78/0x110
[<ffffffff81452b69>] pci_device_remove+0x39/0xb0
[<ffffffff81572935>] device_release_driver_internal+0x155/0x210
[<ffffffff81572a02>] device_release_driver+0x12/0x20
[<ffffffff815d36fb>] nvme_remove_dead_ctrl_work+0x6b/0x70
[<ffffffff810bf3bc>] process_one_work+0x18c/0x3a0
[<ffffffff810bf61e>] worker_thread+0x4e/0x3b0
[<ffffffff810c5ac9>] kthread+0x109/0x140
[<ffffffff8185800c>] ret_from_fork+0x2c/0x40
[<ffffffffffffffff>] 0xffffffffffffffff
Fixes: c5552fde102fc("nvme: Enable autonomous power state transitions")
Reported-by: Rakesh Pandit <rakesh@tuxera.com>
Tested-by: Rakesh Pandit <rakesh@tuxera.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
gcc 7.1 reports the following warning:
block/elevator.c: In function ‘elv_register’:
block/elevator.c:898:5: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
"%s_io_cq", e->elevator_name);
^~~~~~~~~~
block/elevator.c:897:3: note: ‘snprintf’ output between 7 and 22 bytes into a destination of size 21
snprintf(e->icq_cache_name, sizeof(e->icq_cache_name),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"%s_io_cq", e->elevator_name);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The bug is that the name of the icq_cache is 6 characters longer than
the elevator name, but only ELV_NAME_MAX + 5 characters were reserved
for it --- so in the case of a maximum-length elevator name, the 'q'
character in "_io_cq" would be truncated by snprintf(). Fix it by
reserving ELV_NAME_MAX + 6 characters instead.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
When direct issue is done on request picked up from plug list,
the hctx need to be updated with the actual hw queue, otherwise
wrong hctx is used and may hurt performance, especially when
wrong SRCU readlock is acquired/released
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Pull XFS fix from Darrick Wong:
"I've one more bugfix for you for 4.12-rc4: Fix an unmount hang due to
a race in io buffer accounting"
* tag 'xfs-4.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: use ->b_state to fix buffer I/O accounting release race
Pull ceph fix from Ilya Dryomov:
"A small fix for rbd FALLOC_FL_ZERO_RANGE/PUNCH_HOLE handling breakage
introduced in -rc1"
* tag 'ceph-for-4.12-rc4' of git://github.com/ceph/ceph-client:
rbd: implement REQ_OP_WRITE_ZEROES
Pull device mapper fixes from Mike Snitzer:
- a DM verity fix for a mode when no salt is used
- a fix to DM to account for the possibility that PREFLUSH or FUA are
used without the SYNC flag if the underlying storage doesn't have a
volatile write-cache
- a DM ioctl memory allocation flag fix to use __GFP_HIGH to allow
emergency forward progress (by using memory reserves as last resort)
- a small DM integrity cleanup to use kvmalloc() instead of duplicating
the same
* tag 'for-4.12/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: make flush bios explicitly sync
dm ioctl: restore __GFP_HIGH in copy_params()
dm integrity: use kvmalloc() instead of dm_integrity_kvmalloc()
dm verity: fix no salt use case
Pull MD fixes from Shaohua Li:
"Several patches for MD. One notable is making flush bios sync, others
fix small issues"
* tag 'md/4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md: Make flush bios explicitely sync
md: report sector of stripes with check mismatches
md: uuid debug statement now in processor byte order.
md-cluster: fix potential lock issue in add_new_disk
Pull block fixes from Jens Axboe:
"A set of fixes that should go into the next -rc. This contains:
- A use-after-free in the request_list exit for the legacy IO path,
from Bart.
- A fix for CFQ, fixing a recent regression with the conversion to
higher resolution timing for iops mode. From Hou Tao.
- A single fix for nbd, split in two patches, fixing a leak of a data
structure.
- A regression fix from Keith, ensuring that callers of
blk_mq_update_nr_hw_queues() hold the right lock"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: Avoid that blk_exit_rl() triggers a use-after-free
cfq-iosched: fix the delay of cfq_group's vdisktime under iops mode
blk-mq: Take tagset lock when updating hw queues
nbd: don't leak nbd_config
nbd: nbd_reset() call in nbd_dev_add() is redundant
Pull drm displayport quirk support:
"DP quirk for usb c dongles.
As mentioned I have a separate request for fixing a regression, but
also keeping the broken hw working, for certain USB-C DP adapters they
require a minimised n/m parameters, but an attempt to do this
generically has failed, we need to quirk these specific adapters.
However doing it generically regressed some eDP panels.
This pull adds the infrastructure and a quirk for the adapter"
* tag 'drm-dp-quirk-for-v4.12-rc4' of git://people.freedesktop.org/~airlied/linux:
drm/i915: Detect USB-C specific dongles before reducing M and N
drm/dp: start a DPCD based DP sink/branch device quirk database
drm/i915: use drm DP helper to read DPCD desc
drm/dp: add helper for reading DP sink/branch device desc from DPCD
Pull sound fixes from Takashi Iwai:
"This contains the fixes for a few reported regression for HD-audio and
USB-audio. All small, trivial, and boring"
* tag 'sound-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Fix applying MSI dual-codec mobo quirk
ALSA: usb: Avoid VLA in mixer_us16x08.c
ALSA: usb: Fix a typo in Tascam US-16x08 mixer element
Revert "ALSA: usb-audio: purge needless variable length array"
Pull dmaengine fixes from Vinod Koul:
"Here is the dmaengine fixes request for 4.12. Fixes bunch of issues in
the driver, npthing exciting though..
- mv_xor_v2 driver fixes for handling descriptors, tx_submit
implementation, removing interrupt coalescing and setting DMA mask
properly
- fix usb-dmac DMAOR AE bit definition
- fix ep93xx start buffer from BASE0 and not drain the transfers in
terminate_all
- fix rcar-dmac to use right descriptor pointer for residue
calculation
- pl330 fix warn for irq freeup"
* tag 'dmaengine-fix-4.12-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: pl330: fix warning in pl330_remove
rcar-dmac: fixup descriptor pointer for descriptor mode
dmaengine: ep93xx: Don't drain the transfers in terminate_all()
dmaengine: ep93xx: Always start from BASE0
dmaengine: usb-dmac: Fix DMAOR AE bit definition
dmaengine: mv_xor_v2: set DMA mask to 40 bits
dmaengine: mv_xor_v2: remove interrupt coalescing
dmaengine: mv_xor_v2: fix tx_submit() implementation
dmaengine: mv_xor_v2: enable XOR engine after its configuration
dmaengine: mv_xor_v2: do not use descriptors not acked by async_tx
dmaengine: mv_xor_v2: properly handle wrapping in the array of HW descriptors
dmaengine: mv_xor_v2: handle mv_xor_v2_prep_sw_desc() error properly
Pull HID fixes from Jiri Kosina:
- corner-case oops fixes for Asus and Wacom drivers from Carlo Caione
and Jason Gerecke
- power management fix (reported on SIS0817 touchscreen) for i2c-hid
devices from Hans de Goede
- device-id-specific fixes and quirks from Hans de Goede, Diego Elio
Pettenò and Che-Liang Chiou
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: asus: Stop underlying hardware on remove
HID: i2c: Call acpi_device_fix_up_power for ACPI-enumerated devices
HID: asus: Add support for T100 keyboard
HID: elecom: extend to fix the descriptor for DEFT trackballs
HID: magicmouse: Set multi-touch keybits for Magic Mouse
HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference
Pull livepatching fix from Jiri Kosina:
"Kconfig dependency fix for livepatching infrastructure from Miroslav
Benes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: Make livepatch dependent on !TRIM_UNUSED_KSYMS
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- revert a broken PAT commit that broke a number of systems
- fix two preemptability warnings/bugs that can trigger under certain
circumstances, in the debug code and in the microcode loader"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "x86/PAT: Fix Xorg regression on CPUs that don't support PAT"
x86/debug/32: Convert a smp_processor_id() call to raw to avoid DEBUG_PREEMPT warning
x86/microcode/AMD: Change load_microcode_amd()'s param to bool to fix preemptibility bug
Pull EFI fixes from Ingo Molnar:
"Misc fixes:
- three boot crash fixes for uncommon configurations
- silence a boot warning under virtualization
- plus a GCC 7 related (harmless) build warning fix"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/bgrt: Skip efi_bgrt_init() in case of non-EFI boot
x86/efi: Correct EFI identity mapping under 'efi=old_map' when KASLR is enabled
x86/efi: Disable runtime services on kexec kernel if booted with efi=old_map
efi: Remove duplicate 'const' specifiers
efi: Don't issue error message when booted under Xen
The BAD_MADT_GICC_ENTRY() macro checks if a GICC MADT entry passes
muster from an ACPI specification standpoint. Current macro detects the
MADT GICC entry length through ACPI firmware version (it changed from 76
to 80 bytes in the transition from ACPI 5.1 to ACPI 6.0 specification)
but always uses (erroneously) the ACPICA (latest) struct (ie struct
acpi_madt_generic_interrupt - that is 80-bytes long) length to check if
the current GICC entry memory record exceeds the MADT table end in
memory as defined by the MADT table header itself, which may result in
false negatives depending on the ACPI firmware version and how the MADT
entries are laid out in memory (ie on ACPI 5.1 firmware MADT GICC
entries are 76 bytes long, so by adding 80 to a GICC entry start address
in memory the resulting address may well be past the actual MADT end,
triggering a false negative).
Fix the BAD_MADT_GICC_ENTRY() macro by reshuffling the condition checks
and update them to always use the firmware version specific MADT GICC
entry length in order to carry out boundary checks.
Fixes: b6cfb27737 ("ACPI / ARM64: add BAD_MADT_GICC_ENTRY() macro")
Reported-by: Julien Grall <julien.grall@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Al Stone <ahs3@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We are missing a call to hid_hw_stop() on the remove hook.
Among other things this is causing an Oops when (re-)starting GNOME /
upowerd / ... after the module has been already rmmod-ed.
Signed-off-by: Carlo Caione <carlo@endlessm.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
When removing a device with less than 9 IRQs (AMBA_NR_IRQS), we'll get a
big WARN_ON from devres.c because pl330_remove calls devm_free_irqs for
unallocated irqs. Similarly to pl330_probe, check that IRQ number is
present before calling devm_free_irq.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
DP sink specific quirks
* tag 'topic/dp-quirks-2017-05-31' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: Detect USB-C specific dongles before reducing M and N
drm/dp: start a DPCD based DP sink/branch device quirk database
drm/i915: use drm DP helper to read DPCD desc
drm/dp: add helper for reading DP sink/branch device desc from DPCD
Pull nfsd fixes from Bruce Fields:
"Revert patch accidentally included in the merge window pull request,
and fix a crash that was likely a result of buggy client behavior"
* tag 'nfsd-4.12-1' of git://linux-nfs.org/~bfields/linux:
nfsd4: fix null dereference on replay
nfsd: Revert "nfsd: check for oversized NFSv2/v3 arguments"
Pull gcc-plugin prepwork from Kees Cook:
"Use designated initializers for mtk-vcodec, powerplay, amdgpu, and
sgi-xp. Use ERR_CAST() to avoid cross-structure cast in ocf2, ntfs,
and NFS.
Christoph Hellwig recommended that I send these fixes now, rather than
waiting for the v4.13 merge window. These are all initializer and cast
fixes needed for the future randstruct plugin that haven't been picked
up by the respective maintainers"
* tag 'gcc-plugins-v4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
mtk-vcodec: Use designated initializers
drm/amd/powerplay: Use designated initializers
drm/amdgpu: Use designated initializers
sgi-xp: Use designated initializers
ocfs2: Use ERR_CAST() to avoid cross-structure cast
ntfs: Use ERR_CAST() to avoid cross-structure cast
NFS: Use ERR_CAST() to avoid cross-structure cast
Pull KVM fixes from Paolo Bonzini:
"Many small x86 bug fixes: SVM segment registers access rights, nested
VMX, preempt notifiers, LAPIC virtual wire mode, NMI injection"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Fix nmi injection failure when vcpu got blocked
KVM: SVM: do not zero out segment attributes if segment is unusable or not present
KVM: SVM: ignore type when setting segment registers
KVM: nVMX: fix nested_vmx_check_vmptr failure paths under debugging
KVM: x86: Fix virtual wire mode
KVM: nVMX: Fix handling of lmsw instruction
KVM: X86: Fix preempt the preemption timer cancel
Pull Reiserfs and GFS2 fixes from Jan Kara:
"Fixes to GFS2 & Reiserfs for the fallout of the recent WRITE_FUA
cleanup from Christoph.
Fixes for other filesystems were already merged by respective
maintainers."
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
reiserfs: Make flush bios explicitely sync
gfs2: Make flush bios explicitely sync
Pull SCSI target fixes from Nicholas Bellinger:
"Here are the target-pending fixes for v4.12-rc4:
- ibmviscsis ABORT_TASK handling fixes that missed the v4.12 merge
window. (Bryant Ly and Michael Cyr)
- Re-add a target-core check enforcing WRITE overflow reject that was
relaxed in v4.3, to avoid unsupported iscsi-target immediate data
overflow. (nab)
- Fix a target-core-user OOPs during device removal. (MNC + Bryant
Ly)
- Fix a long standing iscsi-target potential issue where kthread exit
did not wait for kthread_should_stop(). (Jiang Yi)
- Fix a iscsi-target v3.12.y regression OOPs involving initial login
PDU processing during asynchronous TCP connection close. (MNC +
nab)
This is a little larger than usual for an -rc4, primarily due to the
iscsi-target v3.12.y regression OOPs bug-fix.
However, it's an important patch as MNC + Hannes where both able to
trigger it using a reduced iscsi initiator login timeout combined with
a backend taking a long time to complete I/Os during iscsi login
driven session reinstatement"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
iscsi-target: Always wait for kthread_should_stop() before kthread exit
iscsi-target: Fix initial login PDU asynchronous socket close OOPs
tcmu: fix crash during device removal
target: Re-add check to reject control WRITEs with overflow data
ibmvscsis: Fix the incorrect req_lim_delta
ibmvscsis: Clear left-over abort_cmd pointers
When spin_lock_irqsave() deadlock occurs inside the guest, vcpu threads,
other than the lock-holding one, would enter into S state because of
pvspinlock. Then inject NMI via libvirt API "inject-nmi", the NMI could
not be injected into vm.
The reason is:
1 It sets nmi_queued to 1 when calling ioctl KVM_NMI in qemu, and sets
cpu->kvm_vcpu_dirty to true in do_inject_external_nmi() meanwhile.
2 It sets nmi_queued to 0 in process_nmi(), before entering guest, because
cpu->kvm_vcpu_dirty is true.
It's not enough just to check nmi_queued to decide whether to stay in
vcpu_block() or not. NMI should be injected immediately at any situation.
Add checking nmi_pending, and testing KVM_REQ_NMI replaces nmi_queued
in vm_vcpu_has_events().
Do the same change for SMIs.
Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is a fix for the problem [1], where VMCB.CPL was set to 0 and interrupt
was taken on userspace stack. The root cause lies in the specific AMD CPU
behaviour which manifests itself as unusable segment attributes on SYSRET.
The corresponding work around for the kernel is the following:
61f01dd941 ("x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue")
In other turn virtualization side treated unusable segment incorrectly and
restored CPL from SS attributes, which were zeroed out few lines above.
In current patch it is assured only that P bit is cleared in VMCB.save state
and segment attributes are not zeroed out if segment is not presented or is
unusable, therefore CPL can be safely restored from DPL field.
This is only one part of the fix, since QEMU side should be fixed accordingly
not to zero out attributes on its side. Corresponding patch will follow.
[1] Message id: CAJrWOzD6Xq==b-zYCDdFLgSRMPM-NkNuTSDFEtX=7MreT45i7Q@mail.gmail.com
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Signed-off-by: Mikhail Sennikovskii <mikhail.sennikovskii@profitbricks.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim KrÄmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The previous commit [63691587f7: ALSA: hda - Apply dual-codec quirk
for MSI Z270-Gaming mobo] attempted to apply the existing dual-codec
quirk for a MSI mobo. But it turned out that this isn't applied
properly due to the MSI-vendor quirk before this entry. I overlooked
such two MSI entries just because they were put in the wrong position,
although we have a list ordered by PCI SSID numbers.
This patch fixes it by rearranging the unordered entries.
Fixes: 63691587f7 ("ALSA: hda - Apply dual-codec quirk for MSI Z270-Gaming mobo")
Reported-by: Rudolf Schmidt <info@rudolfschmidt.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull drm fixes from Dave Airlie:
"This is the main set of fixes for rc4, one amdgpu fix, some exynos
regression fixes, some msm fixes and some i915 and GVT fixes.
I've got a second regression fix for some DP chips that might be a
bit large, but I think we'd like to land it now, I'll send it along
tomorrow, once you are happy with this set"
* tag 'drm-fixes-for-v4.12-rc4' of git://people.freedesktop.org/~airlied/linux: (24 commits)
drm/amdgpu: Program ring for vce instance 1 at its register space
drm/exynos: clean up description of exynos_drm_crtc
drm/exynos: dsi: Remove bridge node reference in removal
drm/exynos: dsi: Fix the parse_dt function
drm/exynos: Merge pre/postclose hooks
drm/msm: Fix the check for the command size
drm/msm: Take the mutex before calling msm_gem_new_impl
drm/msm: for array in-fences, check if all backing fences are from our own context before waiting
drm/msm: constify irq_domain_ops
drm/msm/mdp5: release hwpipe(s) for unused planes
drm/msm: Reuse dma_fence_release.
drm/msm: Expose our reservation object when exporting a dmabuf.
drm/msm/gpu: check legacy clk names in get_clocks()
drm/msm/mdp5: use __drm_atomic_helper_plane_duplicate_state()
drm/msm: select PM_OPP
drm/i915: Stop pretending to mask/unmask LPE audio interrupts
drm/i915/selftests: Silence compiler warning in igt_ctx_exec
Revert "drm/i915: Restore lost "Initialized i915" welcome message"
drm/i915/gvt: clean up unsubmited workloads before destroying kmem cache
drm/i915/gvt: Disable compression workaround for Gen9
...
- Fix a regression to description of exynos_drm_crtc
- Remove preclose hook of Exynos
. This was a exynos change of the patch series[1] merged already.
- Fix one dt broken issue
- Make sure to release bridge_node of Exynos MIPI-DSI driver.
[1] https://lists.freedesktop.org/archives/dri-devel/2017-March/135111.html
* tag 'exynos-drm-fixes-for-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
drm/exynos: clean up description of exynos_drm_crtc
drm/exynos: dsi: Remove bridge node reference in removal
drm/exynos: dsi: Fix the parse_dt function
drm/exynos: Merge pre/postclose hooks
a few fixes for 4.12..
* 'msm-fixes-4.12-rc4' of git://people.freedesktop.org/~robclark/linux:
drm/msm: Fix the check for the command size
drm/msm: Take the mutex before calling msm_gem_new_impl
drm/msm: for array in-fences, check if all backing fences are from our own context before waiting
drm/msm: constify irq_domain_ops
drm/msm/mdp5: release hwpipe(s) for unused planes
drm/msm: Reuse dma_fence_release.
drm/msm: Expose our reservation object when exporting a dmabuf.
drm/msm/gpu: check legacy clk names in get_clocks()
drm/msm/mdp5: use __drm_atomic_helper_plane_duplicate_state()
drm/msm: select PM_OPP
drm/i915 fixes for v4.12-rc4
* tag 'drm-intel-fixes-2017-05-29' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: Stop pretending to mask/unmask LPE audio interrupts
drm/i915/selftests: Silence compiler warning in igt_ctx_exec
Revert "drm/i915: Restore lost "Initialized i915" welcome message"
drm/i915/gvt: clean up unsubmited workloads before destroying kmem cache
drm/i915/gvt: Disable compression workaround for Gen9
drm/i915: set initialised only when init_context callback is NULL
drm/i915: Fix new -Wint-in-bool-context gcc compiler warning
drm/i915: use vma->size for appgtt allocate_va_range
drm/i915: Do not sync RCU during shrinking
There are three timing problems in the kthread usages of iscsi_target_mod:
- np_thread of struct iscsi_np
- rx_thread and tx_thread of struct iscsi_conn
In iscsit_close_connection(), it calls
send_sig(SIGINT, conn->tx_thread, 1);
kthread_stop(conn->tx_thread);
In conn->tx_thread, which is iscsi_target_tx_thread(), when it receive
SIGINT the kthread will exit without checking the return value of
kthread_should_stop().
So if iscsi_target_tx_thread() exit right between send_sig(SIGINT...)
and kthread_stop(...), the kthread_stop() will try to stop an already
stopped kthread.
This is invalid according to the documentation of kthread_stop().
(Fix -ECONNRESET logout handling in iscsi_target_tx_thread and
early iscsi_target_rx_thread failure case - nab)
Signed-off-by: Jiang Yi <jiangyilism@gmail.com>
Cc: <stable@vger.kernel.org> # v3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch fixes a OOPs originally introduced by:
commit bb048357da
Author: Nicholas Bellinger <nab@linux-iscsi.org>
Date: Thu Sep 5 14:54:04 2013 -0700
iscsi-target: Add sk->sk_state_change to cleanup after TCP failure
which would trigger a NULL pointer dereference when a TCP connection
was closed asynchronously via iscsi_target_sk_state_change(), but only
when the initial PDU processing in iscsi_target_do_login() from iscsi_np
process context was blocked waiting for backend I/O to complete.
To address this issue, this patch makes the following changes.
First, it introduces some common helper functions used for checking
socket closing state, checking login_flags, and atomically checking
socket closing state + setting login_flags.
Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP
connection has dropped via iscsi_target_sk_state_change(), but the
initial PDU processing within iscsi_target_do_login() in iscsi_np
context is still running. For this case, it sets LOGIN_FLAGS_CLOSED,
but doesn't invoke schedule_delayed_work().
The original NULL pointer dereference case reported by MNC is now handled
by iscsi_target_do_login() doing a iscsi_target_sk_check_close() before
transitioning to FFP to determine when the socket has already closed,
or iscsi_target_start_negotiation() if the login needs to exchange
more PDUs (eg: iscsi_target_do_login returned 0) but the socket has
closed. For both of these cases, the cleanup up of remaining connection
resources will occur in iscsi_target_start_negotiation() from iscsi_np
process context once the failure is detected.
Finally, to handle to case where iscsi_target_sk_state_change() is
called after the initial PDU procesing is complete, it now invokes
conn->login_work -> iscsi_target_do_login_rx() to perform cleanup once
existing iscsi_target_sk_check_close() checks detect connection failure.
For this case, the cleanup of remaining connection resources will occur
in iscsi_target_do_login_rx() from delayed workqueue process context
once the failure is detected.
Reported-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Tested-by: Mike Christie <mchristi@redhat.com>
Cc: Mike Christie <mchristi@redhat.com>
Reported-by: Hannes Reinecke <hare@suse.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Varun Prakash <varun@chelsio.com>
Cc: <stable@vger.kernel.org> # v3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>