Commit Graph

82340 Commits

Author SHA1 Message Date
Christoffer Dall
2db4c104fa KVM: arm/arm64: Get rid of vgic_cpu->nr_lr
The number of list registers is a property of the underlying system, not
of emulated VGIC CPU interface.

As we are about to move this variable to global state in the new vgic
for clarity, move it from the legacy implementation as well to make the
merge of the new code easier.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
2016-05-20 15:39:41 +02:00
Christoffer Dall
41a54482c0 KVM: arm/arm64: Move timer IRQ map to latest possible time
We are about to modify the VGIC to allocate all data structures
dynamically and store mapped IRQ information on a per-IRQ struct, which
is indeed allocated dynamically at init time.

Therefore, we cannot record the mapped IRQ info from the timer at timer
reset time like it's done now, because VCPU reset happens before timer
init.

A possible later time to do this is on the first run of a per VCPU, it
just requires us to move the enable state to be a per-VCPU state and do
the lookup of the physical IRQ number when we are about to run the VCPU.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
2016-05-20 15:39:41 +02:00
Andre Przywara
c8eb3f6b9b KVM: arm/arm64: vgic: Remove irq_phys_map from interface
Now that the virtual arch timer does not care about the irq_phys_map
anymore, let's rework kvm_vgic_map_phys_irq() to return an error
value instead. Any reference to that mapping can later be done by
passing the correct combination of VCPU and virtual IRQ number.
This makes the irq_phys_map handling completely private to the
VGIC code.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20 15:39:40 +02:00
Andre Przywara
a7e33ad9b2 KVM: arm/arm64: arch_timer: Remove irq_phys_map
Now that the interface between the arch timer and the VGIC does not
require passing the irq_phys_map entry pointer anymore, let's remove
it from the virtual arch timer and use the virtual IRQ number instead
directly.
The remaining pointer returned by kvm_vgic_map_phys_irq() will be
removed in the following patch.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20 15:39:39 +02:00
Christoffer Dall
b452cb5207 KVM: arm/arm64: Remove the IRQ field from struct irq_phys_map
The communication of a Linux IRQ number from outside the VGIC to the
vgic was a leftover from the day when the vgic code cared about how a
particular device injects virtual interrupts mapped to a physical
interrupt.

We can safely remove this notion, leaving all physical IRQ handling to
be done in the device driver (the arch timer in this case), which makes
room for a saner API for the new VGIC.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
2016-05-20 15:39:39 +02:00
Andre Przywara
63306c28ac KVM: arm/arm64: vgic: avoid map in kvm_vgic_unmap_phys_irq()
kvm_vgic_unmap_phys_irq() only needs the virtual IRQ number, so let's
just pass that between the arch timer and the VGIC to get rid of
the irq_phys_map pointer.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20 15:39:38 +02:00
Andre Przywara
e262f41936 KVM: arm/arm64: vgic: avoid map in kvm_vgic_map_is_active()
For getting the active state of a mapped IRQ, we actually only need
the virtual IRQ number, not the pointer to the mapping entry.
Pass the virtual IRQ number from the arch timer to the VGIC directly.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20 15:39:38 +02:00
Andre Przywara
4f551a3d96 KVM: arm/arm64: vgic: avoid map in kvm_vgic_inject_mapped_irq()
When we want to inject a hardware mapped IRQ into a guest, we actually
only need the virtual IRQ number from the irq_phys_map.
So let's pass this number directly from the arch timer to the VGIC
to avoid using the map as a parameter.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20 15:39:37 +02:00
Julien Grall
a53d892dfb clocksource: arm_arch_timer: Remove arch_timer_get_timecounter
The only call of arch_timer_get_timecounter (in KVM) has been removed.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-03 12:54:21 +02:00
Julien Grall
503a62862e KVM: arm/arm64: vgic: Rely on the GIC driver to parse the firmware tables
Currently, the firmware tables are parsed 2 times: once in the GIC
drivers, the other time when initializing the vGIC. It means code
duplication and make more tedious to add the support for another
firmware table (like ACPI).

Use the recently introduced helper gic_get_kvm_info() to get
information about the virtual GIC.

With this change, the virtual GIC becomes agnostic to the firmware
table and KVM will be able to initialize the vGIC on ACPI.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-03 12:54:21 +02:00
Julien Grall
1839e57696 irqchip/gic-v3: Parse and export virtual GIC information
Fill up the recently introduced gic_kvm_info with the hardware
information used for virtualization.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-03 12:54:21 +02:00
Julien Grall
502d6df11a irqchip/gic-v2: Parse and export virtual GIC information
For now, the firmware tables are parsed 2 times: once in the GIC
drivers, the other timer when initializing the vGIC. It means code
duplication and make more tedious to add the support for another
firmware table (like ACPI).

Introduce a new structure and set of helpers to get/set the virtual GIC
information. Also fill up the structure for GICv2.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-03 12:54:21 +02:00
Julien Grall
d9b5e41591 clocksource: arm_arch_timer: Extend arch_timer_kvm_info to get the virtual IRQ
Currently, the firmware table is parsed by the virtual timer code in
order to retrieve the virtual timer interrupt. However, this is already
done by the arch timer driver.

To avoid code duplication, extend arch_timer_kvm_info to get the virtual
IRQ.

Note that the KVM code will be modified in a subsequent patch.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-03 12:54:21 +02:00
Julien Grall
b4d6ce9776 clocksource: arm_arch_timer: Gather KVM specific information in a structure
Introduce a structure which are filled up by the arch timer driver and
used by the virtual timer in KVM.

The first member of this structure will be the timecounter. More members
will be added later.

A stub for the new helper isn't introduced because KVM requires the arch
timer for both ARM64 and ARM32.

The function arch_timer_get_timecounter is kept for the time being and
will be dropped in a subsequent patch.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-03 12:54:20 +02:00
Linus Torvalds
d5a38f6e46 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "There is quite a bit here, including some overdue refactoring and
  cleanup on the mon_client and osd_client code from Ilya, scattered
  writeback support for CephFS and a pile of bug fixes from Zheng, and a
  few random cleanups and fixes from others"

[ I already decided not to pull this because of it having been rebased
  recently, but ended up changing my mind after all.  Next time I'll
  really hold people to it.  Oh well.   - Linus ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (34 commits)
  libceph: use KMEM_CACHE macro
  ceph: use kmem_cache_zalloc
  rbd: use KMEM_CACHE macro
  ceph: use lookup request to revalidate dentry
  ceph: kill ceph_get_dentry_parent_inode()
  ceph: fix security xattr deadlock
  ceph: don't request vxattrs from MDS
  ceph: fix mounting same fs multiple times
  ceph: remove unnecessary NULL check
  ceph: avoid updating directory inode's i_size accidentally
  ceph: fix race during filling readdir cache
  libceph: use sizeof_footer() more
  ceph: kill ceph_empty_snapc
  ceph: fix a wrong comparison
  ceph: replace CURRENT_TIME by current_fs_time()
  ceph: scattered page writeback
  libceph: add helper that duplicates last extent operation
  libceph: enable large, variable-sized OSD requests
  libceph: osdc->req_mempool should be backed by a slab pool
  libceph: make r_request msg_size calculation clearer
  ...
2016-03-26 15:53:16 -07:00
Linus Torvalds
b4cec5f668 Merge tag 'ntb-4.6' of git://github.com/jonmason/ntb
Pull NTB bug fixes from Jon Mason:
 "NTB bug fixes for tasklet from spinning forever, link errors,
  translation window setup, NULL ptr dereference, and ntb-perf errors.

  Also, a modification to the driver API that makes _addr functions
  optional"

* tag 'ntb-4.6' of git://github.com/jonmason/ntb:
  NTB: Remove _addr functions from ntb_hw_amd
  NTB: Make _addr functions optional in the API
  NTB: Fix incorrect clean up routine in ntb_perf
  NTB: Fix incorrect return check in ntb_perf
  ntb: fix possible NULL dereference
  ntb: add missing setup of translation window
  ntb: stop link work when we do not have memory
  ntb: stop tasklet from spinning forever during shutdown.
  ntb: perf test: fix address space confusion
2016-03-26 11:37:42 -07:00
Linus Torvalds
895a1067d5 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
 "The only new stuff which missed the first pull request is an update to
  the UFS driver.

  The rest is an assortment of bug fixes and minor tweaks which appeared
  recently (some are fixes for recent code and some are stuff spotted
  recently by the checkers or the new gcc-6 compiler [most of Arnd's
  stuff])"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (32 commits)
  scsi_common: do not clobber fixed sense information
  scsi: ufs: select CONFIG_NLS
  scsi: fc: use get/put_unaligned64 for wwn access
  fnic: move printk()s outside of the critical code section.
  qla2xxx: avoid maybe_uninitialized warning
  megaraid_sas: add missing curly braces in ioctl handler
  lpfc: fix misleading indentation
  scsi_transport_sas: add 'scsi_target_id' sysfs attribute
  scsi_dh_alua: uninitialized variable in alua_check_vpd()
  scsi: ufs-qcom: add printouts of testbus debug registers
  scsi: ufs-qcom: enable/disable the device ref clock
  scsi: ufs-qcom: set PA_Local_TX_LCC_Enable before link startup
  scsi: ufs: add device quirk delay before putting UFS rails in LPM
  scsi: ufs: fix leakage during link off state
  scsi: ufs: tune UniPro parameters to optimize hibern8 exit time
  scsi: ufs: handle non spec compliant bkops behaviour by device
  scsi: ufs: add retry for query descriptors
  scsi: ufs: add error recovery after DL NAC error
  scsi: ufs: make error handling bit faster
  scsi: ufs: disable vccq if it's not needed by UFS device
  ...
2016-03-26 11:31:01 -07:00
Alexander Potapenko
cd11016e5f mm, kasan: stackdepot implementation. Enable stackdepot for SLAB
Implement the stack depot and provide CONFIG_STACKDEPOT.  Stack depot
will allow KASAN store allocation/deallocation stack traces for memory
chunks.  The stack traces are stored in a hash table and referenced by
handles which reside in the kasan_alloc_meta and kasan_free_meta
structures in the allocated memory chunks.

IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
duplication.

Right now stackdepot support is only enabled in SLAB allocator.  Once
KASAN features in SLAB are on par with those in SLUB we can switch SLUB
to stackdepot as well, thus removing the dependency on SLUB stack
bookkeeping, which wastes a lot of memory.

This patch is based on the "mm: kasan: stack depots" patch originally
prepared by Dmitry Chernenkov.

Joonsoo has said that he plans to reuse the stackdepot code for the
mm/page_owner.c debugging facility.

[akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t]
[aryabinin@virtuozzo.com: comment style fixes]
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Alexander Potapenko
be7635e728 arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections
KASAN needs to know whether the allocation happens in an IRQ handler.
This lets us strip everything below the IRQ entry point to reduce the
number of unique stack traces needed to be stored.

Move the definition of __irq_entry to <linux/interrupt.h> so that the
users don't need to pull in <linux/ftrace.h>.  Also introduce the
__softirq_entry macro which is similar to __irq_entry, but puts the
corresponding functions to the .softirqentry.text section.

Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Alexander Potapenko
505f5dcb1c mm, kasan: add GFP flags to KASAN API
Add GFP flags to KASAN hooks for future patches to use.

This patch is based on the "mm: kasan: unified support for SLUB and SLAB
allocators" patch originally prepared by Dmitry Chernenkov.

Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Alexander Potapenko
7ed2f9e663 mm, kasan: SLAB support
Add KASAN hooks to SLAB allocator.

This patch is based on the "mm: kasan: unified support for SLUB and SLAB
allocators" patch originally prepared by Dmitry Chernenkov.

Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Tetsuo Handa
aaf4fb712b include/linux/oom.h: remove undefined oom_kills_count()/note_oom_kill()
A leftover from commit c32b3cbe0d ("oom, PM: make OOM detection in the
freezer path raceless").

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Tetsuo Handa
bb29902a75 oom, oom_reaper: protect oom_reaper_list using simpler way
"oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task" tried
to protect oom_reaper_list using MMF_OOM_KILLED flag.  But we can do it
by simply checking tsk->oom_reaper_list != NULL.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Vladimir Davydov
29c696e1c6 oom: make oom_reaper_list single linked
Entries are only added/removed from oom_reaper_list at head so we can
use a single linked list and hence save a word in task_struct.

Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Michal Hocko
855b018325 oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task
Tetsuo has reported that oom_kill_allocating_task=1 will cause
oom_reaper_list corruption because oom_kill_process doesn't follow
standard OOM exclusion (aka ignores TIF_MEMDIE) and allows to enqueue
the same task multiple times - e.g.  by sacrificing the same child
multiple times.

This patch fixes the issue by introducing a new MMF_OOM_KILLED mm flag
which is set in oom_kill_process atomically and oom reaper is disabled
if the flag was already set.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Michal Hocko
03049269de mm, oom_reaper: implement OOM victims queuing
wake_oom_reaper has allowed only 1 oom victim to be queued.  The main
reason for that was the simplicity as other solutions would require some
way of queuing.  The current approach is racy and that was deemed
sufficient as the oom_reaper is considered a best effort approach to
help with oom handling when the OOM victim cannot terminate in a
reasonable time.  The race could lead to missing an oom victim which can
get stuck

out_of_memory
  wake_oom_reaper
    cmpxchg // OK
    			oom_reaper
			  oom_reap_task
			    __oom_reap_task
oom_victim terminates
			      atomic_inc_not_zero // fail
out_of_memory
  wake_oom_reaper
    cmpxchg // fails
			  task_to_reap = NULL

This race requires 2 OOM invocations in a short time period which is not
very likely but certainly not impossible.  E.g.  the original victim
might have not released a lot of memory for some reason.

The situation would improve considerably if wake_oom_reaper used a more
robust queuing.  This is what this patch implements.  This means adding
oom_reaper_list list_head into task_struct (eat a hole before embeded
thread_struct for that purpose) and a oom_reaper_lock spinlock for
queuing synchronization.  wake_oom_reaper will then add the task on the
queue and oom_reaper will dequeue it.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Michal Hocko
36324a990c oom: clear TIF_MEMDIE after oom_reaper managed to unmap the address space
When oom_reaper manages to unmap all the eligible vmas there shouldn't
be much of the freable memory held by the oom victim left anymore so it
makes sense to clear the TIF_MEMDIE flag for the victim and allow the
OOM killer to select another task.

The lack of TIF_MEMDIE also means that the victim cannot access memory
reserves anymore but that shouldn't be a problem because it would get
the access again if it needs to allocate and hits the OOM killer again
due to the fatal_signal_pending resp.  PF_EXITING check.  We can safely
hide the task from the OOM killer because it is clearly not a good
candidate anymore as everyhing reclaimable has been torn down already.

This patch will allow to cap the time an OOM victim can keep TIF_MEMDIE
and thus hold off further global OOM killer actions granted the oom
reaper is able to take mmap_sem for the associated mm struct.  This is
not guaranteed now but further steps should make sure that mmap_sem for
write should be blocked killable which will help to reduce such a lock
contention.  This is not done by this patch.

Note that exit_oom_victim might be called on a remote task from
__oom_reap_task now so we have to check and clear the flag atomically
otherwise we might race and underflow oom_victims or wake up waiters too
early.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Michal Hocko
aac4536355 mm, oom: introduce oom reaper
This patch (of 5):

This is based on the idea from Mel Gorman discussed during LSFMM 2015
and independently brought up by Oleg Nesterov.

The OOM killer currently allows to kill only a single task in a good
hope that the task will terminate in a reasonable time and frees up its
memory.  Such a task (oom victim) will get an access to memory reserves
via mark_oom_victim to allow a forward progress should there be a need
for additional memory during exit path.

It has been shown (e.g.  by Tetsuo Handa) that it is not that hard to
construct workloads which break the core assumption mentioned above and
the OOM victim might take unbounded amount of time to exit because it
might be blocked in the uninterruptible state waiting for an event (e.g.
lock) which is blocked by another task looping in the page allocator.

This patch reduces the probability of such a lockup by introducing a
specialized kernel thread (oom_reaper) which tries to reclaim additional
memory by preemptively reaping the anonymous or swapped out memory owned
by the oom victim under an assumption that such a memory won't be needed
when its owner is killed and kicked from the userspace anyway.  There is
one notable exception to this, though, if the OOM victim was in the
process of coredumping the result would be incomplete.  This is
considered a reasonable constrain because the overall system health is
more important than debugability of a particular application.

A kernel thread has been chosen because we need a reliable way of
invocation so workqueue context is not appropriate because all the
workers might be busy (e.g.  allocating memory).  Kswapd which sounds
like another good fit is not appropriate as well because it might get
blocked on locks during reclaim as well.

oom_reaper has to take mmap_sem on the target task for reading so the
solution is not 100% because the semaphore might be held or blocked for
write but the probability is reduced considerably wrt.  basically any
lock blocking forward progress as described above.  In order to prevent
from blocking on the lock without any forward progress we are using only
a trylock and retry 10 times with a short sleep in between.  Users of
mmap_sem which need it for write should be carefully reviewed to use
_killable waiting as much as possible and reduce allocations requests
done with the lock held to absolute minimum to reduce the risk even
further.

The API between oom killer and oom reaper is quite trivial.
wake_oom_reaper updates mm_to_reap with cmpxchg to guarantee only
NULL->mm transition and oom_reaper clear this atomically once it is done
with the work.  This means that only a single mm_struct can be reaped at
the time.  As the operation is potentially disruptive we are trying to
limit it to the ncessary minimum and the reaper blocks any updates while
it operates on an mm.  mm_struct is pinned by mm_count to allow parallel
exit_mmap and a race is detected by atomic_inc_not_zero(mm_users).

Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Mel Gorman <mgorman@suse.de>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Andrew Morton
69b27baf00 sched: add schedule_timeout_idle()
This will be needed in the patch "mm, oom: introduce oom reaper".

Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Yan, Zheng
315f240880 ceph: fix security xattr deadlock
When security is enabled, security module can call filesystem's
getxattr/setxattr callbacks during d_instantiate(). For cephfs,
d_instantiate() is usually called by MDS' dispatch thread, while
handling MDS reply. If the MDS reply does not include xattrs and
corresponding caps, getxattr/setxattr need to send a new request
to MDS and waits for the reply. This makes MDS' dispatch sleep,
nobody handles later MDS replies.

The fix is make sure lookup/atomic_open reply include xattrs and
corresponding caps. So getxattr can be handled by cached xattrs.
This requires some modification to both MDS and request message.
(Client tells MDS what caps it wants; MDS encodes proper caps in
the reply)

Smack security module may call setxattr during d_instantiate().
Unlike getxattr, we can't force MDS to issue CEPH_CAP_XATTR_EXCL
to us. So just make setxattr return error when called by MDS'
dispatch thread.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:55 +01:00
Yan, Zheng
2c63f49a72 libceph: add helper that duplicates last extent operation
This helper duplicates last extent operation in OSD request, then
adjusts the new extent operation's offset and length. The helper
is for scatterd page writeback, which adds nonconsecutive dirty
pages to single OSD request.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:43 +01:00
Ilya Dryomov
3f1af42ad0 libceph: enable large, variable-sized OSD requests
Turn r_ops into a flexible array member to enable large, consisting of
up to 16 ops, OSD requests.  The use case is scattered writeback in
cephfs and, as far as the kernel client is concerned, 16 is just a made
up number.

r_ops had size 3 for copyup+hint+write, but copyup is really a special
case - it can only happen once.  ceph_osd_request_cache is therefore
stuffed with num_ops=2 requests, anything bigger than that is allocated
with kmalloc().  req_mempool is backed by ceph_osd_request_cache, which
means either num_ops=1 or num_ops=2 for use_mempool=true - all existing
users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with
that.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:43 +01:00
Yan, Zheng
7665d85b73 libceph: move r_reply_op_{len,result} into struct ceph_osd_req_op
This avoids defining large array of r_reply_op_{len,result} in
in struct ceph_osd_request.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:42 +01:00
Ilya Dryomov
de2aa102ea libceph: rename ceph_osd_req_op::payload_len to indata_len
Follow userspace nomenclature on this - the next commit adds
outdata_len.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:41 +01:00
Ilya Dryomov
168b9090c7 libceph: monc hunt rate is 3s with backoff up to 30s
Unless we are in the process of setting up a client (i.e. connecting to
the monitor cluster for the first time), apply a backoff: every time we
want to reopen a session, increase our timeout by a multiple (currently
2); when we complete the connection, reduce that multipler by 50%.

Mirrors ceph.git commit 794c86fd289bd62a35ed14368fa096c46736e9a2.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:39 +01:00
Ilya Dryomov
58d81b1294 libceph: monc ping rate is 10s
Split ping interval and ping timeout: ping interval is 10s; keepalive
timeout is 30s.

Make monc_ping_timeout a constant while at it - it's not actually
exported as a mount option (and the rest of tick-related settings won't
be either), so it's got no place in ceph_options.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:39 +01:00
Ilya Dryomov
82dcabad75 libceph: revamp subs code, switch to SUBSCRIBE2 protocol
It is currently hard-coded in the mon_client that mdsmap and monmap
subs are continuous, while osdmap sub is always "onetime".  To better
handle full clusters/pools in the osd_client, we need to be able to
issue continuous osdmap subs.  Revamp subs code to allow us to specify
for each sub whether it should be continuous or not.

Although not strictly required for the above, switch to SUBSCRIBE2
protocol while at it, eliminating the ambiguity between a request for
"every map since X" and a request for "just the latest" when we don't
have a map yet (i.e. have epoch 0).  SUBSCRIBE2 feature bit is now
required - it's been supported since pre-argonaut (2010).

Move "got mdsmap" call to the end of ceph_mdsc_handle_map() - calling
in before we validate the epoch and successfully install the new map
can mess up mon_client sub state.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:38 +01:00
Linus Torvalds
f98c2135f8 Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "Just a couple of dma-buf related fixes and some amdgpu fixes, along
  with a regression fix for radeon off but default feature, but makes my
  30" monitor happy again"

* 'drm-next' of git://people.freedesktop.org/~airlied/linux:
  drm/radeon/mst: cleanup code indentation
  drm/radeon/mst: fix regression in lane/link handling.
  drm/amdgpu: add invalidate_page callback for userptrs
  drm/amdgpu: Revert "remove the userptr rmn->lock"
  drm/amdgpu: clean up path handling for powerplay
  drm/amd/powerplay: fix memory leak of tdp_table
  dma-buf/fence: fix fence_is_later v2
  dma-buf: Update docs for SYNC ioctl
  drm: remove excess description
  dma-buf, drm, ion: Propagate error code from dma_buf_start_cpu_access()
  drm/atmel-hlcdc: use helper to get crtc state
  drm/atomic: use helper to get crtc state
2016-03-25 08:48:31 -07:00
Linus Torvalds
11caf57f6a Merge tag 'asm-generic-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
 "There are only three patches this time, most other changes to files in
  include/asm-generic tend to go through the tree of whoever depends on
  the change.

  Two patches are cleanups for stuff that is no longer needed, the main
  change is to adapt the generic version of BUG_ON() for CONFIG_BUG=n to
  make it behave consistently with BUG().

  This avoids undefined behavior along with a number of warnings about
  that undefined behavior in randconfig builds when we keep going on
  after hitting a BUG_ON()"

* tag 'asm-generic-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic: remove old nonatomic-io wrapper files
  asm-generic: default BUG_ON(x) to if(x)BUG()
  asm-generic: page.h: Remove useless get_user_page and free_user_page
2016-03-24 23:13:48 -07:00
Linus Torvalds
3d66c6ba3f Merge tag 'pm+acpi-4.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management and ACPI updates from Rafael Wysocki:
 "The second batch of power management and ACPI updates for v4.6.

  Included are fixups on top of the previous PM/ACPI pull request and
  other material that didn't make into it but still should go into 4.6.

  Among other things, there's a fix for an intel_pstate driver issue
  uncovered by recent cpufreq changes, a workaround for a boot hang on
  Skylake-H related to the handling of deep C-states by the platform and
  a PCI/ACPI fix for the handling of IO port resources on non-x86
  architectures plus some new device IDs and similar.

  Specifics:

   - Fix for an intel_pstate driver issue related to the handling of MSR
     updates uncovered by the recent cpufreq rework (Rafael Wysocki).

   - cpufreq core cleanups related to starting governors and frequency
     synchronization during resume from system suspend and a locking fix
     for cpufreq_quick_get() (Rafael Wysocki, Richard Cochran).

   - acpi-cpufreq and powernv cpufreq driver updates (Jisheng Zhang,
     Michael Neuling, Richard Cochran, Shilpasri Bhat).

   - intel_idle driver update preventing some Skylake-H systems from
     hanging during initialization by disabling deep C-states mishandled
     by the platform in the problematic configurations (Len Brown).

   - Intel Xeon Phi Processor x200 support for intel_idle
     (Dasaratharaman Chandramouli).

   - cpuidle menu governor updates to make it always honor PM QoS
     latency constraints (and prevent C1 from being used as the fallback
     C-state on x86 when they are set below its exit latency) and to
     restore the previous behavior to fall back to C1 if the next timer
     event is set far enough in the future that was changed in 4.4 which
     led to an energy consumption regression (Rik van Riel, Rafael
     Wysocki).

   - New device ID for a future AMD UART controller in the ACPI driver
     for AMD SoCs (Wang Hongcheng).

   - Rockchip rk3399 support for the rockchip-io-domain adaptive voltage
     scaling (AVS) driver (David Wu).

   - ACPI PCI resources management fix for the handling of IO space
     resources on architectures where the IO space is memory mapped
     (IA64 and ARM64) broken by the introduction of common ACPI
     resources parsing for PCI host bridges in 4.4 (Lorenzo Pieralisi).

   - Fix for the ACPI backend of the generic device properties API to
     make it parse non-device (data node only) children of an ACPI
     device correctly (Irina Tirdea).

   - Fixes for the handling of global suspend flags (introduced in 4.4)
     during hibernation and resume from it (Lukas Wunner).

   - Support for obtaining configuration information from Device Trees
     in the PM clocks framework (Jon Hunter).

   - ACPI _DSM helper code and devfreq framework cleanups (Colin Ian
     King, Geert Uytterhoeven)"

* tag 'pm+acpi-4.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
  PM / AVS: rockchip-io: add io selectors and supplies for rk3399
  intel_idle: Support for Intel Xeon Phi Processor x200 Product Family
  intel_idle: prevent SKL-H boot failure when C8+C9+C10 enabled
  ACPI / PM: Runtime resume devices when waking from hibernate
  PM / sleep: Clear pm_suspend_global_flags upon hibernate
  cpufreq: governor: Always schedule work on the CPU running update
  cpufreq: Always update current frequency before startig governor
  cpufreq: Introduce cpufreq_update_current_freq()
  cpufreq: Introduce cpufreq_start_governor()
  cpufreq: powernv: Add sysfs attributes to show throttle stats
  cpufreq: acpi-cpufreq: make Intel/AMD MSR access, io port access static
  PCI: ACPI: IA64: fix IO port generic range check
  ACPI / util: cast data to u64 before shifting to fix sign extension
  cpufreq: powernv: Define per_cpu chip pointer to optimize hot-path
  cpuidle: menu: Fall back to polling if next timer event is near
  cpufreq: acpi-cpufreq: Clean up hot plug notifier callback
  intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts
  cpufreq: Make cpufreq_quick_get() safe to call
  ACPI / property: fix data node parsing in acpi_get_next_subnode()
  ACPI / APD: Add device HID for future AMD UART controller
  ...
2016-03-24 22:59:58 -07:00
Linus Torvalds
1d02369dba Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Final round of fixes for this merge window - some of this has come up
  after the initial pull request, and some of it was put in a post-merge
  branch before the merge window.

  This contains:

   - Fix for a bad check for an error on dma mapping in the mtip32xx
     driver, from Alexey Khoroshilov.

   - A set of fixes for lightnvm, from Javier, Matias, and Wenwei.

   - An NVMe completion record corruption fix from Marta, ensuring that
     we read things in the right order.

   - Two writeback fixes from Tejun, marked for stable@ as well.

   - A blk-mq sw queue iterator fix from Thomas, fixing an oops for
     sparse CPU maps.  They hit this in the hot plug/unplug rework"

* 'for-linus' of git://git.kernel.dk/linux-block:
  nvme: avoid cqe corruption when update at the same time as read
  writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inode
  writeback, cgroup: fix premature wb_put() in locked_inode_to_wb_and_lock_list()
  blk-mq: Use proper cpumask iterator
  mtip32xx: fix checks for dma mapping errors
  lightnvm: do not load L2P table if not supported
  lightnvm: do not reserve lun on l2p loading
  nvme: lightnvm: return ppa completion status
  lightnvm: add a bitmap of luns
  lightnvm: specify target's logical address area
  null_blk: add lightnvm null_blk device to the nullb_list
2016-03-24 20:00:44 -07:00
Linus Torvalds
8f40842e42 Merge tag 'for-linus-20160324' of git://git.infradead.org/linux-mtd
Pull MTD updates from Brian Norris:
 "NAND:
   - Add sunxi_nand randomizer support
   - begin refactoring NAND ecclayout structs
   - fix pxa3xx_nand dmaengine usage
   - brcmnand: fix support for v7.1 controller
   - add Qualcomm NAND controller driver

  SPI NOR:
   - add new ls1021a, ls2080a support to Freescale QuadSPI
   - add new flash ID entries
   - support bottom-block protection for Winbond flash
   - support Status Register Write Protect
   - remove broken QPI support for Micron SPI flash

  JFFS2:
   - improve post-mount CRC scan efficiency

  General:
   - refactor bcm63xxpart parser, to later extend for NAND
   - add writebuf size parameter to mtdram

  Other minor code quality improvements"

* tag 'for-linus-20160324' of git://git.infradead.org/linux-mtd: (72 commits)
  mtd: nand: remove kerneldoc for removed function parameter
  mtd: nand: Qualcomm NAND controller driver
  dt/bindings: qcom_nandc: Add DT bindings
  mtd: nand: don't select chip in nand_chip's block_bad op
  mtd: spi-nor: support lock/unlock for a few Winbond chips
  mtd: spi-nor: add TB (Top/Bottom) protect support
  mtd: spi-nor: add SPI_NOR_HAS_LOCK flag
  mtd: spi-nor: use BIT() for flash_info flags
  mtd: spi-nor: disallow further writes to SR if WP# is low
  mtd: spi-nor: make lock/unlock bounds checks more obvious and robust
  mtd: spi-nor: silently drop lock/unlock for already locked/unlocked region
  mtd: spi-nor: wait for SR_WIP to clear on initial unlock
  mtd: nand: simplify nand_bch_init() usage
  mtd: mtdswap: remove useless if (!mtd->ecclayout) test
  mtd: create an mtd_oobavail() helper and make use of it
  mtd: kill the ecclayout->oobavail field
  mtd: nand: check status before reporting timeout
  mtd: bcm63xxpart: give width specifier an 'int', not 'size_t'
  mtd: mtdram: Add parameter for setting writebuf size
  mtd: nand: pxa3xx_nand: kill unused field 'drcmr_cmd'
  ...
2016-03-24 19:57:15 -07:00
Linus Torvalds
8b306a2e7c Merge tag 'nfsd-4.6-1' of git://linux-nfs.org/~bfields/linux
Pull more nfsd updates from Bruce Fields:
 "Apologies for the previous request, which omitted the top 8 commits
  from my for-next branch (including the SCSI layout commits).  Thanks
  to Trond for spotting my error!"

This actually includes the new layout types, so here's that part of
the pull message repeated:

 "Support for a new pnfs layout type from Christoph Hellwig.  The new
  layout type is a variant of the block layout which uses SCSI features
  to offer improved fencing and device identification.

  Note this pull request also includes the client side of SCSI layout,
  with Trond's permission"

* tag 'nfsd-4.6-1' of git://linux-nfs.org/~bfields/linux:
  nfsd: use short read as well as i_size to set eof
  nfsd: better layoutupdate bounds-checking
  nfsd: block and scsi layout drivers need to depend on CONFIG_BLOCK
  nfsd: add SCSI layout support
  nfsd: move some blocklayout code
  nfsd: add a new config option for the block layout driver
  nfs/blocklayout: add SCSI layout support
  nfs4.h: add SCSI layout definitions
2016-03-24 19:50:32 -07:00
Linus Torvalds
2162b80fca Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild updates from Michal Marek:

 - make dtbs_install fix

 - Error handling fix fixdep and link-vmlinux.sh

 - __UNIQUE_ID fix for clang

 - Fix for if_changed_* to suppress the "is up to date." message

 - The kernel is built with -Werror=incompatible-pointer-types

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kbuild: Add option to turn incompatible pointer check into error
  kbuild: suppress annoying "... is up to date." message
  kbuild: fixdep: Check fstat(2) return value
  scripts/link-vmlinux.sh: force error on kallsyms failure
  Kbuild: provide a __UNIQUE_ID for clang
  dtbsinstall: don't move target directory out of the way
2016-03-24 19:26:47 -07:00
Linus Torvalds
976fb3f7b9 Merge branch 'parisc-4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
 "This patchset adds stack usage debug info for parisc and metag (on
  both the stack grows upwards), switches to the new generic realative
  extable search and sort routines, drops the long time ago removed
  syscalls alloc_hugepages and free_hugepages and wires up the new
  preadv2 and pwritev2 syscalls"

* 'parisc-4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Wire up preadv2 and pwritev2 syscalls
  parisc: Use generic extable search and sort routines
  parisc: Panic immediately when panic_on_oops
  parisc,metag: Implement CONFIG_DEBUG_STACK_USAGE option
  parisc: Drop alloc_hugepages and free_hugepages syscalls
2016-03-24 19:16:20 -07:00
Rafael J. Wysocki
3513ac743d Merge branches 'pm-avs', 'pm-clk', 'pm-devfreq' and 'pm-sleep'
* pm-avs:
  PM / AVS: rockchip-io: add io selectors and supplies for rk3399

* pm-clk:
  PM / clk: Add support for obtaining clocks from device-tree

* pm-devfreq:
  PM / devfreq: Spelling s/frequnecy/frequency/

* pm-sleep:
  ACPI / PM: Runtime resume devices when waking from hibernate
  PM / sleep: Clear pm_suspend_global_flags upon hibernate
2016-03-25 00:58:18 +01:00
Linus Torvalds
e46b4e2b46 Merge tag 'trace-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
 "Nothing major this round.  Mostly small clean ups and fixes.

  Some visible changes:

   - A new flag was added to distinguish traces done in NMI context.

   - Preempt tracer now shows functions where preemption is disabled but
     interrupts are still enabled.

  Other notes:

   - Updates were done to function tracing to allow better performance
     with perf.

   - Infrastructure code has been added to allow for a new histogram
     feature for recording live trace event histograms that can be
     configured by simple user commands.  The feature itself was just
     finished, but needs a round in linux-next before being pulled.

     This only includes some infrastructure changes that will be needed"

* tag 'trace-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (22 commits)
  tracing: Record and show NMI state
  tracing: Fix trace_printk() to print when not using bprintk()
  tracing: Remove redundant reset per-CPU buff in irqsoff tracer
  x86: ftrace: Fix the misleading comment for arch/x86/kernel/ftrace.c
  tracing: Fix crash from reading trace_pipe with sendfile
  tracing: Have preempt(irqs)off trace preempt disabled functions
  tracing: Fix return while holding a lock in register_tracer()
  ftrace: Use kasprintf() in ftrace_profile_tracefs()
  ftrace: Update dynamic ftrace calls only if necessary
  ftrace: Make ftrace_hash_rec_enable return update bool
  tracing: Fix typoes in code comment and printk in trace_nop.c
  tracing, writeback: Replace cgroup path to cgroup ino
  tracing: Use flags instead of bool in trigger structure
  tracing: Add an unreg_all() callback to trigger commands
  tracing: Add needs_rec flag to event triggers
  tracing: Add a per-event-trigger 'paused' field
  tracing: Add get_syscall_name()
  tracing: Add event record param to trigger_ops.func()
  tracing: Make event trigger functions available
  tracing: Make ftrace_event_field checking functions available
  ...
2016-03-24 10:52:25 -07:00
Linus Torvalds
faea72dd0f Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal updates from Zhang Rui:

 - Fix a regression where bogus trip points on some Lenovo laptops start
   to screw up thermal control after commit 81ad4276b5 ("Thermal:
   initialize thermal zone device correctly").

   On these Lenovo laptops, a bogus passive trip point is reported,
   which is 0 degree Celsius.  Without commit 81ad4276b5, thermal zone
   fails to set cooling devices to proper cooling state, which is a bug.
   But with commit 81ad4276b5 applied, the processors are always
   throttled on these Lenovo laptops because the current temperature is
   always higher than the passive trip point.

   Fix things to ignore such bogus trip points.  (Zhang Rui)

 - Introduce Mediatek thermal driver.  (Sascha Hauer)

 - Introduce devm_ versions of OF thermal sensor register API.  (Laxman
   Dewangan)

 - Changes in Kconfigs to allow compile test on UM arch.  (Krzysztof
   Kozlowski)

 - Introduce Skylake support in intel_pch_thermal driver.  (Srinivas
   Pandruvada)

 - Several small fixes on Rockchip, TI-SoC, Tegra, RCar, and Exynos
   thermal drivers.

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (26 commits)
  Thermal: Ignore invalid trip points
  thermal: trace: migrating thermal traces to use TRACE_DEFINE_ENUM() macros
  thermal: intel_pch_thermal: Enable Skylake PCH thermal
  thermal: doc: Add details of devm_thermal_zone_of_sensor_{register,unregister}
  thermal: of-thermal: Add devm version of thermal_zone_of_sensor_register
  thermal: doc: Add details of thermal_zone_of_sensor_{register,unregister}
  thermal: exynos: Defer probe if vtmu is present but not registered
  thermal: exynos: Use devm_regulator_get_optional() for vtmu
  thermal: exynos: List vtmu-supply as optional property in DT binding
  thermal: exynos: Print a message about exceeded number of supported trip-points
  thermal: exynos: Document number of supported trip-points
  thermal: exynos: Document compatible for Exynos5433 TMU
  thermal: mtk: allow compile testing on UM
  thermal: tegra_soctherm: fix sign bit of temperature
  thermal: Fix build error of missing devm_ioremap_resource on UM
  thermal: ti-soc-thermal: clean up the error handling a bit
  thermal: rcar: Use ARCH_RENESAS
  thermal: rcar_thermal: don't open code of_device_get_match_data()
  thermal: db8500_cpufreq_cooling: Compile with COMPILE_TEST
  thermal: rockchip: fix the tsadc sequence output on rk3228/rk3399
  ...
2016-03-24 10:45:59 -07:00
Linus Torvalds
5b1e167d8d Merge tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "Various bugfixes, a RDMA update from Chuck Lever, and support for a
  new pnfs layout type from Christoph Hellwig.  The new layout type is a
  variant of the block layout which uses SCSI features to offer improved
  fencing and device identification.

  (Also: note this pull request also includes the client side of SCSI
  layout, with Trond's permission.)"

* tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux:
  sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race
  nfsd: recover: fix memory leak
  nfsd: fix deadlock secinfo+readdir compound
  nfsd4: resfh unused in nfsd4_secinfo
  svcrdma: Use new CQ API for RPC-over-RDMA server send CQs
  svcrdma: Use new CQ API for RPC-over-RDMA server receive CQs
  svcrdma: Remove close_out exit path
  svcrdma: Hook up the logic to return ERR_CHUNK
  svcrdma: Use correct XID in error replies
  svcrdma: Make RDMA_ERROR messages work
  rpcrdma: Add RPCRDMA_HDRLEN_ERR
  svcrdma: svc_rdma_post_recv() should close connection on error
  svcrdma: Close connection when a send error occurs
  nfsd: Lower NFSv4.1 callback message size limit
  svcrdma: Do not send Write chunk XDR pad with inline content
  svcrdma: Do not write xdr_buf::tail in a Write chunk
  svcrdma: Find client-provided write and reply chunks once per reply
  nfsd: Update NFS server comments related to RDMA support
  nfsd: Fix a memory leak when meeting unsupported state_protect_how4
  nfsd4: fix bad bounds checking
2016-03-24 10:41:00 -07:00
Linus Torvalds
3fa2fe2ce0 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "This tree contains various perf fixes on the kernel side, plus three
  hw/event-enablement late additions:

   - Intel Memory Bandwidth Monitoring events and handling
   - the AMD Accumulated Power Mechanism reporting facility
   - more IOMMU events

  ... and a final round of perf tooling updates/fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  perf llvm: Use strerror_r instead of the thread unsafe strerror one
  perf llvm: Use realpath to canonicalize paths
  perf tools: Unexport some methods unused outside strbuf.c
  perf probe: No need to use formatting strbuf method
  perf help: Use asprintf instead of adhoc equivalents
  perf tools: Remove unused perf_pathdup, xstrdup functions
  perf tools: Do not include stringify.h from the kernel sources
  tools include: Copy linux/stringify.h from the kernel
  tools lib traceevent: Remove redundant CPU output
  perf tools: Remove needless 'extern' from function prototypes
  perf tools: Simplify die() mechanism
  perf tools: Remove unused DIE_IF macro
  perf script: Remove lots of unused arguments
  perf thread: Rename perf_event__preprocess_sample_addr to thread__resolve
  perf machine: Rename perf_event__preprocess_sample to machine__resolve
  perf tools: Add cpumode to struct perf_sample
  perf tests: Forward the perf_sample in the dwarf unwind test
  perf tools: Remove misplaced __maybe_unused
  perf list: Fix documentation of :ppp
  perf bench numa: Fix assertion for nodes bitfield
  ...
2016-03-24 10:02:14 -07:00