Commit Graph

702800 Commits

Author SHA1 Message Date
Jérôme Glisse
133ff0eac9 mm/hmm: heterogeneous memory management (HMM for short)
HMM provides 3 separate types of functionality:
    - Mirroring: synchronize CPU page table and device page table
    - Device memory: allocating struct page for device memory
    - Migration: migrating regular memory to device memory

This patch introduces some common helpers and definitions to all of
those 3 functionality.

Link: http://lkml.kernel.org/r/20170817000548.32038-3-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Evgeny Baskakov <ebaskakov@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:45 -07:00
Jérôme Glisse
bffc33ec53 hmm: heterogeneous memory management documentation
Patch series "HMM (Heterogeneous Memory Management)", v25.

Heterogeneous Memory Management (HMM) (description and justification)

Today device driver expose dedicated memory allocation API through their
device file, often relying on a combination of IOCTL and mmap calls.
The device can only access and use memory allocated through this API.
This effectively split the program address space into object allocated
for the device and useable by the device and other regular memory
(malloc, mmap of a file, share memory, â) only accessible by
CPU (or in a very limited way by a device by pinning memory).

Allowing different isolated component of a program to use a device thus
require duplication of the input data structure using device memory
allocator.  This is reasonable for simple data structure (array, grid,
image, â) but this get extremely complex with advance data
structure (list, tree, graph, â) that rely on a web of memory
pointers.  This is becoming a serious limitation on the kind of work
load that can be offloaded to device like GPU.

New industry standard like C++, OpenCL or CUDA are pushing to remove
this barrier.  This require a shared address space between GPU device
and CPU so that GPU can access any memory of a process (while still
obeying memory protection like read only).  This kind of feature is also
appearing in various other operating systems.

HMM is a set of helpers to facilitate several aspects of address space
sharing and device memory management.  Unlike existing sharing mechanism
that rely on pining pages use by a device, HMM relies on mmu_notifier to
propagate CPU page table update to device page table.

Duplicating CPU page table is only one aspect necessary for efficiently
using device like GPU.  GPU local memory have bandwidth in the TeraBytes/
second range but they are connected to main memory through a system bus
like PCIE that is limited to 32GigaBytes/second (PCIE 4.0 16x).  Thus it
is necessary to allow migration of process memory from main system memory
to device memory.  Issue is that on platform that only have PCIE the
device memory is not accessible by the CPU with the same properties as
main memory (cache coherency, atomic operations, ...).

To allow migration from main memory to device memory HMM provides a set of
helper to hotplug device memory as a new type of ZONE_DEVICE memory which
is un-addressable by CPU but still has struct page representing it.  This
allow most of the core kernel logic that deals with a process memory to
stay oblivious of the peculiarity of device memory.

When page backing an address of a process is migrated to device memory the
CPU page table entry is set to a new specific swap entry.  CPU access to
such address triggers a migration back to system memory, just like if the
page was swap on disk.  HMM also blocks any one from pinning a ZONE_DEVICE
page so that it can always be migrated back to system memory if CPU access
it.  Conversely HMM does not migrate to device memory any page that is pin
in system memory.

To allow efficient migration between device memory and main memory a new
migrate_vma() helpers is added with this patchset.  It allows to leverage
device DMA engine to perform the copy operation.

This feature will be use by upstream driver like nouveau mlx5 and probably
other in the future (amdgpu is next suspect in line).  We are actively
working on nouveau and mlx5 support.  To test this patchset we also worked
with NVidia close source driver team, they have more resources than us to
test this kind of infrastructure and also a bigger and better userspace
eco-system with various real industry workload they can be use to test and
profile HMM.

The expected workload is a program builds a data set on the CPU (from
disk, from network, from sensors, â).  Program uses GPU API (OpenCL,
CUDA, ...) to give hint on memory placement for the input data and also
for the output buffer.  Program call GPU API to schedule a GPU job, this
happens using device driver specific ioctl.  All this is hidden from
programmer point of view in case of C++ compiler that transparently
offload some part of a program to GPU.  Program can keep doing other stuff
on the CPU while the GPU is crunching numbers.

It is expected that CPU will not access the same data set as the GPU while
GPU is working on it, but this is not mandatory.  In fact we expect some
small memory object to be actively access by both GPU and CPU concurrently
as synchronization channel and/or for monitoring purposes.  Such object
will stay in system memory and should not be bottlenecked by system bus
bandwidth (rare write and read access from both CPU and GPU).

As we are relying on device driver API, HMM does not introduce any new
syscall nor does it modify any existing ones.  It does not change any
POSIX semantics or behaviors.  For instance the child after a fork of a
process that is using HMM will not be impacted in anyway, nor is there any
data hazard between child COW or parent COW of memory that was migrated to
device prior to fork.

HMM assume a numbers of hardware features.  Device must allow device page
table to be updated at any time (ie device job must be preemptable).
Device page table must provides memory protection such as read only.
Device must track write access (dirty bit).  Device must have a minimum
granularity that match PAGE_SIZE (ie 4k).

Reviewer (just hint):
Patch 1  HMM documentation
Patch 2  introduce core infrastructure and definition of HMM, pretty
         small patch and easy to review
Patch 3  introduce the mirror functionality of HMM, it relies on
         mmu_notifier and thus someone familiar with that part would be
         in better position to review
Patch 4  is an helper to snapshot CPU page table while synchronizing with
         concurrent page table update. Understanding mmu_notifier makes
         review easier.
Patch 5  is mostly a wrapper around handle_mm_fault()
Patch 6  add new add_pages() helper to avoid modifying each arch memory
         hot plug function
Patch 7  add a new memory type for ZONE_DEVICE and also add all the logic
         in various core mm to support this new type. Dan Williams and
         any core mm contributor are best people to review each half of
         this patchset
Patch 8  special case HMM ZONE_DEVICE pages inside put_page() Kirill and
         Dan Williams are best person to review this
Patch 9  allow to uncharge a page from memory group without using the lru
         list field of struct page (best reviewer: Johannes Weiner or
         Vladimir Davydov or Michal Hocko)
Patch 10 Add support to uncharge ZONE_DEVICE page from a memory cgroup (best
         reviewer: Johannes Weiner or Vladimir Davydov or Michal Hocko)
Patch 11 add helper to hotplug un-addressable device memory as new type
         of ZONE_DEVICE memory (new type introducted in patch 3 of this
         serie). This is boiler plate code around memory hotplug and it
         also pick a free range of physical address for the device memory.
         Note that the physical address do not point to anything (at least
         as far as the kernel knows).
Patch 12 introduce a new hmm_device class as an helper for device driver
         that want to expose multiple device memory under a common fake
         device driver. This is usefull for multi-gpu configuration.
         Anyone familiar with device driver infrastructure can review
         this. Boiler plate code really.
Patch 13 add a new migrate mode. Any one familiar with page migration is
         welcome to review.
Patch 14 introduce a new migration helper (migrate_vma()) that allow to
         migrate a range of virtual address of a process using device DMA
         engine to perform the copy. It is not limited to do copy from and
         to device but can also do copy between any kind of source and
         destination memory. Again anyone familiar with migration code
         should be able to verify the logic.
Patch 15 optimize the new migrate_vma() by unmapping pages while we are
         collecting them. This can be review by any mm folks.
Patch 16 add unaddressable memory migration to helper introduced in patch
         7, this can be review by anyone familiar with migration code
Patch 17 add a feature that allow device to allocate non-present page on
         the GPU when migrating a range of address to device memory. This
         is an helper for device driver to avoid having to first allocate
         system memory before migration to device memory
Patch 18 add a new kind of ZONE_DEVICE memory for cache coherent device
         memory (CDM)
Patch 19 add an helper to hotplug CDM memory

Previous patchset posting :
v1 http://lwn.net/Articles/597289/
v2 https://lkml.org/lkml/2014/6/12/559
v3 https://lkml.org/lkml/2014/6/13/633
v4 https://lkml.org/lkml/2014/8/29/423
v5 https://lkml.org/lkml/2014/11/3/759
v6 http://lwn.net/Articles/619737/
v7 http://lwn.net/Articles/627316/
v8 https://lwn.net/Articles/645515/
v9 https://lwn.net/Articles/651553/
v10 https://lwn.net/Articles/654430/
v11 http://www.gossamer-threads.com/lists/linux/kernel/2286424
v12 http://www.kernelhub.org/?msg=972982&p=2
v13 https://lwn.net/Articles/706856/
v14 https://lkml.org/lkml/2016/12/8/344
v15 http://www.mail-archive.com/linux-kernel@xxxxxxxxxxxxxxx/msg1304107.html
v16 http://www.spinics.net/lists/linux-mm/msg119814.html
v17 https://lkml.org/lkml/2017/1/27/847
v18 https://lkml.org/lkml/2017/3/16/596
v19 https://lkml.org/lkml/2017/4/5/831
v20 https://lwn.net/Articles/720715/
v21 https://lkml.org/lkml/2017/4/24/747
v22 http://lkml.iu.edu/hypermail/linux/kernel/1705.2/05176.html
v23 https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1404788.html
v24 https://lwn.net/Articles/726691/

This patch (of 19):

This adds documentation for HMM (Heterogeneous Memory Management).  It
presents the motivation behind it, the features necessary for it to be
useful and and gives an overview of how this is implemented.

Link: http://lkml.kernel.org/r/20170817000548.32038-2-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:45 -07:00
Naoya Horiguchi
8135d8926c mm: memory_hotplug: memory hotremove supports thp migration
This patch enables thp migration for memory hotremove.

Link: http://lkml.kernel.org/r/20170717193955.20207-11-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:45 -07:00
Naoya Horiguchi
e8db67eb0d mm: migrate: move_pages() supports thp migration
This patch enables thp migration for move_pages(2).

Link: http://lkml.kernel.org/r/20170717193955.20207-10-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:45 -07:00
Naoya Horiguchi
c863379849 mm: mempolicy: mbind and migrate_pages support thp migration
This patch enables thp migration for mbind(2) and migrate_pages(2).

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:45 -07:00
Naoya Horiguchi
ab6e3d0939 mm: soft-dirty: keep soft-dirty bits over thp migration
Soft dirty bit is designed to keep tracked over page migration.  This
patch makes it work in the same manner for thp migration too.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:45 -07:00
Zi Yan
84c3fc4e9c mm: thp: check pmd migration entry in common path
When THP migration is being used, memory management code needs to handle
pmd migration entries properly.  This patch uses !pmd_present() or
is_swap_pmd() (depending on whether pmd_none() needs separate code or
not) to check pmd migration entries at the places where a pmd entry is
present.

Since pmd-related code uses split_huge_page(), split_huge_pmd(),
pmd_trans_huge(), pmd_trans_unstable(), or
pmd_none_or_trans_huge_or_clear_bad(), this patch:

1. adds pmd migration entry split code in split_huge_pmd(),

2. takes care of pmd migration entries whenever pmd_trans_huge() is present,

3. makes pmd_none_or_trans_huge_or_clear_bad() pmd migration entry aware.

Since split_huge_page() uses split_huge_pmd() and pmd_trans_unstable()
is equivalent to pmd_none_or_trans_huge_or_clear_bad(), we do not change
them.

Until this commit, a pmd entry should be:
1. pointing to a pte page,
2. is_swap_pmd(),
3. pmd_trans_huge(),
4. pmd_devmap(), or
5. pmd_none().

Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:45 -07:00
Zi Yan
616b837153 mm: thp: enable thp migration in generic path
Add thp migration's core code, including conversions between a PMD entry
and a swap entry, setting PMD migration entry, removing PMD migration
entry, and waiting on PMD migration entries.

This patch makes it possible to support thp migration.  If you fail to
allocate a destination page as a thp, you just split the source thp as
we do now, and then enter the normal page migration.  If you succeed to
allocate destination thp, you enter thp migration.  Subsequent patches
actually enable thp migration for each caller of page migration by
allowing its get_new_page() callback to allocate thps.

[zi.yan@cs.rutgers.edu: fix gcc-4.9.0 -Wmissing-braces warning]
  Link: http://lkml.kernel.org/r/A0ABA698-7486-46C3-B209-E95A9048B22C@cs.rutgers.edu
[akpm@linux-foundation.org: fix x86_64 allnoconfig warning]
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:45 -07:00
Naoya Horiguchi
9c670ea379 mm: thp: introduce CONFIG_ARCH_ENABLE_THP_MIGRATION
Introduce CONFIG_ARCH_ENABLE_THP_MIGRATION to limit thp migration
functionality to x86_64, which should be safer at the first step.

Link: http://lkml.kernel.org/r/20170717193955.20207-5-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:45 -07:00
Naoya Horiguchi
b5ff8161e3 mm: thp: introduce separate TTU flag for thp freezing
TTU_MIGRATION is used to convert pte into migration entry until thp
split completes.  This behavior conflicts with thp migration added later
patches, so let's introduce a new TTU flag specifically for freezing.

try_to_unmap() is used both for thp split (via freeze_page()) and page
migration (via __unmap_and_move()).  In freeze_page(), ttu_flag given
for head page is like below (assuming anonymous thp):

    (TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS | TTU_RMAP_LOCKED | \
     TTU_MIGRATION | TTU_SPLIT_HUGE_PMD)

and ttu_flag given for tail pages is:

    (TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS | TTU_RMAP_LOCKED | \
     TTU_MIGRATION)

__unmap_and_move() calls try_to_unmap() with ttu_flag:

    (TTU_MIGRATION | TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS)

Now I'm trying to insert a branch for thp migration at the top of
try_to_unmap_one() like below

static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
                       unsigned long address, void *arg)
  {
          ...
          /* PMD-mapped THP migration entry */
          if (!pvmw.pte && (flags & TTU_MIGRATION)) {
              if (!PageAnon(page))
                  continue;

              set_pmd_migration_entry(&pvmw, page);
              continue;
          }
	  ...
  }

so try_to_unmap() for tail pages called by thp split can go into thp
migration code path (which converts *pmd* into migration entry), while
the expectation is to freeze thp (which converts *pte* into migration
entry.)

I detected this failure as a "bad page state" error in a testcase where
split_huge_page() is called from queue_pages_pte_range().

Link: http://lkml.kernel.org/r/20170717193955.20207-4-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:45 -07:00
Naoya Horiguchi
eee4818baa mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1
_PAGE_PSE is used to distinguish between a truly non-present
(_PAGE_PRESENT=0) PMD, and a PMD which is undergoing a THP split and
should be treated as present.

But _PAGE_SWP_SOFT_DIRTY currently uses the _PAGE_PSE bit, which would
cause confusion between one of those PMDs undergoing a THP split, and a
soft-dirty PMD.  Dropping _PAGE_PSE check in pmd_present() does not work
well, because it can hurt optimization of tlb handling in thp split.

Thus, we need to move the bit.

In the current kernel, bits 1-4 are not used in non-present format since
commit 00839ee3b2 ("x86/mm: Move swap offset/type up in PTE to work
around erratum").  So let's move _PAGE_SWP_SOFT_DIRTY to bit 1.  Bit 7
is used as reserved (always clear), so please don't use it for other
purpose.

Link: http://lkml.kernel.org/r/20170717193955.20207-3-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:45 -07:00
Naoya Horiguchi
88aaa2a1d7 mm: mempolicy: add queue_pages_required()
Patch series "mm: page migration enhancement for thp", v9.

Motivations:

1. THP migration becomes important in the upcoming heterogeneous memory
   systems. As David Nellans from NVIDIA pointed out from other threads
   (http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1349227.html),
   future GPUs or other accelerators will have their memory managed by
   operating systems. Moving data into and out of these memory nodes
   efficiently is critical to applications that use GPUs or other
   accelerators. Existing page migration only supports base pages, which
   has a very low memory bandwidth utilization. My experiments (see
   below) show THP migration can migrate pages more efficiently.

2. Base page migration vs THP migration throughput.

   Here are cross-socket page migration results from calling
   move_pages() syscall:

   In x86_64, a Intel two-socket E5-2640v3 box,
    - single 4KB base page migration takes 62.47 us, using 0.06 GB/s BW,
    - single 2MB THP migration takes 658.54 us, using 2.97 GB/s BW,
    - 512 4KB base page migration takes 1987.38 us, using 0.98 GB/s BW.

   In ppc64, a two-socket Power8 box,
    - single 64KB base page migration takes 49.3 us, using 1.24 GB/s BW,
    - single 16MB THP migration takes 2202.17 us, using 7.10 GB/s BW,
    - 256 64KB base page migration takes 2543.65 us, using 6.14 GB/s BW.

   THP migration can give us 3x and 1.15x throughput over base page
   migration in x86_64 and ppc64 respectivley.

   You can test it out by using the code here:
      https://github.com/x-y-z/thp-migration-bench

3. Existing page migration splits THP before migration and cannot
   guarantee the migrated pages are still contiguous. Contiguity is
   always what GPUs and accelerators look for. Without THP migration,
   khugepaged needs to do extra work to reassemble the migrated pages
   back to THPs.

This patch (of 10):

Introduce a separate check routine related to MPOL_MF_INVERT flag.  This
patch just does cleanup, no behavioral change.

Link: http://lkml.kernel.org/r/20170717193955.20207-2-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:45 -07:00
Linus Torvalds
015a9e66b9 RDMA/netlink: clean up message validity array initializer
The fix in the parent made me look at that function, and react to how
illogical and illegible the array initializer was.

Use named array indexes to make it clearer what is going on, and make
the initializer not depend silently on the exact index numbers.

[ The initializer now also shows an odd inconsistency in the naming:
  note the IWCM vs IWPM..   - Linus ]

Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Doug Ledford <dledford@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 10:17:20 -07:00
Leon Romanovsky
8b2c7e7a3c RDAM/netlink: Fix out-of-bound access while checking message validity
The netlink message sent with type == 0, which doesn't have any client
behind it, caused to the overflow in max_num_ops array.

Fix it by declaring zero number of ops for the first client.

Fixes: c9901724a2 ("RDMA/netlink: Remove netlink clients infrastructure")
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 10:01:03 -07:00
Linus Torvalds
5969d1bb30 Merge branch 'gperf-removal'
Remove our use of 'gperf' for generating perfect hashes from some of our
build tools.

This removal was prompted by Masahiro Yamada sending out a patch that
removes all our pre-generated files, and when I tested it, I noticed
that the gperf version I have (3.1) apparently generates code that no
longer works with out code-base because the function interfaces
generated by gperf have changed.

We really don't care that much, and the gperf people changed their
interfaces in ways that makes it annoying to work with them.  Tools that
make it hard to use them should not be used, and the kernel is not at
all interested in some autoconf mess.  So remove the gperf dependency
entirely.

It turns out that if you ignore the pre-generated files, the use of
gperf apparently saved us a whopping fifteen lines of code.  It
obviously wasn't worth it, considering that the pre-generated files are
about 500 lines.

I sent this out as a patch about three weeks ago, and got absolutely
zero responses.  So let's see if anybody notices now that I merge it.
Because there might be serious bugs here, but it WorksForMe(tm).

* gperf-removal:
  Remove gperf usage from toolchain
2017-09-07 21:39:15 -07:00
Linus Torvalds
572c01ba19 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
 "This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas,
  megaraid_sas, zfcp and a host of minor updates.

  The major driver change here is the elimination of the block based
  cciss driver in favour of the SCSI based hpsa driver (which now drives
  all the legacy cases cciss used to be required for). Plus a reset
  handler clean up and the redo of the SAS SMP handler to use bsg lib"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits)
  scsi: scsi-mq: Always unprepare before requeuing a request
  scsi: Show .retries and .jiffies_at_alloc in debugfs
  scsi: Improve requeuing behavior
  scsi: Call scsi_initialize_rq() for filesystem requests
  scsi: qla2xxx: Reset the logo flag, after target re-login.
  scsi: qla2xxx: Fix slow mem alloc behind lock
  scsi: qla2xxx: Clear fc4f_nvme flag
  scsi: qla2xxx: add missing includes for qla_isr
  scsi: qla2xxx: Fix an integer overflow in sysfs code
  scsi: aacraid: report -ENOMEM to upper layer from aac_convert_sgraw2()
  scsi: aacraid: get rid of one level of indentation
  scsi: aacraid: fix indentation errors
  scsi: storvsc: fix memory leak on ring buffer busy
  scsi: scsi_transport_sas: switch to bsg-lib for SMP passthrough
  scsi: smartpqi: remove the smp_handler stub
  scsi: hpsa: remove the smp_handler stub
  scsi: bsg-lib: pass the release callback through bsg_setup_queue
  scsi: Rework handling of scsi_device.vpd_pg8[03]
  scsi: Rework the code for caching Vital Product Data (VPD)
  scsi: rcu: Introduce rcu_swap_protected()
  ...
2017-09-07 21:11:05 -07:00
Linus Torvalds
cef5d0f952 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk updates from Petr Mladek:

 - Do not allow use of freed init data and code even when boot consoles
   are forced to stay. Also check for the init memory more precisely.

 - Some code clean up by starting contributors.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
  printk: Clean up do_syslog() error handling
  printk/console: Enhance the check for consoles using init memory
  printk/console: Always disable boot consoles that use init memory before it is freed
  printk: Modify operators of printed_len and text_len
2017-09-07 21:00:52 -07:00
Linus Torvalds
0fb02e718f Merge tag 'audit-pr-20170907' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
 "A small pull request for audit this time, only four patches and only
  two with any real code changes.

  Those two changes are the removal of a pointless SELinux AVC
  initialization audit event and a fix to improve the audit timestamp
  overhead.

  The other two patches are comment cleanup and administrative updates,
  nothing very exciting.

  Everything passes our tests"

* tag 'audit-pr-20170907' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: update the function comments
  selinux: remove AVC init audit log message
  audit: update the audit info in MAINTAINERS
  audit: Reduce overhead using a coarse clock
2017-09-07 20:48:25 -07:00
Linus Torvalds
828f4257d1 Merge tag 'secureexec-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull secureexec update from Kees Cook:
 "This series has the ultimate goal of providing a sane stack rlimit
  when running set*id processes.

  To do this, the bprm_secureexec LSM hook is collapsed into the
  bprm_set_creds hook so the secureexec-ness of an exec can be
  determined early enough to make decisions about rlimits and the
  resulting memory layouts. Other logic acting on the secureexec-ness of
  an exec is similarly consolidated. Capabilities needed some special
  handling, but the refactoring removed other special handling, so that
  was a wash"

* tag 'secureexec-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  exec: Consolidate pdeath_signal clearing
  exec: Use sane stack rlimit under secureexec
  exec: Consolidate dumpability logic
  smack: Remove redundant pdeath_signal clearing
  exec: Use secureexec for clearing pdeath_signal
  exec: Use secureexec for setting dumpability
  LSM: drop bprm_secureexec hook
  commoncap: Move cap_elevated calculation into bprm_set_creds
  commoncap: Refactor to remove bprm_secureexec hook
  smack: Refactor to remove bprm_secureexec hook
  selinux: Refactor to remove bprm_secureexec hook
  apparmor: Refactor to remove bprm_secureexec hook
  binfmt: Introduce secureexec flag
  exec: Correct comments about "point of no return"
  exec: Rename bprm->cred_prepared to called_set_creds
2017-09-07 20:35:29 -07:00
Linus Torvalds
44ccba3f7b Merge tag 'gcc-plugins-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull gcc plugins update from Kees Cook:
 "This finishes the porting work on randstruct, and introduces a new
  option to structleak, both noted below:

   - For the randstruct plugin, enable automatic randomization of
     structures that are entirely function pointers (along with a couple
     designated initializer fixes).

   - For the structleak plugin, provide an option to perform zeroing
     initialization of all otherwise uninitialized stack variables that
     are passed by reference (Ard Biesheuvel)"

* tag 'gcc-plugins-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  gcc-plugins: structleak: add option to init all vars used as byref args
  randstruct: Enable function pointer struct detection
  drivers/net/wan/z85230.c: Use designated initializers
  drm/amd/powerplay: rv: Use designated initializers
2017-09-07 20:30:19 -07:00
Linus Torvalds
21d236bf2b Merge tag 'pstore-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore update from Kees Cook:
 "Make pstore permissions more versatile by removing CAP_SYSLOG
  requirement and defining more restrictive root directory DAC
  permissions default (0750, which can be adjust after boot unlike the
  CAP_SYSLOG check).

  Suggested by Nick Kralevich"

* tag 'pstore-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  Revert "pstore: Honor dmesg_restrict sysctl on dmesg dumps"
  pstore: Make default pstorefs root dir perms 0750
2017-09-07 19:58:56 -07:00
Linus Torvalds
8dc5b3a6cb Merge tag '4.14-smb3-xattr-enable' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs update from Steve French:
 "Enable xattr support for smb3 and also a bugfix"

* tag '4.14-smb3-xattr-enable' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Check for timeout on Negotiate stage
  cifs: Add support for writing attributes on SMB2+
  cifs: Add support for reading attributes on SMB2+
2017-09-07 16:06:14 -07:00
Linus Torvalds
2500e287bc Merge git://git.kvack.org/~bcrl/aio-next
Pull aio fix from Ben LaHaise:
 "Improve aio-nr counting on large SMP systems.

  It has been in linux-next for quite some time"

* git://git.kvack.org/~bcrl/aio-next:
  fs: aio: fix the increment of aio-nr and counting against aio-max-nr
2017-09-07 15:51:11 -07:00
Linus Torvalds
ae8ac6b7db Merge branch 'quota_scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota scaling updates from Jan Kara:
 "This contains changes to make the quota subsystem more scalable.

  Reportedly it improves number of files created per second on ext4
  filesystem on fast storage by about a factor of 2x"

* 'quota_scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (28 commits)
  quota: Add lock annotations to struct members
  quota: Reduce contention on dq_data_lock
  fs: Provide __inode_get_bytes()
  quota: Inline dquot_[re]claim_reserved_space() into callsite
  quota: Inline inode_{incr,decr}_space() into callsites
  quota: Inline functions into their callsites
  ext4: Disable dirty list tracking of dquots when journalling quotas
  quota: Allow disabling tracking of dirty dquots in a list
  quota: Remove dq_wait_unused from dquot
  quota: Move locking into clear_dquot_dirty()
  quota: Do not dirty bad dquots
  quota: Fix possible corruption of dqi_flags
  quota: Propagate ->quota_read errors from v2_read_file_info()
  quota: Fix error codes in v2_read_file_info()
  quota: Push dqio_sem down to ->read_file_info()
  quota: Push dqio_sem down to ->write_file_info()
  quota: Push dqio_sem down to ->get_next_id()
  quota: Push dqio_sem down to ->release_dqblk()
  quota: Remove locking for writing to the old quota format
  quota: Do not acquire dqio_sem for dquot overwrites in v2 format
  ...
2017-09-07 15:19:35 -07:00
Linus Torvalds
460352c2f1 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF, reiserfs, quota, fsnotify cleanups from Jan Kara:
 "Several UDF, reiserfs, quota and fsnotify cleanups.

  Note that there is also a patch updating MAINTAINERS entry for
  notification subsystem to point to me as a maintainer since current
  entries are stale"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fsnotify: make dnotify_fsnotify_ops const
  isofs: Delete an unnecessary variable initialisation in isofs_read_inode()
  isofs: Adjust four checks for null pointers
  isofs: Delete an error message for a failed memory allocation in isofs_read_inode()
  quota_v2: Delete an error message for a failed memory allocation in v2_read_file_info()
  fs-udf: Delete an error message for a failed memory allocation in two functions
  fs-udf: Improve six size determinations
  fs-udf: Adjust two checks for null pointers
  reiserfs: fix spelling mistake: "tranasction" -> "transaction"
  MAINTAINERS: Update entries for notification subsystem
  uapi/linux/quota.h: Do not include linux/errno.h
2017-09-07 14:53:17 -07:00
Linus Torvalds
74fee4e88f Merge tag 'devicetree-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DeviceTree updates from Rob Herring:
 "There's a few orphans in the conversion to %pOF printf specifiers
  included here that no one else picked up.

  Summary:

   - Convert more DT code to use of_property_read_* API.

   - Improve DT overlay support when adding multiple overlays

   - Convert printk's to %pOF format specifiers. Most went via subsystem
     trees, but picked up the remaining orphans

   - Correct unittests to use preferred "okay" for "status" property
     value

   - Add a KASLR seed property

   - Vendor prefixes for Mellanox, Theobroma System, Adaptrum, Moxa

   - Fix modalias buffer handling

   - Clean-up of include paths for building dtbs

   - Add bindings for amc6821, isl1208, tsl2x7x, srf02, and srf10
     devices

   - Add nvmem bindings for MediaTek MT7623 and MT7622 SoC

   - Add compatible string for Allwinner H5 Mali-450 GPU

   - Fix links to old OpenFirmware docs with new mirror on
     devicetree.org

   - Remove status property from binding doc examples"

* tag 'devicetree-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (45 commits)
  devicetree: Adjust status "ok" -> "okay" under drivers/of/
  dt-bindings: Remove "status" from examples
  dt-bindings: pinctrl: sh-pfc: Use generic node name
  dt-bindings: Add vendor Mellanox
  dt-binding: net/phy: fix interrupts description
  virt: Convert to using %pOF instead of full_name
  macintosh: Convert to using %pOF instead of full_name
  ide: pmac: Convert to using %pOF instead of full_name
  microblaze: Convert to using %pOF instead of full_name
  dt-bindings: usb: musb: Grammar s/the/to/, s/is/are/
  of: Use PLATFORM_DEVID_NONE definition
  of/device: Fix of_device_get_modalias() buffer handling
  of/device: Prevent buffer overflow in of_device_modalias()
  dt-bindings: add amc6821, isl1208 trivial bindings
  dt-bindings: add vendor prefix for Theobroma Systems
  of: search scripts/dtc/include-prefixes path for both CPP and DTC
  of: remove arch/$(SRCARCH)/boot/dts from include search path for CPP
  of: remove drivers/of/testcase-data from include search path for CPP
  of: return of_get_cpu_node from of_cpu_device_node_get if CPUs are not registered
  iio: srf08: add device tree binding for srf02 and srf10
  ...
2017-09-07 14:43:33 -07:00
Linus Torvalds
7d95565612 Merge tag 'drm-intel-next-fixes-2017-09-07' of git://anongit.freedesktop.org/git/drm-intel
Pull i916 drm fixes from Rodrigo Vivi:
 "Since Dave is on paternity leave we are sending drm/i915 fixes for
  v4.14-rc1 directly to you as he had asked us to do.

  The most critical ones are the GPU reset fix for gen2-4 and GVT fix
  for a regression that is blocking gvt init to work on your tree.

  The rest is general fixes for patches coming from drm-next"

Acked-by: Dave Airlie <airlied@redhat.com>

* tag 'drm-intel-next-fixes-2017-09-07' of git://anongit.freedesktop.org/git/drm-intel:
  drm/i915: Re-enable GTT following a device reset
  drm/i915: Annotate user relocs with __user
  drm/i915: Silence sparse by using gfp_t
  drm/i915: Add __rcu to radix tree slot pointer
  drm/i915: Fix the missing PPAT cache attributes on CNL
  drm/i915/gvt: Remove one duplicated MMIO
  drm/i915: Fix enum pipe vs. enum transcoder for the PCH transcoder
  drm/i915: Make i2c lock ops static
  drm/i915: Make i9xx_load_ycbcr_conversion_matrix() static
  drm/i915/edp: Increase T12 panel delay to 900 ms to fix DP AUX CH timeouts
  drm/i915: Ignore duplicate VMA stored within the per-object handle LUT
  drm/i915: Skip fence alignemnt check for the CCS plane
  drm/i915: Treat fb->offsets[] as a raw byte offset instead of a linear offset
  drm/i915: Always wake the device to flush the GTT
  drm/i915: Recreate vmapping even when the object is pinned
  drm/i915: Quietly cancel FBC activation if CRTC is turned off before worker
2017-09-07 14:37:25 -07:00
Linus Torvalds
5f9cc57036 Merge tag 'leds_for_4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds
Pull LED updates from Jacek Anaszewski:
 "LED class drivers improvements:

  leds-pca955x:
   - add Device Tree support and bindings
   - use devm_led_classdev_register()
   - add GPIO support
   - prevent crippled LED class device name
   - check for I2C errors

  leds-gpio:
   - add optional retain-state-shutdown DT property
   - allow LED to retain state at shutdown

  leds-tlc591xx:
   - merge conditional tests
   - add missing of_node_put

  leds-powernv:
   - delete an error message for a failed memory allocation in
     powernv_led_create()

  leds-is31fl32xx.c
   - convert to using custom %pOF printf format specifier

  Constify attribute_group structures in:
   - leds-blinkm
   - leds-lm3533

  Make several arrays static const in:
   - leds-aat1290
   - leds-lp5521
   - leds-lp5562
   - leds-lp8501"

* tag 'leds_for_4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
  leds: pca955x: check for I2C errors
  leds: gpio: Allow LED to retain state at shutdown
  dt-bindings: leds: gpio: Add optional retain-state-shutdown property
  leds: powernv: Delete an error message for a failed memory allocation in powernv_led_create()
  leds: lp8501: make several arrays static const
  leds: lp5562: make several arrays static const
  leds: lp5521: make several arrays static const
  leds: aat1290: make array max_mm_current_percent static const
  leds: pca955x: Prevent crippled LED device name
  leds: lm3533: constify attribute_group structure
  dt-bindings: leds: add pca955x
  leds: pca955x: add GPIO support
  leds: pca955x: use devm_led_classdev_register
  leds: pca955x: add device tree support
  leds: Convert to using %pOF instead of full_name
  leds: blinkm: constify attribute_group structures.
  leds: tlc591xx: add missing of_node_put
  leds: tlc591xx: merge conditional tests
2017-09-07 14:33:13 -07:00
Linus Torvalds
cd7b34fe1c Merge tag 'dmaengine-4.14-rc1' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine updates from Vinod Koul:
 "This one features the usual updates to the drivers and one good part
  of removing DA_SG from core as it has no users.

  Summary:

   - Remove DMA_SG support as we have no users for this feature
   - New driver for Altera / Intel mSGDMA IP core
   - Support for memset in dmatest and qcom_hidma driver
   - Update for non cyclic mode in k3dma, bunch of update in bam_dma,
     bcm sba-raid
   - Constify device ids across drivers"

* tag 'dmaengine-4.14-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (52 commits)
  dmaengine: sun6i: support V3s SoC variant
  dmaengine: sun6i: make gate bit in sun8i's DMA engines a common quirk
  dmaengine: rcar-dmac: document R8A77970 bindings
  dmaengine: xilinx_dma: Fix error code format specifier
  dmaengine: altera: Use macros instead of structs to describe the registers
  dmaengine: ti-dma-crossbar: Fix dra7 reserve function
  dmaengine: pl330: constify amba_id
  dmaengine: pl08x: constify amba_id
  dmaengine: bcm-sba-raid: Remove redundant SBA_REQUEST_STATE_COMPLETED
  dmaengine: bcm-sba-raid: Explicitly ACK mailbox message after sending
  dmaengine: bcm-sba-raid: Add debugfs support
  dmaengine: bcm-sba-raid: Remove redundant SBA_REQUEST_STATE_RECEIVED
  dmaengine: bcm-sba-raid: Re-factor sba_process_deferred_requests()
  dmaengine: bcm-sba-raid: Pre-ack async tx descriptor
  dmaengine: bcm-sba-raid: Peek mbox when we have no free requests
  dmaengine: bcm-sba-raid: Alloc resources before registering DMA device
  dmaengine: bcm-sba-raid: Improve sba_issue_pending() run duration
  dmaengine: bcm-sba-raid: Increase number of free sba_request
  dmaengine: bcm-sba-raid: Allow arbitrary number free sba_request
  dmaengine: bcm-sba-raid: Remove reqs_free_count from sba_device
  ...
2017-09-07 14:03:05 -07:00
Linus Torvalds
75c727155c Merge tag 'backlight-next-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight
Pull backlight updates from Lee Jones:
 "Fix-ups:
   - Constification; pwm_bl
   - Use new GPIO API; gpio_backlight
   - Remove unused functionality; gpio_backlight

  Bug Fixes:
   - Fix artificial MAXREG limit; lm3630a_bl"

* tag 'backlight-next-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
  backlight: gpio_backlight: Delete pdata inversion
  backlight: gpio_backlight: Convert to use GPIO descriptor
  backlight: pwm_bl: Make of_device_ids const
  backlight: lm3630a: Bump REG_MAX value to 0x50 instead of 0x1F
2017-09-07 13:55:16 -07:00
Linus Torvalds
968c61f7da Merge tag 'mfd-next-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
 "New Drivers
   - RK805 Power Management IC (PMIC)
   - ROHM BD9571MWV-M MFD Power Management IC (PMIC)
   - Texas Instruments TPS68470 Power Management IC (PMIC) & LEDs

  New Device Support:
   - Add support for HiSilicon Hi6421v530 to hi6421-pmic-core
   - Add support for X-Powers AXP806 to axp20x
   - Add support for X-Powers AXP813 to axp20x
   - Add support for Intel Sunrise Point LPSS to intel-lpss-pci

  New Functionality:
   - Amend API to provide register layout; atmel-smc

  Fix-ups:
   - DT re-work; omap, nokia
   - Header file location change {I2C => MFD}; dm355evm_msp, tps65010
   - Fix chip ID formatting issue(s); rk808
   - Optionally register touchscreen devices; da9052-core
   - Documentation improvements; twl-core
   - Constification; rtsx_pcr, ab8500-core, da9055-i2c, da9052-spi
   - Drop unnecessary static declaration; max8925-i2c
   - Kconfig changes (missing deps and remove module support)
   - Slim down oversized licence statement; hi6421-pmic-core
   - Use managed resources (devm_*); lp87565
   - Supply proper error checking/handling; t7l66xb

  Bug Fixes:
   - Fix counter duplication issue; da9052-core
   - Fix potential NULL deference issue; max8998
   - Leave SPI-NOR write-protection bit alone; lpc_ich
   - Ensure device is put into reset during suspend; intel-lpss
   - Correct register offset variable size; omap-usb-tll"

* tag 'mfd-next-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (61 commits)
  mfd: intel_soc_pmic: Differentiate between Bay and Cherry Trail CRC variants
  mfd: intel_soc_pmic: Export separate mfd-cell configs for BYT and CHT
  dt-bindings: mfd: Add bindings for ZII RAVE devices
  mfd: omap-usb-tll: Fix register offsets
  mfd: da9052: Constify spi_device_id
  mfd: intel-lpss: Put I2C and SPI controllers into reset state on suspend
  mfd: da9055: Constify i2c_device_id
  mfd: intel-lpss: Add missing PCI ID for Intel Sunrise Point LPSS devices
  mfd: t7l66xb: Handle return value of clk_prepare_enable
  mfd: Add ROHM BD9571MWV-M PMIC DT bindings
  mfd: intel_soc_pmic_chtwc: Turn Kconfig option into a bool
  mfd: lp87565: Convert to use devm_mfd_add_devices()
  mfd: Add support for TPS68470 device
  mfd: lpc_ich: Do not touch SPI-NOR write protection bit on Haswell/Broadwell
  mfd: syscon: atmel-smc: Add helper to retrieve register layout
  mfd: axp20x: Use correct platform device ID for many PEK
  dt-bindings: mfd: axp20x: Introduce bindings for AXP813
  mfd: axp20x: Add support for AXP813 PMIC
  dt-bindings: mfd: axp20x: Add AXP806 to supported list of chips
  mfd: Add ROHM BD9571MWV-M MFD PMIC driver
  ...
2017-09-07 13:51:13 -07:00
Linus Torvalds
9d71941d39 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:

 - a new GPIO bit-banging driver implementing PS/2 protocol

 - a new power key driver for Rockchip RK805 PMIC

 - bunch of patches constifying various device ID structures

 - Elan I2C touchpad driver now supports devices with 2 buttons

 - other assorted fixes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (76 commits)
  Input: byd - make array seq static, reduces object code size
  Input: xilinx_ps2 - fix multiline comment style
  Input: pxa27x_keypad - handle return value of clk_prepare_enable
  Input: tegra-kbc - handle return value of clk_prepare_enable
  Input: PS/2 gpio bit banging driver for serio bus
  Input: xen-kbdfront - enable auto repeat for xen keyboard frontend driver
  Input: ambakmi - constify amba_id
  Input: atmel_mxt_ts - add support for reset line
  Input: atmel_mxt_ts - use more managed resources
  Input: wacom_w8001 - constify serio_device_id
  Input: tsc40 - constify serio_device_id
  Input: touchwin - constify serio_device_id
  Input: touchright - constify serio_device_id
  Input: touchit213 - constify serio_device_id
  Input: penmount - constify serio_device_id
  Input: mtouch - constify serio_device_id
  Input: inexio - constify serio_device_id
  Input: hampshire - constify serio_device_id
  Input: gunze - constify serio_device_id
  Input: fujitsu_ts - constify serio_device_id
  ...
2017-09-07 13:39:21 -07:00
Linus Torvalds
dfd9e6d231 Merge tag 'mailbox-v4.14' of git://git.linaro.org/landing-teams/working/fujitsu/integration
Pull mailbox updates from Jassi Brar:
 "Just behavorial changes to a controller driver: the Broadcom's Flexrm
  mailbox driver has been modifified to support debugfs and TX-Done
  mechanism by ACK.

  Nothing for the core mailbox stack"

* tag 'mailbox-v4.14' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
  mailbox: bcm-flexrm-mailbox: Use txdone_ack instead of txdone_poll
  mailbox: bcm-flexrm-mailbox: Use bitmap instead of IDA
  mailbox: bcm-flexrm-mailbox: Fix mask used in CMPL_START_ADDR_VALUE()
  mailbox: bcm-flexrm-mailbox: Add debugfs support
  mailbox: bcm-flexrm-mailbox: Set IRQ affinity hint for FlexRM ring IRQs
2017-09-07 13:23:37 -07:00
Linus Torvalds
c0da4fa0d1 Merge tag 'media/v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
 "Brazil's Independence Day pull request :-)

  This is one of the biggest media pull requests, with 625 patches
  affecting almost all parts of media (RC, DVB, V4L2, CEC, docs).

  This contains:

   - A lot of new drivers:
     * DVB frontends: mxl5xx, stv0910, stv6111;
     * camera flash: as3645a led driver;
     * HDMI receiver: adv748X;
     * camera sensor: Omnivision 6650 5M driver (ov6650);
     * HDMI CEC: ao-cec meson driver;
     * V4L2: Qualcom camss driver;
     * Remote controller: gpio-ir-tx, pwm-ir-tx and zx-irdec drivers.

   - The DDbridge DVB driver got a massive update, with makes it in sync
     with modern hardware from that vendor;

   - There's an important milestone on this series: the DVB
     documentation was written in 2003, but only started to be updated
     in 2007. It also used to contain several gaps from the time it was
     kept out of tree, mentioning error codes and device nodes that
     never existed upstream. On this series, it received a massive
     update: all non-deprecated digital TV APIs are now in sync with the
     current implementation;

   - Some DVB APIs that aren't used by any upstream driver got removed;

   - Other parts of the media documentation algo got updated, fixing
     some bugs on its PDF output and making it compatible with Sphinx
     version 1.6.

     As the number of hacks required to build PDF output reduced, I hope
     we'll have less troubles as newer versions of our documentation
     toolchain are released (famous last words);

   - As usual, lots of driver cleanups and improvements"

* tag 'media/v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (624 commits)
  media: leds: as3645a: add V4L2_FLASH_LED_CLASS dependency
  media: get rid of removed DMX_GET_CAPS and DMX_SET_SOURCE leftovers
  media: Revert "[media] v4l: async: make v4l2 coexist with devicetree nodes in a dt overlay"
  media: staging: atomisp: sh_css_calloc shall return a pointer to the allocated space
  media: Revert "[media] lirc_dev: remove superfluous get/put_device() calls"
  media: add qcom_camss.rst to v4l-drivers rst file
  media: dvb headers: make checkpatch happier
  media: dvb uapi: move frontend legacy API to another part of the book
  media: pixfmt-srggb12p.rst: better format the table for PDF output
  media: docs-rst: media: Don't use \small for V4L2_PIX_FMT_SRGGB10 documentation
  media: index.rst: don't write "Contents:" on PDF output
  media: pixfmt*.rst: replace a two dots by a comma
  media: vidioc-g-fmt.rst: adjust table format
  media: vivid.rst: add a blank line to correct ReST format
  media: v4l2 uapi book: get rid of driver programming's chapter
  media: format.rst: use the right markup for important notes
  media: docs-rst: cardlists: change their format to flat-tables
  media: em28xx-cardlist.rst: update to reflect last changes
  media: v4l2-event.rst: adjust table to fit on PDF output
  media: docs: don't show ToC for each part on PDF output
  ...
2017-09-07 12:53:14 -07:00
Linus Torvalds
d969443064 Merge tag 'sound-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
 "We have touched quite a lot of files but with fewer changes at this
  cycle; as you can see, most of changes are trivial fixes, especially
  constification patches.

  Among the massive attacks by constification gangs, we had a few core
  changes (mostly for ASoC core), as well the fixes and the updates by
  major vendors.

  Some highlights:

  ALSA core:

   - Fix possible races in control API user-TLV codes

   - Small cleanup of PCM core

  ASoC:

   - Continued work for componentization; still half-baked, but we're
     certainly progressing

   - Use of devres for jack detection GPIOs, rather as a cleanup

   - Jack detection support for Qualcomm MSM8916

   - Support for Allwinner H3, Cirrus Logic CS43130, Intel Kabylake
     systems with RT5663, Realtek RT274, TI TLV320AIC32x6 and Wolfson
     WM8523"

* tag 'sound-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (512 commits)
  ALSA: hda/ca0132 - Fix memory leak at error path
  ALSA: hda: Fix forget to free resource in error handling code path in hda_codec_driver_probe
  ASoC: cs43130: Fix unused compiler warnings for PM runtime
  ASoC: cs43130: Fix possible Oops with invalid dev_id
  ASoC: cs43130: fix spelling mistake: "irq_occurrance" -> "irq_occurrence"
  ALSA: atmel: Remove leftovers of AVR32 removal
  ALSA: atmel: convert AC97c driver to GPIO descriptor API
  ALSA: hda/realtek - Enable jack detection function for Intel ALC700
  ALSA: hda: Fix regression of hdmi eld control created based on invalid pcm
  ASoC: Intel: Skylake: Add IPC to configure the copier secondary pins
  ASoC: add missing compile rule for max98371
  ASoC: add missing compile rule for sirf-audio-codec
  ASoC: add missing compile rule for max98371
  ASoC: cs43130: Add devicetree bindings for CS43130
  ASoC: cs43130: Add support for CS43130 codec
  ASoC: make clock direction configurable in asoc-simple
  ALSA: ctxfi: Remove null check before kfree
  ASoC: max98927: Changed device property read function
  ASoC: max98927: Modified DAPM widget and map to enable/disable VI sense path
  ASoC: max98927: Added PM suspend and resume function
  ...
2017-09-07 12:44:53 -07:00
Linus Torvalds
3645e6d0dc Merge tag 'md/4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
Pull MD updates from Shaohua Li:
 "This update mainly fixes bugs:

   - Make raid5 ppl support several ppl from Pawel

   - Several raid5-cache bug fixes from Song

   - Bitmap fixes from Neil and Me

   - One raid1/10 regression fix since 4.12 from Me

   - Other small fixes and cleanup"

* tag 'md/4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  md/bitmap: disable bitmap_resize for file-backed bitmaps.
  raid5-ppl: Recovery support for multiple partial parity logs
  md: Runtime support for multiple ppls
  md/raid0: attach correct cgroup info in bio
  lib/raid6: align AVX512 constants to 512 bits, not bytes
  raid5: remove raid5_build_block
  md/r5cache: call mddev_lock/unlock() in r5c_journal_mode_show
  md: replace seq_release_private with seq_release
  md: notify about new spare disk in the container
  md/raid1/10: reset bio allocated from mempool
  md/raid5: release/flush io in raid5_do_work()
  md/bitmap: copy correct data for bitmap super
2017-09-07 12:41:48 -07:00
Linus Torvalds
15d8ffc964 Merge tag 'mmc-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC updates from Ulf Hansson:
 "MMC core:
   - Continue to refactor the mmc block code to prepare for blkmq
   - Move mmc block debugfs into block module
   - Next step for eMMC CMDQ by adding a new mmc host interface for it
   - Move Kconfig option MMC_DEBUG from core to host
   - Some additional minor improvements

  MMC host:
   - Declare structs as const when applicable
   - Explicitly request exclusive reset control when applicable
   - Improve some error paths and other various cleanups
   - sdhci: Preparations to support SDHCI OMAP
   - sdhci: Improve some PM related code
   - sdhci: Re-factoring and modernizations
   - sdhci-xenon: Add runtime PM and system sleep support
   - sdhci-xenon: Add support for eMMC HS400 Enhanced Strobe
   - sdhci-cadence: Add system sleep support
   - sdhci-of-at91: Improve system sleep support
   - dw_mmc: Add support for Hisilicon hi3660
   - sunxi: Add support for A83T eMMC
   - sunxi: Add support for DDR52 mode
   - meson-gx: Add support for UHS-I SD-cards
   - meson-gx: Cleanups and improvements
   - tmio: Fix CMD12 (STOP) handling
   - tmio: Cleanups and improvements
   - renesas_sdhi: Add r8a7743/5 support
   - renesas-sdhi: Add support for R-Car Gen3 SDHI DMAC
   - renesas_sdhi: Cleanups and improvements"

* tag 'mmc-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (145 commits)
  mmc: renesas_sdhi: Add r8a7743/5 support
  mmc: meson-gx: fix __ffsdi2 undefined on arm32
  mmc: sdhci-xenon: add runtime pm support and reimplement standby
  mmc: core: Move mmc_start_areq() declaration
  mmc: mmci: stop building qcom dml as module
  mmc: sunxi: Reset the device at probe time
  clk: sunxi-ng: Provide a default reset hook
  mmc: meson-gx: rework tuning function
  mmc: meson-gx: change default tx phase
  mmc: meson-gx: implement voltage switch callback
  mmc: meson-gx: use CCF to handle the clock phases
  mmc: meson-gx: implement card_busy callback
  mmc: meson-gx: simplify interrupt handler
  mmc: meson-gx: work around clk-stop issue
  mmc: meson-gx: fix dual data rate mode frequencies
  mmc: meson-gx: rework clock init function
  mmc: meson-gx: rework clk_set function
  mmc: meson-gx: rework set_ios function
  mmc: meson-gx: cfg init overwrite values
  mmc: meson-gx: initialize sane clk default before clock register
  ...
2017-09-07 12:24:50 -07:00
James Bottomley
2441500a41 Merge branch 'fixes' into misc 2017-09-07 12:12:43 -07:00
Linus Torvalds
a0725ab0c7 Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
 "This is the first pull request for 4.14, containing most of the code
  changes. It's a quiet series this round, which I think we needed after
  the churn of the last few series. This contains:

   - Fix for a registration race in loop, from Anton Volkov.

   - Overflow complaint fix from Arnd for DAC960.

   - Series of drbd changes from the usual suspects.

   - Conversion of the stec/skd driver to blk-mq. From Bart.

   - A few BFQ improvements/fixes from Paolo.

   - CFQ improvement from Ritesh, allowing idling for group idle.

   - A few fixes found by Dan's smatch, courtesy of Dan.

   - A warning fixup for a race between changing the IO scheduler and
     device remova. From David Jeffery.

   - A few nbd fixes from Josef.

   - Support for cgroup info in blktrace, from Shaohua.

   - Also from Shaohua, new features in the null_blk driver to allow it
     to actually hold data, among other things.

   - Various corner cases and error handling fixes from Weiping Zhang.

   - Improvements to the IO stats tracking for blk-mq from me. Can
     drastically improve performance for fast devices and/or big
     machines.

   - Series from Christoph removing bi_bdev as being needed for IO
     submission, in preparation for nvme multipathing code.

   - Series from Bart, including various cleanups and fixes for switch
     fall through case complaints"

* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
  kernfs: checking for IS_ERR() instead of NULL
  drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
  drbd: Fix allyesconfig build, fix recent commit
  drbd: switch from kmalloc() to kmalloc_array()
  drbd: abort drbd_start_resync if there is no connection
  drbd: move global variables to drbd namespace and make some static
  drbd: rename "usermode_helper" to "drbd_usermode_helper"
  drbd: fix race between handshake and admin disconnect/down
  drbd: fix potential deadlock when trying to detach during handshake
  drbd: A single dot should be put into a sequence.
  drbd: fix rmmod cleanup, remove _all_ debugfs entries
  drbd: Use setup_timer() instead of init_timer() to simplify the code.
  drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
  drbd: new disk-option disable-write-same
  drbd: Fix resource role for newly created resources in events2
  drbd: mark symbols static where possible
  drbd: Send P_NEG_ACK upon write error in protocol != C
  drbd: add explicit plugging when submitting batches
  drbd: change list_for_each_safe to while(list_first_entry_or_null)
  drbd: introduce drbd_recv_header_maybe_unplug
  ...
2017-09-07 11:59:42 -07:00
Linus Torvalds
3ee31b89d9 Merge tag 'for-linus-4.14b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:

 - the new pvcalls backend for routing socket calls from a guest to dom0

 - some cleanups of Xen code

 - a fix for wrong usage of {get,put}_cpu()

* tag 'for-linus-4.14b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (27 commits)
  xen/mmu: set MMU_NORMAL_PT_UPDATE in remap_area_mfn_pte_fn
  xen: Don't try to call xen_alloc_p2m_entry() on autotranslating guests
  xen/events: events_fifo: Don't use {get,put}_cpu() in xen_evtchn_fifo_init()
  xen/pvcalls: use WARN_ON(1) instead of __WARN()
  xen: remove not used trace functions
  xen: remove unused function xen_set_domain_pte()
  xen: remove tests for pvh mode in pure pv paths
  xen-platform: constify pci_device_id.
  xen: cleanup xen.h
  xen: introduce a Kconfig option to enable the pvcalls backend
  xen/pvcalls: implement write
  xen/pvcalls: implement read
  xen/pvcalls: implement the ioworker functions
  xen/pvcalls: disconnect and module_exit
  xen/pvcalls: implement release command
  xen/pvcalls: implement poll command
  xen/pvcalls: implement accept command
  xen/pvcalls: implement listen command
  xen/pvcalls: implement bind command
  xen/pvcalls: implement connect command
  ...
2017-09-07 10:24:21 -07:00
Linus Torvalds
bac65d9d87 Merge tag 'powerpc-4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
 "Nothing really major this release, despite quite a lot of activity.
  Just lots of things all over the place.

  Some things of note include:

   - Access via perf to a new type of PMU (IMC) on Power9, which can
     count both core events as well as nest unit events (Memory
     controller etc).

   - Optimisations to the radix MMU TLB flushing, mostly to avoid
     unnecessary Page Walk Cache (PWC) flushes when the structure of the
     tree is not changing.

   - Reworks/cleanups of do_page_fault() to modernise it and bring it
     closer to other architectures where possible.

   - Rework of our page table walking so that THP updates only need to
     send IPIs to CPUs where the affected mm has run, rather than all
     CPUs.

   - The size of our vmalloc area is increased to 56T on 64-bit hash MMU
     systems. This avoids problems with the percpu allocator on systems
     with very sparse NUMA layouts.

   - STRICT_KERNEL_RWX support on PPC32.

   - A new sched domain topology for Power9, to capture the fact that
     pairs of cores may share an L2 cache.

   - Power9 support for VAS, which is a new mechanism for accessing
     coprocessors, and initial support for using it with the NX
     compression accelerator.

   - Major work on the instruction emulation support, adding support for
     many new instructions, and reworking it so it can be used to
     implement the emulation needed to fixup alignment faults.

   - Support for guests under PowerVM to use the Power9 XIVE interrupt
     controller.

  And probably that many things again that are almost as interesting,
  but I had to keep the list short. Plus the usual fixes and cleanups as
  always.

  Thanks to: Alexey Kardashevskiy, Alistair Popple, Andreas Schwab,
  Aneesh Kumar K.V, Anju T Sudhakar, Arvind Yadav, Balbir Singh,
  Benjamin Herrenschmidt, Bhumika Goyal, Breno Leitao, Bryant G. Ly,
  Christophe Leroy, Cédric Le Goater, Dan Carpenter, Dou Liyang,
  Frederic Barrat, Gautham R. Shenoy, Geliang Tang, Geoff Levand, Hannes
  Reinecke, Haren Myneni, Ivan Mikhaylov, John Allen, Julia Lawall,
  LABBE Corentin, Laurentiu Tudor, Madhavan Srinivasan, Markus Elfring,
  Masahiro Yamada, Matt Brown, Michael Neuling, Murilo Opsfelder Araujo,
  Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran,
  Paul Mackerras, Rashmica Gupta, Rob Herring, Rui Teng, Sam Bobroff,
  Santosh Sivaraj, Scott Wood, Shilpasri G Bhat, Sukadev Bhattiprolu,
  Suraj Jitindar Singh, Tobin C. Harding, Victor Aoqui"

* tag 'powerpc-4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (321 commits)
  powerpc/xive: Fix section __init warning
  powerpc: Fix kernel crash in emulation of vector loads and stores
  powerpc/xive: improve debugging macros
  powerpc/xive: add XIVE Exploitation Mode to CAS
  powerpc/xive: introduce H_INT_ESB hcall
  powerpc/xive: add the HW IRQ number under xive_irq_data
  powerpc/xive: introduce xive_esb_write()
  powerpc/xive: rename xive_poke_esb() in xive_esb_read()
  powerpc/xive: guest exploitation of the XIVE interrupt controller
  powerpc/xive: introduce a common routine xive_queue_page_alloc()
  powerpc/sstep: Avoid used uninitialized error
  axonram: Return directly after a failed kzalloc() in axon_ram_probe()
  axonram: Improve a size determination in axon_ram_probe()
  axonram: Delete an error message for a failed memory allocation in axon_ram_probe()
  powerpc/powernv/npu: Move tlb flush before launching ATSD
  powerpc/macintosh: constify wf_sensor_ops structures
  powerpc/iommu: Use permission-specific DEVICE_ATTR variants
  powerpc/eeh: Delete an error out of memory message at init time
  powerpc/mm: Use seq_putc() in two functions
  macintosh: Convert to using %pOF instead of full_name
  ...
2017-09-07 10:15:40 -07:00
Linus Torvalds
f92e3da18b Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Transparently fall back to other poweroff method(s) if EFI poweroff
     fails (and returns)

   - Use separate PE/COFF section headers for the RX and RW parts of the
     ARM stub loader so that the firmware can use strict mapping
     permissions

   - Add support for requesting the firmware to wipe RAM at warm reboot

   - Increase the size of the random seed obtained from UEFI so CRNG
     fast init can complete earlier

   - Update the EFI framebuffer address if it points to a BAR that gets
     moved by the PCI resource allocation code

   - Enable "reset attack mitigation" of TPM environments: this is
     enabled if the kernel is configured with
     CONFIG_RESET_ATTACK_MITIGATION=y.

   - Clang related fixes

   - Misc cleanups, constification, refactoring, etc"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/bgrt: Use efi_mem_type()
  efi: Move efi_mem_type() to common code
  efi/reboot: Make function pointer orig_pm_power_off static
  efi/random: Increase size of firmware supplied randomness
  efi/libstub: Enable reset attack mitigation
  firmware/efi/esrt: Constify attribute_group structures
  firmware/efi: Constify attribute_group structures
  firmware/dcdbas: Constify attribute_group structures
  arm/efi: Split zImage code and data into separate PE/COFF sections
  arm/efi: Replace open coded constants with symbolic ones
  arm/efi: Remove pointless dummy .reloc section
  arm/efi: Remove forbidden values from the PE/COFF header
  drivers/fbdev/efifb: Allow BAR to be moved instead of claiming it
  efi/reboot: Fall back to original power-off method if EFI_RESET_SHUTDOWN returns
  efi/arm/arm64: Add missing assignment of efi.config_table
  efi/libstub/arm64: Set -fpie when building the EFI stub
  efi/libstub/arm64: Force 'hidden' visibility for section markers
  efi/libstub/arm64: Use hidden attribute for struct screen_info reference
  efi/arm: Don't mark ACPI reclaim memory as MEMBLOCK_NOMAP
2017-09-07 09:42:35 -07:00
Mauricio Faria de Oliveira
2a8a98673c fs: aio: fix the increment of aio-nr and counting against aio-max-nr
Currently, aio-nr is incremented in steps of 'num_possible_cpus() * 8'
for io_setup(nr_events, ..) with 'nr_events < num_possible_cpus() * 4':

    ioctx_alloc()
    ...
        nr_events = max(nr_events, num_possible_cpus() * 4);
        nr_events *= 2;
    ...
        ctx->max_reqs = nr_events;
    ...
        aio_nr += ctx->max_reqs;
    ....

This limits the number of aio contexts actually available to much less
than aio-max-nr, and is increasingly worse with greater number of CPUs.

For example, with 64 CPUs, only 256 aio contexts are actually available
(with aio-max-nr = 65536) because the increment is 512 in that scenario.

Note: 65536 [max aio contexts] / (64*4*2) [increment per aio context]
is 128, but make it 256 (double) as counting against 'aio-max-nr * 2':

    ioctx_alloc()
    ...
        if (aio_nr + nr_events > (aio_max_nr * 2UL) ||
        ...
            goto err_ctx;
    ...

This patch uses the original value of nr_events (from userspace) to
increment aio-nr and count against aio-max-nr, which resolves those.

Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Reported-by: Lekshmi C. Pillai <lekshmi.cpillai@in.ibm.com>
Tested-by: Lekshmi C. Pillai <lekshmi.cpillai@in.ibm.com>
Tested-by: Paul Nguyen <nguyenp@us.ibm.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
2017-09-07 12:28:28 -04:00
Linus Torvalds
57e88b43b8 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
 "The main changes include various Hyper-V optimizations such as faster
  hypercalls and faster/better TLB flushes - and there's also some
  Intel-MID cleanups"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tracing/hyper-v: Trace hyperv_mmu_flush_tlb_others()
  x86/hyper-v: Support extended CPU ranges for TLB flush hypercalls
  x86/platform/intel-mid: Make several arrays static, to make code smaller
  MAINTAINERS: Add missed file for Hyper-V
  x86/hyper-v: Use hypercall for remote TLB flush
  hyper-v: Globalize vp_index
  x86/hyper-v: Implement rep hypercalls
  hyper-v: Use fast hypercall for HVCALL_SIGNAL_EVENT
  x86/hyper-v: Introduce fast hypercall implementation
  x86/hyper-v: Make hv_do_hypercall() inline
  x86/hyper-v: Include hyperv/ only when CONFIG_HYPERV is set
  x86/platform/intel-mid: Make 'bt_sfi_data' const
  x86/platform/intel-mid: Make IRQ allocation a bit more flexible
  x86/platform/intel-mid: Group timers callbacks together
2017-09-07 09:25:15 -07:00
Linus Torvalds
3b9f8ed25d Merge branch 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata updates from Tejun Heo:
 "Except for the ahci fix that fixes a boot issue, nothing major in this
  pull request. Some new platform controller support and device specific
  changes"

* 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  libata: zpodd: make arrays cdb static, reduces object code size
  ahci: don't use MSI for devices with the silly Intel NVMe remapping scheme
  dt-bindings: ata: add DT bindings for MediaTek SATA controller
  ata: mediatek: add support for MediaTek SATA controller
  pata_octeon_cf: use of_property_read_{bool|u32}()
  cs5536: add support for IDE controller variant
  ata: sata_gemini: Introduce explicit IDE pin control
  ata: sata_gemini: Retire custom pin control
  ata: ahci_platform: Add shutdown handler
  ata: sata_gemini: explicitly request exclusive reset control
  ata: Drop unnecessary static
  ata: Convert to using %pOF instead of full_name
2017-09-06 22:41:21 -07:00
Linus Torvalds
608c1d3c17 Merge branch 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "Several notable changes this cycle:

   - Thread mode was merged. This will be used for cgroup2 support for
     CPU and possibly other controllers. Unfortunately, CPU controller
     cgroup2 support didn't make this pull request but most contentions
     have been resolved and the support is likely to be merged before
     the next merge window.

   - cgroup.stat now shows the number of descendant cgroups.

   - cpuset now can enable the easier-to-configure v2 behavior on v1
     hierarchy"

* 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
  cpuset: Allow v2 behavior in v1 cgroup
  cgroup: Add mount flag to enable cpuset to use v2 behavior in v1 cgroup
  cgroup: remove unneeded checks
  cgroup: misc changes
  cgroup: short-circuit cset_cgroup_from_root() on the default hierarchy
  cgroup: re-use the parent pointer in cgroup_destroy_locked()
  cgroup: add cgroup.stat interface with basic hierarchy stats
  cgroup: implement hierarchy limits
  cgroup: keep track of number of descent cgroups
  cgroup: add comment to cgroup_enable_threaded()
  cgroup: remove unnecessary empty check when enabling threaded mode
  cgroup: update debug controller to print out thread mode information
  cgroup: implement cgroup v2 thread support
  cgroup: implement CSS_TASK_ITER_THREADED
  cgroup: introduce cgroup->dom_cgrp and threaded css_set handling
  cgroup: add @flags to css_task_iter_start() and implement CSS_TASK_ITER_PROCS
  cgroup: reorganize cgroup.procs / task write path
  cgroup: replace css_set walking populated test with testing cgrp->nr_populated_csets
  cgroup: distinguish local and children populated states
  cgroup: remove now unused list_head @pending in cgroup_apply_cftypes()
  ...
2017-09-06 22:25:25 -07:00
Linus Torvalds
9954d4892a Merge branch 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "Nothing major. I introduced a flag collsion bug during v4.13 cycle
  which is fixed in this pull request. Fortunately, the flag is for
  debugging / verification and the bug isn't critical"

* 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Fix flag collision
  workqueue: Use TASK_IDLE
  workqueue: fix path to documentation
  workqueue: doc change for ST behavior on NUMA systems
2017-09-06 21:59:31 -07:00
Linus Torvalds
a7cbfd05f4 Merge branch 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu updates from Tejun Heo:
 "A lot of changes for percpu this time around. percpu inherited the
  same area allocator from the original pre-virtual-address-mapped
  implementation. This was from the time when percpu allocator wasn't
  used all that much and the implementation was focused on simplicity,
  with the unfortunate computational complexity of O(number of areas
  allocated from the chunk) per alloc / free.

  With the increase in percpu usage, we're hitting cases where the lack
  of scalability is hurting. The most prominent one right now is bpf
  perpcu map creation / destruction which may allocate and free a lot of
  entries consecutively and it's likely that the problem will become
  more prominent in the future.

  To address the issue, Dennis replaced the area allocator with hinted
  bitmap allocator which is more consistent. While the new allocator
  does perform a bit worse in some cases, it outperforms the old
  allocator way more than an order of magnitude in other more common
  scenarios while staying mostly flat in CPU overhead and completely
  flat in memory consumption"

* 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (27 commits)
  percpu: update header to contain bitmap allocator explanation.
  percpu: update pcpu_find_block_fit to use an iterator
  percpu: use metadata blocks to update the chunk contig hint
  percpu: update free path to take advantage of contig hints
  percpu: update alloc path to only scan if contig hints are broken
  percpu: keep track of the best offset for contig hints
  percpu: skip chunks if the alloc does not fit in the contig hint
  percpu: add first_bit to keep track of the first free in the bitmap
  percpu: introduce bitmap metadata blocks
  percpu: replace area map allocator with bitmap
  percpu: generalize bitmap (un)populated iterators
  percpu: increase minimum percpu allocation size and align first regions
  percpu: introduce nr_empty_pop_pages to help empty page accounting
  percpu: change the number of pages marked in the first_chunk pop bitmap
  percpu: combine percpu address checks
  percpu: modify base_addr to be region specific
  percpu: setup_first_chunk rename schunk/dchunk to chunk
  percpu: end chunk area maps page aligned for the populated bitmap
  percpu: unify allocation of schunk and dchunk
  percpu: setup_first_chunk remove dyn_size and consolidate logic
  ...
2017-09-06 21:33:12 -07:00
Linus Torvalds
d34fc1adf0 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - various misc bits

 - DAX updates

 - OCFS2

 - most of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (119 commits)
  mm,fork: introduce MADV_WIPEONFORK
  x86,mpx: make mpx depend on x86-64 to free up VMA flag
  mm: add /proc/pid/smaps_rollup
  mm: hugetlb: clear target sub-page last when clearing huge page
  mm: oom: let oom_reap_task and exit_mmap run concurrently
  swap: choose swap device according to numa node
  mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
  mm, oom: do not rely on TIF_MEMDIE for memory reserves access
  z3fold: use per-cpu unbuddied lists
  mm, swap: don't use VMA based swap readahead if HDD is used as swap
  mm, swap: add sysfs interface for VMA based swap readahead
  mm, swap: VMA based swap readahead
  mm, swap: fix swap readahead marking
  mm, swap: add swap readahead hit statistics
  mm/vmalloc.c: don't reinvent the wheel but use existing llist API
  mm/vmstat.c: fix wrong comment
  selftests/memfd: add memfd_create hugetlbfs selftest
  mm/shmem: add hugetlbfs support to memfd_create()
  mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups
  mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas()
  ...
2017-09-06 20:49:49 -07:00
Andy Lutomirski
1c9fe4409c x86/mm: Document how CR4.PCIDE restore works
While debugging a problem, I thought that using
cr4_set_bits_and_update_boot() to restore CR4.PCIDE would be
helpful.  It turns out to be counterproductive.

Add a comment documenting how this works.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 20:12:57 -07:00