Commit Graph

96477 Commits

Author SHA1 Message Date
Pavel Tatashin
a4a3ede213 mm: zero reserved and unavailable struct pages
Some memory is reserved but unavailable: not present in memblock.memory
(because not backed by physical pages), but present in memblock.reserved.
Such memory has backing struct pages, but they are not initialized by
going through __init_single_page().

In some cases these struct pages are accessed even if they do not
contain any data.  One example is page_to_pfn() might access page->flags
if this is where section information is stored (CONFIG_SPARSEMEM,
SECTION_IN_PAGE_FLAGS).

One example of such memory: trim_low_memory_range() unconditionally
reserves from pfn 0, but e820__memblock_setup() might provide the
exiting memory from pfn 1 (i.e.  KVM).

Since struct pages are zeroed in __init_single_page(), and not during
allocation time, we must zero such struct pages explicitly.

The patch involves adding a new memblock iterator:
	for_each_resv_unavail_range(i, p_start, p_end)

Which iterates through reserved && !memory lists, and we zero struct pages
explicitly by calling mm_zero_struct_page().

===

Here is more detailed example of problem that this patch is addressing:

Run tested on qemu with the following arguments:

	-enable-kvm -cpu kvm64 -m 512 -smp 2

This patch reports that there are 98 unavailable pages.

They are: pfn 0 and pfns in range [159, 255].

Note, trim_low_memory_range() reserves only pfns in range [0, 15], it does
not reserve [159, 255] ones.

e820__memblock_setup() reports linux that the following physical ranges are
available:
    [1 , 158]
[256, 130783]

Notice, that exactly unavailable pfns are missing!

Now, lets check what we have in zone 0: [1, 131039]

pfn 0, is not part of the zone, but pfns [1, 158], are.

However, the bigger problem we have if we do not initialize these struct
pages is with memory hotplug.  Because, that path operates at 2M
boundaries (section_nr).  And checks if 2M range of pages is hot
removable.  It starts with first pfn from zone, rounds it down to 2M
boundary (sturct pages are allocated at 2M boundaries when vmemmap is
created), and checks if that section is hot removable.  In this case
start with pfn 1 and convert it down to pfn 0.  Later pfn is converted
to struct page, and some fields are checked.  Now, if we do not zero
struct pages, we get unpredictable results.

In fact when CONFIG_VM_DEBUG is enabled, and we explicitly set all
vmemmap memory to ones, the following panic is observed with kernel test
without this patch applied:

  BUG: unable to handle kernel NULL pointer dereference at          (null)
  IP: is_pageblock_removable_nolock+0x35/0x90
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT
  ...
  task: ffff88001f4e2900 task.stack: ffffc90000314000
  RIP: 0010:is_pageblock_removable_nolock+0x35/0x90
  Call Trace:
   ? is_mem_section_removable+0x5a/0xd0
   show_mem_removable+0x6b/0xa0
   dev_attr_show+0x1b/0x50
   sysfs_kf_seq_show+0xa1/0x100
   kernfs_seq_show+0x22/0x30
   seq_read+0x1ac/0x3a0
   kernfs_fop_read+0x36/0x190
   ? security_file_permission+0x90/0xb0
   __vfs_read+0x16/0x30
   vfs_read+0x81/0x130
   SyS_read+0x44/0xa0
   entry_SYSCALL_64_fastpath+0x1f/0xbd

Link: http://lkml.kernel.org/r/20171013173214.27300-7-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:05 -08:00
Pavel Tatashin
ea1f5f3712 mm: define memblock_virt_alloc_try_nid_raw
* A new variant of memblock_virt_alloc_* allocations:
memblock_virt_alloc_try_nid_raw()
    - Does not zero the allocated memory
    - Does not panic if request cannot be satisfied

* optimize early system hash allocations

Clients can call alloc_large_system_hash() with flag: HASH_ZERO to
specify that memory that was allocated for system hash needs to be
zeroed, otherwise the memory does not need to be zeroed, and client will
initialize it.

If memory does not need to be zero'd, call the new
memblock_virt_alloc_raw() interface, and thus improve the boot
performance.

* debug for raw alloctor

When CONFIG_DEBUG_VM is enabled, this patch sets all the memory that is
returned by memblock_virt_alloc_try_nid_raw() to ones to ensure that no
places excpect zeroed memory.

Link: http://lkml.kernel.org/r/20171013173214.27300-6-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:05 -08:00
Levin, Alexander (Sasha Levin)
4675ff05de kmemcheck: rip it out
Fix up makefiles, remove references, and git rm kmemcheck.

Link: http://lkml.kernel.org/r/20171007030159.22241-4-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:05 -08:00
Levin, Alexander (Sasha Levin)
d8be75663c kmemcheck: remove whats left of NOTRACK flags
Now that kmemcheck is gone, we don't need the NOTRACK flags.

Link: http://lkml.kernel.org/r/20171007030159.22241-5-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Hansen <devtimhansen@gmail.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:05 -08:00
Levin, Alexander (Sasha Levin)
75f296d93b kmemcheck: stop using GFP_NOTRACK and SLAB_NOTRACK
Convert all allocations that used a NOTRACK flag to stop using it.

Link: http://lkml.kernel.org/r/20171007030159.22241-3-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Hansen <devtimhansen@gmail.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Levin, Alexander (Sasha Levin)
4950276672 kmemcheck: remove annotations
Patch series "kmemcheck: kill kmemcheck", v2.

As discussed at LSF/MM, kill kmemcheck.

KASan is a replacement that is able to work without the limitation of
kmemcheck (single CPU, slow).  KASan is already upstream.

We are also not aware of any users of kmemcheck (or users who don't
consider KASan as a suitable replacement).

The only objection was that since KASAN wasn't supported by all GCC
versions provided by distros at that time we should hold off for 2
years, and try again.

Now that 2 years have passed, and all distros provide gcc that supports
KASAN, kill kmemcheck again for the very same reasons.

This patch (of 4):

Remove kmemcheck annotations, and calls to kmemcheck from the kernel.

[alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs]
  Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com
Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Hansen <devtimhansen@gmail.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Kirill A. Shutemov
af5b0f6a09 mm: consolidate page table accounting
Currently, we account page tables separately for each page table level,
but that's redundant -- we only make use of total memory allocated to
page tables for oom_badness calculation.  We also provide the
information to userspace, but it has dubious value there too.

This patch switches page table accounting to single counter.

mm->pgtables_bytes is now used to account all page table levels.  We use
bytes, because page table size for different levels of page table tree
may be different.

The change has user-visible effect: we don't have VmPMD and VmPUD
reported in /proc/[pid]/status.  Not sure if anybody uses them.  (As
alternative, we can always report 0 kB for them.)

OOM-killer report is also slightly changed: we now report pgtables_bytes
instead of nr_ptes, nr_pmd, nr_puds.

Apart from reducing number of counters per-mm, the benefit is that we
now calculate oom_badness() more correctly for machines which have
different size of page tables depending on level or where page tables
are less than a page in size.

The only downside can be debuggability because we do not know which page
table level could leak.  But I do not remember many bugs that would be
caught by separate counters so I wouldn't lose sleep over this.

[akpm@linux-foundation.org: fix mm/huge_memory.c]
Link: http://lkml.kernel.org/r/20171006100651.44742-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
[kirill.shutemov@linux.intel.com: fix build]
  Link: http://lkml.kernel.org/r/20171016150113.ikfxy3e7zzfvsr4w@black.fi.intel.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Kirill A. Shutemov
c4812909f5 mm: introduce wrappers to access mm->nr_ptes
Let's add wrappers for ->nr_ptes with the same interface as for nr_pmd
and nr_pud.

The patch also makes nr_ptes accounting dependent onto CONFIG_MMU.  Page
table accounting doesn't make sense if you don't have page tables.

It's preparation for consolidation of page-table counters in mm_struct.

Link: http://lkml.kernel.org/r/20171006100651.44742-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Kirill A. Shutemov
b4e98d9ac7 mm: account pud page tables
On a machine with 5-level paging support a process can allocate
significant amount of memory and stay unnoticed by oom-killer and memory
cgroup.  The trick is to allocate a lot of PUD page tables.  We don't
account PUD page tables, only PMD and PTE.

We already addressed the same issue for PMD page tables, see commit
dc6c9a35b6 ("mm: account pmd page tables to the process").
Introduction of 5-level paging brings the same issue for PUD page
tables.

The patch expands accounting to PUD level.

[kirill.shutemov@linux.intel.com: s/pmd_t/pud_t/]
  Link: http://lkml.kernel.org/r/20171004074305.x35eh5u7ybbt5kar@black.fi.intel.com
[heiko.carstens@de.ibm.com: s390/mm: fix pud table accounting]
  Link: http://lkml.kernel.org/r/20171103090551.18231-1-heiko.carstens@de.ibm.com
Link: http://lkml.kernel.org/r/20171002080427.3320-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Jan Kara
67fd707f46 mm: remove nr_pages argument from pagevec_lookup_{,range}_tag()
All users of pagevec_lookup() and pagevec_lookup_range() now pass
PAGEVEC_SIZE as a desired number of pages.  Just drop the argument.

Link: http://lkml.kernel.org/r/20171009151359.31984-15-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Jan Kara
93d3b7140a mm: add variant of pagevec_lookup_range_tag() taking number of pages
Currently pagevec_lookup_range_tag() takes number of pages to look up
but most users don't need this.  Create a new function
pagevec_lookup_range_nr_tag() that takes maximum number of pages to
lookup for Ceph which wants this functionality so that we can drop
nr_pages argument from pagevec_lookup_range_tag().

Link: http://lkml.kernel.org/r/20171009151359.31984-13-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Jan Kara
72b045aecd mm: implement find_get_pages_range_tag()
Patch series "Ranged pagevec tagged lookup", v3.

In this series I provide a ranged variant of pagevec_lookup_tag() and
use it in places where it makes sense.  This series removes some common
code and it also has a potential for speeding up some operations
similarly as for pagevec_lookup_range() (but for now I can think of only
artificial cases where this happens).

This patch (of 16):

Implement a variant of find_get_pages_tag() that stops iterating at
given index.  Lots of users of this function (through pagevec_lookup())
actually want a range lookup and all of them are currently open-coding
this.

Also create corresponding pagevec_lookup_range_tag() function.

Link: http://lkml.kernel.org/r/20171009151359.31984-2-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: David Howells <dhowells@redhat.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Steve French <sfrench@samba.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:03 -08:00
Michal Hocko
8745808fda mm, arch: remove empty_bad_page*
empty_bad_page() and empty_bad_pte_table() seem to be relics from old
days which is not used by any code for a long time.  I have tried to
find when exactly but this is not really all that straightforward due to
many code movements - traces disappear around 2.4 times.

Anyway no code really references neither empty_bad_page nor
empty_bad_pte_table.  We only allocate the storage which is not used by
anybody so remove them.

Link: http://lkml.kernel.org/r/20171004150045.30755-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Ralf Baechle <ralf@linus-mips.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: David Howells <dhowells@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:03 -08:00
Andrey Ryabinin
3a50d14d0d mm: remove unused pgdat->inactive_ratio
Since commit 59dc76b0d4 ("mm: vmscan: reduce size of inactive file
list") 'pgdat->inactive_ratio' is not used, except for printing
"node_inactive_ratio: 0" in /proc/zoneinfo output.

Remove it.

Link: http://lkml.kernel.org/r/20171003152611.27483-1-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:03 -08:00
Jérôme Glisse
4645b9fe84 mm/mmu_notifier: avoid call to invalidate_range() in range_end()
This is an optimization patch that only affect mmu_notifier users which
rely on the invalidate_range() callback.  This patch avoids calling that
callback twice in a row from inside __mmu_notifier_invalidate_range_end

Existing pattern (before this patch):
    mmu_notifier_invalidate_range_start()
        pte/pmd/pud_clear_flush_notify()
            mmu_notifier_invalidate_range()
    mmu_notifier_invalidate_range_end()
        mmu_notifier_invalidate_range()

New pattern (after this patch):
    mmu_notifier_invalidate_range_start()
        pte/pmd/pud_clear_flush_notify()
            mmu_notifier_invalidate_range()
    mmu_notifier_invalidate_range_only_end()

We call the invalidate_range callback after clearing the page table
under the page table lock and we skip the call to invalidate_range
inside the __mmu_notifier_invalidate_range_end() function.

Idea from Andrea Arcangeli

Link: http://lkml.kernel.org/r/20171017031003.7481-3-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alistair Popple <alistair@popple.id.au>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:03 -08:00
Jérôme Glisse
0f10851ea4 mm/mmu_notifier: avoid double notification when it is useless
This patch only affects users of mmu_notifier->invalidate_range callback
which are device drivers related to ATS/PASID, CAPI, IOMMUv2, SVM ...
and it is an optimization for those users.  Everyone else is unaffected
by it.

When clearing a pte/pmd we are given a choice to notify the event under
the page table lock (notify version of *_clear_flush helpers do call the
mmu_notifier_invalidate_range).  But that notification is not necessary
in all cases.

This patch removes almost all cases where it is useless to have a call
to mmu_notifier_invalidate_range before
mmu_notifier_invalidate_range_end.  It also adds documentation in all
those cases explaining why.

Below is a more in depth analysis of why this is fine to do this:

For secondary TLB (non CPU TLB) like IOMMU TLB or device TLB (when
device use thing like ATS/PASID to get the IOMMU to walk the CPU page
table to access a process virtual address space).  There is only 2 cases
when you need to notify those secondary TLB while holding page table
lock when clearing a pte/pmd:

  A) page backing address is free before mmu_notifier_invalidate_range_end
  B) a page table entry is updated to point to a new page (COW, write fault
     on zero page, __replace_page(), ...)

Case A is obvious you do not want to take the risk for the device to write
to a page that might now be used by something completely different.

Case B is more subtle. For correctness it requires the following sequence
to happen:
  - take page table lock
  - clear page table entry and notify (pmd/pte_huge_clear_flush_notify())
  - set page table entry to point to new page

If clearing the page table entry is not followed by a notify before setting
the new pte/pmd value then you can break memory model like C11 or C++11 for
the device.

Consider the following scenario (device use a feature similar to ATS/
PASID):

Two address addrA and addrB such that |addrA - addrB| >= PAGE_SIZE we
assume they are write protected for COW (other case of B apply too).

[Time N] -----------------------------------------------------------------
CPU-thread-0  {try to write to addrA}
CPU-thread-1  {try to write to addrB}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {read addrA and populate device TLB}
DEV-thread-2  {read addrB and populate device TLB}
[Time N+1] ---------------------------------------------------------------
CPU-thread-0  {COW_step0: {mmu_notifier_invalidate_range_start(addrA)}}
CPU-thread-1  {COW_step0: {mmu_notifier_invalidate_range_start(addrB)}}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {}
DEV-thread-2  {}
[Time N+2] ---------------------------------------------------------------
CPU-thread-0  {COW_step1: {update page table point to new page for addrA}}
CPU-thread-1  {COW_step1: {update page table point to new page for addrB}}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {}
DEV-thread-2  {}
[Time N+3] ---------------------------------------------------------------
CPU-thread-0  {preempted}
CPU-thread-1  {preempted}
CPU-thread-2  {write to addrA which is a write to new page}
CPU-thread-3  {}
DEV-thread-0  {}
DEV-thread-2  {}
[Time N+3] ---------------------------------------------------------------
CPU-thread-0  {preempted}
CPU-thread-1  {preempted}
CPU-thread-2  {}
CPU-thread-3  {write to addrB which is a write to new page}
DEV-thread-0  {}
DEV-thread-2  {}
[Time N+4] ---------------------------------------------------------------
CPU-thread-0  {preempted}
CPU-thread-1  {COW_step3: {mmu_notifier_invalidate_range_end(addrB)}}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {}
DEV-thread-2  {}
[Time N+5] ---------------------------------------------------------------
CPU-thread-0  {preempted}
CPU-thread-1  {}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {read addrA from old page}
DEV-thread-2  {read addrB from new page}

So here because at time N+2 the clear page table entry was not pair with a
notification to invalidate the secondary TLB, the device see the new value
for addrB before seing the new value for addrA.  This break total memory
ordering for the device.

When changing a pte to write protect or to point to a new write protected
page with same content (KSM) it is ok to delay invalidate_range callback
to mmu_notifier_invalidate_range_end() outside the page table lock.  This
is true even if the thread doing page table update is preempted right
after releasing page table lock before calling
mmu_notifier_invalidate_range_end

Thanks to Andrea for thinking of a problematic scenario for COW.

[jglisse@redhat.com: v2]
  Link: http://lkml.kernel.org/r/20171017031003.7481-2-jglisse@redhat.com
Link: http://lkml.kernel.org/r/20170901173011.10745-1-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alistair Popple <alistair@popple.id.au>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:03 -08:00
Ralph Campbell
0bea803e9e mm/hmm: constify hmm_devmem_page_get_drvdata() parameter
Constify pointer parameter to avoid issue when use from code that only
has const struct page pointer to use in the first place.

Link: http://lkml.kernel.org/r/1506972774-10191-1-git-send-email-jglisse@redhat.com
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:03 -08:00
Gioh Kim
66e8b438bd mm/memblock.c: make the index explicit argument of for_each_memblock_type
for_each_memblock_type macro function relies on idx variable defined in
the caller context.  Silent macro arguments are almost always wrong
thing to do.  They make code harder to read and easier to get wrong.
Let's use an explicit iterator parameter for for_each_memblock_type and
make the code more obious.  This patch is a mere cleanup and it
shouldn't introduce any functional change.

Link: http://lkml.kernel.org/r/20170913133029.28911-1-gi-oh.kim@profitbricks.com
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:02 -08:00
Michal Hocko
4da2ce250f mm: distinguish CMA and MOVABLE isolation in has_unmovable_pages()
Joonsoo has noticed that "mm: drop migrate type checks from
has_unmovable_pages" would break CMA allocator because it relies on
has_unmovable_pages returning false even for CMA pageblocks which in
fact don't have to be movable:

 alloc_contig_range
   start_isolate_page_range
     set_migratetype_isolate
       has_unmovable_pages

This is a result of the code sharing between CMA and memory hotplug
while each one has a different idea of what has_unmovable_pages should
return.  This is unfortunate but fixing it properly would require a lot
of code duplication.

Fix the issue by introducing the requested migrate type argument and
special case MIGRATE_CMA case where CMA page blocks are handled
properly.  This will work for memory hotplug because it requires
MIGRATE_MOVABLE.

Link: http://lkml.kernel.org/r/20171019122118.y6cndierwl2vnguj@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Tested-by: Ran Wang <ran.wang_1@nxp.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:02 -08:00
Minchan Kim
aa8d22a11d mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped page has no other reference
When SWP_SYNCHRONOUS_IO swapped-in pages are shared by several
processes, it can cause unnecessary memory wastage by skipping swap
cache.  Because, with swapin fault by read, they could share a page if
the page were in swap cache.  Thus, it avoids allocating same content
new pages.

This patch makes the swapcache skipping work only if the swap pte is
non-sharable.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1507620825-5537-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:02 -08:00
Minchan Kim
0bcac06f27 mm, swap: skip swapcache for swapin of synchronous device
With fast swap storage, the platforms want to use swap more aggressively
and swap-in is crucial to application latency.

The rw_page() based synchronous devices like zram, pmem and btt are such
fast storage.  When I profile swapin performance with zram lz4
decompress test, S/W overhead is more than 70%.  Maybe, it would be
bigger in nvdimm.

This patch aims to reduce swap-in latency by skipping swapcache if the
swap device is synchronous device like rw_page based device.  It
enhances 45% my swapin test(5G sequential swapin, no readahead, from
2.41sec to 1.64sec).

Link: http://lkml.kernel.org/r/1505886205-9671-5-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:02 -08:00
Minchan Kim
539a6fea7f mm, swap: introduce SWP_SYNCHRONOUS_IO
If rw-page based fast storage is used for swap devices, we need to
detect it to enhance swap IO operations.  This patch is preparation for
optimizing of swap-in operation with next patch.

Link: http://lkml.kernel.org/r/1505886205-9671-4-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:02 -08:00
Minchan Kim
23c47d2ada bdi: introduce BDI_CAP_SYNCHRONOUS_IO
As discussed at

  https://lkml.kernel.org/r/<20170728165604.10455-1-ross.zwisler@linux.intel.com>

someday we will remove rw_page().  If so, we need something to detect
such super-fast storage on which synchronous IO operations like the
current rw_page are always a win.

Introduces BDI_CAP_SYNCHRONOUS_IO to indicate such devices.  With it, we
could use various optimization techniques.

Link: http://lkml.kernel.org/r/1505886205-9671-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:02 -08:00
Changbin Du
41710443f7 mm: update comments for struct page.mapping
struct page.mapping can be NULL or points to one object of type
address_space, anon_vma or KSM private structure.

Link: http://lkml.kernel.org/r/1506485067-15954-1-git-send-email-changbin.du@intel.com
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:02 -08:00
Johannes Thumshirn
5799b255c4 include/linux/slab.h: add kmalloc_array_node() and kcalloc_node()
Patch series "Add kmalloc_array_node() and kcalloc_node()".

Our current memeory allocation routines suffer form an API imbalance,
for one we have kmalloc_array() and kcalloc() which check for overflows
in size multiplication and we have kmalloc_node() and kzalloc_node()
which allow for memory allocation on a certain NUMA node but don't check
for eventual overflows.

This patch (of 6):

We have kmalloc_array() and kcalloc() wrappers on top of kmalloc() which
ensure us overflow free multiplication for the size of a memory
allocation but these implementations are not NUMA-aware.

Likewise we have kmalloc_node() which is a NUMA-aware version of
kmalloc() but the implementation is not aware of any possible overflows
in eventual size calculations.

Introduce a combination of the two above cases to have a NUMA-node aware
version of kmalloc_array() and kcalloc().

Link: http://lkml.kernel.org/r/20170927082038.3782-2-jthumshirn@suse.de
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mike Marciniszyn <infinipath@intel.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:01 -08:00
Alexey Dobriyan
4fd0b46e89 slab, slub, slob: convert slab_flags_t to 32-bit
struct kmem_cache::flags is "unsigned long" which is unnecessary on
64-bit as no flags are defined in the higher bits.

Switch the field to 32-bit and save some space on x86_64 until such
flags appear:

	add/remove: 0/0 grow/shrink: 0/107 up/down: 0/-657 (-657)
	function                                     old     new   delta
	sysfs_slab_add                               720     719      -1
				...
	check_object                                 699     676     -23

[akpm@linux-foundation.org: fix printk warning]
Link: http://lkml.kernel.org/r/20171021100635.GA8287@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:01 -08:00
Alexey Dobriyan
d50112edde slab, slub, slob: add slab_flags_t
Add sparse-checked slab_flags_t for struct kmem_cache::flags (SLAB_POISON,
etc).

SLAB is bloated temporarily by switching to "unsigned long", but only
temporarily.

Link: http://lkml.kernel.org/r/20171021100225.GA22428@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:01 -08:00
Linus Torvalds
c9b012e5f4 Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
 "The big highlight is support for the Scalable Vector Extension (SVE)
  which required extensive ABI work to ensure we don't break existing
  applications by blowing away their signal stack with the rather large
  new vector context (<= 2 kbit per vector register). There's further
  work to be done optimising things like exception return, but the ABI
  is solid now.

  Much of the line count comes from some new PMU drivers we have, but
  they're pretty self-contained and I suspect we'll have more of them in
  future.

  Plenty of acronym soup here:

   - initial support for the Scalable Vector Extension (SVE)

   - improved handling for SError interrupts (required to handle RAS
     events)

   - enable GCC support for 128-bit integer types

   - remove kernel text addresses from backtraces and register dumps

   - use of WFE to implement long delay()s

   - ACPI IORT updates from Lorenzo Pieralisi

   - perf PMU driver for the Statistical Profiling Extension (SPE)

   - perf PMU driver for Hisilicon's system PMUs

   - misc cleanups and non-critical fixes"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (97 commits)
  arm64: Make ARMV8_DEPRECATED depend on SYSCTL
  arm64: Implement __lshrti3 library function
  arm64: support __int128 on gcc 5+
  arm64/sve: Add documentation
  arm64/sve: Detect SVE and activate runtime support
  arm64/sve: KVM: Hide SVE from CPU features exposed to guests
  arm64/sve: KVM: Treat guest SVE use as undefined instruction execution
  arm64/sve: KVM: Prevent guests from using SVE
  arm64/sve: Add sysctl to set the default vector length for new processes
  arm64/sve: Add prctl controls for userspace vector length management
  arm64/sve: ptrace and ELF coredump support
  arm64/sve: Preserve SVE registers around EFI runtime service calls
  arm64/sve: Preserve SVE registers around kernel-mode NEON use
  arm64/sve: Probe SVE capabilities and usable vector lengths
  arm64: cpufeature: Move sys_caps_initialised declarations
  arm64/sve: Backend logic for setting the vector length
  arm64/sve: Signal handling support
  arm64/sve: Support vector length resetting for new processes
  arm64/sve: Core task context handling
  arm64/sve: Low-level CPU setup
  ...
2017-11-15 10:56:56 -08:00
Linus Torvalds
b293fca43b Merge tag 'riscv-for-linus-4.15-arch-v9-premerge' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux
Pull RISC-V architecture support from Palmer Dabbelt:
 "This contains the core RISC-V Linux port, which has been through nine
  rounds of review on various mailing lists. The port is not complete:
  there's some cleanup patches moving through the review process, a
  whole bunch of drivers that need some work, and a lot of feature
  additions that will be needed.

  The patches contained in this tag have been through nine rounds of
  review on the various mailing lists. I have some outstanding cleanup
  patches, but since there's been so much review on these patches I
  thought it would be best to submit them as-is and then submit explicit
  cleanup patches so everyone can review them. This first patch set is
  big enough that it's a bit of a pain to constantly rewrite, and it's
  caused a few headaches with various contributors.

  The port is definately a work in progress. While what's there builds
  and boots with 4.14, it's a bit hard to actually see anything happen
  because there are no device drivers yet. I maintain a staging branch
  that contains all the device drivers and cleanup that actually works,
  but those patches won't all be ready for a while. I'd like to get what
  we currently have into your tree so everyone can start working from a
  single base -- of particular importance is allowing the glibc
  upstreaming process to proceed so we can sort out any possibly
  lingering user-visible ABI problems we might have.

  Copied below is the ChangeLog that contains the history of this patch
  set:

   (v9) As per suggestions on our v8 patch set, I've split the core
        architecture code out from our drivers and would like to submit
        this patch set to be included into linux-next, with the goal
        being to be merged in during the next merge window. This patch
        set is based on 4.14-rc2, but if it's better to have it based on
        something else then I can change it around.

        This patch set contains just the core arch code for RISC-V, so
        while it builds an nominally boots, you can't print or take an
        interrupt so it's not that useful. If you're looking to actually
        boot a system it would probably be better to use the full patch
        set listed below.

        We've collected a handful of tags from reviewers, and the
        remainder of the patch set only got minimal feedback last time.
        Here's what changed:

         - We now use the device tree to initialize the timer driver so
           it's less tighly coupled with the arch port.

         - I cleaned up the defconfigs -- there's actually now just one,
           and it's empty. For now I think we're OK with what the kernel
           sets as defaults, but I anticipate we'll begin to expand this
           as people start to use the port more.

         - The VDSO symbols version is sane.

         - We WFI while spinning in the boot loop.

         - A handful of comments have been added.

        While there are still a handful of FIXMEs in this patch set,
        we've started to get enough interest from various users and
        contributors that maintaining an out of tree patch set is
        starting to become a big burden. Hopefully the patches are good
        enough to merge now, which will at least get everyone working in
        a more reasonable manner as we clean up the remaining issues.

   (v8) I know it may not be the ideal time to submit a patch set right
        now, as it's the middle of the merge window, but things have
        calmed down quite a bit in the last month so I thought it would
        be good to get everyone on the same page. There's been a handful
        of changes since the last patch set, but most of them are fairly
        minor:

         - We changed PAGE_OFFSET to allowing mapping more physical
           memory on 64-bit systems. This is user configurable, as it
           triggers a different code model that generates slightly less
           efficient code.

         - The device tree binding documentation is back, I'd managed to
           lose it at some point.

         - We now pass the atomic64 test suite

         - The SBI timer driver has been refactored.

   (v7) It's been a while since my last patch set, but the changes han
        been fairly minimal:

         - The PCI cleanup patches have been dropped, we'll do them as a
           separate patch set later.

         - We've the Kconfig entries from CONFIG_ISA_* to
           CONFIG_RISCV_ISA_*, to make grep easier.

         - There have been a handful of memory model related tweaks in
           I/O land, particularly relating the PCI and the upcoming
           platform specification. There are significant comments in the
           relevant files. This is still a WIP, but I think we're close
           to getting as good as we're going to get until we end up with
           some more specifications.

   (v6) As it's been only a day since the v5 patch set, the changes are
        pretty minimal:

         - The patch set is now based on linux-next/master, which I
           believe is a better base now that we're getting closer to
           upstream.

         - EARLY_PRINTK is no longer an option. Since the SBI console is
           reasonable, there's no penalty to enabling it (and thus no
           benefit to disabling it).

         - The mmap syscalls were refactored a bit.

   (v5) Things have really started to calm down, so this is fairly
        similar to the v4 patch set. The most interesting changes
        include:

         - We've moved back to a single patch set.

         - SMP support has been fixed, I was accidentally running on a
           non-SMP configuration. There were various mistakes all over
           the tree as a result of this.

         - The cmpxchg syscalls have been removed, as they were deemed a
           bad idea. As a result, RISC-V Linux systems mandate the A
           extension. The corresponding Kconfig entry to enable builds
           on non-A systems has been removed.

         - A few more atomic fixes: mostly fence changes, but those
           resulted in a handful of additional macros that were no
           longer necessary.

         - riscv_early_sie has been removed.

   (v4) There have only been a few changes since the v3 patch set:

         - The cmpxchg64 syscall is no longer enabled on 32-bit systems.
           It's not possible to provide this on SMP systems, and it's
           not necessary as glibc knows not to call it.

         - We provide a ELF_HWCAP so users can determine the ISA of the
           machine the kernel is running on.

         - The multi-line comments are in a better form.

         - There were a handful of headers that could be replaced with
           the asm-generic versions, and a few unnecessary definitions.

         - We no longer use printk, but instead use pr_*.

         - A few Kconfig and defconfig entries have been cleaned up.

   (v3) A highlight of the changes since the v2 patch set includes:

         - We've split out all our drivers into separate patch sets,
           which I've already sent out to the relevant maintainers. I
           haven't included those patches in this patch set, but some of
           them are necessary to build our port.

         - The patch set is now split up differently: rather than being
           split per directory it is split per topic. Hopefully this
           will make it easier to review the port on the mailing list.
           The split is a bit rough, so you probably still want to look
           at the patch set as a whole.

         - atomic.h has been completely rewritten and is hopefully now
           correct. I've attempted to sanitize the various other memory
           model related code as well, and I think it should all be sane
           now aside from a handful of FIXMEs commented in the code.

         - We've changed the cmpexchg syscall to always exist and to not
           be multiplexed. There is also a VDSO entry for compare and
           exchange, which allows kernels with the A extension to
           execute user code without the A extension reasonably fast.

         - Our user-visible register state now contains enough space for
           the Q extension for 128-bit floating point, as well as a few
           words to allow extensibility to future ISA extensions like
           the eventual V extension for vectors.

         - A handful of driver cleanups, but these have been split into
           separate patch sets now so I won't duplicate them here.

   (v2) A highlight of the changes since the v1 patch set includes:

         - We've split out our drivers into the right places, which
           means now there's a lot more patches. I'll be submitting
           these patches to various subsystem maintainers and including
           them in any future RISC-V patch sets until they've been
           merged.

         - The SBI console driver has been completely rewritten to use
           the HVC helpers and is now significantly smaller.

         - We've begun to use weaker barriers as opposed to just the big
           "fence". There's still some work to do here, specifically:
            - We need fences in the relaxed MMIO functions.
            - The non-relaxed MMIO functions are missing R/W bits on their fences.
            - Many AMOs need the aq and rl bits set.

         - We now have thread_info in task_struct. As a result, sscratch
           now contains TP instead of SP. This was necessary because
           thread_info is no longer on the stack.

         - A few shared routines have been added that we use instead of
           creating another arch copy"

Reviewed-by: Arnd Bergmann <arnd@arndb.de>

* tag 'riscv-for-linus-4.15-arch-v9-premerge' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux:
  RISC-V: Build Infrastructure
  RISC-V: User-facing API
  RISC-V: Paging and MMU
  RISC-V: Device, timer, IRQs, and the SBI
  RISC-V: Task implementation
  RISC-V: ELF and module implementation
  RISC-V: Generic library routines and assembly
  RISC-V: Atomic and Locking Code
  RISC-V: Init and Halt Code
  dt-bindings: RISC-V CPU Bindings
  lib: Add shared copies of some GCC library routines
  MAINTAINERS: Add RISC-V
2017-11-15 10:49:15 -08:00
Linus Torvalds
0ef76878cf Merge branch 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching updates from Jiri Kosina:

 - shadow variables support, allowing livepatches to associate new
   "shadow" fields to existing data structures, from Joe Lawrence

 - pre/post patch callbacks API, allowing livepatch writers to register
   callbacks to be called before and after patch application, from Joe
   Lawrence

* 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch: __klp_disable_patch() should never be called for disabled patches
  livepatch: Correctly call klp_post_unpatch_callback() in error paths
  livepatch: add transition notices
  livepatch: move transition "complete" notice into klp_complete_transition()
  livepatch: add (un)patch callbacks
  livepatch: Small shadow variable documentation fixes
  livepatch: __klp_shadow_get_or_alloc() is local to shadow.c
  livepatch: introduce shadow variable API
2017-11-15 10:21:58 -08:00
Linus Torvalds
9682b3dea2 Merge branch 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "The usual rocket-science from trivial tree for 4.15"

* 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  MAINTAINERS: relinquish kconfig
  MAINTAINERS: Update my email address
  treewide: Fix typos in Kconfig
  kfifo: Fix comments
  init/Kconfig: Fix module signing document location
  misc: ibmasm: Return error on error path
  HID: logitech-hidpp: fix mistake in printk, "feeback" -> "feedback"
  MAINTAINERS: Correct path to uDraw PS3 driver
  tracing: Fix doc mistakes in trace sample
  tracing: Kconfig text fixes for CONFIG_HWLAT_TRACER
  MIPS: Alchemy: Remove reverted CONFIG_NETLINK_MMAP from db1xxx_defconfig
  mm/huge_memory.c: fixup grammar in comment
  lib/xz: Add fall-through comments to a switch statement
2017-11-15 10:14:11 -08:00
Linus Torvalds
20df15783a Merge branch 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID updates from Jiri Kosina:

 - high resolution mode for Dell canvas support, from Benjamin Tissoires

 - pen handling fixes for the Wacom driver, from Jason Gerecke

 - i2c-hid: Apollo-Lake based laptops improvements, from Hans de Goede

 - Input/Core: eraser tool support, from Ping Cheng

 - new ALPS touchpad (T4, found currently on HP EliteBook 1000, Zbook
   Stduio and HP Elite book x360) supportm from Masaki Ota

 - other smaller assorted fixes

* 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (33 commits)
  HID: cp2112: fix broken gpio_direction_input callback
  HID: cp2112: fix interface specification URL
  HID: Wacom: switch Dell canvas into highres mode
  HID: wacom: generic: Send BTN_STYLUS3 when both barrel switches are set
  HID: sony: Fix SHANWAN pad rumbling on USB
  HID: i2c-hid: Add no-irq-after-reset quirk for 0911:5288 device
  HID: add backlight level quirk for Asus ROG laptops
  HID: cp2112: add HIDRAW dependency
  HID: Add ID 044f:b605 ThrustMaster, Inc. force feedback Racing Wheel
  HID: hid-logitech: remove redundant assignment to pointer value
  HID: wacom: generic: Recognize WACOM_HID_WD_PEN as a type of pen collection
  HID: rmi: Check that a device is a RMI device before calling RMI functions
  HID: add multi-input quirk for GamepadBlock
  HID: alps: add new U1 device ID
  HID: alps: add support for Alps T4 Touchpad device
  HID: alps: remove variables local to u1_init() from the device struct
  HID: alps: properly handle max_fingers and minimum on X and Y axis
  HID: alps: Separate U1 device code
  HID: alps: delete unnecessary struct u1_dev devInfo
  HID: usbhid: Convert timers to use timer_setup()
  ...
2017-11-15 09:43:57 -08:00
Jiri Kosina
01125b2d1f Merge branch 'for-4.15/wacom' into for-linus
- High resolution mode for DEll canvas support, from Benjamin Tissoires
- A lot of improvements to pen handling in the Wacom driver, from Jason Gerecke

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-11-15 11:14:23 +01:00
Jiri Kosina
6ed7a70be5 Merge branch 'for-4.15/upstream' into for-linus
- cp2112: GPIO error handling and Kconfig fixes from Sébastien Szymanski
- i2c-hid: fixup / quirk for Apollo-Lake based laptops, from Hans de Goede
- Input/Core: add eraser tool support, from Ping Cheng
- small assorted code fixes

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-11-15 11:10:38 +01:00
Jiri Kosina
ea3bbd0a21 Merge branch 'for-4.15/multitouch' into for-linus
- make sure that we forward MSC_TIMESTAMP in accordance to the specification,
  from Nicolas Boichat

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-11-15 11:09:23 +01:00
Linus Torvalds
37cb8e1f8e Merge tag 'devicetree-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DeviceTree updates from Rob Herring:
 "A bigger diffstat than usual with the kbuild changes and a tree wide
  fix in the binding documentation.

  Summary:

   - kbuild cleanups and improvements for dtbs

   - Code clean-up of overlay code and fixing for some long standing
     memory leak and race condition in applying overlays

   - Improvements to DT memory usage making sysfs/kobjects optional and
     skipping unflattening of disabled nodes. This is part of kernel
     tinification efforts.

   - Final piece of removing storing the full path for every DT node.
     The prerequisite conversion of printk's to use device_node format
     specifier happened in 4.14.

   - Sync with current upstream dtc. This brings additional checks to
     dtb compiling.

   - Binding doc tree wide removal of leading 0s from examples

   - RTC binding documentation adding missing devices and some
     consolidation of duplicated bindings

   - Vendor prefix documentation for nutsboard, Silicon Storage
     Technology, shimafuji, Tecon Microprocessor Technologies, DH
     electronics GmbH, Opal Kelly, and Next Thing"

* tag 'devicetree-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (55 commits)
  dt-bindings: usb: add #phy-cells to usb-nop-xceiv
  dt-bindings: Remove leading zeros from bindings notation
  kbuild: handle dtb-y and CONFIG_OF_ALL_DTBS natively in Makefile.lib
  MIPS: dts: remove bogus bcm96358nb4ser.dtb from dtb-y entry
  kbuild: clean up *.dtb and *.dtb.S patterns from top-level Makefile
  .gitignore: move *.dtb and *.dtb.S patterns to the top-level .gitignore
  .gitignore: sort normal pattern rules alphabetically
  dt-bindings: add vendor prefix for Next Thing Co.
  scripts/dtc: Update to upstream version v1.4.5-6-gc1e55a5513e9
  of: dynamic: fix memory leak related to properties of __of_node_dup
  of: overlay: make pr_err() string unique
  of: overlay: pr_err from return NOTIFY_OK to overlay apply/remove
  of: overlay: remove unneeded check for NULL kbasename()
  of: overlay: remove a dependency on device node full_name
  of: overlay: simplify applying symbols from an overlay
  of: overlay: avoid race condition between applying multiple overlays
  of: overlay: loosen overly strict phandle clash check
  of: overlay: expand check of whether overlay changeset can be removed
  of: overlay: detect cases where device tree may become corrupt
  of: overlay: minor restructuring
  ...
2017-11-14 18:25:40 -08:00
Linus Torvalds
6a77d86655 Merge tag 'leds_for_4.15rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds
Pull LED updates from Jacek Anaszewski:
 "New LED class driver:
   - add a driver for PC Engines APU/APU2 LEDs

  New LED trigger:
   - add a system activity LED trigger

  LED core improvements:
   - replace flags bit shift with BIT() macros

  Convert timers to use timer_setup() in:
   - led-core
   - ledtrig-activity
   - ledtrig-heartbeat
   - ledtrig-transient

  LED class drivers fixes:
   - lp55xx: fix spelling mistake: 'cound' -> 'could'
   - tca6507: Remove unnecessary reg check
   - pca955x: Don't invert requested value in pca955x_gpio_set_value()

  LED documentation improvements:
   - update 00-INDEX file"

* tag 'leds_for_4.15rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
  leds: Add driver for PC Engines APU/APU2 LEDs
  leds: lp55xx: fix spelling mistake: 'cound' -> 'could'
  leds: Convert timers to use timer_setup()
  Documentation: leds: Update 00-INDEX file
  leds: tca6507: Remove unnecessary reg check
  leds: ledtrig-heartbeat: Convert timers to use timer_setup()
  leds: Replace flags bit shift with BIT() macros
  leds: pca955x: Don't invert requested value in pca955x_gpio_set_value()
  leds: ledtrig-activity: Add a system activity LED trigger
2017-11-14 18:09:31 -08:00
Linus Torvalds
9f7a9b1191 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:

 - three new touchscreen drivers: EETI EXC3000, HiDeep, and Samsung
   S6SY761

 - the timer API conversion (setup_timer() -> timer_setup())

 - a few drivers swiytched to using managed API for creating custom
   device attributes

 - other assorted fixed and cleanups.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (50 commits)
  Input: gamecon - mark expected switch fall-throughs
  Input: sidewinder - mark expected switch fall-throughs
  Input: spaceball - mark expected switch fall-throughs
  Input: uinput - unlock on allocation failure in ioctl
  Input: add support for the Samsung S6SY761 touchscreen
  Input: add support for HiDeep touchscreen
  Input: st1232 - remove obsolete platform device support
  Input: convert autorepeat timer to use timer_setup()
  media: ttpci: remove autorepeat handling and use timer_setup
  Input: cyttsp4 - avoid overflows when calculating memory sizes
  Input: mxs-lradc - remove redundant assignment to pointer input
  Input: add I2C attached EETI EXC3000 multi touch driver
  Input: goodix - support gt1151 touchpanel
  Input: ps2-gpio - actually abort probe when connected to sleeping GPIOs
  Input: hil_mlc - convert to using timer_setup()
  Input: hp_sdc - convert to using timer_setup()
  Input: touchsceen - convert timers to use timer_setup()
  Input: keyboard - convert timers to use timer_setup()
  Input: uinput - fold header into the driver proper
  Input: uinput - remove uinput_allocate_device()
  ...
2017-11-14 18:07:18 -08:00
Linus Torvalds
4e4510fec4 Merge tag 'sound-4.15-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
 "There are no big surprising changes in this cycle, yet not too boring,
  either. The biggest change from diffstat POV is the removal of the
  legacy OSS driver codes that have been already disabled for a long
  time. This will bring a few trivial merge conflicts.

  As new features in ASoC side, there are two things: a new AC97 bus
  implementation and AMD Stony platform support. Both include the
  relevant changes shared with other subsystems, e.g. AC97 MFD changes
  and DRM AMD changes.

  Some other highlighted topics are:

   - A bunch of USB-audio drivers got the hardening against the
     malicious device accesses with a new helper code for endpoint
     sanity check

   - Lots of cleanups for ASoC Intel platform code, including support
     for their open source audio firmware

   - Continued ASoC core componentization works

   - Support for scaling MCLK with sample rate in ASoC simple-card

   - Stabler PCM hot-unplug capability, especially for ASoC usages"

* tag 'sound-4.15-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (302 commits)
  Documentation: sound: hd-audio: notes.rst
  ASoC: bcm2835: Support left/right justified and DSP modes
  ASoC: bcm2835: Enforce full symmetry
  ASoC: bcm2835: Support additional samplerates up to 384kHz
  ASoC: bcm2835: Add support for TDM modes
  ASoC: add mclk-fs support to audio graph card
  ASoC: add mclk-fs to audio graph card binding
  ASoC: rt5514: work around link error
  ASoC: rt5514: mark PM functions as __maybe_unused
  ASoC: rt5663: Check the JD status in the button pushing
  ASoC: amd: Modified DMA transfer Mechanism for Playback
  ASoC: rt5645: Wait for 400msec before concluding on value of RT5645_VENDOR_ID2
  ASoC: sun4i-codec: fixed 32bit audio capture support for H3/H2+
  ASoC: da7213: add support for DSP modes
  ASoC: sun8i-codec: Add a comment on the LRCK inversion
  ASoC: sun8i-codec: Set the BCLK divider
  ASoC: rt5663: Delay and retry reading rt5663 ID register
  ASoC: amd: use do_div rather than 64 bit division to fix 32 bit builds
  ASoC: cs42l56: Fix reset GPIO name in example DT binding
  ASoC: rt5514-spi: check irq status to schedule data copy in resume function
  ...
2017-11-14 18:01:46 -08:00
Linus Torvalds
4008e6a9bc Merge branch 'i2c/for-4.15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
 "This contains two bigger than usual tree-wide changes this time. They
  all have proper acks, caused no merge conflicts in linux-next where
  they have been for a while. They are namely:

   - to-gpiod conversion of the i2c-gpio driver and its users (touching
     arch/* and drivers/mfd/*)

   - adding a sbs-manager based on I2C core updates to SMBus alerts
     (touching drivers/power/*)

  Other notable changes:

   - i2c_boardinfo can now carry a dev_name to be used when the device
     is created. This is because some devices in ACPI world need fixed
     names to find the regulators.

   - the designware driver got a long discussed overhaul of its PM
     handling. img-scb and davinci got PM support, too.

   - at24 driver has way better OF support. And it has a new maintainer.
     Thanks Bartosz for stepping up!

  The rest is regular driver updates and fixes"

* 'i2c/for-4.15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (55 commits)
  ARM: sa1100: simpad: Correct I2C GPIO offsets
  i2c: aspeed: Deassert reset in probe
  eeprom: at24: Add OF device ID table
  MAINTAINERS: new maintainer for AT24 driver
  i2c: nuc900: remove platform_data, too
  i2c: thunderx: Remove duplicate NULL check
  i2c: taos-evm: Remove duplicate NULL check
  i2c: Make i2c_unregister_device() NULL-aware
  i2c: xgene-slimpro: Support v2
  i2c: mpc: remove useless variable initialization
  i2c: omap: Trigger bus recovery in lockup case
  i2c: gpio: Add support for named gpios in DT
  dt-bindings: i2c: i2c-gpio: Add support for named gpios
  i2c: gpio: Local vars in probe
  i2c: gpio: Augment all boardfiles to use open drain
  i2c: gpio: Enforce open drain through gpiolib
  gpio: Make it possible for consumers to enforce open drain
  i2c: gpio: Convert to use descriptors
  power: supply: sbs-message: fix some code style issues
  power: supply: sbs-battery: remove unchecked return var
  ...
2017-11-14 17:52:21 -08:00
Linus Torvalds
6aa2f9441f Merge tag 'gpio-v4.15-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
 "This is the bulk of GPIO changes for the v4.15 kernel cycle:

  Core:

   - Fix the semantics of raw GPIO to actually be raw. No inversion
     semantics as before, but also no open draining, and allow the raw
     operations to affect lines used for interrupts as the caller
     supposedly knows what they are doing if they are getting the big
     hammer.

   - Rewrote the __inner_function() notation calls to names that make
     more sense. I just find this kind of code disturbing.

   - Drop the .irq_base() field from the gpiochip since now all IRQs are
     mapped dynamically. This is nice.

   - Support for .get_multiple() in the core driver API. This allows us
     to read several GPIO lines with a single register read. This has
     high value for some usecases: it can be used to create
     oscilloscopes and signal analyzers and other things that rely on
     reading several lines at exactly the same instant. Also a generally
     nice optimization. This uses the new assign_bit() macro from the
     bitops lib that was ACKed by Andrew Morton and is implemented for
     two drivers, one of them being the generic MMIO driver so everyone
     using that will be able to benefit from this.

   - Do not allow requests of Open Drain and Open Source setting of a
     GPIO line simultaneously. If the hardware actually supports
     enabling both at the same time the electrical result would be
     disastrous.

   - A new interrupt chip core helper. This will be helpful to deal with
     "banked" GPIOs, which means GPIO controllers with several logical
     blocks of GPIO inside them. This is several gpiochips per device in
     the device model, in contrast to the case when there is a 1-to-1
     relationship between a device and a gpiochip.

  New drivers:

   - Maxim MAX3191x industrial serializer, a very interesting piece of
     professional I/O hardware.

   - Uniphier GPIO driver. This is the GPIO block from the recent
     Socionext (ex Fujitsu and Panasonic) platform.

   - Tegra 186 driver. This is based on the new banked GPIO
     infrastructure.

  Other improvements:

   - Some documentation improvements.

   - Wakeup support for the DesignWare DWAPB GPIO controller.

   - Reset line support on the DesignWare DWAPB GPIO controller.

   - Several non-critical bug fixes and improvements for the Broadcom
     BRCMSTB driver.

   - Misc non-critical bug fixes like exotic errorpaths, removal of dead
     code etc.

   - Explicit comments on fall-through switch() statements"

* tag 'gpio-v4.15-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (65 commits)
  gpio: tegra186: Remove tegra186_gpio_lock_class
  gpio: rcar: Add r8a77995 (R-Car D3) support
  pinctrl: bcm2835: Fix some merge fallout
  gpio: Fix undefined lock_dep_class
  gpio: Automatically add lockdep keys
  gpio: Introduce struct gpio_irq_chip.first
  gpio: Disambiguate struct gpio_irq_chip.nested
  gpio: Add Tegra186 support
  gpio: Export gpiochip_irq_{map,unmap}()
  gpio: Implement tighter IRQ chip integration
  gpio: Move lock_key into struct gpio_irq_chip
  gpio: Move irq_valid_mask into struct gpio_irq_chip
  gpio: Move irq_nested into struct gpio_irq_chip
  gpio: Move irq_chained_parent to struct gpio_irq_chip
  gpio: Move irq_default_type to struct gpio_irq_chip
  gpio: Move irq_handler to struct gpio_irq_chip
  gpio: Move irqdomain into struct gpio_irq_chip
  gpio: Move irqchip into struct gpio_irq_chip
  gpio: Introduce struct gpio_irq_chip
  pinctrl: armada-37xx: remove unused variable
  ...
2017-11-14 17:23:44 -08:00
Linus Torvalds
e37e0ee019 Merge tag 'dma-mapping-4.15' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:

 - turn dma_cache_sync into a dma_map_ops instance and remove
   implementation that purely are dead because the architecture doesn't
   support noncoherent allocations

 - add a flag for busses that need DMA configuration (Robin Murphy)

* tag 'dma-mapping-4.15' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: turn dma_cache_sync into a dma_map_ops method
  sh: make dma_cache_sync a no-op
  xtensa: make dma_cache_sync a no-op
  unicore32: make dma_cache_sync a no-op
  powerpc: make dma_cache_sync a no-op
  mn10300: make dma_cache_sync a no-op
  microblaze: make dma_cache_sync a no-op
  ia64: make dma_cache_sync a no-op
  frv: make dma_cache_sync a no-op
  x86: make dma_cache_sync a no-op
  floppy: consolidate the dummy fd_cacheflush definition
  drivers: flag buses which demand DMA configuration
2017-11-14 16:54:12 -08:00
Linus Torvalds
23c258763b Merge tag 'dmaengine-4.15-rc1' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine updates from Vinod Koul:
 "Updates for this cycle include:

   - new driver for Spreadtrum dma controller, ST MDMA and DMAMUX
     controllers

   - PM support for IMG MDC drivers

   - updates to bcm-sba-raid driver and improvements to sun6i driver

   - subsystem conversion for:
      - timers to use timer_setup()
      - remove usage of PCI pool API
      - usage of %p format specifier

   - minor updates to bunch of drivers"

* tag 'dmaengine-4.15-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (49 commits)
  dmaengine: ti-dma-crossbar: Correct am335x/am43xx mux value type
  dmaengine: dmatest: warn user when dma test times out
  dmaengine: Revert "rcar-dmac: use TCRB instead of TCR for residue"
  dmaengine: stm32_mdma: activate pack/unpack feature
  dmaengine: at_hdmac: Remove unnecessary 0x prefixes before %pad
  dmaengine: coh901318: Remove unnecessary 0x prefixes before %pad
  MAINTAINERS: Step down from a co-maintaner of DW DMAC driver
  dmaengine: pch_dma: Replace PCI pool old API
  dmaengine: Convert timers to use timer_setup()
  dmaengine: sprd: Add Spreadtrum DMA driver
  dt-bindings: dmaengine: Add Spreadtrum SC9860 DMA controller
  dmaengine: sun6i: Retrieve channel count/max request from devicetree
  dmaengine: Build bcm-sba-raid driver as loadable module for iProc SoCs
  dmaengine: bcm-sba-raid: Use common GPL comment header
  dmaengine: bcm-sba-raid: Use only single mailbox channel
  dmaengine: bcm-sba-raid: serialize dma_cookie_complete() using reqs_lock
  dmaengine: pl330: fix descriptor allocation fail
  dmaengine: rcar-dmac: use TCRB instead of TCR for residue
  dmaengine: sun6i: Add support for Allwinner A64 and compatibles
  arm64: allwinner: a64: Add devicetree binding for DMA controller
  ...
2017-11-14 16:49:31 -08:00
Linus Torvalds
2cd83ba5be Merge tag 'iommu-v4.15-rc1' of git://github.com/awilliam/linux-vfio
Pull IOMMU updates from Alex Williamson:
 "As Joerg mentioned[1], he's out on paternity leave through the end of
  the year and I'm filling in for him in the interim:

   - Enforce MSI multiple IRQ alignment in AMD IOMMU

   - VT-d PASID error handling fixes

   - Add r8a7795 IPMMU support

   - Manage runtime PM links on exynos at {add,remove}_device callbacks

   - Fix Mediatek driver name to avoid conflict

   - Add terminate support to qcom fault handler

   - 64-bit IOVA optimizations

   - Simplfy IOVA domain destruction, better use of rcache, and skip
     anchor nodes on copy

   - Convert to IOMMU TLB sync API in io-pgtable-arm{-v7s}

   - Drop command queue lock when waiting for CMD_SYNC completion on ARM
     SMMU implementations supporting MSI to cacheable memory

   - iomu-vmsa cleanup inspired by missed IOTLB sync callbacks

   - Fix sleeping lock with preemption disabled for RT

   - Dual MMU support for TI DRA7xx DSPs

   - Optional flush option on IOVA allocation avoiding overhead when
     caller can try other options

  [1] https://lkml.org/lkml/2017/10/22/72"

* tag 'iommu-v4.15-rc1' of git://github.com/awilliam/linux-vfio: (54 commits)
  iommu/iova: Use raw_cpu_ptr() instead of get_cpu_ptr() for ->fq
  iommu/mediatek: Fix driver name
  iommu/ipmmu-vmsa: Hook up r8a7795 DT matching code
  iommu/ipmmu-vmsa: Allow two bit SL0
  iommu/ipmmu-vmsa: Make IMBUSCTR setup optional
  iommu/ipmmu-vmsa: Write IMCTR twice
  iommu/ipmmu-vmsa: IPMMU device is 40-bit bus master
  iommu/ipmmu-vmsa: Make use of IOMMU_OF_DECLARE()
  iommu/ipmmu-vmsa: Enable multi context support
  iommu/ipmmu-vmsa: Add optional root device feature
  iommu/ipmmu-vmsa: Introduce features, break out alias
  iommu/ipmmu-vmsa: Unify ipmmu_ops
  iommu/ipmmu-vmsa: Clean up struct ipmmu_vmsa_iommu_priv
  iommu/ipmmu-vmsa: Simplify group allocation
  iommu/ipmmu-vmsa: Unify domain alloc/free
  iommu/ipmmu-vmsa: Fix return value check in ipmmu_find_group_dma()
  iommu/vt-d: Clear pasid table entry when memory unbound
  iommu/vt-d: Clear Page Request Overflow fault bit
  iommu/vt-d: Missing checks for pasid tables if allocation fails
  iommu/amd: Limit the IOVA page range to the specified addresses
  ...
2017-11-14 16:43:27 -08:00
Linus Torvalds
670ffccb2f Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
 "This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas,
  megaraid_sas, pm80xx, mpt3sas, be2iscsi, hpsa. and a host of minor
  updates.

  There's no major behaviour change or additions to the core in all of
  this, so the potential for regressions should be small (biggest
  potential being in the scsi error handler changes)"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (203 commits)
  scsi: lpfc: Fix hard lock up NMI in els timeout handling.
  scsi: mpt3sas: remove a stray KERN_INFO
  scsi: mpt3sas: cleanup _scsih_pcie_enumeration_event()
  scsi: aacraid: use timespec64 instead of timeval
  scsi: scsi_transport_fc: add 64GBIT and 128GBIT port speed definitions
  scsi: qla2xxx: Suppress a kernel complaint in qla_init_base_qpair()
  scsi: mpt3sas: fix dma_addr_t casts
  scsi: be2iscsi: Use kasprintf
  scsi: storvsc: Avoid excessive host scan on controller change
  scsi: lpfc: fix kzalloc-simple.cocci warnings
  scsi: mpt3sas: Update mpt3sas driver version.
  scsi: mpt3sas: Fix sparse warnings
  scsi: mpt3sas: Fix nvme drives checking for tlr.
  scsi: mpt3sas: NVMe drive support for BTDHMAPPING ioctl command and log info
  scsi: mpt3sas: Add-Task-management-debug-info-for-NVMe-drives.
  scsi: mpt3sas: scan and add nvme device after controller reset
  scsi: mpt3sas: Set NVMe device queue depth as 128
  scsi: mpt3sas: Handle NVMe PCIe device related events generated from firmware.
  scsi: mpt3sas: API's to remove nvme drive from sml
  scsi: mpt3sas: API 's to support NVMe drive addition to SML
  ...
2017-11-14 16:23:44 -08:00
Linus Torvalds
e2c5923c34 Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block
Pull core block layer updates from Jens Axboe:
 "This is the main pull request for block storage for 4.15-rc1.

  Nothing out of the ordinary in here, and no API changes or anything
  like that. Just various new features for drivers, core changes, etc.
  In particular, this pull request contains:

   - A patch series from Bart, closing the whole on blk/scsi-mq queue
     quescing.

   - A series from Christoph, building towards hidden gendisks (for
     multipath) and ability to move bio chains around.

   - NVMe
        - Support for native multipath for NVMe (Christoph).
        - Userspace notifications for AENs (Keith).
        - Command side-effects support (Keith).
        - SGL support (Chaitanya Kulkarni)
        - FC fixes and improvements (James Smart)
        - Lots of fixes and tweaks (Various)

   - bcache
        - New maintainer (Michael Lyle)
        - Writeback control improvements (Michael)
        - Various fixes (Coly, Elena, Eric, Liang, et al)

   - lightnvm updates, mostly centered around the pblk interface
     (Javier, Hans, and Rakesh).

   - Removal of unused bio/bvec kmap atomic interfaces (me, Christoph)

   - Writeback series that fix the much discussed hundreds of millions
     of sync-all units. This goes all the way, as discussed previously
     (me).

   - Fix for missing wakeup on writeback timer adjustments (Yafang
     Shao).

   - Fix laptop mode on blk-mq (me).

   - {mq,name} tupple lookup for IO schedulers, allowing us to have
     alias names. This means you can use 'deadline' on both !mq and on
     mq (where it's called mq-deadline). (me).

   - blktrace race fix, oopsing on sg load (me).

   - blk-mq optimizations (me).

   - Obscure waitqueue race fix for kyber (Omar).

   - NBD fixes (Josef).

   - Disable writeback throttling by default on bfq, like we do on cfq
     (Luca Miccio).

   - Series from Ming that enable us to treat flush requests on blk-mq
     like any other request. This is a really nice cleanup.

   - Series from Ming that improves merging on blk-mq with schedulers,
     getting us closer to flipping the switch on scsi-mq again.

   - BFQ updates (Paolo).

   - blk-mq atomic flags memory ordering fixes (Peter Z).

   - Loop cgroup support (Shaohua).

   - Lots of minor fixes from lots of different folks, both for core and
     driver code"

* 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits)
  nvme: fix visibility of "uuid" ns attribute
  blk-mq: fixup some comment typos and lengths
  ide: ide-atapi: fix compile error with defining macro DEBUG
  blk-mq: improve tag waiting setup for non-shared tags
  brd: remove unused brd_mutex
  blk-mq: only run the hardware queue if IO is pending
  block: avoid null pointer dereference on null disk
  fs: guard_bio_eod() needs to consider partitions
  xtensa/simdisk: fix compile error
  nvme: expose subsys attribute to sysfs
  nvme: create 'slaves' and 'holders' entries for hidden controllers
  block: create 'slaves' and 'holders' entries for hidden gendisks
  nvme: also expose the namespace identification sysfs files for mpath nodes
  nvme: implement multipath access to nvme subsystems
  nvme: track shared namespaces
  nvme: introduce a nvme_ns_ids structure
  nvme: track subsystems
  block, nvme: Introduce blk_mq_req_flags_t
  block, scsi: Make SCSI quiesce and resume work reliably
  block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
  ...
2017-11-14 15:32:19 -08:00
Linus Torvalds
abc36be236 Merge tag 'configfs-for-4.15' of git://git.infradead.org/users/hch/configfs
Pull configfs updates from Christoph Hellwig:
 "A couple of configfs cleanups:

   - proper use of the bool type (Thomas Meyer)

   - constification of struct config_item_type (Bhumika Goyal)"

* tag 'configfs-for-4.15' of git://git.infradead.org/users/hch/configfs:
  RDMA/cma: make config_item_type const
  stm class: make config_item_type const
  ACPI: configfs: make config_item_type const
  nvmet: make config_item_type const
  usb: gadget: configfs: make config_item_type const
  PCI: endpoint: make config_item_type const
  iio: make function argument and some structures const
  usb: gadget: make config_item_type structures const
  dlm: make config_item_type const
  netconsole: make config_item_type const
  nullb: make config_item_type const
  ocfs2/cluster: make config_item_type const
  target: make config_item_type const
  configfs: make ci_type field, some pointers and function arguments const
  configfs: make config_item_type const
  configfs: Fix bool initialization/comparison
2017-11-14 14:44:04 -08:00
Linus Torvalds
f14fc0ccee Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota, ext2, isofs and udf fixes from Jan Kara:

 - two small quota error handling fixes

 - two isofs fixes for architectures with signed char

 - several udf block number overflow and signedness fixes

 - ext2 rework of mount option handling to avoid GFP_KERNEL allocation
   with spinlock held

 - ... it also contains a patch to implement auditing of responses to
   fanotify permission events. That should have been in the fanotify
   pull request but I mistakenly merged that patch into a wrong branch
   and noticed only now at which point I don't think it's worth rebasing
   and redoing.

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  quota: be aware of error from dquot_initialize
  quota: fix potential infinite loop
  isofs: use unsigned char types consistently
  isofs: fix timestamps beyond 2027
  udf: Fix some sign-conversion warnings
  udf: Fix signed/unsigned format specifiers
  udf: Fix 64-bit sign extension issues affecting blocks > 0x7FFFFFFF
  udf: Remove some outdate references from documentation
  udf: Avoid overflow when session starts at large offset
  ext2: Fix possible sleep in atomic during mount option parsing
  ext2: Parse mount options into a dedicated structure
  audit: Record fanotify access control decisions
2017-11-14 14:13:11 -08:00
Linus Torvalds
23281c8034 Merge branch 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:

 - fixes of use-after-tree issues when handling fanotify permission
   events from Miklos

 - refcount_t conversions from Elena

 - fixes of ENOMEM handling in dnotify and fsnotify from me

* 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fsnotify: convert fsnotify_mark.refcnt from atomic_t to refcount_t
  fanotify: clean up CONFIG_FANOTIFY_ACCESS_PERMISSIONS ifdefs
  fsnotify: clean up fsnotify()
  fanotify: fix fsnotify_prepare_user_wait() failure
  fsnotify: fix pinning group in fsnotify_prepare_user_wait()
  fsnotify: pin both inode and vfsmount mark
  fsnotify: clean up fsnotify_prepare/finish_user_wait()
  fsnotify: convert fsnotify_group.refcnt from atomic_t to refcount_t
  fsnotify: Protect bail out path of fsnotify_add_mark_locked() properly
  dnotify: Handle errors from fsnotify_add_mark_locked() in fcntl_dirnotify()
2017-11-14 14:08:20 -08:00
Linus Torvalds
29309a4eb8 Merge tag 'gfs2-4.15.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Bob Peterson:
 "We've got a total of 17 GFS2 patches for this merge window. The
  patches are basically in three categories: (1) patches related to
  broken xfstest cases, (2) patches related to improving iomap and start
  using it in GFS2, and (3) general typos and clarifications.

  Please note that one of the iomap patches extends beyond GFS2 and
  affects other file systems, but it was publically reviewed by a
  variety of file system people in the community.

  From Andreas Gruenbacher:

   - rename variable 'bsize' to 'factor' to clarify the logic related to
     gfs2_block_map.

   - correctly set ctime in the setflags ioctl, which fixes broken
     xfstests test 277.

   - fix broken xfstest 258, due to an atime initialization problem.

   - fix broken xfstest 307, in which GFS2 was not setting ctime when
     setting acls.

   - switch general iomap code from blkno to disk offset for a variety
     of file systems.

   - add a new IOMAP_F_DATA_INLINE flag for iomap to indicate blocks
     that have data mixed with metadata.

   - implement SEEK_HOLE and SEEK_DATA via iomap in GFS2.

   - fix failing xfstest case 066, which was due to not properly syncing
     dirty inodes when changing extended attributes.

   - fix a minor typo in a comment.

   - partially fix xfstest 424, which involved GET_FLAGS and SET_FLAGS
     ioctl. This is also a cleanup and simplification of the translation
     of flags from fs flags to gfs2 flags.

   - add support for STATX_ATTR_ in statx, which fixed broken xfstest
     424.

   - fix for failing xfstest 093 which fixes a recursive glock problem
     with gfs2_xattr_get and _set

  From me:

   - make inode height info part of the 'metapath' data structure to
     facilitate using iomap in GFS2.

   - start using iomap inside GFS2 and switch GFS2's block_map functions
     to use iomap under the covers.

   - switch GFS2's fiemap implementation from using block_map to using
     iomap under the covers.

   - fix journaled data pages not being properly synced to media when
     writing inodes. This was caught with xfstests.

   - fix another failing xfstest case in which switching a file from
     ordered_write to journaled data via set_flags caused a deadlock"

* tag 'gfs2-4.15.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Allow gfs2_xattr_set to be called with the glock held
  gfs2: Add support for statx inode flags
  gfs2: Fix and clean up {GET,SET}FLAGS ioctl
  gfs2: Fix a harmless typo
  gfs2: Fix xattr fsync
  GFS2: Take inode off order_write list when setting jdata flag
  GFS2: flush the log and all pages for jdata as we do for WB_SYNC_ALL
  gfs2: Implement SEEK_HOLE / SEEK_DATA via iomap
  GFS2: Switch fiemap implementation to use iomap
  GFS2: Implement iomap for block_map
  GFS2: Make height info part of metapath
  gfs2: Always update inode ctime in set_acl
  gfs2: Support negative atimes
  gfs2: Update ctime in setflags ioctl
  gfs2: Clarify gfs2_block_map
2017-11-14 13:55:51 -08:00