Commit Graph

122362 Commits

Author SHA1 Message Date
Mike Rapoport
c89ab04feb mm/sparse: cleanup the code surrounding memory_present()
After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP we have two equivalent
functions that call memory_present() for each region in memblock.memory:
sparse_memory_present_with_active_regions() and membocks_present().

Moreover, all architectures have a call to either of these functions
preceding the call to sparse_init() and in the most cases they are called
one after the other.

Mark the regions from memblock.memory as present during sparce_init() by
making sparse_init() call memblocks_present(), make memblocks_present()
and memory_present() functions static and remove redundant
sparse_memory_present_with_active_regions() function.

Also remove no longer required HAVE_MEMORY_PRESENT configuration option.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200712083130.22919-1-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:27 -07:00
Wei Yang
b8aa9d9d95 mm/mremap: it is sure to have enough space when extent meets requirement
Patch series "mm/mremap: cleanup move_page_tables() a little", v5.

move_page_tables() tries to move page table by PMD or PTE.

The root reason is if it tries to move PMD, both old and new range should
be PMD aligned.  But current code calculate old range and new range
separately.  This leads to some redundant check and calculation.

This cleanup tries to consolidate the range check in one place to reduce
some extra range handling.

This patch (of 3):

old_end is passed to these two functions to check whether there is enough
space to do the move, while this check is done before invoking these
functions.

These two functions only would be invoked when extent meets the
requirement and there is one check before invoking these functions:

    if (extent > old_end - old_addr)
        extent = old_end - old_addr;

This implies (old_end - old_addr) won't fail the check in these two
functions.

Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Thomas Hellstrom (VMware) <thomas_os@shipmail.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Link: http://lkml.kernel.org/r/20200710092835.56368-1-richard.weiyang@linux.alibaba.com
Link: http://lkml.kernel.org/r/20200710092835.56368-2-richard.weiyang@linux.alibaba.com
Link: http://lkml.kernel.org/r/20200708095028.41706-1-richard.weiyang@linux.alibaba.com
Link: http://lkml.kernel.org/r/20200708095028.41706-2-richard.weiyang@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:27 -07:00
Peter Collingbourne
45e55300f1 mm: remove unnecessary wrapper function do_mmap_pgoff()
The current split between do_mmap() and do_mmap_pgoff() was introduced in
commit 1fcfd8db7f ("mm, mpx: add "vm_flags_t vm_flags" arg to
do_mmap_pgoff()") to support MPX.

The wrapper function do_mmap_pgoff() always passed 0 as the value of the
vm_flags argument to do_mmap().  However, MPX support has subsequently
been removed from the kernel and there were no more direct callers of
do_mmap(); all calls were going via do_mmap_pgoff().

Simplify the code by removing do_mmap_pgoff() and changing all callers to
directly call do_mmap(), which now no longer takes a vm_flags argument.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: http://lkml.kernel.org/r/20200727194109.1371462-1-pcc@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:27 -07:00
Anshuman Khandual
56993b4e14 mm/sparsemem: enable vmem_altmap support in vmemmap_alloc_block_buf()
There are many instances where vmemap allocation is often switched between
regular memory and device memory just based on whether altmap is available
or not.  vmemmap_alloc_block_buf() is used in various platforms to
allocate vmemmap mappings.  Lets also enable it to handle altmap based
device memory allocation along with existing regular memory allocations.
This will help in avoiding the altmap based allocation switch in many
places.  To summarize there are two different methods to call
vmemmap_alloc_block_buf().

vmemmap_alloc_block_buf(size, node, NULL)   /* Allocate from system RAM */
vmemmap_alloc_block_buf(size, node, altmap) /* Allocate from altmap */

This converts altmap_alloc_block_buf() into a static function, drops it's
entry from the header and updates Documentation/vm/memory-model.rst.

Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Jia He <justin.he@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Will Deacon <will@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yu Zhao <yuzhao@google.com>
Link: http://lkml.kernel.org/r/1594004178-8861-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:27 -07:00
Anshuman Khandual
1d9cfee753 mm/sparsemem: enable vmem_altmap support in vmemmap_populate_basepages()
Patch series "arm64: Enable vmemmap mapping from device memory", v4.

This series enables vmemmap backing memory allocation from device memory
ranges on arm64.  But before that, it enables vmemmap_populate_basepages()
and vmemmap_alloc_block_buf() to accommodate struct vmem_altmap based
alocation requests.

This patch (of 3):

vmemmap_populate_basepages() is used across platforms to allocate backing
memory for vmemmap mapping.  This is used as a standard default choice or
as a fallback when intended huge pages allocation fails.  This just
creates entire vmemmap mapping with base pages (PAGE_SIZE).

On arm64 platforms, vmemmap_populate_basepages() is called instead of the
platform specific vmemmap_populate() when ARM64_SWAPPER_USES_SECTION_MAPS
is not enabled as in case for ARM64_16K_PAGES and ARM64_64K_PAGES configs.

At present vmemmap_populate_basepages() does not support allocating from
driver defined struct vmem_altmap while trying to create vmemmap mapping
for a device memory range.  It prevents ARM64_16K_PAGES and
ARM64_64K_PAGES configs on arm64 from supporting device memory with
vmemap_altmap request.

This enables vmem_altmap support in vmemmap_populate_basepages() unlocking
device memory allocation for vmemap mapping on arm64 platforms with 16K or
64K base page configs.

Each architecture should evaluate and decide on subscribing device memory
based base page allocation through vmemmap_populate_basepages().  Hence
lets keep it disabled on all archs in order to preserve the existing
semantics.  A subsequent patch enables it on arm64.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Jia He <justin.he@arm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Yu Zhao <yuzhao@google.com>
Link: http://lkml.kernel.org/r/1594004178-8861-1-git-send-email-anshuman.khandual@arm.com
Link: http://lkml.kernel.org/r/1594004178-8861-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:27 -07:00
Feng Tang
56f3547bfa mm: adjust vm_committed_as_batch according to vm overcommit policy
When checking a performance change for will-it-scale scalability mmap test
[1], we found very high lock contention for spinlock of percpu counter
'vm_committed_as':

    94.14%     0.35%  [kernel.kallsyms]         [k] _raw_spin_lock_irqsave
    48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
    45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;

Actually this heavy lock contention is not always necessary.  The
'vm_committed_as' needs to be very precise when the strict
OVERCOMMIT_NEVER policy is set, which requires a rather small batch number
for the percpu counter.

So keep 'batch' number unchanged for strict OVERCOMMIT_NEVER policy, and
lift it to 64X for OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS policies.  Also
add a sysctl handler to adjust it when the policy is reconfigured.

Benchmark with the same testcase in [1] shows 53% improvement on a 8C/16T
desktop, and 2097%(20X) on a 4S/72C/144T server.  We tested with test
platforms in 0day (server, desktop and laptop), and 80%+ platforms shows
improvements with that test.  And whether it shows improvements depends on
if the test mmap size is bigger than the batch number computed.

And if the lift is 16X, 1/3 of the platforms will show improvements,
though it should help the mmap/unmap usage generally, as Michal Hocko
mentioned:

: I believe that there are non-synthetic worklaods which would benefit from
: a larger batch.  E.g.  large in memory databases which do large mmaps
: during startups from multiple threads.

[1] https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Qian Cai <cai@lca.pw>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: kernel test robot <rong.a.chen@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1589611660-89854-4-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1592725000-73486-4-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1594389708-60781-5-git-send-email-feng.tang@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:26 -07:00
Feng Tang
0a4954a850 percpu_counter: add percpu_counter_sync()
percpu_counter's accuracy is related to its batch size.  For a
percpu_counter with a big batch, its deviation could be big, so when the
counter's batch is runtime changed to a smaller value for better accuracy,
there could also be requirment to reduce the big deviation.

So add a percpu-counter sync function to be run on each CPU.

Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Tim Chen <tim.c.chen@intel.com>
Link: http://lkml.kernel.org/r/1594389708-60781-4-git-send-email-feng.tang@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:26 -07:00
Joerg Roedel
2a681cfa5b mm: move p?d_alloc_track to separate header file
The functions are only used in two source files, so there is no need for
them to be in the global <linux/mm.h> header.  Move them to the new
<linux/pgalloc-track.h> header and include it only where needed.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200609120533.25867-1-joro@8bytes.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:26 -07:00
Mike Rapoport
f9cb654cb5 asm-generic: pgalloc: provide generic pgd_free()
Most architectures define pgd_free() as a wrapper for free_page().

Provide a generic version in asm-generic/pgalloc.h and enable its use for
most architectures.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-7-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:26 -07:00
Mike Rapoport
d9e8b92967 asm-generic: pgalloc: provide generic pud_alloc_one() and pud_free_one()
Several architectures define pud_alloc_one() as a wrapper for
__get_free_page() and pud_free() as a wrapper for free_page().

Provide a generic implementation in asm-generic/pgalloc.h and use it where
appropriate.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-6-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:26 -07:00
Mike Rapoport
1355c31eeb asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one()
For most architectures that support >2 levels of page tables,
pmd_alloc_one() is a wrapper for __get_free_pages(), sometimes with
__GFP_ZERO and sometimes followed by memset(0) instead.

More elaborate versions on arm64 and x86 account memory for the user page
tables and call to pgtable_pmd_page_ctor() as the part of PMD page
initialization.

Move the arm64 version to include/asm-generic/pgalloc.h and use the
generic version on several architectures.

The pgtable_pmd_page_ctor() is a NOP when ARCH_ENABLE_SPLIT_PMD_PTLOCK is
not enabled, so there is no functional change for most architectures
except of the addition of __GFP_ACCOUNT for allocation of user page
tables.

The pmd_free() is a wrapper for free_page() in all the cases, so no
functional change here.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-5-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:26 -07:00
Mike Rapoport
ca15ca406f mm: remove unneeded includes of <asm/pgalloc.h>
Patch series "mm: cleanup usage of <asm/pgalloc.h>"

Most architectures have very similar versions of pXd_alloc_one() and
pXd_free_one() for intermediate levels of page table.  These patches add
generic versions of these functions in <asm-generic/pgalloc.h> and enable
use of the generic functions where appropriate.

In addition, functions declared and defined in <asm/pgalloc.h> headers are
used mostly by core mm and early mm initialization in arch and there is no
actual reason to have the <asm/pgalloc.h> included all over the place.
The first patch in this series removes unneeded includes of
<asm/pgalloc.h>

In the end it didn't work out as neatly as I hoped and moving
pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require
unnecessary changes to arches that have custom page table allocations, so
I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
to mm/.

This patch (of 8):

In most cases <asm/pgalloc.h> header is required only for allocations of
page table memory.  Most of the .c files that include that header do not
use symbols declared in <asm/pgalloc.h> and do not require that header.

As for the other header files that used to include <asm/pgalloc.h>, it is
possible to move that include into the .c file that actually uses symbols
from <asm/pgalloc.h> and drop the include from the header file.

The process was somewhat automated using

	sed -i -E '/[<"]asm\/pgalloc\.h/d' \
                $(grep -L -w -f /tmp/xx \
                        $(git grep -E -l '[<"]asm/pgalloc\.h'))

where /tmp/xx contains all the symbols defined in
arch/*/include/asm/pgalloc.h.

[rppt@linux.ibm.com: fix powerpc warning]

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:26 -07:00
Chris Down
45c7f7e1ef mm, memcg: decouple e{low,min} state mutations from protection checks
mem_cgroup_protected currently is both used to set effective low and min
and return a mem_cgroup_protection based on the result.  As a user, this
can be a little unexpected: it appears to be a simple predicate function,
if not for the big warning in the comment above about the order in which
it must be executed.

This change makes it so that we separate the state mutations from the
actual protection checks, which makes it more obvious where we need to be
careful mutating internal state, and where we are simply checking and
don't need to worry about that.

[mhocko@suse.com - don't check protection on root memcgs]

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Link: http://lkml.kernel.org/r/ff3f915097fcee9f6d7041c084ef92d16aaeb56a.1594638158.git.chris@chrisdown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:25 -07:00
Yafang Shao
22f7496f0b mm, memcg: avoid stale protection values when cgroup is above protection
Patch series "mm, memcg: memory.{low,min} reclaim fix & cleanup", v4.

This series contains a fix for a edge case in my earlier protection
calculation patches, and a patch to make the area overall a little more
robust to hopefully help avoid this in future.

This patch (of 2):

A cgroup can have both memory protection and a memory limit to isolate it
from its siblings in both directions - for example, to prevent it from
being shrunk below 2G under high pressure from outside, but also from
growing beyond 4G under low pressure.

Commit 9783aa9917 ("mm, memcg: proportional memory.{low,min} reclaim")
implemented proportional scan pressure so that multiple siblings in excess
of their protection settings don't get reclaimed equally but instead in
accordance to their unprotected portion.

During limit reclaim, this proportionality shouldn't apply of course:
there is no competition, all pressure is from within the cgroup and should
be applied as such.  Reclaim should operate at full efficiency.

However, mem_cgroup_protected() never expected anybody to look at the
effective protection values when it indicated that the cgroup is above its
protection.  As a result, a query during limit reclaim may return stale
protection values that were calculated by a previous reclaim cycle in
which the cgroup did have siblings.

When this happens, reclaim is unnecessarily hesitant and potentially slow
to meet the desired limit.  In theory this could lead to premature OOM
kills, although it's not obvious this has occurred in practice.

Workaround the problem by special casing reclaim roots in
mem_cgroup_protection.  These memcgs are never participating in the
reclaim protection because the reclaim is internal.

We have to ignore effective protection values for reclaim roots because
mem_cgroup_protected might be called from racing reclaim contexts with
different roots.  Calculation is relying on root -> leaf tree traversal
therefore top-down reclaim protection invariants should hold.  The only
exception is the reclaim root which should have effective protection set
to 0 but that would be problematic for the following setup:

 Let's have global and A's reclaim in parallel:
  |
  A (low=2G, usage = 3G, max = 3G, children_low_usage = 1.5G)
  |\
  | C (low = 1G, usage = 2.5G)
  B (low = 1G, usage = 0.5G)

 for A reclaim we have
 B.elow = B.low
 C.elow = C.low

 For the global reclaim
 A.elow = A.low
 B.elow = min(B.usage, B.low) because children_low_usage <= A.elow
 C.elow = min(C.usage, C.low)

 With the effective values resetting we have A reclaim
 A.elow = 0
 B.elow = B.low
 C.elow = C.low

 and global reclaim could see the above and then
 B.elow = C.elow = 0 because children_low_usage > A.elow

Which means that protected memcgs would get reclaimed.

In future we would like to make mem_cgroup_protected more robust against
racing reclaim contexts but that is likely more complex solution than this
simple workaround.

[hannes@cmpxchg.org - large part of the changelog]
[mhocko@suse.com - workaround explanation]
[chris@chrisdown.name - retitle]

Fixes: 9783aa9917 ("mm, memcg: proportional memory.{low,min} reclaim")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Roman Gushchin <guro@fb.com>
Link: http://lkml.kernel.org/r/cover.1594638158.git.chris@chrisdown.name
Link: http://lkml.kernel.org/r/044fb8ecffd001c7905d27c0c2ad998069fdc396.1594638158.git.chris@chrisdown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:25 -07:00
Roman Gushchin
eda330e57b mm: kmem: switch to static_branch_likely() in memcg_kmem_enabled()
Currently memcg_kmem_enabled() is optimized for the kernel memory
accounting being off.  It was so for a long time, and arguably the reason
behind was that the kernel memory accounting was initially an opt-in
feature.  However, now it's on by default on both cgroup v1 and cgroup v2,
and it's on for all cgroups.  So let's switch over to
static_branch_likely() to reflect this fact.

Unlikely there is a significant performance difference, as the cost of a
memory allocation and its accounting significantly exceeds the cost of a
jump.  However, the conversion makes the code look more logically.

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Link: http://lkml.kernel.org/r/20200707173612.124425-3-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:25 -07:00
Shakeel Butt
991e767385 mm: memcontrol: account kernel stack per node
Currently the kernel stack is being accounted per-zone.  There is no need
to do that.  In addition due to being per-zone, memcg has to keep a
separate MEMCG_KERNEL_STACK_KB.  Make the stat per-node and deprecate
MEMCG_KERNEL_STACK_KB as memcg_stat_item is an extension of
node_stat_item.  In addition localize the kernel stack stats updates to
account_kernel_stack().

Signed-off-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Link: http://lkml.kernel.org/r/20200630161539.1759185-1-shakeelb@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:25 -07:00
Roman Gushchin
10befea91b mm: memcg/slab: use a single set of kmem_caches for all allocations
Instead of having two sets of kmem_caches: one for system-wide and
non-accounted allocations and the second one shared by all accounted
allocations, we can use just one.

The idea is simple: space for obj_cgroup metadata can be allocated on
demand and filled only for accounted allocations.

It allows to remove a bunch of code which is required to handle kmem_cache
clones for accounted allocations.  There is no more need to create them,
accumulate statistics, propagate attributes, etc.  It's a quite
significant simplification.

Also, because the total number of slab_caches is reduced almost twice (not
all kmem_caches have a memcg clone), some additional memory savings are
expected.  On my devvm it additionally saves about 3.5% of slab memory.

[guro@fb.com: fix build on MIPS]
  Link: http://lkml.kernel.org/r/20200717214810.3733082-1-guro@fb.com

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: http://lkml.kernel.org/r/20200623174037.3951353-18-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:25 -07:00
Roman Gushchin
272911a4ad mm: memcg/slab: remove memcg_kmem_get_cache()
The memcg_kmem_get_cache() function became really trivial, so let's just
inline it into the single call point: memcg_slab_pre_alloc_hook().

It will make the code less bulky and can also help the compiler to
generate a better code.

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20200623174037.3951353-15-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:25 -07:00
Roman Gushchin
d797b7d054 mm: memcg/slab: simplify memcg cache creation
Because the number of non-root kmem_caches doesn't depend on the number of
memory cgroups anymore and is generally not very big, there is no more
need for a dedicated workqueue.

Also, as there is no more need to pass any arguments to the
memcg_create_kmem_cache() except the root kmem_cache, it's possible to
just embed the work structure into the kmem_cache and avoid the dynamic
allocation of the work structure.

This will also simplify the synchronization: for each root kmem_cache
there is only one work.  So there will be no more concurrent attempts to
create a non-root kmem_cache for a root kmem_cache: the second and all
following attempts to queue the work will fail.

On the kmem_cache destruction path there is no more need to call the
expensive flush_workqueue() and wait for all pending works to be finished.
Instead, cancel_work_sync() can be used to cancel/wait for only one work.

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20200623174037.3951353-14-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:25 -07:00
Roman Gushchin
9855609bde mm: memcg/slab: use a single set of kmem_caches for all accounted allocations
This is fairly big but mostly red patch, which makes all accounted slab
allocations use a single set of kmem_caches instead of creating a separate
set for each memory cgroup.

Because the number of non-root kmem_caches is now capped by the number of
root kmem_caches, there is no need to shrink or destroy them prematurely.
They can be perfectly destroyed together with their root counterparts.
This allows to dramatically simplify the management of non-root
kmem_caches and delete a ton of code.

This patch performs the following changes:
1) introduces memcg_params.memcg_cache pointer to represent the
   kmem_cache which will be used for all non-root allocations
2) reuses the existing memcg kmem_cache creation mechanism
   to create memcg kmem_cache on the first allocation attempt
3) memcg kmem_caches are named <kmemcache_name>-memcg,
   e.g. dentry-memcg
4) simplifies memcg_kmem_get_cache() to just return memcg kmem_cache
   or schedule it's creation and return the root cache
5) removes almost all non-root kmem_cache management code
   (separate refcounter, reparenting, shrinking, etc)
6) makes slab debugfs to display root_mem_cgroup css id and never
   show :dead and :deact flags in the memcg_slabinfo attribute.

Following patches in the series will simplify the kmem_cache creation.

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20200623174037.3951353-13-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:25 -07:00
Roman Gushchin
0f876e4dc5 mm: memcg/slab: move memcg_kmem_bypass() to memcontrol.h
To make the memcg_kmem_bypass() function available outside of the
memcontrol.c, let's move it to memcontrol.h.  The function is small and
nicely fits into static inline sort of functions.

It will be used from the slab code.

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20200623174037.3951353-12-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:24 -07:00
Roman Gushchin
964d4bd370 mm: memcg/slab: save obj_cgroup for non-root slab objects
Store the obj_cgroup pointer in the corresponding place of
page->obj_cgroups for each allocated non-root slab object.  Make sure that
each allocated object holds a reference to obj_cgroup.

Objcg pointer is obtained from the memcg->objcg dereferencing in
memcg_kmem_get_cache() and passed from pre_alloc_hook to post_alloc_hook.
Then in case of successful allocation(s) it's getting stored in the
page->obj_cgroups vector.

The objcg obtaining part look a bit bulky now, but it will be simplified
by next commits in the series.

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20200623174037.3951353-9-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:24 -07:00
Roman Gushchin
286e04b8ed mm: memcg/slab: allocate obj_cgroups for non-root slab pages
Allocate and release memory to store obj_cgroup pointers for each non-root
slab page. Reuse page->mem_cgroup pointer to store a pointer to the
allocated space.

This commit temporarily increases the memory footprint of the kernel memory
accounting. To store obj_cgroup pointers we'll need a place for an
objcg_pointer for each allocated object. However, the following patches
in the series will enable sharing of slab pages between memory cgroups,
which will dramatically increase the total slab utilization. And the final
memory footprint will be significantly smaller than before.

To distinguish between obj_cgroups and memcg pointers in case when it's
not obvious which one is used (as in page_cgroup_ino()), let's always set
the lowest bit in the obj_cgroup case. The original obj_cgroups
pointer is marked to be ignored by kmemleak, which otherwise would
report a memory leak for each allocated vector.

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20200623174037.3951353-8-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:24 -07:00
Roman Gushchin
bf4f059954 mm: memcg/slab: obj_cgroup API
Obj_cgroup API provides an ability to account sub-page sized kernel
objects, which potentially outlive the original memory cgroup.

The top-level API consists of the following functions:
  bool obj_cgroup_tryget(struct obj_cgroup *objcg);
  void obj_cgroup_get(struct obj_cgroup *objcg);
  void obj_cgroup_put(struct obj_cgroup *objcg);

  int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size);
  void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size);

  struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg);
  struct obj_cgroup *get_obj_cgroup_from_current(void);

Object cgroup is basically a pointer to a memory cgroup with a per-cpu
reference counter.  It substitutes a memory cgroup in places where it's
necessary to charge a custom amount of bytes instead of pages.

All charged memory rounded down to pages is charged to the corresponding
memory cgroup using __memcg_kmem_charge().

It implements reparenting: on memcg offlining it's getting reattached to
the parent memory cgroup.  Each online memory cgroup has an associated
active object cgroup to handle new allocations and the list of all
attached object cgroups.  On offlining of a cgroup this list is reparented
and for each object cgroup in the list the memcg pointer is swapped to the
parent memory cgroup.  It prevents long-living objects from pinning the
original memory cgroup in the memory.

The implementation is based on byte-sized per-cpu stocks.  A sub-page
sized leftover is stored in an atomic field, which is a part of obj_cgroup
object.  So on cgroup offlining the leftover is automatically reparented.

memcg->objcg is rcu protected.  objcg->memcg is a raw pointer, which is
always pointing at a memory cgroup, but can be atomically swapped to the
parent memory cgroup.  So a user must ensure the lifetime of the
cgroup, e.g.  grab rcu_read_lock or css_set_lock.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200623174037.3951353-7-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:24 -07:00
Roman Gushchin
4138fdfc8b mm: slub: implement SLUB version of obj_to_index()
This commit implements SLUB version of the obj_to_index() function, which
will be required to calculate the offset of obj_cgroup in the obj_cgroups
vector to store/obtain the objcg ownership data.

To make it faster, let's repeat the SLAB's trick introduced by commit
6a2d7a955d ("SLAB: use a multiply instead of a divide in
obj_to_index()") and avoid an expensive division.

Vlastimil Babka noticed, that SLUB does have already a similar function
called slab_index(), which is defined only if SLUB_DEBUG is enabled.  The
function does a similar math, but with a division, and it also takes a
page address instead of a page pointer.

Let's remove slab_index() and replace it with the new helper
__obj_to_index(), which takes a page address.  obj_to_index() will be a
simple wrapper taking a page pointer and passing page_address(page) into
__obj_to_index().

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20200623174037.3951353-5-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:24 -07:00
Roman Gushchin
d42f3245c7 mm: memcg: convert vmstat slab counters to bytes
In order to prepare for per-object slab memory accounting, convert
NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE vmstat items to bytes.

To make it obvious, rename them to NR_SLAB_RECLAIMABLE_B and
NR_SLAB_UNRECLAIMABLE_B (similar to NR_KERNEL_STACK_KB).

Internally global and per-node counters are stored in pages, however memcg
and lruvec counters are stored in bytes.  This scheme may look weird, but
only for now.  As soon as slab pages will be shared between multiple
cgroups, global and node counters will reflect the total number of slab
pages.  However memcg and lruvec counters will be used for per-memcg slab
memory tracking, which will take separate kernel objects in the account.
Keeping global and node counters in pages helps to avoid additional
overhead.

The size of slab memory shouldn't exceed 4Gb on 32-bit machines, so it
will fit into atomic_long_t we use for vmstats.

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20200623174037.3951353-4-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:24 -07:00
Roman Gushchin
ea426c2a7d mm: memcg: prepare for byte-sized vmstat items
To implement per-object slab memory accounting, we need to convert slab
vmstat counters to bytes.  Actually, out of 4 levels of counters: global,
per-node, per-memcg and per-lruvec only two last levels will require
byte-sized counters.  It's because global and per-node counters will be
counting the number of slab pages, and per-memcg and per-lruvec will be
counting the amount of memory taken by charged slab objects.

Converting all vmstat counters to bytes or even all slab counters to bytes
would introduce an additional overhead.  So instead let's store global and
per-node counters in pages, and memcg and lruvec counters in bytes.

To make the API clean all access helpers (both on the read and write
sides) are dealing with bytes.

To avoid back-and-forth conversions a new flavor of read-side helpers is
introduced, which always returns values in pages: node_page_state_pages()
and global_node_page_state_pages().

Actually new helpers are just reading raw values.  Old helpers are simple
wrappers, which will complain on an attempt to read byte value, because at
the moment no one actually needs bytes.

Thanks to Johannes Weiner for the idea of having the byte-sized API on top
of the page-sized internal storage.

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20200623174037.3951353-3-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:24 -07:00
Roman Gushchin
eedc4e5a14 mm: memcg: factor out memcg- and lruvec-level changes out of __mod_lruvec_state()
Patch series "The new cgroup slab memory controller", v7.

The patchset moves the accounting from the page level to the object level.
It allows to share slab pages between memory cgroups.  This leads to a
significant win in the slab utilization (up to 45%) and the corresponding
drop in the total kernel memory footprint.  The reduced number of
unmovable slab pages should also have a positive effect on the memory
fragmentation.

The patchset makes the slab accounting code simpler: there is no more need
in the complicated dynamic creation and destruction of per-cgroup slab
caches, all memory cgroups use a global set of shared slab caches.  The
lifetime of slab caches is not more connected to the lifetime of memory
cgroups.

The more precise accounting does require more CPU, however in practice the
difference seems to be negligible.  We've been using the new slab
controller in Facebook production for several months with different
workloads and haven't seen any noticeable regressions.  What we've seen
were memory savings in order of 1 GB per host (it varied heavily depending
on the actual workload, size of RAM, number of CPUs, memory pressure,
etc).

The third version of the patchset added yet another step towards the
simplification of the code: sharing of slab caches between accounted and
non-accounted allocations.  It comes with significant upsides (most
noticeable, a complete elimination of dynamic slab caches creation) but
not without some regression risks, so this change sits on top of the
patchset and is not completely merged in.  So in the unlikely event of a
noticeable performance regression it can be reverted separately.

The slab memory accounting works in exactly the same way for SLAB and
SLUB.  With both allocators the new controller shows significant memory
savings, with SLUB the difference is bigger.  On my 16-core desktop
machine running Fedora 32 the size of the slab memory measured after the
start of the system was lower by 58% and 38% with SLUB and SLAB
correspondingly.

As an estimation of a potential CPU overhead, below are results of
slab_bulk_test01 test, kindly provided by Jesper D.  Brouer.  He also
helped with the evaluation of results.

The test can be found here: https://github.com/netoptimizer/prototype-kernel/
The smallest number in each row should be picked for a comparison.

SLUB-patched - bulk-API
 - SLUB-patched : bulk_quick_reuse objects=1 : 187 -  90 - 224  cycles(tsc)
 - SLUB-patched : bulk_quick_reuse objects=2 : 110 -  53 - 133  cycles(tsc)
 - SLUB-patched : bulk_quick_reuse objects=3 :  88 -  95 -  42  cycles(tsc)
 - SLUB-patched : bulk_quick_reuse objects=4 :  91 -  85 -  36  cycles(tsc)
 - SLUB-patched : bulk_quick_reuse objects=8 :  32 -  66 -  32  cycles(tsc)

SLUB-original -  bulk-API
 - SLUB-original: bulk_quick_reuse objects=1 :  87 -  87 - 142  cycles(tsc)
 - SLUB-original: bulk_quick_reuse objects=2 :  52 -  53 -  53  cycles(tsc)
 - SLUB-original: bulk_quick_reuse objects=3 :  42 -  42 -  91  cycles(tsc)
 - SLUB-original: bulk_quick_reuse objects=4 :  91 -  37 -  37  cycles(tsc)
 - SLUB-original: bulk_quick_reuse objects=8 :  31 -  79 -  76  cycles(tsc)

SLAB-patched -  bulk-API
 - SLAB-patched : bulk_quick_reuse objects=1 :  67 -  67 - 140  cycles(tsc)
 - SLAB-patched : bulk_quick_reuse objects=2 :  55 -  46 -  46  cycles(tsc)
 - SLAB-patched : bulk_quick_reuse objects=3 :  93 -  94 -  39  cycles(tsc)
 - SLAB-patched : bulk_quick_reuse objects=4 :  35 -  88 -  85  cycles(tsc)
 - SLAB-patched : bulk_quick_reuse objects=8 :  30 -  30 -  30  cycles(tsc)

SLAB-original-  bulk-API
 - SLAB-original: bulk_quick_reuse objects=1 : 143 - 136 -  67  cycles(tsc)
 - SLAB-original: bulk_quick_reuse objects=2 :  45 -  46 -  46  cycles(tsc)
 - SLAB-original: bulk_quick_reuse objects=3 :  38 -  39 -  39  cycles(tsc)
 - SLAB-original: bulk_quick_reuse objects=4 :  35 -  87 -  87  cycles(tsc)
 - SLAB-original: bulk_quick_reuse objects=8 :  29 -  66 -  30  cycles(tsc)

This patch (of 19):

To convert memcg and lruvec slab counters to bytes there must be a way to
change these counters without touching node counters.  Factor out
__mod_memcg_lruvec_state() out of __mod_lruvec_state().

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20200623174037.3951353-1-guro@fb.com
Link: http://lkml.kernel.org/r/20200623174037.3951353-2-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:24 -07:00
Chris Down
ea3271f719 tmpfs: support 64-bit inums per-sb
The default is still set to inode32 for backwards compatibility, but
system administrators can opt in to the new 64-bit inode numbers by
either:

1. Passing inode64 on the command line when mounting, or
2. Configuring the kernel with CONFIG_TMPFS_INODE64=y

The inode64 and inode32 names are used based on existing precedent from
XFS.

[hughd@google.com: Kconfig fixes]
  Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008011928010.13320@eggly.anvils

Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/8b23758d0c66b5e2263e08baf9c4b6a7565cbd8f.1594661218.git.chris@chrisdown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:24 -07:00
Chris Down
e809d5f0b5 tmpfs: per-superblock i_ino support
Patch series "tmpfs: inode: Reduce risk of inum overflow", v7.

In Facebook production we are seeing heavy i_ino wraparounds on tmpfs.  On
affected tiers, in excess of 10% of hosts show multiple files with
different content and the same inode number, with some servers even having
as many as 150 duplicated inode numbers with differing file content.

This causes actual, tangible problems in production.  For example, we have
complaints from those working on remote caches that their application is
reporting cache corruptions because it uses (device, inodenum) to
establish the identity of a particular cache object, but because it's not
unique any more, the application refuses to continue and reports cache
corruption.  Even worse, sometimes applications may not even detect the
corruption but may continue anyway, causing phantom and hard to debug
behaviour.

In general, userspace applications expect that (device, inodenum) should
be enough to be uniquely point to one inode, which seems fair enough.  One
might also need to check the generation, but in this case:

1. That's not currently exposed to userspace
   (ioctl(...FS_IOC_GETVERSION...) returns ENOTTY on tmpfs);
2. Even with generation, there shouldn't be two live inodes with the
   same inode number on one device.

In order to mitigate this, we take a two-pronged approach:

1. Moving inum generation from being global to per-sb for tmpfs. This
   itself allows some reduction in i_ino churn. This works on both 64-
   and 32- bit machines.
2. Adding inode{64,32} for tmpfs. This fix is supported on machines with
   64-bit ino_t only: we allow users to mount tmpfs with a new inode64
   option that uses the full width of ino_t, or CONFIG_TMPFS_INODE64.

You can see how this compares to previous related patches which didn't
implement this per-superblock:

- https://patchwork.kernel.org/patch/11254001/
- https://patchwork.kernel.org/patch/11023915/

This patch (of 2):

get_next_ino has a number of problems:

- It uses and returns a uint, which is susceptible to become overflowed
  if a lot of volatile inodes that use get_next_ino are created.
- It's global, with no specificity per-sb or even per-filesystem. This
  means it's not that difficult to cause inode number wraparounds on a
  single device, which can result in having multiple distinct inodes
  with the same inode number.

This patch adds a per-superblock counter that mitigates the second case.
This design also allows us to later have a specific i_ino size per-device,
for example, allowing users to choose whether to use 32- or 64-bit inodes
for each tmpfs mount.  This is implemented in the next commit.

For internal shmem mounts which may be less tolerant to spinlock delays,
we implement a percpu batching scheme which only takes the stat_lock at
each batch boundary.

Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/cover.1594661218.git.chris@chrisdown.name
Link: http://lkml.kernel.org/r/1986b9d63b986f08ec07a4aa4b2275e718e47d8a.1594661218.git.chris@chrisdown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:24 -07:00
John Hubbard
6dc5ea16c8 mm, dump_page: do not crash with bad compound_mapcount()
If a compound page is being split while dump_page() is being run on that
page, we can end up calling compound_mapcount() on a page that is no
longer compound.  This leads to a crash (already seen at least once in the
field), due to the VM_BUG_ON_PAGE() assertion inside compound_mapcount().

(The above is from Matthew Wilcox's analysis of Qian Cai's bug report.)

A similar problem is possible, via compound_pincount() instead of
compound_mapcount().

In order to avoid this kind of crash, make dump_page() slightly more
robust, by providing a pair of simpler routines that don't contain
assertions: head_mapcount() and head_pincount().

For debug tools, we don't want to go *too* far in this direction, but this
is a simple small fix, and the crash has already been seen, so it's a good
trade-off.

Reported-by: Qian Cai <cai@lca.pw>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Link: http://lkml.kernel.org/r/20200804214807.169256-1-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:23 -07:00
Waiman Long
453431a549 mm, treewide: rename kzfree() to kfree_sensitive()
As said by Linus:

  A symmetric naming is only helpful if it implies symmetries in use.
  Otherwise it's actively misleading.

  In "kzalloc()", the z is meaningful and an important part of what the
  caller wants.

  In "kzfree()", the z is actively detrimental, because maybe in the
  future we really _might_ want to use that "memfill(0xdeadbeef)" or
  something. The "zero" part of the interface isn't even _relevant_.

The main reason that kzfree() exists is to clear sensitive information
that should not be leaked to other future users of the same memory
objects.

Rename kzfree() to kfree_sensitive() to follow the example of the recently
added kvfree_sensitive() and make the intention of the API more explicit.
In addition, memzero_explicit() is used to clear the memory to make sure
that it won't get optimized away by the compiler.

The renaming is done by using the command sequence:

  git grep -w --name-only kzfree |\
  xargs sed -i 's/kzfree/kfree_sensitive/'

followed by some editing of the kfree_sensitive() kerneldoc and adding
a kzfree backward compatibility macro in slab.h.

[akpm@linux-foundation.org: fs/crypto/inline_crypt.c needs linux/slab.h]
[akpm@linux-foundation.org: fix fs/crypto/inline_crypt.c some more]

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Joe Perches <joe@perches.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
Link: http://lkml.kernel.org/r/20200616154311.12314-3-longman@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:22 -07:00
Ralph Campbell
c1a06df6eb mm/migrate: fix migrate_pgmap_owner w/o CONFIG_MMU_NOTIFIER
On x86_64, when CONFIG_MMU_NOTIFIER is not set/enabled, there is a
compiler error:

   mm/migrate.c: In function 'migrate_vma_collect':
   mm/migrate.c:2481:7: error: 'struct mmu_notifier_range' has no member named 'migrate_pgmap_owner'
     range.migrate_pgmap_owner = migrate->pgmap_owner;
          ^

Fixes: 998427b3ad ("mm/notifier: add migration invalidation type")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Jason Gunthorpe" <jgg@mellanox.com>
Link: http://lkml.kernel.org/r/20200806193353.7124-1-rcampbell@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:21 -07:00
Linus Torvalds
d6efb3ac3e Merge tag 'tty-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial updates from Greg KH:
 "Here is the large set of TTY and Serial driver patches for 5.9-rc1.

  Lots of bugfixes in here, thanks to syzbot fuzzing for serial and vt
  and console code.

  Other highlights include:

   - much needed vt/vc code cleanup from Jiri Slaby

   - 8250 driver fixes and additions

   - various serial driver updates and feature enhancements

   - locking cleanup for serial/console initializations

   - other minor cleanups

  All of these have been in linux-next with no reported issues"

* tag 'tty-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (90 commits)
  MAINTAINERS: enlist Greg formally for console stuff
  vgacon: Fix for missing check in scrollback handling
  Revert "serial: 8250: Let serial core initialise spin lock"
  serial: 8250: Let serial core initialise spin lock
  tty: keyboard, do not speculate on func_table index
  serial: stm32: Add RS485 RTS GPIO control
  serial: 8250_dw: Fix common clocks usage race condition
  serial: 8250_dw: Pass the same rate to the clk round and set rate methods
  serial: 8250_dw: Simplify the ref clock rate setting procedure
  serial: 8250: Add 8250 port clock update method
  tty: serial: imx: add imx earlycon driver
  tty: serial: imx: enable imx serial console port as module
  tty/synclink: remove leftover bits of non-PCI card support
  tty: Use the preferred form for passing the size of a structure type
  tty: Fix identation issues in struct serial_struct32
  tty: Avoid the use of one-element arrays
  serial: msm_serial: add sparse context annotation
  serial: pmac_zilog: add sparse context annotation
  newport_con: vc_color is now in state
  serial: imx: use hrtimers for rs485 delays
  ...
2020-08-06 14:56:11 -07:00
Linus Torvalds
c0c419c045 Merge tag 'staging-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO driver updates from Greg KH:
 "Here is the large set of Staging and IIO driver patches for 5.9-rc1.

  Lots of churn here, but overall the size increase in lines added is
  small, while adding a load of new IIO drivers.

  Major things in here:

   - lots and lots of IIO new drivers and frameworks added

   - IIO driver fixes and updates

   - lots of tiny coding style cleanups for staging drivers

   - vc04_services major reworks and cleanups

  We had 3 set of drivers move out of staging in this round as well:

   - wilc1000 wireless driver moved out of staging

   - speakup moved out of staging

   - most USB driver moved out of staging

  Full details are in the shortlog.

  All of these have been in linux-next with no reported issues. The last
  few changes here were to resolve reported linux-next issues, and they
  seem to have resolved the problems"

* tag 'staging-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (428 commits)
  staging: most: fix up movement of USB driver
  staging: rts5208: clear alignment style issues
  staging: r8188eu: replace rtw_netdev_priv define with inline function
  staging: netlogic: clear alignment style issues
  staging: android: ashmem: Fix lockdep warning for write operation
  drivers: most: add USB adapter driver
  staging: most: Use %pM format specifier for MAC addresses
  staging: ks7010: Use %pM format specifier for MAC addresses
  staging: qlge: qlge_dbg: removed comment repition
  staging: wfx: Use flex_array_size() helper in memcpy()
  staging: rtl8723bs: Align macro definitions
  staging: rtl8723bs: Clean up function declations
  staging: rtl8723bs: Fix coding style errors
  drivers: staging: audio: Fix the missing header file for helper file
  staging: greybus: audio: Enable GB codec, audio module compilation.
  staging: greybus: audio: Add helper APIs for dynamic audio modules
  staging: greybus: audio: Resolve compilation error in topology parser
  staging: greybus: audio: Resolve compilation errors for GB codec module
  staging: greybus: audio: Maintain jack list within GB Audio module
  staging: greybus: audio: Update snd_jack FW usage as per new APIs
  ...
2020-08-06 14:36:13 -07:00
Linus Torvalds
3f9df56480 Merge tag 'sound-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
 "This became wide and scattered updates all over the sound tree as
  diffstat shows: lots of (still ongoing) refactoring works in ASoC,
  fixes and cleanups caught by static analysis, inclusive term
  conversions as well as lots of new drivers. Below are highlights:

  ASoC core:
   - API cleanups and conversions to the unified mute_stream() call
   - Simplify I/O helper functions
   - Use helper macros to retrieve RTD from substreams

  ASoC drivers:
   - Lots of fixes and cleanups in Intel ASoC drivers
   - Lots of new stuff: Freescale MQS and i.MX6sx, Intel KeemBay I2S,
     Maxim MAX98360A and MAX98373 SoundWire, various Mediatek boards,
     nVidia Tegra 186 and 210, RealTek RL6231, Samsung Midas and Aries
     boards, TI J721e EVM

  ALSA core:
   - Minor code refacotring for SG-buffer handling

  HD-audio:
   - Generalization of mute-LED handling with LED classdev
   - Intel silent stream support for HDMI
   - Device-specific fixes: CA0132, Loongson-3

  Others:
   - Usual USB- and HD-audio quirks for various devices
   - Fixes for echoaudio DMA position handling
   - Various documents and trivial fixes for sparse warnings
   - Conversion to adopt inclusive terms"

* tag 'sound-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (479 commits)
  ALSA: pci: delete repeated words in comments
  ALSA: isa: delete repeated words in comments
  ALSA: hda/tegra: Add 100us dma stop delay
  ALSA: hda: Add dma stop delay variable
  ASoC: hda/tegra: Set buffer alignment to 128 bytes
  ALSA: seq: oss: Serialize ioctls
  ALSA: hda/hdmi: Add quirk to force connectivity
  ALSA: usb-audio: add startech usb audio dock name
  ALSA: usb-audio: Add support for Lenovo ThinkStation P620
  Revert "ALSA: hda: call runtime_allow() for all hda controllers"
  ALSA: hda/ca0132 - Fix AE-5 microphone selection commands.
  ALSA: hda/ca0132 - Add new quirk ID for Recon3D.
  ALSA: hda/ca0132 - Fix ZxR Headphone gain control get value.
  ALSA: hda/realtek: Add alc269/alc662 pin-tables for Loongson-3 laptops
  ALSA: docs: fix typo
  ALSA: doc: use correct config variable name
  ASoC: core: Two step component registration
  ASoC: core: Simplify snd_soc_component_initialize declaration
  ASoC: core: Relocate and expose snd_soc_component_initialize
  ASoC: sh: Replace 'select' DMADEVICES 'with depends on'
  ...
2020-08-06 14:27:31 -07:00
Linus Torvalds
921d2597ab Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
 "s390:
   - implement diag318

  x86:
   - Report last CPU for debugging
   - Emulate smaller MAXPHYADDR in the guest than in the host
   - .noinstr and tracing fixes from Thomas
   - nested SVM page table switching optimization and fixes

  Generic:
   - Unify shadow MMU cache data structures across architectures"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
  KVM: SVM: Fix sev_pin_memory() error handling
  KVM: LAPIC: Set the TDCR settable bits
  KVM: x86: Specify max TDP level via kvm_configure_mmu()
  KVM: x86/mmu: Rename max_page_level to max_huge_page_level
  KVM: x86: Dynamically calculate TDP level from max level and MAXPHYADDR
  KVM: VXM: Remove temporary WARN on expected vs. actual EPTP level mismatch
  KVM: x86: Pull the PGD's level from the MMU instead of recalculating it
  KVM: VMX: Make vmx_load_mmu_pgd() static
  KVM: x86/mmu: Add separate helper for shadow NPT root page role calc
  KVM: VMX: Drop a duplicate declaration of construct_eptp()
  KVM: nSVM: Correctly set the shadow NPT root level in its MMU role
  KVM: Using macros instead of magic values
  MIPS: KVM: Fix build error caused by 'kvm_run' cleanup
  KVM: nSVM: remove nonsensical EXITINFO1 adjustment on nested NPF
  KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support
  KVM: VMX: optimize #PF injection when MAXPHYADDR does not match
  KVM: VMX: Add guest physical address check in EPT violation and misconfig
  KVM: VMX: introduce vmx_need_pf_intercept
  KVM: x86: update exception bitmap on CPUID changes
  KVM: x86: rename update_bp_intercept to update_exception_bitmap
  ...
2020-08-06 12:59:31 -07:00
Linus Torvalds
6d2b84a4e5 Merge tag 'sched-fifo-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull sched/fifo updates from Ingo Molnar:
 "This adds the sched_set_fifo*() encapsulation APIs to remove static
  priority level knowledge from non-scheduler code.

  The three APIs for non-scheduler code to set SCHED_FIFO are:

   - sched_set_fifo()
   - sched_set_fifo_low()
   - sched_set_normal()

  These are two FIFO priority levels: default (high), and a 'low'
  priority level, plus sched_set_normal() to set the policy back to
  non-SCHED_FIFO.

  Since the changes affect a lot of non-scheduler code, we kept this in
  a separate tree"

* tag 'sched-fifo-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  sched,tracing: Convert to sched_set_fifo()
  sched: Remove sched_set_*() return value
  sched: Remove sched_setscheduler*() EXPORTs
  sched,psi: Convert to sched_set_fifo_low()
  sched,rcutorture: Convert to sched_set_fifo_low()
  sched,rcuperf: Convert to sched_set_fifo_low()
  sched,locktorture: Convert to sched_set_fifo()
  sched,irq: Convert to sched_set_fifo()
  sched,watchdog: Convert to sched_set_fifo()
  sched,serial: Convert to sched_set_fifo()
  sched,powerclamp: Convert to sched_set_fifo()
  sched,ion: Convert to sched_set_normal()
  sched,powercap: Convert to sched_set_fifo*()
  sched,spi: Convert to sched_set_fifo*()
  sched,mmc: Convert to sched_set_fifo*()
  sched,ivtv: Convert to sched_set_fifo*()
  sched,drm/scheduler: Convert to sched_set_fifo*()
  sched,msm: Convert to sched_set_fifo*()
  sched,psci: Convert to sched_set_fifo*()
  sched,drbd: Convert to sched_set_fifo*()
  ...
2020-08-06 11:55:43 -07:00
Linus Torvalds
4cec929370 Merge tag 'integrity-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
 "The nicest change is the IMA policy rule checking. The other changes
  include allowing the kexec boot cmdline line measure policy rules to
  be defined in terms of the inode associated with the kexec kernel
  image, making the IMA_APPRAISE_BOOTPARAM, which governs the IMA
  appraise mode (log, fix, enforce), a runtime decision based on the
  secure boot mode of the system, and including errno in the audit log"

* tag 'integrity-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  integrity: remove redundant initialization of variable ret
  ima: move APPRAISE_BOOTPARAM dependency on ARCH_POLICY to runtime
  ima: AppArmor satisfies the audit rule requirements
  ima: Rename internal filter rule functions
  ima: Support additional conditionals in the KEXEC_CMDLINE hook function
  ima: Use the common function to detect LSM conditionals in a rule
  ima: Move comprehensive rule validation checks out of the token parser
  ima: Use correct type for the args_p member of ima_rule_entry.lsm elements
  ima: Shallow copy the args_p member of ima_rule_entry.lsm elements
  ima: Fail rule parsing when appraise_flag=blacklist is unsupportable
  ima: Fail rule parsing when the KEY_CHECK hook is combined with an invalid cond
  ima: Fail rule parsing when the KEXEC_CMDLINE hook is combined with an invalid cond
  ima: Fail rule parsing when buffer hook functions have an invalid action
  ima: Free the entire rule if it fails to parse
  ima: Free the entire rule when deleting a list of rules
  ima: Have the LSM free its audit rule
  IMA: Add audit log for failure conditions
  integrity: Add errno field in audit message
2020-08-06 11:35:57 -07:00
Linus Torvalds
b62e419707 Merge tag 'mips_5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS upates from Thomas Bogendoerfer:

 - improvements for Loongson64

 - extended ingenic support

 - removal of not maintained paravirt system type

 - cleanups and fixes

* tag 'mips_5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (81 commits)
  MIPS: SGI-IP27: always enable NUMA in Kconfig
  MAINTAINERS: Update KVM/MIPS maintainers
  MIPS: Update default config file for Loongson-3
  MIPS: KVM: Add kvm guest support for Loongson-3
  dt-bindings: mips: Document Loongson kvm guest board
  MIPS: handle Loongson-specific GSExc exception
  MIPS: add definitions for Loongson-specific CP0.Diag1 register
  MIPS: only register FTLBPar exception handler for supported models
  MIPS: ingenic: Hardcode mem size for qi,lb60 board
  MIPS: DTS: ingenic/qi,lb60: Add model and memory node
  MIPS: ingenic: Use fw_passed_dtb even if CONFIG_BUILTIN_DTB
  MIPS: head.S: Init fw_passed_dtb to builtin DTB
  of: address: Fix parser address/size cells initialization
  of_address: Guard of_bus_pci_get_flags with CONFIG_PCI
  MIPS: DTS: Fix number of msi vectors for Loongson64G
  MIPS: Loongson64: Add ISA node for LS7A PCH
  MIPS: Loongson64: DTS: Fix ISA and PCI I/O ranges for RS780E PCH
  MIPS: Loongson64: Enlarge IO_SPACE_LIMIT
  MIPS: Loongson64: Process ISA Node in DeviceTree
  of_address: Add bus type match for pci ranges parser
  ...
2020-08-06 10:54:07 -07:00
Linus Torvalds
9ab9bc5115 Merge tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:

 - A patch series from Andrea to improve vmbus code

 - Two clean-up patches from Alexander and Randy

* tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  hyperv: hyperv.h: drop a duplicated word
  tools: hv: change http to https in hv_kvp_daemon.c
  Drivers: hv: vmbus: Remove the lock field from the vmbus_channel struct
  scsi: storvsc: Introduce the per-storvsc_device spinlock
  Drivers: hv: vmbus: Remove unnecessary channel->lock critical sections (sc_list updaters)
  Drivers: hv: vmbus: Use channel_mutex in channel_vp_mapping_show()
  Drivers: hv: vmbus: Remove unnecessary channel->lock critical sections (sc_list readers)
  Drivers: hv: vmbus: Replace cpumask_test_cpu(, cpu_online_mask) with cpu_online()
  Drivers: hv: vmbus: Remove the numa_node field from the vmbus_channel struct
  Drivers: hv: vmbus: Remove the target_vp field from the vmbus_channel struct
2020-08-06 09:26:10 -07:00
Linus Torvalds
47ec5303d7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:

 1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan.

 2) Support UDP segmentation in code TSO code, from Eric Dumazet.

 3) Allow flashing different flash images in cxgb4 driver, from Vishal
    Kulkarni.

 4) Add drop frames counter and flow status to tc flower offloading,
    from Po Liu.

 5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.

 6) Various new indirect call avoidance, from Eric Dumazet and Brian
    Vazquez.

 7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
    Yonghong Song.

 8) Support querying and setting hardware address of a port function via
    devlink, use this in mlx5, from Parav Pandit.

 9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.

10) Switch qca8k driver over to phylink, from Jonathan McDowell.

11) In bpftool, show list of processes holding BPF FD references to
    maps, programs, links, and btf objects. From Andrii Nakryiko.

12) Several conversions over to generic power management, from Vaibhav
    Gupta.

13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
    Yakunin.

14) Various https url conversions, from Alexander A. Klimov.

15) Timestamping and PHC support for mscc PHY driver, from Antoine
    Tenart.

16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.

17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.

18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.

19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
    drivers. From Luc Van Oostenryck.

20) XDP support for xen-netfront, from Denis Kirjanov.

21) Support receive buffer autotuning in MPTCP, from Florian Westphal.

22) Support EF100 chip in sfc driver, from Edward Cree.

23) Add XDP support to mvpp2 driver, from Matteo Croce.

24) Support MPTCP in sock_diag, from Paolo Abeni.

25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
    infrastructure, from Jakub Kicinski.

26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.

27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.

28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.

29) Refactor a lot of networking socket option handling code in order to
    avoid set_fs() calls, from Christoph Hellwig.

30) Add rfc4884 support to icmp code, from Willem de Bruijn.

31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.

32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.

33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.

34) Support TCP syncookies in MPTCP, from Flowian Westphal.

35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano
    Brivio.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
  net: thunderx: initialize VF's mailbox mutex before first usage
  usb: hso: remove bogus check for EINPROGRESS
  usb: hso: no complaint about kmalloc failure
  hso: fix bailout in error case of probe
  ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
  selftests/net: relax cpu affinity requirement in msg_zerocopy test
  mptcp: be careful on subflow creation
  selftests: rtnetlink: make kci_test_encap() return sub-test result
  selftests: rtnetlink: correct the final return value for the test
  net: dsa: sja1105: use detected device id instead of DT one on mismatch
  tipc: set ub->ifindex for local ipv6 address
  ipv6: add ipv6_dev_find()
  net: openvswitch: silence suspicious RCU usage warning
  Revert "vxlan: fix tos value before xmit"
  ptp: only allow phase values lower than 1 period
  farsync: switch from 'pci_' to 'dma_' API
  wan: wanxl: switch from 'pci_' to 'dma_' API
  hv_netvsc: do not use VF device if link is down
  dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
  net: macb: Properly handle phylink on at91sam9x
  ...
2020-08-05 20:13:21 -07:00
Linus Torvalds
8186749621 Merge tag 'drm-next-2020-08-06' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
 "New xilinx displayport driver, AMD support for two new GPUs (more
  header files), i915 initial support for RocketLake and some work on
  their DG1 (discrete chip).

  The core also grew some lockdep annotations to try and constrain what
  drivers do with dma-fences, and added some documentation on why the
  idea of indefinite fences doesn't work.

  The long list is below.

  I do have some fixes trees outstanding, but I'll follow up with those
  later.

  core:
   - add user def flag to cmd line modes
   - dma_fence_wait added might_sleep
   - dma-fence lockdep annotations
   - indefinite fences are bad documentation
   - gem CMA functions used in more drivers
   - struct mutex removal
   - more drm_ debug macro usage
   - set/drop master api fixes
   - fix for drm/mm hole size comparison
   - drm/mm remove invalid entry optimization
   - optimise drm/mm hole handling
   - VRR debugfs added
   - uncompressed AFBC modifier support
   - multiple display id blocks in EDID
   - multiple driver sg handling fixes
   - __drm_atomic_helper_crtc_reset in all drivers
   - managed vram helpers

  ttm:
   - ttm_mem_reg handling cleanup
   - remove bo offset field
   - drop CMA memtype flag
   - drop mappable flag

  xilinx:
   - New Xilinx ZynqMP DisplayPort Subsystem driver

  nouveau:
   - add CRC support
   - start using NVIDIA published class header files
   - convert all push buffer emission to new macros
   - Proper push buffer space management for EVO/NVD channels.
   - firmware loading fixes
   - 2MiB system memory pages support on Pascal and newer

  vkms:
   - larger cursor support

  i915:
   - Rocketlake platform enablement
   - Early DG1 enablement
   - Numerous GEM refactorings
   - DP MST fixes
   - FBC, PSR, Cursor, Color, Gamma fixes
   - TGL, RKL, EHL workaround updates
   - TGL 8K display support fixes
   - SDVO/HDMI/DVI fixes

  amdgpu:
   - Initial support for Sienna Cichlid GPU
   - Initial support for Navy Flounder GPU
   - SI UVD/VCE support
   - expose rotation property
   - Add support for unique id on Arcturus
   - Enable runtime PM on vega10 boards that support BACO
   - Skip BAR resizing if the bios already did id
   - Major swSMU code cleanup
   - Fixes for DCN bandwidth calculations

  amdkfd:
   - Track SDMA usage per process
   - SMI events interface

  radeon:
   - Default to on chip GART for AGP boards on all arches
   - Runtime PM reference count fixes

  msm:
   - headers regenerated causing churn
   - a650/a640 display and GPU enablement
   - dpu dither support for 6bpc panels
   - dpu cursor fix
   - dsi/mdp5 enablement for sdm630/sdm636/sdm66

  tegra:
   - video capture prep support
   - reflection support

  mediatek:
   - convert mtk_dsi to bridge API

  meson:
   - FBC support

  sun4i:
   - iommu support

  rockchip:
   - register locking fix
   - per-pixel alpha support PX30 VOP

  mgag200:
   - ported to simple and shmem helpers
   - device init cleanups
   - use managed pci functions
   - dropped hw cursor support

  ast:
   - use managed pci functions
   - use managed VRAM helpers
   - rework cursor support

  malidp:
   - dev_groups support

  hibmc:
   - refactor hibmc_drv_vdac:

  vc4:
   - create TXP CRTC

  imx:
   - error path fixes and cleanups

  etnaviv:
   - clock handling and error handling cleanups
   - use pin_user_pages"

* tag 'drm-next-2020-08-06' of git://anongit.freedesktop.org/drm/drm: (1747 commits)
  drm/msm: use kthread_create_worker instead of kthread_run
  drm/msm/mdp5: Add MDP5 configuration for SDM636/660
  drm/msm/dsi: Add DSI configuration for SDM660
  drm/msm/mdp5: Add MDP5 configuration for SDM630
  drm/msm/dsi: Add phy configuration for SDM630/636/660
  drm/msm/a6xx: add A640/A650 hwcg
  drm/msm/a6xx: hwcg tables in gpulist
  drm/msm/dpu: add SM8250 to hw catalog
  drm/msm/dpu: add SM8150 to hw catalog
  drm/msm/dpu: intf timing path for displayport
  drm/msm/dpu: set missing flush bits for INTF_2 and INTF_3
  drm/msm/dpu: don't use INTF_INPUT_CTRL feature on sdm845
  drm/msm/dpu: move some sspp caps to dpu_caps
  drm/msm/dpu: update UBWC config for sm8150 and sm8250
  drm/msm/dpu: use right setup_blend_config for sm8150 and sm8250
  drm/msm/a6xx: set ubwc config for A640 and A650
  drm/msm/adreno: un-open-code some packets
  drm/msm: sync generated headers
  drm/msm/a6xx: add build_bw_table for A640/A650
  drm/msm/a6xx: fix crashstate capture for A650
  ...
2020-08-05 19:50:06 -07:00
Linus Torvalds
e4a7b2dc35 Merge tag 'leds-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds
Pull LED updates from Pavel Machek:
 "Okay, so... this one is interesting. RGB LEDs are very common, and we
  need to have some kind of support for them. Multicolor is for
  arbitrary set of LEDs in one package, RGB is for LEDs that can produce
  full range of colors. We do not have real multicolor LED that is not
  RGB in the pipeline, so that one is disabled for now.

  You can expect this saga to continue with next pull requests"

* tag 'leds-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds: (37 commits)
  MAINTAINERS: Remove myself as LED subsystem maintainer
  leds: disallow /sys/class/leds/*:multi:* for now
  leds: add RGB color option, as that is different from multicolor.
  Make LEDS_LP55XX_COMMON depend on I2C to fix build errors:
  Documentation: ABI: leds-turris-omnia: document sysfs attribute
  leds: initial support for Turris Omnia LEDs
  dt-bindings: leds: add cznic,turris-omnia-leds binding
  leds: pattern trigger -- check pattern for validity
  leds: Replace HTTP links with HTTPS ones
  leds: trigger: add support for LED-private device triggers
  leds: lp5521: Add multicolor framework multicolor brightness support
  leds: lp5523: Update the lp5523 code to add multicolor brightness function
  leds: lp55xx: Add multicolor framework support to lp55xx
  leds: lp55xx: Convert LED class registration to devm_*
  dt-bindings: leds: Convert leds-lp55xx to yaml
  leds: multicolor: Introduce a multicolor class definition
  leds: Add multicolor ID to the color ID list
  dt: bindings: Add multicolor class dt bindings documention
  leds: lp5523: Fix various formatting issues in the code
  leds: lp55xx: Fix file permissions to use DEVICE_ATTR macros
  ...
2020-08-05 19:24:27 -07:00
Linus Torvalds
fffe3ae0ee Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull hmm updates from Jason Gunthorpe:
 "Ralph has been working on nouveau's use of hmm_range_fault() and
  migrate_vma() which resulted in this small series. It adds reporting
  of the page table order from hmm_range_fault() and some optimization
  of migrate_vma():

   - Report the size of the page table mapping out of hmm_range_fault().

     This makes it easier to establish a large/huge/etc mapping in the
     device's page table.

   - Allow devices to ignore the invalidations during migration in cases
     where the migration is not going to change pages.

     For instance migrating pages to a device does not require the
     device to invalidate pages already in the device.

   - Update nouveau and hmm_tests to use the above"

* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  mm/hmm/test: use the new migration invalidation
  nouveau/svm: use the new migration invalidation
  mm/notifier: add migration invalidation type
  mm/migrate: add a flags parameter to migrate_vma
  nouveau: fix storing invalid ptes
  nouveau/hmm: support mapping large sysmem pages
  nouveau: fix mapping 2MB sysmem pages
  nouveau/hmm: fault one page at a time
  mm/hmm: add tests for hmm_pfn_to_map_order()
  mm/hmm: provide the page mapping order in hmm_range_fault()
2020-08-05 13:28:50 -07:00
Linus Torvalds
8f7be62915 Merge tag 'mmc-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC updates from Ulf Hansson:
 "MMC core:

   - Add a new host cap bit and a corresponding DT property, to support
     power cycling of the card by FW at system suspend/resume.

   - Fix clock rate setting for SDIO in SDR12/SDR25 speed-mode

   - Fix switch to 1/4-bit mode at system suspend/resume for SD-combo
     cards

   - Convert the mmc-pwrseq DT bindings to the json-schema

   - Always allow the card detect uevent to be consumed by userspace

  MMC host controllers:

   - Convert a few DT bindings to the json-schema

   - mtk-sd:
      - Add support for command queue through cqhci
      - Add support for the MT6779 variant

   - renesas_sdhi_internal_dmac:
      - Fix dma unmapping in the error path

   - sdhci_am654:
      - Add support for the AM65x PG2.0 variant
      - Extend support for phys/clocks

   - sdhci-cadence:
      - Drop incorrect HW tuning for SD mode

   - sdhci-msm:
      - Add support for interconnect bandwidth scaling
      - Enable internal voltage control
      - Enable low power state for pinctrls

   - sdhci-of-at91:
      - Ludovic Desroches handovers maintenance to Eugen Hristev

   - sdhci-pci-gli:
      - Improve clock handling for GL975x

   - sdhci-pci-o2micro:
      - Add HW tuning for SDR104 mode
      - Fix support for O2 host controller Seabird1"

* tag 'mmc-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (66 commits)
  mmc: mediatek: make function msdc_cqe_disable() static
  MAINTAINERS: mmc: sdhci-of-at91: handover maintenance to Eugen Hristev
  dt-bindings: mmc: mediatek: Add document for mt6779
  mmc: mediatek: command queue support
  mmc: mediatek: refine msdc timeout api
  mmc: mediatek: add MT6779 MMC driver support
  mmc: sdhci-pci-o2micro: Add HW tuning for SDR104 mode
  mmc: sdhci-pci-o2micro: Bug fix for O2 host controller Seabird1
  mmc: via-sdmmc: use generic power management
  memstick: jmb38x_ms: use generic power management
  mmc: sdhci-cadence: do not use hardware tuning for SD mode
  mmc: sdhci-pci-gli: Set SDR104's clock to 205MHz and enable SSC for GL975x
  mmc: cqhci: Fix a print format for the task descriptor
  mmc: sdhci-of-arasan: fix timings allocation code
  mmc: sdhci: Fix a potential uninitialized variable
  dt-bindings: mmc: renesas,sdhi: convert to YAML
  dt-bindings: mmc: convert arasan sdhci bindings to yaml
  mmc: sdhci: Fix potential null pointer access while accessing vqmmc
  mmc: core: Add MMC_CAP2_FULL_PWR_CYCLE_IN_SUSPEND
  dt-bindings: mmc: Add full-pwr-cycle-in-suspend property
  ...
2020-08-05 13:23:24 -07:00
Linus Torvalds
9aebd3254c Merge tag 'hwmon-for-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon updates from Guenter Roeck:
 "Highlights:
   - New driver for Sparx5 SoC temperature sensot
   - New driver for Corsair Commander Pro
   - MAX20710 support added to max20730 driver

  Enhancements:
   - max6697: Allow max6581 to create tempX_offset attributes
   - gsc (Gateworks System Controller): add 16bit pre-scaled voltage mode
   - adm1275: Enable adm1278 ADM1278_TEMP1_EN
   - dell-smm: Add Latitude 5480 to fan control whitelist

  Fixes:
   - adc128d818: Fix advanced configuration register init
   - pmbus/core: Use s64 instead of long for calculations to fix
     overflow issues with 32-bit architectures

  Plus various cleanups in several drivers"

* tag 'hwmon-for-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (32 commits)
  hwmon: (adc128d818) Fix advanced configuration register init
  hwmon: (axi-fan-control) remove duplicate macros
  hwmon: (i5k_amb, vt8231) Drop uses of pci_read_config_*() return value
  hwmon: (sparx5) Make symbol 's5_temp_match' static
  hwmon: (corsair-cpro) add reading pwm values
  hwmon: sparx5: Add Sparx5 SoC temperature driver
  dt-bindings: hwmon: Add Sparx5 temperature sensor
  hwmon: (tmp401) Replace HTTP links with HTTPS ones
  hwmon: (lm95234) Replace HTTP links with HTTPS ones
  hwmon: (lm90) Replace HTTP links with HTTPS ones
  hwmon: (k8temp) Replace HTTP links with HTTPS ones
  hwmon: (jc42) Replace HTTP links with HTTPS ones
  hwmon: (ina2xx) Replace HTTP links with HTTPS ones
  hwmon: (ina209) Replace HTTP links with HTTPS ones
  hwmon: Replace HTTP links with HTTPS ones
  docs: hwmon: Replace HTTP links with HTTPS ones
  hwmon: (adm1025) Replace HTTP links with HTTPS ones
  hwmon: add Corsair Commander Pro driver
  hwmon: (max6697) Allow max6581 to create tempX_offset
  hwmon: (tmmp513) Replace HTTP links with HTTPS links
  ...
2020-08-05 13:13:57 -07:00
Linus Torvalds
441977979a Merge tag 'devicetree-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull Devicetree updates from Rob Herring:

 - Improve device links cycle detection and breaking. Add more bindings
   for device link dependencies.

 - Refactor parsing 'no-map' in __reserved_mem_alloc_size()

 - Improve DT unittest 'ranges' and 'dma-ranges' test case to check
   differing cell sizes

 - Various http to https link conversions

 - Add a schema check to prevent 'syscon' from being used by itself
   without a more specific compatible

 - A bunch more DT binding conversions to schema

* tag 'devicetree-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (55 commits)
  of: reserved-memory: remove duplicated call to of_get_flat_dt_prop() for no-map node
  of: unittest: Use bigger address cells to catch parser regressions
  dt-bindings: memory-controllers: Convert mmdc to json-schema
  dt-bindings: mtd: Convert imx nand to json-schema
  dt-bindings: mtd: Convert gpmi nand to json-schema
  dt-bindings: iio: io-channel-mux: Fix compatible string in example code
  of: property: Add device link support for pinctrl-0 through pinctrl-8
  of: property: Add device link support for multiple DT bindings
  dt-bindings: phy: ti: phy-gmii-sel: convert bindings to json-schema
  dt-bindings: mux: mux.h: drop a duplicated word
  dt-bindings: misc: Convert olpc,xo1.75-ec to json-schema
  dt-bindings: aspeed-lpc: Replace HTTP links with HTTPS ones
  dt-bindings: drm/bridge: Replace HTTP links with HTTPS ones
  drm/tilcdc: Replace HTTP links with HTTPS ones
  dt-bindings: iommu: renesas,ipmmu-vmsa: Add r8a774e1 support
  dt-bindings: fpga: Replace HTTP links with HTTPS ones
  dt-bindings: virtio: Replace HTTP links with HTTPS ones
  dt-bindings: media: imx274: Add optional input clock and supplies
  dt-bindings: i2c-gpio: Use 'deprecated' keyword on deprecated properties
  dt-bindings: interrupt-controller: Fix typos in loongson,liointc.yaml
  ...
2020-08-05 13:02:45 -07:00
Linus Torvalds
1d8ce0e093 Merge tag 'gpio-v5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
 "This is the bulk of GPIO changes for the v5.9 kernel cycle.

  There is nothing too exciting in it, but a new macro that fixes a
  build failure on a minor ARM32 platform that appeared yesterday is
  part of it so we better merge it.

  Core changes:

   - Introduce the for_each_requested_gpio() macro to help in dependent
     code all over the place. Also patch a few locations to use it while
     we are at it.

   - Split out the sysfs code into its own file.

   - Split out the character device code into its own file, then make a
     set of refactorings and improvements to this code. We are setting
     the stage to revamp the userspace API a bit in the next cycle.

   - Fix a whole slew of kerneldoc that was wrong or missing.

  New drivers:

   - The PCA953x driver now supports the PCAL9535.

  Driver improvements:

   - A host of incremental modernizations and improvements to the
     PCA953x driver.

   - Incremental improvements to the Xilinx Zynq driver.

   - Some improvements to the GPIO aggregator driver.

   - I ran all over the place switching all threaded and other drivers
     requesting their own IRQ while using the core GPIO IRQ helpers to
     pass the GPIO irq chip as a template instead of calling the
     explicit set-up functions. Next merge window we may retire the old
     code altogether"

* tag 'gpio-v5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (97 commits)
  gpio: wcove: Request IRQ after all initialisation done
  gpio: crystalcove: Free IRQ on error path
  gpio: pca953x: Request IRQ after all initialisation done
  gpio: don't use same lockdep class for all devm_gpiochip_add_data users
  gpio: max732x: Use irqchip template
  gpio: stmpe: Move chip registration
  gpio: rcar: Use irqchip template
  gpio: regmap: fix type clash
  gpio: Correct kernel-doc inconsistency
  gpio: pci-idio-16: Use irqchip template
  gpio: pcie-idio-24: Use irqchip template
  gpio: 104-idio-16: Use irqchip template
  gpio: 104-idi-48: Use irqchip template
  gpio: 104-dio-48e: Use irqchip template
  gpio: ws16c48: Use irqchip template
  gpio: omap: improve coding style for pin config flags
  gpio: dln2: Use irqchip template
  gpio: sch: Add a blank line between declaration and code
  gpio: sch: changed every 'unsigned' to 'unsigned int'
  gpio: ich: changed every 'unsigned' to 'unsigned int'
  ...
2020-08-05 12:56:27 -07:00
Xin Long
81f6cb3122 ipv6: add ipv6_dev_find()
This is to add an ip_dev_find like function for ipv6, used to find
the dev by saddr.

It will be used by TIPC protocol. So also export it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-05 12:19:52 -07:00