mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-12-27 11:06:41 -05:00
Pull mm updates from Andrew Morton:
"The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.
Notable series include:
- Lucas Stach has provided some page-mapping cleanup/consolidation/
maintainability work in the series "mm/treewide: Remove pXd_huge()
API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
one test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being
allocated: number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in
largely similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene"
Johannes Weiner has fixed the page allocator's handling of
migratetype requests, with resulting improvements in compaction
efficiency.
- In the series "make the hugetlb migration strategy consistent"
Baolin Wang has fixed a hugetlb migration issue, which should
improve hugetlb allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when
memory almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting"
Kairui Song has optimized pagecache insertion, yielding ~10%
performance improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initialization code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various
page->flags cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series:
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert
hugetlb functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
series "mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs.
This is a simple first-cut implementation for now. The series is
"support multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in
the series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts
in the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call
it GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
path to use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes".
Fixes the initialization code so that migration between different
memory types works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant
driver in the series "mm: follow_pte() improvements and acrn
follow_pte() fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to
folio in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size
THPs in the series "mm: add per-order mTHP alloc and swpout
counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap
same-filled and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His
series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
optimizes the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback
instrumentation in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series
"Fix and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
the series "Improve anon_vma scalability for anon VMAs". Intel's
test bot reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests in the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as
XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking""
* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
selftests: cgroup: add tests to verify the zswap writeback path
mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
mm/damon/core: fix return value from damos_wmark_metric_value
mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
selftests: cgroup: remove redundant enabling of memory controller
Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
Docs/mm/damon/design: use a list for supported filters
Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
selftests/damon: classify tests for functionalities and regressions
selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
selftests/damon: add a test for DAMOS quota goal
...
187 lines
4.6 KiB
C
// SPDX-License-Identifier: GPL-2.0
/*
 * arch/alpha/lib/checksum.c
 *
 * This file contains network checksum routines that are better done
 * in an architecture-specific manner for speed.
 * Comments in other versions indicate that the algorithms are from RFC1071
 *
 * accelerated versions (and 21264 assembly versions) contributed by
 *	Rick Gorton <rick.gorton@alpha-processor.com>
 */

#include <linux/module.h>
#include <linux/string.h>
#include <net/checksum.h>

#include <asm/byteorder.h>
#include <asm/checksum.h>

static inline unsigned short from64to16(unsigned long x)
{
	/* Using extract instructions is a bit more efficient
	   than the original shift/bitmask version.  */

	union {
		unsigned long	ul;
		unsigned int	ui[2];
		unsigned short	us[4];
	} in_v, tmp_v, out_v;

	in_v.ul = x;
	tmp_v.ul = (unsigned long) in_v.ui[0] + (unsigned long) in_v.ui[1];

	/* Since the bits of tmp_v.us[3] are going to always be zero,
	   we don't have to bother to add that in.  */
	out_v.ul = (unsigned long) tmp_v.us[0] + (unsigned long) tmp_v.us[1]
			+ (unsigned long) tmp_v.us[2];

	/* Similarly, out_v.us[2] is always zero for the final add.  */
	return out_v.us[0] + out_v.us[1];
}

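The comment in from64to16() contrasts the union/extract approach with an "original shift/bitmask version". For comparison, here is a minimal portable user-space sketch of such a shift/mask fold, assuming C99 `<stdint.h>` types; `fold64to16()` is a hypothetical name and is not the kernel's code. Each step adds the high part back into the low part (end-around carry) until 16 bits remain.

```c
#include <stdint.h>

/*
 * Portable shift/mask fold of a 64-bit one's-complement accumulator
 * down to 16 bits. User-space sketch; fold64to16() is a hypothetical
 * name, not the kernel's from64to16().
 */
static uint16_t fold64to16(uint64_t x)
{
	x = (x & 0xffffffffu) + (x >> 32);	/* 64 -> 33 bits */
	x = (x & 0xffffffffu) + (x >> 32);	/* 33 -> 32 bits */
	x = (x & 0xffffu) + (x >> 16);		/* 32 -> 17 bits */
	x = (x & 0xffffu) + (x >> 16);		/* 17 -> 16 bits */
	return (uint16_t)x;
}
```

For example, folding 0x1234567890abcdef this way yields 0xc747, the same as summing its four 16-bit words with end-around carry. The union version reaches the same kind of result with fewer explicit shifts, relying on Alpha's little-endian layout of `us[]`.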
/*
 * computes the checksum of the TCP/UDP pseudo-header
 * returns a 16-bit checksum, already complemented.
 */
__sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr,
			  __u32 len, __u8 proto, __wsum sum)
{
	return (__force __sum16)~from64to16(
		(__force u64)saddr + (__force u64)daddr +
		(__force u64)sum + ((len + proto) << 8));
}
EXPORT_SYMBOL(csum_tcpudp_magic);

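csum_tcpudp_magic() adds network-order addresses straight into a 64-bit accumulator and shifts `len + proto` left by 8. Because the one's-complement sum does not depend on byte order (RFC 1071), this lines the protocol and length up with the little-endian view of the pseudo-header words. The sketch below computes the same pseudo-header checksum in a byte-order-neutral way, using host-order numeric values of the big-endian words. It is a user-space illustration under that assumption; `pseudo_hdr_csum()` is hypothetical and not a kernel API.

```c
#include <stdint.h>

/*
 * Hypothetical user-space helper: checksum of the IPv4 TCP/UDP
 * pseudo-header (source address, destination address, zero byte,
 * protocol, length) added to a running sum of the segment.
 * All inputs are host-order numbers; the result is complemented.
 */
static uint16_t pseudo_hdr_csum(uint32_t saddr, uint32_t daddr,
				uint16_t len, uint8_t proto, uint32_t sum)
{
	uint64_t acc = sum;

	acc += (saddr >> 16) + (saddr & 0xffff);	/* two 16-bit words */
	acc += (daddr >> 16) + (daddr & 0xffff);
	acc += proto;		/* zero byte + protocol byte */
	acc += len;		/* TCP/UDP length */
	while (acc >> 16)	/* end-around carry */
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)~acc;
}
```

With 192.168.0.1 to 192.168.0.199, UDP (17), and length 8, this gives 0x7dcd; feeding that value back in as the running sum yields 0, which is how a receiver validates the checksum.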
__wsum csum_tcpudp_nofold(__be32 saddr, __be32 daddr,
			  __u32 len, __u8 proto, __wsum sum)
{
	unsigned long result;

	result = (__force u64)saddr + (__force u64)daddr +
		 (__force u64)sum + ((len + proto) << 8);

	/* Fold down to 32-bits so we don't lose in the typedef-less
	   network stack.  */
	/* 64 to 33 */
	result = (result & 0xffffffff) + (result >> 32);
	/* 33 to 32 */
	result = (result & 0xffffffff) + (result >> 32);
	return (__force __wsum)result;
}
EXPORT_SYMBOL(csum_tcpudp_nofold);

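csum_tcpudp_nofold() stops at 32 bits and leaves the final 32 -> 16 fold and complement to a later step (the kernel's csum_fold()). The two-stage 64 -> 33 -> 32 fold is needed because the first addition can itself carry into bit 32. A minimal user-space sketch of both stages, assuming C99 `<stdint.h>` types; `fold64to32()` and `fold32_complement()` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the two folding steps in
 * csum_tcpudp_nofold(): 64 -> 33 bits, then 33 -> 32 bits. */
static uint32_t fold64to32(uint64_t x)
{
	x = (x & 0xffffffffu) + (x >> 32);	/* 64 -> 33 bits */
	x = (x & 0xffffffffu) + (x >> 32);	/* 33 -> 32 bits */
	return (uint32_t)x;
}

/* Hypothetical helper for the deferred step: fold 32 -> 16 bits,
 * then complement, in the spirit of csum_fold(). */
static uint16_t fold32_complement(uint32_t s)
{
	s = (s & 0xffff) + (s >> 16);
	s = (s & 0xffff) + (s >> 16);
	return (uint16_t)~s;
}
```

For instance, 0x1ffffffff becomes 0x100000000 after the first step, still 33 bits wide; only the second step brings it down to 1.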
/*
 * Do a 64-bit checksum on an arbitrary memory area..
 *
 * This isn't a great routine, but it's not _horrible_ either. The
 * inner loop could be unrolled a bit further, and there are better
 * ways to do the carry, but this is reasonable.
 */
static inline unsigned long do_csum(const unsigned char * buff, int len)
{
	int odd, count;
	unsigned long result = 0;

	if (len <= 0)
		goto out;
	odd = 1 & (unsigned long) buff;
	if (odd) {
		result = *buff << 8;
		len--;
		buff++;
	}
	count = len >> 1;		/* nr of 16-bit words.. */
	if (count) {
		if (2 & (unsigned long) buff) {
			result += *(unsigned short *) buff;
			count--;
			len -= 2;
			buff += 2;
		}
		count >>= 1;		/* nr of 32-bit words.. */
		if (count) {
			if (4 & (unsigned long) buff) {
				result += *(unsigned int *) buff;
				count--;
				len -= 4;
				buff += 4;
			}
			count >>= 1;	/* nr of 64-bit words.. */
			if (count) {
				unsigned long carry = 0;
				do {
					unsigned long w = *(unsigned long *) buff;
					count--;
					buff += 8;
					result += carry;
					result += w;
					carry = (w > result);
				} while (count);
				result += carry;
				result = (result & 0xffffffff) + (result >> 32);
			}
			if (len & 4) {
				result += *(unsigned int *) buff;
				buff += 4;
			}
		}
		if (len & 2) {
			result += *(unsigned short *) buff;
			buff += 2;
		}
	}
	if (len & 1)
		result += *buff;
	result = from64to16(result);
	if (odd)
		result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
out:
	return result;
}

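When the buffer starts at an odd address, do_csum() treats the first byte as the high half of a 16-bit word (`*buff << 8`), sums the rest from the next byte, and swaps the two bytes of the folded result at the end. This relies on the byte-order independence of the one's-complement sum. The byte-at-a-time sketch below checks that idea in user space; it assumes little-endian word pairing like Alpha's loads, omits the aligned 64-bit inner loop, and uses hypothetical function names.

```c
#include <stddef.h>
#include <stdint.h>

/* End-around-carry fold of an accumulator down to 16 bits. */
static uint16_t fold16(uint64_t acc)
{
	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)acc;
}

/* One's-complement sum of bytes taken as little-endian 16-bit words,
 * the way a little-endian CPU such as Alpha loads them. */
static uint16_t csum16_le(const uint8_t *p, size_t len)
{
	uint64_t acc = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		acc += (uint64_t)p[i] | ((uint64_t)p[i + 1] << 8);
	if (len & 1)
		acc += p[len - 1];	/* lone trailing byte is the low half */
	return fold16(acc);
}

/* Mimics do_csum()'s odd-start path: first byte as the high half of a
 * word, the rest summed from the next byte, then a byte swap. */
static uint16_t csum16_le_odd_start(const uint8_t *p, size_t len)
{
	uint16_t s;

	if (len == 0)
		return 0;
	s = fold16(((uint64_t)p[0] << 8) + csum16_le(p + 1, len - 1));
	return (uint16_t)((s >> 8) | (s << 8));
}
```

For any length, `csum16_le_odd_start(buf, n)` should equal `csum16_le(buf, n)`. The swap only moves bytes between the high and low halves, and those halves are exactly what the one-byte shift exchanged.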
/*
 * This is a version of ip_compute_csum() optimized for IP headers,
 * which always checksum on 4 octet boundaries.
 */
__sum16 ip_fast_csum(const void *iph, unsigned int ihl)
{
	return (__force __sum16)~do_csum(iph,ihl*4);
}
EXPORT_SYMBOL(ip_fast_csum);

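ip_fast_csum() sums `ihl` 32-bit words of an IPv4 header and complements the result. The same call therefore computes a checksum (with the checksum field zeroed) or validates one (a correct header yields 0). A user-space sketch of that behaviour, assuming a byte array in wire order; `ipv4_hdr_csum()` is a hypothetical name, and the sample header is made up for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical user-space analogue of ip_fast_csum(): one's-complement
 * sum over ihl 32-bit words of an IPv4 header, complemented. The
 * result is a host-order number (0 means the header checks out).
 */
static uint16_t ipv4_hdr_csum(const uint8_t *hdr, unsigned int ihl)
{
	uint32_t acc = 0;
	unsigned int i;

	for (i = 0; i < ihl * 4; i += 2)
		acc += ((uint32_t)hdr[i] << 8) | hdr[i + 1];	/* big-endian words */
	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)~acc;
}
```

For a 20-byte header (ihl = 5) from 192.168.0.1 to 192.168.0.199 with the checksum field zeroed, this returns 0xb861. After writing 0xb8, 0x61 into bytes 10 and 11, the same call returns 0.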
/*
 * computes the checksum of a memory block at buff, length len,
 * and adds in "sum" (32-bit)
 *
 * returns a 32-bit number suitable for feeding into itself
 * or csum_tcpudp_magic
 *
 * this function must be called with even lengths, except
 * for the last fragment, which may be odd
 *
 * it's best to have buff aligned on a 32-bit boundary
 */
__wsum csum_partial(const void *buff, int len, __wsum sum)
{
	unsigned long result = do_csum(buff, len);

	/* add in old sum, and carry.. */
	result += (__force u32)sum;
	/* 32+c bits -> 32 bits */
	result = (result & 0xffffffff) + (result >> 32);
	return (__force __wsum)result;
}

EXPORT_SYMBOL(csum_partial);

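The rule that csum_partial() must be called with even lengths, except for the last fragment, follows from how an odd-length block is summed: its trailing byte is padded with a zero, so the next fragment's bytes land in the wrong halves of their words. The user-space sketch below shows this with hypothetical helpers `partial_sum()` and `fold_complement()`. Unlike the kernel version, which folds each block to 16 bits in do_csum() before adding the old sum, it keeps a 32-bit running total. The two approaches are equivalent in one's-complement arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical user-space analogue of csum_partial(): add the
 * one's-complement sum of buf (big-endian 16-bit words) to a running
 * 32-bit sum and return the new running sum, not yet folded to 16 bits
 * and not complemented.
 */
static uint32_t partial_sum(const uint8_t *buf, size_t len, uint32_t sum)
{
	uint64_t acc = sum;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		acc += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		acc += (uint32_t)buf[len - 1] << 8;	/* odd tail: zero-padded */
	acc = (acc & 0xffffffffu) + (acc >> 32);	/* 64 -> 33 bits */
	acc = (acc & 0xffffffffu) + (acc >> 32);	/* 33 -> 32 bits */
	return (uint32_t)acc;
}

/* Final fold to 16 bits plus complement, done once at the end. */
static uint16_t fold_complement(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Chaining an 8-byte and a 12-byte fragment of a 20-byte header gives the same checksum as a single pass. Splitting the same bytes 7 + 13 does not, because the odd first fragment shifts everything after it by one byte.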
/*
 * this routine is used for miscellaneous IP-like checksums, mainly
 * in icmp.c
 */
__sum16 ip_compute_csum(const void *buff, int len)
{
	return (__force __sum16)~from64to16(do_csum(buff,len));
}
EXPORT_SYMBOL(ip_compute_csum);
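ip_compute_csum() is do_csum() over an arbitrary buffer followed by a complement, which is the plain RFC 1071 Internet checksum used for ICMP-style messages. A byte-at-a-time user-space sketch, assuming input in wire order; `inet_csum()` is a hypothetical name, and the 8-byte ICMP echo request below (identifier 1, sequence 1, no payload) is a made-up example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Byte-at-a-time RFC 1071 Internet checksum over big-endian 16-bit
 * words, complemented. Hypothetical user-space counterpart of
 * ip_compute_csum(); the result is a host-order number.
 */
static uint16_t inet_csum(const uint8_t *buf, size_t len)
{
	uint64_t acc = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		acc += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		acc += (uint32_t)buf[len - 1] << 8;	/* zero-pad odd tail */
	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)~acc;
}
```

For the echo request `08 00 00 00 00 01 00 01` this yields 0xf7fd. Once those two bytes are stored in the checksum field, recomputing over the message gives 0.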