Commit Graph

547601 Commits

Author SHA1 Message Date
Jaegeuk Kim
ab126cfc30 f2fs: should get a victim from retrials
If we do not call get_victim first, we cannot get a new victim for retrial
path.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:55 -07:00
Chao Yu
45fe8492cc f2fs: fix to correct freed section number during gc
This patch fixes to maintain the right section count freed in garbage
collecting when triggering a foreground gc.

Besides, when a foreground gc is running on current selected section, once
we fail to gc one segment, it's better to abandon gcing the left segments
in current section, because anyway we will select next victim for
foreground gc, so gc on the left segments in previous section will become
overhead and also cause the long latency for caller.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:54 -07:00
Chao Yu
345a6b2ee2 f2fs: fix to update {m,c}time correctly when truncating larger
This patch fixes to update ctime and atime correctly when truncating
larger in ->setattr.

The bug is reported by xfstest generic/313 as below:

generic/313 2s ... - output mismatch (see ./results/generic/313.out.bad)
    --- tests/generic/313.out   2015-08-04 15:28:53.430798882 +0800
    +++ results/generic/313.out.bad   2015-09-28 17:04:27.294278016 +0800
    @@ -1,2 +1,4 @@
     QA output created by 313
     Silence is golden
    +ctime not updated after truncate up
    +mtime not updated after truncate up
    ...
    (Run 'diff -u tests/generic/313.out tests/generic/313.out.bad'  to see the entire diff)
Ran: generic/313
Failures: generic/313
Failed 1 of 1 tests

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:54 -07:00
Jaegeuk Kim
90b803e6fb f2fs: do not skip dentry block writes
Previously, we skip dentry block writes when wbc is SYNC_NONE with no memory
pressure and the number of dirty pages is pretty small.

But, we didn't skip for normal data writes, which gives us not much big impact
on overall performance.
Moreover, by skipping some data writes, kworker falls into infinite loop to try
to write blocks, when many dir inodes have only one dentry block.

So, this patch removes skipping data writes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:54 -07:00
Chao Yu
7223554133 f2fs: remove unneeded f2fs_{,un}lock_op in do_recover_data()
Protecting recovery flow by using cp_rwsem is not needed, since we have
prevent triggering any checkpoint by locking cp_mutex previously.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:54 -07:00
Chao Yu
1d7e10d58a f2fs: fix incorrect bimodal calculation
In update_sit_info, we use div_u64 to handle 'u64 divide u64' case, but
div_u64 can only handle 32-bits divisor, so our divisor with u64 type
passed to div_u64 will overflow, result in the wrong calculation when
show debug info of f2fs as below:

BDF: 464, avg. vblocks: 23509
(BDF should never exceed 100)

So change to use div64_u64 to handle this case correctly.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:53 -07:00
Chao Yu
4abd3f5ac4 f2fs: introduce __try_update_largest_extent
This patch adds a new helper __try_update_largest_extent for cleanup.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:53 -07:00
Nicholas Krause
545fe4210d f2fs: fix error handling for calls to various functions in the function recover_inline_data
This fixes error handling for calls to various functions in the
function  recover_inline_data to check if these particular functions
either return a error code or the boolean value false to signal their
caller they have failed internally and if this arises return false
to signal failure immediately to the caller of recover_inline_data
as we cannot continue after failures to calling either the function
truncate_inline_inode or truncate_blocks.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:53 -07:00
Chao Yu
9cd81ce3c2 f2fs: disallow switch extent_cache option dynamically
Swith extent_cache option dynamically when remount may casue consistency
issue between extent cache and dnode page. Fix in this patch to avoid
that condition.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:53 -07:00
Chao Yu
46c9e1413f f2fs: use correct flag in f2fs_map_blocks()
We introduce F2FS_GET_BLOCK_READ in commit e2b4e2bc88 ("f2fs: fix
incorrect mapping for bmap"), but forget to use this flag in the right
place, fix it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:52 -07:00
Chao Yu
f9811703fe f2fs: fix to handle io error in ->direct_IO
Here is a oops reported as following message when testing generic/019 of
xfstest:

 ------------[ cut here ]------------
 kernel BUG at /home/yuchao/git/f2fs-dev/segment.c:882!
 invalid opcode: 0000 [#1] SMP
 Modules linked in: zram lz4_compress lz4_decompress f2fs(O) ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4
nf_def
 CPU: 2 PID: 25441 Comm: fio Tainted: G           O    4.3.0-rc1+ #6
 Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.61 05/16/2013
 task: ffff8803f4e85580 ti: ffff8803fd61c000 task.ti: ffff8803fd61c000
 RIP: 0010:[<ffffffffa0784981>]  [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs]
 RSP: 0018:ffff8803fd61f918  EFLAGS: 00010246
 RAX: 00000000000007ed RBX: 0000000000000224 RCX: 000000000000001f
 RDX: 0000000000000800 RSI: ffffffffffffffff RDI: ffff8803f56f4300
 RBP: ffff8803fd61f978 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000024 R11: ffff8800d23bbd78 R12: ffff8800d0ef0000
 R13: 0000000000000224 R14: 0000000000000000 R15: 0000000000000001
 FS:  00007f827ff85700(0000) GS:ffff88041ea80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffff600000 CR3: 00000003fef17000 CR4: 00000000001406e0
 Stack:
  000007ea00000002 0000000100000001 ffff8803f6456248 000007ed0000002b
  0000000000000224 ffff880404d1aa20 ffff8803fd61f9c8 ffff8800d0ef0000
  ffff8803f6456248 0000000000000001 00000000ffffffff ffffffffa078f358
 Call Trace:
  [<ffffffffa0785b87>] allocate_segment_by_default+0x1a7/0x1f0 [f2fs]
  [<ffffffffa078322c>] allocate_data_block+0x17c/0x360 [f2fs]
  [<ffffffffa0779521>] __allocate_data_block+0x131/0x1d0 [f2fs]
  [<ffffffffa077a995>] f2fs_direct_IO+0x4b5/0x580 [f2fs]
  [<ffffffff811510ae>] generic_file_direct_write+0xae/0x160
  [<ffffffff811518f5>] __generic_file_write_iter+0xd5/0x1f0
  [<ffffffff81151e07>] generic_file_write_iter+0xf7/0x200
  [<ffffffff81319e38>] ? apparmor_file_permission+0x18/0x20
  [<ffffffffa0768480>] ? f2fs_fallocate+0x1190/0x1190 [f2fs]
  [<ffffffffa07684c6>] f2fs_file_write_iter+0x46/0x90 [f2fs]
  [<ffffffff8120b4fe>] aio_run_iocb+0x1ee/0x290
  [<ffffffff81700f7e>] ? mutex_lock+0x1e/0x50
  [<ffffffff8120a1d7>] ? aio_read_events+0x207/0x2b0
  [<ffffffff8120b913>] do_io_submit+0x373/0x630
  [<ffffffff8120a4f6>] ? SyS_io_getevents+0x56/0xb0
  [<ffffffff8120bbe0>] SyS_io_submit+0x10/0x20
  [<ffffffff81703857>] entry_SYSCALL_64_fastpath+0x12/0x6a
 Code: 45 c8 48 8b 78 10 e8 9f 23 bf e0 41 8b 8c 24 cc 03 00 00 89 c7 31 d2 89 c6 89 d8 29 df f7 f1 29 d1 39 cf 0f 83 be fd ff ff eb
 RIP  [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs]
  RSP <ffff8803fd61f918>
 ---[ end trace 2e577d7f711ddb86 ]---

The reason is that: in the test of generic/019, we will trigger a manmade
IO error in block layer through debugfs, after that, prefree segment will
no longer be freed, because we always skip doing gc or checkpoint when
there occurs an IO error.

Meanwhile fio with aio engine generated a large number of direct IOs,
which continue allocating spaces in free segment until we run out of them,
eventually, results in panic in new_curseg as no more free segment was
found.

So, this patch changes to return EIO in direct_IO for this condition.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:52 -07:00
Chao Yu
ea58711e88 f2fs: do in batches truncation in truncate_hole
truncate_data_blocks_range can do in batches truncation which makes all
changes in dnode page content, dnode page status, extent cache, block
count updating together.

But previously, truncate_hole() always truncates one block in dnode page
at a time by invoking truncate_data_blocks_range(,1), which make thing
slow.

This patch changes truncate_hole() to do in batches truncation for all
target blocks in one direct node inside truncate_data_blocks_range, which
can make our punch hole operation in ->fallocate more efficent.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:52 -07:00
Fan Li
4d1fa815f2 f2fs: optimize code of f2fs_update_extent_tree_range
Fix 2 potential problems:
1. when largest extent needs to be invalidated, it will be reset in
   __drop_largest_extent, which makes __is_extent_same after always
   return false, and largest extent unchanged. Now we update it properly.

2. when extent is split and the latter part remains in tree, next_en
   should be the latter part instead of next extent of original extent.
   It will cause merge failure if there is in-place update, although
   there is not, I think this fix will still makes codes less ambiguous.

This patch also simplifies codes of invalidating extents, and optimizes the
procedues that split extent into two.
There are a few modifications after last patch:
1. prev_en now is updated properly.
2. more codes and branches are simplified.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:52 -07:00
Fan Li
41a099de3a f2fs: drop largest extent by range
now we update extent by range, fofs may not be on the largest
extent if the new extent overlaps with it. so add a new function
to drop largest extent properly.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:51 -07:00
Jaegeuk Kim
a7230d16d5 f2fs: check end_io for metapages before making next checkpoint blocks
This patch avoids to produce new checkpoint blocks before the previous meta
pages were written completely.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:51 -07:00
Jaegeuk Kim
569cf1876a f2fs crypto: allocate buffer for decrypting filename
We got dentry pages from high_mem, and its address space directly goes into the
decryption path via f2fs_fname_disk_to_usr.
But, sg_init_one assumes the address is not from high_mem, so we can get this
panic since it doesn't call kmap_high but kunmap_high is triggered at the end.

kernel BUG at ../../../../../../kernel/mm/highmem.c:290!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
...
 (kunmap_high+0xb0/0xb8) from [<c0114534>] (__kunmap_atomic+0xa0/0xa4)
 (__kunmap_atomic+0xa0/0xa4) from [<c035f028>] (blkcipher_walk_done+0x128/0x1ec)
 (blkcipher_walk_done+0x128/0x1ec) from [<c0366c24>] (crypto_cbc_decrypt+0xc0/0x170)
 (crypto_cbc_decrypt+0xc0/0x170) from [<c0367148>] (crypto_cts_decrypt+0xc0/0x114)
 (crypto_cts_decrypt+0xc0/0x114) from [<c035ea98>] (async_decrypt+0x40/0x48)
 (async_decrypt+0x40/0x48) from [<c032ca34>] (f2fs_fname_disk_to_usr+0x124/0x304)
 (f2fs_fname_disk_to_usr+0x124/0x304) from [<c03056fc>] (f2fs_fill_dentries+0xac/0x188)
 (f2fs_fill_dentries+0xac/0x188) from [<c03059c8>] (f2fs_readdir+0x1f0/0x300)
 (f2fs_readdir+0x1f0/0x300) from [<c0218054>] (vfs_readdir+0x90/0xb4)
 (vfs_readdir+0x90/0xb4) from [<c0218418>] (SyS_getdents64+0x64/0xcc)
 (SyS_getdents64+0x64/0xcc) from [<c0105ba0>] (ret_fast_syscall+0x0/0x30)

Cc: <stable@vger.kernel.org>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:51 -07:00
Chao Yu
973163fc0c f2fs: reorganize f2fs_map_blocks
In this patch, we try to reorganize f2fs_map_blocks to make block mapping
flow more clear by using following structure:

/* check status of mapping */

if (unmapped) {
	/* blkaddr == NULL_ADDR || blkaddr == NEW_ADDR */

	if (create) {
		/* write path, handle dio write case here */
		alloc_and_map;
	} else {
		/*
		 * handle read cases from all call paths:
		 *     1. generic read;
		 *     2. dio read;
		 *     3. fiemap;
		 *     4. bmap
		 */
	}
}

/* map buffer_header */

Besides, this patch handles the missing case correctly for dio write:
When we fail in __allocate_data_blocks, then in f2fs_map_blocks, we will
not allocate blocks correctly for preallocated blocks, but returning with
an unmapped buffer head, which will result in failure of dio write.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:51 -07:00
Jaegeuk Kim
514053e454 f2fs: declare f2fs_update_extent_tree_range as static
This function should be static.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:50 -07:00
Chao Yu
9edcdabf36 f2fs: fix overflow of size calculation
We have potential overflow issue when calculating size of object, when
we left shift index with PAGE_CACHE_SHIFT bits, if type of index has only
32-bits space in 32-bit architecture, left shifting will incur overflow,
i.e:

pgoff_t index =  0xFFFFFFFF;
loff_t size = index << PAGE_CACHE_SHIFT;
size: 0xFFFFF000

So we should cast index with 64-bits type to avoid this issue.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:50 -07:00
Chao Yu
100136acfb f2fs: fix incorrect searching position when shrinking extent cache
When shrinking extent cache, we have two steps in the flow:
1) shrink objects which are unreferenced by inodes;
2) shrink objects from LRU list of extent cache.

In step 1, if we haven't shrunk enough number of objects, we will try
step 2, but before that we didn't update the searching position which
may point to last inode index in global extent tree, result in failing
to shrink objects by traversing the all inodes' extent tree.

In this patch, we reset searching position to beginning of global extent
tree for fixing.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:50 -07:00
Chao Yu
c998012b0b f2fs: verify file type early in f2fs_fallocate
This patch changes to verify file type early in f2fs_fallocate for
cleanup, meanwhile this also fixes to add missing verification for
expand_inode_data.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:50 -07:00
Jaegeuk Kim
c5cd29d21c f2fs: no need to lock for update_inode_page all the time
As comment says, we don't need to call f2fs_lock_op in write_inode to prevent
from producing dirty node pages all the time.
That happens only when there is not enough free sections and we can avoid that
by calling balance_fs in prior to that.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:50 -07:00
Jaegeuk Kim
25b93346a6 f2fs: cover number of dirty node pages under node_write lock
This number is referenced by checkpoint under node_write lock.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:49 -07:00
Nicholas Krause
538e17e7e9 f2fs: fix incorrect return statement in the function f2fs_ioc_release_volatile_write
This fixes the incorrect return statement at the end of the function
f2fs_ioc_release_volatile_write's body for returning zero as this is
incorrect due to the function call before this return statement to
the function punch_hole being able to fail and we should return this
function's return fail directly in order to signal to callers of the
function f2fs_ioc_release_volatile if a failure arises with this call
to punch_hole fails.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:49 -07:00
Chao Yu
744288c721 f2fs: trace in batches extent info update
Rename trace_f2fs_update_extent_tree to trace_f2fs_update_extent_tree_range,
then expand and enable it to trace in batches extent info updates.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:49 -07:00
Linus Torvalds
c6fa8e6de3 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
 "This addresses a couple of issues found with RT, a broken initrd
  message in the console log and a simple performance fix for some MMC
  workloads.

  Summary:

   - A couple of locking fixes for RT kernels
   - Avoid printing bogus initrd warnings when initrd isn't present
   - Performance fix for random mmap file readahead
   - Typo fix"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: replace read_lock to rcu lock in call_break_hook
  arm64: Don't relocate non-existent initrd
  arm64: convert patch_lock to raw lock
  arm64: readahead: fault retry breaks mmap file read random detection
  arm64: debug: Fix typo in debug-monitors.c
2015-10-07 18:17:46 +01:00
Linus Torvalds
e82fa92e62 Merge tag 'fbdev-fixes-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux
Pull fbdev fixes from Tomi Valkeinen:

 - fbdev: Minor fixes to broadsheetfb, fsl-diu-fb, mb862xxfb, tridentfb,
   omapfb

 - display-timing: Fix memory leak in error path

* tag 'fbdev-fixes-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
  video: of: fix memory leak
  fbdev: broadsheetfb: fix memory leak
  OMAPDSS: panel-sony-acx565akm: Export OF module alias information
  fbdev: omap2: connector-dvi: use of_get_i2c_adapter_by_node interface
  tridentfb: Fix set_lwidth on TGUI9440 and CYBER9320
  tridentfb: fix hang on Blade3D with CONFIG_CC_OPTIMIZE_FOR_SIZE
  video: fbdev: mb862xx: Fix module autoload for OF platform driver
  video: fbdev: fsl: Fix the sleep function for FSL DIU module
2015-10-07 18:05:09 +01:00
Linus Torvalds
8ace60f8b7 Merge tag 'regmap-fix-v4.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fixes from Mark Brown:
 "A couple of fixes for the debugfs information on the register map,
  fixing issues with very small reads potentially causing underflows and
  wraparounds"

* tag 'regmap-fix-v4.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: debugfs: Don't bother actually printing when calculating max length
  regmap: debugfs: Ensure we don't underflow when printing access masks
2015-10-07 14:15:49 +01:00
Linus Torvalds
07443cecdf Merge tag 'spi-fix-v4.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
 "A couple of very minor fixes, one for error handling in the Davinci
  driver probe function and another making the Renesas sh-msiof DT
  binding documentation correspond to what's actually implemented"

* tag 'spi-fix-v4.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: sh-msiof: Match renesas,rx-fifo-size in DT bindings doc with driver
  spi: davinci: fix handling platform_get_irq result
2015-10-07 14:13:10 +01:00
Linus Torvalds
21f3c96188 Merge tag 'regulator-fix-v4.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
 "Two fixes here, one device specific fix for axp20x and a core fix for
   cases where one regulator is supplying another which broke probe
  deferral, substituting in a dummy regulator too aggressively"

* tag 'regulator-fix-v4.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: core: Handle probe deferral from DT when resolving supplies
  regulator: axp20x: Fix enable bit indexes for DCDC4 and DCDC5
2015-10-07 14:11:47 +01:00
Sudip Mukherjee
d663baba8b video: of: fix memory leak
If of_parse_display_timing() fails we are printing an error message and
jumping to the error path but we missed freeing "dt".

Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
2015-10-07 14:13:59 +03:00
Mark Brown
c2c80bdb9e Merge remote-tracking branches 'spi/fix/davinci' and 'spi/fix/sh-msiof' into spi-linus 2015-10-07 11:43:39 +01:00
Linus Torvalds
79c7c7acd2 Merge branch 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull strscpy fixes from Chris Metcalf :
 "This patch series fixes up a couple of architecture issues where
  strscpy wasn't configured correctly (missing on h8300, duplicating
  local and asm-generic copies on powerpc and tile).

  It also adds a use of zero_bytemask() to the final store for strscpy
  to avoid writing uninitialized data to the destination.  However, to
  make this work we had to add support for zero_bytemask() to the two
  architectures that didn't have it (alpha and tile), because they were
  providing their own local copies, but didn't provide the
  zero_bytemask() that was previously only required when building with
  CONFIG_DCACHE_WORD_ACCESS"

[ Side note: there is still no actual users of strscpy except for the
  one preexisting use in arch/tile that predates the generic version.
  So this is all about fixing the infrastructure so that we eventually
  can start using it.  - Linus ]

* 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  strscpy: zero any trailing garbage bytes in the destination
  word-at-a-time.h: support zero_bytemask() on alpha and tile
  word-at-a-time.h: fix some Kbuild files
2015-10-07 09:52:42 +01:00
Linus Torvalds
3f5e4a3116 Merge tag 'for-linus-20151006' of git://git.infradead.org/linux-mtd
Pull MTD fixes from Brian Norris:
 "A few MTD fixes:

   - mxc_nand: a "refactoring only" change in 4.3-rc1 had some bad
     pointer (array) arithmetic.  Fix that

   - sunxi_nand:

   - Fix an old list manipulation / memory management bug in the device
     release() code path

   - Correct a few mistakes in OOB write support"

* tag 'for-linus-20151006' of git://git.infradead.org/linux-mtd:
  mxc_nand: fix copy_spare
  mtd: nand: sunxi: fix sunxi_nand_chips_cleanup()
  mtd: nand: sunxi: fix OOB handling in ->write_xxx() functions
2015-10-07 09:35:15 +01:00
Linus Torvalds
a0eeb8dd34 Merge tag 'nfs-for-4.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

  Bugfixes:
   - Fix a use-after-free bug in the RPC/RDMA client
   - Fix a write performance regression
   - Fix up page writeback accounting
   - Don't try to reclaim unused state owners
   - Fix a NFSv4 nograce recovery hang
   - reset states to use open_stateid when returning delegation
     voluntarily
   - Fix a tracepoint NULL-pointer dereference"

* tag 'nfs-for-4.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Fix a tracepoint NULL-pointer dereference
  nfs4: reset states to use open_stateid when returning delegation voluntarily
  NFSv4: Fix a nograce recovery hang
  NFSv4.1: nfs4_opendata_check_deleg needs to handle NFS4_OPEN_CLAIM_DELEG_CUR_FH
  NFSv4: Don't try to reclaim unused state owners
  NFS: Fix a write performance regression
  NFS: Fix up page writeback accounting
  xprtrdma: disconnect and flush cqs before freeing buffers
2015-10-07 08:54:22 +01:00
Linus Torvalds
00a3d660cb Revert "fs: do not prefault sys_write() user buffer pages"
This reverts commit 998ef75ddb.

The commit itself does not appear to be buggy per se, but it is exposing
a bug in ext4 (and Ted thinks ext3 too, but we solved that by getting
rid of it).  It's too late in the release cycle to really worry about
this, even if Dave Hansen has a patch that may actually fix the
underlying ext4 problem.  We can (and should) revisit this for the next
release.

The problem is that moving the prefaulting later now exposes a special
case with partially successful writes that isn't handled correctly.  And
the prefaulting likely isn't normally even that much of a performance
issue - it looks like at least one reason Dave saw this in his
performance tests is that he also ran them on Skylake that now supports
the new SMAP code, which makes the normally very cheap user space
prefaulting noticeably more expensive.

Bisected-and-acked-by: Ted Ts'o <tytso@mit.edu>
Analyzed-and-acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-07 08:32:38 +01:00
Anna Schumaker
39d0d3bdf7 NFS: Fix a tracepoint NULL-pointer dereference
Running xfstest generic/013 with the tracepoint nfs:nfs4_open_file
enabled produces a NULL-pointer dereference when calculating fileid and
filehandle of the opened file.  Fix this by checking if state is NULL
before trying to use the inode pointer.

Reported-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-06 18:56:25 -04:00
Chris Metcalf
990486c8af strscpy: zero any trailing garbage bytes in the destination
It's possible that the destination can be shadowed in userspace
(as, for example, the perf buffers are now).  So we should take
care not to leak data that could be inspected by userspace.

Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
2015-10-06 14:53:18 -04:00
Chris Metcalf
c753bf34c9 word-at-a-time.h: support zero_bytemask() on alpha and tile
Both alpha and tile needed implementations of zero_bytemask.

The alpha version is untested.

Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
2015-10-06 14:53:16 -04:00
Chris Metcalf
19c22f3a29 word-at-a-time.h: fix some Kbuild files
arch/tile added word-at-a-time.h after the patch that added generic-y
entries; the generic-y entry is now stale.

arch/h8300 is newer than the generic-y patch for word-at-a-time.h,
and needs a generic-y entry.

arch/powerpc seems to have gotten a generic-y entry by mistake in
the first patch; this change removes it.

Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
2015-10-06 14:52:48 -04:00
Yang Shi
62c6c61adb arm64: replace read_lock to rcu lock in call_break_hook
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
in_atomic(): 0, irqs_disabled(): 128, pid: 342, name: perf
1 lock held by perf/342:
 #0:  (break_hook_lock){+.+...}, at: [<ffffffc0000851ac>] call_break_hook+0x34/0xd0
irq event stamp: 62224
hardirqs last  enabled at (62223): [<ffffffc00010b7bc>] __call_rcu.constprop.59+0x104/0x270
hardirqs last disabled at (62224): [<ffffffc0000fbe20>] vprintk_emit+0x68/0x640
softirqs last  enabled at (0): [<ffffffc000097928>] copy_process.part.8+0x428/0x17f8
softirqs last disabled at (0): [<          (null)>]           (null)
CPU: 0 PID: 342 Comm: perf Not tainted 4.1.6-rt5 #4
Hardware name: linux,dummy-virt (DT)
Call trace:
[<ffffffc000089968>] dump_backtrace+0x0/0x128
[<ffffffc000089ab0>] show_stack+0x20/0x30
[<ffffffc0007030d0>] dump_stack+0x7c/0xa0
[<ffffffc0000c878c>] ___might_sleep+0x174/0x260
[<ffffffc000708ac8>] __rt_spin_lock+0x28/0x40
[<ffffffc000708db0>] rt_read_lock+0x60/0x80
[<ffffffc0000851a8>] call_break_hook+0x30/0xd0
[<ffffffc000085a70>] brk_handler+0x30/0x98
[<ffffffc000082248>] do_debug_exception+0x50/0xb8
Exception stack(0xffffffc00514fe30 to 0xffffffc00514ff50)
fe20:                                     00000000 00000000 c1594680 0000007f
fe40: ffffffff ffffffff 92063940 0000007f 0550dcd8 ffffffc0 00000000 00000000
fe60: 0514fe70 ffffffc0 000be1f8 ffffffc0 0514feb0 ffffffc0 0008948c ffffffc0
fe80: 00000004 00000000 0514fed0 ffffffc0 ffffffff ffffffff 9282a948 0000007f
fea0: 00000000 00000000 9282b708 0000007f c1592820 0000007f 00083914 ffffffc0
fec0: 00000000 00000000 00000010 00000000 00000064 00000000 00000001 00000000
fee0: 005101e0 00000000 c1594680 0000007f c1594740 0000007f ffffffd8 ffffff80
ff00: 00000000 00000000 00000000 00000000 c1594770 0000007f c1594770 0000007f
ff20: 00665e10 00000000 7f7f7f7f 7f7f7f7f 01010101 01010101 00000000 00000000
ff40: 928e4cc0 0000007f 91ff11e8 0000007f

call_break_hook is called in atomic context (hard irq disabled), so replace
the sleepable lock to rcu lock, replace relevant list operations to rcu
version and call synchronize_rcu() in unregister_break_hook().

And, replace write lock to spinlock in {un}register_break_hook.

Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-10-06 19:10:28 +01:00
Mark Rutland
4ca3bc86be arm64: Don't relocate non-existent initrd
When booting a kernel without an initrd, the kernel reports that it
moves -1 bytes worth, having gone through the motions with initrd_start
equal to initrd_end:

    Moving initrd from [4080000000-407fffffff] to [9fff49000-9fff48fff]

Prevent this by bailing out early when the initrd size is zero (i.e. we
have no initrd), avoiding the confusing message and other associated
work.

Fixes: 1570f0d7ab ("arm64: support initrd outside kernel linear map")
Cc: Mark Salter <msalter@redhat.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-10-06 18:33:15 +01:00
Linus Torvalds
f6702681a0 Merge tag 'for-linus-4.3b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen bug fixes from David Vrabel:

 - Fix VM save performance regression with x86 PV guests

 - Make kexec work in x86 PVHVM guests (if Xen has the soft-reset ABI)

 - Other minor fixes.

* tag 'for-linus-4.3b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen/p2m: hint at the last populated P2M entry
  x86/xen: Do not clip xen_e820_map to xen_e820_map_entries when sanitizing map
  x86/xen: Support kexec/kdump in HVM guests by doing a soft reset
  xen/x86: Don't try to write syscall-related MSRs for PV guests
  xen: use correct type for HYPERVISOR_memory_op()
2015-10-06 15:05:02 +01:00
Linus Torvalds
3ec20e2e61 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 "Three bug fixes and an update to the default configuration"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/defconfig: set SCSI_DH=y
  s390/vtime: correct scaled cputime of partially idle CPUs
  s390/boot/decompression: disable floating point in decompressor
  s390/numa: use correct type for node_to_cpumask_map
2015-10-06 14:59:36 +01:00
Linus Torvalds
3c68319b28 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
 "Two fixes for problems pointed out by automated tools.

  Thanks PaX/grsecurity team and Dan Carpenter (and the Smatch tool)"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  [CIFS] Update cifs version number
  [SMB3] Do not fall back to SMBWriteX in set_file_size error cases
  [SMB3] Missing null tcon check
2015-10-06 14:30:21 +01:00
David Vrabel
98dd166ea3 x86/xen/p2m: hint at the last populated P2M entry
With commit 633d6f17cd (x86/xen: prepare
p2m list for memory hotplug) the P2M may be sized to accomdate a much
larger amount of memory than the domain currently has.

When saving a domain, the toolstack must scan all the P2M looking for
populated pages.  This results in a performance regression due to the
unnecessary scanning.

Instead of reporting (via shared_info) the maximum possible size of
the P2M, hint at the last PFN which might be populated.  This hint is
increased as new leaves are added to the P2M (in the expectation that
they will be used for populated entries).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: <stable@vger.kernel.org> # 4.0+
2015-10-06 13:54:20 +01:00
Mark Brown
22ca03a3ad Merge remote-tracking branch 'regulator/fix/axp20x' into regulator-linus 2015-10-06 12:00:42 +01:00
Mark Brown
6710f22343 Merge remote-tracking branch 'regulator/fix/core' into regulator-linus 2015-10-06 12:00:38 +01:00
Yang Shi
abffa6f3b1 arm64: convert patch_lock to raw lock
When running kprobe test on arm64 rt kernel, it reports the below warning:

root@qemu7:~# modprobe kprobe_example
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
in_atomic(): 0, irqs_disabled(): 128, pid: 484, name: modprobe
CPU: 0 PID: 484 Comm: modprobe Not tainted 4.1.6-rt5 #2
Hardware name: linux,dummy-virt (DT)
Call trace:
[<ffffffc0000891b8>] dump_backtrace+0x0/0x128
[<ffffffc000089300>] show_stack+0x20/0x30
[<ffffffc00061dae8>] dump_stack+0x1c/0x28
[<ffffffc0000bbad0>] ___might_sleep+0x120/0x198
[<ffffffc0006223e8>] rt_spin_lock+0x28/0x40
[<ffffffc000622b30>] __aarch64_insn_write+0x28/0x78
[<ffffffc000622e48>] aarch64_insn_patch_text_nosync+0x18/0x48
[<ffffffc000622ee8>] aarch64_insn_patch_text_cb+0x70/0xa0
[<ffffffc000622f40>] aarch64_insn_patch_text_sync+0x28/0x48
[<ffffffc0006236e0>] arch_arm_kprobe+0x38/0x48
[<ffffffc00010e6f4>] arm_kprobe+0x34/0x50
[<ffffffc000110374>] register_kprobe+0x4cc/0x5b8
[<ffffffbffc002038>] kprobe_init+0x38/0x7c [kprobe_example]
[<ffffffc000084240>] do_one_initcall+0x90/0x1b0
[<ffffffc00061c498>] do_init_module+0x6c/0x1cc
[<ffffffc0000fd0c0>] load_module+0x17f8/0x1db0
[<ffffffc0000fd8cc>] SyS_finit_module+0xb4/0xc8

Convert patch_lock to raw loc kto avoid this issue.

Although the problem is found on rt kernel, the fix should be applicable to
mainline kernel too.

Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-10-05 18:30:29 +01:00
Mark Salyzyn
569ba74a7b arm64: readahead: fault retry breaks mmap file read random detection
This is the arm64 portion of commit 45cac65b0f ("readahead: fault
retry breaks mmap file read random detection"), which was absent from
the initial port and has since gone unnoticed. The original commit says:

> .fault now can retry.  The retry can break state machine of .fault.  In
> filemap_fault, if page is miss, ra->mmap_miss is increased.  In the second
> try, since the page is in page cache now, ra->mmap_miss is decreased.  And
> these are done in one fault, so we can't detect random mmap file access.
>
> Add a new flag to indicate .fault is tried once.  In the second try, skip
> ra->mmap_miss decreasing.  The filemap_fault state machine is ok with it.

With this change, Mark reports that:

> Random read improves by 250%, sequential read improves by 40%, and
> random write by 400% to an eMMC device with dm crypto wrapped around it.

Cc: Shaohua Li <shli@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Riley Andrews <riandrews@android.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-10-05 16:30:50 +01:00