linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-06-23 14:36:15 -04:00

Author	SHA1	Message	Date
Jaegeuk Kim	14b44d238c	f2fs: add fault injection on f2fs_truncate Inject a fault during f2fs_truncate(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:26 -04:00
Sheng Yong	1941d7bcb4	f2fs: check range before defragment This patch checks the parameter range passed by ioctl to void that range exceeds the max_file_blocks limit. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:25 -04:00
Sheng Yong	b0beab5016	f2fs: use parameter max_items instead of PIDVEC_SIZE Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:24 -04:00
Yunlei He	3d6a650feb	f2fs: add a punch discard command function This patch add a function to punch discard command if one segment reuse before discard. Split this segment from multi-segments discard range, and discard the left bigger range. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:23 -04:00
Jaegeuk Kim	c81abe34fe	f2fs: allocate a bio for discarding when actually issuing it Let's allocate a bio when issuing discard commands later. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:23 -04:00
Yunlei He	a29d0e0bc0	f2fs: skip writeback meta pages if cp_mutex acquire failed Skip writeback meta pages if cp_mutex lock acquire failed, cp will flush dirty pages instead. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:22 -04:00
Jaegeuk Kim	5ce4738a02	f2fs: show more precise message on orphan recovery failure This case is not caused by fsck.f2fs. User needs to retry mount. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:21 -04:00
Kinglong Mee	fc7d5bc427	f2fs: remove dead macro PGOFS_OF_NEXT_DNODE Fixes: `3cf4574705` ("f2fs: introduce get_next_page_offset to speed up SEEK_DATA") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:21 -04:00
Kinglong Mee	0b28b71e29	f2fs: drop duplicate radix tree lookup of nat_entry_set The nat entry is listed from the set list for freeing, it's duplicate to do radix tree lookup again. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> [Jaegeuk Kim: remove unnecessary f2fs_bug_on] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:20 -04:00
Kinglong Mee	20fda56b01	f2fs: make sure trace all f2fs_issue_flush The root device's issue flush trace is missing, add it and tracing the result from submit. Fixes `d50aaeec90` ("f2fs: show actual device info in tracepoints") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:19 -04:00
Chao Yu	8ff0971f15	f2fs: don't allow volatile writes for non-regular file Now f2fs only supports volatile writes for journal db regular file. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:18 -04:00
Jaegeuk Kim	e811898c97	f2fs: don't allow atomic writes for not regular files The atomic writes only supports regular files for database. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:17 -04:00
Jaegeuk Kim	8c242db9b8	f2fs: fix stale ATOMIC_WRITTEN_PAGE private pointer When I forced to enable atomic operations intentionally, I could hit the below panic, since we didn't clear page->private in f2fs_invalidate_page called by file truncation. The panic occurs due to NULL mapping having page->private. BUG: unable to handle kernel paging request at ffffffffffffffff IP: drop_buffers+0x38/0xe0 PGD 5d00c067 PUD 5d00e067 PMD 0 CPU: 3 PID: 1648 Comm: fsstress Tainted: G D OE 4.10.0+ #5 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 task: ffff9151952863c0 task.stack: ffffaaec40db4000 RIP: 0010:drop_buffers+0x38/0xe0 RSP: 0018:ffffaaec40db74c8 EFLAGS: 00010292 Call Trace: ? page_referenced+0x8b/0x170 try_to_free_buffers+0xc5/0xe0 try_to_release_page+0x49/0x50 shrink_page_list+0x8bc/0x9f0 shrink_inactive_list+0x1dd/0x500 ? shrink_active_list+0x2c0/0x430 shrink_node_memcg+0x5eb/0x7c0 shrink_node+0xe1/0x320 do_try_to_free_pages+0xef/0x2e0 try_to_free_pages+0xe9/0x190 __alloc_pages_slowpath+0x390/0xe70 __alloc_pages_nodemask+0x291/0x2b0 alloc_pages_current+0x95/0x140 __page_cache_alloc+0xc4/0xe0 pagecache_get_page+0xab/0x2a0 grab_cache_page_write_begin+0x20/0x40 get_read_data_page+0x2e6/0x4c0 [f2fs] ? f2fs_mark_inode_dirty_sync+0x16/0x30 [f2fs] ? truncate_data_blocks_range+0x238/0x2b0 [f2fs] get_lock_data_page+0x30/0x190 [f2fs] __exchange_data_block+0xaaf/0xf40 [f2fs] f2fs_fallocate+0x418/0xd00 [f2fs] vfs_fallocate+0x157/0x220 SyS_fallocate+0x48/0x80 Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> [Chao Yu: use INMEM_INVALIDATE for better tracing] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:10 -04:00
Jaegeuk Kim	aa51d08a11	f2fs: build stat_info before orphan inode recovery f2fs_sync_fs() -> write_checkpoint() calls stat_inc_cp_count(sbi->stat_info), which needs stat_info allocation. Otherwise, we can hit: [254042.598623] ? count_shadow_nodes+0xa0/0xa0 [254042.598633] f2fs_sync_fs+0x65/0xd0 [f2fs] [254042.598645] f2fs_balance_fs_bg+0xe4/0x1c0 [f2fs] [254042.598657] f2fs_write_node_pages+0x34/0x1a0 [f2fs] [254042.598664] ? pagevec_lookup_entries+0x1e/0x30 [254042.598673] do_writepages+0x1e/0x30 [254042.598682] __writeback_single_inode+0x45/0x330 [254042.598688] writeback_single_inode+0xd7/0x190 [254042.598694] write_inode_now+0x86/0xa0 [254042.598699] iput+0x122/0x200 [254042.598709] f2fs_fill_super+0xd4a/0x14d0 [f2fs] [254042.598717] mount_bdev+0x184/0x1c0 [254042.598934] ? f2fs_commit_super+0x100/0x100 [f2fs] [254042.599142] f2fs_mount+0x15/0x20 [f2fs] [254042.599349] mount_fs+0x39/0x160 [254042.599554] ? __alloc_percpu+0x15/0x20 [254042.599759] vfs_kern_mount+0x67/0x110 [254042.599972] do_mount+0x1bb/0xc80 [254042.600175] ? memdup_user+0x42/0x60 [254042.600380] SyS_mount+0x83/0xd0 [254042.600583] entry_SYSCALL_64_fastpath+0x1e/0xad Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Kinglong Mee	10a875f82b	f2fs: fix the fault of calculating blkstart twice When the zone type is BLK_ZONE_TYPE_CONVENTIONAL, the blkstart is calculated twice. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Kinglong Mee	e2f0e962ac	f2fs: fix the fault of checking F2FS_LINK_MAX for rename inode The parent directory's nlink will change, not the inode. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Jaegeuk Kim	4f1bca9f0d	f2fs: don't allow to get pino when filename is encrypted After renaming an encrypted file, we have no way to get its encrypted filename from its dentry. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Jaegeuk Kim	8c1b3c0fb6	f2fs: fix wrong error injection for evict_inode The previous one was not a proper location to inject an error, since there is no point to get errors. Instead, we can emulate EIO during truncation, and the below logic should handle it correctly. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Kinglong Mee	10047f537c	f2fs: le32_to_cpu for ckpt->cp_pack_total_block_count Fixes: `22ad0b6ab4` ("f2fs: add bitmaps for empty or full NAT blocks") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Jaegeuk Kim	b71deadbc4	f2fs: le16_to_cpu for xattr->e_value_size This patch fixes missing le16 conversion, reported by kbuild test robot. Fixes: `5f35a2cd5` ("f2fs: Don't update the xattr data that same as the exist") Reviewed-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Jaegeuk Kim	4f295443bf	f2fs: don't need to invalidate wrong node page If f2fs_new_inode() is failed, the bad inode will invalidate 0'th node page during f2fs_evict_inode(), which doesn't need to do. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Yunlei He	a78aaa2c3c	f2fs: fix an error return value in truncate_partial_data_page This patch fix a error return value in truncate_partial_data_page Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Logan Gunthorpe	233ed09d7f	chardev: add helper function to register char devs with a struct device Credit for this patch goes is shared with Dan Williams [1]. I've taken things one step further to make the helper function more useful and clean up calling code. There's a common pattern in the kernel whereby a struct cdev is placed in a structure along side a struct device which manages the life-cycle of both. In the naive approach, the reference counting is broken and the struct device can free everything before the chardev code is entirely released. Many developers have solved this problem by linking the internal kobjs in this fashion: cdev.kobj.parent = &parent_dev.kobj; The cdev code explicitly gets and puts a reference to it's kobj parent. So this seems like it was intended to be used this way. Dmitrty Torokhov first put this in place in 2012 with this commit: `2f0157f` char_dev: pin parent kobject and the first instance of the fix was then done in the input subsystem in the following commit: `4a215aa` Input: fix use-after-free introduced with dynamic minor changes Subsequently over the years, however, this issue seems to have tripped up multiple developers independently. For example, see these commits: `0d5b7da` iio: Prevent race between IIO chardev opening and IIO device (by Lars-Peter Clausen in 2013) `ba0ef85` tpm: Fix initialization of the cdev (by Jason Gunthorpe in 2015) `5b28dde` [media] media: fix use-after-free in cdev_put() when app exits after driver unbind (by Shauh Khan in 2016) This technique is similarly done in at least 15 places within the kernel and probably should have been done so in another, at least, 5 places. The kobj line also looks very suspect in that one would not expect drivers to have to mess with kobject internals in this way. Even highly experienced kernel developers can be surprised by this code, as seen in [2]. To help alleviate this situation, and hopefully prevent future wasted effort on this problem, this patch introduces a helper function to register a char device along with its parent struct device. This creates a more regular API for tying a char device to its parent without the developer having to set members in the underlying kobject. This patch introduce cdev_device_add and cdev_device_del which replaces a common pattern including setting the kobj parent, calling cdev_add and then calling device_add. It also introduces cdev_set_parent for the few cases that set the kobject parent without using device_add. [1] https://lkml.org/lkml/2017/2/13/700 [2] https://lkml.org/lkml/2017/2/10/370 Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Hans Verkuil <hans.verkuil@cisco.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-03-21 06:44:32 +01:00
Kyle Huey	e9ea1e7f53	x86/arch_prctl: Add ARCH_[GET\|SET]_CPUID Intel supports faulting on the CPUID instruction beginning with Ivy Bridge. When enabled, the processor will fault on attempts to execute the CPUID instruction with CPL>0. Exposing this feature to userspace will allow a ptracer to trap and emulate the CPUID instruction. When supported, this feature is controlled by toggling bit 0 of MSR_MISC_FEATURES_ENABLES. It is documented in detail in Section 2.3.2 of https://bugzilla.kernel.org/attachment.cgi?id=243991 Implement a new pair of arch_prctls, available on both x86-32 and x86-64. ARCH_GET_CPUID: Returns the current CPUID state, either 0 if CPUID faulting is enabled (and thus the CPUID instruction is not available) or 1 if CPUID faulting is not enabled. ARCH_SET_CPUID: Set the CPUID state to the second argument. If cpuid_enabled is 0 CPUID faulting will be activated, otherwise it will be deactivated. Returns ENODEV if CPUID faulting is not supported on this system. The state of the CPUID faulting flag is propagated across forks, but reset upon exec. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Robert O'Callahan <robert@ocallahan.org> Cc: Richard Weinberger <richard@nod.at> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: user-mode-linux-devel@lists.sourceforge.net Cc: Jeff Dike <jdike@addtoit.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: user-mode-linux-user@lists.sourceforge.net Cc: David Matlack <dmatlack@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: linux-fsdevel@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/20170320081628.18952-9-khuey@kylehuey.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-03-20 16:10:34 +01:00
Chao Yu	7041d5d286	f2fs: combine nat_bits and free_nid_bitmap cache Both nat_bits cache and free_nid_bitmap cache provide same functionality as a intermediate cache between free nid cache and disk, but with different granularity of indicating free nid range, and different persistence policy. nat_bits cache provides better persistence ability, and free_nid_bitmap provides better granularity. In this patch we combine advantage of both caches, so finally policy of the intermediate cache would be: - init: load free nid status from nat_bits into free_nid_bitmap - lookup: scan free_nid_bitmap before load NAT blocks - update: update free_nid_bitmap in real-time - persistence: udpate and persist nat_bits in checkpoint This patch also resolves performance regression reported by lkp-robot. commit: `4ac912427c` ("f2fs: introduce free nid bitmap") d00030cf9cd0bb96fdccc41e33d3c91dcbb672ba ("f2fs: use __set{__clear}_bit_le") 1382c0f3f9d3f936c8bc42ed1591cf7a593ef9f7 ("f2fs: combine nat_bits and free_nid_bitmap cache") `4ac912427c` d00030cf9cd0bb96fdccc41e33 1382c0f3f9d3f936c8bc42ed15 ---------------- -------------------------- -------------------------- %stddev %change %stddev %change %stddev \ \| \ \| \ 77863 ± 0% +2.1% 79485 ± 1% +50.8% 117404 ± 0% aim7.jobs-per-min 231.63 ± 0% -2.0% 227.01 ± 1% -33.6% 153.80 ± 0% aim7.time.elapsed_time 231.63 ± 0% -2.0% 227.01 ± 1% -33.6% 153.80 ± 0% aim7.time.elapsed_time.max 896604 ± 0% -0.8% 889221 ± 3% -20.2% 715260 ± 1% aim7.time.involuntary_context_switches 2394 ± 1% +4.6% 2503 ± 1% +3.7% 2481 ± 2% aim7.time.maximum_resident_set_size 6240 ± 0% -1.5% 6145 ± 1% -14.1% 5360 ± 1% aim7.time.system_time 1111357 ± 3% +1.9% 1132509 ± 2% -6.2% 1041932 ± 2% aim7.time.voluntary_context_switches ... Signed-off-by: Chao Yu <yuchao0@huawei.com> Tested-by: Xiaolong Ye <xiaolong.ye@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-20 10:00:18 -04:00
Chao Yu	586d1492f3	f2fs: skip scanning free nid bitmap of full NAT blocks This patch adds to account free nids for each NAT blocks, and while scanning all free nid bitmap, do check count and skip lookuping in full NAT block. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-20 10:00:17 -04:00
Jaegeuk Kim	23380b8568	f2fs: use __set{__clear}_bit_le This patch uses __set{__clear}_bit_le for highter speed. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-20 10:00:16 -04:00
Jaegeuk Kim	9f7e4a2c49	f2fs: declare static functions This is to avoid build warning reported by kbuild test robot. Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-20 10:00:15 -04:00
Jaegeuk Kim	720037f939	f2fs: don't overwrite node block by SSR This patch fixes that SSR can overwrite previous warm node block consisting of a node chain since the last checkpoint. Fixes: `5b6c6be2d8` ("f2fs: use SSR for warm node as well") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-20 10:00:14 -04:00
Linus Torvalds	8841b5f0cd	Merge tag 'nfs-for-4.11-2' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client fixes from Anna Schumaker: "We have a handful of stable fixes to fix kernel warnings and other bugs that have been around for a while. We've also found a few other reference counting bugs and memory leaks since the initial 4.11 pull. Stable Bugfixes: - Fix decrementing nrequests in NFS v4.2 COPY to fix kernel warnings - Prevent a double free in async nfs4_exchange_id() - Squelch a kbuild sparse complaint for xprtrdma Other Bugfixes: - Fix a typo (NFS_ATTR_FATTR_GROUP_NAME) that causes a memory leak - Fix a reference leak that causes kernel warnings - Make nfs4_cb_sv_ops static to fix a sparse warning - Respect a server's max size in CREATE_SESSION - Handle errors from nfs4_pnfs_ds_connect - Flexfiles layout shouldn't mark devices as unavailable" * tag 'nfs-for-4.11-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: pNFS/flexfiles: never nfs4_mark_deviceid_unavailable pNFS: return status from nfs4_pnfs_ds_connect NFSv4.1 respect server's max size in CREATE_SESSION NFS prevent double free in async nfs4_exchange_id nfs: make nfs4_cb_sv_ops static xprtrdma: Squelch kbuild sparse complaint NFS: fix the fault nrequests decreasing for nfs_inode COPY NFSv4: fix a reference leak caused WARNING messages nfs4: fix a typo of NFS_ATTR_FATTR_GROUP_NAME	2017-03-17 14:16:22 -07:00
Zygo Blaxell	e1699d2d7b	btrfs: add missing memset while reading compressed inline extents This is a story about 4 distinct (and very old) btrfs bugs. Commit `c8b978188c` ("Btrfs: Add zlib compression support") added three data corruption bugs for inline extents (bugs #1-3). Commit `93c82d5750` ("Btrfs: zero page past end of inline file items") fixed bug #1: uncompressed inline extents followed by a hole and more extents could get non-zero data in the hole as they were read. The fix was to add a memset in btrfs_get_extent to zero out the hole. Commit `166ae5a418` ("btrfs: fix inline compressed read err corruption") fixed bug #2: compressed inline extents which contained non-zero bytes might be replaced with zero bytes in some cases. This patch removed an unhelpful memset from uncompress_inline, but the case where memset is required was missed. There is also a memset in the decompression code, but this only covers decompressed data that is shorter than the ram_bytes from the extent ref record. This memset doesn't cover the region between the end of the decompressed data and the end of the page. It has also moved around a few times over the years, so there's no single patch to refer to. This patch fixes bug #3: compressed inline extents followed by a hole and more extents could get non-zero data in the hole as they were read (i.e. bug #3 is the same as bug #1, but s/uncompressed/compressed/). The fix is the same: zero out the hole in the compressed case too, by putting a memset back in uncompress_inline, but this time with correct parameters. The last and oldest bug, bug #0, is the cause of the offending inline extent/hole/extent pattern. Bug #0 is a subtle and mostly-harmless quirk of behavior somewhere in the btrfs write code. In a few special cases, an inline extent and hole are allowed to persist where they normally would be combined with later extents in the file. A fast reproducer for bug #0 is presented below. A few offending extents are also created in the wild during large rsync transfers with the -S flag. A Linux kernel build (git checkout; make allyesconfig; make -j8) will produce a handful of offending files as well. Once an offending file is created, it can present different content to userspace each time it is read. Bug #0 is at least 4 and possibly 8 years old. I verified every vX.Y kernel back to v3.5 has this behavior. There are fossil records of this bug's effects in commits all the way back to v2.6.32. I have no reason to believe bug #0 wasn't present at the beginning of btrfs compression support in v2.6.29, but I can't easily test kernels that old to be sure. It is not clear whether bug #0 is worth fixing. A fix would likely require injecting extra reads into currently write-only paths, and most of the exceptional cases caused by bug #0 are already handled now. Whether we like them or not, bug #0's inline extents followed by holes are part of the btrfs de-facto disk format now, and we need to be able to read them without data corruption or an infoleak. So enough about bug #0, let's get back to bug #3 (this patch). An example of on-disk structure leading to data corruption found in the wild: item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160 inode generation 50 transid 50 size 47424 nbytes 49141 block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0 flags 0x0(none) item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20 inode ref index 3 namelen 10 name: DB_File.so item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362 inline extent data size 1341 ram 4085 compress(zlib) item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53 extent data disk byte 5367308288 nr 20480 extent data offset 0 nr 45056 ram 45056 extent compression(zlib) Different data appears in userspace during each read of the 11 bytes between 4085 and 4096. The extent in item 63 is not long enough to fill the first page of the file, so a memset is required to fill the space between item 63 (ending at 4085) and item 64 (beginning at 4096) with zero. Here is a reproducer from Liu Bo, which demonstrates another method of creating the same inline extent and hole pattern: Using 'page_poison=on' kernel command line (or enable CONFIG_PAGE_POISONING) run the following: # touch foo # chattr +c foo # xfs_io -f -c "pwrite -W 0 1000" foo # xfs_io -f -c "falloc 4 8188" foo # od -x foo # echo 3 >/proc/sys/vm/drop_caches # od -x foo This produce the following on my box: Correct output: file contains 1000 data bytes followed by zeros: 0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd * 0001740 cdcd cdcd cdcd cdcd 0000 0000 0000 0000 `0001760` 0000 0000 0000 0000 0000 0000 0000 0000 * 0020000 Actual output: the data after the first 1000 bytes will be different each run: 0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd * 0001740 cdcd cdcd cdcd cdcd 6c63 7400 635f 006d `0001760` 5f74 6f43 7400 435f 0053 5f74 7363 7400 0002000 435f 0056 5f74 6164 7400 645f 0062 5f74 (...) Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Chris Mason <clm@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2017-03-17 13:47:10 -07:00
Liu Bo	49d4a33472	Btrfs: fix regression in lock_delalloc_pages The bug is a regression after commit (`da2c7009f6` "btrfs: teach __process_pages_contig about PAGE_LOCK operation") and commit (`76c0021db8` "Btrfs: use helper to simplify lock/unlock pages"). So if the dirty pages which are under writeback got truncated partially before we lock the dirty pages, we couldn't find all pages mapping to the delalloc range, and the bug didn't return an error so it kept going on and found that the delalloc range got truncated and got to unlock the dirty pages, and then the ASSERT could caught the error, and showed ----------------------------------------------------------------------------- assertion failed: page_ops & PAGE_LOCK, file: fs/btrfs/extent_io.c, line: 1716 ----------------------------------------------------------------------------- This fixes the bug by returning the proper -EAGAIN. Cc: David Sterba <dsterba@suse.com> Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-03-17 13:47:09 -07:00
Weston Andros Adamson	da066f3f03	pNFS/flexfiles: never nfs4_mark_deviceid_unavailable The flexfiles layout should never mark a device unavailable. Move nfs4_mark_deviceid_unavailable out of nfs4_pnfs_ds_connect and call directly from files layout where it's still needed. The flexfiles driver still handles marked devices in error paths, but will now print a rate limited warning. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2017-03-17 16:07:17 -04:00
Weston Andros Adamson	a33e4b036d	pNFS: return status from nfs4_pnfs_ds_connect The nfs4_pnfs_ds_connect path can call rpc_create which can fail or it can wait on another context to reach the same failure. This checks that the rpc_create succeeded and returns the error to the caller. When an error is returned, both the files and flexfiles layouts will return NULL from _prepare_ds(). The flexfiles layout will also return the layout with the error NFS4ERR_NXIO. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2017-03-17 16:07:10 -04:00
Olga Kornievskaia	033853325f	NFSv4.1 respect server's max size in CREATE_SESSION Currently client doesn't respect max sizes server returns in CREATE_SESSION. nfs4_session_set_rwsize() gets called and server->rsize, server->wsize are 0 so they never get set to the sizes returned by the server. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2017-03-17 16:07:03 -04:00
Olga Kornievskaia	63513232f8	NFS prevent double free in async nfs4_exchange_id Since rpc_task is async, the release function should be called which will free the impl_id, scope, and owner. Trond pointed at 2 more problems: -- use of client pointer after free in the nfs4_exchangeid_release() function -- cl_count mismatch if rpc_run_task() isn't run Fixes: `8d89bd70bc` ("NFS setup async exchange_id") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Cc: stable@vger.kernel.org # 4.9 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2017-03-17 16:06:27 -04:00
Jason Yan	05fae7bbc2	nfs: make nfs4_cb_sv_ops static Fixes the following sparse warning: fs/nfs/callback.c:235:21: warning: symbol 'nfs4_cb_sv_ops' was not declared. Should it be static? Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2017-03-17 16:05:56 -04:00
Kinglong Mee	38a33101dd	NFS: fix the fault nrequests decreasing for nfs_inode COPY The nfs_commit_file for NFSv4.2's COPY operation goes through the commit path for normal WRITE, but without increase nrequests, so, the nrequests decreased in nfs_commit_release_pages is fault. After that, the nrequests will be wrong. [ 5670.299881] ------------[ cut here ]------------ [ 5670.300295] WARNING: CPU: 0 PID: 27656 at fs/nfs/inode.c:127 nfs_clear_inode+0x66/0x90 [nfs] [ 5670.300558] Modules linked in: nfsv4(E) nfs(E) fscache(E) tun bridge stp llc fuse ip_set nfnetlink vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event ppdev f2fs coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_ens1371 intel_rapl_perf gameport snd_ac97_codec vmw_balloon ac97_bus snd_seq snd_pcm joydev snd_rawmidi snd_timer snd_seq_device snd soundcore nfit parport_pc parport acpi_cpufreq tpm_tis tpm_tis_core tpm i2c_piix4 vmw_vmci shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c vmwgfx drm_kms_helper ttm drm e1000 crc32c_intel mptspi scsi_transport_spi serio_raw mptscsih mptbase ata_generic pata_acpi fjes [last unloaded: fscache] [ 5670.302925] CPU: 0 PID: 27656 Comm: umount.nfs4 Tainted: G W E 4.11.0-rc1+ #519 [ 5670.303292] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015 [ 5670.304094] Call Trace: [ 5670.304510] dump_stack+0x63/0x86 [ 5670.304917] __warn+0xcb/0xf0 [ 5670.305276] warn_slowpath_null+0x1d/0x20 [ 5670.305661] nfs_clear_inode+0x66/0x90 [nfs] [ 5670.306093] nfs4_evict_inode+0x61/0x70 [nfsv4] [ 5670.306480] evict+0xbb/0x1c0 [ 5670.306888] dispose_list+0x4d/0x70 [ 5670.307233] evict_inodes+0x178/0x1a0 [ 5670.307579] generic_shutdown_super+0x44/0xf0 [ 5670.307985] nfs_kill_super+0x21/0x40 [nfs] [ 5670.308325] deactivate_locked_super+0x43/0x70 [ 5670.308698] deactivate_super+0x5a/0x60 [ 5670.309036] cleanup_mnt+0x3f/0x90 [ 5670.309407] __cleanup_mnt+0x12/0x20 [ 5670.309837] task_work_run+0x80/0xa0 [ 5670.310162] exit_to_usermode_loop+0x89/0x90 [ 5670.310497] syscall_return_slowpath+0xaa/0xb0 [ 5670.310875] entry_SYSCALL_64_fastpath+0xa7/0xa9 [ 5670.311197] RIP: 0033:0x7f1bb3617fe7 [ 5670.311545] RSP: 002b:00007ffecbabb828 EFLAGS: 00000206 ORIG_RAX: 00000000000000a6 [ 5670.311906] RAX: 0000000000000000 RBX: 0000000001dca1f0 RCX: 00007f1bb3617fe7 [ 5670.312239] RDX: 000000000000000c RSI: 0000000000000001 RDI: 0000000001dc83c0 [ 5670.312653] RBP: 0000000001dc83c0 R08: 0000000000000001 R09: 0000000000000000 [ 5670.312998] R10: 0000000000000755 R11: 0000000000000206 R12: 00007ffecbabc66a [ 5670.313335] R13: 0000000001dc83a0 R14: 0000000000000000 R15: 0000000000000000 [ 5670.313758] ---[ end trace bf4bfe7764e4eb40 ]--- Cc: linux-kernel@vger.kernel.org Fixes: `67911c8f18` ("NFS: Add nfs_commit_file()") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Cc: stable@vger.kernel.org # 4.7+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2017-03-17 16:03:51 -04:00
Linus Torvalds	57fd0b77d6	Merge tag 'afs-20170316' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS fixes from David Howells: "Fixes to the AFS filesystem in the kernel. They fix a variety of bugs. These include some issues fixed for consistency with other AFS implementations: - handle AFS mode bits better - use the client mtime rather than the server mtime in the protocol - handle the server returning more or less data than was requested in a FetchData call - distinguish mountpoints from symlinks based on the mode bits rather than preemptively reading every symlink to find out what it actually represents One other notable change for the user is that files are now flushed on close analogously with other network filesystems" * tag 'afs-20170316' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (28 commits) afs: Don't wait for page writeback with the page lock held afs: ->writepage() shouldn't call clear_page_dirty_for_io() afs: Fix abort on signal while waiting for call completion afs: Fix an off-by-one error in afs_send_pages() afs: Fix afs_kill_pages() afs: Fix page leak in afs_write_begin() afs: Don't set PG_error on local EINTR or ENOMEM when filling a page afs: Populate and use client modification time afs: Better abort and net error handling afs: Invalid op ID should abort with RXGEN_OPCODE afs: Fix the maths in afs_fs_store_data() afs: Use a bvec rather than a kvec in afs_send_pages() afs: Make struct afs_read::remain 64-bit afs: Fix AFS read bug afs: Prevent callback expiry timer overflow afs: Migrate vlocation fields to 64-bit afs: security: Replace rcu_assign_pointer() with RCU_INIT_POINTER() afs: inode: Replace rcu_assign_pointer() with RCU_INIT_POINTER() afs: Distinguish mountpoints from symlinks by file mode alone afs: Flush outstanding writes when an fd is closed ...	2017-03-17 12:16:44 -07:00
Vaibhav Jain	966fa72a71	kernfs: Check KERNFS_HAS_RELEASE before calling kernfs_release_file() Recently started seeing a kernel oops when a module tries removing a memory mapped sysfs bin_attribute. On closer investigation the root cause seems to be kernfs_release_file() trying to call kernfs_op.release() callback that's NULL for such sysfs bin_attributes. The oops occurs when kernfs_release_file() is called from kernfs_drain_open_files() to cleanup any open handles with active memory mappings. The patch fixes this by checking for flag KERNFS_HAS_RELEASE before calling kernfs_release_file() in function kernfs_drain_open_files(). On ppc64-le arch with cxl module the oops back-trace is of the form below: [ 861.381126] Unable to handle kernel paging request for instruction fetch [ 861.381360] Faulting instruction address: 0x00000000 [ 861.381428] Oops: Kernel access of bad area, sig: 11 [#1] .... [ 861.382481] NIP: 0000000000000000 LR: c000000000362c60 CTR: 0000000000000000 .... Call Trace: [c000000f1680b750] [c000000000362c34] kernfs_drain_open_files+0x104/0x1d0 (unreliable) [c000000f1680b790] [c00000000035fa00] __kernfs_remove+0x260/0x2c0 [c000000f1680b820] [c000000000360da0] kernfs_remove_by_name_ns+0x60/0xe0 [c000000f1680b8b0] [c0000000003638f4] sysfs_remove_bin_file+0x24/0x40 [c000000f1680b8d0] [c00000000062a164] device_remove_bin_file+0x24/0x40 [c000000f1680b8f0] [d000000009b7b22c] cxl_sysfs_afu_remove+0x144/0x170 [cxl] [c000000f1680b940] [d000000009b7c7e4] cxl_remove+0x6c/0x1a0 [cxl] [c000000f1680b990] [c00000000052f694] pci_device_remove+0x64/0x110 [c000000f1680b9d0] [c0000000006321d4] device_release_driver_internal+0x1f4/0x2b0 [c000000f1680ba20] [c000000000525cb0] pci_stop_bus_device+0xa0/0xd0 [c000000f1680ba60] [c000000000525e80] pci_stop_and_remove_bus_device+0x20/0x40 [c000000f1680ba90] [c00000000004a6c4] pci_hp_remove_devices+0x84/0xc0 [c000000f1680bad0] [c00000000004a688] pci_hp_remove_devices+0x48/0xc0 [c000000f1680bb10] [c0000000009dfda4] eeh_reset_device+0xb0/0x290 [c000000f1680bbb0] [c000000000032b4c] eeh_handle_normal_event+0x47c/0x530 [c000000f1680bc60] [c000000000032e64] eeh_handle_event+0x174/0x350 [c000000f1680bd10] [c000000000033228] eeh_event_handler+0x1e8/0x1f0 [c000000f1680bdc0] [c0000000000d384c] kthread+0x14c/0x190 [c000000f1680be30] [c00000000000b5a0] ret_from_kernel_thread+0x5c/0xbc Fixes: `f83f3c5156` ("kernfs: fix locking around kernfs_ops->release() callback") Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-03-17 10:25:59 +09:00
Linus Torvalds	d11507e197	Merge tag 'xfs-4.11-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fix from Darrick Wong: "Here's a single fix for -rc3 to improve input validation on inline directory data to prevent buffer overruns due to corrupt metadata" * tag 'xfs-4.11-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: verify inline directory data forks	2017-03-16 12:30:43 -07:00
Bob Peterson	cc963a11b6	GFS2: Temporarily zero i_no_addr when creating a dinode Before this patch i_no_addr was not initialized until after the return from allocating its block. That meant the i_no_addr was temporarily uninitialized storage. Ordinarily that's not a concern, but if inplace_reserve can't find space, it can call try_rgrp_unlink which references i_no_addr as a block to avoid. That can result in unpredictable behavior. More importantly, the trace point in gfs2_alloc_blocks references ip->i_no_addr before it is set, which is misleading when reading the kernel traces. This patch makes it look like the new dinode block was assigned in the name of inode 0 rather than a random inode that's completely unrelated. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2017-03-16 15:29:13 -04:00
David Howells	c5051c7bc7	afs: Don't wait for page writeback with the page lock held Drop the page lock before waiting for page writeback. Signed-off-by: David Howells <dhowells@redhat.com>	2017-03-16 16:29:30 +00:00
David Howells	65a151094e	afs: ->writepage() shouldn't call clear_page_dirty_for_io() The ->writepage() op shouldn't call clear_page_dirty_for_io() as that has already been called by the caller. Fix afs_writepage() by moving the call out of afs_write_back_from_locked_page() to afs_writepages_region() where it is needed. Signed-off-by: David Howells <dhowells@redhat.com>	2017-03-16 16:29:30 +00:00
David Howells	954cd6dc02	afs: Fix abort on signal while waiting for call completion Fix the way in which a call that's in progress and being waited for is aborted in the case that EINTR is detected. We should be sending RX_USER_ABORT rather than RX_CALL_DEAD as the abort code. Note that since the only two ways out of the loop are if the call completes or if a signal happens, the kill-the-call clause after the loop has finished can only happen in the case of EINTR. This means that we only have one abort case to deal with, not two, and the "KWC" case can never happen and so can be deleted. Note further that simply aborting the call isn't necessarily the best thing here since at this point: the request has been entirely sent and it's likely the server will do the operation anyway - whether we abort it or not. In future, we should punt the handling of the remainder of the call off to a background thread. Reported-by: Marc Dionne <marc.c.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>	2017-03-16 16:29:30 +00:00
David Howells	445783d0ec	afs: Fix an off-by-one error in afs_send_pages() afs_send_pages() should only put the call into the AFS_CALL_AWAIT_REPLY state if it has sent all the pages - but the check it makes is incorrect and sometimes it will finish the loop early. Signed-off-by: David Howells <dhowells@redhat.com>	2017-03-16 16:29:30 +00:00
David Howells	7286a35e89	afs: Fix afs_kill_pages() Fix afs_kill_pages() in two ways: (1) If a writeback has been partially flushed, then if we try and kill the pages it contains, some of them may no longer be undergoing writeback and end_page_writeback() will assert. Fix this by checking to see whether the page in question is actually undergoing writeback before ending that writeback. (2) The loop that scans for pages to kill doesn't increase the first page index, and so the loop may not terminate, but it will try to process the same pages over and over again. Fix this by increasing the first page index to one after the last page we processed. Signed-off-by: David Howells <dhowells@redhat.com>	2017-03-16 16:29:30 +00:00
David Howells	6d06b0d252	afs: Fix page leak in afs_write_begin() afs_write_begin() leaks a ref and a lock on a page if afs_fill_page() fails. Fix the leak by unlocking and releasing the page in the error path. Signed-off-by: David Howells <dhowells@redhat.com>	2017-03-16 16:27:48 +00:00
David Howells	68ae849d7e	afs: Don't set PG_error on local EINTR or ENOMEM when filling a page Don't set PG_error on a page if we get local EINTR or ENOMEM when filling a page for writing. Signed-off-by: David Howells <dhowells@redhat.com>	2017-03-16 16:27:48 +00:00
Marc Dionne	ab94f5d0dd	afs: Populate and use client modification time The inode timestamps should be set from the client time in the status received from the server, rather than the server time which is meant for internal server use. Set AFS_SET_MTIME and populate the mtime for operations that take an input status, such as file/dir creation and StoreData. If an input time is not provided the server will set the vnode times based on the current server time. In a situation where the server has some skew with the client, this could lead to the client seeing a timestamp in the future for a file that it just created or wrote. Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>	2017-03-16 16:27:47 +00:00

... 40 41 42 43 44 ...

50492 Commits