linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-06-22 14:05:22 -04:00

Author	SHA1	Message	Date
Al Viro	ca1579f6c6	Merge remote-tracking branch 'jl/locks-4.13' into work.misc-set_fs	2017-06-26 23:52:33 -04:00
Will Deacon	3edb1dd13c	Merge branch 'aarch64/for-next/ras-apei' into aarch64/for-next/core Merge in arm64 ACPI RAS support (APEI/GHES) from Tyler Baicar.	2017-06-26 10:54:27 +01:00
Brian Foster	39775431f8	xfs: free uncommitted transactions during log recovery Log recovery allocates in-core transaction and member item data structures on-demand as it processes the on-disk log. Transactions are allocated on first encounter on-disk and stored in a hash table structure where they are easily accessible for subsequent lookups. Transaction items are also allocated on demand and are attached to the associated transactions. When a commit record is encountered in the log, the transaction is committed to the fs and the in-core structures are freed. If a filesystem crashes or shuts down before all in-core log buffers are flushed to the log, however, not all transactions may have commit records in the log. As expected, the modifications in such an incomplete transaction are not replayed to the fs. The in-core data structures for the partial transaction are never freed, however, resulting in a memory leak. Update xlog_do_recovery_pass() to first correctly initialize the hash table array so empty lists can be distinguished from populated lists on function exit. Update xlog_recover_free_trans() to always remove the transaction from the list prior to freeing the associated memory. Finally, walk the hash table of transaction lists as the last step before it goes out of scope and free any transactions that may remain on the lists. This prevents a memory leak of partial transactions in the log. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2017-06-24 10:11:41 -07:00
Ingo Molnar	1bc3cd4dfa	Merge branch 'linus' into sched/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-06-24 08:57:20 +02:00
Eric Biggers	c250b7dd8e	fscrypt: make ->dummy_context() return bool This makes it consistent with ->is_encrypted(), ->empty_dir(), and fscrypt_dummy_context_enabled(). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-23 20:11:50 -04:00
Daniel Walter	b7e7cf7a66	fscrypt: add support for AES-128-CBC fscrypt provides facilities to use different encryption algorithms which are selectable by userspace when setting the encryption policy. Currently, only AES-256-XTS for file contents and AES-256-CBC-CTS for file names are implemented. This is a clear case of kernel offers the mechanism and userspace selects a policy. Similar to what dm-crypt and ecryptfs have. This patch adds support for using AES-128-CBC for file contents and AES-128-CBC-CTS for file name encryption. To mitigate watermarking attacks, IVs are generated using the ESSIV algorithm. While AES-CBC is actually slightly less secure than AES-XTS from a security point of view, there is more widespread hardware support. Using AES-CBC gives us the acceptable performance while still providing a moderate level of security for persistent storage. Especially low-powered embedded devices with crypto accelerators such as CAAM or CESA often only support AES-CBC. Since using AES-CBC over AES-XTS is basically thought of a last resort, we use AES-128-CBC over AES-256-CBC since it has less encryption rounds and yields noticeable better performance starting from a file size of just a few kB. Signed-off-by: Daniel Walter <dwalter@sigma-star.at> [david@sigma-star.at: addressed review comments] Signed-off-by: David Gstir <david@sigma-star.at> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-23 20:05:07 -04:00
Eric Biggers	27e47a6342	fscrypt: inline fscrypt_free_filename() fscrypt_free_filename() only needs to do a kfree() of crypto_buf.name, which works well as an inline function. We can skip setting the various pointers to NULL, since no user cares about it (the name is always freed just before it goes out of scope). Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: David Gstir <david@sigma-star.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-23 19:59:08 -04:00
Eric Biggers	63136858ae	ext4: require key for truncate(2) of encrypted file Currently, filesystems allow truncate(2) on an encrypted file without the encryption key. However, it's impossible to correctly handle the case where the size being truncated to is not a multiple of the filesystem block size, because that would require decrypting the final block, zeroing the part beyond i_size, then encrypting the block. As other modifications to encrypted file contents are prohibited without the key, just prohibit truncate(2) as well, making it fail with ENOKEY. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-23 19:48:44 -04:00
Eric Biggers	66e0aaadce	ext4: don't bother checking for encryption key in ->mmap() Since only an open file can be mmap'ed, and we only allow open()ing an encrypted file when its key is available, there is no need to check for the key again before permitting each mmap(). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-23 19:41:38 -04:00
Linus Torvalds	337c6ba2d8	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "8 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: fs/exec.c: account for argv/envp pointers ocfs2: fix deadlock caused by recursive locking in xattr slub: make sysfs file removal asynchronous lib/cmdline.c: fix get_options() overflow while parsing ranges fs/dax.c: fix inefficiency in dax_writeback_mapping_range() autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL mm/vmalloc.c: huge-vmap: fail gracefully on unexpected huge vmap mappings mm, thp: remove cond_resched from __collapse_huge_page_copy	2017-06-23 16:30:52 -07:00
Kees Cook	98da7d0885	fs/exec.c: account for argv/envp pointers When limiting the argv/envp strings during exec to 1/4 of the stack limit, the storage of the pointers to the strings was not included. This means that an exec with huge numbers of tiny strings could eat 1/4 of the stack limit in strings and then additional space would be later used by the pointers to the strings. For example, on 32-bit with a 8MB stack rlimit, an exec with 1677721 single-byte strings would consume less than 2MB of stack, the max (8MB / 4) amount allowed, but the pointers to the strings would consume the remaining additional stack space (1677721 * 4 == 6710884). The result (1677721 + 6710884 == 8388605) would exhaust stack space entirely. Controlling this stack exhaustion could result in pathological behavior in setuid binaries (CVE-2017-1000365). [akpm@linux-foundation.org: additional commenting from Kees] Fixes: `b6a2fea393` ("mm: variable length argument support") Link: http://lkml.kernel.org/r/20170622001720.GA32173@beast Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Qualys Security Advisory <qsa@qualys.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-06-23 16:15:56 -07:00
Eric Ren	8818efaaac	ocfs2: fix deadlock caused by recursive locking in xattr Another deadlock path caused by recursive locking is reported. This kind of issue was introduced since commit `743b5f1434` ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()"). Two deadlock paths have been fixed by commit `b891fa5024` ("ocfs2: fix deadlock issue when taking inode lock at vfs entry points"). Yes, we intend to fix this kind of case in incremental way, because it's hard to find out all possible paths at once. This one can be reproduced like this. On node1, cp a large file from home directory to ocfs2 mountpoint. While on node2, run setfacl/getfacl. Both nodes will hang up there. The backtraces: On node1: __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2] ocfs2_write_begin+0x43/0x1a0 [ocfs2] generic_perform_write+0xa9/0x180 __generic_file_write_iter+0x1aa/0x1d0 ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2] __vfs_write+0xc3/0x130 vfs_write+0xb1/0x1a0 SyS_write+0x46/0xa0 On node2: __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2] ocfs2_xattr_set+0x12e/0xe80 [ocfs2] ocfs2_set_acl+0x22d/0x260 [ocfs2] ocfs2_iop_set_acl+0x65/0xb0 [ocfs2] set_posix_acl+0x75/0xb0 posix_acl_xattr_set+0x49/0xa0 __vfs_setxattr+0x69/0x80 __vfs_setxattr_noperm+0x72/0x1a0 vfs_setxattr+0xa7/0xb0 setxattr+0x12d/0x190 path_setxattr+0x9f/0xb0 SyS_setxattr+0x14/0x20 Fix this one by using ocfs2_inode_{lock\|unlock}_tracker, which is exported by commit `439a36b8ef` ("ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock"). Link: http://lkml.kernel.org/r/20170622014746.5815-1-zren@suse.com Fixes: `743b5f1434` ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") Signed-off-by: Eric Ren <zren@suse.com> Reported-by: Thomas Voegtle <tv@lio96.de> Tested-by: Thomas Voegtle <tv@lio96.de> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-06-23 16:15:55 -07:00
Jan Kara	1eb643d02b	fs/dax.c: fix inefficiency in dax_writeback_mapping_range() dax_writeback_mapping_range() fails to update iteration index when searching radix tree for entries needing cache flushing. Thus each pagevec worth of entries is searched starting from the start which is inefficient and prone to livelocks. Update index properly. Link: http://lkml.kernel.org/r/20170619124531.21491-1-jack@suse.cz Fixes: `9973c98ecf` ("dax: add support for fsync/sync") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-06-23 16:15:55 -07:00
NeilBrown	9fa4eb8e49	autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL If a positive status is passed with the AUTOFS_DEV_IOCTL_FAIL ioctl, autofs4_d_automount() will return ERR_PTR(status) with that status to follow_automount(), which will then dereference an invalid pointer. So treat a positive status the same as zero, and map to ENOENT. See comment in systemd src/core/automount.c::automount_send_ready(). Link: http://lkml.kernel.org/r/871sqwczx5.fsf@notabene.neil.brown.name Signed-off-by: NeilBrown <neilb@suse.com> Cc: Ian Kent <raven@themaw.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-06-23 16:15:55 -07:00
Linus Torvalds	7b249bdc3d	Merge tag 'xfs-4.12-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Darrick Wong: "I have one more bugfix for you for 4.12-rc7 to fix a disk corruption problem: - don't allow swapon on files on the realtime device, because the swap code will swap pages out to blocks on the data device, thereby corrupting the filesystem" * tag 'xfs-4.12-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: don't allow bmap on rt files	2017-06-23 12:23:06 -07:00
Jeff Mahoney	08db141b53	reiserfs: fix race in prealloc discard The main loop in __discard_prealloc is protected by the reiserfs write lock which is dropped across schedules like the BKL it replaced. The problem is that it checks the value, calls a routine that schedules, and then adjusts the state. As a result, two threads that are calling reiserfs_prealloc_discard at the same time can race when one calls reiserfs_free_prealloc_block, the lock is dropped, and the other calls reiserfs_free_prealloc_block with the same block number. In the right circumstances, it can cause the prealloc count to go negative. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz>	2017-06-23 09:40:30 +02:00
Jeff Mahoney	54930dfeb4	reiserfs: don't preallocate blocks for extended attributes Most extended attributes will fit in a single block. More importantly, we drop the reference to the inode while holding the transaction open so the preallocated blocks aren't released. As a result, the inode may be evicted before it's removed from the transaction's prealloc list which can cause memory corruption. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz>	2017-06-23 09:40:24 +02:00
Chao Yu	1ea1516fbb	ext4: check return value of kstrtoull correctly in reserved_clusters_store kstrtoull returns 0 on success, however, in reserved_clusters_store we will return -EINVAL if kstrtoull returns 0, it makes us fail to update reserved_clusters value through sysfs. Fixes: `76d33bca55` Cc: stable@vger.kernel.org # 4.4 Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-23 01:08:22 -04:00
Darrick J. Wong	4a4956249d	ext4: fix off-by-one fsmap error on 1k block filesystems For 1k-block filesystems, the filesystem starts at block 1, not block 0. This fact is recorded in s_first_data_block, so use that to bump up the start_fsb before we start querying the filesystem for its space map. Without this, ext4/026 fails on 1k block ext4 because various functions (notably ext4_get_group_no_and_offset) don't know what to do with an fsblock that is "before" the start of the filesystem and return garbage results (blockgroup 2^32-1, etc.) that confuse fsmap. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-23 00:58:57 -04:00
Theodore Ts'o	bdddf34279	ext4: return EFSBADCRC if a bad checksum error is found in ext4_find_entry() Previously a bad directory block with a bad checksum is skipped; we should be returning EFSBADCRC (aka EBADMSG). Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-23 00:47:05 -04:00
Khazhismel Kumykov	6febe6f253	ext4: return EIO on read error in ext4_find_entry Previously, a read error would be ignored and we would eventually return NULL from ext4_find_entry, which signals "no such file or directory". We should be returning EIO. Signed-off-by: Khazhismel Kumykov <khazhy@google.com>	2017-06-23 00:29:05 -04:00
Eric Biggers	9ce0151a47	ext4: forbid encrypting root directory Currently it's possible to encrypt all files and directories on an ext4 filesystem by deleting everything, including lost+found, then setting an encryption policy on the root directory. However, this is incompatible with e2fsck because e2fsck expects to find, create, and/or write to lost+found and does not have access to any encryption keys. Especially problematic is that if e2fsck can't find lost+found, it will create it without regard for whether the root directory is encrypted. This is wrong for obvious reasons, and it causes a later run of e2fsck to consider the lost+found directory entry to be corrupted. Encrypting the root directory may also be of limited use because it is the "all-or-nothing" use case, for which dm-crypt can be used instead. (By design, encryption policies are inherited and cannot be overridden; so the root directory having an encryption policy implies that all files and directories on the filesystem have that same encryption policy.) In any case, encrypting the root directory is broken currently and must not be allowed; so start returning an error if userspace requests it. For now only do this in ext4, because f2fs and ubifs do not appear to have the lost+found requirement. We could move it into fscrypt_ioctl_set_policy() later if desired, though. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>	2017-06-23 00:10:36 -04:00
Daeho Jeong	a015434480	ext4: send parallel discards on commit completions Now, when we mount ext4 filesystem with '-o discard' option, we have to issue all the discard commands for the blocks to be deallocated and wait for the completion of the commands on the commit complete phase. Because this procedure might involve a lot of sequential combinations of issuing discard commands and waiting for that, the delay of this procedure might be too much long, even to 17.0s in our test, and it results in long commit delay and fsync() performance degradation. To reduce this kind of delay, instead of adding callback for each extent and handling all of them in a sequential manner on commit phase, we instead add a separate list of extents to free to the superblock and then process this list at once after transaction commits so that we can issue all the discard commands in a parallel manner like XFS filesystem. Finally, we could enhance the discard command handling performance. The result was such that 17.0s delay of a single commit in the worst case has been enhanced to 4.8s. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Tested-by: Hobin Woo <hobin.woo@samsung.com> Tested-by: Kitae Lee <kitae87.lee@samsung.com> Reviewed-by: Jan Kara <jack@suse.cz>	2017-06-22 23:54:33 -04:00
Jan Kara	3abb1a0fc2	ext4: avoid unnecessary stalls in ext4_evict_inode() These days inode reclaim calls evict_inode() only when it has no pages in the mapping. In that case it is not necessary to wait for transaction commit in ext4_evict_inode() as there can be no pages waiting to be committed. So avoid unnecessary transaction waiting in that case. We still have to keep the check for the case where ext4_evict_inode() gets called from other paths (e.g. umount) where inode still can have some page cache pages. Reported-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-22 23:49:46 -04:00
Linus Torvalds	a38371cba6	Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "Various small fixes for stable" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Fix some return values in case of error in 'crypt_message' cifs: remove redundant return in cifs_creation_time_get CIFS: Improve readdir verbosity CIFS: check if pages is null rather than bv for a failed allocation CIFS: Set ->should_dirty in cifs_user_readv()	2017-06-22 11:16:55 -07:00
Tahsin Erdogan	cdb7ee4c63	ext4: add nombcache mount option The main purpose of mb cache is to achieve deduplication in extended attributes. In use cases where opportunity for deduplication is unlikely, it only adds overhead. Add a mount option to explicitly turn off mb cache. Suggested-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-22 11:55:14 -04:00
Tahsin Erdogan	b9fc761ea2	ext4: strong binding of xattr inode references To verify that a xattr entry is not pointing to the wrong xattr inode, we currently check that the target inode has EXT4_EA_INODE_FL flag set and also the entry size matches the target inode size. For stronger validation, also incorporate crc32c hash of the value into the e_hash field. This is done regardless of whether the entry lives in the inode body or external attribute block. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-22 11:53:15 -04:00
Tahsin Erdogan	daf8328172	ext4: eliminate xattr entry e_hash recalculation for removes When an extended attribute block is modified, ext4_xattr_hash_entry() recalculates e_hash for the entry that is pointed by s->here. This is unnecessary if the modification is to remove an entry. Currently, if the removed entry is the last one and there are other entries remaining, hash calculation targets the just erased entry which has been filled with zeroes and effectively does nothing. If the removed entry is not the last one and there are more entries, this time it will recalculate hash on the next entry which is totally unnecessary. Fix these by moving the decision on when to recalculate hash to ext4_xattr_set_entry(). Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-22 11:52:03 -04:00
Tahsin Erdogan	9c6e7853c5	ext4: reserve space for xattr entries/names New ea_inode feature allows putting large xattr values into external inodes. struct ext4_xattr_entry and the attribute name however have to remain in the inode extra space or external attribute block. Once that space is exhausted, no further entries can be added. Some of that space could also be used by values that fit in there at the time of addition. So, a single xattr entry whose value barely fits in the external block could prevent further entries being added. To mitigate the problem, this patch introduces a notion of reserved space in the external attribute block that cannot be used by value data. This reserve is enforced when ea_inode feature is enabled. The amount of reserve is arbitrarily chosen to be min(block_size/8, 1024). The table below shows how much space is reserved for each block size and the guaranteed mininum number of entries that can be placed in the external attribute block. block size reserved bytes entries (name length = 16) 1k 128 3 2k 256 7 4k 512 15 8k 1024 31 16k 1024 31 32k 1024 31 64k 1024 31 Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-22 11:48:53 -04:00
Tahsin Erdogan	7a9ca53aea	quota: add get_inode_usage callback to transfer multi-inode charges Ext4 ea_inode feature allows storing xattr values in external inodes to be able to store values that are bigger than a block in size. Ext4 also has deduplication support for these type of inodes. With deduplication, the actual storage waste is eliminated but the users of such inodes are still charged full quota for the inodes as if there was no sharing happening in the background. This design requires ext4 to manually charge the users because the inodes are shared. An implication of this is that, if someone calls chown on a file that has such references we need to transfer the quota for the file and xattr inodes. Current dquot_transfer() function implicitly transfers one inode charge. With ea_inode feature, we would like to transfer multiple inode charges. Add get_inode_usage callback which can interrogate the total number of inodes that were charged for a given inode. [ Applied fix from Colin King to make sure the 'ret' variable is initialized on the successful return path. Detected by CoverityScan, CID#1446616 ("Uninitialized scalar variable") --tytso] Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Jan Kara <jack@suse.cz>	2017-06-22 11:46:48 -04:00
Tahsin Erdogan	dec214d00e	ext4: xattr inode deduplication Ext4 now supports xattr values that are up to 64k in size (vfs limit). Large xattr values are stored in external inodes each one holding a single value. Once written the data blocks of these inodes are immutable. The real world use cases are expected to have a lot of value duplication such as inherited acls etc. To reduce data duplication on disk, this patch implements a deduplicator that allows sharing of xattr inodes. The deduplication is based on an in-memory hash lookup that is a best effort sharing scheme. When a xattr inode is read from disk (i.e. getxattr() call), its crc32c hash is added to a hash table. Before creating a new xattr inode for a value being set, the hash table is checked to see if an existing inode holds an identical value. If such an inode is found, the ref count on that inode is incremented. On value removal the ref count is decremented and if it reaches zero the inode is deleted. The quota charging for such inodes is manually managed. Every reference holder is charged the full size as if there was no sharing happening. This is consistent with how xattr blocks are also charged. [ Fixed up journal credits calculation to handle inline data and the rare case where an shared xattr block can get freed when two thread race on breaking the xattr block sharing. --tytso ] Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-22 11:44:55 -04:00
Tahsin Erdogan	30a7eb970c	ext4: cleanup transaction restarts during inode deletion During inode deletion, the number of journal credits that will be needed is hard to determine. For that reason we have journal extend/restart calls in several places. Whenever a transaction is restarted, filesystem must be in a consistent state because there is no atomicity guarantee beyond a restart call. Add ext4_xattr_ensure_credits() helper function which takes care of journal extend/restart logic. It also handles getting jbd2 write access and dirty metadata calls. This function is called at every iteration of handling an ea_inode reference. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-22 11:42:09 -04:00
Tahsin Erdogan	02749a4c20	ext4: add ext4_is_quota_file() IS_NOQUOTA() indicates whether quota is disabled for an inode. Ext4 also uses it to check whether an inode is for a quota file. The distinction currently doesn't matter because quota is disabled only for the quota files. When we start disabling quota for other inodes in the future, we will want to make the distinction clear. Replace IS_NOQUOTA() call with ext4_is_quota_file() at places where we are checking for quota files. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-22 11:31:25 -04:00
Tahsin Erdogan	47387409ee	ext2, ext4: make mb block cache names more explicit There will be a second mb_cache instance that tracks ea_inodes. Make existing names more explicit so that it is clear that they refer to xattr block cache. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-22 11:28:55 -04:00
Tahsin Erdogan	c07dfcb458	mbcache: make mbcache naming more generic Make names more generic so that mbcache usage is not limited to block sharing. In a subsequent patch in the series ("ext4: xattr inode deduplication"), we start using the mbcache code for sharing xattr inodes. With that patch, old mb_cache_entry.e_block field could be holding either a block number or an inode number. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-22 10:29:53 -04:00
Tahsin Erdogan	b6d9029df0	ext4: move struct ext4_xattr_inode_array to xattr.h Since this is a xattr specific data structure it is cleaner to keep it in xattr header file. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-22 10:28:38 -04:00
Tahsin Erdogan	0421a189bc	ext4: modify ext4_xattr_ino_array to hold struct inode * Tracking struct inode * rather than the inode number eliminates the repeated ext4_xattr_inode_iget() call later. The second call cannot fail in practice but still requires explanation when it wants to ignore the return value. Avoid the trouble and make things simple. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-22 10:26:31 -04:00
Darrick J. Wong	eb5e248d50	xfs: don't allow bmap on rt files bmap returns a dumb LBA address but not the block device that goes with that LBA. Swapfiles don't care about this and will blindly assume that the data volume is the correct blockdev, which is totally bogus for files on the rt subvolume. This results in the swap code doing IOs to arbitrary locations on the data device(!) if the passed in mapping is a realtime file, so just turn off bmap for rt files. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2017-06-21 20:27:35 -07:00
Tahsin Erdogan	c1a5d5f6ab	ext4: improve journal credit handling in set xattr paths Both ext4_set_acl() and ext4_set_context() need to be made aware of ea_inode feature when it comes to credits calculation. Also add a sufficient credits check in ext4_xattr_set_handle() right after xattr write lock is grabbed. Original credits calculation is done outside the lock so there is a possiblity that the initially calculated credits are not sufficient anymore. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-21 22:28:40 -04:00
Tahsin Erdogan	65d3000520	ext4: ext4_xattr_delete_inode() should return accurate errors In a few places the function returns without trying to pass the actual error code to the caller. Fix those. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-21 22:24:38 -04:00
Tahsin Erdogan	b347e2bcd1	ext4: retry storing value in external inode with xattr block too When value size is <= EXT4_XATTR_MIN_LARGE_EA_SIZE(), and it doesn't fit in either inline or xattr block, a second try is made to store it in an external inode while storing the entry itself in inline area. There should also be an attempt to store the entry in xattr block. This patch adds a retry loop to do that. It also makes the caller the sole decider on whether to store a value in an external inode. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-21 22:20:32 -04:00
Tahsin Erdogan	b315529891	ext4: fix credits calculation for xattr inode When there is no space for a value in xattr block, it may be stored in an xattr inode even if the value length is less than EXT4_XATTR_MIN_LARGE_EA_SIZE(). So the current assumption in credits calculation is wrong. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-21 22:16:20 -04:00
Tahsin Erdogan	7cec191894	ext4: fix ext4_xattr_cmp() When a xattr entry refers to an external inode, the value data is not available in the inline area so we should not attempt to read it using value offset. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-21 22:14:30 -04:00
Tahsin Erdogan	f6109100ba	ext4: fix ext4_xattr_move_to_block() When moving xattr entries from inline area to a xattr block, entries that refer to external xattr inodes need special handling because value data is not available in the inline area but rather should be read from its external inode. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-21 22:11:54 -04:00
Tahsin Erdogan	9bb21cedda	ext4: fix ext4_xattr_make_inode_space() value size calculation ext4_xattr_make_inode_space() is interested in calculating the inline space used in an inode. When a xattr entry refers to an external inode the value size indicates the external inode size, not the value size in the inline area. Change the function to take this into account. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-21 22:05:44 -04:00
Tahsin Erdogan	0bd454c04f	ext4: ext4_xattr_value_same() should return false for external data ext4_xattr_value_same() is used as a quick optimization in case the new xattr value is identical to the previous value. When xattr value is stored in a xattr inode the check becomes expensive so it is better to just assume that they are not equal. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-21 22:02:06 -04:00
Tahsin Erdogan	990461dd85	ext4: add missing le32_to_cpu(e_value_inum) conversions Two places in code missed converting xattr inode number using le32_to_cpu(). Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-21 21:59:30 -04:00
Tahsin Erdogan	9096669332	ext4: clean up ext4_xattr_inode_get() The input and output values of *size parameter are equal on successful return from ext4_xattr_inode_get(). On error return, the callers ignore the output value so there is no need to update it. Also check for NULL return from ext4_bread(). If the actual xattr inode size happens to be smaller than the expected size, ext4_bread() may return NULL which would indicate data corruption. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-21 21:57:36 -04:00
Tahsin Erdogan	bab79b0499	ext4: change ext4_xattr_inode_iget() signature In general, kernel functions indicate success/failure through their return values. This function returns the status as an output parameter and reserves the return value for the inode. Make it follow the general convention. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-21 21:49:53 -04:00
Tahsin Erdogan	0eefb10758	ext4: extended attribute value size limit is enforced by vfs EXT4_XATTR_MAX_LARGE_EA_SIZE definition in ext4 is currently unused. Besides, vfs enforces its own 64k limit which makes the 1MB limit in ext4 redundant. Remove it. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-06-21 21:41:37 -04:00

... 17 18 19 20 21 ...

50492 Commits