OSDN Git Service

sagit-ice-cold/kernel_xiaomi_msm8998.git
6 years agocrypto: introduce crypto wait for async op
Gilad Ben-Yossef [Wed, 18 Oct 2017 07:00:38 +0000 (08:00 +0100)]
crypto: introduce crypto wait for async op

Invoking a possibly async. crypto op and waiting for completion
while correctly handling backlog processing is a common task
in the crypto API implementation and outside users of it.

This patch adds a generic implementation for doing so in
preparation for using it across the board instead of hand
rolled versions.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
CC: Eric Biggers <ebiggers3@gmail.com>
CC: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agofscrypt: lock mutex before checking for bounce page pool
Eric Biggers [Sun, 29 Oct 2017 10:30:19 +0000 (06:30 -0400)]
fscrypt: lock mutex before checking for bounce page pool

fscrypt_initialize(), which allocates the global bounce page pool when
an encrypted file is first accessed, uses "double-checked locking" to
try to avoid locking fscrypt_init_mutex.  However, it doesn't use any
memory barriers, so it's theoretically possible for a thread to observe
a bounce page pool which has not been fully initialized.  This is a
classic bug with "double-checked locking".

While "only a theoretical issue" in the latest kernel, in pre-4.8
kernels the pointer that was checked was not even the last to be
initialized, so it was easily possible for a crash (NULL pointer
dereference) to happen.  This was changed only incidentally by the large
refactor to use fs/crypto/.

Solve both problems in a trivial way that can easily be backported: just
always take the mutex.  It's theoretically less efficient, but it
shouldn't be noticeable in practice as the mutex is only acquired very
briefly once per encrypted file.

Later I'd like to make this use a helper macro like DO_ONCE().  However,
DO_ONCE() runs in atomic context, so we'd need to add a new macro that
allows blocking.

Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper function - fscrypt_prepare_setattr()
Eric Biggers [Mon, 9 Oct 2017 19:15:44 +0000 (12:15 -0700)]
fscrypt: new helper function - fscrypt_prepare_setattr()

Introduce a helper function for filesystems to call when processing
->setattr() on a possibly-encrypted inode.  It handles enforcing that an
encrypted file can only be truncated if its encryption key is available.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper function - fscrypt_prepare_lookup()
Eric Biggers [Mon, 9 Oct 2017 19:15:43 +0000 (12:15 -0700)]
fscrypt: new helper function - fscrypt_prepare_lookup()

Introduce a helper function which prepares to look up the given dentry
in the given directory.  If the directory is encrypted, it handles
loading the directory's encryption key, setting the dentry's ->d_op to
fscrypt_d_ops, and setting DCACHE_ENCRYPTED_WITH_KEY if the directory's
encryption key is available.

Note: once all filesystems switch over to this, we'll be able to move
fscrypt_d_ops and fscrypt_set_encrypted_dentry() to fscrypt_private.h.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper function - fscrypt_prepare_rename()
Eric Biggers [Mon, 9 Oct 2017 19:15:42 +0000 (12:15 -0700)]
fscrypt: new helper function - fscrypt_prepare_rename()

Introduce a helper function which prepares to rename a file into a
possibly encrypted directory.  It handles loading the encryption keys
for the source and target directories if needed, and it handles
enforcing that if the target directory (and the source directory for a
cross-rename) is encrypted, then the file being moved into the directory
has the same encryption policy as its containing directory.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper function - fscrypt_prepare_link()
Eric Biggers [Mon, 9 Oct 2017 19:15:41 +0000 (12:15 -0700)]
fscrypt: new helper function - fscrypt_prepare_link()

Introduce a helper function which prepares to link an inode into a
possibly-encrypted directory.  It handles setting up the target
directory's encryption key, then verifying that the link won't violate
the constraint that all files in an encrypted directory tree use the
same encryption policy.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper function - fscrypt_file_open()
Eric Biggers [Mon, 9 Oct 2017 19:15:40 +0000 (12:15 -0700)]
fscrypt: new helper function - fscrypt_file_open()

Add a helper function which prepares to open a regular file which may be
encrypted.  It handles setting up the file's encryption key, then
checking that the file's encryption policy matches that of its parent
directory (if the parent directory is encrypted).  It may be set as the
->open() method or it can be called from another ->open() method.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper function - fscrypt_require_key()
Eric Biggers [Mon, 9 Oct 2017 19:15:39 +0000 (12:15 -0700)]
fscrypt: new helper function - fscrypt_require_key()

Add a helper function which checks if an inode is encrypted, and if so,
tries to set up its encryption key.  This is a pattern which is
duplicated in multiple places in each of ext4, f2fs, and ubifs --- for
example, when a regular file is asked to be opened or truncated.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: remove unneeded empty fscrypt_operations structs
Eric Biggers [Mon, 9 Oct 2017 19:15:38 +0000 (12:15 -0700)]
fscrypt: remove unneeded empty fscrypt_operations structs

In the case where a filesystem has been configured without encryption
support, there is no longer any need to initialize ->s_cop at all, since
none of the methods are ever called.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: remove ->is_encrypted()
Eric Biggers [Mon, 9 Oct 2017 19:15:37 +0000 (12:15 -0700)]
fscrypt: remove ->is_encrypted()

Now that all callers of fscrypt_operations.is_encrypted() have been
switched to IS_ENCRYPTED(), remove ->is_encrypted().

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: switch from ->is_encrypted() to IS_ENCRYPTED()
Eric Biggers [Mon, 9 Oct 2017 19:15:36 +0000 (12:15 -0700)]
fscrypt: switch from ->is_encrypted() to IS_ENCRYPTED()

IS_ENCRYPTED() now gives the same information as
i_sb->s_cop->is_encrypted() but is more efficient, since IS_ENCRYPTED()
is just a simple flag check.  Prepare to remove ->is_encrypted() by
switching all callers to IS_ENCRYPTED().

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofs, fscrypt: add an S_ENCRYPTED inode flag
Eric Biggers [Mon, 9 Oct 2017 19:15:35 +0000 (12:15 -0700)]
fs, fscrypt: add an S_ENCRYPTED inode flag

Introduce a flag S_ENCRYPTED which can be set in ->i_flags to indicate
that the inode is encrypted using the fscrypt (fs/crypto/) mechanism.

Checking this flag will give the same information that
inode->i_sb->s_cop->is_encrypted(inode) currently does, but will be more
efficient.  This will be useful for adding higher-level helper functions
for filesystems to use.  For example we'll be able to replace this:

if (ext4_encrypted_inode(inode)) {
ret = fscrypt_get_encryption_info(inode);
if (ret)
return ret;
if (!fscrypt_has_encryption_key(inode))
return -ENOKEY;
}

with this:

ret = fscrypt_require_key(inode);
if (ret)
return ret;

... since we'll be able to retain the fast path for unencrypted files as
a single flag check, using an inline function.  This wasn't possible
before because we'd have had to frequently call through the
->i_sb->s_cop->is_encrypted function pointer, even when the encryption
support was disabled or not being used.

Note: we don't define S_ENCRYPTED to 0 if CONFIG_FS_ENCRYPTION is
disabled because we want to continue to return an error if an encrypted
file is accessed without encryption support, rather than pretending that
it is unencrypted.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: clean up include file mess
Dave Chinner [Mon, 9 Oct 2017 19:15:34 +0000 (12:15 -0700)]
fscrypt: clean up include file mess

Filesystems have to include different header files based on whether they
are compiled with encryption support or not. That's nasty and messy.

Instead, rationalise the headers so we have a single include fscrypt.h
and let it decide what internal implementation to include based on the
__FS_HAS_ENCRYPTION define.  Filesystems set __FS_HAS_ENCRYPTION to 1
before including linux/fscrypt.h if they are built with encryption
support.  Otherwise, they must set __FS_HAS_ENCRYPTION to 0.

Add guards to prevent fscrypt_supp.h and fscrypt_notsupp.h from being
directly included by filesystems.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[EB: use 1 and 0 rather than defined/undefined]
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: fix dereference of NULL user_key_payload
Eric Biggers [Mon, 9 Oct 2017 19:46:18 +0000 (12:46 -0700)]
fscrypt: fix dereference of NULL user_key_payload

When an fscrypt-encrypted file is opened, we request the file's master
key from the keyrings service as a logon key, then access its payload.
However, a revoked key has a NULL payload, and we failed to check for
this.  request_key() *does* skip revoked keys, but there is still a
window where the key can be revoked before we acquire its semaphore.

Fix it by checking for a NULL payload, treating it like a key which was
already revoked at the time it was requested.

Fixes: 88bd6ccdcdd6 ("ext4 crypto: add encryption key management facilities")
Reviewed-by: James Morris <james.l.morris@oracle.com>
Cc: <stable@vger.kernel.org> [v4.1+]
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
6 years agofscrypt: make ->dummy_context() return bool
Eric Biggers [Thu, 22 Jun 2017 19:14:40 +0000 (12:14 -0700)]
fscrypt: make ->dummy_context() return bool

This makes it consistent with ->is_encrypted(), ->empty_dir(), and
fscrypt_dummy_context_enabled().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agof2fs: deny accessing encryption policy if encryption is off
Chao Yu [Tue, 14 Nov 2017 11:28:42 +0000 (19:28 +0800)]
f2fs: deny accessing encryption policy if encryption is off

This patch adds missing feature check in encryption ioctl interface.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: inject fault in inc_valid_node_count
Chao Yu [Mon, 13 Nov 2017 09:32:40 +0000 (17:32 +0800)]
f2fs: inject fault in inc_valid_node_count

This patch adds missing fault injection in inc_valid_node_count.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to clear FI_NO_PREALLOC
Chao Yu [Mon, 13 Nov 2017 09:32:39 +0000 (17:32 +0800)]
f2fs: fix to clear FI_NO_PREALLOC

We need to clear FI_NO_PREALLOC flag in error path of f2fs_file_write_iter,
otherwise we will lose the chance to preallocate blocks in latter write()
at one time.

Fixes: dc91de78e5e1 ("f2fs: do not preallocate blocks which has wrong buffer")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: expose quota information in debugfs
Jaegeuk Kim [Tue, 14 Nov 2017 01:46:38 +0000 (17:46 -0800)]
f2fs: expose quota information in debugfs

This patch shows # of dirty pages and # of hidden quota files.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: separate nat entry mem alloc from nat_tree_lock
Yunlei He [Fri, 10 Nov 2017 21:36:51 +0000 (13:36 -0800)]
f2fs: separate nat entry mem alloc from nat_tree_lock

This patch splits memory allocation part in nat_entry to avoid lock contention.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: validate before set/clear free nat bitmap
LiFan [Fri, 10 Nov 2017 07:41:42 +0000 (15:41 +0800)]
f2fs: validate before set/clear free nat bitmap

In flush_nat_entries, all dirty nats will be flushed and if
their new address isn't NULL_ADDR, their bitmaps will be updated,
the free_nid_count of the bitmaps will be increaced regardless
of whether the nats have already been occupied before.
This could lead to wrong free_nid_count.
So this patch checks the status of the bits beforeactually
set/clear them.

Fixes: 586d1492f301 ("f2fs: skip scanning free nid bitmap of full NAT blocks")
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: avoid opened loop codes in __add_ino_entry
Chao Yu [Fri, 10 Nov 2017 01:30:42 +0000 (09:30 +0800)]
f2fs: avoid opened loop codes in __add_ino_entry

We will keep __add_ino_entry success all the time, for ENOMEM failure
case, we have already handled it by using  __GFP_NOFAIL flag, so we
don't have to use additional opened loop codes here, remove them.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: apply write hints to select the type of segments for buffered write
Hyunchul Lee [Thu, 9 Nov 2017 05:51:27 +0000 (14:51 +0900)]
f2fs: apply write hints to select the type of segments for buffered write

Write hints helps F2FS to determine which type of segments would be
selected for buffered write.

This patch implements the mapping from write hints to segment types
as shown below.

  hints               segment type
  -----               ------------
  WRITE_LIFE_SHORT    CURSEG_HOT_DATA
  WRITE_LIFE_EXTREME  CURSEG_COLD_DATA
  others              CURSEG_WARM_DATA

the F2FS poliy for hot/cold seperation has precedence over this hints.
And hints are not applied in in-place update.

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: introduce scan_curseg_cache for cleanup
Chao Yu [Wed, 8 Nov 2017 09:47:36 +0000 (17:47 +0800)]
f2fs: introduce scan_curseg_cache for cleanup

Commit 4ac912427c42 ("f2fs: introduce free nid bitmap") copied codes
from __build_free_nids() into scan_free_nid_bits(), they are redundant,
introduce one common function scan_curseg_cache for cleanup.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: optimize the way of traversing free_nid_bitmap
Fan Li [Tue, 7 Nov 2017 11:14:24 +0000 (19:14 +0800)]
f2fs: optimize the way of traversing free_nid_bitmap

We call scan_free_nid_bits only when there isn't many
free nids left, it means that marked bits in free_nid_bitmap
are supposed to be few, use find_next_bit_le is more
efficient in such case.
According to my tests, use find_next_bit_le instead of
test_bit_le will cut down the traversal time to one
third of its original.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: keep scanning until enough free nids are acquired
Fan Li [Tue, 7 Nov 2017 03:04:33 +0000 (11:04 +0800)]
f2fs: keep scanning until enough free nids are acquired

In current version, after scan_free_nid_bits, the scan is over if
nid_cnt[FREE_NID] != 0. In most cases, there are still free nids in the
free list during the scan, and scan_free_nid_bits usually can't increase
nid_cnt[FREE_NID]. It causes that __build_free_nids is called many times
without solving the shortage of the free nids. This patch fixes that.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: trace checkpoint reason in fsync()
Chao Yu [Mon, 6 Nov 2017 14:51:45 +0000 (22:51 +0800)]
f2fs: trace checkpoint reason in fsync()

This patch slightly changes need_do_checkpoint to return the detail
info that indicates why we need do checkpoint, then caller could print
it with trace message.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: keep isize once block is reserved cross EOF
Chao Yu [Sun, 5 Nov 2017 13:53:30 +0000 (21:53 +0800)]
f2fs: keep isize once block is reserved cross EOF

Without FADVISE_KEEP_SIZE_BIT, we will try to recover file size
according to last non-hole block, so in fallocate(), we must set
FADVISE_KEEP_SIZE_BIT flag once we have preallocated block cross
EOF, instead of when all preallocation is success. Otherwise, file
size will be incorrect due to lack of this flag.

Simple testcase to reproduce this:

1. echo 2 > /sys/fs/f2fs/<device>/inject_type
2. echo 10 > /sys/fs/f2fs/<device>/inject_rate
3. run tests/generic/392
4. disable fault injection
5. do remount

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: avoid race in between GC and block exchange
Chao Yu [Fri, 3 Nov 2017 02:21:05 +0000 (10:21 +0800)]
f2fs: avoid race in between GC and block exchange

During block exchange in {insert,collapse,move}_range, page-block mapping
is unstable due to mapping moving or recovery, so there should be no
concurrent cache read operation rely on such mapping, nor cache write
operation to mess up block exchange.

So this patch let background GC be aware of that.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: save a multiplication for last_nid calculation
Fan Li [Thu, 2 Nov 2017 03:02:52 +0000 (11:02 +0800)]
f2fs: save a multiplication for last_nid calculation

Use a slightly easier way to calculate last_nid.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix summary info corruption
Chao Yu [Thu, 2 Nov 2017 12:41:03 +0000 (20:41 +0800)]
f2fs: fix summary info corruption

Sometimes, after running generic/270 of fstest, fsck reports summary
info and actual position of block address in direct node becoming
inconsistent.

The root cause is race in between __f2fs_replace_block and change_curseg
as below:

Thread A Thread B
- __clone_blkaddrs
 - f2fs_replace_block
  - __f2fs_replace_block
   - segnoA = GET_SEGNO(sbi, blkaddrA);
   - type = se->type:=CURSEG_HOT_DATA
   - if (!IS_CURSEG(sbi, segnoA))
         type = CURSEG_WARM_DATA
- allocate_data_block
 - allocate_segment
  - get_ssr_segment
  - change_curseg(segnoA, CURSEG_HOT_DATA)
   - change_curseg(segnoA, CURSEG_WARM_DATA)
    - reset_curseg
     - __set_sit_entry_type
      - change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA

So finally, hot curseg locates in segnoA, but type of segnoA becomes
CURSEG_WARM_DATA.

Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
as blkaddrA locates in segnoA, so we will move warm type curseg to segnoA,
then change its summary cache and writeback it to summary block.

But segnoA is used by hot type curseg too, once it moves or persist, it
will cover summary block content with inner old summary cache, result in
inconsistent status.

This patch tries to fix this issue by introduce global curseg lock to avoid
race in between __f2fs_replace_block and change_curseg.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove dead code in update_meta_page
Chao Yu [Thu, 2 Nov 2017 12:41:02 +0000 (20:41 +0800)]
f2fs: remove dead code in update_meta_page

After commit a468f0ef516f ("f2fs: use crc and cp version to determine
roll-forward recovery"), last caller of update_meta_page passing @src
with NULL is gone, so remove related dead code there.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove unneeded semicolon
Chao Yu [Thu, 2 Nov 2017 12:41:01 +0000 (20:41 +0800)]
f2fs: remove unneeded semicolon

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: don't bother with inode->i_version
Jeff Layton [Mon, 30 Oct 2017 15:11:56 +0000 (11:11 -0400)]
f2fs: don't bother with inode->i_version

f2fs does not set the SB_I_VERSION flag, so the i_version will never
be incremented on write. It was recently changed to increment the
i_version on a quota write, which isn't necessary here.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: check curseg space before foreground GC
Chao Yu [Mon, 30 Oct 2017 09:49:54 +0000 (17:49 +0800)]
f2fs: check curseg space before foreground GC

When we are closing to trigger foreground GC, if there are only a few
of dirty metas, we can log these dirty metas in left space of opened
segments instead of triggering foreground GC.

With this patch, total count of foreground GC triggered by
test/generic/* of fstest suit reduce from 254 to 184.

So let's do the check before foreground GC anyway.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: use rw_semaphore to protect SIT cache
Chao Yu [Mon, 30 Oct 2017 09:49:53 +0000 (17:49 +0800)]
f2fs: use rw_semaphore to protect SIT cache

There are some cases user didn't update SIT cache under this lock,
so let's use rw_semaphore instead of mutex to enhance concurrently
accessing.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: support quota sys files
Jaegeuk Kim [Fri, 6 Oct 2017 16:14:28 +0000 (09:14 -0700)]
f2fs: support quota sys files

This patch supports hidden quota files in the system, which will be used for
Android. It requires up-to-date f2fs-tools later than v1.9.0.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: add quota_ino feature infra
Jaegeuk Kim [Fri, 6 Oct 2017 04:03:06 +0000 (21:03 -0700)]
f2fs: add quota_ino feature infra

This patch adds quota_ino feature infra to be used for quota files.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: optimize __update_nat_bits
Fan Li [Mon, 30 Oct 2017 07:19:48 +0000 (15:19 +0800)]
f2fs: optimize __update_nat_bits

Make three modification for __update_nat_bits:
1. Take the codes of dealing the nat with nid 0 out of the loop
    Such nat only needs to be dealt with once at beginning.
2. Use " nat_index == 0" instead of " start_nid == 0" to decide if it's the first nat block
    It's better that we don't assume @start_nid is the first nid of the nat block it's in.
3. Use " if (nat_blk->entries[i].block_addr != NULL_ADDR)" to explicitly comfirm the value of block_addr
    use constant to make sure the codes is right, even if the value of NULL_ADDR changes.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: modify for accurate fggc node io stat
Yunlei He [Mon, 30 Oct 2017 06:18:55 +0000 (14:18 +0800)]
f2fs: modify for accurate fggc node io stat

modify for accurate fggc node io stat

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agoRevert "f2fs: handle dirty segments inside refresh_sit_entry"
Yunlong Song [Mon, 30 Oct 2017 01:33:41 +0000 (09:33 +0800)]
Revert "f2fs: handle dirty segments inside refresh_sit_entry"

This reverts commit 5e443818fa0b2a2845561ee25bec181424fb2889

The commit should be reverted because call sequence of below two parts
of code must be kept:
a. update sit information, it needs to be updated before segment
allocation since latter allocation may trigger SSR, and SSR allocation
needs latest valid block information of all segments.
b. update segment status, it needs to be updated after segment allocation
since we can skip updating current opened segment status.

Fixes: 5e443818fa0b ("f2fs: handle dirty segments inside refresh_sit_entry")
Suggested-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: remove refresh_sit_entry function]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: add a function to move nid
Fan Li [Sat, 28 Oct 2017 11:03:37 +0000 (19:03 +0800)]
f2fs: add a function to move nid

This patch add a new function to move nid from one state to another.
Move operation is heavily used, by adding a new function for it
we can cut down some branches from several flow.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: export SSR allocation threshold
Chao Yu [Sat, 28 Oct 2017 08:52:33 +0000 (16:52 +0800)]
f2fs: export SSR allocation threshold

This patch exports min_ssr_segments threshold in sysfs to let user
control triggering SSR allocation flexibly.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: give correct trimmed blocks in fstrim
Chao Yu [Sat, 28 Oct 2017 08:52:32 +0000 (16:52 +0800)]
f2fs: give correct trimmed blocks in fstrim

We have supported to issue discard in specified range during fstrim,
it needs to return caller with successfully trimmed bytes in that
range instead of bytes of invalid blocks which are scanned in
checkpoint.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: support bio allocation error injection
Chao Yu [Sat, 28 Oct 2017 08:52:31 +0000 (16:52 +0800)]
f2fs: support bio allocation error injection

This patch adds to support bio allocation error injection to simulate
out-of-memory test scenario.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: support get_page error injection
Chao Yu [Sat, 28 Oct 2017 08:52:30 +0000 (16:52 +0800)]
f2fs: support get_page error injection

This patch adds to support get_page error injection to simulate
out-of-memory test scenario.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: add missing sysfs description
Chao Yu [Sat, 28 Oct 2017 08:52:29 +0000 (16:52 +0800)]
f2fs: add missing sysfs description

There are some missing sysfs entries' description in document, add them.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: support soft block reservation
Yunlong Song [Fri, 27 Oct 2017 12:45:05 +0000 (20:45 +0800)]
f2fs: support soft block reservation

It supports to extend reserved_blocks sysfs interface to be soft
threshold, which allows user configure it exceeding current available
user space. This patch also introduces a new sysfs interface called
current_reserved_blocks, which shows the current blocks that have
already been reserved.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: handle error case when adding xattr entry
Jaegeuk Kim [Mon, 16 Oct 2017 22:05:16 +0000 (15:05 -0700)]
f2fs: handle error case when adding xattr entry

This patch fixes recovering incomplete xattr entries remaining in inline xattr
and xattr block, caused by any kind of errors.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: support flexible inline xattr size
Chao Yu [Wed, 6 Sep 2017 13:59:50 +0000 (21:59 +0800)]
f2fs: support flexible inline xattr size

Now, in product, more and more features based on file encryption were
introduced, their demand of xattr space is increasing, however, inline
xattr has fixed-size of 200 bytes, once inline xattr space is full, new
increased xattr data would occupy additional xattr block which may bring
us more space usage and performance regression during persisting.

In order to resolve above issue, it's better to expand inline xattr size
flexibly according to user's requirement.

So this patch introduces new filesystem feature 'flexible inline xattr',
and new mount option 'inline_xattr_size=%u', once mkfs enables the
feature, we can use the option to make f2fs supporting flexible inline
xattr size.

To support this feature, we add extra attribute i_inline_xattr_size in
inode layout, indicating that how many space inline xattr borrows from
block address mapping space in inode layout, by this, we can easily
locate and store flexible-sized inline xattr data in inode.

Inode disk layout:
  +----------------------+
  | .i_mode              |
  | ...                  |
  | .i_ext               |
  +----------------------+
  | .i_extra_isize       |
  | .i_inline_xattr_size |-----------+
  | ...                  |           |
  +----------------------+           |
  | .i_addr              |           |
  |  - block address or  |           |
  |  - inline data       |           |
  +----------------------+<---+      v
  |    inline xattr      |    +---inline xattr range
  +----------------------+<---+
  | .i_nid               |
  +----------------------+
  |   node_footer        |
  | (nid, ino, offset)   |
  +----------------------+

Note that, we have to cnosider backward compatibility which reserved
inline_data space, 200 bytes, all the time, reported by Sheng Yong.

Previous inline data or directory always reserved 200 bytes in inode layout,
even if inline_xattr is disabled. In order to keep inline_dentry's structure
for backward compatibility, we get the space back only from inline_data.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reported-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: show current cp state
Jaegeuk Kim [Thu, 26 Oct 2017 08:31:22 +0000 (10:31 +0200)]
f2fs: show current cp state

This patch shows whether checkpoint met any error case.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: add missing quota_initialize
Jaegeuk Kim [Mon, 23 Oct 2017 21:50:15 +0000 (23:50 +0200)]
f2fs: add missing quota_initialize

This patch adds to call quota_intialize in f2fs_set_acl, f2fs_unlink,
and f2fs_rename.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: show # of dirty segments via sysfs
Jaegeuk Kim [Tue, 24 Oct 2017 07:46:54 +0000 (09:46 +0200)]
f2fs: show # of dirty segments via sysfs

This patch adds one sysfs entry to show # of dirty segments which can be
used for gc timing by user.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: stop all the operations by cp_error flag
Jaegeuk Kim [Mon, 23 Oct 2017 21:48:49 +0000 (23:48 +0200)]
f2fs: stop all the operations by cp_error flag

This patch replaces to use cp_error flag instead of RDONLY for quota off.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove several redundant assignments
Colin Ian King [Thu, 19 Oct 2017 10:58:21 +0000 (12:58 +0200)]
f2fs: remove several redundant assignments

There are several assignments to variables that are redundant
as the values are never read when the variables are updated later
and so the redundant statements can be safely removed.

Cleans up clang warnings:
fs/f2fs/segment.c:923:19: warning: Value stored to 'p' during its initialization is never read
fs/f2fs/segment.c:2060:2: warning: Value stored to 'hint' is never read
fs/f2fs/segment.c:2353:2: warning: Value stored to 'start_block' is never read
fs/f2fs/segment.c:2354:2: warning: Value stored to 'end_block' is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: avoid using timespec
Arnd Bergmann [Thu, 19 Oct 2017 09:52:47 +0000 (11:52 +0200)]
f2fs: avoid using timespec

All uses of timespec are deprecated, and this one is not particularly
useful, as the documented method for converting seconds to jiffies
is to multiply by 'HZ'.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to correct no_fggc_candidate
Chao Yu [Wed, 18 Oct 2017 02:34:14 +0000 (10:34 +0800)]
f2fs: fix to correct no_fggc_candidate

There may be extreme case as below:

For one section contains one segment, and there are total 100 segments
with 10% over-privision ratio in f2fs partition, fggc_threshold will
be rounded down to 460 instead of 460.8 as below caclulation:

sbi->fggc_threshold = div_u64((u64)(main_count - ovp_count) *
BLKS_PER_SEC(sbi), (main_count - resv_count));

If section usage is as:
60 segments which contain 460 valid blocks
40 segments which contain 462 valid blocks

As valid block number in all sections is large than fggc_threshold, so
none of them will be chosen as candidate due to incorrect fggc_threshold.

Let's just soften the term of choosing foreground GC candidates.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agoRevert "f2fs: return wrong error number on f2fs_quota_write"
Jaegeuk Kim [Thu, 19 Oct 2017 19:07:11 +0000 (12:07 -0700)]
Revert "f2fs: return wrong error number on f2fs_quota_write"

This reverts commit 4f31d26b0c17f2aae6a6afeb823a87e20671ab4b.

It turns out that we need to report error number if nothing was written.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove obsolete pointer for truncate_xattr_node
Jaegeuk Kim [Thu, 19 Oct 2017 18:48:57 +0000 (11:48 -0700)]
f2fs: remove obsolete pointer for truncate_xattr_node

This patch removes obosolete parameter for truncate_xattr_node.

Suggested-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: retry ENOMEM for quota_read|write
Jaegeuk Kim [Thu, 19 Oct 2017 16:43:56 +0000 (09:43 -0700)]
f2fs: retry ENOMEM for quota_read|write

This gives another chance to read or write quota data.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: limit # of inmemory pages
Jaegeuk Kim [Thu, 19 Oct 2017 02:05:57 +0000 (19:05 -0700)]
f2fs: limit # of inmemory pages

If some abnormal users try lots of atomic write operations, f2fs is able to
produce pinned pages in the main memory which affects system performance.
This patch limits that as 20% over total memory size, and if f2fs reaches
to the limit, it will drop all the inmemory pages.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: update ctx->pos correctly when hitting hole in directory
Chao Yu [Fri, 13 Oct 2017 10:01:36 +0000 (18:01 +0800)]
f2fs: update ctx->pos correctly when hitting hole in directory

This patch fixes to update ctx->pos correctly when hitting hole in
directory.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: relocate readahead codes in readdir()
Chao Yu [Fri, 13 Oct 2017 10:01:35 +0000 (18:01 +0800)]
f2fs: relocate readahead codes in readdir()

Previously, for large directory, we just do readahead only once in
readdir(), readdir()'s performance may drop when traversing latter
blocks. In order to avoid this, relocate readahead codes to covering
all traverse flow.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: allow readdir() to be interrupted
Chao Yu [Fri, 13 Oct 2017 10:01:34 +0000 (18:01 +0800)]
f2fs: allow readdir() to be interrupted

This patch follows ext4 to allow readdir() in large empty directory to
be interrupted. Referenced commit of ext4: 1f60fbe72749 ("ext4: allow
readdir()'s of large empty directories to be interrupted").

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: trace f2fs_readdir
Chao Yu [Fri, 13 Oct 2017 10:01:33 +0000 (18:01 +0800)]
f2fs: trace f2fs_readdir

This patch adds trace for f2fs_readdir.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: trace f2fs_lookup
Chao Yu [Tue, 17 Oct 2017 09:33:41 +0000 (17:33 +0800)]
f2fs: trace f2fs_lookup

This patch adds trace for f2fs_lookup.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: skip searching non-exist range in truncate_hole
Weichao Guo [Sat, 14 Oct 2017 00:13:32 +0000 (08:13 +0800)]
f2fs: skip searching non-exist range in truncate_hole

Let's skip entire non-exist area to speed up truncate_hole by
using get_next_page_offset.

Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: expose some sectors to user in inline data or dentry case
Jaegeuk Kim [Fri, 13 Oct 2017 17:27:45 +0000 (10:27 -0700)]
f2fs: expose some sectors to user in inline data or dentry case

If there's some data written through inline data or dentry, we need to shouw
st_blocks. This fixes reporting zero blocks even though there is small written
data.

Cc: stable@vger.kernel.org
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid link file for quotacheck]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: avoid stale fi->gdirty_list pointer
Jaegeuk Kim [Fri, 13 Oct 2017 02:12:53 +0000 (19:12 -0700)]
f2fs: avoid stale fi->gdirty_list pointer

When doing fault injection test, f2fs_evict_inode() didn't remove gdirty_list
which incurs a kernel panic due to wrong pointer access.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs/crypto: drop crypto key at evict_inode only
Jaegeuk Kim [Sat, 7 Oct 2017 07:08:05 +0000 (00:08 -0700)]
f2fs/crypto: drop crypto key at evict_inode only

This patch avoids dropping crypto key in f2fs_drop_inode, so we can guarantee
it happens only at evict_inode.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to avoid race when accessing last_disk_size
Chao Yu [Mon, 9 Oct 2017 09:55:19 +0000 (17:55 +0800)]
f2fs: fix to avoid race when accessing last_disk_size

last_disk_size could be wrong due to concurrently updating, so using
i_sem semaphore to make last_disk_size updating exclusive to fix this
issue.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: Fix bool initialization/comparison
Thomas Meyer [Sat, 7 Oct 2017 14:02:21 +0000 (16:02 +0200)]
f2fs: Fix bool initialization/comparison

Bool initializations should use true and false. Bool tests don't need
comparisons.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: give up CP_TRIMMED_FLAG if it drops discards
Chao Yu [Wed, 4 Oct 2017 01:08:37 +0000 (09:08 +0800)]
f2fs: give up CP_TRIMMED_FLAG if it drops discards

In ->umount, once we drop remained discard entries, we should not
set CP_TRIMMED_FLAG with another checkpoint.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: trace f2fs_remove_discard
Chao Yu [Wed, 4 Oct 2017 01:08:36 +0000 (09:08 +0800)]
f2fs: trace f2fs_remove_discard

This patch adds tracepoint to trace f2fs_remove_discard.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: reduce cmd_lock coverage in __issue_discard_cmd
Chao Yu [Wed, 4 Oct 2017 01:08:35 +0000 (09:08 +0800)]
f2fs: reduce cmd_lock coverage in __issue_discard_cmd

__submit_discard_cmd may lead long latency due to exhaustion of I/O
request resource in block layer, so issuing all discard under cmd_lock
may lead to hangtask, in order to avoid that, let's reduce it's coverage.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: split discard policy
Chao Yu [Wed, 4 Oct 2017 01:08:34 +0000 (09:08 +0800)]
f2fs: split discard policy

There are many different scenarios such as fstrim, umount, urgent or
background where we will issue discards, actually, they need use
different policy in aspect of io aware, discard granularity, delay
interval and so on. But now they just share one common discard policy,
so there will be race when changing policy in between these scenarios,
the interference of changing discard policy will be very serious.

This patch changes to split discard policy for different scenarios.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: wrap discard policy
Chao Yu [Wed, 4 Oct 2017 01:08:33 +0000 (09:08 +0800)]
f2fs: wrap discard policy

This patch wraps scattered optional parameters into discard policy as
below, later, with it we expect that we can adjust these parameters with
proper strategy in different scenario.

struct discard_policy {
unsigned int min_interval; /* used for candidates exist */
unsigned int max_interval; /* used for candidates not exist */
unsigned int max_requests; /* # of discards issued per round */
unsigned int io_aware_gran; /* minimum granularity discard not be aware of I/O */
bool io_aware; /* issue discard in idle time */
bool sync; /* submit discard with REQ_SYNC flag */
};

This patch doesn't change any logic of codes.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: support issuing/waiting discard in range
Chao Yu [Wed, 4 Oct 2017 01:08:32 +0000 (09:08 +0800)]
f2fs: support issuing/waiting discard in range

Fstrim intends to trim invalid blocks of filesystem only with specified
range and granularity, but actually, it will issue all previous cached
discard commands which may be out-of-range and be with unmatched
granularity, it's unneeded.

In order to fix above issues, this patch introduces new helps to support
to issue and wait discard in range and adds a new fstrim_list for tracking
in-flight discard from ->fstrim.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to flush multiple device in checkpoint
Chao Yu [Fri, 29 Sep 2017 05:59:39 +0000 (13:59 +0800)]
f2fs: fix to flush multiple device in checkpoint

If f2fs manages multiple devices, in checkpoint, we need to issue flush
in those devices which contain dirty data/node in their cache before
we write checkpoint region, otherwise, filesystem metadata could be
corrupted if hitting SPO after checkpoint.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: enhance multiple device flush
Chao Yu [Fri, 29 Sep 2017 05:59:38 +0000 (13:59 +0800)]
f2fs: enhance multiple device flush

When multiple device feature is enabled, during ->fsync we will issue
flush in all devices to make sure node/data of the file being persisted
into storage. But some flushes of device could be unneeded as file's
data may be not writebacked into those devices. So this patch adds and
manage bitmap per inode in global cache to indicate which device is
dirty and it needs to issue flush during ->fsync, hence, we could improve
performance of fsync in scenario of multiple device.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to show ino management cache size correctly
Chao Yu [Fri, 29 Sep 2017 05:59:37 +0000 (13:59 +0800)]
f2fs: fix to show ino management cache size correctly

It needs to stat size of ino management cache with all type instead of
orphan ino type.

Fixes: 652be55162dc ("f2fs: show # of orphan inodes")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: drop FI_UPDATE_WRITE tag after f2fs_issue_flush
Chao Yu [Fri, 29 Sep 2017 05:59:36 +0000 (13:59 +0800)]
f2fs: drop FI_UPDATE_WRITE tag after f2fs_issue_flush

If we failed to issue flush in ->fsync, we need to keep FI_UPDATE_WRITE
flag to make sure triggering flush in next ->fsync.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: obsolete ALLOC_NID_LIST list
Chao Yu [Fri, 29 Sep 2017 05:59:35 +0000 (13:59 +0800)]
f2fs: obsolete ALLOC_NID_LIST list

As Fan Li reported, there is no user traversing nid_list[ALLOC_NID_LIST]
which is used for tracking preallocated nids. Let's drop it, and only
track preallocated nids in free_nid_root radix-tree.

Reported-by: Fan Li <fanofcode.li@samsung.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: convert inline data for direct I/O & FI_NO_PREALLOC
Weichao Guo [Fri, 29 Sep 2017 14:43:23 +0000 (22:43 +0800)]
f2fs: convert inline data for direct I/O & FI_NO_PREALLOC

In FI_NO_PREALLOC cases, direct I/O path may allocate blocks for an
inode but keep its inline data flag. This inconsistency may trigger
vfs clear_inode nrpages bug_on when evicting the inode. We should
convert inline data first in this case.

Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: allow readpages with NULL file pointer
Hsiang Kao [Sat, 23 Sep 2017 18:45:42 +0000 (02:45 +0800)]
f2fs: allow readpages with NULL file pointer

Keep in line with the other Linux file system implementations
since page_cache_sync_readahead supports NULL file pointer,
and thus we can readahead data by f2fs itself without file opening
(something like the btrfs behavior).

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: show flush list status in sysfs
Chao Yu [Thu, 14 Sep 2017 02:18:01 +0000 (10:18 +0800)]
f2fs: show flush list status in sysfs

This patch adds to show flush list status in sysfs.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: introduce read_xattr_block
Chao Yu [Mon, 4 Sep 2017 10:58:03 +0000 (18:58 +0800)]
f2fs: introduce read_xattr_block

Commit ba38c27eb93e ("f2fs: enhance lookup xattr") introduces
lookup_all_xattrs duplicating from read_all_xattrs, which leaves
lots of similar codes in between them, so introduce new help
read_xattr_block to clean up redundant codes.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: introduce read_inline_xattr
Chao Yu [Mon, 4 Sep 2017 10:58:02 +0000 (18:58 +0800)]
f2fs: introduce read_inline_xattr

Commit ba38c27eb93e ("f2fs: enhance lookup xattr") introduces
lookup_all_xattrs duplicating from read_all_xattrs, which leaves
lots of similar codes in between them, so introduce new help
read_inline_xattr to clean up redundant codes.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agoRevert "f2fs: reuse nids more aggressively"
Chao Yu [Mon, 25 Sep 2017 06:17:51 +0000 (14:17 +0800)]
Revert "f2fs: reuse nids more aggressively"

Commit 268344664603 ("f2fs: reuse nids more aggressively") tries to
reuse nids as many as possilbe, in order to mitigate producing obsolete
node pages in page cache.

But acutally, before we reuse the nids and related node page cache,
we will always invalidate that node page, so there will be not any
obsolete node pages in cache.

Let's just revert previous commit, so that nm_i::next_scan_nid can be
increased ascendingly, making __build_free_nids traverses all NAT pages
more easily, finally, free nid bitmap cache can be enabled as soon as
possible.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agoRevert "f2fs: node segment is prior to data segment selected victim"
Yunlong Song [Sat, 23 Sep 2017 09:02:18 +0000 (17:02 +0800)]
Revert "f2fs: node segment is prior to data segment selected victim"

This reverts commit b9cd20619e359d199b755543474c3d853c8e3415.

That patch causes much fewer node segments (which can be used for SSR)
than before, and in the corner case (e.g. create and delete *.txt files in
one same directory, there will be very few node segments but many data
segments), if the reserved free segments are all used up during gc, then
the write_checkpoint can still flush dentry pages to data ssr segments,
but will probably fail to flush node pages to node ssr segments, since
there are not enough node ssr segments left (the left ones are all
full).

So revert this patch to give a fair chance to let node segments remain
for SSR, which provides more robustness for corner cases.

Conflicts:
fs/f2fs/gc.c

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix potential panic during fstrim
Chao Yu [Sun, 1 Oct 2017 18:50:16 +0000 (02:50 +0800)]
f2fs: fix potential panic during fstrim

As Ju Hyung Park reported:

"When 'fstrim' is called for manual trim, a BUG() can be triggered
randomly with this patch.

I'm seeing this issue on both x86 Desktop and arm64 Android phone.

On x86 Desktop, this was caused during Ubuntu boot-up. I have a
cronjob installed which calls 'fstrim -v /' during boot. On arm64
Android, this was caused during GC looping with 1ms gc_min_sleep_time
& gc_max_sleep_time."

Root cause of this issue is that f2fs_wait_discard_bios can only be
used by f2fs_put_super, because during put_super there must be no
other referrers, so it can ignore discard entry's reference count
when removing the entry, otherwise in other caller we will hit bug_on
in __remove_discard_cmd as there may be other issuer added reference
count in discard entry.

Thread A Thread B
- issue_discard_thread
- f2fs_ioc_fitrim
 - f2fs_trim_fs
  - f2fs_wait_discard_bios
   - __issue_discard_cmd
    - __submit_discard_cmd
 - __wait_discard_cmd
  - dc->ref++
  - __wait_one_discard_bio
   - __wait_discard_cmd
    - __remove_discard_cmd
     - f2fs_bug_on(sbi, dc->ref)

Fixes: 969d1b180d987c2be02de890d0fff0f66a0e80de
Reported-by: Ju Hyung Park <qkrwngud825@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: hurry up to issue discard after io interruption
Chao Yu [Tue, 12 Sep 2017 13:35:12 +0000 (21:35 +0800)]
f2fs: hurry up to issue discard after io interruption

Once we encounter I/O interruption during issuing discards, we will delay
long time before next round, but if system status is I/O idle during the
time, it may loses opportunity to issue discards. So this patch changes
to hurry up to issue discard after io interruption.

Besides, this patch also fixes to issue discards accurately with assigned
rate.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to show correct discard_granularity in sysfs
Chao Yu [Tue, 12 Sep 2017 06:25:35 +0000 (14:25 +0800)]
f2fs: fix to show correct discard_granularity in sysfs

Fix below incorrect display when reading discard_granularity sysfs node.

$ cat /sys/fs/f2fs/<device>/discard_granularity
$ 16
$ echo 32 > /sys/fs/f2fs/<device>/discard_granularity
$ cat /sys/fs/f2fs/<device>/discard_granularity
$ 16

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: detect dirty inode in evict_inode
Chao Yu [Tue, 12 Sep 2017 06:04:05 +0000 (14:04 +0800)]
f2fs: detect dirty inode in evict_inode

Add a bugon in f2fs_evict_inode to detect inconsistent status between
inode cache and related node page cache.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: clear radix tree dirty tag of pages whose dirty flag is cleared
Daeho Jeong [Mon, 11 Sep 2017 07:30:28 +0000 (16:30 +0900)]
f2fs: clear radix tree dirty tag of pages whose dirty flag is cleared

On a senario like writing out the first dirty page of the inode
as the inline data, we only cleared dirty flags of the pages, but
didn't clear the dirty tags of those pages in the radix tree.

If we don't clear the dirty tags of the pages in the radix tree, the
inodes which contain the pages will be marked with I_DIRTY_PAGES again
and again, and writepages() for the inodes will be invoked in every
writeback period. As a result, nothing will be done in every
writepages() for the inodes and it will just consume CPU time
meaninglessly.

Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: speed up gc_urgent mode with SSR
Jaegeuk Kim [Sat, 9 Sep 2017 18:11:04 +0000 (11:11 -0700)]
f2fs: speed up gc_urgent mode with SSR

This patch activates SSR in gc_urgent mode.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: better to wait for fstrim completion
Jaegeuk Kim [Sat, 9 Sep 2017 19:03:23 +0000 (12:03 -0700)]
f2fs: better to wait for fstrim completion

In android, we'd better wait for fstrim completion instead of issuing the
discard commands asynchronous.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: avoid race in between read xattr & write xattr
Yunlei He [Thu, 7 Sep 2017 02:40:54 +0000 (10:40 +0800)]
f2fs: avoid race in between read xattr & write xattr

Thread A: Thread B:
-f2fs_getxattr
   -lookup_all_xattrs
      -xnid = F2FS_I(inode)->i_xattr_nid;
-f2fs_setxattr
    -__f2fs_setxattr
        -write_all_xattrs
            -truncate_xattr_node
          ...  ...
-write_checkpoint
  ...  ...
-alloc_nid   <- nid reuse
          -get_node_page
              -f2fs_bug_on  <- nid != node_footer->nid

It's need a rw_sem to avoid the race

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: make get_lock_data_page to handle encrypted inode
Jaegeuk Kim [Thu, 7 Sep 2017 04:04:44 +0000 (21:04 -0700)]
f2fs: make get_lock_data_page to handle encrypted inode

This patch refactors get_lock_data_page() to handle encryption case directly.
In order to do that, it introduces common f2fs_submit_page_read().

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: use generic terms used for encrypted block management
Jaegeuk Kim [Wed, 6 Sep 2017 00:04:35 +0000 (17:04 -0700)]
f2fs: use generic terms used for encrypted block management

This patch renames functions regarding to buffer management via META_MAPPING
used for encrypted blocks especially. We can actually use them in generic way.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>