OSDN Git Service

sagit-ice-cold/kernel_xiaomi_msm8998.git
6 years agofscrypt: define fscrypt_fname_alloc_buffer() to be for presented names
Eric Biggers [Fri, 12 Jan 2018 04:30:08 +0000 (23:30 -0500)]
fscrypt: define fscrypt_fname_alloc_buffer() to be for presented names

Previously fscrypt_fname_alloc_buffer() was used to allocate buffers for
both presented (decrypted or encoded) and encrypted filenames.  That was
confusing, because it had to allocate the worst-case size for either,
e.g. including NUL-padding even when it was meaningless.

But now that fscrypt_setup_filename() no longer calls it, it is only
used in the ->get_link() and ->readdir() paths, which specifically want
a buffer for presented filenames.  Therefore, switch the behavior over
to allocating the buffer for presented filenames only.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: calculate NUL-padding length in one place only
Eric Biggers [Fri, 12 Jan 2018 04:30:08 +0000 (23:30 -0500)]
fscrypt: calculate NUL-padding length in one place only

Currently, when encrypting a filename (either a real filename or a
symlink target) we calculate the amount of NUL-padding twice: once
before encryption and once during encryption in fname_encrypt().  It is
needed before encryption to allocate the needed buffer size as well as
calculate the size the symlink target will take up on-disk before
creating the symlink inode.  Calculating the size during encryption as
well is redundant.

Remove this redundancy by always calculating the exact size beforehand,
and making fname_encrypt() just add as much NUL padding as is needed to
fill the output buffer.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: move fscrypt_symlink_data to fscrypt_private.h
Eric Biggers [Fri, 12 Jan 2018 04:30:08 +0000 (23:30 -0500)]
fscrypt: move fscrypt_symlink_data to fscrypt_private.h

Now that all filesystems have been converted to use the symlink helper
functions, they no longer need the declaration of 'struct
fscrypt_symlink_data'.  Move it from fscrypt.h to fscrypt_private.h.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: remove fscrypt_fname_usr_to_disk()
Eric Biggers [Fri, 12 Jan 2018 04:30:08 +0000 (23:30 -0500)]
fscrypt: remove fscrypt_fname_usr_to_disk()

fscrypt_fname_usr_to_disk() sounded very generic but was actually only
used to encrypt symlinks.  Remove it now that all filesystems have been
switched over to fscrypt_encrypt_symlink().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agof2fs: switch to fscrypt_get_symlink()
Eric Biggers [Fri, 12 Jan 2018 04:26:49 +0000 (23:26 -0500)]
f2fs: switch to fscrypt_get_symlink()

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agof2fs: switch to fscrypt ->symlink() helper functions
Eric Biggers [Fri, 12 Jan 2018 04:26:49 +0000 (23:26 -0500)]
f2fs: switch to fscrypt ->symlink() helper functions

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper function - fscrypt_get_symlink()
Eric Biggers [Fri, 5 Jan 2018 18:45:02 +0000 (10:45 -0800)]
fscrypt: new helper function - fscrypt_get_symlink()

Filesystems also have duplicate code to support ->get_link() on
encrypted symlinks.  Factor it out into a new function
fscrypt_get_symlink().  It takes in the contents of the encrypted
symlink on-disk and provides the target (decrypted or encoded) that
should be returned from ->get_link().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper functions for ->symlink()
Eric Biggers [Fri, 5 Jan 2018 18:45:01 +0000 (10:45 -0800)]
fscrypt: new helper functions for ->symlink()

Currently, filesystems supporting fscrypt need to implement some tricky
logic when creating encrypted symlinks, including handling a peculiar
on-disk format (struct fscrypt_symlink_data) and correctly calculating
the size of the encrypted symlink.  Introduce helper functions to make
things a bit easier:

- fscrypt_prepare_symlink() computes and validates the size the symlink
  target will require on-disk.
- fscrypt_encrypt_symlink() creates the encrypted target if needed.

The new helpers actually fix some subtle bugs.  First, when checking
whether the symlink target was too long, filesystems didn't account for
the fact that the NUL padding is meant to be truncated if it would cause
the maximum length to be exceeded, as is done for filenames in
directories.  Consequently users would receive ENAMETOOLONG when
creating symlinks close to what is supposed to be the maximum length.
For example, with EXT4 with a 4K block size, the maximum symlink target
length in an encrypted directory is supposed to be 4093 bytes (in
comparison to 4095 in an unencrypted directory), but in
FS_POLICY_FLAGS_PAD_32-mode only up to 4064 bytes were accepted.

Second, symlink targets of "." and ".." were not being encrypted, even
though they should be, as these names are special in *directory entries*
but not in symlink targets.  Fortunately, we can fix this simply by
starting to encrypt them, as old kernels already accept them in
encrypted form.

Third, the output string length the filesystems were providing when
doing the actual encryption was incorrect, as it was forgotten to
exclude 'sizeof(struct fscrypt_symlink_data)'.  Fortunately though, this
bug didn't make a difference.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: trim down fscrypt.h includes
Eric Biggers [Fri, 5 Jan 2018 18:45:00 +0000 (10:45 -0800)]
fscrypt: trim down fscrypt.h includes

fscrypt.h included way too many other headers, given that it is included
by filesystems both with and without encryption support.  Trim down the
includes list by moving the needed includes into more appropriate
places, and removing the unneeded ones.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: move fscrypt_is_dot_dotdot() to fs/crypto/fname.c
Eric Biggers [Fri, 5 Jan 2018 18:44:59 +0000 (10:44 -0800)]
fscrypt: move fscrypt_is_dot_dotdot() to fs/crypto/fname.c

Only fs/crypto/fname.c cares about treating the "." and ".." filenames
specially with regards to encryption, so move fscrypt_is_dot_dotdot()
from fscrypt.h to there.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: move fscrypt_valid_enc_modes() to fscrypt_private.h
Eric Biggers [Fri, 5 Jan 2018 18:44:58 +0000 (10:44 -0800)]
fscrypt: move fscrypt_valid_enc_modes() to fscrypt_private.h

The encryption modes are validated by fs/crypto/, not by individual
filesystems.  Therefore, move fscrypt_valid_enc_modes() from fscrypt.h
to fscrypt_private.h.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: move fscrypt_operations declaration to fscrypt_supp.h
Eric Biggers [Fri, 5 Jan 2018 18:44:57 +0000 (10:44 -0800)]
fscrypt: move fscrypt_operations declaration to fscrypt_supp.h

Filesystems now only define their fscrypt_operations when they are
compiled with encryption support, so move the fscrypt_operations
declaration from fscrypt.h to fscrypt_supp.h.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: split fscrypt_dummy_context_enabled() into supp/notsupp versions
Eric Biggers [Fri, 5 Jan 2018 18:44:56 +0000 (10:44 -0800)]
fscrypt: split fscrypt_dummy_context_enabled() into supp/notsupp versions

fscrypt_dummy_context_enabled() accesses ->s_cop, which now is only set
when the filesystem is built with encryption support.  This didn't
actually matter because no filesystems called it.  However, it will
start being used soon, so fix it by moving it from fscrypt.h to
fscrypt_supp.h and stubbing it out in fscrypt_notsupp.h.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: move fscrypt_ctx declaration to fscrypt_supp.h
Eric Biggers [Fri, 5 Jan 2018 18:44:55 +0000 (10:44 -0800)]
fscrypt: move fscrypt_ctx declaration to fscrypt_supp.h

Filesystems only ever access 'struct fscrypt_ctx' through fscrypt
functions.  But when a filesystem is built without encryption support,
these functions are all stubbed out, so the declaration of fscrypt_ctx
is unneeded.  Therefore, move it from fscrypt.h to fscrypt_supp.h.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: move fscrypt_info_cachep declaration to fscrypt_private.h
Eric Biggers [Fri, 5 Jan 2018 18:44:54 +0000 (10:44 -0800)]
fscrypt: move fscrypt_info_cachep declaration to fscrypt_private.h

The fscrypt_info kmem_cache is internal to fscrypt; filesystems don't
need to access it.  So move its declaration into fscrypt_private.h.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: move fscrypt_control_page() to supp/notsupp headers
Eric Biggers [Fri, 5 Jan 2018 18:44:53 +0000 (10:44 -0800)]
fscrypt: move fscrypt_control_page() to supp/notsupp headers

fscrypt_control_page() is already split into two versions depending on
whether the filesystem is being built with encryption support or not.
Move them into the appropriate headers.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: move fscrypt_has_encryption_key() to supp/notsupp headers
Eric Biggers [Fri, 5 Jan 2018 18:44:52 +0000 (10:44 -0800)]
fscrypt: move fscrypt_has_encryption_key() to supp/notsupp headers

fscrypt_has_encryption_key() is already split into two versions
depending on whether the filesystem is being built with encryption
support or not.  Move them into the appropriate headers.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agof2fs: don't put dentry page in pagecache into highmem
Yunlong Song [Wed, 28 Feb 2018 12:31:52 +0000 (20:31 +0800)]
f2fs: don't put dentry page in pagecache into highmem

Previous dentry page uses highmem, which will cause panic in platforms
using highmem (such as arm), since the address space of dentry pages
from highmem directly goes into the decryption path via the function
fscrypt_fname_disk_to_usr. But sg_init_one assumes the address is not
from highmem, and then cause panic since it doesn't call kmap_high but
kunmap_high is triggered at the end. To fix this problem in a simple
way, this patch avoids to put dentry page in pagecache into highmem.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: support inode creation time
Chao Yu [Thu, 25 Jan 2018 06:54:42 +0000 (14:54 +0800)]
f2fs: support inode creation time

This patch adds creation time field in inode layout to support showing
kstat.btime in ->statx.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: rebuild sit page from sit info in mem
Yunlei He [Thu, 25 Jan 2018 09:27:11 +0000 (17:27 +0800)]
f2fs: rebuild sit page from sit info in mem

This patch rebuild sit page from sit info in mem instead
of issue a read io.

I test this method and the result is as below:

Pre:
 mmc_perf_test-12061 [001] ...1   976.819992: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [001] ...1   976.856446: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [003] ...1   998.976946: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [003] ...1   999.023269: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [003] ...1  1022.060772: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [003] ...1  1022.111034: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [002] ...1  1070.127643: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [003] ...1  1070.187352: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [003] ...1  1095.942124: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [003] ...1  1095.995975: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [003] ...1  1122.535091: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [003] ...1  1122.586521: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [001] ...1  1147.897487: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [001] ...1  1147.959438: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [003] ...1  1177.926951: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [002] ...1  1177.976823: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [002] ...1  1204.176087: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [002] ...1  1204.239046: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit

Some sit flush consume more than 50ms.

Now:
mmc_perf_test-2187  [007] ...1   196.840684: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [007] ...1   196.841258: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [007] ...1   219.430582: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [007] ...1   219.431144: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [002] ...1   243.638678: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [000] ...1   243.638980: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [002] ...1   265.392180: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [002] ...1   265.392245: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [000] ...1   290.309051: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [000] ...1   290.309116: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [003] ...1   317.144209: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [003] ...1   317.145913: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [005] ...1   343.224954: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [005] ...1   343.225574: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [000] ...1   370.239846: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [000] ...1   370.241138: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [001] ...1   397.029043: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [001] ...1   397.030750: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [003] ...1   425.386377: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [003] ...1   425.387735: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit

Most sit flush consume no more than 1ms.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: stop issuing discard if fs is readonly
Chao Yu [Thu, 25 Jan 2018 10:57:27 +0000 (18:57 +0800)]
f2fs: stop issuing discard if fs is readonly

If filesystem is readonly, stop to issue discard in daemon.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: clean up duplicated assignment in init_discard_policy
Chao Yu [Thu, 25 Jan 2018 10:57:26 +0000 (18:57 +0800)]
f2fs: clean up duplicated assignment in init_discard_policy

Remove duplicated codes of assignment for .max_requests and .io_aware_gran.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: use GFP_F2FS_ZERO for cleanup
Chao Yu [Thu, 25 Jan 2018 10:57:25 +0000 (18:57 +0800)]
f2fs: use GFP_F2FS_ZERO for cleanup

Clean up codes with GFP_F2FS_ZERO, no logic changes.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: allow to recover node blocks given updated checkpoint
Jaegeuk Kim [Fri, 19 Jan 2018 21:42:33 +0000 (13:42 -0800)]
f2fs: allow to recover node blocks given updated checkpoint

If fsck.f2fs changes crc, we have no way to recover some inode blocks by roll-
forward recovery. Let's relax the condition to recover them.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: recover some i_inline flags
Jaegeuk Kim [Sat, 20 Jan 2018 04:01:40 +0000 (20:01 -0800)]
f2fs: recover some i_inline flags

This fixes lost i_inline flags during roll-forward.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: correct removexattr behavior for null valued extended attribute
Daeho Jeong [Sat, 20 Jan 2018 07:46:33 +0000 (15:46 +0800)]
f2fs: correct removexattr behavior for null valued extended attribute

__vfs_removexattr() transfers "NULL" value to the setxattr handler of
the f2fs filesystem in order to remove the extended attribute. But,
__f2fs_setxattr() just ignores the removal request when the value of
the extended attribute is already NULL. We have to remove the extended
attribute itself even if the value of that is already NULL.

We can reporduce this bug with the below:

1. touch file
2. setfattr -n "user.foo" file
3. setfattr -x "user.foo" file
4. getfattr -d file
> user.foo

Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com>
Tested-by: Hobin Woo <hobin.woo@samsung.com>
Tested-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: drop page cache after fs shutdown
Chao Yu [Thu, 18 Jan 2018 09:29:10 +0000 (17:29 +0800)]
f2fs: drop page cache after fs shutdown

Don't remain dirtied page cache in f2fs after shutdown, it can mitigate
memory pressure of whole system, in order to keep other modules working
properly.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: stop gc/discard thread after fs shutdown
Chao Yu [Thu, 18 Jan 2018 09:23:29 +0000 (17:23 +0800)]
f2fs: stop gc/discard thread after fs shutdown

Once filesystem shuts down, daemons like gc/discard thread should be
aware of it, and do exit, in addtion, drop all cached pending discard
commands and turn off real-time discard mode.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: hanlde error case in f2fs_ioc_shutdown
Chao Yu [Wed, 17 Jan 2018 14:28:52 +0000 (22:28 +0800)]
f2fs: hanlde error case in f2fs_ioc_shutdown

This patch makes f2fs_ioc_shutdown handling error case correctly.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: split need_inplace_update
Chao Yu [Wed, 17 Jan 2018 08:31:38 +0000 (16:31 +0800)]
f2fs: split need_inplace_update

This patch splits need_inplace_update to two functions:
a. should_update_inplace() includes all conditions that we must use IPU.
b. should_update_outplace() includes all conditions that we must use OPU.

So that, in f2fs_ioc_set_pin_file() and f2fs_defragment_range(), we can
use corresponding function to check whether we can trigger OPU/IPU or not.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to update last_disk_size correctly
Chao Yu [Wed, 17 Jan 2018 08:31:37 +0000 (16:31 +0800)]
f2fs: fix to update last_disk_size correctly

This patch fixes to update last_disk_size only when writing out page
successfully.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: kill F2FS_INLINE_XATTR_ADDRS for cleanup
Chao Yu [Wed, 17 Jan 2018 08:31:36 +0000 (16:31 +0800)]
f2fs: kill F2FS_INLINE_XATTR_ADDRS for cleanup

Use get_inline_xattr_addrs directly instead of F2FS_INLINE_XATTR_ADDRS.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: clean up error path of fill_super
Chao Yu [Wed, 17 Jan 2018 08:31:35 +0000 (16:31 +0800)]
f2fs: clean up error path of fill_super

This patch cleans up error path of fille_super to avoid unneeded
release step.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: avoid hungtask when GC encrypted block if io_bits is set
Sheng Yong [Wed, 17 Jan 2018 04:11:31 +0000 (12:11 +0800)]
f2fs: avoid hungtask when GC encrypted block if io_bits is set

When io_bits is set, GCing encrypted block may hit the following hungtask.
Since io_bits requires aligned block address, f2fs_submit_page_write may
return -EAGAIN if new_blkaddr does not satisify io_bits alignment. As a
result, the encrypted page will never be writtenback.

This patch makes move_data_block aware the EAGAIN error and cancel the
writeback.

[  246.751371] INFO: task kworker/u4:4:797 blocked for more than 90 seconds.
[  246.752423]       Not tainted 4.15.0-rc4+ #11
[  246.754176] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.755336] kworker/u4:4    D25448   797      2 0x80000000
[  246.755597] Workqueue: writeback wb_workfn (flush-7:0)
[  246.755616] Call Trace:
[  246.755695]  ? __schedule+0x322/0xa90
[  246.755761]  ? blk_init_request_from_bio+0x120/0x120
[  246.755773]  ? pci_mmcfg_check_reserved+0xb0/0xb0
[  246.755801]  ? __radix_tree_create+0x19e/0x200
[  246.755813]  ? delete_node+0x136/0x370
[  246.755838]  schedule+0x43/0xc0
[  246.755904]  io_schedule+0x17/0x40
[  246.755939]  wait_on_page_bit_common+0x17b/0x240
[  246.755950]  ? wake_page_function+0xa0/0xa0
[  246.755961]  ? add_to_page_cache_lru+0x160/0x160
[  246.755972]  ? page_cache_tree_insert+0x170/0x170
[  246.755983]  ? __lru_cache_add+0x96/0xb0
[  246.756086]  __filemap_fdatawait_range+0x14f/0x1c0
[  246.756097]  ? wait_on_page_bit_common+0x240/0x240
[  246.756120]  ? __wake_up_locked_key_bookmark+0x20/0x20
[  246.756167]  ? wait_on_all_pages_writeback+0xc9/0x100
[  246.756179]  ? __remove_ino_entry+0x120/0x120
[  246.756192]  ? wait_woken+0x100/0x100
[  246.756204]  filemap_fdatawait_range+0x9/0x20
[  246.756216]  write_checkpoint+0x18a1/0x1f00
[  246.756254]  ? blk_get_request+0x10/0x10
[  246.756265]  ? cpumask_next_and+0x43/0x60
[  246.756279]  ? f2fs_sync_inode_meta+0x160/0x160
[  246.756289]  ? remove_element.isra.4+0xa0/0xa0
[  246.756300]  ? __put_compound_page+0x40/0x40
[  246.756310]  ? f2fs_sync_fs+0xec/0x1c0
[  246.756320]  ? f2fs_sync_fs+0x120/0x1c0
[  246.756329]  f2fs_sync_fs+0x120/0x1c0
[  246.756357]  ? trace_event_raw_event_f2fs__page+0x260/0x260
[  246.756393]  ? ata_build_rw_tf+0x173/0x410
[  246.756397]  f2fs_balance_fs_bg+0x198/0x390
[  246.756405]  ? drop_inmem_page+0x230/0x230
[  246.756415]  ? ahci_qc_prep+0x1bb/0x2e0
[  246.756418]  ? ahci_qc_issue+0x1df/0x290
[  246.756422]  ? __accumulate_pelt_segments+0x42/0xd0
[  246.756426]  ? f2fs_write_node_pages+0xd1/0x380
[  246.756429]  f2fs_write_node_pages+0xd1/0x380
[  246.756437]  ? sync_node_pages+0x8f0/0x8f0
[  246.756440]  ? update_curr+0x53/0x220
[  246.756444]  ? __accumulate_pelt_segments+0xa2/0xd0
[  246.756448]  ? __update_load_avg_se.isra.39+0x349/0x360
[  246.756452]  ? do_writepages+0x2a/0xa0
[  246.756456]  do_writepages+0x2a/0xa0
[  246.756460]  __writeback_single_inode+0x70/0x490
[  246.756463]  ? check_preempt_wakeup+0x199/0x310
[  246.756467]  writeback_sb_inodes+0x2a2/0x660
[  246.756471]  ? is_empty_dir_inode+0x40/0x40
[  246.756474]  ? __writeback_single_inode+0x490/0x490
[  246.756477]  ? string+0xbf/0xf0
[  246.756480]  ? down_read_trylock+0x35/0x60
[  246.756484]  __writeback_inodes_wb+0x9f/0xf0
[  246.756488]  wb_writeback+0x41d/0x4b0
[  246.756492]  ? writeback_inodes_wb.constprop.55+0x150/0x150
[  246.756498]  ? set_worker_desc+0xf7/0x130
[  246.756502]  ? current_is_workqueue_rescuer+0x60/0x60
[  246.756511]  ? _find_next_bit+0x2c/0xa0
[  246.756514]  ? wb_workfn+0x400/0x5d0
[  246.756518]  wb_workfn+0x400/0x5d0
[  246.756521]  ? finish_task_switch+0xdf/0x2a0
[  246.756525]  ? inode_wait_for_writeback+0x30/0x30
[  246.756529]  process_one_work+0x3a7/0x6f0
[  246.756533]  worker_thread+0x82/0x750
[  246.756537]  kthread+0x16f/0x1c0
[  246.756541]  ? trace_event_raw_event_workqueue_work+0x110/0x110
[  246.756544]  ? kthread_create_worker_on_cpu+0xb0/0xb0
[  246.756548]  ret_from_fork+0x1f/0x30

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: allow quota to use reserved blocks
Jaegeuk Kim [Sat, 6 Jan 2018 00:02:36 +0000 (16:02 -0800)]
f2fs: allow quota to use reserved blocks

This patch allows quota to use reserved blocks all the time.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to drop all inmem pages correctly
Chao Yu [Mon, 15 Jan 2018 09:16:46 +0000 (17:16 +0800)]
f2fs: fix to drop all inmem pages correctly

In commit 57864ae5ce3a ("f2fs: limit # of inmemory pages"), we have
limited memory footprint of all inmem pages with 20% of total memory,
otherwise, if we exceed the threshold, we will try to drop all inmem
pages to avoid excessive memory pressure resulting in performance
regression.

But in some unrelated error paths, we will also drop all inmem pages,
which should be wrong, fix it in this patch.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: speed up defragment on sparse file
Chao Yu [Wed, 10 Jan 2018 10:18:52 +0000 (18:18 +0800)]
f2fs: speed up defragment on sparse file

We have supported to get next page offset with valid mapping crossing
hole in f2fs_map_blocks, utilizing it to speed up defragment on sparse
file.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: support F2FS_IOC_PRECACHE_EXTENTS
Chao Yu [Thu, 11 Jan 2018 06:42:30 +0000 (14:42 +0800)]
f2fs: support F2FS_IOC_PRECACHE_EXTENTS

This patch introduces a new ioctl F2FS_IOC_PRECACHE_EXTENTS to precache
extent info like ext4, in order to gain better performance during
triggering AIO by eliminating synchronous waiting of mapping info.

Referred commit: 7869a4a6c5ca ("ext4: add support for extent pre-caching")

In addition, with newly added extent precache abilitiy, this patch add
to support FIEMAP_FLAG_CACHE in ->fiemap.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: add an ioctl to disable GC for specific file
Jaegeuk Kim [Fri, 8 Dec 2017 00:25:39 +0000 (16:25 -0800)]
f2fs: add an ioctl to disable GC for specific file

This patch gives a flag to disable GC on given file, which would be useful, when
user wants to keep its block map. It also conducts in-place-update for dontmove
file.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: prevent newly created inode from being dirtied incorrectly
Daeho Jeong [Thu, 11 Jan 2018 02:26:19 +0000 (11:26 +0900)]
f2fs: prevent newly created inode from being dirtied incorrectly

Now, we invoke f2fs_mark_inode_dirty_sync() to make an inode dirty in
advance of creating a new node page for the inode. By this, some inodes
whose node page is not created yet can be linked into the global dirty
list.

If the checkpoint is executed at this moment, the inode will be written
back by writeback_single_inode() and finally update_inode_page() will
fail to detach the inode from the global dirty list because the inode
doesn't have a node page.

The problem is that the inode's state in VFS layer will become clean
after execution of writeback_single_inode() and it's still linked in
the global dirty list of f2fs and this will cause a kernel panic.

So, we will prevent the newly created inode from being dirtied during
the FI_NEW_INODE flag of the inode is set. We will make it dirty
right after the flag is cleared.

Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com>
Tested-by: Hobin Woo <hobin.woo@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: support FIEMAP_FLAG_XATTR
Chao Yu [Thu, 11 Jan 2018 06:39:57 +0000 (14:39 +0800)]
f2fs: support FIEMAP_FLAG_XATTR

This patch enables ->fiemap to handle FIEMAP_FLAG_XATTR flag for xattr
mapping info lookup purpose.

It makes f2fs passing generic/425 test in fstest.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to cover f2fs_inline_data_fiemap with inode_lock
Chao Yu [Thu, 11 Jan 2018 06:37:35 +0000 (14:37 +0800)]
f2fs: fix to cover f2fs_inline_data_fiemap with inode_lock

This patch fix to cover f2fs_inline_data_fiemap with inode_lock in order
to make that interface avoiding race with mapping change.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: check node page again in write end io
Yunlei He [Thu, 11 Jan 2018 06:19:32 +0000 (14:19 +0800)]
f2fs: check node page again in write end io

Check node page again in write end io in case of
data corruption during inflght IO.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to caclulate required free section correctly
Chao Yu [Wed, 10 Jan 2018 10:18:51 +0000 (18:18 +0800)]
f2fs: fix to caclulate required free section correctly

When calculating required free section during file defragmenting, we
should skip holes in file, otherwise we will probably fail to defrag
sparse file with large size.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: handle newly created page when revoking inmem pages
Daeho Jeong [Wed, 10 Jan 2018 07:49:10 +0000 (16:49 +0900)]
f2fs: handle newly created page when revoking inmem pages

When committing inmem pages is successful, we revoke already committed
blocks in __revoke_inmem_pages() and finally replace the committed
ones with the old blocks using f2fs_replace_block(). However, if
the committed block was newly created one, the address of the old
block is NEW_ADDR and __f2fs_replace_block() cannot handle NEW_ADDR
as new_blkaddr properly and a kernel panic occurrs.

Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Tested-by: Shu Tan <shu.tan@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: add resgid and resuid to reserve root blocks
Jaegeuk Kim [Fri, 5 Jan 2018 05:36:09 +0000 (21:36 -0800)]
f2fs: add resgid and resuid to reserve root blocks

This patch adds mount options to reserve some blocks via resgid=%u,resuid=%u.
It only activates with reserve_root=%u.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: implement cgroup writeback support
Yufen Yu [Tue, 9 Jan 2018 11:33:39 +0000 (19:33 +0800)]
f2fs: implement cgroup writeback support

Cgroup writeback requires explicit support from the filesystem.
f2fs's data and node writeback IOs go through __write_data_page,
which sets fio for submiting IOs. So, we add io_wbc for fio,
associate bios with blkcg by invoking wbc_init_bio() and
account IOs issuing by wbc_account_io().
In addtion, f2fs_fill_super() is updated to set SB_I_CGROUPWB.

Meta writeback IOs is left alone by this patch and will always be
attributed to the root cgroup.

The results show that f2fs can throttle writeback nicely for
data writing and file creating.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove unused pend_list_tag
Chao Yu [Mon, 8 Jan 2018 10:48:34 +0000 (18:48 +0800)]
f2fs: remove unused pend_list_tag

In commit 78997b569f56 ("f2fs: split discard policy"), we have get rid
of using pend_list_tag field in struct discard_cmd_control, but forgot
to remove it, now do it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: avoid high cpu usage in discard thread
Chao Yu [Mon, 8 Jan 2018 10:48:33 +0000 (18:48 +0800)]
f2fs: avoid high cpu usage in discard thread

We take very long time to finish generic/476, this is because we will
check consistence of all discard entries in global rb tree while
traversing all different granularity pending lists, even when the list
is empty, in order to avoid that unneeded overhead, we have to skip
the check when coming up an empty list.

generic/476 time consumption:
cost
Before patch & w/o consistence check 57s
Before patch & w/ consistence check 1426s
After patch 78s

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: make local functions static
Wei Yongjun [Fri, 5 Jan 2018 09:41:20 +0000 (09:41 +0000)]
f2fs: make local functions static

Fixes the following sparse warnings:

fs/f2fs/segment.c:887:6: warning:
 symbol '__check_sit_bitmap' was not declared. Should it be static?
fs/f2fs/segment.c:1327:6: warning:
 symbol 'f2fs_wait_discard_bio' was not declared. Should it be static?
fs/f2fs/super.c:1661:5: warning:
 symbol 'f2fs_get_projid' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: add reserved blocks for root user
Jaegeuk Kim [Wed, 27 Dec 2017 23:05:52 +0000 (15:05 -0800)]
f2fs: add reserved blocks for root user

This patch allows root to reserve some blocks via mount option.

"-o reserve_root=N" means N x 4KB-sized blocks for root only.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: check segment type in __f2fs_replace_block
Yunlong Song [Thu, 4 Jan 2018 07:02:02 +0000 (15:02 +0800)]
f2fs: check segment type in __f2fs_replace_block

In some case, the node blocks has wrong blkaddr whose segment type is
NODE, e.g., recover inode has missing xattr flag and the blkaddr is in
the xattr range. Since fsck.f2fs does not check the recovery nodes, this
will cause __f2fs_replace_block change the curseg of node and do the
update_sit_entry(sbi, new_blkaddr, 1) with no next_blkoff refresh, as a
result, when recovery process write checkpoint and sync nodes, the
next_blkoff of curseg is used in the segment bit map, then it will
cause f2fs_bug_on. So let's check segment type in __f2fs_replace_block.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: update inode info to inode page for new file
Yunlei He [Wed, 3 Jan 2018 10:03:04 +0000 (18:03 +0800)]
f2fs: update inode info to inode page for new file

After checkpoint,
 1. creat a new file A ,(with dirty inode && dirty inode page && xattr info)
 2. backgroud wb write back file A inode page (without update from inode cache)
 3. fsync file A, write back inode page of file A with inode cache info
 4. sudden power off before new checkpoint

In this case, recovery process will try to recover a zero inode
page. Inline xattr flag of file A will be miss and xattr info
will be taken as blkaddr index.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: show precise # of blocks that user/root can use
Jaegeuk Kim [Wed, 3 Jan 2018 18:55:07 +0000 (10:55 -0800)]
f2fs: show precise # of blocks that user/root can use

Let's show precise # of blocks that user/root can use through bavail and bfree
respectively.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: clean up unneeded declaration
Chao Yu [Wed, 3 Jan 2018 09:32:51 +0000 (17:32 +0800)]
f2fs: clean up unneeded declaration

Commit 6afc662e68b5 ("f2fs: support flexible inline xattr size")
declared f2fs_sb_has_flexible_inline_xattr in f2fs.h for latter being
used in get_inline_xattr_addrs, but in latter version, related code
has been changed, leave f2fs_sb_has_flexible_inline_xattr w/o any
users. Let's remove it for cleanup.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: continue to do direct IO if we only preallocate partial blocks
Chao Yu [Wed, 3 Jan 2018 09:30:19 +0000 (17:30 +0800)]
f2fs: continue to do direct IO if we only preallocate partial blocks

While doing direct IO, if we run out-of-space when we preallocate blocks,
we should not return ENOSPC error directly, instead, we should continue
to do following direct IO, which will keep directIO of f2fs acting like
other filesystems.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: enable quota at remount from r to w
Jaegeuk Kim [Tue, 2 Jan 2018 19:03:19 +0000 (11:03 -0800)]
f2fs: enable quota at remount from r to w

We have to enable quota only when remounting from read to write. Otherwise,
we'll get remount failure. (e.g., write to write case)

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: skip stop_checkpoint for user data writes
Jaegeuk Kim [Mon, 1 Jan 2018 00:26:38 +0000 (16:26 -0800)]
f2fs: skip stop_checkpoint for user data writes

We can give another chance to write user data, which can resolve
generic/441.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix missing error number for xattr operation
Jaegeuk Kim [Fri, 29 Dec 2017 01:47:19 +0000 (17:47 -0800)]
f2fs: fix missing error number for xattr operation

This fixes generic/449 hang problem caused by no ENOSPC forever which should be
returned by setxattr under disk full scenario.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: recover directory operations by fsync
Jaegeuk Kim [Thu, 28 Dec 2017 16:09:44 +0000 (08:09 -0800)]
f2fs: recover directory operations by fsync

This fixes generic/342 which doesn't recover renamed file which was fsynced
before. It will be done via another fsync on newly created file.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: return error during fill_super
Jaegeuk Kim [Wed, 20 Dec 2017 03:16:34 +0000 (19:16 -0800)]
f2fs: return error during fill_super

Let's avoid BUG_ON during fill_super, when on-disk was totall corrupted.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix an error case of missing update inode page
Yunlei He [Tue, 5 Dec 2017 04:07:47 +0000 (12:07 +0800)]
f2fs: fix an error case of missing update inode page

-Thread A                             Thread B

-write_checkpoint
 -block_operations
  -f2fs_unlock_all                    -f2fs_sync_file
                                       -f2fs_write_inode
                                        -f2fs_inode_synced
    -f2fs_sync_inode_meta
     -sync_node_pages
                                        -set_page_drity

In this case, if sudden power off without next new checkpoint,
the last inode page update will lost. wb_writeback is same with
fsync.

Yunlei also reproduced the bug by:

@@ -366,7 +366,7 @@ int update_inode(struct inode *inode, struct page *node_page)
        struct extent_tree *et = F2FS_I(inode)->extent_tree;

        f2fs_inode_synced(inode);
-
+       msleep(10000);
        f2fs_wait_on_page_writeback(node_page, NODE, true);

shell 1:                                       shell2:

dd if=/dev/zero of=./test bs=1M count=10
sync
echo "hello" >> ./test
fsync test  // sleep 10s
                                               sync //return quickly
echo c > /proc/sysrq-trigger

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix potential hangtask in f2fs_trace_pid
Chao Yu [Tue, 12 Dec 2017 06:11:40 +0000 (14:11 +0800)]
f2fs: fix potential hangtask in f2fs_trace_pid

As Jia-Ju Bai reported:

"According to fs/f2fs/trace.c, the kernel module may sleep under a spinlock.
The function call path is:
f2fs_trace_pid (acquire the spinlock)
   f2fs_radix_tree_insert
     cond_resched --> may sleep

I do not find a good way to fix it, so I only report.
This possible bug is found by my static analysis tool (DSAC) and my code
review."

Obviously, it's problemetic to schedule in critical region of spinlock,
which will cause uninterruptable sleep if there is no waker.

This patch changes to use mutex lock intead of spinlock to avoid this
condition.

Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: no need return value in restore summary process
Yunlei He [Wed, 6 Dec 2017 03:31:29 +0000 (11:31 +0800)]
f2fs: no need return value in restore summary process

No need return value in restore summary process

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: use unlikely for release case
LiFan [Tue, 5 Dec 2017 08:38:01 +0000 (16:38 +0800)]
f2fs: use unlikely for release case

Since the variable release is only nonzero when another unlikely
case occurs, use unlikely() on it seems logical.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: don't return value in truncate_data_blocks_range
Chao Yu [Thu, 30 Nov 2017 11:28:23 +0000 (19:28 +0800)]
f2fs: don't return value in truncate_data_blocks_range

There is no caller cares about return value of truncate_data_blocks_range,
remove it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: clean up f2fs_map_blocks
Chao Yu [Thu, 30 Nov 2017 11:28:22 +0000 (19:28 +0800)]
f2fs: clean up f2fs_map_blocks

f2fs_map_blocks():

if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR) {
if (create) {
...
} else {
...
if (flag == F2FS_GET_BLOCK_FIEMAP &&
blkaddr == NULL_ADDR) {
...
}
if (flag != F2FS_GET_BLOCK_FIEMAP ||
blkaddr != NEW_ADDR)
goto sync_out;
}

It means we can break the loop in cases of:
a) flag != F2FS_GET_BLOCK_FIEMAP or
b) flag == F2FS_GET_BLOCK_FIEMAP && blkaddr == NULL_ADDR

Condition b) is the same as previous one, so merge operations of them
for readability.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: clean up hash codes
Chao Yu [Thu, 30 Nov 2017 11:28:21 +0000 (19:28 +0800)]
f2fs: clean up hash codes

f2fs_chksum and f2fs_crc32 use the same 'crc32' crypto engine, also
their implementation are almost the same, except with different
shash description context.

Introduce __f2fs_crc32 to wrap the common codes, and reuse it in
f2fs_chksum and f2fs_crc32.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix error handling in fill_super
Chao Yu [Thu, 30 Nov 2017 11:28:20 +0000 (19:28 +0800)]
f2fs: fix error handling in fill_super

In fill_super, if we fail to call f2fs_build_stats(), it needs to detach
from global f2fs shrink list, otherwise once system starts to shrink slab
cache, we will encounter below panic:

BUG: unable to handle kernel paging request at 00007d35
Oops: 0002 [#1] PREEMPT SMP
EIP: __lock_acquire+0x70/0x12c0
Call Trace:
 lock_acquire+0xae/0x220
 mutex_trylock+0xc5/0xf0
 f2fs_shrink_count+0x32/0xb0 [f2fs]
 shrink_slab+0xf1/0x5b0
 drop_slab_node+0x35/0x60
 drop_slab+0xf/0x20
 drop_caches_sysctl_handler+0x79/0xc0
 proc_sys_call_handler+0xa4/0xc0
 proc_sys_write+0x1f/0x30
 __vfs_write+0x24/0x150
 SyS_write+0x44/0x90
 do_fast_syscall_32+0xa1/0x1ca
 entry_SYSENTER_32+0x4c/0x7b

In addition, this patch relocates f2fs_join_shrinker in fill_super to
avoid unneeded error handling of it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: spread f2fs_k{m,z}alloc
Chao Yu [Thu, 30 Nov 2017 11:28:19 +0000 (19:28 +0800)]
f2fs: spread f2fs_k{m,z}alloc

Use f2fs_k{m,z}alloc as much as possible to increase fault injection
points.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: inject fault to kvmalloc
Chao Yu [Thu, 30 Nov 2017 11:28:18 +0000 (19:28 +0800)]
f2fs: inject fault to kvmalloc

This patch supports to inject fault into kvmalloc/kvzalloc.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: inject fault to kzalloc
Chao Yu [Thu, 30 Nov 2017 11:28:17 +0000 (19:28 +0800)]
f2fs: inject fault to kzalloc

This patch introduces f2fs_kzalloc based on f2fs_kmalloc in order to
support error injection for kzalloc().

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove a redundant conditional expression
LiFan [Tue, 28 Nov 2017 12:17:41 +0000 (20:17 +0800)]
f2fs: remove a redundant conditional expression

Avoid checking is_inode repeatedly, and make the logic
a little bit clearer.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: apply write hints to select the type of segment for direct write
Hyunchul Lee [Tue, 28 Nov 2017 00:23:00 +0000 (09:23 +0900)]
f2fs: apply write hints to select the type of segment for direct write

When blocks are allocated for direct write, select the type of
segment using the kiocb hint. But if an inode has FI_NO_ALLOC,
use the inode hint.

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: switch to fscrypt_prepare_setattr()
Eric Biggers [Wed, 29 Nov 2017 20:35:32 +0000 (12:35 -0800)]
f2fs: switch to fscrypt_prepare_setattr()

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: switch to fscrypt_prepare_lookup()
Eric Biggers [Wed, 29 Nov 2017 20:35:31 +0000 (12:35 -0800)]
f2fs: switch to fscrypt_prepare_lookup()

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: switch to fscrypt_prepare_rename()
Eric Biggers [Wed, 29 Nov 2017 20:35:30 +0000 (12:35 -0800)]
f2fs: switch to fscrypt_prepare_rename()

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: switch to fscrypt_prepare_link()
Eric Biggers [Wed, 29 Nov 2017 20:35:29 +0000 (12:35 -0800)]
f2fs: switch to fscrypt_prepare_link()

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: switch to fscrypt_file_open()
Eric Biggers [Wed, 29 Nov 2017 20:35:28 +0000 (12:35 -0800)]
f2fs: switch to fscrypt_file_open()

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove repeated f2fs_bug_on
Zhikang Zhang [Sat, 25 Nov 2017 18:34:28 +0000 (02:34 +0800)]
f2fs: remove repeated f2fs_bug_on

f2fs: remove repeated f2fs_bug_on which has already existed
      in function invalidate_blocks.

Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove an excess variable
LiFan [Sat, 25 Nov 2017 03:46:18 +0000 (11:46 +0800)]
f2fs: remove an excess variable

Remove the variable page_idx which no one would miss.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix lock dependency in between dio_rwsem & i_mmap_sem
Chao Yu [Thu, 23 Nov 2017 15:26:52 +0000 (23:26 +0800)]
f2fs: fix lock dependency in between dio_rwsem & i_mmap_sem

test/generic/208 reports a potential deadlock as below:

Chain exists of:
  &mm->mmap_sem --> &fi->i_mmap_sem --> &fi->dio_rwsem[WRITE]

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&fi->dio_rwsem[WRITE]);
                               lock(&fi->i_mmap_sem);
                               lock(&fi->dio_rwsem[WRITE]);
  lock(&mm->mmap_sem);

This patch changes the lock dependency as below in fallocate() to
fix this issue:
- dio_rwsem
 - i_mmap_sem

Fixes: bb06664a534b ("f2fs: avoid race in between GC and block exchange")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove unused parameter
Sheng Yong [Wed, 22 Nov 2017 10:23:40 +0000 (18:23 +0800)]
f2fs: remove unused parameter

Commit d260081ccf37 ("f2fs: change recovery policy of xattr node block")
removes the use of blkaddr, which is no longer used. So remove the
parameter.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: still write data if preallocate only partial blocks
Sheng Yong [Wed, 22 Nov 2017 10:23:39 +0000 (18:23 +0800)]
f2fs: still write data if preallocate only partial blocks

If there is not enough space left, f2fs_preallocate_blocks may only
preallocte partial blocks. As a result, the write operation fails
but i_blocks is not 0.  To avoid this, f2fs should write data in
non-preallocation way and write as many data as the size of i_blocks.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: introduce sysfs readdir_ra to readahead inode block in readdir
Sheng Yong [Wed, 22 Nov 2017 10:23:38 +0000 (18:23 +0800)]
f2fs: introduce sysfs readdir_ra to readahead inode block in readdir

This patch introduces a sysfs interface readdir_ra to enable/disable
readaheading inode block in f2fs_readdir. When readdir_ra is enabled,
it improves the performance of "readdir + stat".

For 300,000 files:
time find /data/test > /dev/null
disable readdir_ra: 1m25.69s real  0m01.94s user  0m50.80s system
enable  readdir_ra: 0m18.55s real  0m00.44s user  0m15.39s system

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix concurrent problem for updating free bitmap
LiFan [Wed, 22 Nov 2017 08:07:23 +0000 (16:07 +0800)]
f2fs: fix concurrent problem for updating free bitmap

alloc_nid_failed and scan_nat_page can be called at the same time,
and we haven't protected add_free_nid and update_free_nid_bitmap
with the same nid_list_lock. That could lead to

Thread A Thread B
- __build_free_nids
 - scan_nat_page
  - add_free_nid
- alloc_nid_failed
 - update_free_nid_bitmap
  - update_free_nid_bitmap

scan_nat_page will clear the free bitmap since the nid is PREALLOC_NID,
but alloc_nid_failed needs to set the free bitmap. This results in
free nid with free bitmap cleared.
This patch update the bitmap under the same nid_list_lock in add_free_nid.
And use __GFP_NOFAIL to make sure to update status of free nid correctly.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove unneeded memory footprint accounting
Chao Yu [Tue, 21 Nov 2017 09:49:54 +0000 (17:49 +0800)]
f2fs: remove unneeded memory footprint accounting

We forgot to remov memory footprint accounting of per-cpu type
variables, fix it.

Fixes: 35782b233f37 ("f2fs: remove percpu_count due to performance regression")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: no need to read nat block if nat_block_bitmap is set
Yunlei He [Fri, 17 Nov 2017 08:13:38 +0000 (16:13 +0800)]
f2fs: no need to read nat block if nat_block_bitmap is set

No need to read nat block if nat_block_bitmap is set.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: reserve nid resource for quota sysfile
Chao Yu [Thu, 16 Nov 2017 08:59:14 +0000 (16:59 +0800)]
f2fs: reserve nid resource for quota sysfile

During mkfs, quota sysfiles have already occupied nid resource,
it needs to adjust remaining available nid count in kernel side.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agofscrypt: resolve some cherry-pick bugs
Jaegeuk Kim [Wed, 10 Jan 2018 00:52:25 +0000 (16:52 -0800)]
fscrypt: resolve some cherry-pick bugs

- remove wrong linux/fscrypt.h declared in ext4
- remove obsolete function

Fixes: 734f0d241d2b ("fscrypt: clean up include file mess")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agofscrypt: move to generic async completion
Gilad Ben-Yossef [Wed, 18 Oct 2017 07:00:44 +0000 (08:00 +0100)]
fscrypt: move to generic async completion

fscrypt starts several async. crypto ops and waiting for them to
complete. Move it over to generic code doing the same.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: introduce crypto wait for async op
Gilad Ben-Yossef [Wed, 18 Oct 2017 07:00:38 +0000 (08:00 +0100)]
crypto: introduce crypto wait for async op

Invoking a possibly async. crypto op and waiting for completion
while correctly handling backlog processing is a common task
in the crypto API implementation and outside users of it.

This patch adds a generic implementation for doing so in
preparation for using it across the board instead of hand
rolled versions.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
CC: Eric Biggers <ebiggers3@gmail.com>
CC: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agofscrypt: lock mutex before checking for bounce page pool
Eric Biggers [Sun, 29 Oct 2017 10:30:19 +0000 (06:30 -0400)]
fscrypt: lock mutex before checking for bounce page pool

fscrypt_initialize(), which allocates the global bounce page pool when
an encrypted file is first accessed, uses "double-checked locking" to
try to avoid locking fscrypt_init_mutex.  However, it doesn't use any
memory barriers, so it's theoretically possible for a thread to observe
a bounce page pool which has not been fully initialized.  This is a
classic bug with "double-checked locking".

While "only a theoretical issue" in the latest kernel, in pre-4.8
kernels the pointer that was checked was not even the last to be
initialized, so it was easily possible for a crash (NULL pointer
dereference) to happen.  This was changed only incidentally by the large
refactor to use fs/crypto/.

Solve both problems in a trivial way that can easily be backported: just
always take the mutex.  It's theoretically less efficient, but it
shouldn't be noticeable in practice as the mutex is only acquired very
briefly once per encrypted file.

Later I'd like to make this use a helper macro like DO_ONCE().  However,
DO_ONCE() runs in atomic context, so we'd need to add a new macro that
allows blocking.

Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper function - fscrypt_prepare_setattr()
Eric Biggers [Mon, 9 Oct 2017 19:15:44 +0000 (12:15 -0700)]
fscrypt: new helper function - fscrypt_prepare_setattr()

Introduce a helper function for filesystems to call when processing
->setattr() on a possibly-encrypted inode.  It handles enforcing that an
encrypted file can only be truncated if its encryption key is available.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper function - fscrypt_prepare_lookup()
Eric Biggers [Mon, 9 Oct 2017 19:15:43 +0000 (12:15 -0700)]
fscrypt: new helper function - fscrypt_prepare_lookup()

Introduce a helper function which prepares to look up the given dentry
in the given directory.  If the directory is encrypted, it handles
loading the directory's encryption key, setting the dentry's ->d_op to
fscrypt_d_ops, and setting DCACHE_ENCRYPTED_WITH_KEY if the directory's
encryption key is available.

Note: once all filesystems switch over to this, we'll be able to move
fscrypt_d_ops and fscrypt_set_encrypted_dentry() to fscrypt_private.h.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper function - fscrypt_prepare_rename()
Eric Biggers [Mon, 9 Oct 2017 19:15:42 +0000 (12:15 -0700)]
fscrypt: new helper function - fscrypt_prepare_rename()

Introduce a helper function which prepares to rename a file into a
possibly encrypted directory.  It handles loading the encryption keys
for the source and target directories if needed, and it handles
enforcing that if the target directory (and the source directory for a
cross-rename) is encrypted, then the file being moved into the directory
has the same encryption policy as its containing directory.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper function - fscrypt_prepare_link()
Eric Biggers [Mon, 9 Oct 2017 19:15:41 +0000 (12:15 -0700)]
fscrypt: new helper function - fscrypt_prepare_link()

Introduce a helper function which prepares to link an inode into a
possibly-encrypted directory.  It handles setting up the target
directory's encryption key, then verifying that the link won't violate
the constraint that all files in an encrypted directory tree use the
same encryption policy.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper function - fscrypt_file_open()
Eric Biggers [Mon, 9 Oct 2017 19:15:40 +0000 (12:15 -0700)]
fscrypt: new helper function - fscrypt_file_open()

Add a helper function which prepares to open a regular file which may be
encrypted.  It handles setting up the file's encryption key, then
checking that the file's encryption policy matches that of its parent
directory (if the parent directory is encrypted).  It may be set as the
->open() method or it can be called from another ->open() method.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper function - fscrypt_require_key()
Eric Biggers [Mon, 9 Oct 2017 19:15:39 +0000 (12:15 -0700)]
fscrypt: new helper function - fscrypt_require_key()

Add a helper function which checks if an inode is encrypted, and if so,
tries to set up its encryption key.  This is a pattern which is
duplicated in multiple places in each of ext4, f2fs, and ubifs --- for
example, when a regular file is asked to be opened or truncated.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: remove unneeded empty fscrypt_operations structs
Eric Biggers [Mon, 9 Oct 2017 19:15:38 +0000 (12:15 -0700)]
fscrypt: remove unneeded empty fscrypt_operations structs

In the case where a filesystem has been configured without encryption
support, there is no longer any need to initialize ->s_cop at all, since
none of the methods are ever called.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>