OSDN Git Service

sagit-ice-cold/kernel_xiaomi_msm8998.git
6 years agof2fs: align memory boundary for bitops
Jaegeuk Kim [Sat, 10 Mar 2018 01:42:28 +0000 (17:42 -0800)]
f2fs: align memory boundary for bitops

For example, in arm64, free_nid_bitmap should be aligned to word size in order
to use bit operations.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove unneeded set_cold_node()
Chao Yu [Fri, 9 Mar 2018 06:24:22 +0000 (14:24 +0800)]
f2fs: remove unneeded set_cold_node()

When setting COLD_BIT_SHIFT flag in node block, we only need to call
set_cold_node() in new_node_page() and recover_inode_page() during
node page initialization. So remove unneeded set_cold_node() in other
places.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: add nowait aio support
Hyunchul Lee [Thu, 8 Mar 2018 10:34:38 +0000 (19:34 +0900)]
f2fs: add nowait aio support

This patch adds nowait aio support[1].

Return EAGAIN if any of the following checks fail for direct I/O:
  - i_rwsem is not lockable
  - Blocks are not allocated at the write location

And xfstests generic/471 is passed.

 [1]: 6be96d "Introduce RWF_NOWAIT and FMODE_AIO_NOWAIT"

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: wrap all options with f2fs_sb_info.mount_opt
Chao Yu [Thu, 8 Mar 2018 06:22:56 +0000 (14:22 +0800)]
f2fs: wrap all options with f2fs_sb_info.mount_opt

This patch merges miscellaneous mount options into struct f2fs_mount_info,
After this patch, once we add new mount option, we don't need to worry
about recovery of it in remount_fs(), since we will recover the
f2fs_sb_info.mount_opt including all options.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: Don't overwrite all types of node to keep node chain
Yunlei He [Wed, 7 Mar 2018 08:22:50 +0000 (16:22 +0800)]
f2fs: Don't overwrite all types of node to keep node chain

Currently, we enable node SSR by default, and mixed
different types of node segment to do SSR more intensively.
Although reuse warm node is not allowed, warm node chain
will be destroyed by errors introduced by other types
node chain. So we'd better forbid reusing all types
of node to keep warm node chain.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: introduce mount option for fsync mode
Junling Zheng [Wed, 7 Mar 2018 04:07:49 +0000 (12:07 +0800)]
f2fs: introduce mount option for fsync mode

Commit "0a007b97aad6"(f2fs: recover directory operations by fsync)
fixed xfstest generic/342 case, but it also increased the written
data and caused the performance degradation. In most cases, there's
no need to do so heavy fsync actually.

So we introduce new mount option "fsync_mode={posix,strict}" to
control the policy of fsync. "fsync_mode=posix" is set by default,
and means that f2fs uses a light fsync, which follows POSIX semantics.
And "fsync_mode=strict" means that it's a heavy fsync, which behaves
in line with xfs, ext4 and btrfs, where generic/342 will pass, but
the performance will regress.

Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to restore old mount option in ->remount_fs
Chao Yu [Thu, 1 Mar 2018 15:40:32 +0000 (23:40 +0800)]
f2fs: fix to restore old mount option in ->remount_fs

This patch fixes to restore old mount option once we encounter failure
in ->remount_fs.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: wrap sb_rdonly with f2fs_readonly
Chao Yu [Thu, 1 Mar 2018 15:40:31 +0000 (23:40 +0800)]
f2fs: wrap sb_rdonly with f2fs_readonly

Use f2fs_readonly to wrap sb_rdonly for cleanup, and spread it in
all places.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: avoid selinux denial on CAP_SYS_RESOURCE
Jaegeuk Kim [Fri, 9 Mar 2018 04:47:33 +0000 (20:47 -0800)]
f2fs: avoid selinux denial on CAP_SYS_RESOURCE

This fixes CAP_SYS_RESOURCE denial of selinux when using resgid, since it
seems selinux reports it at the first place, but mostly we don't need to
check this condition first.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: support hot file extension
Chao Yu [Wed, 28 Feb 2018 09:07:27 +0000 (17:07 +0800)]
f2fs: support hot file extension

This patch supports to recognize hot file extension in f2fs, so that we
can allocate proper hot segment location for its data, which can lead to
better hot/cold seperation in filesystem.

In addition, we changes a bit on query/add/del operation method for
extension_list sysfs entry as below:

- Query: cat /sys/fs/f2fs/<disk>/extension_list
- Add: echo 'extension' > /sys/fs/f2fs/<disk>/extension_list
- Del: echo '!extension' > /sys/fs/f2fs/<disk>/extension_list
- Add: echo '[h/c]extension' > /sys/fs/f2fs/<disk>/extension_list
- Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list
- [h] means add/del hot file extension
- [c] means add/del cold file extension

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to avoid race in between atomic write and background GC
Chao Yu [Tue, 27 Feb 2018 14:45:24 +0000 (22:45 +0800)]
f2fs: fix to avoid race in between atomic write and background GC

Sqlite user Background GC
- move_data_block
  : move page #1
 - f2fs_is_atomic_file
- f2fs_ioc_start_atomic_write
- f2fs_ioc_commit_atomic_write
 - commit_inmem_pages
   : commit page #1 & set node #2 dirty
 - f2fs_submit_page_write
  - f2fs_update_data_blkaddr
   - set_page_dirty
     : set node #2 dirty
 - f2fs_do_sync_file
  - fsync_node_pages
   : commit node #1 & node #2, then sudden power-cut

In a race case, we may check FI_ATOMIC_FILE flag before starting atomic
write flow, then we will commit meta data before data with reversed
order, after a sudden pow-cut, database transaction will be inconsistent.

So we'd better to exclude gc/atomic_write to each other by using lock
instead of flag checking.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: do gc in greedy mode for whole range if gc_urgent mode is set
Jaegeuk Kim [Mon, 26 Feb 2018 23:40:30 +0000 (15:40 -0800)]
f2fs: do gc in greedy mode for whole range if gc_urgent mode is set

Otherwise, f2fs conducts GC on 8GB range only based on slow cost-benefit.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: issue discard aggressively in the gc_urgent mode
Jaegeuk Kim [Fri, 23 Feb 2018 07:30:55 +0000 (23:30 -0800)]
f2fs: issue discard aggressively in the gc_urgent mode

This patch avoids to skip discard commands when user sets gc_urgent mode.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: set readdir_ra by default
Jaegeuk Kim [Sun, 25 Feb 2018 09:04:57 +0000 (01:04 -0800)]
f2fs: set readdir_ra by default

It gives general readdir improvement.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: add auto tuning for small devices
Jaegeuk Kim [Thu, 22 Feb 2018 22:09:30 +0000 (14:09 -0800)]
f2fs: add auto tuning for small devices

If f2fs is running on top of very small devices, it's worth to avoid abusing
free LBAs. In order to achieve that, this patch introduces some parameter
tuning.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: add mount option for segment allocation policy
Jaegeuk Kim [Sun, 18 Feb 2018 16:50:49 +0000 (08:50 -0800)]
f2fs: add mount option for segment allocation policy

This patch adds an mount option, "alloc_mode=%s" having two options, "default"
and "reuse".

In "alloc_mode=reuse" case, f2fs starts to allocate segments from 0'th segment
all the time to reassign segments. It'd be useful for small-sized eMMC parts.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: don't stop GC if GC is contended
Jaegeuk Kim [Mon, 26 Feb 2018 17:19:47 +0000 (09:19 -0800)]
f2fs: don't stop GC if GC is contended

Let's do GC as much as possible, while gc_urgent is set.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: expose extension_list sysfs entry
Chao Yu [Mon, 26 Feb 2018 14:04:13 +0000 (22:04 +0800)]
f2fs: expose extension_list sysfs entry

This patch adds a sysfs entry 'extension_list' to support
query/add/del item in extension list.

Query:
cat /sys/fs/f2fs/<device>/extension_list

Add:
echo 'extension' > /sys/fs/f2fs/<device>/extension_list

Del:
echo '!extension' > /sys/fs/f2fs/<device>/extension_list

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to set KEEP_SIZE bit in f2fs_zero_range
Chao Yu [Sun, 25 Feb 2018 15:38:21 +0000 (23:38 +0800)]
f2fs: fix to set KEEP_SIZE bit in f2fs_zero_range

As Jayashree Mohan reported:

A simple workload to reproduce this would be :
1. create foo
2. Write (8K - 16K)  // foo size = 16K now
3. fsync()
4. falloc zero_range , keep_size (4202496 - 4210688) // foo size must be 16K
5. fdatasync()
Crash now

On recovery, we see that the file size is 4210688 and not 16K, which
violates the semantics of keep_size flag. We have a test case to
reproduce this using CrashMonkey on 4.15 kernel. Try this out by
simply running :
 ./c_harness -f /dev/sda -d /dev/cow_ram0 -t f2fs -e 102400  -P -v
 tests/generic_468_zero.so

The root cause is that we miss to set KEEP_SIZE bit correctly in zero_range
when zeroing block cross EOF with FALLOC_FL_KEEP_SIZE, let's fix this
missing case.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: introduce sb_lock to make encrypt pwsalt update exclusive
Chao Yu [Sun, 11 Feb 2018 14:53:20 +0000 (22:53 +0800)]
f2fs: introduce sb_lock to make encrypt pwsalt update exclusive

f2fs_super_block.encrypt_pw_salt can be udpated and persisted
concurrently, result in getting different pwsalt in separated
threads, so let's introduce sb_lock to exclude concurrent
accessers.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove redundant initialization of pointer 'p'
Colin Ian King [Wed, 21 Feb 2018 18:13:40 +0000 (18:13 +0000)]
f2fs: remove redundant initialization of pointer 'p'

Pointer p is initialized with a value that is never read and is later
re-assigned a new value, hence the initialization is redundant and can
be removed.

Cleans up clang warning:
fs/f2fs/extent_cache.c:463:19: warning: Value stored to 'p' during
its initialization is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: flush cp pack except cp pack 2 page at first
Gao Xiang [Sat, 10 Feb 2018 04:12:51 +0000 (12:12 +0800)]
f2fs: flush cp pack except cp pack 2 page at first

Previously, we attempt to flush the whole cp pack in a single bio,
however, when suddenly powering off at this time, we could get into
an extreme scenario that cp pack 1 page and cp pack 2 page are updated
and latest, but payload or current summaries are still partially
outdated. (see reliable write in the UFS specification)

This patch submits the whole cp pack except cp pack 2 page at first,
and then writes the cp pack 2 page with an extra independent
bio with pre-io barrier.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: clean up f2fs_sb_has_xxx functions
Sheng Yong [Tue, 6 Feb 2018 04:31:17 +0000 (12:31 +0800)]
f2fs: clean up f2fs_sb_has_xxx functions

This patch introduces F2FS_FEATURE_FUNCS to clean up the definitions of
different f2fs_sb_has_xxx functions.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove redundant check of page type when submit bio
Tiezhu Yang [Tue, 6 Feb 2018 00:21:45 +0000 (08:21 +0800)]
f2fs: remove redundant check of page type when submit bio

This patch removes redundant check of page type when submit bio to
make the logic more clear.

Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to handle looped node chain during recovery
Chao Yu [Sat, 3 Feb 2018 09:44:39 +0000 (17:44 +0800)]
f2fs: fix to handle looped node chain during recovery

There is no checksum in node block now, so bit-transition from hardware
can make node_footer.next_blkaddr being corrupted w/o any detection,
result in node chain becoming looped one.

For this condition, during recovery, in order to avoid running into dead
loop, let's detect it and just skip out.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: handle quota for orphan inodes
Jaegeuk Kim [Thu, 8 Feb 2018 01:01:48 +0000 (17:01 -0800)]
f2fs: handle quota for orphan inodes

This is to detect dquot_initialize errors early from evict_inode
for orphan inodes.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: support passing down write hints to block layer with F2FS policy
Hyunchul Lee [Wed, 31 Jan 2018 02:36:58 +0000 (11:36 +0900)]
f2fs: support passing down write hints to block layer with F2FS policy

Add 'whint_mode=fs-based' mount option. In this mode, F2FS passes
down write hints with its policy.

* whint_mode=fs-based. F2FS passes down hints with its policy.

User                  F2FS                     Block
----                  ----                     -----
                      META                     WRITE_LIFE_MEDIUM;
                      HOT_NODE                 WRITE_LIFE_NOT_SET
                      WARM_NODE                "
                      COLD_NODE                WRITE_LIFE_NONE
ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
extension list        "                        "

-- buffered io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
WRITE_LIFE_NONE       "                        "
WRITE_LIFE_MEDIUM     "                        "
WRITE_LIFE_LONG       "                        "

-- direct io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG

Many thanks to Chao Yu and Jaegeuk Kim for comments to
implement this patch.

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: support passing down write hints given by users to block layer
Hyunchul Lee [Wed, 31 Jan 2018 02:36:57 +0000 (11:36 +0900)]
f2fs: support passing down write hints given by users to block layer

Add the 'whint_mode' mount option that controls which write
hints are passed down to block layer. There are "off" and
"user-based" mode. The default mode is "off".

1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.

2) whint_mode=user-based. F2FS tries to pass down hints given
by users.

User                  F2FS                     Block
----                  ----                     -----
                      META                     WRITE_LIFE_NOT_SET
                      HOT_NODE                 "
                      WARM_NODE                "
                      COLD_NODE                "
ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
extension list        "                        "

-- buffered io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        "
WRITE_LIFE_MEDIUM     "                        "
WRITE_LIFE_LONG       "                        "

-- direct io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG

Many thanks to Chao Yu and Jaegeuk Kim for comments to
implement this patch.

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid build warning]
[Chao Yu: fix to restore whint_mode in ->remount_fs]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to clear CP_TRIMMED_FLAG
Chao Yu [Wed, 31 Jan 2018 01:30:34 +0000 (09:30 +0800)]
f2fs: fix to clear CP_TRIMMED_FLAG

Once CP_TRIMMED_FLAG is set, after a reboot, we will never issue discard
before LBA becomes invalid again, fix it by clearing the flag in
checkpoint without CP_TRIMMED reason.

Fixes: 1f43e2ad7bff ("f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: support large nat bitmap
Chao Yu [Thu, 25 Jan 2018 11:40:08 +0000 (19:40 +0800)]
f2fs: support large nat bitmap

Previously, we will store all nat version bitmap in checkpoint pack block,
so our total node entry number has a limitation which caused total node
number can not exceed (3900 * 8) block * 455 node/block = 14196000. So
that once user wants to create more nodes in large size image, it becomes
a bottleneck, that's unreasonable.

This patch detects the new layout of nat/sit version bitmap in image in
order to enable supporting large nat bitmap.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to check extent cache in f2fs_drop_extent_tree
Chao Yu [Sat, 27 Jan 2018 09:29:49 +0000 (17:29 +0800)]
f2fs: fix to check extent cache in f2fs_drop_extent_tree

If noextent_cache mount option is on, we will never initialize extent tree
in inode, but still we're going to access it in f2fs_drop_extent_tree,
result in kernel panic as below:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
 IP: _raw_write_lock+0xc/0x30
 Call Trace:
  ? f2fs_drop_extent_tree+0x41/0x70 [f2fs]
  f2fs_fallocate+0x5a0/0xdd0 [f2fs]
  ? common_file_perm+0x47/0xc0
  ? apparmor_file_permission+0x1a/0x20
  vfs_fallocate+0x15b/0x290
  SyS_fallocate+0x44/0x70
  do_syscall_64+0x6e/0x160
  entry_SYSCALL64_slow_path+0x25/0x25

This patch fixes to check extent cache status before using in
f2fs_drop_extent_tree.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: restrict inline_xattr_size configuration
Chao Yu [Sat, 27 Jan 2018 09:29:48 +0000 (17:29 +0800)]
f2fs: restrict inline_xattr_size configuration

This patch limits to enable inline_xattr_size mount option only if
both extra_attr and flexible_inline_xattr feature is on in current
image.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix heap mode to reset it back
Yunlong Song [Mon, 29 Jan 2018 03:37:45 +0000 (11:37 +0800)]
f2fs: fix heap mode to reset it back

Commit 7a20b8a61eff81bdb7097a578752a74860e9d142 ("f2fs: allocate node
and hot data in the beginning of partition") introduces another mount
option, heap, to reset it back. But it does not do anything for heap
mode, so fix it.

Cc: stable@vger.kernel.org
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix potential corruption in area before F2FS_SUPER_OFFSET
Sheng Yong [Mon, 29 Jan 2018 11:13:15 +0000 (19:13 +0800)]
f2fs: fix potential corruption in area before F2FS_SUPER_OFFSET

sb_getblk does not guarantee the buffer head is uptodate. If bh is not
uptodate, the data (may be used as boot code) in area before
F2FS_SUPER_OFFSET may get corrupted when super block is committed.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agofscrypt: fix build with pre-4.6 gcc versions
Eric Biggers [Fri, 19 Jan 2018 21:45:24 +0000 (13:45 -0800)]
fscrypt: fix build with pre-4.6 gcc versions

gcc versions prior to 4.6 require an extra level of braces when using a
designated initializer for a member in an anonymous struct or union.
This caused a compile error with the 'struct qstr' initialization in
__fscrypt_encrypt_symlink().

Fix it by using QSTR_INIT().

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Fixes: 76e81d6d5048 ("fscrypt: new helper functions for ->symlink()")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: fix up fscrypt_fname_encrypted_size() for internal use
Eric Biggers [Fri, 12 Jan 2018 04:30:08 +0000 (23:30 -0500)]
fscrypt: fix up fscrypt_fname_encrypted_size() for internal use

Filesystems don't need fscrypt_fname_encrypted_size() anymore, so
unexport it and move it to fscrypt_private.h.

We also never calculate the encrypted size of a filename without having
the fscrypt_info present since it is needed to know the amount of
NUL-padding which is determined by the encryption policy, and also we
will always truncate the NUL-padding to the maximum filename length.
Therefore, also make fscrypt_fname_encrypted_size() assume that the
fscrypt_info is present, and make it truncate the returned length to the
specified max_len.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: define fscrypt_fname_alloc_buffer() to be for presented names
Eric Biggers [Fri, 12 Jan 2018 04:30:08 +0000 (23:30 -0500)]
fscrypt: define fscrypt_fname_alloc_buffer() to be for presented names

Previously fscrypt_fname_alloc_buffer() was used to allocate buffers for
both presented (decrypted or encoded) and encrypted filenames.  That was
confusing, because it had to allocate the worst-case size for either,
e.g. including NUL-padding even when it was meaningless.

But now that fscrypt_setup_filename() no longer calls it, it is only
used in the ->get_link() and ->readdir() paths, which specifically want
a buffer for presented filenames.  Therefore, switch the behavior over
to allocating the buffer for presented filenames only.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: calculate NUL-padding length in one place only
Eric Biggers [Fri, 12 Jan 2018 04:30:08 +0000 (23:30 -0500)]
fscrypt: calculate NUL-padding length in one place only

Currently, when encrypting a filename (either a real filename or a
symlink target) we calculate the amount of NUL-padding twice: once
before encryption and once during encryption in fname_encrypt().  It is
needed before encryption to allocate the needed buffer size as well as
calculate the size the symlink target will take up on-disk before
creating the symlink inode.  Calculating the size during encryption as
well is redundant.

Remove this redundancy by always calculating the exact size beforehand,
and making fname_encrypt() just add as much NUL padding as is needed to
fill the output buffer.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: move fscrypt_symlink_data to fscrypt_private.h
Eric Biggers [Fri, 12 Jan 2018 04:30:08 +0000 (23:30 -0500)]
fscrypt: move fscrypt_symlink_data to fscrypt_private.h

Now that all filesystems have been converted to use the symlink helper
functions, they no longer need the declaration of 'struct
fscrypt_symlink_data'.  Move it from fscrypt.h to fscrypt_private.h.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: remove fscrypt_fname_usr_to_disk()
Eric Biggers [Fri, 12 Jan 2018 04:30:08 +0000 (23:30 -0500)]
fscrypt: remove fscrypt_fname_usr_to_disk()

fscrypt_fname_usr_to_disk() sounded very generic but was actually only
used to encrypt symlinks.  Remove it now that all filesystems have been
switched over to fscrypt_encrypt_symlink().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agof2fs: switch to fscrypt_get_symlink()
Eric Biggers [Fri, 12 Jan 2018 04:26:49 +0000 (23:26 -0500)]
f2fs: switch to fscrypt_get_symlink()

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agof2fs: switch to fscrypt ->symlink() helper functions
Eric Biggers [Fri, 12 Jan 2018 04:26:49 +0000 (23:26 -0500)]
f2fs: switch to fscrypt ->symlink() helper functions

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper function - fscrypt_get_symlink()
Eric Biggers [Fri, 5 Jan 2018 18:45:02 +0000 (10:45 -0800)]
fscrypt: new helper function - fscrypt_get_symlink()

Filesystems also have duplicate code to support ->get_link() on
encrypted symlinks.  Factor it out into a new function
fscrypt_get_symlink().  It takes in the contents of the encrypted
symlink on-disk and provides the target (decrypted or encoded) that
should be returned from ->get_link().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper functions for ->symlink()
Eric Biggers [Fri, 5 Jan 2018 18:45:01 +0000 (10:45 -0800)]
fscrypt: new helper functions for ->symlink()

Currently, filesystems supporting fscrypt need to implement some tricky
logic when creating encrypted symlinks, including handling a peculiar
on-disk format (struct fscrypt_symlink_data) and correctly calculating
the size of the encrypted symlink.  Introduce helper functions to make
things a bit easier:

- fscrypt_prepare_symlink() computes and validates the size the symlink
  target will require on-disk.
- fscrypt_encrypt_symlink() creates the encrypted target if needed.

The new helpers actually fix some subtle bugs.  First, when checking
whether the symlink target was too long, filesystems didn't account for
the fact that the NUL padding is meant to be truncated if it would cause
the maximum length to be exceeded, as is done for filenames in
directories.  Consequently users would receive ENAMETOOLONG when
creating symlinks close to what is supposed to be the maximum length.
For example, with EXT4 with a 4K block size, the maximum symlink target
length in an encrypted directory is supposed to be 4093 bytes (in
comparison to 4095 in an unencrypted directory), but in
FS_POLICY_FLAGS_PAD_32-mode only up to 4064 bytes were accepted.

Second, symlink targets of "." and ".." were not being encrypted, even
though they should be, as these names are special in *directory entries*
but not in symlink targets.  Fortunately, we can fix this simply by
starting to encrypt them, as old kernels already accept them in
encrypted form.

Third, the output string length the filesystems were providing when
doing the actual encryption was incorrect, as it was forgotten to
exclude 'sizeof(struct fscrypt_symlink_data)'.  Fortunately though, this
bug didn't make a difference.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: trim down fscrypt.h includes
Eric Biggers [Fri, 5 Jan 2018 18:45:00 +0000 (10:45 -0800)]
fscrypt: trim down fscrypt.h includes

fscrypt.h included way too many other headers, given that it is included
by filesystems both with and without encryption support.  Trim down the
includes list by moving the needed includes into more appropriate
places, and removing the unneeded ones.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: move fscrypt_is_dot_dotdot() to fs/crypto/fname.c
Eric Biggers [Fri, 5 Jan 2018 18:44:59 +0000 (10:44 -0800)]
fscrypt: move fscrypt_is_dot_dotdot() to fs/crypto/fname.c

Only fs/crypto/fname.c cares about treating the "." and ".." filenames
specially with regards to encryption, so move fscrypt_is_dot_dotdot()
from fscrypt.h to there.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: move fscrypt_valid_enc_modes() to fscrypt_private.h
Eric Biggers [Fri, 5 Jan 2018 18:44:58 +0000 (10:44 -0800)]
fscrypt: move fscrypt_valid_enc_modes() to fscrypt_private.h

The encryption modes are validated by fs/crypto/, not by individual
filesystems.  Therefore, move fscrypt_valid_enc_modes() from fscrypt.h
to fscrypt_private.h.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: move fscrypt_operations declaration to fscrypt_supp.h
Eric Biggers [Fri, 5 Jan 2018 18:44:57 +0000 (10:44 -0800)]
fscrypt: move fscrypt_operations declaration to fscrypt_supp.h

Filesystems now only define their fscrypt_operations when they are
compiled with encryption support, so move the fscrypt_operations
declaration from fscrypt.h to fscrypt_supp.h.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: split fscrypt_dummy_context_enabled() into supp/notsupp versions
Eric Biggers [Fri, 5 Jan 2018 18:44:56 +0000 (10:44 -0800)]
fscrypt: split fscrypt_dummy_context_enabled() into supp/notsupp versions

fscrypt_dummy_context_enabled() accesses ->s_cop, which now is only set
when the filesystem is built with encryption support.  This didn't
actually matter because no filesystems called it.  However, it will
start being used soon, so fix it by moving it from fscrypt.h to
fscrypt_supp.h and stubbing it out in fscrypt_notsupp.h.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: move fscrypt_ctx declaration to fscrypt_supp.h
Eric Biggers [Fri, 5 Jan 2018 18:44:55 +0000 (10:44 -0800)]
fscrypt: move fscrypt_ctx declaration to fscrypt_supp.h

Filesystems only ever access 'struct fscrypt_ctx' through fscrypt
functions.  But when a filesystem is built without encryption support,
these functions are all stubbed out, so the declaration of fscrypt_ctx
is unneeded.  Therefore, move it from fscrypt.h to fscrypt_supp.h.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: move fscrypt_info_cachep declaration to fscrypt_private.h
Eric Biggers [Fri, 5 Jan 2018 18:44:54 +0000 (10:44 -0800)]
fscrypt: move fscrypt_info_cachep declaration to fscrypt_private.h

The fscrypt_info kmem_cache is internal to fscrypt; filesystems don't
need to access it.  So move its declaration into fscrypt_private.h.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: move fscrypt_control_page() to supp/notsupp headers
Eric Biggers [Fri, 5 Jan 2018 18:44:53 +0000 (10:44 -0800)]
fscrypt: move fscrypt_control_page() to supp/notsupp headers

fscrypt_control_page() is already split into two versions depending on
whether the filesystem is being built with encryption support or not.
Move them into the appropriate headers.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: move fscrypt_has_encryption_key() to supp/notsupp headers
Eric Biggers [Fri, 5 Jan 2018 18:44:52 +0000 (10:44 -0800)]
fscrypt: move fscrypt_has_encryption_key() to supp/notsupp headers

fscrypt_has_encryption_key() is already split into two versions
depending on whether the filesystem is being built with encryption
support or not.  Move them into the appropriate headers.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agof2fs: don't put dentry page in pagecache into highmem
Yunlong Song [Wed, 28 Feb 2018 12:31:52 +0000 (20:31 +0800)]
f2fs: don't put dentry page in pagecache into highmem

Previous dentry page uses highmem, which will cause panic in platforms
using highmem (such as arm), since the address space of dentry pages
from highmem directly goes into the decryption path via the function
fscrypt_fname_disk_to_usr. But sg_init_one assumes the address is not
from highmem, and then cause panic since it doesn't call kmap_high but
kunmap_high is triggered at the end. To fix this problem in a simple
way, this patch avoids to put dentry page in pagecache into highmem.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: support inode creation time
Chao Yu [Thu, 25 Jan 2018 06:54:42 +0000 (14:54 +0800)]
f2fs: support inode creation time

This patch adds creation time field in inode layout to support showing
kstat.btime in ->statx.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: rebuild sit page from sit info in mem
Yunlei He [Thu, 25 Jan 2018 09:27:11 +0000 (17:27 +0800)]
f2fs: rebuild sit page from sit info in mem

This patch rebuild sit page from sit info in mem instead
of issue a read io.

I test this method and the result is as below:

Pre:
 mmc_perf_test-12061 [001] ...1   976.819992: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [001] ...1   976.856446: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [003] ...1   998.976946: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [003] ...1   999.023269: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [003] ...1  1022.060772: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [003] ...1  1022.111034: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [002] ...1  1070.127643: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [003] ...1  1070.187352: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [003] ...1  1095.942124: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [003] ...1  1095.995975: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [003] ...1  1122.535091: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [003] ...1  1122.586521: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [001] ...1  1147.897487: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [001] ...1  1147.959438: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [003] ...1  1177.926951: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [002] ...1  1177.976823: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
 mmc_perf_test-12061 [002] ...1  1204.176087: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
 mmc_perf_test-12061 [002] ...1  1204.239046: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit

Some sit flush consume more than 50ms.

Now:
mmc_perf_test-2187  [007] ...1   196.840684: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [007] ...1   196.841258: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [007] ...1   219.430582: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [007] ...1   219.431144: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [002] ...1   243.638678: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [000] ...1   243.638980: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [002] ...1   265.392180: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [002] ...1   265.392245: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [000] ...1   290.309051: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [000] ...1   290.309116: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [003] ...1   317.144209: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [003] ...1   317.145913: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [005] ...1   343.224954: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [005] ...1   343.225574: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [000] ...1   370.239846: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [000] ...1   370.241138: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [001] ...1   397.029043: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [001] ...1   397.030750: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit
mmc_perf_test-2187  [003] ...1   425.386377: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = start flush sit
mmc_perf_test-2187  [003] ...1   425.387735: f2fs_write_checkpoint: dev = (259,44), checkpoint for Sync, state = end flush sit

Most sit flush consume no more than 1ms.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: stop issuing discard if fs is readonly
Chao Yu [Thu, 25 Jan 2018 10:57:27 +0000 (18:57 +0800)]
f2fs: stop issuing discard if fs is readonly

If filesystem is readonly, stop to issue discard in daemon.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: clean up duplicated assignment in init_discard_policy
Chao Yu [Thu, 25 Jan 2018 10:57:26 +0000 (18:57 +0800)]
f2fs: clean up duplicated assignment in init_discard_policy

Remove duplicated codes of assignment for .max_requests and .io_aware_gran.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: use GFP_F2FS_ZERO for cleanup
Chao Yu [Thu, 25 Jan 2018 10:57:25 +0000 (18:57 +0800)]
f2fs: use GFP_F2FS_ZERO for cleanup

Clean up codes with GFP_F2FS_ZERO, no logic changes.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: allow to recover node blocks given updated checkpoint
Jaegeuk Kim [Fri, 19 Jan 2018 21:42:33 +0000 (13:42 -0800)]
f2fs: allow to recover node blocks given updated checkpoint

If fsck.f2fs changes crc, we have no way to recover some inode blocks by roll-
forward recovery. Let's relax the condition to recover them.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: recover some i_inline flags
Jaegeuk Kim [Sat, 20 Jan 2018 04:01:40 +0000 (20:01 -0800)]
f2fs: recover some i_inline flags

This fixes lost i_inline flags during roll-forward.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: correct removexattr behavior for null valued extended attribute
Daeho Jeong [Sat, 20 Jan 2018 07:46:33 +0000 (15:46 +0800)]
f2fs: correct removexattr behavior for null valued extended attribute

__vfs_removexattr() transfers "NULL" value to the setxattr handler of
the f2fs filesystem in order to remove the extended attribute. But,
__f2fs_setxattr() just ignores the removal request when the value of
the extended attribute is already NULL. We have to remove the extended
attribute itself even if the value of that is already NULL.

We can reporduce this bug with the below:

1. touch file
2. setfattr -n "user.foo" file
3. setfattr -x "user.foo" file
4. getfattr -d file
> user.foo

Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com>
Tested-by: Hobin Woo <hobin.woo@samsung.com>
Tested-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: drop page cache after fs shutdown
Chao Yu [Thu, 18 Jan 2018 09:29:10 +0000 (17:29 +0800)]
f2fs: drop page cache after fs shutdown

Don't remain dirtied page cache in f2fs after shutdown, it can mitigate
memory pressure of whole system, in order to keep other modules working
properly.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: stop gc/discard thread after fs shutdown
Chao Yu [Thu, 18 Jan 2018 09:23:29 +0000 (17:23 +0800)]
f2fs: stop gc/discard thread after fs shutdown

Once filesystem shuts down, daemons like gc/discard thread should be
aware of it, and do exit, in addtion, drop all cached pending discard
commands and turn off real-time discard mode.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: hanlde error case in f2fs_ioc_shutdown
Chao Yu [Wed, 17 Jan 2018 14:28:52 +0000 (22:28 +0800)]
f2fs: hanlde error case in f2fs_ioc_shutdown

This patch makes f2fs_ioc_shutdown handling error case correctly.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: split need_inplace_update
Chao Yu [Wed, 17 Jan 2018 08:31:38 +0000 (16:31 +0800)]
f2fs: split need_inplace_update

This patch splits need_inplace_update to two functions:
a. should_update_inplace() includes all conditions that we must use IPU.
b. should_update_outplace() includes all conditions that we must use OPU.

So that, in f2fs_ioc_set_pin_file() and f2fs_defragment_range(), we can
use corresponding function to check whether we can trigger OPU/IPU or not.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to update last_disk_size correctly
Chao Yu [Wed, 17 Jan 2018 08:31:37 +0000 (16:31 +0800)]
f2fs: fix to update last_disk_size correctly

This patch fixes to update last_disk_size only when writing out page
successfully.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: kill F2FS_INLINE_XATTR_ADDRS for cleanup
Chao Yu [Wed, 17 Jan 2018 08:31:36 +0000 (16:31 +0800)]
f2fs: kill F2FS_INLINE_XATTR_ADDRS for cleanup

Use get_inline_xattr_addrs directly instead of F2FS_INLINE_XATTR_ADDRS.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: clean up error path of fill_super
Chao Yu [Wed, 17 Jan 2018 08:31:35 +0000 (16:31 +0800)]
f2fs: clean up error path of fill_super

This patch cleans up error path of fille_super to avoid unneeded
release step.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: avoid hungtask when GC encrypted block if io_bits is set
Sheng Yong [Wed, 17 Jan 2018 04:11:31 +0000 (12:11 +0800)]
f2fs: avoid hungtask when GC encrypted block if io_bits is set

When io_bits is set, GCing encrypted block may hit the following hungtask.
Since io_bits requires aligned block address, f2fs_submit_page_write may
return -EAGAIN if new_blkaddr does not satisify io_bits alignment. As a
result, the encrypted page will never be writtenback.

This patch makes move_data_block aware the EAGAIN error and cancel the
writeback.

[  246.751371] INFO: task kworker/u4:4:797 blocked for more than 90 seconds.
[  246.752423]       Not tainted 4.15.0-rc4+ #11
[  246.754176] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.755336] kworker/u4:4    D25448   797      2 0x80000000
[  246.755597] Workqueue: writeback wb_workfn (flush-7:0)
[  246.755616] Call Trace:
[  246.755695]  ? __schedule+0x322/0xa90
[  246.755761]  ? blk_init_request_from_bio+0x120/0x120
[  246.755773]  ? pci_mmcfg_check_reserved+0xb0/0xb0
[  246.755801]  ? __radix_tree_create+0x19e/0x200
[  246.755813]  ? delete_node+0x136/0x370
[  246.755838]  schedule+0x43/0xc0
[  246.755904]  io_schedule+0x17/0x40
[  246.755939]  wait_on_page_bit_common+0x17b/0x240
[  246.755950]  ? wake_page_function+0xa0/0xa0
[  246.755961]  ? add_to_page_cache_lru+0x160/0x160
[  246.755972]  ? page_cache_tree_insert+0x170/0x170
[  246.755983]  ? __lru_cache_add+0x96/0xb0
[  246.756086]  __filemap_fdatawait_range+0x14f/0x1c0
[  246.756097]  ? wait_on_page_bit_common+0x240/0x240
[  246.756120]  ? __wake_up_locked_key_bookmark+0x20/0x20
[  246.756167]  ? wait_on_all_pages_writeback+0xc9/0x100
[  246.756179]  ? __remove_ino_entry+0x120/0x120
[  246.756192]  ? wait_woken+0x100/0x100
[  246.756204]  filemap_fdatawait_range+0x9/0x20
[  246.756216]  write_checkpoint+0x18a1/0x1f00
[  246.756254]  ? blk_get_request+0x10/0x10
[  246.756265]  ? cpumask_next_and+0x43/0x60
[  246.756279]  ? f2fs_sync_inode_meta+0x160/0x160
[  246.756289]  ? remove_element.isra.4+0xa0/0xa0
[  246.756300]  ? __put_compound_page+0x40/0x40
[  246.756310]  ? f2fs_sync_fs+0xec/0x1c0
[  246.756320]  ? f2fs_sync_fs+0x120/0x1c0
[  246.756329]  f2fs_sync_fs+0x120/0x1c0
[  246.756357]  ? trace_event_raw_event_f2fs__page+0x260/0x260
[  246.756393]  ? ata_build_rw_tf+0x173/0x410
[  246.756397]  f2fs_balance_fs_bg+0x198/0x390
[  246.756405]  ? drop_inmem_page+0x230/0x230
[  246.756415]  ? ahci_qc_prep+0x1bb/0x2e0
[  246.756418]  ? ahci_qc_issue+0x1df/0x290
[  246.756422]  ? __accumulate_pelt_segments+0x42/0xd0
[  246.756426]  ? f2fs_write_node_pages+0xd1/0x380
[  246.756429]  f2fs_write_node_pages+0xd1/0x380
[  246.756437]  ? sync_node_pages+0x8f0/0x8f0
[  246.756440]  ? update_curr+0x53/0x220
[  246.756444]  ? __accumulate_pelt_segments+0xa2/0xd0
[  246.756448]  ? __update_load_avg_se.isra.39+0x349/0x360
[  246.756452]  ? do_writepages+0x2a/0xa0
[  246.756456]  do_writepages+0x2a/0xa0
[  246.756460]  __writeback_single_inode+0x70/0x490
[  246.756463]  ? check_preempt_wakeup+0x199/0x310
[  246.756467]  writeback_sb_inodes+0x2a2/0x660
[  246.756471]  ? is_empty_dir_inode+0x40/0x40
[  246.756474]  ? __writeback_single_inode+0x490/0x490
[  246.756477]  ? string+0xbf/0xf0
[  246.756480]  ? down_read_trylock+0x35/0x60
[  246.756484]  __writeback_inodes_wb+0x9f/0xf0
[  246.756488]  wb_writeback+0x41d/0x4b0
[  246.756492]  ? writeback_inodes_wb.constprop.55+0x150/0x150
[  246.756498]  ? set_worker_desc+0xf7/0x130
[  246.756502]  ? current_is_workqueue_rescuer+0x60/0x60
[  246.756511]  ? _find_next_bit+0x2c/0xa0
[  246.756514]  ? wb_workfn+0x400/0x5d0
[  246.756518]  wb_workfn+0x400/0x5d0
[  246.756521]  ? finish_task_switch+0xdf/0x2a0
[  246.756525]  ? inode_wait_for_writeback+0x30/0x30
[  246.756529]  process_one_work+0x3a7/0x6f0
[  246.756533]  worker_thread+0x82/0x750
[  246.756537]  kthread+0x16f/0x1c0
[  246.756541]  ? trace_event_raw_event_workqueue_work+0x110/0x110
[  246.756544]  ? kthread_create_worker_on_cpu+0xb0/0xb0
[  246.756548]  ret_from_fork+0x1f/0x30

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: allow quota to use reserved blocks
Jaegeuk Kim [Sat, 6 Jan 2018 00:02:36 +0000 (16:02 -0800)]
f2fs: allow quota to use reserved blocks

This patch allows quota to use reserved blocks all the time.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to drop all inmem pages correctly
Chao Yu [Mon, 15 Jan 2018 09:16:46 +0000 (17:16 +0800)]
f2fs: fix to drop all inmem pages correctly

In commit 57864ae5ce3a ("f2fs: limit # of inmemory pages"), we have
limited memory footprint of all inmem pages with 20% of total memory,
otherwise, if we exceed the threshold, we will try to drop all inmem
pages to avoid excessive memory pressure resulting in performance
regression.

But in some unrelated error paths, we will also drop all inmem pages,
which should be wrong, fix it in this patch.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: speed up defragment on sparse file
Chao Yu [Wed, 10 Jan 2018 10:18:52 +0000 (18:18 +0800)]
f2fs: speed up defragment on sparse file

We have supported to get next page offset with valid mapping crossing
hole in f2fs_map_blocks, utilizing it to speed up defragment on sparse
file.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: support F2FS_IOC_PRECACHE_EXTENTS
Chao Yu [Thu, 11 Jan 2018 06:42:30 +0000 (14:42 +0800)]
f2fs: support F2FS_IOC_PRECACHE_EXTENTS

This patch introduces a new ioctl F2FS_IOC_PRECACHE_EXTENTS to precache
extent info like ext4, in order to gain better performance during
triggering AIO by eliminating synchronous waiting of mapping info.

Referred commit: 7869a4a6c5ca ("ext4: add support for extent pre-caching")

In addition, with newly added extent precache abilitiy, this patch add
to support FIEMAP_FLAG_CACHE in ->fiemap.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: add an ioctl to disable GC for specific file
Jaegeuk Kim [Fri, 8 Dec 2017 00:25:39 +0000 (16:25 -0800)]
f2fs: add an ioctl to disable GC for specific file

This patch gives a flag to disable GC on given file, which would be useful, when
user wants to keep its block map. It also conducts in-place-update for dontmove
file.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: prevent newly created inode from being dirtied incorrectly
Daeho Jeong [Thu, 11 Jan 2018 02:26:19 +0000 (11:26 +0900)]
f2fs: prevent newly created inode from being dirtied incorrectly

Now, we invoke f2fs_mark_inode_dirty_sync() to make an inode dirty in
advance of creating a new node page for the inode. By this, some inodes
whose node page is not created yet can be linked into the global dirty
list.

If the checkpoint is executed at this moment, the inode will be written
back by writeback_single_inode() and finally update_inode_page() will
fail to detach the inode from the global dirty list because the inode
doesn't have a node page.

The problem is that the inode's state in VFS layer will become clean
after execution of writeback_single_inode() and it's still linked in
the global dirty list of f2fs and this will cause a kernel panic.

So, we will prevent the newly created inode from being dirtied during
the FI_NEW_INODE flag of the inode is set. We will make it dirty
right after the flag is cleared.

Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com>
Tested-by: Hobin Woo <hobin.woo@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: support FIEMAP_FLAG_XATTR
Chao Yu [Thu, 11 Jan 2018 06:39:57 +0000 (14:39 +0800)]
f2fs: support FIEMAP_FLAG_XATTR

This patch enables ->fiemap to handle FIEMAP_FLAG_XATTR flag for xattr
mapping info lookup purpose.

It makes f2fs passing generic/425 test in fstest.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to cover f2fs_inline_data_fiemap with inode_lock
Chao Yu [Thu, 11 Jan 2018 06:37:35 +0000 (14:37 +0800)]
f2fs: fix to cover f2fs_inline_data_fiemap with inode_lock

This patch fix to cover f2fs_inline_data_fiemap with inode_lock in order
to make that interface avoiding race with mapping change.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: check node page again in write end io
Yunlei He [Thu, 11 Jan 2018 06:19:32 +0000 (14:19 +0800)]
f2fs: check node page again in write end io

Check node page again in write end io in case of
data corruption during inflght IO.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to caclulate required free section correctly
Chao Yu [Wed, 10 Jan 2018 10:18:51 +0000 (18:18 +0800)]
f2fs: fix to caclulate required free section correctly

When calculating required free section during file defragmenting, we
should skip holes in file, otherwise we will probably fail to defrag
sparse file with large size.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: handle newly created page when revoking inmem pages
Daeho Jeong [Wed, 10 Jan 2018 07:49:10 +0000 (16:49 +0900)]
f2fs: handle newly created page when revoking inmem pages

When committing inmem pages is successful, we revoke already committed
blocks in __revoke_inmem_pages() and finally replace the committed
ones with the old blocks using f2fs_replace_block(). However, if
the committed block was newly created one, the address of the old
block is NEW_ADDR and __f2fs_replace_block() cannot handle NEW_ADDR
as new_blkaddr properly and a kernel panic occurrs.

Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Tested-by: Shu Tan <shu.tan@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: add resgid and resuid to reserve root blocks
Jaegeuk Kim [Fri, 5 Jan 2018 05:36:09 +0000 (21:36 -0800)]
f2fs: add resgid and resuid to reserve root blocks

This patch adds mount options to reserve some blocks via resgid=%u,resuid=%u.
It only activates with reserve_root=%u.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: implement cgroup writeback support
Yufen Yu [Tue, 9 Jan 2018 11:33:39 +0000 (19:33 +0800)]
f2fs: implement cgroup writeback support

Cgroup writeback requires explicit support from the filesystem.
f2fs's data and node writeback IOs go through __write_data_page,
which sets fio for submiting IOs. So, we add io_wbc for fio,
associate bios with blkcg by invoking wbc_init_bio() and
account IOs issuing by wbc_account_io().
In addtion, f2fs_fill_super() is updated to set SB_I_CGROUPWB.

Meta writeback IOs is left alone by this patch and will always be
attributed to the root cgroup.

The results show that f2fs can throttle writeback nicely for
data writing and file creating.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove unused pend_list_tag
Chao Yu [Mon, 8 Jan 2018 10:48:34 +0000 (18:48 +0800)]
f2fs: remove unused pend_list_tag

In commit 78997b569f56 ("f2fs: split discard policy"), we have get rid
of using pend_list_tag field in struct discard_cmd_control, but forgot
to remove it, now do it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: avoid high cpu usage in discard thread
Chao Yu [Mon, 8 Jan 2018 10:48:33 +0000 (18:48 +0800)]
f2fs: avoid high cpu usage in discard thread

We take very long time to finish generic/476, this is because we will
check consistence of all discard entries in global rb tree while
traversing all different granularity pending lists, even when the list
is empty, in order to avoid that unneeded overhead, we have to skip
the check when coming up an empty list.

generic/476 time consumption:
cost
Before patch & w/o consistence check 57s
Before patch & w/ consistence check 1426s
After patch 78s

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: make local functions static
Wei Yongjun [Fri, 5 Jan 2018 09:41:20 +0000 (09:41 +0000)]
f2fs: make local functions static

Fixes the following sparse warnings:

fs/f2fs/segment.c:887:6: warning:
 symbol '__check_sit_bitmap' was not declared. Should it be static?
fs/f2fs/segment.c:1327:6: warning:
 symbol 'f2fs_wait_discard_bio' was not declared. Should it be static?
fs/f2fs/super.c:1661:5: warning:
 symbol 'f2fs_get_projid' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: add reserved blocks for root user
Jaegeuk Kim [Wed, 27 Dec 2017 23:05:52 +0000 (15:05 -0800)]
f2fs: add reserved blocks for root user

This patch allows root to reserve some blocks via mount option.

"-o reserve_root=N" means N x 4KB-sized blocks for root only.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: check segment type in __f2fs_replace_block
Yunlong Song [Thu, 4 Jan 2018 07:02:02 +0000 (15:02 +0800)]
f2fs: check segment type in __f2fs_replace_block

In some case, the node blocks has wrong blkaddr whose segment type is
NODE, e.g., recover inode has missing xattr flag and the blkaddr is in
the xattr range. Since fsck.f2fs does not check the recovery nodes, this
will cause __f2fs_replace_block change the curseg of node and do the
update_sit_entry(sbi, new_blkaddr, 1) with no next_blkoff refresh, as a
result, when recovery process write checkpoint and sync nodes, the
next_blkoff of curseg is used in the segment bit map, then it will
cause f2fs_bug_on. So let's check segment type in __f2fs_replace_block.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: update inode info to inode page for new file
Yunlei He [Wed, 3 Jan 2018 10:03:04 +0000 (18:03 +0800)]
f2fs: update inode info to inode page for new file

After checkpoint,
 1. creat a new file A ,(with dirty inode && dirty inode page && xattr info)
 2. backgroud wb write back file A inode page (without update from inode cache)
 3. fsync file A, write back inode page of file A with inode cache info
 4. sudden power off before new checkpoint

In this case, recovery process will try to recover a zero inode
page. Inline xattr flag of file A will be miss and xattr info
will be taken as blkaddr index.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: show precise # of blocks that user/root can use
Jaegeuk Kim [Wed, 3 Jan 2018 18:55:07 +0000 (10:55 -0800)]
f2fs: show precise # of blocks that user/root can use

Let's show precise # of blocks that user/root can use through bavail and bfree
respectively.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: clean up unneeded declaration
Chao Yu [Wed, 3 Jan 2018 09:32:51 +0000 (17:32 +0800)]
f2fs: clean up unneeded declaration

Commit 6afc662e68b5 ("f2fs: support flexible inline xattr size")
declared f2fs_sb_has_flexible_inline_xattr in f2fs.h for latter being
used in get_inline_xattr_addrs, but in latter version, related code
has been changed, leave f2fs_sb_has_flexible_inline_xattr w/o any
users. Let's remove it for cleanup.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: continue to do direct IO if we only preallocate partial blocks
Chao Yu [Wed, 3 Jan 2018 09:30:19 +0000 (17:30 +0800)]
f2fs: continue to do direct IO if we only preallocate partial blocks

While doing direct IO, if we run out-of-space when we preallocate blocks,
we should not return ENOSPC error directly, instead, we should continue
to do following direct IO, which will keep directIO of f2fs acting like
other filesystems.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: enable quota at remount from r to w
Jaegeuk Kim [Tue, 2 Jan 2018 19:03:19 +0000 (11:03 -0800)]
f2fs: enable quota at remount from r to w

We have to enable quota only when remounting from read to write. Otherwise,
we'll get remount failure. (e.g., write to write case)

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: skip stop_checkpoint for user data writes
Jaegeuk Kim [Mon, 1 Jan 2018 00:26:38 +0000 (16:26 -0800)]
f2fs: skip stop_checkpoint for user data writes

We can give another chance to write user data, which can resolve
generic/441.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix missing error number for xattr operation
Jaegeuk Kim [Fri, 29 Dec 2017 01:47:19 +0000 (17:47 -0800)]
f2fs: fix missing error number for xattr operation

This fixes generic/449 hang problem caused by no ENOSPC forever which should be
returned by setxattr under disk full scenario.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: recover directory operations by fsync
Jaegeuk Kim [Thu, 28 Dec 2017 16:09:44 +0000 (08:09 -0800)]
f2fs: recover directory operations by fsync

This fixes generic/342 which doesn't recover renamed file which was fsynced
before. It will be done via another fsync on newly created file.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: return error during fill_super
Jaegeuk Kim [Wed, 20 Dec 2017 03:16:34 +0000 (19:16 -0800)]
f2fs: return error during fill_super

Let's avoid BUG_ON during fill_super, when on-disk was totall corrupted.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix an error case of missing update inode page
Yunlei He [Tue, 5 Dec 2017 04:07:47 +0000 (12:07 +0800)]
f2fs: fix an error case of missing update inode page

-Thread A                             Thread B

-write_checkpoint
 -block_operations
  -f2fs_unlock_all                    -f2fs_sync_file
                                       -f2fs_write_inode
                                        -f2fs_inode_synced
    -f2fs_sync_inode_meta
     -sync_node_pages
                                        -set_page_drity

In this case, if sudden power off without next new checkpoint,
the last inode page update will lost. wb_writeback is same with
fsync.

Yunlei also reproduced the bug by:

@@ -366,7 +366,7 @@ int update_inode(struct inode *inode, struct page *node_page)
        struct extent_tree *et = F2FS_I(inode)->extent_tree;

        f2fs_inode_synced(inode);
-
+       msleep(10000);
        f2fs_wait_on_page_writeback(node_page, NODE, true);

shell 1:                                       shell2:

dd if=/dev/zero of=./test bs=1M count=10
sync
echo "hello" >> ./test
fsync test  // sleep 10s
                                               sync //return quickly
echo c > /proc/sysrq-trigger

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix potential hangtask in f2fs_trace_pid
Chao Yu [Tue, 12 Dec 2017 06:11:40 +0000 (14:11 +0800)]
f2fs: fix potential hangtask in f2fs_trace_pid

As Jia-Ju Bai reported:

"According to fs/f2fs/trace.c, the kernel module may sleep under a spinlock.
The function call path is:
f2fs_trace_pid (acquire the spinlock)
   f2fs_radix_tree_insert
     cond_resched --> may sleep

I do not find a good way to fix it, so I only report.
This possible bug is found by my static analysis tool (DSAC) and my code
review."

Obviously, it's problemetic to schedule in critical region of spinlock,
which will cause uninterruptable sleep if there is no waker.

This patch changes to use mutex lock intead of spinlock to avoid this
condition.

Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: no need return value in restore summary process
Yunlei He [Wed, 6 Dec 2017 03:31:29 +0000 (11:31 +0800)]
f2fs: no need return value in restore summary process

No need return value in restore summary process

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>