OSDN Git Service

sagit-ice-cold/kernel_xiaomi_msm8998.git
6 years ago f2fs: allow to recover node blocks given updated checkpoint
Jaegeuk Kim [Fri, 19 Jan 2018 21:42:33 +0000 (13:42 -0800)]
f2fs: allow to recover node blocks given updated checkpoint

If fsck.f2fs changes crc, we have no way to recover some inode blocks by
roll-forward recovery. Let's relax the condition to recover them.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: recover some i_inline flags
Jaegeuk Kim [Sat, 20 Jan 2018 04:01:40 +0000 (20:01 -0800)]
f2fs: recover some i_inline flags

This fixes lost i_inline flags during roll-forward.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: correct removexattr behavior for null valued extended attribute
Daeho Jeong [Sat, 20 Jan 2018 07:46:33 +0000 (15:46 +0800)]
f2fs: correct removexattr behavior for null valued extended attribute

__vfs_removexattr() transfers "NULL" value to the setxattr handler of
the f2fs filesystem in order to remove the extended attribute. But,
__f2fs_setxattr() just ignores the removal request when the value of
the extended attribute is already NULL. We have to remove the extended
attribute itself even if the value of that is already NULL.

We can reproduce this bug with the steps below:

1. touch file
2. setfattr -n "user.foo" file
3. setfattr -x "user.foo" file
4. getfattr -d file
> user.foo

Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com>
Tested-by: Hobin Woo <hobin.woo@samsung.com>
Tested-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: drop page cache after fs shutdown
Chao Yu [Thu, 18 Jan 2018 09:29:10 +0000 (17:29 +0800)]
f2fs: drop page cache after fs shutdown

Don't leave dirty page cache in f2fs after shutdown. Dropping it mitigates
memory pressure on the whole system, which helps keep other modules working
properly.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: stop gc/discard thread after fs shutdown
Chao Yu [Thu, 18 Jan 2018 09:23:29 +0000 (17:23 +0800)]
f2fs: stop gc/discard thread after fs shutdown

Once the filesystem shuts down, daemons like the gc/discard threads should be
aware of it and exit. In addition, drop all cached pending discard
commands and turn off real-time discard mode.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: hanlde error case in f2fs_ioc_shutdown
Chao Yu [Wed, 17 Jan 2018 14:28:52 +0000 (22:28 +0800)]
f2fs: hanlde error case in f2fs_ioc_shutdown

This patch makes f2fs_ioc_shutdown handle error cases correctly.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: split need_inplace_update
Chao Yu [Wed, 17 Jan 2018 08:31:38 +0000 (16:31 +0800)]
f2fs: split need_inplace_update

This patch splits need_inplace_update into two functions:
a. should_update_inplace() covers all conditions under which we must use IPU
   (in-place update).
b. should_update_outplace() covers all conditions under which we must use OPU
   (out-of-place update).

This way, f2fs_ioc_set_pin_file() and f2fs_defragment_range() can use the
corresponding function to check whether OPU/IPU can be triggered.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: fix to update last_disk_size correctly
Chao Yu [Wed, 17 Jan 2018 08:31:37 +0000 (16:31 +0800)]
f2fs: fix to update last_disk_size correctly

This patch updates last_disk_size only when the page is written out
successfully.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: kill F2FS_INLINE_XATTR_ADDRS for cleanup
Chao Yu [Wed, 17 Jan 2018 08:31:36 +0000 (16:31 +0800)]
f2fs: kill F2FS_INLINE_XATTR_ADDRS for cleanup

Use get_inline_xattr_addrs directly instead of F2FS_INLINE_XATTR_ADDRS.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: clean up error path of fill_super
Chao Yu [Wed, 17 Jan 2018 08:31:35 +0000 (16:31 +0800)]
f2fs: clean up error path of fill_super

This patch cleans up the error path of fill_super to avoid an unneeded
release step.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: avoid hungtask when GC encrypted block if io_bits is set
Sheng Yong [Wed, 17 Jan 2018 04:11:31 +0000 (12:11 +0800)]
f2fs: avoid hungtask when GC encrypted block if io_bits is set

When io_bits is set, GCing an encrypted block may hit the hungtask below.
Since io_bits requires an aligned block address, f2fs_submit_page_write may
return -EAGAIN if new_blkaddr does not satisfy the io_bits alignment. As a
result, the encrypted page will never be written back.

This patch makes move_data_block aware of the EAGAIN error so that it cancels
the writeback.
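
The alignment condition can be sketched in userspace C. This is only an
illustration, assuming an I/O unit of (1 << io_bits) blocks and a write path
that reports -EAGAIN for an unaligned address. blkaddr_is_io_aligned() and
sketch_submit_write() are hypothetical names, not f2fs functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if blkaddr is a multiple of (1 << io_bits). */
static bool blkaddr_is_io_aligned(uint32_t blkaddr, unsigned int io_bits)
{
        assert(io_bits < 32);   /* keep the shift well defined */

        uint32_t mask = (UINT32_C(1) << io_bits) - 1;

        return (blkaddr & mask) == 0;
}

/*
 * Hypothetical write path: refuse an unaligned address with -EAGAIN so the
 * caller can cancel the writeback instead of waiting on the page forever.
 */
static int sketch_submit_write(uint32_t new_blkaddr, unsigned int io_bits)
{
        if (!blkaddr_is_io_aligned(new_blkaddr, io_bits))
                return -EAGAIN;
        return 0;
}
```

A caller modeled on the fix would check the return value and end the page
writeback when it sees -EAGAIN rather than leaving the page under writeback.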

[  246.751371] INFO: task kworker/u4:4:797 blocked for more than 90 seconds.
[  246.752423]       Not tainted 4.15.0-rc4+ #11
[  246.754176] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.755336] kworker/u4:4    D25448   797      2 0x80000000
[  246.755597] Workqueue: writeback wb_workfn (flush-7:0)
[  246.755616] Call Trace:
[  246.755695]  ? __schedule+0x322/0xa90
[  246.755761]  ? blk_init_request_from_bio+0x120/0x120
[  246.755773]  ? pci_mmcfg_check_reserved+0xb0/0xb0
[  246.755801]  ? __radix_tree_create+0x19e/0x200
[  246.755813]  ? delete_node+0x136/0x370
[  246.755838]  schedule+0x43/0xc0
[  246.755904]  io_schedule+0x17/0x40
[  246.755939]  wait_on_page_bit_common+0x17b/0x240
[  246.755950]  ? wake_page_function+0xa0/0xa0
[  246.755961]  ? add_to_page_cache_lru+0x160/0x160
[  246.755972]  ? page_cache_tree_insert+0x170/0x170
[  246.755983]  ? __lru_cache_add+0x96/0xb0
[  246.756086]  __filemap_fdatawait_range+0x14f/0x1c0
[  246.756097]  ? wait_on_page_bit_common+0x240/0x240
[  246.756120]  ? __wake_up_locked_key_bookmark+0x20/0x20
[  246.756167]  ? wait_on_all_pages_writeback+0xc9/0x100
[  246.756179]  ? __remove_ino_entry+0x120/0x120
[  246.756192]  ? wait_woken+0x100/0x100
[  246.756204]  filemap_fdatawait_range+0x9/0x20
[  246.756216]  write_checkpoint+0x18a1/0x1f00
[  246.756254]  ? blk_get_request+0x10/0x10
[  246.756265]  ? cpumask_next_and+0x43/0x60
[  246.756279]  ? f2fs_sync_inode_meta+0x160/0x160
[  246.756289]  ? remove_element.isra.4+0xa0/0xa0
[  246.756300]  ? __put_compound_page+0x40/0x40
[  246.756310]  ? f2fs_sync_fs+0xec/0x1c0
[  246.756320]  ? f2fs_sync_fs+0x120/0x1c0
[  246.756329]  f2fs_sync_fs+0x120/0x1c0
[  246.756357]  ? trace_event_raw_event_f2fs__page+0x260/0x260
[  246.756393]  ? ata_build_rw_tf+0x173/0x410
[  246.756397]  f2fs_balance_fs_bg+0x198/0x390
[  246.756405]  ? drop_inmem_page+0x230/0x230
[  246.756415]  ? ahci_qc_prep+0x1bb/0x2e0
[  246.756418]  ? ahci_qc_issue+0x1df/0x290
[  246.756422]  ? __accumulate_pelt_segments+0x42/0xd0
[  246.756426]  ? f2fs_write_node_pages+0xd1/0x380
[  246.756429]  f2fs_write_node_pages+0xd1/0x380
[  246.756437]  ? sync_node_pages+0x8f0/0x8f0
[  246.756440]  ? update_curr+0x53/0x220
[  246.756444]  ? __accumulate_pelt_segments+0xa2/0xd0
[  246.756448]  ? __update_load_avg_se.isra.39+0x349/0x360
[  246.756452]  ? do_writepages+0x2a/0xa0
[  246.756456]  do_writepages+0x2a/0xa0
[  246.756460]  __writeback_single_inode+0x70/0x490
[  246.756463]  ? check_preempt_wakeup+0x199/0x310
[  246.756467]  writeback_sb_inodes+0x2a2/0x660
[  246.756471]  ? is_empty_dir_inode+0x40/0x40
[  246.756474]  ? __writeback_single_inode+0x490/0x490
[  246.756477]  ? string+0xbf/0xf0
[  246.756480]  ? down_read_trylock+0x35/0x60
[  246.756484]  __writeback_inodes_wb+0x9f/0xf0
[  246.756488]  wb_writeback+0x41d/0x4b0
[  246.756492]  ? writeback_inodes_wb.constprop.55+0x150/0x150
[  246.756498]  ? set_worker_desc+0xf7/0x130
[  246.756502]  ? current_is_workqueue_rescuer+0x60/0x60
[  246.756511]  ? _find_next_bit+0x2c/0xa0
[  246.756514]  ? wb_workfn+0x400/0x5d0
[  246.756518]  wb_workfn+0x400/0x5d0
[  246.756521]  ? finish_task_switch+0xdf/0x2a0
[  246.756525]  ? inode_wait_for_writeback+0x30/0x30
[  246.756529]  process_one_work+0x3a7/0x6f0
[  246.756533]  worker_thread+0x82/0x750
[  246.756537]  kthread+0x16f/0x1c0
[  246.756541]  ? trace_event_raw_event_workqueue_work+0x110/0x110
[  246.756544]  ? kthread_create_worker_on_cpu+0xb0/0xb0
[  246.756548]  ret_from_fork+0x1f/0x30

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: allow quota to use reserved blocks
Jaegeuk Kim [Sat, 6 Jan 2018 00:02:36 +0000 (16:02 -0800)]
f2fs: allow quota to use reserved blocks

This patch allows quota to use reserved blocks all the time.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: fix to drop all inmem pages correctly
Chao Yu [Mon, 15 Jan 2018 09:16:46 +0000 (17:16 +0800)]
f2fs: fix to drop all inmem pages correctly

In commit 57864ae5ce3a ("f2fs: limit # of inmemory pages"), we limited
the memory footprint of all inmem pages to 20% of total memory. If we
exceed that threshold, we try to drop all inmem pages to avoid excessive
memory pressure resulting in performance regression.

But some unrelated error paths also drop all inmem pages, which is
wrong. Fix it in this patch.
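
The 20% limit can be sketched as a simple page-count comparison. This is a
minimal illustration, assuming the threshold is one fifth of total pages.
inmem_over_threshold() is a hypothetical name and the real in-kernel check
may differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: true once inmem pages exceed 20% of total pages. */
static bool inmem_over_threshold(uint64_t inmem_pages, uint64_t total_pages)
{
        return inmem_pages > total_pages / 5;
}
```

Under this sketch, only a true result would justify dropping all inmem pages.
An unrelated error path would not.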

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: speed up defragment on sparse file
Chao Yu [Wed, 10 Jan 2018 10:18:52 +0000 (18:18 +0800)]
f2fs: speed up defragment on sparse file

f2fs_map_blocks already supports getting the next page offset with a valid
mapping across a hole. Utilize it to speed up defragment on sparse
files.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: support F2FS_IOC_PRECACHE_EXTENTS
Chao Yu [Thu, 11 Jan 2018 06:42:30 +0000 (14:42 +0800)]
f2fs: support F2FS_IOC_PRECACHE_EXTENTS

This patch introduces a new ioctl F2FS_IOC_PRECACHE_EXTENTS to precache
extent info like ext4, in order to gain better performance when
triggering AIO by eliminating synchronous waiting for mapping info.

Referred commit: 7869a4a6c5ca ("ext4: add support for extent pre-caching")

In addition, with the newly added extent precache ability, this patch adds
support for FIEMAP_FLAG_CACHE in ->fiemap.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: add an ioctl to disable GC for specific file
Jaegeuk Kim [Fri, 8 Dec 2017 00:25:39 +0000 (16:25 -0800)]
f2fs: add an ioctl to disable GC for specific file

This patch adds a flag to disable GC on a given file, which is useful when the
user wants to keep its block map. It also conducts in-place updates for a
dontmove file.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: prevent newly created inode from being dirtied incorrectly
Daeho Jeong [Thu, 11 Jan 2018 02:26:19 +0000 (11:26 +0900)]
f2fs: prevent newly created inode from being dirtied incorrectly

Now, we invoke f2fs_mark_inode_dirty_sync() to make an inode dirty in
advance of creating a new node page for the inode. By this, some inodes
whose node page is not created yet can be linked into the global dirty
list.

If the checkpoint is executed at this moment, the inode will be written
back by writeback_single_inode() and finally update_inode_page() will
fail to detach the inode from the global dirty list because the inode
doesn't have a node page.

The problem is that the inode's state in VFS layer will become clean
after execution of writeback_single_inode() and it's still linked in
the global dirty list of f2fs and this will cause a kernel panic.

So, we will prevent the newly created inode from being dirtied while
the FI_NEW_INODE flag of the inode is set. We will make it dirty
right after the flag is cleared.

Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com>
Tested-by: Hobin Woo <hobin.woo@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: support FIEMAP_FLAG_XATTR
Chao Yu [Thu, 11 Jan 2018 06:39:57 +0000 (14:39 +0800)]
f2fs: support FIEMAP_FLAG_XATTR

This patch enables ->fiemap to handle FIEMAP_FLAG_XATTR flag for xattr
mapping info lookup purpose.

It makes f2fs pass the generic/425 test in fstests.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: fix to cover f2fs_inline_data_fiemap with inode_lock
Chao Yu [Thu, 11 Jan 2018 06:37:35 +0000 (14:37 +0800)]
f2fs: fix to cover f2fs_inline_data_fiemap with inode_lock

This patch covers f2fs_inline_data_fiemap with inode_lock so that the
interface avoids racing with mapping changes.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: check node page again in write end io
Yunlei He [Thu, 11 Jan 2018 06:19:32 +0000 (14:19 +0800)]
f2fs: check node page again in write end io

Check the node page again in write end io in case of
data corruption during in-flight IO.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: fix to caclulate required free section correctly
Chao Yu [Wed, 10 Jan 2018 10:18:51 +0000 (18:18 +0800)]
f2fs: fix to caclulate required free section correctly

When calculating required free section during file defragmenting, we
should skip holes in file, otherwise we will probably fail to defrag
sparse file with large size.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: handle newly created page when revoking inmem pages
Daeho Jeong [Wed, 10 Jan 2018 07:49:10 +0000 (16:49 +0900)]
f2fs: handle newly created page when revoking inmem pages

When committing inmem pages is successful, we revoke already committed
blocks in __revoke_inmem_pages() and finally replace the committed
ones with the old blocks using f2fs_replace_block(). However, if
the committed block was newly created one, the address of the old
block is NEW_ADDR and __f2fs_replace_block() cannot handle NEW_ADDR
as new_blkaddr properly and a kernel panic occurs.

Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Tested-by: Shu Tan <shu.tan@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: add resgid and resuid to reserve root blocks
Jaegeuk Kim [Fri, 5 Jan 2018 05:36:09 +0000 (21:36 -0800)]
f2fs: add resgid and resuid to reserve root blocks

This patch adds mount options to reserve some blocks via resgid=%u,resuid=%u.
It only activates with reserve_root=%u.
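
The dependency on reserve_root can be sketched as a small decision function.
This is a simplified userspace model, not the f2fs implementation. The struct
and function names are hypothetical, root is modeled as uid 0 (the kernel may
use a capability check instead), and only a single gid is compared rather
than full group membership.

```c
#include <stdbool.h>

/* Hypothetical mount options; field names are illustrative only. */
struct sketch_opts {
        bool reserve_root_set;  /* reserve_root=%u was given */
        unsigned int resuid;    /* resuid=%u */
        unsigned int resgid;    /* resgid=%u */
};

/* Hypothetical caller identity. */
struct sketch_cred {
        unsigned int uid;
        unsigned int gid;
};

static bool may_use_reserved(const struct sketch_opts *o,
                             const struct sketch_cred *c)
{
        if (!o->reserve_root_set)
                return false;   /* resuid/resgid are inert without reserve_root */
        if (c->uid == 0)
                return true;    /* root (modeling assumption) */
        if (c->uid == o->resuid)
                return true;
        if (o->resgid != 0 && c->gid == o->resgid)
                return true;
        return false;
}
```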

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: implement cgroup writeback support
Yufen Yu [Tue, 9 Jan 2018 11:33:39 +0000 (19:33 +0800)]
f2fs: implement cgroup writeback support

Cgroup writeback requires explicit support from the filesystem.
f2fs's data and node writeback IOs go through __write_data_page,
which sets fio for submitting IOs. So, we add io_wbc for fio,
associate bios with blkcg by invoking wbc_init_bio(), and
account for issued IOs via wbc_account_io().
In addition, f2fs_fill_super() is updated to set SB_I_CGROUPWB.

Meta writeback IOs are left alone by this patch and will always be
attributed to the root cgroup.

The results show that f2fs can throttle writeback nicely for
data writing and file creating.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: remove unused pend_list_tag
Chao Yu [Mon, 8 Jan 2018 10:48:34 +0000 (18:48 +0800)]
f2fs: remove unused pend_list_tag

In commit 78997b569f56 ("f2fs: split discard policy"), we got rid
of the use of the pend_list_tag field in struct discard_cmd_control, but
forgot to remove it. Do it now.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: avoid high cpu usage in discard thread
Chao Yu [Mon, 8 Jan 2018 10:48:33 +0000 (18:48 +0800)]
f2fs: avoid high cpu usage in discard thread

generic/476 takes a very long time to finish because we check the
consistency of all discard entries in the global rb tree while
traversing each of the pending lists of different granularity, even when
a list is empty. To avoid that unneeded overhead, skip the check when we
come across an empty list.

generic/476 time consumption:
                                        cost
Before patch & w/o consistency check    57s
Before patch & w/ consistency check     1426s
After patch                             78s

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: make local functions static
Wei Yongjun [Fri, 5 Jan 2018 09:41:20 +0000 (09:41 +0000)]
f2fs: make local functions static

Fixes the following sparse warnings:

fs/f2fs/segment.c:887:6: warning:
 symbol '__check_sit_bitmap' was not declared. Should it be static?
fs/f2fs/segment.c:1327:6: warning:
 symbol 'f2fs_wait_discard_bio' was not declared. Should it be static?
fs/f2fs/super.c:1661:5: warning:
 symbol 'f2fs_get_projid' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: add reserved blocks for root user
Jaegeuk Kim [Wed, 27 Dec 2017 23:05:52 +0000 (15:05 -0800)]
f2fs: add reserved blocks for root user

This patch allows root to reserve some blocks via mount option.

"-o reserve_root=N" means N x 4KB-sized blocks for root only.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: check segment type in __f2fs_replace_block
Yunlong Song [Thu, 4 Jan 2018 07:02:02 +0000 (15:02 +0800)]
f2fs: check segment type in __f2fs_replace_block

In some cases, a node block has a wrong blkaddr whose segment type is
NODE, e.g., a recovered inode is missing its xattr flag and the blkaddr is in
the xattr range. Since fsck.f2fs does not check the recovery nodes, this
causes __f2fs_replace_block to change the curseg of node and do
update_sit_entry(sbi, new_blkaddr, 1) without refreshing next_blkoff. As a
result, when the recovery process writes the checkpoint and syncs nodes, the
next_blkoff of curseg is already marked as used in the segment bitmap, which
triggers f2fs_bug_on. So let's check the segment type in __f2fs_replace_block.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: update inode info to inode page for new file
Yunlei He [Wed, 3 Jan 2018 10:03:04 +0000 (18:03 +0800)]
f2fs: update inode info to inode page for new file

After checkpoint,
 1. create a new file A (with dirty inode && dirty inode page && xattr info)
 2. background wb writes back file A's inode page (without update from inode cache)
 3. fsync file A, write back inode page of file A with inode cache info
 4. sudden power off before new checkpoint

In this case, the recovery process will try to recover a zero inode
page. The inline xattr flag of file A will be missing and the xattr info
will be taken as a blkaddr index.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: show precise # of blocks that user/root can use
Jaegeuk Kim [Wed, 3 Jan 2018 18:55:07 +0000 (10:55 -0800)]
f2fs: show precise # of blocks that user/root can use

Let's show precise # of blocks that user/root can use through bavail and bfree
respectively.
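
The relation between the two counters can be sketched as follows. This is a
minimal model, assuming bfree is what root can use and bavail is bfree minus
the blocks reserved for root (for example via reserve_root=N, counted in 4KB
blocks), clamped at zero. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical pair of counters in the spirit of statfs f_bfree/f_bavail. */
struct sketch_statfs {
        uint64_t bfree;         /* blocks root can use */
        uint64_t bavail;        /* blocks ordinary users can use */
};

static struct sketch_statfs sketch_free_counts(uint64_t free_blocks,
                                               uint64_t root_reserved)
{
        struct sketch_statfs s;

        s.bfree = free_blocks;
        /* Never report a negative (wrapped) value once the reserve is hit. */
        s.bavail = free_blocks > root_reserved ?
                        free_blocks - root_reserved : 0;
        return s;
}
```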

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: clean up unneeded declaration
Chao Yu [Wed, 3 Jan 2018 09:32:51 +0000 (17:32 +0800)]
f2fs: clean up unneeded declaration

Commit 6afc662e68b5 ("f2fs: support flexible inline xattr size")
declared f2fs_sb_has_flexible_inline_xattr in f2fs.h for later
use in get_inline_xattr_addrs, but in a later version the related code
was changed, leaving f2fs_sb_has_flexible_inline_xattr w/o any
users. Let's remove it for cleanup.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: continue to do direct IO if we only preallocate partial blocks
Chao Yu [Wed, 3 Jan 2018 09:30:19 +0000 (17:30 +0800)]
f2fs: continue to do direct IO if we only preallocate partial blocks

While doing direct IO, if we run out of space when we preallocate blocks,
we should not return the ENOSPC error directly. Instead, we should continue
with the following direct IO, which keeps direct IO on f2fs acting like
other filesystems.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: enable quota at remount from r to w
Jaegeuk Kim [Tue, 2 Jan 2018 19:03:19 +0000 (11:03 -0800)]
f2fs: enable quota at remount from r to w

We have to enable quota only when remounting from read-only to read-write.
Otherwise, we'll get a remount failure (e.g., in the write-to-write case).

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: skip stop_checkpoint for user data writes
Jaegeuk Kim [Mon, 1 Jan 2018 00:26:38 +0000 (16:26 -0800)]
f2fs: skip stop_checkpoint for user data writes

We can give another chance to write user data, which can resolve
generic/441.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: fix missing error number for xattr operation
Jaegeuk Kim [Fri, 29 Dec 2017 01:47:19 +0000 (17:47 -0800)]
f2fs: fix missing error number for xattr operation

This fixes the generic/449 hang caused by setxattr never returning ENOSPC,
which it should return in the disk-full scenario.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: recover directory operations by fsync
Jaegeuk Kim [Thu, 28 Dec 2017 16:09:44 +0000 (08:09 -0800)]
f2fs: recover directory operations by fsync

This fixes generic/342, which fails to recover a renamed file that was fsynced
before. Recovery is done via another fsync on the newly created file.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: return error during fill_super
Jaegeuk Kim [Wed, 20 Dec 2017 03:16:34 +0000 (19:16 -0800)]
f2fs: return error during fill_super

Let's avoid BUG_ON during fill_super when the on-disk data is totally corrupted.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: fix an error case of missing update inode page
Yunlei He [Tue, 5 Dec 2017 04:07:47 +0000 (12:07 +0800)]
f2fs: fix an error case of missing update inode page

-Thread A                             Thread B

-write_checkpoint
 -block_operations
  -f2fs_unlock_all                    -f2fs_sync_file
                                       -f2fs_write_inode
                                        -f2fs_inode_synced
    -f2fs_sync_inode_meta
     -sync_node_pages
                                        -set_page_dirty

In this case, if a sudden power off occurs without a new checkpoint,
the last inode page update will be lost. wb_writeback behaves the same as
fsync.

Yunlei also reproduced the bug by:

@@ -366,7 +366,7 @@ int update_inode(struct inode *inode, struct page *node_page)
        struct extent_tree *et = F2FS_I(inode)->extent_tree;

        f2fs_inode_synced(inode);
-
+       msleep(10000);
        f2fs_wait_on_page_writeback(node_page, NODE, true);

shell 1:                                       shell2:

dd if=/dev/zero of=./test bs=1M count=10
sync
echo "hello" >> ./test
fsync test  // sleep 10s
                                               sync //return quickly
echo c > /proc/sysrq-trigger

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: fix potential hangtask in f2fs_trace_pid
Chao Yu [Tue, 12 Dec 2017 06:11:40 +0000 (14:11 +0800)]
f2fs: fix potential hangtask in f2fs_trace_pid

As Jia-Ju Bai reported:

"According to fs/f2fs/trace.c, the kernel module may sleep under a spinlock.
The function call path is:
f2fs_trace_pid (acquire the spinlock)
   f2fs_radix_tree_insert
     cond_resched --> may sleep

I do not find a good way to fix it, so I only report.
This possible bug is found by my static analysis tool (DSAC) and my code
review."

Obviously, it's problematic to schedule in the critical region of a spinlock,
which will cause uninterruptible sleep if there is no waker.

This patch changes to use a mutex lock instead of the spinlock to avoid this
condition.

Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: no need return value in restore summary process
Yunlei He [Wed, 6 Dec 2017 03:31:29 +0000 (11:31 +0800)]
f2fs: no need return value in restore summary process

There is no need for a return value in the restore summary process.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: use unlikely for release case
LiFan [Tue, 5 Dec 2017 08:38:01 +0000 (16:38 +0800)]
f2fs: use unlikely for release case

Since the variable release is only nonzero when another unlikely
case occurs, using unlikely() on it seems logical.
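
For readers unfamiliar with the hint, here is a small userspace sketch. The
macro has the same shape as the Linux kernel's unlikely() and relies on the
GCC/Clang builtin __builtin_expect. count_released() is a hypothetical
function used only to show where the hint goes.

```c
#include <stddef.h>

/* Branch hint: tell the compiler that x is expected to be false. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: a nonzero release[i] is treated as the rare path. */
static size_t count_released(const int *release, size_t n)
{
        size_t released = 0;

        for (size_t i = 0; i < n; i++) {
                if (unlikely(release[i]))
                        released++;
        }
        return released;
}
```

The hint changes only how the compiler lays out the branch. The result of the
condition is the same with or without it.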

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: don't return value in truncate_data_blocks_range
Chao Yu [Thu, 30 Nov 2017 11:28:23 +0000 (19:28 +0800)]
f2fs: don't return value in truncate_data_blocks_range

No caller cares about the return value of truncate_data_blocks_range, so
remove it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: clean up f2fs_map_blocks
Chao Yu [Thu, 30 Nov 2017 11:28:22 +0000 (19:28 +0800)]
f2fs: clean up f2fs_map_blocks

f2fs_map_blocks():

if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR) {
        if (create) {
                ...
        } else {
                ...
                if (flag == F2FS_GET_BLOCK_FIEMAP &&
                                        blkaddr == NULL_ADDR) {
                        ...
                }
                if (flag != F2FS_GET_BLOCK_FIEMAP ||
                                        blkaddr != NEW_ADDR)
                        goto sync_out;
        }

It means we can break the loop in cases of:
a) flag != F2FS_GET_BLOCK_FIEMAP or
b) flag == F2FS_GET_BLOCK_FIEMAP && blkaddr == NULL_ADDR

Condition b) is the same as previous one, so merge operations of them
for readability.
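
The claim that the exit test reduces to cases a) and b) can be checked
exhaustively, since inside this branch blkaddr is either NEW_ADDR or
NULL_ADDR. The sketch below uses hypothetical enum stand-ins for the kernel
constants and compares both forms over every combination.

```c
#include <stdbool.h>

/* Stand-ins for the kernel constants; the values are illustrative only. */
enum sketch_flag { SKETCH_GET_BLOCK_DEFAULT, SKETCH_GET_BLOCK_FIEMAP };
enum sketch_addr { SKETCH_NULL_ADDR, SKETCH_NEW_ADDR };

/* Exit test as written in the original branch. */
static bool exits_original(enum sketch_flag flag, enum sketch_addr blkaddr)
{
        return flag != SKETCH_GET_BLOCK_FIEMAP || blkaddr != SKETCH_NEW_ADDR;
}

/* Exit test restated as cases a) and b) from the commit message. */
static bool exits_restated(enum sketch_flag flag, enum sketch_addr blkaddr)
{
        bool a = flag != SKETCH_GET_BLOCK_FIEMAP;
        bool b = flag == SKETCH_GET_BLOCK_FIEMAP &&
                 blkaddr == SKETCH_NULL_ADDR;

        return a || b;
}

/* Returns true if both forms agree on every flag/address combination. */
static bool exit_tests_agree(void)
{
        for (int f = SKETCH_GET_BLOCK_DEFAULT; f <= SKETCH_GET_BLOCK_FIEMAP; f++)
                for (int a = SKETCH_NULL_ADDR; a <= SKETCH_NEW_ADDR; a++)
                        if (exits_original((enum sketch_flag)f,
                                           (enum sketch_addr)a) !=
                            exits_restated((enum sketch_flag)f,
                                           (enum sketch_addr)a))
                                return false;
        return true;
}
```

The only combination that stays in the loop is FIEMAP with NEW_ADDR, which is
why the FIEMAP and NULL_ADDR handling can be merged into the exit path.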

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: clean up hash codes
Chao Yu [Thu, 30 Nov 2017 11:28:21 +0000 (19:28 +0800)]
f2fs: clean up hash codes

f2fs_chksum and f2fs_crc32 use the same 'crc32' crypto engine, and
their implementations are almost the same, except for a different
shash description context.

Introduce __f2fs_crc32 to wrap the common codes, and reuse it in
f2fs_chksum and f2fs_crc32.
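
The refactoring pattern (one shared core, two thin wrappers) can be sketched
in userspace. This is not the kernel code: it replaces the crypto shash API
with a plain bitwise CRC-32 update, and it assumes the two callers differ
only in the starting value they feed in. All sketch_* names and the constant
seed are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FIXED_SEED UINT32_C(0xF2F52010)  /* illustrative constant */

/* Shared core: bitwise, reflected CRC-32 update (polynomial 0xEDB88320). */
static uint32_t sketch_crc32_core(uint32_t crc, const void *data, size_t len)
{
        const unsigned char *p = data;

        for (size_t i = 0; i < len; i++) {
                crc ^= p[i];
                for (int bit = 0; bit < 8; bit++)
                        crc = (crc >> 1) ^
                              (UINT32_C(0xEDB88320) & (0u - (crc & 1u)));
        }
        return crc;
}

/* Wrapper 1: checksum that always starts from a fixed seed. */
static uint32_t sketch_crc32(const void *data, size_t len)
{
        return sketch_crc32_core(SKETCH_FIXED_SEED, data, len);
}

/* Wrapper 2: checksum that continues from a caller-supplied value. */
static uint32_t sketch_chksum(uint32_t seed, const void *data, size_t len)
{
        return sketch_crc32_core(seed, data, len);
}
```

Keeping the loop in one place means a fix to the core reaches both callers.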

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: fix error handling in fill_super
Chao Yu [Thu, 30 Nov 2017 11:28:20 +0000 (19:28 +0800)]
f2fs: fix error handling in fill_super

In fill_super, if f2fs_build_stats() fails, we need to detach from the
global f2fs shrink list; otherwise, once the system starts to shrink slab
cache, we will encounter the panic below:

BUG: unable to handle kernel paging request at 00007d35
Oops: 0002 [#1] PREEMPT SMP
EIP: __lock_acquire+0x70/0x12c0
Call Trace:
 lock_acquire+0xae/0x220
 mutex_trylock+0xc5/0xf0
 f2fs_shrink_count+0x32/0xb0 [f2fs]
 shrink_slab+0xf1/0x5b0
 drop_slab_node+0x35/0x60
 drop_slab+0xf/0x20
 drop_caches_sysctl_handler+0x79/0xc0
 proc_sys_call_handler+0xa4/0xc0
 proc_sys_write+0x1f/0x30
 __vfs_write+0x24/0x150
 SyS_write+0x44/0x90
 do_fast_syscall_32+0xa1/0x1ca
 entry_SYSENTER_32+0x4c/0x7b

In addition, this patch relocates f2fs_join_shrinker in fill_super to
avoid unneeded error handling of it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: spread f2fs_k{m,z}alloc
Chao Yu [Thu, 30 Nov 2017 11:28:19 +0000 (19:28 +0800)]
f2fs: spread f2fs_k{m,z}alloc

Use f2fs_k{m,z}alloc as much as possible to increase fault injection
points.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: inject fault to kvmalloc
Chao Yu [Thu, 30 Nov 2017 11:28:18 +0000 (19:28 +0800)]
f2fs: inject fault to kvmalloc

This patch adds support for injecting faults into kvmalloc/kvzalloc.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: inject fault to kzalloc
Chao Yu [Thu, 30 Nov 2017 11:28:17 +0000 (19:28 +0800)]
f2fs: inject fault to kzalloc

This patch introduces f2fs_kzalloc based on f2fs_kmalloc in order to
support error injection for kzalloc().
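
The idea of a zeroing allocator built on a fault-injecting one can be
sketched in userspace with malloc(). This is an illustration only: the
sketch_* names and the every-Nth-call injector are hypothetical and do not
reflect how f2fs decides when to inject.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical injector: fail every interval-th allocation (0 disables). */
struct sketch_fault {
        unsigned int interval;
        unsigned int calls;
};

static bool sketch_time_to_inject(struct sketch_fault *f)
{
        if (f->interval == 0)
                return false;
        if (++f->calls < f->interval)
                return false;
        f->calls = 0;
        return true;
}

/* malloc wrapper with an injection point. */
static void *sketch_kmalloc(struct sketch_fault *f, size_t size)
{
        if (sketch_time_to_inject(f))
                return NULL;    /* simulated allocation failure */
        return malloc(size);
}

/* Zeroing variant built on the wrapper, so it shares the injection point. */
static void *sketch_kzalloc(struct sketch_fault *f, size_t size)
{
        void *p = sketch_kmalloc(f, size);

        if (p)
                memset(p, 0, size);
        return p;
}
```

Routing every allocation through such wrappers is what makes each call site
a potential fault injection point, so error paths can be exercised in tests.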

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: remove a redundant conditional expression
LiFan [Tue, 28 Nov 2017 12:17:41 +0000 (20:17 +0800)]
f2fs: remove a redundant conditional expression

Avoid checking is_inode repeatedly, and make the logic
a little bit clearer.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: apply write hints to select the type of segment for direct write
Hyunchul Lee [Tue, 28 Nov 2017 00:23:00 +0000 (09:23 +0900)]
f2fs: apply write hints to select the type of segment for direct write

When blocks are allocated for direct write, select the type of
segment using the kiocb hint. But if an inode has FI_NO_ALLOC,
use the inode hint.

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: switch to fscrypt_prepare_setattr()
Eric Biggers [Wed, 29 Nov 2017 20:35:32 +0000 (12:35 -0800)]
f2fs: switch to fscrypt_prepare_setattr()

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: switch to fscrypt_prepare_lookup()
Eric Biggers [Wed, 29 Nov 2017 20:35:31 +0000 (12:35 -0800)]
f2fs: switch to fscrypt_prepare_lookup()

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: switch to fscrypt_prepare_rename()
Eric Biggers [Wed, 29 Nov 2017 20:35:30 +0000 (12:35 -0800)]
f2fs: switch to fscrypt_prepare_rename()

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: switch to fscrypt_prepare_link()
Eric Biggers [Wed, 29 Nov 2017 20:35:29 +0000 (12:35 -0800)]
f2fs: switch to fscrypt_prepare_link()

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: switch to fscrypt_file_open()
Eric Biggers [Wed, 29 Nov 2017 20:35:28 +0000 (12:35 -0800)]
f2fs: switch to fscrypt_file_open()

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years ago f2fs: remove repeated f2fs_bug_on
Zhikang Zhang [Sat, 25 Nov 2017 18:34:28 +0000 (02:34 +0800)]
f2fs: remove repeated f2fs_bug_on

Remove the repeated f2fs_bug_on, which already exists
in function invalidate_blocks.

Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove an excess variable
LiFan [Sat, 25 Nov 2017 03:46:18 +0000 (11:46 +0800)]
f2fs: remove an excess variable

Remove the variable page_idx which no one would miss.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix lock dependency in between dio_rwsem & i_mmap_sem
Chao Yu [Thu, 23 Nov 2017 15:26:52 +0000 (23:26 +0800)]
f2fs: fix lock dependency in between dio_rwsem & i_mmap_sem

test/generic/208 reports a potential deadlock as below:

Chain exists of:
  &mm->mmap_sem --> &fi->i_mmap_sem --> &fi->dio_rwsem[WRITE]

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&fi->dio_rwsem[WRITE]);
                               lock(&fi->i_mmap_sem);
                               lock(&fi->dio_rwsem[WRITE]);
  lock(&mm->mmap_sem);

This patch changes the lock dependency as below in fallocate() to
fix this issue:
- dio_rwsem
 - i_mmap_sem

Fixes: bb06664a534b ("f2fs: avoid race in between GC and block exchange")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove unused parameter
Sheng Yong [Wed, 22 Nov 2017 10:23:40 +0000 (18:23 +0800)]
f2fs: remove unused parameter

Commit d260081ccf37 ("f2fs: change recovery policy of xattr node block")
removes the use of blkaddr, which is no longer used. So remove the
parameter.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: still write data if preallocate only partial blocks
Sheng Yong [Wed, 22 Nov 2017 10:23:39 +0000 (18:23 +0800)]
f2fs: still write data if preallocate only partial blocks

If there is not enough space left, f2fs_preallocate_blocks may only
preallocate some of the blocks. As a result, the write operation fails
but i_blocks is not 0.  To avoid this, f2fs should write data in a
non-preallocation way and write as much data as the size of i_blocks.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: introduce sysfs readdir_ra to readahead inode block in readdir
Sheng Yong [Wed, 22 Nov 2017 10:23:38 +0000 (18:23 +0800)]
f2fs: introduce sysfs readdir_ra to readahead inode block in readdir

This patch introduces a sysfs interface readdir_ra to enable/disable
readahead of inode blocks in f2fs_readdir. When readdir_ra is enabled,
it improves the performance of "readdir + stat".

For 300,000 files:
time find /data/test > /dev/null
disable readdir_ra: 1m25.69s real  0m01.94s user  0m50.80s system
enable  readdir_ra: 0m18.55s real  0m00.44s user  0m15.39s system

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix concurrent problem for updating free bitmap
LiFan [Wed, 22 Nov 2017 08:07:23 +0000 (16:07 +0800)]
f2fs: fix concurrent problem for updating free bitmap

alloc_nid_failed and scan_nat_page can be called at the same time,
and we haven't protected add_free_nid and update_free_nid_bitmap
with the same nid_list_lock. That could lead to

Thread A                                Thread B
- __build_free_nids
 - scan_nat_page
  - add_free_nid
                                        - alloc_nid_failed
                                         - update_free_nid_bitmap
  - update_free_nid_bitmap

scan_nat_page will clear the free bitmap since the nid is PREALLOC_NID,
but alloc_nid_failed needs to set the free bitmap. This results in
a free nid with its free bitmap bit cleared.
This patch updates the bitmap under the same nid_list_lock in add_free_nid,
and uses __GFP_NOFAIL to make sure the status of the free nid is updated
correctly.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove unneeded memory footprint accounting
Chao Yu [Tue, 21 Nov 2017 09:49:54 +0000 (17:49 +0800)]
f2fs: remove unneeded memory footprint accounting

We forgot to remove the memory footprint accounting of per-cpu type
variables; fix it.

Fixes: 35782b233f37 ("f2fs: remove percpu_count due to performance regression")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: no need to read nat block if nat_block_bitmap is set
Yunlei He [Fri, 17 Nov 2017 08:13:38 +0000 (16:13 +0800)]
f2fs: no need to read nat block if nat_block_bitmap is set

No need to read nat block if nat_block_bitmap is set.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: reserve nid resource for quota sysfile
Chao Yu [Thu, 16 Nov 2017 08:59:14 +0000 (16:59 +0800)]
f2fs: reserve nid resource for quota sysfile

During mkfs, quota sysfiles have already occupied nid resources, so the
kernel side needs to adjust the remaining available nid count.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agofscrypt: resolve some cherry-pick bugs
Jaegeuk Kim [Wed, 10 Jan 2018 00:52:25 +0000 (16:52 -0800)]
fscrypt: resolve some cherry-pick bugs

- remove wrong linux/fscrypt.h declared in ext4
- remove obsolete function

Fixes: 734f0d241d2b ("fscrypt: clean up include file mess")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agofscrypt: move to generic async completion
Gilad Ben-Yossef [Wed, 18 Oct 2017 07:00:44 +0000 (08:00 +0100)]
fscrypt: move to generic async completion

fscrypt starts several async crypto ops and waits for them to
complete. Move it over to generic code doing the same.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: introduce crypto wait for async op
Gilad Ben-Yossef [Wed, 18 Oct 2017 07:00:38 +0000 (08:00 +0100)]
crypto: introduce crypto wait for async op

Invoking a possibly async. crypto op and waiting for completion
while correctly handling backlog processing is a common task
in the crypto API implementation and outside users of it.

This patch adds a generic implementation for doing so in
preparation for using it across the board instead of hand
rolled versions.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
CC: Eric Biggers <ebiggers3@gmail.com>
CC: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agofscrypt: lock mutex before checking for bounce page pool
Eric Biggers [Sun, 29 Oct 2017 10:30:19 +0000 (06:30 -0400)]
fscrypt: lock mutex before checking for bounce page pool

fscrypt_initialize(), which allocates the global bounce page pool when
an encrypted file is first accessed, uses "double-checked locking" to
try to avoid locking fscrypt_init_mutex.  However, it doesn't use any
memory barriers, so it's theoretically possible for a thread to observe
a bounce page pool which has not been fully initialized.  This is a
classic bug with "double-checked locking".

While "only a theoretical issue" in the latest kernel, in pre-4.8
kernels the pointer that was checked was not even the last to be
initialized, so it was easily possible for a crash (NULL pointer
dereference) to happen.  This was changed only incidentally by the large
refactor to use fs/crypto/.

Solve both problems in a trivial way that can easily be backported: just
always take the mutex.  It's theoretically less efficient, but it
shouldn't be noticeable in practice as the mutex is only acquired very
briefly once per encrypted file.

Later I'd like to make this use a helper macro like DO_ONCE().  However,
DO_ONCE() runs in atomic context, so we'd need to add a new macro that
allows blocking.
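
The "always take the mutex" approach can be sketched in user space as
follows. This is a minimal illustration, not the fs/crypto/ code:
demo_initialize(), demo_bounce_pool and demo_init_calls are hypothetical
names, and a pthread mutex stands in for fscrypt_init_mutex. The point is
that the pointer is only ever read while the mutex is held, so no memory
barriers are needed.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t demo_init_mutex = PTHREAD_MUTEX_INITIALIZER;
static void *demo_bounce_pool;	/* NULL until initialized */
static int demo_init_calls;	/* counts real allocations, for illustration */

/* Returns 0 on success, -1 if the allocation fails. */
static int demo_initialize(void)
{
	int ret = 0;

	/* No unlocked pre-check: the pointer is only read under the mutex. */
	pthread_mutex_lock(&demo_init_mutex);
	if (!demo_bounce_pool) {
		demo_bounce_pool = calloc(32, 4096);
		if (demo_bounce_pool)
			demo_init_calls++;
		else
			ret = -1;
	}
	pthread_mutex_unlock(&demo_init_mutex);
	return ret;
}
```

Calling demo_initialize() repeatedly allocates the pool only once; later
calls pay for one short lock/unlock pair.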

Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper function - fscrypt_prepare_setattr()
Eric Biggers [Mon, 9 Oct 2017 19:15:44 +0000 (12:15 -0700)]
fscrypt: new helper function - fscrypt_prepare_setattr()

Introduce a helper function for filesystems to call when processing
->setattr() on a possibly-encrypted inode.  It handles enforcing that an
encrypted file can only be truncated if its encryption key is available.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper function - fscrypt_prepare_lookup()
Eric Biggers [Mon, 9 Oct 2017 19:15:43 +0000 (12:15 -0700)]
fscrypt: new helper function - fscrypt_prepare_lookup()

Introduce a helper function which prepares to look up the given dentry
in the given directory.  If the directory is encrypted, it handles
loading the directory's encryption key, setting the dentry's ->d_op to
fscrypt_d_ops, and setting DCACHE_ENCRYPTED_WITH_KEY if the directory's
encryption key is available.

Note: once all filesystems switch over to this, we'll be able to move
fscrypt_d_ops and fscrypt_set_encrypted_dentry() to fscrypt_private.h.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper function - fscrypt_prepare_rename()
Eric Biggers [Mon, 9 Oct 2017 19:15:42 +0000 (12:15 -0700)]
fscrypt: new helper function - fscrypt_prepare_rename()

Introduce a helper function which prepares to rename a file into a
possibly encrypted directory.  It handles loading the encryption keys
for the source and target directories if needed, and it handles
enforcing that if the target directory (and the source directory for a
cross-rename) is encrypted, then the file being moved into the directory
has the same encryption policy as its containing directory.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper function - fscrypt_prepare_link()
Eric Biggers [Mon, 9 Oct 2017 19:15:41 +0000 (12:15 -0700)]
fscrypt: new helper function - fscrypt_prepare_link()

Introduce a helper function which prepares to link an inode into a
possibly-encrypted directory.  It handles setting up the target
directory's encryption key, then verifying that the link won't violate
the constraint that all files in an encrypted directory tree use the
same encryption policy.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper function - fscrypt_file_open()
Eric Biggers [Mon, 9 Oct 2017 19:15:40 +0000 (12:15 -0700)]
fscrypt: new helper function - fscrypt_file_open()

Add a helper function which prepares to open a regular file which may be
encrypted.  It handles setting up the file's encryption key, then
checking that the file's encryption policy matches that of its parent
directory (if the parent directory is encrypted).  It may be set as the
->open() method or it can be called from another ->open() method.

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: new helper function - fscrypt_require_key()
Eric Biggers [Mon, 9 Oct 2017 19:15:39 +0000 (12:15 -0700)]
fscrypt: new helper function - fscrypt_require_key()

Add a helper function which checks if an inode is encrypted, and if so,
tries to set up its encryption key.  This is a pattern which is
duplicated in multiple places in each of ext4, f2fs, and ubifs --- for
example, when a regular file is asked to be opened or truncated.
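
A user-space model of such a helper might look like the sketch below. It is
an assumption-laden illustration, not the kernel implementation: struct
demo_enc_inode, DEMO_S_ENCRYPTED, demo_get_encryption_info() and
demo_require_key() are hypothetical names, and the has_key field merely
simulates whether key setup succeeded.

```c
#include <errno.h>

#define DEMO_S_ENCRYPTED 0x1u	/* hypothetical flag bit */

struct demo_enc_inode {
	unsigned int i_flags;
	int has_key;		/* simulates whether the key is available */
};

/* Stand-in for the key setup step; always succeeds in this sketch. */
static inline int demo_get_encryption_info(struct demo_enc_inode *inode)
{
	(void)inode;
	return 0;
}

/* Unencrypted inodes return 0 after a single flag check. */
static inline int demo_require_key(struct demo_enc_inode *inode)
{
	if (inode->i_flags & DEMO_S_ENCRYPTED) {
		int err = demo_get_encryption_info(inode);

		if (err)
			return err;
		if (!inode->has_key)
			return -ENOKEY;
	}
	return 0;
}
```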

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: remove unneeded empty fscrypt_operations structs
Eric Biggers [Mon, 9 Oct 2017 19:15:38 +0000 (12:15 -0700)]
fscrypt: remove unneeded empty fscrypt_operations structs

In the case where a filesystem has been configured without encryption
support, there is no longer any need to initialize ->s_cop at all, since
none of the methods are ever called.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: remove ->is_encrypted()
Eric Biggers [Mon, 9 Oct 2017 19:15:37 +0000 (12:15 -0700)]
fscrypt: remove ->is_encrypted()

Now that all callers of fscrypt_operations.is_encrypted() have been
switched to IS_ENCRYPTED(), remove ->is_encrypted().

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: switch from ->is_encrypted() to IS_ENCRYPTED()
Eric Biggers [Mon, 9 Oct 2017 19:15:36 +0000 (12:15 -0700)]
fscrypt: switch from ->is_encrypted() to IS_ENCRYPTED()

IS_ENCRYPTED() now gives the same information as
i_sb->s_cop->is_encrypted() but is more efficient, since IS_ENCRYPTED()
is just a simple flag check.  Prepare to remove ->is_encrypted() by
switching all callers to IS_ENCRYPTED().

Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofs, fscrypt: add an S_ENCRYPTED inode flag
Eric Biggers [Mon, 9 Oct 2017 19:15:35 +0000 (12:15 -0700)]
fs, fscrypt: add an S_ENCRYPTED inode flag

Introduce a flag S_ENCRYPTED which can be set in ->i_flags to indicate
that the inode is encrypted using the fscrypt (fs/crypto/) mechanism.

Checking this flag will give the same information that
inode->i_sb->s_cop->is_encrypted(inode) currently does, but will be more
efficient.  This will be useful for adding higher-level helper functions
for filesystems to use.  For example we'll be able to replace this:

	if (ext4_encrypted_inode(inode)) {
		ret = fscrypt_get_encryption_info(inode);
		if (ret)
			return ret;
		if (!fscrypt_has_encryption_key(inode))
			return -ENOKEY;
	}

with this:

	ret = fscrypt_require_key(inode);
	if (ret)
		return ret;

... since we'll be able to retain the fast path for unencrypted files as
a single flag check, using an inline function.  This wasn't possible
before because we'd have had to frequently call through the
->i_sb->s_cop->is_encrypted function pointer, even when the encryption
support was disabled or not being used.

Note: we don't define S_ENCRYPTED to 0 if CONFIG_FS_ENCRYPTION is
disabled because we want to continue to return an error if an encrypted
file is accessed without encryption support, rather than pretending that
it is unencrypted.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: clean up include file mess
Dave Chinner [Mon, 9 Oct 2017 19:15:34 +0000 (12:15 -0700)]
fscrypt: clean up include file mess

Filesystems have to include different header files based on whether they
are compiled with encryption support or not. That's nasty and messy.

Instead, rationalise the headers so we have a single include fscrypt.h
and let it decide what internal implementation to include based on the
__FS_HAS_ENCRYPTION define.  Filesystems set __FS_HAS_ENCRYPTION to 1
before including linux/fscrypt.h if they are built with encryption
support.  Otherwise, they must set __FS_HAS_ENCRYPTION to 0.

Add guards to prevent fscrypt_supp.h and fscrypt_notsupp.h from being
directly included by filesystems.
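
The selection mechanism can be sketched in a single file as below. Only the
__FS_HAS_ENCRYPTION define comes from the description above; the function
demo_get_encryption_info() and the choice of -EOPNOTSUPP for the stub are
assumptions made for illustration, and the two branches stand in for what
the supported and unsupported internal headers would provide.

```c
#include <errno.h>

/* A filesystem sets this to 1 or 0 before including the single header. */
#define __FS_HAS_ENCRYPTION 0

/* What a combined header might do with that define. */
#ifndef __FS_HAS_ENCRYPTION
#error "__FS_HAS_ENCRYPTION must be defined to 0 or 1 before this point"
#endif

#if __FS_HAS_ENCRYPTION
/* Stand-in for the real implementation. */
static inline int demo_get_encryption_info(int inode_id)
{
	(void)inode_id;
	return 0;
}
#else
/* Stand-in for the stub used when encryption is compiled out. */
static inline int demo_get_encryption_info(int inode_id)
{
	(void)inode_id;
	return -EOPNOTSUPP;
}
#endif
```

Using 1 and 0 rather than defined/undefined lets the header reject a
filesystem that forgot to set the define at all.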

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[EB: use 1 and 0 rather than defined/undefined]
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agofscrypt: fix dereference of NULL user_key_payload
Eric Biggers [Mon, 9 Oct 2017 19:46:18 +0000 (12:46 -0700)]
fscrypt: fix dereference of NULL user_key_payload

When an fscrypt-encrypted file is opened, we request the file's master
key from the keyrings service as a logon key, then access its payload.
However, a revoked key has a NULL payload, and we failed to check for
this.  request_key() *does* skip revoked keys, but there is still a
window where the key can be revoked before we acquire its semaphore.

Fix it by checking for a NULL payload, treating it like a key which was
already revoked at the time it was requested.
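
The shape of the check can be sketched as follows. This is a user-space
model with hypothetical types (struct demo_key, struct demo_key_payload) and
a hypothetical function demo_read_master_key(); returning -EKEYREVOKED for a
NULL payload is an assumption chosen to match "treating it like a key which
was already revoked".

```c
#include <errno.h>
#include <stddef.h>

struct demo_key_payload {
	size_t datalen;
	unsigned char data[64];
};

struct demo_key {
	const struct demo_key_payload *payload;	/* NULL once revoked */
};

/* On success stores the key bytes and length and returns 0. */
static int demo_read_master_key(const struct demo_key *key,
				const unsigned char **out, size_t *len)
{
	const struct demo_key_payload *p = key->payload;

	/* The key may have been revoked after it was looked up. */
	if (!p)
		return -EKEYREVOKED;
	*out = p->data;
	*len = p->datalen;
	return 0;
}
```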

Fixes: 88bd6ccdcdd6 ("ext4 crypto: add encryption key management facilities")
Reviewed-by: James Morris <james.l.morris@oracle.com>
Cc: <stable@vger.kernel.org> [v4.1+]
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
6 years agofscrypt: make ->dummy_context() return bool
Eric Biggers [Thu, 22 Jun 2017 19:14:40 +0000 (12:14 -0700)]
fscrypt: make ->dummy_context() return bool

This makes it consistent with ->is_encrypted(), ->empty_dir(), and
fscrypt_dummy_context_enabled().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 years agof2fs: deny accessing encryption policy if encryption is off
Chao Yu [Tue, 14 Nov 2017 11:28:42 +0000 (19:28 +0800)]
f2fs: deny accessing encryption policy if encryption is off

This patch adds missing feature check in encryption ioctl interface.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: inject fault in inc_valid_node_count
Chao Yu [Mon, 13 Nov 2017 09:32:40 +0000 (17:32 +0800)]
f2fs: inject fault in inc_valid_node_count

This patch adds missing fault injection in inc_valid_node_count.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix to clear FI_NO_PREALLOC
Chao Yu [Mon, 13 Nov 2017 09:32:39 +0000 (17:32 +0800)]
f2fs: fix to clear FI_NO_PREALLOC

We need to clear the FI_NO_PREALLOC flag in the error path of
f2fs_file_write_iter, otherwise we will lose the chance to preallocate
blocks at one time in a later write().

Fixes: dc91de78e5e1 ("f2fs: do not preallocate blocks which has wrong buffer")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: expose quota information in debugfs
Jaegeuk Kim [Tue, 14 Nov 2017 01:46:38 +0000 (17:46 -0800)]
f2fs: expose quota information in debugfs

This patch shows # of dirty pages and # of hidden quota files.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: separate nat entry mem alloc from nat_tree_lock
Yunlei He [Fri, 10 Nov 2017 21:36:51 +0000 (13:36 -0800)]
f2fs: separate nat entry mem alloc from nat_tree_lock

This patch splits memory allocation part in nat_entry to avoid lock contention.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: validate before set/clear free nat bitmap
LiFan [Fri, 10 Nov 2017 07:41:42 +0000 (15:41 +0800)]
f2fs: validate before set/clear free nat bitmap

In flush_nat_entries, all dirty nats will be flushed, and if
their new address isn't NULL_ADDR, their bitmaps will be updated.
The free_nid_count of the bitmaps will be increased regardless
of whether the nats have already been occupied before.
This could lead to a wrong free_nid_count.
So this patch checks the status of the bits before actually
setting/clearing them.
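
The idea of checking a bit before touching the counter can be sketched as
below. This is an illustration under simplifying assumptions, not the f2fs
code: struct demo_nat_block and demo_update_free_bit() are hypothetical, and
one 64-bit word stands in for the per-block free nid bitmap, so ofs must be
below 64.

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_nat_block {
	uint64_t free_bits;		/* bit i set means nid i is free */
	unsigned int free_count;	/* kept equal to the number of set bits */
};

/* Sets or clears one bit; the counter moves only if the bit really changes. */
static void demo_update_free_bit(struct demo_nat_block *blk,
				 unsigned int ofs, bool set)
{
	uint64_t mask = UINT64_C(1) << ofs;
	bool was_set = (blk->free_bits & mask) != 0;

	if (set == was_set)
		return;		/* no state change, so no count change */
	if (set) {
		blk->free_bits |= mask;
		blk->free_count++;
	} else {
		blk->free_bits &= ~mask;
		blk->free_count--;
	}
}
```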

Fixes: 586d1492f301 ("f2fs: skip scanning free nid bitmap of full NAT blocks")
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: avoid opened loop codes in __add_ino_entry
Chao Yu [Fri, 10 Nov 2017 01:30:42 +0000 (09:30 +0800)]
f2fs: avoid opened loop codes in __add_ino_entry

__add_ino_entry always succeeds: the ENOMEM failure case is already
handled by using the __GFP_NOFAIL flag, so the additional open-coded
loop is not needed here. Remove it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: apply write hints to select the type of segments for buffered write
Hyunchul Lee [Thu, 9 Nov 2017 05:51:27 +0000 (14:51 +0900)]
f2fs: apply write hints to select the type of segments for buffered write

Write hints help F2FS determine which type of segment is selected
for buffered writes.

This patch implements the mapping from write hints to segment types
as shown below.

  hints               segment type
  -----               ------------
  WRITE_LIFE_SHORT    CURSEG_HOT_DATA
  WRITE_LIFE_EXTREME  CURSEG_COLD_DATA
  others              CURSEG_WARM_DATA

The F2FS policy for hot/cold separation takes precedence over these hints,
and hints are not applied to in-place updates.
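
The table above amounts to a small switch. The sketch below uses
hypothetical DEMO_-prefixed enums that mirror the names in the table; it
does not model the hot/cold separation policy or in-place updates, which the
message says are handled separately.

```c
/* Hypothetical mirrors of the hint and segment-type names in the table. */
enum demo_write_hint {
	DEMO_WRITE_LIFE_NOT_SET,
	DEMO_WRITE_LIFE_NONE,
	DEMO_WRITE_LIFE_SHORT,
	DEMO_WRITE_LIFE_MEDIUM,
	DEMO_WRITE_LIFE_LONG,
	DEMO_WRITE_LIFE_EXTREME
};

enum demo_seg_type {
	DEMO_CURSEG_HOT_DATA,
	DEMO_CURSEG_WARM_DATA,
	DEMO_CURSEG_COLD_DATA
};

/* Maps a write hint to a data segment type as listed in the table. */
static enum demo_seg_type demo_hint_to_seg_type(enum demo_write_hint hint)
{
	switch (hint) {
	case DEMO_WRITE_LIFE_SHORT:
		return DEMO_CURSEG_HOT_DATA;
	case DEMO_WRITE_LIFE_EXTREME:
		return DEMO_CURSEG_COLD_DATA;
	default:
		return DEMO_CURSEG_WARM_DATA;
	}
}
```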

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: introduce scan_curseg_cache for cleanup
Chao Yu [Wed, 8 Nov 2017 09:47:36 +0000 (17:47 +0800)]
f2fs: introduce scan_curseg_cache for cleanup

Commit 4ac912427c42 ("f2fs: introduce free nid bitmap") copied code
from __build_free_nids() into scan_free_nid_bits(), which is redundant.
Introduce one common function, scan_curseg_cache, for cleanup.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: optimize the way of traversing free_nid_bitmap
Fan Li [Tue, 7 Nov 2017 11:14:24 +0000 (19:14 +0800)]
f2fs: optimize the way of traversing free_nid_bitmap

We call scan_free_nid_bits only when there aren't many
free nids left, which means that marked bits in free_nid_bitmap
are supposed to be few, so using find_next_bit_le is more
efficient in such a case.
According to my tests, using find_next_bit_le instead of
test_bit_le cuts the traversal time down to one
third of the original.
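
Why a find-next-bit scan helps on a sparse bitmap can be sketched in user
space as below. This is not the kernel's find_next_bit_le(): the function
demo_find_next_bit() is hypothetical, it ignores the little-endian layout
handling, and it relies on the GCC/Clang builtin __builtin_ctzll. The caller
is assumed to pass an array of at least (nbits + 63) / 64 words.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the index of the first set bit at or after 'start', or 'nbits'
 * if there is none. Bit i lives in words[i / 64] at position i % 64.
 */
static size_t demo_find_next_bit(const uint64_t *words, size_t nbits,
				 size_t start)
{
	while (start < nbits) {
		size_t w = start / 64;
		uint64_t word = words[w] >> (start % 64);

		if (word) {
			size_t bit = start + (size_t)__builtin_ctzll(word);

			return bit < nbits ? bit : nbits;
		}
		/* The rest of this word is zero: skip it in one step. */
		start = (w + 1) * 64;
	}
	return nbits;
}
```

A per-bit test loop visits every position, while this version skips an
all-zero word with a single comparison, which is where the saving on a
mostly empty bitmap comes from.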

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: keep scanning until enough free nids are acquired
Fan Li [Tue, 7 Nov 2017 03:04:33 +0000 (11:04 +0800)]
f2fs: keep scanning until enough free nids are acquired

In the current version, after scan_free_nid_bits, the scan is over if
nid_cnt[FREE_NID] != 0. In most cases, there are still free nids in the
free list during the scan, and scan_free_nid_bits usually can't increase
nid_cnt[FREE_NID]. As a result, __build_free_nids is called many times
without solving the shortage of free nids. This patch fixes that.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: trace checkpoint reason in fsync()
Chao Yu [Mon, 6 Nov 2017 14:51:45 +0000 (22:51 +0800)]
f2fs: trace checkpoint reason in fsync()

This patch slightly changes need_do_checkpoint to return detailed
info that indicates why we need to do a checkpoint, so that the caller
can print it in a trace message.
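
Returning a reason code instead of a boolean can be sketched as below. All
names here (enum demo_cp_reason, struct demo_file_state,
demo_need_do_checkpoint(), demo_cp_reason_name()) and the three example
conditions are hypothetical; the actual set of reasons is defined by the
patch itself.

```c
#include <stdbool.h>

/* 0 means no checkpoint is needed, so the value still works as a boolean. */
enum demo_cp_reason {
	DEMO_CP_NO_NEEDED = 0,
	DEMO_CP_NON_REGULAR,
	DEMO_CP_HARDLINK,
	DEMO_CP_NO_SPACE
};

struct demo_file_state {
	bool is_regular;
	unsigned int nlink;
	bool space_for_roll_forward;
};

/* Reports why a checkpoint is needed rather than only whether it is. */
static enum demo_cp_reason
demo_need_do_checkpoint(const struct demo_file_state *st)
{
	if (!st->is_regular)
		return DEMO_CP_NON_REGULAR;
	if (st->nlink != 1)
		return DEMO_CP_HARDLINK;
	if (!st->space_for_roll_forward)
		return DEMO_CP_NO_SPACE;
	return DEMO_CP_NO_NEEDED;
}

/* Maps a reason to a short string that a trace message could print. */
static const char *demo_cp_reason_name(enum demo_cp_reason reason)
{
	switch (reason) {
	case DEMO_CP_NON_REGULAR:
		return "non regular";
	case DEMO_CP_HARDLINK:
		return "hardlink";
	case DEMO_CP_NO_SPACE:
		return "no space";
	default:
		return "no needed";
	}
}
```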

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: keep isize once block is reserved cross EOF
Chao Yu [Sun, 5 Nov 2017 13:53:30 +0000 (21:53 +0800)]
f2fs: keep isize once block is reserved cross EOF

Without FADVISE_KEEP_SIZE_BIT, we will try to recover the file size
according to the last non-hole block, so in fallocate(), we must set
the FADVISE_KEEP_SIZE_BIT flag once we have preallocated a block across
EOF, instead of only when all preallocation succeeds. Otherwise, the file
size will be incorrect due to the lack of this flag.

Simple testcase to reproduce this:

1. echo 2 > /sys/fs/f2fs/<device>/inject_type
2. echo 10 > /sys/fs/f2fs/<device>/inject_rate
3. run tests/generic/392
4. disable fault injection
5. do remount

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: avoid race in between GC and block exchange
Chao Yu [Fri, 3 Nov 2017 02:21:05 +0000 (10:21 +0800)]
f2fs: avoid race in between GC and block exchange

During block exchange in {insert,collapse,move}_range, the page-block
mapping is unstable due to mapping moving or recovery, so there should be
no concurrent cache read operation relying on such mapping, nor any cache
write operation that could mess up the block exchange.

So this patch lets background GC be aware of that.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: save a multiplication for last_nid calculation
Fan Li [Thu, 2 Nov 2017 03:02:52 +0000 (11:02 +0800)]
f2fs: save a multiplication for last_nid calculation

Use a slightly easier way to calculate last_nid.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: fix summary info corruption
Chao Yu [Thu, 2 Nov 2017 12:41:03 +0000 (20:41 +0800)]
f2fs: fix summary info corruption

Sometimes, after running generic/270 of fstest, fsck reports summary
info and actual position of block address in direct node becoming
inconsistent.

The root cause is a race between __f2fs_replace_block and change_curseg
as below:

Thread A                                Thread B
- __clone_blkaddrs
 - f2fs_replace_block
  - __f2fs_replace_block
   - segnoA = GET_SEGNO(sbi, blkaddrA);
   - type = se->type:=CURSEG_HOT_DATA
   - if (!IS_CURSEG(sbi, segnoA))
         type = CURSEG_WARM_DATA
                                        - allocate_data_block
                                         - allocate_segment
                                          - get_ssr_segment
                                          - change_curseg(segnoA, CURSEG_HOT_DATA)
   - change_curseg(segnoA, CURSEG_WARM_DATA)
    - reset_curseg
     - __set_sit_entry_type
      - change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA

So finally, the hot curseg is located in segnoA, but the type of segnoA
becomes CURSEG_WARM_DATA.

Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
as blkaddrA is located in segnoA, we will move the warm type curseg to
segnoA, then change its summary cache and write it back to the summary block.

But segnoA is used by the hot type curseg too; once it moves or persists, it
will overwrite the summary block content with its old inner summary cache,
resulting in an inconsistent status.

This patch tries to fix this issue by introducing a global curseg lock to
avoid the race between __f2fs_replace_block and change_curseg.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
6 years agof2fs: remove dead code in update_meta_page
Chao Yu [Thu, 2 Nov 2017 12:41:02 +0000 (20:41 +0800)]
f2fs: remove dead code in update_meta_page

After commit a468f0ef516f ("f2fs: use crc and cp version to determine
roll-forward recovery"), last caller of update_meta_page passing @src
with NULL is gone, so remove related dead code there.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>