OSDN Git Service
Chao Yu [Mon, 15 Apr 2019 07:26:32 +0000 (15:26 +0800)]
f2fs: introduce DATA_GENERIC_ENHANCE
Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check
whether @blkaddr locates in main area or not.
That check is weak, since the block address in range of main area can
point to the address which is not valid in segment info table, and we
can not detect such condition, we may suffer worse corruption as system
continues running.
So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check
which trigger SIT bitmap check rather than only range check.
This patch did below changes as wel:
- set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
- get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
- introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
- spread blkaddr check in:
* f2fs_get_node_info()
* __read_out_blkaddrs()
* f2fs_submit_page_read()
* ra_data_block()
* do_recover_data()
This patch can fix bug reported from bugzilla below:
https://bugzilla.kernel.org/show_bug.cgi?id=203215
https://bugzilla.kernel.org/show_bug.cgi?id=203223
https://bugzilla.kernel.org/show_bug.cgi?id=203231
https://bugzilla.kernel.org/show_bug.cgi?id=203235
https://bugzilla.kernel.org/show_bug.cgi?id=203241
= Update by Jaegeuk Kim =
DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths.
But, xfstest/generic/446 compalins some generated kernel messages saying invalid
bitmap was detected when reading a block. The reaons is, when we get the
block addresses from extent_cache, there is no lock to synchronize it from
truncating the blocks in parallel.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 26 Apr 2019 09:57:54 +0000 (17:57 +0800)]
f2fs: fix to handle error in f2fs_disable_checkpoint()
In f2fs_disable_checkpoint(), it needs to detect and propagate error
number returned from f2fs_write_checkpoint().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chengguang Xu [Tue, 23 Apr 2019 05:08:35 +0000 (13:08 +0800)]
f2fs: remove redundant check in f2fs_file_write_iter()
We have already checked flag IOCB_DIRECT in the sanity
check of flag IOCB_NOWAIT, so don't have to check it
again here.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 22 Apr 2019 12:22:38 +0000 (20:22 +0800)]
f2fs: fix to be aware of readonly device in write_checkpoint()
As Park Ju Hyung reported:
Probably unrelated but a similar issue:
Warning appears upon unmounting a corrupted R/O f2fs loop image.
Should be a trivial issue to fix as well :)
[ 2373.758424] ------------[ cut here ]------------
[ 2373.758428] generic_make_request: Trying to write to read-only
block-device loop1 (partno 0)
[ 2373.758455] WARNING: CPU: 1 PID: 13950 at block/blk-core.c:2174
generic_make_request_checks+0x590/0x630
[ 2373.758556] CPU: 1 PID: 13950 Comm: umount Tainted: G O
4.19.35-zen+ #1
[ 2373.758558] Hardware name: System manufacturer System Product
Name/ROG MAXIMUS X HERO (WI-FI AC), BIOS 1704 09/14/2018
[ 2373.758564] RIP: 0010:generic_make_request_checks+0x590/0x630
[ 2373.758567] Code: 5c 03 00 00 48 8d 74 24 08 48 89 df c6 05 b5 cd
36 01 01 e8 c2 90 01 00 48 89 c6 44 89 ea 48 c7 c7 98 64 59 82 e8 d5
9b a7 ff <0f> 0b 48 8b 7b 08 e9 f2 fa ff ff 41 8b 86 98 02 00 00 49 8b
16 89
[ 2373.758570] RSP: 0018:
ffff8882bdb43950 EFLAGS:
00010282
[ 2373.758573] RAX:
0000000000000050 RBX:
ffff8887244c6700 RCX:
0000000000000006
[ 2373.758575] RDX:
0000000000000007 RSI:
0000000000000086 RDI:
ffff88884ec56340
[ 2373.758577] RBP:
ffff888849c426c0 R08:
0000000000000004 R09:
00000000000003ba
[ 2373.758579] R10:
0000000000000001 R11:
0000000000000029 R12:
0000000000001000
[ 2373.758581] R13:
0000000000000000 R14:
ffff888844a2e800 R15:
ffff8882bdb43ac0
[ 2373.758584] FS:
00007fc0d114f8c0(0000) GS:
ffff88884ec40000(0000)
knlGS:
0000000000000000
[ 2373.758586] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 2373.758588] CR2:
00007fc0d1ad12c0 CR3:
00000002bdb82003 CR4:
00000000003606e0
[ 2373.758590] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 2373.758592] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 2373.758593] Call Trace:
[ 2373.758602] ? generic_make_request+0x46/0x3d0
[ 2373.758608] ? wait_woken+0x80/0x80
[ 2373.758612] ? mempool_alloc+0xb7/0x1a0
[ 2373.758618] ? submit_bio+0x30/0x110
[ 2373.758622] ? bvec_alloc+0x7c/0xd0
[ 2373.758628] ? __submit_merged_bio+0x68/0x390
[ 2373.758633] ? f2fs_submit_page_write+0x1bb/0x7f0
[ 2373.758638] ? f2fs_do_write_meta_page+0x7f/0x160
[ 2373.758642] ? __f2fs_write_meta_page+0x70/0x140
[ 2373.758647] ? f2fs_sync_meta_pages+0x140/0x250
[ 2373.758653] ? f2fs_write_checkpoint+0x5c5/0x17b0
[ 2373.758657] ? f2fs_sync_fs+0x9c/0x110
[ 2373.758664] ? sync_filesystem+0x66/0x80
[ 2373.758667] ? generic_shutdown_super+0x1d/0x100
[ 2373.758670] ? kill_block_super+0x1c/0x40
[ 2373.758674] ? kill_f2fs_super+0x64/0xb0
[ 2373.758678] ? deactivate_locked_super+0x2d/0xb0
[ 2373.758682] ? cleanup_mnt+0x65/0xa0
[ 2373.758688] ? task_work_run+0x7f/0xa0
[ 2373.758693] ? exit_to_usermode_loop+0x9c/0xa0
[ 2373.758698] ? do_syscall_64+0xc7/0xf0
[ 2373.758703] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2373.758706] ---[ end trace
5d3639907c56271b ]---
[ 2373.758780] print_req_error: I/O error, dev loop1, sector 143048
[ 2373.758800] print_req_error: I/O error, dev loop1, sector 152200
[ 2373.758808] print_req_error: I/O error, dev loop1, sector 8192
[ 2373.758819] print_req_error: I/O error, dev loop1, sector 12272
This patch adds to detect readonly device in write_checkpoint() to avoid
trigger write IOs on it.
Reported-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 22 Apr 2019 12:22:37 +0000 (20:22 +0800)]
f2fs: fix to skip recovery on readonly device
As Park Ju Hyung reported in mailing list:
https://sourceforge.net/p/linux-f2fs/mailman/message/
36639787/
generic_make_request: Trying to write to read-only block-device loop0 (partno 0)
WARNING: CPU: 0 PID: 23437 at block/blk-core.c:2174 generic_make_request_checks+0x594/0x630
generic_make_request+0x46/0x3d0
submit_bio+0x30/0x110
__submit_merged_bio+0x68/0x390
f2fs_submit_page_write+0x1bb/0x7f0
f2fs_do_write_meta_page+0x7f/0x160
__f2fs_write_meta_page+0x70/0x140
f2fs_sync_meta_pages+0x140/0x250
f2fs_write_checkpoint+0x5c5/0x17b0
f2fs_sync_fs+0x9c/0x110
sync_filesystem+0x66/0x80
f2fs_recover_fsync_data+0x790/0xa30
f2fs_fill_super+0xe4e/0x1980
mount_bdev+0x518/0x610
mount_fs+0x34/0x13f
vfs_kern_mount.part.11+0x4f/0x120
do_mount+0x2d1/0xe40
__x64_sys_mount+0xbf/0xe0
do_syscall_64+0x4a/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
print_req_error: I/O error, dev loop0, sector 4096
If block device is readonly, we should never trigger write IO from
filesystem layer, but previously, orphan and journal recovery didn't
consider such condition, result in triggering above warning, fix it.
Reported-by: Park Ju Hyung <qkrwngud825@gmail.com>
Tested-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 22 Apr 2019 12:22:36 +0000 (20:22 +0800)]
f2fs: fix to consider multiple device for readonly check
This patch introduce f2fs_hw_is_readonly() to check whether lower
device is readonly or not, it adapts multiple device scenario.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 22 Apr 2019 09:33:53 +0000 (17:33 +0800)]
f2fs: relocate chksum_offset for large_nat_bitmap feature
For large_nat_bitmap feature, there is a design flaw:
Previous:
struct f2fs_checkpoint layout:
+--------------------------+ 0x0000
| checkpoint_ver |
| ...... |
| checksum_offset |------+
| ...... | |
| sit_nat_version_bitmap[] |<-----|-------+
| ...... | | |
| checksum_value |<-----+ |
+--------------------------+ 0x1000 |
| | nat_bitmap + sit_bitmap
| payload blocks | |
| | |
+--------------------------|<-------------+
Obviously, if nat_bitmap size + sit_bitmap size is larger than
MAX_BITMAP_SIZE_IN_CKPT, nat_bitmap or sit_bitmap may overlap
checkpoint checksum's position, once checkpoint() is triggered
from kernel, nat or sit bitmap will be damaged by checksum field.
In order to fix this, let's relocate checksum_value's position
to the head of sit_nat_version_bitmap as below, then nat/sit
bitmap and chksum value update will become safe.
After:
struct f2fs_checkpoint layout:
+--------------------------+ 0x0000
| checkpoint_ver |
| ...... |
| checksum_offset |------+
| ...... | |
| sit_nat_version_bitmap[] |<-----+
| ...... |<-------------+
| | |
+--------------------------+ 0x1000 |
| | nat_bitmap + sit_bitmap
| payload blocks | |
| | |
+--------------------------|<-------------+
Related report and discussion:
https://sourceforge.net/p/linux-f2fs/mailman/message/
36642346/
Reported-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 22 Apr 2019 09:33:52 +0000 (17:33 +0800)]
f2fs: allow unfixed f2fs_checkpoint.checksum_offset
Previously, f2fs_checkpoint.checksum_offset points fixed position of
f2fs_checkpoint structure:
"#define CP_CHKSUM_OFFSET 4092"
It is unnecessary, and it breaks the consecutiveness of nat and sit
bitmap stored across checkpoint park block and payload blocks.
This patch allows f2fs to handle unfixed .checksum_offset.
In addition, for the case checksum value is stored in the middle of
checkpoint park, calculating checksum value with superposition method
like we did for inode_checksum.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Youngjun Yoo [Sat, 20 Apr 2019 13:50:40 +0000 (22:50 +0900)]
f2fs: Replace spaces with tab
Modify coding style
ERROR: code indent should use tabs where possible
Signed-off-by: Youngjun Yoo <youngjun.willow@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Youngjun Yoo [Sat, 20 Apr 2019 13:51:36 +0000 (22:51 +0900)]
f2fs: insert space before the open parenthesis '('
Modify coding style
ERROR: space required before the open parenthesis '('
Signed-off-by: Youngjun Yoo <youngjun.willow@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 25 Mar 2019 13:08:19 +0000 (21:08 +0800)]
f2fs: allow address pointer number of dnode aligning to specified size
This patch expands scalability of dnode layout, it allows address pointer
number of dnode aligning to specified size (now, the size is one byte by
default), and later the number can align to compress cluster size
(1 << n bytes, n=[2,..)), it can avoid cluster acrossing two dnode, making
design of compress meta layout simple.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 25 Mar 2019 13:07:30 +0000 (21:07 +0800)]
f2fs: introduce f2fs_read_single_page() for cleanup
This patch introduces f2fs_read_single_page() to wrap core operations
of reading one page in f2fs_mpage_readpages().
In addition, if we failed in f2fs_mpage_readpages(), propagate error
number to f2fs_read_data_page(), for f2fs_read_data_pages() path,
always return success.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Park Ju Hyung [Wed, 17 Apr 2019 09:57:38 +0000 (18:57 +0900)]
f2fs: mark is_extension_exist() inline
The caller set_file_temperature() is marked as inline as well.
It doesn't make much sense to leave is_extension_exist() un-inlined.
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 15 Apr 2019 07:30:53 +0000 (15:30 +0800)]
f2fs: fix to set FI_UPDATE_WRITE correctly
This patch fixes to set FI_UPDATE_WRITE only if in-place IO was issued.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 15 Apr 2019 07:30:52 +0000 (15:30 +0800)]
f2fs: fix to avoid panic in f2fs_inplace_write_data()
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203239
- Overview
When mounting the attached crafted image and running program, following errors are reported.
Additionally, it hangs on sync after running program.
The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y
- Reproduces
cc poc_15.c
./run.sh f2fs
sync
- Kernel messages
------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.c:3162!
RIP: 0010:f2fs_inplace_write_data+0x12d/0x160
Call Trace:
f2fs_do_write_data_page+0x3c1/0x820
__write_data_page+0x156/0x720
f2fs_write_cache_pages+0x20d/0x460
f2fs_write_data_pages+0x1b4/0x300
do_writepages+0x15/0x60
__filemap_fdatawrite_range+0x7c/0xb0
file_write_and_wait_range+0x2c/0x80
f2fs_do_sync_file+0x102/0x810
do_fsync+0x33/0x60
__x64_sys_fsync+0xb/0x10
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The reason is f2fs_inplace_write_data() will trigger kernel panic due
to data block locates in node type segment.
To avoid panic, let's just return error code and set SBI_NEED_FSCK to
give a hint to fsck for latter repairing.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 15 Apr 2019 07:30:51 +0000 (15:30 +0800)]
f2fs: fix to do sanity check on valid block count of segment
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203233
- Overview
When mounting the attached crafted image and running program, following errors are reported.
Additionally, it hangs on sync after running program.
The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y
- Reproduces
cc poc_13.c
mkdir test
mount -t f2fs tmp.img test
cp a.out test
cd test
sudo ./a.out
sync
- Kernel messages
F2FS-fs (sdb): Bitmap was wrongly set, blk:4608
kernel BUG at fs/f2fs/segment.c:2102!
RIP: 0010:update_sit_entry+0x394/0x410
Call Trace:
f2fs_allocate_data_block+0x16f/0x660
do_write_page+0x62/0x170
f2fs_do_write_node_page+0x33/0xa0
__write_node_page+0x270/0x4e0
f2fs_sync_node_pages+0x5df/0x670
f2fs_write_checkpoint+0x372/0x1400
f2fs_sync_fs+0xa3/0x130
f2fs_do_sync_file+0x1a6/0x810
do_fsync+0x33/0x60
__x64_sys_fsync+0xb/0x10
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
sit.vblocks and sum valid block count in sit.valid_map may be
inconsistent, segment w/ zero vblocks will be treated as free
segment, while allocating in free segment, we may allocate a
free block, if its bitmap is valid previously, it can cause
kernel crash due to bitmap verification failure.
Anyway, to avoid further serious metadata inconsistence and
corruption, it is necessary and worth to detect SIT
inconsistence. So let's enable check_block_count() to verify
vblocks and valid_map all the time rather than do it only
CONFIG_F2FS_CHECK_FS is enabled.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 15 Apr 2019 07:30:50 +0000 (15:30 +0800)]
f2fs: fix to do sanity check on valid node/block count
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203229
- Overview
When mounting the attached crafted image, following errors are reported.
Additionally, it hangs on sync after trying to mount it.
The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y
- Reproduces
mkdir test
mount -t f2fs tmp.img test
sync
- Kernel message
kernel BUG at fs/f2fs/recovery.c:591!
RIP: 0010:recover_data+0x12d8/0x1780
Call Trace:
f2fs_recover_fsync_data+0x613/0x710
f2fs_fill_super+0x1043/0x1aa0
mount_bdev+0x16d/0x1a0
mount_fs+0x4a/0x170
vfs_kern_mount+0x5d/0x100
do_mount+0x200/0xcf0
ksys_mount+0x79/0xc0
__x64_sys_mount+0x1c/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
With corrupted image wihch has out-of-range valid node/block count, during
recovery, once we failed due to no free space, it will trigger kernel
panic.
Adding sanity check on valid node/block count in f2fs_sanity_check_ckpt()
to detect such condition, so that potential panic can be avoided.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 15 Apr 2019 07:28:37 +0000 (15:28 +0800)]
f2fs: fix to avoid panic in do_recover_data()
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203227
- Overview
When mounting the attached crafted image, following errors are reported.
Additionally, it hangs on sync after trying to mount it.
The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y
- Reproduces
mkdir test
mount -t f2fs tmp.img test
sync
- Messages
kernel BUG at fs/f2fs/recovery.c:549!
RIP: 0010:recover_data+0x167a/0x1780
Call Trace:
f2fs_recover_fsync_data+0x613/0x710
f2fs_fill_super+0x1043/0x1aa0
mount_bdev+0x16d/0x1a0
mount_fs+0x4a/0x170
vfs_kern_mount+0x5d/0x100
do_mount+0x200/0xcf0
ksys_mount+0x79/0xc0
__x64_sys_mount+0x1c/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
During recovery, if ofs_of_node is inconsistent in between recovered
node page and original checkpointed node page, let's just fail recovery
instead of making kernel panic.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 15 Apr 2019 07:28:36 +0000 (15:28 +0800)]
f2fs: fix to do sanity check on free nid
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203225
- Overview
When mounting the attached crafted image and unmounting it, following errors are reported.
Additionally, it hangs on sync after unmounting.
The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y
- Reproduces
mkdir test
mount -t f2fs tmp.img test
touch test/t
umount test
sync
- Messages
kernel BUG at fs/f2fs/node.c:3073!
RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
Call Trace:
f2fs_put_super+0xf4/0x270
generic_shutdown_super+0x62/0x110
kill_block_super+0x1c/0x50
kill_f2fs_super+0xad/0xd0
deactivate_locked_super+0x35/0x60
cleanup_mnt+0x36/0x70
task_work_run+0x75/0x90
exit_to_usermode_loop+0x93/0xa0
do_syscall_64+0xba/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
NAT table is corrupted, so reserved meta/node inode ids were added into
free list incorrectly, during file creation, since reserved id has cached
in inode hash, so it fails the creation and preallocated nid can not be
released later, result in kernel panic.
To fix this issue, let's do nid boundary check during free nid loading.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 15 Apr 2019 07:28:35 +0000 (15:28 +0800)]
f2fs: fix to do checksum even if inode page is uptodate
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203221
- Overview
When mounting the attached crafted image and running program, this error is reported.
The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
- Reproduces
cc poc_07.c
mkdir test
mount -t f2fs tmp.img test
cp a.out test
cd test
sudo ./a.out
- Messages
kernel BUG at fs/f2fs/node.c:1279!
RIP: 0010:read_node_page+0xcf/0xf0
Call Trace:
__get_node_page+0x6b/0x2f0
f2fs_iget+0x8f/0xdf0
f2fs_lookup+0x136/0x320
__lookup_slow+0x92/0x140
lookup_slow+0x30/0x50
walk_component+0x1c1/0x350
path_lookupat+0x62/0x200
filename_lookup+0xb3/0x1a0
do_fchmodat+0x3e/0xa0
__x64_sys_chmod+0x12/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
On below paths, we can have opportunity to readahead inode page
- gc_node_segment -> f2fs_ra_node_page
- gc_data_segment -> f2fs_ra_node_page
- f2fs_fill_dentries -> f2fs_ra_node_page
Unlike synchronized read, on readahead path, we can set page uptodate
before verifying page's checksum, then read_node_page() will trigger
kernel panic once it encounters a uptodated page w/ incorrect checksum.
So considering readahead scenario, we have to do checksum each time
when loading inode page even if it is uptodated.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 15 Apr 2019 07:28:34 +0000 (15:28 +0800)]
f2fs: fix to avoid panic in f2fs_remove_inode_page()
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203219
- Overview
When mounting the attached crafted image and running program, I got this error.
Additionally, it hangs on sync after running the program.
The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
- Reproduces
cc poc_06.c
mkdir test
mount -t f2fs tmp.img test
cp a.out test
cd test
sudo ./a.out
sync
- Messages
kernel BUG at fs/f2fs/node.c:1183!
RIP: 0010:f2fs_remove_inode_page+0x294/0x2d0
Call Trace:
f2fs_evict_inode+0x2a3/0x3a0
evict+0xba/0x180
__dentry_kill+0xbe/0x160
dentry_kill+0x46/0x180
dput+0xbb/0x100
do_renameat2+0x3c9/0x550
__x64_sys_rename+0x17/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The reason is f2fs_remove_inode_page() will trigger kernel panic due to
inconsistent i_blocks value of inode.
To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
give a hint to fsck for latter repairing of potential image corruption.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix build warning and add unlikely]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 15 Apr 2019 07:28:33 +0000 (15:28 +0800)]
f2fs: fix to clear dirty inode in error path of f2fs_iget()
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203217
- Overview
When mounting the attached crafted image and running program, I got this error.
Additionally, it hangs on sync after running the program.
The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
- Reproduces
cc poc_test_05.c
mkdir test
mount -t f2fs tmp.img test
sudo ./a.out
sync
- Messages
kernel BUG at fs/f2fs/inode.c:707!
RIP: 0010:f2fs_evict_inode+0x33f/0x3a0
Call Trace:
evict+0xba/0x180
f2fs_iget+0x598/0xdf0
f2fs_lookup+0x136/0x320
__lookup_slow+0x92/0x140
lookup_slow+0x30/0x50
walk_component+0x1c1/0x350
path_lookupat+0x62/0x200
filename_lookup+0xb3/0x1a0
do_readlinkat+0x56/0x110
__x64_sys_readlink+0x16/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
During inode loading, __recover_inline_status() can recovery inode status
and set inode dirty, once we failed in following process, it will fail
the check in f2fs_evict_inode, result in trigger BUG_ON().
Let's clear dirty inode in error path of f2fs_iget() to avoid panic.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 15 Apr 2019 07:28:32 +0000 (15:28 +0800)]
f2fs: remove new blank line of f2fs kernel message
Just removing '\n' in f2fs_msg(, "\n") to avoid redundant new blank line.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 15 Apr 2019 07:26:31 +0000 (15:26 +0800)]
f2fs: fix wrong __is_meta_io() macro
This patch changes codes as below:
- don't use is_read_io() as a condition to judge the meta IO.
- use .is_por to replace .is_meta to indicate IO is from recovery explicitly.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 15 Apr 2019 07:28:31 +0000 (15:28 +0800)]
f2fs: fix to avoid panic in dec_valid_node_count()
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203213
- Overview
When mounting the attached crafted image and running program, I got this error.
Additionally, it hangs on sync after running the this script.
The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
- Reproduces
mkdir test
mount -t f2fs tmp.img test
cp a.out test
cd test
sudo ./a.out
sync
kernel BUG at fs/f2fs/f2fs.h:2012!
RIP: 0010:truncate_node+0x2c9/0x2e0
Call Trace:
f2fs_truncate_xattr_node+0xa1/0x130
f2fs_remove_inode_page+0x82/0x2d0
f2fs_evict_inode+0x2a3/0x3a0
evict+0xba/0x180
__dentry_kill+0xbe/0x160
dentry_kill+0x46/0x180
dput+0xbb/0x100
do_renameat2+0x3c9/0x550
__x64_sys_rename+0x17/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The reason is dec_valid_node_count() will trigger kernel panic due to
inconsistent count in between inode.i_blocks and actual block.
To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
give a hint to fsck for latter repairing.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix build warning and add unlikely]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 15 Apr 2019 07:28:30 +0000 (15:28 +0800)]
f2fs: fix to avoid panic in dec_valid_block_count()
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203209
- Overview
When mounting the attached crafted image and running program, I got this error.
Additionally, it hangs on sync after the this script.
The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
- Reproduces
cc poc_01.c
./run.sh f2fs
sync
kernel BUG at fs/f2fs/f2fs.h:1788!
RIP: 0010:f2fs_truncate_data_blocks_range+0x342/0x350
Call Trace:
f2fs_truncate_blocks+0x36d/0x3c0
f2fs_truncate+0x88/0x110
f2fs_setattr+0x3e1/0x460
notify_change+0x2da/0x400
do_truncate+0x6d/0xb0
do_sys_ftruncate+0xf1/0x160
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The reason is dec_valid_block_count() will trigger kernel panic due to
inconsistent count in between inode.i_blocks and actual block.
To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
give a hint to fsck for latter repairing.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix build warning and add unlikely]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 11 Apr 2019 03:48:10 +0000 (11:48 +0800)]
f2fs: fix to use inline space only if inline_xattr is enable
With below mkfs and mount option:
MKFS_OPTIONS -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f
MOUNT_OPTIONS -- -o noinline_xattr
We may miss xattr data with below testcase:
- mkdir dir
- setfattr -n "user.name" -v 0 dir
- for ((i = 0; i < 190; i++)) do touch dir/$i; done
- umount
- mount
- getfattr -n "user.name" dir
user.name: No such attribute
The root cause is that we persist xattr data into reserved inline xattr
space, even if inline_xattr is not enable in inline directory inode, after
inline dentry conversion, reserved space no longer exists, so that xattr
data missed.
Let's use inline xattr space only if inline_xattr flag is set on inode
to fix this iusse.
Fixes:
6afc662e68b5 ("f2fs: support flexible inline xattr size")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 11 Apr 2019 03:48:09 +0000 (11:48 +0800)]
f2fs: fix to retrieve inline xattr space
With below mkfs and mount option, generic/339 of fstest will report that
scratch image becomes corrupted.
MKFS_OPTIONS -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f /dev/zram1
MOUNT_OPTIONS -- -o acl,user_xattr -o discard,noinline_xattr /dev/zram1 /mnt/scratch_f2fs
[ASSERT] (f2fs_check_dirent_position:1315) --> Wrong position of dirent pino:1970, name: (...)
level:8, dir_level:0, pgofs:951, correct range:[900, 901]
In old kernel, inline data and directory always reserved 200 bytes in
inode layout, even if inline_xattr is disabled, then new kernel tries
to retrieve that space for non-inline xattr inode, but for inline dentry,
its layout size should be fixed, so we just keep that reserved space.
But the problem here is that, after inline dentry conversion, inline
dentry layout no longer exists, if we still reserve inline xattr space,
after dents updates, there will be a hole in inline xattr space, which
can break hierarchy hash directory structure.
This patch fixes this issue by retrieving inline xattr space after
inline dentry conversion.
Fixes:
6afc662e68b5 ("f2fs: support flexible inline xattr size")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 10 Apr 2019 10:45:26 +0000 (18:45 +0800)]
f2fs: fix error path of recovery
There are some places in where we missed to unlock page or unlock page
incorrectly, fix them.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 10 Apr 2019 10:45:50 +0000 (18:45 +0800)]
f2fs: fix to avoid deadloop in foreground GC
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203211
- Overview
When mounting the attached crafted image and making a new file, I got this error and the error messages keep repeating.
The image is intentionally fuzzed from a normal f2fs image for testing and I run with option CONFIG_F2FS_CHECK_FS on.
- Reproduces
mkdir test
mount -t f2fs tmp.img test
cd test
touch t
- Messages
[ 58.820451] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.821485] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.822530] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.823571] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.824616] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.825640] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.826663] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.827698] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.828719] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.829759] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.830783] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.831828] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.832869] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.833888] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.834945] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.835996] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.837028] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.838051] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.839072] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.840100] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.841147] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.842186] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.843214] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.844267] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.845282] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.846305] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.847341] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
... (repeating)
During GC, if segment type stored in SSA and SIT is inconsistent, we just
skip migrating current segment directly, since we need to know the exact
type to decide the migration function we use.
So in foreground GC, we will easily run into a infinite loop as we may
select the same victim segment which has inconsistent type due to greedy
policy. In order to end up this, we choose to shutdown filesystem. For
backgrond GC, we need to do that as well, so that we can avoid latter
potential infinite looped foreground GC.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Hariprasad Kelam [Sat, 6 Apr 2019 10:59:36 +0000 (16:29 +0530)]
f2fs: data: fix warning Using plain integer as NULL pointer
changed passing function argument "0 to NULL" to fix below sparse
warning
fs/f2fs/data.c:426:47: warning: Using plain integer as NULL pointer
Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 2 Apr 2019 10:52:22 +0000 (18:52 +0800)]
f2fs: add tracepoint for f2fs_file_write_iter()
This patch adds tracepoint for f2fs_file_write_iter().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 2 Apr 2019 10:52:20 +0000 (18:52 +0800)]
f2fs: add comment for conditional compilation statement
Commit
af033b2aa8a8 ("f2fs: guarantee journalled quota data by checkpoint")
added function is_journalled_quota() in f2fs.h, but it located outside of
_LINUX_F2FS_H macro coverage, it has been fixed with commit
0af725fcb77a
("f2fs: fix wrong #endif").
But anyway, in order to avoid making same mistake latter, let's add single
line comment to notice which #if the last #endif is corresponding to.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: Remove unnecessary empty EOL]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 2 Apr 2019 10:52:19 +0000 (18:52 +0800)]
f2fs: fix potential recursive call when enabling data_flush
As Hagbard Celine reported:
Hi, this is a long standing bug that I've hit before on older kernels,
but I was not able to get the syslog saved because of the nature of
the bug. This time I had booted form a pen-drive, and was able to save
the log to it's efi-partition.
What i did to trigger it was to create a partition and format it f2fs,
then mount it with options:
"rw,relatime,lazytime,background_gc=on,disable_ext_identify,discard,heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,data_flush,extent_cache,mode=adaptive,active_logs=6,whint_mode=fs-based,alloc_mode=default,fsync_mode=strict".
Then I unpacked a big .tar.xz to the partition (I used a
gentoo-stage3-tarball as I was in process of installing Gentoo).
Same options just without data_flush gives no problems.
Mar 20 20:54:01 usbgentoo kernel: FAT-fs (nvme0n1p4): Volume was not
properly unmounted. Some data may be corrupt. Please run fsck.
Mar 20 21:05:23 usbgentoo kernel: kworker/dying (1588) used greatest
stack depth: 12064 bytes left
Mar 20 21:06:40 usbgentoo kernel: BUG: stack guard page was hit at
00000000a4b0733c (stack is
0000000056016422..
0000000096e7463f)
Mar 20 21:06:40 usbgentoo kernel: kernel stack overflow
......
Mar 20 21:06:40 usbgentoo kernel: Call Trace:
Mar 20 21:06:40 usbgentoo kernel: read_node_page+0x71/0xf0
Mar 20 21:06:40 usbgentoo kernel: ? xas_load+0x8/0x50
Mar 20 21:06:40 usbgentoo kernel: __get_node_page+0x73/0x2a0
Mar 20 21:06:40 usbgentoo kernel: f2fs_get_dnode_of_data+0x34e/0x580
Mar 20 21:06:40 usbgentoo kernel: f2fs_write_inline_data+0x5e/0x2a0
Mar 20 21:06:40 usbgentoo kernel: __write_data_page+0x421/0x690
Mar 20 21:06:40 usbgentoo kernel: f2fs_write_cache_pages+0x1cf/0x460
Mar 20 21:06:40 usbgentoo kernel: f2fs_write_data_pages+0x2b3/0x2e0
Mar 20 21:06:40 usbgentoo kernel: ? f2fs_inode_chksum_verify+0x1d/0xc0
Mar 20 21:06:40 usbgentoo kernel: ? read_node_page+0x71/0xf0
Mar 20 21:06:40 usbgentoo kernel: do_writepages+0x3c/0xd0
Mar 20 21:06:40 usbgentoo kernel: __filemap_fdatawrite_range+0x7c/0xb0
Mar 20 21:06:40 usbgentoo kernel: f2fs_sync_dirty_inodes+0xf2/0x200
Mar 20 21:06:40 usbgentoo kernel: f2fs_balance_fs_bg+0x2a3/0x2c0
Mar 20 21:06:40 usbgentoo kernel: ? f2fs_inode_dirtied+0x21/0xc0
Mar 20 21:06:40 usbgentoo kernel: f2fs_balance_fs+0xd6/0x2b0
Mar 20 21:06:40 usbgentoo kernel: __write_data_page+0x4fb/0x690
......
Mar 20 21:06:40 usbgentoo kernel: __writeback_single_inode+0x2a1/0x340
Mar 20 21:06:40 usbgentoo kernel: ? soft_cursor+0x1b4/0x220
Mar 20 21:06:40 usbgentoo kernel: writeback_sb_inodes+0x1d5/0x3e0
Mar 20 21:06:40 usbgentoo kernel: __writeback_inodes_wb+0x58/0xa0
Mar 20 21:06:40 usbgentoo kernel: wb_writeback+0x250/0x2e0
Mar 20 21:06:40 usbgentoo kernel: ? 0xffffffff8c000000
Mar 20 21:06:40 usbgentoo kernel: ? cpumask_next+0x16/0x20
Mar 20 21:06:40 usbgentoo kernel: wb_workfn+0x2f6/0x3b0
Mar 20 21:06:40 usbgentoo kernel: ? __switch_to_asm+0x40/0x70
Mar 20 21:06:40 usbgentoo kernel: process_one_work+0x1f5/0x3f0
Mar 20 21:06:40 usbgentoo kernel: worker_thread+0x28/0x3c0
Mar 20 21:06:40 usbgentoo kernel: ? rescuer_thread+0x330/0x330
Mar 20 21:06:40 usbgentoo kernel: kthread+0x10e/0x130
Mar 20 21:06:40 usbgentoo kernel: ? kthread_create_on_node+0x60/0x60
Mar 20 21:06:40 usbgentoo kernel: ret_from_fork+0x35/0x40
The root cause is that we run into an infinite recursive calling in
between f2fs_balance_fs_bg and writepage() as described below:
- f2fs_write_data_pages --- A
- __write_data_page
- f2fs_balance_fs
- f2fs_balance_fs_bg --- B
- f2fs_sync_dirty_inodes
- filemap_fdatawrite
- f2fs_write_data_pages --- A
...
- f2fs_balance_fs_bg --- B
...
In order to fix this issue, let's detect such condition in __write_data_page()
and just skip calling f2fs_balance_fs() recursively.
Reported-by: Hagbard Celine <hagbardcelin@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Damien Le Moal [Sat, 16 Mar 2019 00:13:08 +0000 (09:13 +0900)]
f2fs: improve discard handling with multi-device volumes
f2fs_hw_support_discard() only tests if the super block device supports
discard. However, for a multi-device volume, not all disks used may
support discard. Improve the check performed to test all devices of
the volume and report discard as supported if at least one device of
the volume supports discard. To implement this, introduce the helper
function f2fs_bdev_support_discard(), which returns true for zoned block
devices (where discard is processed as a zone reset) and for regular
disks supporting the discard command.
f2fs_bdev_support_discard() is also used in __queue_discard_cmd() to
handle discard command issuing for a particular device of the volume.
That is, prevent issuing a discard command for block devices that do
not support it.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Damien Le Moal [Sat, 16 Mar 2019 00:13:07 +0000 (09:13 +0900)]
f2fs: Reduce zoned block device memory usage
For zoned block devices, an array of zone types for each device is
allocated and initialized in order to determine if a section is stored
on a sequential zone (zone reset needed) or a conventional zone (no
zone reset needed and regular discard applies). Considering this usage,
the zone types stored in memory can be replaced with a bitmap to
indicate an equivalent information, that is, if a zone is sequential or
not. This reduces the memory usage for each zoned device by roughly 8:
on a 14TB disk with zones of 256 MB, the zone type array consumes
13x4KB pages while the bitmap uses only 2x4KB pages.
This patch changes the f2fs_dev_info structure blkz_type field to the
bitmap blkz_seq. Access to this bitmap is done using the helper
function f2fs_blkz_is_seq(), which is a rewrite of the function
get_blkz_type().
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Damien Le Moal [Sat, 16 Mar 2019 00:13:06 +0000 (09:13 +0900)]
f2fs: Fix use of number of devices
For a single device mount using a zoned block device, the zone
information for the device is stored in the sbi->devs single entry
array and sbi->s_ndevs is set to 1. This differs from a single device
mount using a regular block device which does not allocate sbi->devs
and sets sbi->s_ndevs to 0.
However, sbi->s_devs == 0 condition is used throughout the code to
differentiate a single device mount from a multi-device mount where
sbi->s_ndevs is always larger than 1. This results in problems with
single zoned block device volumes as these are treated as multi-device
mounts but do not have the start_blk and end_blk information set. One
of the problem observed is skipping of zone discard issuing resulting in
write commands being issued to full zones or unaligned to a zone write
pointer.
Fix this problem by simply treating the cases sbi->s_ndevs == 0 (single
regular block device mount) and sbi->s_ndevs == 1 (single zoned block
device mount) in the same manner. This is done by introducing the
helper function f2fs_is_multi_device() and using this helper in place
of direct tests of sbi->s_ndevs value, improving code readability.
Fixes:
7bb3a371d199 ("f2fs: Fix zoned block device support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 13 Mar 2019 23:15:08 +0000 (16:15 -0700)]
f2fs: set pin_file under CAP_SYS_ADMIN
Android uses pin_file for uncrypt during OTA, and that should be managed by
CAP_SYS_ADMIN only.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 12 Mar 2019 07:44:27 +0000 (15:44 +0800)]
f2fs: fix to avoid deadlock in f2fs_read_inline_dir()
As Jiqun Li reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=202883
sometimes, dead lock when make system call SYS_getdents64 with fsync() is
called by another process.
monkey running on android9.0
1. task 9785 held sbi->cp_rwsem and waiting lock_page()
2. task 10349 held mm_sem and waiting sbi->cp_rwsem
3. task 9709 held lock_page() and waiting mm_sem
so this is a dead lock scenario.
task stack is show by crash tools as following
crash_arm64> bt
ffffffc03c354080
PID: 9785 TASK:
ffffffc03c354080 CPU: 1 COMMAND: "RxIoScheduler-3"
>> #7 [
ffffffc01b50fac0] __lock_page at
ffffff80081b11e8
crash-arm64> bt 10349
PID: 10349 TASK:
ffffffc018b83080 CPU: 1 COMMAND: "BUGLY_ASYNC_UPL"
>> #3 [
ffffffc01f8cfa40] rwsem_down_read_failed at
ffffff8008a93afc
PC:
00000033 LR:
00000000 SP:
00000000 PSTATE:
ffffffffffffffff
crash-arm64> bt 9709
PID: 9709 TASK:
ffffffc03e7f3080 CPU: 1 COMMAND: "IntentService[A"
>> #3 [
ffffffc001e67850] rwsem_down_read_failed at
ffffff8008a93afc
>> #8 [
ffffffc001e67b80] el1_ia at
ffffff8008084fc4
PC:
ffffff8008274114 [compat_filldir64+120]
LR:
ffffff80083584d4 [f2fs_fill_dentries+448]
SP:
ffffffc001e67b80 PSTATE:
80400145
X29:
ffffffc001e67b80 X28:
0000000000000000 X27:
000000000000001a
X26:
00000000000093d7 X25:
ffffffc070d52480 X24:
0000000000000008
X23:
0000000000000028 X22:
00000000d43dfd60 X21:
ffffffc001e67e90
X20:
0000000000000011 X19:
ffffff80093a4000 X18:
0000000000000000
X17:
0000000000000000 X16:
0000000000000000 X15:
0000000000000000
X14:
ffffffffffffffff X13:
0000000000000008 X12:
0101010101010101
X11:
7f7f7f7f7f7f7f7f X10:
6a6a6a6a6a6a6a6a X9:
7f7f7f7f7f7f7f7f
X8:
0000000080808000 X7:
ffffff800827409c X6:
0000000080808000
X5:
0000000000000008 X4:
00000000000093d7 X3:
000000000000001a
X2:
0000000000000011 X1:
ffffffc070d52480 X0:
0000000000800238
>> #9 [
ffffffc001e67be0] f2fs_fill_dentries at
ffffff80083584d0
PC:
0000003c LR:
00000000 SP:
00000000 PSTATE:
000000d9
X12:
f48a02ff X11:
d4678960 X10:
d43dfc00 X9:
d4678ae4
X8:
00000058 X7:
d4678994 X6:
d43de800 X5:
000000d9
X4:
d43dfc0c X3:
d43dfc10 X2:
d46799c8 X1:
00000000
X0:
00001068
Below potential deadlock will happen between three threads:
Thread A Thread B Thread C
- f2fs_do_sync_file
- f2fs_write_checkpoint
- down_write(&sbi->node_change) -- 1)
- do_page_fault
- down_write(&mm->mmap_sem) -- 2)
- do_wp_page
- f2fs_vm_page_mkwrite
- getdents64
- f2fs_read_inline_dir
- lock_page -- 3)
- f2fs_sync_node_pages
- lock_page -- 3)
- __do_map_lock
- down_read(&sbi->node_change) -- 1)
- f2fs_fill_dentries
- dir_emit
- compat_filldir64
- do_page_fault
- down_read(&mm->mmap_sem) -- 2)
Since f2fs_readdir is protected by inode.i_rwsem, there should not be
any updates in inode page, we're safe to lookup dents in inode page
without its lock held, so taking off the lock to improve concurrency
of readdir and avoid potential deadlock.
Reported-by: Jiqun Li <jiqun.li@unisoc.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 5 Mar 2019 11:32:26 +0000 (19:32 +0800)]
f2fs: fix to adapt small inline xattr space in __find_inline_xattr()
With below testcase, we will fail to find existed xattr entry:
1. mkfs.f2fs -O extra_attr -O flexible_inline_xattr /dev/zram0
2. mount -t f2fs -o inline_xattr_size=1 /dev/zram0 /mnt/f2fs/
3. touch /mnt/f2fs/file
4. setfattr -n "user.name" -v 0 /mnt/f2fs/file
5. getfattr -n "user.name" /mnt/f2fs/file
/mnt/f2fs/file: user.name: No such attribute
The reason is for inode which has very small inline xattr size,
__find_inline_xattr() will fail to traverse any entry due to first
entry may not be loaded from xattr node yet, later, we may skip to
check entire xattr datas in __find_xattr(), result in such wrong
condition.
This patch adds condition to check such case to avoid this issue.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 4 Mar 2019 09:19:04 +0000 (17:19 +0800)]
f2fs: fix to do sanity check with inode.i_inline_xattr_size
As Paul Bandha reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=202709
When I run the poc on the mounted f2fs img I get a buffer overflow in
read_inline_xattr due to there being no sanity check on the value of
i_inline_xattr_size.
I created the img by just modifying the value of i_inline_xattr_size
in the inode:
i_name [test1.txt]
i_ext: fofs:0 blkaddr:0 len:0
i_extra_isize [0x 18 : 24]
i_inline_xattr_size [0x ffff : 65535]
i_addr[ofs] [0x 0 : 0]
mkdir /mnt/f2fs
mount ./f2fs1.img /mnt/f2fs
gcc poc.c -o poc
./poc
int main() {
int y = syscall(SYS_listxattr, "/mnt/f2fs/test1.txt", NULL, 0);
printf("ret %d", y);
printf("errno: %d\n", errno);
}
BUG: KASAN: slab-out-of-bounds in read_inline_xattr+0x18f/0x260
Read of size 262140 at addr
ffff88011035efd8 by task f2fs1poc/3263
CPU: 0 PID: 3263 Comm: f2fs1poc Not tainted 4.18.0-custom #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0x71/0xab
print_address_description+0x83/0x250
kasan_report+0x213/0x350
memcpy+0x1f/0x50
read_inline_xattr+0x18f/0x260
read_all_xattrs+0xba/0x190
f2fs_listxattr+0x9d/0x3f0
listxattr+0xb2/0xd0
path_listxattr+0x93/0xe0
do_syscall_64+0x9d/0x220
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Let's add sanity check for inode.i_inline_xattr_size during f2fs_iget()
to avoid this issue.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 12 Mar 2019 18:49:53 +0000 (11:49 -0700)]
f2fs: give some messages for inline_xattr_size
This patch adds some kernel messages when user sets wrong inline_xattr_size.
Fixes:
500e0b28ecd3 ("f2fs: fix to check inline_xattr_size boundary correctly")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 7 Mar 2019 09:31:30 +0000 (17:31 +0800)]
f2fs: don't trigger read IO for beyond EOF page
In f2fs_mpage_readpages(), if page is beyond EOF, we should just
zero out it, but previously, before checking previous mapping
info, we missed to check filesize boundary, fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 6 Mar 2019 09:30:59 +0000 (17:30 +0800)]
f2fs: fix to add refcount once page is tagged PG_private
As Gao Xiang reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=202749
f2fs may skip pageout() due to incorrect page reference count.
The problem here is that MM defined the rule [1] very clearly that
once page was set with PG_private flag, we should increment the
refcount in that page, also main flows like pageout(), migrate_page()
will assume there is one additional page reference count if
page_has_private() returns true.
But currently, f2fs won't add/del refcount when changing PG_private
flag. Anyway, f2fs should follow MM's rule to make MM's related flows
running as expected.
[1] https://lore.kernel.org/lkml/
2b19b3c4-2bc4-15fa-15cc-
27a13e5c7af1@aol.com/
Reported-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 6 Mar 2019 08:18:33 +0000 (16:18 +0800)]
f2fs: remove wrong comment in f2fs_invalidate_page()
Since
8c242db9b8c0 ("f2fs: fix stale ATOMIC_WRITTEN_PAGE private pointer"),
we've started to not skip clear private flag for atomic_write page
truncation, so removing old wrong comment in f2fs_invalidate_page().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 5 Mar 2019 09:52:33 +0000 (17:52 +0800)]
f2fs: fix to use kvfree instead of kzfree
As Jiqun Li reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=202747
System can panic due to using wrong allocate/free function pair
in xattr interface:
- use kvmalloc to allocate memory
- use kzfree to free memory
Let's fix to use kvfree instead of kzfree, BTW, we are safe to
get rid of kzfree, since there is no such confidential data stored
as xattr, we don't need to zero it before free memory.
Fixes:
5222595d093e ("f2fs: use kvmalloc, if kmalloc is failed")
Reported-by: Jiqun Li <jiqun.li@unisoc.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 26 Feb 2019 11:01:16 +0000 (19:01 +0800)]
f2fs: print more parameters in trace_f2fs_map_blocks
for better map_blocks trace.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 26 Feb 2019 11:01:15 +0000 (19:01 +0800)]
f2fs: trace f2fs_ioc_shutdown
This patch supports to trace f2fs_ioc_shutdown.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Mon, 25 Feb 2019 09:11:03 +0000 (17:11 +0800)]
f2fs: fix to avoid deadlock of atomic file operations
Thread A Thread B
- __fput
- f2fs_release_file
- drop_inmem_pages
- mutex_lock(&fi->inmem_lock)
- __revoke_inmem_pages
- lock_page(page)
- open
- f2fs_setattr
- truncate_setsize
- truncate_inode_pages_range
- lock_page(page)
- truncate_cleanup_page
- f2fs_invalidate_page
- drop_inmem_page
- mutex_lock(&fi->inmem_lock);
We may encounter above ABBA deadlock as reported by Kyungtae Kim:
I'm reporting a bug in linux-4.17.19: "INFO: task hung in
drop_inmem_page" (no reproducer)
I think this might be somehow related to the following:
https://groups.google.com/forum/#!searchin/syzkaller-bugs/INFO$3A$20task$20hung$20in$20%7Csort:date/syzkaller-bugs/c6soBTrdaIo/AjAzPeIzCgAJ
=========================================
INFO: task syz-executor7:10822 blocked for more than 120 seconds.
Not tainted 4.17.19 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor7 D27024 10822 6346 0x00000004
Call Trace:
context_switch kernel/sched/core.c:2867 [inline]
__schedule+0x721/0x1e60 kernel/sched/core.c:3515
schedule+0x88/0x1c0 kernel/sched/core.c:3559
schedule_preempt_disabled+0x18/0x30 kernel/sched/core.c:3617
__mutex_lock_common kernel/locking/mutex.c:833 [inline]
__mutex_lock+0x5bd/0x1410 kernel/locking/mutex.c:893
mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:908
drop_inmem_page+0xcb/0x810 fs/f2fs/segment.c:327
f2fs_invalidate_page+0x337/0x5e0 fs/f2fs/data.c:2401
do_invalidatepage mm/truncate.c:165 [inline]
truncate_cleanup_page+0x261/0x330 mm/truncate.c:187
truncate_inode_pages_range+0x552/0x1610 mm/truncate.c:367
truncate_inode_pages mm/truncate.c:478 [inline]
truncate_pagecache+0x6d/0x90 mm/truncate.c:801
truncate_setsize+0x81/0xa0 mm/truncate.c:826
f2fs_setattr+0x44f/0x1270 fs/f2fs/file.c:781
notify_change+0xa62/0xe80 fs/attr.c:313
do_truncate+0x12e/0x1e0 fs/open.c:63
do_last fs/namei.c:2955 [inline]
path_openat+0x2042/0x29f0 fs/namei.c:3505
do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
do_sys_open+0x35e/0x4e0 fs/open.c:1101
__do_sys_open fs/open.c:1119 [inline]
__se_sys_open fs/open.c:1114 [inline]
__x64_sys_open+0x89/0xc0 fs/open.c:1114
do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4497b9
RSP: 002b:
00007f734e459c68 EFLAGS:
00000246 ORIG_RAX:
0000000000000002
RAX:
ffffffffffffffda RBX:
00007f734e45a6cc RCX:
00000000004497b9
RDX:
0000000000000104 RSI:
00000000000a8280 RDI:
0000000020000080
RBP:
000000000071bea0 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
00000000ffffffff
R13:
0000000000007230 R14:
00000000006f02d0 R15:
00007f734e45a700
INFO: task syz-executor7:10858 blocked for more than 120 seconds.
Not tainted 4.17.19 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor7 D28880 10858 6346 0x00000004
Call Trace:
context_switch kernel/sched/core.c:2867 [inline]
__schedule+0x721/0x1e60 kernel/sched/core.c:3515
schedule+0x88/0x1c0 kernel/sched/core.c:3559
__rwsem_down_write_failed_common kernel/locking/rwsem-xadd.c:565 [inline]
rwsem_down_write_failed+0x5e6/0xc90 kernel/locking/rwsem-xadd.c:594
call_rwsem_down_write_failed+0x17/0x30 arch/x86/lib/rwsem.S:117
__down_write arch/x86/include/asm/rwsem.h:142 [inline]
down_write+0x58/0xa0 kernel/locking/rwsem.c:72
inode_lock include/linux/fs.h:713 [inline]
do_truncate+0x120/0x1e0 fs/open.c:61
do_last fs/namei.c:2955 [inline]
path_openat+0x2042/0x29f0 fs/namei.c:3505
do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
do_sys_open+0x35e/0x4e0 fs/open.c:1101
__do_sys_open fs/open.c:1119 [inline]
__se_sys_open fs/open.c:1114 [inline]
__x64_sys_open+0x89/0xc0 fs/open.c:1114
do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4497b9
RSP: 002b:
00007f734e3b4c68 EFLAGS:
00000246 ORIG_RAX:
0000000000000002
RAX:
ffffffffffffffda RBX:
00007f734e3b56cc RCX:
00000000004497b9
RDX:
0000000000000104 RSI:
00000000000a8280 RDI:
0000000020000080
RBP:
000000000071c238 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
00000000ffffffff
R13:
0000000000007230 R14:
00000000006f02d0 R15:
00007f734e3b5700
INFO: task syz-executor5:10829 blocked for more than 120 seconds.
Not tainted 4.17.19 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor5 D28760 10829 6308 0x80000002
Call Trace:
context_switch kernel/sched/core.c:2867 [inline]
__schedule+0x721/0x1e60 kernel/sched/core.c:3515
schedule+0x88/0x1c0 kernel/sched/core.c:3559
io_schedule+0x21/0x80 kernel/sched/core.c:5179
wait_on_page_bit_common mm/filemap.c:1100 [inline]
__lock_page+0x2b5/0x390 mm/filemap.c:1273
lock_page include/linux/pagemap.h:483 [inline]
__revoke_inmem_pages+0xb35/0x11c0 fs/f2fs/segment.c:231
drop_inmem_pages+0xa3/0x3e0 fs/f2fs/segment.c:306
f2fs_release_file+0x2c7/0x330 fs/f2fs/file.c:1556
__fput+0x2c7/0x780 fs/file_table.c:209
____fput+0x1a/0x20 fs/file_table.c:243
task_work_run+0x151/0x1d0 kernel/task_work.c:113
exit_task_work include/linux/task_work.h:22 [inline]
do_exit+0x8ba/0x30a0 kernel/exit.c:865
do_group_exit+0x13b/0x3a0 kernel/exit.c:968
get_signal+0x6bb/0x1650 kernel/signal.c:2482
do_signal+0x84/0x1b70 arch/x86/kernel/signal.c:810
exit_to_usermode_loop+0x155/0x190 arch/x86/entry/common.c:162
prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
do_syscall_64+0x445/0x4e0 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4497b9
RSP: 002b:
00007f1c68e74ce8 EFLAGS:
00000246 ORIG_RAX:
00000000000000ca
RAX:
fffffffffffffe00 RBX:
000000000071bf80 RCX:
00000000004497b9
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
000000000071bf80
RBP:
000000000071bf80 R08:
0000000000000000 R09:
000000000071bf58
R10:
0000000000000000 R11:
0000000000000246 R12:
0000000000000000
R13:
0000000000000000 R14:
00007f1c68e759c0 R15:
00007f1c68e75700
This patch tries to use trylock_page to mitigate such deadlock condition
for fix.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Sat, 23 Feb 2019 01:48:27 +0000 (09:48 +0800)]
f2fs: fix to dirty inode for i_mode recovery
As Seulbae Kim reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=202637
We didn't recover permission field correctly after sudden power-cut,
the reason is in setattr we didn't add inode into global dirty list
once i_mode is changed, so latter checkpoint triggered by fsync will
not flush last i_mode into disk, result in this problem, fix it.
Reported-by: Seulbae Kim <seulbae@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 25 Feb 2019 17:46:45 +0000 (09:46 -0800)]
f2fs: give random value to i_generation
This follows to give random number to i_generation along with commit
232530680290b ("ext4: improve smp scalability for inode generation")
This can be used for DUN for UFS HW encryption.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Gao Xiang [Thu, 21 Feb 2019 04:57:35 +0000 (12:57 +0800)]
f2fs: no need to take page lock in readdir
VFS will take inode_lock for readdir, therefore no need to
take page lock in readdir at all just as the majority of
other generic filesystems.
This patch improves concurrency since .iterate_shared
was introduced to VFS years ago.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 21 Feb 2019 12:40:13 +0000 (20:40 +0800)]
f2fs: fix to update iostat correctly in IPU path
In error path of IPU, we didn't account iostat correctly, fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 21 Feb 2019 12:37:14 +0000 (20:37 +0800)]
f2fs: fix encrypted page memory leak
For IPU path of f2fs_do_write_data_page(), in its error path, we
need to release encrypted page and fscrypt context, otherwise it
will cause memory leak.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 19 Feb 2019 09:08:18 +0000 (17:08 +0800)]
f2fs: make fault injection covering __submit_flush_wait()
This patch changes to allow failure of f2fs_bio_alloc() in
__submit_flush_wait(), which can simulate flush error in checkpoint()
for covering more error paths.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 19 Feb 2019 08:23:53 +0000 (16:23 +0800)]
f2fs: fix to retry fill_super only if recovery failed
With current retry mechanism in f2fs_fill_super, first fill_super
fails due to no memory, then second fill_super runs w/o recovery,
if we succeed, we may lose fsynced data, it doesn't make sense.
Let's retry fill_super only if it occurs non-ENOMEM error during
recovery.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Gao Xiang [Tue, 19 Feb 2019 02:31:52 +0000 (10:31 +0800)]
f2fs: silence VM_WARN_ON_ONCE in mempool_alloc
Note that __GFP_ZERO is not supported for mempool_alloc,
which also documented in the mempool_alloc comments.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Zeng Guangyue [Mon, 18 Feb 2019 06:26:41 +0000 (14:26 +0800)]
f2fs: correct spelling mistake
correct spelling mistake for "nunmber"
Signed-off-by: Zeng Guangyue <zengguangyue@hisilicon.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 16 Feb 2019 03:04:38 +0000 (19:04 -0800)]
f2fs: fix wrong #endif
We have to cover whole headerfile with last #endif.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 5 Feb 2019 15:59:57 +0000 (07:59 -0800)]
f2fs: don't clear CP_QUOTA_NEED_FSCK_FLAG
If we met this once, let fsck.f2fs clear this only.
Note that, this addresses all the subtle fault injection test.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 14 Feb 2019 16:16:15 +0000 (00:16 +0800)]
f2fs: don't allow negative ->write_io_size_bits
As Dan reported:
"We put an upper bound on ->write_io_size_bits but we don't have a lower
bound."
So let's add lower bound check for ->write_io_size_bits in parse_options().
[We don't allow configuring ->write_io_size_bits to zero, since at least
we need to fill one dummy page for aligned IO.]
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 14 Feb 2019 16:08:25 +0000 (00:08 +0800)]
f2fs: fix to check inline_xattr_size boundary correctly
We use below condition to check inline_xattr_size boundary:
if (!F2FS_OPTION(sbi).inline_xattr_size ||
F2FS_OPTION(sbi).inline_xattr_size >=
DEF_ADDRS_PER_INODE -
F2FS_TOTAL_EXTRA_ATTR_SIZE -
DEF_INLINE_RESERVED_SIZE -
DEF_MIN_INLINE_SIZE)
There is there problems in that check:
- we should allow inline_xattr_size equaling to min size of inline
{data,dentry} area.
- F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size are based on
different size unit, previous one is 4 bytes, latter one is 1 bytes.
- DEF_MIN_INLINE_SIZE only indicate min size of inline data area,
however, we need to consider min size of inline dentry area as well,
minimal inline dentry should at least contain two entries: '.' and
'..', so that min inline_dentry size is 40 bytes.
.bitmap 1 * 1 = 1
.reserved 1 * 1 = 1
.dentry 11 * 2 = 22
.filename 8 * 2 = 16
total 40
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 26 Feb 2019 17:47:34 +0000 (09:47 -0800)]
Revert "f2fs: fix to avoid deadlock of atomic file operations"
This reverts commit
f3ac182210162c7e76997a8566a7f9869349f3d8.
Jaegeuk Kim [Sat, 16 Feb 2019 04:57:53 +0000 (20:57 -0800)]
Revert "f2fs: fix to check inline_xattr_size boundary correctly"
This reverts commit
802a643228462e0e3f6e84977d912f1871f61467.
Sahitya Tummala [Mon, 4 Feb 2019 08:06:53 +0000 (13:36 +0530)]
f2fs: do not use mutex lock in atomic context
Fix below warning coming because of using mutex lock in atomic context.
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:98
in_atomic(): 1, irqs_disabled(): 0, pid: 585, name: sh
Preemption disabled at: __radix_tree_preload+0x28/0x130
Call trace:
dump_backtrace+0x0/0x2b4
show_stack+0x20/0x28
dump_stack+0xa8/0xe0
___might_sleep+0x144/0x194
__might_sleep+0x58/0x8c
mutex_lock+0x2c/0x48
f2fs_trace_pid+0x88/0x14c
f2fs_set_node_page_dirty+0xd0/0x184
Do not use f2fs_radix_tree_insert() to avoid doing cond_resched() with
spin_lock() acquired.
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Sat, 2 Feb 2019 09:33:01 +0000 (17:33 +0800)]
f2fs: fix potential data inconsistence of checkpoint
Previously, we changed lock from cp_rwsem to node_change, it solved
the deadlock issue which was caused by below race condition:
Thread A Thread B
- f2fs_setattr
- f2fs_lock_op -- read_lock
- dquot_transfer
- __dquot_transfer
- dquot_acquire
- commit_dqblk
- f2fs_quota_write
- f2fs_write_begin
- f2fs_write_failed
- write_checkpoint
- block_operations
- f2fs_lock_all -- write_lock
- f2fs_truncate_blocks
- f2fs_lock_op -- read_lock
But it breaks the sematics of cp_rwsem, in other callers like:
- f2fs_file_write_iter -> f2fs_write_begin -> f2fs_write_failed
- f2fs_direct_IO -> f2fs_write_failed
We allow to truncate dnode w/o cp_rwsem held, result in incorrect sit
bitmap update, which can cause further data corruption.
So this patch reverts previous fix implementation, and try to fix
deadlock by skipping calling f2fs_truncate_blocks() in f2fs_write_failed()
only for quota file, and keep the preallocated data/node in the tail of
quota file, we can expecte that the preallocated space can be used to
store quota info latter soon.
Fixes:
af033b2aa8a8 ("f2fs: guarantee journalled quota data by checkpoint")
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 22 Jan 2019 12:18:06 +0000 (20:18 +0800)]
f2fs: fix to avoid deadlock of atomic file operations
Thread A Thread B
- __fput
- f2fs_release_file
- drop_inmem_pages
- mutex_lock(&fi->inmem_lock)
- __revoke_inmem_pages
- lock_page(page)
- open
- f2fs_setattr
- truncate_setsize
- truncate_inode_pages_range
- lock_page(page)
- truncate_cleanup_page
- f2fs_invalidate_page
- drop_inmem_page
- mutex_lock(&fi->inmem_lock);
We may encounter above ABBA deadlock as reported by Kyungtae Kim:
I'm reporting a bug in linux-4.17.19: "INFO: task hung in
drop_inmem_page" (no reproducer)
I think this might be somehow related to the following:
https://groups.google.com/forum/#!searchin/syzkaller-bugs/INFO$3A$20task$20hung$20in$20%7Csort:date/syzkaller-bugs/c6soBTrdaIo/AjAzPeIzCgAJ
=========================================
INFO: task syz-executor7:10822 blocked for more than 120 seconds.
Not tainted 4.17.19 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor7 D27024 10822 6346 0x00000004
Call Trace:
context_switch kernel/sched/core.c:2867 [inline]
__schedule+0x721/0x1e60 kernel/sched/core.c:3515
schedule+0x88/0x1c0 kernel/sched/core.c:3559
schedule_preempt_disabled+0x18/0x30 kernel/sched/core.c:3617
__mutex_lock_common kernel/locking/mutex.c:833 [inline]
__mutex_lock+0x5bd/0x1410 kernel/locking/mutex.c:893
mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:908
drop_inmem_page+0xcb/0x810 fs/f2fs/segment.c:327
f2fs_invalidate_page+0x337/0x5e0 fs/f2fs/data.c:2401
do_invalidatepage mm/truncate.c:165 [inline]
truncate_cleanup_page+0x261/0x330 mm/truncate.c:187
truncate_inode_pages_range+0x552/0x1610 mm/truncate.c:367
truncate_inode_pages mm/truncate.c:478 [inline]
truncate_pagecache+0x6d/0x90 mm/truncate.c:801
truncate_setsize+0x81/0xa0 mm/truncate.c:826
f2fs_setattr+0x44f/0x1270 fs/f2fs/file.c:781
notify_change+0xa62/0xe80 fs/attr.c:313
do_truncate+0x12e/0x1e0 fs/open.c:63
do_last fs/namei.c:2955 [inline]
path_openat+0x2042/0x29f0 fs/namei.c:3505
do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
do_sys_open+0x35e/0x4e0 fs/open.c:1101
__do_sys_open fs/open.c:1119 [inline]
__se_sys_open fs/open.c:1114 [inline]
__x64_sys_open+0x89/0xc0 fs/open.c:1114
do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4497b9
RSP: 002b:
00007f734e459c68 EFLAGS:
00000246 ORIG_RAX:
0000000000000002
RAX:
ffffffffffffffda RBX:
00007f734e45a6cc RCX:
00000000004497b9
RDX:
0000000000000104 RSI:
00000000000a8280 RDI:
0000000020000080
RBP:
000000000071bea0 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
00000000ffffffff
R13:
0000000000007230 R14:
00000000006f02d0 R15:
00007f734e45a700
INFO: task syz-executor7:10858 blocked for more than 120 seconds.
Not tainted 4.17.19 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor7 D28880 10858 6346 0x00000004
Call Trace:
context_switch kernel/sched/core.c:2867 [inline]
__schedule+0x721/0x1e60 kernel/sched/core.c:3515
schedule+0x88/0x1c0 kernel/sched/core.c:3559
__rwsem_down_write_failed_common kernel/locking/rwsem-xadd.c:565 [inline]
rwsem_down_write_failed+0x5e6/0xc90 kernel/locking/rwsem-xadd.c:594
call_rwsem_down_write_failed+0x17/0x30 arch/x86/lib/rwsem.S:117
__down_write arch/x86/include/asm/rwsem.h:142 [inline]
down_write+0x58/0xa0 kernel/locking/rwsem.c:72
inode_lock include/linux/fs.h:713 [inline]
do_truncate+0x120/0x1e0 fs/open.c:61
do_last fs/namei.c:2955 [inline]
path_openat+0x2042/0x29f0 fs/namei.c:3505
do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
do_sys_open+0x35e/0x4e0 fs/open.c:1101
__do_sys_open fs/open.c:1119 [inline]
__se_sys_open fs/open.c:1114 [inline]
__x64_sys_open+0x89/0xc0 fs/open.c:1114
do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4497b9
RSP: 002b:
00007f734e3b4c68 EFLAGS:
00000246 ORIG_RAX:
0000000000000002
RAX:
ffffffffffffffda RBX:
00007f734e3b56cc RCX:
00000000004497b9
RDX:
0000000000000104 RSI:
00000000000a8280 RDI:
0000000020000080
RBP:
000000000071c238 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
00000000ffffffff
R13:
0000000000007230 R14:
00000000006f02d0 R15:
00007f734e3b5700
INFO: task syz-executor5:10829 blocked for more than 120 seconds.
Not tainted 4.17.19 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor5 D28760 10829 6308 0x80000002
Call Trace:
context_switch kernel/sched/core.c:2867 [inline]
__schedule+0x721/0x1e60 kernel/sched/core.c:3515
schedule+0x88/0x1c0 kernel/sched/core.c:3559
io_schedule+0x21/0x80 kernel/sched/core.c:5179
wait_on_page_bit_common mm/filemap.c:1100 [inline]
__lock_page+0x2b5/0x390 mm/filemap.c:1273
lock_page include/linux/pagemap.h:483 [inline]
__revoke_inmem_pages+0xb35/0x11c0 fs/f2fs/segment.c:231
drop_inmem_pages+0xa3/0x3e0 fs/f2fs/segment.c:306
f2fs_release_file+0x2c7/0x330 fs/f2fs/file.c:1556
__fput+0x2c7/0x780 fs/file_table.c:209
____fput+0x1a/0x20 fs/file_table.c:243
task_work_run+0x151/0x1d0 kernel/task_work.c:113
exit_task_work include/linux/task_work.h:22 [inline]
do_exit+0x8ba/0x30a0 kernel/exit.c:865
do_group_exit+0x13b/0x3a0 kernel/exit.c:968
get_signal+0x6bb/0x1650 kernel/signal.c:2482
do_signal+0x84/0x1b70 arch/x86/kernel/signal.c:810
exit_to_usermode_loop+0x155/0x190 arch/x86/entry/common.c:162
prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
do_syscall_64+0x445/0x4e0 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4497b9
RSP: 002b:
00007f1c68e74ce8 EFLAGS:
00000246 ORIG_RAX:
00000000000000ca
RAX:
fffffffffffffe00 RBX:
000000000071bf80 RCX:
00000000004497b9
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
000000000071bf80
RBP:
000000000071bf80 R08:
0000000000000000 R09:
000000000071bf58
R10:
0000000000000000 R11:
0000000000000246 R12:
0000000000000000
R13:
0000000000000000 R14:
00007f1c68e759c0 R15:
00007f1c68e75700
This patch tries to use trylock_page to mitigate such deadlock condition
for fix.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 24 Jan 2019 07:18:47 +0000 (15:18 +0800)]
f2fs: fix to check inline_xattr_size boundary correctly
We use below condition to check inline_xattr_size boundary:
if (!F2FS_OPTION(sbi).inline_xattr_size ||
F2FS_OPTION(sbi).inline_xattr_size >=
DEF_ADDRS_PER_INODE -
F2FS_TOTAL_EXTRA_ATTR_SIZE -
DEF_INLINE_RESERVED_SIZE -
DEF_MIN_INLINE_SIZE)
There is there problems in that check:
- we should allow inline_xattr_size equaling to min size of inline
{data,dentry} area.
- F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size are based on
different size unit, previous one is 4 bytes, latter one is 1 bytes.
- DEF_MIN_INLINE_SIZE only indicate min size of inline data area,
however, we need to consider min size of inline dentry area as well,
minimal inline dentry should at least contain two entries: '.' and
'..', so that min inline_dentry size is 40 bytes.
.bitmap 1 * 1 = 1
.reserved 1 * 1 = 1
.dentry 11 * 2 = 22
.filename 8 * 2 = 16
total 40
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chengguang Xu [Wed, 23 Jan 2019 07:49:44 +0000 (15:49 +0800)]
f2fs: jump to label 'free_node_inode' when failing from d_make_root()
When sb->s_root is NULL dput() will do nothing,
so jump to label 'free_node_inode' instead of lable
'free_root_inode' when failing from d_make_root().
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 24 Jan 2019 09:18:07 +0000 (17:18 +0800)]
f2fs: fix to document inline_xattr_size option
We missed to add document for inline_xattr_size mount option in f2fs.txt,
add it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
zhengliang [Thu, 24 Jan 2019 12:57:03 +0000 (20:57 +0800)]
f2fs: fix to data block override node segment by mistake
The following race could lead to data block override node segment by mistake.
Task A | Task B | Task C | Task D
======= | ======== |========== | =========
open file | | |
white file | | |
submit bio | | |
wait io complete | | |
| remove file | |
........ | iput_final | |
| | sync |
| | do checkpoint |
| | data segment free |
| | | create file1
| | | allocate node segment(if it is the same segment freed by Task C)
f2fs_write_end_io | | |
So we need to guarantee io complete before truncate inode
in f2fs_drop_inode.
Signed-off-by: Zheng Liang <zhengliang6@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Geliang Tang [Fri, 25 Jan 2019 07:35:01 +0000 (15:35 +0800)]
f2fs: fix typos in code comments
lengh -> length
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 22 Jan 2019 22:04:33 +0000 (14:04 -0800)]
f2fs: sync filesystem after roll-forward recovery
Some works after roll-forward recovery can get an error which will release
all the data structures. Let's flush them in order to make it clean.
One possible corruption came from:
[ 90.400500] list_del corruption. prev->next should be
ffffffed1f566208, but was (null)
[ 90.675349] Call trace:
[ 90.677869] __list_del_entry_valid+0x94/0xb4
[ 90.682351] remove_dirty_inode+0xac/0x114
[ 90.686563] __f2fs_write_data_pages+0x6a8/0x6c8
[ 90.691302] f2fs_write_data_pages+0x40/0x4c
[ 90.695695] do_writepages+0x80/0xf0
[ 90.699372] __writeback_single_inode+0xdc/0x4ac
[ 90.704113] writeback_sb_inodes+0x280/0x440
[ 90.708501] wb_writeback+0x1b8/0x3d0
[ 90.712267] wb_workfn+0x1a8/0x4d4
[ 90.715765] process_one_work+0x1c0/0x3d4
[ 90.719883] worker_thread+0x224/0x344
[ 90.723739] kthread+0x120/0x130
[ 90.727055] ret_from_fork+0x10/0x18
Reported-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 4 Feb 2019 18:46:42 +0000 (10:46 -0800)]
fs: export evict_inodes
This comes by
799ea9e9c599 ("xfs: evict all inodes involved with log redo item")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 28 Jan 2019 01:59:53 +0000 (17:59 -0800)]
f2fs: flush quota blocks after turnning it off
After quota_off, we'll get some dirty blocks. If put_super don't have a chance
to flush them by checkpoint, it causes NULL pointer exception in end_io after
iput(node_inode). (e.g., by checkpoint=disable)
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 25 Jan 2019 20:05:25 +0000 (12:05 -0800)]
f2fs: avoid null pointer exception in dcc_info
If dcc_info is not set yet, we can get null pointer panic.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 25 Jan 2019 18:26:39 +0000 (10:26 -0800)]
f2fs: don't wake up too frequently, if there is lots of IOs
Otherwise, it wakes up discard thread which will sleep again by busy IOs
in a loop.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 25 Jan 2019 17:12:13 +0000 (09:12 -0800)]
f2fs: try to keep CP_TRIMMED_FLAG after successful umount
If every discard were issued successfully, we can avoid further discard.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 25 Jan 2019 01:48:38 +0000 (17:48 -0800)]
f2fs: add quick mode of checkpoint=disable for QA
This mode returns mount() quickly with EAGAIN. We can trigger this by
shutdown(F2FS_GOING_DOWN_NEED_FSCK).
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 14 Jan 2019 18:42:11 +0000 (10:42 -0800)]
f2fs: run discard jobs when put_super
When we umount f2fs, we need to avoid long delay due to discard commands, which
is actually taking tens of seconds, if storage is very slow on UNMAP. So, this
patch introduces timeout-based work on it.
By default, let me give 5 seconds for discard.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Thu, 10 Jan 2019 08:40:12 +0000 (16:40 +0800)]
f2fs: fix to set sbi dirty correctly
In order to record direct IO count, we add two additional type in
enum count_type: F2FS_DIO_{WRITE,READ}, but those IO won't dirty
filesystem metadata, so we don't need to set filesystem dirty in
inc_page_count(), fix it.
Fixes:
02b16d0a34a1 ("f2fs: add to account direct IO")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Sheng Yong [Tue, 15 Jan 2019 20:02:15 +0000 (20:02 +0000)]
f2fs: UBSAN: set boolean value iostat_enable correctly
When setting /sys/fs/f2fs/<DEV>/iostat_enable with non-bool value, UBSAN
reports the following warning.
[ 7562.295484] ================================================================================
[ 7562.296531] UBSAN: Undefined behaviour in fs/f2fs/f2fs.h:2776:10
[ 7562.297651] load of value 64 is not a valid value for type '_Bool'
[ 7562.298642] CPU: 1 PID: 7487 Comm: dd Not tainted 4.20.0-rc4+ #79
[ 7562.298653] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 7562.298662] Call Trace:
[ 7562.298760] dump_stack+0x46/0x5b
[ 7562.298811] ubsan_epilogue+0x9/0x40
[ 7562.298830] __ubsan_handle_load_invalid_value+0x72/0x90
[ 7562.298863] f2fs_file_write_iter+0x29f/0x3f0
[ 7562.298905] __vfs_write+0x115/0x160
[ 7562.298922] vfs_write+0xa7/0x190
[ 7562.298934] ksys_write+0x50/0xc0
[ 7562.298973] do_syscall_64+0x4a/0xe0
[ 7562.298992] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 7562.299001] RIP: 0033:0x7fa45ec19c00
[ 7562.299004] Code: 73 01 c3 48 8b 0d 88 92 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d dd eb 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ce 8f 01 00 48 89 04 24
[ 7562.299044] RSP: 002b:
00007ffca52b49e8 EFLAGS:
00000246 ORIG_RAX:
0000000000000001
[ 7562.299052] RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00007fa45ec19c00
[ 7562.299059] RDX:
0000000000000400 RSI:
000000000093f000 RDI:
0000000000000001
[ 7562.299065] RBP:
000000000093f000 R08:
0000000000000004 R09:
0000000000000000
[ 7562.299071] R10:
00007ffca52b47b0 R11:
0000000000000246 R12:
0000000000000400
[ 7562.299077] R13:
000000000093f000 R14:
000000000093f400 R15:
0000000000000000
[ 7562.299091] ================================================================================
So, if iostat_enable is enabled, set its value as true.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Sheng Yong [Mon, 14 Jan 2019 14:05:14 +0000 (22:05 +0800)]
f2fs: add brackets for macros
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Sheng Yong [Mon, 7 Jan 2019 07:02:34 +0000 (15:02 +0800)]
f2fs: check if file namelen exceeds max value
Dentry bitmap is not enough to detect incorrect dentries. So this patch
also checks the namelen value of a dentry.
Signed-off-by: Gong Chen <gongchen4@huawei.com>
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 8 Jan 2019 02:21:24 +0000 (10:21 +0800)]
f2fs: fix to trigger fsck if dirent.name_len is zero
While traversing dirents in f2fs_fill_dentries(), if bitmap is valid,
filename length should not be zero, otherwise, directory structure
consistency could be corrupted, in this case, let's print related
info and set SBI_NEED_FSCK to trigger fsck for repairing.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Greg Kroah-Hartman [Fri, 4 Jan 2019 13:26:18 +0000 (14:26 +0100)]
f2fs: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 4 Jan 2019 01:19:08 +0000 (17:19 -0800)]
f2fs: export FS_NOCOW_FL flag to user
This exports pin_file status to user.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Fri, 4 Jan 2019 09:39:53 +0000 (17:39 +0800)]
f2fs: check inject_rate validity during configuring
Type of inject_rate is unsigned int, let's check new value's
validity during configuring.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
YueHaibing [Fri, 4 Jan 2019 01:38:29 +0000 (01:38 +0000)]
f2fs: remove set but not used variable 'err'
Fixes gcc '-Wunused-but-set-variable' warning:
fs/f2fs/data.c: In function 'f2fs_dio_submit_bio':
fs/f2fs/data.c:2585:6: warning:
variable 'err' set but not used [-Wunused-but-set-variable]
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Zhikang Zhang [Thu, 3 Jan 2019 12:06:38 +0000 (20:06 +0800)]
f2fs: fix compile warnings: 'struct *' declared inside parameter list
We meet these compile warnings below, which caused by missing declare structs:
struct f2fs_io_info, struct extent, struct f2fs_sb_info.
warning: 'struct f2fs_io_info' declared inside parameter list
warning: 'struct extent_info' declared inside parameter list
warning: 'struct f2fs_sb_info' declared inside parameter list
Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chengguang Xu [Tue, 1 Jan 2019 13:33:11 +0000 (21:33 +0800)]
f2fs: change error code to -ENOMEM from -EINVAL
The error case of failing allocating memory should
return -ENOMEM.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 1 Jan 2019 08:11:30 +0000 (00:11 -0800)]
f2fs: don't access node/meta inode mapping after iput
This fixes wrong access of address spaces of node and meta inodes after iput.
Fixes:
60aa4d5536ab ("f2fs: fix use-after-free issue when accessing sbi->stat_info")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 28 Dec 2018 19:00:38 +0000 (11:00 -0800)]
f2fs: wait on atomic writes to count F2FS_CP_WB_DATA
Otherwise, we can get wrong counts incurring checkpoint hang.
IO_W (CP: -24, Data: 24, Flush: ( 0 0 1), Discard: ( 0 0))
Thread A Thread B
- f2fs_write_data_pages
- __write_data_page
- f2fs_submit_page_write
- inc_page_count(F2FS_WB_DATA)
type is F2FS_WB_DATA due to file is non-atomic one
- f2fs_ioc_start_atomic_write
- set_inode_flag(FI_ATOMIC_FILE)
- f2fs_write_end_io
- dec_page_count(F2FS_WB_CP_DATA)
type is F2FS_WB_DATA due to file becomes
atomic one
Cc: <stable@vger.kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 27 Dec 2018 03:54:07 +0000 (19:54 -0800)]
f2fs: sanity check of xattr entry size
There is a security report where f2fs_getxattr() has a hole to expose wrong
memory region when the image is malformed like this.
f2fs_getxattr: entry->e_name_len: 4, size: 12288, buffer_size: 16384, len: 4
Cc: <stable@vger.kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Sahitya Tummala [Wed, 26 Dec 2018 05:50:29 +0000 (11:20 +0530)]
f2fs: fix use-after-free issue when accessing sbi->stat_info
iput() on sbi->node_inode can update sbi->stat_info
in the below context, if the f2fs_write_checkpoint()
has failed with error.
f2fs_balance_fs_bg+0x1ac/0x1ec
f2fs_write_node_pages+0x4c/0x260
do_writepages+0x80/0xbc
__writeback_single_inode+0xdc/0x4ac
writeback_single_inode+0x9c/0x144
write_inode_now+0xc4/0xec
iput+0x194/0x22c
f2fs_put_super+0x11c/0x1e8
generic_shutdown_super+0x70/0xf4
kill_block_super+0x2c/0x5c
kill_f2fs_super+0x44/0x50
deactivate_locked_super+0x60/0x8c
deactivate_super+0x68/0x74
cleanup_mnt+0x40/0x78
Fix this by moving f2fs_destroy_stats() further below iput() in
both f2fs_put_super() and f2fs_fill_super() paths.
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 25 Dec 2018 09:43:42 +0000 (17:43 +0800)]
f2fs: check PageWriteback flag for ordered case
For all ordered cases in f2fs_wait_on_page_writeback(), we need to
check PageWriteback status, so let's clean up to relocate the check
into f2fs_wait_on_page_writeback().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Martin Blumenstingl [Sat, 22 Dec 2018 10:22:26 +0000 (11:22 +0100)]
f2fs: fix validation of the block count in sanity_check_raw_super
Treat "block_count" from struct f2fs_super_block as 64-bit little endian
value in sanity_check_raw_super() because struct f2fs_super_block
declares "block_count" as "__le64".
This fixes a bug where the superblock validation fails on big endian
devices with the following error:
F2FS-fs (sda1): Wrong segment_count / block_count (61439 > 0)
F2FS-fs (sda1): Can't find valid F2FS filesystem in 1th superblock
F2FS-fs (sda1): Wrong segment_count / block_count (61439 > 0)
F2FS-fs (sda1): Can't find valid F2FS filesystem in 2th superblock
As result of this the partition cannot be mounted.
With this patch applied the superblock validation works fine and the
partition can be mounted again:
F2FS-fs (sda1): Mounted with checkpoint version = 7c84
My little endian x86-64 hardware was able to mount the partition without
this fix.
To confirm that mounting f2fs filesystems works on big endian machines
again I tested this on a 32-bit MIPS big endian (lantiq) device.
Fixes:
0cfe75c5b01199 ("f2fs: enhance sanity_check_raw_super() to avoid potential overflows")
Cc: stable@vger.kernel.org
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 18 Dec 2018 01:08:26 +0000 (17:08 -0800)]
f2fs: fix missing unlock(sbi->gc_mutex)
This fixes missing unlock call.
Cc: <stable@vger.kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 18 Dec 2018 11:20:16 +0000 (19:20 +0800)]
f2fs: clean up structure extent_node
The union in struct extent_node wass only to indicate below fields
struct rb_node rb_node;
union {
struct {
unsigned int fofs;
unsigned int len;
...
...
can be parsed as fields in struct rb_entry, but they were never be
used explicitly before, so let's remove them for cleanup.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Qiuyang Sun [Tue, 18 Dec 2018 09:32:23 +0000 (17:32 +0800)]
f2fs: fix block address for __check_sit_bitmap
Should use lstart (logical start address) instead of start (in dev) here.
This fixes a bug in multi-device scenarios.
Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>