OSDN Git Service

sagit-ice-cold/kernel_xiaomi_msm8998.git
7 years agofscrypt: inline fscrypt_free_filename()
Eric Biggers [Tue, 23 May 2017 01:14:06 +0000 (18:14 -0700)]
fscrypt: inline fscrypt_free_filename()

fscrypt_free_filename() only needs to do a kfree() of crypto_buf.name,
which works well as an inline function.  We can skip setting the various
pointers to NULL, since no user cares about it (the name is always freed
just before it goes out of scope).

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: David Gstir <david@sigma-star.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
7 years agof2fs: make more close to v4.13-rc1
Jaegeuk Kim [Tue, 11 Jul 2017 02:16:28 +0000 (19:16 -0700)]
f2fs: make more close to v4.13-rc1

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: support plain user/group quota
Chao Yu [Sat, 8 Jul 2017 16:13:07 +0000 (00:13 +0800)]
f2fs: support plain user/group quota

This patch adds to support plain user/group quota.

Change Note by Jaegeuk Kim.

- Use f2fs page cache for quota files in order to consider garbage collection.
  so, quota files are not tolerable for sudden power-cuts, so user needs to do
  quotacheck.

- setattr() calls dquot_transfer which will transfer inode->i_blocks.
  We can't reclaim that during f2fs_evict_inode(). So, we need to count
  node blocks as well in order to match i_blocks with dquot's space.

  Note that, Chao wrote a patch to count inode->i_blocks without inode block.
  (f2fs: don't count inode block in in-memory inode.i_blocks)

- in f2fs_remount, we need to make RW in prior to dquot_resume.

- handle fault_injection case during f2fs_quota_off_umount

- TODO: Project quota

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: avoid deadlock caused by lock order of page and lock_op
Jaegeuk Kim [Thu, 22 Jun 2017 00:52:39 +0000 (17:52 -0700)]
f2fs: avoid deadlock caused by lock order of page and lock_op

- punch_hole
 - fill_zero
  - f2fs_lock_op
  - get_new_data_page
   - lock_page

- f2fs_write_data_pages
 - lock_page
 - do_write_data_page
  - f2fs_lock_op

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: use spin_{,un}lock_irq{save,restore}
Chao Yu [Fri, 7 Jul 2017 06:10:15 +0000 (14:10 +0800)]
f2fs: use spin_{,un}lock_irq{save,restore}

generic/361 reports below warning, this is because: once, there is
someone entering into critical region of sbi.cp_lock, if write_end_io.
f2fs_stop_checkpoint is invoked from an triggered IRQ, we will encounter
deadlock.

So this patch changes to use spin_{,un}lock_irq{save,restore} to create
critical region without IRQ enabled to avoid potential deadlock.

 irq event stamp: 83391573
 loop: Write error at byte offset 438729728, length 1024.
 hardirqs last  enabled at (83391573): [<c1809752>] restore_all+0xf/0x65
 hardirqs last disabled at (83391572): [<c1809eac>] reschedule_interrupt+0x30/0x3c
 loop: Write error at byte offset 438860288, length 1536.
 softirqs last  enabled at (83389244): [<c180cc4e>] __do_softirq+0x1ae/0x476
 softirqs last disabled at (83389237): [<c101ca7c>] do_softirq_own_stack+0x2c/0x40
 loop: Write error at byte offset 438990848, length 2048.
 ================================
 WARNING: inconsistent lock state
 4.12.0-rc2+ #30 Tainted: G           O
 --------------------------------
 inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
 xfs_io/7959 [HC1[1]:SC0[0]:HE0:SE1] takes:
  (&(&sbi->cp_lock)->rlock){?.+...}, at: [<f96f96cc>] f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
 {HARDIRQ-ON-W} state was registered at:
   __lock_acquire+0x527/0x7b0
   lock_acquire+0xae/0x220
   _raw_spin_lock+0x42/0x50
   do_checkpoint+0x165/0x9e0 [f2fs]
   write_checkpoint+0x33f/0x740 [f2fs]
   __f2fs_sync_fs+0x92/0x1f0 [f2fs]
   f2fs_sync_fs+0x12/0x20 [f2fs]
   sync_filesystem+0x67/0x80
   generic_shutdown_super+0x27/0x100
   kill_block_super+0x22/0x50
   kill_f2fs_super+0x3a/0x40 [f2fs]
   deactivate_locked_super+0x3d/0x70
   deactivate_super+0x40/0x60
   cleanup_mnt+0x39/0x70
   __cleanup_mnt+0x10/0x20
   task_work_run+0x69/0x80
   exit_to_usermode_loop+0x57/0x85
   do_fast_syscall_32+0x18c/0x1b0
   entry_SYSENTER_32+0x4c/0x7b
 irq event stamp: 1957420
 hardirqs last  enabled at (1957419): [<c1808f37>] _raw_spin_unlock_irq+0x27/0x50
 hardirqs last disabled at (1957420): [<c1809f9c>] call_function_single_interrupt+0x30/0x3c
 softirqs last  enabled at (1953784): [<c180cc4e>] __do_softirq+0x1ae/0x476
 softirqs last disabled at (1953773): [<c101ca7c>] do_softirq_own_stack+0x2c/0x40

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&(&sbi->cp_lock)->rlock);
   <Interrupt>
     lock(&(&sbi->cp_lock)->rlock);

  *** DEADLOCK ***

 2 locks held by xfs_io/7959:
  #0:  (sb_writers#13){.+.+.+}, at: [<c11fd7ca>] vfs_write+0x16a/0x190
  #1:  (&sb->s_type->i_mutex_key#16){+.+.+.}, at: [<f96e33f5>] f2fs_file_write_iter+0x25/0x140 [f2fs]

 stack backtrace:
 CPU: 2 PID: 7959 Comm: xfs_io Tainted: G           O    4.12.0-rc2+ #30
 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
 Call Trace:
  dump_stack+0x5f/0x92
  print_usage_bug+0x1d3/0x1dd
  ? check_usage_backwards+0xe0/0xe0
  mark_lock+0x23d/0x280
  __lock_acquire+0x699/0x7b0
  ? __this_cpu_preempt_check+0xf/0x20
  ? trace_hardirqs_off_caller+0x91/0xe0
  lock_acquire+0xae/0x220
  ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  _raw_spin_lock+0x42/0x50
  ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  f2fs_write_end_io+0x147/0x150 [f2fs]
  bio_endio+0x7a/0x1e0
  blk_update_request+0xad/0x410
  blk_mq_end_request+0x16/0x60
  lo_complete_rq+0x3c/0x70
  __blk_mq_complete_request_remote+0x11/0x20
  flush_smp_call_function_queue+0x6d/0x120
  ? debug_smp_processor_id+0x12/0x20
  generic_smp_call_function_single_interrupt+0x12/0x30
  smp_call_function_single_interrupt+0x25/0x40
  call_function_single_interrupt+0x37/0x3c
 EIP: _raw_spin_unlock_irq+0x2d/0x50
 EFLAGS: 00000296 CPU: 2
 EAX: 00000001 EBX: d2ccc51c ECX: 00000001 EDX: c1aacebd
 ESI: 00000000 EDI: 00000000 EBP: c96c9d1c ESP: c96c9d18
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
  ? inherit_task_group.isra.98.part.99+0x6b/0xb0
  __add_to_page_cache_locked+0x1d4/0x290
  add_to_page_cache_lru+0x38/0xb0
  pagecache_get_page+0x8e/0x200
  f2fs_write_begin+0x96/0xf00 [f2fs]
  ? trace_hardirqs_on_caller+0xdd/0x1c0
  ? current_time+0x17/0x50
  ? trace_hardirqs_on+0xb/0x10
  generic_perform_write+0xa9/0x170
  __generic_file_write_iter+0x1a2/0x1f0
  ? f2fs_preallocate_blocks+0x137/0x160 [f2fs]
  f2fs_file_write_iter+0x6e/0x140 [f2fs]
  ? __lock_acquire+0x429/0x7b0
  __vfs_write+0xc1/0x140
  vfs_write+0x9b/0x190
  SyS_pwrite64+0x63/0xa0
  do_fast_syscall_32+0xa1/0x1b0
  entry_SYSENTER_32+0x4c/0x7b
 EIP: 0xb7786c61
 EFLAGS: 00000293 CPU: 2
 EAX: ffffffda EBX: 00000003 ECX: 08416000 EDX: 00001000
 ESI: 18b24000 EDI: 00000000 EBP: 00000003 ESP: bf9b36b0
  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b

Fixes: aaec2b1d1879 ("f2fs: introduce cp_lock to protect updating of ckpt_flags")
Cc: stable@vger.kernel.org
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: relax migratepage for atomic written page
Jaegeuk Kim [Thu, 6 Jul 2017 21:46:01 +0000 (14:46 -0700)]
f2fs: relax migratepage for atomic written page

In order to avoid lock contention for atomic written pages, we'd better give
EBUSY in f2fs_migrate_page when mode is asynchronous. We expect it will be
released soon as transaction commits.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: don't count inode block in in-memory inode.i_blocks
Chao Yu [Wed, 5 Jul 2017 17:11:31 +0000 (01:11 +0800)]
f2fs: don't count inode block in in-memory inode.i_blocks

Previously, we count all inode consumed blocks including inode block,
xattr block, index block, data block into i_blocks, for other generic
filesystems, they won't count inode block into i_blocks, so for
userspace applications or quota system, they may detect incorrect block
count according to i_blocks value in inode.

This patch changes to count all blocks into inode.i_blocks excluding
inode block, for on-disk i_blocks, we keep counting inode block for
backward compatibility.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agoRevert "f2fs: fix to clean previous mount option when remount_fs"
Chao Yu [Wed, 5 Jul 2017 04:17:24 +0000 (12:17 +0800)]
Revert "f2fs: fix to clean previous mount option when remount_fs"

Don't clear old mount option before parse new option during ->remount_fs
like other generic filesystems.

This reverts commit 26666c8a4366debae30ae37d0688b2bec92d196a.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: do not set LOST_PINO for renamed dir
Sheng Yong [Mon, 26 Jun 2017 02:41:36 +0000 (10:41 +0800)]
f2fs: do not set LOST_PINO for renamed dir

After renaming a directory, fsck could detect unmatched pino. The scenario
can be reproduced as the following:

$ mkdir /bar/subbar /foo
$ rename /bar/subbar /foo

Then fsck will report:
[ASSERT] (__chk_dots_dentries:1182)  --> Bad inode number[0x3] for '..', parent parent ino is [0x4]

Rename sets LOST_PINO for old_inode. However, the flag cannot be cleared,
since dir is written back with CP. So, let's get rid of LOST_PINO for a
renamed dir and fix the pino directly at the end of rename.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: do not set LOST_PINO for newly created dir
Sheng Yong [Mon, 26 Jun 2017 02:41:35 +0000 (10:41 +0800)]
f2fs: do not set LOST_PINO for newly created dir

Since directories will be written back with checkpoint and fsync a
directory will always write CP, there is no need to set LOST_PINO
after creating a directory.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: skip ->writepages for {mete,node}_inode during recovery
Chao Yu [Thu, 29 Jun 2017 15:20:45 +0000 (23:20 +0800)]
f2fs: skip ->writepages for {mete,node}_inode during recovery

Skip ->writepages in prior to ->writepage for {meta,node}_inode during
recovery, hence unneeded loop in ->writepages can be avoided.

Moreover, check SBI_POR_DOING earlier while writebacking pages.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: introduce __check_sit_bitmap
Chao Yu [Fri, 30 Jun 2017 09:19:02 +0000 (17:19 +0800)]
f2fs: introduce __check_sit_bitmap

After we introduce discard thread, discard command can be issued
concurrently with data allocating, this patch adds new function to
heck sit bitmap to ensure that userdata was invalid in which on-going
discard command covered.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
 Conflicts:
fs/f2fs/segment.c

7 years agof2fs: stop gc/discard thread in prior during umount
Chao Yu [Thu, 29 Jun 2017 15:17:45 +0000 (23:17 +0800)]
f2fs: stop gc/discard thread in prior during umount

This patch resolves kernel panic for xfstests/081, caused by recent f2fs_bug_on

 f2fs: add f2fs_bug_on in __remove_discard_cmd

For fixing, we will stop gc/discard thread in prior in ->kill_sb in order to
avoid referring and releasing race among them.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: introduce reserved_blocks in sysfs
Chao Yu [Mon, 26 Jun 2017 08:24:41 +0000 (16:24 +0800)]
f2fs: introduce reserved_blocks in sysfs

In this patch, we add a new sysfs interface, with it, we can control
number of reserved blocks in system which could not be used by user,
it enable f2fs to let user to configure for adjusting over-provision
ratio dynamically instead of changing it by mkfs.

So we can expect it will help to reserve more free space for relieving
GC in both filesystem and flash device.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
 Conflicts:
Documentation/ABI/testing/sysfs-fs-f2fs

7 years agof2fs: avoid redundant f2fs_flush after remount
Yunlong Song [Sat, 24 Jun 2017 07:57:19 +0000 (15:57 +0800)]
f2fs: avoid redundant f2fs_flush after remount

create_flush_cmd_control will create redundant issue_flush_thread after each
remount with flush_merge option.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: report # of free inodes more precisely
Jaegeuk Kim [Thu, 22 Jun 2017 03:55:55 +0000 (20:55 -0700)]
f2fs: report # of free inodes more precisely

If the partition is small, we don't need to report total # of inodes including
hidden free nodes.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: add ioctl to do gc with target block address
Jaegeuk Kim [Thu, 15 Jun 2017 23:44:42 +0000 (16:44 -0700)]
f2fs: add ioctl to do gc with target block address

This patch adds f2fs_ioc_gc_range() to move blocks located in the given
range.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: don't need to check encrypted inode for partial truncation
Jaegeuk Kim [Wed, 14 Jun 2017 15:05:32 +0000 (08:05 -0700)]
f2fs: don't need to check encrypted inode for partial truncation

The cache_only is always false, if inode is encrypted.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: measure inode.i_blocks as generic filesystem
Chao Yu [Wed, 14 Jun 2017 15:00:56 +0000 (23:00 +0800)]
f2fs: measure inode.i_blocks as generic filesystem

Both in memory or on disk, generic filesystems record i_blocks with
512bytes sized sector count, also VFS sub module such as disk quota
follows this rule, but f2fs records it with 4096bytes sized block
count, this difference leads to that once we use dquota's function
which inc/dec iblocks, it will make i_blocks of f2fs being inconsistent
between in memory and on disk.

In order to resolve this issue, this patch changes to make in-memory
i_blocks of f2fs recording sector count instead of block count,
meanwhile leaving on-disk i_blocks recording block count.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: set CP_TRIMMED_FLAG correctly
Chao Yu [Wed, 14 Jun 2017 15:00:55 +0000 (23:00 +0800)]
f2fs: set CP_TRIMMED_FLAG correctly

Don't set CP_TRIMMED_FLAG for non-zoned block device or discard
unsupported device, it can avoid to trigger unneeded checkpoint for
that kind of device.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: require key for truncate(2) of encrypted file
Eric Biggers [Tue, 13 Jun 2017 23:47:54 +0000 (16:47 -0700)]
f2fs: require key for truncate(2) of encrypted file

Currently, filesystems allow truncate(2) on an encrypted file without
the encryption key.  However, it's impossible to correctly handle the
case where the size being truncated to is not a multiple of the
filesystem block size, because that would require decrypting the final
block, zeroing the part beyond i_size, then encrypting the block.

As other modifications to encrypted file contents are prohibited without
the key, just prohibit truncate(2) as well, making it fail with ENOKEY.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: move sysfs code from super.c to fs/f2fs/sysfs.c
Chao Yu [Wed, 14 Jun 2017 09:39:47 +0000 (17:39 +0800)]
f2fs: move sysfs code from super.c to fs/f2fs/sysfs.c

Codes related to sysfs and procfs are dispersive and mixed with sb
related codes, but actually these codes are independent from others,
so split them from super.c, and reorgnize and manger them in sysfs.c.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
 Conflicts:
fs/f2fs/super.c

7 years agof2fs: clean up sysfs codes
Chao Yu [Wed, 14 Jun 2017 09:39:46 +0000 (17:39 +0800)]
f2fs: clean up sysfs codes

Just cleanup.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix wrong error number of fill_super
Chao Yu [Mon, 12 Jun 2017 01:44:27 +0000 (09:44 +0800)]
f2fs: fix wrong error number of fill_super

This patch fixes incorrect error number in error path of fill_super.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix to show injection rate in ->show_options
Chao Yu [Mon, 12 Jun 2017 01:44:24 +0000 (09:44 +0800)]
f2fs: fix to show injection rate in ->show_options

If fault injection functionality is enabled, show additional injection
rate in ->show_options.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: Fix a return value in case of error in 'f2fs_fill_super'
Christophe JAILLET [Sun, 11 Jun 2017 07:21:11 +0000 (09:21 +0200)]
f2fs: Fix a return value in case of error in 'f2fs_fill_super'

err must be set to -ENOMEM, otherwise we return 0.

Fixes: a912b54d3aaa0 ("f2fs: split bio cache")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: use proper variable name
Tiezhu Yang [Thu, 8 Jun 2017 22:32:54 +0000 (06:32 +0800)]
f2fs: use proper variable name

It is better to use variable name "inline_dentry" instead of "dentry_blk"
when data type is "struct f2fs_inline_dentry". This patch has no functional
changes, just to make code more readable especially when call the function
make_dentry_ptr_inline() and f2fs_convert_inline_dir().

Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix to avoid panic when encountering corrupt node
Chao Yu [Wed, 7 Jun 2017 03:17:35 +0000 (11:17 +0800)]
f2fs: fix to avoid panic when encountering corrupt node

With fault_injection option, generic/361 of fstests will complain us
with below message:

Call Trace:
 get_node_page+0x12/0x20 [f2fs]
 f2fs_iget+0x92/0x7d0 [f2fs]
 f2fs_fill_super+0x10fb/0x15e0 [f2fs]
 mount_bdev+0x184/0x1c0
 f2fs_mount+0x15/0x20 [f2fs]
 mount_fs+0x39/0x150
 vfs_kern_mount+0x67/0x110
 do_mount+0x1bb/0xc70
 SyS_mount+0x83/0xd0
 do_syscall_64+0x6e/0x160
 entry_SYSCALL64_slow_path+0x25/0x25

Since mkfs loop device in f2fs partition can be failed silently due to
checkpoint error injection, so root inode page can be corrupted, in order
to avoid needless panic, in get_node_page, it's better to leave message
and return error to caller, and let fsck repaire it later.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: don't track newly allocated nat entry in list
Chao Yu [Mon, 5 Jun 2017 10:29:08 +0000 (18:29 +0800)]
f2fs: don't track newly allocated nat entry in list

We will never persist newly allocated nat entries during checkpoint(), so
we don't need to track such nat entries in nat dirty list in order to
avoid:
- more latency during traversing dirty list;
- sorting nat sets incorrectly due to recording wrong entry_cnt in nat
entry set.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: add f2fs_bug_on in __remove_discard_cmd
Chao Yu [Mon, 5 Jun 2017 10:29:07 +0000 (18:29 +0800)]
f2fs: add f2fs_bug_on in __remove_discard_cmd

Recently, discard related codes have changed a lot, so add f2fs_bug_on to
detect potential bug.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: introduce __wait_one_discard_bio
Chao Yu [Mon, 5 Jun 2017 10:29:06 +0000 (18:29 +0800)]
f2fs: introduce __wait_one_discard_bio

In order to avoid copied codes.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: dax: fix races between page faults and truncating pages
Qiuyang Sun [Thu, 18 May 2017 03:06:45 +0000 (11:06 +0800)]
f2fs: dax: fix races between page faults and truncating pages

Currently in F2FS, page faults and operations that truncate the pagecahe
or data blocks, are completely unsynchronized. This can result in page
fault faulting in a page into a range that we are changing after
truncating, and thus we can end up with a page mapped to disk blocks that
will be shortly freed. Filesystem corruption will shortly follow.

This patch fixes the problem by creating new rw semaphore i_mmap_sem in
f2fs_inode_info and grab it for functions removing blocks from extent tree
and for read over page faults. The mechanism is similar to that in ext4.

Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
 Conflicts:
fs/f2fs/file.c

7 years agof2fs: simplify the way of calulating next nat address
Fan Li [Fri, 2 Jun 2017 07:45:42 +0000 (15:45 +0800)]
f2fs: simplify the way of calulating next nat address

The index of segment which the next nat block is in has only one different
bit than the current one, so to get the next nat address, we can simply
alter that one bit.

Signed-off-by: Fan Li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: sanity check size of nat and sit cache
Jin Qian [Thu, 1 Jun 2017 18:18:30 +0000 (11:18 -0700)]
f2fs: sanity check size of nat and sit cache

Make sure number of entires doesn't exceed max journal size.

Cc: stable@vger.kernel.org
Signed-off-by: Jin Qian <jinqian@android.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix a panic caused by NULL flush_cmd_control
Yunlei He [Thu, 1 Jun 2017 08:43:51 +0000 (16:43 +0800)]
f2fs: fix a panic caused by NULL flush_cmd_control

Mount fs with option noflush_merge, boot failed for illegal address
fcc in function f2fs_issue_flush:

        if (!test_opt(sbi, FLUSH_MERGE)) {
                ret = submit_flush_wait(sbi);
                atomic_inc(&fcc->issued_flush);   ->  Here, fcc illegal
                return ret;
        }

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: remove the unnecessary cast for PTR_ERR
Zhang Shengju [Thu, 1 Jun 2017 08:50:10 +0000 (16:50 +0800)]
f2fs: remove the unnecessary cast for PTR_ERR

It's not necessary to specify 'int' casting for PTR_ERR.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: remove false-positive bug_on
Jaegeuk Kim [Thu, 1 Jun 2017 22:39:27 +0000 (15:39 -0700)]
f2fs: remove false-positive bug_on

For example,

f2fs_create
 - new_node_page is failed
 - handle_failed_inode
  - skip to add it into orphan list, since ni.blk_addr == NULL_ADDR
   : set_inode_flag(inode, FI_FREE_NID)

f2fs_evict_inode
 - EIO due to fault injection
 - f2fs_bug_on() is triggered

So, we don't need to call f2fs_bug_on in this case.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: Do not issue small discards in LFS mode
Damien Le Moal [Fri, 26 May 2017 08:04:40 +0000 (17:04 +0900)]
f2fs: Do not issue small discards in LFS mode

clear_prefree_segments() issues small discards after discarding full
segments. These small discards may not be section aligned, so not zone
aligned on a zoned block device, causing __f2fs_iissue_discard_zone() to fail.
Fix this by not issuing small discards for a volume mounted with the BLKZONED
feature enabled.

Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: don't bother checking for encryption key in ->write_iter()
Eric Biggers [Tue, 23 May 2017 00:39:45 +0000 (17:39 -0700)]
f2fs: don't bother checking for encryption key in ->write_iter()

Since only an open file can be written to, and we only allow open()ing
an encrypted file when its key is available, there is no need to check
for the key again before permitting each ->write_iter().

This code was also broken in that it wouldn't actually have failed if
the key was in fact unavailable.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: David Gstir <david@sigma-star.at>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: don't bother checking for encryption key in ->mmap()
Eric Biggers [Tue, 23 May 2017 00:39:43 +0000 (17:39 -0700)]
f2fs: don't bother checking for encryption key in ->mmap()

Since only an open file can be mmap'ed, and we only allow open()ing an
encrypted file when its key is available, there is no need to check for
the key again before permitting each mmap().

This f2fs copy of this code was also broken in that it wouldn't actually
have failed if the key was in fact unavailable.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: David Gstir <david@sigma-star.at>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: wait discard IO completion without cmd_lock held
Chao Yu [Fri, 19 May 2017 15:46:45 +0000 (23:46 +0800)]
f2fs: wait discard IO completion without cmd_lock held

Wait discard IO completion outside cmd_lock to avoid long latency
of holding cmd_lock in IO busy scenario.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: wake up all waiters in f2fs_submit_discard_endio
Chao Yu [Fri, 19 May 2017 15:46:44 +0000 (23:46 +0800)]
f2fs: wake up all waiters in f2fs_submit_discard_endio

There could be more than one waiter waiting discard IO completion, so we
need use complete_all() instead of complete() in f2fs_submit_discard_endio
to avoid hungtask.

Fixes:  ec9895add2c5 ("f2fs: don't hold cmd_lock during waiting discard
command")
Cc: <stable@vger.kernel.org>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: show more info if fail to issue discard
Chao Yu [Fri, 19 May 2017 15:46:43 +0000 (23:46 +0800)]
f2fs: show more info if fail to issue discard

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: introduce io_list for serialize data/node IOs
Chao Yu [Fri, 19 May 2017 15:37:01 +0000 (23:37 +0800)]
f2fs: introduce io_list for serialize data/node IOs

Serialize data/node IOs by using fifo list instead of mutex lock,
it will help to enhance concurrency of f2fs, meanwhile keeping LFS
IO semantics.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: split wio_mutex
Chao Yu [Fri, 19 May 2017 15:37:00 +0000 (23:37 +0800)]
f2fs: split wio_mutex

Split wio_mutex to adjust different temperature bio cache.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: combine huge num of discard rb tree consistence checks
Yunlei He [Fri, 19 May 2017 06:42:12 +0000 (14:42 +0800)]
f2fs: combine huge num of discard rb tree consistence checks

Came across a hungtask caused by huge number of rb tree traversing
during adding discard addrs in cp. This patch combine these consistence
checks and move it to discard thread.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix a bug caused by NULL extent tree
Yunlei He [Fri, 19 May 2017 07:06:12 +0000 (15:06 +0800)]
f2fs: fix a bug caused by NULL extent tree

Thread A: Thread B:

-f2fs_remount
    -sbi->mount_opt.opt = 0;
<--- -f2fs_iget
         -do_read_inode
     -f2fs_init_extent_tree
         -F2FS_I(inode)->extent_tree is NULL
        -default_options && parse_options
    -remount return
<---  -f2fs_map_blocks
          -f2fs_lookup_extent_tree
                                                              -f2fs_bug_on(sbi, !et);

The same problem with f2fs_new_inode.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: try to freeze in gc and discard threads
Jaegeuk Kim [Wed, 17 May 2017 17:36:58 +0000 (10:36 -0700)]
f2fs: try to freeze in gc and discard threads

This allows to freeze gc and discard threads.

Cc: stable@vger.kernel.org
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: add a new function get_ssr_cost
Yunlei He [Wed, 17 May 2017 09:22:51 +0000 (17:22 +0800)]
f2fs: add a new function get_ssr_cost

This patch add a new method get_ssr_cost to select
SSR segment more accurately.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: declare load_free_nid_bitmap static
Hou Pengyang [Wed, 17 May 2017 02:48:48 +0000 (02:48 +0000)]
f2fs: declare load_free_nid_bitmap static

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: avoid f2fs_lock_op for IPU writes
Jaegeuk Kim [Fri, 12 May 2017 20:51:34 +0000 (13:51 -0700)]
f2fs: avoid f2fs_lock_op for IPU writes

Currently, if we do get_node_of_data before f2fs_lock_op, there may be dead lock
as follows, where process A would be in infinite loop, and B will NOT be awaked.

Process A(cp):            Process B:
f2fs_lock_all(sbi)
                        get_dnode_of_data <---- lock dn.node_page
flush_nodes             f2fs_lock_op

So, this patch adds f2fs_trylock_op to avoid f2fs_lock_op done by IPU.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: split bio cache
Jaegeuk Kim [Wed, 10 May 2017 18:18:25 +0000 (11:18 -0700)]
f2fs: split bio cache

Split DATA/NODE type bio cache according to different temperature,
so write IOs with the same temperature can be merged in corresponding
bio cache as much as possible, otherwise, different temperature write
IOs submitting into one bio cache will always cause split of bio.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
 Conflicts:
include/trace/events/f2fs.h

7 years agof2fs: use fio instead of multiple parameters
Jaegeuk Kim [Wed, 10 May 2017 21:19:54 +0000 (14:19 -0700)]
f2fs: use fio instead of multiple parameters

This patch just changes using fio instead of parameters.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: remove unnecessary read cases in merged IO flow
Jaegeuk Kim [Wed, 10 May 2017 18:28:38 +0000 (11:28 -0700)]
f2fs: remove unnecessary read cases in merged IO flow

Merged IO flow doesn't need to care about read IOs.

f2fs_submit_merged_bio -> f2fs_submit_merged_write
f2fs_submit_merged_bios -> f2fs_submit_merged_writes
f2fs_submit_merged_bio_cond -> f2fs_submit_merged_write_cond

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: use f2fs_submit_page_bio for ra_meta_pages
Jaegeuk Kim [Wed, 10 May 2017 18:23:36 +0000 (11:23 -0700)]
f2fs: use f2fs_submit_page_bio for ra_meta_pages

This patch avoids to use f2fs_submit_merged_bio for read, which was the only
read case.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: make sure f2fs_gc returns consistent errno
Weichao Guo [Wed, 10 May 2017 20:28:00 +0000 (04:28 +0800)]
f2fs: make sure f2fs_gc returns consistent errno

By default, f2fs_gc returns -EINVAL in general error cases, e.g., no victim
was selected. However, the default errno may be overwritten in two cases:
gc_more and BG_GC -> FG_GC. We should return consistent errno in such cases.

Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: load inode's flag from disk
Jaegeuk Kim [Tue, 16 May 2017 20:20:16 +0000 (13:20 -0700)]
f2fs: load inode's flag from disk

This patch fixes missing inode flag loaded from disk, reported by Tom.

[tom@localhost ~]$ sudo mount /dev/loop0 /mnt/
[tom@localhost ~]$ sudo chown tom:tom /mnt/
[tom@localhost ~]$ touch /mnt/testfile
[tom@localhost ~]$ sudo chattr +i /mnt/testfile
[tom@localhost ~]$ echo test > /mnt/testfile
bash: /mnt/testfile: Operation not permitted
[tom@localhost ~]$ rm /mnt/testfile
rm: cannot remove '/mnt/testfile': Operation not permitted
[tom@localhost ~]$ sudo umount /mnt/
[tom@localhost ~]$ sudo mount /dev/loop0 /mnt/
[tom@localhost ~]$ lsattr /mnt/testfile
----i-------------- /mnt/testfile
[tom@localhost ~]$ echo test > /mnt/testfile
[tom@localhost ~]$ rm /mnt/testfile
[tom@localhost ~]$ sudo umount /mnt/

Cc: stable@vger.kernel.org
Reported-by: Tom Yan <tom.ty89@outlook.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: sanity check checkpoint segno and blkoff
Jin Qian [Mon, 15 May 2017 17:45:08 +0000 (10:45 -0700)]
f2fs: sanity check checkpoint segno and blkoff

Make sure segno and blkoff read from raw image are valid.

Cc: stable@vger.kernel.org
Signed-off-by: Jin Qian <jinqian@google.com>
[Jaegeuk Kim: adjust minor coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs, block_dump: give WRITE direction to submit_bio
Jaegeuk Kim [Thu, 6 Jul 2017 19:24:49 +0000 (12:24 -0700)]
f2fs, block_dump: give WRITE direction to submit_bio

The block_dump in submit_bio uses rw, instead of bio->bi_rw.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agofscrypt: correct collision claim for digested names
Eric Biggers [Mon, 1 May 2017 18:43:32 +0000 (11:43 -0700)]
fscrypt: correct collision claim for digested names

As I noted on the mailing list, it's easier than I originally thought to
create intentional collisions in the digested names.  Unfortunately it's
not too easy to solve this, so for now just fix the comment to not lie.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
7 years agof2fs: switch to using fscrypt_match_name()
Eric Biggers [Mon, 24 Apr 2017 17:00:12 +0000 (10:00 -0700)]
f2fs: switch to using fscrypt_match_name()

Switch f2fs directory searches to use the fscrypt_match_name() helper
function.  There should be no functional change.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
7 years agofscrypt: introduce helper function for filename matching
Eric Biggers [Mon, 24 Apr 2017 17:00:10 +0000 (10:00 -0700)]
fscrypt: introduce helper function for filename matching

Introduce a helper function fscrypt_match_name() which tests whether a
fscrypt_name matches a directory entry.  Also clean up the magic numbers
and document things properly.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
7 years agofscrypt: fix context consistency check when key(s) unavailable
Eric Biggers [Fri, 7 Apr 2017 17:58:37 +0000 (10:58 -0700)]
fscrypt: fix context consistency check when key(s) unavailable

To mitigate some types of offline attacks, filesystem encryption is
designed to enforce that all files in an encrypted directory tree use
the same encryption policy (i.e. the same encryption context excluding
the nonce).  However, the fscrypt_has_permitted_context() function which
enforces this relies on comparing struct fscrypt_info's, which are only
available when we have the encryption keys.  This can cause two
incorrect behaviors:

1. If we have the parent directory's key but not the child's key, or
   vice versa, then fscrypt_has_permitted_context() returned false,
   causing applications to see EPERM or ENOKEY.  This is incorrect if
   the encryption contexts are in fact consistent.  Although we'd
   normally have either both keys or neither key in that case since the
   master_key_descriptors would be the same, this is not guaranteed
   because keys can be added or removed from keyrings at any time.

2. If we have neither the parent's key nor the child's key, then
   fscrypt_has_permitted_context() returned true, causing applications
   to see no error (or else an error for some other reason).  This is
   incorrect if the encryption contexts are in fact inconsistent, since
   in that case we should deny access.

To fix this, retrieve and compare the fscrypt_contexts if we are unable
to set up both fscrypt_infos.

While this slightly hurts performance when accessing an encrypted
directory tree without the key, this isn't a case we really need to be
optimizing for; access *with* the key is much more important.
Furthermore, the performance hit is barely noticeable given that we are
already retrieving the fscrypt_context and doing two keyring searches in
fscrypt_get_encryption_info().  If we ever actually wanted to optimize
this case we might start by caching the fscrypt_contexts.

Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
7 years agofscrypt: Move key structure and constants to uapi
Joe Richey [Thu, 6 Apr 2017 23:14:05 +0000 (16:14 -0700)]
fscrypt: Move key structure and constants to uapi

This commit exposes the necessary constants and structures for a
userspace program to pass filesystem encryption keys into the keyring.
The fscrypt_key structure was already part of the kernel ABI, this
change just makes it so programs no longer have to redeclare these
structures (like e4crypt in e2fsprogs currently does).

Note that we do not expose the other FS_*_KEY_SIZE constants as they are
not necessary. Only XTS is supported for contents_encryption_mode, so
currently FS_MAX_KEY_SIZE bytes of key material must always be passed to
the kernel.

This commit also removes __packed from fscrypt_key as it does not
contain any implicit padding and does not refer to an on-disk structure.

Signed-off-by: Joe Richey <joerichey@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
7 years agofscrypt: remove fscrypt_symlink_data_len()
Eric Biggers [Tue, 4 Apr 2017 21:43:34 +0000 (14:43 -0700)]
fscrypt: remove fscrypt_symlink_data_len()

fscrypt_symlink_data_len() is never called and can be removed.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
7 years agofscrypt: remove unnecessary checks for NULL operations
Eric Biggers [Tue, 4 Apr 2017 21:39:41 +0000 (14:39 -0700)]
fscrypt: remove unnecessary checks for NULL operations

The functions in fs/crypto/*.c are only called by filesystems configured
with encryption support.  Since the ->get_context(), ->set_context(),
and ->empty_dir() operations are always provided in that case (and must
be, otherwise there would be no way to get/set encryption policies, or
in the case of ->get_context() even access encrypted files at all),
there is no need to check for these operations being NULL and we can
remove these unneeded checks.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
7 years agofscrypt: eliminate ->prepare_context() operation
Eric Biggers [Wed, 22 Feb 2017 21:25:14 +0000 (13:25 -0800)]
fscrypt: eliminate ->prepare_context() operation

The only use of the ->prepare_context() fscrypt operation was to allow
ext4 to evict inline data from the inode before ->set_context().
However, there is no reason why this cannot be done as simply the first
step in ->set_context(), and in fact it makes more sense to do it that
way because then the policy modes and flags get validated before any
real work is done.  Therefore, merge ext4_prepare_context() into
ext4_set_context(), and remove ->prepare_context().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
 Conflicts:
fs/ext4/super.c

7 years agofscrypt: remove broken support for detecting keyring key revocation
Eric Biggers [Tue, 21 Feb 2017 23:07:11 +0000 (15:07 -0800)]
fscrypt: remove broken support for detecting keyring key revocation

Filesystem encryption ostensibly supported revoking a keyring key that
had been used to "unlock" encrypted files, causing those files to become
"locked" again.  This was, however, buggy for several reasons, the most
severe of which was that when key revocation happened to be detected for
an inode, its fscrypt_info was immediately freed, even while other
threads could be using it for encryption or decryption concurrently.
This could be exploited to crash the kernel or worse.

This patch fixes the use-after-free by removing the code which detects
the keyring key having been revoked, invalidated, or expired.  Instead,
an encrypted inode that is "unlocked" now simply remains unlocked until
it is evicted from memory.  Note that this is no worse than the case for
block device-level encryption, e.g. dm-crypt, and it still remains
possible for a privileged user to evict unused pages, inodes, and
dentries by running 'sync; echo 3 > /proc/sys/vm/drop_caches', or by
simply unmounting the filesystem.  In fact, one of those actions was
already needed anyway for key revocation to work even somewhat sanely.
This change is not expected to break any applications.

In the future I'd like to implement a real API for fscrypt key
revocation that interacts sanely with ongoing filesystem operations ---
waiting for existing operations to complete and blocking new operations,
and invalidating and sanitizing key material and plaintext from the VFS
caches.  But this is a hard problem, and for now this bug must be fixed.

This bug affected almost all versions of ext4, f2fs, and ubifs
encryption, and it was potentially reachable in any kernel configured
with encryption support (CONFIG_EXT4_ENCRYPTION=y,
CONFIG_EXT4_FS_ENCRYPTION=y, CONFIG_F2FS_FS_ENCRYPTION=y, or
CONFIG_UBIFS_FS_ENCRYPTION=y).  Note that older kernels did not use the
shared fs/crypto/ code, but due to the potential security implications
of this bug, it may still be worthwhile to backport this fix to them.

Fixes: b7236e21d55f ("ext4 crypto: reorganize how we store keys in the inode")
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Michael Halcrow <mhalcrow@google.com>
7 years agofscrypt: avoid collisions when presenting long encrypted filenames
Eric Biggers [Mon, 24 Apr 2017 17:00:09 +0000 (10:00 -0700)]
fscrypt: avoid collisions when presenting long encrypted filenames

When accessing an encrypted directory without the key, userspace must
operate on filenames derived from the ciphertext names, which contain
arbitrary bytes.  Since we must support filenames as long as NAME_MAX,
we can't always just base64-encode the ciphertext, since that may make
it too long.  Currently, this is solved by presenting long names in an
abbreviated form containing any needed filesystem-specific hashes (e.g.
to identify a directory block), then the last 16 bytes of ciphertext.
This needs to be sufficient to identify the actual name on lookup.

However, there is a bug.  It seems to have been assumed that due to the
use of a CBC (ciphertext block chaining)-based encryption mode, the last
16 bytes (i.e. the AES block size) of ciphertext would depend on the
full plaintext, preventing collisions.  However, we actually use CBC
with ciphertext stealing (CTS), which handles the last two blocks
specially, causing them to appear "flipped".  Thus, it's actually the
second-to-last block which depends on the full plaintext.

This caused long filenames that differ only near the end of their
plaintexts to, when observed without the key, point to the wrong inode
and be undeletable.  For example, with ext4:

    # echo pass | e4crypt add_key -p 16 edir/
    # seq -f "edir/abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch
    # find edir/ -type f | xargs stat -c %i | sort | uniq | wc -l
    100000
    # sync
    # echo 3 > /proc/sys/vm/drop_caches
    # keyctl new_session
    # find edir/ -type f | xargs stat -c %i | sort | uniq | wc -l
    2004
    # rm -rf edir/
    rm: cannot remove 'edir/_A7nNFi3rhkEQlJ6P,hdzluhODKOeWx5V': Structure needs cleaning
    ...

To fix this, when presenting long encrypted filenames, encode the
second-to-last block of ciphertext rather than the last 16 bytes.

Although it would be nice to solve this without depending on a specific
encryption mode, that would mean doing a cryptographic hash like SHA-256
which would be much less efficient.  This way is sufficient for now, and
it's still compatible with encryption modes like HEH which are strong
pseudorandom permutations.  Also, changing the presented names is still
allowed at any time because they are only provided to allow applications
to do things like delete encrypted directories.  They're not designed to
be used to persistently identify files --- which would be hard to do
anyway, given that they're encrypted after all.

For ease of backports, this patch only makes the minimal fix to both
ext4 and f2fs.  It leaves ubifs as-is, since ubifs doesn't compare the
ciphertext block yet.  Follow-on patches will clean things up properly
and make the filesystems use a shared helper function.

Fixes: 5de0b4d0cd15 ("ext4 crypto: simplify and speed up filename encryption")
Reported-by: Gwendal Grignou <gwendal@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
7 years agof2fs: check entire encrypted bigname when finding a dentry
Jaegeuk Kim [Mon, 24 Apr 2017 17:00:08 +0000 (10:00 -0700)]
f2fs: check entire encrypted bigname when finding a dentry

If user has no key under an encrypted dir, fscrypt gives digested dentries.
Previously, when looking up a dentry, f2fs only checks its hash value with
first 4 bytes of the digested dentry, which didn't handle hash collisions fully.
This patch enhances to check entire dentry bytes likewise ext4.

Eric reported how to reproduce this issue by:

 # seq -f "edir/abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch
 # find edir -type f | xargs stat -c %i | sort | uniq | wc -l
100000
 # sync
 # echo 3 > /proc/sys/vm/drop_caches
 # keyctl new_session
 # find edir -type f | xargs stat -c %i | sort | uniq | wc -l
99999

Cc: <stable@vger.kernel.org>
Reported-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(fixed f2fs_dentry_hash() to work even when the hash is 0)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
 Conflicts:
fs/f2fs/inline.c

7 years agof2fs: sync f2fs_lookup() with ext4_lookup()
Eric Biggers [Fri, 7 Apr 2017 17:58:39 +0000 (10:58 -0700)]
f2fs: sync f2fs_lookup() with ext4_lookup()

As for ext4, now that fscrypt_has_permitted_context() correctly handles
the case where we have the key for the parent directory but not the
child, f2fs_lookup() no longer has to work around it.  Also add the same
warning message that ext4 uses.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
7 years agof2fs: fix a mount fail for wrong next_scan_nid
Yunlei He [Wed, 26 Apr 2017 07:56:52 +0000 (15:56 +0800)]
f2fs: fix a mount fail for wrong next_scan_nid

-write_checkpoint
   -do_checkpoint
      -next_free_nid    <--- something wrong with next free nid

-f2fs_fill_super
   -build_node_manager
      -build_free_nids
          -get_current_nat_page
             -__get_meta_page   <--- attempt to access beyond end of device

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: relocate inode_{,un}lock in F2FS_IOC_SETFLAGS
Chao Yu [Wed, 3 May 2017 15:59:13 +0000 (23:59 +0800)]
f2fs: relocate inode_{,un}lock in F2FS_IOC_SETFLAGS

This patch expands cover region of inode->i_rwsem to keep setting flag
atomically.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: show available_nids in f2fs/status
Jaegeuk Kim [Tue, 2 May 2017 01:13:03 +0000 (18:13 -0700)]
f2fs: show available_nids in f2fs/status

This patch adds an entry in f2fs/status to show # of available nids.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: flush dirty nats periodically
Jaegeuk Kim [Tue, 2 May 2017 01:09:44 +0000 (18:09 -0700)]
f2fs: flush dirty nats periodically

This patch flushes dirty nats in order to acquire available nids by writing
checkpoint. Otherwise, we can have no chance to get freed nids.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard
Chao Yu [Fri, 28 Apr 2017 05:56:08 +0000 (13:56 +0800)]
f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard

Introduce CP_TRIMMED_FLAG to indicate all invalid block were trimmed
before umount, so once we do mount with image which contain the flag,
we don't record invalid blocks as undiscard one, when fstrim is being
triggered, we can avoid issuing redundant discard commands.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: allow cpc->reason to indicate more than one reason
Chao Yu [Thu, 27 Apr 2017 12:40:39 +0000 (20:40 +0800)]
f2fs: allow cpc->reason to indicate more than one reason

Change to use different bits of cpc->reason to indicate different status,
so cpc->reason can indicate more than one reason.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: release cp and dnode lock before IPU
Hou Pengyang [Wed, 26 Apr 2017 16:17:21 +0000 (00:17 +0800)]
f2fs: release cp and dnode lock before IPU

We don't need to rewrite the page under cp_rwsem and dnode locks.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: shrink size of struct discard_cmd
Chao Yu [Wed, 26 Apr 2017 09:39:55 +0000 (17:39 +0800)]
f2fs: shrink size of struct discard_cmd

In order to shrink size of struct discard_cmd, change variable type of
@state in struct discard_cmd from int to unsigned char.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: don't hold cmd_lock during waiting discard command
Chao Yu [Wed, 26 Apr 2017 09:39:54 +0000 (17:39 +0800)]
f2fs: don't hold cmd_lock during waiting discard command

Previously, with protection of cmd_lock, we will wait for end io of
discard command which potentially may lead long latency, making worse
concurrency.

So, in this patch, we try to add reference into discard entry to prevent
the entry being released by other thread, then we can avoid holding
global cmd_lock during waiting discard to finish.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: nullify fio->encrypted_page for each writes
Jaegeuk Kim [Wed, 26 Apr 2017 18:11:12 +0000 (11:11 -0700)]
f2fs: nullify fio->encrypted_page for each writes

This makes sure each write request has nullified encrypted_page pointer.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
 Conflicts:
fs/f2fs/segment.c

7 years agof2fs: sanity check segment count
Jin Qian [Tue, 25 Apr 2017 23:28:48 +0000 (16:28 -0700)]
f2fs: sanity check segment count

F2FS uses 4 bytes to represent block address. As a result, supported
size of disk is 16 TB and it equals to 16 * 1024 * 1024 / 2 segments.

Signed-off-by: Jin Qian <jinqian@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: introduce valid_ipu_blkaddr to clean up
Jaegeuk Kim [Mon, 24 Apr 2017 22:20:16 +0000 (15:20 -0700)]
f2fs: introduce valid_ipu_blkaddr to clean up

This patch introduces valid_ipu_blkaddr to clean up checking block address for
inplace-update.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: lookup extent cache first under IPU scenario
Hou Pengyang [Tue, 25 Apr 2017 12:45:13 +0000 (12:45 +0000)]
f2fs: lookup extent cache first under IPU scenario

If a page is cold, NOT atomit written and need_ipu now, there is
a high probability that IPU should be adapted. For IPU, we try to
check extent tree to get the block index first, instead of reading
the dnode page, where may lead to an useless dnode IO, since no need to
update the dnode index for IPU.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
 Conflicts:
fs/f2fs/gc.c

7 years agof2fs: reconstruct code to write a data page
Hou Pengyang [Tue, 25 Apr 2017 12:45:12 +0000 (12:45 +0000)]
f2fs: reconstruct code to write a data page

This patch introduces encrypt_one_page which encrypts one data page before
submit_bio, and change the use of need_inplace_update.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: introduce __wait_discard_cmd
Chao Yu [Tue, 25 Apr 2017 12:21:38 +0000 (20:21 +0800)]
f2fs: introduce __wait_discard_cmd

Just cleanup, no logic change.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: introduce __issue_discard_cmd
Chao Yu [Tue, 25 Apr 2017 12:21:37 +0000 (20:21 +0800)]
f2fs: introduce __issue_discard_cmd

Just cleanup, no logic change.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: enable small discard by default
Chao Yu [Mon, 24 Apr 2017 16:21:35 +0000 (00:21 +0800)]
f2fs: enable small discard by default

This patch start to enable 4K granularity small discard by default
when realtime discard is on, so, in seriously fragmented space,
small size discard can be issued in time to avoid useless storage
space occupying of invalid filesystem's data, then performance of
flash storage can be recovered.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: delay awaking discard thread
Chao Yu [Mon, 24 Apr 2017 16:21:34 +0000 (00:21 +0800)]
f2fs: delay awaking discard thread

It's better to delay awaking discard thread while queuing discard commands
in checkpoint, it will help to give more chances for merging big and small
discard.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: seperate read nat page from nat_tree_lock
Yunlei He [Sat, 22 Apr 2017 10:06:26 +0000 (18:06 +0800)]
f2fs: seperate read nat page from nat_tree_lock

This patch seperate nat page read io from nat_tree_lock.

-lock_page
-get_node_info()
-current_nat_addr

......            ->       write_checkpoint

-get_meta_page

Because we lock node page, we can make sure no other threads
modify this nid concurrently. So we just obtain current_nat_addr
under nat_tree_lock, node info is always same in both nat pack.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix multiple f2fs_add_link() having same name for inline dentry
Sheng Yong [Sat, 22 Apr 2017 02:39:20 +0000 (10:39 +0800)]
f2fs: fix multiple f2fs_add_link() having same name for inline dentry

Commit 88c5c13a5027 (f2fs: fix multiple f2fs_add_link() calls having
same name) does not cover the scenario where inline dentry is enabled.
In that case, F2FS_I(dir)->task will be NULL, and __f2fs_add_link will
lookup dentries one more time.

This patch fixes it by moving the assigment of current task to a upper
level to cover both normal and inline dentry.

Cc: <stable@vger.kernel.org>
Fixes: 88c5c13a5027 (f2fs: fix multiple f2fs_add_link() calls having same name)
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: skip encrypted inode in ASYNC IPU policy
Hou Pengyang [Fri, 21 Apr 2017 12:41:48 +0000 (12:41 +0000)]
f2fs: skip encrypted inode in ASYNC IPU policy

Async request may be throttled in block layer, so page for async may keep WRITE_BACK
for a long time.

For encrytped inode, we need wait on page writeback no matter if the device supports
BDI_CAP_STABLE_WRITES. This may result in a higher waiting page writeback time for
async encrypted inode page.

This patch skips IPU for encrypted inode's updating write.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix out-of free segments
Jaegeuk Kim [Thu, 20 Apr 2017 20:51:57 +0000 (13:51 -0700)]
f2fs: fix out-of free segments

This patch also reverts d0db7703ac1 ("f2fs: do SSR in higher priority").

This patch fixes out of free segments caused by many small file creation by
1) mkfs -s 1 2G
2) mount
3) untar
 - preoduce 60000 small files burstly
4) sync
 - flush node pages
 - flush imeta

Here, when we do f2fs_balance_fs, we missed # of imeta blocks, resulting in
skipping to check has_not_enough_free_secs.

Another test is done by
1) mkfs -s 12 2G
2) mount
3) untar
 - preoduce 60000 small files burstly
4) sync
 - flush node pages
 - flush imeta

In this case, this patch also fixes wrong block allocation under large section
size.

Reported-by: William Brana <wbrana@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: improve definition of statistic macros
Arnd Bergmann [Wed, 19 Apr 2017 17:38:33 +0000 (19:38 +0200)]
f2fs: improve definition of statistic macros

With a recent addition of f2fs_lookup_extent_tree(), we get a warning about
the use of empty macros:

fs/f2fs/extent_cache.c: In function 'f2fs_lookup_extent_tree':
fs/f2fs/extent_cache.c:358:32: error: suggest braces around empty body in an 'else' statement [-Werror=empty-body]
   stat_inc_rbtree_node_hit(sbi);

A good way to avoid the warning and make the code more robust is to define
all no-op macros as 'do { } while (0)'.

Fixes: 54c2258cd63a ("f2fs: extract rb-tree operation infrastructure")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reivewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: assign allocation hint for warm/cold data
Jaegeuk Kim [Tue, 18 Apr 2017 22:03:15 +0000 (15:03 -0700)]
f2fs: assign allocation hint for warm/cold data

This patch gives slower device region to warm/cold data area more eagerly.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: fix _IOW usage
Jaegeuk Kim [Tue, 18 Apr 2017 20:47:25 +0000 (13:47 -0700)]
f2fs: fix _IOW usage

This patch fixes wrong _IOW usage.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: add ioctl to flush data from faster device to cold area
Jaegeuk Kim [Thu, 13 Apr 2017 22:17:00 +0000 (15:17 -0700)]
f2fs: add ioctl to flush data from faster device to cold area

This patch adds an ioctl to flush data in faster device to cold area. User can
give device number and number of segments to move. It doesn't move it if there
is only one device.

The parameter looks like:

struct f2fs_flush_device {
u32 dev_num; /* device number to flush */
u32 segments; /* # of segments to flush */
};

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
7 years agof2fs: introduce async IPU policy
Hou Pengyang [Tue, 18 Apr 2017 11:57:16 +0000 (11:57 +0000)]
f2fs: introduce async IPU policy

This patch introduces an ASYNC IPU policy.

Under senario of large # of async updating(e.g. log writing in Android),
disk would be seriously fragmented, and higher frequent gc would be triggered.

This patch uses IPU to rewrite the async update writting, since async is
NOT sensitive to io latency.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
7 years agof2fs: add undiscard blocks stat
Chao Yu [Tue, 18 Apr 2017 11:27:39 +0000 (19:27 +0800)]
f2fs: add undiscard blocks stat

This patch adds to account undiscard blocks.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
7 years agof2fs: unlock cp_rwsem early for IPU writes
Chao Yu [Tue, 18 Apr 2017 11:23:39 +0000 (19:23 +0800)]
f2fs: unlock cp_rwsem early for IPU writes

For IPU writes, there won't be any udpates in dnode page since we
will reuse old block address instead of allocating new one, so we
don't need to lock cp_rwsem during IPU IO submitting.

Signed-off-by: Chao Yu <yuchao0@huawei.com>