git.osdn.net Git - sagit-ice-cold/kernel_xiaomi

(root) / sagit-ice-cold / kernel_xiaomi_msm8998.git / log

Sheng Yong [Sat, 21 Apr 2018 06:12:50 +0000 (14:12 +0800)]

f2fs: remove duplicated dquot_initialize and fix error handling

This patch removes duplicated dquot_initialize in recover_orphan_inode(),
and fix the error handling if dquot_initialize fails.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Yunlei He [Fri, 13 Apr 2018 03:08:05 +0000 (11:08 +0800)]

f2fs: stop issue discard if something wrong with f2fs

v4->v5: move data corruption check to __submit_discard_cmd, in order to
control discard io submitted more accurately, besides, increase async
thread wait time if data corruption detected.

This patch stop async thread and umount process to issue discard
if something wrong with f2fs, which is similar to fstrim.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Wed, 18 Apr 2018 09:45:02 +0000 (17:45 +0800)]

f2fs: fix return value in f2fs_ioc_commit_atomic_write

In f2fs_ioc_commit_atomic_write, if file is volatile, return -EINVAL to
indicate that commit failure.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Yunlei He [Wed, 18 Apr 2018 03:06:39 +0000 (11:06 +0800)]

f2fs: allocate hot_data for atomic write more strictly

If a file not set type as hot, has dirty pages more than
threshold 64 before starting atomic write, may be lose hot
flag.

v1->v2: move set FI_ATOMIC_FILE flag behind flush dirty pages too,
in case of dirty pages before starting atomic use atomic mode to
write back.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Sheng Yong [Tue, 17 Apr 2018 09:12:27 +0000 (17:12 +0800)]

f2fs: check if inmem_pages list is empty correctly

`cur' will never be NULL, we should check inmem_pages list instead.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Tue, 17 Apr 2018 09:51:28 +0000 (17:51 +0800)]

f2fs: fix race in between GC and atomic open

Thread GC thread
- f2fs_ioc_start_atomic_write
- get_dirty_pages
- filemap_write_and_wait_range
- f2fs_gc
- do_garbage_collect
  - gc_data_segment
   - move_data_page
    - f2fs_is_atomic_file
    - set_page_dirty
- set_inode_flag(, FI_ATOMIC_FILE)

Dirty data page can still be generated by GC in race condition as
above call stack.

This patch adds fi->dio_rwsem[WRITE] in f2fs_ioc_start_atomic_write
to avoid such race.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Zhikang Zhang [Fri, 13 Apr 2018 17:02:34 +0000 (01:02 +0800)]

f2fs: change le32 to le16 of f2fs_inode->i_extra_size

In the structure of f2fs_inode, i_extra_size's type is __le16,
so we should keep type consistent when using it.

Fixes: 704956ecf5bc ("f2fs: support inode checksum")
Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com>
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Zhikang Zhang [Sun, 8 Apr 2018 20:28:41 +0000 (04:28 +0800)]

f2fs: check cur_valid_map_mir & raw_sit block count when flush sit entries

We should check valid_map_mir and block count to ensure
the flushed raw_sit is correct.

Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com>
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Sun, 8 Apr 2018 12:39:03 +0000 (20:39 +0800)]

f2fs: correct return value of f2fs_trim_fs

Correct return value in two cases:
- return EINVAL if end boundary is out-of-range.
- return EIO if fs needs off-line check.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Sun, 8 Apr 2018 03:27:14 +0000 (11:27 +0800)]

f2fs: fix to show missing bits in FS_IOC_GETFLAGS

This patch fixes to show missing encrypt/inline_data flag in
FS_IOC_GETFLAGS like ext4 does.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Sun, 8 Apr 2018 03:25:53 +0000 (11:25 +0800)]

f2fs: remove unneeded F2FS_PROJINHERIT_FL

Now F2FS_FL_USER_VISIBLE and F2FS_FL_USER_MODIFIABLE has included
F2FS_PROJINHERIT_FL, so remove unneeded F2FS_PROJINHERIT_FL when
using visible/modifiable flag macro.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Mon, 9 Apr 2018 12:25:06 +0000 (20:25 +0800)]

f2fs: don't use GFP_ZERO for page caches

Related to https://lkml.org/lkml/2018/4/8/661

Sometimes, we need to write meta data to new allocated block address,
then we will allocate a zeroed page in inner inode's address space, and
fill partial data in it, and leave other place with zero value which means
some fields are initial status.

There are two inner inodes (meta inode and node inode) setting __GFP_ZERO,
I have just checked them, for both of them, we can avoid using __GFP_ZERO,
and do initialization by ourselves to avoid unneeded/redundant zeroing
from mm.

Cc: <stable@vger.kernel.org>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Yunlei He [Wed, 4 Apr 2018 09:29:05 +0000 (17:29 +0800)]

f2fs: issue all big range discards in umount process

This patch modify max_requests to UINT_MAX, to issue
all big range discards in umount.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Wed, 4 Apr 2018 09:35:13 +0000 (17:35 +0800)]

f2fs: remove redundant block plug

For buffered IO, we don't need to use block plug to cache bio,
for direct IO, generic f2fs_direct_IO has already added block
plug, so let's remove redundant one in .write_iter.

As Yunlei described in his patch:

-f2fs_file_write_iter
  -blk_start_plug
    -__generic_file_write_iter
...
  -do_blockdev_direct_IO
    -blk_start_plug
...
    -blk_finish_plug
...
  -blk_finish_plug

which may conduct performance decrease in our platform

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Yunlong Song [Tue, 3 Apr 2018 11:42:41 +0000 (19:42 +0800)]

f2fs: remove unmatched zero_user_segment when convert inline dentry

Since the layout of regular dentry block is different from inline dentry
block, zero_user_segment starting from MAX_INLINE_DATA(dir) is not
correct for regular dentry block, besides, bitmap is already copied and
used, so there is no necessary to zero page at all, so just remove the
zero_user_segment is OK.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Tue, 3 Apr 2018 07:08:17 +0000 (15:08 +0800)]

f2fs: introduce private inode status mapping

Previously, we use generic FS_*_FL defined by vfs to indicate inode status
for each bit of i_flags, so f2fs's flag status definition is tied to vfs'
one, it will be hard for f2fs to reuse bits f2fs never used to indicate
new status..

In order to solve this issue, we introduce private inode status mapping,
Note, for these bits have already been persisted into disk, we should
never change their definition, for other ones, we can remap them for
later new coming status.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Eric Biggers [Fri, 18 May 2018 17:58:14 +0000 (10:58 -0700)]

fscrypt: log the crypto algorithm implementations

Log the crypto algorithm driver name for each fscrypt encryption mode on
its first use, also showing a friendly name for the mode.

This will help people determine whether the expected implementations are
being used.  In some cases we've seen people do benchmarks and reject
using encryption for performance reasons, when in fact they used a much
slower implementation of AES-XTS than was possible on the hardware.  It
can make an enormous difference; e.g., AES-XTS on ARM is about 10x
faster with the crypto extensions (AES instructions) than without.

This also makes it more obvious which modes are being used, now that
fscrypt supports multiple combinations of modes.

Example messages (with default modes, on x86_64):

[   35.492057] fscrypt: AES-256-CTS-CBC using implementation "cts(cbc-aes-aesni)"
[   35.492171] fscrypt: AES-256-XTS using implementation "xts-aes-aesni"

Note: algorithms can be dynamically added to the crypto API, which can
result in different implementations being used at different times.  But
this is rare; for most users, showing the first will be good enough.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

commit | commitdiff | tree

Herbert Xu [Sat, 23 Jan 2016 05:51:01 +0000 (13:51 +0800)]

crypto: api - Add crypto_type_has_alg helper

This patch adds the helper crypto_type_has_alg which is meant
to replace crypto_has_alg for new-style crypto types. Rather
than hard-coding type/mask information they're now retrieved
from the crypto_type object.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

commit | commitdiff | tree

Herbert Xu [Tue, 12 Jul 2016 05:17:31 +0000 (13:17 +0800)]

crypto: skcipher - Add low-level skcipher interface

This patch allows skcipher algorithms and instances to be created
and registered with the crypto API. They are accessible through
the top-level skcipher interface, along with ablkcipher/blkcipher
algorithms and instances.

This patch also introduces a new parameter called chunk size
which is meant for ciphers such as CTR and CTS which ostensibly
can handle arbitrary lengths, but still behave like block ciphers
in that you can only process a partial block at the very end.

For these ciphers the block size will continue to be set to 1
as it is now while the chunk size will be set to the underlying
block size.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

commit | commitdiff | tree

Herbert Xu [Tue, 26 Jan 2016 14:14:36 +0000 (22:14 +0800)]

crypto: skcipher - Add helper to retrieve driver name

This patch adds the helper crypto_skcipher_driver_name which returns
the driver name of the alg object for a given tfm. This is needed by
ecryptfs.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

commit | commitdiff | tree

Herbert Xu [Thu, 21 Jan 2016 09:10:56 +0000 (17:10 +0800)]

crypto: skcipher - Add default key size helper

While converting ecryptfs over to skcipher I found that it needs
to pick a default key size if one isn't given. Rather than having
it poke into the guts of the algorithm to get max_keysize, let's
provide a helper that is meant to give a sane default (just in
case we ever get an algorithm that has no maximum key size).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

commit | commitdiff | tree

Eric Biggers [Tue, 8 May 2018 00:22:08 +0000 (17:22 -0700)]

fscrypt: add Speck128/256 support

fscrypt currently only supports AES encryption.  However, many low-end
mobile devices have older CPUs that don't have AES instructions, e.g.
the ARMv8 Cryptography Extensions.  Currently, user data on such devices
is not encrypted at rest because AES is too slow, even when the NEON
bit-sliced implementation of AES is used.  Unfortunately, it is
infeasible to encrypt these devices at all when AES is the only option.

Therefore, this patch updates fscrypt to support the Speck block cipher,
which was recently added to the crypto API.  The C implementation of
Speck is not especially fast, but Speck can be implemented very
efficiently with general-purpose vector instructions, e.g. ARM NEON.
For example, on an ARMv7 processor, we measured the NEON-accelerated
Speck128/256-XTS at 69 MB/s for both encryption and decryption, while
AES-256-XTS with the NEON bit-sliced implementation was only 22 MB/s
encryption and 19 MB/s decryption.

There are multiple variants of Speck.  This patch only adds support for
Speck128/256, which is the variant with a 128-bit block size and 256-bit
key size -- the same as AES-256.  This is believed to be the most secure
variant of Speck, and it's only about 6% slower than Speck128/128.
Speck64/128 would be at least 20% faster because it has 20% rounds, and
it can be even faster on CPUs that can't efficiently do the 64-bit
operations needed for Speck128.  However, Speck64's 64-bit block size is
not preferred security-wise.  ARM NEON also supports the needed 64-bit
operations even on 32-bit CPUs, resulting in Speck128 being fast enough
for our targeted use cases so far.

The chosen modes of operation are XTS for contents and CTS-CBC for
filenames.  These are the same modes of operation that fscrypt defaults
to for AES.  Note that as with the other fscrypt modes, Speck will not
be used unless userspace chooses to use it.  Nor are any of the existing
modes (which are all AES-based) being removed, of course.

We intentionally don't make CONFIG_FS_ENCRYPTION select
CONFIG_CRYPTO_SPECK, so people will have to enable Speck support
themselves if they need it.  This is because we shouldn't bloat the
FS_ENCRYPTION dependencies with every new cipher, especially ones that
aren't recommended for most users.  Moreover, CRYPTO_SPECK is just the
generic implementation, which won't be fast enough for many users; in
practice, they'll need to enable CRYPTO_SPECK_NEON to get acceptable
performance.

More details about our choice of Speck can be found in our patches that
added Speck to the crypto API, and the follow-on discussion threads.
We're planning a publication that explains the choice in more detail.
But briefly, we can't use ChaCha20 as we previously proposed, since it
would be insecure to use a stream cipher in this context, with potential
IV reuse during writes on f2fs and/or on wear-leveling flash storage.

We also evaluated many other lightweight and/or ARX-based block ciphers
such as Chaskey-LTS, RC5, LEA, CHAM, Threefish, RC6, NOEKEON, SPARX, and
XTEA.  However, all had disadvantages vs. Speck, such as insufficient
performance with NEON, much less published cryptanalysis, or an
insufficient security level.  Various design choices in Speck make it
perform better with NEON than competing ciphers while still having a
security margin similar to AES, and in the case of Speck128 also the
same available security levels.  Unfortunately, Speck does have some
political baggage attached -- it's an NSA designed cipher, and was
rejected from an ISO standard (though for context, as far as I know none
of the above-mentioned alternatives are ISO standards either).
Nevertheless, we believe it is a good solution to the problem from a
technical perspective.

Certain algorithms constructed from ChaCha or the ChaCha permutation,
such as MEM (Masked Even-Mansour) or HPolyC, may also meet our
performance requirements.  However, these are new constructions that
need more time to receive the cryptographic review and acceptance needed
to be confident in their security.  HPolyC hasn't been published yet,
and we are concerned that MEM makes stronger assumptions about the
underlying permutation than the ChaCha stream cipher does.  In contrast,
the XTS mode of operation is relatively well accepted, and Speck has
over 70 cryptanalysis papers.  Of course, these ChaCha-based algorithms
can still be added later if they become ready.

The best known attack on Speck128/256 is a differential cryptanalysis
attack on 25 of 34 rounds with 2^253 time complexity and 2^125 chosen
plaintexts, i.e. only marginally faster than brute force.  There is no
known attack on the full 34 rounds.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

commit | commitdiff | tree

Eric Biggers [Mon, 30 Apr 2018 22:51:49 +0000 (15:51 -0700)]

fscrypt: only derive the needed portion of the key

Currently the key derivation function in fscrypt uses the master key
length as the amount of output key material to derive. This works, but
it means we can waste time deriving more key material than is actually
used, e.g. most commonly, deriving 64 bytes for directories which only
take a 32-byte AES-256-CTS-CBC key. It also forces us to validate that
the master key length is a multiple of AES_BLOCK_SIZE, which wouldn't
otherwise be necessary.

Fix it to only derive the needed length key.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

commit | commitdiff | tree

Eric Biggers [Mon, 30 Apr 2018 22:51:48 +0000 (15:51 -0700)]

fscrypt: separate key lookup from key derivation

Refactor the confusingly-named function 'validate_user_key()' into a new
function 'find_and_derive_key()' which first finds the keyring key, then
does the key derivation. Among other benefits this avoids the strange
behavior we had previously where if key derivation failed for some
reason, then we would fall back to the alternate key prefix. Now, we'll
only fall back to the alternate key prefix if a valid key isn't found.

This patch also improves the warning messages that are logged when the
keyring key's payload is invalid.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

commit | commitdiff | tree

Eric Biggers [Mon, 30 Apr 2018 22:51:47 +0000 (15:51 -0700)]

fscrypt: use a common logging function

Use a common function for fscrypt warning and error messages so that all
the messages are consistently ratelimited, include the "fscrypt:"
prefix, and include the filesystem name if applicable.

Also fix up a few of the log messages to be more descriptive.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

commit | commitdiff | tree

Eric Biggers [Mon, 30 Apr 2018 22:51:46 +0000 (15:51 -0700)]

fscrypt: remove internal key size constants

With one exception, the internal key size constants such as
FS_AES_256_XTS_KEY_SIZE are only used for the 'available_modes' array,
where they really only serve to obfuscate what the values are.  Also
some of the constants are unused, and the key sizes tend to be in the
names of the algorithms anyway.  In the past these values were also
misused, e.g. we used to have FS_AES_256_XTS_KEY_SIZE in places that
technically should have been FS_MAX_KEY_SIZE.

The exception is that FS_AES_128_ECB_KEY_SIZE is used for key
derivation.  But it's more appropriate to use
FS_KEY_DERIVATION_NONCE_SIZE for that instead.

Thus, just put the sizes directly in the 'available_modes' array.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

commit | commitdiff | tree

Eric Biggers [Mon, 30 Apr 2018 22:51:45 +0000 (15:51 -0700)]

fscrypt: remove unnecessary check for non-logon key type

We're passing 'key_type_logon' to request_key(), so the found key is
guaranteed to be of type "logon". Thus, there is no reason to check
later that the key is really a "logon" key.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

commit | commitdiff | tree

Eric Biggers [Mon, 30 Apr 2018 22:51:44 +0000 (15:51 -0700)]

fscrypt: make fscrypt_operations.max_namelen an integer

Now ->max_namelen() is only called to limit the filename length when
adding NUL padding, and only for real filenames -- not symlink targets.
It also didn't give the correct length for symlink targets anyway since
it forgot to subtract 'sizeof(struct fscrypt_symlink_data)'.

Thus, change ->max_namelen from a function to a simple 'unsigned int'
that gives the filesystem's maximum filename length.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

commit | commitdiff | tree

Eric Biggers [Mon, 30 Apr 2018 22:51:43 +0000 (15:51 -0700)]

fscrypt: drop empty name check from fname_decrypt()

fname_decrypt() is validating that the encrypted filename is nonempty.
However, earlier a stronger precondition was already enforced: the
encrypted filename must be at least 16 (FS_CRYPTO_BLOCK_SIZE) bytes.

Drop the redundant check for an empty filename.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

commit | commitdiff | tree

Eric Biggers [Mon, 30 Apr 2018 22:51:42 +0000 (15:51 -0700)]

fscrypt: drop max_namelen check from fname_decrypt()

fname_decrypt() returns an error if the input filename is longer than
the inode's ->max_namelen() as given by the filesystem. But, this
doesn't actually make sense because the filesystem provided the input
filename in the first place, where it was subject to the filesystem's
limits. And fname_decrypt() has no internal limit itself.

Thus, remove this unnecessary check.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

commit | commitdiff | tree

Eric Biggers [Mon, 30 Apr 2018 22:51:41 +0000 (15:51 -0700)]

fscrypt: don't special-case EOPNOTSUPP from fscrypt_get_encryption_info()

In fscrypt_setup_filename(), remove the unnecessary check for
fscrypt_get_encryption_info() returning EOPNOTSUPP. There's no reason
to handle this error differently from any other. I think there may have
been some confusion because the "notsupp" version of
fscrypt_get_encryption_info() returns EOPNOTSUPP -- but that's not
applicable from inside fs/crypto/.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

commit | commitdiff | tree

Eric Biggers [Mon, 30 Apr 2018 22:51:40 +0000 (15:51 -0700)]

fscrypt: don't clear flags on crypto transform

fscrypt is clearing the flags on the crypto_skcipher it allocates for
each inode. But, this is unnecessary and may cause problems in the
future because it will even clear flags that are meant to be internal to
the crypto API, e.g. CRYPTO_TFM_NEED_KEY.

Remove the unnecessary flag clearing.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

commit | commitdiff | tree

Eric Biggers [Mon, 30 Apr 2018 22:51:39 +0000 (15:51 -0700)]

fscrypt: remove stale comment from fscrypt_d_revalidate()

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

commit | commitdiff | tree

Eric Biggers [Mon, 30 Apr 2018 22:51:38 +0000 (15:51 -0700)]

fscrypt: remove error messages for skcipher_request_alloc() failure

skcipher_request_alloc() can only fail due to lack of memory, and in
that case the memory allocator will have already printed a detailed
error message. Thus, remove the redundant error messages from fscrypt.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

commit | commitdiff | tree

Eric Biggers [Mon, 30 Apr 2018 22:51:37 +0000 (15:51 -0700)]

fscrypt: remove unnecessary NULL check when allocating skcipher

crypto_alloc_skcipher() returns an ERR_PTR() on failure, not NULL.
Remove the unnecessary check for NULL.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

commit | commitdiff | tree

Eric Biggers [Mon, 30 Apr 2018 22:51:36 +0000 (15:51 -0700)]

fscrypt: clean up after fscrypt_prepare_lookup() conversions

Now that all filesystems have been converted to use
fscrypt_prepare_lookup(), we can remove the fscrypt_set_d_op() and
fscrypt_set_encrypted_dentry() functions as well as un-export
fscrypt_d_ops.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

commit | commitdiff | tree

Eric Biggers [Fri, 20 Apr 2018 23:30:02 +0000 (16:30 -0700)]

fscrypt: use unbound workqueue for decryption

Improve fscrypt read performance by switching the decryption workqueue
from bound to unbound.  With the bound workqueue, when multiple bios
completed on the same CPU, they were decrypted on that same CPU.  But
with the unbound queue, they are now decrypted in parallel on any CPU.

Although fscrypt read performance can be tough to measure due to the
many sources of variation, this change is most beneficial when
decryption is slow, e.g. on CPUs without AES instructions.  For example,
I timed tarring up encrypted directories on f2fs.  On x86 with AES-NI
instructions disabled, the unbound workqueue improved performance by
about 25-35%, using 1 to NUM_CPUs jobs with 4 or 8 CPUs available.  But
with AES-NI enabled, performance was unchanged to within ~2%.

I also did the same test on a quad-core ARM CPU using xts-speck128-neon
encryption.  There performance was usually about 10% better with the
unbound workqueue, bringing it closer to the unencrypted speed.

The unbound workqueue may be worse in some cases due to worse locality,
but I think it's still the better default.  dm-crypt uses an unbound
workqueue by default too, so this change makes fscrypt match.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

commit | commitdiff | tree

Jaegeuk Kim [Thu, 31 May 2018 17:20:48 +0000 (10:20 -0700)]

f2fs: run fstrim asynchronously if runtime discard is on

We don't need to wait for whole bunch of discard candidates in fstrim, since
runtime discard will issue them in idle time.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Tue, 10 Apr 2018 07:43:09 +0000 (15:43 +0800)]

f2fs: turn down IO priority of discard from background

In order to avoid interfering normal r/w IO, let's turn down IO
priority of discard issued from background.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Mon, 9 Apr 2018 02:25:23 +0000 (10:25 +0800)]

f2fs: don't split checkpoint in fstrim

Now, we issue discard asynchronously in separated thread instead of in
checkpoint, after that, we won't encounter long latency in checkpoint
due to huge number of synchronous discard command handling, so, we don't
need to split checkpoint to do trim in batch, merge it and obsolete
related sysfs entry.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Jaegeuk Kim [Tue, 29 May 2018 16:58:42 +0000 (09:58 -0700)]

f2fs: issue discard commands proactively in high fs utilization

In the high utilization like over 80%, we don't expect huge # of large discard
commands, but do many small pending discards which affects FTL GCs a lot.
Let's issue them in that case.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Jaegeuk Kim [Sat, 26 May 2018 01:02:58 +0000 (18:02 -0700)]

f2fs: add fsync_mode=nobarrier for non-atomic files

For non-atomic files, this patch adds an option to give nobarrier which
doesn't issue flush commands to the device.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Jaegeuk Kim [Thu, 24 May 2018 20:57:26 +0000 (13:57 -0700)]

f2fs: let fstrim issue discard commands in lower priority

The fstrim gathers huge number of large discard commands, and tries to issue
without IO awareness, which results in long user-perceive IO latencies on
READ, WRITE, and FLUSH in UFS. We've observed some of commands take several
seconds due to long discard latency.

This patch limits the maximum size to 2MB per candidate, and check IO congestion
when issuing them to disk.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Jaegeuk Kim [Fri, 4 May 2018 06:26:02 +0000 (23:26 -0700)]

f2fs: avoid fsync() failure caused by EAGAIN in writepage()

pageout() in MM traslates EAGAIN, so calls handle_write_error()
-> mapping_set_error() -> set_bit(AS_EIO, ...).
file_write_and_wait_range() will see EIO error, which is critical
to return value of fsync() followed by atomic_write failure to user.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Jaegeuk Kim [Thu, 12 Apr 2018 06:09:04 +0000 (23:09 -0700)]

f2fs: clear PageError on writepage - part 2

This patch clears PageError in some pages tagged by read path, but when we
write the pages with valid contents, writepage should clear the bit likewise
ext4.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Jaegeuk Kim [Sat, 21 Apr 2018 06:44:59 +0000 (23:44 -0700)]

f2fs: check cap_resource only for data blocks

This patch changes the rule to check cap_resource for data blocks, not inode
or node blocks in order to avoid selinux denial.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Jaegeuk Kim [Sat, 21 Apr 2018 02:29:52 +0000 (19:29 -0700)]

Revert "f2fs: introduce f2fs_set_page_dirty_nobuffer"

This patch reverts copied f2fs_set_page_dirty_nobuffer to use generic function
for stability.

This reverts commit fe76b796fc5194cc3d57265002e3a748566d073f.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Jaegeuk Kim [Thu, 12 Apr 2018 06:09:04 +0000 (23:09 -0700)]

f2fs: clear PageError on writepage

This patch clears PageError in some pages tagged by read path, but when we
write the pages with valid contents, writepage should clear the bit likewise
ext4.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Eric Biggers [Wed, 18 Apr 2018 22:48:42 +0000 (15:48 -0700)]

f2fs: call unlock_new_inode() before d_instantiate()

xfstest generic/429 sometimes hangs on f2fs, caused by a thread being
unable to take a directory's i_rwsem for write in vfs_rmdir().  In the
test, one thread repeatedly creates and removes a directory, and other
threads repeatedly look up a file in the directory.  The bug is that
f2fs_mkdir() calls d_instantiate() before unlock_new_inode(), resulting
in the directory inode being exposed to lookups before it has been fully
initialized.  And with CONFIG_DEBUG_LOCK_ALLOC, unlock_new_inode()
reinitializes ->i_rwsem, corrupting its state when it is already held.

Fix it by calling unlock_new_inode() before d_instantiate().  This
matches what other filesystems do.

Fixes: 57397d86c62d ("f2fs: add inode operations for special inodes")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Eric Biggers [Wed, 18 Apr 2018 18:09:48 +0000 (11:09 -0700)]

f2fs: refactor read path to allow multiple postprocessing steps

Currently f2fs's ->readpage() and ->readpages() assume that either the
data undergoes no postprocessing, or decryption only.  But with
fs-verity, there will be an additional authenticity verification step,
and it may be needed either by itself, or combined with decryption.

To support this, store a 'struct bio_post_read_ctx' in ->bi_private
which contains a work struct, a bitmask of postprocessing steps that are
enabled, and an indicator of the current step.  The bio completion
routine, if there was no I/O error, enqueues the first postprocessing
step.  When that completes, it continues to the next step.  Pages that
fail any postprocessing step have PageError set.  Once all steps have
completed, pages without PageError set are set Uptodate, and all pages
are unlocked.

Also replace f2fs_encrypted_file() with a new function
f2fs_post_read_required() in places like direct I/O and garbage
collection that really should be testing whether the file needs special
I/O processing, not whether it is encrypted specifically.

This may also be useful for other future f2fs features such as
compression.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Eric Biggers [Wed, 18 Apr 2018 18:09:47 +0000 (11:09 -0700)]

fscrypt: allow synchronous bio decryption

Currently, fscrypt provides fscrypt_decrypt_bio_pages() which decrypts a
bio's pages asynchronously, then unlocks them afterwards.  But, this
assumes that decryption is the last "postprocessing step" for the bio,
so it's incompatible with additional postprocessing steps such as
authenticity verification after decryption.

Therefore, rename the existing fscrypt_decrypt_bio_pages() to
fscrypt_enqueue_decrypt_bio().  Then, add fscrypt_decrypt_bio() which
decrypts the pages in the bio synchronously without unlocking the pages,
nor setting them Uptodate; and add fscrypt_enqueue_decrypt_work(), which
enqueues work on the fscrypt_read_workqueue.  The new functions will be
used by filesystems that support both fscrypt and fs-verity.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Jaegeuk Kim [Fri, 30 Mar 2018 05:50:41 +0000 (22:50 -0700)]

f2fs: remain written times to update inode during fsync

This fixes xfstests/generic/392.

The failure was caused by different times between 1) one marked in the last
fsync(2) call and 2) the other given by roll-forward recovery after power-cut.
The reason was that we skipped updating inode block at 1), since its i_size
was recoverable along with 4KB-aligned data writes, which was fixed by:
"f2fs: fix a wrong condition in f2fs_skip_inode_update"

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Yunlong Song [Mon, 2 Apr 2018 12:22:20 +0000 (20:22 +0800)]

f2fs: make assignment of t->dentry_bitmap more readable

In make_dentry_ptr_block, it is confused with "&" for t->dentry_bitmap
but without "&" for t->dentry, so delete "&" to make code more readable.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Jaegeuk Kim [Sat, 31 Mar 2018 00:58:13 +0000 (17:58 -0700)]

f2fs: truncate preallocated blocks in error case

If write is failed, we must deallocate the blocks that we couldn't write.

Cc: stable@vger.kernel.org
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Junling Zheng [Thu, 29 Mar 2018 11:27:12 +0000 (19:27 +0800)]

f2fs: fix a wrong condition in f2fs_skip_inode_update

Fix commit 97dd26ad8347 (f2fs: fix wrong AUTO_RECOVER condition).
We should use ~PAGE_MASK to determine whether i_size is aligned to
the f2fs's block size or not.

Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Eric Biggers [Wed, 28 Mar 2018 18:15:09 +0000 (11:15 -0700)]

f2fs: reserve bits for fs-verity

Reserve an F2FS feature flag and inode flag for fs-verity.  This is an
in-development feature that is planned be discussed at LSF/MM 2018 [1].
It will provide file-based integrity and authenticity for read-only
files.  Most code will be in a filesystem-independent module, with
smaller changes needed to individual filesystems that opt-in to
supporting the feature.  An early prototype supporting F2FS is available
[2].  Reserving the F2FS on-disk bits for fs-verity will prevent users
of the prototype from conflicting with other new F2FS features.

Note that we're reserving the inode flag in f2fs_inode.i_advise, which
isn't really appropriate since it's not a hint or advice.  But
->i_advise is already being used to hold the 'encrypt' flag; and F2FS's
->i_flags uses the generic FS_* values, so it seems ->i_flags can't be
used for an F2FS-specific flag without additional work to remove the
assumption that ->i_flags uses the generic flags namespace.

[1] https://marc.info/?l=linux-fsdevel&m=151690752225644
[2] https://git.kernel.org/pub/scm/linux/kernel/git/mhalcrow/linux.git/log/?h=fs-verity-dev

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Yunlei He [Mon, 26 Mar 2018 09:32:23 +0000 (17:32 +0800)]

f2fs: Add a segment type check in inplace write

This patch add a segment type check in IPU, in
case of something wrong with blkadd in dnode.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Yunlong Song [Thu, 22 Mar 2018 02:08:40 +0000 (10:08 +0800)]

f2fs: no need to initialize zero value for GFP_F2FS_ZERO

Since f2fs_inode_info is allocated with flag GFP_F2FS_ZERO, so we do not
need to initialize zero value for its member any more.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Tue, 20 Mar 2018 15:08:30 +0000 (23:08 +0800)]

f2fs: don't track new nat entry in nat set

Nat entry set is used only in checkpoint(), and during checkpoint() we
won't flush new nat entry with unallocated address, so we don't need to
add new nat entry into nat set, then nat_entry_set::entry_cnt can
indicate actual entry count we need to flush in checkpoint().

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Tue, 20 Mar 2018 15:08:29 +0000 (23:08 +0800)]

f2fs: clean up with F2FS_BLK_ALIGN

Clean up F2FS_BYTES_TO_BLK(x + F2FS_BLKSIZE - 1) with F2FS_BLK_ALIGN(x).

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Yunlei He [Thu, 8 Mar 2018 08:29:13 +0000 (16:29 +0800)]

f2fs: check blkaddr more accuratly before issue a bio

This patch check blkaddr more accuratly before issue a
write or read bio.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Ritesh Harjani [Fri, 16 Mar 2018 13:23:53 +0000 (18:53 +0530)]

f2fs: Set GF_NOFS in read_cache_page_gfp while doing f2fs_quota_read

Quota code itself is serializing the operations by taking mutex_lock.
It seems a below deadlock can happen if GF_NOFS is not used in
f2fs_quota_read

__switch_to+0x88
__schedule+0x5b0
schedule+0x78
schedule_preempt_disabled+0x20
__mutex_lock_slowpath+0xdc //mutex owner is itself
mutex_lock+0x2c
dquot_commit+0x30 //mutex_lock(&dqopt->dqio_mutex);
dqput+0xe0
__dquot_drop+0x80
dquot_drop+0x48
f2fs_evict_inode+0x218
evict+0xa8
dispose_list+0x3c
prune_icache_sb+0x58
super_cache_scan+0xf4
do_shrink_slab+0x208
shrink_slab.part.40+0xac
shrink_zone+0x1b0
do_try_to_free_pages+0x25c
try_to_free_pages+0x164
__alloc_pages_nodemask+0x534
do_read_cache_page+0x6c
read_cache_page+0x14
f2fs_quota_read+0xa4
read_blk+0x54
find_tree_dqentry+0xe4
find_tree_dqentry+0xb8
find_tree_dqentry+0xb8
find_tree_dqentry+0xb8
qtree_read_dquot+0x68
v2_read_dquot+0x24
dquot_acquire+0x5c // mutex_lock(&dqopt->dqio_mutex);
dqget+0x238
__dquot_initialize+0xd4
dquot_initialize+0x10
dquot_file_open+0x34
f2fs_file_open+0x6c
do_dentry_open+0x1e4
vfs_open+0x6c
path_openat+0xa20
do_filp_open+0x4c
do_sys_open+0x178

Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Sheng Yong [Thu, 15 Mar 2018 10:51:42 +0000 (18:51 +0800)]

f2fs: introduce a new mount option test_dummy_encryption

This patch introduces a new mount option `test_dummy_encryption'
to allow fscrypt to create a fake fscrypt context. This is used
by xfstests.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Sheng Yong [Thu, 15 Mar 2018 10:51:41 +0000 (18:51 +0800)]

f2fs: introduce F2FS_FEATURE_LOST_FOUND feature

This patch introduces a new feature, F2FS_FEATURE_LOST_FOUND, which
is set by mkfs. mkfs creates a directory named lost+found, which saves
unreachable files. If fsck finds a file which has no parent, or its
parent is removed by fsck, the file will be placed under lost+found
directory by fsck.

lost+found directory could not be encrypted. As a result, the root
directory cannot be encrypted too. So if LOST_FOUND feature is enabled,
let's avoid to encrypt root directory.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Qiuyang Sun [Tue, 13 Mar 2018 11:42:50 +0000 (19:42 +0800)]

f2fs: release locks before return in f2fs_ioc_gc_range()

Currently, we will leave the kernel with locks still held when the gc_range
is invalid. This patch fixes the bug.

Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Jaegeuk Kim [Sat, 10 Mar 2018 01:42:28 +0000 (17:42 -0800)]

f2fs: align memory boundary for bitops

For example, in arm64, free_nid_bitmap should be aligned to word size in order
to use bit operations.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Fri, 9 Mar 2018 06:24:22 +0000 (14:24 +0800)]

f2fs: remove unneeded set_cold_node()

When setting COLD_BIT_SHIFT flag in node block, we only need to call
set_cold_node() in new_node_page() and recover_inode_page() during
node page initialization. So remove unneeded set_cold_node() in other
places.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Hyunchul Lee [Thu, 8 Mar 2018 10:34:38 +0000 (19:34 +0900)]

f2fs: add nowait aio support

This patch adds nowait aio support[1].

Return EAGAIN if any of the following checks fail for direct I/O:
- i_rwsem is not lockable
- Blocks are not allocated at the write location

And xfstests generic/471 is passed.

[1]: 6be96d "Introduce RWF_NOWAIT and FMODE_AIO_NOWAIT"

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Thu, 8 Mar 2018 06:22:56 +0000 (14:22 +0800)]

f2fs: wrap all options with f2fs_sb_info.mount_opt

This patch merges miscellaneous mount options into struct f2fs_mount_info,
After this patch, once we add new mount option, we don't need to worry
about recovery of it in remount_fs(), since we will recover the
f2fs_sb_info.mount_opt including all options.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Yunlei He [Wed, 7 Mar 2018 08:22:50 +0000 (16:22 +0800)]

f2fs: Don't overwrite all types of node to keep node chain

Currently, we enable node SSR by default, and mixed
different types of node segment to do SSR more intensively.
Although reuse warm node is not allowed, warm node chain
will be destroyed by errors introduced by other types
node chain. So we'd better forbid reusing all types
of node to keep warm node chain.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Junling Zheng [Wed, 7 Mar 2018 04:07:49 +0000 (12:07 +0800)]

f2fs: introduce mount option for fsync mode

Commit "0a007b97aad6"(f2fs: recover directory operations by fsync)
fixed xfstest generic/342 case, but it also increased the written
data and caused the performance degradation. In most cases, there's
no need to do so heavy fsync actually.

So we introduce new mount option "fsync_mode={posix,strict}" to
control the policy of fsync. "fsync_mode=posix" is set by default,
and means that f2fs uses a light fsync, which follows POSIX semantics.
And "fsync_mode=strict" means that it's a heavy fsync, which behaves
in line with xfs, ext4 and btrfs, where generic/342 will pass, but
the performance will regress.

Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Thu, 1 Mar 2018 15:40:32 +0000 (23:40 +0800)]

f2fs: fix to restore old mount option in ->remount_fs

This patch fixes to restore old mount option once we encounter failure
in ->remount_fs.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Thu, 1 Mar 2018 15:40:31 +0000 (23:40 +0800)]

f2fs: wrap sb_rdonly with f2fs_readonly

Use f2fs_readonly to wrap sb_rdonly for cleanup, and spread it in
all places.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Jaegeuk Kim [Fri, 9 Mar 2018 04:47:33 +0000 (20:47 -0800)]

f2fs: avoid selinux denial on CAP_SYS_RESOURCE

This fixes CAP_SYS_RESOURCE denial of selinux when using resgid, since it
seems selinux reports it at the first place, but mostly we don't need to
check this condition first.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Wed, 28 Feb 2018 09:07:27 +0000 (17:07 +0800)]

f2fs: support hot file extension

This patch supports to recognize hot file extension in f2fs, so that we
can allocate proper hot segment location for its data, which can lead to
better hot/cold seperation in filesystem.

In addition, we changes a bit on query/add/del operation method for
extension_list sysfs entry as below:

- Query: cat /sys/fs/f2fs/<disk>/extension_list
- Add: echo 'extension' > /sys/fs/f2fs/<disk>/extension_list
- Del: echo '!extension' > /sys/fs/f2fs/<disk>/extension_list
- Add: echo '[h/c]extension' > /sys/fs/f2fs/<disk>/extension_list
- Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list
- [h] means add/del hot file extension
- [c] means add/del cold file extension

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Tue, 27 Feb 2018 14:45:24 +0000 (22:45 +0800)]

f2fs: fix to avoid race in between atomic write and background GC

Sqlite user Background GC
- move_data_block
  : move page #1
- f2fs_is_atomic_file
- f2fs_ioc_start_atomic_write
- f2fs_ioc_commit_atomic_write
- commit_inmem_pages
   : commit page #1 & set node #2 dirty
- f2fs_submit_page_write
  - f2fs_update_data_blkaddr
   - set_page_dirty
     : set node #2 dirty
- f2fs_do_sync_file
  - fsync_node_pages
   : commit node #1 & node #2, then sudden power-cut

In a race case, we may check FI_ATOMIC_FILE flag before starting atomic
write flow, then we will commit meta data before data with reversed
order, after a sudden pow-cut, database transaction will be inconsistent.

So we'd better to exclude gc/atomic_write to each other by using lock
instead of flag checking.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Jaegeuk Kim [Mon, 26 Feb 2018 23:40:30 +0000 (15:40 -0800)]

f2fs: do gc in greedy mode for whole range if gc_urgent mode is set

Otherwise, f2fs conducts GC on 8GB range only based on slow cost-benefit.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Jaegeuk Kim [Fri, 23 Feb 2018 07:30:55 +0000 (23:30 -0800)]

f2fs: issue discard aggressively in the gc_urgent mode

This patch avoids to skip discard commands when user sets gc_urgent mode.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Jaegeuk Kim [Sun, 25 Feb 2018 09:04:57 +0000 (01:04 -0800)]

f2fs: set readdir_ra by default

It gives general readdir improvement.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Jaegeuk Kim [Thu, 22 Feb 2018 22:09:30 +0000 (14:09 -0800)]

f2fs: add auto tuning for small devices

If f2fs is running on top of very small devices, it's worth to avoid abusing
free LBAs. In order to achieve that, this patch introduces some parameter
tuning.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Jaegeuk Kim [Sun, 18 Feb 2018 16:50:49 +0000 (08:50 -0800)]

f2fs: add mount option for segment allocation policy

This patch adds an mount option, "alloc_mode=%s" having two options, "default"
and "reuse".

In "alloc_mode=reuse" case, f2fs starts to allocate segments from 0'th segment
all the time to reassign segments. It'd be useful for small-sized eMMC parts.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Jaegeuk Kim [Mon, 26 Feb 2018 17:19:47 +0000 (09:19 -0800)]

f2fs: don't stop GC if GC is contended

Let's do GC as much as possible, while gc_urgent is set.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Mon, 26 Feb 2018 14:04:13 +0000 (22:04 +0800)]

f2fs: expose extension_list sysfs entry

This patch adds a sysfs entry 'extension_list' to support
query/add/del item in extension list.

Query:
cat /sys/fs/f2fs/<device>/extension_list

Add:
echo 'extension' > /sys/fs/f2fs/<device>/extension_list

Del:
echo '!extension' > /sys/fs/f2fs/<device>/extension_list

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Sun, 25 Feb 2018 15:38:21 +0000 (23:38 +0800)]

f2fs: fix to set KEEP_SIZE bit in f2fs_zero_range

As Jayashree Mohan reported:

A simple workload to reproduce this would be :
1. create foo
2. Write (8K - 16K) // foo size = 16K now
3. fsync()
4. falloc zero_range , keep_size (4202496 - 4210688) // foo size must be 16K
5. fdatasync()
Crash now

On recovery, we see that the file size is 4210688 and not 16K, which
violates the semantics of keep_size flag. We have a test case to
reproduce this using CrashMonkey on 4.15 kernel. Try this out by
simply running :
./c_harness -f /dev/sda -d /dev/cow_ram0 -t f2fs -e 102400 -P -v
tests/generic_468_zero.so

The root cause is that we miss to set KEEP_SIZE bit correctly in zero_range
when zeroing block cross EOF with FALLOC_FL_KEEP_SIZE, let's fix this
missing case.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Sun, 11 Feb 2018 14:53:20 +0000 (22:53 +0800)]

f2fs: introduce sb_lock to make encrypt pwsalt update exclusive

f2fs_super_block.encrypt_pw_salt can be udpated and persisted
concurrently, result in getting different pwsalt in separated
threads, so let's introduce sb_lock to exclude concurrent
accessers.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Colin Ian King [Wed, 21 Feb 2018 18:13:40 +0000 (18:13 +0000)]

f2fs: remove redundant initialization of pointer 'p'

Pointer p is initialized with a value that is never read and is later
re-assigned a new value, hence the initialization is redundant and can
be removed.

Cleans up clang warning:
fs/f2fs/extent_cache.c:463:19: warning: Value stored to 'p' during
its initialization is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Gao Xiang [Sat, 10 Feb 2018 04:12:51 +0000 (12:12 +0800)]

f2fs: flush cp pack except cp pack 2 page at first

Previously, we attempt to flush the whole cp pack in a single bio,
however, when suddenly powering off at this time, we could get into
an extreme scenario that cp pack 1 page and cp pack 2 page are updated
and latest, but payload or current summaries are still partially
outdated. (see reliable write in the UFS specification)

This patch submits the whole cp pack except cp pack 2 page at first,
and then writes the cp pack 2 page with an extra independent
bio with pre-io barrier.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Sheng Yong [Tue, 6 Feb 2018 04:31:17 +0000 (12:31 +0800)]

f2fs: clean up f2fs_sb_has_xxx functions

This patch introduces F2FS_FEATURE_FUNCS to clean up the definitions of
different f2fs_sb_has_xxx functions.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Tiezhu Yang [Tue, 6 Feb 2018 00:21:45 +0000 (08:21 +0800)]

f2fs: remove redundant check of page type when submit bio

This patch removes redundant check of page type when submit bio to
make the logic more clear.

Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Sat, 3 Feb 2018 09:44:39 +0000 (17:44 +0800)]

f2fs: fix to handle looped node chain during recovery

There is no checksum in node block now, so bit-transition from hardware
can make node_footer.next_blkaddr being corrupted w/o any detection,
result in node chain becoming looped one.

For this condition, during recovery, in order to avoid running into dead
loop, let's detect it and just skip out.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Jaegeuk Kim [Thu, 8 Feb 2018 01:01:48 +0000 (17:01 -0800)]

f2fs: handle quota for orphan inodes

This is to detect dquot_initialize errors early from evict_inode
for orphan inodes.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Hyunchul Lee [Wed, 31 Jan 2018 02:36:58 +0000 (11:36 +0900)]

f2fs: support passing down write hints to block layer with F2FS policy

Add 'whint_mode=fs-based' mount option. In this mode, F2FS passes
down write hints with its policy.

* whint_mode=fs-based. F2FS passes down hints with its policy.

User                  F2FS                     Block
----                  ----                     -----
                      META                     WRITE_LIFE_MEDIUM;
                      HOT_NODE                 WRITE_LIFE_NOT_SET
                      WARM_NODE                "
                      COLD_NODE                WRITE_LIFE_NONE
ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
extension list        "                        "

-- buffered io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
WRITE_LIFE_NONE       "                        "
WRITE_LIFE_MEDIUM     "                        "
WRITE_LIFE_LONG       "                        "

-- direct io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG

Many thanks to Chao Yu and Jaegeuk Kim for comments to
implement this patch.

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Hyunchul Lee [Wed, 31 Jan 2018 02:36:57 +0000 (11:36 +0900)]

f2fs: support passing down write hints given by users to block layer

Add the 'whint_mode' mount option that controls which write
hints are passed down to block layer. There are "off" and
"user-based" mode. The default mode is "off".

1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.

2) whint_mode=user-based. F2FS tries to pass down hints given
by users.

User                  F2FS                     Block
----                  ----                     -----
                      META                     WRITE_LIFE_NOT_SET
                      HOT_NODE                 "
                      WARM_NODE                "
                      COLD_NODE                "
ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
extension list        "                        "

-- buffered io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        "
WRITE_LIFE_MEDIUM     "                        "
WRITE_LIFE_LONG       "                        "

-- direct io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG

Many thanks to Chao Yu and Jaegeuk Kim for comments to
implement this patch.

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid build warning]
[Chao Yu: fix to restore whint_mode in ->remount_fs]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Wed, 31 Jan 2018 01:30:34 +0000 (09:30 +0800)]

f2fs: fix to clear CP_TRIMMED_FLAG

Once CP_TRIMMED_FLAG is set, after a reboot, we will never issue discard
before LBA becomes invalid again, fix it by clearing the flag in
checkpoint without CP_TRIMMED reason.

Fixes: 1f43e2ad7bff ("f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Thu, 25 Jan 2018 11:40:08 +0000 (19:40 +0800)]

f2fs: support large nat bitmap

Previously, we will store all nat version bitmap in checkpoint pack block,
so our total node entry number has a limitation which caused total node
number can not exceed (3900 * 8) block * 455 node/block = 14196000. So
that once user wants to create more nodes in large size image, it becomes
a bottleneck, that's unreasonable.

This patch detects the new layout of nat/sit version bitmap in image in
order to enable supporting large nat bitmap.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Sat, 27 Jan 2018 09:29:49 +0000 (17:29 +0800)]

f2fs: fix to check extent cache in f2fs_drop_extent_tree

If noextent_cache mount option is on, we will never initialize extent tree
in inode, but still we're going to access it in f2fs_drop_extent_tree,
result in kernel panic as below:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: _raw_write_lock+0xc/0x30
Call Trace:
  ? f2fs_drop_extent_tree+0x41/0x70 [f2fs]
  f2fs_fallocate+0x5a0/0xdd0 [f2fs]
  ? common_file_perm+0x47/0xc0
  ? apparmor_file_permission+0x1a/0x20
  vfs_fallocate+0x15b/0x290
  SyS_fallocate+0x44/0x70
  do_syscall_64+0x6e/0x160
  entry_SYSCALL64_slow_path+0x25/0x25

This patch fixes to check extent cache status before using in
f2fs_drop_extent_tree.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Chao Yu [Sat, 27 Jan 2018 09:29:48 +0000 (17:29 +0800)]

f2fs: restrict inline_xattr_size configuration

This patch limits to enable inline_xattr_size mount option only if
both extra_attr and flexible_inline_xattr feature is on in current
image.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Yunlong Song [Mon, 29 Jan 2018 03:37:45 +0000 (11:37 +0800)]

f2fs: fix heap mode to reset it back

Commit 7a20b8a61eff81bdb7097a578752a74860e9d142 ("f2fs: allocate node
and hot data in the beginning of partition") introduces another mount
option, heap, to reset it back. But it does not do anything for heap
mode, so fix it.

Cc: stable@vger.kernel.org
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Sheng Yong [Mon, 29 Jan 2018 11:13:15 +0000 (19:13 +0800)]

f2fs: fix potential corruption in area before F2FS_SUPER_OFFSET

sb_getblk does not guarantee the buffer head is uptodate. If bh is not
uptodate, the data (may be used as boot code) in area before
F2FS_SUPER_OFFSET may get corrupted when super block is committed.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

commit | commitdiff | tree

Eric Biggers [Fri, 19 Jan 2018 21:45:24 +0000 (13:45 -0800)]

fscrypt: fix build with pre-4.6 gcc versions

gcc versions prior to 4.6 require an extra level of braces when using a
designated initializer for a member in an anonymous struct or union.
This caused a compile error with the 'struct qstr' initialization in
__fscrypt_encrypt_symlink().

Fix it by using QSTR_INIT().

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Fixes: 76e81d6d5048 ("fscrypt: new helper functions for ->symlink()")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Mirror: https://github.com/0ranko0P/kernel_xiaomi_msm8998/

RSS Atom

About OSDN

Find Software

Develop Software

Help