git.osdn.net Git - uclinux-h8/linux.git/commit

author	Martin Willi <martin@strongswan.org>
	Tue, 20 Nov 2018 16:30:48 +0000 (17:30 +0100)
committer	Herbert Xu <herbert@gondor.apana.org.au>
	Thu, 29 Nov 2018 08:27:04 +0000 (16:27 +0800)
commit	cee7a36ecb5bafef8c87fb2c10641e6125044154
tree	7e52243f3733ef595aeaa7df6094a8db30fb3747	tree \| snapshot
parent	059c2a4d8e164dccc3078e49e7f286023b019a98	commit \| diff

crypto: x86/chacha20 - Add a 8-block AVX-512VL variant

This variant is similar to the AVX2 version, but benefits from the AVX-512
rotate instructions and the additional registers, so it can operate without
any data on the stack. It uses ymm registers only to avoid the massive core
throttling on Skylake-X platforms. Nontheless does it bring a ~30% speed
improvement compared to the AVX2 variant for random encryption lengths.

The AVX2 version uses "rep movsb" for partial block XORing via the stack.
With AVX-512, the new "vmovdqu8" can do this much more efficiently. The
associated "kmov" instructions to work with dynamic masks is not part of
the AVX-512VL instruction set, hence we depend on AVX-512BW as well. Given
that the major AVX-512VL architectures provide AVX-512BW and this extension
does not affect core clocking, this seems to be no problem at least for
now.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

arch/x86/crypto/Makefile		diff \| blob \| history
arch/x86/crypto/chacha20-avx512vl-x86_64.S	[new file with mode: 0644]	blob
arch/x86/crypto/chacha20_glue.c		diff \| blob \| history