[X86] Allow cross-lane permutations for sub targets supporting AVX2.
Summary:
Most instructions in AVX work “in-lane”, that is, each source element is applied only to other
elements of the same lane, thus a cross lane permutation is costly and needs more than one instrution.
AVX2 includes instructions to perform any-to-any permutation of words over a 256-bit register
and vectorized table lookup.
This should also Fix PR34369
Differential Revision: https://reviews.llvm.org/D37388
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312608
91177308-0d34-0410-b5e6-
96231b3b80d8