[x86] allow more shuffle splitting to avoid vpermps (PR40434)
This is tricky to make optimal: sometimes we're better off using
a single wider op, but other times it makes more sense to combine
a narrow ops to achieve the same result.
This solves the case from:
https://bugs.llvm.org/show_bug.cgi?id=40434
There's potentially a similar change for vectors with 64-bit elements,
but it needs adjustments similar to rL352333 to avoid creating infinite
loops.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@352380
91177308-0d34-0410-b5e6-
96231b3b80d8