[X86][SSE] Change BUILD_VECTOR interleaving ordering to improve coalescing/combine...
author Simon Pilgrim <llvm-dev@redking.me.uk>
Sun, 4 Jun 2017 20:12:04 +0000 (20:12 +0000)
committer Simon Pilgrim <llvm-dev@redking.me.uk>
Sun, 4 Jun 2017 20:12:04 +0000 (20:12 +0000)
commit 0261597a5e969c5bcec83923201abd6d763165c9
tree f9e9b1fcb9f739c36d9728a998b7e930c4527abc
parent 2cfe765f46ed566aa42fdeba0128617db7c052bd
[X86][SSE] Change BUILD_VECTOR interleaving ordering to improve coalescing/combine opportunities

We currently generate BUILD_VECTOR as a tree of UNPCKL shuffles of the same type:

e.g. for v4f32:

Step 1: unpcklps 0, 2 ==> X: <?, ?, 2, 0>
      : unpcklps 1, 3 ==> Y: <?, ?, 3, 1>
Step 2: unpcklps X, Y ==>    <3, 2, 1, 0>
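The interleaved tree above can be sketched with a small lane-level model. This is an illustrative Python simulation, not LLVM code: the helper names (`scalar_to_vector`, `unpcklps`, `build_vector_interleaved`) are hypothetical, `None` stands in for the undefined `?` lanes, and lists are written lane 0 first, whereas the diagrams above print the highest lane first.

```python
UNDEF = None  # stands in for the '?' (undefined) lanes


def scalar_to_vector(x):
    """Place a scalar in lane 0 of a 4-lane vector; other lanes are undefined."""
    return [x, UNDEF, UNDEF, UNDEF]


def unpcklps(a, b):
    """Model of UNPCKLPS: interleave the low two 32-bit lanes of a and b."""
    return [a[0], b[0], a[1], b[1]]


def build_vector_interleaved(elts):
    """Old ordering: pair element 0 with 2 and 1 with 3, then interleave again."""
    e0, e1, e2, e3 = (scalar_to_vector(e) for e in elts)
    x = unpcklps(e0, e2)  # lanes 0..3: [0, 2, ?, ?]
    y = unpcklps(e1, e3)  # lanes 0..3: [1, 3, ?, ?]
    return x, y, unpcklps(x, y)


x, y, result = build_vector_interleaved([0, 1, 2, 3])
print(x, y, result)
```

In this model neither intermediate value holds two adjacent elements, which is why a pattern such as a pair of consecutive scalar loads is not visible until the final shuffle.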

The issue is that, because we are not placing sequential vector elements together early enough, we fail to recognise many combinable patterns, such as consecutive scalar loads and extractions.

Instead, this patch unpacks progressively larger sequential vector elements together:

e.g. for v4f32:

Step 1: unpcklps 0, 1 ==> X: <?, ?, 1, 0>
      : unpcklps 2, 3 ==> Y: <?, ?, 3, 2>
Step 2: unpcklpd X, Y ==>    <3, 2, 1, 0>
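A minimal sketch of the sequential ordering, again as a Python lane model rather than LLVM code. It assumes that the intermediate results `<?, ?, 1, 0>` and `<?, ?, 3, 2>` come from pairing element 0 with 1 and element 2 with 3, and it models `unpcklpd` as concatenating the low 64-bit halves of its operands. The function names are hypothetical, `None` marks undefined lanes, and lists are written lane 0 first.

```python
UNDEF = None  # stands in for the '?' (undefined) lanes


def scalar_to_vector(x):
    """Place a scalar in lane 0 of a 4-lane vector; other lanes are undefined."""
    return [x, UNDEF, UNDEF, UNDEF]


def unpcklps(a, b):
    """Model of UNPCKLPS: interleave the low two 32-bit lanes of a and b."""
    return [a[0], b[0], a[1], b[1]]


def unpcklpd(a, b):
    """Model of UNPCKLPD on 4 x 32-bit lanes: low 64 bits of a, then low 64 bits of b."""
    return [a[0], a[1], b[0], b[1]]


def build_vector_sequential(elts):
    """New ordering: join adjacent elements first, then join the 64-bit pairs."""
    e0, e1, e2, e3 = (scalar_to_vector(e) for e in elts)
    x = unpcklps(e0, e1)  # lanes 0..3: [0, 1, ?, ?]
    y = unpcklps(e2, e3)  # lanes 0..3: [2, 3, ?, ?]
    return x, y, unpcklpd(x, y)


x, y, result = build_vector_sequential([0, 1, 2, 3])
print(x, y, result)
```

Here each intermediate value already contains a run of adjacent elements in its low half, so a combine looking for sequential elements can match on `x` or `y` alone.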

This does mean that we are creating UNPCKL shuffles of different value types, but the relevant combines that benefit from this are quite capable of handling the additional BITCASTs that are now included in the shuffle tree.

Differential Revision: https://reviews.llvm.org/D33864

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304688 91177308-0d34-0410-b5e6-96231b3b80d8
20 files changed:
lib/Target/X86/X86ISelLowering.cpp
test/CodeGen/X86/build-vector-128.ll
test/CodeGen/X86/buildvec-insertvec.ll
test/CodeGen/X86/clear_upper_vector_element_bits.ll
test/CodeGen/X86/haddsub-2.ll
test/CodeGen/X86/haddsub-undef.ll
test/CodeGen/X86/merge-consecutive-loads-128.ll
test/CodeGen/X86/select.ll
test/CodeGen/X86/sse-intrinsics-fast-isel.ll
test/CodeGen/X86/sse1.ll
test/CodeGen/X86/sse2-intrinsics-fast-isel.ll
test/CodeGen/X86/sse3-avx-addsub-2.ll
test/CodeGen/X86/vec_fp_to_int.ll
test/CodeGen/X86/vec_int_to_fp.ll
test/CodeGen/X86/vec_set.ll
test/CodeGen/X86/vector-rem.ll
test/CodeGen/X86/vector-sext.ll
test/CodeGen/X86/vector-shuffle-variable-128.ll
test/CodeGen/X86/vshift-1.ll
test/CodeGen/X86/vshift-2.ll