git.osdn.net Git - android-x86/external-llvm.git/commit

author	Igor Breger <igor.breger@intel.com>
	Mon, 20 Feb 2017 14:16:29 +0000 (14:16 +0000)
committer	Igor Breger <igor.breger@intel.com>
	Mon, 20 Feb 2017 14:16:29 +0000 (14:16 +0000)
commit	05a06cba9ee31a002377463f832bc44a9a32ae26
tree	2eba5b428a73173c09e546f0ad80fba216d031ce
parent	b0f1c39d24a03cef6cce132a57fbb07f35b6a0a5

[X86] Fix EXTRACT_VECTOR_ELT with variable index from v32i16 and v64i8 vector.

It's more profitable to go through memory (1 cycle throughput)
than to use a VMOVD + VPERMV/PSHUFB sequence (2/3 cycles throughput) to implement EXTRACT_VECTOR_ELT with a variable index.
The IACA tool was used to get the performance estimate (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer).
For example, for the var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from vector-shuffle-variable-128.ll, I get 26 cycles vs 79 cycles.
Also remove the VINSERT node, since it is no longer needed.
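
As a rough illustration of what "going through memory" means for a variable-index extract, here is a minimal scalar C++ model. It is not the code from this patch. The helper name extractViaMemory, the 64-byte alignment, and the index masking are assumptions made for the sketch. The vector is spilled to a stack slot, and the element is then read back with one indexed scalar load.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of LLVM): models a variable-index
// element extract done through memory instead of a shuffle sequence.
template <typename T, std::size_t N>
T extractViaMemory(const std::array<T, N> &vec, std::size_t idx) {
  static_assert(N != 0 && (N & (N - 1)) == 0,
                "element count must be a power of two");

  // Step 1: spill the whole vector to a stack slot. In generated code
  // this would be a single vector store. 64-byte alignment is assumed
  // here to match a 512-bit register.
  alignas(64) T slot[N];
  for (std::size_t i = 0; i < N; ++i)
    slot[i] = vec[i];

  // Step 2: one scalar load at the variable index. Masking the index is
  // an assumption of this sketch to keep the access inside the slot.
  return slot[idx & (N - 1)];
}
```

For a v32i16-style input, extractViaMemory(v, i) returns element i of a std::array<std::uint16_t, 32>. The equivalent shuffle-based sequence would first have to move the index into a vector register and permute before extracting lane 0.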

Differential Revision: https://reviews.llvm.org/D29690

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295660 91177308-0d34-0410-b5e6-96231b3b80d8

lib/Target/X86/X86ISelLowering.cpp
lib/Target/X86/X86ISelLowering.h
lib/Target/X86/X86InstrAVX512.td
lib/Target/X86/X86InstrFragmentsSIMD.td
lib/Target/X86/X86InstrSSE.td
test/CodeGen/X86/avx512-insert-extract.ll
test/CodeGen/X86/extractelement-index.ll
test/CodeGen/X86/vector-shuffle-variable-256.ll