git.osdn.net Git - android-x86/external-llvm.git/commit

[X86][LLVM] Expand support for lowerInterleavedStore() in X86InterleavedAccess (VF16, stride 4).

This patch extends lowerInterleavedStore() to support the <16 x i8> case with stride 4.

LLVM currently generates suboptimal shuffle code for AVX2. This patch is a targeted fix for one pattern (stride = 4, VF = 16); we plan to support more patterns in the future.

The goal of the patch is to optimize the following sequence. At the end of the computation, registers ymm2, ymm0, ymm12 and ymm3 each hold 16 chars:

c0, c1, ..., c15
m0, m1, ..., m15
y0, y1, ..., y15
k0, k1, ..., k15

These need to be transposed (interleaved) and stored as follows:

c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ...

Differential Revision: https://reviews.llvm.org/D35829

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310252 91177308-0d34-0410-b5e6-96231b3b80d8

author	Michael Zuckerman <Michael.zuckerman@intel.com>
	Mon, 7 Aug 2017 13:22:39 +0000 (13:22 +0000)
committer	Michael Zuckerman <Michael.zuckerman@intel.com>
	Mon, 7 Aug 2017 13:22:39 +0000 (13:22 +0000)
commit	f87ba7b70143d984f58e25faf3bfcefb3025d7c3
tree	c579896bd31a2cc3ace6e4b57ea184895ec80a6f
parent	9676036f42b42e9da99a433ecde798439ab6eb47

lib/Target/X86/X86InterleavedAccess.cpp
test/CodeGen/X86/x86-interleaved-access.ll
test/Transforms/InterleavedAccess/X86/interleavedStore.ll