git.osdn.net Git - android-x86/external-llvm-project.git/commit

author	Florian Hahn <flo@fhahn.com>
	Fri, 29 May 2020 11:53:30 +0000 (12:53 +0100)
committer	Florian Hahn <flo@fhahn.com>
	Fri, 29 May 2020 12:21:13 +0000 (13:21 +0100)
commit	d20a3d35e1875d7a4928184117e6a875c35f3f63
tree	2df111d6011906d7e4aeed679579274b5489d979
parent	1ee114322cb251f851028c72e7974bf85e707e55

[DAGComb] Do not turn insert_elt into shuffle for single elt vectors.

Currently combineInsertEltToShuffle turns insert_vector_elt into a
vector_shuffle, even if the inserted element is a vector with a single
element. In this case, it should be unlikely that the additional shuffle
would be more efficient than an insert_vector_elt.

Additionally, this fixes an infinite cycle in DAGCombine, where
combineInsertEltToShuffle turns an insert_vector_elt into a shuffle,
which gets turned back into an insert_vector_elt/extract_vector_elt by
a custom AArch64 lowering (in visitVECTOR_SHUFFLE).
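
The cycle described above can be illustrated with a toy model. This is a
minimal sketch and not LLVM code: Node, Op, combineInsertToShuffle,
lowerShuffleToInsert and reachesFixedPoint are hypothetical names invented
for the illustration, and the model only covers the single-element case
discussed in this commit. It assumes the combiner keeps applying both
rewrites until nothing changes.

```cpp
// Toy model (NOT the LLVM API): a node is either an insert or a shuffle.
enum class Op { InsertElt, Shuffle };

struct Node {
    Op op;
    unsigned sourceVecElts; // element count of the vector the value comes from
};

// Stand-in for the generic fold. With the guard enabled it leaves inserts
// from single-element vectors untouched, mirroring the idea of this commit.
bool combineInsertToShuffle(Node &n, bool guardSingleElt) {
    if (n.op != Op::InsertElt)
        return false;
    if (guardSingleElt && n.sourceVecElts == 1)
        return false;
    n.op = Op::Shuffle;
    return true;
}

// Stand-in for the target-specific step that rewrites the shuffle back
// into an insert (simplified: it always fires on a shuffle).
bool lowerShuffleToInsert(Node &n) {
    if (n.op != Op::Shuffle)
        return false;
    n.op = Op::InsertElt;
    return true;
}

// Applies both rewrites repeatedly. Returns true if a round with no change
// is reached within maxRounds, false if the rewrites are still firing.
bool reachesFixedPoint(Node n, bool guardSingleElt, int maxRounds) {
    for (int round = 0; round < maxRounds; ++round) {
        bool changed = combineInsertToShuffle(n, guardSingleElt);
        changed |= lowerShuffleToInsert(n);
        if (!changed)
            return true;
    }
    return false;
}
```

Without the guard, the two rewrites undo each other in every round and the
loop only stops because of the round limit. With the guard, the first
rewrite never fires for a single-element source, so the first round already
leaves the node unchanged.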

Such insert_vector_elt and extract_vector_elt combinations can be
lowered efficiently using mov on AArch64.

There are 2 test changes in arm64-neon-copy.ll: we now use one or two
mov instructions instead of a single zip1. The second mov in ins1f2 is
needed because we have to move the result to the result register, which
is not really related to the DAGCombine fold, I think.
But in any case, on most uarchs, mov should be cheaper than zip1. On a
Cortex-A75 for example, zip1 is twice as expensive as mov
(https://developer.arm.com/docs/101398/latest/arm-cortex-a75-software-optimization-guide-v20).

Reviewers: spatel, efriedma, dmgreen, RKSimon

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D80710

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
llvm/test/CodeGen/AArch64/arm64-neon-copy.ll
llvm/test/CodeGen/AArch64/vector-insert-shuffle-cycle.ll	[new file with mode: 0644]