git.osdn.net Git - android-x86/external-llvm.git/commit

[X86] Heuristic to selectively build Newton-Raphson SQRT estimation

On modern Intel processors, the hardware SQRT instruction is in many cases
faster than RSQRT followed by Newton-Raphson (NR) refinement. The patch
introduces a simple heuristic to choose between the hardware SQRT instruction
and the Newton-Raphson software estimation.
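
For readers unfamiliar with the software path, the following is a minimal
sketch of what "RSQRT followed by Newton-Raphson refinement" computes. It is
not the code the compiler emits: rsqrt_estimate is a hypothetical stand-in for
the low-precision hardware estimate, and all function names are illustrative.

```cpp
#include <cmath>

// Hypothetical stand-in for a low-precision hardware RSQRT estimate.
// Here the exact value is perturbed by 0.1% to imitate a coarse result.
inline float rsqrt_estimate(float x) {
  return (1.0f / std::sqrt(x)) * 1.001f;
}

// One Newton-Raphson step for y ~ 1/sqrt(x):
//   y' = y * (1.5 - 0.5 * x * y * y)
inline float refine_rsqrt(float x, float y) {
  return y * (1.5f - 0.5f * x * y * y);
}

// sqrt(x) recovered as x * (1/sqrt(x)). Assumes x is positive and finite;
// x == 0 would need separate handling because the estimate is infinite.
inline float sqrt_via_rsqrt(float x) {
  float y = refine_rsqrt(x, rsqrt_estimate(x));
  return x * y;
}
```

The refinement costs several dependent multiplies and a subtract on top of the
estimate, which is why a fast hardware SQRT can beat it on latency.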

The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency, while for vectors it should
optimize for throughput. It is based on the assumption that throughput-bound
code is likely to be vectorized.

In short, the patch disables scalar NR for big cores and disables NR completely
for Skylake. First, scalar SQRT has shorter latency than NR code on big cores.
Second, vector SQRT has been greatly improved in Skylake and has better
throughput than NR.
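
The decision described above can be sketched as a small standalone function.
This is an illustration under stated assumptions, not the code from the patch:
the SqrtTuning struct, its field names, and chooseSqrtLowering are hypothetical,
and the real logic lives in the X86 files listed in this commit.

```cpp
// Hypothetical per-CPU tuning flags; names are illustrative only.
struct SqrtTuning {
  bool fastScalarSqrt;  // scalar SQRT beats RSQRT + NR on latency
  bool fastVectorSqrt;  // vector SQRT beats RSQRT + NR on throughput
};

enum class SqrtLowering { HardwareSqrt, RsqrtNewtonRaphson };

// Scalars are judged by the scalar flag (latency), vectors by the vector
// flag (throughput), mirroring the heuristic in the commit message.
inline SqrtLowering chooseSqrtLowering(const SqrtTuning &cpu, bool isVector) {
  bool sqrtIsCheap = isVector ? cpu.fastVectorSqrt : cpu.fastScalarSqrt;
  return sqrtIsCheap ? SqrtLowering::HardwareSqrt
                     : SqrtLowering::RsqrtNewtonRaphson;
}
```

Under this model a big core would be described as {true, false} (scalar NR
disabled, vector NR kept) and Skylake as {true, true} (NR disabled entirely).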

Differential Revision: https://reviews.llvm.org/D21379

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277725 91177308-0d34-0410-b5e6-96231b3b80d8

author	Nikolai Bozhenov <nikolai.bozhenov@intel.com>
	Thu, 4 Aug 2016 12:47:28 +0000 (12:47 +0000)
committer	Nikolai Bozhenov <nikolai.bozhenov@intel.com>
	Thu, 4 Aug 2016 12:47:28 +0000 (12:47 +0000)
commit	17c4ba4fe41285eaa2eabe6f058f094b4d384520
tree	bf7a07683c99d00782bca7c99622ff268d40dd79
parent	d2c97748ac77e84ba19a765796e2a10741e6691f

include/llvm/Target/TargetLowering.h
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
lib/CodeGen/TargetLoweringBase.cpp
lib/Target/AMDGPU/AMDGPUISelLowering.cpp
lib/Target/AMDGPU/AMDGPUISelLowering.h
lib/Target/X86/X86.td
lib/Target/X86/X86ISelLowering.cpp
lib/Target/X86/X86ISelLowering.h
lib/Target/X86/X86Subtarget.cpp
lib/Target/X86/X86Subtarget.h
test/CodeGen/X86/sqrt-fastmath-tune.ll	[new file with mode: 0644]