[X86] Enable reciprocal estimates for v16f32 vectors by using VRCP14PS/VRSQRT14PS
Summary:
The legacy VRCPPS/VRSQRTPS instructions aren't available in 512-bit versions. The new increased precision versions are. So we can use those to implement v16f32 reciprocal estimates.
For KNL CPUs we can probably use VRCP28PS/VRSQRT28PS and avoid the NR step altogether, but I leave that for a future patch.
Reviewers: spatel
Reviewed By: spatel
Subscribers: RKSimon, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D46498
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331606
91177308-0d34-0410-b5e6-
96231b3b80d8