git.osdn.net Git - android-x86/external-llvm.git/log

[LV] Move test for r343954 into x86 subdirectory

This test uses an x86 triple, so it needs to be in the x86 specific
test directory.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344087 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Improve readability of SIMD instructions (NFC)

Summary:
- Categorize instructions into the categories as in the SIMD spec
- Move SIMD-related definition to WebAssemblyInstrSIMD.td
- Put definition and use of patterns together
- Add newlines here and there

Reviewers: tlively

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53045

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344086 91177308-0d34-0410-b5e6-96231b3b80d8

Recommit r343993: [X86] condition branches folding for three-way conditional codes

Fix the memory issue exposed by sanitizer.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344085 91177308-0d34-0410-b5e6-96231b3b80d8

[FPEnv] PatternMatcher support for checking FNEG ignoring signed zeros

https://reviews.llvm.org/D52934

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344084 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] reverse 'trunc X to <N x i1>' canonicalization

icmp ne (and X, 1), 0 --> trunc X to N x i1

Ideally, we'd do the same for scalars, but there will likely be
regressions unless we add more trunc folds as we're doing here
for vectors.
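
The equivalence behind this fold can be checked on plain integers. The
following is a minimal sketch in Python that models a single vector lane as a
non-negative integer; the helper names `mask_icmp`, `trunc_to_i1` and
`lanes_agree` are illustrative only and are not LLVM APIs.

```python
def mask_icmp(x: int) -> bool:
    """Model of `icmp ne (and X, 1), 0` for one lane."""
    return (x & 1) != 0


def trunc_to_i1(x: int) -> bool:
    """Model of `trunc X to i1`: keep only the lowest bit."""
    return bool(x % 2)


def lanes_agree(lanes) -> bool:
    """True if both forms produce the same i1 for every lane value."""
    return all(mask_icmp(v) == trunc_to_i1(v) for v in lanes)
```

Both forms read only bit 0 of each lane, which is why the mask and compare can
be dropped once later folds understand the trunc.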

The motivating vector case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {
  %c = fcmp ole <4 x float> %x, %y
  %s = sext <4 x i1> %c to <4 x i32>
  %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
  %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
  %cond = or <4 x i32> %s1, %s2
  %condtr = trunc <4 x i32> %cond to <4 x i1>
  %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
  ret <4 x float> %r
}

Here's a sampling of the vector codegen for that case using
mask+icmp (current behavior) vs. trunc (with this patch):

AVX before:

vcmpleps %xmm1, %xmm0, %xmm0
vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps %xmm0, %xmm1, %xmm0
vandps LCPI0_0(%rip), %xmm0, %xmm0
vxorps %xmm1, %xmm1, %xmm1
vpcmpeqd %xmm1, %xmm0, %xmm0
vblendvps %xmm0, %xmm3, %xmm2, %xmm0

AVX after:

vcmpleps %xmm1, %xmm0, %xmm0
vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps %xmm0, %xmm1, %xmm0
vblendvps %xmm0, %xmm2, %xmm3, %xmm0

AVX512f before:

vcmpleps %xmm1, %xmm0, %xmm0
vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps %xmm0, %xmm1, %xmm0
vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
vptestnmd %zmm1, %zmm0, %k1
vblendmps %zmm3, %zmm2, %zmm0 {%k1}

AVX512f after:

vcmpleps %xmm1, %xmm0, %xmm0
vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps %xmm0, %xmm1, %xmm0
vpslld $31, %xmm0, %xmm0
vptestmd %zmm0, %zmm0, %k1
vblendmps %zmm2, %zmm3, %zmm0 {%k1}

AArch64 before:

fcmge v0.4s, v1.4s, v0.4s
zip1 v1.4s, v0.4s, v0.4s
zip2 v0.4s, v0.4s, v0.4s
orr v0.16b, v1.16b, v0.16b
movi v1.4s, #1
and v0.16b, v0.16b, v1.16b
cmeq v0.4s, v0.4s, #0
bsl v0.16b, v3.16b, v2.16b

AArch64 after:

fcmge v0.4s, v1.4s, v0.4s
zip1 v1.4s, v0.4s, v0.4s
zip2 v0.4s, v0.4s, v0.4s
orr v0.16b, v1.16b, v0.16b
bsl v0.16b, v2.16b, v3.16b

PowerPC-le before:

xvcmpgesp 34, 35, 34
vspltisw 0, 1
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxlxor 35, 35, 35
xxland 34, 0, 32
vcmpequw 2, 2, 3
xxsel 34, 36, 37, 34

PowerPC-le after:

xvcmpgesp 34, 35, 34
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxsel 34, 37, 36, 0

Differential Revision: https://reviews.llvm.org/D52747

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344082 91177308-0d34-0410-b5e6-96231b3b80d8

[PDB] Fix another bug in globals stream name lookup.

When we're on the last bucket the computation is tricky.
We were failing when the last bucket contained multiple
matches. Added a new test for this.
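
Why the last bucket is special can be seen in a simplified model. This sketch
assumes a table where each bucket stores only the index of its first record,
so a bucket ends where the next one starts; it is not the actual globals
stream layout, and `bucket_range` and `lookup` are hypothetical names.

```python
def bucket_range(bucket_starts, num_records, bucket):
    """Return the half-open [start, end) record range for `bucket`.

    bucket_starts[i] is the index of the first record in bucket i.
    The last bucket has no successor, so its end is the record count.
    """
    start = bucket_starts[bucket]
    if bucket + 1 < len(bucket_starts):
        end = bucket_starts[bucket + 1]
    else:
        end = num_records
    return start, end


def lookup(records, bucket_starts, bucket, name):
    """Collect the index of every record in `bucket` whose name matches."""
    start, end = bucket_range(bucket_starts, len(records), bucket)
    return [i for i in range(start, end) if records[i] == name]
```

With `records = ["a", "b", "c", "x", "y", "x"]` and `bucket_starts = [0, 2, 3]`,
the last bucket spans indices 3 to 5 and holds two matches for `"x"`.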

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344081 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-objcopy] Make -S an alias for --strip-all

-S should be an alias for --strip-all not --strip-all-gnu

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344080 91177308-0d34-0410-b5e6-96231b3b80d8

llvm-dwarfdump: Extend --name to also search DW_AT_linkage_name.

rdar://problem/45132695

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344079 91177308-0d34-0410-b5e6-96231b3b80d8

[ORC] Promote and rename private symbols inside the CompileOnDemand layer,
rather than require them to have been promoted before being passed in.

Dropping this precondition is better for layer composition (CompileOnDemandLayer
was the only one that placed pre-conditions on the modules that could be added).
It also means that the promoted private symbols do not show up in the target
JITDylib's symbol table. Instead, they are confined to the hidden implementation
dylib that contains the actual definitions.

For the 403.gcc testcase this cut down the public symbol table size from ~15,000
symbols to ~4000, substantially reducing symbol dependence tracking costs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344078 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] Implement hasBitPreservingFPLogic for types that can be supported

This is the PPC-specific non-controversial part of
https://reviews.llvm.org/D44548 that simply enables this combine for PPC
since PPC has these instructions.
This commit will allow the target-independent portion to be truly target
independent.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344077 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] When lowering unsigned v2i64 setcc without SSE42, flip the sign bits in the v2i64 type then bitcast to v4i32.

This may give slightly better opportunities for DAG combine to simplify with the operations before the setcc. It also matches the type the xors will eventually be promoted to anyway so it saves a legalization step.
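
The underlying trick is that an unsigned comparison can be done with a signed
compare once the sign bit of both operands is flipped. A minimal sketch in
Python for a single 64-bit lane follows; it uses one 64-bit signed compare for
simplicity and does not model the v4i32 lowering itself. All names are
illustrative.

```python
SIGN_BIT = 1 << 63
MASK = (1 << 64) - 1


def to_signed(x: int) -> int:
    """Reinterpret a 64-bit pattern as a two's-complement signed value."""
    return x - (1 << 64) if x & SIGN_BIT else x


def unsigned_gt_via_signed(a: int, b: int) -> bool:
    """Unsigned a > b, computed with a signed compare after flipping sign bits."""
    return to_signed((a ^ SIGN_BIT) & MASK) > to_signed((b ^ SIGN_BIT) & MASK)
```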

Almost all of the test changes are because our constant pool entry is now v2i64 instead of v4i32 on 64-bit targets. On 32-bit targets getConstant should be emitting a v4i32 build_vector and a v4i32->v2i64 bitcast.

There are a couple of test cases where it appears we now combine a bitwise not with one of these xors, which caused a new constant vector to be generated. This prevented a constant pool entry from being shared. But if that's an issue we're concerned about, it seems we need to address it another way than just relying on a bitcast to hide it.

This came about from experiments I've been trying with pushing the promotion of and/or/xor to vXi64 later than LegalizeVectorOps where it is today. We run LegalizeVectorOps in a bottom-up order. So the and/or/xor are promoted before their users are legalized. The bitcasts added for the promotion act as a barrier to computeKnownBits if we try to use it during vector legalization of a later operation. So by moving the promotion out we can hopefully get better results from computeKnownBits/computeNumSignBits like in LowerTruncate on AVX512. I've also looked at running LegalizeVectorOps in a top-down order like LegalizeDAG, but that's showing some other issues.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344071 91177308-0d34-0410-b5e6-96231b3b80d8

[SLPVectorizer] Check that lowered type is floating point before calling isFabsFree

In the case of soft-fp (e.g. fp128 under wasm) the result of
getTypeLegalizationCost() can be an integer type even if the input is
floating point (See LegalizeTypeAction::TypeSoftenFloat).

Before calling isFabsFree() (which asserts if given a non-fp
type) we need to check that that result is fp. This is safe since
fabs is certainly not free in the soft-fp case.

Fixes PR39168

Differential Revision: https://reviews.llvm.org/D52899

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344069 91177308-0d34-0410-b5e6-96231b3b80d8

[DWARF] Make llvm-dwarfdump display the .debug_loc.dwo section. Fixes PR38991.

Reviewer: dblaikie

Differential Revision: https://reviews.llvm.org/D52444

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344068 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] add tests for extract subvector shuffles; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344067 91177308-0d34-0410-b5e6-96231b3b80d8

Add missing space

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344064 91177308-0d34-0410-b5e6-96231b3b80d8

[PDB] Fix failure on big endian machines.

We changed an ArrayRef<uint8_t> to an ArrayRef<uint32_t>, but
it needs to be an ArrayRef<support::ulittle32_t>.
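
The difference can be illustrated outside LLVM. This sketch uses Python's
`struct` module to mimic an explicitly little-endian read (what
`support::ulittle32_t` provides) next to the native read a big-endian host
would perform; the function names are made up for the example.

```python
import struct

raw = bytes([0x78, 0x56, 0x34, 0x12])  # 0x12345678 stored little-endian


def read_explicit_le(buf: bytes) -> int:
    """Always decode as little-endian, regardless of the host."""
    return struct.unpack("<I", buf)[0]


def read_as_big_endian_host(buf: bytes) -> int:
    """What a big-endian host sees when it reads the same bytes natively."""
    return struct.unpack(">I", buf)[0]
```

On a little-endian host the two reads happen to agree, which is why the
problem only showed up on big-endian machines.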

We also change ArrayRef<> to FixedStreamArray<>. Technically
an ArrayRef<> will work, but it can cause a copy in the underlying
implementation if the memory is not contiguous, and there's no
reason not to use a FixedStreamArray<>.

Thanks to nemanjai@ and thakis@ for helping me track this down
and confirm the fix.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344063 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Autogenerate complete checks. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344060 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64][x86] add tests for bitcasted fnabs; NFC

Alternate target coverage for D44548.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344059 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] make helper function 'static'; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344056 91177308-0d34-0410-b5e6-96231b3b80d8

Fix function case.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344051 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-exegesis] Fix invalid return type and add a Dump function.

Reviewers: courbet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D53020

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344050 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] use demanded bits to simplify masked store codegen

As noted in D52747, if we prefer IR to use trunc for bool vectors rather
than and+icmp, we can expose codegen shortcomings as seen here with masked store.

Replace a hard-coded PCMPGT simplification with the more general demanded bits call
to improve things.

Differential Revision: https://reviews.llvm.org/D52964

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344048 91177308-0d34-0410-b5e6-96231b3b80d8

[SelectionDAG] Add SIGN_EXTEND_VECTOR_INREG and CONCAT_VECTORS support to SimplifyDemandedBits

Fix for AVX1 masked load/store regression on D52964

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344043 91177308-0d34-0410-b5e6-96231b3b80d8

[mips] Fix FDE/CFI encoding in case of N32 ABI

For the O32 and N32 ABIs the FDE/CFI encoding should be `DW_EH_PE_sdata4`; only
the N64 ABI uses `DW_EH_PE_sdata8`. To cover all cases this patch checks the code
pointer size and sets up the correct FDE/CFI encoding type.
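
The selection rule can be written down as a small function. This is a sketch
of the rule as described here, not the patch itself; `fde_cfi_encoding` is a
hypothetical name, and the two constants are the usual DWARF EH pointer
encoding values.

```python
DW_EH_PE_sdata4 = 0x0B
DW_EH_PE_sdata8 = 0x0C


def fde_cfi_encoding(code_pointer_size: int) -> int:
    """Pick the FDE/CFI encoding from the code pointer size in bytes."""
    if code_pointer_size == 4:   # O32 and N32
        return DW_EH_PE_sdata4
    if code_pointer_size == 8:   # N64
        return DW_EH_PE_sdata8
    raise ValueError(f"unexpected code pointer size: {code_pointer_size}")
```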

Differential revision: https://reviews.llvm.org/D52876

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344040 91177308-0d34-0410-b5e6-96231b3b80d8

[mips] Set pointer size to 4 bytes for N32 ABI

CodePointerSize and CalleeSaveStackSlotSize values are used in DWARF
generation. In the case of MIPS it's incorrect to check only for
Triple::isMIPS64(), because this function returns true for the N32 ABI too.

Currently we have no method to recognize N32 if it's specified by a command
line option and is not a part of the target triple. So we check for
Triple::GNUABIN32 only. It's better than nothing.

Differential revision: https://reviews.llvm.org/D52874

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344039 91177308-0d34-0410-b5e6-96231b3b80d8

Fix buildbot failures with the newly added test case (triple was missing).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344037 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] Remove self-copies in pre-emit peephole

There are occasionally instances where AADB rewrites registers in such a way
that a reg-reg copy becomes a self-copy. Such an instruction is obviously
redundant and can be removed. This patch does precisely that.
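
As an illustration only, the peephole can be modelled on a toy instruction
list. The tuple format `(opcode, dst, src)` and the opcode strings are
assumptions made for this sketch and do not correspond to real PowerPC
opcodes.

```python
def remove_self_copies(instrs):
    """Drop register copies whose destination equals their source.

    Each instruction is a tuple (opcode, dst, src). Only the generic
    "copy" opcode is considered; anything else, including dedicated
    nop opcodes, is left untouched.
    """
    return [ins for ins in instrs
            if not (ins[0] == "copy" and ins[1] == ins[2])]
```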

Note that this will not remove various nop's that we insert (which are
themselves just self-copies). The reason those are left alone is that all of
them have their own opcodes (that just encode to a self-copy).

What prompted this patch is the fact that these self-copies sometimes end up
using registers that make the instruction a priority-setting nop, thereby
having a significant effect on performance.

Differential revision: https://reviews.llvm.org/D52432

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344036 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-exegesis] Fix wrong index type.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344032 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-exegesis] Fix unused lambda capture.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344029 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-exegesis][NFC] Use accessors for Operand.

Summary:
This moves checking logic into the accessors and makes the structure smaller.
It will also help when/if Operand are generated from the TD files.

Subscribers: tschuett, courbet, llvm-commits

Differential Revision: https://reviews.llvm.org/D52982

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344028 91177308-0d34-0410-b5e6-96231b3b80d8

[ADT] Force the alignment of the `data` field of `IntervalMap`

Summary:
This patch forces the alignment of the `data` field of `IntervalMap`.
This is needed because x86 MSVC doesn't automatically apply
(without `__declspec(align(...))`) alignments of more than 4 bytes,
even if `alignof` reports a larger value. Consider the example:

https://godbolt.org/z/zIPa_G

Here `alignof` for both `S0` and `S1` returns `8`, but only `S1` is really
aligned on x86. The explanation of this behavior is here:

https://docs.microsoft.com/en-us/cpp/build/conflicts-with-the-x86-compiler

Reviewers: bkramer, stoklund, hans, rnk

Reviewed By: rnk

Subscribers: dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D52613

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344027 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[ADT] Change the `IntervalMap` alignment assert for x86 MSVC"

This reverts commit 7f9eb168a9a8f5ff4fc931a00aec43e8706afecb.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344020 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AVX1] Enable *_EXTEND_VECTOR_INREG lowering of 256-bit vectors

As discussed on D52964, this adds 256-bit *_EXTEND_VECTOR_INREG lowering support for AVX1 targets to help improve SimplifyDemandedBits handling.

Differential Revision: https://reviews.llvm.org/D52980

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344019 91177308-0d34-0410-b5e6-96231b3b80d8

[ADT] Change the `IntervalMap` alignment assert for x86 MSVC

Summary:
This patch forces the alignment of the `data` field of `IntervalMap`.
This is needed because x86 MSVC doesn't automatically apply
(without `__declspec(align(...))`) alignments of more than 4 bytes,
even if `alignof` reports a larger value. Consider the example:

https://godbolt.org/z/zIPa_G

Here `alignof` for both `S0` and `S1` returns `8`, but only `S1` is really
aligned on x86. The explanation of this behavior is here:

https://docs.microsoft.com/en-us/cpp/build/conflicts-with-the-x86-compiler

Reviewers: bkramer, stoklund, hans, rnk

Reviewed By: rnk

Subscribers: dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D52613

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344018 91177308-0d34-0410-b5e6-96231b3b80d8

[CFG Printer] Add support for writing the dot files with a custom
prefix.

Use this to direct these files to a specific location in the test suite
so that we don't write files out to random directories (or fail if the
working directory isn't writable).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344014 91177308-0d34-0410-b5e6-96231b3b80d8

Make LocationSize a proper Optional type; NFC

This is the second in a series of changes intended to make
https://reviews.llvm.org/D44748 more easily reviewable. Please see that
patch for more context. The first change being r344012.

Since I was requested to do all of this with post-commit review, this is
about as small as I can make this patch.

This patch makes LocationSize into an actual type that wraps a uint64_t;
users are required to call getValue() in order to get the size now. If
the LocationSize has an Unknown size (e.g. if LocSize ==
MemoryLocation::UnknownSize), getValue() will assert.
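
The shape of the new type can be sketched as a small wrapper. This is a Python
model, not the C++ class: the method names mirror hasValue()/getValue(), and
the sentinel value chosen for `UNKNOWN_SIZE` is an assumption made for the
example.

```python
UNKNOWN_SIZE = (1 << 64) - 1  # stand-in for MemoryLocation::UnknownSize


class LocationSize:
    """Wraps a raw 64-bit size; callers must check before unwrapping."""

    def __init__(self, raw: int):
        self._raw = raw

    def has_value(self) -> bool:
        return self._raw != UNKNOWN_SIZE

    def get_value(self) -> int:
        # Mirrors the assert in getValue() for an unknown size.
        assert self.has_value(), "get_value() called on an unknown size"
        return self._raw
```

Callers are expected to keep the `has_value()` check close to the
`get_value()` call, for example `if size.has_value(): n = size.get_value()`.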

This also adds DenseMap specializations for LocationInfo, which required
taking two more values from the set of values LocationInfo can
represent. Hence, heavy users of multi-exabyte arrays or structs may
observe slightly lower-quality code as a result of this change.

The intent is for getValue()s to be very close to a corresponding
hasValue() (which is often spelled `!= MemoryLocation::UnknownSize`).
Sadly, small diff context appears to crop that out sometimes, and the
last change in DSE does require a bit of nonlocal reasoning about
control-flow. :/

This also removes an assert, since it's now redundant with the assert in
getValue().

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344013 91177308-0d34-0410-b5e6-96231b3b80d8

Use locals instead of struct fields; NFC

This is one of a series of changes intended to make
https://reviews.llvm.org/D44748 more easily reviewable. Please see that
patch for more context.

Since I was requested to do all of this with post-commit review, this is
about as small as I can make it (beyond committing changes to these few
files separately, but they're incredibly similar in spirit, so...)

On its own, this change doesn't make a great deal of sense. I plan on
having a follow-up Real Soon Now(TM) to make the bits here make more
sense. :)

In particular, the next change in this series is meant to make
LocationSize an actual type, which you have to call .getValue() on in
order to get at the uint64_t inside. Hence, this change refactors code
so that:
- we only need to call the soon-to-come getValue() once in most cases,
and
- said call to getValue() happens very closely to a piece of code that
checks if the LocationSize has a value (e.g. if it's != UnknownSize).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344012 91177308-0d34-0410-b5e6-96231b3b80d8

llvm-link: Improve diagnostic for module-level metadata mismatch

This might produce hard to read/illegible diagnostics for especially
weird/non-trivial module metadata but integers are about all we are
using these days, so seems more useful than not.

Patch based on work by Kristina Brooks - thanks!

Differential Revision: https://reviews.llvm.org/D52952

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344011 91177308-0d34-0410-b5e6-96231b3b80d8

ExpandPostRAPseudos: Fix alldefsAreDead() not removing operands

One case left around nonsensical operands for the KILL instruction
which the machine verifier checks for nowadays. While this should not
hurt in release builds we should fix the machine verifier errors anyway.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344008 91177308-0d34-0410-b5e6-96231b3b80d8

[MIPS GlobalISel] Legalize i64 add

Custom legalize s64 G_ADD for MIPS32.
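
For readers unfamiliar with the idea, a 64-bit add on a 32-bit target is
commonly split into two 32-bit adds with a carry between them. The sketch
below shows that arithmetic in Python; it is an assumption about the general
technique, not a transcription of the instructions this patch emits.

```python
MASK32 = 0xFFFFFFFF


def add64_via_32(a_lo: int, a_hi: int, b_lo: int, b_hi: int):
    """Add two 64-bit values given as (lo, hi) 32-bit halves."""
    lo = (a_lo + b_lo) & MASK32
    carry = 1 if lo < a_lo else 0        # unsigned overflow of the low add
    hi = (a_hi + b_hi + carry) & MASK32
    return lo, hi


def split64(x: int):
    """Split a 64-bit value into (lo, hi) 32-bit halves."""
    return x & MASK32, (x >> 32) & MASK32
```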

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D52652

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344007 91177308-0d34-0410-b5e6-96231b3b80d8

TwoAddressInstructionPass: Modernize/fix some comments; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344006 91177308-0d34-0410-b5e6-96231b3b80d8

PHIElimination: Remove wrong comment; NFC

The comment was contradicting the code. Looking at history the feature
was implemented a day after the comment was written without dropping the
comment.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344005 91177308-0d34-0410-b5e6-96231b3b80d8

MachineFunctionPrinterPass: Declare SlotIndexes as used if available; NFC

This makes print-machineinstrs print the slot indexes in more
situations. NFC for normal compilation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344004 91177308-0d34-0410-b5e6-96231b3b80d8

Remove unused variable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344002 91177308-0d34-0410-b5e6-96231b3b80d8

[PDB] fix a bug in global stream name lookup.

When we're looking up a record in the last hash bucket chain, we
need to be careful with the end-offset calculation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344001 91177308-0d34-0410-b5e6-96231b3b80d8

[DebugInfo] Fix debug information label tests

Remove the space in the asm check so that the expression is more general
and can also capture MIPS labels which can be surrounded by parentheses, e.g.:

.4byte ($tmp1) # DW_AT_low_pc

Also change optimization level to O0 because the DW_TAG_label does not
appear on MIPS when -O2 is used.

Patch by Milos Stojanovic.

Differential Revision: https://reviews.llvm.org/D52901

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343999 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Revert r343993 condition branches folding for three-way conditional codes

Some buildbots failed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343998 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] simplify code for fmul with constant fold; NFCI

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343997 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Prefer isTypeLegal over checking isSimple in a DAG combine.

Simple types are a superset of the types that any in-tree target in LLVM could possibly have as legal. This means the behavior of using isSimple to check for a supported type for X86 could change over time. For example, this would change if a v256i1 type was added to MVT in the future.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343995 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] add tests for phaddd/phaddw; NFC

More tests related to PR39195:
https://bugs.llvm.org/show_bug.cgi?id=39195

If we limit the horizontal codegen, it may require different
constraints for FP and integer.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343994 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] condition branches folding for three-way conditional codes

This patch implements a pass that optimizes condition branches on x86 by
taking advantage of the three-way conditional code generated by compare
instructions.

Currently, it tries to hoist EQ and NE conditional branches to a dominating
conditional branch where the same EQ/NE conditional code is
computed. An example:
bb_0:
  cmp %0, 19
  jg bb_1
  jmp bb_2
bb_1:
  cmp %0, 40
  jg bb_3
  jmp bb_4
bb_4:
  cmp %0, 20
  je bb_5
  jmp bb_6
Here we could combine the two compares in bb_0 and bb_4 and have the
following code:

bb_0:
  cmp %0, 20
  jg bb_1
  jl bb_2
  jmp bb_5
bb_1:
  cmp %0, 40
  jg bb_3
  jmp bb_6

For the case of %0 == 20 (bb_5), we eliminate two jumps, and the control height
for bb_6 is also reduced. bb_4 is gone after the optimization.
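
That the two control-flow shapes reach the same destination for every input
can be checked with a small model. In this Python sketch each function returns
the label of the block that is finally reached; `before` and `after` are
illustrative names, and `%0` is modelled as an integer `x`.

```python
def before(x: int) -> str:
    if x > 19:              # bb_0: cmp %0, 19 ; jg bb_1
        if x > 40:          # bb_1: cmp %0, 40 ; jg bb_3
            return "bb_3"
        if x == 20:         # bb_4: cmp %0, 20 ; je bb_5
            return "bb_5"
        return "bb_6"
    return "bb_2"


def after(x: int) -> str:
    if x > 20:              # bb_0: cmp %0, 20 ; jg bb_1
        if x > 40:          # bb_1: cmp %0, 40 ; jg bb_3
            return "bb_3"
        return "bb_6"
    if x < 20:              # bb_0: jl bb_2
        return "bb_2"
    return "bb_5"           # bb_0: fall through on equality
```

The rewrite relies on `x > 19` and `x >= 20` being the same condition for
integers, so a single compare against 20 yields all three outcomes.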

This optimization is motivated by the branch pattern generated by the switch
lowering: we always have a pivot-1 compare for the inner nodes and then do a
pivot compare again in the leaf (like the pattern above).

This pass currently is enabled on Intel's Sandybridge and later arches. Some
reviewers pointed out that on some arches (like AMD Jaguar), this pass may
increase branch density to the point where it hurts the performance of the
branch predictor.

Differential Revision: https://reviews.llvm.org/D46662

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343993 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Legalize VGPR Rsrc operands for MUBUF instructions

Emit a waterfall loop in the general case for a potentially-divergent Rsrc
operand. When practical, avoid this by using Addr64 instructions.

Recommits r341413 with changes to update the MachineDominatorTree when present.

Differential Revision: https://reviews.llvm.org/D51742

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343992 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AVX2] Enable ZERO_EXTEND_VECTOR_INREG lowering of 256-bit vectors

Some necessary yak shaving before lowering *_EXTEND_VECTOR_INREG 256-bit vectors on AVX1 targets as suggested by D52964.

Differential Revision: https://reviews.llvm.org/D52970

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343991 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] make horizontal binop matching clearer; NFCI

The instructions are complicated, so this code will
probably never be very obvious, but hopefully this
makes it better.

As shown in PR39195:
https://bugs.llvm.org/show_bug.cgi?id=39195
...we need to improve the matching to not miss cases
where we're h-opping on 1 source vector, and that
should be a small patch after this rearranging.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343989 91177308-0d34-0410-b5e6-96231b3b80d8

[TailCallElim] Enable marking of calls with byval as tails

In r339636 the alias analysis rules were changed with regards to tail calls
and byval arguments. Previously, tail calls were assumed not to alias
allocas from the current frame. This has been updated, to not assume this
for arguments with the byval attribute.

This patch aligns TailCallElim with the new rule. Tail marking can now be
more aggressive and mark more calls as tails, e.g.:

define void @test() {
  %f = alloca %struct.foo
  call void @bar(%struct.foo* byval %f)
  ret void
}

define void @test2(%struct.foo* byval %f) {
  call void @bar(%struct.foo* byval %f)
  ret void
}

define void @test3(%struct.foo* byval %f) {
  %agg.tmp = alloca %struct.foo
  %0 = bitcast %struct.foo* %agg.tmp to i8*
  %1 = bitcast %struct.foo* %f to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 40, i1 false)
  call void @bar(%struct.foo* byval %agg.tmp)
  ret void
}

The problematic case where a byval parameter is captured by a call is still
handled correctly, and will not be marked as a tail (see PR7272).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343986 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/GlobalISel: Select amdgcn.cvt.pkrtz to 64-bit instructions

Summary: The 32-bit variants do not exist on VI+.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52958

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343985 91177308-0d34-0410-b5e6-96231b3b80d8

Fix incorrect Twine usage in CFGPrinter

CFGPrinter (-view-cfg, -dot-cfg) invokes an undefined behaviour (dangling
pointer to rvalue) on IR files with branch weights. This patch fixes the
problem caused by Twine initialization and string conversion split into
two statements.

This change fixes bug 37019. A similar patch for this problem was
provided in the llvmlite project.

Patch by mcopik (Marcin Copik).

Differential Revision: https://reviews.llvm.org/D52933

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343984 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Future-proof {raw,struct}.buffer.atomic intrinsics

Summary:
The ISA is really supposed to support 64-bit atomics as well,
so the data type should be an overload.

Mesa doesn't use these atomics yet; in fact, I noticed this
issue while trying to use the atomics from Mesa.

Change-Id: I77f58317a085a0d3eb933cc7e99308c48a19f83e

Reviewers: tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D52291

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343978 91177308-0d34-0410-b5e6-96231b3b80d8

TableGen/CodeGenDAGPatterns: addPredicateFn only once

Summary:
The predicate function is added in InlinePatternFragments, no need to
do it here. As a result, all uses of addPredicateFn are located in
InlinePatternFragments.

Test confirmed that there are no changes to generated files when
building all (non-experimental) targets.

Change-Id: I720e42e045ca596eb0aa339fb61adf6fe71034d5

Reviewers: arsenm, rampitec, RKSimon, craig.topper, hfinkel, uweigand

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D51993

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343977 91177308-0d34-0410-b5e6-96231b3b80d8

Fix test case for @r343970

op2 for weakodr symbols is 101 from bcanalyzer.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343976 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] add hadd test with no undefs, remove duplicate tests; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343975 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] simplify hadd tests; NFC

The tests from PR39195 don't use 2 parameters. That's the
root problem for the pattern matching in isHorizontalBinOp().

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343974 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Add an AMDGPU specific atomic optimizer.

This commit adds a new IR level pass to the AMDGPU backend to perform
atomic optimizations. It works by:

- Running through a function and finding atomicrmw add/sub or uses of
  the atomic buffer intrinsics for add/sub.
- If all arguments except the value to be added/subtracted are uniform,
  record the value to be optimized.
- Run through the atomic operations we can optimize and, depending on
  whether the value is uniform/divergent use wavefront wide operations
  (DPP in the divergent case) to calculate the total amount to be
  atomically added/subtracted.
- Then let only a single lane of each wavefront perform the atomic
  operation, reducing the total number of atomic operations in flight.
- Lastly we recombine the result from the single lane to each lane of
  the wavefront, and calculate our individual lanes offset into the
  final result.
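
The steps above can be simulated on the host to see why the results match.
This Python sketch models a wavefront as a list of per-lane values and the
atomic target as a plain integer; lanes are processed in index order, which is
an assumption of the model, and the function names are hypothetical.

```python
def naive_atomic_add(memory: int, lane_values):
    """Every lane issues its own atomic add.

    Returns (final memory, value each lane saw before its add, op count).
    """
    olds = []
    for v in lane_values:
        olds.append(memory)
        memory += v
    return memory, olds, len(lane_values)


def optimized_atomic_add(memory: int, lane_values):
    """One atomic add per wavefront, then per-lane results are rebuilt."""
    total = sum(lane_values)                   # wavefront-wide reduction
    # Exclusive prefix sum: each lane's offset into the combined add.
    offsets = [sum(lane_values[:i]) for i in range(len(lane_values))]
    old = memory                               # a single lane does the atomic
    memory += total
    olds = [old + off for off in offsets]      # broadcast, then add the offset
    return memory, olds, 1
```

For `lane_values = [3, 1, 4, 1, 5]` and an initial value of 10, both versions
end with 24 in memory and the same per-lane old values, but the optimized one
issues a single atomic operation instead of five.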

Differential Revision: https://reviews.llvm.org/D51969

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343973 91177308-0d34-0410-b5e6-96231b3b80d8

Don't use back-quotes in a run line.

This works on Windows, but seems to be breaking tests that
use an external shell (e.g. bash) because backquote has special
meaning.

This particular argument wasn't crucial for the test, so I've
just removed it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343971 91177308-0d34-0410-b5e6-96231b3b80d8

[ThinLTO] Keep non-prevailing (linkonce|weak)_odr symbols live

Summary:
If we have a symbol with (linkonce|weak)_odr linkage, we do not want
to dead strip it even if it is not prevailing.

IR level (linkonce|weak)_odr symbol can become non-prevailing when we mix
ELF objects and IR objects where the (linkonce|weak)_odr symbol in the ELF
object is prevailing and the ones in the IR objects are not. Stripping
them will prevent us from doing optimizations with them.

By not dead stripping them, we will convert these symbols to
available_externally linkage as a result of being non-prevailing, and
eventually drop them after inlining.

I modified cache-prevailing.ll to use linkonce linkage, as it is
testing whether the cache prevailing bit is effective or not, not
whether we should treat linkonce_odr as live.

Reviewers: tejohnson, pcc

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D52893

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343970 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64][v8.5A] Don't create BR instructions in outliner when BTI enabled

When branch target identification is enabled, we can only do indirect
tail-calls through x16 or x17. This means that the outliner can't
transform a BLR instruction at the end of an outlined region into a BR.

Differential revision: https://reviews.llvm.org/D52869

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343969 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64][v8.5A] Restrict indirect tail calls to use x16/17 only when using BTI

When branch target identification is enabled, all indirectly-callable
functions start with a BTI C instruction. This instruction can only be
the target of certain indirect branches (direct branches and
fall-through are not affected):
- A BLR instruction, in either a protected or unprotected page.
- A BR instruction in a protected page, using x16 or x17.
- A BR instruction in an unprotected page, using any register.

Without BTI, we can use any non call-preserved register to hold the
address for an indirect tail call. However, when BTI is enabled, then
the code being compiled might be loaded into a BTI-protected page, where
only x16 and x17 can be used for indirect tail calls.

Legacy code without this restriction can still indirectly tail-call
BTI-protected functions, because they will be loaded into an unprotected
page, so any register is allowed.

Differential revision: https://reviews.llvm.org/D52868

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343968 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64][v8.5A] Branch Target Identification code-generation pass

The Branch Target Identification extension, introduced to AArch64 in
Armv8.5-A, adds the BTI instruction, which is used to mark valid targets
of indirect branches. When enabled, the processor will trap if an
instruction in a protected page tries to perform an indirect branch to
any instruction other than a BTI. The BTI instruction uses encodings
which were NOPs in earlier versions of the architecture, so BTI-enabled
code will still run on earlier hardware, just without the extra
protection.

There are 3 variants of the BTI instruction, which are valid targets for
different kinds of branches:
- BTI C can be targeted by call instructions, and is intended to be
  used at function entry points. These are the BLR instruction, as well
  as BR with x16 or x17. These BR instructions are allowed for use in
  PLT entries, and we can also use them to allow indirect tail-calls.
- BTI J can be targeted by BR only, and is intended to be used by jump
  tables.
- BTI JC acts as both a BTI C and a BTI J instruction, and can be
  targeted by any BLR or BR instruction.

Note that RET instructions are not restricted by branch target
identification, because return addresses can be
protected more effectively using return address signing. Direct branches
and calls are also unaffected, as it is assumed that an attacker cannot
modify executable pages (if they could, they wouldn't need to do a
ROP/JOP attack).
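
The rules above can be summarised as a small Python predicate. This is a
sketch that follows the description in this message rather than the
architecture manual, and it assumes the branch target lives in a
BTI-protected page. The function name and the string encodings of
instructions and registers are hypothetical.

```python
BTI_C_BR_REGS = {"x16", "x17"}


def may_land_on(target, branch, reg=None, source_protected=True):
    """Return True if `branch` may land on `target` without trapping.

    target: "BTI C", "BTI J", "BTI JC", or None for any other instruction.
    branch: "BLR", "BR", "RET", "B" or "BL".
    """
    # RET and direct branches/calls are not restricted.
    if branch in ("RET", "B", "BL"):
        return True
    # An indirect branch to anything other than a BTI traps.
    if target is None:
        return False
    if branch == "BLR":
        return target in ("BTI C", "BTI JC")
    if branch == "BR":
        if target in ("BTI J", "BTI JC"):
            return True
        # BTI C: BR must use x16/x17 unless it comes from an unprotected page.
        return (not source_protected) or reg in BTI_C_BR_REGS
    raise ValueError("unknown branch kind: " + branch)
```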

This patch adds a MachineFunctionPass which:
- Adds a BTI C at the start of every function which could be indirectly
  called (either because it is address-taken, or externally visible so
  could be address-taken in another translation unit).
- Adds a BTI J at the start of every basic block which could be
  indirectly branched to. This could be either done by a jump table, or
  by taking the address of the block (e.g. using the GCC label values
  extension).

We only need to use BTI JC when a function is indirectly-callable, and
takes the address of the entry block. I've not been able to trigger this
from C or IR, but I've included a MIR test just in case.
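
As a rough sketch of the placement rule, the choice of marker for a basic
block could be modelled like this in Python. The function and parameter names
are hypothetical, and the real pass works on MachineFunctions rather than
booleans.

```python
def bti_marker(is_entry_block, function_indirectly_callable,
               block_indirectly_branched_to):
    """Pick the BTI variant to place at the start of a basic block,
    or None if the block needs no marker."""
    needs_c = is_entry_block and function_indirectly_callable
    needs_j = block_indirectly_branched_to
    if needs_c and needs_j:
        return "BTI JC"
    if needs_c:
        return "BTI C"
    if needs_j:
        return "BTI J"
    return None
```

BTI JC only appears when the entry block of an indirectly-callable function
is itself an indirect branch target, which matches the rare case described
above.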

Using BTI C at function entries relies on the fact that no other code in
BTI-protected pages uses indirect tail-calls, unless they use x16 or x17
to hold the address. I'll add that code-generation restriction as a
separate patch.

Differential revision: https://reviews.llvm.org/D52867

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343967 91177308-0d34-0410-b5e6-96231b3b80d8

[GlobalISel][X86] Support G_UDIV/G_UREM/G_SREM

Support G_UDIV/G_UREM/G_SREM. The instruction selection
code is taken from FastISel with only minor tweaks to adapt
for GlobalISel.

Differential Revision: https://reviews.llvm.org/D49781

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343966 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] add 16 missed hadd patterns (PR39195); NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343965 91177308-0d34-0410-b5e6-96231b3b80d8

[IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle.

The IRBuilder CreateIntrinsic method wouldn't allow you to specify the
types that you wanted the intrinsic to be mangled with. To fix this
I've:

- Added an ArrayRef<Type *> member to both CreateIntrinsic overloads.
- Used that array to pass into the Intrinsic::getDeclaration call.
- Added a CreateUnaryIntrinsic to replace the most common use of
CreateIntrinsic where the type was auto-deduced from operand 0.
- Added a bunch more unit tests to test Create*Intrinsic calls that
weren't being tested (including the FMF flag that wasn't checked).

This was suggested as part of the AMDGPU specific atomic optimizer
review (https://reviews.llvm.org/D51969).

Differential Revision: https://reviews.llvm.org/D52087

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343962 91177308-0d34-0410-b5e6-96231b3b80d8

[AsmParser] Return an error in the case of empty symbol ref in an expression

The following instruction:

> str q28, [x0, #1*6*4*@]

contains a @ which is parsed as an empty symbol. The parser returns true
but has no error, so the assembler continues by ignoring the
instruction.

Differential Revision: https://reviews.llvm.org/D52645

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343961 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Account for implicit IT when calculating inline asm size

When deciding if it is safe to optimize a conditional branch to a CBZ or
CBNZ the offsets of the BasicBlocks from the start of the function are
estimated. For inline assembly the generic getInlineAsmLength() function is
used to get a worst case estimate of the inline assembly by multiplying the
number of instructions by the max instruction size of 4 bytes. This
unfortunately doesn't take into account the generation of Thumb implicit IT
instructions. In edge cases such as when all the instructions in the block
are 4-bytes in size and there is an implicit IT then the size is
underestimated. This can cause an out of range CBZ or CBNZ to be generated.

The patch takes a conservative approach and assumes that every instruction
in the inline assembly block may have an implicit IT.
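
The arithmetic behind the underestimate can be sketched in Python. The helper
names are hypothetical; the 4-byte maximum comes from the description above,
and the 2 bytes per IT reflect that IT is a 16-bit Thumb instruction.

```python
MAX_INST_BYTES = 4      # worst-case size of one instruction
IMPLICIT_IT_BYTES = 2   # IT is a 16-bit Thumb instruction


def old_estimate(num_insts):
    """Previous worst case: ignores implicit IT instructions."""
    return num_insts * MAX_INST_BYTES


def conservative_estimate(num_insts):
    """Assume every instruction may be preceded by an implicit IT."""
    return num_insts * (MAX_INST_BYTES + IMPLICIT_IT_BYTES)


def actual_size(inst_sizes, implicit_its):
    """Real size of a block given its instruction sizes and number of ITs."""
    return sum(inst_sizes) + implicit_its * IMPLICIT_IT_BYTES
```

For a block of three 4-byte instructions with one implicit IT, the real size
is 14 bytes: above the old estimate of 12, but within the conservative
estimate of 18.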

Fixes pr31805

Differential Revision: https://reviews.llvm.org/D52834

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343960 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Fix verifier error when outlining indirect calls

The MachineOutliner for AArch64 transforms indirect calls into indirect
tail calls, replacing the call with the TCRETURNri pseudo-instruction.
This pseudo lowers to a BR, but has the isCall and isReturn flags set.

The problem is that TCRETURNri takes a tcGPR64 as the register argument,
to prevent indirect tail-calls from using caller-saved registers. The
indirect calls transformed by the outliner could use caller-saved
registers. This is fine, because the outliner ensures that the register
is available at all call sites. However, this causes a verifier failure
when the register is not in tcGPR64. The fix is to add a new
pseudo-instruction like TCRETURNri, but which accepts any GPR.

Differential revision: https://reviews.llvm.org/D52829

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343959 91177308-0d34-0410-b5e6-96231b3b80d8

[RISCV] Update alu8.ll and alu16.ll test cases

The srli test in alu8.ll was a no-op, as it shifted by 8 bits. Fix this, and
also change the immediate in alu16.ll, as shifting by something other than a
power of 8 is more interesting.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343958 91177308-0d34-0410-b5e6-96231b3b80d8

[DebugInfo][PDB] Fix a signed/unsigned conversion warning

Fix the following warning when compiling with clang (caused by commit
rL343951):

GlobalsStream.cpp:61:33: warning: comparison of integers of different
signs: 'int' and 'uint32_t'

This also avoids double evaluation of `GlobalsTable.HashBuckets.size()`.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343957 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Fix incongruous GEP type addrspace

Currently running the @insertelem_after_gep function below through the InstCombine pass with opt produces invalid IR.

Input:
```
define void @insertelem_after_gep(<16 x i32>* %t0) {
   %t1 = bitcast <16 x i32>* %t0 to [16 x i32]*
   %t2 = addrspacecast [16 x i32]* %t1 to [16 x i32] addrspace(3)*
   %t3 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(3)* %t2, i64 0, i64 0
   %t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
   call void @extern_vec_pointers_func(<16 x i32 addrspace(3)*> %t4)
   ret void
}
```

Output:

```
define void @insertelem_after_gep(<16 x i32>* %t0) {
  %t3 = getelementptr inbounds <16 x i32>, <16 x i32>* %t0, i64 0, i64 0
  %t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
  call void @my_extern_func(<16 x i32 addrspace(3)*> %t4)
  ret void
}
```

Although this causes no complaints when produced, it isn't valid IR, as the insertelement use of the %t3 GEP expects an address space.

```
opt: /tmp/bad.ll:52:73: error: '%t3' defined with type 'i32*' but expected 'i32 addrspace(3)*'
  %t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
```

I've fixed this by adding an addrspacecast after the GEP in the InstCombine pass, and including a check for this type mismatch to the verifier.

Reviewers: spatel, lebedev.ri
Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52294

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343956 91177308-0d34-0410-b5e6-96231b3b80d8

[SelectionDAGBuilder][NFC] Pass LHSTy to getShiftAmountTy rather than RHSTy

r126518 introduced a type parameter to the getShiftAmountTy target hook. It
produces the type of the shift (RHSTy), parameterised by the type of the value
being shifted (LHSTy). SelectionDAGBuilder::visitShift passed RHSTy rather
than LHSTy and this patch corrects this. The change is a no-op because in LLVM
IR the LHS and RHS types for a shift must be equal anyway.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343955 91177308-0d34-0410-b5e6-96231b3b80d8

[LV] Do not create SCEVs on broken IR in emitTransformedIndex. PR39160

At the point when we perform `emitTransformedIndex`, we have a broken IR (in
particular, we have Phis for which not every incoming value is properly set). On
such IR, it is illegal to create SCEV expressions, because their internal
simplification process may try to prove some predicates and break when it
stumbles across some broken IR.

The only purpose of using SCEV in this particular place is an attempt to simplify
the generated code slightly. It seems that the result isn't worth it, because
some trivial cases (like addition of zero and multiplication by 1) can be
handled separately if needed, but more generally InstCombine is able to achieve
the goals we want to achieve by using SCEV.

This patch fixes a functional crash described in PR39160, and as side-effect it
also generates a bit smarter code in some simple cases. It also may cause some
optimality loss (e.g. we will now generate a `mul` by a power of `2` instead of
a shift), but there is nothing that InstCombine could not handle later. In
case of dire need, we can support more trivial cases just in place.
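
To illustrate what handling the trivial cases "in place" might look like,
here is a minimal Python sketch. It assumes the transformed index has the
form `start + index * step`, represents IR values as strings and constants
as ints, and uses hypothetical helper names; it is not the code from the
patch.

```python
def create_mul(x, y):
    """Build x * y, folding multiplication by the constant 1."""
    if x == 1:
        return y
    if y == 1:
        return x
    return ("mul", x, y)


def create_add(x, y):
    """Build x + y, folding addition of the constant 0."""
    if x == 0:
        return y
    if y == 0:
        return x
    return ("add", x, y)


def emit_transformed_index(start, index, step):
    """Build start + index * step without consulting any SCEV-like analysis."""
    return create_add(start, create_mul(index, step))
```

Anything beyond these folds, such as turning a multiply by a power of two
into a shift, is left to a later simplification pass.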

Note that this patch only fixes one particular case of the general problem that
LV misuses SCEV, attempting to create SCEVs or prove predicates on invalid IR.
The general solution, however, seems complex enough.

Differential Revision: https://reviews.llvm.org/D52881
Reviewed By: fhahn, hsaito

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343954 91177308-0d34-0410-b5e6-96231b3b80d8

Fix a -Wsign-compare warning.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343953 91177308-0d34-0410-b5e6-96231b3b80d8

Fix a compilation failure on non-MSVC compilers.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343952 91177308-0d34-0410-b5e6-96231b3b80d8

[PDB] Add the ability to lookup global symbols by name.

The Globals table is a hash table keyed on symbol name, so
it's possible to look up symbols by name in O(1) time. Add
a function to the globals stream to do this, and add an option
to llvm-pdbutil to exercise this, then use it to write some
tests to verify correctness.
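
The idea of a name-keyed bucket lookup can be sketched in Python. This is
purely illustrative: the bucket count, the use of `zlib.crc32` as the hash,
and the function names are all assumptions, and none of them match the real
PDB format or the API added here.

```python
import zlib

NUM_BUCKETS = 16  # illustrative only, not the real PDB bucket count


def bucket_of(name):
    """Map a symbol name to a bucket index (stand-in hash function)."""
    return zlib.crc32(name.encode("utf-8")) % NUM_BUCKETS


def build_table(symbols):
    """Group (name, record) pairs into hash buckets keyed on the name."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for name, record in symbols:
        buckets[bucket_of(name)].append((name, record))
    return buckets


def find_by_name(buckets, name):
    """Scan only the one bucket the name hashes to; names may repeat."""
    return [record for n, record in buckets[bucket_of(name)] if n == name]
```

Only one bucket is scanned per query, so the cost is independent of the total
number of symbols as long as the buckets stay short.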

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343951 91177308-0d34-0410-b5e6-96231b3b80d8

Revert r343948 "[LegalizeDAG] Make one of the ReplaceNode signatures take an ArrayRef instead of a pointer to an array. Add assert on size of array. NFC"

The assert is failing some asan tests on the bots.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343950 91177308-0d34-0410-b5e6-96231b3b80d8

[LegalizeDAG] Make one of the ReplaceNode signatures take an ArrayRef instead of a pointer to an array. Add assert on size of array. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343948 91177308-0d34-0410-b5e6-96231b3b80d8

[LegalizeDAG] Move legalization of scatter and masked store from LegalizeVectorOps to LegalizeDAG.

This is where we legalize gather and masked load so this is consistent.

Since these ops are always on vectors I've chosen to go with LegalizeDAG since that's what we do for other vector only ops like BUILD_VECTOR, VECTOR_SHUFFLE, etc. The ScalarizeMaskedMemIntrinsic pass should take care of scalarizing these before SelectionDAG so hopefully we don't need to worry about illegally typed scalar ops being emitted in the legalizing. If we did we would need to do this in LegalizeVectorOps so we could get the second type legalization that runs between LegalizeVectorOps and LegalizeDAG.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343947 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] allow undef elts in vector fadd matching

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343945 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] add vector fadd with undef elts test; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343944 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] remove redundant tests; NFC

The equivalent tests were added to the file with related folds in rL343941.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343943 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] allow undefs when matching vector splats for fmul folds

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343942 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] add vector fmul with undef elts tests; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343941 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] allow undef elts in vector fabs/fneg matching

This change is proposed as a part of D44548, but we
need this independently to avoid regressions from improved
undef propagation in SimplifyDemandedVectorElts().

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343940 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] shorten code for bitcast+fabs fold; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343939 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] add tests for FP logic folding for vectors with undefs; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343938 91177308-0d34-0410-b5e6-96231b3b80d8

[clangd] NFC: Migrate to LLVM STLExtras API where possible

This patch improves readability by migrating `std::function(ForwardIt
start, ForwardIt end, ...)` to LLVM's STLExtras range-based equivalent
`llvm::function(RangeT &&Range, ...)`.

Similar change in Clang: D52576.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D52650

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343937 91177308-0d34-0410-b5e6-96231b3b80d8

[InstSimplify] add vector test for fneg+fdiv; NFC

This should be fixed with D52934.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343936 91177308-0d34-0410-b5e6-96231b3b80d8

[SelectionDAG] Respect multiple uses in SimplifyDemandedBits to SimplifyDemandedVectorElts simplification

rL343913 was using SimplifyDemandedBits's original demanded mask instead of the adjusted 'NewMask' that accounts for multiple uses of the op (those variable names really need improving....).

Annoyingly many of the test changes (back to pre-rL343913 state) are actually safe - but only because their multiple uses are all by PMULDQ/PMULUDQ.

Thanks to Jan Vesely (@jvesely) for bisecting the bug.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343935 91177308-0d34-0410-b5e6-96231b3b80d8

[AARCH64][X86] Remove _nonsplat from test names

As discussed on D50222

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343934 91177308-0d34-0410-b5e6-96231b3b80d8

[LegalizeVectorOps] Make ExpandStrictFPOp return the result corresponding to the result number of the SDValue passed in.

It was always returning the chain which seems to be the result number of the SDValue in the lit tests we have. But I don't know if that's guaranteed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343933 91177308-0d34-0410-b5e6-96231b3b80d8

[IAI,LV] Avoid creating interleave-groups for predicated accesses

This patch fixes PR39099.

When strided loads are predicated, each of them will form an interleaved-group
(with gaps). However, subsequent stages of vectorization (planning and
transformation) assume that if a load is part of an Interleave-Group it is not
predicated, resulting in wrong code - unmasked wide loads are created.

The Interleaving Analysis does take care not to have conditional interleave
groups of size > 1, but until we extend the planning and transformation stages
to support masked-interleave-groups we should also avoid having them for
size == 1.

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D52682

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343931 91177308-0d34-0410-b5e6-96231b3b80d8

[RISCV] Introduce alu8.ll and alu16.ll tests

These track the quality of generated code for simple arithmetic operations
that were legalised from non-native types.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343930 91177308-0d34-0410-b5e6-96231b3b80d8