git.osdn.net Git - android-x86/external-llvm.git/log

[x86] Flesh out the custom ISel for RMW arithmetic ops with used flags to
cover the bitwise operators.

Nothing really exciting here, this just stamps out the rest of the core
operations that can RMW memory and set flags.

Still not implemented here: ADC, SBB. Those will require more
interesting logic to channel the flags *in*, and I'm not currently
planning to try to tackle that. It might be interesting for someone who
wants to improve our code generation for bignum implementations.

Differential Revision: https://reviews.llvm.org/D37141

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312768 91177308-0d34-0410-b5e6-96231b3b80d8

WholeProgramDevirt: When promoting for single-impl devirt, also rename the comdat.

This is required when targeting COFF, as the comdat name must match
one of the names of the symbols in the comdat.

Differential Revision: https://reviews.llvm.org/D37550

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312767 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Extend the manual ISel of `add` and `sub` with both RMW memory
operands and used flags to support matching immediate operands.

This is a bit trickier than register operands, and we still want to fall
back on register operands even for things that appear to be
"immediates" when they won't actually select into the operation's
immediate operand. This also requires us to handle things like selecting
`sub` vs. `add` to minimize the number of bits needed to represent the
immediate, and picking the shortest immediate encoding. In order to do
that, we in turn need to scan to make sure that CF isn't used as it will
get inverted.
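
To illustrate the `sub` vs. `add` choice, here is a minimal sketch, not the code from the patch. It only models the sign-extended 8-bit immediate form, and the names `Opcode`, `Choice`, `fitsInImm8` and `chooseAddOrSub` are hypothetical. Adding 128 does not fit in an 8-bit immediate, but subtracting -128 does; the two forms compute the same value with opposite CF, which is why the flip is only taken when CF has no users.

```cpp
#include <cstdint>

// Hypothetical names; this mirrors the idea, not LLVM's actual ISel code.
enum class Opcode { Add, Sub };

struct Choice {
  Opcode Op;
  int64_t Imm;
};

// True if V fits in a sign-extended 8-bit immediate.
inline bool fitsInImm8(int64_t V) { return V >= -128 && V <= 127; }

// True if -V fits in a sign-extended 8-bit immediate (written as a range
// check so that V is never negated when negation could overflow).
inline bool negationFitsInImm8(int64_t V) { return V >= -127 && V <= 128; }

// Pick the opcode/immediate pair with the shorter encoding. Flipping is
// only allowed when the carry flag has no users, because `add x, C` and
// `sub x, -C` leave opposite values in CF.
inline Choice chooseAddOrSub(Opcode Op, int64_t Imm, bool CarryFlagUsed) {
  if (!CarryFlagUsed && !fitsInImm8(Imm) && negationFitsInImm8(Imm)) {
    Opcode Flipped = (Op == Opcode::Add) ? Opcode::Sub : Opcode::Add;
    return {Flipped, -Imm};
  }
  return {Op, Imm};
}
```

With these assumptions, `chooseAddOrSub(Opcode::Add, 128, false)` yields a `sub` of -128, while the same request with a live CF keeps the original `add` of 128.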

The end result seems very nice though, and we're now generating
optimal instruction sequences for these patterns IMO.

A follow-up patch will further expand this to other operations with RMW
memory operands. But handling `add` and `sub` is a useful starting point
to flesh out the machinery and make sure interesting and complex cases
can be handled.

Thanks to Craig Topper who provided a few fixes and improvements to this
patch in addition to the review!

Differential Revision: https://reviews.llvm.org/D37139

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312764 91177308-0d34-0410-b5e6-96231b3b80d8

Don't call exit from cl::PrintHelpMessage.

Most callers were not expecting the exit(0) and were trying to exit with a
different value.

This also adds back the call to cl::PrintHelpMessage in llvm-ar.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312761 91177308-0d34-0410-b5e6-96231b3b80d8

[Bitcode] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312760 91177308-0d34-0410-b5e6-96231b3b80d8

Sink some IntrinsicInst.h and Intrinsics.h out of llvm/include

Many of these uses can get by with forward declarations. Hopefully this
speeds up compilation after adding a single intrinsic.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312759 91177308-0d34-0410-b5e6-96231b3b80d8

Revert r312318, r312325, r312424, r312489

r312318 - Debug info for variables whose type is shrinked to bool
r312325, r312424, r312489 - Test case for r312318

Revision 312318 introduced a null dereference bug.
Details in https://bugs.llvm.org/show_bug.cgi?id=34490

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312758 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-objcopy] Add support for special section indexes in symbol table greater than SHN_LORESERVE

As is, indexes above SHN_LORESERVE will not be handled correctly because
they'll be treated as indexes of sections rather than special values
that should just be copied. This change adds support for copying them.
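
A minimal sketch of the distinction, assuming a hypothetical remapping helper rather than the actual llvm-objcopy code. The constants are the values defined by the ELF specification; `isReservedSectionIndex`, `remapSymbolShndx` and the `NewIndexOf` table are illustrative names only.

```cpp
#include <cstdint>
#include <vector>

// Values from the ELF specification.
constexpr uint16_t SHN_UNDEF = 0;
constexpr uint16_t SHN_LORESERVE = 0xff00;
constexpr uint16_t SHN_ABS = 0xfff1;
constexpr uint16_t SHN_COMMON = 0xfff2;

// Hypothetical helper: indexes in the reserved range are special values,
// not positions in the section header table.
constexpr bool isReservedSectionIndex(uint16_t Shndx) {
  return Shndx >= SHN_LORESERVE;
}

// Hypothetical remapping step for a symbol's st_shndx when sections are
// renumbered. NewIndexOf maps an old section index to its new index.
// Undefined and reserved indexes are copied through unchanged.
inline uint16_t remapSymbolShndx(uint16_t Shndx,
                                 const std::vector<uint16_t> &NewIndexOf) {
  if (Shndx == SHN_UNDEF || isReservedSectionIndex(Shndx))
    return Shndx;
  return NewIndexOf.at(Shndx);
}
```

Treating a value such as SHN_ABS as an ordinary index would look up a section that does not exist, which is the failure mode the commit message describes.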

Patch by Jake Ehrlich

Differential Revision: https://reviews.llvm.org/D37393

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312756 91177308-0d34-0410-b5e6-96231b3b80d8

Move duplicate helpers from DbgValueInst / DbgDeclareInst to DbgInfoIntrinsic

NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312754 91177308-0d34-0410-b5e6-96231b3b80d8

llvm-ar: exit with 1 if there is an error.

This is pr34396.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312752 91177308-0d34-0410-b5e6-96231b3b80d8

[DWARF] Line 0 should not have a discriminator.

It's meaningless and takes up extra space in the line table.

Differential Revision: https://reviews.llvm.org/D37364

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312751 91177308-0d34-0410-b5e6-96231b3b80d8

Fix llvm-xray tests to avoid subshells

We already use pipefail to detect failure of a redirected command, so
the "|| echo failure" construct was unnecessary.

These tests run and pass on Windows now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312747 91177308-0d34-0410-b5e6-96231b3b80d8

[ORC] Add ErrorSuccess and void specializations to AsyncHandlerTraits.

This will allow async handlers to be added that return void or Error::success().
Such handlers are expected to be common, since one of the primary uses of
addAsyncHandler is to run the body of the handler in a detached thread, in which
case the main handler returns immediately and does not need to provide an Error
value.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312746 91177308-0d34-0410-b5e6-96231b3b80d8

[yaml2obj][ELF] Add support for symbol indexes greater than SHN_LORESERVE

Right now symbols must be either undefined or defined in a specific
section. Some symbols have section indexes like SHN_ABS, however. This
change adds support for outputting symbols that have such section
indexes.

Patch by Jake Ehrlich

Differential Revision: https://reviews.llvm.org/D37391

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312745 91177308-0d34-0410-b5e6-96231b3b80d8

COFF: PDB: Allow multiple modules with the same name.

It is possible for two modules to have the same name if they are
archive members with the same name, or if we are doing LTO (in which
case all modules will have the name "lto.tmp").

Differential Revision: https://reviews.llvm.org/D37589

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312744 91177308-0d34-0410-b5e6-96231b3b80d8

Remove dead code. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312740 91177308-0d34-0410-b5e6-96231b3b80d8

[XRay][tools] Disable windows for tests that use an unsupported shell redirect.

The tests are filechecking against stderr and use some magic to make stdout go
away and pipe stderr to FileCheck. This broke bots on windows.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312739 91177308-0d34-0410-b5e6-96231b3b80d8

[CUDA] Added rudimentary support for CUDA-9 and sm_70.

For now CUDA-9 is not included in the list of CUDA versions clang
searches for, so the path to CUDA-9 must be explicitly passed
via --cuda-path=.

On the LLVM side, NVPTX adds the sm_70 GPU type, which bumps the required
PTX version to 6.0 but is otherwise equivalent to sm_62 at the moment.

Differential Revision: https://reviews.llvm.org/D37576

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312734 91177308-0d34-0410-b5e6-96231b3b80d8

[XRay][tools] Function call stack based analysis tooling for XRay traces

Second try after fixing a code san problem with iterator reference types.

This change introduces a subcommand to the llvm-xray tool called
"stacks" which allows for analysing XRay traces provided as inputs and
accounting time to stacks instead of just individual functions. This
gives us a more precise view of where in a program the latency is
actually attributed.

The tool uses a trie data structure to keep track of the caller-callee
relationships as we process the XRay traces. In particular, we keep
track of the function call stack as we enter functions. While we're
doing this we're adding nodes in a trie and indicating a "calls"
relationship between the caller (current top of the stack) and the callee
(the new top of the stack). When we push function ids onto the stack, we
keep track of the timestamp (TSC) for the enter event.

When exiting functions, we are able to account the duration by getting
the difference between the timestamp of the exit event and the
corresponding entry event in the stack. This works even if we somehow
miss the exit events for intermediary functions (i.e. if the exit event
is not cleanly associated with the enter event at the top of the stack).
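
The bookkeeping described above can be sketched roughly as follows. This is not the llvm-xray implementation: `TrieNode`, `StackTrie` and their members are hypothetical names, and unwinding to the nearest matching entry is just one plausible way to tolerate missed exit events.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// One node per distinct call path (hypothetical layout).
struct TrieNode {
  int32_t FuncId = 0;
  uint64_t TotalTicks = 0; // accumulated TSC delta for this call path
  uint64_t Calls = 0;
  std::map<int32_t, std::unique_ptr<TrieNode>> Callees;
};

class StackTrie {
public:
  // On function entry: add (or reuse) the callee node under the current
  // top of the stack and remember the entry timestamp.
  void enter(int32_t FuncId, uint64_t TSC) {
    TrieNode *Parent = Stack.empty() ? &Root : Stack.back().Node;
    std::unique_ptr<TrieNode> &Slot = Parent->Callees[FuncId];
    if (!Slot) {
      Slot = std::make_unique<TrieNode>();
      Slot->FuncId = FuncId;
    }
    Stack.push_back({Slot.get(), TSC});
  }

  // On function exit: find the nearest matching entry and account every
  // frame above it as well, so missed exits of intermediary functions do
  // not leave stale frames behind.
  void exit(int32_t FuncId, uint64_t TSC) {
    std::size_t I = Stack.size();
    while (I > 0 && Stack[I - 1].Node->FuncId != FuncId)
      --I;
    if (I == 0)
      return; // no matching entry event; ignore this exit
    while (Stack.size() >= I) {
      Frame F = Stack.back();
      F.Node->TotalTicks += TSC - F.EntryTSC;
      ++F.Node->Calls;
      Stack.pop_back();
    }
  }

  const TrieNode &root() const { return Root; }

private:
  struct Frame {
    TrieNode *Node;
    uint64_t EntryTSC;
  };
  TrieNode Root;
  std::vector<Frame> Stack;
};
```

In this sketch, a trace that enters functions 1, 2 and 3, exits 3, and then exits 1 without ever reporting an exit for 2 still attributes time to the path 1 -> 2 when the exit of 1 arrives.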

The output of the tool currently provides just the top N leaf functions
that contribute the most latency, and the top N stacks that have the
most frequency. In the future we can provide more sophisticated query
mechanisms and potentially an export to database feature to make offline
analysis of the stack traces possible with existing tools.

Differential revision: D34863

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312733 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Start selecting v_mad_mix_f32

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312732 91177308-0d34-0410-b5e6-96231b3b80d8

DAG: Allow creating extract_vector_elt post-legalize

Fixes some combine issues for AMDGPU where we weren't
getting the many extract_vector_elt combines expected
in a future patch.

This should really be checking isOperationLegalOrCustom on
the extract. That improves a number of x86 lit tests, but
a few get stuck in an infinite loop from one place
where a similar looking extract is created. I have a
different workaround in the backend for that which
keeps many of those improvements, but also adds a few
regressions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312730 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Handle non-temporal loads and stores

Differential Revision: https://reviews.llvm.org/D36862

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312729 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Handle more than one memory operand in SIMemoryLegalizer

Differential Revision: https://reviews.llvm.org/D37397

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312725 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Remove redundant vcvt patterns.

These don't add any value as they're just compositions of existing
patterns. However, they can confuse the cost logic in ISel, leading to
duplicated vcvt instructions like in PR33199.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312724 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][LLVM] Expand support for lowerInterleavedLoad() in X86InterleavedAccess (VF{8|16|32} stride 3).

This patch expands the support of lowerInterleavedLoad to {8|16|32}x8i stride 3.

LLVM creates suboptimal shuffle code-gen for AVX2. Overall, this patch is a specific fix for the pattern (Stride=3 VF={8|16|32}), and we plan to include the store (deinterleaved side).

The patch goal is to optimize the following sequence:
a0 b0 c0 a1 b1 c1 a2 b2
c2 a3 b3 c3 a4 b4 c4 a5
b5 c5 a6 b6 c6 a7 b7 c7

into

a0 a1 a2 a3 a4 a5 a6 a7
b0 b1 b2 b3 b4 b5 b6 b7
c0 c1 c2 c3 c4 c5 c6 c7
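
For reference, the transformation above corresponds to the following scalar loop. This is only a sketch of the data movement; the patch itself lowers the pattern to vector shuffles, and the function name `deinterleaveStride3` is made up for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Split an interleaved a0 b0 c0 a1 b1 c1 ... byte stream into three
// separate vectors (all a's, all b's, all c's). Trailing elements that
// do not form a complete group of three are ignored.
inline std::array<std::vector<uint8_t>, 3>
deinterleaveStride3(const std::vector<uint8_t> &In) {
  std::array<std::vector<uint8_t>, 3> Out;
  const std::size_t VF = In.size() / 3;
  for (std::vector<uint8_t> &V : Out)
    V.reserve(VF);
  for (std::size_t I = 0; I < VF; ++I)
    for (std::size_t Lane = 0; Lane < 3; ++Lane)
      Out[Lane].push_back(In[3 * I + Lane]);
  return Out;
}
```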

Reviewers
1. zvi
2. igor
3. guyblank
4. dorit
5. Ayal

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312722 91177308-0d34-0410-b5e6-96231b3b80d8

[mips] Use RegisterMCAsmBackend to register all MIPS asm backends. NFC

This change converts the `MipsAsmBackend` constructor to the "standard"
form. This makes it possible to use `RegisterMCAsmBackend` to register
the backends. Now we pass a `Triple` instance to the `MipsAsmBackend`
ctor and deduce all required options like endianness and bitness from
the triple. We still need to implement explicit ABI checking for
providing correct options to backends.

Differential revision: https://reviews.llvm.org/D37519

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312720 91177308-0d34-0410-b5e6-96231b3b80d8

[MachineCombiner] Update instruction depths incrementally for large BBs.

Summary:
For large basic blocks with lots of combinable instructions, the
MachineTraceMetrics computations in MachineCombiner can dominate the compile
time, as computing the trace information is quadratic in the number of
instructions in a BB and its relevant successors/predecessors.

In most cases, knowing the instruction depth should be enough to make
combination decisions. As we already iterate over all instructions in a basic
block, the instruction depth can be computed incrementally. This reduces the
cost of machine-combine drastically in cases where lots of instructions
are combined. The major drawback is that AFAIK, computing the critical path
length cannot be done incrementally. Therefore we only compute
instruction depths incrementally, for basic blocks with more
instructions than inc_threshold. The -machine-combiner-inc-threshold
option can be used to set the threshold and allows for easier
experimenting and checking if using incremental updates for all basic
blocks has any impact on the performance.
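
The idea of computing depths in a single forward pass can be sketched as below. This is a simplified model, not the MachineCombiner code: `DepthTracker` is a hypothetical class, dependencies are given as indexes of earlier instructions in the same block, and each instruction has one fixed latency.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model: the depth of an instruction is the earliest cycle at
// which all of its operands are available, i.e. the maximum over its
// dependencies of (depth of the def + latency of the def).
class DepthTracker {
public:
  // Append an instruction and compute its depth from the depths already
  // known for earlier instructions. Returns the new instruction's index.
  std::size_t add(const std::vector<std::size_t> &Deps, unsigned Latency) {
    unsigned Depth = 0;
    for (std::size_t D : Deps)
      Depth = std::max(Depth, Depths.at(D) + Latencies.at(D));
    Depths.push_back(Depth);
    Latencies.push_back(Latency);
    return Depths.size() - 1;
  }

  unsigned depth(std::size_t I) const { return Depths.at(I); }

private:
  std::vector<unsigned> Depths;
  std::vector<unsigned> Latencies;
};
```

Each call to `add` only touches the dependencies of the new instruction, so processing a block this way is linear in the number of instructions and operands, in contrast to recomputing the trace information after every combination.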

Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn

Reviewed By: fhahn

Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D36619

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312719 91177308-0d34-0410-b5e6-96231b3b80d8

[MachineTraceMetrics] Add computeDepth function (NFCI).

Summary:
This function is used in D36619 to update the instruction depths
incrementally.

Reviewers: efriedma, Gerolf, MatzeB, fhahn

Reviewed By: fhahn

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36696

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312714 91177308-0d34-0410-b5e6-96231b3b80d8

[Sparc][NFC] Clean up SelectCC lowering

The ARM, BPF, MSP430, Sparc and Mips backends all use a similar code sequence
for lowering SelectCC. As pointed out by @reames in D29937, this code isn't
particularly clear and in most of these backends doesn't actually match the
comments. This patch makes the code sequence clearer for the Sparc backend
through better variable naming and more accurate comments (e.g. we are
inserting triangle control flow, _not_ diamond). There is no functional
change.

Differential Revision: https://reviews.llvm.org/D37194

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312713 91177308-0d34-0410-b5e6-96231b3b80d8

Fixing incorrectly capitalised regexps.

Patch by Sam Allen!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312709 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[RegAlloc] Make sure live-ranges reflect the state of the IR when removing them"

This temporarily reverts commit 463fa38 (r311401).

See https://bugs.llvm.org/show_bug.cgi?id=34502

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312708 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Update to cmov promotion tests for D36711; NFC

Adding i8 -> [i16, i32, i64] and i32 -> i64 cases.
This way we can see what the current codegen looks like.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312707 91177308-0d34-0410-b5e6-96231b3b80d8

X86: Improve AVX512 fptoui lowering

Summary:
Add patterns for
fptoui <16 x float> to <16 x i8>
fptoui <16 x float> to <16 x i16>

Reviewers: igorb, delena, craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37505

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312704 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Force shuffle lowering to only create X86ISD::VPERM2X128 with 64-bit element types so we can remove some patterns from isel.

Intrinsic handling is still creating these nodes with 32-bit elements as well. But at least this gets rid of 8 and 16.

Ideally, someday we'll convert the intrinsics to generic vector shuffles and remove the intrinsics.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312702 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Don't legalize i16 extloads to i32 with legal i16

Keeping non-i16 extloads makes it easier to match some new
gfx9 load instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312699 91177308-0d34-0410-b5e6-96231b3b80d8

ModuleSummaryAnalysis: Correctly handle all function operand references.

The current code that handles personality functions when creating a
module summary does not correctly handle the case where a function's
personality function operand refers to the function indirectly
(e.g. via a bitcast). This patch handles such cases by treating
personality function references like any other reference, i.e. by
adding them to the function's reference list. This has the minor side
benefit of allowing personality functions to participate in early
dead stripping.

We do this by calling findRefEdges on the function itself. This way
we also end up handling other function operands (specifically prefix
data and prologue data) for free.

Differential Revision: https://reviews.llvm.org/D37553

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312698 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Remove patterns for selecting a v8f32 X86ISD::MOVSS or v4f64 X86ISD::MOVSD.

I don't think we ever generate these. If we did, I would expect we would also be able to generate v16f32 and v8f64, but we don't have those patterns.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312694 91177308-0d34-0410-b5e6-96231b3b80d8

ARM: track globals promoted to coalesced const pool entries

Globals that are promoted to an ARM constant pool may alias with another
existing constant pool entry. We need to keep a reference to all globals
that were promoted to each constant pool value so that we can emit a
distinct label for each promoted global. These labels are necessary so
that debug info can refer to the promoted global without an undefined
reference during linking.

Patch by Stephen Crane!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312692 91177308-0d34-0410-b5e6-96231b3b80d8

Object: Downgrade invalid weak externals from an assert fail to an llvm::Error when creating an irsymtab.

This fixes bitcode emission for modules containing invalid weak externals.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312686 91177308-0d34-0410-b5e6-96231b3b80d8

InstSimplify: canonicalize is idempotent

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312685 91177308-0d34-0410-b5e6-96231b3b80d8

LTO: Remove unnecessary Windows support code.

I empirically verified that open files can in fact be renamed on
Windows with sys::fs::rename, so remove the incorrect code and comment.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312683 91177308-0d34-0410-b5e6-96231b3b80d8

Reland "[llvm-objcopy] Add support for relocations"

This change adds support for SHT_REL and SHT_RELA sections in
llvm-objcopy.

Patch by Jake Ehrlich

Differential Revision: https://reviews.llvm.org/D36554

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312680 91177308-0d34-0410-b5e6-96231b3b80d8

[Pass] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312679 91177308-0d34-0410-b5e6-96231b3b80d8

Util: Improve update_llc_test_checks to scrub macosx-style assembly annotations

Summary:
In D37523 Sanjay pointed out that the tool does not scrub macosx-style 'End of Function' annotations,
where the comments begin with a double-#.

I tested this patch by verifying all existing occurrences of 'End function' are scrubbed:
find ./test/CodeGen/X86 -name '*.ll' | xargs grep -l "End function" | xargs utils/update_llc_test_checks.py --llc-binary build/bin/llc

Reviewers: spatel, chandlerc, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37532

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312678 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Use v_pk_max_f16 for fcanonicalize

Differential Revision: https://reviews.llvm.org/D37325

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312676 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Only treat imports/exports as symbols when reading relocatable object files

This change treats imported and exported functions and globals
as symbol table entries only when the object has a "linking" section
(i.e. it is a relocatable object file).

In this case all globals must be of type I32 and initialized with
i32.const. This was previously being assumed but not checked for and
was causing a failure on big endian machines due to using the wrong
value of the union.

See: https://bugs.llvm.org/show_bug.cgi?id=34487

Differential Revision: https://reviews.llvm.org/D37497

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312674 91177308-0d34-0410-b5e6-96231b3b80d8

Remove redundant `llvm::`, add comments and simplify the return type of a function.

No functional change intended.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312673 91177308-0d34-0410-b5e6-96231b3b80d8

Insert IMPLICIT_DEFS for undef uses in tail merging

Tail merging can convert an undef use into a normal one when creating a
common tail. Doing so can make the register live out from a block which
previously contained the undef use. To keep the liveness up-to-date,
insert IMPLICIT_DEFs in such blocks when necessary.

To enable this patch the computeLiveIns() function which used to
compute live-ins for a block and set them immediately is split into new
functions:
- computeLiveIns() just computes the live-ins in a LivePhysRegs set.
- addLiveIns() applies the live-ins to a block live-in list.
- computeAndAddLiveIns() is a convenience function combining the other
two functions and behaving like computeLiveIns() before this patch.

Based on a patch by Krzysztof Parzyszek <kparzysz@codeaurora.org>

Differential Revision: https://reviews.llvm.org/D37034

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312668 91177308-0d34-0410-b5e6-96231b3b80d8

[docs] Add a note on iteration of unordered containers to coding standards

Summary: Beware of non-determinism due to ordering of pointers

Reviewers: dblaikie, dexonsmith

Reviewed By: dblaikie

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37525

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312667 91177308-0d34-0410-b5e6-96231b3b80d8

Disable jump threading into loop headers

Consider this type of a loop:
    for (...) {
      ...
      if (...) continue;
      ...
    }
Normally, the "continue" would branch to the loop control code that
checks whether the loop should continue iterating and which contains
the (often) unique loop latch branch. In certain cases jump threading
can "thread" the inner branch directly to the loop header, creating
a second loop latch. Loop canonicalization would then transform this
loop into a loop nest. The problem with this is that in such a loop
nest neither loop is countable even if the original loop was. This
may inhibit subsequent loop optimizations and be detrimental to
performance.

Differential Revision: https://reviews.llvm.org/D36404

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312664 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] fix triple and regenerate checks for psubus; NFC

Patch by Yulia Koval!

Differential Revision: https://reviews.llvm.org/D37523

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312662 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Move more isel patterns to X86InstrVecCompiler.td. NFC

This moves more of our subvector insert/extract tricks to X86InstrVecCompiler.td and refactors them into multiclasses.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312661 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize

Differential Revision: https://reviews.llvm.org/D37522

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312660 91177308-0d34-0410-b5e6-96231b3b80d8

[IfConversion] Remove kill flags from common instructions as well

When if-converting a diamond, two separate blocks will be placed back
to back to form straight-line code. To ensure correctness of the
liveness information, any registers that are live in the second block
should not be killed in the first block, even if they were in the
original code.
Additionally, when the two blocks share common instructions at the
beginning, these instructions will not be duplicated, but only placed
once, before both of the blocks. Since the function "isIdenticalTo"
(as used here) ignores kill flags, the common initial code in one
block may have a kill flag for a register that is live in the other
block.
Because the code that removes kill flags only runs for the non-common
parts of the predicated blocks, a kill flag mismatch in the common
code could still lead to a live register being killed prematurely.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312654 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Actually add the new file that was supposed to go with r312649.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312650 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Introduce a new td file to hold patterns some of the non instruction patterns from SSE and AVX512

This patch moves some of similar non-instruction patterns from X86InstrSSE.td and X86InstrAVX512.td to a common file.

This is intended as a starting point. There are many other optimization patterns that exist in both files that we could move here.

Differential Revision: https://reviews.llvm.org/D37455

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312649 91177308-0d34-0410-b5e6-96231b3b80d8

Fix PR33878: BasicAA incorrectly assumes different address spaces don't alias

Remove code that assumed that a nullptr of address space != 0 couldn't alias with a non-null pointer. This is incorrect, since nothing can be concluded about a null pointer in an address space != 0.
This code was written before address spaces were introduced.

Differential Revision: https://reviews.llvm.org/D37518

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312648 91177308-0d34-0410-b5e6-96231b3b80d8

Minor style fixes in lib/Support/**/Program.(inc|cpp).

No functional changes intended.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312646 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[llvm-objcopy] Add support for relocations"

This reverts r312643 because it's failing on llvm-i686-linux-RA.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312645 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Add option to generate calls to "abort" for "unreachable"

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312644 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-objcopy] Add support for relocations

This change adds support for SHT_REL and SHT_RELA sections in
llvm-objcopy.

Patch by Jake Ehrlich

Differential Revision: https://reviews.llvm.org/D36554

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312643 91177308-0d34-0410-b5e6-96231b3b80d8

[TailCall] Allow llvm.memcpy/memset/memmove to be tail calls when the parent
function returns the intrinsic's first argument.

llvm.memcpy/memset/memmove return void, but they return their first
argument once they are expanded as libcalls. Currently, if the parent
function has any return value, llvm.memcpy cannot be turned into a tail
call after expansion.

This patch handles that case in SelectionDAGBuilder, so that when the
caller returns the same value as the first argument of llvm.memcpy, a
tail call is allowed.
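
At the source level, the pattern in question looks roughly like the function below. The name `copyAndReturnDest` is made up for this illustration, and whether the call is actually emitted as a tail call depends on the target, the optimization level and how the intrinsic is expanded.

```cpp
#include <cstddef>
#include <cstring>

// The caller returns the destination pointer, which is also what the
// memcpy libcall returns, so the call is a candidate for a tail call
// once the intrinsic is expanded to a libcall.
void *copyAndReturnDest(void *Dst, const void *Src, std::size_t N) {
  std::memcpy(Dst, Src, N);
  return Dst;
}
```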

Differential Revision: https://reviews.llvm.org/D37406

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312641 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Fix shouldClusterMemOps to process flat loads

Flat loads do not have vdata operand but have vdst instead.

Differential Revision: https://reviews.llvm.org/D37502

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312640 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Make worst-case assumption about the wait states in inline assembly

Summary:
Mesa still uses a hack where empty inline assembly is used as a kind of
optimization barrier. This exposed a problem where not enough wait states
were inserted, because the hazard recognizer implicitly assumed that each
inline assembly "instruction" has at least one wait state.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D37205

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312635 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][X87] Ensure x87 instructions are tagged as altering the FPSW reg

As noted in PR34080, a lot of x87 instructions alter the FPSW status register (or leave it in an undefined state) but aren't tagged as such in the tablegen.

This patch tags the control word, stack, wait and math instructions as altering FPSW, which matches what the AMD APMs suggest happens.

Differential Revision: https://reviews.llvm.org/D36414

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312629 91177308-0d34-0410-b5e6-96231b3b80d8

[RISCV][NFC] Fix sorting of includes in lib/Target/RISCV

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312624 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] When combining EXTRACT_SUBVECTOR of a BUILD_VECTOR, make sure we don't create a BUILD_VECTOR with an illegal type after type legalization.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312621 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Fix PR34377 by disabling cmov conversion when we relied on it
performing a zext of a register.

On the PR there is discussion of how to more effectively handle this,
but this patch prevents us from miscompiling code.

Differential Revision: https://reviews.llvm.org/D37504

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312620 91177308-0d34-0410-b5e6-96231b3b80d8

X86 Tests: Tidy up AVX512 conversion tests. NFC.

Rename functions to a consistent format to make it easier to track coverage.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312619 91177308-0d34-0410-b5e6-96231b3b80d8

Updating a test reference for rL312608.

Differential Revision: https://reviews.llvm.org/D37501

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312614 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add more FMA3 patterns to cover a load in all 3 possible positions.

This matches what we already do for AVX512. The peephole pass makes up for this in most if not all cases. But this makes isel behavior for these consistent with every other instruction.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312613 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] Don't use xscvdpspn on the P7

xscvdpspn was not introduced until the P8, so don't use it on the P7. Fixes a
regression introduced in r288152.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312612 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Allow cross-lane permutations for sub targets supporting AVX2.

Summary:
Most instructions in AVX work “in-lane”, that is, each source element is applied only to other
elements of the same lane, so a cross-lane permutation is costly and needs more than one instruction.
AVX2 includes instructions to perform any-to-any permutation of words over a 256-bit register
and vectorized table lookup.

This should also fix PR34369.

Differential Revision: https://reviews.llvm.org/D37388

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312608 91177308-0d34-0410-b5e6-96231b3b80d8

[ORC] Fix some comments in JITSymbol.

Patch by Breckin Loggins. Thanks Breckin!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312607 91177308-0d34-0410-b5e6-96231b3b80d8

Fix crbug 759265 by suppressing llvm mt warnings.

Summary:
Previously, a warning was emitted whenever libxml2 was not installed. Now
the warning is only given if merging the manifest fails.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37240

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312604 91177308-0d34-0410-b5e6-96231b3b80d8

Use the section name if a STT_SECTION symbol has empty name.

Without this we would have multiple relocations pointing to symbols
with the same name: the empty string. There was no way for yaml2obj to
be able to handle that.

A more general solution would be to unique symbol names in a similar
way to how we unique section names. In practice I think this covers
all common cases and is a bit more user friendly than using names like
sym1, sym2, sym3, etc.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312603 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Transform __read_pipe_* and __write_pipe_*

When the packet size equals the packet alignment and is a power of 2,
transform __read_pipe* and __write_pipe* to a specialized library function.
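
The condition above can be sketched as a small predicate. The function name
canUseSpecializedPipeCall is hypothetical and is not taken from the patch; it
only illustrates the check described in this message.

```cpp
#include <cstdint>

// Hypothetical helper: returns true when a pipe packet qualifies for the
// specialized library call, i.e. size == alignment and size is a power of 2.
bool canUseSpecializedPipeCall(std::uint64_t PacketSize,
                               std::uint64_t PacketAlign) {
  // A nonzero power of 2 has exactly one bit set, so clearing the lowest
  // set bit leaves zero.
  bool IsPow2 = PacketSize != 0 && (PacketSize & (PacketSize - 1)) == 0;
  return PacketSize == PacketAlign && IsPow2;
}
```

For example, a packet with size 4 and alignment 4 qualifies, while size 12
with alignment 12 does not, and neither does size 8 with alignment 4.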

Differential Revision: https://reviews.llvm.org/D36831

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312598 91177308-0d34-0410-b5e6-96231b3b80d8

[ValueTracking, InstCombine] canonicalize fcmp ord/uno with non-NAN ops to null constants

This is a preliminary step towards solving the remaining part of PR27145 - IR for isfinite():
https://bugs.llvm.org/show_bug.cgi?id=27145

In order to solve that one more generally, we need to add matching for and/or of fcmp ord/uno
with a constant operand.

But while looking at those patterns, I realized we were missing a canonicalization for nonzero
constants. Rather than limiting to just folds for constants, we're adding a general value
tracking method for this based on an existing DAG helper.

By transforming everything to 0.0, we can simplify the existing code in foldLogicOfFCmps()
and pick up missing vector folds.
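
The reason this canonicalization is safe can be shown with a scalar model of
the two predicates. The helper names fcmpOrd and fcmpUno are hypothetical and
only mimic the IR semantics: "ord" is true when neither operand is NaN, "uno"
when at least one is. If one operand is known not to be NaN, it contributes
nothing to the result, so it can be replaced by 0.0.

```cpp
#include <cmath>

// Models "fcmp ord X, C": true when neither operand is NaN.
bool fcmpOrd(double X, double C) {
  return !std::isnan(X) && !std::isnan(C);
}

// Models "fcmp uno X, C": true when at least one operand is NaN.
bool fcmpUno(double X, double C) {
  return std::isnan(X) || std::isnan(C);
}
```

For any X, fcmpOrd(X, 42.0) and fcmpOrd(X, 0.0) agree, and the same holds for
fcmpUno, because neither 42.0 nor 0.0 is NaN.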

Differential Revision: https://reviews.llvm.org/D37427

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312591 91177308-0d34-0410-b5e6-96231b3b80d8

Fix a use after free.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312590 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Make ARMExpandPseudo add implicit uses for predicated instructions

Missing these could potentially screw up post-ra scheduling.

Issue found by inspection, so I don't have a real testcase. Included
test just verifies the expected operands after expansion.

Differential Revision: https://reviews.llvm.org/D35156

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312589 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Register ARMExpandPseudo pass.

This allows -run-pass etc. to refer to it.

(Split off from D35156.)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312587 91177308-0d34-0410-b5e6-96231b3b80d8

obj2yaml: Print unique section names.

Without this patch, passing a .o file with multiple sections with the
same name to obj2yaml produces a yaml file that yaml2obj cannot
handle. This is PR34162.

The problem is that when specifying, for example, the section of a
symbol, we get only

Section: foo

and don't know which of the sections whose name is foo we have to use.

One alternative would be to use section numbers. This would work, but
the output from obj2yaml would be very inconvenient to edit as
deleting a section would invalidate all indexes.

Another alternative would be to invent a unique section id that would
exist only on yaml. This would work, but seems a bit heavy handed. We
could make the id optional and default it to the section name.

Since in the last alternative the id is basically what this patch uses
as a name, it can be implemented as a followup patch if needed.
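
One way to make repeated names unique is sketched below. The function name
uniqueNames and the " [N]" suffix format are assumptions made for this
illustration and are not necessarily what obj2yaml prints.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: the first occurrence of a name is kept as-is and each
// later occurrence gets a numeric suffix. Note that a real section literally
// named "foo [1]" could still collide with a generated name; this sketch
// does not handle that case.
std::vector<std::string> uniqueNames(const std::vector<std::string> &Names) {
  std::unordered_map<std::string, unsigned> Seen;
  std::vector<std::string> Out;
  Out.reserve(Names.size());
  for (const std::string &N : Names) {
    unsigned Count = Seen[N]++; // how many times N was seen before this one
    Out.push_back(Count == 0 ? N : N + " [" + std::to_string(Count) + "]");
  }
  return Out;
}
```

Because the unique string doubles as the name, a symbol's "Section:" field
can refer to exactly one section without introducing a separate id.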

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312585 91177308-0d34-0410-b5e6-96231b3b80d8

[ORC] Convert null remote symbols to null JITSymbols.

The existing code created a JITSymbol with an invalid materializer instead,
guaranteeing a 'missing symbol' error when someone tried to materialize the
symbol.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312584 91177308-0d34-0410-b5e6-96231b3b80d8

[CodeView] Don't output S_UDTs for nested typedefs.

S_UDT records are basically the "bridge" between the debugger's
expression evaluator and the type information. If you type
(Foo*)nullptr into the watch window, the debugger looks for an
S_UDT record named Foo. If it can find one, it displays your type.
Otherwise you get an error.

We have always understood this to mean that if you have code like
this:

  struct A {
    int X;
  };

  struct B {
    typedef A AT;
    AT Member;
  };

that you will get 3 S_UDT records: "A", "B", and "B::AT". The
reasoning was that if you were to type (B::AT*)nullptr into the
debugger, it would need to find an S_UDT record named "B::AT".

But "B::AT" is actually the S_UDT record that would be generated
if B were a namespace, not a struct. So the debugger needs to be
able to distinguish this case. So what it does is:

  1. Look for an S_UDT named "B::AT". If it finds one, it knows
     that AT is in a namespace.
  2. If it doesn't find one, split at the scope resolution operator,
     and look for an S_UDT named B. If it finds one, look up the type
     for B, and then look for AT as one of its members.

With this algorithm, S_UDT records for nested typedefs are not just
unnecessary, but actually wrong!
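
The two lookup steps can be modelled roughly as follows. The DebugInfo
struct, the Lookup enum and the resolve function are hypothetical names for
this sketch, and splitting at the last "::" is an assumption. This is not the
debugger's actual implementation.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical model of the debug info: the set of S_UDT names, plus the
// nested typedef names recorded as members of each type.
struct DebugInfo {
  std::set<std::string> UDTs;
  std::map<std::string, std::set<std::string>> Members;
};

enum class Lookup { DirectUDT, NestedInType, NotFound };

Lookup resolve(const DebugInfo &DI, const std::string &Name) {
  // Step 1: an S_UDT with the full name means a namespace-scoped type.
  if (DI.UDTs.count(Name))
    return Lookup::DirectUDT;
  // Step 2: split at the scope resolution operator (assumed: the last one)
  // and look for the inner name among the members of the outer type.
  std::string::size_type Pos = Name.rfind("::");
  if (Pos == std::string::npos)
    return Lookup::NotFound;
  std::string Outer = Name.substr(0, Pos);
  std::string Inner = Name.substr(Pos + 2);
  if (!DI.UDTs.count(Outer))
    return Lookup::NotFound;
  auto It = DI.Members.find(Outer);
  if (It != DI.Members.end() && It->second.count(Inner))
    return Lookup::NestedInType;
  return Lookup::NotFound;
}
```

In this model, emitting an S_UDT named "B::AT" makes step 1 succeed, so the
nested typedef is treated as if B were a namespace.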

The results of implementing this in clang are dramatic. It cuts
our /DEBUG:FASTLINK PDB sizes by more than 50%, and we go from
being ~20% larger than MSVC PDBs on average, to ~40% smaller.

It also slightly speeds up link time. We get about 10% faster
links than without this patch.

Differential Revision: https://reviews.llvm.org/D37410

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312583 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[Decompression] Fail gracefully when out of memory"

This reverts commit r312526.

Revert "Fix test/DebugInfo/dwarfdump-decompression-invalid-size.test"

This reverts commit r312527.

It causes an ASan failure:
http://lab.llvm.org:8080/green/job/clang-stage2-cmake-RgSan_check/4150

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312582 91177308-0d34-0410-b5e6-96231b3b80d8

[unittest/ReverseIteration] Unbreak when compiling with GCC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312579 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] add nnan tests; NFC

As suggested in D37427, we could have a value tracking function and folds that use
it to simplify these cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312578 91177308-0d34-0410-b5e6-96231b3b80d8

[GVNHoist] Move duplicated code to a helper function. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312575 91177308-0d34-0410-b5e6-96231b3b80d8

[unittests] Add reverse iteration unit test for pointer-like keys

Reviewers: dblaikie, efriedma, mehdi_amini

Reviewed By: dblaikie

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37241

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312574 91177308-0d34-0410-b5e6-96231b3b80d8

Fix RST syntax in LangRef for llvm.codeview.annotation intrinsic

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312571 91177308-0d34-0410-b5e6-96231b3b80d8

Add llvm.codeview.annotation to implement MSVC __annotation

Summary:
This intrinsic represents a label with a list of associated metadata
strings. It is modelled as reading and writing inaccessible memory so
that it won't be removed as dead code. I think the intention is that the
annotation strings should appear at most once in the debug info, so I
marked it noduplicate. We are allowed to inline code with annotations as
long as we strip the annotation, but that can be done later.

Reviewers: majnemer

Subscribers: eraman, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D36904

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312569 91177308-0d34-0410-b5e6-96231b3b80d8

[SCEV] Ensure ScalarEvolution::createAddRecFromPHIWithCastsImpl properly handles out of range truncations of the start and accum values

Summary:
When constructing the predicate P1 in ScalarEvolution::createAddRecFromPHIWithCastsImpl() it is possible
for the PHISCEV from which the predicate is constructed to be a SCEVConstant instead of a SCEVAddRec. If
this happens, then the cast<SCEVAddRec>(PHISCEV) in the code will assert.

Such a PHISCEV is possible if either the start value or the accumulator value is a constant value
that is not equal to its truncated value, and if the truncated value is zero.

This patch adds tests that demonstrate the cast<> assertion, and fixes this problem by checking
whether the PHISCEV is a constant before constructing the P1 predicate; if it is, then P1 is
equivalent to one of P2 or P3. Additionally, if we know that the start value or accumulator
value are constants then we check whether the P2 and/or P3 predicates are known false at compile
time; if either is, then we bail out of constructing the AddRec.
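
A small arithmetic sketch of the problematic constants follows. The helper
names truncTo and truncatesToZeroLossy are hypothetical, and the sketch uses
unsigned values (zero extension) for simplicity; it is not the
ScalarEvolution code.

```cpp
#include <cstdint>

// Keep only the low Bits bits of V (Bits is expected to be in 1..64).
std::uint64_t truncTo(std::uint64_t V, unsigned Bits) {
  return Bits >= 64 ? V : V & ((std::uint64_t{1} << Bits) - 1);
}

// True for the case described above: the constant differs from its
// truncated value, and the truncated value is zero.
bool truncatesToZeroLossy(std::uint64_t V, unsigned Bits) {
  std::uint64_t T = truncTo(V, Bits);
  return T != V && T == 0;
}
```

For example, 256 truncated to 8 bits is 0, which differs from 256, so it is
an out-of-range truncation of this kind. 255 truncates to itself and 0 stays
0, so neither qualifies.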

Reviewers: sanjoy, mkazantsev, silviu.baranga

Reviewed By: mkazantsev

Subscribers: mkazantsev, llvm-commits

Differential Revision: https://reviews.llvm.org/D37265

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312568 91177308-0d34-0410-b5e6-96231b3b80d8

LTO: Try to open cache files before renaming them.

It appears that a potential race between the cache client and the cache
pruner that I thought was unlikely actually happened in practice [1].
Try to avoid the race condition by opening the temporary file before
renaming it. Do this only on non-Windows platforms because we cannot
rename open files on Windows using the sys::fs::rename function.
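
The ordering can be sketched with plain POSIX calls. The helper name
openThenRename is hypothetical and this is not the LTO caching code. It only
shows why opening first helps: on POSIX systems an open descriptor stays
usable after the file is renamed, and even after another process unlinks it.

```cpp
#include <cstdio>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical helper, POSIX only. Opens the temporary file, then renames it
// to its final cache location. Returns an open read-only descriptor, or -1
// on failure.
int openThenRename(const std::string &TempPath, const std::string &FinalPath) {
  int FD = ::open(TempPath.c_str(), O_RDONLY);
  if (FD < 0)
    return -1;
  if (std::rename(TempPath.c_str(), FinalPath.c_str()) != 0) {
    ::close(FD);
    return -1;
  }
  // A pruner may delete FinalPath at any point from here on, but FD still
  // refers to the file contents until it is closed.
  return FD;
}
```

Renaming first and opening afterwards would leave a window in which the
pruner could remove the file before the client manages to open it.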

[1] https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.memory%2FLinux_CFI%2F1610%2F%2B%2Frecipes%2Fsteps%2Fcompile%2F0%2Fstdout

Differential Revision: https://reviews.llvm.org/D37410

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312567 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Remove unnecessary (v4f32 (X86vzmovl (v4f32 (scalar_to_vector FR32X)))) patterns

We had already disabled the pattern for SSE4.1 and SSE4.2. But it got re-enabled for AVX and AVX512.

With SSE41 we rely on a separate (v4f32 (X86vzmovl VR128)) pattern to select blendps with a xorps to create zeroes, and a separate (v4f32 (scalar_to_vector FR32X)) pattern to select a COPY_TO_REG_CLASS to move FR32 to VR128.

The same thing can happen for AVX with vblendps and those separate patterns already exist.

For AVX512, (v4f32 (X86vzmovl VR128)) will select a VMOVSS instruction instead of VBLENDPS due to there not being an EVEX VBLENDPS. This is what we were getting out of the larger pattern anyway. So the larger pattern is unneeded for AVX512 too.

For SSE1-SSSE3 we can rely on (v4f32 (X86vzmovl VR128)) selecting a MOVSS similar to AVX512. Again this is what the larger pattern did too.

So the only real change here is that AVX1/2 now properly outputs a VBLENDPS during isel instead of a VMOVSS, matching SSE41. Most tests didn't notice because the two-address instruction pass knows how to turn VMOVSS into VBLENDPS to get an independent destination register.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312564 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Cleanup/refactor SIMemoryLegalizer [3]:

- Refactor SIMemOpInfo's constructors
- Allow construction of NotAtomic SIMemOpInfo

Differential Revision: https://reviews.llvm.org/D37396

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312563 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Fix not accounting for tail call resource usage

If the only call in a function is a tail call, the
function isn't considered to have a call since it's a
type of return.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312561 91177308-0d34-0410-b5e6-96231b3b80d8

X86 Tests: Adding missing AVX512 fptoui coverage tests. NFC.

Some of the cases show missing patterns that I intend to fix shortly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312560 91177308-0d34-0410-b5e6-96231b3b80d8

[PPC][NFC] Renaming things with 'xxinsert' moniker to 'vecinsert' to make it more general.

Commit on behalf of Graham Yiu (gyiu@ca.ibm.com)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312547 91177308-0d34-0410-b5e6-96231b3b80d8

Split opt-remark YAML and opt output testing on this test

This prepares for https://reviews.llvm.org/D33514

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312544 91177308-0d34-0410-b5e6-96231b3b80d8

[AVX512] Remove patterns for (v8f32 (X86vzmovl (insert_subvector undef, (v4f32 (scalar_to_vector FR32X:)), (iPTR 0)))) and the same for v4f64.

We don't have this same pattern for AVX2 so I don't believe we should have it for AVX512. We also didn't have it for v16f32.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312543 91177308-0d34-0410-b5e6-96231b3b80d8