git.osdn.net Git - android-x86/external-llvm.git/log

[MSSA] Remove incorrect comment + `auto`ify dyn_cast results; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335399 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AsmParser] Keep track of whether an explicit scale was specified while parsing an address in Intel syntax. Use it for improved error checking.

This allows us to check these:
- 16-bit addressing doesn't support scale, so we should error if we find one there.
- Multiplying ESP/RSP by a scale, even if the scale is 1, should be an error because ESP/RSP can't be an index.
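
The two checks can be sketched as follows. This is a minimal Python illustration, not the parser's actual C++ code: the function name, its parameters, and the error strings are all hypothetical.

```python
# Hypothetical sketch of the two checks listed above; names and messages are
# illustrative and are not taken from the X86 asm parser.

STACK_POINTERS = {"esp", "rsp"}

def check_intel_address(base, index, explicit_scale, is_16bit):
    """Return an error string, or None if the operand passes both checks.

    base / index are lowercase register names or None; explicit_scale is True
    when the source text contained a '*scale' term, even '*1'.
    """
    if is_16bit and explicit_scale:
        return "16-bit addresses cannot have a scale"
    if explicit_scale and index in STACK_POINTERS:
        # An explicit multiply pins the register to the index slot,
        # and ESP/RSP cannot be encoded as an index.
        return "%s cannot be used as an index register" % index
    return None
```

Tracking whether the scale was written explicitly is what distinguishes [EAX+ESP] (ESP may still become the base) from [EAX+ESP*1] (ESP is forced into the index slot).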

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335398 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-size] Make global variables static

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335397 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] add more tests for bit hacking opportunities with setcc; NFC

Missed cases where the input and output are the same size in rL335391.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335396 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] add more tests for bit hacking opportunities with setcc; NFC

Missed cases where the input and output are the same size in rL335390.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335395 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AsmParser] In Intel syntax make sure we support ESP/RSP being the second register in memory expressions like [EAX+ESP].

By default, the second register gets assigned to the index register slot. But ESP can't be an index register so we need to swap it with the other register.

There's still a slight bug that we allow [EAX+ESP*1]. The existence of the multiply, even though it's with 1, should force ESP to the index register and trigger an error, but it doesn't currently.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335394 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-mca] Remove unnecessary include and forward decl in RCU. NFC.

The DispatchUnit is no longer a dependency of RCU, so this patch removes a
stale include and forward decl. This patch also cleans up some comments.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335392 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] add tests for bit hacking opportunities with setcc; NFC

We likely gave up on folding some select-of-constants patterns in
IR with rL331486, and we need to recover those in the DAG.

The tests without select are based on our current DAGCombiner
optimizations for select-of-constants.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335391 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] add tests for bit hacking opportunities with setcc; NFC

We likely gave up on folding some select-of-constants patterns in
IR with rL331486, and we need to recover those in the DAG.

The tests without select are based on our current DAGCombiner
optimizations for select-of-constants.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335390 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add test cases showing missed select simplification for MCU when icmp is in a slightly different form.

These test cases show that the "(select (and (x , 0x1) == 0), y, (z ^ y) ) -> (-(and (x , 0x1)) & z ) ^ y" doesn't work if the select condition is changed to (and (x, 0x1) != 1)
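
The fold itself, and the fact that the `!= 1` form of the condition selects the same values, can be checked exhaustively on small integers. A Python sketch, assuming 4-bit values purely to keep the search small; the helper names are made up for this illustration:

```python
from itertools import product

BITS = 4                      # assumption: small width so the check is exhaustive
MASK = (1 << BITS) - 1

def select_form(x, y, z, cond):
    # (select cond, y, (z ^ y))
    return y if cond(x) else (z ^ y)

def folded_form(x, y, z):
    # (-(and x, 1) & z) ^ y, with the negation wrapped to BITS bits
    return ((-(x & 1) & MASK) & z) ^ y

def eq_zero(x):
    return (x & 1) == 0

def ne_one(x):
    return (x & 1) != 1

def forms_agree(cond):
    """True if the select and the folded form match for every x, y, z."""
    return all(select_form(x, y, z, cond) == folded_form(x, y, z)
               for x, y, z in product(range(1 << BITS), repeat=3))
```

Both `forms_agree(eq_zero)` and `forms_agree(ne_one)` hold, which is why the second spelling of the condition is a missed case rather than a different transform.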

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335389 91177308-0d34-0410-b5e6-96231b3b80d8

[GISel]: Add G_ADDRSPACE_CAST Opcode

Added IRTranslator support for addrspacecast.

https://reviews.llvm.org/D48469

reviewed by: volkan

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335388 91177308-0d34-0410-b5e6-96231b3b80d8

[gdb] Use Latin-1 to decode StringRef

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335387 91177308-0d34-0410-b5e6-96231b3b80d8

Re-land "[LTO] Enable module summary emission by default for regular LTO"

Since we are now producing a summary also for regular LTO builds, we
need to run the NameAnonGlobals pass in those cases as well (the
summary cannot handle anonymous globals).

See https://reviews.llvm.org/D34156 for details on the original change.

This reverts commit 6c9ee4a4a438a8059aacc809b2dd57128fccd6b3.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335385 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Don't accept (%si,%bp) 16-bit address expressions.

The second register is the index register and should only be %si or %di if used with a base register. And in that case the base register should be %bp or %bx.

This makes us compatible with gas.

We do still need to support both orders with Intel syntax which uses [bp+si] and [si+bp]
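
A rough Python sketch of the rule, assuming the register sets stated above; the function names are hypothetical and do not correspond to anything in the parser:

```python
# Illustrative only: the register sets come from the commit text above,
# the function names are hypothetical.
BASES_16 = {"bx", "bp"}
INDEXES_16 = {"si", "di"}

def valid_att_16bit_pair(base, index):
    """AT&T syntax (base,index): order is significant, as in gas."""
    return base in BASES_16 and index in INDEXES_16

def normalize_intel_16bit_pair(reg_a, reg_b):
    """Intel syntax [a+b]: accept either order and return (base, index),
    or None if the two registers cannot form a base/index pair."""
    if valid_att_16bit_pair(reg_a, reg_b):
        return (reg_a, reg_b)
    if valid_att_16bit_pair(reg_b, reg_a):
        return (reg_b, reg_a)
    return None
```

So (%si,%bp) is rejected in AT&T syntax, while [si+bp] in Intel syntax normalizes to base bp, index si.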

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335384 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AsmParser] Allow (%bp,%si) and (%bp,%di) to be encoded without using a zero displacement.

(%bp) can't be encoded without a displacement. The encoding is instead used for displacement alone. So a 1 byte displacement of 0 must be used. But if there is an index register we can encode without a displacement.
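
The encoding rule can be sketched in Python using the standard 16-bit ModRM table. The function name and return shape are made up for this illustration; only the mod/rm values reflect the x86 encoding:

```python
# rm field values for 16-bit addressing (standard x86 ModRM table).
RM16 = {
    ("bx", "si"): 0b000, ("bx", "di"): 0b001,
    ("bp", "si"): 0b010, ("bp", "di"): 0b011,
    ("si", None): 0b100, ("di", None): 0b101,
    ("bp", None): 0b110, ("bx", None): 0b111,
}

def encode_mem16(base, index=None, disp=0):
    """Return (mod, rm, displacement_bytes) for a 16-bit memory operand."""
    rm = RM16[(base, index)]
    # mod=00 with rm=110 means "disp16 only", so plain (%bp) cannot use mod=00.
    if disp == 0 and rm != 0b110:
        return (0b00, rm, b"")
    if -128 <= disp <= 127:
        return (0b01, rm, disp.to_bytes(1, "little", signed=True))
    return (0b10, rm, (disp & 0xFFFF).to_bytes(2, "little"))
```

With this table, `encode_mem16("bp")` needs a one-byte zero displacement, while `encode_mem16("bp", "si")` needs none.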

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335379 91177308-0d34-0410-b5e6-96231b3b80d8

AMDHSA: Put old assembler docs back

Until we switch to code object v3 by default.
Follow up for https://reviews.llvm.org/D47736.

Differential Revision: https://reviews.llvm.org/D48497

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335378 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] Add sdiv by (nonuniform) minus one tests (PR37119)

Test cases from D45806

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335376 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AsmParser] Check for invalid 16-bit base register in Intel syntax.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335373 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Don't allow ESP/RSP to be used as an index register in assembly.

Fixes PR37892

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335370 91177308-0d34-0410-b5e6-96231b3b80d8

[LoopUnswitch] Fix comparison for DomTree updates.

Summary:
In LoopUnswitch when replacing a branch Parent -> Succ with a conditional
branch Parent -> True & Parent->False, the DomTree updates should insert an edge for
each of True/False if True/False are different than Succ, and delete Parent->Succ edge
if both are different. The comparison with Succ appears to be incorrect,
it's comparing with Parent instead.
There is no test failing either before or after this change, but it seems to me this is
the right way to do the update.
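
The intended update rule, as described in this summary, can be sketched in Python. Block names are plain strings and the function is hypothetical; it mirrors the prose only, not the LoopUnswitch source:

```python
def unswitch_domtree_updates(parent, succ, true_dest, false_dest):
    """Return a list of ("insert" | "delete", from, to) DomTree updates.

    The comparisons are against succ (the old successor), which is the
    point of the fix; comparing against parent would be the bug.
    """
    updates = []
    if true_dest != succ:
        updates.append(("insert", parent, true_dest))
    if false_dest != succ:
        updates.append(("insert", parent, false_dest))
    if true_dest != succ and false_dest != succ:
        # Neither new edge reuses Parent -> Succ, so that edge is gone.
        updates.append(("delete", parent, succ))
    return updates
```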

Reviewers: chandlerc, kuhar

Subscribers: sanjoy, jlebar, llvm-commits

Differential Revision: https://reviews.llvm.org/D48457

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335369 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-mca] Remove redundant call. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335368 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add a test to show missed opportunity to generate vfnmadd

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335367 91177308-0d34-0410-b5e6-96231b3b80d8

Initialize LiveRegs once in BranchFolder::mergeCommonTails

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335365 91177308-0d34-0410-b5e6-96231b3b80d8

[SLPVectorizer] Support alternate opcodes in tryToVectorizeList

Enable tryToVectorizeList to support InstructionsState alternate opcode patterns at a root (build vector etc.) as well as further down the vectorization tree.

NOTE: This patch reduces some of the debug reporting if there are opcode mismatches - I can try to add it back if it proves a problem. But it could get rather messy trying to provide equivalent verbose debug strings via getSameOpcode etc.

Differential Revision: https://reviews.llvm.org/D48488

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335364 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-mca] Set the operand ID for implicit register reads/writes. NFC

Also, move the definition of InstRef to the end of Instruction.h to avoid a
forward declaration.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335363 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-mca] Introduce a sequential container of Stages

Summary:
Remove explicit stages and introduce a list of stages.

A pipeline should be composed of an arbitrary list of stages, and not any
predefined list of stages in the Backend.  The Backend should not know of any
particular stage, rather it should only be concerned that it has a list of
stages, and that those stages will fulfill the contract of what it means to be
a Stage (namely pre/post/execute a given instruction).

For now, we leave the original set of stages defined in the Backend ctor;
however, I imagine these will be moved out at a later time.

This patch makes an adjustment to the semantics of Stage::isReady.
Specifically, what the Backend really needs to know is if a Stage has
unfinished work.  With that said, it is more appropriately renamed
Stage::hasWorkToComplete().  This change will clean up the check in
Backend::run(), allowing us to query each stage to see if there is unfinished
work, regardless of what subclass a stage might be.  I feel that this change
simplifies the semantics too, but that's a subjective statement.
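
As a rough illustration of the idea (not the llvm-mca code), a driver loop over an arbitrary list of stages might look like the Python sketch below. The class names, the toy CountdownStage, and the per-cycle ordering of the pre/execute/post calls are assumptions made for this example.

```python
class Stage:
    """Hypothetical stand-in for the Stage contract described above."""
    def has_work_to_complete(self):
        return False
    def pre_execute(self, inst):
        pass
    def execute(self, inst):
        pass
    def post_execute(self, inst):
        pass

class CountdownStage(Stage):
    """Toy stage that stays busy for a fixed number of cycles."""
    def __init__(self, cycles):
        self.remaining = cycles
    def has_work_to_complete(self):
        return self.remaining > 0
    def execute(self, inst):
        if self.remaining > 0:
            self.remaining -= 1

def run(stages, inst=None):
    """Run cycles until no stage reports unfinished work; return cycle count."""
    cycles = 0
    while any(s.has_work_to_complete() for s in stages):
        for s in stages:
            s.pre_execute(inst)
        for s in stages:
            s.execute(inst)
        for s in stages:
            s.post_execute(inst)
        cycles += 1
    return cycles
```

The driver never names a concrete stage type; it only asks each element of the list whether it still has work, which is the property the rename to hasWorkToComplete() is meant to express.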

Given how RetireStage and ExecuteStage handle data in their preExecute(), I've
had to change the order of Retire and Execute in our stage list.  Retire must
complete any of its preExecute actions before ExecuteStage's preExecute can
take control.  This is mainly because both stages utilize the RCU.  In the
meantime, I want to see if I can adjust that or remove that coupling.

Reviewers: andreadb, RKSimon, courbet

Reviewed By: andreadb

Subscribers: tschuett, gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D46907

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335361 91177308-0d34-0410-b5e6-96231b3b80d8

[SLPVectorizer] reorderAltShuffleOperands should just take InstructionsState. NFCI.

All calls were extracting the InstructionsState Opcode/AltOpcode values so we might as well pass it directly

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335359 91177308-0d34-0410-b5e6-96231b3b80d8

Fix test again, try to keep all targets happy

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335356 91177308-0d34-0410-b5e6-96231b3b80d8

Fix test, nop is not always 1 byte

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335353 91177308-0d34-0410-b5e6-96231b3b80d8

[DWARFv5] Allow ".loc 0" to refer to the root file.

DWARF v5 explicitly represents file #0 in the line table. Prior
versions did not, so ".loc 0" is still an error in those cases.

Differential Revision: https://reviews.llvm.org/D48452

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335350 91177308-0d34-0410-b5e6-96231b3b80d8

[SLPVectorizer] Relax alternate opcodes to accept any BinaryOperator pair

SLP currently only accepts (F)Add/(F)Sub alternate counterpart ops to be merged into an alternate shuffle.

This patch relaxes this to accept any pair of BinaryOperator opcodes instead, assuming the target's cost model accepts the vectorization+shuffle.
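
To illustrate what an alternate shuffle is, here is a small Python model in which vectors are lists. Strict even/odd alternation and the function names are simplifications chosen for this sketch; whether the transform is profitable is left to the cost model and is not modeled here.

```python
import operator

def alternate_scalar(a, b, op_even, op_odd):
    """Scalar form: even lanes use op_even, odd lanes use op_odd."""
    return [(op_even if i % 2 == 0 else op_odd)(x, y)
            for i, (x, y) in enumerate(zip(a, b))]

def alternate_vectorized(a, b, op_even, op_odd):
    """Vector form: two full-width ops followed by a select-style shuffle."""
    v0 = [op_even(x, y) for x, y in zip(a, b)]   # one vector op, all lanes
    v1 = [op_odd(x, y) for x, y in zip(a, b)]    # the alternate op, all lanes
    # The shuffle mask picks lane i from v0 for even i and from v1 for odd i.
    return [v0[i] if i % 2 == 0 else v1[i] for i in range(len(a))]
```

For example, `alternate_vectorized([1, 2, 3, 4], [5, 6, 7, 8], operator.mul, operator.xor)` gives the same lanes as the scalar form, using a mul/xor pair that the old (F)Add/(F)Sub restriction would have rejected.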

Differential Revision: https://reviews.llvm.org/D48477

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335349 91177308-0d34-0410-b5e6-96231b3b80d8

[SLPVectorizer][X86] Add alternate opcode tests for simple build vector cases

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335348 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] add shuffle+binops test from PR37806; NFC

This one shows another pattern that we'll need to match
in some cases, but the current ordering of folds allows
us to match this as 2 binops before simplification takes
place.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335347 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] add tests for shuffle-with-different-binops; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335345 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] rearrange shuffle-of-binops logic; NFC

The commutative matcher makes things more complicated
here, and I'm planning an enhancement where this
form is more readable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335343 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Regenerate tests to include fma comments

Noticed in the review of D48467

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335342 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add notes to a few intrinsics

This is a change corresponding to the clang change in
https://reviews.llvm.org/D45616

Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D48280

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335340 91177308-0d34-0410-b5e6-96231b3b80d8

Recommit r335333 "[MC] - Add .stack_size sections into groups and link them with .text"

With compilation fix.

Original commit message:

D39788 added a '.stack-size' section containing metadata on function stack sizes
to output ELF files behind the new -stack-size-section flag.

This change does the following two things on top:

1) Consider the case where the -ffunction-sections flag is given and there are text sections in COMDATs.
    The patch adds a '.stack-size' section into the corresponding COMDAT group, so that the linker can
    eliminate them quickly while resolving the COMDATs.
2) The patch sets the SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text.
   With that, the linker can apply -gc-sections to dead stack-size sections.

Differential revision: https://reviews.llvm.org/D46874

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335336 91177308-0d34-0410-b5e6-96231b3b80d8

[IR] Use Instruction::isBinaryOp helper instead of raw enum range tests. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335335 91177308-0d34-0410-b5e6-96231b3b80d8

Revert r335332 "[MC] - Add .stack_size sections into groups and link them with .text"

It broke bots.

http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/12891
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/9443
http://lab.llvm.org:8011/builders/lldb-x86_64-ubuntu-14.04-buildserver/builds/25551

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335333 91177308-0d34-0410-b5e6-96231b3b80d8

[MC] - Add .stack_size sections into groups and link them with .text

D39788 added a '.stack-size' section containing metadata on function stack sizes
to output ELF files behind the new -stack-size-section flag.

This change does the following two things on top:

1) Consider the case where the -ffunction-sections flag is given and there are text sections in COMDATs.
    The patch adds a '.stack-size' section into the corresponding COMDAT group, so that the linker can
    eliminate them quickly while resolving the COMDATs.
2) The patch sets the SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text.
   With that, the linker can apply -gc-sections to dead stack-size sections.

Differential revision: https://reviews.llvm.org/D46874

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335332 91177308-0d34-0410-b5e6-96231b3b80d8

Recommit of r335326, with the test fixed that I missed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335331 91177308-0d34-0410-b5e6-96231b3b80d8

[CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc

AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) were being severely overestimated by the default shuffle expansion.

This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174.

I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more.

Differential Revision: https://reviews.llvm.org/D48172

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335329 91177308-0d34-0410-b5e6-96231b3b80d8

Reverting r335326 while I look at the test failure

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335328 91177308-0d34-0410-b5e6-96231b3b80d8

Revert r335324 due to a buildbot failure

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335327 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] ARMv6m and v8m.baseline strict align

This sets target feature FeatureStrictAlign for Armv6-m and Armv8-m.baseline,
because they have no support for unaligned accesses.
It looks like we always pass target feature "+strict-align" from
Clang, so this is not a user facing problem, but querying the subtarget
(in e.g. llc) for unaligned access support is incorrect.

Differential Revision: https://reviews.llvm.org/D48437

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335326 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Add patterns for i32/i64 local atomic load/store

Not sure why the 32/64 split is needed in the atomic_load
store hierarchies. The regular PatFrags do this, but we don't
do it for the existing handling for global.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335325 91177308-0d34-0410-b5e6-96231b3b80d8

[Evaluator] Improve evaluation of call instruction

Differential revision: https://reviews.llvm.org/D46584

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335324 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Changing the check for valid inputs in combineScalarToVector

Changing the logic of scalar mask folding to check for valid input types rather
than against invalid ones, making it more robust and fixing PR37879.

Differential Revision: https://reviews.llvm.org/D48366

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335323 91177308-0d34-0410-b5e6-96231b3b80d8

Revert r335306 (and r335314) - the Call Graph Profile pass.

This is the first pass in the main pipeline to use the legacy PM's
ability to run function analyses "on demand". Unfortunately, it turns
out there are bugs in that somewhat-hacky approach. At the very least,
it leaks memory and doesn't support -debug-pass=Structure. Unclear if
there are larger issues or not, but this should get the sanitizer bots
back to green by fixing the memory leaks.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335320 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/GlobalISel: Default to using TableGen'd instruction selector

Summary:
We can select all instructions that are marked as legal in a full piglit run,
so now is a good time to make the TableGen'd instruction selector default
for all opcodes. This is NFC for a full piglit run, which is why there are
no tests.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48198

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335319 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/GlobalISel: legalize and select 32-bit G_ASHR

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D48196

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335318 91177308-0d34-0410-b5e6-96231b3b80d8

[LegacyPM] Fix PR37888 by teaching the legacy loop pass manager how to
clear out deleted loops from the current queue beyond just the current
loop.

This is important because SimpleLoopUnswitch will now enqueue the same
loop to be re-processed. When it does this with the legacy PM, we don't
have a way of canceling the rest of the pipeline and so we can end up
deleting the loop before we reprocess it. =/

This change also makes it easy to support deleting other loops in the
queue to process, although I don't have any use cases for that.

Differential Revision: https://reviews.llvm.org/D48470

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335317 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/GlobalISel: legalize and select 32-bit G_SITOFP

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48195

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335316 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/GlobalISel: Implement select() for COPY

Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46151

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335315 91177308-0d34-0410-b5e6-96231b3b80d8

Fix test failures after r335306 due to the pipeline changing.

This wasn't obvious for the author to fix because this is the first
pipeline use of the magic utility to get function analyses within
a module pass in the legacy pass manager. Turns out that has a bug which
prevents dumping the structure of the pipeline and shows up as an
unnamed pass.

I've just left a FIXME for that as it doesn't seem likely worth fixing
and certainly shouldn't hold up getting the bots green.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335314 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] fix shuffle-of-binops bug

With non-commutative binops, we could be using the same
variable value as operand 0 in 1 binop and operand 1 in
the other, so we have to check for that possibility and
bail out.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335312 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] add test for shuffle-of-binops; NFC

This shows a miscompile that was missed in rL335283.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335311 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/GlobalISel: Implement select() for G_IMPLICIT_DEF

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46150

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335307 91177308-0d34-0410-b5e6-96231b3b80d8

[Instrumentation] Add Call Graph Profile pass

This patch adds support for generating a call graph profile from Branch Frequency Info.

The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions. For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight.

After scanning all the functions, it generates an appending module flag containing the data. The format looks like:

!llvm.module.flags = !{!0}

!0 = !{i32 5, !"CG Profile", !1}
!1 = !{!2, !3, !4} ; List of edges
!2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32
!3 = !{void (i1)* @freq, void ()* @a, i64 11}
!4 = !{void (i1)* @freq, void ()* @b, i64 20}
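
A minimal Python sketch of the edge collection described above. The input shape (a dict of functions, each a list of blocks with a count and the callees it calls) is made up for illustration, and summing the weights of repeated caller/callee pairs is an assumption of this sketch.

```python
from collections import defaultdict

def build_cg_profile(functions):
    """functions: {name: [(block_count, [callee, ...]), ...]}.

    Returns {(caller, callee): weight}. Each call contributes the profile
    count of the basic block that contains it.
    """
    edges = defaultdict(int)
    for caller, blocks in functions.items():
        for block_count, callees in blocks:
            for callee in callees:
                edges[(caller, callee)] += block_count
    return dict(edges)
```

Feeding it a module where `a` calls `b` from a block with count 32, and `freq` calls `a` and `b` from blocks with counts 11 and 20, yields the three edges shown in the metadata above.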

Differential Revision: https://reviews.llvm.org/D48105

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335306 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Fix 32-bit mingw comdat names, only add one underscore

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335304 91177308-0d34-0410-b5e6-96231b3b80d8

[gdb] Update llvm::Optional

Reviewers: dblaikie

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48461

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335303 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Fix lit failures introduced in r335281

The tests do not support big-endian hosts.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335302 91177308-0d34-0410-b5e6-96231b3b80d8

[IR] fix typo in comment; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335301 91177308-0d34-0410-b5e6-96231b3b80d8

Revert r335297 "[X86] Implement more of x86-64 large and medium PIC code models"

MCJIT can't handle R_X86_64_GOT64 yet.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335300 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Commit some comments that weren't in the medium code model patch

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335298 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Implement more of x86-64 large and medium PIC code models

Summary:
The large code model allows code and data segments to exceed 2GB, which
means that some symbol references may require a displacement that cannot
be encoded as a displacement from RIP. The large PIC model even relaxes
the assumption that the GOT itself is within 2GB of all code. Therefore,
we need a special code sequence to materialize it:
  .LtmpN:
    leaq .LtmpN(%rip), %rbx
    movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch
    addq %rax, %rbx # GOT base reg

From that, non-local references go through the GOT base register instead
of being PC-relative loads. Local references typically use GOTOFF
symbols, like this:
    movq extern_gv@GOT(%rbx), %rax
    movq local_gv@GOTOFF(%rbx), %rax

All calls end up being indirect:
    movabsq $local_fn@GOTOFF, %rax
    addq %rbx, %rax
    callq *%rax

The medium code model retains the assumption that the code segment is
less than 2GB, so calls are once again direct, and the RIP-relative
loads can be used to access the GOT. Materializing the GOT is easy:
    leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg

DSO local data accesses will use it:
    movq local_gv@GOTOFF(%rbx), %rax

Non-local data accesses will use RIP-relative addressing, which means we
may not always need to materialize the GOT base:
    movq extern_gv@GOTPCREL(%rip), %rax

Direct calls are basically the same as they are in the small code model:
They use direct, PC-relative addressing, and the PLT is used for calls
to non-local functions.

This patch adds reasonably comprehensive testing of LEA, but there are
lots of interesting folding opportunities that are unimplemented.

Reviewers: chandlerc, echristo

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D47211

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335297 91177308-0d34-0410-b5e6-96231b3b80d8

[GVN] Avoid casting a vector of size less than 8 bits to i8

Summary:
A reprise of D25849.

This crash was found through fuzzing some time ago and was documented in PR28879.

No check for load size has been added due to the following tests:
- Transforms/GVN/invariant.group.ll
- Transforms/GVN/pr10820.ll

These tests expect load sizes that are not a multiple of eight.

Thanks to @davide for the original patch.

Reviewers: nlopes, davide, RKSimon, reames, efriedma

Reviewed By: efriedma

Subscribers: davide, llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D48330

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335294 91177308-0d34-0410-b5e6-96231b3b80d8

[dsymutil] Force mmap'ing of binaries

After the recent refactoring that introduced parallel handling of
different objects, the binary holder became unique per object file. This
defeats its optimization of caching archives, leading to an archive
being opened for every binary it contains. This is obviously unfortunate
and will need to be refactored soon.

Luckily in practice, the impact of this is limited as most files are
mmap'ed instead of memcopy'd. There's a caveat however: when the memory
buffer requires a null terminator and it's a multiple of the page size,
we allocate instead of mmap'ing. If this happens for a static archive,
we end up with N copies of it in memory, where N is the number of
objects in the archive, leading to exorbitant memory usage. This patch provides
a stopgap solution that ensures all the files it loads are mmap'ed in
memory by removing the requirement for a terminating null byte.
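
The caveat can be modeled with a small Python predicate. It captures only the single condition described in this message; the function name is hypothetical and any other factors in the real mmap heuristic are deliberately left out.

```python
def can_mmap(file_size, page_size, requires_null_terminator):
    """Return False in the one case described above, True otherwise.

    When a terminating null is required and the file fills its last page
    exactly, there is no zero-filled slack after the data to act as the
    terminator, so the buffer has to be allocated and copied instead.
    """
    if requires_null_terminator and file_size % page_size == 0:
        return False
    return True
```

Dropping the null-terminator requirement makes the first operand of the condition false, so a page-multiple archive is mapped once instead of being copied per object.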

Differential revision: https://reviews.llvm.org/D48397

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335293 91177308-0d34-0410-b5e6-96231b3b80d8

[SCEV] Re-apply r335197 (with Polly fixes).

Summary:
This initiates a discussion on changing Polly accordingly while re-applying r335197 (D48338).

I have never worked on Polly. The proposed change to param_div_div_div_2.ll is not an informed one; it just updates the patterns to match the output.

All LLVM files are already reviewed in D48338.

Reviewers: jdoerfert, bollu, efriedma

Subscribers: jlebar, sanjoy, hiraditya, llvm-commits, bixia

Differential Revision: https://reviews.llvm.org/D48453

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335292 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Remove ability to reserve VGPRs for debugger

Differential Revision: https://reviews.llvm.org/D48234

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335288 91177308-0d34-0410-b5e6-96231b3b80d8

[mingw] Fix GCC ABI compatibility for comdat things

Summary:
GCC and the binutils COFF linker do comdats differently from MSVC.
If we want to be ABI compatible, we have to do what they do, which is to
emit unique section names like ".text$_Z3foov" instead of short section
names like ".text". Otherwise, the binutils linker gets confused and
reports multiple definition errors when two object files from GCC and
Clang containing the same inline function are linked together.

The best description of the issue is probably at
https://github.com/Alexpux/MINGW-packages/issues/1677; we don't seem to
have a good one in our tracker.

I fixed up the .pdata and .xdata sections needed everywhere other than
32-bit x86. GCC doesn't use associative comdats for those, it appears to
rely on the section name.

Reviewers: smeenai, compnerd, mstorsjo, martell, mati865

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D48402

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335286 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] fold vector select of binops with constant ops to 1 binop (PR37806)

This is the simplest case from PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806

If we have a common variable operand used in a pair of binops with vector constants
that are vector selected together, then we can constant shuffle the constant vectors
to eliminate the shuffle instruction.

This has some tricky parts that are hopefully addressed in the tests and their
respective comments:

  1. If the shuffle mask contains an undef element, then that lane of the result is
     undef:
     http://llvm.org/docs/LangRef.html#shufflevector-instruction

     Therefore, we can replace the constant in that lane with an undef value except
     for div/rem. With div/rem, an undef in the divisor would cause the whole op to
     be undef. So I'm using the same hack as in D47686 - replace the undefs with '1'.

  2. Intersect the wrapping and FMF of the original binops for the new binop. There
     should be no extra poison or fast-math potential in the new binop that wasn't
     possible in the original code.

  3. Disregard other uses. Given that we're eliminating uses (shortening the
     dependency chain), I think that's always the right IR canonicalization. But
     I purposely chose the udiv test to demonstrate the scenario where both
     intermediate values have other uses because that seems likely worse for
     codegen with an expensive math op. This seems like a very rare possibility to
     me, so I don't think it requires a backend patch first.
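
The core of the fold can be illustrated with a Python model in which vectors are lists. It assumes both binops use the same opcode and that the mask is a select mask (lane i comes from lane i of one input or the other). The undef handling and flag intersection from the list above are not modeled, and the helper names are made up for this sketch.

```python
import operator

def shuffle_select(v0, v1, mask):
    """Vector select: mask[i] < n picks v0[mask[i]], otherwise v1[mask[i] - n]."""
    n = len(v0)
    return [v0[m] if m < n else v1[m - n] for m in mask]

def before_fold(x, c0, c1, mask, op):
    # shuffle (op x, c0), (op x, c1), mask
    b0 = [op(a, c) for a, c in zip(x, c0)]
    b1 = [op(a, c) for a, c in zip(x, c1)]
    return shuffle_select(b0, b1, mask)

def after_fold(x, c0, c1, mask, op):
    # op x, (shuffle c0, c1, mask): the shuffle of constants folds away
    return [op(a, c) for a, c in zip(x, shuffle_select(c0, c1, mask))]
```

With `x = [10, 20, 30, 40]`, constants `[1, 2, 3, 4]` and `[5, 6, 7, 8]`, and mask `[0, 5, 2, 7]`, both forms produce the same lanes for `operator.sub`, but the folded form needs one binop and no runtime shuffle.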

Differential Revision: https://reviews.llvm.org/D48401

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335283 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Update assembler for HSA Code Object v3

Update AMDGPU assembler syntax behind the code-object-v3 feature:

* Replace/rename most AMDGPU assembler directives/symbols and document them.
* Provide more diagnostics (e.g. values out of range, missing values, repeated
values).
* Provide path for backwards compatibility, even with underlying descriptor
changes.

Differential Revision: https://reviews.llvm.org/D47736

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335281 91177308-0d34-0410-b5e6-96231b3b80d8

Revert r335206 "Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions."

This reverts commit r335206.

As discussed here: https://reviews.llvm.org/rL333740, a fix will come
tomorrow. In the meantime, revert this to fix some bots.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335272 91177308-0d34-0410-b5e6-96231b3b80d8

[mips] Modify comment to test new email address (NFC).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335269 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Fix bug with tracking processed blocks in SIInsertWaitcnts

BlockWaitcntProcessedSet was not being cleared between calls, so it was
producing incorrect counts in cases where MBB addresses happened to coincide
across multiple calls.
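
A toy Python model of this class of bug, for illustration only. The class
and its integer block keys are hypothetical stand-ins and do not mirror the
real SIInsertWaitcnts code; the point is only that a set which survives
between runs makes a later run skip a block whose key happens to repeat.

```python
class ToyPass:
    """Toy pass that remembers which block keys it has already processed."""

    def __init__(self, clear_between_runs):
        self.clear_between_runs = clear_between_runs
        self.processed = set()

    def run(self, block_keys):
        """Return the keys that were actually processed in this run."""
        if self.clear_between_runs:
            self.processed.clear()
        visited = []
        for key in block_keys:
            if key in self.processed:
                continue  # stale entry from an earlier run if never cleared
            self.processed.add(key)
            visited.append(key)
        return visited


buggy = ToyPass(clear_between_runs=False)
fixed = ToyPass(clear_between_runs=True)
first = [0x1000, 0x2000]
second = [0x1000, 0x3000]  # 0x1000 is a different block at a reused address

buggy_runs = [buggy.run(first), buggy.run(second)]
fixed_runs = [fixed.run(first), fixed.run(second)]
```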

Differential Revision: https://reviews.llvm.org/D48391

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335268 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/AMDHSA: Remove GridWorkGroupCountX/Y/Z
and everything that comes with it from implementation
and v3 header files.

Leave definition in v2 header files for backwards
compatibility.

Differential Revision: https://reviews.llvm.org/D48191

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335267 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] add tests for shuffled cmps; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335266 91177308-0d34-0410-b5e6-96231b3b80d8

[DebugInfo] Ignore DBG_VALUE instructions in PostRA Machine Sink

Summary:
The logic for handling the sinking of COPY instructions was generating
different code when building with debug flags.

The original code did not take into consideration debug instructions. This
resulted in the registers in the DBG_VALUE instructions being treated as used,
and prevented the COPY from being sunk. This patch avoids analyzing debug
instructions when trying to sink COPY instructions.

This patch also creates a routine from the code in MachineSinking::SinkInstruction to
perform the logic of sinking an instruction along with its debug instructions.
This functionality is used in multiple places, including the code for sinking COPY instrs.
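
A simplified Python sketch of why skipping debug instructions matters, for
illustration only. The Instr type and blocks_sinking helper are hypothetical
and far simpler than the real MachineSink logic: a DBG_VALUE that mentions
the copied register blocks the sink unless debug instructions are ignored,
so the result would otherwise depend on whether debug info is present.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    opcode: str
    uses: tuple = ()
    defs: tuple = ()
    is_debug: bool = False


def blocks_sinking(copy, later_instrs, skip_debug):
    """True if a later instruction touches a register the COPY reads or writes."""
    regs = set(copy.uses) | set(copy.defs)
    for mi in later_instrs:
        if skip_debug and mi.is_debug:
            continue  # debug instructions must not influence codegen
        if regs & (set(mi.uses) | set(mi.defs)):
            return True
    return False


copy = Instr("COPY", uses=("r1",), defs=("r2",))
tail = [
    Instr("DBG_VALUE", uses=("r2",), is_debug=True),
    Instr("ADD", uses=("r3",), defs=("r4",)),
]

blocked_old = blocks_sinking(copy, tail, skip_debug=False)
blocked_new = blocks_sinking(copy, tail, skip_debug=True)
blocked_no_debug_info = blocks_sinking(copy, tail[1:], skip_debug=False)
```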

Reviewers: junbuml, javed.absar, MatzeB, bjope

Reviewed By: bjope

Subscribers: aprantl, probinson, thegameg, jonpa, bjope, vsk, kristof.beyls, JDevlieghere, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D45637

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335264 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] use constant pattern matchers with icmp+sext

The previous code worked with vectors, but it failed when the
vector constants contained undef elements.
The matchers handle those cases.
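
As a rough illustration (a Python sketch with made-up helper names, not the
LLVM PatternMatch API), the difference is between requiring every lane to
equal the constant and accepting undef lanes as long as the defined lanes
agree:

```python
UNDEF = None  # stand-in for an undef vector element


def is_splat_exact(vec, value):
    """Every lane must hold value; an undef lane makes the match fail."""
    return all(e == value for e in vec)


def is_splat_allowing_undef(vec, value):
    """Undef lanes are ignored; at least one defined lane must exist."""
    defined = [e for e in vec if e is not UNDEF]
    return bool(defined) and all(e == value for e in defined)


zero_with_undef = [0, UNDEF, 0, 0]
exact = is_splat_exact(zero_with_undef, 0)
tolerant = is_splat_allowing_undef(zero_with_undef, 0)
```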

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335262 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] add vector icmp tests with undefs; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335261 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] simplify binops before trying other folds

This is outwardly NFC from what I can tell, but it should be more efficient
to simplify first (despite the name, SimplifyAssociativeOrCommutative does
not actually simplify as InstSimplify does - it creates/morphs instructions).

This should make it easier to refactor duplicated code that runs for all binops.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335258 91177308-0d34-0410-b5e6-96231b3b80d8

[LoopVectorize] regenerate full checks; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335257 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Update fast-isel tests for clang r335253.

The new IR fixes a mismatch in the final extractelement for the i32 intrinsics. Previously we extracted a 64-bit element even though we only wanted 32 bits.

SimplifyDemandedElts isn't able to make FP elements undef now and the shuffle mask I used prevents the use of horizontal add we had before. Not sure we should have been using horizontal add anyway. It's implemented on Intel with two port 5 shuffles and an add. So we have one less shuffle now, but an additional instruction to decode.

Differential Revision: https://reviews.llvm.org/D48347

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335256 91177308-0d34-0410-b5e6-96231b3b80d8

[DWARF] Warn on and ignore ".file 0" for DWARF v4 and earlier.

This had been messing with the directory table for prior versions, and
also could induce a crash when generating asm output.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335254 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[AArch64] Coalesce Copy Zero during instruction selection"

This reverts commit d8f57105010cc7e78026e511d5def873fc91e0e7.

Original Commit:

Author: Haicheng Wu <haicheng@codeaurora.org>
Date:   Sun Feb 18 13:51:33 2018 +0000

    [AArch64] Coalesce Copy Zero during instruction selection

    Add special case for copy of zero to avoid a double copy.

    Differential Revision: https://reviews.llvm.org/D36104

The author's intention is to remove a BB that has one mov instruction. In
order to do that, d8f571050 pessimizes MachineSinking by introducing a
copy, such that the mov instruction is NOT moved to the BB. Optimization
downstream gets rid of the BB with only the mov instruction. This works well
if we have only one fall-through branch, as there is only one "extra"
mov instruction.

If we have multiple fall throughs, we will have a lot of redundant movs.
In such a case, it's better to have this BB which has one mov instruction.

This is causing degradation in jpeg, fft and other codebases. I believe
if we want to remove a BB with only one branch instruction, we should not
pessimize Machine Sinking at all, and find some other solution.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335251 91177308-0d34-0410-b5e6-96231b3b80d8

DAG combine "and|or (select c, -1, 0), x" -> "select c, x, 0|-1"

Allow folding of "and/or" binops with a non-constant operand if the
arguments of the select are 0/-1 values.

Normally this code with the "and" opcode does not reach the DAG combiner
because InstCombine has already simplified it. However, AMDGPU produces it
during lowering, and InstCombine has no chance to optimize it out.

In turn the same pattern with "or" opcode can reach DAG.
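
The underlying identities can be checked by brute force. The Python sketch
below is illustrative only and is not the DAG combiner code; it uses
Python's two's-complement integer semantics for & and |, and covers both
orientations of the 0/-1 select arms.

```python
def select(c, t, f):
    """Scalar model of select: t when c is true, otherwise f."""
    return t if c else f


def check_folds(values):
    """Return True if every and/or-of-select identity holds for all inputs."""
    for c in (False, True):
        for x in values:
            ok = (
                (select(c, -1, 0) & x) == select(c, x, 0)
                and (select(c, 0, -1) & x) == select(c, 0, x)
                and (select(c, -1, 0) | x) == select(c, -1, x)
                and (select(c, 0, -1) | x) == select(c, x, -1)
            )
            if not ok:
                return False
    return True


folds_hold = check_folds(range(-8, 8))
```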

Differential Revision: https://reviews.llvm.org/D48301

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335250 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Enable useAA() for the in-order Cortex-R52

This option allows codegen (such as DAGCombine or MI scheduling) to use alias
analysis information, which can help with the codegen on in-order CPUs,
especially machine scheduling. Here I have done things the same way as AArch64,
adding a subtarget feature to enable this for specific cores, and enabled it for
the R52 where we have a schedule to make use of it.

Differential Revision: https://reviews.llvm.org/D48074

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335249 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] make div/rem vector constant utility function; NFCI

This was originally in D48401 and will be used there.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335242 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][ARM] ldrd/strd negative tests

Add negative tests for loads and stores with alignment 2.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335241 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-exegesis][NFC] Simplify BenchmarkRunner.

Get rid of createExecutableFunction().

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335240 91177308-0d34-0410-b5e6-96231b3b80d8

[RISCV] Tail calls don't need to save return address

Summary:
When expanding the PseudoTail in expandFunctionCall() we were using X6
to save the return address. Since this is a tail call, the return
address is not needed, so this patch writes it to X0 instead, where it is
discarded.

This matches the behaviour listed in the ISA V2.2 document page 110.
tail offset -----> jalr x0, x6, offset

GCC exhibits the same behavior.
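
A small Python model of jalr, for illustration only. It is a sketch of the
ISA semantics as I understand them, not backend code, and the register
values are made up. With rd = x0 the link value is discarded, so the
caller's return address in x1 (ra) and the target address in x6 both
survive the jump.

```python
def jalr(regs, pc, rd, rs1, offset):
    """Model jalr: link pc + 4 into rd, jump to (rs1 + offset) with bit 0 cleared."""
    target = (regs[rs1] + offset) & ~1  # read rs1 before rd is written
    if rd != 0:  # x0 is hardwired to zero, so a write to it is dropped
        regs[rd] = pc + 4
    return target


regs = [0] * 32
regs[1] = 0x400   # ra: where the caller of the current function resumes
regs[6] = 0x2000  # x6: base address of the tail-call target

new_regs = list(regs)
new_target = jalr(new_regs, pc=0x1004, rd=0, rs1=6, offset=0x10)

old_regs = list(regs)
old_target = jalr(old_regs, pc=0x1004, rd=6, rs1=6, offset=0x10)
```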

Reviewers: apazos, asb, mgrang

Reviewed By: asb

Subscribers: rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01

Differential Revision: https://reviews.llvm.org/D48343

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335239 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] Lower some trunc + shuffle patterns to vpmov[q|d][b|w]

This should help in lowering the following four intrinsics:
_mm256_cvtepi32_epi8
_mm256_cvtepi64_epi16
_mm256_cvtepi64_epi8
_mm512_cvtepi64_epi8

Differential Revision: https://reviews.llvm.org/D46957

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335238 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-exegesis][NFC] Simplify LLVMState.

Summary: Pretty much everything we need is in llvm::TargetMachine.

Reviewers: gchatelet

Subscribers: llvm-commits, tschuett

Differential Revision: https://reviews.llvm.org/D48428

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335237 91177308-0d34-0410-b5e6-96231b3b80d8

[CodeGen] Avoid handling DBG_VALUE in LiveRegUnits::stepBackward

Patch by Jesper Antonsson.

Differential Revision: https://reviews.llvm.org/D48420

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335233 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Remove redundant MIMG instruction variants

Summary:
For sample and gather ops, we can accurately determine the set of
vaddr-size instruction variants that are required. This reduces
the size of instruction tables by ~5%.

The number of machine instruction opcodes is reduced from 10002
to 9476.

Change-Id: Ie7fc65d3657b762c7816017fe70b2e9bec644a8a

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D48168

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335232 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Remove old-style image intrinsics

Summary:
This also removes the need for atomic pseudo instructions, since
we select the correct encoding directly in SITargetLowering::lowerImage
for dimension-aware image intrinsics.

Mesa uses dimension-aware image intrinsics since
commit a9a7993441.

Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a

Reviewers: arsenm, rampitec, mareko, tpr, b-sumner

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48167

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335231 91177308-0d34-0410-b5e6-96231b3b80d8

InstCombine/AMDGPU: Add dimension-aware image intrinsics to SimplifyDemanded

Summary:
Use the expanded features of the TableGen generic tables to avoid manually
adding the combinatorially exploded set of intrinsics. The
getAMDGPUImageDimIntrinsic lookup function is early-out,
i.e. non-AMDGPU intrinsics will never look at the underlying table.

Use a generic approach for getting the new intrinsic overload to keep the
code simple, and make the image dmask handling more generic:
- handle non-sampler image loads
- handle the case where the set of demanded elements is not a prefix
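
The non-prefix case can be sketched as follows. This is a hypothetical Python
helper, not the InstCombine code, and it assumes that the i-th returned
component of an image load corresponds to the i-th set bit of a 4-bit dmask.
Under that assumption, the new dmask keeps exactly the bits whose components
are demanded, whether or not they form a prefix.

```python
def shrink_dmask(dmask, demanded):
    """Keep only the dmask bits whose returned component is demanded.

    demanded is a bitmask over the returned components, where component i
    is assumed to come from the i-th set bit of dmask (counting from bit 0).
    """
    new_dmask = 0
    component = 0
    for bit in range(4):
        if dmask & (1 << bit):
            if demanded & (1 << component):
                new_dmask |= 1 << bit
            component += 1
    return new_dmask


prefix = shrink_dmask(0b1111, 0b0011)      # first two components demanded
non_prefix = shrink_dmask(0b1111, 0b0101)  # components 0 and 2 demanded
sparse = shrink_dmask(0b1011, 0b110)       # components 1 and 2 of a 3-wide load
```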

There is some overlap between this code and an optimization that happens
in the backend during code generation. They currently complement each other:

- only the codegen optimization can generate vec3 loads
- only the InstCombine optimization can handle D16

The InstCombine optimization also likely covers more cases since the
codegen optimization is fairly ad-hoc. Ideally, we'll remove the optimization
in codegen once the infrastructure for vec3 is in place (which will probably
take a long time).

Modify the test cases to use dimension-aware intrinsics. This makes it
easier to see that the test coverage for the new intrinsics is equivalent,
and the old style intrinsics will be removed in a follow-up commit anyway.

Change-Id: I4b91ea661413d13004956fe4ef7d13d41b8ce3ad

Reviewers: arsenm, rampitec, majnemer

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48165

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335230 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Convert test cases to the dimension-aware intrinsics

Summary:
Also explicitly port over some tests in llvm.amdgcn.image.* that were
missing. Some tests are removed because they no longer apply (i.e.
explicitly testing building an address vector via insertelement).

This is in preparation for the eventual removal of the old-style
intrinsics.

Some additional notes:
- constant-address-space-32bit.ll: change some GCN-NEXT to GCN because
  the instruction schedule was subtly altered
- insert_vector_elt.ll: the old test didn't actually test anything,
  because %tmp1 was not used; remove the load, because it doesn't work
  (Because of the amdgpu_ps calling convention? In any case, it's
  orthogonal to what the test claims to be testing.)

Change-Id: Idfa99b6512ad139e755e82b8b89548ab08f0afcf

Reviewers: arsenm, rampitec

Subscribers: MatzeB, qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D48018

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335229 91177308-0d34-0410-b5e6-96231b3b80d8