git.osdn.net Git - android-x86/external-llvm.git/log

[llvm-mca] LSUnit: use a SmallSet to model load/store queues. NFCI

Also, try to minimize the number of queries to the memory queues to speed up the
analysis.

On average, this change gives a small 2% speedup. For memcpy-like kernels, the
speedup is up to 5.5%.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347469 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-mca] Use a SmallVector instead of std::vector to track register reads/writes. NFCI

This avoids a heap allocation most of the time.
This patch gives a small but consistent 3% speedup on a release build (up to ~5%
on a debug build).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347464 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-mca] Fix an invalid memory read introduced by r346487.

This patch fixes an invalid memory read introduced by r346487.
Before this patch, partial register writes had to query the latency of the
full register write they depend on by calling a method on the full write descriptor.
However, if the full write is from an already retired instruction, chances are
that the EntryStage has already reclaimed its memory.
In some partial register write tests, valgrind was reporting an invalid
memory read.

This change fixes the invalid memory access problem. Writes are now responsible
for tracking dependent partial register writes, and for notifying them when the
instruction is issued.
This means that partial register writes no longer need to query their associated
full write to check when they are ready to execute.
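A minimal sketch of the notification scheme, using hypothetical `FullWrite`/`PartialWrite` classes rather than llvm-mca's actual classes: the full write owns the list of its dependents and pushes readiness to them, so no partial write ever dereferences a full write whose memory may have been reclaimed. Real readiness in llvm-mca also depends on latency, which this sketch ignores.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: a partial write only records whether it has been told
// that the full write it depends on was issued. It keeps no pointer back to
// the full write, so it can never read freed memory.
class PartialWrite {
public:
  bool isReady() const { return Ready; }
  // Called by the full write when its instruction is issued.
  void onDependentFullWriteIssued() { Ready = true; }

private:
  bool Ready = false;
};

// The full write owns the dependency list and notifies its dependents.
class FullWrite {
public:
  void addDependentPartialWrite(PartialWrite *PW) {
    assert(PW && "expected a valid partial write");
    Dependents.push_back(PW);
  }

  // Notify every dependent partial write, then drop the references.
  void onInstructionIssued() {
    for (PartialWrite *PW : Dependents)
      PW->onDependentFullWriteIssued();
    Dependents.clear();
  }

private:
  std::vector<PartialWrite *> Dependents;
};
```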

Added test X86/BtVer2/partial-reg-update-7.s

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347459 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC] Assert that all blocks staying in loop are live

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347458 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC] Ensure deterministic order of dead exit blocks

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347457 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Fix SelectionDAG infinite loop for v1i64 SCALAR_TO_VECTOR

A consequence of r347274 is that SCALAR_TO_VECTOR can be converted into
BUILD_VECTOR by SimplifyDemandedBits, but LowerBUILD_VECTOR can turn
BUILD_VECTOR into SCALAR_TO_VECTOR so we get an infinite loop.

Fix this by making LowerBUILD_VECTOR not do this transformation for those
vectors that would get transformed back, i.e. BUILD_VECTOR of a single-element
constant vector. Doing that means we get a DUP, which we then need to recognise
in ISel as a copy.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347456 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC] Simplify code by using standard exit blocks collection

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347454 91177308-0d34-0410-b5e6-96231b3b80d8

[TI removal] Leverage the fact that TerminatorInst is gone to create
a normal base class that provides all common "call" functionality.

This merges two complex CRTP mixins for the common "call" logic and
common operand bundle logic into a single, normal base class of
`CallInst` and `InvokeInst`. Going forward, users can typically
`dyn_cast<CallBase>` and use the resulting API. No more need for the
`CallSite` wrapper. I'm planning to migrate current usage of the wrapper
to directly use the base class and then it can be removed, but those are
simpler and much more incremental steps. The big change is to introduce
this abstraction into the type system.

I've tried to do some basic simplifications of the APIs that I couldn't
really help but touch as part of this:
- I've tried to organize the attribute API and bundle API into groups to
  make understanding the API of `CallBase` easier. Without this,
  I wasn't able to navigate the API sanely for all of the ways I needed
  to modify it.
- I've added what seem like more clear and consistent APIs for getting
  at the called operand. These ended up being especially useful to
  consolidate the *numerous* duplicated code paths trying to do this.
- I've largely reworked the organization and implementation of the APIs
  for computing the argument operands as they needed to change to work
  with the new subclass approach.

To minimize any cost associated with this abstraction, I've moved the
operand layout in memory to store the called operand last. This makes
its position relative to the end of the operand array the same,
regardless of the subclass. It should make it much cheaper to reference
from the `CallBase` abstraction, and this is likely one of the most
frequent things to query.

We do still pay one abstraction penalty here: we have to branch to
determine whether there are 0 or 2 extra operands when computing the end
of the argument operand sequence. However, that seems both rare and
should optimize well. I've implemented this in a way specifically
designed to allow it to optimize fairly well. If this shows up in
profiles, we can add overrides of the relevant methods to the subclasses
that bypass this penalty. It seems very unlikely that this will be an
issue as the code was *already* dealing with an ever present abstraction
of whether or not there are operand bundles, so this isn't the first
branch to go into the computation.
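The index arithmetic can be sketched as follows. The struct and its names are hypothetical, and the operand order (arguments, then bundle operands, then the 0 or 2 subclass-specific operands, then the called operand) is an assumption consistent with the description above. It uses signed `int` so that a miscount shows up as a negative value rather than an unsigned wrap.

```cpp
#include <cassert>

// Hypothetical sketch, not LLVM's API. Assumed operand layout:
//   [ args... | bundle operands... | subclass extras... | called operand ]
// where "subclass extras" is 0 operands for a call and 2 (normal and unwind
// destinations) for an invoke.
enum class CallKind { Call, Invoke };

struct CallOperandLayout {
  CallKind Kind;
  int NumOperands;       // total operands, including the called operand
  int NumBundleOperands; // operands contributed by operand bundles

  // The one remaining branch: 0 or 2 extra operands.
  int numSubclassExtraOperands() const {
    return Kind == CallKind::Invoke ? 2 : 0;
  }

  // The called operand is always last, independent of the subclass.
  int calledOperandIndex() const { return NumOperands - 1; }

  // One past the last argument operand (arguments start at index 0).
  int argEndIndex() const {
    return NumOperands - 1 - numSubclassExtraOperands() - NumBundleOperands;
  }

  int numArgOperands() const {
    int N = argEndIndex();
    assert(N >= 0 && "more non-argument operands than operands");
    return N;
  }
};
```

For a call `f(a, b)` with one bundle operand the layout is `a, b, bundle, f` (4 operands); the matching invoke adds the two destinations before `f` (6 operands). Both yield two argument operands, and `f` is always the last operand.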

I've tried to remove as much of the obvious vestigial API surface of the
old CRTP implementation as I could, but I suspect there is further
cleanup that should now be possible, especially around the operand
bundle APIs. I'm leaving all of that for future work in this patch as
enough things are changing here as-is.

One thing that made this harder for me to reason about and debug was the
pervasive use of unsigned values in subtraction and other arithmetic
computations. I had to debug more than one unintentional wrap. I've
switched a few of these to use `int` which seems substantially simpler,
but I've held back from doing this more broadly to avoid creating
confusing divergence within a single class's API.

I also worked to remove all of the magic numbers used to index into
operands, putting them behind named constants or putting them into
a single method with a comment and strictly using the method elsewhere.
This was necessary to be able to re-layout the operands as discussed
above.

Thanks to Ben for reviewing this (somewhat large and awkward) patch!

Differential Revision: https://reviews.llvm.org/D54788

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347452 91177308-0d34-0410-b5e6-96231b3b80d8

Revert r343473 "Move llvm util dependencies from clang-tools-extra to add_lit_target."

Summary:
It causes the test tools `FileCheck`, `count`, and `not` to be built unconditionally;
these dependencies should move back to clang-tools-extra.

Reviewers: mgorny

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54797

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347448 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM GlobalISel] Add test for BFC. NFCI

r334871 has made it possible for TableGen'erated code to select BFC, but
it has not added a test for it on the ARM side. Add it now to make sure
we don't introduce regressions if we ever change anything about that
rule.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347447 91177308-0d34-0410-b5e6-96231b3b80d8

[SystemZTTIImpl] Give correct cost values for vector bswap intrinsics.

Implement getIntrinsicInstrCost() and return costs reflecting that bswap can
be done with a vperm per vector register.
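To illustrate why one vperm per vector register is a plausible cost, here is a plain C++ model (not SystemZ backend code) of the byte permutation that byte-swaps every element of a 16-byte register, plus a hypothetical cost helper that counts one permute per 128-bit register.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec16 = std::array<uint8_t, 16>;

// Permute mask that reverses the bytes inside each element of EltBytes bytes.
Vec16 bswapPermuteMask(unsigned EltBytes) {
  assert((EltBytes == 2 || EltBytes == 4 || EltBytes == 8 || EltBytes == 16) &&
         "unsupported element size");
  Vec16 Mask{};
  for (unsigned I = 0; I < 16; ++I) {
    unsigned EltStart = (I / EltBytes) * EltBytes;
    Mask[I] = static_cast<uint8_t>(EltStart + (EltBytes - 1 - I % EltBytes));
  }
  return Mask;
}

// Result byte I is source byte Mask[I], as a byte permute with both inputs
// equal to Src would produce.
Vec16 applyPermute(const Vec16 &Src, const Vec16 &Mask) {
  Vec16 Out{};
  for (unsigned I = 0; I < 16; ++I)
    Out[I] = Src[Mask[I]];
  return Out;
}

// Hypothetical cost model: one permute per 128-bit register the vector uses.
unsigned bswapCostInVperms(unsigned NumElts, unsigned EltBits) {
  unsigned TotalBits = NumElts * EltBits;
  return (TotalBits + 127) / 128;
}
```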

Review: Ulrich Weigand
https://reviews.llvm.org/D54789

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347445 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-size] Use empty() and range-based for loop. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347441 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-mca] Add test case (NFC)

Add test case that will serve as the base for D54820.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347440 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] use FileCheck to verify output; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347438 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-mca] Add test case (NFC)

Fix previous commit r347434.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347437 91177308-0d34-0410-b5e6-96231b3b80d8

Add a ubsan blacklist entry for libstdc++ 8.0.1.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347436 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-mca] Add test case (NFC)

Add test case that will serve as the base for D54777.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347434 91177308-0d34-0410-b5e6-96231b3b80d8

Removing test/MC/Mips/reloc-directive-label-offset.s temporarily

This test is failing on llvm-clang-x86_64-expensive-checks-win builder.
Removing it until I get it fixed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347433 91177308-0d34-0410-b5e6-96231b3b80d8

[PM] correcting return value for new-pass-manager version of Scalarizer

Obvious mistake missed during D54695 review.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347432 91177308-0d34-0410-b5e6-96231b3b80d8

[mingw] Use unmangled name after the $ in the section name

GCC does it this way, and we have to be consistent. This includes
stdcall and fastcall functions with suffixes. I confirmed that a
fastcall function named "foo" ends up in ".text$foo", not
".text$@foo@8".

Based on a patch by Andrew Yohn!

Fixes PR39218.

Differential Revision: https://reviews.llvm.org/D54762

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347431 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC][NFC] Split PPCMCCodeEmitter into header and cpp file.

This is further cleanup for PPCMCCodeEmitter. The class had been contained
within the cpp file alone. Now it has been split up between a header file and
a cpp file which allows other classes to make use of the functions in this class
if required.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347428 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] refactor select-of-FP-constants transform

This transform needs to be limited.

We are converting to a constant pool load very early, and we
are turning loads that are independent of the select condition
(and therefore speculatable) into a dependent non-speculatable
load.

We may also be transferring a condition code from an FP register
to integer to create that dependent load.
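A plain C++ analogy of the two shapes (not DAGCombiner code; the constants and the table layout are made up for illustration):

```cpp
#include <cassert>

// Original shape: both constants are independent of Cond, so either one can
// be materialized (speculated) before the condition is known.
double selectOfConstants(bool Cond) {
  const double TrueVal = 2.5, FalseVal = -1.0;
  return Cond ? TrueVal : FalseVal;
}

// Transformed shape: both constants live in one small constant-pool table and
// Cond becomes the load index, so the load cannot start until the condition
// is available as an integer.
double constantPoolSelect(bool Cond) {
  // Assumed layout: index 0 = false value, index 1 = true value.
  static const double Pool[2] = {-1.0, 2.5};
  return Pool[Cond ? 1 : 0];
}
```

Both functions return the same values; the difference is only in which operations depend on `Cond`.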

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347424 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC][NFC] Minor Code Cleanup for PPCMCCodeEmitter.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347422 91177308-0d34-0410-b5e6-96231b3b80d8

[LLVM] Allow modulemap installation

Summary:
Currently we can't install the modulemaps provided by LLVM, since they are not structured to support headers generated as part of the build (e.g. `llvm/IR/Attributes.gen`).
This patch restructures the module maps in order to support installation.

Modules containing generated headers are defined in the new `module.extern.modulemap` file, and are referenced from the main `module.modulemap` using `extern module`. There are two versions of the `module.extern.modulemap` file: one used when building, and another, `module.install.modulemap`, which is renamed during installation.

Users can opt into module map installation using `-DLLVM_INSTALL_MODULEMAPS=ON`. The default value is `OFF` due to llvm.org/PR31905.

Reviewers: rsmith, mehdi_amini, bruno, EricWF

Reviewed By: EricWF

Subscribers: tschuett, chapuni, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D53510

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347420 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Add tests for funnel shift with zero operand; NFC

These are additional baseline tests for D54778.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347414 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] reduce code duplication; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347410 91177308-0d34-0410-b5e6-96231b3b80d8

[MergeFuncs] Generate alias instead of thunk if possible

The MergeFunctions pass was originally intended to emit aliases
instead of thunks where possible (unnamed_addr). However, for a
long time this functionality was behind a flag hardcoded to false,
bitrotted and was eventually removed in r309313.

The functionality was originally disabled in r108417 due to
lack of support for aliases in Mach-O. I believe that this is no
longer the case nowadays, but I am not really familiar with this area.

In the interest of being conservative, this patch reintroduces the
aliasing functionality behind a default disabled -mergefunc-use-aliases
flag.

Differential Revision: https://reviews.llvm.org/D53285

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347407 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] add tests for select-of-FP-constants; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347406 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] fix predicate for avoiding vblendv

It only makes sense to produce the logic ops when one of the
constants is +0.0. Otherwise, go with vblendv to reduce code size.
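The reasoning behind the predicate, as a scalar model of one lane (an illustration, not the X86 lowering code): a vector compare produces an all-ones or all-zeros lane mask, and +0.0 is the only double whose bit pattern is all zeros (-0.0 has the sign bit set), so selecting between a constant K and +0.0 collapses to a single AND. The mirrored case, with +0.0 as the true value, would use an and-not instead.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

uint64_t bitsOf(double D) {
  uint64_t B;
  std::memcpy(&B, &D, sizeof B);
  return B;
}

double fromBits(uint64_t B) {
  double D;
  std::memcpy(&D, &B, sizeof D);
  return D;
}

// select(Cond, K, +0.0) computed as (Mask & bits(K)), where Mask models the
// all-ones / all-zeros lane produced by a vector compare.
double selectWithZeroViaAnd(bool Cond, double K) {
  uint64_t Mask = Cond ? ~uint64_t{0} : uint64_t{0};
  return fromBits(Mask & bitsOf(K));
}
```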

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347403 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] add test for FP select with constant; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347401 91177308-0d34-0410-b5e6-96231b3b80d8

[mips][mc] Add basic support for R_MIPS_JALR/R_MICROMIPS_JALR

R_MIPS_JALR/R_MICROMIPS_JALR can now be parsed in .s files and emitted to .o.
They are still not generated with JALR.

Differential revision: https://reviews.llvm.org/D54721

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347398 91177308-0d34-0410-b5e6-96231b3b80d8

[MC] Support labels as offsets in .reloc directive

Currently, expressions like

    .reloc 1f, R_MIPS_JALR, foo
    1: nop

are not allowed, i.e. an offset in .reloc can only be an absolute value.
This patch adds support for labels as offsets.
If the offset is a forward-declared label, MCObjectStreamer keeps the fixup locally
and adds it to the fixups vector after the label (and its offset) is defined.
Offsets of the form label+number are not supported yet.
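A minimal sketch of the deferred-fixup bookkeeping, using a hypothetical `RelocTracker` class rather than MCObjectStreamer's real members. In the example above, `1f` refers forward to the local label `1`; the sketch models that simply as a label named "1". Offsets of the form label+number are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct Fixup {
  uint64_t Offset;
  std::string RelocName;
  std::string Target;
};

class RelocTracker {
public:
  // A .reloc whose label is already defined becomes a fixup immediately;
  // otherwise it is parked until the label is defined.
  void emitReloc(const std::string &Label, const std::string &RelocName,
                 const std::string &Target) {
    auto It = Labels.find(Label);
    if (It != Labels.end())
      Fixups.push_back({It->second, RelocName, Target});
    else
      Pending[Label].push_back({0, RelocName, Target});
  }

  // Defining a label fixes its offset and releases any pending fixups.
  void defineLabel(const std::string &Label, uint64_t Offset) {
    assert(!Labels.count(Label) && "label defined twice");
    Labels[Label] = Offset;
    auto It = Pending.find(Label);
    if (It == Pending.end())
      return;
    for (Fixup &F : It->second) {
      F.Offset = Offset;
      Fixups.push_back(F);
    }
    Pending.erase(It);
  }

  const std::vector<Fixup> &fixups() const { return Fixups; }
  bool hasPending() const { return !Pending.empty(); }

private:
  std::map<std::string, uint64_t> Labels;
  std::map<std::string, std::vector<Fixup>> Pending;
  std::vector<Fixup> Fixups;
};
```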

Differential revision: https://reviews.llvm.org/D53990

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347397 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] add checks for asm to test; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347394 91177308-0d34-0410-b5e6-96231b3b80d8

[TargetLowering] SimplifyDemandedBits - only reduce known bits for integer constants

Avoids fuzzing crash found by Mikael Holmén.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347393 91177308-0d34-0410-b5e6-96231b3b80d8

[PM] Port Scalarizer to the new pass manager.

Patch by: markus (Markus Lavin)

Reviewers: chandlerc, fedor.sergeev

Reviewed By: fedor.sergeev

Subscribers: llvm-commits, Ka-Ka, bjope

Differential Revision: https://reviews.llvm.org/D54695

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347392 91177308-0d34-0410-b5e6-96231b3b80d8

[nios2] Add missing Nios2CodeGen -> Nios2AsmPrinter linkage

Add missing linkage from the Nios2CodeGen library to the Nios2AsmPrinter
library. The missing dependency causes the shared-library build to fail with
the following error:

  lib/Target/Nios2/CMakeFiles/LLVMNios2CodeGen.dir/Nios2AsmPrinter.cpp.o: In function `(anonymous namespace)::Nios2AsmPrinter::PrintAsmMemoryOperand(llvm::MachineInstr const*, unsigned int, unsigned int, char const*, llvm::raw_ostream&)':
  Nios2AsmPrinter.cpp:(.text._ZN12_GLOBAL__N_115Nios2AsmPrinter21PrintAsmMemoryOperandEPKN4llvm12MachineInstrEjjPKcRNS1_11raw_ostreamE+0x2b): undefined reference to `llvm::Nios2InstPrinter::getRegisterName(unsigned int)'
  lib/Target/Nios2/CMakeFiles/LLVMNios2CodeGen.dir/Nios2AsmPrinter.cpp.o: In function `(anonymous namespace)::Nios2AsmPrinter::PrintAsmOperand(llvm::MachineInstr const*, unsigned int, unsigned int, char const*, llvm::raw_ostream&)':
  Nios2AsmPrinter.cpp:(.text._ZN12_GLOBAL__N_115Nios2AsmPrinter15PrintAsmOperandEPKN4llvm12MachineInstrEjjPKcRNS1_11raw_ostreamE+0x97): undefined reference to `llvm::Nios2InstPrinter::getRegisterName(unsigned int)'
  collect2: error: ld returned 1 exit status

Differential Revision: https://reviews.llvm.org/D47810

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347387 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AVX] Remove BROADCAST if we only need the 0'th element

We don't catch this with target shuffle simplification if the src/dst types are different.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347386 91177308-0d34-0410-b5e6-96231b3b80d8

Test commit: Delete trailing space in comment

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347385 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC] More complex tests for LoopSimplifyCFG

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347384 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC] Add some sophisticated tests on LoopSimplifyCFG

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347381 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] In getScalarMaskingNode, replace scalar_to_vector with a bitcast to v8i1 and an extract_subvector to convert i8 to v1i1.

The bitcast can be nicely merged with any i8 loads that exist for argument passing in 32-bit mode, for example.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347380 91177308-0d34-0410-b5e6-96231b3b80d8

[LVI] run transfer function for binary operator even when the RHS isn't a constant

LVI was symbolically executing binary operators only when the RHS was
constant, missing the case where we have a ConstantRange for the RHS,
but not an actual constant. Tested using check-all and by
bootstrapping. Compile time is not impacted measurably.
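A simplified sketch of why a range-valued RHS still carries information. It deliberately does not reproduce LLVM's `ConstantRange` (which is half-open and models wrapping): ranges here are closed, non-wrapping `int32_t` intervals, and an addition that could overflow gives up and returns no result.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical closed interval [Lo, Hi] of int32 values.
struct Range {
  int32_t Lo, Hi; // inclusive bounds, Lo <= Hi
};

// Transfer function for add: the result of X + Y lies in
// [A.Lo + B.Lo, A.Hi + B.Hi], provided nothing can overflow.
std::optional<Range> addRanges(Range A, Range B) {
  assert(A.Lo <= A.Hi && B.Lo <= B.Hi);
  int64_t Lo = int64_t{A.Lo} + B.Lo;
  int64_t Hi = int64_t{A.Hi} + B.Hi;
  if (Lo < INT32_MIN || Hi > INT32_MAX)
    return std::nullopt; // could wrap: no useful range in this sketch
  return Range{static_cast<int32_t>(Lo), static_cast<int32_t>(Hi)};
}
```

With only a constant RHS, `X + Y` for `X` in [0, 10] and `Y` in [5, 7] would be left unknown; with the range transfer function it is known to lie in [5, 17].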

Differential Revision: https://reviews.llvm.org/D19859

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347379 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] Do not use vectors to codegen bswap with Altivec turned off

We have efficient codegen on P9 for lowering bswap that involves moving
the value into a vector reg and moving it back. However, the check under
which we custom lowered it did not adequately reflect the actual requirements.
It required only that the subtarget be an implementation of ISA 3.0 since all
compliant implementations have to provide the vector instructions.
However, kernel builds have a valid use case for -mno-altivec -mcpu=pwr9
(i.e. don't emit vector code, don't have to save vector regs for context
switch). So we should require the correct features for this lowering.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39334

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347376 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Correct 256 vpmovzx/vpmovsx isel patterns to check HasAVX2 instead of HasAVX to prevent fast-isel from using them incorrectly.

These are AVX2 instructions, but have been incorrectly marked in tablegen for a while. This wasn't a problem until r346784 switched the patterns to use target independent ISD opcodes. This made the patterns visible to fast isel.

Fixes PR39733

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347375 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add a copy of avx512-trunc.ll with -x86-experimental-vector-widening-legalization enabled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347374 91177308-0d34-0410-b5e6-96231b3b80d8

[docs] Add C++ Performance Benchmark to test-suite proposals.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347369 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Emit a PACKUS instead of a VECTOR_SHUFFLE from LowerTRUNCATE for v16i16->v16i8.

We can't guarantee that demanded bits passing through the vector shuffle won't cause the AND in front of this to be removed. This would prevent the PACKUS from being matched during shuffle lowering.
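A scalar model of one PACKUSWB lane shows why the AND matters. PACKUS saturates signed 16-bit values into unsigned bytes, and only after masking with 0x00FF is that saturation guaranteed to equal truncation. This is an illustration, not the X86ISelLowering code.

```cpp
#include <cassert>
#include <cstdint>

// One lane of PACKUSWB: signed 16-bit value -> unsigned byte, saturating.
uint8_t packusLane(int16_t V) {
  if (V < 0)
    return 0;
  if (V > 255)
    return 255;
  return static_cast<uint8_t>(V);
}

// Per-lane v16i16 -> v16i8 truncate expressed as AND 0x00FF + PACKUS.
// After the AND every lane is in [0, 255], so saturation never fires and the
// result equals plain truncation. Without the AND, this would not hold.
uint8_t truncateViaAndPackus(uint16_t V) {
  int16_t Masked = static_cast<int16_t>(V & 0x00FF);
  return packusLane(Masked);
}
```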

Unfortunately, this adds a packuswb to one of the vector-reduce-mul.ll tests since we were removing the shuffle via SimplifyDemandedVectorElts. We appear to have similar issues with vpmovwb on the same test case on other targets.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347361 91177308-0d34-0410-b5e6-96231b3b80d8

Fix pointer options mask. It was off by 1 bit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347359 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] look through bitcasts when trying to narrow vector binops

This is another step in vector narrowing - a follow-up to D53784
(and hoping to eventually squash potential regressions seen in
D51553).

The x86 test diffs are wins, but the AArch64 diff is probably not.
That problem already exists independent of this patch (see PR39722), but it
went unnoticed in the previous patch because there were no regression tests
that showed the possibility.

The x86 diff in i64-mem-copy.ll is close. Given the frequency throttling
concerns with using wider vector ops, an extra extract to reduce vector
width is the right trade-off at this level of codegen.

Differential Revision: https://reviews.llvm.org/D54392

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347356 91177308-0d34-0410-b5e6-96231b3b80d8

[CodeView] Add support for ref-qualified member functions.

When you have a member function with a ref-qualifier, for example:

    struct Foo {
      void Func() &;
      void Func2() &&;
    };

clang-cl was not emitting this information. Doing so is a bit
awkward, because it's not a property of the LF_MFUNCTION type, which
is what you'd expect. Instead, it's a property of the this pointer
which is actually an LF_POINTER. This record has an attributes
bitmask on it, and our handling of this bitmask was all wrong. We
had some parts of the bitmask defined incorrectly, but importantly
for this bug, we didn't know about these extra 2 bits that represent
the ref qualifier at all.
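A sketch of decoding the attribute word, with bit positions taken from the `lfPointerAttr` layout in Microsoft's cvinfo.h as we read it; the helper names are hypothetical, not LLVM's CodeView API. The bits at positions 20 and 21 are the two ref-qualifier bits, and bit 10 is the const bit.

```cpp
#include <cassert>
#include <cstdint>

// Assumed LF_POINTER attribute layout (32 bits):
//   bits 0-4   pointer kind         bits 5-7   pointer mode
//   bit  8     flat 0:32            bit  9     volatile
//   bit  10    const                bit  11    unaligned
//   bit  12    restrict             bits 13-18 size in bytes
//   bit  19    MoCOM / WinRT ptr    bit  20    `this` of a &-qualified method
//   bit  21    `this` of a &&-qualified method
enum class RefQualifier { None, LValue, RValue };

constexpr uint32_t ConstBit = 1u << 10;
constexpr uint32_t LValueRefThisBit = 1u << 20;
constexpr uint32_t RValueRefThisBit = 1u << 21;

unsigned pointerSizeInBytes(uint32_t Attrs) { return (Attrs >> 13) & 0x3F; }

bool isConstPointer(uint32_t Attrs) { return (Attrs & ConstBit) != 0; }

RefQualifier thisRefQualifier(uint32_t Attrs) {
  bool L = Attrs & LValueRefThisBit;
  bool R = Attrs & RValueRefThisBit;
  assert(!(L && R) && "a method cannot be both & and && qualified");
  if (L)
    return RefQualifier::LValue;
  if (R)
    return RefQualifier::RValue;
  return RefQualifier::None;
}
```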

Differential Revision: https://reviews.llvm.org/D54667

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347354 91177308-0d34-0410-b5e6-96231b3b80d8

[CodeView] Mark this pointers as const.

This is for compatibility with MSVC, which also marks this pointers
as being const-qualified.

Fixes llvm.org/pr36526

Differential Revision: https://reviews.llvm.org/D54736

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347353 91177308-0d34-0410-b5e6-96231b3b80d8

[CodeView] RelocPtr points to little endian data.

Don't use a uint32_t*, use a ulittle32_t* to make this correct
on big endian systems.
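In LLVM, `support::ulittle32_t` (from `llvm/Support/Endian.h`) handles this. As a plain C++ stand-in, the helpers below read and write a little-endian 32-bit value byte by byte, so the result is the same on little- and big-endian hosts; dereferencing a plain `uint32_t*` would use the host byte order instead.

```cpp
#include <cassert>
#include <cstdint>

// Read 4 bytes stored least-significant byte first.
uint32_t readLE32(const uint8_t *P) {
  assert(P && "null input");
  return uint32_t{P[0]} | (uint32_t{P[1]} << 8) | (uint32_t{P[2]} << 16) |
         (uint32_t{P[3]} << 24);
}

// Write V as 4 bytes, least-significant byte first.
void writeLE32(uint8_t *P, uint32_t V) {
  assert(P && "null output");
  for (int I = 0; I < 4; ++I)
    P[I] = static_cast<uint8_t>(V >> (8 * I));
}
```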

Patch by James Clarke
Differential Revision: https://reviews.llvm.org/D54421

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347349 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Emit a single shuffle for the v16i8->v4i32 step of a SIGN_EXTEND_VECTOR_INREG lowering on pre-sse4.1 targets.

Previously we emitted two separate shuffles, one for unpcklbw and one for unpcklwd. Instead, emit a single shuffle equivalent to both of the original shuffles. Shuffle lowering seems able to handle it. This avoids a bitcast between the two shuffles, which seems helpful to DAG combine.

Remove the custom type legalization for v8i8->v8i32. I had put that in to avoid some almost duplicate punpcklbw instructions I was seeing, but this lowering change seems to fix that. It also fixes some duplicate shuffles seen in vector-sext.ll

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347348 91177308-0d34-0410-b5e6-96231b3b80d8

[unittests] Fix ExpandTilde test to match handling home dirs with trailing slash

The `expandTildeExpr` routine just replaces a tilde by a home dir path.
If the home dir has a trailing slash, the result of substitution will
contain double slashes. For example, `HOME=/foo/ ~/bar` gives `/foo//bar`.
That corresponds to (at least) Bash behaviour because the following
command `$HOME=/foo/ echo ~/bar` prints `/foo//bar`.

The `ExpandTilde` test constructs the expected result of the `fs::expand_tilde`
call with `path::append`, so the expected path contains a single slash. This
patch fixes that and allows the unittest to pass on hosts where `HOME` is
`/`.
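A minimal sketch of the purely textual substitution described above (a hypothetical helper, not the implementation of `fs::expand_tilde`), showing where the double slash comes from:

```cpp
#include <cassert>
#include <string>

// Replace a leading "~" verbatim with Home. A trailing slash in Home is kept,
// so "~/bar" with Home "/foo/" becomes "/foo//bar".
std::string expandTildeTextually(const std::string &Path,
                                 const std::string &Home) {
  if (Path.empty() || Path[0] != '~')
    return Path;
  assert((Path.size() == 1 || Path[1] == '/') &&
         "~user forms are not handled in this sketch");
  return Home + Path.substr(1);
}
```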

Differential Revision: http://reviews.llvm.org/D54752

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347346 91177308-0d34-0410-b5e6-96231b3b80d8

Silence C4709 in MSVC because it is buggy.

The diagnostic will trigger on code that does not have any comma operator, but instead default-constructs an object with an explicitly defaulted constructor as the array index argument.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347345 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] add tests for 8-bit multiply with constant; NFC

This is based on the existing file for 16-bit. We also already have 32-bit and 64-bit variants.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347341 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] WebAssemblyLowerEmscriptenEHSjLj: use getter/setter for accessing tempRet0

Rather than assuming that `tempRet0` exists in linear memory, only assume
that the getter/setter functions exist. This avoids conflicting with
binaryen, which declares a wasm global for this purpose and defines its
own getter and setter for it.

The other advantage of doing things this way is that it leaves it up to
the linker/finalizer to decide how to actually store this
temporary. As it happens, binaryen uses a wasm global, which is more
appropriate since it is thread safe.

This also allows us to change the way this is stored in the future
(memory, TLS memory, wasm global) without modifying LLVM.

This is part of a 4 part change:
LLVM: https://reviews.llvm.org/D53240
fastcomp: https://github.com/kripken/emscripten-fastcomp/pull/237
emscripten: https://github.com/kripken/emscripten/pull/7358
binaryen: https://github.com/WebAssembly/binaryen/pull/1709

Differential Revision: https://reviews.llvm.org/D53240

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347340 91177308-0d34-0410-b5e6-96231b3b80d8

[unittest] Skip W+X MappedMemoryTests when MPROTECT is enabled

Skip all MappedMemoryTest variants that rely on memory pages being
mapped for MF_WRITE|MF_EXEC when MPROTECT is enabled on NetBSD. W^X
protection causes all those mmap() calls to fail, causing the tests
to fail.

Differential Revision: https://reviews.llvm.org/D54080

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347337 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Remove -verify-machineinstrs=0 now that PR38391 is fixed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347335 91177308-0d34-0410-b5e6-96231b3b80d8

[Docs] Documentation for the saturation addition and subtraction intrinsics

Differential Revision: https://reviews.llvm.org/D54729

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347334 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] add tests for funnel shifts; NFC

These are included in D54666, so adding them first with baseline results.

Patch by: @nikic (Nikita Popov)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347333 91177308-0d34-0410-b5e6-96231b3b80d8

[InstSimplify] fold funnel shifts with undef operands

Splitting these off from D54666.
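For reference, a scalar sketch of funnel-shift semantics on 8-bit values, following the LangRef description of `llvm.fshl`/`llvm.fshr` (the shift amount is taken modulo the bit width). These helpers are illustrative and do not model undef operands.

```cpp
#include <cassert>
#include <cstdint>

// fshl: concatenate A (high) and B (low), shift left by C % 8, return the
// high byte.
uint8_t fshl8(uint8_t A, uint8_t B, uint8_t C) {
  unsigned S = C % 8;
  if (S == 0)
    return A; // avoid an out-of-range shift by 8
  return static_cast<uint8_t>((A << S) | (B >> (8 - S)));
}

// fshr: concatenate A (high) and B (low), shift right by C % 8, return the
// low byte.
uint8_t fshr8(uint8_t A, uint8_t B, uint8_t C) {
  unsigned S = C % 8;
  if (S == 0)
    return B;
  return static_cast<uint8_t>((A << (8 - S)) | (B >> S));
}
```

With both data operands equal, these reduce to rotates, e.g. `fshl8(X, X, C)` rotates `X` left by `C % 8`.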

Patch by: nikic (Nikita Popov)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347332 91177308-0d34-0410-b5e6-96231b3b80d8

[InstSimplify] add tests for funnel shift with undef operands; NFC

These are part of D54666, so adding them here before the patch to
show the baseline (currently unoptimized) results.

Patch by: @nikic (Nikita Popov)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347331 91177308-0d34-0410-b5e6-96231b3b80d8

[InstructionSimplify] Add support for saturating add/sub

Add support for saturating add/sub in InstructionSimplify. In particular, the following simplifications are supported:

    sat(X + 0) -> X
    sat(X + undef) -> -1
    sat(X uadd MAX) -> MAX
    (and commutative variants)

    sat(X - 0) -> X
    sat(X - X) -> 0
    sat(X - undef) -> 0
    sat(undef - X) -> 0
    sat(0 usub X) -> 0
    sat(X usub MAX) -> 0
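The unsigned rules above can be spot-checked exhaustively on 8-bit values with a scalar reference; the helper names are hypothetical and undef operands are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Unsigned saturating add: compute in a wider type, clamp to 255.
uint8_t uaddSat8(uint8_t X, uint8_t Y) {
  unsigned Sum = unsigned{X} + Y;
  return Sum > std::numeric_limits<uint8_t>::max()
             ? std::numeric_limits<uint8_t>::max()
             : static_cast<uint8_t>(Sum);
}

// Unsigned saturating sub: clamp at 0.
uint8_t usubSat8(uint8_t X, uint8_t Y) {
  return X > Y ? static_cast<uint8_t>(X - Y) : 0;
}

// Checks the constant-operand folds for every 8-bit X.
bool checkUnsignedFolds() {
  const uint8_t Max = std::numeric_limits<uint8_t>::max();
  for (unsigned I = 0; I <= Max; ++I) {
    uint8_t X = static_cast<uint8_t>(I);
    if (uaddSat8(X, 0) != X || uaddSat8(0, X) != X) return false;       // X + 0
    if (uaddSat8(X, Max) != Max || uaddSat8(Max, X) != Max) return false; // uadd MAX
    if (usubSat8(X, 0) != X) return false;   // X - 0 -> X
    if (usubSat8(X, X) != 0) return false;   // X - X -> 0
    if (usubSat8(0, X) != 0) return false;   // 0 usub X -> 0
    if (usubSat8(X, Max) != 0) return false; // X usub MAX -> 0
  }
  return true;
}
```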

Patch by: @nikic (Nikita Popov)

Differential Revision: https://reviews.llvm.org/D54532

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347330 91177308-0d34-0410-b5e6-96231b3b80d8

[ConstantFolding] Add support for saturating add/sub

Support saturating add/sub in constant folding, based on the APInt methods introduced in D54332.

Patch by: @nikic (Nikita Popov)

Differential Revision: https://reviews.llvm.org/D54531

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347328 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Regenerate weird stores tests.

Makes an upcoming SimplifyDemandedBits optimization much easier to understand.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347326 91177308-0d34-0410-b5e6-96231b3b80d8

[LoopSink] Add preheader to alias set

This patch fixes PR39695.

The original LoopSink only considers memory aliasing in the loop body, but PR39695 shows that instructions following the sink candidate in the preheader should also be checked. This is a conservative patch: it simply adds the whole preheader block to the alias set. It may lose some optimization opportunities, but I think that is very rare because (1) in the most common case, a store/load to the same address, the load should already have been optimized away, and (2) the preheader is usually not very large.

Differential Revision: https://reviews.llvm.org/D54659

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347325 91177308-0d34-0410-b5e6-96231b3b80d8

[APInt] Add methods for saturated add and sub

This adds the sadd_sat, uadd_sat, ssub_sat, usub_sat methods for performing saturating additions and subtractions to APInt.
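A scalar sketch of the signed variants on 8-bit values, computing in a wider type and clamping (illustrative helpers, not the APInt implementation, which works on arbitrary bit widths):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Clamp an int into the int8_t range [-128, 127].
int8_t clampToInt8(int V) {
  const int Lo = std::numeric_limits<int8_t>::min();
  const int Hi = std::numeric_limits<int8_t>::max();
  return static_cast<int8_t>(V < Lo ? Lo : (V > Hi ? Hi : V));
}

// Signed saturating add/sub: the exact result fits in int, then clamp.
int8_t saddSat8(int8_t X, int8_t Y) { return clampToInt8(int{X} + Y); }

int8_t ssubSat8(int8_t X, int8_t Y) { return clampToInt8(int{X} - Y); }
```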

Split out from D54237.

Patch by: nikic (Nikita Popov)

Differential Revision: https://reviews.llvm.org/D54332

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347324 91177308-0d34-0410-b5e6-96231b3b80d8

[PatternMatch] Handle undef vectors consistently

This patch fixes the issue noticed in D54532.
The problem is that cst_pred_ty-based matchers like m_Zero() currently do not match
scalar undefs (as expected), but *do* match vector undefs. This may lead to optimization
inconsistencies in rare cases.

There is only one existing test for which output changes, reverting the change from D53205.
The reason here is that vector fsub undef, %x is no longer matched as an m_FNeg(). While I
think that the new output is technically worse than the previous one, it is consistent with
scalar, and I don't think it's really important either way (generally that undef should have
been folded away prior to reassociation.)

I've also added another test case for this issue based on InstructionSimplify. It took some
effort to find that one, as in most cases undef folds are either checked first -- and in the
cases where they aren't it usually happens to not make a difference in the end. This is the
only case I was able to come up with. Prior to this patch the test case simplified to undef
in the scalar case, but zeroinitializer in the vector case.

Patch by: @nikic (Nikita Popov)

Differential Revision: https://reviews.llvm.org/D54631

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347318 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64, x86] add tests for shift-not (PR39657); NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347316 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombine] Add calls to SimplifyDemandedVectorElts from visitINSERT_SUBVECTOR (PR37989)

This uncovered an off-by-one typo in SimplifyDemandedVectorElts's INSERT_SUBVECTOR handling as its bounds check was bailing on safe indices.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347313 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] Add Itineraries for STWU/STWUX etc

When doing some instruction scheduling work, we noticed some missing itineraries.

Before we switched to the machine scheduler, those missing itineraries might not have affected actual scheduling,
because the default values still gave us the same latency.

With the machine scheduler, however, itineraries do affect scheduling.
e.g. NumMicroOps defaults to 0 if there are NO itineraries for a specific instruction class,
while most instruction classes with itineraries have NumMicroOps default to 1.

This affects the count of RetiredMOps and the Pending/Available queues,
which in turn can cause different or suboptimal scheduling.
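A deliberately simplified issue model shows why a default of 0 micro-ops can change the schedule. This is a toy sketch under stated assumptions (in-order issue, a fixed per-cycle micro-op budget, a hypothetical function name), not the MachineScheduler implementation:

```cpp
#include <cassert>
#include <vector>

// Toy model (not the MachineScheduler implementation): instructions issue in
// order, and each cycle accepts micro-ops until IssueWidth would be exceeded.
// Returns the number of cycles needed to issue every instruction.
static unsigned cyclesToIssue(const std::vector<unsigned> &MicroOps,
                              unsigned IssueWidth) {
  assert(IssueWidth > 0 && "issue width must be positive");
  if (MicroOps.empty())
    return 0;
  unsigned Cycles = 1, UsedThisCycle = 0;
  for (unsigned UOps : MicroOps) {
    // An instruction with 0 micro-ops never consumes an issue slot, so it
    // always fits in the current cycle.
    if (UsedThisCycle + UOps > IssueWidth) {
      ++Cycles;
      UsedThisCycle = 0;
    }
    UsedThisCycle += UOps;
  }
  return Cycles;
}
```

With an issue width of 2, four 1-micro-op stores need two cycles in this model, while the same four instructions modeled with 0 micro-ops all fit in one, which is the kind of distortion a missing itinerary can introduce.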

This patch is for STWU/STWUX (IIC_LdStStoreUpd) for P8.

Since there are already multiple IICs for store update, this patch also merges:
IIC_LdStSTDU/IIC_LdStStoreUpd to IIC_LdStSTU
IIC_LdStSTDUX to IIC_LdStSTUX

We added a new testcase in https://reviews.llvm.org/D54699 to show the difference.

Differential Revision: https://reviews.llvm.org/D54700

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347311 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC][NFC] Add testcase for STWU scheduling check

This patch adds an STWU testcase for the scheduling check.

Currently P7/P8, which use itineraries, are missing IIC_LdStStoreUpd.
We use the CHECK-ITIN prefix to check P7/P8, and the default prefix for P9 (and future processors).

We will fix the missing itineraries of IIC_LdStStoreUpd in a follow-up patch,
and update this testcase there to show the scheduling difference.

Differential Revision: https://reviews.llvm.org/D54699

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347310 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-exegesis][NFC] Some code style cleanup

Apply review comments of https://reviews.llvm.org/D54185 to other targets as well, specifically:

1. Make anonymous namespaces as small as possible; avoid using static inside anonymous namespaces
2. Add missing headers to some files
3. Rename GetLoadImmediateOpcode -> getLoadImmediateOpcode
4. Fix a typo
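Item 1 follows the LLVM Coding Standards. A minimal sketch of the intended layout, using hypothetical names (`ExampleTarget`, `getLastRegister`) rather than actual llvm-exegesis code:

```cpp
#include <cassert>

namespace {
// Keep the anonymous namespace as small as possible: only the class
// declaration that needs internal linkage lives inside it.
class ExampleTarget {
public:
  unsigned getNumRegisters() const { return 16; }
};
} // end anonymous namespace

// File-local free functions get internal linkage via 'static' and sit
// outside the anonymous namespace, rather than using 'static' inside it.
static unsigned getLastRegister(const ExampleTarget &T) {
  assert(T.getNumRegisters() > 0 && "target has no registers");
  return T.getNumRegisters() - 1;
}
```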

Differential Revision: https://reviews.llvm.org/D54343

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347309 91177308-0d34-0410-b5e6-96231b3b80d8

Fix MSVC 'truncation of constant value' warning. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347308 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] Add computeKnownBits/ComputeNumSignBits support for PACKSS/PACKUS instructions.

Pull the demanded elts remapping out of computeKnownBitsForTargetNode into a getPackDemandedElts helper, and use it in computeKnownBits/ComputeNumSignBits.
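PACKSS/PACKUS operate per 128-bit lane: within each lane, the low half of the result comes from the first operand and the high half from the second. A standalone sketch of the remapping that a helper like getPackDemandedElts performs, assuming plain `uint64_t` masks in place of `APInt` and a hypothetical function name:

```cpp
#include <cassert>
#include <cstdint>

// Sketch (not the LLVM helper itself): map demanded elements of a
// PACKSS/PACKUS result back to demanded elements of its two operands.
// Bit I of a mask means "element I is demanded".
static void getPackDemandedEltsSketch(unsigned NumDstElts, unsigned NumLanes,
                                      uint64_t DemandedDst,
                                      uint64_t &DemandedLHS,
                                      uint64_t &DemandedRHS) {
  assert(NumDstElts <= 64 && NumLanes > 0 &&
         NumDstElts % (2 * NumLanes) == 0 && "unexpected pack shape");
  unsigned NumDstEltsPerLane = NumDstElts / NumLanes;
  unsigned NumSrcEltsPerLane = NumDstEltsPerLane / 2;
  DemandedLHS = DemandedRHS = 0;
  for (unsigned Lane = 0; Lane != NumLanes; ++Lane) {
    for (unsigned Elt = 0; Elt != NumSrcEltsPerLane; ++Elt) {
      // Index of this element within each (half-as-long) source operand.
      unsigned SrcIdx = Lane * NumSrcEltsPerLane + Elt;
      // Low half of the result lane comes from LHS, high half from RHS.
      unsigned LHSDstIdx = Lane * NumDstEltsPerLane + Elt;
      unsigned RHSDstIdx = LHSDstIdx + NumSrcEltsPerLane;
      if (DemandedDst & (uint64_t(1) << LHSDstIdx))
        DemandedLHS |= uint64_t(1) << SrcIdx;
      if (DemandedDst & (uint64_t(1) << RHSDstIdx))
        DemandedRHS |= uint64_t(1) << SrcIdx;
    }
  }
}
```

For a 256-bit `vpackuswb` (32 result elements, two lanes), demanding result element 16 maps to element 8 of the first operand, not element 16, because of the per-lane layout.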

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347303 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] XFormVExtractWithShuffleIntoLoad - getVectorShuffle won't accept SM_SentinelZero

Noticed while working on improving demanded elts target shuffle combining.
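x86 target shuffle masks use negative sentinels (in LLVM, `SM_SentinelUndef` is -1 and `SM_SentinelZero` is -2), while the generic `getVectorShuffle` only understands -1 as undef. A guard of roughly the following shape rejects masks that still contain a zero sentinel; this standalone version, with hypothetical names, is a sketch under that assumption rather than the patched function:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sentinel values mirroring the ones used by x86 target shuffle decoding.
enum : int { SentinelUndef = -1, SentinelZero = -2 };

// Returns true if Mask can be handed to a generic shuffle builder that only
// accepts element indices and -1 (undef). A zero sentinel would require an
// explicit zero-vector operand, so such masks are rejected.
static bool isGenericShuffleMask(const std::vector<int> &Mask) {
  return std::none_of(Mask.begin(), Mask.end(),
                      [](int M) { return M == SentinelZero; });
}
```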

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347302 91177308-0d34-0410-b5e6-96231b3b80d8

[TargetLowering] Improve SimplifyDemandedVectorElts/SimplifyDemandedBits support

For bitcast nodes from larger element types, add the ability for SimplifyDemandedVectorElts to call SimplifyDemandedBits by merging the elts mask to a bits mask.
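For example, for a bitcast from `<2 x i64>` to `<4 x i32>`, each i64 source element covers two i32 result elements, and since SimplifyDemandedBits applies a single bit mask to every vector element, the per-element demands have to be merged. A sketch of that merge, assuming little-endian element order, `uint64_t` in place of `APInt`, and a hypothetical function name:

```cpp
#include <cassert>
#include <cstdint>

// Sketch (not the LLVM code): each wide source element covers
// Scale = SrcEltBits / DstEltBits narrow destination elements. Merge the
// demanded destination elements into one demanded-bits mask that applies to
// every source element.
static uint64_t demandedSrcBitsFromElts(unsigned SrcEltBits,
                                        unsigned DstEltBits,
                                        unsigned NumDstElts,
                                        uint64_t DemandedElts) {
  assert(SrcEltBits <= 64 && DstEltBits > 0 && NumDstElts <= 64 &&
         SrcEltBits % DstEltBits == 0 && "unsupported bitcast shape");
  unsigned Scale = SrcEltBits / DstEltBits;
  uint64_t DstEltMask =
      DstEltBits == 64 ? ~uint64_t(0) : (uint64_t(1) << DstEltBits) - 1;
  uint64_t DemandedBits = 0;
  for (unsigned I = 0; I != NumDstElts; ++I)
    if (DemandedElts & (uint64_t(1) << I))
      // Destination element I occupies slot (I % Scale) of its source element.
      DemandedBits |= DstEltMask << ((I % Scale) * DstEltBits);
  return DemandedBits;
}
```

Demanding result elements 0 and 2 yields only the low 32 bits of each i64 source element (`0x00000000FFFFFFFF`), because both map to the same position within their source element.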

I've raised https://bugs.llvm.org/show_bug.cgi?id=39689 to deal with the few places where SimplifyDemandedBits's lack of vector handling is a problem.

Differential Revision: https://reviews.llvm.org/D54679

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347301 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] Lower immediately to PACKUS instead of VECTOR_SHUFFLE.

As discussed on rL347240, this avoids some regressions on D54679 and also helps some combines to kick in a bit earlier.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347300 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] Add SimplifyDemandedVectorElts support for PACKSS/PACKUS instructions.

As discussed on rL347240.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347299 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Preserve undef information when creating a punpckl/hbw from a v16i8 where all the even or odd elements are undef.

Previously, if V2 was unused, we ended up using V1 for both inputs as part of the code that follows the new code. By using lowerVectorShuffleWithUNPCK, we keep the undef nature of V2 in the output.

As near as I can tell, this makes v16i8 behavior consistent with every other VT now.

This does mean that we now give the register allocator freedom to pick an arbitrary register for the undef input, which can create false dependencies. But as noted above, we're already doing that for other types.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347296 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add custom type legalization for v8i8->v8i32 sign extend pre-SSE4.1

This helps with a future patch and makes us less reliant on DAG combine merging shuffles.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347295 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Replace more calls to getZeroVector with regular getConstant.

getZeroVector produces a specifically canonicalized zero vector, but we can just let DAG legalization take care of it.

The test changes are because MULH lowering happens later than it should, and this change gave us the opportunity to constant-fold away a multiply during a DAG combine before the build_vector got legalized with a bitcast.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347290 91177308-0d34-0410-b5e6-96231b3b80d8

Recommit "[LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches"

The initial version of the patch did not update Phi nodes in the destinations of removed
edges. This version adds that update, along with tests for this situation.
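A toy model of the missing update: when a conditional branch with a known condition is folded, the edge to the untaken successor disappears, so every phi in that successor must drop its incoming entry for the folding block. The types and function below are hypothetical and much simpler than LLVM's IR (for example, they do not handle both edges targeting the same block); LLVM itself provides `BasicBlock::removePredecessor` for this kind of update.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy IR (not LLVM's): a phi maps each predecessor block name to a value.
struct ToyPhi {
  std::map<std::string, int> Incoming;
};

struct ToyBlock {
  std::string Name;
  std::vector<ToyBlock *> Succs;
  std::vector<ToyPhi> Phis;
};

// Fold a two-way conditional branch in From whose condition is known to be
// CondValue: keep only the taken successor, and remove From's incoming entry
// from every phi in the successor whose edge is deleted.
static void foldConditionalBranch(ToyBlock &From, bool CondValue) {
  assert(From.Succs.size() == 2 && "expected a two-way branch");
  ToyBlock *Taken = From.Succs[CondValue ? 0 : 1];
  ToyBlock *Dead = From.Succs[CondValue ? 1 : 0];
  // If both edges go to the same block, this toy leaves its phis untouched.
  if (Dead != Taken)
    for (ToyPhi &Phi : Dead->Phis)
      Phi.Incoming.erase(From.Name);
  From.Succs = {Taken};
}
```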

Differential Revision: https://reviews.llvm.org/D54021

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347289 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] Don't combine to bswap store on 1-byte truncating store

Turns out that there was no check for a store that truncates down
to a single byte when combining a (store (bswap...)) into a byte-swapping
store. This patch just adds that check.
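PowerPC's byte-reversed stores (`sthbrx`, `stwbrx`, `stdbrx`) cover 2-, 4- and 8-byte widths, so there is no byte-swapping store that writes a single byte. A hypothetical sketch of the kind of width check involved (not the actual DAG combine, which also depends on the subtarget for 64-bit stores):

```cpp
#include <cassert>

// Hypothetical guard: only fold (store (bswap x)) into a byte-reversed store
// when the number of bits actually written to memory matches one of the
// byte-reversed store widths. A 1-byte truncating store is rejected.
static bool canUseByteSwapStore(unsigned StoredBits) {
  assert(StoredBits % 8 == 0 && "store width must be whole bytes");
  return StoredBits == 16 || StoredBits == 32 || StoredBits == 64;
}
```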

Fixes https://bugs.llvm.org/show_bug.cgi?id=39478.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347288 91177308-0d34-0410-b5e6-96231b3b80d8

[SelectionDAG] Compute known bits and num sign bits for live out vector registers. Use it to add AssertZExt/AssertSExt in the live in basic blocks

Summary:
We already support this for scalars, but it was explicitly disabled for vectors. In the updated test cases, this lets us see that the upper bits are zero, so we can use fewer multiply instructions to emulate a 64-bit multiply.

This should help with this ispc issue that a coworker pointed me to https://github.com/ispc/ispc/issues/1362
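The arithmetic behind the saving: with 32x32->64 multiplies (what `pmuludq` provides per 64-bit lane), a full 64-bit product needs the low product plus two cross products, but the cross products vanish when the upper 32 bits are known to be zero. A scalar sketch with hypothetical function names:

```cpp
#include <cassert>
#include <cstdint>

// Multiply the low 32 bits of A and B into a full 64-bit product.
static uint64_t mul32x32(uint64_t A, uint64_t B) {
  return (A & 0xFFFFFFFFu) * (B & 0xFFFFFFFFu);
}

// (AHi*2^32 + ALo) * (BHi*2^32 + BLo) mod 2^64
//   = ALo*BLo + ((ALo*BHi + AHi*BLo) << 32)
// The AHi*BHi term is shifted out entirely.
static uint64_t mul64Emulated(uint64_t A, uint64_t B,
                              bool UpperBitsKnownZero) {
  assert((!UpperBitsKnownZero || ((A | B) >> 32) == 0) &&
         "upper bits claimed zero but are not");
  uint64_t Lo = mul32x32(A, B);
  // Known-zero upper halves (e.g. via AssertZExt) make both cross terms zero,
  // so one 32x32->64 multiply is enough.
  if (UpperBitsKnownZero)
    return Lo;
  uint64_t Cross = mul32x32(A, B >> 32) + mul32x32(A >> 32, B);
  return Lo + (Cross << 32);
}
```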

Reviewers: spatel, efriedma, RKSimon, arsenm

Reviewed By: spatel

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D54725

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347287 91177308-0d34-0410-b5e6-96231b3b80d8

[ExecutionEngine][Interpreter] Fix out-of-bounds array access.

If args is empty, then accessing element 0 is illegal.
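A minimal illustration of the bug class, using hypothetical names rather than the Interpreter's actual code: indexing element 0 of an empty `std::vector` with `operator[]` is undefined behavior, so the emptiness check has to come first.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical example: return the first argument, or Default when there
// are no arguments at all.
static std::string firstArgOrDefault(const std::vector<std::string> &Args,
                                     const std::string &Default) {
  if (Args.empty())
    return Default; // guard: element 0 does not exist
  return Args[0];
}
```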

https://reviews.llvm.org/D53556

Patch by Eugene Sharygin. Thanks Eugene!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347281 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] reduce code duplication in visitXOR; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347278 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Remove unused function return types (NFC)

Reviewers: sbc100

Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D54734

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347277 91177308-0d34-0410-b5e6-96231b3b80d8

[CodeView] Don't print PointerAttributes when dumping.

PointerAttributes is a bitwise-or of several other fields, each of
which is already printed on its own line with a better explanation.
So printing the combined value as well doesn't really help much.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347275 91177308-0d34-0410-b5e6-96231b3b80d8

Implement computeKnownBits for scalar_to_vector

Differential Revision: https://reviews.llvm.org/D54728
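As a sketch of the rule, under the assumption that only element 0 of a scalar_to_vector result is defined (the remaining lanes are undef): known bits can be reported only when element 0 is the sole demanded element. The struct below is a simplified stand-in for `llvm::KnownBits`, the function name is hypothetical, and the sketch ignores the implicit truncation that can occur when the scalar is wider than the vector element.

```cpp
#include <cassert>
#include <cstdint>

// Simplified known-bits pair for values up to 64 bits wide: a set bit in
// Zero/One means that bit is known to be 0/1.
struct ToyKnownBits {
  uint64_t Zero = 0, One = 0;
};

// If element 0 is the only demanded element, the result knows exactly what
// the scalar operand knows; otherwise an undef lane is involved and nothing
// is known.
static ToyKnownBits knownBitsForScalarToVector(const ToyKnownBits &Scalar,
                                               uint64_t DemandedElts) {
  assert((Scalar.Zero & Scalar.One) == 0 && "conflicting known bits");
  if (DemandedElts == 1)
    return Scalar;
  return ToyKnownBits{};
}
```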

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347274 91177308-0d34-0410-b5e6-96231b3b80d8

It's its

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347271 91177308-0d34-0410-b5e6-96231b3b80d8

[Transforms] Prefer static and avoid namespaces, NFC

Mark three functions 'static' instead of putting them in an anonymous
namespace, as per our coding style.

Remove the 'namespace llvm {}' around the .cpp file and explicitly
declare the free function 'llvm::optimizeGlobalCtorsList' in 'llvm::'.
I prefer this style for free functions because the compiler will error
out if the .h and .cpp files don't agree on the function name or
prototype.
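A minimal sketch of the style described above, collapsed into one file for brevity. `demo::sumPositive` and `isPositive` are hypothetical stand-ins, not code from this commit:

```cpp
#include <cassert>
#include <vector>

// --- What would normally live in the header ---
namespace demo {
int sumPositive(const std::vector<int> &Values);
} // namespace demo

// --- What would normally live in the .cpp file ---

// File-local helper: 'static' rather than an anonymous namespace.
static bool isPositive(int V) { return V > 0; }

// Defined with an explicit 'demo::' qualifier instead of reopening
// 'namespace demo { ... }'. If this name or signature drifted from the header
// declaration, the compiler would reject the definition, because a qualified
// name must refer to an existing declaration.
int demo::sumPositive(const std::vector<int> &Values) {
  int Sum = 0;
  for (int V : Values)
    if (isPositive(V))
      Sum += V;
  return Sum;
}
```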

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347269 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Rename combineVSZext->combineExtendVectorInreg. NFC

Now that we no longer have target-specific vector extend nodes, let's make the function name match the nodes we do use.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347268 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add test case to show missed opportunity to use a single pmuludq to implement a multiply when a zext lives in another basic block.

This can occur when one of the inputs to the multiply is loop invariant, though my test cases just use two basic blocks with an unconditional jump, which we won't merge until after isel in the codegen pipeline.

For scalars, I believe SelectionDAGBuilder can add an AssertZExt to pass knowledge across basic blocks, but it's explicitly disabled for vectors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347266 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Fix V_FMA_F16 selection on GFX9

GFX9 should select the opsel version.

Differential Revision: https://reviews.llvm.org/D54545

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347265 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches"

This reverts commits r347183 & r347184. Crashes while building libxml.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347260 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Restored selection of scalar_to_vector (v2x16)

This works if the DAG combiner is enabled, but without combining
we cannot select scalar_to_vector of <2 x half> and <2 x i16>.

Differential Revision: https://reviews.llvm.org/D54718

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347259 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Set debug loc on `mergeStoreIntoSuccessor` phi

Assigning a merged debug location to the `mergeStoreIntoSuccessor` phi
improves backtrace quality.

Fixes llvm.org/PR38083.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347257 91177308-0d34-0410-b5e6-96231b3b80d8

[IR] Add hasNPredecessors, hasNPredecessorsOrMore to BasicBlock

Add methods to BasicBlock which make it easier to efficiently check
whether a block has N (or more) predecessors.

This can be more efficient than using pred_size(), which is a linear
time operation.
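In LLVM, predecessors are found by walking the block's use list, so counting all of them is linear. The early-exit idea can be sketched over generic iterators; the helper names here are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>

// Returns true if [Begin, End) holds exactly N items. Stops as soon as the
// answer is known instead of counting the whole range.
template <typename It>
bool hasNItemsSketch(It Begin, It End, std::size_t N) {
  for (; N > 0; --N, ++Begin)
    if (Begin == End)
      return false; // fewer than N items
  return Begin == End; // exactly N iff nothing is left
}

// Returns true if [Begin, End) holds at least N items.
template <typename It>
bool hasNItemsOrMoreSketch(It Begin, It End, std::size_t N) {
  for (; N > 0; --N, ++Begin)
    if (Begin == End)
      return false;
  return true;
}
```

Either check advances at most N positions, regardless of how long the list is.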

We might consider adding similar methods for successors. I haven't done
so in this patch because succ_size() is already O(1).

With this patch applied, I measured a 0.065% compile-time reduction in
user time for running `opt -O3` on the sqlite3 amalgamation (30 trials).
The change in mergeStoreIntoSuccessor alone saves 45 million linked list
iterations in a stage2 Release build of llc.

See llvm.org/PR39702 for a harder but more general way of achieving
similar results.

Differential Revision: https://reviews.llvm.org/D54686

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347256 91177308-0d34-0410-b5e6-96231b3b80d8