git.osdn.net Git - android-x86/external-llvm.git/log

[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error

Build error on Android; reported and fixed by Mauro Rossi <issor.oruam@gmail.com> (thanks).

Fixes the following build error:

external/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1903:61:
error: comparison of integers of different signs:
'typename iterator_traits<__wrap_iter<MachineBasicBlock **> >::difference_type'
(aka 'int') and 'unsigned int' [-Werror,-Wsign-compare]
BlockWaitcntProcessedSet.end(), &MBB) < Count)) {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~
1 error generated.
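
The warning arises because std::count returns the iterator's signed difference_type while the value it is compared against is unsigned. A minimal sketch of one common way to silence it, assuming the count can never be negative; the names seenFewerThan, Blocks and Limit are hypothetical, and this is not necessarily the exact change made in D49089:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: true if Target occurs fewer than Limit times in Blocks.
// std::count returns a signed difference_type, so comparing its result
// directly with the unsigned Limit would trigger -Wsign-compare.
inline bool seenFewerThan(const std::vector<int *> &Blocks, int *Target,
                          unsigned Limit) {
  auto Occurrences = std::count(Blocks.begin(), Blocks.end(), Target);
  assert(Occurrences >= 0 && "std::count never returns a negative value");
  // The count is known to be non-negative, so the explicit cast is safe.
  return static_cast<unsigned>(Occurrences) < Limit;
}
```

Changing the type of the limit to a signed integer would be an equally valid way to make both operands agree.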

Differential Revision: https://reviews.llvm.org/D49089

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336588 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Force inlining if LDS global address is used

These won't work for the foreseeable future. These aren't allowed
from OpenCL, but IPO optimizations can make them appear.

Also directly set the attributes on functions, regardless
of the linkage rather than cloning functions like before.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336587 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][TLI] DAGCombine: Unfold variable bit-clearing mask to two shifts.

Summary:
This adds a reverse transform for the instcombine canonicalizations
that were added in D47980, D47981.

As discussed later, that was worse at least for the code size,
and potentially for the performance, too.

https://rise4fun.com/Alive/Zmpl
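
As an illustration of the equivalence behind this transform (a sketch based on the commit title, not code from the patch), masking with a variable shifted all-ones constant gives the same result as a pair of shifts, for shift amounts smaller than the bit width. The function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Clear the low Y bits with an explicit mask: X & (-1 << Y).
constexpr uint32_t clearLowMasked(uint32_t X, unsigned Y) {
  return X & (~uint32_t{0} << Y);
}

// Same result with two shifts and no mask constant: (X >> Y) << Y.
constexpr uint32_t clearLowShifted(uint32_t X, unsigned Y) {
  return (X >> Y) << Y;
}

// Clear the high Y bits with an explicit mask: X & (-1 >> Y), logical shift.
constexpr uint32_t clearHighMasked(uint32_t X, unsigned Y) {
  return X & (~uint32_t{0} >> Y);
}

// Same result with two shifts: (X << Y) >> Y.
constexpr uint32_t clearHighShifted(uint32_t X, unsigned Y) {
  return (X << Y) >> Y;
}

// Both forms agree; Y must stay below 32 for the shifts to be defined.
static_assert(clearLowMasked(0xDEADBEEFu, 8) == 0xDEADBE00u, "");
static_assert(clearLowShifted(0xDEADBEEFu, 8) == 0xDEADBE00u, "");
static_assert(clearHighMasked(0xDEADBEEFu, 8) == 0x00ADBEEFu, "");
static_assert(clearHighShifted(0xDEADBEEFu, 8) == 0x00ADBEEFu, "");
```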

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: spatel

Subscribers: reames, llvm-commits

Differential Revision: https://reviews.llvm.org/D48768

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336585 91177308-0d34-0410-b5e6-96231b3b80d8

[Utils] Fix gdb pretty printers to work with Python 3.

Reiterate D23202 for container printers added after the change landed.

Differential Revision: https://reviews.llvm.org/D46578

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336580 91177308-0d34-0410-b5e6-96231b3b80d8

[Power9] Add __float128 builtins for Round To Odd

GCC has builtins for these round to odd instructions:

__float128 __builtin_sqrtf128_round_to_odd (__float128)
__float128 __builtin_{add,sub,mul,div}f128_round_to_odd (__float128, __float128)
__float128 __builtin_fmaf128_round_to_odd (__float128, __float128, __float128)

Differential Revision: https://reviews.llvm.org/D47550

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336578 91177308-0d34-0410-b5e6-96231b3b80d8

[DebugInfo] Change default value of FDEPointerEncoding

Summary:
If the encoding is not specified in CIE augmentation string, then it
should be DW_EH_PE_absptr instead of DW_EH_PE_omit.

Reviewers: ruiu, MaskRay, plotfi, rafauler

Reviewed By: MaskRay

Subscribers: rafauler, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D49000

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336577 91177308-0d34-0410-b5e6-96231b3b80d8

[SelectionDAG] Add VT consistency checks to the creation of ISD::FMA.

This is similar to what is done for binops. I don't know if this would have helped us catch the bug fixed in r336566 earlier or not, but I figured it couldn't hurt.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336576 91177308-0d34-0410-b5e6-96231b3b80d8

Add bitcode compatibility test for 6.0

Summary:
Add bitcode compatibility test for 6.0. On top of the normal disassemble
test, also runs the verifier to make sure simple 6.0 bitcode can pass
the current IR verifier.

Reviewers: vsk

Reviewed By: vsk

Subscribers: dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49086

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336574 91177308-0d34-0410-b5e6-96231b3b80d8

[LoopInfo] Port loop exit interfaces from Loop to LoopBase

This patch ports hasDedicatedExits, getUniqueExitBlocks and
getUniqueExitBlock in Loop to LoopBase so that they can be used
from other LoopBase sub-classes.

Reviewers: chandlerc, sanjoy, hfinkel, fhahn

Reviewed By: chandlerc

Differential Revision: https://reviews.llvm.org/D48817

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336572 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] correct test comments; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336570 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] In combineFMA, make sure we bitcast the result of isFNEG back to the expected type before creating the new FMA node.

Previously, we were creating malformed SDNodes, but nothing noticed because the type constraints prevented isel from noticing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336566 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AVX] Regenerate AVX1 fast-isel tests.

Let the update script merge 32/64 tests where possible

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336565 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] avoid extra poison when moving shift above shuffle

As discussed in D49047 / D48987, shift-by-undef produces poison,
so we can't use undef vector elements in that case.

Note that we need to extend this for poison-generating flags,
and there's a proposal to create poison from FMF in D47963.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336562 91177308-0d34-0410-b5e6-96231b3b80d8

[dsymutil] Add support for outputting assembly

When implementing the DWARF accelerator tables in dsymutil I ran into an
assertion in the assembler. Debugging this kind of issue is a lot
easier when looking at the assembly instead of debugging the assembler
itself. Since it's only a matter of creating an AsmStreamer instead of a
MCObjectStreamer it made sense to turn this into a (hidden) dsymutil
feature.

Differential revision: https://reviews.llvm.org/D49079

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336561 91177308-0d34-0410-b5e6-96231b3b80d8

[BitcodeReader] Infer the correct runtime preemption for GlobalValue

Summary:
To allow bitcode built by an old compiler to pass the current verifier,
BitcodeReader needs to auto infer the correct runtime preemption from
linkage and visibility for GlobalValues.

Since llvm-6.0 bitcode already contains the new field but can be
incorrect in some cases, the attribute needs to be recomputed all the
time in BitcodeReader. This will make all the GVs have dso_local marked
correctly if read from bitcode, and it should still allow the verifier
to catch mistakes in optimization passes.

This should fix PR38009.

Reviewers: sfertile, vsk

Reviewed By: vsk

Subscribers: dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49039

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336560 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] generalize safe vector constant utility

This is almost NFC, but there could be some case where the original
code had undefs in the constants (rather than just the shuffle mask),
and we'll use safe constants rather than undefs now.

The FIXME noted in foldShuffledBinop() is already visible in existing
tests, so correcting that is the next step.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336558 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Remove some patterns that include a bitcast of a floating point load to an integer type.

DAG combine should have converted the type of the load.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336557 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Remove some patterns that seem to be unreachable.

These patterns mapped (v2f64 (X86vzmovl (v2f64 (scalar_to_vector FR64:$src)))) to a MOVSD and a zeroing XOR. But the complexity of a pattern for (v2f64 (X86vzmovl (v2f64))) that selects MOVQ is artificially high and hides this MOVSD pattern.

Weirder still, the SSE version of the pattern was explicitly blocked on SSE41, but yet we had copied it to AVX and AVX512.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336556 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Remove some seemingly unnecessary AddedComplexity lines.

Looking at the generated tables this didn't seem to make an obvious difference in pattern priority.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336555 91177308-0d34-0410-b5e6-96231b3b80d8

[VPlan][LV] Introduce condition bit in VPBlockBase

This patch introduces a VPValue in VPBlockBase to represent the condition
bit that is used as successor selector when a block has multiple successors.
This information wasn't necessary until now, when we are about to introduce
outer loop vectorization support in VPlan code gen.

Reviewers: fhahn, rengolin, mkuper, hfinkel, mssimpso

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48814

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336554 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64][SVE] Asm: Support for CNT(B|H|W|D) and CNTP instructions.

This patch adds support for the following instructions:

  CNTB CNTH - Determine the number of active elements implied by
  CNTW CNTD   the named predicate constant, multiplied by an
              immediate, e.g.

                cnth x0, vl8, #16

  CNTP      - Count active predicate elements, e.g.
                cntp  x0, p0, p1.b

              counts the number of active elements in p1, predicated
              by p0, and stores the result in x0.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336552 91177308-0d34-0410-b5e6-96231b3b80d8

[CVP] Handle calls with void return value. No need to create CVPLattice state for it.

Summary:
Tests: 10
Metric: compile_time

Program                                         unpatch-result  patch-result diff

Bullet/bullet                                  32.39           30.54        -5.7%
SPASS/SPASS                                    18.14           17.25        -4.9%
mafft/pairlocalalign                           12.10           11.64        -3.8%
ClamAV/clamscan                                19.21           19.63         2.2%
7zip/7zip-benchmark                            49.55           48.85        -1.4%
kimwitu++/kc                                   15.68           15.87         1.2%
lencod/lencod                                  21.13           21.34         1.0%
consumer-typeset/consumer-typeset              13.65           13.62        -0.2%
tramp3d-v4/tramp3d-v4                          29.88           29.92         0.1%
sqlite3/sqlite3                                18.48           18.46        -0.1%
       unpatch-result  patch-result       diff
count  10.000000       10.000000     10.000000
mean   23.022000       22.712400    -0.011671
std    11.362831       11.094183     0.027338
min    12.104000       11.640000    -0.057298
25%    16.299000       16.214000    -0.032282
50%    18.844000       19.048000    -0.001350
75%    27.689000       27.774000     0.007752
max    49.552000       48.852000     0.021861

I also tested only this pass by concatenating all the code from the
llvm/lib/Analysis/ folder and running clang -g followed by opt. I get close to 20% speedup
for the pass. I expect a majority of the gain comes from skipping the dbg intrinsics.

Before patch (opt -time-passes -called-value-propagation):
============
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 3.8303 seconds (3.8279 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
2.0768 ( 57.3%) 0.0990 ( 48.0%) 2.1757 ( 56.8%) 2.1757 ( 56.8%) Bitcode Writer
0.8444 ( 23.3%) 0.0600 ( 29.1%) 0.9044 ( 23.6%) 0.9044 ( 23.6%) Called Value Propagation
0.7031 ( 19.4%) 0.0472 ( 22.9%) 0.7502 ( 19.6%) 0.7478 ( 19.5%) Module Verifier
3.6242 (100.0%) 0.2062 (100.0%) 3.8303 (100.0%) 3.8279 (100.0%) Total

After patch (opt -time-passes -called-value-propagation):
============
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 3.6605 seconds (3.6579 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
2.0716 ( 59.7%) 0.0990 ( 52.5%) 2.1705 ( 59.3%) 2.1706 ( 59.3%) Bitcode Writer
0.7144 ( 20.6%) 0.0300 ( 15.9%) 0.7444 ( 20.3%) 0.7444 ( 20.4%) Called Value Propagation
0.6859 ( 19.8%) 0.0596 ( 31.6%) 0.7455 ( 20.4%) 0.7429 ( 20.3%) Module Verifier
3.4719 (100.0%) 0.1886 (100.0%) 3.6605 (100.0%) 3.6579 (100.0%) Total

Reviewers: davide, mssimpso

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49078

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336551 91177308-0d34-0410-b5e6-96231b3b80d8

[Power9] Add __float128 support for compare operations

Added handling for the select f128.

Differential Revision: https://reviews.llvm.org/D48294

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336548 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64][SVE] Asm: Support for remaining shift instructions.

This patch completes support for shifts, which include:
- LSL   - Logical Shift Left
- LSLR  - Logical Shift Left, Reversed form
- LSR   - Logical Shift Right
- LSRR  - Logical Shift Right, Reversed form
- ASR   - Arithmetic Shift Right
- ASRR  - Arithmetic Shift Right, Reversed form
- ASRD  - Arithmetic Shift Right for Divide

In the following variants:

- Predicated shift by immediate - ASR, LSL, LSR, ASRD
  e.g.
    asr z0.h, p0/m, z0.h, #1

  (active lanes of z0 shifted by #1)

- Unpredicated shift by immediate - ASR, LSL*, LSR*
  e.g.
    asr z0.h, z1.h, #1

  (all lanes of z1 shifted by #1, stored in z0)

- Predicated shift by vector - ASR, LSL*, LSR*
  e.g.
    asr z0.h, p0/m, z0.h, z1.h

  (active lanes of z0 shifted by z1, stored in z0)

- Predicated shift by vector, reversed form - ASRR, LSLR, LSRR
  e.g.
    lslr z0.h, p0/m, z0.h, z1.h

  (active lanes of z1 shifted by z0, stored in z0)

- Predicated shift left/right by wide vector - ASR, LSL, LSR
  e.g.
    lsl z0.h, p0/m, z0.h, z1.d

  (active lanes of z0 shifted by wide elements of vector z1)

- Unpredicated shift left/right by wide vector - ASR, LSL, LSR
  e.g.
    lsl z0.h, z1.h, z2.d

  (all lanes of z1 shifted by wide elements of z2, stored in z0)

*Variants added in previous patches.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336547 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] fix shuffle-of-binops transform to avoid poison/undef

As noted in D48987, there are many different ways for this transform to go wrong.
In particular, the poison potential for shifts means we have to be more careful with those ops.
I added tests to make that behavior visible for all of the different cases that I could find.

This is a partial fix. To make this review easier, I did not make changes for the single binop
pattern (handled in foldSelectShuffleWith1Binop()). I also left out some potential optimizations
noted with TODO comments. I'll follow-up once we're confident that things are correct here.

The goal is to correct all marked FIXME tests to either avoid the shuffle transform or do it safely.

Note that distinguishing when the shuffle mask contains undefs and using getBinOpIdentity() allows
for some improvements to div/rem patterns, so there are wins along with the missed opportunities
and fixes.

Differential Revision: https://reviews.llvm.org/D49047

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336546 91177308-0d34-0410-b5e6-96231b3b80d8

[mips] Addition of the [d]rem and [d]remu instructions

Related to http://reviews.llvm.org/D15772
Depends on http://reviews.llvm.org/D16889
Adds [D]REM[U] instructions.

Patch By: Srdjan Obucina
Contributions from: Simon Dardis

Differential Revision: https://reviews.llvm.org/D17036

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336545 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64][SVE] Asm: Support for TBL instruction.

Support for SVE's TBL instruction for programmable table
lookup/permute using vector of element indices, e.g.

tbl z0.d, { z1.d }, z2.d

stores elements from z1, indexed by elements from z2, into z0.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336544 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-mca] report an error if the assembly sequence contains an unsupported instruction.

This is a short-term fix for PR38093.
For now, we llvm::report_fatal_error if the instruction builder finds an
unsupported instruction in the instruction stream.

We need to revisit this fix once we start addressing PR38101.
Essentially, we need a better framework for error handling.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336543 91177308-0d34-0410-b5e6-96231b3b80d8

[Support] Allow JSON serialization of Optional<T> for supported T.

This is ported from r333881 to JSON's new home.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336542 91177308-0d34-0410-b5e6-96231b3b80d8

[Support] Make JSON handle doubles and int64s losslessly

Summary:
This patch adds a new "integer" ValueType, and renames Number -> Double.
This allows us to preserve the full precision of int64_t when parsing integers
from the wire, or constructing from an integer.
The API is unchanged, other than giving asInteger() a clearer contract.

In addition, always output doubles with enough precision that parsing will
reconstruct the same double.
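
To see why a separate integer kind matters, here is a small standalone sketch (it does not use the llvm::json API; roundTripThroughDouble is a hypothetical name). A double has a 53-bit significand, so an int64_t above 2^53 may not survive being stored as a double:

```cpp
#include <cassert>
#include <cstdint>

// Round-trip an int64_t through double, as a JSON value type that stores
// every number as a double would have to do.
inline int64_t roundTripThroughDouble(int64_t V) {
  return static_cast<int64_t>(static_cast<double>(V));
}

// Illustrative check: 2^53 survives the round trip, 2^53 + 1 does not.
inline void demoPrecisionLoss() {
  const int64_t Exact = int64_t{1} << 53;
  assert(roundTripThroughDouble(Exact) == Exact);
  assert(roundTripThroughDouble(Exact + 1) != Exact + 1);
}
```

Keeping integers in their own value kind avoids this loss for anything that fits in int64_t.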

Reviewers: simon_tatham

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46209

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336541 91177308-0d34-0410-b5e6-96231b3b80d8

[Support] Fix GCC compile after r336534

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336537 91177308-0d34-0410-b5e6-96231b3b80d8

[PM/Unswitch] Fix a nasty bug in the new PM's unswitch introduced in
r335553 with the non-trivial unswitching of switches.

The code correctly updated most aspects of the CFG and analyses, but
missed some crucial aspects:
1) When multiple cases have the same successor, we unswitch that
   a single time and replace the switch with a direct branch. The CFG
   here is correct, but the target of this direct branch may have had
   a PHI node with multiple entries in it.
2) When we still have to clone a successor of the switch into an
   unswitched copy of the loop, we'll delete potentially multiple edges
   entering this successor, not just one.
3) We also have to delete multiple edges entering the successors in the
   original loop when they have to be retained.
4) When the "retained successor" *also* occurs as a case successor, we
   just assert failed everywhere. This doesn't happen very easily
   because it's always valid to simply drop the case -- the retained
   successor for switches is always the default successor. However, it
   is likely possible through some contrivance of different loop passes,
   unrolling, and simplifying for this to occur in practice and
   certainly there is nothing "invalid" about the IR so this pass needs
   to handle it.
5) In the case of #4, we also will replace these multiple edges with
   a direct branch much like in #1 and need to collapse the entries in
   any PHI nodes to a single entry.

All of this stems from the delightful fact that the same successor can
show up in multiple parts of the switch terminator, and each of these
are considered a distinct edge for the purpose of PHI nodes (and
iterating the successors and predecessors) but not for unswitching
itself, the dominator tree, or many other things. For the record,
I intensely dislike this "feature" of the IR in large part because of
the complexity it causes in passes like this. We already have a ton of
logic building sets and handling duplicates, and we just had to add
a bunch more.

I've added a complex test case that covers all five of the above failure
modes. I've also added a variation on it where #4 and #5 occur in loop
exit, adding fun where we have an LCSSA PHI node with "multiple entries"
despite having dedicated exits. There were no additional issues found by
this, but it seems a useful corner case to cover with testing.

One thing that working on all of this code has made painfully clear for
me as well is how amazingly inefficient our PHI node representation is
(in terms of the in-memory data structures and the APIs used to update
them). This code has truly marvelous complexity bounds because every
time we remove an entry from a PHI node we do a linear scan to find it
and then a linear update to the data structure to remove it. We could in
theory batch all of the PHI node updates into a single linear walk of
the operands making this much more efficient, but the APIs fight hard
against this and the fact that we have to handle duplicates in the
peculiar manner we do (removing all but one in some cases) makes even
implementing that very tedious and annoying. Anyways, none of this is
new here or specific to loop unswitching. All code in LLVM that updates
PHI node operands suffers from these problems.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336536 91177308-0d34-0410-b5e6-96231b3b80d8

Lift JSON library from clang-tools-extra/clangd to llvm/Support.

Summary:
This consists of four main parts:
- a type json::Expr representing JSON values of dynamic kind, which can be
   composed, inspected, and modified
- a JSON parser from string -> json::Expr
- a JSON printer from json::Expr -> string, with optional pretty-printing
- a convention for mapping json::Expr <=> native types (fromJSON/toJSON)
   Mapping functions are provided for primitives (e.g. int, vector) and the
   ObjectMapper helper helps implement fromJSON for struct/object types.

Based on clangd's usage, a couple of places I'd appreciate review attention:
- fromJSON returns only bool. A richer error-signaling mechanism may be useful
   to provide useful messages, or let recursive fromJSONs (containers/structs)
   do careful error recovery.
- should json::obj always be explicitly written (like json::ary)?
- there's no streaming parse API. I suspect there are some simple wins like
   a callback API where the document is a long array, and each element is small.
   But this can probably be bolted on easily when we see the need.

Reviewers: bkramer, labath

Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, llvm-commits

Differential Revision: https://reviews.llvm.org/D45753

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336534 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64][SVE] Asm: Support for ADR instruction.

Supporting various addressing modes:
- adr z0.s, [z0.s, z0.s]
- adr z0.s, [z0.s, z0.s, lsl #<shift>]
- adr z0.d, [z0.d, z0.d]
- adr z0.d, [z0.d, z0.d, lsl #<shift>]
- adr z0.d, [z0.d, z0.d, uxtw #<shift>]
- adr z0.d, [z0.d, z0.d, sxtw #<shift>]

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D48870

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336533 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64][SVE] Asm: Support for UZP and TRN instructions.

This patch adds support for:
  UZP1  Concatenate even elements from two vectors
  UZP2  Concatenate  odd elements from two vectors
  TRN1  Interleave  even elements from two vectors
  TRN2  Interleave   odd elements from two vectors

With variants for both data and predicate vectors, e.g.
  uzp1    z0.b, z1.b, z2.b
  trn2    p0.s, p1.s, p2.s

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336531 91177308-0d34-0410-b5e6-96231b3b80d8

[AccelTable] Provide abstraction for emitting DWARF5 accelerator tables.

When emitting the DWARF accelerator tables from dsymutil, we don't have
a DwarfDebug instance and we use a custom class to represent Dwarf
compile units. This patch adds an interface AccelTableWriterInfo to
abstract these from the Dwarf5AccelTableWriter, so we can have a custom
implementation for this in dsymutil.

Differential revision: https://reviews.llvm.org/D49031

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336529 91177308-0d34-0410-b5e6-96231b3b80d8

[AccelTable] Dwarf5AccelTableEmitter -> Writer (NFC)

Renames Dwarf5AccelTableEmitter to Dwarf5AccelTableWriter as suggested
in D49031.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336525 91177308-0d34-0410-b5e6-96231b3b80d8

[PGOMemOPSize] Preserve the DominatorTree

Summary:
PGOMemOPSize only modifies CFG in a couple of places; thus we can preserve the DominatorTree with little effort.
When optimizing SQLite with -O3, this patch reduces the number of nodes traversed by DFS by 3.8% and the number of times DominatorTreeBase::recalculation is called by 5.7%.

Reviewers: kuhar, davide, dmgreen

Reviewed By: dmgreen

Subscribers: mzolotukhin, vsk, llvm-commits

Differential Revision: https://reviews.llvm.org/D48914

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336522 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Improve the message for some asserts. Remove an if that is guaranteed true by said asserts.

This replaces some asserts in lowerV2F64VectorShuffle with the similar asserts from lowerVIF64VectorShuffle which are more readable. The original asserts mentioned a blend, but there's no guarantee that it is a blend.

Also remove an if that the asserts prove is always true. Mask[0] is always less than 2 and Mask[1] is always at least 2. Therefore (Mask[0] >= 2) + (Mask[1] >= 2) == 1 must always be true.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336517 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Remove an AddedComplexity line that seems unnecessary.

It only existed on SSE and AVX version. AVX512 version didn't have it.

I checked the generated table and this didn't seem necessary to create a match preference.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336516 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][Nearly NFC] Split SHLD/SHRD into their own WriteShiftDouble class

Summary:
{F6603964}
While there are still some discrepancies within that new group,
it is clearly separate from the other shifts.
And Agner's tables agree, these double shifts are clearly
different from the normal shifts/rotates.

I'm guessing `FeatureSlowSHLD` is related.

Indeed, a basic sched pair is *not* the /best/ match.
But keeping it in the WriteShift is /clearly/ not ideal either.
This can and likely will be fine-tuned later.

This is a purely mechanical change, it does not change any numbers,
as the [lack of the change of] mca tests show.

Reviewers: craig.topper, RKSimon, andreadb

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49015

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336515 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Enhance combineFMA to look for FNEG behind an EXTRACT_VECTOR_ELT.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336514 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] Combine v16i8 SHL by constants to multiplies

Pre-AVX512 (which can perform a quick extend/shift/truncate), extending to 2 v8i16 for the PMULLW and then truncating is more performant than relying on the generic PBLENDVB vXi8 shift path and uses a similar amount of mask constant pool data.
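
A scalar sketch of the idea, assuming constant per-lane shift amounts in the range 0..7: each 8-bit lane is widened to 16 bits, multiplied by (1 << amount), and truncated back. The function name and the four-lane width are hypothetical, chosen for brevity; the real combine works on v16i8 and uses PMULLW on the widened halves.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Shift each 8-bit lane left by its own constant amount by widening to
// 16 bits, multiplying by a power of two, and truncating back to 8 bits.
inline std::array<uint8_t, 4> shlViaMul(const std::array<uint8_t, 4> &X,
                                        const std::array<unsigned, 4> &Amt) {
  std::array<uint8_t, 4> Result{};
  for (std::size_t I = 0; I < X.size(); ++I) {
    assert(Amt[I] < 8 && "shift amount must be smaller than the lane width");
    uint16_t Wide = X[I];                                  // zero-extend lane
    uint16_t Scale = static_cast<uint16_t>(1u << Amt[I]);  // 2^amount
    Result[I] = static_cast<uint8_t>(Wide * Scale);        // multiply, truncate
  }
  return Result;
}
```

The truncation discards exactly the bits that a plain 8-bit shift would have shifted out, which is why the two forms agree.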

Differential Revision: https://reviews.llvm.org/D48963

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336513 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Set scheduler classes to unsupported. NFCI.

While looking at PR36895 I noticed how much of the atom model was still setting schedules for unsupported SSE4+ instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336512 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][Basically NFC] Sched: split WriteBitScan into WriteBSF/WriteBSR.

Summary:
Motivation: {F6597954}

This only does the mechanical splitting, does not actually change
any numbers, as the tests added in previous revision show.

Reviewers: craig.topper, RKSimon, courbet

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48998

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336511 91177308-0d34-0410-b5e6-96231b3b80d8

[MCA][X86][NFC] Add BSF/BSR resource tests

Reviewers: RKSimon, andreadb, courbet

Reviewed By: RKSimon

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D48997

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336510 91177308-0d34-0410-b5e6-96231b3b80d8

[LoopIdiomRecognize] Support for converting loops that use LSHR to CTLZ.

In the 'detectCTLZIdiom' function support for loops that use LSHR instruction instead of ASHR has been added.

This supports creating ctlz from the following code.

int lzcnt(int x) {
    int count = 0;
    while (x > 0) {
        count++;
        x = x >> 1;
    }
    return count;
}
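
For reference, a sketch of what the loop computes in closed form: the number of bits up to and including the highest set bit, which equals the bit width minus the leading-zero count. The function names are hypothetical, the value is taken as unsigned so that >> is a logical shift, and __builtin_clz is a GCC/Clang builtin that is undefined for zero, so zero is handled separately:

```cpp
#include <cassert>
#include <climits>

// The loop idiom, written on an unsigned value (logical shift right).
inline int countSignificantBits(unsigned X) {
  int Count = 0;
  while (X != 0) {
    ++Count;
    X >>= 1;
  }
  return Count;
}

// Closed form using a leading-zero count instead of a loop.
inline int countSignificantBitsViaClz(unsigned X) {
  const int Bits = static_cast<int>(sizeof(unsigned) * CHAR_BIT);
  // __builtin_clz(0) is undefined, so handle zero explicitly.
  return X == 0 ? 0 : Bits - __builtin_clz(X);
}

// Illustrative check that both forms agree on a few values.
inline void demoCtlzIdiom() {
  assert(countSignificantBits(1u) == countSignificantBitsViaClz(1u));
  assert(countSignificantBits(255u) == countSignificantBitsViaClz(255u));
}
```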

Patch by Olga Moldovanova

Differential Revision: https://reviews.llvm.org/D48354

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336509 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add back some intrinsic table entries lost in r336506.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336508 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add new scalar fma intrinsics with rounding mode that use f32/f64 types.

This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma.

I had to add new rounding mode CodeGenOnly instructions to support isel when we can't find a movss to grab the upper bits from to use the b_Int instruction.

Fast-isel tests have been updated to match new clang codegen.

We are currently having trouble folding fneg into the new intrinsic. I'm going to correct that in a follow up patch to keep the size of this one down.

A future patch will also remove the old intrinsics.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336506 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Use a rounding mode other than 4 in the scalar fma intrinsic fast-isel tests to match clang test cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336505 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Regenerate PR14088 test. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336496 91177308-0d34-0410-b5e6-96231b3b80d8

[SelectionDAG] Split float and integer isKnownNeverZero tests

Splits off isKnownNeverZeroFloat to handle +/- 0 float cases.

This will make it easier to be more aggressive with the integer isKnownNeverZero tests (similar to ValueTracking), use computeKnownBits etc.

Differential Revision: https://reviews.llvm.org/D48969

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336492 91177308-0d34-0410-b5e6-96231b3b80d8

Use const APInt& to avoid extra copy. NFCI.

As discussed on D48825.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336491 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] Add EXTRACT_SUBVECTOR to SimplifyDemandedVectorElts

As discussed on PR37989, this patch adds EXTRACT_SUBVECTOR handling to TargetLowering::SimplifyDemandedVectorElts and calls it from DAGCombiner::visitEXTRACT_SUBVECTOR.

Differential Revision: https://reviews.llvm.org/D48825

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336490 91177308-0d34-0410-b5e6-96231b3b80d8

[CostModel][X86] Add SREM/UREM general and constant costs (PR38056)

We penalize general SDIV/UDIV costs but don't do the same for SREM/UREM.

This patch makes general vector SREM/UREM x20 as costly as scalar, the same approach as we do for SDIV/UDIV. The patch also extends the existing SDIV/UDIV constant costs for SREM/UREM - at the moment this means the additional cost of a MUL+SUB (see D48975).

Differential Revision: https://reviews.llvm.org/D48980

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336486 91177308-0d34-0410-b5e6-96231b3b80d8

Test commit

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336485 91177308-0d34-0410-b5e6-96231b3b80d8

NFC - Typo fixes in X86 flags-copy-lowering.mir test

Differential Revision: https://reviews.llvm.org/D48934

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336484 91177308-0d34-0410-b5e6-96231b3b80d8

[MachineOutliner] Add missing liveness tracking info in MIR test.

This should bring the bots back to green state.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336482 91177308-0d34-0410-b5e6-96231b3b80d8

[MachineOutliner] Assert that Liveness tracking is accurate (NFC)

The checking is done deeper inside MachineBasicBlock, but this will
hopefully help to find issues when porting the machine outliner to a
target where Liveness tracking is broken (like ARM).

Differential Revision: https://reviews.llvm.org/D49023

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336481 91177308-0d34-0410-b5e6-96231b3b80d8

[Support] Clear errno before calling the function in RetryAfterSignal.

For certain APIs, the return value of the function does not distinguish
between failure (which populates errno) and other non-error conditions
(which do not set errno).

For example, `fgets` returns `NULL` both when an error has occurred and
upon EOF. If `errno` is already `EINTR` for whatever reason, then
```
RetryAfterSignal(nullptr, fgets, ...);
```
on a stream that has reached EOF would loop forever.

Fix this by setting `errno` to `0` before each attempt in
`RetryAfterSignal`.
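
The retry loop described above can be sketched as follows. This is a minimal
illustration, not the LLVM source: `retryAfterSignalSketch`, `eofLikeRead`
and `flakyRead` are hypothetical names, and `-1` stands in for the failure
value.

```cpp
#include <cerrno>

// Hypothetical stand-in for RetryAfterSignal: retry F while it reports
// failure *and* errno says the call was interrupted by a signal.
template <typename FailT, typename Fun, typename... Args>
auto retryAfterSignalSketch(const FailT &Fail, const Fun &F,
                            const Args &...As) -> decltype(F(As...)) {
  decltype(F(As...)) Res;
  do {
    errno = 0; // Drop any stale EINTR left over from earlier calls.
    Res = F(As...);
  } while (Res == Fail && errno == EINTR);
  return Res;
}

// Hypothetical callee: returns the failure value without touching errno,
// the way fgets returns NULL at EOF.
int EofCalls = 0;
int eofLikeRead() {
  ++EofCalls;
  return -1;
}

// Hypothetical callee: interrupted twice, then succeeds.
int FlakyCalls = 0;
int flakyRead() {
  if (++FlakyCalls <= 2) {
    errno = EINTR;
    return -1;
  }
  return 42;
}
```

With `errno` preset to `EINTR`, `retryAfterSignalSketch(-1, eofLikeRead)`
calls the function once and returns; without the `errno = 0` line it would
never leave the loop.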

Patch by Ricky Zhou!

Differential Revision: https://reviews.llvm.org/D48755

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336479 91177308-0d34-0410-b5e6-96231b3b80d8

[PM/LoopUnswitch] Fix PR37889, producing the correct loop nest structure
after trivial unswitching.

This PR illustrates that a fundamental analysis update was not performed
with the new loop unswitch. This update is also somewhat fundamental to
the core idea of the new loop unswitch -- we actually *update* the CFG
based on the unswitching. In order to do that, we need to update the
loop nest in addition to the domtree.

For some reason, when writing trivial unswitching, I thought that the
loop nest structure cannot be changed by the transformation. But the PR
helps illustrate that it clearly can. I've expanded this to a number of
different test cases that try to cover the different cases of this. When
we unswitch, we move an exit edge of a loop out of the loop. If this
exit edge changes which loop reached by an exit is the innermost loop,
it changes the parent of the loop. Essentially, this transformation may
hoist the inner loop up the nest. I've added the simple logic to handle
this reliably in the trivial unswitching case. This just requires
updating LoopInfo and rebuilding LCSSA on the impacted loops. In the
trivial case, we don't even need to handle dedicated exits because we're
only hoisting the one loop and we just split its preheader.

I've also ported all of these tests to non-trivial unswitching and
verified that the logic already there correctly handles the loop nest
updates necessary.

Differential Revision: https://reviews.llvm.org/D48851

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336477 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Merge INTR_TYPE_3OP_RM with INTR_TYPE_3OP. Remove unused INTR_TYPE_1OP_RM.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336476 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428)."

This reverts commit r336140. Our tests show that an LSR assertion fails with it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336473 91177308-0d34-0410-b5e6-96231b3b80d8

[PDB] memicmp only exists on Windows, use StringRef::compare_lower instead

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336469 91177308-0d34-0410-b5e6-96231b3b80d8

Fix DIExpression::ExprOperand::appendToVector

appendToVector used the wrong overload of SmallVector::append, resulting
in it appending the same element to a vector `getSize()` times. This did
not cause a problem when initially committed because appendToVector was
only used to append 1-element operands.

This changes appendToVector to use the correct overload of append().
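
A minimal sketch of the overload mix-up, assuming `std::vector::insert` as a
stand-in for `SmallVector::append` (both have a count/value form and an
iterator-range form). The function names and the operand values are
illustrative only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Buggy shape: the (count, value) overload repeats the first element
// Size times instead of copying the whole operand.
void appendOperandBuggy(std::vector<std::uint64_t> &V,
                        const std::uint64_t *Op, std::size_t Size) {
  V.insert(V.end(), Size, *Op);
}

// Fixed shape: the iterator-range overload copies each element in turn.
void appendOperandFixed(std::vector<std::uint64_t> &V,
                        const std::uint64_t *Op, std::size_t Size) {
  V.insert(V.end(), Op, Op + Size);
}
```

For a 1-element operand the two functions produce the same vector, which is
why the bug stayed hidden until multi-element operands were appended.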

Testing: ./unittests/IR/IRTests --gtest_filter='*DIExpressionTest*'

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336466 91177308-0d34-0410-b5e6-96231b3b80d8

Remove a redundant null-check in DIExpression::prepend, NFC

Code outside of an `if (Expr)` block dereferenced `Expr`, so the null
check was redundant.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336465 91177308-0d34-0410-b5e6-96231b3b80d8

[PDB] One more fix for hashing GSI records.

The reference implementation uses a case-insensitive string
comparison for strings of equal length.  This will cause the
string "tEo" to compare less than "VUo".  However we were using
a case sensitive comparison, which would generate the opposite
outcome.  Switch to a case insensitive comparison.  Also, when
one of the strings contains non-ascii characters, fallback to
a straight memcmp.
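
A rough sketch of such a comparator, using `std::string` rather than LLVM
types. `gsiRecordLessSketch` is a hypothetical name, and the ordering of
strings with different lengths (shorter first) is an assumption that this
message does not state.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// ASCII-only lowercase; avoids locale-dependent std::tolower.
unsigned char asciiLower(unsigned char C) {
  return (C >= 'A' && C <= 'Z') ? static_cast<unsigned char>(C - 'A' + 'a')
                                : C;
}

bool isAsciiString(const std::string &S) {
  for (unsigned char C : S)
    if (C > 127)
      return false;
  return true;
}

bool gsiRecordLessSketch(const std::string &L, const std::string &R) {
  // Assumption: strings of different length are ordered by length.
  if (L.size() != R.size())
    return L.size() < R.size();
  // Non-ASCII input: fall back to a plain byte comparison.
  if (!isAsciiString(L) || !isAsciiString(R))
    return std::memcmp(L.data(), R.data(), L.size()) < 0;
  // Equal-length ASCII strings: compare case-insensitively.
  for (std::size_t I = 0; I < L.size(); ++I) {
    unsigned char A = asciiLower(L[I]), B = asciiLower(R[I]);
    if (A != B)
      return A < B;
  }
  return false;
}
```

Here `gsiRecordLessSketch("tEo", "VUo")` is true because `t` sorts before
`v`, whereas a case-sensitive comparison puts `V` before `t`.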

The only way to really test this is with a DIA test.  Before this
patch, the test will fail (but succeed if link.exe is used instead
of lld-link).  After the patch, it succeeds even with lld-link.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336464 91177308-0d34-0410-b5e6-96231b3b80d8

Use Type::isIntOrPtrTy where possible, NFC

It's a bit neater to write T.isIntOrPtrTy() over `T.isIntegerTy() ||
T.isPointerTy()`.

I used Python's re.sub with this regex to update users:

r'([\w.\->()]+)isIntegerTy\s*\|\|\s*\1isPointerTy'

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336462 91177308-0d34-0410-b5e6-96231b3b80d8

[IR] Fix inconsistent declaration parameter name

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336459 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Remove patterns for MOVLPD/MOVLPS nodes with integer types.

Lowering shouldn't generate these. If we need to use them for integer types, it should use a bitcast.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336458 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add more FMA3 memory folding patterns. Remove patterns that are no longer needed.

We've removed the legacy FMA3 intrinsics and are now using llvm.fma and extractelement/insertelement. So we don't need patterns for the nodes that could only be created by the old intrinsics. Those ISD opcodes still exist because we haven't dropped the AVX512 intrinsics yet, but those should go to EVEX instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336457 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-mca] Add HardwareUnit and Context classes.

This patch moves the construction of the default backend from llvm-mca.cpp and
into mca::Context. The Context class is responsible for holding ownership of
the simulated hardware components. These components are subclasses of
HardwareUnit. Right now the HardwareUnit is pretty bare-bones, but eventually
we might want to add some common functionality across all hardware components,
such as isReady() or something similar.

I have a feeling this patch will probably need some updates, but it's a start.
One thing I am not particularly fond of is the rather large interface for
createDefaultPipeline. That convenience routine takes a rather large set of
inputs from the llvm-mca driver, where many of those inputs are generated via
command line options.

One item I think we might want to change is the separating of ownership of
hardware components (owned by the context) and the pipeline (which owns
Stages). In short, a Pipeline owns Stages, a Context (currently) owns hardware.
The Pipeline's Stages make use of the components, and thus there is a lifetime
dependency generated. The components must outlive the pipeline. We could solve
this by having the Context also own the Pipeline, and not return a
unique_ptr<Pipeline>. Now that I think about it, I like that idea more.

Differential Revision: https://reviews.llvm.org/D48691

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336456 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-objcopy] Add support for static libraries

This diff adds support for handling static libraries
to llvm-objcopy and llvm-strip.

Test plan: make check-all

Differential revision: https://reviews.llvm.org/D48413

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336455 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] add more tests for potentially poisonous shifts; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336454 91177308-0d34-0410-b5e6-96231b3b80d8

Revert 336426 (and follow-ups 428, 440), it very likely caused PR38084.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336453 91177308-0d34-0410-b5e6-96231b3b80d8

[Debugify] Allow unsigned values narrower than their variables

Suppress the diagnostic for mis-sized dbg.values when a value operand is
narrower than the unsigned variable it describes. Assume that a debugger
would implicitly zero-extend these values.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336452 91177308-0d34-0410-b5e6-96231b3b80d8

[Local] replaceAllDbgUsesWith: Update debug values before RAUW

The replaceAllDbgUsesWith utility helps passes preserve debug info when
replacing one value with another.

This improves upon the existing insertReplacementDbgValues API by:

- Updating debug intrinsics in-place, while preventing use-before-def of
  the replacement value.
- Falling back to salvageDebugInfo when a replacement can't be made.
- Moving the responsibility for rewriting llvm.dbg.* DIExpressions into
  common utility code.

Along with the API change, this teaches replaceAllDbgUsesWith how to
create DIExpressions for three basic integer and pointer conversions:

- The no-op conversion. Applies when the values have the same width, or
  have bit-for-bit compatible pointer representations.
- Truncation. Applies when the new value is wider than the old one.
- Zero/sign extension. Applies when the new value is narrower than the
  old one.
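
The direction of the three cases is easy to get backwards, so here is a small
sketch that picks a conversion from bit widths alone. `DbgConversion` and
`classifyDbgConversion` are hypothetical names, and the pointer-compatibility
case is left out.

```cpp
enum class DbgConversion { NoOp, Truncate, Extend };

// OldBits: width of the value being replaced (what the variable expects).
// NewBits: width of the replacement value.
DbgConversion classifyDbgConversion(unsigned OldBits, unsigned NewBits) {
  if (NewBits == OldBits)
    return DbgConversion::NoOp;
  // A wider replacement has extra high bits that the debug expression
  // must discard to describe the original value.
  if (NewBits > OldBits)
    return DbgConversion::Truncate;
  // A narrower replacement must be zero- or sign-extended.
  return DbgConversion::Extend;
}
```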

Testing:

- check-llvm, check-clang, a stage2 `-g -O3` build of clang,
  regression/unit testing.
- This resolves a number of mis-sized dbg.value diagnostics from
  Debugify.

Differential Revision: https://reviews.llvm.org/D48676

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336451 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] add more tests with poison and undef; NFC

As discussed in D48987 and D48893, there are many different
ways to go wrong depending on the binop (and as shown here
we already do go wrong in some cases).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336450 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Fix UBSan error caused by r335942

Summary: Fixes PR38071.

Reviewers: arsenm, dstenb

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48979

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336448 91177308-0d34-0410-b5e6-96231b3b80d8

[Constants] extend getBinOpIdentity(); NFC

The enhanced version will be used in D48893 and related patches
and an almost identical (fadd is different) version is proposed
in D28907, so adding this as a preliminary step.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336444 91177308-0d34-0410-b5e6-96231b3b80d8

[Constant] add undef element query for vector constants; NFC

This is likely to be used in D48987 and similar patches,
so adding it as an NFC preliminary step.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336442 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] ParallelDSP: added statistics, NFC.

Added statistics for the number of SMLAD instructions created, and
also renamed the pass to -arm-parallel-dsp.

Differential Revision: https://reviews.llvm.org/D48971

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336441 91177308-0d34-0410-b5e6-96231b3b80d8

Commit rL336426 caused buildbot failures

http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/50537/testReport/junit/LLVM/CodeGen_AArch64/FoldRedundantShiftedMasking_ll/

This removes the comments of the function label causing this error.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336440 91177308-0d34-0410-b5e6-96231b3b80d8

[LoopSink] Make the enforcement of determinism deterministic.

LoopBlockNumber is a DenseMap<BasicBlock*, int>, comparing the result of
find() will compare a pair<BasicBlock*, int>. That's of course depending
on pointer ordering which varies from run to run. Reverse iteration
doesn't find this because we're copying to a vector first.

This bug has been there since 2016 but only recently showed up on clang
selfhost with FDO and ThinLTO, which is also why I didn't manage to get
a reasonable test case for this. Add an assert that would've caught
this.
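
A minimal sketch of the idea, assuming `std::unordered_map` in place of
DenseMap and a hypothetical `Block` type in place of BasicBlock. The assert
here only checks that both blocks are numbered; it is not necessarily the
assert added by this commit.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <vector>

struct Block { const char *Name; }; // hypothetical stand-in for BasicBlock

using BlockNumbering = std::unordered_map<const Block *, int>;

void sortByLoopBlockNumber(std::vector<const Block *> &Blocks,
                           const BlockNumbering &Numbers) {
  std::stable_sort(
      Blocks.begin(), Blocks.end(), [&](const Block *A, const Block *B) {
        auto AIt = Numbers.find(A), BIt = Numbers.find(B);
        assert(AIt != Numbers.end() && BIt != Numbers.end() &&
               "block missing from numbering");
        // Compare the numbers only. Comparing *AIt < *BIt would compare
        // (pointer, number) pairs, so the pointer value decides first and
        // the order can differ from run to run.
        return AIt->second < BIt->second;
      });
}
```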

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336439 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-mca] A write latency cannot be a negative value. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336437 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Armv8.4-A: TLB support

This adds:
- outer shareable TLB Maintenance instructions, and
- TLB range maintenance instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336434 91177308-0d34-0410-b5e6-96231b3b80d8

[dsymutil] Emit label at the beginning of a CU

When emitting a CU, store the MCSymbol pointing to the beginning of the
CU. We'll need this information later when emitting the .debug_names
section (DWARF5 accelerator table).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336433 91177308-0d34-0410-b5e6-96231b3b80d8

Recommit: [AArch64] Armv8.4-A: Flag manipulation instructions

Now with the asm operand definition included.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336432 91177308-0d34-0410-b5e6-96231b3b80d8

Added missing semicolon

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336428 91177308-0d34-0410-b5e6-96231b3b80d8

[SelectionDAG] https://reviews.llvm.org/D48278

D48278

Allow reducing redundant shift masks.
For example:
x1 = x & 0xAB00
x2 = (x >> 8) & 0xAB

can be reduced to:
x1 = x & 0xAB00
x2 = x1 >> 8
It only allows folding when the masks and shift values are constants.
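
The equivalence in the example can be checked exhaustively for 16-bit inputs.
The sketch below assumes unsigned values and a logical right shift; the
function names are illustrative only.

```cpp
#include <cstdint>

// Original form: shift first, then mask with 0xAB.
std::uint32_t maskAfterShift(std::uint32_t X) { return (X >> 8) & 0xABu; }

// Reduced form: reuse x1 = x & 0xAB00 and shift it.
std::uint32_t shiftAfterMask(std::uint32_t X) {
  std::uint32_t X1 = X & 0xAB00u;
  return X1 >> 8;
}

// Returns true if both forms agree on every 16-bit input.
bool formsAgreeOn16BitInputs() {
  for (std::uint32_t X = 0; X <= 0xFFFFu; ++X)
    if (maskAfterShift(X) != shiftAfterMask(X))
      return false;
  return true;
}
```

The two forms agree because `0xAB00 >> 8 == 0xAB`: shifting the masked value
gives the same bits as masking the shifted value.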

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336426 91177308-0d34-0410-b5e6-96231b3b80d8

Revert [AArch64] Armv8.4-A: Flag manipulation instructions

It's causing build errors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336422 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Armv8.4-A: Flag manipulation instructions

These instructions are added to AArch64 only.

Differential Revision: https://reviews.llvm.org/D48926

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336421 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-mca] improve the instruction issue logic implemented by the Scheduler.

This patch modifies the Scheduler heuristic used to select the next instruction
to issue to the pipelines.

The motivating example is test X86/BtVer2/add-sequence.s, for which llvm-mca
wrongly reported an estimated IPC of 1.50. According to perf, the actual IPC for
that test should have been ~2.00.
It turns out that an IPC of 2.00 for test add-sequence.s cannot possibly be
predicted by a Scheduler that only prioritizes instructions based on their
"age". A similar issue also affected test X86/BtVer2/dependent-pmuld-paddd.s,
for which llvm-mca wrongly estimated an IPC of 0.84 instead of an IPC of 1.00.

Instructions in the ReadyQueue are now ranked based on two factors:
- The "age" of an instruction.
- The number of unique users of writes associated with an instruction.

The new logic still prioritizes older instructions over younger instructions to
minimize the pressure on the reorder buffer. However, the number of users of an
instruction now also affects the overall rank. This potentially increases the
ability of the Scheduler to extract instruction level parallelism. This patch
fixes the problem with the wrong IPC reported for test add-sequence.s and test
dependent-pmuld-paddd.s.
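
One way to combine the two factors, as a sketch only: this message does not
give the formula, so the rank below (source index minus number of users,
lower is better) and the names `ReadyInst`, `rankOf` and `selectNext` are
assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

struct ReadyInst {
  unsigned SourceIndex; // smaller index = older instruction
  unsigned NumUsers;    // unique users of the instruction's writes
};

// Assumed rank: older instructions and instructions with more users
// both get a lower (better) value.
long rankOf(const ReadyInst &I) {
  return static_cast<long>(I.SourceIndex) - static_cast<long>(I.NumUsers);
}

// Returns the position of the best-ranked instruction in a non-empty
// queue. On equal rank the older instruction wins.
std::size_t selectNext(const std::vector<ReadyInst> &Queue) {
  std::size_t Best = 0;
  for (std::size_t I = 1; I < Queue.size(); ++I) {
    long R = rankOf(Queue[I]), BestR = rankOf(Queue[Best]);
    if (R < BestR ||
        (R == BestR && Queue[I].SourceIndex < Queue[Best].SourceIndex))
      Best = I;
  }
  return Best;
}
```

A purely age-based scheduler would always pick the smallest `SourceIndex`;
with the user count included, a slightly younger instruction that unblocks
several dependents can be issued first.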

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336420 91177308-0d34-0410-b5e6-96231b3b80d8

CallGraphSCCPass: iterate over all functions.

Previously we only iterated over functions reachable from the set of
external functions in the module. But since some of the passes under
this (notably the always-inliner and coroutine lowerer) are required for
correctness, they need to run over everything.

This just adds an extra layer of iteration over the CallGraph to keep
track of which functions we've already visited and get the next batch of
SCCs.
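
The extra layer of iteration can be sketched like this. It is a
simplification: a plain depth-first walk over an adjacency list replaces the
SCC batches, and `CallGraphSketch`, `visitReachable` and `visitAllFunctions`
are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Node N calls the nodes listed in G[N].
using CallGraphSketch = std::vector<std::vector<std::size_t>>;

// Visit everything reachable from Root that has not been seen yet.
void visitReachable(const CallGraphSketch &G, std::size_t Root,
                    std::vector<bool> &Visited,
                    std::vector<std::size_t> &Order) {
  std::vector<std::size_t> Stack{Root};
  Visited[Root] = true;
  while (!Stack.empty()) {
    std::size_t N = Stack.back();
    Stack.pop_back();
    Order.push_back(N);
    for (std::size_t Callee : G[N])
      if (!Visited[Callee]) {
        Visited[Callee] = true;
        Stack.push_back(Callee);
      }
  }
}

// Requires a non-empty graph and a valid ExternalRoot index.
std::vector<std::size_t> visitAllFunctions(const CallGraphSketch &G,
                                           std::size_t ExternalRoot) {
  std::vector<bool> Visited(G.size(), false);
  std::vector<std::size_t> Order;
  // Previous behaviour: only this first traversal.
  visitReachable(G, ExternalRoot, Visited, Order);
  // Extra layer: start again from every function not yet visited.
  for (std::size_t N = 0; N < G.size(); ++N)
    if (!Visited[N])
      visitReachable(G, N, Visited, Order);
  return Order;
}
```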

Should fix PR38029.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336419 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64][ARM] Armv8.4-A: Trace synchronization barrier instruction

This adds the Armv8.4-A Trace synchronization barrier (TSB) instruction.

Differential Revision: https://reviews.llvm.org/D48918

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336418 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Remove FMA4 scalar intrinsics. Use llvm.fma intrinsic instead.

The intrinsics can be implemented with a f32/f64 llvm.fma intrinsic and an insert into a zero vector.

There are a couple regressions here due to SelectionDAG not being able to pull an fneg through an extract_vector_elt. I'm not super worried about this though as InstCombine should be able to do it before we get to SelectionDAG.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336416 91177308-0d34-0410-b5e6-96231b3b80d8

[Support] Make support types more easily printable.

Summary:
Error's new operator<< is the first way to print an error without consuming it.

formatv() can now print objects with an operator<< that works with raw_ostream.

Reviewers: bkramer

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D48966

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336412 91177308-0d34-0410-b5e6-96231b3b80d8

Reapply: "objdump: Support newer ObjC image info flags"

Summary:
Add support for two additional ObjC image info flags: `IS_SIMULATED` and
`HAS_CATEGORY_CLASS_PROPERTIES`.

`IS_SIMULATED` indicates a Mach-O binary built for iOS simulator.

`HAS_CATEGORY_CLASS_PROPERTIES` indicates a Mach-O binary built by a compiler
that supports class properties in categories.

Reviewers: enderby, compnerd

Reviewed By: compnerd

Subscribers: keith, llvm-commits

Differential Revision: https://reviews.llvm.org/D48568

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336411 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done"

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336410 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Remove all of the avx512 masked packed fma intrinsics. Use llvm.fma or unmasked 512-bit intrinsics with rounding mode.

This upgrades all of the intrinsics to use fneg instructions to convert fma into fmsub/fnmsub/fnmadd/fmsubadd. And uses a select instruction for masking.

This matches how clang uses the intrinsics these days.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336409 91177308-0d34-0410-b5e6-96231b3b80d8