OSDN Git Service
Julian Lettner [Tue, 26 Feb 2019 19:03:26 +0000 (19:03 +0000)]
[lit] Allow setting parallelism groups to None
Check that we do not crash if a parallelism group is explicitly set to
None. Permits usage of the following pattern.
[lit.common.cfg]
  lit_config.parallelism_groups['my_group'] = None
  if <condition>:
    lit_config.parallelism_groups['my_group'] = 3
[project/lit.cfg]
  config.parallelism_group = 'my_group'
Reviewers: rnk
Differential Revision: https://reviews.llvm.org/D58305
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354912
91177308-0d34-0410-b5e6-
96231b3b80d8
Kristina Brooks [Tue, 26 Feb 2019 18:53:13 +0000 (18:53 +0000)]
Update docs of memcpy/move/set wrt. align and len
Fix https://bugs.llvm.org/show_bug.cgi?id=38583: Describe
how memcpy/memmove/memset behave when len=0. Also fix
some fallout from when the alignment parameter was
replaced by an attribute.
This closes PR38583.
Patch by RalfJung (Ralf)
Differential Revision: https://reviews.llvm.org/D57600
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354911
91177308-0d34-0410-b5e6-
96231b3b80d8
Andrew Ng [Tue, 26 Feb 2019 18:50:49 +0000 (18:50 +0000)]
[TableGen] Make OpcodeMappings sort comparator deterministic NFCI
The previous sort comparator was not deterministic, i.e. in some
situations it would be possible for lhs < rhs && rhs < lhs. This was
discovered by an STL assertion in a Windows debug build of llvm-tblgen.
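As an illustration of the property at stake, here is a minimal sketch in Python (the real comparator is C++ inside llvm-tblgen and is not reproduced here). Both comparators and the sample data are hypothetical; the point is only that a valid "less than" must never report a < b and b < a at the same time.

```python
from itertools import permutations

def broken_less(lhs, rhs):
    # Hypothetical faulty comparator: "less" if EITHER field is smaller,
    # so (1, 2) < (2, 1) and (2, 1) < (1, 2) both hold.
    return lhs[0] < rhs[0] or lhs[1] < rhs[1]

def lexicographic_less(lhs, rhs):
    # Compare the first field and fall back to the second only on a tie.
    if lhs[0] != rhs[0]:
        return lhs[0] < rhs[0]
    return lhs[1] < rhs[1]

def is_asymmetric(less, items):
    # A strict weak ordering never reports a < b and b < a together.
    return not any(less(a, b) and less(b, a)
                   for a, b in permutations(items, 2))

ITEMS = [(1, 2), (2, 1), (1, 1)]
```

Calling is_asymmetric(broken_less, ITEMS) exposes the violation, while the lexicographic version passes the same check.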
Differential Revision: https://reviews.llvm.org/D58687
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354910
91177308-0d34-0410-b5e6-
96231b3b80d8
Sanjay Patel [Tue, 26 Feb 2019 18:26:56 +0000 (18:26 +0000)]
[InstSimplify] remove zero-shift-guard fold for general funnel shift
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2019-February/130491.html
We can't remove the compare+select in the general case because
we are treating funnel shift like a standard instruction (as
opposed to a special instruction like select/phi).
That means that if one of the operands of the funnel shift is
poison, the result is poison regardless of whether we know that
the operand is actually unused based on the instruction's
particular semantics.
The motivating case for this transform is the more specific
rotate op (rather than funnel shift), and we are preserving the
fold for that case because there is no chance of introducing
extra poison when there is no anonymous extra operand to the
funnel shift.
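To make the operand roles concrete, the following is a rough Python model of a left funnel shift on fixed-width integers. The function names are illustrative, and the model cannot express poison; it only shows that the guarded and unguarded forms agree numerically, and that a rotate has no second data operand that could carry extra poison.

```python
def fshl(x, y, shift, bits=32):
    # Funnel shift left: concatenate x (high half) and y (low half),
    # shift left by shift modulo bits, and keep the high half.
    mask = (1 << bits) - 1
    s = shift % bits
    wide = ((x & mask) << bits) | (y & mask)
    return ((wide << s) >> bits) & mask

def guarded_fshl(x, y, shift, bits=32):
    # The compare+select guard: return x directly for a zero shift.
    return x & ((1 << bits) - 1) if shift % bits == 0 else fshl(x, y, shift, bits)

def rotl(x, shift, bits=32):
    # Rotate is the funnel shift with both data operands equal.
    return fshl(x, x, shift, bits)
```

With a zero shift amount, fshl(x, y, 0) equals x whatever y is, which is why the guard looks redundant on values; in IR, y is still an operand of the general funnel shift, so a poison y would poison the result.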
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354905
91177308-0d34-0410-b5e6-
96231b3b80d8
Petar Avramovic [Tue, 26 Feb 2019 17:22:42 +0000 (17:22 +0000)]
[MIPS GlobalISel] Select G_UADDO
Lower G_UADDO.
Legalize G_UADDO for MIPS32
Differential Revision: https://reviews.llvm.org/D58671
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354900
91177308-0d34-0410-b5e6-
96231b3b80d8
Ganesh Gopalasubramanian [Tue, 26 Feb 2019 16:55:10 +0000 (16:55 +0000)]
[X86] AMD znver2 enablement
This patch enables the following:
1) AMD family 17h "znver2" tune flag (-march, -mcpu).
2) ISAs that are enabled for "znver2" architecture.
3) For the time being, it uses the znver1 scheduler model.
4) Tests are updated.
5) Scheduler descriptions are yet to be put in place.
Reviewers: craig.topper
Differential Revision: https://reviews.llvm.org/D58343
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354897
91177308-0d34-0410-b5e6-
96231b3b80d8
Jonas Paulsson [Tue, 26 Feb 2019 16:47:59 +0000 (16:47 +0000)]
[SystemZ] Wait with selection of legal vector/FP constants until Select().
This patch aims to make sure that any such constant that can be generated
with a vector instruction (for example VGBM) is recognized as such during
legalization and kept as a target independent node through post-legalize
DAGCombining.
Two new functions named isVectorConstantLegal() and loadVectorConstant()
replace old ways of handling vector/FP constants.
A new struct named SystemZVectorConstantInfo is used to cache the results of
isVectorConstantLegal() and pass them onto loadVectorConstant().
Support for fp128 constants in the presence of FeatureVectorEnhancements1
(z14) has been added.
Review: Ulrich Weigand
https://reviews.llvm.org/D58270
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354896
91177308-0d34-0410-b5e6-
96231b3b80d8
Sanjay Patel [Tue, 26 Feb 2019 16:44:08 +0000 (16:44 +0000)]
[InstSimplify] add tests for rotate; NFC
Rotate is a special-case of funnel shift that has different
poison constraints than the general case. That's not visible
yet in the existing tests, but it needs to be corrected.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354894
91177308-0d34-0410-b5e6-
96231b3b80d8
Sanjay Patel [Tue, 26 Feb 2019 15:25:42 +0000 (15:25 +0000)]
[InstCombine] remove duplicate (but not updated) tests; NFC
Not sure how it happened, but rL354886 was a duplicate of rL354881,
but not updated with rL354887.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354889
91177308-0d34-0410-b5e6-
96231b3b80d8
Sanjay Patel [Tue, 26 Feb 2019 15:18:49 +0000 (15:18 +0000)]
[InstCombine] canonicalize more unsigned saturated add with 'not'
Yet another pattern variation suggested by:
https://bugs.llvm.org/show_bug.cgi?id=14613
There are 8 more potential commuted patterns here on top of the
8 that were already handled (rL354221, rL354276, rL354393).
We have the obvious commute of the 'add' + commute of the cmp
predicate/operands (ugt/ult) + commute of the select operands:
Name: base
%notx = xor i32 %x, -1
%a = add i32 %notx, %y
%c = icmp ult i32 %x, %y
%r = select i1 %c, i32 -1, i32 %a
=>
%c2 = icmp ult i32 %a, %y
%r = select i1 %c2, i32 -1, i32 %a
Name: ugt
%notx = xor i32 %x, -1
%a = add i32 %notx, %y
%c = icmp ugt i32 %y, %x
%r = select i1 %c, i32 -1, i32 %a
=>
%c2 = icmp ult i32 %a, %y
%r = select i1 %c2, i32 -1, i32 %a
Name: commute select
%notx = xor i32 %x, -1
%a = add i32 %notx, %y
%c = icmp ult i32 %y, %x
%r = select i1 %c, i32 %a, i32 -1
=>
%c2 = icmp ult i32 %a, %y
%r = select i1 %c2, i32 -1, i32 %a
Name: ugt + commute select
%notx = xor i32 %x, -1
%a = add i32 %notx, %y
%c = icmp ugt i32 %x, %y
%r = select i1 %c, i32 %a, i32 -1
=>
%c2 = icmp ult i32 %a, %y
%r = select i1 %c2, i32 -1, i32 %a
https://rise4fun.com/Alive/den
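The "base" pattern can also be checked exhaustively at a small bit width. The sketch below is a Python model on 8-bit values, not the InstCombine code; the function names are made up for illustration. It confirms that the original select and the canonical select both compute the unsigned saturating sum of ~x and y.

```python
BITS = 8
MASK = (1 << BITS) - 1  # all ones, i.e. -1 viewed as an unsigned value

def source_pattern(x, y):
    notx = x ^ MASK              # %notx = xor %x, -1
    a = (notx + y) & MASK        # %a = add %notx, %y (wrapping)
    return MASK if x < y else a  # select (icmp ult %x, %y), -1, %a

def canonical_pattern(x, y):
    notx = x ^ MASK
    a = (notx + y) & MASK
    return MASK if a < y else a  # select (icmp ult %a, %y), -1, %a

def saturating_add(p, q):
    # Unsigned saturating add: clamp the exact sum to the maximum value.
    return min(p + q, MASK)
```

The equivalence follows because ~x + y wraps exactly when y > x, and a wrapped unsigned sum is always smaller than either operand.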
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354887
91177308-0d34-0410-b5e6-
96231b3b80d8
Sanjay Patel [Tue, 26 Feb 2019 15:18:44 +0000 (15:18 +0000)]
[InstCombine] add more tests for saturated add; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354886
91177308-0d34-0410-b5e6-
96231b3b80d8
Nirav Dave [Tue, 26 Feb 2019 15:02:32 +0000 (15:02 +0000)]
[DAG] Fix constant store folding to handle non-byte sizes.
Avoid crashes from zero-byte values due to sub-byte store sizes.
Reviewers: uabelho, courbet, rnk
Reviewed By: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58626
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354884
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Atanasyan [Tue, 26 Feb 2019 14:45:17 +0000 (14:45 +0000)]
[mips] Emit `.module softfloat` directive
This change fixes a crash on an assertion when using the
`soft float` ABI for the mips32r6 target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354882
91177308-0d34-0410-b5e6-
96231b3b80d8
Sanjay Patel [Tue, 26 Feb 2019 14:40:23 +0000 (14:40 +0000)]
[InstCombine] add more tests for saturated add; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354881
91177308-0d34-0410-b5e6-
96231b3b80d8
Andrea Di Biagio [Tue, 26 Feb 2019 14:19:00 +0000 (14:19 +0000)]
[MCA] Always check if scheduler resources are unavailable when reporting dispatch stalls.
Dispatch stall cycles may be associated with multiple dispatch stall events.
Before this patch, each stall cycle was associated with a single stall event.
This patch also improves a couple of code comments, and adds a helper method to
query the Scheduler for dispatch stalls.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354877
91177308-0d34-0410-b5e6-
96231b3b80d8
George Rimar [Tue, 26 Feb 2019 14:14:49 +0000 (14:14 +0000)]
[yaml2obj][obj2yaml] - Add support for the architecture specific dynamic tags.
This allows tools to parse/dump the architecture specific tags
like DT_MIPS_*, DT_PPC64_* and DT_HEXAGON_*
Also fixes a bug in DynamicTags.def which was revealed in this patch.
Differential revision: https://reviews.llvm.org/D58667
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354876
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Tue, 26 Feb 2019 13:22:35 +0000 (13:22 +0000)]
[AArch64] Add arithmetic zext bswap tests.
As requested on D58017.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354872
91177308-0d34-0410-b5e6-
96231b3b80d8
Xing GUO [Tue, 26 Feb 2019 13:06:16 +0000 (13:06 +0000)]
[llvm-objdump] Add `Version Definitions` dumper
Summary: `llvm-objdump` needs a `Version Definitions` dumper.
Reviewers: grimar, jhenderson
Reviewed By: grimar, jhenderson
Subscribers: rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58615
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354871
91177308-0d34-0410-b5e6-
96231b3b80d8
Igor Kudrin [Tue, 26 Feb 2019 12:15:14 +0000 (12:15 +0000)]
[llvm-objdump] Implement -Mreg-names-raw/-std options.
The --disassembler-options, or -M, are used to customize
the disassembler and affect its output.
The two implemented options allow selecting register names on ARM:
* With -Mreg-names-raw, the disassembler uses rNN for all registers.
* With -Mreg-names-std it prints sp, lr and pc for r13, r14 and r15,
which is the default behavior of llvm-objdump.
Differential Revision: https://reviews.llvm.org/D57680
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354870
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Tue, 26 Feb 2019 12:04:37 +0000 (12:04 +0000)]
[AArch64] Add 'free' zext bswap tests.
As requested on D58017.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354869
91177308-0d34-0410-b5e6-
96231b3b80d8
Luke Cheeseman [Tue, 26 Feb 2019 12:02:12 +0000 (12:02 +0000)]
[ARM] Add Cortex-M35P
- Add LLVM backend support for Cortex-M35P
- Documentation can be found at
https://developer.arm.com/products/processors/cortex-m/cortex-m35p
Differential Revision: https://reviews.llvm.org/D57763
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354868
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Tue, 26 Feb 2019 11:44:23 +0000 (11:44 +0000)]
[LegalizeDAG] Use APInt::getSplat helper to create bitreverse masks. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354867
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Tue, 26 Feb 2019 11:27:53 +0000 (11:27 +0000)]
[LegalizeDAG] Expand SADDO/SSUBO using SADDSAT/SSUBSAT (PR37763)
If SADDSAT/SSUBSAT are legal, then we can expand SADDO/SSUBO by performing an ADD/SUB and a SADDSAT/SSUBSAT and then comparing the results.
I looked at doing this for UADDO/USUBO as well but as we don't have to do as many range comparisons I didn't see any/much benefit.
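The idea of detecting signed overflow by comparing a wrapping add against a saturating add can be sketched in Python on 8-bit values. This is only a model of the arithmetic, with hypothetical helper names, not the LegalizeDAG expansion itself.

```python
BITS = 8
INT_MIN = -(1 << (BITS - 1))
INT_MAX = (1 << (BITS - 1)) - 1

def wrapping_add(a, b):
    # Two's-complement ADD: keep the low BITS bits, reinterpret as signed.
    r = (a + b) & ((1 << BITS) - 1)
    return r - (1 << BITS) if r > INT_MAX else r

def saturating_add(a, b):
    # SADDSAT: clamp the exact sum to the representable range.
    return max(INT_MIN, min(INT_MAX, a + b))

def saddo(a, b):
    # Overflow occurred exactly when the wrapped and saturated sums differ.
    result = wrapping_add(a, b)
    return result, result != saturating_add(a, b)
```

For example, saddo(100, 100) yields the wrapped result -56 with the overflow flag set, while saddo(100, 27) yields 127 with no overflow.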
Differential Revision: https://reviews.llvm.org/D58637
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354866
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Tue, 26 Feb 2019 11:01:08 +0000 (11:01 +0000)]
[AMDGPU] Regenerate bswap/bitreverse tests.
Make codegen changes more obvious in D58017
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354863
91177308-0d34-0410-b5e6-
96231b3b80d8
Clement Courbet [Tue, 26 Feb 2019 10:54:45 +0000 (10:54 +0000)]
[llvm-exegesis] Teach llvm-exegesis to handle instructions with multiple tied variables.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58285
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354862
91177308-0d34-0410-b5e6-
96231b3b80d8
Eugene Leviant [Tue, 26 Feb 2019 09:24:22 +0000 (09:24 +0000)]
[llvm-objcopy] Add --set-start, --change-start and --adjust-start
Differential revision: https://reviews.llvm.org/D58173
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354854
91177308-0d34-0410-b5e6-
96231b3b80d8
Eugene Leviant [Tue, 26 Feb 2019 07:38:21 +0000 (07:38 +0000)]
[ThinLTO] Use defined node and edge order when dumping DOT file
Differential revision: https://reviews.llvm.org/D58631
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354850
91177308-0d34-0410-b5e6-
96231b3b80d8
Vlad Tsyrklevich [Tue, 26 Feb 2019 07:04:56 +0000 (07:04 +0000)]
Revert "Improve "llvm-nm -f sysv" output for Elf files"
This reverts commit r354833, it was causing ASan test failures on
sanitizer-x86_64-linux-fast.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354849
91177308-0d34-0410-b5e6-
96231b3b80d8
Chen Zheng [Tue, 26 Feb 2019 05:46:45 +0000 (05:46 +0000)]
[NFC] Add to contributor list.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354847
91177308-0d34-0410-b5e6-
96231b3b80d8
Dan Gohman [Tue, 26 Feb 2019 05:20:19 +0000 (05:20 +0000)]
[WebAssembly] Properly align fp128 arguments in outgoing varargs arguments
For outgoing varargs arguments, it's necessary to check the OrigAlign field
of the corresponding OutputArg entry to determine argument alignment, rather
than just computing an alignment from the argument value type. This is
because types like fp128 are split into multiple argument values, with
narrower types that don't reflect the ABI alignment of the full fp128.
This fixes the printf("printfL: %4.*Lf\n", 2, lval); testcase.
Differential Revision: https://reviews.llvm.org/D58656
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354846
91177308-0d34-0410-b5e6-
96231b3b80d8
Philip Reames [Tue, 26 Feb 2019 04:30:33 +0000 (04:30 +0000)]
[ARM] Be super conservative about atomics
As requested during review of D57601 <https://reviews.llvm.org/D57601>, be equally conservative for atomic MMOs as for volatile MMOs in all in-tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that.
Differential Revision: https://reviews.llvm.org/D58490
Note: D58498 landed in several pieces as individual backends were approved. This is the last chunk.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354845
91177308-0d34-0410-b5e6-
96231b3b80d8
Heejin Ahn [Tue, 26 Feb 2019 04:08:49 +0000 (04:08 +0000)]
[WebAssembly] Fix a bug deleting instruction in a ranged for loop
Summary: We shouldn't delete elements while iterating a ranged for loop.
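The same hazard exists outside C++. As an analogy only (the actual fix is in the C++ WebAssembly backend, where erasing during a range-based for invalidates the iterator), this Python sketch shows a removal loop that silently skips elements and a safe variant that iterates over a snapshot. The instruction names are hypothetical.

```python
def remove_dead_unsafe(instrs):
    # Buggy: mutating the list while iterating over it shifts the
    # remaining elements, so the element after each removal is skipped.
    for inst in instrs:
        if inst.startswith("dead"):
            instrs.remove(inst)
    return instrs

def remove_dead_safe(instrs):
    # Safe: iterate over a snapshot and mutate the original list.
    for inst in list(instrs):
        if inst.startswith("dead"):
            instrs.remove(inst)
    return instrs
```

On the input ["dead1", "dead2", "live"], the unsafe version leaves "dead2" behind, while the safe version removes both dead entries.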
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58519
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354844
91177308-0d34-0410-b5e6-
96231b3b80d8
Heejin Ahn [Tue, 26 Feb 2019 03:29:59 +0000 (03:29 +0000)]
[WebAssembly] Improve readability of EH tests
Summary:
- Indent check lines to easily figure out try-catch-end structure
- Add the original C++ code the tests were generated from
- Add a few more lines to make the structure more readable
- Rename a couple function / structures
- Add label and branch annotations to cfg-stackify-eh.ll
- Temporarily delete check lines for `test1` in `cfg-stackify-eh.ll`
because it will be updated in a later CL soon and there's no point in
making it look better here
Reviewers: dschuff
Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58562
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354842
91177308-0d34-0410-b5e6-
96231b3b80d8
Aaron Smith [Tue, 26 Feb 2019 03:23:56 +0000 (03:23 +0000)]
[CodeView] Emit HasConstructorOrDestructor class option for non-trivial constructors
Reviewers: zturner, rnk, llvm-commits, aleksandr.urakov
Reviewed By: zturner, rnk
Subscribers: jdoerfert, majnemer, asmith
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D44406
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354841
91177308-0d34-0410-b5e6-
96231b3b80d8
Reid Kleckner [Tue, 26 Feb 2019 02:30:00 +0000 (02:30 +0000)]
[llvm-cov] Fix llvm-cov on Windows and un-XFAIL test
Summary:
The llvm-cov tool needs to be able to find coverage names in the
executable, so the .lprfn and .lcovmap sections cannot be merged into
.rdata.
Also, the linker merges .lprfn$M into .lprfn, so llvm-cov needs to
handle that when looking up sections. It has to support running on both
relocatable object files and linked PE files.
Lastly, when loading .lprfn from a PE file, llvm-cov needs to skip the
leading zero byte added by the profile runtime.
Reviewers: vsk
Subscribers: hiraditya, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D58661
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354840
91177308-0d34-0410-b5e6-
96231b3b80d8
Reid Kleckner [Tue, 26 Feb 2019 02:11:25 +0000 (02:11 +0000)]
[X86] Fix bug in x86_intrcc with arg copy elision
Summary:
Use a custom calling convention handler for interrupts instead of fixing
up the locations in LowerMemArgument. This way, the offsets are correct
when constructed and we don't need to account for them in as many
places.
Depends on D56883
Replaces D56275
Reviewers: craig.topper, phil-opp
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D56944
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354837
91177308-0d34-0410-b5e6-
96231b3b80d8
Sunil Srivastava [Tue, 26 Feb 2019 00:19:39 +0000 (00:19 +0000)]
Improve "llvm-nm -f sysv" output for Elf files
Specifically, compute and print the Type and Section columns.
Differential Revision: https://reviews.llvm.org/D58263
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354833
91177308-0d34-0410-b5e6-
96231b3b80d8
Stanislav Mekhanoshin [Mon, 25 Feb 2019 22:59:55 +0000 (22:59 +0000)]
[AMDGPU] Added target to mir test. NFC.
The test was run without -mcpu, although the tested instructions
are not available on all ASICs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354830
91177308-0d34-0410-b5e6-
96231b3b80d8
Matt Arsenault [Mon, 25 Feb 2019 22:24:13 +0000 (22:24 +0000)]
RegBankSelect: Handle slightly more complex value mappings
Try to use concat_vectors. Also remove unnecessary assert on
pointers. Fixes asserting for <4 x s16> operations and 64-bit pointers
for AMDGPU.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354828
91177308-0d34-0410-b5e6-
96231b3b80d8
Matt Arsenault [Mon, 25 Feb 2019 21:32:48 +0000 (21:32 +0000)]
AMDGPU/GlobalISel: Fix bit ops for non-power-of-2 sizes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354825
91177308-0d34-0410-b5e6-
96231b3b80d8
Roman Lebedev [Mon, 25 Feb 2019 21:11:19 +0000 (21:11 +0000)]
Revert "[Support] Make raw_string_ostream unbuffered"
Shame on me, did not run all the tests, bots are angry.
This reverts commit r354819.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354822
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Mon, 25 Feb 2019 21:05:09 +0000 (21:05 +0000)]
[LangRef] *.overflow intrinsics now support vectors
We have all the necessary legalization, expansion and unrolling support required for the *.overflow intrinsics with vector types, so update the docs to make that clear.
Note: vectorization is not in place yet (the non-homogeneous return types aren't well supported), so we must still explicitly use the vector intrinsics and not rely on slp/loop.
Differential Revision: https://reviews.llvm.org/D58618
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354821
91177308-0d34-0410-b5e6-
96231b3b80d8
Roman Lebedev [Mon, 25 Feb 2019 20:51:49 +0000 (20:51 +0000)]
[Support] Make raw_string_ostream unbuffered
Summary:
In D58580 i have noted that `llvm::to_string()` is a memory hog.
It uses `raw_string_ostream`, and since it was buffered,
every `raw_string_ostream` had a cost of `BUFSIZ` bytes
(which is `8192` at least here). So every `llvm::to_string()`
call, even to just print an `int`, cost `8192` bytes.
In D58580, getting rid of that buffering //had// significant
performance and memory consumption improvements for `llvm-xray convert`.
Similarly, in D58580 @rnk pointed out that the `raw_svector_ostream`
is already unbuffered, and `write_unsigned_impl` and friends
do internal buffering. So it should be ok performance-wise to just
make the `raw_string_ostream` itself unbuffered.
Here, i don't have any perf measurements.
Another letdown is that i'm leaving a loose-end - not deleting the
`flush()` method. I don't expect that cleanup to be anything more
than just fixing every new compiler error, but i'm presently unable
to do that. Will look into that later.
Reviewers: rnk, zturner
Reviewed By: rnk
Subscribers: kristina, jdoerfert, llvm-commits, rnk
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58643
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354819
91177308-0d34-0410-b5e6-
96231b3b80d8
Matt Arsenault [Mon, 25 Feb 2019 20:46:06 +0000 (20:46 +0000)]
AMDGPU/GlobalISel: Clamp max implicit_def elements
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354818
91177308-0d34-0410-b5e6-
96231b3b80d8
Matt Arsenault [Mon, 25 Feb 2019 20:29:04 +0000 (20:29 +0000)]
RegisterScavenger: Allow fail without spill
AMDGPU wants to use this in some contexts where
the spilling is either impossible, or a worse alternative
to doing something else.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354816
91177308-0d34-0410-b5e6-
96231b3b80d8
Matt Arsenault [Mon, 25 Feb 2019 20:16:11 +0000 (20:16 +0000)]
AMDGPU: Remove IntrReadMem from memtime/memrealtime intrinsics
EarlyCSE with MemorySSA was able to use this to merge multiple calls
with no intervening store.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354814
91177308-0d34-0410-b5e6-
96231b3b80d8
Matt Arsenault [Mon, 25 Feb 2019 20:00:25 +0000 (20:00 +0000)]
GlobalISel: Make legalizer/regbankselect clear NoPHIs property
If no phi existed in the original MIR and these introduced one, the
verifier would fail.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354813
91177308-0d34-0410-b5e6-
96231b3b80d8
Craig Topper [Mon, 25 Feb 2019 19:42:47 +0000 (19:42 +0000)]
[X86] Improve detection of unneeded shift amount masking to also handle the case that the LHS has known zeroes in it
If the LHS has known zeros, the RHS immediate will have had bits removed. So call computeKnownBits to get the known zeroes so we can handle this case.
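A simplified way to picture the check, assuming a 32-bit shift that only reads the low 5 bits of its amount: the AND on the shift amount is unneeded if every one of those bits is either kept by the mask or already known to be zero in the value being masked. The helper below is a hypothetical Python sketch of that condition, not the X86 backend code.

```python
def mask_is_redundant(and_mask, known_zero, shift_bits=5):
    # Bits the shift instruction actually reads (low 5 for a 32-bit shift).
    needed = (1 << shift_bits) - 1
    # A needed bit is unaffected by the AND if the mask keeps it, or if
    # the masked value is already known to have a zero there.
    return (and_mask | known_zero) & needed == needed
```

For instance, if the amount is y << 2, its low two bits are known zero and an original mask of 31 may have been reduced to 28; mask_is_redundant(28, 0b11) still recognises the mask as removable, whereas mask_is_redundant(28, 0) does not.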
Differential Revision: https://reviews.llvm.org/D58475
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354811
91177308-0d34-0410-b5e6-
96231b3b80d8
Andrea Di Biagio [Mon, 25 Feb 2019 19:33:58 +0000 (19:33 +0000)]
Fix a sign compare warning breaking the -Werror build.
The warning was introduced at r354793.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354810
91177308-0d34-0410-b5e6-
96231b3b80d8
Matt Arsenault [Mon, 25 Feb 2019 19:24:46 +0000 (19:24 +0000)]
AMDGPU: Correct definitions for bitset instructions
These really read and write the result register, so these need a tied
input.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354809
91177308-0d34-0410-b5e6-
96231b3b80d8
Nikita Popov [Mon, 25 Feb 2019 18:54:17 +0000 (18:54 +0000)]
[Mips] Fix missing masking in fast-isel of br (PR40325)
Fixes https://bugs.llvm.org/show_bug.cgi?id=40325 by zero extending
(and x, 1) the condition before branching on it.
To avoid regressing trivial cases, I'm combining emission of cmp+br
sequences for the single-use + same block case (similar to what we
do in x86). icmpbr1.ll still regresses due to the cross-bb usage
of the condition.
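Why the masking matters can be shown with a small Python model, assuming the i1 condition lives in a wider register whose upper bits are unspecified. The two functions are illustrative stand-ins for the branch test before and after the fix, not the fast-isel code.

```python
def branch_taken_unmasked(reg):
    # Buggy lowering: branch whenever the whole register is non-zero.
    return reg != 0

def branch_taken_masked(reg):
    # Fixed lowering: zero-extend the i1 by masking with 1 first.
    return (reg & 1) != 0
```

A register holding 2 represents a false i1 (bit 0 is clear), yet the unmasked test takes the branch; the masked test does not.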
Differential Revision: https://reviews.llvm.org/D58576
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354808
91177308-0d34-0410-b5e6-
96231b3b80d8
Amara Emerson [Mon, 25 Feb 2019 18:52:54 +0000 (18:52 +0000)]
[AArch64][GlobalISel] Refactor selectBuildVector to use MachineIRBuilder. NFC.
This is a preparatory change as I want to use emitScalarToVector() elsewhere,
and in general we want to transition to MIRBuilder instead of using BuildMI
directly.
Differential Revision: https://reviews.llvm.org/D58528
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354807
91177308-0d34-0410-b5e6-
96231b3b80d8
Philip Reames [Mon, 25 Feb 2019 17:36:10 +0000 (17:36 +0000)]
[Lanai] Be super conservative about atomics
As requested during review of D57601 <https://reviews.llvm.org/D57601>, be equally conservative for atomic MMOs as for volatile MMOs in all in-tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that.
Reviewed as part of https://reviews.llvm.org/D58490, with other backends still pending review.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354800
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Mon, 25 Feb 2019 16:31:58 +0000 (16:31 +0000)]
[SelectionDAG] Add demanded elts variants to isConstOrConstSplat helpers. NFCI.
These helpers extend the existing isConstOrConstSplat helper checks to support DemandedElts masks as well.
We already had a local version of this in SelectionDAG that computeKnownBits/ComputeNumSignBits made use of, but this adds the functionality directly to the BuildVectorSDNode node and extends isConstOrConstSplat etc. to use that.
This will allow us to reuse the functionality in SimplifyDemandedVectorElts/SimplifyDemandedBits.
Differential Revision: https://reviews.llvm.org/D58503
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354797
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Mon, 25 Feb 2019 16:02:01 +0000 (16:02 +0000)]
[DAGCombine] Add undef shuffle elt support to partitionShuffleOfConcats
Support undef shuffle mask indices in the shuffle(concat_vectors, concat_vectors) -> concat_vectors fold
Differential Revision: https://reviews.llvm.org/D58585
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354793
91177308-0d34-0410-b5e6-
96231b3b80d8
David Green [Mon, 25 Feb 2019 15:50:54 +0000 (15:50 +0000)]
[ARM] Add some more missing T1 opcodes for the peephole optimiser
This adds a few extra Thumb1 opcodes to improve the peephole optimiser's
ability to remove redundant cmp instructions. tADC and tSBC require
a small fixup to prevent MOVS being moved past the instruction, giving
the wrong flags.
Differential Revision: https://reviews.llvm.org/D58281
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354791
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Mon, 25 Feb 2019 15:42:02 +0000 (15:42 +0000)]
[Vectorizer] Add vectorization support for fixed smul/umul intrinsics
This requires a couple of tweaks to existing vectorization functions as they were assuming that only the second call argument (ctlz/cttz/powi) could ever be the 'always scalar' argument, but for smul.fix + umul.fix it's the third argument.
Differential Revision: https://reviews.llvm.org/D58616
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354790
91177308-0d34-0410-b5e6-
96231b3b80d8
Luke Cheeseman [Mon, 25 Feb 2019 15:08:27 +0000 (15:08 +0000)]
[AArch64] Add support for Cortex-A76 and Cortex-A76AE
- Add LLVM backend support for Cortex-A76 and Cortex-A76AE
- Documentation can be found at
https://developer.arm.com/products/processors/cortex-a/cortex-a76
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354788
91177308-0d34-0410-b5e6-
96231b3b80d8
Eugene Leviant [Mon, 25 Feb 2019 14:12:41 +0000 (14:12 +0000)]
[llvm-objcopy] Add --add-symbol
Differential revision: https://reviews.llvm.org/D58234
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354787
91177308-0d34-0410-b5e6-
96231b3b80d8
Dmitri Gribenko [Mon, 25 Feb 2019 13:41:59 +0000 (13:41 +0000)]
Fixed typos in tests: s/CHEKC/CHECK/
Reviewers: ilya-biryukov
Subscribers: nemanjai, javed.absar, jsji, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D58611
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354785
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Mon, 25 Feb 2019 13:30:23 +0000 (13:30 +0000)]
[TTI] Add generic cost model for smul/umul overflow intrinsics
Based off smul/umul fixed costs and the implementation in TargetLowering::expandMULO.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354784
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Mon, 25 Feb 2019 13:26:30 +0000 (13:26 +0000)]
[SLPVectorizer][X86] Add fixed smul/umul tests
Baseline tests - fixed mul intrinsics aren't flagged as vectorizable yet
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354783
91177308-0d34-0410-b5e6-
96231b3b80d8
Xing GUO [Mon, 25 Feb 2019 13:13:19 +0000 (13:13 +0000)]
[llvm-objdump] Add `Version References` dumper
Summary: Add symbol version dumper for [#30241](https://bugs.llvm.org/show_bug.cgi?id=30241)
Reviewers: jhenderson, MaskRay, kristina, emaste, grimar
Reviewed By: jhenderson, grimar
Subscribers: grimar, rupprecht, jakehehrlich, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D54697
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354782
91177308-0d34-0410-b5e6-
96231b3b80d8
Dmitri Gribenko [Mon, 25 Feb 2019 13:12:33 +0000 (13:12 +0000)]
Fixed typos in tests: s/CEHCK/CHECK/
Reviewers: ilya-biryukov
Subscribers: sanjoy, sdardis, javed.absar, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58608
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354781
91177308-0d34-0410-b5e6-
96231b3b80d8
Ganesh Gopalasubramanian [Mon, 25 Feb 2019 12:27:49 +0000 (12:27 +0000)]
Test commit (remove a blank space)
Change-Id: I69175571d3b1defeb85e96fdd87db5c3ccadcb63
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354775
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Mon, 25 Feb 2019 11:59:23 +0000 (11:59 +0000)]
[TTI] Add generic cost model for fixed point smul/umul
Based on an IR equivalent of target lowering's generic expansion - target specific costs will typically be lower (IR doesn't have a good mull/mulh equivalent) but we need a baseline.
Differential Revision: https://reviews.llvm.org/D57925
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354774
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Mon, 25 Feb 2019 11:19:37 +0000 (11:19 +0000)]
[X86] Merge ISD::ADD/SUB nodes into X86ISD::ADD/SUB equivalents (PR40483)
Avoid ADD/SUB instruction duplication by reusing the X86ISD::ADD/SUB results.
Includes ADD commutation - I tried to include NEG+SUB SUB commutation as well but this causes regressions as we don't have good combine coverage to simplify X86ISD::SUB.
Differential Revision: https://reviews.llvm.org/D58597
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354771
91177308-0d34-0410-b5e6-
96231b3b80d8
James Henderson [Mon, 25 Feb 2019 11:02:24 +0000 (11:02 +0000)]
[yaml2obj]Re-allow dynamic sections to have raw content
Recently, support was added to yaml2obj to allow dynamic sections to
have a list of entries, to make it easier to write tests with dynamic
sections. However, this change also removed the ability to provide
custom contents to the dynamic section, making it hard to test
malformed contents (e.g. because the section is not a valid size to
contain an array of entries). This change reinstates this. An error is
emitted if raw content and dynamic entries are both specified.
Reviewed by: grimar, ruiu
Differential Review: https://reviews.llvm.org/D58543
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354770
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Tatham [Mon, 25 Feb 2019 10:39:53 +0000 (10:39 +0000)]
[ARM] Make fullfp16 instructions not conditionalisable.
More or less all the instructions defined in the v8.2a full-fp16
extension are defined as UNPREDICTABLE if you put them in an IT block
(Thumb) or use with any condition other than AL (ARM). LLVM didn't
know that, and was happy to conditionalise them.
In order to force these instructions to count as not predicable, I had
to make a small Tablegen change. The code generation back end mostly
decides if an instruction was predicable by looking for something it
can identify as a predicate operand; there's an isPredicable bit flag
that overrides that check in the positive direction, but nothing that
overrides it in the negative direction.
(I considered the alternative approach of actually removing the
predicate operand from those instructions, but thought that it would
be more painful overall for instructions differing only in data type
to have different shapes of operand list. This way, the only code that
has to notice the difference is the if-converter.)
So I've added an isUnpredicable bit alongside isPredicable, and set
that bit on the right subset of FP16 instructions, and also on the
VSEL, VMAXNM/VMINNM and VRINT[ANPM] families which should be
unpredicable for all data types.
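As a rough illustration of how the two override bits interact, here is a minimal Python sketch. It is not the TableGen code itself; the function and parameter names are made up, and giving the negative override priority when both bits are set is an assumption.

```python
def is_predicable(has_predicate_operand: bool,
                  is_predicable_bit: bool = False,
                  is_unpredicable_bit: bool = False) -> bool:
    """Decide whether an instruction counts as predicable (illustrative only)."""
    # Negative override: never predicable, even with a predicate operand.
    # Assumption: this wins if both bits happen to be set.
    if is_unpredicable_bit:
        return False
    # Default heuristic (predicate operand found) or the positive override.
    return has_predicate_operand or is_predicable_bit
```

A full-fp16 instruction keeps its predicate operand, so the operand list has the same shape as its f32/f64 siblings, yet it is reported as not predicable.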
I've included a couple of representative regression tests, both of
which previously caused an fp16 instruction to be conditionalised in
ARM state and (with -arm-no-restrict-it) to be put in an IT block in
Thumb.
Reviewers: SjoerdMeijer, t.p.northover, efriedma
Reviewed By: efriedma
Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57823
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354768
91177308-0d34-0410-b5e6-
96231b3b80d8
Roman Lebedev [Mon, 25 Feb 2019 09:36:12 +0000 (09:36 +0000)]
[llvm-exegesis] Split Epsilon param into two (PR40787)
Summary:
This eps param is used for two distinct things:
* initial point clusterization
* checking clusters against the llvm values
What if one wants to look only at highly different clusters, without changing
the clustering itself? In particular, this helps to weed out noisy measurements
(since the clusterization epsilon is still small, there is a better chance
that noisy measurements from the same opcode will go into different clusters).
Splitting it into two params makes this possible.
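A toy sketch of the two roles, assuming one-dimensional measurements. The greedy neighbour clustering below is only a stand-in for the tool's real clustering, and the names `clustering_eps` and `inconsistency_eps` are illustrative, not actual flag names.

```python
def cluster_1d(values, clustering_eps):
    """Greedy 1-D clustering: sorted neighbours within eps share a cluster."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= clustering_eps:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters


def inconsistent_clusters(values, llvm_value, clustering_eps, inconsistency_eps):
    """Return clusters whose centroid is further than inconsistency_eps from llvm_value."""
    result = []
    for cluster in cluster_1d(values, clustering_eps):
        centroid = sum(cluster) / len(cluster)
        if abs(centroid - llvm_value) > inconsistency_eps:
            result.append(cluster)
    return result


measured = [1.0, 1.05, 1.5, 3.0, 3.05]
# Same clustering in both calls; only the reporting threshold differs.
strict = inconsistent_clusters(measured, 1.0, clustering_eps=0.1, inconsistency_eps=0.1)
relaxed = inconsistent_clusters(measured, 1.0, clustering_eps=0.1, inconsistency_eps=1.0)
```

Raising only the second epsilon leaves the clusters untouched and just hides the ones that differ mildly from the expected value.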
This is nearly-free performance-wise:
Old:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):
390.01 msec task-clock # 0.998 CPUs utilized ( +- 0.25% )
12 context-switches # 31.735 M/sec ( +- 27.38% )
0 cpu-migrations # 0.000 K/sec
4745 page-faults # 12183.732 M/sec ( +- 0.54% )
1562711900 cycles # 4012303.327 GHz ( +- 0.24% ) (82.90%)
185567822 stalled-cycles-frontend # 11.87% frontend cycles idle ( +- 0.52% ) (83.30%)
392106234 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.31% ) (33.79%)
1839236666 instructions # 1.18 insn per cycle
# 0.21 stalled cycles per insn ( +- 0.15% ) (50.37%)
407035764 branches # 1045074878.710 M/sec ( +- 0.12% ) (66.80%)
10896459 branch-misses # 2.68% of all branches ( +- 0.17% ) (83.20%)
0.390629 +- 0.000972 seconds time elapsed ( +- 0.25% )
```
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs):
6803.36 msec task-clock # 0.999 CPUs utilized ( +- 0.96% )
262 context-switches # 38.546 M/sec ( +- 23.06% )
0 cpu-migrations # 0.065 M/sec ( +- 76.03% )
13287 page-faults # 1953.206 M/sec ( +- 0.32% )
27252537904 cycles # 4006024.257 GHz ( +- 0.95% ) (83.31%)
1496314935 stalled-cycles-frontend # 5.49% frontend cycles idle ( +- 0.97% ) (83.32%)
16128404524 stalled-cycles-backend # 59.18% backend cycles idle ( +- 0.30% ) (33.37%)
17611143370 instructions # 0.65 insn per cycle
# 0.92 stalled cycles per insn ( +- 0.05% ) (50.04%)
3894906599 branches # 572537147.437 M/sec ( +- 0.03% ) (66.69%)
116314514 branch-misses # 2.99% of all branches ( +- 0.20% ) (83.35%)
6.8118 +- 0.0689 seconds time elapsed ( +- 1.01%)
```
New:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs):
400.14 msec task-clock # 0.998 CPUs utilized ( +- 0.66% )
12 context-switches # 29.429 M/sec ( +- 25.95% )
0 cpu-migrations # 0.100 M/sec ( +-100.00% )
4714 page-faults # 11796.496 M/sec ( +- 0.55% )
1603131306 cycles # 4011840.105 GHz ( +- 0.66% ) (82.85%)
199538509 stalled-cycles-frontend # 12.45% frontend cycles idle ( +- 2.40% ) (83.10%)
402249109 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.19% ) (34.05%)
1847783963 instructions # 1.15 insn per cycle
# 0.22 stalled cycles per insn ( +- 0.18% ) (50.64%)
407162722 branches # 1018925730.631 M/sec ( +- 0.12% ) (67.02%)
10932779 branch-misses # 2.69% of all branches ( +- 0.51% ) (83.28%)
0.40077 +- 0.00267 seconds time elapsed ( +- 0.67% )
lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs):
6947.79 msec task-clock # 1.000 CPUs utilized ( +- 0.90% )
217 context-switches # 31.236 M/sec ( +- 36.16% )
1 cpu-migrations # 0.096 M/sec ( +- 50.00% )
13258 page-faults # 1908.389 M/sec ( +- 0.34% )
27830796523 cycles # 4006032.286 GHz ( +- 0.89% ) (83.30%)
1504554006 stalled-cycles-frontend # 5.41% frontend cycles idle ( +- 2.10% ) (83.32%)
16716574843 stalled-cycles-backend # 60.07% backend cycles idle ( +- 0.65% ) (33.38%)
17755545931 instructions # 0.64 insn per cycle
# 0.94 stalled cycles per insn ( +- 0.09% ) (50.04%)
3897255686 branches # 560980426.597 M/sec ( +- 0.06% ) (66.70%)
117045395 branch-misses # 3.00% of all branches ( +- 0.47% ) (83.34%)
6.9507 +- 0.0627 seconds time elapsed ( +- 0.90% )
```
I.e. it's a +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps.
Within noise, I'd say.
Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 | PR40787 ]].
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58476
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354767
91177308-0d34-0410-b5e6-
96231b3b80d8
Roman Lebedev [Mon, 25 Feb 2019 07:39:07 +0000 (07:39 +0000)]
[XRay][tools] Revert "Use Support/JSON.h in llvm-xray convert"
Summary:
This reverts D50129 / rL338834: [XRay][tools] Use Support/JSON.h in llvm-xray convert
Abstractions are great.
Readable code is great.
JSON support library is a *good* idea.
However, there is an internal detail that one needs
to be aware of in `llvm::json::Object`: it uses `llvm::DenseMap`.
So for **every** `llvm::json::Object`, even if you only store a single `int`
entry there, you pay the whole price of `llvm::DenseMap`.
Unfortunately, it matters for `llvm-xray`.
I was trying to analyse the `llvm-exegesis` analysis mode performance,
and for that I wanted to view the LLVM X-Ray log visualization in the Chrome
trace viewer. But `llvm-xray convert` was sluggish, and sometimes
even ended up being killed by the OOM killer.
`xray-log.llvm-exegesis.lwZ0sT` was acquired from `llvm-exegesis`
(compiled with `-fxray-instruction-threshold=128`)
analysis mode over `-benchmarks-file` with 10099 points (one full
latency measurement set), with normal runtime of 0.387s.
Timings:
Old: (copied from D58580)
```
$ perf stat -r 5 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT
Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (5 runs):
21346.24 msec task-clock # 1.000 CPUs utilized ( +- 0.28% )
314 context-switches # 14.701 M/sec ( +- 59.13% )
1 cpu-migrations # 0.037 M/sec ( +-100.00% )
2181354 page-faults # 102191.251 M/sec ( +- 0.02% )
85477442102 cycles # 4004415.019 GHz ( +- 0.28% ) (83.33%)
14526427066 stalled-cycles-frontend # 16.99% frontend cycles idle ( +- 0.70% ) (83.33%)
32371533721 stalled-cycles-backend # 37.87% backend cycles idle ( +- 0.27% ) (33.34%)
67896890228 instructions # 0.79 insn per cycle
# 0.48 stalled cycles per insn ( +- 0.03% ) (50.00%)
14592654840 branches # 683631198.653 M/sec ( +- 0.02% ) (66.67%)
212207534 branch-misses # 1.45% of all branches ( +- 0.94% ) (83.34%)
21.3502 +- 0.0585 seconds time elapsed ( +- 0.27% )
```
New:
```
$ perf stat -r 9 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT
Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (9 runs):
7178.38 msec task-clock # 1.000 CPUs utilized ( +- 0.26% )
182 context-switches # 25.402 M/sec ( +- 28.84% )
0 cpu-migrations # 0.046 M/sec ( +- 70.71% )
33701 page-faults # 4694.994 M/sec ( +- 0.88% )
28761053971 cycles # 4006833.933 GHz ( +- 0.26% ) (83.32%)
2028297997 stalled-cycles-frontend # 7.05% frontend cycles idle ( +- 1.61% ) (83.32%)
10773154901 stalled-cycles-backend # 37.46% backend cycles idle ( +- 0.38% ) (33.36%)
36199132874 instructions # 1.26 insn per cycle
# 0.30 stalled cycles per insn ( +- 0.03% ) (50.02%)
6434504227 branches # 896420204.421 M/sec ( +- 0.03% ) (66.68%)
73355176 branch-misses # 1.14% of all branches ( +- 1.46% ) (83.33%)
7.1807 +- 0.0190 seconds time elapsed ( +- 0.26% )
```
So using `llvm::json` nearly triples run-time on that test case.
(+3x is times, not percent.)
Memory:
Old:
```
total runtime: 39.88s.
bytes allocated in total (ignoring deallocations): 79.07GB (1.98GB/s)
calls to allocation functions: 33267816 (834135/s)
temporary memory allocations: 5832298 (146235/s)
peak heap memory consumption: 9.21GB
peak RSS (including heaptrack overhead): 147.98GB
total memory leaked: 1.09MB
```
New:
```
total runtime: 17.42s.
bytes allocated in total (ignoring deallocations): 5.12GB (293.86MB/s)
calls to allocation functions:
21382982 (
1227284/s)
temporary memory allocations: 232858 (13364/s)
peak heap memory consumption: 350.69MB
peak RSS (including heaptrack overhead): 2.55GB
total memory leaked: 79.95KB
```
Diff:
```
total runtime: -22.46s.
bytes allocated in total (ignoring deallocations): -73.95GB (3.29GB/s)
calls to allocation functions: -11884834 (529155/s)
temporary memory allocations: -5599440 (249307/s)
peak heap memory consumption: -8.86GB
peak RSS (including heaptrack overhead): 0B
total memory leaked: -1.01MB
```
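For reference, the old/new ratios can be recomputed from the figures quoted above with a few lines of Python. The variable names are illustrative, and treating 1 GB as 1024 MB is an assumption about heaptrack's units.

```python
# Figures copied from the perf stat and heaptrack output above.
old = {
    "runtime_s": 21.3502,            # perf stat, seconds elapsed
    "peak_heap_mb": 9.21 * 1024,     # 9.21GB, assuming 1 GB = 1024 MB
    "bytes_allocated_gb": 79.07,
    "allocation_calls": 33267816,
    "temporary_allocations": 5832298,
}
new = {
    "runtime_s": 7.1807,
    "peak_heap_mb": 350.69,
    "bytes_allocated_gb": 5.12,
    "allocation_calls": 21382982,
    "temporary_allocations": 232858,
}

# Ratio of old (llvm::json) to new (reverted code) for each metric.
ratios = {name: old[name] / new[name] for name in old}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The number of allocation calls changes far less than the number of bytes allocated, so the two should not be confused when reading the summary.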
So using `llvm::json` increases *peak* heap memory consumption on *this* testcase ~27x,
and total bytes allocated ~15x. Both of these numbers are times, *not* percent.
And note that memory usage is clearly unbounded with `llvm::json`: it directly depends
on the length of the log, so peak memory consumption keeps growing with the input.
This isn't so with the dumb code: there is no accumulating memory consumption,
and peak memory consumption is fixed. Naturally, that means it will handle *much*
larger logs without OOM'ing.
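The difference between the two approaches can be sketched in Python as below. This is only an analogy using the standard `json` module, not the `llvm-xray` code; the record layout and the function names are assumptions.

```python
import io
import json


def convert_accumulating(records, out):
    """Build the whole document in memory, then serialise it in one go.

    Memory use grows with the number of records.
    """
    events = [{"name": name, "ts": ts} for name, ts in records]
    json.dump({"traceEvents": events}, out)


def convert_streaming(records, out):
    """Write each event as soon as it is read.

    Only one event object is alive at a time, whatever the log length.
    """
    out.write('{"traceEvents": [')
    for index, (name, ts) in enumerate(records):
        if index:
            out.write(", ")
        out.write(json.dumps({"name": name, "ts": ts}))
    out.write("]}")


records = [("f", 1), ("g", 2), ("f", 3)]
accumulated, streamed = io.StringIO(), io.StringIO()
convert_accumulating(records, accumulated)
convert_streaming(iter(records), streamed)
```

Both functions emit equivalent JSON, but the streaming one accepts a plain iterator and never holds more than the current record.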
Readability is good, but the price is simply unacceptable here.
Too bad none of this analysis was done as part of the development/review of D50129 itself.
Reviewers: dberris, kpw, sammccall
Reviewed By: dberris
Subscribers: riccibruno, hans, courbet, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58584
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354764
91177308-0d34-0410-b5e6-
96231b3b80d8
Craig Topper [Mon, 25 Feb 2019 03:11:44 +0000 (03:11 +0000)]
[SelectionDAG] Add a OPC_CheckChild2CondCode to SelectionDAGISel to remove a MoveChild and MoveParent pair.
OPC_CheckCondCode is always used as operand 2 of a setcc. And it's always surrounded by a MoveChild2 and a MoveParent. By having a dedicated opcode for this case we can reduce the number of bytes needed for this pattern from 4 bytes to 2.
This saves ~3000 bytes in the X86 table.
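The 4-to-2 byte saving can be pictured with a toy peephole over a symbolic matcher table. This is a hypothetical Python model in which every list element stands for one table byte; the real table is emitted by TableGen and is not rewritten this way.

```python
def fuse_child2_condcode(table):
    """Rewrite [MoveChild2, CheckCondCode, cc, MoveParent] as [CheckChild2CondCode, cc]."""
    fused = []
    i = 0
    while i < len(table):
        if (i + 3 < len(table)
                and table[i] == "OPC_MoveChild2"
                and table[i + 1] == "OPC_CheckCondCode"
                and table[i + 3] == "OPC_MoveParent"):
            # One dedicated opcode plus the condition code operand: 2 entries.
            fused += ["OPC_CheckChild2CondCode", table[i + 2]]
            i += 4
        else:
            fused.append(table[i])
            i += 1
    return fused


before = ["OPC_MoveChild2", "OPC_CheckCondCode", "ISD::SETEQ", "OPC_MoveParent"]
after = fuse_child2_condcode(before)
```

Each occurrence of the pattern shrinks from four entries to two, which adds up over the many setcc patterns in a large table.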
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354763
91177308-0d34-0410-b5e6-
96231b3b80d8
Kang Zhang [Mon, 25 Feb 2019 02:46:16 +0000 (02:46 +0000)]
[PowerPC] Enhance the fast selection of fptoi & fptrunc instruction and clean up related asserts
Summary:
Fast selection of the LLVM fptoi & fptrunc instructions does not handle
VSX instruction support well.
We should use the VSX float-to-integer conversion instruction instead of the non-VSX
one if the operand register class is VSSRC or VSFRC, because i32
and i64 are mapped to VSSRC and VSFRC respectively if the VSX feature is
enabled.
For the float trunc instruction, we do similar work to the float-to-integer
conversion and try to use the VSX instruction.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D58430
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354762
91177308-0d34-0410-b5e6-
96231b3b80d8
Nikita Popov [Sun, 24 Feb 2019 21:55:37 +0000 (21:55 +0000)]
[InstCombine] Add tests for PR40846; NFC
The icmps are the same as the overflow result of the intrinsic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354760
91177308-0d34-0410-b5e6-
96231b3b80d8
Nikita Popov [Sun, 24 Feb 2019 21:55:31 +0000 (21:55 +0000)]
[InstCombine] Move with.overflow tests to separate file; NFC
And regenerate checks. I had to rename some variables, because
update_test_checks can't deal with the same variable names used
in lower and upper case. I've also dropped the result type aliases,
as just using the type directly gives a cleaner result.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354759
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Sun, 24 Feb 2019 21:13:29 +0000 (21:13 +0000)]
[X86] Add PR40483 test cases
Demonstrate failure to merge ISD::ADD(x,y)/X86ISD::ADD(x,y) + ISD::SUB(x,y)/X86ISD::SUB(x,y) equivalent ops
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354758
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Sun, 24 Feb 2019 19:57:52 +0000 (19:57 +0000)]
[X86] Combine zext(packus(x),packus(y)) -> concat(x,y) (PR39637)
It's proving tricky to combine shuffles across multiple vector sizes, so for now I'm adding this more specific combine - the pattern is common enough to be worth it as a first step.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354757
91177308-0d34-0410-b5e6-
96231b3b80d8
Craig Topper [Sun, 24 Feb 2019 19:33:37 +0000 (19:33 +0000)]
[X86] Fix tls variable lowering issue with large code model
Summary:
The problem here is the lowering of TLS variables. Below is the DAG for the code.
SelectionDAG has 11 nodes:
t0: ch = EntryToken
t8: i64,ch = load<(load 8 from `i8 addrspace(257)* null`, addrspace 257)> t0, Constant:i64<0>, undef:i64
t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]
t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64
t12: i64 = add t8, t11
t4: i32,ch = load<(dereferenceable load 4 from @x)> t0, t12, undef:i64
t6: ch = CopyToReg t0, Register:i32 %0, t4
And when mcmodel is large, the instructions below can NOT be folded.
t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]
t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64
So "t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64" is lowered to " Morphed node: t11: i64,ch = MOV64rm<Mem:(load 8 from got)> t10, TargetConstant:i8<1>, Register:i64 $noreg, TargetConstant:i32<0>, Register:i32 $noreg, t0"
When LLVM starts to lower "t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]", it fails.
The patch folds the load and X86ISD::WrapperRIP.
Fixes PR26906
Patch by LuoYuanke
Reviewers: craig.topper, rnk, annita.zhang, wxiao3
Reviewed By: rnk
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58336
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354756
91177308-0d34-0410-b5e6-
96231b3b80d8
Craig Topper [Sun, 24 Feb 2019 19:23:41 +0000 (19:23 +0000)]
[X86][SSE] Use pblendw for v4i32/v2i64 during isel.
Summary:
Previously we used BLENDPS/BLENDPD but that puts the blend in the FP domain. Under optsize, the two address instruction pass can cause blendps/blendpd to commute to blendps/blendpd. But we probably shouldn't do that if the original type was an integer. So use pblendw instead.
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58574
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354755
91177308-0d34-0410-b5e6-
96231b3b80d8
Craig Topper [Sun, 24 Feb 2019 19:23:39 +0000 (19:23 +0000)]
[X86] Correct some ADC/SBB with immediate scheduler data for Broadwell and Skylake.
Summary:
The AX/EAX/RAX with immediate forms are 2 uops just like the AL with immediate.
The modrm form with r8 and immediate is a single uop just like r16/r32/r64 with immediate.
Reviewers: RKSimon, andreadb
Reviewed By: RKSimon
Subscribers: gbedwell, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58581
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354754
91177308-0d34-0410-b5e6-
96231b3b80d8
Craig Topper [Sun, 24 Feb 2019 19:23:36 +0000 (19:23 +0000)]
[LegalizeTypes][AArch64][X86] Make type legalization of vector (S/U)ADD/SUB/MULO follow getSetCCResultType for the overflow bits. Make UnrollVectorOverflowOp properly convert from scalar boolean contents to vector boolean contents
Summary:
When promoting the overflow vector for these ops we should use the target's desired setcc result type. This way a v8i32 result type will use a v8i32 overflow vector instead of a v8i16 overflow vector. A v8i16 overflow vector will cause LegalizeDAG/LegalizeVectorOps to have to use v8i32 and truncate to v8i16 in its expansion. By doing this in type legalization instead, we get the truncate into the DAG earlier and give DAG combine more of a chance to optimize it.
We also have to fix unrolling to use the scalar setcc result type for the scalarized operation, and convert it to the required vector element type after the scalar operation. We have to observe the vector boolean contents when doing this conversion. The previous code was just taking the scalar result and putting it in the vector. But for X86 and AArch64 that would have only put the boolean value in bit 0 of the element and left all other bits in the element 0. We need to ensure all bits in the element are the same. I'm using a select with constants here because that's what setcc unrolling in LegalizeVectorOps used.
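A small Python model of that conversion, assuming a target whose vector booleans are all-zeros or all-ones. The function name and the integer representation of elements are illustrative only.

```python
def scalar_bool_to_vector_element(overflow_bit: int, element_bits: int) -> int:
    """Model select(overflow, all-ones, 0) for one vector element.

    Assumes vector booleans are all-zeros or all-ones, so every bit of the
    element must agree with the scalar overflow flag.
    """
    all_ones = (1 << element_bits) - 1
    return all_ones if overflow_bit & 1 else 0


# Copying the scalar directly would give 0x00000001 for a 32-bit element;
# the select gives 0xFFFFFFFF instead.
element = scalar_bool_to_vector_element(1, 32)
```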
Reviewers: spatel, RKSimon, nikic
Reviewed By: nikic
Subscribers: javed.absar, kristof.beyls, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58567
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354753
91177308-0d34-0410-b5e6-
96231b3b80d8
Sanjay Patel [Sun, 24 Feb 2019 17:31:15 +0000 (17:31 +0000)]
[InstCombine] add test for icmp+add fold; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354750
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Sun, 24 Feb 2019 17:30:06 +0000 (17:30 +0000)]
[X86][AVX] Rename lowerShuffleByMerging128BitLanes to lowerShuffleAsLanePermuteAndRepeatedMask. NFC.
Name better matches the other similar 'lane permute' and 'repeated mask' functions we have.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354749
91177308-0d34-0410-b5e6-
96231b3b80d8
Sanjay Patel [Sun, 24 Feb 2019 16:57:45 +0000 (16:57 +0000)]
[InstCombine] canonicalize add/sub with bool
add A, sext(B) --> sub A, zext(B)
We have to choose 1 of these forms, so I'm opting for the
zext because that's easier for value tracking.
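The equivalence is easy to check exhaustively for a small bit width. The sketch below models wrapping 8-bit arithmetic in Python; the helper names and the 8-bit default are choices made for illustration.

```python
def check_add_sext_equals_sub_zext(bits: int = 8) -> bool:
    """Check add A, sext(B) == sub A, zext(B) for every A and boolean B."""
    mask = (1 << bits) - 1
    for a in range(1 << bits):
        for b in (False, True):
            sext_b = mask if b else 0   # i1 true sign-extends to -1 (all ones)
            zext_b = 1 if b else 0      # i1 true zero-extends to 1
            if (a + sext_b) & mask != (a - zext_b) & mask:
                return False
    return True


equivalent = check_add_sext_equals_sub_zext()
```

Adding -1 and subtracting 1 agree modulo 2^bits, so the two forms compute the same value and only the canonical spelling changes.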
The backend should be prepared for this change after:
D57401
rL353433
This is also a preliminary step towards reducing the amount
of bit hackery that we do in IR to optimize icmp/select.
That should be waiting to happen at a later optimization stage.
The seeming regression in the fuzzer test was discussed in:
D58359
We were only managing that fold in instcombine by luck, and
other passes should be able to deal with that better anyway.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354748
91177308-0d34-0410-b5e6-
96231b3b80d8
Sanjay Patel [Sun, 24 Feb 2019 16:11:58 +0000 (16:11 +0000)]
[InstCombine] regenerate checks; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354747
91177308-0d34-0410-b5e6-
96231b3b80d8
Sanjay Patel [Sun, 24 Feb 2019 15:31:27 +0000 (15:31 +0000)]
[CGP] add special-cases to form unsigned add with overflow (PR40486)
There's likely a missed IR canonicalization for at least 1 of these
patterns. Otherwise, we wouldn't have needed the pattern-matching
enhancement in D57516.
Note that -- unlike usubo added with D57789 -- the TLI hook for
this transform defaults to 'on'. So if there's any perf fallout
from this, targets should look at how they're lowering the uaddo
node in SDAG and/or override that hook.
The x86 diffs suggest that there's some missing pattern-matching
for forming inc/dec.
This should fix the remaining known problems in:
https://bugs.llvm.org/show_bug.cgi?id=40486
https://bugs.llvm.org/show_bug.cgi?id=31754
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354746
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Sun, 24 Feb 2019 13:31:52 +0000 (13:31 +0000)]
Fix "enumeral and non-enumeral type in conditional expression" gcc7 warning. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354745
91177308-0d34-0410-b5e6-
96231b3b80d8
Heejin Ahn [Sun, 24 Feb 2019 08:30:06 +0000 (08:30 +0000)]
[WebAssembly] Rename a variable in CFGStackify (NFC)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354744
91177308-0d34-0410-b5e6-
96231b3b80d8
Heejin Ahn [Sun, 24 Feb 2019 08:19:55 +0000 (08:19 +0000)]
[WebAssembly] Merge two identical switch case routines into one (NFC)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354743
91177308-0d34-0410-b5e6-
96231b3b80d8
Philip Reames [Sun, 24 Feb 2019 00:45:09 +0000 (00:45 +0000)]
[Hexagon, SystemZ] Be super conservative about atomics
As requested during review of D57601, be equally conservative for atomic MMOs as for volatile MMOs in all in-tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that.
Reviewed as part of https://reviews.llvm.org/D58490, with other backends still pending review.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354740
91177308-0d34-0410-b5e6-
96231b3b80d8
Duncan P. N. Exon Smith [Sat, 23 Feb 2019 23:48:47 +0000 (23:48 +0000)]
VFS: Avoid some unnecessary std::string copies
Thread Twine a little deeper through the VFS to avoid unnecessarily
constructing the same std::string twice in a parameter sequence:
Twine -> std::string -> StringRef -> std::string
Changing a few parameters from StringRef to Twine avoids the early call
to `Twine::str()`.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354739
91177308-0d34-0410-b5e6-
96231b3b80d8
Craig Topper [Sat, 23 Feb 2019 21:41:44 +0000 (21:41 +0000)]
[TwoAddressInstructionPass] After commuting an instruction and before trying to look for more commutable operands, resample the number of operands.
The new instruction might have fewer operands than the original instruction. If we don't resample, the next loop iteration might read an operand that doesn't exist.
X86 can commute blends to movss/movsd which reduces from 4 operands to 3. This happened in the test case that caused r354363 & company to be reverted. A reduced version of that has been committed here.
Really this whole checking for more commutable operands is a little fragile. It assumes that the new instruction's operands are in the same order and positions as the original except for the pair that was swapped. I don't know of anything that breaks this assumption today, but I've left a fixme. Fixing this will likely require an interface change.
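The hazard can be illustrated with a toy loop in Python. This is not the pass itself; `visit_operands`, `blend_to_movss` and the operand lists are hypothetical stand-ins.

```python
def visit_operands(operands, commute):
    """Walk operand indices, re-reading the operand count after each commute."""
    visited = []
    idx = 0
    num_ops = len(operands)
    while idx < num_ops:
        visited.append(operands[idx])    # safe: idx < current operand count
        replacement = commute(operands, idx)
        if replacement is not None:
            operands = replacement
            num_ops = len(operands)      # resample: may be smaller than before
        idx += 1
    return operands, visited


def blend_to_movss(operands, idx):
    """Hypothetical commute: a 4-operand blend becomes a 3-operand move."""
    if idx == 1 and len(operands) == 4:
        return [operands[0], operands[2], operands[1]]
    return None


final_operands, seen = visit_operands(["dst", "a", "b", "imm"], blend_to_movss)
```

Had `num_ops` kept its original value of 4, the loop would have indexed operand 3 of a 3-operand list.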
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354738
91177308-0d34-0410-b5e6-
96231b3b80d8
Craig Topper [Sat, 23 Feb 2019 21:41:42 +0000 (21:41 +0000)]
Recommit r354363 "[X86][SSE] Generalize X86ISD::BLENDI support to more value types"
And its follow ups r354511, r354640.
A follow-up patch will fix the issue that caused it to be reverted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354737
91177308-0d34-0410-b5e6-
96231b3b80d8
Craig Topper [Sat, 23 Feb 2019 19:51:32 +0000 (19:51 +0000)]
Recommit r354647 and r354648 "[LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract"
r354648 was a follow up to fix a regression "[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit."
These were reverted in r354713 as their context depended on other patches that were reverted for a bug.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354734
91177308-0d34-0410-b5e6-
96231b3b80d8
Nikita Popov [Sat, 23 Feb 2019 18:59:01 +0000 (18:59 +0000)]
[WebAssembly] Fix select of and (PR40805)
Fixes https://bugs.llvm.org/show_bug.cgi?id=40805 introduced by
patterns added in D53676.
I'm removing the patterns entirely here, as they are not correct
in the general case. If necessary something more specific can be
added in the future.
Differential Revision: https://reviews.llvm.org/D58575
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354733
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Sat, 23 Feb 2019 18:53:03 +0000 (18:53 +0000)]
[X86][AVX] combineInsertSubvector - remove concat_vectors(load(x),load(x)) --> sub_vbroadcast(x)
D58053/rL354340 added this to EltsFromConsecutiveLoads directly
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354732
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Sat, 23 Feb 2019 18:49:02 +0000 (18:49 +0000)]
Fix MSVC constant truncation warnings. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354731
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Sat, 23 Feb 2019 18:34:05 +0000 (18:34 +0000)]
[X86][AVX] concat_vectors(scalar_to_vector(x),scalar_to_vector(x)) --> broadcast(x)
For AVX1, limit this to i32/f32/i64/f64 loading cases only.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354730
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Sat, 23 Feb 2019 17:10:47 +0000 (17:10 +0000)]
[X86][AVX] Shuffle->Permute+Blend if we have one v4f64/v4i64 shuffle input in place
Even on AVX1 we can pretty cheaply (VPERM2F128+VSHUFPD) permute a single v4f64/v4i64 input (on AVX2 it's just a single VPERMPD), followed by a BLENDPD.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354729
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Dardis [Sat, 23 Feb 2019 15:56:32 +0000 (15:56 +0000)]
[MIPS] Fix an incorrect test. (NFC)
This test is incorrect as it should be using the microMIPSR6 instruction to
return, not the microMIPS version.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354726
91177308-0d34-0410-b5e6-
96231b3b80d8