OSDN Git Service
Simon Pilgrim [Fri, 8 Jun 2018 17:58:42 +0000 (17:58 +0000)]
[X86][SSE] Support v8i16/v16i16 rotations
Extension to D46954 (PR37426), this patch adds support for v8i16/v16i16 rotations in a similar manner - the conversion of the shift/rotate amount to a multiplication factor and the use of PMULLW to shift left and PMULHUW (ISD::MULHU) to shift the wrapped bits back around to be ORd together.
Differential Revision: https://reviews.llvm.org/D47822
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334309
91177308-0d34-0410-b5e6-
96231b3b80d8
Sanjay Patel [Fri, 8 Jun 2018 17:54:28 +0000 (17:54 +0000)]
[x86] add tests for node-level FMF; NFC
These cases should be optimized using the change from D47911.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334308
91177308-0d34-0410-b5e6-
96231b3b80d8
Sanjay Patel [Fri, 8 Jun 2018 17:42:35 +0000 (17:42 +0000)]
[x86] regenerate test checks; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334307
91177308-0d34-0410-b5e6-
96231b3b80d8
Michael Berg [Fri, 8 Jun 2018 17:39:50 +0000 (17:39 +0000)]
Utilize new SDNode flag functionality to expand current support for fsub
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed for fsub.
Reviewers: spatel, hfinkel, wristow, arsenm
Reviewed By: spatel
Subscribers: wdng
Differential Revision: https://reviews.llvm.org/D47910
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334306
91177308-0d34-0410-b5e6-
96231b3b80d8
Florian Hahn [Fri, 8 Jun 2018 17:30:45 +0000 (17:30 +0000)]
[VPlan] Move recipe construction to VPRecipeBuilder.
This patch moves the recipe-creation functions out of
LoopVectorizationPlanner, which should do the high-level
orchestration of the transformations.
Reviewers: dcaballe, rengolin, hsaito, Ayal
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D47595
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334305
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Fri, 8 Jun 2018 17:00:45 +0000 (17:00 +0000)]
[X86][BtVer2] Add support for all SUB/XOR 32/64 scalar instructions that should match the dependency-breaking 'zero-idiom'
As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), these instructions are dependency breaking and fast-path zero the destination register (and appropriate EFLAGS bits).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334303
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Fri, 8 Jun 2018 16:40:15 +0000 (16:40 +0000)]
[X86] Fix schedule-x86_64.s tests to use different registers in reg-reg cases
Same fix as rL334110: I noticed while working on zero-idiom + dependency-breaking support (PR36671) that most of our binary instruction schedule tests were reusing the same src registers, which would cause the tests to fail once we enable scalar zero-idiom support on btver2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334302
91177308-0d34-0410-b5e6-
96231b3b80d8
Daniil Fukalov [Fri, 8 Jun 2018 16:29:04 +0000 (16:29 +0000)]
[AMDGPU] Inline asm - added i16, half and i128 types support
AMDGPU inline assembler support i16, half and i128 typed variables in constraints, but they were reported as error.
Needed to fix https://github.com/RadeonOpenCompute/ROCm/issues/341,
e.g. to be able to load with global_load_dwordx4 to a 128bit integer variable
Differential Revision: https://reviews.llvm.org/D44920
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334301
91177308-0d34-0410-b5e6-
96231b3b80d8
Daniil Fukalov [Fri, 8 Jun 2018 16:22:52 +0000 (16:22 +0000)]
reapply r334209 with fixes for harfbuzz in Chromium
r334209 description:
[LSR] Check yet more intrinsic pointer operands
the patch fixes another assertion in isLegalUse()
Differential Revision: https://reviews.llvm.org/D47794
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334300
91177308-0d34-0410-b5e6-
96231b3b80d8
Roman Lebedev [Fri, 8 Jun 2018 15:44:53 +0000 (15:44 +0000)]
[NFC][InstSimplify] SimplifyAddInst(): coding style: variable names.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334299
91177308-0d34-0410-b5e6-
96231b3b80d8
Roman Lebedev [Fri, 8 Jun 2018 15:44:47 +0000 (15:44 +0000)]
[InstSimplify] add nuw %x, -1 -> -1 fold.
Summary:
`%ret = add nuw i8 %x, C`
From [[ https://llvm.org/docs/LangRef.html#add-instruction | langref ]]:
nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”,
respectively. If the nuw and/or nsw keywords are present,
the result value of the add is a poison value if unsigned
and/or signed overflow, respectively, occurs.
So if `C` is `-1`, `%x` can only be `0`, and the result is always `-1`.
I'm not sure we want to use `KnownBits`/`LVI` here, because there is
exactly one possible value (all bits set, `-1`), so some other pass
should take care of replacing the known-all-ones with constant `-1`.
The `test/Transforms/InstCombine/set-lowbits-mask-canonicalize.ll` change *is* confusing.
What happening is, before this: (omitting `nuw` for simplicity)
1. First, InstCombine D47428/rL334127 folds `shl i32 1, %NBits`) to `shl nuw i32 -1, %NBits`
2. Then, InstSimplify D47883/rL334222 folds `shl nuw i32 -1, %NBits` to `-1`,
3. `-1` is inverted to `0`.
But now:
1. *This* InstSimplify fold `%ret = add nuw i32 %setbit, -1` -> `-1` happens first,
before InstCombine D47428/rL334127 fold could happen.
Thus we now end up with the opposite constant,
and it is all good: https://rise4fun.com/Alive/OA9
https://rise4fun.com/Alive/sldC
Was mentioned in D47428 review.
Follow-up for D47883.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47908
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334298
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Fri, 8 Jun 2018 15:43:00 +0000 (15:43 +0000)]
[X86][BtVer2] Remove SBB tests that were accidentally added in rL334296
These aren't true zero-idiom instructions (just dependency breaking).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334297
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Fri, 8 Jun 2018 15:28:43 +0000 (15:28 +0000)]
[X86][BtVer2] Add tests for scalar SUB/XOR instructions that should match the dependency-breaking 'zero-idiom'
As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334296
91177308-0d34-0410-b5e6-
96231b3b80d8
Alexander Kornienko [Fri, 8 Jun 2018 15:19:16 +0000 (15:19 +0000)]
commandLineFitsWithinSystemLimits Overestimates System Limits
Summary:
The function `llvm::sys::commandLineFitsWithinSystemLimits` appears to be overestimating the system limits. This issue was discovered while attempting to enable response files in the Swift compiler. When the compiler submits its frontend jobs, those jobs are subjected to the system limits on command line length. `commandLineFitsWithinSystemLimits` is used to determine if the job's arguments need to be wrapped in a response file. There are some cases where the argument size for the job passes `commandLineFitsWithinSystemLimits`, but actually exceeds the real system limit, and the job fails.
`clang` also uses this function to decide whether or not to wrap it's job arguments in response files. See: https://github.com/llvm-mirror/clang/blob/master/lib/Driver/Driver.cpp#L1341. Clang will also fail for response files who's size falls within a certain range. I wrote a script that should find a failure point for `clang++`. All that is needed to run it is Python 2.7, and a simple "hello world" program for `test.cc`. It should run on Linux and on macOS. The script is available here: https://gist.github.com/dabelknap/
71bd083cd06b91c5b3cef6a7f4d3d427. When it hits a failure point, you should see a `clang: error: unable to execute command: posix_spawn failed: Argument list too long`.
The proposed solution is to mirror the behavior of `xargs` in `commandLinefitsWithinSystemLimits`. `xargs` defaults to 128k for the command line length size (See: https://fossies.org/dox/findutils-4.6.0/buildcmd_8c_source.html#l00551). It adjusts this depending on the value of `ARG_MAX`.
Reviewers: alexfh
Reviewed By: alexfh
Subscribers: llvm-commits
Tags: #clang
Patch by Austin Belknap!
Differential Revision: https://reviews.llvm.org/D47795
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334295
91177308-0d34-0410-b5e6-
96231b3b80d8
Zachary Turner [Fri, 8 Jun 2018 15:16:25 +0000 (15:16 +0000)]
Clean up some code in Program.
NFC here, this just raises some platform specific ifdef hackery
out of a class and creates proper platform-independent typedefs
for the relevant things. This allows these typedefs to be
reused in other places without having to reinvent this preprocessor
logic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334294
91177308-0d34-0410-b5e6-
96231b3b80d8
Zachary Turner [Fri, 8 Jun 2018 15:15:56 +0000 (15:15 +0000)]
Add a file open flag that disables O_CLOEXEC.
O_CLOEXEC is the right default, but occasionally you don't
want this. This is especially true for tools like debuggers
where you might need to spawn the child process with specific
files already open, but it's occasionally useful in other
scenarios as well, like when you want to do some IPC between
parent and child.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334293
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Fri, 8 Jun 2018 15:01:40 +0000 (15:01 +0000)]
[X86][BtVer2] Limit zero idiom tests to a single iteration.
Reduces output size and we're only wanting to check that the instructions are fast-path'd (just Dispatch+Retire) anyhow
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334292
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Fri, 8 Jun 2018 14:53:52 +0000 (14:53 +0000)]
Fix Wdocumentation warning for unknown param. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334291
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Fri, 8 Jun 2018 14:07:21 +0000 (14:07 +0000)]
[X86][SSE] Add SSE2/AVX2 vector rotate tests
Now that we're custom lowering vector rotates for SSE in general we should be testing the combines with them as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334290
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Fri, 8 Jun 2018 13:59:11 +0000 (13:59 +0000)]
[X86][SSE] Simplify combineVectorTruncationWithPACKUS to reduce code duplication
Simplify combineVectorTruncationWithPACKUS to mask the upper bits followed by calling truncateVectorWithPACK instead of duplicating with similar code.
This results in the codegen using (V)PACKUSDW on SSE41+ targets for vXi64/vXi32 inputs where before it always used PACKUSWB (along with a lot more bitcasting).
I've raised PR37749 as until we avoid unnecessary concats back to 256-bit for bitwise ops, we can't avoid splitting the input value into 128-bit subvectors for masking.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334289
91177308-0d34-0410-b5e6-
96231b3b80d8
Sanjay Patel [Fri, 8 Jun 2018 13:53:13 +0000 (13:53 +0000)]
[x86] restore test comment; NFC
The description got deleted along with the FIXME note in
rL334268.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334288
91177308-0d34-0410-b5e6-
96231b3b80d8
Artur Pilipenko [Fri, 8 Jun 2018 13:03:21 +0000 (13:03 +0000)]
[BPI] Apply invoke heuristic before loop branch heuristic
Currently the loop branch heuristic is applied before the invoke heuristic which makes us overestimate the probability of the unwind destination of invokes inside loops. This in turn makes us grossly underestimate the frequencies of loops with invokes.
Reviewed By: skatkov, vsk
Differential Revision: https://reviews.llvm.org/D47371
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334285
91177308-0d34-0410-b5e6-
96231b3b80d8
Florian Hahn [Fri, 8 Jun 2018 12:53:51 +0000 (12:53 +0000)]
[VPlan] Move recipe based VPlan generation to separate function.
This first step separates VPInstruction-based and VPRecipe-based
VPlan creation, which should make it easier to migrate to VPInstruction
based code-gen step by step.
Reviewers: Ayal, rengolin, dcaballe, hsaito, mkuper, mzolotukhin
Reviewed By: dcaballe
Subscribers: bollu, tschuett, rkruppe, llvm-commits
Differential Revision: https://reviews.llvm.org/D47477
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334284
91177308-0d34-0410-b5e6-
96231b3b80d8
Henry Wong [Fri, 8 Jun 2018 12:42:12 +0000 (12:42 +0000)]
[ADT] Add `StringRef::rsplit(StringRef Separator)`.
Summary: Add `StringRef::rsplit(StringRef Separator)` to achieve the function of getting the tail substring according to the separator. A typical usage is to get `data` in `std::basic_string::data`.
Reviewers: mehdi_amini, zturner, beanz, xbolva00, vsk
Reviewed By: zturner, xbolva00, vsk
Subscribers: vsk, xbolva00, llvm-commits, MTC
Differential Revision: https://reviews.llvm.org/D47406
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334283
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Dardis [Fri, 8 Jun 2018 10:55:34 +0000 (10:55 +0000)]
[mips] Correct the predicates for a number of codegen only instructions
Reviewers: smaksimovic, atanasyan, abeserminji
Differential Revision: https://reviews.llvm.org/D47638
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334280
91177308-0d34-0410-b5e6-
96231b3b80d8
Alex Bradbury [Fri, 8 Jun 2018 10:39:05 +0000 (10:39 +0000)]
[RISCV] Implement MC layer support for the fence.tso instruction
The instruction makes use of a previously ignored field in the fence
instruction. It is introduced in the version 2.3 draft of the RISC-V
specification after much work by the Memory Model Task Group.
As clarified here <https://github.com/riscv/riscv-isa-manual/issues/186>,
the fence.tso assembler mnemonic does not have operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334278
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Fri, 8 Jun 2018 10:29:00 +0000 (10:29 +0000)]
[X86][SSE] Consistently prefer lowering to PACKUS over PACKSS
We have some combines/lowerings that attempt to use PACKSS-then-PACKUS and others that use PACKUS-then-PACKSS.
PACKUS is much easier to combine with if we know the upper bits are zero as ComputeKnownBits can easily see through BITCASTs etc. especially now that rL333995 and rL334007 have landed. It also effectively works at byte level which further simplifies shuffle combines.
The only (minor) annoyances are that ComputeKnownBits can sometimes take longer as it doesn't fail as quickly as ComputeNumSignBits (but I'm not seeing any actual regressions in tests) and PACKUSDW only became available after SSE41 so we have more codegen diffs between targets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334276
91177308-0d34-0410-b5e6-
96231b3b80d8
Florian Hahn [Fri, 8 Jun 2018 09:54:04 +0000 (09:54 +0000)]
[TableGen] Make DAGInstruction own Pattern to avoid leaking it.
Reviewers: dsanders, craig.topper, stoklund, nhaehnle
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D47525
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334275
91177308-0d34-0410-b5e6-
96231b3b80d8
Roman Shirokiy [Fri, 8 Jun 2018 08:21:20 +0000 (08:21 +0000)]
[LV] Fix PR36983. For a given recurrence, fix all phis in exit block
There could be more than one PHIs in exit block using same loop recurrence.
Don't assume there is only one and fix each user.
Differential Revision: https://reviews.llvm.org/D47788
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334271
91177308-0d34-0410-b5e6-
96231b3b80d8
Matt Arsenault [Fri, 8 Jun 2018 08:05:54 +0000 (08:05 +0000)]
AMDGPU: Error on LDS global address in functions
These won't work as expected now, so error on them to avoid
wasting time debugging this in the future.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334269
91177308-0d34-0410-b5e6-
96231b3b80d8
Sam Parker [Fri, 8 Jun 2018 07:49:04 +0000 (07:49 +0000)]
[DAGCombine] Fix for PR37667
While trying to propagate AND masks back to loads, we currently allow
one non-load node to be included as a leaf in chain. This fix now
limits that node to produce only a single data value.
Differential Revision: https://reviews.llvm.org/D47878
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334268
91177308-0d34-0410-b5e6-
96231b3b80d8
Hiroshi Inoue [Fri, 8 Jun 2018 04:00:54 +0000 (04:00 +0000)]
[NFC] fix formatting
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334263
91177308-0d34-0410-b5e6-
96231b3b80d8
Craig Topper [Fri, 8 Jun 2018 01:09:31 +0000 (01:09 +0000)]
[X86] Improve some shuffle decoding code to remove a conditional from a loop and reduce the number of temporary variables. NFCI
The NumControlBits variable was definitely sketchy. I think that only worked because the expected value was 1 or 2 and the number of lanes was 2 or 4. Had their been 8 lanes the number of bits should have been 3 not 4 as the previous code would have given.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334258
91177308-0d34-0410-b5e6-
96231b3b80d8
Tony Tye [Fri, 8 Jun 2018 01:00:11 +0000 (01:00 +0000)]
[AMDGPU] Simplify memory legalizer (add missing virtual descructor)
Differential Revision: https://reviews.llvm.org/D47504
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334257
91177308-0d34-0410-b5e6-
96231b3b80d8
Reid Kleckner [Fri, 8 Jun 2018 00:43:27 +0000 (00:43 +0000)]
Revert r334209 "[LSR] Check yet more intrinsic pointer operands"
This causes cast failures when compiling harfbuzz in Chromium.
Reproducer on the way.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334254
91177308-0d34-0410-b5e6-
96231b3b80d8
Gabor Buella [Thu, 7 Jun 2018 23:32:18 +0000 (23:32 +0000)]
NFC Fix a comment in ValueTypes.td
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334247
91177308-0d34-0410-b5e6-
96231b3b80d8
Zachary Turner [Thu, 7 Jun 2018 23:25:13 +0000 (23:25 +0000)]
Expose a single global file open function.
This one allows much more flexibility than the standard
openFileForRead / openFileForWrite functions. Since there is now
just one "real" function that does the work, all other implementations
simply delegate to this one.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334246
91177308-0d34-0410-b5e6-
96231b3b80d8
Michael Berg [Thu, 7 Jun 2018 22:49:09 +0000 (22:49 +0000)]
propagate fast math flags via IR on fma and sub expressions
Summary: This change uses fmf subflags to guard fma optimizations as well as unsafe. These changes originated from D46483 and have been simplified via getNode.
Reviewers: spatel, arsenm, hfinkel, javed.absar
Reviewed By: spatel
Subscribers: nemanjai, wdng
Differential Revision: https://reviews.llvm.org/D47388
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334242
91177308-0d34-0410-b5e6-
96231b3b80d8
Tony Tye [Thu, 7 Jun 2018 22:28:32 +0000 (22:28 +0000)]
[AMDGPU] Simplify memory legalizer
- Make code easier to maintain.
- Avoid generating waitcnts for VMEM if the address sppace does not involve VMEM.
- Add support to generate waitcnts for LDS and GDS memory.
Differential Revision: https://reviews.llvm.org/D47504
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334241
91177308-0d34-0410-b5e6-
96231b3b80d8
Roman Lebedev [Thu, 7 Jun 2018 21:19:50 +0000 (21:19 +0000)]
[NFC][InstSimplify] Add tests for add nuw %x, -1 -> -1 fold.
%ret = add nuw i8 %x, C
From langref:
nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”,
respectively. If the nuw and/or nsw keywords are present,
the result value of the add is a poison value if unsigned
and/or signed overflow, respectively, occurs.
So if C is -1, %x can only be 0, and the result is always -1.
https://rise4fun.com/Alive/sldC
Was mentioned in D47428 review.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334236
91177308-0d34-0410-b5e6-
96231b3b80d8
Roman Lebedev [Thu, 7 Jun 2018 21:19:45 +0000 (21:19 +0000)]
[NFC][InstSimplify] One more negative test for shl nuw C, %x -> C fold.
Follow-up for rL334200, rL334206.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334235
91177308-0d34-0410-b5e6-
96231b3b80d8
Petr Hosek [Thu, 7 Jun 2018 21:01:32 +0000 (21:01 +0000)]
[Support] Link libzircon.so when building LLVM for Fuchsia
This is necessary for zx_* symbols.
Differential Revision: https://reviews.llvm.org/D47848
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334232
91177308-0d34-0410-b5e6-
96231b3b80d8
Zachary Turner [Thu, 7 Jun 2018 20:37:22 +0000 (20:37 +0000)]
Try to fix build.
I don't know how to build this code, but based on the failing
buildbot error message it looks like this change should get
the buildbot up and running again.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334231
91177308-0d34-0410-b5e6-
96231b3b80d8
Zachary Turner [Thu, 7 Jun 2018 20:07:08 +0000 (20:07 +0000)]
Fix unused private variable.
This parameter got lost in the refactor. Add it back.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334223
91177308-0d34-0410-b5e6-
96231b3b80d8
Roman Lebedev [Thu, 7 Jun 2018 20:03:45 +0000 (20:03 +0000)]
[InstSimplify] shl nuw C, %x -> C iff signbit is set on C.
Summary:
`%r = shl nuw i8 C, %x`
As per langref:
```
If the nuw keyword is present, then the shift produces
a poison value if it shifts out any non-zero bits.
```
Thus, if the sign bit is set on `C`, then `%x` can only be `0`,
which means that `%r` can only be `C`.
Or in other words, set sign bit means that the signed value
is negative, so the constant is `<= 0`.
https://rise4fun.com/Alive/WMk
https://rise4fun.com/Alive/udv
Was mentioned in D47428 review.
We already handle the `0` constant, https://godbolt.org/g/UZq1sJ, so this only handles negative constants.
Could use computeKnownBits() / LazyValueInfo,
but the cost-benefit analysis (https://reviews.llvm.org/D47891)
suggests it isn't worth it.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47883
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334222
91177308-0d34-0410-b5e6-
96231b3b80d8
Zachary Turner [Thu, 7 Jun 2018 19:58:58 +0000 (19:58 +0000)]
[FileSystem] Split up the OpenFlags enumeration.
This breaks the OpenFlags enumeration into two separate
enumerations: OpenFlags and CreationDisposition. The first
controls the behavior of the API depending on whether or not
the target file already exists, and is not a flags-based
enum. The second controls more flags-like values.
This yields a more easy to understand API, while also allowing
flags to be passed to the openForRead api, where most of the
values didn't make sense before. This also makes the apis more
testable as it becomes easy to enumerate all the configurations
which make sense, so I've added many new tests to exercise all
the different values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334221
91177308-0d34-0410-b5e6-
96231b3b80d8
Matt Arsenault [Thu, 7 Jun 2018 19:42:27 +0000 (19:42 +0000)]
DAG: Avoid bitcast/ext/build_vector combine
This avoids regressions in a future AMDGPU change
to make v4i16/v4f16 legal. For these types, build_vector
is implemented as bitcasted operations on v2i32. This
combine was creating v4i16s out of what would have been
already been a v2i32 build_vector, creating a mess
of nodes that never get cleaned up.
I'm not sure this is the right condition to check.
I initially tried just checking for the legality of the
new build_vector. This works for my case, but breaks dozens
of x86 tests. A Mips test seems to show some improvement
or at least a neutral change. I don't want to think
about how long it would take to analyze the set of
different x86 vector operations impacted.
Test included in future commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334218
91177308-0d34-0410-b5e6-
96231b3b80d8
Alexander Shaposhnikov [Thu, 7 Jun 2018 19:41:42 +0000 (19:41 +0000)]
[llvm-objcopy] Remove unused field from Object
The class Object contains std::shared_ptr<MemoryBuffer> OwnedData
which is not used anywhere. Besides avoiding two stage initialization
the motivation to remove it comes from the plan to add (currently missing) support
for static libraries.
NFC.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D47855
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334217
91177308-0d34-0410-b5e6-
96231b3b80d8
Sanjay Patel [Thu, 7 Jun 2018 18:21:24 +0000 (18:21 +0000)]
[TargetLibraryInfo] add mappings from LLVM sin/cos intrinsics to SVML calls
These weren't included in D19544 - probably just an oversight.
D40044 made it more likely that we'll have LLVM math intrinsics rather
than libcalls, so this bug was more easily exposed.
As the tests/code show, we already have the complete mappings for pow/exp/log.
I don't have any experience with SVML, so I don't know if anything else is
missing. It's also not clear to me that we should be doing this transform in
IR rather than DAG/isel, but that's a separate issue.
Differential Revision: https://reviews.llvm.org/D47610
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334211
91177308-0d34-0410-b5e6-
96231b3b80d8
Daniil Fukalov [Thu, 7 Jun 2018 17:30:58 +0000 (17:30 +0000)]
[LSR] Check yet more intrinsic pointer operands
the patch fixes another assertion in isLegalUse()
Differential Revision: https://reviews.llvm.org/D47794
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334209
91177308-0d34-0410-b5e6-
96231b3b80d8
David Carlier [Thu, 7 Jun 2018 16:33:48 +0000 (16:33 +0000)]
[docs] add various sanitisers support for FreeBSD/OpenBSD
since couple of months, supports had been enabled for FreeBSD and OpenBSD.
Reviewers: thakis, spatel, dim
Reviewed By: dim
Differential Revision: https://reviews.llvm.org/D47322
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334207
91177308-0d34-0410-b5e6-
96231b3b80d8
Roman Lebedev [Thu, 7 Jun 2018 16:18:26 +0000 (16:18 +0000)]
[NFC][InstSimplify] Add more tests for shl nuw C, %x -> C fold.
Follow-up for rL334200.
For these, KnownBits will be needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334206
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Thu, 7 Jun 2018 16:08:40 +0000 (16:08 +0000)]
[X86][SSE] Updated comment - combineVectorSignBitsTruncation handles PACKSS and PACKUS. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334204
91177308-0d34-0410-b5e6-
96231b3b80d8
Alex Bradbury [Thu, 7 Jun 2018 15:35:47 +0000 (15:35 +0000)]
[RISCV] AsmParser support for the li pseudo instruction
The implementation follows the MIPS backend and expands the pseudo instruction
directly during asm parsing. As the result, only real MC instructions are
emitted to the MCStreamer. The actual expansion to real instructions is
similar to the expansion performed by the GNU Assembler.
This patch supersedes D41949.
Differential Revision: https://reviews.llvm.org/D46118
Patch by Mario Werner.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334203
91177308-0d34-0410-b5e6-
96231b3b80d8
Alex Bradbury [Thu, 7 Jun 2018 15:29:09 +0000 (15:29 +0000)]
[AVR] Fix build after r334078
r334078 added MCSubtargetInfo to fixupNeedsRelaxation and applyFixup. This
patch makes the necessary adjustment for the AVR target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334202
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Thu, 7 Jun 2018 14:53:32 +0000 (14:53 +0000)]
[X86][SSE] Simplify combineVectorTruncationWithPACKUS. NFCI.
Move code only used by combineVectorTruncationWithPACKUS out of combineVectorTruncation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334201
91177308-0d34-0410-b5e6-
96231b3b80d8
Roman Lebedev [Thu, 7 Jun 2018 14:18:38 +0000 (14:18 +0000)]
[NFC][InstSimplify] Add tests for shl nuw C, %x -> C fold.
%r = shl nuw i8 C, %x
As per langref: If the nuw keyword is present, then the shift produces
a poison value if it shifts out any non-zero bits.
Thus, if the sign bit is set on C, then %x can only be 0,
which means that %r can only be C.
https://rise4fun.com/Alive/WMk
Was mentioned in D47428 review.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334200
91177308-0d34-0410-b5e6-
96231b3b80d8
Sanjay Patel [Thu, 7 Jun 2018 14:11:18 +0000 (14:11 +0000)]
[x86] add tests for backwards propagate mask bug (PR37060, PR37667); NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334199
91177308-0d34-0410-b5e6-
96231b3b80d8
Guillaume Chatelet [Thu, 7 Jun 2018 14:00:29 +0000 (14:00 +0000)]
[llvm-exegesis] Make BenchmarkRunner handle multiple configurations.
Summary: BenchmarkRunner subclasses can now create many configurations - although this patch still generates one.
Reviewers: courbet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D47877
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334197
91177308-0d34-0410-b5e6-
96231b3b80d8
Paul Semel [Thu, 7 Jun 2018 13:30:55 +0000 (13:30 +0000)]
[llvm-objdump] Add -R option
This option prints dynamic relocation entries of the given file
Differential Revision: https://reviews.llvm.org/D47493
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334196
91177308-0d34-0410-b5e6-
96231b3b80d8
Hiroshi Inoue [Thu, 7 Jun 2018 13:21:14 +0000 (13:21 +0000)]
[PowerPC] avoid unprofitable Repl32 flag in BitPermutationSelector
BitPermutationSelector sets Repl32 flag for bit groups which can be (potentially) benefit from 32-bit rotate-and-mask instructions with bit replication, i.e. rlwinm/rlwimi copies lower 32 bits into upper 32 bits on 64-bit PowerPC before rotation.
However, enforcing 32-bit instruction sometimes results in redundant generated code.
For example, the following simple code is compiled into rotldi + rlwimi while it can be compiled into only rldimi instruction if Repl32 flag is not set on the bit group for (a & 0xFFFFFFFF).
uint64_t func(uint64_t a, uint64_t b) {
return (a & 0xFFFFFFFF) | (b << 32) ;
}
To avoid such problem, this patch checks the potential benefit of Repl32 flag before setting it. If a bit group does not require rotation (i.e. RLAmt == 0) and won't be merged into another group, we do not benefit from Repl32 flag on this group.
Differential Revision: https://reviews.llvm.org/D47867
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334195
91177308-0d34-0410-b5e6-
96231b3b80d8
Petar Jovanovic [Thu, 7 Jun 2018 13:06:06 +0000 (13:06 +0000)]
[Mips] Silencing warnings in instruction info (NFC)
isORCopyInst and isReadOrWriteToDSPReg functions were producing warning
that some statements my fall through.
Patch by Nikola Prica.
Differential Revision: https://reviews.llvm.org/D47876
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334194
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Thu, 7 Jun 2018 13:01:42 +0000 (13:01 +0000)]
[X86][SSE] Simplify combineVectorTruncationWithPACKSS to reduce code duplication
Simplify combineVectorTruncationWithPACKSS to just a SIGN_EXTEND_INREG followed by using the existing truncateVectorWithPACK instead of duplicating code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334193
91177308-0d34-0410-b5e6-
96231b3b80d8
Hiroshi Inoue [Thu, 7 Jun 2018 12:49:12 +0000 (12:49 +0000)]
[PowerPC] fix trivial typos in comment, NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334191
91177308-0d34-0410-b5e6-
96231b3b80d8
Matt Arsenault [Thu, 7 Jun 2018 12:16:31 +0000 (12:16 +0000)]
AMDGPU: Fix not including v2f64 in SReg_128
Fixes assertion with calls returning v2f64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334189
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Thu, 7 Jun 2018 11:22:52 +0000 (11:22 +0000)]
[X86][SSE] Add extra trunc(shl) test cases
The existing trunc_shl_17_v8i16_v8i32 test case should (but doesn't) fold to zero, I've added 2 new test cases:
- trunc_shl_16_v8i16_v8i32 which folds to zero (this is actually testing the target faux shuffle combine)
- trunc_shl_15_v8i16_v8i32 which should perform the full shl + truncate
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334188
91177308-0d34-0410-b5e6-
96231b3b80d8
Florian Hahn [Thu, 7 Jun 2018 11:09:05 +0000 (11:09 +0000)]
[Mem2Reg] Avoid replacing load with itself in promoteSingleBlockAlloca.
We do the same thing in rewriteSingleStoreAlloca.
Fixes PR37632.
Reviewers: chandlerc, davide, efriedma
Reviewed By: davide
Differential Revision: https://reviews.llvm.org/D47825
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334187
91177308-0d34-0410-b5e6-
96231b3b80d8
Matt Arsenault [Thu, 7 Jun 2018 10:15:20 +0000 (10:15 +0000)]
AMDGPU: Use scalar operations for f16 fabs/fneg patterns
Fixes unnecessary differences between subtargets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334184
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Thu, 7 Jun 2018 10:13:09 +0000 (10:13 +0000)]
[X86] Regenerate rotate tests
Add 32-bit tests to show missed SHLD/SHRD cases
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334183
91177308-0d34-0410-b5e6-
96231b3b80d8
Paul Semel [Thu, 7 Jun 2018 10:05:25 +0000 (10:05 +0000)]
[llvm-strip] Expose --strip-unneeded option
Differential Revision: https://reviews.llvm.org/D47818
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334182
91177308-0d34-0410-b5e6-
96231b3b80d8
Matt Arsenault [Thu, 7 Jun 2018 09:54:49 +0000 (09:54 +0000)]
AMDGPU: Try a lot harder to emit scalar loads
This has two main components. First, widen
widen short constant loads in DAG when they have
the correct alignment. This is already done a bit in
AMDGPUCodeGenPrepare, since that has access to
DivergenceAnalysis. This can't help kernarg loads
created in the DAG. Start to use DAG divergence analysis
to help this case.
The second part is to avoid kernel argument lowering
breaking the alignment of short vector elements because
calling convention lowering wants to split everything
into legal register types.
When loading a split type, load the nearest 4-byte aligned
segment and shift to get the desired bits. This extra
load of the earlier argument piece ends up merging,
and the bit extract hopefully folds out.
There are a number of improvements and regressions with
this, but I think as-is this is a better compromise between
several of the worst parts of SelectionDAG.
Particularly when i16 is legal, this produces worse code
for i8 and i16 element vector kernel arguments. This is
partially due to the very weak load merging the DAG does.
It only looks for fairly specific combines between pairs
of loads which no longer appear. In particular this
causes v4i16 loads to be split into 2 components when
previously the two halves were merged.
Worse, because of the newly introduced shifts, there
is a lot more unnecessary vector packing and unpacking code
emitted. At least some of this is due to reporting
false for isTypeDesirableForOp for i16 as a workaround for
the lack of divergence information in the DAG. The cases
where this happens it doesn't actually matter, but the
relevant code in SimplifyDemandedBits doens't have the context
to know to ignore this.
The use of the scalar cache is probably more important
than the mess of mostly scalar instructions doing this packing
and unpacking. Future work can fix this, possibly by making better
use of the new DAG divergence information for controlling promotion
decisions, or adding another version of shift + trunc + shift
combines that doesn't only know about the used types.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334180
91177308-0d34-0410-b5e6-
96231b3b80d8
Clement Courbet [Thu, 7 Jun 2018 09:26:33 +0000 (09:26 +0000)]
[X86][NFC] Fix harmless typo in BtVer2 model.
See D46356 for context.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334178
91177308-0d34-0410-b5e6-
96231b3b80d8
Tomasz Krupa [Thu, 7 Jun 2018 08:48:45 +0000 (08:48 +0000)]
[X86] Block UndefRegUpdate
Summary: Prevent folding of operations with memory loads when one of the sources has undefined register update.
Reviewers: craig.topper
Subscribers: llvm-commits, mike.dvoretsky, ashlykov
Differential Revision: https://reviews.llvm.org/D47621
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334175
91177308-0d34-0410-b5e6-
96231b3b80d8
Max Kazantsev [Thu, 7 Jun 2018 08:47:19 +0000 (08:47 +0000)]
[NFC] Use variable instead of accessing pair many times
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334173
91177308-0d34-0410-b5e6-
96231b3b80d8
Tomasz Krupa [Thu, 7 Jun 2018 08:20:28 +0000 (08:20 +0000)]
Test commit access.
Added a bunch of periods after comments.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334171
91177308-0d34-0410-b5e6-
96231b3b80d8
Guillaume Chatelet [Thu, 7 Jun 2018 08:11:54 +0000 (08:11 +0000)]
[llvm-exegesis] Add a Configuration object for Benchmark.
Summary: This is the first step to have the BenchmarkRunner create and measure many different configurations (different initial values for instance).
Reviewers: courbet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D47826
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334169
91177308-0d34-0410-b5e6-
96231b3b80d8
Guillaume Chatelet [Thu, 7 Jun 2018 07:51:16 +0000 (07:51 +0000)]
[llvm-exegesis] Improve error reporting.
Summary: BenchmarkResult IO functions now return an Error or Expected so caller can deal take proper action.
Reviewers: courbet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D47868
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334167
91177308-0d34-0410-b5e6-
96231b3b80d8
Guillaume Chatelet [Thu, 7 Jun 2018 07:40:40 +0000 (07:40 +0000)]
[llvm-exegesis] Serializes instruction's operand in BenchmarkResult's key.
Summary: Follow up patch to https://reviews.llvm.org/D47764.
Reviewers: courbet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D47785
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334165
91177308-0d34-0410-b5e6-
96231b3b80d8
Clement Courbet [Thu, 7 Jun 2018 07:37:49 +0000 (07:37 +0000)]
[X86][NFC] Fix harmless typos in BDW/ZnVer1 sched models.
See D46356 for context.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334164
91177308-0d34-0410-b5e6-
96231b3b80d8
Karl-Johan Karlsson [Thu, 7 Jun 2018 07:20:33 +0000 (07:20 +0000)]
[BranchFolding] Fix live-in's when hoisting code
Summary:
When the branch folder hoist code into a predecessor it adjust live-in's
in the blocks it hoist code from. However it fail to handle hoisted code
that contain a defed register that originally is live-in in the block
through a super register.
This is fixed by replacing the live-in handling code with calls to
utility functions in LivePhysRegs.
Reviewers: kparzysz, gberry, MatzeB, uweigand, aprantl
Reviewed By: kparzysz
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47529
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334163
91177308-0d34-0410-b5e6-
96231b3b80d8
Jonas Paulsson [Thu, 7 Jun 2018 05:59:07 +0000 (05:59 +0000)]
[SystemZ] Build Load And Test from scratch in convertToLoadAndTest.
This is needed to get CC operand in right place, as expected by the
SchedModel.
Review: Ulrich Weigand
https://reviews.llvm.org/D47820
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334161
91177308-0d34-0410-b5e6-
96231b3b80d8
Michael Zolotukhin [Thu, 7 Jun 2018 00:19:29 +0000 (00:19 +0000)]
SpeculativeExecution Pass: Set PreserveCFG to avoid unnecessary analyses invalidation.
The pass doesn't touch CFG in any way, only moves instructions between
blocks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334150
91177308-0d34-0410-b5e6-
96231b3b80d8
Peter Collingbourne [Thu, 7 Jun 2018 00:06:41 +0000 (00:06 +0000)]
Add definition for ELF dynamic tag DT_SYMTAB_SHNDX.
DT_SYMTAB_SHNDX is defined in generic-abi:
http://www.sco.com/developers/gabi/latest/ch5.dynamic.html
Patch by Rahul Chaudhry!
Differential Revision: https://reviews.llvm.org/D47803
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334149
91177308-0d34-0410-b5e6-
96231b3b80d8
Peter Collingbourne [Thu, 7 Jun 2018 00:02:07 +0000 (00:02 +0000)]
llvm-readobj: fix printing number of relocations in Android packed format.
With '-elf-output-style=GNU -relocations', a header containing the number
of entries is printed before all the relocation entries in the section.
For Android packed format, we need to perform the unpacking first before
we can get the actual number of relocations in the section.
Patch by Rahul Chaudhry!
Differential Revision: https://reviews.llvm.org/D47800
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334147
91177308-0d34-0410-b5e6-
96231b3b80d8
Stanislav Mekhanoshin [Wed, 6 Jun 2018 22:22:32 +0000 (22:22 +0000)]
[AMDGPU] Improve reciprocal handling
When denormals are supported we are producing a full division for
1.0f / x. That still can be replaced by the faster version:
bool c = fabs(x) > 0x1.0p+96f;
float s = c ? 0x1.0p-32f : 1.0f;
x *= s;
return s * v_rcp_f32(x)
in case if requested accuracy is 2.5ulp or less. The same version
is used if denormals are not supported for non 1.0 numerators, where
just v_rcp_f32 is then used for 1.0 numerator.
The optimization of 1/x is extended to the case -1/x, which is the
same except for the resulting sign bit.
OpenCL conformance passed with both enabled and disabled denorms.
Differential Revision: https://reviews.llvm.org/D47805
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334142
91177308-0d34-0410-b5e6-
96231b3b80d8
Teresa Johnson [Wed, 6 Jun 2018 22:22:01 +0000 (22:22 +0000)]
[ThinLTO] Rename index IsAnalysis flag to HaveGVs (NFC)
With the upcoming patch to add summary parsing support, IsAnalysis would
be true in contexts where we are not performing module summary analysis.
Rename to the more specific and approprate HaveGVs, which is essentially
what this flag is indicating.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334140
91177308-0d34-0410-b5e6-
96231b3b80d8
Sanjay Patel [Wed, 6 Jun 2018 21:58:12 +0000 (21:58 +0000)]
[InstCombine] fold another shifty abs pattern to cmp+sel (PR36036)
The bug report:
https://bugs.llvm.org/show_bug.cgi?id=36036
...requests a DAG change for this, but an IR canonicalization
probably handles most cases. If we still want to match this
pattern in the backend, there's a proposal for that too:
D47831
Alive proofs including nsw/nuw cases that were first noted in:
D46988
https://rise4fun.com/Alive/Kmp
This patch is largely copied from the existing code that was
initially added with:
D40984
...but I didn't see much gain from trying to share code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334137
91177308-0d34-0410-b5e6-
96231b3b80d8
Petr Hosek [Wed, 6 Jun 2018 21:43:37 +0000 (21:43 +0000)]
[CMake] Pass additional CMake tools to external projects
This is needed when the external projects try to use other tools
besides just the compiler and the linker.
Differential Revision: https://reviews.llvm.org/D47833
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334136
91177308-0d34-0410-b5e6-
96231b3b80d8
Sanjay Patel [Wed, 6 Jun 2018 21:32:42 +0000 (21:32 +0000)]
[InstCombine] add tests for another abs() pattern (PR36036); NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334133
91177308-0d34-0410-b5e6-
96231b3b80d8
Matt Arsenault [Wed, 6 Jun 2018 21:28:11 +0000 (21:28 +0000)]
AMDGPU: Custom lower v2f16 fneg/fabs with illegal f16
Fixes terrible code on targets without f16 support. The
legalization creates a mess that is difficult to recover
from. Also should avoid randomly breaking these tests
multiple times in sequence in future commits.
Some regressions in cases where it happens to be better
to pull the source modifier after the conversion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334132
91177308-0d34-0410-b5e6-
96231b3b80d8
Alexander Shaposhnikov [Wed, 6 Jun 2018 21:23:19 +0000 (21:23 +0000)]
[llvm-strip] Expose --discard-all option
Expose objcopy's --discard-all option in llvm-strip.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D47750
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334131
91177308-0d34-0410-b5e6-
96231b3b80d8
Roman Lebedev [Wed, 6 Jun 2018 19:38:27 +0000 (19:38 +0000)]
[InstCombine] PR37603: low bit mask canonicalization
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37603 | PR37603 ]].
https://godbolt.org/g/VCMNpS
https://rise4fun.com/Alive/idM
When doing bit manipulations, it is quite common to calculate some bit mask,
and apply it to some value via `and`.
The typical C code looks like:
```
int mask_signed_add(int nbits) {
return (1 << nbits) - 1;
}
```
which is translated into (with `-O3`)
```
define dso_local i32 @mask_signed_add(int)(i32) local_unnamed_addr #0 {
%2 = shl i32 1, %0
%3 = add nsw i32 %2, -1
ret i32 %3
}
```
But there is a second, less readable variant:
```
int mask_signed_xor(int nbits) {
return ~(-(1 << nbits));
}
```
which is translated into (with `-O3`)
```
define dso_local i32 @mask_signed_xor(int)(i32) local_unnamed_addr #0 {
%2 = shl i32 -1, %0
%3 = xor i32 %2, -1
ret i32 %3
}
```
Since we created such a mask, it is quite likely that we will use it in `and` next.
And then we may get rid of `not` op by folding into `andn`.
But now that i have actually looked:
https://godbolt.org/g/VTUDmU
_some_ backend changes will be needed too.
We clearly loose `bzhi` recognition.
Reviewers: spatel, craig.topper, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47428
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334127
91177308-0d34-0410-b5e6-
96231b3b80d8
Roman Lebedev [Wed, 6 Jun 2018 19:38:21 +0000 (19:38 +0000)]
[InstCombine][NFC] PR37603: low bit mask canonicalization tests
Differential Revision: https://reviews.llvm.org/D47427
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334126
91177308-0d34-0410-b5e6-
96231b3b80d8
Roman Lebedev [Wed, 6 Jun 2018 19:38:16 +0000 (19:38 +0000)]
[X86] Emit BZHI when mask is ~(-1 << nbits))
Summary:
In D47428, i propose to choose the `~(-(1 << nbits))` as the canonical form of low-bit-mask formation.
As it is seen from these tests, there is a reason for that.
AArch64 currently better handles `~(-(1 << nbits))`, but not the more traditional `(1 << nbits) - 1` (sic!).
The other way around for X86.
It would be much better to canonicalize.
This patch is completely monkey-typing.
I don't really understand how this works :)
I have based it on `// x & (-1 >> (32 - y))` pattern.
Also, when we only have `BMI`, i wonder if we could use `BEXTR` with `start=0` ?
Related links:
https://bugs.llvm.org/show_bug.cgi?id=36419
https://bugs.llvm.org/show_bug.cgi?id=37603
https://bugs.llvm.org/show_bug.cgi?id=37610
https://rise4fun.com/Alive/idM
Reviewers: craig.topper, spatel, RKSimon, javed.absar
Reviewed By: craig.topper
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D47453
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334125
91177308-0d34-0410-b5e6-
96231b3b80d8
Roman Lebedev [Wed, 6 Jun 2018 19:38:10 +0000 (19:38 +0000)]
[NFC][X86][AArch64] Reorganize/cleanup BZHI test patterns
Summary:
In D47428, i propose to choose the `~(-(1 << nbits))` as the canonical form of low-bit-mask formation.
As it is seen from these tests, there is a reason for that.
AArch64 currently better handles `~(-(1 << nbits))`, but not the more traditional `(1 << nbits) - 1` (sic!).
The other way around for X86.
It would be much better to canonicalize.
It would seem that there is too much tests, but this is most of all the auto-generated possible variants
of C code that one would expect for BZHI to be formed, and then manually cleaned up a bit.
So this should be pretty representable, which somewhat good coverage...
Related links:
https://bugs.llvm.org/show_bug.cgi?id=36419
https://bugs.llvm.org/show_bug.cgi?id=37603
https://bugs.llvm.org/show_bug.cgi?id=37610
https://rise4fun.com/Alive/idM
Reviewers: javed.absar, craig.topper, RKSimon, spatel
Reviewed By: RKSimon
Subscribers: kristof.beyls, llvm-commits, RKSimon, craig.topper, spatel
Differential Revision: https://reviews.llvm.org/D47452
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334124
91177308-0d34-0410-b5e6-
96231b3b80d8
Krzysztof Parzyszek [Wed, 6 Jun 2018 19:34:40 +0000 (19:34 +0000)]
[Hexagon] Implement vector-pair zero as V6_vsubw_dv
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334123
91177308-0d34-0410-b5e6-
96231b3b80d8
Craig Topper [Wed, 6 Jun 2018 19:15:15 +0000 (19:15 +0000)]
[X86] Properly disassemble gather/scatter instructions where xmm4/ymm4/zmm4 are used as the index.
These encodings correspond to the cases in the normal encoding scheme where there is no index and our modrm reading code initially decodes it as such. The VSIB handling code tried to compensate for this, but failed to add the base needed to make later code do the right thing.
Fixes PR37712.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334121
91177308-0d34-0410-b5e6-
96231b3b80d8
Craig Topper [Wed, 6 Jun 2018 19:15:12 +0000 (19:15 +0000)]
[X86] Rename vy512mem->vy512xmem and vz256xmem->vz256mem.
The index size is represented by the letter after the 'v'. The number represents the memory size. If an 'x' appears after the number its means the index register can be from VR128X/VR256X instead of VR128/VR256.
As vy512mem uses a VR256X index it should have an x.
And vz256mem uses a VR512 index so it shouldn't have an x.
I admit these names kind of suck and are confusing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334120
91177308-0d34-0410-b5e6-
96231b3b80d8
Simon Pilgrim [Wed, 6 Jun 2018 19:06:09 +0000 (19:06 +0000)]
[X86][BtVer2] Add support for all vector instructions that should match the dependency-breaking 'zero-idiom'
As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), all these instructions are dependency breaking and zero the destination register.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334119
91177308-0d34-0410-b5e6-
96231b3b80d8
Vedant Kumar [Wed, 6 Jun 2018 19:05:42 +0000 (19:05 +0000)]
[Debugify] Move debug value intrinsics closer to their operand defs
Before this patch, debugify would insert debug value intrinsics before the
terminating instruction in a block. This had the advantage of being simple,
but was a bit too simple/unrealistic.
This patch teaches debugify to insert debug values immediately after their
operand defs. This enables better testing of the compiler.
For example, with this patch, `opt -debugify-each` is able to identify a
vectorizer DI-invariance bug fixed in llvm.org/PR32761. In this bug, the
vectorizer produced different output with/without debug info present.
Reverting Davide's bugfix locally, I see:
$ ~/scripts/opt-check-dbg-invar.sh ./bin/opt \
.../SLPVectorizer/AArch64/spillcost-di.ll -slp-vectorizer
Comparing: -slp-vectorizer .../SLPVectorizer/AArch64/spillcost-di.ll
Baseline: /var/folders/j8/t4w0bp8j6x1g6fpghkcb4sjm0000gp/T/tmp.iYYeL1kf
With DI : /var/folders/j8/t4w0bp8j6x1g6fpghkcb4sjm0000gp/T/tmp.sQtQSeet
9,11c9,11
< %5 = getelementptr inbounds %0, %0* %2, i64 %0, i32 1
< %6 = bitcast i64* %4 to <2 x i64>*
< %7 = load <2 x i64>, <2 x i64>* %6, align 8, !tbaa !0
---
> %5 = load i64, i64* %4, align 8, !tbaa !0
> %6 = getelementptr inbounds %0, %0* %2, i64 %0, i32 1
> %7 = load i64, i64* %6, align 8, !tbaa !5
12a13
> store i64 %5, i64* %8, align 8, !tbaa !0
14,15c15
< %10 = bitcast i64* %8 to <2 x i64>*
< store <2 x i64> %7, <2 x i64>* %10, align 8, !tbaa !0
---
> store i64 %7, i64* %9, align 8, !tbaa !5
:: Found a test case ^
Running this over the *.ll files in tree, I found four additional examples
which compile differently with/without DI present. I plan on filing bugs for
these.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334118
91177308-0d34-0410-b5e6-
96231b3b80d8