git.osdn.net Git - android-x86/external-llvm.git/log

[asan] Reduce binary size by using unnamed private aliases

Summary:
--asan-use-private-alias increases binary sizes by 10% or more.
Most of this space was long names of aliases and new symbols.
These symbols are not needed for the ODC check at all.

Reviewers: eugenis

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55146

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348221 91177308-0d34-0410-b5e6-96231b3b80d8

[MachineOutliner] Move stack instr check logic to getOutliningCandidateInfo

This moves the stack check logic into a lambda within getOutliningCandidateInfo.

This allows us to be less conservative with stack checks. Whether or not a
stack instruction is safe to outline is dependent on the frame variant and call
variant of the outlined function; only in cases where we modify the stack can
these be unsafe.

So, if we move that logic later, when we're looking at an individual candidate,
we can make better decisions here.

This gives some code size savings as a result.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348220 91177308-0d34-0410-b5e6-96231b3b80d8

[MachineOutliner][AArch64][NFC] Add early exit to candidate discarding logic

If we dropped too many candidates to be beneficial when dropping candidates
that modify the stack, there's no reason to check for other cost model
qualities.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348219 91177308-0d34-0410-b5e6-96231b3b80d8

[projects] Use directory name for add_llvm_external_projects

add_llvm_external_projects expects the directory name instead of the
full path, otherwise the check for an in-tree subproject will fail and
the project won't be configured.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348217 91177308-0d34-0410-b5e6-96231b3b80d8

[ThinLTO] Look through aliases when computing hash keys

Without this, we don't consider types used by aliasees in our cache key.
This caused issues when using the same cache for thin-linking the same
TU with different sets of virtual call candidates for a virtual call
inside of a constructor. That's sort of a mouthful. :)

Differential Revision: https://reviews.llvm.org/D55060

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348216 91177308-0d34-0410-b5e6-96231b3b80d8

[IR] Don't assume all functions are 4 byte aligned

In some cases different alignments for functions might be used to save
space, e.g. thumb mode with -Oz will try to use 2 byte function
alignment. A similar patch that fixed this in other areas exists here:
https://reviews.llvm.org/D46110

Differential Revision: https://reviews.llvm.org/D55115

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348215 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Switch to auto-generated intrinsic definitions and patterns

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348206 91177308-0d34-0410-b5e6-96231b3b80d8

[CodeExtractor] Split PHI nodes with incoming values from outlined region (PR39433)

If a PHI node outside the extracted region has multiple incoming values from
that region, split the PHI into two parts. The first PHI has incoming values
only from the region and is extracted with it (it is placed in a separate basic
block that is added to the list of outlined blocks), and those incoming values
in the original PHI are replaced by the first PHI. A similar solution is already
used in CodeExtractor for PHIs in the entry block (the severSplitPHINodes
method). This fixes PR39433.

Patch by Sergei Kachkov!

Differential Revision: https://reviews.llvm.org/D55018

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348205 91177308-0d34-0410-b5e6-96231b3b80d8

Adapt gcov to changes in CFE.

The clang frontend no longer emits the current working directory for
DIFiles containing an absolute path in the filename: and will move the
common prefix between current working directory and the file into the
directory: component.
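
The split described above can be sketched roughly as follows. This is only an
illustration in Python, not the clang or gcov code: the function name
split_difile is hypothetical, and the handling of paths that share nothing but
the root is an assumption.

```python
import posixpath


def split_difile(cwd, filename):
    """Return a (directory, filename) pair for a hypothetical DIFile record."""
    if not posixpath.isabs(filename):
        # Relative file names keep the working directory as their directory.
        return cwd, filename
    common = posixpath.commonpath([cwd, filename])
    if common == "/":
        # Assumption: with only the root in common, keep the absolute
        # file name and leave the directory empty.
        return "", filename
    # The shared prefix becomes the directory; the rest stays in the file name.
    return common, posixpath.relpath(filename, common)
```

For example, split_difile("/src/build", "/src/lib/a.c") yields the directory
"/src" and the file name "lib/a.c".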

This fixes the GCOV tests in compiler-rt that were broken by the Clang
change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348203 91177308-0d34-0410-b5e6-96231b3b80d8

BumpPtrAllocator: Add a couple of convenient wrappers around identifyObject().

This allows obtaining smaller, more readable identifiers
in a more comfortable way.

Differential Revision: https://reviews.llvm.org/D54486

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348197 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Extract operand decoders into a separate file, NFC

These decoders are automatically generated. Keeping them separated makes
updating architectures easier.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348196 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] narrow truncated vector binops when legal

This is the smallest vector enhancement I could find to D54640.
Here, we're allowing narrowing to only legal vector ops because we'll see
regressions without that. All of the test diffs are wins from what I can tell.
With AVX/AVX512, we can shrink ymm/zmm ops to xmm.

x86 vector multiplies are the problem case that we're avoiding due to the
patchwork ISA, and it's not clear to me if we can dance around those
regressions using TLI hooks or if we need preliminary patches to plug those
holes.

Differential Revision: https://reviews.llvm.org/D55126

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348195 91177308-0d34-0410-b5e6-96231b3b80d8

[mips] Fix TestDWARF32Version5Addr8AllForms test failure on MIPS hosts

The `DIEExpr` is used in debug information entries for either TLS variables
or call sites. For now the latter case is unsupported for targets with delay
slots, MIPS in particular.

The `DIEExpr::EmitValue` method calls a virtual `EmitDebugThreadLocal`
routine which, in the case of MIPS, always emits either `.dtprelword` or
`.dtpreldword` directives. That is okay for "main" code, but in unit
tests `DIEExpr` instances can be created for more than just TLS variables,
even on MIPS hosts. That is the reason for the `TestDWARF32Version5Addr8AllForms`
failure: handling of the `R_MIPS_TLS_DTPREL` relocation writes an
incorrect value into DWARF structures. In any case, unconditionally emitting
`.dtprelword` directives will be incorrect when/if debug information
entries for call sites become supported on MIPS.

The patch solves the problem by wrapping the expression created in the
`MipsTargetObjectFile::getDebugThreadLocalSymbol` method in a
`MipsMCExpr` expression with a new `MEK_DTPREL` tag. This tag is
recognized in the `MipsAsmPrinter::EmitDebugThreadLocal` method, and
`.dtprelword` directives are created in this case only. In other cases the
expression is saved as regular data.

Differential Revision: http://reviews.llvm.org/D54937

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348194 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Remove unused encodings, NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348193 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] fix undef propagation bug with shuffle+binop

When we have a shuffle that extends a source vector with undefs
and then do some binop on that, we must make sure that the extra
elements remain undef with that binop if we reverse the order of
the binop and shuffle.

'or' is probably the easiest example to show the bug because
'or C, undef --> -1' (not undef). But there are other
opcode/constant combinations where this is true as shown by
the 'shl' test.
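
A small enumeration shows why such a lane cannot simply be called undef after
the binop. This is a sketch in Python that models undef as "any bit pattern",
which is a simplification of the IR semantics; the helper names are
hypothetical.

```python
def possible_results(op, constant, bits=4):
    """All values op(constant, x) can take when x is any bit pattern."""
    mask = (1 << bits) - 1
    return {op(constant, x) & mask for x in range(1 << bits)}


def behaves_like_undef(op, constant, bits=4):
    """True only if the operation can still produce every possible value."""
    return len(possible_results(op, constant, bits)) == 1 << bits


def or_op(c, x):
    return c | x


def xor_op(c, x):
    return c ^ x


def shl_op(c, x):
    # The undef value is shifted left by the constant amount c.
    return x << c
```

With C = 0b0101, 'or' can only produce values that have those two bits set, so
the result is not arbitrary, although all-ones is among the values it can
take. Shifting an arbitrary value left by 1 always clears the low bit, so that
result is constrained too, whereas 'xor' with a constant can still produce
every value.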

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348191 91177308-0d34-0410-b5e6-96231b3b80d8

[gn build] Use print_function in write_cmake_config.py

No behavior change, just makes the script match the other scripts in
llvm/utils/gn/build.

Differential Revision: https://reviews.llvm.org/D55183

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348190 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Enforce assembler emits to streamer in order.

Summary:
The assembler processes directives and instructions in whatever order
they are in the file, then directly emits them to the streamer. This
could cause badly written (or generated) .s files to produce
incorrect binaries.

It now has state that tracks what it has most recently seen, to
enforce they are emitted in a given order that always produces
correct wasm binaries.

Also added a new test that compares obj2yaml output from llc (the
backend) to that going via .s and the assembler to ensure both paths
generate the same binaries.

The features this test covers could be extended.

Passes all wasm Lit tests.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=39557

Reviewers: sbc100, dschuff, aheejin

Subscribers: jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55149

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348185 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Update timing classes

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348183 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] foldICmpWithLowBitMaskedVal(): disable 2 faulty folds.

These two folds are invalid for this non-constant pattern
when the mask ends up being all-ones:
https://rise4fun.com/Alive/9au
https://rise4fun.com/Alive/UcQM

Fixes https://bugs.llvm.org/show_bug.cgi?id=39861

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348181 91177308-0d34-0410-b5e6-96231b3b80d8

[cmake] Clean up add_llvm_subdirectory

I found the pattern of setting the project_BUILD variable to OFF after
processing the project to be pretty confusing. Using global properties
to explicitly keep track of whether a project has been processed or not
seems much more straightforward, and it also allows us to convert the
macro into a function (which is required for the early return).

Factor the project+type+name combination out into a variable while I'm
here, since it's used a whole bunch of times.

I don't believe this should result in any functional changes.

Differential Revision: https://reviews.llvm.org/D55104

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348180 91177308-0d34-0410-b5e6-96231b3b80d8

[TextAPI] Remove a superfluous semicolon, fixing GCC warnings. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348179 91177308-0d34-0410-b5e6-96231b3b80d8

[COFF] Remove an outdated/incorrect comment. NFC.

Making the section writable doesn't affect how windows does
base relocs in case a DLL can't be loaded at the intended base
address.

This comment dates back to SVN r79346.

Differential Revision:

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348178 91177308-0d34-0410-b5e6-96231b3b80d8

[COFF] Don't mark mingw .eh_frame sections writable

This improves compatibility with GCC produced object files, where
the .eh_frame sections are read only. With mixed flags for the
involved .eh_frame sections, LLD creates two separate .eh_frame
sections in the output binary, one for each flag combination,
while ld.bfd probably merges them.

The previous setup of flags can be traced back to SVN r79346.

Differential Revision: https://reviews.llvm.org/D55209

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348177 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] rearrange shuffle+binop fold; NFC

This code has a bug dealing with undefs, so we need
to add another escape hatch, so doing some cleanup
ahead of that.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348175 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-objcopy] Add --build-id-link-dir flag

This flag does not exist in GNU objcopy but has a major use case.
Debugging tools support the .build-id directory structure to find
debug binaries. There is no easy way to build this structure up
however. One way to do it is by using llvm-readelf and some crazy
shell magic. This implements the feature directly. It is most often
the case that you'll want to strip a file and send the original to
the .build-id directory but if you just want to send a file to the
.build-id directory you can copy to /dev/null instead.
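
For reference, the .build-id layout that debugging tools look up can be
sketched as below. This is a Python illustration only: build_id_path is a
hypothetical helper, the directory argument is assumed to be the .build-id
directory itself, and how the new flag maps onto that directory is not
described in this message.

```python
import posixpath


def build_id_path(build_id_dir, build_id, suffix=""):
    """Path of the entry for a build ID inside a .build-id directory."""
    hex_id = build_id.hex()
    if len(hex_id) < 3:
        raise ValueError("build ID is too short to split")
    # The first byte (two hex digits) names the subdirectory and the
    # remaining digits name the file, e.g. ab/cdef1234.debug.
    return posixpath.join(build_id_dir, hex_id[:2], hex_id[2:] + suffix)
```

For example, build_id_path("/usr/lib/debug/.build-id",
bytes.fromhex("abcdef1234"), ".debug") gives
"/usr/lib/debug/.build-id/ab/cdef1234.debug".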

Differential Revision: https://reviews.llvm.org/D54384

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348174 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] add tests for shuffle+binop fold; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348173 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Change instruction type field in TSFlags to 7 bits

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348171 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-tapi] initial commit, supports ELF text stubs

http://lists.llvm.org/pipermail/llvm-dev/2018-September/126472.html

TextAPI is a library and accompanying tool that allows conversion between binary shared object stubs and textual counterparts. The motivations and use cases for this are explained thoroughly in the llvm-dev proposal [1]. This initial commit proposes a potential structure for the TAPI library, also including support for reading/writing text-based ELF stubs (.tbe) in addition to preliminary support for reading binary ELF files. The goal for this patch is to ensure the project architecture appropriately welcomes integration of Mach-O stubbing from Apple's TAPI [2].

Added:

- TextAPI library
- .tbe read support
- .tbe write (to raw_ostream) support

[1] http://lists.llvm.org/pipermail/llvm-dev/2018-September/126472.html
[2] https://github.com/ributzka/tapi

Differential Revision: https://reviews.llvm.org/D53051

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348170 91177308-0d34-0410-b5e6-96231b3b80d8

[MachineOutliner] Drop candidates that require fixups if it's beneficial

If it's a bigger code size win to drop candidates that require stack fixups
than to demote every candidate to that variant, the outliner should do that.

This happens if the number of bytes taken by calls to functions that don't
require fixups, plus the number of bytes that'd be left is less than the
number of bytes that it'd take to emit a save + restore for all candidates.

Also add tests for each possible new behaviour.

- machine-outliner-compatible-candidates shows that when we have candidates
that don't use the stack, we can use the default call variant along with the
no save/regsave variant.

- machine-outliner-all-stack shows that when it's better to fix up the stack,
we still will demote all candidates to that case

- machine-outliner-drop-stack shows that we can discard candidates that
require stack fixups when it would be beneficial to do so.
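
Read literally, the condition above is a single inequality. The Python sketch
below only transcribes that sentence; every parameter name is hypothetical,
and the real cost model in the outliner works on target-specific byte counts
that are not shown here.

```python
def should_drop_fixup_candidates(num_no_fixup_calls, call_bytes, bytes_left,
                                 num_candidates, save_restore_bytes):
    """Literal reading of the rule for dropping stack-fixup candidates."""
    # Cost of keeping only candidates that need no fixup: their calls, plus
    # the bytes of the dropped candidates that stay in place un-outlined.
    cost_if_dropped = num_no_fixup_calls * call_bytes + bytes_left
    # Cost of demoting every candidate to the save + restore variant.
    cost_if_all_fixed_up = num_candidates * save_restore_bytes
    return cost_if_dropped < cost_if_all_fixed_up
```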

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348168 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Add HasV5 predicate for compatibility with auto-generated files

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348167 91177308-0d34-0410-b5e6-96231b3b80d8

Fix issue with Tpi Stream hash map.

Part of the patch to not build the hash map eagerly was omitted
due to a merge conflict. Add it back, which should fix the failing
tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348166 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Fix bad formatting. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348164 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Remove unused operand definitions, NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348163 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Some formatting changes, NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348162 91177308-0d34-0410-b5e6-96231b3b80d8

Don't build the Tpi Hash map by default.

This is very slow and should be done for specific cases where
lookups will need to happen.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348160 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Teach LowerMUL/LowerMULH for vXi8 to unpack constant RHS.

Summary:
We need to unpackl and unpackh the operands to use two vXi16 multiplies. Previously it looks like the low unpack would get constant folded at least in the 128-bit case after shuffle lowering turned the unpackl into ZERO_EXTEND_VECTOR_INREG and X86 custom DAG combined it. The same doesn't happen for the high half. So we'd load a constant and then shuffle it. But the low half would just be loaded and used by the multiply directly.

After this patch we now end up with a constant pool entry for the low and high unpacks separately with no shuffle operations.

This is a step towards removing custom constant folding for ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG in the X86 backend.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55165

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348159 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8.

Summary:
Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalized to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction.

The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54836

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348158 91177308-0d34-0410-b5e6-96231b3b80d8

Fix non-modular build.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348157 91177308-0d34-0410-b5e6-96231b3b80d8

Update Diagnostic handling for changes in CFE.

The clang frontend no longer emits the current working directory for
DIFiles containing an absolute path in the filename: and will move the
common prefix between current working directory and the file into the
directory: component.

https://reviews.llvm.org/D55085

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348155 91177308-0d34-0410-b5e6-96231b3b80d8

[SimplifyCFG] add tests for cross block compare folding; NFC

These are the baseline tests for D54827.
Patch based on code originally written by: @yinyuefengyi (luo xionghu)

Differential Revision: https://reviews.llvm.org/D54994

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348151 91177308-0d34-0410-b5e6-96231b3b80d8

[CmpInstAnalysis] fix formatting; NFC

There are potential improvements to the structure of this API
raised by D54994, but remove some cosmetic blemishes before
making any functional changes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348149 91177308-0d34-0410-b5e6-96231b3b80d8

Fix line endings. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348146 91177308-0d34-0410-b5e6-96231b3b80d8

Fixing -print-module-scope for legacy SCC passes

It appears that print-module-scope was not implemented for legacy SCC passes.
Fixed to print a whole module instead of just current SCC.

Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D54793

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348144 91177308-0d34-0410-b5e6-96231b3b80d8

[SystemZ::TTI] Return zero cost for ICmp that becomes Load And Test.

A loaded value with multiple users compared with 0 will become a load and
test single instruction. The load is not folded in this case (multiple
users), but the compare instruction is eliminated.

This patch returns 0 cost for the icmp in these cases.

Review: Ulrich Weigand
https://reviews.llvm.org/D55111

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348141 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Add command-line option for SSBS

Summary:
SSBS (Speculative Store Bypass Safe) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SSBS, as it was previously only possible to
enable by selecting -march=armv8.5-a.

Similar patch upstream in GNU binutils:
https://sourceware.org/ml/binutils/2018-09/msg00274.html

Reviewers: olista01, samparker, aemerson

Reviewed By: samparker

Subscribers: javed.absar, kristof.beyls, kristina, llvm-commits

Differential Revision: https://reviews.llvm.org/D54629

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348137 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos

S_{ADD|SUB}_U64_PSEUDO instructions are decomposed into VOP3 instruction
pairs. For S_ADD_U64_PSEUDO:
  V_ADD_I32_e64
  V_ADDC_U32_e64
and for S_SUB_U64_PSEUDO:
  V_SUB_I32_e64
  V_SUBB_U32_e64
This precludes the use of SDWA to encode a constant.
SDWA (Sub-Dword Addressing) is supported on VOP1 and VOP2 instructions,
but not on VOP3 instructions.

We desire to fold the bit-and operand into the instruction encoding
for the V_ADD_I32 instruction. This requires that we transform the
VOP3 into a VOP2 form of the instruction (_e32).
  %19:vgpr_32 = V_AND_B32_e32 255,
      killed %16:vgpr_32, implicit $exec
  %47:vgpr_32, %49:sreg_64_xexec = V_ADD_I32_e64
      %26.sub0:vreg_64, %19:vgpr_32, implicit $exec
  %48:vgpr_32, dead %50:sreg_64_xexec = V_ADDC_U32_e64
      %26.sub1:vreg_64, %54:vgpr_32, killed %49:sreg_64_xexec, implicit $exec

which then allows the SDWA encoding and becomes
  %47:vgpr_32 = V_ADD_I32_sdwa
      0, %26.sub0:vreg_64, 0, killed %16:vgpr_32, 0, 6, 0, 6, 0,
      implicit-def $vcc, implicit $exec
  %48:vgpr_32 = V_ADDC_U32_e32
      0, %26.sub1:vreg_64, implicit-def $vcc, implicit $vcc, implicit $exec

Differential Revision: https://reviews.llvm.org/D54882

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348132 91177308-0d34-0410-b5e6-96231b3b80d8

ARM: use target-specific SUBS node when combining cmp with cmov.

This has two positive effects. First, using a custom node prevents
recombination leading to an infinite loop since the output DAG is notionally a
little more complex than the input one. Using a flag-setting instruction also
allows the subtraction to be folded with the related comparison more easily.

https://reviews.llvm.org/D53190

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348122 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][AArch64] Split out backend features

This patch splits backend features currently
hidden behind architecture versions.

For example, currently the only way to activate
the complex numbers extension is to target a v8.3
architecture, whereas after the patch this extension
can be added separately.

This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html

Reviewers: DavidSpickett, olista01, t.p.northover

Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio

Differential revision: https://reviews.llvm.org/D54633

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348121 91177308-0d34-0410-b5e6-96231b3b80d8

[CMake] Add LLVM_EXTERNALIZE_DEBUGINFO_OUTPUT_DIR for custom dSYM target directory on Darwin

Summary: When using `LLVM_EXTERNALIZE_DEBUGINFO` in LLDB, the default dSYM location for the shared library in LLDB.framework is inside the framework bundle. With `LLVM_EXTERNALIZE_DEBUGINFO_OUTPUT_DIR` we can easily fix that. I consider it a useful feature to be able to set a global output directory for external debug info (rather than having a target-specific one). Only implemented for Darwin so far.

Reviewers: beanz, aprantl

Reviewed By: aprantl

Subscribers: mgorny, aprantl, #lldb, lldb-commits, llvm-commits

Differential Revision: https://reviews.llvm.org/D55114

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348118 91177308-0d34-0410-b5e6-96231b3b80d8

[RISCV] Fix test/MC/Disassembler/RISCV/invalid-instruction.txt after rL347988

The test for [0x00 0x00] failed due to the introduction of c.unimp.

This particular test is unnecessary now that c.unimp was defined (and is
tested in test/MC/RISCV/rv32c-valid.s).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348117 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-dwarfdump] - Stop printing the bogus empty section name on invalid dwarf.

When there is no .debug_addr section for some reason,
llvm-dwarfdump would print the bogus empty section name when dumping ranges
in .debug_info:

DW_AT_ranges [DW_FORM_rnglistx]   (indexed (0x0) rangelist = 0x00000004
    [0x0000000000000000, 0x0000000000000001) ""
    [0x0000000000000000, 0x0000000000000002) "")

That happens because the code uses 0 (zero) as the default section index.
The code should use -1ULL instead, because technically 0 is a valid section index
in ELF and -1ULL is a special constant that means "no section available".

This is mostly a fix for the overall correctness/safety of the code,
but a test case is provided too.
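
The difference between the two defaults can be illustrated with a small
sketch. This is Python rather than the actual llvm-dwarfdump code; NO_SECTION,
section_name and format_range are hypothetical names, and the section table is
made up for the example.

```python
NO_SECTION = (1 << 64) - 1  # the 64-bit value of -1ULL


def section_name(section_index, names):
    """Return the section name, or None when no section is available."""
    if section_index == NO_SECTION:
        return None
    return names[section_index]


def format_range(low, high, section_index, names):
    """Format an address range, adding the section name only if there is one."""
    text = f"[0x{low:016x}, 0x{high:016x})"
    name = section_name(section_index, names)
    if name is not None:
        text += f' "{name}"'
    return text
```

With a table such as ["", ".text"], where index 0 is the unnamed null section,
a default index of 0 reproduces the bogus "" suffix, while NO_SECTION prints
the bare range.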

Differential revision: https://reviews.llvm.org/D55113

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348115 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM][MC] Move information about variadic register defs into tablegen

Currently, variadic operands on an MCInst are assumed to be uses,
because they come after the defs. However, this is not always the case,
for example the Arm/Thumb LDM instructions write to a variable number of
registers.

This adds a property of instruction definitions which can be used to
mark variadic operands as defs. This only affects MCInst, because
MachineInstruction already tracks use/def per operand in each instance
of the instruction, so can already represent this.

This property can then be checked in MCInstrDesc, allowing us to remove
some special cases in ARMAsmParser::isITBlockTerminator.

Differential revision: https://reviews.llvm.org/D54853

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348114 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM][Asm] Debug trace for the processInstruction loop

In the Arm assembly parser, we first match an instruction, then call
processInstruction to possibly change it to a different encoding, to
match rules in the architecture manual which can't be expressed by the
table-generated matcher.

This adds debug printing so that this process is visible when using the
-debug option.

To support this, I've added a new overload of MCInst::dump_pretty which
takes the opcode name as a StringRef, since we don't have an InstPrinter
instance in the assembly parser. Instead, we can get the same
information directly from the MCInstrInfo.

Differential revision: https://reviews.llvm.org/D54852

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348113 91177308-0d34-0410-b5e6-96231b3b80d8

[KMSAN] Enable -msan-handle-asm-conservative by default

This change enables conservative assembly instrumentation in KMSAN builds
by default.
It's still possible to disable it with -msan-handle-asm-conservative=0
if something breaks. It's now impossible to enable conservative
instrumentation for userspace builds, but it's not used anyway.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348112 91177308-0d34-0410-b5e6-96231b3b80d8

[GlobalISel] Fix test irtranslator-stackprotect-check.ll

Fix for commit r347862. Use correct AArch64 triple in test
CodeGen/AArch64/GlobalISel/irtranslator-stackprotect-check.ll.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348111 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] FP16: support vld1.16 for vector loads with post-increment

Differential Revision: https://reviews.llvm.org/D55112

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348110 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction

Summary:
There are 4 instructions which have inconsistent ImmMustBeMultipleOf in the
function PPCInstrInfo::instrHasImmForm: LFS, LFD, STFS and STFD.
These four instructions should set ImmMustBeMultipleOf to 1 instead of 4.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D54738

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348109 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC] [PowerPC] add a routine in PPCTargetLowering to determine if a global is accessed as got-indirect or not.

In theory, we should let the PPC target determine how to lower the TOC Entry for globals.
PPCTargetLowering requires this query to do some optimization for TOC_Entry.

Differential Revision: https://reviews.llvm.org/D54925

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348108 91177308-0d34-0410-b5e6-96231b3b80d8

[gn build] Fix cosmetic bug in write_cmake_config.py

Before, #cmakedefine FOO resulted in #define FOO with a trailing space if FOO
was set to something truthy. Make it so that it's just #define FOO without a
trailing space.
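
The fixed behaviour can be sketched as follows. This is not the code from
write_cmake_config.py: expand_cmakedefine is a hypothetical helper, and plain
Python truthiness stands in for CMake's rules about which values count as
true.

```python
def expand_cmakedefine(line, values):
    """Expand one '#cmakedefine NAME [rest]' line; other lines pass through."""
    parts = line.strip().split(None, 2)
    if len(parts) < 2 or parts[0] != "#cmakedefine":
        return line
    name = parts[1]
    rest = parts[2] if len(parts) > 2 else ""
    if not values.get(name):
        return f"/* #undef {name} */"
    # Join only the pieces that exist, so a bare '#cmakedefine FOO' becomes
    # '#define FOO' with no trailing space.
    return " ".join(["#define", name] + ([rest] if rest else []))
```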

No functional difference.

Differential Revision: https://reviews.llvm.org/D55172

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348107 91177308-0d34-0410-b5e6-96231b3b80d8

[gn build] Slightly simplify write_cmake_config.

Before, the script had a bunch of special cases for #cmakedefine and
#cmakedefine01 and then did general variable substitution. Now, the script
always does general variable substitution for all lines and handles the special
cases afterwards.

This has no observable effect for the inputs we use, but is easier to explain
and slightly easier to implement.

Also mention and link to CMake's configure_file() in the docstring.

(The new behavior doesn't quite match CMake on lines like #cmakedefine ${FOO},
but nobody does that.)
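
The new ordering can be sketched as follows: substitute variables on every
line first, then look at the special directives. This is a Python illustration
under assumptions, not the script itself; configure_line is a hypothetical
name and truthiness is simplified to Python's.

```python
import re


def configure_line(line, values):
    """Substitute variables on every line, then handle the special cases."""
    # Step 1: general substitution of ${VAR} and @VAR@, applied to all lines.
    def lookup(match):
        return str(values.get(match.group(1) or match.group(2), ""))

    line = re.sub(r"\$\{(\w+)\}|@(\w+)@", lookup, line)

    # Step 2: the #cmakedefine01 and #cmakedefine special cases.
    words = line.split()
    if len(words) >= 2 and words[0] == "#cmakedefine01":
        return f"#define {words[1]} {1 if values.get(words[1]) else 0}"
    if len(words) >= 2 and words[0] == "#cmakedefine":
        if values.get(words[1]):
            return " ".join(["#define"] + words[1:])
        return f"/* #undef {words[1]} */"
    return line
```

Because substitution runs first, a line like #cmakedefine ${FOO} is seen by
step 2 with the value of FOO in place of the name, which is the corner case
the parenthetical above refers to.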

Differential Revision: https://reviews.llvm.org/D55171

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348106 91177308-0d34-0410-b5e6-96231b3b80d8

[gn build] Add build files for llvm/lib/Analysis and llvm/lib/ProfileData

Differential Revision: https://reviews.llvm.org/D55166

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348105 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add a DAG combine to turn stores of vXi1 on pre-avx512 targets into a bitcast and a store of a iX scalar.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348104 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Fix bad comment. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348103 91177308-0d34-0410-b5e6-96231b3b80d8

[test] Fix use of 'sort -b' in SimpleLoopUnswitch on NetBSD

Add '-k 1' to 'sort -b' calls in SimpleLoopUnswitch tests, as required
by the sort implementation on NetBSD.  The '-b' modifier is ineffective
if specified without any key.  Per the manpage:

  Note that the -b option has no effect unless key fields are specified.

Differential Revision: https://reviews.llvm.org/D55168

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348097 91177308-0d34-0410-b5e6-96231b3b80d8

[test] Fix ScalarEvolution test to allow __func__ with prototype

Fix ScalarEvolution/solve-quadratic.ll test to account for __func__
output listing the complete function prototype rather than just its
name, as it does on NetBSD.

Example Linux output:

  GetQuadraticEquation: addrec coeff bw: 4
  GetQuadraticEquation: equation -2x^2 + -2x + -4, coeff bw: 5, multiplied by 2

Example NetBSD output:

  llvm::Optional<std::tuple<llvm::APInt, llvm::APInt, llvm::APInt, llvm::APInt, unsigned int> > GetQuadraticEquation(const llvm::SCEVAddRecExpr*): addrec coeff bw: 4
  llvm::Optional<std::tuple<llvm::APInt, llvm::APInt, llvm::APInt, llvm::APInt, unsigned int> > GetQuadraticEquation(const llvm::SCEVAddRecExpr*): equation -2x^2 + -2x + -4, coeff bw: 5, multiplied by 2
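
A pattern that tolerates both forms only needs to allow arbitrary text before
the function name and an optional parameter list after it. The test itself
uses FileCheck, and its exact pattern is not shown here; the Python regex
below is just an illustration of the idea.

```python
import re

# Anything may precede the name (a return type on NetBSD), and an optional
# parenthesised parameter list may follow it.
ADDREC_LINE = re.compile(r"GetQuadraticEquation(\(.*\))?: addrec coeff bw: (\d+)$")


def addrec_bitwidth(line):
    """Return the reported bit width, or None if the line does not match."""
    match = ADDREC_LINE.search(line)
    return int(match.group(2)) if match else None
```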

Differential Revision: https://reviews.llvm.org/D55162

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348096 91177308-0d34-0410-b5e6-96231b3b80d8

[test] Fix BugPoint/compile-custom.ll to use detected python exec

Spawn the custom compile command in BugPoint/compile-custom.ll via
%python rather than relying on implicit 'env python' shebang, in order
to fix it on systems that don't have 'python' executable such as NetBSD.

Differential Revision: https://reviews.llvm.org/D55161

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348095 91177308-0d34-0410-b5e6-96231b3b80d8

[ValueTracking] Support funnel shifts in computeKnownBits()

If the shift amount is known, we can determine the known bits of the
output based on the known bits of two inputs.

This is essentially the same functionality as implemented in D54869,
but for ValueTracking rather than InstCombine SimplifyDemandedBits.
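
A minimal sketch of the idea for fshl with a constant shift amount,
written in Python rather than against the LLVM API. The KnownBits
class and the function names here are stand-ins invented for the
example; only the "shift the known masks the same way as the values"
idea comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownBits:
    zero: int  # a set bit means "this bit of the value is known to be 0"
    one: int   # a set bit means "this bit of the value is known to be 1"


def fshl(a, b, shift, width):
    """Funnel shift left: high `width` bits of (a:b) << (shift % width)."""
    mask = (1 << width) - 1
    shift %= width
    if shift == 0:
        return a & mask
    return ((a << shift) | (b >> (width - shift))) & mask


def known_bits_fshl(ka, kb, shift, width):
    """Known bits of fshl(a, b, shift) when `shift` is a known constant."""
    mask = (1 << width) - 1
    shift %= width
    if shift == 0:
        # A zero shift returns the first operand unchanged.
        return KnownBits(ka.zero & mask, ka.one & mask)
    # Apply the same shifts to the known-zero and known-one masks.
    zero = ((ka.zero << shift) | (kb.zero >> (width - shift))) & mask
    one = ((ka.one << shift) | (kb.one >> (width - shift))) & mask
    return KnownBits(zero, one)


# a = 0b1010???? (low nibble unknown), b = 0b11000011 (fully known).
ka = KnownBits(zero=0x50, one=0xA0)
kb = KnownBits(zero=0x3C, one=0xC3)
result = known_bits_fshl(ka, kb, 4, 8)
```

With a shift of 4, the low nibble of the result comes from the top
nibble of b and is fully known, while the high nibble comes from the
unknown low nibble of a and stays unknown.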

Differential Revision: https://reviews.llvm.org/D55140

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348091 91177308-0d34-0410-b5e6-96231b3b80d8

[SelectionDAG] fold constant with undef vector per element

This makes the SDAG behavior consistent with the way we do this in IR.
It's possible that we were getting the wrong answer before. For example,
'xor undef, undef --> 0' but 'xor undef, C --> undef'.

But the most practical improvement is likely as shown in the tests here -
for FP, we were overconstraining undef lanes to NaN, and that can prevent
vector simplifications/narrowing (see D51553).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348090 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] guard against an oversized shift crash

This change prevents the crash noted in the post-commit comments
for rL347478 :
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181119/605166.html

We can't guarantee that an oversized shift amount is folded away,
so we have to check for it.

Note that I committed an incomplete fix for that crash with:
rL347502

But as discussed here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181126/605679.html
...we have to try harder.

So I'm not sure how to expose the bug now (and apparently no fuzzers have found
a way yet either).

On the plus side, we have discovered that we're missing real optimizations by
not simplifying nodes sooner, so the earlier fix still has value, and there's
likely more value in extending that so we can simplify more opcodes and simplify
when doing RAUW and/or putting nodes on the combiner worklist.

Differential Revision: https://reviews.llvm.org/D54954

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348089 91177308-0d34-0410-b5e6-96231b3b80d8

[ValueTracking] add helper function for testing implied condition; NFCI

We were duplicating code around the existing isImpliedCondition() that
checks for a predecessor block/dominating condition, so make that a
wrapper call.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348088 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Simplify LowerBITCAST code for v2i32/v4i16/v8i8/i64->mmx/i64/f64 bitcast.

Previously this code generated its own extracts and build_vector. But we can use a simpler concat_vectors or scalar_to_vector operation and let type legalization do additional legalization of those operations.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348087 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add custom type legalization for v2i32/v4i16/v8i8->mmx bitcasts to avoid a store/load to/from the stack.

Widen the input to a 128 bit vector by padding with undef elements. Then use a movdq2q to convert from xmm register to mmx register.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348086 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Custom type legalize v2i32/v4i16/v8i8->i64 bitcasts in 64-bit mode similar to what's done when the destination is f64.

The generic legalizer will fall back to a stack spill that uses a truncating store. That store will get expanded into a shuffle and non-truncating store on pre-avx512 targets. Once that happens the stack store/load pair will be combined away leaving behind the shuffle and bitcasts. On avx512 targets the truncating store is legal so doesn't get folded away.

By custom legalizing it we can avoid this churn and maybe produce better code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348085 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add vXi8 division/remainder by non-splat constant test cases to prepare for an upcoming patch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348082 91177308-0d34-0410-b5e6-96231b3b80d8

[MachineOutliner][AArch64] Improve checks for stack instructions

If we know that we'll definitely save LR to a register, there's no reason to
pre-check whether or not a stack instruction is unsafe to fix up.

This makes it so that we check for that condition before mapping instructions.

This allows us to outline more, since we don't pessimise as many instructions.

Also update some tests, since we outline more.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348081 91177308-0d34-0410-b5e6-96231b3b80d8

Replace w16/w17 in machine-outliner.mir with w11/w12

These registers should not be used here, since they are interprocedural
scratch registers in AArch64.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348080 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Don't use zero_extend_vector_inreg for mulhu lowering with sse 4.1

Summary: With sse4.1 we use two zero_extend_vector_inreg and a pshufd to expand the v16i8 input into two v8i16 vectors for the multiply. That's 3 shuffles to extend one operand. The other operand is usually constant as this is mostly used by division by constant optimization. Pre sse4.1 we use a punpckhbw and a punpcklbw with a zero vector. That's two shuffles and an xor and a copy due to tied register constraints. That seems maybe better than the 3 shuffles. With AVX we avoid the copy so that's obviously better.
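
The pre-sse4.1 style of extension can be modelled in a few lines of Python. This is only a sketch of the arithmetic: unpack_with_zero stands in for punpcklbw/punpckhbw against a zero vector, and the final "shift right by 8 and keep the byte" step is an assumption about how the high half is recovered, not a transcription of the lowering code.

```python
def unpack_with_zero(vec16, high):
    """Model punpcklbw/punpckhbw with a zero vector: 8 bytes -> 8 u16 lanes."""
    half = vec16[8:] if high else vec16[:8]
    # Interleaving with zero leaves the byte in the low half of each
    # 16-bit lane and zero in the high half, i.e. a zero extension.
    return [byte & 0xFF for byte in half]


def mulhu_v16i8(a, b):
    """Unsigned multiply-high of two 16 x i8 vectors via two 8 x i16 halves."""
    out = []
    for high in (False, True):
        wide_a = unpack_with_zero(a, high)
        wide_b = unpack_with_zero(b, high)
        products = [(x * y) & 0xFFFF for x, y in zip(wide_a, wide_b)]
        # The high byte of each 16-bit product is the mulhu result.
        out.extend(p >> 8 for p in products)
    return out


a = list(range(240, 256))
b = [255] * 16
print(mulhu_v16i8(a, b))
```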

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55138

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348079 91177308-0d34-0410-b5e6-96231b3b80d8

[TTI] Reduction costs only need to include a single extract element cost (REAPPLIED)

We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.

For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.
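
To see why one extract is enough, here is a small Python model of a shuffle-style add reduction. The function name and the unit cost per extract are assumptions chosen for illustration; they are not values from the TTI cost tables.

```python
def shuffle_reduce_add(vec):
    """Halve the active width each step; the total ends up in lane 0."""
    lanes = list(vec)
    width = len(lanes)
    assert width > 0 and width & (width - 1) == 0, "power-of-two length only"
    while width > 1:
        width //= 2
        for i in range(width):
            lanes[i] += lanes[i + width]
    return lanes


lanes = shuffle_reduce_add([1, 2, 3, 4, 5, 6, 7, 8])
total = lanes[0]  # only lane 0 has to be extracted

EXTRACT_COST = 1                                # hypothetical unit cost
old_extract_cost = len(lanes) * EXTRACT_COST    # every element extracted
new_extract_cost = EXTRACT_COST                 # just element 0
print(total, old_extract_cost, new_extract_cost)
```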

Fixes PR37731

Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997

Differential Revision: https://reviews.llvm.org/D54585

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348076 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Split 64-Bit XNOR to 64-Bit NOT/XOR

The identity ~(x ^ y) == (~x ^ y) == (x ^ ~y) allows XNOR (XOR/NOT) to turn into NOT/XOR. Handling this case with its own split means we can make the NOT remain in the scalar unit. Previously, we split 64-bit XNOR into two 32-bit XNOR, then lowered. Now, we get three instructions (s_not, v_xor, v_xor) rather than four in the case where either of the sources is a scalar 64-bit.

Add test cases to xnor.ll to attempt XNOR Vx, Sy and XNOR Sx, Vy. Also add a test that uses the opposite identity such that (~x ^ y) on the scalar unit (or vector for gfx906) can generate XNOR. This already worked, but I didn't see a test for it.
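
The identity and the resulting three-operation split can be checked with a short Python sketch. The function names are made up for the example, and operands are assumed to be unsigned 64-bit values; the comments map each step to the instruction named above.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def not64(x):
    return ~x & MASK64


def xnor64(x, y):
    """Reference: ~(x ^ y) on 64 bits."""
    return not64(x ^ y)


def xnor64_split(scalar, vector):
    """One 64-bit NOT on the scalar operand, then two 32-bit XORs."""
    inverted = not64(scalar)                      # s_not
    lo = (inverted & MASK32) ^ (vector & MASK32)  # v_xor, low half
    hi = (inverted >> 32) ^ (vector >> 32)        # v_xor, high half
    return (hi << 32) | lo


x = 0x0123456789ABCDEF
y = 0xFFFF0000FFFF0000
print(hex(xnor64(x, y)), hex(xnor64_split(x, y)))
```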

Differential: https://reviews.llvm.org/D55071

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348075 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-readobj] Improve dynamic section iteration NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348074 91177308-0d34-0410-b5e6-96231b3b80d8

[SelectionDAG] Improve SimplifyDemandedBits to SimplifyDemandedVectorElts simplification

D52935 introduced the ability for SimplifyDemandedBits to call SimplifyDemandedVectorElts through BITCASTs if the demanded bit mask entirely covered the sub element.

This patch relaxes this to demanding an element if we need any bit from it.
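
A sketch of the relaxed rule in Python, assuming a little-endian layout where element 0 occupies the lowest bits of the bitcast value. The function name is hypothetical and the masks are plain integers rather than APInt values.

```python
def demanded_elts(demanded_bits, num_elts, elt_bits):
    """Bit i of the result is set if any bit of element i is demanded."""
    elt_mask = (1 << elt_bits) - 1
    demanded = 0
    for i in range(num_elts):
        if (demanded_bits >> (i * elt_bits)) & elt_mask:
            demanded |= 1 << i
    return demanded


# i64 bitcast from v4i16: bit 0 and bits 32-39 are demanded, so
# elements 0 and 2 are needed even though neither is fully covered.
mask = demanded_elts(0x000000FF00000001, num_elts=4, elt_bits=16)
print(bin(mask))
```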

Differential Revision: https://reviews.llvm.org/D54761

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348073 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Support ssub.sat canonicalization for non-splats

Extend ssub.sat(X, C) -> sadd.sat(X, -C) canonicalization to also
support non-splat vector constants. This is done by generalizing
the implementation of the isNotMinSignedValue() helper to return
true for constants that are non-splat, but don't contain any
signed min elements.
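
The generalized check can be sketched in Python as follows. The names
mirror the helper mentioned above but this is not the LLVM
implementation; vector constants are modelled as lists of Python
integers, and canonicalize_ssub_sat is a hypothetical wrapper.

```python
def signed_min(bits):
    return -(1 << (bits - 1))


def is_not_min_signed_value(elements, bits):
    """True if no element is the signed minimum; splat or not."""
    return all(e != signed_min(bits) for e in elements)


def canonicalize_ssub_sat(constant_elts, bits):
    """Return the constant for sadd.sat(X, -C), or None if not applicable."""
    if not is_not_min_signed_value(constant_elts, bits):
        # Negating the signed minimum is not representable in `bits` bits.
        return None
    return [-c for c in constant_elts]


def saturate(value, bits):
    return max(signed_min(bits), min((1 << (bits - 1)) - 1, value))


def ssub_sat(x, c, bits):
    return saturate(x - c, bits)


def sadd_sat(x, c, bits):
    return saturate(x + c, bits)


negated = canonicalize_ssub_sat([1, -2, 127], 8)
rejected = canonicalize_ssub_sat([1, -128], 8)
```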

Differential Revision: https://reviews.llvm.org/D55011

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348072 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Remove stale FIXME from test case. NFC

This was fixed in r346581. I just forgot to remove it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348069 91177308-0d34-0410-b5e6-96231b3b80d8

[ThinLTO] Allow importing of functions with var args

Summary:
Follow up to D54270, which allowed importing of var args functions
unless they called va_start. As pointed out in the post-commit comments
on that patch, the inliner can handle functions that call va_start in
certain situations as well. Go ahead and enable importing of all var
args functions. Measurements on a large binary show that this increases
imports and binary size by an insignificant amount.

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D54607

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348068 91177308-0d34-0410-b5e6-96231b3b80d8

[RISCV] Remove RV64I SLLW/SRLW/SRAW patterns and add new test cases

As noted by Eli Friedman <https://reviews.llvm.org/D52977?id=168629#1315291>,
the RV64I shift patterns for SLLW/SRLW/SRAW make some incorrect assumptions.
SRAW assumed that (sext_inreg foo, i32) could only be produced when
sign-extending an i32. However, it can be produced by input such as:

define i64 @tricky_ashr(i64 %a, i64 %b) {
  %1 = shl i64 %a, 32
  %2 = ashr i64 %1, 32
  %3 = ashr i64 %2, %b
  ret i64 %3
}

It's important not to select sraw in the above case, because sraw only uses
the lower 5 bits of the shift amount, while a shift of 32-63 would be valid.

Similarly, the patterns for srlw assumed (and foo, 0xffffffff) would only be
produced when zero-extending a value that was originally i32 in LLVM IR. This
is obviously incorrect.

This patch removes the SLLW/SRLW/SRAW shift patterns for the time being and
adds test cases that would demonstrate a miscompile if the incorrect patterns
were re-added.
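
The difference can be reproduced with a small Python model. The sraw
function below follows the usual RV64I description (arithmetic shift of
the low 32 bits of rs1 by the low 5 bits of rs2, result sign-extended);
the helper names are made up for this sketch and results are shown as
signed Python integers.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def tricky_ashr(a, b):
    """Reference semantics of the IR above, for 0 <= b < 64."""
    t = to_signed(a << 32, 64)  # shl i64 %a, 32
    t >>= 32                    # ashr i64 %1, 32 (Python >> is arithmetic)
    return t >> b               # ashr i64 %2, %b


def sraw(rs1, rs2):
    """Model of sraw: only the low 5 bits of the shift amount are used."""
    return to_signed(rs1, 32) >> (rs2 & 0x1F)


a = 0x40000000
print(tricky_ashr(a, 3), sraw(a, 3))    # agree for small shifts
print(tricky_ashr(a, 32), sraw(a, 32))  # disagree once b >= 32
```

For a shift amount of 32 the IR yields 0, while the sraw model shifts
by 0 and returns the input unchanged.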

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348067 91177308-0d34-0410-b5e6-96231b3b80d8

[projects] Use add_llvm_external_project for implicit projects

This allows disabling implicit projects via the LLVM_TOOL_*_BUILD
variables, similar to how implicit tools can be disabled. They'll still
be enabled by default, since add_llvm_external_project defaults the
LLVM_TOOL_*_BUILD variables to ON for in-tree implicit projects.

Differential Revision: https://reviews.llvm.org/D55105

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348064 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][LoopVectorize] Replace -mcpu=skylake-avx512 with -mattr=avx512f in some tests that failed when experimenting with defaulting to -mprefer-vector-width=256 for skylake-avx512.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348063 91177308-0d34-0410-b5e6-96231b3b80d8

Use RequireNullTerminator=false in identify_magic.

identify_magic does not need the file to be null terminated. Passing
true here causes the file reading code to decide not to use mmap in
some rare cases (which happen to be true 100% of the time in PDB files)
which can lead to very large files failing to load. Since it was
probably just an accident that we were passing true here (since it is
the default function parameter), this should be strictly an improvement.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348059 91177308-0d34-0410-b5e6-96231b3b80d8

[lit] Add a generic build script with a lit substitution.

This adds a script called build.py as well as a lit substitution
called %build that we can use to invoke it.  The idea is that
this allows a lit test to build test inferiors without having
to worry about architecture / platform specific differences,
command line syntax, finding / configuring a proper toolchain,
and other issues.  They can simply write something like:

%build --arch=32 -o %t.exe %p/Inputs/foo.cpp

and it will just work.  This paves the way for being able to
run lit tests with multiple configurations, platforms, and
compilers with a single test.

Differential Revision: https://reviews.llvm.org/D54914

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348058 91177308-0d34-0410-b5e6-96231b3b80d8

[NVPTX] Add lowering of i128 numbers as struct fields

Addition to D34555 - override VTs computation with ComputePTXValueVTs
for struct fields.

Author: Denys Zariaiev <denys.zariaiev@gmail.com>

Differential Revision: https://reviews.llvm.org/D55144

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348057 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Replace '-mcpu=skx' with -mattr=avx512f or -mattr=avx512bw in interleave/strided load/store cost model tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348056 91177308-0d34-0410-b5e6-96231b3b80d8

[gn build] Add action to generate VCSRevision.h and use it to add llvm/lib/Object/BUILD.gn

Differential Revision: https://reviews.llvm.org/D55090

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348054 91177308-0d34-0410-b5e6-96231b3b80d8

[codeview] Remove dead macros for codeview record serialization, NFC

These weren't needed when we went to the yaml IO style of serialization,
which has "mapOptional".

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348052 91177308-0d34-0410-b5e6-96231b3b80d8

LegacyDivergenceAnalysis: fix uninitialized value

Change-Id: I014502e431a68f7beddf169f6a3d19dac5dd2c26

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348051 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Divergence-driven selection of scalar buffer load intrinsics

Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.

If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.

There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.

Change-Id: I170e6816323beb1348677b358c9d380865cd1a19

Reviewers: arsenm, alex-t, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53283

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348050 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Fix various issues around the VirtReg2Value mapping

Summary:
The VirtReg2Value mapping is crucial for getting consistently
reliable divergence information into the SelectionDAG. This
patch fixes a bunch of issues that lead to incorrect divergence
info and introduces tight assertions to ensure we don't regress:

1. VirtReg2Value is generated lazily; there were some cases where
   a lookup was performed before all relevant virtual registers were
   created, leading to an out-of-sync mapping. Those cases were:

  - Complex code to lower formal arguments that generated CopyFromReg
    nodes from live-in registers (fixed by never querying the mapping
    for live-in registers).

  - Code that generates CopyToReg for formal arguments that are used
    outside the entry basic block (fixed by never querying the
    mapping for Register nodes, which don't need the divergence info
    anyway).

2. For complex values that are lowered to a sequence of registers,
   all registers must be reflected in the VirtReg2Value mapping.

I am not adding any new tests, since I'm not actually aware of any
bugs that these problems are causing with trunk as-is. However,
I recently added a test case (in r346423) which fails when D53283 is
applied without this change. Also, the new assertions should provide
most of the effective test coverage.

There is one test change in sdwa-peephole.ll. The underlying issue
is that since the divergence info is now correct, the DAGISel will
select V_OR_B32 directly instead of S_OR_B32. This leads to an extra
COPY which affects the behavior of MachineLICM in a way that ends up
with the S_MOV_B32 with the constant in a different basic block than
the V_OR_B32, which is presumably what defeats the peephole.

Reviewers: alex-t, arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D54340

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348049 91177308-0d34-0410-b5e6-96231b3b80d8

[DA] GPUDivergenceAnalysis for unstructured GPU kernels

Summary:
This is patch #3 of the new DivergenceAnalysis

<https://lists.llvm.org/pipermail/llvm-dev/2018-May/123606.html>

The GPUDivergenceAnalysis is intended to eventually supersede the existing
LegacyDivergenceAnalysis. The existing LegacyDivergenceAnalysis produces
incorrect results on unstructured Control-Flow Graphs:

<https://bugs.llvm.org/show_bug.cgi?id=37185>

This patch adds the option -use-gpu-divergence-analysis to the
LegacyDivergenceAnalysis to turn it into a transparent wrapper for the
GPUDivergenceAnalysis.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: jholewinski, jvesely, jfb, llvm-commits, alex-t, sameerds, arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D53493

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348048 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] add tests for undef + partial undef constant folding; NFC

Keep this file sync'd with the instsimplify version (rL348045).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348047 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Split skylake-avx512 run lines in SLP vectorizer tests to cover -mprefer-vector-width=256 and -mprefer-vector-width=512.

This will make these tests immune if we ever change the default behavior of -march=skylake-avx512 to prefer 256 bit vectors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348046 91177308-0d34-0410-b5e6-96231b3b80d8

[InstSimplify] add tests for undef + partial undef constant folding; NFC

These tests should probably go under a separate test file because they
should fold with just -constprop, but they're similar to the scalar
tests already in here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348045 91177308-0d34-0410-b5e6-96231b3b80d8