git.osdn.net Git - android-x86/external-llvm-project.git/log

[lldb] Don't process symlinks deep inside DWARFUnit

Summary:
This code is handling debug info paths starting with /proc/self/cwd,
which is one of the mechanisms people use to obtain "relocatable" debug
info (the idea being that one starts the debugger with an appropriate
cwd and things "just work").

Instead of resolving the symlinks inside DWARFUnit, we can do the same
thing more elegantly by hooking into the existing Module path remapping
code. Since llvm::DWARFUnit does not support any similar functionality,
doing things this way is also a step towards unifying llvm and lldb
dwarf parsers.

Reviewers: JDevlieghere, aprantl, clayborg, jdoerfert

Subscribers: lldb-commits

Tags: #lldb

Differential Revision: https://reviews.llvm.org/D71770

Fix the invisible-traversal to ignore more nodes

Make SymbolFileDWARF::ParseLineTable use std::sort instead of insertion sort

Summary:
Motivation: When setting breakpoints in certain projects line sequences are frequently being inserted out of order.

Rather than inserting sequences one at a time into a sorted line table, store all the line sequences as we're building them up and sort and flatten afterwards.

Reviewers: jdoerfert, labath

Reviewed By: labath

Subscribers: teemperor, labath, mgrang, JDevlieghere, lldb-commits

Tags: #lldb

Differential Revision: https://reviews.llvm.org/D72909

[lldb/DWARF] Simplify DWARFDebugInfoEntry::LookupAddress

Summary:
This method was doing a lot more than it's only caller needed
(DWARFDIE::LookupDeepestBlock) needed, so I inline it into the caller,
and remove any code which is not actually used. This includes code for
searching for the deepest function, and the code for working around
incomplete DW_AT_low_pc/high_pc attributes on a compile unit DIE (modern
compiler get this right, and this method is called on function DIEs
anyway).

This also improves our llvm consistency, as llvm::DWARFDebugInfoEntry is
just a very simple struct with no nontrivial logic.

Reviewers: JDevlieghere, aprantl

Subscribers: lldb-commits

Tags: #lldb

Differential Revision: https://reviews.llvm.org/D72920

Fix clang-formatting for recent commits

[LV] Vectorizer should adjust trip count in profile information

Summary: Vectorized loop processes VFxUF number of elements in one iteration thus total number of iterations decreases proportionally. In addition epilog loop may not have more than VFxUF - 1 iterations. This patch updates profile information accordingly.

Reviewers: hsaito, Ayal, fhahn, reames, silvas, dcaballe, SjoerdMeijer, mkuper, DaniilSuchkov

Reviewed By: Ayal, DaniilSuchkov

Subscribers: fedor.sergeev, hiraditya, rkruppe, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67905

[clang][CodeComplete] Propogate printing policy to FunctionDecl

Summary:
Printing policy was not propogated to functiondecls when creating a
completion string which resulted in canonical template parameters like
`foo<type-parameter-0-0>`. This patch propogates printing policy to those as
well.

Fixes https://github.com/clangd/clangd/issues/76

Reviewers: ilya-biryukov

Subscribers: jkorous, arphaman, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D72715

Compare traversal for memoization before bound nodes container

Add missing tests for parent traversal

[clangd] Remove a stale FIXME, NFC.

[X86][SSE] Add PACKSS SimplifyMultipleUseDemandedBits 'sign bit' handling.

Attempt to use SimplifyMultipleUseDemandedBits to simplify PACKSS if we're only after the sign bit.

[lldb/DWARF] Change how we construct a llvm::DWARFContext

Summary:
The goal of this patch is two-fold. First, it fixes a use-after-free in
the construction of the llvm DWARFContext. This happened because the
construction code was throwing away the lldb DataExtractors it got while
reading the sections (unlike their llvm counterparts, these are also
responsible for memory ownership). In most cases this did not matter,
because the sections are just slices of the mmapped file data. But this
isn't the case for compressed elf sections, in which case the section is
decompressed into a heap buffer. A similar thing also happen with object
files which are loaded from process memory.

The second goal is to make it explicit which sections go into the llvm
DWARFContext -- any access to the sections through both DWARF parsers
carries a risk of parsing things twice, so it's better if this is a
conscious decision. Also, this avoids loading completely irrelevant
sections (e.g. .text). At present, the only section that needs to be
present in the llvm DWARFContext is the debug_line_str. Using it through
both APIs is not a problem, as there is no parsing involved.

The first goal is achieved by loading the sections through the existing
lldb DWARFContext APIs, which already do the caching. The second by
explicitly enumerating the sections we wish to load.

Reviewers: JDevlieghere, aprantl

Subscribers: lldb-commits

Tags: #lldb

Differential Revision: https://reviews.llvm.org/D72917

[ARM][MVE] Tail-Predication: rematerialise iteration count in exit blocks

This patch uses helper function rewriteLoopExitValues that is refactored in
D72602 to rematerialise the iteration count in exit blocks, so that we can
clean-up loop update expressions inside the hardware-loops later in
ARMLowOverheadLoops, which is necessary to get actual performance gains for
tail-predicated loops.

Differential Revision: https://reviews.llvm.org/D72714

[NFC][LoopUtils] Minor change in comment according to review D71990.

[test] On Mac, don't try to use result of sysctl command if calling it failed.

If sysctl is not found at all, let the usual exception propogate
so that the user can fix their env. If it fails because of the
permissions required to read the property then print a warning
and continue.

Differential Revision: https://reviews.llvm.org/D72278

[LoopUtils] Better accuracy for getLoopEstimatedTripCount.

Summary: Current implementation of getLoopEstimatedTripCount returns 1 iteration less than it should. The reason is that in bottom tested loop first iteration is executed before first back branch is taken. For example for loop with !{!"branch_weights", i32 1 // taken, i32 1 // exit} metadata getLoopEstimatedTripCount gives 1 while actual number of iterations is 2.

Reviewers: Ayal, fhahn

Reviewed By: Ayal

Subscribers: mgorny, hiraditya, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71990

Recommit "[DWARF5][DebugInfo]: Added support for DebugInfo generation for auto return type for C++ member functions."

Summary:
This was reverted in 328e0f3dcac52171b8cdedeaba22c98e7fbb75ea due to
chromium bot failure. This revision addresses that case.

Original commit message:
Summary:
    This patch will provide support for auto return type for the C++ member
    functions. Before this return type of the member function is deduced and
    stored in the DIE.
    This patch includes llvm side implementation of this feature.

    Patch by: Awanish Pandey <Awanish.Pandey@amd.com>

    Reviewers: dblaikie, aprantl, shafik, alok, SouraVX, jini.susan.george

    Reviewed by: dblaikie

    Differential Revision: https://reviews.llvm.org/D70524

[llvm-objdump] - Fix the indentation when printing dynamic tags.

We have a bug currently: printed tag names might overlap the
value column. It happens for MIPS now.

This patch adds a logic to calculate the size of indentation on fly
to fix such issues.

Differential revision: https://reviews.llvm.org/D72838

[IndVarSimplify][LoopUtils] rewriteLoopExitValues. NFCI

This moves `rewriteLoopExitValues()` from IndVarSimplify to LoopUtils thus
making it a generic loop utility function. This allows to rewrite loop exit
values by just calling this function without running the whole IndVarSimplify
pass.

We use this in D72714 to rematerialise the iteration count in exit blocks, so
that we can clean-up loop update expressions inside the hardware-loops later.

Differential Revision: https://reviews.llvm.org/D72602

[test] Simplify CodeGen/PowerPC/stack-coloring-vararg.mir

[llvm-mc] - Produce R_X86_64_PLT32 relocation for branches with JCC opcodes too.

The idea is to produce R_X86_64_PLT32 instead of
R_X86_64_PC32 for branches.

It fixes https://bugs.llvm.org/show_bug.cgi?id=44397.

This patch teaches MC to do that for JCC (jump if condition is met)
instructions. The new behavior matches modern GNU as.
It is similar to D43383, which did the same for "call/jmp foo",
but missed JCC cases.

Differential revision: https://reviews.llvm.org/D72831

[LLVMgold][test] Fix llvm-nm test after D72658

Differential Revision: https://reviews.llvm.org/D73014

[ARM] MVE VLDn postinc

This adds Post inc variants of the VLD2/4 and VST2/4 instructions in
MVE. It uses the same mechanism/nodes as Neon, transforming the
intrinsic+add pair into a ARMISD::VLD2_UPD, which gets selected to a
post-inc instruction. The code to do that is mostly taken from the
existing Neon code, but simplified as less variants are needed.

It also fills in some getTgtMemIntrinsic for the arm.mve.vld2/4
instrinsics, which allow the nodes to have MMO's, calculated as the full
length to the memory being loaded/stored.

Differential Revision: https://reviews.llvm.org/D71194

[ARM] MVE VLDn post inc tests. NFC

[ARM] Favour post inc for MVE loops

We were previously not necessarily favouring postinc for the MVE loads
and stores, leading to extra code prior to the loop to set up the
preinc. MVE in general can benefit from postinc (as we don't have
unrolled loops), and certain instructions like the VLD2's only post-inc
versions are available.

Differential Revision: https://reviews.llvm.org/D70790

[StackColoring] Remap FixedStackPseudoSourceValue frame index referenced by MachineMemOperand

StackColoring::remapInstructions() remaps MachineOperand frame index (e.g. %stack.1 -> %stack.0)
but does not remap FixedStackPseudoSourceValue frame index (e.g. store 4 into %stack.1.ap2.i.i)
referenced by MachineMemoryOperand.

This can cause an assertion failure when LiveDebugValues references a dead stack object.

It is difficult to craft a test case. -g, va_copy and stack-coloring are required.
I can only reproduce it on ppc32.

[mlir] NFC: Fix trivial typos in comments

Differential Revision: https://reviews.llvm.org/D73012

[libc++][libc++abi] Fix or suppress failing tests in single-threaded
builds.

Fix a libc++abi test that was incorrectly checking for threading
primitives even when threading was disabled.

Additionally, temporarily XFAIL some module tests that fail because
the <atomic> header is unsupported but still built as a part of the
std module.

To properly address this libc++ would either need to produce a different
module.modulemap for single-threaded configurations, or it would need
to make the <atomic> header not hard-error and instead be empty
for single-threaded configurations

Undo changes to release notes intended for the Clang 10 branch, not

List implicit operator== after implicit destructors in a vtable.

Summary:
We previously listed first declared members, then implicit operator=,
then implicit operator==, then implicit destructors. Per discussion on
https://github.com/itanium-cxx-abi/cxx-abi/issues/88, put the implicit
equality comparison operators at the very end, after all special member
functions.

Reviewers: rjmccall

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D72897

PR42108 Consistently diagnose binding a reference template parameter to
a temporary.

We previously failed to materialize a temporary when performing an
implicit conversion to a reference type, resulting in our thinking the
argument was a value rather than a reference in some cases.

Reorder targets in alphabetical order. NFC.

[X86] Try to avoid casts around logical vector ops recursively.

Currently PromoteMaskArithemtic only looks at a single operation to
skip casts. This means we miss cases where we combine multiple masks.

This patch updates PromoteMaskArithemtic to try to recursively promote
AND/XOR/AND nodes that terminate in truncates of the right size or
constant vectors.

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D72524

[BranchRelaxation] Simplify offset computation and fix a bug in adjustBlockOffsets()

If Start!=0, adjustBlockOffsets() may unnecessarily adjust the offset of
Start. There is no correctness issue, but it can create more block
splits.

fix doc typos to cycle bots

[TargetRegisterInfo] Default trackLivenessAfterRegAlloc() to true

Except AMDGPU/R600RegisterInfo (a bunch of MIR tests seem to have
problems), every target overrides it with true. PostMachineScheduler
requires livein information. Not providing it can cause assertion
failures in ScheduleDAGInstrs::addSchedBarrierDeps().

[ORC] Add weak symbol support to defineMaterializing, fix for PR40074.

The MaterializationResponsibility::defineMaterializing method allows clients to
add new definitions that are in the process of being materialized to the JIT.
This patch adds support to defineMaterializing for symbols with weak linkage
where the new definitions may be rejected if another materializer concurrently
defines the same symbol. If a weak symbol is rejected it will not be added to
the MaterializationResponsibility's responsibility set. Clients can check for
membership in the responsibility set via the
MaterializationResponsibility::getSymbols() method before resolving any
such weak symbols.

This patch also adds code to RTDyldObjectLinkingLayer to tag COFF comdat symbols
introduced during codegen as weak, on the assumption that these are COFF comdat
constants. This fixes http://llvm.org/PR40074.

Fix gcc `-Wunused-variable` warning. NFC.

Remove extra "\01" prefix in EH docs

These escapes haven't been necessary since f8b51c5f90c60. Remove them to
declutter the docs.

[clang-format] Expand the SpacesAroundConditions option to include catch statements

Summary: This diff expands the SpacesAroundConditions option added in D68346 to include adding spaces to catch statements.

Reviewed By: MyDeveloperDay

Patch by: timwoj

Differential Revision: https://reviews.llvm.org/D72793

[clang-format] Add IndentCaseBlocks option

Summary:
The documentation for IndentCaseLabels claimed that the "Switch
statement body is always indented one level more than case labels". This
is technically false for the code block immediately following the label.
Its closing bracket aligns with the start of the label.

If the case label are not indented, it leads to a style where the
closing bracket of the block aligns with the closing bracket of the
switch statement, which can be hard to parse.

This change introduces a new option, IndentCaseBlocks, which when true
treats the block as a scope block (which it technically is).

(Note: regenerated ClangFormatStyleOptions.rst using tools/dump_style.py)

Reviewed By: MyDeveloperDay

Patch By: capn

Tags: #clang-format, #clang

Differential Revision: https://reviews.llvm.org/D72276

Allow space after C-style cast in C# code

Reviewed By: MyDeveloperDay, krasimir

Patch By: jbcoe

Differential Revision: https://reviews.llvm.org/D72150

[gn build] Port a0f50d73163

fix doc typos to cycle bots

[CodeGen] Move fentry-insert, xray-instrumentation and patchable-function before addPreEmitPass()

This intention is to move patchable-function before aarch64-branch-targets
(configured in AArch64PassConfig::addPreEmitPass) so that we emit BTI before NOPs
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424).

This also allows addPreEmitPass() passes to know the precise instruction sizes if they want.

Tried x86-64 Debug/Release builds of ccls with -fxray-instrument -fxray-instruction-threshold=1.
No output difference with this commit and the previous commit.

[XRay] Set hasSideEffects flag of PATCHABLE_FUNCTION_{ENTER,EXIT}

Otherwise they may be picked as the delay slot by mips-delay-slot-filler, if we move patchable-function before mips-delay-slot-filler.

[DebugInfo][test] Change two MIR tests to use -start-before=livedebugvalues instead of -start-after=patchable-function

To break order dependency between livedebugvalues and patchable-function.

[X86] Remove X86ISD::FILD_FLAG and stop gluing nodes together.

Summary:
I think whatever problem the gluing was fixing has long since been fixed. We don't have any of the restrictions on FP stack stuff that existed back when this was first added.

I had to change which type we use for FILD in BuildFILD when X86 was enabled because most of the isel patterns block f32/f64 instructions when SSE1/SSE2 are enabled. So I needed to use the f80 pattern, but this shouldn't have an effect the generated code since there is only one FILD instruction anyway. We already use f80 explicitly in other other places.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: andrew.w.kaylor, scanon, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72805

[X86][BranchAlign] Suppress branch alignment for {,_}__tls_get_addr

The x86-64 General Dynamic TLS code sequence uses prefixes to allow
linker relaxation. Adding segment override prefix or NOPs can break
linker relaxation (ld -pie/-no-pie).

i386 General Dynamic and x86-64 Local Dynamic do not use prefixes, but
for simplicity, just disable auto padding consistently.

Reviewed By: skan, LuoYuanke

Differential Revision: https://reviews.llvm.org/D72878

[AsmPrinter] Delete dead takeDeletedSymbsForFunction()

The code added in r98579 is dead now.

[Concepts] Fix name-type conflict compilation issues

D50360 caused some platforms to not compile due to a parameter with the name of a type.

Rename the parameter.

[Concepts] Requires Expressions

Implement support for C++2a requires-expressions.

Re-commit after compilation failure on some platforms due to alignment issues with PointerIntPair.

Differential Revision: https://reviews.llvm.org/D50360

[llvm-exegesis][mips] Fix -Wunused-function after D72858

[lldb/Test] XFAIL TestRequireHWBreakpoints when HW BPs are avialable

Resolves PR44055

[mlir] NFC: Rename index_t to index_type

mlir currently fails to build on Solaris:

  /vol/llvm/src/llvm-project/dist/mlir/lib/Conversion/VectorToLoops/ConvertVectorToLoops.cpp:78:20: error: reference to 'index_t' is ambiguous
    IndexHandle zero(index_t(0)), one(index_t(1));
                     ^
  /usr/include/sys/types.h:103:16: note: candidate found by name lookup is 'index_t'
  typedef short           index_t;
                          ^
  /vol/llvm/src/llvm-project/dist/mlir/include/mlir/EDSC/Builders.h:27:8: note: candidate found by name lookup is 'mlir::edsc::index_t'
  struct index_t {
         ^

and many more.

Given that POSIX reserves all identifiers ending in `_t` 2.2.2 The Name Space <https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html>, it seems
quite unwise to use such identifiers in user code, even more so without a distinguished
prefix.

The following patch fixes this by renaming `index_t` to `index_type`.
cases.

Tested on `amd64-pc-solaris2.11` and `sparcv9-sun-solaris2.11`.

Differential Revision: https://reviews.llvm.org/D72619

[mlir] Fix compilation with VS2019.

[debugserver] Share code between Enable/DisableHardwareWatchpoint (NFC)

This extract the common functionality of enabling and disabling hardware
watchpoints into a single function.

Differential revision: https://reviews.llvm.org/D72971

[test] clang/test/InterfaceStubs/externstatic.c requires x86-registered-target

Revert "[Support] Explicitly instantiate BumpPtrAllocatorImpl"

This reverts commit add95990508ee0aec90d07bcce1bba47b4f46622.

Buildbots don't seem to like it.

[Support] Explicitly instantiate BumpPtrAllocatorImpl

Most clients only ever use the default BumpPtrAllocator.

Revert "[ms] [llvm-ml] Add placeholder for llvm-ml, based on llvm-mc"

This reverts commit 22af2cbefc86dbef6e11ddaa96a08956e0baf22b, due to breakages on ARM platforms.

Revert "[Concepts] Requires Expressions"

This reverts commit 027931899763409e2c61a84bdee6057b5e838ffa.

There have been some failing tests on some platforms, reverting while investigating.

[X86] Rename lowerShuffleAsRotate -> lowerShuffleAsVALIGN

Since it can only ever create VALIGN nodes.

[X86][SSE] Add some v16i8 reverse + endian swap style shuffle tests

[Concepts] Requires Expressions

Implement support for C++2a requires-expressions.

Differential Revision: https://reviews.llvm.org/D50360

[DAG] Add helper for creating constant vector index with correct type. NFC.

[lldb/testsuite] Modernize 2 test Makefiles

Those old Makefiles used completely ad-hoc rules for building files,
which means they didn't obey the test harness' variants.

They were somewhat tricky to update as they use very peculiar build
flags for some files. For this reason I was careful to compare the
build commands before and after the change, which is how I found the
discrepancy fixed by the previous commit.

While some of the make syntax used here might not be easy to grasp for
newcomers (per-target variable overrides), it seems better than to
have to repliacte the Makefile.rules logic for the test variants and
platform support.

[lldb/Makefile.rules] Force the default target to be 'all'

The test harness invokes the test Makefiles with an explicit 'all'
target, but it's handy to be able to recursively call Makefile.rules
without speficying a goal.

Some time ago, we rewrote some tests in terms of recursive invocations
of Makefile.rules. It turns out this had an unintended side
effect. While using $(MAKE) for a recursive invocation passes all the
variables set on the command line down, it doesn't pass the make
goals. This means that those recursive invocations would invoke the
default rule. It turns out the default rule of Makefile.rules is not
'all', but $(EXE). This means that ti would work becuase the
executable is always needed, but it also means that the created
binaries would not follow some of the other top-level build
directives, like MAKE_DSYM.

Forcing 'all' to be the default target seems easier than making sure
all the invocations are correct going forward. This patch does this
using the .DEFAULT_GOAL directive rather than hoisting the 'all' rule
to be the first one of the file. It seems like this explicit approach
will be less prone to be broken in the future. Hopefully all the make
implementations we use support it.

DebugInfo: Move SectionLabel tracking into CU's addRange

This makes the SectionLabel handling more resilient - specifically for
future PROPELLER work which will have more CU ranges (rather than just
one per function).

Ultimately it might be nice to make this more general/resilient to
arbitrary labels (rather than relying on the labels being created for CU
ranges & then being reused by ranges, loclists, and possibly other
addresses). It's possible that other (non-rnglist/loclist) uses of
addresses will need the addresses to be in SectionLabels earlier (eg:
move the CU.addRange to be done on function begin, rather than function
end, so during function emission they are already populated for other
use).

[IR] Remove some unnecessary cleanup in Module's dtor, and use a unique_ptr to simplify some

Follow on from D72812, based on Mehdi Amini's feedback.

[WebAssembly] Track frame registers through VReg and local allocation

This change has 2 components:

Target-independent: add a method getDwarfFrameBase to TargetFrameLowering. It
describes how the Dwarf frame base will be encoded. That can be a register (the
default), the CFA (which replaces NVPTX-specific logic in DwarfCompileUnit), or
a DW_OP_WASM_location descriptr.

WebAssembly: Allow WebAssemblyFunctionInfo::getFrameRegister to return the
correct virtual register instead of FP32/SP32 after WebAssemblyReplacePhysRegs
has run. Make WebAssemblyExplicitLocals store the local it allocates for the
frame register. Use this local information to implement getDwarfFrameBase

The result is that the DW_AT_frame_base attribute is correctly encoded for each
subprogram, and each param and local variable has a correct DW_AT_location that
uses DW_OP_fbreg to refer to the frame base.

This is a reland of rG3a05c3969c18 with fixes for the expensive-checks
and Windows builds

Differential Revision: https://reviews.llvm.org/D71681

[MLIR] LLVM dialect: modernize and cleanups

Summary:
Modernize some of the existing custom parsing code in the LLVM dialect.
While this reduces some boilerplate code, it also reduces the precision
of the diagnostic error messges.

Reviewers: ftynse, nicolasvasilache, rriddle

Reviewed By: rriddle

Subscribers: merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72967

TableGen/GlobalISel: Don't check exact intrinsic opcode value

Consolidate internal denormal flushing controls

Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.

- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute

AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).

Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.

Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.

-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fp-denormal-math-f32=ieee or preserve-sign rather than the old
attributes.

Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.

This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.

AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attribute are now emitted. A future change will
switch this and remove the subtarget features.

AMDGPU/GlobalISel: Select llvm.amdgcn.update.dpp

The existing test is overly reliant on -mattr=-flat-for-global, and
some missing optimizations to re-use.

AMDGPU/GlobalISel: Select DS append/consume

Remove unneeded FoldingSet.h include from Attributes.h

Avoids 637 extra FoldingSet.h and Allocator.h includes. FoldingSet.h
needs Allocator.h, which is relatively expensive.

[libc] Replace the use of gtest with a new light weight unittest framework.

Header files included wrongly using <...> are now included using the
internal path names as the new unittest framework allows us to do so.

Reviewers: phosek, abrachet

Differential Revision: https://reviews.llvm.org/D72743

Remove AllTargetsAsmPrinters

It's been an empty target since r360498 and friends
(`git log --grep='Move InstPrinter files to MCTargetDesc.' llvm/lib/Target`),
but due to hwo the way these targets are structured it was silently
an empty target without anyone noticing.

No behavior change.

Remove redundant CXXScopeSpec from TemplateIdAnnotation.

A TemplateIdAnnotation represents only a template-id, not a
nested-name-specifier plus a template-id. Don't make a redundant copy of
the CXXScopeSpec and store it on the template-id annotation.

This slightly improves error recovery by more properly handling the case
where we would form an invalid CXXScopeSpec while parsing a typename
specifier, instead of accidentally putting the token stream into a
broken "annot_template_id with a scope specifier, but with no preceding
annot_cxxscope token" state.

[gn build] Port d3db13af7e5

[gn build] fix build after 22af2cbefc

Merge memtag instructions with adjacent stack slots.

Summary:
Detect a run of memory tagging instructions for adjacent stack frame slots,
and replace them with a shorter instruction sequence
* replace STG + STG with ST2G
* replace STGloop + STGloop with STGloop

This code needs to run when stack slot offsets are already known, but before
FrameIndex operands in STG instructions are eliminated; that's the
reason for the new hook in PrologueEpilogue.

This change modifies STGloop and STZGloop pseudos to take the size as an
immediate integer operand, and adds _untied variants of those pseudos
that are allowed to take the base address as a FI operand. This is needed to
simplify recognizing an STGloop instruction as operating on a stack slot
post-regalloc.

This improves memtag code size by ~0.25%, and it looks like an additional ~0.1%
is possible by rearranging the stack frame such that consecutive STG
instructions reference adjacent slots (patch pending).

Reviewers: pcc, ostannard

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70286

[MemDepAnalysis/VNCoercion] Move static method to its only use. [NFCI]

Static method MemoryDependenceResults::getLoadLoadClobberFullWidthSize
does not have or use any info specific to MemoryDependenceResults.
Move it to its only user: VNCoercion.

[CMake] Prefer multi-target variables over generic target variables in runtimes build

Runtimes variables in a multi-target environment are defined like:

RUNTIMES_target_VARIABLE_NAME
RUNTIMES_target+multi_VARIABLE_NAME

In my case, I have a downstream runtimes cache that does the following:

set(RUNTIMES_${target}+except_LIBCXXABI_ENABLE_EXCEPTIONS ON CACHE BOOL "")
set(RUNTIMES_${target}_LIBCXX_ENABLE_EXCEPTIONS OFF CACHE BOOL "")

I found that I was always getting the 'target' variable value (OFF) in
my 'target+except' build, which was unexpected. This behavior was
caused by the loop in llvm/runtimes/CMakeLists.txt that runs through all
variable names, adding '-DVARIABLE_NAME=' options to the subsequent
external project's cmake command.

The issue is that the loop does a single pass, such that if the 'target'
value appears in the cache after the 'target+except' value, the 'target'
value will take precedence. I suggest in my change here that the more
specific 'target+except' value should take precedence always, without
relying on CMake cache ordering.

Differential Revision: https://reviews.llvm.org/D71570

Patch By: JamesNagurne

hwasan: Remove dead code. NFCI.

Differential Revision: https://reviews.llvm.org/D72896

[profile] Support counter relocation at runtime

This is an alternative to the continous mode that was implemented in
D68351. This mode relies on padding and the ability to mmap a file over
the existing mapping which is generally only available on POSIX systems
and isn't suitable for other platforms.

This change instead introduces the ability to relocate counters at
runtime using a level of indirection. On every counter access, we add a
bias to the counter address. This bias is stored in a symbol that's
provided by the profile runtime and is initially set to zero, meaning no
relocation. The runtime can mmap the profile into memory at abitrary
location, and set bias to the offset between the original and the new
counter location, at which point every subsequent counter access will be
to the new location, which allows updating profile directly akin to the
continous mode.

The advantage of this implementation is that doesn't require any special
OS support. The disadvantage is the extra overhead due to additional
instructions required for each counter access (overhead both in terms of
binary size and performance) plus duplication of counters (i.e. one copy
in the binary itself and another copy that's mmapped).

Differential Revision: https://reviews.llvm.org/D69740

[CMake] Use LinuxRemoteTI instead of LinuxLocalTI in CrossWinToARMLinux cmake cache

Summary: Depends on D72847

Reviewers: vvereschaka, aorlov, andreil99

Reviewed By: vvereschaka

Subscribers: mgorny, kristof.beyls, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D72850

[libcxx] Introduce LinuxRemoteTI for remote testing

Summary:
This patch adds a new target info object called LinuxRemoteTI.
Unlike LinuxLocalTI, which asks the host system about various things
like available locales, distribution name etc. which don't make sense
if we're testing on a remote board, LinuxRemoteTI uses SSHExecutor
to get information from the target system.

Reviewers: jroelofs, ldionne, bcraig, EricWF, danalbert, mclow.lists

Reviewed By: jroelofs

Subscribers: christof, dexonsmith, libcxx-commits

Tags: #libc

Differential Revision: https://reviews.llvm.org/D72847

[lldb/Docs] Fix formatting for the variable formatting page

[mlir][Linalg] Extend linalg vectorization to MatmulOp

Summary:
This is a simple extension to allow vectorization to work not only on GenericLinalgOp
but more generally across named ops too.
For now, this still only vectorizes matmul-like ops but is a step towards more
generic vectorization of Linalg ops.

Reviewers: ftynse

Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72942

[libc++] Optimize / partially inline basic_string copy constructor

Splits copy constructor up inlining short initialization, outlining long
initialization into __init_long() which is the externally instantiated slow
path initialization.

Subsequently changing the copy ctor to be inlined (not externally instantiated)
provides significant speed ups for short string initialization.

Generated code given:

void StringCopyCtor(void* mem, const std::string& s) {
    std::string*p = new(mem) std::string{s};
}

asm:
        cmp     byte ptr [rsi + 23], 0
        js      .LBB0_2
        mov     rax, qword ptr [rsi + 16]
        mov     qword ptr [rdi + 16], rax
        movups  xmm0, xmmword ptr [rsi]
        movups  xmmword ptr [rdi], xmm0
        ret
.LBB0_2:
        jmp     std::basic_string::__init_long # TAILCALL

Benchmark:
BM_StringCopy_Empty                                           5.19ns ± 6%             1.50ns ± 8%  -71.02%        (p=0.000 n=10+10)
BM_StringCopy_Small                                           5.14ns ± 8%             1.53ns ± 7%  -70.17%        (p=0.000 n=10+10)
BM_StringCopy_Large                                           18.9ns ± 0%             19.3ns ± 0%   +1.92%        (p=0.000 n=10+10)
BM_StringCopy_Huge                                             309ns ± 1%              316ns ± 5%     ~            (p=0.633 n=8+10)

Patch from Martijn Vels (mvels@google.com)
Reviewed as D72160.

hwasan: Move .note.hwasan.globals note to hwasan.module_ctor comdat.

As of D70146 lld GCs comdats as a group and no longer considers notes in
comdats to be GC roots, so we need to move the note to a comdat with a GC root
section (.init_array) in order to prevent lld from discarding the note.

Differential Revision: https://reviews.llvm.org/D72936

[InstSimplify] add test for select of vector constants; NFC

[InstSimplify] add test for select of FP constants; NFC

[mlir] [VectorOps] Rename Utils.h into VectorUtils.h

Summary:
First step towards the consolidation
of a lot of vector related utilities
that are now all over the place
(or even duplicated).

Reviewers: nicolasvasilache, andydavis1

Reviewed By: nicolasvasilache, andydavis1

Subscribers: merge_guards_bot, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72955

Pass length of string in Go binding of CreateCompileUnit

[xray] Allow instrumenting only function entry and/or only function exit

Extend -fxray-instrumentation-bundle to split function-entry and
function-exit into two separate options, so that it is possible to
instrument only function entry or only function exit. For use cases
that only care about one or the other this will save significant overhead
and code size.

Differential Revision: https://reviews.llvm.org/D72890

[clang][xray] Add -fxray-ignore-loops option

XRay allows tuning by minimum function size, but also always instruments
functions with loops in them. If the minimum function size is set to a
large value the loop instrumention ends up causing most functions to be
instrumented anyway. This adds a new flag, -fxray-ignore-loops, to disable
the loop detection logic.

Differential Revision: https://reviews.llvm.org/D72873

[xray] Add xray-ignore-loops option

XRay allows tuning by minimum function size, but also always instruments
functions with loops in them. If the minimum function size is set to a
large value the loop instrumention ends up causing most functions to be
instrumented anyway. This adds a new flag, xray-ignore-loops, to disable
the loop detection logic.

Differential Revision: https://reviews.llvm.org/D72659