git.osdn.net Git - android-x86/external-llvm-project.git/log

[CSSPGO][llvm-profgen] Aggregate samples on call frame trie to speed up profile generation

For CS profile generation, the process of call stack unwinding is time-consuming since for each LBR entry we need linear time to generate the context( hash, compression, string concatenation). This change speeds up this by grouping all the call frame within one LBR sample into a trie and aggregating the result(sample counter) on it, deferring the context compression and string generation to the end of unwinding.

Specifically, it uses `StackLeaf` as the top frame on the stack and manipulates(pop or push a trie node) it dynamically during virtual unwinding so that the raw sample can just be recoded on the leaf node, the path(root to leaf) will represent its calling context. In the end, it traverses the trie and generates the context on the fly.

Results:
Our internal branch shows about 5X speed-up on some large workloads in SPEC06 benchmark.

Differential Revision: https://reviews.llvm.org/D94110

[CSSPGO][llvm-profgen] Compress recursive cycles in calling context

This change compresses the context string by removing cycles due to recursive function for CS profile generation. Removing recursion cycles is a way to normalize the calling context which will be better for the sample aggregation and also make the context promoting deterministic.
Specifically for implementation, we recognize adjacent repeated frames as cycles and deduplicated them through multiple round of iteration.
For example:
Considering a input context string stack:
[“a”, “a”, “b”, “c”, “a”, “b”, “c”, “b”, “c”, “d”]
For first iteration,, it removed all adjacent repeated frames of size 1:
[“a”, “b”, “c”, “a”, “b”, “c”, “b”, “c”, “d”]
For second iteration, it removed all adjacent repeated frames of size 2:
[“a”, “b”, “c”, “a”, “b”, “c”, “d”]
So in the end, we get compressed output:
[“a”, “b”, “c”, “d”]

Compression will be called in two place: one for sample's context key right after unwinding, one is for the eventual context string id in the ProfileGenerator.
Added a switch `compress-recursion` to control the size of duplicated frames, default -1 means no size limit.
Added unit tests and regression test for this.

Differential Revision: https://reviews.llvm.org/D93556

[CSSPGO][llvm-profgen] Pseudo probe based CS profile generation

This change implements profile generation infra for pseudo probe in llvm-profgen. During virtual unwinding, the raw profile is extracted into range counter and branch counter and aggregated to sample counter map indexed by the call stack context. This change introduces the last step and produces the eventual profile. Specifically, the body of function sample is recorded by going through each probe among the range and callsite target sample is recorded by extracting the callsite probe from branch's source.

Please refer https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s and https://reviews.llvm.org/D89707 for more context about CSSPGO and llvm-profgen.

**Implementation**

- Extended `PseudoProbeProfileGenerator` for pseudo probe based profile generation.
- `populateBodySamplesWithProbes` reading range counter is responsible for recording function body samples and inferring caller's body samples.
- `populateBoundarySamplesWithProbes` reading branch counter is responsible for recording call site target samples.
- Each sample is recorded with its calling context(named `ContextId`). Remind that the probe based context key doesn't include the leaf frame probe info, so the `ContextId` string is created from two part: one from the probe stack strings' concatenation and other one from the leaf frame probe.
- Added regression test

Test Plan:

ninja & ninja check-llvm

Differential Revision: https://reviews.llvm.org/D92998

[CSSPGO] Fix MSVC initializing truncation warning (NFC)

MSVC warning:
```
\llvm-project\llvm\include\llvm\Transforms\IPO\SampleProfileProbe.h(65): warning C4305: 'initializing': truncation from 'double' to 'const float'
```

[SROA] Propagate correct TBAA/TBAA Struct offsets

SROA does not correctly account for offsets in TBAA/TBAA struct metadata.
This patch creates functionality for generating new MD with the corresponding
offset and updates SROA to use this functionality.

Differential Revision: https://reviews.llvm.org/D95826

(cherry picked from commit 40862b1a7486a969ff044cd240aad24f4183cc10)

[llvm-symbolizer] - Fix the crash in GNU output style with --no-inlines and missing input file.

Fixes https://bugs.llvm.org/show_bug.cgi?id=48882.

If the input file does not exist (or has a reading error), the
following code will crash if there are two or more input addresses.

```
auto ResOrErr = Symbolizer.symbolizeInlinedCode(
ModuleName, {Offset, object::SectionedAddress::UndefSection});
Printer << (error(ResOrErr) ? DILineInfo() : ResOrErr.get().getFrame(0));
```

For the first address, `symbolizeInlinedCode` returns an error.
For the second address, `symbolizeInlinedCode` returns an empty result
(not an error) and `.getFrame(0)` will crash.

Differential revision: https://reviews.llvm.org/D95609

(cherry picked from commit d22140687500f90830fe416d9c1e317f7c4535d5)

[llvm-dwp] Join dwo paths correctly when DWOPath is absolute

When the `DWOPath` is absolute, we want to use `DWOPath` as is, without prepending any other
components to the path. The `sys::path::append` does not join, but rather unconditionally appends
the paths, so something like `sys::path::append("/tmp", "/tmp/banana")` will result in
`/tmp/tmp/banana` rather than the desired `/tmp/banana`.

This then causes `llvm-dwp` to fail in a following situation:

```
$ clang -gsplit-dwarf /tmp/banana/test.c -c -o /tmp/outdir/foo.o
$ clang outdir/foo.o -o outdir/hm
$ llvm-dwarfdump outdir/hm | grep -C2 foo.dwo
                  DW_AT_comp_dir    ("/tmp")
                  DW_AT_GNU_pubnames  (true)
                  DW_AT_GNU_dwo_name    ("/tmp/outdir/foo.dwo")
                                DW_AT_GNU_dwo_id    (0xde4d396f3bf0e257)
                  DW_AT_low_pc  (0x0000000000401100)
$ strace -o trace llvm-dwp -e outdir/hm -o outdir/hm.dwp
error: No such file or directory
$ cat trace | grep foo.dwo
openat(AT_FDCWD, "/tmp/tmp/outdir/foo.dwo", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
```

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D96678

(cherry picked from commit 6ffcb2937c96bd0d7a55b984b5eb8f381b68e322)

[clangd] Treat "null" optional fields as missing

Clangd currently throws away any protocol messages whenever an optional
field has an unexpected type. This patch changes the behaviour to treat
`null` fields as missing.

This enables clangd to be more tolerant against small violations to the
LSP spec.

Fixes https://github.com/clangd/vscode-clangd/issues/134

Differential Revision: https://reviews.llvm.org/D95229

(cherry picked from commit af20232b8e189335da571f48c2467b244b7fd772)

[OpenMP][NVPTX] Add the support for CUDA 11.2 and CUDA 11.1

CUDA 11.2 and CUDA 11.1 are all available now.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97004

(cherry picked from commit 89827fd404f954605663776e746ec351bde61348)

[clang] functions with the 'const' or 'pure' attribute must always return.

As described in
* https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-pure-function-attribute
* https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-const-function-attribute

An `__attribute__((pure))` function must always return, as well as an `__attribute__((const))` function.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D96960

(cherry picked from commit 46757ccb49ab88da54ca8ddd43665d5255ee80f7)

[LLD] Fix tests after D96993

We now need mustprogress to eliminate these calls. The code doesn't
really make sense, but that's not the point of the test...

(cherry picked from commit ac065b7a37d6dd8daacd526f6c3a0d1563bc88ac)

[DCE] Don't remove non-willreturn calls

In both ADCE and BDCE (via DemandedBits) we should not remove
instructions that are not guaranteed to return. This issue was
pointed out by fhahn in the recent llvm-dev thread.

Differential Revision: https://reviews.llvm.org/D96993

(cherry picked from commit 2f17ed294fcd8cde505b93c9c5bbab06ba59051c)

[IR] Move willReturn() to Instruction

This moves the willReturn() helper from CallBase to Instruction,
so that it can be used in a more generic manner. This will make
it easier to fix additional passes (ADCE and BDCE), and will give
us one place to change if additional instructions should become
non-willreturn (e.g. there has been talk about handling volatile
operations this way).

I have also included the IntrinsicInst workaround directly in
here, so that it gets applied consistently. (As such this change
is not entirely NFC -- FuncAttrs will now use this as well.)

Differential Revision: https://reviews.llvm.org/D96992

(cherry picked from commit 370addb996138a9e3634899cf264c7621307617a)

[DCE] Add tests for non-willreturn function being removed (NFC)

(cherry picked from commit 4045ad6b0ccd35fe990d51b9bfdd9e7de109bdf5)

[PowerPC] Update release notes for changes to PowerPC for V12.0

[release][docs] Update contributions to LLVM 12 for scalable vectors.

Differential Revision: https://reviews.llvm.org/D96270

[lld-macho] Fill out release notes for 12.x

Differential Revision: https://reviews.llvm.org/D95900

[clangd] Fix race in Global CDB shutdown

I believe the atomic write can be reordered after the notify, and that
seems to be happening on mac m1: http://45.33.8.238/macm1/2654/step_8.txt
In practice maybe seq_cst is enough? But no reason not to lock here.

https://bugs.llvm.org/show_bug.cgi?id=48998
(cherry picked from commit 6ac3fd9706047304c52a678884122a3a6bc55432)

[X86][AVX] Add missing VEX_WIG tags from VPACKUSDW/VPHSUBD/VPCMPISTRI/VPCMPISTRM/VPCMPESTRI/VPCMPESTRM

Fixes PR48877

Differential Revision: https://reviews.llvm.org/D95801

(cherry picked from commit 4d904776a77aa80342c65cf72a962920cc9d1fa9)

[X86][AVX] Add 'OK' tests cases for PR48877

(cherry picked from commit e9514429a02b1e4f8b9d54b28a934bfa9bd246ec)

[DAG] Fix shift amount limit in SimplifyDemandedBits trunc(shift(x,c)) to truncated bitwidth

We lost this in D56387/rG69bc0990a9181e6eb86228276d2f59435a7fae67 - where I got the src/dst bitwidths mixed up and assumed getValidShiftAmountConstant would catch it.

Patch by @craig.topper - confirmed by @Carrot that it fixes PR49162

(cherry picked from commit 7ad0c573bd4a68dc81886037457d47daa3d6aa24)

[X86] Add reduced test case for PR49162

(cherry picked from commit 5ca3ef98a71598d368f6f4aaf0b385b50b67ce4a)

Fix exegesis build on aarch64-windows-msvc host

Include x86 intrinsics only when compiling for x86_64
or i386. _MSC_VER no longer implies x86.

Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D96498

Fixes: https://bugs.llvm.org/show_bug.cgi?id=49149

(cherry picked from commit 06f53f2f095c45c93d269b5dc010af506f4b0ff4)

doc: Add a release note for the changed comment char for aarch64-msvc targets

This was backported in a6ea391b832573830b011f26013ebaa946032250.

Revert "Disable rosegment for old Android versions."

This reverts commit fae16fc0eed7cf60207901818cfe040116f2ef00.
Breaks building compiler-rt android runtimes with trunk clang
but older NDK, see discussion on https://reviews.llvm.org/D95166

(cherry picked from commit 1608ba09462d877111230e9461b895f696f8fcb1)

[ASTMatchers] Fix matching after generic top-level matcher

With a matcher like

  expr(anyOf(integerLiteral(equals(42)), unless(expr())))

and code such as

  struct B {
    B(int);
  };

  B func1() { return 42; }

the top-level expr() would match each of the nodes which are not spelled
in the source and then ignore-traverse to match the integerLiteral node.
This would result in multiple results reported for the integerLiteral.

Fix that by only running matching logic on nodes which are not skipped
with the top-level matcher.

Differential Revision: https://reviews.llvm.org/D95735

(cherry picked from commit d6a06365cf12bebe20a7d65cf3894608efc089b4)

[ASTMatchers] Fix definition of decompositionDecl

(cherry picked from commit b10d445307a0f3c7e5522836b4331090aacaf349)

Fix traversal with hasDescendant into lambdas

Differential Revision: https://reviews.llvm.org/D95607

(cherry picked from commit bb57a3422a09dcdd572ccb42767a0dabb5f966dd)

[ASTMatchers] Fix traversal below range-for elements

Differential Revision: https://reviews.llvm.org/D95562

(cherry picked from commit 79125085f16540579d27c7e4987f63eef9c4aa23)

Ensure that we traverse non-op() method bodys of lambdas

Differential Revision: https://reviews.llvm.org/D95644

(cherry picked from commit 43cc4f15008f8c700497d3d2b7020bfd29f5750f)

[ASTMatchers] Avoid pathological traversal over nested lambdas

Differential Revision: https://reviews.llvm.org/D95573

(cherry picked from commit 6f0df3cddb3e3f38df1baa7aa4d743a74bb46688)

Revert "[PowerPC] [Clang] Enable float128 feature on P9 by default"

Commit 6bf29dbb enables float128 feature by default for Power9 targets.
But float128 may cause build failure in libcxx testing. Revert this
commit first to unblock LLVM 12 release.

(cherry picked from commit 447dc856b243b99ce70019ba1187c39746f4e0e9)

[Verifier] Allow DW_TAG_class_type/DW_TAG_union_type to have no filename

`clang/lib/CodeGen/CGOpenMPRuntime.cpp` synthesized union
(`distinct !DICompositeType(tag: DW_TAG_union_type, name: "kmp_cmplrdata_t", size: 64, elements: <0x62b690>)`)
does not have meaningful filename/line number.

D94735 dropped the previously arbitrary and untested filename/line from the union and caused a verifier error here.

This fixes `check-libarcher` failures.

Differential Revision: https://reviews.llvm.org/D96212

(cherry picked from commit ad60802a7187aa39b0374536be3fa176fe3d6256)

[X86] Always assign reassoc flag for intrinsics *reduce_add/mul_ps/pd.

Intrinsics *reduce_add/mul_ps/pd have assumption that the elements in
the vector are reassociable. So we need to always assign the reassoc
flag when we call _mm_reduce_* intrinsics.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D96231

(cherry picked from commit dd2460ed5d77d908327ce29a15630cd3268bd76e)

[RISCV] Remove SRO* and SLO* instructions from bitmanip.

As of the current draft these are no longer being considered
for the bitmanip spec. It wasn't clear what sub extension they
belonged in in the 0.93 spec.

So remove them. They can always be added back if something changes.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96157

(cherry picked from commit fd5adae02cafe388673d3b3f92ef791af3c73cfe)

Recommit of a2fdf9d4d734732a6fa9288f1ffdf12bf8618123.

- The failures are all cc1-based tests due to the missing `-aux-triple` options,
which is always prepared by the driver in CUDA/HIP compilation.
- Add extra check on the missing aux-targetinfo to prevent crashing.

[hip][cuda] Enable extended lambda support on Windows.

- On Windows, extended lambda has extra issues due to the numbering
schemes are different between the host compilation (Microsoft C++ ABI)
and the device compilation (Itanium C++ ABI. Additional device side
lambda number is required per lambda for the host compilation to
correctly mangle the device-side lambda name.
- A hybrid numbering context `MSHIPNumberingContext` is introduced to
number a lambda for both host- and device-compilations.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D69322

This reverts commit 4874ff02417916cc9ff994b34abcb5e563056546.

(cherry picked from commit 01bf529db2cf465b029e29e537807576bfcbc452)

[AArch64] Use '//' as comment string for MSVC assembly

As the actual MSVC toolset doesn't use the GAS-style assembly that
Clang/LLVM produces and consumes, there's no reference for what
string to use for e.g. comments when building with a MSVC triple.

This frees up the use of semicolon as separator string, just like
was done for GNU targets in 23413195649d0cf6f3860ae8b5fb115b35032075.
(Previously, both the separator and comment strings were set to
the same, a semicolon.)

Compiler-rt extensively uses separator chars in its assembly,
and that assembly should be buildable with clang-cl for MSVC too.

Differential Revision: https://reviews.llvm.org/D96259

(cherry picked from commit 71c29b4cf3fb2b5610991bfbc12b8bda97d60005)

[ELF] Allow R_386_GOTOFF from .debug_info

In GCC emitted .debug_info sections, R_386_GOTOFF may be used to
relocate DW_AT_GNU_call_site_value values
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98946).

R_386_GOTOFF (`S + A - GOT`) is one of the `isStaticLinkTimeConstant` relocation
type which is not PC-relative, so it can be used from non-SHF_ALLOC sections. We
current allow new relocation types as needs come. The diagnostic has caught some
bugs in the past.

Differential Revision: https://reviews.llvm.org/D95994

(cherry picked from commit b3165a70ae83b46dc145f335dfa9690ece361e92)

[AIX] Improve option processing for mabi=vec-extabi and mabi=vec=defaul

Opening this revision to better address comments by @hubert.reinterpretcast in https://reviews.llvm.org/rGcaaaebcde462

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D95702

(cherry picked from commit eb3426a528d5b3cbbb54aee662a779f2067fc9db)

[AIX] Actually push back "-mabi=vec-extabi" when option is on.

Accidentaly ommitted the portion of pushing back the option in
https://reviews.llvm.org/D94986

(cherry picked from commit caaaebcde462bf681498ce85c2659d683a07fc87)

[OpenMP][NVPTX] Refined CMake logic to choose compute capabilites

This patch refines the logic to choose compute capabilites via the
environment variable `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES`. It supports the
following values (all case insensitive):
- "all": Build `deviceRTLs` for all supported compute capabilites;
- "auto": Only build for the compute capability auto detected. Note that this
requires CUDA. If CUDA is not found, a CMake fatal error will be raised.
- "xx,yy" or "xx;yy": Build for compute capabilities `xx` and `yy`.

If `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES` is not set, it is equivalent to set
it to `all`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95687

(cherry picked from commit 26d38f6d20ff137d89cb7c891b739662de1ca508)

Define new/delete in libc++ when using libcxxrt

Always turn on LIBCXX_ENABLE_NEW_DELETE_DEFINITIONS, if libcxxrt is used
as the C++ ABI library, since libcxxrt does not provide the full set
ofnew and delete operators. In particular, the aligned versions of these
operators are completely missing. This primarily addresses builds on
FreeBSD, as this platform uses libcxxrt by default.

Also, attempt to provide a FreeBSD.cmake cache file, with hopefully sane
settings, partially copied from the Apple.cmake cache file. This needs
more work, probably some additions to ci build scripts (although I am
not aware of any 'official' FreeBSD build bots).

Reviewed By: ldionne, #libc

Differential Revision: https://reviews.llvm.org/D96720

(cherry picked from commit 328261019f50a76b11fa625739cbf32ceb2ce2f7)

[OpenMP] Delay more diagnostics of potentially non-emitted code

Even code in target and declare target regions might not be emitted.
With this patch we delay more diagnostics and use laziness and linkage
to determine if a function is emitted (for the device). Note that we
still eagerly emit diagnostics for target regions, unfortunately, see
the TODO for the reason.

This hopefully fixes PR48933.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95928

(cherry picked from commit 1dd66e6111a8247c6c7931143251c0cf1442b905)

[OpenMP] Attribute target diagnostics properly

Type errors in function declarations were not (always) diagnosed prior
to this patch. Furthermore, certain remarks did not get associated
properly which caused them to be emitted multiple times.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95912

(cherry picked from commit f9286b434b764b366f1aad9249c04e7741ed5518)

[OpenMP][NFC] Pre-commit test changes regarding PR48933

This will highlight the effective changes in subsequent commits.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D95903

(cherry picked from commit 3b2f19d0bc2803697526191a8a607efa0b38f7e4)

[AssumptionCache] Do not track llvm.assume calls (PR49043)

This fixes PR49043 by invalidating the handle on RAUW. This will work
fine assuming all existing RAUW users add the new assumption to the
cache. That means, if a new llvm.assume call replaces an old one, you
need to add the new one now as a RAUW is not enough anymore.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96208

(cherry picked from commit 378f4e5ec26c3e0d2119c1112ec645b369eed2de)

workflows: Increase the fetch-depth for actions/checkout steps

This avoids failures when many commits are pushed close together.

[clang-tidy] Fix crash in readability-identifier-naming check

`isParamInMainLikeFunction` didn't check if the function had an identifer name before calling getName() which could lead to an assert.

(cherry picked from commit c97592c5df09850404a9ddbfb614c7df271d1dfe)

[InlineFunction] Only update noalias scopes once for an instruction.

Inlining sometimes maps different instructions to be inlined onto the same instruction.

We must ensure to only remap the noalias scopes once. Otherwise the scope might disappear (at best).
This patch ensures that we only replace scopes for which the mapping is known.

This approach is preferred over tracking which instructions we already handled in a SmallPtrSet,
as that one will need more memory.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95862

(cherry picked from commit 50c523a9d4402c69d59c0b2ecb383a763d16cde9)

[LoopPeel] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed.

The reduction of a sanitizer build failure when enabling the dominance check (D95335) showed that loop peeling also needs to take care of scope duplication, just like loop unrolling (D92887).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95544

(cherry picked from commit 80cdd30eb90c3509bf315f1fa1369483e2448bbd)

IntrinsicEmitter: Change IntrinsicsToAttributesMap from uint8_t[] to uint16_t[]

We need at least 252 UniqAttributes now, which will soon overflow.
Actually with downstream backends we can easily use up the last few values.
So bump to uint16_t.

(cherry picked from commit b7d63244226ba2c0df651622fe7fe3f5f8aba262)

[RISCV] Add new vector instructions in v0.10.

* Add new vector instructions in v0.10.
- load/store for mask value vle1.v vse1.v
- vsetivli for 0-31 immediate vector length.
* Rename vector instructions in v0.10.
- vfrsqrte7 -> vfrsqrt7
- vfrece7 -> vfrec7
* Reserve memory width encodings for EEW>128b.

Differential Revision: https://reviews.llvm.org/D95781

(cherry picked from commit c7189ba78578d029e0162720319de3c1c6fc348b)

[RISCV] Replace NoX0 SDNodeXForm with a ComplexPattern to do the selection of the VL operand.

I think this is a more standard way of doing this.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D95833

(cherry picked from commit e7f9a834996f40be8dc46a0b059aa850f1f4ef05)

Don't infer attributes on '::operator new'.

These attributes were all incorrect or inappropriate for LLVM to infer:
- inaccessiblememonly is generally wrong; user replacement operator new
  can access memory that's visible to the caller, as can a new_handler
  function.
- willreturn is generally wrong; a custom new_handler is not guaranteed
  to terminate.
- noalias is inappropriate: Clang has a flag to determine whether this
  attribute should be present and adds it itself when appropriate.
- noundef and nonnull on the return value should be specified by the
  frontend on all 'operator new' functions if we want them, not here.

In any case, inferring attributes on functions declared 'nobuiltin' (as
these are when Clang emits them) seems questionable.

(cherry picked from commit ab243efb261ba7e27f4b14e1a6fbbff15a79c0bf)

Revert "[BuildLibcalls, Attrs] Support more variants of C++'s new, add attributes for C++'s delete"

Several of the new attributes here were incorrect, and even the ones
that are generally correct were being added even to nobuiltin calls.

This reverts commit bb3f169b59e1c8bd7fd70097532220bbd11e9967.

(cherry picked from commit 1484ad4137b5d627573672bad48b03785f8fdefd)

[ARM] Do not emit ldrexd/strexd on Cortex-M chips

The ldrexd/strexd instructions are not supported on M-class chips, see
for example
https://developer.arm.com/documentation/dui0489/e/arm-and-thumb-instructions/memory-access-instructions/ldrex-and-strex
which says:

> All these 32-bit Thumb instructions are available in ARMv6T2 and
> above, except that LDREXD and STREXD are not available in the ARMv7-M
> architecture.

Looking at the ARMv8-M architecture, it appears that these instructions
aren't supported either. The Architecture Reference Manual lists
ldrex/strex but not ldrexd/strexd:
https://developer.arm.com/documentation/ddi0553/bn/

Godbolt example on LLVM 11.0.0, which incorrectly emits ldrexd/strexd
instructions: https://llvm.godbolt.org/z/5qqPnE

Differential Revision: https://reviews.llvm.org/D95891

(cherry picked from commit aecdf15cc7f866180dc769265b8183cad34bb33a)

[Support] Indent multi-line descr of enum cli options.

As noted in https://reviews.llvm.org/D93459, the formatting of
multi-line descriptions of clEnumValN and the likes is unfavorable.
Thus this patch adds support for correctly indenting these.

Reviewed By: serge-sans-paille

Differential Revision: https://reviews.llvm.org/D93494

(cherry picked from commit e3f02302e318837d2421c6425450f04ae0a82b90)

[RISCV] Fix incorrect RVV sdiv/udiv lowering

Due to a clerical error, the sdiv operation was mapping to vdivu and
udiv to vdiv, when the opposite mapping is the correct one.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D95869

(cherry picked from commit b4106f9c7b8c498d109301ced7bf9aca32027168)

[DAGCombine] Do not remove masking argument to FP16_TO_FP for some targets

As of commit 284f2bffc9bc5, the DAG Combiner gets rid of the masking of the
input to this node if the mask only keeps the bottom 16 bits. This is because
the underlying library function does not use the high order bits. However, on
PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits
from the register. Therefore, the library implementation of __gnu_h2f_ieee will
return an incorrect result if the bits aren't cleared.

This combine is desired for ARM (and possibly other targets) so this patch adds
a query to Target Lowering to check if this zeroing needs to be kept.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092

Differential revision: https://reviews.llvm.org/D96283

(cherry picked from commit a5222aa0858a42660629c410a5b669dee16a4359)

PR48587: is_constant_evaluated() should not evaluate to true during a
variable's destruction if it didn't do so during construction.

The standard doesn't give any guidance as to what to do here, but this
approach seems reasonable and conservative, and has been proposed to the
standard committee.

(cherry picked from commit c945dc4a5023d6a17d11fcda76509b94b36e34fc)

[GlobalISel] Check if branches use the same MBB in matchOptBrCondByInvertingCond

If the G_BR + G_BRCOND in this combine use the same MBB, then it will infinite
loop. Don't allow that to happen.

Differential Revision: https://reviews.llvm.org/D95895

(cherry picked from commit 02d4b365bf4f8c2cb56e5612902f6c3bb4316493)

[clangd] Fix clang tidy provider when multiple config files exist in directory tree

Currently Clang tidy provider searches from the root directory up to the target directory, this is the opposite of how clang-tidy searches for config files.
The result of this is .clang-tidy files are ignored in any subdirectory of a directory containing a .clang-tidy file.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D96204

(cherry picked from commit ba3ea9c60f0f259f0ccc47e47daf8253a5885531)

Fix "not all control paths return a value" warning. NFCI.

PR48606: The lifetime of a constexpr heap allocation always started
during the same evaluation.

It looks like the only case for which this matters is determining
whether mutable subobjects of a heap allocation can be modified during
constant evaluation.

(cherry picked from commit 21e8bb83253e1a2f4b6fad9b53cafe8c530a38e2)

[AST] Update LVal before evaluating lambda decl fields.

Differential Revision: https://reviews.llvm.org/D96092

(cherry picked from commit 96fb49c3ff8e08680127ddd4ec45a0e6c199243b)

[lldb-vscode] correctly use Windows macros

@mstorsjo found a mistake that I made when trying to fix some Windows
compilation errors encountered by @stella.stamenova.

I was incorrectly using the LLVM_ON_UNIX macro. In any case, proper use
of

#if defined(_WIN32)

should be the actual fix.

Differential Revision: https://reviews.llvm.org/D96060

(cherry picked from commit 36496cc2992d6fa26e6024971efcfc7d15f69888)

Fix lldb-vscode builds on Windows targeting POSIX

@stella.stamenova found out that lldb-vscode's Win32 macros were failing
when building on windows targetings POSIX platforms.

I'm changing these macros for LLVM_ON_UNIX, which should be more
accurate.

(cherry picked from commit 0bca9a7ce2eeaa9f1d732ffbc17769560a2b236e)

Fix runInTerminal failures on Windows

stella.stemenova mentioned in https://reviews.llvm.org/D93951 failures on Windows for this test.

I'm fixing the macro definitions and disabling the tests for python
versions lower than 3.7. I'll figure out that actual issue with
python3.6 after the buildbots are fine again.

(cherry picked from commit ab5591e1d8f5abcfa9e75193d3e8a29087b61425)

[🍒][libc++] Fix libcxx build on 32bit architectures with 64bit time_t defaults e.g. riscv32

Patch by Khem Raj.

(cherry pick of commit 85b9c5ccc172a1e61c7ecaaec4752587cb6f1e26)

Differential Revision: https://reviews.llvm.org/D96062

[🍒]Disable CFI in __get_elem to allow casting a pointer to uninitialized memory

Fixes usage of shared_ptr with CFI enabled, which is llvm.org/pr48993.

(cherry pick of commit bab74864168bb5e28ecbc0294fe1095d8da7f569)

Differential Revision: https://reviews.llvm.org/D96063

[🍒][libc++] Rename include/support to include/__support

We do ship those headers, so the directory name should not be something
that can potentially conflict with user-defined directories.

This is a cherry-pick of b51756819a85563ae063e98eeb3d6af8e44c8f64.

Differential Revision: https://reviews.llvm.org/D96059

[OpenMP][libomptarget] Fixed an issue that device sync is skipped if the kernel doesn't have any argument

Currently if there is not kernel argument, device synchronization will
be skipped. This can lead to two issues:
1. If there is any device error, it will not be captured;
2. The target region might end before the kernel is done, which is not spec
conformant.

The test added in this patch only runs on NVPTX platform, although it will not
be executed by Phab at all. It also requires `not` which is not available on most
systems.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D96067

(cherry picked from commit b68a6b09e60a24733b923a0fc282746a855852da)

[MemorySSA] Don't treat lifetime.end as NoAlias

MemorySSA currently treats lifetime.end intrinsics as not aliasing
anything. This breaks MemorySSA-based MemCpyOpt, because we'll happily
move a read of a pointer below a lifetime.end intrinsic, as no clobber
is reported.

I think the MemorySSA modelling here isn't correct: lifetime.end(p)
has approximately the same effect as doing a memcpy(p, undef), and
should be treated as a clobber.

This patch removes the special handling of lifetime.end, leaving
alias analysis to handle it appropriately.

Differential Revision: https://reviews.llvm.org/D95763

[MemCpyOpt] Add test for incorrect optimization across lifetime (NFC)

This only affects the MemorySSA-based implementation.

workflows: Update libclang-abi-tests to work with minor release baselines

Add a release note about deprecating the clang-cl /fallback flag

As discussed in
https://lists.llvm.org/pipermail/cfe-dev/2021-January/067524.html

The flag has been removed on the main branch in D95876.

Differential revision: https://reviews.llvm.org/D96016

[OpenMP] Disabled profiling in `libomp` by default to unblock link errors

Link error occurred when time profiling in libomp is enabled by default
because `libomp` is assumed to be a C library but the dependence on
`libLLVMSupport` for profiling is a C++ library. Currently the issue blocks all
OpenMP tests in Phabricator.

This patch set a new CMake option `OPENMP_ENABLE_LIBOMP_PROFILING` to
enable/disable the feature. By default it is disabled. Note that once time
profiling is enabled for `libomp`, it becomes a C++ library.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95585

(cherry picked from commit c571b168349fdf22d1dc8b920bcffa3d5161f0a2)

[OpenMP] Fix building using LLVM_ENABLE_RUNTIMES

Fix when time profiling is enabled.

Related to: D94855

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95398

(cherry picked from commit bb40e6731843de92f1c73ad6efceb8a89e045ea6)

[clang][aarch64][WOA64][docs] Release note for longjmp crash with /guard:cf

Add a release note workaround for PR47463.

Bug: https://bugs.llvm.org/show_bug.cgi?id=47463

Differential Revision: https://reviews.llvm.org/D95435

Revert "[OpenMP] Disabled profiling in `libomp` by default to unblock link errors"

This reverts commit f5602e0bf31ab590da19fa357980a753dbfd666e.

[X86] Accept 64-bit GPRs for vextractps when using a register that requires EVEX.

This is consistent with the VEX version. It also fixes a sorting
issue in the matching table that caused the EVEX version to be
prioritized over VEX in intel syntax.

Fixes issue [2] from PR48991.

(cherry picked from commit c691fe14da93a7c9eff466231515d6d4d16124fa)

[OpenMP][NVPTX] Take functions in `deviceRTLs` as `convergent`

OpenMP device compiler (similar to other SPMD compilers) assumes that
functions are convergent by default to avoid invalid transformations, such as
the bug (https://bugs.llvm.org/show_bug.cgi?id=49021).

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95971

(cherry picked from commit 0f0ce3c12edefd25448e39c4d20718a10d3d42c1)

[CSSPGO] Introducing distribution factor for pseudo probe.

Sample re-annotation is required in LTO time to achieve a reasonable post-inline profile quality. However, we have seen that such LTO-time re-annotation degrades profile quality. This is mainly caused by preLTO code duplication that is done by passes such as loop unrolling, jump threading, indirect call promotion etc, where samples corresponding to a source location are aggregated multiple times due to the duplicates. In this change we are introducing a concept of distribution factor for pseudo probes so that samples can be distributed for duplicated probes scaled by a factor. We hope that optimizations duplicating code well-maintain the branch frequency information (BFI) based on which probe distribution factors are calculated. Distribution factors are updated at the end of preLTO pipeline to reflect an estimated portion of the real execution count.

This change also introduces a pseudo probe verifier that can be run after each IR passes to detect duplicated pseudo probes.

A saturated distribution factor stands for 1.0. A pesudo probe will carry a factor with the value ranged from 0.0 to 1.0. A 64-bit integral distribution factor field that represents [0.0, 1.0] is associated to each block probe. Unfortunately this cannot be done for callsite probes due to the size limitation of a 32-bit Dwarf discriminator. A 7-bit distribution factor is used instead.

Changes are also needed to the sample profile inliner to deal with prorated callsite counts. Call sites duplicated by PreLTO passes, when later on inlined in LTO time, should have the callees’s probe prorated based on the Prelink-computed distribution factors. The distribution factors should also be taken into account when computing hotness for inline candidates. Also, Indirect call promotion results in multiple callisites. The original samples should be distributed across them. This is fixed by adjusting the callisites' distribution factors.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D93264

(cherry picked from commit 3d89b3cbec230633e8228787819b15116c1a1730)

[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline

Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path.

This is resubmit of D95024, with build break and overtighten assertion fixed.

Test Plan:

(cherry picked from commit 1645f465be85223e9f5b6303a3e5e0e491fd819f)

[CSSPGO] Call site prioritized inlining for sample PGO

This change implemented call site prioritized BFS profile guided inlining for sample profile loader. The new inlining strategy maximize the benefit of context-sensitive profile as mentioned in the follow up discussion of CSSPGO RFC. The change will not affect today's AutoFDO as it's opt-in. CSSPGO now defaults to the new FDO inliner, but can fall back to today's replay inliner using a switch (`-sample-profile-prioritized-inline=0`).

Motivation

With baseline AutoFDO, the inliner in sample profile loader only replays previous inlining, and the use of profile is only for pruning previous inlining that turned out to be cold. Due to the nature of replay, the FDO inliner is simple with hotness being the only decision factor. It has the following limitations that we're improving now for CSSPGO.
- It doesn't take inline candidate size into account. Since it's doing replay, the size growth is bounded by previous CGSCC inlining. With context-sensitive profile, FDO inliner is no longer limited by previous inlining, so we need to take size into account to avoid significant size bloat.
- The way it looks at hotness is not accurate. It uses total samples in an inlinee as proxy for hotness, while what really matters for an inline decision is the call site count. This is an unfortunate fall back because call site count and callee entry count are not reliable due to dwarf based correlation, especially for inlinees. Now paired with pseudo-probe, we have accurate call site count and callee's entry count, so we can use that to gauge hotness more accurately.
- It treats all call sites from a block as hot as long as there's one call site considered hot. This is normally true, but since total samples is used as hotness proxy, this transitiveness within block magnifies the inacurate hotness heuristic. With pseduo-probe and the change above, this is no longer an issue for CSSPGO.

New FDO Inliner

Putting all the requirement for CSSPGO together, we need a top-down call site prioritized BFS inliner. Here're reasons why each component is needed.
- Top-down: We need a top-down inliner to better leverage context-sensitive profile, so inlining is driven by accurate context profile, and post-inline is also accurate. This is already implemented in https://reviews.llvm.org/D70655.
- Size Cap: For top-down inliner, taking function size into account for inline decision alone isn't sufficient to control size growth. We also need to explicitly cap size growth because with top-down inlining, we can grow inliner size significantly with large number of smaller inlinees even if each individually passes the cost/size check.
- Prioritize call sites: With size cap, inlining order also becomes important, because if we stop inlining due to size budget limit, we'd want to use budget towards the most beneficial call sites.
- BFS inline: Same as call site prioritization, if we stop inlining due to size budget limit, we want a balanced inline tree, rather than going deep on one call path.

Note that the new inliner avoids repeatedly evaluating same set of call site, so it should help with compile time too. For this reason, we could transition today's FDO inliner to use a queue with equal priority to avoid wasted reevaluation of same call site (TODO).

Speculative indirect call promotion and inlining is also supported now with CSSPGO just like baseline AutoFDO.

Tunings and knobs

I created tuning knobs for size growth/cap control, and for hot threshold separate from CGSCC inliner. The default values are selected based on initial tuning with CSSPGO.

Results

Evaluated with an internal LLVM fork couple months ago, plus another change to adjust hot-threshold cutoff for context profile (will send up after this one), the new inliner show ~1% geomean perf win on spec2006 with CSSPGO, while reducing code size too. The measurement was done using train-train setup, MonoLTO w/ new pass manager and pseudo-probe. Note that this is just a starting point - we hope that the new inliner will open up more opportunity with CSSPGO, but it will certainly take more time and effort to make it fully calibrated and ready for bigger workloads (we're working on it).

Differential Revision: https://reviews.llvm.org/D94001

(cherry picked from commit 6bae5973c476e16dbbc82030d65c7859a6628e89)

[CSSPGO] Passing the clang driver switch -fpseudo-probe-for-profiling to the linker.

As titled.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D95271

(cherry picked from commit d3e2e3740d0730cb6788c771bb01a8f3e935bf2e)

[CSSPGO] Tweaking inlining with pseudo probes.

Fixing up a couple places where `getCallSiteIdentifier` is needed to support pseudo-probe-based callsites.

Also fixing an issue in the extbinary profile reader where the metadata section is not fully scanned based on the number of profiles loaded only for the current module.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D95791

(cherry picked from commit 224fee8219bb3aed34f13ce40935e1b3ede90a0f)

[CSSPGO] Support of CS profiles in extended binary format.

This change brings up support of context-sensitive profiles in the format of extended binary. Existing sample profile reader/writer/merger code is being tweaked to reflect the fact of bracketed input contexts, like (`[...]`). The paired brackets are also needed in extbinary profiles because we don't yet have an otherwise good way to tell calling contexts apart from regular function names since the context delimiter `@` can somehow serve as a part of the C++ mangled names.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D95547

(cherry picked from commit 7e99bddfeaab2713a8bb6ca538da25b66e6efc59)

[OpenMP] Disabled profiling in `libomp` by default to unblock link errors

Link error occurred when time profiling in libomp is enabled by default
because `libomp` is assumed to be a C library but the dependence on
`libLLVMSupport` for profiling is a C++ library. Currently the issue blocks all
OpenMP tests in Phabricator.

This patch set a new CMake option `OPENMP_ENABLE_LIBOMP_PROFILING` to
enable/disable the feature. By default it is disabled. Note that once time
profiling is enabled for `libomp`, it becomes a C++ library.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95585

(cherry picked from commit c571b168349fdf22d1dc8b920bcffa3d5161f0a2)

Extend release notes for AST Matchers changes

PR44325 (and duplicates): don't issue -Wzero-as-null-pointer-constant
when rewriting 'a < b' as '(a <=> b) < 0'.

It's pretty common for comparison category types to use a pointer or
pointer-to-member type as their '0' parameter.

(cherry picked from commit 1f06f41993b6363e6b2c4f22a13488a3e687f31b)

[OpenMP] Fix seg fault in libomptarget when using Info with multiple threads

Summary:
One option for the LIBOMPTARGET_INFO environment variable is to print the current status of the device's data mappings. These are a shared resource among threads so this needs to be protected when using multiple streams.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95786

(cherry picked from commit fda48539988d2a1bdb6395799151e9090312a20b)

[OpenMP][NFC] Added release note for new `deviceRTLs` and hidden helper task

Added release note for new `deviceRTLs` and hidden helper task for LLVM
12.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95584

(cherry picked from commit 7bc31018f71cac22b7060c49cefb6f3d0d2e2069)

[OpenMP][deviceRTLs] Added `[[clang::loader_uninitialized]]` explicitly

`[[clang::loader_uninitialized]]` is in macro `SHARED` but it doesn't
work for array like `parallelLevel`, so the variable will be zero initialized.
There is also a similar issue for `omptarget_nvptx_device_State` which is in
global address space. Its c'tor is also generated, which was not in the past when
building the `deviceRTLs` with CUDA. In this patch, we added the attribute to
the two variables explicitly.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95550

(cherry picked from commit 19248d30e4ed5250fa84abbbd52fc7b835918a45)

[OpenMP][NVPTX] Added the missing -O1 when building NVPTX bitcode libraries

In the past `-O1` was used when building NVPTX bitcode libraries. After
we switched to OpenMP, `-O1` was missing by mistake, leading to a huge performance
regression.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95545

(cherry picked from commit 5a64794bbad4010778406dfee7748e6080258dbf)

[OpenMP][Libomptarget] Fix conditional in CMake for remote plugin

The remote offloading plugin's CMakeLists was trying to build if its
flag was enabled even if it didn't find gRPC/protobuf. The conditional
was wrong, it's fixed by this.

Differential Revision: https://reviews.llvm.org/D95574

(cherry picked from commit 8a77056256d9970387595a5c729d894e3fe07131)

[elfabi] Fix tests which failed on different timezones

This patch fixes elfabi tests on machines using a GMT+X timezone
settings.

Differential Revision: https://reviews.llvm.org/D95641

(cherry picked from commit 771b35965457ebd5faaed8a1c3d2bcefffe721a3)

[X86] Fix disassembly of x86-64 GDTLS code sequence

For x86-64 the REX.w prefix takes precedence over any other size
override (i.e. 0x66). Therefore, for x86-64 when REX.w is present set
'hasOpSize' to false to ensure that any size override is ignored.

Fixes PR48901.

Differential Revision: https://reviews.llvm.org/D95682

(cherry picked from commit 94fedd266125a5425aa33e11332bf414f0b6dc35)

[LV] Fix crash when computing max VF too early

D90687 introduced a crash:

  llvm::LoopVectorizationCostModel::computeMaxVF(llvm::ElementCount, unsigned int):
    Assertion `WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() &&
    "No decisions should have been taken at this point"' failed.

when compiling the following C code:

  typedef struct {
  char a;
  } b;

  b *c;
  int d, e;

  int f() {
    int g = 0;
    for (; d; d++) {
      e = 0;
      for (; e < c[d].a; e++)
        g++;
    }
    return g;
  }

with:

  clang -Os -target hexagon -mhvx -fvectorize -mv67 testcase.c -S -o -

This occurred since prior to D90687 computeFeasibleMaxVF would only be
called in computeMaxVF when a scalar epilogue was allowed, but now it's
always called. This causes the assert above since computeFeasibleMaxVF
collects all viable VFs larger than the default MaxVF, and for each VF
calculates the register usage which results in analysis being done the
assert above guards against. This can occur in computeFeasibleMaxVF if
TTI.shouldMaximizeVectorBandwidth and this target hook is implemented in
the hexagon backend to always return true.

Reported by @iajbar.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D94869

(cherry picked from commit 8cda227432f1c9ceb63b88802ed8136da97274f1)

[RISCV] Update the version number to v0.10 for vector.

(cherry picked from commit 9847023660467a4469b5667bcf7a4c73a4780037)