git.osdn.net Git - android-x86/external-llvm-project.git/log

[Verifier] Allow DW_TAG_class_type/DW_TAG_union_type to have no filename

`clang/lib/CodeGen/CGOpenMPRuntime.cpp` synthesized union
(`distinct !DICompositeType(tag: DW_TAG_union_type, name: "kmp_cmplrdata_t", size: 64, elements: <0x62b690>)`)
does not have meaningful filename/line number.

D94735 dropped the previously arbitrary and untested filename/line from the union and caused a verifier error here.

This fixes `check-libarcher` failures.

Differential Revision: https://reviews.llvm.org/D96212

(cherry picked from commit ad60802a7187aa39b0374536be3fa176fe3d6256)

[X86] Always assign reassoc flag for intrinsics *reduce_add/mul_ps/pd.

Intrinsics *reduce_add/mul_ps/pd have assumption that the elements in
the vector are reassociable. So we need to always assign the reassoc
flag when we call _mm_reduce_* intrinsics.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D96231

(cherry picked from commit dd2460ed5d77d908327ce29a15630cd3268bd76e)

[RISCV] Remove SRO* and SLO* instructions from bitmanip.

As of the current draft these are no longer being considered
for the bitmanip spec. It wasn't clear what sub extension they
belonged in in the 0.93 spec.

So remove them. They can always be added back if something changes.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96157

(cherry picked from commit fd5adae02cafe388673d3b3f92ef791af3c73cfe)

Recommit of a2fdf9d4d734732a6fa9288f1ffdf12bf8618123.

- The failures are all cc1-based tests due to the missing `-aux-triple` options,
which is always prepared by the driver in CUDA/HIP compilation.
- Add extra check on the missing aux-targetinfo to prevent crashing.

[hip][cuda] Enable extended lambda support on Windows.

- On Windows, extended lambda has extra issues due to the numbering
schemes are different between the host compilation (Microsoft C++ ABI)
and the device compilation (Itanium C++ ABI. Additional device side
lambda number is required per lambda for the host compilation to
correctly mangle the device-side lambda name.
- A hybrid numbering context `MSHIPNumberingContext` is introduced to
number a lambda for both host- and device-compilations.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D69322

This reverts commit 4874ff02417916cc9ff994b34abcb5e563056546.

(cherry picked from commit 01bf529db2cf465b029e29e537807576bfcbc452)

[AArch64] Use '//' as comment string for MSVC assembly

As the actual MSVC toolset doesn't use the GAS-style assembly that
Clang/LLVM produces and consumes, there's no reference for what
string to use for e.g. comments when building with a MSVC triple.

This frees up the use of semicolon as separator string, just like
was done for GNU targets in 23413195649d0cf6f3860ae8b5fb115b35032075.
(Previously, both the separator and comment strings were set to
the same, a semicolon.)

Compiler-rt extensively uses separator chars in its assembly,
and that assembly should be buildable with clang-cl for MSVC too.

Differential Revision: https://reviews.llvm.org/D96259

(cherry picked from commit 71c29b4cf3fb2b5610991bfbc12b8bda97d60005)

[ELF] Allow R_386_GOTOFF from .debug_info

In GCC emitted .debug_info sections, R_386_GOTOFF may be used to
relocate DW_AT_GNU_call_site_value values
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98946).

R_386_GOTOFF (`S + A - GOT`) is one of the `isStaticLinkTimeConstant` relocation
type which is not PC-relative, so it can be used from non-SHF_ALLOC sections. We
current allow new relocation types as needs come. The diagnostic has caught some
bugs in the past.

Differential Revision: https://reviews.llvm.org/D95994

(cherry picked from commit b3165a70ae83b46dc145f335dfa9690ece361e92)

[AIX] Improve option processing for mabi=vec-extabi and mabi=vec=defaul

Opening this revision to better address comments by @hubert.reinterpretcast in https://reviews.llvm.org/rGcaaaebcde462

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D95702

(cherry picked from commit eb3426a528d5b3cbbb54aee662a779f2067fc9db)

[AIX] Actually push back "-mabi=vec-extabi" when option is on.

Accidentaly ommitted the portion of pushing back the option in
https://reviews.llvm.org/D94986

(cherry picked from commit caaaebcde462bf681498ce85c2659d683a07fc87)

[OpenMP][NVPTX] Refined CMake logic to choose compute capabilites

This patch refines the logic to choose compute capabilites via the
environment variable `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES`. It supports the
following values (all case insensitive):
- "all": Build `deviceRTLs` for all supported compute capabilites;
- "auto": Only build for the compute capability auto detected. Note that this
requires CUDA. If CUDA is not found, a CMake fatal error will be raised.
- "xx,yy" or "xx;yy": Build for compute capabilities `xx` and `yy`.

If `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES` is not set, it is equivalent to set
it to `all`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95687

(cherry picked from commit 26d38f6d20ff137d89cb7c891b739662de1ca508)

Define new/delete in libc++ when using libcxxrt

Always turn on LIBCXX_ENABLE_NEW_DELETE_DEFINITIONS, if libcxxrt is used
as the C++ ABI library, since libcxxrt does not provide the full set
ofnew and delete operators. In particular, the aligned versions of these
operators are completely missing. This primarily addresses builds on
FreeBSD, as this platform uses libcxxrt by default.

Also, attempt to provide a FreeBSD.cmake cache file, with hopefully sane
settings, partially copied from the Apple.cmake cache file. This needs
more work, probably some additions to ci build scripts (although I am
not aware of any 'official' FreeBSD build bots).

Reviewed By: ldionne, #libc

Differential Revision: https://reviews.llvm.org/D96720

(cherry picked from commit 328261019f50a76b11fa625739cbf32ceb2ce2f7)

[OpenMP] Delay more diagnostics of potentially non-emitted code

Even code in target and declare target regions might not be emitted.
With this patch we delay more diagnostics and use laziness and linkage
to determine if a function is emitted (for the device). Note that we
still eagerly emit diagnostics for target regions, unfortunately, see
the TODO for the reason.

This hopefully fixes PR48933.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95928

(cherry picked from commit 1dd66e6111a8247c6c7931143251c0cf1442b905)

[OpenMP] Attribute target diagnostics properly

Type errors in function declarations were not (always) diagnosed prior
to this patch. Furthermore, certain remarks did not get associated
properly which caused them to be emitted multiple times.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95912

(cherry picked from commit f9286b434b764b366f1aad9249c04e7741ed5518)

[OpenMP][NFC] Pre-commit test changes regarding PR48933

This will highlight the effective changes in subsequent commits.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D95903

(cherry picked from commit 3b2f19d0bc2803697526191a8a607efa0b38f7e4)

[AssumptionCache] Do not track llvm.assume calls (PR49043)

This fixes PR49043 by invalidating the handle on RAUW. This will work
fine assuming all existing RAUW users add the new assumption to the
cache. That means, if a new llvm.assume call replaces an old one, you
need to add the new one now as a RAUW is not enough anymore.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96208

(cherry picked from commit 378f4e5ec26c3e0d2119c1112ec645b369eed2de)

workflows: Increase the fetch-depth for actions/checkout steps

This avoids failures when many commits are pushed close together.

[clang-tidy] Fix crash in readability-identifier-naming check

`isParamInMainLikeFunction` didn't check if the function had an identifer name before calling getName() which could lead to an assert.

(cherry picked from commit c97592c5df09850404a9ddbfb614c7df271d1dfe)

[InlineFunction] Only update noalias scopes once for an instruction.

Inlining sometimes maps different instructions to be inlined onto the same instruction.

We must ensure to only remap the noalias scopes once. Otherwise the scope might disappear (at best).
This patch ensures that we only replace scopes for which the mapping is known.

This approach is preferred over tracking which instructions we already handled in a SmallPtrSet,
as that one will need more memory.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95862

(cherry picked from commit 50c523a9d4402c69d59c0b2ecb383a763d16cde9)

[LoopPeel] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed.

The reduction of a sanitizer build failure when enabling the dominance check (D95335) showed that loop peeling also needs to take care of scope duplication, just like loop unrolling (D92887).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95544

(cherry picked from commit 80cdd30eb90c3509bf315f1fa1369483e2448bbd)

IntrinsicEmitter: Change IntrinsicsToAttributesMap from uint8_t[] to uint16_t[]

We need at least 252 UniqAttributes now, which will soon overflow.
Actually with downstream backends we can easily use up the last few values.
So bump to uint16_t.

(cherry picked from commit b7d63244226ba2c0df651622fe7fe3f5f8aba262)

[RISCV] Add new vector instructions in v0.10.

* Add new vector instructions in v0.10.
- load/store for mask value vle1.v vse1.v
- vsetivli for 0-31 immediate vector length.
* Rename vector instructions in v0.10.
- vfrsqrte7 -> vfrsqrt7
- vfrece7 -> vfrec7
* Reserve memory width encodings for EEW>128b.

Differential Revision: https://reviews.llvm.org/D95781

(cherry picked from commit c7189ba78578d029e0162720319de3c1c6fc348b)

[RISCV] Replace NoX0 SDNodeXForm with a ComplexPattern to do the selection of the VL operand.

I think this is a more standard way of doing this.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D95833

(cherry picked from commit e7f9a834996f40be8dc46a0b059aa850f1f4ef05)

Don't infer attributes on '::operator new'.

These attributes were all incorrect or inappropriate for LLVM to infer:
- inaccessiblememonly is generally wrong; user replacement operator new
  can access memory that's visible to the caller, as can a new_handler
  function.
- willreturn is generally wrong; a custom new_handler is not guaranteed
  to terminate.
- noalias is inappropriate: Clang has a flag to determine whether this
  attribute should be present and adds it itself when appropriate.
- noundef and nonnull on the return value should be specified by the
  frontend on all 'operator new' functions if we want them, not here.

In any case, inferring attributes on functions declared 'nobuiltin' (as
these are when Clang emits them) seems questionable.

(cherry picked from commit ab243efb261ba7e27f4b14e1a6fbbff15a79c0bf)

Revert "[BuildLibcalls, Attrs] Support more variants of C++'s new, add attributes for C++'s delete"

Several of the new attributes here were incorrect, and even the ones
that are generally correct were being added even to nobuiltin calls.

This reverts commit bb3f169b59e1c8bd7fd70097532220bbd11e9967.

(cherry picked from commit 1484ad4137b5d627573672bad48b03785f8fdefd)

[ARM] Do not emit ldrexd/strexd on Cortex-M chips

The ldrexd/strexd instructions are not supported on M-class chips, see
for example
https://developer.arm.com/documentation/dui0489/e/arm-and-thumb-instructions/memory-access-instructions/ldrex-and-strex
which says:

> All these 32-bit Thumb instructions are available in ARMv6T2 and
> above, except that LDREXD and STREXD are not available in the ARMv7-M
> architecture.

Looking at the ARMv8-M architecture, it appears that these instructions
aren't supported either. The Architecture Reference Manual lists
ldrex/strex but not ldrexd/strexd:
https://developer.arm.com/documentation/ddi0553/bn/

Godbolt example on LLVM 11.0.0, which incorrectly emits ldrexd/strexd
instructions: https://llvm.godbolt.org/z/5qqPnE

Differential Revision: https://reviews.llvm.org/D95891

(cherry picked from commit aecdf15cc7f866180dc769265b8183cad34bb33a)

[Support] Indent multi-line descr of enum cli options.

As noted in https://reviews.llvm.org/D93459, the formatting of
multi-line descriptions of clEnumValN and the likes is unfavorable.
Thus this patch adds support for correctly indenting these.

Reviewed By: serge-sans-paille

Differential Revision: https://reviews.llvm.org/D93494

(cherry picked from commit e3f02302e318837d2421c6425450f04ae0a82b90)

[RISCV] Fix incorrect RVV sdiv/udiv lowering

Due to a clerical error, the sdiv operation was mapping to vdivu and
udiv to vdiv, when the opposite mapping is the correct one.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D95869

(cherry picked from commit b4106f9c7b8c498d109301ced7bf9aca32027168)

[DAGCombine] Do not remove masking argument to FP16_TO_FP for some targets

As of commit 284f2bffc9bc5, the DAG Combiner gets rid of the masking of the
input to this node if the mask only keeps the bottom 16 bits. This is because
the underlying library function does not use the high order bits. However, on
PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits
from the register. Therefore, the library implementation of __gnu_h2f_ieee will
return an incorrect result if the bits aren't cleared.

This combine is desired for ARM (and possibly other targets) so this patch adds
a query to Target Lowering to check if this zeroing needs to be kept.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092

Differential revision: https://reviews.llvm.org/D96283

(cherry picked from commit a5222aa0858a42660629c410a5b669dee16a4359)

PR48587: is_constant_evaluated() should not evaluate to true during a
variable's destruction if it didn't do so during construction.

The standard doesn't give any guidance as to what to do here, but this
approach seems reasonable and conservative, and has been proposed to the
standard committee.

(cherry picked from commit c945dc4a5023d6a17d11fcda76509b94b36e34fc)

[GlobalISel] Check if branches use the same MBB in matchOptBrCondByInvertingCond

If the G_BR + G_BRCOND in this combine use the same MBB, then it will infinite
loop. Don't allow that to happen.

Differential Revision: https://reviews.llvm.org/D95895

(cherry picked from commit 02d4b365bf4f8c2cb56e5612902f6c3bb4316493)

[clangd] Fix clang tidy provider when multiple config files exist in directory tree

Currently Clang tidy provider searches from the root directory up to the target directory, this is the opposite of how clang-tidy searches for config files.
The result of this is .clang-tidy files are ignored in any subdirectory of a directory containing a .clang-tidy file.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D96204

(cherry picked from commit ba3ea9c60f0f259f0ccc47e47daf8253a5885531)

Fix "not all control paths return a value" warning. NFCI.

PR48606: The lifetime of a constexpr heap allocation always started
during the same evaluation.

It looks like the only case for which this matters is determining
whether mutable subobjects of a heap allocation can be modified during
constant evaluation.

(cherry picked from commit 21e8bb83253e1a2f4b6fad9b53cafe8c530a38e2)

[AST] Update LVal before evaluating lambda decl fields.

Differential Revision: https://reviews.llvm.org/D96092

(cherry picked from commit 96fb49c3ff8e08680127ddd4ec45a0e6c199243b)

[lldb-vscode] correctly use Windows macros

@mstorsjo found a mistake that I made when trying to fix some Windows
compilation errors encountered by @stella.stamenova.

I was incorrectly using the LLVM_ON_UNIX macro. In any case, proper use
of

#if defined(_WIN32)

should be the actual fix.

Differential Revision: https://reviews.llvm.org/D96060

(cherry picked from commit 36496cc2992d6fa26e6024971efcfc7d15f69888)

Fix lldb-vscode builds on Windows targeting POSIX

@stella.stamenova found out that lldb-vscode's Win32 macros were failing
when building on windows targetings POSIX platforms.

I'm changing these macros for LLVM_ON_UNIX, which should be more
accurate.

(cherry picked from commit 0bca9a7ce2eeaa9f1d732ffbc17769560a2b236e)

Fix runInTerminal failures on Windows

stella.stemenova mentioned in https://reviews.llvm.org/D93951 failures on Windows for this test.

I'm fixing the macro definitions and disabling the tests for python
versions lower than 3.7. I'll figure out that actual issue with
python3.6 after the buildbots are fine again.

(cherry picked from commit ab5591e1d8f5abcfa9e75193d3e8a29087b61425)

[🍒][libc++] Fix libcxx build on 32bit architectures with 64bit time_t defaults e.g. riscv32

Patch by Khem Raj.

(cherry pick of commit 85b9c5ccc172a1e61c7ecaaec4752587cb6f1e26)

Differential Revision: https://reviews.llvm.org/D96062

[🍒]Disable CFI in __get_elem to allow casting a pointer to uninitialized memory

Fixes usage of shared_ptr with CFI enabled, which is llvm.org/pr48993.

(cherry pick of commit bab74864168bb5e28ecbc0294fe1095d8da7f569)

Differential Revision: https://reviews.llvm.org/D96063

[🍒][libc++] Rename include/support to include/__support

We do ship those headers, so the directory name should not be something
that can potentially conflict with user-defined directories.

This is a cherry-pick of b51756819a85563ae063e98eeb3d6af8e44c8f64.

Differential Revision: https://reviews.llvm.org/D96059

[OpenMP][libomptarget] Fixed an issue that device sync is skipped if the kernel doesn't have any argument

Currently if there is not kernel argument, device synchronization will
be skipped. This can lead to two issues:
1. If there is any device error, it will not be captured;
2. The target region might end before the kernel is done, which is not spec
conformant.

The test added in this patch only runs on NVPTX platform, although it will not
be executed by Phab at all. It also requires `not` which is not available on most
systems.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D96067

(cherry picked from commit b68a6b09e60a24733b923a0fc282746a855852da)

[MemorySSA] Don't treat lifetime.end as NoAlias

MemorySSA currently treats lifetime.end intrinsics as not aliasing
anything. This breaks MemorySSA-based MemCpyOpt, because we'll happily
move a read of a pointer below a lifetime.end intrinsic, as no clobber
is reported.

I think the MemorySSA modelling here isn't correct: lifetime.end(p)
has approximately the same effect as doing a memcpy(p, undef), and
should be treated as a clobber.

This patch removes the special handling of lifetime.end, leaving
alias analysis to handle it appropriately.

Differential Revision: https://reviews.llvm.org/D95763

[MemCpyOpt] Add test for incorrect optimization across lifetime (NFC)

This only affects the MemorySSA-based implementation.

workflows: Update libclang-abi-tests to work with minor release baselines

Add a release note about deprecating the clang-cl /fallback flag

As discussed in
https://lists.llvm.org/pipermail/cfe-dev/2021-January/067524.html

The flag has been removed on the main branch in D95876.

Differential revision: https://reviews.llvm.org/D96016

[OpenMP] Disabled profiling in `libomp` by default to unblock link errors

Link error occurred when time profiling in libomp is enabled by default
because `libomp` is assumed to be a C library but the dependence on
`libLLVMSupport` for profiling is a C++ library. Currently the issue blocks all
OpenMP tests in Phabricator.

This patch set a new CMake option `OPENMP_ENABLE_LIBOMP_PROFILING` to
enable/disable the feature. By default it is disabled. Note that once time
profiling is enabled for `libomp`, it becomes a C++ library.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95585

(cherry picked from commit c571b168349fdf22d1dc8b920bcffa3d5161f0a2)

[OpenMP] Fix building using LLVM_ENABLE_RUNTIMES

Fix when time profiling is enabled.

Related to: D94855

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95398

(cherry picked from commit bb40e6731843de92f1c73ad6efceb8a89e045ea6)

[clang][aarch64][WOA64][docs] Release note for longjmp crash with /guard:cf

Add a release note workaround for PR47463.

Bug: https://bugs.llvm.org/show_bug.cgi?id=47463

Differential Revision: https://reviews.llvm.org/D95435

Revert "[OpenMP] Disabled profiling in `libomp` by default to unblock link errors"

This reverts commit f5602e0bf31ab590da19fa357980a753dbfd666e.

[X86] Accept 64-bit GPRs for vextractps when using a register that requires EVEX.

This is consistent with the VEX version. It also fixes a sorting
issue in the matching table that caused the EVEX version to be
prioritized over VEX in intel syntax.

Fixes issue [2] from PR48991.

(cherry picked from commit c691fe14da93a7c9eff466231515d6d4d16124fa)

[OpenMP][NVPTX] Take functions in `deviceRTLs` as `convergent`

OpenMP device compiler (similar to other SPMD compilers) assumes that
functions are convergent by default to avoid invalid transformations, such as
the bug (https://bugs.llvm.org/show_bug.cgi?id=49021).

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95971

(cherry picked from commit 0f0ce3c12edefd25448e39c4d20718a10d3d42c1)

[CSSPGO] Introducing distribution factor for pseudo probe.

Sample re-annotation is required in LTO time to achieve a reasonable post-inline profile quality. However, we have seen that such LTO-time re-annotation degrades profile quality. This is mainly caused by preLTO code duplication that is done by passes such as loop unrolling, jump threading, indirect call promotion etc, where samples corresponding to a source location are aggregated multiple times due to the duplicates. In this change we are introducing a concept of distribution factor for pseudo probes so that samples can be distributed for duplicated probes scaled by a factor. We hope that optimizations duplicating code well-maintain the branch frequency information (BFI) based on which probe distribution factors are calculated. Distribution factors are updated at the end of preLTO pipeline to reflect an estimated portion of the real execution count.

This change also introduces a pseudo probe verifier that can be run after each IR passes to detect duplicated pseudo probes.

A saturated distribution factor stands for 1.0. A pesudo probe will carry a factor with the value ranged from 0.0 to 1.0. A 64-bit integral distribution factor field that represents [0.0, 1.0] is associated to each block probe. Unfortunately this cannot be done for callsite probes due to the size limitation of a 32-bit Dwarf discriminator. A 7-bit distribution factor is used instead.

Changes are also needed to the sample profile inliner to deal with prorated callsite counts. Call sites duplicated by PreLTO passes, when later on inlined in LTO time, should have the callees’s probe prorated based on the Prelink-computed distribution factors. The distribution factors should also be taken into account when computing hotness for inline candidates. Also, Indirect call promotion results in multiple callisites. The original samples should be distributed across them. This is fixed by adjusting the callisites' distribution factors.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D93264

(cherry picked from commit 3d89b3cbec230633e8228787819b15116c1a1730)

[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline

Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path.

This is resubmit of D95024, with build break and overtighten assertion fixed.

Test Plan:

(cherry picked from commit 1645f465be85223e9f5b6303a3e5e0e491fd819f)

[CSSPGO] Call site prioritized inlining for sample PGO

This change implemented call site prioritized BFS profile guided inlining for sample profile loader. The new inlining strategy maximize the benefit of context-sensitive profile as mentioned in the follow up discussion of CSSPGO RFC. The change will not affect today's AutoFDO as it's opt-in. CSSPGO now defaults to the new FDO inliner, but can fall back to today's replay inliner using a switch (`-sample-profile-prioritized-inline=0`).

Motivation

With baseline AutoFDO, the inliner in sample profile loader only replays previous inlining, and the use of profile is only for pruning previous inlining that turned out to be cold. Due to the nature of replay, the FDO inliner is simple with hotness being the only decision factor. It has the following limitations that we're improving now for CSSPGO.
- It doesn't take inline candidate size into account. Since it's doing replay, the size growth is bounded by previous CGSCC inlining. With context-sensitive profile, FDO inliner is no longer limited by previous inlining, so we need to take size into account to avoid significant size bloat.
- The way it looks at hotness is not accurate. It uses total samples in an inlinee as proxy for hotness, while what really matters for an inline decision is the call site count. This is an unfortunate fall back because call site count and callee entry count are not reliable due to dwarf based correlation, especially for inlinees. Now paired with pseudo-probe, we have accurate call site count and callee's entry count, so we can use that to gauge hotness more accurately.
- It treats all call sites from a block as hot as long as there's one call site considered hot. This is normally true, but since total samples is used as hotness proxy, this transitiveness within block magnifies the inacurate hotness heuristic. With pseduo-probe and the change above, this is no longer an issue for CSSPGO.

New FDO Inliner

Putting all the requirement for CSSPGO together, we need a top-down call site prioritized BFS inliner. Here're reasons why each component is needed.
- Top-down: We need a top-down inliner to better leverage context-sensitive profile, so inlining is driven by accurate context profile, and post-inline is also accurate. This is already implemented in https://reviews.llvm.org/D70655.
- Size Cap: For top-down inliner, taking function size into account for inline decision alone isn't sufficient to control size growth. We also need to explicitly cap size growth because with top-down inlining, we can grow inliner size significantly with large number of smaller inlinees even if each individually passes the cost/size check.
- Prioritize call sites: With size cap, inlining order also becomes important, because if we stop inlining due to size budget limit, we'd want to use budget towards the most beneficial call sites.
- BFS inline: Same as call site prioritization, if we stop inlining due to size budget limit, we want a balanced inline tree, rather than going deep on one call path.

Note that the new inliner avoids repeatedly evaluating same set of call site, so it should help with compile time too. For this reason, we could transition today's FDO inliner to use a queue with equal priority to avoid wasted reevaluation of same call site (TODO).

Speculative indirect call promotion and inlining is also supported now with CSSPGO just like baseline AutoFDO.

Tunings and knobs

I created tuning knobs for size growth/cap control, and for hot threshold separate from CGSCC inliner. The default values are selected based on initial tuning with CSSPGO.

Results

Evaluated with an internal LLVM fork couple months ago, plus another change to adjust hot-threshold cutoff for context profile (will send up after this one), the new inliner show ~1% geomean perf win on spec2006 with CSSPGO, while reducing code size too. The measurement was done using train-train setup, MonoLTO w/ new pass manager and pseudo-probe. Note that this is just a starting point - we hope that the new inliner will open up more opportunity with CSSPGO, but it will certainly take more time and effort to make it fully calibrated and ready for bigger workloads (we're working on it).

Differential Revision: https://reviews.llvm.org/D94001

(cherry picked from commit 6bae5973c476e16dbbc82030d65c7859a6628e89)

[CSSPGO] Passing the clang driver switch -fpseudo-probe-for-profiling to the linker.

As titled.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D95271

(cherry picked from commit d3e2e3740d0730cb6788c771bb01a8f3e935bf2e)

[CSSPGO] Tweaking inlining with pseudo probes.

Fixing up a couple places where `getCallSiteIdentifier` is needed to support pseudo-probe-based callsites.

Also fixing an issue in the extbinary profile reader where the metadata section is not fully scanned based on the number of profiles loaded only for the current module.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D95791

(cherry picked from commit 224fee8219bb3aed34f13ce40935e1b3ede90a0f)

[CSSPGO] Support of CS profiles in extended binary format.

This change brings up support of context-sensitive profiles in the format of extended binary. Existing sample profile reader/writer/merger code is being tweaked to reflect the fact of bracketed input contexts, like (`[...]`). The paired brackets are also needed in extbinary profiles because we don't yet have an otherwise good way to tell calling contexts apart from regular function names since the context delimiter `@` can somehow serve as a part of the C++ mangled names.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D95547

(cherry picked from commit 7e99bddfeaab2713a8bb6ca538da25b66e6efc59)

[OpenMP] Disabled profiling in `libomp` by default to unblock link errors

Link error occurred when time profiling in libomp is enabled by default
because `libomp` is assumed to be a C library but the dependence on
`libLLVMSupport` for profiling is a C++ library. Currently the issue blocks all
OpenMP tests in Phabricator.

This patch set a new CMake option `OPENMP_ENABLE_LIBOMP_PROFILING` to
enable/disable the feature. By default it is disabled. Note that once time
profiling is enabled for `libomp`, it becomes a C++ library.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95585

(cherry picked from commit c571b168349fdf22d1dc8b920bcffa3d5161f0a2)

Extend release notes for AST Matchers changes

PR44325 (and duplicates): don't issue -Wzero-as-null-pointer-constant
when rewriting 'a < b' as '(a <=> b) < 0'.

It's pretty common for comparison category types to use a pointer or
pointer-to-member type as their '0' parameter.

(cherry picked from commit 1f06f41993b6363e6b2c4f22a13488a3e687f31b)

[OpenMP] Fix seg fault in libomptarget when using Info with multiple threads

Summary:
One option for the LIBOMPTARGET_INFO environment variable is to print the current status of the device's data mappings. These are a shared resource among threads so this needs to be protected when using multiple streams.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95786

(cherry picked from commit fda48539988d2a1bdb6395799151e9090312a20b)

[OpenMP][NFC] Added release note for new `deviceRTLs` and hidden helper task

Added release note for new `deviceRTLs` and hidden helper task for LLVM
12.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95584

(cherry picked from commit 7bc31018f71cac22b7060c49cefb6f3d0d2e2069)

[OpenMP][deviceRTLs] Added `[[clang::loader_uninitialized]]` explicitly

`[[clang::loader_uninitialized]]` is in macro `SHARED` but it doesn't
work for array like `parallelLevel`, so the variable will be zero initialized.
There is also a similar issue for `omptarget_nvptx_device_State` which is in
global address space. Its c'tor is also generated, which was not in the past when
building the `deviceRTLs` with CUDA. In this patch, we added the attribute to
the two variables explicitly.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95550

(cherry picked from commit 19248d30e4ed5250fa84abbbd52fc7b835918a45)

[OpenMP][NVPTX] Added the missing -O1 when building NVPTX bitcode libraries

In the past `-O1` was used when building NVPTX bitcode libraries. After
we switched to OpenMP, `-O1` was missing by mistake, leading to a huge performance
regression.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95545

(cherry picked from commit 5a64794bbad4010778406dfee7748e6080258dbf)

[OpenMP][Libomptarget] Fix conditional in CMake for remote plugin

The remote offloading plugin's CMakeLists was trying to build if its
flag was enabled even if it didn't find gRPC/protobuf. The conditional
was wrong, it's fixed by this.

Differential Revision: https://reviews.llvm.org/D95574

(cherry picked from commit 8a77056256d9970387595a5c729d894e3fe07131)

[elfabi] Fix tests which failed on different timezones

This patch fixes elfabi tests on machines using a GMT+X timezone
settings.

Differential Revision: https://reviews.llvm.org/D95641

(cherry picked from commit 771b35965457ebd5faaed8a1c3d2bcefffe721a3)

[X86] Fix disassembly of x86-64 GDTLS code sequence

For x86-64 the REX.w prefix takes precedence over any other size
override (i.e. 0x66). Therefore, for x86-64 when REX.w is present set
'hasOpSize' to false to ensure that any size override is ignored.

Fixes PR48901.

Differential Revision: https://reviews.llvm.org/D95682

(cherry picked from commit 94fedd266125a5425aa33e11332bf414f0b6dc35)

[LV] Fix crash when computing max VF too early

D90687 introduced a crash:

  llvm::LoopVectorizationCostModel::computeMaxVF(llvm::ElementCount, unsigned int):
    Assertion `WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() &&
    "No decisions should have been taken at this point"' failed.

when compiling the following C code:

  typedef struct {
  char a;
  } b;

  b *c;
  int d, e;

  int f() {
    int g = 0;
    for (; d; d++) {
      e = 0;
      for (; e < c[d].a; e++)
        g++;
    }
    return g;
  }

with:

  clang -Os -target hexagon -mhvx -fvectorize -mv67 testcase.c -S -o -

This occurred since prior to D90687 computeFeasibleMaxVF would only be
called in computeMaxVF when a scalar epilogue was allowed, but now it's
always called. This causes the assert above since computeFeasibleMaxVF
collects all viable VFs larger than the default MaxVF, and for each VF
calculates the register usage which results in analysis being done the
assert above guards against. This can occur in computeFeasibleMaxVF if
TTI.shouldMaximizeVectorBandwidth and this target hook is implemented in
the hexagon backend to always return true.

Reported by @iajbar.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D94869

(cherry picked from commit 8cda227432f1c9ceb63b88802ed8136da97274f1)

[RISCV] Update the version number to v0.10 for vector.

(cherry picked from commit 9847023660467a4469b5667bcf7a4c73a4780037)

[RISCV] Update the version number to v0.10 for vector.

v0.10 is tagged in V specification. Update the version to v0.10.

Differential Revision: https://reviews.llvm.org/D95680

(cherry picked from commit 282aca10aeb03bdaef0a8d4f3faa4c2ff236e527)

[PowerPC][Power10] Fix XXSPLI32DX not correctly exploiting specific cases

Some cases may be transformed into 32 bit splats before hitting the boolean statement, which may cause incorrect behaviour and provide XXSPLTI32DX with the incorrect values of splat. The condition was reversed so that the shortcut prevents this problem.

Differential Revision: https://reviews.llvm.org/D95634

(cherry picked from commit 2e470e03b49f1d79ebc315ca9d62a690a633c0cd)

[PowerPC] Do not emit XXSPLTI32DX for sub 64-bit constants

If the APInt returned by BuildVectorSDNode::isConstantSplat() is narrower than
64 bits, the result produced by XXSPLTI32DX is incorrect. The result returned
by the function appears to be incorrect and we'll investigate/fix it in a
follow-up commit. However, since this causes miscompiles, we must
temporarily disable emitting this instruction for such values.

(cherry picked from commit 54e570d94af995ff58287a8288389641910a8239)

[VE] Change inetger constants 32-bit friendly

Correct integer constants like `1UL << 63` to `UINT64_C(1) << 63` in
order to make them work on 32-bit machines. Tested on both an i386
and x86_64 machines.

Reviewed By: mgorny

Differential Revision: https://reviews.llvm.org/D95724

(cherry picked from commit 4648098f97fa2a7c08c04632c70cf29293528812)

[OpenMP] Fix python3 compatibility in openmp's lit.cfg

Differential Revision: https://reviews.llvm.org/D95669

(cherry picked from commit c3c02d0d5a313272f6d35926bdf678fc6b884c02)

[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - don't merge VPERMILPD ops with different low/high masks.

Unlike VPERMILPS, VPERMILPD can have non-repeating masks in each 128-bit subvector, we weren't accounting for this when folding vperm2f128(vpermilpd(x,c),vpermilpd(y,c)) -> vpermilpd(vperm2f128(x,y),c).

I'm intending to add support for this but wanted to get a minimal fix in first for merging into 12.xx.

Fixes PR48908

(cherry picked from commit 6663330bc8c84a75ea092272297b557bfc310380)

[X86][AVX] Add PR48908 shuffle test case

(cherry picked from commit da8845fc3d3bb0b0e133f020931440511fa72723)

workflows: Add job to check for ABI changes in libclang.so and libclang-cpp.so

workflows: Fix actions repository name for llvm tests

workflows: Re-enable lldb test on Mac OS X

Revert "[ConstantFold] Fold more operations to poison"

This reverts commit 53040a968dc2ff20931661e55f05da2ef8b964a0 due to its
bad interaction with select i1 -> and/or i1 transformation.

This fixes:
https://bugs.llvm.org/show_bug.cgi?id=49005
https://bugs.llvm.org/show_bug.cgi?id=48435

(cherry picked from commit 06829034ca64b8c83a5b20d8abe5ddbfe7af0004)

[OpenMP][Libomptarget] Remove possible harmful copy constructor call for RTLsTy

From https://bugs.llvm.org/show_bug.cgi?id=48973, we know that
`std::call_once(PM->RTLs.initFlag, &RTLsTy::LoadRTLs, PM->RTLs)` causes compile
time problems in libstdc++v3 5.3.1. This is because there was a defect in the
standard regarding the `call_once` (LWG 2442). This was fixed in libstdc++ soon
thereafter, but there are likely other standard libraries where this will fail.

By matching this function call with the other one, we fix this bug.

Differential Revision: https://reviews.llvm.org/D95769

workflows: Fix libclc tests

[docs] Add release notes for things I've done for the 12.x release branch.

workflows: Fix LLVM ABI checks to work for X.0.0 releases

[LoopUnswitch] Properly update MSSA if header has non-clobbering stores.

This patch fixes updating MemorySSA if the header contains memory
defs that do not clobber a duplicated instruction. We need to find the
first defining access outside the loop body and use that as defining
access of the duplicated instruction.

This fixes a crash caused by bee486851c1a.

(Cherry-picked on the 12.x release branch from
10c57268c074c3ad48f76da38fa2ba575ee3d1f9)

[OpenMP][NVPTX] Disable building NVPTX deviceRTL by default on a non-CUDA system

D95466 dropped CUDA to build NVPTX deviceRTL and enabled it by default.
However, the building requires some libraries that are not available on non-CUDA
system by default, which could break the compilation. This patch disabled the
build by default. It can be enabled with `LIBOMPTARGET_BUILD_NVPTX_BCLIB=ON`.

Reviewed By: kparzysz

Differential Revision: https://reviews.llvm.org/D95556

(cherry picked from commit fb12df4a8e33d759938057718273dfb434b2d9c4)

[clang-tidy] Fix linking tests to LLVMTestingSupport

LLVMTestingSupport is not part of libLLVM, and therefore can not
be linked to via LLVM_LINK_COMPONENTS. Instead, it needs to be
specified explicitly to ensure that it is linked explicitly
even if LLVM_LINK_LLVM_DYLIB is used. This is consistent with handling
in clangd.

Fixes PR#48931

Differential Revision: https://reviews.llvm.org/D95653

(cherry picked from commit 632545e8ce846ccaeca8df15a3dc5e36d01a1275)

Make the profile-filter.c test compatible with 32-bit systems

This addresses PR48930.

Differential Revision: https://reviews.llvm.org/D95658

(cherry picked from commit 0217f1c7a31ba44715bc083a60cddc2192ffed96)

[sanitizer] Fix msan test build on FreeBSD after 7afdc89c2054

This commit accidentally enabled fgetgrent_r() in the msan tests under
FreeBSD, but this function is not supported. Also remove FreeBSD from
the SANITIZER_INTERCEPT_FGETGRENT_R macro.

(cherry picked from commit e056fc6cb676f72d5b7dfe7ca540b3275bd1a46f)

[AMDGPU] Avoid an illegal operand in si-shrink-instructions

Before the patch it was possible to trigger a constant bus
violation when folding immediates into a shrunk instruction.

The patch adds a check to enforce the legality of the new operand.

Differential Revision: https://reviews.llvm.org/D95527

(cherry picked from commit fc8e7411218c846386650cfba111b62827c71da0)

Relax test expectations in debug-info-gline-tables-only-codeview.cpp

To make it pass also on 32-bit Windows, see PR48920.

(cherry picked from commit 0024efc69ea6cd0b630cd11cef5991b7edb73ffc)

[OpenMP] libomp: fix build by cl with vs2019

Replace VLA with dynamic allocation using alloca().
This fixes https://bugs.llvm.org/show_bug.cgi?id=48919.

Differential Revision: https://reviews.llvm.org/D95627

(cherry picked from commit 7f5ad0e07162e0c19e569986ee37a17c147c9a27)

[clangd] Parse Diagnostics block, and nest ClangTidy block under it.

(ClangTidy configuration block hasn't been in any release, so we should be OK
to move it around like this)

Differential Revision: https://reviews.llvm.org/D95362

(cherry picked from commit c3df9d58c75e0f89ca95e947804d65e79a491adc)

[clangd] Log warning when using legacy (theia) semantic highlighting.

The legacy protocol will be removed on trunk after the 12 branch cut,
and gone in clangd 13.

Differential Revision: https://reviews.llvm.org/D95031

(cherry picked from commit 29472bb76915c4929aecc938300f6df31f63ac29)

clang: Fix static_assert in a few contexts in microsoft mode

Follow-up to D17444. Fixes PR48904. See bug for details.

Differential Revision: https://reviews.llvm.org/D95559

(cherry picked from commit 764a7a2155c6747ec8d0b38d8edbb65960eae874)

[clang-format] Avoid considering include directive as a template closer.

This fixes a bug [[ http://llvm.org/PR48891 | PR48891 ]] introduced in D93839 where:
```
#include <stdint.h>
namespace rep {}
```
got formatted as
```
#include <stdint.h>
namespace rep {
}
```

Reviewed By: MyDeveloperDay, leonardchan

Differential Revision: https://reviews.llvm.org/D95479

(cherry picked from commit e3713f156b8cb65a2b74f150afb824ce1e2a2fab)

workflows: Update branch names

Also remove main-brancy-sync workflow that was removed from the main branch.

Itanium Mangling: In 'enable_if', omit X/E around <expr-primary>.

The Clang enable_if extension is mangled as an <extended-qualifier>,
which is supposed to contain <template-args>. However, we were
unconditionally emitting X/E around its arguments, neglecting the fact
that <expr-primary> should be emitted directly without the surrounding
X/E.

Differential Revision: https://reviews.llvm.org/D95488

(cherry picked from commit a7246ba02a8923f316419a62d836dbe1c0b437bd)

Itanium Mangling: Fix handling of <expr-primary> in <template-arg>.

Previously, we were emitting an extraneous X .. E in <template-arg>
around an <expr-primary> if the template argument was constructed from
an expression (rather than an already-evaluated literal value). In
such a case, we would then e.g. emit 'XLi0EE' instead of 'Li0E'.

We had one special-case for DeclRefExpr expressions, in particular, to
omit them the mangled-name without the surrounding X/E. However,
unfortunately, that special case also triggered for ParmVarDecl (a
subtype of VarDecl), and _incorrectly_ emitted 'L_Z .. E' instead of
the proper 'Xfp_E'.

This change causes mangleExpression itself to be responsible for
emitting X/E around non-primary expressions, which removes the
special-case, and corrects both these problems.

Differential Revision: https://reviews.llvm.org/D95487

(cherry picked from commit 8ca33605ff0cfc536f5c6710fb5f6378bf11959a)

Itanium Mangling: Mangle `__alignof__` differently than `alignof`.

The two operations have acted differently since Clang 8, but were
unfortunately mangled the same. The new mangling uses new "vendor
extended expression" syntax proposed in
https://github.com/itanium-cxx-abi/cxx-abi/issues/112

GCC had the same mangling problem, https://gcc.gnu.org/PR88115, and
will hopefully be switching to the same mangling as implemented here.

Additionally, fix the mangling of `__uuidof` to use the new extension
syntax, instead of its previous nonstandard special-case.

Adjusts the demangler accordingly.

Differential Revision: https://reviews.llvm.org/D93922

(cherry picked from commit 9c7aeaebb3ac1b94200b59b111742cb6b8f090c2)

Revert "Suppress non-conforming GNU paste extension in all standard-conforming modes"

This reverts commit f4537935dcdbf390c863591cf556e76c3abab9c1.
This reverts commit b43c26d036dcbf7a6881f39e4434cf059364022a.

This GNU and MSVC extension turns out to be very popular. Most projects
are not using C++20, so cannot use the new __VA_OPT__ feature to be
standards conformant. The other workaround, using -std=gnu*, enables too
many language extensions and isn't viable.

Until there is a way for users to get the behavior provided by the
`, ## __VA_ARGS__` extension in the -std=c++17 and earlier language
modes, we need to revert this.

(cherry picked from commit 61a66e4b5ec18e9e73c2f6334f6b7f7dd4bca77e)