OSDN Git Service
wlei [Mon, 11 Jan 2021 20:47:22 +0000 (12:47 -0800)]
[CSSPGO][llvm-profgen] Aggregate samples on call frame trie to speed up profile generation
For CS profile generation, call stack unwinding is time-consuming because each LBR entry needs linear time to generate its context (hashing, compression, string concatenation). This change speeds it up by grouping all the call frames within one LBR sample into a trie and aggregating the results (sample counters) on it, deferring context compression and string generation to the end of unwinding.
Specifically, it uses `StackLeaf` as the top frame on the stack and manipulates it dynamically (popping or pushing a trie node) during virtual unwinding, so that the raw sample can simply be recorded on the leaf node; the path from root to leaf represents its calling context. At the end, it traverses the trie and generates the contexts on the fly.
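A minimal sketch of the idea described above, not the code from this patch. `FrameNode`, `getOrCreateChild`, `collectContexts` and the `" @ "` separator are hypothetical names chosen for illustration; only `StackLeaf` comes from the description. Samples are counted on the current leaf, and context strings are built once, during the final traversal.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical trie node: one call frame, plus the samples recorded while
// this frame was the leaf of the unwound stack.
struct FrameNode {
  std::string Frame;
  FrameNode *Parent = nullptr;
  uint64_t SampleCount = 0;
  std::map<std::string, std::unique_ptr<FrameNode>> Children;

  // "Push": descend into the callee, creating the node on first use.
  FrameNode *getOrCreateChild(const std::string &Callee) {
    std::unique_ptr<FrameNode> &Slot = Children[Callee];
    if (!Slot) {
      Slot = std::make_unique<FrameNode>();
      Slot->Frame = Callee;
      Slot->Parent = this;
    }
    return Slot.get();
  }
};

// Walk the trie once at the end, building each context string on the fly.
// The " @ " separator is an assumption made for this sketch.
void collectContexts(const FrameNode &Node, const std::string &Prefix,
                     std::vector<std::pair<std::string, uint64_t>> &Out) {
  for (const auto &Entry : Node.Children) {
    const FrameNode &Child = *Entry.second;
    std::string Context =
        Prefix.empty() ? Child.Frame : Prefix + " @ " + Child.Frame;
    if (Child.SampleCount)
      Out.emplace_back(Context, Child.SampleCount);
    collectContexts(Child, Context, Out);
  }
}
```

In this sketch a "pop" is just `Leaf = Leaf->Parent`, so moving the leaf during unwinding costs no string work at all.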
Results:
Our internal branch shows about 5X speed-up on some large workloads in SPEC06 benchmark.
Differential Revision: https://reviews.llvm.org/D94110
wlei [Fri, 29 Jan 2021 23:00:08 +0000 (15:00 -0800)]
[CSSPGO][llvm-profgen] Compress recursive cycles in calling context
This change compresses the context string by removing cycles caused by recursive functions during CS profile generation. Removing recursion cycles normalizes the calling context, which improves sample aggregation and also makes context promotion deterministic.
In the implementation, we recognize adjacent repeated frames as cycles and deduplicate them over multiple rounds of iteration.
For example:
Consider an input context string stack:
[“a”, “a”, “b”, “c”, “a”, “b”, “c”, “b”, “c”, “d”]
For the first iteration, it removed all adjacent repeated frames of size 1:
[“a”, “b”, “c”, “a”, “b”, “c”, “b”, “c”, “d”]
For the second iteration, it removed all adjacent repeated frames of size 2:
[“a”, “b”, “c”, “a”, “b”, “c”, “d”]
So in the end, we get compressed output:
[“a”, “b”, “c”, “d”]
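The rounds in the example can be sketched as follows. This is an illustrative reimplementation, not the in-tree algorithm; `Context`, `dropRepeatsOfSize` and `compressRecursion` are hypothetical names, and `MaxSize` only mimics the "-1 means no size limit" convention.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

using Context = std::vector<std::string>;

// One round: drop every adjacent repeat of exactly Size frames.
Context dropRepeatsOfSize(const Context &In, std::size_t Size) {
  const auto N = static_cast<std::ptrdiff_t>(Size);
  Context Out;
  for (const std::string &Frame : In) {
    Out.push_back(Frame);
    if (Out.size() < 2 * Size)
      continue;
    // Compare the last Size frames with the Size frames just before them.
    if (std::equal(Out.end() - 2 * N, Out.end() - N, Out.end() - N))
      Out.resize(Out.size() - Size);
  }
  return Out;
}

// Run rounds for cycle sizes 1, 2, 3, ... up to MaxSize (-1: no limit).
Context compressRecursion(Context Ctx, int MaxSize = -1) {
  for (std::size_t Size = 1; 2 * Size <= Ctx.size(); ++Size) {
    if (MaxSize >= 0 && Size > static_cast<std::size_t>(MaxSize))
      break;
    Ctx = dropRepeatsOfSize(Ctx, Size);
  }
  return Ctx;
}
```

With the input from the example, the rounds for sizes 1, 2 and 3 produce the three intermediate stacks shown, ending in the four-frame result.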
Compression is called in two places: once for the sample's context key right after unwinding, and once for the eventual context string ID in the ProfileGenerator.
Added a switch `compress-recursion` to control the size of duplicated frames, default -1 means no size limit.
Added unit tests and regression test for this.
Differential Revision: https://reviews.llvm.org/D93556
wlei [Mon, 11 Jan 2021 17:08:39 +0000 (09:08 -0800)]
[CSSPGO][llvm-profgen] Pseudo probe based CS profile generation
This change implements profile generation infra for pseudo probe in llvm-profgen. During virtual unwinding, the raw profile is extracted into range counter and branch counter and aggregated to sample counter map indexed by the call stack context. This change introduces the last step and produces the eventual profile. Specifically, the body of function sample is recorded by going through each probe among the range and callsite target sample is recorded by extracting the callsite probe from branch's source.
Please refer to https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s and https://reviews.llvm.org/D89707 for more context about CSSPGO and llvm-profgen.
**Implementation**
- Extended `PseudoProbeProfileGenerator` for pseudo probe based profile generation.
- `populateBodySamplesWithProbes` reads the range counter and is responsible for recording function body samples and inferring the caller's body samples.
- `populateBoundarySamplesWithProbes` reads the branch counter and is responsible for recording call site target samples.
- Each sample is recorded with its calling context (named `ContextId`). Recall that the probe-based context key doesn't include the leaf frame probe info, so the `ContextId` string is created from two parts: one from the concatenation of the probe stack strings and the other from the leaf frame probe.
- Added regression test
Test Plan:
ninja & ninja check-llvm
Differential Revision: https://reviews.llvm.org/D92998
Yang Fan [Wed, 3 Feb 2021 03:04:58 +0000 (11:04 +0800)]
[CSSPGO] Fix MSVC initializing truncation warning (NFC)
MSVC warning:
```
\llvm-project\llvm\include\llvm\Transforms\IPO\SampleProfileProbe.h(65): warning C4305: 'initializing': truncation from 'double' to 'const float'
```
William S. Moses [Mon, 1 Feb 2021 23:16:17 +0000 (18:16 -0500)]
[SROA] Propagate correct TBAA/TBAA Struct offsets
SROA does not correctly account for offsets in TBAA/TBAA struct metadata.
This patch creates functionality for generating new MD with the corresponding
offset and updates SROA to use this functionality.
Differential Revision: https://reviews.llvm.org/D95826
(cherry picked from commit
40862b1a7486a969ff044cd240aad24f4183cc10)
Georgii Rymar [Thu, 28 Jan 2021 13:35:18 +0000 (16:35 +0300)]
[llvm-symbolizer] - Fix the crash in GNU output style with --no-inlines and missing input file.
Fixes https://bugs.llvm.org/show_bug.cgi?id=48882.
If the input file does not exist (or has a reading error), the
following code will crash if there are two or more input addresses.
```
auto ResOrErr = Symbolizer.symbolizeInlinedCode(
ModuleName, {Offset, object::SectionedAddress::UndefSection});
Printer << (error(ResOrErr) ? DILineInfo() : ResOrErr.get().getFrame(0));
```
For the first address, `symbolizeInlinedCode` returns an error.
For the second address, `symbolizeInlinedCode` returns an empty result
(not an error) and `.getFrame(0)` will crash.
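The failure mode can be illustrated with stand-in types. This is a sketch only: `LineInfo`, `InliningInfo` and `firstFrameOrDefault` are hypothetical and are not the LLVM classes, and the guard shown is one possible way to avoid indexing an empty result, not necessarily what the patch does.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-ins for illustration only; these are not LLVM's debug-info classes.
struct LineInfo {
  std::string FunctionName = "??";
};

struct InliningInfo {
  std::vector<LineInfo> Frames;
  std::size_t getNumberOfFrames() const { return Frames.size(); }
  const LineInfo &getFrame(std::size_t I) const { return Frames.at(I); }
};

// Returns the innermost frame, or a default-constructed LineInfo when the
// lookup failed or succeeded with zero frames.
LineInfo firstFrameOrDefault(bool LookupFailed, const InliningInfo &Result) {
  if (LookupFailed || Result.getNumberOfFrames() == 0)
    return LineInfo();
  return Result.getFrame(0);
}
```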
Differential revision: https://reviews.llvm.org/D95609
(cherry picked from commit
d22140687500f90830fe416d9c1e317f7c4535d5)
Simonas Kazlauskas [Tue, 16 Feb 2021 21:35:32 +0000 (13:35 -0800)]
[llvm-dwp] Join dwo paths correctly when DWOPath is absolute
When the `DWOPath` is absolute, we want to use `DWOPath` as is, without prepending any other
components to the path. The `sys::path::append` does not join, but rather unconditionally appends
the paths, so something like `sys::path::append("/tmp", "/tmp/banana")` will result in
`/tmp/tmp/banana` rather than the desired `/tmp/banana`.
This then causes `llvm-dwp` to fail in the following situation:
```
$ clang -gsplit-dwarf /tmp/banana/test.c -c -o /tmp/outdir/foo.o
$ clang outdir/foo.o -o outdir/hm
$ llvm-dwarfdump outdir/hm | grep -C2 foo.dwo
DW_AT_comp_dir ("/tmp")
DW_AT_GNU_pubnames (true)
DW_AT_GNU_dwo_name ("/tmp/outdir/foo.dwo")
DW_AT_GNU_dwo_id (0xde4d396f3bf0e257)
DW_AT_low_pc (0x0000000000401100)
$ strace -o trace llvm-dwp -e outdir/hm -o outdir/hm.dwp
error: No such file or directory
$ cat trace | grep foo.dwo
openat(AT_FDCWD, "/tmp/tmp/outdir/foo.dwo", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
```
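The difference between appending and joining can be sketched with plain strings. `appendLike` is a simplified model of unconditional appending written for this illustration (it is not `sys::path::append`), and `resolveDwoPath` is a hypothetical helper showing the intended behaviour for POSIX-style paths.

```cpp
#include <cstddef>
#include <string>

// Simplified model of an unconditional append: the component is always added
// under Base, even when it looks absolute.
std::string appendLike(std::string Base, const std::string &Component) {
  std::size_t Start = Component.find_first_not_of('/');
  if (Start == std::string::npos)
    return Base;
  if (!Base.empty() && Base.back() != '/')
    Base += '/';
  return Base + Component.substr(Start);
}

// Intended behaviour: an absolute DWO path is used as is; only a relative
// one is resolved against the compilation directory.
std::string resolveDwoPath(const std::string &CompDir,
                           const std::string &DwoPath) {
  if (!DwoPath.empty() && DwoPath.front() == '/')
    return DwoPath;
  return appendLike(CompDir, DwoPath);
}
```

With the values from the trace above, `appendLike("/tmp", "/tmp/outdir/foo.dwo")` reproduces the bad `/tmp/tmp/outdir/foo.dwo` path, while `resolveDwoPath` returns `/tmp/outdir/foo.dwo`.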
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D96678
(cherry picked from commit
6ffcb2937c96bd0d7a55b984b5eb8f381b68e322)
Kadir Cetinkaya [Fri, 22 Jan 2021 14:20:52 +0000 (15:20 +0100)]
[clangd] Treat "null" optional fields as missing
Clangd currently throws away any protocol messages whenever an optional
field has an unexpected type. This patch changes the behaviour to treat
`null` fields as missing.
This makes clangd more tolerant of small violations of the
LSP spec.
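A minimal C++17 sketch of the "null means missing" rule, using a toy JSON model. `JsonValue`, `JsonObject` and `readOptionalString` are hypothetical names for this illustration and do not correspond to clangd's actual parsing code.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <variant>

// Minimal stand-in for a JSON value; not clangd's JSON library.
using JsonValue = std::variant<std::nullptr_t, bool, double, std::string>;
using JsonObject = std::map<std::string, JsonValue>;

// Reads an optional string field. Returns false only for a genuine type
// mismatch; an absent key and an explicit null both leave Out empty.
bool readOptionalString(const JsonObject &Obj, const std::string &Key,
                        std::optional<std::string> &Out) {
  Out.reset();
  auto It = Obj.find(Key);
  if (It == Obj.end() || std::holds_alternative<std::nullptr_t>(It->second))
    return true;
  if (const std::string *S = std::get_if<std::string>(&It->second)) {
    Out = *S;
    return true;
  }
  return false;
}
```

A caller that previously dropped the whole message on any `false` would, under this rule, keep messages whose optional fields are merely `null`.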
Fixes https://github.com/clangd/vscode-clangd/issues/134
Differential Revision: https://reviews.llvm.org/D95229
(cherry picked from commit
af20232b8e189335da571f48c2467b244b7fd772)
Shilei Tian [Fri, 19 Feb 2021 02:04:32 +0000 (21:04 -0500)]
[OpenMP][NVPTX] Add the support for CUDA 11.2 and CUDA 11.1
CUDA 11.2 and CUDA 11.1 are both available now.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D97004
(cherry picked from commit
89827fd404f954605663776e746ec351bde61348)
Jeroen Dobbelaere [Thu, 18 Feb 2021 16:29:46 +0000 (17:29 +0100)]
[clang] functions with the 'const' or 'pure' attribute must always return.
As described in
* https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-pure-function-attribute
* https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-const-function-attribute
An `__attribute__((pure))` function must always return, as well as an `__attribute__((const))` function.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D96960
(cherry picked from commit
46757ccb49ab88da54ca8ddd43665d5255ee80f7)
Nikita Popov [Fri, 19 Feb 2021 12:06:45 +0000 (13:06 +0100)]
[LLD] Fix tests after D96993
We now need mustprogress to eliminate these calls. The code doesn't
really make sense, but that's not the point of the test...
(cherry picked from commit
ac065b7a37d6dd8daacd526f6c3a0d1563bc88ac)
Nikita Popov [Thu, 18 Feb 2021 21:29:19 +0000 (22:29 +0100)]
[DCE] Don't remove non-willreturn calls
In both ADCE and BDCE (via DemandedBits) we should not remove
instructions that are not guaranteed to return. This issue was
pointed out by fhahn in the recent llvm-dev thread.
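As a toy model of the rule (not the ADCE/BDCE code): an unused instruction is only a removal candidate if it is also guaranteed to return. `ToyInst` and `isRemovableIfUnused` are hypothetical names for this sketch.

```cpp
// Toy instruction summary; the fields are assumptions made for illustration.
struct ToyInst {
  unsigned NumUses;
  bool MayWriteMemory;
  bool WillReturn;
};

// An unused, side-effect-free call may still loop forever or exit the
// program, so it is kept unless it is known to return.
bool isRemovableIfUnused(const ToyInst &I) {
  return I.NumUses == 0 && !I.MayWriteMemory && I.WillReturn;
}
```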
Differential Revision: https://reviews.llvm.org/D96993
(cherry picked from commit
2f17ed294fcd8cde505b93c9c5bbab06ba59051c)
Nikita Popov [Thu, 18 Feb 2021 21:15:17 +0000 (22:15 +0100)]
[IR] Move willReturn() to Instruction
This moves the willReturn() helper from CallBase to Instruction,
so that it can be used in a more generic manner. This will make
it easier to fix additional passes (ADCE and BDCE), and will give
us one place to change if additional instructions should become
non-willreturn (e.g. there has been talk about handling volatile
operations this way).
I have also included the IntrinsicInst workaround directly in
here, so that it gets applied consistently. (As such this change
is not entirely NFC -- FuncAttrs will now use this as well.)
Differential Revision: https://reviews.llvm.org/D96992
(cherry picked from commit
370addb996138a9e3634899cf264c7621307617a)
Nikita Popov [Thu, 18 Feb 2021 20:25:14 +0000 (21:25 +0100)]
[DCE] Add tests for non-willreturn function being removed (NFC)
(cherry picked from commit
4045ad6b0ccd35fe990d51b9bfdd9e7de109bdf5)
Lei Huang [Fri, 19 Feb 2021 19:24:05 +0000 (19:24 +0000)]
[PowerPC] Update release notes for changes to PowerPC for V12.0
David Sherwood [Mon, 8 Feb 2021 16:33:46 +0000 (16:33 +0000)]
[release][docs] Update contributions to LLVM 12 for scalable vectors.
Differential Revision: https://reviews.llvm.org/D96270
Jez Ng [Tue, 2 Feb 2021 23:18:07 +0000 (18:18 -0500)]
[lld-macho] Fill out release notes for 12.x
Differential Revision: https://reviews.llvm.org/D95900
Sam McCall [Tue, 2 Feb 2021 14:20:18 +0000 (15:20 +0100)]
[clangd] Fix race in Global CDB shutdown
I believe the atomic write can be reordered after the notify, and that
seems to be happening on mac m1: http://45.33.8.238/macm1/2654/step_8.txt
In practice maybe seq_cst is enough? But no reason not to lock here.
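The locking pattern referred to above can be sketched like this. `ShutdownFlag` and its members are hypothetical names and this is not the clangd code; the point is only that the flag is written while holding the same mutex the waiter uses, so the write cannot be observed after the notification.

```cpp
#include <condition_variable>
#include <mutex>

class ShutdownFlag {
  std::mutex Mu;
  std::condition_variable CV;
  bool ShouldStop = false; // guarded by Mu

public:
  void requestStop() {
    {
      // Write the flag under the lock instead of relying on an atomic store.
      std::lock_guard<std::mutex> Lock(Mu);
      ShouldStop = true;
    }
    CV.notify_all();
  }

  // Blocks until requestStop() has been called.
  void waitForStop() {
    std::unique_lock<std::mutex> Lock(Mu);
    CV.wait(Lock, [this] { return ShouldStop; });
  }

  bool stopRequested() {
    std::lock_guard<std::mutex> Lock(Mu);
    return ShouldStop;
  }
};
```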
https://bugs.llvm.org/show_bug.cgi?id=48998
(cherry picked from commit
6ac3fd9706047304c52a678884122a3a6bc55432)
Simon Pilgrim [Tue, 2 Feb 2021 10:53:28 +0000 (10:53 +0000)]
[X86][AVX] Add missing VEX_WIG tags from VPACKUSDW/VPHSUBD/VPCMPISTRI/VPCMPISTRM/VPCMPESTRI/VPCMPESTRM
Fixes PR48877
Differential Revision: https://reviews.llvm.org/D95801
(cherry picked from commit
4d904776a77aa80342c65cf72a962920cc9d1fa9)
Simon Pilgrim [Mon, 1 Feb 2021 18:17:25 +0000 (18:17 +0000)]
[X86][AVX] Add 'OK' test cases for PR48877
(cherry picked from commit
e9514429a02b1e4f8b9d54b28a934bfa9bd246ec)
Simon Pilgrim [Sat, 13 Feb 2021 11:59:52 +0000 (11:59 +0000)]
[DAG] Fix shift amount limit in SimplifyDemandedBits trunc(shift(x,c)) to truncated bitwidth
We lost this in D56387/rG69bc0990a9181e6eb86228276d2f59435a7fae67 - where I got the src/dst bitwidths mixed up and assumed getValidShiftAmountConstant would catch it.
Patch by @craig.topper - confirmed by @Carrot that it fixes PR49162
(cherry picked from commit
7ad0c573bd4a68dc81886037457d47daa3d6aa24)
Simon Pilgrim [Sat, 13 Feb 2021 11:33:14 +0000 (11:33 +0000)]
[X86] Add reduced test case for PR49162
(cherry picked from commit
5ca3ef98a71598d368f6f4aaf0b385b50b67ce4a)
Maxim Kuvyrkov [Fri, 12 Feb 2021 09:47:37 +0000 (09:47 +0000)]
Fix exegesis build on aarch64-windows-msvc host
Include x86 intrinsics only when compiling for x86_64
or i386. _MSC_VER no longer implies x86.
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D96498
Fixes: https://bugs.llvm.org/show_bug.cgi?id=49149
(cherry picked from commit
06f53f2f095c45c93d269b5dc010af506f4b0ff4)
Martin Storsjö [Tue, 16 Feb 2021 13:00:54 +0000 (15:00 +0200)]
doc: Add a release note for the changed comment char for aarch64-msvc targets
This was backported in
a6ea391b832573830b011f26013ebaa946032250.
Nico Weber [Fri, 29 Jan 2021 16:20:04 +0000 (11:20 -0500)]
Revert "Disable rosegment for old Android versions."
This reverts commit
fae16fc0eed7cf60207901818cfe040116f2ef00.
Breaks building compiler-rt android runtimes with trunk clang
but older NDK, see discussion on https://reviews.llvm.org/D95166
(cherry picked from commit
1608ba09462d877111230e9461b895f696f8fcb1)
Stephen Kelly [Sat, 30 Jan 2021 01:36:40 +0000 (01:36 +0000)]
[ASTMatchers] Fix matching after generic top-level matcher
With a matcher like
expr(anyOf(integerLiteral(equals(42)), unless(expr())))
and code such as
struct B {
B(int);
};
B func1() { return 42; }
the top-level expr() would match each of the nodes which are not spelled
in the source and then ignore-traverse to match the integerLiteral node.
This would result in multiple results reported for the integerLiteral.
Fix that by only running matching logic on nodes which are not skipped
with the top-level matcher.
Differential Revision: https://reviews.llvm.org/D95735
(cherry picked from commit
d6a06365cf12bebe20a7d65cf3894608efc089b4)
Stephen Kelly [Sat, 30 Jan 2021 15:46:08 +0000 (15:46 +0000)]
[ASTMatchers] Fix definition of decompositionDecl
(cherry picked from commit
b10d445307a0f3c7e5522836b4331090aacaf349)
Stephen Kelly [Thu, 28 Jan 2021 13:12:43 +0000 (13:12 +0000)]
Fix traversal with hasDescendant into lambdas
Differential Revision: https://reviews.llvm.org/D95607
(cherry picked from commit
bb57a3422a09dcdd572ccb42767a0dabb5f966dd)
Stephen Kelly [Wed, 27 Jan 2021 22:03:23 +0000 (22:03 +0000)]
[ASTMatchers] Fix traversal below range-for elements
Differential Revision: https://reviews.llvm.org/D95562
(cherry picked from commit
79125085f16540579d27c7e4987f63eef9c4aa23)
Stephen Kelly [Thu, 28 Jan 2021 23:40:16 +0000 (23:40 +0000)]
Ensure that we traverse non-op() method bodies of lambdas
Differential Revision: https://reviews.llvm.org/D95644
(cherry picked from commit
43cc4f15008f8c700497d3d2b7020bfd29f5750f)
Stephen Kelly [Wed, 27 Jan 2021 23:47:05 +0000 (23:47 +0000)]
[ASTMatchers] Avoid pathological traversal over nested lambdas
Differential Revision: https://reviews.llvm.org/D95573
(cherry picked from commit
6f0df3cddb3e3f38df1baa7aa4d743a74bb46688)
Qiu Chaofan [Fri, 5 Feb 2021 12:33:56 +0000 (20:33 +0800)]
Revert "[PowerPC] [Clang] Enable float128 feature on P9 by default"
Commit
6bf29dbb enables float128 feature by default for Power9 targets.
But float128 may cause build failure in libcxx testing. Revert this
commit first to unblock LLVM 12 release.
(cherry picked from commit
447dc856b243b99ce70019ba1187c39746f4e0e9)
Fangrui Song [Mon, 8 Feb 2021 21:31:05 +0000 (13:31 -0800)]
[Verifier] Allow DW_TAG_class_type/DW_TAG_union_type to have no filename
`clang/lib/CodeGen/CGOpenMPRuntime.cpp` synthesized union
(`distinct !DICompositeType(tag: DW_TAG_union_type, name: "kmp_cmplrdata_t", size: 64, elements: <0x62b690>)`)
does not have meaningful filename/line number.
D94735 dropped the previously arbitrary and untested filename/line from the union and caused a verifier error here.
This fixes `check-libarcher` failures.
Differential Revision: https://reviews.llvm.org/D96212
(cherry picked from commit
ad60802a7187aa39b0374536be3fa176fe3d6256)
Wang, Pengfei [Tue, 9 Feb 2021 13:12:59 +0000 (21:12 +0800)]
[X86] Always assign reassoc flag for intrinsics *reduce_add/mul_ps/pd.
Intrinsics *reduce_add/mul_ps/pd have assumption that the elements in
the vector are reassociable. So we need to always assign the reassoc
flag when we call _mm_reduce_* intrinsics.
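Why the order of a floating-point reduction matters can be seen with a small standalone example. This is an illustration of reassociation in general, not of the intrinsics' actual lowering; `sumInOrder` and `sumPairwise` are hypothetical helpers.

```cpp
#include <cstddef>
#include <vector>

// Strict left-to-right sum.
float sumInOrder(const std::vector<float> &V) {
  float Acc = 0.0f;
  for (float X : V)
    Acc += X;
  return Acc;
}

// Tree-shaped sum over the half-open range [Lo, Hi); requires Hi > Lo.
float sumPairwise(const std::vector<float> &V, std::size_t Lo,
                  std::size_t Hi) {
  if (Hi - Lo == 1)
    return V[Lo];
  std::size_t Mid = Lo + (Hi - Lo) / 2;
  return sumPairwise(V, Lo, Mid) + sumPairwise(V, Mid, Hi);
}
```

For `{1e8f, 1.0f, -1e8f, 1.0f}`, on targets that evaluate `float` in IEEE single precision, the in-order sum is 1 while the pairwise sum is 0. A reduction that is free to pick either order therefore has to be marked as reassociable.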
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D96231
(cherry picked from commit
dd2460ed5d77d908327ce29a15630cd3268bd76e)
Craig Topper [Tue, 9 Feb 2021 17:28:06 +0000 (09:28 -0800)]
[RISCV] Remove SRO* and SLO* instructions from bitmanip.
As of the current draft these are no longer being considered
for the bitmanip spec. It wasn't clear what sub extension they
belonged in in the 0.93 spec.
So remove them. They can always be added back if something changes.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D96157
(cherry picked from commit
fd5adae02cafe388673d3b3f92ef791af3c73cfe)
Michael Liao [Thu, 4 Feb 2021 16:05:35 +0000 (11:05 -0500)]
Recommit of
a2fdf9d4d734732a6fa9288f1ffdf12bf8618123.
- The failures are all cc1-based tests due to the missing `-aux-triple` options,
which is always prepared by the driver in CUDA/HIP compilation.
- Add extra check on the missing aux-targetinfo to prevent crashing.
[hip][cuda] Enable extended lambda support on Windows.
- On Windows, extended lambdas have extra issues because the numbering
schemes differ between the host compilation (Microsoft C++ ABI)
and the device compilation (Itanium C++ ABI). An additional device-side
lambda number is required per lambda for the host compilation to
correctly mangle the device-side lambda name.
- A hybrid numbering context `MSHIPNumberingContext` is introduced to
number a lambda for both host- and device-compilations.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D69322
This reverts commit
4874ff02417916cc9ff994b34abcb5e563056546.
(cherry picked from commit
01bf529db2cf465b029e29e537807576bfcbc452)
Martin Storsjö [Mon, 8 Feb 2021 13:24:42 +0000 (15:24 +0200)]
[AArch64] Use '//' as comment string for MSVC assembly
As the actual MSVC toolset doesn't use the GAS-style assembly that
Clang/LLVM produces and consumes, there's no reference for what
string to use for e.g. comments when building with a MSVC triple.
This frees up the use of semicolon as separator string, just like
was done for GNU targets in
23413195649d0cf6f3860ae8b5fb115b35032075.
(Previously, both the separator and comment strings were set to
the same, a semicolon.)
Compiler-rt extensively uses separator chars in its assembly,
and that assembly should be buildable with clang-cl for MSVC too.
Differential Revision: https://reviews.llvm.org/D96259
(cherry picked from commit
71c29b4cf3fb2b5610991bfbc12b8bda97d60005)
Fangrui Song [Thu, 4 Feb 2021 17:17:47 +0000 (09:17 -0800)]
[ELF] Allow R_386_GOTOFF from .debug_info
In GCC emitted .debug_info sections, R_386_GOTOFF may be used to
relocate DW_AT_GNU_call_site_value values
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98946).
R_386_GOTOFF (`S + A - GOT`) is one of the `isStaticLinkTimeConstant` relocation
types that is not PC-relative, so it can be used from non-SHF_ALLOC sections. We
currently allow new relocation types as needs arise. The diagnostic has caught some
bugs in the past.
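The arithmetic behind "not PC-relative" can be written out directly. The helper names are hypothetical and this is not lld code; `S` is the symbol address, `A` the addend, `GOT` the address of the GOT and `P` the place being relocated.

```cpp
#include <cstdint>

// S + A - GOT: no P term, so the value does not depend on where the
// relocated word lives.
constexpr uint32_t relocGotOff(uint32_t S, uint32_t A, uint32_t GOT) {
  return S + A - GOT;
}

// For contrast, a PC-relative value (S + A - P) changes with the place P.
constexpr uint32_t relocPcRel(uint32_t S, uint32_t A, uint32_t P) {
  return S + A - P;
}
```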
Differential Revision: https://reviews.llvm.org/D95994
(cherry picked from commit
b3165a70ae83b46dc145f335dfa9690ece361e92)
Zarko Todorovski [Tue, 2 Feb 2021 15:56:15 +0000 (10:56 -0500)]
[AIX] Improve option processing for mabi=vec-extabi and mabi=vec=defaul
Opening this revision to better address comments by @hubert.reinterpretcast in https://reviews.llvm.org/rGcaaaebcde462
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D95702
(cherry picked from commit
eb3426a528d5b3cbbb54aee662a779f2067fc9db)
Zarko Todorovski [Fri, 29 Jan 2021 19:05:17 +0000 (14:05 -0500)]
[AIX] Actually push back "-mabi=vec-extabi" when option is on.
Accidentally omitted the portion of pushing back the option in
https://reviews.llvm.org/D94986
(cherry picked from commit
caaaebcde462bf681498ce85c2659d683a07fc87)
Shilei Tian [Sat, 30 Jan 2021 20:14:41 +0000 (15:14 -0500)]
[OpenMP][NVPTX] Refined CMake logic to choose compute capabilities
This patch refines the logic to choose compute capabilities via the
environment variable `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES`. It supports the
following values (all case insensitive):
- "all": Build `deviceRTLs` for all supported compute capabilities;
- "auto": Only build for the compute capability auto detected. Note that this
requires CUDA. If CUDA is not found, a CMake fatal error will be raised.
- "xx,yy" or "xx;yy": Build for compute capabilities `xx` and `yy`.
If `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES` is not set, it is equivalent to setting
it to `all`.
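The accepted values can be modelled with a small parser. The real logic lives in CMake; this C++ sketch only mirrors the rules listed above, and `CapabilityChoice` and `parseCapabilities` are hypothetical names.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

struct CapabilityChoice {
  enum Kind { All, Auto, List };
  Kind Mode = All;
  std::vector<std::string> Values; // only filled when Mode == List
};

// Case-insensitive: "" or "all" -> All, "auto" -> Auto, otherwise a list
// separated by ',' or ';'.
CapabilityChoice parseCapabilities(std::string Text) {
  std::transform(Text.begin(), Text.end(), Text.begin(), [](unsigned char C) {
    return static_cast<char>(std::tolower(C));
  });
  CapabilityChoice Choice;
  if (Text.empty() || Text == "all")
    return Choice;
  if (Text == "auto") {
    Choice.Mode = CapabilityChoice::Auto;
    return Choice;
  }
  Choice.Mode = CapabilityChoice::List;
  std::string Current;
  for (char C : Text + ";") {
    if (C == ',' || C == ';') {
      if (!Current.empty())
        Choice.Values.push_back(Current);
      Current.clear();
    } else {
      Current += C;
    }
  }
  return Choice;
}
```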
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95687
(cherry picked from commit
26d38f6d20ff137d89cb7c891b739662de1ca508)
Dimitry Andric [Mon, 15 Feb 2021 17:22:01 +0000 (18:22 +0100)]
Define new/delete in libc++ when using libcxxrt
Always turn on LIBCXX_ENABLE_NEW_DELETE_DEFINITIONS, if libcxxrt is used
as the C++ ABI library, since libcxxrt does not provide the full set
of new and delete operators. In particular, the aligned versions of these
operators are completely missing. This primarily addresses builds on
FreeBSD, as this platform uses libcxxrt by default.
Also, attempt to provide a FreeBSD.cmake cache file, with hopefully sane
settings, partially copied from the Apple.cmake cache file. This needs
more work, probably some additions to ci build scripts (although I am
not aware of any 'official' FreeBSD build bots).
Reviewed By: ldionne, #libc
Differential Revision: https://reviews.llvm.org/D96720
(cherry picked from commit
328261019f50a76b11fa625739cbf32ceb2ce2f7)
Johannes Doerfert [Tue, 2 Feb 2021 17:17:44 +0000 (11:17 -0600)]
[OpenMP] Delay more diagnostics of potentially non-emitted code
Even code in target and declare target regions might not be emitted.
With this patch we delay more diagnostics and use laziness and linkage
to determine if a function is emitted (for the device). Note that we
still eagerly emit diagnostics for target regions, unfortunately, see
the TODO for the reason.
This hopefully fixes PR48933.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D95928
(cherry picked from commit
1dd66e6111a8247c6c7931143251c0cf1442b905)
Johannes Doerfert [Fri, 29 Jan 2021 08:42:20 +0000 (02:42 -0600)]
[OpenMP] Attribute target diagnostics properly
Type errors in function declarations were not (always) diagnosed prior
to this patch. Furthermore, certain remarks did not get associated
properly which caused them to be emitted multiple times.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D95912
(cherry picked from commit
f9286b434b764b366f1aad9249c04e7741ed5518)
Johannes Doerfert [Tue, 2 Feb 2021 23:24:53 +0000 (17:24 -0600)]
[OpenMP][NFC] Pre-commit test changes regarding PR48933
This will highlight the effective changes in subsequent commits.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D95903
(cherry picked from commit
3b2f19d0bc2803697526191a8a607efa0b38f7e4)
Johannes Doerfert [Sat, 6 Feb 2021 17:42:02 +0000 (11:42 -0600)]
[AssumptionCache] Do not track llvm.assume calls (PR49043)
This fixes PR49043 by invalidating the handle on RAUW. This will work
fine assuming all existing RAUW users add the new assumption to the
cache. That means, if a new llvm.assume call replaces an old one, you
need to add the new one now as a RAUW is not enough anymore.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D96208
(cherry picked from commit
378f4e5ec26c3e0d2119c1112ec645b369eed2de)
Tom Stellard [Mon, 15 Feb 2021 19:40:39 +0000 (11:40 -0800)]
workflows: Increase the fetch-depth for actions/checkout steps
This avoids failures when many commits are pushed close together.
Nathan James [Wed, 3 Feb 2021 05:11:28 +0000 (05:11 +0000)]
[clang-tidy] Fix crash in readability-identifier-naming check
`isParamInMainLikeFunction` didn't check if the function had an identifier name before calling getName(), which could lead to an assert.
(cherry picked from commit
c97592c5df09850404a9ddbfb614c7df271d1dfe)
Jeroen Dobbelaere [Tue, 2 Feb 2021 16:55:06 +0000 (17:55 +0100)]
[InlineFunction] Only update noalias scopes once for an instruction.
Inlining sometimes maps different instructions to be inlined onto the same instruction.
We must ensure that we remap the noalias scopes only once. Otherwise the scope might disappear (at best).
This patch ensures that we only replace scopes for which the mapping is known.
This approach is preferred over tracking which instructions we already handled in a SmallPtrSet,
as that one will need more memory.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D95862
(cherry picked from commit
50c523a9d4402c69d59c0b2ecb383a763d16cde9)
Jeroen Dobbelaere [Mon, 1 Feb 2021 08:23:33 +0000 (09:23 +0100)]
[LoopPeel] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed.
The reduction of a sanitizer build failure when enabling the dominance check (D95335) showed that loop peeling also needs to take care of scope duplication, just like loop unrolling (D92887).
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D95544
(cherry picked from commit
80cdd30eb90c3509bf315f1fa1369483e2448bbd)
Fangrui Song [Thu, 28 Jan 2021 04:34:35 +0000 (20:34 -0800)]
IntrinsicEmitter: Change IntrinsicsToAttributesMap from uint8_t[] to uint16_t[]
We need at least 252 UniqAttributes now, which will soon overflow.
Actually with downstream backends we can easily use up the last few values.
So bump to uint16_t.
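The headroom problem is plain integer range arithmetic. `fitsIn` is a hypothetical helper for this sketch, not part of the emitter.

```cpp
#include <cstdint>
#include <limits>

// True if Id can be stored in IndexT without truncation.
template <typename IndexT>
constexpr bool fitsIn(unsigned long long Id) {
  return Id <= std::numeric_limits<IndexT>::max();
}
```

252 still fits in a `uint8_t` (maximum 255), but only three values remain; 256 would wrap to 0 when narrowed, whereas `uint16_t` leaves room up to 65535.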
(cherry picked from commit
b7d63244226ba2c0df651622fe7fe3f5f8aba262)
Hsiangkai Wang [Mon, 1 Feb 2021 08:08:46 +0000 (16:08 +0800)]
[RISCV] Add new vector instructions in v0.10.
* Add new vector instructions in v0.10.
- load/store for mask value vle1.v vse1.v
- vsetivli for 0-31 immediate vector length.
* Rename vector instructions in v0.10.
- vfrsqrte7 -> vfrsqrt7
- vfrece7 -> vfrec7
* Reserve memory width encodings for EEW>128b.
Differential Revision: https://reviews.llvm.org/D95781
(cherry picked from commit
c7189ba78578d029e0162720319de3c1c6fc348b)
Craig Topper [Tue, 2 Feb 2021 07:53:54 +0000 (23:53 -0800)]
[RISCV] Replace NoX0 SDNodeXForm with a ComplexPattern to do the selection of the VL operand.
I think this is a more standard way of doing this.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D95833
(cherry picked from commit
e7f9a834996f40be8dc46a0b059aa850f1f4ef05)
Richard Smith [Thu, 4 Feb 2021 21:55:28 +0000 (13:55 -0800)]
Don't infer attributes on '::operator new'.
These attributes were all incorrect or inappropriate for LLVM to infer:
- inaccessiblememonly is generally wrong; user replacement operator new
can access memory that's visible to the caller, as can a new_handler
function.
- willreturn is generally wrong; a custom new_handler is not guaranteed
to terminate.
- noalias is inappropriate: Clang has a flag to determine whether this
attribute should be present and adds it itself when appropriate.
- noundef and nonnull on the return value should be specified by the
frontend on all 'operator new' functions if we want them, not here.
In any case, inferring attributes on functions declared 'nobuiltin' (as
these are when Clang emits them) seems questionable.
(cherry picked from commit
ab243efb261ba7e27f4b14e1a6fbbff15a79c0bf)
Richard Smith [Thu, 4 Feb 2021 21:38:38 +0000 (13:38 -0800)]
Revert "[BuildLibcalls, Attrs] Support more variants of C++'s new, add attributes for C++'s delete"
Several of the new attributes here were incorrect, and even the ones
that are generally correct were being added even to nobuiltin calls.
This reverts commit
bb3f169b59e1c8bd7fd70097532220bbd11e9967.
(cherry picked from commit
1484ad4137b5d627573672bad48b03785f8fdefd)
Ayke van Laethem [Tue, 2 Feb 2021 19:58:31 +0000 (20:58 +0100)]
[ARM] Do not emit ldrexd/strexd on Cortex-M chips
The ldrexd/strexd instructions are not supported on M-class chips, see
for example
https://developer.arm.com/documentation/dui0489/e/arm-and-thumb-instructions/memory-access-instructions/ldrex-and-strex
which says:
> All these 32-bit Thumb instructions are available in ARMv6T2 and
> above, except that LDREXD and STREXD are not available in the ARMv7-M
> architecture.
Looking at the ARMv8-M architecture, it appears that these instructions
aren't supported either. The Architecture Reference Manual lists
ldrex/strex but not ldrexd/strexd:
https://developer.arm.com/documentation/ddi0553/bn/
Godbolt example on LLVM 11.0.0, which incorrectly emits ldrexd/strexd
instructions: https://llvm.godbolt.org/z/5qqPnE
Differential Revision: https://reviews.llvm.org/D95891
(cherry picked from commit
aecdf15cc7f866180dc769265b8183cad34bb33a)
Joachim Meyer [Thu, 17 Dec 2020 22:58:13 +0000 (23:58 +0100)]
[Support] Indent multi-line descr of enum cli options.
As noted in https://reviews.llvm.org/D93459, the formatting of
multi-line descriptions of clEnumValN and the likes is unfavorable.
Thus this patch adds support for correctly indenting these.
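The formatting idea can be sketched as a string transformation. `indentContinuationLines` is a hypothetical helper for illustration and is not the function added to the Support library.

```cpp
#include <cstddef>
#include <string>

// Pads every line after the first with Indent spaces so that continuation
// lines line up under the start of the description column.
std::string indentContinuationLines(const std::string &Description,
                                    std::size_t Indent) {
  std::string Out;
  for (char C : Description) {
    Out += C;
    if (C == '\n')
      Out.append(Indent, ' ');
  }
  return Out;
}
```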
Reviewed By: serge-sans-paille
Differential Revision: https://reviews.llvm.org/D93494
(cherry picked from commit
e3f02302e318837d2421c6425450f04ae0a82b90)
Fraser Cormack [Tue, 2 Feb 2021 14:40:52 +0000 (14:40 +0000)]
[RISCV] Fix incorrect RVV sdiv/udiv lowering
Due to a clerical error, the sdiv operation was mapping to vdivu and
udiv to vdiv, when the opposite mapping is the correct one.
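The two operations give different results for the same bit pattern, which is why the swap matters. A small scalar illustration, with hypothetical helper names:

```cpp
#include <cstdint>

constexpr int32_t signedDiv(int32_t A, int32_t B) { return A / B; }
constexpr uint32_t unsignedDiv(uint32_t A, uint32_t B) { return A / B; }

// Reinterprets a signed value as its 32-bit two's complement pattern.
constexpr uint32_t bitsOf(int32_t V) { return static_cast<uint32_t>(V); }
```

Dividing -8 by 2 as signed values gives -4 (bit pattern `0xFFFFFFFC`), while dividing the same bits as unsigned values gives `0x7FFFFFFC`.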
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95869
(cherry picked from commit
b4106f9c7b8c498d109301ced7bf9aca32027168)
Nemanja Ivanovic [Tue, 9 Feb 2021 12:33:48 +0000 (06:33 -0600)]
[DAGCombine] Do not remove masking argument to FP16_TO_FP for some targets
As of commit
284f2bffc9bc5, the DAG Combiner gets rid of the masking of the
input to this node if the mask only keeps the bottom 16 bits. This is because
the underlying library function does not use the high order bits. However, on
PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits
from the register. Therefore, the library implementation of __gnu_h2f_ieee will
return an incorrect result if the bits aren't cleared.
This combine is desired for ARM (and possibly other targets) so this patch adds
a query to Target Lowering to check if this zeroing needs to be kept.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092
Differential revision: https://reviews.llvm.org/D96283
(cherry picked from commit
a5222aa0858a42660629c410a5b669dee16a4359)
Richard Smith [Tue, 9 Feb 2021 01:32:52 +0000 (17:32 -0800)]
PR48587: is_constant_evaluated() should not evaluate to true during a
variable's destruction if it didn't do so during construction.
The standard doesn't give any guidance as to what to do here, but this
approach seems reasonable and conservative, and has been proposed to the
standard committee.
(cherry picked from commit
c945dc4a5023d6a17d11fcda76509b94b36e34fc)
Jessica Paquette [Tue, 2 Feb 2021 22:21:33 +0000 (14:21 -0800)]
[GlobalISel] Check if branches use the same MBB in matchOptBrCondByInvertingCond
If the G_BR + G_BRCOND in this combine use the same MBB, then it will infinite
loop. Don't allow that to happen.
Differential Revision: https://reviews.llvm.org/D95895
(cherry picked from commit
02d4b365bf4f8c2cb56e5612902f6c3bb4316493)
Nathan James [Fri, 12 Feb 2021 16:55:44 +0000 (16:55 +0000)]
[clangd] Fix clang tidy provider when multiple config files exist in directory tree
Currently the clang-tidy provider searches from the root directory up to the target directory, which is the opposite of how clang-tidy searches for config files.
As a result, .clang-tidy files are ignored in any subdirectory of a directory containing a .clang-tidy file.
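The intended search order can be sketched as a walk from the target directory towards the root, stopping at the first directory that has a config. This is an illustration under the assumption that the nearest file takes precedence; `nearestConfigDir` is a hypothetical helper working on absolute, '/'-separated paths, not the clangd provider.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Returns the closest ancestor of Dir (or Dir itself) that contains a
// config file, or "" if none does.
std::string nearestConfigDir(std::string Dir,
                             const std::set<std::string> &DirsWithConfig) {
  while (true) {
    if (DirsWithConfig.count(Dir))
      return Dir;
    if (Dir.empty() || Dir == "/")
      return "";
    std::size_t Slash = Dir.find_last_of('/');
    if (Slash == std::string::npos)
      return "";
    // Step to the parent directory; the parent of "/x" is "/".
    Dir = (Slash == 0) ? "/" : Dir.substr(0, Slash);
  }
}
```

Searching in the opposite direction would stop at the outermost config first, which matches the symptom described above.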
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D96204
(cherry picked from commit
ba3ea9c60f0f259f0ccc47e47daf8253a5885531)
Simon Pilgrim [Wed, 27 Jan 2021 10:14:54 +0000 (10:14 +0000)]
Fix "not all control paths return a value" warning. NFCI.
Richard Smith [Tue, 9 Feb 2021 01:58:05 +0000 (17:58 -0800)]
PR48606: The lifetime of a constexpr heap allocation always started
during the same evaluation.
It looks like the only case for which this matters is determining
whether mutable subobjects of a heap allocation can be modified during
constant evaluation.
(cherry picked from commit
21e8bb83253e1a2f4b6fad9b53cafe8c530a38e2)
Zequan Wu [Fri, 5 Feb 2021 01:00:09 +0000 (17:00 -0800)]
[AST] Update LVal before evaluating lambda decl fields.
Differential Revision: https://reviews.llvm.org/D96092
(cherry picked from commit
96fb49c3ff8e08680127ddd4ec45a0e6c199243b)
Walter Erquinigo [Thu, 4 Feb 2021 18:07:07 +0000 (10:07 -0800)]
[lldb-vscode] correctly use Windows macros
@mstorsjo found a mistake that I made when trying to fix some Windows
compilation errors encountered by @stella.stamenova.
I was incorrectly using the LLVM_ON_UNIX macro. In any case, proper use
of
#if defined(_WIN32)
should be the actual fix.
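A minimal sketch of the guard described above, assuming nothing about the actual lldb-vscode sources (`targetPlatform` is a hypothetical function used only for illustration). `_WIN32` is predefined by compilers targeting Windows, so the check depends on the target platform and not on a build-system macro.

```cpp
#include <cassert>
#include <string>

// Returns a label for the platform the code is being compiled for.
// _WIN32 is predefined by compilers that target Windows (32- and 64-bit).
std::string targetPlatform() {
#if defined(_WIN32)
  return "windows";
#else
  return "non-windows";
#endif
}
```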
Differential Revision: https://reviews.llvm.org/D96060
(cherry picked from commit
36496cc2992d6fa26e6024971efcfc7d15f69888)
Walter Erquinigo [Thu, 28 Jan 2021 17:24:30 +0000 (09:24 -0800)]
Fix lldb-vscode builds on Windows targeting POSIX
@stella.stamenova found out that lldb-vscode's Win32 macros were failing
when building on Windows targeting POSIX platforms.
I'm changing these macros to LLVM_ON_UNIX, which should be more
accurate.
(cherry picked from commit
0bca9a7ce2eeaa9f1d732ffbc17769560a2b236e)
Walter Erquinigo [Wed, 27 Jan 2021 21:02:45 +0000 (13:02 -0800)]
Fix runInTerminal failures on Windows
@stella.stamenova mentioned in https://reviews.llvm.org/D93951 failures on Windows for this test.
I'm fixing the macro definitions and disabling the tests for python
versions lower than 3.7. I'll figure out the actual issue with
python3.6 after the buildbots are fine again.
(cherry picked from commit
ab5591e1d8f5abcfa9e75193d3e8a29087b61425)
Louis Dionne [Wed, 3 Feb 2021 22:00:20 +0000 (17:00 -0500)]
[🍒][libc++] Fix libcxx build on 32bit architectures with 64bit time_t defaults e.g. riscv32
Patch by Khem Raj.
(cherry pick of commit
85b9c5ccc172a1e61c7ecaaec4752587cb6f1e26)
Differential Revision: https://reviews.llvm.org/D96062
Reid Kleckner [Mon, 1 Feb 2021 23:18:42 +0000 (15:18 -0800)]
[🍒]Disable CFI in __get_elem to allow casting a pointer to uninitialized memory
Fixes usage of shared_ptr with CFI enabled, which is llvm.org/pr48993.
(cherry pick of commit
bab74864168bb5e28ecbc0294fe1095d8da7f569)
Differential Revision: https://reviews.llvm.org/D96063
Louis Dionne [Tue, 2 Feb 2021 21:58:38 +0000 (16:58 -0500)]
[🍒][libc++] Rename include/support to include/__support
We do ship those headers, so the directory name should not be something
that can potentially conflict with user-defined directories.
This is a cherry-pick of
b51756819a85563ae063e98eeb3d6af8e44c8f64.
Differential Revision: https://reviews.llvm.org/D96059
Shilei Tian [Fri, 5 Feb 2021 01:14:14 +0000 (20:14 -0500)]
[OpenMP][libomptarget] Fixed an issue that device sync is skipped if the kernel doesn't have any argument
Currently if there is no kernel argument, device synchronization will
be skipped. This can lead to two issues:
1. If there is any device error, it will not be captured;
2. The target region might end before the kernel is done, which is not spec
conformant.
The test added in this patch only runs on NVPTX platform, although it will not
be executed by Phab at all. It also requires `not` which is not available on most
systems.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D96067
(cherry picked from commit
b68a6b09e60a24733b923a0fc282746a855852da)
Nikita Popov [Sun, 31 Jan 2021 16:55:24 +0000 (17:55 +0100)]
[MemorySSA] Don't treat lifetime.end as NoAlias
MemorySSA currently treats lifetime.end intrinsics as not aliasing
anything. This breaks MemorySSA-based MemCpyOpt, because we'll happily
move a read of a pointer below a lifetime.end intrinsic, as no clobber
is reported.
I think the MemorySSA modelling here isn't correct: lifetime.end(p)
has approximately the same effect as doing a memcpy(p, undef), and
should be treated as a clobber.
This patch removes the special handling of lifetime.end, leaving
alias analysis to handle it appropriately.
Differential Revision: https://reviews.llvm.org/D95763
Nikita Popov [Fri, 29 Jan 2021 11:56:23 +0000 (12:56 +0100)]
[MemCpyOpt] Add test for incorrect optimization across lifetime (NFC)
This only affects the MemorySSA-based implementation.
Tom Stellard [Fri, 5 Feb 2021 01:40:33 +0000 (01:40 +0000)]
workflows: Update libclang-abi-tests to work with minor release baselines
Hans Wennborg [Thu, 4 Feb 2021 12:26:59 +0000 (13:26 +0100)]
Add a release note about deprecating the clang-cl /fallback flag
As discussed in
https://lists.llvm.org/pipermail/cfe-dev/2021-January/067524.html
The flag has been removed on the main branch in D95876.
Differential revision: https://reviews.llvm.org/D96016
Shilei Tian [Thu, 28 Jan 2021 12:24:19 +0000 (07:24 -0500)]
[OpenMP] Disabled profiling in `libomp` by default to unblock link errors
Link error occurred when time profiling in libomp is enabled by default
because `libomp` is assumed to be a C library but the dependence on
`libLLVMSupport` for profiling is a C++ library. Currently the issue blocks all
OpenMP tests in Phabricator.
This patch adds a new CMake option `OPENMP_ENABLE_LIBOMP_PROFILING` to
enable/disable the feature. By default it is disabled. Note that once time
profiling is enabled for `libomp`, it becomes a C++ library.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95585
(cherry picked from commit
c571b168349fdf22d1dc8b920bcffa3d5161f0a2)
Giorgis Georgakoudis [Mon, 25 Jan 2021 22:10:50 +0000 (14:10 -0800)]
[OpenMP] Fix building using LLVM_ENABLE_RUNTIMES
Fix when time profiling is enabled.
Related to: D94855
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D95398
(cherry picked from commit
bb40e6731843de92f1c73ad6efceb8a89e045ea6)
Peter Waller [Tue, 26 Jan 2021 11:55:24 +0000 (11:55 +0000)]
[clang][aarch64][WOA64][docs] Release note for longjmp crash with /guard:cf
Add a release note workaround for PR47463.
Bug: https://bugs.llvm.org/show_bug.cgi?id=47463
Differential Revision: https://reviews.llvm.org/D95435
Shilei Tian [Thu, 4 Feb 2021 13:44:20 +0000 (08:44 -0500)]
Revert "[OpenMP] Disabled profiling in `libomp` by default to unblock link errors"
This reverts commit
f5602e0bf31ab590da19fa357980a753dbfd666e.
Craig Topper [Mon, 1 Feb 2021 18:56:09 +0000 (10:56 -0800)]
[X86] Accept 64-bit GPRs for vextractps when using a register that requires EVEX.
This is consistent with the VEX version. It also fixes a sorting
issue in the matching table that caused the EVEX version to be
prioritized over VEX in intel syntax.
Fixes issue [2] from PR48991.
(cherry picked from commit
c691fe14da93a7c9eff466231515d6d4d16124fa)
Shilei Tian [Thu, 4 Feb 2021 01:57:59 +0000 (20:57 -0500)]
[OpenMP][NVPTX] Take functions in `deviceRTLs` as `convergent`
OpenMP device compiler (similar to other SPMD compilers) assumes that
functions are convergent by default to avoid invalid transformations, such as
the bug (https://bugs.llvm.org/show_bug.cgi?id=49021).
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95971
(cherry picked from commit
0f0ce3c12edefd25448e39c4d20718a10d3d42c1)
Hongtao Yu [Fri, 11 Dec 2020 20:18:31 +0000 (12:18 -0800)]
[CSSPGO] Introducing distribution factor for pseudo probe.
Sample re-annotation is required at LTO time to achieve a reasonable post-inline profile quality. However, we have seen that such LTO-time re-annotation degrades profile quality. This is mainly caused by preLTO code duplication done by passes such as loop unrolling, jump threading, indirect call promotion, etc., where samples corresponding to a source location are aggregated multiple times due to the duplicates. In this change we are introducing the concept of a distribution factor for pseudo probes so that samples can be distributed across duplicated probes, scaled by a factor. We hope that optimizations duplicating code maintain the branch frequency information (BFI) well, since probe distribution factors are calculated from it. Distribution factors are updated at the end of the preLTO pipeline to reflect an estimated portion of the real execution count.
This change also introduces a pseudo probe verifier that can be run after each IR pass to detect duplicated pseudo probes.
A saturated distribution factor stands for 1.0. A pseudo probe will carry a factor with a value ranging from 0.0 to 1.0. A 64-bit integral distribution factor field that represents [0.0, 1.0] is associated with each block probe. Unfortunately this cannot be done for callsite probes due to the size limitation of a 32-bit Dwarf discriminator. A 7-bit distribution factor is used instead.
Changes are also needed to the sample profile inliner to deal with prorated callsite counts. Call sites duplicated by PreLTO passes, when later inlined at LTO time, should have the callee's probes prorated based on the Prelink-computed distribution factors. The distribution factors should also be taken into account when computing hotness for inline candidates. Also, indirect call promotion results in multiple callsites. The original samples should be distributed across them. This is fixed by adjusting the callsites' distribution factors.
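The prorating idea can be illustrated with a small fixed-point sketch. It is not the code from this change: `kFullFactor` and `prorateSamples` are hypothetical names, and the choice of 100 as the saturated value is an assumption made only because it fits in 7 bits. The encoding LLVM actually uses may differ.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for illustration: a saturated factor (1.0) is encoded as 100,
// which fits in 7 bits. The value actually used by LLVM may differ.
constexpr std::uint64_t kFullFactor = 100;

// Prorate the samples attributed to one source location for a single
// duplicated probe. Factor is the integer encoding of a fraction in
// [0.0, 1.0]. The multiplication can overflow for very large sample
// counts; a real implementation would need to guard against that.
std::uint64_t prorateSamples(std::uint64_t Samples, std::uint64_t Factor) {
  assert(Factor <= kFullFactor && "factor must not exceed the saturated value");
  return Samples * Factor / kFullFactor;
}
```

If a probe with 1000 samples is duplicated into two copies carrying factors 25 and 75, the copies receive 250 and 750 samples, so the total is counted once and not twice.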
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D93264
(cherry picked from commit
3d89b3cbec230633e8228787819b15116c1a1730)
Wenlei He [Wed, 20 Jan 2021 07:29:14 +0000 (23:29 -0800)]
[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline
Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for the AFDO path.
This is a resubmit of D95024, with the build break and an overly tight assertion fixed.
Test Plan:
(cherry picked from commit
1645f465be85223e9f5b6303a3e5e0e491fd819f)
Wenlei He [Mon, 4 Jan 2021 00:43:06 +0000 (16:43 -0800)]
[CSSPGO] Call site prioritized inlining for sample PGO
This change implements call site prioritized BFS profile guided inlining for the sample profile loader. The new inlining strategy maximizes the benefit of context-sensitive profiles, as mentioned in the follow-up discussion of the CSSPGO RFC. The change will not affect today's AutoFDO as it's opt-in. CSSPGO now defaults to the new FDO inliner, but can fall back to today's replay inliner using a switch (`-sample-profile-prioritized-inline=0`).
Motivation
With baseline AutoFDO, the inliner in sample profile loader only replays previous inlining, and the use of profile is only for pruning previous inlining that turned out to be cold. Due to the nature of replay, the FDO inliner is simple with hotness being the only decision factor. It has the following limitations that we're improving now for CSSPGO.
- It doesn't take inline candidate size into account. Since it's doing replay, the size growth is bounded by previous CGSCC inlining. With context-sensitive profile, FDO inliner is no longer limited by previous inlining, so we need to take size into account to avoid significant size bloat.
- The way it looks at hotness is not accurate. It uses total samples in an inlinee as proxy for hotness, while what really matters for an inline decision is the call site count. This is an unfortunate fall back because call site count and callee entry count are not reliable due to dwarf based correlation, especially for inlinees. Now paired with pseudo-probe, we have accurate call site count and callee's entry count, so we can use that to gauge hotness more accurately.
- It treats all call sites from a block as hot as long as there's one call site considered hot. This is normally true, but since total samples is used as hotness proxy, this transitiveness within a block magnifies the inaccurate hotness heuristic. With pseudo-probe and the change above, this is no longer an issue for CSSPGO.
New FDO Inliner
Putting all the requirements for CSSPGO together, we need a top-down call site prioritized BFS inliner. Here are the reasons why each component is needed.
- Top-down: We need a top-down inliner to better leverage context-sensitive profile, so inlining is driven by accurate context profile, and post-inline is also accurate. This is already implemented in https://reviews.llvm.org/D70655.
- Size Cap: For top-down inliner, taking function size into account for inline decision alone isn't sufficient to control size growth. We also need to explicitly cap size growth because with top-down inlining, we can grow inliner size significantly with large number of smaller inlinees even if each individually passes the cost/size check.
- Prioritize call sites: With size cap, inlining order also becomes important, because if we stop inlining due to size budget limit, we'd want to use budget towards the most beneficial call sites.
- BFS inline: Same as call site prioritization, if we stop inlining due to size budget limit, we want a balanced inline tree, rather than going deep on one call path.
Note that the new inliner avoids repeatedly evaluating the same set of call sites, so it should help with compile time too. For this reason, we could transition today's FDO inliner to use a queue with equal priority to avoid wasted reevaluation of the same call site (TODO).
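The interplay of prioritization and the size cap can be sketched with a priority queue. This is a simplified illustration and not the SampleProfileLoader implementation: `Candidate`, `ColderThan` and `inlineWithBudget` are hypothetical names, hotness is reduced to a single call site count, and the cost model is a plain size number.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <vector>

// Hypothetical candidate record; the fields are illustrative only.
struct Candidate {
  std::string Name;
  std::uint64_t CallSiteCount;    // hotness of this call site
  unsigned Size;                  // estimated size cost of inlining it
  std::vector<Candidate> Callees; // call sites exposed once it is inlined
};

// Orders the queue so that the hottest call site is popped first.
struct ColderThan {
  bool operator()(const Candidate *A, const Candidate *B) const {
    return A->CallSiteCount < B->CallSiteCount;
  }
};

// Inline the hottest candidates first until the size budget is used up.
// Each call site is popped, and therefore evaluated, exactly once.
std::vector<std::string> inlineWithBudget(const std::vector<Candidate> &Roots,
                                          unsigned SizeBudget) {
  std::priority_queue<const Candidate *, std::vector<const Candidate *>,
                      ColderThan>
      Queue;
  for (const Candidate &C : Roots)
    Queue.push(&C);

  std::vector<std::string> Inlined;
  unsigned Used = 0;
  while (!Queue.empty()) {
    const Candidate *C = Queue.top();
    Queue.pop();
    if (Used + C->Size > SizeBudget)
      continue; // does not fit in the remaining budget
    Used += C->Size;
    Inlined.push_back(C->Name);
    // Newly exposed call sites compete with everything already queued.
    for (const Candidate &Callee : C->Callees)
      Queue.push(&Callee);
  }
  return Inlined;
}
```

Because newly exposed callees go into the same queue as their siblings, a hot call site at a shallow level is not starved by a deep chain of colder ones, which is the balance the size cap is meant to preserve.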
Speculative indirect call promotion and inlining is also supported now with CSSPGO just like baseline AutoFDO.
Tunings and knobs
I created tuning knobs for size growth/cap control, and for hot threshold separate from CGSCC inliner. The default values are selected based on initial tuning with CSSPGO.
Results
Evaluated with an internal LLVM fork a couple of months ago, plus another change to adjust the hot-threshold cutoff for context profile (will send up after this one), the new inliner shows a ~1% geomean perf win on spec2006 with CSSPGO, while reducing code size too. The measurement was done using train-train setup, MonoLTO w/ new pass manager and pseudo-probe. Note that this is just a starting point - we hope that the new inliner will open up more opportunity with CSSPGO, but it will certainly take more time and effort to make it fully calibrated and ready for bigger workloads (we're working on it).
Differential Revision: https://reviews.llvm.org/D94001
(cherry picked from commit
6bae5973c476e16dbbc82030d65c7859a6628e89)
Hongtao Yu [Fri, 22 Jan 2021 23:52:46 +0000 (15:52 -0800)]
[CSSPGO] Passing the clang driver switch -fpseudo-probe-for-profiling to the linker.
As titled.
Reviewed By: wmi, wenlei
Differential Revision: https://reviews.llvm.org/D95271
(cherry picked from commit
d3e2e3740d0730cb6788c771bb01a8f3e935bf2e)
Hongtao Yu [Mon, 1 Feb 2021 06:31:51 +0000 (22:31 -0800)]
[CSSPGO] Tweaking inlining with pseudo probes.
Fixing up a couple places where `getCallSiteIdentifier` is needed to support pseudo-probe-based callsites.
Also fixing an issue in the extbinary profile reader where the metadata section is not fully scanned based on the number of profiles loaded only for the current module.
Reviewed By: wmi, wenlei
Differential Revision: https://reviews.llvm.org/D95791
(cherry picked from commit
224fee8219bb3aed34f13ce40935e1b3ede90a0f)
Hongtao Yu [Thu, 28 Jan 2021 00:04:11 +0000 (16:04 -0800)]
[CSSPGO] Support of CS profiles in extended binary format.
This change brings up support of context-sensitive profiles in the extended binary format. Existing sample profile reader/writer/merger code is being tweaked to handle bracketed input contexts, like `[...]`. The paired brackets are also needed in extbinary profiles because we don't yet have an otherwise good way to tell calling contexts apart from regular function names, since the context delimiter `@` can also appear in C++ mangled names.
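A minimal sketch of why the brackets help, under the assumption that a context is written as a single string wrapped in `[` and `]`. The helpers `isBracketedContext` and `stripBrackets` are hypothetical and are not the names used in the profile reader; the sample strings are made up for illustration.

```cpp
#include <cassert>
#include <string>

// Returns true if Name is a bracketed calling context such as "[main @ foo]"
// rather than a plain function name. The check looks only at the first and
// last characters, so an '@' inside a mangled name cannot confuse it.
bool isBracketedContext(const std::string &Name) {
  return Name.size() >= 2 && Name.front() == '[' && Name.back() == ']';
}

// Strips the surrounding brackets from a context; plain names pass through.
std::string stripBrackets(const std::string &Name) {
  return isBracketedContext(Name) ? Name.substr(1, Name.size() - 2) : Name;
}
```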
Reviewed By: wmi, wenlei
Differential Revision: https://reviews.llvm.org/D95547
(cherry picked from commit
7e99bddfeaab2713a8bb6ca538da25b66e6efc59)
Shilei Tian [Thu, 28 Jan 2021 12:24:19 +0000 (07:24 -0500)]
[OpenMP] Disabled profiling in `libomp` by default to unblock link errors
Link error occurred when time profiling in libomp is enabled by default
because `libomp` is assumed to be a C library but the dependence on
`libLLVMSupport` for profiling is a C++ library. Currently the issue blocks all
OpenMP tests in Phabricator.
This patch adds a new CMake option `OPENMP_ENABLE_LIBOMP_PROFILING` to
enable/disable the feature. By default it is disabled. Note that once time
profiling is enabled for `libomp`, it becomes a C++ library.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95585
(cherry picked from commit
c571b168349fdf22d1dc8b920bcffa3d5161f0a2)
Stephen Kelly [Wed, 3 Feb 2021 23:04:12 +0000 (23:04 +0000)]
Extend release notes for AST Matchers changes
Richard Smith [Wed, 3 Feb 2021 22:57:19 +0000 (14:57 -0800)]
PR44325 (and duplicates): don't issue -Wzero-as-null-pointer-constant
when rewriting 'a < b' as '(a <=> b) < 0'.
It's pretty common for comparison category types to use a pointer or
pointer-to-member type as their '0' parameter.
(cherry picked from commit
1f06f41993b6363e6b2c4f22a13488a3e687f31b)
Joseph Huber [Mon, 1 Feb 2021 15:31:09 +0000 (10:31 -0500)]
[OpenMP] Fix seg fault in libomptarget when using Info with multiple threads
Summary:
One option for the LIBOMPTARGET_INFO environment variable is to print the current status of the device's data mappings. These are a shared resource among threads so this needs to be protected when using multiple streams.
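The kind of protection described here can be sketched as follows. It is an illustration only: `MappingTable` is a hypothetical stand-in for the device's data mapping table, not a libomptarget type. Both the writer and the dump take the same mutex, so a thread printing the table never observes it while another thread is modifying it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical stand-in for a device's host-to-device mapping table.
class MappingTable {
  mutable std::mutex Mtx;
  std::map<std::string, std::size_t> Entries; // name -> mapped size in bytes

public:
  // Record a mapping while holding the lock.
  void add(const std::string &Name, std::size_t Bytes) {
    std::lock_guard<std::mutex> Lock(Mtx);
    Entries[Name] = Bytes;
  }

  // Render the current mappings while holding the same lock, so the
  // iteration cannot race with a concurrent add().
  std::string dump() const {
    std::lock_guard<std::mutex> Lock(Mtx);
    std::ostringstream OS;
    for (const auto &E : Entries)
      OS << E.first << ":" << E.second << "\n";
    return OS.str();
  }
};
```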
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95786
(cherry picked from commit
fda48539988d2a1bdb6395799151e9090312a20b)
Shilei Tian [Fri, 29 Jan 2021 18:12:47 +0000 (13:12 -0500)]
[OpenMP][NFC] Added release note for new `deviceRTLs` and hidden helper task
Added release note for new `deviceRTLs` and hidden helper task for LLVM
12.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95584
(cherry picked from commit
7bc31018f71cac22b7060c49cefb6f3d0d2e2069)
Shilei Tian [Thu, 28 Jan 2021 13:12:39 +0000 (08:12 -0500)]
[OpenMP][deviceRTLs] Added `[[clang::loader_uninitialized]]` explicitly
`[[clang::loader_uninitialized]]` is in macro `SHARED` but it doesn't
work for array like `parallelLevel`, so the variable will be zero initialized.
There is also a similar issue for `omptarget_nvptx_device_State` which is in
global address space. Its c'tor is also generated, which was not in the past when
building the `deviceRTLs` with CUDA. In this patch, we added the attribute to
the two variables explicitly.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95550
(cherry picked from commit
19248d30e4ed5250fa84abbbd52fc7b835918a45)
Shilei Tian [Thu, 28 Jan 2021 13:13:28 +0000 (08:13 -0500)]
[OpenMP][NVPTX] Added the missing -O1 when building NVPTX bitcode libraries
In the past `-O1` was used when building NVPTX bitcode libraries. After
we switched to OpenMP, `-O1` was missing by mistake, leading to a huge performance
regression.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D95545
(cherry picked from commit
5a64794bbad4010778406dfee7748e6080258dbf)
Atmn Patel [Wed, 27 Jan 2021 23:49:41 +0000 (18:49 -0500)]
[OpenMP][Libomptarget] Fix conditional in CMake for remote plugin
The remote offloading plugin's CMakeLists was trying to build if its
flag was enabled even if it didn't find gRPC/protobuf. The conditional
was wrong; this patch fixes it.
Differential Revision: https://reviews.llvm.org/D95574
(cherry picked from commit
8a77056256d9970387595a5c729d894e3fe07131)
Haowei Wu [Thu, 28 Jan 2021 22:13:20 +0000 (14:13 -0800)]
[elfabi] Fix tests which failed on different timezones
This patch fixes elfabi tests on machines using a GMT+X timezone
setting.
Differential Revision: https://reviews.llvm.org/D95641
(cherry picked from commit
771b35965457ebd5faaed8a1c3d2bcefffe721a3)
Andrew Ng [Wed, 27 Jan 2021 16:47:21 +0000 (16:47 +0000)]
[X86] Fix disassembly of x86-64 GDTLS code sequence
For x86-64 the REX.w prefix takes precedence over any other size
override (i.e. 0x66). Therefore, for x86-64 when REX.w is present set
'hasOpSize' to false to ensure that any size override is ignored.
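The precedence rule can be sketched as a small decision function. This is not the disassembler code: `operandSizeBits` is a hypothetical helper, a `Rex` value of 0 is used here to mean "no REX prefix", and the sketch assumes an instruction whose default operand size in 64-bit mode is 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Effective operand size in bits for an x86-64 instruction whose default
// operand size is 32 bits. HasOpSizePrefix is true when a 0x66 prefix is
// present; Rex is the REX prefix byte, or 0 if there is none.
unsigned operandSizeBits(bool HasOpSizePrefix, std::uint8_t Rex) {
  bool IsRex = (Rex & 0xF0) == 0x40;      // REX prefixes are 0x40..0x4F
  bool RexW = IsRex && (Rex & 0x08) != 0; // W is bit 3 of the REX byte
  if (RexW)
    return 64; // REX.W wins, so any 0x66 override is ignored
  return HasOpSizePrefix ? 16 : 32;
}
```

For the GDTLS sequence, which carries both a 0x66 prefix and REX.W, the function returns 64, mirroring the effect of clearing 'hasOpSize' when REX.W is present.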
Fixes PR48901.
Differential Revision: https://reviews.llvm.org/D95682
(cherry picked from commit
94fedd266125a5425aa33e11332bf414f0b6dc35)
Cullen Rhodes [Sat, 16 Jan 2021 16:08:40 +0000 (16:08 +0000)]
[LV] Fix crash when computing max VF too early
D90687 introduced a crash:
llvm::LoopVectorizationCostModel::computeMaxVF(llvm::ElementCount, unsigned int):
Assertion `WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() &&
"No decisions should have been taken at this point"' failed.
when compiling the following C code:
typedef struct {
  char a;
} b;
b *c;
int d, e;
int f() {
  int g = 0;
  for (; d; d++) {
    e = 0;
    for (; e < c[d].a; e++)
      g++;
  }
  return g;
}
with:
clang -Os -target hexagon -mhvx -fvectorize -mv67 testcase.c -S -o -
This occurred since prior to D90687 computeFeasibleMaxVF would only be
called in computeMaxVF when a scalar epilogue was allowed, but now it's
always called. This causes the assert above since computeFeasibleMaxVF
collects all viable VFs larger than the default MaxVF, and for each VF
calculates the register usage, which results in the analysis that the
assert above guards against. This can occur in computeFeasibleMaxVF if
TTI.shouldMaximizeVectorBandwidth returns true, and this target hook is
implemented in the hexagon backend to always return true.
Reported by @iajbar.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D94869
(cherry picked from commit
8cda227432f1c9ceb63b88802ed8136da97274f1)
Hsiangkai Wang [Fri, 29 Jan 2021 23:54:41 +0000 (07:54 +0800)]
[RISCV] Update the version number to v0.10 for vector.
(cherry picked from commit
9847023660467a4469b5667bcf7a4c73a4780037)