OSDN Git Service

android-x86/external-swiftshader.git
Jan Voung [Tue, 9 Sep 2014 23:36:42 +0000 (16:36 -0700)]
List Subzero's local optlevel flags after LLVM's cxxflags (precedence).

Ended up needing to fix the InstX8632Lockable error that JF ran into
now that -O0 is really -O0:
https://codereview.chromium.org/512933006/

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/557933002

Jim Stichnoth [Tue, 9 Sep 2014 21:40:40 +0000 (14:40 -0700)]
Subzero: The cross tests should use the actual Subzero runtime.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/560493002

Karl Schimpf [Tue, 9 Sep 2014 18:40:09 +0000 (11:40 -0700)]
Add alloca instruction to Subzero bitcode reader.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3894
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/545623005

Jim Stichnoth [Tue, 9 Sep 2014 18:19:12 +0000 (11:19 -0700)]
Subzero: Add a script that builds a hybrid Subzero/llc native executable.

The script translates a pexe using both Subzero and llc, and then uses linker tricks to resolve each symbol into either its Subzero or llc version.  This enables quick bisection-based debugging of Subzero code generation.

BUG= none
R=jfb@chromium.org, jvoung@chromium.org

Review URL: https://codereview.chromium.org/551953002

Jim Stichnoth [Tue, 9 Sep 2014 00:56:50 +0000 (17:56 -0700)]
Subzero: Make sure alloca with align=0 is handled correctly.

1. Modify dump() to match LLVM.

2. If it weren't for minimum stack alignment, the alignment code would be broken, so add a test in case the alignment code changes.

BUG= none
R=jvoung@chromium.org, kschimpf@google.com

Review URL: https://codereview.chromium.org/557533003

Karl Schimpf [Mon, 8 Sep 2014 20:41:09 +0000 (13:41 -0700)]
Add constants block to PNaCl bitcode reader.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3894
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/548553002

Jim Stichnoth [Mon, 8 Sep 2014 19:57:52 +0000 (12:57 -0700)]
Subzero: Move python scripts into a common pydir.

This makes it much easier to run scripts from different working directories, as long as they are run somewhere under the native_client directory.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/554013002

Jim Stichnoth [Mon, 8 Sep 2014 18:19:21 +0000 (11:19 -0700)]
Subzero: Be more strict about i1 calculations.

One issue is that the test_arith cross test defined functions on i1 but never actually invoked them.

Another issue is that the lowering was using 8-bit registers for i1 values, but was being sloppy about leaving stuff in the upper 7 bits, and then using all 8 bits for tests.

This takes the approach of explicitly masking the result whenever it's possible for the result to exceed one bit, such as trunc, fptosi, fptoui.

Another possibility might be to allow the upper 7 bits to stay sloppy, and explicitly only test the lower bit.
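
The two approaches above can be sketched in plain C++. This is only an illustration, not Subzero's lowering code: the helper names are hypothetical, and a uint8_t stands in for the 8-bit register holding an i1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an i1 value held in an 8-bit register.

// Strict approach: mask to the low bit whenever the producer (for example
// trunc, fptosi, or fptoui) may leave higher bits set.
inline uint8_t truncToI1(uint32_t Value) {
  return static_cast<uint8_t>(Value & 1u);
}

// With strictly masked values, a test may safely look at all 8 bits.
inline bool testI1Strict(uint8_t Reg) {
  assert((Reg & ~1u) == 0 && "strict i1 must have its upper 7 bits clear");
  return Reg != 0;
}

// The alternative: tolerate sloppy upper bits and test only the low bit.
inline bool testI1LowBit(uint8_t Reg) { return (Reg & 1u) != 0; }
```

Masking at the producer keeps every consumer simple, at the cost of an extra "and" on the few instructions that can exceed one bit.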

Additionally, some "CHECK: ret" lines were removed, since they aren't actually needed after the change to use CHECK-LABEL, and they are affected by an llvm-dump bug (which is fixed in LLVM 3.6).

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/547033002

Jim Stichnoth [Mon, 8 Sep 2014 17:47:23 +0000 (10:47 -0700)]
Subzero: Use cvttss2si and similar instead of cvtss2si for fp->int casts.

cvttss2si truncates toward zero, whereas cvtss2si rounds according to the current MXCSR rounding mode.

A few interesting floating point inputs are added to the cross tests.

Also, the cross test error output is modified to be more clear.
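
The difference between the two conversions can be seen from C++ itself. This is a small sketch with hypothetical helper names: a float-to-int cast truncates toward zero (the cvttss2si behavior), while std::lrintf rounds using the current rounding mode (the cvtss2si behavior), which is round-to-nearest-even by default.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Truncates toward zero, as a C++ float-to-int cast (and cvttss2si) does.
inline int32_t castTruncating(float X) {
  assert(X >= -2147483648.0f && X < 2147483648.0f && "must fit in int32_t");
  return static_cast<int32_t>(X);
}

// Rounds using the current rounding mode, as cvtss2si does.
// The default mode is round-to-nearest, ties to even.
inline int32_t castRounding(float X) {
  assert(X >= -2147483648.0f && X < 2147483648.0f && "must fit in int32_t");
  return static_cast<int32_t>(std::lrintf(X));
}
```

For an input such as 2.7f the two helpers disagree (2 versus 3), which is the kind of input the cross tests need in order to catch the wrong instruction.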

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/550723002

Karl Schimpf [Fri, 5 Sep 2014 15:32:47 +0000 (08:32 -0700)]
Add branch instructions to Subzero bitcode reader.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3894
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/545603003

Karl Schimpf [Fri, 5 Sep 2014 15:30:55 +0000 (08:30 -0700)]
Add icmp and fcmp instructions to Subzero bitcode reader.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3894
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/543793003

Jim Stichnoth [Thu, 4 Sep 2014 23:39:02 +0000 (16:39 -0700)]
Subzero: Fix sext/zext lowering with i1 source operands.

Also declare a few variables as Constant* instead of Operand* when they hold the result of Ctx->getConstantInt(), to be consistent with the rest of the code.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/541093002

Karl Schimpf [Thu, 4 Sep 2014 19:22:14 +0000 (12:22 -0700)]
Add select instruction to Subzero bitcode reader.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3894
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/531123002

Jim Stichnoth [Thu, 4 Sep 2014 18:32:20 +0000 (11:32 -0700)]
Subzero: Work around another llvm-mc parser bug for relocatable symbols.

There's already a hack that emits asm like:
  lea eax, myglobal
instead of:
  mov eax, [myglobal]
because of an llvm-mc parser bug.  However, the lea hack still doesn't work if the symbol is a reserved word, e.g.:
  lea eax, flags

The extra hack is to drop into AT&T syntax temporarily:
.att_syntax
  leal flags, %eax
.intel_syntax

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/543803002

Jim Stichnoth [Thu, 4 Sep 2014 17:37:49 +0000 (10:37 -0700)]
Subzero: Make sure register preferences obey register class constraints.

The bug was first spotted in the optimized gl_Color4ub() from spec2k's mesa.  The lowering sequences for fptosi and fptoui with i8 or i16 include "mov T_2, T_1" where T_1 and T_2 may have different integer types, and the statement:
      T_2->setPreferredRegister(T_1, true);

If T_2's type is i8 and T_1 is assigned a register that has no 8-bit version, then T_2 gets an unsuitable register.

The fix is to honor RegisterOverlap only when RegMask allows.
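
A rough sketch of the rule "honor the preference only when RegMask allows", using a bitmask model. Everything here is hypothetical (pickRegister, NoRegister, the bit-per-register encoding) and is not taken from Subzero's register allocator.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: bit N of a mask stands for physical register N.
using RegMaskT = uint32_t;

constexpr int NoRegister = -1;

// Picks a register for a variable whose register class is RegMask.
// The preferred register is used only if RegMask permits it; AllowOverlap
// lets the preference win even when that register is not free.
inline int pickRegister(int Preferred, bool AllowOverlap, RegMaskT RegMask,
                        RegMaskT Free) {
  if (Preferred != NoRegister) {
    assert(Preferred >= 0 && Preferred < 32);
    RegMaskT Bit = RegMaskT(1) << Preferred;
    bool InClass = (RegMask & Bit) != 0;
    bool Available = AllowOverlap || (Free & Bit) != 0;
    if (InClass && Available)
      return Preferred;
  }
  // Fall back to the lowest-numbered free register in the class.
  RegMaskT Allowed = RegMask & Free;
  for (int Reg = 0; Reg < 32; ++Reg)
    if (Allowed & (RegMaskT(1) << Reg))
      return Reg;
  return NoRegister;
}
```

With a class mask of 0x0F (say, the four registers that have 8-bit versions), a preference for register 6 is ignored and a legal register is chosen instead.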

It's hard to construct a good test for this, since it depends heavily on register allocation decisions, which will change over time.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/544713002

Jim Stichnoth [Wed, 3 Sep 2014 22:19:12 +0000 (15:19 -0700)]
Subzero: Render constants in dump() to be more like LLVM.

Integers are generally dumped as signed instead of unsigned values.
Integers of i1 type are dumped as 'false' and 'true'.  Floating point
values still don't match LLVM.

BUG= none
R=kschimpf@google.com

Review URL: https://codereview.chromium.org/539743002

Karl Schimpf [Wed, 3 Sep 2014 16:46:24 +0000 (09:46 -0700)]
Add vector insert/extract instructions to Subzero bitcode reader.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3894
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/529113002

Jim Stichnoth [Tue, 2 Sep 2014 22:13:00 +0000 (15:13 -0700)]
Subzero: Rename -external to -externalize to match llc.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/535623002

Jim Stichnoth [Tue, 2 Sep 2014 21:11:57 +0000 (14:11 -0700)]
Subzero: Remove the need for a separate NaCl SDK installation.

Now it assumes tests are being run from within the Subzero portion of the native_client tree, and sets up PATH to find PNaCl tools relative to there.

It may be necessary to do a one-time setup to be able to build pexes:

    pnacl/build.sh sdk newlib

or the equivalent scons commands.

If the tool chain is updated, propagate the changes via:

    toolchain_build/toolchain_build_pnacl.py llvm_i686_linux --install=toolchain/linux_x86/pnacl_newlib

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/525603002

Karl Schimpf [Tue, 2 Sep 2014 17:47:28 +0000 (10:47 -0700)]
Add cast instructions to subzero's pnacl bitcode translator.

Also clean up other error cases (in function block) to simply return, since they have already generated an error message.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3894
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/514273002

Jan Voung [Fri, 29 Aug 2014 19:59:02 +0000 (12:59 -0700)]
Convert lit tests to check disassembled assembly.

Then when we have an integrated assembler, we can check
its disassembly and the result should be the same.
This only touches the tests that invoke llvm-mc currently.
There are other tests which check for .s file output.

There are quite a few quirks with llvm-objdump,
which is unfortunate:

(*) The symbolizer doesn't pick up non-section-local
function calls. Some externals were converted to be
local functions. Workaround: where it counts, I just
left a check via .s files and a new --check-prefix.
It's a little better in 3.6.

(*) The symbolizer doesn't pick up global variable names.
I just checked for the relocation addend instead.
Didn't check if it was better in 3.6, but maybe.

(*) We have a bug in bundling lock + instructions.
See
BUG=https://code.google.com/p/nativeclient/issues/detail?id=3929

(*) There's no disassembly for branch labels.
Checks of jump instructions were converted to check
for positive or negative values, depending on whether
it is a forward or backward branch.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/509233002

Jan Voung [Thu, 28 Aug 2014 23:00:53 +0000 (16:00 -0700)]
Align function starts to target-specific bundle alignment.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/515993002

Jan Voung [Thu, 28 Aug 2014 17:04:03 +0000 (10:04 -0700)]
Add pnacl-freeze to the tests_lit/lit.cfg. Also, unsigned vs signed.

Otherwise, I don't have pnacl-freeze in my path, and
I think the lit tests have trouble finding it.

src/PNaClTranslator.cpp: In member function ‘uint32_t {anonymous}::FunctionParser::convertRelativeToAbsIndex(int32_t)’:
src/PNaClTranslator.cpp:882:55: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
     if (Id > 0 && AbsNextId < static_cast<uint32_t>(Id)) {
                                                       ^
BUG=none
R=kschimpf@google.com, stichnot@chromium.org

Review URL: https://codereview.chromium.org/515003004

Karl Schimpf [Wed, 27 Aug 2014 22:34:58 +0000 (15:34 -0700)]
Start processing function blocks.

Handle binops and returns.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3894
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/395193005

Jim Stichnoth [Wed, 27 Aug 2014 20:50:03 +0000 (13:50 -0700)]
Subzero: Fix address mode optimization involving phi temporaries.

Also adds much-needed logging of the decision process that goes into the address mode optimization.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/490333003

Jim Stichnoth [Wed, 27 Aug 2014 18:02:50 +0000 (11:02 -0700)]
Subzero: Fix the link command for Trusty.

With the original link command, -lpthread comes before some other LLVM libraries, and this ends up causing undefined pthreads symbols.  The new link command makes sure the -lpthread part comes last.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/514723004

Jim Stichnoth [Wed, 27 Aug 2014 05:16:29 +0000 (22:16 -0700)]
Subzero: Fix some legalization issues involving immediates.

Some lowering sequences were incorrectly allowing immediate operands in native instructions.  This includes 32-bit icmp, 64-bit icmp, select, switch, and 64-bit mul.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/511543002

Jim Stichnoth [Tue, 26 Aug 2014 21:07:13 +0000 (14:07 -0700)]
Subzero: Add a check-lit target for faster smoke testing.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/507813002

Jim Stichnoth [Tue, 26 Aug 2014 17:29:05 +0000 (10:29 -0700)]
Subzero: Fixes for Hello World and bisection debugging.

Add the llvm2ice -sandbox option (false by default) to select between
native and sandboxed code generation.  Currently, it controls whether
the llvm.nacl.read.tp intrinsic is lowered to gs:[0x0] or a call to
__nacl_read_tp.

Change the asm output slightly for -ffunction-sections so that objdump
is more willing to provide a disassembly.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/504963002

Jim Stichnoth [Tue, 26 Aug 2014 16:26:02 +0000 (09:26 -0700)]
Revert "COmmit"

This was committed as a test, not actually intended.

This reverts commit 420e8bf2ebdc6e681838c018ca07e33e4321235f.

BUG=
R=dschuff@chromium.org

Review URL: https://codereview.chromium.org/504073003

Jim Stichnoth [Tue, 26 Aug 2014 16:00:14 +0000 (09:00 -0700)]
COmmit

Patch from Jim Stichnoth <stichnot@chromium.org>.

Jim Stichnoth [Mon, 18 Aug 2014 17:55:19 +0000 (10:55 -0700)]
Subzero: Fix the simple register allocation for -Om1.

Background: After lowering each high-level ICE instruction, Om1 calls
postLower() to do simple register allocation.  It only assigns
registers where absolutely necessary, specifically for infinite-weight
variables, while honoring pre-coloring decisions.  The original Om1
register allocation never tried to reuse registers within a lowered
sequence, which was generally OK except for very long lowering
sequences, such as call instructions or some intrinsics.  In these
cases, when it ran out of physical registers, it would just reset the
free list and hope for the best, but with no guarantee of correctness.

The fix involves keeping track of which instruction in the lowered
sequence holds the last use of each variable, and releasing each
register back to the free list after its last use.  This makes much
better use of registers.  It's not necessarily optimal, at least with
respect to pre-colored variables, since those registers are
black-listed even if they don't interfere with an infinite-weight
variable.
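
The last-use idea can be sketched as follows. This is a simplified model, not the Om1 code: instructions are reduced to lists of variable ids, pre-colored variables are ignored, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical model: a lowered instruction is just the list of
// infinite-weight variable ids that it touches.
using Inst = std::vector<int>;

// Assigns a register to every variable in the lowered sequence, returning
// each register to the free list right after the variable's last use.
inline std::map<int, int> assignRegisters(const std::vector<Inst> &Seq,
                                          int NumRegs) {
  // Pass 1: find the last instruction that uses each variable.
  std::map<int, std::size_t> LastUse;
  for (std::size_t I = 0; I < Seq.size(); ++I)
    for (int Var : Seq[I])
      LastUse[Var] = I;

  // Free list ordered so that the lowest-numbered register is taken first.
  std::vector<int> FreeList;
  for (int Reg = NumRegs - 1; Reg >= 0; --Reg)
    FreeList.push_back(Reg);

  std::map<int, int> Assigned; // variable -> register (final result)
  std::map<int, int> Live;     // variables currently holding a register
  for (std::size_t I = 0; I < Seq.size(); ++I) {
    for (int Var : Seq[I]) {
      if (Live.count(Var))
        continue;
      assert(!FreeList.empty() && "ran out of physical registers");
      int Reg = FreeList.back();
      FreeList.pop_back();
      Live[Var] = Reg;
      Assigned[Var] = Reg;
    }
    // Pass 2 (per instruction): release registers whose variable dies here.
    for (int Var : Seq[I]) {
      auto It = Live.find(Var);
      if (It != Live.end() && LastUse[Var] == I) {
        FreeList.push_back(It->second);
        Live.erase(It);
      }
    }
  }
  return Assigned;
}
```

In this model a sequence that touches four variables, but never more than two at a time, fits in two registers instead of exhausting the free list.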

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/483453002

Matt Wala [Fri, 15 Aug 2014 23:21:56 +0000 (16:21 -0700)]
Subzero: Randomly insert nops.

Adds command line options -nop-insertion, -nop-insertion-probability=X, and -max-nops-per-instruction=X.
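
One plausible reading of how the probability and per-instruction limit interact is sketched below. The commit message does not spell out the semantics, so this is an assumption: each of up to MaxNops slots is filled independently with the given probability (taken here as a value in [0, 1]), and std::mt19937 stands in for Subzero's own random number generator. The function name is hypothetical.

```cpp
#include <cassert>
#include <random>

// Decides how many nops to place before one instruction.
// Assumption: every one of the MaxNops slots is filled independently
// with the given probability.
inline int countNopsToInsert(std::mt19937 &Rng, double Probability,
                             int MaxNops) {
  assert(Probability >= 0.0 && Probability <= 1.0);
  std::bernoulli_distribution Insert(Probability);
  int Count = 0;
  for (int I = 0; I < MaxNops; ++I)
    if (Insert(Rng))
      ++Count;
  return Count;
}
```

Seeding the generator explicitly keeps the inserted nops reproducible from run to run.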

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/463563006

Matt Wala [Fri, 15 Aug 2014 22:02:13 +0000 (15:02 -0700)]
Subzero: Start a list of SIMD improvement ideas.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/477773003

Matt Wala [Thu, 14 Aug 2014 21:24:12 +0000 (14:24 -0700)]
Subzero: Align spill locations to natural alignment.

This requires sorting the spilled variables based on alignment and
introducing additional padding around the spill location areas.

These changes allow vector instructions to accept memory operands.

Old stack frame layout:  New stack frame layout:
+---------------------+  +---------------------+
| return address      |  | return address      |
+---------------------+  +---------------------+
| preserved registers |  | preserved registers |
+---------------------+  +---------------------+
| global spill area   |  | padding             |
+---------------------+  +---------------------+
| local spill area    |  | global spill area   |
+---------------------+  +---------------------+
| padding             |  | padding             |
+---------------------+  +---------------------+
| local variables     |  | local spill area    |
+---------------------+  +---------------------+
                         | padding             |
                         +---------------------+
                         | local variables     |
                         +---------------------+
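
The sort-then-pad step for one spill area can be sketched as below. This is a simplified model with hypothetical names (SpillSlot, layoutSpillArea); offsets grow upward from the start of the area, which glosses over the direction in which the real stack grows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one spilled variable.
struct SpillSlot {
  uint32_t Size;   // size in bytes
  uint32_t Align;  // natural alignment, a power of two
  uint32_t Offset; // filled in by layoutSpillArea
};

inline uint32_t alignTo(uint32_t Value, uint32_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two expected");
  return (Value + Align - 1) & ~(Align - 1);
}

// Sorts the slots by decreasing alignment and assigns naturally aligned
// offsets starting at AreaStart. Returns the offset just past the area.
inline uint32_t layoutSpillArea(std::vector<SpillSlot> &Slots,
                                uint32_t AreaStart) {
  std::stable_sort(Slots.begin(), Slots.end(),
                   [](const SpillSlot &A, const SpillSlot &B) {
                     return A.Align > B.Align;
                   });
  uint32_t Offset = AreaStart;
  for (SpillSlot &Slot : Slots) {
    Offset = alignTo(Offset, Slot.Align); // padding, if any, goes here
    Slot.Offset = Offset;
    Offset += Slot.Size;
  }
  return Offset;
}
```

Because the most strictly aligned slots come first, padding is only needed in front of the area, which matches the extra "padding" rows in the new layout.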

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/465413003

Jan Voung [Thu, 14 Aug 2014 15:20:44 +0000 (08:20 -0700)]
Emit .local before .comm for bss to make llvm-mc happy.

Otherwise llvm-mc asserts. This is also the order that llc emits the directives.
Change a couple of RUIN -> RUN in lit tests.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/469973002

Jan Voung [Wed, 13 Aug 2014 20:20:58 +0000 (13:20 -0700)]
Convert lit test llvm-mc -arch arguments to full -triple.

Mostly to make them a bit more portable across OSes.
Otherwise the OS assumed by llvm-mc is the build/host OS. So,
on Mac llvm-mc will assume it's targeting darwin and only accepts macho
assembler directives. Assembler directives like .rodata.cst8 are not accepted
(I'm guessing it uses .cstring, .literal4, etc. instead?).

Force an OS (NaCl) so that ELF-related assembler macros make sense.

Also remove a now unused function typeIdentString to make clang happy.

Example errors:
Command 5 Stderr:
<stdin>:5:2: error: unknown directive
        .type   fixed_400,@function
        ^
<stdin>:23:2: error: unknown directive
        .type   variable_n,@function
        ^
<stdin>:40:11: error: mach-o section specifier uses an unknown section type
        .section        .rodata.cst4,"aM",@progbits,4
                        ^
<stdin>:42:11: error: mach-o section specifier uses an unknown section type
        .section        .rodata.cst8,"aM",@progbits,8
                        ^

BUG=none
R=stichnot@chromium.org, wala@chromium.org

Review URL: https://codereview.chromium.org/467103004

Matt Wala [Tue, 12 Aug 2014 20:15:04 +0000 (13:15 -0700)]
Subzero: Factor out commonalities between mov-like instructions.

Introduce a base class for mov, movq, and movp instruction classes.

BUG=none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/466733005

Matt Wala [Tue, 12 Aug 2014 02:56:19 +0000 (19:56 -0700)]
Subzero: Align the stack at the point of function calls.

Be compatible with the x86-32 calling convention by ensuring that the
stack is aligned to 16 bytes at the point of the call
instruction. Also ensure that vector arguments passed on the stack are
16 byte aligned.

Also, make alloca instructions respect alignment.
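
The 16-byte requirement comes down to rounding the outgoing argument area up to a multiple of 16. A minimal sketch, with hypothetical names and assuming the stack pointer is already 16-byte aligned before the argument area is reserved:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t StackAlignBytes = 16;

// Rounds the outgoing argument area up to a multiple of 16 bytes so that
// the stack pointer is still 16-byte aligned at the call instruction.
inline uint32_t alignedArgAreaSize(uint32_t ArgBytes) {
  assert(ArgBytes <= UINT32_MAX - (StackAlignBytes - 1) && "would overflow");
  return (ArgBytes + StackAlignBytes - 1) & ~(StackAlignBytes - 1);
}
```

For example, 20 bytes of arguments reserve 32 bytes, leaving 12 bytes of padding.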

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/444443002

Matt Wala [Tue, 12 Aug 2014 00:46:58 +0000 (17:46 -0700)]
Subzero: address mode opt: Transform *(reg+const) into [reg+const].

Teach address mode optimization about Base=Base+Const,
Base=Const+Base, and Base=Base-Const patterns.

Change ConstantInteger::emit() to emit signed values.
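
The folding of Base=Base+Const chains into a single [reg+const] operand can be sketched on a toy IR. The types and names below are hypothetical, not Subzero's. The sketch assumes SSA form (each variable is defined once), treats Base-Const as an addition of a negative constant, and assumes Const+Base has been canonicalized to Base+Const.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mini-IR: Dest = Src + Offset.
struct AddConst {
  int Dest;
  int Src;
  int32_t Offset;
};

// A memory operand of the form [Base + Offset].
struct Address {
  int Base;
  int32_t Offset;
};

// Repeatedly replaces the base with the source of its defining add,
// accumulating the constant, until the base has no such definition.
inline Address foldAddress(const std::vector<AddConst> &Defs, int Base) {
  assert(Base >= 0 && "variable ids are non-negative in this sketch");
  Address Addr = {Base, 0};
  // Under SSA each definition folds at most once, which bounds the walk.
  for (std::size_t Step = 0; Step < Defs.size(); ++Step) {
    bool Folded = false;
    for (const AddConst &Def : Defs) {
      if (Def.Dest == Addr.Base) {
        Addr.Base = Def.Src;
        Addr.Offset += Def.Offset;
        Folded = true;
        break;
      }
    }
    if (!Folded)
      break;
  }
  return Addr;
}
```

Since offsets can end up negative after folding a subtraction, the constant has to be emitted as a signed value.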

BUG=none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/459133002

Matt Wala [Tue, 12 Aug 2014 00:44:40 +0000 (17:44 -0700)]
Subzero: Fix a debugging string in the test_icmp crosstest.

STR(inst) should be STR(cmp).

BUG=none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/466543002

Matt Wala [Fri, 8 Aug 2014 21:02:09 +0000 (14:02 -0700)]
Subzero: Add a random number generator.

This is initial work necessary for diversification support in Subzero.
The random number generator implementation is temporary.  It will
eventually use a cryptographically secure pseudorandom number
generator (perhaps from LLVM, if LLVM gets one).

Add the -rng-seed= option to seed the random number generator from
the command line.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/455593004

Jim Stichnoth [Fri, 8 Aug 2014 17:13:44 +0000 (10:13 -0700)]
Subzero: Add the "llvm2ice -ffunction-sections" argument.

The purpose is to enable bisection debugging of Subzero-translated functions, using objcopy to selectively splice functions from llc and Subzero into the binary.

Note that llvm-mc claims to take this argument, but actually does nothing with it, so we need to implement it in Subzero.

Also moves the ClFlags object into the GlobalContext so everyone can access it.

BUG= none
R=wala@chromium.org

Review URL: https://codereview.chromium.org/455633002

Matt Wala [Fri, 8 Aug 2014 15:39:40 +0000 (08:39 -0700)]
Subzero: Make InstX8632Cbwdq a UnaryOp.

After the changes in CL 443203003, InstX8632Cbwdq fits the template for
a UnaryOp, so change it to be an instance of this class.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/452143003

Matt Wala [Thu, 7 Aug 2014 20:47:30 +0000 (13:47 -0700)]
Subzero: Use scalar arithmetic when no vector instruction exists.

Implement scalarizeArithmetic() which extracts the components of the
input vectors, performs the operation with scalar instructions, and
builds the output vector component by component.
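
The shape of that transformation, modeled on ordinary C++ values rather than on Subzero's IR, looks roughly like this. The V4I32 type and the scalarSdiv helper are hypothetical, and only the name scalarizeArithmetic comes from the text above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4I32 = std::array<int32_t, 4>;

// Applies a scalar operation lane by lane: extract one element from each
// input, run the scalar operation, and insert the result into the output.
template <typename ScalarOp>
inline V4I32 scalarizeArithmetic(const V4I32 &A, const V4I32 &B, ScalarOp Op) {
  V4I32 Result = {};
  for (std::size_t I = 0; I < Result.size(); ++I) {
    int32_t Lhs = A[I];       // extractelement
    int32_t Rhs = B[I];       // extractelement
    Result[I] = Op(Lhs, Rhs); // scalar op, then insertelement
  }
  return Result;
}

// Signed division, an operation with no x86 vector instruction.
inline int32_t scalarSdiv(int32_t L, int32_t R) {
  assert(R != 0 && !(L == INT32_MIN && R == -1) && "undefined division");
  return L / R;
}
```

A call such as scalarizeArithmetic(A, B, scalarSdiv) then yields the element-wise quotient.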

Fix the lowering of sdiv and srem.  These were previously emitting a
wrong instruction (cdq) for i8 and i16 inputs (needing cbw, cwd).

In the test_arith crosstest, mask the inputs to vector shift
operations to ensure that the shifts are in range.  Otherwise the
Subzero output is not identical to the llc output in some (undefined)
cases.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/443203003

Jim Stichnoth [Thu, 7 Aug 2014 17:58:05 +0000 (10:58 -0700)]
Subzero: A few fixes toward running larger programs.

1. Add 'llvm2ice -disable-globals' to disable Subzero translation of
global initializers, since full support isn't yet implemented.

2. Change the names of intra-block branch target labels to avoid
collisions with basic block labels.

3. Fix lowering of "br i1 <constant>, label ...", which was producing
invalid instructions like "cmp 1, 0".

4. Fix the "make format-diff" operation, which was diffing against the wrong target.

BUG= none
R=wala@chromium.org

Review URL: https://codereview.chromium.org/449093002

Jim Stichnoth [Tue, 5 Aug 2014 18:22:37 +0000 (11:22 -0700)]
Subzero: Fix and clean up some cross tests.

1. It turns out that the crosstest scripts mix different versions of
clang - build_pnacl_ir.py uses pnacl-clang from the NaCl SDK for the
tests, while crosstest.py uses clang/clang++ from LLVM_BIN_PATH for
the driver.  The SDK has been updated to use a different version of
the standard library, and now there is a mismatch as to whether int8_t
is typedef'd to 'char' or 'signed char', leading to name mangling
mismatches.  (char, signed char, and unsigned char are distinct
types.)  We deal with this by using myint8_t which is explicitly
defined as signed char.

2. Some ugly function pointer casting in test_arith_main.cpp is fixed/removed.

3. std::endl is replaced with "\n".

4. License text is added to tests that were touched by the above items.
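
The type distinction behind item 1 can be checked directly in C++. This is a small sketch: myint8_t is defined as in the description above, while whichOverload and checkOverloads are hypothetical helpers used only for illustration.

```cpp
#include <cassert>
#include <type_traits>

// char, signed char, and unsigned char are three distinct types, so a
// function taking one of them mangles differently from the other two.
static_assert(!std::is_same<char, signed char>::value, "distinct types");
static_assert(!std::is_same<char, unsigned char>::value, "distinct types");

// Pinning the typedef to signed char removes the dependence on how a
// particular standard library defines int8_t.
typedef signed char myint8_t;

// Three separate overloads, one per character type.
inline int whichOverload(char) { return 0; }
inline int whichOverload(signed char) { return 1; }
inline int whichOverload(unsigned char) { return 2; }

inline void checkOverloads() {
  assert(whichOverload('a') == 0);
  assert(whichOverload(myint8_t(1)) == 1);
  assert(whichOverload(static_cast<unsigned char>(1)) == 2);
}
```

If int8_t is char in one toolchain and signed char in the other, a declaration in the driver and a definition in the test resolve to different symbols, which is the mismatch described in item 1.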

BUG= none
R=wala@chromium.org

Review URL: https://codereview.chromium.org/435353002

Matt Wala [Thu, 31 Jul 2014 16:06:17 +0000 (09:06 -0700)]
Subzero: Fix some issues related to legalization and undef handling.

1. Much of the lowering code for vector operations was not properly
checking that the input operand was in a register or memory. This
problem could be exhibited by passing undef values as inputs.

=> Change the vector legalization code to legalize input operands to
register or memory before producing instructions that use the
operands. Also, append a suffix to the variable names in the vector
legalization code to clarify the legalization status of the values.

2. Undef values should never be emitted directly. Rather, they should
have been appropriately legalized to a zero value.

=> To enforce this, make ConstantUndef::emit() issue an error
message. Do this in the x86 backend, as other backends may decide to
treat undef values differently.

3. The regalloc_evict_non_overlap test was loading from an undef
pointer. Subzero was not handling this correctly (the undef pointer was
being emitted without being legalized), but it does not have to handle
this case since PNaCl IR disallows undef pointers.

=> Fix the regalloc_evict_non_overlap test to use an inttoptr instead of
directly loading from the undef pointer. Also, add an assert in
IceTargetLoweringX8632::FormMemoryOperand() to make sure that undef
pointers are never encountered.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/432613002

Jim Stichnoth [Wed, 30 Jul 2014 22:37:39 +0000 (15:37 -0700)]
Subzero: Fix a signed/unsigned warning reported on the Mac.

Also cleans up some unneeded table size const static variables.

BUG= https://codereview.chromium.org/296053008/
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/428353002

Jim Stichnoth [Wed, 30 Jul 2014 21:45:20 +0000 (14:45 -0700)]
Subzero: Try to fix warnings and errors in the Windows build.

Quiet some unused-variable warnings when their only use is in an assert().

Forward-declare partial template specializations when the template method already has a default implementation, to avoid ODR violations and link errors.

BUG= https://codereview.chromium.org/296053008/
R=wala@chromium.org

Review URL: https://codereview.chromium.org/429993002

Jan Voung [Wed, 30 Jul 2014 21:33:37 +0000 (14:33 -0700)]
Add dtor to InstX8632Lockable.

Speculative fix for Mac GCC build.

BUG=none
R=dschuff@chromium.org

Review URL: https://codereview.chromium.org/432523002

Matt Wala [Wed, 30 Jul 2014 19:44:39 +0000 (12:44 -0700)]
Subzero: Add support for SSE4.1 instructions.

* Add initial support for code generation with SSE4.1 instructions. The
following operations are affected:
 - multiplication with v4i32
 - select
 - insertelement
 - extractelement

* Add appropriate lit checks for SSE4.1 instructions. Run the crosstests
in both SSE2 and SSE4.1 mode.

* Introduce the -mattr flag to llvm2ice to control which instruction set
gets used.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/427843002

Jan Voung [Wed, 30 Jul 2014 17:06:03 +0000 (10:06 -0700)]
Fix bug when atomic load is fused with an arith op (and not in the entry BB)

Normally, the FakeUse for preserving the atomic load ends
up on the load's Dest. However, for fused load+add, the load
is deleted, and its Dest is no longer defined. This trips
up the liveness analysis when it happens on a non-entry
block. So the FakeUse should be for the add's dest instead,
in that case.

We have no access to the add, so introduce a
getLastInserted() helper. A couple of ways to do that:
- modify insert() to track explicitly
- rewind from Next one step

Either that, or we disable the fusing for atomic loads.

BUG=  https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/417353003

Derek Schuff [Wed, 30 Jul 2014 16:39:36 +0000 (09:39 -0700)]
Remove extra semicolon after method definition

The mac build treats this as an error.

R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/429253002

Jan Voung [Tue, 29 Jul 2014 21:38:51 +0000 (14:38 -0700)]
Add a peephole to fuse cmpxchg w/ later cmp+branch.

The cmpxchg instruction already sets ZF for comparing the return value
vs the expected value. So there is no need to compare eq again.

Lots of pexes-in-the-wild have this pattern. Some compare against
a constant, some compare against a variable.
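
At the source level the pattern looks roughly like the two functions below, written with std::atomic. The function names are hypothetical and the example is only meant to show why the second comparison is redundant.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Unfused shape: fetch the old value, then compare it with Expected again.
inline bool tryLockWithRecompare(std::atomic<int32_t> &Lock) {
  const int32_t Expected = 0;
  int32_t Old = Expected;
  // On failure, Old is overwritten with the value actually found.
  bool Success = Lock.compare_exchange_strong(Old, 1);
  bool Recompared = (Old == Expected);
  // The success flag (ZF on x86) already carries the same information.
  assert(Success == Recompared);
  return Recompared;
}

// Fused shape: branch directly on the result of the compare-and-exchange.
inline bool tryLockFused(std::atomic<int32_t> &Lock) {
  int32_t Expected = 0;
  return Lock.compare_exchange_strong(Expected, 1);
}
```

Both functions return the same answer, and the fused form lets the branch use the flag set by cmpxchg without a separate cmp.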

BUG=https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/413903002

Jan Voung [Mon, 28 Jul 2014 22:19:43 +0000 (15:19 -0700)]
A couple of fixes for using Makefile.standalone on Mac.

(*) PNaCl toolchain_build builds 64-bit libraries for LLVM on Mac.
    That won't link with subzero code if subzero is built with -m32,
    so add an option to override the -m32.
(*) include locale header
(*) Mark xMacroIntegrityCheck unused to avoid clang compiler warning.
(*) virtual dtor, for inheritable class
(*) Mark compare function const

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/428733003

Jim Stichnoth [Mon, 28 Jul 2014 06:14:00 +0000 (23:14 -0700)]
Subzero: Make Ice::Ostream a typedef for llvm::raw_ostream.

Previously Ostream was a class that wrapped a raw_ostream pointer,
structured that way in case we wanted to wrap an alternate stream
type.

Also, Ostream used to include a Cfg pointer, but that had to go away
when the Ostream became associated with the GlobalContext which
persists beyond the Cfg lifetime, so the Cfg pointer was removed
leaving only the raw_ostream.

Since llvm::raw_ostream is supposed to be very lightweight, we can
just give up the abstraction and equate it to Ice::Ostream.

BUG= none
R=kschimpf@google.com

Review URL: https://codereview.chromium.org/413393005

Matt Wala [Fri, 25 Jul 2014 22:57:56 +0000 (15:57 -0700)]
Use movss to implement insertelement when elements = 4 and index = 0.

This avoids using a pair of shufps instructions as the previous lowering
was doing.  Instead, we use movss to copy the element to be inserted
into the lower 32 bits of the destination.

Define InstX8632Movss as a Binop, the class to which it properly
belongs.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/412353005

Matt Wala [Thu, 24 Jul 2014 19:34:20 +0000 (12:34 -0700)]
Lower the fcmp instruction for <4 x float> operands.

Most fcmp conditions map directly to single x86 instructions. For
these, the lowering is table driven.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/413053002

Matt Wala [Thu, 24 Jul 2014 16:44:42 +0000 (09:44 -0700)]
Lower the select instruction when the operands are of vector type.

Select of vectors is implemented by appropriately masking and
combining the inputs with sign extend / bitwise operations
and without the use of branches.
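
A lane-wise C++ model of this masking approach, as a sketch only: it assumes four 32-bit lanes and a condition vector holding 0 or 1 per lane. V4 and selectV4 are illustrative names, not Subzero identifiers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4 = std::array<uint32_t, 4>;

// Branch-free select. Cond holds 0 or 1 in each lane (an i1 vector).
V4 selectV4(const V4 &Cond, const V4 &T, const V4 &F) {
  V4 Result{};
  for (std::size_t I = 0; I < Result.size(); ++I) {
    assert(Cond[I] <= 1 && "condition lanes must be 0 or 1");
    // Sign-extend the 1-bit condition to an all-ones or all-zeros mask.
    uint32_t Mask = 0u - Cond[I];
    // Keep T where the mask is set and F where it is clear.
    Result[I] = (T[I] & Mask) | (F[I] & ~Mask);
  }
  return Result;
}
```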

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/417653004

Matt Wala [Thu, 24 Jul 2014 16:43:36 +0000 (09:43 -0700)]
Fix a counter in the test_global crosstest.

Change TotalTests so that the test count matches up with the number of
recorded passes and failures.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/415803004

Jim Stichnoth [Thu, 24 Jul 2014 15:48:15 +0000 (08:48 -0700)]
Subzero: Fix a regalloc eviction bug.

We don't need/want to evict an inactive live range when it doesn't
overlap with the live range currently being considered.

This is especially important for Variables representing scratch
registers that are killed by call instructions.  These register
assignments should obviously never be evicted.

Note that the algorithm that computes the min-weight register to evict
doesn't consider inactive and non-overlapping live ranges.
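
The overlap condition can be sketched as follows. This is a simplification under stated assumptions: each live range is modeled as a single half-open interval of instruction numbers, whereas real live ranges may consist of several segments. LiveRange, overlaps and isEvictionCandidate are hypothetical names.

```cpp
#include <cassert>

// Simplified live range: one half-open interval [Start, End).
struct LiveRange {
  int Start;
  int End;
};

bool overlaps(const LiveRange &A, const LiveRange &B) {
  return A.Start < B.End && B.Start < A.End;
}

// An inactive range is only worth considering for eviction if it actually
// overlaps the range currently being allocated.
bool isEvictionCandidate(const LiveRange &Inactive, const LiveRange &Current) {
  assert(Inactive.Start <= Inactive.End && Current.Start <= Current.End);
  return overlaps(Inactive, Current);
}
```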

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3903
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/417933004

Matt Wala [Wed, 23 Jul 2014 21:56:10 +0000 (14:56 -0700)]
Lower icmp operations between vector values.

SSE2 only has signed integer comparison. Unsigned compares are
implemented by inverting the sign bits of the operands and doing a
signed compare.
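
A scalar C++ sketch of the sign-bit trick, shown for one 32-bit lane. ugtViaSigned is an illustrative name. The conversion to int32_t is assumed to wrap modulo 2^32, which is what mainstream compilers do and what C++20 guarantees.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned "greater than" computed with only a signed comparison.
bool ugtViaSigned(uint32_t A, uint32_t B) {
  const uint32_t SignBit = 0x80000000u;
  // Flipping the sign bit maps the unsigned range onto the signed range
  // while preserving order.
  int32_t SA = static_cast<int32_t>(A ^ SignBit);
  int32_t SB = static_cast<int32_t>(B ^ SignBit);
  return SA > SB;
}
```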

A common pattern in clang generated IR is a vector compare which
generates an i1 vector followed by a sign extension of the result of the
compare. The x86 comparison instructions already generate sign extended
values, so we can eliminate unnecessary sext operations that follow
compares in the IR.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/412593002

Jim Stichnoth [Wed, 23 Jul 2014 16:43:46 +0000 (09:43 -0700)]
Add llvm-mc to the set of commands lit knows about.

BUG= none
R=wala@chromium.org

Review URL: https://codereview.chromium.org/415583003

Matt Wala [Wed, 23 Jul 2014 01:26:05 +0000 (18:26 -0700)]
Add -arch=x86 and -filetype=obj to all RUN lines involving
llvm-mc.

This fixes the failing validation of callindirect.pnacl.ll.

The following tests fail to validate (some due to the
addition of -filetype=obj):
 * convert.ll
 * globalinit.pnacl.ll
 * mangle.ll
 * nacl-atomic-fence-all.ll
 * shift.ll

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/410743005

Matt Wala [Tue, 22 Jul 2014 23:39:38 +0000 (16:39 -0700)]
Fix legalization of source operand to bsr and bsf.

The source operand to bsr and bsf must be in a register or memory.

BUG=none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/407093014

Matt Wala [Tue, 22 Jul 2014 22:03:01 +0000 (15:03 -0700)]
Validate the assembly code that Subzero generates in unit tests.

Add RUN lines to applicable lit tests to pipe the output of Subzero (in
-Om1 and/or -O2 mode) to llvm-mc for validation.

Note that the following unit tests fail the validation:
 * callindirect.pnacl.ll
 * mangle.ll
 * nacl-other-intrinsics.ll

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/411693003

Matt Wala [Tue, 22 Jul 2014 17:55:30 +0000 (10:55 -0700)]
Factor out common vector crosstesting code.

Add vectors.h and vector.def to hold vector type declarations and useful
vector utilities. Change the existing tests to use this new header where
applicable (arith, vector_ops).

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/407543003

Jan Voung [Mon, 21 Jul 2014 21:05:29 +0000 (14:05 -0700)]
Use lowerCast instead of inlined _movzx, to get legalization, for memset.

Otherwise, there can be a movzx reg, 0, which is illegal,
when the memset value is constant 0.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/402253002

Matt Wala [Fri, 18 Jul 2014 23:32:16 +0000 (16:32 -0700)]
Fix array index in test initialization.

Index() % NumElementsInType should be Index() % NumValues.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/404553007

Jan Voung [Fri, 18 Jul 2014 20:12:58 +0000 (13:12 -0700)]
Lower stacksave and restore intrinsics.

Just copies the current stack pointer to/from a variable.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/396993009

Jan Voung [Fri, 18 Jul 2014 20:01:08 +0000 (13:01 -0700)]
Lower byte swap intrinsic.

Clump the negate instruction w/ the bswap instruction as an
"inplace" operation. One difference is that bswap has stricter
requirements on the operand type.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org, wala@chromium.org

Review URL: https://codereview.chromium.org/401533002

Matt Wala [Fri, 18 Jul 2014 19:45:09 +0000 (12:45 -0700)]
Lower insertelement and extractelement.

Use instructions that do the operations in registers and that are
available in SSE2. Spill to memory to perform the operation in the
absence of any other reasonable options (v16i8 and v16i1).

Unfortunately there is no natural class of SSE2 instructions that
insertelement / extractelement can get lowered
to for all vector types (though pinsr[bwd] and pextr[bwd] are
available in SSE4.1). There are in some cases a large number of
choices available for lowering and I have not looked into which
choices are the best yet, besides using LLVM output as a guide.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/401523003

Matt Wala [Thu, 17 Jul 2014 19:41:31 +0000 (12:41 -0700)]
Lower the rest of the vector arithmetic operations.

The instructions emitted by the lowering operations require memory
operands to be aligned to 16 bytes. Since there is no support for
aligning memory operands in Subzero, do the arithmetic in registers for
now.

Add vector arithmetic to the arith crosstest. Pass the -mstackrealign
parameter to the crosstest clang so that llc code called back from
Subzero code (helper calls) doesn't assume that the stack is aligned at
the entry to the call.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/397833002

Matt Wala [Wed, 16 Jul 2014 17:21:30 +0000 (10:21 -0700)]
Lower casting operations that involve vector types.

Impacted instructions:

bitcast {v4f32, v4i32, v8i16, v16i8} <-> {v4f32, v4i32, v8i16, v16i8}
bitcast v8i1 <-> i8
bitcast v16i1 <-> i16

(There was already code present to handle trivial bitcasts like v16i1 <-> v16i1.)

[sz]ext v4i1 -> v4i32
[sz]ext v8i1 -> v8i16
[sz]ext v16i1 -> v16i8

trunc v4i32 -> v4i1
trunc v8i16 -> v8i1
trunc v16i8 -> v16i1

[su]itofp v4i32 -> v4f32
fpto[su]i v4f32 -> v4i32

Where there is a relatively simple lowering to x86 instructions, it has been used. Otherwise a helper call is used.

Some lowerings require a materialization of an integer vector with 1s in each entry. Since there is no support for vector constant pools, the constant is materialized purely through register operations.
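
The message does not say which instruction sequence is used. One well-known register-only option is to compare a register with itself (all bits set in every lane) and then shift each lane right by 31. The C++ below models that option lane by lane; compareEqual and shiftRightLogical stand in for pcmpeqd and psrld, and all names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4 = std::array<uint32_t, 4>;

// Lane-wise model of pcmpeqd: all ones where the lanes are equal, else zero.
V4 compareEqual(const V4 &A, const V4 &B) {
  V4 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = (A[I] == B[I]) ? 0xFFFFFFFFu : 0u;
  return R;
}

// Lane-wise model of psrld: logical right shift of every lane.
V4 shiftRightLogical(const V4 &A, unsigned Count) {
  assert(Count < 32 && "shift count must be smaller than the lane width");
  V4 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = A[I] >> Count;
  return R;
}

// Build <1, 1, 1, 1> without loading a constant from memory.
V4 materializeOnes() {
  V4 Scratch{}; // stands in for a register with arbitrary contents
  V4 AllBitsSet = compareEqual(Scratch, Scratch);
  return shiftRightLogical(AllBitsSet, 31);
}
```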

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/383303003

Jan Voung [Wed, 16 Jul 2014 00:52:39 +0000 (17:52 -0700)]
Lower bitmanip intrinsics, assuming absence of BMI/SSE4.2 for now.

We'll need the fallbacks in any case. However, once we've
decided on how to specify the CPU features of the user
machine we can use the nicer LZCNT/TZCNT/POPCNT as well.

Adds cmov, bsf, and bsr instructions.

Calls a popcount helper function for machines without SSE4.2.

Not handling bswap yet (which can also take i16 params).
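
For reference, a software popcount fallback could look like the following sketch. This is the standard bit-parallel counting technique, not necessarily what the actual helper does; popcount32 is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Count set bits without a POPCNT instruction.
uint32_t popcount32(uint32_t X) {
  X = X - ((X >> 1) & 0x55555555u);                 // sums of 2-bit groups
  X = (X & 0x33333333u) + ((X >> 2) & 0x33333333u); // sums of 4-bit groups
  X = (X + (X >> 4)) & 0x0F0F0F0Fu;                 // sums of 8-bit groups
  return (X * 0x01010101u) >> 24;                   // add the four bytes
}
```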

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org, wala@chromium.org

Review URL: https://codereview.chromium.org/390443005

Matt Wala [Tue, 15 Jul 2014 00:37:37 +0000 (17:37 -0700)]
Various improvements related to legalization code.

1) In makeHelperCall(), function pointers that are created should have
type IceType_i32, not the functions' own return type.

2) In legalize(), change the name of WillHaveRegister to
MustHaveRegister. Add a comment to clarify the condition being computed.

3) In legalize(), add an assert to make sure that vector "constants"
don't get legalized (other than undef). There should be no constants of
vector type.

4) In copyToReg(), replace an unnecessary use of Src->getType().

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/385133006

Matt Wala [Tue, 15 Jul 2014 00:18:14 +0000 (17:18 -0700)]
Fix floating point vector frem lowering.

The frem operation takes two arguments.
Pass both Src0 and Src1 to __frem_v4f32.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/387153002

Jan Voung [Mon, 14 Jul 2014 18:51:44 +0000 (11:51 -0700)]
Remove memcpy test workaround for name mangling substitutions.

Now that the name mangling is a bit smarter (from commit:
217dc082d5cc2af1cc7c544f51ef15b4abe5be8b), we don't need to
avoid having the same type twice in the function signature.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/389683003

Jan Voung [Mon, 14 Jul 2014 17:32:41 +0000 (10:32 -0700)]
Subzero: lower the rest of the atomic operations.

64-bit ops are expanded via a cmpxchg8b loop.

64/32-bit and/or/xor are also expanded into a cmpxchg /
cmpxchg8b loop.
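
The shape of such a loop, sketched in portable C++ for a 64-bit atomic AND. This uses std::atomic rather than Subzero's lowering code, and fetchAndViaCas is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Atomic fetch-and built from a compare-and-swap retry loop.
// Returns the value that was stored before the AND took effect.
uint64_t fetchAndViaCas(std::atomic<uint64_t> &Target, uint64_t Operand) {
  uint64_t Old = Target.load();
  // On failure, compare_exchange_weak refreshes Old with the current value,
  // so every retry recomputes Old & Operand from fresh data.
  while (!Target.compare_exchange_weak(Old, Old & Operand)) {
  }
  return Old;
}
```

The same loop works for or and xor by swapping the operator.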

Add a cross test for atomic RMW operations and
compare and swap.

Misc: Test that atomic.is.lock.free can be optimized out if result is ignored.

TODO:
* optimize compare and swap with compare+branch further down
instruction stream.

* optimize atomic RMW when the return value is ignored
(adds a locked field to binary ops though).

* We may want to do some actual target-dependent basic
block splitting + expansion (the instructions inserted by
the expansion must reference the pre-colored registers,
etc.). Otherwise, we are currently getting by with modeling
the extended liveness of the variables used in the loops
using fake uses.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=jfb@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/362463002

Matt Wala [Fri, 11 Jul 2014 22:43:51 +0000 (15:43 -0700)]
Lower vector floating point arithmetic operations.

This adds lowering code for fadd, fsub, fmul, fdiv, and frem. frem, having no native x86 counterpart, is implemented by making a helper call.
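
A sketch of what a lane-wise frem helper might compute, assuming frem has fmod semantics (the result takes the sign of the dividend). V4F and fremV4 are illustrative names; the real helper's signature is not shown here.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using V4F = std::array<float, 4>;

// Hypothetical stand-in for a vector frem helper: apply fmod to each lane.
V4F fremV4(const V4F &A, const V4F &B) {
  V4F R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = std::fmod(A[I], B[I]);
  return R;
}
```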

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/389653002

Jim Stichnoth [Fri, 11 Jul 2014 22:29:23 +0000 (15:29 -0700)]
Subzero: Fix the name mangling code's base-36 increment.

SZZZ_ was being incremented to S0000_ instead of S1000_.
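
A minimal sketch of a base-36 increment with carry over a non-empty string of the digits 0-9A-Z. incrementBase36 is a hypothetical name, not the function in the remangler.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Add one to a non-empty base-36 number written with the digits 0-9A-Z.
std::string incrementBase36(std::string Digits) {
  const std::string Alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
  assert(!Digits.empty());
  for (std::size_t I = Digits.size(); I-- > 0;) {
    std::size_t Value = Alphabet.find(Digits[I]);
    assert(Value != std::string::npos && "not a base-36 digit");
    if (Value + 1 < Alphabet.size()) {
      Digits[I] = Alphabet[Value + 1]; // no carry needed, done
      return Digits;
    }
    Digits[I] = '0'; // Z wraps to 0 and the carry moves one digit left
  }
  return "1" + Digits; // carry out of the top digit, e.g. ZZZ -> 1000
}
```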

BUG= https://codereview.chromium.org/385273002/
R=wala@chromium.org

Review URL: https://codereview.chromium.org/390533002

Jim Stichnoth [Fri, 11 Jul 2014 21:06:55 +0000 (14:06 -0700)]
Subzero: Deal with substitutions in the primitive remangler.

https://refspecs.linuxbase.org/cxxabi-1.75.html#mangling-compression
describes the mechanism for compressing mangled strings by using substitutions of the form S[0-9A-Z]*_ to represent repeated components.

When the prefix is handled as wrapping inside a namespace, the base-36 substitution numbers all have to be incremented.

This is implemented in a very simple way by scanning the string only for instances of the substitution pattern.

Unfortunately, false matches are possible because the S[0-9A-Z]*_ pattern can be a substring of the type name, or can span other components of the mangled name.  Getting this completely right would essentially require a full demangling parser - see the ~4000 lines of code in cxa_demangle.cpp and ItaniumMangle.cpp.

Since this is just for testing, any false matches will likely cause a linking error and the test can be rewritten to avoid false matches.
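
To make the false-match risk concrete, here is a sketch of a pattern-only scan using std::regex. It is not the remangler's code, and findSubstitutionCandidates is a hypothetical name. In _Z1fPiS_ (f(int*, int*)) the match S_ is a real substitution, but in _Z3S1_v (a function named S1_) the match S1_ is part of an identifier.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Return every substring that looks like a substitution, S[0-9A-Z]*_,
// without checking whether it really is one.
std::vector<std::string>
findSubstitutionCandidates(const std::string &Mangled) {
  static const std::regex Pattern("S[0-9A-Z]*_");
  std::vector<std::string> Found;
  for (std::sregex_iterator It(Mangled.begin(), Mangled.end(), Pattern), End;
       It != End; ++It)
    Found.push_back(It->str());
  return Found;
}
```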

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/385273002

Karl Schimpf [Fri, 11 Jul 2014 17:26:34 +0000 (10:26 -0700)]
Clean up exit status and globals processing in llvm2ice.

Makes IceTranslator.ExitStatus a boolean (rather than int), and changes
code to check flag when done. Fixes bug introduced in
https://codereview.chromium.org/387023002.

Also cleans up the (Ice) Converter class to handle globals processing,
rather than doing it in llvm2ice.cpp.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3894
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/387023002

Jim Stichnoth [Thu, 10 Jul 2014 22:32:36 +0000 (15:32 -0700)]
Subzero: Fix a regalloc bug involving too-aggressive AllowRegisterOverlap.

See the BUG description for more details.  In short, the register allocator
was inappropriately honoring AllowRegisterOverlap even when the variable's
live range overlaps with an Unhandled variable precolored to the preferred
register.

Also changes legalize() logic to recognize when a variable is guaranteed
to ultimately have a physical register due to infinite weight, and not
create a new temporary in those cases.

Finally, dumps RegisterPreference and AllowRegisterOverlap info for
Variables for improved diagnostics.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3897
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/380363002

Jim Stichnoth [Wed, 9 Jul 2014 23:53:40 +0000 (16:53 -0700)]
Subzero: Add "make format-diff" target.

This invokes clang-format-diff.py so you can easily reformat just
the code you touched.

(Caution, this may not apply to new files.)

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/372133002

Matt Wala [Wed, 9 Jul 2014 23:33:22 +0000 (16:33 -0700)]
Add support for passing and returning vectors in accordance with the x86 calling convention.

- Add TargetLowering::lowerArguments() as a new stage in TargetLowering.
- Add support for passing arguments/return values in XMM registers in the x86 target.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/372113005

Jan Voung [Wed, 9 Jul 2014 23:13:13 +0000 (16:13 -0700)]
Add scalar lowering for sqrt intrinsic.

Re-used test_arith_main.cpp, mostly to share the set of interesting
floating point constants.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org, wala@chromium.org

Review URL: https://codereview.chromium.org/384443003

Jan Voung [Wed, 9 Jul 2014 16:54:25 +0000 (09:54 -0700)]
Avoid assigning esp (or ebp for framepointer-using frames) in Om1.

For ebp, exclude as needed. For esp, don't mark it as
an int register.

Not sure exactly how to do a targeted test for this Om1
register allocator. The Om1 regalloc seems to start w/ a
fresh whitelist after each instruction, so it may assign
the same register (e.g., eax) as an earlier instruction.
Without pre-colored registers, I'm not sure how to force it
to allocate something other than the first few registers.
I do have a test case that has a ton of pre-colored
registers (e.g., cmpxchg8b), but that is a different CL:
https://codereview.chromium.org/362463002/

Encountered for:
BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/369573005

Jim Stichnoth [Tue, 8 Jul 2014 21:44:09 +0000 (14:44 -0700)]
Subzero: Temporary fix for build error.

The compile error was introduced in https://codereview.chromium.org/361733002/ .

BUG= none
R=wala@chromium.org

Review URL: https://codereview.chromium.org/376923003

Matt Wala [Mon, 7 Jul 2014 23:50:46 +0000 (16:50 -0700)]
Add support for vector types.

- Add vector types to the type table.

- Add support for parsing vector types in llvm2ice.

- Legalize undef vector values to zero. Test that undef vector values are lowered correctly.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/353553004

Karl Schimpf [Mon, 7 Jul 2014 21:50:30 +0000 (14:50 -0700)]
Update Subzero to start parsing PNaCl bitcode files.

This patch only handles global addresses in PNaCl bitcode files.
Function blocks are still not parsed. Also, factors out a common API
for translation, so that generated ICE can always be translated using
the same code.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3892
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/361733002

Jim Stichnoth [Sun, 29 Jun 2014 15:13:48 +0000 (08:13 -0700)]
Subzero: Partial implementation of global initializers.

This is still missing a couple things:

1. It only supports flat arrays and zeroinitializers.  Arrays of structs are not yet supported.

2. Initializers can't yet contain relocatables, e.g. the address of another global.

Some changes are made to work around an llvm-mc assembler bug.  When assembling using intel syntax, llvm-mc doesn't correctly parse symbolic constants or add relocation entries in some circumstances.  Call instructions work, and use in a memory operand works, e.g. mov eax, [ArrayBase+4*ecx].  To work around this, we adjust legalize() to not allow ConstantRelocatable by default, except for memory operands and when called from lowerCall(), so the relocatable ends up being the source operand of a mov instruction.  Then, the mov emit routine actually emits an lea instruction for such moves.

A few lit tests needed to be adjusted to make szdiff work properly with respect to global initializers.

In the new cross test, the driver calls test code that returns a pointer to an array with a global initializer, and the driver compares the arrays returned by llc and Subzero.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/358013003

Karl Schimpf [Fri, 27 Jun 2014 16:15:29 +0000 (09:15 -0700)]
Refactor llvm2ice so that Ice can be built while reading bitcode.

BUG=None
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/350933002

Jim Stichnoth [Thu, 26 Jun 2014 20:32:27 +0000 (13:32 -0700)]
Subzero: Add 'not' to the list of LLVM commands in lit.cfg.

Without this being in the command substitutions list, lit will rely on the 'not' command being in $PATH.

The substitution code is adapted from llvm/test/lit.cfg to add word-break regexps to the list.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/344063004

Jan Voung [Wed, 25 Jun 2014 17:36:46 +0000 (10:36 -0700)]
Add atomic load/store, fetch_add, fence, and is-lock-free lowering.

Loads/stores w/ type i8, i16, and i32 are converted to
plain load/store instructions and lowered w/ the plain
lowerLoad/lowerStore.  Atomic stores are followed by an mfence
for sequential consistency.

For 64-bit types, use movq to do 64-bit memory
loads/stores (vs the usual load/store being broken into
separate 32-bit load/stores). This means first bitcasting the
i64 -> f64 (which splits the load of the value to be
stored into two 32-bit ops), then storing in a single op. For
load, load into f64 then bitcast back to i64 (which splits
after the atomic load). This follows what GCC does for
c++11 std::atomic<uint64_t> load/store methods (uses movq
when -mfpmath=sse). This introduces some redundancy between
movq and movsd, but the convention seems to be to use movq
when working with integer quantities. Otherwise, movsd
could work too. The difference seems to be in whether or
not the XMM register's upper 64 bits are filled with 0.
Zero-extending could help avoid partial register
stalls.
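
The bitcast step on its own (not the atomic access) can be modeled portably with memcpy. This is only a sketch, with the hypothetical name roundTripThroughDouble. It reinterprets the 64 bits as a double and back without any numeric conversion, which is why integer patterns that happen to look like NaNs are expected to survive unchanged.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(uint64_t),
              "sketch assumes a 64-bit double");

// Reinterpret i64 bits as f64 and back, with no arithmetic in between.
uint64_t roundTripThroughDouble(uint64_t Bits) {
  double AsDouble;
  std::memcpy(&AsDouble, &Bits, sizeof(AsDouble)); // i64 -> f64 bitcast
  uint64_t Back;
  std::memcpy(&Back, &AsDouble, sizeof(Back));     // f64 -> i64 bitcast
  return Back;
}
```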

Handle up to i32 fetch_add. TODO: add i64 via a cmpxchg loop.

TODO: add some runnable crosstests to make sure that this
doesn't do funny things to integer bit patterns that happen
to look like signaling NaNs and quiet NaNs. However, the system
clang would not know how to handle "llvm.nacl.*" if we choose to
target that level directly via .ll files. Or, (a) we use old-school __sync
methods (sync_fetch_and_add w/ 0 to load) or (b) require buildbot's
clang/gcc to support c++11...

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/342763004

Jan Voung [Tue, 24 Jun 2014 20:43:30 +0000 (13:43 -0700)]
Bitcast of 64-bit immediates may need to split the immediate, not a var.

Currently, the integer immediate is legalized to a
64-bit integer register first, and then the lower/upper
parts of that register are used for the bitcast.
However, mov(64_bit_reg, imm) done by the legalization
isn't legal.

Similarly, trunc of 64-bit immediates needs to take the
lower half of the immediate, not legalize to a var first.
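
Splitting the immediate itself is simple arithmetic. A sketch, with hypothetical names Halves, splitImmediate and truncToI32:

```cpp
#include <cassert>
#include <cstdint>

struct Halves {
  uint32_t Lo;
  uint32_t Hi;
};

// Split a 64-bit immediate into two 32-bit immediates.
Halves splitImmediate(uint64_t Imm) {
  return Halves{static_cast<uint32_t>(Imm),
                static_cast<uint32_t>(Imm >> 32)};
}

// Truncating a 64-bit immediate to 32 bits is just its low half.
uint32_t truncToI32(uint64_t Imm) { return splitImmediate(Imm).Lo; }
```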

This shifts the legalization code around.

Other cases where immediates are illegal and legalized
are idiv/div, but for those cases 64-bit operands are
handled separately via a function call. The function
call code properly splits up immediate arguments.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/348373005

Jan Voung [Wed, 18 Jun 2014 17:50:57 +0000 (10:50 -0700)]
Add a few Subzero intrinsics (not the atomic ones yet).

Handle:
* mem{cpy,move,set} (without optimizations for known lengths)
* nacl.read.tp
* setjmp, longjmp
* trap

Mostly to see if the dispatching/organization is okay.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/321993002

Jan Voung [Wed, 18 Jun 2014 17:42:02 +0000 (10:42 -0700)]
Add ss/sd suffix to InstX8632Store and legalize FP constants.

InstX8632Store is essentially a "mov" and it would emit
a mov, but it did not add the ss/sd suffix based on the operand type.

Also, there are some cases where legalization would leave
two memory operands in the case that one of them
is a floating point immediate:

storeDoubleConst:
.LstoreDoubleConst$entry:
  mov     eax, dword ptr [esp+4]
  mov     qword ptr [eax], qword ptr [L$double$1]
  ret

BUG=none
R=stichnot@chromium.org, wala@chromium.org

Review URL: https://codereview.chromium.org/341683002

Matt Wala [Wed, 18 Jun 2014 17:30:07 +0000 (10:30 -0700)]
Use GlobalContext::getConstantZero() to get zero valued constants.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/344613002