From 98be03e2e2a261628e067a4f89fb26fd799f7f86 Mon Sep 17 00:00:00 2001 From: Ayal Zaks Date: Mon, 29 May 2017 15:36:23 +0000 Subject: [PATCH] [Docs] Add VectorizationPlan to docs/Proposals. Following the request made in https://reviews.llvm.org/D32871, the general documentation of the Vectorization Plan is hereby placed under docs/Proposals. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304161 91177308-0d34-0410-b5e6-96231b3b80d8 --- docs/Proposals/VectorizationPlan.rst | 182 +++++++++++++++++++++++++++++++++++ docs/Vectorizers.rst | 11 +++ docs/index.rst | 3 + 3 files changed, 196 insertions(+) create mode 100644 docs/Proposals/VectorizationPlan.rst diff --git a/docs/Proposals/VectorizationPlan.rst b/docs/Proposals/VectorizationPlan.rst new file mode 100644 index 00000000000..82ce4b2de17 --- /dev/null +++ b/docs/Proposals/VectorizationPlan.rst @@ -0,0 +1,182 @@ +================== +Vectorization Plan +================== + +.. contents:: + :local: + +Abstract +======== +The vectorization transformation can be rather complicated, involving several +potential alternatives, especially for outer-loops [1]_ but also possibly for +innermost loops. These alternatives may have significant performance impact, +both positive and negative. A cost model is therefore employed to identify the +best alternative, including the alternative of avoiding any transformation +altogether. + +The Vectorization Plan is an explicit model for describing vectorization +candidates. It serves for both optimizing candidates including estimating their +cost reliably, and for performing their final translation into IR. This +facilitates dealing with multiple vectorization candidates. + +High-level Design +================= + +Vectorization Workflow +---------------------- +VPlan-based vectorization involves three major steps, taking a "scenario-based +approach" to vectorization planning: + +1. Legal Step: check if a loop can be legally vectorized; encode contraints and + artifacts if so. +2. Plan Step: + + a. Build initial VPlans following the constraints and decisions taken by + Legal Step 1, and compute their cost. + b. Apply optimizations to the VPlans, possibly forking additional VPlans. + Prune sub-optimal VPlans having relatively high cost. +3. Execute Step: materialize the best VPlan. Note that this is the only step + that modifies the IR. + +Design Guidelines +----------------- +In what follows, the term "input IR" refers to code that is fed into the +vectorizer whereas the term "output IR" refers to code that is generated by the +vectorizer. The output IR contains code that has been vectorized or "widened" +according to a loop Vectorization Factor (VF), and/or loop unroll-and-jammed +according to an Unroll Factor (UF). +The design of VPlan follows several high-level guidelines: + +1. Analysis-like: building and manipulating VPlans must not modify the input IR. + In particular, if the best option is not to vectorize at all, the + vectorization process terminates before reaching Step 3, and compilation + should proceed as if VPlans had not been built. + +2. Align Cost & Execute: each VPlan must support both estimating the cost and + generating the output IR code, such that the cost estimation evaluates the + to-be-generated code reliably. + +3. Support vectorizing additional constructs: + + a. Outer-loop vectorization. In particular, VPlan must be able to model the + control-flow of the output IR which may include multiple basic-blocks and + nested loops. + b. SLP vectorization. + c. Combinations of the above, including nested vectorization: vectorizing + both an inner loop and an outer-loop at the same time (each with its own + VF and UF), mixed vectorization: vectorizing a loop with SLP patterns + inside [4]_, (re)vectorizing input IR containing vector code. + d. Function vectorization [2]_. + +4. Support multiple candidates efficiently. In particular, similar candidates + related to a range of possible VF's and UF's must be represented efficiently. + Potential versioning needs to be supported efficiently. + +5. Support vectorizing idioms, such as interleaved groups of strided loads or + stores. This is achieved by modeling a sequence of output instructions using + a "Recipe", which is responsible for computing its cost and generating its + code. + +6. Encapsulate Single-Entry Single-Exit regions (SESE). During vectorization + such regions may need to be, for example, predicated and linearized, or + replicated VF*UF times to handle scalarized and predicated instructions. + Innerloops are also modelled as SESE regions. + +Low-level Design +================ +The low-level design of VPlan comprises of the following classes. + +:LoopVectorizationPlanner: + A LoopVectorizationPlanner is designed to handle the vectorization of a loop + or a loop nest. It can construct, optimize and discard one or more VPlans, + each VPlan modelling a distinct way to vectorize the loop or the loop nest. + Once the best VPlan is determined, including the best VF and UF, this VPlan + drives the generation of output IR. + +:VPlan: + A model of a vectorized candidate for a given input IR loop or loop nest. This + candidate is represented using a Hierarchical CFG. VPlan supports estimating + the cost and driving the generation of the output IR code it represents. + +:Hierarchical CFG: + A control-flow graph whose nodes are basic-blocks or Hierarchical CFG's. The + Hierarchical CFG data structure is similar to the Tile Tree [5]_, where + cross-Tile edges are lifted to connect Tiles instead of the original + basic-blocks as in Sharir [6]_, promoting the Tile encapsulation. The terms + Region and Block are used rather than Tile [5]_ to avoid confusion with loop + tiling. + +:VPBlockBase: + The building block of the Hierarchical CFG. A pure-virtual base-class of + VPBasicBlock and VPRegionBlock, see below. VPBlockBase models the hierarchical + control-flow relations with other VPBlocks. Note that in contrast to the IR + BasicBlock, a VPBlockBase models its control-flow successors and predecessors + directly, rather than through a Terminator branch or through predecessor + branches that "use" the VPBlockBase. + +:VPBasicBlock: + VPBasicBlock is a subclass of VPBlockBase, and serves as the leaves of the + Hierarchical CFG. It represents a sequence of output IR instructions that will + appear consecutively in an output IR basic-block. The instructions of this + basic-block originate from one or more VPBasicBlocks. VPBasicBlock holds a + sequence of zero or more VPRecipes that model the cost and generation of the + output IR instructions. + +:VPRegionBlock: + VPRegionBlock is a subclass of VPBlockBase. It models a collection of + VPBasicBlocks and VPRegionBlocks which form a SESE subgraph of the output IR + CFG. A VPRegionBlock may indicate that its contents are to be replicated a + constant number of times when output IR is generated, effectively representing + a loop with constant trip-count that will be completely unrolled. This is used + to support scalarized and predicated instructions with a single model for + multiple candidate VF's and UF's. + +:VPRecipeBase: + A pure-virtual base class modeling a sequence of one or more output IR + instructions, possibly based on one or more input IR instructions. These + input IR instructions are referred to as "Ingredients" of the Recipe. A Recipe + may specify how its ingredients are to be transformed to produce the output IR + instructions; e.g., cloned once, replicated multiple times or widened + according to selected VF. + +:VPTransformState: + Stores information used for generating output IR, passed from + LoopVectorizationPlanner to its selected VPlan for execution, and used to pass + additional information down to VPBlocks and VPRecipes. + +Related LLVM components +----------------------- +1. SLP Vectorizer: one can compare the VPlan model with LLVM's existing SLP + tree, where TSLP [3]_ adds Plan Step 2.b. + +2. RegionInfo: one can compare VPlan's H-CFG with the Region Analysis as used by + Polly [7]_. + +References +---------- +.. [1] "Outer-loop vectorization: revisited for short SIMD architectures", Dorit + Nuzman and Ayal Zaks, PACT 2008. + +.. [2] "Proposal for function vectorization and loop vectorization with function + calls", Xinmin Tian, [`cfe-dev + `_]., + March 2, 2016. + See also `review `_. + +.. [3] "Throttling Automatic Vectorization: When Less is More", Vasileios + Porpodas and Tim Jones, PACT 2015 and LLVM Developers' Meeting 2015. + +.. [4] "Exploiting mixed SIMD parallelism by reducing data reorganization + overhead", Hao Zhou and Jingling Xue, CGO 2016. + +.. [5] "Register Allocation via Hierarchical Graph Coloring", David Callahan and + Brian Koblenz, PLDI 1991 + +.. [6] "Structural analysis: A new approach to flow analysis in optimizing + compilers", M. Sharir, Journal of Computer Languages, Jan. 1980 + +.. [7] "Enabling Polyhedral Optimizations in LLVM", Tobias Grosser, Diploma + thesis, 2011. + +.. [8] "Introducing VPlan to the Loop Vectorizer", Gil Rapaport and Ayal Zaks, + European LLVM Developers' Meeting 2017. diff --git a/docs/Vectorizers.rst b/docs/Vectorizers.rst index a909d458c31..317271af403 100644 --- a/docs/Vectorizers.rst +++ b/docs/Vectorizers.rst @@ -382,6 +382,17 @@ And Linpack-pc with the same configuration. Result is Mflops, higher is better. .. image:: linpack-pc.png +Ongoing Development Directions +------------------------------ + +.. toctree:: + :hidden: + + Proposals/VectorizationPlan + +:doc:`Proposals/VectorizationPlan` + Modeling the process and upgrading the infrastructure of LLVM's Loop Vectorizer. + .. _slp-vectorizer: The SLP Vectorizer diff --git a/docs/index.rst b/docs/index.rst index becbe48e7ec..220df1566bd 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -528,6 +528,7 @@ can be better. CodeOfConduct Proposals/GitHubMove + Proposals/VectorizationPlan :doc:`CodeOfConduct` Proposal to adopt a code of conduct on the LLVM social spaces (lists, events, @@ -536,6 +537,8 @@ can be better. :doc:`Proposals/GitHubMove` Proposal to move from SVN/Git to GitHub. +:doc:`Proposals/VectorizationPlan` + Proposal to model the process and upgrade the infrastructure of LLVM's Loop Vectorizer. Indices and tables ================== -- 2.11.0