docs/MemorySSA.rst

   1 =========
   2 MemorySSA
   3 =========
   4
   5 .. contents::
   6    :local:
   7
   8 Introduction
   9 ============
  10
  11 ``MemorySSA`` is an analysis that allows us to cheaply reason about the
  12 interactions between various memory operations. Its goal is to replace
  13 ``MemoryDependenceAnalysis`` for most (if not all) use-cases. This is because,
  14 unless you're very careful, use of ``MemoryDependenceAnalysis`` can easily
  15 result in quadratic-time algorithms in LLVM. Additionally, ``MemorySSA`` doesn't
  16 have as many arbitrary limits as ``MemoryDependenceAnalysis``, so you should get
  17 better results, too.
  18
  19 At a high level, one of the goals of ``MemorySSA`` is to provide an SSA based
  20 form for memory, complete with def-use and use-def chains, which
  21 enables users to quickly find may-def and may-uses of memory operations.
  22 It can also be thought of as a way to cheaply give versions to the complete
  23 state of heap memory, and associate memory operations with those versions.
  24
  25 This document goes over how ``MemorySSA`` is structured, and some basic
  26 intuition on how ``MemorySSA`` works.
  27
  28 A paper on MemorySSA (with notes about how it's implemented in GCC) `can be
  29 found here <http://www.airs.com/dnovillo/Papers/mem-ssa.pdf>`_. Though, it's
  30 relatively out-of-date; the paper references multiple heap partitions, but GCC
  31 eventually swapped to just using one, like we now have in LLVM.  Like
  32 GCC's, LLVM's MemorySSA is intraprocedural.
  33
  34
  35 MemorySSA Structure
  36 ===================
  37
  38 MemorySSA is a virtual IR. After it's built, ``MemorySSA`` will contain a
  39 structure that maps ``Instruction``\ s to ``MemoryAccess``\ es, which are
  40 ``MemorySSA``'s parallel to LLVM ``Instruction``\ s.
  41
  42 Each ``MemoryAccess`` can be one of three types:
  43
  44 - ``MemoryPhi``
  45 - ``MemoryUse``
  46 - ``MemoryDef``
  47
  48 ``MemoryPhi``\ s are ``PhiNode``\ s, but for memory operations. If at any
  49 point we have two (or more) ``MemoryDef``\ s that could flow into a
  50 ``BasicBlock``, the block's top ``MemoryAccess`` will be a
  51 ``MemoryPhi``. As in LLVM IR, ``MemoryPhi``\ s don't correspond to any
  52 concrete operation. As such, ``BasicBlock``\ s are mapped to ``MemoryPhi``\ s
  53 inside ``MemorySSA``, whereas ``Instruction``\ s are mapped to ``MemoryUse``\ s
  54 and ``MemoryDef``\ s.
  55
  56 Note also that in SSA, Phi nodes merge must-reach definitions (that is,
  57 definitions that *must* be new versions of variables). In MemorySSA, PHI nodes
  58 merge may-reach definitions (that is, until disambiguated, the versions that
  59 reach a phi node may or may not clobber a given variable).
  60
  61 ``MemoryUse``\ s are operations which use but don't modify memory. An example of
  62 a ``MemoryUse`` is a ``load``, or a ``readonly`` function call.
  63
  64 ``MemoryDef``\ s are operations which may either modify memory, or which
  65 introduce some kind of ordering constraints. Examples of ``MemoryDef``\ s
  66 include ``store``\ s, function calls, ``load``\ s with ``acquire`` (or higher)
  67 ordering, volatile operations, memory fences, etc.
  68
  69 Every function that exists has a special ``MemoryDef`` called ``liveOnEntry``.
  70 It dominates every ``MemoryAccess`` in the function that ``MemorySSA`` is being
  71 run on, and implies that we've hit the top of the function. It's the only
  72 ``MemoryDef`` that maps to no ``Instruction`` in LLVM IR. Use of
  73 ``liveOnEntry`` implies that the memory being used is either undefined or
  74 defined before the function begins.
  75
  76 An example of all of this overlaid on LLVM IR (obtained by running ``opt
  77 -passes='print<memoryssa>' -disable-output`` on an ``.ll`` file) is below. When
  78 viewing this example, it may be helpful to view it in terms of clobbers. The
  79 operands of a given ``MemoryAccess`` are all (potential) clobbers of said
  80 MemoryAccess, and the value produced by a ``MemoryAccess`` can act as a clobber
  81 for other ``MemoryAccess``\ es. Another useful way of looking at it is in
  82 terms of heap versions.  In that view, operands of a given
  83 ``MemoryAccess`` are the version of the heap before the operation, and
  84 if the access produces a value, the value is the new version of the heap
  85 after the operation.
  86
  87 .. code-block:: llvm
  88
  89   define void @foo() {
  90   entry:
  91     %p1 = alloca i8
  92     %p2 = alloca i8
  93     %p3 = alloca i8
  94     ; 1 = MemoryDef(liveOnEntry)
  95     store i8 0, i8* %p3
  96     br label %while.cond
  97
  98   while.cond:
  99     ; 6 = MemoryPhi({%0,1},{if.end,4})
 100     br i1 undef, label %if.then, label %if.else
 101
 102   if.then:
 103     ; 2 = MemoryDef(6)
 104     store i8 0, i8* %p1
 105     br label %if.end
 106
 107   if.else:
 108     ; 3 = MemoryDef(6)
 109     store i8 1, i8* %p2
 110     br label %if.end
 111
 112   if.end:
 113     ; 5 = MemoryPhi({if.then,2},{if.else,3})
 114     ; MemoryUse(5)
 115     %1 = load i8, i8* %p1
 116     ; 4 = MemoryDef(5)
 117     store i8 2, i8* %p2
 118     ; MemoryUse(1)
 119     %2 = load i8, i8* %p3
 120     br label %while.cond
 121   }
 122
 123 The ``MemorySSA`` IR is shown in comments that precede the instructions they map
 124 to (if such an instruction exists). For example, ``1 = MemoryDef(liveOnEntry)``
 125 is a ``MemoryAccess`` (specifically, a ``MemoryDef``), and it describes the LLVM
 126 instruction ``store i8 0, i8* %p3``. Other places in ``MemorySSA`` refer to this
 127 particular ``MemoryDef`` as ``1`` (much like how one can refer to ``load i8, i8*
 128 %p1`` in LLVM with ``%1``). Again, ``MemoryPhi``\ s don't correspond to any LLVM
 129 Instruction, so the line directly below a ``MemoryPhi`` isn't special.
 130
 131 Going from the top down:
 132
 133 - ``6 = MemoryPhi({entry,1},{if.end,4})`` notes that, when entering
 134   ``while.cond``, the reaching definition for it is either ``1`` or ``4``. This
 135   ``MemoryPhi`` is referred to in the textual IR by the number ``6``.
 136 - ``2 = MemoryDef(6)`` notes that ``store i8 0, i8* %p1`` is a definition,
 137   and its reaching definition before it is ``6``, or the ``MemoryPhi`` after
 138   ``while.cond``. (See the `Build-time use optimization`_ and `Precision`_
 139   sections below for why this ``MemoryDef`` isn't linked to a separate,
 140   disambiguated ``MemoryPhi``.)
 141 - ``3 = MemoryDef(6)`` notes that ``store i8 0, i8* %p2`` is a definition; its
 142   reaching definition is also ``6``.
 143 - ``5 = MemoryPhi({if.then,2},{if.else,3})`` notes that the clobber before
 144   this block could either be ``2`` or ``3``.
 145 - ``MemoryUse(5)`` notes that ``load i8, i8* %p1`` is a use of memory, and that
 146   it's clobbered by ``5``.
 147 - ``4 = MemoryDef(5)`` notes that ``store i8 2, i8* %p2`` is a definition; it's
 148   reaching definition is ``5``.
 149 - ``MemoryUse(1)`` notes that ``load i8, i8* %p3`` is just a user of memory,
 150   and the last thing that could clobber this use is above ``while.cond`` (e.g.
 151   the store to ``%p3``). In heap versioning parlance, it really only depends on
 152   the heap version 1, and is unaffected by the new heap versions generated since
 153   then.
 154
 155 As an aside, ``MemoryAccess`` is a ``Value`` mostly for convenience; it's not
 156 meant to interact with LLVM IR.
 157
 158 Design of MemorySSA
 159 ===================
 160
 161 ``MemorySSA`` is an analysis that can be built for any arbitrary function. When
 162 it's built, it does a pass over the function's IR in order to build up its
 163 mapping of ``MemoryAccess``\ es. You can then query ``MemorySSA`` for things
 164 like the dominance relation between ``MemoryAccess``\ es, and get the
 165 ``MemoryAccess`` for any given ``Instruction`` .
 166
 167 When ``MemorySSA`` is done building, it also hands you a ``MemorySSAWalker``
 168 that you can use (see below).
 169
 170
 171 The walker
 172 ----------
 173
 174 A structure that helps ``MemorySSA`` do its job is the ``MemorySSAWalker``, or
 175 the walker, for short. The goal of the walker is to provide answers to clobber
 176 queries beyond what's represented directly by ``MemoryAccess``\ es. For example,
 177 given:
 178
 179 .. code-block:: llvm
 180
 181   define void @foo() {
 182     %a = alloca i8
 183     %b = alloca i8
 184
 185     ; 1 = MemoryDef(liveOnEntry)
 186     store i8 0, i8* %a
 187     ; 2 = MemoryDef(1)
 188     store i8 0, i8* %b
 189   }
 190
 191 The store to ``%a`` is clearly not a clobber for the store to ``%b``. It would
 192 be the walker's goal to figure this out, and return ``liveOnEntry`` when queried
 193 for the clobber of ``MemoryAccess`` ``2``.
 194
 195 By default, ``MemorySSA`` provides a walker that can optimize ``MemoryDef``\ s
 196 and ``MemoryUse``\ s by consulting whatever alias analysis stack you happen to
 197 be using. Walkers were built to be flexible, though, so it's entirely reasonable
 198 (and expected) to create more specialized walkers (e.g. one that specifically
 199 queries ``GlobalsAA``, one that always stops at ``MemoryPhi`` nodes, etc).
 200
 201
 202 Locating clobbers yourself
 203 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 204
 205 If you choose to make your own walker, you can find the clobber for a
 206 ``MemoryAccess`` by walking every ``MemoryDef`` that dominates said
 207 ``MemoryAccess``. The structure of ``MemoryDef``\ s makes this relatively simple;
 208 they ultimately form a linked list of every clobber that dominates the
 209 ``MemoryAccess`` that you're trying to optimize. In other words, the
 210 ``definingAccess`` of a ``MemoryDef`` is always the nearest dominating
 211 ``MemoryDef`` or ``MemoryPhi`` of said ``MemoryDef``.
 212
 213
 214 Build-time use optimization
 215 ---------------------------
 216
 217 ``MemorySSA`` will optimize some ``MemoryAccess``\ es at build-time.
 218 Specifically, we optimize the operand of every ``MemoryUse`` to point to the
 219 actual clobber of said ``MemoryUse``. This can be seen in the above example; the
 220 second ``MemoryUse`` in ``if.end`` has an operand of ``1``, which is a
 221 ``MemoryDef`` from the entry block.  This is done to make walking,
 222 value numbering, etc, faster and easier.
 223
 224 It is not possible to optimize ``MemoryDef`` in the same way, as we
 225 restrict ``MemorySSA`` to one heap variable and, thus, one Phi node
 226 per block.
 227
 228
 229 Invalidation and updating
 230 -------------------------
 231
 232 Because ``MemorySSA`` keeps track of LLVM IR, it needs to be updated whenever
 233 the IR is updated. "Update", in this case, includes the addition, deletion, and
 234 motion of ``Instructions``. The update API is being made on an as-needed basis.
 235 If you'd like examples, ``GVNHoist`` is a user of ``MemorySSA``\ s update API.
 236
 237
 238 Phi placement
 239 ^^^^^^^^^^^^^
 240
 241 ``MemorySSA`` only places ``MemoryPhi``\ s where they're actually
 242 needed. That is, it is a pruned SSA form, like LLVM's SSA form.  For
 243 example, consider:
 244
 245 .. code-block:: llvm
 246
 247   define void @foo() {
 248   entry:
 249     %p1 = alloca i8
 250     %p2 = alloca i8
 251     %p3 = alloca i8
 252     ; 1 = MemoryDef(liveOnEntry)
 253     store i8 0, i8* %p3
 254     br label %while.cond
 255
 256   while.cond:
 257     ; 3 = MemoryPhi({%0,1},{if.end,2})
 258     br i1 undef, label %if.then, label %if.else
 259
 260   if.then:
 261     br label %if.end
 262
 263   if.else:
 264     br label %if.end
 265
 266   if.end:
 267     ; MemoryUse(1)
 268     %1 = load i8, i8* %p1
 269     ; 2 = MemoryDef(3)
 270     store i8 2, i8* %p2
 271     ; MemoryUse(1)
 272     %2 = load i8, i8* %p3
 273     br label %while.cond
 274   }
 275
 276 Because we removed the stores from ``if.then`` and ``if.else``, a ``MemoryPhi``
 277 for ``if.end`` would be pointless, so we don't place one. So, if you need to
 278 place a ``MemoryDef`` in ``if.then`` or ``if.else``, you'll need to also create
 279 a ``MemoryPhi`` for ``if.end``.
 280
 281 If it turns out that this is a large burden, we can just place ``MemoryPhi``\ s
 282 everywhere. Because we have Walkers that are capable of optimizing above said
 283 phis, doing so shouldn't prohibit optimizations.
 284
 285
 286 Non-Goals
 287 ---------
 288
 289 ``MemorySSA`` is meant to reason about the relation between memory
 290 operations, and enable quicker querying.
 291 It isn't meant to be the single source of truth for all potential memory-related
 292 optimizations. Specifically, care must be taken when trying to use ``MemorySSA``
 293 to reason about atomic or volatile operations, as in:
 294
 295 .. code-block:: llvm
 296
 297   define i8 @foo(i8* %a) {
 298   entry:
 299     br i1 undef, label %if.then, label %if.end
 300
 301   if.then:
 302     ; 1 = MemoryDef(liveOnEntry)
 303     %0 = load volatile i8, i8* %a
 304     br label %if.end
 305
 306   if.end:
 307     %av = phi i8 [0, %entry], [%0, %if.then]
 308     ret i8 %av
 309   }
 310
 311 Going solely by ``MemorySSA``'s analysis, hoisting the ``load`` to ``entry`` may
 312 seem legal. Because it's a volatile load, though, it's not.
 313
 314
 315 Design tradeoffs
 316 ----------------
 317
 318 Precision
 319 ^^^^^^^^^
 320
 321 ``MemorySSA`` in LLVM deliberately trades off precision for speed.
 322 Let us think about memory variables as if they were disjoint partitions of the
 323 heap (that is, if you have one variable, as above, it represents the entire
 324 heap, and if you have multiple variables, each one represents some
 325 disjoint portion of the heap)
 326
 327 First, because alias analysis results conflict with each other, and
 328 each result may be what an analysis wants (IE
 329 TBAA may say no-alias, and something else may say must-alias), it is
 330 not possible to partition the heap the way every optimization wants.
 331 Second, some alias analysis results are not transitive (IE A noalias B,
 332 and B noalias C, does not mean A noalias C), so it is not possible to
 333 come up with a precise partitioning in all cases without variables to
 334 represent every pair of possible aliases.  Thus, partitioning
 335 precisely may require introducing at least N^2 new virtual variables,
 336 phi nodes, etc.
 337
 338 Each of these variables may be clobbered at multiple def sites.
 339
 340 To give an example, if you were to split up struct fields into
 341 individual variables, all aliasing operations that may-def multiple struct
 342 fields, will may-def more than one of them.  This is pretty common (calls,
 343 copies, field stores, etc).
 344
 345 Experience with SSA forms for memory in other compilers has shown that
 346 it is simply not possible to do this precisely, and in fact, doing it
 347 precisely is not worth it, because now all the optimizations have to
 348 walk tons and tons of virtual variables and phi nodes.
 349
 350 So we partition.  At the point at which you partition, again,
 351 experience has shown us there is no point in partitioning to more than
 352 one variable.  It simply generates more IR, and optimizations still
 353 have to query something to disambiguate further anyway.
 354
 355 As a result, LLVM partitions to one variable.
 356
 357 Use Optimization
 358 ^^^^^^^^^^^^^^^^
 359
 360 Unlike other partitioned forms, LLVM's ``MemorySSA`` does make one
 361 useful guarantee - all loads are optimized to point at the thing that
 362 actually clobbers them. This gives some nice properties.  For example,
 363 for a given store, you can find all loads actually clobbered by that
 364 store by walking the immediate uses of the store.