docs/Trees.rst

   1 Treating Trees
   2 ==============
   3
   4 Although any expression in Joy can be considered to describe a
   5 `tree <https://en.wikipedia.org/wiki/Tree_structure>`__ with the quotes
   6 as compound nodes and the non-quote values as leaf nodes, in this page I
   7 want to talk about `ordered binary
   8 trees <https://en.wikipedia.org/wiki/Binary_search_tree>`__ and how to
   9 make and use them.
  10
  11 The basic structure, in a `crude type
  12 notation <https://en.wikipedia.org/wiki/Algebraic_data_type>`__, is:
  13
  14 ::
  15
  16    BTree :: [] | [key value BTree BTree]
  17
  18 That says that a BTree is either the empty quote ``[]`` or a quote with
  19 four items: a key, a value, and two BTrees representing the left and
  20 right branches of the tree.
  21
  22 A Function to Traverse this Structure
  23 -------------------------------------
  24
  25 Let’s take a crack at writing a function that can recursively iterate or
  26 traverse these trees.
  27
  28 Base case ``[]``
  29 ^^^^^^^^^^^^^^^^
  30
  31 The stopping predicate just has to detect the empty list:
  32
  33 ::
  34
  35    BTree-iter == [not] [E] [R0] [R1] genrec
  36
  37 And since there’s nothing at this node, we just ``pop`` it:
  38
  39 ::
  40
  41    BTree-iter == [not] [pop] [R0] [R1] genrec
  42
  43 Node case ``[key value left right]``
  44 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  45
  46 Now we need to figure out ``R0`` and ``R1``:
  47
  48 ::
  49
  50    BTree-iter == [not] [pop] [R0]            [R1] genrec
  51               == [not] [pop] [R0 [BTree-iter] R1] ifte
  52
  53 Let’s look at it *in situ*:
  54
  55 ::
  56
  57    [key value left right] R0 [BTree-iter] R1
  58
  59 Processing the current node.
  60 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  61
  62 ``R0`` is almost certainly going to use ``dup`` to make a copy of the
  63 node and then ``dip`` on some function to process the copy with it:
  64
  65 ::
  66
  67    [key value left right] [F] dupdip                 [BTree-iter] R1
  68    [key value left right]  F  [key value left right] [BTree-iter] R1
  69
  70 For example, if we’re getting all the keys ``F`` would be ``first``:
  71
  72 ::
  73
  74    R0 == [first] dupdip
  75
  76    [key value left right] [first] dupdip                 [BTree-iter] R1
  77    [key value left right]  first  [key value left right] [BTree-iter] R1
  78    key                            [key value left right] [BTree-iter] R1
  79
  80 Recur
  81 ^^^^^
  82
  83 Now ``R1`` needs to apply ``[BTree-iter]`` to ``left`` and ``right``. If
  84 we drop the key and value from the node using ``rest`` twice we are left
  85 with an interesting situation:
  86
  87 ::
  88
  89    key [key value left right] [BTree-iter] R1
  90    key [key value left right] [BTree-iter] [rest rest] dip
  91    key [key value left right] rest rest [BTree-iter]
  92    key [left right] [BTree-iter]
  93
  94 Hmm, will ``step`` do?
  95
  96 ::
  97
  98    key [left right] [BTree-iter] step
  99    key left BTree-iter [right] [BTree-iter] step
 100    key left-keys [right] [BTree-iter] step
 101    key left-keys right BTree-iter
 102    key left-keys right-keys
 103
 104 Wow. So:
 105
 106 ::
 107
 108    R1 == [rest rest] dip step
 109
 110 Putting it together
 111 ^^^^^^^^^^^^^^^^^^^
 112
 113 We have:
 114
 115 ::
 116
 117    BTree-iter == [not] [pop] [[F] dupdip] [[rest rest] dip step] genrec
 118
 119 When I was reading this over I realized ``rest rest`` could go in
 120 ``R0``:
 121
 122 ::
 123
 124    BTree-iter == [not] [pop] [[F] dupdip rest rest] [step] genrec
 125
 126 (And ``[step] genrec`` is such a cool and suggestive combinator!)
 127
 128 Parameterizing the ``F`` per-node processing function.
 129 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 130
 131 ::
 132
 133    [F] BTree-iter == [not] [pop] [[F] dupdip rest rest] [step] genrec
 134
 135 Working backward:
 136
 137 ::
 138
 139    [not] [pop] [[F] dupdip rest rest]            [step] genrec
 140    [not] [pop] [F]       [dupdip rest rest] cons [step] genrec
 141    [F] [not] [pop] roll< [dupdip rest rest] cons [step] genrec
 142
 143 Ergo:
 144
 145 ::
 146
 147    BTree-iter == [not] [pop] roll< [dupdip rest rest] cons [step] genrec
 148
 149 .. code:: ipython2
 150
 151     from notebook_preamble import D, J, V, define, DefinitionWrapper
 152
 153 .. code:: ipython2
 154
 155     define('BTree-iter == [not] [pop] roll< [dupdip rest rest] cons [step] genrec')
 156
 157 .. code:: ipython2
 158
 159     J('[] [23] BTree-iter')  #  It doesn't matter what F is as it won't be used.
 160
 161
 162 .. parsed-literal::
 163
 164
 165
 166
 167 .. code:: ipython2
 168
 169     J('["tommy" 23 [] []] [first] BTree-iter')
 170
 171
 172 .. parsed-literal::
 173
 174     'tommy'
 175
 176
 177 .. code:: ipython2
 178
 179     J('["tommy" 23 ["richard" 48 [] []] ["jenny" 18 [] []]] [first] BTree-iter')
 180
 181
 182 .. parsed-literal::
 183
 184     'tommy' 'richard' 'jenny'
 185
 186
 187 .. code:: ipython2
 188
 189     J('["tommy" 23 ["richard" 48 [] []] ["jenny" 18 [] []]] [second] BTree-iter')
 190
 191
 192 .. parsed-literal::
 193
 194     23 48 18
 195
 196
 197 Adding Nodes to the BTree
 198 =========================
 199
 200 Let’s consider adding nodes to a BTree structure.
 201
 202 ::
 203
 204    BTree value key BTree-add == BTree
 205
 206 Adding to an empty node.
 207 ^^^^^^^^^^^^^^^^^^^^^^^^
 208
 209 If the current node is ``[]`` then you just return
 210 ``[key value [] []]``:
 211
 212 ::
 213
 214    BTree-add == [popop not] [[pop] dipd BTree-new] [R0] [R1] genrec
 215
 216 Where ``BTree-new`` is:
 217
 218 ::
 219
 220    value key BTree-new == [key value [] []]
 221
 222    value key swap [[] []] cons cons
 223    key value      [[] []] cons cons
 224    key      [value [] []]      cons
 225         [key value [] []]
 226
 227    BTree-new == swap [[] []] cons cons
 228
 229 .. code:: ipython2
 230
 231     define('BTree-new == swap [[] []] cons cons')
 232
 233 .. code:: ipython2
 234
 235     V('"v" "k" BTree-new')
 236
 237
 238 .. parsed-literal::
 239
 240                     . 'v' 'k' BTree-new
 241                 'v' . 'k' BTree-new
 242             'v' 'k' . BTree-new
 243             'v' 'k' . swap [[] []] cons cons
 244             'k' 'v' . [[] []] cons cons
 245     'k' 'v' [[] []] . cons cons
 246     'k' ['v' [] []] . cons
 247     ['k' 'v' [] []] .
 248
 249
 250 (As an implementation detail, the ``[[] []]`` literal used in the
 251 definition of ``BTree-new`` will be reused to supply the *constant* tail
 252 for *all* new nodes produced by it. This is one of those cases where you
 253 get amortized storage “for free” by using `persistent
 254 datastructures <https://en.wikipedia.org/wiki/Persistent_data_structure>`__.
 255 Because the tail, which is ``((), ((), ()))`` in Python, is immutable
 256 and embedded in the definition body for ``BTree-new``, all new nodes can
 257 reuse it as their own tail without fear that some other code somewhere
 258 will change it.)
 259
 260 If the current node isn’t empty.
 261 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 262
 263 We now have to derive ``R0`` and ``R1``, consider:
 264
 265 ::
 266
 267    [key_n value_n left right] value key R0 [BTree-add] R1
 268
 269 In this case, there are three possibilites: the key can be greater or
 270 less than or equal to the node’s key. In two of those cases we will need
 271 to apply a copy of ``BTree-add``, so ``R0`` is pretty much out of the
 272 picture.
 273
 274 ::
 275
 276    [R0] == []
 277
 278 A predicate to compare keys.
 279 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 280
 281 The first thing we need to do is compare the the key we’re adding to see
 282 if it is greater than the node key and ``branch`` accordingly, although
 283 in this case it’s easier to write a destructive predicate and then use
 284 ``ifte`` to apply it ``nullary``:
 285
 286 ::
 287
 288    [key_n value_n left right] value key [BTree-add] R1
 289    [key_n value_n left right] value key [BTree-add] [P >] [T] [E] ifte
 290
 291    [key_n value_n left right] value key [BTree-add] P                   >
 292    [key_n value_n left right] value key [BTree-add] pop roll> pop first >
 293    [key_n value_n left right] value key                 roll> pop first >
 294    key [key_n value_n left right] value                 roll> pop first >
 295    key key_n                                                            >
 296    Boolean
 297
 298    P > == pop roll> pop first >
 299    P < == pop roll> pop first <
 300    P   == pop roll> pop first
 301
 302 .. code:: ipython2
 303
 304     define('P == pop roll> pop first')
 305
 306 .. code:: ipython2
 307
 308     V('["k" "v" [] []] "vv" "kk" [0] P >')
 309
 310
 311 .. parsed-literal::
 312
 313                                   . ['k' 'v' [] []] 'vv' 'kk' [0] P >
 314                   ['k' 'v' [] []] . 'vv' 'kk' [0] P >
 315              ['k' 'v' [] []] 'vv' . 'kk' [0] P >
 316         ['k' 'v' [] []] 'vv' 'kk' . [0] P >
 317     ['k' 'v' [] []] 'vv' 'kk' [0] . P >
 318     ['k' 'v' [] []] 'vv' 'kk' [0] . pop roll> pop first >
 319         ['k' 'v' [] []] 'vv' 'kk' . roll> pop first >
 320         'kk' ['k' 'v' [] []] 'vv' . pop first >
 321              'kk' ['k' 'v' [] []] . first >
 322                          'kk' 'k' . >
 323                              True .
 324
 325
 326 If the key we’re adding is greater than the node’s key.
 327 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 328
 329 Here the parantheses are meant to signify that the right-hand side (RHS)
 330 is not literal, the code in the parentheses is meant to have been
 331 evaluated:
 332
 333 ::
 334
 335    [key_n value_n left right] value key [BTree-add] T == [key_n value_n left (BTree-add key value right)]
 336
 337 Use ``infra`` on ``K``.
 338 ^^^^^^^^^^^^^^^^^^^^^^^
 339
 340 So how do we do this? We know we’re going to want to use ``infra`` on
 341 some function ``K`` that has the key and value to work with, as well as
 342 the quoted copy of ``BTree-add`` to apply somehow:
 343
 344 ::
 345
 346    right left value_n key_n value key [BTree-add] K
 347        ...
 348    right value key BTree-add left value_n key_n
 349
 350 Pretty easy:
 351
 352 ::
 353
 354    right left value_n key_n value key [BTree-add] cons cons dipdd
 355    right left value_n key_n [value key BTree-add]           dipdd
 356    right value key BTree-add left value_n key_n
 357
 358 So:
 359
 360 ::
 361
 362    K == cons cons dipdd
 363
 364 And:
 365
 366 ::
 367
 368    [key_n value_n left right] [value key [BTree-add] K] infra
 369
 370 Derive ``T``.
 371 ^^^^^^^^^^^^^
 372
 373 So now we’re at getting from this to this:
 374
 375 ::
 376
 377    [key_n value_n left right]  value key [BTree-add] T
 378        ...
 379    [key_n value_n left right] [value key [BTree-add] K] infra
 380
 381 And so ``T`` is just:
 382
 383 ::
 384
 385    value key [BTree-add] T == [value key [BTree-add] K]                infra
 386                          T == [                      K] cons cons cons infra
 387
 388 .. code:: ipython2
 389
 390     define('K == cons cons dipdd')
 391     define('T == [K] cons cons cons infra')
 392
 393 .. code:: ipython2
 394
 395     V('"r" "l" "v" "k" "vv" "kk" [0] K')
 396
 397
 398 .. parsed-literal::
 399
 400                                   . 'r' 'l' 'v' 'k' 'vv' 'kk' [0] K
 401                               'r' . 'l' 'v' 'k' 'vv' 'kk' [0] K
 402                           'r' 'l' . 'v' 'k' 'vv' 'kk' [0] K
 403                       'r' 'l' 'v' . 'k' 'vv' 'kk' [0] K
 404                   'r' 'l' 'v' 'k' . 'vv' 'kk' [0] K
 405              'r' 'l' 'v' 'k' 'vv' . 'kk' [0] K
 406         'r' 'l' 'v' 'k' 'vv' 'kk' . [0] K
 407     'r' 'l' 'v' 'k' 'vv' 'kk' [0] . K
 408     'r' 'l' 'v' 'k' 'vv' 'kk' [0] . cons cons dipdd
 409     'r' 'l' 'v' 'k' 'vv' ['kk' 0] . cons dipdd
 410     'r' 'l' 'v' 'k' ['vv' 'kk' 0] . dipdd
 411                               'r' . 'vv' 'kk' 0 'l' 'v' 'k'
 412                          'r' 'vv' . 'kk' 0 'l' 'v' 'k'
 413                     'r' 'vv' 'kk' . 0 'l' 'v' 'k'
 414                   'r' 'vv' 'kk' 0 . 'l' 'v' 'k'
 415               'r' 'vv' 'kk' 0 'l' . 'v' 'k'
 416           'r' 'vv' 'kk' 0 'l' 'v' . 'k'
 417       'r' 'vv' 'kk' 0 'l' 'v' 'k' .
 418
 419
 420 .. code:: ipython2
 421
 422     V('["k" "v" "l" "r"] "vv" "kk" [0] T')
 423
 424
 425 .. parsed-literal::
 426
 427                                         . ['k' 'v' 'l' 'r'] 'vv' 'kk' [0] T
 428                       ['k' 'v' 'l' 'r'] . 'vv' 'kk' [0] T
 429                  ['k' 'v' 'l' 'r'] 'vv' . 'kk' [0] T
 430             ['k' 'v' 'l' 'r'] 'vv' 'kk' . [0] T
 431         ['k' 'v' 'l' 'r'] 'vv' 'kk' [0] . T
 432         ['k' 'v' 'l' 'r'] 'vv' 'kk' [0] . [K] cons cons cons infra
 433     ['k' 'v' 'l' 'r'] 'vv' 'kk' [0] [K] . cons cons cons infra
 434     ['k' 'v' 'l' 'r'] 'vv' 'kk' [[0] K] . cons cons infra
 435     ['k' 'v' 'l' 'r'] 'vv' ['kk' [0] K] . cons infra
 436     ['k' 'v' 'l' 'r'] ['vv' 'kk' [0] K] . infra
 437                         'r' 'l' 'v' 'k' . 'vv' 'kk' [0] K [] swaack
 438                    'r' 'l' 'v' 'k' 'vv' . 'kk' [0] K [] swaack
 439               'r' 'l' 'v' 'k' 'vv' 'kk' . [0] K [] swaack
 440           'r' 'l' 'v' 'k' 'vv' 'kk' [0] . K [] swaack
 441           'r' 'l' 'v' 'k' 'vv' 'kk' [0] . cons cons dipdd [] swaack
 442           'r' 'l' 'v' 'k' 'vv' ['kk' 0] . cons dipdd [] swaack
 443           'r' 'l' 'v' 'k' ['vv' 'kk' 0] . dipdd [] swaack
 444                                     'r' . 'vv' 'kk' 0 'l' 'v' 'k' [] swaack
 445                                'r' 'vv' . 'kk' 0 'l' 'v' 'k' [] swaack
 446                           'r' 'vv' 'kk' . 0 'l' 'v' 'k' [] swaack
 447                         'r' 'vv' 'kk' 0 . 'l' 'v' 'k' [] swaack
 448                     'r' 'vv' 'kk' 0 'l' . 'v' 'k' [] swaack
 449                 'r' 'vv' 'kk' 0 'l' 'v' . 'k' [] swaack
 450             'r' 'vv' 'kk' 0 'l' 'v' 'k' . [] swaack
 451          'r' 'vv' 'kk' 0 'l' 'v' 'k' [] . swaack
 452           ['k' 'v' 'l' 0 'kk' 'vv' 'r'] .
 453
 454
 455 If the key we’re adding is less than the node’s key.
 456 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 457
 458 This is very very similar to the above:
 459
 460 ::
 461
 462    [key_n value_n left right] value key [BTree-add] E
 463    [key_n value_n left right] value key [BTree-add] [P <] [Te] [Ee] ifte
 464
 465 In this case ``Te`` works that same as ``T`` but on the left child tree
 466 instead of the right, so the only difference is that it must use
 467 ``dipd`` instead of ``dipdd``:
 468
 469 ::
 470
 471    Te == [cons cons dipd] cons cons cons infra
 472
 473 This suggests an alternate factorization:
 474
 475 ::
 476
 477    ccons == cons cons
 478    T == [ccons dipdd] ccons cons infra
 479    Te == [ccons dipd] ccons cons infra
 480
 481 But whatever.
 482
 483 .. code:: ipython2
 484
 485     define('Te == [cons cons dipd] cons cons cons infra')
 486
 487 .. code:: ipython2
 488
 489     V('["k" "v" "l" "r"] "vv" "kk" [0] Te')
 490
 491
 492 .. parsed-literal::
 493
 494                                                      . ['k' 'v' 'l' 'r'] 'vv' 'kk' [0] Te
 495                                    ['k' 'v' 'l' 'r'] . 'vv' 'kk' [0] Te
 496                               ['k' 'v' 'l' 'r'] 'vv' . 'kk' [0] Te
 497                          ['k' 'v' 'l' 'r'] 'vv' 'kk' . [0] Te
 498                      ['k' 'v' 'l' 'r'] 'vv' 'kk' [0] . Te
 499                      ['k' 'v' 'l' 'r'] 'vv' 'kk' [0] . [cons cons dipd] cons cons cons infra
 500     ['k' 'v' 'l' 'r'] 'vv' 'kk' [0] [cons cons dipd] . cons cons cons infra
 501     ['k' 'v' 'l' 'r'] 'vv' 'kk' [[0] cons cons dipd] . cons cons infra
 502     ['k' 'v' 'l' 'r'] 'vv' ['kk' [0] cons cons dipd] . cons infra
 503     ['k' 'v' 'l' 'r'] ['vv' 'kk' [0] cons cons dipd] . infra
 504                                      'r' 'l' 'v' 'k' . 'vv' 'kk' [0] cons cons dipd [] swaack
 505                                 'r' 'l' 'v' 'k' 'vv' . 'kk' [0] cons cons dipd [] swaack
 506                            'r' 'l' 'v' 'k' 'vv' 'kk' . [0] cons cons dipd [] swaack
 507                        'r' 'l' 'v' 'k' 'vv' 'kk' [0] . cons cons dipd [] swaack
 508                        'r' 'l' 'v' 'k' 'vv' ['kk' 0] . cons dipd [] swaack
 509                        'r' 'l' 'v' 'k' ['vv' 'kk' 0] . dipd [] swaack
 510                                              'r' 'l' . 'vv' 'kk' 0 'v' 'k' [] swaack
 511                                         'r' 'l' 'vv' . 'kk' 0 'v' 'k' [] swaack
 512                                    'r' 'l' 'vv' 'kk' . 0 'v' 'k' [] swaack
 513                                  'r' 'l' 'vv' 'kk' 0 . 'v' 'k' [] swaack
 514                              'r' 'l' 'vv' 'kk' 0 'v' . 'k' [] swaack
 515                          'r' 'l' 'vv' 'kk' 0 'v' 'k' . [] swaack
 516                       'r' 'l' 'vv' 'kk' 0 'v' 'k' [] . swaack
 517                        ['k' 'v' 0 'kk' 'vv' 'l' 'r'] .
 518
 519
 520 Else the keys must be equal.
 521 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 522
 523 This means we must find:
 524
 525 ::
 526
 527    [key_n value_n left right] value key [BTree-add] Ee
 528        ...
 529    [key value left right]
 530
 531 This is another easy one:
 532
 533 ::
 534
 535    Ee == pop swap roll< rest rest cons cons
 536
 537    [key_n value_n left right] value key [BTree-add] pop swap roll< rest rest cons cons
 538    [key_n value_n left right] value key                 swap roll< rest rest cons cons
 539    [key_n value_n left right] key value                      roll< rest rest cons cons
 540    key value [key_n value_n left right]                            rest rest cons cons
 541    key value [              left right]                                      cons cons
 542              [key   value   left right]
 543
 544 .. code:: ipython2
 545
 546     define('Ee == pop swap roll< rest rest cons cons')
 547
 548 .. code:: ipython2
 549
 550     V('["k" "v" "l" "r"] "vv" "k" [0] Ee')
 551
 552
 553 .. parsed-literal::
 554
 555                                    . ['k' 'v' 'l' 'r'] 'vv' 'k' [0] Ee
 556                  ['k' 'v' 'l' 'r'] . 'vv' 'k' [0] Ee
 557             ['k' 'v' 'l' 'r'] 'vv' . 'k' [0] Ee
 558         ['k' 'v' 'l' 'r'] 'vv' 'k' . [0] Ee
 559     ['k' 'v' 'l' 'r'] 'vv' 'k' [0] . Ee
 560     ['k' 'v' 'l' 'r'] 'vv' 'k' [0] . pop swap roll< rest rest cons cons
 561         ['k' 'v' 'l' 'r'] 'vv' 'k' . swap roll< rest rest cons cons
 562         ['k' 'v' 'l' 'r'] 'k' 'vv' . roll< rest rest cons cons
 563         'k' 'vv' ['k' 'v' 'l' 'r'] . rest rest cons cons
 564             'k' 'vv' ['v' 'l' 'r'] . rest cons cons
 565                 'k' 'vv' ['l' 'r'] . cons cons
 566                 'k' ['vv' 'l' 'r'] . cons
 567                 ['k' 'vv' 'l' 'r'] .
 568
 569
 570 .. code:: ipython2
 571
 572     define('E == [P <] [Te] [Ee] ifte')
 573
 574 Now we can define ``BTree-add``
 575 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 576
 577 ::
 578
 579    BTree-add == [popop not] [[pop] dipd BTree-new] [] [[P >] [T] [E] ifte] genrec
 580
 581 Putting it all together:
 582
 583 ::
 584
 585    BTree-new == swap [[] []] cons cons
 586    P == pop roll> pop first
 587    T == [cons cons dipdd] cons cons cons infra
 588    Te == [cons cons dipd] cons cons cons infra
 589    Ee == pop swap roll< rest rest cons cons
 590    E == [P <] [Te] [Ee] ifte
 591
 592    BTree-add == [popop not] [[pop] dipd BTree-new] [] [[P >] [T] [E] ifte] genrec
 593
 594 .. code:: ipython2
 595
 596     define('BTree-add == [popop not] [[pop] dipd BTree-new] [] [[P >] [T] [E] ifte] genrec')
 597
 598 .. code:: ipython2
 599
 600     J('[] 23 "b" BTree-add')  # Initial
 601
 602
 603 .. parsed-literal::
 604
 605     ['b' 23 [] []]
 606
 607
 608 .. code:: ipython2
 609
 610     J('["b" 23 [] []] 88 "c" BTree-add')  # Less than
 611
 612
 613 .. parsed-literal::
 614
 615     ['b' 23 [] ['c' 88 [] []]]
 616
 617
 618 .. code:: ipython2
 619
 620     J('["b" 23 [] []] 88 "a" BTree-add')  # Greater than
 621
 622
 623 .. parsed-literal::
 624
 625     ['b' 23 ['a' 88 [] []] []]
 626
 627
 628 .. code:: ipython2
 629
 630     J('["b" 23 [] []] 88 "b" BTree-add')  # Equal to
 631
 632
 633 .. parsed-literal::
 634
 635     ['b' 88 [] []]
 636
 637
 638 .. code:: ipython2
 639
 640     J('[] 23 "a" BTree-add 88 "b" BTree-add 44 "c" BTree-add')  # Series.
 641
 642
 643 .. parsed-literal::
 644
 645     ['a' 23 [] ['b' 88 [] ['c' 44 [] []]]]
 646
 647
 648 We can use this to make a set-like datastructure by just setting values
 649 to e.g. 0 and ignoring them. It’s set-like in that duplicate items added
 650 to it will only occur once within it, and we can query it in
 651 `:math:`O(\log_2 N)` <https://en.wikipedia.org/wiki/Binary_search_tree#cite_note-2>`__
 652 time.
 653
 654 .. code:: ipython2
 655
 656     J('[] [3 9 5 2 8 6 7 8 4] [0 swap BTree-add] step')
 657
 658
 659 .. parsed-literal::
 660
 661     [3 0 [2 0 [] []] [9 0 [5 0 [4 0 [] []] [8 0 [6 0 [] [7 0 [] []]] []]] []]]
 662
 663
 664 .. code:: ipython2
 665
 666     define('to_set == [] swap [0 swap BTree-add] step')
 667
 668 .. code:: ipython2
 669
 670     J('[3 9 5 2 8 6 7 8 4] to_set')
 671
 672
 673 .. parsed-literal::
 674
 675     [3 0 [2 0 [] []] [9 0 [5 0 [4 0 [] []] [8 0 [6 0 [] [7 0 [] []]] []]] []]]
 676
 677
 678 And with that we can write a little program to remove duplicate items
 679 from a list.
 680
 681 .. code:: ipython2
 682
 683     define('unique == [to_set [first] BTree-iter] cons run')
 684
 685 .. code:: ipython2
 686
 687     J('[3 9 3 5 2 9 8 8 8 6 2 7 8 4 3] unique')  # Filter duplicate items.
 688
 689
 690 .. parsed-literal::
 691
 692     [7 6 8 4 5 9 2 3]
 693
 694
 695 ``cmp`` combinator
 696 ==================
 697
 698 Instead of all this mucking about with nested ``ifte`` let’s just go
 699 whole hog and define ``cmp`` which takes two values and three quoted
 700 programs on the stack and runs one of the three depending on the results
 701 of comparing the two values:
 702
 703 ::
 704
 705       a b [G] [E] [L] cmp
 706    ------------------------- a > b
 707            G
 708
 709       a b [G] [E] [L] cmp
 710    ------------------------- a = b
 711                E
 712
 713       a b [G] [E] [L] cmp
 714    ------------------------- a < b
 715                    L
 716
 717 We need a new non-destructive predicate ``P``:
 718
 719 ::
 720
 721    [key_n value_n left right] value key [BTree-add] P
 722    [key_n value_n left right] value key [BTree-add] over [Q] nullary
 723    [key_n value_n left right] value key [BTree-add] key  [Q] nullary
 724    [key_n value_n left right] value key [BTree-add] key   Q
 725    [key_n value_n left right] value key [BTree-add] key   popop popop first
 726    [key_n value_n left right] value key                         popop first
 727    [key_n value_n left right]                                         first
 728     key_n
 729    [key_n value_n left right] value key [BTree-add] key  [Q] nullary
 730    [key_n value_n left right] value key [BTree-add] key key_n
 731
 732    P == over [popop popop first] nullary
 733
 734 Here are the definitions again, pruned and renamed in some cases:
 735
 736 ::
 737
 738    BTree-new == swap [[] []] cons cons
 739    P == over [popop popop first] nullary
 740    T> == [cons cons dipdd] cons cons cons infra
 741    T< == [cons cons dipd] cons cons cons infra
 742    E == pop swap roll< rest rest cons cons
 743
 744 Using ``cmp`` to simplify `our code above at
 745 ``R1`` <#If-the-current-node-isn't-empty.>`__:
 746
 747 ::
 748
 749    [key_n value_n left right] value key [BTree-add] R1
 750    [key_n value_n left right] value key [BTree-add] P [T>] [E] [T<] cmp
 751
 752 The line above becomes one of the three lines below:
 753
 754 ::
 755
 756    [key_n value_n left right] value key [BTree-add] T>
 757    [key_n value_n left right] value key [BTree-add] E
 758    [key_n value_n left right] value key [BTree-add] T<
 759
 760 The definition is a little longer but, I think, more elegant and easier
 761 to understand:
 762
 763 ::
 764
 765    BTree-add == [popop not] [[pop] dipd BTree-new] [] [P [T>] [E] [T<] cmp] genrec
 766
 767 .. code:: ipython2
 768
 769     from joy.library import FunctionWrapper
 770     from joy.utils.stack import concat
 771     from notebook_preamble import D
 772
 773
 774     @FunctionWrapper
 775     def cmp_(stack, expression, dictionary):
 776         '''
 777         cmp takes two values and three quoted programs on the stack and runs
 778         one of the three depending on the results of comparing the two values:
 779
 780                a b [G] [E] [L] cmp
 781             ------------------------- a > b
 782                     G
 783
 784                a b [G] [E] [L] cmp
 785             ------------------------- a = b
 786                         E
 787
 788                a b [G] [E] [L] cmp
 789             ------------------------- a < b
 790                             L
 791         '''
 792         L, (E, (G, (b, (a, stack)))) = stack
 793         expression = concat(G if a > b else L if a < b else E, expression)
 794         return stack, expression, dictionary
 795
 796
 797     D['cmp'] = cmp_
 798
 799 .. code:: ipython2
 800
 801     from joy.library import FunctionWrapper, S_ifte
 802
 803
 804     @FunctionWrapper
 805     def cond(stack, expression, dictionary):
 806       '''
 807       like a case statement; works by rewriting into a chain of ifte.
 808
 809       [..[[Bi] Ti]..[D]] -> ...
 810
 811
 812             [[[B0] T0] [[B1] T1] [D]] cond
 813       -----------------------------------------
 814          [B0] [T0] [[B1] [T1] [D] ifte] ifte
 815
 816       '''
 817       conditions, stack = stack
 818       if conditions:
 819         expression = _cond(conditions, expression)
 820         try:
 821           # Attempt to preload the args to first ifte.
 822           (P, (T, (E, expression))) = expression
 823         except ValueError:
 824           # If, for any reason, the argument to cond should happen to contain
 825           # only the default clause then this optimization will fail.
 826           pass
 827         else:
 828           stack = (E, (T, (P, stack)))
 829       return stack, expression, dictionary
 830
 831
 832     def _cond(conditions, expression):
 833       (clause, rest) = conditions
 834       if not rest:  # clause is [D]
 835         return clause
 836       P, T = clause
 837       return (P, (T, (_cond(rest, ()), (S_ifte, expression))))
 838
 839
 840
 841     D['cond'] = cond
 842
 843 .. code:: ipython2
 844
 845     J("1 0 ['G'] ['E'] ['L'] cmp")
 846
 847
 848 .. parsed-literal::
 849
 850     'G'
 851
 852
 853 .. code:: ipython2
 854
 855     J("1 1 ['G'] ['E'] ['L'] cmp")
 856
 857
 858 .. parsed-literal::
 859
 860     'E'
 861
 862
 863 .. code:: ipython2
 864
 865     J("0 1 ['G'] ['E'] ['L'] cmp")
 866
 867
 868 .. parsed-literal::
 869
 870     'L'
 871
 872
 873 .. code:: ipython2
 874
 875     from joy.library import DefinitionWrapper
 876
 877
 878     DefinitionWrapper.add_definitions('''
 879
 880     P == over [popop popop first] nullary
 881     T> == [cons cons dipdd] cons cons cons infra
 882     T< == [cons cons dipd] cons cons cons infra
 883     E == pop swap roll< rest rest cons cons
 884
 885     BTree-add == [popop not] [[pop] dipd BTree-new] [] [P [T>] [E] [T<] cmp] genrec
 886
 887     ''', D)
 888
 889 .. code:: ipython2
 890
 891     J('[] 23 "b" BTree-add')  # Initial
 892
 893
 894 .. parsed-literal::
 895
 896     ['b' 23 [] []]
 897
 898
 899 .. code:: ipython2
 900
 901     J('["b" 23 [] []] 88 "c" BTree-add')  # Less than
 902
 903
 904 .. parsed-literal::
 905
 906     ['b' 23 [] ['c' 88 [] []]]
 907
 908
 909 .. code:: ipython2
 910
 911     J('["b" 23 [] []] 88 "a" BTree-add')  # Greater than
 912
 913
 914 .. parsed-literal::
 915
 916     ['b' 23 ['a' 88 [] []] []]
 917
 918
 919 .. code:: ipython2
 920
 921     J('["b" 23 [] []] 88 "b" BTree-add')  # Equal to
 922
 923
 924 .. parsed-literal::
 925
 926     ['b' 88 [] []]
 927
 928
 929 .. code:: ipython2
 930
 931     J('[] 23 "a" BTree-add 88 "b" BTree-add 44 "c" BTree-add')  # Series.
 932
 933
 934 .. parsed-literal::
 935
 936     ['a' 23 [] ['b' 88 [] ['c' 44 [] []]]]
 937
 938
 939 Factoring and naming
 940 ====================
 941
 942 It may seem silly, but a big part of programming in Forth (and therefore
 943 in Joy) is the idea of small, highly-factored definitions. If you choose
 944 names carefully the resulting definitions can take on a semantic role.
 945
 946 ::
 947
 948    get-node-key == popop popop first
 949    remove-key-and-value-from-node == rest rest
 950    pack-key-and-value == cons cons
 951    prep-new-key-and-value == pop swap roll<
 952    pack-and-apply == [pack-key-and-value] swoncat cons pack-key-and-value infra
 953
 954    BTree-new == swap [[] []] pack-key-and-value
 955    P == over [get-node-key] nullary
 956    T> == [dipdd] pack-and-apply
 957    T< == [dipd] pack-and-apply
 958    E == prep-new-key-and-value remove-key-and-value-from-node pack-key-and-value
 959
 960 A Version of ``BTree-iter`` that does In-Order Traversal
 961 ========================================================
 962
 963 If you look back to the `non-empty case of the ``BTree-iter``
 964 function <#Node-case-%5Bkey-value-left-right%5D>`__ we can design a
 965 varient that first processes the left child, then the current node, then
 966 the right child. This will allow us to traverse the tree in sort order.
 967
 968 ::
 969
 970    BTree-iter-order == [not] [pop] [R0 [BTree-iter] R1] ifte
 971
 972 To define ``R0`` and ``R1`` it helps to look at them as they will appear
 973 when they run:
 974
 975 ::
 976
 977    [key value left right] R0 [BTree-iter-order] R1
 978
 979 Process the left child.
 980 ^^^^^^^^^^^^^^^^^^^^^^^
 981
 982 Staring at this for a bit suggests ``dup third`` to start:
 983
 984 ::
 985
 986    [key value left right] R0        [BTree-iter-order] R1
 987    [key value left right] dup third [BTree-iter-order] R1
 988    [key value left right] left      [BTree-iter-order] R1
 989
 990 Now maybe:
 991
 992 ::
 993
 994    [key value left right] left [BTree-iter-order] [cons dip] dupdip
 995    [key value left right] left [BTree-iter-order] cons dip [BTree-iter-order]
 996    [key value left right] [left BTree-iter-order]      dip [BTree-iter-order]
 997    left BTree-iter-order [key value left right]            [BTree-iter-order]
 998
 999 Process the current node.
1000 ^^^^^^^^^^^^^^^^^^^^^^^^^
1001
1002 So far, so good. Now we need to process the current node’s values:
1003
1004 ::
1005
1006    left BTree-iter-order [key value left right] [BTree-iter-order] [[F] dupdip] dip
1007    left BTree-iter-order [key value left right] [F] dupdip [BTree-iter-order]
1008    left BTree-iter-order [key value left right] F [key value left right] [BTree-iter-order]
1009
1010 If ``F`` needs items from the stack below the left stuff it should have
1011 ``cons``\ ’d them before beginning maybe? For functions like ``first``
1012 it works fine as-is.
1013
1014 ::
1015
1016    left BTree-iter-order [key value left right] first [key value left right] [BTree-iter-order]
1017    left BTree-iter-order key [key value left right] [BTree-iter-order]
1018
1019 Process the right child.
1020 ^^^^^^^^^^^^^^^^^^^^^^^^
1021
1022 First ditch the rest of the node and get the right child:
1023
1024 ::
1025
1026    left BTree-iter-order key [key value left right] [BTree-iter-order] [rest rest rest first] dip
1027    left BTree-iter-order key right [BTree-iter-order]
1028
1029 Then, of course, we just need ``i`` to run ``BTree-iter-order`` on the
1030 right side:
1031
1032 ::
1033
1034    left BTree-iter-order key right [BTree-iter-order] i
1035    left BTree-iter-order key right BTree-iter-order
1036
1037 Defining ``BTree-iter-order``
1038 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1039
1040 The result is a little awkward:
1041
1042 ::
1043
1044    R1 == [cons dip] dupdip [[F] dupdip] dip [rest rest rest first] dip i
1045
1046 Let’s do a little semantic factoring:
1047
1048 ::
1049
1050    fourth == rest rest rest first
1051
1052    proc_left == [cons dip] dupdip
1053    proc_current == [[F] dupdip] dip
1054    proc_right == [fourth] dip i
1055
1056    BTree-iter-order == [not] [pop] [dup third] [proc_left proc_current proc_right] genrec
1057
1058 Now we can sort sequences.
1059
1060 .. code:: ipython2
1061
1062     define('BTree-iter-order == [not] [pop] [dup third] [[cons dip] dupdip [[first] dupdip] dip [rest rest rest first] dip i] genrec')
1063
1064 .. code:: ipython2
1065
1066     J('[3 9 5 2 8 6 7 8 4] to_set BTree-iter-order')
1067
1068
1069 .. parsed-literal::
1070
1071     2 3 4 5 6 7 8 9
1072
1073
1074 Getting values by key
1075 =====================
1076
1077 Let’s derive a function that accepts a tree and a key and returns the
1078 value associated with that key.
1079
1080 ::
1081
1082       tree key BTree-get
1083    ------------------------
1084            value
1085
1086 The base case ``[]``
1087 ^^^^^^^^^^^^^^^^^^^^
1088
1089 As before, the stopping predicate just has to detect the empty list:
1090
1091 ::
1092
1093    BTree-get == [pop not] [E] [R0] [R1] genrec
1094
1095 But what do we do if the key isn’t in the tree? In Python we might raise
1096 a ``KeyError`` but I’d like to avoid exceptions in Joy if possible, and
1097 here I think it’s possible. (Division by zero is an example of where I
1098 think it’s probably better to let Python crash Joy. Sometimes the
1099 machinery fails and you have to “stop the line”, methinks.)
1100
1101 Let’s pass the buck to the caller by making the base case a given, you
1102 have to decide for yourself what ``[E]`` should be.
1103
1104 ::
1105
1106       tree key [E] BTree-get
1107    ---------------------------- key in tree
1108               value
1109
1110       tree key [E] BTree-get
1111    ---------------------------- key not in tree
1112             tree key E
1113
1114 Now we define:
1115
1116 ::
1117
1118    BTree-get == [pop not] swap [R0] [R1] genrec
1119
1120 Note that this ``BTree-get`` creates a slightly different function than
1121 itself and *that function* does the actual recursion. This kind of
1122 higher-level programming is unusual in most languages but natural in
1123 Joy.
1124
1125 ::
1126
1127    tree key [E] [pop not] swap [R0] [R1] genrec
1128    tree key [pop not] [E] [R0] [R1] genrec
1129
1130 The anonymous specialized recursive function that will do the real work.
1131
1132 ::
1133
1134    [pop not] [E] [R0] [R1] genrec
1135
1136 Node case ``[key value left right]``
1137 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1138
1139 Now we need to figure out ``R0`` and ``R1``:
1140
1141 ::
1142
1143    [key value left right] key R0 [BTree-get] R1
1144
1145 We want to compare the search key with the key in the node, and if they
1146 are the same return the value and if they differ then recurse on one of
1147 the child nodes. So it’s very similar to the above funtion, with
1148 ``[R0] == []`` and ``R1 == P [T>] [E] [T<] cmp``:
1149
1150 ::
1151
1152    [key value left right] key [BTree-get] P [T>] [E] [T<] cmp
1153
1154 So:
1155
1156 ::
1157
1158    get-node-key == pop popop first
1159    P == over [get-node-key] nullary
1160
1161 The only difference is that ``get-node-key`` does one less ``pop``
1162 because there’s no value to discard. Now we have to derive the branches:
1163
1164 ::
1165
1166    [key_n value_n left right] key [BTree-get] T>
1167    [key_n value_n left right] key [BTree-get] E
1168    [key_n value_n left right] key [BTree-get] T<
1169
1170 The cases of ``T>`` and ``T<`` are similar to above but instead of using
1171 ``infra`` we have to discard the rest of the structure:
1172
1173 ::
1174
1175    [key_n value_n left right] key [BTree-get] T> == right key BTree-get
1176    [key_n value_n left right] key [BTree-get] T< == left key BTree-get
1177
1178 So:
1179
1180 ::
1181
1182    T> == [fourth] dipd i
1183    T< == [third] dipd i
1184
1185 E.g.:
1186
1187 ::
1188
1189    [key_n value_n left right]        key [BTree-get] [fourth] dipd i
1190    [key_n value_n left right] fourth key [BTree-get]               i
1191                        right         key [BTree-get]               i
1192                        right         key  BTree-get
1193
1194 And:
1195
1196 ::
1197
1198    [key_n value_n left right] key [BTree-get] E == value_n
1199
1200    E == popop second
1201
1202 So:
1203
1204 ::
1205
1206    fourth == rest rest rest first
1207    get-node-key == pop popop first
1208    P == over [get-node-key] nullary
1209    T> == [fourth] dipd i
1210    T< == [third] dipd i
1211    E == popop second
1212
1213    BTree-get == [pop not] swap [] [P [T>] [E] [T<] cmp] genrec
1214
1215 .. code:: ipython2
1216
1217     # I don't want to deal with name conflicts with the above so I'm inlining everything here.
1218     # The original Joy system has "hide" which is a meta-command which allows you to use named
1219     # definitions that are only in scope for a given definition.  I don't want to implement
1220     # that (yet) so...
1221
1222
1223     define('''
1224     BTree-get == [pop not] swap [] [
1225       over [pop popop first] nullary
1226       [[rest rest rest first] dipd i]
1227       [popop second]
1228       [[third] dipd i]
1229       cmp
1230       ] genrec
1231     ''')
1232
1233 .. code:: ipython2
1234
1235     J('[] "gary" [popop "err"] BTree-get')
1236
1237
1238 .. parsed-literal::
1239
1240     'err'
1241
1242
1243 .. code:: ipython2
1244
1245     J('["gary" 23 [] []] "gary" [popop "err"] BTree-get')
1246
1247
1248 .. parsed-literal::
1249
1250     23
1251
1252
1253 .. code:: ipython2
1254
1255     J('''
1256
1257         [] [[0 'a'] [1 'b'] [2 'c']] [i BTree-add] step
1258
1259         'c' [popop 'not found'] BTree-get
1260
1261     ''')
1262
1263
1264 .. parsed-literal::
1265
1266     2
1267
1268
1269 BTree-delete
1270 ============
1271
1272 Now let’s write a function that can return a tree datastructure with a
1273 key, value pair deleted:
1274
1275 ::
1276
1277       tree key BTree-delete
1278    ---------------------------
1279           tree
1280
1281 If the key is not in tree it just returns the tree unchanged.
1282
1283 So:
1284
1285 ::
1286
1287    BTree-Delete == [pop not] swap [R0] [R1] genrec
1288
1289 ::
1290
1291                 [Er] BTree-delete
1292    -------------------------------------
1293       [pop not] [Er] [R0] [R1] genrec
1294
1295 ::
1296
1297    [n_key n_value left right] [BTree-get]
1298    [n_key n_value left right] [BTree-get] E
1299    [n_key n_value left right] [BTree-get] T<
1300
1301 Now we get to figure out the recursive case:
1302
1303 ::
1304
1305    w/ D == [pop not] [Er] [R0] [R1] genrec
1306
1307    [node_key node_value left right] key R0                  [D] R1
1308    [node_key node_value left right] key over first swap dup [D] R1
1309    [node_key node_value left right] node_key key key        [D] R1
1310
1311 And then:
1312
1313 ::
1314
1315    [node_key node_value left right] node_key key key [D] R1
1316    [node_key node_value left right] node_key key key [D] cons roll> [T>] [E] [T<] cmp
1317    [node_key node_value left right] node_key key [key D]      roll> [T>] [E] [T<] cmp
1318    [node_key node_value left right] [key D] node_key key            [T>] [E] [T<] cmp
1319
1320 Now this:;
1321
1322 ::
1323
1324    [node_key node_value left right] [key D] node_key key [T>] [E] [T<] cmp
1325
1326 Becomes one of these three:;
1327
1328 ::
1329
1330    [node_key node_value left right] [key D] T>
1331    [node_key node_value left right] [key D] E
1332    [node_key node_value left right] [key D] T<
1333
1334 Greater than case and less than case
1335 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1336
1337 ::
1338
1339       [node_key node_value left right] [key D] T>
1340    -------------------------------------------------
1341       [node_key node_value left key D right]
1342
1343 First:
1344
1345 ::
1346
1347    right left       node_value node_key [key D] dipd
1348    right left key D node_value node_key
1349    right left'      node_value node_key
1350
1351 Ergo:
1352
1353 ::
1354
1355    [node_key node_value left right] [key D] [dipd] cons infra
1356
1357 So:
1358
1359 ::
1360
1361    T> == [dipd] cons infra
1362    T< == [dipdd] cons infra
1363
1364 The else case
1365 ~~~~~~~~~~~~~
1366
1367 ::
1368
1369    [node_key node_value left right] [key D] E
1370
1371 We have to handle three cases, so let’s use ``cond``.
1372
1373 The first two cases are symmetrical, if we only have one non-empty child
1374 node return it.
1375
1376 ::
1377
1378    E == [
1379        [[pop third not] pop fourth]
1380        [[pop fourth not] pop third]
1381        [default]
1382    ] cond
1383
1384 (If both child nodes are empty return an empty node.)
1385
1386 The initial structure of the default function:
1387
1388 ::
1389
1390    default == [E'] cons infra
1391
1392    [node_key node_value left right] [key D] default
1393    [node_key node_value left right] [key D] [E'] cons infra
1394    [node_key node_value left right] [[key D] E']      infra
1395
1396    right left node_value node_key [key D] E'
1397
1398 If both child nodes are non-empty, we find the highest node in our lower
1399 sub-tree, take its key and value to replace (delete) our own, then get
1400 rid of it by recursively calling delete() on our lower sub-node with our
1401 new key.
1402
1403 (We could also find the lowest node in our higher sub-tree and take its
1404 key and value and delete it. I only implemented one of these two
1405 symmetrical options. Over a lot of deletions this might make the tree
1406 more unbalanced. Oh well.)
1407
1408 First things first, we no longer need this node’s key and value:
1409
1410 ::
1411
1412    right left node_value node_key [key D] roll> popop E''
1413    right left [key D] node_value node_key       popop E''
1414    right left [key D]                                 E''
1415
1416 Then we have to we find the highest (right-most) node in our lower
1417 (left) sub-tree:
1418
1419 ::
1420
1421    right left [key D] E''
1422
1423 Ditch the key:
1424
1425 ::
1426
1427    right left [key D] rest E'''
1428    right left     [D]      E'''
1429
1430 Find the right-most node:
1431
1432 ::
1433
1434    right left        [D] [dup W] dip E''''
1435    right left dup  W [D]             E''''
1436    right left left W [D]             E''''
1437
1438 Consider:
1439
1440 ::
1441
1442    left W
1443
1444 We know left is not empty:
1445
1446 ::
1447
1448    [L_key L_value L_left L_right] W
1449
1450 We want to keep extracting the right node as long as it is not empty:
1451
1452 ::
1453
1454    left [P] [B] while W'
1455
1456 The predicate:
1457
1458 ::
1459
1460    [L_key L_value L_left L_right] P
1461    [L_key L_value L_left L_right] fourth
1462                          L_right
1463
1464
1465 (This has a bug, can run on ``[]`` so must be guarded:
1466
1467 ::
1468
1469    if_not_empty == [] swap [] ifte
1470    ?fourth == [fourth] if_not_empty
1471    W.rightmost == [?fourth] [fourth] while
1472
1473 The body is also ``fourth``:
1474
1475 ::
1476
1477    left [fourth] [fourth] while W'
1478    rightest                     W'
1479
1480 We know rightest is not empty:
1481
1482 ::
1483
1484    [R_key R_value R_left R_right] W'
1485    [R_key R_value R_left R_right] uncons uncons pop
1486    R_key [R_value R_left R_right]        uncons pop
1487    R_key R_value [R_left R_right]               pop
1488    R_key R_value
1489
1490 So:
1491
1492 ::
1493
1494    W == [fourth] [fourth] while uncons uncons pop
1495
1496 And:
1497
1498 ::
1499
1500    right left left W        [D] E''''
1501    right left R_key R_value [D] E''''
1502
1503 Final stretch. We want to end up with something like:
1504
1505 ::
1506
1507    right left [R_key D] i R_value R_key
1508    right left  R_key D    R_value R_key
1509    right left'            R_value R_key
1510
1511 If we adjust our definition of ``W`` to include ``over`` at the end:
1512
1513 ::
1514
1515    W == [fourth] [fourth] while uncons uncons pop over
1516
1517 That will give us:
1518
1519 ::
1520
1521    right left R_key R_value R_key [D] E''''
1522
1523    right left         R_key R_value R_key [D] cons dipdd E'''''
1524    right left         R_key R_value [R_key D]      dipdd E'''''
1525    right left R_key D R_key R_value                      E'''''
1526    right left'        R_key R_value                      E'''''
1527    right left'        R_key R_value                      swap
1528    right left' R_value R_key
1529
1530 So:
1531
1532 ::
1533
1534    E' == roll> popop E''
1535
1536    E'' == rest E'''
1537
1538    E''' == [dup W] dip E''''
1539
1540    E'''' == cons dipdd swap
1541
1542 Substituting:
1543
1544 ::
1545
1546    W == [fourth] [fourth] while uncons uncons pop over
1547    E' == roll> popop rest [dup W] dip cons dipdd swap
1548    E == [
1549        [[pop third not] pop fourth]
1550        [[pop fourth not] pop third]
1551        [[E'] cons infra]
1552    ] cond
1553
1554 Minor rearrangement:
1555
1556 ::
1557
1558    W == dup [fourth] [fourth] while uncons uncons pop over
1559    E' == roll> popop rest [W] dip cons dipdd swap
1560    E == [
1561        [[pop third not] pop fourth]
1562        [[pop fourth not] pop third]
1563        [[E'] cons infra]
1564    ] cond
1565
1566 Refactoring
1567 ~~~~~~~~~~~
1568
1569 ::
1570
1571    W.rightmost == [fourth] [fourth] while
1572    W.unpack == uncons uncons pop
1573    E.clear_stuff == roll> popop rest
1574    E.delete == cons dipdd
1575    W == dup W.rightmost W.unpack over
1576    E.0 == E.clear_stuff [W] dip E.delete swap
1577    E == [
1578        [[pop third not] pop fourth]
1579        [[pop fourth not] pop third]
1580        [[E.0] cons infra]
1581    ] cond
1582    T> == [dipd] cons infra
1583    T< == [dipdd] cons infra
1584    R0 == over first swap dup
1585    R1 == cons roll> [T>] [E] [T<] cmp
1586    BTree-Delete == [pop not] swap [R0] [R1] genrec
1587
1588 By the standards of the code I’ve written so far, this is a *huge* Joy
1589 program.
1590
1591 .. code:: ipython2
1592
1593     DefinitionWrapper.add_definitions('''
1594     first_two == uncons uncons pop
1595     fourth == rest rest rest first
1596     ?fourth == [] [fourth] [] ifte
1597     W.rightmost == [?fourth] [fourth] while
1598     E.clear_stuff == roll> popop rest
1599     E.delete == cons dipdd
1600     W == dup W.rightmost first_two over
1601     E.0 == E.clear_stuff [W] dip E.delete swap
1602     E == [[[pop third not] pop fourth] [[pop fourth not] pop third] [[E.0] cons infra]] cond
1603     T> == [dipd] cons infra
1604     T< == [dipdd] cons infra
1605     R0 == over first swap dup
1606     R1 == cons roll> [T>] [E] [T<] cmp
1607     BTree-Delete == [pop not] swap [R0] [R1] genrec''', D)
1608
1609 .. code:: ipython2
1610
1611     J("['a' 23 [] ['b' 88 [] ['c' 44 [] []]]] 'c' ['Er'] BTree-Delete ")
1612
1613
1614 .. parsed-literal::
1615
1616     ['a' 23 [] ['b' 88 [] []]]
1617
1618
1619 .. code:: ipython2
1620
1621     J("['a' 23 [] ['b' 88 [] ['c' 44 [] []]]] 'b' ['Er'] BTree-Delete ")
1622
1623
1624 .. parsed-literal::
1625
1626     ['a' 23 [] ['c' 44 [] []]]
1627
1628
1629 .. code:: ipython2
1630
1631     J("['a' 23 [] ['b' 88 [] ['c' 44 [] []]]] 'a' ['Er'] BTree-Delete ")
1632
1633
1634 .. parsed-literal::
1635
1636     ['b' 88 [] ['c' 44 [] []]]
1637
1638
1639 .. code:: ipython2
1640
1641     J("['a' 23 [] ['b' 88 [] ['c' 44 [] []]]] 'der' ['Er'] BTree-Delete ")
1642
1643
1644 .. parsed-literal::
1645
1646     ['a' 23 [] ['b' 88 [] ['c' 44 [] 'Er' 'der' []]]]
1647
1648
1649 .. code:: ipython2
1650
1651     J("['a' 23 [] ['b' 88 [] ['c' 44 [] []]]] 'der' [pop] BTree-Delete ")
1652
1653
1654 .. parsed-literal::
1655
1656     ['a' 23 [] ['b' 88 [] ['c' 44 [] []]]]
1657
1658
1659 One bug, I forgot to put ``not`` in the first two clauses of the
1660 ``cond``.
1661
1662 The behavior of the ``[Er]`` function should maybe be different: either
1663 just silently fail, or maybe implement some sort of function that can
1664 grab the pending expression up to a sentinel value or something,
1665 allowing for a kind of “except”-ish control-flow?
1666
1667 Then, once we have add, get, and delete we can see about abstracting
1668 them.
1669
1670 Tree with node and list of trees.
1671 =================================
1672
1673 Let’s consider a tree structure, similar to one described `“Why
1674 functional programming matters” by John
1675 Hughes <https://www.cs.kent.ac.uk/people/staff/dat/miranda/whyfp90.pdf>`__,
1676 that consists of a node value and a sequence of zero or more child
1677 trees. (The asterisk is meant to indicate the `Kleene
1678 star <https://en.wikipedia.org/wiki/Kleene_star>`__.)
1679
1680 ::
1681
1682    tree = [] | [node [tree*]]
1683
1684 ``treestep``
1685 ~~~~~~~~~~~~
1686
1687 In the spirit of ``step`` we are going to define a combinator
1688 ``treestep`` which expects a tree and three additional items: a
1689 base-case value ``z``, and two quoted programs ``[C]`` and ``[N]``.
1690
1691 ::
1692
1693    tree z [C] [N] treestep
1694
1695 If the current tree node is empty then just leave ``z`` on the stack in
1696 lieu:
1697
1698 ::
1699
1700       [] z [C] [N] treestep
1701    ---------------------------
1702          z
1703
1704 Otherwise, evaluate ``N`` on the node value, ``map`` the whole function
1705 (abbreviated here as ``k``) over the child trees recursively, and then
1706 combine the result with ``C``.
1707
1708 ::
1709
1710       [node [tree*]] z [C] [N] treestep
1711    --------------------------------------- w/ K == z [C] [N] treestep
1712           node N [tree*] [K] map C
1713
1714 Derive the recursive form.
1715 ~~~~~~~~~~~~~~~~~~~~~~~~~~
1716
1717 Since this is a recursive function, we can begin to derive it by finding
1718 the ``ifte`` stage that ``genrec`` will produce. The predicate and
1719 base-case functions are trivial, so we just have to derive ``J``.
1720
1721 ::
1722
1723    K == [not] [pop z] [J] ifte
1724
1725 The behavior of ``J`` is to accept a (non-empty) tree node and arrive at
1726 the desired outcome.
1727
1728 ::
1729
1730           [node [tree*]] J
1731    ------------------------------
1732       node N [tree*] [K] map C
1733
1734 So ``J`` will have some form like:
1735
1736 ::
1737
1738    J == .. [N] .. [K] .. [C] ..
1739
1740 Let’s dive in. First, unquote the node and ``dip`` ``N``.
1741
1742 ::
1743
1744    [node [tree*]] i [N] dip
1745     node [tree*]    [N] dip
1746    node N [tree*]
1747
1748 Next, ``map`` ``K`` over teh child trees and combine with ``C``.
1749
1750 ::
1751
1752    node N [tree*] [K] map C
1753    node N [tree*] [K] map C
1754    node N [K.tree*]       C
1755
1756 So:
1757
1758 ::
1759
1760    J == i [N] dip [K] map C
1761
1762 Plug it in and convert to ``genrec``:
1763
1764 ::
1765
1766    K == [not] [pop z] [i [N] dip [K] map C] ifte
1767    K == [not] [pop z] [i [N] dip]   [map C] genrec
1768
1769 Extract the givens to parameterize the program.
1770 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1771
1772 ::
1773
1774    [not] [pop z] [i [N] dip]   [map C] genrec
1775
1776    [not] [pop z]                   [i [N] dip] [map C] genrec
1777    [not] [z]         [pop] swoncat [i [N] dip] [map C] genrec
1778    [not]  z     unit [pop] swoncat [i [N] dip] [map C] genrec
1779    z [not] swap unit [pop] swoncat [i [N] dip] [map C] genrec
1780      \  .........TS0............./
1781       \/
1782    z TS0 [i [N] dip]                       [map C] genrec
1783    z     [i [N] dip]             [TS0] dip [map C] genrec
1784    z       [[N] dip] [i] swoncat [TS0] dip [map C] genrec
1785    z  [N] [dip] cons [i] swoncat [TS0] dip [map C] genrec
1786           \  ......TS1........./
1787            \/
1788    z [N] TS1 [TS0] dip [map C]                      genrec
1789    z [N]               [map C]  [TS1 [TS0] dip] dip genrec
1790    z [N] [C]      [map] swoncat [TS1 [TS0] dip] dip genrec
1791    z [C] [N] swap [map] swoncat [TS1 [TS0] dip] dip genrec
1792
1793 The givens are all to the left so we have our definition.
1794
1795 Define ``treestep``
1796 ~~~~~~~~~~~~~~~~~~~
1797
1798 ::
1799
1800         TS0 == [not] swap unit [pop] swoncat
1801         TS1 == [dip] cons [i] swoncat
1802    treestep == swap [map] swoncat [TS1 [TS0] dip] dip genrec
1803
1804 .. code:: ipython2
1805
1806     DefinitionWrapper.add_definitions('''
1807
1808          TS0 == [not] swap unit [pop] swoncat
1809          TS1 == [dip] cons [i] swoncat
1810     treestep == swap [map] swoncat [TS1 [TS0] dip] dip genrec
1811
1812     ''', D)
1813
1814 ::
1815
1816       [] 0 [C] [N] treestep
1817    ---------------------------
1818          0
1819
1820
1821          [n [tree*]] 0 [sum +] [] treestep
1822       --------------------------------------------------
1823           n [tree*] [0 [sum +] [] treestep] map sum +
1824
1825 .. code:: ipython2
1826
1827     J('[] 0 [sum +] [] treestep')
1828
1829
1830 .. parsed-literal::
1831
1832     0
1833
1834
1835 .. code:: ipython2
1836
1837     J('[23 []] 0 [sum +] [] treestep')
1838
1839
1840 .. parsed-literal::
1841
1842     23
1843
1844
1845 .. code:: ipython2
1846
1847     J('[23 [[2 []] [3 []]]] 0 [sum +] [] treestep')
1848
1849
1850 .. parsed-literal::
1851
1852     28
1853
1854
1855 A slight modification.
1856 ----------------------
1857
1858 Let’s simplify the tree datastructure definition slightly by just
1859 letting the children be the ``rest`` of the tree:
1860
1861 ::
1862
1863    tree = [] | [node tree*]
1864
1865 The ``J`` function changes slightly.
1866
1867 ::
1868
1869            [node tree*] J
1870    ------------------------------
1871       node N [tree*] [K] map C
1872
1873
1874    [node tree*] uncons [N] dip [K] map C
1875    node [tree*]        [N] dip [K] map C
1876    node N [tree*]              [K] map C
1877    node N [tree*]              [K] map C
1878    node N [K.tree*]                    C
1879
1880    J == uncons [N] dip [K] map C
1881
1882    K == [not] [pop z] [uncons [N] dip] [map C] genrec
1883
1884 .. code:: ipython2
1885
1886     define('TS1 == [dip] cons [uncons] swoncat')  # We only need to redefine one word.
1887
1888 .. code:: ipython2
1889
1890     J('[23 [2] [3]] 0 [sum +] [] treestep')
1891
1892
1893 .. parsed-literal::
1894
1895     28
1896
1897
1898 .. code:: ipython2
1899
1900     J('[23 [2 [8] [9]] [3] [4 []]] 0 [sum +] [] treestep')
1901
1902
1903 .. parsed-literal::
1904
1905     49
1906
1907
1908 I think these trees seem a little easier to read.
1909
1910 Redefining our BTree in terms of this form.
1911 -------------------------------------------
1912
1913 ::
1914
1915    BTree = [] | [[key value] left right]
1916
1917 What kind of functions can we write for this with our ``treestep``? The
1918 pattern for processing a non-empty node is:
1919
1920 ::
1921
1922    node N [tree*] [K] map C
1923
1924 Plugging in our BTree structure:
1925
1926 ::
1927
1928    [key value] N [left right] [K] map C
1929
1930
1931    [key value] uncons pop [left right] [K] map i
1932    key [value]        pop [left right] [K] map i
1933    key                    [left right] [K] map i
1934    key                    [lkey rkey ]         i
1935    key                     lkey rkey
1936
1937 .. code:: ipython2
1938
1939     J('[[3 0] [[2 0] [] []] [[9 0] [[5 0] [[4 0] [] []] [[8 0] [[6 0] [] [[7 0] [] []]] []]] []]]   23 [i] [uncons pop] treestep')
1940
1941
1942 .. parsed-literal::
1943
1944     3 23 23
1945
1946
1947 Doesn’t work because ``map`` extracts the ``first`` item of whatever its
1948 mapped function produces. We have to return a list, rather than
1949 depositing our results directly on the stack.
1950
1951 ::
1952
1953    [key value] N     [left right] [K] map C
1954
1955    [key value] first [left right] [K] map flatten cons
1956    key               [left right] [K] map flatten cons
1957    key               [[lk] [rk] ]         flatten cons
1958    key               [ lk   rk  ]                 cons
1959                      [key  lk   rk  ]
1960
1961 So:
1962
1963 ::
1964
1965    [] [flatten cons] [first] treestep
1966
1967 .. code:: ipython2
1968
1969     J('[[3 0] [[2 0] [] []] [[9 0] [[5 0] [[4 0] [] []] [[8 0] [[6 0] [] [[7 0] [] []]] []]] []]]   [] [flatten cons] [first] treestep')
1970
1971
1972 .. parsed-literal::
1973
1974     [3 2 9 5 4 8 6 7]
1975
1976
1977 There we go. #### In-order traversal with ``treestep``.
1978
1979 From here:
1980
1981 ::
1982
1983    key [[lk] [rk]] C
1984    key [[lk] [rk]] i
1985    key  [lk] [rk] roll<
1986    [lk] [rk] key swons concat
1987    [lk] [key rk]       concat
1988    [lk   key rk]
1989
1990 So:
1991
1992 ::
1993
1994    [] [i roll< swons concat] [first] treestep
1995
1996 .. code:: ipython2
1997
1998     J('[[3 0] [[2 0] [] []] [[9 0] [[5 0] [[4 0] [] []] [[8 0] [[6 0] [] [[7 0] [] []]] []]] []]]   [] [i roll< swons concat] [uncons pop] treestep')
1999
2000
2001 .. parsed-literal::
2002
2003     [2 3 4 5 6 7 8 9]
2004
2005
2006 Miscellaneous Crap
2007 ------------------
2008
2009 Toy with it.
2010 ~~~~~~~~~~~~
2011
2012 Let’s reexamine:
2013
2014 ::
2015
2016    [key value left right] R0 [BTree-iter-order] R1
2017        ...
2018    left BTree-iter-order key value F right BTree-iter-order
2019
2020
2021    [key value left right] unstack swap
2022     key value left right               swap
2023     key value right left
2024
2025    key value right left [BTree-iter-order] [cons dipdd] dupdip
2026    key value right left [BTree-iter-order] cons dipdd [BTree-iter-order]
2027    key value right [left BTree-iter-order]      dipdd [BTree-iter-order]
2028    left BTree-iter-order key value right              [BTree-iter-order]
2029
2030    left BTree-iter-order key value   right [F] dip [BTree-iter-order]
2031    left BTree-iter-order key value F right         [BTree-iter-order] i
2032    left BTree-iter-order key value F right          BTree-iter-order
2033
2034 So:
2035
2036 ::
2037
2038    R0 == unstack swap
2039    R1 == [cons dipdd [F] dip] dupdip i
2040
2041    [key value left right] R0                [BTree-iter-order] R1
2042    [key value left right] unstack swap [BTree-iter-order] [cons dipdd [F] dip] dupdip i
2043     key value right left                    [BTree-iter-order] [cons dipdd [F] dip] dupdip i
2044
2045     key value right left [BTree-iter-order] cons dipdd [F] dip [BTree-iter-order] i
2046     key value right [left BTree-iter-order]      dipdd [F] dip [BTree-iter-order] i
2047     left BTree-iter-order key value   right            [F] dip [BTree-iter-order] i
2048     left BTree-iter-order key value F right                    [BTree-iter-order] i
2049     left BTree-iter-order key value F right                     BTree-iter-order
2050
2051
2052    BTree-iter-order == [not] [pop] [unstack swap] [[cons dipdd [F] dip] dupdip i] genrec
2053
2054 Refactor ``cons cons``
2055 ^^^^^^^^^^^^^^^^^^^^^^
2056
2057 ::
2058
2059    cons2 == cons cons
2060
2061 Refactoring:
2062
2063 ::
2064
2065    BTree-new == swap [[] []] cons2
2066    T == [cons2 dipdd] cons2 cons infra
2067    Te == [cons2 dipd] cons2 cons infra
2068    Ee == pop swap roll< rest rest cons2
2069
2070 It’s used a lot because it’s tied to the fact that there are two “data
2071 items” in each node. This point to a more general factorization that
2072 would render a combinator that could work for other geometries of trees.
2073
2074 A General Form for Trees
2075 ------------------------
2076
2077 A general form for tree data with N children per node:
2078
2079 ::
2080
2081    [[data] [child0] ... [childN-1]]
2082
2083 Suggests a general form of recursive iterator, but I have to go walk the
2084 dogs at the mo’.
2085
2086 For a given structure, you would have a structure of operator functions
2087 and sort of merge them and run them, possibly in a different order (pre-
2088 post- in- y’know). The ``Cn`` functions could all be the same and use
2089 the ``step`` trick if the children nodes are all of the right kind. If
2090 they are heterogeneous then we need a way to get the different ``Cn``
2091 into the structure in the right order. If I understand correctly, the
2092 “Bananas…” paper shows how to do this automatically from a type
2093 description. They present, if I have it right, a tiny machine that
2094 accepts `some sort of algebraic data type description and returns a
2095 function that can recusre over
2096 it <https://en.wikipedia.org/wiki/Catamorphism#General_case>`__, I
2097 think.
2098
2099 ::
2100
2101       [data.. [c0] [c1] ... [cN]] [F C0 C1 ... CN] infil
2102    --------------------------------------------------------
2103       data F [c0] C0 [c1] C1 ... [cN] CN
2104
2105
2106
2107 Just make ``[F]`` a parameter.
2108 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2109
2110 We can generalize to a sort of pure form:
2111
2112 ::
2113
2114    BTree-iter == [not] [pop] [[F]]            [R1] genrec
2115               == [not] [pop] [[F] [BTree-iter] R1] ifte
2116
2117 Putting ``[F]`` to the left as a given:
2118
2119 ::
2120
2121     [F] unit [not] [pop] roll< [R1] genrec
2122    [[F]]     [not] [pop] roll< [R1] genrec
2123              [not] [pop] [[F]] [R1] genrec
2124
2125 Let’s us define a parameterized form:
2126
2127 ::
2128
2129    BTree-iter == unit [not] [pop] roll< [R1] genrec
2130
2131 So in the general case of non-empty nodes:
2132
2133 ::
2134
2135    [key value left right] [F] [BTree-iter] R1
2136
2137 We just define ``R1`` to do whatever it has to to process the node. For
2138 example:
2139
2140 ::
2141
2142    [key value left right] [F] [BTree-iter] R1
2143        ...
2144    key value F   left BTree-iter   right BTree-iter
2145    left BTree-iter   key value F   right BTree-iter
2146    left BTree-iter   right BTree-iter   key value F
2147
2148 Pre-, ??-, post-order traversals.
2149
2150 ::
2151
2152    [key value  left right] uncons uncons
2153     key value [left right]
2154
2155 For pre- and post-order we can use the ``step`` trick:
2156
2157 ::
2158
2159    [left right] [BTree-iter] step
2160        ...
2161    left BTree-iter right BTree-iter
2162
2163 We worked out one scheme for ?in-order? traversal above, but maybe we
2164 can do better?
2165
2166 ::
2167
2168    [key value left right]              [F] [BTree-iter] [unstack] dipd
2169    [key value left right] unstack [F] [BTree-iter]
2170     key value left right               [F] [BTree-iter]
2171
2172    key value left right [F] [BTree-iter] R1.1
2173
2174 Hmm…
2175
2176 ::
2177
2178    key value left right              [F] [BTree-iter] tuck
2179    key value left right [BTree-iter] [F] [BTree-iter]
2180
2181
2182    [key value left right]                          [F] [BTree-iter] [unstack [roll>] dip] dipd
2183    [key value left right] unstack [roll>] dip [F] [BTree-iter]
2184     key value left right               [roll>] dip [F] [BTree-iter]
2185     key value left roll> right                     [F] [BTree-iter]
2186     left key value right                           [F] [BTree-iter]
2187
2188    left            key value   right              [F] [BTree-iter] tuck foo
2189    left            key value   right [BTree-iter] [F] [BTree-iter] foo
2190        ...
2191    left BTree-iter key value F right  BTree-iter
2192
2193 We could just let ``[R1]`` be a parameter too, for maximum flexibility.
2194
2195 Automatically deriving the recursion combinator for a data type?
2196 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2197
2198 If I understand it correctly, the “Bananas…” paper talks about a way to
2199 build the processor function automatically from the description of the
2200 type. I think if we came up with an elegant way for the Joy code to
2201 express that, it would be cool. In Joypy the definitions can be circular
2202 because lookup happens at evaluation, not parsing. E.g.:
2203
2204 ::
2205
2206    A == ... B ...
2207    B == ... A ...
2208
2209 That’s fine. Circular datastructures can’t be made though.
2210
2211
2212
2213
2214
2215
2216
2217
2218
2219
2220
2221