4 * Joel Matthew Rees April 2020
6 * Borrowing some concepts from fig-Forth.
7 * Purely theoretical, not tested!
9 * Natural 32-bit version.
10 * Unnatural 16-bit version might be a project for another day?
11 * (Would primarily be of interest for MUL and DIV, but CPU32 version is more interesting.)
13 * ------------------------------------LICENSE-------------------------------------
15 * Copyright (c) 2009, 2010, 2011 Joel Matthew Rees
17 * Permission is hereby granted, free of charge, to any person obtaining a copy
18 * of this software and associated documentation files (the "Software"), to deal
19 * in the Software without restriction, including without limitation the rights
20 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
21 * copies of the Software, and to permit persons to whom the Software is
22 * furnished to do so, subject to the following conditions:
24 * The above copyright notice and this permission notice shall be included in
25 * all copies or substantial portions of the Software.
27 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
28 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
29 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
30 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
31 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
32 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
35 * --------------------------------END-OF-LICENSE----------------------------------
39 * These must be edited for target runtime:
41 * Necessary here for fake forward referencing:
47 BYTESZ EQU 8 ; bit count in byte
49 * I may want to explore non-linear addressing in the 68000
50 * (and apply it to the 6805) once the basic interpreter is done
51 * in the natural 32-bit CELL.
53 ADRWDSZ EQU 4 ; bytes per address word
54 ADRWREM EQU (ADRWDSZ-1) ; for masking in address-odd remainder bits
55 ADRWMSK EQU ((-ADRWREM)-1) ; for masking out odd parts of addresses (manual bit invert)
58 HALFBSZ EQU (HALFSZ*2) ; bit count in half-CELL
59 * If at all possible, a CELL should be able to contain an address.
60 * Otherwise, fetch and store become oddities.
62 CELLBSZ EQU (ADRWDSZ*BYTESZ) ; bit count in CELL
64 DBLBSZ EQU (DBLSZ*BYTESZ) ; bit count in DOUBLE
66 GAPCT EQU 2 ; address words for the gaps
67 ALLOGAP EQU (GAPCT*ADRWDSZ) ; For crude checks, gaps always zero.
70 * Declare initial Return Stack (flow-of-control stack):
71 RSTKBND EQU ($8000-ADRWDSZ ; Bound: one beyond, but avoiding wraparound
72 RSTKINI EQU (RSTKBND) ; Init: next available byte on 68000 -- pre-dec
73 RSTKSZ EQU (62*ADRWDSZ) ; Size: Safe for most purposes.
74 RSTKLIM EQU (RSTKBND-RSTKSZ) ; Limit: Last useable
75 * Kibitzing -- CPUs really should have automatic stack bounds checking.
76 * Don't forget gaps for CPUs that don't automatically check.
77 * Crude guard rails is better than none?
79 * Declare initial Locals Stack (temporaries stack):
80 LSTKBND EQU (RSTKLIM-ALLOGAP)
81 LSTKINI EQU (LSTKBND-CELLSZ) ; Pre-dec, but on address word boundary.
82 LSTKSZ EQU (30*CELLSZ) ; Size: CELL addressing, small stacks.
83 LSTKLIM EQU (LSTKBND-LSTKSZ)
85 * Declare initial Parameter Stack (data stack):
86 SSTKBND EQU (LSTKLIM-ALLOGAP)
87 SSTKINI EQU (SSTKBND) ; Also pre-dec, but on address word boundary.
88 SSTKSZ EQU (126*ADRWDSZ) ; Size: CELL addressing, small stacks.
89 SSTKLIM EQU (SSTKBND-SSTKSZ)
91 * The paramater stack and heap at opposite ends of the same region
94 * The initial per-user allocation heap:
95 UPGBND EQU (SSTKLIM-ALLOGAP)
96 UPGSZ EQU 30*ADRWDSZ ; This will need adjusting in many cases.
97 UPGBASE EQU (UPGBND-UPGSZ)
101 * On 6809, BRN is being inserted at the top of routines
102 * with an offset to the end, to mark what can be pulled in
103 * for blind in-lining.
105 * But I'm not sure I'm actually going to use it.
108 * ORG directives in older assemblers can get tangled up
109 * if they are convoluted.
110 * Keep the ORGs in assending order.
116 * Internal registers --
117 * (When switching contexts, these must be saved and restored.):
120 * A7/SP == RP, the return/flow-of-control stack on 68000
123 * A6 == PSP, the parameter/data stack pointer on 68000
125 * A5 == UP, the pointer to the per-task heap on 68000
127 * A4 == LSP, optional local stack pointer on 68000
129 * all temps not in registers are allocated locally
130 * -- on RP/A7, PSP/A6, UP/A5, or possibly LSP/A4
132 * general counter allocated in any free Dn
134 * destination index pointer as any free An (A0 .. A3)
136 * source index pointer as any free An (A0 .. A3)
139 *** D0 to D3, A0, and A1 are throwaway or return values
140 * and should be saved by the caller before calls when necessary;
141 * maybe be used freely in called routines.
143 *** D4 to D7, A2, A3, and maybe A4 are persistent or parameters
144 * and should be saved before being used in called routines.
154 * Definitions which are good candidates for direct substitution instead of call
155 * might be marked by something like
160 * CODE ... ; not needed in substitution
162 * CODE ... ; code which gets substituted instead of call
164 * CODE ... ; not needed in substitution
166 * to bracket the substitution for the interpreter/compiler.
168 * This would be part of the _WORD_ macro.
171 * Taking the AND and the @ and ! primitives as examples,
172 * would using intermediates make optimization easier?
173 * (As in optimize by stripping stack maintenance
174 * and replacing it with register allocation.)
177 * MOVE.L CELLSZ(A6),D3
186 * MOVE.L D0,(A6) ; whole cell to TOS
191 * MOVE.L (A6)+,D0 ; Get whole cell to keep stack address correct.
192 * MOVE.W D0,(A1) ; Store only half-cell, do not clear high half!
196 * _WORD_ Take logical AND of top two on stack ( n1 n2 --- n3 ):
202 * _WORD_ Take logical OR of top two on stack ( n1 n2 --- n3 ):
208 * _WORD_ Take logical OR of top two on stack ( n1 n2 --- n3 ):
211 EOR.L D3,(A6) ; (Not) coincidentally, EOR does not do (A6),D3.
214 * _WORD_ + Add top two cells on stack ( n1 n2 --- sum ):
220 * _WORD_ - Subtract top cell on stack from second ( n1 n2 --- difference(n1-n2) ):
226 * _WORD_ B@ Fetch byte only pointed to by top cell on stack ( adr --- b(at adr) ):
227 * (Refer to Forth's C@, but byte is not character!)
230 CLR.L (A6) ; instead of intermediate Dn and CLR
234 * _WORD_ B! Store low byte of cell at 2nd at address on top of stack, deallocate both ( b adr --- ):
235 * (Refer to Forth's C!, but byte is not character!)
238 MOVE.B (A6),(A1) ; Store only byte, do not clear high bytes!
239 LEA ADRWDSZ(A6),A6 ; Less footprint than intermediate post-inc
242 * _WORD_ S@ Fetch half-cell only pointed to by top cell on stack ( adr --- h(at adr) ):
243 * adr must be even address aligned on most 68K.
246 CLR.L (A6) ; instead of intermediate Dn and CLR
250 * _WORD_ S! Store half-cell at 2nd at address on top of stack, deallocate both ( h adr --- ):
251 * adr must be even address aligned on most 68K.
254 MOVE.W (A6),(A1) ; Store only half-cell, do not clear high half!
255 LEA ADRWDSZ(A6),A6 ; Less footprint than intermediate post-inc
258 * _WORD_ @ Fetch cell pointed to by top cell on stack ( adr --- n(at adr) ):
259 * adr must be even address aligned on most 68K.
265 * _WORD_ ! Store cell at 2nd at address on top of stack, deallocate both ( n adr --- ):
266 * adr must be even address aligned on most 68K.
272 * Low confidence in the multiply and divide without an emulator to check.
274 * u1 h1:l1 ADRWDSZ:HALFSZ+ADRWDSZ
277 * _WORD_ U*bit Unsigned multiply of top two on stack ( u1 u2 --- udproduct(n1*n2) ):
278 * Consider bit test instead of shift?
280 MOVEQ #CELLBSZ,D0 ; bits/cell -- maybe count is one less for DBcc?
281 MOVEQ #0,D2 ; Clears carry, not extend
282 MOVE.L ADRWDSZ(A6),D3 ; multiplicand
283 MOVE.L (A6),D1 ; multiplier
286 DBF D0,USTART ; done? hits both carry and extend!
291 USTARNA RORX.L D2 ; shift result in
293 USTARX MOVEM.L D2/D3,(A6) ; Store result.
296 * u1 h1:l1 ADRWDSZ:HALFSZ+ADRWDSZ
299 * _WORD_ U* Unsigned multiply of top two on stack ( u1 u2 --- udproduct(n1*n2) )
300 * Using 68000's MUL for speed.
301 * Optimize for small operands at runtime.
302 * More code, less time, but I need to check that I'm handling the halves right:
304 * D1 gets product of two low halves plus the low halves of the mixed inner products.
305 * D0 gets product of two high halves plus the high halves of the two mixed inner products,
306 * plus carries from D1.
307 * Break early when we can.
309 MOVEM.L (A6),D3/D2 ; multiplicand/multiplier
311 MULU D3,D1 ; Lower halves product
313 SWAP D4 ; higher half of multiplier in D4
315 SWAP D5 ; higher half of multiplicand in D5
325 MULU D3,D6 ; multiplicand low by multiplier high
334 MULU D2,D6 ; multiplier low by multiplicand high
341 USTAR4 ; Anyway, now do the high halves, since both were non-zero.
347 MOVEQ #0 D1 ; Scratch area for inner products
350 MOVE.W HALFSZ+ADRWDSZ(A6),D3 ; low halves
354 MOVE.W ADRWDSZ(A6),D2 ; inner1: u1 high
355 MULU HALFSZ(A6),D2 ; u2 low
358 MOVE.W D2,D1 ; lower half of inner1
359 ADD.L D1,D3 ; No carry possible yet.
360 * bound: $FFFE0001+$0000FFFF=$FFFF0000
363 MOVE.W D2,D1 ; higher half of inner1, hold it.
365 MOVE.W HALFSZ+ADRWDSZ(A6),D2 ; inner2: u1 low
366 MULU (A6),D2 ; u2 high
369 MOVE.W D2,D0 ; lower half of inner2
370 ADD.L D0,D3 ; Still no carry possible.
371 * bound: $FFFF0000+$0000FFFF=$FFFFFFFF
373 MOVE.W D2,D0 ; higher half of inner2
374 ADD.L D0,D1 ; add to inner1 higher half
375 * bound: $0000FFFF+$0000FFFF=$0001FFFE
376 MOVE.W ADRWDSZ(A6),D2 ; high halves
380 * bound: $FFFE0001+$0001FFFE=$FFFFFFFF
381 * Done, result in D2:D3
385 * _WORD_ SWAP swap top two cells on stack ( n1 n2 --- n2 n1 ):
393 * MOVE.L D2,(ADRWDSZ,A6)
395 * Which will be smaller, faster, less bus activity?
398 * _WORD_ U/MODbit Unsigned divide of unsigned double dividend in 2nd:3rd on stack by top unsigned divisor
399 * ( ud u --- uremainder uquotient )
400 * Dividend should be range of product of 32 by 32 bit unsigned multiply,
401 * divisor should be the multiplier or multiplicand:
402 * Consider bit test instead of shift?
403 * Also, examine the native divide again.
404 * ** Native divide requires divide-by-zero trap code!
406 MOVEQ #1+CELLBSZ,D0 ; bit ct
407 MOVEM.L (A6),D3/D2/D1 ; D1 is divisor, D2:D3 is dividend
409 CMP.L D1,D2 ; dividend high in D2 - divisor in D1
410 BHS.B USLSUB ; *** need to look at this carefully
411 ANDI #^1,CCR ; clear carry bit
415 ORI #1,CCR ; quotient bit,
417 ROXL.L #1,D3 ; save it
418 SUBQ #1,D0 ; more bits?
419 BEQ.B USLR ; Can DBcc be used here?
420 ROXL.L #1,D2 ; move remainder in as we move dividend out
425 MOVE.L D3,(A6) ; quotient
426 MOVE.L D2,ADRWDSZ(A6) ; remainder
429 * _WORD_ U/MOD Unsigned divide of unsigned double dividend in 2nd:3rd on stack by top unsigned divisor
430 * ( ud u --- uremainder uquotient )
431 * Assume native divide trap on n/0 simply sets quotient to all ones or something.
432 * Start by doing 32/16 divide if divisor less than 65536.
433 * Maybe call the bit divide until I understand better, when divisor is more than 16 bits.
434 * Dividend should be range of product of 32 by 32 bit unsigned multiply,
435 * divisor should be the multiplier or multiplicand:
436 * Consider bit test instead of shift?
437 * Also, examine the native divide again.
438 * ** Native divide requires divide-by-zero trap code!
440 MOVEM.L (A6),D3/D2/D1 ; D1 is divisor, D2:D3 is dividend
446 Try working out 16/8 on 6801 (6800) for clues.
448 * _WORD_ >L Save top cell on stack to locals stack ( n --- ) { --- n }:
453 * _WORD_ >R Save top cell on stack to return stack ( n --- ) { --- n }:
463 * _WORD_ L> Pop top of locals stack to parameter stack ( --- n ) { n --- }:
468 * _WORD_ R> Pop top of return stack to parameter stack ( --- n ) { n --- }:
479 * _WORD_ L Retrieve, do not pop top of locals stack to parameter stack ( --- n ) { n --- n }:
484 * _WORD_ R Retrieve, do not pop top of return stack to parameter stack ( --- n ) { n --- n }:
486 MOVE.L ADRWDSZ(A7),-(A6)
493 * _WORD_ DUP Duplicate top cell on stack ( n --- n n ):
498 * _WORD_ DROP Remove a cell from the parameter stack ( n --- )
500 LEA CELLSZ(A7),A7 ; or PULU D and throw away
503 * _WORD_ 2DROP Remove a cell from the parameter stack ( n --- )
505 LEAU 2*CELLSZ(A7),U ; or PULU D,X and throw away
508 * Should true be set as 1 or -1?
509 * Going with all bits False (0)
510 * or all bits True (-1) as the flag to set.
511 * _WORD_ 0= Test top cell on stack for zero ( n --- f(top==0) ):
516 EXT.W D0 ; CPU32 can EXTB.L
532 * True as 1 would look like
545 * _LOWORD_ Duplicate (count in B) bytes on stack:
552 ****** Working in here to make it go both ways.
553 ****** Also need to check multiply and divide.
554 ****** And need to convert the stuff past multiply and divide to 68000
561 * not_LOWORD_ Move 2^16-1 bytes or less:
562 * source in A2, destination in A3, count in D4:
563 * Overlaps only work if source is higher than dest.
568 * _WORD_ Move up to 32K (2^15) bytes ( src dest count --- ):
569 * Copies zero when count > 2^15. (Limited for safety.)
570 * Compare CMOVE in Forth.
572 MOVE.L (2*ADRWDSZ,A6),A2 ; src
573 MOVE.L (ADRWDSZ,A6),A3 ; dest
575 CMP.L #$8000 ; Pre-test, do nothing if too big,
576 BLS.B BMOVEE ; or if zero.
581 DBF D4,SMOVEL ; Catches zero count here and stops.
582 BMOVEX LEAU (3*ADRWDSZ,A6),A6
585 * _WORD_ Execute the address on the stack:
592 JSR 0,X ; For debugging and flattening, no early optimizations.