1 @c Copyright (C) 2000, 2003, 2009, 2010 Red Hat, Inc.
2 @c This file is part of the CGEN manual.
3 @c For copying conditions, see the file cgen.texi.
6 @chapter CGEN's Register Transfer Language
8 @cindex Register Transfer Language
10 CGEN uses a variant of GCC's Register Transfer Language as the basis for
11 its CPU description language.
14 * RTL Introduction:: Introduction to CGEN's RTL
15 * Trade-offs:: Various trade-offs in the design
16 * Rules and notes:: Rules and notes common to all entries
17 * RTL Versions:: Supported versions and differences
18 * Top level conditionals:: Conditional definitions
19 * Definitions:: Definitions in the description file
20 * Attributes:: Random data associated with any entry
21 * Architecture variants:: Specifying variations of a CPU
22 * Model variants:: Specifying variations of a CPU's implementation
23 * Hardware elements:: Elements of a CPU
24 * Instruction fields:: Fields of an instruction
25 * Enumerated constants:: Assigning useful names to important numbers
26 * Keywords:: Like enums, plus string table
27 * Instruction operands:: Operands of instructions
28 * Derived operands:: Operands for CISC-like architectures
29 * Instructions:: Instructions
30 * Macro-instructions:: Macro instructions
31 * Modes:: Operand types in expressions
32 * Expressions:: Expressions in the language
33 * Macro-expressions:: A simplification of arithmetic expressions
36 @node RTL Introduction
37 @section RTL Introduction
39 The description language, or RTL
40 @footnote{While RTL stands for Register Transfer Language, it is also used
41 to denote the CPU description language as a whole.}, needs to support the
43 architectural and implementation features of a CPU, as well as enough
44 information for all intended applications. At present this is just the
45 opcodes table and an ISA level simulator, but it is not intended that
applications be restricted to these two areas. The goal is an
application-independent description of the CPU. In the end that's a lot to
48 ask for from one language. Certainly gate level specification of a CPU
51 The syntax of the language is inspired by GCC's RTL and by the Scheme
52 programming language, theoretically taking the best of both. To what
53 extent that is true, and to what extent that is sufficient inspiration
54 is certainly open to discussion. In actuality, there isn't much difference
55 here from GCC's RTL that is attributable to being Scheme-ish. One
56 important Scheme-derived concept is arbitrary precision of constants.
57 Sign or zero extension of constants in GCC has always been a source of
problems. In CGEN's RTL, constants have modes and there are both signed
61 Here is a graphical layout of the hierarchy of elements of a @file{.cpu}
67 cpu-family1 cpu-family2 ...
69 machine1 machine2 machine3 ...
74 Each of these elements is explained in more detail below. The
75 @emph{architecture} is one of @samp{sparc}, @samp{m32r}, etc. Within
76 the @samp{sparc} architecture, @emph{cpu-family} might be
77 @samp{sparc32}, @samp{sparc64}, etc. Within the @samp{sparc32} CPU
78 family, the @emph{machine} might be @samp{sparc-v8}, @samp{sparclite},
79 etc. Within the @samp{sparc-v8} machine classification, @emph{model}
80 might be @samp{hypersparc}, @samp{supersparc}, etc.
82 Instructions form their own hierarchy as each instruction may be supported
83 by more than one machine. Also, some architectures can handle more than
84 one instruction set on one chip (e.g. ARM).
93 hw1+ifield1 hw2+ifield2 ...
96 Each of these elements is explained in more detail below.
CGEN is written in Scheme, but this is not a requirement. The
description language should be considered independent of any particular
implementation, though certainly some things were done to simplify
104 reading @file{.cpu} files with Scheme. Scheme related choices have been
105 made in areas that have no serious impact on the usefulness of the CPU
106 description language. Places where that is not the case need to be
107 revisited, though there currently are no known ones.
109 One place where the Scheme implementation influenced the design of
110 CGEN's RTL is in the handling of modes. The Scheme implementation was
111 simplified by treating modes as an explicit argument, rather than as an
112 optional suffix of the operation name. For example, compare @code{(add
113 SI dr sr)} in CGEN versus @code{(add:SI dr sr)} in GCC RTL. The mode is
114 treated as optional so a shorthand form of @code{(add dr sr)} works.
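
To illustrate how an explicit but optional mode argument can be
handled, here is a minimal Python sketch (not CGEN's Scheme
implementation) that splits an already-parsed RTL expression into
operation, mode, and operands. The helper @code{split_mode}, the subset
of mode names, and the placeholder @samp{DFLT} used for an elided mode
are assumptions made for this illustration.

@example
# Sketch only: a hypothetical subset of CGEN mode names.
KNOWN_MODES = ("QI", "HI", "SI", "DI", "BI", "UQI", "UHI", "USI", "UDI")

def split_mode(expr):
    """Split a parsed RTL expression (a list of symbols) into
    (operation, mode, operands). The mode is optional: if the second
    element is not a known mode name, a placeholder mode is used."""
    op, rest = expr[0], list(expr[1:])
    if rest and rest[0] in KNOWN_MODES:
        return op, rest[0], rest[1:]
    return op, "DFLT", rest  # assumed placeholder for an elided mode

print(split_mode(["add", "SI", "dr", "sr"]))  # ('add', 'SI', ['dr', 'sr'])
print(split_mode(["add", "dr", "sr"]))        # ('add', 'DFLT', ['dr', 'sr'])
@end example
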
116 @node Rules and notes
117 @section Rules and notes
119 A few basic guidelines for all entries:
122 @item Names must be valid Scheme symbols.
123 @item Comments are used, for example, to comment the generated C code
124 @footnote{It is possible to produce a reference manual from
125 @file{.cpu} files and such an application wouldn't be a bad idea.}.
126 @item Comments may be any number of lines, though generally succinct comments
127 are preferable@footnote{It would be reasonable to have a short form
and a long form of comment, either as two entries or as one entry with
129 the short form separated from the long form via some delimiter (say the
@item Everything is case sensitive.@footnote{??? This is true in RTL,
though some applications add symbols and convert case, which can cause collisions.}
@item While "_" is a valid character to use in symbols, "-" is preferred.
@item Hex numbers are written using Scheme's notation.
Write 255 in hex as #xff, not 0xff.
Binary numbers are written as #bNNN, e.g. #b111 == 7.
137 @item Except for the @samp{comment} and @samp{attrs} fields and unless
138 otherwise specified all fields must be present.
@item Symbols used to be allowed anywhere a string can be used.
This is what earlier versions of Guile supported.
Guile is stricter now, so this relaxation is gone.
The reverse is generally not allowed either; strings can't be used in place
144 @item Use @samp{()} or @samp{#f} to indicate ``not specified'',
145 unless otherwise specified. This is not necessary for
@samp{define-foo} elements, where one can just elide the entry,
but it is useful for @samp{define-*-foo} elements that take a fixed number
of arguments, e.g. @samp{define-normal-ifield}.
149 Whether to use @samp{()} or @samp{#f} is largely a matter of style.
153 @section RTL Versions
155 CGEN has minimal support for making changes to the language without
156 breaking existing ports. We do not put much effort into this because
157 over time it can become unmaintainable, but for some changes it is
158 useful to have a temporary window in which older versions are supported.
161 * Specifying the RTL version::
162 * List of supported RTL versions::
165 @node Specifying the RTL version
166 @subsection Specifying the RTL version
168 Specify the version of RTL that your cpu description was written to
169 with @samp{define-rtl-version}.
174 (define-rtl-version major-version minor-version)
When setting the RTL version, it must be the first thing done
in the description file, before any pmacros are used or defined;
otherwise the behavior is undefined.
Changing the RTL version after it has been set is also undefined behavior.
182 Note that one can still set it to the same version multiple times.
183 This is useful when the description is spread among several files,
184 and one is debugging/testing files individually.
186 The default RTL version, if @samp{define-rtl-version} is elided, is 0.7.
188 The latest RTL version is 0.9:
191 (define-rtl-version 0 9)
Each increment of the major or minor version is generally not upward
compatible (otherwise the version would not have been incremented),
and CGEN does not keep support for older versions for long.
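
The version tests performed by @samp{rtl-version-equal?} and
@samp{rtl-version-at-least?} (@pxref{Top level conditionals}) amount to
comparing (major, minor) pairs. The following minimal Python sketch
assumes versions are compared lexicographically (major first, then
minor) and that an elided @samp{define-rtl-version} means 0.7, as stated
above; the function names are hypothetical.

@example
DEFAULT_RTL_VERSION = (0, 7)  # used when define-rtl-version is elided

def rtl_version_equal(current, major, minor):
    # Hypothetical counterpart of rtl-version-equal?.
    return current == (major, minor)

def rtl_version_at_least(current, major, minor):
    # Tuples compare lexicographically: major first, then minor.
    return current >= (major, minor)

current = (0, 9)  # as set by (define-rtl-version 0 9)
print(rtl_version_at_least(current, 0, 8))              # True
print(rtl_version_equal(DEFAULT_RTL_VERSION, 0, 7))     # True
print(rtl_version_at_least(DEFAULT_RTL_VERSION, 0, 9))  # False
@end example
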
198 @node List of supported RTL versions
199 @subsection List of supported RTL versions
201 CGEN currently supports the following RTL versions.
205 @item 0.7 @code{(define-rtl-version 0 7)}
207 This is the original RTL version.
208 It is the default if no version is specified.
209 It is supported by CGEN versions 1.0, 1.1, and the current development tree.
210 Support for it will probably be removed for the CGEN 1.2 release.
212 @item 0.8 @code{(define-rtl-version 0 8)}
214 This version changed the syntax for defining keywords.
216 The @samp{print-name} field was renamed to @samp{enum-prefix}
217 and the @samp{prefix} field was renamed to @samp{name-prefix}.
224 (comment "description")
225 (attrs attribute-list)
227 (print-name "prefix-for-enum-values-with-trailing-dash")
228 (prefix "prefix-for-names-in-string-table")
238 (comment "description")
239 (attrs attribute-list)
241 (enum-prefix "prefix-for-enum-values")
242 (name-prefix "prefix-for-names-in-string-table")
247 Note that @samp{print-name} has been replaced with @samp{enum-prefix}
248 and @samp{prefix} has been replaced with @samp{name-prefix}.
There is also a difference in behavior between
@samp{print-name} and @samp{enum-prefix}.
252 When computing complete enum names with @samp{print-name},
253 CGEN adds a @samp{-} between the prefix and the enum name.
254 CGEN does not insert a @samp{-} with @samp{enum-prefix}.
256 @item 0.9 @code{(define-rtl-version 0 9)}
258 This version changed the prefix of pmacros from @samp{.} to @samp{%}.
259 @samp{.pmacro} is changed to @samp{%pmacro}.
263 @node Top level conditionals
264 @section Top level conditionals
265 @cindex Top level conditionals
267 CGEN supports conditionally defining objects through the use of @samp{if}
268 and some specialized predicates. These must appear at the ``top level'',
269 i.e., not inside any other expression, except @samp{begin}.
271 The following predicates are supported:
275 @item (keep-isa? (isa-list))
276 Return ``true'' if any ISA in @samp{isa-list} is being kept.
277 This is controlled by the @samp{-i} option.
279 @item (keep-mach? (machine-list))
280 Return ``true'' if any machine in @samp{machine-list} is being kept.
281 This is controlled by the @samp{-m} option.
283 @item (application-is? application)
284 Return ``true'' if the current application generator is @samp{application}.
286 @item (rtl-version-equal? major minor)
287 Return ``true'' if the RTL version specified by the @file{.cpu} file is
290 @item (rtl-version-at-least? major minor)
291 Return ``true'' if the RTL version specified by the @file{.cpu} file is
292 at least @samp{major.minor}.
296 Here's an example from the CGEN testsuite.
It defines wrappers around a few builtin pmacros so that the rest
of the description is independent of the pmacro prefix character.
301 (if (rtl-version-at-least? 0 9)
303 (define-pmacro /begin %begin)
304 (define-pmacro /print %print)
305 (define-pmacro /dump %dump))
307 (define-pmacro /begin .begin)
308 (define-pmacro /print .print)
309 (define-pmacro /dump .dump)))
312 Here's an example from the @samp{SH} cpu description.
315 (if (keep-isa? (compact))
316 (include "sh64-compact.cpu"))
318 (if (keep-isa? (media))
319 (include "sh64-media.cpu"))
326 Each entry has the same format: @code{(define-foo arg1 arg2 ...)}, where
327 @samp{foo} designates the type of entry (e.g. @code{define-insn}). In
328 the general case each argument is a name/value pair expressed as
330 (*Note: Another style in common use is `:name value' and doesn't require
331 parentheses. Maybe that would be a better way to go here. The current
332 style is easier to construct from macros though.)
While the general case is flexible, it is also excessively verbose in
the normal case. To reduce this verbosity, most define-foo's have a
second version, generally named @samp{define-normal-foo} or
@samp{define-simple-foo}, that takes a fixed number
of positional arguments. With pmacros these can be shortened even further
to just their acronym, e.g. @samp{define-normal-ifield} -> @samp{dnf}.
340 Ports are free to write their own preprocessor macros to
341 simplify things further as desired.
342 See sections titled ``Simplification macros'' later in this chapter.
344 @c define-full-foo's are not documented on purpose.
345 @c They're fragile (e.g. if a new element is added),
346 @c and their use is discouraged.
352 Attributes are used throughout for specifying various properties.
353 For portability reasons attributes can only have 32 bit integral values
354 (signed or unsigned).
355 @c How about an example?
357 There are four kinds of attributes: boolean, integer, enumerated, and bitset.
358 Boolean attributes can be achieved via others, but they occur frequently
359 enough that they are special cased (and one bit can be used to record them).
Bitset attributes are a useful simplification when one wants to indicate that an
object can be in any of several states at once (e.g. an instruction may be supported by
364 String attributes might be a useful addition.
365 Another useful addition might be functional attributes (the attribute
366 is computed at run-time - currently all attributes are computed at
367 compile time). One way to implement functional attributes would be to
368 record the attributes as byte-code and lazily evaluate them, caching the
results as appropriate. The syntax has been designed so as not to
preclude either as an upward compatible extension.
372 Attributes must be defined before they can be used.
373 There are several predefined attributes for entry types that need them
374 (instruction field, hardware, operand, and instruction). Predefined
375 attributes are documented in each relevant section.
377 In C applications an enum is created that defines all the attributes.
Applications that want some degree of architecture independence
need the attribute to have the same value across all architectures.
380 This is achieved by giving the attribute the INDEX attribute
381 @footnote{Yes, attributes can have attributes.},
382 which specifies the enum value must be fixed across all architectures.
383 @c FIXME: Give an example here.
384 @c FIXME: Need a better name than `INDEX'.
Convention requires attribute names to consist of uppercase letters, numbers,
"-", and "_", and to begin with a letter.
388 To be consistent with Scheme, "-" is preferred over "_".
390 @subsection Boolean Attributes
391 @cindex Attributes, boolean
393 Boolean attributes are defined with:
399 (name attribute-name)
400 (comment "attribute comment")
401 (attrs attribute-attributes)
The default value of boolean attributes is always false. This could be
relaxed, but it would be an extra complication that is currently unnecessary.
Boolean attributes are specified in one of three forms:
@code{(NAME expr)}, @code{NAME}, and @code{!NAME}.
The first form is the canonical form. The latter two
are shorthand versions:
@code{NAME} means "true" and @code{!NAME} means "false".
@samp{expr} is either @code{#f} or @code{#t}.
416 @code{user-list} is a space separated list of entry types that will use
417 the attribute. Possible values are: @samp{attr}, @samp{enum},
418 @samp{cpu}, @samp{mach}, @samp{model}, @samp{ifield}, @samp{hardware},
419 @samp{operand}, @samp{insn} and @samp{macro-insn}. If omitted all are
420 considered users of the attribute.
422 The @code{values} and @code{default} fields if provided must have the
423 indicated values. Usually these fields are elided.
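
Because the three forms are equivalent, a tool reading attribute lists
can normalize them to one representation. A minimal Python sketch,
assuming the attribute spec has already been read into either a
(name, expr) pair or a bare symbol string; @code{parse_bool_attr} is a
hypothetical helper, not part of CGEN.

@example
def parse_bool_attr(spec):
    """Normalize a boolean attribute spec to a (name, value) pair.

    SPEC is either a (NAME, expr) tuple for the canonical form
    (NAME expr), where expr is "#t" or "#f", or a string "NAME"
    or "!NAME" for the shorthand forms."""
    if isinstance(spec, tuple):
        name, expr = spec
        if expr not in ("#t", "#f"):
            raise ValueError("expr must be #t or #f: %r" % (expr,))
        return name, expr == "#t"
    if spec.startswith("!"):
        return spec[1:], False  # !NAME means false
    return spec, True           # NAME means true

print(parse_bool_attr(("VIRTUAL", "#t")))  # ('VIRTUAL', True)
print(parse_bool_attr("VIRTUAL"))          # ('VIRTUAL', True)
print(parse_bool_attr("!VIRTUAL"))         # ('VIRTUAL', False)
@end example
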
425 @subsection Integer Attributes
426 @cindex Attributes, integer
428 Integer attributes are defined with:
434 (name attribute-name)
435 (comment "attribute comment")
436 (attrs attribute-attributes)
437 (default integer-value)
441 If omitted, the default is 0.
443 Integer attributes are specified with @code{(NAME value)}.
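
Since attributes are limited to 32 bit integral values (signed or
unsigned), a reader of integer attributes might validate them as
follows. This minimal Python sketch assumes a value is acceptable if it
fits in either a signed or an unsigned 32 bit integer; the helper names
and the attribute names @samp{LATENCY} and @samp{DELAY} are
hypothetical.

@example
INT32_MIN = -(1 << 31)        # smallest signed 32 bit value
UINT32_MAX = (1 << 32) - 1    # largest unsigned 32 bit value

def check_32bit(name, value):
    # Attribute values must fit in 32 bits, signed or unsigned.
    if not INT32_MIN <= value <= UINT32_MAX:
        raise ValueError("%s: %d does not fit in 32 bits" % (name, value))
    return value

def int_attr_value(entry_attrs, name, default=0):
    """Look up integer attribute NAME in an entry's attribute list,
    given as (NAME, value) pairs. DEFAULT is the attribute's default,
    which is 0 if define-attr omits it."""
    for attr_name, value in entry_attrs:
        if attr_name == name:
            return check_32bit(name, value)
    return check_32bit(name, default)

attrs = [("LATENCY", 2)]
print(int_attr_value(attrs, "LATENCY"))  # 2
print(int_attr_value(attrs, "DELAY"))    # 0
@end example
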
445 @subsection Enumerated Attributes
446 @cindex Attributes, enumerated
448 Enumerated attributes are the same as integer attributes except the
449 range of possible values is restricted and each value has a name.
Enumerated attributes are defined with:
456 (name attribute-name)
457 (comment "attribute comment")
458 (attrs attribute-attributes)
459 (values enum-value1 enum-value2 ...)
460 (default default-enum-value)
464 If omitted, the default is the first entry in @code{values}.
466 Enum attributes are specified with @code{(NAME enum-value)}.
468 @subsection Bitset Attributes
469 @cindex Attributes, bitset
471 Bitset attributes are for situations where you want to indicate something
is a subset of a small set of possibilities. The MACH attribute, for example,
uses this to specify which of the various machines support a
475 (*Note: At present the maximum number of possibilities is 32.
476 This is an implementation restriction which can be relaxed, but there's
479 Bitset attributes are defined with:
485 (name attribute-name)
486 (comment "attribute comment")
487 (attrs attribute-attributes)
488 (values enum-value1 enum-value2 ...)
489 (default default-value1 default-value2 ...)
493 The default values must be from the specified values.
The default must be provided; it may not be omitted.
496 Bitset attributes are specified with @code{(NAME val1 val2 ...)}.
498 For backward compatibility they may also be specified with
@code{(NAME val1,val2,...)} or @code{(NAME "val1,val2,...")};
500 there must be no spaces in ``@code{val1,val2,...}''
501 and each value must be a valid Scheme symbol.
502 Use of @code{(NAME val1,val2,...)} is deprecated, and
503 support for it will go away at some point.
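
One natural internal representation of a bitset attribute is a bitmask
with one bit per possible value, which is consistent with the 32 value
limit noted above. The following minimal Python sketch assumes bit 0
corresponds to the first entry of @code{values} and accepts both the
list form and the deprecated comma-separated form; the helper
@code{bitset_mask} and the machine names are for illustration only.

@example
MAX_BITSET_VALUES = 32  # current implementation limit

def bitset_mask(values, selected):
    """Encode selected bitset values as a bitmask, one bit per entry of
    VALUES (bit 0 for the first, an assumption). SELECTED is either a
    list of symbols, as in (NAME val1 val2 ...), or one deprecated
    comma-separated string, as in (NAME val1,val2,...)."""
    if len(values) > MAX_BITSET_VALUES:
        raise ValueError("at most 32 possible values are supported")
    if isinstance(selected, str):
        if " " in selected:
            raise ValueError("no spaces allowed in %r" % (selected,))
        selected = selected.split(",")
    mask = 0
    for v in selected:
        if v not in values:
            raise ValueError("%r is not one of the specified values" % (v,))
        mask |= 1 << values.index(v)
    return mask

machs = ["base", "m32rx", "m32r2"]
print(bitset_mask(machs, ["m32rx", "m32r2"]))  # 6
print(bitset_mask(machs, "base,m32rx"))        # 3
@end example
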
505 @c NOTE: It's not clear whether allowing arbitrary expressions will be
506 @c useful here, but doing so is not precluded. For now each value must be
507 @c the name of one of the specified values.
509 @node Architecture variants
510 @section Architecture variants
511 @cindex Architecture variants
513 The base architecture and its variants are described in four parts:
514 @code{define-arch}, @code{define-isa}, @code{define-cpu}, and
525 @subsection define-arch
528 @code{define-arch} describes the overall architecture, and must be
531 The syntax of @code{define-arch} is:
535 (name architecture-name) ; e.g. m32r
536 (comment "description") ; e.g. "Mitsubishi M32R"
537 (attrs attribute-list)
538 (default-alignment aligned|unaligned|forced)
540 (machs mach-name-list)
545 @subsubsection default-alignment
547 Specify the default alignment to use when fetching data (and
548 instructions) from memory. At present this can't be overridden, but
549 support can be added if necessary. The default is @code{aligned}.
@c Definitely need to say more here.
552 @subsubsection insn-lsb0?
555 Specifies whether the most significant or least significant bit in a
556 word is bit number 0. Generally this should conform to the convention
557 in the architecture manual. This is independent of endianness and is an
558 architecture wide specification. There is no support for using
559 different bit numbering conventions within an architecture.
560 @c Not that such support can't be added of course.
Instruction fields are always specified from their most significant bit:
the `start' of a field is always its most significant bit, whichever bit
numbering convention is in use. For example, a 4 bit field in the uppermost
bits of a 32 bit instruction has a start/length of (31 4) when
@code{insn-lsb0?} is @code{#t}, and (0 4) when @code{insn-lsb0?} is @code{#f}.
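
The two conventions differ only in how a field's start bit maps to a
shift amount. A minimal Python sketch, assuming a single instruction
word of @code{word_bits} bits held in an integer; @code{extract_field}
is a hypothetical helper, not CGEN-generated code.

@example
def extract_field(insn, start, length, word_bits=32, lsb0=True):
    """Extract an instruction field whose START is its most significant
    bit, numbered according to the insn-lsb0? convention."""
    if lsb0:
        shift = start - length + 1          # bit 0 is the least significant
    else:
        shift = word_bits - start - length  # bit 0 is the most significant
    return (insn >> shift) & ((1 << length) - 1)

insn = 0xA0000000  # uppermost 4 bits are 1010
print(extract_field(insn, 31, 4, lsb0=True))  # 10
print(extract_field(insn, 0, 4, lsb0=False))  # 10
@end example

Both calls select the same physical bits, matching the (31 4) and (0 4)
start/length pairs in the example above.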
568 @subsubsection mach-name-list
570 The list of names of machines in the architecture.
571 There should be one entry for each @code{define-mach}.
573 @subsubsection isa-name-list
575 The list of names of instruction sets in the architecture.
576 There must be one for each @code{define-isa}.
An example of an architecture with more than one is the ARM, which
has a 32 bit instruction set and a 16 bit "Thumb" instruction set
(the sizes here refer to instruction size).
582 @subsection define-isa
585 @code{define-isa} describes aspects of the instruction set.
586 A minimum of one ISA must be defined.
588 The syntax of @code{define-isa} is:
593 (comment "description")
594 (attrs attribute-list)
595 (default-insn-word-bitsize n)
596 (default-insn-bitsize n)
597 (base-insn-bitsize n)
598 ; (decode-assist (b0 b1 b2 ...)) ; generally unnecessary
601 (condition ifield-name expr)
602 (setup-semantics expr)
603 ; (decode-splits decode-split-list) ; support temporarily disabled
604 ; ??? missing here are fetch/execute specs
608 @subsubsection default-insn-word-bitsize
610 Specifies the default size of an instruction word in bits.
611 This affects the numbering of field bits in words beyond the
613 @xref{Instruction fields}, for more information.
615 @subsubsection default-insn-bitsize
617 The default size of an instruction in bits. It is generally the size of
618 the smallest instruction. It is used when parsing instruction fields.
619 It is also used by the disassembler to know how many bytes to skip for
620 unrecognized instructions.
622 @subsubsection base-insn-bitsize
624 The minimum size of an instruction, in bits, to fetch during execution.
625 If the architecture has a variable length instruction set, this is the
size of the initial word to fetch. There is no need to specify the
maximum length of an instruction; it can be computed from the
instructions. Examples:
641 The M32R case is interesting because instructions can be 16 or 32 bits.
However, instructions on 32 bit boundaries can always be fetched 32 bits
at a time, since 16 bit instructions always come in pairs.
645 @subsubsection decode-assist
646 @cindex decode-assist
648 Override CGEN's heuristics about which bits to initially use to decode
instructions in a simulator. For example, on the SPARC these are bits:
650 31 30 24 23 22 21 20 19. The entire decoder can be machine generated,
651 so this field is entirely optional. Since the heuristics are quite
652 good, you should only use this field if you have evidence that you
653 can pick a better set, in which case the CGEN developers would like to
656 ??? It might be useful to provide greater control, but this is sufficient
659 It is okay if the opcode bits are over-specified for some instructions.
660 It is also okay if the opcode bits are under-specified for some instructions.
661 The machine generated decoder will properly handle both these situations.
662 Just pick a useful number of bits that distinguishes most instructions.
663 It is usually best to not pick more than 8 bits to keep the size of the
664 initial decode table down.
666 Bit numbering is defined by the @code{insn-lsb0?} field.
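
Conceptually, the selected bits are concatenated to form an index into
the initial decode table; with 8 bits the table has 256 entries. The
following minimal Python sketch assumes @code{insn-lsb0?} is @code{#t}
and that the first listed bit becomes the most significant index bit.
Both are assumptions for illustration, and the helper
@code{decode_index} and the instruction word are hypothetical; this is
not a description of CGEN's generated decoder.

@example
# The SPARC decode-assist bits quoted above, with insn-lsb0? = #t.
SPARC_DECODE_BITS = [31, 30, 24, 23, 22, 21, 20, 19]

def decode_index(insn, bits):
    """Concatenate the selected instruction bits, first-listed bit
    most significant, into an index for an initial decode table."""
    index = 0
    for b in bits:
        index = (index << 1) | ((insn >> b) & 1)
    return index

insn = 0x81C3E008  # a hypothetical 32 bit instruction word
print(decode_index(insn, SPARC_DECODE_BITS))  # 184
@end example
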
668 @subsubsection liw-insns
671 The number of instructions the CPU always fetches at once. This is
672 intended for architectures like the M32R, and does not refer to a CPU's
673 ability to pre-fetch instructions. The default is 1.
675 @subsubsection parallel-insns
676 @cindex parallel-insns
678 The maximum number of instructions the CPU can execute in parallel. The
681 ??? Rename this to @code{max-parallel-insns}?
683 @subsubsection condition
685 Some architectures like ARM and ARC conditionally execute every instruction
686 based on the condition specified by one instruction field.
687 The @code{condition} spec exists to support these architectures.
@code{ifield-name} is the name of the instruction field denoting the
condition and @code{expr} is an RTL expression that returns
the value of the condition (false=zero, true=non-zero).
692 @subsubsection setup-semantics
694 Specify a statement to be performed prior to executing particular instructions.
695 This is used, for example, on the ARM where the value of the program counter
696 (general register 15) is a function of the instruction (it is either
697 pc+8 or pc+12, depending on the instruction).
699 @subsubsection decode-splits
701 Specify a list of field names and values to split instructions up by.
702 This is used, for example, on the ARM where the behavior of some instructions
703 is quite different when the destination register is r15 (the pc).
711 ((split1-name (value1 value2 ...)) (split2-name ...)))
717 @code{constraints} is work-in-progress and should be @code{()} for now.
719 One copy of each instruction satisfying @code{constraint} is made
720 for each specified split. The semantics of each copy are then
721 simplified based on the known values of the specified instruction field.
724 @subsection define-cpu
727 @code{define-cpu} defines a ``CPU family'' which is a programmer
specified collection of related machines. What constitutes a family is
work-in-progress; however, it is intended to distinguish things like
sparc32 vs sparc64. Machines in a family are sufficiently similar that
731 the simulator semantic code can handle any differences at run time. At
732 least that's the current idea. A minimum of one CPU family must be
734 @footnote{FIXME: Using "cpu" in "cpu-family" here is confusing.
735 Need a better name. Maybe just "family"?}
737 The syntax of @code{define-cpu} is:
742 (comment "description")
743 (attrs attribute-list)
744 (endian big|little|either)
745 (insn-endian big|little|either)
746 (data-endian big|little|either)
747 (float-endian big|little|either)
749 (insn-chunk-bitsize n)
751 (file-transform transformation)
755 @subsubsection endian
757 The endianness of the architecture is one of three values: @code{big},
758 @code{little} and @code{either}.
760 An architecture may have multiple endiannesses, including one for each
of: instructions, integers, and floats (this is not intended to be a
762 complete list). These are specified with @code{insn-endian},
763 @code{data-endian}, and @code{float-endian} respectively.
765 Possible values for @code{insn-endian} are: @code{big}, @code{little},
766 and @code{either}. If missing, the value is taken from @code{endian}.
768 Possible values for @code{data-endian} and @code{float-endian} are: @code{big},
769 @code{big-words}, @code{little}, @code{little-words} and @code{either}.
770 If @code{big-words} then each word is little-endian.
771 If @code{little-words} then each word is big-endian.
772 If missing, the value is taken from @code{endian}.
774 ??? Support for these is work-in-progress. All forms are recognized
775 by the @file{.cpu} file reader, but not all are supported internally.
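
To make the mixed orders concrete, the following minimal Python sketch
stores a 64 bit value as two 32 bit words. It assumes a word size of 32
bits and assumes that @code{big-words} places the most significant word
first while @code{little-words} places the least significant word
first; the text above only states the byte order within each word. The
helper @code{store64} is hypothetical.

@example
import struct

def store64(value, layout):
    """Return the bytes of a 64 bit VALUE laid out as two 32 bit words.
    Assumed meaning: 'big-words' puts the most significant word first,
    each word little-endian; 'little-words' puts the least significant
    word first, each word big-endian."""
    hi, lo = value >> 32, value & 0xFFFFFFFF
    if layout == "big":
        return struct.pack(">Q", value)
    if layout == "little":
        return struct.pack("<Q", value)
    if layout == "big-words":
        return struct.pack("<I", hi) + struct.pack("<I", lo)
    if layout == "little-words":
        return struct.pack(">I", lo) + struct.pack(">I", hi)
    raise ValueError("unknown layout: %r" % (layout,))

v = 0x0011223344556677
print(store64(v, "big-words").hex())     # 3322110077665544
print(store64(v, "little-words").hex())  # 4455667700112233
@end example
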
777 @subsubsection word-bitsize
779 The number of bits in a word. In GCC, this is @code{BITS_PER_WORD}.
781 @subsubsection insn-chunk-bitsize
783 The number of bits in an instruction word chunk, for purposes of
784 per-chunk endianness conversion. The default is zero, meaning
785 no chunking is required.
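
The exact conversion is not spelled out here, so the following minimal
Python sketch shows only one plausible model, assumed for illustration:
the instruction is stored as consecutive chunks of
@code{insn-chunk-bitsize} bits, most significant chunk first, and only
the bytes within each chunk follow the instruction endianness. The
helper @code{read_insn} is hypothetical.

@example
def read_insn(data, insn_bits, chunk_bits, endian):
    """Assemble an INSN_BITS-bit instruction from DATA (bytes).
    Assumed model: CHUNK_BITS-bit chunks, most significant chunk
    first, with the bytes of each chunk in ENDIAN order. A CHUNK_BITS
    of 0 means no chunking: the whole instruction follows ENDIAN."""
    nbytes = insn_bits // 8
    if chunk_bits == 0:
        return int.from_bytes(data[:nbytes], endian)
    chunk_bytes = chunk_bits // 8
    value = 0
    for i in range(0, nbytes, chunk_bytes):
        chunk = int.from_bytes(data[i:i + chunk_bytes], endian)
        value = (value << chunk_bits) | chunk
    return value

data = bytes([0x34, 0x12, 0x78, 0x56])
print(hex(read_insn(data, 32, 0, "little")))   # 0x56781234
print(hex(read_insn(data, 32, 16, "little")))  # 0x12345678
@end example
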
787 @subsubsection parallel-insns
789 This is the same as the @code{parallel-insns} spec of @code{define-isa}.
790 It allows a CPU family to override the value.
792 @subsubsection file-transform
794 Specify the file name transformation of generated code.
Each generated file has a name related to the ISA or CPU family.
Sometimes generated code needs to know the name of another generated
file (e.g. for #include's).
At present @code{file-transform} specifies the suffix.
801 For example, M32R/x generated files have an `x' suffix, as in @file{cpux.h}
802 for the @file{cpu.h} header. This is indicated with
803 @code{(file-transform "x")}.
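
A minimal Python sketch of this name transformation, assuming the
suffix is inserted just before the file extension as in the
@file{cpux.h} example; @code{transform_file_name} is a hypothetical
helper.

@example
import os

def transform_file_name(name, suffix):
    """Insert a file-transform SUFFIX before the file extension,
    e.g. cpu.h becomes cpux.h for (file-transform "x")."""
    base, ext = os.path.splitext(name)
    return base + suffix + ext

print(transform_file_name("cpu.h", "x"))  # cpux.h
@end example
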
805 ??? Ideally generated code wouldn't need to know anything about file names.
806 This breaks down for #include's. It can be fixed with symlinks or other
810 @subsection define-mach
813 @code{define-mach} defines a distinct variant of a CPU. It currently
814 has a one-to-one correspondence with BFD's "mach number". A minimum of
815 one mach must be defined.
817 The syntax of @code{define-mach} is:
822 (comment "description")
823 (attrs attribute-list)
824 (cpu cpu-family-name)
825 (bfd-name "bfd-name")
830 @subsubsection bfd-name
833 The name of the mach as used by BFD. If not specified the name of the
List of names of ISAs the machine supports.
841 @section Model variants
For each `machine', as defined here, there are one or more `models'.
844 There must be at least one model for each machine.
845 (*Note: There could be a default, but requiring one doesn't involve that much
846 extra typing and forces the programmer to at least think about such things.)
851 (comment "description")
852 (attrs attribute-list)
854 (state (variable-name-1 variable-mode-1) ...)
855 (unit name "comment" (attributes)
856 issue done state inputs outputs profile)
862 The name of the machine the model is an implementation of.
866 A list of variable-name/mode pairs for recording global function unit
867 state. For example on the M32R the value is @code{(state (h-gr UINT))}
868 and is a bitmask of which register(s) are the targets of loads and thus
869 subject to load stalls.
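
The following minimal Python sketch shows how such a bitmask could be
used to detect load stalls. The class and method names are
hypothetical, and clearing all pending loads at once is a simplifying
assumption; it is not CGEN's generated model code.

@example
class LoadStallState:
    """Hypothetical per-model state: a bitmask of general registers
    that are the targets of loads still in flight, as in the M32R
    (state (h-gr UINT)) example."""

    def __init__(self):
        self.h_gr = 0  # bit N set means register N is being loaded

    def record_load(self, dest_reg):
        self.h_gr |= 1 << dest_reg

    def stalls(self, src_regs):
        # An instruction stalls if it reads a register being loaded.
        return any(self.h_gr & (1 << r) for r in src_regs)

    def retire_loads(self):
        # Simplification: all pending loads complete together.
        self.h_gr = 0

state = LoadStallState()
state.record_load(3)         # e.g. a load into r3
print(state.stalls([3, 4]))  # True
print(state.stalls([5]))     # False
@end example
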
873 Specifies a function unit. Any number of function units may be specified.
874 The @code{u-exec} unit must be specified as it is the default.
879 (unit name "comment" (attributes)
880 issue done state inputs outputs profile)
883 @samp{issue} is the number of operations that may be in progress.
It originates from GCC's function unit specification. In general the
887 @samp{done} is the latency of the unit. The value is the number of cycles
888 until the result is ready.
890 @samp{state} has the same syntax as the global model `state' and is a list of
891 variable-name/mode pairs.
893 @samp{inputs} is a list of inputs to the function unit.
894 Each element is @code{(operand-name mode default-value)}.
896 @samp{outputs} is a list of outputs of the function unit.
897 Each element is @code{(operand-name mode default-value)}.
899 @samp{profile} is an rtl-code sequence that performs function unit
900 modeling. At present the only possible value is @code{()} meaning
901 invoke a user supplied function named @code{<cpu>_model_<mach>_<unit>}.
903 The current function unit specification is a first pass in order to
904 achieve something that moderately works for the intended purpose (cycle
905 counting on the simulator). Something more elaborate is on the todo list
906 but there is currently no schedule for it. The new specification must
907 try to be application independent. Some known applications are:
908 cycle counting in the simulator, code scheduling in a compiler, and code
909 scheduling in a JIT simulator (where speed of analysis can be more
910 important than getting an optimum schedule).
912 The inputs/outputs fields are how elements in the semantic code are mapped
913 to function units. Each input and output has a name that corresponds
914 with the name of the operand in the semantics. Where there is no
915 correspondence, a mapping can be made in the unit specification of the
916 instruction (see the subsection titled ``Timing'').
918 Another way to achieve the correspondence is to create separate function
919 units that contain the desired input/output names. For example on the
920 M32R the u-exec unit is defined as:
923 (unit u-exec "Execution Unit" ()
926 ((sr INT -1) (sr2 INT -1)) ; inputs
927 ((dr INT -1)) ; outputs
928 () ; profile action (default)
932 This handles instructions that use sr, sr2 and dr as operands. A second
933 function unit called @samp{u-cmp} is defined as:
936 (unit u-cmp "Compare Unit" ()
939 ((src1 INT -1) (src2 INT -1)) ; inputs
941 () ; profile action (default)
945 This handles instructions that use src1 and src2 as operands. The
946 organization of units is arbitrary. On the M32R, src1/src2 instructions
947 are typically compare instructions so a separate function unit was
created for them. Current limitations require that each hardware item
behind the operands be marked with the attribute @code{PROFILE} and
that the hardware item not be scalar.
952 @node Hardware elements
953 @section Hardware elements
955 The elements of hardware that make up a CPU are defined with
956 @code{define-hardware}. Examples of hardware elements include
957 registers, condition bits, immediate constants and memory.
959 Instruction fields that provide numerical values (``immediate
960 constants'') aren't really elements of the hardware, but it simplifies
961 things to think of them this way. Think of them as @emph{constant
962 generators}@footnote{A term borrowed from the book on the Bulldog
963 compiler and perhaps other sources.}.
965 Hardware elements are defined with:
970 (comment "description")
971 (attrs attribute-list)
972 (semantic-name hardware-semantic-name)
973 (type type-name type-arg1 type-arg2 ...)
974 (indices index-type index-arg1 index-arg2 ...)
975 (values values-type values-arg1 values-arg2 ...)
976 (handlers handler1 handler2 ...)
977 (get (args) expression)
978 (set (args) expression)
983 The only required elements are @samp{name} and @samp{type}.
984 Convention requires @samp{hardware-name} begin with @samp{h-}.
988 List of attributes. There are several predefined hardware attributes:
993 A bitset attribute used to specify which machines have this hardware element.
994 Do not specify the @code{MACH} attribute if the value is ``all machs''.
996 Usage: @code{(MACH mach1,mach2,...)}
997 There must be no spaces in ``@code{mach1,mach2,...}''.
1001 A hint to the simulator semantic code generator that it can record the
1002 address of a selected register in an array of registers. This speeds up
1003 simulation by moving the array index computation to extraction time.
1004 This attribute is only useful for register arrays and cannot be specified
1005 with @code{VIRTUAL} (??? revisit).
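
The following C fragment is a minimal sketch of the difference @code{CACHE-ADDR} makes. It is not CGEN's generated simulator code. It assumes a hypothetical @code{struct cpu_state} that holds @code{h-gr} as a plain array. Without the hint, the extracted instruction keeps a register number and indexes the array on every execution. With the hint, the register's address is computed once, at extraction time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU state: h-gr is an array of 16 registers.  */
struct cpu_state
{
  uint32_t h_gr[16];
};

/* Without CACHE-ADDR: extraction records only the register number.  */
struct extracted_plain
{
  unsigned regno;
};

/* With CACHE-ADDR: extraction records the register's address.  */
struct extracted_cached
{
  uint32_t *reg;
};

static struct extracted_cached
extract_cached (struct cpu_state *cpu, unsigned regno)
{
  struct extracted_cached e;

  assert (regno < 16);
  e.reg = &cpu->h_gr[regno];   /* array computation done once, here */
  return e;
}

static uint32_t
exec_read_plain (const struct cpu_state *cpu, struct extracted_plain e)
{
  return cpu->h_gr[e.regno];   /* index computed on every execution */
}

static uint32_t
exec_read_cached (struct extracted_cached e)
{
  return *e.reg;               /* just a dereference */
}
```

The cached pointer refers to the register itself rather than to a copy, so later writes to the register remain visible through it.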
1009 This attribute must be present for hardware elements to which references
1010 are profiled. Beware: this is a work in progress. If you use this
1011 attribute, you will likely have to hack CGEN. (Please submit patches.)
1015 The hardware element doesn't require any storage.
1016 This is used when you want a value that is derived from some other value.
1017 If @code{VIRTUAL} is specified, @code{get} and @code{set} specs must be provided.
1023 This is the type of hardware. Current values are: @samp{pc}, @samp{register},
1024 @samp{memory}, and @samp{immediate}.
1026 For @samp{pc}, see @ref{Program counter}.
1028 For registers the syntax is one of:
1031 @code{(register mode [(number)])}
1032 @code{(register (mode bits) [(number)])}
1035 where @samp{(number)} is the number of registers and is optional. If
1036 omitted, the default is @samp{(1)}.
1037 The second form is useful for describing registers with an odd (as in
1038 unusual) number of bits.
1039 @code{mode} for the second form must be one of @samp{INT} or @samp{UINT}.
1040 Since these two modes don't have an implicit size, they cannot be used for the first form.
1043 @c ??? Might wish to remove the mode here and just specify number of bits.
1045 For memory the syntax is:
1048 @code{(memory mode (size))}
1051 where @samp{(size)} is the size of the memory in @samp{mode} units.
1052 In general @samp{mode} should be @code{QI}.
1054 For immediates the syntax is one of:
1057 @code{(immediate mode)}
1058 @code{(immediate (mode bits))}
1061 The second form is for values for which a mode of that size doesn't exist.
1062 @samp{mode} for the second form must be one of @code{INT} or @code{UINT}.
1063 Since these two modes don't have an implicit size, they cannot be used for the first form.
1066 ??? There's no real reason why a mode like SI can't be used
1067 for odd-sized immediate values. The @samp{bits} field indicates the size
1068 and the @samp{mode} field indicates the mode in which the value will be used,
1069 as well as its signedness. This would allow removing INT/UINT for this
1070 purpose. On the other hand, a non-width specific mode allows applications
1071 to choose one (a simulator might prefer to store immediates in an @code{int}
1072 rather than, say, a @code{char} if the specified mode was @code{QI}).
1076 Specify names for individual elements with the @code{indices} spec.
1077 It is only valid for registers with more than one element.
1082 @code{(indices index-type arg1 arg2 ...)}
1085 where @samp{index-type} specifies the kind of index and @samp{arg1 arg2 ...}
1086 are arguments to @samp{index-type}.
1088 There are two supported values for @samp{index-type}: @code{keyword}
1089 and @code{extern-keyword}. The difference is that indices defined with
1090 @code{keyword} are kept internal to the hardware element's definition
1091 and are not usable elsewhere, whereas @code{extern-keyword} specifies
1092 a set of indices defined elsewhere with @code{define-keyword}.
1094 @subsubsection keyword
1097 @code{(indices keyword name-prefix ((name1 value1) (name2 value2) ...))}
1100 @samp{name-prefix} is the assembler prefix common to each of the index names,
1101 and is prepended to each name in the generated lookup table.
1102 For example, SPARC registers usually begin with @samp{"%"}.
1104 Each @samp{(name value)} pair maps a name to an index number.
1105 An index can be specified multiple times, for example, when a register has multiple names.
1108 There may be gaps in the index list, e.g. for invalid/reserved registers.
1110 No enum is defined for keywords defined this way.
1111 If you want an enum use @samp{define-keyword} and @samp{extern-keyword}.
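
The C sketch below gives a rough picture of the lookup table that a @code{keyword} index spec describes. It assumes a hypothetical table layout and a @samp{name-prefix} of @samp{"%"}. Several names map to the same index, and some indices are missing, which shows that gaps are allowed. The names @code{keyword_entry}, @code{gr_keywords} and @code{lookup_keyword} are made up for this example. The tables CGEN actually generates for Opcodes may be structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: one per (name value) pair.  */
struct keyword_entry
{
  const char *name;
  int value;
};

static const char kw_name_prefix[] = "%";   /* assumed name-prefix */

static const struct keyword_entry gr_keywords[] =
{
  { "fp", 13 }, { "sp", 15 }, { "lr", 14 },   /* aliases */
  { "r0", 0 }, { "r1", 1 }, { "r2", 2 }, { "r3", 3 },
  /* Indices 4..12 are left out here: gaps are allowed.  */
  { "r13", 13 }, { "r14", 14 }, { "r15", 15 },
};

/* Look up assembler text such as "%fp"; return -1 if it is unknown.  */
static int
lookup_keyword (const char *text)
{
  size_t plen = strlen (kw_name_prefix);
  size_t i;

  assert (text != NULL);
  if (strncmp (text, kw_name_prefix, plen) != 0)
    return -1;                  /* prefix missing */
  text += plen;
  for (i = 0; i < sizeof gr_keywords / sizeof gr_keywords[0]; i++)
    if (strcmp (gr_keywords[i].name, text) == 0)
      return gr_keywords[i].value;
  return -1;
}
```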
1118 (comment "Thumb's general purpose registers")
1119 (attrs (ISA thumb) VIRTUAL) ; ??? CACHE-ADDR should be doable
1120 (type register WI (8))
1122 ((r0 0) (r1 1) (r2 2) (r3 3) (r4 4) (r5 5) (r6 6) (r7 7)))
1123 (get (regno) (reg h-gr regno))
1124 (set (regno newval) (set (reg h-gr regno) newval))
1128 @subsubsection extern-keyword
1131 @code{(indices extern-keyword keyword-name)}
1134 Often one wants to make the keywords available for general use,
1135 i.e. to arbitrary tools.
1137 When the collection of indices is defined with @samp{define-keyword},
1138 refer to it in the @samp{indices} field with @samp{extern-keyword}.
1146 (values (fp 13) (lr 14) (sp 15)
1147 (r0 0) (r1 1) (r2 2) (r3 3) (r4 4) (r5 5) (r6 6) (r7 7)
1148 (r8 8) (r9 9) (r10 10) (r11 11) (r12 12) (r13 13) (r14 14) (r15 15))
1153 (comment "general registers")
1154 (attrs PROFILE CACHE-ADDR)
1155 (type register WI (16))
1156 (indices extern-keyword gr-names)
1162 Specify a list of valid values with the @code{values} spec.
1165 The syntax is identical to the syntax for @code{indices}.
1166 It is only valid for immediates.
1168 Example from sparc64:
1173 (comment "prediction bit")
1175 (type immediate (UINT 1))
1176 (values keyword "" (("" 0) (",pf" 0) (",pt" 1)))
1180 @subsection handlers
1182 The @code{handlers} spec is an escape hatch for indicating when a
1183 programmer-supplied routine must be called to perform a function.
1188 @samp{(handlers (handler-name1 "function_name1")
1189 (handler-name2 "function_name2")
1193 @samp{handler-name} must be one of @code{parse} or @code{print}.
1194 How @samp{function_name} is used is application specific, but in
1195 general it is the name of a function to call. The only application
1196 that uses this at present is Opcodes. See the Opcodes documentation for
1197 a description of each function's expected prototype.
1198 @c FIXME: Need ref here.
1202 Specify special processing to be performed when a value is read
1203 with the @code{get} spec.
1205 The syntax for scalar registers is:
1208 @samp{(get () (expression))}
1211 The syntax for vector registers is:
1214 @samp{(get (index) (expression))}
1217 @code{expression} is an RTL expression that computes the value to return.
1218 The mode of the result must be the mode of the register.
1220 @code{index} is the name of the index as it appears in @code{expression}.
1222 At present, @code{sequence}, @code{parallel}, @code{do-count}
1223 and @code{case} expressions are not allowed here.
1227 Specify special processing to be performed when a value is written
1228 with the @code{set} spec.
1230 The syntax for scalar registers is:
1233 @samp{(set (newval) (expression))}
1236 The syntax for vector registers is:
1239 @samp{(set (index newval) (expression))}
1242 @code{expression} is an RTL expression that stores @code{newval}
1243 in the register. This may involve storing values in other registers as well.
1244 @code{expression} must be one of @code{set}, @code{if}, @code{sequence}, or @code{case}.
1247 @code{index} is the name of the index as it appears in @code{expression}.
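
For a @code{VIRTUAL} element, the @code{get} and @code{set} specs amount to a pair of accessor functions. The C sketch below mirrors the Thumb @code{h-gr-t} example earlier in this section, where both specs forward to @code{h-gr}. The names @code{struct cpu_state}, @code{h_gr_t_get} and @code{h_gr_t_set} are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU state: only h-gr has storage.  */
struct cpu_state
{
  uint32_t h_gr[16];
};

/* (get (regno) (reg h-gr regno)) */
static uint32_t
h_gr_t_get (const struct cpu_state *cpu, unsigned regno)
{
  assert (regno < 8);          /* h-gr-t has 8 elements */
  return cpu->h_gr[regno];
}

/* (set (regno newval) (set (reg h-gr regno) newval)) */
static void
h_gr_t_set (struct cpu_state *cpu, unsigned regno, uint32_t newval)
{
  assert (regno < 8);
  cpu->h_gr[regno] = newval;
}
```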
1251 For specific hardware elements, specifying a layout is an alternative
1252 to providing getter/setter specs.
1254 At present this applies only to @samp{register} hardware elements,
1255 and not to the @samp{pc}.
1257 Some registers are a collection of bits with different meanings.
1258 It is often useful to define each field of such a register as its
1259 own register. The @samp{layout} spec can then be used to build up
1260 the outer register from the individual register fields.
1262 The fields are written from least to most significant.
1263 Each field is either the name of another hardware register,
1264 or a list of (value length) to specify hardwired bits.
1266 A typical example is a ``flags'' register.
1267 Here is an example for a fictitious flags register.
1268 It is eight bits wide, with the lower four bits having defined values,
1269 and the upper four bits hardwired to zero.
1272 (dsh h-cf "carry flag" () (register BI))
1273 (dsh h-sf "sign flag" () (register BI))
1274 (dsh h-of "overflow flag" () (register BI))
1275 (dsh h-zf "zero flag" () (register BI))
1279 (layout (h-cf h-sf h-of h-zf (0 4)))
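
In C terms, the @code{layout} above packs the four flag registers into one byte, from least to most significant, with four hardwired zero bits on top. The sketch below makes two assumptions. First, the flags are stored as separate fields of a hypothetical @code{struct flags}. Second, writes to the hardwired bits are simply discarded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical storage for the four one-bit registers.  */
struct flags
{
  unsigned cf, sf, of, zf;     /* each holds 0 or 1 */
};

/* Read the outer register: h-cf is bit 0, h-sf bit 1, h-of bit 2,
   h-zf bit 3, and (0 4) supplies four zero bits above them.  */
static uint8_t
flags_get (const struct flags *f)
{
  return (uint8_t) ((f->cf & 1) | (f->sf & 1) << 1
                    | (f->of & 1) << 2 | (f->zf & 1) << 3);
}

/* Write the outer register by updating each field; the hardwired
   upper four bits are ignored.  */
static void
flags_set (struct flags *f, uint8_t newval)
{
  f->cf = newval & 1;
  f->sf = (newval >> 1) & 1;
  f->of = (newval >> 2) & 1;
  f->zf = (newval >> 3) & 1;
}
```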
1283 @subsection Predefined hardware elements
1285 Several hardware types are predefined:
1293 main memory, where ``main'' is loosely defined
1295 data address (data only)
1297 instruction address (instructions only)
1300 @anchor{Program counter}
1301 @subsection Program counter
1303 The program counter must be defined and is not a builtin.
1304 If get/set specs are not required, define it as:
1307 (dnh h-pc "program counter" (PC) (pc) () () ())
1310 If get/set specs are required, define it as:
1315 (comment "<ARCH> program counter")
1318 (get () <insert get code here>)
1319 (set (newval) <insert set code here>)
1323 If the architecture has multiple instruction sets, all of them must be listed in the @code{ISA} attribute.
1324 If they're not, the default is the first one, which is often not what you want.
1325 Here's an example from @file{arm.cpu}:
1330 (comment "ARM program counter (h-gr reg 15)")
1331 (attrs PC (ISA arm,thumb))
1335 (set (raw-reg SI h-pc) (and newval -2))
1336 (set (raw-reg SI h-pc) (and newval -4))))
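
The two @code{set} branches in the ARM example only clear low-order bits. @code{(and newval -2)} clears bit 0, and @code{(and newval -4)} clears bits 0 and 1. The condition that chooses between the branches is not shown here. The C sketch assumes it is the Thumb state bit. @code{arm_pc_set_value} is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value stored into h-pc.  Assumption: THUMB_STATE selects the
   (and newval -2) branch, otherwise (and newval -4) is used.
   In 32-bit two's complement, -2 is ~1 and -4 is ~3.  */
static uint32_t
arm_pc_set_value (uint32_t newval, bool thumb_state)
{
  return thumb_state ? (newval & ~UINT32_C (1)) : (newval & ~UINT32_C (3));
}
```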
1340 @subsection Simplification macros
1342 To simplify @file{.cpu} files several pmacros are provided.
1344 @anchor{a-define-normal-hardware}
1346 The @code{define-normal-hardware} pmacro (with alias @code{dnh})
1347 takes a fixed set of positional arguments for the typical hardware element.
1350 @code{(dnh name comment attributes type indices values handlers)}
1355 (dnh h-gr "general registers"
1358 (keyword "" ((fp 13) (sp 15) (lr 14)
1359 (r0 0) (r1 1) (r2 2) (r3 3)
1360 (r4 4) (r5 5) (r6 6) (r7 7)
1361 (r8 8) (r9 9) (r10 10) (r11 11)
1362 (r12 12) (r13 13) (r14 14) (r15 15)))
1367 This defines an array of 16 registers of mode @code{WI} (``word int'').
1368 The names of the registers are @code{r0} through @code{r15}, and registers 13, 14 and
1369 15 also have the names @code{fp}, @code{lr} and @code{sp} respectively.
1371 @anchor{a-define-simple-hardware}
1373 Scalar registers with no special requirements occur frequently.
1374 Macro @code{define-simple-hardware} (with alias @code{dsh}) is identical to
1375 @code{dnh} except that it does not include the @code{indices}, @code{values},
1376 or @code{handlers} specs.
1379 (dsh h-ibit "interrupt enable bit" () (register BI))
1382 @node Instruction fields
1383 @section Instruction fields
1384 @cindex Instruction fields
1386 Instruction fields (ifields) define the raw bitfields of each instruction.
1387 Minimal semantic meaning is attributed to them. Support is provided for
1388 mapping to and from the raw bit pattern and the usable contents, and
1389 other simple manipulations.
1390 @footnote{Whether to also provide a way to specify instruction formats is not yet
1391 clear. Currently they are computed from the instructions, so there's no
1392 current @emph{need} to provide them. However, providing the ability as an
1393 option may simplify other tools CGEN is used to generate. This
1394 simplification would come in the form of giving the formats the names
1395 that CPU reference manuals often use. Pre-specified instruction formats
1396 may also simplify expression of more complicated instruction sets.
1397 Providing instruction formats may also simplify the support of really
1398 complex ISAs like i386 and m68k).}
1400 Instruction fields must be uniquely named within an instruction set,
1401 but different instruction sets (ISAs) may have ifields with the same name.
1403 The syntax for defining instruction fields is:
1408 (comment "description")
1409 (attrs attribute-list)
1410 (word-offset word-offset-in-bits)
1411 (word-length word-length-in-bits)
1412 (start starting-bit-number)
1413 (length number-of-bits)
1414 (follows ifield-name)
1416 (encode (value pc) (rtx to describe encoding))
1417 (decode (value pc) (rtx to describe decoding))
1421 The required elements are: @samp{name}, @samp{start}, @samp{length}.
1422 @footnote{Positional specification simplifies instruction description somewhat
1423 in that there is no required order of fields, and a disjoint set of fields can
1424 be referred to as one. On the other hand it can require knowledge of the length
1425 of the instruction, which is inappropriate in cases like the M32R, where
1426 the main fields have the same name and ``position'' regardless of the length
1427 of the instruction. Moving positional specification into instruction formats,
1428 whether machine generated or programmer specified, may be done.}
1430 Convention requires @samp{field-name} begin with @samp{f-}.
1434 There are several predefined instruction field attributes:
1438 The field contains a PC-relative address. Different CPUs compute the
1439 address from different offsets relative to the PC; the offset used is
1440 specified in the @code{encode} and @code{decode} specs.
1443 The field contains an absolute address.
1446 The field has an optional sign. It is sign-extended during
1447 extraction. For an @var{n}-bit field, allowable values are @math{-2^{n-1}} to @math{2^n-1}.
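
The sketch below shows two things in C: the sign extension performed during extraction, and the range check implied by an optional sign. An @var{n}-bit field accepts both the signed and the unsigned range. For @math{n = 8}, both @code{-128} and @code{255} are therefore accepted, and @code{255} reads back as @code{-1}. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low N bits of RAW (1 <= N <= 32).  The final
   conversion assumes two's complement representation.  */
static int32_t
sign_extend (uint32_t raw, unsigned n)
{
  uint32_t m;

  assert (n >= 1 && n <= 32);
  m = UINT32_C (1) << (n - 1);                 /* the field's sign bit */
  raw &= (n == 32) ? UINT32_MAX : (UINT32_C (1) << n) - 1;
  return (int32_t) ((raw ^ m) - m);
}

/* Nonzero if VALUE may be stored in an N-bit optional-sign field,
   i.e. -2^(n-1) <= VALUE <= 2^n - 1.  */
static int
fits_optional_sign (int64_t value, unsigned n)
{
  assert (n >= 1 && n <= 32);
  return value >= -(INT64_C (1) << (n - 1))
         && value <= (INT64_C (1) << n) - 1;
}
```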
1450 The field is marked as ``reserved'' by the architecture.
1451 This is an informational attribute. Tools may use it
1452 to validate programs, either statically or dynamically.
1455 The field does not directly contribute to the instruction's value. This
1456 is used to simplify semantic or assembler descriptions where a field's
1457 value is based on other values. Multi-ifields are always virtual.
1460 @subsection word-offset
1461 The offset in bits from the start of the instruction to the word containing the field.
1463 This must be a multiple of eight.
1465 Either both @samp{word-offset} and @samp{word-length} must be
1466 specified, or neither. The presence of
1467 @samp{word-offset} means the long form of specifying the field's position is
1468 being used. If absent then the short form is being used and the value for
1469 @samp{word-offset} is encoded in @samp{start}.
1471 @subsection word-length
1472 The length in bits of the word containing the field.
1473 This must be a multiple of eight.
1476 The bit number of the field's most significant bit in the instruction.
1477 Bit numbering is determined by the @code{insn-lsb0?} field of @code{define-arch}.
1480 If using the long form of specifying the field's position
1481 (i.e., @samp{word-offset} is specified) then this value is the value within
1482 the containing word. If using the short form then this value includes
1483 the word offset. See the Porting document for more info
1484 (@pxref{Writing define-ifield}).
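
The hypothetical C function below makes the two bit-numbering conventions concrete. It extracts a contiguous field given @samp{start} and @samp{length}. It assumes the short form (no @samp{word-offset}) and an instruction word of at most 32 bits. When @code{insn-lsb0?} is false, bit 0 is the most significant bit. A 4-bit field starting at bit 4 of a 16-bit word is then the second nibble from the top.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract a field from WORD, a WORD_LENGTH-bit container.  START is the
   bit number of the field's most significant bit.  LSB0 mirrors
   insn-lsb0?: true means bit 0 is the least significant bit.  */
static uint32_t
extract_field (uint32_t word, unsigned word_length, unsigned start,
               unsigned length, bool lsb0)
{
  unsigned shift;
  uint32_t mask;

  assert (length >= 1 && length <= 32 && word_length <= 32);
  if (lsb0)
    {
      /* Field occupies bits START down to START - LENGTH + 1.  */
      assert (start < word_length && start + 1 >= length);
      shift = start + 1 - length;
    }
  else
    {
      /* Field occupies bits START up to START + LENGTH - 1.  */
      assert (start + length <= word_length);
      shift = word_length - (start + length);
    }
  mask = (length == 32) ? UINT32_MAX : (UINT32_C (1) << length) - 1;
  return (word >> shift) & mask;
}
```

For example, @code{extract_field (0x1234, 16, 4, 4, false)} yields 2.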
1487 The number of bits in the field. The field must be contiguous. For
1488 non-contiguous instruction fields, use ``multi-ifields''.
1491 Optional. Experimental.
1492 This should not be used for the specification of RISC-like architectures.
1493 It is an experiment in supporting CISC-like architectures.
1494 The argument is the name of the ifield or operand that immediately precedes
1495 this one. In general the argument is an ``anyof'' operand. The @code{follows}
1496 spec allows subsequent ifields to ``float''.
1499 The mode the value is to be interpreted in.
1500 Usually this is @code{INT} or @code{UINT}.
1502 ??? There's no real reason why modes like SI can't be used here.
1503 The @samp{length} field specifies the number of bits in the field,
1504 and the @samp{mode} field indicates the mode in which the value will be used,
1505 as well as its signedness. This would allow removing INT/UINT for this
1506 purpose. On the other hand, a non-width specific mode allows applications
1507 to choose one (a simulator might prefer to store immediates in an @code{int}
1508 rather than, say, a @code{char} if the specified mode was @code{QI}).
1511 An expression to apply to convert from usable values to raw field
1512 values. The syntax is @code{(encode (value pc) expression)} or more
1513 generally @code{(encode ((<mode1> value) (IAI pc)) <expression>)},
1514 where @code{<mode1>} is the mode of the ``incoming'' value, and
1515 @code{<expression>} is an rtx to convert @code{value} to something that
1516 can be stored in the field.
1521 (encode ((SF value) (IAI pc))
1523 ((eq value (const SF 1.0)) (const 0))
1524 ((eq value (const SF 0.5)) (const 1))
1525 ((eq value (const SF -1.0)) (const 2))
1526 ((eq value (const SF 2.0)) (const 3))
1527 (else (error "invalid floating point value for field foo"))))
1530 In this example four floating point immediate values are represented in a
1531 field of two bits. The above might be expanded to a series of `if' statements
1532 or the generator could determine a `switch' statement is more appropriate.
1536 An expression to apply to convert from raw field values to usable
1537 values. The syntax is @code{(decode (value pc) expression)} or more
1538 generally @code{(decode ((<mode1> value) (IAI pc)) <expression>)},
1539 where @code{<mode1>} is the mode of the ``incoming'' value, and
1540 @code{<expression>} is an rtx to convert @code{value} to something usable.
1545 (decode ((WI value) (IAI pc))
1547 ((eq value 0) (const SF 1.0))
1548 ((eq value 1) (const SF 0.5))
1549 ((eq value 2) (const SF -1.0))
1550 ((eq value 3) (const SF 2.0))))
1553 There's no need to provide an error case: a two-bit field can only hold
1554 the values 0 through 3, and all of them are handled. One could still
1555 provide an error case if desired.
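
The @code{encode} and @code{decode} pair above amounts to a four-entry lookup table. In this C sketch, @code{encode_foo} returns @code{-1} where the RTL would signal an @code{error}. The names are hypothetical, and this is not generated code.

```c
#include <assert.h>

/* decode: raw 2-bit field value -> floating point immediate.  */
static float
decode_foo (unsigned value)
{
  static const float table[4] = { 1.0f, 0.5f, -1.0f, 2.0f };

  assert (value < 4);          /* a 2-bit field holds only 0..3 */
  return table[value];
}

/* encode: floating point immediate -> raw field value, or -1 if the
   value cannot be represented (the `error' case).  */
static int
encode_foo (float value)
{
  if (value == 1.0f)
    return 0;
  if (value == 0.5f)
    return 1;
  if (value == -1.0f)
    return 2;
  if (value == 2.0f)
    return 3;
  return -1;
}
```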
1557 @subsection Non-contiguous fields
1558 @cindex Instruction fields, non-contiguous
1560 Non-contiguous fields (e.g. sparc64's 16 bit displacement field) are
1561 built on top of support for contiguous fields. The syntax for defining such fields is:
1565 (define-multi-ifield
1567 (comment "description")
1568 (attrs attribute-list)
1570 (subfields field1-name field2-name ...)
1571 (insert (code to set each subfield))
1572 (extract (code to set field from subfields))
1573 (encode (value pc) (rtx to describe encoding))
1574 (decode (value pc) (rtx to describe decoding))
1578 The required elements are: @samp{name}, @samp{subfields}.
1583 (define-multi-ifield
1585 (comment "20 bit unsigned")
1588 (subfields f-i20-4 f-i20-16)
1589 (insert (sequence ()
1590 (set (ifield f-i20-4) (srl (ifield f-i20) (const 16)))
1591 (set (ifield f-i20-16) (and (ifield f-i20) (const #xffff)))
1593 (extract (sequence ()
1594 (set (ifield f-i20) (or (sll (ifield f-i20-4) (const 16))
1600 @subsubsection subfields
1601 The names of the already defined fields that make up the multi-ifield.
1603 @subsubsection insert
1604 Code to set the subfields from the multi-ifield. All fields are referred
1605 to with @code{(ifield <name>)}.
1607 @subsubsection extract
1608 Code to set the multi-ifield from the subfields. All fields are referred
1609 to with @code{(ifield <name>)}.
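
The @samp{f-i20} @code{insert} code above splits a 20-bit value into a 4-bit upper part and a 16-bit lower part, and @code{extract} reassembles it. A C sketch of the same arithmetic follows. It assumes the @code{extract} expression ORs the shifted @samp{f-i20-4} with @samp{f-i20-16}. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subfield values: f-i20-4 holds the upper 4 bits,
   f-i20-16 the lower 16 bits.  */
struct i20_subfields
{
  uint32_t f_i20_4;
  uint32_t f_i20_16;
};

/* insert: set each subfield from the multi-ifield.  */
static struct i20_subfields
i20_insert (uint32_t f_i20)
{
  struct i20_subfields s;

  assert (f_i20 <= 0xfffff);    /* 20-bit unsigned */
  s.f_i20_4 = f_i20 >> 16;
  s.f_i20_16 = f_i20 & 0xffff;
  return s;
}

/* extract: set the multi-ifield from its subfields.  */
static uint32_t
i20_extract (struct i20_subfields s)
{
  return (s.f_i20_4 << 16) | s.f_i20_16;
}
```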
1611 @subsection Simplification macros
1613 To simplify @file{.cpu} files several pmacros are provided.
1615 @anchor{a-define-normal-ifield}
1617 The @code{define-normal-ifield} pmacro (with alias @code{dnf})
1618 takes a fixed set of positional arguments for the typical instruction field.
1621 @code{(dnf name comment attributes start length)}
1626 (dnf f-r1 "register r1" () 4 4)
1629 This defines a field called @samp{f-r1} that is an unsigned field of 4
1630 bits beginning at bit 4. All fields defined with @code{dnf} are unsigned.
1633 The @code{df} pmacro adds @code{mode}, @code{encode}, and
1634 @code{decode} elements.
1636 The syntax of @code{df} is:
1638 @code{(df name comment attributes start length mode encode decode)}
1644 "disp8, slot unknown" (PCREL-ADDR)
1646 ((value pc) (sra WI (sub WI value (and WI pc (const -4))) (const 2)))
1647 ((value pc) (add WI (sll WI value (const 2)) (and WI pc (const -4)))))
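1650 This defines a field called @samp{f-disp8} that holds an 8-bit signed
1651 PC-relative displacement beginning at bit 8. The field stores the distance from the PC (with its low two bits cleared) divided by 4, so decoding shifts the value left by 2 and adds the word-aligned PC.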
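
The following C sketch traces the @samp{f-disp8} @code{encode} and @code{decode} expressions, assuming 32-bit @code{WI} values. @code{sra} is implemented as floor division, so negative displacements round the same way as an arithmetic right shift. The function names are hypothetical. The caller is assumed to check that the encoded value fits in 8 signed bits.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic right shift (floor division by 2^N) that avoids the
   implementation-defined behaviour of >> on negative values.  */
static int64_t
sra64 (int64_t x, unsigned n)
{
  return x >= 0 ? x >> n : -((-(x + 1)) >> n) - 1;
}

/* (sra WI (sub WI value (and WI pc (const -4))) (const 2))
   VALUE is the target address.  */
static int32_t
disp8_encode (uint32_t value, uint32_t pc)
{
  int64_t diff = (int64_t) value - (int64_t) (pc & ~UINT32_C (3));

  return (int32_t) sra64 (diff, 2);
}

/* (add WI (sll WI value (const 2)) (and WI pc (const -4)))
   VALUE is the sign-extended field contents.  */
static uint32_t
disp8_decode (int32_t value, uint32_t pc)
{
  assert (value >= -128 && value <= 127);
  return (uint32_t) value * 4u + (pc & ~UINT32_C (3));
}
```

With a PC of @code{0x1006}, a branch back to @code{0x1000} encodes as @code{-1}, because the PC is first rounded down to @code{0x1004}.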
1653 @anchor{a-define-normal-multi-ifield}
1655 The macro @code{define-normal-multi-ifield} (with alias @code{dnmf})
1656 takes a fixed set of positional arguments for the typical multi-ifield.
1659 @code{(dnmf name comment attributes mode subfields insert extract)}
1662 The macro @code{dsmf} takes a fixed set of positional arguments for
1663 simple multi-ifields.
1666 @code{(dsmf name comment attributes mode subfields)}
1668 @node Enumerated constants
1669 @section Enumerated constants
1670 @cindex Enumerated constants
1671 @cindex Enumerations
1673 Enumerated constants (@emph{enums}) are important enough in instruction
1674 set descriptions that they are given special treatment.
1675 Enums are defined with:
1680 (comment "description")
1681 (attrs attribute-list)
1683 (values val1 val2 ...)
1687 Enums in opcode fields are further enhanced by specifying the opcode
1688 field they are used in. This allows the enum's name to be specified
1689 in an instruction's @code{format} entry.
1691 Instruction enums are defined with @code{define-insn-enum}:
1696 (comment "description")
1697 (attrs attribute-list)
1698 (ifield ifield-name)
1700 (values val1 val2 ...)
1704 @emph{define-insn-enum is currently not provided,
1705 use define-normal-insn-enum instead}.
1706 @xref{a-define-normal-insn-enum, define-normal-insn-enum}.
1709 Convention requires each enum value to be prefixed with the same text.
1710 Rather than specifying the prefix in each entry, it is specified once, here.
1711 Convention requires @samp{prefix} not contain any lowercase characters.
1712 You generally want to end @samp{prefix} with @samp{-} or @samp{_}
1713 as the complete name of each enum value is @samp{prefix} + @samp{value-name}.
1714 The convention is to use @samp{-}, though this convention is not
1715 adhered to as well as the other conventions.
1718 The default value is @samp{""}.
1721 The name of the instruction field that the enum is intended for. This
1722 must be a simple ifield, not a multi-ifield.
1724 @anchor{a-enum-values}
1726 A list of possible values. Each element has one of the following forms:
1731 @item @code{(name value)}
1732 @item @code{(name - (attribute-list))}
1733 @item @code{(name value (attribute-list))}
1736 The syntax for numbers is Scheme's, so hex numbers are @code{#xnnnn}.
1737 A value of @code{-} means use the next value (previous value plus 1).
1739 Enum values currently always have mode @samp{INT}.
1744 (values "a" ("b") ("c" #x12)
1745 ("d" - (sanitize foo)) ("e" #x1234 (sanitize bar)))
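
C enumerations follow similar numbering rules, so the example can be approximated as shown below. This is only an analogy. It assumes that entries given without a value (@samp{"a"} and @samp{("b")}) behave like @code{-} and that numbering starts at 0. It also uses a hypothetical prefix @samp{FOO_}. The @code{sanitize} attributes have no C counterpart.

```c
#include <assert.h>

/* Rough C analogue of the values list; FOO_ is a hypothetical prefix.  */
enum foo
{
  FOO_a,            /* "a"               -> 0 (assumed) */
  FOO_b,            /* ("b")             -> 1 */
  FOO_c = 0x12,     /* ("c" #x12)        -> 0x12 */
  FOO_d,            /* ("d" - ...)       -> previous + 1 = 0x13 */
  FOO_e = 0x1234    /* ("e" #x1234 ...)  -> 0x1234 */
};
```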
1748 @subsection Simplification macros
1750 To simplify @file{.cpu} files several pmacros are provided.
1752 @anchor{a-define-normal-enum}
1753 The @code{define-normal-enum} pmacro takes a fixed set of
1754 positional arguments for the typical enum.
1757 @code{(define-normal-enum name comment attrs prefix vals)}
1759 @anchor{a-define-normal-insn-enum}
1760 The @code{define-normal-insn-enum} pmacro takes a fixed set of
1761 positional arguments for the typical instruction enum.
1764 @code{(define-normal-insn-enum name comment attrs prefix ifield vals)}
1769 (dnf f-op1 "op1" () 0 4)
1770 (define-normal-insn-enum insn-op1 "insn format enums" () OP1_ f-op1
1771 (.map .str (.iota 16))
1775 This defines an instruction enum for field @samp{f-op1} with values
1776 @code{OP1_0}, @code{OP1_1}, @dots{}, @code{OP1_15}. These values can be used directly in
1777 instruction format specs. This applies to ``instruction enums'' only.
1778 One can use normal enums in instruction format specs, but one needs to
1779 explicitly specify the ifield, e.g. @code{(f-op1 OP1_0)}.
1785 Keywords are like enums (@pxref{Enumerated constants}),
1786 but they also cause a table of names of each value to be generated.
1787 This is useful for things like registers where you want
1788 arbitrary tools to have access to the table of names.
1790 The syntax for defining keywords changed from RTL version 0.7 to
1791 RTL version 0.8. @xref{RTL Versions}.
1793 RTL version 0.7 syntax:
1798 (comment "description")
1799 (attrs attribute-list)
1801 (print-name "prefix-for-enum-values-without-trailing-dash")
1802 (prefix "prefix-for-names-in-string-table")
1807 RTL version 0.8 syntax:
1812 (comment "description")
1813 (attrs attribute-list)
1815 (enum-prefix "prefix-for-enum-values")
1816 (name-prefix "prefix-for-names-in-string-table")
1821 Note that @samp{print-name} has been replaced with @samp{enum-prefix}
1822 and @samp{prefix} has been replaced with @samp{name-prefix}.
1824 Furthermore, the behavior of
1825 @samp{print-name} and @samp{enum-prefix} differs.
1826 When computing complete enum names with @samp{print-name},
1827 CGEN adds a @samp{-} between the prefix and the enum name.
1828 CGEN does not insert a @samp{-} with @samp{enum-prefix}.
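
A small hypothetical C helper that builds a complete enum name illustrates the difference. For RTL version 0.7 it inserts @samp{-} after @samp{print-name}. For version 0.8 it uses @samp{enum-prefix} unchanged. It is a sketch of the naming rule only, not CGEN code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Compose PREFIX and VALUE_NAME into BUF.  RTL_0_7 nonzero means
   PREFIX is a print-name, so `-' is inserted; otherwise PREFIX is an
   enum-prefix and is used as is.  Returns snprintf's result.  */
static int
keyword_enum_name (char *buf, size_t size, const char *prefix,
                   const char *value_name, int rtl_0_7)
{
  return snprintf (buf, size, rtl_0_7 ? "%s-%s" : "%s%s",
                   prefix, value_name);
}
```

Thus a @samp{print-name} of @code{H-GR} (0.7) and an @samp{enum-prefix} of @code{H-GR-} (0.8) both yield @code{H-GR-r12} for the value @code{r12}.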
1832 This is the mode in which the keyword's value is referenced and recorded.
1833 The default is @samp{INT}. It is normally not necessary to specify it.
1836 @subsection print-name
1838 @emph{NOTE: This is for RTL version 0.7 only.}
1840 This value plus a trailing @samp{-} is passed as the @samp{prefix}
1841 parameter when defining the corresponding enum. @xref{Enumerated constants}.
1843 Convention requires @samp{print-name} not contain any lowercase characters.
1845 The default value is the keyword's name in uppercase.
1849 @emph{NOTE: This is for RTL version 0.7 only.}
1851 @samp{prefix} is the assembler prefix common to each of the index names,
1852 and is prepended to each name in the generated lookup table.
1853 For example, SPARC registers usually begin with @samp{"%"}.
1854 It is @emph{not} added to the corresponding enum value names.
1856 The default value is @samp{""}.
1858 @subsection enum-prefix
1860 @emph{NOTE: This is for RTL version 0.8 and higher.
1861 You must specify the RTL version at the top of the description file.}
1863 This value is passed as the @samp{prefix} parameter when defining the
1864 corresponding enum. @xref{Enumerated constants}.
1866 @emph{NOTE:} Unlike @samp{print-name} in RTL version @samp{0.7},
1867 @samp{-} is not appended when defining the corresponding enum.
1869 Convention requires @samp{enum-prefix} not contain any lowercase characters.
1871 The default value is the keyword's name in uppercase + @samp{-}.
1873 @subsection name-prefix
1875 @emph{NOTE: This is for RTL version 0.8 and higher.
1876 You must specify the RTL version at the top of the description file.}
1878 @samp{name-prefix} is the assembler prefix common to each of the index names,
1879 and is prepended to each name in the generated lookup table.
1880 For example, SPARC registers usually begin with @samp{"%"}.
1881 It is @emph{not} added to the corresponding enum value names.
1883 The default value is @samp{""}.
1887 The @samp{values} field has the same syntax as the @samp{values}
1888 field of @samp{define-enum}. @xref{a-enum-values, Enum Values}.
1896 (values (fp 13) (lr 14) (sp 15)
1897 (r0 0) (r1 1) (r2 2) (r3 3) (r4 4) (r5 5) (r6 6) (r7 7)
1898 (r8 8) (r9 9) (r10 10) (r11 11) (r12 12) (r13 13) (r14 14) (r15 15))
1902 Enum values from this keyword are referenced in the @file{.cpu} file as
1903 @samp{H-GR-} + @samp{register-name}, for example @code{H-GR-r12}.
1905 @node Instruction operands
1906 @section Instruction operands
1907 @cindex Instruction operands
1908 @cindex Operands, instruction
1910 Instruction operands provide:
1913 @item a layer between the assembler and the raw hardware description
1914 @item the main means of making an instruction's fields useful to
1919 Instruction operands must be uniquely named within an instruction set,
1920 but different instruction sets (ISAs) may have operands with the same name.
1922 The syntax for defining an operand is:
1927 (comment "description")
1928 (attrs attribute-list)
1929 (type hardware-element)
1931 (index instruction-field)
1932 (handlers handler-spec)
1933 (getter getter-spec)
1934 (setter setter-spec)
1938 The required elements are: @code{name}, @code{type}, @code{mode},
1939 and, if @code{type} is not scalar, @code{index}.
1943 This is the name of the operand as a Scheme symbol.
1944 The name choice is fairly important as it is used in instruction
1945 syntax entries, instruction format entries, and semantic expressions.
1946 It can't collide with symbols used in semantic expressions
1947 (e.g. @code{and}, @code{set}, etc.).
1949 The convention is that operands have no prefix (whereas ifields begin
1950 with @samp{f-} and hardware elements begin with @samp{h-}). A prefix
1951 like @samp{o-} would avoid collisions with other semantic elements, but
1952 operands are used often enough that any prefix is a hassle.
1954 Note that if you @emph{do} decide to prefix operand names, e.g. use
1955 a style like @samp{o-foo}, then you will need to remember to use the
1956 @samp{$@{o-foo@}} form in the assembler syntax and not the @samp{$o-foo}
1957 form because the latter only takes alphanumeric characters.
1958 @xref{assembler-syntax, syntax}.
1962 A list of attributes. In addition to attributes defined for the operand,
1963 an operand inherits the attributes of its instruction field. There are
1964 several predefined operand attributes:
1968 The operand contains negative values (not used yet so definition is
1972 This operand contains the changeable field (usually a branch address) of
1973 a relaxable/relaxed instruction.
1976 Use the SEM-ONLY attribute for cases where the operand will only be used
1977 in semantic specification, and not assembly code specification. A
1978 typical example is condition codes.
1979 @c Does this attribute need to exist?
1982 To refer to a hardware element in semantic code one must either use an
1983 operand or one of reg/mem/const. Operands generally exist to map
1984 instruction fields to the selected hardware element and are easier to
1985 use in semantic code than referring to the hardware element directly
1986 (e.g. @code{sr} is easier to type and read than @code{(reg h-gr
1987 <index>)}). Example:
1990 (dnop condbit "condition bit" (SEM-ONLY) h-cond f-nil)
1993 @code{f-nil} is the value to use when there is no instruction field.
1995 @c There might be some language cleanup to be done here regarding f-nil.
1996 @c It is kind of extraneous.
1999 The hardware element this operand applies to. This must be the name of a hardware element.
2003 The mode the value is to be interpreted in.
2006 The index of the hardware element. This is used to mate the hardware
2007 element with the instruction field that selects it, and must be the name
2008 of an ifield entry. (In the future the index may be something other
2009 than an ifield.) Currently it must not be a multi-ifield.
2011 @subsection handlers
2012 Sometimes it's necessary to escape to C to parse assembler input or print
2013 a value. This field provides that escape hatch.
2016 @code{(handlers handler-spec)}
2018 where @code{handler-spec} is one or more of:
2020 @code{(parse "function_suffix")} -- a call to function
2021 @code{parse_<function_suffix>} is generated.
2023 @code{(print "function_suffix")} -- a call to function
2024 @code{print_<function_suffix>} is generated.
2026 These functions are intended to be provided in a separate @file{.opc}
2027 file. The prototype of a parse function depends on the hardware type.
2028 See @file{cpu/*.opc} for examples.
2030 @c FIXME: The following needs review.
2036 parse_foo (CGEN_CPU_DESC cd,
2039 unsigned long *valuep);
2042 @code{cd} is the result of @code{<arch>_cgen_cpu_open}.
2043 @code{strp} is a pointer to a pointer to the assembler and is updated by
2046 @code{opindex} is ???.
2047 @code{valuep} is a pointer to where to record the parsed value.
2049 If a relocation is needed, it is queued with a call to ???. Queued
2050 relocations are processed after the instruction has been parsed.
2052 The result is an error message or NULL if successful.
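As an illustration of this contract (not code taken from any @file{.opc} file), the hypothetical handler @code{parse_uimm8} below parses an unsigned 8-bit immediate. The @code{CGEN_CPU_DESC} typedef is a stand-in so the sketch compiles on its own, and the 0..255 range check is an assumption. What it demonstrates is that @code{*strp} is advanced past the consumed text and that the return value is NULL on success or an error message otherwise.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Stand-in for the real CGEN type so this sketch compiles on its own. */
typedef void *CGEN_CPU_DESC;

/* Hypothetical handler, as named by (handlers (parse "uimm8")).
   Accepts decimal, 0x-prefixed hex or 0-prefixed octal (strtoul base 0).
   On success: stores the value, advances *strp, returns NULL.
   On failure: leaves *strp unchanged and returns an error message. */
const char *
parse_uimm8 (CGEN_CPU_DESC cd, const char **strp, int opindex,
             unsigned long *valuep)
{
  const char *start = *strp;
  char *end;
  unsigned long value;

  (void) cd;       /* unused in this sketch */
  (void) opindex;  /* unused in this sketch */

  if (!isdigit ((unsigned char) *start))
    return "expected an unsigned immediate";

  value = strtoul (start, &end, 0);
  if (value > 255)
    return "immediate out of range (0..255)";

  *valuep = value;
  *strp = end;     /* consume exactly the characters that were parsed */
  return NULL;
}
```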
2054 The prototype of a print function depends on the hardware type. See
2055 @file{cpu/*.opc} for examples. For integers it is:
2058 void print_foo (CGEN_CPU_DESC cd,
2066 @samp{cd} is the result of @code{<arch>_cgen_cpu_open}.
@samp{dis_info} is the `info' argument to print_insn_<arch>.
2068 @samp{value} is the value to be printed.
2069 @samp{attrs} is the set of boolean attributes.
2070 @samp{pc} is the PC value of the instruction.
2071 @samp{length} is the length of the instruction.
2073 Actual printing is done by calling @code{((disassemble_info *)
2074 dis_info)->fprintf_func}.
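A matching print handler could look like the sketch below. The parameter types and the two-field @code{disassemble_info} structure are minimal stand-ins assumed for illustration (the real structure in the opcodes library has many more fields), and @code{print_simm} and @code{buf_printf} are hypothetical names. As described above, all output goes through @code{fprintf_func}.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Minimal stand-ins for the CGEN/opcodes types (assumptions). */
typedef void *CGEN_CPU_DESC;
typedef unsigned long bfd_vma;
typedef int (*fprintf_ftype) (void *, const char *, ...);
typedef struct
{
  fprintf_ftype fprintf_func;
  void *stream;
} disassemble_info;

/* Hypothetical handler, as named by (handlers (print "simm")):
   prints a signed immediate with a leading '#'. */
void
print_simm (CGEN_CPU_DESC cd, void *dis_info, long value,
            unsigned int attrs, bfd_vma pc, int length)
{
  disassemble_info *info = (disassemble_info *) dis_info;

  (void) cd; (void) attrs; (void) pc; (void) length;
  (*info->fprintf_func) (info->stream, "#%ld", value);
}

/* Example fprintf_func that appends to a fixed-size string buffer. */
struct strbuf { char text[64]; size_t len; };

static int
buf_printf (void *stream, const char *fmt, ...)
{
  struct strbuf *b = stream;
  size_t room = sizeof b->text - b->len;   /* always >= 1 */
  va_list ap;
  int n;

  va_start (ap, fmt);
  n = vsnprintf (b->text + b->len, room, fmt, ap);
  va_end (ap);
  if (n > 0)
    b->len += ((size_t) n < room) ? (size_t) n : room - 1;
  return n;
}
```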
2076 @subsection Simplification macros
2078 To simplify @file{.cpu} files several pmacros are provided.
2080 @anchor{a-define-normal-operand}
The @code{define-normal-operand} pmacro (with alias @code{dno})
2084 takes a fixed set of positional arguments for the typical operand.
There is also the @code{dnop} pmacro, which is an alias of @code{dno}.
2088 The syntax of @code{dno} is:
2090 @code{(dno name comment attrs type index)}
2095 (dno sr "source register" () h-gr f-r2)
This defines an operand named @samp{sr} that is an @samp{h-gr} register
2099 indexed by the @samp{f-r2} ifield.
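The effect of such an operand at simulation time can be sketched in C: the ifield value extracted from the instruction selects the element of the register file. The register array, the function names and the assumption that @samp{f-r2} occupies the low four bits of a 16-bit instruction word are all hypothetical; the real field position comes from the ifield definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file standing in for the h-gr hardware element. */
static int32_t h_gr[16];

/* Hypothetical ifield extractor: assumes f-r2 is the low 4 bits of a
   16-bit instruction word. */
static unsigned
extract_f_r2 (uint16_t insn)
{
  return insn & 0xf;
}

/* Reading operand "sr": the f-r2 value selects which h-gr entry is used. */
static int32_t
get_sr (uint16_t insn)
{
  return h_gr[extract_f_r2 (insn)];
}
```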
2101 @node Derived operands
2102 @section Derived operands
2103 @cindex Derived operands
2104 @cindex Operands, instruction
2105 @cindex Operands, derived
Derived operands are an experiment in supporting the addressing modes of
CISC-like architectures. Addressing modes are difficult to support because
they effectively increase the number of instructions in the architecture
by an order of magnitude. Defining all the variants requires something
in addition to the RISC-like architecture support. The theory is that,
since CISC-like instructions are basically ``normal'' instructions with
complex operands, the place to add the necessary support is in the
Two kinds of operands exist to support CISC-like CPUs, and they work
together: ``derived'' operands describe one variant of a complex
argument, and ``anyof'' operands group those variants together.
2120 The syntax for defining derived operands is:
2123 (define-derived-operand
2125 (comment "description")
2126 (attrs attribute-list)
2128 (args arg1-operand-name arg2-operand-name ...)
2130 (base-ifield ifield-name)
2131 (encoding (+ arg1-operand-name arg2-operand-name ...))
2132 (ifield-assertion expression)
2138 @cindex anyof operands
2139 @cindex Operands, anyof
2141 The syntax for defining anyof operands is:
2144 (define-anyof-operand
2146 (comment "description")
2147 (attrs attribute-list)
2149 (base-ifield ifield-name)
2150 (choices derived-operand1-name derived-operand2-name ...)
2156 The name of the mode of the operand.
2160 List of names of operands the derived operand uses.
2161 The operands must already be defined.
2162 The argument operands can be any kind of operand: normal, derived, anyof.
2166 Assembler syntax of the operand.
2168 ??? This part needs more work. Addressing mode specification in assembler
2169 needn't be localized to the vicinity of the operand.
2171 @subsection base-ifield
2173 The name of the instruction field common to all related derived operands.
Here ``related'' means ``used by the same @code{anyof} operand''.
2176 @subsection encoding
2178 The machine encoding of the operand.
2180 @subsection ifield-assertion
2182 An assertion of what values any instruction fields will or will not have
2183 in the containing instruction.
2185 @anchor{ifield-assertion-rtl}
2186 The syntax of the assertion is a restricted subset of RTL.
2187 It may only contain @samp{andif}, @samp{eq}, @samp{ne},
2188 and may only use scalar instruction fields
2189 @footnote{A scalar instruction field is a simple ifield
2190 (not a multi or derived ifield), or a multi-ifield consisting
2191 of only simple ifields.}
2192 and integers as operands.
Furthermore, ifields must be specified as the first operand of
@samp{eq} and @samp{ne}.
As a degenerate case, a single non-zero integer is also supported;
it means the assertion always passes.
2199 In addition, the assertion may also use @samp{member}.
2201 Syntax: @code{(member ifield-name (number-list value1 [value2 ...]))}
2202 @footnote{Like all rtx, the full syntax is
2203 @code{(member [(options)] [member-mode] ifield-name (number-list [(options)] [numlist-mode] value1 [value2 ...]))},
2204 but @samp{options} and @samp{mode} are not really useful here.
2205 @samp{member-mode} is @samp{BI}, since the result is a boolean value.}
2207 The result of @samp{member} is one if the value of the ifield
2208 is a member of the list @code{(value1 [value2 ...])}.
2209 Otherwise the result is zero.
2211 If the result of the assertion is non-zero, the assertion passes.
2212 Otherwise it fails, and the instruction is not selected for that
2213 particular bit pattern.
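Evaluated by hand, an assertion such as @code{(andif (eq f-op1 4) (member f-r2 (number-list 0 1 2)))} behaves like the following C sketch. The @code{struct ifields} holding already-decoded field values is an assumption made for illustration; it is not how CGEN-generated decoders represent fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical already-decoded ifield values of a candidate instruction. */
struct ifields { unsigned f_op1; unsigned f_r2; };

/* (member ifield (number-list v1 v2 ...)): 1 if VALUE is in LIST, else 0. */
static int
member (unsigned value, const unsigned *list, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (list[i] == value)
      return 1;
  return 0;
}

/* C rendering of
     (andif (eq f-op1 4) (member f-r2 (number-list 0 1 2)))
   && short-circuits like andif and yields 1 or 0.
   A non-zero result means the assertion passes. */
static int
assertion_passes (const struct ifields *f)
{
  static const unsigned allowed[] = { 0, 1, 2 };

  return f->f_op1 == 4
         && member (f->f_r2, allowed, sizeof allowed / sizeof allowed[0]);
}
```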
2217 RTL expression to get the value of the operand.
All operands referred to must be specified in @code{args}.
2222 RTL expression to set the value of the operand.
All operands referred to must be specified in @code{args}.
2224 Use @code{newval} to refer to the value to be set.
2228 For anyof operands, the names of the derived operands.
The operand may be ``any of'' the specified choices.
2232 @section Instructions
2233 @cindex Instructions
2235 Each instruction in the instruction set has an entry in the description
2237 @footnote{For complicated instruction sets this is a lot of typing. However,
macros can reduce a lot of that typing. The real question is: given the
amount of information that must be expressed, how succinctly can it be expressed
while remaining clean and usable? I'm open to opinions on how to improve
2241 this, but such improvements must take everything CGEN wishes to be into
2243 (*Note: Of course no claim is made that the current design is the
2244 be-all and end-all or that there is one be-all and end-all.)}
2246 Instructions must be uniquely named within an instruction set,
2247 but different instruction sets (ISAs) may have instructions with the same name.
2249 The syntax for defining an instruction is:
2254 (comment "description")
2255 (attrs attribute-list)
2256 (syntax "assembler syntax")
2257 (format (+ field-list))
2258 (ifield-assertion expression)
2259 (semantics expression)
2260 (timing timing-data)
2264 The required elements are: @code{name}, ???.
2266 Instructions specific to a particular cpu variant are denoted as such with
2269 Possible additions for the future:
2272 @item a field to describe a final constraint for determining a match
2273 @item choosing the output from a set of choices
2278 A list of attributes, for which there are several predefined instruction
A bitset attribute used to specify which machines support this
instruction. Do not specify the MACH attribute if the value is for all
2287 Usage: @code{(MACH mach1,mach2,...)}
2289 There must be no spaces in ``@code{mach1,mach2,...}''.
2292 The instruction is an unconditional ``control transfer instruction''.
(*Note: This attribute is derived from the semantic code. However, if the
computed value is ever wrong, it can be overridden by specifying the
attribute explicitly.)
The instruction is a conditional ``control transfer instruction''.
(*Note: This attribute is derived from the semantic code. However, if the
computed value is ever wrong, it can be overridden by specifying the
attribute explicitly.)
2306 The instruction can cause one or more insns to be skipped. This is
2307 derived from the semantic code.
2310 The instruction has one or more delay slots. This is derived from the
The instruction has one or more identical variants. The assembler tries
this one first, and then the relaxation phase switches to larger ones as
2319 The instruction is a non-minimal variant of a relaxable instruction. It
2320 is avoided by the assembler in the first pass.
2323 Internal attribute set for macro-instructions that are an alias for one
2327 For macro-instructions, don't use during disassembly.
2330 @anchor{assembler-syntax}
This is a character string consisting of raw characters and operands.
Operands are denoted by @code{$operand} or
2335 @code{$@{operand@}}. The @code{$@{operand@}} form is required if
2336 the operand name contains non-alphanumeric characters.
2337 @c ??? Technically, '_' and '@' are ok too, I think, but do we want that?
2338 If a @samp{$} is required in the syntax, it is specified with @samp{\$}.
2339 If a @samp{\} is required in the syntax, it is specified with @samp{\\}.
At most one white-space character may be
present, and it must be a blank separating the instruction mnemonic from
the operands. This does not restrict the user's assembler; it is
@c Is this reasonable?
just a description file restriction to separate the mnemonic from the
operands@footnote{The restriction can be relaxed by saying the first
blank is the one that separates the mnemonic from its operands.}.
Note that the assembler accepts multiple spaces after the mnemonic
and between operands, as expected.
2351 Operands can refer to registers, constants, and whatever else is necessary.
2353 Instruction mnemonics can take operands. For example, on the SPARC a
2354 branch instruction can take @code{,a} as an argument to indicate the
2355 instruction is being annulled (e.g. @code{bge$a $disp22}).
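To make these rules concrete, here is a small hypothetical C routine that renders a syntax string, replacing each operand reference with @samp{<name>} and resolving @samp{\$} and @samp{\\}. It only sketches the rules described above and is not CGEN's own parser; the function name and the error convention are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Rewrite syntax string SYN into OUT, turning $name and ${name} into
   <name> and \$ / \\ into literal $ / \.  Returns 0 on success, -1 on a
   malformed reference or if OUT (OUTSZ bytes, > 0) is too small. */
static int
render_syntax (const char *syn, char *out, size_t outsz)
{
  size_t n = 0;

  assert (outsz > 0);
#define PUT(c) do { if (n + 1 >= outsz) return -1; out[n++] = (c); } while (0)
  while (*syn)
    {
      if (syn[0] == '\\' && (syn[1] == '$' || syn[1] == '\\'))
        {
          PUT (syn[1]);                 /* escaped literal $ or \ */
          syn += 2;
        }
      else if (syn[0] == '$')
        {
          const char *name = syn + 1, *end;

          if (*name == '{')
            {
              end = strchr (++name, '}');
              if (end == NULL || end == name)
                return -1;              /* unterminated or empty ${} */
              syn = end + 1;
            }
          else
            {
              for (end = name; isalnum ((unsigned char) *end); end++)
                ;                       /* $name: alphanumeric characters only */
              if (end == name)
                return -1;              /* '$' not followed by a name */
              syn = end;
            }
          PUT ('<');
          for (; name < end; name++)
            PUT (*name);
          PUT ('>');
        }
      else
        PUT (*syn++);                   /* raw character */
    }
#undef PUT
  out[n] = '\0';
  return 0;
}
```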
2359 This is a complete list of fields that specify the instruction. At
2360 present it must be prefaced with @code{+} to allow for future additions.
Reserved bits must also be specified; gaps are not allowed.
2362 @c Well, actually I think they are and it could certainly be allowed.
2363 @c Question: should they be allowed?
2364 The ordering of the fields is not important.
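The no-gaps rule and the irrelevance of field order can be illustrated with a sketch that assembles an instruction word from fields given in any order and rejects gaps and overlaps. The @code{struct field} descriptor and the numbering of bits from the least significant end are assumptions of this sketch, not CGEN conventions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field descriptor: LENGTH bits starting at bit LSB
   (LSB-0 numbering), holding VALUE. */
struct field { unsigned lsb, length; uint32_t value; };

/* Build a WIDTH-bit (1..32) instruction word from N fields in any order.
   Returns 0 and stores the word in *INSN if the fields cover every bit
   exactly once; returns -1 on a gap, overlap, out-of-range field, or a
   value too wide for its field. */
static int
build_insn (const struct field *fields, size_t n, unsigned width,
            uint32_t *insn)
{
  uint32_t all, seen = 0, word = 0;

  assert (width >= 1 && width <= 32);
  all = width == 32 ? 0xffffffffu : (1u << width) - 1;
  for (size_t i = 0; i < n; i++)
    {
      const struct field *f = &fields[i];
      uint32_t mask;

      if (f->length == 0 || f->lsb + f->length > width)
        return -1;              /* empty or out-of-range field */
      mask = f->length == 32 ? 0xffffffffu : (1u << f->length) - 1;
      if (f->value & ~mask)
        return -1;              /* value does not fit in the field */
      mask <<= f->lsb;
      if (seen & mask)
        return -1;              /* overlapping fields */
      seen |= mask;
      word |= f->value << f->lsb;
    }
  if (seen != all)
    return -1;                  /* gap: some bits (e.g. reserved ones) unspecified */
  *insn = word;
  return 0;
}
```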
2366 Format elements can be any of:
2369 @item an instruction field name with an integer, e.g. @code{(f-op1 4)}
2370 @item an instruction field name with an enum, e.g. @code{(f-op1 OP1_4)}
2371 @item an instruction field enum, e.g. @code{OP1_4}
2372 @item an operand name, e.g. @code{dr}
2375 @subsection ifield-assertion
2377 This is an expression with a boolean result that is run as the final
2378 part of instruction decoding to verify a match.
2380 The syntax of the assertion is a restricted subset of RTL.
2381 @xref{ifield-assertion-rtl}.
2383 @subsection semantics
2386 This field provides a mathematical description of what the instruction
2387 does. Its syntax is GCC RTL-like on purpose since GCC's RTL is well
2388 known by the intended audience. However, it is not intended that it be
Some instructions are difficult, if not impossible, to describe this way
(e.g. I/O instructions). Rather than create a new semantic function for
each quirky operation, CGEN provides escape hatches to C that handle all
such cases: the @code{c-code}, @code{c-call} and @code{c-raw-call}
semantic functions invoke C code to perform the
operation. @xref{Expressions}.
2402 A list of entries for each function unit the instruction uses on each machine
2403 that supports the instruction. The default function unit is the u-exec unit.
2408 (model-name (unit name (direction unit-var-name1 insn-operand-name1)
2409 (direction unit-var-name2 insn-operand-name2)
2411 (cycles cycle-count))
The direction/unit-var-name/insn-operand-name mappings are optional.
2415 They map unit inputs/outputs to semantic elements. The
2416 direction specifier can be @code{in} or @code{out} mapping the
2417 name of a unit input or output, respectively, to an insn
2420 @code{cycles} overrides the @code{done} value (latency) of the function
2421 unit and is optional.
2423 @subsection Simplification macros
2425 To simplify @file{.cpu} files several pmacros are provided.
2427 @anchor{a-define-normal-insn}
2429 The @code{define-normal-insn} pmacro (with alias @code{dni})
2430 takes a fixed set of positional arguments for the typical instruction.
2432 The syntax of @code{dni} is:
2434 @code{(dni name comment attrs syntax format semantics timing)}
2439 (dni addi "add 8 bit signed immediate"
2443 (set dr (add dr simm8))
2448 @node Macro-instructions
2449 @section Macro-instructions
2450 @cindex Macro-instructions
2451 @cindex Instructions, macro
2453 Macro-instructions are for the assembler side of things and are not used
2456 Macro-instructions must be uniquely named within an instruction set,
2457 but different instruction sets (ISAs) may have macro-instructions
2460 The syntax for defining a macro-instruction is:
2464 (name macro-insn-name)
2465 (comment "description")
2466 (attrs attribute-list)
2467 (syntax "assembler syntax")
2468 (expansions expansion-spec)
Syntax of the macro-instruction. This has the same form as the
@code{syntax} field in @code{define-insn}.
2477 @subsection expansions
2479 An expression to emit code for the instruction. This is intended to be
2480 general in nature, allowing tests to be done at runtime that choose the
2481 form of the expansion. Currently the only supported form is:
2483 @code{(emit insn arg1 arg2 ...)}
2485 where @code{insn} is the name of an instruction defined with
2486 @code{define-insn} and @emph{argn} is the set of operands to
2487 @code{insn}'s syntax. Each argument is mapped in order to one operand
2488 in @code{insn}'s syntax and may be any of:
2491 @item operand specified in @code{syntax}
2492 @item @code{(operand value)}
2495 @subsection Simplification macros
2497 To simplify @file{.cpu} files several pmacros are provided.
2499 @anchor{a-define-normal-macro-insn}
The @code{define-normal-macro-insn} pmacro (with alias @code{dnmi})
2502 takes a fixed set of positional arguments for the typical macro-instruction.
2504 The syntax of @code{dnmi} is:
2506 @code{(dnmi name comment attrs syntax expansion)}
2511 (dni st-minus "st-" ()
2513 (+ OP1_2 OP2_7 src1 src2)
2514 (sequence ((WI new-src2))
2515 (set new-src2 (sub src2 (const 4)))
2516 (set (mem WI new-src2) src1)
2517 (set src2 new-src2))
2523 (dnmi push "push" ()
2525 (emit st-minus src1 (src2 15)) ; "st %0,@-sp"
2529 In this example, the @code{st-minus} instruction is a general
2530 store-and-decrement instruction and @code{push} is a specialized version
2531 of it that uses the stack pointer.
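In simulator terms, the @code{st-minus} semantics and the @code{push} expansion amount to something like the following C sketch. The register array, the small word-addressed memory and the function names are hypothetical. Note that @samp{src1} is read before @samp{src2} is updated, matching the order of the @code{set}s in the @code{sequence}.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical machine state: 16 general registers (r15 is the stack
   pointer, as in the push example) and a small word-addressed memory. */
static uint32_t gr[16];
static uint32_t mem[64];   /* mem[a / 4] holds the word at byte address a */

/* st-minus: pre-decrement SRC2 by 4, store SRC1 there, update SRC2. */
static void
st_minus (unsigned src1, unsigned src2)
{
  uint32_t new_src2 = gr[src2] - 4;        /* (set new-src2 (sub src2 (const 4))) */

  assert (new_src2 / 4 < 64);              /* keep the sketch's memory in bounds */
  mem[new_src2 / 4] = gr[src1];            /* (set (mem WI new-src2) src1) */
  gr[src2] = new_src2;                     /* (set src2 new-src2) */
}

/* push: (emit st-minus src1 (src2 15)) */
static void
push (unsigned src1)
{
  st_minus (src1, 15);
}
```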
2537 Modes provide a simple and succinct way of specifying data types.
(*Note: Should more complex types be needed (e.g. structs or unions),
these can be handled by extending the definition of a mode to encompass them.)
2541 @c Also, have registers as just bits and have the operand / semantic operation
2542 @c provide the mode.
2544 Modes are similar to their usage in GCC, but there are some differences:
2547 @item modes for boolean values (i.e. bits) are also supported as they are
2549 @item integer modes exist in signed and unsigned versions
2550 @item constants have modes
2553 Currently supported modes are:
Indicates that the default mode is wanted; its value depends on context.
2561 This is a pseudo-mode and never appears in generated code.
2569 QI is an 8 bit quantity ("quarter int").
2570 HI is a 16 bit quantity ("half int").
2571 SI is a 32 bit quantity ("single int").
2572 DI is a 64 bit quantity ("double int").
2574 In cases where signedness matters, these modes are signed.
2576 @item UQI,UHI,USI,UDI
2577 Unsigned versions of QI,HI,SI,DI.
2579 These modes do not appear in semantic RTL. Instead, the RTL function
2580 specifies the signedness of its operands where necessary.
To a CPU, a 32 bit register is simply a 32 bit register, and the same
holds for a 32 bit quantity in memory. Signedness only comes into play
in how the value is subsequently used or interpreted, and when it does,
the chip specifies it explicitly in the operation, @emph{not} in the data.
Therefore, from this perspective, unsigned modes do not belong in
@file{.cpu} files. This is the perspective to use when writing
@file{.cpu} files.
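The following C sketch illustrates this point: the data are plain bit patterns held in unsigned C types, and the operation (sign versus zero extension, signed versus unsigned comparison) decides how they are interpreted. The function names are hypothetical stand-ins for the @code{ext}, @code{zext}, @code{trunc}, @code{lt} and @code{ltu} rtxs described later in this chapter, assuming a 32-bit two's-complement @code{SI}.

```c
#include <assert.h>
#include <stdint.h>

/* (ext SI x): sign-extend a QI bit pattern to SI. */
static uint32_t
ext_qi_si (uint8_t x)
{
  return (x & 0x80) ? (uint32_t) x | 0xffffff00u : x;
}

/* (zext SI x): zero-extend a QI bit pattern to SI. */
static uint32_t
zext_qi_si (uint8_t x)
{
  return x;
}

/* (trunc QI x): keep the low 8 bits. */
static uint8_t
trunc_si_qi (uint32_t x)
{
  return (uint8_t) x;
}

/* (lt SI a b): signed comparison of two bit patterns.  Flipping the sign
   bit maps two's-complement order onto unsigned order, avoiding
   implementation-defined conversions to int32_t. */
static int
lt_si (uint32_t a, uint32_t b)
{
  return (a ^ 0x80000000u) < (b ^ 0x80000000u);
}

/* (ltu SI a b): unsigned comparison of the same bit patterns. */
static int
ltu_si (uint32_t a, uint32_t b)
{
  return a < b;
}
```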
2590 @c I'm not entirely sure these unsigned modes are needed.
2591 @c They are useful in removing any ambiguity in how to sign extend constants
2592 @c which has been a source of problems in GCC.
2593 @c OTOH, maybe adding uconst akin to const is the way to go?
2595 @c ?? Some existing ports use these modes.
2598 word int, unsigned word int (word_mode in gcc).
2599 These are aliases for the real mode, typically either @code{SI} or @code{DI}.
2604 SF is a 32 bit IEEE float ("single float").
2605 DF is a 64 bit IEEE float ("double float").
2606 XF is either an 80 or 96 bit IEEE float ("extended float").
(*Note: XF values on m68k and i386 differ, so it may be worth giving
them different names.)
2609 TF is a 128 bit IEEE float.
2615 Instruction address integer
2618 Varying width int/unsigned-int. The width is specified by context,
2619 usually in an instruction field definition.
2624 @section Expressions
2627 The syntax of CGEN's RTL expressions (or @emph{rtx}) basically follows that of
2630 The handling of modes is different to simplify the implementation.
2631 Implementation shouldn't necessarily drive design, but it was a useful
2632 simplification. Still, it needs to be reviewed. The difference is that
2633 in GCC @code{(function:MODE arg1 ...)} is written in CGEN as
2634 @code{(function MODE arg1 ...)}. Note the space after @samp{function}.
2636 GCC RTL allows flags to be recorded with RTL (e.g. MEM_VOLATILE_P).
2637 This is supported in CGEN RTL by prefixing each RTL function's arguments
2638 with an optional list of modifiers:
2639 @code{(function (#:mod1 #:mod2) MODE arg1 ...)}.
The list is a set of modifier names prefixed with @samp{#:}. They can take
2642 ??? Modifiers are supported by the RTL traversing code, but no use is
2645 The mode may be elided if it can be deduced from the operands.
2646 For example, while the full form of @code{add} is
2647 @samp{(add () MODE arg1 arg2)},
2648 it may be written as @samp{(add arg1 arg2)}, with the mode being
2649 taken from the mode of @samp{arg1}.
2650 The fully specified version is called the ``canonical'' form.
2652 The currently defined semantic functions are:
2655 @item (set mode destination source)
Assign @samp{source} to @samp{destination} referenced in mode @samp{mode}.
2658 @item (set-quiet mode destination source)
2659 Assign @samp{source} to @samp{destination} referenced in mode
2660 @samp{mode}, but do not print any tracing message.
2662 @item (reg mode hw-name [index])
2663 Return an `operand' of hardware element @samp{hw-name} in mode @samp{mode}.
2664 If @samp{hw-name} is an array, @samp{index} selects which register.
2666 @item (raw-reg mode hw-name [index])
2667 Return an `operand' of hardware element @samp{hw-name} in mode @samp{mode},
2668 bypassing any @code{get} or @code{set} specs of the register.
2669 If @samp{hw-name} is an array, @samp{index} selects which register.
2670 This cannot be used with virtual registers (those specified with the
2671 @samp{VIRTUAL} attribute).
@code{raw-reg} is most often used in @code{get} and @code{set} specs
of a register: if it weren't, read and write operations would infinitely
2677 @item (mem mode address)
2678 Return an `operand' of memory referenced at @samp{address} in mode
2681 @item (const mode value)
2682 Return an `operand' of constant @samp{value} in mode @samp{mode}.
2684 @item (enum mode value-name)
2685 Return an `operand' of constant @samp{value-name} in mode @samp{mode}.
2686 The value must be from a previously defined enum.
2688 @item (subword mode value word-num)
2689 Return part of @samp{value}. Which part is determined by @samp{mode} and
2690 @samp{word-num}. There are three cases.
2691 @c Blech. ``subword'' is a source of confusion in GCC.
2692 @c Maybe have three separate rtxs.
2694 If @samp{mode} is the same size as the mode of @samp{value}, @samp{word-num}
2695 must be @samp{0} and the result is @samp{value} recast in the new mode.
2696 There is no change in the bits of @samp{value}, they're just interpreted in a
2697 possibly different mode. This is most often used to interpret an integer
2698 value as a float and vice versa.
2700 If @samp{mode} is smaller than the mode of @samp{value}, @samp{value} is
2701 divided into N pieces and @samp{word-num} picks which piece.
2702 All pieces have the size of @samp{mode} except possibly the last.
2703 If the last piece has a different size, it cannot be referenced.
2704 Word number 0 is the most significant word, regardless of endianness.
2706 If @samp{mode} is larger than the mode of @samp{value}, @samp{value} is
interpreted in the larger mode, with the additional most significant bits treated
as garbage (their value is assumed to be unimportant to the context in which
2709 the value will be used).
2710 @samp{word-num} must be @samp{0}.
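For a @code{DI} value split into @code{SI} pieces, and for a same-size reinterpretation, the first two cases can be sketched in C as follows. The function names are hypothetical, and the sketch assumes @code{float} is a 32-bit IEEE single like @code{SF}; @code{memcpy} is used because it is the portable way to reinterpret bits in C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* (subword SI value word-num) for a DI value: two 32-bit pieces, and
   word 0 is the most significant regardless of target endianness. */
static uint32_t
subword_si_of_di (uint64_t value, unsigned word_num)
{
  assert (word_num < 2);
  return (uint32_t) (value >> (32 * (1 - word_num)));
}

/* (subword SF value 0) for an SI value: same size, so the bits are
   simply reinterpreted in the new mode. */
static float
subword_sf_of_si (uint32_t value)
{
  float f;

  assert (sizeof f == sizeof value);
  memcpy (&f, &value, sizeof f);
  return f;
}
```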
2712 @item (join out-mode in-mode arg1 . arg-rest)
2713 Concatenate @samp{arg1[,arg2[,...]]} to create a value of mode @samp{out-mode}.
2714 @samp{arg1} becomes the most significant part of the result.
2715 Each argument is interpreted in mode @samp{in-mode}.
2716 @samp{in-mode} must evenly divide @samp{out-mode}.
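A C sketch of the concatenation for results of at most 64 bits follows; the function name and the passing of the arguments as an array are assumptions. The assertion encodes the requirement that @samp{in-mode} evenly divide @samp{out-mode}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* (join out-mode in-mode arg1 ...): concatenate N pieces of IN_BITS each
   into an OUT_BITS result, ARGS[0] in the most significant position. */
static uint64_t
join_bits (const uint64_t *args, size_t n, unsigned in_bits,
           unsigned out_bits)
{
  uint64_t result = 0, mask;

  assert (in_bits > 0 && out_bits <= 64 && in_bits * n == out_bits);
  mask = in_bits == 64 ? ~(uint64_t) 0 : ((uint64_t) 1 << in_bits) - 1;
  for (size_t i = 0; i < n; i++)
    /* Shift earlier (more significant) pieces up, then append this one. */
    result = (in_bits == 64 ? 0 : result << in_bits) | (args[i] & mask);
  return result;
}
```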
2718 @item (sequence mode ((mode1 local1) ...) expr1 ...)
2719 Execute @samp{expr1}, @samp{expr2}, etc. sequentially.
2720 At least one expression must be specified, even if the result
2721 mode is @samp{VOID}.
2723 The result, if non-void-mode, is the value of the last expression.
2725 @samp{mode} is the mode of the result.
2726 If @samp{mode} is elided it is set to @samp{VOID} (void mode).
2728 `@code{((mode1 local1) ...)}' is a set of local variables.
2730 @item (parallel mode empty expr1 ...)
2731 Execute @samp{expr1}, @samp{expr2}, etc. in parallel. All inputs are
2732 read before any output is written.
2733 At least one expression must be specified.
2735 @samp{empty} must be @samp{()} and
2736 is present for consistency with @samp{sequence}.
2738 @samp{mode} must be @samp{VOID} (void mode), or it can be elided.
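The difference between @code{sequence} and @code{parallel} shows up when one expression reads what another writes. In the sketch below (hypothetical function names, registers modelled as C variables), @code{(set r1 r2)} followed by @code{(set r2 r1)} copies in a @code{sequence} but swaps in a @code{parallel}.

```c
#include <assert.h>
#include <stdint.h>

/* (sequence () (set r1 r2) (set r2 r1)): each set sees earlier writes,
   so both registers end up holding the old r2. */
static void
copy_sequence (uint32_t *r1, uint32_t *r2)
{
  *r1 = *r2;
  *r2 = *r1;
}

/* (parallel () (set r1 r2) (set r2 r1)): all inputs are read before any
   output is written, which swaps the two registers. */
static void
swap_parallel (uint32_t *r1, uint32_t *r2)
{
  uint32_t in_r1 = *r1, in_r2 = *r2;   /* read phase */

  *r1 = in_r2;                         /* write phase */
  *r2 = in_r1;
}
```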
2740 @item (do-count mode iteration-variable number-of-iterations expr1 ...)
2741 This is a simple looping operation.
2742 Execute @samp{expr1}, @samp{expr2}, etc. the specified number of times.
2743 At least one expression must be specified.
2745 @samp{iteration-variable} will contain the iteration number and is
2746 available for use in expressions. It has mode @samp{INT}.
Its value will be 0 ... @samp{number-of-iterations} - 1.
2749 @samp{number-of-iterations} is an rtl expression of mode INT
2750 (or a compatible mode). It is computed once and may not be modified
2753 @samp{mode} must be @samp{VOID} (void mode), or it can be elided.
2755 @item (unop mode operand)
2756 Perform a unary arithmetic operation.
2758 @samp{unop} is one of @code{neg},
2759 @code{abs}, @code{inv}, @code{not}, @code{zflag}, @code{nflag}.
2760 @code{zflag} returns a bit indicating if @samp{operand} is
2761 zero. @code{nflag} returns a bit indicating if @samp{operand} is
2762 negative. @code{inv} returns the bitwise complement of @samp{operand},
2763 whereas @code{not} returns its logical negation.
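For an @code{SI} operand held as a 32-bit two's-complement bit pattern, these operations correspond roughly to the following C helpers (hypothetical names, for illustration only). Note in particular the difference between @code{inv} and @code{not}.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t op_neg (uint32_t x)   { return 0u - x; }          /* (neg SI x) */
static uint32_t op_abs (uint32_t x)   { return (x >> 31) ? 0u - x : x; } /* (abs SI x) */
static uint32_t op_inv (uint32_t x)   { return ~x; }              /* bitwise complement */
static uint32_t op_not (uint32_t x)   { return x == 0; }          /* logical negation */
static int      op_zflag (uint32_t x) { return x == 0; }          /* is x zero? */
static int      op_nflag (uint32_t x) { return (x >> 31) & 1; }   /* sign bit set? */
```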
2765 @item (binop mode operand1 operand2)
2766 Perform a binary arithmetic operation.
2768 @samp{binop} is one of
2769 @code{add}, @code{sub}, @code{and}, @code{or}, @code{xor}, @code{mul},
2770 @code{div}, @code{udiv}, @code{mod}, @code{umod}.
2772 @item (binop-with-bit mode operand1 operand2 operand3)
2773 Same as @samp{binop}, except taking 3 operands. The third operand is
2774 always a single bit.
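For the addition members of this family (@code{addc}, @code{addc-cflag} and @code{addc-oflag}, listed below), the third operand is a carry-in bit. The sketch below shows what they compute in mode @code{SI}, assuming 32-bit two's-complement arithmetic; the C function names are hypothetical, and this is not the code CGEN generates.

```c
#include <assert.h>
#include <stdint.h>

/* (addc SI a b c): a + b + c, where c is a single carry-in bit. */
static uint32_t
addc (uint32_t a, uint32_t b, unsigned c)
{
  assert (c <= 1);
  return a + b + c;
}

/* (addc-cflag SI a b c): carry out of bit 31 of a + b + c. */
static unsigned
addc_cflag (uint32_t a, uint32_t b, unsigned c)
{
  uint64_t wide = (uint64_t) a + b + c;

  assert (c <= 1);
  return (unsigned) (wide >> 32);
}

/* (addc-oflag SI a b c): signed overflow, i.e. both inputs have the same
   sign and the sign of the result differs from it. */
static unsigned
addc_oflag (uint32_t a, uint32_t b, unsigned c)
{
  uint32_t r = a + b + c;

  assert (c <= 1);
  return ((~(a ^ b) & (a ^ r)) >> 31) & 1;
}
```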
2776 @samp{binop-with-bit} is one of @code{addc},
2777 @code{addc-cflag}, @code{addc-oflag}, @code{subc}, @code{subc-cflag},
2780 Note: The following are deprecated:
2783 @item @code{add-cflag}, replaced with @code{addc-cflag}
2784 @item @code{add-oflag}, replaced with @code{addc-oflag}
2785 @item @code{sub-cflag}, replaced with @code{subc-cflag}
@item @code{sub-oflag}, replaced with @code{subc-oflag}
2789 @item (shiftop mode operand1 operand2)
2790 Perform a shift operation.
2791 @samp{operand1} is shifted (or rotated) by the amount specified
2794 @samp{shiftop} is one of @code{sll}, @code{srl}, @code{sra},
2795 @code{ror}, @code{rol}.
2797 @samp{mode} must match the mode of @samp{operand1}.
2798 The mode of @samp{operand1} may be any integral mode.
2799 The mode of @samp{operand2} may be any integral mode, and need not match
2800 the mode of @samp{operand1}.
2802 It is an error if @samp{operand2} is negative or greater than
2803 or equal to the size of @samp{operand1}.
2804 If the architecture handles negative or large shift amounts,
2805 that needs to be handled in the surrounding RTL.
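A C sketch of the @code{SI} shift and rotate operations follows; the function names are hypothetical. The assertions enforce the 0 to 31 range required above, and @code{cpu_sll_masked} shows one way surrounding RTL might model a hypothetical CPU that uses only the low five bits of the shift count, i.e. @code{(sll SI x (and SI n 31))}.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t
op_sll (uint32_t x, unsigned n)
{
  assert (n < 32);
  return x << n;
}

static uint32_t
op_srl (uint32_t x, unsigned n)
{
  assert (n < 32);
  return x >> n;
}

/* Arithmetic right shift without relying on C's implementation-defined
   >> of negative signed values: refill the vacated top bits with ones. */
static uint32_t
op_sra (uint32_t x, unsigned n)
{
  assert (n < 32);
  if ((x >> 31) && n > 0)
    return (x >> n) | ~(0xffffffffu >> n);
  return x >> n;
}

static uint32_t
op_rol (uint32_t x, unsigned n)
{
  assert (n < 32);
  return n == 0 ? x : (x << n) | (x >> (32 - n));
}

static uint32_t
op_ror (uint32_t x, unsigned n)
{
  assert (n < 32);
  return n == 0 ? x : (x >> n) | (x << (32 - n));
}

/* Hypothetical CPU that masks the count to 5 bits before shifting. */
static uint32_t
cpu_sll_masked (uint32_t x, uint32_t n)
{
  return op_sll (x, n & 31);
}
```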
2807 @item (andif mode operand1 operand2)
2808 Evaluate @samp{operand1}.
2809 If it evaluates to zero the result is zero,
2810 and @samp{operand2} is not evaluated.
2811 If @samp{operand1} evaluates to non-zero, then evaluate @samp{operand2}.
2812 If it evaluates to non-zero the result is one,
2813 otherwise the result is zero.
2815 The mode of the result is @samp{BI}.
2816 @samp{mode} is generally elided or is @samp{BI}.
2818 @item (orif mode operand1 operand2)
2819 Evaluate @samp{operand1}.
2820 If it evaluates to non-zero the result is one,
2821 and @samp{operand2} is not evaluated.
2822 If @samp{operand1} evaluates to zero, then evaluate @samp{operand2}.
2823 If it evaluates to non-zero the result is one,
2824 otherwise the result is zero.
2826 The mode of the result is @samp{BI}.
2827 @samp{mode} is generally elided or is @samp{BI}.
2829 @item (integer-convop mode operand)
2830 Perform an integer mode->mode conversion operation.
2832 @samp{integer-convop} is one of:
2836 Sign-extend @samp{operand}, which must have an integer mode
2837 narrower than @samp{mode}, which also must be an integer mode.
2839 Zero-extend @samp{operand}, which must have an integer mode
2840 narrower than @samp{mode}, which also must be an integer mode.
2842 Truncate @samp{operand}, which must have an integer mode
2843 wider than @samp{mode}, which also must be an integer mode.
2846 @item (float-convop mode how operand)
2847 Perform a mode->mode conversion operation involving a floating point value.
2849 Conversions involving floating point values need to specify
2850 how things like truncation will be performed, e.g., the rounding mode.
2851 @samp{how} is an rtx of mode @samp{INT} that specifies how the conversion
2852 will be performed. The interpretation of @samp{how} is architecture-dependent,
2853 except that a value of zero has a specific meaning:
2854 If a particular floating-point conversion can only be done one way,
2855 or if the conversion is to be done the ``default'' way, specify zero
What ``the default way'' means is application-dependent.
2859 @samp{float-convop} is one of:
2863 Extend @samp{operand}, which must have a floating point mode
2864 narrower than @samp{mode}, which also must be a floating point mode.
2866 Truncate @samp{operand}, which must have a floating point mode
2867 wider than @samp{mode}, which also must be a floating point mode.
2869 Convert @samp{operand}, which must have an integer mode,
2870 to a floating point value of mode @samp{mode}.
2871 @samp{operand} is treated as a signed integer.
2873 Convert @samp{operand}, which must have an integer mode,
2874 to a floating point value of mode @samp{mode}.
2875 @samp{operand} is treated as an unsigned integer.
2877 Convert @samp{operand}, which must have a floating point mode,
2878 to a signed integer of mode @samp{mode}.
2880 Convert @samp{operand}, which must have a floating point mode,
2881 to an unsigned integer of mode @samp{mode}.
2884 An enum is defined that specifies several predefined rounding modes.
2889 (comment "builtin floating point conversion kinds")
2890 (attrs VIRTUAL) ;; let app provide def'n instead of each cpu's desc.h
2892 (values ((DEFAULT 0)
2897 (TOWARD-NEGATIVE 5)))
2901 @item (cmpop mode operand1 operand2)
2902 Perform a comparison.
2904 @samp{cmpop} is one of @code{eq}, @code{ne},
2905 @code{lt}, @code{le}, @code{gt}, @code{ge}, @code{ltu}, @code{leu},
2906 @code{gtu}, @code{geu}.
2907 @c floating point compare-unordered?
2909 If the comparison succeeds the result is one,
2910 otherwise the result is zero.
2911 The mode of the result is @samp{BI}.
2913 @item (mathop mode operand)
2914 Perform a mathematical operation.
2916 @samp{mathop} is one of @code{sqrt}, @code{cos}, @code{sin}.
2918 @item (*nan mode operand)
2919 Return a boolean indicating if @samp{operand} is a NaN.
2920 @samp{mode} must be a floating point mode.
2921 There are three versions.
2925 Test whether @samp{operand} is any kind of NaN.
2926 @item (qnan operand)
2927 Test whether @samp{operand} is a quiet NaN.
2928 @item (snan operand)
2929 Test whether @samp{operand} is a signalling NaN.
2932 @item (if mode condition then [else])
2933 Standard @code{if} statement.
2935 @samp{condition} is any arithmetic expression.
2936 If the value is non-zero the @samp{then} part is executed.
2937 Otherwise, the @samp{else} part is executed (if present).
2939 @samp{mode} is the mode of the result, not of @samp{condition}.
2940 If @samp{mode} is not @code{VOID} (void mode), @samp{else} must be present.
When the result is used, @samp{mode} must be specified and must not be @code{VOID}.
2943 @item (cond mode (condition1 expr1a ...) (...) [(else exprNa...)])
2944 From Scheme: keep testing conditions until one succeeds, and then
2945 process the associated expressions.
2947 @item (case mode test ((case1 ..) expr1a ..) (..) [(else exprNa ..)])
2948 From Scheme: Compare @samp{test} with @samp{case1}, @samp{case2},
2949 etc. and process the associated expressions.
2951 @item (c-code mode "C expression")
An escape hook to insert arbitrary C code. @samp{mode} must be
compatible with the result of ``C expression''.
2955 @item (c-call mode symbol operand1 operand2 ...)
2956 An escape hook to emit a subroutine call to function named @samp{symbol}
2957 passing operands @samp{operand1}, @samp{operand2}, etc. An implicit
2958 first argument of @code{current_cpu} is passed to @samp{symbol}.
2959 @samp{mode} is the mode of the result. Be aware that @samp{symbol} will
2960 be restricted by reserved words in the C programming language and by
2961 existing symbols in the generated code.
2963 @item (c-raw-call mode symbol operand1 operand2 ...)
Same as @code{c-call}, except there is no implicit @code{current_cpu}
2966 @samp{mode} is the mode of the result.
2968 @item (clobber mode object)
2969 Indicate that @samp{object} is written in mode @samp{mode}, without
2970 saying how. This could be useful in conjunction with the C escape hooks.
2972 @item (delay mode num expr)
2973 Indicate that there are @samp{num} delay slots in the processing of
2974 @samp{expr}. When using this rtx in instruction semantics, CGEN will
2975 infer that the instruction has the DELAY-SLOT attribute.
2977 @item (delay num expr)
2978 In older "sim" simulators, indicates that there are @samp{num} delay
2979 slots in the processing of @samp{expr}. When using this rtx in instruction
2980 semantics, CGEN will infer that the instruction has the DELAY-SLOT
2983 In newer "sid" simulators, evaluates to the writeback queue for hardware
2984 operand @samp{expr}, at @samp{num} instruction cycles in the
2985 future. @samp{expr} @emph{must} be a hardware operand in this case.
For example, @code{(set (delay 3 pc) (+ pc 1))} will schedule a write to
2988 the @samp{pc} register in the writeback phase of the 3rd instruction
2989 after the current. Alternatively, @code{(set gr1 (delay 3 gr2))} will
2990 immediately update the @samp{gr1} register with the @emph{latest write}
2991 to the @samp{gr2} register scheduled between the present and 3
2992 instructions in the future. @code{(delay 0 ...)} refers to the
2993 writeback phase of the current instruction.
This effect is modeled with a circular buffer of ``write stacks'' for each
hardware element (a register bank gets a single stack). The size of the
circular buffer is calculated from the uses of @code{(delay ...)}
rtxs. When a delayed write occurs, the simulator pushes the write onto
the appropriate write stack in the ``future'' of the circular buffer for
the written-to hardware element. At the end of each instruction cycle,
the simulator executes all writes in all write stacks for the time slice
just ending. When a delayed read (essentially a pipeline bypass) occurs,
the simulator looks in the write stack @samp{num} cycles ahead for a
scheduled write. If it doesn't find one, it
progressively backs off towards the ``current'' instruction cycle's write
stack, and if it still finds no scheduled write, it returns the
current state of the CPU. Thus delayed writes are fast, but delayed
reads can be slower in a simulator with long pipelines and very
large register banks.
@item (annul yes?)
@c FIXME: put annul into the glossary.
Annul the following instruction if @samp{yes?} is non-zero. This rtx is
an experiment and will probably change.
@item (skip yes?)
Skip the next instruction if @samp{yes?} is non-zero. This rtx is
an experiment and will probably change.
@item (symbol name)
Return a symbol with value @samp{name}, for use in attribute
processing. This is equivalent to @samp{quote} in Scheme, but
@samp{quote} sounds too jargonish.
@item (int-attr mode object attr-name)
Return the value of attribute @samp{attr-name} in mode @samp{mode}.
@samp{object} must currently be @samp{(current-insn)}, the current instruction,
or @samp{(current-mach)}, the current machine.
The attribute's value must be representable as an integer.
@item (eq-attr mode object attr-name value)
Return non-zero if the value of attribute @samp{attr-name} of
object @samp{object} is @samp{value}.
@emph{NOTE:} List values of @samp{value} may be changed to allow use of the
@samp{number-list} rtx function.
If @samp{value} is a list, the intent is to return ``true'' if the attribute
has any of the listed values, but this is not implemented yet.
@item (index-of operand)
Return the index of @samp{operand}. For registers this is the register number.
@item (regno operand)
Same as @code{index-of}, but improves readability for registers.
@item (error mode message)
Emit an error message from CGEN RTL. The message text is specified by @samp{message}.
@item (ifield field-name)
Return the value of field @samp{field-name}. @samp{field-name} must be a
field in the instruction.
Operands can be any of:
@item an operand defined in the description file
@item a register reference, created with (reg mode [index])
@item a memory reference, created with (mem mode address)
@item a constant, created with (const mode value)
@item a `sequence' local variable
@item a `do-count' iteration variable
@item another expression
The @samp{symbol} in a @code{c-call} or @code{c-raw-call} function is
currently the name of a C function or macro that is invoked by the
generated semantic code.
@node Macro-expressions
@section Macro-expressions
@cindex Macro-expressions
Macro RTL expressions make it unnecessary to specify a mode for
every expression (and every sub-expression thereof). The formal way
to specify, say, an add is @code{(add SI arg1 arg2)}; if SI is the
default mode of `arg1', this can be written simply as @code{(add arg1 arg2)}.
This gets expanded to @code{(add DFLT arg1 arg2)}, where
@code{DFLT} means ``default mode''.
It might be possible to replace macro expressions with preprocessor macros;
however, there is currently no plan to do this.