cgen/doc/rtl.texi

   1 @c Copyright (C) 2000, 2003, 2009, 2010 Red Hat, Inc.
   2 @c This file is part of the CGEN manual.
   3 @c For copying conditions, see the file cgen.texi.
   4
   5 @node RTL
   6 @chapter CGEN's Register Transfer Language
   7 @cindex RTL
   8 @cindex Register Transfer Language
   9
  10 CGEN uses a variant of GCC's Register Transfer Language as the basis for
  11 its CPU description language.
  12
  13 @menu
  14 * RTL Introduction::            Introduction to CGEN's RTL
  15 * Trade-offs::                  Various trade-offs in the design
  16 * Rules and notes::             Rules and notes common to all entries
  17 * RTL Versions::                Supported versions and differences
  18 * Top level conditionals::      Conditional definitions
  19 * Definitions::                 Definitions in the description file
  20 * Attributes::                  Random data associated with any entry
  21 * Architecture variants::       Specifying variations of a CPU
  22 * Model variants::              Specifying variations of a CPU's implementation
  23 * Hardware elements::           Elements of a CPU
  24 * Instruction fields::          Fields of an instruction
  25 * Enumerated constants::        Assigning useful names to important numbers
  26 * Keywords::                    Like enums, plus string table
  27 * Instruction operands::        Operands of instructions
  28 * Derived operands::            Operands for CISC-like architectures
  29 * Instructions::                Instructions
  30 * Macro-instructions::          Macro instructions
  31 * Modes::                       Operand types in expressions
  32 * Expressions::                 Expressions in the language
  33 * Macro-expressions::           A simplification of arithmetic expressions
  34 @end menu
  35
  36 @node RTL Introduction
  37 @section RTL Introduction
  38
  39 The description language, or RTL
  40 @footnote{While RTL stands for Register Transfer Language, it is also used
  41 to denote the CPU description language as a whole.}, needs to support the
  42 definition of all the
  43 architectural and implementation features of a CPU, as well as enough
  44 information for all intended applications.  At present this is just the
  45 opcodes table and an ISA level simulator, but it is not intended that
  46 applications be restricted to these two areas.  The goal is having an
  47 application independent description of the CPU.  In the end that's a lot to
  48 ask for from one language.  Certainly gate level specification of a CPU
  49 is not attempted!
  50
  51 The syntax of the language is inspired by GCC's RTL and by the Scheme
  52 programming language, theoretically taking the best of both.  To what
  53 extent that is true, and to what extent that is sufficient inspiration
  54 is certainly open to discussion.  In actuality, there isn't much difference
  55 here from GCC's RTL that is attributable to being Scheme-ish.  One
  56 important Scheme-derived concept is arbitrary precision of constants.
  57 Sign or zero extension of constants in GCC has always been a source of
  58 problems.  In CGEN's RTL constants have modes and there are both signed
  59 and unsigned modes.
  60
  61 Here is a graphical layout of the hierarchy of elements of a @file{.cpu}
  62 file.
  63
  64 @example
  65                            architecture
  66                           /            \
  67                     cpu-family1        cpu-family2  ...
  68                       /     \            /      \
  69                 machine1   machine2  machine3   ...
  70                  /   \
  71              model1  model2  ...
  72 @end example
  73
  74 Each of these elements is explained in more detail below.  The
  75 @emph{architecture} is one of @samp{sparc}, @samp{m32r}, etc.  Within
  76 the @samp{sparc} architecture, @emph{cpu-family} might be
  77 @samp{sparc32}, @samp{sparc64}, etc.  Within the @samp{sparc32} CPU
  78 family, the @emph{machine} might be @samp{sparc-v8}, @samp{sparclite},
  79 etc.  Within the @samp{sparc-v8} machine classification, @emph{model}
  80 might be @samp{hypersparc}, @samp{supersparc}, etc.
  81
  82 Instructions form their own hierarchy as each instruction may be supported
  83 by more than one machine.  Also, some architectures can handle more than
  84 one instruction set on one chip (e.g. ARM).
  85
  86 @example
  87                      isa
  88                       |
  89                  instruction
  90                     /   \
  91              operand1  operand2  ...
  92                 |         |
  93          hw1+ifield1   hw2+ifield2  ...
  94 @end example
  95
  96 Each of these elements is explained in more detail below.
  97
  98 @node Trade-offs
  99 @section Trade-offs
 100
 101 While CGEN is written in Scheme, this is not a requirement.  The
 102 description language should be considered absent of any particular
 103 implementation, though certainly some things were done to simplify
 104 reading @file{.cpu} files with Scheme.  Scheme related choices have been
 105 made in areas that have no serious impact on the usefulness of the CPU
 106 description language.  Places where that is not the case need to be
 107 revisited, though there currently are no known ones.
 108
 109 One place where the Scheme implementation influenced the design of
 110 CGEN's RTL is in the handling of modes.  The Scheme implementation was
 111 simplified by treating modes as an explicit argument, rather than as an
 112 optional suffix of the operation name.  For example, compare @code{(add
 113 SI dr sr)} in CGEN versus @code{(add:SI dr sr)} in GCC RTL.  The mode is
 114 treated as optional so a shorthand form of @code{(add dr sr)} works.
 115
 116 @node Rules and notes
 117 @section Rules and notes
 118
 119 A few basic guidelines for all entries:
 120
 121 @itemize @bullet
 122 @item Names must be valid Scheme symbols.
 123 @item Comments are used, for example, to comment the generated C code
 124 @footnote{It is possible to produce a reference manual from
 125 @file{.cpu} files and such an application wouldn't be a bad idea.}.
 126 @item Comments may be any number of lines, though generally succinct comments
 127 are preferable@footnote{It would be reasonable to have a short form
 128 and a long form of comment, either as two entries or as one entry with
 129 the short form separated from the long form via some delimiter (say the
 130 first newline).}.
 131 @item Everything is case sensitive.@footnote{??? This is true in RTL,
 132 though some apps add symbols and convert case that can cause collisions.}
 133 @item While "_" is a valid character to use in symbols, "-" is preferred.
 134 @item Hex numbers are written using Scheme's notation.
 135 Write 255 in hex as #xff, not 0xff.
 136 One can also use #bNNN to write binary values.  E.g. #b111 == 7.
 137 @item Except for the @samp{comment} and @samp{attrs} fields and unless
 138 otherwise specified all fields must be present.
 139 @item Symbols used to be allowed anywhere a string can be used.
 140 This is what earlier versions of Guile supported.
 141 Guile is more strict now, so this relaxation is gone.
 142 The reverse is generally not allowed: strings can't be used in place
 143 of symbols.
 144 @item Use @samp{()} or @samp{#f} to indicate ``not specified'',
 145 unless otherwise specified.  This is not necessary for
 146 @samp{define-foo} elements, one can just elide the entry,
 147 but it is useful for @samp{define-*-foo} that take a fixed number
 148 of arguments.  E.g., @samp{define-normal-ifield}.
 149 Whether to use @samp{()} or @samp{#f} is largely a matter of style.
 150 @end itemize
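
As a brief illustration of the number and ``not specified'' conventions
above, the following sketch uses the positional
@samp{define-normal-ifield} form.  The field name, comment, and bit
positions are hypothetical, and the argument order (name, comment,
attributes, start, length) is an assumption.

@example
; #xff and 255 denote the same value, as do #b111 and 7.
; The attribute list is left unspecified with ().
(define-normal-ifield f-example "example field" () 4 4)
@end example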
 151
 152 @node RTL Versions
 153 @section RTL Versions
 154
 155 CGEN has minimal support for making changes to the language without
 156 breaking existing ports.  We do not put much effort into this because
 157 over time it can become unmaintainable, but for some changes it is
 158 useful to have a temporary window in which older versions are supported.
 159
 160 @menu
 161 * Specifying the RTL version::
 162 * List of supported RTL versions::
 163 @end menu
 164
 165 @node Specifying the RTL version
 166 @subsection Specifying the RTL version
 167
 168 Specify the version of RTL that your cpu description was written to
 169 with @samp{define-rtl-version}.
 170
 171 Syntax:
 172
 173 @example
 174 (define-rtl-version major-version minor-version)
 175 @end example
 176
 177 When setting the RTL version, it must be the first thing done
 178 in the description file or the behavior is undefined.
 179 This includes using or defining pmacros: the RTL version must be set first.
 180 If the RTL version is changed after it has been set, the behavior is undefined.
 181
 182 Note that one can still set it to the same version multiple times.
 183 This is useful when the description is spread among several files,
 184 and one is debugging/testing files individually.
 185
 186 The default RTL version, if @samp{define-rtl-version} is elided, is 0.7.
 187
 188 The latest RTL version is 0.9:
 189
 190 @example
 191 (define-rtl-version 0 9)
 192 @end example
 193
 194 Every increment in major and minor versions is generally non-upward
 195 compatible (otherwise the version would not have been incremented -
 196 CGEN does not keep support for older versions long).
 197
 198 @node List of supported RTL versions
 199 @subsection List of supported RTL versions
 200
 201 CGEN currently supports the following RTL versions.
 202
 203 @itemize @bullet
 204
 205 @item 0.7 @code{(define-rtl-version 0 7)}
 206
 207 This is the original RTL version.
 208 It is the default if no version is specified.
 209 It is supported by CGEN versions 1.0, 1.1, and the current development tree.
 210 Support for it will probably be removed for the CGEN 1.2 release.
 211
 212 @item 0.8 @code{(define-rtl-version 0 8)}
 213
 214 This version changed the syntax for defining keywords.
 215 @xref{Keywords}.
 216 The @samp{print-name} field was renamed to @samp{enum-prefix}
 217 and the @samp{prefix} field was renamed to @samp{name-prefix}.
 218
 219 Previous syntax:
 220
 221 @smallexample
 222 (define-keyword
 223   (name keyword-name)
 224   (comment "description")
 225   (attrs attribute-list)
 226   (mode mode-name)
 227   (print-name "prefix-for-enum-values-with-trailing-dash")
 228   (prefix "prefix-for-names-in-string-table")
 229   (values value-list)
 230 )
 231 @end smallexample
 232
 233 New syntax:
 234
 235 @smallexample
 236 (define-keyword
 237   (name keyword-name)
 238   (comment "description")
 239   (attrs attribute-list)
 240   (mode mode-name)
 241   (enum-prefix "prefix-for-enum-values")
 242   (name-prefix "prefix-for-names-in-string-table")
 243   (values value-list)
 244 )
 245 @end smallexample
 246
 247 Note that @samp{print-name} has been replaced with @samp{enum-prefix}
 248 and @samp{prefix} has been replaced with @samp{name-prefix}.
 249
 250 Furthermore, there is also a difference between the behavior of
 251 @samp{print-name} and @samp{enum-prefix}.
 252 When computing complete enum names with @samp{print-name},
 253 CGEN adds a @samp{-} between the prefix and the enum name.
 254 CGEN does not insert a @samp{-} with @samp{enum-prefix}.
 255
 256 @item 0.9 @code{(define-rtl-version 0 9)}
 257
 258 This version changed the prefix of pmacros from @samp{.} to @samp{%}.
 259 @samp{.pmacro} is changed to @samp{%pmacro}.
 260
 261 @end itemize
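
As a sketch of what porting a keyword from RTL 0.7 to RTL 0.8 or later
can look like, here is one hypothetical definition written both ways.
The keyword name and prefixes are invented for illustration, while
@samp{mode-name} and @samp{value-list} are left as the placeholders used
in the syntax descriptions above.  Because CGEN inserts a @samp{-} after
@samp{print-name} but not after @samp{enum-prefix}, the dash is written
explicitly in the newer form.

@smallexample
; RTL 0.7: CGEN adds the "-" between prefix and enum name itself.
(define-keyword
  (name gr-names)
  (comment "hypothetical register names")
  (attrs)
  (mode mode-name)
  (print-name "GR")
  (prefix "")
  (values value-list)
)

; RTL 0.8 and later: the "-" must be part of the prefix.
(define-keyword
  (name gr-names)
  (comment "hypothetical register names")
  (attrs)
  (mode mode-name)
  (enum-prefix "GR-")
  (name-prefix "")
  (values value-list)
)
@end smallexample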
 262
 263 @node Top level conditionals
 264 @section Top level conditionals
 265 @cindex Top level conditionals
 266
 267 CGEN supports conditionally defining objects through the use of @samp{if}
 268 and some specialized predicates.  These must appear at the ``top level'',
 269 i.e., not inside any other expression, except @samp{begin}.
 270
 271 The following predicates are supported:
 272
 273 @itemize @bullet
 274
 275 @item (keep-isa? (isa-list))
 276 Return ``true'' if any ISA in @samp{isa-list} is being kept.
 277 This is controlled by the @samp{-i} option.
 278
 279 @item (keep-mach? (machine-list))
 280 Return ``true'' if any machine in @samp{machine-list} is being kept.
 281 This is controlled by the @samp{-m} option.
 282
 283 @item (application-is? application)
 284 Return ``true'' if the current application generator is @samp{application}.
 285
 286 @item (rtl-version-equal? major minor)
 287 Return ``true'' if the RTL version specified by the @file{.cpu} file is
 288 @samp{major.minor}.
 289
 290 @item (rtl-version-at-least? major minor)
 291 Return ``true'' if the RTL version specified by the @file{.cpu} file is
 292 at least @samp{major.minor}.
 293
 294 @end itemize
 295
 296 Here's an example from the CGEN testsuite.
 297 It is used to write some wrappers around a few builtin pmacros
 298 that are independent of the pmacro prefix character.
 299
 300 @smallexample
 301 (if (rtl-version-at-least? 0 9)
 302     (begin
 303       (define-pmacro /begin %begin)
 304       (define-pmacro /print %print)
 305       (define-pmacro /dump %dump))
 306     (begin
 307       (define-pmacro /begin .begin)
 308       (define-pmacro /print .print)
 309       (define-pmacro /dump .dump)))
 310 @end smallexample
 311
 312 Here's an example from the @samp{SH} cpu description.
 313
 314 @smallexample
 315 (if (keep-isa? (compact))
 316     (include "sh64-compact.cpu"))
 317
 318 (if (keep-isa? (media))
 319     (include "sh64-media.cpu"))
 320 @end smallexample
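
The other predicates are used in the same way.  A minimal sketch, in
which the machine name @samp{mach2} and the file name @file{mach2.cpu}
are hypothetical:

@smallexample
; Only read the extra file if machine mach2 is being kept.
(if (keep-mach? (mach2))
    (include "mach2.cpu"))

; RTL 0.8 still uses "." as the pmacro prefix.
(if (rtl-version-equal? 0 8)
    (define-pmacro /begin .begin))
@end smallexample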
 321
 322 @node Definitions
 323 @section Definitions
 324 @cindex Definitions
 325
 326 Each entry has the same format: @code{(define-foo arg1 arg2 ...)}, where
 327 @samp{foo} designates the type of entry (e.g. @code{define-insn}).  In
 328 the general case each argument is a name/value pair expressed as
 329 @code{(name value)}.
 330 (*Note: Another style in common use is `:name value' and doesn't require
 331 parentheses.  Maybe that would be a better way to go here.  The current
 332 style is easier to construct from macros though.)
 333
 334 While the general case is flexible, it also is excessively verbose in
 335 the normal case.  To reduce this verbosity, a second version of most
 336 define-foo's, generally named @samp{define-normal-foo} or
 337 @samp{define-simple-foo}, exists that takes a fixed number
 338 of positional arguments.  With pmacros they can be shortened even further
 339 to just their acronym.  E.g. @samp{define-normal-ifield} -> @samp{dnf}.
 340 Ports are free to write their own preprocessor macros to
 341 simplify things further as desired.
 342 See sections titled ``Simplification macros'' later in this chapter.
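
To make the difference concrete, here is one hypothetical instruction
field written three ways.  This is only a sketch: the field name and bit
positions are invented, the general form may accept further fields not
shown here, and the positional argument order (name, comment,
attributes, start, length) is an assumption.

@example
; General form: name/value pairs.
(define-ifield
  (name f-example)
  (comment "example field")
  (attrs)
  (start 4)
  (length 4)
)

; Fixed number of positional arguments.
(define-normal-ifield f-example "example field" () 4 4)

; Acronym, made available through a pmacro.
(dnf f-example "example field" () 4 4)
@end example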
 343
 344 @c define-full-foo's are not documented on purpose.
 345 @c They're fragile (e.g. if a new element is added),
 346 @c and their use is discouraged.
 347
 348 @node Attributes
 349 @section Attributes
 350 @cindex Attributes
 351
 352 Attributes are used throughout for specifying various properties.
 353 For portability reasons attributes can only have 32 bit integral values
 354 (signed or unsigned).
 355 @c How about an example?
 356
 357 There are four kinds of attributes: boolean, integer, enumerated, and bitset.
 358 Boolean attributes can be achieved via others, but they occur frequently
 359 enough that they are special cased (and one bit can be used to record them).
 360 Bitset attributes are a useful simplification when one wants to indicate an
 361 object can be in one of many states (e.g. an instruction may be supported by
 362 multiple machines).
 363
 364 String attributes might be a useful addition.
 365 Another useful addition might be functional attributes (the attribute
 366 is computed at run-time - currently all attributes are computed at
 367 compile time).  One way to implement functional attributes would be to
 368 record the attributes as byte-code and lazily evaluate them, caching the
 369 results as appropriate.  The syntax has been done to not
 370 preclude either as an upward compatible extension.
 371
 372 Attributes must be defined before they can be used.
 373 There are several predefined attributes for entry types that need them
 374 (instruction field, hardware, operand, and instruction).  Predefined
 375 attributes are documented in each relevant section.
 376
 377 In C applications an enum is created that defines all the attributes.
 378 Applications that wish to have some architecture independent-ness
 379 need the attribute to have the same value across all architectures.
 380 This is achieved by giving the attribute the INDEX attribute
 381 @footnote{Yes, attributes can have attributes.},
 382 which specifies the enum value must be fixed across all architectures.
 383 @c FIXME: Give an example here.
 384 @c FIXME: Need a better name than `INDEX'.
 385
 386 Convention requires attribute names consist of uppercase letters, numbers,
 387 "-", and "_", and must begin with a letter.
 388 To be consistent with Scheme, "-" is preferred over "_".
 389
 390 @subsection Boolean Attributes
 391 @cindex Attributes, boolean
 392
 393 Boolean attributes are defined with:
 394
 395 @example
 396 (define-attr
 397   (type boolean)
 398   (for user-list)
 399   (name attribute-name)
 400   (comment "attribute comment")
 401   (attrs attribute-attributes)
 402   (values #f #t)
 403   (default #f)
 404 )
 405 @end example
 406
 407 The default value of boolean attributes is always false.  This can be
 408 relaxed, but it's one extra complication that is currently unnecessary.
 409 Boolean attributes are specified in one of three forms:
 410 @code{(NAME expr)}, @code{NAME}, and @code{!NAME}.
 411 The first form is the canonical form.  The latter two
 412 are shorthand versions.
 413 @code{NAME} means "true" and @code{!NAME} means "false".
 414 @samp{expr} is either @code{#f} or @code{#t}.
 415
 416 @code{user-list} is a space separated list of entry types that will use
 417 the attribute.  Possible values are: @samp{attr}, @samp{enum},
 418 @samp{cpu}, @samp{mach}, @samp{model}, @samp{ifield}, @samp{hardware},
 419 @samp{operand}, @samp{insn} and @samp{macro-insn}.  If omitted all are
 420 considered users of the attribute.
 421
 422 The @code{values} and @code{default} fields if provided must have the
 423 indicated values.  Usually these fields are elided.
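
For example, a boolean attribute might be defined and then referenced as
follows.  The attribute name @samp{EXAMPLE-FLAG} and its comment are
hypothetical; the @samp{attrs}, @samp{values} and @samp{default} fields
are elided.

@example
(define-attr
  (type boolean)
  (for insn)
  (name EXAMPLE-FLAG)
  (comment "hypothetical marker for some instructions")
)

; In an attribute list, these two mean "true":
;   (EXAMPLE-FLAG #t)   EXAMPLE-FLAG
; and these two mean "false":
;   (EXAMPLE-FLAG #f)   !EXAMPLE-FLAG
@end example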
 424
 425 @subsection Integer Attributes
 426 @cindex Attributes, integer
 427
 428 Integer attributes are defined with:
 429
 430 @example
 431 (define-attr
 432   (type integer)
 433   (for user-list)
 434   (name attribute-name)
 435   (comment "attribute comment")
 436   (attrs attribute-attributes)
 437   (default integer-value)
 438 )
 439 @end example
 440
 441 If omitted, the default is 0.
 442
 443 Integer attributes are specified with @code{(NAME value)}.
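
A minimal sketch, using a hypothetical attribute name and an explicit
default:

@example
(define-attr
  (type integer)
  (for insn)
  (name EXAMPLE-COUNT)
  (comment "hypothetical per-instruction count")
  (default 1)
)

; In an attribute list:
;   (EXAMPLE-COUNT 3)
@end example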
 444
 445 @subsection Enumerated Attributes
 446 @cindex Attributes, enumerated
 447
 448 Enumerated attributes are the same as integer attributes except the
 449 range of possible values is restricted and each value has a name.
 450 Enumerated attributes are defined with:
 451
 452 @example
 453 (define-attr
 454   (type enum)
 455   (for user-list)
 456   (name attribute-name)
 457   (comment "attribute comment")
 458   (attrs attribute-attributes)
 459   (values enum-value1 enum-value2 ...)
 460   (default default-enum-value)
 461 )
 462 @end example
 463
 464 If omitted, the default is the first entry in @code{values}.
 465
 466 Enum attributes are specified with @code{(NAME enum-value)}.
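
A minimal sketch, using hypothetical attribute and value names.  Since
@code{default} is omitted, the default is the first value, @samp{NONE}.

@example
(define-attr
  (type enum)
  (for insn)
  (name EXAMPLE-PIPE)
  (comment "hypothetical pipeline selection")
  (values NONE FIRST SECOND)
)

; In an attribute list:
;   (EXAMPLE-PIPE SECOND)
@end example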
 467
 468 @subsection Bitset Attributes
 469 @cindex Attributes, bitset
 470
 471 Bitset attributes are for situations where you want to indicate something
 472 is a subset of a small set of possibilities.  The MACH attribute uses this
 473 for example to allow specifying which of the various machines support a
 474 particular insn.
 475 (*Note: At present the maximum number of possibilities is 32.
 476 This is an implementation restriction which can be relaxed, but there's
 477 currently no rush.)
 478
 479 Bitset attributes are defined with:
 480
 481 @example
 482 (define-attr
 483   (type bitset)
 484   (for user-list)
 485   (name attribute-name)
 486   (comment "attribute comment")
 487   (attrs attribute-attributes)
 488   (values enum-value1 enum-value2 ...)
 489   (default default-value1 default-value2 ...)
 490 )
 491 @end example
 492
 493 The default values must be from the specified values.
 494 The default must be provided; it may not be omitted.
 495
 496 Bitset attributes are specified with @code{(NAME val1 val2 ...)}.
 497
 498 For backward compatibility they may also be specified with
 499 @code{(NAME val1,val2,...)} or @code{(NAME "val1,val2,...")}.
 500 There must be no spaces in ``@code{val1,val2,...}''
 501 and each value must be a valid Scheme symbol.
 502 Use of @code{(NAME val1,val2,...)} is deprecated, and
 503 support for it will go away at some point.
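
A minimal sketch, using hypothetical attribute and value names:

@example
(define-attr
  (type bitset)
  (for insn)
  (name EXAMPLE-UNITS)
  (comment "hypothetical set of functional units")
  (values alu mul div)
  (default alu)
)

; Preferred form in an attribute list:
;   (EXAMPLE-UNITS alu mul)
; Backward compatible string form, with no spaces:
;   (EXAMPLE-UNITS "alu,mul")
@end example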
 504
 505 @c NOTE: It's not clear whether allowing arbitrary expressions will be
 506 @c useful here, but doing so is not precluded.  For now each value must be
 507 @c the name of one of the specified values.
 508
 509 @node Architecture variants
 510 @section Architecture variants
 511 @cindex Architecture variants
 512
 513 The base architecture and its variants are described in four parts:
 514 @code{define-arch}, @code{define-isa}, @code{define-cpu}, and
 515 @code{define-mach}.
 516
 517 @menu
 518 * define-arch::
 519 * define-isa::
 520 * define-cpu::
 521 * define-mach::
 522 @end menu
 523
 524 @node define-arch
 525 @subsection define-arch
 526 @cindex define-arch
 527
 528 @code{define-arch} describes the overall architecture, and must be
 529 present.
 530
 531 The syntax of @code{define-arch} is:
 532
 533 @example
 534 (define-arch
 535   (name architecture-name) ; e.g. m32r
 536   (comment "description")  ; e.g. "Mitsubishi M32R"
 537   (attrs attribute-list)
 538   (default-alignment aligned|unaligned|forced)
 539   (insn-lsb0? #f|#t)
 540   (machs mach-name-list)
 541   (isas isa-name-list)
 542 )
 543 @end example
 544
 545 @subsubsection default-alignment
 546
 547 Specify the default alignment to use when fetching data (and
 548 instructions) from memory.  At present this can't be overridden, but
 549 support can be added if necessary.  The default is @code{aligned}.
 550 @c Definitely need to say more here.
 551
 552 @subsubsection insn-lsb0?
 553 @cindex insn-lsb0?
 554
 555 Specifies whether the most significant or least significant bit in a
 556 word is bit number 0.  Generally this should conform to the convention
 557 in the architecture manual.  This is independent of endianness and is an
 558 architecture wide specification.  There is no support for using
 559 different bit numbering conventions within an architecture.
 560 @c Not that such support can't be added of course.
 561
 562 Instruction fields are always numbered beginning with the most
 563 significant bit.  That is, the `start' of a field is always its most
 564 significant bit.  For example, a 4 bit field in the uppermost bits of a
 565 32 bit instruction would have a start/length of (31 4) when insn-lsb0? =
 566 @code{#t}, and (0 4) when insn-lsb0? = @code{#f}.
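
The start/length example above could be written as follows.  This is a
sketch: the field name is hypothetical, the positional argument order of
@samp{define-normal-ifield} (name, comment, attributes, start, length)
is an assumption, and only one of the two definitions would appear in a
given architecture.

@example
; insn-lsb0? = #f: bit 0 is the most significant bit.
(define-normal-ifield f-top4 "uppermost 4 bits" () 0 4)

; insn-lsb0? = #t: bit 0 is the least significant bit.
(define-normal-ifield f-top4 "uppermost 4 bits" () 31 4)
@end example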
 567
 568 @subsubsection mach-name-list
 569
 570 The list of names of machines in the architecture.
 571 There should be one entry for each @code{define-mach}.
 572
 573 @subsubsection isa-name-list
 574
 575 The list of names of instruction sets in the architecture.
 576 There must be one for each @code{define-isa}.
 577 An example of an architecture with more than one is the ARM which
 578 has a 32 bit instruction set and a 16 bit "Thumb" instruction set
 579 (the sizes here refer to instruction size).
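
Putting these fields together, a @code{define-arch} entry might look
like the following sketch.  All names are hypothetical, and writing the
machine and ISA lists as space separated names directly inside the field
is an assumption.

@example
(define-arch
  (name example)
  (comment "Hypothetical example architecture")
  (default-alignment aligned)
  (insn-lsb0? #f)
  (machs example-a example-b) ; one entry per define-mach
  (isas example)              ; one entry per define-isa
)
@end example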
 580
 581 @node define-isa
 582 @subsection define-isa
 583 @cindex define-isa
 584
 585 @code{define-isa} describes aspects of the instruction set.
 586 A minimum of one ISA must be defined.
 587
 588 The syntax of @code{define-isa} is:
 589
 590 @example
 591 (define-isa
 592   (name isa-name)
 593   (comment "description")
 594   (attrs attribute-list)
 595   (default-insn-word-bitsize n)
 596   (default-insn-bitsize n)
 597   (base-insn-bitsize n)
 598   ; (decode-assist (b0 b1 b2 ...)) ; generally unnecessary
 599   (liw-insns n)
 600   (parallel-insns n)
 601   (condition ifield-name expr)
 602   (setup-semantics expr)
 603   ; (decode-splits decode-split-list) ; support temporarily disabled
 604   ; ??? missing here are fetch/execute specs
 605 )
 606 @end example
 607
 608 @subsubsection default-insn-word-bitsize
 609
 610 Specifies the default size of an instruction word in bits.
 611 This affects the numbering of field bits in words beyond the
 612 base instruction.
 613 @xref{Instruction fields}, for more information.
 614
 615 @subsubsection default-insn-bitsize
 616
 617 The default size of an instruction in bits. It is generally the size of
 618 the smallest instruction. It is used when parsing instruction fields.
 619 It is also used by the disassembler to know how many bytes to skip for
 620 unrecognized instructions.
 621
 622 @subsubsection base-insn-bitsize
 623
 624 The minimum size of an instruction, in bits, to fetch during execution.
 625 If the architecture has a variable length instruction set, this is the
 626 size of the initial word to fetch.  There is no need to specify the
 627 maximum length of an instruction; that can be computed from the
 628 instructions.  Examples:
 629
 630 @table @asis
 631 @item i386
 632 8
 633 @item M68k
 634 16
 635 @item SPARC
 636 32
 637 @item M32R
 638 32
 639 @end table
 640
 641 The M32R case is interesting because instructions can be 16 or 32 bits.
 642 However instructions on 32 bit boundaries can always be fetched 32 bits
 643 at a time as 16 bit instructions always come in pairs.
 644
 645 @subsubsection decode-assist
 646 @cindex decode-assist
 647
 648 Override CGEN's heuristics about which bits to initially use to decode
 649 instructions in a simulator.  For example on the SPARC these are bits:
 650 31 30 24 23 22 21 20 19.  The entire decoder can be machine generated,
 651 so this field is entirely optional.  Since the heuristics are quite
 652 good, you should only use this field if you have evidence that you
 653 can pick a better set, in which case the CGEN developers would like to
 654 hear from you!
 655
 656 ??? It might be useful to provide greater control, but this is sufficient
 657 for now.
 658
 659 It is okay if the opcode bits are over-specified for some instructions.
 660 It is also okay if the opcode bits are under-specified for some instructions.
 661 The machine generated decoder will properly handle both these situations.
 662 Just pick a useful number of bits that distinguishes most instructions.
 663 It is usually best to not pick more than 8 bits to keep the size of the
 664 initial decode table down.
 665
 666 Bit numbering is defined by the @code{insn-lsb0?} field.
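
Using the form shown in the @code{define-isa} syntax above, the SPARC
bits mentioned earlier would be written as in this sketch:

@example
(decode-assist (31 30 24 23 22 21 20 19))
@end example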
 667
 668 @subsubsection liw-insns
 669 @cindex liw-insns
 670
 671 The number of instructions the CPU always fetches at once.  This is
 672 intended for architectures like the M32R, and does not refer to a CPU's
 673 ability to pre-fetch instructions.  The default is 1.
 674
 675 @subsubsection parallel-insns
 676 @cindex parallel-insns
 677
 678 The maximum number of instructions the CPU can execute in parallel.  The
 679 default is 1.
 680
 681 ??? Rename this to @code{max-parallel-insns}?
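
As a sketch of how the size and parallelism fields fit together, here is
a hypothetical ISA with fixed 32 bit instructions.  The ISA name is
invented, and it is assumed that the fields not needed here
(@code{attrs}, @code{condition}, @code{setup-semantics}) may simply be
left out.

@example
(define-isa
  (name example)
  (comment "hypothetical ISA with fixed 32 bit instructions")
  (default-insn-word-bitsize 32)
  (default-insn-bitsize 32)
  (base-insn-bitsize 32)
  (liw-insns 1)      ; the default
  (parallel-insns 1) ; the default
)
@end example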
 682
 683 @subsubsection condition
 684
 685 Some architectures like ARM and ARC conditionally execute every instruction
 686 based on the condition specified by one instruction field.
 687 The @code{condition} spec exists to support these architectures.
 688 @code{ifield-name} is the name of the instruction field denoting the
 689 condition and @code{expr} is an RTL expression that returns
 690 the value of the condition (false=zero, true=non-zero).
 691
 692 @subsubsection setup-semantics
 693
 694 Specify a statement to be performed prior to executing particular instructions.
 695 This is used, for example, on the ARM where the value of the program counter
 696 (general register 15) is a function of the instruction (it is either
 697 pc+8 or pc+12, depending on the instruction).
 698
 699 @subsubsection decode-splits
 700
 701 Specify a list of field names and values to split instructions up by.
 702 This is used, for example, on the ARM where the behavior of some instructions
 703 is quite different when the destination register is r15 (the pc).
 704
 705 The syntax is:
 706
 707 @example
 708 (decode-splits
 709   (ifield1-name
 710    constraints
 711    ((split1-name (value1 value2 ...)) (split2-name ...)))
 712   (ifield2-name
 713    ...)
 714 )
 715 @end example
 716
 717 @code{constraints} is work-in-progress and should be @code{()} for now.
 718
 719 One copy of each instruction satisfying @code{constraints} is made
 720 for each specified split.  The semantics of each copy are then
 721 simplified based on the known values of the specified instruction field.
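
For the ARM-style case described above, a split on the destination
register might be sketched as follows.  The field name @samp{f-rd} and
the split names are hypothetical, and note that the @code{define-isa}
syntax above marks support for @code{decode-splits} as temporarily
disabled.

@example
(decode-splits
  (f-rd
   () ; constraints
   ((no-pc-dest (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14))
    (pc-dest (15))))
)
@end example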
 722
 723 @node define-cpu
 724 @subsection define-cpu
 725 @cindex define-cpu
 726
 727 @code{define-cpu} defines a ``CPU family'' which is a programmer
 728 specified collection of related machines.  What constitutes a family is
 729 work-in-progress; however, it is intended to distinguish things like
 730 sparc32 vs sparc64.  Machines in a family are sufficiently similar that
 731 the simulator semantic code can handle any differences at run time.  At
 732 least that's the current idea.  A minimum of one CPU family must be
 733 defined.
 734 @footnote{FIXME: Using "cpu" in "cpu-family" here is confusing.
 735 Need a better name.  Maybe just "family"?}
 736
 737 The syntax of @code{define-cpu} is:
 738
 739 @example
 740 (define-cpu
 741   (name cpu-name)
 742   (comment "description")
 743   (attrs attribute-list)
 744   (endian big|little|either)
 745   (insn-endian big|little|either)
 746   (data-endian big|little|either)
 747   (float-endian big|little|either)
 748   (word-bitsize n)
 749   (insn-chunk-bitsize n)
 750   (parallel-insns n)
 751   (file-transform transformation)
 752 )
 753 @end example
 754
 755 @subsubsection endian
 756
 757 The endianness of the architecture is one of three values: @code{big},
 758 @code{little} and @code{either}.
 759
 760 An architecture may have multiple endiannesses, including one for each
 761 of: instructions, integers, and floats (not that that's intended to be the
 762 complete list).  These are specified with @code{insn-endian},
 763 @code{data-endian}, and @code{float-endian} respectively.
 764
 765 Possible values for @code{insn-endian} are: @code{big}, @code{little},
 766 and @code{either}.  If missing, the value is taken from @code{endian}.
 767
 768 Possible values for @code{data-endian} and @code{float-endian} are: @code{big},
 769 @code{big-words}, @code{little}, @code{little-words} and @code{either}.
 770 If @code{big-words} then each word is little-endian.
 771 If @code{little-words} then each word is big-endian.
 772 If missing, the value is taken from @code{endian}.
 773
 774 ??? Support for these is work-in-progress.  All forms are recognized
 775 by the @file{.cpu} file reader, but not all are supported internally.
 776
 777 @subsubsection word-bitsize
 778
 779 The number of bits in a word.  In GCC, this is @code{BITS_PER_WORD}.
 780
 781 @subsubsection insn-chunk-bitsize
 782
 783 The number of bits in an instruction word chunk, for purposes of
 784 per-chunk endianness conversion.  The default is zero, meaning
 785 no chunking is required.
 786
 787 @subsubsection parallel-insns
 788
 789 This is the same as the @code{parallel-insns} spec of @code{define-isa}.
 790 It allows a CPU family to override the value.
 791
 792 @subsubsection file-transform
 793
 794 Specify the file name transformation of generated code.
 795
 796 Each generated file has a name related to the ISA or CPU family.
 797 Sometimes generated code needs to know the name of another generated
 798 file (e.g. #include's).
 799 At present @code{file-transform} specifies the suffix.
 800
 801 For example, M32R/x generated files have an `x' suffix, as in @file{cpux.h}
 802 for the @file{cpu.h} header.  This is indicated with
 803 @code{(file-transform "x")}.
 804
 805 ??? Ideally generated code wouldn't need to know anything about file names.
 806 This breaks down for #include's.  It can be fixed with symlinks or other
 807 means.
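
Taken together, the specs described above might be combined as in the
following sketch.  The family name @code{examplebf} and the chosen values
are hypothetical and only illustrate the syntax; they are not taken from
an existing port.

@example
(define-cpu
  (name examplebf)                ; hypothetical CPU family name
  (comment "Example base family")
  (endian big)                    ; default for insns, data and floats
  (data-endian little)            ; override for integer data only
  (word-bitsize 32)
  (file-transform "x")            ; e.g. cpu.h becomes cpux.h
)
@end example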
 808
 809 @node define-mach
 810 @subsection define-mach
 811 @cindex define-mach
 812
 813 @code{define-mach} defines a distinct variant of a CPU.  It currently
 814 has a one-to-one correspondence with BFD's "mach number".  A minimum of
 815 one mach must be defined.
 816
 817 The syntax of @code{define-mach} is:
 818
 819 @example
 820 (define-mach
 821   (name mach-name)
 822   (comment "description")
 823   (attrs attribute-list)
 824   (cpu cpu-family-name)
 825   (bfd-name "bfd-name")
 826   (isas isa-name-list)
 827 )
 828 @end example
 829
 830 @subsubsection bfd-name
 831 @cindex bfd-name
 832
 833 The name of the mach as used by BFD.  If not specified the name of the
 834 mach is used.
 835
 836 @subsubsection isas
 837
List of the names of the ISAs the machine supports.
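
As a sketch, a mach belonging to a CPU family might be declared as follows.
All of the names used here (@code{example1}, @code{examplebf},
@code{example}) are hypothetical.

@example
(define-mach
  (name example1)
  (comment "Example variant 1")
  (attrs)
  (cpu examplebf)          ; a family defined with define-cpu
  (bfd-name "example1")    ; optional, defaults to the mach name
  (isas example)           ; an ISA defined with define-isa
)
@end example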
 839
 840 @node Model variants
 841 @section Model variants
 842
For each `machine', as defined here, there are one or more `models'.
There must be at least one model for each machine.
(Note: There could be a default, but requiring one doesn't involve that much
extra typing and forces the programmer to at least think about such things.)
 847
 848 @example
 849 (define-model
 850   (name model-name)
 851   (comment "description")
 852   (attrs attribute-list)
 853   (mach machine-name)
 854   (state (variable-name-1 variable-mode-1) ...)
 855   (unit name "comment" (attributes)
 856         issue done state inputs outputs profile)
 857 )
 858 @end example
 859
 860 @subsection mach
 861
 862 The name of the machine the model is an implementation of.
 863
 864 @subsection state
 865
 866 A list of variable-name/mode pairs for recording global function unit
 867 state.  For example on the M32R the value is @code{(state (h-gr UINT))}
 868 and is a bitmask of which register(s) are the targets of loads and thus
 869 subject to load stalls.
 870
 871 @subsection unit
 872
 873 Specifies a function unit.  Any number of function units may be specified.
 874 The @code{u-exec} unit must be specified as it is the default.
 875
 876 The syntax is:
 877
 878 @example
 879   (unit name "comment" (attributes)
 880      issue done state inputs outputs profile)
 881 @end example
 882
 883 @samp{issue} is the number of operations that may be in progress.
 884 It originates from GCC function unit specification.  In general the
 885 value should be 1.
 886
 887 @samp{done} is the latency of the unit.  The value is the number of cycles
 888 until the result is ready.
 889
 890 @samp{state} has the same syntax as the global model `state' and is a list of
 891 variable-name/mode pairs.
 892
 893 @samp{inputs} is a list of inputs to the function unit.
 894 Each element is @code{(operand-name mode default-value)}.
 895
 896 @samp{outputs} is a list of outputs of the function unit.
 897 Each element is @code{(operand-name mode default-value)}.
 898
 899 @samp{profile} is an rtl-code sequence that performs function unit
 900 modeling.  At present the only possible value is @code{()} meaning
 901 invoke a user supplied function named @code{<cpu>_model_<mach>_<unit>}.
 902
 903 The current function unit specification is a first pass in order to
 904 achieve something that moderately works for the intended purpose (cycle
 905 counting on the simulator).  Something more elaborate is on the todo list
 906 but there is currently no schedule for it.  The new specification must
 907 try to be application independent.  Some known applications are:
 908 cycle counting in the simulator, code scheduling in a compiler, and code
 909 scheduling in a JIT simulator (where speed of analysis can be more
 910 important than getting an optimum schedule).
 911
 912 The inputs/outputs fields are how elements in the semantic code are mapped
 913 to function units.  Each input and output has a name that corresponds
 914 with the name of the operand in the semantics.  Where there is no
 915 correspondence, a mapping can be made in the unit specification of the
 916 instruction (see the subsection titled ``Timing'').
 917
 918 Another way to achieve the correspondence is to create separate function
 919 units that contain the desired input/output names.  For example on the
 920 M32R the u-exec unit is defined as:
 921
 922 @example
 923 (unit u-exec "Execution Unit" ()
 924    1 1 ; issue done
 925    () ; state
 926    ((sr INT -1) (sr2 INT -1)) ; inputs
 927    ((dr INT -1)) ; outputs
 928    () ; profile action (default)
 929 )
 930 @end example
 931
 932 This handles instructions that use sr, sr2 and dr as operands.  A second
 933 function unit called @samp{u-cmp} is defined as:
 934
 935 @example
 936 (unit u-cmp "Compare Unit" ()
 937    1 1 ; issue done
 938    () ; state
 939    ((src1 INT -1) (src2 INT -1)) ; inputs
 940    () ; outputs
 941    () ; profile action (default)
 942 )
 943 @end example
 944
 945 This handles instructions that use src1 and src2 as operands.  The
 946 organization of units is arbitrary.  On the M32R, src1/src2 instructions
 947 are typically compare instructions so a separate function unit was
 948 created for them.  Current limitations require that each hardware item
 949 behind the operands must be marked with the attribute @code{PROFILE} and
 950 the hardware item must not be scalar.
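
For reference, the fragments above fit into a complete model definition
along the following lines.  This is only a sketch assembled from the pieces
shown in this section; the model name @code{example-model} and the machine
name @code{example1} are placeholders.

@example
(define-model
  (name example-model)
  (comment "Example model")
  (attrs)
  (mach example1)
  (state (h-gr UINT))            ; bitmask of pending load targets
  (unit u-exec "Execution Unit" ()
     1 1 ; issue done
     () ; state
     ((sr INT -1) (sr2 INT -1)) ; inputs
     ((dr INT -1)) ; outputs
     () ; profile action (default)
  )
  (unit u-cmp "Compare Unit" ()
     1 1 ; issue done
     () ; state
     ((src1 INT -1) (src2 INT -1)) ; inputs
     () ; outputs
     () ; profile action (default)
  )
)
@end example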
 951
 952 @node Hardware elements
 953 @section Hardware elements
 954
 955 The elements of hardware that make up a CPU are defined with
 956 @code{define-hardware}.  Examples of hardware elements include
 957 registers, condition bits, immediate constants and memory.
 958
 959 Instruction fields that provide numerical values (``immediate
 960 constants'') aren't really elements of the hardware, but it simplifies
 961 things to think of them this way.  Think of them as @emph{constant
 962 generators}@footnote{A term borrowed from the book on the Bulldog
 963 compiler and perhaps other sources.}.
 964
 965 Hardware elements are defined with:
 966
 967 @example
 968 (define-hardware
 969   (name hardware-name)
 970   (comment "description")
 971   (attrs attribute-list)
 972   (semantic-name hardware-semantic-name)
 973   (type type-name type-arg1 type-arg2 ...)
 974   (indices index-type index-arg1 index-arg2 ...)
 975   (values values-type values-arg1 values-arg2 ...)
 976   (handlers handler1 handler2 ...)
 977   (get (args) expression)
 978   (set (args) expression)
 979   (layout layout-list)
 980 )
 981 @end example
 982
 983 The only required elements are @samp{name} and @samp{type}.
 984 Convention requires @samp{hardware-name} begin with @samp{h-}.
 985
 986 @subsection attrs
 987
 988 List of attributes. There are several predefined hardware attributes:
 989
 990 @itemize @minus
 991 @item MACH
 992
 993 A bitset attribute used to specify which machines have this hardware element.
 994 Do not specify the MACH attribute if the value is "all machs".
 995
 996 Usage: @code{(MACH mach1,mach2,...)}
 997 There must be no spaces in ``@code{mach1,mach2,...}''.
 998
 999 @item CACHE-ADDR
1000
A hint to the simulator semantic code generator that it can record the
address of a selected register in an array of registers.  This speeds up
simulation by moving the array computation to extraction time.
1004 This attribute is only useful to register arrays and cannot be specified
1005 with @code{VIRTUAL} (??? revisit).
1006
1007 @item PROFILE
1008
1009 This attribute must be present for hardware elements to which references
1010 are profiled.  Beware, this is work-in-progress.  If you use this
1011 attribute it is likely you have to hack CGEN.  (Please submit patches.)
1012
1013 @item VIRTUAL
1014
1015 The hardware element doesn't require any storage.
1016 This is used when you want a value that is derived from some other value.
1017 If @code{VIRTUAL} is specified, @code{get} and @code{set} specs must be
1018 provided.
1019 @end itemize
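
For instance, a register that exists only on some machines could be
restricted with @code{MACH}.  The hardware and mach names in this sketch
are hypothetical.

@example
(define-hardware
  (name h-accum)
  (comment "accumulator, not present on all machs")
  (attrs (MACH example1,example2))  ; no spaces in the mach list
  (type register DI)
)
@end example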
1020
1021 @subsection type
1022
1023 This is the type of hardware.  Current values are: @samp{pc}, @samp{register},
1024 @samp{memory}, and @samp{immediate}.
1025
For @samp{pc}, see @ref{Program counter}.
1027
1028 For registers the syntax is one of:
1029
1030 @example
1031 @code{(register mode [(number)])}
1032 @code{(register (mode bits) [(number)])}
1033 @end example
1034
1035 where @samp{(number)} is the number of registers and is optional. If
1036 omitted, the default is @samp{(1)}.
1037 The second form is useful for describing registers with an odd (as in
1038 unusual) number of bits.
1039 @code{mode} for the second form must be one of @samp{INT} or @samp{UINT}.
1040 Since these two modes don't have an implicit size, they cannot be used for
1041 the first form.
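
The following @code{type} specs sketch both forms.  The modes and sizes are
chosen only for illustration.

@example
(type register SI (16))     ; array of 16 registers of mode SI
(type register BI)          ; one single-bit register
(type register (UINT 24))   ; one 24-bit register, second form
@end example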
1042
1043 @c ??? Might wish to remove the mode here and just specify number of bits.
1044
1045 For memory the syntax is:
1046
1047 @example
1048 @code{(memory mode (size))}
1049 @end example
1050
1051 where @samp{(size)} is the size of the memory in @samp{mode} units.
1052 In general @samp{mode} should be @code{QI}.
1053
For immediates the syntax is one of:
1055
1056 @example
1057 @code{(immediate mode)}
1058 @code{(immediate (mode bits))}
1059 @end example
1060
1061 The second form is for values for which a mode of that size doesn't exist.
1062 @samp{mode} for the second form must be one of @code{INT} or @code{UINT}.
1063 Since these two modes don't have an implicit size, they cannot be used
1064 for the first form.
1065
1066 ??? There's no real reason why a mode like SI can't be used
1067 for odd-sized immediate values.  The @samp{bits} field indicates the size
1068 and the @samp{mode} field indicates the mode in which the value will be used,
1069 as well as its signedness.  This would allow removing INT/UINT for this
1070 purpose.  On the other hand, a non-width specific mode allows applications
1071 to choose one (a simulator might prefer to store immediates in an `int'
1072 rather than, say, char if the specified mode was @code{QI}).
1073
1074 @subsection indices
1075
1076 Specify names for individual elements with the @code{indices} spec.
1077 It is only valid for registers with more than one element.
1078
1079 The syntax is:
1080
1081 @example
1082 @code{(indices index-type arg1 arg2 ...)}
1083 @end example
1084
1085 where @samp{index-type} specifies the kind of index and @samp{arg1 arg2 ...}
1086 are arguments to @samp{index-type}.
1087
1088 There are two supported values for @samp{index-type}: @code{keyword}
1089 and @code{extern-keyword}.  The difference is that indices defined with
1090 @code{keyword} are kept internal to the hardware element's definition
1091 and are not usable elsewhere, whereas @code{extern-keyword} specifies
1092 a set of indices defined elsewhere with @code{define-keyword}.
1093
1094 @subsubsection keyword
1095
1096 @example
1097 @code{(indices keyword name-prefix ((name1 value1) (name2 value2) ...))}
1098 @end example
1099
@samp{name-prefix} is the assembler prefix common to each of the index names,
and is added to each name in the generated lookup table.
1102 For example, SPARC registers usually begin with @samp{"%"}.
1103
Each @samp{(name value)} pair maps a name to an index number.
1105 An index can be specified multiple times, for example, when a register
1106 has multiple names.
1107
1108 There may be gaps in the index list, e.g. for invalid/reserved registers.
1109
1110 No enum is defined for keywords defined this way.
1111 If you want an enum use @samp{define-keyword} and @samp{extern-keyword}.
1112
1113 Example from Thumb:
1114
1115 @example
1116 (define-hardware
1117   (name h-gr-t)
1118   (comment "Thumb's general purpose registers")
1119   (attrs (ISA thumb) VIRTUAL) ; ??? CACHE-ADDR should be doable
1120   (type register WI (8))
1121   (indices keyword ""
1122            ((r0 0) (r1 1) (r2 2) (r3 3) (r4 4) (r5 5) (r6 6) (r7 7)))
1123   (get (regno) (reg h-gr regno))
1124   (set (regno newval) (set (reg h-gr regno) newval))
1125 )
1126 @end example
1127
1128 @subsubsection extern-keyword
1129
1130 @example
1131 @code{(indices extern-keyword keyword-name)}
1132 @end example
1133
1134 Often one wants to make the keywords available for general use,
1135 i.e. to arbitrary tools.
1136 @xref{Keywords}.
1137 When the collection of indices is defined with @samp{define-keyword}
1138 refer to it in the @samp{indices} field with @samp{extern-keyword}.
1139
1140 Example from M32R:
1141
1142 @example
1143 (define-keyword
1144   (name gr-names)
1145   (enum-prefix H-GR-)
1146   (values (fp 13) (lr 14) (sp 15)
1147           (r0 0) (r1 1) (r2 2) (r3 3) (r4 4) (r5 5) (r6 6) (r7 7)
1148           (r8 8) (r9 9) (r10 10) (r11 11) (r12 12) (r13 13) (r14 14) (r15 15))
1149 )
1150
1151 (define-hardware
1152   (name h-gr)
1153   (comment "general registers")
1154   (attrs PROFILE CACHE-ADDR)
1155   (type register WI (16))
1156   (indices extern-keyword gr-names)
1157 )
1158 @end example
1159
1160 @subsection values
1161
1162 Specify a list of valid values with the @code{values} spec.
1163 @c Clumsy wording.
1164
1165 The syntax is identical to the syntax for @code{indices}.
1166 It is only valid for immediates.
1167
1168 Example from sparc64:
1169
1170 @example
1171 (define-hardware
1172   (name h-p)
1173   (comment "prediction bit")
1174   (attrs (MACH64))
1175   (type immediate (UINT 1))
1176   (values keyword "" (("" 0) (",pf" 0) (",pt" 1)))
1177 )
1178 @end example
1179
1180 @subsection handlers
1181
1182 The @code{handlers} spec is an escape hatch for indicating when a
1183 programmer supplied routine must be called to perform a function.
1184
1185 The syntax is:
1186
1187 @example
1188 @samp{(handlers (handler-name1 "function_name1")
1189                 (handler-name2 "function_name2")
1190                 ...)}
1191 @end example
1192
1193 @samp{handler-name} must be one of @code{parse} or @code{print}.
1194 How @samp{function_name} is used is application specific, but in
1195 general it is the name of a function to call.  The only application
1196 that uses this at present is Opcodes.  See the Opcodes documentation for
1197 a description of each function's expected prototype.
1198 @c FIXME: Need ref here.
1199
1200 @subsection get
1201
1202 Specify special processing to be performed when a value is read
1203 with the @code{get} spec.
1204
1205 The syntax for scalar registers is:
1206
1207 @example
1208 @samp{(get () (expression))}
1209 @end example
1210
1211 The syntax for vector registers is:
1212
1213 @example
1214 @samp{(get (index) (expression))}
1215 @end example
1216
1217 @code{expression} is an RTL expression that computes the value to return.
1218 The mode of the result must be the mode of the register.
1219
1220 @code{index} is the name of the index as it appears in @code{expression}.
1221
1222 At present, @code{sequence}, @code{parallel}, @code{do-count}
1223 and @code{case} expressions are not allowed here.
1224
1225 @subsection set
1226
1227 Specify special processing to be performed when a value is written
1228 with the @code{set} spec.
1229
1230 The syntax for scalar registers is:
1231
1232 @example
1233 @samp{(set (newval) (expression))}
1234 @end example
1235
1236 The syntax for vector registers is:
1237
1238 @example
1239 @samp{(set (index newval) (expression))}
1240 @end example
1241
1242 @code{expression} is an RTL expression that stores @code{newval}
1243 in the register.  This may involve storing values in other registers as well.
1244 @code{expression} must be one of @code{set}, @code{if}, @code{sequence}, or
1245 @code{case}.
1246
1247 @code{index} is the name of the index as it appears in @code{expression}.
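
As an illustration of the scalar forms of @code{get} and @code{set}, the
following sketch keeps the low bit of a status register in a separate
single-bit register.  The names @code{h-status} and @code{h-cond} are
hypothetical, and @code{h-cond} is assumed to be defined elsewhere as
@code{(register BI)}.

@example
(define-hardware
  (name h-status)
  (comment "status register, low bit mirrors h-cond")
  (type register UQI)
  (get () (or UQI (and UQI (raw-reg UQI h-status) (const #xfe))
                  (zext UQI (reg h-cond))))
  (set (newval) (sequence ()
                  (set (raw-reg UQI h-status) newval)
                  (set (reg h-cond) (and newval (const 1)))))
)
@end example

Here @code{raw-reg} refers to the register's own storage, so the getter
and setter being defined are not invoked recursively.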
1248
1249 @subsection layout
1250
1251 For specific hardware elements, specifying a layout is an alternative
1252 to providing getter/setter specs.
1253
At present this applies only to @samp{register} hardware elements,
and not to the @samp{pc}.
1256
1257 Some registers are a collection of bits with different meanings.
1258 It is often useful to define each field of such a register as its
1259 own register.  The @samp{layout} spec can then be used to build up
1260 the outer register from the individual register fields.
1261
1262 The fields are written from least to most significant.
1263 Each field is either the name of another hardware register,
1264 or a list of (value length) to specify hardwired bits.
1265
1266 A typical example is a ``flags'' register.
1267 Here is an example for a fictitious flags register.
1268 It is eight bits wide, with the lower four bits having defined values,
1269 and the upper four bits hardwired to zero.
1270
1271 @smallexample
1272 (dsh h-cf "carry flag"    () (register BI))
1273 (dsh h-sf "sign flag"     () (register BI))
1274 (dsh h-of "overflow flag" () (register BI))
1275 (dsh h-zf "zero flag"     () (register BI))
1276 (define-hardware
1277   (name flags)
1278   (type register QI)
1279   (layout (h-cf h-sf h-of h-zf (0 4)))
1280 )
1281 @end smallexample
1282
1283 @subsection Predefined hardware elements
1284
1285 Several hardware types are predefined:
1286
1287 @table @code
1288 @item h-uint
1289 unsigned integer
1290 @item h-sint
1291 signed integer
1292 @item h-memory
1293 main memory, where ``main'' is loosely defined
1294 @item h-addr
1295 data address (data only)
1296 @item h-iaddr
1297 instruction address (instructions only)
1298 @end table
1299
1300 @anchor{Program counter}
1301 @subsection Program counter
1302
1303 The program counter must be defined and is not a builtin.
1304 If get/set specs are not required, define it as:
1305
1306 @example
1307 (dnh h-pc "program counter" (PC) (pc) () () ())
1308 @end example
1309
1310 If get/set specs are required, define it as:
1311
1312 @example
1313 (define-hardware
1314   (name h-pc)
1315   (comment "<ARCH> program counter")
1316   (attrs PC)
1317   (type pc)
1318   (get () <insert get code here>)
1319   (set (newval) <insert set code here>)
1320 )
1321 @end example
1322
1323 If the architecture has multiple instruction sets, all must be specified.
1324 If they're not, the default is the first one which is often not what you want.
1325 Here's an example from @file{arm.cpu}:
1326
1327 @example
1328 (define-hardware
1329   (name h-pc)
1330   (comment "ARM program counter (h-gr reg 15)")
1331   (attrs PC (ISA arm,thumb))
1332   (type pc)
1333   (set (newval)
1334        (if (reg h-tbit)
1335            (set (raw-reg SI h-pc) (and newval -2))
1336            (set (raw-reg SI h-pc) (and newval -4))))
1337 )
1338 @end example
1339
1340 @subsection Simplification macros
1341
1342 To simplify @file{.cpu} files several pmacros are provided.
1343
1344 @anchor{a-define-normal-hardware}
1345 @anchor{a-dnh}
1346 The @code{define-normal-hardware} pmacro (with alias @code{dnh})
1347 takes a fixed set of positional arguments for the typical hardware element.
1348 The syntax is:
1349
1350 @code{(dnh name comment attributes type indices values handlers)}
1351
1352 Example:
1353
1354 @example
1355 (dnh h-gr "general registers"
1356      () ; attributes
1357      (register WI (16))
1358      (keyword "" ((fp 13) (sp 15) (lr 14)
1359                   (r0 0) (r1 1) (r2 2) (r3 3)
1360                   (r4 4) (r5 5) (r6 6) (r7 7)
1361                   (r8 8) (r9 9) (r10 10) (r11 11)
1362                   (r12 12) (r13 13) (r14 14) (r15 15)))
1363      () ()
1364 )
1365 @end example
1366
1367 This defines an array of 16 registers of mode @code{WI} ("word int").
1368 The names of the registers are @code{r0...r15}, and registers 13, 14 and
1369 15 also have the names @code{fp}, @code{lr} and @code{sp} respectively.
1370
1371 @anchor{a-define-simple-hardware}
1372 @anchor{a-dsh}
1373 Scalar registers with no special requirements occur frequently.
Macro @code{define-simple-hardware} (with alias @code{dsh}) is identical to
@code{dnh} except that it does not include the @code{indices}, @code{values},
or @code{handlers} specs.
1377
1378 @example
1379 (dsh h-ibit "interrupt enable bit" () (register BI))
1380 @end example
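
Assuming the pmacro simply fills in the corresponding specs, the
@code{dsh} example above is roughly equivalent to the long form:

@example
(define-hardware
  (name h-ibit)
  (comment "interrupt enable bit")
  (attrs)
  (type register BI)
)
@end example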
1381
1382 @node Instruction fields
1383 @section Instruction fields
1384 @cindex Instruction fields
1385
1386 Instruction fields (ifields) define the raw bitfields of each instruction.
1387 Minimal semantic meaning is attributed to them.  Support is provided for
1388 mapping to and from the raw bit pattern and the usable contents, and
1389 other simple manipulations.
@footnote{Whether to also provide a way to specify instruction formats is not yet
clear.  Currently they are computed from the instructions, so there's no
current @emph{need} to provide them.  However, providing the ability as an
option may simplify other tools CGEN is used to generate.  This
simplification would come in the form of giving known names to the formats,
which CPU reference manuals often do.  Pre-specified instruction formats
may also simplify expression of more complicated instruction sets.
Providing instruction formats may also simplify the support of really
complex ISAs like i386 and m68k.}
1399
1400 Instruction fields must be uniquely named within an instruction set,
1401 but different instruction sets (ISAs) may have ifields with the same name.
1402
1403 The syntax for defining instruction fields is:
1404
1405 @example
1406 (define-ifield
1407   (name field-name)
1408   (comment "description")
1409   (attrs attribute-list)
1410   (word-offset word-offset-in-bits)
1411   (word-length word-length-in-bits)
1412   (start starting-bit-number)
1413   (length number-of-bits)
1414   (follows ifield-name)
1415   (mode mode-name)
1416   (encode (value pc) (rtx to describe encoding))
1417   (decode (value pc) (rtx to describe decoding))
1418 )
1419 @end example
1420
1421 The required elements are: @samp{name}, @samp{start}, @samp{length}.
1422 @footnote{Positional specification simplifies instruction description somewhat
1423 in that there is no required order of fields, and a disjunct set of fields can
1424 be referred to as one.  On the other hand it can require knowledge of the length
1425 of the instruction which is inappropriate in cases like the M32R where
1426 the main fields have the same name and "position" regardless of the length
1427 of the instruction.  Moving positional specification into instruction formats,
1428 whether machine generated or programmer specified, may be done.}
1429
1430 Convention requires @samp{field-name} begin with @samp{f-}.
1431
1432 @subsection attrs
1433
1434 There are several predefined instruction field attributes:
1435
1436 @table @code
1437 @item PCREL-ADDR
1438 The field contains a PC relative address.  Various CPUs have various
1439 offsets from the PC from which the address is calculated.  This is
1440 specified in the encode and decode sections.
1441
1442 @item ABS-ADDR
1443 The field contains an absolute address.
1444
1445 @item SIGN-OPT
The field has an optional sign.  It is sign-extended during
extraction.  Allowable values are -2^(n-1) to (2^n)-1, where n is the
length of the field in bits; for example, -128 to 255 for an 8-bit field.
1448
1449 @item RESERVED
1450 The field is marked as ``reserved'' by the architecture.
1451 This is an informational attribute.  Tools may use it
1452 to validate programs, either statically or dynamically.
1453
1454 @item VIRTUAL
1455 The field does not directly contribute to the instruction's value.  This
1456 is used to simplify semantic or assembler descriptions where a field's
1457 value is based on other values.  Multi-ifields are always virtual.
1458 @end table
1459
1460 @subsection word-offset
1461 The offset in bits from the start of the instruction to the word containing
1462 the field.
1463 This must be a multiple of eight.
1464
Either both of @samp{word-offset} and @samp{word-length} must be
specified, or neither.  The presence of
@samp{word-offset} means the long form of specifying the field's position is
being used.  If absent then the short form is being used and the value for
@samp{word-offset} is encoded in @samp{start}.
1470
1471 @subsection word-length
1472 The length in bits of the word containing the field.
1473 This must be a multiple of eight.
1474
1475 @subsection start
1476 The bit number of the field's most significant bit in the instruction.
1477 Bit numbering is determined by the @code{insn-lsb0?} field of
1478 @code{define-arch}.
1479
1480 If using the long form of specifying the field's position
1481 (i.e., @samp{word-offset} is specified) then this value is the value within
1482 the containing word.  If using the short form then this value includes
1483 the word offset.  See the Porting document for more info
1484 (@pxref{Writing define-ifield}).
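
As a sketch of the long form, the following hypothetical field occupies
all of the second 16-bit word of an instruction.  It assumes
@code{insn-lsb0?} is @code{#f}, so that bit 0 is the most significant bit
of the containing word.

@example
(define-ifield
  (name f-imm16)
  (comment "16 bit immediate in the second 16 bit word")
  (attrs)
  (word-offset 16)   ; containing word starts 16 bits into the insn
  (word-length 16)   ; containing word is 16 bits long
  (start 0)          ; msb of the field, relative to that word
  (length 16)
  (mode UINT)
)
@end example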
1485
1486 @subsection length
1487 The number of bits in the field.  The field must be contiguous.  For
1488 non-contiguous instruction fields use ``multi-ifields''.
1489
1490 @subsection follows
1491 Optional.  Experimental.
1492 This should not be used for the specification of RISC-like architectures.
1493 It is an experiment in supporting CISC-like architectures.
1494 The argument is the name of the ifield or operand that immediately precedes
1495 this one.  In general the argument is an "anyof" operand.  The @code{follows}
1496 spec allows subsequent ifields to ``float''.
1497
1498 @subsection mode
1499 The mode the value is to be interpreted in.
1500 Usually this is @code{INT} or @code{UINT}.
1501
??? There's no real reason why modes like SI can't be used here.
1503 The @samp{length} field specifies the number of bits in the field,
1504 and the @samp{mode} field indicates the mode in which the value will be used,
1505 as well as its signedness.  This would allow removing INT/UINT for this
1506 purpose.  On the other hand, a non-width specific mode allows applications
1507 to choose one (a simulator might prefer to store immediates in an `int'
1508 rather than, say, char if the specified mode was @code{QI}).
1509
1510 @subsection encode
1511 An expression to apply to convert from usable values to raw field
1512 values.  The syntax is @code{(encode (value pc) expression)} or more
1513 generally @code{(encode ((<mode1> value) (IAI pc)) <expression>)},
1514 where @code{<mode1>} is the mode of the ``incoming'' value, and
1515 @code{<expression>} is an rtx to convert @code{value} to something that
1516 can be stored in the field.
1517
1518 Example:
1519
1520 @example
1521 (encode ((SF value) (IAI pc))
1522         (cond WI
1523               ((eq value (const SF 1.0)) (const 0))
1524               ((eq value (const SF 0.5)) (const 1))
1525               ((eq value (const SF -1.0)) (const 2))
1526               ((eq value (const SF 2.0)) (const 3))
1527               (else (error "invalid floating point value for field foo"))))
1528 @end example
1529
1530 In this example four floating point immediate values are represented in a
1531 field of two bits.  The above might be expanded to a series of `if' statements
1532 or the generator could determine a `switch' statement is more appropriate.
1533
1534 @subsection decode
1535
1536 An expression to apply to convert from raw field values to usable
1537 values.  The syntax is @code{(decode (value pc) expression)} or more
1538 generally @code{(decode ((<mode1> value) (IAI pc)) <expression>)},
1539 where @code{<mode1>} is the mode of the ``incoming'' value, and
1540 @code{<expression>} is an rtx to convert @code{value} to something usable.
1541
1542 Example:
1543
1544 @example
1545 (decode ((WI value) (IAI pc))
1546         (cond SF
1547               ((eq value 0) (const SF 1.0))
1548               ((eq value 1) (const SF 0.5))
1549               ((eq value 2) (const SF -1.0))
1550               ((eq value 3) (const SF 2.0))))
1551 @end example
1552
1553 There's no need to provide an error case as presumably @code{value}
1554 would never have an invalid value, though certainly one could provide an
1555 error case if one wanted to.
1556
1557 @subsection Non-contiguous fields
1558 @cindex Instruction fields, non-contiguous
1559
1560 Non-contiguous fields (e.g. sparc64's 16 bit displacement field) are
1561 built on top of support for contiguous fields.  The syntax for defining
1562 such fields is:
1563
1564 @example
1565 (define-multi-ifield
1566   (name field-name)
1567   (comment "description")
1568   (attrs attribute-list)
1569   (mode mode-name)
1570   (subfields field1-name field2-name ...)
1571   (insert (code to set each subfield))
1572   (extract (code to set field from subfields))
1573   (encode (value pc) (rtx to describe encoding))
1574   (decode (value pc) (rtx to describe decoding))
1575 )
1576 @end example
1577
1578 The required elements are: @samp{name}, @samp{subfields}.
1579
1580 Example:
1581
1582 @example
1583 (define-multi-ifield
1584   (name f-i20)
1585   (comment "20 bit unsigned")
1586   (attrs)
1587   (mode UINT)
1588   (subfields f-i20-4 f-i20-16)
1589   (insert (sequence ()
1590                     (set (ifield f-i20-4)  (srl (ifield f-i20) (const 16)))
1591                     (set (ifield f-i20-16) (and (ifield f-i20) (const #xffff)))
1592                     ))
1593   (extract (sequence ()
1594                      (set (ifield f-i20) (or (sll (ifield f-i20-4) (const 16))
1595                                              (ifield f-i20-16)))
1596                      ))
1597 )
1598 @end example
1599
1600 @subsubsection subfields
1601 The names of the already defined fields that make up the multi-ifield.
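
In the @code{f-i20} example above, the two subfields would have to be
defined first, for instance with the @code{dnf} pmacro.  The bit positions
in this sketch are illustrative only.

@example
(dnf f-i20-4  "upper 4 bits of i20"  ()  8  4)
(dnf f-i20-16 "lower 16 bits of i20" () 16 16)
@end example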
1602
1603 @subsubsection insert
1604 Code to set the subfields from the multi-ifield. All fields are referred
1605 to with @code{(ifield <name>)}.
1606
1607 @subsubsection extract
1608 Code to set the multi-ifield from the subfields. All fields are referred
1609 to with @code{(ifield <name>)}.
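
As a worked illustration of the @code{f-i20} example, with the value
@code{#x12345} chosen arbitrarily:

@example
insert:   f-i20-4  = #x12345 >> 16        = #x1
          f-i20-16 = #x12345 & #xffff     = #x2345
extract:  f-i20    = (#x1 << 16) | #x2345 = #x12345
@end example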
1610
1611 @subsection Simplification macros
1612
1613 To simplify @file{.cpu} files several pmacros are provided.
1614
1615 @anchor{a-define-normal-ifield}
1616 @anchor{a-dnf}
1617 The @code{define-normal-ifield} pmacro (with alias @code{dnf})
1618 takes a fixed set of positional arguments for the typical instruction field.
1619 The syntax is:
1620
1621 @code{(dnf name comment attributes start length)}
1622
1623 Example:
1624
1625 @example
1626 (dnf f-r1 "register r1" () 4 4)
1627 @end example
1628
1629 This defines a field called @samp{f-r1} that is an unsigned field of 4
1630 bits beginning at bit 4.  All fields defined with @code{dnf} are unsigned.
1631
1632 @anchor{a-df}
1633 The @code{df} pmacro adds @code{mode}, @code{encode}, and
1634 @code{decode} elements.
1635
1636 The syntax of @code{df} is:
1637
1638 @code{(df name comment attributes start length mode encode decode)}
1639
1640 Example:
1641
1642 @example
1643 (df f-disp8
1644     "disp8, slot unknown" (PCREL-ADDR)
1645     8 8 INT
1646     ((value pc) (sra WI (sub WI value (and WI pc (const -4))) (const 2)))
1647     ((value pc) (add WI (sll WI value (const 2)) (and WI pc (const -4)))))
1648 @end example
1649
1650 This defines a field called @samp{f-disp8} that is a signed PC-relative
1651 address beginning at bit 8 of size 8 bits that is left shifted by 2.
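
To make the arithmetic concrete, suppose (hypothetically) that the
instruction is at address @code{#x1002} and the branch target is
@code{#x1010}:

@example
encode:  (#x1010 - (#x1002 & -4)) >> 2 = (#x1010 - #x1000) >> 2 = 4
decode:  (4 << 2) + (#x1002 & -4)      = #x10 + #x1000          = #x1010
@end example

The target must be a multiple of four for the round trip to be exact,
since the two low bits are discarded by the right shift.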
1652
1653 @anchor{a-define-normal-multi-ifield}
1654 @anchor{a-dnmf}
1655 The macro @code{define-normal-multi-ifield} (with alias @code{dnmf})
1656 takes a fixed set of positional arguments for the typical multi-ifield.
1657 The syntax is:
1658
1659 @code{(dnmf name comment attributes mode subfields insert extract)}
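
For example, the @code{f-i20} multi-ifield shown earlier might be written
with @code{dnmf} as follows.  This is a sketch; it assumes the subfield
names are passed as a single parenthesized list.

@example
(dnmf f-i20 "20 bit unsigned" () UINT
      (f-i20-4 f-i20-16)
      (sequence () ; insert
                (set (ifield f-i20-4)  (srl (ifield f-i20) (const 16)))
                (set (ifield f-i20-16) (and (ifield f-i20) (const #xffff))))
      (sequence () ; extract
                (set (ifield f-i20) (or (sll (ifield f-i20-4) (const 16))
                                        (ifield f-i20-16))))
)
@end example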
1660
1661 @anchor{a-dsmf}
1662 The macro @code{dsmf} takes a fixed set of positional arguments for
1663 simple multi-ifields.
1664 The syntax is:
1665
1666 @code{(dsmf name comment attributes mode subfields)}
1667
1668 @node Enumerated constants
1669 @section Enumerated constants
1670 @cindex Enumerated constants
1671 @cindex Enumerations
1672
1673 Enumerated constants (@emph{enums}) are important enough in instruction
1674 set descriptions that they are given special treatment.
1675 Enums are defined with:
1676
1677 @example
1678 (define-enum
1679   (name enum-name)
1680   (comment "description")
1681   (attrs attribute-list)
1682   (prefix prefix)
1683   (values val1 val2 ...)
1684 )
1685 @end example
1686
1687 Enums in opcode fields are further enhanced by specifying the opcode
1688 field they are used in.  This allows the enum's name to be specified
1689 in an instruction's @code{format} entry.
1690
1691 Instruction enums are defined with @code{define-insn-enum}:
1692
1693 @example
1694 (define-insn-enum
1695   (name enum-name)
1696   (comment "description")
1697   (attrs attribute-list)
1698   (ifield ifield-name)
1699   (prefix prefix)
1700   (values val1 val2 ...)
1701 )
1702 @end example
1703
1704 @emph{define-insn-enum is currently not provided,
1705 use define-normal-insn-enum instead}.
1706 @xref{a-define-normal-insn-enum, define-normal-insn-enum}.
1707
1708 @subsection prefix
1709 Convention requires each enum value to be prefixed with the same text.
1710 Rather than specifying the prefix in each entry, it is specified once, here.
1711 Convention requires @samp{prefix} not contain any lowercase characters.
1712 You generally want to end @samp{prefix} with @samp{-} or @samp{_}
1713 as the complete name of each enum value is @samp{prefix} + @samp{value-name}.
1714 The convention is to use @samp{-}, though this convention is not
1715 adhered to as well as the other conventions.
1716 @c FIXME
1717
1718 The default value is @samp{""}.
1719
1720 @subsection ifield
1721 The name of the instruction field that the enum is intended for.  This
1722 must be a simple ifield, not a multi-ifield.
1723
1724 @anchor{a-enum-values}
1725 @subsection values
1726 A list of possible values.  Each element has one of the following forms:
1727
1728 @itemize @bullet
1729 @item @code{name}
1730 @item @code{(name)}
1731 @item @code{(name value)}
1732 @item @code{(name - (attribute-list))}
1733 @item @code{(name value (attribute-list))}
1734 @end itemize
1735
1736 The syntax for numbers is Scheme's, so hex numbers are @code{#xnnnn}.
1737 A value of @code{-} means use the next value (previous value plus 1).
1738
1739 Enum values currently always have mode @samp{INT}.
1740
1741 Example:
1742
1743 @example
1744 (values "a" ("b") ("c" #x12)
1745         ("d" - (sanitize foo)) ("e" #x1234 (sanitize bar)))
1746 @end example
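
To show how @samp{prefix} and @samp{-} combine, here is a sketch of a
complete enum.  The enum name and its values are hypothetical, and the
empty attribute list @code{()} after @code{-} is assumed to be
acceptable.

@example
(define-enum
  (name shift-kind)
  (comment "shift kinds (illustrative)")
  (attrs)
  (prefix SHIFT-)
  (values ("LSL" 0) ("LSR" - ()) ("ASR" - ()) ("ROR" #x8))
)
@end example

Under those assumptions the complete names and values would be
@samp{SHIFT-LSL} = 0, @samp{SHIFT-LSR} = 1, @samp{SHIFT-ASR} = 2
and @samp{SHIFT-ROR} = 8.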
1747
1748 @subsection Simplification macros
1749
1750 To simplify @file{.cpu} files several pmacros are provided.
1751
1752 @anchor{a-define-normal-enum}
1753 The @code{define-normal-enum} pmacro takes a fixed set of
1754 positional arguments for the typical enum.
1755 The syntax is:
1756
1757 @code{(define-normal-enum name comment attrs prefix vals)}
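
A minimal sketch of its use follows.  The enum name, prefix and values
are hypothetical, and every value is given explicitly so that the
example does not depend on default numbering.

@example
(define-normal-enum cond-code "condition codes (illustrative)" () CC-
  (("EQ" 0) ("NE" 1) ("LT" 2) ("GE" 3))
)
@end example

This would define @samp{CC-EQ}, @samp{CC-NE}, @samp{CC-LT} and
@samp{CC-GE}.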
1758
1759 @anchor{a-define-normal-insn-enum}
1760 The @code{define-normal-insn-enum} pmacro takes a fixed set of
1761 positional arguments for the typical instruction enum.
1762 The syntax is:
1763
1764 @code{(define-normal-insn-enum name comment attrs prefix ifield vals)}
1765
1766 Example:
1767
1768 @example
1769 (dnf f-op1 "op1" () 0 4)
1770 (define-normal-insn-enum insn-op1 "insn format enums" () OP1_ f-op1
1771   (.map .str (.iota 16))
1772 )
1773 @end example
1774
1775 This defines an instruction enum for field @samp{f-op1} with values
1776 OP1_0, OP1_1, ..., OP1_15.  These values can be directly used in
1777 instruction format specs.  This applies to ``instruction enums'' only.
1778 One can use normal enums in instruction format specs but one needs to
1779 explicitly specify the ifield, e.g. (f-op1 OP1_0).
1780
1781 @node Keywords
1782 @section Keywords
1783 @cindex Keywords
1784
1785 Keywords are like enums (@pxref{Enumerated constants}),
1786 but they also cause a table of names of each value to be generated.
1787 This is useful for things like registers where you want
1788 arbitrary tools to have access to the table of names.
1789
1790 The syntax for defining keywords changed from RTL version 0.7 to
1791 RTL version 0.8.  @xref{RTL Versions}.
1792
1793 RTL version 0.7 syntax:
1794
1795 @example
1796 (define-keyword
1797   (name keyword-name)
1798   (comment "description")
1799   (attrs attribute-list)
1800   (mode mode-name)
1801   (print-name "prefix-for-enum-values-without-trailing-dash")
1802   (prefix "prefix-for-names-in-string-table")
1803   (values value-list)
1804 )
1805 @end example
1806
1807 RTL version 0.8 syntax:
1808
1809 @example
1810 (define-keyword
1811   (name keyword-name)
1812   (comment "description")
1813   (attrs attribute-list)
1814   (mode mode-name)
1815   (enum-prefix "prefix-for-enum-values")
1816   (name-prefix "prefix-for-names-in-string-table")
1817   (values value-list)
1818 )
1819 @end example
1820
1821 Note that @samp{print-name} has been replaced with @samp{enum-prefix}
1822 and @samp{prefix} has been replaced with @samp{name-prefix}.
1823
1824 Furthermore, there is also a difference between the behavior of
1825 @samp{print-name} and @samp{enum-prefix}.
1826 When computing complete enum names with @samp{print-name},
1827 CGEN adds a @samp{-} between the prefix and the enum name.
1828 CGEN does not insert a @samp{-} with @samp{enum-prefix}.
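
The difference can be sketched by writing one keyword both ways.  The
value list is a shortened, illustrative one, and the two forms would of
course not appear in the same description file.

@example
; RTL version 0.7: CGEN inserts the "-" itself.
(define-keyword
  (name gr-names)
  (print-name H-GR)
  (prefix "")
  (values (fp 13) (lr 14) (sp 15))
)

; RTL version 0.8 and higher: the "-" is written as part of enum-prefix.
(define-keyword
  (name gr-names)
  (enum-prefix H-GR-)
  (name-prefix "")
  (values (fp 13) (lr 14) (sp 15))
)
@end example

Following the rules above, both forms should yield the enum names
@samp{H-GR-fp}, @samp{H-GR-lr} and @samp{H-GR-sp}.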
1829
1830 @subsection mode
1831
1832 This is the mode to reference and record the keyword's value in.
1833 The default is @samp{INT}.  It is normally not necessary to use
1834 something else.
1835
1836 @subsection print-name
1837
1838 @emph{NOTE: This is for RTL version 0.7 only.}
1839
1840 This value plus a trailing @samp{-} is passed as the @samp{prefix}
1841 parameter when defining the corresponding enum.  @xref{Enumerated constants}.
1842
1843 Convention requires @samp{print-name} not contain any lowercase characters.
1844
1845 The default value is the keyword's name in uppercase.
1846
1847 @subsection prefix
1848
1849 @emph{NOTE: This is for RTL version 0.7 only.}
1850
1851 @samp{prefix} is the assembler prefix common to each of the index names,
1852 and is added to name in the generated lookup table.
1853 For example, SPARC registers usually begin with @samp{"%"}.
1854 It is @emph{not} added to the corresponding enum value names.
1855
1856 The default value is @samp{""}.
1857
1858 @subsection enum-prefix
1859
1860 @emph{NOTE: This is for RTL version 0.8 and higher.
1861 You must specify the RTL version at the top of the description file.}
1862
1863 This value is passed as the @samp{prefix} parameter when defining the
1864 corresponding enum.  @xref{Enumerated constants}.
1865
1866 @emph{NOTE:} Unlike @samp{print-name} in RTL version @samp{0.7},
1867 @samp{-} is not appended when defining the corresponding enum.
1868
1869 Convention requires @samp{enum-prefix} not contain any lowercase characters.
1870
1871 The default value is the keyword's name in uppercase + @samp{-}.
1872
1873 @subsection name-prefix
1874
1875 @emph{NOTE: This is for RTL version 0.8 and higher.
1876 You must specify the RTL version at the top of the description file.}
1877
1878 @samp{name-prefix} is the assembler prefix common to each of the index names,
1879 and is added to name in the generated lookup table.
1880 For example, SPARC registers usually begin with @samp{"%"}.
1881 It is @emph{not} added to the corresponding enum value names.
1882
1883 The default value is @samp{""}.
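
As a sketch of the SPARC-like case, the keyword below uses
hypothetical register names and numbers.

@example
(define-keyword
  (name gr-names)
  (enum-prefix H-GR-)
  (name-prefix "%")
  (values (g0 0) (g1 1) (g2 2) (g3 3))
)
@end example

Going by the description above, the generated name table would hold
@samp{%g0} through @samp{%g3}, while the enum values would remain
@samp{H-GR-g0} through @samp{H-GR-g3}.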
1884
1885 @subsection values
1886
1887 The @samp{values} field has the same syntax as the @samp{values}
1888 field of @samp{define-enum}.  @xref{a-enum-values, Enum Values}.
1889
1890 Example from M32R:
1891
1892 @smallexample
1893 (define-keyword
1894   (name gr-names)
1895   (enum-prefix H-GR-)
1896   (values (fp 13) (lr 14) (sp 15)
1897           (r0 0) (r1 1) (r2 2) (r3 3) (r4 4) (r5 5) (r6 6) (r7 7)
1898           (r8 8) (r9 9) (r10 10) (r11 11) (r12 12) (r13 13) (r14 14) (r15 15))
1899 )
1900 @end smallexample
1901
1902 Referencing enum values from this keyword in the .cpu file would use
1903 @samp{H-GR-} + @samp{register-name}.  E.g., @samp{H-GR-r12}.
1904
1905 @node Instruction operands
1906 @section Instruction operands
1907 @cindex Instruction operands
1908 @cindex Operands, instruction
1909
1910 Instruction operands provide:
1911
1912 @itemize @bullet
1913 @item a layer between the assembler and the raw hardware description
1914 @item the main means of making an instruction's fields useful to
1915 the semantic code
1916 @c More?
1917 @end itemize
1918
1919 Instruction operands must be uniquely named within an instruction set,
1920 but different instruction sets (ISAs) may have operands with the same name.
1921
1922 The syntax for defining an operand is:
1923
1924 @example
1925 (define-operand
1926   (name operand-name)
1927   (comment "description")
1928   (attrs attribute-list)
1929   (type hardware-element)
1930   (mode mode-name)
1931   (index instruction-field)
1932   (handlers handler-spec)
1933   (getter getter-spec)
1934   (setter setter-spec)
1935 )
1936 @end example
1937
1938 The required elements are: @code{name}, @code{type}, @code{mode},
1939 and, if @code{type} is not a scalar, @code{index}.
1940
1941 @subsection name
1942
1943 This is the name of the operand as a Scheme symbol.
1944 The name choice is fairly important as it is used in instruction
1945 syntax entries, instruction format entries, and semantic expressions.
1946 It can't collide with symbols used in semantic expressions
1947 (e.g. @code{and}, @code{set}, etc).
1948
1949 The convention is that operands have no prefix (whereas ifields begin
1950 with @samp{f-} and hardware elements begin with @samp{h-}).  A prefix
1951 like @samp{o-} would avoid collisions with other semantic elements, but
1952 operands are used often enough that any prefix is a hassle.
1953
1954 Note that if you @emph{do} decide to prefix operand names, e.g. use
1955 a style like @samp{o-foo}, then you will need to remember to use the
1956 @samp{$@{o-foo@}} form in the assembler syntax and not the @samp{$o-foo}
1957 form because the latter only takes alphanumeric characters.
1958 @xref{assembler-syntax, syntax}.
1959
1960 @subsection attrs
1961
1962 A list of attributes. In addition to attributes defined for the operand,
1963 an operand inherits the attributes of its instruction field. There are
1964 several predefined operand attributes:
1965
1966 @table @code
1967 @item NEGATIVE
1968 The operand contains negative values (not used yet, so the definition is
1969 still nebulous).
1970
1971 @item RELAX
1972 This operand contains the changeable field (usually a branch address) of
1973 a relaxable/relaxed instruction.
1974
1975 @item SEM-ONLY
1976 Use the SEM-ONLY attribute for cases where the operand will only be used
1977 in semantic specification, and not assembly code specification.  A
1978 typical example is condition codes.
1979 @c Does this attribute need to exist?
1980 @end table
1981
1982 To refer to a hardware element in semantic code one must either use an
1983 operand or one of reg/mem/const.  Operands generally exist to map
1984 instruction fields to the selected hardware element and are easier to
1985 use in semantic code than referring to the hardware element directly
1986 (e.g. @code{sr} is easier to type and read than @code{(reg h-gr
1987 <index>)}). Example:
1988
1989 @example
1990   (dnop condbit "condition bit" (SEM-ONLY) h-cond f-nil)
1991 @end example
1992
1993 @code{f-nil} is the value to use when there is no instruction field.
1994
1995 @c There might be some language cleanup to be done here regarding f-nil.
1996 @c It is kind of extraneous.
1997
1998 @subsection type
1999 The hardware element this operand applies to. This must be the name of a
2000 hardware element.
2001
2002 @subsection mode
2003 The mode the value is to be interpreted in.
2004
2005 @subsection index
2006 The index of the hardware element. This is used to mate the hardware
2007 element with the instruction field that selects it, and must be the name
2008 of an ifield entry. (*Note: The index may be other things besides
2009 ifields in the future.)  It must not be a multi-ifield, currently.
2010
2011 @subsection handlers
2012 Sometimes it's necessary to escape to C to parse assembler, or print
2013 a value.  This field is an escape hatch to implement this.
2014 The syntax is:
2015
2016 @code{(handlers handler-spec)}
2017
2018 where @code{handler-spec} is one or more of:
2019
2020 @code{(parse "function_suffix")} -- a call to function
2021 @code{parse_<function_suffix>} is generated.
2022
2023 @code{(print "function_suffix")} -- a call to function
2024 @code{print_<function_suffix>} is generated.
2025
2026 These functions are intended to be provided in a separate @file{.opc}
2027 file.  The prototype of a parse function depends on the hardware type.
2028 See @file{cpu/*.opc} for examples.
2029
2030 @c FIXME: The following needs review.
2031
2032 For integers it is:
2033
2034 @example
2035 static const char *
2036 parse_foo (CGEN_CPU_DESC cd,
2037            const char **strp,
2038            int opindex,
2039            unsigned long *valuep);
2040 @end example
2041
2042 @code{cd} is the result of @code{<arch>_cgen_cpu_open}.
2043 @code{strp} is a pointer to a pointer to the assembler and is updated by
2044 the function.
2045 @c FIXME
2046 @code{opindex} is ???.
2047 @code{valuep} is a pointer to where to record the parsed value.
2048 @c FIXME
2049 If a relocation is needed, it is queued with a call to ???. Queued
2050 relocations are processed after the instruction has been parsed.
2051
2052 The result is an error message or NULL if successful.
2053
2054 The prototype of a print function depends on the hardware type.  See
2055 @file{cpu/*.opc} for examples. For integers it is:
2056
2057 @example
2058 void print_foo (CGEN_CPU_DESC cd,
2059                 PTR dis_info,
2060                 long value,
2061                 unsigned int attrs,
2062                 bfd_vma pc,
2063                 int length);
2064 @end example
2065
2066 @samp{cd} is the result of @code{<arch>_cgen_cpu_open}.
2067 @samp{dis_info} is the `info' argument to @code{print_insn_<arch>}.
2068 @samp{value} is the value to be printed.
2069 @samp{attrs} is the set of boolean attributes.
2070 @samp{pc} is the PC value of the instruction.
2071 @samp{length} is the length of the instruction.
2072
2073 Actual printing is done by calling @code{((disassemble_info *)
2074 dis_info)->fprintf_func}.
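
To tie @code{handlers} back to an operand definition, here is a sketch
of an operand that requests both a parse and a print handler.  The
operand name, the ifield @code{f-simm8} and the use of @code{h-sint} as
the hardware element are assumptions made for illustration.

@example
(define-operand
  (name simm8)
  (comment "8 bit signed immediate")
  (attrs)
  (type h-sint)           ; assumed hardware element
  (mode INT)
  (index f-simm8)         ; assumed ifield
  (handlers (parse "simm8") (print "simm8"))
)
@end example

With this definition, calls to @code{parse_simm8} and
@code{print_simm8} would be generated, and both functions would have to
be supplied in the port's @file{.opc} file.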
2075
2076 @subsection Simplification macros
2077
2078 To simplify @file{.cpu} files several pmacros are provided.
2079
2080 @anchor{a-define-normal-operand}
2081 @anchor{a-dno}
2082 @anchor{a-dnop}
2083 The @code{define-normal-operand} pmacro (with alias @code{dno})
2084 takes a fixed set of positional arguments for the typical operand.
2085
2086 There is also the @code{dnop} pmacro, which is an alias of @code{dno}.
2087
2088 The syntax of @code{dno} is:
2089
2090 @code{(dno name comment attrs type index)}
2091
2092 Example:
2093
2094 @example
2095 (dno sr "source register" () h-gr f-r2)
2096 @end example
2097
2098 This defines an operand named @samp{sr} that is an @samp{h-gr} register
2099 indexed by the @samp{f-r2} ifield.
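
For comparison, the same operand can be sketched in the long form.
Since @code{dno} takes no mode argument, @code{DFLT} is used here as an
assumption about the mode it supplies.

@example
(define-operand
  (name sr)
  (comment "source register")
  (attrs)
  (type h-gr)
  (mode DFLT)             ; assumed; dno has no mode argument
  (index f-r2)
)
@end example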
2100
2101 @node Derived operands
2102 @section Derived operands
2103 @cindex Derived operands
2104 @cindex Operands, instruction
2105 @cindex Operands, derived
2106
2107 Derived operands are an experiment in supporting the addressing modes of
2108 CISC-like architectures.  Addressing modes are difficult to support as
2109 they essentially increase the number of instructions in the architecture
2110 by an order of magnitude.  Defining all the variants requires something
2111 in addition to the RISC-like architecture support.  The theory is that
2112 since CISC-like instructions are basically "normal" instructions with
2113 complex operands the place to add the necessary support is in the
2114 operands.
2115
2116 Two kinds of operands exist to support CISC-like cpus, and they work
2117 together.  ``derived-operands'' describe one variant of a complex
2118 argument, and ``anyof'' operands group them together.
2119
2120 The syntax for defining derived operands is:
2121
2122 @example
2123 (define-derived-operand
2124   (name operand-name)
2125   (comment "description")
2126   (attrs attribute-list)
2127   (mode mode-name)
2128   (args arg1-operand-name arg2-operand-name ...)
2129   (syntax "syntax")
2130   (base-ifield ifield-name)
2131   (encoding (+ arg1-operand-name arg2-operand-name ...))
2132   (ifield-assertion expression)
2133   (getter expression)
2134   (setter expression)
2135 )
2136 @end example
2137
2138 @cindex anyof operands
2139 @cindex Operands, anyof
2140
2141 The syntax for defining anyof operands is:
2142
2143 @example
2144 (define-anyof-operand
2145   (name operand-name)
2146   (comment "description")
2147   (attrs attribute-list)
2148   (mode mode-name)
2149   (base-ifield ifield-name)
2150   (choices derived-operand1-name derived-operand2-name ...)
2151 )
2152 @end example
2153
2154 @subsection mode
2155
2156 The name of the mode of the operand.
2157
2158 @subsection args
2159
2160 List of names of operands the derived operand uses.
2161 The operands must already be defined.
2162 The argument operands can be any kind of operand: normal, derived, anyof.
2163
2164 @subsection syntax
2165
2166 Assembler syntax of the operand.
2167
2168 ??? This part needs more work.  Addressing mode specification in assembler
2169 needn't be localized to the vicinity of the operand.
2170
2171 @subsection base-ifield
2172
2173 The name of the instruction field common to all related derived operands.
2174 Here related means "used by the same `anyof' operand".
2175
2176 @subsection encoding
2177
2178 The machine encoding of the operand.
2179
2180 @subsection ifield-assertion
2181
2182 An assertion of what values any instruction fields will or will not have
2183 in the containing instruction.
2184
2185 @anchor{ifield-assertion-rtl}
2186 The syntax of the assertion is a restricted subset of RTL.
2187 It may only contain @samp{andif}, @samp{eq}, @samp{ne},
2188 and may only use scalar instruction fields
2189 @footnote{A scalar instruction field is a simple ifield
2190 (not a multi or derived ifield), or a multi-ifield consisting
2191 of only simple ifields.}
2192 and integers as operands.
2193 Furthermore, ifields must be specified in the first operand of
2194 @samp{eq}, @samp{ne}.
2195
2196 As a degenerate case, a single non-zero integer is also supported,
2197 meaning the assertion passes.
2198
2199 In addition, the assertion may also use @samp{member}.
2200
2201 Syntax: @code{(member ifield-name (number-list value1 [value2 ...]))}
2202 @footnote{Like all rtx, the full syntax is
2203 @code{(member [(options)] [member-mode] ifield-name (number-list [(options)] [numlist-mode] value1 [value2 ...]))},
2204 but @samp{options} and @samp{mode} are not really useful here.
2205 @samp{member-mode} is @samp{BI}, since the result is a boolean value.}
2206
2207 The result of @samp{member} is one if the value of the ifield
2208 is a member of the list @code{(value1 [value2 ...])}.
2209 Otherwise the result is zero.
2210
2211 If the result of the assertion is non-zero, the assertion passes.
2212 Otherwise it fails, and the instruction is not selected for that
2213 particular bit pattern.
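
The following sketch shows two assertions written under the rules
above.  The ifield names @code{f-mode} and @code{f-reg} and the
constants are hypothetical.

@example
; Passes only when f-mode is 2 and f-reg is not 15.
(ifield-assertion (andif (eq f-mode 2) (ne f-reg 15)))

; Passes when f-mode is 0, 1 or 2.
(ifield-assertion (member f-mode (number-list 0 1 2)))
@end example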
2214
2215 @subsection getter
2216
2217 RTL expression to get the value of the operand.
2218 All operands referred to must be specified in @code{args}.
2219
2220 @subsection setter
2221
2222 RTL expression to set the value of the operand.
2223 All operands referred to must be specified in @code{args}.
2224 Use @code{newval} to refer to the value to be set.
2225
2226 @subsection choices
2227
2228 For anyof operands, the names of the derived operands.
2229 The operand may be "any of" the specified choices.
2230
2231 @node Instructions
2232 @section Instructions
2233 @cindex Instructions
2234
2235 Each instruction in the instruction set has an entry in the description
2236 file.
2237 @footnote{For complicated instruction sets this is a lot of typing.  However,
2238 macros can reduce a lot of that typing.  The real question is given the
2239 amount of information that must be expressed, how succinctly can one express
2240 it and still be clean and usable?  I'm open to opinions on how to improve
2241 this, but such improvements must take everything CGEN wishes to be into
2242 account.
2243 (*Note: Of course no claim is made that the current design is the
2244 be-all and end-all or that there is one be-all and end-all.)}
2245
2246 Instructions must be uniquely named within an instruction set,
2247 but different instruction sets (ISAs) may have instructions with the same name.
2248
2249 The syntax for defining an instruction is:
2250
2251 @example
2252 (define-insn
2253   (name insn-name)
2254   (comment "description")
2255   (attrs attribute-list)
2256   (syntax "assembler syntax")
2257   (format (+ field-list))
2258   (ifield-assertion expression)
2259   (semantics expression)
2260   (timing timing-data)
2261 )
2262 @end example
2263
2264 The required elements are: @code{name}, ???.
2265
2266 Instructions specific to a particular cpu variant are denoted as such with
2267 the MACH attribute.
2268
2269 Possible additions for the future:
2270
2271 @itemize @bullet
2272 @item a field to describe a final constraint for determining a match
2273 @item choosing the output from a set of choices
2274 @end itemize
2275
2276 @subsection attrs
2277
2278 A list of attributes, for which there are several predefined instruction
2279 attributes:
2280
2281 @table @code
2282 @item MACH
2283 A bitset attribute used to specify which machines have this
2284 instruction. Do not specify the MACH attribute if the instruction is for
2285 all machines.
2286
2287 Usage: @code{(MACH mach1,mach2,...)}
2288
2289 There must be no spaces in ``@code{mach1,mach2,...}''.
2290
2291 @item UNCOND-CTI
2292 The instruction is an unconditional ``control transfer instruction''.
2293
2294 (*Note: This attribute is derived from the semantic code. However if the
2295 computed value is wrong (dunno if it ever will be) the value can be
2296 overridden by explicitly mentioning it.)
2297
2298 @item COND-CTI
2299 The instruction is a conditional ``control transfer instruction''.
2300
2301 (*Note: This attribute is derived from the semantic code. However if the
2302 computed value is wrong (dunno if it ever will be) the value can be
2303 overridden by explicitly mentioning it.)
2304
2305 @item SKIP-CTI
2306 The instruction can cause one or more insns to be skipped. This is
2307 derived from the semantic code.
2308
2309 @item DELAY-SLOT
2310 The instruction has one or more delay slots. This is derived from the
2311 semantic code.
2312
2313 @item RELAXABLE
2314 The instruction has one or more identical variants.  The assembler tries
2315 this one first and then the relaxation phase switches to larger ones as
2316 necessary.
2317
2318 @item RELAXED
2319 The instruction is a non-minimal variant of a relaxable instruction.  It
2320 is avoided by the assembler in the first pass.
2321
2322 @item ALIAS
2323 Internal attribute set for macro-instructions that are an alias for one
2324 real insn.
2325
2326 @item NO-DIS
2327 For macro-instructions, don't use during disassembly.
2328 @end table
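
As a sketch of the @code{MACH} usage, the instruction below is
restricted to two machines.  The machine names @code{mach-a} and
@code{mach-b} are hypothetical; the remaining arguments follow the
@code{dni} pmacro described under ``Simplification macros'' in this
section.

@example
(dni addi "add 8 bit signed immediate"
     ((MACH mach-a,mach-b))      ; no spaces around the comma
     "addi $dr,$simm8"
     (+ OP1_4 dr simm8)
     (set dr (add dr simm8))
     ()
)
@end example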
2329
2330 @anchor{assembler-syntax}
2331 @subsection syntax
2332
2333 This is a character string consisting of raw characters and operands.
2334 Fields are denoted by @code{$operand} or
2335 @code{$@{operand@}}.  The @code{$@{operand@}} form is required if
2336 the operand name contains non-alphanumeric characters.
2337 @c ??? Technically, '_' and '@' are ok too, I think, but do we want that?
2338 If a @samp{$} is required in the syntax, it is specified with @samp{\$}.
2339 If a @samp{\} is required in the syntax, it is specified with @samp{\\}.
2340
2341 At most one white-space character may be
2342 present and it must be a blank separating the instruction mnemonic from
2343 the operands.  This doesn't restrict the user's assembler, this is
2344 @c Is this reasonable?
2345 just a description file restriction to separate the mnemonic from the
2346 operands@footnote{The restriction can be relaxed by saying the first
2347 blank is the one that separates the mnemonic from its operands.}.
2348 Note that the assembler will accept multiple spaces in the assembler code
2349 after the mnemonic and between operands as expected.
2350
2351 Operands can refer to registers, constants, and whatever else is necessary.
2352
2353 Instruction mnemonics can take operands.  For example, on the SPARC a
2354 branch instruction can take @code{,a} as an argument to indicate the
2355 instruction is being annulled (e.g. @code{bge$a $disp22}).
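
A few syntax strings illustrate these rules.  The operand names
@code{o-dr} and @code{o-sr} are hypothetical and stand for operands
whose names contain a non-alphanumeric character.

@example
"add $dr,$sr"               ; alphanumeric operand names
"add $@{o-dr@},$@{o-sr@}"       ; names containing "-" need the braces
"bge$a $disp22"             ; operand attached to the mnemonic, one blank
@end example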
2356
2357 @subsection format
2358
2359 This is a complete list of fields that specify the instruction.  At
2360 present it must be prefaced with @code{+} to allow for future additions.
2361 Reserved bits must also be specified; gaps are not allowed.
2362 @c Well, actually I think they are and it could certainly be allowed.
2363 @c Question: should they be allowed?
2364 The ordering of the fields is not important.
2365
2366 Format elements can be any of:
2367
2368 @itemize @bullet
2369 @item an instruction field name with an integer, e.g. @code{(f-op1 4)}
2370 @item an instruction field name with an enum, e.g. @code{(f-op1 OP1_4)}
2371 @item an instruction field enum, e.g. @code{OP1_4}
2372 @item an operand name, e.g. @code{dr}
2373 @end itemize
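
As a sketch, the three format specs below should describe the same
encoding.  This assumes that @code{OP1_4} is an instruction enum for
@code{f-op1} with value 4, and that @code{dr} and @code{simm8} are
operands.

@example
(+ OP1_4 dr simm8)            ; instruction enum, field is implied
(+ (f-op1 OP1_4) dr simm8)    ; field named explicitly, enum value
(+ (f-op1 4) dr simm8)        ; field named explicitly, integer value
@end example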
2374
2375 @subsection ifield-assertion
2376
2377 This is an expression with a boolean result that is run as the final
2378 part of instruction decoding to verify a match.
2379
2380 The syntax of the assertion is a restricted subset of RTL.
2381 @xref{ifield-assertion-rtl}.
2382
2383 @subsection semantics
2384 @cindex Semantics
2385
2386 This field provides a mathematical description of what the instruction
2387 does.  Its syntax is GCC RTL-like on purpose since GCC's RTL is well
2388 known by the intended audience.  However, it is not intended that it be
2389 precisely GCC RTL.
2390
2391 Obviously there are some instructions that are difficult if not
2392 impossible to provide a description for (e.g. I/O instructions).  Rather
2393 than create a new semantic function for each quirky operation, escape
2394 hatches to C are provided to handle all such cases.  The @code{c-code},
2395 @code{c-call} and @code{c-raw-call} semantic functions provide an
2396 escape-hatch to invoke C code to perform the
2397 operation. @xref{Expressions}.
2398
2399 @subsection timing
2400 @cindex Timing
2401
2402 A list of entries for each function unit the instruction uses on each machine
2403 that supports the instruction.  The default function unit is the u-exec unit.
2404
2405 The syntax is:
2406
2407 @example
2408 (model-name (unit name (direction unit-var-name1 insn-operand-name1)
2409                        (direction unit-var-name2 insn-operand-name2)
2410                        ...
2411                        (cycles cycle-count)))
2412 @end example
2413
2414 direction/unit-var-name/insn-operand-name mappings are optional.
2415 They map unit inputs/outputs to semantic elements.  The
2416 direction specifier can be @code{in} or @code{out} mapping the
2417 name of a unit input or output, respectively, to an insn
2418 operand.
2419
2420 @code{cycles} overrides the @code{done} value (latency) of the function
2421 unit and is optional.
2422
2423 @subsection Simplification macros
2424
2425 To simplify @file{.cpu} files several pmacros are provided.
2426
2427 @anchor{a-define-normal-insn}
2428 @anchor{a-dni}
2429 The @code{define-normal-insn} pmacro (with alias @code{dni})
2430 takes a fixed set of positional arguments for the typical instruction.
2431
2432 The syntax of @code{dni} is:
2433
2434 @code{(dni name comment attrs syntax format semantics timing)}
2435
2436 Example:
2437
2438 @example
2439 (dni addi "add 8 bit signed immediate"
2440      ()
2441      "addi $dr,$simm8"
2442      (+ OP1_4 dr simm8)
2443      (set dr (add dr simm8))
2444      ()
2445 )
2446 @end example
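
For comparison, the same instruction can be sketched in the long form.
The @code{timing} element is left out here on the assumption that the
default function unit then applies.

@example
(define-insn
  (name addi)
  (comment "add 8 bit signed immediate")
  (attrs)
  (syntax "addi $dr,$simm8")
  (format (+ OP1_4 dr simm8))
  (semantics (set dr (add dr simm8)))
)
@end example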
2447
2448 @node Macro-instructions
2449 @section Macro-instructions
2450 @cindex Macro-instructions
2451 @cindex Instructions, macro
2452
2453 Macro-instructions are for the assembler side of things and are not used
2454 by the simulator.
2455
2456 Macro-instructions must be uniquely named within an instruction set,
2457 but different instruction sets (ISAs) may have macro-instructions
2458 with the same name.
2459
2460 The syntax for defining a macro-instruction is:
2461
2462 @example
2463 (define-macro-insn
2464   (name macro-insn-name)
2465   (comment "description")
2466   (attrs attribute-list)
2467   (syntax "assembler syntax")
2468   (expansions expansion-spec)
2469 )
2470 @end example
2471
2472 @subsection syntax
2473
2474 Syntax of the macro-instruction. This has the same form as the
2475 @code{syntax} field in @code{define-insn}.
2476
2477 @subsection expansions
2478
2479 An expression to emit code for the instruction.  This is intended to be
2480 general in nature, allowing tests to be done at runtime that choose the
2481 form of the expansion.  Currently the only supported form is:
2482
2483 @code{(emit insn arg1 arg2 ...)}
2484
2485 where @code{insn} is the name of an instruction defined with
2486 @code{define-insn} and @emph{argn} is the set of operands to
2487 @code{insn}'s syntax.  Each argument is mapped in order to one operand
2488 in @code{insn}'s syntax and may be any of:
2489
2490 @itemize @bullet
2491 @item operand specified in @code{syntax}
2492 @item @code{(operand value)}
2493 @end itemize
2494
2495 @subsection Simplification macros
2496
2497 To simplify @file{.cpu} files several pmacros are provided.
2498
2499 @anchor{a-define-normal-macro-insn}
2500 @anchor{a-dnmi}
2501 The @code{define-normal-macro-insn} pmacro (with alias @code{dnmi})
2502 takes a fixed set of positional arguments for the typical macro-instruction.
2503
2504 The syntax of @code{dnmi} is:
2505
2506 @code{(dnmi name comment attrs syntax expansion)}
2507
2508 Example:
2509
2510 @example
2511 (dni st-minus "st-" ()
2512      "st $src1,@@-$src2"
2513      (+ OP1_2 OP2_7 src1 src2)
2514      (sequence ((WI new-src2))
2515                (set new-src2 (sub src2 (const 4)))
2516                (set (mem WI new-src2) src1)
2517                (set src2 new-src2))
2518      ()
2519 )
2520 @end example
2521
2522 @example
2523 (dnmi push "push" ()
2524   "push $src1"
2525   (emit st-minus src1 (src2 15)) ; "st %0,@@-sp"
2526 )
2527 @end example
2528
2529 In this example, the @code{st-minus} instruction is a general
2530 store-and-decrement instruction and @code{push} is a specialized version
2531 of it that uses the stack pointer.
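
The @code{push} macro-instruction can also be sketched in the long
form, following the @code{define-macro-insn} syntax given earlier in
this section.

@example
(define-macro-insn
  (name push)
  (comment "push")
  (attrs)
  (syntax "push $src1")
  (expansions (emit st-minus src1 (src2 15)))
)
@end example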
2532
2533 @node Modes
2534 @section Modes
2535 @cindex Modes
2536
2537 Modes provide a simple and succinct way of specifying data types.
2538
2539 (*Note: Should more complex types be needed (e.g. structs? unions?),
2540 these can be handled by extending the definition of a mode to encompass them.)
2541 @c Also, have registers as just bits and have the operand / semantic operation
2542 @c provide the mode.
2543
2544 Modes are similar to their usage in GCC, but there are some differences:
2545
2546 @itemize @bullet
2547 @item modes for boolean values (i.e. bits) are also supported as they are
2548 useful
2549 @item integer modes exist in signed and unsigned versions
2550 @item constants have modes
2551 @end itemize
2552
2553 Currently supported modes are:
2554
2555 @table @code
2556 @item VOID
2557 VOIDmode in GCC.
2558
2559 @item DFLT
2560 Indicate the default mode is wanted, the value of which depends on context.
2561 This is a pseudo-mode and never appears in generated code.
2562
2563 @item BI
2564 Boolean zero/one
2565
2566 @item QI,HI,SI,DI
2567 Same as GCC.
2568
2569 QI is an 8 bit quantity ("quarter int").
2570 HI is a 16 bit quantity ("half int").
2571 SI is a 32 bit quantity ("single int").
2572 DI is a 64 bit quantity ("double int").
2573
2574 In cases where signedness matters, these modes are signed.
2575
2576 @item UQI,UHI,USI,UDI
2577 Unsigned versions of QI,HI,SI,DI.
2578
These modes do not appear in semantic RTL.  Instead, the RTL function
specifies the signedness of its operands where necessary.
To a CPU, a 32 bit register is a 32 bit register,
and the same holds when the 32 bit quantity lives in memory.
Signedness only comes into play in how the value is subsequently
used or interpreted.
When signedness matters on the chip, it is explicitly
specified in the operation, @emph{not} in the data.
From this perspective, unsigned modes don't belong in .cpu files,
and this is the perspective to use when writing .cpu files.
2589
2590 @c I'm not entirely sure these unsigned modes are needed.
2591 @c They are useful in removing any ambiguity in how to sign extend constants
2592 @c which has been a source of problems in GCC.
2593 @c OTOH, maybe adding uconst akin to const is the way to go?
2594 @c
2595 @c ?? Some existing ports use these modes.
2596
2597 @item WI,UWI
Word int, unsigned word int (@code{word_mode} in GCC).
2599 These are aliases for the real mode, typically either @code{SI} or @code{DI}.
2600
2601 @item SF,DF,XF,TF
2602 Same as GCC.
2603
2604 SF is a 32 bit IEEE float ("single float").
2605 DF is a 64 bit IEEE float ("double float").
2606 XF is either an 80 or 96 bit IEEE float ("extended float").
(*Note: XF values on m68k and i386 differ, so it may be
desirable to give them different names.)
2609 TF is a 128 bit IEEE float.
2610
2611 @item AI
2612 Address integer
2613
2614 @item IAI
2615 Instruction address integer
2616
2617 @item INT,UINT
2618 Varying width int/unsigned-int.  The width is specified by context,
2619 usually in an instruction field definition.
2620
2621 @end table
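
As an illustration of how modes appear in practice, the following sketch
declares a bank of sixteen 32 bit registers and then uses several modes in
one semantic expression.  The names @samp{h-gr}, @samp{rd}, @samp{rs1}
and @samp{rs2} are hypothetical and are not taken from any existing port.

@example
; Hypothetical register bank: sixteen SI (32 bit) registers.
(define-hardware
  (name h-gr)
  (comment "general registers")
  (type register SI (16))
)

; Hypothetical semantics: the locals are declared with explicit modes,
; the memory reference is QI (8 bits wide), and the comparison
; result is BI.
(sequence ((SI tmp) (BI is-zero))
          (set tmp (ext SI (mem QI rs1)))
          (set is-zero (eq tmp (const 0)))
          (set rd (add SI tmp rs2)))
@end example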
2622
2623 @node Expressions
2624 @section Expressions
2625 @cindex Expressions
2626
2627 The syntax of CGEN's RTL expressions (or @emph{rtx}) basically follows that of
2628 GCC's RTL.
2629
The handling of modes differs from GCC in order to simplify the implementation.
Implementation shouldn't necessarily drive design, but it was a useful
simplification.  Still, it needs to be reviewed.  The difference is that
what GCC writes as @code{(function:MODE arg1 ...)} is written in CGEN as
@code{(function MODE arg1 ...)}.  Note the space after @samp{function}.
2635
2636 GCC RTL allows flags to be recorded with RTL (e.g. MEM_VOLATILE_P).
2637 This is supported in CGEN RTL by prefixing each RTL function's arguments
2638 with an optional list of modifiers:
2639 @code{(function (#:mod1 #:mod2) MODE arg1 ...)}.
The list is a set of modifier names, each prefixed with @samp{#:}.
Modifiers can take arguments.
Note: modifiers are supported by the RTL traversal code, but no use is
made of them yet.
2644
2645 The mode may be elided if it can be deduced from the operands.
2646 For example, while the full form of @code{add} is
2647 @samp{(add () MODE arg1 arg2)},
2648 it may be written as @samp{(add arg1 arg2)}, with the mode being
2649 taken from the mode of @samp{arg1}.
2650 The fully specified version is called the ``canonical'' form.
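
As a small illustration, the three expressions below are intended to
denote the same addition, from most to least explicit.  The operand
names @samp{rs1} and @samp{rs2} are hypothetical, and the sketch assumes
that @samp{rs1} has mode @samp{SI}.

@example
(add () SI rs1 rs2)  ; canonical form: modifier list and mode present
(add SI rs1 rs2)     ; modifier list elided
(add rs1 rs2)        ; mode elided too, taken from the mode of rs1
@end example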
2651
2652 The currently defined semantic functions are:
2653
2654 @table @code
2655 @item (set mode destination source)
Assign @samp{source} to @samp{destination} referenced in mode @samp{mode}.
2657
2658 @item (set-quiet mode destination source)
2659 Assign @samp{source} to @samp{destination} referenced in mode
2660 @samp{mode}, but do not print any tracing message.
2661
2662 @item (reg mode hw-name [index])
2663 Return an `operand' of hardware element @samp{hw-name} in mode @samp{mode}.
2664 If @samp{hw-name} is an array, @samp{index} selects which register.
2665
2666 @item (raw-reg mode hw-name [index])
2667 Return an `operand' of hardware element @samp{hw-name} in mode @samp{mode},
2668 bypassing any @code{get} or @code{set} specs of the register.
2669 If @samp{hw-name} is an array, @samp{index} selects which register.
2670 This cannot be used with virtual registers (those specified with the
2671 @samp{VIRTUAL} attribute).
2672
@code{raw-reg} is most often used in the @code{get} and @code{set} specs
of a register: if it weren't, read and write operations would recurse
infinitely.
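
A minimal sketch of this usage, assuming a hypothetical register bank
@samp{h-gr} whose register 0 always reads as zero and ignores writes.
The parameter names @samp{index} and @samp{newval} are assumptions about
how the @code{get} and @code{set} specs are written.

@example
(define-hardware
  (name h-gr)
  (comment "general registers, r0 reads as zero")
  (type register WI (32))
  ; Reading: r0 yields zero, anything else reads the raw register.
  (get (index) (if WI (eq index (const 0))
                   (const WI 0)
                   (raw-reg WI h-gr index)))
  ; Writing: writes to r0 are discarded.
  (set (index newval) (if (ne index (const 0))
                          (set (raw-reg WI h-gr index) newval)))
)
@end example

Using @code{reg} instead of @code{raw-reg} inside these specs would
invoke the same specs again, which is the recursion described above.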
2676
2677 @item (mem mode address)
2678 Return an `operand' of memory referenced at @samp{address} in mode
2679 @samp{mode}.
2680
2681 @item (const mode value)
2682 Return an `operand' of constant @samp{value} in mode @samp{mode}.
2683
2684 @item (enum mode value-name)
2685 Return an `operand' of constant @samp{value-name} in mode @samp{mode}.
2686 The value must be from a previously defined enum.
2687
2688 @item (subword mode value word-num)
2689 Return part of @samp{value}.  Which part is determined by @samp{mode} and
2690 @samp{word-num}.  There are three cases.
2691 @c Blech.  ``subword'' is a source of confusion in GCC.
2692 @c Maybe have three separate rtxs.
2693
2694 If @samp{mode} is the same size as the mode of @samp{value}, @samp{word-num}
2695 must be @samp{0} and the result is @samp{value} recast in the new mode.
2696 There is no change in the bits of @samp{value}, they're just interpreted in a
2697 possibly different mode.  This is most often used to interpret an integer
2698 value as a float and vice versa.
2699
2700 If @samp{mode} is smaller than the mode of @samp{value}, @samp{value} is
2701 divided into N pieces and @samp{word-num} picks which piece.
2702 All pieces have the size of @samp{mode} except possibly the last.
2703 If the last piece has a different size, it cannot be referenced.
2704 Word number 0 is the most significant word, regardless of endianness.
2705
2706 If @samp{mode} is larger than the mode of @samp{value}, @samp{value} is
2707 interpreted in the larger mode with the upper most significant bits treated
2708 as garbage (their value is assumed to be unimportant to the context in which
2709 the value will be used).
2710 @samp{word-num} must be @samp{0}.
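
The first two cases can be sketched as follows.  The hardware elements
@samp{h-gr} and @samp{h-acc} are hypothetical; @samp{h-acc} is assumed
to be a 64 bit (@samp{DI}) register.

@example
; Same size: reinterpret the bits of an SI value as an SF float.
(subword SF (reg SI h-gr 1) 0)

; Smaller mode: a DI value viewed as two SI pieces.
(subword SI (reg DI h-acc) 0)  ; most significant 32 bits
(subword SI (reg DI h-acc) 1)  ; least significant 32 bits
@end example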
2711
2712 @item (join out-mode in-mode arg1 . arg-rest)
2713 Concatenate @samp{arg1[,arg2[,...]]} to create a value of mode @samp{out-mode}.
2714 @samp{arg1} becomes the most significant part of the result.
2715 Each argument is interpreted in mode @samp{in-mode}.
2716 @samp{in-mode} must evenly divide @samp{out-mode}.
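
For instance, a 64 bit value might be assembled from two 32 bit halves
as sketched below.  The operands @samp{hi} and @samp{lo} and the
register @samp{h-acc} are hypothetical names.

@example
; hi supplies bits 63..32 of the result, lo supplies bits 31..0.
(set (reg DI h-acc) (join DI SI hi lo))
@end example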
2717
2718 @item (sequence mode ((mode1 local1) ...) expr1 ...)
2719 Execute @samp{expr1}, @samp{expr2}, etc. sequentially.
2720 At least one expression must be specified, even if the result
2721 mode is @samp{VOID}.
2722
2723 The result, if non-void-mode, is the value of the last expression.
2724
2725 @samp{mode} is the mode of the result.
2726 If @samp{mode} is elided it is set to @samp{VOID} (void mode).
2727
2728 `@code{((mode1 local1) ...)}' is a set of local variables.
2729
2730 @item (parallel mode empty expr1 ...)
2731 Execute @samp{expr1}, @samp{expr2}, etc. in parallel. All inputs are
2732 read before any output is written.
2733 At least one expression must be specified.
2734
2735 @samp{empty} must be @samp{()} and
2736 is present for consistency with @samp{sequence}.
2737
2738 @samp{mode} must be @samp{VOID} (void mode), or it can be elided.
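
The difference from @samp{sequence} shows up when an expression reads
something that another expression writes.  A sketch, using the
hypothetical operands @samp{ra} and @samp{rb}:

@example
; Swap two registers: both are read before either is written.
(parallel ()
          (set ra rb)
          (set rb ra))

; Contrast: executed sequentially, both end up holding the old rb.
(sequence ()
          (set ra rb)
          (set rb ra))
@end example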
2739
2740 @item (do-count mode iteration-variable number-of-iterations expr1 ...)
2741 This is a simple looping operation.
2742 Execute @samp{expr1}, @samp{expr2}, etc. the specified number of times.
2743 At least one expression must be specified.
2744
2745 @samp{iteration-variable} will contain the iteration number and is
2746 available for use in expressions.  It has mode @samp{INT}.
Its value ranges from 0 to @samp{number-of-iterations} - 1.
2748
2749 @samp{number-of-iterations} is an rtl expression of mode INT
2750 (or a compatible mode).  It is computed once and may not be modified
2751 inside the loop.
2752
2753 @samp{mode} must be @samp{VOID} (void mode), or it can be elided.
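
A minimal sketch of a store-multiple operation, assuming a hypothetical
register bank @samp{h-gr} and a hypothetical address operand
@samp{addr}:

@example
; Store registers 0 through 3 to consecutive words starting at addr.
; i takes the values 0, 1, 2, 3.
(do-count i (const 4)
          (set (mem SI (add addr (mul i (const 4))))
               (reg SI h-gr i)))
@end example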
2754
2755 @item (unop mode operand)
2756 Perform a unary arithmetic operation.
2757
2758 @samp{unop} is one of @code{neg},
2759 @code{abs}, @code{inv}, @code{not}, @code{zflag}, @code{nflag}.
2760 @code{zflag} returns a bit indicating if @samp{operand} is
2761 zero. @code{nflag} returns a bit indicating if @samp{operand} is
2762 negative. @code{inv} returns the bitwise complement of @samp{operand},
2763 whereas @code{not} returns its logical negation.
2764
2765 @item (binop mode operand1 operand2)
2766 Perform a binary arithmetic operation.
2767
2768 @samp{binop} is one of
2769 @code{add}, @code{sub}, @code{and}, @code{or}, @code{xor}, @code{mul},
2770 @code{div}, @code{udiv}, @code{mod}, @code{umod}.
2771
2772 @item (binop-with-bit mode operand1 operand2 operand3)
2773 Same as @samp{binop}, except taking 3 operands. The third operand is
2774 always a single bit.
2775
2776 @samp{binop-with-bit} is one of @code{addc},
2777 @code{addc-cflag}, @code{addc-oflag}, @code{subc}, @code{subc-cflag},
2778 @code{subc-oflag}.
2779
2780 Note: The following are deprecated:
2781
2782 @itemize @bullet
2783 @item @code{add-cflag}, replaced with @code{addc-cflag}
2784 @item @code{add-oflag}, replaced with @code{addc-oflag}
2785 @item @code{sub-cflag}, replaced with @code{subc-cflag}
@item @code{sub-oflag}, replaced with @code{subc-oflag}
2787 @end itemize
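
An add-with-carry instruction might be described along these lines.
The operands @samp{dr} and @samp{sr} and the single-bit operand
@samp{cbit} are hypothetical names.

@example
; The sum and the new carry are both computed from the old values
; of dr, sr and cbit, hence the use of parallel.
(parallel ()
          (set dr (addc dr sr cbit))
          (set cbit (addc-cflag dr sr cbit)))
@end example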
2788
2789 @item (shiftop mode operand1 operand2)
2790 Perform a shift operation.
2791 @samp{operand1} is shifted (or rotated) by the amount specified
2792 in @samp{operand2}.
2793
2794 @samp{shiftop} is one of @code{sll}, @code{srl}, @code{sra},
2795 @code{ror}, @code{rol}.
2796
2797 @samp{mode} must match the mode of @samp{operand1}.
2798 The mode of @samp{operand1} may be any integral mode.
2799 The mode of @samp{operand2} may be any integral mode, and need not match
2800 the mode of @samp{operand1}.
2801
2802 It is an error if @samp{operand2} is negative or greater than
2803 or equal to the size of @samp{operand1}.
2804 If the architecture handles negative or large shift amounts,
2805 that needs to be handled in the surrounding RTL.
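
For example, a 32 bit machine that uses only the low five bits of the
shift count could be described as sketched here, with hypothetical
operands @samp{rd}, @samp{rs1} and @samp{rs2}:

@example
; Mask the count to 0..31 so it is always a valid shift amount.
(set rd (sll SI rs1 (and rs2 (const 31))))
@end example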
2806
2807 @item (andif mode operand1 operand2)
2808 Evaluate @samp{operand1}.
2809 If it evaluates to zero the result is zero,
2810 and @samp{operand2} is not evaluated.
2811 If @samp{operand1} evaluates to non-zero, then evaluate @samp{operand2}.
2812 If it evaluates to non-zero the result is one,
2813 otherwise the result is zero.
2814
2815 The mode of the result is @samp{BI}.
2816 @samp{mode} is generally elided or is @samp{BI}.
2817
2818 @item (orif mode operand1 operand2)
2819 Evaluate @samp{operand1}.
2820 If it evaluates to non-zero the result is one,
2821 and @samp{operand2} is not evaluated.
2822 If @samp{operand1} evaluates to zero, then evaluate @samp{operand2}.
2823 If it evaluates to non-zero the result is one,
2824 otherwise the result is zero.
2825
2826 The mode of the result is @samp{BI}.
2827 @samp{mode} is generally elided or is @samp{BI}.
2828
2829 @item (integer-convop mode operand)
2830 Perform an integer mode->mode conversion operation.
2831
2832 @samp{integer-convop} is one of:
2833
2834 @itemize @bullet
2835 @item @code{ext}
2836 Sign-extend @samp{operand}, which must have an integer mode
2837 narrower than @samp{mode}, which also must be an integer mode.
2838 @item @code{zext}
2839 Zero-extend @samp{operand}, which must have an integer mode
2840 narrower than @samp{mode}, which also must be an integer mode.
2841 @item @code{trunc}
2842 Truncate @samp{operand}, which must have an integer mode
2843 wider than @samp{mode}, which also must be an integer mode.
2844 @end itemize
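
A sketch of the three operations in typical load and store semantics.
The operands @samp{rd}, @samp{rs} and @samp{addr} are hypothetical, and
@samp{rd} and @samp{rs} are assumed to have mode @samp{SI}.

@example
(set rd (ext SI (mem QI addr)))    ; load byte, sign-extend to 32 bits
(set rd (zext SI (mem QI addr)))   ; load byte, zero-extend to 32 bits
(set (mem QI addr) (trunc QI rs))  ; store the low 8 bits of rs
@end example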
2845
2846 @item (float-convop mode how operand)
2847 Perform a mode->mode conversion operation involving a floating point value.
2848
2849 Conversions involving floating point values need to specify
2850 how things like truncation will be performed, e.g., the rounding mode.
2851 @samp{how} is an rtx of mode @samp{INT} that specifies how the conversion
2852 will be performed.  The interpretation of @samp{how} is architecture-dependent,
2853 except that a value of zero has a specific meaning:
2854 If a particular floating-point conversion can only be done one way,
2855 or if the conversion is to be done the ``default'' way, specify zero
2856 for @samp{how}.
2857 What ``the default way'' is is application-dependent.
2858
2859 @samp{float-convop} is one of:
2860
2861 @itemize @bullet
2862 @item @code{fext}
2863 Extend @samp{operand}, which must have a floating point mode
2864 narrower than @samp{mode}, which also must be a floating point mode.
2865 @item @code{ftrunc}
2866 Truncate @samp{operand}, which must have a floating point mode
2867 wider than @samp{mode}, which also must be a floating point mode.
2868 @item @code{float}
2869 Convert @samp{operand}, which must have an integer mode,
2870 to a floating point value of mode @samp{mode}.
2871 @samp{operand} is treated as a signed integer.
2872 @item @code{ufloat}
2873 Convert @samp{operand}, which must have an integer mode,
2874 to a floating point value of mode @samp{mode}.
2875 @samp{operand} is treated as an unsigned integer.
2876 @item @code{fix}
2877 Convert @samp{operand}, which must have a floating point mode,
2878 to a signed integer of mode @samp{mode}.
2879 @item @code{ufix}
2880 Convert @samp{operand}, which must have a floating point mode,
2881 to an unsigned integer of mode @samp{mode}.
2882 @end itemize
2883
2884 An enum is defined that specifies several predefined rounding modes.
2885
@smallexample
(define-enum
  (name fpconv-kind)
  (comment "builtin floating point conversion kinds")
  (attrs VIRTUAL) ;; let app provide def'n instead of each cpu's desc.h
  (prefix FPCONV-)
  (values ((DEFAULT 0)
           (TIES-TO-EVEN 1)
           (TIES-TO-AWAY 2)
           (TOWARD-ZERO 3)
           (TOWARD-POSITIVE 4)
           (TOWARD-NEGATIVE 5)))
)
@end smallexample
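
A sketch of how these values might be passed as @samp{how}, assuming
the enum values can be referenced by their prefixed names.  The
operands @samp{fd}, @samp{fs}, @samp{rd} and @samp{rs} are hypothetical.

@smallexample
; Integer to single float, converted the default way.
(set fd (float SF FPCONV-DEFAULT rs))

; Single float to signed 32 bit integer, rounding toward zero.
(set rd (fix SI FPCONV-TOWARD-ZERO fs))
@end smallexample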
2900
2901 @item (cmpop mode operand1 operand2)
2902 Perform a comparison.
2903
2904 @samp{cmpop} is one of @code{eq}, @code{ne},
2905 @code{lt}, @code{le}, @code{gt}, @code{ge}, @code{ltu}, @code{leu},
2906 @code{gtu}, @code{geu}.
2907 @c floating point compare-unordered?
2908
2909 If the comparison succeeds the result is one,
2910 otherwise the result is zero.
2911 The mode of the result is @samp{BI}.
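
The signed and unsigned variants differ only in how the operand bits
are interpreted.  A small illustration with hypothetical @samp{SI}
operands @samp{rs1} and @samp{rs2}:

@example
(lt rs1 rs2)   ; signed less-than
(ltu rs1 rs2)  ; unsigned less-than on the same bits
; With rs1 holding 0xffffffff and rs2 holding 1, the first
; comparison yields 1 (-1 < 1) and the second yields 0.
@end example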
2912
2913 @item (mathop mode operand)
2914 Perform a mathematical operation.
2915
2916 @samp{mathop} is one of @code{sqrt}, @code{cos}, @code{sin}.
2917
2918 @item (*nan mode operand)
2919 Return a boolean indicating if @samp{operand} is a NaN.
2920 @samp{mode} must be a floating point mode.
2921 There are three versions.
2922
2923 @itemize @bullet
2924 @item (nan operand)
2925 Test whether @samp{operand} is any kind of NaN.
2926 @item (qnan operand)
2927 Test whether @samp{operand} is a quiet NaN.
2928 @item (snan operand)
2929 Test whether @samp{operand} is a signalling NaN.
2930 @end itemize
2931
2932 @item (if mode condition then [else])
2933 Standard @code{if} statement.
2934
2935 @samp{condition} is any arithmetic expression.
2936 If the value is non-zero the @samp{then} part is executed.
2937 Otherwise, the @samp{else} part is executed (if present).
2938
2939 @samp{mode} is the mode of the result, not of @samp{condition}.
2940 If @samp{mode} is not @code{VOID} (void mode), @samp{else} must be present.
When the result is used, @samp{mode} must be specified, and not be @code{VOID}.
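
Both uses are sketched below.  The operands @samp{rd}, @samp{rs1},
@samp{rs2} and the branch target @samp{disp} are hypothetical.

@example
; Used for its value: the mode and the else part are both required.
(set rd (if SI (lt rs1 rs2) rs1 rs2))  ; signed minimum

; Used as a statement: VOID result, else part omitted.
(if (eq rs1 (const 0))
    (set pc disp))
@end example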
2942
2943 @item (cond mode (condition1 expr1a ...) (...) [(else exprNa...)])
2944 From Scheme: keep testing conditions until one succeeds, and then
2945 process the associated expressions.
2946
2947 @item (case mode test ((case1 ..) expr1a ..) (..) [(else exprNa ..)])
2948 From Scheme: Compare @samp{test} with @samp{case1}, @samp{case2},
2949 etc. and process the associated expressions.
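
A sketch of both forms, written by analogy with their Scheme
counterparts.  The operands @samp{rd} and @samp{rs} are hypothetical.

@example
; cond: the sign of rs as -1, 0 or 1.
(set rd (cond SI
              ((lt rs (const 0)) (const -1))
              ((eq rs (const 0)) (const 0))
              (else (const 1))))

; case: dispatch on the low two bits of rs.
(case VOID (and rs (const 3))
      ((0) (set rd (const 0)))
      ((1 2) (set rd (const 1)))
      (else (nop)))
@end example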
2950
2951 @item (c-code mode "C expression")
An escape hook to insert arbitrary C code. @samp{mode} must be
compatible with the result of ``C expression''.
2954
2955 @item (c-call mode symbol operand1 operand2 ...)
2956 An escape hook to emit a subroutine call to function named @samp{symbol}
2957 passing operands @samp{operand1}, @samp{operand2}, etc.  An implicit
2958 first argument of @code{current_cpu} is passed to @samp{symbol}.
2959 @samp{mode} is the mode of the result.  Be aware that @samp{symbol} will
2960 be restricted by reserved words in the C programming language and by
2961 existing symbols in the generated code.
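
A sketch of a trap instruction whose work is done in hand-written C.
The function name @samp{my_cpu_trap} and the operand @samp{uimm4} are
hypothetical, and the name is written here as a string on the
assumption that this is how description files spell it.

@example
(set pc (c-call WI "my_cpu_trap" pc uimm4))
; The generated code is expected to contain a call of roughly
; this shape, with current_cpu supplied implicitly:
;   my_cpu_trap (current_cpu, pc, uimm4)
@end example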
2962
2963 @item (c-raw-call mode symbol operand1 operand2 ...)
Same as @code{c-call}, except that there is no implicit @code{current_cpu}
first argument.
2966 @samp{mode} is the mode of the result.
2967
2968 @item (clobber mode object)
2969 Indicate that @samp{object} is written in mode @samp{mode}, without
2970 saying how. This could be useful in conjunction with the C escape hooks.
2971
2972 @item (delay mode num expr)
2973 Indicate that there are @samp{num} delay slots in the processing of
2974 @samp{expr}.  When using this rtx in instruction semantics, CGEN will
2975 infer that the instruction has the DELAY-SLOT attribute.
2976
2977 @item (delay num expr)
2978 In older "sim" simulators, indicates that there are @samp{num} delay
2979 slots in the processing of @samp{expr}. When using this rtx in instruction
2980 semantics, CGEN will infer that the instruction has the DELAY-SLOT
2981 attribute.
2982
2983 In newer "sid" simulators, evaluates to the writeback queue for hardware
2984 operand @samp{expr}, at @samp{num} instruction cycles in the
2985 future. @samp{expr} @emph{must} be a hardware operand in this case.
2986
For example, @code{(set (delay 3 pc) (add pc 1))} will schedule a write to
the @samp{pc} register in the writeback phase of the 3rd instruction
after the current one. Alternatively, @code{(set gr1 (delay 3 gr2))} will
2990 immediately update the @samp{gr1} register with the @emph{latest write}
2991 to the @samp{gr2} register scheduled between the present and 3
2992 instructions in the future. @code{(delay 0 ...)}  refers to the
2993 writeback phase of the current instruction.
2994
2995 This effect is modeled with a circular buffer of "write stacks" for each
2996 hardware element (register banks get a single stack). The size of the
2997 circular buffer is calculated from the uses of @code{(delay ...)}
2998 rtxs. When a delayed write occurs, the simulator pushes the write onto
2999 the appropriate write stack in the "future" of the circular buffer for
3000 the written-to hardware element. At the end of each instruction cycle,
3001 the simulator executes all writes in all write stacks for the time slice
3002 just ending. When a delayed read (essentially a pipeline bypass) occurs,
3003 the simulator looks ahead in the circular buffer for any writes
3004 scheduled in the future write stack. If it doesn't find one, it
3005 progressively backs off towards the "current" instruction cycle's write
3006 stack, and if it still finds no scheduled writes then it returns the
3007 current state of the CPU. Thus while delayed writes are fast, delayed
3008 reads are potentially slower in a simulator with long pipelines and very
3009 large register banks.
3010
3011 @item (annul yes?)
3012 @c FIXME: put annul into the glossary.
3013 Annul the following instruction if @samp{yes?} is non-zero. This rtx is
3014 an experiment and will probably change.
3015
3016 @item (skip yes?)
3017 Skip the next instruction if @samp{yes?} is non-zero. This rtx is
3018 an experiment and will probably change.
3019
3020 @item (symbol name)
3021 Return a symbol with value @samp{name}, for use in attribute
3022 processing. This is equivalent to @samp{quote} in Scheme but
3023 @samp{quote} sounds too jargonish.
3024
3025 @item (int-attr mode object attr-name)
3026 Return the value of attribute @samp{attr-name} in mode @samp{mode}.
3027 @samp{object} must currently be @samp{(current-insn)}, the current instruction,
3028 or @samp{(current-mach)}, the current machine.
3029 The attribute's value must be representable as an integer.
3030
3031 @item (eq-attr mode object attr-name value)
3032 Return non-zero if the value of attribute @samp{attr-name} of
3033 object @samp{object} is @samp{value}.
3034
3035 @emph{NOTE:} List values of @samp{value} may be changed to allow use the
3036 @samp{number-list} rtx function.
3037 If @samp{value} is a list return ``true'' if the attribute is any of
3038 the listed values.  But this is not implemented yet.
3039
3040 @item (index-of operand)
3041 Return the index of @samp{operand}. For registers this is the register number.
3042
3043 @item (regno operand)
3044 Same as @code{index-of}, but improves readability for registers.
3045
3046 @item (error mode message)
Emit an error message from CGEN RTL. The error message is specified by @samp{message}.
3048
3049 @item (nop)
3050 A no-op.
3051
3052 @item (ifield field-name)
3053 Return the value of field @samp{field-name}. @samp{field-name} must be a
3054 field in the instruction.
3055
3056 @end table
3057
3058 Operands can be any of:
3059
3060 @itemize @bullet
3061 @item an operand defined in the description file
@item a register reference, created with (reg mode hw-name [index])
3063 @item a memory reference, created with (mem mode address)
3064 @item a constant, created with (const mode value)
3065 @item a `sequence' local variable
3066 @item a `do-count' iteration variable
3067 @item another expression
3068 @end itemize
3069
3070 The @samp{symbol} in a @code{c-call} or @code{c-raw-call} function is
3071 currently the name of a C function or macro that is invoked by the
3072 generated semantic code.
3073
3074 @node Macro-expressions
3075 @section Macro-expressions
3076 @cindex Macro-expressions
3077
Macro RTL expressions make it unnecessary to specify a mode
for every expression (and sub-expression thereof).
Whereas the formal way to specify, say, an add is
@code{(add SI arg1 arg2)}, if SI is the default mode of @samp{arg1} then
this can be written simply as @code{(add arg1 arg2)}.
3083 This gets expanded to @code{(add DFLT arg1 arg2)} where
3084 @code{DFLT} means ``default mode''.
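
The same expansion is assumed to apply to every sub-expression.  As a
rough sketch, with hypothetical operands @samp{rd}, @samp{rs1} and
@samp{rs2}:

@example
; As written in the description file:
(set rd (add rs1 (sll rs2 (const 2))))

; Roughly what this stands for before default modes are resolved:
(set DFLT rd (add DFLT rs1 (sll DFLT rs2 (const DFLT 2))))
@end example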
3085
It might be possible to replace macro expressions with preprocessor macros;
however, for the nonce there is no plan to do this.