doc/manual.dtx

   1 %#! lualatex -shell-escape manual.ins
   2
   3 %<*en>
   4 \documentclass[a4paper,titlepage]{article}
   5 \usepackage[margin=20mm]{geometry}
   6 %</en>
   7 %<*ja>
   8 \documentclass[a4paper,titlepage]{bxjsarticle}
   9 \setpagelayout*{margin=20mm}
  10 \def\headfont{\normalfont\bfseries}
  11 % \def\headfont{\sffamily\gtfamily} is needed in ordinary documents
  12 %</ja>
  13
  14 \usepackage{amsmath,amssymb,xcolor,pict2e,multienum}
  15 \usepackage{booktabs,listings,lltjlisting,showexpl,multicol}
  16 \usepackage{luatexja-otf}
  17 \usepackage[unicode=true]{hyperref}
  18 \usepackage[all]{xy}
  19 \SelectTips{cm}{}
  20
  21 \DeclareRobustCommand\eTeX{\ensuremath{\varepsilon}-\kern-.125em\TeX}
  22 \DeclareRobustCommand\LuaTeX{Lua\TeX}
  23 \DeclareRobustCommand\pTeX{p\kern-.05em\TeX}
  24 \DeclareRobustCommand\upTeX{u\pTeX}
  25 \DeclareRobustCommand\pLaTeX{p\kern-.05em\LaTeX}
  26 \DeclareRobustCommand\pLaTeXe{p\kern-.05em\LaTeXe}
  27 \DeclareRobustCommand\epTeX{\ensuremath{\varepsilon}-\kern-.125em\pTeX}
  28
  29
  30 \makeatletter
  31 \long\def\@makecaption#1#2{%
  32   \vskip\abovecaptionskip
  33   \sbox\@tempboxa{{\small #1. #2}}%
  34   \ifdim \wd\@tempboxa >\hsize
  35     {\small #1. #2}\par
  36   \else
  37     \global \@minipagefalse
  38     \hb@xt@\hsize{\hfil\box\@tempboxa\hfil}%
  39   \fi
  40   \vskip\belowcaptionskip}
  41 \makeatother
  42
  43 %<*en>
  44 \title{The \LuaTeX-ja package}
  45 \author{The \LuaTeX-ja project team}
  46 %</en>
  47 %<*ja>
  48 \title{\LuaTeX-jaパッケージ}
  49 \author{\LuaTeX-jaプロジェクトチーム}
  50 %</ja>
  51
  52 \lstset{
  53   basicstyle=\ttfamily\small, pos=o, breaklines=true,
  54   numbers=none, rframe={}, basewidth=0.5em
  55 }
  56
  57 \parskip=\smallskipamount
  58 \begin{document}
  59 \catcode`\<=13
  60 \def<#1>{{\normalfont\rm\itshape$\langle$#1$\rangle$}}
  61 \maketitle
  62
  63 \tableofcontents
  64 \bigskip
  65
  66 %<*en>
  67 {\Large\bf This documentation is far from complete. It may have many
  68 grammatical (and contextual) errors.}
  69 %</en>
  70 %<*ja>
  71 \textbf{\large 本ドキュメントはまだまだ未完成です．
  72 また，英語版と日本語版をdocstripプログラムを用いることで一緒に生成している都合上，
  73 見出しが英語のままになっています．}
  74 %</ja>
  75
  76 \clearpage
  77 \part{User's manual}
  78
  79 \section{Introduction}
  80
  81 %<*en>
  82 The \LuaTeX-ja package is a macro package for typesetting high-quality
  83 Japanese documents when using \LuaTeX.
  84 %</en>
  85 %<*ja>
  86 \LuaTeX-jaパッケージは，次世代標準\TeX である\LuaTeX の上で，\pTeX と同等
  87 /それ以上の品質の日本語組版を実現させようとするマクロパッケージである．
  88 %</ja>
  89
  90 \subsection{Background}
  91 Traditionally, ASCII \pTeX, an extension of \TeX, and its derivatives
  92 have been used to typeset Japanese documents in \TeX. Because \pTeX\ is
  93 an engine extension of \TeX, it can produce high-quality Japanese
  94 documents without very complicated macros. But this is a mixed
  95 blessing: \pTeX\ has been left behind by other extensions of \TeX,
  96 especially \eTeX\ and pdf\TeX, and by changes in how computers
  97 process Japanese (\textit{e.g.}, the UTF-8 encoding).
  98
  99 Recently, extensions of \pTeX, namely \upTeX\ (a Unicode implementation
 100 of \pTeX) and \epTeX\ (a merger of \pTeX\ and the
 101 \eTeX\ extension), have been developed to fill those gaps to some
 102 extent, but gaps still exist.
 103
 104 However, the appearance of \LuaTeX\ changed the whole situation. By
 105 using Lua `callbacks', users can customize the internal processing of
 106 \LuaTeX. So there is no need to modify the source of the engine to
 107 support Japanese typesetting: we only have to write Lua
 108 scripts for the appropriate callbacks.
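
As a rough illustration of the callback mechanism (a sketch only, not the
actual code of \LuaTeX-ja), the following registers a Lua function that is
called for each paragraph just before line breaking. It assumes that the
\texttt{luatexbase} interface (\texttt{luatexbase.add\_to\_callback}) is
available in the format in use; the description string
\texttt{example.count} is a hypothetical name.
\begin{verbatim}
\directlua{
  luatexbase.add_to_callback('pre_linebreak_filter',
    function (head)
      local n = 0
      for _ in node.traverse(head) do n = n + 1 end
      texio.write_nl('paragraph with ' .. n .. ' top-level nodes')
      return true
    end, 'example.count')
}
\end{verbatim}
Returning \texttt{true} tells \LuaTeX\ to keep the node list unchanged, so
this function only inspects the paragraph. A function that inserts glue
would return a modified node list instead.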
 109
 110
 111 \subsection{Major Changes from \pTeX}
 112 The \LuaTeX-ja package is strongly influenced by the \pTeX\ engine. The initial
 113 goal of development was to implement the features of \pTeX. However,
 114 \emph{\LuaTeX-ja is not just a port of \pTeX; unnatural
 115 specifications/behaviors of \pTeX\ were not adopted}.
 116
 117 The following are the major changes from \pTeX:
 118 \begin{itemize}
 119 \item A Japanese font is a tuple of a `real' font, a Japanese font
 120       metric (\textbf{JFM}, for short), and an optional string called
 121       `variation'.
 122
 123 \item In \pTeX, a linebreak after a Japanese character is ignored (and
 124       doesn't yield a space), since linebreaks (in source files) are
 125       permitted almost everywhere in Japanese texts. However, \LuaTeX-ja
 126       cannot reproduce this behavior completely, because of a specification
 127       of \LuaTeX.
 128 \item The insertion process of glues/kerns between two Japanese
 129       characters and between a Japanese character and other characters
 130       (we refer to these glues/kerns as \textbf{JAglue}) is rewritten from
 131       scratch.
 132
 133 \begin{itemize}
 134 \item As \LuaTeX's internal character handling is `node-based'
 135       (\textit{e.g.}, \verb+of{}fice+ doesn't prevent ligatures), the
 136       insertion process of \textbf{JAglue} is now `node-based'.
 137 \item Furthermore, nodes between two characters which have no effect on
 138       line breaking (\textit{e.g.}, \verb+\special+ node) are ignored in the
 139       insertion process.
 140 \item In the process, two Japanese fonts which differ only in their `real'
 141       fonts are treated as identical.
 142 \end{itemize}
 143 \item At present, vertical typesetting (\emph{tategaki}) is not
 144       supported in \LuaTeX-ja.
 145
 146 \end{itemize}
 147 For detailed information, see Part~\ref{part-imp}.
 148
 149 \subsection{Notations}
 150 In this document, the following terms and notations are used:
 151 \begin{itemize}
 152 \item Characters are divided into two types:
 153 \begin{itemize}
 154 \item \textbf{JAchar}: standing for Japanese characters such as
 155       Hiragana, Katakana, Kanji and other punctuation marks for
 156       Japanese.
 157 \item \textbf{ALchar}: standing for all other characters, such as alphabetic characters.
 158 \end{itemize}
 159 We say `alphabetic fonts' for fonts used for \textbf{ALchar}, and `Japanese fonts' for fonts used for \textbf{JAchar}.
 160
 161 \item A word in a sans-serif font (like \textsf{prebreakpenalty})
 162       represents an internal parameter for Japanese typesetting, and it
 163       is used as a key in the \verb+\ltjsetparameter+ command.
 164 \item The word `primitive' is used not only for primitives in \LuaTeX,
 165       but also for control sequences that are defined in the core module of
 166       \LuaTeX-ja.
 167 \item In this document, natural numbers start from~0.
 168 \end{itemize}
 169
 170 \subsection{About the project}
 171 \paragraph{Project Wiki} Project Wiki is under construction.
 172 \begin{itemize}
 173 \item \url{http://sourceforge.jp/projects/luatex-ja/wiki/FrontPage%28en%29} (English)
 174 \item \url{http://sourceforge.jp/projects/luatex-ja/wiki/FrontPage} (Japanese)
 175 \end{itemize}
 176
 177 This project is hosted by SourceForge.JP.
 178
 179 \paragraph{Members}\
 180 \begin{multienumerate}
 181 \def\labelenumi{$\bullet$}
 182 \mitemxxx{Hironori KITAGAWA}{Kazuki MAEDA}{Takayuki YATO}
 183 \mitemxxx{Yusuke KUROKI}{Noriyuki ABE}{Munehiro YAMAMOTO}
 184 \mitemxx{Tomoaki HONDA}{}{}
 185 \end{multienumerate}
 186
 187 % \paragraph{Acknowledgments} -- insert here if needed
 188
 189 \clearpage
 190 \section{Getting Started}
 191 \subsection{Installation}
 192 To install the \LuaTeX-ja\ package, you will need:
 193 \begin{itemize}
 194 \item \LuaTeX\ (version 0.65.0-beta or later) and its supporting packages.\\
 195 If you are using \TeX~Live~2011 or current W32\TeX, you don't have to worry.
 196 \item The source archive of \LuaTeX-ja, of course{\tt:)}
 197 \end{itemize}
 198
 199 The installation procedure is as follows:
 200 \begin{enumerate}
 201 \item Download the source archive.
 202
 203 At present, \LuaTeX-ja has no official release, so you have to retrieve
 204 the archive from the repository.
 205 You can retrieve the Git repository via
 206 \begin{verbatim}
 207 $ git clone git://git.sourceforge.jp/gitroot/luatex-ja/luatexja.git
 208 \end{verbatim}
 209 or download the archive of HEAD in the \texttt{master} branch from
 210 \begin{flushleft}
 211 \url{http://git.sourceforge.jp/view?p=luatex-ja/luatexja.git;a=snapshot;h=HEAD;sf=tgz}.
 212 \end{flushleft}
 213
 214 Note that the forefront of development may not be in the \texttt{master} branch.
 215 \item Extract the archive. You will see {\tt src/} and several other sub-directories.
 216 \item Copy all the contents of {\tt src/} into one of your \texttt{TEXMF} trees.
 217 \item If the filename database needs to be updated, run {\tt mktexlsr}.
 218 \end{enumerate}
 219
 220 \subsection{Cautions}
 221 \begin{itemize}
 222 \item The encoding of your source file must be UTF-8. No other
 223       encodings, such as EUC-JP or Shift-JIS, are supported.
 224 \item \LuaTeX-ja may conflict with other packages.
 225
 226 For example, the default setting of \textbf{JAchar} in the present
 227       version does not coexist with the \texttt{unicode-math}
 228       package. Putting the following line in the preamble makes
 229       mathematical symbols typeset correctly, but several
 230       Japanese characters will be treated as \textbf{ALchar}s as a
 231       side-effect:
 232 \begin{verbatim}
 233 \ltjsetparameter{jacharrange={-3, -8}}
 234 \end{verbatim}
 235 \end{itemize}
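
A preamble along the following lines is one way to apply this workaround.
This is only a sketch: the loading order shown here is an assumption, not a
requirement stated by either package. The comment in the code reflects the
default setting of \textsf{jacharrange} described later in this manual.
\begin{verbatim}
\documentclass{article}
\usepackage{luatexja}
\usepackage{unicode-math}
% ranges 3 and 8 are JAchar by default; the minus signs
% switch them to ALchar
\ltjsetparameter{jacharrange={-3, -8}}
\end{verbatim}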
 236
 237 \subsection{Using in plain \TeX}\label{ssec-plain}
 238 To use \LuaTeX-ja in plain \TeX, simply put the following at the beginning of the document:
 239 \begin{verbatim}
 240 \input luatexja.sty
 241 \end{verbatim}
 242
 243 This does minimal settings (like {\tt ptex.tex}) for typesetting Japanese documents:
 244 \begin{itemize}
 245 \item The following 6~Japanese fonts are preloaded:
 246 \begin{center}
 247 \begin{tabular}{ccccc}
 248 \toprule
 249 \textbf{classification}&\textbf{font name}&\bf `10\,pt'&\bf`7\,pt'&\bf`5\,pt'\\\midrule
 250 \emph{mincho}&Ryumin-Light    &\verb+\tenmin+&\verb+\sevenmin+&\verb+\fivemin+\\
 251 \emph{gothic}&GothicBBB-Medium&\verb+\tengt+ &\verb+\sevengt+ &\verb+\fivegt+\\
 252 \bottomrule
 253 \end{tabular}
 254 \end{center}
 255 \begin{itemize}
 256 \item The `Q' is a unit used in Japanese phototypesetting, and
 257       $1\,\textrm{Q}=0.25\,\textrm{mm}$. This length is stored in a
 258       dimension \verb+\jQ+.
 259
 260 \item It is widely accepted that the fonts `Ryumin-Light' and
 261       `GothicBBB-Medium' aren't embedded into PDF files, and that PDF readers
 262       substitute them with some external Japanese fonts (\textit{e.g.},
 263       Kozuka Mincho is used for Ryumin-Light in Adobe Reader). We adopt this custom as
 264       the default setting.
 265 \item A character in an alphabetic font is generally smaller than a
 266       character in a Japanese font of the same size. So the actual size of
 267       these Japanese fonts is in fact smaller than that of alphabetic
 268       fonts, namely scaled by 0.962216.
 269 \end{itemize}
 270 \item The amount of glue that is inserted between a \textbf{JAchar} and
 271       an \textbf{ALchar} (the parameter \textsf{xkanjiskip}) is set to
 272 \[
 273  (0.25\cdot 13.5\,\textrm{Q})^{+1\,\text{pt}}_{-1\,\text{pt}}
 274  = {27\over 32}\,\mathrm{mm}^{+1\,\text{pt}}_{-1\,\text{pt}}.
 275 \]
 276 \end{itemize}
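
For instance, a minimal plain \TeX\ source might look like the following
sketch. It assumes that the preloaded control sequences \verb+\tenmin+ and
\verb+\tengt+ from the table above select the corresponding Japanese fonts
when they are used as font-switching commands.
\begin{lstlisting}
\input luatexja.sty
\tenmin 明朝体の文章と alphabetic text.\par
\tengt ゴシック体の文章．
\bye
\end{lstlisting}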
 277
 278 \subsection{Using in \LaTeX}\label{ssec-ltx}
 279 \paragraph{\LaTeXe}
 280 Usage in \LaTeXe\ is basically the same. To set up the minimal environment
 281 for Japanese, you only have to load {\tt luatexja.sty}:
 282 \begin{verbatim}
 283 \usepackage{luatexja}
 284 \end{verbatim}
 285 It also does minimal settings (counterparts in \pLaTeX\ are {\tt
 286 plfonts.dtx} and {\tt pldefs.ltx}):
 287
 288 \begin{itemize}
 289 \item {\tt JY3} is the font encoding for Japanese fonts (in horizontal direction).\\
 290 When vertical typesetting is supported by \LuaTeX-ja in the future, {\tt JT3} will be used for vertical fonts.
 291 \item Two font families {\tt mc} and {\tt gt} are defined:
 292 \begin{center}
 293 \begin{tabular}{ccccc}
 294 \toprule
 295 \textbf{classification}&\textbf{family}&\verb+\mdseries+&\verb+\bfseries+&\textbf{scale}\\\midrule
 296 \emph{mincho}&\tt mc&Ryumin-Light    &GothicBBB-Medium&0.962216\\
 297 \emph{gothic}&\tt gt&GothicBBB-Medium&GothicBBB-Medium&0.962216\\
 298 \bottomrule
 299 \end{tabular}
 300 \end{center}
 301 Note that the bold series in both families is the same as the medium series of the \emph{gothic} family.
 302 This is a convention in \pLaTeX.
 303
 304 \item Japanese characters in math mode are typeset by the font family {\tt mc}.
 305 \end{itemize}
 306
 307 However, the above settings are not sufficient for Japanese-based
 308 documents. To typeset Japanese-based documents, it is better to use
 309 class files other than {\tt article.cls}, {\tt book.cls}, and so on.  At
 310 present, we have the counterparts of \texttt{jclasses} (standard
 311 classes in \pLaTeX) and \texttt{jsclasses} (classes by Haruhiko
 312 Okumura), namely, \texttt{ltjclasses} and \texttt{ltjsclasses}.
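
A minimal document using one of these classes could look like the sketch
below. The class name \texttt{ltjsarticle} is assumed here to be the
\texttt{ltjsclasses} counterpart of \texttt{jsarticle}, and
\texttt{luatexja} is loaded explicitly in case the class does not load it
by itself. \verb+\gtfamily+ switches to the \emph{gothic} family.
\begin{lstlisting}
\documentclass{ltjsarticle}
\usepackage{luatexja}
\begin{document}
日本語と English の混植．{\gtfamily ゴシック体}
\end{document}
\end{lstlisting}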
 313
 314 \paragraph{{\tt\char92 CID}, {\tt\char92 UTF} and macros in OTF package}
 315 Under \pTeX, the \texttt{OTF} package (developed by Shuzaburo Saito) is
 316 used for typesetting characters which are in Adobe-Japan1-6 CID but not
 317 in JIS~X~0208. Since this package is widely used, \LuaTeX-ja
 318 supports some of the functions in the \texttt{OTF} package.
 319
 320 \begin{LTXexample}
 321 森\UTF{9DD7}外と内田百\UTF{9592}とが\UTF{9AD9}島屋に行く。
 322
 323 \CID{7652}飾区の\CID{13706}野家，
 324 葛飾区の吉野家
 325 \end{LTXexample}
 326 % lltjlisting.sty needs fixing?: a line break occurs right after `森' above.
 327
 328
 329 \subsection{Changing Fonts}
 330 \paragraph{Remark: Japanese Characters in Math Mode}
 331 Since \pTeX\ supports Japanese characters in math mode, there are
 332 sources like the following:
 333
 334 \begin{LTXexample}
 335 $f_{高温}$~($f_{\text{high temperature}}$).
 336 \[ y=(x-1)^2+2\quad{}よって\quad y>0 \]
 337 $5\in{}素:=\{\,p\in\mathbb N:\text{$p$ is a prime}\,\}$.
 338 \end{LTXexample}
 339
 340 We (the project members of \LuaTeX-ja) think that using
 341 Japanese characters in math mode is acceptable if and only if they are used as identifiers.
 342 From this point of view,
 343 \begin{itemize}
 344 \item Lines 1~and~2 above are not correct, since `高温' above is used as a textual label, and
 345 `よって' is used as a conjunction.
 346 \item However, line~3 is correct, since `素' is used as an identifier.
 347 \end{itemize}
 348 Hence, in our opinion, the above input should be corrected as:
 349 \begin{LTXexample}
 350 $f_{\text{高温}}$~%
 351 ($f_{\text{high temperature}}$).
 352 \[ y=(x-1)^2+2\quad
 353   \mathrel{\text{よって}}\quad y>0 \]
 354 $5\in{}素:=\{\,p\in\mathbb N:\text{$p$ is a prime}\,\}$.
 355 \end{LTXexample}
 356 % BUG?: `素' is not printed without \{\}. `よって' in the paragraph above is not printed either.
 357 We also believe that using Japanese characters as identifiers is rare,
 358 hence we don't describe how to change Japanese fonts in math mode in
 359 this chapter. For the method, please see Part~\ref{part-ref}.
 360
 361
 362 \paragraph{plain \TeX}
 363 To change Japanese fonts in plain \TeX, you must use the primitive
 364 \verb+\jfont+. For details, see Part~\ref{part-ref}.
 365
 366
 367 \paragraph{NFSS2}
 368 For \LaTeXe, \LuaTeX-ja simply adopted the font selection system from that
 369 of \pLaTeXe\ (in {\tt plfonts.dtx}).
 370 \begin{itemize}
 371 \item Two control sequences \verb+\mcdefault+ and \verb+\gtdefault+ are
 372       used to specify the default font families for \emph{mincho} and
 373       \emph{gothic}, respectively. Default values: \texttt{mc} for
 374       \verb+\mcdefault+ and \texttt{gt} for \verb+\gtdefault+.
 375 \item Commands \verb+\fontfamily+, \verb+\fontseries+,
 376       \verb+\fontshape+ and \verb+\selectfont+ can be used to change
 377       attributes of Japanese fonts.
 378 \begin{center}
 379 \begin{tabular}{ccccc}
 380 \toprule
 381 &\textbf{encoding}&\textbf{family}&\textbf{series}&\textbf{shape}\\\midrule
 382 alphabetic fonts
 383 &\verb+\romanencoding+&\verb+\romanfamily+&\verb+\romanseries+&\verb+\romanshape+\\
 384 Japanese fonts
 385 &\verb+\kanjiencoding+&\verb+\kanjifamily+&\verb+\kanjiseries+&\verb+\kanjishape+\\
 386 both&---&---&\verb+\fontseries+&\verb+\fontshape+\\
 387 auto select&\verb+\fontencoding+&\verb+\fontfamily+&---&---\\
 388 \bottomrule
 389 \end{tabular}
 390 \end{center}
 391 \item For defining a Japanese font family, use \verb+\DeclareKanjiFamily+
 392       instead of \verb+\DeclareFontFamily+.
 393 \end{itemize}
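
The commands in the table can be combined as in the following sketch, which
assumes the default families \texttt{mc} and \texttt{gt} described in
Subsection~\ref{ssec-ltx}. The series is given as \verb+\bfdefault+ so that
the example does not depend on a particular series name.
\begin{lstlisting}
% change only the Japanese font family, then activate it
{\kanjifamily{gt}\selectfont 漢字 and alphabetic text}

% \fontseries affects both Japanese and alphabetic fonts
{\fontseries{\bfdefault}\selectfont 太字 and bold text}
\end{lstlisting}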
 394
 395 \paragraph{fontspec}
 396 To coexist with the \texttt{fontspec} package, you need to load the
 397 \texttt{luatexja-fontspec} package in the preamble. This additional
 398 package automatically loads the \texttt{luatexja} and \texttt{fontspec}
 399 packages, if needed.
 400
 401 In the \texttt{luatexja-fontspec} package, the following 7~commands are defined as
 402 counterparts of original commands in \texttt{fontspec}:
 403 \begin{center}
 404 \begin{tabular}{ccccc}
 405 \toprule
 406 Japanese fonts
 407 &\verb+\jfontspec+&\verb+\setmainjfont+&\verb+\setsansjfont+&\verb+\newjfontfamily+\\
 408 alphabetic fonts
 409 &\verb+\fontspec+&\verb+\setmainfont+&\verb+\setsansfont+&\verb+\newfontfamily+\\
 410 \midrule
 411 Japanese fonts
 412 &\verb+\newjfontface+&\verb+\defaultjfontfeatures+&\verb+\addjfontfeatures+\\
 413 alphabetic fonts
 414 &\verb+\newfontface+&\verb+\defaultfontfeatures+&\verb+\addfontfeatures+\\
 415 \bottomrule
 416 \end{tabular}
 417 \end{center}
 418 % TODO: add a usage example here.
 419
 420
 421 Note that there is no command named \verb+\setmonojfont+, since
 422 in most Japanese fonts nearly all Japanese glyphs have the same
 423 width.  Also note that the kerning feature is turned off by default in
 424 these 7~commands, since this feature and \textbf{JAglue} will clash (see
 425 \ref{para-kern}).
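
A typical preamble might look like the following sketch. The font names
\texttt{IPAexMincho} and \texttt{IPAexGothic} are assumptions for
illustration; replace them with Japanese fonts installed on your system.
\verb+\myjfont+ is a hypothetical name.
\begin{lstlisting}
\usepackage{luatexja-fontspec}
\setmainjfont{IPAexMincho}   % Japanese counterpart of \setmainfont
\setsansjfont{IPAexGothic}   % Japanese counterpart of \setsansfont
\newjfontfamily\myjfont{IPAexGothic}
\end{lstlisting}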
 426
 427 \section{Changing Parameters}
 428 There are many parameters in \LuaTeX-ja. Due to the behavior of \LuaTeX,
 429 most of them are not stored in internal registers of \TeX, but in a
 430 storage system of \LuaTeX-ja's own. Hence, to assign or acquire those
 431 parameters, you have to use the commands \verb+\ltjsetparameter+ and
 432 \verb+\ltjgetparameter+.
 433
 434 \subsection{Editing the range of \textbf{JAchar}s}
 435
 436
 437 To edit the range of \textbf{JAchar}s, you first have to assign a non-zero
 438 natural number which is less than 217 to the character range. This
 439 can be done by using the \verb+\ltjdefcharrange+ primitive. For example, the
 440 next line assigns all characters in the Supplementary Multilingual Plane
 441 and the character `漢' to the range number~100.
 442 \begin{lstlisting}
 443 \ltjdefcharrange{100}{"10000-"1FFFF,`漢}
 444 \end{lstlisting}
 445 This assignment of numbers to ranges is always global, so you should
 446 not do this in the middle of a document.
 447
 448 If a character already belongs to some non-zero numbered range,
 449 this will be overwritten by the new setting. For example, the whole SMP
 450 belongs to the range~4 in the default setting of \LuaTeX-ja, and if you
 451 specify the above line, then the SMP will belong to the range~100 and be
 452 removed from the range~4.
 453
 454 After assigning numbers to ranges, the {\sf jacharrange} parameter can
 455 be used to customize which character ranges will be treated as ranges of
 456 \textbf{JAchar}s, as in the following line (this is just the default
 457 setting of \LuaTeX-ja):
 458 \begin{verbatim}
 459 \ltjsetparameter{jacharrange={-1, +2, +3, -4, -5, +6, +7, +8}}
 460 \end{verbatim}
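
Putting the two steps together, the sketch below first defines range~100 and
then changes how it is treated. Judging from the default setting above and
the labels `J' and `A' in the list of predefined ranges, a positive entry
appears to mark a range as \textbf{JAchar} and a negative entry as
\textbf{ALchar}; this reading is an inference from this document, not a
separately verified specification.
\begin{lstlisting}
\ltjdefcharrange{100}{"10000-"1FFFF,`漢}
% treat range 100 as JAchar ...
\ltjsetparameter{jacharrange={+100}}
% ... or as ALchar
\ltjsetparameter{jacharrange={-100}}
\end{lstlisting}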
 461
 462
 463
 464 \paragraph{Default Setting}
 465 Lua\TeX-ja predefines eight character ranges for convenience. They are
 466 determined from the following data:
 467 \begin{itemize}
 468 \item Blocks in Unicode~6.0.
 469 \item The \texttt{Adobe-Japan1-UCS2} mapping between CIDs in Adobe-Japan1-6 and Unicode.
 470 \item The \texttt{PXbase} bundle for \upTeX\ by Takayuki Yato.
 471 \end{itemize}
 472
 473 Now we describe these eight ranges. The letter `J' or `A' after the
 474 number shows whether characters in the range are treated as
 475 \textbf{JAchar}s or not by default. These settings are similar to \texttt{prefercjk} ...
 476 \begin{description}
 477 \item[Range~8${}^{\text{J}}$] Symbols in the intersection of the upper half of ISO~8859-1
 478          (Latin-1 Supplement) and JIS~X~0208 (a basic character set for Japanese). This character range
 479          consists of the following characters:
 480 \begin{multicols}{2}
 481 \begin{itemize}
 482 \def\ch#1#2{\item \char"#1\ ({\tt U+00#1}, #2)}%"
 483 \ch{A7}{Section Sign}
 484 \ch{A8}{Umlaut or diaeresis}
 485 \ch{B0}{Degree sign}
 486 \ch{B1}{Plus-minus sign}
 487 \ch{B4}{Spacing acute}
 488 \ch{B6}{Paragraph sign}
 489 \ch{D7}{Multiplication sign}
 490 \ch{F7}{Division Sign}
 491 \end{itemize}
 492 \end{multicols}
 493 \item[Range~1${}^{\text{A}}$] Latin characters, some of which are included in Adobe-Japan1-6.
 494 This range consists of the following Unicode ranges, \emph{except characters in the range~8 above}:
 495 \begin{multicols}{2}
 496 \begin{itemize}
 497 \item {\tt U+0080}--{\tt U+00FF}: Latin-1 Supplement
 498 \item {\tt U+0100}--{\tt U+017F}: Latin Extended-A
 499 \item {\tt U+0180}--{\tt U+024F}: Latin Extended-B
 500 \item {\tt U+0250}--{\tt U+02AF}: IPA Extensions
 501 \item {\tt U+02B0}--{\tt U+02FF}: Spacing Modifier Letters
 502 \item {\tt U+0300}--{\tt U+036F}: Combining Diacritical Marks
 503 \item {\tt U+1E00}--{\tt U+1EFF}: Latin Extended Additional
 504 \par\
 505 \end{itemize}
 506 \end{multicols}
 507 \item[Range~2${}^{\text{J}}$] Greek and Cyrillic letters. JIS~X~0208 (hence most Japanese
 508            fonts) has some of these characters.
 509 \begin{multicols}{2}
 510 \begin{itemize}
 511 \item {\tt U+0370}--{\tt U+03FF}: Greek and Coptic
 512 \item {\tt U+0400}--{\tt U+04FF}: Cyrillic
 513 \item {\tt U+1F00}--{\tt U+1FFF}: Greek Extended
 514 \\\
 515 \end{itemize}
 516 \end{multicols}
 517 \item[Range~3${}^{\text{J}}$] Punctuation and miscellaneous symbols. The block list is
 518            indicated in Table~\ref{table-rng3}.
 519 \begin{table}[p]
 520 \caption{Unicode blocks in predefined character range~3.}\label{table-rng3}
 521 \catcode`\"=13\def"#1#2#3#4{{\tt U+#1#2#3#4}}%"
 522 \begin{center}
 523 \begin{tabular}{ll}
 524 "2000--"206F&General Punctuation\\
 525 "2070--"209F&Superscripts and Subscripts\\
 526 "20A0--"20CF&Currency Symbols\\
 527 "20D0--"20FF&Combining Diacritical Marks for Symbols\\
 528 "2100--"214F&Letterlike Symbols\\
 529 "2150--"218F&Number Forms\\
 530 "2190--"21FF&Arrows\\
 531 "2200--"22FF&Mathematical Operators\\
 532 "2300--"23FF&Miscellaneous Technical\\
 533 "2400--"243F&Control Pictures\\
 534 "2500--"257F&Box Drawing\\
 535 "2580--"259F&Block Elements\\
 536 "25A0--"25FF&Geometric Shapes\\
 537 "2600--"26FF&Miscellaneous Symbols\\
 538 "2700--"27BF&Dingbats\\
 539 "2900--"297F&Supplemental Arrows-B\\
 540 "2980--"29FF&Miscellaneous Mathematical Symbols-B\\
 541 "2B00--"2BFF&Miscellaneous Symbols and Arrows\\
 542 "E000--"F8FF&Private Use Area\\
 543 "FB00--"FB4F&Alphabetic Presentation Forms
 544 \end{tabular}
 545 \end{center}
 546 \end{table}
 547 \item[Range~4${}^{\text{A}}$] Characters usually not in Japanese fonts. This range consists
 548            of almost all Unicode blocks which are not in other
 549            predefined ranges. Hence, instead of showing the block list,
 550            we put the definition of this range itself:
 551 \begin{lstlisting}
 552 \ltjdefcharrange{4}{%
 553    "500-"10FF, "1200-"1DFF, "2440-"245F, "27C0-"28FF, "2A00-"2AFF,
 554   "2C00-"2E7F, "4DC0-"4DFF, "A4D0-"A82F, "A840-"ABFF, "FB50-"FE0F,
 555   "FE20-"FE2F, "FE70-"FEFF, "10000-"1FFFF} % non-Japanese
 556 \end{lstlisting}
 557 \item[Range~5${}^{\text{A}}$] Surrogates and Supplementary Private Use Areas.
 558 \item[Range~6${}^{\text{J}}$] Characters used in Japanese. The block list is indicated in Table~\ref{table-rng6}.
 559 \begin{table}[p]
 560 \caption{Unicode blocks in predefined character range~6.}\label{table-rng6}
 561 \catcode`\"=13\def"#1#2#3#4{{\tt U+#1#2#3#4}}%"
 562 \begin{center}
 563 \begin{tabular}{ll}
 564 "2460--"24FF&Enclosed Alphanumerics\\
 565 "2E80--"2EFF&CJK Radicals Supplement\\
 566 "3000--"303F&CJK Symbols and Punctuation\\
 567 "3040--"309F&Hiragana\\
 568 "30A0--"30FF&Katakana\\
 569 "3190--"319F&Kanbun\\
 570 "31F0--"31FF&Katakana Phonetic Extensions\\
 571 "3200--"32FF&Enclosed CJK Letters and Months\\
 572 "3300--"33FF&CJK Compatibility\\
 573 "3400--"4DBF&CJK Unified Ideographs Extension A\\
 574 "4E00--"9FFF&CJK Unified Ideographs\\
 575 "F900--"FAFF&CJK Compatibility Ideographs\\
 576 "FE10--"FE1F&Vertical Forms\\
 577 "FE30--"FE4F&CJK Compatibility Forms\\
 578 "FE50--"FE6F&Small Form Variants\\
 579 "{20}000--"{2F}FFF&(Supplementary Ideographic Plane)
 580 \end{tabular}
 581 \end{center}
 582 \end{table}
 583 \item[Range~7${}^{\text{J}}$] Characters used in CJK languages, but not included in Adobe-Japan1-6.
 584 The block list is indicated in Table~\ref{table-rng7}.
 585 \begin{table}[p]
 586 \caption{Unicode blocks in predefined character range~7.}\label{table-rng7}
 587 \catcode`\"=13\def"#1#2#3#4{{\tt U+#1#2#3#4}}%"
 588 \begin{center}
 589 \begin{tabular}{ll}
 590 "1100--"11FF&Hangul Jamo\\
 591 "2F00--"2FDF&Kangxi Radicals\\
 592 "2FF0--"2FFF&Ideographic Description Characters\\
 593 "3100--"312F&Bopomofo\\
 594 "3130--"318F&Hangul Compatibility Jamo\\
 595 "31A0--"31BF&Bopomofo Extended\\
 596 "31C0--"31EF&CJK Strokes\\
 597 "A000--"A48F&Yi Syllables\\
 598 "A490--"A4CF&Yi Radicals\\
 599 "A830--"A83F&Common Indic Number Forms\\
 600 "AC00--"D7AF&Hangul Syllables\\
 601 "D7B0--"D7FF&Hangul Jamo Extended-B
 602 \end{tabular}
 603 \end{center}
 604 \end{table}
 605 \end{description}
 606
 607
 608 \subsection{\textsf{kanjiskip} and \textsf{xkanjiskip}}\label{subs-kskip}
 609 \textbf{JAglue} is divided into the following three categories:
 610 \begin{itemize}
 611 \item Glues/kerns specified in JFM. If \verb+\inhibitglue+ is issued
 612       around a Japanese character, this glue will not be inserted at that
 613       place.
 614 \item The default glue which is inserted between two \textbf{JAchar}s ({\sf
 615       kanjiskip}).
 616 \item The default glue which is inserted between a \textbf{JAchar} and an
 617       \textbf{ALchar} (\textsf{xkanjiskip}).
 618 \end{itemize}
 619 The value (a skip) of \textsf{kanjiskip} or \textsf{xkanjiskip} can be
 620 changed as follows:
 621 \begin{lstlisting}
 622 \ltjsetparameter{kanjiskip={0pt plus 0.4pt minus 0.4pt},
 623                  xkanjiskip={0.25\zw plus 1pt minus 1pt}}
 624 \end{lstlisting}
 625
 626
 627 A JFM may contain the data of the `ideal width of {\sf
 628 kanjiskip}' and/or the `ideal width of \textsf{xkanjiskip}'.
 629 To use these data from the JFM, set the value of \textsf{kanjiskip} or
 630 \textsf{xkanjiskip} to \verb+\maxdimen+.
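
For example, the following line (a sketch based on the description above)
requests the values from the JFM for both parameters:
\begin{verbatim}
\ltjsetparameter{kanjiskip=\maxdimen, xkanjiskip=\maxdimen}
\end{verbatim}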
 631
 632 \subsection{Insertion Setting of \textsf{xkanjiskip}}
 633 It is not desirable that \textsf{xkanjiskip} is inserted at every
 634 boundary between \textbf{JAchar}s and \textbf{ALchar}s. For example,
 635 \textsf{xkanjiskip} should not be inserted after an opening parenthesis
 636 (\textit{e.g.}, compare `(あ' and `(\hskip\ltjgetparameter{xkanjiskip}あ').
 637
 638 \LuaTeX-ja can control whether \textsf{xkanjiskip} can be inserted
 639 before/after a character, by changing the \textsf{jaxspmode} parameter for \textbf{JAchar}s and
 640 the \textsf{alxspmode} parameter for \textbf{ALchar}s.
 641 \begin{LTXexample}
 642 \ltjsetparameter{jaxspmode={`あ,preonly}, alxspmode={`\!,postonly}}
 643 pあq い!う
 644 \end{LTXexample}
 645
 646 The second argument {\tt preonly} means `the insertion of
 647 \textsf{xkanjiskip} is allowed before this character, but not after'.
 648 The other possible values are {\tt postonly}, {\tt allow} and {\tt
 649 inhibit}. For compatibility with \pTeX, natural numbers between
 650 0~and~3 are also allowed as the second argument\footnote{But we don't
 651 recommend this, since numbers 1~and~2 have opposite meanings in
 652 \textsf{jaxspmode} and \textsf{alxspmode}.}.
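
As a further sketch, the opening parenthesis mentioned at the beginning of
this subsection could be handled as follows. Whether this already matches
the default setting of \LuaTeX-ja is not checked here.
\begin{lstlisting}
% allow xkanjiskip before `(' but not after it
\ltjsetparameter{alxspmode={`(,preonly}}
\end{lstlisting}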
 653
 654 If you want to enable/disable all insertions of \textsf{kanjiskip} and
 655 \textsf{xkanjiskip}, set the \textsf{autospacing} and \textsf{autoxspacing}
 656 parameters to {\tt true}/{\tt false}, respectively.
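
For example, the following sketch switches both kinds of automatic insertion
off and later on again:
\begin{verbatim}
\ltjsetparameter{autospacing=false, autoxspacing=false}
% ... text without kanjiskip/xkanjiskip ...
\ltjsetparameter{autospacing=true, autoxspacing=true}
\end{verbatim}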
 657
 658
 659 \subsection{Shifting Baseline}
 660 To make a match between a Japanese font and an alphabetic font, sometimes
 661 shifting of the baseline of one of the pair is needed. In \pTeX, this is achieved
 662 by setting \verb+\ybaselineshift+ to a non-zero length (the
 663 baseline of alphabetic fonts is shifted downward). However, for documents
 664 whose main language is not Japanese, it is better to shift the baseline of
 665 Japanese fonts, but not that of alphabetic fonts.
 666 Because of this, \LuaTeX-ja can independently set the shifting amount
 667 of the baseline of alphabetic fonts (\textsf{yalbaselineshift}
 668 parameter) and that of Japanese fonts (\textsf{yjabaselineshift}
 669 parameter).
 670
 671 \begin{LTXexample}
 672 \vrule width 150pt height 0.4pt depth 0pt\hskip-120pt
 673 \ltjsetparameter{yjabaselineshift=0pt, yalbaselineshift=0pt}abcあいう
 674 \ltjsetparameter{yjabaselineshift=5pt, yalbaselineshift=2pt}abcあいう
 675 \end{LTXexample}
 676 Here the horizontal line above is the baseline of the line.
 677
 678 There is an interesting side-effect: characters of different sizes can be
 679 vertically centered in a line, by setting the two parameters appropriately.
 680 The following is an example (beware that the values are not well tuned):
 681 \begin{LTXexample}
 682 xyz漢字
 683 {\scriptsize
 684   \ltjsetparameter{yjabaselineshift=-1pt,
 685     yalbaselineshift=-1pt}
 686   XYZひらがな
 687 }abcかな
 688 \end{LTXexample}
 689
 690
 691 \subsection{Cropmark}
 692 A cropmark is a mark indicating the 4~corners and the horizontal/vertical
 693 center of the paper. In Japanese, a cropmark is called tombo(w).
 694 \pLaTeX\ and \LuaTeX-ja support `tombow' in their kernels.
 695 The following steps are needed to typeset cropmarks:
 696
 697 \begin{enumerate}
 698 \item First, define the banner which will be printed at the upper left
 699       of the paper. This is done by assigning a token list to
 700       \verb+\@bannertoken+.
 701
 702 For example, the following sets the banner to `{\tt filename (2012-01-01 17:01)}':
 703 \begin{verbatim}
 704 \makeatletter
 705
 706 \hour\time \divide\hour by 60 \@tempcnta\hour \multiply\@tempcnta 60\relax
 707 \minute\time \advance\minute-\@tempcnta
 708 \@bannertoken{%
 709    \jobname\space(\number\year-\two@digits\month-\two@digits\day
 710    \space\two@digits\hour:\two@digits\minute)}%
 711 \end{verbatim}
 712
 713 \item ...
 714 \end{enumerate}
 715
 716
 717 \part{Reference}\label{part-ref}
 718 \section{Font Metric and Japanese Font}
 719 \subsection{\texttt{\char92jfont} primitive}
 720 To load a font as a Japanese font, you must use the
 721 \verb+\jfont+ primitive instead of~\verb+\font+, while
 722 \verb+\jfont+ admits the same syntax used in~\verb+\font+.
 723 \LuaTeX-ja automatically loads the \texttt{luaotfload} package,
 724 so TrueType/OpenType fonts with features can be used for Japanese fonts:
 725 \begin{LTXexample}
 726 \jfont\tradgt={file:ipaexg.ttf:script=latn;%
 727   +trad;-kern;jfm=ujis} at 14pt
 728 \tradgt{}当／体／医／区
 729 \end{LTXexample}
 730
 731 Note that the control sequence
 732 (\verb+\tradgt+ in the example above) defined using \verb+\jfont+ is not a
 733 \textit{font\_def} token, hence input like \verb+\fontname\tradgt+
 734 causes an error.  We denote control sequences which are defined by
 735 \verb+\jfont+ by <jfont\_cs>.
 736
 737 \paragraph{Prefix \texttt{psft}}
 738 Besides the \texttt{file:}\ and \texttt{name:}\ prefixes, \texttt{psft:}\
 739 can be used as a prefix in the \verb+\jfont+ (and~\verb+\font+) primitive.
 740 Using this prefix, you can specify a `name-only' Japanese font which
 741 will not be embedded in the PDF. A typical use of this prefix is to specify
 742 the `standard' Japanese fonts, namely, `Ryumin-Light' and
 743 `GothicBBB-Medium'. For kerning and other information, that of Kozuka
 744 Mincho Pr6N Regular (this is a font by Adobe Inc., and included in
 745 Japanese Font Packs for Adobe Reader) will be used.
 746
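As a minimal sketch of the \texttt{psft:}\ prefix, the following loads the
two `standard' fonts by name only. The control sequence names
\verb+\stdmin+ and \verb+\stdgt+ and the size are arbitrary choices made
for illustration, and \texttt{jfm=ujis} is assumed to be a suitable JFM
for these fonts.
\begin{verbatim}
% name-only fonts: nothing is embedded in the PDF,
% so the viewer or printer must supply the actual glyphs
\jfont\stdmin=psft:Ryumin-Light:jfm=ujis at 10pt
\jfont\stdgt=psft:GothicBBB-Medium:jfm=ujis at 10pt
\stdmin 明朝体 \stdgt ゴシック体
\end{verbatim}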
 747
 748 \paragraph{JFM}
As noted in the Introduction, a JFM has measurements of characters and
glues/kerns that are automatically inserted for Japanese
typesetting. The structure of a JFM will be described in the next
subsection. When calling the \verb+\jfont+ primitive, you must specify
which JFM will be used for this font by the following keys:
 754
 755 \begin{list}{}{\def\makelabel{\ttfamily}\def\{{\char`\{}\def\}{\char`\}}}
 756 \item[jfm=<name>]
 757 Specify the name of JFM. A file named \texttt{jfm-<name>.lua} will be searched and/or loaded.
 758
The following JFMs are shipped with Lua\TeX-ja:
 760 \begin{description}
\item[\tt jfm-ujis.lua] The standard JFM in Lua\TeX-ja. This JFM is
      based on \verb+upnmlminr-h.tfm+, a metric for the UTF/OTF package that
      is used in \upTeX. When you use \texttt{luatexja-otf.sty}, please use this JFM.
\item[\tt jfm-jis.lua] A counterpart for \verb+jis.tfm+, the `JIS font
           metric' which is widely used in \pTeX. A major difference between
           \texttt{jfm-ujis.lua} and \texttt{jfm-jis.lua} is that
           most characters under \texttt{jfm-ujis.lua} are square-shaped,
           while those under \texttt{jfm-jis.lua} are horizontal
           rectangles.

\item[\tt jfm-min.lua] A counterpart for \verb+min10.tfm+, which is one
           of the default Japanese font metrics shipped with \pTeX. There
           are notable differences between this JFM and the other two~JFMs, as
           shown below:

TODO: find a good example here. Rather than simply saying `min10 has
bugs', it should also show the proportional aspects of this metric
(perhaps use the example from Otobe-san's \texttt{min10.pdf}?).
 778 \end{description}
 779
 780 \item[jfmvar=<string>] Sometimes there is a need that
 781 \end{list}
 782
 783
 784 \paragraph{Note: kern feature}\label{para-kern}
Some fonts have information for inter-glyph spacing. However, this
information is not well-compatible with \LuaTeX-ja.  More concretely,
the kerning spaces from this information are inserted \emph{before} the
insertion process of \textbf{JAglue}, and this causes incorrect spacing
between two characters when both a glue/kern from the data in the font
and one from the JFM are present.
 791
 792 \begin{itemize}
\item You should specify {\tt -kern} in the
{\tt\char92jfont} primitive when you want to use other font features,
      such as {\tt script=...}\,.
\item If you want to use Japanese fonts in proportional width, and use
      the information from the font, use \texttt{jfm-prop.lua} for its JFM, and ...
 798
 799 TODO: kanjiskip?
 800 \end{itemize}
 801
 802
 803 \subsection{Structure of JFM file}
 804 A JFM file is a Lua script which has only one function call:
 805 \begin{verbatim}
 806 luatexja.jfont.define_jfm { ... }
 807 \end{verbatim}
The actual data are stored in the table indicated above by
\verb+{ ... }+.  The rest of this subsection is devoted to describing the
structure of this table.  Note that all lengths in a JFM file are
floating-point numbers in design-size unit.
 812
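As a worked illustration of the design-size unit (the numbers here are
hypothetical and not taken from any shipped JFM), assume that a length
in a JFM is scaled by the size at which the font is loaded. For a font
loaded \texttt{at 14pt}, a JFM length of $1.0$ and a glue specification
of $\{0.5,\ 0,\ 0.5\}$ would then correspond to
\begin{align*}
  1.0 \times 14\,\textrm{pt} &= 14\,\textrm{pt}, \\
  \{0.5,\ 0,\ 0.5\} \times 14\,\textrm{pt}
    &= 7\,\textrm{pt}\ \text{plus}\ 0\,\textrm{pt}\ \text{minus}\ 7\,\textrm{pt}.
\end{align*}
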
 813 \begin{list}{}{\def\makelabel{\ttfamily}\def\{{\char`\{}\def\}{\char`\}}}
 814 \item[dir=<direction>] (required)
 815
The direction of the JFM. At present, only \texttt{'yoko'} is supported.
 817
 818 \item[zw=<length>] (required)
 819
The length of the `full-width'.
 821
 822 \item[zh=<length>] (required)
 823
 824 \item[kanjiskip=\{<natural>, <stretch>, <shrink>\}] (optional)
 825
 826 This field specifies the `ideal' amount of \textsf{kanjiskip}. As noted
 827              in Subsection~\ref{subs-kskip}, if the parameter
 828              \textsf{kanjiskip} is \verb+\maxdimen+, the value specified
 829              in this field is actually used (if this field is not specified in
 830              JFM, it is regarded as 0\,pt). Note that <stretch> and <shrink>
 831              fields are in design-size unit too.
 832
 833
 834 \item[xkanjiskip=\{<natural>, <stretch>, <shrink>\}] (optional)
 835
 836 Like the \texttt{kanjiskip} field, this field specifies the `ideal'
 837              amount of \textsf{xkanjiskip}.
 838
 839 \end{list}
 840
Besides the above fields, a JFM file has several sub-tables whose
indices are natural numbers.  The table indexed by~$i\in\omega$ stores
information on `character class'~$i$. The character class~0 is
always present, so each JFM file must have a sub-table whose index is
\texttt{[0]}.  Each sub-table (its numerical index is denoted by $i$) has
the following fields:
 847
 848 \begin{list}{}{\def\makelabel{\ttfamily}\def\{{\char`\{}\def\}{\char`\}}}
 849 \item[chars=\{<character>, ...\}] (required except character class~0)
 850
This field is a list of characters which are in this character
             class~$i$. This field is not required if $i=0$, since all
             \textbf{JAchar}s which are not in any character class other
             than 0 belong to the character class~0 (hence, the character
             class~0 contains most of the \textbf{JAchar}s). In the list,
             a character can be
             specified by its code number, or by the character itself
             (as a string of length~1).
 858
 859 In addition to those `real' characters, the following `imaginary
 860              characters' can be specified in the list:
 861 \begin{list}{}{\def\makelabel{\ttfamily}\def\{{\char`\{}\def\}{\char`\}}}
 862 \item['lineend'] An ending of a line.
 863 \item['diffmet'] Used at a boundary between two \textbf{JAchar}s whose JFM or size is different.
\item['boxbdd'] The beginning/ending of a horizontal box, and the beginning of a non-indented paragraph.
 865 \item['parbdd'] The beginning of an (indented) paragraph.
 866 \item['jcharbdd'] A boundary between \textbf{JAchar} and anything else
 867              (such as \textbf{ALchar}, kern, glue, ...).
 868 \item[$-1$] The left/right boundary of an inline math formula.
 869 \end{list}
 870
 871 \item[width=<length>, height=<length>, depth=<length>, italic=<length>]\ (required)
 872
Specify the width, height, depth and
the amount of italic correction of characters in character class~$i$. All characters in character class~$i$ are regarded as having the width, height and depth
given by these fields.
There is one exception: if \texttt{'prop'} is specified in the \texttt{width} field, the width of a character becomes that of its `real' glyph.
 877
 878 \item[left=<length>, down=<length>, align=<align>]\
 879
These fields are for adjusting the position of the `real' glyph. Legal
             values of the \texttt{align} field are \texttt{'left'},
             \texttt{'middle'} and \texttt{'right'}. If any of these
             three fields is omitted, an omitted \texttt{left} or \texttt{down} is
             treated as~0, and an omitted \texttt{align} field is treated as
             \texttt{'left'}.
The effects of these three fields are indicated in Figure~\ref{fig-pos}.

In most cases, the \texttt{left} and \texttt{down} fields are~0, while
it is not uncommon that the \texttt{align} field is \texttt{'middle'} or \texttt{'right'}.
For example, setting the \texttt{align} field to \texttt{'right'} is practically needed
when the current character class is the class for opening delimiters.
 892 \begin{figure}[tb]
 893 \begin{minipage}{0.4\textwidth}%
 894 \begin{center}\unitlength=10pt\small
 895 \begin{picture}(15,12)(-1,-4)
 896 \color{black!10!white}% real glyph :step1
 897 \put(0,0){\vrule width 12\unitlength height 8\unitlength depth 3\unitlength}
 898
 899 \color{red!20!white}% real glyph :step1
 900 \put(-1,-1.5){\vrule width 6\unitlength height 7\unitlength depth 2.5\unitlength}
 901
 902 \color{red}% real glyph
 903 \thicklines
 904 \put(-1,-1.5){\vector(0,1){7}\vector(0,-1){2.5}\vector(1,0){6}}
 905 \put(5,-1.5){\line(0,1){7}\line(0,-1){2.5}}
 906 \put(-1,5.5){\line(1,0){6}}
 907 \put(-1,-4){\line(1,0){6}}
 908
 909 \color{green!20!white}% real glyph :step1
 910 \put(3,0){\vrule width 6\unitlength height 7\unitlength depth 2.5\unitlength}
 911
 912 \color{black}% real glyph :step1
 913 \thicklines
 914 \put(0,0){\vector(0,1){8}\line(0,-1){3}\vector(1,0){12}}
 915 \put(12,0){\line(0,1){8}\vector(0,-1){3}}
 916 \put(0,8){\line(1,0){12}}
 917 \put(0,-3){\line(1,0){12}}
 918 \put(0.2,4){\makebox(0,0)[l]{\texttt{height}}}
 919 \put(12.2,-1.5){\makebox(0,0)[l]{\texttt{depth}}}
 920 \put(6,0.2){\makebox(0,0)[b]{\texttt{width}}}
 921
 922 \color{green!50!black}% real glyph :step1
 923 \thicklines
 924 \put(3,0){\vector(0,1){7}\vector(0,-1){2.5}\vector(1,0){6}}
 925 \put(9,0){\line(0,1){7}\line(0,-1){2.5}}
 926 \put(3,7){\line(1,0){6}}
 927 \put(3,-2.5){\line(1,0){6}}
 928 \newsavebox{\eqdist}
 929 \savebox{\eqdist}(0,0)[b]{%
 930   \thinlines
 931   \put(-0.08,0.2){\line(0,-1){0.4}}%
 932   \put(0.08,0.2){\line(0,-1){0.4}}}
 933 \put(1.5,0){\usebox{\eqdist}}
 934 \put(10.5,0){\usebox{\eqdist}}
 935
 936 \color{blue}% shifted
 937 \thicklines
 938 \put(3,-1.5){\vector(-1,0){4}}
 939 \put(1,-1.7){\makebox(0,0)[t]{\texttt{left}}}
 940 \put(3,0){\vector(0,-1){1.5}}
 941 \put(3.2,-0.75){\makebox(0,0)[l]{\texttt{down}}}
 942 \end{picture}
 943 \end{center}
 944 \end{minipage}%
 945 \begin{minipage}{0.6\textwidth}%
Consider a node containing a Japanese character whose \texttt{align}
field is \texttt{'middle'}.
 948 \begin{itemize}
 949 \item The black rectangle is a frame of the node.
 950 Its width, height and depth are specified by JFM.
 951 \item Since the \texttt{align} field is \texttt{'middle'},
 952 the `real' glyph is centered horizontally (the green rectangle).
 953 \item Furthermore, the glyph is shifted according to values of fields
 954       \texttt{left} and \texttt{down}. The ultimate position of the real
 955       glyph is indicated by the red rectangle.
 956 \end{itemize}
 957 \end{minipage}
 958 \caption{The position of the `real' glyph.}
 959 \label{fig-pos}
 960 \end{figure}
 961
 962
 963 \item[kern={\{[$j$]=<kern>, ...\}}]
 964
 965 \item[glue={\{[$j$]=\{<width>, <stretch>, <shrink>\}, ...\}}]
 966 \end{list}
 967
 968 \subsection{Math Font Family}
\TeX\ handles fonts in math formulas by 16~font families\footnote{Omega,
Aleph, \LuaTeX~and $\varepsilon$-\kern-.125em(u)\pTeX\ can handle 256~families, but
an external package is needed to support this in plain \TeX\ and
\LaTeX.}, and each family has three fonts:
 973 \verb+\textfont+, \verb+\scriptfont+ and \verb+\scriptscriptfont+.
 974
 975 \LuaTeX-ja's handling of Japanese fonts in math formulas is similar;
 976 Table~\ref{tab-math} shows counterparts to \TeX's primitives for math
 977 font families.
 978
 979 \begin{table}[tb]
\caption{Primitives for Japanese math fonts.}
\label{tab-math}
 982 \begin{center}\def\{{\char`\{}\def\}{\char`\}}
 983 \begin{tabular}{lll}
 984 \toprule
 985 &Japanese fonts&alphabetic fonts\\
 986 \midrule
 987 font family&\verb+\jfam+${}\in [0,256)$&\verb+\fam+\\
 988 text size&\tt\textsf{jatextfont}\,=\{<jfam>,<jfont\_cs>\}&\tt\verb+\textfont+<fam>=<font\_cs>\\
 989 script size&\tt\textsf{jascriptfont}\,=\{<jfam>,<jfont\_cs>\}&\tt\verb+\scriptfont+<fam>=<font\_cs>\\
 990 scriptscript size&\tt\textsf{jascriptscriptfont}\,=\{<jfam>,<jfont\_cs>\}&\tt\verb+\scriptscriptfont+<fam>=<font\_cs>\\
 991 \bottomrule
 992 \end{tabular}
 993 \end{center}
 994 \end{table}
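
The following is a minimal sketch of how the entries of
Table~\ref{tab-math} might be used together. The family number~1, the
control sequence names and the sizes are arbitrary choices made for
illustration, and the font file is the one used in the \verb+\jfont+
example earlier; whether a given family number is suitable depends on
the format and packages in use.
\begin{verbatim}
% three sizes of one Japanese font (hypothetical names)
\jfont\jmtext={file:ipaexg.ttf:jfm=ujis} at 10pt
\jfont\jmscript={file:ipaexg.ttf:jfm=ujis} at 7pt
\jfont\jmsscript={file:ipaexg.ttf:jfm=ujis} at 5pt
% register them as the fonts of Japanese math font family 1
\ltjsetparameter{jatextfont={1,\jmtext},
  jascriptfont={1,\jmscript},
  jascriptscriptfont={1,\jmsscript}}
\end{verbatim}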
 995
 996
 997
 998 \section{Parameters}
 999 \subsection{{\tt\char92 ltjsetparameter} primitive}
As noted before, \verb+\ltjsetparameter+ and \verb+\ltjgetparameter+ are
primitives for accessing most parameters of \LuaTeX-ja. One of the main
reasons that \LuaTeX-ja didn't adopt a syntax similar to that of \pTeX\
(\textit{e.g.},~\verb+\prebreakpenalty`）=10000+)
is the position of the \verb+hpack_filter+ callback in the source
of \LuaTeX; see Section~\ref{sec-para}.
1006
1007 \verb+\ltjsetparameter+ and \verb+\ltjglobalsetparameter+ are primitives
1008 for assigning parameters. These take one argument which is a
1009 \texttt{<key>=<value>} list. Allowed keys are described in the next
1010 subsection.
1011 The difference between
1012 \verb+\ltjsetparameter+ and \verb+\ltjglobalsetparameter+ is only the
1013 scope of assignment;
1014 \verb+\ltjsetparameter+ does a local assignment and
1015 \verb+\ltjglobalsetparameter+ does a global one.
They also obey the value of \verb+\globaldefs+,
like other assignments.
1018
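The difference in scope can be sketched as follows. This is only an
illustration: the penalty values are arbitrary, and the key syntax is
the one listed for \textsf{prebreakpenalty} in the next subsection.
\begin{verbatim}
\ltjsetparameter{prebreakpenalty={`）,10000}}
\begingroup
  % local assignment: undone at \endgroup
  \ltjsetparameter{prebreakpenalty={`）,100}}
\endgroup
\begingroup
  % global assignment: survives \endgroup
  \ltjglobalsetparameter{prebreakpenalty={`）,200}}
\endgroup
\end{verbatim}
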
\verb+\ltjgetparameter+ is the primitive for acquiring parameters. It
always takes a parameter name as its first argument, and in some cases
takes an additional argument---a character code, for example.
1022 \begin{LTXexample}
1023 \ltjgetparameter{differentjfm},
1024 \ltjgetparameter{autospacing},
1025 \ltjgetparameter{prebreakpenalty}{`）}.
1026 \end{LTXexample}
\emph{The return value of\/ {\normalfont\tt\char92ltjgetparameter} is
always a string}. It is output by \texttt{tex.write()}, so any
character other than space~`{\tt\char32}'~(U+0020) has the category code
12~(other), while the space has 10~(space).
1031
1032 \subsection{List of Parameters}
1033 In the following list of parameters, [\verb+\cs+] indicates the counterpart in \pTeX, and each symbol has the following meaning:
1034 \begin{itemize}
1035 \item No mark: values at the end of the paragraph or the hbox are
1036       adopted in the whole paragraph/hbox.
\item `$\ast$': local parameters, which can be changed anywhere inside a paragraph/hbox.
\item `$\dagger$': assignments are always global.
1039 \end{itemize}
1040
1041 \begin{list}{}{\def\makelabel{\ttfamily}\def\{{\char`\{}\def\}{\char`\}}}
1042 \item[\textsf{jcharwidowpenalty}\,=<penalty>] [\verb+\jcharwidowpenalty+]
1043
Penalty value for suppressing orphans. This penalty is inserted just
             after the last \textbf{JAchar} which is not regarded as a
             (Japanese) punctuation mark.
1047
1048 \item[\textsf{kcatcode}\,=\{<chr\_code>,<natural number>\}]\
1049
An additional attribute of the character whose character code is <chr\_code>.
In the present version, the lowest bit of <natural number> indicates
             whether the character is considered as a punctuation mark
             (see the description of \textsf{jcharwidowpenalty} above).
1054
1055
1056 \item[\textsf{prebreakpenalty}\,=\{<chr\_code>,<penalty>\}] [\verb+\prebreakpenalty+]
1057 \item[\textsf{postbreakpenalty}\,=\{<chr\_code>,<penalty>\}] [\verb+\postbreakpenalty+]
1058 \item[\textsf{jatextfont}\,=\{<jfam>,<jfont\_cs>\}] [\verb+\textfont+ in \TeX]
1059 \item[\textsf{jascriptfont}\,=\{<jfam>,<jfont\_cs>\}] [\verb+\scriptfont+ in \TeX]
1060 \item[\textsf{jascriptscriptfont}\,=\{<jfam>,<jfont\_cs>\}] [\verb+\scriptscriptfont+ in \TeX]
1061 \item[\textsf{yjabaselineshift}\,=<dimen>$^\ast$]\
1062 \item[\textsf{yalbaselineshift}\,=<dimen>$^\ast$] [\verb+\ybaselineshift+]
1063
1064 \item[\textsf{jaxspmode}\,=\{<chr\_code>,<mode>\}] [\verb+\inhibitxspcode+]
1065
Specifies whether insertion of \textsf{xkanjiskip} is allowed before/after a \textbf{JAchar} whose character code is <chr\_code>.
The following values are allowed for <mode>:
\begin{description}
\item[0, \texttt{inhibit}] Insertion of \textsf{xkanjiskip} is inhibited both before and after the character.
\item[2, \texttt{preonly}] Insertion of \textsf{xkanjiskip} is allowed before the character, but not after.
\item[1, \texttt{postonly}] Insertion of \textsf{xkanjiskip} is allowed after the character, but not before.
\item[3, \texttt{allow}] Insertion of \textsf{xkanjiskip} is allowed both before and after the character.
This is the default value.
\end{description}
1075
1076 \item[\textsf{alxspmode}\,=\{<chr\_code>,<mode>\}] [\verb+\xspcode+]
1077
Specifies whether insertion of \textsf{xkanjiskip} is allowed before/after an \textbf{ALchar} whose character code is <chr\_code>.
The following values are allowed for <mode>:
\begin{description}
\item[0, \texttt{inhibit}] Insertion of \textsf{xkanjiskip} is inhibited both before and after the character.
\item[1, \texttt{preonly}] Insertion of \textsf{xkanjiskip} is allowed before the character, but not after.
\item[2, \texttt{postonly}] Insertion of \textsf{xkanjiskip} is allowed after the character, but not before.
\item[3, \texttt{allow}] Insertion of \textsf{xkanjiskip} is allowed both before and after the character.
This is the default value.
\end{description}
Note that the parameters \textsf{jaxspmode} and \textsf{alxspmode} use a common table.
1088
1089 \item[\textsf{autospacing}\,=<bool>$^\ast$] [\verb+\autospacing+]
1090 \item[\textsf{autoxspacing}\,=<bool>$^\ast$] [\verb+\autoxspacing+]
1091 \item[\textsf{kanjiskip}\,=<skip>] [\verb+\kanjiskip+]
1092 \item[\textsf{xkanjiskip}\,=<skip>] [\verb+\xkanjiskip+]
1093
1094 \item[\textsf{differentjfm}\,=<mode>$^\dagger$]
1095
Specify how glues/kerns between two \textbf{JAchar}s whose JFMs (or sizes) are different are determined.
The allowed arguments are as follows:
1098 \begin{description}
1099 \item[\texttt{average}]
1100 \item[\texttt{both}]
1101 \item[\texttt{large}]
1102 \item[\texttt{small}]
1103 \end{description}
1104
1105 \item[\textsf{jacharrange}\,=<ranges>$^\ast$]
1106 \item[\textsf{kansujichar}\,=\{<digit>, <chr\_code>\}] [\verb+\kansujichar+]
1107 \end{list}
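
A few of the parameters listed above can be set as in the following
sketch. The characters and values are chosen only for illustration;
the mode names are those given in the descriptions of
\textsf{jaxspmode} and \textsf{alxspmode}, and it is assumed here that a
<bool> is written as \texttt{true} or \texttt{false}.
\begin{verbatim}
% allow xkanjiskip only before `あ', and only after `!'
\ltjsetparameter{jaxspmode={`あ,preonly}, alxspmode={`\!,postonly}}
% switch off the automatic insertion of xkanjiskip from here on
\ltjsetparameter{autoxspacing=false}
% give kanjiskip an explicit natural width, stretch and shrink
\ltjsetparameter{kanjiskip=0pt plus 0.4pt minus 0.4pt}
\end{verbatim}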
1108
1109
1110 \section{Other Primitives}
1111 \subsection{Compatibility with \pTeX}
1112 \begin{list}{}{\def\makelabel{\ttfamily\char92 }}
1113 \item[kuten]
1114 \item[jis]
1115 \item[euc]
1116 \item[sjis]
1117 \item[ucs]
1118 \item[kansuji]
1119 \end{list}
1120 \subsection{{\tt\char92 inhibitglue}}
1121 The primitive \verb+\inhibitglue+ suppresses the insertion of \textbf{JAglue}.
The following is an example, using a special JFM under which a glue is inserted between
the beginning of a box and `あ', and also between `あ' and `ウ'.
1124
1125 \begin{LTXexample}
1126 \jfont\g=psft:Ryumin-Light:jfm=test \g
1127 あウあ\inhibitglue{}ウ\inhibitglue\par
1128 あ\par\inhibitglue{}あ
1129 \par\inhibitglue\hrule{}あoff\inhibitglue ice
1130 \end{LTXexample}
1131
With the help of this example, we remark on the specification of \verb+\inhibitglue+:
1133 \begin{itemize}
1134 \item The call of \verb+\inhibitglue+ in the (internal) vertical mode is
1135       effective at the beginning of the next paragraph. This is realized
1136       by hacking \verb+\everypar+.
\item The call of \verb+\inhibitglue+ in the (restricted) horizontal
      mode is only effective on the spot; its effect does not cross the boundary of
      paragraphs. Moreover, \verb+\inhibitglue+ cancels ligatures and
      kerning, as shown in l.~4 of the above example.
1141 \item The call of \verb+\inhibitglue+ in math mode is just ignored.
1142 \end{itemize}
1143
1144 \section{Control Sequences for \LaTeXe}
1145 \subsection{Patch for NFSS2}
1146 As described in Subsection~\ref{ssec-ltx}, \LuaTeX-ja simply adopted
1147 \texttt{plfonts.dtx} in \pLaTeXe\ for the Japanese patch for NFSS2.
1148
1149 \subsection{Cropmark/`tombow'}
1150
1151 \section{Extensions}
1152 \subsection{{\tt luatexja-fontspec.sty}}
1153
1154 \subsection{{\tt luatexja-otf.sty}}
This optional package supports typesetting characters in
Adobe-Japan1. {\tt luatexja-otf.sty} offers the following two~low-level
commands:
1158 \begin{list}{}{\def\makelabel{\ttfamily}\def\{{\char`\{}\def\}{\char`\}}}
1159 \item[\char92CID\{<number>\}]
1160 Typeset a character whose CID number is <number>.
1161 \item[\char92UTF\{<hex\_number>\}]
1162 Typeset a character whose character code is <hex\_number> (in hexadecimal).
This command is similar to \verb+\char"+<hex\_number>,\ %"
but please note the remarks below.
1165 \end{list}
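
A minimal usage sketch of the two commands follows. The CID number is
only an illustrative value, and which glyphs are actually available
depends on the current Japanese font.
\begin{verbatim}
\usepackage{luatexja-otf}   % in the preamble
% ...
森\UTF{9DD7}外      % the character U+9DD7, given in hexadecimal
\CID{7652}飾区      % a glyph addressed by its Adobe-Japan1 CID number
\end{verbatim}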
1166
1167 \paragraph{Remarks}
1168 Characters by \verb+\CID+ and \verb+\UTF+ commands are different from
1169 ordinary characters in the following points:
1170 \begin{itemize}
1171 \item Always treated as \textbf{JAchar}s.
\item Processing for OpenType features (\textit{e.g.},
      glyph replacement and kerning) by the \texttt{luaotfload} package
      is not applied to these characters.
1175 \end{itemize}
1176
1177
\paragraph{Additional Syntax of JFM}
{\tt luatexja-otf.sty} extends the syntax of JFM; the entries of the {\tt
chars} table in a JFM now allow a string of the form
\verb+'AJ1-xxx'+, which stands for the character
whose CID number in Adobe-Japan1 is \verb+xxx+.
1183
1184 \part{Implementations}\label{part-imp}
1185 \section{Storing Parameters}\label{sec-para}
1186 \subsection{Used Dimensions,  Attributes and whatsit nodes}
The following is the list of dimensions and attributes which are used in \LuaTeX-ja.
1188 \begin{list}{}{%
1189 \def\makelabel{\ttfamily}
1190 \def\dim#1{\item[\char92 #1\ \textrm{(dimension)}]}
1191 \def\attr#1{\item[\char92 #1\ \textrm{(attribute)}]}
1192 }
1193
1194 \dim{jQ}
1195 As explained in Subsection~\ref{ssec-plain}, \verb+\jQ+ is equal to
1196                         $1\,\textrm{Q}=0.25\,\textrm{mm}$, where `Q'~(also called `級') is
1197                         a unit used in Japanese phototypesetting. So one should not change the value of this dimension.
1198 \dim{jH}
There is also a unit called `歯' which equals $0.25\,\textrm{mm}$ and is
                        used in Japanese phototypesetting. The dimension
                        \verb+\jH+ stores this length, similar to \verb+\jQ+.
\dim{ltj@zw} A temporary register for the `full-width' of the current Japanese font.
\dim{ltj@zh} A temporary register for the `full-height' (usually the sum of the height of the imaginary body and its depth) of the current Japanese font.
1204 \attr{jfam} Current number of Japanese font family for math formulas.
1205 \attr{ltj@curjfnt} The font index of current Japanese font.
1206 \attr{ltj@charclass} The character class of Japanese \textit{glyph\_node}.
1207 \attr{ltj@yablshift} The amount of shifting the baseline of alphabetic
1208                         fonts in scaled point ($2^{-16}\,\textrm{pt}$).
1209 \attr{ltj@ykblshift} The amount of shifting the baseline of Japanese
1210                         fonts in scaled point ($2^{-16}\,\textrm{pt}$).
1211 \attr{ltj@autospc} Whether the auto insertion of \textsf{kanjiskip} is allowed at the node.
1212 \attr{ltj@autoxspc} Whether the auto insertion of \textsf{xkanjiskip} is allowed at the node.
\attr{ltj@icflag} For distinguishing `kinds' of the node. One of the
                        following values is assigned to this
                        attribute:
1216 \begin{description}
\item[ITALIC (1)] Kerns from an italic correction
           (\verb+\/+). This distinction between the origins of kerns
           (from an explicit \verb+\kern+, or from \verb+\/+)
           is needed in the insertion process of \textsf{xkanjiskip}.
1221 \item[PACKED (2)]
1222 \item[KINSOKU (3)] Penalties inserted for the word-wrapping  process of Japanese characters (\emph{kinsoku}).
1223 \item[FROM\_JFM (4)] Glues/kerns from JFM.
1224 \item[LINE\_END (5)] Kerns for ...
1225 \item[KANJI\_SKIP (6)] Glues for \textsf{kanjiskip}.
1226 \item[XKANJI\_SKIP (7)] Glues for \textsf{xkanjiskip}.
\item[PROCESSED (8)] Nodes which are already processed by ...
\item[IC\_PROCESSED (9)] Kerns from an italic correction which are also already processed.
\item[BOXBDD (15)] Glues/kerns that are inserted just at the beginning or the end of an hbox or a paragraph.
1230 \end{description}
1231 \attr{ltj@kcat$i$} Where $i$~is a natural number which is less than~7.
1232 These 7~attributes store bit~vectors indicating which character block is regarded as a block of \textbf{JAchar}s.
1233 \end{list}
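
As a small sketch of how the dimensions \verb+\jQ+ and \verb+\jH+ from
the list above might be used as units (the particular sizes and the name
\verb+\bodyfont+ are arbitrary, and the font file is the one from the
\verb+\jfont+ example in Part~\ref{part-ref}):
\begin{verbatim}
% a Japanese font at 13Q (= 3.25mm), set with a 20H (= 5mm) line feed
\jfont\bodyfont={file:ipaexg.ttf:jfm=ujis} at 13\jQ
\baselineskip=20\jH
\bodyfont
\end{verbatim}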
1234
1235 Furthermore, \LuaTeX-ja uses several `user-defined' whatsit nodes for
1236 typesetting. All those nodes store a natural number (hence the node's
1237 \texttt{type} is 100).
1238 \begin{description}
1239 \item[30111] Nodes for indicating that \verb+\inhibitglue+ is
1240            specified. The \texttt{value} field of these nodes doesn't matter.
1241 \item[30112] Nodes for \LuaTeX-ja's stack system (see the next
1242            subsection). The \texttt{value} field of these nodes is
1243            current group.
\item[30113] Nodes for Japanese characters to which the callback process of
           luaotfload won't be applied; the character code is
           stored in the \texttt{value} field. Each node having this
           \verb+user_id+ is converted to a `glyph\_node' \emph{after}
           the callback process of luaotfload.
1249 \end{description}
1250 These whatsits will be removed during the process of inserting \textbf{JAglue}s.
1251
1252 \subsection{Stack System of \LuaTeX-ja}\label{ssec-stack}
1253 \paragraph{Background}
\LuaTeX-ja has its own stack system, and most parameters of \LuaTeX-ja
are stored in it.  To clarify the reason, imagine that the parameter
\textsf{kanjiskip} is stored in a skip register, and consider the following
source:
1258 \begin{LTXexample}
1259 \ltjsetparameter{kanjiskip=0pt}ふがふが.%
1260 \setbox0=\hbox{\ltjsetparameter{kanjiskip=5pt}ほげほげ}
1261 \box0.ぴよぴよ\par
1262 \end{LTXexample}
1263
As described in Part~\ref{part-ref}, the only effective value of
\textsf{kanjiskip} in an hbox is the latest value, so the value of
\textsf{kanjiskip} which is applied to the entire hbox should be 5\,pt.
However, due to the implementation of \LuaTeX, this `5\,pt' cannot be
known from any callback.  In \texttt{tex/packaging.w} (which is a
file in the source of \LuaTeX), there is the following code:
1270 \begin{lstlisting}
1271 void package(int c)
1272 {
1273     scaled h;                   /* height of box */
1274     halfword p;                 /* first node in a box */
1275     scaled d;                   /* max depth */
1276     int grp;
1277     grp = cur_group;
1278     d = box_max_depth;
1279     unsave();
1280     save_ptr -= 4;
1281     if (cur_list.mode_field == -hmode) {
1282         cur_box = filtered_hpack(cur_list.head_field,
1283                                  cur_list.tail_field, saved_value(1),
1284                                  saved_level(1), grp, saved_level(2));
1285         subtype(cur_box) = HLIST_SUBTYPE_HBOX;
1286 \end{lstlisting}
Notice that \verb+unsave+ is executed \emph{before}
\verb+filtered_hpack+ (this is where the \verb+hpack_filter+ callback is
executed): so `5\,pt' in the above source is orphaned at
\verb+unsave+, and hence it can't be accessed from the \verb+hpack_filter+
callback.
1292
1293 \paragraph{The method}
The code of the stack system is based on that in a post on the Dev-luatex mailing list\footnote{%
\texttt{[Dev-luatex] tex.currentgrouplevel}, posted on 2008/8/19 by Jonathan Sauer.}.
1296
There are two \TeX\ count registers for maintaining information:
\verb+\ltj@@stack+ for the stack level, and \verb+\ltj@@group@level+ for
\TeX's group level when the last assignment was done.  Parameters
are stored in one big table named \texttt{charprop\_stack\_table}, where
\texttt{charprop\_stack\_table[$i$]} stores the data of stack level~$i$. If
a new stack level is created by \verb+\ltjsetparameter+, all data of the
previous level are copied.
1304
To resolve the problem mentioned in `Background' above, \LuaTeX-ja uses
another device: when a new stack level is about to be created, a whatsit
node whose type, subtype and value are 44~(\textit{user\_defined}),
30112, and the current group level respectively is appended to the current
list (we refer to this node as \textit{stack\_flag}). This enables us to
know whether an assignment was done just inside an hbox. Suppose that the
stack level is~$s$ and \TeX's group level is~$t$ just after the hbox
group, then:
1313 \begin{itemize}
\item If there is no \textit{stack\_flag} node in the list of the hbox, then
      no assignment occurred inside the hbox. Hence the values of
      parameters at the end of the hbox are stored in the stack
      level~$s$.
\item If there is a \textit{stack\_flag} node whose value is~$t+1$, then
      an assignment occurred just inside the hbox group. Hence
      the values of parameters at the end of the hbox are stored in the
      stack level~$s+1$.
\item If there are \textit{stack\_flag} nodes but all of their values
      are more than~$t+1$, then an assignment occurred in the box,
      but it was done in a `more internal' group. Hence the values of
      parameters at the end of the hbox are stored in the stack
      level~$s$.
1327 \end{itemize}
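
The three cases above can be summarized in one formula. Writing $F$ for
the set of values of the \textit{stack\_flag} nodes found in the list of
the hbox (a notation introduced here only for this summary), the stack
level holding the parameters at the end of the hbox is
\[
  \begin{cases}
    s+1 & \text{if } t+1\in F,\\
    s   & \text{otherwise (including the case } F=\emptyset\text{).}
  \end{cases}
\]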
1328
Note that for this trick to work correctly, assignments to
\verb+\ltj@@stack+ and \verb+\ltj@@group@level+ must always be local,
regardless of the value of \verb+\globaldefs+.
This problem is resolved by using
\hbox{\verb+\directlua{tex.globaldefs=0}+} (this assignment is local).
1334
1335
1336 \section{Linebreak after Japanese Character}\label{sec-lbreak}
1337 \subsection{Reference: Behavior in \pTeX}
1338 %<*en>
1339 In~\pTeX, a linebreak after a Japanese character doesn't emit a space,
1340 since words are not separated by spaces in Japanese writings. However,
1341 this feature isn't fully implemented in \LuaTeX-ja due to the
1342 specification of callbacks in~\LuaTeX. To clarify the difference between
\pTeX~and~\LuaTeX, we briefly describe the handling of a linebreak in~\pTeX\ in
this subsection.
1345
1346 \pTeX's input processor can be described in terms of a finite state
1347 automaton, as that of~\TeX\ in~Section~2.5 of~\cite{texbytopic}. The
1348 internal states are as follows:
1349 \begin{itemize}
1350 \item State~$N$: new line
1351 \item State~$S$: skipping spaces
1352 \item State~$M$: middle of line
1353 \item State~$K$: after a Japanese character
1354 \end{itemize}
The first three states---$N$, $S$~and~$M$---are the same as those of \TeX's input
processor.  State~$K$ is similar to state~$M$, and is entered after
Japanese characters.  The diagram of state transitions is shown in
Figure~\ref{fig-ptexipro}.  Note that \pTeX\ doesn't leave state~$K$
after `beginning/ending of a group' characters.
1360 %</en>
1361
1362 %<*ja>
欧文では文章の改行は単語間でしか行わない．そのため，\TeX では，（文字の直後の）改行は
空白文字と同じものとして扱われる．一方，和文ではほとんどどこでも改行が可能なため，
1365 \pTeX では和文文字の直後の改行は単純に無視されるようになっている．
1366
1367 このような動作は，\pTeX が\TeX からエンジンとして拡張されたことによって可能になったことである．
1368 \pTeX の入力処理部は，\TeX におけるそれと同じように，有限オートマトンとして記述することができ，
1369 以下に述べるような4状態を持っている．
1370
1371 \begin{itemize}
1372 \item State~$N$: 行の開始．
1373 \item State~$S$: 空白読み飛ばし．
1374 \item State~$M$: 行中．
1375 \item State~$K$: 行中（和文文字の後）．
1376 \end{itemize}
また，状態遷移は，図\ref{fig-ptexipro}のようになっており，図中の数字は
1378 カテゴリーコードを表している．最初の3状態は\TeX の入力処理部と同じであり，
1379 図中から状態$K$と「$j$」と書かれた矢印を取り除けば，\TeX の入力処理部と同
1380 じものになる．
1381
1382 この図から分かることは，
1383 \begin{quote}
1384 行が和文文字（とグループ境界文字）で終わっていれば，改行は無視される
1385 \end{quote}
1386 ということである．
1387 %</ja>
1388
1389 \begin{figure}[tb]
1390 \begin{gather*}
1391  \def\sp{\text{\tt\char32}}
1392  \xymatrix{&&
1393    {\text{scan a cs}}\ar@(r,ul)[dr]&\\
1394 \ar[r]&
1395    *++[o][F-]{N}\ar[ur]^0\ar[dd]_{d,\ g}\ar[u]^{5\ (\text{\tt\char92par})}
1396      \ar@{->}@(d,l)[ddrr]_(0.45){j}&&
1397    *++[o][F-]{S}\ar@(l,dr)[ul]^0\ar@(l,ur)[ddll]_{d,\ g}\ar[u]_{5}
1398      \ar@{->}@(r,r)[dd]^{j}\\&\\&
1399    *++[o][F-]{M}\ar[uuur]^0\ar@(r,dl)[uurr]_(0.55){10\ (\sp)}
1400      \ar[d]_{5\ ({\sp})}\ar@{->}@(dr,dl)[rr]_{j}&&
1401    *++[o][F-]{K}\ar@{->}@(ul,d)[uuul]^0\ar@{->}[ll]^{d}
1402      \ar@{->}@(ur,dr)[uu]^{10\ (\sp)}\ar@{->}[d]_5\\
1403    &&&
1404  }\\
1405  d:=\{3,4,6,7,8,11,12,13\},\quad g:=\{1,2\},\quad j:=(\text{Japanese characters})
1406 \end{gather*}
1407 \begin{itemize}
1408 \item Numbers represent category codes.
\item Category codes 9~(ignored), 14~(comment)~and~15~(invalid) are omitted in the above diagram.
1410 \end{itemize}
1411 \caption{State transitions of \pTeX's input processor.}
1412 \label{fig-ptexipro}
1413 \end{figure}
1414
1415
1416 \subsection{Behavior in \LuaTeX-ja}
1417 %<*en>
The states of \LuaTeX's input processor are the same as those of \TeX's,
and they cannot be customized by any callback. Hence, the only candidates
for suppressing the space produced by a line break after a Japanese
character are the \verb+process_input_buffer+ and \verb+token_filter+
callbacks.

However, the \verb+token_filter+ callback cannot be used either, since a
character of category code 5~(end-of-line) is converted into a space
token \emph{in the input processor}, so a token filter cannot tell it
apart from a token that came from an ordinary space.  So only the
\verb+process_input_buffer+ callback remains.  This means that the
space must be suppressed \emph{just before} an input line is read.
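
That an end-of-line is indistinguishable from a space at the token level
can be checked with the following small sketch (the test text is ours;
any \LaTeX\ engine with \verb+\detokenize+ should do).  Both
\verb+\message+ calls are expected to print \texttt{[x y]} on the
terminal, since the end-of-line has already become an ordinary space
token by the time any token-level code sees it.
\begin{lstlisting}
\documentclass{article}
\begin{document}
\message{[\detokenize{x y}]}% explicit space between x and y
\message{[\detokenize{x
y}]}% end-of-line between x and y
\end{document}
\end{lstlisting}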
1428
With these constraints in mind, \LuaTeX-ja handles an end-of-line as follows:
\begin{quote}
The character U+FFFFF (whose category code is set to 14~(comment) by
\LuaTeX-ja) is appended to an input line \emph{before \LuaTeX\ actually
processes it}, if and only if the following two conditions are satisfied:
1434 \begin{enumerate}
1435 \item The category code of the character $\langle${return}$\rangle$
1436       (whose character code is 13) is 5~(end-of-line).
1437 \item The input line matches the following `regular expression':
1438 \[
1439   (\text{any char})^*(\textbf{JAchar})
1440   \bigl(\{\text{catcode}=1\}\cup\{\text{catcode}=2\}\bigr)^*
1441 \]
1442 \end{enumerate}
1443 \end{quote}
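
The rule above can be sketched in Lua as follows.  This is a simplified
illustration, not the actual code of \LuaTeX-ja: the names
\verb+is_jachar+, \verb+suppress_space+ and \verb+sketch.suppress_space+
are hypothetical, and the \textbf{JAchar} test is replaced by a crude
check for kana and CJK ideographs, whereas \LuaTeX-ja consults the
current \textsf{jacharrange} setting.  The sketch assumes Lua\LaTeX\ with
the \textsf{luacode} package and a \LuaTeX\ whose Lua provides the
\texttt{utf8} library and the integer division operator.
\begin{lstlisting}
\documentclass{article}
\usepackage{luacode}
\catcode"FFFFF=14 % U+FFFFF now starts a comment
\begin{luacode*}
-- Hypothetical stand-in for the JAchar test: a code point counts as
-- Japanese if its block number (code point // 256) is listed here.
local ja_block = { [0x30] = true }            -- U+3000..U+30FF (kana etc.)
for b = 0x4E, 0x9F do ja_block[b] = true end  -- U+4E00..U+9FFF (ideographs)
local function is_jachar(c)
  return ja_block[c // 256] == true
end

local function suppress_space(line)
  -- Condition 1: the return character (code 13) has category code 5.
  if tex.getcatcode(13) ~= 5 then return line end
  -- Split the line into code points.
  local cps = {}
  for _, c in utf8.codes(line) do cps[#cps + 1] = c end
  -- Condition 2: skip trailing characters of category code 1 or 2 ...
  local i = #cps
  while i ~= 0 and (tex.getcatcode(cps[i]) == 1
                 or tex.getcatcode(cps[i]) == 2) do
    i = i - 1
  end
  -- ... and require a Japanese character just before them.
  if i ~= 0 and is_jachar(cps[i]) then
    return line .. utf8.char(0xFFFFF)
  end
  return line
end

luatexbase.add_to_callback("process_input_buffer",
  suppress_space, "sketch.suppress_space")
\end{luacode*}
\begin{document}
\def\test#1\stop{\message{[\detokenize{#1}]}}
\test あ
い\stop
\end{document}
\end{lstlisting}
With the callback installed, the message is expected to read
\texttt{[あい]}; without it, the line break would leave a space between
the two characters.  The function always returns a string, so that
other functions registered for the same callback receive a valid line.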
1444
1445 \paragraph{Remark}
1446 The following example shows the major difference from the behavior of \pTeX:
1447 \begin{LTXexample}
1448 \ltjsetparameter{autoxspacing=false}
1449 \ltjsetparameter{jacharrange={-6}}xあ
1450 y\ltjsetparameter{jacharrange={+6}}zあ
1451 u
1452 \end{LTXexample}
1453 \begin{itemize}
\item There is no space between `x' and `y', since line~2 ends with a
      \textbf{JAchar} `あ' (this `あ' is considered a \textbf{JAchar}
      at the end of line~1, that is, when line~2 is read).
\item There is a space between `あ' (in line~3) and `u', since
      line~3 ends with an \textbf{ALchar} (this `あ' is considered an
      \textbf{ALchar} at the end of line~2, that is, when line~3 is read).
1459 \end{itemize}
To avoid such trouble, it is advisable to break the line immediately
after changing the range of \textbf{JAchar}s with \verb+\ltjsetparameter+.
1460 %</en>
1461
1462 %<*ja>
1463 \LuaTeX の入力処理部は\TeX のそれと全く同じであり，callbackによりユーザが
1464 カスタマイズすることはできない．このため，改行抑制の目的でユーザが利用で
1465 きそうなcallbackとしては，\verb+process_input_buffer+や
1466 \verb+token_filter+に限られてしまう．しかし，\TeX の入力処理部をよく見る
と，後者も役には立たないことが分かる：改行文字は，入力処理部によってトー
1468 クン化される時に，カテゴリーコード10の32番文字へと置き換えられてしまうた
1469 め，\verb+token_filter+で非標準なトークン読み出しを行おうとしても，空白文
1470 字由来のトークンと，改行文字由来のトークンは区別できないのだ．
1471
1472 すると，我々のとれる道は，\verb+process_input_buffer+を用いて
1473 \LuaTeX の入力処理部に引き渡される前に入力文字列を編集するというものしかない．
1474 以上を踏まえ，\LuaTeX-jaにおける「和文文字直後の改行抑制」の処理は，次のようになっている：
1475
1476 \begin{quote}
1477 各入力行に対し，\textbf{その入力行が読まれる前の内部状態で}
1478 以下の2条件が満たされている場合，\LuaTeX-jaはU+FFFFF番の文字
1479 \footnote{この文字はコメント文字として扱われるように\LuaTeX-ja内部で設定をしている．}
1480 を末尾に追加する．よって，その場合に改行は空白とは見做されないこととなる．
1481 \begin{enumerate}
1482 \item 改行文字（文字コード13番）のカテゴリーコードが5~(end-of-line)である．
1483 \item 入力行は次の「正規表現」にマッチしている：
1484 \[
1485   (\text{any char})^*(\textbf{JAchar})
1486   \bigl(\{\text{catcode}=1\}\cup\{\text{catcode}=2\}\bigr)^*
1487 \]
1488 \end{enumerate}
1489 \end{quote}
1490
1491 この仕様は，前節で述べた\pTeX の仕様にできるだけ近づけたものとなっている．最初の条件は，
1492 \texttt{verbatim}系環境などの日本語対応マクロを書かなくてすませるためのものである．
1493 しかしながら，完全に同じ挙動が実現できたわけではない．
1494 差異は，次の例が示すように，和文文字の範囲を変更した行の改行において見られる：
1495 \begin{LTXexample}
1496 \ltjsetparameter{autoxspacing=false}
1497 \ltjsetparameter{jacharrange={-6}}xあ
1498 y\ltjsetparameter{jacharrange={+6}}zあ
1499 u
1500 \end{LTXexample}
1501 もし\pTeX とまったく同じ挙動を示すならば，出力は
1502 「\hbox{\ltjsetparameter{autoxspacing=false}x yzあu}」となるべきである．しかし，実際には
1503 上のように異なる挙動となっている．
1504 \begin{itemize}
1505 \item 2行目は「あ」という和文文字で終わる（2行目を処理する前の時点では，
1506       「あ」は和文文字扱いである）ため，直後の改行文字は無視される．
\item 3行目は「あ」という欧文文字で終わる（3行目を処理する前の時点では，
1508       「あ」は欧文文字扱いである）ため，直後の改行文字は空白に置き換わる．
1509 \end{itemize}
1510 このため，トラブルを避けるために，和文文字の範囲を\verb+\ltjsetparameter+で編集した場合，
1511 その行はそこで改行するようにした方がいいだろう．
1512 %</ja>
1513
1514
1515 \section{Insertion of JFM glues, \textsf{kanjiskip} and \textsf{xkanjiskip}}
1516 %<*en>
1517 This will be the longest section of the document.
1518 However, ...
1519 %</en>
1520
1521 %<*ja>
1522 \LuaTeX-ja における和文処理グルーの挿入方法は，\pTeX のそれとは全く異なる．……
1523
1524 %</ja>
1525
1526
1527 \end{document}