Fix the treatment of '(character)*' syntax.

author Hironori Kitagawa <h_kitagawa2001@yahoo.co.jp>

Sun, 15 Jul 2012 04:17:24 +0000 (13:17 +0900)

committer Hironori Kitagawa <h_kitagawa2001@yahoo.co.jp>

Sun, 15 Jul 2012 04:17:24 +0000 (13:17 +0900)
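
Note on the lookup order touched by this commit: the Japanese documentation added to doc/luatexja.dtx explains that a glyph's character class is first looked up by its code after font-feature substitution (e.g. `vert` or `hwid`). Only if that yields class 0 is a starred entry such as '、*' consulted, using the code from before the substitution. The Lua sketch below illustrates that order outside LuaTeX. The table `class_of`, the function `resolve_class`, the convention of storing starred entries under negated code points, and the substituted-glyph code 0xF0001 are assumptions for illustration, modelled on the call `ltjf.find_char_class(-c, z)` in src/ltj-jfmglue.lua. It is not the package's actual data structure.

```lua
-- Hypothetical stand-in for the class table built from doc/jfm-test.lua.
-- Plain entries are keyed by code point; starred entries ('X*') are
-- keyed by the NEGATED code point (an assumption of this sketch).
local class_of = {
  [0x6F22]  = 0,     -- '漢'
  [-0x30D2] = 0,     -- 'ヒ*'
  [0x3002]  = 2000,  -- '。'
  [-0x3001] = 2000,  -- '、*'
  [0xFF8B]  = 2000,  -- 'ﾋ' (half-width katakana)
}

-- Simplified lookup: unknown keys fall into class 0.
local function find_char_class(c)
  return class_of[c] or 0
end

-- orig_char: code point before font features were applied.
-- cur_char:  code point of the glyph after substitution.
local function resolve_class(orig_char, cur_char)
  -- 1. Try the glyph as it is now.
  local cls = find_char_class(cur_char)
  -- 2. Fall back to the starred entry of the original character,
  --    but only if the glyph was actually replaced.
  if cls == 0 and orig_char ~= cur_char then
    cls = find_char_class(-orig_char)
  end
  return cls
end
```

Under these assumptions, `resolve_class(0x3001, 0xF0001)` gives 2000 because '、*' is starred. `resolve_class(0x3002, 0xF0001)` stays 0 because '。' has no starred entry. `resolve_class(0x30D2, 0xFF8B)` gives 2000 because the substituted glyph 'ﾋ' already has a non-zero class, so the starred 'ヒ*' entry is never consulted.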
diff --git a/doc/jfm-test.lua b/doc/jfm-test.lua
index 8d73b95..fdee46c 100644
--- a/doc/jfm-test.lua
+++ b/doc/jfm-test.lua
@@ -5,6 +5,7 @@ luatexja.jfont.define_jfm {
     xkanjiskip = { 0.31, 0.045, 0.057 },
  
     [0] = {
+      chars = { '漢', 'ヒ*' }, 
        align = 'left', left = 0.0, down = 0.0,
        width = 1.0, height = 0.88, depth = 0.12, italic=0.0,
     },
@@ -65,6 +66,11 @@ luatexja.jfont.define_jfm {
        width = 1.0, height = 0.88, depth = 0.12, italic=0.0,
        glue = { [199] = { 0.78, 0, 0} },
     },
+   [2000] = {
+      chars = { '。', '、*', 'ﾋ' },
+      align = 'left', left = 0.0, down = 0.0,
+      width = 0.5, height = 0.88, depth = 0.12, italic=0.0,
+   },
     [100] = {
        chars = { '「' },
        align = 'right', left = 0.0, down = 0.0,
diff --git a/doc/luatexja.dtx b/doc/luatexja.dtx
index 1580ac9..4c9d25e 100644
--- a/doc/luatexja.dtx
+++ b/doc/luatexja.dtx
@@ -3187,6 +3187,7 @@ Like the \Param{kanjiskip} field, this field specifies the `ideal'
  \end{list}
  
  %<*en>
+\paragraph{Character classes}
  Besides from above fields, a JFM file have several sub-tables those
  indices are natural numbers.  The table indexed by~$i\in\omega$ stores
  information of `character class'~$i$. At least, the character class~0 is
@@ -3195,6 +3196,7 @@ always present, so each JFM file must have a sub-table whose index is
  the following fields:
  %</en>
  %<*ja>
+\paragraph{文字クラス}
  上記のフィールドに加えて，JFMファイルはそのインデックスが自然数であるいくつかの
  サブテーブルを持つ．インデックスが$i\in\omega$であるテーブルは「文字クラス」$i$の
  情報を格納する．少なくとも，文字クラス0は常に存在するので，JFMファイルはインデックス
@@ -3202,6 +3204,7 @@ the following fields:
  （そのインデックスを$i$で表わす）は以下のフィールドを持つ：
  %</ja>
  %<*zh>
+\paragraph{Character classes}
  除了上面涉及到的内容，JFM文件中还有几个以自然数进行声明的次级表。
  这些表依靠满足$i\in\omega$的“字符类”$i$来索引。
  一般，最少需要的是字符类0，故每一个JFM文件必须有次级表索引为\texttt{[0]}。
@@ -3215,22 +3218,25 @@ the following fields:
  
  %<*en>
  This field is a list of characters which are in this character
-            type~$i$. This field is not required if $i=0$, since all
+            type~$i$. This field is optional if $i=0$, since all
              \textbf{JAchar} which are not in any character class other
              than 0 are in the character class 0
               (hence, the character class~0 contains most of
              \textbf{JAchar}s). In the list, a character can be
              specified by its code number, or by the character itself
              (as a string of length~1). Moreover, there are `imaginary
-            characters' which specified in the list. We will describe these later.
+            characters' which can be specified in the list. We will describe these later.
  %</en>
  %<*ja>
  このフィールドは文字クラス$i$に属する文字のリストである．このフィールドは$i=0$の
-場合には必須ではない．なぜならば，文字クラス0には，0以外の文字クラスに属するものを
-除いた全ての\textbf{JAchar}が属するからである（よって，文字クラス0はほとんどの
-\textbf{JAchar}を含む）．このリストでは，文字はその文字コードを用いて，もしくは
-文字それ自体（長さ1の文字列）によって指定される．さらに，このリストで指定される
-「仮想的な文字」も存在する．これらについては後に記す．
+場合には任意である（文字クラス0には，0以外の文字クラスに属するものを
+除いた全ての\textbf{JAchar}が属するから）．このリスト中で文字を指定するには，以下の方法がある：
+\begin{itemize}
+\item Unicode におけるコード番号
+\item 「\verb+'あ'+」のような，文字それ自体
+\item 「\verb+'あ*'+」のような，文字それ自体の後にアスタリスクをつけたもの
+\item いくつかの「仮想的な文字」（後に説明する）
+\end{itemize}
  %</ja>
  %<*zh>
  这部分为字符集$i$的字符列表。当$i=0$时不需要设定此部分，因为不在字符集0种的\textbf{JAchar}
@@ -3427,6 +3433,66 @@ Furthermore, the glyph is shifted according to values of fields
  \item[glue={\{[$j$]=\{<width>, <stretch>, <shrink>\}, ...\}}]
  \end{list}
  
+%<*ja>
+\paragraph{文字クラスの決定}
+文字クラスの決定は少々複雑である．ここでは例を用いて説明しよう．
+
+
+たとえば，次の内容を一部に含んだ \texttt{jfm-test.lua} を考えよう：
+\begin{lstlisting}
+   [0] = {
+      chars = { '漢', 'ヒ*' }, 
+      align = 'left', left = 0.0, down = 0.0,
+      width = 1.0, height = 0.88, depth = 0.12, italic=0.0,
+   },
+   [2000] = {
+      chars = { '。', '、*', 'ﾋ' },
+      align = 'left', left = 0.0, down = 0.0,
+      width = 0.5, height = 0.88, depth = 0.12, italic=0.0,
+   },
+\end{lstlisting}
+句点「。」の幅は二分であるので
+\begin{LTXexample}
+\jfont\a=psft:Ryumin-Light:jfm=test;+vert
+\setbox0\hbox{\a 。\inhibitglue 漢}
+\the\wd0
+\end{LTXexample}
+では，全角二分(15.0\,pt)とならなければおかしいが，上の実行結果では20\,ptとなっている．
+それは以下の事情によるものである：
+\begin{enumerate}
+\item \verb+vert+ featureによって句点が縦書き用のグリフと置き換わる（\Pkg{luaotfload} による処理）．
+\item しかしこのグリフは「文字コード」U+F0000以降とみなされている
+（実際にいくらになるかは，フォントによって異なる）．
+\item よって，文字クラス0とみなされるため，結果として「。」の幅は全角だと認識されてしまう．
+\end{enumerate}
+
+一方，「\texttt{'、*'}」のようにアスタリスクつきの指定があると，
+状況は異なってくる．
+\begin{LTXexample}
+\jfont\a=psft:Ryumin-Light:jfm=test;+vert
+\a 漢、\inhibitglue 漢
+\end{LTXexample}
+ここで，読点「、」の文字クラスは，以下のようにして決まる．
+\begin{enumerate}
+\item とりあえず句点の時と同じように，\Pkg{luaotfload} によって縦書き用読点のグリフに置き換わる．
+\item 置換後のグリフの「文字コード」はU+F0000以降であり，
+そのままでは文字クラスは0と判定される．
+\item ところが，JFMには「\texttt{'、*'}」指定があるので，置換前の横書き用読点のグリフ「、」（文字コードはU+3001）によって文字クラスを判定する．
+\item 結果として，上の出力例中の読点の文字クラスは2000となる．
+\end{enumerate}
+
+なお，置換後のグリフで判定した文字クラスの値が0でなければ，そちらをそのまま作用する．
+\begin{LTXexample}
+\jfont\a=psft:Ryumin-Light:jfm=test;+hwid
+\a 漢ﾋひ
+\end{LTXexample}
+上の例では，
+\texttt{hwid} featureにより，「ヒ」が半角の「ﾋ」に置き換わるが，
+文字クラスは「ヒ」の属する0\textbf{ではなく}，「ﾋ」の属する2000となる．
+%</ja>
+
+%<ja>\paragraph{仮想的な文字}
+%<!ja>\paragraph{Imaginary characters}
  %<*en>
  As described before, you can specify several `imaginary characters' in
  \texttt{chars} field. The most of these characters are regarded as the
diff --git a/src/ltj-jfmglue.lua b/src/ltj-jfmglue.lua
index e4562d9..0863cca 100644
--- a/src/ltj-jfmglue.lua
+++ b/src/ltj-jfmglue.lua
@@ -62,6 +62,7 @@ local kanji_skip
  local xkanji_skip
  
  local attr_jchar_class = luatexbase.attributes['ltj@charclass']
+local attr_orig_char = luatexbase.attributes['ltj@origchar']
  local attr_curjfnt = luatexbase.attributes['ltj@curjfnt']
  local attr_icflag = luatexbase.attributes['ltj@icflag']
  local attr_autospc = luatexbase.attributes['ltj@autospc']
@@ -75,12 +76,7 @@ local par_indented -- is the paragraph indented?
  -------------------- Helper functions
  
  local function copy_attr(new, old) 
-  local a = old.attr
-  if a then a = a.next end
-  while a do
-     set_attr(new, a.number, a.value)
-     a = node.next(a)
-  end
+  -- Disabled until the specification is settled
  end
  
  -- This function is called only for acquiring `special' characters.
@@ -429,9 +425,9 @@ end
  -- 和文文字のデータを取得
  function set_np_xspc_jachar(Nx, x)
     local z = ltjf.font_metric_table[x.font]
-   local c = has_attr(x, attr_jchar_class) or 0
+   local c = has_attr(x, attr_orig_char) or 0
     local cls = ltjf.find_char_class(x.char, z) or 0
-   if cls==0 then cls = ltjf.find_char_class(-c, z) end
+   if cls==0 and c ~= x.char then cls = ltjf.find_char_class(-c, z) end
     local m = ltjf.metrics[z.jfm]
     set_attr(x, attr_jchar_class, cls)
     Nx.class = cls
@@ -619,7 +615,6 @@ end
  -- get kanjiskip
  local function get_kanji_skip_from_jfm(Nn)
     local i = Nn.met.size_cache[Nn.size].kanjiskip
-   print(Nn.met.size_cache[Nn.size])
     if i then
        return { i[1], i[2], i[3] }
     else return nil
diff --git a/src/ltj-pretreat.lua b/src/ltj-pretreat.lua
index 87cbdd6..e935a89 100644
--- a/src/ltj-pretreat.lua
+++ b/src/ltj-pretreat.lua
@@ -35,7 +35,7 @@ local attr_ykblshift = luatexbase.attributes['ltj@ykblshift']
  
  local ltjf_font_metric_table = ltjf.font_metric_table
  local ltjc_is_ucs_in_japanese_char = ltjc.is_ucs_in_japanese_char
-local attr_jchar_class = luatexbase.attributes['ltj@charclass']
+local attr_orig_char = luatexbase.attributes['ltj@origchar']
  local ltjf_find_char_class = ltjf.find_char_class
  
  ------------------------------------------------------------------------
@@ -48,9 +48,9 @@ local function suppress_hyphenate_ja(head)
     local non_math = true
     for p in node_traverse(head) do
        if p.id == id_glyph and non_math then
-        if (has_attr(p, attr_icflag) or 0)==0 and ltjc_is_ucs_in_japanese_char(p) then
+        if (has_attr(p, attr_icflag) or 0)<=0 and ltjc_is_ucs_in_japanese_char(p) then
             p.font = has_attr(p, attr_curjfnt) or p.font
-           set_attr(p, attr_jchar_class, p.char)
+           set_attr(p, attr_orig_char, p.char)
             set_attr(p, attr_yablshift, has_attr(p, attr_ykblshift) or 0)
             p.subtype = floor(p.subtype/2)*2
          end
diff --git a/src/ltj-setwidth.lua b/src/ltj-setwidth.lua
index 7946829..945ddc9 100644
--- a/src/ltj-setwidth.lua
+++ b/src/ltj-setwidth.lua
@@ -66,7 +66,7 @@ function capsule_glyph(p, dir, mode, met, class)
     fshift = luatexbase.call_callback("luatexja.set_width", fshift, met, class)
  --   local ti = 
     p.xoffset= p.xoffset - fshift.left
-   if mode or p.width ~= fwidth or p.height ~= fheight or p.depth ~= fdepth then
+   if (mode or p.width ~= fwidth or p.height ~= fheight or p.depth ~= fdepth) then
        local y_shift = - p.yoffset + (has_attr(p,attr_yablshift) or 0)
        p.yoffset = -fshift.down
        head, q = node.remove(head, p)
diff --git a/src/luatexja-core.sty b/src/luatexja-core.sty
index 44445a5..4f606bc 100644
--- a/src/luatexja-core.sty
+++ b/src/luatexja-core.sty
@@ -136,6 +136,7 @@
  \newluatexattribute\jfam          % index for current jfam
  \newluatexattribute\ltj@uniqid    % unique id of box/paragraph
  \newluatexattribute\ltj@charclass % 
+\newluatexattribute\ltj@origchar % 
  \newluatexattribute\ltj@yablshift % attribute for \yabaselineshift
  \newluatexattribute\ltj@ykblshift % attribute for \ykbaselineshift
  \newluatexattribute\ltj@autospc   % attribute for autospacing
diff --git a/src/luatexja.lua b/src/luatexja.lua
index bcbe4a9..73023a3 100644
--- a/src/luatexja.lua
+++ b/src/luatexja.lua
@@ -249,7 +249,7 @@ local function debug_show_node_X(p,print_fn)
     local base = debug_depth .. string.format('%X', has_attr(p,attr_icflag) or 0)
     .. ' ' .. pt .. ' ' .. tostring(p.subtype) .. ' '
     if pt == 'glyph' then
-      s = base .. ' ' .. utf.char(p.char) .. ' (' .. p.char .. ') ' .. tostring(p.font)
+      s = base .. ' ' .. utf.char(p.char) .. ' '  .. tostring(p.font)
           .. ' (' .. print_scaled(p.height) .. '+' 
           .. print_scaled(p.depth) .. ')x' .. print_scaled(p.width)
        print_fn(s)
diff --git a/test/test04-jfm.pdf b/test/test04-jfm.pdf
index 44ab5d1..47376d0 100644
Binary files a/test/test04-jfm.pdf and b/test/test04-jfm.pdf differ
diff --git a/test/test04-jfm.tex b/test/test04-jfm.tex
index 4d7ca23..5ac470a 100644
--- a/test/test04-jfm.tex
+++ b/test/test04-jfm.tex
@@ -1,6 +1,6 @@
-%#!luatex
+%#!luatex test04-jfm ; pdftotext test04-jfm.pdf /tmp/new
  \input luatexja-core.sty
-
+\pdfcompresslevel=0
  \def\head#1{\medskip\penalty-100\noindent{\bf\tengt ■ #1}\par\penalty10000 }
  \jfont\rml={psft:Ryumin-Light:jfm=ujis} at 10pt
  \rml あ\inhibitglue\char"201Cあ・い←Ryumin-Light
@@ -74,7 +74,7 @@
  あいうえおさしすせそ}\par
  
  
-\vfill\eject
+\vfill\eject\tracingonline=0
  \noindent{\bf\gt  以下はJFMグルー挿入検証}
  \jfont\rmlh={psft:Ryumin-Light:jfm=test} at 10pt
  \jfont\sixgt={psft:GothicBBB-Medium:jfm=ujis} at 6pt