Create tools for working with Unicode and character classes.
author KUROKI Yusuke <kuroky@users.sourceforge.jp>
Wed, 4 May 2011 03:39:51 +0000 (12:39 +0900)
committer KUROKI Yusuke <kuroky@users.sourceforge.jp>
Wed, 4 May 2011 03:39:51 +0000 (12:39 +0900)
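- Add a tool that generates defcharrange definitions from the Unicode blocks.
- Add a tool that generates a single defcharrange from the characters contained in a file. The intended use is, for example, defining the range of kyoiku kanji (the kanji taught in Japanese elementary schools) so that the font can be switched for them.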

tool/Blocks.txt [new file with mode: 0644]
tool/blocks2defcharrange.rb [new file with mode: 0644]
tool/chars2defcharrange.rb [new file with mode: 0644]
tool/kyoikukanji.txt [new file with mode: 0644]
tool/kyoikukanjiChars.tex [new file with mode: 0644]
tool/unicodeBlocks.tex [new file with mode: 0644]

diff --git a/tool/Blocks.txt b/tool/Blocks.txt
new file mode 100644
index 0000000..50df2e1
--- /dev/null
@@ -0,0 +1,240 @@
+# Blocks-6.0.0.txt
+# Date: 2010-06-04, 11:12:00 PDT [KW]
+#
+# Unicode Character Database
+# Copyright (c) 1991-2010 Unicode, Inc.
+# For terms of use, see http://www.unicode.org/terms_of_use.html
+# For documentation, see http://www.unicode.org/reports/tr44/
+#
+# Note:   The casing of block names is not normative.
+#         For example, "Basic Latin" and "BASIC LATIN" are equivalent.
+#
+# Format:
+# Start Code..End Code; Block Name
+
+# ================================================
+
+# Note:   When comparing block names, casing, whitespace, hyphens,
+#         and underbars are ignored.
+#         For example, "Latin Extended-A" and "latin extended a" are equivalent.
+#         For more information on the comparison of property values, 
+#            see UAX #44: http://www.unicode.org/reports/tr44/
+#
+#  All code points not explicitly listed for Block
+#  have the value No_Block.
+
+# Property:    Block
+#
+# @missing: 0000..10FFFF; No_Block
+
+0000..007F; Basic Latin
+0080..00FF; Latin-1 Supplement
+0100..017F; Latin Extended-A
+0180..024F; Latin Extended-B
+0250..02AF; IPA Extensions
+02B0..02FF; Spacing Modifier Letters
+0300..036F; Combining Diacritical Marks
+0370..03FF; Greek and Coptic
+0400..04FF; Cyrillic
+0500..052F; Cyrillic Supplement
+0530..058F; Armenian
+0590..05FF; Hebrew
+0600..06FF; Arabic
+0700..074F; Syriac
+0750..077F; Arabic Supplement
+0780..07BF; Thaana
+07C0..07FF; NKo
+0800..083F; Samaritan
+0840..085F; Mandaic
+0900..097F; Devanagari
+0980..09FF; Bengali
+0A00..0A7F; Gurmukhi
+0A80..0AFF; Gujarati
+0B00..0B7F; Oriya
+0B80..0BFF; Tamil
+0C00..0C7F; Telugu
+0C80..0CFF; Kannada
+0D00..0D7F; Malayalam
+0D80..0DFF; Sinhala
+0E00..0E7F; Thai
+0E80..0EFF; Lao
+0F00..0FFF; Tibetan
+1000..109F; Myanmar
+10A0..10FF; Georgian
+1100..11FF; Hangul Jamo
+1200..137F; Ethiopic
+1380..139F; Ethiopic Supplement
+13A0..13FF; Cherokee
+1400..167F; Unified Canadian Aboriginal Syllabics
+1680..169F; Ogham
+16A0..16FF; Runic
+1700..171F; Tagalog
+1720..173F; Hanunoo
+1740..175F; Buhid
+1760..177F; Tagbanwa
+1780..17FF; Khmer
+1800..18AF; Mongolian
+18B0..18FF; Unified Canadian Aboriginal Syllabics Extended
+1900..194F; Limbu
+1950..197F; Tai Le
+1980..19DF; New Tai Lue
+19E0..19FF; Khmer Symbols
+1A00..1A1F; Buginese
+1A20..1AAF; Tai Tham
+1B00..1B7F; Balinese
+1B80..1BBF; Sundanese
+1BC0..1BFF; Batak
+1C00..1C4F; Lepcha
+1C50..1C7F; Ol Chiki
+1CD0..1CFF; Vedic Extensions
+1D00..1D7F; Phonetic Extensions
+1D80..1DBF; Phonetic Extensions Supplement
+1DC0..1DFF; Combining Diacritical Marks Supplement
+1E00..1EFF; Latin Extended Additional
+1F00..1FFF; Greek Extended
+2000..206F; General Punctuation
+2070..209F; Superscripts and Subscripts
+20A0..20CF; Currency Symbols
+20D0..20FF; Combining Diacritical Marks for Symbols
+2100..214F; Letterlike Symbols
+2150..218F; Number Forms
+2190..21FF; Arrows
+2200..22FF; Mathematical Operators
+2300..23FF; Miscellaneous Technical
+2400..243F; Control Pictures
+2440..245F; Optical Character Recognition
+2460..24FF; Enclosed Alphanumerics
+2500..257F; Box Drawing
+2580..259F; Block Elements
+25A0..25FF; Geometric Shapes
+2600..26FF; Miscellaneous Symbols
+2700..27BF; Dingbats
+27C0..27EF; Miscellaneous Mathematical Symbols-A
+27F0..27FF; Supplemental Arrows-A
+2800..28FF; Braille Patterns
+2900..297F; Supplemental Arrows-B
+2980..29FF; Miscellaneous Mathematical Symbols-B
+2A00..2AFF; Supplemental Mathematical Operators
+2B00..2BFF; Miscellaneous Symbols and Arrows
+2C00..2C5F; Glagolitic
+2C60..2C7F; Latin Extended-C
+2C80..2CFF; Coptic
+2D00..2D2F; Georgian Supplement
+2D30..2D7F; Tifinagh
+2D80..2DDF; Ethiopic Extended
+2DE0..2DFF; Cyrillic Extended-A
+2E00..2E7F; Supplemental Punctuation
+2E80..2EFF; CJK Radicals Supplement
+2F00..2FDF; Kangxi Radicals
+2FF0..2FFF; Ideographic Description Characters
+3000..303F; CJK Symbols and Punctuation
+3040..309F; Hiragana
+30A0..30FF; Katakana
+3100..312F; Bopomofo
+3130..318F; Hangul Compatibility Jamo
+3190..319F; Kanbun
+31A0..31BF; Bopomofo Extended
+31C0..31EF; CJK Strokes
+31F0..31FF; Katakana Phonetic Extensions
+3200..32FF; Enclosed CJK Letters and Months
+3300..33FF; CJK Compatibility
+3400..4DBF; CJK Unified Ideographs Extension A
+4DC0..4DFF; Yijing Hexagram Symbols
+4E00..9FFF; CJK Unified Ideographs
+A000..A48F; Yi Syllables
+A490..A4CF; Yi Radicals
+A4D0..A4FF; Lisu
+A500..A63F; Vai
+A640..A69F; Cyrillic Extended-B
+A6A0..A6FF; Bamum
+A700..A71F; Modifier Tone Letters
+A720..A7FF; Latin Extended-D
+A800..A82F; Syloti Nagri
+A830..A83F; Common Indic Number Forms
+A840..A87F; Phags-pa
+A880..A8DF; Saurashtra
+A8E0..A8FF; Devanagari Extended
+A900..A92F; Kayah Li
+A930..A95F; Rejang
+A960..A97F; Hangul Jamo Extended-A
+A980..A9DF; Javanese
+AA00..AA5F; Cham
+AA60..AA7F; Myanmar Extended-A
+AA80..AADF; Tai Viet
+AB00..AB2F; Ethiopic Extended-A
+ABC0..ABFF; Meetei Mayek
+AC00..D7AF; Hangul Syllables
+D7B0..D7FF; Hangul Jamo Extended-B
+D800..DB7F; High Surrogates
+DB80..DBFF; High Private Use Surrogates
+DC00..DFFF; Low Surrogates
+E000..F8FF; Private Use Area
+F900..FAFF; CJK Compatibility Ideographs
+FB00..FB4F; Alphabetic Presentation Forms
+FB50..FDFF; Arabic Presentation Forms-A
+FE00..FE0F; Variation Selectors
+FE10..FE1F; Vertical Forms
+FE20..FE2F; Combining Half Marks
+FE30..FE4F; CJK Compatibility Forms
+FE50..FE6F; Small Form Variants
+FE70..FEFF; Arabic Presentation Forms-B
+FF00..FFEF; Halfwidth and Fullwidth Forms
+FFF0..FFFF; Specials
+10000..1007F; Linear B Syllabary
+10080..100FF; Linear B Ideograms
+10100..1013F; Aegean Numbers
+10140..1018F; Ancient Greek Numbers
+10190..101CF; Ancient Symbols
+101D0..101FF; Phaistos Disc
+10280..1029F; Lycian
+102A0..102DF; Carian
+10300..1032F; Old Italic
+10330..1034F; Gothic
+10380..1039F; Ugaritic
+103A0..103DF; Old Persian
+10400..1044F; Deseret
+10450..1047F; Shavian
+10480..104AF; Osmanya
+10800..1083F; Cypriot Syllabary
+10840..1085F; Imperial Aramaic
+10900..1091F; Phoenician
+10920..1093F; Lydian
+10A00..10A5F; Kharoshthi
+10A60..10A7F; Old South Arabian
+10B00..10B3F; Avestan
+10B40..10B5F; Inscriptional Parthian
+10B60..10B7F; Inscriptional Pahlavi
+10C00..10C4F; Old Turkic
+10E60..10E7F; Rumi Numeral Symbols
+11000..1107F; Brahmi
+11080..110CF; Kaithi
+12000..123FF; Cuneiform
+12400..1247F; Cuneiform Numbers and Punctuation
+13000..1342F; Egyptian Hieroglyphs
+16800..16A3F; Bamum Supplement
+1B000..1B0FF; Kana Supplement
+1D000..1D0FF; Byzantine Musical Symbols
+1D100..1D1FF; Musical Symbols
+1D200..1D24F; Ancient Greek Musical Notation
+1D300..1D35F; Tai Xuan Jing Symbols
+1D360..1D37F; Counting Rod Numerals
+1D400..1D7FF; Mathematical Alphanumeric Symbols
+1F000..1F02F; Mahjong Tiles
+1F030..1F09F; Domino Tiles
+1F0A0..1F0FF; Playing Cards
+1F100..1F1FF; Enclosed Alphanumeric Supplement
+1F200..1F2FF; Enclosed Ideographic Supplement
+1F300..1F5FF; Miscellaneous Symbols And Pictographs
+1F600..1F64F; Emoticons
+1F680..1F6FF; Transport And Map Symbols
+1F700..1F77F; Alchemical Symbols
+20000..2A6DF; CJK Unified Ideographs Extension B
+2A700..2B73F; CJK Unified Ideographs Extension C
+2B740..2B81F; CJK Unified Ideographs Extension D
+2F800..2FA1F; CJK Compatibility Ideographs Supplement
+E0000..E007F; Tags
+E0100..E01EF; Variation Selectors Supplement
+F0000..FFFFF; Supplementary Private Use Area-A
+100000..10FFFF; Supplementary Private Use Area-B
+
+# EOF
\ No newline at end of file
diff --git a/tool/blocks2defcharrange.rb b/tool/blocks2defcharrange.rb
new file mode 100644
index 0000000..186d0ba
--- /dev/null
@@ -0,0 +1,25 @@
+#! /usr/bin/ruby
+
+# The following script converts Blocks.txt
+# (http://unicode.org/Public/UNIDATA/Blocks.txt)
+# to the character range definitions of LuaTeX-ja.
+
+# USAGE: ruby blocks2defcharrange.rb > unicodeBlocks.tex
+
+count = 1
+File.foreach("Blocks.txt") {|line|
+  if line =~ /#/
+    line = $`
+  end
+  if line =~ /^\s*$/
+    next
+  end
+  if line =~ /([0-9a-f]+)\.\.([0-9a-f]+); (.*)/i
+    bcharcode = $1
+    echarcode = $2
+    blockname = $3
+    print "\\defcharrange{", count
+    print "}{\"", bcharcode, "-\"", echarcode, "} % ", blockname, "\n"
+    count += 1
+  end
+}
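
The per-line conversion performed by the script above can be sketched as a small standalone function. This is only an illustrative sketch: the helper name `block_line_to_defcharrange` and the inline sample are assumptions made for the example and are not part of the commit. The output format follows the generated unicodeBlocks.tex.

```ruby
# Hypothetical helper (not part of the commit): turn one Blocks.txt line
# into a \defcharrange line, or return nil for comments and blank lines.
def block_line_to_defcharrange(line, count)
  body = line.sub(/#.*/, "")          # drop a trailing or full-line comment
  return nil if body.strip.empty?     # skip blank lines
  m = /\A\s*([0-9a-f]+)\.\.([0-9a-f]+);\s*(.*?)\s*\z/i.match(body)
  return nil unless m
  "\\defcharrange{#{count}}{\"#{m[1]}-\"#{m[2]}} % #{m[3]}"
end

# A short excerpt in the Blocks.txt format, used instead of the real file.
sample = <<~EOS
  # Format:
  # Start Code..End Code; Block Name

  0000..007F; Basic Latin
  0080..00FF; Latin-1 Supplement
EOS

count = 1
sample.each_line do |line|
  out = block_line_to_defcharrange(line, count)
  next unless out                     # the counter only advances on data lines
  puts out
  count += 1
end
```

Keeping the counter outside the helper mirrors the script: range numbers are assigned in file order, so inserting or removing a block in Blocks.txt renumbers every later range.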
diff --git a/tool/chars2defcharrange.rb b/tool/chars2defcharrange.rb
new file mode 100644
index 0000000..21f6e77
--- /dev/null
@@ -0,0 +1,51 @@
+#! /usr/bin/ruby
+# -*- coding: utf-8 -*-
+
+# The following script converts the set of characters in a file, excluding
+# whitespace (as matched by Ruby's "\s"), to a character range definition of LuaTeX-ja.
+
+# USAGE: ruby __FILE__ ifile rangeNo [> ofile]
+
+# Example
+# To apply this script to the kyoiku kanji list
+# (http://www.aozora.gr.jp/kanji_table/kyouiku_list.zip):
+# 1. Edit kyoikukanji.txt so that each comment line begins with #;
+# 2. Run
+#    ruby chars2defcharrange.rb kyoikukanji.txt 210 > kyoikukanjiChars.tex
+
+def print_usage()
+  print "USAGE: ruby ", __FILE__, " ifile rangeNo [> ofile]\n"
+end
+
+if __FILE__ == $0
+  # Process the command-line arguments
+  if ARGV.length < 2
+    print_usage()
+    exit
+  end
+  ifile = ARGV[0]
+  rangeNo = ARGV[1]
+
+  # Build the target string
+  string = ""
+  File.foreach(ifile){|line|
+    if line =~ /#/
+      line = $`
+    end
+    line.gsub!(/\s/, "")
+    string += line
+  }
+
+  # Convert to an array of Unicode code points (integers)
+  decs = string.unpack("U*")
+
+  # print
+  print "\\defcharrange{", rangeNo, "}{"
+  decs.each_with_index{|code, index|
+    if index != 0
+      print ","
+    end
+    print "\"", code.to_s(16)
+  }
+  print "}\n"
+end
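
The conversion from characters to a single range definition can be sketched on an in-memory string. This is a minimal sketch under stated assumptions: the helper name `chars_to_defcharrange` and the three-character sample are hypothetical, chosen only for illustration, and the sample uses ASCII spaces as separators. Note that in a double-quoted Ruby string the backslash must be written as `\\` to appear in the output.

```ruby
# Hypothetical helper (not part of the commit): build one \defcharrange
# line from the non-whitespace, non-comment characters of a string.
def chars_to_defcharrange(text, range_no)
  # Strip "#" comments and whitespace line by line, then join the rest.
  chars = text.each_line.map { |line| line.sub(/#.*/, "").gsub(/\s/, "") }.join
  codes = chars.unpack("U*")                       # integer code points
  hex = codes.map { |code| "\"" + code.to_s(16) }  # TeX hex notation, e.g. "4e00
  "\\defcharrange{#{range_no}}{#{hex.join(',')}}"
end

# The first three grade-1 kanji from kyoikukanji.txt.
sample = "# grade 1 (excerpt)\n一 右 雨\n"
puts chars_to_defcharrange(sample, 210)
```

Joining the hex values with `join(',')` places separators only between entries, so the list never ends with a trailing comma.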
diff --git a/tool/kyoikukanji.txt b/tool/kyoikukanji.txt
new file mode 100644
index 0000000..76c48a6
--- /dev/null
@@ -0,0 +1,19 @@
+# Kyoiku kanji: table of kanji allocated by school grade
+
+#★Grade 1★ (80 characters)
+一    右     雨     円     王     音     下     火     花     貝     学     気     九     休     玉     金     空     月     犬     見     五     口     校     左     三     山     子     四     糸     字     耳     七     車     手     十     出     女     小     上     森     人     水     正     生     青     夕     石     赤     千     川     先     早     草     足     村     大     男     竹     中     虫     町     天     田     土     二     日     入     年     白     八     百     文     木     本     名     目     立     力     林     六
+
+#★Grade 2★ (160 characters)
+引    羽     雲     園     遠     何     科     夏     家     歌     画     回     会     海     絵     外     角     楽     活     間     丸     岩     顔     汽     記     帰     弓     牛     魚     京     強     教     近     兄     形     計     元     言     原     戸     古     午     後     語     工     公     広     交     光     考     行     高     黄     合     谷     国     黒     今     才     細     作     算     止     市     矢     姉     思     紙     寺     自     時     室     社     弱     首     秋     週     春     書     少     場     色     食     心     新     親     図     数     西     声     星     晴     切     雪     船     線     前     組     走     多     太     体     台     地     池     知     茶     昼     長     鳥     朝     直     通     弟     店     点     電     刀     冬     当     東     答     頭     同     道     読     内     南     肉     馬     売     買     麦     半     番     父     風     分     聞     米     歩     母     方     北     毎     妹     万     明     鳴     毛     門     夜     野     友     用     曜     来     里     理     話
+
+#★Grade 3★ (200 characters)
+悪    安     暗     医     委     意     育     員     院     飲     運     泳     駅     央     横     屋     温     化     荷     界     開     階     寒     感     漢     館     岸     起     期     客     究     急     級     宮     球     去     橋     業     曲     局     銀     区     苦     具     君     係     軽     血     決     研     県     庫     湖     向     幸     港     号     根     祭     皿     仕     死     使     始     指     歯     詩     次     事     持     式     実     写     者     主     守     取     酒     受     州     拾     終     習     集     住     重     宿     所     暑     助     昭     消     商     章     勝     乗     植     申     身     神     真     深     進     世     整     昔     全     相     送     想     息     速     族     他     打     対     待     代     第     題     炭     短     談     着     注     柱     丁     帳     調     追     定     庭     笛     鉄     転     都     度     投     豆     島     湯     登     等     動     童     農     波     配     倍     箱     畑     発     反     坂     板     皮     悲     美     鼻     筆     氷     表     秒     病     品     負     部     服     福     物     平     返     勉     放     味     命     面     問     役     薬     由     油     有     遊     予     羊     洋     葉     陽     様     落     流     旅     両     緑     礼     列     練     路     和
+
+#★Grade 4★ (200 characters)
+愛    案     以     衣     位     囲     胃     印     英     栄     塩     億     加     果     貨     課     芽     改     械     害     街     各     覚     完     官     管     関     観     願     希     季     紀     喜     旗     器     機     議     求     泣     救     給     挙     漁     共     協     鏡     競     極     訓     軍     郡     径     型     景     芸     欠     結     建     健     験     固     功     好     候     航     康     告     差     菜     最     材     昨     札     刷     殺     察     参     産     散     残     士     氏     史     司     試     児     治     辞     失     借     種     周     祝     順     初     松     笑     唱     焼     象     照     賞     臣     信     成     省     清     静     席     積     折     節     説     浅     戦     選     然     争     倉     巣     束     側     続     卒     孫     帯     隊     達     単     置     仲     貯     兆     腸     低     底     停     的     典     伝     徒     努     灯     堂     働     特     得     毒     熱     念     敗     梅     博     飯     飛     費     必     票     標     不     夫     付     府     副     粉     兵     別     辺     変     便     包     法     望     牧     末     満     未     脈     民     無     約     勇     要     養     浴     利     陸     良     料     量     輪     類     令     冷     例     歴     連     老     労     録
+
+#★Grade 5★ (185 characters)
+圧    移     因     永     営     衛     易     益     液     演     応     往     桜     恩     可     仮     価     河     過     賀     快     解     格     確     額     刊     幹     慣     眼     基     寄     規     技     義     逆     久     旧     居     許     境     均     禁     句     群     経     潔     件     券     険     検     限     現     減     故     個     護     効     厚     耕     鉱     構     興     講     混     査     再     災     妻     採     際     在     財     罪     雑     酸     賛     支     志     枝     師     資     飼     示     似     識     質     舎     謝     授     修     述     術     準     序     招     承     証     条     状     常     情     織     職     制     性     政     勢     精     製     税     責     績     接     設     舌     絶     銭     祖     素     総     造     像     増     則     測     属     率     損     退     貸     態     団     断     築     張     提     程     適     敵     統     銅     導     徳     独     任     燃     能     破     犯     判     版     比     肥     非     備     俵     評     貧     布     婦     富     武     復     複     仏     編     弁     保     墓     報     豊     防     貿     暴     務     夢     迷     綿     輸     余     預     容     略     留     領
+
+#★Grade 6★ (181 characters)
+異    遺     域     宇     映     延     沿     我     灰     拡     革     閣     割     株     干     巻     看     簡     危     机     揮     貴     疑     吸     供     胸     郷     勤     筋     系     敬     警     劇     激     穴     絹     権     憲     源     厳     己     呼     誤     后     孝     皇     紅     降     鋼     刻     穀     骨     困     砂     座     済     裁     策     冊     蚕     至     私     姿     視     詞     誌     磁     射     捨     尺     若     樹     収     宗     就     衆     従     縦     縮     熟     純     処     署     諸     除     将     傷     障     城     蒸     針     仁     垂     推     寸     盛     聖     誠     宣     専     泉     洗     染     善     奏     窓     創     装     層     操     蔵     臓     存     尊     宅     担     探     誕     段     暖     値     宙     忠     著     庁     頂     潮     賃     痛     展     討     党     糖     届     難     乳     認     納     脳     派     拝     背     肺     俳     班     晩     否     批     秘     腹     奮     並     陛     閉     片     補     暮     宝     訪     亡     忘     棒     枚     幕     密     盟     模     訳     郵     優     幼     欲     翌     乱     卵     覧     裏     律     臨     朗     論
diff --git a/tool/kyoikukanjiChars.tex b/tool/kyoikukanjiChars.tex
new file mode 100644
index 0000000..82ff1eb
--- /dev/null
@@ -0,0 +1 @@
+defcharrange{210}{"4e00,"53f3,"96e8,"5186,"738b,"97f3,"4e0b,"706b,"82b1,"8c9d,"5b66,"6c17,"4e5d,"4f11,"7389,"91d1,"7a7a,"6708,"72ac,"898b,"4e94,"53e3,"6821,"5de6,"4e09,"5c71,"5b50,"56db,"7cf8,"5b57,"8033,"4e03,"8eca,"624b,"5341,"51fa,"5973,"5c0f,"4e0a,"68ee,"4eba,"6c34,"6b63,"751f,"9752,"5915,"77f3,"8d64,"5343,"5ddd,"5148,"65e9,"8349,"8db3,"6751,"5927,"7537,"7af9,"4e2d,"866b,"753a,"5929,"7530,"571f,"4e8c,"65e5,"5165,"5e74,"767d,"516b,"767e,"6587,"6728,"672c,"540d,"76ee,"7acb,"529b,"6797,"516d,"5f15,"7fbd,"96f2,"5712,"9060,"4f55,"79d1,"590f,"5bb6,"6b4c,"753b,"56de,"4f1a,"6d77,"7d75,"5916,"89d2,"697d,"6d3b,"9593,"4e38,"5ca9,"9854,"6c7d,"8a18,"5e30,"5f13,"725b,"9b5a,"4eac,"5f37,"6559,"8fd1,"5144,"5f62,"8a08,"5143,"8a00,"539f,"6238,"53e4,"5348,"5f8c,"8a9e,"5de5,"516c,"5e83,"4ea4,"5149,"8003,"884c,"9ad8,"9ec4,"5408,"8c37,"56fd,"9ed2,"4eca,"624d,"7d30,"4f5c,"7b97,"6b62,"5e02,"77e2,"59c9,"601d,"7d19,"5bfa,"81ea,"6642,"5ba4,"793e,"5f31,"9996,"79cb,"9031,"6625,"66f8,"5c11,"5834,"8272,"98df,"5fc3,"65b0,"89aa,"56f3,"6570,"897f,"58f0,"661f,"6674,"5207,"96ea,"8239,"7dda,"524d,"7d44,"8d70,"591a,"592a,"4f53,"53f0,"5730,"6c60,"77e5,"8336,"663c,"9577,"9ce5,"671d,"76f4,"901a,"5f1f,"5e97,"70b9,"96fb,"5200,"51ac,"5f53,"6771,"7b54,"982d,"540c,"9053,"8aad,"5185,"5357,"8089,"99ac,"58f2,"8cb7,"9ea6,"534a,"756a,"7236,"98a8,"5206,"805e,"7c73,"6b69,"6bcd,"65b9,"5317,"6bce,"59b9,"4e07,"660e,"9cf4,"6bdb,"9580,"591c,"91ce,"53cb,"7528,"66dc,"6765,"91cc,"7406,"8a71,"60aa,"5b89,"6697,"533b,"59d4,"610f,"80b2,"54e1,"9662,"98f2,"904b,"6cf3,"99c5,"592e,"6a2a,"5c4b,"6e29,"5316,"8377,"754c,"958b,"968e,"5bd2,"611f,"6f22,"9928,"5cb8,"8d77,"671f,"5ba2,"7a76,"6025,"7d1a,"5bae,"7403,"53bb,"6a4b,"696d,"66f2,"5c40,"9280,"533a,"82e6,"5177,"541b,"4fc2,"8efd,"8840,"6c7a,"7814,"770c,"5eab,"6e56,"5411,"5e78,"6e2f,"53f7,"6839,"796d,"76bf,"4ed5,"6b7b,"4f7f,"59cb,"6307,"6b6f,"8a69,"6b21,"4e8b,"6301,"5f0f,"5b9f,"5199,"8005,"4e3b,"5b88,"53d6,"9152,"53d7,"5dde,"62fe,"7d42,"7fd2,"96c6,"4f4f,"91cd,"5bbf,"6240,"6691,"52a9,"662d,"6d88,"5546,"7ae0,"52dd,"4e57,"690d,"7533,"8eab,"795e,"771f,"6df1,"9032,"4e16,"6574,"6614,"5168,"76f8,"9001,"60f3,"606f,"901f,"65cf,"4ed6,"6253,"5bfe,"5f85,"4ee3,"7b2c,"984c,"70ad,"77ed,"8ac7,"7740,"6ce8,"67f1,"4e01,"5e33,"8abf,"8ffd,"5b9a,"5ead,"7b1b,"9244,"8ee2,"90fd,"5ea6,"6295,"8c46,"5cf6,"6e6f,"767b,"7b49,"52d5,"7ae5,"8fb2,"6ce2,"914d,"500d,"7bb1,"7551,"767a,"53cd,"5742,"677f,"76ae,"60b2,"7f8e,"9f3b,"7b46,"6c37,"8868,"79d2,"75c5,"54c1,"8ca0,"90e8,"670d,"798f,"7269,"5e73,"8fd4,"52c9,"653e,"5473,"547d,"9762,"554f,"5f79,"85ac,"7531,"6cb9,"6709,"904a,"4e88,"7f8a,"6d0b,"8449,"967d,"69d8,"843d,"6d41,"65c5,"4e21,"7dd1,"793c,"5217,"7df4,"8def,"548c,"611b,"6848,"4ee5,"8863,"4f4d,"56f2,"80c3,"5370,"82f1,"6804,"5869,"5104,"52a0,"679c,"8ca8,"8ab2,"82bd,"6539,"68b0,"5bb3,"8857,"5404,"899a,"5b8c,"5b98,"7ba1,"95a2,"89b3,"9858,"5e0c,"5b63,"7d00,"559c,"65d7,"5668,"6a5f,"8b70,"6c42,"6ce3,"6551,"7d66,"6319,"6f01,"5171,"5354,"93e1,"7af6,"6975,"8a13,"8ecd,"90e1,"5f84,"578b,"666f,"82b8,"6b20,"7d50,"5efa,"5065,"9a13,"56fa,"529f,"597d,"5019,"822a,"5eb7,"544a,"5dee,"83dc,"6700,"6750,"6628,"672d,"5237,"6bba,"5bdf,"53c2,"7523,"6563,"6b8b,"58eb,"6c0f,"53f2,"53f8,"8a66,"5150,"6cbb,"8f9e,"5931,"501f,"7a2e,"5468,"795d,"9806,"521d,"677e,"7b11,"5531,"713c,"8c61,"7167,"8cde,"81e3,"4fe1,"6210,"7701,"6e05,"9759,"5e2d,"7a4d,"6298,"7bc0,"8aac,"6d45,"6226,"9078,"7136,"4e89,"5009,"5de3,"675f,"5074,"7d9a,"5352,"5b6b,"5e2f,"968a,"9054,"5358,"7f6e,"4ef2,"8caf,"5146,"8178,"4f4e,"5e95,"505c,"7684,"5178,"4f1d,"5f92,"52aa,"706f,"5802,"50cd,"7279,"5f97,"6bd2,"71b1,"5ff5,"6557,"6885,"535a,"98ef,"98db,"8cbb,"5fc5,"7968,"6a19,"4e0d,"592b,"4ed8,"5e9c,"526f,"7c89,"5175,"5225,"8fba,"5909,"4fbf,"5305,"6cd5,"671b,"7267,"672b,"6e80,"672a,"8108,"6c11,"7121,"7d04,"52c7,"8981,"990a,"6d74,"5229,"9678,"826f,"6599,"91cf,"8f2a,"985e,"4ee4,"51b7,"4f8b,"6b74,"9023,"8001,"52b4,"9332,"5727,"79fb,"56e0,"6c38,"55b6,"885b,"6613,"76ca,"6db2,"6f14,"5fdc,"5f80,"685c,"6069,"53ef,"4eee,"4fa1,"6cb3,"904e,"8cc0,"5feb,"89e3,"683c,"78ba,"984d,"520a,"5e79,"6163,"773c,"57fa,"5bc4,"898f,"6280,"7fa9,"9006,"4e45,"65e7,"5c45,"8a31,"5883,"5747,"7981,"53e5,"7fa4,"7d4c,"6f54,"4ef6,"5238,"967a,"691c,"9650,"73fe,"6e1b,"6545,"500b,"8b77,"52b9,"539a,"8015,"9271,"69cb,"8208,"8b1b,"6df7,"67fb,"518d,"707d,"59bb,"63a1,"969b,"5728,"8ca1,"7f6a,"96d1,"9178,"8cdb,"652f,"5fd7,"679d,"5e2b,"8cc7,"98fc,"793a,"4f3c,"8b58,"8cea,"820e,"8b1d,"6388,"4fee,"8ff0,"8853,"6e96,"5e8f,"62db,"627f,"8a3c,"6761,"72b6,"5e38,"60c5,"7e54,"8077,"5236,"6027,"653f,"52e2,"7cbe,"88fd,"7a0e,"8cac,"7e3e,"63a5,"8a2d,"820c,"7d76,"92ad,"7956,"7d20,"7dcf,"9020,"50cf,"5897,"5247,"6e2c,"5c5e,"7387,"640d,"9000,"8cb8,"614b,"56e3,"65ad,"7bc9,"5f35,"63d0,"7a0b,"9069,"6575,"7d71,"9285,"5c0e,"5fb3,"72ec,"4efb,"71c3,"80fd,"7834,"72af,"5224,"7248,"6bd4,"80a5,"975e,"5099,"4ff5,"8a55,"8ca7,"5e03,"5a66,"5bcc,"6b66,"5fa9,"8907,"4ecf,"7de8,"5f01,"4fdd,"5893,"5831,"8c4a,"9632,"8cbf,"66b4,"52d9,"5922,"8ff7,"7dbf,"8f38,"4f59,"9810,"5bb9,"7565,"7559,"9818,"7570,"907a,"57df,"5b87,"6620,"5ef6,"6cbf,"6211,"7070,"62e1,"9769,"95a3,"5272,"682a,"5e72,"5dfb,"770b,"7c21,"5371,"673a,"63ee,"8cb4,"7591,"5438,"4f9b,"80f8,"90f7,"52e4,"7b4b,"7cfb,"656c,"8b66,"5287,"6fc0,"7a74,"7d79,"6a29,"61b2,"6e90,"53b3,"5df1,"547c,"8aa4,"540e,"5b5d,"7687,"7d05,"964d,"92fc,"523b,"7a40,"9aa8,"56f0,"7802,"5ea7,"6e08,"88c1,"7b56,"518a,"8695,"81f3,"79c1,"59ff,"8996,"8a5e,"8a8c,"78c1,"5c04,"6368,"5c3a,"82e5,"6a39,"53ce,"5b97,"5c31,"8846,"5f93,"7e26,"7e2e,"719f,"7d14,"51e6,"7f72,"8af8,"9664,"5c06,"50b7,"969c,"57ce,"84b8,"91dd,"4ec1,"5782,"63a8,"5bf8,"76db,"8056,"8aa0,"5ba3,"5c02,"6cc9,"6d17,"67d3,"5584,"594f,"7a93,"5275,"88c5,"5c64,"64cd,"8535,"81d3,"5b58,"5c0a,"5b85,"62c5,"63a2,"8a95,"6bb5,"6696,"5024,"5b99,"5fe0,"8457,"5e81,"9802,"6f6e,"8cc3,"75db,"5c55,"8a0e,"515a,"7cd6,"5c4a,"96e3,"4e73,"8a8d,"7d0d,"8133,"6d3e,"62dd,"80cc,"80ba,"4ff3,"73ed,"6669,"5426,"6279,"79d8,"8179,"596e,"4e26,"965b,"9589,"7247,"88dc,"66ae,"5b9d,"8a2a,"4ea1,"5fd8,"68d2,"679a,"5e55,"5bc6,"76df,"6a21,"8a33,"90f5,"512a,"5e7c,"6b32,"7fcc,"4e71,"5375,"89a7,"88cf,"5f8b,"81e8,"6717,"8ad6,}
diff --git a/tool/unicodeBlocks.tex b/tool/unicodeBlocks.tex
new file mode 100644
index 0000000..204b242
--- /dev/null
@@ -0,0 +1,209 @@
+\defcharrange{1}{"0000-"007F} % Basic Latin
+\defcharrange{2}{"0080-"00FF} % Latin-1 Supplement
+\defcharrange{3}{"0100-"017F} % Latin Extended-A
+\defcharrange{4}{"0180-"024F} % Latin Extended-B
+\defcharrange{5}{"0250-"02AF} % IPA Extensions
+\defcharrange{6}{"02B0-"02FF} % Spacing Modifier Letters
+\defcharrange{7}{"0300-"036F} % Combining Diacritical Marks
+\defcharrange{8}{"0370-"03FF} % Greek and Coptic
+\defcharrange{9}{"0400-"04FF} % Cyrillic
+\defcharrange{10}{"0500-"052F} % Cyrillic Supplement
+\defcharrange{11}{"0530-"058F} % Armenian
+\defcharrange{12}{"0590-"05FF} % Hebrew
+\defcharrange{13}{"0600-"06FF} % Arabic
+\defcharrange{14}{"0700-"074F} % Syriac
+\defcharrange{15}{"0750-"077F} % Arabic Supplement
+\defcharrange{16}{"0780-"07BF} % Thaana
+\defcharrange{17}{"07C0-"07FF} % NKo
+\defcharrange{18}{"0800-"083F} % Samaritan
+\defcharrange{19}{"0840-"085F} % Mandaic
+\defcharrange{20}{"0900-"097F} % Devanagari
+\defcharrange{21}{"0980-"09FF} % Bengali
+\defcharrange{22}{"0A00-"0A7F} % Gurmukhi
+\defcharrange{23}{"0A80-"0AFF} % Gujarati
+\defcharrange{24}{"0B00-"0B7F} % Oriya
+\defcharrange{25}{"0B80-"0BFF} % Tamil
+\defcharrange{26}{"0C00-"0C7F} % Telugu
+\defcharrange{27}{"0C80-"0CFF} % Kannada
+\defcharrange{28}{"0D00-"0D7F} % Malayalam
+\defcharrange{29}{"0D80-"0DFF} % Sinhala
+\defcharrange{30}{"0E00-"0E7F} % Thai
+\defcharrange{31}{"0E80-"0EFF} % Lao
+\defcharrange{32}{"0F00-"0FFF} % Tibetan
+\defcharrange{33}{"1000-"109F} % Myanmar
+\defcharrange{34}{"10A0-"10FF} % Georgian
+\defcharrange{35}{"1100-"11FF} % Hangul Jamo
+\defcharrange{36}{"1200-"137F} % Ethiopic
+\defcharrange{37}{"1380-"139F} % Ethiopic Supplement
+\defcharrange{38}{"13A0-"13FF} % Cherokee
+\defcharrange{39}{"1400-"167F} % Unified Canadian Aboriginal Syllabics
+\defcharrange{40}{"1680-"169F} % Ogham
+\defcharrange{41}{"16A0-"16FF} % Runic
+\defcharrange{42}{"1700-"171F} % Tagalog
+\defcharrange{43}{"1720-"173F} % Hanunoo
+\defcharrange{44}{"1740-"175F} % Buhid
+\defcharrange{45}{"1760-"177F} % Tagbanwa
+\defcharrange{46}{"1780-"17FF} % Khmer
+\defcharrange{47}{"1800-"18AF} % Mongolian
+\defcharrange{48}{"18B0-"18FF} % Unified Canadian Aboriginal Syllabics Extended
+\defcharrange{49}{"1900-"194F} % Limbu
+\defcharrange{50}{"1950-"197F} % Tai Le
+\defcharrange{51}{"1980-"19DF} % New Tai Lue
+\defcharrange{52}{"19E0-"19FF} % Khmer Symbols
+\defcharrange{53}{"1A00-"1A1F} % Buginese
+\defcharrange{54}{"1A20-"1AAF} % Tai Tham
+\defcharrange{55}{"1B00-"1B7F} % Balinese
+\defcharrange{56}{"1B80-"1BBF} % Sundanese
+\defcharrange{57}{"1BC0-"1BFF} % Batak
+\defcharrange{58}{"1C00-"1C4F} % Lepcha
+\defcharrange{59}{"1C50-"1C7F} % Ol Chiki
+\defcharrange{60}{"1CD0-"1CFF} % Vedic Extensions
+\defcharrange{61}{"1D00-"1D7F} % Phonetic Extensions
+\defcharrange{62}{"1D80-"1DBF} % Phonetic Extensions Supplement
+\defcharrange{63}{"1DC0-"1DFF} % Combining Diacritical Marks Supplement
+\defcharrange{64}{"1E00-"1EFF} % Latin Extended Additional
+\defcharrange{65}{"1F00-"1FFF} % Greek Extended
+\defcharrange{66}{"2000-"206F} % General Punctuation
+\defcharrange{67}{"2070-"209F} % Superscripts and Subscripts
+\defcharrange{68}{"20A0-"20CF} % Currency Symbols
+\defcharrange{69}{"20D0-"20FF} % Combining Diacritical Marks for Symbols
+\defcharrange{70}{"2100-"214F} % Letterlike Symbols
+\defcharrange{71}{"2150-"218F} % Number Forms
+\defcharrange{72}{"2190-"21FF} % Arrows
+\defcharrange{73}{"2200-"22FF} % Mathematical Operators
+\defcharrange{74}{"2300-"23FF} % Miscellaneous Technical
+\defcharrange{75}{"2400-"243F} % Control Pictures
+\defcharrange{76}{"2440-"245F} % Optical Character Recognition
+\defcharrange{77}{"2460-"24FF} % Enclosed Alphanumerics
+\defcharrange{78}{"2500-"257F} % Box Drawing
+\defcharrange{79}{"2580-"259F} % Block Elements
+\defcharrange{80}{"25A0-"25FF} % Geometric Shapes
+\defcharrange{81}{"2600-"26FF} % Miscellaneous Symbols
+\defcharrange{82}{"2700-"27BF} % Dingbats
+\defcharrange{83}{"27C0-"27EF} % Miscellaneous Mathematical Symbols-A
+\defcharrange{84}{"27F0-"27FF} % Supplemental Arrows-A
+\defcharrange{85}{"2800-"28FF} % Braille Patterns
+\defcharrange{86}{"2900-"297F} % Supplemental Arrows-B
+\defcharrange{87}{"2980-"29FF} % Miscellaneous Mathematical Symbols-B
+\defcharrange{88}{"2A00-"2AFF} % Supplemental Mathematical Operators
+\defcharrange{89}{"2B00-"2BFF} % Miscellaneous Symbols and Arrows
+\defcharrange{90}{"2C00-"2C5F} % Glagolitic
+\defcharrange{91}{"2C60-"2C7F} % Latin Extended-C
+\defcharrange{92}{"2C80-"2CFF} % Coptic
+\defcharrange{93}{"2D00-"2D2F} % Georgian Supplement
+\defcharrange{94}{"2D30-"2D7F} % Tifinagh
+\defcharrange{95}{"2D80-"2DDF} % Ethiopic Extended
+\defcharrange{96}{"2DE0-"2DFF} % Cyrillic Extended-A
+\defcharrange{97}{"2E00-"2E7F} % Supplemental Punctuation
+\defcharrange{98}{"2E80-"2EFF} % CJK Radicals Supplement
+\defcharrange{99}{"2F00-"2FDF} % Kangxi Radicals
+\defcharrange{100}{"2FF0-"2FFF} % Ideographic Description Characters
+\defcharrange{101}{"3000-"303F} % CJK Symbols and Punctuation
+\defcharrange{102}{"3040-"309F} % Hiragana
+\defcharrange{103}{"30A0-"30FF} % Katakana
+\defcharrange{104}{"3100-"312F} % Bopomofo
+\defcharrange{105}{"3130-"318F} % Hangul Compatibility Jamo
+\defcharrange{106}{"3190-"319F} % Kanbun
+\defcharrange{107}{"31A0-"31BF} % Bopomofo Extended
+\defcharrange{108}{"31C0-"31EF} % CJK Strokes
+\defcharrange{109}{"31F0-"31FF} % Katakana Phonetic Extensions
+\defcharrange{110}{"3200-"32FF} % Enclosed CJK Letters and Months
+\defcharrange{111}{"3300-"33FF} % CJK Compatibility
+\defcharrange{112}{"3400-"4DBF} % CJK Unified Ideographs Extension A
+\defcharrange{113}{"4DC0-"4DFF} % Yijing Hexagram Symbols
+\defcharrange{114}{"4E00-"9FFF} % CJK Unified Ideographs
+\defcharrange{115}{"A000-"A48F} % Yi Syllables
+\defcharrange{116}{"A490-"A4CF} % Yi Radicals
+\defcharrange{117}{"A4D0-"A4FF} % Lisu
+\defcharrange{118}{"A500-"A63F} % Vai
+\defcharrange{119}{"A640-"A69F} % Cyrillic Extended-B
+\defcharrange{120}{"A6A0-"A6FF} % Bamum
+\defcharrange{121}{"A700-"A71F} % Modifier Tone Letters
+\defcharrange{122}{"A720-"A7FF} % Latin Extended-D
+\defcharrange{123}{"A800-"A82F} % Syloti Nagri
+\defcharrange{124}{"A830-"A83F} % Common Indic Number Forms
+\defcharrange{125}{"A840-"A87F} % Phags-pa
+\defcharrange{126}{"A880-"A8DF} % Saurashtra
+\defcharrange{127}{"A8E0-"A8FF} % Devanagari Extended
+\defcharrange{128}{"A900-"A92F} % Kayah Li
+\defcharrange{129}{"A930-"A95F} % Rejang
+\defcharrange{130}{"A960-"A97F} % Hangul Jamo Extended-A
+\defcharrange{131}{"A980-"A9DF} % Javanese
+\defcharrange{132}{"AA00-"AA5F} % Cham
+\defcharrange{133}{"AA60-"AA7F} % Myanmar Extended-A
+\defcharrange{134}{"AA80-"AADF} % Tai Viet
+\defcharrange{135}{"AB00-"AB2F} % Ethiopic Extended-A
+\defcharrange{136}{"ABC0-"ABFF} % Meetei Mayek
+\defcharrange{137}{"AC00-"D7AF} % Hangul Syllables
+\defcharrange{138}{"D7B0-"D7FF} % Hangul Jamo Extended-B
+\defcharrange{139}{"D800-"DB7F} % High Surrogates
+\defcharrange{140}{"DB80-"DBFF} % High Private Use Surrogates
+\defcharrange{141}{"DC00-"DFFF} % Low Surrogates
+\defcharrange{142}{"E000-"F8FF} % Private Use Area
+\defcharrange{143}{"F900-"FAFF} % CJK Compatibility Ideographs
+\defcharrange{144}{"FB00-"FB4F} % Alphabetic Presentation Forms
+\defcharrange{145}{"FB50-"FDFF} % Arabic Presentation Forms-A
+\defcharrange{146}{"FE00-"FE0F} % Variation Selectors
+\defcharrange{147}{"FE10-"FE1F} % Vertical Forms
+\defcharrange{148}{"FE20-"FE2F} % Combining Half Marks
+\defcharrange{149}{"FE30-"FE4F} % CJK Compatibility Forms
+\defcharrange{150}{"FE50-"FE6F} % Small Form Variants
+\defcharrange{151}{"FE70-"FEFF} % Arabic Presentation Forms-B
+\defcharrange{152}{"FF00-"FFEF} % Halfwidth and Fullwidth Forms
+\defcharrange{153}{"FFF0-"FFFF} % Specials
+\defcharrange{154}{"10000-"1007F} % Linear B Syllabary
+\defcharrange{155}{"10080-"100FF} % Linear B Ideograms
+\defcharrange{156}{"10100-"1013F} % Aegean Numbers
+\defcharrange{157}{"10140-"1018F} % Ancient Greek Numbers
+\defcharrange{158}{"10190-"101CF} % Ancient Symbols
+\defcharrange{159}{"101D0-"101FF} % Phaistos Disc
+\defcharrange{160}{"10280-"1029F} % Lycian
+\defcharrange{161}{"102A0-"102DF} % Carian
+\defcharrange{162}{"10300-"1032F} % Old Italic
+\defcharrange{163}{"10330-"1034F} % Gothic
+\defcharrange{164}{"10380-"1039F} % Ugaritic
+\defcharrange{165}{"103A0-"103DF} % Old Persian
+\defcharrange{166}{"10400-"1044F} % Deseret
+\defcharrange{167}{"10450-"1047F} % Shavian
+\defcharrange{168}{"10480-"104AF} % Osmanya
+\defcharrange{169}{"10800-"1083F} % Cypriot Syllabary
+\defcharrange{170}{"10840-"1085F} % Imperial Aramaic
+\defcharrange{171}{"10900-"1091F} % Phoenician
+\defcharrange{172}{"10920-"1093F} % Lydian
+\defcharrange{173}{"10A00-"10A5F} % Kharoshthi
+\defcharrange{174}{"10A60-"10A7F} % Old South Arabian
+\defcharrange{175}{"10B00-"10B3F} % Avestan
+\defcharrange{176}{"10B40-"10B5F} % Inscriptional Parthian
+\defcharrange{177}{"10B60-"10B7F} % Inscriptional Pahlavi
+\defcharrange{178}{"10C00-"10C4F} % Old Turkic
+\defcharrange{179}{"10E60-"10E7F} % Rumi Numeral Symbols
+\defcharrange{180}{"11000-"1107F} % Brahmi
+\defcharrange{181}{"11080-"110CF} % Kaithi
+\defcharrange{182}{"12000-"123FF} % Cuneiform
+\defcharrange{183}{"12400-"1247F} % Cuneiform Numbers and Punctuation
+\defcharrange{184}{"13000-"1342F} % Egyptian Hieroglyphs
+\defcharrange{185}{"16800-"16A3F} % Bamum Supplement
+\defcharrange{186}{"1B000-"1B0FF} % Kana Supplement
+\defcharrange{187}{"1D000-"1D0FF} % Byzantine Musical Symbols
+\defcharrange{188}{"1D100-"1D1FF} % Musical Symbols
+\defcharrange{189}{"1D200-"1D24F} % Ancient Greek Musical Notation
+\defcharrange{190}{"1D300-"1D35F} % Tai Xuan Jing Symbols
+\defcharrange{191}{"1D360-"1D37F} % Counting Rod Numerals
+\defcharrange{192}{"1D400-"1D7FF} % Mathematical Alphanumeric Symbols
+\defcharrange{193}{"1F000-"1F02F} % Mahjong Tiles
+\defcharrange{194}{"1F030-"1F09F} % Domino Tiles
+\defcharrange{195}{"1F0A0-"1F0FF} % Playing Cards
+\defcharrange{196}{"1F100-"1F1FF} % Enclosed Alphanumeric Supplement
+\defcharrange{197}{"1F200-"1F2FF} % Enclosed Ideographic Supplement
+\defcharrange{198}{"1F300-"1F5FF} % Miscellaneous Symbols And Pictographs
+\defcharrange{199}{"1F600-"1F64F} % Emoticons
+\defcharrange{200}{"1F680-"1F6FF} % Transport And Map Symbols
+\defcharrange{201}{"1F700-"1F77F} % Alchemical Symbols
+\defcharrange{202}{"20000-"2A6DF} % CJK Unified Ideographs Extension B
+\defcharrange{203}{"2A700-"2B73F} % CJK Unified Ideographs Extension C
+\defcharrange{204}{"2B740-"2B81F} % CJK Unified Ideographs Extension D
+\defcharrange{205}{"2F800-"2FA1F} % CJK Compatibility Ideographs Supplement
+\defcharrange{206}{"E0000-"E007F} % Tags
+\defcharrange{207}{"E0100-"E01EF} % Variation Selectors Supplement
+\defcharrange{208}{"F0000-"FFFFF} % Supplementary Private Use Area-A
+\defcharrange{209}{"100000-"10FFFF} % Supplementary Private Use Area-B