.\" Hey Emacs! This file is -*- nroff -*- source.
.\"
.\" Copyright (C) Markus Kuhn, 1996, 2001
.\"
.\" This is free documentation; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public License as
.\" published by the Free Software Foundation; either version 2 of
.\" the License, or (at your option) any later version.
.\"
.\" The GNU General Public License's references to "object code"
.\" and "executables" are to be interpreted as the output of any
.\" document formatting or typesetting system, including
.\" intermediate and printed output.
.\"
.\" This manual is distributed in the hope that it will be useful,
.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
.\" GNU General Public License for more details.
.\"
.\" You should have received a copy of the GNU General Public
.\" License along with this manual; if not, write to the Free
.\" Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111,
.\"
.\" 1995-11-26 Markus Kuhn <mskuhn@cip.informatik.uni-erlangen.de>
.\" First version written
.\" 2001-05-11 Markus Kuhn <mgk25@cl.cam.ac.uk>
.\"
.\" Japanese Version Copyright (c) 1997 HANATAKA Shinya
.\" all rights reserved.
.\" Translated Thu Jun 3 20:40:01 JST 1997
.\" by HANATAKA Shinya <hanataka@abyss.rim.or.jp>
.\" Updated (add SECURITY section) & modified Mon Feb 26 2001
.\" by NAKANO Takeo <nakano@apm.seikei.ac.jp>
.\" Updated & Modified Sun Jul 1 09:28:47 JST 2001
.\" by Yuichi SATO <ysato@h4.dion.ne.jp>
.\"
.TH UTF-8 7 2001-05-11 "GNU" "Linux Programmer's Manual"
.SH NAME
UTF-8 \- an ASCII compatible multibyte Unicode encoding
.SH DESCRIPTION
The
.B "Unicode 3.0"
character set occupies a 16-bit code space.
The most obvious Unicode encoding (known as
.BR UCS-2 )
consists of a sequence of 16-bit words.
Such strings can contain, as parts of many 16-bit characters,
bytes such as \(aq\\0\(aq or \(aq/\(aq that have a special meaning
in filenames and in arguments to C library functions.
In addition, most UNIX tools expect ASCII files as input and
cannot read 16-bit words as characters without major modifications.
For these reasons,
.B UCS-2
is not a suitable external encoding of
.B Unicode
for filenames, text files, environment variables, and the like.
The
.BR "ISO 10646 Universal Character Set (UCS)" ,
a superset of Unicode, occupies a 31-bit code space,
and its most obvious encoding,
.B UCS-4
(a sequence of 32-bit words), has the same problems.
.PP
The
.B UTF-8
encoding of
.B Unicode
and
.B UCS
does not have these problems and is the common way in which the
.B Unicode
character set is used on UNIX-style operating systems.
.SS Properties
The
.B UTF-8
encoding has the following useful properties:
.TP 0.2i
*
.B UCS
characters 0x00000000 to 0x0000007f (the classic
.B US-ASCII
characters) are encoded simply as bytes 0x00 to 0x7f (ASCII compatibility).
This means that files and strings that contain only 7-bit ASCII characters
have the same encoding under both
.B ASCII
and
.BR UTF-8 .
.TP
*
All
.B UCS
characters greater than 0x7f are encoded as a multibyte sequence
consisting only of bytes in the range 0x80 to 0xfd.
No ASCII byte can therefore appear as part of another character,
and problems with, for example, \(aq\\0\(aq or \(aq/\(aq do not arise.
.TP
*
The lexicographic sorting order of
.B UCS-4
strings is preserved.
.TP
*
All 2^31 possible UCS codes can be encoded using
.BR UTF-8 .
.TP
*
The bytes 0xfe and 0xff are never used in the
.B UTF-8
encoding.
.TP
*
The first byte of a multibyte sequence that represents a single non-ASCII
.B UCS
character is always in the range 0xc0 to 0xfd
and indicates how long the sequence is.
All further bytes in a multibyte sequence are in the range 0x80 to 0xbf.
This allows easy resynchronization and makes the encoding stateless
and robust against missing bytes.
.TP
*
.B UTF-8
encoded
.B UCS
characters may be up to six bytes long.
However, the
.B Unicode
standard specifies no characters above 0x10ffff, so Unicode characters
can be at most four bytes long in
.BR UTF-8 .
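.PP
The lead-byte and continuation-byte ranges listed above are enough to
find character boundaries without decoding anything.
The following is a minimal sketch of that idea, assuming the original
31-bit scheme described on this page; the function names are
illustrative and are not part of any library.
.PP
.nf
```c
#include <stddef.h>

/* Length of the sequence announced by a lead byte (1 to 6 bytes).
 * Returns 0 for bytes that cannot start a sequence:
 * continuation bytes 0x80-0xbf and the unused bytes 0xfe and 0xff. */
int utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;   /* 0xxxxxxx: ASCII */
    if (lead < 0xc0) return 0;   /* 10xxxxxx: continuation byte */
    if (lead < 0xe0) return 2;   /* 110xxxxx */
    if (lead < 0xf0) return 3;   /* 1110xxxx */
    if (lead < 0xf8) return 4;   /* 11110xxx */
    if (lead < 0xfc) return 5;   /* 111110xx */
    if (lead < 0xfe) return 6;   /* 1111110x */
    return 0;                    /* 0xfe, 0xff: never used */
}

/* Count characters by counting every byte that is not a
 * continuation byte. Assumes the input is well-formed UTF-8. */
size_t utf8_count_chars(const unsigned char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if ((s[i] & 0xc0) != 0x80)
            count++;
    return count;
}

/* Resynchronize: from an arbitrary offset, skip continuation bytes
 * until the start of the next character (or the end of the buffer). */
size_t utf8_next_boundary(const unsigned char *s, size_t n, size_t pos)
{
    while (pos < n && (s[pos] & 0xc0) == 0x80)
        pos++;
    return pos;
}
```
.fi
.PP
Because no state is carried between characters, a reader that starts
in the middle of a stream loses at most one character before
utf8_next_boundary() finds the next lead byte.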
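.SS Encoding
The following byte sequences are used to represent a character.
The sequence to be used depends on the UCS code number of the character:
.TP 0.4i
0x00000000 \- 0x0000007F:
.RI 0 xxxxxxx
.TP
0x00000080 \- 0x000007FF:
.RI 110 xxxxx
.RI 10 xxxxxx
.TP
0x00000800 \- 0x0000FFFF:
.RI 1110 xxxx
.RI 10 xxxxxx
.RI 10 xxxxxx
.TP
0x00010000 \- 0x001FFFFF:
.RI 11110 xxx
.RI 10 xxxxxx
.RI 10 xxxxxx
.RI 10 xxxxxx
.TP
0x00200000 \- 0x03FFFFFF:
.RI 111110 xx
.RI 10 xxxxxx
.RI 10 xxxxxx
.RI 10 xxxxxx
.RI 10 xxxxxx
.TP
0x04000000 \- 0x7FFFFFFF:
.RI 1111110 x
.RI 10 xxxxxx
.RI 10 xxxxxx
.RI 10 xxxxxx
.RI 10 xxxxxx
.RI 10 xxxxxx
.PP
The
.I xxx
bit positions are filled with the bits of the character code number
in binary representation.
Only the shortest possible multibyte sequence that can represent
the code number of the character may be used.
.PP
The
.B UCS
code values 0xd800\(en0xdfff (UTF-16 surrogates) as well as
0xfffe and 0xffff (UCS noncharacters)
should not appear in conforming
.B UTF-8
streams.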
.SS Examples
The
.B Unicode
character 0xa9 = 1010 1001 (the copyright sign) is encoded in UTF-8 as
.PP
.RS
11000010 10101001 = 0xc2 0xa9
.RE
.PP
and the character 0x2260 = 0010 0010 0110 0000 (the "not equal" symbol)
is encoded as
.PP
.RS
11100010 10001001 10100000 = 0xe2 0x89 0xa0
.RE
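.PP
The two encodings above can be reproduced mechanically.
The following is a minimal sketch of an encoder for the six byte
patterns of the 31-bit scheme; the name utf8_encode() is illustrative,
and the function does not filter surrogates or noncharacters.
.PP
.nf
```c
#include <stddef.h>
#include <stdint.h>

/* Encode one UCS code number (up to 31 bits) into out[], which must
 * have room for 6 bytes. Returns the number of bytes written, or 0
 * if the value is above 0x7fffffff. The range tests pick the
 * shortest sequence that can hold the value. */
size_t utf8_encode(uint32_t c, unsigned char *out)
{
    size_t len;
    unsigned char lead;

    if (c < 0x80) { out[0] = (unsigned char)c; return 1; }
    else if (c < 0x800)       { len = 2; lead = 0xc0; }
    else if (c < 0x10000)     { len = 3; lead = 0xe0; }
    else if (c < 0x200000)    { len = 4; lead = 0xf0; }
    else if (c < 0x4000000)   { len = 5; lead = 0xf8; }
    else if (c <= 0x7fffffff) { len = 6; lead = 0xfc; }
    else return 0;

    /* Fill continuation bytes from the end, six bits at a time. */
    for (size_t i = len - 1; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (c & 0x3f));
        c >>= 6;
    }
    out[0] = (unsigned char)(lead | c);   /* remaining high bits */
    return len;
}
```
.fi
.PP
Called with 0xa9 it writes 0xc2 0xa9, and with 0x2260 it writes
0xe2 0x89 0xa0, matching the examples.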
.SS Application notes
To activate the
.B UTF-8
support in applications, users have to select a
.B UTF-8
locale, for example with
.PP
.RS
export LANG=en_GB.UTF-8
.RE
204 »ÈÍѤµ¤ì¤Æ¤¤¤ëʸ»úÉä¹æ²½¤òʬ¤«¤Ã¤Æ¤¤¤Ê¤±¤ì¤Ð¤Ê¤é¤Ê¤¤
205 ¥¢¥×¥ê¥±¡¼¥·¥ç¥ó¥½¥Õ¥È¥¦¥§¥¢¤Ï¡¢
206 °Ê²¼¤Î¤è¤¦¤Ë¤·¤Æ¾ï¤Ë¥í¥±¡¼¥ë¤òÀßÄꤹ¤Ù¤¤Ç¤¢¤ë¡£
209 setlocale(LC_CTYPE, "")
214 ¥í¥±¡¼¥ë¤¬ÁªÂò¤µ¤ì¤Æ¤¤¤Æ¡¢¥×¥ì¡¼¥ó¥Æ¥¥¹¥È¤Îɸ½àÆþ½ÐÎÏ¡¦Ã¼Ëö´ÖÄÌ¿®¡¦
215 ¥×¥ì¡¼¥ó¥Æ¥¥¹¥È¥Õ¥¡¥¤¥ë¤ÎÆâÍÆ¡¦¥Õ¥¡¥¤¥ë̾¡¦´Ä¶ÊÑ¿ô¤¬
217 ¤ÇÉä¹æ²½¤µ¤ì¤Æ¤¤¤ë¤«¤ò¥Á¥§¥Ã¥¯¤¹¤ë¤¿¤á¤Ë¡¢
218 ¥×¥í¥°¥é¥Þ¡¼¤Ï°Ê²¼¤Î¤è¤¦¤Ê¼°¤ò»î¤¹¤³¤È¤¬¤Ç¤¤ë¡£
221 strcmp(nl_langinfo(CODESET), "UTF-8") == 0
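.PP
The two calls above combine into a small helper.
This is a sketch only: the name locale_is_utf8() is illustrative, and
treating a failed setlocale() as "not UTF-8" is an assumption of this
example, not something this page prescribes.
.PP
.nf
```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Returns 1 if the locale selected by the environment uses UTF-8,
 * 0 otherwise (including when the locale cannot be set). */
int locale_is_utf8(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return 0;
    return strcmp(nl_langinfo(CODESET), "UTF-8") == 0;
}
```
.fi
.PP
A program would typically call this once at startup and choose its
text-handling path from the result.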
.PP
Programmers accustomed to single-byte encodings such as
.B US-ASCII
or
.B "ISO 8859"
have to be aware that two assumptions made so far are no longer valid in
.B UTF-8
locales.
First, a single byte does not necessarily correspond to a single character.
Second, since modern terminal emulators in
.B UTF-8
mode also support Chinese, Japanese, and Korean
.B "double-width characters"
as well as nonspacing
.BR "combining characters" ,
outputting a single character does not necessarily advance the cursor
by one position as it did in
.BR ASCII .
Library functions such as
.BR mbsrtowcs (3)
and
.BR wcswidth (3)
should be used today to count characters and cursor positions.
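.PP
As a sketch of how those two library functions fit together, the
helper below reports both the character count and the number of
terminal columns of a multibyte string.
The name measure() and its error convention are illustrative.
The result depends on the current locale, so a real program would
first call setlocale() as shown earlier.
.PP
.nf
```c
#define _XOPEN_SOURCE 700   /* for wcswidth() */
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Measure a multibyte string in the current locale.
 * Stores the character count in *chars and the column count in *cols.
 * Returns 0 on success, -1 if the string is invalid in this locale,
 * contains a nonprintable character, or memory runs out. */
int measure(const char *s, size_t *chars, int *cols)
{
    mbstate_t st;
    const char *p = s;
    size_t n;
    wchar_t *w;
    int width;

    memset(&st, 0, sizeof st);
    n = mbsrtowcs(NULL, &p, 0, &st);     /* first pass: count only */
    if (n == (size_t)-1)
        return -1;

    w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return -1;

    p = s;
    memset(&st, 0, sizeof st);
    mbsrtowcs(w, &p, n + 1, &st);        /* second pass: convert */
    width = wcswidth(w, n);              /* columns, or -1 */
    free(w);
    if (width < 0)
        return -1;

    *chars = n;
    *cols = width;
    return 0;
}
```
.fi
.PP
For plain ASCII text the three quantities (bytes, characters, columns)
coincide; in a UTF-8 locale they can all differ.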
.PP
The official ESC sequence to switch from an
.B "ISO 2022"
encoding scheme (as used, for instance, by VT100 terminals) to
.B UTF-8
is ESC % G ("\\x1b%G").
The corresponding return sequence from
.B UTF-8
to ISO 2022 is ESC % @ ("\\x1b%@").
Other ISO 2022 sequences (such as those for switching the G0 and G1 sets)
are not applicable in UTF-8 mode.
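.PP
A minimal sketch of emitting these two sequences from C follows.
The helper name set_utf8_mode() is illustrative; whether a given
terminal honors the sequences is outside the scope of this example.
.PP
.nf
```c
#include <stdio.h>

/* ESC % G switches from ISO 2022 to UTF-8; ESC % @ switches back. */
static const unsigned char enter_utf8[] = { 0x1b, '%', 'G' };
static const unsigned char leave_utf8[] = { 0x1b, '%', '@' };

/* Write one of the two sequences to a stream.
 * Returns 0 on success, -1 on a short write. */
int set_utf8_mode(FILE *out, int enable)
{
    const unsigned char *seq = enable ? enter_utf8 : leave_utf8;
    return fwrite(seq, 1, 3, out) == 3 ? 0 : -1;
}
```
.fi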
.PP
It can be hoped that in the foreseeable future,
.B UTF-8
will replace
.B ASCII
and
.B "ISO 8859"
at all levels as the common character encoding on POSIX systems,
leading to a significantly richer environment for handling plain text.
.SS Security
The
.BR Unicode " and " UCS
standards require that producers of
.B UTF-8
use the shortest form possible;
for example, producing a two-byte sequence with first byte 0xc0
is nonconforming.
.B "Unicode 3.1"
has added the requirement that conforming programs must not accept
non-shortest forms in their input.
This is for security reasons:
if user input is checked for possible security violations,
a program might check only for the
.B ASCII
version of "/../" or ";" or NUL and overlook that there are many
.RB non- ASCII
ways to represent these things in a non-shortest
.B UTF-8
encoding.
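.PP
One way to meet this requirement is to reject non-shortest forms while
decoding, before any security check sees the text.
The following is a sketch under the 31-bit scheme used on this page;
the name utf8_decode_strict() is illustrative, and a decoder limited to
Unicode would additionally reject values above 0x10ffff.
.PP
.nf
```c
#include <stddef.h>
#include <stdint.h>

/* Decode one character from s[0..n-1] into *out.
 * Returns the number of bytes consumed, or 0 if the input is
 * truncated, malformed, not in shortest form, or encodes a
 * UTF-16 surrogate or the noncharacters 0xfffe and 0xffff. */
size_t utf8_decode_strict(const unsigned char *s, size_t n, uint32_t *out)
{
    /* Smallest code number that legitimately needs 2..6 bytes. */
    static const uint32_t min_value[7] =
        { 0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000 };
    size_t len;
    uint32_t c;

    if (n == 0) return 0;
    if (s[0] < 0x80)      { *out = s[0]; return 1; }
    else if (s[0] < 0xc0) return 0;          /* stray continuation byte */
    else if (s[0] < 0xe0) { len = 2; c = s[0] & 0x1f; }
    else if (s[0] < 0xf0) { len = 3; c = s[0] & 0x0f; }
    else if (s[0] < 0xf8) { len = 4; c = s[0] & 0x07; }
    else if (s[0] < 0xfc) { len = 5; c = s[0] & 0x03; }
    else if (s[0] < 0xfe) { len = 6; c = s[0] & 0x01; }
    else return 0;                           /* 0xfe and 0xff */

    if (n < len) return 0;                   /* truncated sequence */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xc0) != 0x80) return 0;
        c = (c << 6) | (s[i] & 0x3f);
    }
    if (c < min_value[len]) return 0;        /* non-shortest form */
    if (c >= 0xd800 && c <= 0xdfff) return 0;
    if (c == 0xfffe || c == 0xffff) return 0;

    *out = c;
    return len;
}
```
.fi
.PP
With this check, the two-byte sequence 0xc0 0xaf, a non-shortest
encoding of "/", is rejected instead of slipping past a filter that
looks only for the byte 0x2f.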
.SS Standards
ISO/IEC 10646-1:2000, Unicode 3.1, RFC\ 2279, Plan 9.
.\"
.\" Markus Kuhn <mgk25@cl.cam.ac.uk>