-.\" Hey Emacs! This file is -*- nroff -*- source.
-.\"
.\" Copyright (C) Markus Kuhn, 1995, 2001
.\"
+.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
.\" This is free documentation; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public License as
.\" published by the Free Software Foundation; either version 2 of
.\" GNU General Public License for more details.
.\"
.\" You should have received a copy of the GNU General Public
-.\" License along with this manual; if not, write to the Free
-.\" Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111,
-.\" USA.
+.\" License along with this manual; if not, see
+.\" <http://www.gnu.org/licenses/>.
+.\" %%%LICENSE_END
.\"
.\" 1995-11-26 Markus Kuhn <mskuhn@cip.informatik.uni-erlangen.de>
.\" First version written
.\" 2001-05-11 Markus Kuhn <mgk25@cl.cam.ac.uk>
.\" Update
.\"
-.TH UNICODE 7 2001-05-11 "GNU" "Linux Programmer's Manual"
+.TH UNICODE 7 2012-08-05 "GNU" "Linux Programmer's Manual"
.SH NAME
Unicode \- universal character set
.SH DESCRIPTION
character set and the characters in the range 0x0000 to 0x00ff
are identical to those in
.BR "ISO 8859-1 Latin-1" .
-.SS "Combining Characters"
+.SS Combining characters
Some code points in
.B UCS
have been assigned to
Combining characters are essential for instance for encoding the Thai
script or for mathematical typesetting and users of the International
Phonetic Alphabet.
-.SS "Implementation Levels"
+.SS Implementation levels
As not all systems are expected to support advanced mechanisms like
combining characters, ISO 10646-1 specifies the following three
.I implementation levels
They provide guidelines and algorithms for
editing, sorting, comparing, normalizing, converting and displaying
Unicode strings.
-.SS "Unicode Under Linux"
+.SS Unicode under Linux
Under GNU/Linux, the C type
.I wchar_t
is a signed 32-bit integer type.
general precomposed characters should be preferred where available
(Unicode calls this
.BR "Normalization Form C" ).
-.SS "Private Area"
+.SS Private area
In the
.BR BMP ,
the range 0xe000 to 0xf8ff will never be assigned to any characters by
This is the official specification of
.BR UCS .
-Available as a PDF file on CD-ROM from http://www.iso.ch/.
+Available as a PDF file on CD-ROM from
+.UR http://www.iso.ch/
+.UE .
.TP
*
The Unicode Standard, Version 3.0.
*
Unicode Technical Reports.
.RS
-http://www.unicode.org/unicode/reports/
+.UR http://www.unicode.org\:/unicode\:/reports/
+.UE
.RE
.TP
*
Markus Kuhn: UTF-8 and Unicode FAQ for UNIX/Linux.
.RS
-http://www.cl.cam.ac.uk/~mgk25/unicode.html
+.UR http://www.cl.cam.ac.uk\:/~mgk25\:/unicode.html
+.UE
Provides subscription information for the
.I linux-utf8
*
Bruno Haible: Unicode HOWTO.
.RS
-ftp://ftp.ilog.fr/pub/Users/haible/utf8/Unicode-HOWTO.html
+.UR ftp://ftp.ilog.fr\:/pub\:/Users\:/haible\:/utf8\:/Unicode-HOWTO.html
+.UE
.RE
.SH BUGS
When this man page was last revised, the GNU C Library support for
with sophisticated text rendering engines.
.\" .SH AUTHOR
.\" Markus Kuhn <mgk25@cl.cam.ac.uk>
-.SH "SEE ALSO"
+.SH SEE ALSO
.BR setlocale (3),
.BR charsets (7),
.BR utf-8 (7)