1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
5 >Character Set Support</TITLE
8 CONTENT="Modular DocBook HTML Stylesheet Version 1.7"><LINK
10 HREF="mailto:pgsql-docs@postgresql.org"><LINK
12 TITLE="PostgreSQL 7.4.1 Documentation"
13 HREF="index.html"><LINK
16 HREF="charset.html"><LINK
19 HREF="charset.html"><LINK
21 TITLE="Routine Database Maintenance Tasks"
22 HREF="maintenance.html"><LINK
25 HREF="stylesheet.css"><META
27 CONTENT="2003-12-22T03:48:47"></HEAD
33 SUMMARY="Header navigation table"
43 >PostgreSQL 7.4.1 Documentation</TH
67 >Chapter 20. Localization</TD
81 HREF="maintenance.html"
96 >20.2. Character Set Support</A
102 > The character set support in <SPAN
106 allows you to store text in a variety of character sets, including
107 single-byte character sets such as the ISO 8859 series and
108 multiple-byte character sets such as <ACRONYM
112 Code), Unicode, and Mule internal code. All character sets can be
113 used transparently throughout the server. (If you use extension
114 functions from other sources, transparent support depends on whether
115 their authors wrote the code correctly.) The default character set is selected while
116 initializing your <SPAN
123 >. It can be overridden when you
124 create a database using <TT
131 >. You can therefore have multiple
132 databases, each with a different character set.
140 >20.2.1. Supported Character Sets</A
144 HREF="multibyte.html#CHARSET-TABLE"
146 > shows the character sets available
147 for use in the server.
156 >Table 20-1. Server Character Sets</B
261 >Mule internal code</TD
273 > 94 (Latin alphabet no.1)</TD
285 > 94 (Latin alphabet no.2)</TD
297 > 94 (Latin alphabet no.3)</TD
309 > 94 (Latin alphabet no.4)</TD
321 > 128 (Latin alphabet no.5)</TD
330 >ISO 8859-10/<ACRONYM
333 > 144 (Latin alphabet no.6)</TD
342 >ISO 8859-13 (Latin alphabet no.7)</TD
351 >ISO 8859-14 (Latin alphabet no.8)</TD
360 >ISO 8859-15 (Latin alphabet no.9)</TD
369 >ISO 8859-16/<ACRONYM
372 > SR 14111 (Latin alphabet no.10)</TD
384 > 113 (Latin/Cyrillic)</TD
396 > 114 (Latin/Arabic)</TD
408 > 118 (Latin/Greek)</TD
420 > 121 (Latin/Hebrew)</TD
459 >Windows CP1256 (Arabic)</TD
471 >-5712/Windows CP1258 (Vietnamese)</TD
480 >Windows CP874 (Thai)</TD
499 mistakenly meant ISO 8859-5. From 7.2 on, <TT
503 means ISO 8859-9. If you have a <TT
507 created on 7.1 or earlier and want to migrate to 7.2 or later,
508 you should be careful about this change.
516 >s support all the listed character sets. For example, the
521 JDBC driver does not support <TT
543 >20.2.2. Setting the Character Set</A
549 > defines the default character set
553 > cluster. For example,
557 >initdb -E EUC_JP</PRE
560 sets the default character set (encoding) to
564 > (Extended Unix Code for Japanese). You
572 > if you prefer to type longer option strings.
586 > You can create a database with a different character set:
590 >createdb -E EUC_KR korean</PRE
593 This will create a database named <TT
597 uses the character set <TT
601 accomplish this is to use this SQL command:
604 CLASS="PROGRAMLISTING"
605 >CREATE DATABASE korean WITH ENCODING 'EUC_KR';</PRE
608 The encoding for a database is stored in the system catalog
612 >. You can view it by using the
632    Database    |  Owner  |   Encoding
633 ---------------+---------+---------------
634  euc_cn        | t-ishii | EUC_CN
635  euc_jp        | t-ishii | EUC_JP
636  euc_kr        | t-ishii | EUC_KR
637  euc_tw        | t-ishii | EUC_TW
638  mule_internal | t-ishii | MULE_INTERNAL
639  regression    | t-ishii | SQL_ASCII
640  template1     | t-ishii | EUC_JP
641  test          | t-ishii | EUC_JP
642  unicode       | t-ishii | UNICODE
653 >20.2.3. Automatic Character Set Conversion Between Server and Client</A
660 character set conversion between server and client for certain
661 character sets. The conversion information is stored in the
665 > system catalog. You can create a new
666 conversion by using the SQL command <TT
674 predefined conversions. They are listed in <A
675 HREF="multibyte.html#MULTIBYTE-TRANSLATION-TABLE"
682 NAME="MULTIBYTE-TRANSLATION-TABLE"
686 >Table 20-2. Client/Server Character Set Conversions</B
694 >Server Character Set</TH
696 >Available Client Character Sets</TH
1397 > To enable automatic character set conversion, you have to
1402 (encoding) you would like to use in the client. There are several
1403 ways to accomplish this:
1421 > allows you to change the client
1422 encoding on the fly. For
1423 example, to change the encoding to <TT
1429 CLASS="PROGRAMLISTING"
1430 >\encoding SJIS</PRE
1446 >PQsetClientEncoding()</CODE
1451 >int PQsetClientEncoding(PGconn *<VAR
1463 > is a connection to the server,
1467 > is the encoding you
1468 want to use. If the function successfully sets the encoding, it returns 0;
1469 otherwise it returns -1. The current encoding for this connection can be determined by
1474 >int PQclientEncoding(const PGconn *<VAR
1480 Note that it returns the encoding ID, not a symbolic string
1484 >. To convert an encoding ID to an encoding name, you
1489 >char *pg_encoding_to_char(int <VAR
1500 >SET client_encoding TO</TT
1503 Setting the client encoding can be done with this SQL command:
1506 CLASS="PROGRAMLISTING"
1507 >SET CLIENT_ENCODING TO '<VAR
1513 You can also use the more standard SQL syntax <TT
1519 CLASS="PROGRAMLISTING"
1526 To query the current client encoding:
1529 CLASS="PROGRAMLISTING"
1530 >SHOW client_encoding;</PRE
1533 To return to the default encoding:
1536 CLASS="PROGRAMLISTING"
1537 >RESET client_encoding;</PRE
1545 >PGCLIENTENCODING</TT
1548 If the environment variable <TT
1550 >PGCLIENTENCODING</TT
1552 in the client's environment, that client encoding is automatically
1553 selected when a connection to the server is made. (This can subsequently
1554 be overridden using any of the other methods mentioned above.)
1559 > Using the configuration variable <VAR
1561 >client_encoding</VAR
1566 >client_encoding</VAR
1569 >postgresql.conf</TT
1571 client encoding is automatically selected when a connection to the
1572 server is made. (This can subsequently be overridden using any of the
1573 other methods mentioned above.)
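As an illustration of the libpq method described above, the following is a minimal sketch rather than a definitive program. It assumes that a database named test accepts connections with default settings and that SJIS is the desired client encoding; both values are placeholders chosen for the example. It uses only the functions named in this section plus the standard libpq connection calls.

```c
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    /* Assumption: a database named "test" is reachable with default settings. */
    PGconn *conn = PQconnectdb("dbname=test");

    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* PQsetClientEncoding() returns 0 on success and -1 on failure. */
    if (PQsetClientEncoding(conn, "SJIS") != 0) {
        fprintf(stderr, "could not set client encoding: %s",
                PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* PQclientEncoding() returns a numeric encoding ID, not a name,
       so translate it with pg_encoding_to_char() before printing. */
    int encoding_id = PQclientEncoding(conn);
    printf("client encoding: %s\n", pg_encoding_to_char(encoding_id));

    PQfinish(conn);
    return 0;
}
```

The program must be linked against libpq (for example with -lpq). Checking the return value of PQsetClientEncoding() matters because a request for an encoding that cannot be converted from the server character set is rejected, and the connection keeps its previous client encoding.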
1580 > If the conversion of a particular character is not possible --
1581 suppose you chose <TT
1584 > for the server and
1588 > for the client, then some Japanese
1589 characters cannot be converted to <TT
1593 is transformed to its hexadecimal byte values in parentheses,
1606 >20.2.4. Further Reading</A
1609 > These are good sources to start learning about various kinds of encoding
1615 CLASS="VARIABLELIST"
1619 HREF="ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf"
1621 >ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf</A
1625 > Detailed explanations of <TT
1639 > appear in section 3.2.
1644 HREF="http://www.unicode.org/"
1646 >http://www.unicode.org/</A
1650 > The web site of the Unicode Consortium
1660 >-8 is defined here.
1674 SUMMARY="Footer navigation table"
1703 HREF="maintenance.html"
1727 >Routine Database Maintenance Tasks</TD