

dechanzi(5)

NAME

dechanzi - A character encoding system (codeset) for Simplified Chinese

DESCRIPTION

The DEC Hanzi (dechanzi) codeset consists of the following character sets:

·  ASCII
·  GB2312-80
·  Extended GB

DEC Hanzi uses a 2-byte data representation for symbols and ideographic characters that are defined in GB2312-80.

ASCII Characters

All ASCII characters are represented as single-byte, 7-bit data in the DEC Hanzi codeset; that is, the most significant bit (MSB) of the byte that represents an ASCII character is always off (0). For more information on ASCII characters, refer to ascii(5).

GB2312-80 Characters

The code table for GB2312-80 characters is divided into 94 rows (Qu), numbered from 1 to 94. Each row has 94 columns (Wei), also numbered from 1 to 94. The code table defines a total of 7445 characters, of which 6763 are Chinese characters. The characters are grouped as follows:

·  Graphic symbols
   There are 682 graphic symbols, which occupy rows 1 to 9 in the code table.
·  Frequently used (Level 1) characters
   There are 3755 frequently used characters, which occupy rows 16 to 55 in the code table.
·  Less frequently used (Level 2) characters
   There are 3008 less frequently used characters, which occupy rows 56 to 87 in the code table.

To differentiate GB2312-80 character codes from ASCII and Extended GB character codes, the MSB of both the first byte and the second byte is on (1). The following formulas show how to calculate the value of a GB2312-80 character from its row and column numbers:

     1st byte = A0(hex) + Row number
     2nd byte = A0(hex) + Column number

For example, if a GB2312-80 character is in the first column of the 16th row, the character's value is B0A1, which is calculated as follows:

     1st byte = A0(hex) + 16 = B0(hex)
     2nd byte = A0(hex) + 01 = A1(hex)

Extended GB Characters

The Extended GB code table is similar to the GB2312-80 code table and is divided into 94 rows and 94 columns (8836 code points). However, the Extended GB code table provides code points for user-defined characters (UDC).
The 8836 code points in this table are divided into two areas:

·  User-defined area
   This area spans rows 1 to 87 and provides 8178 code points.
·  User-defined (reserved) area
   This area spans rows 88 to 94 and provides 658 code points. This area is where users can define special and long-lasting user-defined characters.

To differentiate Extended GB codes from ASCII codes and GB2312-80 codes, the MSB of the first byte is on (1) while that of the second byte is off (0). The following formulas show how the code value of an Extended GB character is calculated from its row and column numbers:

     1st byte = A0(hex) + Row number
     2nd byte = 20(hex) + Column number

For example, if a character is positioned at the first column of the 16th row on the Extended GB code plane, the character's value is B021, which is calculated as follows:

     1st byte = A0(hex) + 16 = B0(hex)
     2nd byte = 20(hex) + 01 = 21(hex)

Codeset Conversion

The following codeset converter pairs are available for converting Simplified Chinese characters between dechanzi and other encoding formats. Refer to iconv_intro(5) for an introduction to codeset conversion. For more information about the other codeset for which dechanzi is the input or output, see the reference page specified in the list item.

·  big5_dechanzi, dechanzi_big5
   Converting from and to the Big-5 codeset: big5(5)
·  dechanyu_dechanzi, dechanzi_dechanyu
   Converting from and to the DEC Hanyu codeset: dechanyu(5)
·  eucTW_dechanzi, dechanzi_eucTW
   Converting from and to Taiwanese Extended UNIX Code: eucTW(5)
·  UCS-2_dechanzi, dechanzi_UCS-2
   Converting from and to UCS-2 format: Unicode(5)
·  UCS-4_dechanzi, dechanzi_UCS-4
   Converting from and to UCS-4 format: Unicode(5)
·  UTF-8_dechanzi, dechanzi_UTF-8
   Converting from and to UTF-8 format: Unicode(5)

DEC Hanzi encoding is identical to the Microsoft code-page format (cp936) used for Simplified Chinese characters on PC systems. However, DEC Hanzi supports fewer characters than the code page does.
Therefore, using converters with dechanzi in the converter name to convert between cp936 and other formats can result in some data loss. Refer to code_page(5) for more information about PC code pages.

DEC Hanzi Fonts

The operating system provides both screen and printer fonts for DEC Hanzi characters. The following bitmap fonts are grouped according to family and reflect various sizes and typefaces for 75dpi and 100dpi display devices:

Fangsongti Family:
     -adecw-fangsongti-medium-r-normal--24-240-75-75-m-240-gb2312.1980-1
     -adecw-fangsongti-medium-r-normal--34-340-75-75-m-340-gb2312.1980-1
     -adecw-fangsongti-medium-r-normal--24-240-100-100-m-240-gb2312.1980-1
     -adecw-fangsongti-medium-r-normal--34-340-100-100-m-340-gb2312.1980-1

Heiti Family:
     -adecw-heiti-medium-r-normal--16-160-75-75-m-160-gb2312.1980-1
     -adecw-heiti-medium-r-normal--24-240-75-75-m-240-gb2312.1980-1
     -adecw-heiti-medium-r-normal--34-340-75-75-m-340-gb2312.1980-1
     -adecw-heiti-medium-r-normal--16-160-100-100-m-160-gb2312.1980-1
     -adecw-heiti-medium-r-normal--24-240-100-100-m-240-gb2312.1980-1
     -adecw-heiti-medium-r-normal--34-340-100-100-m-340-gb2312.1980-1

Kaiti Family:
     -adecw-kaiti-medium-r-normal--24-240-75-75-m-240-gb2312.1980-1
     -adecw-kaiti-medium-r-normal--34-340-75-75-m-340-gb2312.1980-1
     -adecw-kaiti-medium-r-normal--24-240-100-100-m-240-gb2312.1980-1
     -adecw-kaiti-medium-r-normal--34-340-100-100-m-340-gb2312.1980-1

Screen Family:
     -adecw-screen-medium-r-normal--18-180-75-75-m-160-gb2312.1980-1
     -adecw-screen-medium-r-normal--24-240-75-75-m-240-gb2312.1980-1
     -adecw-screen-medium-r-normal--18-180-100-100-m-160-gb2312.1980-1
     -adecw-screen-medium-r-normal--24-240-100-100-m-240-gb2312.1980-1
     -adecw-screen-medium-r-normal--18-180-100-100-m-160-gb2312.1980-UDC
     -adecw-screen-medium-r-normal--24-240-100-100-m-240-gb2312.1980-UDC

Songti Family:
     -adecw-songti-medium-r-normal--16-160-75-75-m-160-gb2312.1980-1
     -adecw-songti-medium-r-normal--24-240-75-75-m-240-gb2312.1980-1
     -adecw-songti-medium-r-normal--34-340-75-75-m-340-gb2312.1980-1
     -adecw-songti-medium-r-normal--16-160-100-100-m-160-gb2312.1980-1
     -adecw-songti-medium-r-normal--24-240-100-100-m-240-gb2312.1980-1
     -adecw-songti-medium-r-normal--34-340-100-100-m-340-gb2312.1980-1

The operating system provides the following PostScript printer fonts for DEC Hanzi characters:

·  Hei-GB2312-80
·  XiSong-GB2312-80

For general information on printing Asian language text, refer to i18n_printing(5).
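
EXAMPLES

The row and column formulas given in the DESCRIPTION section for GB2312-80 and Extended GB characters, together with the MSB rules that tell the three character sets apart, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names gb2312_code, extended_gb_code, and classify are hypothetical and are not part of the operating system or of any library.

```python
def _check(row, col):
    # Both code tables have 94 rows and 94 columns, numbered from 1.
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be in the range 1 to 94")


def gb2312_code(row, col):
    """GB2312-80: 1st byte = A0(hex) + row, 2nd byte = A0(hex) + column."""
    _check(row, col)
    return bytes([0xA0 + row, 0xA0 + col])


def extended_gb_code(row, col):
    """Extended GB: 1st byte = A0(hex) + row, 2nd byte = 20(hex) + column."""
    _check(row, col)
    return bytes([0xA0 + row, 0x20 + col])


def classify(data):
    """Split a DEC Hanzi byte string into (kind, bytes) pairs.

    Hypothetical helper: it applies only the MSB rules and does not
    check that the bytes fall inside the 94 x 94 code table.
    """
    result = []
    i = 0
    while i < len(data):
        first = data[i]
        if first < 0x80:
            # MSB off: single-byte ASCII character.
            result.append(("ascii", data[i:i + 1]))
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated 2-byte character")
        second = data[i + 1]
        # MSB of the second byte on: GB2312-80; off: Extended GB.
        kind = "gb2312" if second >= 0x80 else "extended_gb"
        result.append((kind, data[i:i + 2]))
        i += 2
    return result
```

With these definitions, gb2312_code(16, 1) yields the bytes B0 A1 and extended_gb_code(16, 1) yields B0 21, which match the two worked examples in the DESCRIPTION section.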

SEE ALSO

Commands: locale(1)

Others: ascii(5), big5(5), Chinese(5), code_page(5), dechanyu(5), eucTW(5), i18n_intro(5), i18n_printing(5), iconv_intro(5), l10n_intro(5), sbig5(5), telecode(5), Unicode(5)