ISO 8859-1 National Character Set FAQ [condensed]

Michael K. Gschwind

This FAQ discusses topics related to the use of ISO 8859-1 based 8 bit character sets. It explains how to use European (Latin American) national character sets on UNIX-based systems and the Internet.

1. Which coding should I use for accented characters?

Use the internationally standardized ISO-8859-1 character set to type accented characters. This character set contains all characters necessary to type all major (West) European languages. This encoding is also the preferred encoding on the Internet.

This character set is also used by AmigaDOS, MS-Windows, VMS (DEC MCS is practically equivalent to ISO 8859-1) and (practically all) UNIX implementations. MS-DOS normally uses a different character set that is not compatible with ISO 8859-1. (MS-DOS text can, however, be translated to ISO 8859-1 with various tools.)

Footnote: Supposedly, IBM code page 819 is fully ISO 8859-1 compliant.

ISO 8859-1 supports the following languages: Afrikaans, Basque, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, Galician, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish and Swedish.

5. Translating between different international character sets

While ISO 8859-1 is an international standard, not everybody uses this encoding. Many computers use their own, vendor-specific character sets (most notably Microsoft for MS-DOS). If you want to edit or view files written in a different encoding, you will have to translate them to an ISO 8859-1 based representation.

13.3 News and ISO 8859-1

Much like mail, the Usenet news protocol specification is 7 bit based, but the infrastructure has been upgraded to 8 bit service... Thus, accented characters are transferred correctly between much of Europe (and Latin America).
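
To illustrate why 8 bit service matters, here is a minimal Python sketch of what a transport that keeps only 7 bits per byte would do to ISO 8859-1 text. The helper name strip_eighth_bit is made up for this example, and real 7 bit news or mail software may mangle text in other ways; this only shows the simplest case of clearing the high bit.

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Clear the high bit of every byte, as a 7-bit-only path might."""
    return bytes(b & 0x7F for b in data)


# "Grüße aus Österreich", written with escapes so that this source
# file itself stays pure ASCII.
text = "Gr\u00fc\u00dfe aus \u00d6sterreich"

original = text.encode("iso-8859-1")   # one byte per character
damaged = strip_eighth_bit(original)

# Plain ASCII letters survive; every accented character turns into an
# unrelated ASCII character (0xFC -> 0x7C, 0xDF -> 0x5F, 0xD6 -> 0x56).
print(damaged.decode("ascii"))         # prints: Gr|_e aus Vsterreich
```

The damage cannot be undone afterwards, since the stripped text no longer says which characters originally had the 8th bit set.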
ISO 8859-1 is _the_ standard for typing accented characters in most newsgroups (may be different for MS-DOS centered newsgroups ;-), and is preferred in most European newsgroup hierarchies, such as at.* or de.*.

15.4 MS-DOS PCs

MS-DOS PCs normally use a different encoding for accented characters, so there are two options:

* you can use a terminal emulator which will translate between the different encodings. If you use the PROCOMM PLUS, TELEMATE or TELIX modem programs, you can download the translation tables from URL ftp://oak.oakland.edu/pub/msdos/modem/xlate.zip. (You need to install CP850 for this to work.)

* you can reconfigure your MS-DOS PC to use an ISO-8859-1 code page. Either install IBM code page 819 (see section 19), or get the free ISO 8859-X support files from the anonymous ftp archive ftp://ftp.uni-erlangen.de/pub/doc/ISO/charsets, which contains data on how to do this (and other ISO-related stuff). The README file contains an index of the files you need.

Note that many terminal emulators for PCs strip the 8th bit when in text transmission mode. If you are using such a program to dial up a computer, you may have to configure your terminal program to transmit all 8 bits.

18.3 MS-DOS

IBM code page 819 _is_ ISO 8859-1. Code Page 850 contains all the printable characters of ISO 8859-1, BUT at different code positions (i.e., ISO 8859-1 text can be translated 1-to-1, but it does have to be translated).

18.4 MS-Windows

Microsoft Windows, as delivered in the US, Europe (except Eastern Europe) and Latin America, uses an ISO 8859-1 compatible character set (Code Page 1252). In Windows 3.1, Microsoft added additional characters in the 0x80-0x9F range.

23. Home location of this document

23.1 www

You can find this and other i18n documents under URL http://www.vlsivie.tuwien.ac.at/mike/i18n.html.

23.2 ftp

The most recent version of this document is available via anonymous ftp from ftp.vlsivie.tuwien.ac.at under the file name /pub/8bit/FAQ-ISO-8859-1
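
Returning to the translation between Code Page 850 and ISO 8859-1 described in sections 5 and 18.3: the following is a minimal sketch using the codecs that ship with Python, not one of the tools this FAQ refers to. The function names cp850_to_latin1 and latin1_to_cp850 are made up for this example; "cp850" and "iso-8859-1" are the standard Python codec names.

```python
def cp850_to_latin1(data: bytes) -> bytes:
    """Re-encode CP850 bytes as ISO 8859-1.

    Raises UnicodeEncodeError if the input contains a character that
    ISO 8859-1 lacks, such as the CP850 line-drawing characters.
    """
    return data.decode("cp850").encode("iso-8859-1")


def latin1_to_cp850(data: bytes) -> bytes:
    """Re-encode ISO 8859-1 bytes as CP850.

    Works for ASCII and for the printable range 0xA0-0xFF; the control
    codes 0x80-0x9F have no CP850 equivalent and raise an error.
    """
    return data.decode("iso-8859-1").encode("cp850")


# "über" as an MS-DOS program using CP850 would store it:
# the u-umlaut sits at 0x81 there, but at 0xFC in ISO 8859-1.
dos_bytes = b"\x81ber"
iso_bytes = cp850_to_latin1(dos_bytes)
print(iso_bytes)                     # prints: b'\xfcber'
print(latin1_to_cp850(iso_bytes))    # prints: b'\x81ber'
```

Going through a decoded string keeps the sketch short. A dedicated tool would more likely use a fixed 256-entry lookup table, which is what the terminal emulator translation tables mentioned in section 15.4 amount to.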