CHARACTER CONVERSION
====================

INTRODUCTION
------------

NOTE TO DIALUP (TEXT) USERS: The location of files in the PDA is given
here in http format. For example,
"http://www.ncf.ca/ncf/pda/computer/dos/comm/xlate.zip" means that the
file "xlate.zip" is in the computer/dos/comm directory of the PDA.
For help with using the PDA,

   Your Choice ==> go pda

Every once in a while you may see funny characters on the Freenet when
you read a message in French or in some other language. These funny
characters might take the form of `line drawing' or other unusual
characters, line feeds (skips to the next line), or seemingly missing
letters. You may think that this is the result of either typing errors
or system defects. In fact, these anomalies may simply turn out to be
accented characters once you configure your system properly.

Those who consistently write in English seldom have any real concern
for accents and diacritics. However, one must remember that the English
language has borrowed many words from other languages, and in doing so
has imported whatever spelling those words had in the other language.
Some examples: résumé, cliché, après-ski, soupçon, mañana, etc. If you
see strange characters, or if you think some letters have been omitted
in the preceding sentence, you may wish to continue reading.

The following is a test line showing some of the accents and diacritics
most likely to be seen in writing on the Freenet. You may also wish to
use any one of them at one time or another:

   à é è â ç ñ ô

Again, if you see blank lines in the space above, or if you see a group
of unintelligible characters, you should perhaps continue reading.

If you have read the words above correctly and if you see the above
test line consisting of 5 accented letters and 2 other diacritics, the
odds are that the system you are using to get on Freenet is based on
the ISO 8859-1 standard for representing eight-bit accented characters,
in which case you may wish to read no further.

TWO STANDARDS
-------------

There are three main areas where problems with accented characters
occur: display (screen), input (keyboard), and output (printer). Only
the first two will be discussed here. Before dealing with each of these
problems, a short explanation of how characters are represented inside
the computer will greatly assist in understanding both the problems and
their solutions.[1]

Within a computer system, information is represented by collections of
electronic or magnetic switches, referred to as bits (binary digits).
Typically, eight bits are grouped into a byte to represent a character.
There are 256 possible combinations of eight bits. Each combination can
be used to represent one character (letter, number or other symbol) and
is called the code for that character. A collection of such codes for a
particular purpose (e.g., to represent an alphabet) is also called a
code or a character set.

There are several widely-used codes, but the most widely accepted is
called ASCII (American Standard Code for Information Interchange). It
was developed many years ago when one objective was to minimize the
number of bits sent between communications devices, and is actually
only a seven-bit code, giving 128 possible characters (0-127). This was
all that was necessary for standard English text, and the people who
developed it were not then concerned with internationalization.
However, with further computer development, the 8-bit byte became
standard, so it became possible to expand the character set to 256
characters. (The 16-bit Unicode standard will not be discussed here.)

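As a rough illustration of the arithmetic above, the following Python
sketch counts the codes available with 7 and 8 bits and looks up the
ISO 8859-1 code of an unaccented and an accented letter. The function
names are invented for this example, and Python is used only as a
convenient calculator; none of this is needed to use the Freenet.

```python
# Sketch only: the helper names below are made up for this example.

def code_count(bits):
    """Number of distinct codes that `bits` binary digits can represent."""
    return 2 ** bits


def latin1_code(ch):
    """ISO 8859-1 code (0-255) of a single character."""
    return ch.encode("latin-1")[0]


def strip_eighth_bit(code):
    """Clear the top bit, leaving only the seven ASCII bits."""
    return code & 0x7F


if __name__ == "__main__":
    # 7 bits give the ASCII range, 8 bits give the full byte range.
    print(code_count(7), code_count(8))

    for ch in ("A", "\xe9"):  # "\xe9" is e with an acute accent
        code = latin1_code(ch)
        if code < code_count(7):
            kind = "fits in 7-bit ASCII"
        else:
            kind = "needs the 8th bit"
        print(repr(ch), code, format(code, "08b"), kind)

    # What is left of the accented letter if the 8th bit is lost in transit.
    print(chr(strip_eighth_bit(latin1_code("\xe9"))))
```

For é the ISO 8859-1 code is 233 (binary 11101001). With the top bit
cleared it becomes 105, which is the code for a plain `i'. This is why
a connection that passes only seven bits cannot carry accented letters.
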
It is these 8-bit codes (128-255) that are used to represent accented
and other non-English characters. The standard extended character set
that is used on much of the Internet is called Latin1 or ISO 8859-1.
This is the character set that is used on Freenet.

However, while ISO 8859-1 is an international standard, not everybody
uses this encoding. Many computers use their own, vendor-specific
character sets (most notably IBM-compatible PCs under MS-DOS). The
following solutions use programs that run on your PC, so they depend on
the operating system that you are using. Only DOS programs will be
discussed here.

READING ACCENTED CHARACTERS - DISPLAY
-------------------------------------

For those connecting to the Freenet using a computer that does not use
the ISO 8859-1 character set there are two main options:

   1. Translation between ISO 8859-1 and other character sets using the
      terminal emulation of your communication program.

   2. Installing the ISO 8859-1 character set on your PC.

NOTE: Many terminal emulations for PCs strip the 8th bit when in text
transmission mode, so if you are using such a program to dial up a
computer, you may have to configure your terminal program to transmit
all 8 bits (in both directions) even if your computer uses the Latin1
character set.

1. TERMINAL EMULATION
---------------------

CONEX (DOS)

Download http://www.ncf.ca/ncf/pda/computer/dos/comm/conex75.zip from
the PDA and install conex as your terminal program. Change the
emulation (Alt+m) for "Character (160-255)" from "IBM-8" to "ANSI-8".
ISO 8859-1 characters will be displayed correctly while you are online.
Captured text will *not* be converted, so you will have to use another
method to view it offline.

TELIX (DOS)

   1. Download http://www.ncf.ca/ncf/pda/computer/dos/comm/xlate.zip
      from the PDA.

   2. Unzip telixin.xlt and telixout.xlt into your Telix directory.

   3. Run Telix as normal and ISO 8859-1 will be displayed correctly
      while you are online.
      Captured text will also be translated so you will be able to
      view it offline (see next section).

TELIX NOTE: You may wish to save the original telixin.xlt and
telixout.xlt files under different names (such as telixin.ori and
telixout.ori), so as to be able to retrieve them again should the need
arise.

NOTE: If your PC is using the hardware character set (CP437) you won't
be able to use all the ISO 8859-1 characters, but only those that are
contained in the CP437 character set. In order to use all the ISO
8859-1 characters, you can either use the above method plus CP850 (with
Telix) or use CP819 without changing your terminal emulation. (See
Section 2, INSTALLING THE ISO 8859-1 CHARACTER SET.)

CAPTURING TEXT WITH ACCENTED CHARACTERS (TELIX)
-----------------------------------------------

The capture function is a one-way transmission of characters from the
Freenet to your computer. Once you have configured Telix to translate
the Latin1 characters, any capture of text to your system will be done
as if the characters were sent to your screen: accented characters will
be converted through the same mechanism which makes them visible on
your screen, and will be written to your disk as they should be. You
will need no further adjustments for the capture of texts.

2. INSTALLING THE ISO 8859-1 CHARACTER SET (DOS)
------------------------------------------------

   1. DOS National Language Support (Code Pages). This system uses
      *.cpi files. CP819 fonts are contained in isolatin.cpi, which can
      be found, along with instructions for installing Code Pages, in
      http://www.ncf.ca/ncf/pda/computer/dos/util/iso-8859.zip.

   2. Load the desired font directly into the EGA/VGA adapter.
      Individual font files can be extracted from *.cpi files using
      http://www.ncf.ca/ncf/pda/computer/dos/util/breakcpi.zip. The
      font file for an 80x25 screen can be loaded using the following
      batch file. The two programs called in the batch file are from
      http://www.ncf.ca/ncf/pda/computer/dos/util/fpman220.zip.

      | @echo off
      | lh xvreset.com
      | vga.com font cp819.f16

NOTE: Code Page 819 _is_ ISO 8859-1. Code Page 850 contains all the
characters of ISO 8859-1, BUT the characters are in different locations
(i.e., you can translate 1-to-1, but you do have to translate the
characters).

NOTE: Code Page 819 and its fonts do not include the MS-Windows
extensions (128-159).

WRITING ACCENTED CHARACTERS WHILE IN FREENET EDITORS - INPUT
------------------------------------------------------------

1. KEYB.COM (DOS)

If you know how to use accents on your own system and you wish to be
able to write them in a Freenet editor, you may wish to go ahead and
try it. Again, if your system already uses Latin-1 you may have no
difficulty doing that.

If you use a DOS system with the US or some other keyboard which does
NOT include keys for accented letters, you will NOT be able to transmit
those characters simply because the keyboard doesn't have them. You can
change keyboards simply by typing `keyb cf' (without the quotes) at the
*DOS* prompt (not at the Freenet ==> prompt). If your keyboard is an
extended keyboard, you will find while in the CF (Canadian French)
configuration that the line of characters shown in the introductory
paragraph can be typed as described below:

   à (a grave accent):   apostrophe (') followed by `a'
                         (the apostrophe is now obtained with
                         Shift+comma)
   é (e acute accent):   forward slash (/) key
   è (e grave accent):   apostrophe followed by `e'
   â (a circumflex):     opening square bracket ([) followed by `a'
   ç (c cedilla):        closing square bracket (]) followed by `c'
   ñ (Spanish n):        Shift+closing square bracket
   ô (o circumflex):     opening square bracket followed by `o'

Of course, while you are using the CF configured keyboard you will not
be able to locate some of the characters you normally find on the US
keyboard. So remember to restore your usual keyboard with the
appropriate command at the DOS prompt (to return to the US keyboard,
type `keyb us').

Please note that most communications software will allow users to
access DOS or use a DOS command without leaving the software. For
example, TELIX will let you give a DOS command after you press Alt-V.

2. KEYSWAP (DOS)

Use http://www.ncf.ca/ncf/pda/computer/dos/util/keyswp12.zip to create
your own user-defined keyboard for entering accented characters.

FILE TRANSFER - CONVERTING AN UPLOADED FILE
-------------------------------------------

You may not feel comfortable using a strange keyboard in a Freenet
editor, and for that reason you may wish to write a text off-line and
then upload it using one of the available transfer protocols.

Transfer protocols allow the sending and receiving computers to accept
-- and to verify the accuracy of -- transmitted characters. The codes
arrive exactly as they were sent, so the text will read the same at
both ends only if both computers use identical character tables. Again,
if your computer makes use of the Latin-1 code page you have no
problem. Otherwise the transfer will require an additional step once
the transmission has been completed.

Let us say that you have just typed a text containing accented
characters and you have saved it as "TOBESENT.TXT". This file will be
transmitted like any other file, except that when you look at it on the
Freenet after transmission, you will note that your accented characters
have been changed into other, unintelligible characters.

Marc Gauthier has foreseen that problem and has created a program
available on the Freenet which will convert such a transmitted file
into a Latin-1 coded file. To make use of that program, enter the
following command:

   Your Choice ==> go conv-char

and then select the code page you normally use on your own computer.
You will then be prompted to enter the name of a file in your Work
directory that you wish to convert, e.g.

   TOBESENT.TXT

Note that case is important when typing file names.

The next prompt will ask you for the name you wish to give to the
converted file. You may wish to retain the same name and give it a new
extension, e.g.

   TOBESENT.LAT

If you now read your file ("go files") you will see that all accented
characters appear as they should.

FILE TRANSFER - CONVERTING A DOWNLOADED FILE (DOS)
--------------------------------------------------

CFILT is a DOS program that can be used to convert files on your own PC
so that you can read them. It is available in the PDA at
http://www.ncf.ca/ncf/pda/computer/dos/util/cfilt.zip. If you run CFILT
without any command line parameters, it will display how to use it.

The following will take "infile", which has only LF at the end of each
line of text (and is therefore unreadable in DOS), and create "outfile"
with a CR character added to each line to conform with DOS text format
(CRLF):

   > cfilt infile outfile out=cp850:crlf

For more help with CFILT,

   Your Choice ==> go help-tools #8

A NOTE OF WARNING
-----------------

As you may surmise after having read the above, the proper handling of
accents in transactions between the Freenet and your computer is highly
dependent on the type of system you have, both hardware *and* software.
It would be difficult to investigate every combination found among the
multitude of systems used by Freenet members. If the handling of
accented characters and diacritics on your system produces results not
foreseen in the above instructions, please bring it to the authors'
attention so that the instructions can be modified (or post a query in
the Help Desk, "go help").

[1] The Use of French Characters with IBM-Compatible Personal Computers
    A Technical Report
    Treasury Board Information Technology Standard 15
    by Ed Hicks of the Department of Justice (1991).

1996-10-18 ab388
2000-10-01 ag221