Input methods to enter characters from the repertoire of ISO/IEC 10646 with a keyboard or other input devices


This document was written by Alain LaBonté (<ALB@RIQ.Qc.Ca>) and has been posted here by Philippe Deschamp to help circulate this information. It is also available by FTP from Germany or Finland. Please direct any comments to your national standards body, in preparation for the vote that should take place at ISO/IEC JTC1/SC18/WG9 a few weeks from now (now being the last days of February 1995).
Project:       18.57.00.00.00.00                    ISO/IEC
               JTC1/SC18/WG9 N1440

Title  Input methods to enter characters from the repertoire of ISO/IEC
       10646 with a keyboard or other input devices

       [Méthodes de saisie de caractères du répertoire de l'ISO/CÉI
       10646 à l'aide d'un clavier ou d'autres unités d'entrée]

Status:        Working Draft 3 for CD registration and ballot

Date:          1995-02-14

Acting Editor:        Alain LaBonté

                      Gouvernement du Québec
                      Secrétariat du Conseil du trésor
                      Service de la prospective et de la francisation

                      Édifice H
                      875, Grande-Allée Est, 4e étage
                      Québec, QC  G1R 5R8
                      Canada

Email:                       ALB@RIQ.QC.CA

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC1. Draft International Standards adopted by the joint technical committee are circulated to the national bodies for voting. Publication as an international standard requires approval by at least 75% of the national bodies casting a vote. International standard ISO/IEC POIUY has been prepared by Joint Technical Committee ISO/IEC JTC1, Information technology.
Introduction
Today, there is a well-known method on PC compatibles for entering characters that are not present on a given keyboard. However, this method is code-dependent and is limited to 8-bit character sets. There is a need to standardize such a method independently of coding, even for these limited sets of characters. There is also an international standard, ISO/IEC 9995-3, for entering on a standard 48-key keyboard the repertoire of characters of the European languages that use the Latin script. But this standard is limited to the Latin script, even though it opens the door to the definition of supplementary groups for other scripts. In the meantime, until other groups are well defined and documented, there should be an easy, standard way to enter non-Latin characters in a code-independent fashion. This would avoid the proliferation of such methods, a situation that is never desirable for end users.
Furthermore, ISO/IEC JTC1 recently published a standard, ISO/IEC 10646, titled "Universal multiple-octet coded character set (UCS)", whose repertoire is a superset of the repertoires of all standard character sets published so far by ISO/IEC JTC1. For this one large character set (UCS), no standard input method exists today. The need for one will keep growing, and such a method would also solve the problem of code independence.
Title  Input methods to enter characters from the repertoire of ISO/IEC 10646 with the help of a keyboard or other input/output devices
[Méthodes de saisie de caractères du répertoire de l'ISO/CÉI 10646 à l'aide d'un clavier ou d'autres unités d'entrée-sortie]
1 Scope
This international standard defines methods that allow entry of characters belonging to the repertoire of single-octet and multi-octet coding standards such as ISO/IEC 10646, in a code-independent manner, using a keyboard or other input/output devices. It is also expected that these methods will be usable with implementations of other character set coding schemes, provided that the repertoires of the target character sets are subsets of the universal multiple-octet coded character set (ISO/IEC 10646) or of any other standard character set.
More specifically, this project will define:
- a basic method for entering a character, which uses its bit representation (canonical or abbreviated form) in ISO/IEC 10646 as a catalog number, whatever underlying code is used for that character;
- a meta-entry method for entering the characters corresponding to the visual representation of keyboard function symbols (according to the ISO/IEC 9995-7 symbols and ISO/IEC 10646 characters), with the help of the function keys themselves;
- a composition method to enter a character with the help of a mnemonic sequence of characters;
- a screen-selection entry method for selecting a character displayed on a screen for data entry;
- a feedback method that allows exact identification of characters shown on a screen, for subsequent data entry;
- a group identification method for entering characters with a group select mechanism.
This standard is intended to complement existing national keyboard layouts or existing input methods optimized for national use.
Hence it does not replace any national keyboard entry requirement; rather, it is a tool to ease entry of the complete character repertoire of ISO/IEC 10646 with the help of already existing national keyboards.
2 Normative references
ISO 639 Two[three]-letter codes for languages
ISO 3166 Two-letter codes for countries
ISO/IEC 9995-1 Information technology - Keyboard layouts for text and office systems - Part 1 - General principles governing keyboard layouts
ISO/IEC 9995-3 Information technology - Keyboard layouts for text and office systems - Part 3 - Complementary layouts of the alphanumeric zone of the alphabetic section
ISO/IEC 9995-7 Information technology - Keyboard layouts for text and office systems - Part 7 - Symbols used to represent functions
ISO/IEC 10646-1 Information technology - Character sets and information coding - Universal multiple-octet coded character set - Part 1 - Architecture and basic multilingual plane
3 Definitions
canonical form: the form in which characters of the UCS are specified, using four octets to represent each character.
compose character: a function that selects a graphic character not allocated on the keyboard by means of a combination of other allocated characters.
control key: a key representing the function Control, as described in ISO/IEC 9995-7.
group: a logical state of a keyboard providing access to a collection of graphic characters or elements of graphic characters. Usually these graphic characters or elements of graphic characters logically belong together and may be arranged on several levels within a group. The input of certain graphic characters, such as accented letters, may require access to more than one group.
Note: In keyboard standards, the term "group" is not to be confused with the term "group" as used in ISO/IEC 10646-1, where it is defined differently. The definition given above is the one in ISO/IEC 9995-1.
level: a logical state of a keyboard providing access to a collection of graphic characters or elements of graphic characters. Usually these graphic characters or elements of graphic characters logically belong together, such as the capital forms of letters. In certain cases the level selected may also affect function keys.
level 2 select: a function that selects the set of characters or functions allocated to level 2 of the keyboard in an active group.
UCS: the Universal Multiple-Octet Coded Character Set standard known as ISO/IEC 10646-1, and its forthcoming extensions.
4 Symbols and abbreviations
In this standard, specific characters of the UCS repertoire are identified by symbols of the form Uxx[xx[xx[xx]]], where the pairs of hexadecimal digits xx following the letter "U" give the canonical form of a coded character as defined in the UCS (or its abbreviated form, obtained by eliminating as many leading zeroes as required). This notation serves as a code-independent catalog number (the same form may be used even if the coded character set in use in a given implementation is not the UCS) while keeping a straightforward link with the Universal Multiple-Octet Coded Character Set, which is assumed to contain all the coded graphic characters ever defined by ISO/IEC. The letter U stands for UCS, which itself stands for Universal multiple-octet Coded Character set. Whenever possible, other short mnemonic identifiers will be used in comments in addition to the printed characters themselves.
When the name of a character (standard or conventional) is used in this text, it is enclosed between a LESS-THAN SIGN and a GREATER-THAN SIGN, as, for example, in <SPACE>, which represents the character SPACE.
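The notation of clause 4 is easy to process mechanically. The following Python sketch parses a Uxx[xx[xx[xx]]] symbol into a code position and formats one back in canonical or abbreviated form. It assumes, as the bracketed pattern suggests, that abbreviation removes whole leading zero octets; the function names are illustrative only and are not defined by this standard.

```python
import re

# "U" followed by one to four octets written as pairs of hexadecimal digits,
# following the Uxx[xx[xx[xx]]] pattern of clause 4.
U_SYMBOL = re.compile(r"U((?:[0-9A-Fa-f]{2}){1,4})")


def parse_u_symbol(symbol: str) -> int:
    """Return the UCS code position (catalog number) named by a U symbol."""
    match = U_SYMBOL.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not a Uxx[xx[xx[xx]]] symbol: {symbol!r}")
    return int(match.group(1), 16)


def canonical_symbol(code: int) -> str:
    """Canonical form: four octets, i.e. always eight hexadecimal digits."""
    if not 0 <= code <= 0xFFFFFFFF:
        raise ValueError("code position does not fit in four octets")
    return f"U{code:08X}"


def abbreviated_symbol(code: int, min_octets: int = 1) -> str:
    """Abbreviated form (assumed): drop leading zero octets, keep min_octets."""
    digits = canonical_symbol(code)[1:]
    while len(digits) > 2 * min_octets and digits.startswith("00"):
        digits = digits[2:]
    return "U" + digits
```

For instance, parse_u_symbol("U000021E5") and parse_u_symbol("U21E5") name the same character, and abbreviated_symbol(0xC0, min_octets=2) gives "U00C0", the four-digit form typed in the example of clause 5.1.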
5 Requirements
5.1 Basic method
5.1.1 Prerequisites
The basic method requires that the keyboard in use have, in its alphanumeric section, the space bar (which generates the character <SPACE>), the ten decimal digits, and the first 6 letters of the Latin script (or the first 6 letters of any other alphabet, mapped accordingly) allocated to represent hexadecimal digits. In the examples presented here, a one-to-one sequential mapping is implicit between the first 6 letters of any alphabet and the letters ABCDEF of the Latin script.
5.1.2 Principles of operation
The basic method operates as follows. While the Control key (as described in ISO/IEC 9995-7) and the Level 2 select key (as described in the same standard) are held down simultaneously [preferably by activating the Level 2 select key first, then the Control key], typing the hexadecimal value of the canonical form of the desired character (or its abbreviation, as described in clause 4, Symbols and abbreviations), then ending this "catalog number" by depressing the space bar, shall generate a coded graphic character equivalent to the one corresponding to this "catalog number" in the UCS.
If the Control and Level 2 select keys are released, the catalog number is ended and depressing the space bar is not necessary to generate the character. Using the space bar allows several hexadecimal numbers, each representing a character, to be entered in succession without releasing the Control and Level 2 select keys. Whenever possible, no distinction should be made between upper case and lower case letters (or shape variants) when entering the first 6 letters of the alphabet as hexadecimal digits.
For example, if the sequence 00C0 is typed while Level 2 select and Control are held down, the character "À" will be generated.
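The key handling of clause 5.1.2 amounts to a small state machine. The Python sketch below models it with hypothetical key names ("CONTROL", "LEVEL2", "SPACE", and single characters for the other keys) rather than any real keyboard API; it produces UCS characters and leaves mapping to a target code table (clause 5.7) aside. Capping a catalog number at eight hexadecimal digits is an assumption based on the four-octet canonical form.

```python
from dataclasses import dataclass, field

HEX_DIGITS = "0123456789ABCDEF"
MAX_DIGITS = 8  # assumption: never longer than the four-octet canonical form


@dataclass
class BasicMethodEntry:
    """Sketch of the clause 5.1 basic method; key names are hypothetical."""
    control: bool = False
    level2: bool = False
    digits: str = ""
    output: list = field(default_factory=list)

    def _chord_held(self) -> bool:
        return self.control and self.level2

    def _end_catalog_number(self) -> None:
        # Convert the collected hex digits into one UCS character.
        # Note: Python's chr() raises ValueError above U+10FFFF.
        if self.digits:
            self.output.append(chr(int(self.digits, 16)))
            self.digits = ""

    def press(self, key: str) -> None:
        if key == "CONTROL":
            self.control = True
        elif key == "LEVEL2":
            self.level2 = True
        elif not self._chord_held():
            # Outside the chord, keys are entered normally.
            self.output.append(" " if key == "SPACE" else key)
        elif key == "SPACE":
            # The space bar ends one catalog number; the chord stays held.
            self._end_catalog_number()
        elif len(key) == 1 and key.upper() in HEX_DIGITS:
            # Upper and lower case are not distinguished for A-F.
            # Digits beyond MAX_DIGITS are dropped in this sketch.
            if len(self.digits) < MAX_DIGITS:
                self.digits += key.upper()
        # Any other key pressed during the chord is ignored in this sketch.

    def release(self, key: str) -> None:
        was_held = self._chord_held()
        if key == "CONTROL":
            self.control = False
        elif key == "LEVEL2":
            self.level2 = False
        if was_held and not self._chord_held():
            # Releasing the chord also ends the catalog number (no space needed).
            self._end_catalog_number()
```

With the chord held, typing 00c0, a space, then 21E5, and finally releasing both keys yields "À" followed by U+21E5; the space only matters when several characters are entered without releasing the keys.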
In an ISO/IEC 8859-1 environment, the code value generated by this operation would be hexadecimal C0; in an IBM 850 environment, it would be hexadecimal B7 (decimal 183); in an IBM 863 environment, it would be hexadecimal 8E (decimal 142). In an IBM 437 environment (the original PC code page), or in any coding environment where the target character does not exist, it is recommended that the operation at least give a warning that the character is not available and generate, in this example, the unaccented base letter "A". See clause 5.7 for the normative presentation fallback.
5.2 Meta-entry method
Function keys of the keyboard that each correspond to a symbol in ISO/IEC 9995-7, where an appropriate UCS character exists for that symbol, can be used as follows to select that character (for example, to search for it in online documentation).
After the Control key (as described in ISO/IEC 9995-7) and the Level 2 select key (as described in the same standard) have been depressed simultaneously and released without any other key being depressed [preferably by activating the Level 2 select key first, then the Control key, and then releasing both], depressing a function key whose function has been assigned an ISO/IEC 9995-7 symbol whose shape corresponds to a UCS character will select that character. If a function is normally activated with the help of a select key (level 2 or level 3), as is the case, for example, for the Tabulation Left function, the character corresponding to the function symbol is selected in the same way as the function is activated. The normative presentation fallback presented in clause 5.7 applies when this clause is a requirement as per conformance level 2.
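The meta-entry sequence of clause 5.2 can be sketched as a chord that "arms" the next function key. In this Python sketch the key names are hypothetical, function keys reached through a select key are assumed to arrive already resolved (for example as "TAB_LEFT"), and only the Tabulation Right entry (U+21E5) comes from this clause; the Tabulation Left entry (U+21E4) is an assumed mapping for illustration.

```python
# Hypothetical function names mapped to UCS characters whose shapes match
# their ISO/IEC 9995-7 symbols. TAB_LEFT is an assumed entry.
FUNCTION_SYMBOLS = {
    "TAB_RIGHT": "\u21E5",  # RIGHTWARDS ARROW TO BAR
    "TAB_LEFT": "\u21E4",   # LEFTWARDS ARROW TO BAR (assumption)
}
CHORD = frozenset({"CONTROL", "LEVEL2"})


class MetaEntry:
    """Sketch of the clause 5.2 meta-entry method (key names are hypothetical)."""

    def __init__(self, symbols=FUNCTION_SYMBOLS):
        self.symbols = dict(symbols)
        self.held = set()          # chord keys currently down
        self.chord_formed = False  # both chord keys were down together
        self.spoiled = False       # another key was pressed while a chord key was down
        self.armed = False         # chord cleanly pressed and released

    def press(self, key):
        """Return the selected UCS character, or None."""
        if key in CHORD:
            self.held.add(key)
            if self.held == CHORD and not self.armed:
                self.chord_formed = True
            return None
        if self.armed:
            # The next function key selects its symbol character (None if it has none).
            self.armed = False
            return self.symbols.get(key)
        if self.held:
            self.spoiled = True
        return None

    def release(self, key):
        self.held.discard(key)
        if not self.held:
            if self.chord_formed:
                self.armed = not self.spoiled
            self.chord_formed = False
            self.spoiled = False
```

Pressing and releasing the chord cleanly, then pressing "TAB_RIGHT", returns U+21E5; pressing any other key while the chord is down cancels the arming.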
To illustrate this method: if one depresses the Control key together with the Level 2 select key, releases both keys, then depresses the Tabulation Right key, the character generated should be the UCS character with canonical value U000021E5 (named <RIGHTWARDS ARROW TO BAR>, whose shape corresponds to the ISO/IEC 9995-7 symbol for Tabulation Right).
5.3 Composition method
Given the availability, in a system file, of a mapping between mnemonic sequences of characters available on the user's keyboard and individual characters of the UCS (such as the sequence <E'> representing the letter "É" or the sequence <9I> representing the character "¶"), any of the target characters of this map is selected as follows: while a composition mechanism (the Compose character function) is activated, typing a mnemonic sequence of characters exactly and then deactivating the Compose character function shall select the character mapped to this sequence. The normative presentation fallback presented in clause 5.7 applies when this clause is a requirement as per conformance level 3.
5.4 Screen-selection entry method
With this method, the user selects a character displayed on a screen by means of a pointing device. The system shall input the correct coded character represented by the selected shape, even if that shape is the result of the fallback representation described in clause 5.7. The feedback method described in clause 5.5 below can optionally be used to show the user the actual coded character in use. The normative presentation fallback presented in clause 5.7 applies when this clause is a requirement as per conformance level 4.
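The composition method of clause 5.3 can be sketched as a table lookup on the characters collected while the Compose character function is active. In this Python sketch the "MNEMONIC Uxxxx" one-entry-per-line layout of the system file is purely an assumption (no file format is specified here), and only the <E'> and <9I> entries come from the text.

```python
def load_compose_table(lines):
    """Parse hypothetical 'MNEMONIC Uxxxx' lines into {mnemonic: character}."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        mnemonic, symbol = line.split()  # mnemonics containing spaces are not supported
        if not symbol.startswith("U"):
            raise ValueError(f"expected a U symbol, got {symbol!r}")
        table[mnemonic] = chr(int(symbol[1:], 16))
    return table


class Composer:
    """Sketch of the Compose character function of clause 5.3."""

    def __init__(self, table):
        self.table = table
        self.active = False
        self.sequence = []

    def activate(self):
        self.active = True
        self.sequence.clear()

    def key(self, ch):
        """While composing, collect ch; otherwise pass it through unchanged."""
        if self.active:
            self.sequence.append(ch)
            return None
        return ch

    def deactivate(self):
        """End composition; return the mapped character, or None if unmapped."""
        self.active = False
        mnemonic = "".join(self.sequence)
        self.sequence.clear()
        return self.table.get(mnemonic)
```

Only an exact match selects a character: activating Compose, typing E and ', then deactivating returns "É", while an unknown sequence returns None so that the caller can warn the user.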
5.5 Feedback method for identifying displayed characters for later input
When this method is selected, in addition to or in conjunction with the screen-selection entry method, selecting a character on a screen with a pointing device shall display to the user the exact identification of the character represented by the shape pointed to, even if that shape is the result of the fallback representation described in clause 5.7. So that this information can be used for further input, at least with the basic method described in clause 5.1, it shall be presented as the canonical form of the UCS character, in the form Uxxxxxxxx (according to the convention described in clause 4). If another code table is used internally, it is recommended that the bit combination actually used in that system also be displayed as complementary information.
As an illustration, if the shape "A" is displayed on a screen, the underlying coded character can be any of three possibilities: <LATIN CAPITAL LETTER A>, <GREEK CAPITAL LETTER ALPHA> or <CYRILLIC CAPITAL LETTER A>. If one later wants to retrieve the same occurrence in a file, one needs to recall the right character. The feedback method positively identifies the right character and makes its later input easy.
5.6 Group identification method of national keyboard layouts
Whenever more than two groups are in use, to be conformant to this standard at level 6, the group select mechanism shall be able to directly select a group corresponding to a country code specified in ISO 3166 and, optionally, or if more than one language is in use in the country, a language code specified in ISO 639, to identify standard national keyboard layouts. When more than one keyboard layout (typically de facto keyboard layouts, as opposed to de jure national standards) is in use for a given country and language, the system shall provide a default to the user.
It shall be possible for the user to change that default at system level for a given country and language.
As an example, if the Canadian keyboard standard suitable for both French and English (standard CAN/CSA Z243.200-1992) is selected, the group will be identified by CA (the ISO 3166 code for Canada). In this country, in addition to the Canadian standard, former Canadian French keyboard variants (vendor-dependent) can be invoked by using the supplementary code "fr[a]" (the ISO 639 code for the French language). In the same manner, American keyboard variants (also vendor-dependent) can be invoked by using the supplementary code "en[g]" (the ISO 639 code for the English language). In this example, without an indication of language, the default would implicitly be the Canadian keyboard standard for French and English, as there is only one logical national standard in this country, suitable for both official languages at once. If a specific language is identified, a default keyboard layout would then be determined by system parameters (either the standard keyboard or de facto language-specific variants).
Note: At the time of editing this standard, ISO 639 documents languages with two-lowercase-letter codes (which allows the theoretical representation of at most 676 languages). ISO 639 is under revision and is expected to be able to represent more languages with 3-letter codes. Therefore, implementations of this method should make no assumption about the number of letters in a language code; flexibility of implementation regarding this issue is recommended.
5.7 Character integrity and presentation fallback method
In all cases when a character that is part of the repertoire of ISO/IEC 10646 is entered, it shall be internally mapped to the same character if a target code table other than the UCS is used.
If this character is not available in the target code, the system shall take whatever measures are appropriate so that character integrity is respected in all cases (such as using the UCS Transformation Format 8 [UTF-8] defined for ISO/IEC 10646, which allows a UCS character to be represented in an 8-bit environment). If the character entered is not defined in ISO/IEC 10646, then, in addition to the requirement that the integrity of the UCS coding space be respected where applicable (that is, the character is preserved as if it were defined, to allow for addenda to ISO/IEC 10646 published before the next release of the interface in use), it is recommended that a special warning be issued informing the user of the potential problem and giving him or her the opportunity to undo this particular input.
If the character entered cannot be displayed because it is not available on an output device, it is recommended, as an acceptable optional presentation fallback measure, that following the selection of the unavailable character the display driver generate, whenever possible, a character with a close similarity (a character with a related shape or a character with a generally recognized common interpretation), while giving a warning (by an audible signal or by any other means judged appropriate). Note, however, that for most UCS characters a fallback will be very difficult with 7-bit or 8-bit character sets; a very unlikely character (for example, in an MS-DOS environment, PC code 127 for EMPTY HOUSE) is then suggested to be generated instead of the selected character, as a further error indication. When programming complexity is not a problem, a string of characters can be displayed instead of a single fallback character, to represent more accurately the character unavailable for normal display.
If this option is chosen, however, the system shall keep a reversible one-to-one mapping between the boundaries of the whole string displayed and the intended original coded character, so that the feedback method specified in clause 5.5 can be used by pointing to any element of that string.
6 Conformance
Any claim of conformance of a system to this standard shall be accompanied by a list of conformance levels selected from the following list:
- level 1 shall meet the requirements of clauses 5.1 and 5.7;
- level 2 shall meet the requirements of clauses 5.2 and 5.7;
- level 3 shall meet the requirements of clauses 5.3 and 5.7;
- level 4 shall meet the requirements of clauses 5.4 and 5.7;
- level 5 shall meet the requirements of clauses 5.5 and 5.7;
- level 6 shall meet the requirements of clauses 5.6 and 5.7;
- level 7 shall meet the requirements of all subclauses of clause 5 (5.1 through 5.7).
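As a rough illustration of clauses 5.1, 5.5 and 5.7 together, the Python sketch below uses the standard library codecs "latin-1", "cp850", "cp863" and "cp437" as stand-ins for the target code tables named in clause 5.1. The function names are hypothetical, and approximating "a character with a close similarity" by the first character of the canonical decomposition is an assumption: it covers accented letters such as "À" but not most other characters, which then fall back to PC code 127 as suggested in clause 5.7.

```python
import unicodedata

UNLIKELY_BYTE = b"\x7f"  # PC code 127, the "very unlikely character" of clause 5.7


def feedback(ch, internal_codec=None):
    """Clause 5.5: canonical form Uxxxxxxxx and character name, plus
    (recommended) the bit combination used by an internal code table."""
    text = f"U{ord(ch):08X} {unicodedata.name(ch, '<unnamed>')}"
    if internal_codec is not None:
        try:
            text += f" [{internal_codec}: {ch.encode(internal_codec).hex().upper()}]"
        except UnicodeEncodeError:
            text += f" [not in {internal_codec}]"
    return text


def store(text, codec):
    """Character integrity: use the target code table if it can hold the text,
    otherwise keep the text intact as UTF-8. Returns (codec_used, bytes)."""
    try:
        return codec, text.encode(codec)
    except UnicodeEncodeError:
        return "utf-8", text.encode("utf-8")


def display_bytes(ch, codec):
    """Presentation fallback for one character. Returns (bytes, warning_needed)."""
    try:
        return ch.encode(codec), False
    except UnicodeEncodeError:
        pass
    # Close similarity (assumption): base character of the canonical
    # decomposition, e.g. "A" for "À".
    base = unicodedata.normalize("NFD", ch)[0]
    if base != ch:
        try:
            return base.encode(codec), True
        except UnicodeEncodeError:
            pass
    return UNLIKELY_BYTE, True
```

On an IBM 437 device, "À" is thus stored intact as UTF-8 while being displayed as "A" with a warning, matching the recommendation in clause 5.1; feedback distinguishes the three "A" shapes of clause 5.5 by their canonical forms U00000041, U00000391 and U00000410.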

If this helps raise attendance at, or participation in, SC18/WG9/RGK, so much the better. Of course, the draft is only a draft and does not carry an ISO/IEC copyright. People are invited to follow the de jure status of this project through ISO/IEC channels.

Regards.

Alain LaBonté
Québec

Project Editor


Information about the current status of any ISO document is available from ISO headquarters in Geneva.
email: <Central@ISOCS.ISO.CH>, phone: +41 22 7490111, fax: +41 22 7333430, WWW: <http://www.iso.ch/>