ISO/IEC JTC1/SC18/WG9 N1522

Second CD - ISO/IEC CD2 14755 - Input methods to enter characters from the repertoire of ISO/IEC 10646 with a keyboard or other input devices

[Méthodes de saisie de caractères du répertoire de l'ISO/CÉI 10646 à l'aide d'un clavier ou d'autres unités d'entrée]


This document was written by Alain LaBonté (<ALB@SCT.Gouv.Qc.Ca>) and has been placed here by Philippe Deschamp to help circulate this information. Please direct any comments to your national standardization body, and follow the evolution of this document until it becomes a de jure international standard.
[Ce document a été rédigé par Alain LaBonté (<ALB@SCT.Gouv.Qc.Ca>), et placé ici par Philippe Deschamp en vue d'aider à la circulation de cette information. Merci de bien vouloir faire parvenir vos commentaires à votre organisme de normalisation national, et de suivre le cheminement de ce document jusqu'à son adoption en tant que norme internationale.]


A previous version of this document is also available for reference, as is a French version.

[Vous pouvez également vous reporter à une version française de ce document, ainsi qu'à une version antérieure.]


Project: 18.57.00.00.00.00
Status: Second CD ballot
Date: 1995-10-16
Source: Alain LaBonté, project editor
Gouvernement du Québec
Secrétariat du Conseil du trésor
Service de la prospective et de la francisation
Édifice H
875, Grande-Allée Est, 4e étage
Québec, QC G1R 5R8
Canada
Email: ALB@SCT.Gouv.Qc.Ca

INTERNATIONAL                                                    ISO/IEC 14755
STANDARD                                                             Second CD
    

Information Technology - Input methods to enter characters from the repertoire of ISO/IEC 10646 with a keyboard or other input devices

Technologies de l'information - Méthodes de saisie de caractères du répertoire de l'ISO/CÉI 10646 à l'aide d'un clavier ou d'autres unités d'entrée

Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.

In the field of information technology, ISO and IEC have established a joint technical committee ISO/IEC JTC1. Draft International Standards adopted by the joint technical committee are circulated to the national bodies for voting. Publication as an international standard requires approval by at least 75% of the national bodies casting a vote.

International standard ISO/IEC 14755 has been prepared by Joint Technical Committee ISO/IEC JTC1, Information technology.

Introduction

Today, there is a well-known method for inputting characters foreign to a given keyboard on PC compatibles. However, this method is code-dependent and is limited to 8-bit character sets. There is a need to standardize such a method independently of coding, even for these limited sets of characters.

There is also an international standard, ISO/IEC 9995-3, for inputting on a standard 48-key keyboard the repertoire of characters belonging to those European languages using the Latin script. But this standard is limited to the Latin script, even if it opens the door to defining supplementary groups for other scripts. In the meantime, until other groups are well defined and documented, there should be an easy standard way to enter non-Latin characters in a code-independent fashion. This would avoid the multiplicity of such methods, a situation that is never desirable for end-users.

Furthermore, ISO/IEC JTC1 recently published a standard, ISO/IEC 10646, titled "Universal multiple-octet coded character set (UCS)", which is a superset of the repertoires of all standard character sets published so far by ISO/IEC JTC1. For this one large character set (UCS), no standard input method exists today. There will be an increasing need for one, and providing it would also solve the problem of code independence.

The preliminary version of this international standard included additional input methods which were not considered sufficiently mature to be standardized. These methods have been moved to the technical report on future keyboards, which was in preparation at the time of publication of this international standard.

1 Scope

This international standard defines methods that allow entry of characters belonging to the repertoire of single and multi-octet coding standards such as ISO/IEC 10646 in a code-independent manner using a keyboard or other input/output devices. These methods are also expected to be usable with implementations of different character set coding schemes, provided that the target character sets have repertoires that are subsets of the universal multiple-octet coded character set (ISO/IEC 10646) or of any other standard character set.

More specifically, this project will define:

- a basic method for entering a character which involves using its bit representation (canonical or abbreviated form) in ISO/IEC 10646 as a catalog number, whatever underlying code is used for that character;

- a method for entering standard keyboard symbols representing the functions used on the keyboard (formerly called meta-entry method in preliminary drafts of this international standard); this method is intended for entering characters corresponding to the visual representation of the keyboard function symbols (according to ISO/IEC 9995-7 symbols and ISO/IEC 10646 characters), with the help of the function keys themselves;

- a screen-selection entry method for selecting a character displayed on a screen for data entry;

- a feedback method that allows exact identification of characters shown on a screen, for subsequent data entry.

This standard is intended to complement existing national keyboard layouts or existing input methods optimized for national use. Hence it does not replace any national keyboard entry requirement but is rather a tool to ease entry of the complete character repertoire of ISO/IEC 10646 with the help of already existing national keyboards.

2 Normative references

ISO/IEC 9995-1 Information technology - Keyboard layouts for text and office systems - Part 1 - General principles governing keyboard layouts

ISO/IEC 9995-3 Information technology - Keyboard layouts for text and office systems - Part 3 - Complementary layouts of the alphanumeric zone of the alphabetic section

ISO/IEC 9995-7 Information technology - Keyboard layouts for text and office systems - Part 7 - Symbols used to represent functions

ISO/IEC 10646-1 Information technology - Character sets and information coding - Universal multiple-octet coded character set - Part 1 - Architecture and basic multilingual plane

3 Definitions

canonical form
the form with which characters of the UCS are specified using four octets to represent each character.
compose character
a function which selects a graphic character which has not been allocated on the keyboard by associating other allocated characters.
control key
a key representing the function Control as described in ISO/IEC 9995-7
hexadecimal
hexadecimal numbering is a counting system analogous to the decimal system, but it uses base 16 instead of base 10. In other words, instead of counting from 0 to 9 before moving to the next position (a position represents units, tens, hundreds and so on in the decimal system), the counting is done from 0 to 15 before moving to the next position (positions then represent powers of 16: 1, 16, 256, 4096 and so on). As there are no digits available beyond 9, the first 6 letters of the Latin alphabet (or of any alphabet if the Latin script is not used) are used to represent the extra hexadecimal "digits" 10 (A), 11 (B), 12 (C), 13 (D), 14 (E) and 15 (F). In this international standard, hexadecimal numbers are used to refer to the UCS, the hexadecimal coding of which is considered equivalent to a catalog numbering system for selecting characters. Hexadecimal notation is a shorthand for groups of 4 bits (4 binary digits, each either zero or one, allow 16 combinations); it also takes fewer characters to express a number in hexadecimal than in decimal.
level
a logical state of a keyboard providing access to a collection of graphic characters or elements of graphic characters. Usually these graphic characters or elements of graphic characters logically belong together, such as the capital forms of letters. In certain cases the level selected may also affect function keys.
Level 2 Select
a function which selects the set of characters or functions allocated to the level 2 of the keyboard in an active group.
UCS
the Universal Multiple-Octet Coded Character Set standard known as ISO/IEC 10646-1 and its extensions to come.
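
As an illustration of the hexadecimal definition above, the following minimal Python sketch evaluates a hexadecimal numeral position by position and shows the group of 4 bits behind each hexadecimal digit. It is not part of the draft; the names HEX_DIGITS, hex_to_int and hex_to_bits are hypothetical and chosen only for this example.

```python
# Illustrative sketch only; all names are assumptions, not part of the standard.
HEX_DIGITS = "0123456789ABCDEF"


def hex_to_int(text: str) -> int:
    """Evaluate a hexadecimal numeral, one position at a time."""
    value = 0
    for ch in text.upper():
        # Moving one position to the left multiplies the weight by 16.
        value = value * 16 + HEX_DIGITS.index(ch)
    return value


def hex_to_bits(text: str) -> str:
    """Show each hexadecimal digit as its group of 4 bits."""
    return " ".join(format(HEX_DIGITS.index(ch), "04b") for ch in text.upper())


print(hex_to_int("C0"))   # 12 * 16 + 0 = 192
print(hex_to_bits("C0"))  # 1100 0000
```

Letters are accepted in either case, in line with the recommendation that upper case and lower case should not be distinguished for hexadecimal digits.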

4 Symbols and abbreviations

In this standard, specific characters of the UCS repertoire are identified by symbols of the form Uxxxxxxxx, where the occurrences of x which follow the letter "U" represent, in hexadecimal, the canonical form of a coded character as defined in the UCS (or its abbreviated form, in which leading zeros may be omitted). This is in practice a means of being code-independent: the same form, considered as a catalog number, can be used even if the coded character set in use in a given implementation is not the UCS. At the same time it keeps a straightforward link with the Universal Multiple-Octet Coded Character Set, which is assumed to contain all the coded graphic characters ever defined by ISO/IEC. Whenever possible, other short mnemonic identifiers are used in comments in addition to the printing of the characters themselves.
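
The relationship between the abbreviated form and the canonical form can be sketched as follows. This is a minimal Python illustration, assuming that a symbol may be written with or without the leading letter "U"; the function name canonical_form is hypothetical and does not come from the draft.

```python
import string


def canonical_form(symbol: str) -> str:
    """Expand 'U00C0', 'C0' or '5c' to the eight-digit canonical form."""
    digits = symbol.strip()
    if digits[:1] in ("U", "u"):
        digits = digits[1:]          # drop the "U" that stands for UCS
    if not 1 <= len(digits) <= 8:
        raise ValueError("expected 1 to 8 hexadecimal digits")
    if any(ch not in string.hexdigits for ch in digits):
        raise ValueError("not a hexadecimal number: " + symbol)
    # Restore the leading zeros that the abbreviated form omits.
    return digits.upper().zfill(8)


print(canonical_form("U00C0"))  # 000000C0
print(canonical_form("5c"))     # 0000005C
```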

The letter U stands for UCS, which itself stands for Universal multiple-octet Coded Character set.

When the name of a character (standard or conventional) is used in this text, it is surrounded by LESS-THAN SIGN and GREATER-THAN SIGN, as, for example, in the string <SPACE>, which represents the character SPACE.

5 Requirements

5.1 Basic method

5.1.1 Prerequisites

For the basic method to be used, the keyboard in use shall have an alphanumeric section. This alphanumeric section shall provide a space bar (which generates the character <SPACE>), the ten decimal digits and the first 6 letters of the Latin alphabet if the Latin script is used, or the first 6 letters of any other alphabet if a different script is used. These are used to represent hexadecimal numbers (see the definition of hexadecimal in clause 3), which serve as the catalog numbers of all UCS characters. The machine on which the keyboard operates shall be programmed in such a way that the sequence of events required by some methods (in particular the method of clause 5.2) is possible. This standard is hence typically not applicable to a mechanical typewriter.

5.1.2 Principles of Operation

The basic method operates as follows:

While the Control key and the Level 2 Select key (both as described in ISO/IEC 9995-7) are simultaneously depressed [preferably by activating the Level 2 Select key first, then the Control key], typing the hexadecimal value of the canonical form of the desired character (or its abbreviation as described in clause 4, Symbols and abbreviations) and ending this "catalog number" by depressing the space bar shall generate a coded graphic character equivalent to the one corresponding to this "catalog number" in the UCS. If the Control and Level 2 Select keys are released instead, depressing the space bar is not necessary to generate the character. Using the space bar allows several series of hexadecimal digits, each representing a character, to be entered without releasing the Control and Level 2 Select keys. Whenever possible, no distinction should be made between upper case and lower case letters (or shape variants) when entering the first 6 letters of the alphabet as hexadecimal digits.

If the machine uses a coding space in which this character does not exist, it is recommended that the user interface at a minimum give a warning that this character is not available.
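
The principles of operation above can be sketched as a small key-event state machine. This is a minimal Python illustration, not a definitive implementation: the key names "Control", "Level2" and "Space", the class name BasicMethod and the event interface are all assumptions made for the example. Catalog numbers above 10FFFF are counted as warnings only because Python's chr() cannot represent them.

```python
from typing import List

HEX_KEYS = set("0123456789abcdefABCDEF")


class BasicMethod:
    """Hypothetical model of the basic method (key names are assumptions)."""

    def __init__(self) -> None:
        self.control = False
        self.level2 = False
        self.pending = ""            # hexadecimal digits typed so far
        self.output: List[str] = []  # characters generated
        self.warnings = 0            # catalog numbers that could not be used

    def _armed(self) -> bool:
        return self.control and self.level2

    def _commit(self) -> None:
        if not self.pending:
            return
        code = int(self.pending, 16)
        self.pending = ""
        if code > 0x10FFFF:          # limit of Python's chr(), see lead-in
            self.warnings += 1
        else:
            self.output.append(chr(code))

    def key_down(self, key: str) -> None:
        if key == "Control":
            self.control = True
        elif key == "Level2":
            self.level2 = True
        elif self._armed():
            if key == "Space":       # ends one catalog number, keys stay held
                self._commit()
            elif key in HEX_KEYS and len(self.pending) < 8:
                self.pending += key  # case of A-F does not matter

    def key_up(self, key: str) -> None:
        was_armed = self._armed()
        if key == "Control":
            self.control = False
        elif key == "Level2":
            self.level2 = False
        if was_armed and not self._armed():
            self._commit()           # releasing the keys also ends the number


kb = BasicMethod()
kb.key_down("Level2")
kb.key_down("Control")
for key in ["5", "5", "8", "4", "Space", "5", "c"]:
    kb.key_down(key)
    kb.key_up(key)
kb.key_up("Control")
kb.key_up("Level2")
print(kb.output)  # two characters: U+5584 and REVERSE SOLIDUS
```

In this sketch ordinary typing outside the Control and Level 2 Select combination is simply ignored; a real implementation would pass those keys through to the normal keyboard layout.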

Example a. Entering character À

1. Depress the Level 2 Select key and hold it.

2. Depress the Control key and hold it.

3. Type the sequence 00C0.

4. Release the Control key and the Level 2 Select key.

Result: Character À will be generated, as it represents the UCS catalog number of the letter À, whatever code is used in the machine, provided that this character is available in the actual character set.

Please note that the canonical form of this character is 000000C0, but that the user has here chosen to not type the first four zeros. It would also have been allowed not to type any leading zero at all (typing C0 would have been valid).

If the machine uses the ISO/IEC 8859-1 coding, the bit combination whose notation is 12/00 will correspond to the actual coding.

If the machine uses the personal computer code page 850, the coded character generated will correspond to decimal number 183 (hexadecimal B7, or bit combination 11/07).

If the machine uses the EBCDIC code page 037 or 500, used by different computer manufacturers on mainframe computers, the coded character generated will correspond to hexadecimal 64.

If the machine does not have character À available, or does not support it in any way, the computer should issue a warning, such as a beep, to indicate that something went wrong.
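
The code independence shown in Example a can be checked with a short Python sketch. It assumes that Python's built-in codecs "latin-1", "cp850", "cp037" and "cp500" stand in for ISO/IEC 8859-1, code page 850 and the two EBCDIC code pages named above; the function name encode_catalog_number is hypothetical.

```python
from typing import Optional


def encode_catalog_number(hex_digits: str, encoding: str) -> Optional[bytes]:
    """Return the local coding of a UCS catalog number, or None if unavailable."""
    char = chr(int(hex_digits, 16))
    try:
        return char.encode(encoding)
    except UnicodeEncodeError:
        return None  # the caller should warn the user, for example with a beep


for encoding in ("latin-1", "cp850", "cp037", "cp500", "ascii"):
    coded = encode_catalog_number("00C0", encoding)
    if coded is None:
        print(encoding, "warning: character not available")
    else:
        print(encoding, coded.hex().upper())
```

The same catalog number 00C0 gives C0, B7, 64 and 64 for the first four codecs, and a warning for 7-bit ASCII, which has no À.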

Example b. Entering [Shàn], the Chinese character Shàn (Japanese Zen, Korean Seon)

1. Depress the Level 2 Select key and hold it.

2. Depress the Control key and hold it.

3. Type the sequence 5584.

4. Release the Control key and the Level 2 Select key.

Result: The character whose canonical form is 00005584 will be generated, as it represents the UCS catalog number of the Chinese character whose Mandarin pronunciation is written Shàn in the Latin script transliteration system known as "hanyu pinyin", officially used in China. This character is also known in Japan and can be transliterated, in Japanese Romaji, as Zen. It is also used in Korea and pronounced Seon. The ideograph means kindness in English.

This character will be generated whatever code is used in the machine, provided that this character is available in the actual character set.

Please note that the canonical form of this character is 00005584, but that the user has here chosen to not type the first four zeros, which is a valid practice according to this international standard.

If the machine uses GB 12345-1990 coding, the coding generated (in hexadecimal notation) will be 6143.

If the machine uses JIS X 0208-1990 coding, the coding generated (in hexadecimal notation) will be 4131.

If the machine uses KS C 5601-1987 coding, the coding generated (in hexadecimal notation) will be 603C.

If the machine does not have this character available, or does not support it in any way, the computer should issue a warning, such as a beep, to indicate that something went wrong.
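
For the JIS X 0208 case of Example b, the two-octet code can be derived in Python from the EUC-JP encoding, in which each octet of a JIS X 0208 code is stored with its high bit set. This is a sketch under the assumption that Python's "euc_jp" codec reflects the JIS X 0208 edition cited above; the function name jis_x_0208_code is hypothetical. GB 12345 is not covered because Python ships no codec for it.

```python
from typing import Optional


def jis_x_0208_code(char: str) -> Optional[str]:
    """Return the JIS X 0208 code of char in hexadecimal, or None."""
    try:
        euc = char.encode("euc_jp")
    except UnicodeEncodeError:
        return None
    # JIS X 0208 characters are exactly two octets, both in the range A1-FE.
    if len(euc) != 2 or euc[0] < 0xA1 or euc[1] < 0xA1:
        return None
    # Clearing the high bit of each octet gives the 7-bit JIS code.
    return "{:02X}{:02X}".format(euc[0] & 0x7F, euc[1] & 0x7F)


print(jis_x_0208_code(chr(0x5584)))  # 4131
```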

Example c. Entering character \

1. Depress the Level 2 Select key and hold it.

2. Depress the Control key and hold it.

3. Type the sequence 5C.

4. Release the Control key and the Level 2 Select key.

Result: Character \ will be generated, as it represents the UCS catalog number of <REVERSE SOLIDUS>, whatever code is used in the machine, provided that this character is available in the actual character set.

Please note that the canonical form of this character is 0000005C, but that the user has here chosen to not type the first six zeros, which is a valid practice according to this international standard.

If the machine uses the ISO/IEC 8859 coding, the bit combination whose notation is 05/12 will correspond to the actual coding (this is also notation 5/12 according to ISO/IEC 646 7-bit-coded character set).

If the machine uses the de facto EBCDIC code in use on different manufacturers' mainframe computers, the coded character generated will correspond to hexadecimal E0.

If the machine does not have character \ available, or does not support it in any way, the computer should issue a warning, such as a beep, to indicate that something went wrong.
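
The bit combination notation used in these examples (12/00, 11/07, 05/12) gives the upper 4 bits and the lower 4 bits of an octet as two decimal numbers. A minimal Python sketch, with the hypothetical function name bit_combination:

```python
def bit_combination(octet: int) -> str:
    """Format an octet in column/row notation, for example 0x5C -> '05/12'."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("an octet is expected")
    column = octet >> 4    # upper 4 bits
    row = octet & 0x0F     # lower 4 bits
    return "{:02d}/{:02d}".format(column, row)


print(bit_combination(0xC0))  # 12/00
print(bit_combination(0xB7))  # 11/07
print(bit_combination(0x5C))  # 05/12
```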

5.2 Keyboard symbols entry method (formerly known as Meta-entry method)

Certain function keys of keyboards have been associated with a symbol in ISO/IEC 9995-7. For those symbols which are also associated with an appropriate character in the UCS, the character can be entered using the function key itself. This can be used as follows to select that character (say, for searching for that character in online documentation, or simply for editing such documentation for users of internationalized applications):

After the Control key and the Level 2 Select key (both as described in ISO/IEC 9995-7) have been simultaneously depressed and released without any other simultaneous key depression [preferably by activating the Level 2 Select key first, then the Control key and then releasing both], depressing a function key and releasing it shall enter the character assigned in the UCS to the symbol representing this function in ISO/IEC 9995-7 (provided that the character exists in the UCS). As some functions are activated with the help of a qualifier key such as the Level 2 Select key, itself a function key, the resulting character shall be generated only once all keys have been released by the user.

Example a. Entering the character <RIGHTWARDS ARROW TO BAR>

1. Depress the Level 2 Select key and hold it.

2. Depress the Control key.

3. Release both the Control key and the Level 2 Select key.

At this stage, the keyboard is in a logical state where depressing any function key combination will potentially generate a character instead of activating the associated function.

4. Depress the Tabulation Right function key.

5. Release the Tabulation Right function key.

Result: Character <RIGHTWARDS ARROW TO BAR> will be generated, as it represents the UCS character whose normal presentation form corresponds to ISO/IEC 9995-7 symbol for the Tabulation Right function. The canonical value of this character is 000021E5 in the UCS.

If the machine does not have this character available, or does not support it in any way, the computer should issue a warning, such as a beep, to indicate that something went wrong.

Example b. Entering the character <UPWARDS WHITE ARROW>

1. Depress the Level 2 Select key and hold it.

2. Depress the Control key.

3. Release both the Control key and the Level 2 Select key.

At this stage, the keyboard is in a logical state where depressing any function key combination will potentially generate a character instead of activating the associated function.

4. Depress the Level 2 Select function key.

5. Release the Level 2 Select function key.

Result: Character <UPWARDS WHITE ARROW> will be generated, as it represents the UCS character whose normal presentation form corresponds to ISO/IEC 9995-7 symbol for the Level 2 Select function. The canonical value of this character is 000021E7 in the UCS.

Example c. Entering the character <LEFTWARDS ARROW TO BAR>

1. Depress the Level 2 Select key and hold it.

2. Depress the Control key.

3. Release both the Control key and the Level 2 Select key.

At this stage, the keyboard is in a logical state where depressing any function key combination will potentially generate a character instead of activating the associated function.

4. Depress the Level 2 Select function key.

5. Depress the Tabulation Right function key.

6. Release all keys.

Note: this scenario is an example only and does not imply any obligation to have a keyboard on which the Tabulation Left function is allocated to the same key as the Tabulation Right key nor that the Tabulation Right function is allocated a dedicated key. It only reflects a common practice.

Result: Character <LEFTWARDS ARROW TO BAR> will be generated, as it represents the UCS character whose normal presentation form corresponds to ISO/IEC 9995-7 symbol for the Tabulation Left function, assumed here to be placed on the same key as the Tabulation Right function (Tabulation Right being accessible at level 1 of the keyboard, Tabulation Left after selection of level 2). The canonical value of this character is 000021E4 in the UCS.

If the machine does not have this character available, or does not support it in any way, the computer should issue a warning, such as a beep, to indicate that something went wrong.
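
The three examples of clause 5.2 can be sketched as a second key-event state machine. This Python illustration is hypothetical throughout: the key names "Control", "Level2" and "TabRight", the class name SymbolEntry and the small SYMBOLS table are assumptions, and the table holds only the three characters named in the examples, with Tabulation Left assumed to share a key with Tabulation Right.

```python
from typing import Dict, FrozenSet, List, Set

# Function key combination -> UCS character (only the three examples above).
SYMBOLS: Dict[FrozenSet[str], str] = {
    frozenset(["TabRight"]): "\u21E5",            # RIGHTWARDS ARROW TO BAR
    frozenset(["Level2"]): "\u21E7",              # UPWARDS WHITE ARROW
    frozenset(["Level2", "TabRight"]): "\u21E4",  # LEFTWARDS ARROW TO BAR
}
PREFIX = {"Control", "Level2"}


class SymbolEntry:
    """Hypothetical model of the keyboard symbols entry method."""

    def __init__(self) -> None:
        self.down: Set[str] = set()
        self.chord_seen = False   # Control and Level 2 Select held together
        self.spoiled = False      # another key was pressed during the prefix
        self.armed = False        # next combination yields a character
        self.combo: Set[str] = set()
        self.output: List[str] = []
        self.warnings = 0

    def key_down(self, key: str) -> None:
        self.down.add(key)
        if self.armed:
            self.combo.add(key)
        elif key in PREFIX:
            if PREFIX <= self.down:
                self.chord_seen = True
        else:
            self.spoiled = True

    def key_up(self, key: str) -> None:
        self.down.discard(key)
        if self.down:
            return                # wait until all keys have been released
        if self.armed:
            char = SYMBOLS.get(frozenset(self.combo))
            if char is None:
                self.warnings += 1
            else:
                self.output.append(char)
            self.armed = False
            self.combo = set()
        else:
            self.armed = self.chord_seen and not self.spoiled
            self.chord_seen = False
            self.spoiled = False


kb = SymbolEntry()
for event, key in [("down", "Level2"), ("down", "Control"),
                   ("up", "Control"), ("up", "Level2"),
                   ("down", "Level2"), ("down", "TabRight"),
                   ("up", "TabRight"), ("up", "Level2")]:
    kb.key_down(key) if event == "down" else kb.key_up(key)
print(kb.output == ["\u21E4"])  # True: LEFTWARDS ARROW TO BAR
```

The character is produced only when the last key is released, which is what allows a qualifier key such as Level 2 Select to take part in the combination.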

5.3 Screen-selection entry method

By using a pointing device and a selection mechanism (such as clicking a button), the user will select, with this method, a character displayed on a screen. The system shall input the correct coded character that is represented by the shape selected. The feedback method described in clause 5.4 below shall be used to indicate to the user the actual coded character in use.

5.4 Feedback method for identifying displayed characters for later input

By using a pointing device, whether or not the screen-selection entry method has been activated by a selection mechanism (such as clicking a button), pointing at a character on a screen shall result in indicating to the user (by displaying on the screen, by voice output, or by any other means) the exact identification of the character represented by the shape pointed at. So that this information can be used for further input using, as a minimum, the basic method described in clause 5.1, it shall be presented as the canonical form of the UCS character, in the form xxxxxxxx (according to the convention described in clause 4). It is furthermore recommended that, if another code table is used internally, the bit combination actually used in that system also be displayed as complementary information, together with the name of the character in the natural language of the user.

Example scenario

Let's suppose that shape A is displayed on a screen to represent LATIN CAPITAL LETTER A and that the user is French-speaking. Let's also suppose that feedback is generated on a screen in the form of a virtual slide which is displayed as long as a character is pointed at with a mouse, used as pointing device.

The user points at this character with the mouse.

The computer then displays the canonical form 00000041, whatever the coding actually used by the machine. Following the recommendation made in this international standard, in this scenario the actual coding used by the machine is also displayed (here supposed to be EBCDIC, in which character A is hexadecimal C1), as well as the string LETTRE MAJUSCULE LATINE A, corresponding to the character name in the user's natural language.

This example is used to show the necessity, in a lot of circumstances, of such a function. The UCS is a huge character set, in which many characters can share the same shape, or external representation, while they actually represent different conceptual characters, which is reflected in the internal coding also being different.

In this example, shape A is displayed on a screen, and the conceptual character to which this shape corresponds can be any of three: capital Latin letter A, capital Greek letter Alpha or capital Cyrillic letter A. To provide for this, the UCS uses three canonical values for these three different characters. Hence it becomes important for a user to know what the actual character is if it is to be used for input later on, or for retrieval at a much later stage. The feedback method positively identifies the right character, without ambiguity, for further input.
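
The feedback described in this scenario can be sketched in Python as follows. The function name identify is hypothetical, "cp037" is assumed as the EBCDIC code table of the scenario, and the standard unicodedata module supplies English character names only, so a localized name such as LETTRE MAJUSCULE LATINE A would need a separate table that is not shown here.

```python
import unicodedata
from typing import Optional, Tuple


def identify(char: str, local_encoding: str = "cp037") -> Tuple[str, Optional[str], str]:
    """Return (canonical form, local coding in hexadecimal or None, name)."""
    canonical = "{:08X}".format(ord(char))
    try:
        local = char.encode(local_encoding).hex().upper()
    except UnicodeEncodeError:
        local = None  # the character does not exist in the local code table
    name = unicodedata.name(char, "<no name available>")
    return canonical, local, name


# Three characters that can share the shape A.
for char in ("\u0041", "\u0391", "\u0410"):
    print(identify(char))
```

The three look-alike characters come back with the canonical forms 00000041, 00000391 and 00000410, so the user can tell them apart before entering one of them with the basic method.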

5.5 Unexpected conditions

It is recommended that whenever a character entered through any one of these methods cannot be supported by the underlying applicative environment, a warning be issued to the user by sound or visual indication, or by any other sensible means of attracting the user's attention. Whenever possible, data integrity of the character entered should be preserved even if it is not possible to display it or to process it properly with full applicative support. The preservation of data integrity should then allow the character to be interchanged with another application which is better able to deal with it. This recommendation can be achieved by many different technical means, left to the implementer.

6 Conformance

An applicative environment (an application [or a complete system performing functions for all applications running under it] used in conjunction with a keyboard) conforms to this international standard if all mandatory prescriptions of clause 5 are met.


Of course the draft is only a draft and does not carry an ISO/IEC copyright. People are invited to follow the de jure status of this project through ISO/IEC channels.

Alain LaBonté
Québec

Project Editor, ISO/IEC 14755
[Rédacteur]


Information about the current status of any ISO document is available from ISO headquarters in Geneva.
email: <Central@ISOCS.ISO.CH>, phone: +41 22 7490111, fax: +41 22 7333430, WWW: <http://www.iso.ch/>


Maintainer: Philippe Deschamp.
Address: <Philippe.Deschamp@INRIA.Fr> (all courteous comments welcome).
Created: 1995-10-18
Updated: 1996-02-09
Modified: 2011-03-18
Original URL: http://Deschamp.Free.Fr/exinria/divers/ALB-CD.html

The content of these pages is the sole responsibility of their authors and does not necessarily represent the official position of INRIA or of any other party. The information is presented in good faith, but its accuracy cannot be guaranteed.

Institut National de Recherche en Informatique et en Automatique