Project: 18.57.00.00.00.00 ISO/IEC
JTC1/SC18/WG9 N1440
Title Input methods to enter characters from the repertoire of ISO/IEC
10646 with a keyboard or other input devices
[Méthodes de saisie de caractères du répertoire de l'ISO/CÉI
10646 à l'aide d'un clavier ou d'autres unités d'entrée]
Status: Working Draft 3 for CD registration and ballot
Date: 1995-02-14
Acting Editor: Alain LaBonté
Gouvernement du Québec
Secrétariat du Conseil du trésor
Service de la prospective et de la francisation
Édifice H
875, Grande-Allée Est, 4e étage
Québec, QC G1R 5R8
Canada
Email: ALB@RIQ.QC.CA
Foreword
ISO (the International Organization for Standardization) and IEC (the
International Electrotechnical Commission) form the specialized system
for worldwide standardization. National bodies that are members of ISO
or IEC participate in the development of International Standards through
technical committees established by the respective organization to deal
with particular fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO
and IEC, also take part in the work.
In the field of information technology, ISO and IEC have established a
joint technical committee ISO/IEC JTC1. Draft International Standards
adopted by the joint technical committee are circulated to the national
bodies for voting. Publication as an international standard requires
approval by at least 75% of the national bodies casting a vote.
International standard ISO/IEC POIUY has been prepared by Joint
Technical Committee ISO/IEC JTC1, Information technology.
Introduction
Today, there is a well-known method in existence for inputting
characters foreign to a given keyboard on PC compatibles.
However, this method is code-dependent and is limited to 8-bit
character sets. There is a need to standardize such a method
independently of coding even for these limited sets of characters.
There is also an international standard, ISO/IEC 9995-3, for
inputting on a standard 48-key keyboard the repertoire of
characters belonging to those European languages using the Latin
script. But this standard is limited to the Latin script, even if it
opens the door to defining supplementary groups for other
scripts. In the meantime, until other groups are well defined
and documented, there should be an easy standard way to enter
non-Latin characters in a code-independent fashion. This would
avoid the multiplication of such methods, a situation that is never
desirable for end-users.
Furthermore, ISO/IEC JTC1 recently published a standard,
ISO/IEC 10646, titled "Universal multiple-octet coded character
set (UCS)", which is a superset of the repertoires of all standard
character sets published so far by ISO/IEC JTC1. For this one
large character set (UCS), no standard input method exists
today. There will be an increasing need for one, and such a
method would also solve the problem of code independence.
Title Input methods to enter characters from the repertoire of ISO/IEC
10646 with the help of a keyboard or other input/output devices
[Méthodes de saisie de caractères du répertoire de l'ISO/CÉI
10646 à l'aide d'un clavier ou d'autres unités d'entrée-sortie]
1 Scope
This international standard defines methods that allow entry of
characters belonging to the repertoire of single- and multi-octet
coding standards such as ISO/IEC 10646 in a code-independent
manner using a keyboard or other input/output devices. It is also
expected that for implementations of different character set
coding schemes, this method will be usable, provided that the
target character sets have repertoires that are subsets of the
universal multiple-octet coded character set (ISO/IEC 10646) or
of any other standard character set.
More specifically, this project will define:
- a basic method for entering a character which involves
using its bit representation (canonical or abbreviated
form) in ISO/IEC 10646 as a catalog number, whichever
underlying code is used for that character;
- a meta-entry method for entering characters
corresponding to the visual representation of the
keyboard function symbols (according to ISO/IEC 9995-
7 symbols and ISO/IEC 10646 characters), with the help
of the function keys themselves;
- a composition method to enter a character with the help
of a mnemonic sequence of characters;
- a screen-selection entry method for selecting a character
displayed on a screen for data entry;
- a feedback method that allows exact identification of
characters shown on a screen, for subsequent data entry;
- a group identification method for entering characters
with a group select mechanism.
This standard is intended to complement existing national
keyboard layouts or existing input methods optimized for
national use. Hence it does not replace any national keyboard
entry requirement but is rather a tool to ease entry of the
complete character repertoire of ISO/IEC 10646 with the help of
already existing national keyboards.
2 Normative references
ISO 639 Two[three]-letter codes for languages
ISO 3166 Two-letter codes for countries
ISO/IEC 9995-1 Information technology - Keyboard layouts for
text and office systems - Part 1 - General principles governing
keyboard layouts
ISO/IEC 9995-3 Information technology - Keyboard layouts for
text and office systems - Part 3 - Complementary layouts of the
alphanumeric zone of the alphabetic section
ISO/IEC 9995-7 Information technology - Keyboard layouts for
text and office systems - Part 7 - Symbols used to represent
functions
ISO/IEC 10646-1 Information technology - Character sets and
information coding - Universal multiple-octet coded character set
- Part 1 - Architecture and basic multilingual plane
3 Definitions
canonical form the form with which characters of the UCS are
specified using four octets to represent each
character.
compose character a function which selects a graphic character
which has not been allocated on the keyboard by
associating other allocated characters.
control key a key representing the function Control as
described in ISO/IEC 9995-7
group a logical state of a keyboard providing access to
a collection of graphic characters or elements of
graphic characters. Usually these graphic
characters or elements of graphic characters
logically belong together and may be arranged on
several levels within a group. The input of
certain graphic characters, such as accented
letters, may require access to more than one
group.
Note: The term "group" as used in keyboard
standards is not to be confused with the term
"group" as used in ISO/IEC 10646-1, where it is
defined differently. The definition given in the
previous paragraph is the one given in ISO/IEC
9995-1.
level a logical state of a keyboard providing access to
a collection of graphic characters or elements of
graphic characters. Usually these graphic
characters or elements of graphic characters
logically belong together, such as the capital
forms of letters. In certain cases the level
selected may also affect function keys.
level 2 select a function which selects the set of characters or
functions allocated to the level 2 of the keyboard
in an active group.
UCS the Universal Multiple-Octet Coded Character Set
standard known as ISO/IEC 10646-1 and its
extensions to come.
4 Symbols and abbreviations
In this standard, specific characters of the UCS repertoire are
identified by symbols of the form Uxx[xx[xx[xx]]], where the occurrences
of xx which follow the letter "U" represent, in hexadecimal, the
canonical form of a coded character as it is defined in the UCS (or its
abbreviated form, obtained by eliminating as many leading zeroes as
required). This notation is code-independent: the same form, considered
as a catalog number, can be used even if the coded character set in use
in a given implementation is not the UCS. At the same time it keeps a
straightforward link with the Universal Multiple-Octet Coded Character
Set, which is assumed to contain all the coded graphic characters ever
defined by ISO/IEC. Whenever possible, other short mnemonic identifiers
will be used in comments in addition to the printing of the characters
themselves.
The letter U stands for UCS, which itself stands for Universal multiple-
octet Coded Character set.
When the name of a character (standard or conventional) is used in this
text, it is enclosed between LESS-THAN SIGN and GREATER-THAN SIGN,
as, for example, in <SPACE>, which represents the
character SPACE.
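The notation above can be sketched in a few lines of Python. This
is an illustration only, not part of the standard: the function
names are hypothetical, and the sketch assumes that abbreviation
removes whole leading zero octets, as the form Uxx[xx[xx[xx]]]
suggests.

```python
import re

# Matches the clause 4 form Uxx[xx[xx[xx]]]: "U" followed by 1 to 4 hex octets.
_U_FORM = re.compile(r"U((?:[0-9A-Fa-f]{2}){1,4})")

def parse_u_form(symbol: str) -> int:
    """Return the UCS catalog number denoted by a symbol such as 'U00C0'."""
    match = _U_FORM.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not a Uxx[xx[xx[xx]]] symbol: {symbol!r}")
    return int(match.group(1), 16)

def canonical_u_form(value: int) -> str:
    """Format a catalog number in the four-octet canonical form Uxxxxxxxx."""
    return f"U{value:08X}"

def abbreviated_u_form(value: int) -> str:
    """Drop leading zero octets (assumption), keeping at least one octet."""
    digits = f"{value:08X}"
    while len(digits) > 2 and digits.startswith("00"):
        digits = digits[2:]
    return "U" + digits
```

With these definitions, parse_u_form("U00C0") and
parse_u_form("U000000C0") denote the same catalog number.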
5 Requirements
5.1 Basic method
5.1.1 Prerequisites
The basic method requires that the keyboard in use has, in its
alphanumeric section, the space bar (which generates the
character <SPACE>), the ten decimal digits and the first 6
letters of the Latin script allocated (or the first 6 letters of any
other alphabet mapped accordingly) to represent hexadecimal
numbers. In the examples presented here, a one-to-one sequential
mapping is implicit between the first 6 letters of any alphabet
and the letters ABCDEF of the Latin script.
5.1.2 Principles of Operation
The basic method operates as follows:
While the Control key (as described in ISO/IEC 9995-7) and the
Level 2 select key (as described in the same standard) are
simultaneously depressed [preferably by activating the Level 2
select key first, then the Control key], typing the hexadecimal
value of the canonical form (or its abbreviation as described in the
Symbols and Abbreviations clause) targeting the desired character
and ending the "catalog number" by depressing the space bar
shall generate a coded graphic character equivalent to the one
corresponding to this "catalog number" in the UCS. If the
Control and the Level 2 Select keys are released, depressing the
space bar is not necessary to generate a character. When the
space bar is used, it allows multiple series of hexadecimal
numbers, each representing a character, to be entered without
releasing the Control and Level 2 select keys. Whenever possible
no distinction should be made between upper case and lower
case letters (or shape variants) for entering the first 6 letters of
the alphabet as hexadecimal digits.
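The principle of operation can be sketched as a small state
machine. This is a minimal illustration under stated assumptions:
the class and method names are hypothetical, key(ch) stands for a
key typed while both modifier keys are held, release() stands for
releasing them, and the digits are assumed to arrive already
mapped to the Latin letters A to F. Python's chr() only covers
values up to 10FFFF, so larger catalog numbers are outside this
sketch.

```python
class BasicMethodEntry:
    """Collects hexadecimal digits typed while Control and Level 2 select are held."""

    HEX_DIGITS = set("0123456789abcdefABCDEF")

    def __init__(self):
        self.pending = ""   # hexadecimal digits typed so far
        self.output = []    # characters generated so far

    def key(self, ch: str) -> None:
        """Handle one key typed while both modifier keys are held down."""
        if ch == " ":
            self._flush()            # the space bar ends the catalog number
        elif ch in self.HEX_DIGITS:
            self.pending += ch       # upper and lower case are equivalent
        # any other key is ignored in this sketch

    def release(self) -> None:
        """Handle release of the modifier keys; no space bar is needed."""
        self._flush()

    def _flush(self) -> None:
        if self.pending:
            self.output.append(chr(int(self.pending, 16)))
            self.pending = ""


entry = BasicMethodEntry()
for ch in "00c0 E9":      # two catalog numbers separated by the space bar
    entry.key(ch)
entry.release()           # the second number is ended by releasing the keys
print("".join(entry.output))
```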
For example, if the sequence 00C0 is typed while Level 2 select
and Control were previously depressed and held depressed
during the typing, character "À" will be generated. In an
ISO/IEC 8859-1 environment, the code value generated by such
an operation would be hexadecimal C0; in an IBM 850
environment, this would generate the code value equal to
hexadecimal B7 (or decimal 183); in an IBM 863 environment,
the code value generated would be hexadecimal 8E (decimal
142). In an IBM 437 environment (the original PC code page)
or in any coding environment where the target character does
not exist, it is recommended that the system at a minimum
give a warning that this character is not available and generate,
for the example presented here, the unaccented base letter "A".
See clause 5.7 for the normative presentation fallback.
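The code values quoted in this example can be checked with
Python's built-in codecs. The sketch below is illustrative only:
the function name is hypothetical, the codec names "latin-1",
"cp850", "cp863" and "cp437" are Python's names for the code
tables mentioned above, and the fallback shown (keeping the base
letter of the canonical decomposition) is one possible choice,
not a rule taken from this standard.

```python
import unicodedata

def generate_code_value(ucs_value: int, code_page: str):
    """Map a UCS catalog number to bytes in code_page.

    Returns (code_bytes, warning). warning is True when the character is
    missing from the code page and a base letter was substituted.
    """
    char = chr(ucs_value)
    try:
        return char.encode(code_page), False
    except UnicodeEncodeError:
        # Illustrative fallback only: keep the first character of the
        # canonical decomposition (for U00C0 this is the letter "A").
        base = unicodedata.normalize("NFD", char)[0]
        return base.encode(code_page, errors="replace"), True


for page in ("latin-1", "cp850", "cp863", "cp437"):
    code, warning = generate_code_value(0x00C0, page)
    print(page, code.hex().upper(), "warning" if warning else "ok")
```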
5.2 Meta-entry method
A function key of the keyboard to which a symbol in ISO/IEC
9995-7 corresponds, and for whose symbol an appropriate
character exists in the UCS, can be used as follows to select
that character (say, for searching for that character in
online documentation):
After the Control key (as described in ISO/IEC 9995-7) and the
Level 2 select key (as described in the same standard) have been
simultaneously depressed and released without any other
simultaneous key depression [preferably by activating the Level
2 select key first, then the Control key and then releasing both],
depressing a function key whose function has been assigned an
ISO/IEC 9995-7 symbol the shape of which corresponds to a
UCS character, will select that character.
If a function is normally activated with the help of a select key
(level 2 or level 3), as is, for example, the Tabulation Left
function, then the character corresponding to the symbol of that
function can be selected in the same way as the function is
activated. The normative presentation fallback
presented in clause 5.7 applies when this clause is a requirement
as per conformance level 2.
To illustrate this method: if one depresses the Control key in
conjunction with the Level 2 select key, releases both keys, and
then depresses the Tabulation Right key, the character generated
should be the UCS character represented by the canonical value
U000021E5 (the name of this character is <RIGHTWARDS ARROW TO
BAR>, the shape of which corresponds to the ISO/IEC 9995-7
symbol for Tabulation Right).
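A minimal sketch of this prefix behaviour follows. The event
representation, the key names and the table are assumptions made
for illustration; only the Tabulation Right entry comes from the
text above. How an interrupted prefix should be handled is not
stated in this clause, so the sketch simply cancels it.

```python
# Function key name -> UCS character matching its ISO/IEC 9995-7 symbol.
# Only the Tabulation Right entry comes from the text; key names are hypothetical.
META_ENTRY_TABLE = {
    "TABULATION_RIGHT": 0x21E5,   # <RIGHTWARDS ARROW TO BAR>
}

PREFIX = frozenset({"CONTROL", "LEVEL2"})

def meta_entry(events):
    """Return the characters selected by a sequence of keyboard events.

    An event is ("chord", frozenset_of_keys) for keys depressed and released
    together, or ("key", name) for a single key depression.
    """
    armed = False
    selected = []
    for kind, value in events:
        if kind == "chord" and value == PREFIX:
            armed = True                      # prefix recognised
        elif kind == "key" and armed and value in META_ENTRY_TABLE:
            selected.append(chr(META_ENTRY_TABLE[value]))
            armed = False
        else:
            armed = False                     # assumption: anything else cancels
    return selected
```

Without the prefix, the same function key selects nothing in this
sketch and would presumably perform its usual function.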
5.3 Composition method
Given the availability in a system file of a mapping between
mnemonic sequences of characters available on the user
keyboard and individual characters of the UCS (such as the
sequence <E'> representing the letter "É" or such as the
sequence <9I> representing the character "¶"), then the
selection of any of the target characters of this map is done in
the following way:
While a composition mechanism (Compose character function)
is activated, typing a mnemonic sequence of characters exactly
and deactivating the Compose character function shall select the
appropriate character mapped to this sequence. The normative
presentation fallback presented in clause 5.7 applies when this
clause is a requirement as per conformance level 3.
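The mapping described above can be pictured as a simple lookup
table. The sketch below is illustrative: the table and function
names are hypothetical, the two entries are the examples given in
this clause, and returning None for an unknown sequence is an
assumption, since the behaviour in that case is not specified
here.

```python
# Mnemonic sequence -> target character. The two entries are the examples
# given in the text; a real system file would hold many more.
COMPOSE_MAP = {
    "E'": "\u00C9",   # LATIN CAPITAL LETTER E WITH ACUTE
    "9I": "\u00B6",   # PILCROW SIGN
}

def compose(sequence: str):
    """Return the character mapped to an exactly matching mnemonic sequence.

    Returns None when no mapping exists (assumption, see the lead-in).
    """
    return COMPOSE_MAP.get(sequence)
```

The lookup is exact, so in this sketch the sequence e' (lower
case) does not select the capital letter.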
5.4 Screen-selection entry method
By this method, the user will select, by using a pointing device,
a character displayed on a screen. The system shall input the
correct coded character that is represented by the shape selected,
even if that shape is the result of the fallback representation
described in clause 5.7. The feedback method described in
clause 5.5 below can optionally be used to indicate to the user
the actual coded character in use. The normative presentation
fallback presented in clause 5.7 applies when this clause is a
requirement as per conformance level 4.
5.5 Feedback method for identifying displayed characters for later
input
When this method is selected, in addition to or in conjunction with
the screen-selection entry method, selecting a character on a
screen with the help of a pointing device shall display to the user
the exact identification of the character represented by the shape
pointed at, even if that shape is the result of the fallback
representation described in clause 5.7. So that this information
can be used for further input using as a minimum the basic
method described in clause 5.1, this information shall be
presented giving the canonical form of the UCS character under
the form Uxxxxxxxx (according to the convention described in
clause 4). It is recommended that if another code table is used
internally, the bit combination actually used in that system be
also displayed as complementary information.
As an illustration of the preceding, if shape "A" is displayed on
a screen, the underlying coded character can correspond to one
of the three following possibilities: capital Latin letter A, capital
Greek letter Alpha or capital Cyrillic letter A. If, for reference,
one wants to later retrieve the same occurrence in a file, one
needs to recall the right character. The feedback method will
positively identify the right character and easily make possible
later input of the latter.
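The three look-alike characters of this illustration can be told
apart as sketched below. The function name is hypothetical, and
the character names come from Python's built-in Unicode database,
which is assumed here to match the names in ISO/IEC 10646.

```python
import unicodedata

def identify(char: str) -> str:
    """Return the canonical form Uxxxxxxxx followed by the character name."""
    name = unicodedata.name(char, "unnamed character")
    return f"U{ord(char):08X} <{name}>"

# Latin A, Greek Alpha and Cyrillic A share one shape but not one identity.
for shape in ("A", "\u0391", "\u0410"):
    print(identify(shape))
```

This prints U00000041, U00000391 and U00000410 respectively, each
followed by its name, which is enough to re-enter the character
with the basic method of clause 5.1.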
5.6 Group identification method of national keyboard layouts
Whenever more than two groups are in use, to be conformant to
this standard at level 6, the group select mechanism shall be able
to directly select a group corresponding to a country code
specified in ISO 3166 and, optionally, or if more than one
language is in use in the country, a language code specified in
ISO 639 to identify standard national keyboard layouts. When
more than one keyboard layout (typically de facto keyboard
layouts, as opposed to de jure national standards) is in use for
a given country and language, the system shall provide a default to
the user. It shall be possible for the user to change that default
at system level for a given country and language.
As an example, when the Canadian keyboard standard
suitable for both French and English (standard CAN/CSA
Z243.200-1992) is selected, the group will be identified by CA (ISO 3166
code for Canada). In this country, in addition to the Canadian
standard, former Canadian French keyboard variants (vendor-
dependent) can be invoked by using the supplementary code
"fr[a]" (ISO 639 code for the French language). In the same
manner, American keyboard variants (also vendor-dependent)
can be invoked by using the supplementary code "en[g]" (ISO
639 code for the English language). In this example, without
indication of a language, the default would implicitly be the
Canadian keyboard standard for French and English, as there is
only one logical national standard in this country, suitable for
the two official languages at once. If a specific language is
identified, a default keyboard layout would then be determined
by system parameters (either the standard keyboard or de facto
language-specific variants).
Note: At the time of editing this standard, ISO 639
documents languages with 2-lowercase-letter codes (which allows
the theoretical representation of a maximum of 676 languages).
This standard is under revision and is expected to be able to
represent more languages with a 3-letter code. Therefore, in
implementing this method, no assumption on the number of
letters to represent a language code should be made. Flexibility
of implementation regarding this issue is recommended.
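The Canadian example can be sketched as a lookup keyed by country
and optional language. Everything named here is hypothetical: the
registry contents, the layout descriptions and the function name
are illustrations, and the defaults shown for "fr" and "en" stand
in for what system parameters would decide. In line with the note
above, the language code is accepted at any length.

```python
import re

# Hypothetical layout registry keyed by (country, language or None).
LAYOUTS = {
    ("CA", None): "CAN/CSA Z243.200-1992",
    ("CA", "fr"): "vendor-dependent Canadian French variant",
    ("CA", "en"): "vendor-dependent American variant",
}

def select_group(country: str, language=None) -> str:
    """Resolve an ISO 3166 country code and optional ISO 639 language code."""
    if not re.fullmatch(r"[A-Z]{2}", country):
        raise ValueError(f"not a two-letter country code: {country!r}")
    # No fixed length is assumed for the language code (2 letters today).
    if language is not None and not re.fullmatch(r"[a-z]+", language):
        raise ValueError(f"not a language code: {language!r}")
    try:
        return LAYOUTS[(country, language)]
    except KeyError:
        raise LookupError(f"no layout registered for {country}/{language}") from None
```

Calling select_group("CA") without a language returns the national
standard, which mirrors the implicit default described in the
example.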
5.7 Character integrity and presentation fallback method
In all cases when a character that is part of the repertoire of
ISO/IEC 10646 is entered, it shall be internally mapped to the
same character if a target code table other than the UCS is
used. If this character is not available in the target code, the
system shall take whatever measure is appropriate so that character
integrity is respected in all cases (such as using the UCS
Transformation Format 8 [UTF-8] prescribed by ISO/IEC 10646,
which allows a UCS character to be represented in an 8-bit
environment).
If the character entered is undefined in ISO/IEC 10646, the
integrity of the UCS coding space shall, where applicable, still
be respected (i.e. the integrity of the character shall be
respected as if the character were defined, to allow for addenda
to that international standard issued before the next release
of the actual interface in use). It is also recommended that a special
warning be issued informing the user of the potential problem
and giving him/her the opportunity to undo this particular input.
If the character entered cannot be displayed because the
character is not available on an output device, as an acceptable
optional presentation fallback measure, it is recommended that,
following the selection of the unavailable character, the display
driver generate, whenever possible, a character with a close
similarity (a character with a related shape or a character with
a generally recognized common interpretation), while giving a
warning (by an audible signal or by any means judged
appropriate). It is to be noted, however, that for most UCS
characters a fallback will be very difficult with 7/8-bit character
sets, and a very unlikely character (such as, for example, in an
MS-DOS environment, PC code 127 for EMPTY HOUSE) is
then suggested to be generated instead of the selected character,
as a further error indication.
When programming complexity is not a problem, a string of
characters can be displayed instead of only one fallback
character, to represent more accurately the character unavailable
for normal display. If this option is chosen, however, the system
shall keep a reversible one-to-one mapping between the
boundaries of the whole string displayed and the intended original
coded character, so that the feedback method specified in clause
5.5 remains usable when pointing at any element of that string.
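One way to keep such a reversible mapping is to record, for each
original character, the span it occupies in the displayed string.
The sketch below is an assumption-laden illustration: the function
names, the set of available characters and the bracketed fallback
string are all hypothetical.

```python
def render_with_fallback(text: str, available, fallback):
    """Render text, replacing unavailable characters by fallback strings.

    Returns (display, spans). spans holds (start, end, original_char) for
    every original character, with end exclusive, in display positions.
    """
    display = []
    spans = []
    position = 0
    for char in text:
        shown = char if char in available else fallback(char)
        spans.append((position, position + len(shown), char))
        display.append(shown)
        position += len(shown)
    return "".join(display), spans

def char_at(spans, index: int) -> str:
    """Return the original coded character behind display position index."""
    for start, end, char in spans:
        if start <= index < end:
            return char
    raise IndexError(index)


available = set("ab")                                  # hypothetical device repertoire
as_catalog_number = lambda c: f"[U{ord(c):04X}]"       # hypothetical fallback string
display, spans = render_with_fallback("a\u00C0b", available, as_catalog_number)
print(display)   # a[U00C0]b
```

Pointing anywhere inside the bracketed string then resolves to the
single original character, which is what the feedback method needs.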
6 Conformance
Any claim of conformance of a system to this standard shall be
accompanied by a list of conformance levels selected from the
following list:
level 1 shall meet the requirements of clauses 5.1 and 5.7;
level 2 shall meet the requirements of clauses 5.2 and 5.7;
level 3 shall meet the requirements of clauses 5.3 and 5.7;
level 4 shall meet the requirements of clauses 5.4 and 5.7;
level 5 shall meet the requirements of clauses 5.5 and 5.7;
level 6 shall meet the requirements of clauses 5.6 and 5.7;
level 7 shall meet the requirements of all subclauses of clause 5
(5.1 through 5.7).
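The list above can be expressed as data, which makes a conformance
claim easy to check mechanically. This is only a sketch; the table
and function names are hypothetical.

```python
# Clauses required by each conformance level, as listed in clause 6.
LEVEL_CLAUSES = {
    1: {"5.1", "5.7"},
    2: {"5.2", "5.7"},
    3: {"5.3", "5.7"},
    4: {"5.4", "5.7"},
    5: {"5.5", "5.7"},
    6: {"5.6", "5.7"},
    7: {"5.1", "5.2", "5.3", "5.4", "5.5", "5.6", "5.7"},
}

def levels_met(implemented_clauses):
    """Return the conformance levels whose required clauses are all implemented."""
    implemented = set(implemented_clauses)
    return sorted(level for level, required in LEVEL_CLAUSES.items()
                  if required <= implemented)
```

For instance, a system implementing clauses 5.1, 5.5 and 5.7 could
claim levels 1 and 5 under this reading.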
Regards.
Alain LaBonté
Québec
Project Editor