The Linguistic Implications of Standardization of Information Technologies

Stéphane Chaudiron
Ministère de l'Éducation nationale, de l'Enseignement supérieur et de la Recherche, Paris
(telephone: +33 1 46 34 30 32; e-mail: chaudiron@distb.mesr.fr)

Marcel Cloutier
Ministère des Relations internationales, Quebec City
(telephone: +1 418 649-2323; e-mail: mcloutie@riq.qc.ca)

The extremely rapid growth of information highways, especially the Internet, which is the main player in the sector, should not obscure the fact that numerous and diverse obstacles hinder potential users' access to the services available. As a result, not everyone has equal access to information services. Parts of continents are excluded from this new Eldorado, as is a considerable portion of the population of industrialized nations, because of economic factors, inadequate telecommunications infrastructure, a lack of familiarity with computers and telematics, or simply the failure of existing services to adapt to users' genuine needs.

It has been noted that the development of this new worldwide information infrastructure, which now spans over 80 countries, is based on a network, the Internet, whose technical characteristics limit the use that can be made of it, especially when information must be processed in a multilingual environment. At present, and despite a growing number of non-English-language services on the Web, the Internet is still strongly marked by the American origins of its technical choices, which allow only English to be fully represented in all circumstances. In this respect, the Internet pays too little attention to the needs of multilingual communication and does not allow linguistic diversity to be expressed.

Bearing in mind the likely linguistic and cultural consequences should this situation persist, a task force was set up in January 1995 as part of Franco-Quebec cooperation in the realm of information technologies. The group, dubbed NoTIAL (Normalisation des Technologies de l'Information dans leurs Aspects Linguistiques), was asked to put forward a plan of action to influence the standardization strategies now being devised for information technologies, so that those strategies take into account the specific traits of the various national languages, especially French. The NoTIAL group is focusing primarily on normalization and standardization measures that bear on respect for and promotion of linguistic diversity in information technologies and on information highways.

In the report that will be submitted shortly to the authorities concerned, the experts in the group (industrialists, users, specialists from standardization organizations, and institutional representatives) analyze the situation of the French language and propose several recommendations on strategies for the making of de jure and de facto standards and on initiatives aimed at promotion, awareness, and monitoring.

Linguistic implications of normalization and standardization

Two complementary factors come into play with respect to the French language and standardization. First, standards must be developed that make it possible to transmit and process French efficiently in all information technologies, especially on information highways. This objective, which is crucial to the future of the French language, can only be achieved if French-speaking experts are sufficiently present in standardization bodies and if a sustained awareness campaign is conducted among users and key influencers, especially industrial and institutional ones.

Second, there is every reason to ensure that the status of French, which is one of the three official languages of the ISO, is bolstered as a common language in standardization agencies, as the language in which standards are written, and as a working language. It should be noted in this respect that its status as an official language is very precarious and that English tends to impose itself in broad sectors of standardization as the only working language. It is important to ensure that translated versions of standards are available and that the drafting of standards in French is encouraged. Such a move would significantly enrich normative terminology and encourage the broader dissemination of standards in the international French-speaking community.

The NoTIAL group has adopted five priority themes:

1. Information coding, markup, and tagging

Information coding is the very foundation of computer communication involving language. Virtually all of the problems encountered in processing French and other languages besides English in computing and telematics arise because many software implementations and transmission tools use seven-bit character sets, or because they strip one bit out of eight from the character sets that represent languages other than English.
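The effect of seven-bit handling can be illustrated with a minimal sketch. It assumes that the text is stored in an eight-bit character set (ISO/IEC 8859-1, discussed below) and that a hypothetical transmission tool clears the high-order bit of every byte. The function name and the sample sentence are illustrative assumptions, not part of any standard.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Simulate a seven-bit channel by clearing the eighth bit of each byte."""
    return bytes(byte & 0x7F for byte in data)


text = "Été à Québec"

# Eight-bit encoding: every accented letter occupies one byte above 0x7F.
eight_bit = text.encode("iso-8859-1")

# After the eighth bit is dropped, the accented letters turn into
# unrelated seven-bit characters.
damaged = strip_high_bit(eight_bit).decode("ascii")

print(damaged)  # Iti ` Quibec
```

Unaccented letters survive unchanged, which is why English text passes through such a channel intact while French text does not.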

Since 1987, several eight-bit character-set standards have been available, including ISO/IEC Standard 8859-1, which makes it possible to represent 14 European languages that use Latin characters and are spoken in at least 44 industrialized nations. However, this character set does not solve all problems, as it does not make it possible to represent simultaneously the Greek and Cyrillic alphabets. It is, nonetheless, a sound base that remains to be promoted and used for the purpose of elaborating other standards.
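A short sketch can make this limitation concrete. It relies on Python's built-in codecs for the ISO/IEC 8859 family; the helper name `can_encode` and the sample words are assumptions chosen for illustration.

```python
def can_encode(text: str, charset: str = "iso-8859-1") -> bool:
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


samples = {
    "French": "garçon, élève, Noël",
    "Greek": "Ελληνικά",
    "Russian": "Русский",
}

for language, sample in samples.items():
    print(language, can_encode(sample))

# Other parts of the 8859 family cover Greek (8859-7) and Cyrillic (8859-5),
# but no single eight-bit set holds French and Greek text together.
print(can_encode(samples["Greek"], "iso-8859-7"))
print(can_encode("garçon " + samples["Greek"], "iso-8859-7"))
```

The last call illustrates the "simultaneously" in the paragraph above: each eight-bit set serves its own group of languages, so mixed-language text needs a larger coded character set.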

More recently, ISO published standard ISO/IEC 10646-1, the Universal Coded Character Set, which can solve all of the coding and character representation problems in Europe and elsewhere. The character set is still under development, but its current implementation, which reflects the Unicode standard, must be encouraged.
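As a hedged illustration of what a universal coded character set offers, the sketch below uses Python strings, which are sequences of Unicode code points. The choice of UTF-8 as the byte encoding is an assumption made for the example; the text above does not prescribe a particular encoding form.

```python
# One string can hold Latin, Greek, and Cyrillic text at the same time.
mixed = "Québec Αθήνα Москва"


def describe(text: str) -> list:
    """Return (character, code point) pairs in the usual U+XXXX notation."""
    return [(ch, f"U+{ord(ch):04X}") for ch in text]


# Assumption: UTF-8 is used to serialize the code points as bytes.
encoded = mixed.encode("utf-8")
round_trip = encoded.decode("utf-8")

# The first 256 code points coincide with ISO/IEC 8859-1, so existing
# Latin-1 data maps directly onto the universal set.
latin1_byte = "é".encode("iso-8859-1")[0]

print(describe("é"))
print(round_trip == mixed)
print(latin1_byte == ord("é"))
```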

A key factor regarding the promotion of this standard in the French-speaking community is the availability of a French version. Until very recently, ISO sought to impose the English version as the only one having normative validity. A campaign is under way to reestablish the status of French in this respect.

As for information markup and tagging, it is important to monitor the development of the Standard Generalized Markup Language (SGML) standard and the Multipurpose Internet Mail Extension (MIME) standard used on the Internet.

In any document type definition (DTD), SGML defines by default a character set that corresponds to the set described in ISO/IEC Standard 646, i.e., ASCII code. While this default makes it possible to read an SGML source file in English properly, it makes a source file in French virtually illegible. At the very least, the use of ISO/IEC Standard 8859-1, which SGML allows, should be required, or ISO/IEC Standard 10646-1 should be promoted for the aforementioned reasons.

In the realm of the Internet, Hypertext Markup Language (HTML) used on the World Wide Web is a limited implementation of SGML. The HTML specification makes possible a ubiquitous use of ISO/IEC Standard 8859-1, in part because it was devised by the French-speaking community.

MIME is a tagging system that makes it possible to identify objects disseminated over the Internet. However, like any Internet specification, it is not a standard and has a number of weaknesses, notably a seven-bit mail format on which systems fall back in case of doubt. To facilitate processing in certain applications, it is important to ensure that the language used is tagged in MIME. In principle, ISO Standard 639 codifies all the languages of the world. Unfortunately, not all languages are found there. A more complete American coding system, ETHNOLOGUE, exists, but its descriptions are in English only and it is not standardized.
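The following sketch shows one way a message might declare both its character set and its language. It uses Python's standard `email` package; the use of the `Content-Language` header with the two-letter code `fr` is an assumption about how the language tag would be carried, and the function name and sample text are illustrative.

```python
from email.message import EmailMessage


def build_tagged_message(text: str, language: str = "fr",
                         charset: str = "iso-8859-1") -> EmailMessage:
    """Build a message whose body declares its charset and its language."""
    msg = EmailMessage()
    msg["Subject"] = "Example"
    # Quoted-printable keeps the transmitted form within seven bits while
    # still recording which eight-bit character set the text uses.
    msg.set_content(text, charset=charset, cte="quoted-printable")
    # Assumption: the language is tagged with a Content-Language header.
    msg["Content-Language"] = language
    return msg


message = build_tagged_message("Été à Québec")
wire_form = message.as_string()
print(wire_form)
```

The accented letters travel as sequences such as `=C9` and `=E9`, and a receiving application can use the declared charset and language to restore and process the text.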

2. User-machine access interfaces

No character can be used without an appropriate data input mechanism. Several national standards, although poorly implemented on both sides of the Atlantic, are available for the complete input of French. There is every reason to promote these standards, especially among institutional key influencers when specifications are drawn up for calls for tender in government procurement.

No national or international standard now exists for the input of the Universal Coded Character Set. ISO is preparing a minimum standard in this respect, which is being written by Quebec experts. There are many shortcomings with regard to the complete input of French on keyboards, and the input methods of the future, especially voice input, pose new problems from the standpoint of linguistic implications. A draft technical report on the matter is in preparation, the writing of which has been assigned to French experts.

3. Ordering of information

Sorting and collation procedures are used in 70 percent of computer applications, although no international standard yet exists in this respect. Several national standards exist that are more or less adapted to data processing. A Canadian standard (which originates from Quebec) covers both dictionary ordering and the rigor that data processing requires. This standard already harmonizes the ordering of six European languages: French, English, German, Portuguese, Dutch, and Italian.

A proposed international standard, based initially on the Canadian standard, which fully takes French into account, is in preparation. The standard in question must be implicit, that is, applicable by default, and leave room for adaptation procedures suited to languages not covered. The objective is to harmonize as many languages as possible in the implicit standard. The Universal Coded Character Set is fully integrated into this international ordering standard.
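To show why ordering by raw character codes is not enough, here is a deliberately simplified multi-level sort key. It is only a sketch and does not implement the Canadian standard or the proposed international one. It assumes the French dictionary convention in which base letters are compared first and accents break ties, with accents examined from the end of the word; the function name is hypothetical.

```python
import unicodedata


def collation_key(word: str):
    """Simplified multi-level key: base letters, then accents, then the word."""
    base_letters = []
    accents = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and accents:
            accents[-1] += ch          # attach the accent to its base letter
        else:
            base_letters.append(ch.lower())
            accents.append("")
    # Level 1: unaccented letters. Level 2: accents read from the end of the
    # word (an assumed convention). Level 3: the word itself as a tie-breaker.
    return ("".join(base_letters), tuple(reversed(accents)), word)


words = [unicodedata.normalize("NFC", w)
         for w in ["côté", "cote", "coté", "côte"]]

by_code = sorted(words)                          # raw character-code order
by_dictionary = sorted(words, key=collation_key)

print(by_code)        # ['cote', 'coté', 'côte', 'côté']
print(by_dictionary)  # ['cote', 'côte', 'coté', 'côté']
```

Separating the comparison into levels is what allows one default ordering to serve several languages, with adaptation reserved for the languages that need different rules.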

4. Internationalization of software

Internationalization goes hand-in-hand with the localization of software. The challenge is to elaborate standardized methods that make it possible to develop culturally and linguistically neutral generic software whose localization will be facilitated to serve the cultural and linguistic interests of users the world over.

It is equally important that the models now being developed reflect the viewpoints of French speakers, and that the weaknesses of emerging technologies in the realm of internationalization (e.g., the Internet, voice input of text using regional accents) be identified quickly.

5. Coding of linguistic resources

The coding of linguistic resources reflects an extremely important line of thinking to the extent that the choices being made by various authorities will directly affect the place reserved for French and other languages in the electronic information industry.

Choices concern structuring of information that is accessible through electronic means (e.g., terminology databanks, dictionaries, bodies of written and spoken text) and the linguistic data used by electronic information processing tools (e.g., thesauruses, dictionaries of words or sounds).

What implementation strategy should be adopted?

A genuine strategy for the promotion of multilingualism on information highways must take two complementary issues into consideration: the importance of implementing immediate solutions without waiting for them to attain the status of standards, and the overlapping and interdependence of basic norms or standards and their fields of application.

On one hand, de jure standards are, by virtue of their voluntary and consensual nature, of common interest and require a long, complex process of adoption. On the other hand, de facto standards are based on special products or interests that certain industrial or regional groups want to protect. Consequently, they engender a genuine desire for promotion on the part of the players concerned and lie in the realm of the fait accompli or the monopoly. Moreover, because of the very high cost of elaborating them, de jure standards are, more often than not, less accessible to the public than de facto standards.

This explains, by and large, the success of many de facto standards implemented in the construction of the Internet (although the Internet also uses de jure standards). Indeed, while de jure standards are more readily able to satisfy common needs in a broad, long-term manner, de facto standards are better suited to solving immediate problems and satisfying urgent needs. They are in the hands of the most powerful, if not the most decisive, players and contribute to further exacerbating inequalities and dominant positions, specifically that of English in the case under review.

This observation clearly indicates the need to emphasize pragmatic solutions. First, it is important to promote existing de jure and de facto standards when they reflect technical choices that are compatible with multilingualism. When standards do not exist but solutions appear to be emerging, industrialists and other players must be encouraged to implement them before they are officially adopted as de jure or de facto standards.

At a time when information technologies are becoming more widespread and exchanges are undergoing a process of internationalization, standardization is crucial to the development of applications. The overlapping and interdependence of different levels of norms and standards are now apparent. Certain basic de jure standards, such as those governing the coding of character sets, are used in numerous applications and tools that, in turn, require the elaboration of other de jure standards pertaining specifically to the fields in question.

Under the circumstances, it is not sufficient to deal with a single de jure standard. Action must be taken at all levels, in a concerted manner. E-mail client software provides a typical example. The adoption of the MIME standard as a stopgap solution in the face of the US-ASCII standard could almost solve the problem of transferring special characters in the headers and body of messages, thereby making possible the implementation of ISO/IEC Standard 8859-1. Even so, the complete use of French is still not possible because of the standards governing domain names (for the mail address), the management of the networks (for messages generated by the server and the operating system), the file management system (for the names of message folders), and so on.
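The e-mail example can be sketched as follows, using Python's standard library. The header encoding relies on the MIME encoded-word mechanism as implemented by `email.header.Header`. The host name check assumes the conventional rule that labels contain only unaccented letters, digits, and hyphens; the function names and the `.example` domains are hypothetical.

```python
import re
from email.header import Header


def encode_header(text: str, charset: str = "iso-8859-1") -> str:
    """Encode a header value containing accented letters as a MIME encoded-word."""
    return Header(text, charset).encode()


# Assumed rule: host name labels are limited to letters, digits, and hyphens.
HOSTNAME = re.compile(r"[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*")


def is_valid_hostname(name: str) -> bool:
    """Return True if the name satisfies the assumed ASCII-only host name rule."""
    return HOSTNAME.fullmatch(name) is not None


# The subject line can carry French once MIME is applied...
print(encode_header("Réunion"))

# ...but the address still cannot, because the domain name layer is governed
# by a different standard.
print(is_valid_hostname("societe.example"))
print(is_valid_hostname("société.example"))
```

Solving the problem at one layer (message headers) leaves it untouched at another (domain names), which is the interdependence described above.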

It is all the more important to take into account the overlapping and interdependent nature of standards because applications and tools will continue to emerge or converge. Every time a new standard has to be developed, certain established basic standards will run the risk of being abandoned or reexamined.

Acknowledgments

We wish to thank all members of the NoTIAL group, on whose work this paper is largely based.