Business Solutions in Multilingual Databases and Internet Services

Michael G. McKenna <mgm@sybase.com>
Globalization Architect
Sybase, Inc.
1650-65th Street
Emeryville, California 94608, USA
Tel.: +1 510 922 3579
Fax: +1 510 922 4228

Abstract

With the blinding growth of the Internet and the subsequent adoption of its paradigms on the intranet for business communications and information management, the business community has inherited both the benefits and the inherent problems of today's global cyberspace.

Specifically, for global organizations, the problem of multilingual access and multinational management of data has surfaced. How can a Chinese business track its shipments to other Chinese locations on an Internet tool designed for an American-based overnight shipping company? How should a marketing, insurance, or online shopping company manage its mailing lists and customer information in a truly World Wide Web? With the Internet, traditionally local companies become global entities overnight. Global organizations, when implementing an intranet solution, must then deal with cross-cultural business rules, heterogeneous data encodings, and multiple scripts for the representation and manipulation of data.

When databases are connected to the Web they may use different character set encodings, and different database vendors use different proprietary protocols to ascertain or communicate the encoding being used.

From a systems analyst's point of view, this can quickly become a nightmare of conflicting standards, cultures, scripts, and tongues. Should the developer, when confronted with the prospect of creating a globally accessible world-class system, be reduced to a quivering mound of jelly or a dictatorial proponent of English-only formats? No! By systematically breaking the various components down to their atomic elements, one can find commonalities, as well as areas for normalization of data and protocols. All implementations use at least one data encoding. All implementations use at least one (human) language.

By keeping track of encodings and languages, a developer can normalize to a universal character set (Unicode) for guaranteed data integrity in data transmissions. Unicode can now be used in operating systems from Microsoft, IBM, DEC, and Sun; is available for use in database systems from Sybase, Adabas, Taligent, and Oracle; and is the internal process code for Java. A tagging and encoding scheme for Unicode has been agreed upon by the Internet Engineering Task Force, which enables its use for MIME-encoded e-mail and HTML.
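As a rough illustration of the normalization idea described above, the following Python sketch tags each piece of text with its source encoding and language, decodes it to Unicode, and re-encodes it once for transmission. The names TaggedText, to_unicode, normalize, and to_wire are hypothetical, chosen only for this example, and the choice of UTF-8 as the transmission form is an assumption; none of this represents a Sybase API or any vendor's protocol.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class TaggedText:
    """Raw bytes plus the metadata needed to interpret them (hypothetical type)."""
    raw: bytes
    encoding: str   # source character set, e.g. "big5", "shift_jis"
    language: str   # language tag, e.g. "zh", "ja", "de"


def to_unicode(item: TaggedText) -> str:
    """Decode using the recorded encoding.

    Strict decoding raises UnicodeDecodeError on bytes that are invalid
    for the stated encoding, rather than silently corrupting data.
    """
    return item.raw.decode(item.encoding)


def normalize(items: Iterable[TaggedText]) -> List[Tuple[str, str]]:
    """Convert every item to Unicode text, keeping its language tag."""
    return [(to_unicode(item), item.language) for item in items]


def to_wire(text: str) -> bytes:
    """Serialize Unicode text for transmission (UTF-8 is assumed here)."""
    return text.encode("utf-8")


if __name__ == "__main__":
    records = [
        TaggedText("東京".encode("shift_jis"), "shift_jis", "ja"),
        TaggedText("台北".encode("big5"), "big5", "zh"),
        TaggedText("Zürich".encode("iso-8859-1"), "iso-8859-1", "de"),
    ]
    for text, language in normalize(records):
        print(language, text, to_wire(text))
```

Because each record carries its own encoding tag, data from differently encoded sources can share a single pipeline, and the language tag survives normalization for later use by language-sensitive operations.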

Sybase is actively pursuing the Internet marketplace by providing database, connectivity, applet, multimedia, and developer tools. After analyzing the problems presented above, Sybase came to realize that powerful codeset conversion tools are an integral part of the global solution. Sybase is also working as coordinator of the MAITS Consortium to suggest solutions and standards for seamless multilingual global interconnectivity for Internet- and Web-based applications.

This paper will present these problems, with an analysis and a look at solutions being implemented by Sybase and other database and systems providers.