INET Conferences

Using the Internet in Arabic: Problems and Solutions

Badr H. AL-BADR <badr@kacst.edu.sa>
King Abdulaziz City for Science and Technology
Saudi Arabia

Abstract

This paper addresses the support required for Arabic on the Internet in the fields of content, transport, client processing, and server processing. The problems in each category are discussed and the solutions are surveyed, with the new Internet protocols that facilitate using Arabic on the Internet being taken into consideration.

One of the major problems that faces the use of Arabic is the plurality of character sets. Transporting Arabic text over the Internet is problematic because of its non-ASCII character sets. Major among the client-processing issues is display of Arabic text.

The display features of Arabic text set it apart from other languages in several ways. These features necessitate specialized text-display algorithms. One of the most important server-processing issues for Arabic text is the problem of search and indexing. These operations are more involved in Arabic than in many other languages.

Solutions have started to emerge with browsers and mail programs building on new Internet standards such as Multipurpose Internet Mail Extensions (MIME) and HTML 4.0. The trend towards Unicode also helps the exchange of Arabic on the Internet. The paper concentrates on the issues of character sets, bidirectional display, Arabic e-mail, Arabic Web browsing, and search and indexing of Arabic text on the Internet.

Introduction
Character sets
- Arabic character sets
- Unicode
Arabic display issues
- Rendering Arabic text
- Displaying bi-directional text
Arabic e-mail
Arabic web browsing
Arabic search and indexing
Summary
References

Introduction

For the Internet to be truly international, it must support the diverse languages of the world. Arabic, a language spoken by millions of people worldwide, is being increasingly used on the Internet, despite the obstacles it faces. A major obstacle facing Arabization of the Internet is the lack of standards, particularly in the field of character sets. The Internet is a heterogeneous environment composed of different configurations of hardware and software (transport and network equipment). Standards are the way to get the different parties on the Internet to agree on how to format and exchange information.

Other obstacles were uncovered by a survey conducted at the beginning of 1997 on the perceived problems facing Arabic-speaking Internet/intranet users [1]. Users ranked the obstacles to greater Internet usage in the Arab world in the following order: weak telecom infrastructure, lack of Arabic content on the Internet, and lack of Arabic Internet access programs for the Web and for e-mail. (Most of the survey respondents were in Saudi Arabia.)

The support required for Arabic on the Internet can be categorized in the fields of content, transport, client processing, and server processing. A certain level of support is required in each category. The support required is not all unique to Arabic. In fact, internationalization (i18n) is an active field of research in Internet technology.

Arabic content (textual content, to be specific) relates to representing the data itself (using character sets) and to formatting it. Formatting is specified by Internet standards such as HTML in the case of the World Wide Web pages and RFC 822 and MIME in the case of e-mail messages. The Transport protocol is HTTP (HyperText Transfer Protocol) for the Web and SMTP (Simple Mail Transfer Protocol) for e-mail. Client processing includes generating, displaying, and interacting with Arabic text, while server processing includes storing, processing, searching, and providing Arabic content. Most of these issues are addressed below.

One of the major problems that faces the use of Arabic is the plurality of character sets. Transporting Arabic text over the Internet is problematic because of its non-ASCII character sets.

Major among the client processing issues is display of Arabic text. The display features of Arabic text set it apart from other languages in several ways: Arabic text is cursive, and the shapes of its characters depend on their position in the word. Most Arabic characters connect to one another when they are written in the same word. The directionality of Arabic text is peculiar: While Arabic text is written right-to-left, Arabic numbers are written left-to-right. This feature, together with the frequent everyday need to combine Arabic and Latin text on the same line, necessitates handling of bi-directional text. These features affect the display of Arabic text in mail programs and Web browsers. One of the most important server processing issues for Arabic text is the problem of search and indexing. These operations are more involved in Arabic than in many other languages.

The representation and transport problems are external to Arabic, meaning that they are not related to the features of Arabic text. Rather, they are byproducts of Internet protocols originating in the Western world, which uses Latin characters. These problems are shared with many other languages of the world. While the display problems relate to the features of Arabic, they do not affect transport of Arabic text.

Solutions have started to emerge with browsers and mail programs building on new Internet standards (such as MIME). The trend toward Unicode also helps the exchange of Arabic on the Internet. Other interim solutions are frequently used, such as encoding text as graphics and relying on ad hoc rules in Web servers to guess the Arabic capabilities of browsers and send information accordingly. The trend set by the Internet standard setters toward internationalization of Internet protocols is also very encouraging (e.g., [2]).

Character sets

The character set serves a major role in any information processing or exchange system. Text, the building block for human-understandable information, is encoded on computers and transmitted through networks in the form of integers, because computers at their lowest levels can deal only with numbers. The character set serves as the table for conversion between the textual and numeric forms.

The term "character set" is often used to mean different things. Definitions of the term and its related terms follow [3]:

A Coded Character Set is a mapping from a set of abstract characters to a set of integers, such as ISO-8859-6.
A Character Encoding Scheme is a mapping from a Coded Character Set (or several coded character sets) to a set of octets, such as UTF-8.
A Transfer Encoding Syntax is a transformation applied to character data to allow it to be transmitted, such as base64 encoding. It is used to transform encoded text into a format that is transmissible by specific protocols, which is frequently necessary.
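The three layers defined above can be illustrated with a short Python sketch. The sample word and variable names are chosen here for illustration only; the codec names are those of the Python standard library.

```python
import base64

text = "عربي"  # four Arabic letters: AIN, REH, BEH, YEH

# 1. Coded character set: each abstract character maps to an integer.
code_points = [ord(ch) for ch in text]        # Unicode (ISO 10646) values
iso_values = list(text.encode("iso-8859-6"))  # ISO-8859-6 values

# 2. Character encoding scheme: the integers are serialized as octets.
utf8_octets = text.encode("utf-8")

# 3. Transfer encoding syntax: the octets are made safe for a 7-bit channel.
wire_form = base64.b64encode(utf8_octets).decode("ascii")

print([hex(cp) for cp in code_points])
print([hex(v) for v in iso_values])
print(utf8_octets.hex(" "))
print(wire_form)
```

The same four characters thus have different numeric values in each coded character set, and the receiver needs all three labels to reverse the process.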

Knowing the name of the character set used is necessary for transporting the text correctly (or encoding it if needed) and for decoding it at the other end. Completely specifying the parameters of a textual transmission requires both (1) a set of labels for specifying the character set, encoding scheme, and transfer syntax used, and (2) a technique for attaching these labels to the data. The labels are typically registered with the Internet Assigned Numbers Authority (IANA). Specifying the character set can be done through MIME headers, which will be shown below.

Arabic character sets

Here we review the history of Arabic character sets. In 1981, CUDAR-U appeared as the first standard Arabic character set (which used 7 bits per character). In 1982, the Arab Standards and Metrology Organization (ASMO) produced its first character set standard, ASMO-449 (7 bits). It became the basis for all subsequent standard sets; in that respect, its role is similar to that of ASCII for Latin characters. In 1986, ASMO standard 708 appeared (8 bits) and became the international standard ISO-8859-6 [4]. Since then, it has gained widespread acceptance and has been used particularly in the Arabized Macintosh system.

In the 1980s, more than 20 Arabic code sets coexisted, most of which were p-code sets. With the spread of personal computers and the MS-Windows operating system in the 1990s, Microsoft's MS-Windows Arabic code page (MSCP-1256) became almost a de facto standard (a situation not unique to Arabic!). Microsoft opted not to use the standard 8-bit character set and developed its own to allow simultaneous use of Arabic and French and use of display control characters.
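The practical effect of this plurality can be seen in a small Python sketch, which assumes the standard library codecs "iso-8859-6" and "cp1256" as stand-ins for the ISO and Microsoft sets. The sample word is arbitrary.

```python
text = "سلام"  # SEEN, LAM, ALEF, MEEM

iso_bytes = text.encode("iso-8859-6")  # ISO-8859-6 (ASMO-708)
win_bytes = text.encode("cp1256")      # Windows Arabic code page

print(iso_bytes.hex(" "))
print(win_bytes.hex(" "))

# Reading ISO-8859-6 octets as if they were Windows-1256 yields other letters.
misread = iso_bytes.decode("cp1256", errors="replace")
print(misread == text)
```

Some letters share the same octet in both sets while others do not, so an unlabeled page may look almost right and still be wrong.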

Unicode

It is believed that the Universal Character Set (UCS, Unicode) has the potential to solve the problem of the plurality of character sets [5].

This position is supported by the following reasons: It is the strategic direction of major software and Internet developers. It is also promoted on the Internet as the character set of choice in new protocols, while older protocols can use its various encodings. Finally, a study on the suitability of the Unicode representation of Arabic found that it is suitable for this task [6].

The Universal Character Set was developed jointly by the International Organization for Standardization (ISO-10646 [7]) and the Unicode Consortium [8]. The most important feature of this set is that it uses 32 bits to code virtually all characters of the world, and it codes over 35,000 characters. The Arabic coding in Unicode coincides with the ASMO-449 code page. UCS has various encodings, such as UCS-4 (32 bits per character), UCS-2 (16 bits), UTF-16 (one or more 16-bit units), UTF-8 (one or more 8-bit units), and UTF-7 (one or more 7-bit units).
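As a rough illustration of how these encodings differ in size, the following Python sketch encodes one Arabic word several ways. Python has no codec named UCS-4 or UCS-2, so "utf-32-be" and "utf-16-be" are used here as stand-ins, which is an assumption that holds for characters such as the Arabic letters that need no surrogate pairs.

```python
text = "عربي"  # four Arabic letters

ENCODINGS = ("utf-32-be", "utf-16-be", "utf-8", "utf-7")

# Octet count of the same four characters under each encoding.
sizes = {name: len(text.encode(name)) for name in ENCODINGS}

for name in ENCODINGS:
    print(f"{name:10} {sizes[name]:2} octets  {text.encode(name)!r}")
```

Only the UTF-7 form consists entirely of 7-bit values, which is why it was suited to older mail transport.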

Unicode has detailed character property tables and algorithms (e.g., bi-directional text display), which are particularly suited for Arabic. Further, it provides characters for text directionality.

Arabic display issues

Rendering Arabic text

When discussing the display of characters, it is important to distinguish between the characters themselves and their visual representation, called glyphs. While a character is a letter, a series of characters are visually represented as a series of glyphs. This is particularly important in Arabic, where shapes of characters depend on the context. To display Arabic text correctly, a context analysis program is needed to select the right shape of a character (glyph) depending on the context. The context is not necessarily the preceding and following characters only. Arabic script is highly decorative, and many ligatures (a glyph for multiple characters) are used, especially in stylized fonts. This implies that Internet client programs that display Arabic (such as Web browsers) must employ contextual analysis or rely on an underlying operating system to do that. Finally, Arabic has a number of diacritic marks that are written above and below the characters to aid in pronunciation. The diacritics must naturally be displayed in their places.
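The character/glyph distinction can be observed in the Unicode character database, which lists positional presentation forms for Arabic letters as compatibility characters. The Python sketch below uses the letter BEH and the LAM-ALEF ligature as examples; it only inspects the database and does not perform contextual analysis itself.

```python
import unicodedata

BEH = "\u0628"  # the abstract character ARABIC LETTER BEH

# Four positional glyph forms of BEH: isolated, final, initial, medial.
beh_forms = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

for glyph in beh_forms:
    base = unicodedata.normalize("NFKC", glyph)
    print(f"U+{ord(glyph):04X} {unicodedata.name(glyph)} -> U+{ord(base):04X}")

# A ligature is one glyph standing for two characters (LAM followed by ALEF).
lam_alef = "\uFEFB"
ligature_parts = unicodedata.normalize("NFKC", lam_alef)
print([unicodedata.name(ch) for ch in ligature_parts])
```

Stored text normally contains only the base characters; choosing among the forms is the job of the rendering software.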

Displaying bi-directional text

Arabic text is stored in logical (reading) order. Before it is displayed, it must be reordered correctly on the screen. This is an important issue because most computer systems are designed to display text left-to-right, and also because bi-directional text must be simultaneously displayed on the same text line (e.g., Arabic words and numbers).

Unicode defines a direction property for each character and provides a text directionality algorithm for the display of bi-directional text [8]. The directional property of Arabic and Hebrew characters is strong right-to-left, while the characters of other languages are strong left-to-right.

The text directionality algorithm uses a set of directional ordering codes to influence the ordering of text. These codes are used for embedding one language into another (e.g., RLE) and for overriding the default direction of text (e.g., RLO). The algorithm is rather involved, so the details are left out.
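Although the algorithm itself is omitted, its input, the directional property of each character, is easy to inspect. The Python sketch below queries the Unicode character database through the standard unicodedata module; the sample characters and labels are chosen here for illustration. Note that the database distinguishes Arabic letters (class "AL") from other right-to-left letters (class "R"); both are strong right-to-left types.

```python
import unicodedata

samples = {
    "Arabic letter BEH": "\u0628",
    "Hebrew letter ALEF": "\u05D0",
    "Latin letter a": "a",
    "European digit 1": "1",
    "Arabic-Indic digit 1": "\u0661",
    "space": " ",
    "quotation mark": '"',
    "RLE (embedding)": "\u202B",
    "RLO (override)": "\u202E",
}

# Map each label to the bidirectional class of its character.
classes = {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}

for label, bidi_class in classes.items():
    print(f"{label:22} {bidi_class}")
```

Digits and neutral characters such as the space and the quotation mark have weak or neutral classes, which is why they take their direction from the surrounding text.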

Arabic e-mail

The specification of electronic mail on the Internet has two major components: mail transport and mail message format. Mail transport over the Internet is governed by the Simple Mail Transfer Protocol (SMTP) [9], which is an application-level protocol that runs over TCP, and by the newer Extended SMTP [10]. The format of Internet e-mail messages is specified in "RFC 822: Standard for ARPA Internet Text Messages" [11] and is updated by the MIME standard [12].

Both the transport and message format standards hinder the exchange of Arabic e-mail messages. The SMTP standard stipulates the transfer of ASCII-text messages and, in fact, older implementations enforce this. This means that non-ASCII (8-bit) characters are not guaranteed to reach their destination intact. The message format standard RFC 822 specifies that a message has two main parts: the header and the body. The body is composed of lines of 7-bit US-ASCII characters, each at most 1,000 characters long. The header is composed of fields, each of which is a line of printable ASCII characters with the general format:

"field-name: field-body <CRLF>."

An example of a header field is:

Date: 13 Feb 88 1429 EDT

So the two problems in exchanging Arabic e-mail are (1) correctly transporting Arabic messages that are encoded using 8 bits, and (2) specifying the language and character set used in a particular message, since transporting e-mail does not involve a prior exchange of information about content (as in HTTP).

These problems are both solved by the MIME standard. MIME allows labeling and structuring message contents using RFC 822 headers, because it introduces a new set of header fields that are added to the message header. By so doing, it allows the sending of binaries and non-ASCII text through e-mail by encoding them in ASCII. Further, it facilitates specifying the character set used in a message.

The MIME facilities important to our discussion are (1) encoding the message in 7 bits to be transported safely, (2) labeling of the character set used in the message body, and (3) labeling of the character set used in the message header. These facilities are discussed next.

Encoding message body

Using MIME, 8-bit content in the message body can be encoded using 7 bits. The transfer encoding syntax is specified in the header field "Content-Transfer-Encoding," which can take on values including:

7bit for contents originally in text lines of 7-bit characters

8bit for contents originally in text lines of 8-bit characters

binary for contents originally in 8 bits (not necessarily lines, as in image data)

quoted-printable for contents encoded using the quoted-printable transfer encoding syntax

base64 for contents encoded using the base64 transfer encoding syntax

Base64 is a transfer encoding syntax that represents groups of 24 input bits as output strings of four encoded characters. The encoded characters are from an alphabet of 64 ASCII characters. This encoding increases text size by about 33%.

The following is an example of the header field of a message whose body is encoded in base64:

Content-transfer-encoding: base64
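The size arithmetic of base64 can be checked with a short Python sketch. The sample sentence is arbitrary, and line breaks, which a real MIME body adds every 76 characters at most, are ignored here.

```python
import base64

body = "اللغة العربية على الإنترنت".encode("iso-8859-6")  # 8-bit octets
encoded = base64.b64encode(body)                           # 7-bit safe

# Every group of 3 octets (24 bits) becomes 4 ASCII characters.
first_group = base64.b64encode(body[:3])

expected_length = 4 * ((len(body) + 2) // 3)  # last group is padded with "="
growth = len(encoded) / len(body)

print(len(body), len(encoded), round(growth, 2))
print(first_group)
```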

Indicating body character set

Using MIME, the content of the message body is labeled using the special header field: "Content-Type," which has, as a parameter, the character set specification field "charset."

The following is an example of the header field of a message whose body uses the ISO-8859-6 character set:

Content-Type: text/plain; charset=ISO-8859-6
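As a sketch of how a mail program might produce such labels, the following Python fragment builds a message with the standard library email package. The sample text is arbitrary, and the choice of base64 as the transfer encoding is made explicitly here for illustration.

```python
from email.message import EmailMessage

text = "مرحبا"  # sample Arabic body text

msg = EmailMessage()
# Encode the body in ISO-8859-6, then wrap it in base64 for 7-bit transport.
msg.set_content(text, charset="iso-8859-6", cte="base64")

print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
print(msg.get_payload())

# A receiving client reverses both steps using the two header labels.
recovered = msg.get_content()
```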

Indicating header character set

Using MIME, the message header can contain non-ASCII text by using inline labeling [13], whose format is:

"=?" charset "?" encoding "?" encoded-text "?="

An example of a non-ASCII header is:

Subject: =?ISO-8859-6?B?SWYgeW91IGNhbiB=?=

where the encoding "B" refers to base64.
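The inline labeling format can be sketched in Python as follows. The helper encode_word is hypothetical and written here for illustration; it builds a single encoded word and ignores the length limit and folding rules of [13]. Decoding uses decode_header from the standard library.

```python
import base64
from email.header import decode_header


def encode_word(text, charset):
    """Build one "B" encoded word: =?charset?B?base64-text?="""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


subject = encode_word("مرحبا", "ISO-8859-6")
print("Subject:", subject)

# The receiver splits the word back into octets and a charset label.
decoded_bytes, charset = decode_header(subject)[0]
decoded_text = decoded_bytes.decode(charset)
print(decoded_text)
```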

The names of character sets that are used in MIME headers must be registered with IANA [15]. The registered character set names for Arabic include: ISO-8859-6 (ASMO-708), ISO-9036 (ASMO-449), Windows-1256, and ISO-10646 (Unicode).

Extended SMTP

Going back to the transport of messages, Extended SMTP (ESMTP) [10] improves on SMTP by allowing the transport of 8-bit text by using the "8BITMIME" extension. However, both sides must use ESMTP and must negotiate first. It also might be necessary to MIME-encode the message first.

Obviously, mail clients need to be MIME-compatible to benefit from the above-mentioned facilities. In fact, MIME is now used in most e-mail clients and Web browsers.

In addition to supporting MIME, Arabized e-mail clients must be able to display Arabic text. Several Arabized e-mail clients are now available including: Sindbad from Sakhr, which is an Arabization layer for Netscape Navigator, Tango from Alis, which supports many languages simultaneously, and Exchange from Microsoft.

Arabic web browsing

The specification of the World Wide Web (WWW) system has two main components: the page transfer protocol (HyperText Transfer Protocol, HTTP) and the page description language (HyperText Markup Language, HTML). HTTP is an 8-bit clean protocol, meaning that it allows the transport of Arabic pages in 8-bit character sets. The major issues surrounding the use of Arabic on the Web are the labeling of the character set used and the marking up of Arabic pages in HTML.

The internationalization of HTML standard [16] introduced many new features that facilitate the use of Arabic on the Web. These features are now incorporated in HTML 4.0 [17], which is based on Unicode and is a W3C recommendation at the time of this writing.

The new internationalization features relevant to Arabic include: (1) indicating the character set, (2) tagging of language, (3) markup of bi-directional text, and (4) controlling cursive joining behavior. These features are discussed next. This section concludes with a discussion of alternate Web Arabization techniques.

Indicating character set

Indicating the character set of a document may now be performed in three ways, described here in increasing order of priority. First, it could be specified in the "charset" attribute of the "A" element, as in the following example:

<A HREF=doc.html CHARSET="ISO-8859-6"> ... </A>

which specifies that the document "doc.html" uses the character set "ISO-8859-6."

The second way is to use the "META" element in the HTML document header with the MIME-like content-type header, as in the following example:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-6">

which specifies that the HTML document uses the "ISO-8859-6" character set.

The third way is to specify the character set in the HTTP header sent ahead of the document [18], using MIME-style tags, as in the following example:

The HTTP request:

GET /Arabic.html HTTP/1.0
Accept-Language: ar, en
Accept-Charset: ISO-8859-6
Accept: */*

The response:

HTTP/1.0 200 OK
Content-Type: text/html; charset=ISO-8859-6
Content-Language: ar
Content-Length: xxx

... data ...
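The priority order among the three methods can be sketched as a small Python function. The names charset_of and pick_charset are hypothetical, and the sketch assumes the caller has already extracted the HTTP Content-Type value, the META content value, and the link's charset attribute. Parsing of the Content-Type value is delegated to the standard library email package.

```python
from email.message import Message


def charset_of(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    holder = Message()
    holder["Content-Type"] = content_type
    return holder.get_content_charset()  # lowercased, None if absent


def pick_charset(http_content_type=None, meta_content=None, link_charset=None):
    """Highest priority first: HTTP header, then META, then link attribute."""
    return (
        charset_of(http_content_type)
        or charset_of(meta_content)
        or (link_charset.lower() if link_charset else None)
    )


print(pick_charset("text/html; charset=ISO-8859-6",
                   "text/html; charset=windows-1256"))
```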

Language tagging

Language tagging is different from character set specification. It helps in performing high-level operations such as searching, sorting, hyphenation, and spell checking.

Specifying the language of a text block is done through the new "lang" attribute, which can be part of most HTML elements, including the new "span" element, as in the following example:

<span lang=ar>... Arabic text ...</span>

The language codes used with this attribute are defined in [19].

Bi-directional text markup

The HTML bi-directional specifications promote the use of the Unicode directional text display facilities. They stipulate that if the Web client (browser) claims to display bi-directional text, then it must use the Unicode algorithm. Text directionality is encoded in the directional property of the characters. Yet, additional directional markup is specifically needed for direction-neutral text and for tables.

HTML offers higher-level markup constructs to control text direction, which have a function identical to Unicode's direction characters.

An example where bi-directional text may require additional markup is that of neutral characters, for instance to determine the position of a double quote when it sits between an Arabic and a Latin letter. For that, two marks are defined: the left-to-right mark "&lrm;" and the right-to-left mark "&rlm;", which are invisible characters with no effect otherwise.
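The two marks correspond to ordinary Unicode characters, as the following Python sketch shows using the standard html and unicodedata modules. It only resolves the entities and reports their properties.

```python
import html
import unicodedata

lrm = html.unescape("&lrm;")  # left-to-right mark
rlm = html.unescape("&rlm;")  # right-to-left mark

for mark in (lrm, rlm):
    print(f"U+{ord(mark):04X}",
          unicodedata.name(mark),
          unicodedata.bidirectional(mark),  # strong direction of the mark
          unicodedata.category(mark))       # "Cf": invisible format character
```

Because each mark has a strong direction but no visible glyph, placing one next to a neutral character such as a quote settles which run the quote belongs to.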

The direction attribute "dir" indicates the base directionality of the text and can take either of the two values "LTR" or "RTL." It is attached to block-type elements such as <HTML>, <P>,<LI>, and <TD>, and sets the default value of the "ALIGN" attribute as well. It also affects the correct placement for bullets and aids in setting up bilingual tables. An example of an Arabic table cell is:

<TD lang="ar" dir="rtl">... Arabic text...</TD>

Cursive joining behavior

To mark up unusual cases for cursive text, HTML offers the zero-width joiner "&zwj;" and the zero-width non-joiner "&zwnj;". Here is the description from the HTML 4.0 [17] specification:

The zwnj entity is used to block joining behavior in contexts where joining will occur but shouldn't. The zwj entity does the opposite; it forces joining when it wouldn't occur but should. For example, the Arabic letter "HEH" is used to abbreviate "Hijri", the name of the Islamic calendar system. Since the isolated form of "HEH" looks like the digit "five" as employed in Arabic script (based on Indic digits), in order to prevent confusing "HEH" as a final digit five in a year, the initial form of "HEH" is used. There is no following context (i.e., a joining letter), however, to which the "HEH" can join. The zwj character provides that context.
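The "HEH" case from the quotation can be written out in Python as a sketch. It shows only the character sequence that results from the markup; whether the initial form is actually drawn depends on the rendering software.

```python
import html
import unicodedata

# HEH followed by a zero-width joiner, as it might appear in HTML source.
marked = html.unescape("&#x0647;&zwj;")
zwnj = html.unescape("&zwnj;")

print([unicodedata.name(ch) for ch in marked])
print(unicodedata.name(zwnj))
```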

Forms

When filling in a form, the Web client needs to know what character set the server will accept. The "ACCEPT-CHARSET" attribute, which attaches to the "FORM" element, specifies the list of character encodings for input data that are accepted by the server processing this form. As an example:

<FORM ACTION=... ACCEPT-CHARSET="ISO-8859-6, UCS-2">

There has yet to be an effective method for specifying the character set in a filled form.

Other Arabization techniques

The confusion in the character set scene and the complexity of displaying Arabic text have restricted the growth of Web use in Arabic. Web publishers have resorted to creative techniques to overcome these problems.

Text as image techniques

The first solution that gained wide use was presenting text as graphics. Pages of text were converted to images, which could be displayed by most Web browsers. Needless to say, this solution suffers from many disadvantages, most notably the huge increase in page size (almost two orders of magnitude), which means slower page download and display. Also, it is not possible to do text operations such as search, selection, copying, or editing on the text images, unless the image undergoes character recognition, which is not necessarily accurate. This is in addition to the burden it places on the publisher of the pages in terms of increased storage space and complex publishing procedures.

A slight improvement using similar technology was to use individual character images and construct words and sentences from the character images instead of making an image for the whole text block [20]. The number of different images that may be used is limited to the number of Arabic character shapes (note that an Arabic character can have multiple shapes). This means that contextual analysis is performed to select the appropriate character images. This method reduces the download time needed, because the character images are stored in the browser and need not be downloaded each time. This method does not address, however, the need to edit the text and perform text operations such as searching and indexing.

Other publishers took another route by providing multiple versions of the same texts, such as providing an image version and texts in different character sets.

Proxy conversion

Still another proposed solution is to have the Web server deduce the characteristics of the client (browser), particularly the supported character set, and then supply a version of the page that is suitable for the client. Such a solution is proposed in [21]. There, a proxy Web server translates the character set of Web pages on the fly. The pages could be stored in Unicode and converted on the fly to the required character set.
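The conversion step of such a proxy might look like the following Python sketch. It is not the implementation described in [21]; the function names are hypothetical, the page is assumed to be stored in UTF-8, and q-values in the Accept-Charset header are ignored.

```python
def parse_accept_charset(header):
    """Return charset names from an Accept-Charset value, in listed order."""
    names = []
    for item in header.split(","):
        name = item.split(";")[0].strip().lower()  # drop any ";q=..." part
        if name:
            names.append(name)
    return names


def transcode_page(stored_bytes, accept_charset, stored_charset="utf-8"):
    """Convert a stored page to the first charset the client accepts.

    Returns (body_bytes, charset_label); falls back to the stored form.
    """
    text = stored_bytes.decode(stored_charset)
    for name in parse_accept_charset(accept_charset):
        try:
            return text.encode(name), name
        except (LookupError, UnicodeEncodeError):
            continue  # unknown charset, or page not representable in it
    return stored_bytes, stored_charset


page = "<p>مرحبا</p>".encode("utf-8")
body, label = transcode_page(page, "ISO-8859-6, windows-1256;q=0.5")
print(label, body)
```

A real proxy would also have to rewrite any META charset declaration inside the page and send the chosen label in the Content-Type response header.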

Java applet

A solution based on Java is to have the intelligence to display Arabic in a Java applet and to let it manage displaying Arabic text in the browser. This means having a Java applet running, adding computation overhead and also limiting text-processing options.

Automatic detection

Another technique that is used particularly in browsers is to add intelligence to infer the character set of a Web page by analyzing its contents, as in the Sindbad browser from Sakhr. The browser then switches to the inferred character set and displays the page for the user. This involves some heuristics but appears to be relatively accurate, as there are only two major Arabic character sets to worry about. A similar technology is used in the Alta Vista search engine [22] to infer the character set. It is currently capable of inferring the language from a set of 25 languages (excluding Arabic).
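One possible heuristic of this kind is sketched below in Python. It is an assumption made here for illustration, not the method used by Sindbad or Alta Vista: the octets are decoded under each candidate character set, and the candidate that yields the most Arabic letters and the fewest undecodable octets wins.

```python
CANDIDATES = ("iso-8859-6", "cp1256")


def arabic_letter_score(data, charset):
    """Count decoded code points in U+0621..U+064A; penalize bad octets."""
    text = data.decode(charset, errors="replace")
    letters = sum(1 for ch in text if "\u0621" <= ch <= "\u064a")
    broken = text.count("\ufffd")  # octets undefined in this charset
    return letters - broken


def guess_arabic_charset(data):
    """Return the candidate under which the octets look most like Arabic."""
    return max(CANDIDATES, key=lambda name: arabic_letter_score(data, name))


sample = "اللغة العربية في الإنترنت"
print(guess_arabic_charset(sample.encode("iso-8859-6")))
print(guess_arabic_charset(sample.encode("cp1256")))
```

Short texts that use only letters sharing the same octet in both sets cannot be distinguished this way, so a practical detector would need longer samples or additional evidence.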

In summary, to be Arabized the browser must have the ability to select a language, select a character set, load a font, accept Arabic input, and display bi-directional text. Browsers that provide these capabilities to different degrees are now available, including: Sindbad from Sakhr, which is an Arabization layer for Netscape Navigator; Tango from Alis, which supports many languages simultaneously; and Explorer from Microsoft.

Arabic search and indexing

With the huge amount of information on the Internet, search and indexing tools are crucial for locating specific resources and organizing information. The search and indexing of Arabic text is more involved than that of other languages such as English. The paper by Al-Kharashi [23] provides extensive coverage of this topic.

Arabic is a derivational language where words are derived from a root. Searching and indexing Arabic text (e.g., for searching the Web) must rely on the root of a word and not merely on the final form. Further, the same word can have more than 100 combinations of prefixes and suffixes, which in English would be preceding stop words such as "with" and "for." These numerous derived words, although related in meaning, are not necessarily consecutive in a word index, implying that a word and its derivations could have a lot of entries spread out in an index. Search systems for Arabic need to employ morphological analysis, which is an involved process and has its limitations. The large number of synonyms of Arabic words intensifies this problem greatly.
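The prefix problem can be illustrated with a deliberately simplified Python sketch. It is not morphological analysis and does not extract roots: it strips only a small, hypothetical list of prefixes (the definite article, alone or fused with a preceding particle) so that several surface forms of one word share an index entry. The prefix list, the minimum stem length, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny prefix list; longer prefixes come first.
PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال")


def index_key(word, min_stem=3):
    """Strip one leading prefix if at least min_stem letters remain."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            return word[len(prefix):]
    return word


def build_index(words):
    """Map each index key to the positions of the words that share it."""
    index = defaultdict(list)
    for position, word in enumerate(words):
        index[index_key(word)].append(position)
    return dict(index)


# "the book", "and the book", "with the book", "for the book", "book"
words = ["الكتاب", "والكتاب", "بالكتاب", "للكتاب", "كتاب"]
index = build_index(words)
print(index)
```

Without such normalization the five forms sort to different places in a word index; handling suffixes, single-letter prefixes, and derived forms of the same root requires the fuller morphological analysis discussed above.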

Further, Arabic has a large number of combined word expressions. To search for an expression, one needs to use a logical operator such as "and" or "near" to find the expression. Further, current usage of Arabic includes many foreign words that are written in Arabic letters. These foreign words can be written in different ways, which leads to difficulties in retrieving them.

A few search and indexing systems are currently available on the Internet that can handle Arabic text. Some of the standard search engines on the Internet such as Alta Vista [22] and Infoseek [24] allow entering Arabic keywords and searching for Arabic documents, although Arabic is not officially supported by them (assuming the use of an Arabized browser). Specialized Arabic site indexes of the Internet include Ayna [20] and Naseej [25]. Specialized Arabic search engines include Ayna [20] and Alidrisi [26]. The specialized Arabic sites provide varying native Arabic search capabilities; however, the coverage is still limited.

Summary

The Arabic language is being increasingly used on the Internet, despite significant obstacles. The paper has discussed the major problems and outlined how new international standards and new Internet protocols are helping to alleviate the problems.

One of the biggest problems is the issue of multiple character sets for representing Arabic. The use of MIME tags in specifying the name of the character set used has been discussed. The advent of Unicode will perhaps mean the replacement of the other character sets and thus a unified character set for all languages. However, this is not expected to happen anytime soon.

Arabic has some particular characteristics that require specialized display routines, including the need for contextual analysis to select the appropriate character shapes and the incorporation of a bi-directional text display routine to order the text correctly. The Unicode standard provides a bi-directional display algorithm.

The two most important Internet applications, e-mail and the WWW, must work in Arabic seamlessly. The paper discussed how MIME facilitates this for e-mail by allowing the specification of the character set and by encoding 8-bit messages in 7 bits for safe transport. The new HTML 4.0 specification also provides facilities for Arabic character set indication and for Arabic page markup.

Alternate techniques for Web Arabization were discussed as well. These are viewed as interim solutions until a simpler and more satisfying solution is found. It is our opinion that when Web servers adhere to HTTP standards and include character set information in the header for page transmission, and when Web browsers use that information and set up the pages accordingly, the problem will be solved.

The ability to search and index Arabic content on the Internet is crucial. Features of the Arabic language that relate to text search were mentioned, and available systems were listed.

References

[1] Al-Fantookh A. and Al-Badr B. "Survey of the Usage of Arabic Internet Technologies." In Proceedings of the First KSU Workshop on Internet Arabization. May 18, 1997. Riyadh, Saudi Arabia. (In Arabic.)
[2] Alvestrand H. "IETF Policy on Character Sets and Languages." RFC 2277. January 1998.
[3] Weider C., et al. "The Report of the IAB Character Set Workshop held 29 February - 1 March, 1996." RFC 2130. April 1997.
[4] International Organization for Standardization. "Information Processing - 8-bit single-byte coded graphic character sets - Part 6: Latin/Arabic Alphabet." ISO 8859-6: 1987.
[5] Al-Badr B. "Arabic Character Sets: Towards a Unified Standard." Report No. C-2, August 10, 1997, Computer and Electronics Research Institute, KACST, Riyadh.
[6] Jaffal J. "The New Encoding Technology of the UCS." In Proceedings of the Second Symposium on Computer Arabization. King Saud University. Riyadh. (In Arabic.)
[7] ISO/IEC 10646-1:1993. International standard -- Information technology -- Universal multiple-octet coded character set (UCS) -- Part 1: Architecture and basic multilingual plane.
[8] The Unicode Consortium. The Unicode Standard V. 1.0. Addison Wesley. 1992.
[9] Postel J. "Simple Mail Transfer Protocol." STD 10, RFC 821. August 1982.
[10] Klensin J., et al. "SMTP Service Extensions." RFC 1869. November 1995.
[11] Crocker D. "Standard for the Format of ARPA Internet Text Messages." STD 11, RFC 822. UDEL, August 1982.
[12] Freed N. and Borenstein N. "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies." RFC 2045. November 1996.
[13] Moore K. "Multipurpose Internet Mail Extensions (MIME) Part Three: Message Header Extensions for Non-ASCII Text." RFC 2047. November 1996.
[14] Simonsen K. "Character Mnemonics & Character Sets." RFC 1345. Rationel Almen Planlaegning, June 1992.
[15] ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets.
[16] Yergeau F., et al. "Internationalization of the Hypertext Markup Language." RFC 2070. January 1997.
[17] Raggett D. "HyperText Markup Language Specification Version 4.0." December 1997. Available at http://www.w3.org/TR/REC-html40.
[18] Fielding R., et al. "Hypertext Transfer Protocol -- HTTP/1.1." RFC 2068. January 1997.
[19] Alvestrand H. "Tags for the Identification of Languages." RFC 1766. UNINETT, March 1995.
[20] http://www.ayna.com.
[21] Al-Rashid R., Al-Amri M. and Al-Badr B. "Proxy Server for Web Arabization." In Proceedings of the First KSU Workshop on Internet Arabization. May 18, 1997. Riyadh, Saudi Arabia. (In Arabic.)
[22] http://www.altavista.digital.com/
[23] Al-Kharashi I. "Search System Technologies on the Internet and the Arabic Language." In Proceedings of the Kuwait Conference on Information Highway, March 18, Kuwait. To appear. (In Arabic.)
[24] http://www.infoseek.com/
[25] http://www.naseej.com/about.htm
[26] http://www.alidrisi.com/idrhelp.htm
[27] Durst, Nicol, Yergeau, and Adams. "Internationalization of the Internet." An INET 97 tutorial, Kuala Lumpur, Malaysia, 1997.
[28] Al-Badr B. "Standards for Supporting Arabic on the Internet." In Proceedings of the First KSU Workshop on Internet Arabization. May 18, 1997. Riyadh, Saudi Arabia. (In Arabic.)
