A Major Art Resource on the World Wide Web: Putting the Roland Collection Resource Guide Online

Sheena Bassett, Francis Cave, David Jones, Roberto Minio and Mick Neary

The Roland Collection

The Roland Collection is a critically acclaimed, comprehensive collection of high-quality videos and films on art, with topics ranging from prehistoric art, archaeology and architecture, sculpture and drawing, and philosophy and technique to modern-day art. The collection is owned by Anthony Roland, who has himself produced a number of films on art and who founded the collection when he realized that no such resource existed at the time. Over the years he has amassed a collection in excess of 450 programs on both 16mm film and VHS video made by 230 filmmakers from a total of 25 different countries. Many of the films are available in more than one language. Roland owns the distribution rights of the programs, many of which are available worldwide; in the United States, programs are also available for rental. A number of the programs are available as series, such as The Age of Baroque and La Grande Epoque. Typical purchasers are educational and public institutions that tend to buy subsections, if not the whole collection, for use as teaching aids and for lending to the public. The Roland Collection also includes a number of titles on modern literature, which cover writing and authors from all over the world. Finally, the collection is constantly growing in size and coverage as more titles are added.

The printed Roland Collection Resource Guide

The printed version of the Roland Collection Resource Guide, which was completed in March 1996, is a major resource guide to all the programs available in the collection and was developed by Anthony Roland himself. It is more than just a catalog, as a great deal of information is given about the different art periods, the people and the techniques; for example, short biographies are also included for the more famous artists. All programs are organized by sections, and also in series where applicable, with many of the programs illustrated with art images. The indexes were specially created to help both goal-oriented users and casual browsers find what they need and locate the programs of interest. There is also a separate guide to the five series on modern literature. As such, the resource guide constitutes a valuable and comprehensive information resource about the subject of art and modern literature.

Pira International's interest in the Roland Resource Guide

Pira International is the United Kingdom's independent research association serving the printing, packaging, paper and publishing industries in Europe. The Publishing and New Media Group, which undertook this project, is particularly interested in information management and network publishing. When the group was approached by Anthony Roland, it was therefore very keen to take on the project, which it felt could demonstrate how such a major resource could be created for the Internet using the recommendations and best practices that had emerged from its research and consultancy projects in this area. These projects include ATM trials for the purchase of images from photo agencies (between the United Kingdom and Denmark) and best practices for asset trading and management (assets being images, text, sound and video clips, etc.) and for magazine publishing. The group has prototyped electronic demonstrators for these and has spent three years working on Macmillan's Dictionary of Art.

The objectives of creating an online version of the Roland Resource Guide

Anthony Roland had a number of objectives in making his resource guide available over the Internet:

In addition, by use of a database, there would always be a complete and timely information set about the collection, and by careful design and management of these data, the information could be exported and used for other purposes such as future updates of the printed resource guide and part sections, and for the production of a CD-ROM.

Important criteria

From Anthony Roland's point of view, the major requirement for the online version of the resource guide was that it reflect the high quality of both the video and film collection and the printed version of the resource guide. It was extremely important that the design and impact of the online resource guide achieved this. As part of the printed resource guide, a complex set of indexes had been created to help people locate the program(s) of interest to them. In particular, the general index had been compiled to be comprehensive and was extensively cross-linked. For example, artists are indexed by both first name and surname, and major topics contain many subcategories (such as countries, periods of art and languages; programs suitable for people with hearing difficulties are also indicated). These indexes are considered to be a very valuable part of the complete resource guide, and obviously would form a linchpin of the online resource guide. Although the other indexes for directors, authors, composers, film awards and narrators could be generated by searching the database, the general index was a greater challenge, as it could not be generated by a general search (which would return a large quantity of extraneous data).

Pira's perspective is that any business relying on content, such as a publisher or, in this case, a film and video collection promoted through a catalog, has to be structured around this content. This means that content, be it text and images from the resource guide or the assets comprising a multimedia CD-ROM, has to be stored and indexed in a structured way and that all associated information about the individual items, such as formats, copyright, prices, etc., should be stored with the items. As the requirements also included that the information be updated on a regular basis, probably by nontechnical staff, and that it be exportable for other purposes, a user-friendly editorial interface was also needed.

When an electronic product is created, the brand should be exploited as much as possible to create a synergy between the off-line and online products. Examples of this include TIME magazine, which is available in traditional and electronic formats, and the Electronic Telegraph, which is the online version of the British Daily Telegraph newspaper. Consequently, a similar relationship had to be developed between the printed resource guide and the online version within the confines of publishing on the Web (using HTML), while also making the best use of other features such as hyperlinks.

From the user's point of view, the information about each program had to be easily accessible, the images should not take too long to download and the images needed to be of reasonable quality. Ordering had to be straightforward.

Finally, the rights for the images used in the printed resource guide allowed all the images except three to be used online for the promotional purpose of selling the films and videos. However, in order to deter illegal copying of the images, low-resolution copies were to be used, but preferably not poor-quality copies as this could reflect badly upon the product, i.e., the video or film.

The above defined the initial framework for the project.

Why the World Wide Web was chosen

As the resource guide required "pages" of text, sometimes with an image, the World Wide Web (WWW) was the obvious medium. Furthermore, because of the many inter-relationships between programs (each of which could appear in a series and in more than one section of the resource guide) and the high level of cross-referencing in the general index, the hyperlink aspect of the WWW was very useful. This cross-linking aspect has also been used to reference other art and literature sites, such as the major museums and individual artists' sites available on the Internet, to create an even greater resource.

A Web browser then had to be selected as the development target. Netscape version 1.1 was chosen because it was at the time the most popular browser with Web users (about 60%). Although it was decided to avoid making heavy use of Netscape extensions (as this would make the pages rather messy for people using other browsers such as Mosaic), the use of some extensions such as tables and forms was necessary in order to meet minimum design criteria and standards. After some initial experimentation, a palette of 256 colors was selected, as most PC displays are set up for at least this many. Restricting the palette to 16 colors would have largely avoided the problem of colors being displayed differently on different machines, but it made the supplied images unacceptable and possibly unrecognizable! In addition, a Portable Document Format (PDF) version of the printed resource guide was made available so that the individual sections could be viewed with an Adobe Acrobat viewer and downloaded for printing.

Structuring the data using SGML

The text and images used in the printed resource guide were supplied in Quark XPress. The first step was to extract the text. It was decided to structure the text using SGML (Standard Generalized Markup Language) because of its inherent value in data modeling independent of any database management system that was ultimately chosen and because the source material was document-based. This allowed time to design the database. This also enabled changes to this design to be made without too much effort as the data could always be quickly reloaded, since data loading could be automated as the data was SGML-tagged. In addition, having created the SGML filter, the data could be exported from the database back into SGML and then into any other format required.
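The load-and-export round trip described above can be illustrated with a minimal Python sketch. It is not the project's SGML filter: the element names (program, title, section, duration) are hypothetical, since the paper does not publish its schema, and a simple regular expression stands in for a real SGML parser.

```python
import re

# Hypothetical element names; the actual SGML schema is not given in the paper.
SAMPLE = """<program id="p001">
<title>Example Title</title>
<section>12</section>
<duration>26</duration>
</program>"""

PROGRAM_RE = re.compile(r'<program id="([^"]+)">(.*?)</program>', re.DOTALL)
FIELD_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_programs(sgml_text):
    """Return one dict per <program> element, keyed by child element name."""
    records = []
    for prog_id, body in PROGRAM_RE.findall(sgml_text):
        record = {"id": prog_id}
        for name, value in FIELD_RE.findall(body):
            record[name] = value.strip()
        records.append(record)
    return records


def to_sgml(record):
    """Export a record back into the same tagged form."""
    lines = ['<program id="%s">' % record["id"]]
    for name, value in record.items():
        if name != "id":
            lines.append("<%s>%s</%s>" % (name, value, name))
    lines.append("</program>")
    return "\n".join(lines)
```

Because the tagged text fully describes each record, a database design can change and the same source can simply be parsed and loaded again.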

Selecting a database

The database selected for the project was BasisPlus by Information Dimensions Ltd. (IDL), which has a Web server interface linked to the CERN httpd and also a Common Gateway Interface (CGI) module for access from other HTTP servers. It had already been used for a smaller-scale trial project, so the functionality was understood. IDL provided a high level of technical assistance, which was invaluable given the size of the project undertaken. BasisPlus is a document-retrieval database that allows single and multiple field searches via a number of different interfaces. The database is Unix-based and has a Windows front-end for editorial purposes.

Technical overview of the Web server

The online Roland Collection Resource Guide is served from a database, accessed through a CGI module, in which the text and images are stored in a number of separate tables. The documents that appear on the user's screen are in HTML 3. User requests are always made to a CERN httpd server (V 3.0), where a Perl script translates them into database queries (which are still URLs). The Perl script then accesses the database via the CGI module provided by the database supplier. The documents output directly by the database WWW server are not sufficiently formatted for the needs of the catalog and are therefore post-processed by the same Perl script that requested them.
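The request path just described can be sketched in a few lines of Python. This is an illustration only, not the project's Perl script: the gateway path, the parameter names and the wrapping HTML are all assumptions, and the database call is passed in as a function so that the sketch stays self-contained.

```python
from urllib.parse import urlencode

# Hypothetical location of the database's CGI module.
DB_GATEWAY = "/cgi-bin/db-gateway"


def build_query_url(table, field, value):
    """Translate a user request into a database query that is itself a URL."""
    return DB_GATEWAY + "?" + urlencode({"table": table, "field": field, "value": value})


def post_process(raw_html, title):
    """Wrap the database's raw output in the page furniture the catalog needs."""
    return "<html><head><title>%s</title></head><body>%s</body></html>" % (title, raw_html)


def handle_request(params, fetch):
    """Build the query, fetch the raw document, then reformat it.

    `fetch` is any callable that takes the query URL and returns the raw
    document; in a real deployment it would call the database gateway.
    """
    url = build_query_url("programs", "id", params["program"])
    raw = fetch(url)
    return post_process(raw, "Program " + params["program"])
```

Keeping the translation and the post-processing in one script means that every page passes through a single place where its appearance can be controlled.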

Figure 1

Database

BasisPlus is a full-text retrieval engine whose output has been modified to suit this particular application. The database is capable of holding both text and images. Using BasisPlus allows a number of features to be added to the catalog that would not have been possible using a flat-file approach. First, full-text searching can be carried out rapidly. Second, cross-referencing is straightforward, so that the contents of one record can affect the output generated for another. Third, maintenance is eased because only a few files have to be maintained rather than the 1000-plus that would be needed otherwise.

Flow of control

The flow of control between the different components of the online resource guide is shown in Figure 1. The CGI in the figure is the "glue" that links the user's viewer with the database; it includes a Perl script (roland.pl) of approximately 1,000 lines, which is invoked via the WWW server. The purpose of this script is to format all requests into valid database queries and then to post-process the output of the database. The script also calls a number of other public domain scripts to carry out specific tasks.

As a result of the CGI layer, the ability to control user display and input is greatly increased compared to using only a direct httpd to browser feed. For example, the values produced from pull-down menus can be completely transformed before they are used to construct queries on the database. A more detailed breakdown of the control sequence within the "Web server" module of Figure 1 is shown in Figure 2.
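As an illustration of transforming pull-down menu values before they reach the database, the following Python sketch maps menu labels to query clauses. The labels, the field name `duration` and the clause syntax are all hypothetical; the paper does not describe the actual menus or the BasisPlus query language.

```python
# Hypothetical menu labels mapped to (low, high) bounds in minutes.
# None as a bound means "no limit"; None as the value means "no restriction".
DURATION_MENU = {
    "Any length": None,
    "Under 15 minutes": (0, 14),
    "15 to 30 minutes": (15, 30),
    "Over 30 minutes": (31, None),
}


def duration_clause(label):
    """Turn a menu label into a query clause.

    Returns an empty string when the label imposes no restriction or is
    not a known menu entry.
    """
    bounds = DURATION_MENU.get(label)
    if bounds is None:
        return ""
    low, high = bounds
    if high is None:
        return "duration >= %d" % low
    return "duration >= %d AND duration <= %d" % (low, high)
```

The user sees only friendly labels, while the script decides what query text each label becomes.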

Figure 2

Some basic elements of the resource guide design

The resource guide is divided into 37 sections, including sections for art, for modern literature, and for contents pages, indexes and information about the Roland Collection and the resource guide itself. Sections 1-23 present the programs in art historical chronological order. The remaining sections 24-37 present the programs by subject. Each section has an introduction page and image followed by a varying number of programs. For the online version, each program has its own page that contains a program title, descriptive text about the program, and a number of data items about the film such as age-range suitability, duration, price, and availability (territorial) and a hyperlink to an order form. The page may or may not include an image and hyperlinks to a series (if the program belongs to one) and other possible cross-references within the text. Consequently, a template was designed to display these data as there are over 460 different program pages. The template contained the necessary HTML markup and links to images (the first title and the logo). Other information such as the hex values for the background and foreground colors were stored with the program information in the database.

In the printed resource guide, each section has two complementary colors associated with it that are used on the introduction page of each section. However, the pages featuring the program details are white, as it is very difficult to find a color that will go with all of the different images that may be used in that section. For the online version, we wanted to use the color theme and extend it into the program pages to indicate which section the program appeared in. In addition, the average line length was to be around eight words, as this was the maximum number of words the average user feels comfortable with. This was achieved by dividing the page up into a table so that the left-hand column was colored, while the title, descriptive text and image (when one was present) appeared on white in the right-hand column, the width of which resulted in the required eight-word average. All the technical and other information about the program appears in the left-hand column, along with the Roland Collection logo and the icons.
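A template of this kind can be sketched in Python as follows. This is an assumption-laden illustration rather than the project's template: the record layout, the column widths and the use of a `bgcolor` attribute on the table cell are all hypothetical choices made to show a colored left-hand column beside a white text column, with the section color taken from the stored record.

```python
from html import escape
from string import Template

# Hypothetical page template: colored details column on the left,
# title and descriptive text on white on the right.
PAGE = Template("""<html><head><title>$title</title></head>
<body bgcolor="#FFFFFF">
<table border="0" cellpadding="8"><tr valign="top">
<td bgcolor="$section_color" width="150">$details</td>
<td bgcolor="#FFFFFF" width="350"><h2>$title</h2>
<p>$description</p></td>
</tr></table>
</body></html>""")


def render_program(record):
    """Fill the template from one program record (a plain dict here)."""
    details = "<br>".join(
        "%s: %s" % (escape(label), escape(str(value)))
        for label, value in record["details"]
    )
    return PAGE.substitute(
        title=escape(record["title"]),
        section_color=record["section_color"],
        details=details,
        description=escape(record["description"]),
    )
```

With one template and per-record color values, several hundred program pages can share a single piece of markup.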

Another aspect of the page design was that we did not want users to be put off by the time it could take to load the image. To get around this, the image was placed toward the bottom of the page, below the descriptive text, and the size of the image was supplied to the browser to make downloading as efficient as possible. Consequently, the text is displayed first, providing the user with information to read while the image loads into the page below this text. For the initial version of the online resource guide, the GIF format was used (to be compatible with as many browsers as possible).
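Supplying the image size lets the browser reserve space and lay out the text before the image arrives. A minimal Python sketch of how the dimensions could be obtained is shown below; it reads the logical screen width and height from the GIF header, which normally match the image size. The function names are hypothetical, and how the project actually obtained the sizes is not stated.

```python
import struct


def gif_size(data):
    """Return (width, height) from the first 10 bytes of GIF data.

    A GIF file starts with a 6-byte signature followed by the logical
    screen width and height as little-endian unsigned 16-bit integers.
    """
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    width, height = struct.unpack("<HH", data[6:10])
    return width, height


def img_tag(src, data):
    """Build an image tag that tells the browser the size in advance."""
    width, height = gif_size(data)
    return '<img src="%s" width="%d" height="%d">' % (src, width, height)
```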

As mentioned previously, the general index could not be generated directly from the database. Instead, this was a labor of love where the individual link addresses had to be entered by hand and lists generated for the "see also" entries.

Some of the one-off pages were stored as individual files containing the HTML markup. For example, all of the contents pages were done this way, as well as the text pages used for Help and for the introductory pages about the Roland Collection.

Development approach

A high level of cooperative development was employed on the project, with the client making frequent visits to review the work done and suggest changes. The great majority of the work was undertaken by the systems engineer, who designed the database and implemented it and wrote the Perl scripts for the interaction between the database and the Web pages, and by our information designer, who created the "look" of the Web site and how each of the Web pages was presented on screen. Other members of the group contributed by designing the SGML schemata, extracting the text and images from the Quark XPress files, converting the file formats and testing the online resource guide. The complete project was coordinated and managed by the senior consultant (with help from the chief consultant), who acted as the interface between the client and the production staff. In addition, the Pira Design Group were asked to contribute some ideas and paper mock-ups at the beginning of the project to give the client a selection of design approaches and for us to gain a better understanding of his requirements.

The design of the Web server was started by creating a "road map" of the required pages and the links between them. Icons were added at a later stage when the structure had stabilized. Database interaction with the Web pages was minimal at the top level (i.e., the home page, introduction and contents pages), increased with the indexes, and was greatest at the bottom level, which corresponded to the 460 and more individual program pages. As a template was being used at the bottom level and a great deal of functionality was required, it was decided that a bottom-up approach would be used, and the information designer started with this template. Since the top-level pages were one-offs, these could be coded later and dummy pages used in the meantime. Another consideration was that the home page design was to reflect the cover design of the printed resource guide, but since this had not been agreed upon at this stage, the home page would have to wait.

Some problems and solutions

Getting the design right was one of the most time-consuming aspects of this project. The main problem was creating a design that met the client's expectations within the confines of electronic publishing and HTML, and achieving the clarity of information needed for the amount of data to be presented. Many hours were spent exploring different possibilities or explaining to the client why a particular approach was either impossible to achieve, or possible but only with a lot of additional time and effort spent on writing complicated software. For example, a number of video extracts were available for use, but as these were usually two minutes or more in length, it was not feasible to use them with the Web server, as the average user would not have the memory, bandwidth or software required to view the video clips. Storing the clips could also pose a problem in terms of disk space and expense, and the client felt that reducing a clip to a few seconds would not meaningfully represent the contents of the program.

The images also presented a problem. After extraction, the image resolutions were checked and found to be 72 dpi as required. However, when displayed on screen, there was a marked difference in the quality between the images, which ranged from very good to truly appalling. Some investigation revealed that this was a feature of the method used by Quark XPress to generate "placeholder images," which could result in images being generated from reduced data and consequently appearing pixelated and fuzzy. However, some of the TIFF format low-resolution images (from which the GIF images had been derived) were also of a bad quality, and therefore the only solution was to use the high-resolution scanned images to generate new lower-resolution (again 72 dpi) but better-quality GIF images (since these would be derived from a complete data set) to replace the few really bad images that we were using. In retrospect, we should have used JPEG for images; the quality would have been just as good, but the images would have downloaded much more quickly. This option had been turned down at the start of the project, as JPEG was not then supported by all browsers, but this situation changed during development. We had also considered having both GIF and JPEG versions of the images (the format displayed being dependent on the browser used), but this would have been additional work to the original proposal.
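The idea of serving GIF or JPEG depending on the browser could be sketched as below. This is only one possible approach and an assumption on our part: it inspects the MIME types listed in the request's HTTP Accept header and falls back to GIF, whereas the paper does not say how the browser would have been detected.

```python
def pick_image_format(accept_header):
    """Choose "jpeg" if the browser lists image/jpeg, otherwise "gif".

    `accept_header` is the raw value of the HTTP Accept header. Quality
    parameters such as ";q=0.8" are ignored in this sketch.
    """
    accepted = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    ]
    if "image/jpeg" in accepted:
        return "jpeg"
    return "gif"
```

Falling back to GIF keeps the pages usable for browsers that do not declare JPEG support.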

What we have learned from the project

As we progressed into the project, the true complexity of the data and its structure became more apparent. Although this resulted in more effort than originally envisaged on the technical side, we are convinced that the approach we adopted in formally structuring the data was the right one and was worth the additional overhead. The data can now be easily reused for other purposes and can be updated by nontechnical staff by means of the editorial desktop (graphical interface) that comes with the database. Another advantage of this approach is that new features that become available with later versions of browsers, such as frames, can be quickly adopted as the content is structured and kept largely independent of the HTML markup. Upgrades, such as the addition of the French language text, are also easy to implement. If the Web site had been constructed as a collection of flat files, the end result would have been an unmanageable tangle of nightmare proportions.

Use of the collaborative and incremental development style ultimately resulted in a satisfying product for the customer, but it took a much greater amount of time than we had originally budgeted for. Not only did Pira develop the online resource guide, but we also spent a lot of time educating our client in the possibilities and limitations of publishing online. In return, we have all learned a lot about design issues as well as art.

Future developments

A French-language version of the resource guide text is currently being created, and the translator has been requested to insert markup characters into the text. This can be directly converted into the full SGML markup and enables the automatic loading of tags into the database to allow users to select a French language version of the online resource guide. The primary reason for the translation is to supply the text in French on a diskette (in a compressed format) to go with the printed version of the resource guide. Other translations may be made in the future.
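The conversion from translator markup to SGML can be illustrated with a short Python sketch. The markup characters used here ("@T" for a title line, "@D" for a description line) and the element names are purely hypothetical; the paper does not specify which markup characters the translator was asked to insert.

```python
# Hypothetical line prefixes typed by the translator, mapped to
# hypothetical SGML element names.
PREFIX_TO_TAG = {"@T": "title", "@D": "description"}


def markup_to_sgml(text, lang="fr"):
    """Convert prefixed lines into fully tagged SGML elements.

    Blank lines are skipped; a line with an unknown prefix raises
    ValueError so that mistakes are caught before loading.
    """
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        prefix, _, content = line.partition(" ")
        tag = PREFIX_TO_TAG.get(prefix)
        if tag is None:
            raise ValueError("unknown markup prefix: %r" % prefix)
        out.append('<%s lang="%s">%s</%s>' % (tag, lang, content.strip(), tag))
    return "\n".join(out)
```

A lightweight convention like this asks very little of the translator while still producing text that can be loaded automatically.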

The image format will be changed to JPEG now that most browsers have begun to support it, as it provides much more efficient compression. The target browser will also be updated to Netscape 2.0 to take advantage of frames. The database will also be upgraded to BasisPlus 2.0, as the new version provides much greater functionality, thus removing the need for the Perl script used with version 1.0.

The other possibility that is currently being explored is to create a CD-ROM version of the resource guide. As the content of the database will be constantly updated and can be exported into any required format, creation of a CD-ROM becomes more cost-effective, since much of the data will already be in a prepared format. Depending on the numbers required, the manufacturing costs will be comparable to, if not less than, those of producing the full printed resource guide, and a CD-ROM will certainly be much cheaper to mail. In addition, periodic updates are easier to make on electronic media than on traditional ones.