Introducing E-OTI's Education Feature

Dear e-OTI Readers:

I am delighted to launch the first in a regular series of articles on how the Internet continues, on a global scale, to affect the way we teach and learn. The first education installment deals with the launch of the first formal, written Guidelines for Computer-Based Testing. In the past five years, certification and testing for the information technology industry have grown an estimated 35 percent every six months, and the trend continues in the new millennium. Many companies and educational institutions are trying to learn how to incorporate assessment into the new online learning and computer-based learning paradigms. James Olsen, who prepared the article, is a pioneer in the field of computer adaptive testing and a major contributor to the development of the guidelines. Part of a year-long effort sponsored by the Association of Test Publishers and leading information technology companies and educators, the guidelines are designed to supplement NCME and APA Guidelines and are available through the Association of Test Publishers.

Let us know what you think of this article, and please feel free to submit pieces that will help our membership become aware of online innovations and breakthroughs in teaching and learning worldwide. Please contact me at aranag@earthlink.net; I will be happy to send you author guidelines and to exchange ideas and links for future articles. Our goal is to have regular contributing authors who will write pieces that highlight exciting online educational projects—projects that further lifelong learning and the empowerment of individuals and communities in every region of the world. We also welcome you to send us links to sites and people who are making a difference in education. In June we will have an article about the African Information Technology Conference. Thanks, and have a great month!

Arana Greenberg
e-OTI Education Editor

Guidelines for Computer-Based Testing
By James B. Olsen

This article introduces the recent Guidelines for Computer-Based Testing, published by the Association of Test Publishers (ATP). The guidelines were released at a professional testing conference held on February 17, 2000, at Carmel Valley Ranch in Carmel, California. The conference was dedicated to the life and memory of Frederic Mather Lord (1912–2000), who contributed significantly to the theory and applications of educational measurement, item response theory, and computer-based and computerized adaptive testing.

Computers are now standard and pervasive tools that significantly affect our daily lives. In testing and assessment applications, they have changed the ways in which tests and assessments are developed and administered. Computer-based tests are defined as tests or assessments that are administered by computer in either stand-alone or networked configuration or by other technology devices linked to the Internet or the World Wide Web. In the face of the rapid growth of computer-based testing, the ATP sponsored the development of formal, written guidelines to help ensure high measurement quality of computer- and Internet-based tests and to provide direction for the principles and procedures used for developing and administering those tests. Guidelines for Computer-Based Testing is intended to supplement, extend, and elaborate on the recently published Standards for Educational and Psychological Testing (Joint Standards) as they apply to computer-based and Internet-based testing and assessment.

Audiences for the Guidelines for Computer-Based Testing

The guidelines can appropriately be used by a wide variety of audiences, including:
  • Test development organizations—for specifying procedures for designing, developing, field-testing, and validating computer-based tests
  • Test publishers and administrators and test delivery organizations—for establishing common industry guidelines for communication of test items, exam scores, and item response information to and from computer-based testing locations
  • Test delivery organizations—for providing information about how to achieve high-quality delivery of computer-based tests
  • Test takers—for providing information about and descriptions of the types of test items, tests, and test score interpretations and test orientations they might encounter when they take a computer-based test
  • Research and evaluation specialists—for providing information on current and expected future uses of computer-based tests
  • Teachers at educational institutions that administer or use computer-based tests—for providing information about interpreting test scores from these exams and using these scores appropriately, as well as about helping students prepare appropriately for computer-based tests
  • Advanced technology companies—as aids in determining how to create products and services that might be beneficial in solving current or future problems and issues in computer-based testing
  • General readers—for providing interesting and useful information

Millennium Conference Held to Release the Guidelines for Computer-Based Testing

The February conference brought together pioneering leaders in computer-based testing. It was called to address emerging applications and practices that are aligned with the recently completed guidelines. Dr. Ronald Hambleton, professor of education and psychology and chairperson of the Research and Evaluation Methods Program at the University of Massachusetts, opened the conference by commenting that a strong research base, improved psychometric methods, and expanded item banks are critical in reaching the potential of computer-based testing.

Participant presentations combined classical test theory with leading-edge technologies. Presenters discussed best practices for delivering computer-based tests, and attendees learned how the increased capability, adaptability, and realism of computer delivery can improve the validity, fairness, and reliability of tests. Representatives from the ATP guidelines committee gave the 150 attendees a historical overview and current measurement theory context for the guidelines.

Concise working sessions covered the relevance of computer-administered testing. Representatives from Alpine Media, Educational Testing Service, the University of Nebraska, HumRRO, Microsoft, Lotus Development, the Northwest Evaluation Association, Novell, and Hewlett-Packard presented research in the following areas:

  • New strategies for computer-based testing (CBT)
  • Educational implications of the ATP guidelines
  • Implementing the ATP guidelines
  • Applications for high-stakes licensure testing
  • Use of innovative item types for testing higher-level cognitive abilities
  • Issues and challenges in test planning and design for CBT

Conference attendees included representatives from industrial/organizational, clinical, education, certification, and licensure groups. "The unique size and structure of the conference allowed for great interaction," said John Oswald, president of the ATP. G. William Harris, executive director of the ATP, was especially pleased with the outcome. "We have designed a forum that provides continuous learning for the testing community at large," he said. "Our plan is to offer a conference on an annual basis to address the issues and realities of the computer-based testing arena."

Closing keynote speaker Craig Mills, executive director of examinations at the American Institute of Certified Public Accountants, summed it up: "There will be an explosion of new item types and testing methodologies," he said. "We must be ready with good tools and approaches to manage this explosion."

Relationships between the Standards for Educational and Psychological Testing and the Guidelines for Computer-Based Testing

Published near the end of 1999, the Standards for Educational and Psychological Testing were adopted by the leadership organizations of the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME). The document states: "The purpose of publishing the Standards is to provide criteria for the evaluation of tests, testing practices, and the effects of test use. Although the evaluation of the appropriateness of a test or testing application should depend heavily on professional judgment, the Standards provide a frame of reference to assure that relevant issues are addressed. It is hoped that all professional test publishers will adopt the Standards and encourage others to do so." (AERA, APA, NCME, 1999, p. 2)

The committee assisting with development of the guidelines decided that the guidelines would be written to supplement, extend, and elaborate on the Standards. The committee said that all computer-based tests should be designed by using the fundamental standards identified in the six technical areas for test construction, evaluation, and documentation: 1) validity; 2) reliability and errors of measurement; 3) test development and revision; 4) scales, norms, and score comparability; 5) test administration, scoring, and reporting; and 6) supporting documentation for tests. The committee also recommended that all computer-based tests be used in accordance with fundamental measurement standards for test fairness, including fairness in testing and test use, in the rights and responsibilities of test takers, in testing individuals with diverse linguistic backgrounds, and in testing individuals with disabilities.

The Standards make only three specific references to computer-based testing:

Standard 3.12: "The rationale and supporting evidence for computerized adaptive tests should be documented. The documentation should include procedures used in selecting subsets of items for administration, in determining the starting point and termination conditions for the test, in scoring the test and for controlling item exposure." (AERA, APA, NCME, 1999, p. 45)
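
The four procedures that Standard 3.12 asks test developers to document (item selection, starting point, termination conditions, and scoring with item exposure control) can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not a method prescribed by the Standards or the guidelines: it assumes a Rasch (one-parameter) item model, a posterior-mean ability estimate with a standard normal prior, a stopping rule based on a standard error target or a length cap, and a simple "pick one of the few most informative items at random" exposure control. All function names, parameter names, and default values are hypothetical.

```python
import math
import random

# Ability grid for the estimate; range and spacing are illustrative choices.
GRID = [-4.0 + 0.05 * i for i in range(161)]


def prob_correct(theta, difficulty):
    """Rasch (one-parameter logistic) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_information(theta, difficulty):
    """Fisher information that a Rasch item provides at ability theta."""
    p = prob_correct(theta, difficulty)
    return p * (1.0 - p)


def score(responses):
    """Scoring rule (assumed): posterior mean and SD under a N(0, 1) prior.

    responses is a list of (difficulty, answered_correctly) pairs.
    """
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized prior density
        for difficulty, correct in responses:
            p = prob_correct(theta, difficulty)
            w *= p if correct else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_adaptive_test(pool, answer, start_theta=0.0, max_items=20,
                      se_target=0.40, exposure_k=3, rng=None):
    """Administer items from pool (item id -> difficulty) one at a time.

    answer(item_id) must return True for a correct response.
    """
    rng = rng or random.Random()
    theta, se = start_theta, float("inf")  # starting point
    remaining = dict(pool)
    responses, administered = [], []
    # Termination: precision target met, length cap hit, or pool exhausted.
    while remaining and len(administered) < max_items and se > se_target:
        # Item selection: rank unused items by information at the estimate.
        ranked = sorted(
            remaining,
            key=lambda i: item_information(theta, remaining[i]),
            reverse=True,
        )
        # Exposure control: choose at random among the top exposure_k items.
        item_id = rng.choice(ranked[:exposure_k])
        responses.append((remaining.pop(item_id), answer(item_id)))
        administered.append(item_id)
        theta, se = score(responses)
    return theta, se, administered


# Demonstration only: a simulated examinee whose true ability is 1.0.
pool = {f"item{i:02d}": -3.0 + 0.1 * i for i in range(61)}
rng = random.Random(2000)
theta, se, administered = run_adaptive_test(
    pool, lambda item_id: rng.random() < prob_correct(1.0, pool[item_id]),
    rng=rng)
```

Each default argument corresponds to a decision that the standard expects to see written down with its rationale: where the test starts, when it stops, how the next item is chosen, and how often any single item may be shown.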

Standard 5.5: "Instructions to test takers should clearly indicate how to make responses. Instructions should also be given in the use of any equipment likely to be unfamiliar to test takers. Opportunities to practice responding should be given when equipment is involved, unless use of the equipment is being assessed." (AERA, APA, NCME, 1999, p. 63)

Standard 6.11: "If a test is designed so that more than one method can be used for administration or recording responses—such as marking responses in a test booklet, on a separate answer sheet, or on a computer keyboard—then the manual should clearly document the extent to which scores arising from these methods are interchangeable." (AERA, APA, NCME, 1999, p. 70)

The committee concluded that additional guidelines were needed to supplement, extend, and elaborate on the Standards when they were applied to computer-based tests. The published guidelines are separated into two parts. The first part provides general background and explanations of the guidelines and includes chapters called "Introduction," "Validity and Test Design," "Test Development and Analysis," and "Test Administration." The second part provides the specific computer-based testing guidelines. Part 2 includes chapters entitled "Planning and Design" (11 guidelines), "Test Development" (23 guidelines), "Test Administration" (18 guidelines), "Scoring and Score Reporting" (11 guidelines), "Psychometric Analysis" (13 guidelines), and "Stakeholder Communications" (7 guidelines). To illustrate the breadth of these guidelines, one guideline from each of the major categories follows.

"Planning and Design," guideline 1.4: "A wide variety of computer-based [tests] can be designed and developed to meet different purposes. The test specification for computer-based tests should include: the test purpose, the content domain definitions, the content structure for the test items, required response formats for the test items, sample test items illustrating the response formats, the number of items to be developed and administered, scoring and reporting formats and procedures, and test administration procedures. The test specification should be thoroughly documented."
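
The components that guideline 1.4 lists can be captured in a simple record so that a missing piece is noticed early. The Python sketch below is an assumption about how a team might do this; the guidelines do not define any data format, and the class name, field names, and sample values are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class ExamSpecification:
    """One field per component named in guideline 1.4 (names are illustrative)."""
    purpose: str
    content_domains: dict      # domain name -> definition
    content_structure: dict    # domain name -> items administered per domain
    response_formats: list     # for example, "multiple choice"
    sample_items: dict         # response format -> sample item text
    items_to_develop: int
    items_to_administer: int
    scoring_and_reporting: str
    administration_procedures: str

    def gaps(self):
        """List empty components, plus response formats with no sample item."""
        found = [f.name for f in fields(self) if not getattr(self, f.name)]
        found += [f"sample item for {fmt}" for fmt in self.response_formats
                  if fmt not in self.sample_items]
        return found


# Demonstration values only.
spec = ExamSpecification(
    purpose="Certify entry-level network administrators",
    content_domains={"routing": "Configure and troubleshoot routing"},
    content_structure={"routing": 40},
    response_formats=["multiple choice", "drag and drop"],
    sample_items={"multiple choice": "Which command shows the routing table?"},
    items_to_develop=120,
    items_to_administer=40,
    scoring_and_reporting="Number correct, reported as pass or fail",
    administration_procedures="Proctored test center, 60-minute limit",
)
print(spec.gaps())
```

Here the check reports that no sample item has been supplied for the drag-and-drop format, which is the kind of omission the guideline's call for thorough documentation is meant to prevent.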

"Test Development," guideline 2.1: "The test delivery environment should be evaluated before item authoring begins. It is important to make sure that items being created can be properly displayed in the test delivery environment and that test-taker input and results can be collected, aggregated, and reported. For example, graphics constraints need to be identified so that item writers do not create items that have too many colors or require a screen resolution that is too high for the current test delivery environment."

"Test Administration," guideline 3.1: "The test sponsor should provide test-takers with clear and concise information regarding procedures to register for an examination, obtain an authorization-for-testing document, and scheduling a test appointment."

"Scoring and Score Reporting," guideline 4.7: "The accuracy of computer scoring algorithms should be established prior to implementation of the computer-based test."
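
One plausible way to establish scoring accuracy before launch is to run the scoring program against a set of response records that have been scored independently by hand and to investigate every disagreement. The Python sketch below illustrates that idea only; the guideline does not specify a procedure, and the function names, answer key, and reference cases are hypothetical.

```python
def number_correct(answer_key, responses):
    """Candidate scoring algorithm: one point per response matching the key."""
    return sum(1 for item_id, keyed in answer_key.items()
               if responses.get(item_id) == keyed)


def verify_scoring(score_fn, answer_key, reference_cases):
    """Compare score_fn with independently hand-scored cases.

    reference_cases is a list of (case_id, responses, expected_score).
    Returns the mismatches as (case_id, expected_score, actual_score).
    """
    mismatches = []
    for case_id, responses, expected in reference_cases:
        actual = score_fn(answer_key, responses)
        if actual != expected:
            mismatches.append((case_id, expected, actual))
    return mismatches


# Demonstration values only.
answer_key = {"q1": "B", "q2": "D", "q3": "A"}
reference_cases = [
    ("all correct", {"q1": "B", "q2": "D", "q3": "A"}, 3),
    ("one omitted", {"q1": "B", "q3": "A"}, 2),
    ("all wrong", {"q1": "A", "q2": "A", "q3": "C"}, 0),
]
problems = verify_scoring(number_correct, answer_key, reference_cases)
```

An empty list of problems means the algorithm agrees with every hand-scored case. The reference set should include edge cases such as omitted items, because those are where scoring rules most often diverge from what the test sponsor intended.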

"Psychometric Analysis," guideline 5.1: "Determine appropriate reliability indices if different test-takers are given different items or exercises."
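
When each test taker sees a different set of items, indices that assume a fixed form, such as coefficient alpha, no longer apply directly. One index often used in adaptive testing is marginal reliability, which compares the average error variance of the ability estimates with their overall variance. The Python sketch below shows that calculation as one example; the guideline does not name a specific index, and the sketch assumes that each test taker has an ability estimate with its own standard error and that the observed variance of the estimates is used as the denominator.

```python
from statistics import fmean, pvariance


def marginal_reliability(ability_estimates, standard_errors):
    """Return 1 - (mean error variance / variance of the ability estimates)."""
    if len(ability_estimates) != len(standard_errors):
        raise ValueError("need one standard error per ability estimate")
    error_variance = fmean([se ** 2 for se in standard_errors])
    return 1.0 - error_variance / pvariance(ability_estimates)


# Demonstration values only: three test takers, each measured with SE = 0.4.
reliability = marginal_reliability([-1.0, 0.0, 1.0], [0.4, 0.4, 0.4])
```

In this small example the mean error variance is 0.16 and the variance of the estimates is two thirds, so the index works out to 0.76.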

"Stakeholder Communications," guideline 6.2: "When appropriate, developers of computer-based tests should provide sufficient information concerning the test purpose, and test content specifications to test users prior to when the test is available for widespread administration. This test information should be kept accurate and as up-to-date as possible."

Taken together, these six examples show how the guidelines supplement and elaborate on the Standards for individuals and organizations seeking to develop computer-based or Internet-based tests and assessments.

For further information on the guidelines, contact the Association of Test Publishers, 1201 Pennsylvania Avenue, Suite 300, Washington, DC, 20004. The phone number in the U.S. is 202-857-8444.

Comparisons of the ATP Guidelines with Previous Professional Efforts

Early efforts to determine technical guidelines for assessing computerized adaptive tests were summarized in a 1984 Journal of Educational Measurement article by Bert Green, Darrell Bock, Lloyd Humphreys, Robert Linn, and Mark Reckase. The article addressed technical guidelines related to dimensionality, measurement error, validity, estimation of item parameters, item pool characteristics, and human factors. An additional pioneering article in this effort, "Developing Standards for Computerized Psychological Testing," was published by Paul Hofer in Computers in Human Behavior in 1985. This article identified the need for standards in the categories of equivalence, test takers, format and content, and equipment and procedures. Then, in 1995, Barbara Plake, director of the Buros Institute of Mental Measurements at the University of Nebraska–Lincoln, chaired a task force for the American Council on Education. That task force developed guidelines for computerized adaptive test development and use in education. Plake’s presentation at the February 17, 2000, ATP conference noted that the ATP guidelines are organized consistently with the Standards, provide greater specificity, are more state-of-the-art, and cover more components of the testing and assessment process.


Following are some of the key indicators of the growth of computer-based testing. The area of computer-based testing and assessment is emerging as a significant professional field in educational measurement. There have been three major reference editions for educational measurement: in 1951, 1971, and 1989. The proportion of reference pages devoted to computer-based or machine-based testing in each of those three editions was 3 percent in 1951, 8 percent in 1971, and 14 percent in 1989. By the turn of the millennium, the number of computerized tests that had been administered in professional testing and assessment centers reached at least 4.5 million. It is estimated that Internet-based testing programs have administered at least several hundred thousand additional computer-based tests. Some school districts have also installed district-wide computerized testing systems. New and innovative Internet-based testing systems are being developed and are available for worldwide assessment applications.

To show how far we have come, here are some final quotes. Bert Green, in the conclusion of his comments on the significance and insights of Frederic Mather Lord’s paper on tailored testing, says: "The computer has barely started to establish itself in the testing business. As experience with computer-controlled testing accumulates, we can expect important changes in the technology of testing. Most of these changes lie in the future. Lord’s results, clear-cut and devastating as they are, will in the end seem a minor skirmish in the inevitable computer conquest of testing." (Green, 1970, p. 194)

Finally, in a 1988 article, Samuel Messick states: "Over the next decade or two, computer and audiovisual technology will dramatically change the way individuals learn as well as the way they work. Technology will also have a profound impact on the ways in which knowledge, aptitudes, competencies, and personal qualities are assessed and even conceptualized. There will also come a heightened emphasis on individuality in assessment with a premium on the adaptive measurement, perhaps even dynamic measurement, of knowledge structures, skill competencies, personal strategies and styles as they develop with instruction and experience. But although the modes and methods of measurement may change, the basic maxims of measurement, and especially of validity, will retain their essential character. The key validity issues are the interpretability, relevance, and utility of scores, the import or value implications of scores as a basis for action, and the functional worth of scores in terms of social consequences of their use." (Messick, 1988, p. 33)


About the Author

Dr. James B. Olsen is Chief Scientist at Alpine Media Corporation in Orem, Utah.

