Dear e-OTI Readers:
Millennium Conference Held to Release the Guidelines for Computer-Based Testing
The February conference brought together pioneering leaders in computer-based testing to address emerging applications and practices aligned with the recently completed guidelines. Dr. Ronald Hambleton, professor of education and psychology and chairperson of the Research and Evaluation Methods Program at the University of Massachusetts, opened the conference by commenting that a strong research base, improved psychometric methods, and expanded item banks are critical in reaching the potential of computer-based testing.
Participant presentations combined traditional and classical test theories with leading-edge technologies. Presenters discussed best practices for computer-based test delivery, and attendees learned how the increased capability, adaptability, and realism of computer delivery can improve the validity, fairness, and reliability of tests. Representatives from the ATP guidelines committee gave the 150 attendees a historical overview of the guidelines and placed them in the context of current measurement theory.
Concise working sessions covered the relevance of computer-administered testing. Representatives from Alpine Media, Educational Testing Service, the University of Nebraska, HumRRO, Microsoft, Lotus Development, the Northwest Evaluation Association, Novell, and Hewlett-Packard presented research in the following areas:
Conference attendees included representatives from industrial/organizational, clinical, education, certification, and licensure groups. "The unique size and structure of the conference allowed for great interaction," said John Oswald, president of the ATP. G. William Harris, executive director of the ATP, was especially pleased with the outcome. "We have designed a forum that provides continuous learning for the testing community at large," he said. "Our plan is to offer a conference on an annual basis to address the issues and realities of the computer-based testing arena."
Closing keynote speaker Craig Mills, executive director of examinations at the American Institute of Certified Public Accountants, summed it up: "There will be an explosion of new item types and testing methodologies," he said. "We must be ready with good tools and approaches to manage this explosion."
Relationships between the Standards for Educational and Psychological Testing and the Guidelines for Computer-Based Testing
Published near the end of 1999, the Standards for Educational and Psychological Testing were adopted by the leadership organizations of the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME). The document states: "The purpose of publishing the Standards is to provide criteria for the evaluation of tests, testing practices, and the effects of test use. Although the evaluation of the appropriateness of a test or testing application should depend heavily on professional judgment, the Standards provide a frame of reference to assure that relevant issues are addressed. It is hoped that all professional test publishers will adopt the Standards and encourage others to do so." (AERA, APA, NCME, 1999, p. 2)
The committee assisting with development of the guidelines decided that the guidelines would be written to supplement, extend, and elaborate on the Standards. The committee said that all computer-based tests should be designed by using the fundamental standards identified in the six technical areas for 1) test construction, evaluation, and documentation; 2) reliability and errors of measurement; 3) test development and revision; 4) scales, norms, and score comparability; 5) test administration, scoring, and reporting; and 6) supporting documentation for tests. The committee also recommended that all computer-based tests be used in accordance with fundamental measurement standards for test fairness, including fairness in testing and test use, in the rights and responsibilities of test takers, in testing individuals with diverse linguistic backgrounds, and in testing individuals with disabilities.
The Standards make only three specific references to computer-based testing:
Standard 3.12: "The rationale and supporting evidence for computerized adaptive tests should be documented. The documentation should include procedures used in selecting subsets of items for administration, in determining the starting point and termination conditions for the test, in scoring the test and for controlling item exposure." (AERA, APA, NCME, 1999, p. 45)
Standard 5.5: "Instructions to test takers should clearly indicate how to make responses. Instructions should also be given in the use of any equipment likely to be unfamiliar to test takers. Opportunities to practice responding should be given when equipment is involved, unless use of the equipment is being assessed." (AERA, APA, NCME, 1999, p. 63)
Standard 6.11: "If a test is designed so that more than one method can be used for administration or recording responses - such as marking responses in a test booklet, on a separate answer sheet, or on a computer keyboard - then the manual should clearly document the extent to which scores arising from these methods are interchangeable." (AERA, APA, NCME, 1999, p. 70)
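Standard 3.12 names the design decisions an adaptive test has to document: how items are selected, where the test starts, when it stops, how it is scored, and how item exposure is controlled. The following is only a rough sketch of how those five pieces might fit together, not a method taken from the Standards or the guidelines. It assumes a one-parameter (Rasch) model, expected a posteriori scoring with a standard normal prior, and a simple "pick at random among the few most informative items" exposure rule; the function names, the item pool, and every default value are hypothetical.

```python
import math
import random

# Ability grid from -4.0 to 4.0, used for numerical scoring (an assumption).
GRID = [g / 10.0 for g in range(-40, 41)]

def p_correct(theta, b):
    """Rasch model: chance that ability theta answers an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

def eap_estimate(responses):
    """Scoring rule: posterior mean and SD of ability under a N(0, 1) prior.
    responses is a list of (difficulty, score) pairs, score being 0 or 1."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)
        for b, score in responses:
            p = p_correct(theta, b)
            w *= p if score == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    est = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - est) ** 2 * w for t, w in zip(GRID, weights)) / total
    return est, math.sqrt(var)

def run_cat(pool, answer, start_theta=0.0, se_target=0.4,
            max_items=20, top_k=3, rng=None):
    """Administer items from pool (a list of difficulties) adaptively.
    answer(b) returns the examinee's 0/1 score on an item of difficulty b."""
    rng = rng or random.Random()
    theta, se = start_theta, float("inf")      # starting point
    remaining = list(pool)
    responses = []
    # Termination: precision reached, item limit reached, or pool exhausted.
    while remaining and len(responses) < max_items and se > se_target:
        # Selection: rank unused items by information at the current estimate.
        ranked = sorted(remaining,
                        key=lambda b: item_information(theta, b),
                        reverse=True)
        # Exposure control: choose randomly among the top_k candidates.
        b = rng.choice(ranked[:top_k])
        remaining.remove(b)
        responses.append((b, answer(b)))
        theta, se = eap_estimate(responses)    # rescore after each item
    return theta, se, responses

if __name__ == "__main__":
    pool = [x / 4.0 for x in range(-12, 13)]   # hypothetical 25-item pool
    # Hypothetical examinee who answers every item easier than 1.0 correctly.
    theta, se, administered = run_cat(pool, lambda b: 1 if b < 1.0 else 0,
                                      rng=random.Random(0))
    print(len(administered), round(theta, 2), round(se, 2))
```

Each commented step corresponds to one item in the Standard's documentation list, which is why even a small program like this would need a written rationale for its starting value, stopping rule, and exposure setting.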
The committee concluded that additional guidelines were needed to supplement, extend, and elaborate on the Standards when they were applied to computer-based tests. The published guidelines are separated into two parts. The first part provides general background and explanations of the guidelines and includes chapters called "Introduction," "Validity and Test Design," "Test Development and Analysis," and "Test Administration." The second part provides the specific computer-based testing guidelines. Part 2 includes chapters entitled "Planning and Design" (11 guidelines), "Test Development" (23 guidelines), "Test Administration" (18 guidelines), "Scoring and Score Reporting" (11 guidelines), "Psychometric Analysis" (13 guidelines), and "Stakeholder Communications" (7 guidelines). To illustrate the breadth of these guidelines, one guideline from each of the major categories follows.
"Planning and Design," guideline 1.4: "A wide variety of computer-based [tests] can be designed and developed to meet different purposes. The test specification for computer-based tests should include: the test purpose, the content domain definitions, the content structure for the test items, required response formats for the test items, sample test items illustrating the response formats, the number of items to be developed and administered, scoring and reporting formats and procedures, and test administration procedures. The test specification should be thoroughly documented."
"Test Development," guideline 2.1: "The test delivery environment should be evaluated before item authoring begins. It is important to make sure that items being created can be properly displayed in the test delivery environment and that test-taker input and results can be collected, aggregated, and reported. For example, graphics constraints need to be identified so that item writers do not create items that have too many colors or require a screen resolution that is too high for the current test delivery environment."
"Test Administration," guideline 3.1: "The test sponsor should provide test-takers with clear and concise information regarding procedures to register for an examination, obtain an authorization-for-testing document, and scheduling a test appointment."
"Scoring and Score Reporting," guideline 4.7: "The accuracy of computer scoring algorithms should be established prior to implementation of the computer-based test."
"Psychometric Analysis," guideline 5.1: "Determine appropriate reliability indices if different test-takers are given different items or exercises."
"Stakeholder Communications," guideline 6.2: "When appropriate, developers of computer-based tests should provide sufficient information concerning the test purpose, and test content specifications to test users prior to when the test is available for widespread administration. This test information should be kept accurate and as up-to-date as possible."
These six examples show how the guidelines supplement and elaborate on the Standards for individuals and organizations seeking to develop computer-based or Internet-based tests and assessments.
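Guideline 5.1 asks for an appropriate reliability index when different test-takers see different items, a situation in which a single fixed-form coefficient does not apply directly. One commonly used option, offered here only as an illustration and not as the index the guidelines prescribe, is a marginal reliability computed from each test-taker's ability estimate and its standard error: one minus the average error variance divided by the variance of the estimates. The function name and the sample numbers below are hypothetical.

```python
from statistics import mean, pvariance

def marginal_reliability(theta_hats, standard_errors):
    """Marginal reliability: 1 - mean(SE^2) / variance of ability estimates.

    Each test-taker contributes an estimate and its own standard error,
    so the index still works when every person saw a different item set.
    """
    if len(theta_hats) != len(standard_errors) or len(theta_hats) < 2:
        raise ValueError("need matching lists with at least two test-takers")
    observed_var = pvariance(theta_hats)
    if observed_var == 0:
        raise ValueError("ability estimates have no variance")
    error_var = mean([se ** 2 for se in standard_errors])
    return 1.0 - error_var / observed_var

# Hypothetical estimates and standard errors for three test-takers.
print(marginal_reliability([-1.0, 0.0, 1.0], [0.4, 0.4, 0.4]))
```

Because the standard errors enter person by person, the same calculation applies whether the item sets came from an adaptive algorithm or from randomly assembled forms.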
For further information on the guidelines, contact the Association of Test Publishers, 1201 Pennsylvania Avenue, Suite 300, Washington, DC, 20004. The phone number in the U.S. is 202-857-8444.
Comparisons of the ATP Guidelines with Previous Professional Efforts
Early efforts to determine technical guidelines for assessing computerized adaptive tests were summarized in a 1984 Journal of Educational Measurement article by Bert Green, Darrell Bock, Lloyd Humphreys, Robert Linn, and Mark Reckase. The article addressed technical guidelines related to dimensionality, measurement error, validity, estimation of item parameters, item pool characteristics, and human factors. Another pioneering article in this effort, "Developing Standards for Computerized Psychological Testing," was published by Paul Hofer in Computers in Human Behavior in 1985. It identified the need for standards in the categories of equivalence, test takers, format and content, and equipment and procedures. Then, in 1995, Barbara Plake, director of the Buros Institute of Mental Measurements at the University of Nebraska-Lincoln, chaired a task force for the American Council on Education that developed guidelines for computerized adaptive test development and use in education. Plake's presentation at the February 17, 2000, ATP conference noted that the ATP guidelines are organized consistently with the Standards, provide greater specificity, are more state-of-the-art, and cover more components of the testing and assessment process.
Computer-based testing and assessment is emerging as a significant professional field in educational measurement, and several indicators show its growth. There have been three major reference editions for educational measurement: in 1951, 1971, and 1989. The proportion of reference pages devoted to computer-based or machine-based testing in those editions was 3 percent in 1951, 8 percent in 1971, and 14 percent in 1989. By the turn of the millennium, at least 4.5 million computerized tests had been administered in professional testing and assessment centers. Internet-based testing programs are estimated to have administered at least several hundred thousand additional computer-based tests. Some school districts have also installed district-wide computerized testing systems. New and innovative Internet-based testing systems are being developed and are available for worldwide assessment applications.
To show how far we have come, here are some final quotes. Bert Green, in the conclusion of his comments on the significance and insights of Fred Mather Lord's paper on tailored testing, says: "The computer has barely started to establish itself in the testing business. As experience with computer-controlled testing accumulates, we can expect important changes in the technology of testing. Most of these changes lie in the future. Lord's results, clear-cut and devastating as they are, will in the end seem a minor skirmish in the inevitable computer conquest of testing." (Green, 1970, p. 194)
Finally, in a 1988 article, Samuel Messick states: "Over the next decade or two, computer and audiovisual technology will dramatically change the way individuals learn as well as the way they work. Technology will also have a profound impact on the ways in which knowledge, aptitudes, competencies, and personal qualities are assessed and even conceptualized. There will also come a heightened emphasis on individuality in assessment with a premium on the adaptive measurement, perhaps even dynamic measurement, of knowledge structures, skill competencies, personal strategies and styles as they develop with instruction and experience. But although the modes and methods of measurement may change, the basic maxims of measurement, and especially of validity, will retain their essential character. The key validity issues are the interpretability, relevance, and utility of scores, the import or value implications of scores as a basis for action, and the functional worth of scores in terms of social consequences of their use." (Messick, 1988, p. 33)
References
American Council on Education. Guidelines for Computerized Adaptive Test Development and Use in Education. Washington, D.C.: American Council on Education, 1995.
American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, D.C.: American Educational Research Association, 1999.
Association of Test Publishers. Guidelines for Computer-Based Testing. Washington, D.C.: Association of Test Publishers, 2000.
Fitzgerald, Cyndy. "Computer-Based Guidelines for the New Millennium." Paper presented at the Association of Test Publishers Conference on Computer-Based Testing: Applications for the New Millennium, February 17, 2000, Carmel Valley Ranch, Carmel, California.
Green, B.F. Jr. "Comments on Tailored Testing." In Computer-Assisted Instruction, Testing and Guidance, pp. 184-197, Wayne H. Holtzman, ed. New York: Harper and Row, 1970.
Green, B.F. Jr., R.D. Bock, L.G. Humphreys, R.L. Linn, and M.D. Reckase. "Technical Guidelines for Assessing Computerized Adaptive Tests." Journal of Educational Measurement 21, 347-360, 1984.
Hambleton, Ronald K. "Computer-Enhanced Assessments: Lots of Promise, but Many Problems to Be Overcome." Paper presented at the Association of Test Publishers Conference on Computer-Based Testing: Applications for the New Millennium, February 17, 2000, Carmel Valley Ranch, Carmel, California.
Hofer, Paul J. "Developing Standards for Computerized Psychological Testing." Computers in Human Behavior 1, 301-315, 1985.
Messick, Samuel. "The Once and Future Issues of Validity." In Test Validity, pp. 33-45, Howard Wainer and Henry Braun, eds. Hillsdale, N.J.: Lawrence Erlbaum, 1988.
Mills, Craig N. "Unlocking the Promise of Computer Based Testing." Paper and multimedia presentation at the Association of Test Publishers Conference on Computer-Based Testing: Applications for the New Millennium, February 17, 2000, Carmel Valley Ranch, Carmel, California.
Olsen, James B. "ATP Computerized Testing Guidelines: Current Educational Measurement Theory and New Standards." Paper presented at the Association of Test Publishers Conference on Computer-Based Testing: Applications for the New Millennium, February 17, 2000, Carmel Valley Ranch, Carmel, California.
Plake, Barbara S. "Evolution of Guidelines for Computer-Based Testing." Paper presented at the Association of Test Publishers Conference on Computer-Based Testing: Applications for the New Millennium, February 17, 2000, Carmel Valley Ranch, Carmel, California.