Technology Assisted Research Methodologies:
A historical perspective of technology-based data collection methods

Marianne J. D’Onofrio
Michael S. Gendron

Management Information Systems
Central Connecticut State University

860-832-3297
donofrio@ccsu.edu


Introduction

Technology has been used to supplement researcher activities for many years. Specifically, technology has been instrumental in providing new and innovative ways to collect and process data. Data collection techniques have changed over time with a focus on enhancing statistical validity and overcoming limited generalizability, while new processing methods have provided better and more sophisticated ways to analyze data. Both have attempted to increase efficiency and reduce costs. These processing methods and data collection techniques are referred to collectively as technology assisted research methods (TARM) in this paper.

Technology assisted research methodologies (TARM) include: 1) data collection methods (TARM-c) and 2) data processing methods (TARM-p). This paper provides a historical perspective for TARM-c and provides insight that should be carried forward as TARM-c is implemented using the Internet.

We use the term TARM-c for the data collection methods that have developed over a lengthy period of time. These methods are described in the timeline that follows.

TARM-c is a superset of methods, which contains traditional statistical sampling methods as well as ad hoc data collection methods. A framework for TARM-c is shown in Figure 1.

Figure 1. Framework for TARM-c (Technology Assisted Sampling Methodologies for Data Collection), which divides into two branches: TARM-ca (Ad Hoc Data Collection) and TARM-cs (Statistical Sampling and Data Collection).

Technology assisted ad hoc data collection (TARM-ca) is a term we use to describe data collection activities that do not have the statistical validity or reliability of data collected using statistical sampling techniques. TARM-ca can be referred to as selective non-sampling (that is, collecting data for a specific purpose, but without the rigor associated with the development of a sampling frame; the result is usually a convenience or non-random sample). Much of the data collected over the Internet is of this type. An example of Internet TARM-ca might be the creation of a web survey that is linked to your organization's home page, inviting participation by all who are interested. This yields a non-random sample of people who visit your home page and by definition has limited generalizability.

We refer to data collection assisted by technology that employs the rigor of statistical sampling as TARM-cs. An example of this would be the use of statistical sampling methods to create a sampling frame and then directing members of the sampling frame to a secure website for survey completion. You would then be able to generalize the results to your target population.

These are two dramatically different approaches within TARM. In one, TARM-ca, you potentially get large quantities of data from people who are interested in your web site, but you do not know much about them and have no assurance that they represent the underlying population. In the other, TARM-cs, your resultant data represent the underlying population from which you drew the sample, and you are thus able to generalize the results to that population.
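The contrast between the two approaches can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not a method taken from this paper: the population, the "satisfaction score," the rule that more satisfied people are more likely to visit the web site, and all function names are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def build_population(n, rng):
    """Create a hypothetical population of n people.

    Each person has a satisfaction score between 0 and 10.
    Assumption: people with higher scores are more likely to visit
    the web site and answer the survey.
    """
    people = []
    for person_id in range(n):
        score = rng.uniform(0, 10)
        visit_prob = 0.05 + 0.04 * score  # ranges from 0.05 to 0.45
        people.append({"id": person_id, "score": score, "visit_prob": visit_prob})
    return people


def ad_hoc_sample(population, rng):
    """TARM-ca style: respondents select themselves by visiting the site."""
    return [p for p in population if rng.random() < p["visit_prob"]]


def frame_sample(population, k, rng):
    """TARM-cs style: simple random sample of k members of the sampling frame."""
    return rng.sample(population, k)


rng = random.Random(42)  # fixed seed so the run is repeatable
population = build_population(10_000, rng)

true_mean = mean(p["score"] for p in population)
ca_respondents = ad_hoc_sample(population, rng)
cs_respondents = frame_sample(population, 500, rng)

ca_mean = mean(p["score"] for p in ca_respondents)
cs_mean = mean(p["score"] for p in cs_respondents)

print(f"Population mean:            {true_mean:.2f}")
print(f"TARM-ca (self-selected):    {ca_mean:.2f}  n={len(ca_respondents)}")
print(f"TARM-cs (random from frame): {cs_mean:.2f}  n={len(cs_respondents)}")
```

Under these assumptions the self-selected sample is much larger, yet its mean drifts away from the population mean, while the smaller random sample drawn from the frame stays close to it. The size of the gap depends entirely on the assumed response rule.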

Timeline

Today, it is incumbent on us to examine the Internet for implementations of TARM. Both TARM-ca and TARM-cs can be performed using the Internet, but it is helpful to reflect on how technology has been used to collect data over the last century. There are valuable lessons to learn from earlier implementations that can be carried forward to Internet TARM.

Sampling theory can be traced to the late nineteenth century; however, basic statistical techniques for probability sampling were first proposed by J. Neyman (Neyman 1934) in his seminal work “On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection.” Ensuing years gave us statistical theory for many new sampling methods, including cluster sampling, multistage sampling, replicated sampling, and systematic sampling. During the years that these theories were developed, acceptable sampling methods were galvanized by statisticians’ attempts to correctly identify presidential election victors. In the 1970s social scientists were confronted with new challenges; namely, to collect more complex data via survey samples and to perform more complex statistics (Frankel 1987). Today, researchers are faced with new challenges: Does the Internet make data collection more cost-effective and efficient? If so, can we collect even more and better data and perform more complex studies? However, before we answer these questions, we should understand how technology has been used to collect data. The major milestones of computer assisted survey information collection are delineated in Table 1. The insights gained from these activities should be carried forward to TARM.

Table 1
Adapted from (Couper and Nicholls 1998)

YEAR           MAJOR MILESTONE
1966           First nationwide telephone interviewing facility established.
1971           R.M. Gryb of AT&T proposes first form of Computer Assisted Telephone Interviewing (CATI). First CATI used by Chilton.
1978           First large-scale academic production use of CATI: California Disability Survey conducted by the Survey Research Center at Berkeley and UCLA.
Late 1970s     Computer-Assisted Personal Interviewing (CAPI) became widely understood. Shanks, as quoted in
Early 1980s    First desktop computers introduced.
Mid-1980s      Field interviewers use laptops for face-to-face interviews.
Late 1980s     Computerized Self-administered Questionnaires (CSAQ) were introduced (Couper and Nicholls 1998).
Early 1990s    Internet was introduced.
1990s          Internet grew and its potential for data collection was realized.
Today          A need to understand TARM-ca and TARM-cs and how they can be best implemented through the Internet.

Table 1 shows growth from operator-run technology assisted interviewing in the 1970s to computer self-administered questionnaires in the late 1980s and early 1990s. In the early 1990s the possibilities for using technology for data collection grew rapidly as the Internet became ubiquitous. The years up to the mid-1980s were the OPERATOR YEARS; in the mid-1980s the notion of self-administered surveys was introduced. It should be noted that all of these techniques involved the creation of a valid sampling frame.

In the 1990s the Internet came into fashion, with all it portends for data collection. However, without the 1980s shift in values from technology assisted operator-run surveys to technology assisted self-administered surveys, the interest in using the Internet to collect data might not have developed.

Conclusion

The Internet has made it easier than ever for untrained individuals to collect massive amounts of data. For instance, if you create a website that is fortunate enough to generate large amounts of traffic, you have a ready-made sample. It may seem self-evident that you could retrospectively treat all individuals who visit your website as your population, but there are many issues that this logic does not address. Specifically, the issues of statistical validity, reliability, and generalizability need to be considered.

We have shown a model of TARM that incorporates data collection and data processing, and that the data collection subset (TARM-c) includes two types: TARM-ca (ad hoc data collection) and TARM-cs (sampling frame based data collection). TARM-c incorporates all of the technologies in Computer Assisted Survey Information Collection (CASIC), including Computer Assisted Telephone Interviewing (CATI), Computer Assisted Personal Interviewing (CAPI), and Computerized Self-administered Questionnaires (CSAQ), as well as the host of Internet-based data collection technologies. However, TARM-c adds the distinction between ad hoc data collection and sampled data collection.

Given the ease of data collection over the Internet, it is incumbent on us to carefully examine and understand how we employ TARM. This is especially important to organizations when business decisions are made using the data generated through an Internet-based survey.

References

Couper, M. and W. Nicholls, Eds. (1998). The history and development of Computer Assisted Survey Information Collection Methods. Computer Assisted Survey Information Collection, John Wiley & Sons, Inc.

Frankel, M. R. and L. R. Frankel (1987). "Fifty years of survey sampling in the United States." Public Opinion Quarterly 51(4): S127-S138.

Neyman, J. (1934). "On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection." Journal of the Royal Statistical Society 97: 558-606.