Technology Assisted Research Methodologies:

Michael S. Gendron
Management Information Systems
860-832-3293
http://wwwsb.ccus.edu/dataquality

Introduction

Organizations
face new and more complex ways of collecting data. The Internet provides many of these innovations,
while at the same time it provides an increased number of variables
that must be considered when choosing how to collect data. One of the
considerations is the collection of data with the appropriate level
of quality. This paper introduces the Technology Assisted Research Methodology
Data Collection Quality Model (TARM-DQM) for assessing the best technology
for Internet-based data collection. TARM (Technology Assisted Research
Methodologies) is the term we use to describe technologies that are
employed to collect and process survey data (D'Onofrio & Gendron, 2001). We
believe that the selection of the best Internet TARM implementation
can be formalized to enhance decision-making.
This paper presents a preliminary model that aids in understanding
the multi-attribute nature of data quality that informs the selection
of Internet TARM.

There are two basic types of sampling done when Internet TARM is employed
to collect data: 1) traditional random sampling, and 2) ad hoc data sampling.
We also see at least two types of Internet data collection activities:
1) email surveys, and 2) web surveys.

Sampling Methodologies

Traditional sampling methodology has evolved over many years, starting in the late nineteenth century; however, basic statistical techniques for probability sampling were first proposed by J. Neyman (Neyman, 1934). Random sampling is employed to ensure the validity, reliability, generalizability, and representativeness of the data collected. In short, it is used to make sure that the data collected describes the population well enough to be useful. Ad hoc sampling is where subjects occur naturally, and few or no statistical methods are used to ensure the validity, reliability, generalizability, and representativeness of the data collected. In other words, data is collected from subjects who happen to be available, without regard for ensuring that the sample is representative. This is very easy to do in Internet TARM implementations. One only has to envision a survey placed on a high-volume web site: all visitors are requested to complete the survey, but no attempt is made to select respondents, hence the name “ad hoc.”

Data Collection Activities

While
there are many types of data collection activities that can occur over
the Internet, we choose to emphasize two.
This paper discusses survey data collected by email and by the
web. In an email survey, subjects are sent the survey and are asked to return
it by email. For web-based surveys, subjects are contacted and asked to visit
a website, where they complete the survey.

The TARM-DQM Model

The
proposed model uses multi-attribute utility theory (Drummond, Stottard, & Torrance, 1994; Haddawy &
Haddawy, 1997) to determine the type of data collection that has the
highest data and information quality and to remove some of the subjectivity
from the decision process. This theory poses two problems, the representation
problem and the construction problem (Martin, 2000). The representation
problem involves determining the attributes that represent the
decision maker’s preferences so that they can be described by a function.
The construction problem involves the maximization of the function and
the estimation of its parameters.
Our goals are to further validate the data quality attributes,
thus solving the representation problem, and to create an orthogonal
set of scales that solves the construction problem while maintaining
the clear relative importance of attributes.

Table 1 - TARM-DQM

Q represents the output of a utility function that calculates data quality.
We propose that, by creating a function (Q), the data collection
option with the highest relative quality can be determined. Based on
work that defines data quality attributes (Wang & Strong, 1996) and our
own research, we propose that these decisions
can be made with a finite set of attributes and parameters. For each
of these attributes an importance rating (I) and a level (L) would be
derived, as shown in Table 2.
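
To make the roles of I and L concrete, the following minimal sketch assumes a weighted additive utility, in which Q is the sum of I × L over the attributes. That form is common in multi-attribute utility models, but it is an assumption made here for illustration, not a validated TARM-DQM formula. The attribute names, importance ratings, level values, and the mapping of levels onto a 0 to 1 range are all hypothetical and are not taken from Table 2.

```python
from typing import Dict, Optional

# Importance ratings (I): assumed 1-7 Likert values, or None when the
# attribute is rated "not applicable". Names and numbers are hypothetical.
importance: Dict[str, Optional[int]] = {
    "accuracy": 7,
    "timeliness": 4,
    "accessibility": None,
}

# Levels (L) for each data collection option, assumed to be already
# mapped onto a 0..1 scale. These values are made up for illustration.
levels: Dict[str, Dict[str, float]] = {
    "email survey": {"accuracy": 0.6, "timeliness": 0.9, "accessibility": 0.5},
    "web survey": {"accuracy": 0.8, "timeliness": 0.7, "accessibility": 0.9},
}


def quality_score(importance: Dict[str, Optional[int]],
                  option_levels: Dict[str, float]) -> float:
    """Assumed additive form: Q = sum of I * L over applicable attributes."""
    q = 0.0
    for attribute, rating in importance.items():
        if rating is None:
            # "Not applicable" attributes contribute nothing to Q.
            continue
        if not 1 <= rating <= 7:
            raise ValueError(f"importance for {attribute!r} must be 1-7")
        q += rating * option_levels[attribute]
    return q


def best_option(importance: Dict[str, Optional[int]],
                levels: Dict[str, Dict[str, float]]) -> str:
    """Return the name of the option with the highest Q."""
    return max(levels, key=lambda name: quality_score(importance, levels[name]))


if __name__ == "__main__":
    for name, option_levels in levels.items():
        print(name, round(quality_score(importance, option_levels), 2))
    print("highest Q:", best_option(importance, levels))
```

With these made-up inputs the web survey scores higher (Q of 8.4 against 7.8 for the email survey). Because the same importance ratings are applied to every option, only the levels differ between options, so the ranking is driven by how each option performs on the attributes the organization rates as most important.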
Earlier work suggests that the importance weightings (I) are
organization specific (Gendron & D'Onofrio, 2000). Further research is
needed to validate the metrics
required to ensure appropriate levels (L) for each cell of the model
(i.e., the levels (L) are specific to the decision choices explicated
in Table).

Table 2 – Data Quality Attributes and Parameters

Using a multi-attribute utility model, these parameters would give us the formula:

Importance (I) rating scales have been developed; they are Likert scales anchored
at 1 (Unimportant) and 7 (Extremely Important). Our current research
suggests that it is necessary to give respondents a not-applicable
choice as an alternative to the importance rating scale, as
they seem reluctant to rate any data quality attribute as unimportant. The scales for level (L) are under development and could be
either ordinal or nominal with mapped responses.

Implementation

An
organization presented with choices for data collection needs to make
a decision regarding which choice is best.
In our experience this is often an unstructured decision (Simon,
1977). This model attempts to remove some of the subjectivity and thus
enhance decision-making.

Use of the proposed model would proceed as follows:

To date the following has been done:

Conclusion

This
paper presents a multi-attribute decision model to assist in the selection
of the best data collection alternative.
This model is designed to provide managers with a tool to aid
in their decision-making.

References

D'Onofrio, M. J., & Gendron, M. S. (2001). Technology Assisted Research Methodologies: A historical perspective of technology-based data collection methods. Paper presented at the Internet Global Summit: A Net Odyssey - Mobility and the Internet - The 11th Annual Internet Society Conference, Stockholm, Sweden.

Drummond, M., Stottard, G., & Torrance, G. (1994). Methods of Economic Evaluation of Healthcare Programmes. New York: Oxford Press.

Gendron, M. S., & D'Onofrio, M. J. (2000). Data Quality in the Healthcare Industry: An Exploratory Study. Paper presented at the Systemics Cybernetics and Informatics, Orlando, FL.

Haddawy, V. H., & Haddawy, P. (1997). Problem-Focused Incremental Elicitation of Multi-Attribute Utility Models. Paper presented at the Thirteenth Conference on Uncertainty in Artificial Intelligence, Brown University, Providence, Rhode Island, USA.

Martin, M. (2000, May 10). Multi-criteria decision aid [available: http://www.agena.co.uk/mcda_article/mcda_intro3.html]. Agena [2001, 04/06].

Neyman, J. (1934). On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97, 558-606.

Simon, H. (1977). The New Science of Management Decision. Englewood Cliffs, New Jersey: Prentice-Hall.

Wang, R. Y., & Strong, D. M. (1996). Beyond accuracy: what data quality means to data consumers. Journal of Management Information Systems, 12(4), 5-34.