What data were collected, and how were they collected?
Most education research studies attempt to connect a treatment to a result. This result is called the dependent variable and refers to what is being measured in a research study. Data make up the body of information produced by these measures. Student achievement and teacher classroom practices are examples of dependent variables in education research. The researcher should provide operational definitions for all dependent variables in the study. Valid conclusions can be made only about the dependent variables that are measured in the study. For example, if the dependent variable is type of instruction, then a conclusion about student achievement is invalid.
Data-collection procedures refer to how and when the data were collected. The procedures used to collect data can influence research validity. For example, whether or not participants were guaranteed anonymity affects whether participants are honest in their responses to surveys. The time and frequency of classroom observations influence the type of data obtained from the observations. A classroom observation conducted the day before spring break is unlikely to provide valid data about a teacher’s instruction.
The most commonly used data-collection instruments in education research are the following:
- Tests
- Scaled Questionnaires
- Surveys
- Interviews
- Observations
It is critical that data-collection instruments have both validity and reliability. In general, instruments have validity when they measure what they are designed to measure. For example, results for 9th graders on a test of algebraic ability should be similar to their results on other tests of algebraic ability (e.g., test items on the Third International Mathematics and Science Study). Instruments are reliable if repeating a measurement within a short time span produces the same result. It is the responsibility of the researcher to report data on the validity and reliability of the instruments used for data collection in a study.
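One common way to put a number on "repeating a measurement produces the same result" is to correlate scores from two administrations of the same instrument (often called test-retest reliability). The sketch below is only an illustration under that assumption: the function name `pearson_r` and the score lists are hypothetical, and a real study might report a different reliability statistic.

```python
from math import sqrt
from statistics import mean


def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    mean_first, mean_second = mean(first), mean(second)
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_first) * (b - mean_second)
                for a, b in zip(first, second))
    spread_first = sqrt(sum((a - mean_first) ** 2 for a in first))
    spread_second = sqrt(sum((b - mean_second) ** 2 for b in second))
    if spread_first == 0 or spread_second == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / (spread_first * spread_second)


# Hypothetical scores for five students who took the same test twice,
# two weeks apart (invented for illustration only).
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]

r = pearson_r(first_administration, second_administration)
print(f"test-retest correlation: {r:.2f}")
```

A value close to 1 suggests that students kept roughly the same rank order across the two administrations; it says nothing about whether the test measures what it claims to measure, which is a validity question.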
Caution:
Do not be fooled into thinking that because an instrument has a name, it is a valid measure of what is named. For example, an instrument called a “Test of Teacher Content Knowledge” is not necessarily a test that actually measures teacher content knowledge.
Hint:
Because there are so many things that can vary during a research study, a pilot test or a field test can increase the probability that measures are appropriate and that conclusions will be valid. Both types of tests refer to trial runs of all or some parts of a study. Data-collection instruments frequently undergo field testing to establish their validity and reliability. For example, prior to publishing a test, commercial-test developers conduct extensive field testing to demonstrate that the test is valid for its designed use and that test results are reliable.
Tests
With the current emphasis on accountability in education, tests (also known as assessments) are common data-collection instruments in education research. Most standardized tests are produced by commercial test developers who administer them to large samples of participants. The developers then analyze the results to determine the tests’ validity and reliability. Researchers who use a commercial test for a study should either summarize the information on validity and reliability or direct the reader to a source for obtaining it. To judge the validity of conclusions about test results, it is also necessary to know whether the test is norm-referenced or criterion-referenced. In addition, it is important to know for what uses a test was developed. A test that is a valid measure of algebraic ability might not be a valid measure of the ability to teach algebra.
Scaled Questionnaires
Scaled questionnaires (also called attitude scales) are often used to measure attitudes and beliefs. Most scaled questionnaires use a Likert scale, in which respondents are given choices reflecting varying degrees of intensity. For example, researchers have developed scaled questionnaires to measure school culture using items such as the following:
- In this school, staff members are recognized when they do a task well.
Choose one: Strongly Disagree, Disagree, Agree, Strongly Agree
Scaled questionnaires have the same validity and reliability requirements as tests. For example, what is the evidence that a school culture scale is actually measuring school culture and not some other property or characteristic of the school, such as material wealth? How a scaled questionnaire is used in a study also affects research validity. A scaled questionnaire developed to measure school culture might not have any relationship to leadership or student achievement, yet sometimes a researcher will make such unwarranted conclusions. The conclusions of a research study can be invalid despite the use of a valid data-collection instrument if the conclusions extend beyond the limits of what was measured.
Here’s an example of how scaled questionnaires are developed:
A scaled questionnaire designed to measure school culture might ask teachers and administrators questions such as the following:
- In this school, staff members are recognized when they do a task well.
Choose one: Strongly Disagree, Disagree, Agree, Strongly Agree
- I feel comfortable about discussing my concerns in this school.
Choose one: Strongly Disagree, Disagree, Agree, Strongly Agree
To develop a scaled questionnaire (also called an attitude scale), a researcher asks a large sample of participants to respond to a large number of items the researcher has judged to have content validity with regard to a particular concept. For example, the researcher might verify with practitioners and other researchers that the items concern aspects of school culture. Next, the researcher often reduces the number of questionnaire items through a statistical procedure called factor analysis, which results in a small number of factors that relate to school culture. The researcher might call one factor “staff relations” because it consists of eight items that have to do with staff interactions. In studies where factor analysis has been used, it is important to identify the actual questionnaire items that make up a factor. Sometimes the name that the researcher gives to the factor might not reflect what was asked of participants. For example, questionnaire items for “staff relations” might ask participants only about interactions with the principal and not about interactions with teachers. It also is important to examine the reliability coefficient for each factor to determine how strongly the questionnaire items that represent a factor are related to one another. A low reliability coefficient (e.g., less than .50) means that the factor is not representative of the questionnaire items.
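The text does not say which reliability coefficient a study should report. A widely used one for the items within a factor is Cronbach's alpha, and the sketch below assumes that choice. It also assumes the four response options are coded 1 (Strongly Disagree) through 4 (Strongly Agree). The function name and the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one factor.

    `responses` is a list of rows, one row per respondent, each row
    holding that respondent's numeric answers to the factor's items.
    """
    if len(responses) < 2 or len(responses[0]) < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    n_items = len(responses[0])
    # Transpose rows of respondents into columns of items.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical answers from six staff members to three "staff relations"
# items, coded 1 = Strongly Disagree ... 4 = Strongly Agree.
staff_relations = [
    [4, 4, 3],
    [3, 3, 3],
    [2, 1, 2],
    [4, 3, 4],
    [1, 2, 1],
    [3, 4, 3],
]

alpha = cronbach_alpha(staff_relations)
print(f"reliability coefficient (alpha): {alpha:.2f}")
```

A reader can compare the reported coefficient for each factor against the kind of cutoff mentioned above (e.g., .50) to judge whether the items grouped under a factor name actually hang together.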
Surveys
Surveys are widely used in education research, particularly in descriptive research studies. The key to a good survey is its design. The survey items should be carefully chosen to produce the data needed to answer the research questions. Survey items should be clear and should not bias a respondent toward particular answers (such as socially desirable responses). When the survey is the main data-collection instrument in a study, the researcher should include the survey in an appendix or make it available upon request. When a survey is mailed as a questionnaire rather than administered in person, a frequent problem is low response rate. Studies that use mailed questionnaires should always report the response rate and discuss the implications if it is low (i.e., less than 75%). If the response rate is low, the results might not be representative of the group of persons to whom the questionnaire was mailed. It is particularly important to know in a comparative descriptive study whether the response rates were different for the different groups.
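The response-rate check described above is simple arithmetic: questionnaires returned divided by questionnaires mailed, computed separately for each group in a comparative study. The following sketch uses the 75% figure from the text as the cutoff; the group names and counts are invented for illustration.

```python
LOW_RESPONSE_THRESHOLD = 0.75  # the "less than 75%" guideline from the text


def response_rate(returned, mailed):
    """Fraction of mailed questionnaires that were returned."""
    if mailed <= 0 or not 0 <= returned <= mailed:
        raise ValueError("counts must satisfy 0 <= returned <= mailed, mailed > 0")
    return returned / mailed


# Hypothetical counts: (questionnaires returned, questionnaires mailed).
groups = {
    "teachers": (120, 200),
    "administrators": (45, 50),
}

for name, (returned, mailed) in groups.items():
    rate = response_rate(returned, mailed)
    label = "low" if rate < LOW_RESPONSE_THRESHOLD else "adequate"
    print(f"{name}: {rate:.0%} ({label})")
```

In this made-up case the two groups differ sharply in response rate, which is exactly the situation in which a comparison between them deserves extra caution.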
Interviews
Interviews are surveys that are administered verbally, either individually or in groups. An interview protocol can be structured or unstructured. Interviews are more reactive measures than are paper-and-pencil questionnaires. For this reason, interviewers should have training in conducting the interview. This is especially true when more than one interviewer is gathering data. If the interviewers are not asking the questions in the same way, comparisons of data across different interviewees will be invalid. The researcher should describe the interviewer training in the research report and should include the interview protocol in an appendix or provide it upon request.
Hint:
A focus group is a group of participants who are interviewed together and encouraged to share their opinions on a specific topic, which is the focus of the interview. The interviewer (also called the moderator) should have training in conducting this type of interview because adequately and accurately capturing the discussion is not a simple matter.
Observations
Observation protocols are instruments used to document observations, usually in classrooms. A good observation protocol has clear operational definitions of the behaviors to be observed, as well as guidelines for recording the frequency of each behavior. For example, an observation protocol for a study of teachers’ instructional practices should list the various expected teaching behaviors (e.g., small-group discussion), provide operational definitions of each behavior (e.g., three to six students discussing problems), and indicate the length of each observational period (e.g., two hours) as well as the frequency of the observations (e.g., two times each week for four weeks). The researcher should provide information about the inter-rater reliability of the observation protocol. If multiple observers are used in a study and the observers do not agree on what they are observing, conclusions about the observational data will be invalid.
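Inter-rater reliability can be reported in several ways, and the text does not name one. Two common choices are simple percent agreement and Cohen's kappa, which adjusts agreement for what two observers would be expected to agree on by chance. The sketch below assumes two observers coding the same observation intervals; the function names, behavior codes, and data are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of intervals on which the two observers recorded the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty lists of codes")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same code
    # if each assigned codes independently at their own base rates.
    expected = sum(counts_a[code] * counts_b[code]
                   for code in counts_a.keys() | counts_b.keys()) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codes from two observers for the same eight intervals.
observer_1 = ["lecture", "small-group", "small-group", "lecture",
              "seatwork", "small-group", "lecture", "seatwork"]
observer_2 = ["lecture", "small-group", "lecture", "lecture",
              "seatwork", "small-group", "lecture", "small-group"]

print(f"percent agreement: {percent_agreement(observer_1, observer_2):.0%}")
print(f"Cohen's kappa: {cohens_kappa(observer_1, observer_2):.2f}")
```

Kappa is usually lower than percent agreement because some matches would occur even if the observers were guessing; a report that gives only percent agreement may therefore overstate how consistently the protocol was applied.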