This is the second in a series of three reports about the research on teaching quality that the Education Commission of the States (ECS) is producing through a grant from the U.S. Department of Education's Fund for the Improvement of Education (FIE). This report focuses on teacher recruitment and retention. The first report in the series, Eight Questions on Teacher Preparation: What Does the Research Say?, was completed in July 2003. It can be viewed online at and a print version purchased from ECS at that same Web address. The final report will focus on teacher certification and licensure, and should be available by fall 2005.

The reports are intended to guide policymakers, educators and foundation officials in their efforts to improve the quality and supply of America's teacher workforce. ECS also hopes the reports will help researchers and others strengthen the knowledge base that underlies policy and practice, and ensure research in the field better addresses the needs and interests of practitioners and, especially, policymakers.

Among ECS' constituents (governors, legislators, state school chiefs and other political and education leaders), the issue of teaching quality consistently ranks as one of the top concerns. This is no doubt due in part to the shortage of well-qualified teachers faced by virtually every state to one degree or another. It also is due to the persuasive and growing body of evidence that teacher effectiveness is the single most important educational factor in children's achievement in school. Without reliable guidance and the ultimate success of efforts to strengthen teacher quality and supply, however, policymakers and education leaders may turn their attention away from this issue, in spite of its fundamental importance, and pursue other strategies for improving education.

It is hoped this report and the other two in this series can indeed begin to offer the information so greatly needed. This report, as a starting point, presents an assessment of the current baseline of the research knowledge relating to specific questions about teacher recruitment and retention. As research continues, the report will need to be revised and updated periodically to reflect new studies that may shed light on the questions under consideration here or on other questions about teacher recruitment and retention that may emerge over time.

The report also indicates where there is insufficient research to answer the questions asked. This has implications not only for efforts to ground policy decisions in solid evidence but also for the assessment of what additional research is needed to provide stronger evidence and more satisfactory answers.

How To Read the Report

The report is structured around the discussion of eight questions, each of which can be read independently of the others. The discussion of each question allows for both a quick summary reading (in the section called Quick Answer) and a more in-depth exploration. A Summary of Studies provides an overview of the key findings and conclusions related to each question. Before delving into the discussions of the specific questions, however, it is recommended the reader turn to the chapter titled About the Eight Questions.

In addition to the specific discussions of each of the eight questions, the report includes other material that enhances the understanding of the reader and discusses the larger implications of the report's findings. The Introduction provides an overview of the issues involved in teacher recruitment and retention, and discusses the role of research in policy decisions. The concluding chapter, Improving the Research on Teacher Recruitment and Retention, contains some suggestions for making research on the issue more rigorous, more complete and more useful to policymakers. A more general discussion about improving the research on education, including suggestions for roles various stakeholders can play in such an effort, is found in the earlier report, Eight Questions on Teacher Preparation.

Because the report deals with highly technical issues and material, the use of technical terms was unavoidable. Terms relating to research are italicized in blue text (e.g., term). Except in the summaries of individual research studies, however, they are noted only the first time they appear in a given section of the report. Clicking on the identified term causes a window to appear with the more complete Glossary definition. The Glossary also can be viewed independently.

This report may be used in conjunction with A Policymaker's Primer on Education Research: How To Understand, Evaluate and Use It, which ECS and Mid-continent Research for Education and Learning (McREL) developed jointly, to help policymakers and others understand the subtleties of scientific research and be more confident in assessing and using it. The Primer, which was written by Patricia A. Lauer, is accessible online at and available in an abridged version at that Web address.

This report notes those instances, via the use of a colored asterisk ( * ) followed by colored text, where the Primer can provide the reader with a more in-depth understanding of the related methodological issues.

The Basis for the Report

This report relies heavily on a review of the research literature on teacher recruitment and retention that ECS commissioned from a research team at RAND that included Cassandra Guarino, Lucrecia Santibañez, Glenn Dailey and Dominic Brewer. The researchers, who have outstanding reputations as social scientists, sought to be as objective as possible in carrying out their review. They employed rigorous criteria in the selection and analysis of the studies they reviewed; the criteria are summarized in the next section below and explained in more detail in Appendix A. Moreover, their work was itself reviewed anonymously by three prestigious outside scholars prior to completion of the final manuscript. The original RAND review, titled A Review of the Research Literature on Teacher Recruitment and Retention, is available from RAND at

In addition to the RAND review, ECS commissioned Richard Ingersoll and Jeffrey Kralik, of the University of Pennsylvania, to conduct a review of research on induction and mentoring that also employed rigorous, though somewhat different, criteria of analysis and is used here as a supplement to the RAND report. That review, The Impact of Mentoring on Teacher Retention: What the Research Says, is found online on ECS' Web site at

The summaries of the research in this report generally mirror those of the RAND researchers, although they may differ in some details. The present report frequently differs from the RAND study in the conclusions that can be drawn from the research. Moreover, the present report goes well beyond the scope of its predecessor in attempting to assess the implications of the research for developing relevant public policy. Those implications are based upon the author's own knowledge and understanding of the constellation of policy issues surrounding teacher recruitment and retention. And while the author has tried to be as fair and objective as possible in drawing the implications of research for policy, those implications ultimately reflect the author's own perceptions. Prior to its final release, however, the present report was reviewed in its entirety both within ECS and by external experts to minimize errors and identify unsound or unwarranted conclusions.

This report concludes with a discussion of some of the major shortcomings of research on teacher recruitment and retention and with a set of recommendations for strengthening it. While many of the recommendations for improving the research are based on the conclusions of both the RAND researchers and Ingersoll and Kralik, other recommendations grow out of several meetings with researchers and policymakers that ECS convened in 2002 as part of the larger project of which the present report is a part.

How Was the Research Selected?

The overwhelming bulk of the research included for review in this report was selected by the researchers at RAND, and the report relies heavily on their judgment as to the appropriate inclusion criteria. For some questions, however, additional literature from the previous ECS report on the research on teacher preparation was included. In addition, on rare occasions, the author used his discretion to add literature that was published after the RAND review or recommended by outside reviewers. All the literature reviewed for the present report is empirical research: studies whose conclusions rest on evidence from systematic observation, as opposed to articles that are based on opinion and use other studies for support. Non-empirical pieces can be quite helpful in clarifying issues conceptually, but since this report addresses empirical questions, it seeks to provide empirical evidence.

The RAND researchers ultimately selected 91 studies for inclusion in their review, out of 1,780 potential candidates. Most potential candidates were eliminated either because they were non-empirical or because they lacked the characteristics of sound scholarship. That left just over 300 studies, from which the RAND researchers chose the final 91 empirical studies based on their judgment as to whether each study met the following criteria:

  • Published in high-quality, peer-reviewed publications or by organizations with a well-established peer-review process
  • Original studies and not reviews of other work
  • Current [published since 1980] and not superseded by later studies
  • Addressed precisely the research questions asked
  • Research design and analysis employed were appropriate to the topic under study.

For quantitative research, several additional considerations determined whether or not a study was included for review:

  • The sample used in the analyses was of adequate size and was appropriately selected and surveyed
  • The variables used were reliably measured with a high degree of validity
  • The statistical model used in the analysis was judged to be largely free from bias or to address the likely sources of any bias, and it neither omitted relevant variables nor included irrelevant ones
  • The conclusions offered in the study neither overstated nor misinterpreted the findings. (In some cases, such a failing did not eliminate a study from consideration but merited an appropriate critique in the original RAND review.)

For qualitative research, there were additional criteria used to decide whether or not to include a study in the review:

  • Qualitative methods were employed either because the study used a small sample, considered data that were difficult to quantify or addressed phenomena for which no previous hypotheses had been developed
  • Adequate empirical evidence and strong analysis were presented in support of the conclusions drawn
  • The hypotheses formulated were relevant, or the interpretations drawn were informative for other researchers in the field.
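
As a rough illustration only, the general criteria listed earlier in this section can be pictured as a simple screening filter. The sketch below is not the RAND team's procedure: the `Candidate` record, its field names and the `screen` function are hypothetical, and each flag stands in for what was in practice a reviewer's judgment. The additional quantitative and qualitative considerations are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical record of one candidate study; all fields are assumptions."""
    title: str
    empirical: bool       # based on systematic observation
    peer_reviewed: bool   # peer-reviewed outlet or established review process
    original_study: bool  # not a review of other work
    year: int             # publication year
    superseded: bool      # replaced by a later study
    on_topic: bool        # addresses the research questions asked
    sound_design: bool    # design and analysis appropriate to the topic


def passes_general_criteria(c: Candidate) -> bool:
    """Apply the general criteria; 'current' is read as published since 1980."""
    return (c.empirical and c.peer_reviewed and c.original_study
            and c.year >= 1980 and not c.superseded
            and c.on_topic and c.sound_design)


def screen(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that satisfy every general criterion."""
    return [c for c in candidates if passes_general_criteria(c)]
```

A single failed criterion removes a study, which is consistent with the steep narrowing from 1,780 candidates to 91 included studies.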

A more detailed summary of the inclusion criteria employed by the RAND team appears in Appendix A. The criteria used for the review of the literature that was added from the Eight Questions on Teacher Preparation report were similar to those used by the RAND researchers and found in Appendix A of that report.

The Ingersoll and Kralik review used somewhat different inclusion criteria, which are detailed in the ECS report, The Impact of Mentoring on Teacher Retention: What the Research Says. The two key differences between the criteria used by Ingersoll and Kralik and those used by the other researchers are (a) Ingersoll and Kralik did not restrict their review to published, peer-reviewed literature, and (b) Ingersoll and Kralik reviewed only quantitative studies that involved a comparison between one or more groups who received induction and mentoring and a group who did not.

While the present review cannot claim to be absolutely exhaustive, it is hoped it includes virtually all the highest-quality relevant literature published up through 2004. A complete list of the sources reviewed for this report appears in the References section. There were 91 empirical studies reviewed for this report.

To be sure, relying only on published literature invites a bias in favor of research that is of interest to an academic audience and that supports traditionally held positions. And it excludes a good deal of the local research and evaluation studies that teacher educators or other researchers conduct in relative obscurity. This is one of the advantages of the Ingersoll and Kralik review, which did look at studies that were not peer reviewed, though this was largely because there were so few studies, published or not, that met its specific criteria for inclusion.

In general, however, the value of peer review is that it screens out work of inferior quality and work that has a strong advocacy, rather than scientific, orientation. Moreover, a good deal of local research relies on a set of experiences and assumptions that are often not widely shared outside a local context, so the wider significance or external validity of such local studies is often very limited. Finally, it would require an enormous amount of time (and a significantly greater expense) even to locate unpublished (or "fugitive") literature. Thus, the restriction of the review to published peer-reviewed literature gives it at least an initial assurance of quality and seemed a reasonable and cost-efficient limitation.

How Was the Research Evidence Assessed?

Assessing how well the research responds to the eight key questions is tricky. The reader will note frequent observations throughout this report about the implications or limitations of the research. These observations often draw on the assessments provided by the RAND researchers in their original research review.

This report attempts to provide an overall evaluation of how strongly the body of studies relevant to a specific question points to a particular answer. How to undertake such an overall evaluation, or synthesis, of the research is a subject of intense scientific discussion in and of itself. Even among research methodologists who consider only quantitative research, there are disagreements about proper procedure. When, as in the present case, both quantitative and qualitative research are involved, and when there is little experimental research that stands above the rest in identifying cause-and-effect relationships, an assessment of the strength of the research base is that much more difficult.

Some researchers employ an approach called meta-analysis to provide a quantitative, statistical summary of the combined results of multiple studies related to the same question. The RAND researchers did not use this approach, however, because the questions being addressed were somewhat broader than those typically addressed in meta-analysis and because the outcome measures in the studies were so varied. Following their example, this report also does not use meta-analysis, but rather relies on a non-statistical and less formal approach to summarize the aggregate evidence provided by the research.
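
For readers unfamiliar with the technique, the sketch below shows one common form of meta-analysis, fixed-effect inverse-variance pooling, in which each study's effect estimate is weighted by the inverse of its squared standard error. It is offered only to make the term concrete. It is not a method used in this report or in the RAND review, and the function name and any numbers passed to it are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pooled_effect(effects: Sequence[float],
                  std_errors: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance pooling of study-level effect estimates.

    Returns the pooled estimate and its standard error.
    """
    if len(effects) != len(std_errors) or not effects:
        raise ValueError("need one standard error per effect estimate")
    # More precise studies (smaller standard error) receive larger weights.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)
```

Pooling of this kind presumes that the studies measure a comparable outcome on a common scale, which is the condition the RAND researchers judged was not met here.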

Because the primary purpose of this report is to provide an assessment of the relevant research for policymakers, the designations of the strength of the research are intended to be utilitarian. The criteria employed in making these judgments are certainly not the only ones possible, and accomplished researchers certainly may quibble with the overall assessments given. Hopefully, however, the criteria used here provide a reasonable comparative evaluation and a practical and comprehensible shorthand indication for policymakers who want to use the research evidence in making policy decisions.

The designations of the strength of the research support used in answering the eight questions are as follows:

  • The research was considered to offer strong support or evidence for a conclusion if (1) there were several solid experimental studies or quasi-experimental studies that supported it; and/or (2) there were a significant number of correlational studies that supported it involving advanced statistical approaches such as regression analysis; and (3) there were very few, if any, studies that cast doubt upon the conclusion. In other words, there needed to be an unequivocal pattern of support for the conclusion on the basis of solid quantitative research.
  • The research was considered to offer moderate support or evidence for a conclusion if it did not meet the criteria for strong support, but (1) there were one or more solid experimental studies or quasi-experimental studies that supported it; and/or (2) there were more than several correlational studies that supported it involving advanced statistical approaches; (3) there were few studies that cast doubt upon the conclusion; and (4) in borderline cases, especially if there was disagreement among studies, there were descriptive studies present that made it more plausible that certain correlations were based upon a true causal relationship. In other words, there needed to be a clear pattern of support for the conclusion on the basis of solid quantitative research.
  • The research was considered to offer limited support or evidence for a conclusion if it did not meet the criteria for moderate support, but (1) there was at least one solid experimental study or quasi-experimental study that supported it; and/or (2) there were several correlational studies that supported it involving advanced statistical approaches; (3) there was a preponderance of descriptive studies that supported it; and (4) there was considerably weaker evidence in support of any conflicting conclusion.
  • If the research for any conclusion did not at least meet the standard of providing limited support, then it was regarded as being inconclusive. This could be the case both when only one or two studies supported a conclusion and when there were not significantly more studies that supported one conclusion than supported one or more opposing conclusions.
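
The designations above amount to a tiered decision rule, which can be sketched as follows. This is one possible reading rather than the procedure actually applied in the report, where the designations were matters of judgment. The report gives no numeric thresholds, so every cutoff constant below is an assumption, as are the `EvidenceTally` and `rate_support` names. The sketch treats "and/or" as "either," omits the borderline-case clause for moderate support, and reads the descriptive-study clause for limited support as one more alternative base of support.

```python
from dataclasses import dataclass

# Hypothetical cutoffs: the report uses words such as "several" and
# "very few" and gives no numbers, so every constant here is an assumption.
SEVERAL = 3
SIGNIFICANT_NUMBER = 8
VERY_FEW = 1
FEW = 2


@dataclass
class EvidenceTally:
    experimental: int   # supporting experimental or quasi-experimental studies
    correlational: int  # supporting correlational studies with advanced statistics
    descriptive: int    # supporting descriptive studies
    contrary: int       # studies that cast doubt on the conclusion


def rate_support(t: EvidenceTally) -> str:
    """Return 'strong', 'moderate', 'limited' or 'inconclusive'."""
    if ((t.experimental >= SEVERAL or t.correlational >= SIGNIFICANT_NUMBER)
            and t.contrary <= VERY_FEW):
        return "strong"
    if (t.experimental >= 1 or t.correlational > SEVERAL) and t.contrary <= FEW:
        return "moderate"
    supporting = t.experimental + t.correlational + t.descriptive
    has_base = (t.experimental >= 1 or t.correlational >= SEVERAL
                or t.descriptive >= SEVERAL)
    # "Considerably weaker" conflicting evidence is read here as at most
    # half as many contrary studies as supporting ones (an assumption).
    if has_base and 2 * t.contrary <= supporting:
        return "limited"
    return "inconclusive"
```

Checking the tiers from strongest to weakest mirrors the wording of the designations, each of which applies only when the criteria for the tier above it are not met.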

It should be noted that although answers to several questions were judged to have strong research support, there was no experimental research at all in the relevant literature reviewed for this report. That absence is particularly lamentable in the case of induction and mentoring, given the strong interest in it on the part of educators and policymakers. Thanks to funding from the U.S. Department of Education's Institute of Education Sciences, however, experimental research on induction and mentoring is now under way. By contrast, it would be extraordinarily difficult to carry out experimental studies of the impact of compensation policies, school factors and the like, because the kind of controlled situation required to carry out such studies is not easily established. On such issues, good correlational research may be the best that can be accomplished.

Though the body of literature reviewed for this report lacks experimental studies, the rigor and sophistication of many of the statistical studies included is impressive. There are three caveats that should be noted, however:

  1. The fact that a study passed muster with the reviewers and was included in the body of literature reviewed for this report does not mean it was without any weaknesses. All research studies have their flaws. While the RAND reviewers frequently noted problems with some of the studies, those critiques are not included in this report. Anyone interested in the shortcomings of the studies identified by the RAND team is encouraged to read their original literature review.

  2. Related to the first caveat, the complex statistical studies reviewed generally employed sophisticated statistical models. In this particular report, these are often models that attempt to account for all factors involved in an individual's choice to take or leave a particular teaching job, even in the absence of actual data about all those factors. Although the RAND reviewers attempted to screen out studies in which such models were biased or poorly constructed, even the best models have the status of good approximations.

  3. Unlike the questions in our previous report on teacher preparation, several of the questions here do not concern causal relationships or the impact of certain policies or practices but, rather, seek to describe demographic realities and trends. Thus, it seems a little peculiar to discuss the strength of the research evidence that the teacher workforce, for example, is composed of a certain percentage of males, females, whites and minorities. Research on statistical realities is not cumulative in the way research on causal connections would be. Indeed, research studies published in the past may not quite reflect the actual demographic realities of 2005. If a recent demographic research study is thorough and its sample is representative of the larger population under review, its findings would be accepted as valid. Where an evidentiary assessment regarding demographic research does sometimes come into play in this report, however, is in trying to find the explanation for the statistical realities and trends that are noted.

* For additional insight into the methodological issues involved in the preceding discussion, see the section titled "How Do I Know if the Research Is Trustworthy?" in A Policymaker's Primer on Education Research.


© 2005 Education Commission of the States