Bruce D. Baker of the National Education Policy Center writes about what he calls the "Toxic Trifecta" of education reforms involving high-stakes teacher evaluation systems. The items on the list should sound awfully familiar to South Dakotans debating Governor Dennis Daugaard's teacher evaluation proposals:

First, the standard evaluation model proposed in legislation requires that objective measures of student achievement growth necessarily be considered in a weighting system of parallel components. Student achievement growth measures are assigned, for example, a 40 or 50% weight alongside observation and other evaluation measures....

Second, the standard evaluation model proposed in legislation requires that teachers be placed into effectiveness categories by assigning arbitrary numerical cutoffs to the aggregated weighted evaluation components. That is, a teacher in the 25%ile or lower when combining all evaluation components might be assigned a rating of "ineffective," whereas the teacher at the 26%ile might be labeled effective....

Third, the standard evaluation model proposed in legislation places exact timelines on the conditions for removal of tenure... [Bruce D. Baker, "The Toxic Trifecta, Bad Measurement, & Evolving Teacher Evaluation Policies," National Education Policy Center, 2012.04.24].

Read Governor Daugaard's HB 1234, and you'll find he plans to impose plank #1 of Baker's Toxic Trifecta in spades:

Fifty percent of the evaluation of a teacher shall be based on quantitative measures of student growth, based on a single year or multiple years of data [HB 1234, Section 38.2.a].

HB 1234 works toward the second toxic plank by creating four-tiered rating systems for teachers and principals. However, HB 1234 does not mandate arbitrary numerical cutoffs for those four categories of teacher effectiveness. The exact formula for assigning ratings of "Distinguished", "Proficient", "Basic", and "Unsatisfactory" remains the task of a twenty-person work group.
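To see why those cutoffs matter so much, here is a minimal Python sketch of the kind of weighted scoring Baker describes. Only the 50% growth weight comes from HB 1234. Everything else is an assumption for illustration: the other 50% is lumped into one hypothetical "other measures" score, both scores sit on a 0-100 scale, and the four cutoffs are invented, because HB 1234 leaves the real formula to the work group. The sketch also simplifies Baker's setup by applying cutoffs directly to the weighted average rather than to a percentile ranking of it.

```python
# Assumption: HB 1234's 50% growth weight; the other 50% is lumped into one
# hypothetical "other measures" score. Both scores are on a 0-100 scale.
GROWTH_WEIGHT = 0.5
OTHER_WEIGHT = 0.5

# HYPOTHETICAL cutoffs, invented for illustration: HB 1234 leaves the real
# formula to a work group. Checked from the top down.
CUTOFFS = [
    (75.0, "Distinguished"),
    (50.0, "Proficient"),
    (26.0, "Basic"),
    (0.0, "Unsatisfactory"),
]


def composite(growth_score: float, other_score: float) -> float:
    """Weighted average of the two component scores."""
    return GROWTH_WEIGHT * growth_score + OTHER_WEIGHT * other_score


def rating(score: float) -> str:
    """Return the first rating whose cutoff the composite score meets."""
    for cutoff, label in CUTOFFS:
        if score >= cutoff:
            return label
    return "Unsatisfactory"  # scores below 0 should not occur on a 0-100 scale


if __name__ == "__main__":
    for growth, other in [(25, 26), (26, 26)]:
        score = composite(growth, other)
        print(growth, other, score, rating(score))
```

With these made-up cutoffs, a teacher scoring 25 on growth and 26 on everything else gets a composite of 25.5 and an "Unsatisfactory," while a colleague one growth point higher lands at 26.0 and gets "Basic." Wherever the work group draws its lines, some pair of nearly identical teachers will fall on opposite sides of one.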

HB 1234 doubles down on toxic plank #3. Governor Daugaard's plan doesn't link evaluations to "tenure"; it ends tenure for all new teachers. For the remaining teachers with grandfathered tenure, the plan then links evaluations to renewal of their contracts by adding "a rating of unsatisfactory on two consecutive evaluations" to the conditions that constitute just cause for nonrenewal.

Now, it's a good idea to get rid of bad teachers. The problem is that the evaluations on which such nonrenewals would be based are flawed. HB 1234 vaguely mentions tests that measure "student growth." Baker explains that student growth percentiles, an increasingly popular way to measure that growth, were never designed to measure teacher quality or performance:

Arguably, one reason for the increasing popularity of the student growth percentile (SGP) approach across states is the extent of highly publicized scrutiny and large and growing body of empirical research over problems with using value-added measures for determining teacher effectiveness (See Green, Baker and Oluwole, 2012). Yet, there has been little such research on the usefulness of student growth percentiles for determining teacher effectiveness. The reason for this vacuum is not that student growth percentiles are simply not susceptible to the problems of value-added models, but that researchers have chosen not to evaluate their validity for this purpose -- estimating teacher effectiveness -- because they are not designed to infer teacher effectiveness.

...a student growth percentile is a descriptive measure of the relative change of a student's performance compared to that of all students and based on a given underlying test or set of tests. That is, the individual scores obtained on these underlying tests are used to construct an index of student growth, where the median student, for example, may serve as a baseline for comparison. Some students have achievement growth on the underlying tests that is greater than the median student, while others have growth from one test to the next that is less. That is, the approach estimates not how much the underlying scores changed, but how much the student moved within the mix of other students taking the same assessments, using a method called quantile regression to estimate the rarity that a child falls in her current position in the distribution, given her past position in the distribution.[3] Student growth percentile measures may be used to characterize each individual student's growth, or may be aggregated to the classroom level or school level, and/or across children who started at similar points in the distribution to attempt to characterize collective growth of groups of students.

... since student growth percentiles make no attempt (by design) to consider other factors that contribute to student achievement growth, the measures have significant potential for omitted variables bias. SGPs leave the interpreter of the data to naively infer (by omission) that all growth among students in the classroom of a given teacher must be associated with that teacher. Even subtle changes to explanatory variables in value-added models change substantively the ratings of individual teachers (Ballou et al., 2012, Briggs & Domingue, 2010). Excluding all potential explanatory variables, as do SGPs, takes this problem to the extreme. As a result, it may turn out that SGP measures at the teacher level appear more stable from year to year than value-added estimates, but that stability may be entirely a function of teachers serving similar populations of students from year to year. That is, the measures may contain stable omitted variables bias, and thus may be stable in their invalidity [Baker, 2012.04.24].
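Baker's omitted-variables point can be demonstrated with a small simulation. The Python sketch below rests on several loudly labeled assumptions. Instead of the quantile regression that real SGP systems use, it computes a cruder stand-in: each student's percentile rank on this year's test among students whose prior-year scores fell in the same 10-point band. The scores are simulated, not real data. And the two hypothetical teachers, A and B, are given exactly the same teaching effect; the only difference is that B's students also gain from something outside the classroom (tutoring, family resources, whatever), which the growth measure never sees.

```python
import random
import statistics
from collections import defaultdict


def growth_percentiles(prior, current, band_width=10.0):
    """Crude SGP stand-in (NOT quantile regression): percentile rank of each
    student's current score among students whose prior score fell in the
    same band of width `band_width`."""
    bands = defaultdict(list)
    for i, p in enumerate(prior):
        bands[int(p // band_width)].append(i)
    sgp = [0.0] * len(prior)
    for members in bands.values():
        peer_scores = [current[i] for i in members]
        n = len(peer_scores)
        for i in members:
            below = sum(1 for s in peer_scores if s < current[i])
            # Midpoint convention: the middle student in a band scores about 50.
            sgp[i] = 100.0 * (below + 0.5) / n
    return sgp


def simulate(seed=1, class_size=500):
    """Two hypothetical classes with IDENTICAL teaching effects; class B's
    students also gain from an out-of-school factor the measure ignores."""
    rng = random.Random(seed)
    teaching_effect = 3.0  # same for both teachers, by construction
    prior, current, teacher = [], [], []
    for name, outside_boost in (("A", 0.0), ("B", 5.0)):
        for _ in range(class_size):
            p = rng.gauss(50, 10)  # last year's simulated score
            c = p + teaching_effect + outside_boost + rng.gauss(0, 5)
            prior.append(p)
            current.append(c)
            teacher.append(name)
    sgp = growth_percentiles(prior, current)
    # Median growth percentile per class, the kind of teacher-level summary
    # Baker describes as "aggregated to the classroom level."
    return {
        name: statistics.median([s for s, t in zip(sgp, teacher) if t == name])
        for name in ("A", "B")
    }


if __name__ == "__main__":
    print(simulate())
```

Class B's median growth percentile comes out well above Class A's, even though by construction the two teachers taught equally well. Rerunning with a different seed, a fresh "year" of similar students, reproduces the same gap, which is Baker's point about measures that are "stable in their invalidity."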

In short, if Governor Daugaard were to impose a system that evaluates teachers on the basis of student growth percentiles, it would be like measuring your speed with a thermometer: the tests we have aren't designed to measure teacher performance.

"Toxic" is a really good word for Governor Daugaard's education reforms. They will not improve South Dakota's K-12 education system; they will make our K-12 system worse.

