How the Race to the Top Evaluations Look So Far in NYC

You don’t have to work with machines to be a factory worker.

The first week of classes in New York City public schools will be in the books by the end of the day today. To say this is not an ordinary year does not really capture the mood. We are entering our first year of a new teacher evaluation system, which has been the cause of much confusion over the past week. On the horizon looms the Common Core in 2014. Both the evaluation and Common Core are still largely question marks whose implications are only starting to be felt now.

To summarize what the new evaluations look like (much more specific stuff can be found here, here and here): the old system of “Satisfactory” and “Unsatisfactory” ratings for teachers is out the window in favor of a four-point scale ranging from “ineffective” to “developing” to “effective” to “highly effective”.

60% of our ratings will be based upon a “scientifically-based” rubric mandated by the federal Race to the Top program, the reason for these evaluations in the first place. In New York City’s case, we are using Charlotte Danielson’s Framework for Teaching. The term “Danielson” has become a commonplace shorthand on the lips of everyone in every school building city-wide, carrying with it a pall of fearful uncertainty.

20% of our evaluations will be based upon our students’ performance on state-wide exams. For elementary and middle school teachers, this means new and more frequent testing of students, at the rate of at least one exam each year. For high school teachers, this means being tied to the Regents exams in various subjects. Even teachers of gym, art, music and other enrichment classes will be tied to Regents exams in subjects that have very little to do with the content of the courses they teach. For these teachers, the principal will assign the Regents exam to which this 20% of the evaluation will be tied.

The final 20% will be based on how each teacher’s students do on what are called “local measures”. Each school district around the state got to choose some form of assessment that does not necessarily have to be an exam, although it will certainly be an exam in most cases. In New York City, each school had a choice as to whether it wanted this 20% “local measure” to be some form of DOE-generated and DOE-graded assessment or simply to be the exams used for the 20% state measure counted in a different way (this will be explained below). For both the state and local measures, the absolute score of a teacher’s students on these exams does not matter as much as how much the scores of the students improve over the past year. Improve over what, you might ask? On what exams will this year’s and future years’ baseline scores be based? The answer to that question is a bizarre kaleidoscope of exams and quasi-exams, each depending on the grade and/or subject of each teacher’s schedule and roster of students. It is impossible to find anyone who has mastered the permutations of which exams will be used to determine baseline scores for which students in which grades, mostly because it is quite apparent that the state and especially the city do not even know yet.
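Just to make the arithmetic of that 60/20/20 split concrete, here is a purely illustrative sketch. The 0–100 subscore scale, the simple weighted average and the cut scores below are all my own assumptions for the sake of illustration; the state’s actual point conversions and scoring bands were still unclear at the time of writing.

```python
# Purely illustrative: assumes each component is already a 0-100 subscore
# and that the components combine as a simple weighted average.

def composite_rating(danielson, state_growth, local_growth):
    """Combine the three subscores using the 60/20/20 split."""
    return 0.6 * danielson + 0.2 * state_growth + 0.2 * local_growth

def label(score):
    # Hypothetical cut points -- the real bands were still being
    # worked out and almost certainly differ from these.
    if score >= 91:
        return "highly effective"
    if score >= 75:
        return "effective"
    if score >= 65:
        return "developing"
    return "ineffective"

# A teacher scoring 80 on Danielson and 70 on each growth measure:
print(label(composite_rating(80, 70, 70)))  # -> "effective"
```

Again, this is only a sketch of the proportions; the real system assigns points through a negotiated formula, not a simple average.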

That means that the way teachers are evaluated here in New York City over the next few years will vary somewhat from school to school. It also means that there are some unsettling question marks surrounding this new evaluation regime. This past summer, each school had to assemble a team of 4 people chosen by the principal and 4 people chosen by the union chapter leader, who would then advise the principal with their recommendation(s) on what should count as the school’s 20% “local measure”. As chapter leader at my school, I was part of this committee. What follows is an account of how this new evaluation process has unfolded in my school from that summer meeting up until now. For such a relatively short amount of time, the changes for everyone in our building have been marked and instructive.

Without getting into petty details, our committee of 4 teachers and 4 administrators had a relatively harmonious meeting about local measures. We decided essentially to go with what has been dubbed the “default measure”. That means that all of the scores on all of the assessments used for the state measures at our school will be averaged into one overall score. That overall score will apply to every teacher in the school. The growth of that score over a baseline score (How is this baseline determined? We don’t fully know.) from the previous year will determine our local 20%. Considering the circumstances, I believe this was the best possible choice we could have made for our school, students, teachers and administrators included. This option precludes us from having to give more exams, which I think is its most important virtue. Also, by uniting all teachers under one score it maintains that all-important atmosphere of collaboration vital to any school staff. Instead of teachers being divided by departments, all of us sink or swim as one. Administrators do not have to waste time and resources on organizing more test dates, which includes altering schedules, assigning proctors and everything that comes with ensuring proper testing protocol is followed.

Our committee also had a choice between using a “goal setting” or “growth model” process for our local 20%. In “goal setting”, the DOE issues baseline scores (based on whatever) for every teacher’s students. Then, at the start of the year, each teacher must meet with their administrator to determine how they think their students will do on the local assessment (whatever assessment that was chosen by the committee) given at the end of the year. The over-under of that prediction is essentially what constitutes that “20%”.

We chose the “growth model” formula where the growth in our students’ scores will be compared with the growth of “similar” students’ (demographically speaking, for the most part) scores from around the state. Our students tend to do very well on Regents exams for the most part, so we had the confidence in them to go with this choice. It precludes us from having to guess (and guess blindly in my opinion) how our students might perform on exams 10 months from now.

While a good portion of this past summer was spent discussing exams, the first portion of this school year has been all about “Danielson”. Exams are relatively far in the future (June is always a decade away when you are in September) but Danielson is knocking on our door now. In fact, we have already opened the door and Danielson is hanging up her coat and taking off her shoes to stay for dinner.

To simplify things, the Danielson rubric has 4 “quadrants”. Quadrants 2 and 3 deal with what happens in the classroom and, between them, count for 3/4 of our Danielson rating. Quadrants 1 and 4 deal with what we do outside of the classroom (professional development, preparation, etc.) and count for 1/4 of our Danielson rating between them. Between all four quadrants there are 22 individual points we must hit by the end of the school year. Our administrators will come in, observe us and literally check off which parts of Danielson they saw in our lesson. For those areas that are either unobservable in a classroom (because they fall under quadrants 1 or 4) or that the administrator has yet to check off, we can submit up to 8 “artifacts” a year to our administrators. These artifacts can be lessons, units, exams, certificates of completion for professional development sessions, phone logs for parent calls or basically anything that shows what you do as a teacher. Based upon those artifacts, our administrator might check off more Danielson boxes on our evaluation, or they might not check off any.

It all has the feel of a video game where we are collecting “easter eggs” or completing little missions or jumping to grab coins hidden in bricks. We have to make sure to get all of the coins by the end of the game or we will not be able to save the princess. In this case, the princess is an “effective” rating and losing a life means getting an “ineffective” rating, which puts careers on thin ice no matter how tenured or how great the teacher is.

This has led to an epidemic in my school of what I call “artifact fever”. Teachers are busily making copies, gathering records, exchanging notes and asking each other about what constitutes an appropriate artifact. The more studious teachers have already started handing in artifacts and, in a Danielson rubric posted in their brains, have already started checking off the quadrants they have fulfilled. Some teachers have the beginning of artifact fever, whose initial symptoms include confusion and disorientation at all of the hustle and bustle of their colleagues. The next stage is a feeling of delinquency because they are not gathering artifacts and so they better start soon lest their colleagues beat them in some race that nobody is really having to begin with. There are a few like me who refuse to allow some asinine evaluation system to put a bug in their ear about them being bad teachers unless they get their artifacts in. We will get them in, but we will start to do so only after we get the more important tasks of getting to know our new students and preparing them for the school year out of the way first.

And therein lies the biggest problem with this process. Here is where we see how this new evaluation regime is bad all around. My colleagues are doing what they honestly believe is right, especially since they are starting to feel the first tingles of a career in jeopardy. They might not be explicitly thinking this but lurking behind all of this to-do about artifacts and Danielson is the prospect of being rated “ineffective”, putting them on the path to termination. I would even go so far as to say that most teachers are making an effort to fulfill both Danielson and what they think good teaching is based upon their experience. This assumes on my part that Danielson and good teaching are two separate things, which I firmly believe they are.

These teachers would have been hustling and bustling anyway at this time of year. They would be preparing lessons, homework assignments and decorations to start the school year off on the right foot. Their efforts are being diluted by the advent of this new rubric, this Danielson, that tells them “yes, but you must at least do this.” We grade the first homework assignments of the year while that Danielson voice goes off in our heads saying “it is nice you are grading homework but Danielson says you must at least do this.” So then we run to the store to buy more decorations so our “classroom environment” looks welcoming and educational (because that is what Danielson says) and nothing screams education like a cartoon poster of Winnie the Pooh saying “history is fun”. That of course is an exaggeration but that is more or less the nature of the pull that all NYC teachers must be feeling. Not only must we do what we know is right by our students, we have to make sure Danielson is being fulfilled and that we will achieve all 22 check marks by the end of the year.

There are some who might argue that this might make us better teachers. My response to that is that you do not know what makes a teacher better. Teaching is an art, not a science and not an assembly-line process. New teachers grow and flourish by getting in there and practicing their craft under the guidance of an experienced mentor who knows how to develop that teacher’s natural strengths and use them to help overcome their weaknesses. Experienced teachers grow by guiding younger teachers, since it enables them to reflect on their craft and the assumptions they make about it.

But teaching is a dirty word. It is only valid when it is guided by a “framework”, which effectively perverts teaching into pedagogy. It perverts art into pseudo-science. The university education professor’s 100-year crusade to be taken seriously as a person of “science” has resulted in a “rubric”, this Danielson, that crystallizes in laymen’s terms much of the superficial babble that qualifies as “sound pedagogy” in the halls of education colleges nationwide. Sure, one might argue that if it is so superficial then it should not be a problem for a skilled teacher to easily fulfill the Danielson rubric. My response to that is a skilled teacher has deep reasons for doing the things they do and a rubric that does not speak to those reasons is not a rubric for teaching. How can there be such a rubric in the first place?

In the end, what the new evaluations are doing to New York City schools is giving them more work to do on top of the work they have already been doing. It is just that too: work. It is not a journey of professional self-discovery for teachers and administrators. It is a highly pressurized atmosphere which is causing teachers to do things they would not otherwise do, mostly for the purpose of getting a few tick marks checked off so they do not end up getting fired. It is an evaluation system that was born in a culture that sees teachers as union thugs and burnouts and school administrators as middle managers whose jobs largely consist of making the little union thug dogs bark. Neither teacher nor administrator is assumed to have much knowledge of what it takes to help children learn. Instead, the experts are faux pedagogues like Charlotte Danielson and the good people at Pearson. Our jobs during the school day are considered to be those of bureaucratic functionaries who must demonstrate the appropriate outward behavior. What lies on the inside in terms of professional depth or experience is irrelevant. Worse, it is unwelcome.

In short, things do not augur well here in New York City schools.

2 responses to “How the Race to the Top Evaluations Look So Far in NYC”

  1. Our school chose an English-based assessment for the local measure. Our Regents scores aren’t great. I liked the “default” for the reason that NYSED and the NYCDOE have no idea what they’re doing, and the least extra work would have been best. I must thank Julie Cavanaugh at the MORE website for providing excellent info about these matters.

    In regards to our union leadership, weren’t they talking about reducing the “paperwork” of these new evaluations?

    • Yes, I heard that reducing the paperwork was one of the things they were talking about. From what I have seen, one way or another we are in for a ton of paperwork between new standards, new rubrics, new observation protocols and new testing. We won’t even be getting our end-of-year rating until well after the school year ends. What a waste of time and resources, thanks in part to our union.
