"I know no safe depository of the ultimate powers of the society but the people themselves; and if we think them not enlightened enough to exercise their control with a wholesome discretion, the remedy is not to take it from them, but to inform their discretion by education. This is the true corrective of abuses of constitutional power." - Thomas Jefferson 1820

"There is a growing technology of testing that permits us now to do in nanoseconds things that we shouldn't be doing at all." - Dr. Gerald Bracey, author of Rotten Apples in Education

Tuesday, April 24, 2012

The Black Box of Computerized Assessments - Does Anyone Know What's Really In There?

There's an old saying about the inmates running the asylum that comes to mind with this story from a science educator in Florida who looked into the science portion of Florida's standardized assessment. Robert Krampf was developing some science test prep questions for 5th graders preparing to take that state's high-stakes FCAT test.  Using the FL DOE's own site as a guide, he found many flaws in what was presented as science.
A few weeks ago, I started developing FCAT practice questions to help students review concepts and prepare for the test. To develop those questions I used FLDOE's FCAT 2.0 Science Test Item Specifications. These documents are used as:
"a resource that defines the content and format of the test and test items for item writers and reviewers." 
I expected the Test Item Specifications to be a tremendous help in writing simulated FCAT questions. What I found was a collection of poorly written examples, multiple-choice questions where one or more of the wrong responses were actually scientifically correct answers, and definitions that ranged from misleading to totally wrong.
On his blog, The Happy Scientist, he cited these examples:
A glossary of definitions (Appendix C) is provided for test item writers to indicate the level of understanding expected of fifth grade students. Included in that list is the following definition:

Predator—An organism that obtains nutrients from other organisms.

By that definition, cows are predators because they obtain nutrients from plants. The plants are predators too, since they obtain nutrients from decaying remains of other organisms. I have yet to find anyone who thinks that this is a proper definition of a predator.
This sample question offers the following observations, and asks which is scientifically testable.
  1. The petals of red roses are softer than the petals of yellow roses.
  2. The song of a mockingbird is prettier than the song of a cardinal.
  3. Orange blossoms give off a sweeter smell than gardenia flowers.
  4. Sunflowers with larger petals attract more bees than sunflowers with smaller petals.
The document indicates that 4 is the correct answer, but answers 1 and 3 are also scientifically testable.
For answer 1, the Sunshine State Standards list texture as a scientifically testable property in the third grade (SC.3.P.8.3), fourth grade (SC.4.P.8.1), and fifth grade (SC.5.P.8.1), so even the State Standards say it is a scientifically correct answer.
For answer 3, smell is a matter of chemistry. Give a decent chemist the chemical makeup of the scent of two different flowers, and she will be able to tell you which smells sweeter without ever smelling them.
While this question has three correct answers, any student that answered 1 or 3 would be graded as getting the question wrong. Why use scientifically correct "wrong" answers instead of using responses that were actually incorrect? Surely someone on the Content Advisory Committee knew enough science to spot this problem.

This is another example of why local control is so important. Who does the public, or teachers for that matter, go to to correct these types of errors in a test they didn't write? Not only is there no mechanism for doing so; even suggesting you might want to is to invite an accusation of cheating.

Teachers describe the atmosphere in school during these standardized tests as similar to a nuclear lockdown. The test forms are stored in the office and released to teachers only moments before testing begins, in a ritual akin to two officials turning their keys simultaneously after entering the launch codes. Teachers are forbidden from doing almost anything to help the children during the test, and observers are sometimes placed in the classroom to make sure teachers follow this rule.

Here is one teacher's account of what happened during his class's algebra end of course (EOC) exam done on computer:
The teacher/proctor is not permitted to read the question, only to assist with students and computer operating issues.  Student works through problem on scratch paper, and finds that his/her answer doesn't match with any choices given.  Teacher looks at problem worked out on scratch paper, and determines that the child has correctly answered the problem, but the correct answer is not one listed.  Teacher can do nothing about it since he/she is not permitted to read the test, only the test prompts.  School therefore does nothing.  Child has no defense, and since testing by computer is graded by computer, the testing company is not held accountable. 
Stories like this abound in the comment sections of similar articles.

Since no one is allowed to see the test before testing begins, there is no way to correct errors by the test developers, even innocent typographical ones. And since teachers are not supposed to be reading the tests, there is no mechanism for them to report faulty questions after testing has been completed. The same errors can persist year in and year out, depending on how often the test developers decide to reuse old material.

An obvious fact should be springing to everyone's mind. The computer grades these tests based on what the test developers have told it is the correct answer. In the science question above, children who answered #1 would be marked wrong even though they are not only scientifically right, but correct as defined by their own state's standards. In the algebra question above, children have a 25% chance of guessing the "correct" wrong answer predetermined by the mental midgets who developed the test. The computer is only as smart as the people who programmed it. If they are not that smart, why do they have control of the education system?

And therein lies the biggest problem with standardized testing done on computer, as is planned for all Common Core assessments. The tests will only be as accurate as the people writing and entering them in the first place. If those people don't know their subject or don't care about accuracy, then what chance do our children have of scoring well on these exams? If no one except the students is allowed to see the exam, how will we even know what they are actually being tested on?

Worse still is the fact that we will now hold teachers and schools accountable for what amounts to the errors of the testing company. Who will hold Pearson or McGraw-Hill responsible? Can they defend the qualifications of their "content committees" if these types of errors appear on their tests? When thousands of educational dollars are on the line for these test scores, the answer should be "no." Nothing short of perfection should be accepted.
Local control could address these errors. Massive, government-funded and government-mandated distant bureaucracies have little incentive to address them at all, let alone in a timely fashion. We should not accept the black box of computerized standardized assessments.
