Some thoughts on final exams

I just finished grading my final exams–see here for the problems and the solutions–and it got me thinking about a few things.

#1 is that I really really really should be writing the exams before the course begins. Here’s the plan (as it should be):
– Write the exam
– Write a practice exam
– Give the students the practice exam on day 1, so they know what they're expected to be able to do once the semester is over.
– If necessary, write two practice exams so that you have more flexibility in what should be on the final.

The students didn’t do so well on my exam, and I totally blame myself: they didn’t have a sense of what to expect. I’d given them weekly homework, but the homework questions were a bit different from the exam questions.

My other thought on exams is that I like to follow the principles of psychometrics and have many short questions testing different concepts, rather than a few long, multipart essay questions. When a question has several parts, the scores on these parts will be positively correlated, which increases the variance of the total score relative to the same number of independent items, and so makes the exam a noisier measure of what each student can do.
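To make that concrete, here is a quick simulation sketch; the particular numbers (20 graded parts, a within-question correlation of 0.6, unit noise per part) are made up for illustration. It compares the spread of total scores on an exam of 20 independent short questions with an exam of 4 five-part questions whose parts share correlated grading noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n_parts, rho, sigma = 20, 0.6, 1.0  # total graded parts, within-question correlation, per-part sd

# Exam A: 20 independent short questions, one part each.
cov_short = np.eye(n_parts) * sigma**2

# Exam B: 4 multipart questions of 5 parts each; parts within a question share correlation rho.
block = np.full((5, 5), rho * sigma**2)
np.fill_diagonal(block, sigma**2)
cov_multipart = np.kron(np.eye(4), block)

# Simulate grading noise around a student's expected total; same number of parts either way.
for label, cov in [("independent short questions", cov_short),
                   ("correlated multipart questions", cov_multipart)]:
    totals = rng.multivariate_normal(np.zeros(n_parts), cov, size=100_000).sum(axis=1)
    print(f"{label}: sd of total = {totals.std():.2f} (theory: {np.sqrt(cov.sum()):.2f})")
```

Same number of parts and the same per-part noise; the only change is the within-question correlation, and the total score comes out roughly twice as noisy in the multipart version.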

More generally, I think there’s a tradeoff in effort. Multi-part essay questions are easier to write but harder to grade. We tend to find ourselves in a hurry when it’s time to write an exam, but we end up increasing our total workload by writing these essay questions. Better, I think, to put in the effort early to write short-answer questions that are easier to grade and, I believe, provide a better evaluation of what the students can do. (Not that I’ve evaluated that last claim; it’s my impression based on personal experience and my casual reading of the education research literature. I hope to do more systematic work in this area in the future.)

12 thoughts on “Some thoughts on final exams”

  1. It might be that you are touching upon "une spécificité française" (a French particularity) here. I recall finding the exams during my ERASMUS year in England quite shallow, because they tended to have a lot of short questions about many different concepts. My high school and university education in France was quite different: exams were more like long, multipart essay questions. There was always a risk that you could "dry up" ("sécher") on the first question and fail the whole essay, because you didn't get started at all. That was virtually impossible with the English exam I had. So it might be that your students didn't do that well because they were expecting a long, multipart essay and revised for your course accordingly (that is, they skipped some parts but went deep into a few topics, expecting big questions on them). That said, my last exam was 4 years ago, so this comment could just be water under the (Parisian) bridge.

  2. What is the background of your students? What type of stats did they have before your class? Your exam is about applied stats, but I would guess that most students in France are used to math-stats type exams.

  3. Mike: Yes, this looks very good. I wish I'd known about these ideas 20 years ago so I didn't have to figure out these principles from painful experience.

  4. The best course for grading I took in college was a comparative literature course, where the assignment was to hand in one two-page essay per week on the topic (book) covered that week. There was a chance your essay would be read and discussed in class (my university had lectures and classes with face-to-face discussions, sometimes conducted by a grad student and sometimes by the professor). You would hand in ten of these essays, with the worst grade dropped and the other nine averaged to give your class grade. There were no exams.

    From a student's perspective, there was tons of good stuff about this system. Best of all, by final exam time you were done with the class and could concentrate on studying for the classes which kept the traditional finals. But you had to do the work and reading each week to turn in a coherent essay; you couldn't blow off a week and hope to catch up/cram later in time for the finals. And the grading seemed fairer than in my other classes. At least if you were not on the same page as the professor, it would show up in the first couple of essays and you could adjust.

    Now this was a comparative literature class, and not all courses lend themselves to the essay format (though you could substitute lab work for some science classes). But how about a two- or three-section final, and two or three interim exams? Each section of the final covers the same material as an interim exam, though the questions are different. Only the higher grade in that section of the final or the corresponding interim exam gets counted (see the sketch at the end of this comment). A student who blows an interim exam gets another chance on the final, or a student who does well on the interim can even skip that section of the final, with the second strategy being the less risky one.

    I'm assuming the purpose of exams and grading is, first, to give students more incentive to master the material and, second, to measure how well they have mastered it, so I designed that system with those goals in mind. The biggest problem I've seen with grading is that the professor is bad at communicating to the students what exactly he is trying to teach and what they will be graded on; I have memories of going into midterms and being presented with a completely different test than the one I'd studied for (this might be where the common dream of taking a final in a course you didn't know you were taking came from). So some sort of trial test should be incorporated. Also, I never understood the practice of a midterm that counts, then a final that counts and tests a second time on the first-half material, plus tests on the second-half material. What is the point of testing twice on material from one half and only once on the other? If it's to see whether students forgot what they learned a quarter earlier, OK, now they will just forget the material three months later than otherwise.
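    Just to make the "higher grade counts" rule above concrete, here is a minimal sketch; the section scores are hypothetical.

    ```python
    # Hypothetical example of the rule above: each section counts the better of the
    # interim exam and the matching section of the final.
    def course_score(interim_scores, final_section_scores):
        return sum(max(i, f) for i, f in zip(interim_scores, final_section_scores))

    # A student who did well on interim 2 and skipped that section of the final (scored 0):
    print(course_score([72, 85, 60], [80, 0, 75]))  # 80 + 85 + 75 = 240
    ```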

  5. The best course for grading I took in grad school (MBA) was one where everyone but two randomly drawn students got a B+; those two, by a coin toss, got an A- or a B-.

    But seriously, teacher education is slowly drifting up to the university level, and for applied stats in particular I always thought that what was really important was not the puzzle solving, but that's largely what gets tested.

    Puzzle-solving skills are essential for anyone who wants to do math and math/stat, but I don't believe it's the most important skill for grasping research and doing applied statistics.

    Or maybe it's just too easy to pass the test by simply knowing how to solve the puzzles, so students don't have to waste time thinking about important modelling issues.

    Keith
    p.s. I would have missed question 9 without the phrase saying that the "observed" treatment effect was always exactly 5.

  6. I have switched to the many-short-questions approach for my exams for a Master's level class for public administration and urban planning students. In addition to the advantages you mentioned, I like them because I can add one or two very difficult questions to test the upper bounds of students' knowledge. That is risky when there are only a few questions in the whole exam, but it's fine when there are plenty of reasonable questions.

    While I think the many-short-questions approach is a better gauge of what students know, I think it is not as effective a learning tool as a well-written multi-part essay question. I think of these longer questions as roadmaps that lead students to a higher level of understanding. I use that approach for take-home exams in more advanced classes.

  7. Keith:

    I could see that you might view my exam as puzzle-like, but I'm not really sure what else I can do. If the questions are too straightforward and insufficiently puzzle-like, then they reward the process of plugging in the formula, which seems to me to be a dead end if you want to actually use the stuff. Conversely, essay-style problems (for example, where the student is asked to consider potential problems with a study) are super-hard to grade and, to me, encourage a sloppy sort of thinking that I associate with liberal-arts students who are planning to go to law school–the idea that you can find 3 good reasons to shoot down any possible study.

    I'm open to being convinced on this–if you have some ideas for better sorts of exam questions, please post them here.

    In the meantime, I think that the substantive question of what is covered on the exam is less important than the procedural issue of making sure that the course and the exam are aligned. Better to teach people something, I think. As it is, I suspect that my classes are intellectually exciting experiences for the students, but they don't really leave being able to do much more than they could do when they came in.

  8. Now that I teach high school Algebra and Geometry to students with low skills, I see how critical it is to align instruction to my assessments after they've been written. Having a vision of what I want them to be able to do is really helpful. I can then structure instruction around solving those kinds of problems and use them to teach the broader concept and still get accurate data about mastery on the assessment.

    Then again, I'd expect a longer radius for college and graduate students who should be stretching ideas further than high school students may be able to by the 9th grade.

  9. I think writing the exam first is a great idea. I got about halfway there in my last two courses by having my preferred exam dataset in mind while I wrote the course. Knowing what I wanted them to be able to find in it was very helpful for deciding the course structure.

    I note in passing the almost uncanny similarity of your suggestion to 'extreme' programming, where one is supposed to write the tests first and then the code to pass them. In this analogy, the short answers versus longer essays dilemma seems to be the graphical user interface testing problem – that is, the difficulty of testing all possible sequences of interactions. Writing a test for 'first they press this, then for some reason they cancel it and press the other button, then quit the program' is tricky in, it seems, rather the same way as marking a set of otherwise sensible inferences that rest on a single mistake earlier on in the question. The upshot seems to be that test orientation rather forces one into shallow 'one step' GUI sequence tests, and short answer exams.

  10. Andrew – not especially "puzzle-like" – question 4 and the programming ones had a high ratio of concept testing compared to "plug and chug" (though it's hard to discern from outside the course).

    But I think it is one of the critical insurmountable opportunities for teaching applied statistics – to align the course and the exam – and so I raised it.

    Certainly it will be much work, but hopefully there could be some community learning.

    I don't have anything I can share here on it, except perhaps to refer to Freedman et al.'s discussion of this issue in the teaching manual for their Intro stats book.

    Keith

  11. This is the system I like, for all the reasons Ed mentioned. Especially the way it encourages class participation. I also like the way it more naturally mimics how one works in the real world (though not in terms of timing).

    All of my undergrad math major classes at Michigan State followed this model, but calc, diffeqs and matrices had traditional exams. All of my philosophy and cognitive psych classes followed this model, though they often also required final written projects. That meant a whole lot of grading on the prof's part, but it was hugely helpful in both math and philosophy.

    The only exam I had in my Ph.D. program in Edinburgh was a qualifying-like exam that was a grueling multiple day take-home open-ended affair at the end of the first year. It's also how we wound up giving quals in Carnegie Mellon's computational linguistics program. Everything fully open book so you don't get bonus points for being able to read 4 pt font.

    The other thing the math classes did was set up problems that were hard enough that if you did half of them, you got an A in the class. In math, things tended to build on themselves, so it was unlikely you missed whole concepts. Not only did it get you thinking about harder problems; the solutions to the ones you couldn't solve on your own were also very interesting to me as a student.
