Helen Westmoreland: Welcome to today's episode. I'm Helen Westmoreland.
LaWanda Toney: And I'm LaWanda Toney, and we are your co-hosts.
Helen Westmoreland: Today, we're talking about testing, which can be a polarizing issue. Whether you love tests, hate them, or fall somewhere in between, testing is a part of our kids' lives, and it's important that we understand it.
LaWanda Toney: Helen, I'm looking forward to today's guest, because I know when Caleb's test scores arrive, they're not always easy to make sense of. I know a lot of families have other questions about testing too.
Helen Westmoreland: Totally. I think testing is a bit of a mystery for many parents, which is why we are so glad to have an expert with us today to really shine a light on this topic.
Today, we're welcoming Dr. Andrew Ho, Charles William Eliot Professor of Education at the Harvard Graduate School of Education, to the show. Andrew Ho is also a Director of the Carnegie Foundation for the Advancement of Teaching and has served on the governing boards of the National Council on Measurement in Education and the National Assessment of Educational Progress. Before graduate school, he was a middle school teacher in creative writing in his home state of Hawaii and a high school physics teacher in California. He is also the father of two.
Welcome to the show, Andrew. Thanks for joining us. So I have to ask, what made you want to become an expert in testing?
Andrew Ho: It's such a great question. Yeah, whenever you meet a psychometrician, and that's what we call ourselves, psychometricians, I often say that it sounds more like an insult than a job title. Certainly, it means that I do work in educational and psychological testing and measurement. And we always say that you can always start a conversation with a psychometrician by asking, how did you get into this field? Because no one grows up saying, I'd love to become a psychometrician when I grow up. No one even knows what it means. But I've always been interested in education and I've always been interested in teaching. My mom was a teacher and my grandmother was a teacher. And I started teaching during the summers, when I was in ninth and 10th grade.
So I always loved teaching and I was always trying to figure out how to get students more engaged. And in my junior year at my undergrad institution, I studied in Kyoto, Japan, and I went to this high school in Osaka, and I immediately realized that everything was about the test. Everything was about the test: the teachers were talking about tests, the students were talking about tests. I said, wow, what an amazing tool to focus the attention of students and teachers and educational systems. The problem is these tests are bad. They're not good, and so if only I could make them better, if only I could create a test worth teaching to, then I could improve educational systems. I have since discovered that that is incredibly naive, and yet it is also the hope of so many teachers as they design their own tests and administer them to their students, and of so many education leaders and so many policymakers.
LaWanda Toney: So Andrew, what is the purpose of standardized testing?
Andrew Ho: Whenever I talk about the fact that I do work in testing, people immediately gravitate to one of many particular uses of tests. Can you guess what use that is?
Helen Westmoreland: Is it accountability?
Andrew Ho: So it's actually not, and I find that really interesting. I think different families will talk about different kinds of testing. It's not accountability testing. When I talk to my relatives, it tends to be much more about the SAT and the ACT. But I love how you immediately went to accountability testing, because that is what I would consider a very different purpose of testing. And so you can see immediately, that's actually the area of testing that I do the most work in. And yet my family immediately jumps to, oh, tell me more about how my kids can do better on the SAT and ACT exams. I'm like, I actually don't study that as much as I do all these other purposes of testing.
So I find that also, when we have disagreements about testing and controversies about testing, a lot of it stems from the fact that we're talking about different uses and purposes of testing.
I actually have these four quadrants, and this isn't a visual medium, I understand. But when I present this, I put this on a slide and I say, is it high stakes or low stakes? And that's one dimension, high stakes or low stakes, and then I say, is it more for individuals or is it more for groups? And if you take the sort of two-by-two diagram, is it high stakes or low stakes, and is it for individuals or for groups, you start to see the four different purposes of educational testing emerging. And so, the first and most widespread use of educational testing is the one we rarely debate. It's what teachers do with their students in the classroom every day.
It is informal, and it is what I do with my own students, it's what you've done with yours. It's what parents do with their kids when they're there working with them on homework. Homework is often graded, that's a kind of assessment. Teachers watching students in classrooms, that's a kind of assessment. Are they engaged? Are they paying attention? And then there are slightly more formal assessments that go on in classrooms all the time: at the end of the week, often, or at midterms or at the end of the semester, we have all sorts of classroom assessments.
The vast majority of testing that happens in the United States is in this one quadrant that is not often debated or discussed. We then can shift in these four quadrants, right, to the high-stakes individual assessments that my relatives are asking about. Right. And these are often college admissions tests, but they are also tests that we sometimes use for screening people into remedial or accelerated education classes. We can use these tests to diagnose learning disabilities. Those are relatively high stakes individual tests and assessments, and yet a different kind of purpose, which demands a different kind of test.
And now we move up a level to the level of accountability testing, where actually the student test scores, individual student test scores, don't matter that much. We're more interested in understanding how our schools are doing and how our teachers are doing, right, and how our districts are making progress over time. And that is yet another high stakes, but group level, assessment where we're trying to figure out how we can encourage our schools and our teachers and our districts and our systems to make progress. And then finally, the last quadrant, which also is, I think, underappreciated and under-discussed, is this low stakes aggregate level testing, what I call the monitoring quadrant. What do I mean by monitoring?
We just want to know if we're making educational progress. We're not going to place high stakes on it. We're not going to withhold your funding. We're not going to increase the salaries of our teachers on the basis of it. We just want to know, as a citizenry, as a group of Americans, like here in Massachusetts, for example, is Massachusetts making progress? Is the United States making progress? And for that, we have this assessment that you mentioned when you introduced me called the National Assessment of Educational Progress.
Helen Westmoreland: That's such a good point. I think for me as a parent, and even having worked in education, I wish, and sometimes feel frustrated, that those higher stakes group tests around accountability were more useful to my child. Like, understanding where my child is individually. I wonder if that's a frequent misunderstanding around testing, and what other big misunderstandings do you see in the field?
Andrew Ho: It's such a great observation, Helen. I just got my MCAS scores. MCAS is the state test here in Massachusetts. I just got them. It's like December, what are we doing?
Helen Westmoreland: Your kid took that in March, April probably.
Andrew Ho: So which is to say, right, you can see how the purposes of the test drive how we produce results and when we produce results. Because those tests are really not designed to support individual instruction, the states just punt on them. They're like, oh, we'll get them back to you in the fall, this really isn't meant to serve this purpose. What's interesting, and you have to keep an eye on it, right, is that sometimes they're sold as serving that purpose.
Helen Westmoreland: Yeah, I totally thought that was their purpose.
Andrew Ho: Right. And of course, one of the big misunderstandings about testing that I find my four-quadrant diagram helps to clarify is that no single test can serve all purposes well. And that actually the only way to cross purposes is to invest a massive amount of money that amounts to duplicating an entire testing program. Which is to say, yes, it's true that accountability tests oversell and underserve the purpose of informing instruction and providing parents timely information.
LaWanda Toney: So for me, my son is in the third grade now, and he goes to a public school. However, he goes to a specialty school; they focus on Montessori learning. So it's definitely a challenge for him testing-wise, because that's not the approach to teaching that they have. So I can remember when he took his first test, he was like, why do I have to answer all these questions? Can I come back to it? Can I do it later? Because that's the style of teaching that he had, but they're taking standardized tests, and they don't teach in that format. So it was definitely jarring for him at first. And I'm sure other people have challenges with testing as a way to assess how their child is doing.
Andrew Ho: Yeah. That is a common concern. If you look at what a standardized test administration looks like, it does not look like a typical day at school and especially at a Montessori school.
There has long been a desire to create more of what we call authentic assessments. Authentic assessments, and wouldn't that be great, if we could have assessments that sample, that select from what a kid does, you know, in a typical class, on a typical day. That turns out to be a very, very good way to do classroom level assessment, individual, low stakes, to provide information to inform instruction and tell parents how their kids are doing, right. There's no better assessment, I think, than just opening up their notebook. I've got a fifth grader and a second grader. You just open up their notebook at the end of the week and look at what they did for the week. You're like, oh my goodness, this is wonderful. And yes, you need to make progress here and improvement here, and this is also wonderful.
That's an incredible assessment. It is not standardized, in the sense that it's not comparable across different weeks or different kids. You can't aggregate them and average them up to the state level. And there's the rub, right? So to serve the purpose of monitoring or accountability, we need to have more comparable assessments. And then all of a sudden you have this very unnatural scene in a school where all these kids are standardized. They're comparable, because that enables us to actually compare my daughter to the kid sitting next to her, to the kid who's taking a similar test in another part of the state. And so, that is the trade-off, and one would think that we could just sample in an authentic way from classrooms, but what that sacrifices is comparability.
And that's what we really mean by standardized. Standardized has taken on a very negative connotation. But under its hood, right, under the hood of standardization, is comparability. So if you have a question that requires fair comparison of scores from one kid to another, and we might argue about whether or not that comparison is necessary, but if one wishes to make that comparison, you must have a basis for fair comparison.
LaWanda Toney: Right.
Andrew Ho: And that is where standardization comes in.
Helen Westmoreland: That's such a good point. So how much should parents actually worry about all these different test scores? I think that's a big question on everybody's mind, particularly now. We're seeing a number of colleges have waived ACT and SATs for the year, but it still causes a lot of anxiety, worry, frustration for families and their kids, particularly on those high stakes dimensions of testing, but also just general worry that it is time away from instruction or enrichment. What do you advise in terms of how we, as a general public, think about testing and our worries around it?
Andrew Ho: It's such an important question, and if I could leave this audience with one takeaway, it's please worry less about test scores, please, please worry less about test scores. But I can actually be more specific than that. Please worry less about test scores for your child. Please worry more about test scores for children. Which is to say, let's try to think about these monitoring assessments that, at a high level, track educational progress. Let's worry about that and closing gaps in inequality and disparities that we see. Let's worry more about the aggregate, and less about our kids' numbers. Now of course, as a parent myself, that's easier said than done, like, don't care so much about your kids, try to care about all kids. But I would say, right, that there are very specific mechanisms, very specific fallacies and errors that we make when we interpret our kids' test scores.
I'll use I-statements. When I interpret my kids' test scores, and I do this for a living, I still make what I like to think of as the three most common errors that parents make when they interpret their kids' test scores. And the first is that you think the scores are more meaningful than they are. Second, you think the scores are more precise than they are.
Helen Westmoreland: LaWanda, have you fallen in that bucket yet, any of those?
LaWanda Toney: Yeah. I would say two out of the three, for sure.
Andrew Ho: More meaningful than they are, more precise than they are, and more permanent than they are. Those are the fallacies, those are the inaccurate interpretations that we make of our own kids' test scores.
LaWanda Toney: Even when we know better.
Andrew Ho: I study this stuff. I can show you the bell curves that show the massive amount of imprecision. And I can show you the growth curves that show you how impermanent these scores are, and I can look at the test questions and show that they are not the be-all and end-all of education, and yet, and yet, right, we are weak to numbers.
LaWanda Toney: Yes, we are weak to numbers. It's terrible.
Andrew Ho: So that is one of the reasons why I have found my original statement of purpose, for why I went into this field, naive. Because I used to say, well, let's create a test worth teaching to, but it turns out that no test can solve the problem of a parent or a teacher or an admissions officer overinterpreting a number in those ways.
So that to me is what I wish I could leave your audience with: dialing the stress level far back, to show through evidence, which we have, how test scores are just a sample. They are not the be-all and end-all; they're imprecise and they're impermanent.
LaWanda Toney: I love that, especially the impermanent part. They think that it's just a paper trail that will carry on with your child forever and what does that really mean? So, I'm glad that you're saying it in this way. A lot has changed due to COVID, so how have remote assessments affected the way school systems are using standardized testing now?
Andrew Ho: Just the fact that we are testing remotely is, in many cases, a dramatic shift. I can tell you, as a psychometrician on the backend, that we have done a lot of work trying to clarify how comparable those scores are, because as we've said, standardization is about comparison. And we want to make sure that those scores that are taken at home, often in extremely different environments and sometimes quite chaotic environments and sometimes quite stressful environments, especially at the height of COVID, are comparable. And our general findings were that you cannot trust those remote test scores, especially at early grades and in low stakes conditions. Now certainly for SATs and GREs and other more formal testing programs, they are doing their utmost to create fair conditions where people cannot, to be quite blunt, cheat.
But that is a very, very costly endeavor, and so I just say that there's a lot of work going on to ensure that those scores are in fact comparable. I think the more important question is really about how well we know the debts we need to pay to our schools and societies to recover from COVID, and the estimates that we have. Although those estimates are profound and reveal considerable funding gaps that we need to close, they're also a bit suspect, because the population has changed so much and the conditions have changed so much.
So this is, again, the enduring measurement challenge: how can we monitor progress over time?
LaWanda Toney: Yeah.
Helen Westmoreland: Could you talk a little more about that? Because I think one of the big debates that's going on in a lot of states and communities now is, like, can we, should we, waive that annual high stakes, group, standardized test? And in part, I think some of that pushback, particularly from parents, does come from feeling like the time and effort spent on these things that don't necessarily implicate my child is a little outsized, and here we are in this incredibly difficult situation. My kid might not be a good test taker. What is your perspective on that debate? What are some of the changes we should be thinking about as a nation around that end of year annual standardized test that so many schools have taken and put so much effort into, because it does drive funding and their grades and their whatever?
Andrew Ho: What I've tried to recommend is that we right-size educational testing, which is to say it has been way too heavy on the accountability side and we can have a much lighter footprint and still answer the same important questions by shifting to low stakes aggregate group level assessment. And that does not need the same amount of time.
For example, here in Massachusetts, they simply cut the test in half. They realized, hey, wait a second, to understand how disparities have increased that we need to address, we actually don't need to test everybody for the same amount of time. We could just take a sample, in the same way that you can conduct a political poll not by calling every single person in the country, but just by picking a thousand people. So, too, can we test for less time and potentially fewer people in the future.
Not swinging the pendulum so far that we can't answer important questions, about where and whether to fund our schools, but rather put it right back in the middle to where we have good measures of educational progress that we can act on without the high stakes that distort our teaching and shift our attention away from academic health, social, emotional health, physical health that should be the highest priorities.
Helen Westmoreland: All the educators in the room, too, right, who are like themselves having some really challenging experiences during the year. And like, I can't imagine being a teacher and having to like go into prepping your kids for high stakes tests in just a few months. That sounds hard.
Andrew Ho: Certainly, teacher level evaluation based solely on test scores is indefensible at this time. There's no empirical support for it. And we've been very outspoken in our field about how that would be a misuse of test scores in these conditions.
LaWanda Toney: Andrew, can you talk a little bit more about some of the inequities around testing? Even as far as maybe some of the language used in the development of tests and things like that.
Andrew Ho: Yeah, it's one of the most fundamental questions in our field: does testing expose or exacerbate inequality? And the answer, of course, is both. Yes, it does expose inequality that we must address as a society, but it also worsens it because of the ways we use testing, because of the purposes we use testing to serve. And so I've struggled with this balance throughout my career, and tried to say yes, numbers can be used for good, but let's not forget, they're currently being used to exacerbate inequalities. And to your question about the ways that we design tests, to pull the curtain back a bit on what tests actually are, I strongly encourage every parent to do what we call RTQ, read the question.
So just take some sample questions and read them. What Massachusetts and many other states do, and Massachusetts does this particularly well, is release all of their items, all of their test questions, and you can see all of them and say, what is this measuring, really? And that helps, I think, demystify what tests really are, right? And I wish they did more of this on our score reports. So it's not just a number and not just a description of what proficiency is, although that's useful, but show me the kinds of questions my kid can get right. And show me the kinds of questions my kid might not be getting right, but could be getting right with additional help in the future. Show me like brass tacks, like show me a specific example of the kinds of questions they can answer correctly, and then I'll decide whether I'm worried or not.
LaWanda Toney: I love that. And I don't think a lot of parents realize that they may be able to find the sample questions on the test. Where would people go?
Andrew Ho: Yeah, so every testing program has released at least sample items. Massachusetts has a very, very rich database of exactly these, and I wish they provided them in their reports. It would help to demystify, but they're all Google-able, suffice it to say, at every testing website, and I'd be happy, if people want to find me on Twitter, to direct them to the right location.
Helen Westmoreland: I mean, you mentioned the reports, and one of our partners here at National PTA, Learning Heroes, has done some really incredible research on those reports that come six months after your kid takes a test or something, and what parents really think of them. And you mentioned one of the things, which is like, improving those reports could be adding sample items. Are there other things you'd encourage parents to do, or test providers, quite frankly, to do, to make that information a little less confusing and scary for families who receive those reports?
Andrew Ho: It is really hard, and I don't think a score report on its own will suffice. I think that there needs to be more of a human face or a deliverer of a message. In the same way that if you get a much more precise and consequential diagnosis from a doctor, they sit down with you and talk about what these scores mean. Now these are much blurrier and you kind of need someone to say that. You need someone to say, look, this is just like a sample on a day. And here are the kinds of questions it measures, it's not the be-all and end-all. If we did it again, it could blur this much. It could vary this much.
It's astounding how much imprecision, and impermanence again, there is in these numbers. The margins of error that you see around the scores in these score reports are underestimating the amount. Like if you tested them today, right, and it would be completely different, right, again, six months later. So it's like this moment in history, this single snapshot. Imagine going through your phone and being like, oh, let's take this random picture from this random day and see if that captures who my kid is. It's just not how we should be interpreting numbers.
So I hope we can become stronger to the interpretations from test scores, and I don't think a score report on its own will do that. I think it takes interventions like these, conversations like these that parents have with their teachers, with their school leaders, with other parents, that say, hey, this is not the be-all and end-all. This is just like a single picture from your phone, on a single day, way back when.
LaWanda Toney: Yeah, I think that that's also something that we need to share with our kids as well, because they get very stressed because of the way that it's presented to them. Make sure you get a good night's rest before testing. Make sure you do this, when you should be doing that every day. Yes, and I think that builds a lot of anxiety in them about their ability, because they want to know, how did I do? What happened? And I think just what you said about taking a beat and really putting it in its context, that one picture, that one snapshot, not the whole year.
It doesn't show like your collage of pictures, it just gives you that one little frame. So I love how you said that. Now you've given us, Andrew, a lot of good things to share with our families, and I know you mentioned one thing you wanted families to walk away with, but do you have another?
Andrew Ho: I love what you said about, how we, again, overinterpret scores and that it's not just about parents, it's about kids. That's a really important point. And I think as parents, we should remind our children as often as possible, in particular about the impermanent part, right. That scores are improvable. And again, our weakness to numbers makes us assume that these numbers are somehow immutable, revealing these truths about ourselves, that we cannot change. And certainly the interventions we have these days around improvement mindset, growth mindset, where I always tell my students, it's okay to be wrong.
If you're wrong, it just means you haven't learned it yet, right. And to know that saying that is insufficient. Repeating the mantra doesn't mean you've started to believe it, right. But there's another way to convince folks about that, about the relative imprecision and inconsequentiality of numbers too, which is to remind folks that the best teachers and, frankly, admissions officers, the people who they believe have power over them and use these numbers to wield power over them, are actually not using numbers the way that most kids and parents think. And what I try to remind parents of is that in college admissions, right, we have what I like to call the big five. And the big five are number one, test scores, number two, academic record, GPA, number three, extracurricular activities, number four, letters, and number five, essays. And so that's the portfolio. And test scores are only one of those five and yet take on this dramatically over-weighted importance in the minds of kids and parents. And it's interesting, I think, and important to recognize that the other four receive much less attention.
So I think reminding ourselves that there are five, and each of these five represents a massive universe in and of itself. We have to take a multi-dimensional perspective on our kids and how we help them, and help them learn, help them grow. And that's not what tests are good at. It must be what we are good at in our use of scores as teachers and as parents and as citizens.
Helen Westmoreland: I have learned so much, I also just want to call out and appreciate that from a psychometrician testing expert, you have said care less about tests, which not too many people say about their career, the fields of study they've chosen. If folks want to learn more, Andrew, about this debate or your work specifically, do you have any social media handles or websites that you'd encourage people to check out?
Andrew Ho: Of course, yeah. I am @AndrewDeanHo. Dean is my middle name, not my job title. Andrew Dean Ho, H-O at Twitter.
LaWanda Toney: Awesome. Thank you. Yes. Thank you so much.
Helen Westmoreland: This has been awesome. To our listeners, thanks for joining us. Please remember to visit our Apple Podcasts page and leave a rating and review. We'd love to hear your thoughts on the season so far. And as always, for more resources related to today's episode, check out notesfromthebackpack.com. Thanks for listening and join us next time.