Standardized Testing
Summary
TLDR: In this video, Dan Hickey explores standardized testing, discussing its practices, principles, and policies. He explains what makes a test standardized, including the use of a common item bank and standardized scoring. Hickey touches on the role of educational and psychological standards, the application of item response theory, and the evolution of testing formats with technology. He also addresses the interpretation of test scores, the controversy surrounding their use, and the challenges in leveraging them for instruction improvement. The video concludes with a call for educators to understand standardized testing's basics and its impact on their practice.
Takeaways
- Standardized tests involve all test takers answering the same questions, often drawn from large item banks with known difficulty.
- Standardized tests are scored in a consistent manner, though they may not always be aligned with educational or psychological standards.
- Many standardized tests, especially those using psychological measurement, are aligned to theoretical constructs rather than formal educational standards.
- Item Response Theory (IRT) allows test developers to understand the relative difficulty of test items, making it possible to equate different versions of the same test.
- Standardized tests often involve selected-response items, but performance assessments and writing tests are notable exceptions.
- New developments in standardized testing, such as evidence-centered design and adaptive tests, are changing how tests are structured and administered.
- Standardized tests are often dictated by the educational context, meaning educators may have limited control over their use.
- Standardized testing consumes significant time and resources, affecting instructional time and costs in schools.
- Interpreting standardized test scores can be done using percentiles, grade-level equivalents, or more complex scale scores, which are difficult to understand but more accurate.
- Achievement test scores are used for evaluating students, teachers, and schools, but their effectiveness in improving instruction is highly debated.
Q & A
What is a key difference between standardized tests and other types of assessments?
-One key difference is that standardized tests require all test takers to answer the same questions, often drawn from a large pool of carefully constructed items with known difficulty.
What is item response theory (IRT) and why is it important in standardized testing?
-Item Response Theory (IRT) is a psychometric technique that helps test developers determine the relative difficulty of each item in a test. This allows for the creation of different versions of a test and enables the equating of scores across these versions.
How do standardized writing tests differ from traditional multiple-choice standardized tests?
-Standardized writing tests require test takers to respond to the same prompt, and their responses are scored in a standardized manner, often by both computers and humans, unlike multiple-choice tests which involve selected responses.
What is evidence-centered design and how is it impacting standardized tests?
-Evidence-centered design is a newer approach to building standardized tests. Some of the new assessments, such as those from the Smarter Balance Assessment Coalition, use different psychometric models that in some cases let test takers answer different items depending on their responses to prior items. These models are changing the traditional formats of standardized tests and raise complicated issues.
What challenges come with using standardized performance assessments?
-Standardized performance assessments face difficulties because they are more open-ended, making it harder to apply the psychometric assumptions and techniques that work well with selected-response items.
What is the difference between the mean and median when interpreting test scores?
-The median represents the middle score in a distribution, while the mean is the average score. The median is easier to understand but the mean provides a more accurate picture of central tendency.
How is the standard deviation used in interpreting test scores?
-Standard deviation measures the spread of scores around the mean. In a normal distribution, approximately 68% of scores fall within one standard deviation above or below the mean, while 95% fall within two standard deviations.
Why have percentile scores and grade-level equivalents been largely replaced by scale scores?
-Percentile scores and grade-level equivalents are easy to interpret but can be misleading depending on the norming group. Scale scores, while more complex, provide more accurate comparisons across different groups and test versions.
Why is it problematic to use achievement test scores to compare schools?
-Comparing schools based on achievement test scores has often led to negative consequences, such as overemphasis on test preparation rather than instruction improvement, and the use of scores for political purposes rather than meaningful educational reform.
What is a common criticism of using standardized tests to evaluate teachers, as seen in initiatives like Race to the Top?
-A common criticism is that using standardized test scores to evaluate teachers is problematic because these scores do not directly reflect the quality of instruction. The tests are often disconnected from classroom activities, making it difficult to link test scores to teaching effectiveness.
Outlines
Introduction to Standardized Testing
Dan Hickey introduces the topic of standardized testing, emphasizing its relevance to educators, administrators, and teachers in terms of policy. He discusses the characteristics of standardized tests, such as the use of a common question bank, scoring methods, and the alignment with educational standards or psychological constructs. Hickey also touches on the use of item response theory (IRT) in test development and the evolution of testing formats, including the impact of technology and evidence-centered design.
Interpreting Standardized Test Scores
This section delves into the interpretation of standardized test scores, focusing on individual and group interpretations. Hickey explains the concepts of median and range, then transitions to the importance of understanding mean scores and standard deviations. He uses a normal distribution diagram to illustrate how scores are distributed and the significance of standard deviations in interpreting test results. The paragraph also discusses the shift from percentile and grade level scores to scale scores, which are more accurate but less intuitive, and encourages educators to understand the context of scale scores through test interpretation guides.
The Use and Controversy of Achievement Test Scores
Hickey addresses the use of achievement test scores in evaluating student learning, noting the general agreement on measuring achievement but controversy over how scores are utilized. He critiques the use of test scores for comparing schools and the political motivations behind such practices, particularly under the No Child Left Behind Act. The paragraph also discusses the challenges of using test scores to improve instruction directly, suggesting that they are often more useful for research and long-term instructional system improvement. Hickey expresses skepticism about the effectiveness of standardized testing in improving schools and cautions about the involvement of commercial testing companies.
The Future and Impact of Standardized Testing
In the final paragraph, Hickey reflects on the future of standardized testing, suggesting that despite ongoing reforms, it is likely to remain a fixture in education. He urges viewers to understand the basics of standardized tests, consider their intended and actual uses, and be aware of the potential negative consequences. Hickey also points out the pressures to introduce standardized testing in higher education and encourages critical examination of these efforts. The video concludes with a call to action for educators at all levels to deeply understand and thoughtfully engage with standardized testing in their contexts.
Keywords
Standardized Testing
Psychometric Techniques
Item Response Theory (IRT)
Educational Standards
Scale Scores
Percentile Scores
Standard Deviation
Achievement Tests
Evidence-Centered Design
No Child Left Behind Act
Race To The Top Initiative
Highlights
Introduction to the video: Dan Hickey explains that the video will focus on standardized testing policies, relevant for teachers, educators, and administrators.
Definition of standardized tests: All test-takers answer the same questions, often from a large pool of items of known difficulty.
Standardized tests use psychometric techniques, specifically item response theory (IRT), to determine item difficulty and equate different versions of the same test.
Emerging trends in standardized testing, such as evidence-centered design and adaptive tests that change based on prior responses, introduce new complexities.
Context matters in standardized testing: Educators often have little control over the tests used in their context, which are dictated by broader policies and standards.
Standardized tests are costly to develop and administer, consuming significant instructional time in schools.
Many teachers standardize classroom assessments when working with multiple instructors to ensure consistency across courses.
Test score interpretation: Key metrics include median, mean, range, and standard deviations. These are crucial for understanding how students' scores compare within a normal distribution.
Percentile scores, once commonly used, have been largely replaced by scale scores due to their accuracy, though scale scores are harder to interpret.
Using achievement test scores to compare schools has been problematic and politically charged, especially in the context of initiatives like No Child Left Behind.
Achievement test scores are challenging to use for improving classroom instruction due to their disconnection from daily teaching practices.
Despite criticism, standardized tests remain essential for evaluating long-term instructional improvements, particularly in research contexts.
Teacher evaluation through standardized test scores, especially in the U.S. under initiatives like Race To The Top, is highly controversial and complex.
Dan Hickey stresses the importance of understanding test interpretation guides and thinking critically about how standardized tests are used in different educational contexts.
Final thoughts: Standardized testing is here to stay, and educators must be aware of both its potential benefits and profound negative consequences.
Transcripts
Hi again, this is Dan Hickey coming to you with another short video on the practices, principles,
and policies of educational assessment.
This video concerns the topic of standardized testing. We'll discuss some of the practices
and some of the principles.
But mostly, you'll be thinking--if you're a teacher, an educator, or an administrator--you'll
be thinking of standardized testing really in terms of policies.
This video is intended to help you deal with the policies that come up around standardized
testing in your own educational practice.
Let's start by discussing what makes a test standardized.
One of the key differences with standardized tests is that all takers answer the same questions.
Now, these items usually come from a common bank of items.
Commercial tests in particular draw from large pools of very carefully constructed items
of known difficulty.
Now, items on a standardized test are scored in a so-called standard manner.
These items may or may not be connected to standards, or educational standards as you're
likely familiar with.
Educational standards are collections of curricular aims that states or countries have agreed
upon.
Those are typically measured. Achievement of those standards is measured using standardized
tests.
But there are many standardized tests that are not associated with educational standards
or other standards.
In fact, many standardized tests, such as those using psychological measurement, are
aligned to no standards at all, but rather they're aligned to a psychological construct
that has been theoretically developed.
Standardized tests are usually given to large numbers of test takers.
They're developed for comparing large numbers of people in a standardized manner.
They're developed using sophisticated psychometric techniques.
We won't get into those techniques in this video, but it is important to know that a
lot of assumptions go into making standardized tests.
And many of these relate to a technique known as item response theory, or IRT.
The beauty of IRT is that it lets test developers know--it lets them figure out how difficult
each item is relative to other items in the pool.
Once you've done this, it allows you to create different versions of the same test and allows
you to equate different tests.
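As a rough illustration of what "known difficulty" can mean in IRT, here is a minimal sketch. The video does not say which IRT model is used, so this assumes the simplest common one, the one-parameter (Rasch) model; the function name, item names, and difficulty values are all hypothetical.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the one-parameter (Rasch) model.

    Ability and difficulty sit on the same scale, so an item is "harder"
    simply when its difficulty value is larger.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical item difficulties (not taken from any real test).
item_difficulties = {"easy_item": -1.0, "medium_item": 0.0, "hard_item": 1.5}

ability = 0.0  # a hypothetical test taker of average ability
for name, difficulty in item_difficulties.items():
    p = rasch_probability(ability, difficulty)
    print(f"{name}: P(correct) = {p:.2f}")
```

Because every item is placed on one shared difficulty scale, two test forms built from different items in the pool can, in principle, be compared on that scale, which is the idea behind equating.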
Typically, standardized tests involve selected response items.
However, there are quite a few exceptions to this. For instance, in the familiar standardized
writing tests--where all test takers complete the same standardized prompt so to speak--and
then those responses from the writers--from the students--are then scored in a very standardized
manner often using computers and humans.
You also have standardized performance assessments. There was a big push in the nineties in the
US and in Europe to reform testing using standardized performance assessments.
This became quite problematic for some of the reasons we'll talk about today. As you'll
learn, some of the assumptions that go into standardized tests are difficult to meet with
more open-ended formats.
In recent years, we've seen the emergence of so-called evidence-centered design.
Many of the American states are part of the Smarter Balance Assessment Coalition.
We'll be implementing these new assessments. These are standardized tests.
However, they use very different models of psychometrics to allow test takers in fact,
in some cases, to answer different items depending on their response to prior items and that
raises a lot of very complicated issues.
So, just be aware that things are changing. There are a lot of new formats coming out now
and, in particular, the use of technology is really changing standardized testing very
quickly.
It's important to think about the context in which you work and how that relates to
standardized testing.
This matters perhaps more than for any other aspect of educational assessment, because the
standardized testing that relates to you is dictated by that context.
In other words, you don't have very much control over it in many cases.
A lot of times, as an educator or an administrator, and often as a researcher, the broader context
in which you are doing your work is dictating the standardized testing that will be used
in that setting.
Standardized tests are very expensive to develop. They must be securely administered and this
is a big issue.
They take a lot of time and a lot of money to develop, and then they actually take quite
a bit of time to administer as well.
Those of you who work in schools know that enormous amounts of instructional time
end up being dedicated to standardized testing.
Now, some of you may do standardization processes with your classroom assessments.
For instance, if you are a faculty member and you work with multiple instructors teaching
the same course, this happens in secondary schools sometimes as well, you want to take
some time and think about how your understanding and your knowledge of standardized testing
is impacted by the role and the domain in which you teach.
For example, in my own courses--my graduate-level courses in educational assessment and
educational psychology--now, there is no really standardized test out there.
I expect I could probably find one to administer.
Now, I do know that many of my students when they're in pre-service education are going
to face standardized tests of the concepts that they are learning in my course.
It's really in my own research where I really encounter standardized tests a lot.
A lot of what I do in my research is I align semi-formal and informal classroom assessments
to external achievement tests.
Now, those tests are really important because they allow me to make claims that my interventions,
my efforts, and my improvements when I am collaborating with people really will lead
to gains on other standardized tests that the students face.
Now, I'd like to talk about interpreting test scores.
There are two ways of thinking about interpreting standardized test scores.
Many of you will end up interpreting scores as they relate to individual students, but it's
important to understand how test scores are interpreted for groups as well.
Most of you are probably familiar with the notion of median score and the range of scores.
Now, median is what's known as a measure of central tendency.
It's one of several ways of saying how a distribution of scores clusters around the center.
Now the range is a simple way of characterizing the diversity of scores around that central
measure.
It's really much more appropriate to think in terms of means.
Most of you probably know the difference between median and the mean.
Now the median is easier to understand, of course, it's the middle score, but the mean
really captures sort of the average.
And rather than range, it's more appropriate to think in terms of standard deviations.
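To make these four summaries concrete, here is a minimal sketch using Python's standard `statistics` module. The list of scores is invented for illustration and does not come from the video.

```python
import statistics

# Hypothetical scores for nine test takers (illustrative only).
scores = [52, 61, 67, 70, 71, 74, 78, 83, 95]

median_score = statistics.median(scores)  # the middle score when sorted
mean_score = statistics.mean(scores)      # the arithmetic average
score_range = max(scores) - min(scores)   # distance from lowest to highest score
std_dev = statistics.pstdev(scores)       # population standard deviation

print(f"median = {median_score}")
print(f"mean = {mean_score:.1f}")
print(f"range = {score_range}")
print(f"standard deviation = {std_dev:.1f}")
```

The range depends only on the two most extreme scores, while the standard deviation uses every score, which is one reason it is the preferred description of spread.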
Now, let's take a minute here and think about what a standard deviation really is.
If you look at the diagram on the bottom, you can see what's known as a distribution
of scores.
Now, this particular distribution is an example of what's known as a normal distribution where
you can see that the highest score here is at .04 of whatever this is measuring and you
can see that there are more scores at .04 than any other one.
On either side, you can see that there is a minus one or a plus one.
These represent one standard deviation above or below the mean.
Now, when scores are normally distributed as they are in this diagram, when you know
the standard deviation--
when you calculate the standard deviation, which is relatively simple to calculate--
I am not going to talk about it in this video, but when you do calculate it you can understand
quite a bit about the way those scores are distributed.
You can see, for example, that around 68 percent of the scores fall within plus or minus one
standard deviation.
In other words, 34.1 percent of the scores fall within minus one standard deviation and
another 34.1 percent are one standard deviation above.
Moving out on either side, you can see it at minus two and plus two, you can see an
additional 13 percent on either side.
What you really see here is that a relatively small proportion of scores occur beyond minus
two or plus two standard deviations.
Just two percent on either side.
This is really a helpful way of thinking. If someone said that a standard score was
two standard deviations above the mean, right, you can see that that's higher than almost
all the other scores in the distribution.
And likewise, if someone says a score is three standard deviations below the mean,
you can see from that diagram that only .1 percent of the scores fall in that range.
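The percentages read off the diagram can be checked with a short sketch. It assumes a perfectly normal distribution, which real score distributions only approximate, and uses `statistics.NormalDist` from the Python standard library; the helper name `share_between` is made up for this example.

```python
from statistics import NormalDist

# The standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_between(low_sd: float, high_sd: float) -> float:
    """Proportion of scores between two points given in standard deviations."""
    return standard_normal.cdf(high_sd) - standard_normal.cdf(low_sd)


within_one = share_between(-1, 1)            # roughly 68 percent
between_one_and_two = share_between(1, 2)    # roughly 13 to 14 percent per side
beyond_two_above = 1 - standard_normal.cdf(2)  # roughly 2 percent per side
below_minus_three = standard_normal.cdf(-3)    # roughly 0.1 percent

print(f"within +/- 1 SD: {within_one:.1%}")
print(f"between +1 and +2 SD: {between_one_and_two:.1%}")
print(f"above +2 SD: {beyond_two_above:.1%}")
print(f"below -3 SD: {below_minus_three:.2%}")
```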
Now most of the time, you're going to be concerned with interpreting individual scores.
Now, there are three common ways that have been used to interpret scores on standardized
tests.
Traditionally, percentile scores were often reported. You would say that a student would
score, for instance, above 50 percent of the other students in the third grade or above
75 percent.
Now, that's really easy to interpret, but it's incredibly problematic because what that
means depends entirely on the norming group.
Right, so what sample of fourth graders are you referring to? Was this the fourth graders
who took the test? Was this all the fourth graders in the country, the fourth graders
in the state?
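A small sketch can show why the norming group matters so much. The two groups and the student's score below are invented, and the function uses one simple definition of percentile rank (the percentage of the group scoring strictly below the student); published tests may define it differently.

```python
from bisect import bisect_left


def percentile_rank(score: float, norming_scores: list[float]) -> float:
    """Percentage of the norming group that scored strictly below `score`."""
    ordered = sorted(norming_scores)
    below = bisect_left(ordered, score)  # count of scores lower than `score`
    return 100.0 * below / len(ordered)


# Two hypothetical norming groups for the same test (illustrative only).
state_group = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
district_group = [60, 65, 70, 72, 75, 78, 80, 85, 90, 95]

student_score = 72
print(percentile_rank(student_score, state_group))
print(percentile_rank(student_score, district_group))
```

The same raw score lands at a very different percentile in each group, so a percentile reported without its norming group says little.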
Both percentile scores and grade level equivalents have largely been replaced by scale scores.
Now the problem here is that scale scores are very difficult to interpret, but they're
much more accurate and are now most widely used.
In particular, the advantage of scale scores is they address the issues of equating and
equivalence.
Now, I am not going to take the time in this video to really unpack scale scores.
I know from my experience that it just really won't make much sense at all if I explain
it.
Instead, I encourage you to get your hands on a standardized test that's relevant to
you and go through the interpretation guide, and it will explain what scale scores mean
in the context of a test that is meaningful to you.
And I think you'll find that is a much easier way to make sense of this complex concept.
But I really want to encourage you to dig into it.
If you're dealing with standardized tests and they're reported in scale scores, you're
really going to have to help others make sense of what those numbers actually mean.
Now let's talk about using achievement test scores.
Obviously, the purpose of achievement tests is measuring student achievement.
There is a general agreement about how you do that, but what you do with the scores depends
on a lot of other factors.
Most of you are probably aware of the enormous controversy that surrounds the way achievement
test scores are used in the US and in many other countries.
In many cases, achievement scores have been used to compare schools.
Many observers argue that most of the benefit that might come from comparing schools on
standardized scores was already sort of accomplished as schools struggled and worked to sort of
improve their scores.
This turns out to be a really problematic thing. Some argue that this kind of external pressure
on the sort of dysfunctional schools, publishing their achievement scores and giving
them letter grades and whatnot, is going to somehow lead them to use that data to improve.
So far, that just hasn't been the case.
Certainly, people haven't given up. It's continuing to happen, but it remains quite problematic.
In my opinion, I think that a lot of these efforts are quite misguided. They're very
laden with political motivations, and,
in particular, I think the involvement of commercial testing companies in most achievement
testing contexts is quite problematic.
In particular, I am very critical of the No Child Left Behind Act where schools were compared
against a fixed criterion: an unreachable criterion.
Many argue now that that was really a politically motivated effort to break the public's confidence
in public schools, and it appears to have worked.
Again, that is my opinion. Other people disagree with me.
Of course, one of the intended uses of achievement tests is improving and evaluating instruction.
This is really the problematic area in my opinion.
In most school settings, efforts to use scores on achievement tests to directly improve instruction
really have led to very little improvement and really many problematic, negative consequences.
There are a lot of reasons this is the case.
In my opinion, the big issue is that the achievement tests are so removed from the actual instruction
of the classrooms, both in terms of time and the way the knowledge is represented.
Many teachers end up abusing achievement scores for things like sorting and placing students
in different classes and tracking progress, but in terms of actually improving instruction
most teachers will admit that it's very very difficult to use those scores.
Now in my own work, I find that standardized tests are essential for the research that
I do when I am improving instruction, but I work very hard to keep those achievement
tests very removed from the actual work that I am doing.
So if I want to make claims about my work improving achievement, I really have to use
those tests.
I also find that standardized tests are really useful for what I really like to do, which
is try to improve instructional systems over time.
I work very hard to align curriculum and assessment in classrooms.
And if I want to see if those efforts are improving, in the long term, I can't really
look at the results on the classroom assessments because I'm aligning the instruction to those
assessments.
In other words, I am compromising the validity of the classroom assessments for evaluating
my improvements.
The only way that I can really see whether or not I am improving over time is comparing
gains on achievement tests that are administered in a standardized way.
Another use of standardized achievement test scores that you'll probably be hearing about--certainly
if you're in the United States dealing with K-12 education--is evaluating teachers.
The Race To The Top Initiative that's currently underway in the United States has the evaluation
of teachers using standardized test scores really at its center.
I am going to talk about this in another video because this is such a big issue, but this
is really quite problematic for many people.
What I really want to say is this.
Regardless of your opinion of standardized tests, if they're relevant to your area, you
really need to understand some of the basics.
You need to get your hands on a test interpretation guide for a relevant test and really understand
where those scores come from.
Then, you really want to think about how those scores are being used and how they're intended
to be used and how they're actually being used.
There is no way to put this lightly. The negative consequences of standardized testing can be
quite profound.
The positive consequences can be as well, but those have proven quite elusive in recent
years.
Whether or not these new reforms that are underway now in the United States and other
countries actually have an impact remains to be seen, but, either way, it appears that
standardized testing is here to stay.
So, it really behooves you, whatever your role, if you're an administrator, a teacher,
a faculty member--
many of you, if you're working in universities, you know that there is enormous pressure to
introduce standardized achievement tests in university contexts as well.
This is also extremely problematic from a bunch of different perspectives, and I encourage
you to pay close attention to these efforts.
That's all for this short video on standardized testing. I really hope you'll take some time
and dig deeply into resources, textbooks, and information on the web and think about
how standardized tests are supposed to be used or intended to be used in your setting
and how they are actually being used and what you might do to take better advantage of the
scores that those tests are generating in your setting.
Thank you very much.