Standardized Testing

BigOpen OnlineClasses
29 Jun 2014 · 16:08

Summary

TLDR: In this video, Dan Hickey explores standardized testing, discussing its practices, principles, and policies. He explains what makes a test standardized, including the use of a common item bank and standardized scoring. Hickey touches on the role of educational and psychological standards, the application of item response theory, and the evolution of testing formats with technology. He also addresses the interpretation of test scores, the controversy surrounding their use, and the challenges in leveraging them for instruction improvement. The video concludes with a call for educators to understand standardized testing's basics and its impact on their practice.

Takeaways

  • 📚 Standardized tests involve all test takers answering the same questions, often drawn from large item banks with known difficulty.
  • 🎯 Standardized tests are scored in a consistent manner, though they may not always be aligned with educational or psychological standards.
  • 🔬 Many standardized tests, especially those using psychological measurement, are aligned to theoretical constructs rather than formal educational standards.
  • 🧠 Item Response Theory (IRT) allows test developers to understand the relative difficulty of test items, making it possible to equate different versions of the same test.
  • 📝 Standardized tests often involve selected-response items, but performance assessments and writing tests are notable exceptions.
  • 💡 New developments in standardized testing, such as evidence-centered design and adaptive tests, are changing how tests are structured and administered.
  • 🏫 Standardized tests are often dictated by the educational context, meaning educators may have limited control over their use.
  • ⏳ Standardized testing consumes significant time and resources, affecting instructional time and costs in schools.
  • 📈 Interpreting standardized test scores can be done using percentiles, grade-level equivalents, or more complex scale scores, which are difficult to understand but more accurate.
  • 🎓 Achievement test scores are used for evaluating students, teachers, and schools, but their effectiveness in improving instruction is highly debated.

Q & A

  • What is a key difference between standardized tests and other types of assessments?

    -One key difference is that standardized tests require all test takers to answer the same questions, often drawn from a large pool of carefully constructed items with known difficulty.

  • What is item response theory (IRT) and why is it important in standardized testing?

    -Item Response Theory (IRT) is a psychometric technique that helps test developers determine the relative difficulty of each item in a test. This allows for the creation of different versions of a test and enables the equating of scores across these versions.

  • How do standardized writing tests differ from traditional multiple-choice standardized tests?

    -Standardized writing tests require test takers to respond to the same prompt, and their responses are scored in a standardized manner, often by both computers and humans, unlike multiple-choice tests which involve selected responses.

  • What is evidence-centered design and how is it impacting standardized tests?

    -Evidence-centered design is a newer framework for designing assessments. The new standardized tests being implemented by states in the Smarter Balanced Assessment Consortium also use different psychometric models that, in some cases, let test takers answer different items depending on their responses to earlier items. These developments are changing the traditional formats of standardized tests.

  • What challenges come with using standardized performance assessments?

    -Standardized performance assessments face difficulties because they are more open-ended, making it harder to apply the psychometric assumptions and techniques that work well with selected-response items.

  • What is the difference between the mean and median when interpreting test scores?

    -The median is the middle score in a distribution, while the mean is the arithmetic average of all scores. The median is easier to understand, but the video recommends thinking in terms of means (together with standard deviations) when interpreting standardized test scores.

  • How is the standard deviation used in interpreting test scores?

    -Standard deviation measures the spread of scores around the mean. In a normal distribution, approximately 68% of scores fall within one standard deviation above or below the mean, while 95% fall within two standard deviations.

  • Why have percentile scores and grade-level equivalents been largely replaced by scale scores?

    -Percentile scores and grade-level equivalents are easy to interpret but can be misleading depending on the norming group. Scale scores, while more complex, provide more accurate comparisons across different groups and test versions.

  • Why is it problematic to use achievement test scores to compare schools?

    -Comparing schools based on achievement test scores has often led to negative consequences, such as overemphasis on test preparation rather than instruction improvement, and the use of scores for political purposes rather than meaningful educational reform.

  • What is a common criticism of using standardized tests to evaluate teachers, as seen in initiatives like Race to the Top?

    -A common criticism is that using standardized test scores to evaluate teachers is problematic because these scores do not directly reflect the quality of instruction. The tests are often disconnected from classroom activities, making it difficult to link test scores to teaching effectiveness.
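A minimal Python sketch of the mean/median distinction discussed in the Q&A, using only the standard-library `statistics` module. The seven scores are hypothetical; one unusually low score pulls the mean down while the median stays at the middle value.

```python
from statistics import mean, median

# Hypothetical raw scores for seven students; the 12 is an outlier.
scores = [12, 35, 38, 40, 41, 43, 45]

middle = median(scores)   # the middle value of the sorted scores
average = mean(scores)    # the arithmetic average of all scores

print(f"median = {middle}, mean = {average:.2f}")  # prints: median = 40, mean = 36.29
```

The mean uses every score, which makes it sensitive to the outlier but is also what the standard deviation is built on; the median depends only on the middle of the sorted list.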

Outlines

00:00

📚 Introduction to Standardized Testing

Dan Hickey introduces the topic of standardized testing, emphasizing its relevance to educators, administrators, and teachers in terms of policy. He discusses the characteristics of standardized tests, such as the use of a common question bank, scoring methods, and the alignment with educational standards or psychological constructs. Hickey also touches on the use of item response theory (IRT) in test development and the evolution of testing formats, including the impact of technology and evidence-centered design.

05:03

📊 Interpreting Standardized Test Scores

This section delves into the interpretation of standardized test scores, focusing on individual and group interpretations. Hickey explains the concepts of median and range, then transitions to the importance of understanding mean scores and standard deviations. He uses a normal distribution diagram to illustrate how scores are distributed and the significance of standard deviations in interpreting test results. The section also discusses the shift from percentile and grade-level scores to scale scores, which are more accurate but less intuitive, and encourages educators to understand the context of scale scores through test interpretation guides.

10:04

🏫 The Use and Controversy of Achievement Test Scores

Hickey addresses the use of achievement test scores in evaluating student learning, noting the general agreement on how to measure achievement but the controversy over how scores are used. He critiques the use of test scores for comparing schools and the political motivations behind such practices, particularly under the No Child Left Behind Act. The section also discusses the challenges of using test scores to improve instruction directly, suggesting that they are often more useful for research and long-term instructional system improvement. Hickey expresses skepticism about the effectiveness of standardized testing in improving schools and cautions about the involvement of commercial testing companies.

15:08

🔮 The Future and Impact of Standardized Testing

In the final section, Hickey reflects on the future of standardized testing, suggesting that despite ongoing reforms, it is likely to remain a fixture in education. He urges viewers to understand the basics of standardized tests, consider their intended and actual uses, and be aware of the potential negative consequences. Hickey also points out the pressures to introduce standardized testing in higher education and encourages critical examination of these efforts. The video concludes with a call to action for educators at all levels to deeply understand and thoughtfully engage with standardized testing in their contexts.


Keywords

💡Standardized Testing

Standardized testing refers to a form of assessment where all test takers answer the same questions, which are typically drawn from a common pool and scored in a uniform manner. In the video, this concept is central as it discusses the practices, principles, and policies surrounding such tests. The video emphasizes that standardized tests are designed to compare large numbers of people using sophisticated psychometric techniques, and they are often used to measure achievement of educational standards.

💡Psychometric Techniques

Psychometric techniques are scientific methods used to measure psychological attributes, such as intelligence or personality. In the context of the video, these techniques are crucial in developing standardized tests. They help in understanding the difficulty of test items and in creating equivalent test versions, as mentioned when discussing item response theory (IRT).

💡Item Response Theory (IRT)

Item response theory is a model used in the construction of standardized tests to understand how test items relate to the underlying trait being measured. The video highlights IRT's utility in determining the difficulty of each test item relative to others, which is vital for creating different test versions and ensuring their equivalence.
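To illustrate the IRT idea described above, here is a minimal sketch assuming the Rasch (one-parameter logistic) model, one of the simplest IRT models; operational programs may use richer models (for example, with discrimination or guessing parameters). The item difficulties, the response pattern, and the helper names `p_correct` and `estimate_theta` are hypothetical, and the sketch assumes item difficulties are already known, skipping the calibration step in which they are estimated from response data. It shows that ability and difficulty sit on one scale, and that the same raw score earns a higher ability estimate on a harder form, which is the basic idea behind equating different versions of a test.

```python
import math
from typing import Sequence


def p_correct(theta: float, b: float) -> float:
    """Rasch (one-parameter logistic) model: probability that a test taker
    with ability `theta` answers an item of difficulty `b` correctly.
    Ability and difficulty share one logit scale."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(responses: Sequence[int], difficulties: Sequence[float],
                   iterations: int = 25) -> float:
    """Maximum-likelihood ability estimate via Newton-Raphson.
    Assumes at least one correct and one incorrect response
    (all-right or all-wrong patterns have no finite estimate)."""
    theta = 0.0
    for _ in range(iterations):
        probs = [p_correct(theta, b) for b in difficulties]
        # First derivative of the log-likelihood: observed minus expected score.
        gradient = sum(x - p for x, p in zip(responses, probs))
        # Test information at theta (the negative second derivative).
        information = sum(p * (1.0 - p) for p in probs)
        theta += gradient / information
    return theta


# Hypothetical forms: every Form B item is one logit harder than its Form A twin.
form_a = [-1.0, -0.5, 0.0, 0.5, 1.0]
form_b = [b + 1.0 for b in form_a]

# The same raw score (3 of 5 correct) on each form.
responses = [1, 1, 1, 0, 0]
theta_a = estimate_theta(responses, form_a)
theta_b = estimate_theta(responses, form_b)

print(f"Form A ability estimate: {theta_a:.3f}")
print(f"Form B ability estimate: {theta_b:.3f}")  # about one logit higher
```

Under the Rasch model only the raw score matters for the estimate, so which three items were answered correctly does not change the result; richer models relax that property.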

💡Educational Standards

Educational standards are sets of curricular aims or learning objectives that are agreed upon by educational authorities, such as states or countries. The video explains that while standardized tests often measure these standards, not all standardized tests are connected to educational standards, and some are aligned with psychological constructs instead.

💡Scale Scores

Scale scores are numerical representations of a test taker's performance on a standardized test. The video points out that scale scores have largely replaced percentile scores and grade level equivalents because they are more accurate and address issues of test equating. However, they can be challenging to interpret and require understanding the context in which they are used.
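In many IRT-based testing programs, a scale score is a transformation (often linear) of the test taker's ability estimate onto a reporting scale. The sketch below assumes a hypothetical linear mapping with slope 50, intercept 500, and a 200 to 800 reporting range; these values are illustrative, not those of any particular test, and `to_scale_score` is a hypothetical helper. The arbitrariness of such constants is one reason scale scores are hard to interpret without the test's interpretation guide.

```python
def to_scale_score(theta: float, slope: float = 50.0, intercept: float = 500.0,
                   lowest: int = 200, highest: int = 800) -> int:
    """Map an IRT ability estimate onto a hypothetical reporting scale.
    slope, intercept, and the reporting range are illustrative values,
    not those of any real test."""
    scaled = slope * theta + intercept
    # Reporting scales often have fixed endpoints, so clamp, then round.
    return round(min(max(scaled, lowest), highest))


for theta in (-7.0, -1.0, 0.0, 1.2, 3.0):
    print(theta, to_scale_score(theta))  # 200, 450, 500, 560, 650
```

Because equated forms produce ability estimates on the same scale, the same transformation gives comparable scale scores across forms.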

💡Percentile Scores

Percentile scores indicate the percentage of test takers in a norming group whose scores fall below a particular score. While easy to understand, as mentioned in the video, they are problematic because their meaning depends entirely on the norming group, which may vary widely.
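The dependence on the norming group can be shown with a short sketch. The function `percentile_rank` and both norming samples are hypothetical; it counts only scores strictly below the given score, which is one of several conventions publishers use. The same raw score of 30 lands at very different percentiles depending on which group it is compared against.

```python
from bisect import bisect_left
from typing import Sequence


def percentile_rank(score: float, norm_scores: Sequence[float]) -> float:
    """Percentage of the norming group scoring strictly below `score`.
    (Some publishers also count half of tied scores; this sketch does not.)"""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)  # number of norm scores < score
    return 100.0 * below / len(ordered)


# Two hypothetical norming groups of ten students each.
statewide_sample = [10, 15, 18, 20, 22, 25, 27, 28, 32, 35]
high_scoring_sample = [25, 28, 30, 31, 33, 34, 36, 38, 40, 42]

student_score = 30
print(percentile_rank(student_score, statewide_sample))     # 80.0
print(percentile_rank(student_score, high_scoring_sample))  # 20.0
```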

💡Standard Deviation

Standard deviation is a measure of the amount of variation or dispersion in a set of values. The video uses standard deviation to explain how scores are distributed around the mean in a normal distribution, highlighting that a score two standard deviations above the mean is higher than almost all other scores.
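The standard-deviation facts above can be checked with a short Python sketch, assuming scores follow a normal distribution (real score distributions only approximate one). It uses the standard-library `statistics.pstdev` (population standard deviation; `statistics.stdev` would give the sample version) and `math.erf`/`math.erfc` for normal-curve areas. The scores and helper names are hypothetical.

```python
import math
from statistics import mean, pstdev


def z_score(x: float, mu: float, sigma: float) -> float:
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


def share_within(k: float) -> float:
    """Proportion of a normal distribution within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2.0))


def share_above(z: float) -> float:
    """Proportion of a normal distribution above z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical scores for a small group.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
mu = mean(scores)        # 5
sigma = pstdev(scores)   # 2.0 (population standard deviation)

print(z_score(9, mu, sigma))             # 2.0
for k in (1, 2, 3):
    print(k, round(share_within(k), 4))  # 0.6827, 0.9545, 0.9973
print(round(share_above(2.0), 4))        # 0.0228
```

Halving the differences reproduces the familiar diagram bands: about 34.1 percent between the mean and one standard deviation, about 13.6 percent between one and two, and about 2.1 percent between two and three, on each side.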

💡Achievement Tests

Achievement tests are assessments designed to measure what a student knows and can do in a particular subject area. The video discusses the controversy surrounding the use of these test scores, particularly in comparing schools and evaluating teachers, and the challenges in using them to improve instruction.

💡Evidence-Centered Design

Evidence-centered design is a framework for developing assessments that focuses on the evidence a test provides about a test taker's knowledge or skills. The video mentions that this approach is being used in newer standardized tests, allowing for more dynamic test designs where test takers may answer different items based on previous responses.
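Evidence-centered design is a framework for deciding what evidence an assessment should gather; the item-by-item adaptivity mentioned here is a separate technique, computer-adaptive testing. A toy sketch of adaptive item selection follows, assuming Rasch-style item difficulties on the ability scale. The item bank, the simulated test taker, and `run_adaptive_test` are hypothetical, and the step-halving update is a deliberate simplification of the maximum-likelihood or Bayesian ability estimation that operational adaptive tests typically use.

```python
from typing import Callable, Dict, List, Tuple


def run_adaptive_test(item_bank: Dict[str, float],
                      answers_correctly: Callable[[str], bool],
                      n_items: int = 3) -> Tuple[float, List[str]]:
    """Toy computer-adaptive test.

    Each step gives the unused item whose difficulty is closest to the
    current ability estimate (under the Rasch model, that item is the most
    informative), then nudges the estimate up after a correct answer or down
    after an incorrect one, halving the step each time.
    """
    theta, step = 0.0, 1.0
    administered: List[str] = []
    for _ in range(n_items):
        unused = {name: b for name, b in item_bank.items() if name not in administered}
        item = min(unused, key=lambda name: abs(unused[name] - theta))
        administered.append(item)
        theta += step if answers_correctly(item) else -step
        step /= 2.0
    return theta, administered


# Hypothetical item bank: item name -> difficulty on the ability scale.
bank = {"q1": -2.0, "q2": -1.0, "q3": -0.5, "q4": 0.0,
        "q5": 0.5, "q6": 1.0, "q7": 2.0}

# Hypothetical test taker who answers every item of difficulty <= 0.8 correctly.
final_theta, order = run_adaptive_test(bank, lambda name: bank[name] <= 0.8)
print(order, final_theta)  # ['q4', 'q6', 'q5'] 0.75
```

Two test takers with different answers would see different item sequences, which is exactly what complicates comparing them and why the underlying psychometric model matters.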

💡No Child Left Behind Act

The No Child Left Behind Act is a U.S. federal law that aims to improve the performance of America's schools. The video criticizes this act for comparing schools against fixed criteria, which some argue was politically motivated to undermine confidence in public schools.

💡Race To The Top Initiative

Race To The Top is a competitive grant program that encourages states to adopt education reforms. The video briefly mentions that this initiative has put the evaluation of teachers using standardized test scores at its center, a topic that is described as problematic and deserving of a more detailed discussion.

Highlights

Introduction to the video: Dan Hickey explains that the video will focus on standardized testing policies, relevant for teachers, educators, and administrators.

Definition of standardized tests: All test-takers answer the same questions, often from a large pool of items of known difficulty.

Standardized tests use psychometric techniques, specifically item response theory (IRT), to determine item difficulty and equate different versions of the same test.

Emerging trends in standardized testing, such as evidence-centered design and adaptive tests that change based on prior responses, introduce new complexities.

Context matters in standardized testing: Educators often have little control over the tests used in their context, which are dictated by broader policies and standards.

Standardized tests are costly to develop and administer, consuming significant instructional time in schools.

Many teachers standardize classroom assessments when working with multiple instructors to ensure consistency across courses.

Test score interpretation: Key metrics include median, mean, range, and standard deviations. These are crucial for understanding how students' scores compare within a normal distribution.

Percentile scores, once commonly used, have largely been replaced by scale scores, which are more accurate but harder to interpret.

Using achievement test scores to compare schools has been problematic and politically charged, especially in the context of initiatives like No Child Left Behind.

Achievement test scores are challenging to use for improving classroom instruction due to their disconnection from daily teaching practices.

Despite his criticisms, Hickey considers standardized tests essential in his own research for evaluating long-term instructional improvements, because classroom assessments aligned to instruction cannot serve as an independent measure.

Teacher evaluation through standardized test scores, especially in the U.S. under initiatives like Race To The Top, is highly controversial and complex.

Dan Hickey stresses the importance of understanding test interpretation guides and thinking critically about how standardized tests are used in different educational contexts.

Final thoughts: Standardized testing is here to stay, and educators must be aware of both its potential benefits and profound negative consequences.

Transcripts

00:02

Hi again, this is Dan Hickey coming to you with another short video on the practices, principles,

00:08

and policies of educational assessment.

00:12

This video concerns the topic of standardized testing. We'll discuss some of the practices

00:18

and some of the principles.

00:19

But mostly, you'll be thinking--if you're a teacher, an educator, or an administrator--you'll

00:23

be thinking of standardized testing really in terms of policies.

00:26

This video is intended to help you deal with the policies that come up around standardized

00:31

testing in your own educational practice.

00:34

Let's start by discussing what makes a test standardized.

00:38

One of the key differences with standardized tests is that all test takers answer the same questions.

00:44

Now, these items usually come from a common bank of items.

00:48

Commercial tests in particular draw from large pools of very carefully constructed items

00:54

of known difficulty.

00:55

Now, items on standardized tests are scored in a so-called standard manner.

01:00

These items may or may not be connected to standards, or educational standards, which you're

01:05

likely familiar with.

01:07

Educational standards are collections of curricular aims that states or countries have agreed

01:12

upon.

01:13

Achievement of those standards is typically measured using standardized

01:18

tests.

01:18

But there are many standardized tests that are not associated with educational standards

01:23

or other standards.

01:24

In fact, many standardized tests, such as those using psychological measurement, are

01:29

aligned to no standards at all, but rather they're aligned to a psychological construct

01:33

that has been theoretically developed.

01:35

Standardized tests are usually given to large numbers of test takers.

01:38

They're developed for comparing large numbers of people in a standardized manner.

01:43

They're developed using sophisticated psychometric techniques.

01:47

We won't get into those techniques in this video, but it is important to know that a

01:52

lot of assumptions go into making standardized tests,

01:55

and many of these relate to a technique known as item response theory, or IRT.

02:00

The beauty of IRT is that it lets test developers figure out how difficult

02:07

each item is relative to other items in the pool.

02:11

Once you've done this, it allows you to create different versions of the same test and allows

02:16

you to equate different tests.

02:20

Typically, standardized tests involve selected-response items.

02:24

However, there are quite a few exceptions to this. For instance, there are the familiar standardized

02:28

writing tests--where all test takers complete the same standardized prompt, so to speak--and

02:35

then those responses from the writers--from the students--are scored in a very standardized

02:40

manner, often using computers and humans.

02:42

You also have standardized performance assessments. There was a big push in the nineties in the

02:48

US and in Europe to reform testing using standardized performance assessments.

02:55

This became quite problematic, for some of the reasons we'll talk about today, as you

03:00

learn that some of the assumptions that go into standardized tests are difficult to satisfy with

03:04

more open-ended formats.

03:07

In recent years, we've seen the emergence of so-called evidence-centered design.

03:11

Many of the American states are part of the Smarter Balanced Assessment Consortium,

03:15

which will be implementing these new assessments. These are standardized tests.

03:21

However, they use very different models of psychometrics that allow test takers, in fact,

03:27

in some cases, to answer different items depending on their responses to prior items, and that

03:33

raises a lot of very complicated issues.

03:35

So, just be aware that things are changing. There are a lot of new formats coming out now,

03:39

and, in particular, the use of technology is really changing standardized testing very

03:43

quickly.

03:44

It's important to think about the context in which you work and how that relates to

03:50

standardized testing,

03:52

perhaps more than for any other aspect of educational assessment, because the standardized testing

03:59

that relates to you is dictated by that context.

04:02

In other words, you don't have very much control over it in many cases.

04:05

A lot of times, as an educator or an administrator, and often as a researcher, the broader context

04:12

in which you are doing your work dictates the standardized testing that will be used

04:16

in that setting.

04:17

Standardized tests are very expensive to develop. They must be securely administered, and this

04:21

is a big issue.

04:22

They take a lot of time and a lot of money to develop, and then they actually take quite

04:26

a bit of time to administer as well.

04:28

Those of you who work in schools know that enormous amounts of instructional time

04:33

end up being dedicated to standardized testing.

04:37

Now, some of you may do standardization processes with your classroom assessments.

04:43

For instance, if you are a faculty member and you work with multiple instructors teaching

04:48

the same course (this sometimes happens in secondary schools as well), you want to take

04:53

some time and think about how your understanding and your knowledge of standardized testing

04:59

is impacted by the role and the domain in which you teach.

05:03

For example, in my own courses--my graduate-level courses in educational assessment and

05:08

educational psychology--there really is no standardized test out there.

05:13

I expect I could probably find one to administer.

05:16

Now, I do know that many of my students, when they're in pre-service education, are going

05:22

to face standardized tests of the concepts that they are learning in my course.

05:27

It's really in my own research where I encounter standardized tests a lot.

05:31

A lot of what I do in my research is align semi-formal and informal classroom assessments

05:38

to external achievement tests.

05:40

Now, those tests are really important because they allow me to make claims that my interventions,

05:44

my efforts, and my improvements when I am collaborating with people really will lead

05:50

to gains on other standardized tests that the students face.

05:53

Now, I'd like to talk about interpreting test scores.

05:57

There are two ways of thinking about interpreting standardized test scores.

06:01

Many of you will end up interpreting scores as they relate to individual students, but it's

06:06

important to understand how test scores are interpreted for groups as well.

06:11

Most of you are probably familiar with the notion of median score and the range of scores.

06:16

Now, median is what's known as a measure of central tendency.

06:19

It's one of several ways of saying how a distribution of scores clusters around the center.

06:26

Now, the range is a simple way of characterizing the spread of scores around that central

06:33

measure.

06:35

It's really much more appropriate to think in terms of means.

06:38

Most of you probably know the difference between the median and the mean.

06:41

Now, the median is easier to understand, of course; it's the middle score, but the mean

06:46

really captures the average.

06:48

And rather than range, it's more appropriate to think in terms of standard deviations.

06:52

Now, let's take a minute here and think about what a standard deviation really is.

06:57

If you look at the diagram on the bottom, you can see what's known as a distribution

07:02

of scores.

07:02

Now, this particular distribution is an example of what's known as a normal distribution, where

07:07

you can see that the highest score here is at .04 of whatever this is measuring and you

07:12

can see that there are more scores at .04 than any other one.

07:17

On either side, you can see that there is a minus one or a plus one.

07:21

These represent one standard deviation above or below the mean.

07:26

Now, when scores are normally distributed as they are in this diagram, when you know

07:31

the standard deviation,

07:32

or rather when you calculate the standard deviation, which is relatively simple to do.

07:35

I am not going to talk about it in this video, but when you do calculate it, you can understand

07:41

quite a bit about the way those scores are distributed.

07:43

You can see, for example, that around 68 percent of the scores fall within plus or minus one

07:50

standard deviation.

07:51

In other words, 34.1 percent of the scores fall between one standard deviation below the mean and the mean, and

07:57

another 34.1 percent fall between the mean and one standard deviation above it.

08:02

Moving out on either side, at minus two and plus two, you can see

08:08

an additional 13 percent on either side.

08:12

What you really see here is that a relatively small proportion of scores occur beyond minus

08:20

two or plus two standard deviations.

08:23

Just about two percent on either side.

08:26

This is really a helpful way of thinking. If someone said that a standard score was

08:29

two standard deviations above the mean, right, you can see that that's higher than almost

08:35

all the other scores in the distribution.

08:37

And likewise, if someone says a score is three standard deviations below the mean,

08:43

you can see from that diagram that only .1 percent of the scores fall in that range.

08:50

Now, most of the time, you're going to be concerned with interpreting individual scores.

08:54

There are three common ways that have been used to interpret scores on standardized

09:00

tests.

09:01

Traditionally, percentile scores were often reported. You would say that a student

09:07

scored, for instance, above 50 percent of the other students in the third grade, or above

09:12

75 percent.

09:13

Now, that's really easy to interpret, but it's incredibly problematic because what that

09:19

means depends entirely on the norming group.

09:21

Right, so what sample of fourth graders are you referring to? Was this the fourth graders

09:25

who took the test? Was this all the fourth graders in the country, the fourth graders

09:29

in the state?

09:32

Both percentile scores and grade-level equivalents have largely been replaced by scale scores.

09:36

Now, the problem here is that scale scores are very difficult to interpret, but they're

09:41

much more accurate and are now the most widely used.

09:44

In particular, the advantage of scale scores is that they address the issues of equating and

09:49

equivalence.

09:50

Now, I am not going to take the time in this video to really unpack scale scores.

09:55

I know from my experience that it just really won't make much sense at all if I explain

09:59

it.

09:59

Instead, I encourage you to get your hands on a standardized test that's relevant to

10:03

you and go through the interpretation guide, and it will explain what scale scores mean

10:09

in the context of a test that is meaningful to you.

10:11

And I think you'll find that is a much easier way to make sense of this complex concept.

10:16

But I really want to encourage you to dig into it.

10:19

If you're dealing with standardized tests and they're reported in scale scores, you're

10:23

really going to have to help others make sense of what those numbers actually mean.

10:28

Now let's talk about using achievement test scores.

10:33

Obviously, the purpose of achievement tests is measuring student achievement.

10:36

There is general agreement about how you do that, but what you do with the scores depends

10:43

on a lot of other factors.

10:45

Most of you are probably aware of the enormous controversy that surrounds the way achievement

10:49

test scores are used in the US and in many other countries.

10:53

In many cases, achievement scores have been used to compare schools.

10:58

Many observers argue that most of the benefit that might come from comparing schools on

11:04

standardized scores was already accomplished as schools struggled and worked to

11:09

improve their scores.

11:11

This turns out to be a really problematic thing. Some people think that putting that kind of external pressure

11:16

on dysfunctional schools, publishing their achievement scores and giving

11:22

them letter grades and whatnot, is somehow going to lead them to use that data to improve.

11:28

So far, that just hasn't been the case.

11:30

Certainly, people haven't given up. It's continuing to happen, but it remains quite problematic.

11:35

In my opinion, a lot of these efforts are quite misguided. They're very

11:39

laden with political motivations, and,

11:43

in particular, I think the involvement of commercial testing companies in most achievement

11:49

testing contexts is quite problematic.

11:53

In particular, I am very critical of the No Child Left Behind Act, where schools were compared

11:59

against a fixed criterion: an unreachable criterion.

12:01

Many argue now that that was really a politically motivated effort to break the public's confidence

12:07

in public schools, and it appears to have worked.

12:09

Again, that is my opinion. Other people disagree with me.

12:12

Of course, one of the intended uses of achievement tests is improving and evaluating instruction.

12:19

This is really the problematic area, in my opinion.

12:23

In most school settings, efforts to use scores on achievement tests to directly improve instruction

12:29

really have led to very little improvement and to many problematic, negative consequences.

12:35

There are a lot of reasons this is the case.

12:38

In my opinion, the big issue is that the achievement tests are so removed from the actual instruction

12:44

in the classrooms, both in terms of time and the way the knowledge is represented.

12:49

Many teachers end up abusing achievement scores for things like sorting and placing students

12:54

in different classes and tracking progress, but in terms of actually improving instruction,

12:59

most teachers will admit that it's very, very difficult to use those scores.

13:03

Now, in my own work, I find that standardized tests are essential for the research that

13:08

I do when I am improving instruction, but I work very hard to keep those achievement

13:13

tests very removed from the actual work that I am doing.

13:15

So if I want to make claims about my work improving achievement, I really have to use

13:21

those tests.

13:22

I also find that standardized tests are really useful for what I really like to do, which

13:27

is try to improve instructional systems over time.

13:30

I work very hard to align curriculum and assessment in classrooms.

13:35

And if I want to see whether those efforts are improving things in the long term, I can't really

13:41

look at the results on the classroom assessments because I'm aligning the instruction to those

13:46

assessments.

13:47

In other words, I am compromising the validity of the classroom assessments for evaluating

13:51

my improvements.

13:52

The only way that I can really see whether or not I am improving over time is by comparing

13:57

gains on achievement tests that are administered in a standardized way.

14:01

Another use of standardized achievement test scores that you'll probably be hearing about--certainly

14:08

if you're in the United States dealing with K-12 education--is evaluating teachers.

14:13

The Race To The Top Initiative that's currently underway in the United States has the evaluation

14:20

of teachers using standardized test scores really at its center.

14:24

I am going to talk about this in another video because this is such a big issue, but this

14:28

is really quite problematic for many people.

14:30

What I really want to say is this.

14:34

Regardless of your opinion of standardized tests, if they're relevant to your area, you

14:39

really need to understand some of the basics.

14:42

You need to get your hands on a test interpretation guide for a relevant test and really understand

14:47

where those scores come from.

14:48

Then, you really want to think about how those scores are intended

14:53

to be used and how they're actually being used.

14:56

There is no way to put this lightly. The negative consequences of standardized testing can be

15:01

quite profound.

15:03

The positive consequences can be as well, but those have proven quite elusive in recent

15:08

years.

15:08

Whether or not these new reforms that are underway now in the United States and other

15:12

countries actually have an impact remains to be seen, but, either way, it appears that

15:16

standardized testing is here to stay.

15:18

So, it really behooves you, whatever your role, whether you're an administrator, a teacher,

15:23

a faculty member...

15:24

Many of you, if you're working in universities, know that there is enormous pressure to

15:29

introduce standardized achievement tests in university contexts as well.

15:33

This is also extremely problematic from a bunch of different perspectives, and I encourage

15:37

you to pay close attention to these efforts.

15:42

That's all for this short video on standardized testing. I really hope you'll take some time

15:46

and dig deeply into resources, textbooks, and information on the web and think about

15:51

how standardized tests are supposed to be used or intended to be used in your setting

15:56

and how they are actually being used and what you might do to take better advantage of the

16:00

scores that those tests are generating in your setting.

16:03

Thank you very much.


Related Tags
Standardized Testing, Educational Assessment, Psychometric Techniques, Testing Policies, Achievement Scores, IRT, Education Reform, Instructional Improvement, Testing Critique, Educational Standards