You give tests—what do tests give you back? The following tips help you understand what assessment results mean so you can use them effectively in the classroom to improve student achievement.
1: Understand reliability, validity, and fairness.
The terms reliability, validity, and fairness are tossed around a lot in conversations surrounding effective assessments, so it’s imperative that they be properly defined.
- Reliability is the degree to which a test produces consistent results. This means that if you were to give a biology exam today, and then give the same exam tomorrow without any intervening instruction, your students would earn roughly the same scores both times.
- Validity refers to how well a test measures what it is intended to measure. A valid test delivers evidence that the scores provide meaningful, relevant information about the identified knowledge, skills, and abilities being assessed.
- Fairness levels the playing field. When a test is fair, no student has an unfair advantage or disadvantage relative to another. All students should have equal access to the content being tested and an equal opportunity to demonstrate what they understand, know, and can do.
A fair test is culturally inclusive and makes accommodations for students with physical or cognitive needs.
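The biology exam example under reliability can be made concrete with a small calculation. The sketch below assumes one common approach, test-retest reliability, which correlates scores from two sittings of the same exam. The function name `pearson` and the score lists are invented for illustration and are not taken from any real class.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical scores for the same five students on two consecutive days.
day1 = [78, 85, 62, 90, 71]
day2 = [80, 83, 65, 88, 70]

# A value close to 1 suggests the exam ranks students consistently.
reliability = pearson(day1, day2)
print(round(reliability, 2))
```

A correlation near 1 means students landed in roughly the same order on both days. A value much lower would suggest that scores depend heavily on chance or on testing conditions.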
2: Identify what’s being assessed.
Student performance is most commonly measured against educational standards. Often, educational standards group related skills into a single standard, and test results may be reported for that standard as a whole rather than broken down into individual skills. You may need to do some additional interim or formative testing to dig deeper and get skill-level details. Once you have those details, you can design effective classroom interventions for groups and individuals to close specific skill gaps.
3: Understand the parts of a test item.
The infographic below illustrates the basic elements of an item, so you can use a common vocabulary to discuss assessments and their results:
4: Learn which item statistics really matter for classroom instruction.
Let’s look at the following statistical measures (methods to quantify student performance):
- Item Difficulty (p-value) identifies the proportion of students who answer an item correctly. Lower values indicate harder items; higher values indicate easier items. When the proportion is reported for each answer option, the correct answer typically has a higher value than the distractors. Consider this statistic carefully and in context with the other statistics: some easy items (those with a high p-value) may be core content, material that is easy but essential, and a hard item is not necessarily a bad item.
- Item Discrimination (point biserial) indicates an item’s ability to differentiate among students taking the test, distinguishing students who have successfully learned the concepts covered from those who have not. A positive value indicates that students who score highly on the test as a whole tend to get the item correct. A negative value indicates that students who score highly on the test as a whole tend to get the item incorrect, meaning there may be a problem with the item. Be careful not to read too much into results for items with a low discrimination rating.
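To see how the two statistics above come out of raw results, here is a minimal sketch. It assumes each response has already been scored as 1 (correct) or 0 (incorrect). The `responses` table and both function names are hypothetical. The point biserial here correlates each item with the total score including that item; some reporting tools exclude the item from the total first.

```python
from statistics import mean, pstdev

# Hypothetical scored responses: one row per student, one column per item
# (1 = correct, 0 = incorrect). The data is invented for illustration.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]

def item_difficulty(responses, item):
    """Proportion of students who answered the item correctly (p-value)."""
    return mean(row[item] for row in responses)

def point_biserial(responses, item):
    """Correlation between the item score (0/1) and the total test score."""
    item_scores = [row[item] for row in responses]
    totals = [sum(row) for row in responses]
    p = mean(item_scores)
    sd = pstdev(totals)  # population standard deviation of total scores
    if p in (0, 1) or sd == 0:
        return None  # undefined when everyone scores the same
    mean_correct = mean(t for t, s in zip(totals, item_scores) if s == 1)
    mean_incorrect = mean(t for t, s in zip(totals, item_scores) if s == 0)
    return (mean_correct - mean_incorrect) / sd * (p * (1 - p)) ** 0.5

for item in range(4):
    print(item, item_difficulty(responses, item), point_biserial(responses, item))
```

In this made-up data, the last item is answered correctly mostly by students with lower total scores, so its point biserial comes out negative. That is the pattern described above as a sign that the item may have a problem.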
By learning these core concepts, you can look at results reports with a more educated eye and use the statistical data provided to help your students attain state standards and increase their educational achievement.