My goal is to give people more information about how speaker points get assigned so that we can all make more informed decisions. My hope is that this information is not used to single out any particular judge or judges for criticism. Instead, it is an attempt to make a relatively opaque process somewhat more transparent. Assigning speaker points is not an exact science. Nor is it completely arbitrary or capricious. Ideally, judges can use this information to better understand how their points relate to those of the community at large.
To be clear, I do not believe that there is any such thing as "correct" points. Similarly, there is no single rubric for what counts as a good speaker. Every judge values different things about a speaker, and that should be celebrated. Furthermore, beyond random variance, there may be a good reason for a judge's points to diverge if they value qualities in speakers that the rest of the community tends to undervalue. The aim in normalizing point distributions is not to get everybody to agree about what counts as a good speaker. Rather, it is to get everybody to use a common language in scoring. We may disagree about what "good" means, but for speaker points to work, we need to know that when I think a speech is good, I give roughly the same points that you give when you think a speech is good.
I apologize that the table is not presented in a format that is easy to understand without some basic knowledge of statistics, but there is a glossary that defines each of the categories. To help clarify, I will work through my own line as an example.
However, we can see that I give slightly below average points by looking at the "Deb Med" and "Med Diff" columns. "Deb Med" (or Debater Median) is the median of the points that the debaters I have judged received in all of their debates over the course of the year. In my case it is 28.6, meaning that the debaters I judged were slightly above average speakers. "Med Diff" (Median Difference) is how far my points typically deviate from the points that the debaters I judge usually receive. I have a -0.1 median difference, which means that I typically give a tenth of a point less than what everybody else gives the same debaters. Median difference is the simplest way to see whether your points tend to deviate from the norm, and by how much.
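For readers who prefer to see the arithmetic, here is a minimal Python sketch of one way a median difference could be computed. It is an illustration under stated assumptions, not necessarily how the table was built: it assumes each ballot is compared against that debater's season median and that the median of those gaps is reported. The function name, the data layout, and all of the numbers are made up for the example.

```python
from statistics import median


def median_difference(ballots, season_points):
    """Median gap between a judge's points and each debater's season median.

    ballots: list of (debater, points_this_judge_gave) pairs (hypothetical layout).
    season_points: dict mapping debater -> all points they received this season.
    Assumption: the gap is measured against each debater's own season median.
    """
    gaps = [points - median(season_points[debater]) for debater, points in ballots]
    return median(gaps)


# Made-up data for illustration only.
season_points = {
    "A": [28.5, 28.6, 28.7, 28.8],
    "B": [28.4, 28.6, 28.6, 28.9],
    "C": [28.2, 28.5, 28.7],
}
ballots = [("A", 28.5), ("B", 28.5), ("C", 28.6)]

med_diff = median_difference(ballots, season_points)
print(round(med_diff, 1))
```

A negative result means the judge typically gives less than those debaters usually receive, and a positive result means more.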
The next two columns ("< Med" and "> Med") go together. They express the percentage of the time that you give points that are below ("< Med") or above ("> Med") the median points of those you judge. Ideally, these two numbers would be equal, meaning that you give out below-median points as often as you give out above-median points. However, we can see that my split is not even. I give out below-median points 67% of the time and above-median points only 17% of the time. This is consistent with what we would expect given that my Median Difference is also negative.
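The below/above split can be sketched the same way. This is a hedged illustration rather than the table's actual method: it assumes that a ballot landing exactly on a debater's median counts toward neither column, which would explain why the two percentages need not add up to 100. The function name, the tolerance parameter, and the data are hypothetical.

```python
from statistics import median


def below_above_share(ballots, season_points, tol=1e-9):
    """Percent of ballots below and above each debater's season median.

    Assumption: ballots within `tol` of the median count toward neither side.
    """
    below = above = 0
    for debater, points in ballots:
        gap = points - median(season_points[debater])
        if gap < -tol:
            below += 1
        elif gap > tol:
            above += 1
    total = len(ballots)
    return 100 * below / total, 100 * above / total


# Made-up data: one debater whose season median is 28.5.
season_points = {"X": [28.4, 28.5, 28.6]}
ballots = [("X", 28.3), ("X", 28.4), ("X", 28.4),
           ("X", 28.2), ("X", 28.7), ("X", 28.5)]

pct_below, pct_above = below_above_share(ballots, season_points)
print(round(pct_below), round(pct_above))
```

In this made-up example four of six ballots fall below the median, one falls above, and one lands exactly on it.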
The final four columns all go together and show how often the judge gives points that deviate significantly from a debater's average ("SD" meaning Standard Deviation). To be clear, we should expect this to happen. Debaters are not robots. They perform inconsistently, and different judges value different things in a speaker. However, if there is a large and consistent skew toward the positive or negative, then a judge might consider whether their points are out of step with community norms for what points generally mean. Under the "-2 SD" column, I have a 2.3, which means that 2.3% of the time I give points that are more than 2 standard deviations worse than what those debaters usually receive. 10.3% of the time I give points that are between 1 and 2 standard deviations worse, and 2.3% of the time I give points that are between 1 and 2 standard deviations better. I never gave points that were more than 2 standard deviations better. To make this a bit more concrete, if a debater's points are roughly normally distributed, points that are outside of 2 standard deviations are at the extremes, basically what we would expect to be the highest or lowest ~2% of points that that debater will receive over the course of the year. Points that are more than 1 standard deviation away are about the highest or lowest ~16%.
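As a rough sketch of the standard deviation buckets, the Python below sorts a single ballot into a band based on how far it sits from a debater's season mean, and then checks the ~2% and ~16% figures against a normal distribution. It assumes the mean and the sample standard deviation of a debater's season points are the reference; the table may use something different. Only "-2 SD" is a column name taken from the text above, so the other bucket labels, the function name, and the data are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def sd_bucket(points, debater_season_points):
    """Classify one ballot by its distance from the debater's season mean.

    Assumption: the reference is the season mean and sample standard deviation.
    Labels other than "-2 SD" are hypothetical.
    """
    mu = mean(debater_season_points)
    sigma = stdev(debater_season_points)
    z = (points - mu) / sigma
    if z < -2:
        return "-2 SD"        # more than 2 SDs worse than usual
    if z < -1:
        return "-1 SD"        # between 1 and 2 SDs worse
    if z > 2:
        return "+2 SD"        # more than 2 SDs better than usual
    if z > 1:
        return "+1 SD"        # between 1 and 2 SDs better
    return "within 1 SD"


# Made-up season with mean 28.5 and sample standard deviation 0.5.
season = [28.0, 28.5, 29.0]
print(sd_bucket(27.4, season), sd_bucket(29.1, season))

# Share of a normal distribution in one tail, beyond 2 SDs and beyond 1 SD.
std_normal = NormalDist()
tail_beyond_2 = std_normal.cdf(-2)  # roughly 0.023
tail_beyond_1 = std_normal.cdf(-1)  # roughly 0.159
print(f"{tail_beyond_2:.3f} {tail_beyond_1:.3f}")
```

The two tail values are where the ~2% and ~16% figures come from. Real speaker points are only approximately normal, so the percentages for any given debater will be rougher than this.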
In sum, I gave out slightly low points, but I should be able to address that with a fairly minor correction.