Full disclosure: In addition to my work with Concordia, I am also helping Michigan this year. To avoid possible conflict-of-interest problems, I have made no changes to the ratings algorithm since summer and will make no changes over the course of this year. This is pretty easy with Glicko-style ratings: once you set them going, all you have to do is enter new results data as they arrive.
A quick refresher on how the ratings work: Glicko-style ratings are determined in a self-correcting, relational way based on who a team competes against. If you win, your rating goes up; if you lose, it goes down. How much it moves depends on the rating of your opponent. If you beat somebody with a much higher rating, yours will go up a lot; if you beat somebody with a much lower rating, yours might barely move at all. And vice versa for losing.

At the beginning of the season, each team starts with the same rating (1500). As results come in, the ratings begin to separate teams, moving them up and down as they win and lose debates. Since there is little data early on, the ratings are much rougher at the beginning and gradually become more fine-tuned over the course of the season. They need some time to sort themselves out. More data = better. More data also gradually stabilizes a team's rating: at the beginning of the season the ratings are more unstable and react more quickly to change than they do at the end.

The numeric value of a team's rating is essentially predictive. The difference between two teams' ratings forms the basis of a prediction about the outcome of a debate between them. For example, a team with a 200-point advantage is considered to be a 3:1 favorite over their opponent. You can find the predictions for your own debates by using the prediction calculator.
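To make the prediction and update math concrete, here is a minimal sketch in Python. The `win_probability` curve is the standard Elo/Glicko logistic formula; the update function is a simplified Elo-style version of the same idea (real Glicko also tracks a rating deviation, which this sketch ignores, and the step size `k` is an assumed placeholder, not a value the actual ratings use):

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Expected score for team A against team B on the standard
    Elo/Glicko logistic curve (rating deviation ignored)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def updated_rating(rating: float, opp_rating: float, won: bool,
                   k: float = 32.0) -> float:
    """Simplified Elo-style update: the rating moves in proportion
    to how surprising the result was. `k` is an assumed step size."""
    expected = win_probability(rating, opp_rating)
    return rating + k * ((1.0 if won else 0.0) - expected)

# A 200-point edge comes out to roughly a 3:1 favorite:
print(round(win_probability(1700, 1500), 2))           # 0.76

# An upset win over a stronger team moves you far more than an
# expected win over a weaker one:
print(round(updated_rating(1500, 1700, won=True), 1))  # 1524.3
print(round(updated_rating(1500, 1300, won=True), 1))  # 1507.7
```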
Comments on where the ratings sit now:
- The relative dearth of data doesn't allow the ratings to be terribly sophisticated yet. As it stands, the ratings line up pretty directly with win/loss percentage. This will change as they are able to account more and more for strength of opposition.
- The bifurcation of tournament travel also impacts the ratings. The algorithm makes no assumptions about the overall quality of a tournament. Instead, one's rating is determined in relation to the "pond" that one swims in. This can have a big impact when we try to compare teams from relatively discrete ponds that don't interact much. In other words, a 1700 for a team that went to UMKC/Weber does not necessarily mean the same thing as it does for a team that went to GSU/UK. We will only get a better picture once there is more circulation between the two pools of debaters.
- Neither the Kentucky nor the Weber round robin was included in the first period of data. I discussed the reason here. In short, including the round robins in the first data period has a large distorting effect because the ratings have not yet been able to distinguish quality of opposition strongly enough. Since a round robin gives you a large number of rounds against almost exclusively high-quality opponents, the ratings can unduly punish teams that underperform and inadequately reward teams that do well. The solution is to move the inclusion of those results back one time period (in effect, pretend that the round robins happen a week later than they do); a sketch of this deferral appears after this list. This allows the ratings to settle in and more properly evaluate round outcomes. For now, this means that at least Michigan State ST (as well as maybe Iowa KL and one or two others like Binghamton SS) are most likely underrated in the standings. Inclusion of the round robins will also likely pull Michigan KM and possibly Harvard HS down a little.
- To make sure that there's a minimum of data available for each team, there is a two-tournament minimum for inclusion in the ratings; it will rise to three later in the season. Consequently, you will not see your name in the ratings if you have only attended one tournament, even if you did well. (A minimal sketch of this eligibility filter also appears below.)
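As promised above, here is a hedged sketch of the round robin deferral. The data layout and the `is_round_robin` flag are illustrative assumptions, not the actual schema; the point is just that round robin results get scored one rating period later than they actually occur:

```python
from collections import defaultdict

def batch_into_periods(tournaments):
    """Group results into rating periods, deferring round robins one
    period so they are scored after the ratings have begun to
    separate the field. Field names are illustrative, not the
    actual data schema."""
    periods = defaultdict(list)
    for t in tournaments:
        period = t["period"] + 1 if t["is_round_robin"] else t["period"]
        periods[period].extend(t["results"])
    return periods

# e.g. the Kentucky round robin, nominally in period 1, gets scored
# alongside period 2's tournaments:
schedule = [
    {"name": "GSU", "period": 1, "is_round_robin": False, "results": ["..."]},
    {"name": "Kentucky RR", "period": 1, "is_round_robin": True, "results": ["..."]},
]
print(sorted(batch_into_periods(schedule).items()))
# [(1, ['...']), (2, ['...'])]
```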
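And the eligibility filter from the last item, equally illustrative (`tournaments_attended` is an assumed field name, not the site's actual data):

```python
MIN_TOURNAMENTS = 2  # rises to 3 later in the season

def eligible_teams(teams):
    """Drop teams without enough tournaments for a minimally
    reliable rating."""
    return [t for t in teams
            if t["tournaments_attended"] >= MIN_TOURNAMENTS]
```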