If you feel strongly that you belong on this list but don't see your name, it is most likely because the ratings don't have enough data on you. The deviation for your rating has to be below 2.2 to be listed, which roughly amounts to around 18 to 20 rounds. I count somewhere around 25 teams that might have a good enough rating but lack the number of rounds necessary to give the system enough confidence.
If you feel strongly that some teams are not correctly ranked, consider:
- These are not my personal opinions. The algorithm is set and runs independently of how I may personally feel about teams. I do not put my finger on the scale.
- The ratings are determined by nothing more than the head-to-head outcomes of debate rounds. No preconceptions about which schools or debaters are good, no weighting for perceived quality of tournaments, no eye-test adjustments. If you beat somebody, your rating goes up and theirs goes down. If you beat somebody with a much higher rating, it goes up more. If you beat them in elims, it will go up by more than if you do so in prelims. That's it. If you want all the gory details, follow the link given below.
- The quality of the ratings is limited by the quantity and quality of the data available. It is still early in the season and a whole lot of teams haven't seen one another. The geographic split in tournament travel makes things even more complicated. Teams listed high or low right now might see considerable changes in their rankings over the course of the season. It is entirely possible (even certain) that there are teams that have not performed in a way that's consistent with how good they "really are."
There have been significant changes to the ratings algorithm from last year. For a detailed description, please follow this link. The changes can be briefly summarized as follows:
- The ratings now use an implementation of the TrueSkill algorithm (TM Microsoft Corporation - Yay capitalism!) instead of the Glicko ratings system. The two systems share much of the same basic logic, but the core mathematical tools used are very different.
- The ratings now do a much better job of accounting for opponent strength early in the season than previous iterations did. The algorithm does this by retroactively using data from "future" rounds to assess opponent strength when it is not sufficiently confident in its contemporaneous rating of them. An example: say you debated Team X at the first tournament of the year. The ratings do not have much data at that point to determine how good Team X is, so the algorithm will look at that team's rounds at subsequent tournaments to create a provisional rating and give you more appropriate credit for your performance against them (it looks forward the minimum amount necessary to reach an adequate level of confidence). When you debate Team X again later in the year, say at the Wake Forest tournament, the ratings may have a much better picture of how good your opponent is, so they no longer have to create a provisional rating and will instead just use Team X's real (contemporaneous) one.
- The ratings now give extra weight to elimination round wins.
- I have stopped trying to account for partnership switches. Instead, each pairing of debaters is considered a discrete unit: if you debate with multiple partners, you will not get the credit from a win (or loss) with one partner on your rating with a different partner. While I feel bad for those this might disadvantage, the obstacles are too large for me to overcome. It's possible that somebody might prove me wrong, but I have serious doubts that an adequate model of partner switches can even be built from the type of information collected on ballots.
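The core mechanics described above - the winner's rating goes up, the loser's goes down, upsets move ratings more, and the system's uncertainty about a team shrinks as it sees more rounds - can be sketched with the textbook no-draw 1v1 TrueSkill update. To be clear, this is an illustrative pure-Python sketch using standard published formulas and TrueSkill's default-style constants, not the actual code behind these ratings; the BETA value is an assumption, and the extra elim-round weighting is not modeled here.

```python
import math

BETA = 25 / 6  # performance-noise scale; TrueSkill's conventional default (assumed)

def pdf(x):
    """Standard normal probability density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def update_1v1(winner, loser):
    """One no-draw 1v1 TrueSkill-style update.

    winner and loser are (mu, sigma) pairs: mu is the skill estimate,
    sigma is the system's uncertainty about it. Returns updated pairs.
    """
    (mu_w, s_w), (mu_l, s_l) = winner, loser
    c = math.sqrt(2 * BETA**2 + s_w**2 + s_l**2)
    t = (mu_w - mu_l) / c
    v = pdf(t) / cdf(t)   # mean-shift factor: larger when the win was an upset
    w = v * (v + t)       # variance-shrink factor: each round reduces uncertainty
    new_w = (mu_w + (s_w**2 / c) * v, s_w * math.sqrt(1 - (s_w**2 / c**2) * w))
    new_l = (mu_l - (s_l**2 / c) * v, s_l * math.sqrt(1 - (s_l**2 / c**2) * w))
    return new_w, new_l

# Even matchup: winner gains a modest amount, loser drops the same way.
even_w, even_l = update_1v1((25.0, 8.333), (25.0, 8.333))
# Upset: a lower-rated team beats a higher-rated one and gains much more.
upset_w, upset_l = update_1v1((20.0, 8.333), (30.0, 8.333))
```

Note how `sigma` falls after every round: that is the "deviation" mentioned at the top of this post, and it is why a team needs enough rounds before the system is confident enough to list them.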
For a sense of what the ratings number actually means:
- A 1-point ratings advantage translates roughly into 5:4 expected odds
- 2 points is about 3:2
- 3 points is about 2:1
- 4 points is about 3:1
- 5 points is about 4:1
- 8 points is about 9:1
If you are attentive to the rating number - not just the ranking - it will help you understand that even large differences in ranking might not amount to much of a difference between teams. For example, the difference between the 13th-ranked team and the 21st-ranked team is only about 1 point, meaning that a debate between them would be treated as little more than a coin flip.
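The odds above convert directly into expected win percentages: odds of a:b mean the favorite is expected to win a out of every a+b debates. A quick sketch of that conversion, using the figures listed above:

```python
# Ratings-gap -> expected odds, as listed above. Odds of a:b imply the
# favorite wins with probability a / (a + b).
ODDS = {1: (5, 4), 2: (3, 2), 3: (2, 1), 4: (3, 1), 5: (4, 1), 8: (9, 1)}

def win_prob(a, b):
    """Implied probability that the favorite wins, from a:b odds."""
    return a / (a + b)

for gap, (a, b) in ODDS.items():
    print(f"{gap}-point gap: {a}:{b} odds -> {win_prob(a, b):.1%} expected win rate")
# A 1-point gap is only about a 55.6% proposition - barely better than a coin flip,
# while an 8-point gap makes the favorite a 90% bet.
```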
If you follow the link given above, you will find some graphics showing how the new ratings algorithm performed when run on data from the past four years. While the ratings don't explicitly set out to predict at-large bids, they would have produced ballots well within the range of error of the actual human voters. They would have performed as a slightly below-average voter for first-round bids, but would have actually been an above-average voter for second-round bids.