Strength of Schedule is the friggin’ worst.

I hate strength of schedule.

It’s not that it’s a meaningless number, of course. It tells us… something. But what is that something? To hear fans and, too often, media discuss it, that something is nearly everything. Without a tough schedule, you can’t be great. But playing a great schedule makes you great, regardless of the outcomes. Or something like that.

But that’s wrong on myriad levels, which is the topic I wanted to explore here. Let’s give Strength of Schedule some serious thought and then discuss how much it should actually play into our basis for analyzing college football teams.

Here’s a thought experiment. Which team has a tougher schedule:

[Image: side-by-side schedules for Team A and Team B]

Maybe the answer is Team A. They have to play multiple games vs. top-10 opponents.

Or maybe it’s Team B. They have three times as many games vs. top-25 opponents.

The real answer is… it depends.

Let’s say we’re talking about Clemson. Clemson is really good. They’ve got top-5 talent, so while facing off against a team ranked, say, 19th, isn’t exactly a cakewalk, we’d expect the Tigers to win. So, against Team B’s schedule, there’s a good chance Clemson is going 12-0. But if Clemson were Team A, well, now they’ve got two games vs. teams that, theoretically, have a similar level of talent on the roster. If those top-10 teams are Alabama and Ohio State, we might expect Clemson to go 11-1 or 10-2 against that slate. So, clearly Team A has the tougher schedule, right?

Well, let’s change the details a bit. Now let’s say we’re talking about Syracuse. The Orange are pretty good, too. They beat some good teams last year. They also lost to Pitt and nearly fell to UNC and were utterly smoked by Notre Dame. So, how would they fare against Team B’s schedule? Probably decent, but there are nine games on that docket where Syracuse could be challenged. Let’s say things go pretty well and they go 9-3. Now give them Team A’s schedule. They’re almost certainly going to lose those two top-10 games, but every other game on their slate is definitely winnable. They might be 10-2. So, Team B has the tougher schedule then, right?

The point here is that schedule strength requires us to choose the same context and apply it to all teams, even if that doesn’t entirely make sense in reality. The truth is, there’s a ton of context that impacts the difficulty of a team’s schedule beyond the simple metric of “how many good teams did they play?”
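To put rough numbers on that thought experiment, here’s a minimal sketch. Everything in it is an assumption for illustration: the 0-100 ratings, the logistic win-probability curve, the `scale` value, and the makeup of the two schedules are invented, not taken from any real ranking system or from the schedules pictured above.

```python
import math

def win_probability(team_rating, opponent_rating, scale=10.0):
    """Logistic win probability from a rating gap (an illustrative assumption)."""
    return 1.0 / (1.0 + math.exp(-(team_rating - opponent_rating) / scale))

def expected_wins(team_rating, schedule):
    """Sum of single-game win probabilities across a schedule."""
    return sum(win_probability(team_rating, opp) for opp in schedule)

# Hypothetical opponent ratings on a made-up 0-100 scale.
SCHEDULE_A = [90] * 2 + [40] * 10  # two top-10 opponents, ten weak ones
SCHEDULE_B = [70] * 6 + [50] * 6   # six top-25 opponents, six middling ones

ELITE_TEAM = 90  # stand-in for a Clemson-level roster
GOOD_TEAM = 70   # stand-in for a Syracuse-level roster

for name, rating in [("elite", ELITE_TEAM), ("good", GOOD_TEAM)]:
    wins_a = expected_wins(rating, SCHEDULE_A)
    wins_b = expected_wins(rating, SCHEDULE_B)
    tougher = "A" if wins_a < wins_b else "B"
    print(f"{name}: {wins_a:.1f} expected wins vs. A, "
          f"{wins_b:.1f} vs. B, so {tougher} is tougher")
```

With these made-up inputs, the elite team projects to fewer wins against Schedule A, while the merely good team projects to fewer wins against Schedule B. Same two schedules, opposite answers, which is the whole problem with boiling schedule strength down to one number.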

Take the 2014 Ohio State team. Remember them? They won a natty. They also lost at home to Virginia Tech in Week 2. That was a BAD loss. That Hokies team wasn’t very good. But let’s remember the context: Ohio State had injuries on the O-line. J.T. Barrett was making just his second career start. Virginia Tech finished that season just 7-6 but its defense was ranked 11th in S&P+ and had 109 tackles for loss, sixth-most nationally.

In other words, 2014 Virginia Tech wasn’t helping anyone’s strength of schedule, but if you had a green QB and a beat-up O-line, the Hokies were NOT the team you wanted up next on the docket.

Here’s another question: What was Purdue’s strength of schedule last year? Go ahead, you can Google it. This is an open-book quiz.

What’d you find?

Answer: Purdue had the No. 3 strength of schedule in 2018, according to ESPN.

But what if you looked at Sagarin instead? No biggie, they were fourth there.

Oh, but what about Football Outsiders? There Purdue was 27th.

Or the Colley Index? They had the Boilermakers at 15th.

Or FEI? Yikes, now Purdue’s 40th!

It’s almost as if Strength of Schedule were an arbitrarily determined metric that’s inconsistent from source to source.

This isn’t to say these outlets are simply plucking numbers out of thin air, but just that the formula for figuring out strength of schedule differs from one place to the next.

In fact, here’s your next quiz: What is the formula for strength of schedule? How does, say, Football Outsiders determine it? Does it account for recruiting rankings? Or home-field advantage? Or injuries? Or who you played the week before? Or potential trap games? Or particularly bad personnel matchups like that Ohio State-VT game? Is it based on last year’s records or the last 10 years of records or what we expect this year’s records to be? Does it take travel distance into account? Thursday games after a Saturday game? Second-order wins? Are all FCS games treated the same?

(Note: No knock on Football Outsiders, who’s given this more thought than almost anyone.)

The answer here is, you probably don’t know. And you certainly don’t know for every outlet. And you most definitely don’t know for the metrics being used by the playoff committee. So you’re simply trusting that an outlet like Football Outsiders or ESPN has a reliable enough track record that you’ll trust their computations. And that’s fine. But when studying data, these are still good questions to ask.

Next question. A team’s non-conference games are The Citadel, Oklahoma State, Notre Dame and Florida. True or False: This is a good non-conference schedule?

Again the answer is, it depends.

If this was 2018, when Florida and Notre Dame were both top-10 teams, Oklahoma State was good, and The Citadel gave Alabama one of its toughest games of the year, you’d say this was a really tough non-conference slate.

But, way back in 2014, when this was Florida State’s actual OOC schedule, it was something of a joke. FSU’s strength of schedule for the year, despite having three Power 5 OOC games and playing Clemson, Louisville and Miami, was ranked 34th by Football Outsiders.

So, if FSU scheduled big-name programs out of conference, didn’t control its ACC slate, and still ended up 34th (and undefeated!), what’s a team supposed to do? Identifying a problem should also come with an action item: Here’s how to fix it. But scheduling isn’t fixable. It’s out of the team’s control.

Games are scheduled years in advance, so who knows how good, say, Notre Dame might be in 2023? When FSU faced Oklahoma State in 2014, the Cowboys had a healthy starting QB and played well. But J.W. Walsh got hurt a week later and didn’t throw another pass that season for a team that finished 7-6. Notre Dame was 6-0 entering its game vs. Florida State. It was an epic battle decided on the final play. And then… the Irish went 2-4 the rest of the way to finish 8-5. If anything, we could suggest FSU took all the wind out of Notre Dame’s sails, which you might think was a positive for the Noles. Instead, the struggles of the Irish down the stretch hurt FSU’s strength of schedule. And teams shouldn’t be in the position of rooting for their former opponents months later in hopes of bolstering their own resume.

OK, another question: Team A is 12-0, has the 50th-ranked strength of schedule, and has an average margin of victory of 31 points. Team B is 11-1, has the 4th-ranked strength of schedule, and has an average margin of victory of 12 points. Which team is better?

Team A has clearly won in more impressive fashion, but it hasn’t had to play nearly as many tough games. But Team B actually lost a game, so can we simply ignore the outcome and rank based on degree of difficulty?

Or how about this: Team A is 13-0 with the 17th-ranked SoS, and Team B is also 13-0 with the 76th-ranked SoS. Which team is better?

If you answered Team A, good work. You noticed that they played a harder schedule to get to the same record. Of course, Team A in this scenario is Alabama entering last year’s playoff, and Team B is Clemson. And as it turned out, on the field, Clemson was significantly better.

The reason: Strength of schedule is not a metric to determine how good a team is. It is simply a measure of how confident we can be in our read on a team’s quality.

Let’s say Bill Belichick was found deflating balls and spying on other teams again, and so the New England Patriots are relegated to the Sun Belt next year. Fun, right? But Tom Brady is still Tom Brady, so the Patriots go 12-0 and win the Sun Belt with ease. They win every game by 30 and rest their starters in the second half routinely. Still, their SoS at year’s end isn’t going to be great. The toughest schedule in the Sun Belt last year, per ESPN, belonged to Louisiana-Lafayette, and it ranked 90th.

So, are the Patriots, with that 90th-ranked SoS, making the playoff? Hell yes! Because we know who Tom Brady is, and we know the Patriots are great, and so we don’t need a metric like SoS to tell us that. We don’t need a measure of certainty for the Patriots because we’re already certain. UCF? We’re less certain about them. And that’s what a bad SoS tells us. It says, “We don’t know.” And that’s ALL it says. It doesn’t say UCF is bad. It says we don’t know. A good SoS says we’ve seen teams tested, so we know more. A bad SoS says we’ve not seen teams tested, so we’re not sure. The rest is up to us (or, at least, other metrics).
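Here’s one way to sketch the “SoS measures certainty, not quality” idea. This is a toy model, not anyone’s actual formula: the ratings, the logistic win curve, the two candidate team strengths, and the 50-50 starting guess are all assumptions made up for illustration. It asks a simple question: if a team goes 12-0, how sure should we be that it’s elite rather than merely good?

```python
import math

def win_probability(team_rating, opponent_rating, scale=10.0):
    """Logistic win probability from a rating gap (an illustrative assumption)."""
    return 1.0 / (1.0 + math.exp(-(team_rating - opponent_rating) / scale))

def chance_of_perfect_record(team_rating, schedule):
    """Probability of winning every game on the schedule."""
    return math.prod(win_probability(team_rating, opp) for opp in schedule)

def posterior_elite(schedule, elite=90, good=70, prior_elite=0.5):
    """Bayes' rule: chance an unbeaten team is 'elite' rather than 'good'."""
    like_elite = chance_of_perfect_record(elite, schedule)
    like_good = chance_of_perfect_record(good, schedule)
    numerator = prior_elite * like_elite
    return numerator / (numerator + (1 - prior_elite) * like_good)

# Hypothetical 12-game schedules on a made-up 0-100 rating scale.
WEAK_SCHEDULE = [40] * 12
TOUGH_SCHEDULE = [70] * 12

weak = posterior_elite(WEAK_SCHEDULE)
tough = posterior_elite(TOUGH_SCHEDULE)
print(f"12-0 vs. weak slate:  {weak:.0%} chance the team is elite")
print(f"12-0 vs. tough slate: {tough:.0%} chance the team is elite")
```

In this toy setup, going unbeaten against the weak slate barely moves us off our starting guess, while going unbeaten against the tough slate makes us close to certain. The team’s actual quality never changed; only our confidence did. And if you start out nearly certain (set `prior_elite` close to 1, the Patriots scenario), the weak slate hardly matters.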

Last question: Last year, Maryland had the 20th-best strength of schedule, per Football Outsiders. Nebraska had the 30th-best. How much better was Maryland’s schedule than Nebraska’s?

Was it 10 better? The Terps were 10 spots higher.

Football Outsiders does give us an SoS score, with an average top-5 team projected to have an .861 winning percentage vs. Maryland’s schedule, and an .868 vs. Nebraska’s. They’re essentially the same, right? They’d have to play more than 100 games before we’d expect a difference of just one win in outcomes.
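The arithmetic behind that “more than 100 games” line is quick to check. The two winning percentages come from the Football Outsiders figures quoted above; the function name and the 12-game season length are just illustrative choices.

```python
def games_per_one_win_gap(win_pct_a, win_pct_b):
    """Games needed before two win rates differ by one expected win."""
    return 1.0 / abs(win_pct_a - win_pct_b)

MARYLAND_SOS = 0.861  # projected win pct for an average top-5 team
NEBRASKA_SOS = 0.868

games_needed = games_per_one_win_gap(MARYLAND_SOS, NEBRASKA_SOS)
season_gap = 12 * abs(MARYLAND_SOS - NEBRASKA_SOS)  # assumes a 12-game season

print(f"Games until a one-win difference: about {games_needed:.0f}")
print(f"Expected difference over 12 games: {season_gap:.2f} wins")
```

A gap of .007 works out to roughly 143 games for a single win of separation, or less than a tenth of a win over one regular season. Ten spots in the rankings, basically no difference on the field.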

In fact, if we look at the SoS scores and add one standard deviation, plus or minus, it covers every team from No. 18 Florida State (.850) to No. 112 New Mexico State (.959). In other words, when discussing how an “average top-5 team” would perform, the difference between the 18th-toughest schedule and the 18th-easiest schedule could be decided by relatively normal fluctuations — a few strange bounces of the oblong ball.

If we were to take, instead, an “average top-25 team,” we’d see a bit more fluctuation, but the general point remains. For good teams, there are only a handful of real-world schedules each year that would create a dramatically different outcome.

OK, so we’ve been through all this now and shown that strength of schedule isn’t a metric designed to measure a team’s quality and can vary based on which formula you use and even then misses out on some key context that should be applied when analyzing specific teams. But when we get to November, you’ll still be arguing that strength of schedule matters a lot. Hell, in January, folks were arguing Alabama only lost to Clemson BECAUSE of its difficult strength of schedule. Why?

The thing that feels reasonable to us about strength of schedule is the idea that doing a difficult thing repeatedly makes it less likely you’ll continue to succeed at that difficult thing. Think about bench pressing 250 pounds. Maybe you can lift it once. Maybe two or three times. But you’ll start getting tired by that fourth rep and, damn, by the fifth one, your arms are jiggling and you’re yelling for a spotter to save you from certain death. Same idea with schedule strength, right? Keep playing really good teams, and you’ll wear down.

But is that actually true? Let’s test the hypothesis.

Last year, there were exactly 26 Power 5 teams that played three or more games vs. teams ranked in the FPI top 25 prior to Nov. 1.

(Note: We’re using FPI top 25 as an arbitrary stand-in for “good team.” The numbers wouldn’t shift dramatically if we used AP top 25, for example. And these are stats based on year-end ranking, so we’re not including, say, LSU’s game against Miami since the Hurricanes proved not to actually be very good.)

The first takeaway from this should be: Wow, nearly half of the P5 played 3 or more games vs. FPI top 25? That’s a lot. And, it’s worth noting, only six teams played four games vs. FPI top-25, and only Tennessee played five. So, the distinction of a team that’s playing some merciless schedule really doesn’t exist but for maybe one or two teams, while nearly all teams are tested to some degree. (And even those numbers include the implicit assumption that SEC teams typically make for a tougher opponent than other teams.)

Taking this a step further, of the 46 bowl-eligible Power 5 teams, 17 played back-to-back games vs. teams ranked (at game time) in the top 25. That’s 37% — or a little more than one-third. But guess how many played three in a row? One. Just LSU. That’s it. No other team played three consecutive weeks vs. a ranked team during the regular season. Again, there’s always one or two outliers, but most teams are not challenged by an elite opponent on a weekly basis. Brutal scheduling is a myth.

But set that aside. Let’s look at how those supposedly tough schedules impacted teams.

First, let’s look at the teams with tough September/October slates.

[Image: table comparing results before and after Nov. 1 for teams with tough September/October slates]

Shockingly, the group was essentially the same before and after. Even if we filter out games vs. top-25 teams and only look at how they performed vs. lesser teams, they had essentially the same win% before and after Nov. 1, and actually won a bit more impressively afterward.

Or how about those teams that had back-to-back games vs. top-25 opponents? They’re beaten up, right? They’re emotionally drained, right? Well, a few actually had an off week the following week (because good ADs go out of their way to try to make the schedule easier), but the ones that didn’t went 9-5 in games following their top-25 doubleheaders, which is right about the record we would expect. Aside from TCU beating Oklahoma State, there wasn’t even a legitimate upset in the mix.

We can repeat this for any season using pretty much any metric, and in the aggregate, there is zero evidence to suggest that playing a particularly arduous schedule correlates to a team performing worse as the season progresses. In other words, it’s harder to beat good teams and easier to beat bad teams, but that doesn’t change based on who you played previously.

One metric we don’t have: Injuries. I’d love to see a data set that tracks how many injuries occur vs. each team and compares that with win-loss records. My instinct is that most injuries are flukes and can happen vs. anyone, regardless of the opponent’s physicality. But it’d be nice to have some real proof of that.

So, what’s all this mean? Should we just abandon strength of schedule as a metric?

Of course not. A win against Alabama is more impressive than a win against Alabama State. No one should argue this. But what needs to be remembered is that if Team A beats Alabama, and Team B beats Alabama State, we cannot then use that information to definitively say that Team A is better than Team B. All we can say is that, all other information being equal, we’re more certain that Team A is legitimately good than we are about Team B.

The entire point of the College Football Playoff was to allow actual games to determine our champion, and to remove the liability of inherently unequal scheduling by affording the four best teams a chance to decide it on the field. Instead, what we’ve gotten is one logical fallacy after another, based on little evidence, using metrics that most fans don’t actually understand. Numbers should be used to illuminate a narrative, not to create it. Strength of schedule shouldn’t be telling us the opposite of what our eyes have seen. So let’s be smart out there. Let’s use strength of schedule the right way. And let’s not spend all season bickering over why losses don’t matter if the opponent was good enough.

OK, I know. That ain’t happening. But a guy can dream, can’t he?

