The days when sports fans based their picks on a hunch alone haven’t disappeared, but they’re dwindling. Data-driven decision-making has made its way from Wall Street to the stadiums, particularly in football.
Teams, sports journalists, and even casual fans have become more interested in quantitative sports data and how it factors into predicting performance on the field.
We sat down with Practice Professor and Wharton Moneyball co-host Cade Massey, who specializes in judgment under uncertainty — how, and how well, people predict what will happen in the future.
Massey has published some game-changing ideas about overconfidence in the NFL draft. With Las Vegas sports analyst Rufus Peabody, he also co-developed a mathematical model that ranks all 32 NFL teams based on purely objective measures of on-field performance. Since the model’s inception in 2010, the Massey-Peabody Power Rankings have been published in the Wall Street Journal.
The model ignores outside factors like personnel, coaching, and motivation. It looks at four main factors (rushing, passing, scoring, and play success), adjusts them for home-field advantage and game situation, and weights them by predictive ability.
Massey shared his insights with us about the model’s greatest advantage, predictability vs. luck, and his pick for the big game this Sunday.
Did you consider using different variables before settling on the four in the mathematical model you use to calculate the Massey-Peabody NFL Power Rankings?
We didn’t do a whole lot of fishing for variables. We have refined them a little bit over time, but largely those were the four that we began with. Our philosophy has always been to focus on two things — the first is cleaning up these basic variables so that we get a cleaner version of them than other people might. That means contextualizing, considering the opponent, considering the game situation, that kind of thing.
The other thing we focused on is predictive ability — what weight these variables have when explaining team performance in other games. So you look at these four variables on the offensive side or on the defensive side, and ask how well what they do in this game predicts what they’re going to do in another game.
Factoring in predictive ability is probably our single greatest advantage because ultimately the task of people who are trying to forecast performance is to predict out-of-sample performance. So you have to weight the variables by their out-of-sample predictive ability, and a lot of people are crunching numbers these days but there aren’t many who are assigning those numbers weight by how well they predict performance out of sample. They typically assign them weight by how they relate to performance within the same sample.
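The distinction Massey draws between in-sample and out-of-sample weighting can be illustrated with a small simulation. The sketch below is not the Massey-Peabody model; it uses made-up data and two hypothetical statistics (`stable_stat`, which tracks a team's underlying strength, and `noisy_stat`, which captures one-game luck) to show how a variable can explain the game it comes from very well and still say little about the next one.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate(num_teams=200, games=10, seed=0):
    """Synthetic seasons (illustrative assumption, not real NFL data)."""
    rng = random.Random(seed)
    seasons = []
    for _ in range(num_teams):
        strength = rng.gauss(0, 1)  # persistent team quality
        season = []
        for _ in range(games):
            luck = rng.gauss(0, 1)  # one-game bounce of the ball
            season.append({
                "stable_stat": strength + rng.gauss(0, 0.5),
                "noisy_stat": luck,
                "margin": strength + 3 * luck + rng.gauss(0, 0.5),
            })
        seasons.append(season)
    return seasons


def stat_vs_margin(seasons, stat, lag):
    """Correlate a stat in game t with the point margin in game t + lag.

    lag=0 is the in-sample view; lag=1 asks how well the stat
    anticipates a different game (the out-of-sample view).
    """
    xs, ys = [], []
    for season in seasons:
        for t in range(len(season) - lag):
            xs.append(season[t][stat])
            ys.append(season[t + lag]["margin"])
    return pearson(xs, ys)


seasons = simulate()
stats = ("stable_stat", "noisy_stat")
in_sample = {s: stat_vs_margin(seasons, s, lag=0) for s in stats}
out_of_sample = {s: stat_vs_margin(seasons, s, lag=1) for s in stats}

for s in stats:
    print(f"{s}: same game {in_sample[s]:.2f}, next game {out_of_sample[s]:.2f}")
```

In this toy setup the noisy statistic dominates the same-game correlation but fades toward zero for the next game, while the stable statistic keeps roughly the same modest correlation in both views. Weighting by the next-game column is the spirit of what Massey describes.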
Since 2011, you’ve had a 56.4 percent success rate against the spread with this model. What is the model’s record against the total during the same time period?
We don’t know because we haven’t tracked its record against the total because we don’t focus on it. This whole thing is built to get team strength not scoring. We have a way of producing an expected total but we don’t focus on it, and I would strongly expect that our prediction there would be at the market.
The scoring is a thing that’s pretty much only of interest to bettors, and we have always been interested in more generally understanding underlying strengths — performance evaluation, talent evaluation, these are very general challenges in the world. They are far more applicable outside of sports. They’re widely applicable. We want to develop tools for that, we want to develop language for that, we want to proselytize good practices when it comes to performance evaluation and talent evaluation, and that is all about assessing team strength. That doesn’t really have anything to do with totals.
How much does luck play into it?
A ton. It’s unbelievable. We have a six-year track record that says we’re pretty good. We’ve beat the line four or five of those years, and at this point, we have well over 500 public bets posted so that’s a big record and probably the best of anybody out there who’s just giving it away. It competes with people who do it professionally and who sell their stuff. And yet we’re only right 56 percent of the time.
I mean 56 is great in this business and people can make a living with 56 percent, but that means you’re wrong 44 percent of the time. We consider that irreducible uncertainty. There’s just a limit to what you can anticipate — what you can explain. And the rest, that means you’re just exposed to luck, and that’s a big jump.
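A quick back-of-the-envelope calculation shows how much room luck has even in a record this size. The sketch below assumes a round 500 independent bets at a 56 percent win rate and uses a normal approximation. The break-even line assumes bets priced at risk 110 to win 100, a common convention that is not stated in the interview.

```python
from math import sqrt


def win_rate_summary(win_rate, num_bets, coin_flip=0.5):
    """Standard error, approximate 95% range, and z-score versus a coin flip."""
    std_err = sqrt(win_rate * (1 - win_rate) / num_bets)
    interval = (win_rate - 1.96 * std_err, win_rate + 1.96 * std_err)
    # How many standard errors the record sits above pure guessing.
    z_score = (win_rate - coin_flip) / sqrt(coin_flip * (1 - coin_flip) / num_bets)
    return std_err, interval, z_score


def break_even_rate(risk, win):
    """Win rate needed to break even when risking `risk` to win `win`."""
    return risk / (risk + win)


# Assumed inputs: 56 percent over 500 bets (the interview says "well over 500").
std_err, interval, z_score = win_rate_summary(0.56, 500)
needed = break_even_rate(110, 100)  # assumed pricing, not from the interview

print(f"standard error: {std_err:.3f}")
print(f"95% range: {interval[0]:.3f} to {interval[1]:.3f}")
print(f"z-score vs. coin flip: {z_score:.2f}")
print(f"break-even win rate: {needed:.3f}")
```

Under these assumptions the standard error is about 2.2 percentage points, so a true hit rate anywhere from roughly 51.6 to 60.4 percent is consistent with the record. That range sits only slightly above a coin flip at its low end, which is one way to read Massey's point about irreducible uncertainty.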
Who’s your pick for the big game this Sunday between the New England Patriots and the Philadelphia Eagles?
It’s going to be a tight game. Our predicted score is Patriots 27, Eagles 21.
The good news for Eagles fans is that, as Massey says, there’s a 44 percent chance they’re wrong. – Colleen Donnelly
M&T student Red Dimaano, W’18, E’18, on why he loves data science and how Prof. Pete Fader started him on a path into academic research.
There are over twenty Wharton concentrations, and one of the newest concentrations to be added to the list is business analytics: a path combining operations, information, and decisions with statistics. The blog of the Jerome Fisher Program in Management and Technology talked with M&T senior Rafael “Red” Dimaano to find out more about this concentration and what led him to it.
1. What drew you to the business analytics concentration?
Being able to declare the concentration was honestly sheer luck! Earlier in my studies, I made a list of Wharton electives I wanted to take: most were quantitative and data-centric courses, fitting my interests in statistics and computer science. I didn’t put much thought into concentrations and decided I’d figure it out later. Then business analytics was announced as a concentration the summer after sophomore year and, upon looking at it, I realized every class I wanted to take was on the list of requirements. Boom! My concentration was born.
2. Why do you think business analytics was added as a concentration? What benefits does it afford to M&T and Penn students?
In my opinion, business analytics is one of the best concentrations Wharton is offering right now. Data has become increasingly important in the business world and its value cannot be overstated. Personally, I see business analytics as a nice complement to my computer science major in terms of learning about data science. Though both curricula have many offerings in data science, the business analytics concentration has a stronger focus on statistics and optimization, while the computer science classes focus more on algorithms and technologies. I like having that double whammy of data from very different perspectives.
3. What opportunities do you see for your future with this concentration?
While the concentration is overall geared towards learning more about “data science,” I found the courses I took helped make that interest set more specific. In particular, taking MKTG/STAT 476 with Prof. Peter Fader has sparked my interest in research. After taking that class, I went into research with him on advanced statistical/decision models for data and decided that a Ph.D. might be the way to go after Penn.
4. What advice would you give to students trying to decide on an area of study or concentration?
Most people think of a concentration as a list of classes needed to get it done. I recommend doing the reverse: first list the classes you want to take, then slot them into the requirements. This method provides flexibility in designing your course of study by, say, doing an individualized concentration or petitioning for certain classes to count for an existing one. Or, you might get lucky and a new concentration will come up that fits what you want exactly!
5. You said you started doing research with Prof. Fader after taking his class. How did you get there and what are you working on with him?
I got there through his class! It’s arguably one of the best classes I took at Penn. After the class ended, there was still so much more I wanted to learn. There aren’t other classes teaching similar material, so the only good option was to do research.
I’m currently working on the problem of estimating customer lifetime value for two-sided markets. Estimating customer lifetime value in single-seller situations is a well-studied problem we talked about a lot in 476. The two-sided market case is different since firms that run two-sided platforms may also be interested in how valuable their sellers are and the value of their interactions with certain buyers. We’re currently working on extending the current models for single sellers to this case.
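For readers unfamiliar with customer lifetime value, here is a minimal sketch of the simplest single-seller version. It is a textbook simplification, not the models from MKTG/STAT 476 or from this research: it assumes a constant per-period margin, a constant retention rate, and a constant discount rate, and all names and numbers are illustrative.

```python
def clv_closed_form(margin, retention, discount):
    """Expected discounted value of one customer under constant rates.

    Assumes the customer pays `margin` at the start of every period
    (including the first), survives each period with probability
    `retention`, and cash flows are discounted at rate `discount`.
    The geometric series sums to margin * (1 + d) / (1 + d - r).
    """
    return margin * (1 + discount) / (1 + discount - retention)


def clv_by_summation(margin, retention, discount, periods=1000):
    """Same quantity, added up period by period as a sanity check."""
    return sum(
        margin * (retention ** t) / ((1 + discount) ** t)
        for t in range(periods)
    )


# Illustrative inputs: $100 margin, 80% retention, 10% discount rate.
closed = clv_closed_form(100, 0.8, 0.1)
summed = clv_by_summation(100, 0.8, 0.1)
print(f"closed form: {closed:.2f}, summation: {summed:.2f}")
```

A two-sided platform would need something like this for both buyers and sellers, plus a way to value the interactions between them, which is the extension Dimaano describes.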