Building a better bracket: CHD interviews Microsoft’s Walter Sun

Walter Sun is a principal applied science manager at Microsoft who helped create “Bing Predicts”.  After analyzing 10 years of NCAA data and applying some sweet science to identify key patterns, his team has built a tool that can make all kinds of predictions, from who will win the Oscar for Best Picture to who will win the NCAA Tourney.  CHD’s Jon Teitel got to chat with Mr. Sun about this new tool for people who want to build a winning bracket based on something other than uniforms and mascots.


How did you come up with “Bing Predicts”, and what makes it so effective?  Bing Predicts is an initiative that we have been working on for some time. It uses machine-learned models to analyze and detect trends in aggregate, anonymous web data and social activity, as well as a host of other data, depending on the event.
For popularity-based contests like American Idol, web and social signals correlate strongly with popularity-voting patterns, so the engine can make accurate projections of who will be eliminated each week and who the eventual winner will be.  On the other end of the spectrum, predicting the World Cup, the NFL season, and now the NCAA tournament requires incorporating player/team stats, tournament trends and game history, location, and data from web and social channels. The online web and social data provide the Bing model with the “wisdom of the crowd.”
For the NCAA tournament, we have built Bing’s predictions right into our bracket builder: as you hover over a team, we give you all the data you need to make an informed decision on a matchup.
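As a concrete (and heavily simplified) illustration of the kind of model Sun describes, here is a minimal sketch in Python. It is not Bing’s actual pipeline: the feature names, weights, and training data below are invented placeholders. The point is just the general shape of the approach: express a matchup as feature differences between two teams, include crowd-based signals alongside team stats, and learn a win probability from historical outcomes.

```python
# A minimal sketch, NOT Bing's pipeline -- features and data are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Each row is one historical matchup, expressed as differences between the
# two teams' features: traditional stats plus web/social "wisdom of the crowd".
feature_names = ["off_efficiency_diff", "def_efficiency_diff",
                 "conf_strength_diff", "social_buzz_diff"]
X = rng.normal(size=(500, len(feature_names)))       # synthetic training data
y = (X @ np.array([0.8, 1.2, 0.6, 0.4])              # synthetic "true" weights
     + rng.normal(scale=1.0, size=500)) > 0          # True = team A won

model = LogisticRegression().fit(X, y)

# Score a new matchup: positive differences favor team A.
new_game = np.array([[0.3, 0.9, 0.2, 0.5]])
print(f"P(team A wins) = {model.predict_proba(new_game)[0, 1]:.2f}")
```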

Have you previously tried to predict which schools would win the tourney, and how successful have you been?  This is our first time predicting March Madness, but Bing has a successful predictions track record and we are excited to apply it to the NCAA tournament. In fact, we have seen incredible success across the spectrum of Bing Predicts: we went 15-for-15 on knockout-round World Cup predictions, had better than 95% accuracy in the 2014 US midterm elections, and (even to our dismay in Washington state) predicted that the New England Patriots would win the Super Bowl.

How did you approach the past decade of data, and why did you decide to go back just one decade instead of 2-3 decades?  There have been a number of changes to D-1 basketball over the years, including different rules and regulations. Our models are trained on this past data, so we want to choose a window of time that is representative of the current 2014-2015 season. For instance, we would not go earlier than 1985, the year the tournament first moved to the now-familiar 64-team format. The 1986 season did not yet have a standardized 3-point line, so data learned from then would not map well to this season. The shot clock reduction from 45 seconds to 35 seconds happened in 1994, so the pace of games increased a little after that. In 2005 the NBA raised its age minimum, introducing the one-and-done freshmen we see today. In 2009, the 3-point line was moved back one foot to 20'9" and the low block was removed during free throws, so we kept those changes in mind when training on data from before and after 2009.

Considering changes like these, which affect how the game is played, we found that 10 years was a happy medium: enough data to learn from, without learning factors that are not as important in today’s game. Beyond those major changes, the basic flow of the game has been similar across these 10 years. Many top high school players still attend one year of college and then go pro. We have observed a slight decreasing trend in possessions per game, which makes each possession all the more valuable and has to be considered, but that trend is much smaller than the shifts seen prior to 2005.
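The windowing logic Sun describes might translate to something like the toy filter below. The field names and flags are our own hypothetical rendering, not Bing’s code, but they capture the idea of training only on rule-comparable seasons and flagging the 2009 changes.

```python
# A hedged sketch of era-aware training-data selection (hypothetical schema).
def prepare_training_rows(games):
    """games: iterable of dicts, each with a 'season' key (2007 = 2006-07)."""
    rows = []
    for g in games:
        if g["season"] < 2005:                 # before the one-and-done era
            continue                           # -> outside the 10-year window
        g = dict(g)
        g["post_2009_rules"] = g["season"] >= 2009   # 20'9" arc, no low block
        rows.append(g)
    return rows

sample = [{"season": 2003}, {"season": 2008}, {"season": 2014}]
print(prepare_training_rows(sample))
# keeps 2008 (post_2009_rules=False) and 2014 (post_2009_rules=True)
```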

Your review employed several different factors: which factor is most important to a team’s tourney success?  Our models take a large number of variables into account, and we have found a handful of factors emerging as the most vital (I say a handful since no single one stands far and away above the rest). The strength of a team’s conference, success in neutral-site and road games, and player experience in past tournaments are features that will probably not surprise the sports fan. Other factors may be more surprising: defensive efficiency, for instance, is a stronger indicator of success than 3-point shooting efficiency. The reason is that good defense does not generally vary much from game to game, whereas any shooter can tell you that it is very hard to shoot with the same consistency every game. A good 3-point shooting team converts 40% of its attempts (about a dozen teams exceed this threshold this year), but that could mean going 8-for-8 in one game and 0-for-12 in another, a dramatic amount of uncertainty that gives statistical meaning to the phrase “live and die by the three.” Teams like Kentucky (on pace for possibly the lowest points-per-possession allowed in NCAA history) and Virginia (who have held opponents under 20 points by halftime in a large percentage of their games) appear to be in good shape for deep tournament runs.

Another hidden factor is the aggregate of individual talent, with preseason expectations and the number of McDonald’s All-Americans on the roster as proxies. While regular season games are important, the intensity of competition goes up a level come tournament time; once you get to the tournament everyone is giving 100%, so the best talent can differentiate itself. A recent example is the 2014 Kentucky team, which had a lot of talent but, for a variety of reasons (one possibly being that with numerous freshmen it took time for the team to gel), did not play at that level in the regular season, relegating themselves to an 8-seed. Once the single-elimination tournament began, however, the team played up to its preseason expectations.
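To put rough numbers on the shooting-variance point above, here is a quick simulation. The 40% conversion rate comes from the answer; the 20 attempts per game are our assumption for illustration.

```python
# Simulating "live and die by the three": game-to-game spread for a team
# whose true 3-point rate is 40% (attempt volume assumed, for illustration).
import numpy as np

rng = np.random.default_rng(42)
attempts_per_game = 20
true_rate = 0.40

makes = rng.binomial(attempts_per_game, true_rate, size=10_000)
pct = makes / attempts_per_game

print(f"mean: {pct.mean():.3f}, std dev: {pct.std():.3f}")
print(f"games under 25%: {(pct < 0.25).mean():.1%}")   # ice-cold nights
print(f"games over 55%:  {(pct > 0.55).mean():.1%}")   # red-hot nights
```

Roughly one game in ten, this “true 40%” team shoots either under 25% or over 55%; that volatility is exactly why defense is the steadier signal.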

What is the most important thing you learned after analyzing all the data?  By reviewing 10 years of NCAA data we have been able to identify key trends that we think contribute to success. Beyond the features described in the prior question, we also identified interesting trends in tournament play. For instance, it is well known that #12 seeds spring a lot of upsets against #5 seeds (winning a robust 34% of match-ups: 47 wins and 93 losses), and the deeper data analysis might help explain why. The automatic qualifiers from smaller conferences are usually assigned lower seeds due to the expectation that they are weaker, but the #11 and #12 seeds are usually awarded to the BEST of these automatic qualifiers. Some of these teams are underrated or simply untested, but they are clearly top of class in their conference, and when pitted against a second- or third-place team from a major conference, they on occasion match up better than the seedings indicate. Using last year as an example since it is fresh in our minds: Stephen F. Austin was 31-2, went 18-0 in the Southland, and had not lost since two early road defeats before Thanksgiving; North Dakota State won the Summit; and Harvard won the Ivy. All three drew opponents from bigger conferences (teams that had not won their own league) and were underdogs, but they perhaps stood a better chance than the public gave them credit for. For 2015, a team like Valparaiso bears some resemblance to these conference winners, although their actual success will still depend on seeding and match-ups, since this is being answered pre-bracket. I will still add that #12 seeds win just one game out of three, so if you picked 12-over-5 in all four of those games you would most likely do worse than picking the #5 seeds. Still, this is food for thought on where to find your carefully chosen upset picks; the quick arithmetic below spells it out.
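The arithmetic behind that closing caveat, using the 47-93 record quoted above:

```python
# Expected value of blanket 12-over-5 picks vs. taking all the favorites.
p12 = 47 / (47 + 93)                     # historical 12-seed win rate (~0.34)

print(f"expected correct, all four 12 seeds: {4 * p12:.2f} of 4")
print(f"expected correct, all four 5 seeds:  {4 * (1 - p12):.2f} of 4")

# Yet the chance that at least one 12 seed wins somewhere is high, which is
# why one *carefully chosen* upset pick is worth hunting for.
print(f"P(at least one 12-over-5 upset): {1 - (1 - p12) ** 4:.1%}")
```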

I do not need a formula to see that a team like Kentucky has a better chance of winning the tourney than a team like LSU, but how can you separate bubble teams to know whether LSU has a better chance of winning it all than Georgia?  Once the brackets are populated you will be able to see how teams stack up against each other as you fill out your bracket. Indeed, it would be no surprise if I said that Kentucky should win their first-round game (the #1 seeds in the men’s tournament are a combined 120-0 against #16 seeds since the field moved to 64 teams in 1985). What our models seek to do is pick the #8 vs. #9 games (historically 52% to the #8 seed) and the #7 vs. #10 games (60% to the #7 seed), as these are close to toss-ups, and from the 10 years of data plus web and social signals we hope to provide an edge in exactly these games. As far as “how,” our models have learned which factors are most important for success, so when our final predictions come out Sunday night you can see which bubble teams and low seeds are most likely to make some noise in the tournament.
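A quick back-of-the-envelope calculation shows why an edge on those near-toss-up games compounds. The 52% figure is from the answer above; the 65% model accuracy is purely hypothetical.

```python
# Probability of sweeping n independent picks when each is correct w.p. p.
def p_all_correct(p, n):
    return p ** n

# The four #8 vs. #9 games are essentially coin flips.
print(f"coin flip (50%), 4 games:       {p_all_correct(0.50, 4):.1%}")
print(f"favorite's rate (52%), 4 games: {p_all_correct(0.52, 4):.1%}")
print(f"hypothetical model (65%):       {p_all_correct(0.65, 4):.1%}")
```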

Why is historical data so helpful when a team like Kentucky has so many freshmen who have never played in the NCAA tourney before?  Although these fresh faces may be new to the tournament, our models learn how teams with similar compositions have done in the past and extrapolate that to this season. Simply for illustration (we did not use this specific season of data), some of the recent freshman-heavy Kentucky teams might best resemble the 1991-92 Michigan “Fab Five” team, which started five freshmen; how those McDonald’s All-Americans fared can be extrapolated to predict how a young Kentucky team might do this year.  As another example, we might have learned that the Big East was the strongest conference in 2013 and thus that its placing a pair of teams in the Final 4 was a good possibility. From that knowledge, if we learned that the SEC was the strongest conference in 2014, we would infer that the SEC, rather than the Big East, would be likely to put two teams in the Final 4. The fact that the Big East also underwent a massive composition change is another reason why we do not learn specific teams and individuals, but rather use them as templates for what components succeed.

How much of your analysis is science, and how much of it (if any) is luck?  The analysis and predictions we publish are all science, but due to the uncertainty (luck?) of each game, 100% accuracy is definitely not expected. Rather, the expectation is to compare favorably against other experts. Furthermore, Bing Predicts is a great tool for seeing the current odds of a particular team’s next game and whether they will make a deep run in the tournament. But Bing is not just about picking one team in a matchup: we will build out prediction scenarios for which factors would need to break right for an underdog upset, so we are able to make the case for both teams. While one team may have a higher likelihood of winning, nothing – especially in sports – is ever guaranteed, and Bing Predicts helps us reflect that.

Is it possible to create a model that will get every tourney prediction correct, and if not, why not?  If every match-up were a best-of-7 series, algorithmic models could be created that get close to 100% accuracy. However, since each game is single elimination, many factors that cannot be known before a game can swing the odds dramatically enough that it is almost mathematically impossible to pick a perfect bracket. Take Gonzaga and BYU this year: Gonzaga won at BYU by 7, then lost by 3 at home, then won on a neutral court in the WCC title game by 16. If they were to play again on a neutral floor, there is no guarantee which of these outcomes would transpire. Would we see BYU run into foul trouble, as in the conference title game, and lose in a blowout? Would we see the strong effort that bubble team made in ending the nation’s longest home winning streak, or maybe something else? If you do not believe us, recall that Warren Buffett’s actuaries did the math last year and knew they were safe offering a $1 billion prize for a perfect bracket!  We are excited about the future of Bing Predictions and we hope our accuracy continues to improve with time. The number of upsets varies by season, but if you want a rough target we are estimating around a 75% success rate. There are 9.2 quintillion ways to fill out a bracket, and while there are some very safe picks (all four #1 seeds beating the #16 seeds in their first game), there are many more that are uncertain.
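The numbers in that answer check out: a 64-team single-elimination bracket has 63 games, so 2^63 (about 9.2 quintillion) possible brackets. Treating the rough 75% target as a per-game accuracy (our reading, not necessarily Bing’s), a perfect bracket remains a long shot even for a strong model:

```python
# Bracket combinatorics: 63 games, two outcomes each.
n_games = 63
print(f"possible brackets: {2 ** n_games:.3e}")        # ~9.223e+18

# Even at 75% per-game accuracy, a perfect bracket is ~1 chance in 74 billion
# -- far better than guessing, but still a safe bet for Buffett's actuaries.
per_game_accuracy = 0.75
print(f"P(perfect bracket): {per_game_accuracy ** n_games:.2e}")
```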
We believe we have just scratched the surface for predictive search capabilities. While the company is currently focused on consumer experiences like awards shows and the NCAAs, in the future Microsoft believes predictive search experiences could show up in all sorts of ways, from anticipating needs in Cortana (the company’s digital assistant) to more accurate search results based on anticipating intent.

Who is going to be cutting down the nets next month?!   Once the tournament teams are announced on Sunday, our servers will need a couple of hours to run through the bracket’s 9.2 quintillion possible outcomes. Stay tuned to www.bing.com/bracketbuilder to see our final predictions on Sunday night!