One of the more frustrating limitations of Wizards-published dailies is the lack of matchup information. You get lists, you get standings, you get win percentages, but you don't actually know which decks beat which decks en route to their finish. This is fine if you just want to describe the Modern metagame, but far less helpful if you are actually trying to figure out which decks are good. That's where the "MTGO Deep Dive" dataset comes in. I've focused on this project (recording dailies from the client in their entirety) in my last two articles. The first showed overall match win percentages (MWPs) for different top decks, and the second highlighted some overperforming decks that were not necessarily top-tier. Now it's time to take the analysis one step further and see how decks match up against each other. Is Abzan truly strong against Twin? Is Affinity vs. Burn just a race? The data will give us some answers to these questions and more.
This article uses the MTGO Deep Dive dataset to calculate the win rates of different decks in different matchups. It closely follows a similar analysis by reddit user dafrk3in, who calculated matchups using data from Pro Tour Fate Reforged. Using data from MTGO, I run a similar analysis on our current March-April metagame. In the interest of space, and to present only reliable results, I'm only going to discuss matchups among the top decks of MTGO. These decks are pillars of Modern, and their Ns are high enough for us to draw conclusions from. This analysis will give us a sense of how decks succeed or fail against each other, and how that knowledge can inform decisions going into events.
Dataset and Methods
As in my last two articles, I'm using the so-called MTGO Deep Dive dataset. This project is the result of a collaboration between me and a few other MTG friends from the MTGSalvation community. In essence, it's a set of 16 dailies recorded in their entirety. It includes not just the 4-0/3-1 finishes we see published online, but also the 2-2 or worse finishes that never get published. And, of course, it also includes the matchups between decks. Although 16 may seem like a small N, these events span dozens of decks, hundreds of players, and thousands of games. The end result is a wealth of matchup data we can use to calculate the "true" MWPs of various decks in Modern.
In calculating MWPs, I have already adjusted for byes, mirror matches, drops, draws, and other elements that could distort an MWP. This applies both to a deck's overall MWP and to its MWP in each individual matchup. Every overall MWP has also been compared to the "average" MWP of all MTGO decks in the sample. That average is weighted and is adjusted for the same elements. Based on how far its MWP sits from that average, each deck receives a P value indicating how surprising that gap would be if the deck were actually just average. A high P value (>.1) means the deck's MWP is within expected variance, so we can't conclude it is truly above or below average. A low P value (<.1) starts to suggest a deck is genuinely above or below the average MTGO performance.
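The article doesn't spell out exactly how each adjustment is made. The following is therefore only a minimal Python sketch under stated assumptions:

- Byes, and rounds left unplayed after a drop, are recorded with no opponent and skipped.
- Mirror matches are left out because they always net out to 50%.
- Draws count as neither a win nor a loss.

The `Match` record, the function names, and the toy log are all hypothetical. None of them come from the Deep Dive dataset itself.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Match:
    deck: str                # archetype of the player being scored
    opponent: Optional[str]  # opponent's archetype; None for a bye or an unplayed round after a drop
    result: str              # "win", "loss", or "draw" from `deck`'s point of view


def adjusted_record(matches: List[Match], deck: str,
                    opponent: Optional[str] = None) -> Tuple[int, int]:
    """Return (wins, decided matches) for `deck`, optionally against one opponent."""
    wins = decided = 0
    for m in matches:
        if m.deck != deck or m.opponent is None:  # other decks, byes, unplayed rounds
            continue
        if m.opponent == deck:                    # mirrors always net out to 50%, so skip them
            continue
        if opponent is not None and m.opponent != opponent:
            continue
        if m.result == "draw":                    # draws are neither wins nor losses
            continue
        decided += 1
        if m.result == "win":
            wins += 1
    return wins, decided


def mwp(matches: List[Match], deck: str, opponent: Optional[str] = None) -> float:
    """Adjusted match win percentage as a fraction; NaN if there are no decided matches."""
    wins, decided = adjusted_record(matches, deck, opponent)
    return wins / decided if decided else float("nan")


def field_average(matches: List[Match]) -> float:
    """Match-weighted average MWP: pooled wins over pooled decided matches across all decks."""
    total_wins = total_decided = 0
    for deck in {m.deck for m in matches}:
        wins, decided = adjusted_record(matches, deck)
        total_wins += wins
        total_decided += decided
    return total_wins / total_decided if total_decided else float("nan")


# Toy records invented only to exercise the functions (not Deep Dive data)
log = [
    Match("Burn", "UR Twin", "win"),
    Match("Burn", "UR Twin", "loss"),
    Match("Burn", "Burn", "win"),    # mirror: ignored
    Match("Burn", None, "win"),      # bye: ignored
    Match("Burn", "Abzan", "draw"),  # draw: ignored
    Match("Burn", "Abzan", "win"),
]
print(adjusted_record(log, "Burn"))          # (2, 3)
print(mwp(log, "Burn", opponent="UR Twin"))  # 0.5
```

Pooling wins and decided matches across decks is the same as weighting each deck's MWP by how many decided matches it played. This is one reasonable reading of "weighted average", not necessarily the exact weighting used here.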
Finally, as is always the case with statistics in these articles, all other data disclaimers about the perils and pitfalls of statistics apply!
Matchups and Win Rates: Top-Tier Decks
Today's article focuses on the top-tier decks of MTGO as defined and shown on the Top Decks page. For each deck, I give its prevalence in both the Top Decks and Deep Dive datasets, along with its overall MWP and the significance of that MWP. After that come its matchup win-loss records against the other top-tier decks. I'll end each section with a summary of the stats and the takeaways I view as important.
It is important NOT to use these numbers as set-in-stone benchmarks for matchups. Rather, check them against your own testing and game experience to see whether they confirm or challenge your own conclusions. This is the mix of quantitative and qualitative methods we want to see when looking at this kind of data. Remember: many of these MWPs could well be higher or lower than the "true" MWP we would see with a much larger N, so we need to view this as a starting point rather than an ending one. Some of these numbers will make perfect sense (e.g. UR Twin vs. Grixis Delver). Others seem odd and demand further investigation (e.g. UR Twin vs. Abzan). Either way, it is up to us to interpret the data, not to categorically accept or reject it based on a few datapoints.
As one last point of reference, the average MTGO-wide MWP is 49.25%. Keep this number in mind when thinking about the different decks below.
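Every matchup below appears twice, once from each deck's side, so the two halves should mirror each other. With draws excluded, A's wins against B plus B's wins against A must equal the matches played.

The Python below is a minimal sketch of that check. It is seeded with a few records copied from the tables below. The `RECORDS` layout and the function name are illustrative choices only.

```python
# (deck, opponent) -> (wins, matches), copied from a few of the tables below
RECORDS = {
    ("UR Twin", "Grixis Delver"): (3, 16),
    ("Grixis Delver", "UR Twin"): (13, 16),
    ("Affinity", "Grixis Delver"): (6, 16),
    ("Grixis Delver", "Affinity"): (10, 16),
    ("Burn", "Jund"): (4, 14),
    ("Jund", "Burn"): (10, 14),
}


def mirror_mismatches(records):
    """Return (deck, opponent) pairs whose two halves don't add up."""
    problems = []
    for (deck, opp), (wins, n) in records.items():
        other = records.get((opp, deck))
        if other is None:  # only one side recorded; nothing to compare
            continue
        other_wins, other_n = other
        if n != other_n or wins + other_wins != n:
            problems.append((deck, opp))
    return problems


# Recompute each percentage from its raw counts rather than trusting a typed-in value
for (deck, opp), (wins, n) in sorted(RECORDS.items()):
    print(f"{deck} vs. {opp}: {wins / n:.1%} ({wins}/{n})")
print("mismatches:", mirror_mismatches(RECORDS))
```

Recomputing every percentage from its counts like this is also a quick way to catch transcription slips in a hand-built matchup table.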
UR Twin
- Top Decks prevalence: 8.1%
- Deep Dive prevalence: 8% (76)
- Deep Dive matches: 242
- MWP: 52.1% (p=.38)
- vs. Abzan: 75% (6/8)
- vs. Affinity: 58.3% (14/24)
- vs. Burn: 50% (16/32)
- vs. Jund: 50% (4/8)
- vs. Amulet Bloom: 36.4% (4/11)
- vs. Grixis Delver: 18.8% (3/16)
Twin has been called Modern's best deck, and although there is reason to suspect that is true, we don't really see it in the numbers. I think this is due in large part to Twin's popularity. It's the third most-played deck on MTGO, but it's also the most expensive of the top three decks. This suggests players are gravitating towards Twin just because they think it is "good", not because it is cheap (like Burn) or the hot new thing (like Grixis Delver). This is why Twin's MWP of 52.1% is probably lower than it would be in the hands of skilled pilots. More so than Burn or Grixis Delver, Twin rewards tight play and punishes bad pilots. That helps explain why Twin's MWP is one of the lowest of the top-tier decks (while still being above average).
As for matchups, I am surprised the Abzan matchup is so heavily in Twin's favor. To me, this points to player inexperience more than anything else: Twin is by no means a "good" matchup for Abzan, but it's also not this bad. The Grixis Delver matchup is very interesting, because if true, it would go a long way toward explaining why Grixis Delver is so successful right now on MTGO. The rest are about as expected, although I was curious to see such an even Burn matchup. My guess is that Burn wins the games where Twin is on the draw or misses the turn-4 combo. Twin wins the games where it's on the play and can go Exarch/Twin on turns 3-4.
Burn
- Top Decks prevalence: 9%
- Deep Dive prevalence: 10.4% (99)
- Deep Dive matches: 299
- MWP: 53.9% (p=.11)
- vs. Abzan: 63.6% (14/22)
- vs. Affinity: 50% (13/26)
- vs. Jund: 28.6% (4/14)
- vs. UR Twin: 50% (16/32)
- vs. Amulet Bloom: 36.4% (4/11)
- vs. Grixis Delver: 81.8% (9/11)
Twin may be Modern's "best" deck, but Burn is its most-played. Its prevalence is actually higher in the Deep Dive dataset than in the Top Decks one, which suggests there is even more Burn out there online than we see in public dailies. That said, the Deep Dive also shows Burn might have more going for it than just prevalence. With a P of .11 on its 53.9% MWP, Burn is pushing the upper edge of the expected variance of MTGO MWPs. Indeed, of all the tier 1 decks, Burn is the only one that gets this close (although it doesn't quite make it). So Burn isn't just popular: it's also very strong. This makes sense to me because Burn is so dang linear, which gives it a lot of random wins against opponents who are too slow or too interactive.
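The article doesn't name the test behind its P values. The Python below is a minimal sketch under some assumptions:

- It uses a two-sided one-sample z-test (the normal approximation to the binomial).
- The baseline is the 49.25% field average.
- N is the Deep Dive match count.

Under those assumptions it lands close to the reported values for Twin (.38) and Burn (.11). It also shows how high Burn's MWP would need to be to clear the p < .1 line at its N. Both function names are hypothetical.

```python
import math
from statistics import NormalDist

AVERAGE_MWP = 0.4925  # MTGO-wide average MWP from the Deep Dive sample


def z_test_p_value(mwp: float, matches: int, baseline: float = AVERAGE_MWP) -> float:
    """Two-sided p-value for the hypothesis 'this deck's true MWP equals the baseline'."""
    se = math.sqrt(baseline * (1 - baseline) / matches)  # standard error under that hypothesis
    z = (mwp - baseline) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def significance_threshold(matches: int, alpha: float = 0.10,
                           baseline: float = AVERAGE_MWP) -> float:
    """MWP a deck must exceed at this N to reach p < alpha on the above-average side."""
    se = math.sqrt(baseline * (1 - baseline) / matches)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.645 for alpha = .10, two-sided
    return baseline + z_crit * se


for deck, deck_mwp, n in [("UR Twin", 0.521, 242), ("Burn", 0.539, 299)]:
    print(f"{deck}: p = {z_test_p_value(deck_mwp, n):.2f}, "
          f"needs > {significance_threshold(n):.1%} for p < .1")
```

Under these assumptions, Burn would need roughly 54.0% over 299 matches to cross p < .1. That is why its 53.9% sits right at the edge. Smaller Ns push the bar higher, because the standard error shrinks only with the square root of the match count.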
Turning to matchups, none of the Abzan, Twin, or Affinity matchups should surprise anyone. The latter two are races, and the first is the matchup that put Burn on the map to begin with. I am very interested in Burn's apparent strength against Grixis Delver and weakness against Jund, two decks rising for very different reasons. Two big reasons Jund is seeing more play are that it is less painful than Abzan (thanks, Blackcleave Cliffs) and that it can use Lightning Bolt to stave off early threats. The Grixis Delver result also makes a lot of sense. Terminate is just awful against Burn, Gitaxian Probe is often free damage, and the deck has lots of fetches/shocks with no lifegain. So these are two more numbers supported by our theoretical understanding of the matchups.
Affinity
- Top Decks prevalence: 6.9%
- Deep Dive prevalence: 7.7% (73)
- Deep Dive matches: 220
- MWP: 52.7% (p=.31)
- vs. Abzan: 40% (4/10)
- vs. Burn: 50% (13/26)
- vs. Jund: 60% (3/5)
- vs. UR Twin: 41.7% (10/24)
- vs. Amulet Bloom: 50% (4/8)
- vs. Grixis Delver: 37.5% (6/16)
Like both Twin and Burn, Affinity pushes toward the upper edge of the MWP range, but not with any degree of significance. And like Burn, the true prevalence of Affinity might be higher than its observed prevalence in the published dailies, which makes sense given the enduring popularity of Affinity in Modern. When I look at matchups, almost all the numbers above are in line with our expectations. Abzan is rough because of Stony Silence. Although 10 matches isn't the big N I would like to see, 40/60 seems about right for this matchup given all the factors. Burn makes perfect sense as a straight-up 50/50 race. Twin seems right at about 40/60 because of two pressures: efficient red removal (including the powerful Electrolyze) and a combo finish Affinity can't interact with in game 1 short of 3-4 copies of Galvanic Blast. Of course, the Grixis Delver matchup is the most interesting, because it suggests another reason Grixis Delver is doing so well in the current metagame. Not only is the deck beating Twin, but it's also killing it against Affinity. This also makes sense from a theoretical perspective. Affinity would definitely struggle with Grixis Delver's efficient removal, its efficient countermagic to stop cards like Plating/Thoughtcast, and its fast, durable clocks Affinity can't kill. This would only get worse in games 2/3 once Grixis Delver brings in anti-artifact effects.
Abzan
- Top Decks prevalence: 4.9%
- Deep Dive prevalence: 5.2% (49)
- Deep Dive matches: 152
- MWP: 53.3% (p=.324)
- vs. Affinity: 60% (6/10)
- vs. Burn: 36.4% (8/22)
- vs. Jund: 71.4% (5/7)
- vs. UR Twin: 25% (2/8)
- vs. Amulet Bloom: 80% (4/5)
- vs. Grixis Delver: 90% (9/10)
To me, the most interesting Abzan fact is not the win rates, but rather the metagame share. This deck has completely tanked across MTGO, a trend shared in paper but not nearly to the same extent. How did a deck that was 25% of the recent PT fall down to about 5% of the MTGO metagame in less than 3 months? The MWP doesn't explain it. 53.3% is at the upper end of top-tier deck performance, and although it's not quite statistically significant, it's still exactly where the so-called 50-50 "police deck" of Modern should be performing. So why is Abzan's share dropping if its overall performance is basically fine?
The matchup data gives us two possible explanations for this. The first is Burn. Burn is rampant on MTGO, and you don't want to be the deck with a bad Burn matchup, especially not after spending about $2000 on it. The second is Twin: it gets even worse when you are losing to Twin, a deck Abzan is supposed to beat. To be fair, I don't think the actual Twin/Abzan matchup is this lopsided. Yes, there are reasons to believe Twin is actually favored in this matchup (or that it's close to even), but this number doesn't make a lot of sense. Even so, if Abzan's true Twin matchup is closer to 40/60 or 50/50, that's still not enough to buoy a crappy Burn matchup. About the only saving grace for Abzan is its Grixis Delver matchup, which is exactly what we would expect of a BGx deck against a Delver deck. This number must be overstated, but even if it's really just a 60/40 or 70/30 matchup, that's a big boost in Abzan's favor.
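One way to formalize the instinct that a 9/10 or 2/8 record "has to be" overstated is to shrink small-sample records toward 50%. The Python below is a minimal sketch of that idea. It assumes a symmetric Beta prior centred on 50%, and its strength (10, i.e. five imaginary wins plus five imaginary losses) is an arbitrary choice, not something this analysis uses. The function name is hypothetical.

```python
def shrunk_win_rate(wins: int, matches: int, prior_strength: float = 10.0) -> float:
    """Posterior mean win rate under a symmetric Beta(k/2, k/2) prior centred on 50%.

    Equivalent to adding k/2 imaginary wins and k/2 imaginary losses to the record,
    so short records move a lot and long records barely move.
    """
    pseudo = prior_strength / 2
    return (wins + pseudo) / (matches + 2 * pseudo)


# Abzan's records from the table above
for opponent, wins, matches in [("Grixis Delver", 9, 10), ("UR Twin", 2, 8), ("Burn", 8, 22)]:
    raw = wins / matches
    print(f"vs. {opponent}: raw {raw:.1%} -> shrunk {shrunk_win_rate(wins, matches):.1%}")
```

With this prior, the 9/10 Grixis Delver record shrinks to 70% and the 2/8 Twin record to about 38.9%. The 22-match Burn record only moves from 36.4% to about 40.6%. The more matches behind a number, the less it gets discounted. A stronger prior pulls everything harder toward 50%.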
Jund
- Top Decks prevalence: 3.6%
- Deep Dive prevalence: 3.6% (34)
- Deep Dive matches: 115
- MWP: 52.2% (p=.54)
- vs. Abzan: 28.6% (2/7)
- vs. Affinity: 40% (2/5)
- vs. Burn: 71.4% (10/14)
- vs. UR Twin: 50% (4/8)
- vs. Amulet Bloom: 66.7% (4/6)
- vs. Grixis Delver: 50% (4/8)
As Abzan has fallen across Modern, Jund has gradually risen to take its place. Jund went from basically 0% at the time of the TC banning to about 4%-5% of paper and MTGO. Its MTGO prevalence is a little lower now than before, but Jund is still a very viable deck that is showing up everywhere. Indeed, it's getting the tier 1 bump in this most recent metagame update, and these MTGO stats help explain why. From an MWP perspective, Jund is pretty average for the top-tier decks, which suggests it's not the overall MWP driving its rise. To see where Jund is successful, we need to look at matchups.
Our N is a bit small for some of these matchups, so I am hesitant to draw strong conclusions from much of this data. The two exceptions are Abzan and Burn. It makes a lot of sense that Jund is weak to Abzan. Jund can't do anything about Spirit swarms, Path gives Abzan the midrange edge, and Bolt isn't very useful as removal. Dark Confidant is an easy way to improve this matchup. My guess is that if we separated Jund decks running Bob from those not running him, we would see the bad Abzan matchups mostly in the Bobless decks. But Bob himself is at odds with Jund's best matchup: Burn. I think this Burn matchup is one of the big reasons the deck is enjoying success these days, and it is the flip side of why Abzan is declining. A less painful manabase and Bolt go a long way toward keeping Burn at bay. Expect to see more Jund as the format keeps evolving, based largely on this Burn matchup.
Amulet Bloom
- Top Decks prevalence: 4.1%
- Deep Dive prevalence: 3.3% (31)
- Deep Dive matches: 104
- MWP: 60.6% (p=.03**)
- vs. Abzan: 20% (1/5)
- vs. Affinity: 50% (4/8)
- vs. Burn: 63.6% (7/11)
- vs. Jund: 33.3% (2/6)
- vs. UR Twin: 63.6% (7/11)
- vs. Grixis Delver: 71.4% (5/7)
I don't really care too much about these specific matchups. The positive Twin matchup is nice. The unfavorable Abzan/Jund matchups make sense, though they seem overstated based on my own experience with the decks. And the strong Grixis Delver/Burn matchups are totally in line with Amulet Bloom's gameplan. But again, the most interesting data here is not in the matchup section. The real takeaway is the MWP and its corresponding P value. Of every deck in the dataset, Amulet Bloom is the only deck with more than 20 matches to have a P value this low. A P of .03 means that if Amulet Bloom were truly an average deck, we would see an MWP this far from average only about 3% of the time. That is strong evidence its MWP really is above the average MTGO performance. And boy, is it above average! 60.6% is well over the 49.25% average and beyond the expected variance. It's also well over what other decks are doing. Yes, this data has a lot of limitations, both statistical (e.g. the size of N) and contextual (e.g. it's MTGO data). But this MWP is still so far above and beyond other MWPs that it's impossible to ignore. If someone were to ask what the best deck in Modern is, I'd probably have to say Amulet Bloom. We had reasons to suspect this in the past, and this is yet another datapoint supporting the theory. This still raises questions about its relatively small metagame share if its MWP is so high, but those questions don't undercut the MWP and its significance.
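To make the meaning of that P value concrete, here is a minimal Python sketch of an exact two-sided binomial test. It assumes Amulet Bloom's record is 63 wins in 104 matches (63/104 ≈ 60.6%) and that the 49.25% field average is the baseline. It is not necessarily the method behind the reported .03. The function computes the probability that a truly average deck, over 104 matches, posts a win count at least as far from its expected count as Bloom's. Both function names are hypothetical.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k wins in n matches at a true win rate of p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_binomial_p(wins: int, matches: int, baseline: float) -> float:
    """P(an average deck's win count lands at least as far from expectation as `wins`)."""
    expected = matches * baseline
    gap = abs(wins - expected)
    return sum(
        binomial_pmf(k, matches, baseline)
        for k in range(matches + 1)
        if abs(k - expected) >= gap - 1e-9  # small tolerance for float rounding
    )


p = two_sided_binomial_p(wins=63, matches=104, baseline=0.4925)
print(f"p = {p:.3f}")
```

This "distance from the expected count" rule is one of several conventions for a two-sided exact test. Another sums every outcome no more likely than the observed one. So the precise figure can shift a little by method. The result here lands in the same low range as the reported .03. Either way, the number describes how rare the record would be for an average deck. It is not the probability that Bloom is above average.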
Grixis Delver
- Top Decks prevalence: 8.4%
- Deep Dive prevalence: 7.2% (68)
- Deep Dive matches: 213
- MWP: 48.4% (p=.79)
- vs. Abzan: 10% (1/10)
- vs. Affinity: 62.5% (10/16)
- vs. Burn: 18.2% (2/11)
- vs. Jund: 50% (4/8)
- vs. UR Twin: 81.3% (13/16)
- vs. Amulet Bloom: 28.6% (2/7)
I could write a whole article on how awesome this deck is. Oh wait... Grixis Delver is one of Modern's hottest new decks, and these stats give some context to its rise. The prevalence is obviously striking. This is a deck that went from 0% to 7%-8% in about 2-3 months without a single pro player or major paper event driving that rise. It's an MTGO community special, developed more or less independently by players across the community. This homespun approach is reflected in the MWP, which is solidly average and the lowest of all the top-tier deck MWPs. To me, this is expected behavior. First, there is no established Grixis Delver baseline list for players to use. Second, because the deck has such a flavor-of-the-month feel, lots of players are picking it up without much experience. Both of these factors will bring down the deck's "true" MWP.
Grixis Delver's matchups are probably the most interesting of all the matchups we have seen so far. If you want to know why the deck is successful, look no further than the Affinity and Twin matchups. Beating Affinity is good, but absolutely trouncing Twin is something special. Few decks can do this, and I think this is a huge reason for Grixis Delver's success and popularity. By contrast, the Abzan and Burn matchups are just terrible, which creates an odd metagame tension when deciding whether to play this deck. When we compare these quantitative measures to our qualitative theories for why Grixis Delver might be good or bad, we see a lot of overlap. For example, the deck's disruption is awesome against Twin and Affinity. Against Abzan it is totally underwhelming (with the exception of Terminate), and against Burn it is just plain bad. I think all of these numbers are more extreme than the true matchup percentages, but their general thrust is about right. You can adjust them by 10% or so toward 50/50 and probably be closer to accurate.
For all you MTGO and Modern regulars, you are probably wondering about decks like Merfolk, Scapeshift, RG Tron, and others we would expect to see matchup data about. I'll discuss these matchups, and some additional takeaways from today's article, when I revisit the Deep Dive dataset next week. Just considering today's numbers, I can't emphasize enough that these are not immovable benchmarks. We should not read these numbers and say "Grixis Delver has an 80% win rate against Twin". That's both a misuse of the dataset and a misunderstanding of how Modern matchups work. Rather, we should say "Grixis Delver seems strongly favored against Twin. What are some interactions and cards that could explain this on both sides of the table? Does this line up with my experience with the decks?" This is the way to use matchup data like the numbers we looked at today.
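One way to make "strongly favored, not 80%" concrete is to put a confidence interval around a matchup record. The Python below is a minimal sketch using the Wilson score interval (my choice of method; the article itself doesn't compute intervals). For 13 wins in 16 matches, the 95% interval runs from roughly 57% to 93%. So the data support "Grixis Delver is favored" far more firmly than any specific percentage. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(wins: int, matches: int, confidence: float = 0.95):
    """Wilson score interval for a matchup's true win rate, given wins out of matches."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # ~1.96 for 95%
    p_hat = wins / matches
    denom = 1 + z**2 / matches
    centre = (p_hat + z**2 / (2 * matches)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / matches + z**2 / (4 * matches**2)) / denom
    return centre - half_width, centre + half_width


low, high = wilson_interval(13, 16)  # Grixis Delver vs. UR Twin
print(f"13/16 = {13 / 16:.1%}, 95% interval {low:.0%} to {high:.0%}")
```

The Wilson interval is used here instead of the simpler "p ± 1.96·SE" form because it stays inside 0-100% and behaves better at the small Ns and lopsided records common in these tables.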
Join me on Wednesday when I give some metagame updates and talk about some new changes coming to the Top Decks page. Until then, enjoy those new MM2015 previews (Noble Hierarch confirmed at RARE!!).