The Deep Dive dataset is back! In three earlier articles, I analyzed a collection of MTGO dailies to determine the matchups and win rates of different top-tier decks. Unlike the publicly published MTGO dailies used to inform the Top Decks page, our Deep Dive dataset includes all finishes from a sample of dailies, not just the 4-0/3-1 ones. It also includes all the matchups between those different decks, not just their overall standings. Today, we are returning to the Deep Dive to see how different win rates and matchups are doing. This includes both overall deck win rates and individual matchups between decks, all with an even larger sample size than before. And as many of you can guess, one of the best decks from last time is still on top. In fact, it's more vigorous than ever.
As in the last articles, I'm going to focus on the top-tier Modern decks as defined in our Top Decks page, paying special attention to the MTGO stats because the Deep Dive dataset is MTGO-based. I'll also include a brief discussion of the dataset itself and all the different pieces that go into it. Then we'll dive right into the deck win rates and their matchups. All in all, this analysis gives us an important quantitative perspective on which decks are strong in the format, and which are strong against each other. So whether you are thinking of bringing these decks to the major events in June, or are just preparing to face the diverse Modern field, this article will give you a statistical foundation with which to start your testing and decision making.
Special thanks to MTGS users pizzap and Rickster for their work on the dataset. Also to Kim Josefsen, a regular reader who contributed to the work.
Dataset and Methods
The MTGO Deep Dive dataset compiles a semi-random selection of dailies and the different decks and finishes within those dailies. In my last article on the topic, the dataset included 16 dailies. Now, we are up to 28, consisting of just over 5700 matches. For each daily, I analyze deck performance to determine a deck's collective Match Win Percentage (MWP) across different events. I also calculate matchup win rates between the different decks. This gives us a sense of both the "true" overall MWP of those decks (calculated over hundreds of matches), and the "true" matchup win rates between different decks. Also, note these are MATCH win percentages, not GAME win percentages (GWPs): a 2-1 win is counted the same as a 2-0 win for the purposes of counting an MWP. I focus on MWPs instead of GWPs because GWP numbers don't distinguish between pre-sideboard and post-sideboard games. MWPs at least capture this over the course of a match.
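To make the match-versus-game distinction concrete, here is a minimal Python sketch. The `MatchResult` type, the function names, and the three sample results are all hypothetical; they are not taken from the Deep Dive dataset.

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    """One best-of-three match from a single deck's point of view (hypothetical type)."""
    games_won: int
    games_lost: int


def match_win_pct(results):
    """Percentage of matches won; a 2-1 win counts exactly like a 2-0 win."""
    wins = sum(1 for r in results if r.games_won > r.games_lost)
    return 100.0 * wins / len(results)


def game_win_pct(results):
    """Percentage of individual games won, pooling pre- and post-sideboard games."""
    won = sum(r.games_won for r in results)
    played = sum(r.games_won + r.games_lost for r in results)
    return 100.0 * won / played


# Made-up sample: two 2-1 wins and one 0-2 loss.
sample = [MatchResult(2, 1), MatchResult(2, 1), MatchResult(0, 2)]
print(match_win_pct(sample), game_win_pct(sample))
```

In this made-up sample the deck wins two of three matches but only half of its games, which is the kind of gap that can separate an MWP from a GWP.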
I adjust all MWPs and win rates for byes, drops, splits, mirror matches, and other MTGO/statistical oddities that would skew the dataset. In addition, I assess all MWPs for statistical significance relative to the "weighted average MWP" of decks across the dataset. This produces a different P value for each deck's MWP. The statistical test and the resulting P value check how likely it is that any given deck's MWP falls within expected variance relative to the average MTGO MWP. A P value greater than .10 would suggest the deck is not truly above or below average relative to the MTGO-wide MWP: it's just within the expected spread. But a P value of less than .10 (or even better: less than .05) would suggest the deck is a legitimate outlier, and a true under- or over-performer.
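The article does not say which statistical test produces these P values, so the following is only a sketch of one common choice: a two-sided, normal-approximation test of a deck's win rate against the MTGO-wide baseline. The function names are hypothetical and the numbers it returns should not be expected to match the P values reported in this article.

```python
import math


def two_sided_p_value(wins, matches, baseline=0.501):
    """Normal-approximation test of an observed win rate against a baseline rate.

    Assumption: matches are treated as independent trials with win probability
    `baseline` under the null hypothesis. The article's actual test is unknown.
    """
    observed = wins / matches
    std_err = math.sqrt(baseline * (1.0 - baseline) / matches)
    z = (observed - baseline) / std_err
    # erfc(|z| / sqrt(2)) equals the two-sided tail probability 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def interpret(p_value):
    """Apply the .05 / .10 thresholds described in the text."""
    if p_value < 0.05:
        return "legitimate outlier (p < .05)"
    if p_value < 0.10:
        return "probable outlier (p < .10)"
    return "within expected variance"


# Illustrative inputs only: 150 wins in 250 matches is a 60% win rate.
print(interpret(two_sided_p_value(150, 250)))
# A deck sitting at exactly 50% over 100 matches.
print(interpret(two_sided_p_value(50, 100)))
```

Under these assumptions a 60% win rate over 250 matches falls well below the .05 threshold, while a 50% win rate over 100 matches stays comfortably within expected variance.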
Overall Win Rates: Tier 1 and Tier 2 Decks
To get our deep dive started, here are the MWPs for all the tier 1 and tier 2 decks in the format, along with their statistical significance. I also show the number of appearances each deck made throughout the dataset, and the total number of matchups used to calculate the overall MWP. This gives you some sense of the sample size, N, for each of the calculations, and in turn a sense of how accurate those calculations might be. All tier 1 and tier 2 decks are taken from our Top Decks page: visit the page to see how we define which decks belong in which tier (note the page is being updated on Wednesday, 6/3).
Below are the tier 1 decks as defined on the Top Decks page. As a point of reference, the average MTGO-wide MWP for all decks (weighted based on prevalence) is 50.1%.
[Table: tier 1 decks. Columns: Deck | # of Deep Dive appearances | % of Deep Dive | # of Deep Dive matches | MWP | P value and significance]
I don't want to go into too much detail on these overall MWPs -- that's coming in the next section. For now, it's enough to say most decks are hovering right around that 50% marker, with just Affinity standing out as an overperformer. With a P value of .01, the deck's 57% win rate is significantly higher than the MTGO average of 50%. More on that later.
Next, here are the tier 2 decks.
[Table: Tier 2, 2/5/16 - 2/16/16. Columns: MTGO % | Paper % | Day 2 %]
Amulet Bloom is clearly knocking the MWP ball out of the park, but I'm going to be discussing that in the next section so let's ignore it for now. Instead, let's focus on all the decks below Grixis Delver, decks without sufficient N to include in a matchup analysis section, but with enough overall matches to extrapolate a net MWP. One of the challenges in working with the Deep Dive dataset is always obtaining a large N for any given deck. You are at the mercy of what people are playing for any given daily, so if people stop playing a deck (poor Infect!), we stop seeing matchups for it. This means a lot of these MWPs have a lower appearance and match N than I would like. But we can still make some general observations from what we are seeing here, because most decks still have over 100 matches.
Let's start with the Collected Company decks: Abzan Company and Elves. The Abzan Company MWP is just terrible right now, whereas Elves is right around the average. I think there are a few elements at play here. First, Elves is a fast, linear, minimally-interactive combo deck. Those kinds of decks tend to be very successful on MTGO, where tournaments are just four rounds and you can gamble on good matchups. Heck, those decks tend to be very successful in Modern, period, for the very same reasons. When you screw up against Elves, you probably lose on the spot. When you screw up against Abzan Company, however, you can still play a fair game of Magic (unless the Company player combo'd, but that's harder to do now than it was in the days of Pod). This favors Elves in the MWP contest. The second element explaining these differences is in the decklists: it's much easier to optimize an Elves list than an Abzan Company one. There is substantial consensus about what goes into Elves -- not so for Abzan Company and its many variations. This suggests Company players might be bringing suboptimal lists into dailies, which would help account for the lower MWP.
The other deck I want us to think about is UWR Control. We don't have quite enough matches to determine if this deck's MWP is actually as high as it's showing here, but early signs indicate it might be. UWR Control has a lot of tools for this metagame, including ample early removal, lifegain, countermagic to get you through the midgame, and resilient finishers. My guess is UWR Control still suffers from many of the same problems it suffered from in past months (chiefly it's a reactive deck in a metagame rewarding proactive strategies), but I also think it's a better deck than people give it credit for. Myself included! I've written off UWR Control before, but it seems like it's better positioned now than in the past. After all, as Bolt becomes better, decks like UWR Control become more viable, particularly with redundant Bolt effects like Helix and Electrolyze. Cryptic Command also becomes much better in slower/fairer formats. With decks like Grixis Delver, Temur/Blue/Grixis Moon, Jund, Abzan Company, and other similar decks rising through the metagame ranks, the format is becoming much friendlier to Command.
Before turning to the in-depth analysis of certain decks, one final word on the MWP tables above: don't look at the tables and say "UWR Midrange only has a 47% MWP. It's clearly a bad deck!" Instead, consider those MWPs in relation to their P values and their N. In almost all cases, the decks are right within expected variance around the MTGO-wide average of 50%. This suggests ALL of the decks are actually decent choices, although some (cough Amulet cough) might have more going for them.
In-Depth Win Rate and Matchup Analysis
Some of our decks have hundreds of appearances and matchups, which lets us perform a much deeper analysis on their performances. In this section, I break down some of those key decks to discuss both their overall MWPs and their matchups against each other. Not all top-tier decks are included here! Some decks didn't have a large enough N to draw results from, either overall or within different matchups. But for those decks I do show, I'll give a detailed discussion of the results and how I make sense of them.
Remember: quantitative data is just one datapoint you need to consider when doing any kind of evaluation or data analysis. Make sure you combine the numbers here with your own experiences and the other sources/experience you may know of. I'll offer a bit of commentary in each section to try and help people make sense of the numbers and put them in the larger Modern context.
Again, for reference, our weighted average MTGO-wide MWP is 50.1% (N=98 different decks with ~5700 matches).
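For readers who want to see what a weighted average MWP involves, here is a minimal Python sketch. It assumes each deck's MWP is weighted by its number of matches, which is my reading of "weighted based on prevalence" rather than a confirmed method. It also uses only the seven decks broken down in this section, so it will not reproduce the 50.1% figure, which covers all 98 decks.

```python
def weighted_average_mwp(decks):
    """Average MWP across decks, weighting each deck's MWP by its match count.

    `decks` is an iterable of (mwp_percent, matches) pairs. Weighting by
    matches is an assumption; the article only says "weighted based on prevalence".
    """
    decks = list(decks)
    total_matches = sum(matches for _, matches in decks)
    return sum(mwp * matches for mwp, matches in decks) / total_matches


# (MWP %, Deep Dive matches) for the seven decks discussed in this section.
featured = [
    (52.3, 442),  # UR Twin
    (50.8, 569),  # Burn
    (57.5, 339),  # Affinity
    (52.0, 302),  # Abzan
    (49.3, 263),  # Jund
    (60.0, 250),  # Amulet Bloom
    (50.4, 500),  # Grixis Delver
]
print(round(weighted_average_mwp(featured), 1))
```

The point of weighting is that a heavily played deck like Burn pulls the average toward its own MWP more than a lightly played deck does.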
UR Twin
- Top Decks MTGO prevalence: 5.8%
- Deep Dive MTGO prevalence: 7.7% (142)
Deep Dive matches: 442
- MWP: 52.3% (p=.26)
vs. Abzan: 68.8% (11/16)
vs. Affinity: 55.2% (16/29)
vs. Burn: 46% (23/50)
vs. Jund: 55.6% (10/18)
vs. Amulet Bloom: 36% (9/25)
vs. Grixis Delver: 36.8% (14/38)
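Each matchup line pairs a percentage with a wins/total record, so the percentage can be recomputed from the record as a sanity check. The sketch below parses the Twin lines above; the regular expression and helper names are my own and are not part of any Deep Dive tooling.

```python
import re

# Matches lines such as "vs. Abzan: 68.8% (11/16)".
MATCHUP_LINE = re.compile(
    r"vs\. (?P<deck>.+): (?P<pct>[\d.]+)% \((?P<wins>\d+)/(?P<total>\d+)\)"
)


def parse_matchup(line):
    """Return (opposing deck, listed percentage, wins, total matches)."""
    m = MATCHUP_LINE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognized matchup line: {line!r}")
    return m["deck"], float(m["pct"]), int(m["wins"]), int(m["total"])


def recomputed_pct(wins, total):
    """Win percentage implied by the raw record."""
    return 100.0 * wins / total


twin_lines = [
    "vs. Abzan: 68.8% (11/16)",
    "vs. Affinity: 55.2% (16/29)",
    "vs. Burn: 46% (23/50)",
    "vs. Jund: 55.6% (10/18)",
    "vs. Amulet Bloom: 36% (9/25)",
    "vs. Grixis Delver: 36.8% (14/38)",
]

for line in twin_lines:
    deck, listed, wins, total = parse_matchup(line)
    print(f"{deck}: listed {listed}%, recomputed {recomputed_pct(wins, total):.1f}%")
```

The same check works on any of the matchup lists in this section, and it is a quick way to spot a mistyped percentage or record.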
Twin's showing up a lot less in the Top Decks metagame than in the Deep Dive, which suggests a lot of people who play the deck are not consistently making 4-0/3-1. There is an underperformance effect at play here. This is also reflected in the MWP, which is slightly above-average but not significantly so. I found this a bit odd, given how strong Twin has been at events in the past year (the winningest GP deck after Pod). I think underperforming players are pulling down Twin's online MWP, which is why its MWP is only slightly and insignificantly higher than the MTGO average. In the hands of a good pilot, Twin is still one of the format's best decks. In the hands of a less experienced one, however, the deck does not necessarily carry the player. With so many people on Twin (remember: it's the third most-played deck), the MWP is going to take a hit just from player skill differences.
Turning to the individual matchups, the Burn and Affinity matchups make perfect sense. These are effectively 50-50 races, which reflects most of my experiences with the decks and those of players I know. Grixis Delver also makes a ton of sense. Delver decks are excellent against Twin, particularly the hard-removal-packed Grixis variants (sorry 4 toughness Exarch). Grixis Delver has exploded on the scene, and we know its Twin matchup is a big part of that.
Then we get to the Abzan and Amulet Bloom matchups. Abzan is supposed to be great against Twin. Here, however, it can't seem to win. We saw a similar effect last time we looked at the dataset, and it's still present even after increasing N. I believe player experience accounts for this, but not on Twin's side of the table. Abzan's metagame share has been declining rapidly on MTGO, which suggests to me the BGx deck is not a great choice these days (more on that later). True, players who are still sticking to BGW might be diehard Abzan pros, but they might also be players who are simply behind the metagame times. That second kind of player might have less overall Modern experience and thus be less equipped to battle Twin. The reverse effect is probably driving the Amulet matches. Your average Amulet Bloom player is quite experienced with their deck: Amulet has one of the lowest ratios of unique players to number of matches. Because of their experience and skill, those Amulet players are probably more experienced at navigating the Twin matchup than Twin players are at navigating the Amulet one. Player skill being more equal, we would expect both win percentages to normalize more towards 50%.
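Part of the reason to be cautious about these individual matchups is how small their N is. As a rough illustration, here is a sketch of a Wilson score interval, a standard way to put a range around a win rate from a small sample. The article does not use this method; it is shown only to give a feel for the uncertainty, and the function name is hypothetical.

```python
import math


def wilson_interval(wins, total, z=1.96):
    """Approximate 95% Wilson score interval for a win rate (z=1.96 by default)."""
    p = wins / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2.0 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    )
    return center - half_width, center + half_width


# Twin's 11-5 record against Abzan from the list above.
low, high = wilson_interval(11, 16)
print(f"Twin vs. Abzan: {low:.1%} to {high:.1%}")
```

Under these assumptions, an 11-5 record still produces a range that includes 50%, which fits with treating player experience and variance as plausible explanations alongside the decks themselves.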
Burn
- Top Decks MTGO prevalence: 10%
- Deep Dive MTGO prevalence: 10.3% (190)
Deep Dive matches: 569
- MWP: 50.8% (p=.38)
vs. Abzan: 55.6% (20/36)
vs. Affinity: 41.9% (18/43)
vs. Jund: 38.5% (10/26)
vs. UR Twin: 54% (27/50)
vs. Amulet Bloom: 22.7% (5/22)
vs. Grixis Delver: 59.6% (31/52)
Burn has been the most-played MTGO deck for a while, and that's just as true in the Deep Dive dataset as it is in the MTGO metagame numbers. Burn's paper metagame share has been crashing (it's currently between 4.5% and 5%), but it remains an MTGO powerhouse going into June. Even so, the deck's MWP has declined a few percentage points since my last article. This reflects both metagame adaptations to Burn, and the tendency for decks to fall back to 50% as more people play them. We saw a similar effect with Twin, but it's notable to me that Burn's MWP is right at the average even though Twin's is slightly over. This reflects the oops-I-win element of Twin, which is less present in Burn.
Unlike the Twin vs. Abzan/Amulet matchups, the Burn matchups make sense across the board. Affinity and Twin are straight races, with Affinity at a slight edge (it can threaten the turn 3 win and easily wins turn 4 on the play) and Twin at a slight deficit (the only way it wins turn 4 is if it draws the combo or if it can somehow control the damage). Amulet is also a race, but between the lifegain from sources like Radiant Fountain and the relative difficulty of Burn interacting with Bloom's cards, this is heavily in Amulet's favor. Burn struggles with Jund due to the less painful BGx manabase and Bolt, and beats Abzan for the opposite reasons (you can read my article on Jund's strengths for more on these points). Finally, Grixis Delver remains Burn's best matchup, which is something many Grixis Delver players will admit to. Grixis Delver struggles against Burn because of a painful manabase, a lack of lifegain, cards like Gitaxian Probe which are just terrible in the matchup, and a gameplan that is a bit too slow. Thankfully for Grixis mages, the Burn vs. Delver MWP isn't nearly as lopsided as it was in the first article, which was a normalization I predicted would happen as we added more data.
Affinity
- Top Decks MTGO prevalence: 5.8%
- Deep Dive MTGO prevalence: 5% (108)
Deep Dive matches: 339
- MWP: 57.5% (p=.01***)
vs. Abzan: 61% (11/18)
vs. Burn: 60.5% (26/43)
vs. Jund: 30% (3/10)
vs. UR Twin: 44.8% (13/29)
vs. Amulet Bloom: 57.1% (8/14)
vs. Grixis Delver: 50% (12/24)
From a metagame perspective, Affinity's Deep Dive prevalence is very close to the overall Top Decks prevalence, although Affinity's paper presence has historically (and currently) been higher than its online share. Affinity is in an MTGO metagame share dip these days, but I expect that to reverse in the coming months. Just like you should never bet against BGx, never bet against Affinity.
Speaking of never betting against Affinity, the real takeaway here is not the prevalence -- it's the MWP and its statistical significance. The deck's MWP is considerably higher than the MTGO-wide average, which reflects Affinity's longevity in Modern and its biggest events. This deck has been around for as long as the format, and it has always put up results, particularly when people expect it least. With all the focus on Burn, Grixis Delver, Abzan Company, Jund, and other hot Modern decks, players are probably forgetting their Silences and Grudges at home. This is especially true of all the decks not represented in the top-tier echelons. Brewers and tier 2-3 players are preparing for a field of Company/Delver/Burn/BGx/Twin/etc. They are probably forgetting the oldest aggro deck in Modern. This is reflected in the data itself: Affinity has 339 matches, only about 1/3 of which are represented against top-tier decks. This suggests other matchups are strongly driving the significant MWP, which is exactly what we would expect in a format where players might be forgetting Affinity to try and beat decks with more hype.
Looking at individual matchups, the most interesting results are the Amulet matchup and the Abzan matchup. Against Abzan, I expected this matchup to be more even, but I also still think some of the players who are sticking strong with Abzan might not have the best grasp of the format right now. So it's possible those less experienced/informed players are bringing up the Affinity vs. Abzan rate. As for Amulet, I think this is mostly a function of Affinity players having the clock advantage against a deck that can't really interact with them. It's not like Bloom has any tools short of a Hive Mind to consistently beat giant Inkmoth hits, or a huge Skirge swinging the life totals. Affinity players who know their Ravager/Plating combat math will be rewarded in this matchup.
Abzan
- Top Decks MTGO prevalence: 5.1%
- Deep Dive MTGO prevalence: 5.3% (97)
Deep Dive matches: 302
- MWP: 52% (p=.51)
vs. Affinity: 38.9% (7/18)
vs. Burn: 44.4% (16/36)
vs. Jund: 71.4% (10/14)
vs. UR Twin: 31.3% (5/16)
vs. Amulet Bloom: 45.5% (5/11)
vs. Grixis Delver: 72.7% (16/22)
Abzan's MTGO metagame share continues to decline as players switch to other decks (get 'em Jund mages!). Abzan may still be considered the 50-50 deck, but that definition is becoming increasingly uncertain in a metagame where everyone expects Abzan. The deck's MWP is solidly average, which partially reflects the 50-50 nature of the deck, but also reflects metagame context less friendly to Abzan than it used to be. Looking at the deck's matchups, this makes a lot of sense. Path, TS, and a painful manabase are just not where you want to be against Burn and Affinity, which is why those win rates are so low. Abzan may have a very strong Jund matchup (which is absolutely reflected in my experience with the matchup, where Abzan easily outvalues Jund), but that's not enough to shore up those matchups against the linear, less-interactive decks. All of this contributes to Abzan's falling metagame share and its lackluster MWP.
Grixis Delver is notable here in being one of Abzan's few remaining strong matchups. The Delver variant is everywhere online, and it really struggles against things like Siege Rhino, Lingering Souls, and Path (especially against bigger Delver decks favoring Angler/Tas). Also, Decay is still just as crazy against Delver as it has always been. Another notable matchup is Twin. Abzan is supposed to have a good Twin matchup but, again, that's not what the data is tracking here. As I mentioned before, I think this is a function of player skill and experience. A lot of players jumped ship from Abzan in the last month, leaving some combination of high-quality Abzan regulars (who will bring the MWP up) and players who effectively "missed the memo" about Abzan's declining effectiveness (who will probably bring the MWP down). Perhaps the most important takeaway here is that the data suggests the deck isn't itself so strong against Twin that you can just rely on card quality to win. Pilots still matter.
Jund
- Top Decks MTGO prevalence: 4%
- Deep Dive MTGO prevalence: 4.7% (86)
Deep Dive matches: 263
- MWP: 49.3% (p=.83)
vs. Abzan: 28.6% (4/14)
vs. Affinity: 70% (7/10)
vs. Burn: 61.5% (16/26)
vs. UR Twin: 44.4% (8/18)
vs. Amulet Bloom: 71.4% (10/14)
vs. Grixis Delver: 38.9% (7/18)
Jund continues to rise up through the MTGO ranks, and I fully expect it to surpass Abzan by the end of the summer if the rest of the field still looks like it does now. Jund's metagame share is still lower than decks like Twin, Affinity, and Burn, but we have already seen Jund shoot up to 6.5% of paper: MTGO is likely to follow soon. That said, the deck's MWP is actually lower than the MTGO-wide average, which seems unexpected of a deck that is supposed to be such a great metagame choice. The difference is by no means significant, so it's hard to know where Jund's true MWP falls around the MTGO average, but this is certainly not the MWP we would expect of a rising tier 1 staple.
To understand the potential discrepancies between Jund's metagame trends and its MWP, we need to look at the matchups. Burn, Affinity, and Amulet Bloom are all at the core of Jund's successes. As I've discussed in the earlier article on Jund's successes, Bolt and a less painful manabase go a long way towards beating Burn and Affinity. As for Amulet, Jund combines Abzan's disruption with better card advantage engines (Bob is way better than Souls here because Amulet can't kill him and can't handle the card advantage) and a faster clock (Bolt is big here). These are important driving factors behind Jund's success, and I expect this to continue into the summer. That said, Jund has some clear weaknesses bringing down its MWP. Jund is not great against fairer decks. Bolt is terrible against Goyf and Tas, and just as terrible against decks like Grixis Moon and UWR Control/Midrange playing Bolt-resistant strategies. All of that is reflected in the abysmal Jund vs. Abzan MWP, as well as the Jund vs. Twin MWP: Bolt is not what you want to be doing against Exarch. Grixis Delver is also an uphill battle, because Jund's strongest cards are not so great in that matchup (Bob gets killed too easily, Bolt doesn't stop Tas or Angler, you have no Rhinos to seal the game, etc.).
Amulet Bloom
- Top Decks prevalence: 4.1%
- Deep Dive prevalence: 4% (76)
Deep Dive matches: 250
- MWP: 60% (p=.002***)
vs. Abzan: 54.5% (6/11)
vs. Affinity: 42.9% (6/14)
vs. Burn: 77.3% (17/22)
vs. Jund: 28.6% (4/14)
vs. UR Twin: 64% (16/25)
vs. Grixis Delver: 61.1% (11/18)
Yeah, Amulet Bloom is still probably the best deck in Modern. We are up to 250 matches and the MWP is only getting crazier. Now it's 60%, a full 10 percentage points over the MTGO-wide average, with a jaw-dropping statistical significance of P = .002. This means Amulet isn't just at the upper end of expected variance. It's a legitimate overperformer in another MWP league relative to the competition. This also aligns with our more qualitative experiences of the deck. Amulet Bloom is perhaps the most difficult combo deck to interact with in Modern, and also one of the most linear. It punishes decks that don't interact with it, and it is very hard to interact with for decks that try. This matches all other available data on the deck, all of which suggests Amulet is the real deal and the hands-down victor for highest MWP in Modern.
From a metagame perspective, Amulet Bloom sees a solid amount of play but nothing too overwhelming. It's about as common as Merfolk, RG Tron, and Jund, which feels odd given how crazy its overall MWP is. Why aren't more people playing this deck? It has positive matchups everywhere, it has a strong gameplan, and it punishes opponents who either don't interact with it or screw up an interaction. Why is it underplayed? The big reason is a perceived skill floor. People think this deck is really hard to play, which scares prospective pilots. Is it actually as hard as people think? Yes and no. The deck has a lot of internal nuances to figure out and many play lines you need to consider. But it's not much harder than Tempo Twin variants or Affinity in that respect, and those decks see a lot more play. That said, most players don't believe this to be the case, which is why so many of them don't run Amulet. Those running it online are extremely experienced with the deck: many have been playing it for years, and the deck has the lowest ratio of unique players to matches of any top-tier deck. This is reflected in all the matchups. Those win rates aren't just Amulet Bloom showing its power. They are the players themselves showing their experience. Amulet is a deck that rewards player mastery, and Amulet players on MTGO tend to be very experienced with it.
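The "ratio of unique players to matches" can be sketched as follows. The record layout, function name, and player names are all hypothetical placeholders, since the per-player data behind the Deep Dive is not reproduced in this article.

```python
from collections import defaultdict


def players_per_match(records):
    """Ratio of unique players to total matches for each deck.

    `records` is an iterable of (deck, player, matches_played) tuples, one per
    daily appearance. A lower ratio means fewer pilots account for more matches.
    """
    players = defaultdict(set)
    matches = defaultdict(int)
    for deck, player, matches_played in records:
        players[deck].add(player)
        matches[deck] += matches_played
    return {deck: len(players[deck]) / matches[deck] for deck in players}


# Made-up appearances: one Amulet pilot shows up twice, every Delver pilot once.
records = [
    ("Amulet Bloom", "player_a", 4),
    ("Amulet Bloom", "player_a", 4),
    ("Amulet Bloom", "player_b", 3),
    ("Grixis Delver", "player_c", 4),
    ("Grixis Delver", "player_d", 3),
    ("Grixis Delver", "player_e", 4),
]
ratios = players_per_match(records)
print(ratios)
```

In this made-up sample both decks have the same number of matches, but the Amulet matches come from fewer pilots, so its ratio is lower.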
As a final note on this, I think both the Twin and Abzan matchups are closer to 50% than the numbers are indicating here. In both cases, there is a player experience effect at play that increases the Amulet Bloom win rate. These guys know their stuff and have been playing for a long time. But it's also a feature of the deck itself. When you screw up against most decks, you don't instantly lose. A misplay against Amulet, however, is often game over, and Amulet gives lots of opportunities for opponent misplays.
Grixis Delver
- Top Decks prevalence: 8.7%
- Deep Dive prevalence: 8.9% (165)
Deep Dive matches: 500
- MWP: 50.4% (p=.395)
vs. Abzan: 27.3% (6/22)
vs. Affinity: 50% (12/24)
vs. Burn: 40.4% (21/52)
vs. Jund: 61.1% (11/18)
vs. UR Twin: 63.2% (24/38)
vs. Amulet Bloom: 38.9% (7/18)
We end with Grixis Delver, an MTGO staple which exploded on the scene back in March and hasn't looked back since. Grixis Delver is the second most-played MTGO deck after Burn, which is reflected in both the Top Decks dataset and the Deep Dive. Like Burn, the most-played deck online, Grixis Delver has a very middling MWP, which is expected given how many people are on the deck. With such a deck, you naturally see a mix of experienced pilots, good players who are just picking up the deck, people boarding the MTGO hype train, and outright bad players. This all but ensures an MWP hovering right around the average.
Grixis Delver's observed matchups align nicely with my own experience of the deck. Affinity is a straight race, although I think this is slightly in Delver's favor depending on what build the Delver player is using. Burn is a bad matchup and Abzan is much worse, the former because of a painful manabase and a slower effective turn, and the latter because Abzan's cards generally outclass Delver's. Getting your turn 2-3 Angler or Tas hit by Path is a disaster. So is trying to burn out a Rhino. Amulet Bloom is probably more in Delver's favor than the matchup results indicate here, but player experience is a strong matchup determinant on both sides of the table. Amulet players tend to be very experienced with their deck and the format. Grixis Delver players run a huge range.
When I look over this data, my biggest takeaway has to do with player experience and skill. I see lots of instances where a matchup is brought up or down based on the relative skill of pilots. This doesn't mean the deck isn't a factor. As with most social science data analysis, it's a little bit of both. But player experience is an under-appreciated factor in deck performance analysis, and one affecting most Modern players. You can use this kind of analysis to see which decks reward tight, experienced play, and which decks are easier to just pick up and take to town. You can also use it to see which matchups are easy/hard independent of player skill. Again, don't interpret this as player skill being the only deciding factor in matchups and win rates. Decks play a big part in this too. It's just to say you need to consider all the factors in deck evaluation.
We'll keep adding data to the Deep Dive dataset and keep updating you on its progress. June is here, which means we are in for three Modern GPs and an SCG Open. Hopefully this article gives you some additional tools to help you pick your decks and improve your matchups. And hopefully those events will give us some more awesome finishes and data to discuss as the month goes on!