It's the most wonderful time of the year! The holiday season? Nope. Oath of the Gatewatch spoiler hype and rage? Try again, but you're getting warmer on the "hype" and "rage" business. With 2015 wrapping up and January 2016 right around the corner, 'tis the season of endless debate, discussion, and delirium about arguably the hottest button issue in competitive Magic: the Modern banlist. Our own Trevor Holmes gave some banning history the other week, and we've already seen opinions from Anthony Lowry, Craig Wescoe, and every other OP on r/spikes and r/modernmagic. The overwhelming majority of this conversation is often devoid of data or evidence (sorry, a personal anecdote of losing on turn 1 last Friday night doesn't qualify). Today's article is the first in a series of Modern banlist-focused pieces where I'll try to add some concrete datapoints to this dialogue.
The lovely Stoneforge Mystic has been the talk of the town since Wizards announced their 2016 Grand Prix promo. Today, I'll summon my buddy Arcbound Ravager, along with his merry minions, to test the impact of a hypothetical Mystic unban.
In this inaugural "Banlist Test" article, we'll choose a banned card, stick it in a list, and throw it into the Modern octagon. We're following in the footsteps of Caleb Durward's fascinating "Banned Series" on ChannelFireball, but with fewer videos, way more rounds, and extensive context around our card and deck choices. Hopefully, this will inject some much-needed data into banlist discussions that are often heavy on rhetoric and light on actual evidence. We're kicking it off with a Stoneforge Mystic Abzan list battling against a stock Affinity build, spread out over 80 games across Games 1, 2, and 3. In the interest of space, I'm splitting this article into a Part 1 (deck overviews and Game 1) and a Part 2 (Games 2-3 and conclusions). We'll publish Part 2 next week. For now, let's launch into the banlist testing action!
One of the biggest pitfalls in testing banned cards is picking non-representative decks. When you run Bloodbraid Elf in Tribal Shamans and determine the Elf is safe for Modern, the only thing you've really revealed is that you've been hoarding foil Rage Forgers since 2012. Cards need to be tested in those same strategies they would call home if unbanned. For Bloodbraid, that means Jund and maybe Naya Zoo. For Ancestral Vision, it would be UR Twin and Grixis Control/Midrange. What about our leading lady of the day? Archetypes across Modern would undoubtedly welcome an unbanned Stoneforge Mystic, but for testing purposes, the deck we need to worry about is one that already doesn't need much help. It has been Tier 1 on numerous occasions, and Mystic could easily push it over the edge: Abzan.
Abzan might have missed Tier 1 in November, but the BGx powerhouse still boasts an impressive metagame history, almost all of it following Siege Rhino's arrival in Khans. We've published ten metagame updates since our site's launch and Abzan has made the Tier 1 cutoff in five different periods. Its Pro Tour Fate Reforged performance drove the early Abzan dominance in the spring, where Siege Rhino carried Abzan to a 25%+ share in the Pro Tour's Day 2 metagame. Abzan's share fell less than a month later at Grand Prix Vancouver, but it still ruled Day 2 at around 17.5%. It's true we haven't seen this level of BGW dominance since the spring: Abzan's most recent Tier 1 stint was in September at 5% of the format. That said, if Abzan could reach these 17%-25% levels without Stoneforge's help, we have every reason to be worried about what it could do with her. Given these metagame shares, Mystic's natural fit in midrange, and Abzan's love of good-stuff creatures, there's no better deck to welcome the Artificer and her equipment arsenal.
Now that we've tapped Abzan to champion the Mystic, we need to select our challenger. Although any top-tier deck could work here, we really need to pick an opponent that fulfills two criteria. First, we need our matchup to have a documented, baseline win-rate. This lets us check if Stoneforge Abzan pushes that win-rate too far in one direction or the other. Second, we need to choose a sparring partner that directly tests Stoneforge's strengths and weaknesses. No one is too worried about Mystic skewing the Abzan vs. RG Tron matchup too heavily. Warping an aggro matchup, however, is much more in line with the Kor's talents.
Based on this, Affinity is an easy selection for our banlist-test grudge match. Numerous sources have attested to the 50-50 nature of Abzan vs. Affinity. We see this frequently in quantitative pieces, such as those published here, on MTG Goldfish, and on ChannelFireball. We also have qualitative confirmation of this contest, as seen in Andrea Mengucci's Abzan primer and Frank Karsten's Affinity primer, both published in the aftermath of Pro Tour Fate Reforged. With the datapoints aligning across the quantitative and qualitative spectrum, we can be reasonably confident this matchup is very close to 50-50. That presents a perfect opportunity to see how Stoneforge's addition could influence the duel.
From a more theoretical perspective, one of the biggest fears around a Stoneforge unbanning is reducing format diversity by depressing the metagame share of aggressive strategies. As Wizards has said, they don't want a format dominated by the Mystic. Turn three Batterskull does a number on decks trying to win through damage, especially backed up by Abzan's disruption. Affinity is happy to rise to this challenge. If the robot horde is stymied by the lifelinking Germ, you can bet lower-tier aggro decks will be in even deeper trouble. That would suggest Stoneforge is much more dangerous than its proponents admitted. On the other hand, if Ravager and friends can keep an early Batterskull in check, it's possible other aggro players can adapt as well. This wouldn't be the end of testing, but it would be a very promising start for Mystic supporters.
Finalizing an Affinity list was easy: Aaron Webster just got 2nd at Grand Prix Pittsburgh with a no-frills 75. We made some adjustments to the sideboard to reflect Affinity battling in a post-Stoneforge world, but the maindeck was largely unchanged except for one swap.
We added in the Decays as game 2-3 concessions to Mystic. Although the BGx staple doesn't kill Batterskull itself, Affinity can easily pull ahead in the turns Decay buys after blowing up the Germ token. It can also use Decay to blow up the Mystic, whether against Abzan or the UWx decks that would invariably wield Stoneforge too. We also swapped the lone sideboarded Champion with a maindecked Spellskite to give us more beef in Game 1. Between Thoughtseize and Decay to disrupt, Aether Grid to circumvent Stony Silence, and the full set of Champions in the main, our Affinity list has more than a few answers to Abzan. Of course, there's an entirely separate question here as to whether Affinity runs Stoneforge itself, but we didn't worry about that for these tests.
Tailoring an existing Affinity list was easy. Crafting a post-Stoneforge Abzan list, however, was a bigger puzzle.
We started with Jon Westburg's 8th place Abzan deck from the October StarCityGames Open in Dallas. This was the highest-performing Abzan build in the fall and a good beginning for our Stoneforge overhaul. The first question was determining how many Mystics to play, and then what equipment to run alongside her. We bounced around between three and four copies before realizing this was Stoneforge frikkin Mystic getting played in Modern. Of course we should be running the playset!
After that, we needed to determine what else the Kor would be Stoneforging apart from Batterskull. Without access to Umezawa's Jitte, it was a tossup between Sword of Fire and Ice and Sword of Feast and Famine, both of which saw similar degrees of Modern play throughout 2015. Feast and Famine won in the end not because it's better against Affinity (it isn't), but because it's better in a metagame where everyone is playing Stoneforge. Protection from black and green tears through opposing tokens, not to mention all the Goyfs and Decays inevitably accompanying Stoneforge into battle. Fire and Ice got shipped to the board instead. This left us with a maindeck Stoneforge suite of 4 Mystic, 1 Batterskull, and 1 Sword. Silvestri used the same equipment package when brainstorming Esper Stoneblade, further justifying our decision.
After figuring out Mystic, we reconfigured the deck's removal to be more generic (nixing cuteness like Abzan Charm and Murderous Cut). We ended with sideboard tweaks, throwing in that leftover Sword, an extra Maelstrom Pulse, and even a Slaughter Pact to address Stoneforge on the play against opposing Mystic decks. Our final Abzan list bore a strong resemblance to Willy Edel's deck at Grand Prix Pittsburgh, which suggested our reworks were on the right track.
We agonized over that lone Scavenging Ooze for a long time. Most builds go up to two in the main, but we thought we could get away with one if we had Stoneforge instead. There was no avoiding the slot problem at 61 cards, so it was either trimming the Ooze, going to two copies of Liliana/Path, or nudging Rhino/Decay/Inquisition to three. All of those options sucked for different reasons (especially when viewed through the lens of a grindier, post-Mystic metagame), so we settled on the least problematic of the lot.
A friend of mine with over a decade of Affinity practice piloted the robots. I stayed on Abzan. We considered switching decks between games, but experience is so important in getting the most out of Affinity that I deferred to his expertise. All tests were conducted online to speed games up (especially around Abzan's shuffling). For Game 1, we played 30 total rounds: 15 with Abzan on the play and another 15 with Abzan on the draw. Then we sideboarded and played 50 post-board (Games 2-3) trials, split evenly between Abzan on the play and Abzan on the draw.
In his Affinity primer, Karsten estimated Game 1 at 75-25 in Affinity's favor, with Games 2-3 leaning towards Abzan at 40-60. Based on this and the other sources, we wanted to see if the Stoneforge Abzan vs. Affinity matchup would be 50-50 overall, with a ~25% Abzan win-rate in Game 1 and a ~60% Abzan win rate in Games 2-3.
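To see why a 25% Game 1 and a 60% Games 2-3 rate add up to a roughly even matchup, here is a quick back-of-the-envelope sketch. It assumes every game is independent and that Games 2 and 3 share the same win rate regardless of who is on the play, which is a simplification of mine and not something Karsten's primer spells out. The function name is just for illustration.

```python
def match_win_probability(game1: float, postboard: float) -> float:
    """Chance of winning a best-of-three match.

    game1     -- win rate in Game 1 (pre-sideboard)
    postboard -- win rate in each of Games 2 and 3 (post-sideboard)
    Assumes independent games and ignores play/draw effects.
    """
    win_win = game1 * postboard                         # take Games 1 and 2
    win_loss_win = game1 * (1 - postboard) * postboard  # drop Game 2, take Game 3
    loss_win_win = (1 - game1) * postboard * postboard  # drop Game 1, take the next two
    return win_win + win_loss_win + loss_win_win


# Karsten's estimates from Abzan's side: 25% in Game 1, 60% post-board.
abzan_match = match_win_probability(0.25, 0.60)
print(f"Abzan match win rate: {abzan_match:.0%}")
```

Under those assumptions Abzan wins about 48% of matches, which is why the split estimates and the 50-50 overall figure can both be true at once.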
Game 1 Results
In our thirty Game 1 trials, Stoneforge Abzan went 11/30 for a total win rate of 37%. Although this is higher than Karsten's 25% estimate, it's well within the expected variance (statistics talk: I bootstrapped the Game 1 sample in 10,000 resamples and then compared those results to Karsten's 25%, finding a statistically insignificant difference between the two at p=.50). This suggests Stoneforge Mystic had no statistically significant impact on the Affinity vs. Abzan contest in Game 1. She did, however, make small differences for the Abzan pilot.
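For readers who want to poke at the numbers themselves, here is a minimal sketch of one way to run this kind of check. It is not my exact script: the seed, the 95% percentile interval, and the added exact binomial tail are illustrative choices, so don't expect it to reproduce the p-value quoted above.

```python
import random
from math import comb

GAMES, WINS = 30, 11   # Abzan's Game 1 record in these tests
BASELINE = 0.25        # Karsten's Game 1 estimate for Abzan


def bootstrap_win_rates(wins, games, resamples=10_000, seed=2016):
    """Resample the observed games with replacement; return sorted win rates."""
    rng = random.Random(seed)
    results = [1] * wins + [0] * (games - wins)
    return sorted(sum(rng.choices(results, k=games)) / games
                  for _ in range(resamples))


def percentile_interval(rates, level=0.95):
    """Central percentile interval of the resampled win rates."""
    tail = (1 - level) / 2
    low = rates[int(tail * len(rates))]
    high = rates[int((1 - tail) * len(rates)) - 1]
    return low, high


def binomial_upper_tail(wins, games, p):
    """Exact chance of at least `wins` wins in `games` games at win rate p."""
    return sum(comb(games, k) * p**k * (1 - p)**(games - k)
               for k in range(wins, games + 1))


rates = bootstrap_win_rates(WINS, GAMES)
low, high = percentile_interval(rates)
tail = binomial_upper_tail(WINS, GAMES, BASELINE)
print(f"Bootstrap 95% interval: {low:.0%} to {high:.0%}")
print(f"Baseline inside interval: {low <= BASELINE <= high}")
print(f"Chance of 11+ wins at a true 25% rate: {tail:.2f}")
```

The bootstrap interval comfortably contains 25%, and the exact binomial tail lands at roughly 0.1, so neither check flags the 37% result as a significant departure from Karsten's estimate at the usual 5% threshold.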
Before we dive into the themes and takeaways of this matchup, here are the high-level Abzan figures from our thirty games.
- Abzan win %: 37% (11/30)
- Abzan win % on the play: 47% (7/15)
- Abzan win % on the draw: 27% (4/15)
- Average Abzan win-turn: 9
- Average Abzan loss-turn: 6
It should come as no surprise you want to be on the play against Affinity, and Stoneforge Abzan was no exception. Similarly, the more you can prolong the Affinity matchup, the better it is for the Abzan pilot. This is particularly true when it comes to Stoneforge Mystic. Landing the turn 3 Batterskull before your opponent's fourth turn makes a world of difference, especially if you can beat Steel Overseer to its first tap. That said, Abzan couldn't push above 50-50 on the play even with Mystic in the mix, which suggests the Game 1 advantage remains solidly on Affinity's side of the court. For the sake of completeness, here are the high-level Affinity stats, which just flip the Abzan numbers.
- Affinity win %: 63% (19/30)
- Affinity win % on the play: 73% (11/15)
- Affinity win % on the draw: 53% (8/15)
- Average Affinity win-turn: 6
- Average Affinity loss-turn: 9
Let's go a little deeper. Here are some statistics around Stoneforge Mystic and her impact on games.
- Games with 1+ Mystic: 70% (21/30)
- Abzan win % with Mystic: 38% (8/21)
- Abzan win % with no Mystic: 33% (3/9)
- Abzan loss % with Mystic: 62% (13/21)
- Abzan loss % with no Mystic: 67% (6/9)
Or maybe I should say, Stoneforge Mystic's relative lack of impact. Despite seeing the Artificer in 70% of games, Abzan managed only a slight improvement when it dropped her on the board. Without Mystic, Abzan won 33% of games. With her, it won 38%. That's an insignificant difference both at a glance and statistically. Even if we doubled our sample size to 60 games (or went all the way to 100), I don't think we would see much change here. The no-Mystic win rate would likely slip down to 25%-30%. As for games with Mystic, it might eke up to 40%. This would represent a very minimal improvement over a generally bad matchup, pointing to Mystic being safer than many of her critics acknowledged, but still of measurable benefit to Abzan.
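If you want to sanity-check the "statistically insignificant" claim, a Fisher exact test on the 2x2 table of wins and losses is one reasonable option for samples this small. To be clear, this is a swapped-in technique for illustration rather than the method behind the numbers above, and the helper is a hand-rolled sketch using only the standard library.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Sums the probability of every table (with the same row and column
    totals) that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric chance of x wins landing in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= observed * (1 + 1e-9))


# Rows: games with Mystic (8 wins, 13 losses), games without (3 wins, 6 losses).
p_value = fisher_exact_two_sided(8, 13, 3, 6)
print(f"Fisher exact p-value: {p_value:.2f}")
```

The observed split is the single most likely table given the totals, so the p-value comes out at essentially 1. In other words, 8/21 versus 3/9 is about as unremarkable a difference as this sample could produce.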
On the subject of Mystic herself, one of the main objections to a Stoneforge unbanning is the power of turn 2 Mystic into turn 3 Batterskull. How did that line play out in the Abzan vs. Affinity fight?
- Average Mystic turn: 2.75
- % of Mystic games where Mystic landed on turn 2: 62% (13/21)
- % of total games where Mystic landed on turn 2: 43% (13/30)
Win or lose, Abzan dropped a turn 2 Mystic onto the battlefield in 43% of its games. That's right around the probability of drawing at least one Mystic in your first 7-9 cards when you're running the full playset. There were three games where I had to hold removal or Inquisition instead of living the dream (typically blowing up Steel Overseer or discarding Etched Champion or an uncast Plating), but most of the time the turn 2 Mystic was the right play. Indeed, in those games where Abzan saw a Mystic at all, she landed on turn 2 in 62% of games.
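The odds of finding a Mystic by turn 2 are a standard hypergeometric calculation. The sketch below assumes a 60-card deck, four Mystics, and no mulligans; the function name is mine.

```python
from math import comb


def chance_of_at_least_one(copies, cards_seen, deck_size=60):
    """Chance of seeing at least one of `copies` cards in `cards_seen` draws."""
    whiffs = comb(deck_size - copies, cards_seen) / comb(deck_size, cards_seen)
    return 1 - whiffs


# 7 = opening hand, 8 = turn 2 on the play, 9 = turn 2 on the draw.
for cards_seen in (7, 8, 9):
    odds = chance_of_at_least_one(4, cards_seen)
    print(f"{cards_seen} cards: {odds:.1%}")
```

That works out to roughly 40% for seven cards, 44.5% for eight, and just under 49% for nine, so a 43% turn 2 Mystic rate is about what the math predicts once you account for the few games where holding her back was correct.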
Fortunately, we're not here to throw a fit about turn 2 Stoneforges in the abstract. We're here to look at how that turn 2 Mystic actually affected our win percentages.
- Total Abzan wins: 11
- % of Abzan wins with Mystic: 73% (8/11)
- % of Abzan wins after a turn 2 Mystic: 45% (5/11)
- % of Abzan wins after a turn 3+ Mystic: 27% (3/11)
- % of Abzan wins with no Mystics: 27% (3/11)
Looking at wins alone, the turn 2 Mystic was a clear factor in Abzan's victories. Almost half of Affinity's losses came to the dreaded turn 2 Stoneforge, on top of Mystic's involvement in 73% of Abzan wins overall. Reviewing my notes, the Kor was a major contributor to all the wins where she made an appearance, particularly when that showing came on turn 2. Although we don't know with certainty how games would have ended if the Abzan player had drawn Kitchen Finks instead of Stoneforge, my notes suggest the one-two punch of Mystic into Skull was too much for Affinity to handle. Batterskull generated massive life advantages when left alone. I got Skull up on a Spirit four times total. All of those resulted in landslide victories. Taken as a whole, when Abzan won with Stoneforge, it tended to win big.
Of course, that gets us wondering about Mystic's performance in games Abzan eventually lost.
- Total Abzan losses: 19
- % of Abzan losses with Mystic: 68% (13/19)
- % of Abzan losses after a turn 2 Mystic: 42% (8/19)
- % of Abzan losses after a turn 3+ Mystic: 26% (5/19)
- % of Abzan losses with no Mystics: 32% (6/19)
Hmm. Maybe Stoneforge wasn't so decisive after all...
Even though Stoneforge hit play on turn 2 in 42% of these losses, she was unable to avert the inevitable defeat. Indeed, Mystic entered the battlefield in 68% of the lost Abzan games overall, close to the rate at which she appeared in the Abzan wins (73%). My notes give some explanation around these losses. There were three factors which worked against the turn 2 Mystic, converting a potential win into a guaranteed loss. The first was removal: a stray Galvanic Blast crippled the Batterskull line and left the Abzan player on the back foot. The second was Inkmoth Nexus, which combined with Ravager or Plating to soar across for a poisonous finale. Finally, Etched Champion could sit back and block the hapless Germ token all day while Affinity's fliers chipped away for the win. These numbers and the narratives behind them show the Kor was quite beatable.
In the end, Abzan saw the Mystic in 70% of its games, but still maintained similar win percentages when it drew Mystic (38%) and when it didn't (33%). These numbers are right around Karsten's 25-75 Game 1 estimate, and although Mystic nudges the scales in Abzan's favor, it's not nearly a big enough push to cause worry. Based on these numbers and their context, I am tentatively concluding that Mystic does not break the Abzan vs. Affinity matchup in Game 1. Affinity has more than enough maindeck ways to handle the infamous Artificer, whether through direct removal, the protected Champion, venomous Inkmoths, or even through a simple airforce damage race.
Like any testing environment, our Abzan vs. Affinity study today has limitations. For one, I know at least a handful of readers are going to see the number of test games and immediately cry foul about an insufficient N. Many of these critics wouldn't be satisfied with 50 or even 100 games, because these samples fall below N levels needed for "truly" significant results. I've compensated for this in a few ways. This includes bootstrapping our sample, checking the observations against the expected values and seeing no serious deviations, and digging into the narratives behind each game to contextualize the numbers. Social science analysis often deals with smaller samples, and these methods are all great ways of mitigating the low-N effect. Moreover, I sincerely doubt Wizards runs 10 matches of hypothetical Modern decks, let alone 30 Game 1s. Our testing should therefore be more than enough to say something meaningful about banned cards.
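To put the low-N concern in concrete terms, here is a small sketch of how much uncertainty sits around a 37% win rate at different sample sizes. It uses the Wilson score interval, which is my pick for illustration and not part of the analysis above. The 50- and 100-game records are hypothetical, simply scaling the same win rate up.

```python
from math import sqrt


def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a win rate."""
    rate = wins / games
    denom = 1 + z**2 / games
    center = (rate + z**2 / (2 * games)) / denom
    half = z * sqrt(rate * (1 - rate) / games + z**2 / (4 * games**2)) / denom
    return center - half, center + half


# 11/30 is the real Game 1 record; the larger samples are hypothetical
# records at the same ~37% rate, included only to show the interval shrinking.
for games in (30, 50, 100):
    wins = round(games * 11 / 30)
    low, high = wilson_interval(wins, games)
    print(f"{wins}/{games}: {low:.0%} to {high:.0%}")
```

At 30 games the interval runs from the low 20s to the mid 50s, which is wide but still includes Karsten's 25%. More games tighten the range without changing the basic picture, which is why I lean on context from the game notes as much as on the raw percentages.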
A second limitation concerns the test's applicability to other matchups. Can we make conclusions about the Abzan vs. Burn Game 1 based on these results? Or Abzan vs. Gruul Zoo? These comparisons are fraught with difficulties. On the one hand, something like Burn or Zoo would lack both Champions to stonewall a Batterskull and the Inkmoths to ignore lifegain. On the other hand, both decks pack significantly more removal than Affinity's four Blasts, not to mention anti-lifegain bullets in Atarka's Command and Skullcrack. Without testing these matchups, it's hard to know if one factor would compensate for another. This underscores the need for further testing, but also the importance of looking for matchup themes. For example, the Affinity results suggest Burn would probably be okay battling through a Mystic. It has enough spells to either kill Stoneforge (a noticeable Mystic weakness in even the removal-light Affinity tests), or to ignore the Skull and blast for lethal (as Affinity could do with its flying creatures). Then again the landlocked, creature-packed Gruul Zoo might struggle here.
Stay tuned for Round 2!
I hope you're as excited to read the Games 2-3 results as I am to report them! We'll be back next week with the conclusion of our Stoneforge Abzan vs. Affinity series, along with some final thoughts based on this round of testing. Mystic might not have caused too many problems in Game 1, but I'm sure we're all excited to see how she fares once the sideboard comes in.
Let me know if there are any additional matchup numbers you want to see or unanswered questions you want addressed. Do you have issues with the methodology? Concerns about the lists or feedback about the matchup? Any other opinions you have on Mystic and translating these Affinity-focused results to a broader metagame? Bring it into the comments and I'll see you all there!