I thought about marking up my work calendar with a countdown to January 18, but that felt too nerdy even for me. It's just so hard not to get excited about the upcoming January 18 banlist update and its probable impact on Modern. Since last week, ban and unban speculation has been relatively quiet throughout the Magic content-sphere, although I fully expect it to return with a roar as we open up 2016. Stoneforge Mystic is one of the biggest question marks in the conversation. On the one hand, a Grand Prix promo announcement could suggest a possible unbanning. On the other, Mystic brings a lot of baggage from her reign in Standard and impact on Legacy. Last Wednesday, I published a series of test results in the Abzan Stoneforge vs. Affinity matchup to try to assess the impact of a possible Mystic unban. Today, we're finishing up the study with the Games 2-3 results and a final word on the Mystic unbanning itself.
My goal in Part 1 was to add some hard numbers and evidence to banlist conversations, pushing us away from the rhetoric, hyperbole, and personal anecdote that so frequently drag down discussion. If the pageviews and internet-wide reaction were any indicator, the community was pumped to come along for the ride. Huge props to Eric Levine at ChannelFireball for featuring Part 1 in his "This Week In Magic" wrap-up, and I hope our Games 2-3 test results live up to the buzz around that first article. We're going to start with a quick summary of the Game 1 tests and an overview of our sideboarding process for both decks. After that, it's on to the post-sideboard games, the hard numbers, and some takeaways on the series and Mystic herself. Will Mystic strike back in Games 2-3? Will Ravager hold the line? Read on to find out!
Game 1 Recap
If you want the blow-by-blow Game 1 account, you'll have to read last week's article. If you're joining us for the first time today, or forgot the key statistics in the mess of numbers from last Wednesday, here are the big results you need to know from Game 1. These numbers assume 30 total games split 15-15 with Abzan and Affinity alternating play and draw.
Overall, Stoneforge Abzan scraped by with a 37% win rate. This is slightly higher than Frank Karsten's reported 25% win rate in his own experience, but well within the expected range given the sample size of 30. Abzan saw Stoneforge Mystic in 70% of games, winning 38% of games with Stoneforge involvement and 33% of games with no Mystic whatsoever. This points to Mystic having a relatively insignificant impact, tending towards a small bump for the Abzan pilot. Even though Abzan enjoyed the coveted turn two Stoneforge in 62% of its total games, it lost more of those turn two games than it won (winning 39% and losing 61%). That said, almost half of Abzan's wins ultimately involved the one-two Mystic into Batterskull punch, underscoring the combo's power but forcing us to admit its limitations in the Affinity matchup.
Based on Game 1 alone, it looks like Mystic is a lot safer in the Abzan vs. Affinity contest than many may have acknowledged. But as tournament-goers know, Game 1 often determines less than half of a match's outcome. We'll need to get into Games 2-3 to see how the Artificer really affects the Stoneforge Abzan vs. Affinity showdown.
We're running a fairly stock Abzan sideboard except for the Sword of Fire and Ice and the singleton Ooze we couldn't squeeze into the main 60. Our lone Maelstrom Pulse is an acknowledgement of a post-Mystic metagame, and I'm comfortable with the Stony Silence and equipment anti-synergy if it means consistent wins against Affinity. We'll see more of this at the end of the article, but the conflict between Stony and Stoneforge is barely noticeable in actual games.
To review, here's our sideboard.
Despite an ugly Game 1, Abzan is favored in the post-sideboard Affinity contest, and that should be truer than ever with Mystic in the mix. Here was our plan to take back the match:
I maintain that Sword of Feast and Famine is the proper maindeck choice in a post-Mystic metagame, whether smashing the Abzan mirror, stealing hits against those Jund mages and their Kolaghan's Commands, or outgrinding UWx Mystic players. Is the Lingering Souls and Sword of Fire and Ice engine strong? Absolutely, but Feast and Famine is much better on your average creature and I don't want to rely on drawing a combo in a midrange deck. Now that we're in Games 2-3, however, I'm happily trading protection from black and green for the Shocking, card-drawing blade.
Tasigur and Ooze are too slow for this matchup so we drop them for the Silences. Abzan never lacked for threats, clocks, or Sword-bearers in Game 1 and I don't expect this to be a problem in Games 2-3. We need ways to stop Affinity in the first four turns, not try to race them with vanilla beaters or prolong the game with 3-4 points of lifegain. Similarly, Liliana drops down to two copies to make room for Engineered Explosives, a much more reliable way to remove the nightmarish Etched Champion. Lily does this too but Explosives doesn't have to worry about sniping Champ through a sky full of Thopters, animating (Bl)Inkmoths, and Skirges.
We considered Fulminator Mage as an out to those Inkmoths (-2 Lily, +2 Mage), but this seemed like the wrong approach. The Shaman isn't blocking anything, which makes it a glorified Rain of Tears that only matters if Affinity is trying to win via animated lands. It's a terrible topdeck if we're behind against real threats like Overseer or Champion, and it sucks on our curve alongside Souls and Stoneforge activations. After Fulminator, we also entertained the extra Pulse to nuke Platings, Overseers, Pests, and Affinity's many four-ofs, but this sorcery-speed solution suffered from similar passivity issues as Mage. In the end, I was satisfied with our sideboard plan, although I would have loved an additional Lingering Souls.
The robots and their Champions might have triumphed in Game 1, but we knew Affinity was in for an uphill battle during Games 2-3. We kept Aaron Webster's Pittsburgh sideboard with a few adjustments for a Stoneforge world (and one new change since we presented the Game 1 results last week). Here was our Affinity 15.
One of the biggest challenges for an Affinity player is adding Games 2-3 bullets without diluting the core gameplan. Sometimes this involves shaving a Galvanic Blast, but the card was too good against Mystic to get trimmed. Other players cut a Pest or an Ornithopter, but flying was one of the few ways to keep Batterskull out of the fight. To respect this balance, here was our sideboarding approach:
Affinity's air force was its strongest maindeck asset in Game 1, and we didn't want to touch it after sideboarding. By contrast, Memnite consistently underperformed. It almost never enabled Drum, Glimmervoid, or Opal when something else wouldn't have done it too, but it did sit around staring stupidly at the 1/2 Mystic or something bigger. The rest of our Games 2-3 cuts were concessions to Mystic and the inevitable Stony Silence, the latter of which shuts down the otherwise awesome Overseer. Whipflare is normally pretty bad against Abzan when it's only burning Spirit tokens, but Mystic plus the Souls was a big enough issue in Game 1 that we wanted additional firepower against them for the rematches.
Some of our audience from last week might notice a conspicuous absence from the sideboard: Abrupt Decay. I talked about it with readers, some of my Magic friends, and my testing partner before deciding we should run Wear // Tear instead of the BGx instant. To be clear, Decay wasn't nearly as bad as many feared in the comments. In the 30 Games 2-3 we had already practiced with Decay, the card worked perfectly 74% of the time we drew it, and passably another 13% of the time (averaging 1.5 turns between when it was drawn and when it was cast). Admittedly, the remaining 13% of Decay games saw the spell fail miserably. This performance was significantly better than some critics argued, but not exactly where we wanted it. Affinity was already facing an uphill battle in Games 2-3, and we didn't need an extra handful of losses because we needed to topdeck a Mox Opal to kill a Batterskull. Moreover, although we initially included Decay as a nod to countermagic in the UWx Stoneforge matchup, we hadn't appreciated Stony Silence's increased profile in a white-shifted metagame. All of these factors pointed us to switching cards, and we re-ran all tests to account for Wear // Tear's presence.
Repeating our Game 1 methods, I stayed on Abzan while a friend of mine stuck with his trusted Affinity. Given how many shuffle effects Stoneforge Abzan uses (fetchlands, Mystic, Path, Ghost Quarter), we kept our games online to ensure the tests took hours and not days. We ran 50 total trials, divided 25 with Abzan playing and 25 with Abzan drawing. I took notes in a spreadsheet in between rounds, primarily quantitative measures like win turn, life total for each deck, number of Mystics per game, etc. I also included a more qualitative section with brief narratives so I could look back and remember what happened in each game, filtering themes from the match descriptions.
Looking back to Karsten's Affinity primer, Games 2-3 should see Abzan favored 60-40 over Affinity. This both fits our own understanding of the matchup (Affinity has a big lead in Game 1 and then falls stonily silent in Games 2-3) and aligns with the overall 50-50 nature of Abzan vs. Affinity. Based on those numbers, we wanted to see if Mystic changed the matchup from its 60-40 baseline.
Games 2-3 Results
Stoneforge Abzan posted a 58% win rate over the 50 games, going 29/50 against Affinity. With Affinity trailing at 42%, this is almost perfectly in line with Karsten's expected 60-40 post-sideboard split. Bootstrapping the sample to account for a small initial N, we found no statistically significant difference (P=.49) between our observed win percentage and the percentage expected by Karsten. Calculating a Match Win Percentage out of those different Game Win Percentages, we estimate a 52% Abzan win rate even after adding Stoneforge to the mix, aligning nicely with the 50-50 expectation. After running a more complex probability equation to account for playing vs. drawing in Games 2-3, we still find a 53% Abzan win rate, also in the expected range.
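For readers who want to try a comparison like this themselves, here is a minimal sketch of one common bootstrap approach: a percentile interval around the observed 29/50 win rate, checked against the 60% baseline. This is an assumption about how such a check could be run, not a reconstruction of the exact procedure behind the P=.49 figure, and the function name, resample count, and seed are all illustrative choices.

```python
import random


def bootstrap_interval(wins, games, resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a win rate.

    Resamples the observed game outcomes with replacement and returns
    the (alpha/2, 1 - alpha/2) percentiles of the resampled win rates.
    """
    rng = random.Random(seed)  # fixed seed so reruns give the same interval
    outcomes = [1] * wins + [0] * (games - wins)
    rates = sorted(
        sum(rng.choices(outcomes, k=games)) / games
        for _ in range(resamples)
    )
    low = rates[int((alpha / 2) * resamples)]
    high = rates[int((1 - alpha / 2) * resamples) - 1]
    return low, high


# Observed Games 2-3 record (29 Abzan wins in 50 games) and the 60% baseline.
low, high = bootstrap_interval(wins=29, games=50)
baseline = 0.60
print(f"95% interval for Abzan's win rate: {low:.2f} to {high:.2f}")
print("60% baseline inside interval:", low <= baseline <= high)
```

If the baseline sits inside the interval, the sample gives no reason to think the true post-sideboard win rate differs from 60%, which is the same reading as the test reported above.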
All of these results point to Mystic having virtually no effect on the Games 2-3 matches against Affinity, nor on the matchup as a whole.
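The simpler Match Win Percentage estimate can be sketched in a few lines. This version assumes games are independent, uses the 37% Game 1 rate and the 58% Games 2-3 rate reported in this series, and applies one post-sideboard rate to both Games 2 and 3. The function name is illustrative, and the play/draw-adjusted variant is not reproduced because it needs the Game 1 play/draw splits from Part 1.

```python
def match_win_rate(game1, postboard):
    """Chance of winning a best-of-three from per-game win rates.

    Assumes independent games and a single win rate for Games 2 and 3.
    """
    # Win Game 1, then take at least one of the next two games.
    after_g1_win = game1 * (1 - (1 - postboard) ** 2)
    # Lose Game 1, then win both remaining games.
    after_g1_loss = (1 - game1) * postboard ** 2
    return after_g1_win + after_g1_loss


abzan_match = match_win_rate(game1=0.37, postboard=0.58)
print(f"Abzan match win rate: {abzan_match:.1%}")  # prints 51.7%
```

Rounded to a whole number, that is the 52% figure quoted above.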
We're going to unpack all of the themes and explanations behind these numbers, but first we need to start with all the top-level Games 2-3 statistics. To keep things consistent, and to help out readers who might open this article alongside Part 1, I'll try to keep the numbers in the same order as last time.
- Abzan win %: 58% (29/50)
- Abzan win % on the play: 64% (16/25)
- Abzan win % on the draw: 52% (13/25)
- Average Abzan win-turn: 9.5
- Average Abzan loss-turn: 6.5
Following our Game 1 lessons, Abzan did much better on the play than on the draw. Unlike Game 1, however, Abzan was generally favored to win regardless of whether it went first or second. This trend has always been true of the Abzan-favored Games 2-3, and I was happy to see our testing honored that trajectory. The BGx representative just picks up so much added power from its sideboard. On the other side of the table, Affinity jams in whatever answers it can without also watering down its gameplan. The end result is always going to favor Abzan, and today was no exception. Thinking thematically, Siege Rhino and Lingering Souls were uncontested MVPs, followed closely by Mystic. There was also this Stony Silence that singlehandedly won games, but we'll focus more on that later. For now, here are the summary numbers for Affinity's Games 2-3.
- Affinity win %: 42% (21/50)
- Affinity win % on the play: 36% (9/25)
- Affinity win % on the draw: 48% (12/25)
- Average Affinity win-turn: 6.5
- Average Affinity loss-turn: 9.5
Most of Affinity's wins came from uncontested Inkmoths and metalcrafted Champions, although it was much harder for the Nexus to punch through Spirits and Stirring Wildwood without Ravager support. Sideboard cards were critical in keeping Affinity afloat, especially Wear // Tear to flexibly answer Silences and Stoneforges. Now that we've laid out the matchup summary numbers, we can focus on Mystic and her own value (or lack of value) in individual games.
- Games with 1+ Mystic: 62% (31/50)
- Abzan win % with Mystic: 58% (18/31)
- Abzan win % with no Mystic: 58% (11/19)
- Abzan loss % with Mystic: 42% (13/31)
- Abzan loss % with no Mystic: 42% (8/19)
Nope, those aren't typos. Abzan was winning and losing at the exact same rate regardless of whether it did or didn't have Mystic. We saw a similar effect in Game 1, where Abzan won 38% of the games with Stoneforge and 33% without, although such similarities are far more pronounced here. I was cheering my Stoneforges throughout our tests, but I also couldn't help but notice her shortcomings against Affinity's improved answers. Wear // Tear was a serious pain, slapping down a living weapon or a Sword in almost a third of Mystic games. Turn one Thoughtseize was also bad news for Stoneforge. Add in Whipflare on top of the rest (at its best, massacring Spirits and Kors; at its worst, just a Mystic-specific Exterminate!), and it's no wonder the lovely lady didn't always leave a big footprint on games. Was I ever sad to see her? Absolutely not, and you always got some value out of Mystic as long as you cast her. Was I always living the Stoneforge dream? No way: once Affinity tried interacting with her and her creations, their weaknesses became much clearer.
Given the identical 58% win rates in Mystic and no-Mystic games, we have to look elsewhere for a matchup decider. Enter Stony Silence, one of Affinity's most feared nemeses in Modern. No one should be surprised that its presence in Abzan had a huge impact on the overall win rate, and I was pleased we got some numbers around its performance. Here are some key statistics for the stone-cold white hoser.
- Games with 1+ Silence: 42% (21/50)
- Abzan win % with Silence: 67% (14/21)
- Abzan loss % with Silence: 33% (7/21)
- Games with 0 Silence: 58% (29/50)
- Abzan win % with no Silence: 52% (15/29)
- Abzan loss % with no Silence: 48% (14/29)
Turn three Batterskulls might not have had a big influence on Abzan's win percentage, but Silence definitely did. In rounds where the enchantment never showed up, Affinity almost crawled back to 50-50, winning 48% of games, about 8 percentage points above the 40-60 baseline prediction. Once Stony showed up, however, things got bad for the artifact swarm. Real bad. Abzan crushed Affinity in 67% of games where the enchantment hit the battlefield. Admittedly, these numbers don't include situations where an Affinity Thoughtseize took Silence out of the picture before it entered play (a regrettable note-taking oversight), but the overall picture still shows Silence's dominance relative to the less-decisive Stoneforge.
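To make the Mystic and Silence comparison concrete, here is a small sketch that recomputes Abzan's win rate with and without each card from the win/game counts listed above, then reports the gap in percentage points. The dictionary layout and variable names are illustrative only.

```python
from fractions import Fraction

# (Abzan wins, games played) taken from the lists above.
splits = {
    "Mystic": {"with": (18, 31), "without": (11, 19)},
    "Silence": {"with": (14, 21), "without": (15, 29)},
}

gaps = {}  # percentage-point gap between "with" and "without" win rates
for card, rows in splits.items():
    with_rate = Fraction(*rows["with"])
    without_rate = Fraction(*rows["without"])
    gaps[card] = float(with_rate - without_rate) * 100
    print(
        f"{card}: {float(with_rate):.0%} with, "
        f"{float(without_rate):.0%} without, "
        f"gap {gaps[card]:+.1f} points"
    )
```

On these counts the Mystic gap is a fraction of a point, while the Silence gap is roughly 15 points.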
Let's return to our reporting structure from the last article, this time adding in more Stony Silence numbers to see how they relate with and compare to the Mystic figures. We'll pay special attention to the vaunted turn two Mystic/turn three Batterskull pairing that is so often cited as a reason to keep her banned.
- Total Abzan wins: 29
- % of Abzan wins with Mystic: 62% (18/29)
- % of Abzan wins after a turn 2 Mystic: 28% (8/29)
- % of Abzan wins after a turn 3+ Mystic: 34% (10/29)
- % of Abzan wins with no Mystics: 38% (11/29)
- % of Abzan wins with Silence: 48% (14/29)
- % of Abzan wins with either Mystic or Silence: 90% (26/29)
- % of Abzan wins with both Mystic and Silence: 21% (6/29)
It's tempting to review these numbers and conclude Abzan's wins came from either Silence or Mystic because so many of its wins featured one or both cards. Both the data and the game notes show it wasn't that simple. As we saw above, Stoneforge had very little impact on Abzan's wins: the deck posted a 58% win rate with and without Mystic, and a 42% loss rate with and without Mystic. By contrast, Abzan went 67-33 in games where Silence appeared, as compared with 52-48 in games where the enchantment never showed up at all. More qualitatively, Etched Champion, Blinkmoth Nexus, and removal were all amply prepared to steal games from the tutored Germ. Silence, on the other hand, may not have directly addressed these win conditions (Champ and Inkmoth still work even if the rest of the Affinity board is frozen solid), but it unquestionably slammed the brakes on all the enablers that make these cards scary.
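The overlap between the two cards in Abzan's wins can be untangled with simple inclusion-exclusion. The sketch below uses the win counts from the list above; the helper and variable names are illustrative.

```python
def either(a, b, both):
    """Inclusion-exclusion: games featuring A or B (or both)."""
    return a + b - both


total_wins = 29
wins_with_mystic, wins_with_silence, wins_with_both = 18, 14, 6

wins_with_either = either(wins_with_mystic, wins_with_silence, wins_with_both)
wins_with_neither = total_wins - wins_with_either
mystic_only = wins_with_mystic - wins_with_both
silence_only = wins_with_silence - wins_with_both

# prints: 26 3 12 8
print(wins_with_either, wins_with_neither, mystic_only, silence_only)
```

Broken out this way, the wins split into 12 with Mystic alone, 8 with Silence alone, 6 with both, and 3 with neither, which is why the "either" figure alone says little about which card was doing the work.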
We see a similar tale in the Abzan losses.
- Total Abzan losses: 21
- % of Abzan losses with Mystic: 62% (13/21)
- % of Abzan losses after a turn 2 Mystic: 19% (4/21)
- % of Abzan losses after a turn 3+ Mystic: 43% (9/21)
- % of Abzan losses with no Mystics: 38% (8/21)
- % of Abzan losses with Silence: 33% (7/21)
- % of Abzan losses with either Mystic or Silence: 90% (19/21)
- % of Abzan losses with both Mystic and Silence: 10% (2/21)
Comparing win and loss records, a few themes emerge. First, Abzan does win more games than it loses after that turn two Stoneforge line, which suggests this play is strong but not game-defining. Second, Abzan wins and loses at about the same rate in games with either Silence or Stoneforge, but wins significantly more in games with Silence on its own. This suggests the white enchantment is itself a bigger factor in Abzan's wins than the white creature, although Mystic is surely playing a part. None of this is to diminish Mystic's impact on Games 2-3, where she clearly participated in numerous wins and sealed games alongside Stony Silence. That said, the numbers show her impact wasn't quite as definitive as many might think at first glance. Affinity's enhanced interaction did a number on poor Stoneforge.
Taken as a whole, Affinity was more than a match for Stoneforge Abzan. Although Mystic improved the midrange deck's odds, notably in Game 1, her net effect was negligible relative to the pre-Mystic numbers. Naturally, the same limitations apply this week as we saw in our Game 1 tests, so keep those in mind when extrapolating conclusions.
I talked about this in the comments of last week's article, not to mention via email and on message boards, but it's worth repeating here for posterity. Yes, Affinity has a lot of maindeck solutions to Batterskull that aggro exemplars like Burn, Zoo variants, or Merfolk lack. This might make Affinity seem like an odd choice for a matchup, but it is a necessary first step before additional tests can be conducted. As we've seen throughout the year and in pre-2015 metagames, Affinity is Modern's best aggro deck, a format pillar that sets the speed limit for many other strategies. If Mystic somehow warped the 50-50 Abzan vs. Affinity matchup, that would be sufficient evidence for me to oppose a Mystic unbanning. On the other hand, if Affinity could hold the line against Stoneforge Abzan, then we could safely move on to other decks knowing at least one aggro and midrange contest would be unchanged. Those later tests might prove the dangers of a Mystic unbanning, but at least we passed the first milestone. That's where we are now and I'm looking forward to seeing what comes next.
Based on the results of our Stoneforge Abzan vs. Affinity tests, I'm tentatively, cautiously, and hesitantly labeling the Mystic as a safe unban... for now. Key term: for now. Although the infamous Artificer didn't conquer the robots in spectacular fashion, her performance left me with some major concerns we need to address in future tests. Here are the big questions we still need to answer if we want to allay our fears and take one step closer to putting a community-wide stamp of approval on Mystic.
- How are decks without Blinkmoth Nexus and Etched Champion going to handle that turn three Batterskull? Affinity may be the benchmark aggro deck, but we don't want all other aggressive builds to be pushed out.
- How many non-Abzan decks are going to run Mystic? Even if she's safe in a few matchups, no one wants a format where 70% of strategies choose Stoneforge over something else.
- Are there scarier decks than Abzan which can run Mystic? Hybrid strategies are terrifying in formats like Modern, and the "Twinblade" thought experiment (an example of which showed up in Michael Majors' Stoneforge article last week) looks disgusting.
The Abzan vs. Affinity matchup gave many previews of how these three questions could play out, and I'm a little nervous to see their answers. More testing and discussion (hopefully emphasizing the former over the latter!) will be needed to address these worries, but I know the community is capable of grappling with these issues using the rigorous and evidence-based methods I've used in articles like today's piece.
What's Next For Stoneforge Mystic?
This is our last big article for 2015 and I'm happy to go out on a note like this. When I founded this site alongside Sean Ridgeley, I wanted to bring my love of data and data analysis to Modern, and this is the exact kind of article I'm excited to contribute to the broader Modern community. Hopefully you've enjoyed reading the series as much as I enjoyed making it. You can expect much more like this in 2016 and beyond. We'll be taking a publication break through the end of the holiday season, but you can be sure the Nexus team will return in January with more awesome Modern content! We might even have time to throw Stoneforge Mystic into the ring with her old friend from Zendikar, Goblin Guide...
Any overall thoughts on the test results or the process? Did you run any Stoneforge matches of your own? Interested in seeing any next steps, whether lists to test, synergies to try, or banned cards to explore? I'm excited to chat with you all in the comments and enjoy the rest of 2015!