The banned list is one of the hot Modern topics whenever a new set is released. Everyone is speculating about what, if anything, will get the ax or be unleashed upon the world. Speculation this time is focused on Infect and/or Dredge taking a hit and Bloodbraid Elf coming off the list. I'm not here to add to the speculation but instead to provide hard data on whether an unrelated card should come off.
I have been hinting at (and making excuses for) this article for weeks now. The time has finally come for me to publish my findings. Today I begin presenting the results of my investigation into the viability of unbanning Stoneforge Mystic. It will be quite long, so today I will present the setup and methodology, and next week I will actually present my data.
Longtime readers may remember that last December Sheridan tested Stoneforge Mystic in an Abzan list against Affinity. What he found was that the option for a turn-three Batterskull did not significantly impact the matchup game 1 and that sideboard cards played a much larger role in giving Abzan a 50% win rate against Affinity. For reference, here's the deck Sheridan used:
I don't doubt his results are accurate, but I don't think they really tell the story. Affinity has plenty of ways to get around Batterskull so I never expected Stoneforge to have much effect there. Affinity is an "unfair" deck (I really need to come up with a better term for that kind of deck) and can ignore most of what Abzan is doing. What I was always interested in was the effect it would have on fair decks, and Sheridan never got a chance to test those.
Additionally, Sheridan mentioned that he wanted to do more testing with other decks, so I started gathering data for him. Specifically I started testing a TwinBlade deck, which was Jeskai Twin with Stoneforge Mystic and a pair of Batterskulls. I was mostly done with data collection when Splinter Twin got banned, rendering it all moot.
What I can say about TwinBlade is that it was a nightmare to play against. I tested Burn and was working on Jund, and Stoneforge had a noticeable, trending toward significant, impact on both matchups. Burn traditionally had trouble against Twin because it couldn't win quickly enough to beat the combo when Twin had some interaction, while the consensus on Twin vs. Jund was that it was 50/50.
The addition of Mystic definitively pushed Twin over Burn. Repeatable lifegain is unsurprisingly hard for Burn to beat, and trying to do so left them open to being comboed out. Jund was also losing ground, though I was never certain if that was due to Mystic herself or if we were just playing the matchups poorly. Trying to defend against the combo and Batterskull spread Jund pretty thin, but that might have been player error.
In any case, the threat of that deck was going to lead me to recommend that Mystic never be unbanned. With Twin gone, I thought it worth looking into again.
Having decided to test out Stoneforge, and that I wanted to provide a definitive answer about its impact, I knew that meant I had to test a lot of decks. The problem was that there isn't as strong a consensus about Abzan's matchups other than Affinity. I decided to establish a baseline myself. This would involve playing a stock Abzan list against a test gauntlet and then running it again with the Stoneforge list. After some scouring, this is what I came up with:
Keep in mind that I began the process in late June, so the Grim Flayer and Collective Brutality technology didn't exist at the time. At this time I also decided that I wanted to use Sheridan's results in my final analysis since it was an already compiled data point. To make this work I would be using his list for the actual testing, which was not a problem since at the time Abzan hadn't dramatically evolved since December.
I wanted a mix of fair and less-fair decks for my gauntlet. I also wanted the results to be applicable to the metagame as it existed when I began. Complaints about linearity and aggro saturation were particularly high at the time, so I settled upon some fair and unfair linear aggro and the most successful truly unfair deck in Modern. The other consideration was that I wanted decks where Mystic could have an impact. I doubt very strongly that Tron cares about an artifact that's smaller than Wurmcoil Engine, and I wanted to improve the chances of results worth reporting.
I also made sure to go as stock as possible with these lists. I wanted the most representative results possible, and the less common builds could have skewed things. This was difficult for Burn and Infect, as everyone has their own take, and I ended up aggregating them to find the "average" deck. The rest seemed to be pretty close to consensus and were relatively easy. As a bonus, the decks had sideboards that were reasonable in a Mystic-fueled Modern.
If traditional Naya or 5-Color Zoo had any metagame presence at the time I would have gone with those as they're closer to what players think of when we talk about fast aggressive decks. The Burn decks that run Wild Nacatl may have a different result than this more traditional list, but the version above is still widely represented and there is considerable dissent about which is better.
Burn was a good choice for the red side of aggro, but as for the non-red I really had only one choice. I wanted top-tier decks that had proven themselves and when I started, there was only one deck that fit the criteria.
Honestly, even if Merfolk wasn't Tier 1 I would have tested it anyway. It's my deck and I want to know what effect Mystic would have on it. Testing with this deck also reminded me why I play UW Merfolk instead. I ended up missing Path to Exile and Echoing Truth, as well as my sideboard, and being underwhelmed by Harbinger. Still, I'm the only one playing that version, so I played the same deck everyone else does.
And then we have the most complained-about deck (that isn't Dredge).
Infect has the fastest kill in the format, but it's fairly vulnerable to Jund and Abzan's disruption, and like Affinity it can ignore Batterskull. This would really show how powerful a threat Batterskull is rather than just acting as a wall and lifegain source.
Ad Naus is the most successful unfair deck in Modern now. Grishoalbrand is more broken but also inconsistent, and rarely appears on our tiering charts. Scapeshift is a fair deck and Titan Breach really wasn't a deck when I started.
Despite what was said during the World Championship I think the matchup of Abzan vs. combo decks is pretty even. When Abzan goes Inquisition, Tarmogoyf, Liliana, it's hard to lose. If it doesn't get the right disruption or a decent clock it will lose. Testing would be focused on whether Mystic improves the clock enough to shift the matchup.
I was proceeding through testing all these decks when I began to notice a trend in the data. This trend was interesting enough that I wanted to confirm the result, despite the exhaustion all this Magic was causing. However, with PPTQ season getting underway I didn't think that was possible. Then I won the first one and suddenly I didn't need to test for real anymore. With my ticket to the RPTQ punched (congratulations to Jordan for doing the same) I had the time to actually test more decks. To confirm the data trend I would need another fair deck and a less fair one. Thus I added two more decks to my gauntlet.
Jund is the poster child for fair decks and I would have gone with it if I could have found a Jund player to test with. I didn't, but a Jeskai player volunteered, and Jeskai will do.
Dredge seemed like a good candidate for the unfair deck. It was the new hotness at the time and while I didn't think turn-three Batterskull would be good, that was actually in line with the phenomenon I wanted to test. The problem was that after the practice matches it was clear that Abzan's win percentage game one was too low and the sideboard matches too swingy for me to consider the data valid. Abandoning that, I looked at the current tiered unfair decks and went with Death's Shadow.
Death's Shadow presents itself as another Zoo deck but with an unfair fast win, coupled with consistency, that pushes aggro decks out of fair territory. On reflection, picking a deck that straddles fair and unfair is the best indication of what Stoneforge will actually do to both. Tracking the fair Zoo style wins versus the Become Immense wins proved enlightening.
Adding all these decks to the gauntlet and finding experienced pilots to work with added several weeks to the project. For anyone looking to perform a similar test, take care to limit yourself and keep your curiosity in check or project creep like this will ruin you. If I wasn't butting up against the next banned announcement I might still be collecting data. Which brings us to my actual methodology.
I would be playing the Abzan decks. My project, so I would do the grunt work. I didn't want to switch off piloting decks because I wanted to model how these matchups would actually play out in "real Magic," where players know their decks and know the matchups. This required finding experienced pilots who were as crazy as I am, who specialized in the decks I wanted to test, and who were willing to use these stock lists (though I negotiated with a few of them over what actually went into the lists).
This was about as hard as you'd think, especially when I explained the scale of the project. In the end I found online players for Burn, Merfolk, Death's Shadow, and Ad Nauseam. The previously codenamed "Elliot" agreed to pilot Infect and then Jeskai in paper after some begging, er, persuasion. As I'm writing this my online partners have not told me how they want to be credited. If I get responses, I will add them in.
I ambitiously set the target of 100 matches per deck, 50 with the "normal" configuration and 50 with Stoneforge. This actually isn't a large enough n value for a true statistical study, but it would be reasonably representative. Play/draw was alternated, with the initial decision based on a coin flip, ensuring 25 games apiece on the play for each deck. Sideboarding was included, and will be included in the discussion of the data next week. The testing was conducted over a number of sessions due to scheduling concerns and MODO crashes. "Elliot" testing was done in person; the rest was online.
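For a rough sense of why 50 matches per configuration is on the small side, here is a minimal Python sketch. It is not part of the testing itself: the function names are my own, the win counts are made-up placeholders rather than results, and it assumes the usual 95% confidence level (z = 1.96) with each match treated as an independent win/loss.

```python
import math

Z_95 = 1.96  # assumed confidence level: 95%, two-sided


def wilson_interval(wins, n, z=Z_95):
    """Wilson score interval for a win rate of wins/n."""
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def two_proportion_z(wins_a, n_a, wins_b, n_b):
    """Pooled z statistic for the difference between two win rates."""
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (wins_b / n_b - wins_a / n_a) / se


# Hypothetical numbers only: a 25-25 baseline over 50 matches.
low, high = wilson_interval(25, 50)
print(f"25/50 is consistent with a true win rate of {low:.1%} to {high:.1%}")

# Hypothetical comparison: baseline 25/50 vs. a Stoneforge list at 30/50 or 35/50.
for stoneforge_wins in (30, 35):
    z = two_proportion_z(25, 50, stoneforge_wins, 50)
    verdict = "significant" if abs(z) > Z_95 else "not significant"
    print(f"25/50 vs {stoneforge_wins}/50: z = {z:.2f} ({verdict} at 95%)")
```

Under these assumptions, a ten-point swing in win rate between two 50-match samples falls short of the conventional significance threshold, while a twenty-point swing just clears it. That is the sense in which the sample is representative but not large enough for a true statistical study.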
During the Ad Naus sessions we made special consideration for how Lightning Storm doesn't really work online. We both knew what was supposed to happen, so if that wasn't reflected by the interface we discussed what would have actually happened in paper and recorded that result. Misclicks were also accounted for, with some matches thrown out and repeated.
Prior to the actual test games a minimum of ten practice games were played against each Abzan deck so that we could get our eyes in and get a feel for the matchup to better mimic Stoneforge actually being legal. It also helped us to get the "correct" sideboarding strategy worked out. Once that was decided upon it was not changed for the duration of testing, even when we later concluded in several cases that there was a better strategy.
Next Stop: Enlightenment
Let me begin concluding by saying that this was not a fun exercise, but it was educational and I am a better player for the effort. Magic should be fun, and this grinding was exhausting and enraging (my distaste for MODO approached a burning hatred many times). It will be a while before I try this again, and probably longer before I find anyone willing to join in my madness.
Next week I will present the sideboarding strategies and win percentages, and explain what it all means. See you then!
Read about David's conclusions in his subsequent article, Testing Stoneforge Mystic in Modern: Part Two.