It's finally time for another banlist testing report. Back in May, readers chose Green Sun's Zenith (GSZ from here on out) as the next banned card to test. I've spent the intervening months grinding games and recording results to test its effect on Modern. This week, I'll explain my test procedure, unveil the testing gauntlet, and describe the huge complication I encountered along the way.
It's been a while since my last banlist test, so let's review how they're done. My general procedure is described here: in short, I select a deck to run the test card as a four-of, then run it against a gauntlet of high-performing decks. Ideally, the test deck would be an updated version of the deck that got the card banned, but building that is rarely possible. I compare that result to a baseline control test, and use statistical analysis and the experience gained during testing to draw conclusions about the card in question.
Testing is done in paper, not online. I locate players who actually pilot the decks in the gauntlet, and after some practice to get a feel for the matchup, we play 50 recorded matches with the control (current) deck, and 50 with the test deck (which runs the banned card). Sideboarding and decklists are set in stone once the first recorded match begins. Because it's natural to improve at a matchup through practice, we alternate between the control and test decks each match.
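For readers who want to picture the alternation, here is a minimal sketch of how such a schedule could be laid out. The function name `build_schedule` and the choice to start with the control deck are assumptions made for illustration, not a record of how the matches were actually ordered.

```python
def build_schedule(matches_per_deck=50, first="control"):
    """Return an alternating list of deck labels, one entry per recorded match.

    Hypothetical helper: alternating keeps practice effects from piling up
    on whichever deck happens to be played later in the test.
    """
    second = "test" if first == "control" else "control"
    schedule = []
    for _ in range(matches_per_deck):
        schedule.append(first)   # one match with the first deck...
        schedule.append(second)  # ...followed by one with the other
    return schedule


schedule = build_schedule()
print(len(schedule), schedule[:4])
```

With the defaults this gives 100 slots, 50 per deck, and no deck is ever played twice in a row.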
Choosing the Deck
When the vote came in for GSZ, I was surprised and rather unprepared. I'd thrown it in for some variety, as previous votes indicated that Dig Through Time would be the winner hands down. I didn't know which deck to test GSZ with, and there really wasn't guidance available because it had been banned in the first wave in hopes of increasing diversity among creature decks. Asking around for ideas didn't help, as every single green-featuring deck was suggested. The only consistent advice was to run Dryad Arbor too, which gives Zenith the ability to act as a mana dork on turn 1.
Looking to Legacy showed GSZ in Elves, Maverick, and sometimes Infect. Maverick is a hybrid of Death and Taxes and Abzan Midrange, which suggested that the card could work in Hatebears or BGx Rock. After some exploratory testing, the answer was... maybe? Running GSZ alongside Gaddock Teeg generates a lot of tension, and in midrange, I was mostly trying to dig up Siege Rhino. GSZ was never bad in either deck, but I also didn't feel they fully utilized the spell. However, they could have, which is a point in favor of Wizards' diversity argument.
Infect actually seemed promising enough that I nearly picked it for the test. It already ran Dryad Arbor as Liliana of the Veil protection, and there are some decent green infectors beside Glistener Elf. However, it couldn't quite live up to my expectations.
Space requirements meant Blighted Agent, arguably the best infector, had to be cut. Blight Mamba was decent, but kept getting chump-blocked. Rancor helped on this front, but that plan subtracted from the raw power of a standalone threat. There was enough doubt about the deck that I didn't pull the trigger. However, I do believe that with more development, GSZ could be good in Infect.
Defaulting to Elves
In the end, I just went with Elves. It's a pretty obvious choice, and I had the advantage of knowing a lot of Elves players, so I could get help building the deck. I ended up regretting asking for help, as those players gave me very different answers individually, and whenever they overheard another player, it started arguments: which deck; which splashes; which cards to cut. The only things they agreed on were to run a Dryad Arbor, and not to run Bloodbraid Elf, which doesn't cascade off GSZ or Chord. In the end, I found a GB list and modified it until I was happy.
It may seem odd to have Devoted Druid without Vizier of Remedies, but the whole combo is a bit space-intensive and doesn't mesh with the tribal synergies. Also, two Druids and Ezuri already go infinite. A single Druid will provide enough mana for a second pump on its own, and if the game isn't won at that point, it's probably not winnable.
The test deck was basically the same, but I tweaked the numbers around to fit the extra non-creature spells in.
Building the Gauntlet
These decisions were easier to make. I was putting this together in mid-June, around GP Las Vegas, and the metagame was fairly defined. UW Control was rising, Humans was still on top, mono-green Tron was the deck of the GP, and Storm was the most popular combo deck. Ironworks was admittedly the combo in the spotlight, but I didn't know any Ironworks players, and so stuck with Storm.
The fifth deck was a judgement call. The top tier at the time was primarily aggro decks, but I wanted more variety, so I didn't pick Burn or Affinity. Hollow One was everywhere, but I didn't want to play against it 100 times; I can barely stand it once a tournament. It also wasn't clear if Hollow One was real or a deck of the moment. Mardu Pyromancer was another fine choice, and had I found a willing pilot, I would have chosen that. Unfortunately, the only willing pilot had to drop out shortly before testing. I therefore defaulted to Grixis Death's Shadow.
I let the pilots run their personal decks for the test, as there's no time-efficient method that I trust of averaging decks. MTGGoldfish's averaging system gives some weird results, and my pilots were playing close to norms anyway.
The Great Complication
I got started with Elves vs UW Control first, before anyone else was ready. This wasn't planned, but it was fortunate that it happened this way. Simply put, my UW pilot and I didn't have enough work to do in mid-June and did the testing instead. The testing went very smoothly. However, once it was done and I summed up the results, I realized that there was a problem. Here were the results:
- Control Win: 22/50, 44%
- Test Win: 41/50, 82%
That is an enormous deviation. I was shocked, and my opponent was in denial. This required further investigation.
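To put a rough number on that gap, here is a minimal sketch of a pooled two-proportion z-test run on the two win counts above. This is an illustration only: the choice of test is an assumption rather than the analysis actually used for these reports, and it treats every match as an independent trial, which alternating paper matches only approximate. The function name `two_proportion_z_test` is made up for the example.

```python
import math


def two_proportion_z_test(wins_a, n_a, wins_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Assumes each match is an independent win/loss trial and that the
    normal approximation is reasonable at these sample sizes.
    """
    p_a = wins_a / n_a
    p_b = wins_b / n_b
    # Win rate if both decks truly performed the same (null hypothesis).
    pooled = (wins_a + wins_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Control deck: 22 wins in 50 matches. Test deck (with GSZ): 41 wins in 50.
z, p_value = two_proportion_z_test(22, 50, 41, 50)
print(f"z = {z:.2f}, two-sided p = {p_value:.5f}")
```

Under those assumptions the z-score comes out at roughly 3.9, far beyond anything chance alone would normally produce across 50 matches per deck. That says the gap is real in the data, but it cannot say whether it comes from the card or from how the matchup was played.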
While this is the experimental result, and any result is still a result scientifically speaking, this result looked extremely problematic to me. I've had some big swings before, but never one as massive as 44%-82%, nor any that were this unexpected. Nothing that happened during testing indicated that it would be this skewed, and both of us had thought the matchup felt really close. However, with the data indicating otherwise, I began to suspect that the result was actually an outlier. My UW pilot was similarly sure that something was wrong, though for him, I think it was more indignation at losing to Elves so unequivocally.
I decided that further investigation was necessary. Fortunately, we discovered very quickly that mistakes had been made during our test. The question was how to correct them.
Realizing the Mistake
We ran into a number of pitfalls in this test. The main one: I didn't recognize the deviation between the control and test results until the very end. Had I been more aware during testing, we could have adjusted earlier.
The second problem happened on the UW end. The pilot was a Legacy Miracles specialist who took the Sensei's Divining Top ban worryingly hard and jumped on the Modern Miracles hope train the minute Jace, the Mind Sculptor was unbanned. As a result, he thinks like a Legacy player, and apparently that was the problem.
In his evaluation, he was playing our test games as if it were Legacy Elves vs Miracles. This makes logical sense, but Legacy Elves is a combo deck. It plays no lords or reasonably sized creatures, and is all about finding Craterhoof Behemoth and crashing in for 20. Modern Elves is beatdown to the bone. It can have combos in it, but the deck mostly revolves around Elvish Archdruid and Ezuri, Renegade Leader.
Given this difference and the effect that we realized GSZ had on Elves, he should have been playing like he does against Legacy Goblins. Goblins was one of the few bad matchups Miracles had, because it couldn't be locked out with Counterbalance and couldn't be exhausted by attrition: Terminus tucks creatures back into the library, where they can be found by Goblin Matron. Even worse was stacking Terminused goblins for Grenzo, Dungeon Warden to retrieve. At the time, Miracles won either through concession, Jace, or a single Entreat the Angels. The eventual solution was to become more aggressive by following Terminus with Monastery Mentor and winning before Goblins rebuilt.
GSZ was allowing Modern Elves to play a very similar game to Legacy Goblins. I would flood the board with dorks, and if I got Terminused, I had so many tutors that I could find whatever I needed to get going again. Also, GSZ recycles itself, so sneaking even one through created a long-term problem for UW. My pilot argued that the potential to just grind him to death was so high I should be prioritizing that strategy, sideboarding in Eternal Witness to rebuy the tutors he counters and the "missing" copies of Collected Company. I had wanted to run Witness, but there wasn't maindeck room and it's not really a sideboard card. Given the experience of this matchup, though, it made sense to find some room.
In the end, I decided to redo the testing with new decklists. The rest of the team wasn't ready to start, so redoing a test wouldn't hamper testing. It was hard to disagree that Eternal Witness was the right strategy for Elves post-board. Additionally, it was July by then, and M19 with Elvish Clancaller was being released. Normally my testing doesn't consider new releases, but if I was going to change the sideboard, I figured I might as well alter the maindeck, too. I also kept a closer watch on the data this time so nothing would surprise me again.
On the opposite side, the UW deck would greatly change its strategy post-board. Taking a page from Legacy, my partner would try to become an aggro-control deck instead of pure control. This would prove difficult since Monastery Mentor isn't really Modern playable, and I wasn't going to let him go too far off the rails just to beat Elves in a theoretical test, but Geist of Saint Traft, Baneslayer Angel, and Vendilion Clique were already inclusions in UW sideboards, so he got a few more. After some minor adjustments in his post-board counter suite, we were ready.
In the end, the decklists didn't end up that different from the originals. I tried a number of combinations, and while these felt best, I'm not convinced they're correct. I ended up cutting the Devoted Druids because it felt like I wouldn't ever need to combo off, but in retrospect I think that was a mistake. The way things played out, I realized that I was undervaluing the power of the tutors, and could have included more searchable synergy elements.
Elvish Visionary was a major piece of older Elves decks, and with GSZ, it would have been great for the grinding plan, but it's getting cut fairly universally. Maybe that would change in a GSZ world, but I can't say for certain. It also didn't end up mattering that much to the overall test.
My UW pilot kept trying to drastically change his sideboard and maindeck for the matchup, but I held firm. Most of his changes were the ones we had agreed upon, plus some he'd made to his real deck based on metagame shifts in late June.
No other changes occurred, and testing proceeded normally from this point. Or, at least, as normally as infrequent testing schedules allowed. Next week, I will reveal the data from those tests. See you then!