Once again, it is time to start rolling out my results from the latest Banlist Test. As usual, I will start with the experimental setup and the unquantifiable results. I know that what most readers care about are the hard numbers, but I'm not done gathering the data yet. That will be coming sometime in September—probably. I'm done with Storm and about halfway through the UW testing. Completion date will depend on how the PPTQ season goes, as I'm splitting my testing time between that and Preordain.
For those who are new to the series, I take a card from the Modern Banned List, put it back into the deck that got it banned (or as close as possible), and see how it fares in the current metagame. My goal is to bring hard data and scientific inquiry into the discussion instead of more opinion and baseless speculation. Therefore, I play a lot of matches with the deck (normally 250 with the banned card, 250 without it) to build a sufficient data set for analysis. I take the test data, compare it to the control data, and from that I hypothesize about the safety of the test card. I laid all this out in more detail in a previous piece. The card that readers voted for me to test this time was Preordain.
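For readers curious what "compare it to the control data" can look like in practice, here is a minimal sketch of one common approach, a two-proportion z-test on match win rates. This is an assumption for illustration, not necessarily the analysis used in this series; the function name `two_proportion_z` and its parameters are hypothetical.

```python
import math


def two_proportion_z(wins_test, n_test, wins_control, n_control):
    """Two-sided z-test for a difference in match win rates.

    Hypothetical helper: compares the test deck (with the banned card)
    against the control deck (without it). Returns (z, p_value).
    """
    p_test = wins_test / n_test
    p_control = wins_control / n_control
    # Pooled win rate under the null hypothesis of "no difference".
    pooled = (wins_test + wins_control) / (n_test + n_control)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_control))
    if se == 0:
        raise ValueError("win rate is 0% or 100% in both samples; z is undefined")
    z = (p_test - p_control) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A small p-value would suggest the win-rate gap between test and control is unlikely to be noise. With small per-matchup samples, only large gaps will register.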
This test was very different from the last several. With both Stoneforge Mystic and Jace, the Mind Sculptor, I just tested a single deck against the gauntlet. While this often took a while, the testing was fairly straightforward. I took the deck, learned it well enough to be passable, and ran the gauntlet. The decks I was using certainly helped. Yes, they were midrange decks, but their gameplan was clear and their decision trees relatively comprehensible.
This time, for reasons explained here, I tested Gifts Storm and UW Control. This complicated things. To get a decent data set for both, I'd have to play a lot more games; doing the usual 500 matches would yield only half the data per deck. I made this harder for myself by playing hard decks against hard matchups. These decks require a lot of experience to navigate, and Storm is very prone to beating itself under pressure. I'm not claiming to have played these decks perfectly, but I was at least average with Storm and good enough with UW that I took an updated version to a PPTQ. Thus, if you see issues with the results or my data, consider that I am just one man with a few volunteers. In an enormous undertaking like this, exhaustion and deck difficulty are bound to play a part.
As always, I would be piloting the test decks against (semi-) willing opponents wielding decks that they are reasonably good with. We'd play match after match at a stretch, with me alternating between the test and control deck to even out the experience and skills I was developing during the tests. Prior to data collection, we always played at least a few practice games to get a feel for things and determine the correct sideboard plans. Previously, my team has used a variety of methods to actually play the games, including MTGO. We did not use MTGO at all this time. This prevented us from losing matches to misclicks and ruining the data set. It was also significantly cheaper. I don't own most of the digital pieces for Storm, couldn't get them, and already dislike MTGO. Playing paper in person or over Skype was much easier. And free. I like free.
As I mentioned above, my data set is normally 500 matches. That is too small a set for two decks, but it was logistically implausible to just double it. It takes months to get all the data together as is, and doubling would push completion into October at the earliest. I'm just not going to put that kind of time into this project. Therefore, this data set is 640 total matches: 160 per list (counting the test and control versions of each deck separately) and 32 per matchup. Why 640? I didn't have a set target when I started, but I knew that 150 per list was the bare minimum. Of course, I was testing both decks simultaneously to save time, and I was burning out. I decided I'd had enough at 27 matches per matchup, but that was an ugly-looking number and felt like too big a cop-out, so I kept going to 30. And then did two more so we'd get nicer aggregate numbers.
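The arithmetic behind those totals can be checked with a quick sketch. It assumes the 160 figure counts each deck's test and control lists separately (four lists in all) against a five-deck gauntlet; the function name `match_counts` is made up for illustration.

```python
def match_counts(n_decks, n_versions, n_opponents, per_matchup):
    """Return (matches per list, total matches) for a gauntlet test.

    n_decks     -- archetypes under test (here: Storm and UW Control)
    n_versions  -- lists per archetype (with and without the test card)
    n_opponents -- decks in the gauntlet
    per_matchup -- matches each list plays against each gauntlet deck
    """
    per_list = n_opponents * per_matchup
    total = n_decks * n_versions * per_list
    return per_list, total


# Stopping at 27, 30, or 32 matches per matchup:
for per_matchup in (27, 30, 32):
    per_list, total = match_counts(2, 2, 5, per_matchup)
    print(per_matchup, per_list, total)
```

Under these assumptions, 30 matches per matchup gives the 150-per-list minimum, and 32 gives 160 per list and 640 overall.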
The Test Decks
All of the decks were chosen in mid-May. They are as close to "average" lists as my team could find. Several members were irritated, as they wanted to try out their personal tech during testing, but the whole point is to see how these cards work against a representative metagame. Thus we used the most average build of every deck possible.
Choosing the test decks was harder than actually fitting in Preordain. In previous tests, I actually had to build decks around the test card. Stoneforge Mystic requires six slots minimum, and Jace, the Mind Sculptor benefits from and rewards decks that play lots of very cheap spells. This required actual deckbuilding. This time I'm testing a cantrip in decks that already play cantrips, so I just replaced the weaker one with Preordain. There was some consideration of adding more, like a Legacy deck would do, but we couldn't agree on how to do that and the clock was ticking. I went with the quick and easy option.
The core combo of Storm is very well established, and it's just as powerful and fragile now as it was in 2013. Swapping Pyromancer Ascension for Baral and the banned Gitaxian Probe for Gifts Ungiven are the only real innovations. I saw some lists running Merchant Scroll, but that was very much a fringe choice and didn't make the cut.
The most common sideboards at the time were Gifts packages. I'm not sure they're actually better than more focused boards, particularly because there are no Blood Moons, but this was what saw the most play at the time. I don't know that it made much of a difference. My experience showed that sideboarding was a very delicate thing and I did it at the barest minimum possible to preserve the combo. I doubt that the exact composition of my sideboard would have changed that plan. There was some consideration for the transformative Madcap Experiment/Platinum Emperion combo, but everyone I asked said it was worse than extra Empty the Warrens.
There's a lot more variation in UW Control, and it took a while to put together a "stock" list. Sphinx's Revelation and Ancestral Vision lost out to Spreading Seas and Condemn by a very small margin.
The Spell Queller plan was popular at the time, though it has gone away recently. I didn't really like it, but it also didn't have much opportunity to shine.
As usual, I chose five decks from all corners of the metagame, giving preference to Tier 1 decks. Again, the point is to test the power of these boosted decks; it makes the most sense to test against the best. This was both easier and harder than before. Every type of deck was represented in Tier 1 in May, but the control deck was UW Control. Which I was already testing by virtue of it being the... erm, control deck.
I needed to use the same gauntlet for both decks so the results were comparable. As such I fudged it to use a Jeskai list. This is not unusual now, with Jeskai ticking up in popularity, but it was unheard of at the time. I'm also fudging a bit by using Counters Company as my combo deck. It's far more combo than Abzan Company was, but it's still not a true combo deck.
[su_spoiler title="* Note on Burn" style="fancy"]Naya Burn appeared to have been pushed out of the mainstream, so we used a Boros list.[/su_spoiler]
The initial results are actually very disappointing. At this point I've played over 500 matches (~140 to go!) and I don't have a strong opinion on Preordain. This shouldn't be surprising: it's a cantrip. Cantrips don't have that much impact on a game (unless you play a lot of them), hence the name (it's a D&D reference). They're like the oil in an engine. You notice when they're not there, but otherwise you just don't see the impact. Upgrading your cantrip is like buying higher quality oil. Yes, your engine will run smoother and your mechanic may see some improvement, but you are unlikely to actually notice any difference in normal operation.
In a way, that is my answer. It didn't really feel special to play with Preordain. It was a definite improvement over the replaced cantrip, but not enough for me to feel strongly about the card. Its value swung wildly based on the situation and stage of the game, but so does that of any cantrip. Part of that may be how I played it, and it is very possible that decks would be built very differently with Preordain in the format. But players may also find that the lengths you have to go to just aren't worthwhile, like putting high-octane gas and racing lubricant in a Civic.
I barely noticed any difference between Preordain and Sleight of Hand. This is probably because most of the time Preordain was Sleight of Hand. I will include the actual numbers when I circle back to this, but most of the time I kept one card and bottomed the other. You do get extra value from having options, but I didn't utilize them very often. It is entirely possible that I was wrong about that, but it certainly didn't seem that way to me or my team.
Preordain was swept up in the post-Pro Tour Philadelphia 2011 crackdown on combo. At the time it made sense—not all the combo decks used fast mana but they all used cantrips. Subsequent bannings have further weakened combo. Based on what I experienced, those later bannings made cantrips worse in combo. Games when I had a cost-reducer into Gifts Ungiven were far better than stringing cantrips together. It just didn't feel important to Storm.
Of course, it really doesn't feel special in UW either. It is unequivocally better than Serum Visions after turn four, but on turns one and two, it's worse. In the mid- to late-game, you're looking for specific answers and Preordain delivers them right away instead of setting you up for next turn. However, early on you're just looking to get deeper into your deck, and Visions will always show you three cards. You get a random card that you won't play anyway and then set up for the next two turns. It's normally correct to cast Visions at the first opportunity as a result. Preordain cannot do that, so you don't play it early, saving it to find specific cards when you need them. I suspect that I should have played both, but hindsight is 20/20. I believe that I'm doing better as the game goes long but losing to mana screw early more often. We'll see what happens when the data comes in.
So that's it for now. I'll be back with the data sometime relatively soon. Next week, we'll be seeing if anything interesting happens to the banlist on Monday.