Having finished the appetizer, it's time for the main course: the data from my Hypergenesis test. This is the hard, quantitative data, and I've run statistics on it to determine the validity of the test. For the stats people out there, I run multiple significance tests, but I'll report the z-test here. The tests have never disagreed, and I believe more people will remember the z-test from high school than any of the others. Also, the Excel readout is cleaner.

Boilerplate Disclaimers
Contained here are the results of my experiment. It is entirely possible that repeating it would yield different results. This project models the effect the banned card would have had on the metagame as it stood when the experiment began. My results are not meant to be definitive, but rather to provide a starting point for discussions on whether the card should be unbanned.
Meaning of Significance
When I refer to statistical significance, I really mean probability: specifically, how likely it is that the differences between a set of results came from the trial rather than from normal variance. Statistical tests evaluate whether normal variance could plausibly explain the result, or whether the experiment caused a noticeable change. The outcome is expressed as a confidence level determined by the test's p-value, which is the probability of seeing a difference at least this large if there were really no effect. In other words, statistical testing tells researchers how confident they can be that their results came from the test and not from chance. The starting assumption is typically "no change," known as the null hypothesis (H₀).
If a test yields p > .10, the result is not significant, as we are less than 90% certain that it isn't variance. If p < .10, the result is significant at the 90% level. This is considered weakly significant and insufficiently conclusive by most academic standards; however, it can be acceptable when the n-value of the data set is low. While significant results are possible with as few as 30 entries, it takes huge disparities to produce them, so sometimes 90% confidence is all that is achievable.
p < .05 corresponds to the 95% confidence level, which is considered a significant result. It means we are 95% certain that the variation in the data is the result of the experiment. Therefore, this is the threshold for accepting that the experiment is valid and models the real-world effect of the treatment. Should p < .01, the result is significant at the 99% level, which is as close to certainty as this kind of testing gets. When looking at the results, check the p-value to see whether the data is significant.
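The tiers above can be written as a small lookup. This is a minimal sketch in Python; `significance_label` is a hypothetical helper for illustration, and it simply encodes the .10/.05/.01 cutoffs described in this section.

```python
def significance_label(p: float) -> str:
    """Map a p-value onto the confidence tiers used in this article (hypothetical helper)."""
    if p < 0.01:
        return "significant at the 99% level"
    if p < 0.05:
        return "significant at the 95% level"
    if p < 0.10:
        return "weakly significant at the 90% level"
    return "not significant"


# A few sample p-values, one per tier.
for p in (0.20, 0.08, 0.03, 0.004):
    print(f"p = {p}: {significance_label(p)}")
```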
Significance is highly dependent on the n-value of the data: in this case, how many matches were recorded. The lower the n, the less likely it is that a result will be significant, irrespective of the magnitude of the change. With an n of 30, a 10-percentage-point change will be much less significant than the same change with n=1000. This is why the individual matchup results frequently aren't significant, even when the overall result is very significant.
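To make the effect of n concrete, here is a minimal sketch of a pooled two-proportion z-test in Python. It assumes a one-sided test (asking only whether deck B wins more often than deck A); the helper name `two_proportion_z` and the 40% vs 50% win rates are hypothetical, chosen only to show the same 10-point gap at two sample sizes.

```python
from math import erfc, sqrt


def two_proportion_z(wins_a: int, n_a: int, wins_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test (hypothetical helper).

    Returns (z, p), where p is the one-sided p-value for
    "deck B's win rate is higher than deck A's".
    """
    rate_a, rate_b = wins_a / n_a, wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)  # shared win rate if the decks were equal
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    p = 0.5 * erfc(z / sqrt(2))  # upper tail of the standard normal
    return z, p


# Hypothetical: 40% vs 50% win rates, measured over 30 and then 1000 matches per deck.
for wins_a, wins_b, n in ((12, 15, 30), (400, 500, 1000)):
    z, p = two_proportion_z(wins_a, n, wins_b, n)
    print(f"n = {n}: z = {z:.2f}, one-sided p = {p:.2g}")
```

With 30 matches per deck, the gap lands around z = 0.78 (p ≈ 0.22, not significant); with 1000 matches per deck, the identical gap is far past the 99% threshold.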
Overall Matchup Data
As a reminder, and for those who've never seen one of these tests before, I played 500 total matches: 50 matches with each experiment deck against each gauntlet deck. I switched decks each match to level out any effect that skill gains had on the data. Familiarity and matchup knowledge naturally increase with games played, so if I had played all of one deck's matches before the other's, the second deck would have benefited from that experience and skewed the data. Alternating decks ensures that the improvement happens at the same rate for both decks. Play/draw also alternated each match, so both decks spent the same amount of time on the play and on the draw. The deck lists for both the gauntlet and test decks can be found here.
- Total Neoform Match Wins: 77 (30.8%)
- Total Hypergenesis Match Wins: 122 (48.8%)

The data shows that Hypergenesis won a statistically significantly higher percentage of matches than Neoform. The p-value is so tiny that it is functionally certain the difference is the result of the test and not natural variance. In other words, Hypergenesis beat Neoform by a large enough margin that I can be certain the result is valid.
Honestly, I fully expected Hypergenesis to do better than Neoform. It's been a pretty consistent refrain of mine for years at this point: Neoform is not and has never been a good deck. It's pretty busted when it works, but very easily disrupted. The test (as far as I was concerned) was not to see whether Hypergenesis is a better deck, but by how much. Players tend to grumble about this style of gameplay, but so long as it's inconsistent, it's no problem. Given that Hypergenesis did 18 percentage points better than Neoform, and in light of the cascade debacle, I think it's safe to conclude that Hypergenesis's data is instructive.
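For anyone who wants to check the arithmetic, this sketch runs a pooled two-proportion z-test on the overall match wins. It assumes a one-sided test, since the article doesn't say which variant the Excel readout used; either way, the p-value is tiny. The helper `two_proportion_z` is a hypothetical name for illustration.

```python
from math import erfc, sqrt


def two_proportion_z(wins_a: int, n_a: int, wins_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, one-sided p) for 'B beats A' (hypothetical helper)."""
    rate_a, rate_b = wins_a / n_a, wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    return z, 0.5 * erfc(z / sqrt(2))


# Overall match wins from the article: 250 matches per deck.
neo_wins, hyper_wins, n = 77, 122, 250
z, p = two_proportion_z(neo_wins, n, hyper_wins, n)

gap_points = 100 * (hyper_wins - neo_wins) / n  # absolute gap in percentage points
relative = hyper_wins / neo_wins - 1  # relative increase in match wins

print(f"gap: {gap_points:.1f} percentage points ({relative:.0%} more match wins)")
print(f"z = {z:.2f}, one-sided p = {p:.1e}")
```

It also shows that the 18-point gap means Hypergenesis won roughly 58% more matches than Neoform in relative terms.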
Additional Data
The hard data a test seeks isn't always the whole story. Often it's the surprises along the way that make a test. Sometimes I know what I want to look for; other things only appear in exploratory testing. This time, I intended to watch for turn 1 wins. Those are obviously the most problematic aspect of broken combo decks, and since both decks have turn 1 kills, knowing which is more likely to win on turn 1 is instructive for their place in the metagame. I intended to count both actual kills and opponent concessions as wins. Concessions were more relevant for Hypergenesis than Neoform, as Hypergenesis's wins were often unsolvable boards rather than kills.
Actually following through and recording that data was a problem. Because I... *cough*... (mumbles) didn't. No excuses: there were a number of sessions where I straight up forgot to record turn 1 wins. In fact, the only data I'm sure is accurate comes from the DnT testing, which is less than ideal, but better than nothing.

| Deck | Turn 1 Game Wins vs DnT | Turn 1 Wins as % of Game Wins vs DnT | Average Win Turn |
|---|---|---|---|
| Neoform | 15 | 37.5 | 2.0 |
| Hypergenesis | 20 | 30.0 | 1.7 |

Hypergenesis won more games on turn 1, but those wins represent a lower percentage of its total game wins. This makes sense, as Neoform is easily disrupted and has always relied on that fast kill. Plus, Hypergenesis won more games overall, so it would naturally have more turn 1 kills.
However, I was surprised that Hypergenesis's average win turn is higher than Neoform's. It's very clear from the data, but I wasn't expecting that result, and it challenges some of my assumptions about both decks. Hypergenesis's win distribution is bowl-shaped: turn 1 had the most wins, turn 3 the fewest, and turn 4 spiked to just below turn 1. Meanwhile, half of Neoform's game wins came on turn 2, it had no turn 3 wins, and it won a few on turn 4. Neither deck won after turn 4. This suggests that Neoform is more of a glass cannon than expected, but perhaps not as broken.
Finding Fizzles
The other thing I watched for was fizzling. It's known that Neoform has a fizzle rate, but I've never seen it quantified. It's also important to define fizzling: for me, it was any time a deck successfully started comboing but failed to complete a winning sequence with no input from the opponent. Getting something countered or removed mid-combo and failing is not a "fizzle"; that's just getting disrupted. Failing to finish the combo because of poor draws is. And this never happened to Hypergenesis. If it played a cascade spell, it cast Hypergenesis. That didn't always translate into a win, but that was down to the opponent's actions, not deck failure.
The same could not be said of Neoform. I recorded a fizzle rate of about 3%. These mostly happened because I drew too few Nourishing Shoals to draw the whole deck, or even to get more than two Griselbrand activations.
Neoform frequently lost afterward, though not always. Every so often, the loss came because the combo had needed Summoner's Pact to get going, and I couldn't pay for it on the next upkeep. The most memorable fizzle came when I got down to 7 cards in library and 8 life, but couldn't win because I had no blue mana floating, and all my Simian Spirit Guides and my last two Manamorphoses were among those 7 cards. I'd used Wild Cantor to get going, so there was no way to make mana and turn it blue for Laboratory Maniac without decking myself. My opponent untapped, Pathed Griselbrand, and won the game.
Deck By Deck
Given that the overall data is statistically significant, the deck-by-deck results may be surprising. Historically, the individual matchups haven't always yielded significant results, even when the overall result was. This is because of the lower number of data points: I only have 50 matches per deck to work with in each matchup rather than 250 for the overall results, so a larger difference is needed to reach significance. If you see something odd in the data, blame the low n.
The other thing to note is that, unlike in other tests, my play didn't change based on my opponent's deck. I always had to mulligan aggressively because there's little opportunity to sculpt a hand with either deck. I also always went for the combo at the first opportunity, particularly in game 1. They're glass-cannon combos with little or no interaction in game 1, so there's nothing to gain by waiting. In games 2 and 3, the only time I held off on comboing was against 4-Color, when I had Ricochet Trap or Veil of Summer in hand and wanted to be able to protect against counters. This meant this test went a lot faster than any previous one, and it was easier on me because I didn't have to think much.
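The same test can be run on each matchup's 50-match sample, using the match wins reported in the sections below. This is a sketch under one assumption: a one-sided pooled two-proportion z-test. That variant lines up with every significance level reported below; a two-sided test would leave Death and Taxes and Scourge Shadow just above .05. The helper name `two_proportion_z` is hypothetical.

```python
from math import erfc, sqrt


def two_proportion_z(wins_a: int, n_a: int, wins_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, one-sided p) for 'B beats A' (hypothetical helper)."""
    rate_a, rate_b = wins_a / n_a, wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    return z, 0.5 * erfc(z / sqrt(2))


# (matchup, Neoform match wins, Hypergenesis match wins), 50 matches per deck per matchup.
MATCHUPS = [
    ("Death and Taxes", 12, 21),
    ("4-Color Omnath", 17, 23),
    ("Scourge Shadow", 16, 25),
    ("Amulet Titan", 14, 27),
    ("Oops, All Spells", 18, 26),
]
N = 50

results = {}
for name, neo, hyper in MATCHUPS:
    z, p = two_proportion_z(neo, N, hyper, N)
    results[name] = p
    print(f"{name:<18} z = {z:.2f}  one-sided p = {p:.3f}")
```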
In the order I finished the matches:
Death and Taxes
Death and Taxes does not interact on turn 1 except via Path to Exile. However, each subsequent turn, the number of disruptive spells increases. Thalia is obviously rough for both test decks, but Archon of Emeria was game over for Hypergenesis in game 1. Both decks could subsequently be Strip Mined into submission. As a result, games didn't go very long, and neither combo deck won after turn 4.
- Total Neoform Match Wins: 12 (24%)
- Total Hypergenesis Match Wins: 21 (42%)

This result is statistically significant at p<.05. If the two decks were actually equal in this matchup, a gap this large would show up less than 5% of the time, so we can be confident in the result.
A big part of this result was that Leonin Arbiter was relevant disruption against Neoform but not against Hypergenesis. My opponent planned ahead with Burrenton Forge-Tenders against my Anger of the Gods. We discussed at length whether, against Neoform, it was better to Path Griselbrand immediately or wait for Laboratory Maniac. I wasn't running Pact of Negation maindeck; my opponent didn't know that, but did know it wasn't always played maindeck anymore. Waiting for Laboratory Maniac always wins the game against my list, but is risky otherwise.
4-Color Omnath
Something I didn't realize until this test is that Hypergenesis's text differs from Eureka's. Eureka says all permanents, but Hypergenesis excludes planeswalkers. This actually takes it back to Eureka's original functionality, but it's still intriguing that Wizards deliberately made that change right before planeswalkers came out.
- Total Neoform Match Wins: 17 (34%)
- Total Hypergenesis Match Wins: 23 (46%)

This result is not significant (p>.10). Thus, we can't conclude that Hypergenesis is statistically better than Neoform in this matchup.
This was the only matchup where either combo deck won later in the game, and the reason is that they could afford to. 4-Color Omnath wins rapidly, but not quickly: once it actually produces threats, it puts the game away in short order, but getting there may take a while. Thus, a single failure didn't spell the end for either deck. Fighting counter walls post-board was hard, but not impossible. Hypergenesis could overwhelm counter walls even late in the game thanks to Ricochet Trap, and I sometimes did. Occasionally, planeswalkers spared Omnath immediate death by bouncing a non-hasty Emrakul, but it was rarely enough.
However, longer games also gave Neoform more time to draw both Griselbrands, which could be fatal unless I managed to discard one, Noxious Revival it back, and immediately combo off. Teferi, Time Raveler was game over for Hypergenesis, but with only two copies in the 4-Color list, it didn't come up often. Access to Supreme Verdict after sideboarding helped 4-Color, but with only one copy, it didn't tip the scales much.
Scourge Shadow
As testing got going, my Scourge pilot got increasingly annoyed. Neoform does very poorly against discard, but Hypergenesis can overcome it thanks to cascade redundancy. Plus, both decks ran sets of Leyline of Sanctity in the sideboard. He frequently wished he was still on Grixis Death's Shadow to have counters as a backup. We tried running Blood Moon, and it was better than the cards we cut, but still wasn't very effective against either deck.
- Total Neoform Match Wins: 16 (32%)
- Total Hypergenesis Match Wins: 25 (50%)

This result is statistically significant at p<.05. Thus, we can conclude with confidence that Hypergenesis is statistically better than Neoform in this matchup.
The difference here is Neoform's game 1 weakness to Thoughtseize. Both decks improve a lot after sideboarding, while Scourge's options are limited. However, both need to cheese game 1 before the hate comes in for games 2 and 3, and that being so much easier for Hypergenesis was decisive. Take my Violent Outburst? I've got 11 more ways to cascade. Take my fatty? I've got tons more, and you can't kill any of them. Also, Chancellor of the Annex was especially good here thanks to Scourge's low land count. Mishra's Bauble is a workaround, but doesn't always line up correctly.
Amulet Titan
Amulet's game 1 against combo is a straight race, and unfortunately, it's slower than most combo decks. There was some hope after sideboarding because this deck ran 3 Mystical Dispute, but that's narrow against Neoform and pretty poor against Hypergenesis. The biggest hope against Hypergenesis was to keep Primeval Titan, Dryad of the Ilysian Grove, and five lands in hand so that Hypergenesis immediately turned on Valakut, the Molten Pinnacle.
- Total Neoform Match Wins: 14 (28%)
- Total Hypergenesis Match Wins: 27 (54%)

This result is strongly significant, p<.01. This is in fact the most strongly significant individual result.
Dispute did a lot of work against the fast Neoform starts, bumping up Amulet's win percentage. However, I also recorded more fizzles here than in other matchups, so I think this result is more attributable to variance than it appears. Not enough that it would have pushed it out of significance or changed the overall conclusion, but enough to alter the stats.
Oops, All Spells
Oops was a lot like Amulet in that game 1 was a straight-up race. The difference is that, under very rare circumstances, Oops can kill on turn 1 too, so it could keep pace with the combos. Casting Hypergenesis against a single-creature combo deck may seem like a liability, but the creatures in Oops lose to the Hypergenesis ones, so it usually couldn't attack for the win. And that's not counting the times Urabrask the Hidden was disruptive.
- Total Neoform Match Wins: 18 (36%)
- Total Hypergenesis Match Wins: 26 (52%)

This result is weakly significant at p<.10. It just missed the 95% interval, likely one positive result away. If this were an academic paper, this is what I'd be writing my Further Research section about.
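The "one positive result away" claim is easy to check with a one-sided pooled two-proportion z-test sketch. The flipped results below are hypothetical: one extra Hypergenesis win, or one fewer Neoform win, out of the 50 matches each deck played in this matchup. The helper name `two_proportion_z` is also hypothetical.

```python
from math import erfc, sqrt


def two_proportion_z(wins_a: int, n_a: int, wins_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, one-sided p) for 'B beats A' (hypothetical helper)."""
    rate_a, rate_b = wins_a / n_a, wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    return z, 0.5 * erfc(z / sqrt(2))


N = 50
_, p_actual = two_proportion_z(18, N, 26, N)    # recorded Oops results
_, p_one_more = two_proportion_z(18, N, 27, N)  # hypothetical: one extra Hypergenesis win
_, p_one_less = two_proportion_z(17, N, 26, N)  # hypothetical: one fewer Neoform win

print(f"recorded:               p = {p_actual:.3f}")
print(f"+1 Hypergenesis win:    p = {p_one_more:.3f}")
print(f"-1 Neoform win:         p = {p_one_less:.3f}")
```

Under this test, the recorded results sit just above .05, and either single flip drops the p-value below it.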
My combo decks didn't sideboard against Oops. Neither had any graveyard hate, and even if they had, why bother if we're racing? Oops swapped its useless maindeck Leylines for Thoughtseizes, but those are only effective against Neoform, so the general tone of the matchup never changed. I've since wondered how things would have been different if Oops had also run the Belcher option like many lists do now, but that just wasn't a thing in November.
Half the Story
And that's the hard data. However, it's not the full story of what I found during the test. And it also doesn't address the effect of banning Simian Spirit Guide. For all that and my conclusions, tune in next week.