Ah, the joy of figuring out a connection between two things. The feeling of achievement when you realize that users who progress past level 10 on day 0 retain better in the future. It makes you want to skip around and run to the game director with an optimization suggestion. But wait… what is that anvil doing in the clear blue sky? Why is it approaching you?
Let me tell you what just hit you.
Take the example above: you saw a correlation between reaching level 10 on day 0 and retention, and concluded that one drives the other. However, the fact that a user progressed past level 10 may simply mean that they like your game and would have retained anyway.
Or say you discover that users who receive more rewards on day 0 show higher retention later on. The rewards might be helping, but it is also likely that these users get more rewards because they are more engaged with the game on day 0, and naturally, engaged users retain better. So you might want to hold off on giving out rewards left and right to every new user after you’ve discovered this connection.
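To see how convincing this effect can look even when rewards do nothing at all, here is a toy simulation in Python. Everything in it is a hypothetical assumption, not data from a real game: a hidden "engagement" score (invented for illustration) drives both how many crates a player opens on day 0 and whether they come back on day 1, while coins have zero causal effect on retention. Splitting players at the median coin total still makes the high-reward group look much better at retaining.

```python
import random
from statistics import median


def simulate_day0_players(n_players, seed=0):
    """Toy world: a hidden engagement score drives both day-0 rewards and
    day-1 retention. Coins have NO causal effect on retention here."""
    rng = random.Random(seed)
    players = []
    for _ in range(n_players):
        engagement = rng.random()  # hypothetical hidden trait in [0, 1)
        # More engaged players finish more levels, so they open more crates.
        crates_opened = 1 + int(engagement * 20)
        coins = sum(rng.choice((5, 10, 15, 20)) for _ in range(crates_opened))
        # Retention depends only on engagement, never on coins.
        retained = rng.random() < 0.2 + 0.6 * engagement
        players.append({"coins": coins, "retained": retained})
    return players


def retention_rate(group):
    """Share of players in the group who came back on day 1."""
    return sum(p["retained"] for p in group) / len(group) if group else 0.0


players = simulate_day0_players(10_000)
median_coins = median(p["coins"] for p in players)
high_reward = [p for p in players if p["coins"] > median_coins]
low_reward = [p for p in players if p["coins"] <= median_coins]

print(f"high-reward retention: {retention_rate(high_reward):.2f}")
print(f"low-reward retention:  {retention_rate(low_reward):.2f}")
```

The gap between the two printed rates comes entirely from engagement, which is exactly the trap described above.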
Some call it the “correlation implies causation” fallacy; yours truly prefers “self-selection effect.” I like to say that the self-selection effect is a heartless entity that haunts analysts in pretty much every task they take on.
Now, Why Is It So Daunting?
1. It can completely devalue your analytical findings.
Remember that anvil I told you about? That’s why it’s important to think about this effect early on: not just before you draw a conclusion, but before you even start the analysis.
2. It can be very hard to account for.
Let’s take the level 10 example from the beginning. It is possible that progressing past level 10 simply means that a user enjoys the game a lot, in which case you will be no better off if you make everyone reach level 10 on day 0. BUT it might also be that the levels before 10 are less fun to play, the earlier rewards are less appealing, and the whole game only starts to “blossom” after level 10.
This is extremely hard to figure out by analysis alone, without testing.
3. It can make you doubt everything you do.
This one is more of a psychological aspect, which I usually don’t cover, but this time I will. While it is crucial to keep self-selection in mind, it can become daunting and make you question your every move. Should I even look into the correlation between progression and retention? Is this finding useless? Could it be that there is nothing to optimize? Can we ever be sure? All of these are important questions that deserve careful thought, but, fellow analysts, don’t let the self-selection effect stop you from at least attempting to get to the bottom of whatever you’re looking into.
What Can You Do About It?
1. Squeaky clean baseline.
Let’s go back to the rewards example. Imagine that your game offers a reward crate after each level, and each crate randomly awards 5, 10, 15, or 20 coins. You would like to find out whether being lucky with these crates on day 0 has an impact on future retention and conversion.
You may be tempted to segment the users by the amount of coins they received on day 0:
Segment 1: 0-10 coins on day 0
Segment 2: 11-50 coins on day 0
Segment 3: 51+ coins on day 0
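The naive segmentation above could look something like this sketch. The field names `coins_day0` and `retained_d1` are hypothetical; substitute whatever your event data actually uses, and the same pattern works for conversion. Keep in mind that this is exactly the kind of comparison that is prone to self-selection.

```python
from collections import defaultdict


def coin_segment(coins):
    """Map a player's day-0 coin total to one of the three segments."""
    if coins <= 10:
        return "Segment 1 (0-10)"
    if coins <= 50:
        return "Segment 2 (11-50)"
    return "Segment 3 (51+)"


def retention_by_segment(players):
    """Day-1 retention per coin segment.

    `players` is an iterable of dicts with the hypothetical keys
    'coins_day0' (int) and 'retained_d1' (bool).
    """
    totals = defaultdict(int)
    retained = defaultdict(int)
    for p in players:
        segment = coin_segment(p["coins_day0"])
        totals[segment] += 1
        retained[segment] += int(p["retained_d1"])
    return {seg: retained[seg] / totals[seg] for seg in sorted(totals)}


# Made-up sample rows, for illustration only.
sample = [
    {"coins_day0": 5, "retained_d1": False},
    {"coins_day0": 10, "retained_d1": True},
    {"coins_day0": 40, "retained_d1": True},
    {"coins_day0": 120, "retained_d1": True},
]
print(retention_by_segment(sample))
```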
It is likely that segment 3 will show higher retention and conversion rates. However, due to the heartless ghost effect we’ve been discussing, this finding on its own does not mean that increasing rewards on day 0 will help retain new users. Segment 3 players are likely to enjoy the game more and play more actively on day 0, thus getting more rewards.
There is a trick you can use to account for self-selection here.
Consider restricting your sample to users who opened the same number of crates and reached the same level. For example, analyse only the users who reached level 5 and opened 10 crates on day 0. This sample is likely to be fairly homogeneous in terms of game enjoyment and skill, and since everyone in it opened the same number of crates, differences in coins earned come down mostly to crate luck rather than engagement. So you can segment these players by rewards earned and look into their future behavior with much more confidence.
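Here is a minimal sketch of the restricted comparison, again with hypothetical field names (`level_day0`, `crates_day0`, `coins_day0`, `retained_d1`). Because 10 crates always yield at least 50 coins, the original 0-10 / 11-50 / 51+ segments do not fit inside this cohort, so the sketch splits it at the median coin total into "lucky" and "unlucky" players instead; that split is an assumption on my part, not the only sensible choice.

```python
from statistics import median


def restrict_to_cohort(players, level, crates):
    """Keep only players whose highest day-0 level is exactly `level` and who
    opened exactly `crates` crates (hypothetical keys 'level_day0', 'crates_day0')."""
    return [
        p for p in players
        if p["level_day0"] == level and p["crates_day0"] == crates
    ]


def rate(group):
    """Day-1 retention of a group, or NaN if the group is empty."""
    return sum(p["retained_d1"] for p in group) / len(group) if group else float("nan")


def lucky_vs_unlucky(cohort):
    """Split a homogeneous cohort at its median coin total and compare retention."""
    if not cohort:
        raise ValueError("cohort is empty; loosen the level/crate filter")
    cut = median(p["coins_day0"] for p in cohort)
    lucky = [p for p in cohort if p["coins_day0"] > cut]
    unlucky = [p for p in cohort if p["coins_day0"] <= cut]
    return {
        "lucky_retention": rate(lucky),
        "unlucky_retention": rate(unlucky),
        "n_lucky": len(lucky),
        "n_unlucky": len(unlucky),
    }


# Made-up rows; the level-7 player falls outside the cohort.
players = [
    {"level_day0": 5, "crates_day0": 10, "coins_day0": 60, "retained_d1": False},
    {"level_day0": 5, "crates_day0": 10, "coins_day0": 80, "retained_d1": True},
    {"level_day0": 5, "crates_day0": 10, "coins_day0": 120, "retained_d1": True},
    {"level_day0": 5, "crates_day0": 10, "coins_day0": 150, "retained_d1": True},
    {"level_day0": 7, "crates_day0": 10, "coins_day0": 200, "retained_d1": True},
]
cohort = restrict_to_cohort(players, level=5, crates=10)
print(lucky_vs_unlucky(cohort))
```

Check the group sizes before reading anything into the rates: a narrow cohort can get small very quickly.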
2. AB testing.
I know, it sounds really obvious! It isn’t always available, though (surprise, surprise!). But if your gut tells you that reaching level 10 might be something that motivates users to progress, and you have a reliable AB testing framework set up, go for it: test it and see if your theory holds. If there is no difference, the self-selection effect got you. If retention improves when you make reaching level 10 easy for everyone on day 0, a star for you and your intuition.
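Once the test has run, you still need to judge whether a retention difference is real or just noise. One common option (not the only one) is a two-sided two-proportion z-test, sketched below with only the standard library. It assumes users were randomly assigned to control and treatment and that the sample size was fixed in advance; the example numbers are made up.

```python
import math


def two_proportion_z_test(retained_a, n_a, retained_b, n_b):
    """Two-sided z-test for a difference in retention rates.

    a = control group, b = treatment group (e.g. level 10 made easier).
    Returns (lift, z, p_value), where lift = rate_b - rate_a.
    """
    rate_a = retained_a / n_a
    rate_b = retained_b / n_b
    pooled = (retained_a + retained_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both groups retained 0% or 100%: no observable difference.
        return 0.0, 0.0, 1.0
    z = (rate_b - rate_a) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_b - rate_a, z, p_value


lift, z, p = two_proportion_z_test(400, 1000, 450, 1000)
print(f"lift={lift:.3f}, z={z:.2f}, p={p:.3f}")
```

With 400 of 1,000 control users and 450 of 1,000 treatment users retained, this gives a lift of 0.05, a z of about 2.26, and a p-value of about 0.024.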
3. Approaching the problem from a different side.
Back to the level 10 example: you probably set out to analyse it because you wanted to optimize retention and improve the player experience on day 0. If you cannot account for self-selection and have no way to AB test your theory, think of other ways to analyse the same thing. In this case, instead of focusing on what levels users reach on day 0, it might be useful to build a retention graph by minutes spent in the game on day 0 and use it to look for optimization opportunities (here is a good article on the topic).
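A retention graph by day-0 playtime could be built along these lines. The field names, the 5-minute buckets, and the 60-minute cap are all assumptions for illustration. This curve is still observational, so treat it as a map of where to look (for example, where retention stops rising with playtime) rather than as proof of cause and effect.

```python
from collections import defaultdict


def retention_by_minutes(players, bucket_minutes=5, max_minutes=60):
    """Day-1 retention per bucket of day-0 playtime.

    players: dicts with the hypothetical keys 'minutes_day0' and 'retained_d1'.
    Buckets are half-open, e.g. "0-5" means [0, 5) minutes; playtime at or
    above `max_minutes` goes into one final open-ended bucket.
    """
    if max_minutes % bucket_minutes != 0:
        raise ValueError("max_minutes must be a multiple of bucket_minutes")
    totals = defaultdict(int)
    retained = defaultdict(int)
    for p in players:
        minutes = min(p["minutes_day0"], max_minutes)
        start = int(minutes // bucket_minutes) * bucket_minutes
        totals[start] += 1
        retained[start] += int(p["retained_d1"])
    curve = []
    for start in sorted(totals):
        label = f"{start}+" if start >= max_minutes else f"{start}-{start + bucket_minutes}"
        curve.append((label, totals[start], retained[start] / totals[start]))
    return curve


# Made-up rows, for illustration only.
sample = [
    {"minutes_day0": 2, "retained_d1": False},
    {"minutes_day0": 4, "retained_d1": True},
    {"minutes_day0": 7, "retained_d1": True},
    {"minutes_day0": 75, "retained_d1": True},
    {"minutes_day0": 60, "retained_d1": False},
]
for label, n, rate in retention_by_minutes(sample):
    print(f"{label:>6} min  n={n}  D1 retention={rate:.0%}")
```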
You can also check whether there are any points at which users tend to drop off during the tutorial. How does the difficulty increase as users progress? Explore your options, and don’t fixate on the one with a self-selection effect you cannot account for!
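For the tutorial question, a simple funnel view is often enough to spot drop-off points. This sketch assumes you already have the number of users who reached each tutorial step, in order; the step names and counts in the usage example are made up.

```python
def tutorial_dropoff(step_counts):
    """Step-to-step conversion through an ordered tutorial funnel.

    step_counts: list of (step_name, users_who_reached_step), in tutorial order.
    Returns (transitions, worst), where transitions is a list of
    (label, conversion) and worst is the transition with the lowest conversion.
    """
    transitions = []
    for (prev_name, prev_users), (name, users) in zip(step_counts, step_counts[1:]):
        conversion = users / prev_users if prev_users else 0.0
        transitions.append((f"{prev_name} -> {name}", conversion))
    worst = min(transitions, key=lambda t: t[1]) if transitions else None
    return transitions, worst


# Hypothetical tutorial steps and counts, for illustration only.
funnel = [
    ("tutorial_start", 1000),
    ("first_move", 950),
    ("first_fight", 600),
    ("first_shop_visit", 570),
]
transitions, worst = tutorial_dropoff(funnel)
for label, conversion in transitions:
    print(f"{label}: {conversion:.0%}")
print("biggest drop-off:", worst[0])
```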
There we go: a brief and (hopefully) fun explanation of what the self-selection effect is and what to do about it. I hope you learned something! And if you have anything to add, please let me know. We need more ghostbusters to fight this heartless effect.