Keynote: Practical Defaults for A/B Testing
Ronny Kohavi – Consultant and instructor for online controlled experimentation
1-minute video preview
Theo – Freelance CRO Specialist, feedback through our #CH2022 attendee survey:
Thanks for coming back and sharing your extensive knowledge again!
Slides
Notes
This is the link to the live notes of Ronny's talk.
Questions asked by attendees through our #CH2022 app:
- With a >0.05 alpha two-tailed, you will have (a lot of) false negatives and thus miss out on money. Isn't that worse than a false positive?
- Talking about statistical significance, what do you think about frequentist vs. Bayesian test evaluation?
- You are referring to two-tailed tests, but wouldn't you think one-tailed approaches support our CRO-related goals better, since we're aiming for improvements?
- Shouldn't we look at the risk of a wrong decision on a case-by-case basis? A million-dollar decision is riskier than a hundred-dollar decision.
- What if you don't run enough experiments to be confident of your success rate for managing FPR?
- Shouldn't FPR be compared with a coin toss? Even at 41% you are still beating the coin toss or the average HiPPO idea.
- Reducing alpha and replicating reduces FPR, but at a huge cost: lower power. And isn't finding real winners and reducing false negatives what we're really interested in?
- If the recommendation is to replicate an experiment when in doubt, why not extend the duration of the initial test?
- What benchmark would you use for the success rate when calculating FPR? (See the FPR sketch after this list.)
- Is an MDE of 5% still a good default for testing on a checkout page where 80% already converts, or do you need a lower MDE? (See the sample-size sketch after this list.)
- If the company doesn't have that amount of traffic, should they completely abandon the idea of running A/B tests?
- What would your advice be for testing on lower-traffic websites (<200k)?
- Would you ever recommend testing with a Bayesian approach over a frequentist one? Why or why not?
- How about setting alpha up front based on the costs of a false positive (low: only the cost of development and pushing live) versus a false negative (high: missing real growth)?
- What do you think of sequential analysis? Should we only start analyzing the results when the power is >80%?
- So if you can't get 200k visitors in two weeks, what is the recommendation? Guessing? HiPPO?
- For low-traffic websites: isn't running an experiment still better than not running it and not validating anything at all?
- Is it reasonable to be optimistic about the MDE and work with lower sample sizes when managing innovation teams working on "big-sized experiments"?
- Is there a case for using non-inferiority testing as the default?
- What should you do if you only have 5 days a year to test, during which 90% of sales are made?
- Did you optimize for speed? By how much did the numbers change?
- Why does a "flat OEC" equal no ship? E.g. a branding change caused a page update which had no OEC impact; does it make sense to still deploy the variation?
- How about experiments where there is no difference in cost/tech debt/maintenance between A and B? Couldn't we just take what little data we have and pick the highest mean with no test?
- What would your advice be for B2B websites with low traffic (10,000 monthly users or fewer)?
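
Several questions above ask how alpha, power, and a team's success rate combine into a false positive risk (FPR). The sketch below is a minimal illustration, not code from the talk: it applies Bayes' rule over a population of experiments, and the success rates used are assumptions chosen for illustration; the 8% case lands close to the 41% figure referenced above.

```python
def false_positive_risk(alpha: float, power: float, success_rate: float) -> float:
    """Share of statistically significant results that are in fact false positives.

    FPR = alpha * (1 - pi) / (alpha * (1 - pi) + power * pi),
    where pi is the prior probability that a tested idea truly works (the success rate).
    """
    false_positives = alpha * (1 - success_rate)   # truly flat ideas that still reach significance
    true_positives = power * success_rate          # truly winning ideas that reach significance
    return false_positives / (false_positives + true_positives)

# Success rates below are assumptions for illustration, not figures from the talk notes.
for pi in (0.33, 0.10, 0.08):
    print(f"success rate {pi:.0%}: FPR = {false_positive_risk(0.05, 0.80, pi):.0%}")
# With alpha = 0.05 and 80% power, an 8% success rate gives an FPR of roughly 42%,
# close to the 41% figure referenced in the questions above.
```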
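
Likewise, the questions about the 5% MDE default, a checkout page already converting at 80%, and the 200k-visitors-in-two-weeks threshold all trace back to the standard sample-size approximation for comparing two conversion rates. The function below is a rough sketch under assumed defaults (two-sided alpha of 0.05, 80% power), not a formula quoted from the slides.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(base_rate: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per variant to detect a relative lift in a conversion rate."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for a two-sided alpha of 0.05
    z_power = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    variance = base_rate * (1 - base_rate)          # Bernoulli variance of the conversion metric
    delta = base_rate * relative_mde                # absolute effect size to detect
    return math.ceil(2 * (z_alpha + z_power) ** 2 * variance / delta ** 2)

# A 5% baseline with a 5% relative MDE needs roughly 120k users per variant
# (~240k in total), roughly the scale behind the "200k in two weeks" threshold
# raised above. A checkout page converting at 80% needs far fewer users for the
# same relative MDE, so a 5% MDE is easier, not harder, to power there.
print(sample_size_per_variant(0.05, 0.05))  # ~120,000 per variant
print(sample_size_per_variant(0.80, 0.05))  # ~1,600 per variant
```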