Keynote: Practical Defaults for A/B Testing

19 Nov 2022
11:20 - 12:20
Keynote Room

Ronny Kohavi – Consultant, Instructor for online controlled experimentation

1 minute video preview

Theo – Freelance CRO Specialist, feedback through our #CH2022 attendee survey:

Thanks for coming back and sharing your extensive knowledge again!

Slides


Notes

This is the link to the live notes of Ronny’s talk.

Questions asked by attendees through our #CH2022 app:

  • With a >0.05 alpha two-tailed, you will have (a lot of) false negatives and thus miss out on money. Is that not worse than a false positive?
  • Talking about statistical significance, what do you think about frequentist vs. Bayesian test evaluation?
  • You are referring to a two-tailed test, but wouldn’t you think one-tailed approaches support our CRO-related goals better, since we’re aiming for improvements?
  • Shouldn’t we look at risk of a wrong decision on a case by case basis? The million dollar decision is more risky than the hundred dollar decision.
  • What if you don’t run enough experiments to be confident of your success rate for managing FPR?
  • Shouldn’t FPR be compared with a coin toss? Even at 41% you are still beating the coin toss or the average HiPPO idea.
  • Reducing alpha and replicating reduces FPRs, but at a huge cost: lower power. And isn’t that what we’re really interested in: finding real winners and reducing false negatives?
  • If the recommendation is to replicate an experiment when in doubt, why not extend the duration of the initial test?
  • What benchmark would you use for success rate to calculate FPR?
  • Is an MDE of 5% still a good default for testing on a checkout page where 80% already convert, or do you need a lower MDE?
  • If the company doesn’t have that amount of traffic, should they completely abandon the idea of running A/B tests?
  • What would your advice be for testing on lower-traffic websites (<200k)?
  • Would you ever recommend testing from a bayesian approach over frequentist? Why or why not?
  • How about setting alpha based on defining the costs of a false positive (low: only the cost of dev and pushing live) versus a false negative (high: missing real growth) up front?
  • What do you think of sequential analysis? Should we only start analysing the results when the power is >80%?
  • So if you can’t get 200k in 2 weeks, what is the recommendation? Guessing? The HiPPO?
  • For low traffic websites: is running an experiment not still better than not running it and not validating anything at all?
  • Is it reasonable to be optimistic about the MDE and work with lower sample sizes when managing innovation teams working on “big” experiments?
  • Is there a case for using non-inferiority testing as the default?
  • What to do if you only have 5 days a year to test, during which 90% of sales are made?
  • Did you optimize for speed? By how much did numbers change?
  • Why does a “flat OEC” equal no ship? E.g. a branding change caused a page update which had no OEC impact. Does it make sense to still deploy the variation?
  • How about experiments where there is no difference in cost/tech debt/maintenance between A and B? Couldn’t we just take what little data we have and pick the highest mean with no test?
  • What would your advice be for B2B websites with low traffic (10,000 monthly users or fewer)?
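
Several of these questions boil down to two pieces of arithmetic: how many users a given MDE requires, and how often a statistically significant result is a false positive given a team's win rate. A minimal sketch of both, assuming a two-tailed z-test on conversion rates and the standard false-positive-risk formula (the function names and example numbers are illustrative, not taken from the talk):

```python
# Back-of-the-envelope sketch (illustrative only, not from the slides).
from scipy.stats import norm

def sample_size_per_arm(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate users needed per variant to detect a relative lift of
    `relative_mde` on `baseline_rate` with a two-tailed test at `alpha`."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z_alpha = norm.ppf(1 - alpha / 2)          # two-tailed critical value
    z_power = norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)   # normal approximation
    return (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2

def false_positive_risk(success_rate, alpha=0.05, power=0.80):
    """P(no real effect | stat-sig win), given the share of ideas that
    are true winners (`success_rate`)."""
    true_wins = power * success_rate
    false_wins = alpha * (1 - success_rate)
    return false_wins / (false_wins + true_wins)

if __name__ == "__main__":
    # 5% baseline conversion, 5% relative MDE -> roughly 120k users per arm,
    # i.e. ~240k in total, the same ballpark as the "200k in 2 weeks" figure.
    print(round(sample_size_per_arm(0.05, 0.05)))
    # If ~1 in 10 ideas is a true winner, a p < 0.05 "win" is a false
    # positive roughly a third of the time.
    print(round(false_positive_risk(0.10), 2))
```

Under those assumptions, lowering the MDE or the baseline rate drives the required sample size up quickly, and a lower success rate drives the false positive risk up, which is the tension behind the low-traffic and success-rate questions above.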

Become an attendee of our next event!

The average #CH2022 attendee experience score on a 1 (awful) to 5 (awesome) scale was 4.79!

Get yourself on the Conversion Hotel ticket notification list!
Or buy access to our video archive and check to see if new #CH tickets are available!
