Keynote: one neat trick to run better experiments

23 Nov 2019

10:55 - 11:30

Keynote Room

Lukas Vermeer (NL)

5-minute video summary

Faye – Marketing Analyst, feedback through our #CH2019 attendee survey:

Loved this session. Very nice level of information, great presentation.

Slides

Notes

Check the live notes of Lukas's talk

Questions asked by attendees through our #CH2019 app:

How often does an SRM occur at Booking?
How do you know you’ve checked for all possible variables? Is there a defined list of variables?
What is the difference between SRM and selection bias?
Do tests exist that calculate expected SRM?
What do we need to do with tests that have run for weeks and then turn out to be suffering from SRM?
If you have run a test or research and there’s SRM, can you correct your results, or is it game over? (You probably answer this later in your talk.)
Should you also check SRM when both sample sizes are equal in number?
Could cookie banners and consents be causing SRM? How to get around it?
Google specifically says, in its guidance on A/B testing and Google Search, not to exclude the bot, because that’s against guidelines, but that could well be a big reason for SRM. What’s your opinion?
Could responsibility for SRM find its way into the contracts of experimentation tools, ideally acting as processors? Where does the liability for bias lie?
Is SRM another word for response bias/sampling errors? Or is the difference in numbers, whilst being sampled proportionally, a problem as well?
Is it possible to fix your data (with SRM) after the test ended? And how? Cleaning data afterwards?
When does an SRM error happen the most? At the beginning or the end of an experiment?
Is it possible to decrease the risk of SRM by doing more research upfront?
How do you have enough time for writing papers and doing analysis while keeping up with the latest memes?
How do you check SRM with dynamic traffic allocation?
Is SRM making multi-armed bandit testing obsolete?
How come SRM is only getting attention in the industry now?
So, talking about groups in a test that are not equal: how do you correct for a test at the bottom of the page when comparing it to the A group?
Can you have an SRM mismatch on a session level but not on a user level? Or vice versa?
What do we have to do to discuss this issue before testing/doing well at the start?
Is SRM something that can happen in exact science experiments?
If you have machine learning tests, how do you track SRM?
So conclusion: the more you know about data (SRM, false positives), the less we are sure about the results. Let’s start redesigning again 😄