
Abstract: Online controlled experiments (A/B tests) have become the gold standard for learning the impact of new product features in technology companies. A/B testing empowers Wish to continuously improve the customer experience and measure the impact of new ideas, market strategies, product features, and machine learning models. At any given time, there are hundreds of experiments running at Wish.
Although the concept of A/B testing is simple, it is challenging to conduct A/B tests at scale. Over the past few years, data scientists and engineers at Wish have substantially evolved the statistical engine (hereafter, the engine) of our experimentation platform. The engine provides t-tests, ratio metric tests, and percentile metric tests for hypothesis testing on different types of metrics. When rolling out a neutral feature, the engine runs non-inferiority tests. It applies multiple hypothesis corrections when testing multiple metrics and sequential testing when decisions are made sequentially. It also checks for sample ratio mismatch to verify that each experiment is unbiased. Further, to ensure the engine's reliability, we run hundreds of simulated A/A and A/B tests to evaluate its false-positive rate and power, along with various A/A tests to evaluate false positives holistically.
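As an illustration of the sample ratio mismatch check mentioned above, here is a minimal sketch using a chi-square goodness-of-fit test on the observed bucket sizes. The function name `srm_p_value` and its parameters are hypothetical and chosen for this example only; this is not the engine's implementation.

```python
import math


def srm_p_value(n_control, n_treatment, expected_treatment_share=0.5):
    """Chi-square goodness-of-fit p-value for a two-bucket experiment.

    Hypothetical helper for illustration. A small p-value suggests the
    observed split deviates from the intended allocation.
    """
    total = n_control + n_treatment
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    statistic = (
        (n_control - expected_control) ** 2 / expected_control
        + (n_treatment - expected_treatment) ** 2 / expected_treatment
    )
    # With two buckets the statistic has 1 degree of freedom, so the
    # chi-square survival function reduces to erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2))


if __name__ == "__main__":
    # An intended 50/50 split that came out as 50,000 vs 51,000 users.
    print(srm_p_value(50_000, 51_000))
```

A very small p-value here would typically prompt an investigation of the assignment or logging pipeline before any metric results are trusted.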
An open-source Python library from Wish, wishab, implements this statistical engine. We tested the library in the field with the experiments at Wish. Further, wishab is platform-agnostic: data scientists at any company can use their existing data pipeline to compute the summary statistics that wishab takes as inputs. Lastly, we implemented wishab with scalability in mind and have applied it to experiments with hundreds of millions of users.
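To show why working from summary statistics scales, the sketch below runs a two-sample test using only per-bucket counts, means, and variances. It does not use wishab's API: `SummaryStats` and `two_sample_z_test` are hypothetical names, and the test uses a normal approximation in place of the t distribution, which is reasonable only for large samples.

```python
import math
from dataclasses import dataclass


@dataclass
class SummaryStats:
    """Per-bucket aggregates a data pipeline could compute (hypothetical type)."""
    count: int
    mean: float
    variance: float  # sample variance of the metric


def two_sample_z_test(control, treatment):
    """Return (difference in means, z statistic, two-sided p-value).

    Illustrative only: uses a normal approximation with unpooled variances.
    """
    diff = treatment.mean - control.mean
    std_err = math.sqrt(
        control.variance / control.count + treatment.variance / treatment.count
    )
    z = diff / std_err
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, z, p_value


if __name__ == "__main__":
    control = SummaryStats(count=10_000, mean=1.0, variance=4.0)
    treatment = SummaryStats(count=10_000, mean=1.1, variance=4.0)
    print(two_sample_z_test(control, treatment))
```

Because only three numbers per bucket are needed, the expensive aggregation can stay in the existing data pipeline while the test itself runs in constant time.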
Bio: Max is a Staff Data Scientist at Wish, where he focuses on online experimentation (A/B testing) and machine learning. He has been revamping the A/B testing platform at Wish on various fronts, including infrastructure, statistical testing, and usability. His passion is to empower data-driven decision-making through the rigorous use of data. Max earned his Ph.D. in Statistical Informatics from the University of Arizona.