Abstract: I introduce NoGAN, a new alternative to standard tabular data synthetization. It is designed to run several orders of magnitude faster than training generative adversarial networks (GANs). In addition, the quality of the generated data is superior to that of almost all other products on the market. The hyperparameters are intuitive, leading to explainable AI.
Many evaluation metrics that measure faithfulness have critical flaws: because they rely on low-dimensional indicators, they sometimes rate generated data as excellent when it is actually a failure. I fix this problem by using the full multivariate empirical cumulative distribution function (ECDF). As an additional benefit, for both synthetization and evaluation, all feature types (categorical, ordinal, or continuous) are processed with a single formula, even in the presence of missing values.
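To illustrate why low-dimensional indicators can mislead, here is a minimal sketch in Python with NumPy. It is not the formula used in NoGAN or in the "genai-evaluation" package; the function names ecdf_at and ecdf_distance, the choice of evaluation points, and the maximum-gap distance are all assumptions made for illustration, and the sketch handles only numeric features without missing values. The "bad" synthetic data reproduces every marginal distribution exactly but destroys the dependence between features, which only the joint ECDF detects.

```python
import numpy as np

def ecdf_at(data, points):
    """Multivariate ECDF of `data`, evaluated at each row of `points`."""
    # below[i, j] is True when every coordinate of data[j] is <= points[i].
    below = np.all(data[None, :, :] <= points[:, None, :], axis=2)
    return below.mean(axis=1)

def ecdf_distance(real, synth, n_points=500, seed=0):
    """Largest gap between the two joint ECDFs over sampled locations.

    Hypothetical metric for illustration; evaluation points are drawn
    from the pooled observations.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([real, synth])
    idx = rng.choice(len(pooled), size=n_points, replace=False)
    points = pooled[idx]
    return float(np.max(np.abs(ecdf_at(real, points) - ecdf_at(synth, points))))

rng = np.random.default_rng(42)
n = 2000

# "Real" data: two strongly correlated features.
x = rng.normal(size=n)
real = np.column_stack([x, x + 0.3 * rng.normal(size=n)])

# Bad synthetic data: each column shuffled independently, so the
# marginals match the real data exactly but the correlation is lost.
bad = np.column_stack([rng.permutation(real[:, 0]),
                       rng.permutation(real[:, 1])])

# Good synthetic data: a fresh sample from the same joint distribution.
x2 = rng.normal(size=n)
good = np.column_stack([x2, x2 + 0.3 * rng.normal(size=n)])

d_bad = ecdf_distance(real, bad)
d_good = ecdf_distance(real, good)
print(f"bad synthetic:  {d_bad:.3f}")
print(f"good synthetic: {d_good:.3f}")
```

Any check based on per-feature means, variances, or histograms would rate the bad sample as perfect, since its columns contain exactly the same values as the real data. The joint ECDF gap is noticeably larger for the bad sample than for the good one.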
In real-life case studies, the synthetic data was generated in less than 5 seconds, versus 10 minutes with a GAN. It was also of higher quality, as verified via cross-validation. Thanks to the very fast implementation, the hyperparameters can be fine-tuned automatically and efficiently. I also discuss next steps: further improving the speed and the faithfulness of the generated data, auto-tuning, Gaussian NoGAN, and applications other than synthetization.
Bio: Rajiv works as a Data Specialist at MLTechniques. He has over 21 years of experience in database development and analytics, including roles at prominent organizations such as Oracle, Verifone, and TCS. He currently works on projects centered on Generative AI solutions and has developed two GenAI Python packages in this domain: "genai-evaluation" and "nogan-synthesizer". He lives in Bangalore, India, and is passionate about keeping up with the latest research and trends in machine learning.