Abstract:
Currently, synthetic data is highly relevant in machine learning. Modern synthetic data generation algorithms make it possible to generate data that is very similar in statistical properties to the original data. Synthetic data is used in practice in a wide range of tasks, including those related to data augmentation.
The author of the article proposes a data augmentation method that combines the approaches of increasing the sample size using synthetic data and synthetic anomaly generation. This method has been used to solve an information security problem of anomaly detection in server logs in order to detect attacks.
The model trained for the task shows high results. This demonstrates the effectiveness of using synthetic data to increase sample size and generate anomalies, as well as the ability to use these approaches together with high efficiency.