A machine-learning model must be trained before it can perform a task, such as identifying cancer in medical images. Training an image-classification model involves showing it a large dataset of millions of example images.
Using real image data, however, can raise both practical and ethical concerns: the images might violate copyright law, invade people’s privacy, or be biased against a particular racial or ethnic group. To avoid these problems, researchers can use image-generation programs to create synthetic data for model training.