When Adobe Inc. released its Firefly image-generating software last year, the company said the artificial intelligence model was trained mainly on Adobe Stock, its database of hundreds of millions of licensed images. Firefly, Adobe said, was a “commercially safe” alternative to competitors like Midjourney, which learned by scraping pictures from across the internet.
But behind the scenes, Adobe also was relying in part on AI-generated content to train Firefly, including from those same AI rivals. In numerous presentations and public posts about how Firefly is safer than the competition due to its training data, Adobe never made clear that its model actually used images from some of these same competitors.
Well, you wouldn't train on images that you consider bad, or rather you'd use them as examples of what not to do.
Yes, you have to be careful when training a model on its own output. The model already tends to produce that kind of output, so it's easy to "overshoot", so to speak. But it's not a problem in principle. It's also not what's happening here. Adobe doesn't use the same model as Midjourney.