Building Diverse Datasets

Emmanuel discusses the challenges of creating a dataset that covers 100 skin shades, noting that the project initially relied on celebrity images to establish a proof of concept. As the project evolved, the need for a more comprehensive data pipeline became clear, which led to incorporating synthetic data to fill gaps in representation for specific skin tones.
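
One way to picture the gap-filling step is as a coverage check over shade labels. The sketch below is illustrative only and is not the pipeline Emmanuel describes: it assumes each image already carries an integer shade index from 0 to 99, and the function name `find_gaps` and the `min_per_shade` threshold are hypothetical. It reports how many additional images each under-covered shade would need, which is the kind of shortfall synthetic data could then target.

```python
from collections import Counter

NUM_SHADES = 100   # from the text: the dataset targets 100 skin shades
MIN_PER_SHADE = 3  # hypothetical minimum number of images per shade


def find_gaps(shade_labels, num_shades=NUM_SHADES, min_per_shade=MIN_PER_SHADE):
    """Return {shade_index: images_still_needed} for under-covered shades.

    Assumes shade_labels is an iterable of integer shade indices
    in the range 0..num_shades-1 (an assumption for this sketch).
    """
    counts = Counter(shade_labels)
    return {
        shade: min_per_shade - counts.get(shade, 0)
        for shade in range(num_shades)
        if counts.get(shade, 0) < min_per_shade
    }


if __name__ == "__main__":
    # Toy labels over 6 shades instead of 100, to keep the output short.
    labels = [0, 0, 0, 1, 1, 5]
    gaps = find_gaps(labels, num_shades=6, min_per_shade=3)
    for shade, needed in gaps.items():
        print(f"shade {shade}: needs {needed} more image(s)")
```

In this toy run, shade 0 already meets the threshold, while the remaining shades are flagged with their shortfall. How shades are labeled and how the missing images are actually synthesized are outside the scope of this sketch.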