Data Curation Insights

Akshita emphasizes the importance of using publicly accessible data for the project and describes the careful curation process behind the dataset. The team first applied language and quality filters, paying particular attention to removing toxic content and personally identifiable information (PII). They also used tools to check that the dataset does not overlap with existing evaluation benchmarks. Without this decontamination step, a model could score well simply because it saw the test data during training, which would undermine the scientific validity of its reported results.
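To make these stages more concrete, here is a minimal sketch of a filtering pass. It is not the team's actual pipeline. The regular expressions, the word-count and alphabetic-ratio thresholds, the 8-gram window, and all function names are hypothetical choices made for illustration. Language identification and toxicity filtering are left out because they typically rely on trained classifiers rather than a few lines of code.

```python
import re

# Hypothetical PII patterns; production pipelines use broader, tested rules.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def mask_pii(text: str) -> str:
    """Replace e-mail addresses and US-style phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def passes_quality(text: str, min_words: int = 5, min_alpha: float = 0.6) -> bool:
    """Toy quality heuristic: enough words and mostly alphabetic characters."""
    if len(text.split()) < min_words:
        return False
    alpha = sum(ch.isalpha() for ch in text)
    return alpha / max(len(text), 1) > min_alpha


def ngrams(text: str, n: int) -> set:
    """Lower-cased word n-grams of a text."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def is_contaminated(text: str, benchmark_ngrams: set, n: int) -> bool:
    """Flag a document that shares at least one n-gram with benchmark data."""
    return not ngrams(text, n).isdisjoint(benchmark_ngrams)


def curate(docs, benchmark_texts, n: int = 8):
    """Drop low-quality and contaminated documents, then mask PII in the rest."""
    bench = set()
    for b in benchmark_texts:
        bench |= ngrams(b, n)
    kept = []
    for doc in docs:
        if not passes_quality(doc):
            continue
        if is_contaminated(doc, bench, n):
            continue
        kept.append(mask_pii(doc))
    return kept


if __name__ == "__main__":
    docs = [
        "Contact me at jane.doe@example.com about the new gardening club meeting schedule.",
        "short text",
        "The quick brown fox jumps over the lazy dog near the river bank today.",
    ]
    benchmark = ["Question: the quick brown fox jumps over the lazy dog near the river. Answer: yes"]
    # Only the first document survives, with its e-mail address masked.
    print(curate(docs, benchmark))
```

The decontamination check here uses exact n-gram overlap with benchmark text, which is one common way to detect leakage. A larger window makes the check stricter about what counts as overlap, and a smaller one flags more documents.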