Building a Safety Benchmark Dataset in Hindi

This project has concluded.

In 2024, Tattle was selected for this pilot project to build a dataset of Hindi prompts as part of ML Commons' safety benchmark. Following Uli's participatory approach, we created 2,000 Hindi prompts across two hazard categories: hate and sex-related crimes. The prompts were created by an expert group of individuals with expertise in journalism, social work, feminist advocacy, gender studies, fact-checking, political campaigning, education, psychology, and research.

Project Team

Mansi Gupta

Srravya C

Vamsi Krishna Pothuru

Saumya Gupta

Tarunima Prabhakar

Aatman Vaidya

Kaustubha Kalidindi

Denny George

Maanas B

Outcomes

Landscape Analysis Report

AI Safety Benchmark Datasets in Hindi

ML Commons Report

Analysis of Indic Language Capabilities in LLMs

Text and illustrations on this website are licensed under a Creative Commons 4.0 License. The code is licensed under GPL. For data, please refer to the respective licenses.