Building a Safety Benchmark Dataset in Hindi

This project has concluded.

In 2024, Tattle was selected for this pilot project to build a dataset of Hindi prompts as part of ML Commons' safety benchmark. Following Uli's participatory approach, we created 2,000 Hindi prompts across two hazard categories: hate and sex-related crimes. The prompts were created by an expert group of individuals with expertise in journalism, social work, feminist advocacy, gender studies, fact-checking, political campaigning, education, psychology, and research.

Project Team

Mansi Gupta

Srravya C

Vamsi Krishna Pothuru

Saumya Gupta

Tarunima Prabhakar

Aatman Vaidya

Kaustubha Kalidindi

Denny George

Maanas B

Outcomes

Landscape Analysis Report

AI Safety Benchmark Datasets in Hindi

ML Commons Report

Analysis of Indic Language Capabilities in LLMs

Text and illustrations on this website are licensed under a Creative Commons 4.0 License. The code is licensed under GPL. For data, please refer to the respective licenses.