Introduction to SHADES
“I hope that people use [SHADES] as a diagnostic tool to identify where and how there might be issues in a model,” says Talat. “It’s a way of knowing what’s missing from a model, where we can’t be confident that a model performs well, and whether or not it’s accurate.”
Creating the Multilingual Dataset
To create the multilingual dataset, the team recruited native and fluent speakers of languages including Arabic, Chinese, and Dutch. These speakers wrote down all the stereotypes they could think of in their respective languages, and a second native speaker then verified each one. Each stereotype was annotated with the regions in which it was recognized, the group of people it targeted, and the type of bias it contained.
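To make the annotation concrete, a single entry in such a dataset might look like the record below. This is a hypothetical sketch: the field names are assumptions for illustration, not the actual SHADES schema.

```python
# Hypothetical record for one annotated stereotype.
# Field names are illustrative assumptions, not the real SHADES schema.
stereotype_entry = {
    "statement": "Girls are bad at math.",    # stereotype as written by the contributor
    "language": "en",                         # language the contributor wrote it in
    "regions_recognized": ["United States"],  # regions where annotators say it circulates
    "target_group": "gender",                 # group of people the stereotype targets
    "bias_type": "personal identity",         # type of bias it contains
    "verified": True,                         # checked by a second native speaker
}
```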
Expanding the Dataset
Each stereotype was then translated into English, the one language spoken by every contributor, before being translated into additional languages. The speakers then noted whether each translated stereotype was recognized in their own language, yielding a total of 304 stereotypes related to people’s physical appearance, personal identity, and social factors such as occupation.
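The expansion step could be pictured as attaching parallel translations to each entry, together with a flag recording whether speakers recognize the stereotype in their own language. Again, this is a sketch under assumed field names, not the published format.

```python
# Continuing the hypothetical schema above: attach parallel translations,
# each flagged with whether native speakers recognize the stereotype
# as one that actually circulates in their language and culture.
stereotype_entry = {
    "statement": "Girls are bad at math.",
    "language": "en",
    "translations": {
        "nl": {"text": "(Dutch translation)", "recognized": True},
        "ar": {"text": "(Arabic translation)", "recognized": False},  # translated, but not a local stereotype
    },
}
```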
Upcoming Presentation and Reception
The team is due to present its findings at the annual conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL) in May. “It’s an exciting approach,” says Myra Cheng, a PhD student at Stanford University who studies social biases in AI. “There’s a good coverage of different languages and cultures that reflects their subtlety and nuance.”
Future Plans for SHADES
SHADES is publicly available, and Mitchell says she hopes other contributors will add new languages, stereotypes, and regions to it, leading to the development of better language models in the future. “It’s been a massive collaborative effort from people who want to help make better technology,” she says.
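Because the dataset is public, one plausible way to inspect it is with the Hugging Face `datasets` library, as in the sketch below. The repository identifier and split name are placeholders, not verified release details; check the official SHADES release for the actual name.

```python
from datasets import load_dataset

# "example-org/shades" is a placeholder identifier; substitute the
# actual repository name from the official SHADES release.
shades = load_dataset("example-org/shades")

# Assuming a "train" split, peek at the first entry to see which
# languages, regions, and bias types it covers.
first = next(iter(shades["train"]))
print(first)
```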
Conclusion
SHADES is a significant step toward more accurate and less biased language models. By giving developers a diagnostic tool for identifying where stereotypes surface in a model, it can help them build more inclusive and fair AI systems. As the dataset continues to grow, it has the potential to make a substantial impact on the field of natural language processing.
FAQs
- Q: What is SHADES?
  A: SHADES is a multilingual dataset designed to help identify and address biases in language models.
- Q: How was the dataset created?
  A: The dataset was created by recruiting native and fluent speakers of various languages to write, translate, and annotate stereotypes.
- Q: What is the goal of SHADES?
  A: The goal of SHADES is to provide a diagnostic tool for identifying issues in language models, helping to create more accurate and less biased AI systems.
- Q: Is SHADES publicly available?
  A: Yes. SHADES is publicly available, and contributors can add new languages, stereotypes, and regions, leading to the development of better language models in the future.