Introduction to AI and Bias
Every year, thousands of students take courses that teach them how to deploy artificial intelligence models that can help doctors diagnose disease and determine appropriate treatments. However, many of these courses omit a key element: training students to detect flaws in the training data used to develop the models.
The Impact of Bias in AI Models
Bias in AI models can have serious consequences, particularly in the medical field. Many previous studies have found that models trained mostly on clinical data from white males don’t work well when applied to people from other groups. Models inherit whatever problems exist in their training data: when certain groups are underrepresented, the resulting predictions are unreliable for those groups.
Sources of Bias in Data
Researchers have long documented instruments and devices that perform inconsistently across individuals. For example, pulse oximeters overestimate blood-oxygen levels in people of color because too few people of color were enrolled in the devices’ clinical trials. Medical devices and equipment are typically optimized on healthy young males, yet they are used by a far more diverse population.
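One practical way to surface this kind of failure is to evaluate a model separately for each demographic group rather than reporting a single aggregate score. The sketch below is a minimal, hypothetical illustration (the group labels, record format, and numbers are invented for the example): a model that looks accurate overall can still fail badly on an underrepresented group, mirroring the pulse-oximeter case.

```python
from collections import defaultdict

def subgroup_accuracy(records):
    """Compute prediction accuracy separately for each demographic group.

    `records` is a list of (group, y_true, y_pred) tuples -- a toy stand-in
    for a real clinical dataset with demographic annotations.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        if y_true == y_pred:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

# Toy data: 180 records from a well-represented group the model handles
# perfectly, and only 10 from an underrepresented group it often gets wrong.
records = (
    [("group_a", 1, 1)] * 90 + [("group_a", 0, 0)] * 90   # all correct
    + [("group_b", 1, 0)] * 6 + [("group_b", 0, 0)] * 4   # mostly wrong
)
acc = subgroup_accuracy(records)
# Overall accuracy is about 97%, but group_b accuracy is only 40%.
```

Reporting only the pooled metric (here, roughly 97 percent) would hide the 40 percent accuracy on the smaller group, which is exactly the failure mode the clinical examples above describe.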
Electronic Health Records and Bias
The electronic health record system was not designed to be a learning system, so records drawn from it must be used with care: it is not optimized for diverse populations and can introduce bias into AI models. One promising avenue being explored is a transformer model of numeric electronic health record data, including laboratory test results.
The Importance of Teaching Bias in AI Courses
It’s crucial for courses in AI to cover the sources of potential bias. Our course at MIT started in 2016, and at some point we realized that we were encouraging people to build models that are overfitted to some statistical measure of model performance, when in fact the data we’re using is rife with problems that people are not aware of. When we surveyed online AI courses, we found that only five included sections on bias in datasets, and only two contained any substantial discussion of bias.
What Course Developers Should Incorporate
Course developers should incorporate content that teaches students to evaluate their data before incorporating it into their models. This includes giving them a checklist of questions to ask about the data, such as where the data came from, who collected it, and what devices were used to measure it. Truly, understanding the data is 50 percent of the course content, if not more.
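The checklist idea above can even be made operational. The sketch below is a hypothetical illustration (the function name and answer format are assumptions, and only the three questions named in the text are included): a dataset audit that flags which provenance questions remain unanswered before modeling begins.

```python
# Provenance questions drawn from the checklist described in the text.
PROVENANCE_CHECKLIST = [
    "Where did the data come from?",
    "Who collected it?",
    "What devices were used to measure it?",
]

def audit_dataset(answers):
    """Return the checklist questions that remain unanswered.

    `answers` maps each question to a free-text answer; empty or
    missing entries are treated as unanswered.
    """
    return [q for q in PROVENANCE_CHECKLIST
            if not answers.get(q, "").strip()]

# Example: a partially documented dataset.
answers = {
    "Where did the data come from?": "ICU EHR export, single hospital",
    "Who collected it?": "",
}
missing = audit_dataset(answers)
# `missing` lists the unanswered questions, signaling that modeling
# should not begin until they are resolved.
```

A course could require students to produce an empty `missing` list, i.e. a completed provenance record, before any model-building assignment is accepted.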
Critical Thinking Skills
Our main objective now is to teach critical thinking skills, and the main ingredient for critical thinking is bringing together people with different backgrounds. When we run datathons, we don’t even have to teach participants how to think critically; as soon as you bring together the right mix of people, it just happens. We now tell our participants and students not to start building any model unless they truly understand how the data came about.
Conclusion
In conclusion, bias in AI models is a significant problem, particularly in the medical field. It’s essential to teach students to detect flaws in the training data used to develop these models. By incorporating content that teaches students to evaluate their data and by promoting critical thinking skills, we can reduce the risk of harm caused by biased AI models.
FAQs
Q: What is bias in AI models?
A: Bias in AI models refers to the errors or inaccuracies that occur when the data used to train the models is not representative of the population it will be applied to.
Q: Why is it essential to teach bias in AI courses?
A: It’s crucial to teach bias in AI courses because biased models can have serious consequences, particularly in the medical field.
Q: How can course developers incorporate content that teaches students to evaluate their data?
A: Course developers can incorporate a checklist of questions that students can ask about the data, such as where the data came from, who collected it, and what devices were used to measure it.
Q: What is the main ingredient for critical thinking?
A: The main ingredient for critical thinking is bringing together people with different backgrounds.