Introduction to Artificial Intelligence Models
Fine-tuning helps mitigate the problem of artificial intelligence models providing inaccurate or unrelated responses by guiding the model to act as a helpful assistant and to refuse to complete a prompt when the relevant training data is sparse. That fine-tuning process creates distinct sets of artificial neurons that researchers can see activating when the model encounters the name of a "known entity" (e.g., "Michael Jordan") or an "unfamiliar name" (e.g., "Michael Batkin") in a prompt.
How the Model Works
Activating the "unfamiliar name" feature among an LLM’s neurons tends to promote an internal "can’t answer" circuit in the model, encouraging it to begin its response along the lines of "I apologize, but I cannot…" In fact, the researchers found that the "can’t answer" circuit tends to default to the "on" position in the fine-tuned "assistant" version of the model, making the model reluctant to answer a question unless other active features in its neural net suggest that it should.
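To make that gating idea concrete, here is a minimal toy sketch in plain Python of a refusal pathway that defaults to "on" and can be inhibited by a recognition feature. The feature names, weights, and the `cant_answer_activation` function are all invented for illustration; this is not the researchers' actual circuit.

```python
# Toy sketch only: invented feature names and weights, not the real circuit.
# The refusal ("can't answer") pathway starts with a positive default bias,
# and a recognition feature can inhibit it.

CANT_ANSWER_BIAS = 2.0     # refusal defaults to "on" in the fine-tuned assistant
W_UNFAMILIAR_NAME = 1.5    # an unfamiliar-name feature pushes refusal higher
W_KNOWN_ENTITY = -3.0      # a known-entity feature inhibits the refusal pathway

def cant_answer_activation(unfamiliar_name: float, known_entity: float) -> float:
    """Net activation of the toy 'can't answer' circuit (ReLU-style)."""
    pre_activation = (CANT_ANSWER_BIAS
                      + W_UNFAMILIAR_NAME * unfamiliar_name
                      + W_KNOWN_ENTITY * known_entity)
    return max(0.0, pre_activation)

# "Michael Batkin": no recognition, so refusal stays on.
print(cant_answer_activation(unfamiliar_name=1.0, known_entity=0.0))  # 3.5 -> refuse

# "Michael Jordan": recognition inhibits the circuit, so the model answers.
print(cant_answer_activation(unfamiliar_name=0.0, known_entity=1.0))  # 0.0 -> answer
```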
Recognition vs. Recall
When the model encounters a well-known name like "Michael Jordan" in a prompt, the "known entity" feature activates and causes the neurons in the "can’t answer" circuit to become "inactive or more weakly active." Once that happens, the model can dive deeper into its graph of Michael Jordan-related features to provide its best guess at an answer to a question like "What sport does Michael Jordan play?" On the other hand, artificially increasing the weights of the neurons in the "known answer" feature can force the model to confidently hallucinate information about completely made-up athletes like "Michael Batkin," as the sketch below illustrates.
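In the same toy spirit, an intervention of that kind can be sketched by clamping the recognition feature to a high value before the refusal circuit is evaluated, reusing the `cant_answer_activation` function defined above. The `steer_and_check` helper and all of its values are purely hypothetical.

```python
def steer_and_check(prompt_features: dict, clamp: dict) -> str:
    """Toy intervention: overwrite selected feature activations before the
    refusal circuit is evaluated (hypothetical helper, illustration only)."""
    features = {**prompt_features, **clamp}
    refusal = cant_answer_activation(
        unfamiliar_name=features.get("unfamiliar_name", 0.0),
        known_entity=features.get("known_entity", 0.0),
    )
    return "refuse" if refusal > 0 else "answer (risking confabulation)"

# Unmodified: the made-up name "Michael Batkin" triggers refusal.
print(steer_and_check({"unfamiliar_name": 1.0}, clamp={}))
# Clamping the recognition feature high suppresses refusal for the same name,
# inviting a confident but fabricated answer.
print(steer_and_check({"unfamiliar_name": 1.0}, clamp={"known_entity": 2.0}))
```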
Understanding Hallucinations
The researchers suggest that "at least some" of the model’s hallucinations are related to a "misfire" of the circuit inhibiting that "can’t answer" pathway—that is, situations where the "known entity" feature (or others like it) is activated even when the token isn’t actually well-represented in the training data. This highlights the importance of fine-tuning and the need for more research into how artificial intelligence models process and respond to different types of input.
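Reusing the same toy function, a "misfire" can be pictured as a recognition feature that comes on only partway for a sparsely represented name, which is still enough to silence the refusal pathway. The numbers here are invented for illustration.

```python
# Toy "misfire": the recognition feature activates only partially for a name
# that is sparsely represented in training data, yet that is enough to silence
# the refusal pathway, so the model answers and has to guess.
print(cant_answer_activation(unfamiliar_name=0.2, known_entity=0.9))
# 2.0 + 1.5*0.2 - 3.0*0.9 = -0.4, clipped to 0.0 -> the model answers anyway
```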
Conclusion
In conclusion, fine-tuning is crucial for mitigating inaccurate or unrelated responses from artificial intelligence models. The model’s ability to recognize and respond to "known entities" and "unfamiliar names" is a key aspect of its functionality, and understanding how it works can help improve its performance and reduce hallucinations.
FAQs
- Q: What is fine-tuning in artificial intelligence models?
  A: Fine-tuning is the process of adjusting the model’s parameters to improve its performance on a specific task or dataset.
- Q: What is the "can’t answer" circuit in the model?
  A: The "can’t answer" circuit is a mechanism that prevents the model from providing an answer when it is unsure or lacks sufficient information.
- Q: What are hallucinations in artificial intelligence models?
  A: Hallucinations refer to the model’s tendency to provide false or inaccurate information, often due to a "misfire" of the circuit inhibiting the "can’t answer" pathway.
- Q: How can hallucinations be reduced in artificial intelligence models?
  A: Hallucinations can be reduced through fine-tuning, improving the model’s training data, and adjusting its parameters to prevent overconfidence in its responses.