Introduction to Large Language Models
Large language models (LLMs) are excellent at using textual reasoning to understand the context of a document and provide a logical answer about its contents. However, these same LLMs often struggle to correctly answer even the simplest math problems. Textual reasoning is usually a less-than-ideal way to deliberate over computational or algorithmic tasks. While some LLMs can generate code, such as Python, to handle symbolic queries, the models don't always know when to use code, or what kind of code would work best.
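As a concrete illustration (ours, not the researchers'), a one-line Python computation handles exactly the kind of symbolic query that step-by-step textual reasoning tends to fumble:

    # Multiplying two large integers: trivial for executed code,
    # error-prone when an LLM reasons through it digit by digit.
    a = 348_592_117
    b = 904_338_251
    print(a * b)  # exact product, computed rather than reasoned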
The Need for Guidance
LLMs, it seems, may need a coach to steer them toward the best technique. This is where CodeSteer comes in: a smart assistant, developed by MIT researchers, that guides an LLM to switch between code and text generation until it correctly answers a query. CodeSteer, itself a smaller LLM, automatically generates a series of prompts to iteratively steer a larger LLM. After each round, it reviews the model's current and previous answers and provides guidance on how to fix or refine the solution until it deems the answer correct.
How CodeSteer Works
The researchers found that augmenting a larger LLM with CodeSteer boosted its accuracy on symbolic tasks, such as multiplying numbers, playing Sudoku, and stacking blocks, by more than 30 percent. CodeSteer works in conjunction with the larger LLM: it first reviews a query and determines whether text or code is suited to the problem, and which sort of code would be best. It then generates a prompt for the larger LLM, telling it to use a coding method or textual reasoning to answer the query.
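A minimal sketch of what that routing step could look like in Python; the generate() helper and the prompt wording are our assumptions, since the article does not publish CodeSteer's actual prompts or interface:

    def choose_prompt(query, generate):
        """Ask a steering model whether text or code suits a query,
        then build guidance for the larger model accordingly.
        generate() is a hypothetical call into the steering LLM."""
        decision = generate(
            "Should this query be answered with textual reasoning or "
            "with Python code? Reply 'text' or 'code'.\n" + query
        )
        if "code" in decision.lower():
            return "Write and run Python code to answer: " + query
        return "Answer with careful step-by-step reasoning: " + query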
Benefits of CodeSteer
The larger model follows this prompt to answer the query and sends the result back to CodeSteer, which reviews it. If the answer is wrong, CodeSteer keeps prompting the LLM to try fixes, such as incorporating a search algorithm or a constraint into its Python code, until the answer is correct. This advance could improve the problem-solving capabilities of LLMs on complex tasks that are especially difficult to solve with textual reasoning alone, such as generating paths for robots in uncertain environments or scheduling shipments in an international supply chain.
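Putting the pieces together, the review-and-refine loop described above might be sketched as follows. The function names (steer, solve, check) and the fixed round budget are illustrative assumptions, not the researchers' implementation:

    def codesteer_loop(query, steer, solve, check, max_rounds=5):
        """Iteratively steer a larger LLM until its answer passes review.
        steer(query, history) -> guidance prompt (smaller steering LLM)
        solve(prompt)         -> candidate answer (larger LLM)
        check(query, answer)  -> True if the answer is deemed correct
        """
        history = []
        answer = None
        for _ in range(max_rounds):
            prompt = steer(query, history)    # e.g. "use Python", "add a search"
            answer = solve(prompt)
            if check(query, answer):
                return answer
            history.append((prompt, answer))  # informs the next round's guidance
        return answer  # best effort once the round budget is spent

Passing the two models in as callables keeps the sketch independent of any particular LLM API; in practice, steer and solve would wrap calls to the smaller and larger models respectively.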
Tackling Complex Tasks
As the researchers designed CodeSteer, they couldn't find suitable symbolic datasets to fine-tune and test the model, since many existing benchmarks don't indicate whether a given query is best solved with text or code. So they gathered a corpus of 37 complex symbolic tasks, spanning spatial reasoning, mathematics, order reasoning, and optimization, and built their own dataset, called SymBench. They then implemented a fine-tuning approach that leverages SymBench to maximize the performance of CodeSteer.
Results and Future Directions
In their experiments, CodeSteer outperformed all nine baseline methods they evaluated, boosting average accuracy from 53.3 percent to 86.4 percent. It maintained similar performance on unseen tasks and across a variety of LLMs. In addition, a general-purpose model augmented with CodeSteer can achieve higher accuracy than state-of-the-art models designed to focus on complex reasoning and planning, while requiring much less computation. The researchers next want to streamline CodeSteer to speed up its iterative prompting process and to study how to effectively fine-tune a unified model with the ability to switch between textual reasoning and code generation.
Conclusion
CodeSteer is a significant advance for large language models because it improves their problem-solving capabilities on complex tasks. By guiding an LLM to switch between code and text generation, CodeSteer helps it reach higher accuracy with greater efficiency. The approach could be applied to a wide range of tasks, from generating paths for robots to scheduling shipments in an international supply chain.
FAQs
Q: What is CodeSteer?
A: CodeSteer is a smart assistant developed by MIT researchers that guides a large language model (LLM) to switch between code and text generation until it correctly answers a query.
Q: How does CodeSteer work?
A: CodeSteer works in conjunction with a larger LLM: it reviews a query, determines whether text or code is suited to the problem, and generates a prompt telling the larger LLM to use a coding method or textual reasoning. It then reviews each answer and refines its guidance until the answer is correct.
Q: What are the benefits of CodeSteer?
A: CodeSteer can improve the problem-solving capabilities of LLMs for complex tasks, achieving higher accuracy and efficiency.
Q: What is SymBench?
A: SymBench is a dataset of 37 complex symbolic tasks, including spatial reasoning, mathematics, order reasoning, and optimization, built by the researchers to fine-tune and test CodeSteer.
Q: What are the future directions of CodeSteer?
A: The researchers want to streamline CodeSteer to speed up its iterative prompting process and study how to effectively fine-tune a unified model with the ability to switch between textual reasoning and code generation.