Technology Hive

BERT-Based Multi-Hop Question Answering

by Linda Torries – Tech Writer & Digital Trends Analyst
May 13, 2025
in Technology

Introduction

In a previous article, we addressed a critical limitation of today’s Retrieval-Augmented Generation (RAG) systems: missing contextual information due to independent chunking. However, this is just one of RAG’s shortcomings. Another significant challenge is multi-hop question answering (QA), which involves integrating evidence spread across multiple chunks to derive a complete and accurate response. This means the system must gather and reason over information distributed throughout the document — often spanning non-contiguous sections — to arrive at the correct answer.

The Challenge of Multi-Hop QA

Multi-hop QA tasks require combining multiple pieces of evidence from different sections of a document (or even from multiple documents) to answer a question. Some cases are straightforward. For instance:
“Compare iPhone 14 and iPhone 16 characteristics.” This can be handled with two independent retrievals: one for “iPhone 14” and one for “iPhone 16.”
Others are more complex because one retrieval depends on the result of another: “Give me a detailed comparison of the latest OSX versions.” In this case, a model must first determine what the latest OSX versions are (e.g., Sonoma, Catalina), and then perform separate retrievals or queries for each version before aggregating and comparing the information. This multi-step querying process is computationally expensive and slow.

A Simpler Alternative: Single-Model Chunk Scoring

What if we could avoid query generation altogether and directly score all document chunks against the original question using a single machine learning model? This approach solves three core RAG problems:

  • Eliminates the need for query generation
  • Mitigates the downsides of independent chunking
  • Enables multi-hop reasoning in a single pass

The Nuance: Chunk Embeddings Instead of Token Embeddings

A common objection is: wouldn’t scoring all chunks at once require large context windows, the very problem RAG was meant to solve? Here’s the twist: we replace individual token embeddings with chunk embeddings, where each chunk is up to 200 tokens long. With a sequence length of 500 positions, one chunk per position, this setup can cover up to 500 × 200 = 100,000 tokens’ worth of content in a single input.
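The capacity arithmetic can be checked with a short sketch. The fixed-size splitting below is only an assumption for illustration (the article does not say how chunk boundaries were chosen), and the helper names `chunk_tokens` and `token_capacity` are hypothetical.

```python
def chunk_tokens(tokens, max_chunk_len=200):
    """Split a token list into consecutive chunks of at most max_chunk_len tokens."""
    return [tokens[i:i + max_chunk_len] for i in range(0, len(tokens), max_chunk_len)]


def token_capacity(num_positions=500, max_chunk_len=200):
    """Upper bound on tokens covered when every position holds one full chunk."""
    return num_positions * max_chunk_len


# Toy document of 1,050 placeholder tokens
tokens = [f"tok{i}" for i in range(1050)]
chunks = chunk_tokens(tokens)

print(len(chunks))       # 6 chunks: five of 200 tokens and one of 50
print(token_capacity())  # 100000
```

Each chunk would then be mapped to a single embedding vector, so the scoring model sees one position per chunk rather than one per token.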

Training Setup

Training was conducted on a desktop PC with an 8GB GPU for a few hours. The setup was as follows:

  • Embedding model: jinaai/jina-embeddings-v3
  • Base model for fine-tuning: google-bert/bert-base-uncased
  • Input format: [question, chunk_i_1, chunk_i_2, …, chunk_j_1, chunk_j_2, …] where chunk_i_1 is the first chunk embedding of document_i.
  • Labels: [0, label_i_1, label_i_2, …, label_j_1, label_j_2, …] where label_i_x = 1 if chunk_i_x contributes to the answer, else 0. The leading 0 is the label for the question position.
  • Documents are shuffled for each mini-batch, but the order of chunks within a document is preserved

Training parameters:

  • Batch size: 16
  • Precision: FP32
  • Learning rate: 1e-5
  • Epochs: 100
  • Average sequence length: 511
  • Loss function: Binary Cross-Entropy
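The input and label layout described above can be sketched in plain Python. This is a minimal illustration, not the released training code: it assumes that shuffling applies to the order of documents, uses toy 2-dimensional vectors in place of real jina-embeddings-v3 outputs, and the names `build_example` and `binary_cross_entropy` are hypothetical.

```python
import math
import random


def build_example(question_emb, docs, rng):
    """Build one training sequence.

    docs is a list of documents, each a list of (chunk_embedding, label) pairs.
    Returns (inputs, labels): the question embedding followed by all chunk
    embeddings, and a parallel list of 0/1 labels (0 for the question slot).
    """
    order = list(range(len(docs)))
    rng.shuffle(order)  # assumption: documents are shuffled, not chunks inside them
    inputs, labels = [question_emb], [0]
    for doc_idx in order:
        for emb, label in docs[doc_idx]:  # chunk order within a document is kept
            inputs.append(emb)
            labels.append(label)
    return inputs, labels


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over all positions of one sequence."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Toy data: two documents with 2 and 3 chunks
question = [0.1, 0.9]
docs = [
    [([1.0, 0.0], 1), ([0.9, 0.1], 0)],
    [([0.0, 1.0], 0), ([0.2, 0.8], 1), ([0.3, 0.7], 0)],
]
inputs, labels = build_example(question, docs, random.Random(0))

# An untrained scorer that outputs 0.5 everywhere gives a loss of ln(2)
loss = binary_cross_entropy([0.5] * len(labels), labels)
print(len(inputs), sum(labels), round(loss, 4))  # 6 2 0.6931
```

In the actual setup, the per-position probabilities would come from the fine-tuned BERT model applied to the sequence of chunk embeddings; here they are fixed values so that the loss can be verified by hand.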

Dataset

A hybrid dataset composed of several sources was selected for training.
Test set: 100 samples from HotpotQA

Results

Recall@10 was used as the performance metric: the fraction of relevant chunks that appear among the top 10 candidates.
Baseline comparisons:

  1. Raw cosine similarity using jina-embeddings-v3 embeddings
  2. Cohere Rerank API, which evaluates relevance using a learned scoring model rather than relying solely on cosine similarity. Cosine similarity, while efficient, often fails to capture semantic nuances between a question and a passage. To address this, Cohere and similar companies have developed reranking models that score each retrieved chunk against the query one at a time, assigning a more accurate relevance score based on a contextual understanding of both.
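The first baseline can be sketched as follows. This is a minimal pure-Python version under stated assumptions: the toy 2-dimensional vectors stand in for real embeddings, and `cosine_similarity` and `rank_chunks` are hypothetical helper names, not functions from the released code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(question_emb, chunk_embs, k=10):
    """Return the indices of the k chunks most similar to the question."""
    scored = [(cosine_similarity(question_emb, emb), idx)
              for idx, emb in enumerate(chunk_embs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by index
    return [idx for _, idx in scored[:k]]


question = [1.0, 0.0]
chunk_embs = [[0.0, 1.0], [1.0, 0.1], [0.7, 0.7], [-1.0, 0.0]]
top = rank_chunks(question, chunk_embs, k=3)
print(top)  # [1, 2, 0]
```

Note that each chunk is scored against the question independently, so this baseline cannot use one chunk as context for judging another.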

Sample output:

  • Relevant chunks: [1, 312, 313]
  • Top 10 candidates: [1, 313, 312, 2, 188, 46, 183, 325, 91, 149]
  • Recall@10: 3/3
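For reference, Recall@10 on the sample output above can be computed as in this short sketch. The function name `recall_at_k` is hypothetical, and returning 0.0 when there are no relevant chunks is an assumption made here for convenience.

```python
def recall_at_k(relevant, ranked, k=10):
    """Fraction of relevant chunk ids that appear in the first k ranked ids."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0  # assumption: define recall as 0 when nothing is relevant
    hits = relevant_set & set(ranked[:k])
    return len(hits) / len(relevant_set)


relevant_chunks = [1, 312, 313]
top_10 = [1, 313, 312, 2, 188, 46, 183, 325, 91, 149]

print(recall_at_k(relevant_chunks, top_10))  # 1.0, i.e. 3/3
```

With a smaller cutoff such as k=2, only chunks 1 and 313 would be counted, giving 2/3.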

Open-Sourcing the Code

Want to try this yourself? We’ve open-sourced the code with training and testing scripts, including integrations with Cohere Rerank and cosine similarity baselines.

Future Work

This experiment used bert-base-uncased, which has a limited input size of 512 positions (here, each position holds one chunk embedding rather than one token). Next steps:

  • Experiment with larger models: bert-large, DeBERTa, Longformer, etc.
  • Jointly fine-tune the embedding and scoring models to enable end-to-end optimization. This allows the embedding space to evolve in a way that is directly aligned with the downstream scoring task, potentially improving both chunk relevance and overall retrieval accuracy.
  • Move beyond chunk scoring: fine-tune a large language model such as LLaMA 3 to reason directly over chunk embeddings and generate answers end-to-end. This enables the model to not only retrieve relevant content but also synthesize and articulate coherent multi-hop responses in a single step.

Conclusion

Transformer-based language models continue to impress. In our recent experiments, we observed that they adapt quickly to new input modalities such as character probabilities or chunk embeddings, and their adaptability to complex tasks like multi-hop reasoning is fascinating.

FAQs

Q: What is multi-hop question answering?
A: Multi-hop question answering involves integrating evidence spread across multiple chunks to derive a complete and accurate response.
Q: What is the challenge of multi-hop QA?
A: The challenge of multi-hop QA is that it requires the system to gather and reason over information distributed throughout the document — often spanning non-contiguous sections — to arrive at the correct answer.
Q: What is the proposed solution?
A: The proposed solution is to use a single machine learning model to directly score all document chunks against the original question.
Q: What is chunk embedding?
A: Chunk embedding is a technique that replaces individual token embeddings with chunk embeddings — each chunk being up to 200 tokens long.
Q: What is the future work?
A: The future work includes experimenting with larger models, jointly fine-tuning the embedding and scoring models, and moving beyond chunk scoring to generate answers end-to-end.

© Copyright 2025. All Right Reserved By Technology Hive.
