Transfer Learning with BERT for Multi-Hop Question Answering

by Linda Torries – Tech Writer & Digital Trends Analyst
August 29, 2025

Introduction to Multi-Hop Question Answering

In previous articles, we addressed a critical limitation of today’s Retrieval-Augmented Generation (RAG) systems: missing contextual information due to independent chunking. However, this is just one of RAG’s shortcomings. Another significant challenge is multi-hop question answering (QA), which involves integrating evidence spread across multiple chunks to derive a complete and accurate response. This means the system must gather and reason over information distributed throughout the document — often spanning non-contiguous sections — to arrive at the correct answer.

The Challenge of Multi-Hop QA

Multi-hop QA tasks demand that multiple pieces of evidence from different sections of a document (or even multiple documents) be combined to answer a question. Some examples are straightforward. For instance, “Compare iPhone 14 and iPhone 16 characteristics” can be handled using two independent retrievals: one for “iPhone 14” and one for “iPhone 16.” But others are more complex, such as causal chains: “Give me a detailed comparison of the latest OSX versions.” In this case, a model must first determine what the latest OSX versions are (e.g., Sonoma, Catalina) and then perform separate retrievals or queries for each version before aggregating and comparing the information. This multi-step querying process is computationally expensive and slow.
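
To make the cost concrete, below is a minimal sketch of this multi-step querying pattern; the helper names (generate_subqueries, retrieve, answer) are illustrative and not from the original article.

# A naive multi-step pipeline: decompose the question, retrieve per sub-query, aggregate.
def multi_hop_answer(question, llm, retriever):
    sub_queries = llm.generate_subqueries(question)  # e.g. one query per OS version
    evidence = []
    for q in sub_queries:  # one retrieval round trip per sub-query
        evidence.extend(retriever.retrieve(q))
    return llm.answer(question, evidence)  # aggregate and compare the evidence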

A Simpler Alternative: Single-Model Chunk Scoring

What if we could avoid query generation altogether and directly score all document chunks against the original question using a single machine learning model? This approach solves three core RAG problems:

  • Eliminates the need for query generation
  • Mitigates the downsides of independent chunking
  • Enables multi-hop reasoning in a single pass

The Nuance: Chunk Embeddings Instead of Token Embeddings

A common objection is: wouldn’t scoring all chunks at once require large context windows, the very problem RAG was meant to solve? Here’s the twist: we replace individual token embeddings with chunk embeddings — each chunk being up to 200 tokens long. With a sequence length of 500, this setup effectively allows us to encode 100,000 tokens’ worth of content.
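
As a concrete illustration, here is a minimal sketch of such a scoring model. The class name ChunkScorer, the linear projection, and the 1024-dimensional Jina embedding size are assumptions; the article does not show the model code. The key point is that BERT consumes precomputed chunk embeddings through inputs_embeds instead of token ids.

import torch.nn as nn
from transformers import BertModel

class ChunkScorer(nn.Module):
    def __init__(self, emb_dim=1024, model_name="google-bert/bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size  # 768 for bert-base
        self.proj = nn.Linear(emb_dim, hidden)  # chunk embedding -> BERT hidden size
        self.head = nn.Linear(hidden, 1)  # per-position relevance logit

    def forward(self, input_values, position_ids=None):
        # input_values: (B, S, emb_dim) -- the question embedding followed by chunk embeddings
        h = self.bert(inputs_embeds=self.proj(input_values), position_ids=position_ids).last_hidden_state
        return self.head(h).squeeze(-1)  # (B, S): one relevance logit per position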

Training Setup

Training was conducted on a desktop PC with an 8GB GPU for a few hours. The setup was as follows:

  • Embedding model: jinaai/jina-embeddings-v3
  • Base model for fine-tuning: google-bert/bert-base-uncased
  • Input format: [question, chunk_i_1, chunk_i_2, …, chunk_j_1, chunk_j_2, …] where chunk_i_1 is the first chunk embedding of document_i.
  • Labels: [0, label_i_1, label_i_2, …, label_j_2, …] where label_i_x = 1 if chunk_i_x contributes to the answer, else 0
  • Chunks are shuffled across documents for each mini-batch, but order within a document is preserved (a toy layout is sketched below)
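
A toy instance of this layout (values illustrative):

# Position 0 is the question; each later position holds one chunk embedding.
input_values = [q_emb, d0_c0, d0_c1, d1_c0]  # question + chunks of two documents
labels       = [0,     1,     0,     1]      # 1 = chunk contributes to the answer
position_ids = [0,     1,     2,     1]      # chunk order restarts for each document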

Training Parameters

  • Batch size: 16
  • Precision: FP32
  • Learning rate: 1e-5
  • Epochs: 100
  • Average sequence length: 511
  • Loss function: Binary Cross-Entropy
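
A minimal training-step sketch under these settings, reusing the ChunkScorer sketch above; masking padded positions out of the loss is an assumption about the original setup.

import torch
import torch.nn.functional as F

model = ChunkScorer()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def train_step(batch):
    logits = model(batch["input_values"], batch["position_ids"])  # (B, S)
    mask = (batch["input_values"].abs().sum(-1) > 0).float()  # 0 for padded positions
    loss = F.binary_cross_entropy_with_logits(logits, batch["labels"].float(), weight=mask)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()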

Dataset

A hybrid dataset composed of several sources was selected for training.
Test set: 100 samples from HotpotQA

Results

Recall@10 was used as the performance metric — how many relevant chunks were retrieved in the top 10 candidates.
Baseline comparisons:

  1. Raw cosine similarity using jinaai embeddings
  2. Cohere Rerank API — which evaluates relevance using a learned scoring model rather than relying solely on cosine similarity. Cosine similarity, while efficient, often fails to capture semantic nuances between a question and a passage. To address this, Cohere and similar companies have developed reranking models that assign a more accurate relevance score based on contextual understanding of the query and each retrieved chunk, scored one pair at a time.
    Sample output:

    • Relevant chunks: [1, 312, 313]
    • Top 10 candidates: [1, 313, 312, 2, 188, 46, 183, 325, 91, 149]
    • Recall@10: 3/3
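
For reference, a small helper that reproduces this Recall@10 computation (names illustrative):

def recall_at_k(relevant, scores, k=10):
    # relevant: indices of gold chunks; scores: one relevance score per chunk
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    hits = len(set(relevant) & set(top_k))
    return hits, len(relevant)

# With the sample above, all of [1, 312, 313] appear in the top 10, so Recall@10 = 3/3.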

Open-Sourcing the Code

Want to try this yourself? We’ve open-sourced the code with training and testing scripts, including integrations with Cohere Rerank and cosine similarity baselines.

Future Work

This experiment used bert-base-uncased, which has a limited input size of 512 tokens. Next steps:

  • Experiment with larger models: bert-large, DeBERTa, Longformer, etc.
  • Jointly fine-tune the embedding and scoring models to enable end-to-end optimization. This allows the embedding space to evolve in a way that is directly aligned with the downstream scoring task, potentially improving both chunk relevance and overall retrieval accuracy.
  • Move beyond chunk scoring: fine-tune a large language model such as LLaMA 3 to reason directly over chunk embeddings and generate answers end-to-end. This enables the model to not only retrieve relevant content but also synthesize and articulate coherent multi-hop responses in a single step.

Final Thoughts

Transformer-based language models continue to impress. Across these experiments, we observed that they adapt quickly to new input modalities, such as character probabilities or chunk embeddings, and their adaptability to complex tasks like multi-hop reasoning is fascinating.

Code Snippets

Chunk Embedding Generation (JinaAI)

# Example code for generating question and chunk embeddings.
# hashf is not defined in the article; a simple content hash is assumed here.
import hashlib
from transformers import AutoModel

def hashf(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()

emb_cache = {}
embedding_model = AutoModel.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True)
embedding_model.eval()
embedding_model.cuda()
for step, (question, chunks_list, labels_list) in enumerate(dataset):  # dataset yields (question, chunks, labels)
    print(f"\rEmb: {step}/10k", end="", flush=True)  # \r keeps progress on one line
    h = hashf(question)
    if h not in emb_cache: emb_cache[h] = embedding_model.encode([question], task="retrieval.query")[0]
    for chunks in chunks_list:
        for chunk in chunks:
            h = hashf(chunk)
            if h not in emb_cache: emb_cache[h] = embedding_model.encode([chunk], task="retrieval.passage")[0]

Cohere Rerank API

# Example usage of the Cohere Rerank API (documents is a list of chunk strings)
import cohere

co = cohere.Client("your-api-key")
documents_reranked = co.rerank(model="rerank-v3.5", query=question, documents=documents, top_n=10)
top_indices = [r.index for r in documents_reranked.results]  # indices into documents

Data Collator (Batch Creator)


# Example code for the custom data collator
import random
from typing import Dict

import torch
from torch.nn.utils.rnn import pad_sequence

class DataCollator:
    def shuffle_lists(self, a, b):
        combined = list(zip(a, b))  # pair corresponding elements
        random.shuffle(combined)  # shuffle the pairs
        a_shuffled, b_shuffled = zip(*combined)  # unzip after shuffling
        return list(a_shuffled), list(b_shuffled)

    def __call__(self, features) -> Dict[str, torch.Tensor]:
        batch = {"input_values": [], "labels": [], "position_ids": []}
        for x in features:
            question, labels_list, chunks_list = x["question"], x["labels_list"], x["chunks_list"]
            question_emb, chunks_emb = emb_cache[hashf(question)], []
            for chunks in chunks_list:
                chunks_emb.append([emb_cache[hashf(chunk)] for chunk in chunks])
            # shuffle documents within the sample; chunk order inside each document is preserved
            chunks_emb, labels_list = self.shuffle_lists(chunks_emb, labels_list)

            # position 0 holds the question; chunk positions restart at 1 for each document
            input_values, labels, position_ids = [question_emb], [0], [0]
            for i, embs in enumerate(chunks_emb):
                for idx, emb in enumerate(embs):
                    input_values.append(emb)
                    position_ids.append(idx + 1)
                    labels.append(labels_list[i][idx])  # one label per chunk

            batch["input_values"].append(torch.tensor(input_values))
            batch["labels"].append(torch.tensor(labels))
            batch["position_ids"].append(torch.tensor(position_ids, dtype=torch.long))

        batch["input_values"] = pad_sequence(batch["input_values"], batch_first=True, padding_value=0)  # B,S,C
        batch["labels"] = pad_sequence(batch["labels"], batch_first=True, padding_value=0)
        batch["position_ids"] = pad_sequence(batch["position_ids"], batch_first=True, padding_value=0)
        return batch
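
A hypothetical end-to-end usage with a PyTorch DataLoader, assuming the dataset and the train_step sketch defined earlier:

from torch.utils.data import DataLoader

loader = DataLoader(dataset, batch_size=16, collate_fn=DataCollator())
for batch in loader:
    loss = train_step(batch)  # training-step sketch above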

Conclusion

In this article, we discussed the challenges of multi-hop question answering and proposed a simpler alternative: single-model chunk scoring. We also provided code snippets for chunk embedding generation, the Cohere Rerank API, and the data collator. Our experiment showed promising results, and we plan to explore further improvements in the future.

FAQs
Q: What is multi-hop question answering?
A: Multi-hop question answering is a type of question answering task that requires integrating evidence from multiple sources to derive a complete and accurate response.
Q: What is the challenge of multi-hop QA?
A: The challenge of multi-hop QA is that it requires the model to gather and reason over information distributed throughout the document, often spanning non-contiguous sections, to arrive at the correct answer.
Q: What is the proposed solution?
A: The proposed solution is to use single-model chunk scoring, which eliminates the need for query generation, mitigates the downsides of independent chunking, and enables multi-hop reasoning in a single pass.
Q: What is the role of chunk embeddings?
A: Chunk embeddings replace individual token embeddings and allow the model to encode larger context windows, enabling it to capture more semantic nuances between the question and the passage.
Q: What is the future work?
A: Future work includes experimenting with larger models (bert-large, DeBERTa, Longformer), jointly fine-tuning the embedding and scoring models for end-to-end optimization, and fine-tuning a large language model such as LLaMA 3 to reason directly over chunk embeddings and generate answers end-to-end.