
Architecture and Design of a Banking AI Assistant

This note summarizes the system design and technical insights from building a banking customer service assistant powered by retrieval-augmented generation (RAG) and tool-calling.


1. Project Context

In the banking industry, customer service teams often handle large volumes of repetitive queries, especially around account management, credit cards, and loans. While traditional FAQ systems rely on keyword matching or rule-based responses, they tend to:

  • Misunderstand intent

  • Provide irrelevant answers

  • Lack integration with backend systems

To address these issues, we designed a hybrid system combining semantic retrieval and LLM-based reranking, with support for triggering backend APIs when needed.


2. System Architecture

At a high level, a user query flows through the pipeline as follows: the query is embedded and matched against a FAISS vector index built from the FAQ knowledge base; the Top-K candidates are reranked by an LLM acting as an index selector; and the selected item is returned as structured JSON, optionally carrying an API endpoint that downstream systems can trigger.


3. Component Breakdown

🧾 a. FAQ + API-Linked Knowledge Base

The knowledge base is stored in JSONL format, where each item includes:

{
  "question": "How do I apply for a credit card?",
  "answer": "You can apply via mobile banking, our website, or at any branch...",
  "api": "/apply-credit-card"
}

The presence of the api field allows downstream processes to determine whether an automated action can be taken.
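
To make the JSONL format concrete, here is a minimal loading sketch; the file name faq_kb.jsonl and the helper names are illustrative rather than taken from the production system.

import json

def load_kb(path):
    # One JSON object per line, as in the example above.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

kb = load_kb("faq_kb.jsonl")
# Items carrying an "api" field can be routed to automated execution.
actionable = [item for item in kb if item.get("api")]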


🧠 b. Embedding & Vector Search with FAISS

  • Documents are chunked and embedded using the BAAI/bge-base-zh model.

  • FAISS is used to build a high-performance local vector index.

  • For any incoming user query, the system retrieves the Top-K semantically similar Q&A items (a minimal retrieval sketch follows this list).
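
Below is a sketch of the indexing and retrieval steps, assuming the sentence-transformers library is used to load BAAI/bge-base-zh; chunking is omitted, and the kb list comes from the loading sketch above.

import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-base-zh")

# Normalized embeddings let inner product act as cosine similarity.
embeddings = model.encode([item["question"] for item in kb],
                          normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(np.asarray(embeddings, dtype="float32"))

def retrieve(query, k=5):
    # Return the Top-K most semantically similar Q&A items.
    q = model.encode([query], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype="float32"), k)
    return [kb[i] for i in ids[0] if i != -1]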


🎯 c. Prompt-Based Reranking (Index Selector)

To avoid hallucinated responses and keep answers grounded:

  • Retrieved Q&A pairs are labeled numerically (0, 1, 2, …).

  • A structured prompt is passed to the LLM asking it to return only the number of the best-matching item.

This pattern turns the language model into a semantic ranker, not a generator—ideal for enterprise settings where control and consistency are critical.
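
The sketch below shows one way to build the selector prompt and strictly parse the reply; the prompt wording is illustrative, not the production prompt.

import re

def build_selector_prompt(query, candidates):
    # Number the retrieved Q&A pairs and ask for an index only.
    numbered = "\n".join(f"{i}. {c['question']}" for i, c in enumerate(candidates))
    return (
        "You are a semantic ranker for a banking FAQ.\n"
        f"User question: {query}\n"
        f"Candidate questions:\n{numbered}\n"
        "Reply with ONLY the number of the best match, or -1 if none fits."
    )

def parse_selection(reply, n_candidates):
    # Accept only an in-range integer; anything else counts as "no match".
    m = re.search(r"-?\d+", reply)
    if m is None:
        return None
    idx = int(m.group())
    return idx if 0 <= idx < n_candidates else None

Constraining the output to a single integer keeps the model from free-generating an answer, which is what makes its behavior auditable.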


🤖 d. Language Model Inference

The system uses an OpenAI-compatible model (e.g., Qwen) to rerank the retrieved results and identify the best response based on intent and similarity.

Fallback logic is implemented: if no suitable match is found, a default response such as “Sorry, I’m unable to determine the answer at the moment.” is returned.
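
A sketch of the inference call against an OpenAI-compatible endpoint, reusing the helpers from the previous sketch; the base_url, API key, and model name are placeholders, not the production configuration.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
FALLBACK = "Sorry, I'm unable to determine the answer at the moment."

def select_best(query, candidates):
    # Ask the model for the index of the best match (see previous sketch).
    resp = client.chat.completions.create(
        model="Qwen2-7B-Instruct",  # placeholder model name
        messages=[{"role": "user",
                   "content": build_selector_prompt(query, candidates)}],
        temperature=0,  # deterministic ranking, not creative generation
    )
    idx = parse_selection(resp.choices[0].message.content, len(candidates))
    return candidates[idx] if idx is not None else None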


🔁 e. Structured Output for API Integration

The final response is returned in a structured JSON format:

{
  "answer": "...",
  "api": "/apply-credit-card",
  "related_questions": ["How to apply for a debit card?", "How to apply for a credit loan?"]
}

The frontend or backend can then:

  • Display the answer directly

  • Offer related queries to the user

  • Automatically trigger the API if the field is present and authorized (see the response-assembly sketch below)
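
The following sketch shows how the structured response might be assembled, drawing related questions from the remaining retrieval candidates; the field names follow the JSON above, while the helper itself is illustrative and reuses FALLBACK from the previous sketch.

def build_response(best, others):
    # Assemble the structured payload consumed by the frontend/backend.
    if best is None:
        return {"answer": FALLBACK, "api": None, "related_questions": []}
    return {
        "answer": best["answer"],
        "api": best.get("api"),  # present only for actionable items
        "related_questions": [c["question"] for c in others[:2]],
    }

Keeping the api field nullable lets the caller gate automated execution on a single truthiness check, subject to its own authorization logic.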


4. Key Learnings

✅ What Worked Well

  • Reduced hallucination risk via controlled prompt selection

  • Clean separation between NLU and execution layers

  • Ready-to-integrate JSON structure for downstream systems

  • Strong support for Chinese Q&A scenarios using bge embeddings

🔄 Future Improvements

  • Multi-turn context tracking for dialogue continuity

  • Graph-based reasoning (e.g., GraphRAG) for complex Q&A

  • Native function-calling integration for auto-execution


5. Final Thoughts

Working on this project gave me firsthand experience designing a scalable, controllable AI assistant architecture for enterprise use. It deepened my understanding of RAG pipelines, prompt engineering, and how to connect natural language understanding with real-world business logic.

In the future, I aim to explore more advanced agent-based architectures that combine memory, planning, and tool execution into a unified AI assistant capable of real workflow automation.

