AI-Powered Lead Qualification Chatbot Using LLM and Embedded Context
We built a modular AI solution that streamlines lead qualification using LLM and an embedded vector database. Users interact through a user-friendly Streamlit interface, entering natural prompts to assess lead quality and identify key decision-makers. The system preprocesses input, retrieves relevant context from CRM, LinkedIn, and third-party data via Supabase, and delivers intelligent, context-aware responses. Workflow automation with n8n ensures prompt handling and follow-ups. The solution improves sales targeting, reduces manual triage, and enhances LLM accuracy by over 80% through grounded context retrieval.
Hamza
5/8/20242 min read
Sector: Sales and Marketing
Duration: February 2025 - Present
Work Delivered:
Streamlit Front-End Development:
Developed an intuitive Streamlit UI where users input natural language prompts.
Designed clean UX for non-technical users to interact seamlessly with the lead qualification chatbot.
Enabled fast iteration and testing of user queries without requiring technical knowledge.
Natural Language Preprocessing:
Implemented a preprocessing layer to clean and normalize user input.
Applied regex-based sanitization, entity extraction, and noise reduction to improve LLM prompt quality.
Ensured compatibility with the prompt engineering strategy to maximize LLM performance.
Embedded Database Creation and Management:
Used Supabase to build and manage an embedded vector database from multi-source inputs: LinkedIn data, CRM exports, and third-party data feeds.
Transformed raw CRM and LinkedIn data using embedding modules before ingestion.
Enabled semantic search and similarity-based context retrieval for LLM prompts.
LLM Integration and Contextual Prompting:
Integrated an LLM with context-aware retrieval to enhance answer quality and relevance.
Retrieved top-k relevant records based on user intent via vector similarity from the embedded database.
Generated output classifying whether the lead is qualified, and identified key decision-makers within the organization.
Workflow Automation using n8n:
Built an event-driven automation using n8n that triggers the LLM workflow based on user prompt submissions.
Included branching logic for handling multiple input types (e.g., leads vs. companies) and responses (qualified vs. disqualified).
Connected the workflow to backend APIs for logging and follow-up actions.
Pressure Points / Challenges:
LLM Response Accuracy:
Early models struggled with understanding vague or ambiguous prompts.
Added preprocessing and structured context prompts to boost classification precision.
Data Context Relevance:
Initially returned irrelevant or outdated context from the embedded database.
Tuned embedding logic and similarity scoring to improve precision in top-k retrieval.
Hallucinated LLM Responses:
The LLM sometimes generated fabricated company roles or incorrect lead information, especially when context was sparse or mismatched.
Solved this by:
Enhancing embedded database categorization (e.g., tagging records by company, industry, role level).
Ensuring top-k retrieval included accurate metadata about leads.
Introducing a confidence threshold and fallback logic to ask for clarification when ambiguity remained.
Scalability of Contextual Data:
Vector database required performance tuning as data volume increased.
Introduced chunking and partitioning strategies to maintain sub-second retrieval times.
Program/Project Overview:
Scope:
Automate and scale lead qualification and prioritization using AI.
Identify qualified leads and decision-makers based on structured and unstructured data inputs.
Deliver immediate insights to sales and marketing teams for faster engagement.
Collaboration:
Worked with client’s sales and marketing heads to define qualification criteria.
Partnered with CRM admins and data providers to standardize input formats for embedding.
Promises:
Automated Qualification:
All lead prompts analyzed in real-time with LLM-assisted classification.
Standardized Lead Scoring:
Same logic applied to every user query and lead data, reducing bias and subjectivity.
Faster Time to Engagement:
Sales reps get near-instant insight into whether a lead is worth pursuing.
Modular and Scalable Design:
Easy onboarding of new data sources and models with minimal effort.
Problems and Pains (In-Project):
Embedding Data Volume:
CRM and LinkedIn datasets resulted in large vector sizes.
Solved via chunking and compression techniques during vector generation.
Prompt Drift:
Users sometimes input vague or open-ended prompts, which confused the LLM.
Addressed with a standardized set of instructions in the preprocessing layer.
Hallucinated Responses by LLM:
Example: The LLM incorrectly stated that a lead was a VP at a company based on unrelated name matches.
Resolution:
Context was strictly scoped using the embedded database's role and organization filters.
Only embeddings with verified metadata (e.g., role, company match, email domains) were used to construct prompts.
Reduced hallucinations by over 80% after contextual alignment enhancements.
Solution:
Built a modular architecture where components (frontend, LLM, vector DB, automation) operate independently but integrate seamlessly.
Used Supabase to manage both the collector database (raw inputs) and the embedded database (vectorized data).
Streamlined query flow to fetch the most relevant context for LLM-based classification and decision-maker extraction.
Payoffs:
Improved Conversion Rates by focusing on better-qualified leads.
Smarter Sales Targeting with key decision-maker insights baked into output.
LLM Accuracy Confidence: Over 80% reduction in hallucinated or fabricated responses thanks to grounded context.

