Overview
A custom AI agent that uses Retrieval-Augmented Generation (RAG) for context-aware coding assistance and documentation search, increasing developer productivity by 30%.
The Problem
Developers spend excessive time searching through documentation, Stack Overflow, and codebases. Generic AI assistants lack project-specific context and often provide outdated or incorrect information.
The Solution
Built a RAG-based chatbot that indexes project documentation, API references, and internal codebases into a vector database. Uses Gemini Pro for natural language understanding and code generation, with retrieval ensuring responses are grounded in actual documentation.
Technical Architecture
Retrieval-Augmented Generation system with vector search and LLM integration
Document Ingestion Pipeline
Processes markdown, code files, and API docs into chunked embeddings
Vector Database (Chroma)
Stores and retrieves relevant documentation chunks based on semantic similarity
Gemini Pro Integration
Generates contextual responses using retrieved documentation as context
Code Execution Sandbox
Safely executes generated code snippets for validation
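The retrieval step that the vector database performs can be illustrated with a minimal cosine-similarity search in pure Python (the toy 3-dimensional vectors below stand in for the real 768-dimensional embeddings):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunks, k=5):
    # chunks: list of (chunk_text, embedding_vector) pairs
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy embeddings for illustration only
chunks = [
    ("auth docs", [0.9, 0.1, 0.0]),
    ("deploy docs", [0.0, 0.9, 0.1]),
    ("api reference", [0.8, 0.2, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], chunks, k=2))
```

Chroma handles indexing and nearest-neighbor search at scale, but the ranking principle is the same.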
Methodology
- Indexed 50K+ documentation pages and 100K+ lines of code
- Chunking strategy: 512 tokens with 50-token overlap
- Embeddings: text-embedding-004 (768 dimensions)
- Retrieval: Top-5 chunks with MMR for diversity
- Prompt engineering: Few-shot examples for code generation
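The 512-token / 50-token-overlap chunking strategy can be sketched as a sliding window (whitespace splitting stands in for the real tokenizer here):

```python
def chunk_tokens(tokens, size=512, overlap=50):
    # Slide a fixed-size window over the token stream;
    # consecutive chunks share `overlap` tokens of context
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

# Whitespace tokenization is a stand-in for a real tokenizer
tokens = ("token " * 1000).split()
chunks = chunk_tokens(tokens)
```

The overlap preserves context across chunk boundaries, so a sentence split mid-chunk is still fully retrievable from at least one chunk.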
Results & Impact
Key Impact
- Reduced documentation search time from 15 min to 2 min
- Onboarding time for new developers cut by 40%
- Consistent code style through AI-suggested patterns
- Adopted by 25+ developers across the organization
Challenges & Solutions
Hallucination Prevention
Strict retrieval filtering and confidence thresholds; cite sources in responses
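A minimal sketch of the threshold-based filtering idea (the 0.75 threshold and the tuple layout are illustrative, not the production values):

```python
def filter_by_confidence(scored_chunks, threshold=0.75):
    # scored_chunks: list of (score, chunk_text, source) tuples;
    # keep only chunks whose similarity score clears the threshold
    return [c for c in scored_chunks if c[0] >= threshold]

def answer_or_abstain(scored_chunks, threshold=0.75):
    confident = filter_by_confidence(scored_chunks, threshold)
    if not confident:
        # Abstain rather than risk a hallucinated answer
        return {"answer": "Not found in the indexed documentation.", "sources": []}
    return {
        "answer": "<generated from confident chunks>",
        "sources": [source for _, _, source in confident],
    }

result = answer_or_abstain([(0.9, "Auth uses OAuth2.", "docs/auth.md"),
                            (0.4, "Old notes.", "docs/legacy.md")])
```

When no chunk is confident enough, abstaining with an explicit "not found" beats a fluent but unsupported answer.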
Code Context Understanding
AST parsing and dependency graph analysis for better code comprehension
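For Python sources, the standard-library `ast` module is enough to sketch the dependency-extraction idea: map each top-level function to the names it calls, which forms the edges of a simple call graph.

```python
import ast

def function_dependencies(source):
    # Map each top-level function name to the set of
    # plain-name calls made inside its body
    tree = ast.parse(source)
    deps = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            deps[node.name] = {
                n.func.id
                for n in ast.walk(node)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            }
    return deps

code = """
def load(path):
    return open(path).read()

def process(path):
    return load(path).upper()
"""
deps = function_dependencies(code)
```

This only captures direct `name(...)` calls; method and attribute calls would need extra handling, but it is enough to seed a dependency graph.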
API Rate Limits
Caching layer for common queries and exponential backoff retry logic
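The two mitigations can be sketched together: an exponential-backoff wrapper plus an in-memory cache (`RuntimeError` stands in for the API client's actual rate-limit exception, and the delays are illustrative):

```python
import time

def with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    # Retry with exponentially increasing delay: 1s, 2s, 4s, ...
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError:  # stand-in for the API's rate-limit error
            if attempt == max_retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)

_cache = {}

def cached_query(question, fetch):
    # Serve repeated questions from an in-memory cache,
    # so common queries never hit the API at all
    if question not in _cache:
        _cache[question] = with_backoff(lambda: fetch(question))
    return _cache[question]
```

The `sleep` parameter is injected so tests can skip the real delays.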
Key Implementation
RAG Query Pipeline
class RAGCodeAssistant:
    def __init__(self, vector_db, llm):
        self.vector_db = vector_db
        self.llm = llm

    def query(self, user_question):
        # Retrieve the most relevant documentation chunks
        docs = self.vector_db.similarity_search(
            query=user_question,
            k=5,
            filter={"type": "documentation"},
        )

        # Build context from retrieved docs, keeping each
        # chunk's source so the answer can cite it
        context = "\n\n".join(
            f"Source: {doc.metadata['source']}\n{doc.page_content}"
            for doc in docs
        )

        # Generate a response grounded in the retrieved context
        prompt = f"""You are a coding assistant. Use the following documentation to answer the question.

Documentation:
{context}

Question: {user_question}

Answer with code examples where appropriate. Cite your sources."""
        response = self.llm.generate(prompt)

        return {
            "answer": response,
            "sources": [doc.metadata["source"] for doc in docs],
        }
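The pipeline can be exercised end to end without live services by plugging in stub dependencies. `FakeVectorDB` and `FakeLLM` below are hypothetical test doubles, and the class is repeated in trimmed form so the snippet runs standalone:

```python
from types import SimpleNamespace

# Trimmed copy of the pipeline class so this snippet is self-contained
class RAGCodeAssistant:
    def __init__(self, vector_db, llm):
        self.vector_db = vector_db
        self.llm = llm

    def query(self, user_question):
        docs = self.vector_db.similarity_search(
            query=user_question, k=5, filter={"type": "documentation"}
        )
        context = "\n\n".join(
            f"Source: {doc.metadata['source']}\n{doc.page_content}" for doc in docs
        )
        prompt = f"Documentation:\n{context}\n\nQuestion: {user_question}"
        response = self.llm.generate(prompt)
        return {
            "answer": response,
            "sources": [doc.metadata["source"] for doc in docs],
        }

class FakeVectorDB:
    # Test double: returns a canned chunk regardless of the query
    def similarity_search(self, query, k=5, filter=None):
        doc = SimpleNamespace(
            metadata={"source": "docs/auth.md"},
            page_content="Use OAuth2 bearer tokens for all API calls.",
        )
        return [doc][:k]

class FakeLLM:
    # Test double: echoes a fixed answer
    def generate(self, prompt):
        return "Use a bearer token (see docs/auth.md)."

result = RAGCodeAssistant(FakeVectorDB(), FakeLLM()).query("How do I authenticate?")
```

This kind of test double keeps the retrieval and prompt-assembly logic verifiable without API quota or a populated vector store.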