HACKATHON PROJECT · LOCAL LLM · RAG SYSTEM

8,528 Files Indexed
0% Cloud Dependency
14B Parameters Qwen2.5-Coder
Sub-second Retrieval Latency
3 Task Modes

RAG Architecture

Five-stage retrieval-augmented generation pipeline — fully offline, runs on your local GPU

💬
Query
User prompt enters via Streamlit UI with mode selection (Gen / Debug / GPU)
app.py
📐
Embed
Query encoded by all-MiniLM-L6-v2 into a 384-dim L2-normalised vector
SentenceTransformers
🗄️
Retrieve
ChromaDB cosine search returns top-K chunks from 8,528 indexed cuDF files
ChromaDB
📝
Prompt
Task-specific template injects retrieved context as grounding evidence for the LLM
prompt_templates.py
⚡
Generate
Qwen2.5-Coder-14B on LM Studio returns grounded, GPU-first code answers
LM Studio · localhost:1234
🔒
Fully Offline
No API keys. No cloud calls. Everything runs on localhost.
📚
Grounded Answers
Every response is anchored to real cuDF source code, not hallucination.
⚙️
Context-Configurable
Slider control for retrieved chunks (2–8) balances depth vs. speed.

Three Specialised Modes

Each mode uses a purpose-built prompt template, tuned for its specific task

Data Ingestion Pipeline

From raw source files to a queryable vector index — fully automated

.ipynb
One chunk per notebook cell, tagged by type (code / markdown)
nbformat
.py
Split on def / class boundaries via regex. Tagged: function / class / top-level
regex
.md
Split on blank lines or # heading markers. Tagged: markdown
splitlines
.pdf
Page extraction then blank-line splitting. Tagged: pdf
PyPDF2
01
📂
Ingest & Chunk
auto_chunker.py
Walks dataset/ and splits every file by type. Notebooks → cells. Python → function/class boundaries. Markdown → headings. PDFs → paragraphs.
dataset/ (8,528 files) → chunks.jsonl
nbformat regex PyPDF2
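As a rough sketch of the def/class splitting for .py files (the function name and tag values here are illustrative; the real auto_chunker.py may differ in detail):

```python
import re

# Top-level def/class lines mark chunk boundaries; indented methods do not match.
BOUNDARY = re.compile(r"^(def |class )", flags=re.MULTILINE)

def chunk_python_source(source: str) -> list[dict]:
    """Split a .py file at top-level def/class boundaries."""
    starts = [m.start() for m in BOUNDARY.finditer(source)]
    chunks = []
    # Anything before the first def/class (imports, constants) is top-level.
    if not starts or starts[0] > 0:
        head = source[: starts[0]] if starts else source
        if head.strip():
            chunks.append({"tag": "top-level", "text": head})
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(source)
        text = source[start:end]
        tag = "class" if text.startswith("class ") else "function"
        chunks.append({"tag": tag, "text": text})
    return chunks
```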
02
🧬
Embed & Index
build_index.py
Loads chunks.jsonl, encodes every chunk with all-MiniLM-L6-v2, L2-normalises embeddings, then upserts into ChromaDB in batches of 128.
chunks.jsonl → chroma_db/ (persistent)
SentenceTransformers ChromaDB tqdm
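The L2 normalisation and 128-chunk batching can be sketched in plain Python (the real build_index.py delegates encoding to SentenceTransformers and storage to ChromaDB; the helper names here are hypothetical):

```python
import math

def l2_normalize(vec: list[float]) -> list[float]:
    """L2-normalise so cosine similarity reduces to a plain dot product."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def batched(items: list, batch_size: int = 128):
    """Yield successive batches, matching the 128-chunk upsert batches."""
    for i in range(0, len(items), batch_size):
        yield items[i : i + batch_size]
```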
03
🔎
Query & Retrieve
rag_utils.py
Encodes the user query, performs cosine similarity search in ChromaDB, returns top-K chunks (default 4) with source file metadata.
User query string → Context + metadata list
ChromaDB cosine similarity
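What the cosine search boils down to, assuming embeddings are already L2-normalised (a plain-Python stand-in for the ChromaDB call in rag_utils.py, not the actual implementation):

```python
def top_k(query_vec: list[float], index: list[tuple], k: int = 4) -> list[tuple]:
    """Return the k nearest chunks by cosine similarity.

    `index` is a list of (embedding, metadata) pairs. Because embeddings
    are L2-normalised, cosine similarity is just a dot product.
    """
    scored = [
        (sum(q * d for q, d in zip(query_vec, emb)), meta)
        for emb, meta in index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```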
04
📝
Build Prompt
prompt_templates.py
Selects one of three task-specific prompt templates (Gen / Debug / GPU) and injects the retrieved context as grounding evidence.
Context + mode selection → Full prompt string
f-strings Jinja-like templates
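A minimal sketch of the template-selection step; the template wording and mode keys below are invented for illustration and will not match prompt_templates.py exactly:

```python
# Hypothetical mode templates; the real prompts are longer and more specific.
TEMPLATES = {
    "gen": "Using only the cuDF context below, write GPU-first code.\n\n{context}\n\nTask: {query}",
    "debug": "Using the cuDF context below, find and fix the bug.\n\n{context}\n\nCode: {query}",
    "gpu": "Using the cuDF context below, port this pandas code to cuDF.\n\n{context}\n\nCode: {query}",
}

def build_prompt(mode: str, query: str, chunks: list[str]) -> str:
    """Inject retrieved chunks as grounding evidence into the mode's template."""
    context = "\n---\n".join(chunks)
    return TEMPLATES[mode].format(context=context, query=query)
```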
05
⚡
LLM Inference
llm_utils.py
Sends the prompt to Qwen2.5-Coder-14B via LM Studio's OpenAI-compatible endpoint at localhost:1234. Returns streamed completion.
Full prompt string → Generated code / hints
openai<1.0 LM Studio Qwen2.5-Coder-14B
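The wire format behind that call, sketched as a raw OpenAI-compatible payload (the real llm_utils.py uses the legacy openai SDK; the model name and parameter values here are assumptions):

```python
import json

# LM Studio exposes an OpenAI-compatible chat-completions endpoint.
LM_STUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_request(prompt: str, stream: bool = True) -> bytes:
    """Build the JSON body an OpenAI-protocol client would POST to LM Studio."""
    payload = {
        "model": "qwen2.5-coder-14b",  # name as exposed by LM Studio (assumed)
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # stream tokens as they are generated
    }
    return json.dumps(payload).encode("utf-8")
```

A real call would POST this body to LM_STUDIO_URL (e.g. with urllib.request) and read the streamed completion chunks.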
8,528
Source Files
128
Embed Batch Size
384
Embedding Dims
all-MiniLM-L6-v2
Encoder Model

Technology Stack

Carefully chosen tools — all open-source, all running locally

🧠 LLM
Qwen2.5-Coder-14B
State-of-the-art 14B parameter code LLM, runs locally via LM Studio. Zero cloud dependency.
🗄️ Vector DB
ChromaDB
Persistent vector database storing dense embeddings of 8,528 cuDF source files.
📐 Embeddings
SentenceTransformers
all-MiniLM-L6-v2 encodes queries and document chunks into 384-dim embedding space.
⚡ GPU
RAPIDS cuDF
GPU-accelerated DataFrame library — the entire source repo forms the RAG knowledge base.
🖥️ Inference
LM Studio
OpenAI-compatible local inference server. Runs Qwen2.5-Coder at localhost:1234.
🎛️ UI
Streamlit
Interactive web UI with mode selector, chunk-count slider, and source citations.
🔥 ML
PyTorch
Deep learning backbone for embedding computation and GPU tensor operations.
🔌 API
OpenAI SDK (v0.x)
Legacy openai library used to speak the OpenAI protocol with LM Studio's local server.

Get Started

Clone, index, and query — three commands to a local GPU code tutor

Python 3.10+ LM Studio ChromaDB RAPIDS cuDF Streamlit
bash — LLM_Hackathon setup
# 1. Clone & install
git clone https://github.com/ShubbhRM/LLM_Hackathon.git
cd LLM_Hackathon
pip install -r requirements.txt

# 2. Build the vector index (one-time, ~5–10 min)
python auto_chunker.py   # chunks 8,528 cuDF files
python build_index.py    # embeds & stores in ChromaDB

# 3. Start LM Studio with Qwen2.5-Coder-14B at localhost:1234
# 4. Launch the app
streamlit run app.py
Project structure
LLM_Hackathon/
├── app.py               ← Streamlit UI (mode selector, chunk slider)
├── auto_chunker.py      ← Multi-format document ingestion
├── build_index.py       ← Embedding + ChromaDB indexing
├── rag_utils.py         ← Retrieval module (used by app.py)
├── rag_query.py         ← Standalone CLI query tool
├── llm_utils.py         ← LM Studio OpenAI-compat wrapper
├── prompt_templates.py  ← Three task-specific prompt templates
├── dataset/
│   └── cudf/            ← RAPIDS cuDF source (8,528 files — knowledge base)
├── chunks.jsonl         ← Generated: chunked documents
└── chroma_db/           ← Generated: persistent vector store