HACKATHON PROJECT · LOCAL LLM · RAG SYSTEM

8,528 Files Indexed
0% Cloud Dependency
14B Parameters Qwen2.5-Coder
Sub-second Retrieval Latency
3 Task Modes

RAG Architecture

Five-stage retrieval-augmented generation pipeline — fully offline, runs on your local GPU

💬
Query
User prompt enters via Streamlit UI with mode selection (Gen / Debug / GPU)
app.py
📐
Embed
Query encoded by all-MiniLM-L6-v2 into a 384-dim L2-normalised vector
SentenceTransformers
🗄️
Retrieve
ChromaDB cosine search returns top-K chunks from 8,528 indexed cuDF files
ChromaDB
📝
Prompt
Task-specific template injects retrieved context as grounding evidence for the LLM
prompt_templates.py
⚡
Generate
Qwen2.5-Coder-14B on LM Studio returns grounded, GPU-first code answers
LM Studio · localhost:1234
🔒
Fully Offline
No API keys. No cloud calls. Everything runs on localhost.
📚
Grounded Answers
Every response is anchored to real cuDF source code, not hallucination.
⚙️
Context-Configurable
Slider control for retrieved chunks (2–8) balances depth vs. speed.

Three Specialised Modes

Each mode uses a purpose-built prompt template, tuned for its specific task

Data Ingestion Pipeline

From raw source files to a queryable vector index — fully automated

.ipynb
One chunk per notebook cell, tagged by type (code / markdown)
nbformat
.py
Split on def / class boundaries via regex. Tagged: function / class / top-level
regex
.md
Split on blank lines or # heading markers. Tagged: markdown
splitlines
.pdf
Page extraction then blank-line splitting. Tagged: pdf
PyPDF2
01
📂
Ingest & Chunk
auto_chunker.py
Walks dataset/ and splits every file by type. Notebooks → cells. Python → function/class boundaries. Markdown → headings. PDFs → paragraphs.
dataset/ (8,528 files) → chunks.jsonl
nbformat regex PyPDF2
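As a rough sketch of the def/class splitting for .py files (the function name and tag values here are illustrative; the real auto_chunker.py may differ in detail):

```python
import re

# Top-level def/class lines mark chunk boundaries; indented methods do not match.
BOUNDARY = re.compile(r"^(def |class )", flags=re.MULTILINE)

def chunk_python_source(source: str) -> list[dict]:
    """Split a .py file at top-level def/class boundaries."""
    starts = [m.start() for m in BOUNDARY.finditer(source)]
    chunks = []
    # Anything before the first def/class (imports, constants) is top-level.
    if not starts or starts[0] > 0:
        head = source[: starts[0]] if starts else source
        if head.strip():
            chunks.append({"tag": "top-level", "text": head})
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(source)
        text = source[start:end]
        tag = "class" if text.startswith("class ") else "function"
        chunks.append({"tag": tag, "text": text})
    return chunks
```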
02
🧬
Embed & Index
build_index.py
Loads chunks.jsonl, encodes every chunk with all-MiniLM-L6-v2, L2-normalises embeddings, then upserts into ChromaDB in batches of 128.
chunks.jsonl → chroma_db/ (persistent)
SentenceTransformers ChromaDB tqdm
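The L2 normalisation and 128-chunk batching can be sketched in plain Python (the real build_index.py delegates encoding to SentenceTransformers and storage to ChromaDB; the helper names here are hypothetical):

```python
import math

def l2_normalize(vec: list[float]) -> list[float]:
    """L2-normalise so cosine similarity reduces to a plain dot product."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def batched(items: list, batch_size: int = 128):
    """Yield successive batches, matching the 128-chunk upsert batches."""
    for i in range(0, len(items), batch_size):
        yield items[i : i + batch_size]
```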
03
🔎
Query & Retrieve
rag_utils.py
Encodes the user query, performs cosine similarity search in ChromaDB, returns top-K chunks (default 4) with source file metadata.
User query string → Context + metadata list
ChromaDB cosine similarity
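What the cosine search boils down to, assuming embeddings are already L2-normalised (a plain-Python stand-in for the ChromaDB call in rag_utils.py, not the actual implementation):

```python
def top_k(query_vec: list[float], index: list[tuple], k: int = 4) -> list[tuple]:
    """Return the k nearest chunks by cosine similarity.

    `index` is a list of (embedding, metadata) pairs. Because embeddings
    are L2-normalised, cosine similarity is just a dot product.
    """
    scored = [
        (sum(q * d for q, d in zip(query_vec, emb)), meta)
        for emb, meta in index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```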
04
📝
Build Prompt
prompt_templates.py
Selects one of three task-specific prompt templates (Gen / Debug / GPU) and injects the retrieved context as grounding evidence.
Context + mode selection → Full prompt string
f-strings Jinja-like templates
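A minimal sketch of the template-selection step; the template wording and mode keys below are invented for illustration and will not match prompt_templates.py exactly:

```python
# Hypothetical mode templates; the real prompts are longer and more specific.
TEMPLATES = {
    "gen": "Using only the cuDF context below, write GPU-first code.\n\n{context}\n\nTask: {query}",
    "debug": "Using the cuDF context below, find and fix the bug.\n\n{context}\n\nCode: {query}",
    "gpu": "Using the cuDF context below, port this pandas code to cuDF.\n\n{context}\n\nCode: {query}",
}

def build_prompt(mode: str, query: str, chunks: list[str]) -> str:
    """Inject retrieved chunks as grounding evidence into the mode's template."""
    context = "\n---\n".join(chunks)
    return TEMPLATES[mode].format(context=context, query=query)
```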
05
⚡
LLM Inference
llm_utils.py
Sends the prompt to Qwen2.5-Coder-14B via LM Studio's OpenAI-compatible endpoint at localhost:1234. Returns streamed completion.
Full prompt string → Generated code / hints
openai<1.0 LM Studio Qwen2.5-Coder-14B
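The wire format behind that call, sketched as a raw OpenAI-compatible payload (the real llm_utils.py uses the legacy openai SDK; the model name and parameter values here are assumptions):

```python
import json

# LM Studio exposes an OpenAI-compatible chat-completions endpoint.
LM_STUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_request(prompt: str, stream: bool = True) -> bytes:
    """Build the JSON body an OpenAI-protocol client would POST to LM Studio."""
    payload = {
        "model": "qwen2.5-coder-14b",  # name as exposed by LM Studio (assumed)
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # stream tokens as they are generated
    }
    return json.dumps(payload).encode("utf-8")
```

A real call would POST this body to LM_STUDIO_URL (e.g. with urllib.request) and read the streamed completion chunks.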
8,528
Source Files
128
Embed Batch Size
384
Embedding Dims
all-MiniLM-L6-v2
Encoder Model

Technology Stack

Carefully chosen tools — all open-source, all running locally

🧠 LLM
Qwen2.5-Coder-14B
State-of-the-art 14B parameter code LLM, runs locally via LM Studio. Zero cloud dependency.
🗄️ Vector DB
ChromaDB
Persistent vector database storing dense embeddings of 8,528 cuDF source files.
📐 Embeddings
SentenceTransformers
all-MiniLM-L6-v2 encodes queries and document chunks into 384-dim embedding space.
⚡ GPU
RAPIDS cuDF
GPU-accelerated DataFrame library — the entire source repo forms the RAG knowledge base.
🖥️ Inference
LM Studio
OpenAI-compatible local inference server. Runs Qwen2.5-Coder at localhost:1234.
🎛️ UI
Streamlit
Interactive web UI with mode selector, chunk-count slider, and source citations.
🔥 ML
PyTorch
Deep learning backbone for embedding computation and GPU tensor operations.
🔌 API
OpenAI SDK (v0.x)
Legacy openai library used to speak the OpenAI protocol with LM Studio's local server.

Get Started

Clone, index, and query — three commands to a local GPU code tutor

Python 3.10+ LM Studio ChromaDB RAPIDS cuDF Streamlit
bash — LLM_Hackathon setup
# 1. Clone & install
git clone https://github.com/ShubbhRM/LLM_Hackathon.git
cd LLM_Hackathon
pip install -r requirements.txt

# 2. Build the vector index (one-time, ~5–10 min)
python auto_chunker.py   # chunks 8,528 cuDF files
python build_index.py    # embeds & stores in ChromaDB

# 3. Start LM Studio with Qwen2.5-Coder-14B at localhost:1234
# 4. Launch the app
streamlit run app.py
Project structure
LLM_Hackathon/
├── app.py               ← Streamlit UI (mode selector, chunk slider)
├── auto_chunker.py      ← Multi-format document ingestion
├── build_index.py       ← Embedding + ChromaDB indexing
├── rag_utils.py         ← Retrieval module (used by app.py)
├── rag_query.py         ← Standalone CLI query tool
├── llm_utils.py         ← LM Studio OpenAI-compat wrapper
├── prompt_templates.py  ← Three task-specific prompt templates
├── dataset/
│   └── cudf/            ← RAPIDS cuDF source (8,528 files — knowledge base)
├── chunks.jsonl         ← Generated: chunked documents
└── chroma_db/           ← Generated: persistent vector store