HACKATHON PROJECT · LOCAL LLM · RAG SYSTEM
RAG Architecture
Five-stage retrieval-augmented generation pipeline — fully offline, runs on your local GPU
Three Specialised Modes
Each mode uses a purpose-built prompt template, tuned for its specific task
Code Generation
GEN — Generate well-commented ML/data-science code using retrieved cuDF context. Prefers GPU-accelerated RAPIDS and PyTorch idioms.
Use ONLY retrieved context. Prefer cuDF over pandas, PyTorch .cuda() over CPU ops. Generate well-commented, production-ready code.
import cudf

def rolling_gpu_mean(df, col, window):
    # Rolling window mean, computed entirely on the GPU
    return df[col].rolling(window=window).mean()
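Because cuDF mirrors the pandas API for this call, the same helper can be sanity-checked on CPU with pandas when no GPU is available — a minimal sketch (the DataFrame contents are illustrative):

```python
import pandas as pd

def rolling_gpu_mean(df, col, window):
    # Identical call on a cudf.DataFrame thanks to the shared pandas-style API
    return df[col].rolling(window=window).mean()

df = pd.DataFrame({"price": [1.0, 2.0, 3.0, 4.0]})
means = rolling_gpu_mean(df, "price", 2)
# First value is NaN (incomplete window), then pairwise means
print(means.tolist())  # [nan, 1.5, 2.5, 3.5]
```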
Bug Hints
DEBUG — Identify bugs and inefficiencies in your code. Returns hints and guidance — not full solutions — to help you learn and fix issues yourself.
Give ONLY hints and pointers — do NOT write the solution. Identify type errors, CPU↔GPU data copies, inefficient loops.
HINT: Check if you're calling .to_pandas() before groupby — this copies data back to CPU, negating GPU benefits.
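A tiny example of the kind of inefficiency this mode flags — a per-row pandas loop that the hint would point toward vectorising (the data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Inefficient: per-row Python loop — DEBUG mode would hint at this
slow = [row["a"] + row["b"] for _, row in df.iterrows()]

# Vectorised fix: one columnar op (and GPU-parallel under cuDF)
fast = (df["a"] + df["b"]).tolist()
print(slow == fast)  # True
```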
GPU Optimisation
CUDA — Suggest GPU rewrites for CPU-bound code. Maps pandas → cuDF, NumPy → CuPy, and explains complexity improvements with CUDA thread-level reasoning.
Rewrite CPU code using cuDF, CuPy, or PyTorch CUDA. Explain speedup in terms of warp parallelism and memory bandwidth.
# GPU (fast): cuDF merge
gdf1.merge(gdf2, on='id') # GPU-parallel hash join
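For contrast, the CPU baseline this mode would rewrite — a minimal pandas sketch (the frames and column names are illustrative); cuDF's `merge` keeps the same call signature but runs as a GPU-parallel hash join:

```python
import pandas as pd

# CPU (slow path): single-threaded pandas merge
df1 = pd.DataFrame({"id": [1, 2, 3], "x": [10, 20, 30]})
df2 = pd.DataFrame({"id": [2, 3, 4], "y": [200, 300, 400]})

# cuDF accepts the same arguments: gdf1.merge(gdf2, on='id')
joined = df1.merge(df2, on="id")
print(joined)  # rows for id 2 and 3 only (inner join by default)
```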
Data Ingestion Pipeline
From raw source files to a queryable vector index — fully automated
Python files are split at def / class boundaries via regex and tagged function / class / top-level; Markdown files are split at # heading markers and tagged markdown. Pipeline scripts: auto_chunker.py → build_index.py → rag_utils.py → prompt_templates.py → llm_utils.py
Technology Stack
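The boundary-detection step can be sketched as a small regex chunker — a simplified illustration, not the actual auto_chunker.py implementation:

```python
import re

def chunk_python_source(source: str):
    """Split Python source at top-level def/class boundaries and tag each chunk."""
    boundary = re.compile(r"^(def |class )", re.MULTILINE)
    starts = [m.start() for m in boundary.finditer(source)]
    chunks, prev = [], 0
    for start in starts + [len(source)]:
        text = source[prev:start].strip()
        if text:
            tag = ("function" if text.startswith("def ")
                   else "class" if text.startswith("class ")
                   else "top-level")
            chunks.append({"tag": tag, "text": text})
        prev = start
    return chunks

sample = "import os\n\ndef f():\n    pass\n\nclass C:\n    pass\n"
print([c["tag"] for c in chunk_python_source(sample)])
# ['top-level', 'function', 'class']
```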
Carefully chosen tools — all open-source, all running locally
Get Started
Clone, index, and query — a few commands to a local GPU code tutor
# 1. Clone & install
git clone https://github.com/ShubbhRM/LLM_Hackathon.git
cd LLM_Hackathon
pip install -r requirements.txt
# 2. Build the vector index (one-time, ~5–10 min)
python auto_chunker.py # chunks 8,528 cuDF files
python build_index.py # embeds & stores in ChromaDB
# 3. Start LM Studio with Qwen2.5-Coder-14B at localhost:1234
# 4. Launch the app
streamlit run app.py
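Under the hood, llm_utils.py talks to LM Studio through its OpenAI-compatible endpoint. A minimal sketch of building such a request (the URL follows the setup above; the helper name and model identifier are illustrative assumptions):

```python
import json
import urllib.request

def build_chat_request(prompt: str, context: str):
    """Build an OpenAI-compatible chat payload for LM Studio (hypothetical helper)."""
    payload = {
        "model": "qwen2.5-coder-14b",  # assumed model id as loaded in LM Studio
        "messages": [
            {"role": "system", "content": f"Use ONLY this retrieved context:\n{context}"},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,
    }
    return urllib.request.Request(
        "http://localhost:1234/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("How do I merge two cuDF frames?", "cudf.DataFrame.merge(...)")
print(req.full_url)
```

Sending the request (e.g. `urllib.request.urlopen(req)`) requires LM Studio to be running locally, so the sketch stops at payload construction.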
LLM_Hackathon/
├── app.py ← Streamlit UI (mode selector, chunk slider)
├── auto_chunker.py ← Multi-format document ingestion
├── build_index.py ← Embedding + ChromaDB indexing
├── rag_utils.py ← Retrieval module (used by app.py)
├── rag_query.py ← Standalone CLI query tool
├── llm_utils.py ← LM Studio OpenAI-compat wrapper
├── prompt_templates.py ← Three task-specific prompt templates
├── dataset/
│ └── cudf/ ← RAPIDS cuDF source (8,528 files — knowledge base)
├── chunks.jsonl ← Generated: chunked documents
└── chroma_db/ ← Generated: persistent vector store