# ShubbhRM / Finance-RL

finance_rl ~ bash
14 Experiments
3 Architectures
2 Algorithms
3 Reward Functions

about --me

$ whoami

ShubbhRM


$ cat bio.txt

ML / AI Engineer with a focus on Reinforcement Learning and Financial Markets. Building trading agents that learn from market data using Deep RL — DQN and PPO — across multiple neural architectures.


$ cat interests.txt

  • Deep Reinforcement Learning (DQN, PPO, DDPG)
  • Financial Markets & Algorithmic Trading
  • Neural Architecture Design (CNN, LSTM, Hybrid)
  • Reward Function Engineering & Sharpe Optimization
  • PyTorch & Stable-Baselines3 ecosystem

$ cat status.txt

Available for collaboration & research

$ cat project.md

# Finance-RL: RL Trading Agents


A systematic study comparing Deep RL algorithms and neural architectures for stock trading. Each experiment varies one axis — architecture, algorithm, or reward signal — to isolate the effect on trading performance.
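As a concrete sketch of the design space (axis values taken from the experiment matrix in this README), the full cross product of the four axes spans 36 configurations, of which this study runs a 14-experiment subset:

```python
from itertools import product

# The four experiment axes, as listed in the results matrix.
architectures = ["MLP", "CNN", "CNN-LSTM"]
algorithms = ["DQN", "PPO"]
strategies = ["OneStockPolicy", "AllInPolicy"]
rewards = ["PnL", "NetWorth", "Sharpe"]

grid = list(product(architectures, algorithms, strategies, rewards))
print(len(grid))  # 36 possible configurations; 14 were actually run
```

Varying one axis at a time against this grid is what lets each experiment isolate a single effect.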


## Research Questions

  • Does CNN beat MLP for price pattern recognition?
  • Does LSTM memory improve sequential decisions?
  • Does Sharpe reward outperform raw P&L?
  • What are the trade-offs between off-policy DQN and on-policy PPO?
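To make the raw-P&L vs. Sharpe question concrete, here is a minimal NumPy sketch (hypothetical helper names, not taken from the notebooks) of the two reward signals computed over a window of per-step returns:

```python
import numpy as np

def pnl_reward(returns):
    """Raw P&L reward: the sum of per-step returns."""
    return float(np.sum(returns))

def sharpe_reward(returns, eps=1e-8):
    """Risk-adjusted reward: mean return scaled by its volatility
    (a rolling Sharpe ratio, without annualization)."""
    r = np.asarray(returns, dtype=float)
    return float(r.mean() / (r.std() + eps))

# Two hypothetical return streams with identical total P&L:
steady = [0.01, 0.01, 0.01, 0.01]
choppy = [0.08, -0.06, 0.08, -0.06]
# pnl_reward scores them equally; sharpe_reward prefers the steady one.
```

This is exactly the distinction the Sharpe experiments (EXP_007, EXP_009) probe: whether penalizing volatility in the reward changes the learned policy.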

## Next: DDPG for continuous action spaces | Sentiment signals | Multi-stock portfolios

skills --tech-stack

> Core competencies across languages, deep learning, financial ML, and tooling.

# languages
Python 92%
Bash / Shell 60%
SQL 55%
# deep-learning
PyTorch 85%
Stable-Baselines3 88%
CNN (Conv1D/2D) 80%
LSTM / RNN 78%
DQN 90%
PPO / Actor-Critic 82%
# finance-ml
RL Trading Envs (Gym) 87%
Sharpe Ratio Optimization 83%
Feature Engineering 78%
Backtesting 72%
OHLCV Data 85%
# tools
Jupyter Notebook 92%
Git / GitHub 80%
NumPy / Pandas 88%
Matplotlib 80%
Conda / Pip 75%

experiments --list

> 14 experiments spanning 3 architectures, 2 algorithms, 2 trading strategies, and 3 reward signals.

EXP_001
DQN + MLP + OneStockPolicy + P&L
MLP DQN PnL OneStockPolicy BASELINE

Baseline MLP agent with simple P&L reward. Buy/Sell/Hold one unit at a time.

→ open notebook
EXP_002
DQN + MLP + OneStockPolicy + NetWorth
MLP DQN NetWorth OneStockPolicy

MLP agent optimizing total portfolio net worth rather than per-trade P&L.

→ open notebook
EXP_003
DQN + MLP + AllInPolicy + P&L
MLP DQN PnL AllInPolicy

AllIn strategy — agent commits 100% of portfolio to a position. Maximum aggression.

→ open notebook
EXP_004
DQN + MLP + AllInPolicy + NetWorth
MLP DQN NetWorth AllInPolicy

AllIn strategy with net-worth-centric reward — tracks total portfolio wealth.

→ open notebook
EXP_005
DQN + MLP + Feature Scaling
MLP DQN PnL OneStockPolicy ABLATION

Ablation study: MLP with feature normalization/scaling enabled. Tests whether scaled inputs improve convergence.

→ open notebook
EXP_006
DQN + CNN + OneStockPolicy + P&L
CNN DQN PnL OneStockPolicy

Custom CNN feature extractor for temporal price pattern recognition. Conv1D layers over OHLCV sequences.

→ open notebook
EXP_007
DQN + CNN + OneStockPolicy + Sharpe
CNN DQN Sharpe OneStockPolicy RISK-ADJUSTED

CNN + Sharpe Ratio reward — convolutional features combined with risk-adjusted optimization.

→ open notebook
EXP_008
DQN + CNN + AllInPolicy + P&L
CNN DQN PnL AllInPolicy

CNN with aggressive AllIn trading strategy. Tests whether convolutional price features support high-conviction entries.

→ open notebook
EXP_009
DQN + CNN + AllInPolicy + Sharpe
CNN DQN Sharpe AllInPolicy ADVANCED

Maximum risk environment: CNN agent making all-in decisions with Sharpe-penalized reward signal.

→ open notebook
EXP_010
DQN + CNN-LSTM + OneStockPolicy + P&L
CNN-LSTM DQN PnL OneStockPolicy HYBRID

Hybrid CNN-LSTM: local Conv1D pattern detection feeding into LSTM(256) for trend memory.

→ open notebook
EXP_011
DQN + CNN-LSTM + AllInPolicy + P&L
CNN-LSTM DQN PnL AllInPolicy

Full sequential CNN-LSTM architecture with aggressive AllIn position sizing.

→ open notebook
EXP_012
PPO + MLP + AllInPolicy + P&L
MLP PPO PnL AllInPolicy PPO

PPO baseline: on-policy Proximal Policy Optimization compared against off-policy DQN, using an MLP policy network.

→ open notebook
EXP_013
PPO + MLP + AllInPolicy + P&L (Copy)
MLP PPO PnL AllInPolicy PPO

Repeat run of EXP_012, kept as a checkpoint/variant for iterating on the on-policy setup.

→ open notebook
EXP_014
ML Modifications Experiment
MLP DQN PnL OneStockPolicy RESEARCH

Broader ML modifications experiment — architecture search, hyperparameter tuning, and policy comparisons.

→ open notebook
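The hybrid architecture in EXP_010/011 can be sketched in plain PyTorch. Only LSTM(256) and Conv1D-over-OHLCV come from the experiment descriptions; the channel counts, kernel sizes, and window length here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class CnnLstmExtractor(nn.Module):
    """Conv1D local pattern detector feeding an LSTM for trend memory.
    Input: (batch, window, 5) OHLCV windows."""
    def __init__(self, n_features=5, hidden=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)

    def forward(self, x):
        # (batch, window, features) -> (batch, features, window) for Conv1d
        z = self.conv(x.transpose(1, 2))
        # back to (batch, window, channels) for the LSTM
        out, _ = self.lstm(z.transpose(1, 2))
        return out[:, -1, :]  # last hidden state as the feature vector

x = torch.randn(8, 30, 5)   # batch of 8 windows of 30 OHLCV bars
feats = CnnLstmExtractor()(x)
print(feats.shape)          # torch.Size([8, 256])
```

In a Stable-Baselines3 setup, a module like this would be wired in as a custom features extractor via `policy_kwargs`.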

results --summary

> Experiment matrix. Populate sharpe and pnl fields in _data/experiments.yml after running notebooks.

| ID      | Architecture | Algorithm | Strategy       | Reward   | Badge         |
|---------|--------------|-----------|----------------|----------|---------------|
| EXP_001 | MLP          | DQN       | OneStockPolicy | PnL      | BASELINE      |
| EXP_002 | MLP          | DQN       | OneStockPolicy | NetWorth |               |
| EXP_003 | MLP          | DQN       | AllInPolicy    | PnL      |               |
| EXP_004 | MLP          | DQN       | AllInPolicy    | NetWorth |               |
| EXP_005 | MLP          | DQN       | OneStockPolicy | PnL      | ABLATION      |
| EXP_006 | CNN          | DQN       | OneStockPolicy | PnL      |               |
| EXP_007 | CNN          | DQN       | OneStockPolicy | Sharpe   | RISK-ADJUSTED |
| EXP_008 | CNN          | DQN       | AllInPolicy    | PnL      |               |
| EXP_009 | CNN          | DQN       | AllInPolicy    | Sharpe   | ADVANCED      |
| EXP_010 | CNN-LSTM     | DQN       | OneStockPolicy | PnL      | HYBRID        |
| EXP_011 | CNN-LSTM     | DQN       | AllInPolicy    | PnL      |               |
| EXP_012 | MLP          | PPO       | AllInPolicy    | PnL      | PPO           |
| EXP_013 | MLP          | PPO       | AllInPolicy    | PnL      | PPO           |
| EXP_014 | MLP          | DQN       | OneStockPolicy | PnL      | RESEARCH      |
# MLP
Stable-Baselines3 default MlpPolicy. Fully-connected layers (256→128→64→actions). Fast to train, strong baseline.
8 experiments
# CNN
Custom Conv1D feature extractor. Learns local price patterns from OHLCV windows. Lower parameter count than an equivalent MLP.
4 experiments
# CNN-LSTM
Hybrid extractor: Conv1D pattern detection feeding an LSTM(256) for trend memory.
2 experiments
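The MLP head described above (256→128→64→actions) as a plain-PyTorch sketch; the 3-action output (Buy/Sell/Hold) and the observation size are assumptions taken from the experiment descriptions, not from the notebooks:

```python
import torch
import torch.nn as nn

def make_mlp_qnet(obs_dim: int, n_actions: int = 3) -> nn.Sequential:
    """Q-network with the layer shape described for the MlpPolicy;
    obs_dim is a placeholder for the flattened observation size."""
    return nn.Sequential(
        nn.Linear(obs_dim, 256), nn.ReLU(),
        nn.Linear(256, 128), nn.ReLU(),
        nn.Linear(128, 64), nn.ReLU(),
        nn.Linear(64, n_actions),  # one Q-value per action: Buy / Sell / Hold
    )

q = make_mlp_qnet(obs_dim=150)
print(q(torch.randn(4, 150)).shape)  # torch.Size([4, 3])
```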

> To populate metrics: extract final Sharpe and P&L values from each notebook and add sharpe: 1.23 & pnl: "+12.4%" fields to _data/experiments.yml.
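For example, a populated entry in _data/experiments.yml might look like this (all metric values are placeholders, not real results):

```yaml
# _data/experiments.yml — one entry per experiment card
- id: EXP_001
  architecture: MLP
  algorithm: DQN
  strategy: OneStockPolicy
  reward: PnL
  badge: BASELINE
  sharpe: 1.23      # placeholder — copy from the notebook's final metrics
  pnl: "+12.4%"     # placeholder
```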

github --stats

> built-with
Python · PyTorch · Stable-Baselines3 · OpenAI Gym · Jupyter