Reinforcement learning for robotic manipulation

New York University · 2025 – present · Role: implementation and experiments

ROLE & KEY CONTRIBUTIONS

Solo, ongoing reinforcement-learning project on robotic manipulation: core value-based algorithms implemented from scratch, together with the simulation environments used to train them.

Implemented Q-Learning and DQN from scratch, including replay buffer, target network, exploration schedules, and reward shaping;
Built MuJoCo environments for a drawer-opening manipulation task;
Stood up a working end-to-end RL pipeline, which I extend as the research progresses.

Overview

To understand RL algorithms at the implementation level — not just as library calls — I built Q-Learning and Deep Q-Networks (DQN) from scratch in Python and applied them to robotic-arm manipulation experiments, including drawer-opening tasks.
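
For readers who want to see what "from scratch" means at the level of the update rule, here is a minimal sketch of tabular Q-Learning with epsilon-greedy exploration. It is illustrative only and not the project code: the toy 1-D chain environment, the function name `train_chain_q_learning`, and every hyperparameter value are assumptions chosen to keep the example self-contained.

```python
import random


def train_chain_q_learning(n_states=6, episodes=500, alpha=0.1, gamma=0.95,
                           eps_start=1.0, eps_end=0.05, seed=0):
    """Tabular Q-Learning on a toy 1-D chain (hypothetical stand-in task).

    States are 0..n_states-1; actions are 0 (left) and 1 (right).
    Reaching the last state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    for ep in range(episodes):
        # Linear epsilon decay from eps_start to eps_end over training.
        eps = eps_start + (eps_end - eps_start) * ep / max(1, episodes - 1)
        s = 0
        for _ in range(100):  # step cap so every episode terminates
            if rng.random() < eps:
                a = rng.randrange(2)  # explore
            else:
                # Exploit, breaking ties between equal Q-values at random.
                best = max(q[s])
                a = rng.choice([i for i in range(2) if q[s][i] == best])

            s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
            done = s_next == goal
            r = 1.0 if done else 0.0

            # Q-Learning target: bootstrap from the greedy value of s_next,
            # except at terminal transitions where only the reward counts.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])

            s = s_next
            if done:
                break
    return q


q_table = train_chain_q_learning()
# Greedy action in each non-terminal state after training.
greedy_policy = [max(range(2), key=lambda a: q_table[s][a])
                 for s in range(len(q_table) - 1)]
print(greedy_policy)
```

On this toy chain the learned greedy policy should prefer moving right in every state, since that is the only route to the reward. A manipulation task replaces the table with a network once the state space is continuous, which is where DQN comes in.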

What's inside

Tabular Q-Learning and DQN implemented from first principles: replay buffer, target network, epsilon-greedy exploration schedules, and reward shaping for sparse manipulation rewards.
MuJoCo simulation environments for the manipulation tasks, connecting my mechanical-design background to the learning pipeline: the same workspace analysis used for hardware validation defines the RL task space.
Drawer-opening as the benchmark task — a contact-rich problem where naive reward design fails and the agent must sequence reaching, grasping, and pulling.
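
The components listed above can be sketched in a few lines each. The block below is a hedged illustration, not the project's implementation: `ReplayBuffer`, `td_targets`, and `shaped_reward` are hypothetical names, the target network is represented by any callable that maps a state to a list of Q-values, and the shaping term assumes the drawer opening distance (in metres) is available as a state variable. The shaping follows the standard potential-based form, r + gamma * phi(s') - phi(s), with phi proportional to the opening distance.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest entries drop out first
        self.rng = random.Random(seed)

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement decorrelates consecutive steps.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, target_q, gamma=0.99):
    """DQN regression targets computed with a separate target network.

    target_q is assumed to be a callable: state -> list of Q-values.
    Terminal transitions use the reward alone, with no bootstrap term.
    """
    return [r if done else r + gamma * max(target_q(s_next))
            for (_, _, r, s_next, done) in batch]


def shaped_reward(sparse_r, opening, opening_next, gamma=0.99, scale=10.0):
    """Potential-based shaping with phi(s) = scale * drawer opening.

    Adds gamma * phi(s') - phi(s) to the sparse task reward, so progress
    on the drawer is rewarded before the sparse success signal fires.
    """
    return sparse_r + gamma * scale * opening_next - scale * opening


# Usage with made-up numbers: a capacity-3 buffer keeps only the last 3 pushes.
buf = ReplayBuffer(capacity=3)
for i in range(5):
    buf.push(i, 0, 0.0, i + 1, i == 4)

batch = buf.sample(2)
targets = td_targets(batch, target_q=lambda s: [0.0, 1.0])
print(len(buf), targets)
```

In a full DQN loop the target network's weights are copied from the online network only every fixed number of steps, which keeps these regression targets from shifting on every gradient update.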

Why it matters for my work

Mechanical designers who understand learning-based control make different hardware choices: torque-transparent actuation, mechanisms whose state is observable, and structures that survive the exploration phase. This project is my bridge between mechanical design and learning-based control.