Template Library

Pick a blueprint and let the config snap it into place.

PyTorch Module

PyTorch Generator

Generate a ready-to-run PyTorch project in seconds.

11 flags 7 files

Distillation — PyTorch

Distillation with PyTorch

Distill GPT-4/Claude/Gemini into a GPT-2 student built from scratch. Full control over every tensor.

15 flags 11 files

Distillation — HuggingFace

Distillation with HuggingFace

Distill API-hosted teachers into a pretrained HF student (DistilGPT2, GPT-2, etc.) using the HF Trainer.

14 flags 9 files

Fine-tuning — HuggingFace

SFT with HuggingFace TRL

Fine-tune any HF model with SFTTrainer. LoRA/QLoRA, 4-bit/8-bit quantization, packing, chat templates, push to Hub.

33 flags 6 files

Fine-tuning — Unsloth

SFT with Unsloth

Fine-tune 4x faster with Unsloth. 4-bit LoRA, RSLoRA, gradient checkpointing, GGUF export.

30 flags 6 files
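
A minimal sketch of the LoRA update that both SFT templates configure, assuming the standard formulation: the frozen weight W is augmented with a trainable low-rank product B·A scaled by alpha/r, and RSLoRA swaps that scale for alpha/sqrt(r). Plain Python lists stand in for tensors, and `lora_forward` and `lora_scale` are hypothetical names, not either template's API.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def lora_scale(alpha, r, use_rslora=False):
    # Standard LoRA scales the update by alpha / r; RSLoRA uses alpha / sqrt(r).
    return alpha / math.sqrt(r) if use_rslora else alpha / r

def lora_forward(x, W, A, B, alpha, use_rslora=False):
    """y = x W^T + scale * x A^T B^T; W stays frozen, only A and B are trained.

    Shapes: x (n, d_in), W (d_out, d_in), A (r, d_in), B (d_out, r).
    """
    scale = lora_scale(alpha, len(A), use_rslora)
    base = matmul(x, transpose(W))
    delta = matmul(matmul(x, transpose(A)), transpose(B))
    return [[b + scale * d for b, d in zip(brow, drow)]
            for brow, drow in zip(base, delta)]
```

Because B is usually initialized to zeros, the adapted layer starts out identical to the frozen one; only the r·(d_in + d_out) adapter entries receive gradients.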

RL — HuggingFace

RLHF with HuggingFace TRL

Preference tuning with DPO, GRPO, PPO, ORPO, or KTO. Full TRL trainer configs, LoRA/QLoRA support.

36 flags 6 files
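
Of the listed methods, DPO has the most compact objective, so here is a minimal sketch of the standard (sigmoid) DPO loss for one preference pair: -log σ(β·[(log π(y_w) − log π_ref(y_w)) − (log π(y_l) − log π_ref(y_l))]). It assumes summed sequence log-probabilities are already computed; `dpo_loss` is a hypothetical helper, not TRL's API.

```python
import math

def softplus(z):
    # log(1 + exp(z)), written to avoid overflow for large |z|.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of completions."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)
```

When the policy equals the reference model the margin is 0 and the loss is log 2; raising the chosen completion's likelihood relative to the reference drives it toward 0.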

RL — Unsloth

RLHF with Unsloth

DPO, GRPO, or ORPO at 4x speed with Unsloth. 4-bit LoRA, merged/GGUF export, push to Hub.

30 flags 6 files

Data Parallelism

Data Parallelism (DP)

Pure PyTorch distributed data parallelism: replicate the model on every GPU, shard each batch across ranks, and all-reduce gradients so every replica applies the same update.

12 flags 8 files
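
A single-process sketch of one data-parallel step, assuming a toy model y ≈ w·x with mean-squared error. Each simulated rank computes a gradient on its shard; `all_reduce_mean` stands in for `torch.distributed.all_reduce` with a SUM op followed by division by the world size. All function names are hypothetical.

```python
def local_grad(w, shard):
    """d/dw of mean((w*x - y)^2) over one rank's shard."""
    return sum(2 * x * (w * x - y) for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Every rank ends up holding the mean of all ranks' values."""
    total = sum(values)
    return [total / len(values)] * len(values)

def data_parallel_step(w, batch, world_size, lr=0.01):
    # Shard the global batch: rank r takes every world_size-th example.
    shards = [batch[r::world_size] for r in range(world_size)]
    grads = [local_grad(w, s) for s in shards]  # each replica holds the same w
    synced = all_reduce_mean(grads)             # gradients agree after all-reduce
    return [w - lr * g for g in synced]         # so replicas stay identical
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so the update matches single-GPU training on the whole batch.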

ZeRO Optimizer

ZeRO (Stages 1-3)

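ZeRO sharding with pure PyTorch. Stage 1 shards optimizer states, stage 2 also shards gradients, and stage 3 also shards parameters. Higher stages cut per-GPU memory further; stage 3 pays for it with extra parameter all-gathers.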

13 flags 9 files
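
A back-of-the-envelope sketch of per-GPU model-state memory per stage, assuming the accounting from the ZeRO paper for mixed-precision Adam: 2 bytes/param for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer states (master weights, momentum, variance). Activations and temporary buffers are ignored; `zero_memory_gb` is a hypothetical helper.

```python
def zero_memory_gb(num_params, world_size, stage, optim_bytes=12):
    """Per-GPU memory (GB, 1e9 bytes) for params, grads, and optimizer states.

    stage 0 = plain data parallelism (everything replicated).
    """
    params = 2 * num_params
    grads = 2 * num_params
    optim = optim_bytes * num_params
    if stage >= 1:
        optim /= world_size   # ZeRO-1: shard optimizer states
    if stage >= 2:
        grads /= world_size   # ZeRO-2: also shard gradients
    if stage >= 3:
        params /= world_size  # ZeRO-3: also shard parameters
    return (params + grads + optim) / 1e9
```

For 7.5B parameters on 64 GPUs this gives 120 GB with no sharding, about 31.4 GB at stage 1, 16.6 GB at stage 2, and 1.9 GB at stage 3.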

Tensor Parallelism

Tensor Parallelism (TP)

Split weight matrices across GPUs with column- and row-parallel linear layers plus attention-head splitting. Pure PyTorch.

12 flags 9 files
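
A list-based sketch of the column-then-row pattern for a two-layer MLP, assuming the Megatron-style split: the first weight A is sharded by columns, so each rank computes a slice of x·A and applies the elementwise nonlinearity locally; the second weight B is sharded by the matching rows, so each rank produces a partial sum that one all-reduce combines. Names are hypothetical; ReLU stands in for GeLU.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def split_cols(m, parts):
    width = len(m[0]) // parts
    return [[row[p * width:(p + 1) * width] for row in m] for p in range(parts)]

def split_rows(m, parts):
    height = len(m) // parts
    return [m[p * height:(p + 1) * height] for p in range(parts)]

def tp_mlp(x, A, B, tp_size):
    """relu(x A) B with A column-parallel and B row-parallel over tp_size ranks."""
    A_shards, B_shards = split_cols(A, tp_size), split_rows(B, tp_size)
    # No communication between the two matmuls: relu is elementwise per column slice.
    partials = [matmul(relu(matmul(x, A_p)), B_p) for A_p, B_p in zip(A_shards, B_shards)]
    out = partials[0]
    for p in partials[1:]:  # stand-in for the all-reduce after the row-parallel layer
        out = add(out, p)
    return out
```

The design choice is that only one all-reduce is needed per MLP in the forward pass, because the column split feeds the row split directly.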

Pipeline Parallelism

Pipeline Parallelism (PP)

Split model layers across GPUs with micro-batch pipelining. GPipe (fill-drain) and 1F1B schedules. Pure PyTorch.

13 flags 9 files
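
A sketch of the per-stage work order for the two schedules, assuming the non-interleaved 1F1B variant: stage s of p runs p − s − 1 warm-up forwards, then alternates one forward with one backward, then drains the remaining backwards. The function names are hypothetical.

```python
def gpipe_schedule(num_microbatches):
    """GPipe (fill-drain): all forwards, then all backwards, on every stage."""
    return ([("F", m) for m in range(num_microbatches)]
            + [("B", m) for m in range(num_microbatches)])

def one_f_one_b_schedule(stage, num_stages, num_microbatches):
    """Order of forward (F) and backward (B) micro-batch ops on one stage."""
    warmup = min(num_stages - stage - 1, num_microbatches)
    ops, fwd, bwd = [], 0, 0
    for _ in range(warmup):                # fill: give later stages work
        ops.append(("F", fwd))
        fwd += 1
    while fwd < num_microbatches:          # steady state: one F, then one B
        ops.append(("F", fwd))
        fwd += 1
        ops.append(("B", bwd))
        bwd += 1
    while bwd < num_microbatches:          # cool-down: drain backwards
        ops.append(("B", bwd))
        bwd += 1
    return ops

def bubble_fraction(num_stages, num_microbatches):
    """Idle fraction of one step for both schedules: (p - 1) / (m + p - 1)."""
    return (num_stages - 1) / (num_microbatches + num_stages - 1)
```

Both schedules have the same bubble; 1F1B's gain is memory, since stage s never holds activations for more than p − s micro-batches at once, versus all m under GPipe.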

Sequence Parallelism

Sequence Parallelism (SP)

Shard the sequence dimension for LayerNorm and dropout, paired with tensor parallelism, to reduce activation memory. Pure PyTorch.

12 flags 10 files
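
A sketch of why sequence sharding is safe for LayerNorm: it normalizes each token independently, so each rank can normalize only its chunk of the sequence and store 1/sp_size of those activations. It assumes a Megatron-style setup in which an all-gather rebuilds the full sequence before the tensor-parallel block (and a reduce-scatter returns to sharded form after it); only the all-gather is simulated here, and all names are hypothetical.

```python
import math

def layer_norm(tokens, eps=1e-5):
    """Normalize each token's hidden vector on its own (no affine parameters)."""
    out = []
    for h in tokens:
        mean = sum(h) / len(h)
        var = sum((v - mean) ** 2 for v in h) / len(h)
        out.append([(v - mean) / math.sqrt(var + eps) for v in h])
    return out

def shard_sequence(tokens, sp_size):
    """Rank r gets a contiguous chunk; assumes seq_len is divisible by sp_size."""
    chunk = len(tokens) // sp_size
    return [tokens[r * chunk:(r + 1) * chunk] for r in range(sp_size)]

def all_gather(shards):
    """Stand-in for the all-gather that restores the full sequence."""
    return [tok for shard in shards for tok in shard]

def sequence_parallel_layer_norm(tokens, sp_size):
    normed = [layer_norm(s) for s in shard_sequence(tokens, sp_size)]
    return all_gather(normed)
```

Because no statistic crosses token boundaries, the gathered result is identical to running LayerNorm on the whole sequence.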

Expert Parallelism

Expert Parallelism (EP)

Mixture-of-Experts with all-to-all token dispatch, top-k routing, a load-balancing loss, and shared experts. Pure PyTorch.

16 flags 9 files
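
A sketch of top-k routing, the resulting dispatch plan, and an auxiliary loss. The loss shown is the Switch Transformer form, num_experts · Σ f_i · P_i, extended to top-k by counting all token-to-expert assignments; the template may use a different variant, and every name here is hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def route_top_k(router_logits, k):
    """Per token: the k highest-probability experts with renormalized gate weights."""
    routes, probs = [], []
    for logits in router_logits:
        p = softmax(logits)
        probs.append(p)
        top = sorted(range(len(p)), key=lambda e: p[e], reverse=True)[:k]
        norm = sum(p[e] for e in top)
        routes.append([(e, p[e] / norm) for e in top])
    return routes, probs

def dispatch_plan(routes, num_experts):
    """Token indices each expert receives; under EP this is what all-to-all sends."""
    plan = [[] for _ in range(num_experts)]
    for t, r in enumerate(routes):
        for e, _ in r:
            plan[e].append(t)
    return plan

def load_balancing_loss(routes, probs, num_experts):
    """num_experts * sum_i f_i * P_i; equals 1.0 when f and P are both uniform.

    f_i: fraction of token->expert assignments sent to expert i.
    P_i: mean router probability for expert i over all tokens.
    """
    assignments = [e for r in routes for e, _ in r]
    f = [assignments.count(i) / len(assignments) for i in range(num_experts)]
    P = [sum(p[i] for p in probs) / len(probs) for i in range(num_experts)]
    return num_experts * sum(fi * pi for fi, pi in zip(f, P))
```

The loss grows when a few experts receive both most of the tokens and most of the router probability, which pushes the router toward spreading load evenly across expert-parallel ranks.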

HuggingFace Pipeline

HuggingFace Generator

Generate a HuggingFace pipeline project with model loading, inference, and configuration.

6 flags 5 files

GPU Kernel Blueprint

Custom CUDA Kernel

Generate custom CUDA kernels with an automatic benchmark harness, a Makefile, and PyTorch integration.

4 flags 4 files