Sequence Parallelism (SP)

Project Name

Shards activations along the sequence dimension for LayerNorm and dropout, paired with tensor parallelism (TP). Implemented in pure PyTorch.
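
As a rough illustration of why LayerNorm can be sharded along the sequence dimension, the sketch below simulates the sharding in a single process. LayerNorm normalizes over the hidden dimension only, so each token is handled independently and each shard can be processed on its own. The helper names `split_sequence` and `gather_sequence`, the tensor shapes, and the `world_size` value are assumptions made for this example and are not taken from the project. In a real multi-GPU setup the split and gather would be collective operations across the TP group rather than local `chunk` and `cat` calls.

```python
import torch
import torch.nn as nn


def split_sequence(x, world_size, dim=1):
    """Split activations into equal chunks along the sequence dim.

    Hypothetical helper: each chunk stands in for the shard one rank would hold.
    """
    assert x.size(dim) % world_size == 0, "sequence length must divide evenly"
    return list(torch.chunk(x, world_size, dim=dim))


def gather_sequence(shards, dim=1):
    """Concatenate shards back along the sequence dim (stand-in for an all-gather)."""
    return torch.cat(shards, dim=dim)


torch.manual_seed(0)
batch, seq_len, hidden, world_size = 2, 8, 16, 4  # assumed toy sizes

x = torch.randn(batch, seq_len, hidden)
layer_norm = nn.LayerNorm(hidden)

# Reference: LayerNorm over the full sequence.
full_out = layer_norm(x)

# Simulated SP: each "rank" normalizes only its own slice of the sequence.
shards = split_sequence(x, world_size)
shard_outs = [layer_norm(shard) for shard in shards]
sp_out = gather_sequence(shard_outs)

# The sharded result should agree with the unsharded one up to float tolerance.
matches = torch.allclose(full_out, sp_out, atol=1e-5)
```

Dropout is also applied per element, so it can be sharded the same way, although each rank would then draw its own random mask. Per-rank activation memory for these layers shrinks by roughly a factor of `world_size` under this scheme.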