PyTorch SDPA Attention Benchmark

PyTorch provides scaled_dot_product_attention (SDPA) in torch.nn.functional.

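import torch
import torch.nn.functional as F

# Illustrative inputs of shape (B, H, T, Dh); the sizes are placeholders.
q = torch.randn(2, 4, 16, 8)
k = torch.randn(2, 4, 16, 8)
v = torch.randn(2, 4, 16, 8)

out = F.scaled_dot_product_attention(
    q,
    k,
    v,
    is_causal=True,  # apply a causal (lower-triangular) mask
)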

SDPA can run the same attention semantics on a more optimized backend. Depending on the environment and the input shape, PyTorch may select the math, memory-efficient, or FlashAttention backend.

Compare SDPA against manual attention under identical conditions:

same B, H, T, Dh
same dtype
same causal setting
same dropout setting
same correctness tolerance
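The conditions above can be sketched as a correctness check before any timing is done. The manual_attention function here is a hypothetical reference implementation written for this example, not code from the original; the shape, seed, and tolerance are assumed values for float32, and lower-precision dtypes would likely need a looser tolerance.

```python
import math

import torch
import torch.nn.functional as F


def manual_attention(q, k, v, is_causal=True):
    """Hypothetical reference: softmax(QK^T / sqrt(Dh)) V, without dropout."""
    t_q, t_k = q.size(-2), k.size(-2)
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(q.size(-1))
    if is_causal:
        # Lower-triangular mask: position i may attend only to positions <= i.
        mask = torch.ones(t_q, t_k, dtype=torch.bool, device=q.device).tril()
        scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


torch.manual_seed(0)
B, H, T, Dh = 2, 4, 32, 16   # same shape for both paths
dtype = torch.float32        # same dtype for both paths
q, k, v = (torch.randn(B, H, T, Dh, dtype=dtype) for _ in range(3))

# Same causal setting and same dropout setting (disabled) on both sides.
ref = manual_attention(q, k, v, is_causal=True)
out = F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=True)

# Same correctness tolerance applied to the comparison.
torch.testing.assert_close(out, ref, atol=1e-4, rtol=1e-4)
```

Dropout is disabled on both sides because random masks would make the two outputs differ even when both implementations are correct.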

The important question is not “Is SDPA fast?” but “For which shapes and dtypes does which backend actually help?”
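One way to approach that question is a small sweep over shapes and dtypes. The following is a minimal sketch, not a rigorous benchmark: the time_ms helper, the shape list, and the warmup and iteration counts are all assumptions made for illustration, and it times only the default SDPA dispatch.

```python
import time

import torch
import torch.nn.functional as F


def time_ms(fn, warmup=2, iters=5):
    """Hypothetical helper: mean wall-clock time per call in milliseconds."""
    for _ in range(warmup):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for queued GPU work before timing
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1000 / iters


device = "cuda" if torch.cuda.is_available() else "cpu"
shapes = [(1, 4, 64, 32), (1, 4, 256, 32)]  # (B, H, T, Dh), illustrative only
dtypes = [torch.float32] + ([torch.float16] if device == "cuda" else [])

results = {}
for B, H, T, Dh in shapes:
    for dtype in dtypes:
        q, k, v = (
            torch.randn(B, H, T, Dh, device=device, dtype=dtype) for _ in range(3)
        )
        results[(B, H, T, Dh, dtype)] = time_ms(
            lambda: F.scaled_dot_product_attention(q, k, v, is_causal=True)
        )

for key, ms in results.items():
    print(key, f"{ms:.3f} ms")
```

Timing a manual attention baseline, or SDPA restricted to one backend, over the same keys would turn this table into a per-shape, per-dtype comparison.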

Check