nano-vLLM Benchmark Baseline

Last updated: June 30, 2026

inference, nano-vllm, benchmark

Before swapping any kernel, you need a baseline.

At minimum, measure these four values in nano-vLLM:

TTFT: time to the first output token
TPOT: average time per output token
throughput: tokens/sec
peak memory: maximum memory usage during the run
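A minimal sketch of how the first three metrics can be derived from per-request timestamps, assuming you can observe when each output token is produced (for example by timestamping inside the engine's step loop; nano-vLLM's actual interface may differ). `RequestTiming`, `time_stream`, and `throughput` are hypothetical names. TPOT here excludes the first token, a common convention because the first token's latency is prefill-dominated and already counted in TTFT; throughput counts output tokens only. Peak memory does not come from timestamps: with PyTorch on CUDA it can be read via `torch.cuda.max_memory_allocated()` after calling `torch.cuda.reset_peak_memory_stats()` before the run.

```python
import time
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class RequestTiming:
    start: float                                            # request submission time (s)
    token_times: List[float] = field(default_factory=list)  # arrival time of each output token (s)

    @property
    def ttft(self) -> float:
        # TTFT: first output token arrival minus submission (assumes >= 1 token).
        return self.token_times[0] - self.start

    @property
    def tpot(self) -> float:
        # TPOT: mean gap between consecutive output tokens, excluding the first
        # token, whose latency is dominated by prefill and already counted in TTFT.
        n = len(self.token_times)
        if n < 2:
            return 0.0
        return (self.token_times[-1] - self.token_times[0]) / (n - 1)


def time_stream(tokens: Iterable[int]) -> RequestTiming:
    # Hypothetical hook: timestamp each token as a streaming iterator yields it.
    timing = RequestTiming(start=time.perf_counter())
    for _ in tokens:
        timing.token_times.append(time.perf_counter())
    return timing


def throughput(timings: List[RequestTiming]) -> float:
    # Output tokens/sec over the wall-clock window from the earliest submission
    # to the last token of any request (assumes a non-zero window).
    total = sum(len(t.token_times) for t in timings)
    begin = min(t.start for t in timings)
    end = max(t.token_times[-1] for t in timings)
    return total / (end - begin)
```

For example, a request submitted at 0.0 s whose four tokens arrive at 0.5, 0.6, 0.7, and 0.8 s has a TTFT of 0.5 s, a TPOT of about 0.1 s, and contributes 4 tokens over 0.8 s (about 5 tokens/sec) to throughput.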

Record the experimental conditions along with the measurements:

model size
batch/request count
prompt length
output length
dtype
device
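One way to keep conditions and results together is to write one JSON line per run. This is a sketch only: `BenchConditions`, `to_record`, and `comparable` are hypothetical helpers, and the example field values (such as "0.6B" or "bfloat16") are illustrative placeholders, not measured settings.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BenchConditions:
    model_size: str    # e.g. "0.6B" (illustrative string format)
    num_requests: int  # batch / request count
    prompt_len: int    # prompt length in tokens
    output_len: int    # output length in tokens
    dtype: str         # e.g. "bfloat16"
    device: str        # e.g. GPU model name


def to_record(conditions: BenchConditions, metrics: dict) -> str:
    # One JSON line per run: conditions and results stay together, so a later
    # run can be matched against a baseline measured under the same conditions.
    return json.dumps({"conditions": asdict(conditions), "metrics": metrics}, sort_keys=True)


def comparable(a: BenchConditions, b: BenchConditions) -> bool:
    # Dataclass equality compares every field, so any changed condition
    # (dtype, prompt length, device, ...) makes two runs non-comparable.
    return a == b
```

Appending each returned line to a `.jsonl` file gives a running history in which every number carries the conditions it was measured under.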

Only with this baseline can the next path tell whether a kernel swap actually changed serving performance.
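A minimal sketch of that judgment: given baseline and post-swap metrics measured under identical conditions, label each metric as improved, regressed, or within noise. The `Metrics` fields, the `compare` helper, and the 3% default `noise` threshold are all hypothetical; a real threshold should come from the run-to-run spread of repeated baseline runs.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    ttft_s: float
    tpot_s: float
    throughput_tok_s: float
    peak_mem_gb: float


# Latency and memory: lower is better. Throughput: higher is better.
LOWER_IS_BETTER = {"ttft_s": True, "tpot_s": True, "throughput_tok_s": False, "peak_mem_gb": True}


def compare(baseline: Metrics, candidate: Metrics, noise: float = 0.03) -> dict:
    """Return {metric: (relative_change, verdict)} for the candidate vs. the baseline."""
    result = {}
    for name, lower_better in LOWER_IS_BETTER.items():
        base, cand = getattr(baseline, name), getattr(candidate, name)
        change = (cand - base) / base  # positive means the value went up
        if abs(change) <= noise:
            verdict = "within noise"
        elif (change < 0) == lower_better:
            verdict = "improved"
        else:
            verdict = "regressed"
        result[name] = (change, verdict)
    return result
```

Reporting a verdict per metric, rather than a single score, makes trade-offs visible: a kernel can lower TPOT while raising peak memory, and the baseline is what shows both.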
