GPU-KERNEL

Additional concepts to learn next

Essential (strongly recommended; learning these right away makes a noticeable difference in kernel quality)

  1. ILP (Instruction-Level Parallelism)
  2. Shared Memory Tiling / Double Buffering
  3. cp.async (async global → shared copy)
  4. Tensor Core MMA (warp-level matrix multiply)
  5. Warp-level primitives (__shfl_sync, __ballot_sync)
  6. Latency vs. throughput modeling
  7. Register Spill (local memory)
  8. Launch Bounds / Occupancy Tuning
  9. Cache Hinting (.cg, .ca, .cs)
  10. Memory Access Locality (sliding window, reuse)
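
As a rough illustration of item 1 (ILP), here is a minimal host-side C++ sketch rather than a GPU kernel. The function names `sum_single` and `sum_ilp4` are made up for this example. The idea carries over to the per-thread loop of a kernel: several independent accumulators give the hardware independent instructions to overlap, while a single accumulator forces every add to wait for the previous one.

```cpp
#include <cstddef>
#include <vector>

// Single accumulator: each add depends on the result of the previous add,
// so the loop is one long serial dependency chain.
float sum_single(const std::vector<float>& x) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        acc += x[i];
    }
    return acc;
}

// Four independent accumulators (hypothetical unroll factor of 4):
// the four adds in one iteration do not depend on each other,
// so they can be in flight at the same time.
float sum_ilp4(const std::vector<float>& x) {
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    const std::size_t n = x.size();
    const std::size_t n4 = n - (n % 4);  // largest multiple of 4 that is <= n
    std::size_t i = 0;
    for (; i < n4; i += 4) {
        a0 += x[i];
        a1 += x[i + 1];
        a2 += x[i + 2];
        a3 += x[i + 3];
    }
    for (; i < n; ++i) {  // leftover elements when n is not a multiple of 4
        a0 += x[i];
    }
    return (a0 + a1) + (a2 + a3);
}
```

Note that the two versions add the elements in a different order, so for general floating-point inputs the results can differ by rounding. More accumulators also use more registers, which ties into item 7 (register spill) and item 8 (occupancy).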

Intermediate (concepts that further improve kernel quality once you know them)

  1. SM Scheduling / persistent threads
  2. Data Prefetch / Pipeline staging
  3. Block shape tuning
  4. Pipeline hazards / dependency chain
  5. Branch predication
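
The idea behind item 5 (branch predication) can be sketched on the host in C++. This is only an analogy: on a GPU, predication is applied by the compiler and hardware at the instruction level, not written by hand this way. The function names and the leaky-ReLU example are assumptions chosen for illustration. The first version takes a data-dependent branch; the second computes both candidate results and picks one with a predicate, which is the shape that short divergent branches usually end up in.

```cpp
#include <cstddef>
#include <vector>

// Branchy form: which path runs depends on the data in each element.
void leaky_relu_branch(std::vector<float>& x, float alpha) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] > 0.0f) {
            // positive values pass through unchanged
        } else {
            x[i] = alpha * x[i];
        }
    }
}

// Predicated form: both candidates are computed for every element,
// then a predicate selects the result. There is no divergent control flow.
void leaky_relu_select(std::vector<float>& x, float alpha) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        const float pos = x[i];          // result if the predicate is true
        const float neg = alpha * x[i];  // result if the predicate is false
        const bool pred = x[i] > 0.0f;
        x[i] = pred ? pos : neg;
    }
}
```

The trade-off is that the predicated form always pays for both sides, so it tends to help only when each side is a few instructions long.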