
GPU-KERNEL

List of additional concepts to learn next

명징직조지훈 2025. 11. 18. 17:15

Essential (strongly recommended; learning these right away changes kernel quality)

ILP (Instruction-Level Parallelism)
Shared Memory Tiling / Double Buffering
cp.async (async global → shared copy)
Tensor Core MMA (warp-level matrix multiply)
Warp-level primitives (__shfl_sync, __ballot_sync)
Latency vs Throughput 모델링
Register Spill (local memory)
Launch Bounds / Occupancy Tuning
Cache Hinting (.cg, .ca, .cs)
Memory Access Locality (sliding window, reuse)
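
As a rough illustration of two items from the list above (ILP and warp-level primitives), here is a minimal sum-reduction sketch. The names `sum_ilp_shfl` and `gpu_sum`, the 64x256 launch shape, and the choice of four accumulators are all assumptions made for this example, not tuned values. It also assumes the block size is a multiple of 32 so the full-warp mask is valid, and it omits error checking for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical example kernel: each thread sums a grid-strided slice using
// four independent accumulators (ILP), then the 32 lanes of each warp
// combine their partial sums with __shfl_down_sync.
__global__ void sum_ilp_shfl(const float* in, float* out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;

    // Four accumulators with no dependency on each other, so the four
    // loads/adds per iteration can be in flight at the same time.
    float a0 = 0.f, a1 = 0.f, a2 = 0.f, a3 = 0.f;
    int i = tid;
    for (; i + 3 * stride < n; i += 4 * stride) {
        a0 += in[i];
        a1 += in[i + stride];
        a2 += in[i + 2 * stride];
        a3 += in[i + 3 * stride];
    }
    // Remainder elements that do not fill a full group of four.
    for (; i < n; i += stride) a0 += in[i];

    float v = (a0 + a1) + (a2 + a3);

    // Warp-level tree reduction: after the loop, lane 0 holds the warp sum.
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, offset);

    // One atomic per warp instead of one per thread.
    if ((threadIdx.x & 31) == 0) atomicAdd(out, v);
}

// Hypothetical host wrapper (no error checking, for illustration only).
float gpu_sum(const std::vector<float>& h) {
    int n = static_cast<int>(h.size());
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(float));

    sum_ilp_shfl<<<64, 256>>>(d_in, d_out, n);  // assumed launch shape

    float result = 0.f;
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Calling `gpu_sum` on a host vector returns the total. Because floating-point addition is not associative, the result can differ slightly from a sequential CPU sum for general inputs. Whether four accumulators is the right number depends on register pressure and the target architecture, so it is worth measuring rather than assuming.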

Intermediate, plus concepts that raise kernel quality once you know them

SM Scheduling / persistent threads
Data Prefetch / Pipeline staging
Block shape tuning
Pipeline hazards / dependency chain
Branch predication
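
To make the "persistent threads" item above more concrete, here is a minimal sketch of one common pattern: launch roughly one block per SM and let each block keep pulling tiles from a global atomic counter until the work runs out. The names `double_persistent` and `gpu_double`, the tile size, and the one-block-per-SM choice are assumptions for illustration only, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical example kernel: blocks stay resident and repeatedly claim the
// next tile index from a global counter instead of mapping one block per tile.
__global__ void double_persistent(const float* in, float* out, int n,
                                  int tile, int* next_tile) {
    __shared__ int current;  // tile index claimed by this block
    while (true) {
        if (threadIdx.x == 0) current = atomicAdd(next_tile, 1);
        __syncthreads();              // make the claimed tile visible to all threads
        int base = current * tile;
        __syncthreads();              // everyone has read it before the next overwrite
        if (base >= n) break;         // uniform across the block: no work left

        int end = min(base + tile, n);
        for (int i = base + threadIdx.x; i < end; i += blockDim.x)
            out[i] = 2.0f * in[i];
    }
}

// Hypothetical host wrapper (no error checking, for illustration only).
std::vector<float> gpu_double(const std::vector<float>& h, int tile = 1024) {
    int n = static_cast<int>(h.size());
    std::vector<float> result(n);
    float* d_in = nullptr;
    float* d_out = nullptr;
    int* d_next = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_next, sizeof(int));
    cudaMemcpy(d_in, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_next, 0, sizeof(int));

    // Assumed sizing: one block per SM on the current device.
    int dev = 0, sms = 1;
    cudaGetDevice(&dev);
    cudaDeviceGetAttribute(&sms, cudaDevAttrMultiProcessorCount, dev);

    double_persistent<<<sms, 256>>>(d_in, d_out, n, tile, d_next);

    cudaMemcpy(result.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_next);
    return result;
}
```

For a uniform workload like this one, a plain grid-stride loop would likely do just as well; the atomic work queue tends to pay off when tiles have uneven cost. The number of resident blocks per SM is also something to tune alongside occupancy rather than fix at one.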

