IR-based execution complete

Until now, the training step ran only through CUDA Graph capture / replay.

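IR-based execution is a separate path from that:

- it interprets the lowered ops produced by compile as-is (an interpreter)
- it eagerly dispatches each op through backend.op_call_out
- it runs the backend op sequence built from the IR directly, instead of going through CUDA Graph replay

In short, it is an executor that interprets the execution plan lowered from the IR and calls the kernels directly, rather than replaying a captured graph.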

Full pipeline: compile - lower - capture - IRExec

compile (IR generation)

compile_ir runs step_fn inside a "with tracing:" block.
During that run, the functional / nn / optim calls are recorded into the IR.

The IR becomes an SSA graph that preserves the semantics of the training step.

lower (IR -> backend ops list)

lower_to_backend_ops(ir) converts the IR into a backend op list:
- the op order is fixed
- the grad input of adam_step is wired to the grad produced by backward
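The two lowering rules above can be sketched as follows. This is an illustration under assumptions, not the real lower_to_backend_ops: the node layout (op, inputs, output), the "backward" op taking (loss, param), and adam_step taking (param, m, v) before wiring are all made up for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical IR node: (op name, input value ids, output value id)
IRNode = Tuple[str, Tuple[str, ...], str]


def lower_to_backend_ops_sketch(ir_nodes: List[IRNode]) -> List[dict]:
    """Lower IR nodes to a flat op list, keeping IR order."""
    grad_of: Dict[str, str] = {}  # param id -> grad id produced by backward
    lowered: List[dict] = []
    for op, inputs, output in ir_nodes:  # iterate in IR order: order is fixed
        if op == "backward":
            # Assumed layout: backward(loss, param) -> grad of param
            grad_of[inputs[1]] = output
        if op == "adam_step":
            param = inputs[0]
            if param not in grad_of:
                raise ValueError(f"no backward grad found for {param!r}")
            # Wire the grad made by backward in as the second input.
            inputs = (param, grad_of[param]) + tuple(inputs[1:])
        lowered.append({"op": op, "inputs": list(inputs), "outputs": [output]})
    return lowered


ir_nodes: List[IRNode] = [
    ("linear", ("x", "w"), "y"),
    ("mse", ("y", "t"), "loss"),
    ("backward", ("loss", "w"), "g_w"),
    ("adam_step", ("w", "m", "v"), "w_new"),
]
lowered = lower_to_backend_ops_sketch(ir_nodes)
```

After lowering, the adam_step entry reads its grad from g_w, the value that backward produced, rather than from any separately allocated grad buffer.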

capture (CUDA graph)

compile_and_capture() warms up, then runs
backend.capture_begin(); step_fn(); backend.capture_end()
Tracing is turned on as well, so trace_ops is also collected.
The resulting artifact holds the executable used for replay.
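The warmup-then-capture sequence can be illustrated with a toy backend. This is only a CPU stand-in: ToyBackend, launch, replay, and compile_and_capture_sketch are hypothetical names, and no real CUDA Graph API is involved. The one behaviour it mimics is that work issued during capture is recorded rather than executed.

```python
from typing import Callable, List, Tuple


class ToyBackend:
    """Toy stand-in for a capturing backend; not a real CUDA Graph API."""

    def __init__(self) -> None:
        self.capturing = False
        self.captured: List[Tuple[Callable, tuple]] = []
        self.trace_ops: List[str] = []

    def capture_begin(self) -> None:
        self.capturing = True
        self.captured.clear()
        self.trace_ops.clear()

    def capture_end(self) -> None:
        self.capturing = False

    def launch(self, name: str, fn: Callable, *args) -> None:
        if self.capturing:
            # Record only; nothing executes while capturing.
            self.captured.append((fn, args))
            self.trace_ops.append(name)
        else:
            fn(*args)

    def replay(self) -> None:
        for fn, args in self.captured:
            fn(*args)


def compile_and_capture_sketch(bk: ToyBackend, step_fn: Callable, warmup: int = 2) -> dict:
    for _ in range(warmup):  # warmup runs execute eagerly
        step_fn(bk)
    bk.capture_begin()
    step_fn(bk)
    bk.capture_end()
    return {"trace_ops": list(bk.trace_ops), "replay": bk.replay}


counter = [0.0]  # a one-element list used as a mutable buffer


def add_one(buf: list) -> None:
    buf[0] += 1.0


def step_fn(bk: ToyBackend) -> None:
    bk.launch("add_one", add_one, counter)


bk = ToyBackend()
art = compile_and_capture_sketch(bk, step_fn)
after_capture = counter[0]  # only the two warmup steps have run
art["replay"]()
after_replay = counter[0]
```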

Running IRExecutor

It is created with IRExecutor.from_artifact(art).
It walks art.lowered in order, as-is,
and eagerly dispatches each op through backend.op_call_out().
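The run loop described above can be sketched like this. The op_call_out signature (op name, input buffers, preallocated output buffers), the LoweredOp layout, and the list-based buffers are assumptions for illustration; the real backend signature may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LoweredOp:
    op: str
    inputs: List[int]   # IR value ids
    outputs: List[int]  # IR value ids


class ToyBackend:
    """Hypothetical backend; writes results into preallocated outputs."""

    def op_call_out(self, op: str, inputs: list, outputs: list) -> None:
        if op == "add":
            outputs[0][:] = [a + b for a, b in zip(inputs[0], inputs[1])]
        elif op == "relu":
            outputs[0][:] = [max(0.0, a) for a in inputs[0]]
        else:
            raise NotImplementedError(op)


class IRExecutorSketch:
    def __init__(self, lowered: List[LoweredOp], env: Dict[int, list], backend: ToyBackend) -> None:
        self.lowered = lowered
        self.env = env
        self.backend = backend

    def run(self) -> None:
        # Walk the lowered list in order and dispatch each op eagerly.
        for item in self.lowered:
            ins = [self.env[v] for v in item.inputs]
            outs = [self.env[v] for v in item.outputs]
            self.backend.op_call_out(item.op, ins, outs)


env = {0: [1.0, -2.0], 1: [0.5, 0.5], 2: [0.0, 0.0], 3: [0.0, 0.0]}
lowered = [LoweredOp("add", [0, 1], [2]), LoweredOp("relu", [2], [3])]
IRExecutorSketch(lowered, env, ToyBackend()).run()
```

Nothing is replayed here: the executor resolves each value id through env and calls the backend once per op, which is why every id must already be bound to a real buffer.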

The key change is the added runtime env binding

The IR alone does not know the actual tensor pointers.

For IRExecutor to run, each IRValue id needs a corresponding real torch.Tensor handle.

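How the env is built

compile_and_capture does not build the env.
Instead, Module.compile
- creates the artifact with compile_and_capture
- builds the env with _build_env_exact
- attaches it with art.attach_env

In other words, env binding has been moved to the framework level, where it can be done exactly.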

Changes per file

executor.py - IRExecutor added

Inputs: ir, lowered, env, backend
Execution: run iterates over lowered and calls bk.op_call_out()

This is the IR-based execution path itself.

artifact.py - env storage added

Added to compileartifact:

env : Dict
attach_env / runtime_env

IRExecutor receives the env through runtime_env.
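A minimal sketch of what env storage on the artifact might look like. The class name CompileArtifactSketch, the field types, and the error raised when no env is attached are assumptions; only the names env, attach_env, and runtime_env come from the notes above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class CompileArtifactSketch:
    ir: Any
    lowered: List[Any]
    env: Optional[Dict[int, Any]] = None  # IR value id -> tensor handle

    def attach_env(self, env: Dict[int, Any]) -> "CompileArtifactSketch":
        # Shallow copy: the dict is new, the tensor handles are shared.
        self.env = dict(env)
        return self

    def runtime_env(self) -> Dict[int, Any]:
        if self.env is None:
            # Assumed behaviour: refuse to run without a bound env.
            raise RuntimeError("no env attached; call attach_env() first")
        return self.env


art = CompileArtifactSketch(ir=None, lowered=[])
try:
    art.runtime_env()
    missing_env_detected = False
except RuntimeError:
    missing_env_detected = True

weight = [1.0, 2.0]  # stand-in for a tensor
art.attach_env({0: weight})
```

Raising when the env is missing is one possible design; it keeps an executor from silently running against unbound value ids.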

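module.py - includes the env builder

Module.compile

- auto-generates train_step
- calls compile_and_capture
- builds the deterministic binding
- attaches the env to the artifact

In other words, the framework now takes responsibility for binding the env that IRExecutor uses, and for binding it exactly.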
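One way a deterministic, exact binding could work is sketched below. How _build_env_exact actually matches IR values to tensors is not described in these notes, so the name-based lookup, the ir_names mapping, and the strict KeyError are all assumptions made for the example.

```python
from typing import Any, Dict


def build_env_exact_sketch(ir_names: Dict[int, str], tensors: Dict[str, Any]) -> Dict[int, Any]:
    """Bind every IR value id to exactly one named tensor, or fail."""
    env: Dict[int, Any] = {}
    for vid in sorted(ir_names):  # sorted ids give a deterministic order
        name = ir_names[vid]
        if name not in tensors:
            # "Exact" here means no guessing: a missing binding is an error.
            raise KeyError(f"no tensor bound for IR value {vid} ({name!r})")
        env[vid] = tensors[name]
    return env


# Hypothetical names; lists stand in for tensors.
tensors = {"x": [0.0], "fc.weight": [1.0], "fc.bias": [2.0]}
ir_names = {2: "fc.bias", 0: "x", 1: "fc.weight"}
env = build_env_exact_sketch(ir_names, tensors)

try:
    build_env_exact_sketch({0: "missing"}, tensors)
    strict = False
except KeyError:
    strict = True
```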

compile.py - lowering + capture

The lowering order is fixed.
The grad produced by backward is connected to the adam_step input.
compile_and_capture handles only capture / trace.

trace.py - torch tensor scalar cache

During tracing, torch.Tensor objects are
- cached under a data_ptr-based key
so the BiasCorr output scalars can be reused / connected during tracing.
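The caching idea can be sketched without torch. FakeTensor is a stand-in that only exposes data_ptr(), which torch.Tensor also provides; TraceCache and value_for are hypothetical names, and the real cache key may include more than the pointer.

```python
from typing import Dict


class FakeTensor:
    """Stand-in exposing data_ptr() the way torch.Tensor does."""

    def __init__(self, ptr: int) -> None:
        self._ptr = ptr

    def data_ptr(self) -> int:
        return self._ptr


class TraceCache:
    """Maps a tensor's storage address to one stable IR value id."""

    def __init__(self) -> None:
        self._by_ptr: Dict[int, int] = {}
        self._next_vid = 0

    def value_for(self, t: FakeTensor) -> int:
        key = t.data_ptr()
        if key not in self._by_ptr:
            self._by_ptr[key] = self._next_vid
            self._next_vid += 1
        return self._by_ptr[key]


cache = TraceCache()
bias_corr = FakeTensor(0x1000)        # scalar produced by one op
same_storage = FakeTensor(0x1000)     # different Python object, same memory
other = FakeTensor(0x2000)

vid_a = cache.value_for(bias_corr)
vid_b = cache.value_for(same_storage)
vid_c = cache.value_for(other)
```

Because the key is the address rather than the Python object, a scalar written by one op and later read by another resolves to the same value id. A general caveat of pointer-only keys is that two views starting at the same address, or memory reused after a free, would also collide.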

adam.py & functional.adam_step_

Tracing path: AdamStep is always emitted.
Runtime path: if grad is None during capture, fail fast.
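The two rules can be sketched as a single function. The signature, the tracing / capturing flags, the behaviour outside capture (silently skipping), and the placeholder update are all assumptions; the real adam_step_ is not shown in these notes.

```python
from typing import List, Optional


class ToyParam:
    def __init__(self, value: float, grad: Optional[float] = None) -> None:
        self.value = value
        self.grad = grad


def adam_step_sketch(p: ToyParam, *, tracing: bool, capturing: bool, trace_log: List[str]) -> None:
    if tracing:
        # Tracing path: always emit, even if grad is not populated yet.
        trace_log.append("AdamStep")
        return
    if p.grad is None:
        if capturing:
            # Fail fast: a step skipped during capture would be missing
            # from every later replay.
            raise RuntimeError("adam_step_: grad is None during capture")
        return  # assumption: outside capture, skip the update
    p.value -= 0.1 * p.grad  # placeholder update, not the real Adam rule


log: List[str] = []
p = ToyParam(1.0)  # grad is None
adam_step_sketch(p, tracing=True, capturing=False, trace_log=log)

try:
    adam_step_sketch(p, tracing=False, capturing=True, trace_log=log)
    failed_fast = False
except RuntimeError:
    failed_fast = True
```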

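Why SSA / alias rules are needed

In the IR / lowering, every output vid is a new value, SSA-style,

but real training mostly performs in-place updates:

bias_add accumulates into y
step_inc increments the step tensor in place
adam_step updates p / m / v in place

So env binding must apply alias rules for these ops, binding each output vid to the same tensor that is updated in place:

bias_add
step_inc
adam_step

Without this,

replay works fine, but
IRExecutor writes into a new buffer, or the update never reaches the actual parameters.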
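The alias rule can be sketched as a table that maps output positions to input positions. The table layout, the assumed adam_step input order (p, grad, m, v), and bind_outputs_sketch are hypothetical; lists stand in for tensors.

```python
from typing import Any, Dict, List

# output index -> input index whose buffer the output must share.
# Input layouts are assumptions: adam_step(p, grad, m, v) -> (p, m, v).
ALIAS_RULES: Dict[str, Dict[int, int]] = {
    "bias_add": {0: 0},
    "step_inc": {0: 0},
    "adam_step": {0: 0, 1: 2, 2: 3},
}


def bind_outputs_sketch(lowered: List[dict], env: Dict[int, Any]) -> Dict[int, Any]:
    for item in lowered:
        rules = ALIAS_RULES.get(item["op"], {})
        for out_idx, vid in enumerate(item["outputs"]):
            if out_idx in rules:
                # Alias: the new SSA id points at the same buffer object.
                env[vid] = env[item["inputs"][rules[out_idx]]]
            elif vid not in env:
                env[vid] = [0.0]  # ordinary op: fresh output buffer
    return env


step = [0.0]                      # the real step counter
env = {0: step, 1: [3.0]}
lowered = [
    {"op": "step_inc", "inputs": [0], "outputs": [2]},
    {"op": "neg", "inputs": [1], "outputs": [3]},
]
bind_outputs_sketch(lowered, env)

env[2][0] += 1.0                  # what an in-place step_inc kernel would do
```

Because value id 2 aliases the original buffer, the increment is visible through step itself. Had id 2 been given a fresh buffer, the write would land there and the real counter would stay at zero, which is the failure mode described above.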

Why allclose is needed rather than exact bitwise equality

IRExecutor calls the same ops eagerly, but it cannot guarantee results that are bit-identical to CUDA Graph replay.
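The notes do not say where the differences come from. One common source in general, offered here only as an illustration, is that floating-point addition is not associative, so any change in accumulation order can flip the last bits. The sketch uses plain Python floats and math.isclose as a scalar analogue of an allclose check.

```python
import math

a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # one accumulation order
right = a + (b + c)   # the same terms, grouped differently

bitwise_equal = left == right
close = math.isclose(left, right, rel_tol=1e-9, abs_tol=0.0)
```

Here the two sums differ in the last bit, so an exact comparison fails while a tolerance-based one passes. That is the reason to validate two execution paths with a tolerance instead of strict equality.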
