MCIR Specification - Memory-Centric Intermediate Representation

1. Design Goal

MCIR is an IR for expressing memory-aware computation structure.

A conventional IR expresses a computation as follows.

Y = f(X)

In GPU execution, however, what matters is not the computation alone but the following:

which intermediates are stored
which state can be streamed
which computation is closed within a tile

MCIR therefore expresses a computation as follows.

Y = f(X)
with memory properties

In other words, MCIR is an IR that expresses

Operator semantics
+
Memory behavior

at the same time.

2. Core Abstraction

The basic unit of MCIR is the Node.

Each Node carries the following information.

Node {
    op
    inputs
    outputs
    attributes
    memory_properties
}

Example

Node {
    op: matmul
    inputs: [A, B]
    outputs: [C]

    memory_properties:
        tile_compatible_compute
}
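As a rough illustration, the Node schema above could be mirrored by a small Python dataclass. The field types (lists of tensor names, a set of property names) are assumptions made for this sketch; the text does not prescribe them.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One MCIR node: operator semantics plus memory behavior."""
    op: str
    inputs: list[str]
    outputs: list[str]
    # Assumed representation: free-form operator attributes.
    attributes: dict[str, object] = field(default_factory=dict)
    # Assumed representation: property names stored as a set of strings.
    memory_properties: set[str] = field(default_factory=set)


# The matmul example from the text.
matmul = Node(
    op="matmul",
    inputs=["A", "B"],
    outputs=["C"],
    memory_properties={"tile_compatible_compute"},
)
```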

3. Memory Property Schema

To express memory optimizations, MCIR uses a property system.

Four properties are currently defined.

online_reducible_norm
weighted_streaming_reduction
rematerializable_intermediate
tile_compatible_compute

Each property has the following structure.

Property {
    name
    legality_conditions
    rewrite_rule
    lowering_strategy
}
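One possible way to make the Property structure concrete is sketched below. Modeling legality conditions as predicates over a plain dict, the rewrite rule as a (before, after) pair, and the strategy name `recompute_in_consumer` are all assumptions for illustration, not part of the specification.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Property:
    name: str
    # Assumption: each condition is a predicate over a node description (a dict).
    legality_conditions: tuple[Callable[[dict], bool], ...]
    # Assumption: a rewrite rule is a (pattern before, pattern after) pair.
    rewrite_rule: tuple[str, str]
    # Assumption: a free-form hint consumed by the backend.
    lowering_strategy: str

    def is_legal(self, node: dict) -> bool:
        """The property applies only if every legality condition holds."""
        return all(cond(node) for cond in self.legality_conditions)


REMAT = Property(
    name="rematerializable_intermediate",
    legality_conditions=(
        lambda n: n.get("pure", False),
        lambda n: n.get("deterministic", False),
        lambda n: n.get("side_effect_free", False),
    ),
    rewrite_rule=("store intermediate", "recompute"),
    lowering_strategy="recompute_in_consumer",  # hypothetical strategy name
)
```

A node description that misses any one condition is rejected, so the rewrite never fires on an op with side effects.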

4. Property Definition

4.1 online_reducible_norm

Meaning

The statistics computation can be performed in a streaming fashion.

legality

statistics state mergeable
update rule associative

rewrite

two_pass_norm
-> streaming_norm
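The two legality conditions (a mergeable statistics state and an associative update) can be illustrated with Welford-style running statistics. This is a minimal sketch; the class name `NormState` and its fields are assumptions, and Welford's method is used here as one well-known instance of a mergeable state, not as the method MCIR mandates.

```python
from dataclasses import dataclass


@dataclass
class NormState:
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        """Single-pass update with one new element."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "NormState") -> "NormState":
        """Combine two partial states, e.g. from two tiles."""
        if other.count == 0:
            return NormState(self.count, self.mean, self.m2)
        if self.count == 0:
            return NormState(other.count, other.mean, other.m2)
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return NormState(n, mean, m2)

    @property
    def variance(self) -> float:
        # Population variance; only meaningful when count > 0.
        return self.m2 / self.count


def stream_stats(xs):
    state = NormState()
    for x in xs:
        state.update(x)
    return state


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
whole = stream_stats(data)
merged = stream_stats(data[:2]).merge(stream_stats(data[2:]))
```

Because `merge` gives the same statistics as a single pass over all the data, the mean and variance never require a second pass over a stored tensor.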

4.2 weighted_streaming_reduction

Meaning

A normalized weighted reduction can be computed in a streaming fashion.

Representative example

softmax(QK^T)V

Node {
    op: weighted_reduce
    inputs: [scores, values]

    memory_properties:
        weighted_streaming_reduction
}

legality

rescaling invariant
normalization mergeable

rewrite

materialized attention
-> streaming attention
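The rescaling step behind this rewrite can be sketched for a single query row. The version below assumes finite scores and uses scalar values for brevity (in attention each value would be a vector); the function names and the chunk size are illustrative only.

```python
import math


def materialized(scores, values):
    """Reference: build all softmax weights, then reduce."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def streaming(scores, values, chunk=2):
    """Process scores block by block, keeping only a small running state."""
    running_max = -math.inf
    denom = 0.0  # running softmax denominator
    acc = 0.0    # running weighted sum of values
    for start in range(0, len(scores), chunk):
        s_blk = scores[start:start + chunk]
        v_blk = values[start:start + chunk]
        new_max = max(running_max, max(s_blk))
        # Rescale the old state to the new maximum (exp(-inf) is 0.0 on the first block).
        scale = math.exp(running_max - new_max)
        denom *= scale
        acc *= scale
        for s, v in zip(s_blk, v_blk):
            w = math.exp(s - new_max)
            denom += w
            acc += w * v
        running_max = new_max
    return acc / denom


scores = [0.5, 2.0, -1.0, 3.0, 0.0]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
```

The state carried between blocks is just `(running_max, denom, acc)`, so the full weight vector is never stored.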

4.3 rematerializable_intermediate

Meaning

An intermediate tensor can be recomputed instead of being stored.

Node {
    op: gelu
    inputs: [X]

    memory_properties:
        rematerializable_intermediate
}

legality

pure function
deterministic
side-effect free

rewrite

store intermediate
-> recompute
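A small sketch of the store-versus-recompute trade-off follows. It assumes a toy computation `y = gelu(x)^2` and the tanh approximation of GELU; the function names and the `saved` dict layout are hypothetical.

```python
import math


def gelu(x: float) -> float:
    # tanh approximation of GELU: pure, deterministic, side-effect free.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def forward_store(xs):
    """Keeps the gelu output alive for the backward pass."""
    h = [gelu(x) for x in xs]  # materialized intermediate
    y = [v * v for v in h]
    return y, {"h": h}


def forward_remat(xs):
    """Saves only the input; the intermediate lives for one iteration."""
    y = []
    for x in xs:
        h = gelu(x)
        y.append(h * h)
    return y, {"x": xs}


def grad_wrt_h(saved, grad_y):
    # d(h*h)/dh = 2h; h is either read back or recomputed from x.
    h = saved["h"] if "h" in saved else [gelu(x) for x in saved["x"]]
    return [2.0 * hv * g for hv, g in zip(h, grad_y)]


xs = [-1.0, 0.0, 2.0]
y_store, saved_store = forward_store(xs)
y_remat, saved_remat = forward_remat(xs)
```

Both paths give identical gradients because `gelu` satisfies the legality conditions; an op with randomness or side effects would not.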

4.4 tile_compatible_compute

Meaning

The computation can form an execution structure that is closed within a tile.

Node {
    op: matmul
    inputs: [A, B]

    memory_properties:
        tile_compatible_compute
}

legality

dependency closure within tile
working-set fits on-chip

rewrite

naive loops
-> tiled schedule
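The naive-loops-to-tiled-schedule rewrite can be illustrated with a plain-Python matmul. The local `acc` tile stands in for on-chip storage; the tile size and the `working_set_bytes` helper are assumptions for this sketch.

```python
def matmul_naive(A, B):
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_tiled(A, B, tile=2):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            rows, cols = min(tile, n - i0), min(tile, m - j0)
            # Accumulator tile: every dependency of this output tile is
            # resolved here before anything is written back.
            acc = [[0.0] * cols for _ in range(rows)]
            for p0 in range(0, k, tile):
                for i in range(rows):
                    for j in range(cols):
                        for p in range(p0, min(p0 + tile, k)):
                            acc[i][j] += A[i0 + i][p] * B[p][j0 + j]
            for i in range(rows):
                for j in range(cols):
                    C[i0 + i][j0 + j] = acc[i][j]
    return C


def working_set_bytes(tile, bytes_per_elem=4):
    # One square tile each of A, B, and the accumulator.
    return 3 * tile * tile * bytes_per_elem


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
B = [[1.0, 0.0], [0.0, 1.0], [2.0, 3.0]]
```

Comparing `working_set_bytes(tile)` against the on-chip capacity of the target is one way to check the "working-set fits on-chip" condition.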

5. Graph Representation

MCIR takes the form of a directed graph.

Tensor -> Node -> Tensor

Example

Q,K,V
   ↓
matmul
   ↓
softmax
   ↓
matmul
   ↓
output

In MCIR, however, properties are attached to the graph.

attention

properties:
    weighted_streaming_reduction
    tile_compatible_compute

Therefore the following rewrite becomes possible.

3 nodes
→ fused streaming attention

6. Memory State Model

MCIR classifies intermediate state into three kinds.

materialized
streaming
recomputable

materialized

stored in global memory

streaming

state update
tile processing

recomputable

recomputed when needed
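The three states map naturally onto an enum. The `classify` helper below, including its priority order (streaming first, then recomputable, then materialized as the default), is an assumption made for illustration; the text does not define how properties select a state.

```python
from enum import Enum


class MemoryState(Enum):
    MATERIALIZED = "materialized"  # stored in global memory
    STREAMING = "streaming"        # running state updated tile by tile
    RECOMPUTABLE = "recomputable"  # recomputed when needed


# Assumption: these two properties imply a streaming state.
STREAMING_PROPERTIES = {"online_reducible_norm", "weighted_streaming_reduction"}


def classify(memory_properties: set) -> MemoryState:
    """Hypothetical mapping from a node's properties to a memory state."""
    if memory_properties & STREAMING_PROPERTIES:
        return MemoryState.STREAMING
    if "rematerializable_intermediate" in memory_properties:
        return MemoryState.RECOMPUTABLE
    return MemoryState.MATERIALIZED
```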

7. Rewrite System

An MCIR rewrite follows these steps.

pattern detection
→ property annotation
→ rewrite

Example

softmax + matmul
→ weighted_streaming_reduction

or

elementwise chain
→ rematerialization
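The three steps can be sketched on a linear chain of ops. This toy version matches only adjacent `matmul -> softmax -> matmul` nodes given as plain dicts; a real matcher would follow data dependencies in the graph. The dict layout and the fused op name `attention` are assumptions.

```python
PATTERN = ["matmul", "softmax", "matmul"]


def detect_attention(ops):
    """Step 1: return the index where the pattern starts, or None."""
    names = [n["op"] for n in ops]
    for i in range(len(names) - 2):
        if names[i:i + 3] == PATTERN:
            return i
    return None


def rewrite(ops):
    i = detect_attention(ops)
    if i is None:
        return ops
    fused = {
        "op": "attention",
        # Q, K from the first matmul plus V from the second one.
        "inputs": ops[i]["inputs"] + ops[i + 2]["inputs"][1:],
        "outputs": ops[i + 2]["outputs"],
        # Step 2: property annotation.
        "memory_properties": {
            "weighted_streaming_reduction",
            "tile_compatible_compute",
        },
    }
    # Step 3: replace the three nodes with the fused node.
    return ops[:i] + [fused] + ops[i + 3:]


graph = [
    {"op": "matmul", "inputs": ["Q", "K"], "outputs": ["S"]},
    {"op": "softmax", "inputs": ["S"], "outputs": ["P"]},
    {"op": "matmul", "inputs": ["P", "V"], "outputs": ["O"]},
]
fused_graph = rewrite(graph)
```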

8. Example

Attention graph

Original graph

QKᵀ
→ softmax
→ matmul

MCIR graph

attention_op

properties:
    weighted_streaming_reduction
    tile_compatible_compute

Rewrite

materialized attention
→ streaming attention kernel

9. Lowering Interface

An MCIR node passes the following information to backend lowering.

KernelPlan {
    tiling_strategy
    recompute_policy
    streaming_state
}

Example

KernelPlan {
    tiling_strategy: block_tile
    recompute_policy: enabled
    streaming_state: softmax_state
}
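A minimal sketch of how a KernelPlan might be derived from a node's properties is shown below. The mapping in `plan_for`, the use of `None` for absent fields, and the value `disabled` are assumptions; only the field names and the values `block_tile`, `enabled`, and `softmax_state` come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KernelPlan:
    tiling_strategy: Optional[str]
    recompute_policy: str
    streaming_state: Optional[str]


def plan_for(memory_properties: set) -> KernelPlan:
    """Hypothetical mapping from memory properties to a kernel plan."""
    return KernelPlan(
        tiling_strategy=(
            "block_tile" if "tile_compatible_compute" in memory_properties else None
        ),
        recompute_policy=(
            "enabled" if "rematerializable_intermediate" in memory_properties
            else "disabled"
        ),
        streaming_state=(
            "softmax_state" if "weighted_streaming_reduction" in memory_properties
            else None
        ),
    )


plan = plan_for({
    "tile_compatible_compute",
    "rematerializable_intermediate",
    "weighted_streaming_reduction",
})
```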

10. Compiler Pipeline

AICF compilation pipeline

Model definition
↓
Graph capture
↓
MCIR generation
↓
Property detection
↓
Graph rewrite
↓
Kernel plan
↓
CUDA kernel generation

11. MCIR vs Traditional IR

aspect	Traditional IR	MCIR
operator semantics	yes	yes
memory behavior	implicit	explicit
streaming reduction	hidden	explicit
recompute policy	heuristic	property

12. Final Design Principle

The core philosophy of MCIR is as follows.

memory optimization
≠ kernel trick

Instead

memory optimization
=
computation structure transformation

In other words, the goal is the transformation

compute graph
-> memory-aware execution graph

AICF IR Stack

The overall structure is as follows.

Pattern Catalog
↓
Memory Property System
↓
MCIR
↓
Graph Rewrite
↓
Kernel Generation
