Re-materializable Intermediate

Method Overview

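In many computation graphs, an intermediate tensor falls into one of two cases: it either has to be kept in memory, or it can be reproduced from its inputs.

In the second case, instead of storing the intermediate, we can recompute it whenever it is needed.

This is called rematerialization, or a recompute strategy.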

The core trade-off is:

Memory usage decreases
Compute cost increases

On GPUs, however, the following often holds.

HBM access cost > recompute cost

In that case, recomputing an intermediate can be faster than storing it and reading it back.

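Rematerialization is possible because the recomputation is deterministic.

That is, the intermediate must be a pure function of its input,

Y = f(X)

and

f(X)

must be recomputable at any time, which also requires X to still be available when Y is needed.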

Examples

GELU(x)
exp(x)
sin(x)
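As a minimal sketch of what this looks like for GELU, the snippet below compares a forward pass that saves the intermediate h = GELU(x) with one that saves only x and recomputes h during the backward step. The function names (`forward_store`, `forward_remat`, `grad_w_store`, `grad_w_remat`) and the tiny "out = sum(h * w)" model are hypothetical, chosen only for illustration. The memory saving assumes x is kept anyway for other reasons.

```python
import math

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def forward_store(xs, ws):
    # Baseline: keep the intermediate h = GELU(x) for the backward pass.
    hs = [gelu(x) for x in xs]
    out = sum(h * w for h, w in zip(hs, ws))
    return out, {"h": hs}            # saved: the intermediate itself

def forward_remat(xs, ws):
    # Rematerialized: h is consumed immediately and never stored.
    out = sum(gelu(x) * w for x, w in zip(xs, ws))
    return out, {"x": list(xs)}      # saved: only the producer's input

def grad_w_store(saved):
    # d(out)/d(w_i) = h_i, read back from storage.
    return saved["h"]

def grad_w_remat(saved):
    # d(out)/d(w_i) = h_i, recomputed from x on demand.
    return [gelu(x) for x in saved["x"]]
```

Because GELU is deterministic and has no side effects, both variants return the same output and the same gradient with respect to w.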

By contrast, the following are hard to rematerialize.

random operations
stateful updates
external side effects
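To illustrate why a random operation breaks naive recomputation, the sketch below uses a dropout-style mask built with Python's standard `random` module. The helper `dropout_mask` is hypothetical. Snapshotting the RNG state is shown only as one possible workaround and is an assumption of this example, not something the method above prescribes.

```python
import random

def dropout_mask(n: int, p: float, rng: random.Random) -> list:
    # 1 keeps an element, 0 drops it; every call consumes RNG state.
    return [0 if rng.random() < p else 1 for _ in range(n)]

rng = random.Random(0)
state_before = rng.getstate()             # snapshot taken before the forward op
mask_forward = dropout_mask(64, 0.5, rng)

# Naive rematerialization: the RNG has advanced, so f(X) is not reproduced.
mask_naive = dropout_mask(64, 0.5, rng)

# Replay with the saved state: the hidden input (RNG state) is restored first.
rng.setstate(state_before)
mask_replayed = dropout_mask(64, 0.5, rng)
```

The RNG state acts as a hidden input of the operation, so Y = f(X) only holds once that state is treated as part of X.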

HBM access is one of the most expensive operations on a GPU.

For example (order of magnitude):

HBM bandwidth ~ 1 TB/s
Compute throughput ~ 100 TFLOPS

That is, in many cases

recompute cost << memory load cost
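The inequality can be made concrete with a rough back-of-the-envelope calculation using the two figures quoted above. This is an idealized roofline-style estimate: it assumes peak throughput is reached, that the operands of the recomputation are already on chip, and it ignores the cost of writing the intermediate in the first place. The function names are hypothetical.

```python
HBM_BANDWIDTH = 1e12      # bytes/s, the ~1 TB/s figure quoted above
PEAK_FLOPS = 100e12       # FLOP/s, the ~100 TFLOPS figure quoted above

def flops_per_loaded_byte() -> float:
    # FLOPs the GPU can issue in the time one byte arrives from HBM.
    return PEAK_FLOPS / HBM_BANDWIDTH

def recompute_is_cheaper(flops_per_element: float, bytes_per_element: int) -> bool:
    # Compare the time to reload one stored element with the time to
    # recompute it from operands assumed to be already on chip.
    load_time = bytes_per_element / HBM_BANDWIDTH
    compute_time = flops_per_element / PEAK_FLOPS
    return compute_time < load_time

# FLOPs affordable in the time it takes to load one 4-byte (fp32) element.
break_even_fp32 = 4 * flops_per_loaded_byte()
```

Under these assumptions the break-even point is about 100 FLOPs per byte, or 400 FLOPs per fp32 element, so an elementwise function costing a few tens of FLOPs is well below it.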

This makes the following strategy possible.

store intermediate -> remove
recompute when needed

Representative cases

MCIR property

Property : rematerializable_intermediate

Legality

- deterministic function
- side-effect free

Rewrite

store intermediate
-> recompute on demand

Kernel mapping

producer op inline recomputation
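The legality check and rewrite above could be sketched roughly as follows. This is a toy model, not the actual MCIR implementation: the `Op` dataclass, its two flags, and the plan names `"recompute_inline"` and `"store"` are all hypothetical, introduced here only to show how the two legality conditions gate the rewrite.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    # Hypothetical producer-op description; real IR nodes carry far more.
    name: str
    deterministic: bool = True
    side_effect_free: bool = True

def is_rematerializable(op: Op) -> bool:
    # Legality: both conditions listed above must hold.
    return op.deterministic and op.side_effect_free

def plan_intermediate(producer: Op) -> str:
    # Rewrite: replace "store" with inline recomputation only when legal.
    return "recompute_inline" if is_rematerializable(producer) else "store"

plans = {op.name: plan_intermediate(op) for op in [
    Op("gelu"),
    Op("dropout", deterministic=False),
    Op("counter_update", side_effect_free=False),
]}
```

In this sketch only `gelu` is planned for inline recomputation; the other two producers fail a legality condition and keep their stored intermediate.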
