Trying to fix the CUDA outlier problem...

INFO:dev.models.sequential:✅ Layer added: Flatten (input_shape=(1, 1, 2), output_shape=(1, 2))
INFO:dev.models.sequential:✅ Layer added: Dense (input_shape=(1, 2), output_shape=(1, 4))
INFO:dev.models.sequential:✅ Layer added: Activation (input_shape=(1, 4), output_shape=(1, 4))
INFO:dev.models.sequential:✅ Layer added: Dense (input_shape=(1, 4), output_shape=(1, 1))
INFO:dev.models.sequential:✅ Layer added: Activation (input_shape=(1, 1), output_shape=(1, 1))
[Dense] bias_id=dense_2706329399712_b, bias_shape=(1, 4)
[Dense] bias_id=dense_2706309270128_b, bias_shape=(1, 1)

=== [Graph E] 계산 그래프 ===
[0] type=5, input=input, output=flatten_2706327946592_out
[1] type=0, input=flatten_2706327946592_out, output=dense_2706329399712_linear
[2] type=1, input=dense_2706329399712_linear, output=dense_2706329399712_out
[ADD] input=dense_2706329399712_linear + param=dense_2706329399712_b -> output=dense_2706329399712_out
[3] type=3, input=dense_2706329399712_out, output=activation_2706329589456_out
[4] type=0, input=activation_2706329589456_out, output=dense_2706309270128_linear
[5] type=1, input=dense_2706309270128_linear, output=dense_2706309270128_out
[ADD] input=dense_2706309270128_linear + param=dense_2706309270128_b -> output=dense_2706309270128_out
[6] type=3, input=dense_2706309270128_out, output=activation_2706329331536_out
[7] type=7, input=activation_2706329331536_out, output=loss
[INFO] op_type=7, output_id=loss, input_id=activation_2706329331536_out
[INFO] grad_out ptr = 0000000000000000, grad_input ptr = 0000000000000000
[LOSS_GRAD] Average dL/dy: 0.80833
dL/dy[0] = 0.80833
[INFO] op_type=3, output_id=activation_2706329331536_out, input_id=dense_2706309270128_out
[INFO] grad_out ptr = 0000000706032A00, grad_input ptr = 0000000706032C00
[DEBUG] activation output (first 10): 0.80833 0.83167 0.80585 0.82951 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
[DEBUG] grad_input (first 10): 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
[DEBUG] grad_out before activation_backward (first 10): 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
[INFO] op_type=1, output_id=dense_2706309270128_out, input_id=dense_2706309270128_linear
[INFO] grad_out ptr = 0000000706032C00, grad_input ptr = 0000000706032E00
[add_backward_input] d_input[0] = 0.125237 (from d_out[0] = 0.125237)
[add_backward_bias] d_bias[0] = 0.125237
[GRADIENT] dense_2706309270128_b grad: min=0.125237, max=0.125237, mean=0.125237
  [0] = 0.125237
[INFO] op_type=0, output_id=dense_2706309270128_linear, input_id=activation_2706329589456_out
[INFO] grad_out ptr = 0000000706032E00, grad_input ptr = 0000000706033200
[transpose] input[0] = 0.424412 -> output[0] = 0.424412
[matmul_bw_input_simple] M=1, N=1, K=4 | d_out[0]=0.125237, W_T[0]=0.424412, sum=0.053152
[transpose] input[0] = 0.558090 -> output[0] = 0.558090
[transpose] input[1] = 0.443799 -> output[1] = 0.443799
[transpose] input[2] = 0.470293 -> output[2] = 0.470293
[transpose] input[3] = 0.401031 -> output[3] = 0.401031
[matmul_bw_weight] d_weight[0] = 0.069893, input_T[0] = 0.558090, d_out[0] = 0.125237
[GRADIENT] dense_2706309270128_W grad: min=0.0502238, max=0.0698933, mean=0.0586487
  [0] = 0.0698933
  [1] = 0.0555799
  [2] = 0.0588979
  [3] = 0.0502238
[INFO] op_type=3, output_id=activation_2706329589456_out, input_id=dense_2706329399712_out
[INFO] grad_out ptr = 0000000706033200, grad_input ptr = 0000000706033400
[DEBUG] activation output (first 10): 0.55809 0.44380 0.47029 0.40103 0.42595 0.47439 0.57805 0.48343 0.62972 0.46929
[DEBUG] grad_input (first 10): 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
[DEBUG] grad_out before activation_backward (first 10): 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 73039675966023533717868249088.00000 73163501137570858685016571904.00000 57570896702982323836293731686809600.00000 0.00000
[INFO] op_type=1, output_id=dense_2706329399712_out, input_id=dense_2706329399712_linear
[INFO] grad_out ptr = 0000000706033400, grad_input ptr = 0000000706033800
[add_backward_input] d_input[0] = 0.013109 (from d_out[0] = 0.013109)
[add_backward_bias] d_bias[0] = 0.013109
[GRADIENT] dense_2706329399712_b grad: min=0.0131086, max=0.03259, mean=0.0243397
  [0] = 0.0131086
  [1] = 0.0226771
  [2] = 0.03259
  [3] = 0.028983
[INFO] op_type=0, output_id=dense_2706329399712_linear, input_id=flatten_2706327946592_out
[INFO] grad_out ptr = 0000000706033800, grad_input ptr = 0000000706033C00
[transpose] input[0] = 0.531791 -> output[0] = 0.531791
[transpose] input[1] = -0.123217 -> output[2] = -0.123217
[transpose] input[2] = -0.433729 -> output[4] = -0.433729
[transpose] input[3] = -0.334884 -> output[6] = -0.334884
[matmul_bw_input_simple] M=1, N=4, K=2 | d_out[0]=0.013109, W_T[0]=0.531791, sum=-0.019664
[transpose] input[0] = 1.000000 -> output[0] = 1.000000
[transpose] input[1] = 1.000000 -> output[1] = 1.000000
[matmul_bw_weight] d_weight[0] = 0.013109, input_T[0] = 1.000000, d_out[0] = 0.013109
[GRADIENT] dense_2706329399712_W grad: min=0, max=0.0131086, mean=0.00327715
  [0] = 0.0131086
  [1] = 0
  [2] = 0
  [3] = 0
  [4] = 0.0131086
  [5] = 0
  [6] = 0
  [7] = 0
[INFO] op_type=5, output_id=flatten_2706327946592_out, input_id=input
[INFO] grad_out ptr = 0000000706033C00, grad_input ptr = 0000000000000000

📊 Final evaluation metric (BCE): 0.358550

🔍 XOR prediction results:
====================================
  Input        | Target | Prediction
---------------|--------|-----------
  [0.0, 0.0]   |  0.0   |  0.8295
  [0.0, 1.0]   |  1.0   |  0.8316
  [1.0, 0.0]   |  1.0   |  0.8058
  [1.0, 1.0]   |  0.0   |  0.8083
====================================
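
As a cross-check for the matmul_bw_* and add_backward_* lines in the log, here is a minimal pure-Python reference for a Dense layer's backward pass, assuming y = x @ W + b with row-major nested lists. The function names are hypothetical, not the framework's API. Feeding it the logged row input_T = [0.558090, 0.443799, 0.470293, 0.401031] and d_out[0] = 0.125237 reproduces the logged output-layer d_weight values (0.0698933, 0.0555799, 0.0588979, 0.0502238) and the matmul_bw_input sum of 0.053152. Only W_T[0] = 0.424412 appears in the log, so the remaining weight entries are placeholders.

```python
# Pure-Python reference for one Dense layer's backward pass, y = x @ w + b.
# Shapes (assumed): x (batch, in_dim), w (in_dim, out_dim), d_out (batch, out_dim).

def matmul(a, b):
    """Naive product of an (n x k) and a (k x m) nested-list matrix."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dense_backward(x, w, d_out):
    """Return (d_x, d_w, d_b) for y = x @ w + b."""
    d_x = matmul(d_out, transpose(w))         # (batch, in_dim)
    d_w = matmul(transpose(x), d_out)         # (in_dim, out_dim), summed over batch
    d_b = [sum(col) for col in zip(*d_out)]   # (out_dim,), summed over batch
    return d_x, d_w, d_b

# Output layer, first sample only, using values printed in the log.
x = [[0.558090, 0.443799, 0.470293, 0.401031]]   # input_T[0..3]
w = [[0.424412], [0.0], [0.0], [0.0]]            # only W_T[0] is logged; zeros are placeholders
d_out = [[0.125237]]
d_x, d_w, d_b = dense_backward(x, w, d_out)
print([round(row[0], 7) for row in d_w], round(d_x[0][0], 6), d_b)
```

Because d_w and d_b sum over the batch axis, a 4-sample XOR batch would normally contribute all four rows. In the log, the output-layer gradients equal the first row's contribution alone, which this reference makes easy to confirm or rule out.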

✅ Summary of the anomalies confirmed so far

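The grad_out passed into activation_backward contains abnormally large, garbage-like values that cannot be valid gradients (for example, about 7.3e+28 and 5.8e+34 in the first log).
These outliers are not just a debug-printing artifact; they actually exist in memory.
The activation_backward kernel itself thoroughly validates both grad_out and out and clips abnormal values to 0.0f.
Therefore the outliers most likely enter before activation_backward, i.e. in the operation immediately preceding it that produced grad_out.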
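
Since clipping inside activation_backward only hides the garbage, one way to find the op that first writes it is to copy each op's output gradient back to the host right after that op runs and scan it. A minimal sketch, assuming the buffer is already available as a Python list; the 1e6 limit is an arbitrary assumption and the helper names are hypothetical:

```python
import math

def find_outliers(values, limit=1e6):
    """Return (index, value) pairs that are NaN/Inf or whose magnitude exceeds `limit`."""
    return [(i, v) for i, v in enumerate(values)
            if not math.isfinite(v) or abs(v) > limit]

def clip_outliers(values, limit=1e6):
    """Zero out the same entries, as activation_backward is described as doing.
    This masks the symptom; it does not say which op produced the values."""
    return [0.0 if (not math.isfinite(v) or abs(v) > limit) else v for v in values]

# First ten grad_out entries from the first log, rounded.
grad_out = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 7.304e28, 7.316e28, 5.757e34, 0.0]
print(find_outliers(grad_out))
print(clip_outliers(grad_out))
```

Running find_outliers after every backward op, rather than only just before activation_backward, should point at the first op whose output already contains values like these.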

grad_out is first initialized to 1.0, and loss_backward (the gradient of the cost function) is then called starting from that value.
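
The numbers in the first log can be reproduced by hand. Assuming the output activation is a sigmoid (its backward multiplies by y(1 − y)), the sketch below chains the seed 1.0, the loss gradient, and the sigmoid backward step for the first sample (y = 0.80833, target 0, the [1.0, 1.0] row of the XOR table): 0.80833 × 0.80833 × (1 − 0.80833) ≈ 0.125237, which is the d_out[0] seen at the next Dense layer. The logged dL/dy[0] equals y − t (likewise −0.557036 ≈ 0.44296 − 1 in the second log). For reference, the textbook BCE derivative with respect to y is (y − t) / (y(1 − y)), which reduces to y − t only after the sigmoid step. Function names are illustrative.

```python
# Sketch of the first backward steps for one sample, assuming a sigmoid output.

def loss_grad_as_logged(y, t):
    """The dL/dy value printed in the log: y - t."""
    return y - t

def bce_grad(y, t, eps=1e-7):
    """Textbook d(BCE)/dy for one sample: (y - t) / (y * (1 - y))."""
    y = min(max(y, eps), 1.0 - eps)   # keep the denominator away from zero
    return (y - t) / (y * (1.0 - y))

def sigmoid_backward(grad_out, y):
    """Sigmoid derivative written in terms of its output y."""
    return grad_out * y * (1.0 - y)

seed = 1.0                 # dL/dL, the initial grad_out
y, t = 0.80833, 0.0        # first sample in the log
dl_dy = seed * loss_grad_as_logged(y, t)
d_out = sigmoid_backward(dl_dy, y)
print(f"dL/dy = {dl_dy:.5f}, d_out after sigmoid = {d_out:.6f}")
```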

However, the output contains these lines:

[INFO] op_type=7, output_id=loss, input_id=activation_2706329331536_out
[INFO] grad_out ptr = 0000000000000000, grad_input ptr = 0000000000000000

Judging from these lines, in the backward step right after the Loss op the grad_out and grad_input pointers were never initialized: both are null pointers.

Since they are nullptr, this needs to be fixed and then re-verified.

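Fix: while computing the weight gradients by traversing the computation graph E (the matrix that describes the model structure) in reverse, the LOSS op is now processed first and the reverse traversal of the rest of the graph follows. This resolves the ordering problem in allocating gradients[loss]; in the second log below, the loss op's grad_out pointer is no longer null.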
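
The reordered backward pass can be sketched in Python. This is an illustration under assumptions, not the framework's run_graph_backward: the op type codes are inferred from the log (0 = matmul, 1 = bias add, 3 = activation, 5 = flatten, 7 = loss), gradients are scalars in a dict instead of device buffers, and parameter gradients are omitted. The point is the ordering: gradients[loss] is seeded and the loss backward runs before any other op reads a gradient, and a missing entry is reported immediately instead of surfacing later as a null pointer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

LOSS = 7  # op_type of the loss node in the log

@dataclass
class Op:
    op_type: int
    input_id: str
    output_id: str

BackwardFn = Callable[[Op, float], float]

def run_backward(graph: List[Op], gradients: Dict[str, float],
                 backward_fns: Dict[int, BackwardFn]) -> List[str]:
    """Seed and run the loss op first, then the remaining ops in reverse order.
    Returns the output ids in the order they were processed."""
    loss_op = next(op for op in graph if op.op_type == LOSS)
    gradients[loss_op.output_id] = 1.0  # seed dL/dL before anything reads it
    gradients[loss_op.input_id] = backward_fns[LOSS](loss_op, 1.0)
    order = [loss_op.output_id]
    for op in reversed([op for op in graph if op.op_type != LOSS]):
        grad_out = gradients.get(op.output_id)
        if grad_out is None:  # Python stand-in for a null grad_out pointer
            raise RuntimeError(f"no gradient allocated for {op.output_id}")
        gradients[op.input_id] = backward_fns[op.op_type](op, grad_out)
        order.append(op.output_id)
    return order

# Same topology as Graph E in the log, with shortened ids.
graph = [
    Op(5, "input", "flatten_out"),
    Op(0, "flatten_out", "dense1_linear"),
    Op(1, "dense1_linear", "dense1_out"),
    Op(3, "dense1_out", "act1_out"),
    Op(0, "act1_out", "dense2_linear"),
    Op(1, "dense2_linear", "dense2_out"),
    Op(3, "dense2_out", "act2_out"),
    Op(7, "act2_out", "loss"),
]
# Placeholder backward functions that just pass the gradient through.
passthrough = {t: (lambda op, g: g) for t in (0, 1, 3, 5, 7)}
grads: Dict[str, float] = {}
print(run_backward(graph, grads, passthrough))
```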

Next, I confirmed that the grad_out values passed to activation_backward contain outliers:

[LOSS_GRAD] Average dL/dy: 0.557036
dL/dy[0] = -0.557036
[INFO] op_type=7, output_id=loss, input_id=activation_1655391300080_out
[INFO] grad_out ptr = 0000000B06032A00, grad_input ptr = 0000000000000000
[INFO] op_type=3, output_id=activation_1655391300080_out, input_id=dense_1654958295472_out
[INFO] grad_out ptr = 0000000B06032A00, grad_input ptr = 0000000B06032C00
[DEBUG] activation output (first 10): 0.44296 0.48141 0.44123 0.48474 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
[DEBUG] grad_input (first 10): 1043866941386571923822913620279296.00000 0.00000 1043866941386571923822913620279296.00000 0.00000 1043866941386571923822913620279296.00000 0.00000 0.00000 0.00000 0.00000 0.00000
[DEBUG] grad_out before activation_backward (first 10): 0.00000 0.00000 764270755873733076455297019543552.00000 0.00000 307348073856968022527436828704768.00000 12570781311549046784.00000 1090004751872788039274070016.00000 0.00000 0.00000 0.00000
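
The "(first 10)" debug lines always print ten values, but the output-layer activation appears to hold only batch × 1 = 4 values in this XOR run (from the fifth value onward the activation output line prints 0.00000), so on the device side some of those reads may fall outside the tensor's allocation. A hypothetical helper that caps the count at the buffer's real size keeps the debug output from mixing in memory the tensor does not own; the function and its size parameter are assumptions:

```python
def format_first(values, size, n=10):
    """Format at most min(n, size) leading entries of a buffer that owns `size` elements."""
    count = min(n, size)  # never index past the tensor's real element count
    shown = " ".join(f"{v:.5f}" for v in values[:count])
    return f"{shown} (showing {count} of {size})"

# Output-layer activation values from the second log (4 samples x 1 unit).
activation_out = [0.44296, 0.48141, 0.44123, 0.48474]
print(format_first(activation_out, size=len(activation_out)))
```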

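Other posts in the 'dev_AI_framework' category

Interpreting each variable in run_graph_backward (0)	2025.08.09
Verifying data after backpropagation of the LOSS function (6)	2025.08.08
Matrix-multiplication-based grad_input and grad_weight, shared memory data issues, and errors while using the tiling algorithm (6)	2025.08.07
Setting up and checking CUDA, CuPy, and Python versions (0)	2025.08.03
Identifying additional requirements by training a simple model (0)	2025.08.02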

Ji (as in "will"), Hun (as in "teach")
