Understanding Underfitting and Overfitting with Polynomial Regression

EastLight


Lucas.Kim 2026. 1. 5. 08:03

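1. Why understand underfitting and overfitting

Most performance problems in machine learning models come from either underfitting or overfitting.
Polynomial regression lets you control model complexity directly through the degree,
which makes it one of the most intuitive examples for understanding both phenomena.

In this post, we will

use the same dataset,
change the polynomial degree to 1, 4, and 15,
and compare the model's prediction curve and MSE at each degree

👉 to see the Bias–Variance Trade-off with our own eyes.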
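
For reference, the trade-off is commonly formalized as a decomposition of the expected squared error at a point x. The notation here is an assumption added for illustration and does not appear in the original experiment: the data are assumed to follow y = f(x) + ε with zero-mean noise of variance σ², and f̂ is the model fitted on a random training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

A low degree tends to inflate the first term, a high degree tends to inflate the second, and no choice of degree removes the third.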

2. Generating the experimental data: "true function + noise"

import numpy as np
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
# Jupyter/IPython magic; remove this line when running as a plain script
%matplotlib inline

# True target function (nonlinear, cosine-based)
def true_fun(X):
    return np.cos(1.5 * np.pi * X)

# Draw 30 samples from the interval [0, 1) and sort them
np.random.seed(0)
n_samples = 30
X = np.sort(np.random.rand(n_samples))

# True function value plus Gaussian noise (standard deviation 0.1)
y = true_fun(X) + np.random.randn(n_samples) * 0.1

# Plot the raw data
plt.scatter(X, y)

Key design choices

The true function is nonlinear
The sample size is small
Noise is present → resembles real-world data
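
Because the noise has standard deviation 0.1, even the true function cannot reach zero error on this data. The sketch below regenerates the same samples and measures that floor, which is a useful yardstick for the MSE values reported later. The variable name `noise_mse` is introduced here for illustration only.

```python
import numpy as np

def true_fun(X):
    return np.cos(1.5 * np.pi * X)

# Same data-generating steps as in the post
np.random.seed(0)
n_samples = 30
X = np.sort(np.random.rand(n_samples))
y = true_fun(X) + np.random.randn(n_samples) * 0.1

# MSE of the *true* function against the noisy targets.
# Its expected value is the noise variance, 0.1 ** 2 = 0.01.
noise_mse = np.mean((y - true_fun(X)) ** 2)
print("Empirical noise MSE:", round(float(noise_mse), 4))
```

A model whose cross-validated MSE sits far above this level is likely either too simple or too flexible for the data.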

3. Comparing polynomial regression across degrees

plt.figure(figsize=(14, 5))
degrees = [1, 4, 15]

for i in range(len(degrees)):
    ax = plt.subplot(1, len(degrees), i + 1)
    plt.setp(ax, xticks=(), yticks=())

    # Polynomial feature expansion for this degree, followed by LinearRegression
    polynomial_features = PolynomialFeatures(
        degree=degrees[i],
        include_bias=False
    )
    linear_regression = LinearRegression()

    pipeline = Pipeline([
        ("polynomial_features", polynomial_features),
        ("linear_regression", linear_regression)
    ])

    # scikit-learn expects a 2-D feature matrix, hence the reshape
    pipeline.fit(X.reshape(-1, 1), y)
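
To make the first pipeline step concrete, here is a minimal sketch of what `PolynomialFeatures` produces for a single input column. The toy input values 2 and 3 are chosen only for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two toy samples with one feature each
x = np.array([[2.0], [3.0]])

# degree=3 without the bias column yields the columns x, x^2, x^3
poly = PolynomialFeatures(degree=3, include_bias=False)
expanded = poly.fit_transform(x)
print(expanded)
```

`LinearRegression` then fits one coefficient per expanded column, so a degree-15 model on this one-feature dataset has 15 coefficients plus an intercept to fit to 30 points.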

4. Cross-validated performance evaluation (MSE)

    # (continues inside the for loop from section 3)
    # "neg_mean_squared_error" returns the negated MSE, so flip the sign when reporting
    scores = cross_val_score(
        pipeline,
        X.reshape(-1, 1),
        y,
        scoring="neg_mean_squared_error",
        cv=10
    )

    coefficients = pipeline.named_steps['linear_regression'].coef_
    print('\nDegree {0} regression coefficients: {1}'
          .format(degrees[i], np.round(coefficients, 2)))
    print('Degree {0} MSE: {1:.2f}'
          .format(degrees[i], -1 * np.mean(scores)))
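
Two details of this call are easy to miss. scikit-learn scorers follow a "higher is better" convention, so the MSE comes back negated. In addition, an integer `cv` on a regressor means an unshuffled `KFold`, and because `X` is sorted each fold is a contiguous slice of the interval, so the first and last folds require extrapolation. The sketch below checks both points on the degree-4 model. The shuffled variant is an alternative shown for comparison, not what the post uses.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

def true_fun(X):
    return np.cos(1.5 * np.pi * X)

np.random.seed(0)
n_samples = 30
X = np.sort(np.random.rand(n_samples))
y = true_fun(X) + np.random.randn(n_samples) * 0.1
X_col = X.reshape(-1, 1)

model = Pipeline([
    ("polynomial_features", PolynomialFeatures(degree=4, include_bias=False)),
    ("linear_regression", LinearRegression()),
])

# cv=10 as in the post
default_scores = cross_val_score(
    model, X_col, y, scoring="neg_mean_squared_error", cv=10)

# The same split written out explicitly: contiguous, unshuffled folds
explicit_scores = cross_val_score(
    model, X_col, y, scoring="neg_mean_squared_error",
    cv=KFold(n_splits=10))

# Alternative (assumption, not used in the post): shuffle before splitting
shuffled_scores = cross_val_score(
    model, X_col, y, scoring="neg_mean_squared_error",
    cv=KFold(n_splits=10, shuffle=True, random_state=0))

print("Raw scores are all <= 0:", bool(np.all(default_scores <= 0)))
print("MSE, contiguous folds:", -default_scores.mean())
print("MSE, shuffled folds:  ", -shuffled_scores.mean())
```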

5. Prediction curve vs. true function

    # (continues inside the for loop from section 3)
    X_test = np.linspace(0, 1, 100)

    # Model prediction over a dense grid
    plt.plot(X_test,
             pipeline.predict(X_test[:, np.newaxis]),
             label="Model")

    # True function, drawn as a dashed line
    plt.plot(X_test,
             true_fun(X_test),
             '--',
             label="True function")

    # Training samples
    plt.scatter(X, y, edgecolor='b', s=20, label="Samples")

    plt.xlabel("x")
    plt.ylabel("y")
    plt.xlim((0, 1))
    plt.ylim((-2, 2))
    plt.legend(loc="best")
    plt.title(
        "Degree {}\nMSE = {:.2e}(+/- {:.2e})"
        .format(degrees[i], -scores.mean(), scores.std())
    )

# After the loop: show all three subplots in one figure
plt.show()

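6. Summary of results (important)

Degree = 1 (linear regression)

Degree 1 regression coefficients: [-1.61]. Degree 1 MSE: 0.41.

The model is too simple
It fails to capture the nonlinear pattern
Underfitting (High Bias, Low Variance)

Degree = 4 (appropriate complexity)

Degree 4 regression coefficients: [ 0.47 -17.79 23.59 -7.26]. Degree 4 MSE: 0.04.

It closely approximates the true function
It generalizes well
This is the region where bias and variance are balanced

Degree = 15 (excessive complexity)

Degree 15 regression coefficients: [-2.98e+03 1.03e+05 -1.87e+06 2.03e+07 ...]. Degree 15 MSE: 182621180.61.

The regression coefficients explode
The model memorizes even the noise
Performance on the held-out folds collapses
Overfitting (Low Bias, High Variance)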
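
One way to see "memorizing the noise" in numbers is to put the training MSE next to the cross-validated MSE for each degree. The following is a minimal sketch under the same data and pipeline as the post. The helper `make_model` and the dictionaries `train_mse` and `cv_mse` are names introduced here for illustration.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score

def true_fun(X):
    return np.cos(1.5 * np.pi * X)

np.random.seed(0)
n_samples = 30
X = np.sort(np.random.rand(n_samples))
y = true_fun(X) + np.random.randn(n_samples) * 0.1
X_col = X.reshape(-1, 1)

def make_model(degree):
    # Hypothetical helper: same two-step pipeline as in the post
    return Pipeline([
        ("polynomial_features",
         PolynomialFeatures(degree=degree, include_bias=False)),
        ("linear_regression", LinearRegression()),
    ])

train_mse, cv_mse = {}, {}
for degree in (1, 4, 15):
    model = make_model(degree)
    # Error on the very data the model was fitted to
    model.fit(X_col, y)
    train_mse[degree] = mean_squared_error(y, model.predict(X_col))
    # Error on held-out folds (sign flipped back to a positive MSE)
    scores = cross_val_score(
        model, X_col, y, scoring="neg_mean_squared_error", cv=10)
    cv_mse[degree] = -scores.mean()
    print("degree {:2d}  train MSE {:.4f}  CV MSE {:.4g}".format(
        degree, train_mse[degree], cv_mse[degree]))
```

A small training error paired with a very large held-out error is the signature of overfitting, while a large error on both is the signature of underfitting.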

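7. Key concepts

✅ Underfitting

The model is too simple
It fails to learn the pattern properly
Bias ↑ / Variance ↓

✅ Overfitting

The model is overly complex
It fits the training data too closely
Bias ↓ / Variance ↑

✅ Bias–Variance Trade-off

The most important goal is to find the balance point (Golden Point) between the two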
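
In practice the balance point can be searched for rather than guessed. The sketch below uses `GridSearchCV` to try every degree from 1 to 15 with the same 10-fold scoring as the post. The search range is an assumption made for illustration, and the selected degree depends on the data and the fold layout.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

def true_fun(X):
    return np.cos(1.5 * np.pi * X)

np.random.seed(0)
n_samples = 30
X = np.sort(np.random.rand(n_samples))
y = true_fun(X) + np.random.randn(n_samples) * 0.1

pipeline = Pipeline([
    ("polynomial_features", PolynomialFeatures(include_bias=False)),
    ("linear_regression", LinearRegression()),
])

# "<step name>__<parameter>" addresses a parameter of a pipeline step
param_grid = {"polynomial_features__degree": list(range(1, 16))}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="neg_mean_squared_error",
    cv=10,
)
search.fit(X.reshape(-1, 1), y)

best_degree = search.best_params_["polynomial_features__degree"]
best_mse = -search.best_score_
print("Best degree:", best_degree)
print("Cross-validated MSE at best degree:", best_mse)
```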

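8. Conclusion

Polynomial regression is a textbook setting for experimenting with model complexity
Blindly increasing the degree does not improve performance
Cross-validation plus visualization is essential
In real problems
→ combining polynomial regression with Ridge / Lasso regularization is very important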
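
As a rough illustration of that last point, the sketch below keeps the degree at 15 but swaps `LinearRegression` for `Ridge`. The value `alpha=1e-3` is an arbitrary assumption chosen for the demonstration, not a tuned setting, and the helper `evaluate` is a name introduced here.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

def true_fun(X):
    return np.cos(1.5 * np.pi * X)

np.random.seed(0)
n_samples = 30
X = np.sort(np.random.rand(n_samples))
y = true_fun(X) + np.random.randn(n_samples) * 0.1
X_col = X.reshape(-1, 1)

def evaluate(regressor, degree=15):
    # Hypothetical helper: returns (CV MSE, largest |coefficient|)
    model = Pipeline([
        ("polynomial_features",
         PolynomialFeatures(degree=degree, include_bias=False)),
        ("regressor", regressor),
    ])
    scores = cross_val_score(
        model, X_col, y, scoring="neg_mean_squared_error", cv=10)
    model.fit(X_col, y)
    max_coef = np.max(np.abs(model.named_steps["regressor"].coef_))
    return -scores.mean(), max_coef

linear_mse, linear_max_coef = evaluate(LinearRegression())
ridge_mse, ridge_max_coef = evaluate(Ridge(alpha=1e-3))  # alpha is illustrative

print("LinearRegression  CV MSE {:.4g}  max |coef| {:.4g}".format(
    linear_mse, linear_max_coef))
print("Ridge             CV MSE {:.4g}  max |coef| {:.4g}".format(
    ridge_mse, ridge_max_coef))
```

The L2 penalty limits how large the coefficients can grow, which is what keeps the degree-15 curve from swinging wildly between and beyond the training points.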

License: Attribution, NonCommercial, NoDerivatives
