Advice for Applying Machine Learning

⇒ Which of these approaches to try is usually picked by gut feeling alone.

Diagnostic: a test that shows what is and is not working well

Overfitting & Underfitting Problem

How do we detect overfitting?

As the number of parameters grows, the hypothesis can no longer be plotted and checked by eye.

→ Split the data set into a training set and a test set (roughly 7:3).

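→ Compute the test error $J_{test}(\theta)$ on the test set. The formula depends on whether the task is classification or linear regression (for classification, use the 0/1 error: count 1 for each misclassified example and 0 otherwise, then divide by the number of test examples m).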
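The two steps above can be sketched in NumPy. This is a minimal illustration, not a fixed recipe: the function names are made up for this note, the regression error assumes the usual squared-error cost with a 1/(2m) factor, and the classifier is assumed to be logistic regression with a 0.5 threshold.

```python
import numpy as np

def train_test_split(X, y, test_ratio=0.3, seed=0):
    """Shuffle the examples, then split them roughly 7:3 (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_ratio))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

def regression_test_error(theta, X_test, y_test):
    """J_test for linear regression: sum of squared errors / (2 * m_test)."""
    residual = X_test @ theta - y_test
    return float(residual @ residual) / (2 * len(y_test))

def classification_test_error(theta, X_test, y_test):
    """0/1 error: 1 for each wrong prediction, summed and divided by m_test."""
    prob = 1.0 / (1.0 + np.exp(-(X_test @ theta)))  # assumed logistic hypothesis
    pred = (prob >= 0.5).astype(int)
    return float(np.mean(pred != y_test))
```

Shuffling before the split matters when the data is stored in some order. A test error much higher than the training error is the sign of overfitting that the notes are looking for.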

d = degree of polynomial

Degree: how do we choose the right polynomial degree for the model? Simple: add one more round of testing.

That is, just like the overfitting check above, split the data again, this time into a training set (60%), a cross validation set (20%), and a test set (20%).

Then, for each degree, fit the theta of $h_\theta(x)$ on the training set, evaluate it on the cross validation set, and pick the d with the lowest cross validation error.

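Finally, check for overfitting on the test set. Since d itself was chosen using the cross validation set, only the test set gives an unbiased estimate of the error.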
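The whole selection procedure can be sketched as follows. It is only an illustration under stated assumptions: a single input feature, a plain least-squares fit without regularization, squared error with a 1/(2m) factor, and helper names (`poly_features`, `fit_theta`, `select_degree`) invented for this note.

```python
import numpy as np

def poly_features(x, d):
    """Columns x^0, x^1, ..., x^d for a 1-D input array."""
    return np.vander(x, d + 1, increasing=True)

def fit_theta(x, y, d):
    """Least-squares fit of a degree-d polynomial on the given examples."""
    theta, *_ = np.linalg.lstsq(poly_features(x, d), y, rcond=None)
    return theta

def squared_error(theta, x, y, d):
    """Sum of squared errors / (2 * m) on the given examples."""
    residual = poly_features(x, d) @ theta - y
    return float(residual @ residual) / (2 * len(y))

def select_degree(x, y, max_degree=6, seed=0):
    """Choose d on the cross validation set, then report the test error."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_train = int(0.6 * len(x))
    n_cv = int(0.2 * len(x))
    tr = idx[:n_train]
    cv = idx[n_train:n_train + n_cv]
    te = idx[n_train + n_cv:]

    best = None
    for d in range(1, max_degree + 1):
        theta = fit_theta(x[tr], y[tr], d)            # fit on training set only
        j_cv = squared_error(theta, x[cv], y[cv], d)  # compare degrees on CV set
        if best is None or j_cv < best[1]:
            best = (d, j_cv, theta)

    d, j_cv, theta = best
    j_test = squared_error(theta, x[te], y[te], d)    # test set used exactly once
    return d, j_cv, j_test
```

The test set is touched only after d is fixed, so `j_test` is not biased by the selection step the way `j_cv` is.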