[Regression-Prediction] Prediction of Wild Blueberry Yield-03-Running boosting models with GridSearch

2023. 6. 3. 00:29


1. Trying the LightGBM boosting model

from sklearn.model_selection import train_test_split
num_train = len(train)  # number of training rows

# Split into training data and test data

X_train = all_data[:num_train]  # rows 0 to num_train - 1
X_test = all_data[num_train:]   # rows num_train to the last row

y_train = train['yield'].values

y_test = submission['yield'].values

%%time

from sklearn.model_selection import GridSearchCV
from lightgbm import LGBMRegressor

# Create the LightGBM model
lgb_model = LGBMRegressor()

# Hyperparameter grid for the grid search
param_grid_lgb = {
    'max_depth': [8, 10],            # maximum tree depth
    'learning_rate': [0.04],         # learning rate
    'n_estimators': [50, 200],       # number of trees
    'min_child_samples': [1, 2],     # minimum number of samples required in a leaf
    'subsample': [0.9],              # row sampling ratio per tree (only takes effect when subsample_freq > 0)
    'colsample_bytree': [0.8],       # feature sampling ratio per tree
    'reg_alpha': [0.3, 0.2],         # L1 regularization weight
    'reg_lambda': [0.4, 0.5],        # L2 regularization weight
    'random_state': [42]             # random seed
}

# Create the grid search object (mae_scorer must already be defined earlier in the notebook)
grid_search_lgb = GridSearchCV(lgb_model, param_grid_lgb, scoring=mae_scorer, cv=10, n_jobs=-1)

# Run the grid search
grid_search_lgb.fit(X_train, y_train)

# Print the best model and parameters
best_model_grid_search_lgb = grid_search_lgb.best_estimator_
best_params_grid_search_lgb = grid_search_lgb.best_params_
print("Best Model (LightGBM):", best_model_grid_search_lgb)
print("Best Parameters (LightGBM):", best_params_grid_search_lgb)

==> Searching for the best LightGBM hyperparameters with grid search
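The grid search above passes scoring=mae_scorer, which is not defined in this post. Below is a minimal sketch of one way such a scorer could be built, assuming it wraps mean_absolute_error with make_scorer; the actual definition used in the notebook may differ. The toy data and the LinearRegression model are placeholders for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import GridSearchCV

# Assumption: mae_scorer wraps MAE so that a lower error ranks higher.
# greater_is_better=False makes scikit-learn negate the metric internally.
mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)

# Toy regression data (placeholder for all_data / train['yield'])
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 3))
y_toy = X_toy @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=100)

search = GridSearchCV(
    LinearRegression(),
    {"fit_intercept": [True, False]},
    scoring=mae_scorer,
    cv=5,
)
search.fit(X_toy, y_toy)

# best_score_ holds the negated MAE, so flip the sign to read it as an error
best_cv_mae = -search.best_score_
print("Best CV MAE:", best_cv_mae)
```

The built-in string scoring='neg_mean_absolute_error' follows the same sign convention and avoids the custom scorer altogether.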

https://knowallworld.tistory.com/448

[Using LightGBM()] Finding hyperparameter values with GridSearch ★impurity, purity★ classes


Best Model (LightGBM): LGBMRegressor(colsample_bytree=0.8, learning_rate=0.04, max_depth=8,
min_child_samples=2, n_estimators=200, random_state=42,
reg_alpha=0.2, reg_lambda=0.4, subsample=0.9)
Best Parameters (LightGBM): {'colsample_bytree': 0.8, 'learning_rate': 0.04, 'max_depth': 8, 'min_child_samples': 2, 'n_estimators': 200, 'random_state': 42, 'reg_alpha': 0.2, 'reg_lambda': 0.4, 'subsample': 0.9}
Wall time: 23.2 s

# Predict the yield for the test data

y_preds= best_model_grid_search_lgb.predict(X_test)
y_preds
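To turn predictions like these into a file for scoring, the usual pattern is to overwrite the target column of the submission frame and save it as CSV. This is only a sketch: the id values, the numbers in y_preds, the "id" column name, and the output filename are all made-up placeholders, not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Placeholder stand-ins for the notebook's `submission` and `y_preds`
submission = pd.DataFrame({"id": [1, 2, 3], "yield": [0.0, 0.0, 0.0]})
y_preds = np.array([4300.5, 6012.25, 7105.75])

# Overwrite the placeholder target column with the model's predictions
submission["yield"] = y_preds

# Hypothetical output filename
submission.to_csv("submission_lgb.csv", index=False)
```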

from sklearn.model_selection import train_test_split

num_train = len(train)  # number of training rows

# Split into training data and test data

X = all_data[:num_train]        # rows 0 to num_train - 1
X_test = all_data[num_train:]   # rows num_train to the last row

y = train['yield'].values


X_train , X_valid , y_train, y_valid = train_test_split(X,y, test_size = 0.2, random_state=0)

from sklearn.metrics import mean_absolute_error

# Predict on the validation data
y_pred = best_model_grid_search_lgb.predict(X_valid)

# Compute the MAE
mae = mean_absolute_error(y_valid, y_pred)
print("MAE:", mae)

MAE: 323.15844619471255

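==> Validation check. Note that best_estimator_ was refit on all of the training rows, including the ones that ended up in X_valid, so this MAE is optimistic.

==> Leaderboard score: 345.101! ==> An improvement

==> Rank 1100 / 1875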
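The validation MAE above is lower than the leaderboard score, partly because GridSearchCV refits the best model on every row passed to fit, and the later train_test_split draws X_valid from those same rows. A minimal sketch of a hold-out check without that overlap is to split first and tune only on the training part. The toy data is a placeholder, and GradientBoostingRegressor stands in for LGBMRegressor so that the sketch depends only on scikit-learn.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Toy data standing in for all_data[:num_train] and train['yield']
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X[:, 0] * 3.0 + X[:, 1] ** 2 + rng.normal(scale=0.5, size=300)

# 1) Hold out the validation rows BEFORE any tuning
X_tr, X_valid, y_tr, y_valid = train_test_split(X, y, test_size=0.2, random_state=0)

# 2) Tune on the training part only; refit=True (the default) retrains the
#    best configuration on X_tr, never on X_valid
search = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    {"max_depth": [2, 3], "n_estimators": [50, 100]},
    scoring="neg_mean_absolute_error",
    cv=5,
)
search.fit(X_tr, y_tr)

# 3) Score on rows the model has never seen
holdout_mae = mean_absolute_error(y_valid, search.predict(X_valid))
print("Hold-out MAE:", holdout_mae)
```

The cross-validated score in search.best_score_ is also computed on held-out folds, so it can serve as a second estimate next to the hold-out MAE.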

2. Trying the XGBoost boosting model

from sklearn.model_selection import train_test_split
num_train = len(train)  # number of training rows

# Split into training data and test data

X_train = all_data[:num_train]  # rows 0 to num_train - 1
X_test = all_data[num_train:]   # rows num_train to the last row

y_train = train['yield'].values

y_test = submission['yield'].values

%%time

from sklearn.model_selection import GridSearchCV
import xgboost as xgb

# Create the XGBoost model
xgb_model = xgb.XGBRegressor()

# Hyperparameter grid for the grid search
param_grid_xgb = {
    # 'max_depth': [3, 5],
    # 'learning_rate': [0.01, 0.05],
    # 'n_estimators': [300, 400],
    # 'subsample': [1.0],  # subsample ratio
    'colsample_bytree': [0.8, 0.9],  # fraction of features used for each tree
    # 'gamma': [0, 0.1],  # minimum loss reduction required to split a node further
    'reg_alpha': [1],  # L1 regularization weight
    'reg_lambda': [1.1, 1.2],  # L2 regularization weight
    "n_estimators": [30, 50],
    "max_depth": [3, 4, 6],
    "n_jobs": [-1],
    "learning_rate": [.3, .2, .1],
    "subsample": [.8, 1.0],
}

# Create the grid search object
grid_search_xgb = GridSearchCV(xgb_model, param_grid_xgb, scoring=mae_scorer, cv=10, n_jobs=-1)

# Run the grid search
grid_search_xgb.fit(X_train, y_train)

# Print the best model and parameters
best_model_grid_search_xgb = grid_search_xgb.best_estimator_
best_params_grid_search_xgb = grid_search_xgb.best_params_
print("Best Model (XGBoost):", best_model_grid_search_xgb)
print("Best Parameters (XGBoost):", best_params_grid_search_xgb)

==> Searching for the best XGBoost hyperparameters with grid search

Best Model (XGBoost): XGBRegressor(base_score=None, booster=None, callbacks=None,
             colsample_bylevel=None, colsample_bynode=None,
             colsample_bytree=0.8, early_stopping_rounds=None,
             enable_categorical=False, eval_metric=None, feature_types=None,
             gamma=None, gpu_id=None, grow_policy=None, importance_type=None,
             interaction_constraints=None, learning_rate=0.1, max_bin=None,
             max_cat_threshold=None, max_cat_to_onehot=None,
             max_delta_step=None, max_depth=6, max_leaves=None,
             min_child_weight=None, missing=nan, monotone_constraints=None,
             n_estimators=50, n_jobs=-1, num_parallel_tree=None, predictor=None,
             random_state=None, ...)
Best Parameters (XGBoost): {'colsample_bytree': 0.8, 'learning_rate': 0.1, 'max_depth': 6, 'n_estimators': 50, 'n_jobs': -1, 'reg_alpha': 1, 'reg_lambda': 1.2, 'subsample': 1.0}
Wall time: 1min 27s
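GridSearchCV fits one model per parameter combination per fold, so the size of the grid has a large effect on wall time. A quick way to count the combinations before launching a search is scikit-learn's ParameterGrid. The sketch below copies the active entries of the two grids from this post; the per-fit cost also differs between the libraries, so the counts only explain part of the timing gap.

```python
from sklearn.model_selection import ParameterGrid

# Active entries of the LightGBM grid from section 1
param_grid_lgb = {
    'max_depth': [8, 10],
    'learning_rate': [0.04],
    'n_estimators': [50, 200],
    'min_child_samples': [1, 2],
    'subsample': [0.9],
    'colsample_bytree': [0.8],
    'reg_alpha': [0.3, 0.2],
    'reg_lambda': [0.4, 0.5],
    'random_state': [42],
}

# Active (uncommented) entries of the XGBoost grid from section 2
param_grid_xgb = {
    'colsample_bytree': [0.8, 0.9],
    'reg_alpha': [1],
    'reg_lambda': [1.1, 1.2],
    'n_estimators': [30, 50],
    'max_depth': [3, 4, 6],
    'n_jobs': [-1],
    'learning_rate': [.3, .2, .1],
    'subsample': [.8, 1.0],
}

cv = 10  # number of folds used in both searches

n_lgb = len(ParameterGrid(param_grid_lgb))  # 32 combinations
n_xgb = len(ParameterGrid(param_grid_xgb))  # 144 combinations

# Each combination is fit once per fold (the final refit adds one more fit)
fits_lgb = n_lgb * cv
fits_xgb = n_xgb * cv
print("LightGBM:", n_lgb, "combinations,", fits_lgb, "CV fits")
print("XGBoost: ", n_xgb, "combinations,", fits_xgb, "CV fits")
```

The XGBoost search therefore runs 4.5 times as many fits as the LightGBM one.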

# Predict the yield for the test data

y_preds= best_model_grid_search_xgb.predict(X_test)
y_preds

from sklearn.model_selection import train_test_split

num_train = len(train)  # number of training rows

# Split into training data and test data

X = all_data[:num_train]        # rows 0 to num_train - 1
X_test = all_data[num_train:]   # rows num_train to the last row

y = train['yield'].values


X_train , X_valid , y_train, y_valid = train_test_split(X,y, test_size = 0.2, random_state=0)

from sklearn.metrics import mean_absolute_error

y_pred = best_model_grid_search_xgb.predict(X_valid)

# Compute the MAE
mae = mean_absolute_error(y_valid, y_pred)
print("MAE:", mae)

MAE: 324.7218648920353

==> Rank 1106 / 1875

==> Leaderboard score: 345.18736! ==> No improvement over LightGBM
