    import statsmodels.api as sm

    # Original (full) set of variables
    X_o = df_wdummy[…]
    X_o = sm.add_constant(X_o)  # add the intercept column
    y_o = df_wdummy[…]

    # Baseline results
    model_o = sm.OLS(y_o, X_o)
    results_o = model_o.fit()
    results_o.summary()

One approach to feature selection is to use p-values: variables whose p-values exceed a chosen threshold (typically 0.05) are flagged as not contributing significantly to the target variable, and hence can be dropped to reduce model complexity.

However, this approach has its own challenges when multicollinearity exists among the predictor variables, as illustrated through the Boston housing dataset in sklearn. When the selected predictor variables are not independent of each other, we cannot clearly determine or attribute the contribution of each predictor to the target variable, and the interpretability of the model coefficients becomes an issue.
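To make the attribution problem concrete, here is a minimal sketch on synthetic data (not the Boston dataset), using only the Python standard library. The helper names `ols_two_predictors` and `fit_case` are hypothetical, the true coefficients (1, 2, 2) and noise levels are made up for illustration, and the p-values use a normal approximation to the t distribution, which is an assumption that is only reasonable for larger samples.

```python
import random
from statistics import NormalDist, fmean


def ols_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2.

    Returns ((b1, b2), (se1, se2), (p1, p2)); p-values are approximate.
    """
    n = len(y)
    m1, m2, my = fmean(x1), fmean(x2), fmean(y)
    # Centered sums of squares and cross-products
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # shrinks towards 0 as x1 and x2 become collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    # Residual variance with n - 3 degrees of freedom (3 fitted parameters)
    rss = sum((c - b0 - b1 * a - b2 * b) ** 2 for a, b, c in zip(x1, x2, y))
    sigma2 = rss / (n - 3)
    se1 = (sigma2 * s22 / det) ** 0.5
    se2 = (sigma2 * s11 / det) ** 0.5
    # Two-sided p-values via a normal approximation (assumption, see lead-in)
    norm = NormalDist()
    p1 = 2 * (1 - norm.cdf(abs(b1 / se1)))
    p2 = 2 * (1 - norm.cdf(abs(b2 / se2)))
    return (b1, b2), (se1, se2), (p1, p2)


def fit_case(collinear, n=200, seed=0):
    """Simulate y = 1 + 2*x1 + 2*x2 + noise and fit it."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    if collinear:
        # x2 is almost a copy of x1
        x2 = [a + rng.gauss(0, 0.05) for a in x1]
    else:
        x2 = [rng.gauss(0, 1) for _ in range(n)]
    y = [1 + 2 * a + 2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    return ols_two_predictors(x1, x2, y)


ind = fit_case(collinear=False)
col = fit_case(collinear=True)
for label, (coefs, ses, pvals) in (("independent", ind), ("collinear", col)):
    print(label)
    for name, b, se, p in zip(("x1", "x2"), coefs, ses, pvals):
        print(f"  {name}: coef={b:.2f}  se={se:.2f}  p~{p:.3f}")
```

In the collinear case the standard errors of the individual coefficients are far larger, so either predictor may land above the 0.05 threshold even though both drive `y`. The sum of the two coefficients stays close to 4, which shows that the model can estimate their combined effect but cannot reliably split it between them.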