Outliers

When I explored the distributions of my variables in R a few weeks ago, I could have looked for outliers then. Here I’ll take another look and consider removing any unusual observations.

First, I’ll draw charts of dnce versus each of my three significant (and highly valuable) predictors. Keep in mind that outliers should be removed before modeling begins: you wouldn’t want outliers in your TRAINING set or your TEST set, right? So I’ll use the original dataset, d, to check whether it contains outliers.
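A minimal sketch of how charts like these could be drawn in base R. The toy data frame below is invented so the snippet runs on its own; only the column names dnce, bpm, val, and acous come from the dataset.

```r
# Toy stand-in for the real dataset d (all values are made up for illustration)
set.seed(1)
d <- data.frame(
  bpm   = c(0, round(runif(49, 60, 200))),
  val   = c(0, round(runif(49, 5, 95))),
  acous = c(0, round(runif(49, 0, 90)))
)
d$dnce <- c(0, round(runif(49, 20, 95)))

# One scatterplot of dnce against each predictor, side by side
predictors <- c("bpm", "val", "acous")
old_par <- par(mfrow = c(1, 3))
for (p in predictors) {
  plot(d[[p]], d$dnce, xlab = p, ylab = "dnce", main = paste("dnce vs", p))
}
par(old_par)  # restore the previous plotting layout
```

Points sitting exactly on an axis (a value of 0) are the kind of thing to look for in these plots.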

I don’t see too many extremes, but I do see values of bpm, val, and acous that are equal to 0, and that seems quite strange. So I won’t focus my efforts on trimming top and bottom percentiles, but rather on removing that odd behavior.

Let’s count how often these things occur before removing them:
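One way to do the counting in base R, sketched on a tiny made-up data frame (the real d is assumed to have the same four columns):

```r
# Tiny made-up example; the real d is assumed to have these columns
d <- data.frame(
  dnce  = c(0, 65, 72, 50, 81),
  bpm   = c(0, 120, 95, 128, 104),
  val   = c(0, 40, 0, 77, 52),
  acous = c(0, 0, 12, 3, 25)
)
vars <- c("dnce", "bpm", "val", "acous")

# Number of zeros in each column
zero_counts <- colSums(d[vars] == 0)
print(zero_counts)

# Number of rows with a zero in at least one of the columns
n_any_zero <- sum(rowSums(d[vars] == 0) > 0)
print(n_any_zero)
```

The per-column counts show which variable is affected most, and the row count shows how many observations would be lost if every row containing a zero were dropped.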

In this case, little is at stake if I drop the rows where the response (dnce) or the predictors (bpm, val, and acous) are 0, so I’m comfortable removing them. In most scenarios, a real-world analysis would come with real consequences, and a decision like this would deserve more scrutiny.

Here’s the code I’ll use to remove the 0 observations:
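A minimal sketch of the filtering step. It assumes a row is dropped when any of the four variables equals 0. The data frame is made up, and d_clean is a name chosen here for illustration.

```r
# Tiny made-up example; the real d is assumed to have these columns
d <- data.frame(
  dnce  = c(0, 65, 72, 50, 81),
  bpm   = c(0, 120, 95, 128, 104),
  val   = c(0, 40, 0, 77, 52),
  acous = c(0, 0, 12, 3, 25)
)
vars <- c("dnce", "bpm", "val", "acous")

# Keep only rows where none of the four variables is 0 (assumed rule)
keep <- rowSums(d[vars] == 0) == 0
d_clean <- d[keep, ]

nrow(d_clean)  # how many observations remain
```

Writing the result to a new object leaves the original d untouched, which makes a before-and-after comparison easy.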

My dataset now has 530 observations, which is not very many. This is something I might be concerned about if my prediction accuracy starts to suffer. But for now, let’s just see if removing these outliers improves the overall fit of our model to the other, more typical, observations in the dataset.

Now I’ll divide the dataset and build my model:
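A sketch of the split-and-fit step using lm. The 70/30 split, the seed, and the simulated data are all assumptions for illustration; only the variable names come from the dataset.

```r
# Simulated stand-in for the cleaned dataset (values are not real)
set.seed(42)
n <- 200
d_clean <- data.frame(
  bpm   = runif(n, 60, 200),
  val   = runif(n, 5, 95),
  acous = runif(n, 1, 90)
)
d_clean$dnce <- 40 + 0.05 * d_clean$bpm + 0.3 * d_clean$val -
  0.1 * d_clean$acous + rnorm(n, sd = 10)

# Random 70/30 split into training and test sets (split ratio is an assumption)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- d_clean[train_idx, ]
test  <- d_clean[-train_idx, ]

# Linear model on the training set with the three predictors
fit <- lm(dnce ~ bpm + val + acous, data = train)
r2 <- summary(fit)$r.squared
print(r2)
```

The R-squared printed here describes the simulated data only and says nothing about the real model.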

If you recall, the model fit to the original dataset (before removing outliers) had an R-squared of 34.91%. Now that I’ve removed these strange observations, the R-squared has increased to 37.64%.

Taking it a step further, it would also be a good idea to check prediction accuracy and stability before and after removing the outliers. Since I have a much smaller sample size now, that check would confirm the model is really improving.
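One possible way to run that check is to repeat the random split many times, then compare the average test RMSE (accuracy) and its spread (stability). This is only a sketch: holdout_rmse is a hypothetical helper, the data are simulated, and the 70/30 split and 20 repetitions are arbitrary choices.

```r
# Hypothetical helper: test-set RMSE over repeated random train/test splits
holdout_rmse <- function(data, reps = 20, train_frac = 0.7) {
  sapply(seq_len(reps), function(i) {
    idx  <- sample(seq_len(nrow(data)), size = floor(train_frac * nrow(data)))
    fit  <- lm(dnce ~ bpm + val + acous, data = data[idx, ])
    pred <- predict(fit, newdata = data[-idx, ])
    sqrt(mean((data$dnce[-idx] - pred)^2))
  })
}

# Simulated data with ten artificial all-zero rows (values are not real)
set.seed(7)
n <- 150
d <- data.frame(bpm = runif(n, 60, 200), val = runif(n, 5, 95),
                acous = runif(n, 1, 90))
d$dnce <- 40 + 0.05 * d$bpm + 0.3 * d$val - 0.1 * d$acous + rnorm(n, sd = 10)
vars <- c("dnce", "bpm", "val", "acous")
d[1:10, vars] <- 0
d_clean <- d[rowSums(d[vars] == 0) == 0, ]

rmse_before <- holdout_rmse(d)
rmse_after  <- holdout_rmse(d_clean)

# Mean RMSE reflects accuracy; its standard deviation reflects stability
print(c(mean = mean(rmse_before), sd = sd(rmse_before)))
print(c(mean = mean(rmse_after),  sd = sd(rmse_after)))
```

Keep in mind that the "before" test sets still contain zero rows, so the two RMSE figures are not measured on identical observations. Scoring both models on the same cleaned test set would be a stricter comparison.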

Removing outliers is an easy way to improve overall model fit to the typical observations of the response variable, especially when compared to collecting more observations or adding new predictor variables you don’t already have.

Thanks for following along!
