Multicollinearity in regression analysis occurs when two or more explanatory variables are highly correlated with each other, such that they do not provide unique or independent information in the regression model. If the degree of correlation between variables is high enough, it can cause problems when fitting and interpreting the regression model.

For example, suppose you run a multiple linear regression with the following variables:

Response variable: max vertical jump

Explanatory variables: shoe size, height, time spent practicing

In this case, the explanatory variables shoe size and height are likely to be highly correlated, since taller people tend to have larger shoe sizes. This means that multicollinearity is likely to be a problem in this regression.

Fortunately, it’s possible to detect multicollinearity using a metric known as the variance inflation factor (VIF), which measures how strongly each explanatory variable is correlated with the other explanatory variables in a regression model.
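
For reference, the standard definition of the VIF for the j-th explanatory variable can be written as follows. The symbol R_j^2 is notation introduced here (it does not appear in the original text) for the R-squared obtained by regressing that variable on all of the other explanatory variables:

```latex
\[
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\]
```

When R_j^2 is 0 the VIF equals 1, and as R_j^2 approaches 1 the VIF grows without bound. For instance, an R_j^2 of 0.8 gives a VIF of 5.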

This tutorial explains how to use VIF to detect multicollinearity in a regression analysis in Stata.

Example: Multicollinearity in Stata

For this example we will use the Stata built-in dataset called auto. Use the following command to load the dataset:

sysuse auto

We’ll use the regress command to fit a multiple linear regression model using price as the response variable and weight, length, and mpg as the explanatory variables:

regress price weight length mpg

Next, we’ll use the vif command to test for multicollinearity:

vif

This produces a VIF value for each of the explanatory variables in the model. The value for VIF starts at 1 and has no upper limit. A general rule of thumb for interpreting VIFs is as follows:

A value of 1 indicates there is no correlation between a given explanatory variable and any other explanatory variables in the model.

A value between 1 and 5 indicates moderate correlation between a given explanatory variable and other explanatory variables in the model, but this is often not severe enough to require attention.

A value greater than 5 indicates potentially severe correlation between a given explanatory variable and other explanatory variables in the model. In this case, the coefficient estimates and p-values in the regression output are likely unreliable.

We can see that the VIF values for both weight and length are greater than 5, which indicates that multicollinearity is likely a problem in the regression model.
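
To make the reported values less of a black box, a single VIF can be checked by hand. The following is a minimal sketch, assuming the standard definition of VIF as 1/(1 - R-squared) from an auxiliary regression of one explanatory variable on the others. It is an illustration, not part of the original walkthrough:

```stata
* Load the example data.
sysuse auto, clear

* Auxiliary regression: weight on the other explanatory variables only.
* The response variable price is deliberately left out.
quietly regress weight length mpg

* e(r2) holds the R-squared of the most recent regress call.
display "VIF for weight = " 1/(1 - e(r2))
```

The displayed number should agree with the value reported for weight in the VIF table of the full model. Repeating the same steps with length or mpg on the left-hand side gives the other two values.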

How to Deal with Multicollinearity

Often the easiest way to deal with multicollinearity is to simply remove one of the problem variables, since the variable you’re removing is likely redundant anyway and adds little unique or independent information to the model.

To determine which variable to remove, we can use the corr command to create a correlation matrix of the variables in the model. The correlation coefficients can help us identify which variables are highly correlated with each other and could be causing the multicollinearity:

corr price weight length mpg

We can see that length is highly correlated with both weight and mpg, and it has the lowest correlation with the response variable price. Thus, removing length from the model could solve the problem of multicollinearity without reducing the overall quality of the regression model.
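
As an optional variation that the original walkthrough does not use, Stata's pwcorr command reports the same pairwise correlations and, with its sig option, prints a significance level beneath each coefficient. A short sketch:

```stata
* Load the example data.
sysuse auto, clear

* Pairwise correlations with a p-value printed under each coefficient.
pwcorr price weight length mpg, sig
```

Note that pwcorr uses all available observations for each pair, so its results can differ from corr when variables have missing values.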

To test this, we can perform the regression analysis again using just weight and mpg as explanatory variables:

regress price weight mpg

We can see that the adjusted R-squared of this model is 0.2735 compared to 0.3298 in the previous model. This indicates that the overall usefulness of the model decreased only slightly. Next, we can find the VIF values again using the vif command:

vif

Both VIF values are below 5, which indicates that multicollinearity is no longer a problem in the model.
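
The before-and-after comparison can also be scripted in one pass. The sketch below assumes only the stored result e(r2_a), which regress saves for the adjusted R-squared. The scalar names r2a_full and r2a_reduced are hypothetical names chosen for this illustration:

```stata
* Load the example data.
sysuse auto, clear

* Full model: keep its adjusted R-squared in a scalar.
quietly regress price weight length mpg
scalar r2a_full = e(r2_a)

* Reduced model without length.
quietly regress price weight mpg
scalar r2a_reduced = e(r2_a)

* Show the two adjusted R-squared values side by side.
display "Adjusted R-squared, full model:    " %6.4f r2a_full
display "Adjusted R-squared, reduced model: " %6.4f r2a_reduced

* VIF table for the reduced model (the most recent regression).
estat vif
```

Here estat vif is the postestimation form of the command. It should produce the same table as the shorter vif used in this tutorial.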