Abstract: Lenders are required to transmit relatively few raw data values describing consumer credit behavior to credit reporting agencies. From these raw data, credit bureaus have derived thousands of predictive attributes. By construction, these derived attributes are highly correlated. Multicollinearity is known to hamper the ability to explain statistical and machine learning models, yet it is generally desirable, if not required, to be able to explain the inner workings of credit risk models. When a modeler incorporates numerous highly collinear attributes in a statistical model to maximize predictive power, multicollinearity works against explainability. Traditional solutions to this problem include omitting variables whose coefficients have the wrong sign, or using factor analysis to collapse the original variables into a smaller set of new variables before estimating model parameters. Both solutions make the model easier to explain at the expense of its predictive power.
This paper describes a novel method that uses multicollinearity in the data to increase the predictive power of a credit risk model while still allowing reason codes to be extracted from it. We retain the original attributes and perform a factor analysis after building the predictive model, which identifies the concepts that describe the reasons credit may be denied. The model is constructed to be as predictive as possible using readily available data. Reason codes are then extracted from the factors; eight ways to accomplish this are described. The method can be applied to any credit scoring system, including traditional logistic regression models and machine learning models.
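To make the ordering of steps concrete (fit the model on the original attributes first, group attributes into factors afterwards, then read reason codes off the factors), here is a minimal sketch on synthetic data. It is not the paper's procedure and does not correspond to any of its eight extraction methods: the data, the function names (`fit_logistic`, `attribute_factors`, `reason_codes`), the ridge penalty, and the use of principal-component loadings of the correlation matrix as a stand-in for a full factor analysis are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=2000, l2=1e-2):
    """Gradient-descent logistic regression with a small ridge penalty (assumed)."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(steps):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (prob - y) / n + l2 * w)
        b -= lr * np.mean(prob - y)
    return w, b

def attribute_factors(X, n_factors):
    """Assign each attribute to the factor on which it loads most heavily.

    Stand-in for factor analysis: loadings are the leading eigenvectors of the
    correlation matrix scaled by the square root of their eigenvalues.
    """
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    top = np.argsort(eigvals)[::-1][:n_factors]
    loadings = eigvecs[:, top] * np.sqrt(eigvals[top])
    return np.argmax(np.abs(loadings), axis=1)

def reason_codes(x, w, factor_of, n_factors, top_k=1):
    """Rank factors by their summed contribution toward the 'bad' outcome."""
    contrib = w * x  # x is standardized, so the population mean is zero
    by_factor = np.array([contrib[factor_of == f].sum() for f in range(n_factors)])
    order = np.argsort(by_factor)[::-1]
    return [int(f) for f in order[:top_k]], by_factor

# Synthetic, hypothetical data: two latent concepts drive six collinear attributes.
rng = np.random.default_rng(0)
n = 2000
u = rng.normal(size=n)  # e.g. a "utilization" concept (illustrative label)
d = rng.normal(size=n)  # e.g. a "delinquency" concept (illustrative label)
raw = np.column_stack(
    [u + 0.3 * rng.normal(size=n) for _ in range(4)]
    + [d + 0.5 * rng.normal(size=n) for _ in range(2)]
)
X = (raw - raw.mean(axis=0)) / raw.std(axis=0)
p_bad = 1.0 / (1.0 + np.exp(-(-1.0 + 1.5 * u + 1.0 * d)))
y = (rng.random(n) < p_bad).astype(float)

# Step 1: fit on all original attributes. Step 2: group attributes afterwards.
w, b = fit_logistic(X, y)
factor_of = attribute_factors(X, n_factors=2)

# Step 3: for one applicant, report the factor contributing most to risk.
applicant = np.array([2.0, 2.0, 2.0, 2.0, -1.0, -1.0])
codes, by_factor = reason_codes(applicant, w, factor_of, n_factors=2)
print("factor of each attribute:", factor_of)
print("top reason factor:", codes, "contributions:", by_factor)
```

In this sketch, individual coefficients among collinear attributes can be unstable, but their sum within a factor is well determined, which is why the explanation is read at the factor level rather than per attribute.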
Bio: Michael McBurnett is a Distinguished Scientist in the Equifax Data Science Lab. He has 30 years of experience building, deploying, or monetizing mathematical models of human behavior in the credit risk, banking, combination utility, telecommunications, direct marketing, counterinsurgency warfare, intelligence, political, and academic arenas. His professional career has focused on mathematical and statistical modeling, data collection, the invention or identification of new data sources appropriate for particular problems, and data analysis. He is a co-inventor of NeuroDecision®, a regulatory-compliant method of producing actionable risk scores with appropriate adverse action codes using neural networks.