Initially published in MBA Insights on Tuesday, July 16, 2019.
In almost every statistics class, students will learn this odd fact: in many cities, there are two numbers that trend together: ice cream sales and violent crime rates. The rise in both figures, however, occurs during the hotter, summer months, and the heat is considered to be a contributing factor for both the increase in ice cream sales and the short tempers that contribute to spikes in violent crime.
Without understanding the confounding variable (the heat), you would be left with what is deemed a spurious correlation between two variables that does not result from a direct relationship between the two variables, but from their relationship to an outside variable. This is used to teach one of the most critical points in statistics: correlation does not and should not imply causality.
31 Flavors of Ice Cream. Thousands of Flavors of Data
Picture yourself peering into the ice cream case and scanning the round cardboard containers--mint chocolate chip, cookies and cream, rocky road, strawberry. Yes, it can be overwhelming, but you can eventually take it all in and decide. It's not that easy with lending data. With the rapid advancement of technology, institutions now have more data than ever before, and they are finding new, and often exciting, ways to leverage that data.
Artificial intelligence, for example, is allowing mortgage lenders to leverage machine learning to utilize large sets of data points in the underwriting, pricing and marketing of loans. These new data points and methodologies have the potential to expand access to credit, which is something that is needed in many markets.
The Scoop of Ice Cream Is Connected to the Cone—Until it Falls Off
First the plop sound. Then the shrill sound of a child wailing. The cone and the ice cream appeared to be connected, but one flip of the hand and it becomes clear they are not. While it is easy to assume that the sheer volume of potential alternative data will easily facilitate the ability for lenders to draw appropriate conclusions from their increasingly vast troves of lending data, for example, as we noted with ice cream and crime rates, relationships between data are nuanced at best, and at worst, baffling.
For example, machine learning algorithms are often seeking to determine new, reliable ways to determine credit-worthiness of loan applicants. The power in machine learning is its capacity to "see" connections in data that our human brains are often incapable of noting, and these algorithms have started to note unusual connections.
For example, in reviewing historically repaid loans, some may note a correlation between the type of web browser the borrower used or the type of smart phone. It is unlikely that the web browser or phone type truly causes a borrower to pay as agreed or default. It is more likely that there is a spurious correlation in play and we don't know yet exactly what the confounding variable may be. Without appropriate context and a full understanding of the relationship between data points, making decisions based on these spurious correlations can be dangerous and can potentially cause disparate impact concerns.
A Waffle Was Just a Waffle—Until it Became the Ice Cream Cone
At the 1904 St. Louis World's Fair, a waffle vendor rolled his waffles into a cone when the neighboring ice cream vendor ran out of bowls. Voilá, the cornucopia shaped waffle becomes an American institution because someone took it out of context and used it for a different purpose.
As we can see, context is important in data analytics of any kind. This is especially true with the expansive sets of alternative data that are becoming more common with the use of AI and machine learning. Your business savvy side understands that data is often one of the most valuable resources that you have in remaining competitive in an evolving business landscape. The compliance officer in you is concerned about the potential for data to be misunderstood, misused, and/or presented in ways that could have devastating consequences. The data analyst in you is doing a happy dance just thinking about expanded data and the potential to leverage it for deeper analyses How does one reconcile these competing factors?
As the availability, scope and complexity of banking data becomes increasingly pronounced, lenders will want to be aware of the pitfalls that can readily come into play when analytics are performed without the proper context, input and oversight of regulatory compliance experts.
Now, who wants some ice cream?
To learn more, please complete the form below and we'll be in touch.