May 25, 2021

Building a solid data foundation is the first step to driving better intelligence

Artificial intelligence and machine learning are already helping attorneys see how different decisions—including what outside counsel to use, or whether to use them at all—will affect litigation settlement amounts, legal fees, and case duration. Going forward, those technologies will only become more advanced, giving AI-assisted lawyers such an unfair advantage over their competitors that not using AI may start to border on malpractice. But there’s a catch: You don’t get this unfair advantage unless your data is clean and complete for the purposes at hand.

Not all corporate legal departments (CLDs) are there yet, for several reasons. For starters, while lawyers mostly speak the same language when doing actual legal work, they do not speak the same language when administering it. A legal matter might be categorized one way at one CLD and another way at another CLD (and even categorized inconsistently within either of those departments). Even more vexing, a lot of data is not recorded at all.
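A common cleanup task behind this problem is mapping the many labels people use for the same kind of matter onto one taxonomy. Here is a minimal Python sketch. The labels and categories are hypothetical and do not come from any particular system. It normalizes case and spacing, then sends anything it does not recognize to a review queue instead of guessing.

```python
from typing import Optional

# Hypothetical mapping from free-text matter-type labels, as different
# teams might enter them, to a single canonical taxonomy.
CANONICAL_MATTER_TYPES = {
    "employment": "Labor & Employment",
    "l&e": "Labor & Employment",
    "labor and employment": "Labor & Employment",
    "lit - employment": "Labor & Employment",
    "ip": "Intellectual Property",
    "intellectual property": "Intellectual Property",
    "patent": "Intellectual Property",
}


def normalize_matter_type(raw_label: Optional[str]) -> str:
    """Map a raw label to its canonical category, or flag it for human review."""
    if raw_label is None or not raw_label.strip():
        return "UNCATEGORIZED"
    key = " ".join(raw_label.lower().split())  # ignore case and extra whitespace
    return CANONICAL_MATTER_TYPES.get(key, "NEEDS_REVIEW")


if __name__ == "__main__":
    for label in ["L&E", "  Labor and   Employment ", "Patent", "Misc.", ""]:
        print(repr(label), "->", normalize_matter_type(label))
```

Sending unknown labels to "NEEDS_REVIEW" keeps a person in the loop. The mapping only grows when someone who understands the matter decides where a new label belongs.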

The most obvious case is insourced work. The realities of e-billing force CLDs to record every matter they send to outside counsel; otherwise, their law firms cannot get paid. No comparable constraint requires anyone to create a record for a matter handled in-house. And even when some data is recorded, it often means nothing to anyone except the person who entered it.

These challenges need to be addressed if you want accurate predictions from AI. Here are five strategies to build the required foundation.

Appoint a “Data Czar”

Your data strategy should start at the top. That means designating an executive with the responsibility of ensuring your data is clean and complete enough for not only your current needs, but also the future state you envision.

You could call this person a chief data officer, but I liken them more to a data czar. A czar denotes someone with great power or authority, which is exactly what you need when building a data-driven CLD. You need someone who will drive a culture dedicated to embracing data-based decision making and take responsibility for the quality of data within your organization.

This person needs the technical chops and gravitas to get people to listen and understand the importance of having clean data. It needs to be obvious to both them and everybody else that ultimate responsibility for clean data falls on their shoulders and their shoulders alone, giving them the incentive to crack down on sloppy data. Saying things like, “Everybody needs to take responsibility for data quality” in practice ends up meaning the exact opposite: “Nobody needs to take responsibility for data quality.”

Create a Data Double Team

Your data czar needs to create a team to do the actual cleanup. The team needs to have at least two areas of expertise: expertise in analyzing and cleaning up legal data, and subject matter expertise in the underlying legal activity that data represents.

Data experts will be responsible for sanitizing and categorizing your data. They don’t necessarily need to be data scientists or know coding languages like Python or R, but they should have good mathematical and critical thinking skills and strong knowledge of whatever business intelligence tools your organization uses. Ideally, they will have worked with legal data before and understand the most common ways it can be incomplete or inaccurate.

They’ll work with the subject matter experts (SMEs), who will apply their knowledge of a specific legal area to determine how the data applies to the matter at hand. More on that in a minute.

Clean Up Your Data

Now it’s time to begin the mundane yet highly necessary task of curating your data. That starts with the data team creating a list of every data field that might conceivably matter to what you’re trying to predict.

Keep in mind that not every data field is created equal. Some will be vitally important to the predictions you are trying to make, while others will not matter at all. In single-plaintiff labor and employment (L&E) litigation, the fields that probably matter include the identity of opposing counsel, the plaintiff’s protected attributes, the named causes of action, and so forth. The day of the week your organization was served probably doesn’t matter.

Your SME should know which fields obviously matter, which obviously don’t, and which might. After eliminating the obvious nonstarters, your data expert and SME should keep working with the data, running analyses such as a regression analysis, to eliminate every field except those with significant predictive value. In the L&E example above, I personally performed an analysis for a consulting client that showed a particular type of claim practically guaranteed a settlement of no more than $15,000. That was highly relevant, considering the client had been farming those cases out for a flat fee of $65,000. We adjusted our pricing scheme accordingly.
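As a rough illustration of field screening, here is a Python sketch that scores each candidate field by the R² of a simple one-variable least-squares fit against the outcome and keeps those above a threshold. This is a deliberate simplification of the regression analysis mentioned above. A real analysis would fit several fields together and test statistical significance, not apply a fixed R² cutoff. The field names, threshold, and data are all hypothetical.

```python
from typing import Dict, List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of a simple least-squares fit of y on x (equal to Pearson r squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant field (or constant outcome) explains nothing
    return (sxy * sxy) / (sxx * syy)


def screen_fields(
    matters: List[Dict[str, float]], outcome: str, threshold: float
) -> Dict[str, float]:
    """Keep candidate fields whose univariate R^2 against the outcome meets the threshold."""
    y = [m[outcome] for m in matters]
    fields = [f for f in matters[0] if f != outcome]
    scores = {f: r_squared([m[f] for m in matters], y) for f in fields}
    return {f: round(s, 3) for f, s in scores.items() if s >= threshold}


if __name__ == "__main__":
    # Six made-up matters; the 1/0 flags are hypothetical indicator fields.
    matters = [
        {"claim_type_wage_hour": 1, "served_on_monday": 0, "settlement": 12000},
        {"claim_type_wage_hour": 1, "served_on_monday": 1, "settlement": 14000},
        {"claim_type_wage_hour": 1, "served_on_monday": 0, "settlement": 10000},
        {"claim_type_wage_hour": 0, "served_on_monday": 1, "settlement": 80000},
        {"claim_type_wage_hour": 0, "served_on_monday": 0, "settlement": 95000},
        {"claim_type_wage_hour": 0, "served_on_monday": 1, "settlement": 70000},
    ]
    print(screen_fields(matters, outcome="settlement", threshold=0.5))
```

In this toy data the claim-type flag survives and the service-day flag does not. Six invented records say nothing about real predictive value. The example only shows the mechanics of ranking fields so the SME can review the shortlist.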

Once the important fields have been determined, teams must audit data for accuracy and completeness. This is a messy process that will require extensive consultation with the people who didn’t put the information in completely and accurately the first time. Even though it is a lot of work, there is no better way to get that tribal knowledge into actual records than picking up the phone.
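A completeness audit can start with two simple outputs: the fill rate of each field that matters, and a call list showing which records are missing what and who entered them. Here is a minimal Python sketch. The field names are hypothetical, including an `entered_by` field that records who created each matter.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical fields identified as important in the previous step.
REQUIRED_FIELDS = ["matter_id", "matter_type", "opposing_counsel", "cause_of_action"]


def _is_filled(value: Optional[str]) -> bool:
    """Treat None, empty strings, and whitespace-only strings as blank."""
    return bool((value or "").strip())


def completeness_report(matters: List[Dict[str, Optional[str]]]) -> Dict[str, float]:
    """Share of matters with a non-blank value in each required field."""
    total = len(matters)
    return {
        field: (sum(_is_filled(m.get(field)) for m in matters) / total if total else 0.0)
        for field in REQUIRED_FIELDS
    }


def follow_up_list(
    matters: List[Dict[str, Optional[str]]],
) -> List[Tuple[Optional[str], List[str], Optional[str]]]:
    """(matter_id, missing fields, person who entered it) for each incomplete matter."""
    calls = []
    for m in matters:
        missing = [f for f in REQUIRED_FIELDS if not _is_filled(m.get(f))]
        if missing:
            calls.append((m.get("matter_id"), missing, m.get("entered_by")))
    return calls
```

The call list is the useful part. It turns "the data is incomplete" into a concrete set of people to phone and questions to ask them.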

Finally, it’s time to set up a system to reduce the extent to which you’ll have to reclean the data going forward. Establish policies about what values belong in what fields, and train people on those policies. Set up rules and other controls in your e-billing and other systems that prevent inaccurate or blank field values.
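Here is a minimal Python sketch of what entry-time controls check: blank required fields, and values outside a fixed list, which is what a drop-down menu enforces. The field names, allowed values, and standalone function are hypothetical. Real e-billing and matter management systems configure required fields and pick lists through their own settings, not through code like this. The `handled_by` field is included only to show how a required field could also capture insourced matters.

```python
from typing import Dict, List, Optional

# Hypothetical field policy: required fields, plus fixed value lists that
# would back drop-down menus in an intake or e-billing form.
REQUIRED = {"matter_type", "cause_of_action", "handled_by"}
ALLOWED_VALUES = {
    "matter_type": {"Labor & Employment", "Intellectual Property", "Commercial Contract"},
    "handled_by": {"Outside counsel", "In-house"},
}


def validate_matter(record: Dict[str, Optional[str]]) -> List[str]:
    """Return policy violations; an empty list means the record can be saved."""
    errors = []
    for field in sorted(REQUIRED):
        if not (record.get(field) or "").strip():
            errors.append(f"{field}: required value is blank")
    for field, allowed in ALLOWED_VALUES.items():
        value = record.get(field)
        # Blank values were already reported above; only check non-blank ones here.
        if value and value.strip() and value not in allowed:
            errors.append(f"{field}: {value!r} is not an allowed value")
    return errors


if __name__ == "__main__":
    print(validate_matter({"matter_type": "Employment", "cause_of_action": " ", "handled_by": None}))
```

Rejecting a record at entry is cheaper than finding the gap during a later audit, but it only works if the allowed-value lists stay in step with the policies people were trained on.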

Despite best efforts, you will never be able to keep dirty or inaccurate data out of your systems entirely. That’s why I recommend auditing your data for quality and completeness every quarter. It’s a manual, sometimes maddening process, but it’s necessary.
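One way to see between audits whether your controls are working is to track each field’s fill rate by the quarter in which matters were opened. Here is a minimal Python sketch. It assumes each record carries a hypothetical `opened` date, and any field name passed in is illustrative.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Optional, Union

Record = Dict[str, Union[date, Optional[str]]]


def quarter_of(d: date) -> str:
    """Calendar quarter label, e.g. date(2021, 5, 25) -> '2021-Q2'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def fill_rate_by_quarter(matters: List[Record], field: str) -> Dict[str, float]:
    """Share of matters opened in each quarter that have a non-blank value for `field`."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # quarter -> [filled, total]
    for m in matters:
        q = quarter_of(m["opened"])
        counts[q][1] += 1
        if (m.get(field) or "").strip():
            counts[q][0] += 1
    return {q: filled / total for q, (filled, total) in sorted(counts.items())}
```

A fill rate that drops from one quarter to the next points to the place where the policies or system rules need attention.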

Get Help to Prepare for Change

Getting the right information together and turning it into actionable intelligence is hard, so consider working with a consultant who can guide you through the process.

They can work with you to create a plan to implement the necessary changes to make good data hygiene a key part of your organizational culture. They’ll help you develop your data management teams and other resources, map out project milestones and completion dates, and advise you on how best to involve your legal teams in the process.

They’ll also help you work out which pieces of information you need to capture and why, make sure you actually record the data points that may matter to decisions down the road, and make sure you capture them the right way, for example with drop-down lists when a field has a fixed set of acceptable values. The best consultants will also help you communicate the reasoning behind any technical and process changes you implement, improving the chances that those changes will be embraced throughout the CLD.

Don’t Lose Sight of Your Goals

Remember why you’re doing this. It’s not to become the most innovative CLD on the block. It’s not to dabble in some really cool technology. It’s to save time and money and to start seeing better litigation outcomes. If you take the time to build the right data foundation, you’ll see those benefits sooner.

Nathan Cemenska
Director of Legal Operations and Industry Insights

Nathan Cemenska, JD/MBA, is the Director of Legal Operations and Industry Insights at Wolters Kluwer's ELM Solutions.
