Demystifying Data: A Litigator's Guide to Understanding Data Analytics

Data analytics has infiltrated the world of law, and we outline a Data Analytics Blueprint applicable for deliverables that are used in a litigation.

October 03, 2018

"Your honor, we would like to request more time." More time. 

Anxiety tinged with embarrassment radiates down your extremities as you await a response from the judge, from opposing counsel. You thought you were prepared: Schedules were set, the team was assembled, and experts were hired. But then the data started coming in. First a few kilobytes of PDFs over email, then megabytes of Excel files over SFTP.[1] Finally, the hard drives came, filled with terabytes of something called "Flat Files." "Flat Files" with "delimiter" issues, you were told. Before you knew it, expert reports were due, and you had not even been able to open your client's data until last week. So here you are, on pins and needles, waiting for a response.

Data analytics is the way of the future, as you have likely heard.[2] Theo Epstein, who served as General Manager of the Boston Red Sox and later as President of Baseball Operations for the Chicago Cubs, used data analytics to revolutionize baseball, breaking the Curse of the Bambino and the Curse of the Billy Goat and leading both teams to World Series titles.[3] Now, data analytics has infiltrated the world of law and litigation. But what is it, and how can it be used in a legal setting?

In designing a winning litigation strategy, attorneys and corporate legal teams use information to support or refute allegations. An overwhelming amount of information is generated each day by corporations and individuals in the form of structured and unstructured data.[4] Due to the sheer magnitude of the data, it is often not easy to discern any meaning from it. This is where data analytics becomes important. Data analytics uses computer techniques to break data into small, digestible bits of meaningful information.[5] This information, which data analytics experts can gather and interpret, can be used in any scenario where large amounts of data have been produced. The results can be used for an expert report, legal brief, or other deliverables that are used in a litigation or investigation.

The process that converts raw data into useful information is what we will refer to as the Data Analytics Blueprint (the "Blueprint"). The Blueprint follows six steps and, although we map out these steps chronologically, in reality they are often iterative or performed concurrently. This is the nature of the process, especially in the context of litigation where data is updated, replaced, or deemed irrelevant as the case progresses. Thus, it is vital for experts in data analytics to be hired far in advance and for the client, data owners, attorneys, and experts to communicate on an ongoing basis.

Data Analytics Blueprint

The six steps of the Data Analytics Blueprint can be broken into two parts. A useful analogy for imagining these two parts is the process of painting a house. The first part is the grunt work: choosing a color, laying drop cloths, washing walls, taping edges, etc. The second part is when you reap the benefits of this preparation; it is the easiest and most rewarding part when you can finally begin painting. In analytics, the grunt work involves defining your data goals, extracting the data, and validating that data. The second part, the painting equivalent, includes having the data analytics experts perform the analysis, build the model, and report their findings.

Each step described herein will be followed by a hypothetical case study example. The case study is an amalgamation of actual cases worked on by experienced Stout data forensics practitioners.


Docs R Us was a medical provider that offered physical therapy to patients through a contract with a large health insurance carrier. As part of the contract, patients would visit a Docs R Us clinic and receive physical therapy. Docs R Us would then send a bill to the insurance carrier for the services provided. Next, the insurance carrier would pay Docs R Us based on specified contract rates. After several years, Docs R Us became suspicious that they were not being paid at the contract rates and filed a lawsuit. Docs R Us hired an expert data forensics team to determine if they were being underpaid and, if so, by how much.

Part One


In the first step of the Blueprint, you must define your objectives and needs. It is helpful to frame these objectives and needs using the Three W's: "What?", "When?", and "Who?" The "What?" involves identifying what data you need. Depending on the issue, this could mean accounting, operational, or marketing data. It is advantageous to collect data as broadly as possible at the beginning of the case so your experts can familiarize themselves with what data is available. Next you must identify the "When?", which defines the time period you are interested in. Considerations for this could include defining the damage period or any statute of limitations that may exist. Last, the "Who?" identifies the parties and stakeholders that were impacted by the event being analyzed. This could involve identifying product numbers related to a product-recall litigation or insurance policyholders party to a class action lawsuit.

Case Study

The Three W's for Docs R Us: In our case study, the objective is to determine if Docs R Us was being underpaid and by how much. The "What?" to accomplish this would be billing data, payment data, and contract rate data. For the "When?", the Docs R Us contract with the insurance carrier began in January 2005 and ended in December 2015. However, statute of limitations rules limited the damage period to March 2010 through December 2015, which defined the time span of the data to request. Finally, for the "Who?", the Docs R Us data included patients that were not covered by the insurance carrier, and the payment data included payments on non-Docs R Us claims. Thus, a list of Social Security numbers for the patients covered by the carrier should be obtained, as well as a list of Docs R Us provider identification numbers, so that out-of-scope records can be excluded.
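
To make the "When?" and "Who?" concrete, the scoping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (claim_id, provider_id, service_date), the sample claims, and the exact first and last days of the damage period are hypothetical and do not come from the case.

```python
from datetime import date

# Assumed boundaries for the damage period (March 2010 through December 2015).
DAMAGE_START = date(2010, 3, 1)
DAMAGE_END = date(2015, 12, 31)


def in_scope(claim, provider_ids):
    """Return True when a claim is inside the damage period and billed by a known provider."""
    return (
        DAMAGE_START <= claim["service_date"] <= DAMAGE_END
        and claim["provider_id"] in provider_ids
    )


# Hypothetical records: one too early, one in scope, one from an unrelated provider.
claims = [
    {"claim_id": "A1", "provider_id": "P100", "service_date": date(2009, 6, 15)},
    {"claim_id": "A2", "provider_id": "P100", "service_date": date(2012, 1, 10)},
    {"claim_id": "A3", "provider_id": "X999", "service_date": date(2013, 5, 2)},
]
provider_ids = {"P100"}

scoped_claims = [c for c in claims if in_scope(c, provider_ids)]
```

In practice the same filter would also be applied against the list of covered patients, and the excluded records would be retained and counted so the scoping can be explained later.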


Extracting data from client systems is often the most complex part of the analytics process. Data extraction involves working with data owners, IT teams, and others who work with the data day in and day out. These people are often so embedded in the data that the language they use can sound like Dothraki to a layperson.[6]

Even after you have coordinated with data owners to identify the proper systems and locations, the extraction itself can be complicated and laborious. Corporate enterprise software is expensive and time-consuming to upgrade, which often leaves clients with antiquated and inefficient systems. In addition, client acquisitions of external companies might require extractions from multiple systems or outsourced providers. All of these factors can prove to be both costly and time-consuming in a litigation setting. Thus, it is crucial for attorneys and data analytics experts to work closely with the client's data owners to map out a work plan and timeline for extraction early in the matter.

Case Study

The Docs R Us case suffered from many of the aforementioned complications. It took weeks of client and expert coordination to splice together different systems to obtain a complete set of data. One particularly complicated factor was a system that stored physical therapy pricing dynamically, meaning that the system did not archive historical rates. This meant that, for a record of physical therapy performed in 2010, the rates reflected current pricing. As a result, we had to work with the client to systematically fill in the historical rates.
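
One way to picture the historical-rate problem is a lookup that returns the rate in effect on the date of service rather than the current rate. The sketch below is a minimal illustration, not the method used in the case: the rate schedule, its dollar amounts, and the function name rate_on are all hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical rate history for a single procedure: (effective date, rate), sorted by date.
rate_history = [
    (date(2010, 1, 1), 80.00),
    (date(2012, 7, 1), 85.00),
    (date(2014, 1, 1), 90.00),
]


def rate_on(service_date, history):
    """Return the rate in effect on service_date, or None if it predates the schedule."""
    effective_dates = [effective for effective, _ in history]
    # Number of schedule entries that took effect on or before the service date.
    idx = bisect_right(effective_dates, service_date)
    return history[idx - 1][1] if idx else None


rate_2010 = rate_on(date(2010, 6, 1), rate_history)
rate_2015 = rate_on(date(2015, 3, 1), rate_history)
```

A record of therapy performed in 2010 would then be priced at the 2010 rate instead of whatever rate the system happens to show today.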


At this point in the Blueprint, you have defined what you want to do with your data, and you have planned how to get there. You have coordinated with the client's data team and performed the data extract. Now what? Now it is time to find out what you have and, equally important, what you do not have. As Q. Ethan McCallum explains in Bad Data Handbook, "You can't assume that a new dataset is clean and ready for analysis."[7]

Data validation is where you get your hands dirty and determine if there are anomalies, errors, or other complications. Common issues include duplicate records, inconsistent fields, delimiter issues, and gaps in the data. Resolution of these issues is essential for the integrity of the data, and it involves coordination between client personnel, attorneys, and experts.
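
The common issues listed above lend themselves to simple automated checks. The following Python sketch, using only the standard library, flags rows with the wrong number of fields (a typical symptom of a delimiter issue), exact duplicate records, and missing months. The pipe-delimited sample, the column names, and the function names are assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical pipe-delimited flat file with a duplicate, a gap, and a malformed row.
SAMPLE = """claim_id|service_month|amount
C1|2010-03|100.00
C2|2010-04|120.00
C2|2010-04|120.00
C3|2010-06|90.00
C4|2010-06|75.00|unexpected
"""


def missing_months(months):
    """List YYYY-MM values absent between the earliest and latest month seen."""
    seen = sorted(set(months))
    if not seen:
        return []
    year, month = map(int, seen[0].split("-"))
    last = tuple(map(int, seen[-1].split("-")))
    missing = []
    while (year, month) <= last:
        label = f"{year:04d}-{month:02d}"
        if label not in seen:
            missing.append(label)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return missing


def validate(text, delimiter="|"):
    """Return malformed rows, duplicated records, and gaps in monthly coverage."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, records = rows[0], rows[1:]
    width = len(header)
    # A field count that differs from the header often signals a delimiter problem.
    malformed = [r for r in records if len(r) != width]
    clean = [r for r in records if len(r) == width]
    counts = Counter(tuple(r) for r in clean)
    duplicates = [list(rec) for rec, n in counts.items() if n > 1]
    month_col = header.index("service_month")
    gaps = missing_months(r[month_col] for r in clean)
    return {"malformed": malformed, "duplicates": duplicates, "missing_months": gaps}


report = validate(SAMPLE)
```

Checks like these only surface candidates. Whether a duplicate is an error or a legitimate repeat visit is a question for client personnel.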

Case Study

The Docs R Us case sourced millions of records from multiple systems based on the time period and location of service. The validation process involved the reconciliation of patient medical records to billing records, billing records to payment records, and periodic sampling of patient records to verify compliance and accuracy with government guidelines. As an example, certain medical codes had standardized Medicare billing rates. Part of the validation process included graphing the extract's price of those codes over time and ensuring the pricing was consistent with Medicare rates through data visualization.

Part Two – Data Deduction


The analysis of the data builds on the work completed in the validation step to begin creating useful information. The work performed in the analysis step helps to guide how the model will be built and what factors it should consider. If performed during the discovery phase, analysis could potentially be useful in discrediting or invalidating data from inclusion in expert reports.

Case Study

During the Docs R Us litigation, five productions of contract rate data were received. The analysis performed on this data helped us piece together the appropriate productions to create a complete and reliable dataset. Separately, the opposing side argued that Docs R Us billing data was incomplete and should not be used as part of the litigation. In response, the insurance carrier’s billing data was obtained and compared with the Docs R Us billing data. After rigorous reconciliations and detailed sampling were performed, the data analytics expert successfully argued that the insurance carrier's data, not the Docs R Us data, was incomplete.
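
A reconciliation of this kind can be thought of as a comparison of two sets of claims. The sketch below is an assumption-laden illustration rather than the procedure used in the case: it supposes each side's billing data has been reduced to a mapping of claim ID to billed amount, and all IDs and amounts are hypothetical.

```python
from decimal import Decimal


def reconcile(provider_claims, carrier_claims):
    """Compare two {claim_id: billed_amount} mappings and report the differences."""
    provider_ids = set(provider_claims)
    carrier_ids = set(carrier_claims)
    return {
        "only_in_provider": sorted(provider_ids - carrier_ids),
        "only_in_carrier": sorted(carrier_ids - provider_ids),
        "amount_mismatch": sorted(
            cid
            for cid in provider_ids & carrier_ids
            if provider_claims[cid] != carrier_claims[cid]
        ),
    }


# Hypothetical extracts from each side's billing data.
provider_claims = {
    "C1": Decimal("100.00"),
    "C2": Decimal("120.00"),
    "C3": Decimal("90.00"),
}
carrier_claims = {
    "C1": Decimal("100.00"),
    "C2": Decimal("110.00"),
    "C4": Decimal("50.00"),
}

differences = reconcile(provider_claims, carrier_claims)
```

Claims that appear on only one side are the starting point for sampling: pulling the underlying records shows which dataset is actually missing information.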


The modeling step of the Blueprint is where you build your engine and turn your fuel (data) into power (useful information). The actual model will vary depending on the circumstances of the case and the sophistication of the data available.

Case Study

The next step in the Docs R Us case was designing the damage calculation. The client claimed they were owed a rate of compensation above what they were actually paid. Thus, the model subtracted "what was actually paid" from "what should have been paid" for each claim throughout the damage period and calculated the total. While this calculation appears to be straightforward on the surface, it was not. It took years from the start of the case to arrive at a completed model, with complications including those mentioned in the Extract and Validate steps, as well as data updates and replacements submitted by both parties in the case.
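
Stripped of those complications, the core arithmetic of such a model is a per-claim shortfall summed over the damage period. The Python sketch below computes the shortfall for each claim as the contract amount less the amount paid. The field names and dollar figures are hypothetical, and a real model would also need to handle the data issues described in the earlier steps.

```python
from decimal import Decimal

# Hypothetical claims: what the contract called for versus what was actually paid.
claims = [
    {"claim_id": "C1", "contract_amount": Decimal("100.00"), "paid_amount": Decimal("80.00")},
    {"claim_id": "C2", "contract_amount": Decimal("120.00"), "paid_amount": Decimal("120.00")},
    {"claim_id": "C3", "contract_amount": Decimal("90.00"), "paid_amount": Decimal("70.50")},
]


def underpayment(claim):
    """Shortfall on one claim: the contract amount less the amount actually paid."""
    return claim["contract_amount"] - claim["paid_amount"]


# Decimal avoids the rounding surprises of binary floating point when summing currency.
total_damages = sum((underpayment(c) for c in claims), Decimal("0"))
```

For these three sample claims the shortfalls are 20.00, 0.00, and 19.50, for a total of 39.50.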


Depending on the context of the litigation, reporting typically involves a written document that outlines the data analytics expert's role, analysis performed, and conclusions based on the analysis. Steps taken during the Blueprint should be covered in this document so that the work performed is clear and defensible. The report exhibits are a crucial element of the final deliverable. The exhibits might include summary tables and other visualizations to simplify the analysis performed for the intended recipients. Furthermore, these exhibits will often end up as demonstratives for trial testimony, arbitration, or other proceedings. Thus, it is crucial that proper quality assurance is performed to validate the calculations undertaken by the data forensics team.

Case Study

The Docs R Us data analytics work resulted in initial and rebuttal expert reports, as well as a deposition of the data analytics expert. The exhibits to the expert report involved numerous slices of the damage calculation, varying by location, time period, and medical code. They also included low, medium, and high damage estimates to account for how the judge might rule on certain arguments. The case was eventually settled with a favorable result.[8]
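
Slices like those in the exhibits amount to totaling the same per-claim figures along different dimensions. As a rough sketch, assuming each claim already carries its computed underpayment, the grouping might look like the following. The locations, years, codes, and amounts are invented for illustration.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical per-claim results carried forward from the damage model.
results = [
    {"location": "North", "year": 2011, "code": "PT-A", "underpaid": Decimal("20.00")},
    {"location": "North", "year": 2012, "code": "PT-B", "underpaid": Decimal("19.50")},
    {"location": "South", "year": 2011, "code": "PT-A", "underpaid": Decimal("5.25")},
]


def slice_totals(rows, key):
    """Total the underpayment for each distinct value of the given field."""
    totals = defaultdict(Decimal)  # Decimal() starts each bucket at zero
    for row in rows:
        totals[row[key]] += row["underpaid"]
    return dict(totals)


by_location = slice_totals(results, "location")
by_year = slice_totals(results, "year")
by_code = slice_totals(results, "code")
```

A useful quality assurance check is that every slice sums back to the same grand total.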

The Data Analytics Blueprint provides a broad guideline as to how data analytics can function in a litigation setting. Although the process can be circuitous, proper communication and planning can avoid unnecessary costs. It is also crucial to have the full support and buy-in from both the client and the litigation team. After all, Theo Epstein is not winning World Series by himself.

  1. Secure File Transfer Protocol (SFTP).
  2. A magazine affiliated with the American Bar Association described this as a Hot Button issue in mid-2013. Sharon D. Nelson and John W. Simek, "BIG DATA: Big Pain or Big Gain for Lawyers?" Law Practice Magazine, 2013.
  3. Rany Jazayerli, "The Curious Have Won," The Ringer, 2016.
  4. Structured Data: Data that can be immediately identified within an electronic file, such as a relational database, that is structured in rows (records) and columns (fields). Unstructured Data: Data that is free-form text in business documents and reports, news articles, and social media. For example, unstructured data is found in word processing files, PDF files, email messages, Internet forums, blogs, Web pages, Twitter feeds, and Facebook pages. (As defined in the PC Mag encyclopedia at
  5. Computer techniques often involve the use of specialized software beyond the Microsoft Office suite. Database and statistical tools and programming languages, such as SQL, SAS, Python, or R, are commonly used by data analytics experts to manipulate the raw data.
  6. Dothraki is a fictional language used in George R. R. Martin's fantasy novel series A Song of Ice and Fire and its television adaptation, Game of Thrones.
  7. Q. Ethan McCallum, Bad Data Handbook: Mapping the World of Data Problems. California: O'Reilly Media, Inc., 2013.
  8. Analytics is increasingly used in healthcare, even within Medicare (Centers for Medicare & Medicaid Services). Mary Beth Johnston and Leah D'Aurora Richardson, "Big Data: The Next Revolution in Healthcare Operations," ABA Health eSource, 2016.