5 Ways to Use Analytics to Ease Your eDiscovery Burden
We take a look at several key options for implementing analytics into your everyday practice.
Data analytics is a common buzzword in many industries, but what exactly does it mean, and how is it beneficial? With the rise of technology-assisted review and analytics options, corporate legal departments and law firms are increasingly moving away from the once-standard linear review by custodian in favor of analytics that bring more efficiency, accuracy, and cost savings to the eDiscovery process.
Outlined below are five analytics options that have become more common, along with the best use case and important considerations for each. A case study is also provided to demonstrate how analytics is being used in practice to ease the eDiscovery burden.
1. Keyword Expansion
Keyword Expansion allows you to search for and identify relevant terms not yet included on your keyword list. This, in turn, may surface additional documents not previously included in the review population. By including keyword expansion in your eDiscovery process, you ensure the appropriate documents are captured for a more complete review.
Use Case
You have identified a starting list of keywords and would like to search for conceptually similar or related words.
Consideration
High-volume data sets pose an eDiscovery challenge: striking the right balance between reducing the number of documents that must be reviewed and capturing all potentially relevant information.
Why Is It Helpful?
- Ensures a more comprehensive review
- Identifies additional keywords and thus additional documents for review
When Should It Be Used?
- A preliminary keyword list exists
- Review expansion is deemed necessary
- Data sets are large
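The idea behind keyword expansion can be sketched with a simple co-occurrence heuristic: terms that frequently appear alongside the seed keywords are candidates for the expanded list. This is a minimal illustration only; the corpus, `expand_keywords` function, and stopword list are all hypothetical, and production tools use far more sophisticated conceptual models.

```python
from collections import Counter
import re

# Hypothetical mini-corpus standing in for a document collection.
corpus = [
    "The merger agreement was signed after the acquisition talks",
    "Acquisition due diligence revealed issues in the merger terms",
    "The agreement covers the takeover and buyout provisions",
]

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def expand_keywords(seed_terms, documents, top_n=3):
    """Suggest terms that frequently co-occur with the seed keywords."""
    seeds = {t.lower() for t in seed_terms}
    cooc = Counter()
    for doc in documents:
        tokens = set(tokenize(doc))
        if tokens & seeds:               # document mentions a seed term
            cooc.update(tokens - seeds)  # count its other words
    stopwords = {"the", "was", "after", "in", "and", "covers", "of"}
    return [w for w, _ in cooc.most_common() if w not in stopwords][:top_n]

suggestions = expand_keywords(["merger"], corpus)
```

Here, "acquisition" surfaces as a candidate term because it co-occurs with "merger" in multiple documents, which could then pull additional documents into the review population.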
2. Email Threading
Email Threading is commonly used to aid review efficiency and consistency. An email thread is a single chain that includes the original email, any attachments, all responses, and all forwarded conversations. The most complete email in the chain, known as the “inclusive email,” contains the content of the earlier messages and can be reviewed in place of the duplicative emails within the thread, painting the most complete picture of the exchange.
Use Case
To reduce the review population, all non-inclusive/duplicative emails can be excluded from review.
Alternatively, email threads can be sorted during review so that all thread components are reviewed together for increased coding consistency.
Consideration
Rolling deliveries can erroneously group threads together or result in incomplete threads if new documents are being added to the database after review begins.
Why Is It Helpful?
- Increases consistency of review coding
- Reduces overall data volume / spend
- Increases review speed
- Provides a more complete picture
When Should It Be Used?
- Data has high-quality metadata and extracted text
- Most of the data is in the system
- Data sets are of all sizes
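The inclusive-email concept can be sketched as a containment check: an email whose body is wholly quoted inside a later email in the same thread is non-inclusive and can be skipped. The email records and `inclusive_emails` helper below are hypothetical; real threading engines work from headers and message IDs, not raw text containment.

```python
# Each email carries its thread id and full body text (replies quote earlier
# messages). An email is "non-inclusive" when its body is wholly contained
# in another email in the same thread; only inclusive emails need review.
emails = [
    {"id": 1, "thread": "T1", "body": "Can you send the contract?"},
    {"id": 2, "thread": "T1",
     "body": "Attached. > Can you send the contract?"},
    {"id": 3, "thread": "T2", "body": "Meeting moved to Friday."},
]

def inclusive_emails(msgs):
    keep = []
    for m in msgs:
        contained = any(
            o["id"] != m["id"] and o["thread"] == m["thread"]
            and m["body"] in o["body"]
            for o in msgs
        )
        if not contained:
            keep.append(m["id"])
    return keep

review_set = inclusive_emails(emails)   # email 1 is suppressed: [2, 3]
```

Email 1 drops out of the review set because its entire content is quoted in email 2, which is exactly the volume reduction the technique delivers at scale.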
3. Language Identification
Language Identification is often used to identify and parse documents that are written in a non-English language. Language Identification analyzes the extracted text of each document to determine the language(s) within each file. This allows for proactive planning for foreign language review needs.
Use Case
Documents can be batched to foreign language reviewers by language at the outset of the review to avoid delays and ensure the appropriate resources are assigned.
Consideration
The quality of the extracted text should be considered, as Language Identification relies heavily on that text for accuracy. Poor text may render the analytics tool ineffective.
Why Is It Helpful?
- Allows for proactive review planning
- Structures like-data together
- Lays the groundwork for the rest of the review workflow
When Should It Be Used?
- Source data is likely in a foreign language
- Data sets are of all sizes
- Other analytics can also be used (e.g., predictive coding)
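A toy version of language identification can be built from stopword profiles: score each document by how many of each language's common function words it contains, then batch by the winning language. The `PROFILES` table and `identify_language` function are illustrative assumptions; commercial tools use trained statistical models over character n-grams.

```python
# Toy stopword profiles; real tools use trained models, not word lists.
PROFILES = {
    "english": {"the", "and", "of", "to", "is"},
    "spanish": {"el", "la", "de", "que", "es"},
    "german":  {"der", "die", "und", "das", "ist"},
}

def identify_language(text):
    """Return the language whose stopwords best match the document."""
    words = set(text.lower().split())
    scores = {lang: len(words & stops) for lang, stops in PROFILES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

# Batch documents to the appropriate reviewers by detected language.
batches = {}
documents = [
    "the terms of the agreement",
    "el contrato es de la empresa",
]
for doc in documents:
    batches.setdefault(identify_language(doc), []).append(doc)
```

Note how poor extracted text would defeat this immediately: garbled OCR output matches no profile and falls into "unknown," which mirrors the consideration above.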
4. Concept Clustering
Concept Clustering uses an algorithm to cluster conceptually similar documents together. This process generates a description of the concept the documents share and makes it easier to identify documents that could prove to be potentially relevant or irrelevant.
Use Case
To increase review speed and accuracy, group similar concepts during review batching. Alternatively, to reduce the review population, exclude non-relevant content by grouping documents with non-relevant concepts.
Consideration
Document each decision and the reasoning behind it. This defensible practice can be called upon if an opposing party questions the use of analytics (e.g., in structuring search terms, culling, or reviewing data).
Why Is It Helpful?
- Groups like data together (conceptually)
- Increases consistency of review coding
- Increases review speed
- Reduces overall data volume / spend
When Should It Be Used?
- Most of the data is in the system
- Data sets are large
- Documents have been left untagged
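Conceptually, clustering groups documents whose term overlap is high. The greedy cosine-similarity sketch below is a hypothetical simplification (the `cluster` function and threshold are assumptions); real concept-clustering engines use latent semantic techniques rather than raw term frequencies.

```python
import math
import re
from collections import Counter

def tf_vector(text):
    """Term-frequency vector for a document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.3):
    """Greedily assign each document to the first cluster whose seed
    document is similar enough; otherwise start a new cluster."""
    clusters = []
    for doc in docs:
        vec = tf_vector(doc)
        for c in clusters:
            if cosine(vec, c["seed"]) >= threshold:
                c["docs"].append(doc)
                break
        else:
            clusters.append({"seed": vec, "docs": [doc]})
    return [c["docs"] for c in clusters]

docs = [
    "invoice payment overdue balance",
    "payment invoice received balance cleared",
    "holiday party friday schedule",
]
groups = cluster(docs)   # two clusters: billing vs. scheduling
```

The two invoice documents land in one cluster and the scheduling email in another; a reviewer could then batch the billing cluster together, or exclude a clearly non-relevant cluster entirely.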
5. Active Learning
Active Learning continually retrains a model on reviewers’ coding decisions so that it surfaces the documents most likely to be relevant, continuing until all potentially relevant documents are reviewed. This can dramatically reduce the time needed to review large data sets.
Use Case
To help locate potentially relevant documents more quickly and reduce the review population.
Consideration
Confirm that the production deadline, relative to the size of the document population, leaves ample time to train the model and QC the results.
Why Is It Helpful?
- Groups like data together
- Increases consistency of review coding
- Increases review speed
- Reduces overall data volume / spend
When Should It Be Used?
- Most of the data is in the system
- Data sets are large
- Documents have been left untagged
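The feedback loop at the heart of active learning can be sketched with a naive relevance score: each round, the highest-scoring uncoded document is "reviewed," and the reviewer's decision updates the model. Everything here (the `oracle` reviewer stand-in, the scoring scheme, the document set) is hypothetical; platforms such as Relativity use trained classifiers, not keyword counts.

```python
from collections import Counter
import re

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def score(doc, rel_terms, nonrel_terms):
    """Naive relevance score: relevant-term hits minus non-relevant hits."""
    t = tokens(doc)
    return sum(rel_terms[w] for w in t) - sum(nonrel_terms[w] for w in t)

def active_learning(docs, oracle, seed_id, rounds=3):
    """Each round, 'review' the highest-scoring uncoded document and fold
    the reviewer's decision back into the model (oracle = the reviewer)."""
    rel, nonrel = Counter(), Counter()
    coded = {seed_id: True}
    rel.update(tokens(docs[seed_id]))
    for _ in range(rounds):
        uncoded = [i for i in range(len(docs)) if i not in coded]
        if not uncoded:
            break
        pick = max(uncoded, key=lambda i: score(docs[i], rel, nonrel))
        coded[pick] = oracle(pick)          # reviewer codes the document
        (rel if coded[pick] else nonrel).update(tokens(docs[pick]))
    return coded

docs = [
    "breach of contract damages claim",      # relevant (seed)
    "contract breach remedies and damages",  # relevant
    "lunch menu for the cafeteria",          # not relevant
    "claim for contract damages filed",      # relevant
]
coded = active_learning(docs, oracle=lambda i: i != 2, seed_id=0)
```

The model pulls the two contract documents forward before it ever touches the cafeteria email, which is why active learning locates potentially relevant documents faster than linear review.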
Case Study
In the following example, there was a 30-day deadline to complete the data collection, review, and production of potentially relevant documents to opposing counsel. As a result, multiple analytics tools within Relativity were successfully utilized to dramatically reduce the review population, and associated cost, while meeting the tight deadline.
Process
See Figure 1.
Figure 1. Population Filtering Process
Validation
All documents coded in the Active Learning project are assigned a relevancy rank based on the coding. In this scenario, documents with a relevancy rank of 51% or higher were the focus. Ultimately, about 10,000 documents, or about 15% of all documents in the Active Learning project, were reviewed. Using Relativity’s Elusion Test, we validated the accuracy of the project across the documents that were not coded and found an error rate of about 1%, which was deemed an acceptable margin of error for a sound and defensible review. We took all documents coded “Relevant,” along with related family members, applied a privilege screen across that population (to ensure nothing privileged was produced), and produced all non-privileged documents.
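The elusion test is, at bottom, a sampling exercise: draw a random sample from the uncoded discard pile and measure how many relevant documents slipped through. The sketch below illustrates the arithmetic only; the pile composition and sample size are invented for illustration and are not the case-study figures.

```python
import random

# Hypothetical discard pile: documents never coded in the Active Learning
# project (True marks a relevant document the model missed).
random.seed(7)
discard_pile = [True] * 600 + [False] * 59400   # ~1% missed, for illustration
random.shuffle(discard_pile)

def elusion_rate(pile, sample_size=1000):
    """Review a random sample of the uncoded population and report the
    share of relevant documents found there (the elusion rate)."""
    sample = random.sample(pile, sample_size)
    return sum(sample) / sample_size

rate = elusion_rate(discard_pile)
```

An observed rate near 1% on a sufficiently large sample is what supported the "acceptable margin of error" conclusion in the case study; statistical tooling in the review platform determines the sample size needed for a given confidence level.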
Outcome
Less than 1% of the initial population was produced, which led to review cost savings of over $100,000. These results were achieved much faster and with a higher degree of accuracy when compared to standard linear review.
Key Takeaways
Each industry, company, and legal department is unique and may have different levels of maturity when it comes to implementing analytics in eDiscovery. The following tips can be helpful as you explore your use of analytics in eDiscovery.
- Closely review the ESI agreement to help shape your eDiscovery plan
- Strike a balance between planning your strategy and the timing constraints
- Data quality and type play a large role in the success of particular analytics tools
- Analytics generates the most notable efficiencies in large data sets, but most analytics can be applied to data sets of any size
- Document your decisions and reasoning to create an auditable and defensible trail
- Lean on experienced analytics resources from your eDiscovery vendor, law firm, or consulting partner to help guide you in the practical application of analytics tools on your matters and to provide oversight throughout the process
Co-authored by:
Raul Mendoza
Analyst, Legal Management Consulting
+1.312.752.3341
rmendoza@stout.com
Chris Bojar, JD
Litigation Support Manager
Barack Ferrazzano Kirschbaum & Nagelberg LLP
+1.312.629.5174
chris.bojar@bfkn.com