Data Modeling: 101 (Framework Included)
Get Smart: Part 3
“Get Smart, Forensically Speaking...for the Litigator” is an article series written specifically for litigators. The series helps litigators understand key concepts, strategies, and best practices without getting overly technical. Below is Part 3 of the article series. If you missed Part 2, click here.
We’ve all heard the saying that “a building is only as strong as its foundation.” Forensically speaking, this is a good analogy for data modeling: the validity of your analysis depends on the quality of your data.
Modeling is the fourth of six phases in the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology. It follows (1) Business Understanding, (2) Data Understanding, and (3) Data Preparation, and precedes (5) Evaluation and (6) Deployment. In most cases, the time spent on the first three phases significantly outweighs the time spent on the last three. Although the time each phase requires depends heavily on the unique aspects of the case, the 80/20 Rule is generally a good approximation: roughly 80% of the effort goes to the first three phases and 20% to the last three. Painters plan their work the same way, with 80% preparation and 20% painting. In the book Introduction to Data Mining with Case Studies, data mining expert Dr. G.K. Gupta states, “…most time is spent on data extraction, data cleansing and data manipulation.” He further states, “In addition to cleaning [the data]…it needs to be…relevant, adequate, and well-defined.”
To give structure to the management of data-critical projects, we offer a three-part framework.