How Many Documents In A Gigabyte? 2024 Statistics For Litigators

Posted by Paulette Keheley | Thu, Apr 02, 2020

Litigators and eDiscovery consultants have debated the best methods for creating litigation budgets for years. Assuming most cases settle, eDiscovery can account for a significant portion of the costs, so it is often the best place to start. eDiscovery fees vary significantly between vendors, and you will need to consider flat fees, per-GB hosting fees, per-GB processing fees, user fees, service fees, and more.

Most vendors charge per GB. The best way to estimate these vendor-based eDiscovery costs is to determine how many GBs you will need to process and host for your matter.
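Once you have a GB estimate, the fee types listed above can be rolled into a single rough budget figure. The sketch below is hypothetical: the `FeeSchedule` fields and every rate in the example are placeholder assumptions, not Digital WarRoom or industry pricing, and service fees are left out for brevity. Real vendor contracts may structure fees quite differently.

```python
from dataclasses import dataclass


@dataclass
class FeeSchedule:
    """Hypothetical vendor fee schedule; all rates are placeholder assumptions."""
    flat_fee: float = 0.0              # one-time project or setup fee
    processing_per_gb: float = 0.0     # charged once per GB processed
    hosting_per_gb_month: float = 0.0  # charged per GB hosted, per month
    user_fee_per_month: float = 0.0    # charged per review user, per month


def estimate_budget(gb: float, months: int, users: int, fees: FeeSchedule) -> float:
    """Return a rough eDiscovery budget for a matter of `gb` gigabytes."""
    processing = gb * fees.processing_per_gb
    hosting = gb * fees.hosting_per_gb_month * months
    user_fees = users * fees.user_fee_per_month * months
    return fees.flat_fee + processing + hosting + user_fees


# Example with made-up rates: 50 GB hosted for 12 months by 3 reviewers.
fees = FeeSchedule(flat_fee=500.0, processing_per_gb=25.0,
                   hosting_per_gb_month=10.0, user_fee_per_month=50.0)
print(estimate_budget(gb=50, months=12, users=3, fees=fees))  # prints 9550.0
```

Note how hosting and user fees scale with the length of the matter, which is one reason a per-GB estimate alone can understate the budget for a long-running case.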

Spoiler alert: Digital WarRoom’s statistical analysis suggests the best rule of thumb is 10,000 emails per GB and 5,000 loose files (including pictures and AV files) per GB.
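Applied in reverse, the rule of thumb turns document counts into a GB estimate you can take to a vendor. A minimal sketch, assuming you know (or can approximate) how many emails and loose files a collection holds; the counts in the example are made up.

```python
# Rule-of-thumb densities from the summary above.
EMAILS_PER_GB = 10_000
LOOSE_FILES_PER_GB = 5_000


def estimate_gb(email_count: int, loose_file_count: int) -> float:
    """Estimate collection size in GB from email and loose-file counts."""
    return email_count / EMAILS_PER_GB + loose_file_count / LOOSE_FILES_PER_GB


# Hypothetical collection: 250,000 emails and 40,000 loose files.
print(estimate_gb(250_000, 40_000))  # 25 GB + 8 GB, prints 33.0
```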


Comparing Data Size Across File Types

For the purpose of getting a grip on cost estimation, how many documents are actually in a GB? The answer varies significantly depending on the data type. Printing a single GB out as paper documents gives a sense of the staggering scope of today’s eDiscovery workflows.

How many megabytes are in one gigabyte?

  • 1 GB = 1,000 MB (in decimal units; some software reports sizes in binary units, where 1 GiB = 1,024 MiB)

What could we fit inside 5 MB?

  • The entire written works of Shakespeare
  • An AV file of Cheeseburger in Paradise being performed by Jimmy Buffett at Wrigley Field

Source: The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences by Rob Kitchin

Average Document Sizes  

Digital WarRoom (DWR) analyzed its 2,000 most recent hosted matters, comprising over 150 million documents, to generate current 2024 statistics on average document sizes.

  • Emails and their attachments account for 80% of review data.
  • 1 in 5 emails has at least one attachment.
  • The distribution of document sizes is surprisingly close to normal, which gives us more confidence in the averages below.

The vast majority of documents fall between 50 KB and 350 KB in size, regardless of type. Given the attachment rate, it’s safe to assume:

  • An email, counted together with its attachments, takes up about 100 KB on average, or 10,000 emails per GB.
  • A “loose” file averages close to 200 KB, or 5,000 files per GB.
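The per-GB figures follow from dividing one gigabyte by the average document size. A short sketch of that arithmetic, assuming decimal units (1 GB = 1,000,000 KB); `docs_per_gb` is an illustrative helper, not part of any product.

```python
KB_PER_GB = 1_000_000  # decimal units: 1 GB = 1,000 MB = 1,000,000 KB


def docs_per_gb(avg_doc_kb: float) -> int:
    """Approximate number of documents of a given average size that fit in 1 GB."""
    return round(KB_PER_GB / avg_doc_kb)


print(docs_per_gb(100))  # email with attachments: 10000
print(docs_per_gb(200))  # loose file: 5000
# The 50 KB to 350 KB range quoted above spans roughly 20,000 down to 2,857 docs per GB.
print(docs_per_gb(50), docs_per_gb(350))  # 20000 2857
```

The spread between those last two numbers is a good reminder of why the averages are only a starting point for a specific matter.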

[Figure: email size]

How Many Pages Of Text Can Fit In A Gigabyte? How Much Is One Gigabyte Of Data?

  • Text files: Nearly 678,000 pages per gigabyte.
  • Emails: More than 100,000 pages.
  • Microsoft Word files: Almost 65,000 pages.
  • PowerPoint Slide Decks: Roughly 17,500 pages.
  • Images: Close to 15,500 pages.
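Because real collections mix file types, one way to use these figures is to weight them by how many GB of each type you have. A minimal sketch, using the approximate pages-per-GB figures listed above; the type labels and the example collection are hypothetical.

```python
# Approximate pages per GB, taken from the list above.
PAGES_PER_GB = {
    "text": 678_000,
    "email": 100_000,
    "word": 65_000,
    "powerpoint": 17_500,
    "image": 15_500,
}


def estimate_pages(gb_by_type: dict[str, float]) -> int:
    """Estimate total pages for a collection broken down by file type (sizes in GB)."""
    return round(sum(gb * PAGES_PER_GB[kind] for kind, gb in gb_by_type.items()))


# Hypothetical collection: 8 GB of email, 1.5 GB of Word files, 0.5 GB of images.
print(estimate_pages({"email": 8, "word": 1.5, "image": 0.5}))  # 905250
```

Even in this small example, the email portion dominates the page count, which matches the 80% share of review data reported earlier.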

Because a single image stores far more data than a page of text, a gigabyte of images holds only around 15,000 pages. That assumes the data comes from similar sources, of course. Hard drives and records are not uniform when it comes to data formats: a single email could contain text files, embedded images, and attached files.

The numbers make it clear: data needs to be understood and analyzed on a case-by-case basis to get an accurate GB count and cost estimate. You may also need to account for your data set growing as you collect more data or receive data from the opposing party.


What About Other Units Of Data?

Gigabytes aren’t the only units of data attorneys have to work with, of course. Data processed during eDiscovery could be measured in megabytes, kilobytes, bytes, or even bits. Due to the variation between data formats and file types, it’s difficult to conclusively determine the number of documents that would fit in each one. That being said, these figures should give an approximate idea of how many pages can be stored in each unit of data:

  • Bit: about 0.00006 pages.
  • Byte: 0.0005 pages.
  • Kilobyte: 0.5 pages.
  • Megabyte: 500 pages.
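These figures scale by fixed factors: 1,000 KB per MB and 1,000 bytes per KB (in decimal units), and 8 bits per byte. A small sketch that derives each smaller unit from the 500-pages-per-megabyte estimate:

```python
PAGES_PER_MB = 500  # approximate figure from the list above

# Decimal units: 1 MB = 1,000 KB and 1 KB = 1,000 bytes; 1 byte = 8 bits.
pages_per_kb = PAGES_PER_MB / 1_000
pages_per_byte = pages_per_kb / 1_000
pages_per_bit = pages_per_byte / 8

for unit, pages in [("Bit", pages_per_bit), ("Byte", pages_per_byte),
                    ("Kilobyte", pages_per_kb), ("Megabyte", PAGES_PER_MB)]:
    print(f"{unit}: {pages:g} pages")
```

This prints 0.5 pages per kilobyte, 0.0005 pages per byte, and 6.25e-05 (about 0.00006) pages per bit.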


Some of those figures may not seem like much, but they add up fast. In many cases, attorneys could be working with thousands of megabytes or gigabytes of data. It’s an enormous amount of information to manage.


Visualizing Data At Scale

It can be difficult to wrap one’s head around the amount of data contained in the larger units. Here are a few comparisons to put things into perspective (for the sake of simplicity, these figures assume that all data comes from emails):


  • Half a gigabyte of pages would match the height of the average giraffe.
  • 1 gigabyte would be nearly as tall as a telephone pole.
  • 2 gigabytes would extend across an entire bowling lane.
  • 10 gigabytes’ worth of paper documents would cover the length of a football field.
  • 100 gigabytes would tower over the Burj Khalifa, the tallest skyscraper in the world.
  • 500 gigabytes would be nearly as tall as Mount Kilimanjaro.
  • 1,000 gigabytes (the equivalent of 1 terabyte) would almost reach the bottom of the Mariana Trench, home to the deepest point in any ocean.
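These comparisons come down to simple arithmetic. A sketch that reproduces them, assuming roughly 100,000 email pages per GB (from the list earlier) and a sheet thickness of about 0.1 mm; the sheet thickness is an assumption about typical office paper, not a figure from this article.

```python
PAGES_PER_GB = 100_000    # email pages per GB, from the list earlier
SHEET_THICKNESS_MM = 0.1  # assumed thickness of one sheet of office paper


def stack_height_m(gb: float) -> float:
    """Height in metres of a stack of printed email pages for `gb` gigabytes."""
    return gb * PAGES_PER_GB * SHEET_THICKNESS_MM / 1_000  # convert mm to m


for gb in (0.5, 1, 2, 10, 100, 500, 1000):
    print(f"{gb:>6} GB -> about {stack_height_m(gb):,.0f} m")
```

Under these assumptions, 100 GB comes to about 1,000 m (the Burj Khalifa is 828 m tall) and 1 TB to about 10,000 m, a little short of the Challenger Deep in the Mariana Trench, which is close to 11,000 m deep.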


Considering that a modern hard disk drive can easily exceed 1 terabyte of storage, collecting and preserving eDiscovery data across multiple machines can be a major undertaking.


Increasing Data Strains on eDiscovery

Digitization has significantly increased the scope of discovery for attorneys. Law firms have more data to factor into the equation than ever. The world is creating more data with each passing day, and the amount of digital information is growing at an exponential rate. Human beings are producing new data at a rate that’s 44 times faster than we were in 2020. Imagine where we’ll be 10 years from now.

According to some estimates, there are 3.2 zettabytes of data in the world today. If that doesn’t sound terribly impressive, keep in mind there are a billion terabytes in one zettabyte. With 3.2 zettabytes of hard drive space, you could store nearly 100 million years of high-definition video.
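The unit conversion here is exact in decimal units: a zettabyte is 10^21 bytes and a terabyte is 10^12 bytes, so one zettabyte is 10^9 (a billion) terabytes. The video comparison depends on bitrate, which isn’t stated above; the sketch below assumes about 4 GB per hour of HD video, a hypothetical figure chosen only to show the arithmetic.

```python
ZB = 10**21  # bytes in a zettabyte (decimal)
TB = 10**12  # bytes in a terabyte (decimal)
GB = 10**9   # bytes in a gigabyte (decimal)

HD_GB_PER_HOUR = 4            # assumed HD video size; real bitrates vary widely
HOURS_PER_YEAR = 24 * 365.25  # average year length, including leap years

world_data_bytes = 3.2 * ZB

print(ZB // TB)  # terabytes per zettabyte: 1000000000
hours_of_video = world_data_bytes / (HD_GB_PER_HOUR * GB)
years_of_video = hours_of_video / HOURS_PER_YEAR
print(f"about {years_of_video / 1e6:.0f} million years of HD video")
```

At 4 GB per hour this works out to roughly 91 million years, in line with the “nearly 100 million years” estimate; a lower assumed bitrate would push the figure higher.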

It’s not just the volume of data that makes eDiscovery difficult; there are far more file types, data sources and formats to consider, too. We’re way beyond emails and Excel spreadsheets here:

Social media, streaming data, the Internet of Things, SMS messaging, and cloud-based communications are just the tip of the iceberg.

Not all of that data will be relevant to every case, of course. However, we’re already starting to see courts expand the definition of electronically stored information to include data and file types that previously would never have been considered for discovery. For instance, an employer’s counsel used GPS data to show that an employee routinely took excessive lunch breaks and spent a large amount of time away from the worksite. That data directly contradicted the employee’s claim that they had to work unpaid overtime to do their job.

As the digital universe expands and new formats emerge, lawyers are going to have their hands full carrying out their eDiscovery responsibilities.


Conclusion

Simplifying eDiscovery processes is critical for law firms, and it will only become more important as people create more data, new data sources emerge, and average storage capacities increase. Handling it all on your own is simply not tenable.

Digital WarRoom’s eDiscovery software streamlines collection and review workflows, eliminating tedious work, filtering out irrelevant data, and helping attorneys meet their burden of responsibility for preservation and production.

Discovery doesn’t have to be difficult or time-consuming. Contact our team today to find out more.
