A lesson in data storage, starring WikiLeaks and pizza

How many scanned electronic documents might represent 5GB of storage space? 

This is the question posed to me last week by New York Times reporter Nelson Schwartz. You can read more about why he wanted to know in his article “Facing Threat From WikiLeaks, Bank Plays Defense” (see page two, paragraph one).

My response? That’s like figuring out how many slices of pizza fit in a large pizza box. The answer depends on how large the slices are. A pizza box could hold 100 little slices or eight large slices.

Similarly, the size of electronic documents absolutely impacts the answer to the original question.

Before we start calculating, we need to understand that calculations are based on pages, not documents. This is an important distinction. Documents can vary in the number of pages, so we count pages to get an accurate picture.

Several variables then profoundly affect the final size of an electronic document:

  1. Page size. Is this 8.5 x 11 letter, 8.5 x 14 legal, or something else?
  2. Scan resolution. This is measured in dots per inch (dpi). The number of dots relates to the number of electronic “bits” stored. The higher the resolution, the better the image quality and the more space the image consumes.
  3. Color choice.  Black and white images take up much less space than color images.
  4. File type. Most image file types (GIF, TIF, JPEG, etc.) determine how far the image is compressed. A Group 4 TIF compresses at roughly 20:1; a JPEG can compress at up to about 100:1. Very high compression ratios such as JPEG’s are achieved with “lossy compression.” This means you consume less space on the disk, but may lose some image quality.

Here’s an industry-standard example: an 8.5 x 11 page, scanned at 200 dpi in black and white and stored as a Group 4 TIF. We can calculate that this page will consume around 22.8 KB of space. At that page size, you can store about 229,677 pages in 5 GB.*

If you want to improve the quality of that scanned image by increasing the resolution to 400 dpi, you will end up with a page that consumes around 91.3 KB of space. At that resolution, you can store about 57,419 of the same pages in 5 GB.*
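The "how many pages fit" step is a single division, and a rough sketch of it looks like the code below. The function name `pages_that_fit` is just an illustration, and the sketch assumes binary units (1 GB = 1,048,576 KB). Page counts quoted from rounded page sizes will differ slightly from what this prints.

```python
KB_PER_GB = 1024 * 1024  # assumption: 1 GB = 1024 MB and 1 MB = 1024 KB


def pages_that_fit(capacity_gb, page_size_kb):
    """Return how many whole pages of the given size fit in the capacity."""
    return int((capacity_gb * KB_PER_GB) // page_size_kb)


# Unrounded per-page sizes: bytes per page / 1024 / 20:1 compression
page_kb_200 = 467500 / 1024 / 20    # about 22.83 KB
page_kb_400 = 1870000 / 1024 / 20   # about 91.31 KB

print(f"{pages_that_fit(5, page_kb_200):,} pages at 200 dpi")  # 229,677 pages at 200 dpi
print(f"{pages_that_fit(5, page_kb_400):,} pages at 400 dpi")  # 57,419 pages at 400 dpi
```

Doubling the resolution quadruples the dots on the page, so the page count drops to a quarter.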

That is the effect of adjusting resolution alone. The calculations change again when you change the color choice, page size or compression ratio. It’s up to each organization to make sure its space-saving choices don’t produce unreadable images, which is what scanning at a very low resolution with very high compression will do.
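To get a feel for how much the other variables matter, here is a minimal sketch that compares uncompressed page sizes. The helper `raw_page_kb` is a made-up name, and the 24 bits per dot used for color is an assumption (a common color depth), not a figure from the calculations in this post. Compression is left out on purpose, because the ratio you actually get depends on the file type and the image.

```python
def raw_page_kb(width_in, height_in, dpi, bits_per_dot):
    """Uncompressed size of one scanned page in KB (1 KB = 1024 bytes)."""
    dots = (width_in * dpi) * (height_in * dpi)
    return dots * bits_per_dot / 8 / 1024


# (width in inches, height in inches, dpi, bits per dot)
scenarios = {
    "letter, 200 dpi, black and white": (8.5, 11, 200, 1),
    "legal, 200 dpi, black and white": (8.5, 14, 200, 1),
    "letter, 200 dpi, 24-bit color (assumed depth)": (8.5, 11, 200, 24),
}

for label, args in scenarios.items():
    print(f"{label}: {raw_page_kb(*args):,.2f} KB before compression")
# letter, 200 dpi, black and white: 456.54 KB before compression
# legal, 200 dpi, black and white: 581.05 KB before compression
# letter, 200 dpi, 24-bit color (assumed depth): 10,957.03 KB before compression
```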

Let’s talk pizza again. If you purchased 100 slices of pizza for $10 and the pizza shop delivered one box crammed with 100 tiny slices, your friends would probably never come over to watch the Super Bowl at your place ever again. 

However, if your 100 slices arrived in 20 boxes, the boxes would take up more space, but everyone would be much happier.

* These are the calculations used to arrive at these figures:

Size: 8.5 x 11
Resolution measured in Dots Per Inch (DPI): Example 1 uses 200 DPI, Example 2 uses 400 DPI
Color Choice: Black and White
Compression: Group 4 TIFF (20:1) compression

Example 1 at 200DPI 

  1. We calculate the number of electronic bits based on the number of dots that will be scanned.  A dot is equivalent to an electronic bit.
    (8.5 x 200) x (11 x 200) = 3740000 bits stored. 
  2. There are 8 Bits in a Byte.
    3740000 / 8 = 467500 Bytes
  3. There are 1024 Bytes in a Kilobyte.
    467500 / 1024 = 456.54 KB
  4. This is stored as a group 4 TIFF which has 20:1 compression
    456.54 / 20 = 22.83KB

Example 2 at 400DPI 

  1. We calculate the number of electronic Bits based on the number of dots that will be scanned.  A dot is equivalent to an electronic Bit.
    (8.5 x 400) x (11 x 400) = 14960000 Bits stored. 
  2. There are 8 Bits in a Byte
    14960000 / 8 = 1870000 Bytes
  3. There are 1024 Bytes in a Kilobyte
    1870000 / 1024 = 1826.17KB
  4. This is stored as a group 4 TIFF which has 20:1 compression
    1826.17 / 20 = 91.3KB
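
For anyone who would rather run the four steps than do them by hand, here is a small sketch of the same arithmetic. The function name `page_size_steps` is only illustrative. It assumes one bit per dot (black and white) and the 20:1 Group 4 TIFF ratio used in both examples.

```python
def page_size_steps(width_in, height_in, dpi, compression_ratio=20):
    """Follow the four steps for one black and white page (1 bit per dot)."""
    bits = (width_in * dpi) * (height_in * dpi)      # step 1: dots scanned = bits
    size_bytes = bits / 8                            # step 2: 8 bits in a byte
    kilobytes = size_bytes / 1024                    # step 3: 1024 bytes in a KB
    compressed_kb = kilobytes / compression_ratio    # step 4: apply 20:1 compression
    return bits, size_bytes, kilobytes, compressed_kb


for dpi in (200, 400):
    bits, size_bytes, kb, compressed = page_size_steps(8.5, 11, dpi)
    print(f"{dpi} DPI: {bits:,.0f} bits, {size_bytes:,.0f} bytes, "
          f"{kb:,.2f} KB raw, {compressed:,.2f} KB compressed")
# 200 DPI: 3,740,000 bits, 467,500 bytes, 456.54 KB raw, 22.83 KB compressed
# 400 DPI: 14,960,000 bits, 1,870,000 bytes, 1,826.17 KB raw, 91.31 KB compressed
```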

Glenn Gibson

Glenn Gibson is the director of Product Communication at Hyland, creator of OnBase. In 15 years working in the IT industry, he’s collected several certifications, including VMware Certified Professional, Citrix Certified Administrator and Microsoft Certified Professional. A self-proclaimed “presentation junkie,” he is passionate about everything that goes along with public speaking, and has picked up a few awards along the way too. A native of Scotland, his passions outside of work include all things Scottish: kilts, bagpipes, whisky and (real) football. He is often heard beating a drum or two in his spare time.
