Wednesday, November 30, 2011

De NIST Service

I just finished implementing my De NIST service. This was really easy, since most of the work is done by the National Institute of Standards and Technology (NIST). They maintain a database of more than 20 million unique SHA-1 hashes. Each hash corresponds to a known, common file. In the eDiscovery industry, files that are identified in the NIST database have no value in a case. These files could be anything from standard system files to vendor print drivers to just about anything you can think of. Once identified by NIST, they have no value to us. Finding these files and weeding them out early saves a lot of resources down the road.

My eDiscovery platform hashes each file it discovers (more on this in a future post) and then compares those hashes to the NIST database. Anything found gets marked as “deNisted” and moved off so that no other work is performed on that file. Files that are not found in the NIST database will be available for further discovery and processing.
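To make the comparison step concrete, here is a minimal sketch in Python. It is not the platform's actual code: the names `load_known_hashes` and `de_nist` are made up for illustration, and it assumes the NIST hashes have already been exported as one SHA-1 hex string per line and that they fit in an in-memory set.

```python
def load_known_hashes(lines):
    """Build a set of normalized SHA-1 hex digests from an iterable of strings.

    Assumption: one hex digest per line; blank lines are skipped.
    """
    return {line.strip().upper() for line in lines if line.strip()}


def de_nist(file_hashes, known_hashes):
    """Split a {path: sha1_hex} mapping into (denisted, remaining) dictionaries.

    A file is "deNisted" when its hash appears in the known-hash set.
    """
    denisted, remaining = {}, {}
    for path, digest in file_hashes.items():
        target = denisted if digest.upper() in known_hashes else remaining
        target[path] = digest
    return denisted, remaining
```

Because membership in a set is a constant-time lookup on average, the cost of the check stays roughly the same per file no matter how many hashes are in the list.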

Another service scratched off the TODO list. I now need to create my Hash service, which will allow me to hash any file. I will use it to compare files against the NIST database and, more importantly, to ensure I only save each file (unique hash) one time. This will be used in the De Duplication process that I will get to later.
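As a rough idea of what the Hash service and the "save each unique hash once" rule could look like, here is a small Python sketch. The function names `sha1_of_file` and `deduplicate` are hypothetical, and the real service may work quite differently. It only uses `hashlib` from the standard library and reads files in chunks so large files do not have to fit in memory.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the uppercase SHA-1 hex digest of the file at `path`."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def deduplicate(paths):
    """Keep the first path seen for each hash; collect the rest as duplicates."""
    unique = {}       # hash -> first path with that content
    duplicates = []   # paths whose content was already seen
    for path in paths:
        file_hash = sha1_of_file(path)
        if file_hash in unique:
            duplicates.append(path)
        else:
            unique[file_hash] = path
    return unique, duplicates
```

The keys of `unique` are the same kind of hash string that gets compared against the NIST database, so one hashing pass can feed both the De NIST check and De Duplication.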
