My eDiscovery Project: Scalability Change

Monday, December 12, 2011

Scalability Change

I originally wanted to design my eDiscovery Processing platform using several self-hosting services. This would allow me to isolate frequently used tasks such as File Identification, DeNISTing, Hashing, Text Extraction, etc., and also offer those services as SaaS to other users (customers, competitors, etc.). During some rudimentary benchmarking, I found that my services were not scaling like I needed them to. I was seeing too many “wait states” for the services listed above. It makes sense now that I think about it. I have many machines, each running between 8 and 16 threads, all trying to do work at the same time. Having every one of those threads consume the same shared services was overwhelming the servers running them.
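To put a rough shape on that bottleneck, here is a back-of-the-envelope sketch in Python. The function name and all of the numbers are hypothetical and are not taken from my benchmarks; they only illustrate how quickly the number of concurrent callers outgrows a shared service.

```python
def offered_load(machines, threads_per_machine, service_workers):
    """Return (concurrent callers, callers per service worker).

    All inputs are illustrative assumptions, not measured values.
    """
    callers = machines * threads_per_machine
    return callers, callers / service_workers


# Hypothetical pool: 10 processing machines at 16 threads each,
# all sharing a service that can handle 8 requests at once.
callers, ratio = offered_load(machines=10, threads_per_machine=16, service_workers=8)
print(callers, ratio)  # 160 20.0
```

With numbers like these, every service worker has about 20 threads competing for it, so most threads spend their time waiting. Adding processing machines only makes the ratio worse unless the service tier grows with them.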

I have since changed this, and I now have each of these “services” built into each core process. This allows for better scalability, since each new machine added to the processing server pool brings its own suite of services with it instead of adding load to a shared one. Once I made the change and benchmarked again, I saw an 8x improvement, which is not too shabby.
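The idea of embedding the services in each core process can be sketched like this. This is only a minimal illustration in Python, not my actual code: the signature table, the function names, and the choice of SHA-1 are all assumptions made for the example.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical signature table; a real file identifier covers far more formats.
SIGNATURES = {
    b"%PDF": "pdf",
    b"PK\x03\x04": "zip",
    b"\xd0\xcf\x11\xe0": "ole2",
}


def identify(data: bytes) -> str:
    """In-process file identification by leading magic bytes."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


def sha1_hex(data: bytes) -> str:
    """In-process hashing; no network round trip to a shared service."""
    return hashlib.sha1(data).hexdigest()


def process(data: bytes) -> dict:
    """One unit of work: every 'service' call is a local function call."""
    return {"type": identify(data), "sha1": sha1_hex(data), "size": len(data)}


def process_batch(items, threads=8):
    """Run the local pipeline on a pool of worker threads, preserving order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(process, items))


if __name__ == "__main__":
    for result in process_batch([b"%PDF-1.4 sample", b"plain text"]):
        print(result)
```

Because every worker calls `identify` and `sha1_hex` directly, capacity for those tasks grows with each machine added to the pool, and no thread waits on a remote server.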

As for the SaaS services that I plan to expose to the outside world, well, those will just have to be written and implemented separately. All the code is the same, so it’s not the end of the world.
