My eDiscovery Project: Picking The Right Tool For The Job

My goal is to write the entire e-discovery processing platform in C#. Based on the initial set of requirements I have created so far, I should be able to accomplish this. I’ve been writing C# since it was first released as a beta many moons ago, but being proficient in the language is just one reason for me to use it. I am also looking ahead and know that, if things go right, I will not be the only one creating and maintaining this code. It will be much easier to find C# developers than C++ developers (C++ being my other core language). Also, when you get right down to the differences between managed and unmanaged code, I would argue that managed code can, in some cases, run faster than native code. I can hear the C++ fans booing me already, but here are a couple of examples:

JIT compilation. When running managed code, the just-in-time (JIT) compiler compiles each method just before it is first needed, on the machine that is actually hosting it. Native code, by contrast, is compiled ahead of time at build time, and to run on a wide variety of systems it generally has to target a conservative, generic instruction set. Because the JIT compiler knows exactly which processor it is running on, it can apply optimizations (such as using newer instruction set extensions) that a generic native build would have to leave out. In other words, managed code can be optimized for the hosting machine on the fly, whereas native code is fixed at the time it is compiled.
Multiple threads. With managed code, I can write code that uses multiple threads and let the runtime decide how many threads to use based on the number of cores available. This lets me write the code once and target a huge variety of systems. This ties back to my scalability discussion: since I am processing everything in parallel, I may have a few massive machines pulling work items from my queue at the same time as some not-so-fast machines are pulling from it. The optimal number of threads on each machine can be decided at runtime instead of compile time – a big win!
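To make the threading point concrete, here is a minimal C# sketch, assuming .NET 5 or later (C# 9 for the `record`). It uses an in-process `BlockingCollection<T>` as a stand-in for the platform's work queue, which in the real design would be shared across machines; `WorkItem`, `QueueWorkerSketch`, and `ProcessAll` are hypothetical names for illustration only. Each machine running this code sizes its worker pool from `Environment.ProcessorCount`, which is read at runtime rather than fixed at compile time.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical work item; the platform's real item type is not described in the post.
public sealed record WorkItem(int Id, string DocumentPath);

public static class QueueWorkerSketch
{
    // Drains the queue with one worker per logical processor on *this* machine.
    // The worker count is read at runtime, so the same binary adapts to a
    // small desktop or a large server without recompiling.
    public static int ProcessAll(BlockingCollection<WorkItem> queue, Action<WorkItem> process)
    {
        int workerCount = Environment.ProcessorCount;
        int processed = 0;

        Task[] workers = new Task[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            workers[i] = Task.Run(() =>
            {
                // GetConsumingEnumerable blocks until an item is available and
                // finishes once CompleteAdding has been called and the queue is empty.
                foreach (WorkItem item in queue.GetConsumingEnumerable())
                {
                    process(item);
                    Interlocked.Increment(ref processed);
                }
            });
        }

        // Wait for every worker to finish before reporting the total.
        Task.WaitAll(workers);
        return processed;
    }
}
```

A producer adds items and then calls `CompleteAdding()`; each worker exits once the queue is drained. An alternative is to hand the loop to `Parallel.ForEach` and let the thread pool choose the degree of parallelism; explicit workers are used here because the default chunk partitioning in `Parallel.ForEach` buffers items pulled from a blocking enumerable.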

Now, I’m writing a modular platform, so if it makes sense to write native code for a specific plug-in or feature, I won’t hesitate to do it. For now, though, I’m fairly certain I can get most of this project done using managed code.

Enough theory – it’s time to roll up my sleeves and get started. I’ve identified several core services that my processing platform will need to expose, so I will be working on those over the next few days. Once I get a few of the services out of the way, I will write my first Queue Manager so I can actually do something interesting with what I have written up to that point.

Tuesday, November 22, 2011