Monday, July 26, 2004

difficult optimizations .net

Recently I read a brain twister on http://www.codeproject.com/ about how to tell which ball (out of 12) was not the same weight as the other balls. You had to do this in no more than 3 measurements. Obviously, divide and conquer wasn't going to work for this scenario, so I started looking at other solutions. After spending more than 30 minutes trying to divine a global way of solving this riddle, I realized the solution wasn't an all-encompassing master solution. The solution was to break it down into many subsets and solve those problems: have a strategy for each possibility. It took me a little while, but eventually I was able to cover every possible position of the odd ball.
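
To make "cover every possible position" concrete, here is a small Python sketch that brute-forces all 24 scenarios (12 balls, each either heavy or light) against one fixed weighing plan. It assumes the measurements are made on a balance scale, which the puzzle description above doesn't spell out. The plan in WEIGHINGS is a hypothetical one chosen for illustration, not the case-by-case strategy I worked out by hand. Three weighings with three possible readings each give 27 outcomes, which is just enough room for 24 scenarios.

```python
from itertools import product

# Hypothetical fixed plan: each weighing is (left pan, right pan) by ball number.
WEIGHINGS = [
    ({1, 2, 3, 10}, {4, 5, 6, 11}),
    ({1, 2, 3, 11}, {7, 8, 9, 10}),
    ({1, 4, 7, 10}, {2, 5, 8, 12}),
]


def outcome(odd_ball, delta):
    """Return the three scale readings for one scenario.

    delta is +1 if the odd ball is heavy, -1 if it is light.
    Each reading is +1 (left pan heavier), -1 (right pan heavier)
    or 0 (balanced).
    """
    readings = []
    for left, right in WEIGHINGS:
        if odd_ball in left:
            readings.append(delta)
        elif odd_ball in right:
            readings.append(-delta)
        else:
            readings.append(0)
    return tuple(readings)


def build_lookup():
    """Map each outcome triple to the (ball, 'heavy'/'light') that causes it."""
    lookup = {}
    for ball, delta in product(range(1, 13), (+1, -1)):
        key = outcome(ball, delta)
        if key in lookup:
            # Two scenarios with the same readings would make the plan useless.
            raise ValueError(f"ambiguous outcome {key}")
        lookup[key] = (ball, "heavy" if delta > 0 else "light")
    return lookup


LOOKUP = build_lookup()
print(len(LOOKUP))  # 24: every scenario has its own set of readings
```

Each scenario ends up with its own entry in the lookup table, which is the same "have a strategy for each possibility" idea written down as data.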

I got to thinking about this and realized that this is a metaphor for difficult optimization. The best solution is almost always one where you can speed things up with the smallest possible change to code and architecture. However, if you're unable to do this, you can still achieve your goals by breaking the problem down and fixing it at the pain points.


Recently, while authoring my Glacial TreeList, I ran across some serious speed problems. After instrumenting much of the control I found that the problem wasn't in one single place but in many places. I had to fight serious disappointment as doubt washed over me about whether this control was feasible as a control written purely in .NET. Since there were many problem areas, I decided to tackle the problems one step at a time. When it was all said and done, my control was faster than I had hoped: I loaded 1 million nodes into the treelist in about 1.2 seconds. The .NET treeview can't handle more than 10k nodes without coughing up a huge hairball. What follows are some notes I made while optimizing.


Some notes:

One thing I found early on (and that was quite disappointing) is that the performance difference between ArrayList and CollectionBase for the IndexOf method is staggering. With around 1 million nodes, IndexOf on an ArrayList is far faster than IndexOf on a CollectionBase. This is a shame, as I like to use CollectionBase to create typed collections.

If you store X objects of size W where X*W > [physical memory], you are in for a world of pain. For whatever reason, most of the collection classes in .NET will cause the virtual memory to thrash at about the speed of a 386/33. While I realize that this scenario is unlikely in 99% of situations, the fact that it crops up depending on how much physical RAM you have bothers me a great deal. It is for this reason that I am considering a high performance list class as one of my next projects.

For...Next seems to be an order of magnitude faster than foreach (wtf). I don't know why this is, but going through and changing all my iterations to For...Next loops yielded quite a bit of speed improvement.

Fishhook lists suck when your node count gets high and everything is in memory (as opposed to fishhook lists in a DB). If you have a hierarchical set of data (nodes in this case) but you want to display it in a flat layout while preserving the ease of moving nodes to different branches, a fishhook list is the thing to use. Since the children of a node are simply every node whose Parent member points to that node, moving nodes and navigating are a breeze. However, the benefits of this type of construct soon fall away if you have 100k+ items. The problem is that you have to iterate the entire list every time you want to pull all the 'children' of a given node.
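
Here is a minimal Python sketch of that cost (Python rather than .NET, purely for illustration). The Node class, its field names, and the build_child_index helper are all hypothetical names made up for this example, not code from the control. The first function is the fishhook lookup as described above; the second shows one common way around the full scan, which is to group nodes by parent once up front. I'm not claiming that is what the TreeList ended up doing.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    id: int
    parent_id: Optional[int]  # None marks a root node
    text: str


def children_by_scan(nodes, parent_id):
    """Fishhook lookup: walk the whole flat list on every query, O(n)."""
    return [n for n in nodes if n.parent_id == parent_id]


def build_child_index(nodes):
    """One pass over the flat list, grouping nodes under their parent id."""
    index = defaultdict(list)
    for n in nodes:
        index[n.parent_id].append(n)
    return index


nodes = [
    Node(1, None, "root"),
    Node(2, 1, "a"),
    Node(3, 1, "b"),
    Node(4, 2, "a1"),
]

index = build_child_index(nodes)
print([n.text for n in children_by_scan(nodes, 1)])  # ['a', 'b']
print([n.text for n in index[1]])                     # ['a', 'b']
```

The trade-off is that the index has to be kept in sync whenever a node changes parent, which gives back some of the "moving nodes is a breeze" simplicity that made the flat list attractive in the first place.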

I'll post more on various optimizations later.  I need sleep now.


Friday, July 23, 2004

practical object communications technologies

Over the years, technologies for object communications have evolved. I remember the early days, when everything was basically raw-socket based: you sent and received messages, then decoded the binary. We have come a long way since those dark days.

The first technologies for object communication I became aware of were CORBA and COM. COM was an excellent first step IMO, as it allowed for both in-process and out-of-process communications. With COM you could create an object, publish it, and consume it quickly without having to get into the nasty details of socket communications. The downside was that COM could land you in DLL hell quickly, and DCOM was almost universally reviled for its difficulty of installation and use.

With .NET, MS introduced remoting. Remoting is an excellent way of connecting objects both locally and from across the planet. However, remoting didn't always scale very well and was mostly ignored by people creating in-process object associations. The downside to this is that many programmers have reverted to the path of least resistance, or 'spaghetti coding', where objects aren't so clearly defined (clear definition being one of the benefits the old COM technology had forced on programmers). Even so, remoting is a great technology for object communications.

Now, as Microsoft begins to push for the new Service Oriented Architectures, yet another sea change is in the offing. Web services, which became much easier to author with .NET and are now heavily upgraded in WSE 2.0, have become the object technology of choice. With Indigo on the horizon, which MS has stated will closely follow WSE 2.0 in implementation, we now have a new technology to work with.

I have begun to dig into WSE 2.0 and I must say I am impressed with its abilities. Calling WSE 2.0 pure web services isn't even really correct, as it is more a messaging system that can handle different types of transport and routing. I will be posting a lot more information on WSE 2.0 and SOA in this space as I work more with these concepts/technologies.

My impressions of WSE 2.0 in the enterprise space are nothing but positive so far. I believe SOA designs will greatly enhance the ability of enterprises to maintain their corporate business rules and intranets. By black-boxing services, we may move back to a more implementation-agnostic way of communicating within a business entity. I believe it will take much longer for enterprises to communicate with each other, simply due to security fears.

I do have some reservations about the object communication direction MS is taking in WSE 2.0, as it seems very much geared towards the enterprise. I am concerned about this insofar as most of the applications written on the PC are not enterprise applications, and I hate to see technologies split up with the enterprise going in one direction and everyone else in another. What are average application programmers who need to make simple connections going to use? My guess is remoting will stay around for a while to service those needs, but frankly nobody likes to use deprecated systems. What are the practical effects for the average programmer of Microsoft abandoning remoting? I hope to have more answers to these questions in the coming months.

Some extra reading on this subject can be found at:

http://blogs.msdn.com/richturner666/archive/2004/03/05/84834.aspx
http://blogs.msdn.com/richturner666/archive/0001/01/01/84771.aspx





Thursday, July 22, 2004

and then it started...

So I've been reading various blogs for over a year now and I finally decided to start one myself.  I will attempt to post useful information here as well as point out those things I find useful and relevant to software architecture.  I will specifically try to focus on architecture for the real world. 

Over my career I have had little use for ivory tower architects who postulate theories from on high that have little practical value. From this space I will attempt to drill into useful architecture and methods for getting things done now.

My theories in this regard are a metaphor for my life in general, I think. Early in my life I read a book, "Surely You're Joking, Mr. Feynman!" by Richard Feynman, that left quite an impression on me. It described labs at a couple of elite universities: one was impeccably set up and clean but rarely yielded useful data, while another was barely held together by tape and glue but produced a wealth of knowledge. The point was that sometimes you have to get your hands dirty to really get into the heart of a problem. I believe this to be quite true, as I believe architects who only theorize about perfect systems are ultimately out of touch with how those systems will be useful in the real world. I will attempt to leverage my real world experience with architecture to help real world architects solve real world problems.