Recently I read a brain twister on http://www.codeproject.com/ about how to tell which ball (out of 12) was a different weight from the others. You had to do this in no more than 3 weighings. Obviously, divide and conquer wasn't going to work for this scenario, so I started looking at other solutions. After spending more than 30 minutes trying to divine a global way of solving this riddle, I realized the answer wasn't one all-encompassing master solution. The solution was to break the problem down into many subsets and solve each of those: have a strategy for every possibility. It took me a little while, but eventually I was able to cover every possible position of the odd ball.
I got to thinking about this and realized that it is a metaphor for difficult optimization. The best solution is almost always the one that speeds things up with the smallest change to code and architecture. However, if you're unable to do that, you can still achieve your goals by breaking the problem down and fixing it at the pain points.
Recently, while authoring my Glacial TreeList, I ran across some serious speed problems. After instrumenting much of the control I found that the problem wasn't in one single place but in many places. I had to fight serious disappointment as doubt washed over me about whether this control was even feasible as a purely .NET-written control. Since there were many problem areas, I decided to tackle them one at a time. When it was all said and done, my control was faster than I had dared hope: it loaded 1 million nodes into the treelist in about 1.2 seconds. The .NET TreeView can't handle more than 10k nodes without coughing up a huge hairball. What follows are some notes I made while optimizing.
Some notes:
One thing I found early on (and it was quite disappointing) is that the performance difference between ArrayList and CollectionBase on the IndexOf function is staggering. IndexOf on an ArrayList with around 1 million nodes is dramatically faster than on a CollectionBase. This is a shame, as I like to use CollectionBase to create typed collections.
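The gap shows up in a micro-benchmark along these lines. NodeCollection here is a hypothetical typed collection in the usual CollectionBase style, not the actual Glacial TreeList type, and the timings will vary by machine and runtime:

```csharp
using System;
using System.Collections;

// Hypothetical typed collection in the usual CollectionBase pattern.
class NodeCollection : CollectionBase
{
    public int Add(object node)     { return List.Add(node); }
    public int IndexOf(object node) { return List.IndexOf(node); }
}

class IndexOfBenchmark
{
    static void Main()
    {
        const int count = 1000000;
        ArrayList arrayList = new ArrayList(count);
        NodeCollection typed = new NodeCollection();

        for (int i = 0; i < count; i++)
        {
            object node = i;            // stand-in for a real tree node
            arrayList.Add(node);
            typed.Add(node);
        }

        object target = arrayList[count - 1];   // worst case: last element

        int start = Environment.TickCount;
        int a = arrayList.IndexOf(target);
        Console.WriteLine("ArrayList.IndexOf:      " +
                          (Environment.TickCount - start) + " ms");

        start = Environment.TickCount;
        int b = typed.IndexOf(target);
        Console.WriteLine("CollectionBase IndexOf: " +
                          (Environment.TickCount - start) + " ms");

        Console.WriteLine(a == b);      // both find the same index
    }
}
```

Both searches are linear scans over the same data; whatever CollectionBase is doing on top of its InnerList is where the extra cost hides.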
If you put X objects of size W into a collection where X*W > [physical memory], you are in for a world of pain. For whatever reason, most of the collection classes in .NET will cause virtual memory to thrash at about the speed of a 386/33. While I realize this scenario is 99% unlikely in most situations, the fact that it crops up depending on how much physical RAM you have bothers me a great deal. It is for this reason that I am considering a high performance list class as one of my next projects.
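For what it's worth, one direction such a list class might take (this is only a sketch of the idea; the names and page size are made up) is to store items in fixed-size pages instead of one huge contiguous array, so that growth never reallocates and copies millions of elements at once:

```csharp
using System;
using System.Collections;

// Sketch: a list backed by fixed-size pages rather than one big array.
class PagedList
{
    const int PageSize = 4096;
    ArrayList pages = new ArrayList();   // each entry is an object[PageSize]
    int count = 0;

    public int Count { get { return count; } }

    public void Add(object item)
    {
        if (count % PageSize == 0)             // current page full (or none yet)
            pages.Add(new object[PageSize]);
        object[] page = (object[])pages[pages.Count - 1];
        page[count % PageSize] = item;
        count++;
    }

    public object this[int index]
    {
        get
        {
            object[] page = (object[])pages[index / PageSize];
            return page[index % PageSize];
        }
    }
}

class Demo
{
    static void Main()
    {
        PagedList list = new PagedList();
        for (int i = 0; i < 10000; i++)
            list.Add(i);
        Console.WriteLine(list.Count + " " + list[9999]);  // prints "10000 9999"
    }
}
```

Paging by itself won't stop the OS from swapping once you truly exceed physical RAM, but it avoids the giant grow-and-copy allocations that make the thrashing worse.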
A plain for loop seems to be an order of magnitude faster than foreach (wtf). I don't know exactly why this is, but going through and converting all my iterations to indexed for loops yielded quite a bit of speed improvement.
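My best guess (and it is a guess) is the enumerator: foreach over an ArrayList allocates an IEnumerator object and pays an interface-dispatched MoveNext/Current call per element, while an indexed for loop just hits the indexer directly. The two shapes side by side:

```csharp
using System;
using System.Collections;

class LoopComparison
{
    static void Main()
    {
        ArrayList items = new ArrayList();
        for (int i = 0; i < 1000000; i++)
            items.Add(i);

        // foreach: allocates an enumerator, then one interface-dispatched
        // MoveNext() and Current per element.
        long sumForeach = 0;
        foreach (object o in items)
            sumForeach += (int)o;

        // Indexed for loop: no enumerator, just the indexer.
        long sumFor = 0;
        for (int i = 0; i < items.Count; i++)
            sumFor += (int)items[i];

        Console.WriteLine(sumForeach == sumFor);   // prints "True"
    }
}
```

Same result either way, just a very different per-iteration cost on the old runtime.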
Fishhook lists suck when your node count gets high and everything is in memory (as opposed to fishhook lists in a DB). If you have a hierarchical set of data (nodes, in this case) but you want to display it in a flat layout while preserving the ease of moving nodes between branches, a fishhook list is the thing to use. Since the 'children' of a node are simply every node whose Parent member points to it, moving nodes and navigating is a breeze. However, the benefits of this construct soon fall away once you have 100k+ items: you have to iterate the entire list every time you want to pull all the children of a given node.
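Here is the shape of the problem in miniature (Node and FlatTree are illustrative names, not the control's real types): reparenting is a one-field write, but pulling children walks everything.

```csharp
using System;
using System.Collections;

// A fishhook list: every node lives in one flat collection and
// just remembers its parent.
class Node
{
    public Node Parent;
    public string Text;
    public Node(Node parent, string text) { Parent = parent; Text = text; }
}

class FlatTree
{
    ArrayList nodes = new ArrayList();

    public void Add(Node n) { nodes.Add(n); }

    // Moving a node to another branch: touch one field, no lists to splice.
    public void Move(Node n, Node newParent) { n.Parent = newParent; }

    // The catch: finding children means scanning the ENTIRE flat list,
    // so every lookup is O(total nodes) -- which is what kills you
    // past 100k items.
    public ArrayList GetChildren(Node parent)
    {
        ArrayList children = new ArrayList();
        for (int i = 0; i < nodes.Count; i++)
        {
            Node n = (Node)nodes[i];
            if (n.Parent == parent)
                children.Add(n);
        }
        return children;
    }
}

class Demo
{
    static void Main()
    {
        FlatTree tree = new FlatTree();
        Node root = new Node(null, "root");
        Node a = new Node(root, "a");
        Node b = new Node(root, "b");
        Node c = new Node(a, "c");
        tree.Add(root); tree.Add(a); tree.Add(b); tree.Add(c);

        Console.WriteLine(tree.GetChildren(root).Count);  // prints "2"
        tree.Move(c, root);                               // trivial reparent
        Console.WriteLine(tree.GetChildren(root).Count);  // prints "3"
    }
}
```

One way out at high node counts is to keep a side index (say, a Hashtable from parent to its child list), trading a little of that reparenting simplicity for lookups proportional to the child count instead of the total node count.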
I'll post more on various optimizations later. I need sleep now.