Ah, the memories
Eric Lippert’s blog post about partial order sorting was, as always, interesting and well written. However, this particular post brought back memories of a project I did for Gateway Foods of La Crosse, Wisconsin back in the early 90s. (I googled them and they appear to no longer exist.)
My project was not as fun or easy to understand as Eric’s example. Gateway wanted to perform the "hot and new" business analysis called Activity-Based Costing, or ABC: instead of just looking at which food items were most profitable, they looked at which customers were most profitable.
Aced-out by the SQL dude
They hired one of the big eight firms (remember when they were called "the big eight"?), Booze-Allen-Hamilton, to do the analysis and then went looking for a programmer. I heard about the project from my contact "George," for whom I had delivered a training course on Clipper earlier. I got excited, but really had no chance at it because the IT department didn’t believe in PCs, so they hired a SQL programmer to implement it.
A few months went by and George called to tell me the project was going nowhere. A few months more and George called to tell me the project was in real trouble. A few months more still, and I got a call from George asking if I could start on the project that week in hopes of salvaging it (of course they had already spent most of the budget on the SQL guy, but what can you do?).
Be careful what you wish for
Later that week I arrived and met with the people from Booze, and they proceeded to hand me a stack of paper 10 inches thick that contained the allocation formulas! (I kid you not.) My job was to write a Clipper program to apply those formulas to a HUGE 300MB database (remember, this was the early 90s, and 300MB was huge back then). The allocation formulas were for things like "Allocate the cost of fuel to deliver to each customer based on the miles required to travel to that customer: sum the cost of fuel and multiply by the miles traveled for each customer divided by the total miles traveled to all customers." And so on. Around 1800+ formulas in all.
For the next three days, my eyes glazed over as I tried to understand the formulas. I started thinking, "What the hell have I gotten myself into? There is no way I will ever be able to understand all of these formulas well enough to encode them into a program. No wonder the SQL guy failed. I am screwed too!" But finally it started to dawn on me: don’t learn the formulas, learn the patterns. I’ve always been extremely good at pattern recognition, so why didn’t I see it immediately? (Which contrasts nicely with all the other things at which I’m extremely bad.)
Patterns: what a concept!
The initial database had a number of tables like fooditems, customers, and orders. It turned out there were only three patterns:
- Scan a table and perform a calculation like sum or average and write the output to a one-record table. Examples included calculating total sales, average order size, etc.
- Scan a table and perform a calculation like sum or average for each fooditem, or customer, or similar, and write the output to a table that has one record for every fooditem, customer, or similar.
- Scan a table and perform a calculation for every record and write the output to a table with the same number of records as the scanned table.
That’s it. Three patterns. All 1800+ formulas boiled down to three patterns. SQL gurus know the following are examples of the above three patterns:
- SELECT SUM(SubTotal) AS TotalPrice INTO OneRec FROM Orders
- SELECT CustomerID, SUM(SubTotal) AS TotalOrdered INTO OnePerCustomer FROM Orders GROUP BY CustomerID
- SELECT opc.CustomerID, opc.TotalOrdered/c.TotalDeliveryMiles AS SalesPerMile INTO CustomerFactors FROM Customer c INNER JOIN OnePerCustomer opc ON opc.CustomerID=c.ID
Technique triumphs over toolset
Ultimately I succeeded using Clipper where the SQL programmer had failed. He didn’t fail because he used SQL, and my success was not because I used Clipper. He failed because he tried to hand-code all 1800+ formulas; I succeeded because I wrote a Clipper program to handle the three patterns, not the 1800+ formulas. In hindsight, given the tools available at the time, the best solution would probably have used Turbo Pascal and SQL, but such is life.
The Gateway people entered the allocation formulas from Booze into a grid in my program; my program then determined which formulas to calculate and in which order, and then executed those formulas. I used Clipper’s "macro" capability to execute the text formulas, much like how Execute() can be used in VBScript. A VB.NET programmer could do the same thing today by writing a program that generates SQL and then passes that SQL to ADO.NET to execute.
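In modern terms, the idea looks something like the following sketch. It is purely illustrative (the field names and the Formula shape are invented, not the original Clipper structures): one row from the formula grid becomes a SQL statement that something like ADO.NET could then execute.

```typescript
// Hypothetical shape of one row in the formula grid; the field names are
// invented for illustration, not taken from the original Clipper program.
interface Formula {
  name: string;         // e.g. "TotalOrdered"
  sourceTable: string;  // table to scan
  expression: string;   // e.g. "SUM(SubTotal)" or "SubTotal * 0.1"
  groupBy?: string;     // present only for the one-row-per-group pattern
  outputTable: string;  // where the result lands
}

// Turn a formula row into SQL text. In Clipper the equivalent step was
// macro-expanding the formula text; today the generated string would be
// handed to a SQL client to execute.
function toSql(f: Formula): string {
  const select = `SELECT ${f.groupBy ? f.groupBy + ", " : ""}${f.expression} AS ${f.name}`;
  const into = ` INTO ${f.outputTable} FROM ${f.sourceTable}`;
  const group = f.groupBy ? ` GROUP BY ${f.groupBy}` : "";
  return select + into + group;
}

// Example: the second pattern from the list above.
console.log(toSql({
  name: "TotalOrdered",
  sourceTable: "Orders",
  expression: "SUM(SubTotal)",
  groupBy: "CustomerID",
  outputTable: "OnePerCustomer",
}));
// SELECT CustomerID, SUM(SubTotal) AS TotalOrdered INTO OnePerCustomer FROM Orders GROUP BY CustomerID
```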
Back to Eric’s Post
The "which formulas in which order" was the part I was reminded of by Eric’s post. It actually took me three weeks to write that first program. 12 hours a day. 7 days a week. In the dead of winter. In Lacrosse Wisconsin. Brrrrr. (I’m from the south in Atlanta and I am a wuss when it comes to frigid weather!) Today I could probably write that same program in VB.NET and SQL in an afternoon. But then hardware is a lot faster now.
Intermediate Representations
Why three weeks? My program took almost two days to apply the 1800+ formulas to the 300MB database! Ever try to debug a program where you have to step through code for 24 hours before you get to see the next bug? Not very productive. It was here that I learned the tremendous value of creating an "intermediate representation." HTML is a perfect example: one program can generate HTML and another can display it. SQL, XML, and MSIL are also intermediate representations. One of the benefits of an intermediate representation is that you can validate output without having to observe behavior. In essence, intermediate representations provide much the same benefits as state machines.
I split my program into two modules: one that compiled the formulas, and a second that executed the compiler’s output. My "compiler" took only about two hours to run, which allowed me to debug my program in this lifetime. It still took two days to execute the formulas, but at least those formulas executed correctly!
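To make the payoff concrete, here is a hypothetical sketch (TypeScript, invented step shape) of why the split helps: the compiler’s output is just data, so it can be sanity-checked in seconds instead of waiting two days to discover that a formula ran before its inputs existed.

```typescript
// The intermediate representation here is just an ordered list of steps,
// each naming the tables it reads and the table it writes. (Invented
// shape; the original Clipper IR was not structured like this.)
interface Step {
  sql: string;
  reads: string[];
  writes: string;
}

// Validate the compiler's output without executing it: every table a step
// reads must be either a base table or the output of an earlier step.
function validatePlan(plan: Step[], baseTables: string[]): string[] {
  const available = new Set(baseTables);
  const errors: string[] = [];
  for (const [i, step] of plan.entries()) {
    for (const t of step.reads) {
      if (!available.has(t)) {
        errors.push(`step ${i} reads ${t} before it exists`);
      }
    }
    available.add(step.writes);
  }
  return errors;
}
```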
Just when I thought I was out…
I went home victorious. Or so I thought. A few weeks later I got a call. (They tracked me down while I was on vacation, no less!) The Gateway people had decided they wanted to modify the formulas and run "what-ifs." After each "what-if" it took them two days to get an answer. "Couldn’t I make this thing any faster?" (Imagine if the SQL guy had succeeded in encoding all those formulas into a SQL script. He would have had a job for life! A boring-as-hell job, but a job nonetheless. At least until Gateway realized all the money they were wasting and threw him out on his ear.)
My original program, except for the compiler part, was almost exactly what Eric Lippert blogged about a few days ago. I was damn proud I wrote it. However, optimizing it was much, much harder. It turned out that each pass over each of the tables took a really, really long time (we knew that), but there were also formulas that could have been performed on the same pass, and I was doing a complete pass for each formula.
Well, to finally cut a long story short, another three weeks and I was able to implement an optimizer that batched together formulas that could run on the same pass. That version took only about six hours to run. Good enough, and that was the last of Gateway Foods for me. (I never was very good at milking clients; guess that’s why I never made much money while in consulting.)
Optimal Dressing
So, back to Eric’s example. Let’s assume it took one minute for each separate item Eric put on, but if he was able to put on multiple items "at the same time," then each batch of items would take him only 1.5 minutes to put on. His original program’s output dressed him in this manner:
tophat : [ ]
shirt : [ ]
bowtie : ["shirt"]
socks : [ ]
vest : ["shirt"]
pocketwatch : ["vest"]
underpants : [ ]
trousers : ["underpants"]
shoes : ["trousers", "socks"]
cufflinks : ["shirt"]
gloves : [ ]
tailcoat : ["vest"]
This would take Eric 12 minutes to dress. However, if he batched items that could be put on at the same time (i.e., items whose dependencies had already been satisfied), Eric could dress in 4.5 minutes:
tophat : [ ]   socks : [ ]   shirt : [ ]   underpants : [ ]   gloves : [ ]
then
trousers : ["underpants"]   bowtie : ["shirt"]   vest : ["shirt"]   cufflinks : ["shirt"]
then
pocketwatch : ["vest"]   shoes : ["trousers", "socks"]   tailcoat : ["vest"]
The Challenge:
So here’s my challenge Eric (and I hope you don’t hate me for it): Implement an optimizer that will generate batched output where items in each batch have no interrelated dependencies, as shown in the last example above. It was a bitch to write in Clipper; how about in JScript?
That’s a straightforward modification of the existing toposort algorithm.
Call the "height" of a given dependency the length of the LONGEST path from that dependency to a dependency with no children.
Notice in your example that in the first group, it is all nodes with height zero, then the next group is height one, then the next group is height two, and so on.
Therefore the problem reduces to computing the height of every node. We modify the algorithm so that Visit returns the height of the visited node, where a node with no children has height zero, and the height of a node with children is equal to one plus the maximum height returned by the recursive step.
That’s a sketch, anyway. It would be fairly easy to come up with an implementation with an asymptotic run time of O(V + E), where V is the number of nodes and E is the total number of children in all nodes.
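For concreteness, here is a rough TypeScript rendering of that sketch (illustrative only; the challenge asked for JScript, and the memoization detail and example data below are filled in rather than taken from the original):

```typescript
// Each node maps to the list of nodes it depends on (its "children").
type Graph = Record<string, string[]>;

// Returns nodes grouped into batches: batch 0 holds nodes of height 0
// (no dependencies), batch 1 holds nodes whose longest path to a leaf
// is 1, and so on. Every node's dependencies land in an earlier batch.
function batchedTopoSort(graph: Graph): string[][] {
  const height = new Map<string, number>(); // memoized heights
  const inProgress = new Set<string>();     // cycle detection

  function visit(node: string): number {
    if (height.has(node)) return height.get(node)!;
    if (inProgress.has(node)) throw new Error(`cycle at ${node}`);
    inProgress.add(node);
    const children = graph[node] ?? [];
    // Height = 0 for a leaf, otherwise 1 + max height of the children.
    const h = children.length === 0 ? 0 : 1 + Math.max(...children.map(visit));
    inProgress.delete(node);
    height.set(node, h);
    return h;
  }

  const batches: string[][] = [];
  for (const node of Object.keys(graph)) {
    const h = visit(node);
    (batches[h] ??= []).push(node);
  }
  return batches;
}

// The dressing example: each batch can be put on "at the same time".
const dressing: Graph = {
  tophat: [], shirt: [], socks: [], underpants: [], gloves: [],
  bowtie: ["shirt"], vest: ["shirt"], cufflinks: ["shirt"],
  trousers: ["underpants"],
  pocketwatch: ["vest"], tailcoat: ["vest"],
  shoes: ["trousers", "socks"],
};
console.log(batchedTopoSort(dressing));
// [ ["tophat", "shirt", "socks", "underpants", "gloves"],
//   ["bowtie", "vest", "cufflinks", "trousers"],
//   ["pocketwatch", "tailcoat", "shoes"] ]
```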
Wow! That was a fast reply. It felt like no sooner had I posted than you answered the challenge. And sir, I believe you are correct.
When I saw your response, I had two emotions: 1) Cool, he got it, and 2) Darn, I hope he’s wrong. Why the latter? Because I worked really hard on optimizing it 10+ years ago, and it couldn’t have been *that* easy. Surely you misunderstood? Or maybe I didn’t phrase the question correctly? Or maybe that’s what a high IQ and a CS degree from Waterloo will get you? :-)
So I have a follow up question: is this a pattern you recognized, or was it something you solved?
Still, I think the original problem I solved was more difficult than the problem I posed to you (my mistake). I think maybe I was trying to minimize the time required to traverse the tree to count the height. Remember, this was pre-Pentium, 640K-limit days; nothing like the computing power we have today.
In trying to recreate how I (would have) solved it, I think I traversed the tree I built in memory (Clipper had ragged arrays) and any nodes that did not have a parent node got collected, deleted from the tree, and then output as a batch by my compiler. I must have continued this until all nodes were deleted. But that seems too easy. There must have been something more to it.
I only wish I could remember how I solved it (if it were important enough I’m sure I could find the code *somewhere*.)
Ah, the mists of time.
> is this a pattern you recognized, or was it something you solved?
Both — though I’m sure that I’ve never seen an optimized-for-parallelism topological sort before, it was fairly easy to figure out the algorithm by recognizing the pattern. I wasn’t quite sure what you were asking, so I drew out the dependency tree and marked on it the "set order" that you describe. Once I did that, it was obvious that the "set order" was in fact the greatest path length to a leaf, and that’s a well-known algorithm.
> maybe I was trying to minimize the time required to traverse the tree to count the height.
The algorithm I came up with is I believe O(V + E), which is the best you can do asymptotically.
> think I traversed the tree I built in memory (Clipper had ragged arrays) and any
> nodes that did not have a parent node got collected, deleted from the tree, and then
> output as a batch by my compiler.
What you’re describing is an alternate algorithm for topological sort. We could implement your variation on toposort like this:
1) find all the "bottoms" — the nodes with no children (if you can’t find any and there are still nodes under consideration, you have a cycle!)
2) output them as the latest cohort
3) remove them all from the child lists of all of their parents
4) repeat until there are no more nodes.
The hard part of that algorithm is implementing it efficiently. This turns into an O(V^3) algorithm if you implement it naively, and if there are cycles then you have to implement a cycle detector that doesn’t go into an infinite loop. The other irritating part of this algorithm is that it is "destructive" — you destroy the input structure as you process it.
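A minimal TypeScript sketch of that peel-off-the-bottoms variant (illustrative; it copies the child lists into sets up front, so the "destruction" happens on a private copy rather than the caller's structure):

```typescript
type Graph = Record<string, string[]>;

// Repeatedly find the "bottoms" (nodes with no remaining children), emit
// them as a cohort, and delete them from every remaining child list.
function cohortSort(graph: Graph): string[][] {
  const remaining = new Map<string, Set<string>>(
    Object.entries(graph).map(
      ([n, kids]) => [n, new Set(kids)] as [string, Set<string>]
    )
  );
  const cohorts: string[][] = [];

  while (remaining.size > 0) {
    // 1) Find all nodes with no children left.
    const bottoms = [...remaining.keys()].filter(
      (n) => remaining.get(n)!.size === 0
    );
    // No bottoms but nodes remain: there must be a cycle.
    if (bottoms.length === 0) throw new Error("cycle detected");

    // 2) Output them as the next cohort.
    cohorts.push(bottoms);

    // 3) Remove them from the graph and from every child list.
    for (const b of bottoms) remaining.delete(b);
    for (const kids of remaining.values()) {
      for (const b of bottoms) kids.delete(b);
    }
  }
  return cohorts;
}
```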
>> The other irritating part of this algorithm is that it is "destructive" — you destroy the input structure as you process it.
Yeah. But that was actually a good thing in Clipper with a 640k DOS limit… :-)