pmb: (Default)
[personal profile] pmb

So [ profile] conform has been going through the Streamtech problem set and doing them one by one. We got in a conversation about dynamic programming and memoization, and he pointed out which has that in spades. In particular, with input sizes of up to 100*100, the naive algorithm is going to run in time O(n^8)ish, and finding the answer will take forever.

So I coded up a solution to the problem ( ) using my memoize decorator from long ago. I created a 100*100 test case and the program proceeded to run forever and use all my memory. Clearly something was up. So I created a test-case creator ( ) and tested it on problems of size 5,7,9,11,...,50 because 50 was the largest test case that finished in a reasonable amount of time.

Putting this into gnuplot and fitting the curve to f(x) = a*x**b, gnuplot found that my program was running in time O(n^5.5). This is too darn slow.

So I made a new solution ( ) which doesn't use my cheater memoize decorator, but actually builds a 4 dimensional array and fills it in. This should definitely run in time O(n^4), but after testing the other, I wanted to test this one the same way. So I ran it on all the same size inputs, and then took the data and fitted it to the same curve in gnuplot, and the exponent parameter was, in the fitted line, almost exactly 4. Hooray!

Two programs, two runtimes

Then I ran it on a 100*100 example, and it finished in 5 minutes and 30 seconds. Not great, but not bad for python. So that's that. A cute problem that can be resolved into a fast(ish) solution using dynamic programming/memoization, and a potent reminder to me that hashtables are not always O(1). I think the neatest thing about the graph is that you can actually see the hastables starting to fail - the runtime of the programs match each other exactly up through n=20, and then begin to diverge as collisions become more and more prevalent. I wonder if this is a good test of a hashtable, or if this is a useless corner case. Because I really like my memoize decorator and hate having to continually wonder about whether or not it's working right.

Update: [ profile] conform found a faster and sexier way using an algorithms from Jon Bentley's 'Programming Pearls'. At first I couldn't believe that it worked, because it was totally subtle, but now I buy it. See how to do this right at:

Date: 2009-01-14 10:59 pm (UTC)
From: [identity profile]
could you possibly help with a minor python problem?

Date: 2009-01-14 11:04 pm (UTC)
From: [identity profile]
Certainly. IM is best for this kind of thing. I am on gtalk and jongleurpeter and AIM

Date: 2009-01-14 11:20 pm (UTC)
From: [identity profile]
As my professional career continues, I slowly drift further and further from these sorts of problems. But if you need to know about the day-to-day practicalities of implementing bit-bang protocols, radio controls, frequency hopping, clock synchronization, or anything in that direction, I can certainly help you.

Date: 2009-01-14 11:39 pm (UTC)
From: [identity profile]
Ironically, I've moved in the opposite direction. I used to write microcode and debug with an oscilloscope. Now I'm a data-structures and algorithm junkie and do significant work in C# and Python...

Date: 2009-01-14 11:46 pm (UTC)
From: [identity profile]
I'm not worried that I couldn't move back in the algorithms direction, just commenting that I am doing that very little anymore.

I happen to like oscilloscopes and logic analyzers and embedded systems a lot, though. For Christmas I got a pretty cool beginner RC helicopter, which was a cool gift in itself. But some part of the fun for me was in realizing that they had implemented a 2.4GHz DSSS radio with diversity receive (a very difficult feat) in a consumer level product (even more difficult!), and opening it up to see what transceiver chipset they were using. (Cypress WirelessUSB series stuff, if you care, but they had to put two receiver chips in for diversity functionality...)

Date: 2009-01-14 11:53 pm (UTC)
From: [identity profile]
I wonder if the O(n^5.5) version is slow because of the hash table or because of the garbage collector.

See also this performance bug.

Date: 2009-01-15 12:27 am (UTC)
From: [identity profile]
Interesting. I am running Python 2.5 (due to laziness) and I definitely WAS building a giant dict of tuples... hmmm...

Date: 2009-01-15 12:47 am (UTC)
From: [identity profile]
The patch isn't applied until Python 2.7 or 3.1, which don't exist yet, so upgrading won't help you. "import gc; gc.disable()" will work though, unless you're creating cyclical garbage. ;)

Date: 2009-01-15 12:03 am (UTC)
From: [identity profile]
are there actually any pale green plusses on that chart? I can't fucking see them.

Date: 2009-01-15 12:11 am (UTC)
kirin: Kirin Esper from Final Fantasy VI (Default)
From: [personal profile] kirin
They start lining up with the red n^5.5 line shortly after it passes the blue one. Before that they're under the blue line and pink x's.

Date: 2009-01-15 12:15 am (UTC)
From: [identity profile]
It is true that Gnuplot's default colors are basically the worst choices possible. Fixing this is not easy, and I didn't want to waste more time than I already had.

Also what [ profile] kirinn said.

Date: 2009-01-15 12:21 am (UTC)
From: [identity profile]
actually, blame fucking livejournal.
the chart was truncated on the right side on my view of your entry, with no scrollbar.

Date: 2009-01-15 01:10 am (UTC)
From: [identity profile]
Use PyX. (Especially for a post about a problem solved in python.)

Date: 2009-01-15 09:23 pm (UTC)
From: [identity profile]
Let D be the input matrix (with annoying 1-based indexing -- which actually turns out to be useful when coding a solution with python). "D" for "Data".

Let C(i,j) = sum_{p=1 to i} sum_{q=1 to j} D(p,q). Note that C is the same size as D. For convenience, also define C(i,0)=C(0,j)=0 (yay zero-based!). "C" for "Corner". Obviously C can be computed in O(N^2).

The sum over the sub-rectangle CartesianProduct((u,v], (x,y]) is C(v,y)-C(u,y)-C(v,x)+C(u,x). This makes more sense when standing on a grid-tiled floor, preferably while sipping tea. Enumerating all rectangles is easy (4 nested for-loops) in O(N^4).

I can has fortran? takes ~30sec in python. I don't yet quite grok the algorithms you posted.

Date: 2009-01-16 06:07 am (UTC)
From: [identity profile]
Just for posterity (I didn't know about this and had to look it up), the "Streamtech problem set" is a set of 100 programming problems from which one may select one to solve along with an application for employment, and their index page is here ( Streamtech, in turn, is apparently based in the Netherlands and claims that they "specialize in the development of first-class web applications. Notably, we offer the BittAds online advertising application."

[ profile] pmb, I really enjoyed reading some of these (and plan to use some of them to teach friends about comparative language features), how did [ profile] conform stumble onto these? Are they widely known in the CS world? I'm an engineer and had never heard of them until now. Are they original, or do they descend from somewhere else (Knuth?)

Date: 2009-01-16 03:11 pm (UTC)
From: [identity profile]
They are sort of oral-tradition problems. I had done a problem remarkably similar to these in the ACM programming contest, and another one I had done in algorithms class.

Date: 2009-01-16 05:53 pm (UTC)
From: [identity profile]
They appear to be ACM problems, actually. The bottom of the page says "Problems courtesy of:".

I found them when [ profile] lindsay_kuper posted to LJ about applying for an internship at Streamtech.


pmb: (Default)

October 2009

    1 23

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated Sep. 20th, 2017 07:53 pm
Powered by Dreamwidth Studios