I’ve finally hit a problem requiring multidimensional data structures, mathematical grunt, and speed, coupled with good DB bindings and text/file handling. Normally, I would use a Perl script to fetch data from the database and process it with R (as I’ve been having trouble with the R DBI library). However, processing 31K chunks of data for 13K variables each just won’t work in R.
So I’m delving into Python as a one-stop shop for all my woes. I’ve been procrastinating about learning the language, because, let’s face it, why write bad code in a new language when you can write ugly code in your long-term favourites?
Some resources I’ve found are: a Python tutorial, the NumPy library for data arrays, SciPy and the Python DB-API module. I’ll also have a look at StatPy for statistical computing in Python.
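
To get a feel for the DB-API side, here is a minimal sketch of streaming rows out of a database in batches instead of loading everything at once. It is only an illustration under assumptions: it uses the standard-library sqlite3 module as a stand-in for whatever DB-API driver I end up with, and the table `measurements(chunk_id, var_id, value)` and the function names `column_means` and `demo` are made up for the example, not my real schema.

```python
import sqlite3


def column_means(conn, batch_size=1000):
    """Stream rows with fetchmany() and keep running per-variable sums."""
    sums, counts = {}, {}
    cur = conn.cursor()
    # Hypothetical table and column names, for illustration only.
    cur.execute("SELECT var_id, value FROM measurements")
    while True:
        rows = cur.fetchmany(batch_size)  # at most batch_size rows per call
        if not rows:
            break
        for var_id, value in rows:
            sums[var_id] = sums.get(var_id, 0.0) + value
            counts[var_id] = counts.get(var_id, 0) + 1
    cur.close()
    return {var_id: sums[var_id] / counts[var_id] for var_id in sums}


def demo():
    """Build a tiny in-memory table and summarise it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE measurements (chunk_id INTEGER, var_id INTEGER, value REAL)"
    )
    # 4 chunks x 3 variables of toy data; sqlite3 uses '?' placeholders,
    # other DB-API drivers may use a different paramstyle.
    rows = [(c, v, float(c + v)) for c in range(4) for v in range(3)]
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", rows)
    means = column_means(conn, batch_size=5)
    conn.close()
    return means


if __name__ == "__main__":
    print(demo())
```

The point of `fetchmany()` is that only one batch of rows sits in memory at a time, which is the property I need for data of this size. Each batch could just as well be handed to a NumPy array for the heavier maths.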


Why not have the best of both worlds:
http://rpy.sourceforge.net/
Comment by Greg Tyrelle — August 24, 2007 @ 5:47 pm
Ah yes – my next port of call. I’m not yet sure I understand how everything fits together, though.
Besides, I have to get the &*#% data out of the db first.
Comment by chris — August 24, 2007 @ 6:01 pm
I found myself editing a MODELLER script (in Python) recently, to accept files as command-line arguments rather than as hard-coded variables. Perhaps we all turn to Python eventually.
Like you, I’ve also taken the “learn it as and when you need it” approach. You won’t catch me reading programming manuals for fun.
Comment by Neil — August 26, 2007 @ 12:49 am
I’ve been meaning to make the switch for a while, but being the lazy sod I am, I have been waiting for a critical mass of stuff hanging over my head. What really draws me to Python is the natural handling of data with two or more dimensions, with an eye on generating subsets, summaries of those, and so on.
Oh, and the R bindings, too.
Comment by chris — August 30, 2007 @ 2:21 pm