My geeky friends
10/25/2006
My friend John sent me a funny text message today:
I am going to class dressed like Zod.
I love my geeky friends.
Detecting duplicate images using Python
10/16/2006
Brent Fitzgerald posted an open question asking whether there was a way to compare images (in his case, to detect offensive user-submitted icons). I whipped up this quick recipe using the Python Imaging Library. Here it is:
from PIL import Image

# compare two pixel sequences; True only if every pixel matches
def is_the_same(base, test_image):
    # images with a different number of pixels cannot be identical
    if len(base) != len(test_image):
        return False
    for i in range(len(base)):
        # pixels may be ints or tuples (e.g. RGB), so compare with != instead of subtracting
        if base[i] != test_image[i]:
            return False
    return True

# set the test inputs; all images will be compared against base
base = Image.open('base.png').getdata()
bad = Image.open('bad.png').getdata()
good = Image.open('good.png').getdata()

print(is_the_same(base, bad))   # should print True
print(is_the_same(base, good))  # should print False (good!)
It is nothing really fancy, but it sure does seem to get the job done rather quickly. Hacking this script to scoop/compare images in a directory is rather trivial.
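For the curious, here is a minimal sketch of what that directory hack might look like. The names load_pixels and find_matches, and the loader parameter, are my own inventions for illustration and are not part of the recipe above. It assumes "the same" still means pixel-for-pixel identical, and it also compares image sizes so that two differently shaped images with the same pixel run do not match.

```python
import os


def load_pixels(path):
    """Return (size, pixels) for an image, using PIL/Pillow."""
    # imported lazily so the rest of the module loads even without PIL
    from PIL import Image
    with Image.open(path) as img:
        return (img.size, list(img.getdata()))


def find_matches(base_path, directory, loader=load_pixels):
    """Return the sorted names of files in `directory` identical to the base image.

    `loader` is a hypothetical hook: any callable that maps a path to a
    comparable value. It defaults to the PIL-based load_pixels above.
    """
    base = loader(base_path)
    matches = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories
        try:
            if loader(path) == base:
                matches.append(name)
        except (IOError, OSError):
            # not something the loader can read as an image; skip it
            continue
    return matches
```

Something like find_matches('base.png', 'uploads') would then list every icon in a (made-up) uploads folder that matches the base image. Note that the base image itself will show up in the results if it lives in the same directory.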
The "my language is slow, therefore it sucks!" fallacy
10/07/2006
Note, this is a rough continuation of the series. Catch part two if you are curious what all of these rants are about.
I'll start by quoting the awesome Peter Norvig on Python vs. Lisp:
Qualitatively, Python feels about the same speed as interpreted Lisp, but very noticeably slower than compiled Lisp. For this reason I wouldn't recommend Python for applications that are (or are likely to become over time) compute intensive. But my purpose is oriented towards pedagogy, not production, so this is less of an issue.
But this post is not about Python or Lisp specifically. It's about the general attitude many developers have towards up-and-coming or pre-pubescent languages. Take Ruby for instance. Generally, the only two negative things I continue to hear about the growing phenomenon are: 1) that the syntax sucks and 2) that it is dreadfully slow.
From my experience, trying to defend a language's syntactic rationale to another developer is like explaining to a child that eating broccoli is healthy and yummy. The only way they will truly understand is when they are more mature and are open to trying new things. In the case of Ruby, even if I try to defend its clean syntax to Perl or Java coders, it would be moot. Developers settle on a language for several reasons: 1) they like the syntax or philosophies, 2) they like the third-party library support (this issue ties into short-term vs. long-term viability), 3) they are forced to, or 4) they missed the curve or are complacent (have you ever dealt with a clingy programmer?).
On to the second issue - rate of computation. Given the furor of open source coupled with Moore's law and decreasing memory prices, speed and optimization are slowly becoming irrelevant issues. This of course is amplified by distributed computing and concurrent systems, which have started to pick up steam (see programming.reddit). It's all about leveraging a network to solve large problems. Python developers who complain about speed should absolutely check out Pyrex, which lets you write Python-like code that compiles into C extension modules (I am going to play with it this weekend).
How come Erlang and Ruby are getting more attention now than ever before? If you apply Occam's Razor, it's because these languages appeal to developers who care about finding more efficient tools for their craft: Ruby for the Java coders looking to slim down their codebase, and Erlang for developers who want concurrency without writing C. Or, maybe we can link this to Norvig's quote about pedagogy. Perhaps the application of these languages makes for excellent brain exercises. Again, this is all relative to the developer.
Take the trend of web development for instance. In the mid-90s, the primordial web programmer was coding away in either C, C++, or Perl. Today, that same person can pick between even more tools, including PHP, Perl (still a great contender), Python, Lisp, Haskell, Erlang, Ruby, etc. Fundamentally, the game has changed and the smarter hackers have either adapted their favorite languages to the space or moved to one that can boost productivity. Entrepreneurs/coders creating their own companies do not care how functional their code-base is or how normalized their database looks. Instead, they care about being efficient with time. This ties into why scripting languages sell so well: you can make a powerful application really fast. Sometimes, it's not worth the hassle to allocate/manage memory when you can let a higher-level language do it for you.
Yet we still cannot settle on a programming language because none of them are perfect. In fact, saying there is an uber language would be a funny paradox. It would be inconvenient to write a web server to serve a static website in C (you'd be reinventing the wheel) just like it would be horrific to build an operating system like Vista using Perl. You could probably do them both (mileage will vary), but they are not the best tools for getting things done efficiently. Ultimately, we have to always explore the space and experiment with new languages.
RSync to the rescue
10/06/2006
My iMac has been complaining about a corrupted hard disk volume header. Fortunately I have been keeping all my documents in ~/, which makes things really nice and simple to back up. On a Windows box, I couldn't simply back up C:\, because I would end up backing up the entire Windows OS as well, and a broken Windows install was typically the reason for backing up in the first place (I formatted my computer 20+ times in one academic year).
If you need to back up your home directory (includes Desktop, Documents, Pictures, Movies, Sites, etc.) to an external hard drive, fire up your terminal (which drops you into your user's home directory by default) and key in:
rsync -aE --delete ~ /Volumes/bender/
Replace 'bender' (yep, I am a Futurama nerd) with your hard disk's name. The -a flag copies everything recursively and preserves permissions and timestamps, and ~ makes my home directory the source; on the rsync that ships with Mac OS X, -E also carries over extended attributes and resource forks. The --delete flag removes files from the backup that have since been deleted, moved, or renamed on the source. Sure, this is low-fi, but it works like a charm. Since I have a lot of data, I usually kick off the command and go play the guitar/drums for an hour or two.
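If you would rather script the backup than retype it, a small Python wrapper could look like the sketch below. build_backup_command and run_backup are hypothetical names I made up for illustration. The flags are the ones from the command above plus --dry-run, a standard rsync option that reports what would be copied or deleted without touching anything, which is worth trying first given that --delete is involved.

```python
import os
import subprocess


def build_backup_command(volume_name, source=None, dry_run=False):
    """Build the rsync argument list for backing up a home directory.

    volume_name is the external disk's name under /Volumes (e.g. 'bender').
    source defaults to the current user's home directory.
    """
    if source is None:
        source = os.path.expanduser("~")
    cmd = ["rsync", "-aE", "--delete"]
    if dry_run:
        # list the changes only; nothing is copied or deleted
        cmd.append("--dry-run")
    cmd.extend([source, "/Volumes/%s/" % volume_name])
    return cmd


def run_backup(volume_name, dry_run=True):
    """Run rsync and return its exit status (0 means success)."""
    return subprocess.call(build_backup_command(volume_name, dry_run=dry_run))
```

Calling run_backup('bender') does a dry run by default; pass dry_run=False once the output looks sane.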