Checking for file existence in Python: isfile vs. open
9/20/2006
I have been doing a lot of image processing in Python recently. Specifically, I am trying to scoop up files that follow an iterative naming convention and live in a known directory. Part of the process requires me to test whether each file exists in the first place. This was a good opportunity to test os.path.isfile() vs. open(), with and without string interpolation. I set the loop counts to something ridiculous: 1,000,000 (can directories even hold this many?).
Here are the results - the first line is the start time, the last line is the finish time:
isfile - with string concatenation
----------------------------
(2006, 9, 20, 21, 46, 56, 2, 263, 1)
(2006, 9, 20, 21, 47, 27, 2, 263, 1)
isfile - with string interpolation
----------------------------
(2006, 9, 20, 21, 47, 27, 2, 263, 1)
(2006, 9, 20, 21, 47, 58, 2, 263, 1)
open - with string concatenation
----------------------------
(2006, 9, 20, 21, 47, 58, 2, 263, 1)
(2006, 9, 20, 21, 48, 30, 2, 263, 1)
open - with string interpolation
----------------------------
(2006, 9, 20, 21, 48, 30, 2, 263, 1)
(2006, 9, 20, 21, 48, 35, 2, 263, 1)
Interesting indeed! I never knew open could be that much faster than isfile: the open run with string interpolation finished in about 5 seconds, while the other three runs took 31 to 32 seconds each. Note that I set the open mode to 'rb' to force binary, which makes a huge difference compared to just an 'r'. I know you can test for file existence using glob, but I did not bother to test it this round. As always, I have uploaded my super basic and totally non-DRY script to the server. Feel free to download it.
For reference, this was tested using the default installation of Python (2.3.5) that ships with OS X Tiger. My iMac is 2GHz (Intel Core Duo) and has 2 gigs (DDR2) of RAM. I am going to install Python 2.5 soon. I look forward to the overall speed optimizations.
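For anyone who wants to repeat the comparison, here is a minimal sketch in modern Python 3. It is not the script mentioned above: the function names, the img_N.png naming pattern, and the loop count are all hypothetical, picked only for illustration, and the timings will vary by machine and filesystem.

```python
import os
import tempfile
import time


def exists_isfile(path):
    """Existence check via os.path.isfile()."""
    return os.path.isfile(path)


def exists_open(path):
    """Existence check by trying to open the file in binary mode."""
    try:
        with open(path, "rb"):
            return True
    except OSError:
        return False


def time_checks(check, directory, count, interpolate):
    """Run `check` on `count` generated names; return (seconds, hits).

    The img_N.png pattern is a made-up stand-in for an iterative
    naming convention.
    """
    start = time.perf_counter()
    hits = 0
    for i in range(count):
        if interpolate:
            name = "%s/img_%d.png" % (directory, i)
        else:
            name = directory + "/img_" + str(i) + ".png"
        if check(name):
            hits += 1
    return time.perf_counter() - start, hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        # Create a handful of empty files so that some checks succeed.
        for i in range(10):
            with open(os.path.join(directory, "img_%d.png" % i), "wb"):
                pass
        for check in (exists_isfile, exists_open):
            for interpolate in (False, True):
                seconds, hits = time_checks(check, directory, 10000, interpolate)
                label = "interpolation" if interpolate else "concatenation"
                print(check.__name__, label, round(seconds, 3), hits)
```

One caveat with this sketch: the two checks are not strictly equivalent. The open() version also reports False for a file that exists but cannot be read, so it answers "can I open this?" rather than "is this a file?".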
Getting value from the long tail of online social networks
9/05/2006
After checking out Kenny's rant on LinkedIn, I started thinking about social networking and whether or not you can find the right candidate from growing networks. I think the first problem is that most of these sites force linear/predictive behaviors to tackle random problem spaces.
Suppose you are a writer or animator looking to make a feature film. You need to find a producer or someone with enough cash and interest to bankroll your project. Let's call this type of person an anchor - a person that makes a significant impression in at least one network degree. Confused? Think of Steven Spielberg. If an anchor like Spielberg is in your 1st degree (e.g. your friends) or 2nd degree (e.g. your friends' friends) network, you'll already know about their presence. The distance between people does not grow proportionately with more degrees - it grows exponentially. A shorter distance means you'll have a better chance of approaching and convincing someone to work with you. The predicate of course is based on a distributed and favorable level of trust for all stakeholders between you and the target person. The difficulty of maintaining that high-quality level of trust increases every time a new stakeholder comes between you and the target.
For instance, suppose Spielberg is now in your 3rd degree. Well guess what? That means that you are in his 3rd degree too! Assuming that the predicate is true, the odds of getting him to produce your flick are far lower than when he was in your 2nd degree. I am not even factoring in the time that Spielberg would have to take to sift through profiles/resumes/reels/blah. When you take this into account, the odds drop even further.
If you already know a person in your 1st degree, you get no value from LinkedIn. The way you get more value is when your network makes more bridges with other interesting people (remember when mom urged you to only make friends with winners?). If someone in my social network makes a relationship with an anchor, my relative value improves.
But as a user's 1st degree network grows, I believe that his/her perceived worth to the 2nd/3rd degree networks shrinks. Put yourself in Spielberg's shoes. Suppose you have exhausted your 1st degree and 2nd degree networks and still can't find a director for your new film. The odds of finding the right candidate on LinkedIn are like the odds of finding the right candidate in a coffee house on Sunset Blvd. Low. Before you jump up and say that LinkedIn has these trust/testimonial articulations...hold on for a second. To someone like Steven, trusting a candidate in the 3rd degree is like trusting a random stranger off the streets - you would not want them to waste your time or money.
If you're a big fish in a small pond, you will always do well for yourself. The moment the pond grows in size and in number of species, survival of the fittest becomes vital. Sometimes these small and personal networks facilitate tangible opportunities better than their digital counterparts.
Introducing Newspyle.com
9/01/2006

I have a new mashup out called Newspyle. I borrowed/hacked some source code from Sam Ruby's Planet and made a quick aggregator for digg, del.icio.us popular, and reddit. Since I get my morning/afternoon/evening fixes from these sites, it felt logical to split the content into keyboard-navigable panes. The project started out as a small concept, and a few friends encouraged me to make it public. That said, the interface is not meant to be mainstream or degrade nicely across all browsers (at least not now). The whole thing was based around my needs, browser, resolution, etc. - well, you get the point.
In case you are wondering, some of my hacks include 1) fun keyboard shortcuts that let you directly access a column and let you wrap around them intuitively as well (because we love rivers) and 2) del.icio.us integration that looks and behaves like the official bookmarklet.
Going into this, I barely knew enough JavaScript to alert a "hello world." While I still don't consider myself an expert at it, I have to say that Firebug has taken a lot of the work out of understanding the DOM. For the curious geeks, the entire site is static (scale should not be an issue) and re-built with Python every 15 minutes. As a tangent, I discovered a difference between Python 2.3 (local) and Python 2.4 (server) in how they handle dictionaries. I will have to look into that. I tested the JavaScript with Firefox and a bit with Opera. I know there are issues with Internet Explorer (keycode issues) and Safari - especially the contentWindow.focus() trick to move between columns. I will attempt to fix some of this over the next few days.
Even though I like keeping this slim, I enjoy reading feature requests or squashing bugs to give you a better experience. Anyways, have fun with Newspyle.
Updates:
- Fixed a JavaScript issue when the site first loads up and when the contentWindow focus shifts between panes from an empty state.
- Added some JavaScript code and CSS styles to make it more obvious which pane you are reading at the moment. I should have realized this before.
- I optimized a lot of JavaScript and consolidated lots of code. I still see some areas that could use some further improvements and hacks.
- Keyboard shortcuts should be working in IE now. Time to check out the other browsers.
- Added a transparent favicon, which should look better in most browsers.
- Added a boss mode! Try pushing "B" on your keyboard and watch SICP source code with fake syntax highlighting fill up your screen. I don't know Scheme, so I was just winging it. This should be enough to convince casual screen watchers.
- Added a header toggle. I may resize the header (it may be too tall).
- You can now adjust the pane heights to fit your screen resolution/browser window using "M" and "N".
- I did a lot of JavaScript tweaking (should run a bit more efficiently). I am still looking into the Safari/Opera issue. This has to do with iframe focusing. I haven't found any help online about this, so I might file a Safari bug.