whatspop - Kunal Anand

Introducing MovieHoppers.com (riding the Rails)

1/29/2006

For those that are wondering what I have been up to these days...



I recently started a entertainment/media project with two friends, Kenny Roy and Anderson Evans, called Movie Hoppers (www.moviehoppers.com). Unlike other technical projects of mine, this one involves a lot more creative energy. Right now, we are creating original content and writing movie reviews. The two characters that you see throughout this post are Daryl (left), the annoying one, and Dobbs (right), the laid back one.



Ordinarily, maintaining a media driven web site would make me pretty nervous - but I feel extremely confident about these circumstances. I think it is largely due to the fact that I am using Ruby on Rails. Now before you start saying that I've been drinking the framework punch, let me mention three distinct advantages of riding the Rails:

  1. SwitchTower - This tool is worth using Rails alone! Essentially, it "automagically" checks out the latest code from my Subversion repository, adjusts a symbolic link for the server's public directory, and restarts the server (rake commit). The real benefit of the tool is the ability to rollback changes. Suppose I check in bad code and pages start running wild. All I have to do is issue a quick command and things go back to the previous release (rake rollback). I have already saved some gray hair.
  2. AJAX helpers - Even though I have read a fair amount about the Document Object Model (DOM), I really do not enjoy writing JavaScript. Fortunately, Rails utilizes the famous Prototype library and generates the necessary JS code for AJAX requests. This has saved a lot of development and debugging time. AJAX has not made its way into the existing version of Movie Hoppers - but it will be visible soon.
  3. Callbacks - Another helpful component of Rails is the ability to manipulate data before/after filters. This is incredibly useful for our administrative interface. For instance, we can create slugs for posts and reviews and check uniqueness before the data entry. While I could work around this with other frameworks, having this ability is just convenient.


We are working at the rate of adding a new episode and many site features every week. Also, if you have MySpace account, we are giving away Fandango bucks every week. So come watch our funny animations and tell your friends about the site! If you have any suggestions for new features or whatnot, leave a comment or send me an email.

Tasty juice-box packaging

1/22/2006

While browsing YayHooray, a forum that is a bit rough around the edges, I came across a skinny thread with this hidden treasure:



This Japanese packaging is functional and well-designed. Without knowing the product, you can simply observe that it is a banana juice beverage. I anticipate, and hope, that this company will extend this theme to other fruits like oranges, apples, and strawberries where the packaging would have to be differentiated on color and texture.

Even if the average consumer can vaguely understand what you are trying to sell without having any product knowledge or experience, indirect marketing becomes easier. Interactive designers should try and create user interfaces allow users make instantaneous connections with the product, such as a web site. When I visit a site like Friendster or MySpace, the user interface does not convince me that I am in a social network. Instead, I feel like child back in 1996 with a Geocities page (/Area51/Zone/2307/ was my virtual domicile or origin) surfing a community web ring. While that may make money for investors and advertisers, both sites are incredibly ugly, inefficient, and unfriendly. Does any large people-oriented site create a great experience?

Setting up a database for tagging

1/17/2006

Thanks for the encouraging email about the ColdFusion tagging library. Since I have received a lot of questions about setting up a database to handle tagging, I presumed that it would be better to write a walkthrough for everyone. Before jumping ahead, I should mention that I am by no means a database administrator (DBA) or guru. However, I have had to build and scale very large web applications and found many of the following techniques valuable throughout my career. With that disclaimer, let us move on.

We need to create an example for this walkthrough. Let us pretend that we are going to create a database for a del.icio.us clone (you know my thoughts about clones). Even though I actually wanted to write some CFML code and deploy it, I am going to address the topic of setting up the necessary database tables. Writing the CFML is actually trivial for building the clone. In fact, I have listed the necessary queries for making your own del.icio.us.

Chances are that you already have a current database for your application. If you plan on growing the number of users or want to handle large site activity, I would recommend upsizing from SQLite or Microsoft Access to MySQL or PostgreSQL. The rationale is that adding tagging queries will potentially introduce lots of joins, which will exponentially increase with more site usage. I should mention that I am not slamming the smaller databases. If you are in a small environment with low usage, then you probably have nothing to worry about. Just realize that there are more robust and open-source alternatives out there. I prefer MySQL as it is free, simple to install (binaries and source are available), ubiquitous, and easy to replicate across a database farm.

So when you have decided on your database client, you have to design your tables. Right off the bat, you should know that normalizing your database tables is not a rule, it is an option. While it may be considered great practice, it may get in the way of writing and scaling your application. In fact, it may be prudent to de-normalize your tables if performance is an issue or if you want your database to breathe once in a while. You should not feel bad about de-normalizing either - lots of large web sites, like Flickr, realize the technical benefits of it. I will post a link to that article with Cal Henderson once I find it in my del.icio.us bookmarks. Also, I will not go into setting up foreign keys between your tables. This is an architectural issue that can be answered by one question: do you want the database or application source code to handle referential integrity, such as cascade updates and deletes for links and tags? Yes, this issue could argued both ways forever.

Here is the mostly-normalized approach (you can normalize this further):

  • users [id, email, name, password_hash, created_on]
  • links [url, title]
  • user_links [link_id, user_id, description, created_on]
  • user_link_tags [user_link_id, tag]
Here is the de-normalized approach:
  • users [id, email, name, password_hash, created_on]
  • user_links [user_id, url, title, description, created_on]
  • user_link_tags [user_id, user_link_id, tag, created_on]
The first approach has most things compartmentalized in its right place whereas the second approach creates data redundancy (note the missing linking table). While the first is good for feng shui, the second is more practical for working with users and their respective tags, without having to perform joins, which can use up more processing power as the application scales. Using the de-normalized approach, here are ten simple and effective queries:

1. Get all the user's links:

SELECT url, title
FROM user_links
WHERE user_id = #

2. Get all the user's tags:

SELECT tag
FROM user_link_tags
WHERE user_id = #

3. Get the user's 10 most popular tags:

SELECT tag, count(tag) as count
FROM user_link_tags
WHERE user_id = #
GROUP BY tag
ORDER BY count DESC
LIMIT 10

4. Get all the link's tags:

SELECT tag
FROM user_link_tags
WHERE user_link_id = #

5. Get the total number of times a url has been linked:

SELECT count(url) as count
FROM user_links
WHERE url = #

6. Get the latest 10 links added to the site:

SELECT url, title
FROM user_links
ORDER BY created_on DESC
LIMIT 10

7. Get the 50 all-time popular links:

SELECT url, title, count(url) as count
FROM user_links
GROUP BY url, title
ORDER BY count DESC
LIMIT 50

8. Get the 50 all-time popular tags:

SELECT tag, count(tag) as count
FROM user_link_tags
GROUP BY tag
ORDER BY count DESC
LIMIT 50

But what if we want to find the recent popular links or tags, essentially re-creating del.icio.us popular? The following queries explicitly make use of exclusive MySQL functions and syntax.

9. Get most 50 popular links over the last 24 hours:

SELECT url, title, count(url) as count
FROM user_links
WHERE DATE_SUB(CURDATE(),INTERVAL 1 DAY) <= created_on
GROUP BY url, title
ORDER BY count DESC
LIMIT 50

10. Get the 50 popular tags over the last 24 hours:

SELECT tag, count(tag) as count
FROM user_link_tags
WHERE DATE_SUB(CURDATE(),INTERVAL 1 DAY) <= created_on
GROUP BY tag
ORDER BY count DESC
LIMIT 50

Using the built in MySQL date function proved to be faster than using ColdFusion to loop and store popular links in a temporary hash table. If you are using a different database package, you should check your manual to find related functions. If you have a lot of data nodes (records in this case), you might want to cache the queries, store the data in a temporary database table, or write the static/dynamic page using a scheduled task (perhaps cron).

Now you need to figure out a way to search your links, tags, and descriptions. Picking the best method revolves around how serious you want to get. If this is going to be a huge application with lots of deep searching, I suggest compiling Lucene, an open-source search engine ported to various languages. Lucene's success is a function of its agility and accuracy. If this seems more like a chore to you, or if you want to start small, or if you are stuck with Windows Server, you can also establish full-text indexes on the appropriate columns. While this is less work to set up, you should remember to repair your table indexes every now and then.

Alas, here are my final thoughts. Setup a normalized database if you are purist and like to keep things clean. Just remember that you might have an issue with scaling your app. I recommend the latter if scale or query performance is an issue. For those interested, let me know if you want the MySQL schema file - I will email it over.

Maeda's illuminating resolution

1/09/2006

John Maeda provides some thoughts on why it is better to be a light bulb rather than a laser beam:

The logic being that a laser beam is focused with exact precision and without affecting its surroundings, whereas a lightbulb can illuminate everything (and everyone) around them.
More developers, scientists, and researchers need to adopt this simple philosophy. Rather than being afraid of battling conflicting opinions or giving away trade secrets, people should just become more open. How would you propose we as society overcome this intrinsic fear? As an example, more web developers should be open-sourcing and focusing their time building an infrastructure for collaboration. The value of open thinking radiates stronger than marginal "Web 2.0" profits.

I'm not feeling FoaFy

1/05/2006

Remember the FoaF project? The presumption was that a decentralized social network, harnessed by the semantic web, would help people connect. The FoaF project and XFN (now accelerated by the WordPress adoption) would make a network more compelling than your average network like Friendster. I must admit that I too was sold on the concept. On previous incarnations of WhatsPop, I finger-crafted and updated my RDF files in Windows Notepad. Sadly, it appears that the movement has gradually decelerated and my geeky friends that advocated the system for a couple years have ceased writing and hosting their FoaF files.

I think that the project may never reach the mainstream. First, while I think the amount of personalization is suitable, it is not easily extensible for the average person. Do you really want to edit your identity using strange syntax, rather than natural language? Probably not. I am going to call a spade a spade and admit that my mom would not know how to start articulating her identity in an RDF file. Yet, the chances are better that she will be able to manage her profile better with Friendster web forms. Second, where are the important tools for visualizing my network? Java applets? Get real. The project needs a social network structured something like MySpace, complete with hyperlinks, pictures, and a full-text search engine. I once heard from a reliable source that the average MySpacer spends 40 minutes a day browsing the site network, which implies that having a crawl-able network is important. Finally, some people just want their lives to be private, which does not really enhance the effect of the project. When I first started FoaFing, I did not want to give out any details other than my name and URL, which is redundant: if you're surfing my RDF file, odds are that you already know both my name and URL!

Yet I am crossing my fingers and hoping that the project picks back up again (cough, Google). Maybe one of these large social networks will dynamically create a FoaF file for you. If MySpace did that, there would be 32 million people with a FoaF! However, until there are better reasons to do so, I am not going to host my RDF file until something significant happens in the space. For now, it is not worth the hassle to maintain a document that isn't even going to be aggregated by a directional network. Until we meet again my FoaF file.

ColdFusion Technorati API Component

1/02/2006

I recently completed my CFML Technorati API implementation. Even though I have been using the Technorati API queries for autonomous tasks, I decided that it was worth the time to compile it all into a single component. You can read about it and download the source code from here. I will do my best to maintain it as Technorati continues to add more queries.

About I am currently a Senior Engineer at MySpace. Feel free to check out my personal collective.

Archives
April 2005
May 2005
June 2005
July 2005
August 2005
September 2005
October 2005
November 2005
December 2005
January 2006
March 2006
April 2006
May 2006
June 2006
July 2006
August 2006
September 2006
October 2006
November 2006
January 2007
February 2007
April 2007
November 2007
December 2007
January 2008
March 2008
April 2008
May 2008
June 2008

Subscribe to my feed