Review App

Internet Entrepreneurs Blog

Archive for the ‘Pseudocode’ Category

Online calendar: recurring events pseudo-code

Thursday, May 18th, 2006

I recently needed some help building recurring event functionality into a web calendar I was building for Goodvan and I just couldn’t find what I was looking for online. I came up with a few different ways to do it but in the end decided the following is the best method.

Say you already have a very simple calendar table (called Cal) that has an id, event title, and a date. The next step is to create an identical table for recurring events (let’s call it CalRepeat) with the same fields as Cal. Now we need to add a field to Cal called RepeatID to link the two tables (Cal.RepeatID -> CalRepeat.id).

Now we need a separate form for users to enter recurring events, and I’m assuming you already have a form for adding regular events to Cal. When a user adds a recurring event they will enter how often the event is repeated (once a month, every Sunday, etc.) and when the series ends (it will need to be finite for reasons that will become apparent shortly).

Upon submission your script will first need to add a new entry into CalRepeat. Then using the CalRepeat id, loop your script through all the dates that match the criteria (say the 15th of every month through Jan 2008) and add them as new entries into Cal with RepeatID equal to the CalRepeat id.

Now if you need to edit a single event (say just this month’s meeting was moved to the 16th) you only need to update a single record in Cal. Updating an entire series is almost as easy but you’ll need to have kept track of frequency and end date information to make this work. To update the entire series, first delete all the events with RepeatID equal to the Repeat event in question then add them again using the new criteria.

This is a very general description of making a recurring event calendar and you may still find it challenging to set up the necessary loops for adding multiple events or for storing information about frequency in your CalRepeat table. I hope this article can at least get you started on making your very own online calendar!

Zip code radius calculation algorithm

Sunday, May 7th, 2006

Here’s a neat little method I figured out for finding geographic points (waypoints) located within a certain radius of a “master point.” To make this a bit more concrete, let’s say you want to find all the zip codes within a 10 mile radius of your zip code, 27701 (Durham, NC). Most good zip code databases will have the latitude and longitude coordinates (waypoints) of each zip code, but this is alot of data (my database has over 79,000 zip code entries). It is far too much data to calculate a distance between the master point and each zip code so you need to find a way to quickly zoom in on the points that are most likely to be within your radius.

I’ve included the image above to help in this illustration. In this image X marks my master point, Durham, NC. The various green stars represent other zip codes around Durham. The first step is to determine the boundaries of our radius. A good rule of thumb to use is to assume 1 degree of longitude is about 70 miles wide while a degree of latitude is about 50 miles (this isn’t exact but we’ll improve this estimate when we calculate the actual radius).

In this case, our target zip codes will be between longitudes ‘a’ and ‘b’ while the latitudes will be between ‘c’ and ‘d.’ These can be calculated easily by adding and subtracting our radius ‘r’ from our center point X to get each limit. We can now query our database to return just those zipcodes where the latitudes are between ‘c’ and ‘d’ and longitudes are between ‘a’ and ‘b’ (example: SELECT * FROM zip WHERE latitude >= ‘c’ AND latitude <= ‘d’ AND longitude >= ‘a’ AND longitude <= ‘b’)This query returns 8 points but a few of these are outside our 10 mile radius we originally set. But now we’ve narrowed our consideration set down from 79,000 to just 10 points. We can calculate the distance between X and each of these points to see if they lie within the radius ‘r,’ leaving those points within the shaded circle as our result.The equation I use to calculate distances between two waypoints is:

$radius = 3959 * acos(sin($lat1) * sin($lat2) + cos($lat1) * cos($lat2) * cos($long2 - $long1)); where $radius is in miles

There are a number of ways to calculate these distances that can be as complicated as you like but again, this should be close enough for most applications. Realize, however, that if you try to make this more precise by using a more complicated equation, you’ll severely increase the processing time required to calculate distances. Also note that increasing the radius under consideration can exponentially increase processing times depending on the nature and size of your dataset.

Many companies offer scripts for sale that calculate zip code radiuses and distances but with a little work and using this framework, you can create your own application. Take a look at an example of my implementation at: singletracks.

How to build a simple recommendation engine

Thursday, July 28th, 2005

Building a simple recommendation engine for your website can be a powerful differentiator among your competition. Traditionally only technically advanced sites like Amazon were able to offer this service but with ever increasing processing power and simple open source software, now you too can add recommendations to your website.

A little over a year ago I wanted to add recommendations to my mountain biking website and I searched the net to find a simple open source solution (or even some pseudo-code to get me started). Unfortunately the only things I found were technical research papers on the subject and some fancy proprietary software that I could never afford. In my mind I knew it had to be relatively simple so I set about mapping the idea out in pseudo-code. What follows is pseudo-code you can use to build your own recommendation engine. I personally chose to use PHP and MySQL but this can certainly be done on any platform.

Let me once again reiterate that I am laying out the construction of a simple recommendation engine rather than the complete algorithm for recommendation bliss. I’m sure there are more technically savvy ways to accomplish this and I’m by no means a PHP master. However, I have found this method to work well on my own sites and I think it will work well for 90% of the web tinkerers reading this article.

The first thing you need to do is to keep track of the actions your users are taking. At a minimum you’ll need 3 data tables to store the data to make the engine run. First off, you need a user table so you can uniquely identify who is recommending what. For Amazon this is the customer ID (your email address) and for my site it’s your user login. You’ll also need to use cookies or sessions so you’ll know who the user is and you can track what they do while they’re logged in. Cookies are simple in PHP and a quick search of the PHP website for setcookie() will give you all the info you need.

Next up you’ll need a table to identify the items your users will be recommending (you probably already have this in some form or another). Your table might contain information on products, local night clubs, or mountain bike trails. Each needs to be uniquely identified with a primary key (usually an integer). From here, there’s only one more table to create.

The final table will be a junction table combining the user and information tables. This table will need to have 4 columns at a minimum to keep track of the following details:

  1. user id of recommender
  2. information table id (night club id, product id, etc.)
  3. the rating (usually on a scale of 1 to 5, with 5 meaning “best”) and
  4. a primary key (this is a given).

If your data is product related and you actually sell products on your site, you might be able to skip the rating column altogether and base recommendations simply on what users have purchased (where a user id & product id entry implies purchase).

Once you’ve collected this information in the junction table, it’s time to get down and dirty with the recommendation algorithm. First off, you need to collect all the user ids of users who have rated a particular item highly (my cutoff is 4 and above; if you are trying to recommend other things people will dislike, you might choose 2 and below). Or if your users are very precise or you have lots of data, you might only consider the maximum rating (5). Suppose we have a user who rates item A a 3. This user does not meet the threshold for a “favorable” vote and is therefore not included in providing a recommendation for this item (we assume this user has different tastes since he does not agree with our other users that this is a good item). Collect the proper user ids in an array for use in the next step.

With the user array in hand, we want to find out what other items each user has also rated highly (or lowly, depending). So for example, if we are interested in providing a recommendation for item A and user 1 gave item A a 5, item B a 2, and item C a 4, we would grab the item C product id and rating as a possible recommendation and place it in an array.

rec[’B'][’score’] = 2

rec[’B'][’votes’] = 1

rec[’C'][’score’] = 4

rec[’C'][’votes’] = 1

Now suppose that user 2 also gave item A a 5 but gave B a 3, C a 5, and D a 4. Now our array looks like this:

rec[’B'][’score’] = 2 + 3 = 5

rec[’B'][’votes’] = 1 + 2 = 2

rec[’C'][’score’] = 4 + 5 = 9

rec[’C'][’votes’] = 1 + 1 = 2

rec[’D'][’score’] = 4

rec[’D'][’votes’] = 1

After a little math, we compute the average score for each recommendation and order our array from best match to worst:

rec[’C'][’avg’] = 4.5

rec[’D'][’avg’] = 4

rec[’B'][’avg’] = 2.5

Now we can present our recommendations to the user. Since our cutoff for likeability was 4 and over, we only present those items that have an average score of 4+: items C and D. Item C is the best match, item D is the next best, and item B is not a match at all. You may want to weight those items with more “votes” more highly than single vote recommendations but this really isn’t as important once your recommendation table reaches a decent size. You can also limit the results you return to the top 5 or whatever number you deem sufficient if many of your users rate a large number of items.

Another way to describe these recommendations is to say “Users who enjoyed A also enjoyed C and D.” This actually better explains the relationship between the items, especially given the simple algorithm used to join the items.

There are several variations you can make to this scheme to improve the results you return. For my own use, I have limited the number of users I poll to 10 users who have rated a particular item (selected at random). This is to keep the script from slowing my pages since I have some items that have been rated by hundreds of unique users. I also limit the recommendations to items in similar “categories.” Specifically, I only return bike trail recommendations for trails in the same state as the original trail. Although many of my users have rated trails in multiple states, it is unlikely that information seekers will be interested in a trail in California that is similar to the one in Georgia that they’ve just ridden.

A recommendation engine is fairly easy to construct using a few simple tables and arrays and can be a distinctive and compelling offering for your users. Once you’ve got the basics down it’s easy to improve this idea to give you truly meaningful results.