08.07.06
Your Netflix Ratings
I really love my Netflix account. I’ve been using it for years, and at this point I’ve rated almost 3000 movies. Granted, just over 900 of those are movies I never, ever want to see, and most of those I wish had never been made, but hey, that’s enough about me. I enjoy toying with data like this, amalgamating it and proffering it for view by friends and strangers alike.
The other day I decided to extract my ratings data from Netflix, and a cursory Google search turned up some previous work by John Ressig and Devanshu Mehta. Naturally, it was not going to be as straightforward as downloading someone else’s hard work and running a script. Since these packages were written, Netflix has changed the login URL and the ratings page URL, as well as the HTML format of the ratings page. So I figured it was time for me to try my hand at updating the requisite code.
I also added a few enhancements along the way. The URLs and regexp are abstracted into a config file, and the retrieved data can optionally be captured in a database (PostgreSQL). I also cleaned up the codebase and applied several of Damian Conway’s best practices.
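Moving the URLs and regexp into a config file means each one becomes a key=value entry, like the netflix.url.login and netflix.regexp.rating values quoted in the comments. Below is a minimal sketch of reading such a file in core Perl. It assumes one key=value pair per line and '#' comment lines. The real netflix.cfg syntax and loader may differ, and parse_config is a hypothetical helper, not part of the package.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "key=value" lines into a hash reference.
# Assumes one pair per line; blank lines and lines starting with '#' are skipped.
sub parse_config {
    my ($text) = @_;
    my %config;
    for my $line ( split /\r?\n/, $text ) {
        next if $line =~ /^\s*#/ || $line !~ /\S/;
        # Split on the FIRST '=' only: values such as the login URL
        # (Login?hnjr=3) and the ratings regexp contain '=' themselves.
        my ( $key, $value ) = split /=/, $line, 2;
        next unless defined $value;
        $key =~ s/^\s+|\s+$//g;    # trim whitespace around the key
        $config{$key} = $value;
    }
    return \%config;
}

# Usage with the login URL quoted in the comments.
my $config = parse_config(<<'END');
# netflix.cfg (illustrative excerpt)
netflix.url.login=http://www.netflix.com/Login?hnjr=3
END
print $config->{'netflix.url.login'}, "\n";    # prints http://www.netflix.com/Login?hnjr=3
```

Limiting the split to two fields is the important design choice here. Both the login URL and the rating regexp contain their own = signs, and splitting on every one would truncate them.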
I highly recommend downloading Devanshu’s package, as it includes several Python scripts that garner additional metadata from Netflix. Devanshu has also gone to the trouble of documenting many of the basic steps required to set up your environment; check out his blog article for the instructions. The file my code generates is in a format compatible with his. Download it and have some fun.
Cyril Bouteille said,
January 2, 2007 at 3:42 pm
I installed all the dependent libraries on a Solaris 10 system, but I get the following error when running the retrieve_movies script:
No such field 'email' at /usr/local/lib/perl5/site_perl/5.8.7/WWW/Mechanize.pm line 1233
I checked the $config variable and the values from the netflix.cfg file seem to get passed properly…
Any idea?
Robert Keogh said,
January 3, 2007 at 12:18 am
The login URL (netflix.url.login in netflix.cfg) has changed to http://www.netflix.com/Login
Netflix has changed the layout of their ratings page so the regexp also needs to be updated. I won’t have time to update the regexp until I get back from CES next week.
moo said,
February 14, 2007 at 3:23 am
I tweaked the config and regex to get this working again. These are the changed values:
netflix.url.login=http://www.netflix.com/Login?hnjr=3
netflix.regexp.rating=movieid=(\d+)&trkid=\d+"[^>]*>([^(\w+).*?genre">([a-zA-Z\&\s\-]+).*?stars_(\d)_(\d)
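The rating regexp captures what appear to be the movie id, the title, the genre and two star digits from each row of the ratings page. Below is a minimal sketch of applying such a pattern in Perl with a global match loop. It uses a simplified stand-in pattern and made-up HTML, not Netflix's real markup. It also assumes that a class like stars_2_5 means 2.5 stars, which is a guess rather than something the thread confirms.

```perl
use strict;
use warnings;

# Made-up markup standing in for two rows of the ratings page (illustrative only).
my $html = <<'HTML';
<a href="/Movie?movieid=60000001&trkid=42">Example One</a> <td class="genre">Drama</td> <span class="stars_4_0"></span>
<a href="/Movie?movieid=60000002&trkid=43">Example Two</a> <td class="genre">Sci-Fi & Fantasy</td> <span class="stars_2_5"></span>
HTML

# Simplified stand-in for netflix.regexp.rating. Captures: movie id, title,
# genre, and the two digits of the star class.
my $rating_re = qr{movieid=(\d+)&trkid=\d+"[^>]*>([^<]+)</a>.*?genre">([a-zA-Z&\s\-]+)<.*?stars_(\d)_(\d)};

my @ratings;
# Each /g match resumes where the previous one ended, one record per row.
while ( $html =~ /$rating_re/g ) {
    push @ratings, {
        movie_id => $1,
        title    => $2,
        genre    => $3,
        rating   => "$4.$5",    # assumption: stars_2_5 means 2.5 stars
    };
}

printf "%s\t%s\t%s\t%s\n", @{$_}{qw(movie_id title genre rating)} for @ratings;
```

Because `.*?` does not cross newlines without the `/s` modifier, each record must sit on one line in this sketch. A real page that splits a row across lines would need `/s` or a different pattern.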
Also, the database connection code was not using the hostname and port specified in the config file. My DB server is not local, so I had to fix that as well. Here is the modified function from Netflix.pm:
sub _load_schema {
    my $self = shift;
    return unless $self->get_config()->is_db_configured();
    my $driver = $self->get_config()->get_db_driver()
        or croak "no database driver found";
    my $host = $self->get_config()->get_db_host()
        or croak "no database name found";
    my $port = $self->get_config()->get_db_port()
        or croak "no database name found";
    my $name = $self->get_config()->get_db_name()
        or croak "no database name found";
    my $user = $self->get_config()->get_db_user()
        or croak "no database user found";
    my $password = $self->get_config()->get_db_password()
        or croak "no database password found";
    my $dsn = 'dbi:' . $driver . ':' . $name . ':' . $host . ':' . $port;
    return DB::Main->connect( $dsn, $user, $password, {AutoCommit => 0} );
}
moo said,
February 14, 2007 at 3:25 am
Whoops. Forgot to change a few strings in _load_schema. This is better:
sub _load_schema {
    my $self = shift;
    return unless $self->get_config()->is_db_configured();
    my $driver = $self->get_config()->get_db_driver()
        or croak "no database driver found";
    my $host = $self->get_config()->get_db_host()
        or croak "no database host found";
    my $port = $self->get_config()->get_db_port()
        or croak "no database port found";
    my $name = $self->get_config()->get_db_name()
        or croak "no database name found";
    my $user = $self->get_config()->get_db_user()
        or croak "no database user found";
    my $password = $self->get_config()->get_db_password()
        or croak "no database password found";
    my $dsn = 'dbi:' . $driver . ':' . $name . ':' . $host . ':' . $port;
    return DB::Main->connect( $dsn, $user, $password, {AutoCommit => 0} );
}
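moo's DSN joins the database name, host and port with colons. If the backend is PostgreSQL, as in the original post, note that DBD::Pg documents a key=value form instead: dbi:Pg:dbname=...;host=...;port=... Below is a minimal sketch of building that string. build_pg_dsn is a hypothetical helper and the values are placeholders. The result would take the place of $dsn in the call to DB::Main->connect.

```perl
use strict;
use warnings;

# Hypothetical helper: build a DBD::Pg data source name in the documented
# key=value form. Host and port are optional; if omitted, the driver's
# defaults apply.
sub build_pg_dsn {
    my (%args) = @_;
    die "no database name given\n" unless defined $args{name};
    my @parts = ("dbname=$args{name}");
    push @parts, "host=$args{host}" if defined $args{host};
    push @parts, "port=$args{port}" if defined $args{port};
    return 'dbi:Pg:' . join( ';', @parts );
}

# Placeholder values; in _load_schema these would come from the config getters.
my $dsn = build_pg_dsn( name => 'netflix', host => 'db.example.com', port => 5432 );
print "$dsn\n";    # prints dbi:Pg:dbname=netflix;host=db.example.com;port=5432
```

Making host and port optional keeps the same helper usable for a local server. That case is what the original code effectively assumed before moo's change.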