Implementing WWW::LastFM with XML::Rabbit - Part 3

In the previous article we looked at how XML::Rabbit does its magic to give us a very compact syntax for creating Moose attributes that mirror simple XML document values. In this article we'll look at some of the other sugar functions available in XML::Rabbit.

Implementing the <metros/> XML chunk extractor

So let's move on to the implementation of WWW::LastFM::Response::MetroList. Now we're starting to get to the juicy parts of the application. Use this code for lib/WWW/LastFM/Response/MetroList.pm:

package WWW::LastFM::Response::MetroList;
use XML::Rabbit;

use List::MoreUtils qw(uniq);

has_xpath_object_list '_locations' => './metro' => 'WWW::LastFM::Response::Metro',
    handles => {
        'locations'        => 'elements',
        'filter_locations' => 'grep',
    },
;

has_xpath_value_list '_countries_with_duplicates' => './metro/country';

has '_unique_countries' => (
    is         => 'ro',
    isa        => 'ArrayRef[Str]',
    traits     => ['Array'],
    handles    => {
        'countries'        => 'elements',
        'filter_countries' => 'grep',
    },
    lazy_build => 1,
);

sub _build__unique_countries {
    my ($self) = @_;
    return [
        uniq(
             @{ $self->_countries_with_duplicates }
        )
    ];
}

finalize_class();

Here you have another declaration method called has_xpath_object_list. It is similar to has_xpath_object, which we've already covered, but it returns an array of objects instead of just a single object. We also add some Moose native delegations for arrays. This makes the API more user friendly, as we don't need to dereference the array references all the time.

The next declaration is almost the same, but it works with strings instead of objects. As the XML document contains lots of duplicates, we create a separate attribute that takes care of getting rid of the duplicates and presents the API we're interested in.
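To make the delegation and the duplicate removal easier to see in isolation, here is a minimal sketch that leaves XML out of the picture. The class My::CountryList and its raw_countries attribute are made up for illustration, and I use a plain hash instead of List::MoreUtils to keep the dependencies down to Moose.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

{
    # Hypothetical stand-in for MetroList, fed from a plain array
    package My::CountryList;
    use Moose;

    # Raw values, duplicates included (plays the role of
    # _countries_with_duplicates in the real class)
    has 'raw_countries' => (
        is       => 'ro',
        isa      => 'ArrayRef[Str]',
        required => 1,
    );

    # Deduplicated list with list-style accessors delegated to it
    has '_unique_countries' => (
        is         => 'ro',
        isa        => 'ArrayRef[Str]',
        traits     => ['Array'],
        handles    => {
            'countries'        => 'elements',
            'filter_countries' => 'grep',
        },
        lazy_build => 1,
    );

    sub _build__unique_countries {
        my ($self) = @_;
        my %seen;
        # Keep the first occurrence of each value, preserving order
        return [ grep { !$seen{$_}++ } @{ $self->raw_countries } ];
    }

    __PACKAGE__->meta->make_immutable();
}

my $list = My::CountryList->new(
    raw_countries => [ 'Norway', 'Sweden', 'Norway', 'Denmark', 'Sweden' ],
);

# Both calls return plain lists, no dereferencing needed
my @countries = $list->countries;
my @with_w    = $list->filter_countries( sub { /w/ } );

print join( ', ', @countries ), "\n";    # Norway, Sweden, Denmark
print join( ', ', @with_w ),    "\n";    # Norway, Sweden
```

The caller never sees the array reference; it only sees the countries and filter_countries methods, which is the same shape of API the MetroList class exposes.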

Now we complete this with the implementation of WWW::LastFM::Response::Metro. Create the file lib/WWW/LastFM/Response/Metro.pm with the following content:

package WWW::LastFM::Response::Metro;
use XML::Rabbit;

has_xpath_value 'name'    => './name';
has_xpath_value 'country' => './country';

has 'country_and_name' => (
    is         => 'ro',
    isa        => 'Str',
    lazy_build => 1,
);

sub _build_country_and_name {
    my ($self) = @_;
    return $self->country . ": " . $self->name;
}

has 'name_and_country' => (
    is         => 'ro',
    isa        => 'Str',
    lazy_build => 1,
);

sub _build_name_and_country {
    my ($self) = @_;
    return $self->name . ", " . $self->country;
}

finalize_class();

This is a pretty straightforward class. Two string values are extracted from the <metro/> XML element. Notice the use of relative XPath queries, which avoids having to construct elaborate queries based on the root of the XML document. As a convenience, I added two calculated attributes.
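If relative XPath queries are new to you, the following sketch shows the idea using XML::LibXML directly. This is hand-written code for illustration, not what XML::Rabbit generates, and the two-city document is a shortened sample I made up in the same shape as the real response.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;

# Shortened sample document in the same shape as the API response
my $xml = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<lfm status="ok">
<metros>
    <metro>
        <name>Bergen</name>
        <country>Norway</country>
    </metro>
    <metro>
        <name>Sydney</name>
        <country>Australia</country>
    </metro>
</metros></lfm>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

my @labels;
# Select each <metro/> once with an absolute query...
foreach my $metro ( $doc->findnodes('/lfm/metros/metro') ) {
    # ...then query relative to that node with './'
    my $name    = $metro->findvalue('./name');
    my $country = $metro->findvalue('./country');
    push @labels, "$name, $country";
}

print "$_\n" for @labels;
```

Each ./name query only looks inside the current <metro/> node, so the Metro class never has to know where in the document it lives.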

We can still test how it works with a one-liner, but it is starting to get a bit long.

$ perl -Ilib -MWWW::LastFM -E 'STDOUT->binmode(":utf8"); say join("\n", sort map { $_->name_and_country } WWW::LastFM->new->geo->get_metros->metros->filter_locations(sub { $_->country_and_name =~ /Norway/ }) )'
Bergen, Norway
Oslo, Norway

lastfm_locations.pl, a simple WWW::LastFM client

Let's create a script we can use to query locations available in the Last.FM service. Create bin/lastfm_locations.pl with this content:

#!/usr/bin/env perl

use strict;
use warnings;
use rlib;
use feature qw(say);

use WWW::LastFM;
use Encode qw(decode encode);

# Default to an empty filter (matches everything) when no argument is given
my $filter = shift // "";
my $filter_utf8 = Encode::decode('UTF-8', $filter);

my @locations = sort
        map { $_->name_and_country }
        WWW::LastFM->new->geo->get_metros->metros->filter_locations(
            sub { $_->country_and_name =~ /\Q$filter_utf8\E/i }
        )
;

if ( @locations > 0 ) {
    say Encode::encode('UTF-8', join("\n", @locations) );
}
else {
    say "No locations found matching '$filter'";
}

This is very similar to the one-liner, but with a bit more error checking and Unicode handling to ensure you can specify command line parameters in UTF-8. I should have mentioned this earlier: I have assumed that you are using terminal software that uses UTF-8 encoding. The only common operating system I know of that doesn't do that by default nowadays is Windows. I'll leave dealing with that challenge for another article.
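Why bother decoding the command line argument at all? Here is a small sketch, using only the core Encode module, of what goes wrong when you match on raw bytes. The byte strings are hard-coded stand-ins for what a UTF-8 terminal would hand the script; the variable names are mine.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they would arrive from a UTF-8 terminal
my $name_bytes   = "\xC3\x98stfold, Norway";    # "Østfold, Norway"
my $filter_bytes = "\xC3\xB8st";                # "øst", lowercase

# Decode the bytes into character strings
my $name   = decode( 'UTF-8', $name_bytes );
my $filter = decode( 'UTF-8', $filter_bytes );

# On raw bytes, /i cannot know that Ø and ø are the same letter
my $bytes_match = $name_bytes =~ /\Q$filter_bytes\E/i ? 1 : 0;

# On decoded characters, the case-insensitive match works
my $chars_match = $name =~ /\Q$filter\E/i ? 1 : 0;

printf "length in bytes: %d, in characters: %d\n",
    length($name_bytes), length($name);
printf "match on bytes: %d, match on characters: %d\n",
    $bytes_match, $chars_match;

# Encode back to bytes before printing
print encode( 'UTF-8', $name ), "\n";
```

Decoding on the way in and encoding on the way out is the same pattern lastfm_locations.pl follows.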

You should now have a complete application that allows you to ask a remote HTTP-based API for some information and display it in the way you want to. Adding more API calls is just a matter of adding additional method calls in WWW::LastFM::API::Geo and creating a new response class to handle the XML output. In the next article we will add the API call we're really interested in, geo.getEvents. Stay tuned!

Implementing WWW::LastFM with XML::Rabbit - Part 2

In the previous article we created a simple framework for making HTTP requests to the Last.FM API. In this article we'll go into detail on how XML::Rabbit can help us to extract the information we want from the XML output we saw in the previous article. I've added that XML output here, for your convenience.

<?xml version="1.0" encoding="utf-8"?>
<lfm status="ok">
<metros>
    <metro>
        <name>Sydney</name>
                <country>Australia</country>
    </metro>
...snip...
    <metro>
        <name>Wichita</name>
                <country>United States</country>
    </metro>
</metros></lfm> 

Implementing the geo.getMetros API call

Add the following code at the top of lib/WWW/LastFM.pm to load the WWW::LastFM::API::Geo module (which we'll flesh out in a moment).

use WWW::LastFM::API::Geo;

Next we add another attribute so we can get easy access to the Geo class. Add at the bottom:

# API modules
has 'geo' => (
    is         => 'ro',
    isa        => 'WWW::LastFM::API::Geo',
    lazy_build => 1,
);

sub _build_geo {
    my ($self) = @_;
    return WWW::LastFM::API::Geo->new( lastfm => $self );
}

The next part is to create the basic skeleton that allows us to make API calls with convenience. Add the following code to lib/WWW/LastFM/API/Geo.pm:

package WWW::LastFM::API::Geo;
use Moose;
use namespace::autoclean;

use URI::Escape;

has 'lastfm' => (
    is       => 'ro',
    isa      => 'WWW::LastFM',
    required => 1,
);

# http://www.last.fm/api/show?service=435
# $country must be a binary encoded utf8 string

sub get_metros {
    my ($self, $country) = @_;
    return $self->lastfm->get(
           $self->lastfm->api_root_url
        . '?method=geo.getMetros'
        . ( $country ? '&country=' . uri_escape($country) : "" )
        . '&api_key=' . $self->lastfm->api_key
    );
}

__PACKAGE__->meta->make_immutable();

1;
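The comment about $country needing to be a binary encoded UTF-8 string deserves a quick illustration. The sketch below pulls the query-building part of get_metros out into a hypothetical helper called metros_query, so it can run without an API key or network access. The country names are just sample input.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode);
use URI::Escape qw(uri_escape);

# Hypothetical helper: the query-building part of get_metros,
# minus the root URL and the API key
sub metros_query {
    my ($country) = @_;
    return '?method=geo.getMetros'
        . ( $country ? '&country=' . uri_escape($country) : "" );
}

# Plain ASCII needs no special care
my $ascii_query = metros_query('United States');

# A character string with a non-ASCII letter ("Réunion") must be
# encoded to UTF-8 bytes before it is escaped
my $country_chars = "R\x{E9}union";
my $utf8_query    = metros_query( encode( 'UTF-8', $country_chars ) );

# No country at all just leaves the parameter out
my $empty_query = metros_query();

print "$ascii_query\n";    # ?method=geo.getMetros&country=United%20States
print "$utf8_query\n";     # ?method=geo.getMetros&country=R%C3%A9union
print "$empty_query\n";    # ?method=geo.getMetros
```

If you pass the unencoded character string instead, uri_escape produces %E9 for the é, which is Latin-1 rather than UTF-8.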

We perform another one-liner to verify that we're on the right track.

$ perl -Ilib -MWWW::LastFM -E 'say WWW::LastFM->new->geo->get_metros'

What is interesting to see is that we've forwarded the instance of WWW::LastFM into the API class to avoid using global variables to hold our shared state. But let's see if we can't do something with that ugly XML.

Dealing with the root element in the response XML

Let's have a closer look at the XML returned.

<?xml version="1.0" encoding="utf-8"?>
<lfm status="ok">
<metros>
    <metro>
        <name>Sydney</name>
                <country>Australia</country>
    </metro>
... <snip> ...
    <metro>
        <name>Wichita</name>
                <country>United States</country>
    </metro>
</metros></lfm>

What we can see is that Last.FM always wraps its response in a <lfm/> root element, as described in API REST requests. It holds one attribute named status, and its value is either ok or failed, so we're effectively dealing with a boolean here. If the request failed, the response will contain an error code, like this:

<?xml version="1.0" encoding="utf-8"?>
<lfm status="failed">
    <error code="10">Invalid API Key</error>
</lfm>

So let's see if we can encapsulate this behaviour in a class with a nice API. Let's first change our get_metros method to return this response class instead of some boring XML. Edit lib/WWW/LastFM/API/Geo.pm and add this after the other imports:

use WWW::LastFM::Response;

And change the get_metros method into this:

sub get_metros {
    my ($self, $country) = @_;
    my $xml = $self->lastfm->get(
           $self->lastfm->api_root_url
        . '?method=geo.getMetros'
        . ( $country ? '&country=' . uri_escape($country) : "" )
        . '&api_key=' . $self->lastfm->api_key
    );
    return WWW::LastFM::Response->new( xml => $xml );
}

So what should this WWW::LastFM::Response class look like? What we know is that it should require some XML to work on, and if it doesn't get that it should blow up with an error message (we want it to throw an exception of some kind). We also want to know if the response failed or not, and if it failed, we should throw an exception as well. If everything is okay, we should be able to get to those metros. Create lib/WWW/LastFM/Response.pm with this content:

package WWW::LastFM::Response;
use XML::Rabbit::Root 0.1.0;

sub BUILD {
    my ($self) = @_;
    return if $self->is_success;
    confess("Last.FM response error " . $self->error_code . ": " . $self->error);
}

has 'is_success' => (
    is         => 'ro',
    isa        => 'Bool',
    lazy_build => 1,
);

sub _build_is_success {
    my ($self) = @_;
    return unless $self->status eq 'ok';
    return 1;
}

has_xpath_value 'status'     => '/lfm/@status';
has_xpath_value 'error'      => '/lfm/error';
has_xpath_value 'error_code' => '/lfm/error/@code';

has_xpath_object 'metros' => '/lfm/metros' => 'WWW::LastFM::Response::MetroList';

finalize_class();

You can test it immediately to see if it works as expected:

$ perl -Ilib -MWWW::LastFM -E 'say WWW::LastFM->new->geo->get_metros->status'

You should get a string that says ok (unless you get an error because the Last.FM API server refuses to answer). If you want to force an error condition, try it out with an invalid API key.

$ perl -Ilib -MWWW::LastFM -E 'say WWW::LastFM->new( api_key => "1234" )->geo->get_metros->status'
Last.FM response error 10: Invalid API key - You must be granted a valid key by last.fm at ...

I snipped the rest of the stack trace, as it is of no interest at this point. We know where we made a mistake.

How XML::Rabbit::Root simplifies dealing with the XML document data

Let's get into detail on how that class definition works.

Using XML::Rabbit::Root does a whole lot of things behind the scenes. It is the equivalent of doing this:

use Moose;
with "XML::Rabbit::RootNode";
use namespace::autoclean;
use XML::Rabbit::Sugar;

The finalize_class() at the bottom is the equivalent of __PACKAGE__->meta->make_immutable(); 1;, which ensures the class executes as fast as possible during runtime and that the file loaded returns a true value, as required by the perl parser. So with that out of the way, let's take a look at the meat of the class and what that imported sugar, has_xpath_value and has_xpath_object, represents.

The first thing to notice is that there is no xml attribute declared in the class. This parameter comes from XML::Rabbit::Role::Document. It is automatically available, and you must specify either file, fh, xml or dom, based on what kind of format your XML document is in. As we have it in a string, we use the xml parameter.

The BUILD method is a special method Moose will automatically call after it has constructed an instance. In our case, we use it to verify the state of our instance. If the attribute is_success is true we just return, as everything is okay. If it is not, we throw an exception with the error code and text from the XML. This is how we can easily make an unwanted value in an attribute cause the entire construction of the object to fail.

The is_success attribute is pretty straightforward. We take a string value that represents the status and check whether it is negative (that is, NOT the value ok). If that is the case, we return false, otherwise things must be okay and we can return a true value. Notice that I use a guard clause to fail early. This is a best practice to avoid the dreaded arrow anti-pattern.

So what does the has_xpath_value 'status' => '/lfm/@status'; declaration actually mean? Let's have a look in XML::Rabbit::Sugar and see if we can't figure it out. It says:

has_xpath_value($attr_name, $xpath_query, @moose_params)

Extracts a single string according to the xpath query specified. The attribute isa parameter is automatically set to Str. The attribute native trait is automatically set to String.

Okay, so this code creates a Moose attribute with an isa parameter of Str that will actually represent the value of the status attribute on the lfm root element in the XML document. Take a look at an XPath tutorial if you're unfamiliar with the syntax. If you need some additional Moose attribute parameters, you can specify them at the end. In our case there is no need for any. The error and error_code attributes work the same way.
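To get a feel for what those three declarations save you from writing, here is roughly the same extraction done by hand with XML::LibXML. Treat it as a sketch of the values the attributes give you, not as a description of how XML::Rabbit works internally. The document is the failed response shown earlier.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;

# The failed response from earlier in the article
my $xml = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<lfm status="failed">
    <error code="10">Invalid API Key</error>
</lfm>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

# The same three XPath queries used in WWW::LastFM::Response
my $status     = $doc->findvalue('/lfm/@status');
my $error      = $doc->findvalue('/lfm/error');
my $error_code = $doc->findvalue('/lfm/error/@code');

print "status: $status\n";
print "error $error_code: $error\n";
```

With has_xpath_value you get the same strings, but as lazy, read-only Moose attributes instead of loose variables.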

So what is this has_xpath_object thing? It represents exactly the same thing as has_xpath_value, but instead of being a string value, it is an object of the specified class. The metros attribute will represent the list of metros returned in the XML document. I was thinking about naming the attribute metro_list, to avoid nouns in plural, but decided to keep it in plural because it more closely matches the XML document layout. To avoid putting too much responsibility inside WWW::LastFM::Response, I've delegated dealing with this list of metros to another class. This follows object orientation best practices, which state that each class should have a clear and defined purpose.

In the next article we'll dive into the details on how this MetroList class is implemented.

Implementing WWW::LastFM, a client library to the Last.FM API, with XML::Rabbit

In this series of articles I'm going to implement a client to the Last.FM web services API which allows us to find concerts and other events in your local area. We'll use a CPAN module I've created called XML::Rabbit to deal with all the mundane details of XML document handling.

Most of you probably don't like XML a lot. JSON is the new kid on the block, and dealing with all the crummy details of XML encoding, parsing and such is boring. Writing long incantations of XML::LibXML code to extract the required data is just something you would prefer not to do (unless you get paid for it, maybe not even then). I'm going to show you a way to deal with XML that requires a lot less boilerplate code than you're most likely used to. Hopefully it will make dealing with XML-based APIs a lot more fun for you. In the process of showing you how to deal with XML with ease I'll also implement a simple, but extensible, framework for communicating with the Last.FM API. You can use the same framework design to build client libraries against other HTTP-based APIs. My hope is that the code I show you will inspire you to work with me on this particular Last.FM API or create clients for other interesting APIs.

I have done my best to follow good Perl programming practices, as advocated by chromatic's Modern Perl book, Damian Conway's Perl Best Practices book, and the Moose Manual. I've also separated as much responsibility as possible into separate classes, attributes and methods. This should make it easier to create good tests that require a minimum amount of mocking to test both success and failure scenarios.

Getting a Last.FM API key and secret

If you want to follow along, you'll need to get an API key from http://www.last.fm/api/account. Just fill out the required fields with something sensible and get your API key and secret. When you have them, put them in a file in your home directory called .lastfm.ini.

The contents should be something like this:

[API]
key = 012345678948bb4c75ff9608aac4fe83
secret = abcdef57349f6ad7e9959a63aa472

That should ensure that the configurable information is tucked away in a personal file outside any code repository.

On Windows the correct directory will probably be C:\Users\<username>\AppData\Local. Run the following command to figure out the correct location:

$ perl -MFile::HomeDir -E "say File::HomeDir->my_data";

If you don't feel like registering, you can actually use the API key used in the examples on the Last.FM API page, currently b25b959554ed76058ac220b7b2e0a026. I'm not sure how long it will work, but as we're not going to touch any of the authenticated API calls you won't need the API secret quite yet.

Setting up your environment

If you want to follow along without typing in all the code, look at the WWW-LastFM github repository. The project uses Dist::Zilla, so installing all the dependencies should be as easy as running the following commands:

$ cpan Dist::Zilla
$ git clone git://github.com/robinsmidsrod/WWW-LastFM.git
$ cd WWW-LastFM
$ dzil authordeps | xargs cpan # or dzil authordeps | cpanm
$ dzil listdeps | xargs cpan   # or dzil listdeps | cpanm

The entry point class

Okay, now we're ready to dive in!

Let's create some basic code to read that config file and get access to our API key and secret. Create the file lib/WWW/LastFM.pm with this content:

package WWW::LastFM;
use Moose;
use namespace::autoclean;

use File::HomeDir;
use Path::Class::Dir;
use Config::Any;

# Sometimes we like some extra debugging output
has 'debug' => (
    is      => 'ro',
    isa     => 'Bool',
    default => 1,
);

# Standard stuff to read our config file
has 'config_file' => (
    is         => 'ro',
    isa        => 'Path::Class::File',
    lazy_build => 1,
);

sub _build_config_file {
    my ($self) = @_;
    my $home = File::HomeDir->my_data;
    my $conf_file = Path::Class::Dir->new($home)->file('.lastfm.ini');
    return $conf_file;
}

# This is where our config file data ends up
has 'config' => (
    is         => 'ro',
    isa        => 'HashRef',
    lazy_build => 1,
);

sub _build_config {
    my ($self) = @_;
    my $cfg = Config::Any->load_files({
        use_ext => 1,
        files   => [ $self->config_file ],
    });
    foreach my $config_entry ( @{ $cfg } ) {
        my ($filename, $config) = %{ $config_entry };
        warn("Loaded config from file: $filename\n") if $self->debug;
        return $config;
    }
    return {};
}

# And here we have our api key and secret
has 'api_key' => (
    is         => 'ro',
    isa        => 'Str',
    lazy_build => 1,
);

sub _build_api_key { return (shift)->config->{'API'}->{'key'}; }

has 'api_secret' => (
    is         => 'ro',
    isa        => 'Str',
    lazy_build => 1,
);

sub _build_api_secret { return (shift)->config->{'API'}->{'secret'}; }

__PACKAGE__->meta->make_immutable();

1;

At this point you should be able to read your API key from your config file with this small one-liner.

$ perl -Ilib -MWWW::LastFM -E 'say WWW::LastFM->new->api_key'

This code is not really anything special. It's just a bunch of best-of-breed modules used to read an INI file from a platform-dependent directory and put its contents into a hash reference. Notice the use of lazy Moose attributes to transform your data from a filename, .lastfm.ini, to a specific value, api_key, one step at a time. Also notice that we do not need to use strict and warnings, as Moose takes care of that for us.
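If the chain of lazy attributes feels a bit magical, this stripped-down sketch might help. The class My::Lazy is hypothetical, the config is a hard-coded hash instead of a file, and the %calls counter exists only so we can observe when each builder runs.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

{
    # Hypothetical class with the same attribute chain as WWW::LastFM,
    # but with a hard-coded config instead of an INI file
    package My::Lazy;
    use Moose;

    # Counts builder calls so the laziness can be observed
    our %calls = ( config => 0, api_key => 0 );

    has 'config' => (
        is         => 'ro',
        isa        => 'HashRef',
        lazy_build => 1,
    );

    sub _build_config {
        $calls{config}++;
        return { API => { key => 'not-a-real-key' } };
    }

    has 'api_key' => (
        is         => 'ro',
        isa        => 'Str',
        lazy_build => 1,
    );

    sub _build_api_key {
        my ($self) = @_;
        $calls{api_key}++;
        # Asking for config here triggers its builder on demand
        return $self->config->{'API'}->{'key'};
    }

    __PACKAGE__->meta->make_immutable();
}

my $obj = My::Lazy->new;

# Nothing has been built yet; lazy_build also gives us has_config
my $built_before = $obj->has_config ? 1 : 0;

# The first access runs both builders, the second one reuses the values
my $key       = $obj->api_key;
my $key_again = $obj->api_key;

print "config built before first access: $built_before\n";
print "api_key: $key\n";
print "builder calls: config=$My::Lazy::calls{config}, "
    . "api_key=$My::Lazy::calls{api_key}\n";
```

Each builder runs at most once, and only when something actually asks for the value. That is what lets WWW::LastFM->new return instantly and postpone reading the config file until api_key is needed.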

The next thing we need is a way to make requests, so let's use the tried and true LWP::UserAgent. Add this at the top of the file after the other imports:

use LWP::UserAgent;

our $VERSION = "0.0.1";

Then add the api_root_url attribute and the ua attribute, which contains our HTTP client.

# All access to the Last.FM API starts with this URL
has 'api_root_url' => (
    is      => 'ro',
    isa     => 'Str',
    default => 'http://ws.audioscrobbler.com/2.0/',
);

# And finally our HTTP client that we will use to make requests
has 'ua' => (
    is         => 'ro',
    isa        => 'LWP::UserAgent',
    lazy_build => 1,
);

sub _build_ua {
    my ($self) = @_;
    return LWP::UserAgent->new( agent => 'WWW::LastFM/' . $VERSION );
}

# A utility method for making requests, returns raw XML
# or throws exception if no content was generated
sub get {
    my ($self, $url) = @_;
    confess("No URL specified") unless $url;
    my $response = $self->ua->get($url);
    my $content = $response->content;
    confess("HTTP error: " . $response->status_line) unless defined $content;
    confess("HTTP error: " . $response->status_line) if $response->code >= 500;
    return $content;
}

We've also added a basic get method that fetches the data on the specified URL and returns the raw content (bytes). This should make it trivial to make API calls and get back XML we can work with.

The geo.getMetros API call

To test out that our HTTP client works, we can make another one-liner that fetches the locations (metros) the Last.FM service knows about.

$ perl -Ilib -MWWW::LastFM -E 'my $lfm = WWW::LastFM->new; say $lfm->get($lfm->api_root_url . "?method=geo.getMetros&api_key=" . $lfm->api_key)'

You should get some XML spewed out on your screen that looks something like this:

<?xml version="1.0" encoding="utf-8"?>
<lfm status="ok">
<metros>
    <metro>
        <name>Sydney</name>
                <country>Australia</country>
    </metro>
...snip...
    <metro>
        <name>Wichita</name>
                <country>United States</country>
    </metro>
</metros></lfm>

Let's see if we can't expose that information in a better way. If you have a closer look at Last.FM's API you'll notice that they divide their API calls into sub-sections. In the next part we'll make a separate class for the specific API calls we want to provide, in this case the geo sub-section.

JW Player uses term "Open Source", but violates Open Source Definition rule #6

I recently came across the software JW Player. It is a Flash/HTML5-based video player to use on your website. Oh, this looks great, I thought. Why haven't we used this at work, I asked myself? It is, after all, open source. Then I clicked on to the download page and I saw a piece of text that confused me at first.

By downloading, I agree to the non-commercial license.

How can software be open source, but only for non-commercial use? That does not compute. I looked a bit further, and their free download (and source code availability) is covered by the Creative Commons BY-NC-SA 3.0 license, which disallows commercial use.

Yet, if you look at the Open Source Definition rule number 6, it says the following:

6. No Discrimination Against Fields of Endeavor

The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.

To me, the incompatibility between the term "Open Source" and the CC-BY-NC-SA license is quite obvious. They are actively polluting the definition of the term "Open Source", as understood by the FLOSS movement, for their own benefit.

I wrote a comment on their forum asking them to either change their use of the "Open Source" term or change the license of their software. I'm not expecting much, as they locked down a previous comment about the issue without acknowledgment.

I'm hoping that someone with more klout than myself (hopefully someone involved with the Open Source Initiative) will carry the torch and put some heat on this company for their misuse and pollution of the Open Source Definition.

Unicode::Collate is really, really slow

I noticed a while back that there was something fishy with perl's built-in sort when dealing with Unicode text. Doing some research made me eventually notice the Unicode Collation Algorithm (UCA) and the perl implementation in Unicode::Collate and the very useful Unicode::Collate::Locale.

Thanks a lot to the author and maintainers of Unicode::Collate for actually implementing the algorithm and giving perl programmers the tools to actually sort text in the right way.

Unfortunately my hopes were quickly crushed when I noticed the very large speed difference between the implementation of UCA and the normal sort algorithm. At this point I also noticed the pg_collkey implementation of UCA for PostgreSQL, and I was saved (because it was quite fast). Since all of my data came from my database I could just sort the data before it was returned from the database.

Time passed and I slowly rewrote all of my broken sorts to use database-based ORDER BY instead.

And then I read Tom Christiansen's article on how the built-in sort in perl is broken for some types of work, and that we should be using Unicode::Collate instead. Suddenly everything I had forgotten came back to me. I figured now was the time to speak up.
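In case you haven't run into the problem yourself, here is a small sketch of what "broken" means in practice. It only uses modules that ship with perl, and the word lists are made up for the occasion. The two Norwegian words are "ål" and "øl", written with escapes to keep the source file ASCII.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Unicode::Collate;
use Unicode::Collate::Locale;

binmode( STDOUT, ':encoding(UTF-8)' );

# Mixed-case ASCII words
my @ascii = ( 'banana', 'Zebra', 'apple' );

# Norwegian words: "zebra", "ål" and "øl"
my @norwegian = ( 'zebra', "\N{U+E5}l", "\N{U+F8}l" );

my $uca    = Unicode::Collate->new;
my $uca_nb = Unicode::Collate::Locale->new( locale => 'nb' );

# Built-in sort compares code points, so uppercase comes first
my @ascii_builtin = sort @ascii;
my @ascii_uca     = $uca->sort(@ascii);

# Norwegian puts ø and å at the end of the alphabet, ø before å
my @nb_builtin = sort @norwegian;
my @nb_default = $uca->sort(@norwegian);
my @nb_locale  = $uca_nb->sort(@norwegian);

print "built-in sort: @ascii_builtin\n";    # Zebra apple banana
print "default UCA:   @ascii_uca\n";        # apple banana Zebra
print "built-in sort: @nb_builtin\n";       # zebra ål øl
print "default UCA:   @nb_default\n";       # ål øl zebra
print "UCA with nb:   @nb_locale\n";        # zebra øl ål
```

Only the last line is what a Norwegian reader would expect, which is why I wanted Unicode::Collate::Locale in the first place.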

I whipped up a small benchmark program to actually test how much difference there is between the built-in sort and Unicode::Collate.


#!/usr/bin/env perl

use 5.14.0;
use strict;
use warnings;
use File::Slurp qw(slurp);
use Encode qw(decode_utf8);
use List::MoreUtils qw(uniq);
use autodie;

use Benchmark qw(cmpthese);
use Unicode::Collate::Locale;

# Get unique list of words from specified file
my @words = uniq map { split /\b/u } decode_utf8( slurp(shift) );

# Do benchmark
my $uca = Unicode::Collate::Locale->new( locale => 'nb' );
cmpthese(1_000, {
'sort' => sub { my @sorted_words = sort @words },
'uca' => sub { my @sorted_words = $uca->sort(@words) },
});

exit;

You can find uca_sort_benchmark.pl on GitHub, together with the text file I used, if you'd like to reproduce.

The results are truly devastating.

$ ./uca_sort_benchmark.pl misc_wikipedia_text.txt
       Rate   uca  sort
uca  4.08/s    --  -99%
sort  403/s 9782%    --

The implementation of UCA in perl is about 100 times slower than the built-in sort!

I'm wondering how much faster it could be if Unicode::Collate was coded to use ICU directly (which is what pg_collkey uses). I really hope someone with XS/C skills can figure out how to make it faster, because I really want to use it everywhere.