A site about how community groups and charities can make the most of data and open data to do something useful. Focused on Birmingham, relevant everywhere.

Data scraping using Google Docs spreadsheets


Thanks to Paul Bradshaw, I recently came across a post by Tony Hirst that gives an extraordinary lesson in the joys (and I do mean joys) of scraping data using Google Docs. It was written nearly two years ago, but if you're at all interested in this sort of thing and haven't seen it, I strongly recommend working through it.

In just a few easy steps, as Tony points out, you can get data from Wikipedia to appear magically in your spreadsheet and then, via Yahoo Pipes, turn that data into a Google Map. It's just a matter of sitting down with Tony's post and following it carefully.
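Inside a Google spreadsheet, the "grab a table from a web page" step is done with the ImportHtml function, written as =ImportHtml(url, "table", n), where n picks which table on the page to import. To show roughly what that step does behind the scenes, here is a minimal Python sketch using only the standard library's html.parser. It assumes reasonably well-formed markup with no nested tables; the function name import_table and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by row and by table."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>; each is a list of rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace so cells read cleanly in a spreadsheet.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text inside links or other inline tags still belongs to the cell.
        if self._cell is not None:
            self._cell.append(data)


def import_table(html, index=1):
    """Return the rows of the index-th table (1-based, like ImportHtml's n)."""
    scraper = TableScraper()
    scraper.feed(html)
    return scraper.tables[index - 1]


# Hypothetical sample page, standing in for a Wikipedia article.
SAMPLE = """
<table>
  <tr><th>Place</th><th>Postcode</th></tr>
  <tr><td>Example Park</td><td>B1 1AA</td></tr>
  <tr><td>Example Library</td><td>B2 2BB</td></tr>
</table>
"""

rows = import_table(SAMPLE)
```

Each row comes back as a plain list of strings, which is the same shape the spreadsheet shows once ImportHtml has filled in the cells.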

What's remarkable is how straightforward much of this is. Admittedly, Yahoo Pipes, an extraordinarily powerful tool, does take some getting used to. But then there are people like Tony who can help you get used to its quirks.

I followed the tutorial word for word and got almost everything working, but I've had a little trouble with the mapping part. It turns out there have been a few problems with the Yahoo Pipes location module, which appears to be a bit temperamental. I've since learned from Mary Hamilton that adding part of the postcode to the location text can get it working, but I have yet to try that.
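The postcode tip isn't spelled out in detail here, so the following is only one reading of it: build a fuller location string by joining the place name, the city and the first half of the postcode (the "outward" part, such as "B5" in "B5 4AA"), so a geocoder has less room to guess wrong. In the spreadsheet itself this would be a simple concatenation of cells; the Python function below, its name location_query and the example values are all hypothetical.

```python
def location_query(name, postcode, city="Birmingham"):
    """Join a place name, city and outward postcode into one location string.

    Only the part of the postcode before the space is kept (e.g. "B5" from
    "B5 4AA"); a postcode written without a space is used whole.
    """
    outward = postcode.split()[0] if postcode.strip() else ""
    # Skip empty parts so a missing postcode doesn't leave a trailing comma.
    return ", ".join(part for part in (name, city, outward) if part)


query = location_query("Example Baths", "B5 4AA")
```

Keeping only the outward code is a judgement call: it narrows the search to a district without relying on the full postcode being typed correctly.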

Still, Tony's post comes highly recommended. As a result of it, I've started to muck around with scraping data from Birmingham City Council's website. In particular, I've had a go at getting swimming pool opening times off the site.

Helpfully (although I'm quite sure it wasn't deliberate), the information on the website is organised into tables, so it's relatively straightforward to grab the contents of a table and put it into a spreadsheet. However, the tables don't all follow the same rules: some have a single Monday to Friday entry, for example, while others have a separate entry for each day of the week.
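One way to compare the two layouts is to expand every day range into one row per day before putting the pools side by side. Below is a minimal Python sketch, assuming the rows have already been scraped as (pool, day label, times) tuples and that each label is either a single day name or a range written as "X to Y". The function names and the sample rows are hypothetical, not taken from the council's site.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def expand_days(label):
    """Turn a label such as 'Monday to Friday' into the list of days it covers."""
    label = label.strip()
    if " to " in label:  # assumes ranges are written with a lower-case "to"
        start, end = (part.strip().title() for part in label.split(" to ", 1))
        i, j = DAYS.index(start), DAYS.index(end)
        if i <= j:
            return DAYS[i:j + 1]
        return DAYS[i:] + DAYS[:j + 1]  # range wraps past Sunday
    return [label.title()]  # a single day maps to itself


def normalise(rows):
    """Emit one (pool, day, times) row per individual day."""
    out = []
    for pool, day_label, times in rows:
        for day in expand_days(day_label):
            out.append((pool, day, times))
    return out


# Hypothetical rows in the two layouts described above.
rows = [
    ("Example Baths", "Monday to Friday", "07:00-21:00"),
    ("Example Baths", "Saturday", "08:00-16:00"),
]
per_day = normalise(rows)
```

After this step, a pool listed as "Monday to Friday" and one listed day by day end up in the same shape, so their opening times can be lined up in a single sheet.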

Nonetheless, it opens up the possibility of gaining a greater understanding of swimming pool opening times in Birmingham as part of my ongoing investigation into swimming pool provision in the city.

2 Responses to “Data scraping using Google Docs spreadsheets”

  1. […] Data scraping using Google Docs spreadsheets | Be Vocal MA Online Journalism student @Andbwell on scraping #data using google docs: http://bit.ly/a9gXYV (tags: data via:packrati.us) […]

  2. […] public librarians might have to play in opening up and liberating content at a local/civic level: Data scraping using Google Docs spreadsheets (after all, public libraries are one of the places you go to pick up flyers and see posters about […]
