PRE-ALPHA: Only works with Idox and Northgate sites and spews a lot of stuff to STDOUT. Not for production use.
This gem scrapes planning application data from UK local planning authority websites, e.g. Westminster City Council. Data is returned as an array of hashes, one hash for each planning application.
This scraper gem doesn’t use a database. Storing the output is up to you. It’s just a convenient way to get the data.
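Because the output is plain Ruby data, the standard library is enough to persist it. The sketch below writes an array of hashes as JSON Lines and reads it back. The sample records are invented for illustration: only `:authority_name` and `:council_reference` are keys that appear elsewhere in this README, and `:description` is a hypothetical field.

```ruby
require 'json'
require 'stringio'

# Invented sample data standing in for the result of a scrape.
# :description is a hypothetical key used only for illustration.
applications = [
  { authority_name: 'Westminster', council_reference: '18/00001/FULL', description: 'Example A' },
  { authority_name: 'Westminster', council_reference: '18/00002/FULL', description: 'Example B' }
]

# Write one JSON object per line to any IO-like object (a File works too).
def write_json_lines(io, records)
  records.each { |record| io.puts(JSON.generate(record)) }
end

# Read the records back, restoring symbol keys.
def read_json_lines(io)
  io.each_line.map { |line| JSON.parse(line, symbolize_names: true) }
end

buffer = StringIO.new
write_json_lines(buffer, applications)
buffer.rewind
restored = read_json_lines(buffer)
puts restored.size
```

Swapping the `StringIO` for `File.open('applications.jsonl', 'a')` would append each run to a file on disk.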
Currently this only works for Idox and Northgate sites. The ultimate aim is to provide a consistent interface in a single gem for all variants of all planning systems: Idox Public Access, Northgate Planning Explorer, OcellaWeb, Agile Planning and all the one-off systems.
This project is not affiliated with any organisation.
Add this line to your application’s Gemfile:
gem 'uk_planning_scraper', \
git: 'https://github.com/adrianshort/uk_planning_scraper/'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install specific_install
$ gem specific_install adrianshort/uk_planning_scraper
require 'uk_planning_scraper'
require 'pp'
Applications in Westminster decided in the last seven days:
pp UKPlanningScraper::Authority.named('Westminster').decided_days(7).scrape
Scrape the last week’s planning decisions across the whole of London (actually 23 of the 35 authorities right now):
authorities = UKPlanningScraper::Authority.tagged('london')
authorities.each do |authority|
  applications = authority.decided_days(7).scrape
  pp applications
  # You'll probably want to save `applications` to your database here
end
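When looping over many authorities, one unreachable or changed website can raise and stop the whole run. A minimal sketch of collecting failures instead is shown here. `FakeAuthority` and `scrape_all` are hypothetical names invented for this example; the fake class only imitates the `name`, `decided_days` and `scrape` methods so that the code runs without network access.

```ruby
# Stand-in for UKPlanningScraper::Authority so the sketch runs offline.
# Only the methods used below are imitated.
class FakeAuthority
  attr_reader :name

  def initialize(name, result)
    @name = name
    @result = result
  end

  def decided_days(_n)
    self # scrape parameter methods are chainable, so return self
  end

  def scrape
    raise @result if @result.is_a?(Exception)
    @result
  end
end

# Scrape each authority in turn, recording failures by authority name
# rather than letting one broken site abort the loop.
def scrape_all(authorities, days)
  results = {}
  failures = {}
  authorities.each do |authority|
    begin
      results[authority.name] = authority.decided_days(days).scrape
    rescue StandardError => e
      failures[authority.name] = e.message
    end
  end
  [results, failures]
end

authorities = [
  FakeAuthority.new('Merton', [{ council_reference: 'X/1' }]),
  FakeAuthority.new('Islington', RuntimeError.new('site unavailable'))
]
results, failures = scrape_all(authorities, 7)
puts "scraped: #{results.keys.join(', ')}"
puts "failed: #{failures.keys.join(', ')}"
```

With the real gem you would pass `UKPlanningScraper::Authority.tagged('london')` in place of the fake list.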
Launderette applications validated in the last seven days in Scotland:
authorities = UKPlanningScraper::Authority.tagged('scotland')
authorities.each do |authority|
  applications = authority.validated_days(7).keywords('launderette').scrape
  pp applications # You'll probably want to save `applications` to your database here
end
### More scrape parameter methods
Chain as many scrape parameter methods on a `UKPlanningScraper::Authority` object as you like, making sure that `scrape` comes last.
received_from(Date.parse("1 Jan 2016"))
received_to(Date.parse("31 Dec 2016"))
# Received in the last n days (including today)
# Use instead of received_to, received_from
received_days(7)
validated_to(Date.today)
validated_from(Date.today - 30)
validated_days(7) # instead of validated_to, validated_from
decided_to(Date.today)
decided_from(Date.today - 30)
decided_days(7) # instead of decided_to, decided_from
# Check that the systems you're scraping return the
# results you expect for multiple keywords (AND or OR?)
keywords("hip gable")
applicant_name("Mr and Mrs Smith") # Currently Idox only
application_type("Householder") # Currently Idox only
development_type("") # Currently Idox only
scrape # runs the scraper
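The `*_days(n)` methods stand in for a from/to pair. The date arithmetic can be sketched as follows, assuming that "including today" means today counts as the first of the n days. `days_window` is a hypothetical helper written for this example and is not part of the gem.

```ruby
require 'date'

# Return [from, to] for a window of n calendar days ending on `today`.
# Assumption: today counts as day 1 of the n days.
def days_window(n, today = Date.today)
  raise ArgumentError, 'n must be a positive Integer' unless n.is_a?(Integer) && n > 0
  [today - (n - 1), today]
end

from, to = days_window(7, Date.new(2016, 12, 31))
puts "#{from} to #{to}" # prints "2016-12-25 to 2016-12-31"
```

Under that assumption, `received_days(7)` would correspond to `received_from(from)` and `received_to(to)` with the dates computed above.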
This gem has no interest whatsoever in persistence. What you do with the data it outputs is up to you: relational databases, document stores, VHS and clay tablets are all blissfully none of its business. But using the ScraperWiki gem is a really easy way to store your data:
require 'scraperwiki' # Must be installed, of course
ScraperWiki.save_sqlite([:authority_name, :council_reference], applications)
That `applications` param can be a hash or an array of hashes, which is what gets returned by our `Authority.scrape`.
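The first argument to `save_sqlite` is the list of unique keys, so a record that is scraped again on a later run should replace the stored one rather than duplicate it. The same idea can be sketched in plain Ruby with an in-memory hash. `upsert` is a hypothetical helper, and the `:status` key and sample values are invented for illustration.

```ruby
# Merge records into a store keyed on the unique-key values, so a later
# record with the same key replaces the earlier one.
def upsert(store, records, keys = [:authority_name, :council_reference])
  records = [records] if records.is_a?(Hash) # accept a single hash too
  records.each { |record| store[record.values_at(*keys)] = record }
  store
end

store = {}
upsert(store, [
  { authority_name: 'Merton', council_reference: '18/P0001', status: 'Pending' },
  { authority_name: 'Merton', council_reference: '18/P0002', status: 'Pending' }
])

# A later run sees the first application again with a new (hypothetical) status.
upsert(store, { authority_name: 'Merton', council_reference: '18/P0001', status: 'Decided' })

puts store.size # prints 2
```

Keying on both the authority name and the council reference matters because different authorities may reuse the same reference format.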
Tags are always lowercase and one word.
london_auths = UKPlanningScraper::Authority.tagged('london')
We’ve got tags for areas, for example `england`, `scotland` and `london`.
We also automatically add tags for software systems, for example `idox` and `northgate`.
You can also add whatever tags you think would be useful to others.
London has got 32 London Boroughs, tagged `londonboroughs`. These are the councils under the authority of the Mayor of London and the Greater London Authority.
It has 33 councils: the London Boroughs plus the City of London (named `City of London`). We don’t currently have a tag for this, but if you want to add `londoncouncils` please go ahead.
And it’s got 35 local planning authorities: the 33 councils plus the two `londondevelopmentcorporations`, named `London Legacy Development Corporation` and `Old Oak and Park Royal Development Corporation`. The tag `london` covers all (and only) the 35 local planning authorities in London.
UKPlanningScraper::Authority.tagged('londonboroughs').size
# => 32
UKPlanningScraper::Authority.tagged('londondevelopmentcorporations').size
# => 2
UKPlanningScraper::Authority.tagged('london').size
# => 35
UKPlanningScraper::Authority.named('Merton').tags
# => ["england", "london", "londonboroughs", "northgate", "outerlondon", "southlondon"]
UKPlanningScraper::Authority.not_tagged('london')
# => [...]
UKPlanningScraper::Authority.named('Islington').tagged?('southlondon')
# => false
UKPlanningScraper::Authority.all.each { |a| puts a.name }
pp UKPlanningScraper::Authority.tags
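The tag queries above behave like simple set filters. The toy model below illustrates those semantics on a tiny hand-made list; it is not the gem's implementation. `ToyAuthority` and the helper methods are hypothetical, and apart from Merton's tags (copied from the example above) the tag assignments and the Glasgow entry are invented.

```ruby
# Toy model of tagged authorities, for illustration only.
ToyAuthority = Struct.new(:name, :tags) do
  def tagged?(tag)
    tags.include?(tag)
  end
end

# Authorities carrying the given tag.
def tagged(authorities, tag)
  authorities.select { |a| a.tagged?(tag) }
end

# Authorities that do not carry the given tag.
def not_tagged(authorities, tag)
  authorities.reject { |a| a.tagged?(tag) }
end

# Every distinct tag in use, sorted alphabetically.
def all_tags(authorities)
  authorities.flat_map(&:tags).uniq.sort
end

authorities = [
  ToyAuthority.new('Merton', %w[england london londonboroughs northgate outerlondon southlondon]),
  ToyAuthority.new('City of London', %w[england london]),
  ToyAuthority.new('London Legacy Development Corporation', %w[england london londondevelopmentcorporations]),
  ToyAuthority.new('Glasgow', %w[scotland])
]

puts tagged(authorities, 'london').map(&:name).inspect
puts not_tagged(authorities, 'london').map(&:name).inspect
puts all_tags(authorities).inspect
```

In this model `tagged('london')` picks up the borough, the City of London and the development corporation, while `tagged('londonboroughs')` returns only the borough.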
The list of authorities is in a CSV file, `authorities.csv`, in `/lib/uk_planning_scraper`.
The easiest way to add to or edit this list is to edit within GitHub (use the pencil icon) and create a new pull request for your changes. If accepted, your changes will be available to everyone with the next version of the gem.
The file format is one line per authority, with comma-separated fields. The name `City of London` is a special case.
There’s no need to manually add tags to the `authorities.csv` file for the software systems like `idox`, `northgate` etc. as these are added automatically.
Please check the tag list before you change anything:
pp UKPlanningScraper::Authority.tags
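Before proposing new tags, it can help to check them against the rule that tags are always lowercase and one word. This is a minimal sketch, assuming a valid tag consists only of lowercase ASCII letters, which matches every tag shown in this README. `valid_tag?` is a hypothetical helper and not part of the gem.

```ruby
# Assumption: a valid tag is one or more lowercase ASCII letters.
TAG_PATTERN = /\A[a-z]+\z/

def valid_tag?(tag)
  tag.is_a?(String) && tag.match?(TAG_PATTERN)
end

proposed = ['londoncouncils', 'South London', 'Idox']
proposed.each do |tag|
  puts "#{tag}: #{valid_tag?(tag) ? 'ok' : 'rejected'}"
end
```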
After checking out the repo, run `bin/setup` to install dependencies. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to rubygems.org.
Bug reports and pull requests are welcome on GitHub at https://github.com/adrianshort/uk_planning_scraper.