# UK Planning Scraper

**PRE-ALPHA: Only works with Idox and Northgate sites and spews a lot of stuff to STDOUT. Not for production use.**

This gem scrapes planning applications data from UK council/local planning authority websites, eg Westminster City Council. Data is returned as an array of hashes, one hash for each planning application.

This scraper gem doesn't use a database. Storing the output is up to you. It's just a convenient way to get the data.
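For illustration, the returned data has roughly this shape. Only `authority_name` and `council_reference` are field names confirmed elsewhere in this README; everything else here is an invented placeholder, not a guaranteed schema:

```ruby
# Hypothetical output shape -- only :authority_name and :council_reference
# are confirmed field names; the values and any other fields are
# illustrative assumptions.
apps = [
  {
    authority_name:    'Westminster',
    council_reference: '19/01234/FULL' # invented example reference
    # ...plus whatever other fields the scraper extracts from the site
  }
]
```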
Currently this only works for Idox and Northgate sites. The ultimate aim is to provide a consistent interface in a single gem for all variants of all planning systems: Idox Public Access, Northgate Planning Explorer, OcellaWeb, Agile Planning and all the one-off systems.

This project is not affiliated with any organisation.
## Installation

Add this line to your application's Gemfile:

```ruby
gem 'uk_planning_scraper', :git => 'https://github.com/adrianshort/uk_planning_scraper/'
```

And then execute:

```shell
$ bundle install
```

Or install it directly from GitHub using the `specific_install` plugin:

```shell
$ gem install specific_install
$ gem specific_install adrianshort/uk_planning_scraper
```
## Usage

### First, require your stuff

```ruby
require 'uk_planning_scraper'
require 'pp'
```

### Scrape from a council

```ruby
apps = UKPlanningScraper::Authority.named('Westminster').scrape({ decided_days: 7 })
pp apps
```
### Scrape from a bunch of councils

```ruby
auths = UKPlanningScraper::Authority.tagged('london')
auths.each do |auth|
  apps = auth.scrape({ decided_days: 7 })
  pp apps # You'll probably want to save `apps` to your database here
end
```

Yes, we just scraped the last week's planning decisions across the whole of London (actually 23 of the 35 authorities right now) with five lines of code.
### Satisfy your niche interests

```ruby
auths = UKPlanningScraper::Authority.tagged('scotland')
auths.each do |auth|
  apps = auth.scrape({ validated_days: 7, keywords: 'launderette' })
  pp apps # You'll probably want to save `apps` to your database here
end
```
### More search parameters

```ruby
# Don't try these all at once
params = {
  received_to: Date.today,
  received_from: Date.today - 30,
  received_days: 7, # instead of received_to, received_from
  validated_to: Date.today,
  validated_from: Date.today - 30,
  validated_days: 7, # instead of validated_to, validated_from
  decided_to: Date.today,
  decided_from: Date.today - 30,
  decided_days: 7, # instead of decided_to, decided_from
  keywords: "hip gable" # Check that the systems you're scraping return the results you expect for multiple keywords (AND or OR?)
}

apps = UKPlanningScraper::Authority.named('Camden').scrape(params)
```
### Save to a SQLite database

This gem has no interest whatsoever in persistence. What you do with the data it outputs is up to you: relational databases, document stores, VHS and clay tablets are all blissfully none of its business. But using the [ScraperWiki](https://github.com/openaustralia/scraperwiki-ruby) gem is a really easy way to store your data:

```ruby
require 'scraperwiki' # Must be installed, of course
ScraperWiki.save_sqlite([:authority_name, :council_reference], apps)
```

That `apps` param can be a hash or an array of hashes, which is what gets returned by our `scrape()`.
### Find authorities by tag

Tags are always lowercase and one word.

```ruby
london_auths = UKPlanningScraper::Authority.tagged('london')
```

We've got tags for areas:

- london
- innerlondon
- outerlondon
- northlondon
- southlondon
- surrey
- wales

and software systems:

- idox
- northgate

and whatever you'd like to add that would be useful to others.
### More fun with Authority tags

```ruby
UKPlanningScraper::Authority.named('Merton').tags
# => ["london", "outerlondon", "southlondon", "england", "northgate", "londonboroughs"]

UKPlanningScraper::Authority.not_tagged('london')
# => [...]

UKPlanningScraper::Authority.named('Islington').tagged?('southlondon')
# => false
```

### List all authorities

```ruby
UKPlanningScraper::Authority.all.each { |a| puts a.name }
```

### List all tags

```ruby
pp UKPlanningScraper::Authority.tags
```
## Add your favourite local planning authorities

The list of authorities is in a CSV file in `/lib/uk_planning_scraper`:

https://github.com/adrianshort/uk_planning_scraper/blob/master/lib/uk_planning_scraper/authorities.csv

The easiest way to add to or edit this list is to edit it within GitHub (use the pencil icon) and create a pull request for your changes. If accepted, your changes will be available to everyone in the next version of the gem.

The file format is one line per authority, with comma-separated fields:

- Name (omit "the", "council", "borough of", "city of", etc. and write "and" not "&", except for `City of London`, which is a special case)
- URL of the search form (use the advanced search URL if there is one)
- Tags (use as many comma-separated tags as is reasonable, lowercase and all one word)
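As a rough illustration of that layout (this exact line is invented for this README, not copied from the file, and the URL is a guess):

```csv
Westminster,https://idoxpa.westminster.gov.uk/online-applications/search.do?action=advanced,london,innerlondon,idox
```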
Currently only the Idox and Northgate scrapers work, but feel free to add authorities that use other systems, along with appropriate system tags like `ocellaweb` and `agileplanning`. This gem selects the appropriate scraper by examining the URL, not by looking at the tags, so it doesn't matter which tags you use as long as they're consistent with others.
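To give a feel for what "selects the scraper by examining the URL" means, here is a minimal sketch of that kind of dispatch. This is not the gem's actual code, and the URL substrings it matches on are assumptions:

```ruby
# Hypothetical sketch of URL-based scraper selection -- NOT the gem's
# actual implementation. The substrings matched below are assumptions
# about what typically appears in Idox and Northgate URLs.
def scraper_for(url)
  case url
  when /northgate/i         then :northgate
  when /idox|publicaccess/i then :idox
  else :unsupported # e.g. OcellaWeb, Agile Planning -- no scraper yet
  end
end

scraper_for('https://idoxpa.example.gov.uk/online-applications/search.do')
# => :idox
```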
Please check the tag list before you change anything:

```ruby
pp UKPlanningScraper::Authority.tags
```
## Development

After checking out the repo, run `bin/setup` to install dependencies. You can also run `bin/console` for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).

## Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/adrianshort/uk_planning_scraper.