A Ruby gem to get planning applications data from UK council websites.

# UK Planning Scraper

**PRE-ALPHA: Only works with Idox and Northgate sites and spews a lot of stuff
to STDOUT. Not for production use.**

This gem scrapes planning applications data from UK local planning authority
websites, eg Westminster City Council. Data is returned as an array of hashes,
one hash for each planning application.

This scraper gem doesn't use a database. Storing the output is up to you. It's
just a convenient way to get the data.

Currently this only works for Idox and Northgate sites. The ultimate aim is to
provide a consistent interface in a single gem for all variants of all planning
systems: Idox Public Access, Northgate Planning Explorer, OcellaWeb, Agile
Planning and all the one-off systems.

This project is not affiliated with any organisation.

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'uk_planning_scraper'
```

And then execute:

    $ bundle install

Or install it directly:

    $ gem install uk_planning_scraper

## Usage

### First, require your stuff

```ruby
require 'uk_planning_scraper'
require 'pp'
```

### Scrape from a council

Applications in Westminster decided in the last seven days:

```ruby
pp UKPlanningScraper::Authority.named('Westminster').decided_days(7).scrape
```
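Because the result of `scrape` is described above as a plain array of hashes, the standard library is enough to work with it. The following is a minimal sketch that writes such an array out as CSV text. It assumes each hash has `:authority_name` and `:council_reference` keys (the keys used as unique keys later in this README); the `:description` key and every value below are invented for illustration.

```ruby
require 'csv'

# Stand-in for the array of hashes returned by a scrape.
# All values are made up; :description is a hypothetical key.
applications = [
  { authority_name: 'Westminster', council_reference: '18/00001/FULL',
    description: 'Replacement of windows' },
  { authority_name: 'Westminster', council_reference: '18/00002/FULL',
    description: 'Single-storey rear extension' }
]

# Build CSV text: a header row taken from the first hash's keys,
# then one row per application.
csv_text = CSV.generate do |csv|
  csv << applications.first.keys.map(&:to_s)
  applications.each { |app| csv << app.values }
end

puts csv_text
```

This sketch assumes every hash has the same keys in the same order, which is worth checking against real output before relying on it.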

### Scrape from a bunch of councils

Scrape the last week's planning decisions across the whole of
London (actually 23 of the 35 authorities right now):

```ruby
authorities = UKPlanningScraper::Authority.tagged('london')

authorities.each do |authority|
  applications = authority.decided_days(7).scrape
  pp applications
  # You'll probably want to save `applications` to your database here
end
```
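When looping over many council sites, one unavailable site could otherwise abort the whole run. The sketch below shows one way to collect failures and carry on. It is an assumption that `scrape` can raise an error; `FakeAuthority` and `scrape_all` are hypothetical names used so the example runs without network access, and only the method names `name`, `decided_days` and `scrape` come from this README.

```ruby
# Stand-in for an authority object (hypothetical, not the gem's class).
class FakeAuthority
  attr_reader :name

  def initialize(name, result)
    @name = name
    @result = result
  end

  # Mimics a chainable scrape parameter method.
  def decided_days(_n)
    self
  end

  # Returns the canned result, or raises it if it is an exception.
  def scrape
    raise @result if @result.is_a?(Exception)
    @result
  end
end

# Scrape each authority in turn, recording failures instead of aborting.
def scrape_all(authorities, days)
  results = {}
  failures = {}
  authorities.each do |authority|
    begin
      results[authority.name] = authority.decided_days(days).scrape
    rescue StandardError => e
      failures[authority.name] = e.message
    end
  end
  [results, failures]
end

authorities = [
  FakeAuthority.new('Camden', [{ council_reference: 'A/1' }]),
  FakeAuthority.new('Merton', RuntimeError.new('site unavailable'))
]

results, failures = scrape_all(authorities, 7)
puts "Scraped: #{results.keys.join(', ')}"
puts "Failed: #{failures.keys.join(', ')}"
```

With real authorities, the same `scrape_all` helper could be passed the result of `UKPlanningScraper::Authority.tagged('london')`.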

### Satisfy your niche interests

Launderette applications validated in the last seven days in Scotland:

```ruby
authorities = UKPlanningScraper::Authority.tagged('scotland')

authorities.each do |authority|
  applications = authority.validated_days(7).keywords('launderette').scrape
  pp applications
  # You'll probably want to save `applications` to your database here
end
```

### More scrape parameter methods

Chain as many scrape parameter methods on a `UKPlanningScraper::Authority`
object as you like, making sure that `scrape` comes last.

```ruby
received_from(Date.parse("1 Jan 2016"))
received_to(Date.parse("31 Dec 2016"))

# Received in the last n days (including today)
# Use instead of received_to, received_from
received_days(7)

validated_to(Date.today)
validated_from(Date.today - 30)
validated_days(7) # instead of validated_to, validated_from

decided_to(Date.today)
decided_from(Date.today - 30)
decided_days(7) # instead of decided_to, decided_from

# Check that the systems you're scraping return the
# results you expect for multiple keywords (AND or OR?)
keywords("hip gable")

applicant_name("Mr and Mrs Smith") # Currently Idox only
application_type("Householder") # Currently Idox only
development_type("") # Currently Idox only

scrape # runs the scraper
```
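The `*_days` methods are described above as covering the last n days including today. One plausible reading of that, sketched here as an assumption rather than as the gem's actual implementation, is a date range that starts n - 1 days before today. `last_n_days` is a hypothetical helper and not part of the gem.

```ruby
require 'date'

# Hypothetical helper, not part of the gem: returns the [from, to] dates
# covering the last n days, counting `today` as one of them.
def last_n_days(n, today = Date.today)
  unless n.is_a?(Integer) && n > 0
    raise ArgumentError, 'n must be a positive integer'
  end
  [today - (n - 1), today]
end

# A fixed date is passed in so the result is reproducible.
from, to = last_n_days(7, Date.new(2016, 12, 31))
puts "#{from} to #{to}" # prints "2016-12-25 to 2016-12-31"
```

If that reading holds, `received_days(7)` would be roughly equivalent to chaining `received_from(from)` and `received_to(to)` with these dates. Check this against the gem's behaviour before relying on it.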

### Save to a SQLite database

This gem has no interest whatsoever in persistence. What you do with the data it
outputs is up to you: relational databases, document stores, VHS and clay
tablets are all blissfully none of its business. But using the
[ScraperWiki](https://github.com/openaustralia/scraperwiki-ruby) gem is a really
easy way to store your data:

```ruby
require 'scraperwiki' # Must be installed, of course
ScraperWiki.save_sqlite([:authority_name, :council_reference], applications)
```

That `applications` param can be a hash or an array of hashes, which is what
gets returned by our `Authority.scrape`.
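The first argument in the call above is the list of keys that together identify a record. As a rough illustration of what that implies (one stored record per unique key combination, with later scrapes replacing earlier ones), here is a sketch in plain Ruby. It is not how ScraperWiki is implemented; `upsert_all`, the `:status` key and all values are hypothetical.

```ruby
# Hypothetical illustration: keep one record per unique key combination,
# letting newer records replace older ones with the same key.
def upsert_all(store, records, unique_keys)
  records.each do |record|
    key = record.values_at(*unique_keys)
    store[key] = record
  end
  store
end

unique_keys = [:authority_name, :council_reference]
store = {}

# Invented sample data standing in for two scrapes on different days.
first_run = [
  { authority_name: 'Westminster', council_reference: '18/00001/FULL', status: 'Pending' }
]
second_run = [
  { authority_name: 'Westminster', council_reference: '18/00001/FULL', status: 'Decided' },
  { authority_name: 'Westminster', council_reference: '18/00002/FULL', status: 'Pending' }
]

upsert_all(store, first_run, unique_keys)
upsert_all(store, second_run, unique_keys)

puts store.size # prints 2: the repeated application was replaced, not duplicated
```

Including the authority name in the key matters if different authorities can issue the same reference string.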

### Find authorities by tag

Tags are always lowercase and one word.

```ruby
london_auths = UKPlanningScraper::Authority.tagged('london')
```

We've got tags for areas:

- london
- innerlondon
- outerlondon
- northlondon
- southlondon
- greatermanchester
- surrey
- wales

We also automatically add tags for software systems:

- idox
- northgate
- ocellaweb
- agileplanning
- unknownsystem -- for when we can't identify the system

You're welcome to add any other tags that would be useful to others.

### WTF is up with London?

London has got 32 London Boroughs, tagged `londonboroughs`. These are the
councils under the authority of the Mayor of London and the Greater London
Authority.

It has 33 councils: the London Boroughs plus the City of London (named `City of
London`). We don't currently have a tag for this, but if you want to add
`londoncouncils` please go ahead.

And it's got 35 local planning authorities: the 33 councils plus the two
`londondevelopmentcorporations`, named `London Legacy Development Corporation`
and `Old Oak and Park Royal Development Corporation`. The tag `london` covers
all (and only) the 35 local planning authorities in London.

```ruby
UKPlanningScraper::Authority.tagged('londonboroughs').size
# => 32

UKPlanningScraper::Authority.tagged('londondevelopmentcorporations').size
# => 2

UKPlanningScraper::Authority.tagged('london').size
# => 35
```

### More fun with Authority tags

```ruby
UKPlanningScraper::Authority.named('Merton').tags
# => ["england", "london", "londonboroughs", "northgate", "outerlondon", "southlondon"]

UKPlanningScraper::Authority.not_tagged('london')
# => [...]

UKPlanningScraper::Authority.named('Islington').tagged?('southlondon')
# => false
```

### List all authorities

```ruby
UKPlanningScraper::Authority.all.each { |a| puts a.name }
```

### List all tags

```ruby
pp UKPlanningScraper::Authority.tags
```

## Add your favourite local planning authorities

The list of authorities is in a CSV file in `/lib/uk_planning_scraper`:

https://github.com/adrianshort/uk_planning_scraper/blob/master/lib/uk_planning_scraper/authorities.csv

The easiest way to add to or edit this list is to edit within GitHub (use the
pencil icon) and create a new pull request for your changes. If accepted, your
changes will be available to everyone with the next version of the gem.

The file format is one line per authority, with comma-separated:

- Name (omit "the", "council", "borough of", "city of", etc. and write "and" not
  "&", except for `City of London` which is a special case)
- URL of the search form (use the advanced search URL if there is one)
- Tags (use as many comma-separated tags as is reasonable, lowercase and all one
  word.)

There's no need to manually add tags to the `authorities.csv` file for the
software systems like `idox`, `northgate` etc as these are added automatically.

Please check the tag list before you change anything:

```ruby
pp UKPlanningScraper::Authority.tags
```
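Before opening a pull request, it can help to check that a new row splits into the fields described above. The following is a minimal sketch using Ruby's standard `csv` library. The row is hypothetical: the URL is a placeholder, and the tags are the ones listed for Merton earlier in this README, minus the automatically added `northgate`. The lowercase, one-word check is our reading of the tag rule, not a validation the gem is known to perform.

```ruby
require 'csv'

# Hypothetical row in the format described above; the URL is a placeholder.
line = 'Merton,https://planning.example.org/search,' \
       'england,london,londonboroughs,outerlondon,southlondon'

# First field is the name, second the search URL, the rest are tags.
name, url, *tags = CSV.parse_line(line)

# Flag any tag that is not lowercase letters/digits in a single word.
bad_tags = tags.reject { |tag| tag.match?(/\A[a-z0-9]+\z/) }

puts "Name: #{name}"
puts "URL:  #{url}"
puts "Tags: #{tags.join(', ')}"
puts "Tags needing attention: #{bad_tags.inspect}"
```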

## Development

After checking out the repo, run `bin/setup` to install dependencies. You can
also run `bin/console` for an interactive prompt that will allow you to
experiment.

To install this gem onto your local machine, run `bundle exec rake install`. To
release a new version, update the version number in `version.rb`, and then run
`bundle exec rake release`, which will create a git tag for the version, push
git commits and tags, and push the `.gem` file to
[rubygems.org](https://rubygems.org).

## Contributing

Bug reports and pull requests are welcome on GitHub at
https://github.com/adrianshort/uk_planning_scraper.