Merton Council planning applications

# Merton Council planning applications scraper

This scrapes planning applications data from [Merton Council's planning database website](http://planning.merton.gov.uk/Northgate/PlanningExplorerAA/GeneralSearch.aspx) and puts it in an SQLite database.

Merton Council runs [Northgate Planning Explorer](https://www.northgateps.com).

This scraper is designed to run once per 24 hours.

It runs on [Morph](https://morph.io). To get started [see the documentation](https://morph.io/documentation).

## Schema

The schema is based on the core elements from [planningalerts.org.au](https://www.planningalerts.org.au/how_to_write_a_scraper).
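
As a rough illustration of what "core elements" means, the sketch below checks a record against the field names that PlanningAlerts documents for scraper output. Whether this scraper stores exactly these columns is an assumption; `missing_fields`, the constants and the sample values are hypothetical placeholders, not taken from `scraper.rb`.

```ruby
require 'date'

# Field names as documented by PlanningAlerts for scraper output.
# Assumption: this scraper's SQLite table follows them closely.
REQUIRED_FIELDS = %w[council_reference address description info_url date_scraped].freeze
OPTIONAL_FIELDS = %w[date_received on_notice_from on_notice_to].freeze

# Hypothetical helper: lists the required fields that are absent or blank.
def missing_fields(record)
  REQUIRED_FIELDS.reject { |field| record[field] && !record[field].to_s.empty? }
end

# Placeholder record; every value here is made up for the example.
record = {
  'council_reference' => 'EXAMPLE/0001',
  'address' => '1 Example Street, London',
  'description' => 'Placeholder description of the proposed work',
  'info_url' => 'https://example.org/application/EXAMPLE-0001',
  'date_scraped' => Date.today.iso8601
}

puts "Missing required fields: #{missing_fields(record).inspect}"
```

A check like this could run just before a record is saved, so that incomplete rows are logged rather than silently written.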

## Installation

    $ git clone https://github.com/adrianshort/merton-planning-applications.git
    $ cd merton-planning-applications
    $ bundle

## Configuration

According to the principle of _one codebase, many deploys_, this scraper is [configured using environment variables](https://12factor.net/config) rather than by editing constants in the code.

| Name | Purpose | Default | Required? |
| --- | --- | --- | --- |
| MORPH_DELAY | Minimum delay in seconds between HTTP requests to the server. | 10 | No |
| MORPH_USER_AGENT | User agent string sent as an HTTP request header. | _None_ | Yes |
| MORPH_LOG_LEVEL | Controls the level of detail in the output logs according to [Ruby's `Logger` class](https://ruby-doc.org/stdlib-2.1.0/libdoc/logger/rdoc/Logger.html) constants. | 1 _(Logger::INFO)_ | No |
| MORPH_DAYS | Number of days to scrape. | _None_ | Only if MORPH_MONTHS is unset |
| MORPH_MONTHS | Number of months to scrape. | _None_ | Only if MORPH_DAYS is unset |
| MORPH_STATUS | Only scrape applications with this status code. | _None_ | No |
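
The table above could be read into a settings hash roughly as follows. This is a minimal sketch, not the code in `scraper.rb`: the variable names and defaults come from the table, while `load_config`, the hash keys and the error messages are hypothetical.

```ruby
require 'logger'

# Hypothetical helper: builds a settings hash from ENV (or any hash with
# string keys, which makes it easy to try out without touching the real
# environment).
def load_config(env = ENV)
  user_agent = env['MORPH_USER_AGENT']
  if user_agent.nil? || user_agent.empty?
    raise ArgumentError, 'MORPH_USER_AGENT must be set'
  end

  days = env['MORPH_DAYS']
  months = env['MORPH_MONTHS']
  if days.nil? && months.nil?
    raise ArgumentError, 'Set either MORPH_DAYS or MORPH_MONTHS'
  end

  {
    delay: Integer(env.fetch('MORPH_DELAY', '10')),             # seconds between requests
    user_agent: user_agent,
    log_level: Integer(env.fetch('MORPH_LOG_LEVEL', Logger::INFO.to_s)),
    days: days && Integer(days),
    months: months && Integer(months),
    status: env['MORPH_STATUS']                                 # nil means no status filter
  }
end

config = load_config('MORPH_USER_AGENT' => 'ExampleBot/1.0', 'MORPH_DAYS' => '7')
puts config.inspect
```

Failing fast on a missing `MORPH_USER_AGENT` or scrape period keeps a misconfigured deploy from sending any requests at all.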

## Running

    $ bundle exec ruby scraper.rb

## Logging

[Log messages are written unbuffered to `STDOUT`.](https://12factor.net/logs) You can redirect them to a file or the log drain of your choice.

    $ bundle exec ruby scraper.rb >> log.txt

Morph.io will only show the first 10,000 lines of log output. This constraint doesn't apply when running elsewhere, e.g. on your local machine.
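
For readers unfamiliar with unbuffered logging in Ruby, one way to set it up with the standard `Logger` class is sketched below. It assumes the log level arrives as an integer in `MORPH_LOG_LEVEL`; `build_logger` is a hypothetical name and may not match how `scraper.rb` does it.

```ruby
require 'logger'

# Hypothetical helper: returns a Logger writing to the given IO object.
def build_logger(io = $stdout, level = ENV.fetch('MORPH_LOG_LEVEL', Logger::INFO.to_s))
  io.sync = true # flush every write immediately rather than buffering
  logger = Logger.new(io)
  logger.level = Integer(level) # 0 = DEBUG, 1 = INFO, 2 = WARN, ...
  logger
end

logger = build_logger
logger.info('Scraper starting')
logger.debug('Only shown when MORPH_LOG_LEVEL is 0')
```

Setting `sync` matters most when output is piped or redirected, since Ruby may otherwise hold messages in a buffer until it fills or the process exits.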

## Similar projects

- [maxharlow/scrape-planning-northgate](https://github.com/maxharlow/scrape-planning-northgate) (Node)
- [adrianshort/planningalerts](https://github.com/adrianshort/planningalerts), especially the [Python scrapers for Northgate Planning Explorer](https://github.com/adrianshort/planningalerts/blob/master/python_scrapers/PlanningExplorer.py) - not by me, just a copy of this project's codebase

## Tags

- Merton
- Merton Council
- London
- UK
- localgov
- localgovdigital
- opendata
- Morph
- ScraperWiki
- planning
- Planning Alerts
- plantech
- civictech

## Author

By [Adrian Short](https://www.adrianshort.org/).

This project is not by or affiliated with Merton Council.