Merton Council planning applications
# Merton Council planning applications scraper

This scraper collects planning applications data from [Merton Council's planning database website](http://planning.merton.gov.uk/Northgate/PlanningExplorerAA/GeneralSearch.aspx) and stores it in an SQLite database.

Merton Council runs [Northgate Planning Explorer](https://www.northgateps.com).

This scraper is designed to run once every 24 hours.

It runs on [Morph](https://morph.io). To get started, [see the documentation](https://morph.io/documentation).

## Schema

The schema is based on the core elements from [planningalerts.org.au](https://www.planningalerts.org.au/how_to_write_a_scraper).
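
As a rough illustration of what a record built on those core elements might look like, here is a minimal sketch. The field names are assumptions recalled from the PlanningAlerts scraper guide and are not confirmed against this scraper's code; `Application` and `missing_fields` are hypothetical names introduced only for this example.

```ruby
require "date"

# Assumed core fields (from the PlanningAlerts guide, not from this scraper's code).
Application = Struct.new(
  :council_reference,
  :address,
  :description,
  :info_url,
  :date_scraped,
  keyword_init: true
)

# Hypothetical helper: list the fields that are nil or blank in a record.
def missing_fields(record)
  record.to_h.select { |_, value| value.nil? || value.to_s.strip.empty? }.keys
end

# Made-up example values, for illustration only.
example = Application.new(
  council_reference: "00/P0000",
  address: "1 Example Road, London",
  description: "Example description of proposed works",
  info_url: "http://example.com/application/1",
  date_scraped: Date.today.iso8601
)

puts "Missing fields: #{missing_fields(example).inspect}"
```

A check like this could run before each record is saved, so that incomplete rows are logged rather than silently written to the database.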

## Installation

    $ git clone https://github.com/adrianshort/merton-planning-applications.git
    $ cd merton-planning-applications
    $ bundle

### Configuration

According to the principle of _one codebase, many deploys_, this scraper is [configured using environment variables](https://12factor.net/config) rather than by editing constants in the code.

|Name|Purpose|Default|Required?|
|----|-------|-------|---------|
|SCRAPER_DELAY|Minimum delay in seconds between HTTP requests to the server.|10|No|
|SCRAPER_USER_AGENT|User agent string sent as an HTTP request header.|_None_|Yes|
|SCRAPER_LOG_LEVEL|Controls the level of detail in the output logs according to [Ruby's `Logger` class](https://ruby-doc.org/stdlib-2.1.0/libdoc/logger/rdoc/Logger.html) constants.|1 _(Logger::INFO)_|No|
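
The table above could be read in Ruby roughly as follows. This is a minimal sketch, not the scraper's actual code: `scraper_config` is a hypothetical helper, and only the variable names and defaults come from the table.

```ruby
require "logger"

# Hypothetical helper: build settings from an environment-like hash.
# Defaults mirror the configuration table; the real scraper may differ.
def scraper_config(env = ENV)
  user_agent = env["SCRAPER_USER_AGENT"].to_s.strip
  # SCRAPER_USER_AGENT is required and has no default.
  raise ArgumentError, "SCRAPER_USER_AGENT must be set" if user_agent.empty?

  {
    # Minimum delay in seconds between HTTP requests (default 10).
    delay: Float(env.fetch("SCRAPER_DELAY", "10")),
    user_agent: user_agent,
    # Logger severity as an integer (default 1, i.e. Logger::INFO).
    log_level: Integer(env.fetch("SCRAPER_LOG_LEVEL", Logger::INFO.to_s))
  }
end
```

Accepting the environment as a parameter keeps the helper easy to test with a plain hash, while the default argument still reads from the real `ENV`.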

## Running

    $ bundle exec ruby scraper.rb

## Logging

[Log messages are written unbuffered to `STDOUT`.](https://12factor.net/logs) You can redirect them to a file or the log drain of your choice.

    $ bundle exec ruby scraper.rb >> log.txt

Morph.io will only show the first 10,000 lines of log output. This constraint doesn't apply when running elsewhere, e.g. on your local machine.
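
For readers unfamiliar with unbuffered logging in Ruby, one way to set it up with the standard library is sketched below. `build_logger` is a hypothetical helper and this is not necessarily how the scraper itself configures its logger.

```ruby
require "logger"

# Flush every write to STDOUT immediately instead of buffering it.
$stdout.sync = true

# Hypothetical helper: create a logger on the given IO at the given level.
def build_logger(io = $stdout, level = Logger::INFO)
  logger = Logger.new(io)
  logger.level = level
  logger
end

logger = build_logger
logger.info("Scraper starting")
logger.debug("Not shown at the default INFO level")
```

Lowering the level to `Logger::DEBUG` (0) shows more detail, which may be worth weighing against Morph.io's 10,000-line limit.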

## Similar projects

- [maxharlow/scrape-planning-northgate](https://github.com/maxharlow/scrape-planning-northgate) (Node)
- [adrianshort/planningalerts](https://github.com/adrianshort/planningalerts), especially the [Python scrapers for Northgate Planning Explorer](https://github.com/adrianshort/planningalerts/blob/master/python_scrapers/PlanningExplorer.py) - not written by me, that repository is just a copy of the PlanningAlerts project's codebase

## Tags

- Merton
- Merton Council
- London
- UK
- localgov
- localgovdigital
- opendata
- Morph
- ScraperWiki
- planning
- Planning Alerts
- plantech
- civictech

## Author

By [Adrian Short](https://www.adrianshort.org/).

This project is not by or affiliated with Merton Council.