Jez Nicholson 5 years ago
committed by GitHub
parent
commit
308e43a13b
No known key found for this signature in database. GPG key ID: 4AEE18F83AFDEB23
7 changed files with 434 additions and 74 deletions
  1. +18
    -0
      lib/uk_planning_scraper/authorities.csv
  2. +6
    -0
      lib/uk_planning_scraper/authority_scrape_params.rb
  3. +74
    -57
      lib/uk_planning_scraper/idox.rb
  4. +18
    -17
      spec/authority_spec.rb
  5. +40
    -0
      spec/council_reference_spec.rb
  6. +139
    -0
      spec/vcr_cassettes/for_a_non-existant_idox_planning_reference.yml
  7. +139
    -0
      spec/vcr_cassettes/for_an_existing_idox_planning_reference.yml

+ 18
- 0
lib/uk_planning_scraper/authorities.csv

@@ -1,8 +1,10 @@
authority_name,url,tags
Aberdeen,https://publicaccess.aberdeencity.gov.uk/online-applications/search.do?action=advanced,scotland
Aberdeenshire,https://upa.aberdeenshire.gov.uk/online-applications/search.do?action=advanced,scotland
Adur and Worthing,https://planning.adur-worthing.gov.uk/online-applications/search.do?action=advanced,england
Allerdale,https://planning.allerdale.gov.uk/portal/servlets/ApplicationSearchServlet,england
Amber Valley,https://www.ambervalley.gov.uk/environment-and-planning/planning/development-management/planning-applications/view-a-planning-application.aspx,england
Argyll and Bute,https://publicaccess.argyll-bute.gov.uk/online-applications/search.do?action=advanced,scotland
Arun,https://www.arun.gov.uk/weekly-lists,england
Ashfield,https://www2.ashfield.gov.uk/cfusion/Planning/plan_findfile.cfm,england
Ashford,http://planning.ashford.gov.uk/planning/Default.aspx?new=true,england
@@ -25,18 +27,27 @@ Bury,https://planning.bury.gov.uk/online-applications/search.do?action=advanced,
Calderdale,https://portal.calderdale.gov.uk/online-applications/search.do?action=advanced,england westyorkshire
Camden,http://planningrecords.camden.gov.uk/Northgate/PlanningExplorer17/GeneralSearch.aspx,londonboroughs london
Cardiff,https://planningonline.cardiff.gov.uk/online-applications/search.do?action=advanced,wales
Carlisle,https://publicaccess.carlisle.gov.uk/online-applications/search.do?action=advanced,england
Cheshire West and Chester,https://pa.cheshirewestandchester.gov.uk/online-applications/search.do?action=advanced,chester cheshire england
City of London,http://www.planning2.cityoflondon.gov.uk/online-applications/search.do?action=advanced,london innerlondon northlondon england
Conwy,http://www.conwy.gov.uk/en/Resident/Planning-Building-Control-and-Conservation/Planning-Applications/Planning-Explorer.aspx,wales
Cornwall,http://planning.cornwall.gov.uk/online-applications/search.do?action=advanced,england
Croydon,http://publicaccess2.croydon.gov.uk/online-applications/search.do?action=advanced,londonboroughs london
Cornwall,http://planning.cornwall.gov.uk/online-applications/search.do?action=advanced,england
Comhairle nan Eilean Siar,http://planning.cne-siar.gov.uk/PublicAccess/search.do?action=advanced,scotland
County Durham,https://publicaccess.durham.gov.uk/online-applications/search.do?action=advanced,england
Craven,https://publicaccess.cravendc.gov.uk/online-applications/search.do?action=advanced,england
Darlington,https://publicaccess.darlington.gov.uk/online-applications/search.do?action=advanced,england
Doncaster,https://planning.doncaster.gov.uk/online-applications/search.do?action=advanced,england southyorkshire
Dumfries and Galloway,https://eaccess.dumgal.gov.uk/online-applications/search.do?action=advanced,scotland
Ealing,https://pam.ealing.gov.uk/online-applications/search.do?action=advanced,londonboroughs london
East Riding of Yorkshire,https://newplanningaccess.eastriding.gov.uk/newplanningaccess/search.do?action=advanced,england
East Lothian,https://pa.eastlothian.gov.uk/online-applications/search.do?action=advanced,scotland
East Lindsey,https://publicaccess.e-lindsey.gov.uk/online-applications/search.do?action=advanced,england
Edinburgh,https://citydev-portal.edinburgh.gov.uk/idoxpa-web/search.do?action=advanced,scotland
Enfield,https://planningandbuildingcontrol.enfield.gov.uk/online-applications/search.do?action=advanced,londonboroughs london
Epsom and Ewell,http://eplanning.epsom-ewell.gov.uk/online-applications/search.do?action=advanced,surrey england
Fife,https://planning.fife.gov.uk/online/search.do?action=advanced,scotland
Glasgow,https://publicaccess.glasgow.gov.uk/online-applications/search.do?action=advanced,scotland
Greenwich,https://planning.royalgreenwich.gov.uk/online-applications/search.do?action=advanced,london innerlondon southlondon england londonboroughs
Hackney,http://planning.hackney.gov.uk/Northgate/PlanningExplorer/generalsearch.aspx,londonboroughs london
@@ -46,6 +57,7 @@ Haringey,http://www.planningservices.haringey.gov.uk/portal/servlets/Application
Harrow,http://www.harrow.gov.uk/planningsearch/lg/plansearch.page?org.apache.shale.dialog.DIALOG_NAME=planningsearch&Param=lg.Planning&searchType=detailed,london londonboroughs
Havering,http://development.havering.gov.uk/OcellaWeb/planningSearch,london londonboroughs eastlondon outerlondon
Hillingdon,http://planning.hillingdon.gov.uk/OcellaWeb/planningSearch,london londonboroughs westlondon outerlondon
Highland,https://wam.highland.gov.uk/wam/search.do?action=advanced,scotland
Hounslow,http://planning.hounslow.gov.uk/planning_search.aspx,london londonboroughs
Hull,https://www.hullcc.gov.uk/padcbc/publicaccess-live/search.do?action=advanced,england
Islington,http://planning.islington.gov.uk/northgate/planningexplorer/generalsearch.aspx,londonboroughs london
@@ -62,8 +74,10 @@ Merton,http://planning.merton.gov.uk/Northgate/PlanningExplorerAA/GeneralSearch.
Milton Keynes,https://publicaccess2.milton-keynes.gov.uk/online-applications/search.do?action=advanced,england
Newcastle upon Tyne,https://publicaccessapplications.newcastle.gov.uk/online-applications/search.do?action=advanced,england tyneandwear
Newham,https://pa.newham.gov.uk/online-applications/search.do?action=advanced,londonboroughs london londonboroughs
North Ayrshire,https://www.eplanning.north-ayrshire.gov.uk/OnlinePlanning/search.do?action=advanced,scotland
North East Lincolnshire,http://planninganddevelopment.nelincs.gov.uk/online-applications/search.do?action=advanced,england
North Lincolnshire,http://www.planning.northlincs.gov.uk/plan/search/,england
North Norfolk,https://idoxpa.north-norfolk.gov.uk/online-applications/search.do?action=advanced,england
North Somerset,https://planning.n-somerset.gov.uk/online-applications/search.do?action=advanced,england
Northern Ireland,http://epicpublic.planningni.gov.uk/publicaccess/search.do?action=advanced&searchType=Application,northernireland belfast
North Tyneside,https://idoxpublicaccess.northtyneside.gov.uk/online-applications/search.do?action=advanced,england tyneandwear
@@ -75,10 +89,13 @@ Peterborough,https://planpa.peterborough.gov.uk/online-applications/search.do?ac
Plymouth,https://planning.plymouth.gov.uk/online-applications/search.do?action=advanced,england
Poole,https://boppa.poole.gov.uk/online-applications/search.do?action=advanced,england
Portsmouth,http://publicaccess.portsmouth.gov.uk/online-applications/search.do?action=advanced,england
Purbeck,https://planningsearch.purbeck-dc.gov.uk/PlanAppSrch.aspx,england
Redbridge,http://planning.redbridge.gov.uk/swiftlg/apas/run/wphappcriteria.display,london londonboroughs
Richmond,http://www2.richmond.gov.uk/PlanData2/Planning_Report.aspx,london londonboroughs
Rhondda Cynon Taff,https://planningonline.rctcbc.gov.uk/online-applications/search.do?action=advanced,wales
Rochdale,http://publicaccess.rochdale.gov.uk/online-applications/search.do?action=advanced,england greatermanchester
Rutland,https://publicaccess.rutland.gov.uk/online-applications/search.do?action=advanced,england
Scottish Government S36,http://www.energyconsents.scot/ApplicationSearch.aspx?T=2,scotland
Salford,http://publicaccess.salford.gov.uk/publicaccess/search.do?action=advanced,england greatermanchester
Sefton,https://pa.sefton.gov.uk/online-applications/search.do?action=advanced,england merseyside liverpoolcityregion
Sheffield,https://planningapps.sheffield.gov.uk/online-applications/search.do?action=advanced,england
@@ -87,6 +104,7 @@ Southend-on-Sea,https://publicaccess.southend.gov.uk/online-applications/search.
South Downs,https://planningpublicaccess.southdowns.gov.uk/online-applications/search.do?action=advanced,nationalparks england
Southampton,https://planningpublicaccess.southampton.gov.uk/online-applications/search.do?action=advanced,england
South Gloucestershire,https://developments.southglos.gov.uk/online-applications/search.do?action=advanced,england
South Lanarkshire,https://publicaccess.southlanarkshire.gov.uk/online-applications/search.do?action=advanced,scotland
South Tyneside,http://planning.southtyneside.info/Northgate/PlanningExplorer/GeneralSearch.aspx,england tyneandwear
Southwark,https://planning.southwark.gov.uk/online-applications/search.do?action=advanced,londonboroughs london
St. Helens,https://publicaccess.sthelens.gov.uk/online-applications/search.do?action=advanced,england merseyside liverpoolcityregion


+ 6
- 0
lib/uk_planning_scraper/authority_scrape_params.rb

@@ -51,6 +51,12 @@ module UKPlanningScraper
self
end
def council_reference(s)
check_class(s, String)
@scrape_params[:council_reference] = s.strip
self
end
def applicant_name(s)
unless system == 'idox'
raise NoMethodError.new("applicant_name is only implemented for Idox. \
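
The new `council_reference` setter follows the same chainable pattern as the other scrape parameters: validate the type, store the stripped value, return `self`. A minimal sketch of that pattern, assuming a hypothetical `DemoAuthority` class and a guessed `check_class` helper (the gem's real helper is not shown in this diff and may behave differently):

```ruby
# Hypothetical stand-in for the gem's Authority class; only the setter is modelled.
class DemoAuthority
  attr_reader :scrape_params

  def initialize
    @scrape_params = {}
  end

  # Mirrors the method added in this commit: type check, strip, return self.
  def council_reference(s)
    check_class(s, String)
    @scrape_params[:council_reference] = s.strip
    self
  end

  private

  # Assumed behaviour: raise unless the argument is of the expected class.
  def check_class(param, expected)
    return if param.is_a?(expected)

    raise TypeError, "Expected #{expected}, got #{param.class}"
  end
end

# Returning self is what lets the spec chain named(...).council_reference(...).scrape.
authority = DemoAuthority.new.council_reference('  BH2017/04225 ')
puts authority.scrape_params[:council_reference]
```

Because the value is stripped before it is stored, surrounding whitespace in a pasted reference should not affect the search.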


+ 74
- 57
lib/uk_planning_scraper/idox.rb

@@ -41,6 +41,7 @@ module UKPlanningScraper
form.send(:"date(applicationDecisionStart)", params[:decided_from].strftime(date_format)) if params[:decided_from]
form.send(:"date(applicationDecisionEnd)", params[:decided_to].strftime(date_format)) if params[:decided_to]

form.send(:"searchCriteria\.reference", params[:council_reference])
form.send(:"searchCriteria\.description", params[:keywords])
# Some councils don't have the applicant name on their form, eg Bexley
@@ -115,68 +116,84 @@ module UKPlanningScraper
if res.code == '200' # That's a String not an Integer, ffs
# Parse the summary tab for this app

app.scraped_at = Time.now

# The Documents tab doesn't show if there are no documents (we get li.nodocuments instead)
# Bradford has #tab_documents but without the document count on it
app.documents_count = 0

if documents_link = res.at('.associateddocument a')
if documents_link.inner_text.match(/\d+/)
app.documents_count = documents_link.inner_text.match(/\d+/)[0].to_i
app.documents_url = base_url + documents_link[:href]
end
elsif documents_link = res.at('#tab_documents')
if documents_link.inner_text.match(/\d+/)
app.documents_count = documents_link.inner_text.match(/\d+/)[0].to_i
app.documents_url = base_url + documents_link[:href]
end
end
# We need to find values in the table by using the th labels.
# The row indexes/positions change from site to site (or even app to app) so we can't rely on that.

res.search('#simpleDetailsTable tr').each do |row|
key = row.at('th').inner_text.strip
value = row.at('td').inner_text.strip
case key
when 'Reference'
app.council_reference = value
when 'Alternative Reference'
app.alternative_reference = value unless value.empty?
when 'Planning Portal Reference'
app.alternative_reference = value unless value.empty?
when 'Application Received'
app.date_received = Date.parse(value) if value.match(/\d/)
when 'Application Registered'
app.date_received = Date.parse(value) if value.match(/\d/)
when 'Application Validated'
app.date_validated = Date.parse(value) if value.match(/\d/)
when 'Address'
app.address = value unless value.empty?
when 'Proposal'
app.description = value unless value.empty?
when 'Status'
app.status = value unless value.empty?
when 'Decision'
app.decision = value unless value.empty?
when 'Decision Issued Date'
app.date_decision = Date.parse(value) if value.match(/\d/)
when 'Appeal Status'
app.appeal_status = value unless value.empty?
when 'Appeal Decision'
app.appeal_decision = value unless value.empty?
else
puts "Error: key '#{key}' not found"
end # case
end # each row
parse_summary(app, res)
else
puts "Error: HTTP #{res.code}"
end # if
end # scrape summary tab for apps

if apps == [] && params[:council_reference] && page.at_css('.addressCrumb')
app = Application.new
app.council_reference = params[:council_reference]
parse_summary(app, page)
apps << app
end # direct hit
apps
end # scrape_idox

def parse_summary(app, res)
base_url = @url.match(/(https?:\/\/.+?)\//)[1]

app.scraped_at = Time.now

unless app.info_url
key_val = res.link_with(id: 'tab_summary')&.href
app.info_url = "#{base_url}#{key_val}"
end

# The Documents tab doesn't show if there are no documents (we get li.nodocuments instead)
# Bradford has #tab_documents but without the document count on it
app.documents_count = 0

if documents_link = res.at('.associateddocument a')
if documents_link.inner_text.match(/\d+/)
app.documents_count = documents_link.inner_text.match(/\d+/)[0].to_i
app.documents_url = base_url + documents_link[:href]
end
elsif documents_link = res.at('#tab_documents')
if documents_link.inner_text.match(/\d+/)
app.documents_count = documents_link.inner_text.match(/\d+/)[0].to_i
app.documents_url = base_url + documents_link[:href]
end
end
# We need to find values in the table by using the th labels.
# The row indexes/positions change from site to site (or even app to app) so we can't rely on that.
res.search('#simpleDetailsTable tr').each do |row|
key = row.at('th').inner_text.strip
value = row.at('td').inner_text.strip
case key
when 'Reference'
app.council_reference = value
when 'Alternative Reference'
app.alternative_reference = value unless value.empty?
when 'Planning Portal Reference'
app.alternative_reference = value unless value.empty?
when 'Application Received'
app.date_received = Date.parse(value) if value.match(/\d/)
when 'Application Registered'
app.date_received = Date.parse(value) if value.match(/\d/)
when 'Application Validated'
app.date_validated = Date.parse(value) if value.match(/\d/)
when 'Address'
app.address = value unless value.empty?
when 'Proposal'
app.description = value unless value.empty?
when 'Status'
app.status = value unless value.empty?
when 'Decision'
app.decision = value unless value.empty?
when 'Decision Issued Date'
app.date_decision = Date.parse(value) if value.match(/\d/)
when 'Appeal Status'
app.appeal_status = value unless value.empty?
when 'Appeal Decision'
app.appeal_decision = value unless value.empty?
else
puts "Error: key '#{key}' not found"
end # case
end
end
end # class
end
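
The extracted `parse_summary` relies on three small techniques: deriving the base URL with a lazy regex, reading the document count from a tab label, and dispatching on the row label rather than the row position. The sketch below reproduces them with the standard library only. `DemoApp`, the helper names, and the sample rows are all hypothetical; the real method reads `th`/`td` cells from the parsed page, handles more labels, and keeps `Application Received` and `Application Registered` as separate `when` branches.

```ruby
require 'date'

# Hypothetical container; the gem's Application class has more fields.
DemoApp = Struct.new(:council_reference, :date_received, :date_decision,
                     :status, :documents_count)

# Same pattern as in the diff: capture scheme and host up to the first path slash.
def base_url_for(url)
  url.match(/(https?:\/\/.+?)\//)[1]
end

# Pull the first run of digits out of a tab label, defaulting to 0.
def documents_count_from(label)
  m = label.match(/\d+/)
  m ? m[0].to_i : 0
end

# rows is an array of [label, value] pairs standing in for the th/td cells.
def apply_rows(app, rows)
  rows.each do |key, value|
    case key
    when 'Reference'
      app.council_reference = value
    when 'Application Received', 'Application Registered'
      # Only parse when the cell contains a digit, so empty cells stay nil.
      app.date_received = Date.parse(value) if value.match(/\d/)
    when 'Decision Issued Date'
      app.date_decision = Date.parse(value) if value.match(/\d/)
    when 'Status'
      app.status = value unless value.empty?
    else
      warn "Unrecognised key '#{key}'"
    end
  end
  app
end

# Made-up sample values, not taken from any council site.
sample_rows = [
  ['Reference', 'EX/2020/0001'],
  ['Application Received', 'Mon 18 Dec 2017'],
  ['Decision Issued Date', ''],
  ['Status', 'Withdrawn']
]

app = apply_rows(DemoApp.new, sample_rows)
app.documents_count = documents_count_from('Documents (12)')

puts base_url_for('https://publicaccess.aberdeencity.gov.uk/online-applications/search.do?action=advanced')
puts app.date_received
puts app.date_decision.inspect
puts app.documents_count
```

Keying on the label text is what allows the same parser to serve both the search-results loop and the new direct-hit branch, since neither depends on where a row sits in the table.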

+ 18
- 17
spec/authority_spec.rb

@@ -1,13 +1,11 @@
require 'spec_helper'

describe UKPlanningScraper::Authority do

describe '#named' do

let(:authority) { described_class.named(authority_name) }
subject(:authority) { UKPlanningScraper::Authority.named(name) }

context 'when authority exists' do
let(:authority_name) { 'Westminster' }
let(:name) { 'Westminster' }

it 'returns an authority' do
expect(authority).to be_a(UKPlanningScraper::Authority)
@@ -15,7 +13,7 @@ describe UKPlanningScraper::Authority do
end

context 'when authority does not exist' do
let(:authority_name) { 'Westmonster' }
let(:name) { 'Westmonster' }

it 'raises an error' do
expect { authority }.to raise_error(UKPlanningScraper::AuthorityNotFound)
@@ -24,11 +22,10 @@ describe UKPlanningScraper::Authority do
end

describe '#all' do
let(:all) { UKPlanningScraper::Authority.all }

let(:all) { described_class.all }

it 'returns all authorities' do
expect(all.count).to eq(112)
it 'returns more than 100 authorities' do
expect(all.count).to be > 100
end

it 'returns a list of authorities' do
@@ -40,18 +37,22 @@ describe UKPlanningScraper::Authority do
end

describe '#tagged' do
let (:tagged_london) { described_class.tagged('london') }
let (:authority) { UKPlanningScraper::Authority.tagged(tag) }

context 'when tagged london' do
let(:tag) { 'london' }

it 'returns all London authorities' do
expect(tagged_london.count).to eq(35)
it 'returns all 35 London authorities' do
expect(authority.count).to eq(35)
end
end

let (:tagged_londonboroughs) { described_class.tagged('londonboroughs') }
context 'when tagged londonboroughs' do
let(:tag) { 'londonboroughs' }

it 'returns all London boroughs' do
expect(tagged_londonboroughs.count).to eq(32)
it 'returns all 32 London boroughs' do
expect(authority.count).to eq(32)
end
end

end

end

+ 40
- 0
spec/council_reference_spec.rb

@@ -0,0 +1,40 @@
require 'spec_helper'

describe UKPlanningScraper::Authority do
describe 'named+council_reference scrape' do
let(:scraper) { UKPlanningScraper::Authority.named(authority_name).council_reference(council_reference) }

context 'for an existing idox planning reference' do
let(:authority_name) { 'Brighton and Hove' }
let(:council_reference) { 'BH2017/04225' }
subject(:apps) {
VCR.use_cassette("#{self.class.description}") {
scraper.scrape
}
}

it 'returns an app (in the apps array)' do
expect(apps.any?).to be_truthy
end

it 'has a status of Withdrawn' do
expect(apps.first[:status]).to eql('Withdrawn')
end
end

context 'for a non-existant idox planning reference' do
let(:authority_name) { 'Brighton and Hove' }
let(:council_reference) { 'XYZ123' }
subject(:apps) {
VCR.use_cassette("#{self.class.description}") {
scraper.scrape
}
}

it 'returns an empty apps array' do
expect(apps.empty?).to be_truthy
end
end
end

end

+ 139
- 0
spec/vcr_cassettes/for_a_non-existant_idox_planning_reference.yml
Diff not shown because it is too large


+ 139
- 0
spec/vcr_cassettes/for_an_existing_idox_planning_reference.yml
Diff not shown because it is too large

