Podcast feed creator for the Cheam North and Worcester Park Local Committee. Scrapes the committee's webpage and outputs an iTunes-friendly podcast RSS feed.

scrape.rb 1.5 KiB

12 years ago
# Scrape webpage into a podcast RSS feed
# https://www.sutton.gov.uk/index.aspx?articleid=4332
require 'nokogiri'
require 'open-uri'
require 'time'

FEED_TITLE  = "Cheam North and Worcester Park Local Committee"
FEED_IMAGE  = "https://dl.dropbox.com/u/300783/logo.png"
FEED_AUTHOR = "London Borough of Sutton"
FEED_LINK   = "https://www.sutton.gov.uk/index.aspx?articleid=4332"

# Saved copy of the committee page. URI.open handles http(s) URLs and falls
# back to an ordinary file open for local paths like this one (plain
# Kernel#open no longer opens URLs as of Ruby 3.0).
url = "cnwp.html"
doc = Nokogiri::HTML(URI.open(url).read)

meeting = ''  # text of the most recent meeting-date heading
items = []
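body = doc.at("#bodytext") or abort("no #bodytext element found in #{url}")
body.children.each do |node|
  # A node whose text contains a date (e.g. "10 December 2012") starts a new meeting
  if node.inner_text.match?(/\d{1,2}\s+\w+\s+\d{4}/)
    meeting = node.inner_text.strip
  end
  # Collect MP3 links that are direct children of this node
  node.children.each do |subnode|
    # to_s guards against <a> tags that have no href attribute
    next unless subnode.name == 'a' && subnode['href'].to_s.match?(/\.mp3$/i)
    # Skip links that appear before any date heading; Time.parse('') would raise
    next if meeting.empty?
    items << {
      :d => Time.parse(meeting),
      :href => subnode['href'].strip,
      :title => subnode.inner_text.strip
    }
  end
end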
# Build an RSS 2.0 document that declares the iTunes podcast namespace
builder = Nokogiri::XML::Builder.new do |xml|
  xml.rss('xmlns:itunes' => "http://www.itunes.com/dtds/podcast-1.0.dtd",
          :version => "2.0") {
    xml.channel {
      xml.title FEED_TITLE
      xml.link FEED_LINK
      xml['itunes'].image(:href => FEED_IMAGE)
      xml['itunes'].author FEED_AUTHOR
      # One <item> per scraped MP3 link
      items.each do |i|
        xml.item {
          xml.title i[:title]
          xml['itunes'].author FEED_AUTHOR
          # RSS 2.0 also lists length (size in bytes) as a required
          # <enclosure> attribute; this script does not supply it
          xml.enclosure(
            :url => i[:href],
            :type => "audio/mpeg"
          )
          xml.guid i[:href]
          xml.pubDate i[:d].rfc822
        }
      end
    }
  }
end

puts builder.to_xml
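Each episode's pubDate comes from the most recent date heading: the script matches text such as "10 December 2012", passes the whole heading to Time.parse, and formats the result with Time#rfc822. The stdlib-only sketch below isolates that step in a hypothetical pub_date_for helper (not part of scrape.rb) so it can be checked without fetching the page. Time.parse reads a bare date as local midnight, so the UTC offset in the output depends on the machine's time zone.

```ruby
require 'time'

# Same pattern scrape.rb uses to recognise a meeting-date heading
DATE_HEADING = /\d{1,2}\s+\w+\s+\d{4}/

# Hypothetical helper: convert a heading's text into an RFC 822 pubDate
# string, or return nil when the text contains no date in that shape.
def pub_date_for(heading_text)
  return nil unless heading_text.match?(DATE_HEADING)
  # Time.parse picks the date out of surrounding words such as a weekday name
  Time.parse(heading_text.strip).rfc822
end

puts pub_date_for("Monday 10 December 2012")
p pub_date_for("Agenda and minutes")
```

Running the same heading text through this helper is a quick way to check how a new heading format on the page would be dated before it reaches the feed.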
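The enclosure url and guid use each href exactly as it appears on the page. If the page ever links its MP3 files with relative paths (the script does not check this), podcast clients would receive URLs they cannot fetch. A minimal sketch, assuming the hrefs are valid URI references, resolves each one against FEED_LINK with Ruby's standard URI.join; the absolute_href helper is hypothetical and not part of scrape.rb.

```ruby
require 'uri'

FEED_LINK = "https://www.sutton.gov.uk/index.aspx?articleid=4332"

# Hypothetical helper: resolve a scraped href against the page it came from.
# Absolute hrefs come back unchanged; relative ones are joined onto the base.
# URI.join raises URI::InvalidURIError for malformed hrefs (e.g. raw spaces).
def absolute_href(href, base = FEED_LINK)
  URI.join(base, href.strip).to_s
end

puts absolute_href("/audio/meeting.mp3")
puts absolute_href("audio/meeting.mp3")
puts absolute_href("https://example.org/other.mp3")
```

In the scraping loop, `:href => absolute_href(subnode['href'])` would take the place of the plain `subnode['href'].strip`.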