Jun 7, 2012

Parsing XML with Google App Engine: Python

This is a continuation of my Python 2.7 and Google App Engine series. If you are just starting out, I suggest you read Getting Started and First App first.

In this example we are going to read an XML document, parse it, and then display it as an HTML table.

The XML document

The following code example came from a job where I was interfacing a web application with GeoOP, a dispatch and mobile workforce management system. They have an API, but I have 'sanitized' my code to protect IP. It is a really good system, and if you need something like it I suggest you pop on over and take a peek.

So that everyone else can use the code, I've modified it to read a publicly available XML source listed on http://data.gov.au/data/. Specifically, I am using the list of ABC local radio stations.
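I have not reproduced the feed here, but going by the tag names the code further down looks up (state, station, stationname, town and website-url), the document is shaped roughly like the made-up sample in this sketch. The root element name, the station values and the helper names text_of and list_stations are my own assumptions for illustration; only the five tag names come from the real code.

```python
from xml.dom.minidom import parseString

# Hypothetical sample: tag names match the handler code, values are invented
SAMPLE_XML = """<stations>
  <state>
    <station>
      <stationname>Example FM</stationname>
      <town>Exampleville</town>
      <website-url>http://example.com/fm</website-url>
    </station>
    <station>
      <stationname>Sample AM</stationname>
      <town></town>
      <website-url>http://example.com/am</website-url>
    </station>
  </state>
</stations>"""


def text_of(node, tag):
    """Return the text of the first <tag> under node, or '' if missing or empty."""
    elements = node.getElementsByTagName(tag)
    if not elements or elements[0].firstChild is None:
        return ""
    return elements[0].firstChild.data


def list_stations(xml_text):
    """Return a (stationname, town, website) tuple for every <station>."""
    dom = parseString(xml_text)
    rows = []
    for station in dom.getElementsByTagName("station"):
        rows.append((
            text_of(station, "stationname"),
            text_of(station, "town"),
            text_of(station, "website-url"),
        ))
    return rows
```

Calling list_stations(SAMPLE_XML) gives one tuple per station. The empty town element comes back as an empty string instead of raising an error, which is worth having if the real feed has gaps.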

The code

I am not going to explain how to set up a project environment, since I covered that in earlier blog posts (see the first paragraph). Create a new Python file (make sure your YAML configuration file is set up correctly) and add the following:

# The webapp2 framework
import webapp2

# The URL Fetch library
from google.appengine.api.urlfetch import fetch, DownloadError

# The minidom library for XML parsing
from xml.dom.minidom import parseString

# Raised by minidom when the XML is malformed
from xml.parsers.expat import ExpatError

# Fetches an XML document and parses it
class MainPage(webapp2.RequestHandler):
    # Respond to an HTTP GET request
    def get(self):
        # A try/except statement
        try:
            # Grabs the XML (fetch returns a response object, not a URL)
            result = fetch('http://www.abc.net.au/local/data/public/stations/abc-local-radio.xml')

            # Parses the document
            xml = parseString(result.content)

            # Builds the table rows before anything is written out
            rows = outputHTML(xml)

        # Our exception code: the fetch failed or the XML could not be parsed
        except (DownloadError, ExpatError, TypeError, ValueError):
            self.response.out.write("<html><body><p>Invalid inputs</p></body></html>")
            return

        # Sets up the webpage
        self.response.out.write("<html><body><table>")

        # Outputs the table
        self.response.out.write(rows)

        # Closes the webpage
        self.response.out.write("</table></body></html>")

# Output the XML in an HTML-friendly manner
def outputHTML(xml):
    # Get the list of states
    states = xml.getElementsByTagName("state")

    # Our return string
    outputString = ""

    # Cycle through the states
    for state in states:
        # Get the stations and cycle through them
        stations = state.getElementsByTagName("station")
        for station in stations:
            # Grab data from the station element
            stationname = station.getElementsByTagName("stationname")[0].firstChild.data
            town = station.getElementsByTagName('town')[0].firstChild.data
            website = station.getElementsByTagName('website-url')[0].firstChild.data

            # Append the data onto the string. minidom already returns unicode,
            # and str() would fail on non-ASCII text in Python 2.7
            row = "<tr><td>" + stationname + "</td><td>" + town + "</td><td>" + website + "</td></tr>"

            outputString += row

    # Output string
    return outputString

# Create our application instance that maps the root to our
# MainPage handler
app = webapp2.WSGIApplication([('/', MainPage)], debug=True)

If you set it up correctly then you should see a list of ABC radio stations!
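If you want to poke at the parsing outside App Engine first, here is a minimal sketch that runs with nothing but the standard library. It differs from the handler in two ways that are my own additions, not part of the original job code: it escapes each value before putting it into a table cell, and it returns None when the XML is malformed (minidom's parseString raises xml.parsers.expat.ExpatError in that case). The function name station_rows is hypothetical.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError
from xml.sax.saxutils import escape


def station_rows(xml_text):
    """Return HTML table rows for every <station>, or None if the XML is malformed."""
    try:
        dom = parseString(xml_text)
    except ExpatError:
        return None

    rows = []
    for station in dom.getElementsByTagName("station"):
        cells = []
        for tag in ("stationname", "town", "website-url"):
            found = station.getElementsByTagName(tag)
            has_text = bool(found) and found[0].firstChild is not None
            text = found[0].firstChild.data if has_text else ""
            # escape() turns &, < and > into entities so feed text cannot break the page
            cells.append("<td>" + escape(text) + "</td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "".join(rows)
```

Feed it any string containing station elements and wrap the result in table tags, just like the handler does. A None result means the XML did not parse, so you can show an error page instead of half a table.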
