Jun 26, 2012

Handling HTTP GET requests with webapp2 and Google App Engine: Python

This is a continuation of my Python 2.7 and Google App Engine series. This post builds on the code from my previous posts URL Routing and Cron and Datastore in Google App Engine: Python, which in turn build on my earlier work. If you don't understand parts of the code, I highly suggest you browse my earlier posts to see why I made some of the design decisions.

A brief overview...

For those who are diving straight in, let me explain the old code and how I will update it:

I have a script, feed.py, that I have mapped in app.yaml. A cron job (configured in cron.yaml) connects to my Twitter account, converts my status updates into an RSS feed, and stores that feed in a Google Datastore entity.

The feed script reads that Datastore entity and displays it. Another script, entity.py, defines the Datastore model.

We will now extend the system so that it can convert multiple Twitter accounts into RSS feeds. To choose which feed to display, we will pass the account name as a query parameter in an HTTP GET request (for example /?account=almightyolive).
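In webapp2, self.request.get('account') returns the value of the account query-string parameter, or an empty string if it is missing. The following is a minimal stdlib-only sketch of that lookup, so you can see what happens to a URL such as /?account=founding. get_param is a hypothetical helper, not part of webapp2, and the import fallback lets it run on Python 2.7 or 3.

```python
# Python 3 location of the URL helpers, with a Python 2.7 fallback
try:
    from urllib.parse import urlsplit, parse_qs
except ImportError:
    from urlparse import urlsplit, parse_qs


def get_param(url, name, default=""):
    """Hypothetical helper mimicking webapp2's request.get(name).

    Returns the first value of the query-string parameter `name`,
    or `default` when the parameter is missing or blank.
    """
    values = parse_qs(urlsplit(url).query).get(name)
    return values[0] if values else default


print(get_param("http://localhost:8080/?account=founding", "account"))  # founding
```

Blank values such as /?account= are treated like a missing parameter here, because parse_qs drops blank values by default.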

The main application

We will now rewrite feed.py. This script is our controller: it receives HTTP requests and maps them to handler classes, which in turn call other functions to perform the required tasks.

# Standard library logging, used to record failed cron refreshes
import logging

# The webapp2 framework
import webapp2

# Our entity library
import entity

# Our XML2RSS library
import XML2RSS

# The Twitter accounts whose feeds the cron job refreshes
ACCOUNTS = ["almightyolive", "founding", "ABCNews24", "SBSNews"]

# Refreshes the stored RSS feed for every account (triggered by cron)
class Cron(webapp2.RequestHandler):
    # Respond to an HTTP GET request
    def get(self):
        for account in ACCOUNTS:
            # Refresh each account separately so that one failure does
            # not stop the remaining feeds from being updated
            try:
                XML2RSS.getTweets(account)
            except Exception:
                logging.exception("Could not refresh the feed for %s", account)

# Serves the stored RSS feed for the account named in the query string
class MainPage(webapp2.RequestHandler):
    # Respond to an HTTP GET request such as /?account=almightyolive
    def get(self):
        # request.get() returns '' when the parameter is missing
        account = self.request.get('account')

        if not account:
            self.response.set_status(400)
            self.response.out.write("<html><body><p>Invalid inputs: no account given</p></body></html>")
            return

        # The entity's key name is the account name (see XML2RSS.getTweets)
        feed = entity.Rss.get_by_key_name(account)

        if feed is None:
            self.response.set_status(404)
            self.response.out.write("<html><body><p>No feed stored for that account</p></body></html>")
            return

        # Outputs the RSS with an RSS content type
        self.response.headers['Content-Type'] = 'application/rss+xml'
        self.response.out.write(feed.content)

# Create our application instance that maps the root to our
# MainPage handler and /cron to our Cron handler
app = webapp2.WSGIApplication([('/', MainPage), ('/cron', Cron)], debug=True)
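The routing idea behind webapp2.WSGIApplication can be illustrated without App Engine at all. The sketch below is a plain-WSGI analogue, not webapp2's implementation: a dictionary maps paths to handler functions, and a hypothetical in-memory FEEDS dictionary stands in for the Datastore. For illustration it answers a missing account with 400 and an unknown one with 404. The call helper exercises the app directly, without starting a server.

```python
from wsgiref.util import setup_testing_defaults

# Python 3 location of parse_qs, with a Python 2.7 fallback
try:
    from urllib.parse import parse_qs
except ImportError:
    from urlparse import parse_qs

# Hypothetical in-memory stand-in for the Datastore: account -> RSS text
FEEDS = {"founding": "<rss version='2.0'><channel/></rss>"}


def main_page(environ):
    """Return (status, content type, body) for /?account=..., like MainPage."""
    account = parse_qs(environ.get("QUERY_STRING", "")).get("account", [""])[0]
    if not account:
        return "400 Bad Request", "text/html", "<p>Invalid inputs: no account given</p>"
    feed = FEEDS.get(account)
    if feed is None:
        return "404 Not Found", "text/html", "<p>No feed stored for that account</p>"
    return "200 OK", "application/rss+xml", feed


# Route table: exact path -> handler function (cf. [('/', MainPage), ...])
ROUTES = {"/": main_page}


def app(environ, start_response):
    """Minimal WSGI application that dispatches on PATH_INFO."""
    handler = ROUTES.get(environ.get("PATH_INFO", "/"))
    if handler is None:
        status, ctype, body = "404 Not Found", "text/plain", "Not found"
    else:
        status, ctype, body = handler(environ)
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", ctype),
                            ("Content-Length", str(len(data)))])
    return [data]


def call(path, query=""):
    """Invoke the app with a fake request; returns (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    environ["QUERY_STRING"] = query
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


print(call("/", "account=founding")[0])  # 200 OK
```

webapp2's own routes are more flexible than this exact-match dictionary, but for two fixed paths like / and /cron the dispatch works the same way.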

The XML2RSS script

As you may have noticed, feed.py refers to an XML2RSS module. This is a separate script that wraps the conversion of Twitter's XML into RSS in easy-to-call functions. Create a new file called XML2RSS.py and add the following:

# The minidom library for XML parsing
from xml.dom.minidom import parseString

# Escapes &, < and > so arbitrary text is safe inside XML
from xml.sax.saxutils import escape

# The URL Fetch library
from google.appengine.api import urlfetch

# Our entity library
import entity

# Detects URLs, @mentions and #hashtags and wraps them in (entity-encoded) link tags
def linkify(text):
    # If http is present, add the link tag
    if "http" in text:
        text = "&lt;a href='" + text + "'&gt;" + text + "&lt;/a&gt;"
    # @mentions link to the user's Twitter page
    elif "@" in text:
        text = "&lt;a href='http://twitter.com/#!/" + text.split("@")[1] + "'&gt;" + text + "&lt;/a&gt;"
    # #hashtags link to a Twitter search
    elif "#" in text:
        text = "&lt;a href='https://twitter.com/#!/search/%23" + text.split("#")[1] + "'&gt;" + text + "&lt;/a&gt;"

    return text

# Convert the parsed Twitter XML into an RSS 2.0 document
def outputRSS(xml, account):
    # Get the list of status elements
    statuses = xml.getElementsByTagName("status")

    # Escape the account name once for use inside the XML
    safeAccount = escape(account)

    # Our return string (the channel link uses the requested account, not a hard-coded one)
    outputString = "<?xml version='1.0'?>\n<rss version='2.0'>\n\t<channel>\n\t\t<title>Twitter: " + safeAccount + "</title>\n\t\t"
    outputString += "<link>https://twitter.com/#!/" + safeAccount + "</link>\n\t\t<description>The twitter feed for " + safeAccount + "</description>"

    # Cycle through the statuses
    for status in statuses:
        # Get the fields we need from each status
        text = status.getElementsByTagName("text")[0].firstChild.data
        date = status.getElementsByTagName("created_at")[0].firstChild.data
        tweet = status.getElementsByTagName("id")[0].firstChild.data

        # Escape each word for XML (minidom has decoded entities such as &amp;), then insert links
        words = [linkify(escape(word)) for word in text.split()]

        # Recompile words
        text = " ".join(words)

        # Creates our output (no str() calls: they fail on non-ASCII tweets in Python 2)
        string = "\n\t\t<item>\n\t\t\t<title>" + escape(date) + "</title>\n\t\t\t<link>https://twitter.com/" + safeAccount + "/status/" + tweet + "</link>\n\t\t\t<description>" + text + "</description>\n\t\t</item>"
        outputString += string

    # Close the channel and the RSS document
    outputString += "\n\t</channel>\n</rss>"
    return outputString

# Fetch an account's timeline, convert it to RSS and store it
def getTweets(account):
    # Grabs the XML
    result = urlfetch.fetch('https://api.twitter.com/1/statuses/user_timeline.xml?screen_name=' + account + '&count=10&trim_user=true')

    # Stop here if Twitter did not return the timeline
    if result.status_code != 200:
        raise ValueError("Twitter returned HTTP %d for %s" % (result.status_code, account))

    # Parses the document
    xml = parseString(result.content)

    # Converts the XML into RSS
    content = outputRSS(xml, account)

    # Our RSS storage entity, keyed by the account name
    rssStore = entity.Rss(key_name=account)

    # Elements of our RSS
    rssStore.feed = account
    rssStore.content = content

    # Stores our RSS Feed into the datastore (replacing any previous copy for this account)
    rssStore.put()
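Because minidom decodes entities when it parses the timeline, a tweet containing Q&amp;A comes back as the raw text Q&A; writing that straight into the feed would produce invalid XML, so text taken from the timeline should be escaped before it goes into the RSS. The sketch below reproduces that parse-then-escape round trip without App Engine. SAMPLE is a made-up document that only mimics the status, text, created_at and id elements outputRSS reads, and build_rss is a hypothetical, simplified cut-down of outputRSS (no linkify, minimal channel).

```python
from xml.dom.minidom import parseString
from xml.sax.saxutils import escape

# Made-up sample that mimics only the elements outputRSS reads
SAMPLE = """<?xml version='1.0'?>
<statuses>
  <status>
    <created_at>Tue Jun 26 10:00:00 +0000 2012</created_at>
    <id>123</id>
    <text>Q&amp;A on #appengine with @founding</text>
  </status>
</statuses>"""


def node_text(parent, tag):
    """Concatenate the text nodes of the first <tag> element under parent."""
    node = parent.getElementsByTagName(tag)[0]
    return "".join(n.data for n in node.childNodes if n.nodeType == n.TEXT_NODE)


def build_rss(xml_text, account):
    """Hypothetical, simplified outputRSS: parse statuses, escape, emit RSS."""
    doc = parseString(xml_text)
    items = []
    for status in doc.getElementsByTagName("status"):
        text = node_text(status, "text")  # entities already decoded: "Q&A ..."
        date = node_text(status, "created_at")
        link = "https://twitter.com/%s/status/%s" % (account, node_text(status, "id"))
        items.append("<item><title>%s</title><link>%s</link>"
                     "<description>%s</description></item>"
                     % (escape(date), escape(link), escape(text)))
    return ("<?xml version='1.0'?><rss version='2.0'><channel>"
            "<title>Twitter: %s</title>%s</channel></rss>"
            % (escape(account), "".join(items)))


rss = build_rss(SAMPLE, "founding")
feed = parseString(rss)  # raises an ExpatError if rss were not well-formed
print(node_text(feed, "description"))  # Q&A on #appengine with @founding
```

Removing the escape(text) call and re-running shows the failure mode: parseString(rss) then raises an ExpatError on the bare ampersand.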

The pieces to make it all work

If you have been following on from my previous work, then you should already have most of this code. I won't bother explaining it here because it is mostly self-explanatory.

app.yaml:
application: almightynassar
version: 1
runtime: python27
api_version: 1
threadsafe: yes

handlers:
- url: /cron
  script: feed.app
  login: admin
 
- url: /.*
  script: feed.app
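App Engine checks the handlers in app.yaml from top to bottom and uses the first one whose url pattern (a regular expression) matches the request path, which is why the /cron entry with login: admin sits above the catch-all /.* entry. Below is a small sketch of that first-match rule with the two entries above copied into a Python list. It models only the matching, not App Engine's login enforcement, and it assumes each pattern must match the whole path.

```python
import re

# The two handler entries from app.yaml above, in file order
HANDLERS = [
    {"url": "/cron", "script": "feed.app", "login": "admin"},
    {"url": "/.*", "script": "feed.app", "login": None},
]


def match_handler(path):
    """Return the first handler whose url pattern matches the entire path."""
    for handler in HANDLERS:
        # \Z anchors the end, so "/cron" does not also match "/cronjob"
        if re.match(handler["url"] + r"\Z", path):
            return handler
    return None


print(match_handler("/cron")["login"])  # admin
print(match_handler("/")["login"])  # None
```

The query string is not part of the path, so a request for /?account=founding is matched on / alone and falls through to the catch-all entry.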

cron.yaml:

cron:
- description: hourly Twitter feed refresh
  url: /cron
  schedule: every 1 hours

entity.py:

# Our datastore interface
from google.appengine.ext import db

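# Our RSS entity object; its key name is the Twitter account name
class Rss(db.Model):
    # The Twitter account this feed belongs to
    feed = db.StringProperty()
    # The generated RSS document (a TextProperty, since the feed can be long; text properties are not indexed)
    content = db.TextProperty()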

And that's it! You now have a fully functional application that just uses the webapp2 framework!

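The development server does not run cron jobs on its own, so first visit http://localhost:8080/cron (signing in as an administrator when prompted) to populate the Datastore. If you then navigate to http://localhost:8080/?account=almightyolive you should see the RSS feed. You can check that your mapping works by navigating to http://localhost:8080/?account=founding; you should see the Founding Institute Twitter account instead!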
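If you would rather check the mapping from a script than by eye, the sketch below fetches a feed from the development server and reports its channel title and number of items. fetch_feed and summarize_feed are hypothetical helpers: fetch_feed assumes the server is running on localhost:8080 as above, while summarize_feed works on any RSS string and can be tried on its own.

```python
from xml.dom.minidom import parseString

# Python 3 locations, with Python 2.7 fallbacks
try:
    from urllib.request import urlopen
    from urllib.parse import urlencode
except ImportError:
    from urllib2 import urlopen
    from urllib import urlencode


def fetch_feed(account, base="http://localhost:8080/"):
    """Fetch /?account=... from the local dev server and return the raw bytes."""
    return urlopen(base + "?" + urlencode({"account": account})).read()


def summarize_feed(rss_text):
    """Return (channel title, number of <item> elements) for an RSS document."""
    channel = parseString(rss_text).getElementsByTagName("channel")[0]
    title = channel.getElementsByTagName("title")[0]
    text = "".join(n.data for n in title.childNodes if n.nodeType == n.TEXT_NODE)
    return text, len(channel.getElementsByTagName("item"))


# With the dev server running: summarize_feed(fetch_feed("founding"))
print(summarize_feed("<rss version='2.0'><channel><title>Twitter: founding</title>"
                     "<item><title>a</title></item></channel></rss>"))
```

Comparing the titles returned for almightyolive and founding confirms that the account parameter is reaching the right stored feed.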

