This example will use the code from my previous post 'Convert Twitter stream into an RSS feed'. If you don't understand why I did something, just pop on over to that link and see how I came up with the code originally.
I STRONGLY suggest you look up some of the references, such as how Google App Engine handles cron and Datastore, and some things about entities and keys. Most of the stuff you will need is provided in the references at the end of the post.
What our plan is...
Ok, let me briefly go over what my proposed system will do and how cron and GAE's Datastore fit in. In a previous blog post I created a web app that connects to someone's Twitter feed and converts it into an RSS feed. A problem with this set-up was that there was a massive lag (about 2-3 seconds) on every request while the app downloaded the stream, parsed it, inserted links and output an RSS XML file.
To solve this, I will create a cron script that runs in the background (I will also hide it behind an administrator login so that random users cannot trigger it). This requires me to store the feed in a persistent object, which Datastore conveniently supplies.
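The split between the background job and the front-end can be sketched in plain Python, with a dictionary standing in for Datastore. Everything here (STORE, build_feed, refresh_feed, serve_feed) is hypothetical scaffolding for illustration only; the real code in this post uses App Engine's cron and db APIs instead.

```python
# A dictionary stands in for Datastore in this sketch (assumption: the real
# app persists the feed with google.appengine.ext.db instead).
STORE = {}

# Counts how often the expensive work actually runs
build_calls = 0

def build_feed(screen_name):
    """Stand-in for the slow part: download, parse, linkify, render."""
    global build_calls
    build_calls += 1
    return "<rss><channel><title>%s</title></channel></rss>" % screen_name

def refresh_feed(screen_name):
    """What the cron job does: build the feed once and persist it."""
    STORE[screen_name] = build_feed(screen_name)

def serve_feed(screen_name):
    """What the web handler does: read the stored copy, never rebuild it."""
    return STORE.get(screen_name)

refresh_feed("almightyolive")      # one background run
for _ in range(3):                 # three visitor requests
    serve_feed("almightyolive")
print(build_calls)                 # prints 1
```

The visitor-facing path only ever does a lookup, so the 2-3 second cost is paid once per cron run rather than once per request.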
Now for the code....
The RSS object
We will first create a Python object that we will use to define the objects we store into our database. Just create a file called entity.py and add the following code:
# Our datastore interface
from google.appengine.ext import db

# Our RSS entity object
class Rss(db.Model):
    feed = db.StringProperty()
    content = db.TextProperty()
Note that we import the db object; this is our interface to Datastore. If you want to know more about creating entities, I suggest you read the references provided at the end of the post.
The cron script
This script will do everything we did in our previous blog post, except it will store the feed into our RSS entity and into the Datastore. Create a file called cron.py and insert the following code:
# The minidom library for XML parsing
from xml.dom.minidom import parseString
# The URL Fetch library
from google.appengine.api import urlfetch
# Our entity library
import entity
# Detects if a word is a URL, handle or hashtag and adds the HTML tags
def linkify(text):
    # If http is present, add the link tag
    if "http" in text:
        text = "<a href='" + text + "'>" + text + "</a>"
    # If @ is present, turn it into a twitter handle link
    elif "@" in text:
        text = "<a href='http://twitter.com/#!/" + text.split("@")[1] + "'>" + text
        text += "</a>"
    # Turn into twitter hash tags (split on "#", not "@", to get the tag)
    elif "#" in text:
        text = "<a href='https://twitter.com/#!/search/%23" + text.split("#")[1] + "'>" + text
        text += "</a>"
    return text
# Output the XML into an RSS feed
def outputRSS(xml):
    # Get the status list
    statuses = xml.getElementsByTagName("status")
    # Our return string
    outputString = "<?xml version='1.0'?>\n<rss version='2.0'>\n\t<channel>"
    outputString += "\n\t\t<title>Almightyolive Twitter</title>\n\t\t"
    outputString += "<link>https://twitter.com/#!/almightyolive</link>\n"
    outputString += "\t\t<description>The twitter feed for the Almighty "
    outputString += "Olive</description>"
    # Cycle through the statuses
    for status in statuses:
        # Get the fields of this status
        text = status.getElementsByTagName("text")[0].firstChild.data
        date = status.getElementsByTagName("created_at")[0].firstChild.data
        tweet = status.getElementsByTagName("id")[0].firstChild.data
        # Insert links into the text
        words = text.split()
        for i in range(len(words)):
            words[i] = linkify(words[i])
        # Recompile words
        text = " ".join(words)
        # Creates our output
        string = "\n\t\t<item>\n\t\t\t<title>" + str(date) + "</title>\n"
        string += "\t\t\t<link>https://twitter.com/AlmightyOlive/status/" + tweet
        string += "</link>\n\t\t\t<description>" + str(text) + "</description>\n"
        string += "\t\t</item>"
        outputString += string
    # Output string
    outputString += "\n\t</channel>\n</rss>"
    return outputString
# OUR CRON SCRIPT PROPER!
#
# Grabs the XML
url = urlfetch.fetch('https://api.twitter.com/1/statuses/user_timeline.xml?screen_name=almightyolive&count=10&trim_user=true')
# Parses the document
xml = parseString(url.content)
content = outputRSS(xml)
# Our RSS storage entity
rssStore = entity.Rss(key_name='almightyolive')
# Elements of our RSS
rssStore.feed = "almightyolive"
rssStore.content = content
# Stores our RSS Feed into the datastore
rssStore.put()
The functions linkify() and outputRSS() are exactly the same as in the previous blog post (with hashtag handling added to linkify()). The biggest difference is that MainPage and the webapp-specific stuff are replaced with a simple sequential script (which in actuality is not unlike the content of MainPage).
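To see what linkify() does to each kind of word, here is a small standalone version that runs without App Engine. It is a sketch of the same logic with the hashtag branch splitting on "#"; the sample tweet and the example.com URL are made up for illustration.

```python
def linkify(word):
    """Wrap one whitespace-separated word in a link if it looks linkable."""
    if "http" in word:
        return "<a href='" + word + "'>" + word + "</a>"
    elif "@" in word:
        handle = word.split("@")[1]
        return "<a href='http://twitter.com/#!/" + handle + "'>" + word + "</a>"
    elif "#" in word:
        # Split on "#" here; splitting on "@" would raise IndexError,
        # because this branch only runs when the word contains no "@"
        tag = word.split("#")[1]
        return "<a href='https://twitter.com/#!/search/%23" + tag + "'>" + word + "</a>"
    return word

# Hypothetical sample tweet, linkified word by word as outputRSS() does
tweet = "Hi @almightyolive see http://example.com #python"
print(" ".join(linkify(w) for w in tweet.split()))
```

Plain words pass through untouched, so only the handle, the URL and the hashtag come back wrapped in anchor tags.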
A brief explanation of the entity and datastore code:
- Create the rssStore object as defined by the Rss class in our entity.py file. Note that we pass a key name of 'almightyolive', which is our unique identifier for this object.
- Set our object's values, most importantly content, which holds the generated feed
- Call the put() method on our rssStore object to push it onto the Datastore
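Because the entity is always created with the same key name, each cron run replaces the previously stored copy rather than adding a new one. The following plain-Python sketch imitates that behaviour; FakeRss and its class-level dictionary are hypothetical stand-ins written for this example, not the App Engine API.

```python
class FakeRss(object):
    """Hypothetical stand-in for the Rss model; not the App Engine API."""
    _table = {}   # key_name -> stored entity, imitating one datastore kind

    def __init__(self, key_name):
        self.key_name = key_name
        self.feed = None
        self.content = None

    def put(self):
        # Writing under an existing key name replaces the old entity
        FakeRss._table[self.key_name] = self

    @classmethod
    def get_by_key_name(cls, key_name):
        # Returns None when nothing is stored under that key name
        return cls._table.get(key_name)

# Two cron runs, both using the same key name
for content in ("<rss>first run</rss>", "<rss>second run</rss>"):
    store = FakeRss(key_name="almightyolive")
    store.feed = "almightyolive"
    store.content = content
    store.put()

print(len(FakeRss._table))                                # prints 1
print(FakeRss.get_by_key_name("almightyolive").content)   # prints <rss>second run</rss>
```

A lookup for a key name that was never stored returns None in this sketch, which is the case a front-end has to be ready for before the first cron run.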
The feed app
Now we need to create our front-end to access the RSS feed XML. Create a new file called feed.py and add the following:
# The webapp2 framework
import webapp2
# Our datastore interface
from google.appengine.ext import db
# Our entity definitions (the Rss model must be imported so that
# db.get can rebuild the stored entity)
import entity

# Fetches a datastore object and displays it
class MainPage(webapp2.RequestHandler):
    # Respond to a HTTP GET request
    def get(self):
        # A try-catch statement
        try:
            # Get the key for an RSS entity called almightyolive
            feed_k = db.Key.from_path('Rss', 'almightyolive')
            # Retrieve object from datastore (None if cron has not run yet)
            feed = db.get(feed_k)
            # Outputs the RSS
            self.response.out.write(feed.content)
        # Our exception code; AttributeError covers a missing entity
        except (TypeError, ValueError, AttributeError):
            self.response.out.write("<html><body><p>Invalid inputs</p></body></html>")

# Create our application instance that maps the root to our
# MainPage handler
app = webapp2.WSGIApplication([('/*', MainPage)], debug=True)
Pretty simple, huh? Now onto the configuration files....
app.yaml and cron.yaml
Let's start with app.yaml first. Add the following:
application: almightynassar
version: 1
runtime: python27
api_version: 1
threadsafe: no
handlers:
- url: /cron
  script: cron.py
  login: admin
- url: /.*
  script: feed.app
Note that this is no longer threadsafe; this is because we defined another handler other than feed.app, and that handler points at a plain script (cron.py) rather than a WSGI application. I also added a line to our cron handler: 'login: admin'. This restricts access to the URL to only the administrators of the application.
Now create cron.yaml with the following:
cron:
- description: daily summary job
  url: /cron
  schedule: every 1 hours
And there you have it! Once you upload it, the cron.py script will run every hour (you can run it manually first to populate the database), and you can then view the RSS feed!
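Before relying on the hourly job, it can help to check that the generated content is well-formed XML. Here is a minimal sketch using the same minidom parser, run against a hand-written sample rather than real Datastore content; the sample string is only an assumption about what outputRSS() produces for a single tweet.

```python
from xml.dom.minidom import parseString

# Hand-written sample in the shape outputRSS() builds (assumed, not real data)
sample = (
    "<?xml version='1.0'?>\n<rss version='2.0'>\n\t<channel>"
    "\n\t\t<title>Almightyolive Twitter</title>"
    "\n\t\t<item>\n\t\t\t<title>Mon Jan 01 00:00:00 +0000 2012</title>\n"
    "\t\t\t<link>https://twitter.com/AlmightyOlive/status/1</link>\n"
    "\t\t\t<description>Hello <a href='http://example.com'>http://example.com</a></description>\n"
    "\t\t</item>"
    "\n\t</channel>\n</rss>"
)

# parseString raises xml.parsers.expat.ExpatError if the feed is malformed
doc = parseString(sample)
items = doc.getElementsByTagName("item")
print(len(items))
print(items[0].getElementsByTagName("link")[0].firstChild.data)
```

If a tweet contains a bare "&" or "<", the parse fails, which is a quick way to spot text that needs escaping before it goes into the feed.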
References
- Google's own getting started with webapp and Python.
- The official webapp2 reference
- The Google developer resource for GAE
- Google App Engine FAQs
- YAML reference
- app.yaml reference
- Twitter API reference to get a user's status feed
- Python string reference
- Tutorialspoint string tutorial and reference
- Google App Engine reference for Datastore Entities, Properties and Keys
- Google App Engine reference for Scheduling tasks with cron
- Google App Engine reference for Datastore