Wednesday, 16 December 2009

Google App Engine Task Queue with Python - for Dummies...

I have a script that pulls in data from the net and does some work on it before storing it in the Datastore. The script was kicked off by cron every now and then, but it kept running into the 30-second request time limit set by App Engine. The obvious solution was to split the work into smaller pieces and assign these to the Task Queue.

However, after reading the documentation a few times I was more confused than before I started. The Task Queue API is actually great, but someone should revise that documentation... Above all, it is not at all clear how the different configurations relate to each other, and the examples in the documentation are over-complicated, IMHO.

Here is Task Queue explained for Dummies (like myself):

First of all, there is the default queue, and in addition you can create your own queues using the queue.yaml config file. The reason to create your own queue would be that you are not happy with the default queue's execution rate of 5 tasks per second. Let's start by using just the default queue, and later on we will expand by creating our own queue too.

In this example the original script was doing work for seven days, and we will split it into seven smaller tasks:
  1. When using the default queue, we do not need to create any queue.yaml file at all.

  2. To start with, we need a URL that cron, or you, can use to kick off the whole affair. In app.yaml add, for example:

    - url: /update
      script: scripts/all-to-q.py
      login: admin

  3. Now, create the all-to-q.py file in the scripts directory with content like:

    #!/usr/bin/python
    # -*- coding: utf-8 -*-

    import logging
    from google.appengine.api.labs import taskqueue

    for i in range(7):
        taskqueue.add(url='/one-day', params={'dayI': i}, countdown=i)
        logging.info('Adding day '+str(i)+' to the Task Queue.')

    The countdown parameter is the number of seconds to wait before a task may be executed, so here task i is delayed by i seconds and the seven tasks become eligible one second apart.

  4. Now, go back to the app.yaml file and add that new URL you need for each task:

    - url: /one-day
      script: scripts/one-day.py
      login: admin

    Simple, isn't it?

  5. And now the essential parts of the one-day.py file; mainly those that will pick up the POST parameters (here just one called 'dayI'):

    import wsgiref.handlers
    from google.appengine.ext import webapp

    class OneDay(webapp.RequestHandler):

      def post(self):
        i = int(self.request.get('dayI'))
       
        # ... and here you get your hands dirty; use i and do the work.

    def main():
     
      application = webapp.WSGIApplication([
                    (r'/one-day', OneDay),
                    ], debug=True)
     
      wsgiref.handlers.CGIHandler().run(application)


    if __name__ == '__main__':
      main()


    ...and that's it.

  6. I don't understand why the official documentation could not explain something this simple. I believe my example above makes it fairly clear how the execution logic flows.
PS. Note that I also included some logging above; it is really useful... Expand on it yourself.
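
The enqueue loop from step 3 can also be sketched without App Engine at all, which makes it easy to check what would be put on the queue before deploying. This is only a minimal sketch and not part of the original script: build_tasks is a hypothetical helper, and it assumes each dict would afterwards be handed over with taskqueue.add(**task).

```python
def build_tasks(days=7, url='/one-day'):
    """Describe one task per day (hypothetical helper, not an App Engine API)."""
    tasks = []
    for i in range(days):
        tasks.append({
            'url': url,              # worker URL mapped in app.yaml
            'params': {'dayI': i},   # delivered to the worker as POST data
            'countdown': i,          # seconds to wait before the task may run
        })
    return tasks


if __name__ == '__main__':
    for task in build_tasks():
        print(task)
```

Keeping the task description separate from the actual taskqueue call means the splitting logic can be run and inspected from the command line.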

Now, let's say we do not want to overload the sites we pull data from, so we will create our own queue and use that instead of the default queue. All we need to do is:
  • Create that queue.yaml file, with for example:

    queue:
    - name: one-full-day
      rate: 1/s
      bucket_size: 1

  • Now, in order to use that queue, change one single line in all-to-q.py so it reads:

    #!/usr/bin/python
    # -*- coding: utf-8 -*-

    import logging
    from google.appengine.api.labs import taskqueue

    for i in range(7):
        taskqueue.Task(url='/one-day', params={'dayI': i}, countdown=i).add(queue_name='one-full-day')
       
        logging.info('Adding day '+str(i)+' to the Task Queue.')

       
    Done. The taskqueue.Task line above may wrap here in the blog, but it is a single line.
Wasn't that easy...
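
To get a feel for what rate: 1/s and bucket_size: 1 do, here is a simplified token bucket model in plain Python. It is only a sketch under assumptions: simulate_token_bucket is a hypothetical function, it ignores the countdown values and how long each task takes to run, and it is not App Engine's actual scheduler.

```python
def simulate_token_bucket(num_tasks, rate_per_sec, bucket_size):
    """Return the earliest start time in seconds for each waiting task."""
    tokens = float(bucket_size)   # the bucket starts full
    now = 0.0
    start_times = []
    for _ in range(num_tasks):
        if tokens < 1.0:
            # Wait until one whole token has been refilled.
            now += (1.0 - tokens) / rate_per_sec
            tokens = 1.0
        tokens -= 1.0             # starting a task costs one token
        start_times.append(now)
    return start_times


if __name__ == '__main__':
    # Seven tasks on a queue configured like one-full-day above.
    print(simulate_token_bucket(7, 1.0, 1))
    # prints [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

In this model a larger bucket_size lets a burst of tasks start at once, while rate decides how quickly the queue recovers after the burst.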

    Organizing and re-using Python code

    As the code grows it has become necessary to structure it better and re-use it (import it) - so that it can be maintained and expanded effectively. In short, it is time to take a closer look at
    if __name__=="__main__": ...

    I think this is the best explanation... Better start implementing it.
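
For reference, the idiom boils down to something like the sketch below. The names day_label and main are made up for illustration: the functions can be imported from another script without side effects, while main() only runs when the file is executed directly.

```python
def day_label(i):
    """Reusable piece of logic; safe to import from other modules."""
    return 'day-%d' % i


def main():
    # Only meant to run when this file is started as a script.
    for i in range(7):
        print(day_label(i))


if __name__ == '__main__':
    main()
```

Another script can then do "from mymodule import day_label" (mymodule being whatever the file is called) without the loop in main() being executed.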

    Monday, 14 December 2009

    ...actually, no thanks.

    Actually trying to produce anything with Django on GAE has been a pain in the ***. All documentation and tutorials seem aimed at people who already know Django and want to put it on GAE.

    I have realized it is a waste of time.

    Will instead just use the template system built into GAE's own webapp (which I know is from Django) - it is probably all I want from Django anyway. When wrestling with Django itself I discovered its GUI admin interface and other scary things...

    If webapp is not enough I will use Cheetah instead. If I go for Cheetah, there is a good how-to here.

    Sunday, 13 December 2009

    ...and Django

    The documentation regarding getting Django up and running isn't that comprehensive.

    This is good: http://code.google.com/appengine/docs/python/tools/libraries.html#Django

    For the actual local installation on the Mac, use this:
    http://www.djangoproject.com/download/
    ...but with the command: sudo python2.5 setup.py install

    Friday, 11 December 2009

    Voice Applications too?

    Just wanted to put this in here while I remember it... great potential:

    http://www.twilio.com/

    More Unicode, and more for Google to fix

    Once I finally got the Unicode sorted between BeautifulSoup and the Google App Engine Datastore, with some help from this:

    http://khaidoan.wikidot.com/google-app-engine-datastore

    ...I was rather surprised to find out that Google itself still has bugs to sort out... The web-based Datastore viewer of the SDK (http://localhost:8080/_ah/admin/datastore) cannot display non-ASCII characters! ...and it is a rather old bug:

    http://code.google.com/p/googleappengine/issues/detail?id=502

    Well, well...
    It works fine in production though - from the Data Viewer in the Dashboard.

    Wednesday, 9 December 2009

    BeautifulSoup, GoogleAppEngineLauncher and Snow Leopard

    Using BeautifulSoup has turned out to be very useful... However, once everything worked fine from the Mac (Snow Leopard) command line for the Python script, it was trickier to get it into the GAE SDK (the GoogleAppEngineLauncher).

    First of all, to use BeautifulSoup, just download the file "BeautifulSoup.py" from the developer, and drop it in the main directory of the GAE app you are developing. Even if your script that is using BeautifulSoup is in a subdirectory, the soup needs to go in the main dir (although I suspect some wizard could have a nice way around that...).

    Now, the problem is:

    : No module named _ctypes
          args = ('No module named _ctypes',)
          message = 'No module named _ctypes'

    ...and it turns out the GAE SDK works only with Python 2.5 and not 2.6 which Snow Leopard uses.

    You can see that 2.6 is being used both in the GoogleAppEngineLauncher's Log window and possibly at the top of the browser window that shows the error messages.

    Here is a nice workaround: http://code.google.com/p/googleappengine/issues/detail?id=2122#c3 . Just go into the GoogleAppEngineLauncher's preferences and set your Python path to:

    /usr/bin/python2.5
     
    You can double check that 2.5 is also installed - it was on my Snow Leopard...
    Note this is slightly different from the path proposed by the Preferences window.
    Off we go...
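
If you are unsure which interpreter is actually running your code, a few lines of Python will tell you. This is just a sketch; interpreter_summary is a made-up helper name, and it only uses the standard sys module.

```python
import sys


def interpreter_summary():
    """Return the running interpreter's path and its (major, minor) version."""
    return sys.executable, tuple(sys.version_info[:2])


if __name__ == '__main__':
    path, version = interpreter_summary()
    print('%s -> Python %d.%d' % (path, version[0], version[1]))
```

Saved to a file and started with /usr/bin/python2.5 from the command line, it should report version 2.5 if that interpreter is installed.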
    

    Sunday, 6 December 2009

    ASCII, ... Unicode and UTF-8

    I was playing with Google Translate (and its API) for my sandbox project - which obviously means I deal with characters not fitting into ASCII.

    After sorting it out by trial and error for some time, I found this presentation:

    http://farmdev.com/talks/unicode/

    Absolutely brilliant! Why is that not on Python's own website?
    Many thanks to the author Kumar McMillan.

    PS. There is more useful learning material on farmdev.
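
As a reminder to myself, the usual rule of thumb is to decode bytes to Unicode as soon as they come in, work with Unicode inside the script, and encode again only on the way out. A minimal sketch of that, with made-up helper names to_text and to_bytes, and assuming the input is UTF-8 and a Python version that understands the b'' literal (2.6 or later):

```python
def to_text(raw_bytes, encoding='utf-8'):
    """Decode bytes from the outside world into a Unicode string."""
    return raw_bytes.decode(encoding)


def to_bytes(text, encoding='utf-8'):
    """Encode a Unicode string back to bytes just before output."""
    return text.encode(encoding)


if __name__ == '__main__':
    raw = b'caf\xc3\xa9'   # UTF-8 bytes for "cafe" with an accented e
    text = to_text(raw)
    print('%d bytes, %d characters' % (len(raw), len(text)))
    assert to_bytes(text) == raw   # the round trip gives back the same bytes
```

The accented letter takes two bytes in UTF-8 but is a single character once decoded, which is exactly the difference that bites when bytes and text get mixed up.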

    Basics

    I assumed the big guys know something I do not, and decided to follow them first.

    ...Google, hosting on Google App Engine; and Python since Java seems much too complicated at this stage.

    I have fumbled around with PHP before and created some really useful scripts. After doing some Python now, I have to say it is quite enjoyable - cleaner, more forgiving..., and once a script works it is really short. :)

    Python is nice.

    Starting up

    Mostly out of curiosity I will start playing with the technologies that are behind the Internet today - and here I mean the stuff serving up the graphics we watch, and not the routing and networking of the traffic.

    The way I see it, "the Internet" is in itself a "Second Life" type of world - just more real. The web sites we build and view are in themselves "real estate" with commercial value too - mainly turned into money through advertising. Already having solid knowledge of the networking aspect, I now want to find out what the building blocks of this other "real estate" are.

    Thought I'd collect this here in this blog, mainly as a list of bookmarks for myself, but possibly also useful for someone else.