Web Automation using PyQt4 and jQuery
I would like to show how a web-related task can be automated using PyQt4 and jQuery. My original intention was to automate checking my broadband usage on my ISP's portal, but here I am going to show how to fetch Google search results for a given keyword. I know there are better ways to do this, but I want to explain the technique with a simple example.
Qt4 provides a widget called QWebView that can load and render web pages with the help of the WebKit browser engine. But there is no simple way to manipulate the elements of a web page, such as clicking a link or button or entering values into input elements. However, the class QWebFrame provides the function "evaluateJavaScript", which allows us to execute arbitrary JavaScript code within the current web page. By executing the jQuery source within the current page, we get a complete jQuery environment in which we can easily manipulate HTML elements. Below is the Python script, which receives the search keyword as a command-line argument and prints the Google search results for that keyword.
#!/usr/bin/env python

import sys
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *

class GoogleSearchBot(QApplication):
    ACTION_NONE = 0
    ACTION_SEARCH_KEYWORD = 1
    ACTION_FETCH_RESULTS = 2

    def __init__(self, argv):
        super(GoogleSearchBot, self).__init__(argv)
        jqueryFile = open("jquery-1.3.2.min.js") # make sure you have jquery source placed in the same directory
        self.__jquery = jqueryFile.read()
        jqueryFile.close()
        self.__webView = QWebView()
        self.__webView.show() # comment this line if you don't want to show the browser window
        self.connect(self.__webView, SIGNAL("loadFinished(bool)"), self.loadFinished)

    def search(self, keyword):
        self.__keyword = keyword
        self.__nextAction = self.ACTION_SEARCH_KEYWORD
        self.__webView.load(QUrl('http://www.google.com'))

    def loadFinished(self, ok):
        page = self.__webView.page()
        currentFrame = page.currentFrame()
        currentFrame.evaluateJavaScript(self.__jquery)
        if self.__nextAction == self.ACTION_SEARCH_KEYWORD:
            currentFrame.evaluateJavaScript('$("input[title=Google Search]").val("' + self.__keyword + '");')
            currentFrame.evaluateJavaScript('$("input[value=Google Search]").parents("form").submit();')
            self.__nextAction = self.ACTION_FETCH_RESULTS
        elif self.__nextAction == self.ACTION_FETCH_RESULTS:
            results = currentFrame.evaluateJavaScript('var results = ""; $("h3[class=r]").each(function(i) { results += $(this).text() + "\\n"; }); results')
            resultList = str(results.toString().toAscii()).splitlines()
            sno = 1
            print('Google search result\n====================')
            for result in resultList:
                print(str(sno) + ". " + result)
                sno += 1
            self.__webView.close()
            self.__nextAction = self.ACTION_NONE

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print("Usage: GoogleSearchBot.py <keyword>")
        sys.exit(0)
    googleSearchBot = GoogleSearchBot(sys.argv)
    googleSearchBot.search(sys.argv[1])
    sys.exit(googleSearchBot.exec_())
In __init__ (line 15 of the script) I read the jQuery source and keep it for later use. Then I create the QWebView and connect its "loadFinished" signal. When we call the search function, we first load www.google.com and wait for the "loadFinished" signal. Once the page is loaded, we inject the jQuery source into the current page. With the help of a jQuery selector we find the text input element and enter the keyword. Then we submit the form and again wait for the "loadFinished" signal. This time, when the loadFinished function is called, we are presented with the Google search results for the keyword. Again we inject jQuery, collect the results (just the titles, not the corresponding URLs), and return them to Python as a single newline-separated string. On the Python side, we split the string into a list of lines and print it on the console.
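One thing the script does not handle is a keyword that contains a double quote or a backslash, since the keyword is pasted directly into the JavaScript string. Below is a minimal sketch of how the two string-handling steps could be made a little more robust. The helper names build_fill_script and format_results are hypothetical (they are not part of the script above), and the selector is the same one used in the script with quotes added around the attribute value.

```python
import json


def build_fill_script(keyword, selector='input[title="Google Search"]'):
    """Return a jQuery statement that sets the search box value to `keyword`.

    json.dumps produces a quoted, escaped string literal that is also a valid
    JavaScript string literal, so quotes or backslashes in the keyword cannot
    terminate the string early.
    """
    return "$(%s).val(%s);" % (json.dumps(selector), json.dumps(keyword))


def format_results(raw):
    """Turn the newline-separated titles returned by the page into numbered lines."""
    titles = [line for line in raw.splitlines() if line.strip()]
    return ["%d. %s" % (number, title) for number, title in enumerate(titles, 1)]
```

In loadFinished, the first helper would be used as currentFrame.evaluateJavaScript(build_fill_script(self.__keyword)), and the second would replace the manual sno counter when printing.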
Actually, I started with mechanize, but it did not work in all cases. Moreover, debugging with mechanize is more difficult since there is no visible browser window, so we cannot see what is happening behind the scenes.
Last Updated (Monday, 28 September 2009 21:27)




Comments
Changing the URL in line 25 to http://www.google.com/ncr makes it work outside the US by loading the same page that US users get.