CLOSED-LOOP: Web automation with Groovy and Ruby

Wednesday, February 25, 2009

For a small private project I need to do some web automation.

I wanted to use a scripting language and decided to give Ruby and Groovy a try.

In Ruby there is the Mechanize library. In Groovy there are several options, such as XmlSlurper with the NekoHTML parser for scraping and HtmlUnit for manipulating pages.

The Ruby Mechanize library seems very intuitive:

  require 'rubygems'
  require 'mechanize'

  # Releases before Mechanize 1.0 namespaced the class as WWW::Mechanize
  a = Mechanize.new { |agent|
    agent.user_agent_alias = 'Mac Safari'
  }

  a.get('http://google.com/') do |page|
    search_result = page.form_with(:name => 'f') do |search|
      search.q = 'Hello world'
    end.submit

    search_result.links.each do |link|
      puts link.text
    end
  end

I like the DSLish way to both scrape (e.g. search_result.links.each) and manipulate (e.g. search.q = 'Hello world') a web page.
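To get a feeling for what the form manipulation boils down to, here is a minimal sketch using only Ruby's standard library. It assumes the search form is a plain GET form whose fields end up in the query string. The helper name build_form_get_uri and the '/search' action path are my own assumptions for illustration, they are not part of Mechanize.

```ruby
require 'uri'

# Hypothetical helper (not part of Mechanize): build the URL that
# submitting a GET form would request.
# action - the form's target URL
# fields - hash of input names to the values filled in
def build_form_get_uri(action, fields)
  uri = URI.parse(action)
  # Encode the fields the way a browser does for a GET form
  uri.query = URI.encode_www_form(fields)
  uri
end

# Assumption: the form targets '/search' and its text input is named 'q'
uri = build_form_get_uri('http://google.com/search', 'q' => 'Hello world')
puts uri
```

Mechanize hides this (plus cookies, redirects and parsing the response) behind search.q = '...' and submit, which is where the DSLish feel comes from.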

In Groovy scraping is also pretty DSLish:

def page = new XmlSlurper(new org.cyberneko.html.parsers.SAXParser()).parse('http://groovy.codehaus.org/')
def data = page.depthFirst().grep{ it.name() == 'A' && it.@href.toString().endsWith('.html') }.'@href'
data.each { println it }

But it feels a bit less concise than the Ruby version.

Manipulating a web page with Groovy is unfortunately clumsier:

import com.gargoylesoftware.htmlunit.WebClient

def webClient = new WebClient()
def page = webClient.getPage('http://www.google.com')
// check page title
assert 'Google' == page.titleText
// fill in form and submit it
def form = page.getFormByName('f')
def field = form.getInputByName('q')
field.setValueAttribute('Groovy')
def button = form.getInputByName('btnG')
def result = button.click()
// check groovy home page appears in list (assumes it's on page 1)
assert result.anchors.any{ a -> a.hrefAttribute == 'http://groovy.codehaus.org/' }

This is less DSLish and much more old-school imperative. Having different styles for scraping and manipulating is a bit unfortunate (although you can also use HtmlUnit for scraping).

2 comments:

Anonymous, February 27, 2009 at 6:00 AM
check out the 'Updating XML with XmlSlurper' article on codehaus.org.

http://groovy.codehaus.org/Updating+XML+with+XmlSlurper

or http://tinyurl.com/bn8fw3

I don't know how current it is, but it seems a little more DSL than the example you've shown.

Unknown, March 11, 2009 at 7:31 PM
@Kevin Williams
Thanks for the link to 'Updating XML with XmlSlurper'.

But as far as I can see, the scenario is not applicable for web-automation.

XmlSlurper seems to allow me to modify the (in-memory) text-representation of a web-page. But it does not let me manipulate the web-page itself...