Uploading data using Mechanize

Posted on November 13, 2011

In this post, I’ll briefly describe a Mechanize script I wrote to update data in a web application. We’ll see how to navigate with Mechanize and how to find HTML elements with XPath.

The goal was simple: open a CSV file containing product references for catalog items and prices, and upload them into a Rails web app (by going to the edit page, entering the prices in the proper form fields, and submitting the form).
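To make the first step concrete, here is a minimal sketch of reading such a CSV file. It assumes a header row with reference, retail_price and wholesale_price columns; those column names, the sample rows, and the parse_price_rows helper are placeholders made up for illustration and are not taken from the actual script.

```ruby
require 'csv'

# Hypothetical sample data: the column names and values are assumptions.
SAMPLE_CSV = <<~CSV
  reference,retail_price,wholesale_price
  15305OR.OO.D088CR.01,1200.00,800.00
  EXAMPLE-REF-02,950.50,610.25
CSV

# Turn CSV text into an array of hashes, one per catalog item.
# Prices stay as strings, since they end up typed into form fields.
def parse_price_rows(csv_text)
  CSV.parse(csv_text, headers: true).map do |row|
    {
      reference:       row['reference'].strip,
      retail_price:    row['retail_price'].strip,
      wholesale_price: row['wholesale_price'].strip
    }
  end
end

parse_price_rows(SAMPLE_CSV).each do |item|
  puts "#{item[:reference]}: #{item[:retail_price]} / #{item[:wholesale_price]}"
end
```

With a real file, CSV.foreach(path, headers: true) would do the same job row by row without loading the whole file into memory.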

HTML markup

Here is the relevant portion of the web page we’ll be interacting with using Mechanize:

    <div class='catalog_item'>
      <div class='description'>
        <span>15305OR.OO.D088CR.01</span>
        <p>
          <a href="/catalog_items/2/edit">Edit</a>
        </p>
      </div>
    </div>

Enter Mechanize

Full code here.

First things first, we create a new Mechanize instance with a = Mechanize.new on line 15, and navigate to the catalog items page with a.get('http://localhost:3000/catalog_items') on line 23. Once there, we need to locate the “edit” link for the catalog item we want to update prices for. Based on the HTML markup above, we know the following:

  • the reference we’re looking for is within a span tag (line 3 in the HTML above)
  • from there, we go up one element, descend into the “p” element, and look for a link with the “Edit” text (line 5 in the HTML above)

For all of this, we use XPath, which allows us to navigate the HTML document much like we’d navigate a file system (where each file/folder would be an HTML element). That translates into line 24:

    edit_link = catalog_items_index.parser.xpath("//span[text()='#{reference}']/../p/a[text()='Edit']")

The thing with XPath is that it returns a collection of matches (a Nokogiri NodeSet, which behaves much like an array), so to get the actual link we’re interested in we need to grab the first element of that collection (since we know there will be only one “Edit” link for any given catalog item). Therefore, by calling edit_link = edit_link.first on line 28, we get the link we can tell Mechanize to follow.
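If you want to try the XPath expression without a running Rails app, the sketch below runs it against the markup shown earlier. Note that it swaps in REXML from Ruby's standard library instead of the Nokogiri parser that Mechanize uses, purely so it runs without extra gems; this only works because the snippet happens to be well-formed XML. The edit_href_for helper is a hypothetical name for illustration.

```ruby
require 'rexml/document'

# The markup from the "HTML markup" section above.
CATALOG_HTML = <<~HTML
  <div class='catalog_item'>
    <div class='description'>
      <span>15305OR.OO.D088CR.01</span>
      <p>
        <a href="/catalog_items/2/edit">Edit</a>
      </p>
    </div>
  </div>
HTML

# Returns the href of the "Edit" link that sits next to the given
# reference, or nil when the reference is not on the page.
def edit_href_for(html, reference)
  doc = REXML::Document.new(html)
  # Same expression as in the post: find the span, go up one element,
  # descend into the p element, and pick the link whose text is "Edit".
  matches = REXML::XPath.match(doc, "//span[text()='#{reference}']/../p/a[text()='Edit']")
  link = matches.first # the query returns a list, so take the first match
  link && link.attributes['href']
end

puts edit_href_for(CATALOG_HTML, '15305OR.OO.D088CR.01')
```

A reference that is not on the page yields an empty list, so the helper returns nil, which is worth checking for before asking Mechanize to follow anything.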

Once we’re on the appropriate edit page for the catalog item, all that is left to do is to enter the price information in the form fields and to submit the form (lines 31 to 34):

    edit_catalog_item_page.form_with(:method => 'POST') do |f|
      f.field_with(:id => 'catalog_item_retail_price').value = retail_price
      f.field_with(:id => 'catalog_item_wholesale_price').value = wholesale_price
    end.submit

Things could have been easier

In the case above, the HTML was structured for visual coherence, and I didn’t go out of my way to make it easily parsable by Mechanize (or any other agent). One way to make things much easier for Mechanize would have been to link to catalog items with their reference as the link text. Then, finding a link to the item’s show page would be reduced to catalog_items_index.link_with(:text => reference). If the link exists, the catalog item is already in the system, and getting to the edit page simply requires following that link, then following the “Edit” link within the show view.

