In this post, I’ll briefly describe a Mechanize script I wrote to update data in a web application. We’ll see how to navigate using Mechanize, and finding HTML elements with XPath.
The goal was simple: open a CSV file containing product references for catalog items and prices, and upload them into a Rails web app (by going to the edit page, entering the prices in the proper form fields, and submitting the form).
HTML markup
Here is the relevant portion of the web page we’ll be interacting with using Mechanize:
<div class='catalog_item'> <div class='description'> <span>15305OR.OO.D088CR.01</span> <p> <a href="/catalog_items/2/edit">Edit</a> </p> </div> </div>
Enter Mechanize
Full code here.
First things first, we create a new Mechanize instance with “a = Mechanize.new” on line 15, and navigate to the catalog items page with a.get(‘http://localhost:3000/catalog_items’) on line 23. Once there, we need to locate the “edit” link for the catalog item we want to update prices for. Based on the HTML markup above, we know the following:
- the reference we’re looking for is within a span tag
- from there, if we go up one element, then descend into the “p” element, and finally we look for a link with the “Edit” text
For all of this, we use XPath, which allows us to navigate the HTML document much like we’d navigate a file system (where each file/folder would be an HTML element). That translates into line 24:
edit_link = catalog_items_index.parser.xpath("//span[text()='#{reference}']/../p/a[text()='Edit']")
The thing with XPath is that it will return an array of matches, so to get the actual link we’re interested in we need to grab the first element of the array (since we know there will be only one “edit” link for any given catalog item). Therefore by calling “edit_link = edit_link.first” on line 28 we get a clickable link we can tell Mechanize to follow.
Once we’re on the appropriate edit page for the catalog item, all that is left to do is to enter the price information in the form fields and to submit the form (lines 31 to 34):
edit_catalog_item_page.form_with(:method => 'POST') do |f| f.field_with(:id => 'catalog_item_retail_price').value = retail_price f.field_with(:id => 'catalog_item_wholesale_price').value = wholesale_price end.submit
Things could have been easier
In the case above, the HTML was structured for visual coherence, and I didn’t go out of my way to make it easily parsable by Mechanize (or any other agent). One way to make things much easier for Mechanize would have been to link to catalog items with their reference as the link text. Then, finding a link to the item’s show page would be reduced to “catalog_items_index.link_with(:text => reference)”. If the link exists, the catalog item is already in the system, and getting to the edit page simply requires following that link, then following the “Edit” link within the show view.