From: Phil Mcdonnell on
I'm trying to scrape a page that hides some data behind a javascript
function. Is there any way to get this data? I've been using
Mechanize, but I'm not sure it can do this. Is there a better library
to use for this type of thing?

The following is the interesting part of the page:

<td class="colPlus" onclick="fireClick(this,0)">
<a id="iroc_0" class="plus" href="#" onclick="return false;">&nbsp;</a>
</td>
--
Posted via http://www.ruby-forum.com/.

From: brabuhr on
On Thu, May 20, 2010 at 1:48 AM, Phil Mcdonnell
<phil.a.mcdonnell(a)gmail.com> wrote:
> I'm trying to scrape a page that hides some data behind a javascript
> function. Is there any way to get this data? I've been using
> Mechanize, but I'm not sure it can do this. Is there a better library
> to use for this type of thing?

http://celerity.rubyforge.org/
http://watir.com/

> The following is the interesting part of the page:
>
> <td class="colPlus" onclick="fireClick(this,0)">
>    <a id="iroc_0" class="plus" href="#" onclick="return
> false;">&nbsp;</a>
> </td>

The *really* interesting part is what the JavaScript does :-) With
(potentially large) effort you may be able to "reverse-engineer" the
JavaScript and emulate it manually in Mechanize. I.e., if the JavaScript
builds a simple HTTP request, you may be able to send the same request
from Mechanize, possibly without much effort.
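
As a rough sketch of that idea, assume (purely for illustration) that the
JavaScript turns out to issue a plain GET for the clicked row. The `/details`
path, the `row` parameter, and the cookie value below are all hypothetical and
would have to be read from the real script or from the browser's network
traffic. The example only builds the request with Ruby's standard library; it
does not send it.

```ruby
require "net/http"
require "uri"

# Build (but do not send) a GET request that mimics what the page's
# JavaScript might send when a row is clicked.
# ASSUMPTIONS: the "/details" path, the "row" parameter, and the use of an
# X-Requested-With header are hypothetical, not taken from the real page.
def build_row_request(base_url, row_index, cookie_header)
  uri = URI.join(base_url, "/details")
  uri.query = URI.encode_www_form(row: row_index)

  request = Net::HTTP::Get.new(uri)
  # Reuse the session cookie obtained when logging in.
  request["Cookie"] = cookie_header
  # Many Ajax libraries add this header; some servers check for it.
  request["X-Requested-With"] = "XMLHttpRequest"
  request
end

request = build_row_request("https://example.com/statement", 0, "session=abc")
puts request.path
```

With Mechanize, the already logged-in agent could likely issue the same
request through `agent.get`, which would reuse its own cookie jar instead of
a hand-written Cookie header.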

From: Josh Cheek on

On Thu, May 20, 2010 at 12:48 AM, Phil Mcdonnell
<phil.a.mcdonnell(a)gmail.com> wrote:

> I'm trying to scrape a page that hides some data behind a javascript
> function. Is there any way to get this data? I've been using
> Mechanize, but I'm not sure it can do this. Is there a better library
> to use for this type of thing?
>
> The following is the interesting part of the page:
>
> <td class="colPlus" onclick="fireClick(this,0)">
> <a id="iroc_0" class="plus" href="#" onclick="return
> false;">&nbsp;</a>
> </td>
>
You might check out Harmony:

http://www.rubyinside.com/harmony-javascript-and-a-dom-environment-in-ruby-3001.html
http://rubygems.org/gems/harmony
http://github.com/mynyml/harmony

From: Steven Parkes on
> Mechanize cannot execute javascript but watir/celerity can. (I've never
> used harmony)

Harmony uses envjs to execute JavaScript. There's also Capybara, which can
use either a browser or envjs.

From: Phil Mcdonnell on
The other trick here is that this page is behind a login. Mechanize
allows me to fill out the login form and holds onto the login
credentials for me. Can harmony/celerity/watir do this?

>
> The *really* interesting part is what the JavaScript does :-) With
> (potentially large) effort you may be able to "reverse-engineer" the
> JavaScript and emulate it manually in Mechanize. I.e., if the JavaScript
> builds a simple HTTP request, you may be able to send the same request
> from Mechanize, possibly without much effort.

How would one do this? I'm somewhat new to JavaScript, as I usually
don't do front-end engineering. I see the definition of this function
below in the HTML page. Is there any way I can sniff out what it's
actually doing? I'm looking to figure out what the fireClick function
displays.

<script type="text/javascript">
var d = document.domain.split(".");
document.domain = d[d.length - 2] + "." + d[d.length - 1];
var start = (new Date()).getTime();
var fireClick = function(){};
var omn_hierarchy="US|AMEX|Ser|eStatement";
var omn_pagename="MainPage";
var omn_language="en";
var omn_newpagename="yes";
</script>

... way down below...

<td class="colPlus" onclick="fireClick(this,0)">
<a id="iroc_0" class="plus" href="#" onclick="return false;">&nbsp;</a>
</td>
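
Since the inline definition above is only an empty placeholder
(`function(){}`), the real fireClick is presumably assigned somewhere else,
perhaps in an external script. A minimal sketch of how one might hunt for it,
assuming the page HTML and any script bodies have already been fetched as
strings; the helper names and the sample file name are hypothetical:

```ruby
# List the src URLs of external scripts referenced by a page.
def script_sources(html)
  html.scan(/<script[^>]*\bsrc\s*=\s*["']([^"']+)["']/i).flatten
end

# Return [line_number, line] pairs where fireClick is defined or reassigned.
# Plain calls such as fireClick(this,0) are deliberately not matched.
def fireclick_definitions(js)
  js.lines.each_with_index
    .select { |line, _index| line =~ /\bfireClick\s*=|function\s+fireClick\b/ }
    .map { |line, index| [index + 1, line.strip] }
end

# Hypothetical sample input standing in for the real page.
html = <<~HTML
  <script type="text/javascript" src="/js/statement.js"></script>
  <script type="text/javascript">
  var fireClick = function(){};
  </script>
HTML

puts script_sources(html).inspect
puts fireclick_definitions(html).inspect
```

Running the second helper over each downloaded script body should show which
file replaces the placeholder, and that is the code worth reading.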