From: tedd on
Hi gang:

Here's the problem.

I have 184 HTML pages in a directory and each page contains a
question. The question is noted in the HTML DOM like so:

<p class="question">
Who is Roger Rabbit?
</p>

My question is -- how can I extract the string "Who is Roger Rabbit?"
from each page using PHP? You see, I want to store the questions in a
database without having to re-type, or cut/paste, each one.

Now, I can extract each question by using JavaScript --

document.getElementById("question").innerHTML;

-- and stepping through each page, but I don't want to use JavaScript for this.

I have not found/created a working example of this using PHP. I tried
using PHP's getElementById(), but that requires the target file to be
valid XML and the string to be contained within an ID and not a
class. These pages do not support either requirement.

Additionally, I realize that I can load the files and parse out what
is between the <p> tags, but I was hoping for a "GetElementByClass"
way to do this.

So, is there one?

Thanks,

tedd
--
-------
http://sperling.com http://ancientstones.com http://earthstones.com
From: Ashley Sheridan on
On Sat, 2010-04-03 at 10:29 -0400, tedd wrote:

> Additionally, I realize that I can load the files and parse out what
> is between the <p> tags, but I was hoping for a "GetElementByClass"
> way to do this.
>
> So, is there one?
>
> Thanks,
>
> tedd
> --
> -------
> http://sperling.com http://ancientstones.com http://earthstones.com
>


I don't think there is a getElementsByClass function in PHP's DOM. HTML5
is proposing one, but that will most likely be implemented in JavaScript
before PHP's DOM. There is a way to tidy up the HTML to make it XHTML,
but I'm not sure what it is. If you know roughly where in the document
the HTML snippet is, you can use XPath to grab it.
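
A rough sketch of the XPath route, in case it helps. As far as I know,
DOMDocument::loadHTML() accepts HTML that is not valid XML, so the pages
may not need tidying first. The helper name extractQuestions() and the
sample markup are my own assumptions for illustration, not anything
from your pages:

```php
<?php
// Sketch: collect the text of every <p class="question"> in an HTML string.
// extractQuestions() is a hypothetical helper name, not part of PHP.
function extractQuestions(string $html): array
{
    $doc = new DOMDocument();
    // loadHTML() copes with non-XML markup; keep its parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match "question" as a whole word inside the class attribute,
    // so class="big question" matches but class="questions" does not.
    $query = '//p[contains(concat(" ", normalize-space(@class), " "), " question ")]';

    $questions = [];
    foreach ($xpath->query($query) as $node) {
        $questions[] = trim($node->textContent);
    }
    return $questions;
}

// Assumed sample input, modelled on the snippet in the original post.
$html = "<html><body><p class=\"question\">\nWho is Roger Rabbit?\n</p></body></html>";
print_r(extractQuestions($html));
```

For a directory of pages you could presumably loop over glob('*.html')
and pass file_get_contents() of each file to the helper.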

Failing that, what about a regex? It shouldn't be too hard to write a
regex to match your example above.
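
Something along these lines might do as a starting point. It is only a
sketch: it assumes the attribute is written exactly as class="question"
and that no <p> is nested inside the paragraph. The helper name
extractQuestionsRegex() is made up for the example:

```php
<?php
// Sketch: regex fallback for pulling out <p class="question"> contents.
// Assumes the attribute is exactly class="question" (double quotes, no
// other classes). extractQuestionsRegex() is a hypothetical helper name.
function extractQuestionsRegex(string $html): array
{
    // i: case-insensitive tags; s: let "." span newlines; .*? is non-greedy.
    $pattern = '#<p\s+class="question"\s*>(.*?)</p>#is';
    preg_match_all($pattern, $html, $matches);

    $questions = [];
    foreach ($matches[1] as $inner) {
        // Drop inline markup, decode entities, trim surrounding whitespace.
        $text = html_entity_decode(strip_tags($inner), ENT_QUOTES, 'UTF-8');
        $questions[] = trim($text);
    }
    return $questions;
}

// Assumed sample input, modelled on the snippet in the original post.
$html = "<p class=\"question\">\nWho is Roger Rabbit?\n</p>";
print_r(extractQuestionsRegex($html));
```

A regex like this will break as soon as the markup varies (extra
attributes, single quotes), so treat it as a one-off tool rather than a
general HTML parser.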

Thanks,
Ash
http://www.ashleysheridan.co.uk


From: vikash.iitb on
I use this: http://simplehtmldom.sourceforge.net/
Check it out.


Thanks,
Vikash Kumar
--
http://vika.sh


On Sat, Apr 3, 2010 at 8:28 PM, Ashley Sheridan <ash(a)ashleysheridan.co.uk> wrote:

> I don't think there is a getElementsByClass function. HTML5 is proposing
> one, but that will most likely be implemented in Javascript before PHP
> Dom. There is a way to tidy up the HTML to make it XHTML, but I'm not
> sure what it is. If you know roughly where in the document the HTML
> snippet is you can use XPath to grab it.
>
> Failing that, what about a regex? It shouldn't be too hard to write a
> regex to match your example above.
From: Adam Richardson on
>
> <p class="question">
> Who is Roger Rabbit?
> </p>
>
> My question is -- how can I extract the string "Who is Roger Rabbit?" from
> each page using php? You see, I want to store the questions in a database
> without having to re-type, or cut/paste, each one.
>


> I have not found/created a working example of this using PHP. I tried using
> PHP's getElementByID(), but that requires the target file to be valid xml
> and the string to be contained within an ID and not a class. These pages do
> not support either requirement.
>
> Additionally, I realize that I can load the files and parse out what is
> between the <p> tags, but I was hoping for a "GetElementByClass" way to do
> this.
>
> So, is there one?
>


Perhaps I'd try this:
http://simplehtmldom.sourceforge.net/manual.htm

Adam

--
Nephtali: PHP web framework that functions beautifully
http://nephtaliproject.com
From: "Peter Pei" on
On Sat, 03 Apr 2010 08:58:44 -0600, Ashley Sheridan
<ash(a)ashleysheridan.co.uk> wrote:

> I don't think there is a getElementsByClass function. HTML5 is proposing
> one, but that will most likely be implemented in Javascript before PHP
> Dom. There is a way to tidy up the HTML to make it XHTML, but I'm not
> sure what it is. If you know roughly where in the document the HTML
> snippet is you can use XPath to grab it.
>
> Failing that, what about a regex? It shouldn't be too hard to write a
> regex to match your example above.

Some JavaScript engines already support getElementsByClassName; Opera
does, for example.

--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/