From: robert on
Often I want to extract the contents of some web tables. The formats are
mostly static, with simple text & numbers in the cells; other tags are to be
stripped off. So a simple & fast approach would be OK.

Which of the various modules around is easiest to use, stable,
up-to-date, and offers iterator access or, better, matrix access (without
the need for callback functions, classes, etc. for basic tasks)?


Robert
From: Tim Cook on
There are a couple of HTML examples using Pyparsing here:

http://pyparsing.wikispaces.com/Examples


--Tim

On Sun, 2008-07-06 at 14:40 +0200, robert wrote:
> Often I want to extract some web table contents. Formats are
> mostly static, simple text & numbers in it, other tags to be
> stripped off. So a simple & fast approach would be ok.
>
> What of the different modules around is most easy to use, stable,
> up-to-date, iterator access or best matrix-access (without need
> for callback functions,classes.. for basic tasks)?
>
>
> Robert
--
Timothy Cook, MSc
Health Informatics Research & Development Services
From: robert on
Tim Cook wrote:
>
> On Sun, 2008-07-06 at 14:40 +0200, robert wrote:
>> Often I want to extract some web table contents. Formats are
>> mostly static, simple text & numbers in it, other tags to be
>> stripped off. So a simple & fast approach would be ok.
>>
>> What of the different modules around is most easy to use, stable,
>> up-to-date, iterator access or best matrix-access (without need
>> for callback functions,classes.. for basic tasks)?
>>
> There are couple of HTML examples using Pyparsing here:
>
> http://pyparsing.wikispaces.com/Examples
>
>

Hm, nothing there that is specific to HTML tables.

Meanwhile:

I dislike "ClientTable" (file-centric, too many parsing errors on
real-world pages).

"TableParse" works. It is a very simple & fast 70-liner: regexp -> matrix,
plus strip/clean and HTML-entity conversion. Quick hands-on success.
It deliberately does not separate nested tables or handle such
complexities, but it works for simple hands-on tasks in the real world.
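
For readers who want to see what such a regexp -> matrix approach can look
like, here is a minimal sketch using only the standard library. It is not
the TableParse source; the names ROW_RE, CELL_RE, TAG_RE, clean_cell and
table_to_matrix are made up for illustration, and like the module described
above it assumes flat, non-nested tables with properly closed tags.

```python
import re
from html import unescape

# Hypothetical patterns, assuming simple, non-nested tables.
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr\s*>", re.IGNORECASE | re.DOTALL)
CELL_RE = re.compile(r"<t[dh]\b[^>]*>(.*?)</t[dh]\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def clean_cell(fragment):
    """Strip inner tags, decode HTML entities, collapse whitespace."""
    text = TAG_RE.sub("", fragment)
    text = unescape(text)
    return " ".join(text.split())


def table_to_matrix(html_text):
    """Return a list of rows, each a list of cleaned cell strings."""
    return [
        [clean_cell(cell) for cell in CELL_RE.findall(row)]
        for row in ROW_RE.findall(html_text)
    ]


sample = (
    "<table><tr><th>Item</th><th>Price</th></tr>"
    "<tr><td>Tea &amp; milk</td><td><b> 3.50 </b></td></tr></table>"
)
matrix = table_to_matrix(sample)
```

Nested tables or unclosed cells will confuse these patterns, which is the
trade-off of staying with regular expressions.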


Robert
From: "Sebastian "lunar" Wiesner" on
robert <no-spam(a)no-spam-no-spam.invalid>:

> Often I want to extract some web table contents. Formats are
> mostly static, simple text & numbers in it, other tags to be
> stripped off. So a simple & fast approach would be ok.
>
> What of the different modules around is most easy to use, stable,
> up-to-date, iterator access or best matrix-access (without need
> for callback functions,classes.. for basic tasks)?

Not more than a handful of lines with lxml.html:

def htmltable2matrix(table):
    """Convert an HTML table to a matrix (a list of row lists).

    :param table: The HTML table element
    :type table: An lxml element
    """
    matrix = []
    for row in table:
        # text_content() drops nested tags and keeps only the text
        matrix.append([e.text_content() for e in row])
    return matrix
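
A possible way to call it, as a sketch: this assumes lxml is installed and
that the rows are direct children of the table element (no tbody wrapper in
the markup). The sample markup and the variable names are made up for
illustration.

```python
import lxml.html


def htmltable2matrix(table):
    """Same logic as the function above, written as a comprehension."""
    return [[cell.text_content() for cell in row] for row in table]


# Hypothetical sample markup; in practice this would be a fetched page.
SAMPLE = (
    "<table><tr><th>Item</th><th>Price</th></tr>"
    "<tr><td>Tea</td><td><b>3.50</b></td></tr></table>"
)

doc = lxml.html.fromstring(SAMPLE)
# iter() also matches the root element itself, so this works whether the
# table is the whole fragment or sits somewhere inside a full page.
table = next(doc.iter("table"))
matrix = htmltable2matrix(table)
```

If a page does wrap its rows in tbody, iterating over table.iter("tr")
instead of the table itself would be one way to reach them.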



--
Freedom is always the freedom of dissenters.
(Rosa Luxemburg)