From: robert on 6 Jul 2008 08:40

Often I want to extract the contents of a web table. The formats are mostly static, with simple text and numbers in the cells; any other tags should be stripped off. So a simple and fast approach would be OK.

Which of the available modules is easiest to use, stable, and up to date, with iterator access or, better, matrix access (without needing callback functions, classes, etc. for basic tasks)?

Robert

From: Tim Cook on 6 Jul 2008 08:52

There are a couple of HTML examples using Pyparsing here:

http://pyparsing.wikispaces.com/Examples

--Tim

From: robert on 6 Jul 2008 10:42

Tim Cook wrote:
> There are couple of HTML examples using Pyparsing here:
>
> http://pyparsing.wikispaces.com/Examples

Hm, nothing specific to HTML tables there.

Meanwhile: I dislike "ClientTable" (file-centric, too many parsing errors in the real world). "TableParse" works. It is a very simple and fast 70-liner: regexp -> matrix, plus strip/clean/HTML-entity conversion. Quick hands-on success. It deliberately doesn't separate nested tables or handle similar complexities, but it works for simple hands-on tasks in the real world.

Robert
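
For readers wondering what a "regexp -> matrix" approach of the kind described above might look like, here is a minimal sketch using only the standard library. It is not the TableParse source; the function names, the patterns, and the sample markup are assumptions made for illustration. Like the module described, it makes no attempt to separate nested tables.

```python
import html
import re

# Hypothetical patterns for illustration; not taken from TableParse.
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr\s*>", re.IGNORECASE | re.DOTALL)
CELL_RE = re.compile(r"<t[dh]\b[^>]*>(.*?)</t[dh]\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def clean_cell(fragment):
    """Strip inner tags, decode HTML entities, and collapse whitespace."""
    text = TAG_RE.sub("", fragment)
    text = html.unescape(text)
    return " ".join(text.split())


def regex_table_to_matrix(source):
    """Return a list of rows, each a list of cleaned cell strings.

    Assumes simple, flat markup: rows of nested tables are not separated.
    """
    return [
        [clean_cell(cell) for cell in CELL_RE.findall(row)]
        for row in ROW_RE.findall(source)
    ]


SAMPLE = (
    "<table>"
    "<tr><th>Item</th><th>Price</th></tr>"
    "<tr><td><b>Tea</b> &amp; milk</td><td> 1.50 </td></tr>"
    "</table>"
)

matrix = regex_table_to_matrix(SAMPLE)
```

For the sample above, `matrix[1][0]` is the string `Tea & milk`. Regular expressions only cope with well-behaved markup, so this suits the "mostly static" pages from the original question rather than arbitrary HTML.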
From: "Sebastian "lunar" Wiesner" on 6 Jul 2008 11:34

Not more than a handful of lines with lxml.html:

    def htmltable2matrix(table):
        """Converts a html table to a matrix.

        :param table: The html table element
        :type table: An lxml element
        """
        matrix = []
        for row in table:
            matrix.append([e.text_content() for e in row])
        return matrix
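
The function in the post takes an already parsed table element, so a short usage sketch may help. The version below is a variation, not Sebastian's code: it assumes the third-party lxml package is installed, the name `lxml_table_to_matrix` and the sample markup are invented for illustration, and it walks `tr` elements explicitly so that rows wrapped in `<thead>` or `<tbody>` are still found.

```python
import lxml.html


def lxml_table_to_matrix(table):
    """Convert an lxml table element to a list of rows of cell text.

    iter("tr") also descends into <thead>/<tbody> wrappers. Note that it
    would pick up the rows of nested tables as well.
    """
    matrix = []
    for row in table.iter("tr"):
        # Only direct <td>/<th> children count as cells of this row.
        cells = [cell for cell in row if cell.tag in ("td", "th")]
        matrix.append([cell.text_content().strip() for cell in cells])
    return matrix


SAMPLE = (
    "<html><body><table>"
    "<thead><tr><th>Name</th><th>Qty</th></tr></thead>"
    "<tbody><tr><td><b>Apples</b></td><td>3</td></tr></tbody>"
    "</table></body></html>"
)

doc = lxml.html.fromstring(SAMPLE)
table = next(doc.iter("table"))  # first table in the document
matrix = lxml_table_to_matrix(table)
```

Here `text_content()` does the tag stripping asked for in the original question: the `<b>` around "Apples" disappears and only the text remains.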