From: davidgp on
Hello, I'm new to this group, and quite new to Python!
I'm trying to scrape some address data from bundes-telefonbuch.de, but I
ran into a problem.
The link looks like this: http://www.bundes-telefonbuch.de/cgi-btbneu/chtml/chtml?WA=20
and it is basically the same for every search query.
So I need to submit POST data to the web server, which I try to do
like this:

import urllib
import urllib2

# Install a global opener that sends a browser-like User-Agent
opener = urllib2.build_opener()
opener.addheaders = [('User-Agent',
    'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.4 (like Gecko)')]
urllib2.install_opener(opener)

# Form fields for the search request
data = urllib.urlencode({'F0': 'mySearchKeyword', 'B': 'T', 'F8': 'A || G',
    'W': '1', 'Z': '0', 'HA': '10',
    'SAS_static_0_treffer_treffer': 'Suche starten',
    'S': '1', 'translationtemplate': 'checkstrasse'})

# Passing data makes urlopen send a POST instead of a GET
url = 'http://www.bundes-telefonbuch.de/cgi-btbneu/chtml/chtml?WA=20'
response = urllib2.urlopen(url, data)

This returns a page saying I have to re-enter my search terms.
What's going wrong here?

Thanks!!
From: Rebelo on
On 19 lip, 12:23, davidgp <davidvanijzendo...(a)gmail.com> wrote:
> hello, i'm new on this group, and quiet new to python!
> i'm trying to scrap some adress data from bundes-telefonbuch.de but i
> run into a problem:
> the link is like this:http://www.bundes-telefonbuch.de/cgi-btbneu/chtml/chtml?WA=20
> and it is basically the same for every search query.
> thus i need to submit post data to the webserver, i try to do this
> like this:
>
> opener = urllib2.build_opener()
> opener.addheaders = [('User-Agent', 'Mozilla/5.0 (compatible;
> Konqueror/3.5; Linux) KHTML/3.5.4 (like Gecko)')]
> urllib2.install_opener(opener)
>
> data = urllib.urlencode({'F0': 'mySearchKeyword','B': 'T','F8': 'A ||
> G','W': '1','Z': '0','HA': '10','SAS_static_0_treffer_treffer': 'Suche
> starten','S': '1','translationtemplate': 'checkstrasse'})
>
> url = 'http://www.bundes-telefonbuch.de/cgi-btbneu/chtml/chtml?WA=20'
> response = urllib2.urlopen(url, data)
>
> this returns a page saying i have to reenter my search terms..
> what's going wrong here?
>
> Thanks!!

Try mechanize: http://wwwsearch.sourceforge.net/mechanize/

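import mechanize

# Fetch the page that holds the search form and parse its forms
response = mechanize.urlopen("http://www.bundes-telefonbuch.de/")
forms = mechanize.ParseResponse(response, backwards_compat=False)
form = forms[0]        # assumes the search form is the first form on the page
form["F0"] = "query"   # enter your search term here

# Submit the form and save the result page
html = mechanize.urlopen(form.click()).read()
f = open("tmp.html", "w")
f.write(html)          # html is a single string, so write() rather than writelines()
f.close()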

Or you can try to parse the response yourself, but I think their HTML is
not valid.
From: Michael Torrie on
On 06/19/2010 04:23 AM, davidgp wrote:
> opener = urllib2.build_opener()
> opener.addheaders = [('User-Agent', 'Mozilla/5.0 (compatible;
> Konqueror/3.5; Linux) KHTML/3.5.4 (like Gecko)')]
> urllib2.install_opener(opener)
>
> data = urllib.urlencode({'F0': 'mySearchKeyword','B': 'T','F8': 'A ||
> G','W': '1','Z': '0','HA': '10','SAS_static_0_treffer_treffer': 'Suche
> starten','S': '1','translationtemplate': 'checkstrasse'})
>
> url = 'http://www.bundes-telefonbuch.de/cgi-btbneu/chtml/chtml?WA=20'
> response = urllib2.urlopen(url, data)
>
> this returns a page saying i have to reenter my search terms..
> what's going wrong here?

Most likely you need a cookie. You'll probably have to set up a cookie
store for use with urllib2, then request the page that the search form
is on so that the cookie is generated, and then make your post with your
search terms.
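
The steps above could be sketched roughly as follows. This is only a
minimal sketch, not a tested solution for this particular site: whether a
cookie is really what the server is missing is a guess, and the helper
names make_session and search are made up for illustration. The imports
fall back to the Python 3 module names so the same code runs on either
version.

```python
try:  # Python 2, as used in this thread
    import cookielib
    import urllib2
    from urllib import urlencode
except ImportError:  # Python 3 names for the same modules
    import http.cookiejar as cookielib
    import urllib.request as urllib2
    from urllib.parse import urlencode


def make_session(user_agent):
    """Build an opener that stores cookies and sends them back (hypothetical helper)."""
    jar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
    opener.addheaders = [('User-Agent', user_agent)]
    return opener, jar


def search(opener, form_url, post_url, fields):
    """GET the form page first, then POST the search (hypothetical helper)."""
    # Step 1: request the page the form is on, so the server can set its cookie.
    opener.open(form_url).read()
    # Step 2: POST the form fields; the opener sends the stored cookie along.
    data = urlencode(fields)
    if not isinstance(data, bytes):
        data = data.encode('ascii')  # Python 3 needs bytes for POST data
    return opener.open(post_url, data).read()
```

To try it, call make_session with the User-Agent string from the original
post, then call search with the form page (assumed here to be
http://www.bundes-telefonbuch.de/, as in the mechanize example), the
chtml?WA=20 URL, and the same dictionary of fields that was passed to
urlencode. If the result is still the "re-enter" page, printing the
contents of the cookie jar shows whether the server set any cookie at all.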