From: james27 on 5 Oct 2009 03:54

Hello,

I'm new to Python and I have a problem with mechanize. I have used mechanize before without any trouble, but there is one site I can't log in to successfully. I have been looking for a solution for several days without success.

The login request itself is not the problem. The problem is that I can't retrieve the HTML source of the page I open afterwards. All I get back is a small piece of HTML like this:

    <html>
    <script language=javascript>
    location.replace("http://www.naver.com");
    </script>
    </html>

I want to retrieve the full HTML source, but I can't. I have tried twill, mechanize, urllib and so on, and I'm out of ideas. Can anyone help?

Here is the full source code. Thanks in advance!

    # -*- coding: cp949 -*-
    import sys,os
    import mechanize, urllib
    import cookielib
    import re
    import BeautifulSoup

    # Form fields sent to the login URL
    params = urllib.urlencode({'url':'http://www.naver.com',
        'svctype':'',
        'viewtype':'',
        'postDataKey':'',
        'encpw':'3a793b174d976d8a614467eb0466898230f39ca68a8ce2e9c866f9c303e7c96a17c0e9bfd02b958d88712f5799abc5d26d5b6e2dfa090e10e236f2afafb723d42d2a2aba6cc3f268e214a169086af782c22d0c440c876a242a4411860dd938c4051acce987',
        'encnm':'100003774',
        'saveID':'0',
        'enctp':'1',
        'smart_level':'1',
        'id':'lbu142vj',
        'pw':'wbelryl',
        'x':'24',
        'y':'4'
        })

    # POST the login form and print whatever comes back
    rq = mechanize.Request("http://nid.naver.com/nidlogin.login", params)
    rs = mechanize.urlopen(rq)
    data = rs.read()
    print data

    # Then request the mail page
    rq = mechanize.Request("http://mail2.naver.com")
    rs = mechanize.urlopen(rq)
    data = rs.read()
    print data

--
View this message in context: http://www.nabble.com/some-site-login-problem-help-plz..-tp25746497p25746497.html
Sent from the Python - python-list mailing list archive at Nabble.com.
From: Diez B. Roggisch on 5 Oct 2009 04:26

james27 wrote:
> actually only can read some small html code, such like below.
>
> <html>
> <script language=javascript>
> location.replace("http://www.naver.com");
> </script>
> </html>
>
> i want to retrive full html source code..but i can't . i was try with
> twill and mechanize and urllib and so on.
> i have no idea.. anyone can help me?

Your problem is that the site uses JavaScript to replace itself. Mechanize can't do anything about that. You might have more luck with scripting a browser. No idea if there are any special packages available for that though.

Diez
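For a stub as simple as the one quoted above, a lighter workaround than scripting a browser may sometimes be enough: pull the redirect target out of the script text and request that URL yourself. The sketch below is written for Python 3 and does not execute any JavaScript. It only pattern-matches `location.replace(...)` and `location.href = ...`, so it will miss anything more elaborate. The names `find_js_redirect` and `_JS_REDIRECT` are made up for this illustration and are not part of mechanize.

```python
import re
from typing import Optional

# Matches location.replace("...") or location.href = "..." with either
# quote style. This is plain pattern matching, not JavaScript execution.
_JS_REDIRECT = re.compile(
    r"""location\.(?:replace\(\s*|href\s*=\s*)(['"])(?P<url>.+?)\1""",
    re.IGNORECASE,
)


def find_js_redirect(html: str) -> Optional[str]:
    """Return the URL of the first simple JavaScript redirect, or None."""
    match = _JS_REDIRECT.search(html)
    return match.group("url") if match else None


# The stub page from the original post.
stub = """<html>
<script language=javascript>
location.replace("http://www.naver.com");
</script>
</html>"""

print(find_js_redirect(stub))
```

If the function returns a URL, the next request can be made against it with the same opener, so that any cookies set by the login response are sent along. Whether that yields the page you actually want depends on the site.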
From: james27 on 5 Oct 2009 08:49

I'm still looking for a good solution. Anyway, thanks Diez :)
From: lkcl on 12 Oct 2009 13:35
On Oct 5, 8:26 am, "Diez B. Roggisch" <de...(a)nospam.web.de> wrote:
> Your problem is that the site uses JavaScript to replace itself. Mechanize
> can't do anything about that. You might have more luck with scripting a
> browser. No idea if there are any special packages available for that
> though.

yes, there are. i've mentioned this a few times on comp.lang.python (so you can search for them) and have the instances documented here:

http://wiki.python.org/moin/WebBrowserProgramming

basically, you're not going to like this, but you actually need a _full_ web browser engine, and to _execute_ the javascript. then, after a suitable period of time (or after the engine's "stopped executing" callback has been called, if it has one) you can node-walk the DOM of the engine, grab the engine's document.body.innerHTML property, or use the engine's built-in XPath support (if it has it) to find specific parts of the DOM faster than if you extracted the text (into lxml etc).

you should not be shocked by this - by the fact that it takes a whopping 10 or 20 MB library, including a graphical display mechanism, to execute a few bits of javascript.
also, if you ask him nicely, flier liu is currently working on http://code.google.com/p/pyv8 and on implementing the W3C DOM standard as a "daemon" service (i.e. with no GUI component) and he might be able to help you out. the pyv8 project comes with an example w3c.py file which implements DOM partially, but i know he's done a lot more.

so - it's all doable, but for a given value of "do" :)

l.
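The recipe in this post (load the page in a real engine, give the scripts time to run, then read `document.body.innerHTML`) can be sketched roughly as follows. This is only an illustration under assumptions: the thread does not name a particular tool, so the function takes any object that offers `get(url)` and `execute_script(js)` methods, which is the shape of a Selenium WebDriver instance. The name `rendered_body_html` and the `settle_seconds` parameter are made up here, and a fixed sleep is the crudest form of "a suitable period of time".

```python
import time


def rendered_body_html(driver, url, settle_seconds=2.0):
    """Load url in a browser engine and return the DOM after scripts ran.

    driver is assumed to expose get(url) and execute_script(js), as a
    Selenium WebDriver object does. This is an assumption of the sketch,
    not something the thread specifies.
    """
    driver.get(url)
    # Crude wait so redirects and page scripts have time to finish.
    time.sleep(settle_seconds)
    # Ask the engine itself for the current document body.
    return driver.execute_script("return document.body.innerHTML")


# Possible usage with Selenium, if it and a browser driver are installed:
#
#     from selenium import webdriver
#     driver = webdriver.Firefox()
#     try:
#         print(rendered_body_html(driver, "http://mail2.naver.com"))
#     finally:
#         driver.quit()
```

Passing the driver in rather than creating it inside the function keeps the sketch independent of any one engine. Logging in would still have to happen in the same browser session before the mail page is requested.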