From: james27 on

Hello,

I'm new to Python and I have a problem with mechanize.
I have used mechanize before without any trouble, but I can't get it to
work with one particular site. I have been looking for a solution for
several days without success.
The login itself seems to go through, but I can't retrieve the HTML source
of the page that opens afterwards.
All I get back is a small piece of HTML like the one below.

<html>
<script language=javascript>
location.replace("http://www.naver.com");
</script>
</html>

I want to retrieve the full HTML source, but I can't. I have tried twill,
mechanize, urllib and so on.
I'm out of ideas. Can anyone help me?

Here is the full source code.
Thanks in advance!

# -*- coding: cp949 -*-
import sys,os
import mechanize, urllib
import cookielib
import re
import BeautifulSoup

params = urllib.urlencode({'url':'http://www.naver.com',
'svctype':'',
'viewtype':'',
'postDataKey':'',

'encpw':'3a793b174d976d8a614467eb0466898230f39ca68a8ce2e9c866f9c303e7c96a17c0e9bfd02b958d88712f5799abc5d26d5b6e2dfa090e10e236f2afafb723d42d2a2aba6cc3f268e214a169086af782c22d0c440c876a242a4411860dd938c4051acce987',
'encnm':'100003774',
'saveID':'0',
'enctp':'1',
'smart_level':'1',
'id':'lbu142vj',
'pw':'wbelryl',
'x':'24',
'y':'4'
})
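# Step 1: POST the login form and print whatever comes back.
rq = mechanize.Request("http://nid.naver.com/nidlogin.login", params)
rs = mechanize.urlopen(rq)
data = rs.read()
print data

# Step 2: request the mail page after logging in and print its source.
rq = mechanize.Request("http://mail2.naver.com")
rs = mechanize.urlopen(rq)
data = rs.read()
print data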
--
View this message in context: http://www.nabble.com/some-site-login-problem-help-plz..-tp25746497p25746497.html
Sent from the Python - python-list mailing list archive at Nabble.com.

From: Diez B. Roggisch on

Your problem is that the site uses JavaScript to replace itself. Mechanize
can't do anything about that. You might have more luck with scripting a
browser. No idea if there are any special packages available for that
though.
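
For the trivial case shown in the question, where the whole page is a single location.replace() call, the target URL can at least be read out of the returned HTML and requested by hand. The following is only a minimal sketch under that assumption, written for Python 3 with the standard library; the helper name find_js_redirect and the regular expression are illustrative, not part of mechanize, and this will not work for pages that build their content or their redirect target with real JavaScript logic.

```python
import re

# Matches location.replace("..."), location.href = "..." and location = "...".
# This only recognises literal string targets, nothing computed at runtime.
_JS_REDIRECT = re.compile(
    r"""location(?:\.href)?\s*(?:\.replace\s*\(|=)\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def find_js_redirect(html):
    """Return the URL a trivial JavaScript redirect points to, or None."""
    match = _JS_REDIRECT.search(html)
    return match.group(1) if match else None


# The page body quoted in the original question.
page = """<html>
<script language=javascript>
location.replace("http://www.naver.com");
</script>
</html>"""

print(find_js_redirect(page))  # prints http://www.naver.com
```

If the function returns a URL, the script can request that URL next with the same opener, so that any cookies set by the login are sent along.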

Diez

From: james27 on

still looking for good solution.
anyway..thanks Diez :)


From: lkcl on
On Oct 5, 8:26 am, "Diez B. Roggisch" <de...(a)nospam.web.de> wrote:
> Your problem is that the site uses JavaScript to replace itself. Mechanize
> can't do anything about that. You might have more luck with scripting a
> browser. No idea if there are any special packages available for that
> though.

yes, there are. i've mentioned this a few times on comp.lang.python
(so you can search for them) and have the instances documented here:

http://wiki.python.org/moin/WebBrowserProgramming

basically, you're not going to like this, but you actually need
a _full_ web browser engine, and to _execute_ the javascript.
then, after a suitable period of time (or after the engine's
"stopped executing" callback has been called, if it has one)
you can then node-walk the DOM of the engine, grab the engine's
document.body.innerHTML property, or use the engine's built-in
XPath support (if it has it) to find specific parts of the DOM
faster than if you extracted the text (into lxml etc).
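
as an illustration of that sequence (load the page in a real engine, wait for it to settle, then grab document.body.innerHTML), here is a rough sketch. it assumes the third-party Selenium package plus Firefox and geckodriver are installed, which is one possible choice of engine and not one named above; the function name fetch_rendered_html and the timeout value are made up for the example. waiting on document.readyState is only an approximation of a "stopped executing" callback, and a page that redirects via script may need a longer or more specific wait.

```python
def fetch_rendered_html(url, timeout=15):
    """Load url in a real browser, let its JavaScript run, return the body HTML."""
    # Imported lazily so the module can be imported without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumption: Firefox and geckodriver are available on this machine.
    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Wait until the engine reports that the document has finished loading.
        WebDriverWait(driver, timeout).until(
            lambda d: d.execute_script("return document.readyState") == "complete"
        )
        # Ask the engine for the DOM as it stands after the scripts have run.
        return driver.execute_script("return document.body.innerHTML")
    finally:
        # Always shut the browser down, even if loading failed.
        driver.quit()
```

calling fetch_rendered_html("http://www.naver.com") would start a browser, so it is left out of the block; the returned string can then be handed to a parser of your choice.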

you should not be shocked by this - by the fact that it takes
a whopping 10 or 20mb library, including a graphical display
mechanism, to execute a few bits of javascript.

also, if you ask him nicely, flier liu is currently working on
http://code.google.com/p/pyv8 and on implementing the W3C DOM
standard as a "daemon" service (i.e. with no GUI component) and
he might be able to help you out. the pyv8 project comes with
an example w3c.py file which implements DOM partially, but i
know he's done a lot more.

so - it's all doable, but for a given value of "do" :)

l.