From: Jürgen Exner on
"Thomas Andersson" <thomas(a)tifozi.net> wrote:
>As it is now it keeps grabbing the same page over and over thousands of
>times (creating new files for each loop).
>
>my $pcnt = 1;
>my $page = get
>"http://csr.wwiionline.com/scripts/services/persona/sorties.jsp?page=$pcnt&pid=$pid";
>while ($page) {
> if ($page) {
> print "Site is alive\n";
> }
> else {
> print "Site is not accessible\n";
> };
>
>#Create filename and write file, then save grabbed webpage into it.
>open FILE, ">", "c:\\scr\\$pid-pg$pcnt.txt" or die $!;
>print FILE $page;
>$pcnt += 1;
>};
>
>I guess the URL doesn't get updated by the increased pagecount, any
>suggestions on how to fix that part?

It may or it may not. Had you used better indentation then you might
have spotted that your get() is outside of the loop, therefore it is
executed only once, therefore the value of $page never changes, and
therefore of course your loop never terminates because the loop
condition will always be the same value as in the first test.
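
A minimal sketch of that fix, with the fetch moved inside the loop so the URL is rebuilt from the current $pcnt on every pass. To keep it runnable offline, fake_get() is a hypothetical stand-in for LWP::Simple's get(), and the pid value and page bodies are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical stand-in for LWP::Simple::get so the sketch runs offline:
# it "serves" three pages and returns undef afterwards, as get() does
# when a fetch fails.
my @fake_pages = ( "sorties, page 1", "sorties, page 2", "sorties, page 3" );
my @requested;

sub fake_get {
    my ($url) = @_;
    push @requested, $url;
    my ($n) = $url =~ /page=(\d+)/;
    return $n ? $fake_pages[ $n - 1 ] : undef;
}

my $pid  = 12345;    # assumed player id, for illustration only
my $pcnt = 1;
my @fetched;

# The fetch is part of the loop condition, so each pass builds a fresh URL
# and $page really can change between tests.
while ( defined( my $page = fake_get("sorties.jsp?page=$pcnt&pid=$pid") ) ) {
    push @fetched, $page;
    $pcnt++;
}

print "$_\n" for @requested;
print scalar(@fetched), " pages fetched\n";
```

With the real site this only terminates if get() eventually fails, so some other exit test is still needed when the server keeps answering.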

jue
From: Sherm Pendley on
"Thomas Andersson" <thomas(a)tifozi.net> writes:

> Sherm Pendley wrote:
>
>>> while ($page) {
>> The if() is redundant here; if $page is false, the while() will exit
>> and the if() won't be reached.
>
> Sorry, didn't quite get what you were saying here?

You had originally written something like this:

    while ($page) {
        if ($page) {
            # do stuff
        } else {
        }
    }

Since the while() loop repeats only if $page evaluates to a true
value, you don't need to check $page again with an if(). If $page is
false, the body of the loop will not execute at all, so by the time
you reach the line that the if() is on, you already know that $page
is true. So, the if() block will always run, and the else block never
will; that being the case, it's simpler to just omit the if():

    while ($page) {
        # do stuff
    }

Note that while() checks its condition only *once* per pass, at the top,
before running its block of code. So you can't omit the if() if the value
of $page might get changed inside the while(), before reaching the if():

    while ($page) {

        # code that might change $page

        # check $page again, because it might have been changed, and
        # the while() loop won't check again until the next time we get
        # to the top of the loop

        if ($page) {
            # do stuff
        }
    }

sherm--

--
Sherm Pendley <www.shermpendley.com>
<www.camelbones.org>
Cocoa Developer
From: Thomas Andersson on
Sherm Pendley wrote:

> You had originally written something like this:
>
>     while ($page) {
>         if ($page) {
>             # do stuff
>         } else {
>         }
>     }
>
> Since the while() loop repeats only if $page evaluates to a true
> value, you don't need to check $page again with an if(). If $page is
> false, the body of the loop will not execute at all, so by the time
> you reach the line that the if() is on, you already know that $page
> is true. So, the if() block will always run, and the else block never
> will; that being the case, it's simpler to just omit the if():

Ah, I realized that afterwards while looking over the code. That if/then bit
was a leftover from an example script I found and is now gone as it serves no
purpose in my script. Next thing I need to add is a check for the exit
conditions. I'm thinking that using $page as the condition might be a bad
idea; how about checking for a signal variable to be set? The code inside
the loop would run until my exit conditions are met and then set the signal
variable, telling the loop to end. The two conditions would be finding
either of two strings within the captured page (either a sid we already know
or the string "No more sorties").


From: Thomas Andersson on
Uri Guttman wrote:
>>>>>> "TA" == Thomas Andersson <thomas(a)tifozi.net> writes:

> so you need to put some conditionals in the loop. first, how would you
> know when the pages are done? can you look for a link to the next page
> and exit the loop if it isn't there? then define what a 'processed
> link' is. keep track (likely in a hash) of processed links and if you
> find one exit the loop. exiting a loop is easy, use the last function.

They've been quite helpful there, as the empty pages contain the string "No
more sorties". The other condition is trickier: I need to load a variable at
the same time as the pid that tells the last processed sid; when that sid is
found, no further pages need to be loaded (the whole point of capturing
these list pages is so we can extract all sids we find in them for further
processing).
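
A rough sketch of how those two exit conditions could sit in one loop, using last rather than a separate flag variable. Everything specific here is an assumption made for illustration: fake_get() stands in for the real fetch, the "sid=NNN" text is an invented page format, and $last_sid is a made-up value:

```perl
use strict;
use warnings;

# Hypothetical page bodies standing in for what get() would return. The
# "sid=NNN" markup is an assumption, not the site's real format.
my @fake_pages = (
    "sid=905 sid=904 sid=903",
    "sid=902 sid=901 sid=900",
    "sid=899 sid=898",
    "No more sorties",
);
sub fake_get { my ($n) = @_; return $fake_pages[ $n - 1 ] }

my $last_sid = 901;    # assumed: last sid handled in a previous run
my @new_sids;
my $pcnt = 1;

PAGE: while (1) {
    my $page = fake_get($pcnt);
    last PAGE unless defined $page;             # fetch failed
    last PAGE if $page =~ /No more sorties/;    # server has no more data

    for my $sid ( $page =~ /sid=(\d+)/g ) {
        last PAGE if $sid == $last_sid;         # caught up with earlier run
        push @new_sids, $sid;
    }
    $pcnt++;
}

print "new sids: @new_sids\n";
```

The PAGE label lets the inner for loop leave the outer while loop directly, which is what makes the extra flag variable unnecessary.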

> use less comment. make your comments mean something outside the
> code. code is what, comments are why. and you are writing code to be
> read by a maintainer. always keep that person in your mind and your
> code will be better for it.

Well, I only started learning perl a day ago and the comments are mostly for
my own sake to remind me what I'm doing as most of this stuff is still
pretty voodoo to me.

> have you ever heard of white space? jamming lines of code together
> makes major migraines when reading it. loosen up a little. blank
> lines between sections is a good idea.

Roger that, will do.

>> open PIDLIST, "<", $pidfile or die "Could not open $pidfile: $!";
>> my $pid = <PIDLIST>;
>> print $pid; # print just so we know we have a pid to process.
>
> comments on the code line are a poor idea in most cases. when they are
> long comments it is a horrible idea.

OK, will stop doing that then.

>> chomp $pid; # Remove endline from pid.
>
> again, you are telling us what you just did. redundant to anyone who
> knows what chomp is.

OK, but as I said before, I'm learning and those comments are only for my
own information to help me learn. Once it's done I can go over and remove
all those comments and put something more useful in.

>> my $page = get "$pbase?page=$pcnt&pid=$pid";
>> while ($page) {
>
> bah. it is not clear why you are testing page in the loop. and you
> have two duplicate lines with the get. make it an infinite loop and
> exit when the get fails.

Yeah, that's a big bug with my code and I know about it. The idea was to
keep loading pages until there were no more, but that idea failed as the
server keeps serving empty pages with ever higher page numbers. Another
way of ending the loop is needed, and I have two conditions, either of
which should end it.

>> # Create file for storing pages containing the sids.
>> my $tmpf = "c:/scr/$pid.txt";
>> open TEMPF, ">>", $tmpf or die "Could not open $tmpf: $!";
>> print TEMPF $page; # Store grabbed webpage into the file
>
> you can do that with getstore or use File::Slurp's write_file (from
> cpan).
>
> use File::Slurp ;
>
> write_file( "c:/scr/$pid.txt", $page ) ;
> much easier to read.

Definitely, so that one call replaces all 3 of my lines? But will I get an
error message like the previous one if it fails?

> here is a better loop:
>
> while( 1 ) {
>
>     my $page = get "$pbase?page=$pcnt&pid=$pid";
>     last unless $page ;
>     write_file( "c:/scr/$pid.txt", $page ) ;
> }
>
> short, easy to read, easy to maintain. now you can add in the checks
> for exiting the loop and it will be easier.

Hmm, as I'm a noob I don't quite get it, but I think it's along the lines I
mentioned in another message. I assume a non failure signals 1? and I need
to set anything but inside the loop to exit it? But what do I set? It has no
variable name?


From: Tad McClellan on
Thomas Andersson <thomas(a)tifozi.net> wrote:
> Uri Guttman wrote:

>> here is a better loop:
>>
>> while( 1 ) {
>>
>>     my $page = get "$pbase?page=$pcnt&pid=$pid";
>>     last unless $page ;
>>     write_file( "c:/scr/$pid.txt", $page ) ;
>> }
>>
>> short, easy to read, easy to maintain. now you can add in the checks
>> for exiting the loop and it will be easier.
>
> Hmm, as I'm a noob I don't quite get it, but I think it's along the lines I
> mentioned in another message. I assume


No need to assume, just look it up in the docs for the function
you are using:

perldoc LWP::Simple

The get() function will fetch the document identified by the
given URL and return it. It returns "undef" if it fails.


> a non failure signals 1?


A non-failure stores the contents of the page in $page (a true value).

A failure stores an undef in $page (a false value).

You should probably avoid using the word "signal" unless you
are talking about signals. That is, the term has a particular
meaning to programmers:

http://en.wikipedia.org/wiki/Signal_%28computing%29


> and I need
> to set anything but inside the loop to exit it?


No.

$page will contain undef (false) when the get() fails.

"unless" executes its statement when the condition is false.

So, when get() fails, "last" is evaluated and the loop will be exited.
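
To see that in action without touching the network, here is a small sketch. fake_get() is a hypothetical stand-in for get() that hands back two pages and then undef. The $pcnt++ line is an addition of mine that the quoted loop does not have:

```perl
use strict;
use warnings;

# Hypothetical stand-in for get(): two pages, then undef for "fetch failed".
my @responses = ( "<html>page 1</html>", "<html>page 2</html>", undef );
sub fake_get { return shift @responses }

my $pcnt = 1;
my @saved;

while (1) {
    my $page = fake_get();
    last unless $page;    # undef is false, so "last" runs and the loop ends
    push @saved, "page $pcnt: " . length($page) . " bytes";
    $pcnt++;              # advance, or the same page is requested every pass
}

print "$_\n" for @saved;
print "stopped at page $pcnt\n";
```

Note that an empty string or the string "0" is also false in Perl, so "last unless defined $page" is the stricter test if only a failed fetch should end the loop.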


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.