From: Klaus on
On 21 Apr, 20:07, s...(a)netherlands.com wrote:
> On Wed, 21 Apr 2010 10:06:14 -0700 (PDT), Klaus <klau...(a)gmail.com> wrote:
> >On 21 Apr, 14:35, alwaysonnet <kalyanrajsi...(a)gmail.com> wrote:
> >> Hello all,
> >> I'm trying to parse the XML using the XML::Twig module, as my XML could be
> >> too large to handle with XML::Simple. Please help me with how to
> >> print the values based on the following...
> >>  get the values of Sender, Receiver
> >>  get the FileType. In this case possible values are
> >> InitTAP,FatalRAP,ReTxTAP

> That's nice. Let's say he generally said "in this case it's:"
> InitTAP  ReTxTAP  FatalRAP
> Why? Because it's the file type.
> Maybe he wants all file types of the senders/receivers.

In that case you can use XML::Reader->newhd(... {filter => 2}):

use strict;
use warnings;
use XML::Reader;

# The XML is read from the __DATA__ section (the sample XML is not shown here).
my $rdr = XML::Reader->newhd(\*DATA, {filter => 2});

my ($sender, $receiver);

while ($rdr->iterate) {
    if ($rdr->path eq '/Data/ConnectionList/Connection/Sender') {
        $sender = $rdr->value;
    }
    elsif ($rdr->path eq '/Data/ConnectionList/Connection/Receiver') {
        $receiver = $rdr->value;
    }
    # Each child element of <FileType> names one file type (InitTAP, ...).
    # The /x modifier ignores whitespace, so the pattern can span two lines.
    elsif ($rdr->is_start
      and $rdr->path =~ m{\A /Data/ConnectionList/Connection/
                            FileItemList/FileItem/FileType/ (\w+) \z}xms) {
        printf "Sender: %-5s, Receiver: %-5s, Type: %s\n",
          $sender, $receiver, $1;
    }
}

Here is the output

Sender: BRADD, Receiver: SHANE, Type: InitTAP
Sender: BRADD, Receiver: SHANE, Type: ReTxTAP
Sender: BRADD, Receiver: SHANE, Type: FatalRAP
From: sln on
On Wed, 21 Apr 2010 11:48:59 -0700 (PDT), Klaus <klaus03(a)gmail.com> wrote:

>[...]

This is pretty good. I assume it handles attributes/values as well.
It seems to need more and more regex work the less is known about
the elements, but that's inherent in a tree-stack (path) approach.

It would be good, though, to have a capture mechanism: the user
switches XML capture on and off, the captured source is handed back
to the user on demand, and it can then be passed to an XML::Simple-style
mechanism to turn it into filtered records.

It wouldn't change the simple, low-memory stream parsing at all;
the source would just be captured (appended) to a named buffer
while capture is on.

It's not as easy as it seems, though: CaptureON/OFF (bufname, before/after),
nested captures, a single data pool. I think I've done this before.
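The capture idea above can be sketched in a few lines of core Perl. This is a hypothetical illustration only: the CapturePool class and its capture_on/capture_off/feed/get methods are invented names, not part of XML::Reader or any CPAN module, and the caller is assumed to feed the raw source text of each parser event. Each named buffer records [start, end) offsets into one shared pool, which makes nested captures come for free, and text is only stored while at least one capture is on.

```perl
use strict;
use warnings;

# Hypothetical sketch: named capture buffers switched on/off while
# streaming, nestable, all backed by a single data pool.
# Not part of XML::Reader or any CPAN module.
package CapturePool;

sub new {
    my ($class) = @_;
    # pool:   one string holding only text fed while some capture is on
    # open:   buffer name => start offset in pool (capture currently on)
    # ranges: buffer name => list of [start, end) offset pairs
    return bless { pool => '', open => {}, ranges => {} }, $class;
}

sub capture_on {
    my ($self, $name) = @_;
    $self->{open}{$name} = length $self->{pool}
        unless exists $self->{open}{$name};
    return;
}

sub capture_off {
    my ($self, $name) = @_;
    my $start = delete $self->{open}{$name};
    return unless defined $start;
    push @{ $self->{ranges}{$name} }, [ $start, length $self->{pool} ];
    return;
}

# Feed the raw source text of one parser event; it is kept only
# while at least one capture is on, so memory use stays low otherwise.
sub feed {
    my ($self, $text) = @_;
    $self->{pool} .= $text if %{ $self->{open} };
    return;
}

# Return everything captured so far under $name (on demand).
sub get {
    my ($self, $name) = @_;
    return join '',
        map { substr($self->{pool}, $_->[0], $_->[1] - $_->[0]) }
        @{ $self->{ranges}{$name} || [] };
}

package main;

my $cap = CapturePool->new;
$cap->feed('<Data>');                  # not captured
$cap->capture_on('conn');
$cap->feed('<Connection>');
$cap->capture_on('sender');            # nested capture
$cap->feed('<Sender>BRADD</Sender>');
$cap->capture_off('sender');
$cap->feed('</Connection>');
$cap->capture_off('conn');
$cap->feed('</Data>');                 # not captured

print $cap->get('sender'), "\n";   # <Sender>BRADD</Sender>
print $cap->get('conn'), "\n";     # <Connection><Sender>BRADD</Sender></Connection>
```

The before/after option is left out of this sketch: whether the triggering event itself is captured simply depends on calling capture_on before or after feeding that event. The strings returned by get could then be handed to XML::Simple's XMLin.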

-sln
From: Klaus on
On 22 Apr, 10:29, Klaus <klau...(a)gmail.com> wrote:
> On 21 Apr, 14:35, alwaysonnet <kalyanrajsi...(a)gmail.com> wrote:
> > Hello all,
> > I'm trying to parse the XML using the XML::Twig module, as my XML could be
> > too large to handle with XML::Simple.
> Klaus <klau...(a)gmail.com> wrote:
> > However, let me bring in a shameless plug:
> > You could also use my module XML::Reader
> >http://search.cpan.org/~keichner/XML-Reader-0.32/lib/XML/Reader.pm
> s...(a)netherlands.com wrote:
> > > Indeed shameless.
>
> > > [...]
>
> > > It would be good though to have a capture mechanism, where
> > > xml capture can be triggered on/off by the user, later to
> > > be regurgitated to the user (on demand), and given to an
> > > xml::simple style mechanism to turn it into filtered records.
>
> use XML::Reader;
> my $rdr = XML::Reader->newhd(\*DATA, {filter => 3,
>     using => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType'});

I have now released XML::Reader 0.34
http://search.cpan.org/~keichner/XML-Reader-0.34/lib/XML/Reader.pm

This new version lets you write the same program (the one that uses
XML::Reader to capture sub-trees from a potentially very large XML
file into a buffer and passes that buffer to XML::Simple) even more
concisely:

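use strict;
use warnings;
use XML::Reader 0.34;

use XML::Simple;
use Data::Dumper;

# {filter => 5} with a root/branch spec: each iteration delivers one
# captured FileType sub-tree (branch => '*' keeps the whole sub-tree).
my $rdr = XML::Reader->newhd(\*DATA, {filter => 5},
    { root   => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType',
      branch => '*' },
);

while ($rdr->iterate) {
    my $buffer = $rdr->rval;    # XML text of the current sub-tree
    my $ref = XMLin($buffer);   # convert it with XML::Simple
    print Dumper($ref), "\n\n";
}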
From: sln on
On Mon, 26 Apr 2010 13:13:24 -0700 (PDT), Klaus <klaus03(a)gmail.com> wrote:

>[...]

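Good job on this.

If you want all the captured sub-trees in one structure, collect them
first and wrap them in a single root element before calling XMLin:

my $buffer = '';

while ($rdr->iterate) {
    $buffer .= $rdr->rval;    # append each captured FileType sub-tree
}

if (length $buffer) {
    # XMLin needs exactly one root element, so wrap the concatenation
    my $ref = XMLin('<FileItem>'.$buffer.'</FileItem>');
    print Dumper($ref), "\n\n";
}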

-sln
From: John Bokma on
Klaus <klaus03(a)gmail.com> writes:

> my $rdr = XML::Reader->newhd(\*DATA, {filter => 5},

To me, "filter" is very unclear. I understand that these are options,
but a bare 5 is very confusing. Maybe split "filter" into several
options which, combined, result in 1, 2, 3, 4, 5?

Why is the constructor called newhd?

Anyway, thanks for mentioning this module; I will check it out when I
have more time.

--
John Bokma j3b

Hacking & Hiking in Mexico - http://johnbokma.com/
http://castleamber.com/ - Perl & Python Development