From: nobody on
I'm trying to process flat files with many thousands of records. In
these files several rows comprise the information for a single customer.
In the example __DATA__ below, I'm trying to fill the variables with the
customer information while the customer number is 06020004293, then for
customer number 07020000279, and finally customer number 09020000251. I
believe my problem is looping while the customer number remains the same,
then move on to the next customer numbers. I've been pulling my hair out
with nested while and do loops. I've included the desired output below.
Here's what I'm working with so far:



#!/usr/bin/perl

use strict;
use warnings;

my (
    $Name,
    $City,
    $Street
);

while (<DATA>) {

    chomp;

    if (substr($_, 12, 1) eq 'A') {
        $Name = substr($_, 14, 17);
    }

    if (substr($_, 12, 1) eq 'B') {
        $City = substr($_, 14, 17);
    }

    if (substr($_, 12, 1) eq 'C') {
        $Street = substr($_, 33, 19);
    }

}


print "Name: $Name\n";
print "City: $City\n";
print "Street: $Street\n";


# Desired output:

#Name: Fred Flintstone
#City: Bedrock
#Street: 123 Bedrock Road

#Name: George Washington
#City: Washington D.C.
#Street:

#Name: Joe Smith
#City: Smallville
#Street:


__DATA__
06020004293 A Fred Flintstone 123 Bedrock Road
06020004293 B Bedrock Gravel Pit
06020004293 C Loney Toons 123 Bedrock Road
07020000279 A George Washington 234 Washington Ave.
07020000279 B Washington D.C. 234 Washington Ave.
09020000251 A Joe Smith 54 Abbey Road
09020000251 B Smallville 54 Abbey Road
From: Ben Morrow on

Quoth nobody <nobody(a)nowhere.com>:
> I'm trying to process flat files with many thousands of records. In
> these files several rows comprise the information for a single customer.
> In the example __DATA__ below, I'm trying to fill the variables with the
> customer information while the customer number is 06020004293, then for
> customer number 07020000279, and finally customer number 09020000251. I
> believe my problem is looping while the customer number remains the same,
> then move on to the next customer numbers. I've been pulling my hair out
> with nested while and do loops. I've included the desired output below.
> Here's what I'm working with so far:

Since you want to print out the information you have every time you see
a new customer number, you need to extract and remember the number from
each line. I'll make minimal additions to your code to achieve this,
then talk about general style later.

> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> my (
> $Name,
> $City,
> $Street

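    $Customer,
    $Last_Customer,

Note that Perl explicitly allows you to include a trailing comma in
lists, so that you can add and remove lines without worrying about
whether this was the last entry or not. (The quoted $Street line above
has no comma of its own, so it needs one added before these two new
lines will compile.)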

> );
>
> while (<DATA>) {
>
> chomp;
>

    # Get the customer number for the new line (11 digits in this data)
    $Customer = substr($_, 0, 11);

    if (
        # ...we've seen at least one line already, and...
        defined $Last_Customer and

        # ...the new line is for a different customer from the last...
        $Customer ne $Last_Customer
    ) {
        # ...print out the data for the old customer before we proceed
        # to extract the data for the new one.
        print "Name: $Name\n";
        print "City: $City\n";
        print "Street: $Street\n";
        print "\n";

        # Clear the old customer's fields so they don't carry over to a
        # customer who has no 'B' or 'C' line.
        ($Name, $City, $Street) = ('', '', '');
    }

    # Remember which customer we were on for next time round the loop.
    $Last_Customer = $Customer;

> if (substr($_, 12, 1) eq 'A') {
> $Name = substr($_, 14, 17);
> }
>
> if (substr($_, 12, 1) eq 'B') {
> $City = substr($_, 14, 17);
> }
>
> if (substr($_, 12, 1) eq 'C') {
> $Street = substr($_, 33, 19);
> }
>
> }

We need to keep this final section in this version of the program, since
otherwise the very last customer will never get their information
printed. *However*, that fact should immediately make you say to
yourself 'I've just written the same thing twice. How could I have
avoided that?'.

> print "Name: $Name\n";
> print "City: $City\n";
> print "Street: $Street\n";

The first comment to make about style is, IMHO, that multiple 'print'
statements are always a bad idea. Perl has a special form of multi-line
quoting called 'here documents' which allow you to avoid that:

print <<OUTPUT;
Name: $Name
City: $City
Street: $Street

OUTPUT

See the section "<<EOF" in perldoc perlop for more details.
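
As a minimal sketch of that idea, the here document can be wrapped in a
small function so the layout lives in exactly one place. The name
format_customer is made up for illustration and is not part of your
program; undefined fields are assumed to mean "print nothing".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format one customer as a three-line block
# followed by a blank line, using a single here document.
sub format_customer {
    my ( $name, $city, $street ) = @_;

    # Treat missing fields as empty so interpolation does not warn.
    for ( $name, $city, $street ) {
        $_ = '' unless defined $_;
    }

    # Everything up to the OUTPUT marker is one double-quoted string;
    # the marker itself must start in the first column.
    return <<OUTPUT;
Name: $name
City: $city
Street: $street

OUTPUT
}

print format_customer( 'Fred Flintstone', 'Bedrock', '123 Bedrock Road' );
```

Returning the string instead of printing it directly is just one
possible choice; it makes the formatting easy to check on its own.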

The second is that it would be much easier to split the line into fields
first, rather than picking out pieces as you need them. For this I would
use a regex, which will additionally let you check that the line looks
as you expect. So, I might write something like

my @record = /^(\d{11}) ([ABC]) (.{17}) (.{19})$/
    or die "Invalid record: [$_]";

which does rather a lot of things in one statement. First the /.../
expression matches $_ against the given pattern and, because it is in
list context, returns a list of the substrings captured by each pair of
parentheses. Start with perldoc perlretut to understand the syntax used
for the patterns. Next, the 'my @record =' takes that list of
substrings, and puts it in a newly-declared array. Finally, if the
pattern match failed, the list is empty and the whole expression is
'false', so the 'or die "..."' will fire to alert you of the error. (The
reason for putting the offending line in [] in the error message is so
you can easily see if there is extra whitespace at either end.)

Using this array is then straightforward: the customer number is in
$record[0], the line code in $record[1], and the two data fields in
$record[2] and $record[3].
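
Here is a small sketch of that parsing step in isolation. It assumes an
11-digit customer number (as in the sample data) and space-padded
fields of 17 and 19 characters; the parse_record name and the sprintf
that builds the test line are only there for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one fixed-width line into its four fields.
# Assumed layout: 11-digit customer number, one-letter code, a
# 17-character field and a 19-character field, single-space separated.
sub parse_record {
    my ($line) = @_;

    my @record = $line =~ /^(\d{11}) ([ABC]) (.{17}) (.{19})$/
        or die "Invalid record: [$line]";

    s/\s+$// for @record;    # drop the padding from each field
    return @record;
}

# Build a padded line so the assumed column widths are explicit.
my $line = sprintf '%-11s %s %-17s %-19s',
    '06020004293', 'A', 'Fred Flintstone', '123 Bedrock Road';

my ( $customer, $code, $name, $street ) = parse_record($line);
print "$customer/$code/$name/$street\n";
# prints: 06020004293/A/Fred Flintstone/123 Bedrock Road
```

If your real files use different widths, only the numbers inside the
{...} quantifiers should need to change.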

(The next step would be to turn the printing into a subroutine, so you
don't have to duplicate the code, and to build up a hash for each
customer rather than using global variables; but this post is already
quite long enough... :).)
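
For what it's worth, a rough sketch of that next step might look like
the following. It is only one way to do it, and it rests on several
assumptions: the same padded layout as the regex above with an 11-digit
customer number, a guessed mapping from line code to output field
(taken from the substr offsets in your program), and invented names
(%FIELD_FOR, read_customers, format_customer).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Which output label each line code fills, and which captured field
# (index into @record) holds the value. This mapping is an assumption.
my %FIELD_FOR = (
    A => [ Name   => 2 ],
    B => [ City   => 2 ],
    C => [ Street => 3 ],
);

# Hypothetical helper: turn a list of lines into one hash per customer.
sub read_customers {
    my @lines = @_;
    my ( @customers, $current );

    for my $line (@lines) {
        my @record = $line =~ /^(\d{11}) ([ABC]) (.{17}) (.{19})$/
            or die "Invalid record: [$line]";

        # A change of customer number starts a fresh hash.
        if ( !$current or $current->{Number} ne $record[0] ) {
            $current = { Number => $record[0] };
            push @customers, $current;
        }

        my ( $label, $index ) = @{ $FIELD_FOR{ $record[1] } };
        ( $current->{$label} = $record[$index] ) =~ s/\s+$//;
    }
    return @customers;
}

# Hypothetical helper: the only place that knows the output layout.
sub format_customer {
    my ($customer) = @_;
    my $text = '';
    for my $label (qw(Name City Street)) {
        my $value = defined $customer->{$label} ? $customer->{$label} : '';
        $text .= "$label: $value\n";
    }
    return "$text\n";
}

# Sample lines, padded with sprintf so the assumed layout is explicit.
my @lines = map { sprintf '%-11s %s %-17s %-19s', @$_ } (
    [ '06020004293', 'A', 'Fred Flintstone',   '123 Bedrock Road' ],
    [ '06020004293', 'B', 'Bedrock',           'Gravel Pit' ],
    [ '06020004293', 'C', 'Loney Toons',       '123 Bedrock Road' ],
    [ '07020000279', 'A', 'George Washington', '234 Washington Ave.' ],
    [ '07020000279', 'B', 'Washington D.C.',   '234 Washington Ave.' ],
);

print format_customer($_) for read_customers(@lines);
```

Because each customer gets a fresh hash, nothing carries over from one
customer to the next, and the printing code exists only once. This
version collects every customer in memory before printing, which should
be fine for thousands of records; for very large files you would
probably print each customer as soon as the number changes instead.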

Ben

From: Tad McClellan on
nobody <nobody(a)nowhere.com> wrote:
> I'm trying to process flat files with many thousands of records. In
> these files several rows comprise the information for a single customer.
> In the example __DATA__ below, I'm trying to fill the variables with the
> customer information while the customer number is 06020004293, then for
> customer number 07020000279, and finally customer number 09020000251.


Another way of saying that is:

fill the variables with the customer information until the start
of the next customer record (marked by an 'A' row).


> I
> believe my problem is looping while the customer number remains the same,


or looping until an 'A' line is found...


> then move on to the next customer numbers.

[snip]

> # Desired output:
>
> #Name: Fred Flintstone
> #City: Bedrock
> #Street: 123 Bedrock Road
>
> #Name: George Washington
> #City: Washington D.C.
> #Street:
>
> #Name: Joe Smith
> #City: Smallville
> #Street:


--------------------------------
#!/usr/bin/perl
use warnings;
use strict;

my %buffer;
while ( <DATA> ) {
    chomp;
    my $code = substr $_, 12, 1;

    if ( $code eq 'A' ) {
        # An 'A' row starts a new customer: flush the previous one first.
        if ( keys %buffer ) {
            output(%buffer);
            %buffer = ();
        }
        $buffer{Name} = substr $_, 14, 17;
    }
    elsif ( $code eq 'B' ) {
        $buffer{City} = substr $_, 14, 17;
    }
    elsif ( $code eq 'C' ) {
        $buffer{Street} = substr $_, 34, 18;
    }
    else {
        warn "code '$code' is invalid\n";
    }
}
output(%buffer);    # don't forget the last customer


sub output {
    my %h = @_;
    foreach my $key ( qw/Name City Street/ ) {
        print "#$key: ";
        print $h{$key} if defined $h{$key};
        print "\n";
    }
    print "\n";
}


__DATA__
06020004293 A Fred Flintstone 123 Bedrock Road
06020004293 B Bedrock Gravel Pit
06020004293 C Loney Toons 123 Bedrock Road
07020000279 A George Washington 234 Washington Ave.
07020000279 B Washington D.C. 234 Washington Ave.
09020000251 A Joe Smith 54 Abbey Road
09020000251 B Smallville 54 Abbey Road
--------------------------------


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
From: nobody on
On Fri, 13 Nov 2009 20:37:20 -0600, Tad McClellan wrote:

Thanks for your answer, it does exactly what I asked. However, the data
files I'm dealing with are more complicated. In the __DATA__ below, as
part of the 06020004293 records, Fred has two daughters, each on her own
'B' record. The output for Fred's 06020004293 data shows Sue Flintstone
twice:

Name: Fred Flintstone
Daughter: Sue Flintstone
Daughter2: Sue Flintstone
Company: Gravel Pit
OldCompany: Loney Toons
Street: 123 Bedrock Road

The first daughter should be Jane Flintstone, so I'm trying to fix the
code below at the spot marked "NEED Daughter2". Again, any help would be
greatly appreciated!



#!/usr/bin/perl
use warnings;
use strict;

my $flag = 0;

my %buffer;
while ( <DATA> ) {
    chomp;
    my $code = substr $_, 12, 1;

    if ( $code eq 'A' ) {
        if ( keys %buffer ) {
            output(%buffer);
            %buffer = ();
        }
        $buffer{Name}   = substr $_, 14, 17;
        $buffer{Street} = substr $_, 32, 17;
    }
    elsif ( $code eq 'B' ) {
        $buffer{Daughter} = substr $_, 14, 17;
        $flag = 1;


        ####### NEED Daughter2

        if ($buffer{Daughter}) {
            $buffer{Daughter2} = substr $_, 14, 17;
        }


    }
    elsif ( $code eq 'C' ) {
        $buffer{Company} = substr $_, 32, 18;
    }
    elsif ( $code eq 'D' ) {
        $buffer{OldCompany} = substr $_, 14, 18;
    }
    else {
        warn "code '$code' is invalid\n";
    }
}

output(%buffer);


sub output {
    my %h = @_;
    foreach my $key ( qw/Name Daughter Daughter2 Company OldCompany Street/ ) {
        print "$key: ";
        print $h{$key} if defined $h{$key};
        print "\n";
    }
    print "\n";
}


__DATA__
06020004293 A Fred Flintstone 123 Bedrock Road
06020004293 B Jane Flintstone 123 Bedrock Road
06020004293 B Sue Flintstone 123 Bedrock Road
06020004293 C Bedrock Gravel Pit
06020004293 D Loney Toons 123 Bedrock Road
07020000279 A George Washington 234 Washington Ave.
07020000279 C Washington D.C. 234 Washington Ave.
09020000251 A Joe Smith 54 Abbey Road
09020000251 C Smallville 54 Abbey Road
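
One possible reading of why Sue appears twice: in the 'B' branch the
assignment to $buffer{Daughter} runs on every 'B' row before the test,
so the test is always true and both keys end up holding the current
row. A minimal sketch of one way around that is to pick the key before
assigning anything. The add_daughter name is invented for illustration,
and the sketch assumes there are never more than two 'B' rows per
customer (a third would overwrite Daughter2).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: store a daughter's name under the first free key.
# Takes a reference to the buffer hash so it can modify it in place.
sub add_daughter {
    my ( $buffer, $name ) = @_;

    # Decide on the key BEFORE storing anything, so the first 'B' row
    # fills Daughter and the second fills Daughter2.
    my $key = exists $buffer->{Daughter} ? 'Daughter2' : 'Daughter';
    $buffer->{$key} = $name;
    return $key;
}

my %buffer;
add_daughter( \%buffer, 'Jane Flintstone' );
add_daughter( \%buffer, 'Sue Flintstone' );
print "$_: $buffer{$_}\n" for qw(Daughter Daughter2);
# prints:
# Daughter: Jane Flintstone
# Daughter2: Sue Flintstone
```

In the loop above, the body of the 'B' branch would then shrink to a
single call such as add_daughter( \%buffer, substr $_, 14, 17 ), and
the existing %buffer = () on each 'A' row keeps one customer's
daughters from leaking into the next.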

From: nobody on
On Sat, 14 Nov 2009 08:37:33 -0800, sln wrote:

> On Sat, 14 Nov 2009 00:12:20 GMT, nobody <nobody(a)nowhere.com> wrote:
>
>>I'm trying to process flat files with many thousands of records. In
>>these files several rows comprise the information for a single customer.
>>In the example __DATA__ below, I'm trying to fill the variables with the
>>customer information while the customer number is 06020004293, then for
>>customer number 07020000279, and finally customer number 09020000251. I
>>believe my problem is looping while the customer number remains the
>>same, then move on to the next customer numbers. I've been pulling my
>>hair out with nested while and do loops. I've included the desired
>>output below. Here's what I'm working with so far:
>>
>>
> Hey dude, I see your on your 5th or 7th incarnation of this so called
> problem. Several threads later this flat file is in a fixed width form,
> but at least its in the same flintstone janra. You should work for WB's
> or try upgrading your cable subscription to something other than toon
> tv.
>


Hey dude, you're confused. I'm working with various data files in
various formats. Some are flat files, some are delimited. You should
learn some manners, give up your lame attempts at humor, and learn how
to spell genre.