From: Joe on
On Fri, 13 Aug 2010 11:17:37 -0700, Jim Thompson wrote:

> I have data, voltage-versus-time, in 4 columns...
> Time DataVersion1 DataVersion2 DataVersion3
>
> Time is in 100ps steps, over a total span of 100ns, thus 1000 data
> points in each data column.
>
> The end-use software, IBIS, can only handle 100 data points per column.
>
> Snag #1: Data in each column doesn't quite match data in other
> columns... process variations, temperature, voltage, etc.
>
> Snag #2: Time column must be the _same_ for all three data columns.
>
> Any ideas on how to process, such that enough time points can be
> eliminated to reduce total time points to 100.
....

The following Perl program can do that sort of thing, i.e. select
a subset of data points such that every original tabulated value
of each dependent variable stays within given limits of the value
linearly interpolated between the selected points. A point is
dropped only if all data columns pass the test, so the output keeps
one common time column (Snag #2). See the program notes in lines
32-33 re the tolerance limits $ELF and $AEL that control how many
points get removed. The program has no target point count; adjust
the tolerances until the output is down to 100 lines. With
tolerances as shown, a set of 111 data lines of 5 numbers each
(resistances vs temperature for four different models of
thermocouples) reduces to a set of 30 data lines. This program does
nothing re "Snag #1".

---------cut here----------
#!/usr/bin/perl

# Program to winnow data by removing points that fall within specified
# fraction of linear-interpolated values between remaining points.
# (For small values, instead test absolute error; see comments.)

# Usage: ./multi-winnow < datasetin > datasetout

# Each line of input should contain decimal numbers, separated from
# each other by one or more blanks or tabs. Leading blanks, signs,
# decimal points are ok. The first number on each line is treated as
# the independent variable, x. Other numbers are treated as dependent
# variables.

# Method: Test if points between range-start ($p) and last-point-read
# ($L) are within tolerance limits ($ELF and $AEL) on all of the
# dependent variables, relative to linearly-interpolated values
# between points $p and $L. If not, output point $p and start new
# range.

# Change the printf format below as necessary to suit your data;
# eg, use %8d for integers, and eg %.4f, %8f, %9.3f, etc for reals.
sub outLine {
    my $p = shift;
    for (my $i=1; $i <= $V[$p][0]; ++$i) {
        printf " %.3f", $V[$p][$i];
    }
    print "\n";
}

#======================== Main ==========================
# Change $ELF, $AEL, and $eps to control accuracy of fit
$ELF = 0.01; # ELF = error-limit fraction ( 0.01 = 1% )
$AEL = 1e-7; # absolute-error-limit, used when abs(y) < eps
$eps = 1e-7; # If abs(y) < eps, test abs(y - interpolated y) < AEL
$L = 0; # L = input line number
$c = -1; # c = conversion state
$p = 0; # p = range start point
$cols = 4; # cols = number of data fields per line

foreach $s (<>) {           # s = string of input
    ++$L;                   # Set line #
    # Split on runs of blanks/tabs. The leading "" keeps fields 1-based.
    # Numeric conversion accepts a leading + and exponents like 1e+5.
    @nums = ("", split (' ', $s));
    $V[$L][0] = $#nums;     # Save # of #'s on line

    for ($i=1; $i <= $V[$L][0]; ++$i) {
        $V[$L][$i] = 1.0 * $nums[$i]; # Convert string to number
    }

    if ($c > 0) {           # See if any data on this line go out of range
        for ($i=2; $i <= $V[$L][0]; ++$i) {
            $c = $c && testRange($i);
        }
        outLine ($p) if ($c==0); # If O-O-R occurred, emit a line
    }
    $p = $L-1 if ($c==0);   # If O-O-R or starting-up, remember line #
    ++$c;
}
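outLine ($p) if ($p > 0); # Emit start of final range (not emitted above)
outLine ($L);             # Always emit the last point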

#===== testRange(k) returns 1 if datum k is in range, or 0 if out. =====
sub testRange {
    my $k = shift;
    my ($yp, $yl) = ($V[$p][$k], $V[$L][$k]); # y at range start, end
    my ($xp, $xl) = ($V[$p][1], $V[$L][1]);   # x at range start, end
    my $d = $xl - $xp;                        # x span of the range
    my ($a, $i, $u, $v);
    die "Duplicated x at lines $p and $L ?" if (abs($d) < 1e-10);

    for ($i=$p+1; $i < $L; ++$i) {
        $a = ($V[$i][1]-$xp)/$d;     # Fractional position within range
        $u = $V[$i][$k];             # Fetch actual occurring value
        $v = $a*$yl + (1-$a)*$yp;    # Compute interpolated value
        if (abs($u) < $eps) {
            return 0 if (abs($u-$v) > $AEL);      # abs error check
        } else {
            return 0 if (abs(($u-$v)/$u) > $ELF); # ratio check
        }
    }
    return 1;
} # By joe(a)swo-za, 13 Aug 2010
__END__
---------cut here----------
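
To land on a specific count, such as the 100 points IBIS accepts,
one option is to bisect on the relative tolerance. The sketch below
is not part of the posted program. It restates the same winnowing
test as functions (span_ok, winnow and fit_to_count are hypothetical
names), uses 0-based rows with x in column 0, and runs on made-up
exponential test data. It assumes the kept-point count shrinks as
the tolerance loosens, which is typical but not guaranteed, and it
searches $ELF only between 0 and 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if every row strictly between rows $p and $q is within
# tolerance of the straight line from row $p to row $q, in every
# dependent column (column 0 is x, columns 1.. are data).
sub span_ok {
    my ($rows, $p, $q, $elf, $ael, $eps) = @_;
    my $d = $rows->[$q][0] - $rows->[$p][0];
    die "Duplicated x at rows $p and $q\n" if abs($d) < 1e-10;
    for my $i ($p + 1 .. $q - 1) {
        my $t = ($rows->[$i][0] - $rows->[$p][0]) / $d;
        for my $k (1 .. $#{ $rows->[$i] }) {
            my $u = $rows->[$i][$k];                  # actual value
            my $v = $t * $rows->[$q][$k]              # interpolated value
                  + (1 - $t) * $rows->[$p][$k];
            if (abs($u) < $eps) { return 0 if abs($u - $v) > $ael }
            else                { return 0 if abs(($u - $v) / $u) > $elf }
        }
    }
    return 1;
}

# Greedy winnow: extend the current range until a row breaks the
# tolerance, then keep the previous row and start a new range there.
# Returns a reference to the list of kept row indices.
sub winnow {
    my ($rows, $elf, $ael, $eps) = @_;
    return [ 0 .. $#$rows ] if @$rows < 3;
    my @keep = (0);
    my $p = 0;
    for my $q (2 .. $#$rows) {
        next if span_ok($rows, $p, $q, $elf, $ael, $eps);
        $p = $q - 1;
        push @keep, $p;
    }
    push @keep, $#$rows;          # always keep the last row
    return \@keep;
}

# Bisect on the relative tolerance between 0 and 1 (an assumed
# bracket) for the tightest value found that keeps <= $target rows.
# Returns ($elf, \@kept_indices), or an empty list if even a 100%
# tolerance keeps too many rows.
sub fit_to_count {
    my ($rows, $target, $ael, $eps) = @_;
    my ($lo, $hi) = (0, 1);
    my $best = winnow($rows, $hi, $ael, $eps);
    return if @$best > $target;
    for (1 .. 40) {
        my $mid  = ($lo + $hi) / 2;
        my $keep = winnow($rows, $mid, $ael, $eps);
        if (@$keep <= $target) { ($hi, $best) = ($mid, $keep) }
        else                   { $lo = $mid }
    }
    return ($hi, $best);
}

# Made-up test data: 1000 rows, x in 0.1 steps, three similar curves.
my @rows;
for my $n (0 .. 999) {
    my $x = 0.1 * $n;
    my $r = 1 - exp(-$x / 10);
    push @rows, [ $x, 1 + 0.9 * $r, 1 + 1.0 * $r, 1 + 1.1 * $r ];
}

my ($elf, $best) = fit_to_count(\@rows, 100, 1e-7, 1e-7);
printf "ELF = %.3g keeps %d of %d rows\n",
    $elf, scalar @$best, scalar @rows;
```

Each bisection step re-runs the winnow from scratch, so the cost is
about 40 passes over the data, which is negligible for 1000 rows.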

From: Joe on
On Sat, 14 Aug 2010 00:45:03 +0000, Joe wrote:
> On Fri, 13 Aug 2010 11:17:37 -0700, Jim Thompson wrote:
>> I have data, voltage-versus-time, in 4 columns...
>> Time DataVersion1 DataVersion2 DataVersion3
>>
>> Time is in 100ps steps, over a total span of 100ns, thus
>> 1000 data points in each data column.
....
>> Any ideas on how to process, such that enough time points
>> can be eliminated to reduce total time points to 100.
....
> The following perl program can do that sort of thing,
....

Remove the line shown below from the previously posted Perl program;
$cols is never used:

> $cols = 4; # cols = number of data fields per line