From: zhouvian on
Hello,

I'd like to read the last 100 lines from a text file. I use the method
below:
1. Move the file descriptor (fd) to the end of the file.
2. Move fd 2 bytes backwards.
3. Read a byte.
4. If the char is '\n', increase the counter. Repeat from step 2 until
counter == 100.

Are there any better methods to do the job?

I am coding in C++, Perl or tcl.

Thanks in advance!
From: dave.joubert on
On Jun 22, 6:59 am, zhouv...(a)gmail.com wrote:
> Hello,
>
> I'd like to read the last 100 lines from a text file. I use the method
> below:
> 1. Move the file descriptor (fd) to the end of the file.
> 2. Move fd 2 bytes backwards.
> 3. Read a byte.
> 4. If the char is '\n', increase the counter. Repeat from step 2 until
> counter == 100.
>
> Are there any better methods to do the job?
>
> I am coding in C++, Perl or tcl.
>
> Thanks in advance!

If you are on a *nix-type system, you may be much better off using
exec and tail -100 to do the job. This will cost you in terms of
portability.

If you are not, and you can use Tcl, try reading, say, a 16K buffer from
the end, then using split and llength to count the lines, and reading
zero or more blocks located closer to the beginning if that is not
enough. This will probably be faster than working byte by byte.

Dave
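
Since the original poster also mentioned C++, the block-wise idea above could be sketched roughly as follows. This is only an illustration under assumptions: the function name tail_lines and the 16K default block size are made up for this example, lines are assumed to be separated by '\n', and the file is opened in binary mode so that byte offsets and characters line up.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns up to `want` trailing lines of the file at `path`,
// reading fixed-size blocks backwards from the end of the file.
std::vector<std::string> tail_lines(const std::string& path, std::size_t want,
                                    std::size_t block = 16384) {
    std::vector<std::string> lines;
    std::ifstream in(path, std::ios::binary);
    if (!in || want == 0 || block == 0) return lines;

    in.seekg(0, std::ios::end);
    std::streamoff pos = in.tellg();  // offset where the collected data starts
    std::string data;                 // tail of the file collected so far
    std::size_t newlines = 0;

    // More than `want` newlines guarantees `want` complete lines,
    // whether or not the file ends with a newline.
    while (pos > 0 && newlines <= want) {
        std::streamoff n = std::min<std::streamoff>(pos, static_cast<std::streamoff>(block));
        pos -= n;
        std::string chunk(static_cast<std::size_t>(n), '\0');
        in.seekg(pos);
        in.read(&chunk[0], n);
        newlines += static_cast<std::size_t>(std::count(chunk.begin(), chunk.end(), '\n'));
        data.insert(0, chunk);        // prepend: this block precedes what we have
    }
    if (data.empty()) return lines;   // empty file

    // A final newline ends the last line; it does not start an empty one.
    if (data.back() == '\n') data.pop_back();

    // Split the collected tail on '\n'.
    std::size_t start = 0;
    for (;;) {
        std::size_t nl = data.find('\n', start);
        if (nl == std::string::npos) {
            lines.push_back(data.substr(start));
            break;
        }
        lines.push_back(data.substr(start, nl - start));
        start = nl + 1;
    }
    // Keep only the last `want` lines; this also drops a partial first line.
    if (lines.size() > want)
        lines.erase(lines.begin(), lines.end() - static_cast<std::ptrdiff_t>(want));
    return lines;
}
```

A call such as tail_lines("access.log", 100) would return at most 100 strings. A file with fewer lines than requested is simply returned whole.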
From: Neil Madden on
zhouvian(a)gmail.com wrote:
> Hello,
>
> I'd like to read the last 100 lines from a text file. I use the method
> below:
> 1. Move the file descriptor (fd) to the end of the file.
> 2. Move fd 2 bytes backwards.
> 3. Read a byte.
> 4. If the char is '\n', increase the counter. Repeat from step 2 until
> counter == 100.
>
> Are there any better methods to do the job?
>
> I am coding in C++, Perl or tcl.
>
> Thanks in advance!

How big is the file? If it fits in memory, then the simplest and
quickest way will be:

lrange [split [read -nonewline $fd] \n] end-99 end

If it doesn't fit into memory, then you can at least read in much bigger
chunks than 1 byte and use:

incr count [regexp -all {\n} $buffer]

to count the number of newlines in each block. Reading 1-byte blocks
will be sloooooow.

-- Neil
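
For anyone following along in C++ rather than Tcl, the "fits in memory" case might look something like the sketch below. The name last_lines_in_memory is hypothetical, and the sketch assumes that holding every line of the file in a vector is acceptable.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every line of `path` into memory and
// returns the last `n` of them (or all lines if there are fewer).
std::vector<std::string> last_lines_in_memory(const std::string& path, std::size_t n) {
    std::ifstream in(path);
    std::vector<std::string> lines;
    std::string line;
    // std::getline strips the trailing '\n', so a final newline in the
    // file does not produce an extra empty line.
    while (std::getline(in, line)) lines.push_back(line);
    if (lines.size() > n)
        lines.erase(lines.begin(), lines.end() - static_cast<std::ptrdiff_t>(n));
    return lines;
}
```

This trades memory for simplicity: it is the easiest version to get right, but it reads the whole file even when only the end is wanted.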
From: dave.joubert on
On Jun 22, 2:49 pm, Neil Madden <n...(a)cs.nott.ac.uk> wrote:
> zhouv...(a)gmail.com wrote:
> > Hello,
>
> > I'd like to read the last 100 lines from a text file. I use the method
> > below:
> > 1. Move the file descriptor (fd) to the end of the file.
> > 2. Move fd 2 bytes backwards.
> > 3. Read a byte.
> > 4. If the char is '\n', increase the counter. Repeat from step 2 until
> > counter == 100.
>
> > Are there any better methods to do the job?
>
> > I am coding in C++, Perl or tcl.
>
> > Thanks in advance!
>
> How big is the file? If it fits in memory, then the simplest and
> quickest way will be:
>
> lrange [split [read -nonewline $fd] \n] end-99 end
>
> If it doesn't fit into memory, then you can at least read in much bigger
> chunks than 1 byte and use:
>
> incr count [regexp -all {\n} $buffer]
>
> to count the number of newlines in each block. Reading 1-byte blocks
> will be sloooooow.
>
> -- Neil

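Here is a proc that illustrates what Neil and I were talking about:

proc tail {f want} {
    set fd [open $f]
    set size [file size $f]
    set buffSize 4096

    while {1} {
        # Never seek back past the start of the file.
        if {$buffSize > $size} {
            set buffSize $size
        }
        seek $fd -$buffSize end
        set buff [read $fd $buffSize]
        set gotCount [regexp -all {\n} $buff]
        # More than $want newlines guarantees $want complete lines.
        if {$gotCount > $want || $buffSize == $size} {
            break
        }
        # Grow the buffer in proportion to the shortfall.
        set factor [expr {1 + int(ceil(double($want) / ($gotCount > 0 ? $gotCount : 1)))}]
        set buffSize [expr {$buffSize * $factor}]
    }
    close $fd
    # Drop the final newline so that split does not add an empty last element.
    if {[string index $buff end] eq "\n"} {
        set buff [string range $buff 0 end-1]
    }
    return [lrange [split $buff \n] end-[expr {$want-1}] end]
}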
set output [tail access.log 100]
puts stdout [join $output "\n"]

Note: My feeling is that the file I/O subsystem buffers things well
enough that there is no point in using Tcl's append etc. to build a big
buffer out of lots of small ones. That is why I chose an expanding
buffer. Also, a good initial guess rather than a 4K guesstimate may
help, as will checks that the file exists and that the buffer does not
explode (no EOLs in the file)!!

time ./xx.tcl >/dev/null
real 0m0.006s
user 0m0.008s
sys 0m0.000s

time tail -100 access.log >/dev/null
real 0m0.001s
user 0m0.000s
sys 0m0.004s

Dave
From: billposer on
A slower approach that uses much less memory than slurping the entire
file at once and then splitting it is to read one line at a time and
keep the lines in a circular buffer. One way to implement a circular
buffer is to use an array indexed on the line number modulo the
desired number of lines.

proc LastKLines {k} {
    set fh stdin
    set LinesRead 0
    set ModCnt 0
    while {1} {
        set CharsRead [gets $fh line]
        set ModCnt [expr {$LinesRead % $k}]
        if {$CharsRead >= 0} {
            set Lines($ModCnt) $line
        } else {
            break
        }
        incr LinesRead
    }
    set LastIndex $ModCnt
    # Handle the case in which the file does not contain as many lines
    # as desired.
    if {$LinesRead > $k} {
        set Limit $k
    } else {
        set Limit $LinesRead
    }
    for {set i $LastIndex} {$i < $Limit} {incr i} {
        puts $Lines($i)
    }
    for {set i 0} {$i < $LastIndex} {incr i} {
        puts $Lines($i)
    }
}

#Test: LastKLines 100
