From: zhouvian on 22 Jun 2008 01:59
Hello,

I'd like to read the last 100 lines from a text file. I use the method below:
1. Move the file descriptor (fd) to the end of the file.
2. Move fd 2 bytes backwards.
3. Read a byte.
4. If the char is '\n', increase the counter. Repeat from step 2 until counter == 100.

Are there any better methods to do the job?

I am coding in C++, Perl or Tcl.

Thanks in advance!

From: dave.joubert on 22 Jun 2008 04:41
On Jun 22, 6:59 am, zhouv...(a)gmail.com wrote:
> Hello,
>
> I'd like to read the last 100 lines from a text file. I use the method
> below:
> 1. Move the file descriptor (fd) to the end of the file.
> 2. Move fd 2 bytes backwards.
> 3. Read a byte.
> 4. If the char is '\n', increase the counter. Repeat from step 2 until
> counter == 100.
>
> Are there any better methods to do the job?
>
> I am coding in C++, Perl or Tcl.
>
> Thanks in advance!

If you are on a *nix-type system, you may be much better off using exec and tail -100 to do the job, at the cost of portability.

If you are not, and you can use Tcl, try reading, say, a 16K buffer from the end of the file, counting the lines in it with split and llength, and then reading zero or more blocks located closer to the beginning until you have enough lines. This will probably be faster than working byte by byte.

Dave

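The first suggestion can be sketched roughly as follows. This is only an illustration: it assumes a POSIX tail is available on the PATH, and the proc name tailExec is made up for this example.

```tcl
# Hypothetical helper (not from the thread): return the last n lines of
# a file as a Tcl list by delegating to the external tail utility.
# Assumption: a POSIX tail is on the PATH, so this is not portable.
proc tailExec {path {n 100}} {
    # exec strips one trailing newline from the command output, so the
    # split below does not produce an empty last element.
    set out [exec tail -n $n -- $path]
    return [split $out \n]
}
```

A call such as `tailExec access.log 100` returns a list, which can be printed with `puts [join $lines \n]`. If tail writes to stderr or exits with a non-zero status, exec raises an error, so a caller that needs a fallback may want to wrap the call in catch.
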
From: Neil Madden on 22 Jun 2008 09:49
zhouvian(a)gmail.com wrote:
> Hello,
>
> I'd like to read the last 100 lines from a text file. I use the method
> below:
> 1. Move the file descriptor (fd) to the end of the file.
> 2. Move fd 2 bytes backwards.
> 3. Read a byte.
> 4. If the char is '\n', increase the counter. Repeat from step 2 until
> counter == 100.
>
> Are there any better methods to do the job?
>
> I am coding in C++, Perl or Tcl.
>
> Thanks in advance!

How big is the file? If it fits in memory, then the simplest and quickest way will be:

    lrange [split [read -nonewline $fd] \n] end-99 end

(-nonewline drops the file's final newline, which would otherwise show up as an empty last element.)

If it doesn't fit into memory, then you can at least read in much bigger chunks than 1 byte and use:

    incr count [regexp -all {\n} $buffer]

to count the number of newlines in each block. Reading 1-byte blocks will be sloooooow.

-- Nei

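Wrapped into complete procs, the two fragments might look like the sketch below. The proc names lastLinesSlurp and countNewlines and the 64K default chunk size are assumptions made for illustration, not something proposed in the thread.

```tcl
# Hypothetical helper: last n lines of a file small enough to slurp.
proc lastLinesSlurp {path {n 100}} {
    set fd [open $path r]
    # -nonewline drops the final newline, so split yields no empty
    # trailing element for a file that ends in "\n".
    set data [read -nonewline $fd]
    close $fd
    return [lrange [split $data \n] end-[expr {$n - 1}] end]
}

# Hypothetical helper: count the newlines in a file while holding only
# chunkSize characters in memory at a time.
proc countNewlines {path {chunkSize 65536}} {
    set fd [open $path r]
    set count 0
    while {![eof $fd]} {
        set buffer [read $fd $chunkSize]
        incr count [regexp -all {\n} $buffer]
    }
    close $fd
    return $count
}
```

The first proc is the simple option when memory is not a concern. The second only counts lines; to fetch the last n of them without slurping, the chunks would have to be read from the end of the file instead.
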
From: dave.joubert on 22 Jun 2008 13:19
On Jun 22, 2:49 pm, Neil Madden <n...(a)cs.nott.ac.uk> wrote:
> zhouv...(a)gmail.com wrote:
> > Hello,
> >
> > I'd like to read the last 100 lines from a text file. I use the method
> > below:
> > 1. Move the file descriptor (fd) to the end of the file.
> > 2. Move fd 2 bytes backwards.
> > 3. Read a byte.
> > 4. If the char is '\n', increase the counter. Repeat from step 2 until
> > counter == 100.
> >
> > Are there any better methods to do the job?
> >
> > I am coding in C++, Perl or Tcl.
> >
> > Thanks in advance!
>
> How big is the file? If it fits in memory, then the simplest and
> quickest way will be:
>
> lrange [split [read -nonewline $fd] \n] end-99 end
>
> If it doesn't fit into memory, then you can at least read in much bigger
> chunks than 1 byte and use:
>
> incr count [regexp -all {\n} $buffer]
>
> to count the number of newlines in each block. Reading 1-byte blocks
> will be sloooooow.
>
> -- Nei

Here is a proc that illustrates what Neil and I were talking about:

    proc tail {f want} {
        set fd [open $f]
        set size [file size $f]
        set buffSize 4096
        while {1} {
            # Never seek back past the start of the file.
            if {$buffSize > $size} { set buffSize $size }
            seek $fd -$buffSize end
            set buff [read $fd $buffSize]
            set gotCount [regexp -all {\n} $buff]
            # More than $want newlines guarantees $want complete lines;
            # also stop once the whole file has been read.
            if {$gotCount > $want || $buffSize >= $size} break
            # Grow the buffer in proportion to the shortfall. The divisor
            # is at least 1, so a block with no EOLs cannot divide by zero.
            set divisor [expr {$gotCount > 0 ? $gotCount : 1}]
            set factor [expr {1 + int(ceil(double($want + 1) / $divisor))}]
            set buffSize [expr {$buffSize * $factor}]
        }
        close $fd
        # Drop one final newline so split yields no empty last element.
        if {[string index $buff end] eq "\n"} {
            set buff [string range $buff 0 end-1]
        }
        return [lrange [split $buff \n] end-[expr {$want-1}] end]
    }

    set output [tail access.log 100]
    puts stdout [join $output "\n"]

Note: My feeling is that the file IO subsystem will buffer things well enough that there is no point in using Tcl's append etc. to build a big buffer out of lots of small buffers. This is the reason I chose an expanding buffer. Also, a good initial guess rather than a 4K guesstimate may help, as will a check that the file exists. The proc caps the buffer at the file size so that it does not explode when there are no EOLs in the file.

    time ./xx.tcl >/dev/null
    real 0m0.006s
    user 0m0.008s
    sys 0m0.000s

    time tail -100 access.log >/dev/null
    real 0m0.001s
    user 0m0.000s
    sys 0m0.004s

Dave

From: billposer on 22 Jun 2008 17:25
A slower approach that uses much less memory than slurping the entire file at once and then splitting it is to read one line at a time and keep the lines in a circular buffer. One way to implement a circular buffer is to use an array indexed on the line number modulo the desired number of lines.

    proc LastKLines {k} {
        set fh stdin
        set LinesRead 0
        set ModCnt 0
        while {1} {
            set CharsRead [gets $fh line]
            set ModCnt [expr {$LinesRead % $k}]
            if {$CharsRead >= 0} {
                set Lines($ModCnt) $line
            } else {
                break
            }
            incr LinesRead
        }
        # At EOF, ModCnt is the slot that would be overwritten next, i.e.
        # the oldest retained line once at least k lines have been read.
        set LastIndex $ModCnt
        # Handle the case in which the file does not contain as many lines
        # as desired.
        if {$LinesRead > $k} {
            set Limit $k
        } else {
            set Limit $LinesRead
        }
        for {set i $LastIndex} {$i < $Limit} {incr i} {
            puts $Lines($i)
        }
        for {set i 0} {$i < $LastIndex} {incr i} {
            puts $Lines($i)
        }
    }

    # Test:
    LastKLines 100
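
The same line-at-a-time idea can also be sketched with a plain Tcl list used as a sliding window instead of an array-based circular buffer. This is a simplified variant, not the code posted above: the name lastKLinesList and the channel argument are assumptions for illustration, and it returns the lines instead of printing them.

```tcl
# Hypothetical variant: keep at most k lines in a list while reading the
# channel line by line, and return the retained lines in file order.
proc lastKLinesList {chan k} {
    set window {}
    while {[gets $chan line] >= 0} {
        lappend window $line
        # Drop the oldest line once the window holds more than k lines.
        if {[llength $window] > $k} {
            set window [lrange $window 1 end]
        }
    }
    return $window
}
```

A call such as `puts [join [lastKLinesList stdin 100] \n]` prints the last 100 lines of standard input. Trimming with lrange copies the window on every line, so this trades some speed for simplicity; the modulo-indexed array avoids that copying.
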