|
From: Giacomo on 26 Oct 2005 19:09

I need to extract a substring of n adjacent digits from every single line of a file. The position of the n digits differs from line to line.

For example:

asdasd 123 asd 191991 1234
lijoioi 4567 asdi 67567 iojoii

For n=4 the result must be 1234 for the first line and 4567 for the second.

Thanks in advance,
Giacomo.
From: Janis Papanagnou on 26 Oct 2005 20:16

Giacomo wrote:
> I need to extract a substring of n adjacent digits from every single
> line of a file. The position of the n digits differs from line to
> line.

What type of shell or programs do you have to use?
What have you tried to program thus far?

General outline, for example...

Depending on whether the shell/tool/program supports extended regular expressions or not, you have to either define a regexp like [0-9]{n} or construct one from n sequences of [0-9]. This regexp must be embedded within white space [ \t] or non-numerical patterns [^0-9], depending on your requirements. Take care of the line boundaries: you'll likely have to consider start of line ^ for the left and end of line $ for the right boundary. Finally, extract the substring from the matching part.

Consider adding spaces to the front and rear of the input line to simplify the matching and extraction of the substring pattern.

> For example:
>
> asdasd 123 asd 191991 1234
> lijoioi 4567 asdi 67567 iojoii
>
> For n=4 the result must be 1234 for the first line and 4567 for the second.

Janis
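The outline above could be sketched roughly as follows, in Ruby (the language used elsewhere in this thread). This is only an illustration under stated assumptions: the method names `digit_run_pattern` and `extract_run` are made up for the example, "n adjacent digits" is taken to mean a run of exactly n digits with non-digits on both sides, and only the first such run per line is returned.

```ruby
# Build a pattern for exactly n digits bounded by non-digits.
# With use_interval: false the pattern spells out [0-9] n times,
# for tools that lack the {n} interval syntax.
# (Hypothetical helper, not from the original post.)
def digit_run_pattern(n, use_interval: true)
  digits = use_interval ? "[0-9]{#{n}}" : "[0-9]" * n
  Regexp.new("[^0-9](#{digits})[^0-9]")
end

# Pad the line with one space on each side so that a run at the very
# start or end of the line still has a non-digit neighbour; this
# avoids special-casing ^ and $.
# Returns the first run of exactly n digits, or nil if there is none.
def extract_run(line, n, use_interval: true)
  padded = " #{line.chomp} "
  m = digit_run_pattern(n, use_interval: use_interval).match(padded)
  m && m[1]
end

puts extract_run("asdasd 123 asd 191991 1234", 4)
puts extract_run("lijoioi 4567 asdi 67567 iojoii", 4, use_interval: false)
```

For the two sample lines this prints 1234 and 4567. The padding trick trades a slightly longer input string for a single, simpler pattern.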
From: William James on 26 Oct 2005 21:33

Giacomo wrote:
> I need to extract a substring of n adjacent digits from every single
> line of a file. The position of the n digits differs from line to
> line.
>
> For example:
>
> asdasd 123 asd 191991 1234
> lijoioi 4567 asdi 67567 iojoii
>
> For n=4 the result must be 1234 for the first line and 4567 for the second.
>
> Thanks in advance,
> Giacomo.

ruby -ne 'puts $1 if /(?:^|\D)(\d{4})(?!\d)/'
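The one-liner above hard-codes n=4 and prints only the first match on each line. A possible generalisation, offered as a sketch rather than as the poster's method, takes n as a parameter and collects every run of exactly n digits; the method name `digit_runs` is hypothetical, and the lookbehind `(?<!\d)` assumes Ruby 1.9 or later.

```ruby
# Return every run of exactly n digits found in a line.
# (Hypothetical helper; not part of the original post.)
def digit_runs(line, n)
  # (?<!\d) and (?!\d) assert that no digit precedes or follows the
  # run, without consuming the neighbouring characters.
  line.scan(/(?<!\d)\d{#{n}}(?!\d)/)
end

p digit_runs("asdasd 123 asd 191991 1234", 4)   # => ["1234"]
p digit_runs("ab1234cd5678", 4)                 # => ["1234", "5678"]
```

Because the lookarounds consume nothing, two runs separated by a single non-digit character are both found.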
From: William Park on 26 Oct 2005 22:42

Giacomo <a(a)b.cde> wrote:
> I need to extract a substring of n adjacent digits from every single
> line of a file. The position of the n digits differs from line to
> line.
>
> For example:
>
> asdasd 123 asd 191991 1234
> lijoioi 4567 asdi 67567 iojoii
>
> For n=4 the result must be 1234 for the first line and 4567 for the second.

a='asdasd 123 asd 191991 1234
lijoioi 4567 asdi 67567 iojoii'
RE='\<[0-9]{4}\>'
echo "${a|+$RE}"

Ref:
http://home.eol.ca/~parkw/index.html#parameter_expansion

--
William Park <opengeometry(a)yahoo.ca>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
http://freshmeat.net/projects/bashdiff/
From: Ed Morton on 26 Oct 2005 22:52

Giacomo wrote:
> I need to extract a substring of n adjacent digits from every single
> line of a file. The position of the n digits differs from line to
> line.
>
> For example:
>
> asdasd 123 asd 191991 1234
> lijoioi 4567 asdi 67567 iojoii
>
> For n=4 the result must be 1234 for the first line and 4567 for the second.
>
> Thanks in advance,
> Giacomo.

Using a POSIX awk:

awk '{for (i=1;i<=NF;i++) if ($i ~ /^[0-9]{4}$/) print $i}'

To get GNU awk (gawk) to behave like that, use gawk --posix ... or gawk --re-interval ....

There are cuter ways to get the same result in awk, but this is the simplest and most obvious.

Regards,

Ed.
|