From: Chris Martin on
I have a file with lines like these:

abc1234|one;two;three
xyz3245|two;three
def9876|four

From a data perspective, this is two fields separated by '|', where the
second field contains multiple subitems separated by semicolons.

I want to use sed to output this:

abc1234|one
abc1234|two
abc1234|three
xyz3245|two
xyz3245|three
def9876|four

In other words, I want to pair the first field on each line (before the
'|') with each sub-element in the second field, one pair to each line,
separated by '|'.

I can replace the semicolons with '\n' newlines, which gives me this:

abc1234|one
two
three
xyz3245|two
three
def9876|four

It seems like this should be straightforward, but I can't figure out a way
to substitute the first item multiple times, to end up with a repeat of
the first item on each line in every line from that set. Suggestions from
sed gurus will be appreciated.


Chris Martin
University of North Carolina at Chapel Hill
School of Medicine

From: Dave B on
Chris Martin wrote:

> I have a file with lines like these:
>
> abc1234|one;two;three
> xyz3245|two;three
> def9876|four
>
> From a data perspective, this is two fields separated by '|', where the
> second field contains multiple subitems separated by semicolons.
>
> I want to use sed to output this:
>
> abc1234|one
> abc1234|two
> abc1234|three
> xyz3245|two
> xyz3245|three
> def9876|four
>
> In other words, I want to pair the first field on each line (before the
> '|') with each sub-element in the second field, one pair to each line,
> separated by '|'.
>
> I can replace the semicolons with '\n' newlines, which gives me this:
>
> abc1234|one
> two
> three
> xyz3245|two
> three
> def9876|four
>
> It seems like this should be straightforward, but I can't figure out a way
> to substitute the first item multiple times, to end up with a repeat of
> the first item on each line in every line from that set. Suggestions from
> sed gurus will be appreciated.
>
>
> Chris Martin
> University of North Carolina at Chapel Hill
> School of Medicine

You can do that in more than one way. A very straightforward way is using
awk, like this:

awk -F '[|;]' '{for(i=2;i<=NF;i++)print $1"|"$i}' yourfile

That solution assumes that "|" and ";" are only used to delimit fields and
do not occur anywhere else.
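
As a quick check, here is a minimal sketch that wraps the awk command in a
shell function and feeds it the sample lines from the original post. The
function name expand_pairs is made up for illustration; with no file
argument it reads standard input.

```shell
# Hypothetical wrapper around the awk one-liner above.
# -F '[|;]' splits each line on either "|" or ";", so $1 is the key
# and $2..$NF are the subitems.
expand_pairs() {
    awk -F '[|;]' '{ for (i = 2; i <= NF; i++) print $1 "|" $i }' "$@"
}

# Sample data taken from the original post.
printf '%s\n' 'abc1234|one;two;three' 'xyz3245|two;three' 'def9876|four' |
    expand_pairs
```

A line with an empty second field (for example "abc1234|") would print
"abc1234|" with nothing after the separator, so filter such lines first if
that matters for your data.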

If you want to use sed, you can do this:

sed ':n;s/\(\([^|]*\)|.*\);\([^;]*\)$/\1\n\2|\3/;tn' yourfile

(should work with most modern seds)
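
The one-liner leans on two things that not every sed accepts: "\n" in the
replacement text and ";" directly after a label. If your sed complains, a
more conservative spelling of the same loop might look like the sketch
below. It puts the label and the branch in their own -e expressions and
uses a backslash followed by a real newline in the replacement. The function
name expand_pairs_sed is hypothetical, and I have not tried this on every
sed out there.

```shell
# Same loop as the one-liner, written more conservatively.
# Each pass peels the last ";item" off the line and re-attaches it on a
# new line, prefixed with the key (\2); "t n" repeats until no ";" is left.
expand_pairs_sed() {
    sed -e ':n' -e 's/\(\([^|]*\)|.*\);\([^;]*\)$/\1\
\2|\3/' -e 't n' "$@"
}

printf '%s\n' 'abc1234|one;two;three' 'def9876|four' | expand_pairs_sed
```

For "abc1234|one;two;three" the first pass produces "abc1234|one;two" plus a
new line "abc1234|three", and the second pass splits off "two" the same way.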

Hope this helps.

--
echo 0|sed 's909=oO#3u)o19;s0#0ooo)].O0;s()(0bu}=(;s#}#.1m"?0^2{#;
s)")9v2@3%"9$);so%op]t(p$e#!o;sz(z^+.z;su+ur!z"au;sxzxd?_{h)cx;:b;
s/\(\(.\).\)\(\(..\)*\)\(\(.\).\)\(\(..\)*#.*\6.*\2.*\)/\5\3\1\7/;
tb'|awk '{while((i+=2)<=length($1)-18)a=a substr($1,i,1);print a}'

From: Rakesh Sharma on
On Jun 18, 9:52 pm, Chris Martin <c...(a)vfemail.net> wrote:
> I have a file with lines like these:
>
> abc1234|one;two;three
> xyz3245|two;three
> def9876|four
>
> From a data perspective, this is two fields separated by '|', where the
> second field contains multiple subitems separated by semicolons.
>
> I want to use sed to output this:
>
> abc1234|one
> abc1234|two
> abc1234|three
> xyz3245|two
> xyz3245|three
> def9876|four
>


sed -e '
s/^\([^|]*[|]\)\([^;]*\)[;]/\1\2\
\1/
P;D
' yourfile
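
For anyone wondering how this works without an explicit loop, here is a
small sketch that wraps the script above in a function (split_subitems is
just an illustrative name) and runs it on two of the sample lines. The s
command splits off only the first subitem, P prints up to the first embedded
newline, and D deletes that part and restarts the script on what is left
without reading a new input line.

```shell
# Hypothetical wrapper; the sed script is the one posted above, unchanged.
#   s///  "key|a;b;c"  becomes  "key|a" <newline> "key|b;c"
#   P     prints "key|a"
#   D     drops "key|a" and its newline, then reruns on "key|b;c"
# When no ";" remains, P prints the last pair and D ends the cycle.
split_subitems() {
    sed -e '
s/^\([^|]*[|]\)\([^;]*\)[;]/\1\2\
\1/
P;D
' "$@"
}

printf '%s\n' 'abc1234|one;two;three' 'def9876|four' | split_subitems
```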