From: jpco94340 on 13 Apr 2008 03:09

Sorry for my very bad English.

I want to extract text from an HTML page, and I am trying to use sed with a multi-line pattern. As a sample, the text is the following:

    </CENTER>
    </td>
    <td>
    Here is the text I want to get!!!!!
    </td>

I thought it would be possible to do something very simple like this:

    cat file.htm | sed 's|</CENTER>\n</td>\n<td>\n^\(.*\)$\n</td>|\1|'

It doesn't work with the sed bundled with MLS Toolkit 8.5.1 on Windows 2000 Pro SP4. So I tried to understand how to work with special characters like newline using a very simple file, and I wrote a text file like this:

    A
    B
    B

I tried to simply replace the first two lines with the single line A, using something like this:

    cat test.txt | sed 's|A\n\B|A\n|'

It doesn't work!! I can do this: echo "A\n\B\nC" and I get a file with three lines, but I can't use \n in the search/replace part of sed, and I can't find how to tell sed that I want to work with newlines.

I also tried \r\n, but it doesn't work. I tried to use hex values, \0x0D and \0x0A, but that doesn't seem to work either. For example, if I only do this to try to turn A into B:

    echo "A" | sed 's/A/\0x42/'

I only get: Ax42. I can't find how to work with hex values.

I have been searching for a long time. I would appreciate some help.
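A note on the Ax42 result, since it has a likely explanation: in the replacement part of the s command, GNU sed treats \0 like & (the whole matched text), so \0x42 reads as "the match, followed by x42". The sketch below assumes GNU sed for the \x42 form, which is a GNU extension and may well be missing from the toolkit's sed; the printf variant relies only on standard octal escapes. The variable name b is just for illustration.

```shell
# In GNU sed, \0 in the replacement stands for the whole match (like &),
# so this prints "Ax42" instead of "B".
echo "A" | sed 's/A/\0x42/'

# GNU sed extension: \xHH inserts the character with hex code HH.
echo "A" | sed 's/A/\x42/'

# More portable: let printf build the character (octal 102 = hex 42 = "B")
# and splice it into the sed script through a shell variable.
b=$(printf '\102')
echo "A" | sed "s/A/$b/"
```

The last form keeps sed itself free of any escape handling, at the cost of needing double quotes around the script.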
From: pk on 13 Apr 2008 05:16

jpco94340(a)hotmail.com wrote:

> So I tried to understand how to work with
> special characters like newline using a very simple file, and I wrote a
> text file like this:
> A
> B
> B
> I tried to simply replace the first two lines with the single line A,
> using something like this:
> cat test.txt | sed 's|A\n\B|A\n|'
> It doesn't work!!
> I can do this: echo "A\n\B\nC" and I get a file with three
> lines,
> but I can't use \n in the search/replace part of sed, and I
> can't find how to tell sed that I want to work with newlines.
> I also tried \r\n, but it doesn't work.
> I tried to use hex values, \0x0D and \0x0A, but that doesn't seem to
> work either.
> For example, if I only do this to try to turn A into B:
> echo "A" | sed 's/A/\0x42/' I only get: Ax42. I can't find how
> to work with hex values.
>
> I have been searching for a long time. I would appreciate some help.

See the sed FAQ:

http://student.northpark.edu/pemente/sed/sedfaq5.html

Section 5.10: "Why can't I match or delete a newline using the \n escape sequence? Why can't I match 2 or more lines using \n?"

Hope this helps.

--
All the commands are tested with bash and GNU tools, so they may use nonstandard features. I try to mention when something is nonstandard (if I'm aware of it), but I may miss something. Corrections are welcome.
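As a rough illustration of the point that FAQ section makes: sed reads one line at a time, so the pattern space never contains a newline until a command such as N appends the next input line. Only then can \n in the s command match anything. The sketch below assumes a sed that behaves like GNU sed (the toolkit's sed may differ), and the function names merge_a_b and extract_cell are made up for this example.

```shell
# Replace the line pair "A", "B" with the single line "A".
# N appends the next input line to the pattern space, separated by a
# newline, so the s command can then match "\nB".
merge_a_b() {
  sed -e '/^A$/{' -e 'N' -e 's/\nB$//' -e '}' "$@"
}

# Print the text line that sits between "</CENTER>", "</td>", "<td>"
# and the closing "</td>". Four N commands collect the five lines;
# -n plus the p flag prints only when the whole block matched.
extract_cell() {
  sed -n -e '/<\/CENTER>/{' -e 'N;N;N;N' \
      -e 's|.*</CENTER>\n</td>\n<td>\n\(.*\)\n</td>.*|\1|p' -e '}' "$@"
}

printf 'A\nB\nB\n' | merge_a_b
printf '</CENTER>\n</td>\n<td>\nHere is the text I want to get!!!!!\n</td>\n' | extract_cell
```

Both functions pass extra arguments through to sed, so they can also be given a file name, for example extract_cell file.htm. The fixed count of N commands is the weak spot: it only works when the block has exactly this shape.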
From: Ed Morton on 13 Apr 2008 19:55

On 4/13/2008 2:09 AM, jpco94340(a)hotmail.com wrote:
> Sorry for my very bad English.
> I want to extract text from an HTML page, and I am trying to use sed with
> a multi-line pattern.

Don't. Only use sed for simple substitutions on a single line. For anything else use awk, perl, ruby, etc.

> As a sample, the text is the following:
> </CENTER>
> </td>
> <td>
> Here is the text I want to get!!!!!
> </td>
>
> I thought it would be possible to do something very simple like this:
> cat file.htm | sed 's|</CENTER>\n</td>\n<td>\n^\(.*\)$\n</td>|\1|'

Yes, but not with sed. Just escape the forward slashes with GNU awk:

    $ cat file
    </CENTER>
    </td>
    <td>
    Here is the text I want to get!!!!!
    </td>
    $ gawk -v RS= '{print gensub(/<\/CENTER>\n<\/td>\n<td>\n(.*)\n<\/td>/,"\\1","")}' file
    Here is the text I want to get!!!!!

There may be better solutions depending on your requirements and a more complete sample input file.

> It doesn't work with the sed bundled with MLS Toolkit 8.5.1 on Windows
> 2000 Pro SP4. So I tried to understand how to work with
> special characters like newline using a very simple file, and I wrote a
> text file like this:
> A
> B
> B
> I tried to simply replace the first two lines with the single line A,
> using something like this:
> cat test.txt | sed 's|A\n\B|A\n|'
> It doesn't work!!

    $ cat file
    A
    B
    B
    $ gawk -v RS= '{print gensub(/A\nB/,"A","")}' file
    A
    B

Regards,

Ed.

> I can do this: echo "A\n\B\nC" and I get a file with three
> lines,
> but I can't use \n in the search/replace part of sed, and I
> can't find how to tell sed that I want to work with newlines.
> I also tried \r\n, but it doesn't work.
> I tried to use hex values, \0x0D and \0x0A, but that doesn't seem to
> work either.
> For example, if I only do this to try to turn A into B:
> echo "A" | sed 's/A/\0x42/' I only get: Ax42. I can't find how
> to work with hex values.
>
> I have been searching for a long time. I would appreciate some help.
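gensub is a gawk extension, so the commands above need GNU awk. For an awk without it, a line-counting variant of the same idea might look like the sketch below. It assumes the wanted text always sits exactly three lines after the </CENTER> line, as in the sample, and it does not check the </td> and <td> lines in between; extract_after_center is a made-up name.

```shell
# Print the line found three lines below each "</CENTER>" line.
# Uses only features found in any POSIX awk (no gensub, no RS tricks).
extract_after_center() {
  awk '
    /<\/CENTER>/ { marker = NR }            # remember where the marker line is
    marker && NR == marker + 3 { print }    # the wanted text is three lines later
  ' "$@"
}

printf '</CENTER>\n</td>\n<td>\nHere is the text I want to get!!!!!\n</td>\n' | extract_after_center
```

Unlike the paragraph-mode version with RS=, this reads the file line by line, so blank lines in the HTML do not change which lines end up in the same record.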