ANSI terminal escape sequence regexp [Shell]

From: pk on 25 May 2010 09:32

Janis Papanagnou wrote:

> pk wrote:
>> Janis Papanagnou wrote:
>>
>>> I am looking for a regexp that matches the ANSI terminal escape
>>> sequences (ESC [ ...) (for xterm), or alternatively for a tool (Linux)
>>> that replaces ANSI terminal sequences by an arbitrary chosen fixed
>>> replacement. Thanks.
>>
>> I've never done that, but I suppose any regex flavor that can match the
>> escape character would do, so for example with GNU sed's ERE to match
>> coloring sequences:
>>
>> \x1b\[[0-9]+;[0-9]+m
>>
>> or something similar.
>>
>> $ GREEN='\033[01;32m'; YELLOW='\033[01;33m'
>> $ printf "$GREEN - $YELLOW\n" | sed -r 's/\x1b\[[0-9]+;[0-9]+m/FOO/g'
>> FOO - FOO
>>
>> Apologies if I didn't understand correctly what you're after.
>
> Sorry for having been unclear.
>
> I know that I just need some BRE/ERE tool, like sed, to substitute the
> actual ANSI codes. I was interested in a single regexp that covers all
> ANSI sequences because, actually, I don't know what the telnet server
> will emit. (Please see also my response to Andrew.)

See if this expect tip helps:

http://wiki.tcl.tk/9673
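pk's pattern above only matches SGR (colour) sequences with exactly two numeric parameters, so forms such as ESC [ m or ESC [ 1 ; 4 ; 31 m slip through. Here is a minimal sketch that accepts any number of parameters. It assumes GNU sed, for -E and the \x1b escape, and the function name strip_sgr is hypothetical.

```shell
# strip_sgr: hypothetical helper that replaces every SGR sequence,
# ESC [ <digits and semicolons> m, with the text given as $1.
# Assumes GNU sed: -E selects EREs and \x1b matches the ESC byte.
# $1 must not contain / & or \, which are special in the replacement.
strip_sgr() {
    LC_ALL=C sed -E "s/\x1b\[[0-9;]*m/$1/g"
}

GREEN='\033[01;32m'; YELLOW='\033[01;33m'; RESET='\033[m'
# As in pk's example, printf expands the \033 escapes held in the variables.
printf "${GREEN}go${RESET} - ${YELLOW}wait${RESET}\n" | strip_sgr FOO
# prints: FOOgoFOO - FOOwaitFOO
```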

From: Janis Papanagnou on 25 May 2010 10:39

pk wrote:
<snip>
> See if this expect tip helps:
>
> http://wiki.tcl.tk/9673

Not sure. Quoting from the link (first example)...

regexp -- {^\x1b(\[|$|$)[;?0-9]*[0-9A-Za-z]} ${data} match

It seems that ANSI sequences can terminate in a digit. How could one
tell, in a sequence like \x1b[0A, whether the A is part of the ANSI
sequence or part of the subsequent data?

Janis

From: Ben Bacarisse on 25 May 2010 12:43

Janis Papanagnou <janis_papanagnou(a)hotmail.com> writes:
> pk wrote:
<snip>
>> See if this expect tip helps:
>>
>> http://wiki.tcl.tk/9673
>
> Not sure. Quoting from the link (first example)...
>
> regexp -- {^\x1b(\[|$|$)[;?0-9]*[0-9A-Za-z]} ${data} match
>
> It seems that ANSI sequences can terminate in a digit.

A quick scan of some online documents suggests that this is not so. All
the sequences I've seen end in a letter. Wikipedia suggests the last byte
must be between ASCII @ and ~ inclusive (0x40 to 0x7E).
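Ben's point about the final byte can be checked against the layout that ECMA-48 (the standard behind the "ANSI" sequences) gives for a control sequence. It starts with CSI. Next come any number of parameter bytes from 0x30 to 0x3F (the digits and ; : < = > ?), then any number of intermediate bytes from 0x20 to 0x2F. Exactly one final byte from 0x40 to 0x7E ends it.

Digits are parameter bytes, so a digit never ends a sequence. That answers the \x1b[0A question: the A is the final byte.

The stricter pattern below follows that layout. It is a sketch that assumes GNU sed, and the name strip_csi_strict is hypothetical.

```shell
# strip_csi_strict: hypothetical helper that deletes ESC [ sequences laid
# out as ECMA-48 describes: parameter bytes 0x30-0x3F ([0-?]),
# intermediate bytes 0x20-0x2F ([ -/]), one final byte 0x40-0x7E ([@-~]).
# '#' is the s delimiter because '/' appears inside a bracket expression.
# LC_ALL=C makes the ranges plain byte ranges. Assumes GNU sed (\x1b).
strip_csi_strict() {
    LC_ALL=C sed -E 's#\x1b\[[0-?]*[ -/]*[@-~]##g'
}

# ESC [ 0 A (cursor up): 0 is a parameter, A is the final byte,
# so the data "bc" that follows is left alone.
printf '\033[0Abc\n' | strip_csi_strict        # prints: bc
# ESC [ ? 25 l (hide cursor): ? is also a parameter byte.
printf 'x\033[?25ly\n' | strip_csi_strict      # prints: xy
```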

If you are prepared to use a very general regexp that will also strip out
ill-formed escape sequences, you could start with

\x1b\[[^@-~]*[@-~]
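Applied with GNU sed, this first expression matches any ESC [ sequence, whatever its parameters. The minimal sketch below replaces each one with a visible marker. The function name mark_csi and the sample input are made up for illustration.

```shell
# mark_csi: hypothetical helper; replaces every ESC [ ... <final byte>
# sequence with the marker <ESC>, using Ben's general pattern.
# Assumes GNU sed (-E for EREs, \x1b for the ESC byte); LC_ALL=C keeps
# [^@-~] and [@-~] as byte ranges.
mark_csi() {
    LC_ALL=C sed -E 's/\x1b\[[^@-~]*[@-~]/<ESC>/g'
}

# Clear screen, cursor positioning, colour on, colour off.
printf 'a\033[2Jb\033[10;20Hc\033[1;31md\033[0m\n' | mark_csi
# prints: a<ESC>b<ESC>c<ESC>d<ESC>
```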

You then need to catch the two-byte sequences:

\x1b\[[^@-~]*[@-~]|\x1b[@-~]
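The second alternative covers sequences made of ESC followed by a single byte in the same @ to ~ range, such as ESC c (full reset) or ESC M (reverse index). The sketch below assumes GNU sed, and the name strip_ansi2 is hypothetical.

```shell
# strip_ansi2: hypothetical helper; deletes ESC [ ... sequences and
# two-byte ESC <0x40-0x7E> sequences. Assumes GNU sed.
strip_ansi2() {
    LC_ALL=C sed -E 's/\x1b\[[^@-~]*[@-~]|\x1b[@-~]//g'
}

# ESC M (reverse index), ESC c (reset) and ESC [ K (erase line)
printf 'one\033Mtwo\033cthree\033[Kfour\n' | strip_ansi2
# prints: onetwothreefour
```

At ESC [ K both alternatives can match at the same position. POSIX EREs take the longest match, so the CSI alternative wins and the K is removed along with ESC [.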

This will go wrong for those sequences that can include quoted strings
like those that set key mappings. Maybe you can ignore these.
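The failure described here is easy to see with any sequence that carries a string. The sample below uses the xterm window-title sequence ESC ] 0 ; text BEL. That is an OSC string rather than a key mapping, picked only because it is easy to produce. Because ] lies in the @ to ~ range, the second alternative removes just ESC ], and the title text leaks into the output.

The extra OSC step at the end is an assumption of this sketch, not something from the thread, and it handles only BEL-terminated strings. The sketch assumes GNU sed, and strip_ansi2 is a hypothetical name.

```shell
# strip_ansi2: hypothetical helper using Ben's expression without \x9b.
# Assumes GNU sed.
strip_ansi2() {
    LC_ALL=C sed -E 's/\x1b\[[^@-~]*[@-~]|\x1b[@-~]//g'
}

# xterm "set window title": ESC ] 0 ; text BEL. Only ESC ] is removed;
# "0;my title" and the BEL byte (shown as \a by od -c) are left behind.
printf '\033]0;my title\007hello\n' | strip_ansi2 | od -c

# Possible extra step (an assumption, not from the thread): delete
# BEL-terminated OSC strings first. BEL is put into the script through a
# variable so the bracket expression holds a literal 0x07 byte.
BEL=$(printf '\007')
printf '\033]0;my title\007hello\n' |
    LC_ALL=C sed -E "s/\x1b\][^$BEL]*$BEL//g" | strip_ansi2
# prints: hello
```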

There is also a one-byte alternative to \x1b[, namely \x9b (the 8-bit
CSI, Control Sequence Introducer), so you might want to try:

(\x1b\[|\x9b)[^@-~]*[@-~]|\x1b[@-~]
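This final expression can be wrapped in a small filter. The sketch below assumes GNU sed, and the function name strip_ansi is hypothetical. LC_ALL=C is set so that the lone 0x9b byte and the [@-~] ranges are treated as plain bytes. In a UTF-8 locale, 0x9b on its own is not a valid character, so matching it may not work as intended.

```shell
# strip_ansi: hypothetical name for a filter using Ben's final expression.
# \x9b is the one-byte (8-bit) CSI, an alternative to ESC [.
# Assumes GNU sed. LC_ALL=C makes sed treat input as plain bytes; in a
# UTF-8 locale a lone 0x9b byte is not a valid character and may not match.
strip_ansi() {
    LC_ALL=C sed -E 's/(\x1b\[|\x9b)[^@-~]*[@-~]|\x1b[@-~]//g'
}

# \233 is octal for 0x9b, so \2331m is the 8-bit form of ESC [ 1 m.
printf 'plain \2331mbold\2330m \033[4munderlined\033[0m\n' | strip_ansi
# prints: plain bold underlined
```

Take care with the \x9b alternative. In UTF-8 text the byte 0x9b also occurs inside ordinary characters; Û, for instance, is encoded as C3 9B. Under LC_ALL=C the expression would consume that byte together with everything after it up to the next @ to ~ byte. For UTF-8 streams it may be safer to drop the \x9b alternative.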

--
Ben.

From: pk on 25 May 2010 12:49

Ben Bacarisse wrote:

<snip>
> There is also a one-byte alternative to \x1b[ which is \x9b so you might
> want to try:
>
> (\x1b\[|\x9b)[^@-~]*[@-~]|\x1b[@-~]

For reference, here are some tables with most ANSI escape sequences:

http://isthe.com/chongo/tech/comp/ansi_escapes.html
http://ascii-table.com/ansi-escape-sequences.php

From: Janis Papanagnou on 25 May 2010 13:52

Ben Bacarisse wrote:
<snip>
> This will go wrong for those sequences that can include quoted strings
> like those that set key mappings. Maybe you can ignore these.

Yes, I think I can ignore those.

>
> There is also a one-byte alternative to \x1b[ which is \x9b so you might
> want to try:
>
> (\x1b\[|\x9b)[^@-~]*[@-~]|\x1b[@-~]
>

Looks good, and seems to work. Thanks, Ben. Thanks also to Andrew and
pk.

Just an additional note for those who try that expression and observe
problems: setting LANG=C might fix some issues in non-C locales.

Janis
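Following the locale note, here is a small wrapper that sets the locale itself and substitutes an arbitrary fixed string, as the original question asked. It uses LC_ALL rather than LANG because LC_ALL takes precedence over LANG and every LC_* variable, so LANG=C alone has no effect when LC_ALL is set. The sketch assumes GNU sed, and ansi_replace is a hypothetical name.

```shell
# ansi_replace: hypothetical wrapper that replaces each escape sequence
# matched by Ben's final expression with the fixed string $1.
# Assumes GNU sed (-E, \x1b, \x9b). LC_ALL=C is set inside the function,
# so the caller's locale does not matter.
ansi_replace() {
    # Escape \ and & (special in a sed replacement), then the / delimiter.
    # A newline in $1 is not handled.
    repl=$(printf '%s' "$1" | sed -e 's/[\\&]/\\&/g' -e 's#/#\\/#g')
    LC_ALL=C sed -E "s/(\x1b\[|\x9b)[^@-~]*[@-~]|\x1b[@-~]/$repl/g"
}

printf 'x\033[1my\033[0m a/b\n' | ansi_replace '<&/>'
# prints: x<&/>y<&/> a/b
```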
