From: gtb on
if {[regexp {^\s*/[/*]} $line]} {


The line above seems to find C style comments. I know that the caret
anchors it but am not sure what \s means.

thanks
From: Jonathan Bromley on
On Thu, 3 Sep 2009 12:16:49 -0700 (PDT), gtb
<goodTweetieBird(a)hotmail.com> wrote:

>if {[regexp {^\s*/[/*]} $line]} {
>
>
>The line above seems to find C style comments.

It will not match trailing comments:

if (p == q) // this won't match

> I know that the caret
>anchors it but am not sure what \s means.

It matches any white-space character (tab, space and a few others).
See the "re_syntax" man page for more details.
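
A small sketch of what that pattern accepts and rejects. The sample lines and the proc name isWholeLineComment are made up for illustration; only the regexp itself comes from the question.

```tcl
# Hypothetical helper: returns 1 if the line starts (after optional
# white space) with // or /*, and 0 otherwise.
proc isWholeLineComment {line} {
    return [regexp {^\s*/[/*]} $line]
}

# Made-up sample lines: two whole-line comments, one trailing comment,
# and one plain division.
foreach line [list \
        "// whole-line C++ comment" \
        "\t  /* indented C comment */" \
        "if (p == q) // trailing comment" \
        "x = a / b;"] {
    puts "[isWholeLineComment $line]  $line"
}
```

The first two lines report 1 and the last two report 0, because the caret only lets the comment opener appear after leading white space.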
--
Jonathan Bromley, Consultant

From: goodTweetieBird on
Thanks but I found that it is whitespace. So the search is for
beginning of line, possible white space, then /* or //.

Right?

From: Donald Arseneau on
On Sep 3, 12:18 pm, Jonathan Bromley <jonathan.brom...(a)MYCOMPANY.com>
wrote:
> <goodTweetieB...(a)hotmail.com> wrote:
> >if {[regexp {^\s*/[/*]} $line]} {
>
> >The line above seems to find C style comments.
>
> It will not match trailing comments:
>
> if (p == q) // this won't match

Those are not "C style comments" but "C++ style comments",
and the particular regexp seems to locate whole-line comments
of either style, but makes a mess of multi-line C-style
comments.

regexp {/\*.*?\*/} $wholetext

will get simple C-style comments (ignoring the many possible
complications such as a quoted "/*" inside a string). Note $wholetext
instead of $line: the pattern has to scan the whole file at once,
not line by line, because a C comment may span several lines.

regexp -line {//.*$} $wholetext
regexp {//.*$} $line

both find C++ style comments.
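
As a rough illustration of those patterns, the sketch below collects every match instead of just testing for one. The sample text and the variable names cComments and cppComments are assumptions; -all and -inline are standard regexp switches.

```tcl
# Made-up sample text with one C comment on a line, one C++ comment,
# and one C comment that spans two lines.
set wholetext "int a; /* first */\nint b; // second\n/* third\n   spans lines */\n"

# Without -line, "." also matches newlines, so multi-line C comments
# are found; the non-greedy .*? stops at the first closing */.
set cComments [regexp -all -inline {/\*.*?\*/} $wholetext]

# With -line, "." stops at a newline and $ matches at each line end.
set cppComments [regexp -all -inline -line {//.*$} $wholetext]

foreach c $cComments   { puts "C:   $c" }
foreach c $cppComments { puts "C++: $c" }
```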

> not sure what \s means.

A white-space character. (Tcl man re_syntax.)

If you do want to collect all preceding space for regexps that
take the whole line, then you could separately do:

regexp -lineanchor {^\s*?/\*.*?\*/} $wholetext
regexp -line {^\s*?//.*?$} $wholetext

which brings up a nasty catch that, err, catches me all the time!
Since I need a non-greedy quantifier ".*?" later in the regexp, I
have made the first one non-greedy also -- in a Tcl regexp the first
quantifier sets the greediness of the whole branch, so there can be
only one style!
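
To make the catch concrete, here is a minimal sketch with a made-up one-line text and hypothetical variable names mixed and lazy. It relies on the rule in re_syntax that a branch takes the preference of its first quantified atom.

```tcl
# Made-up text with two C comments on one line.
set text "  /* one */ code /* two */"

# Greedy \s* comes first, so the whole regexp prefers the longest
# match and the later .*? cannot make it stop at the first */.
regexp {^\s*/\*.*?\*/} $text mixed

# With the first quantifier non-greedy too, the whole regexp prefers
# the shortest match and stops at the first */.
regexp {^\s*?/\*.*?\*/} $text lazy

puts "mixed: <$mixed>"
puts "lazy:  <$lazy>"
```

The mixed form runs through to the second */, while the all-non-greedy form returns only the first comment with its indentation.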

Furthermore, this makes 4 regexps to do consecutively. If you
want to combine them, then the greediness requirement becomes
unmanageable: the .*? in /\*.*?\*/ must become .*, which then
matches too much; and the fix looks like gibberish:

regexp -lineanchor {(?:^\s*)?/\*[^*]*(?:\*(?!/)[^*]*)*\*/} $wholetext

OK, let's explain....

First, (?: ) is non-capturing grouping. Too bad it is uglier than
( ).

# (grouped) pattern for spaces at the beginning of a line:
set indentation {(?:^\s*)}

But let's omit that from the beginning of the C pattern (see below).

# literal slash-star:
set slashstar {/\*}
# literal star-slash:
set starslash {\*/}
# all non-star characters:
set allnonstar {[^*]*}
# star character not followed by a slash ("negative lookahead" for
# slash):
set lonestar {\*(?!/)}
# A lone star plus ensuing non-star characters, in a group:
set starmore "(?:$lonestar$allnonstar)"
# non-star characters plus any ensuing (groups of) lone-star plus
# more:
set nonstarwithlonestars "$allnonstar$starmore*"
# All together:
# a /*, all non-star characters as well as
# stars not followed by slash, and a */ at the end:
set Cpattern "$slashstar$allnonstar$starmore*$starslash"
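
A quick check of the assembled C pattern, as a sketch only. The pieces are repeated so the snippet runs on its own, and the sample text (with stray stars inside a comment) and the variable name found are made up.

```tcl
# Same building blocks as above, repeated so this runs stand-alone.
set slashstar  {/\*}
set starslash  {\*/}
set allnonstar {[^*]*}
set lonestar   {\*(?!/)}
set starmore   "(?:$lonestar$allnonstar)"
set Cpattern   "$slashstar$allnonstar$starmore*$starslash"

# Made-up text: the first comment contains lone stars and ends in **/.
set text "a /* x * y ** z **/ b /* second */ c"

# Every quantifier is greedy, yet each match still ends at the first
# */ because a star followed by a slash can never be consumed inside.
set found [regexp -all -inline $Cpattern $text]
foreach c $found { puts "<$c>" }
```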

The other obstacle to combining the C and C++ patterns is the
line-by-line matching for the C++ // style. That is easily
changed, though:

set untilendline {[^\n]*}
set Cpppattern "//$untilendline"

And the full pattern is:

# Optional indentation plus (C comment OR C++ comment)
set pattern "$indentation?(?:$Cpattern|$Cpppattern)"

regexp -all -lineanchor $pattern $wholetext {}
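
For what it's worth, a self-contained sketch of the combined pattern in use. The sample text and the variable name comments are made up, and -inline and -lineanchor are choices of this sketch: -inline returns the matches themselves, and -lineanchor lets ^ match at the start of every line rather than only at the start of the text.

```tcl
set indentation {(?:^\s*)}
set Cpattern    {/\*[^*]*(?:\*(?!/)[^*]*)*\*/}
set Cpppattern  {//[^\n]*}
set pattern     "$indentation?(?:$Cpattern|$Cpppattern)"

# Made-up text: a trailing C comment, an indented C++ comment, and an
# indented C comment spanning two lines.
set wholetext "int a; /* one */\n    // two\nint b;\n  /* three\n     lines */\n"

set comments [regexp -all -inline -lineanchor $pattern $wholetext]
foreach c $comments { puts "<$c>" }
```

Only comments that begin a line carry their indentation; the trailing comment on the first line comes back without the space in front of it.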

Eeeek!

I hope that was useful for information purposes. It is not
useful as an engine to capture or remove all comments from
C programs because it ignores the peculiarities of special
cases like comments in quoted strings and, if the application
treats /* /* */ */ as nested, nested comments in comments.
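
A short sketch of the quoted-string problem, using a made-up line of C and a hypothetical variable name found. The pattern is the combined one, written out in full.

```tcl
set pattern {(?:^\s*)?(?:/\*[^*]*(?:\*(?!/)[^*]*)*\*/|//[^\n]*)}

# Made-up C source: the /* ... */ here sits inside a string literal
# and is not a comment at all; only the // part is.
set source "char *s = \"/* not a comment */\"; // real comment\n"

set found [regexp -all -inline $pattern $source]
foreach c $found { puts "<$c>" }
```

The pattern reports two matches, and the first one is a false positive taken from inside the string literal.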

I recommend the wiki page http://wiki.tcl.tk/14658


Donald Arseneau