C String Prefix Operator Idea (Was: gfortran diagnostics and so on) [Fortran]

Prev: stack overflow
Next: Formatted stream I/O

From: glen herrmannsfeldt on 24 Nov 2007 11:29

James Van Buskirk wrote:
(snip)

> Well, not quite. Having high priority is normally undesirable I
> think, but of course we could always use parentheses. Let's see...
> we have to change '\a' to achar(7), '\b' to achar(8), '\f' to achar(12),
> '\n' to achar(10), '\r' to achar(13), '\t' to achar(9), '\v' to achar(11),
> "\'" to achar(39), '\"' to achar(34), '\\' to achar(92), '\?' to achar(63)
> and all that, but also '\1' to achar(1), '\12' to achar(10) and
> '\123' to achar(83). Also, achar(92)//achar(10) translates to ''.
> Microsoft says that '\c' should translate to 'c' but gfortran would
> translate it to '\c'.

K&R2, and I believe C89, leave these undefined. Many C compilers
generate them as 'c'.
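
For reference, the single-character part of the table quoted above could be written as a small lookup. This is only a sketch: the function name simple_escape_code is hypothetical, the codes are the ASCII values listed in the quoted post, and returning -1 for an unlisted character such as 'c' is just one possible policy (the caller could pass the character through, keep the backslash, or report an error).

```c
/* Hypothetical helper: map the character that follows a backslash to its
   ASCII code, using the values from the quoted table. Returns -1 when the
   character is not one of the simple escapes (for example 'c', or a digit
   or 'x' that starts an octal or hex escape). */
int simple_escape_code(char c)
{
    switch (c) {
    case 'a':  return 7;   /* alert */
    case 'b':  return 8;   /* backspace */
    case 't':  return 9;   /* horizontal tab */
    case 'n':  return 10;  /* newline */
    case 'v':  return 11;  /* vertical tab */
    case 'f':  return 12;  /* form feed */
    case 'r':  return 13;  /* carriage return */
    case '"':  return 34;  /* double quote */
    case '\'': return 39;  /* single quote */
    case '?':  return 63;  /* question mark */
    case '\\': return 92;  /* backslash */
    default:   return -1;  /* not a simple escape; the caller decides */
    }
}
```

A caller would test the result: a non-negative value is the character code to emit, while -1 leaves the choice (emit 'c', emit '\c', or diagnose) to whichever convention is being followed.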

> http://msdn2.microsoft.com/en-us/library/h21280bw(VS.80).aspx
> Translate '\x12' obviously to achar(18), but does that mean
> '\x1' becomes achar(1)? How about '\x123' or '\x'? These seem
> to not be unambiguously defined.

Most that I know of allow one, two, or three octal digits and
two hex digits. K&R2 says "There is no limit on the number of
digits, but the behavior is undefined if the resulting character
value exceeds that of the largest character." (This could be
interesting on systems with char larger than eight bits.)
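
The digit rules can be sketched as follows, assuming the reading given above: at most three octal digits, any number of hex digits, and an out-of-range value flagged rather than silently truncated. The name parse_numeric_escape and its out-parameters are hypothetical, and UCHAR_MAX stands in for "the largest character".

```c
#include <ctype.h>
#include <limits.h>

/* Hypothetical helper: read the digits of a numeric escape.
   s points just past the backslash (octal) or just past "\x" (hex).
   Returns the number of digits consumed (0 means no digit was found, as
   in '\x'). *value receives the result; *overflow is set to 1 when the
   value exceeds UCHAR_MAX, in which case *value is not meaningful. */
int parse_numeric_escape(const char *s, int is_hex,
                         unsigned *value, int *overflow)
{
    unsigned v = 0;
    int n = 0;

    *overflow = 0;
    if (is_hex) {
        /* Hex: no limit on the number of digits, so keep consuming. */
        while (isxdigit((unsigned char)s[n])) {
            int c = (unsigned char)s[n];
            /* Assumes 'a'..'f' are contiguous, true for ASCII and EBCDIC. */
            unsigned d = isdigit(c) ? (unsigned)(c - '0')
                                    : (unsigned)(tolower(c) - 'a' + 10);
            if (!*overflow) {
                v = v * 16 + d;
                if (v > UCHAR_MAX)
                    *overflow = 1;   /* stop accumulating, keep consuming */
            }
            n++;
        }
    } else {
        /* Octal: one, two, or three digits, never more. */
        while (n < 3 && s[n] >= '0' && s[n] <= '7') {
            v = v * 8 + (unsigned)(s[n] - '0');
            n++;
        }
        if (v > UCHAR_MAX)
            *overflow = 1;
    }
    *value = v;
    return n;
}
```

Under these assumptions, and with 8-bit char, '\x1' gives 1, '\x12' gives 18, '\x123' consumes all three digits and is flagged as out of range, and '\x' consumes nothing, which the caller can treat as an error.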

--------------------------------

I suppose I don't see anything wrong with doing it as a function,
though I have never seen that done in C. In C, escapes are by
definition processed in character constants and string literals,
which lets one put in characters that would otherwise be difficult
to type.

As a side note, Java has Unicode escapes of the form \uXXXX
where XXXX is four hex digits. These escapes are processed
earlier than string escapes, such that a line like:

s="some string\u000a";

will fail with an unterminated string, though

s="another string\u0022;

will work fine.
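
The ordering effect can be imitated in C with two passes. This is a rough sketch, not how a Java compiler is actually written: both function names are hypothetical, only escapes with values up to 0x7F are translated, and Java's special handling of a doubled backslash before 'u' is ignored.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Pass 1: replace each \uXXXX (exactly four hex digits) with the character
   it names. Values above 0x7F are copied through unchanged in this sketch. */
void translate_unicode_escapes(const char *in, char *out, size_t outsz)
{
    size_t o = 0;

    if (outsz == 0)
        return;
    while (*in != '\0' && o + 1 < outsz) {
        if (in[0] == '\\' && in[1] == 'u' &&
            isxdigit((unsigned char)in[2]) && isxdigit((unsigned char)in[3]) &&
            isxdigit((unsigned char)in[4]) && isxdigit((unsigned char)in[5])) {
            char hex[5];
            long v;

            memcpy(hex, in + 2, 4);
            hex[4] = '\0';
            v = strtol(hex, NULL, 16);
            if (v <= 0x7F) {
                out[o++] = (char)v;
                in += 6;
                continue;
            }
        }
        out[o++] = *in++;
    }
    out[o] = '\0';
}

/* Pass 2: return 1 if every double-quoted literal closes before a newline
   or the end of the text, 0 otherwise. A backslash inside a literal
   protects the character that follows it. */
int literals_terminated(const char *src)
{
    int in_string = 0;

    for (; *src != '\0'; src++) {
        if (in_string && *src == '\\' && src[1] != '\0')
            src++;                      /* skip the escaped character */
        else if (*src == '"')
            in_string = !in_string;
        else if (*src == '\n' && in_string)
            return 0;                   /* literal runs into a line end */
    }
    return !in_string;
}
```

Running pass 1 before pass 2 reproduces the behavior described: the \u000a line becomes a literal broken by a newline, while the \u0022 line gains its closing quote. Running pass 2 alone on the raw text gives the opposite verdict for both lines, which is why the order of the two steps matters.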

(snip)

> Even C compilers implement these escape sequences differently.

How are they different? The single character escapes should work
the same on all systems. The octal and hex escapes, if followed
by an octal or hex digit, could be done differently I suppose.

> Therefore anyone who uses them in C has to be ready to change
> his escape sequences around as the program is ported to different
> compilers. Changing them to invocations of ACHAR is then just
> as normal as would be the case in changing C compilers.

Do you have any examples that were done differently?

-- glen

From: Gary Scott on 24 Nov 2007 11:46

Craig Dedo wrote:
> "James Van Buskirk" <not_valid(a)comcast.net> wrote in message
> news:Dq2dnca0_tpIQNranZ2dnUVZ_jKdnZ2d(a)comcast.com...
>
>> "Craig Dedo" <cdedo(a)wi.rr.com> wrote in message
>> news:47472586$0$4966$4c368faf(a)roadrunner.com...
>>
>>> In addition to the other advantages that you list, anyone can
>>> implement this idea right now. There is no need to wait for a
>>> compiler vendor to provide the functionality. Just write the
>>> function to implement the C language substitution rules and then use
>>> the defined-operator feature. Of course, it would be a user-defined
>>> operator rather than an intrinsic operator, but that should not be
>>> much of an inconvenience; defined unary operators have the highest
>>> precedence anyway (Fortran 2003, 7.3, Table 7.7).
>>
>>
>>> Someone should implement this and then post it in this newsgroup.
>>> Perhaps I could do it, time permitting.
>>
>>
>> Well, not quite. Having high priority is normally undesirable I
>> think, but of course we could always use parentheses. Let's see...
>
>
> Thanks for pointing out the complications, but I believe that all or
> nearly all of them are already covered by the C 99 Standard, ISO/IEC
> 9899:1999, especially section 6.4.4.4.
>
>> we have to change '\a' to achar(7), '\b' to achar(8), '\f' to achar(12),
>> '\n' to achar(10), '\r' to achar(13), '\t' to achar(9), '\v' to
>> achar(11),
>> "\'" to achar(39), '\"' to achar(34), '\\' to achar(92), '\?' to
>> achar(63)
>> and all that, but also '\1' to achar(1), '\12' to achar(10) and
>> '\123' to achar(83). Also, achar(92)//achar(10) translates to ''.
>
>
> All of these are covered by C99, 6.4.4.4.
>
>> Microsoft says that '\c' should translate to 'c' but gfortran would
>> translate it to '\c'.
>> http://msdn2.microsoft.com/en-us/library/h21280bw(VS.80).aspx
>
>
> This behavior is Microsoft specific and is labelled as such on the
> web page you reference. It is most definitely non-standard. See C99,
> 6.4.4.4, par. 8 and note 64. In fact, note 64 requires an error
> condition if there is any escape sequence that is not covered by any of
> the specifically authorized translations.
>
>> Translate '\x12' obviously to achar(18), but does that mean
>> '\x1' becomes achar(1)? How about '\x123' or '\x'? These seem
>> to not be unambiguously defined.
>
>
> These are covered by C99, 6.4.4.4, especially par. 1, 5, 6, 7, and 9.
>
>> A problem with a user-defined function is the translation of
>> '\"' and "\'". These can't both be trapped in the same string
>> without compiler assistance. In a C program most of the
>> escape sequences are '\n' which is irrelevant in record-oriented
>> Fortran I/O. Also, do you want to trim the output of the
>> function to take into account the string contraction? Easily
>> done with a specification function, but it would stop the
>> function from being elemental.
>
>
> I had not thought about the elemental issue before, but making it
> elemental seems reasonable. It would add useful functionality at next
> to no cost. Blank padding at the end is not a big deal in Fortran.
>
>> Do you want to append the
>> achar(0) at the end as well? Probably not, because I don't
>> think existing implementations do so.
>
>
> I was not planning to add a null character at the end. FWIW, the CVF
> extension does add a null character at the end, according to the
> documentation. I have not looked at such strings in the debugger, but,
> in my experience, DEC, CVF, HP, and Intel documentation is highly accurate.

FWIW, ""C is a Microsoft Fortran extension rather than a CVF-originated one.

>
>> Even C compilers implement these escape sequences differently.
>> Therefore anyone who uses them in C has to be ready to change
>> his escape sequences around as the program is ported to different
>> compilers. Changing them to invocations of ACHAR is then just
>> as normal as would be the case in changing C compilers.
>
>
> If I did it, I would follow the C99 standard rigorously and not worry
> about compiler-specific deviations from C99. That is the only way to
> keep the implementation effort to a reasonable amount.
>

--

Gary Scott
mailto:garylscott(a)sbcglobal dot net

Fortran Library: http://www.fortranlib.com

Support the Original G95 Project: http://www.g95.org
-OR-
Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html

If you want to do the impossible, don't hire an expert because he knows
it can't be done.

-- Henry Ford

From: Gary Scott on 24 Nov 2007 11:48

Craig Dedo wrote:

> "James Van Buskirk" <not_valid(a)comcast.net> wrote in message
> news:Dq2dnca0_tpIQNranZ2dnUVZ_jKdnZ2d(a)comcast.com...
>
>> "Craig Dedo" <cdedo(a)wi.rr.com> wrote in message
>> news:47472586$0$4966$4c368faf(a)roadrunner.com...
>>
>>> In addition to the other advantages that you list, anyone can
>>> implement this idea right now. There is no need to wait for a
>>> compiler vendor to provide the functionality. Just write the
>>> function to implement the C language substitution rules and then use
>>> the defined-operator feature. Of course, it would be a user-defined
>>> operator rather than an intrinsic operator, but that should not be
>>> much of an inconvenience; defined unary operators have the highest
>>> precedence anyway (Fortran 2003, 7.3, Table 7.7).
>>
>>
>>> Someone should implement this and then post it in this newsgroup.
>>> Perhaps I could do it, time permitting.
>>
>>
>> Well, not quite. Having high priority is normally undesirable I
>> think, but of course we could always use parentheses. Let's see...
>
>
> Thanks for pointing out the complications, but I believe that all or
> nearly all of them are already covered by the C 99 Standard, ISO/IEC
> 9899:1999, especially section 6.4.4.4.
>
>> we have to change '\a' to achar(7), '\b' to achar(8), '\f' to achar(12),
>> '\n' to achar(10), '\r' to achar(13), '\t' to achar(9), '\v' to
>> achar(11),
>> "\'" to achar(39), '\"' to achar(34), '\\' to achar(92), '\?' to
>> achar(63)
>> and all that, but also '\1' to achar(1), '\12' to achar(10) and
>> '\123' to achar(83). Also, achar(92)//achar(10) translates to ''.
>
>
> All of these are covered by C99, 6.4.4.4.
>
>> Microsoft says that '\c' should translate to 'c' but gfortran would
>> translate it to '\c'.
>> http://msdn2.microsoft.com/en-us/library/h21280bw(VS.80).aspx
>
>
> This behavior is Microsoft specific and is labelled as such on the
> web page you reference. It is most definitely non-standard. See C99,
> 6.4.4.4, par. 8 and note 64. In fact, note 64 requires an error
> condition if there is any escape sequence that is not covered by any of
> the specifically authorized translations.
>
>> Translate '\x12' obviously to achar(18), but does that mean
>> '\x1' becomes achar(1)? How about '\x123' or '\x'? These seem
>> to not be unambiguously defined.
>
>
> These are covered by C99, 6.4.4.4, especially par. 1, 5, 6, 7, and 9.
>
>> A problem with a user-defined function is the translation of
>> '\"' and "\'". These can't both be trapped in the same string
>> without compiler assistance. In a C program most of the
>> escape sequences are '\n' which is irrelevant in record-oriented
>> Fortran I/O. Also, do you want to trim the output of the
>> function to take into account the string contraction? Easily
>> done with a specification function, but it would stop the
>> function from being elemental.
>
>
> I had not thought about the elemental issue before, but making it
> elemental seems reasonable. It would add useful functionality at next
> to no cost. Blank padding at the end is not a big deal in Fortran.
>
>> Do you want to append the
>> achar(0) at the end as well? Probably not, because I don't
>> think existing implementations do so.
>
>
> I was not planning to add a null character at the end. FWIW, the CVF
> extension does add a null character at the end, according to the
> documentation. I have not looked at such strings in the debugger, but,
> in my experience, DEC, CVF, HP, and Intel documentation is highly accurate.
>
>> Even C compilers implement these escape sequences differently.
>> Therefore anyone who uses them in C has to be ready to change
>> his escape sequences around as the program is ported to different
>> compilers. Changing them to invocations of ACHAR is then just
>> as normal as would be the case in changing C compilers.
>
>
> If I did it, I would follow the C99 standard rigorously and not worry
> about compiler-specific deviations from C99. That is the only way to
> keep the implementation effort to a reasonable amount.
>
None of these proposals seem to address a generic substitution
capability. That's far more broadly useful than simple escape sequences.

--

Gary Scott
mailto:garylscott(a)sbcglobal dot net

