From: Jon Harrop on
Steven G. Kargl wrote:
> If all you really care about is Fortran 77, then grab f2c
> from netlib. Otherwise, I think you severely under estimate
> the difficult involved with one person writing a Fortran 77
> compiler.

Actually I have now completed the compiler and it was fiddly but really not
that difficult (~1 week full-time work using OCaml). I processed the code
in several stages:

.. Munge by line: concatenating, stripping comments and using regular
expressions to disambiguate tokens for the lexer.
.. Lex by line.
.. Parse by line, yielding a kind of AST for each line.
.. Recombine partial ASTs into a real AST.
.. Rearrange the AST into a more modern form, e.g. replacing gotos with tail
calls.
.. Spit out the AST in another language.

If anyone is interested in a Fortran 77 to anything compiler then I could
commercialize one based upon this. I might write a backend using LLVM for
fun.

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com/products/?u
From: Tobias Burnus on
On Apr 4, 2:46 pm, Jon Harrop <j...(a)ffconsultancy.com> wrote:
> Steven G. Kargl wrote:
> > If all you really care about is Fortran 77, then grab f2c
> > from netlib. Otherwise, I think you severely under estimate
> > the difficult involved with one person writing a Fortran 77
> > compiler.
>
> Actually I have now completed the compiler and it was fiddly but really not
> that difficult (~1 week full-time work using OCaml). I processed the code
> in several stages:

I think both statements are true. Writing a full parser of Fortran 77
which supports all syntax is extremely demanding. Writing a parser
which converts a "typical" Fortran code or even better a Fortran code
following a special style much less demanding.

Does it parse the following?

integerinteger,end
realprint
logicalif
end=5
if=.true.
if(if)read(*,*)print,integer
end

Tobias

--
"Consistently separating words by spaces became a general custom about
the tenth century A.D., and lasted until about 1957, when FORTRAN
abandoned the practice." -- Sun FORTRAN Reference Manual
From: Jon Harrop on
Tobias Burnus wrote:
> On Apr 4, 2:46 pm, Jon Harrop <j...(a)ffconsultancy.com> wrote:
>> Steven G. Kargl wrote:
>> > If all you really care about is Fortran 77, then grab f2c
>> > from netlib. Otherwise, I think you severely under estimate
>> > the difficult involved with one person writing a Fortran 77
>> > compiler.
>>
>> Actually I have now completed the compiler and it was fiddly but really
>> not that difficult (~1 week full-time work using OCaml). I processed the
>> code in several stages:
>
> I think both statements are true. Writing a full parser of Fortran 77
> which supports all syntax is extremely demanding.

Yes. Fortunately, the code I am translating is almost very cleanly
written. :-)

> Writing a parser
> which converts a "typical" Fortran code or even better a Fortran code
> following a special style much less demanding.
>
> Does it parse the following?
>
> integerinteger,end

No. This could be fixed by lexing keywords directly rather than via
identifiers.

> realprint

No. Same solution.

> logicalif

No.

> end=5

Yes.

> if=.true.

Yes.

> if(if)read(*,*)print,integer

Yes.

> end

Yes.

Having said that, the generated code would be broken anyway. :-)

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com/products/?u
From: glen herrmannsfeldt on
Jon Harrop wrote:
(someone wrote)

>>Does it parse the following?

>> integerinteger,end

Some of the harder ones I remember have to do with DO statements
and FORMAT statements. Because of the problems with FORMAT
statements, WATFIV made the string 'FORMAT(' at the beginning
of a statement reserved.

DO10I1=1,3
10 FORMAT(I1)=3
DO 10 I1 = 1.3

-- glen

From: James Giles on
glen herrmannsfeldt wrote:
> Jon Harrop wrote:
> (someone wrote)
>
>>> Does it parse the following?
>
>>> integerinteger,end
>
> Some of the harder ones I remember have to do with DO statements
> and FORMAT statements. Because of the problems with FORMAT
> statements, WATFIV made the string 'FORMAT(' at the beginning
> of a statement reserved.
>
> DO10I1=1,3
> 10 FORMAT(I1)=3
> DO 10 I1 = 1.3

Once again I'm surprised that Sale's algorithm isn't being used (and
that no one seems to know that these things are always handled
correctly by it). There are a few more cases in F90 and
beyond, but a suitable and fairly simple modification to Sale's
algorithm handles those too.

So, defining a simple concept: "Open context" means a symbol
that's not within parentheses and not in a string literal, hollerith
constant, or comment.

As I recall, 1) if a statement contains an open context equal sign (=),
the statement must be either an assignment statement, a DO statement,
or an IF statement. (Again, there are more cases in F90 and beyond.)
2) If a statement contains an open context comma(,), then it must begin
with a keyword. 3) If a statement contains an open context letter or
digit that immediately follows a close parenthesis, that statement
must begin with a keyword. 4) All other statements must begin
with a keyword.

Based on these rules (with only a little moderate effort) you can
identify the location of *all* the keywords in the source code.
So, even before lexical scanning, you can replace those keywords
with an appropriate special token that identifies the keyword.
Most of the keywords appear at the start of the statement. Most
of the rest appear (based on rule (3) above) immediately after
that close parenthesis in an IF statement.

Given

DO10I1=1,3

There is an open context equal and an open context comma. The
statement must begin with a keyword. DO is identified as that
keyword. No other keywords can be present in the statement.

10 FORMAT(I1)=3

There is an open context equal, but rules (2) and (3) don't apply.
By rule (1) the statement is an assignment. No keywords can be
present. Notice this is entirely independent of whether the statement
had a label. Sure, it's not permitted for a FORMAT not to be
labelled. But it *is* permitted for an assignment to be labelled.
So the choice can't depend on whether a label is present.

DO 10 I1 = 1.3

There is an open context equal. There are no commas or parens.
This is an assignment. No keywords are allowed. So, the compiler
doesn't even look for one.

Let's take the hardest of a previous article in the thread:

if(if)read(*,*)print,integer

No equal sign - so we already know the statement begins with
keyword. Given that it's an IF, the first place where you have an
open context letter preceeded by a close paren must also be a
keyword (READ in this case). There are no other keywords,
the algorithm doesn't even bother to look. Subtle yes, but not
excessively so.

The first published version of Sale's algorithm didn't replace the
keywords, it merely identified statement type (including the
"substatements" within IFs). It was about 500 lines of code
and was written in F66. Given that much of the code was more
complicated than necessary because it had to do character
manipulation with hollerith (he was, after all, helping people
that didn't yet have F77 prepare tools to support it). The proper
place to do this analysis is at the same time you remove comments
and stitch together continuations.

There is one other case I remember off the top of my head (and
I think there's yet another I don't remember, I'd have to read through
the existing lierature on the problem again). This case can't be
handled entirely by Sale's algorithm alone. Assuming that you are
going to allow identifiers longer that six characters, what do you
make of these:

INTEGER FUNCTION F(N)
or
INTEGER FUNCTION F(5)

Well, there is no equal sign in either, so the first token is the
INTEGER keyword. But what to make of the rest? If it's not
the first statement of a program unit, the next item *can't* be
a keyword (and you don't bother to look). If it is the first
statement of a program unit, the next token is FUNCTION for
the first of the above, but it's still just the identifier FUNCTIONF
in the second case. The decision depends on the syntactic
category of the things between the parentheses. You just have
to bite the bullet and special case the first line of program
units (only if the first non-blank characters are a typename
keyword followed by FUNCTION). Of course, if you don't
allow identifiers longer than six characters, the second
token must be a keyword, it's too long to be an identifier!

All this stuff is from memory, so I may have made some minor
errors somewhere. If I were to do it for real I'd have to review
my old notes. But getting this stuff right is about this level
of complexity. If Fortran really *was* hard to process correctly,
then how did they do it with computers we wouldn't consider
adequate to run a toaster oven?

--
J. Giles

"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies." -- C. A. R. Hoare


 |  Next  |  Last
Pages: 1 2 3 4 5 6 7 8
Prev: Vector Algebra Module
Next: Zero-size arrays