|
Prev: Vector Algebra Module
Next: Zero-size arrays
From: Jon Harrop on 4 Apr 2008 08:46 Steven G. Kargl wrote: > If all you really care about is Fortran 77, then grab f2c > from netlib. Otherwise, I think you severely under estimate > the difficult involved with one person writing a Fortran 77 > compiler. Actually I have now completed the compiler and it was fiddly but really not that difficult (~1 week full-time work using OCaml). I processed the code in several stages: .. Munge by line: concatenating, stripping comments and using regular expressions to disambiguate tokens for the lexer. .. Lex by line. .. Parse by line, yielding a kind of AST for each line. .. Recombine partial ASTs into a real AST. .. Rearrange the AST into a more modern form, e.g. replacing gotos with tail calls. .. Spit out the AST in another language. If anyone is interested in a Fortran 77 to anything compiler then I could commercialize one based upon this. I might write a backend using LLVM for fun. -- Dr Jon D Harrop, Flying Frog Consultancy http://www.ffconsultancy.com/products/?u
From: Tobias Burnus on 4 Apr 2008 09:16 On Apr 4, 2:46 pm, Jon Harrop <j...(a)ffconsultancy.com> wrote: > Steven G. Kargl wrote: > > If all you really care about is Fortran 77, then grab f2c > > from netlib. Otherwise, I think you severely under estimate > > the difficult involved with one person writing a Fortran 77 > > compiler. > > Actually I have now completed the compiler and it was fiddly but really not > that difficult (~1 week full-time work using OCaml). I processed the code > in several stages: I think both statements are true. Writing a full parser of Fortran 77 which supports all syntax is extremely demanding. Writing a parser which converts a "typical" Fortran code or even better a Fortran code following a special style much less demanding. Does it parse the following? integerinteger,end realprint logicalif end=5 if=.true. if(if)read(*,*)print,integer end Tobias -- "Consistently separating words by spaces became a general custom about the tenth century A.D., and lasted until about 1957, when FORTRAN abandoned the practice." -- Sun FORTRAN Reference Manual
From: Jon Harrop on 4 Apr 2008 09:12 Tobias Burnus wrote: > On Apr 4, 2:46 pm, Jon Harrop <j...(a)ffconsultancy.com> wrote: >> Steven G. Kargl wrote: >> > If all you really care about is Fortran 77, then grab f2c >> > from netlib. Otherwise, I think you severely under estimate >> > the difficult involved with one person writing a Fortran 77 >> > compiler. >> >> Actually I have now completed the compiler and it was fiddly but really >> not that difficult (~1 week full-time work using OCaml). I processed the >> code in several stages: > > I think both statements are true. Writing a full parser of Fortran 77 > which supports all syntax is extremely demanding. Yes. Fortunately, the code I am translating is almost very cleanly written. :-) > Writing a parser > which converts a "typical" Fortran code or even better a Fortran code > following a special style much less demanding. > > Does it parse the following? > > integerinteger,end No. This could be fixed by lexing keywords directly rather than via identifiers. > realprint No. Same solution. > logicalif No. > end=5 Yes. > if=.true. Yes. > if(if)read(*,*)print,integer Yes. > end Yes. Having said that, the generated code would be broken anyway. :-) -- Dr Jon D Harrop, Flying Frog Consultancy http://www.ffconsultancy.com/products/?u
From: glen herrmannsfeldt on 4 Apr 2008 09:44 Jon Harrop wrote: (someone wrote) >>Does it parse the following? >> integerinteger,end Some of the harder ones I remember have to do with DO statements and FORMAT statements. Because of the problems with FORMAT statements, WATFIV made the string 'FORMAT(' at the beginning of a statement reserved. DO10I1=1,3 10 FORMAT(I1)=3 DO 10 I1 = 1.3 -- glen
From: James Giles on 4 Apr 2008 15:11
glen herrmannsfeldt wrote: > Jon Harrop wrote: > (someone wrote) > >>> Does it parse the following? > >>> integerinteger,end > > Some of the harder ones I remember have to do with DO statements > and FORMAT statements. Because of the problems with FORMAT > statements, WATFIV made the string 'FORMAT(' at the beginning > of a statement reserved. > > DO10I1=1,3 > 10 FORMAT(I1)=3 > DO 10 I1 = 1.3 Once again I'm surprised that Sale's algorithm isn't being used (and that no one seems to know that these things are always handled correctly by it). There are a few more cases in F90 and beyond, but a suitable and fairly simple modification to Sale's algorithm handles those too. So, defining a simple concept: "Open context" means a symbol that's not within parentheses and not in a string literal, hollerith constant, or comment. As I recall, 1) if a statement contains an open context equal sign (=), the statement must be either an assignment statement, a DO statement, or an IF statement. (Again, there are more cases in F90 and beyond.) 2) If a statement contains an open context comma(,), then it must begin with a keyword. 3) If a statement contains an open context letter or digit that immediately follows a close parenthesis, that statement must begin with a keyword. 4) All other statements must begin with a keyword. Based on these rules (with only a little moderate effort) you can identify the location of *all* the keywords in the source code. So, even before lexical scanning, you can replace those keywords with an appropriate special token that identifies the keyword. Most of the keywords appear at the start of the statement. Most of the rest appear (based on rule (3) above) immediately after that close parenthesis in an IF statement. Given DO10I1=1,3 There is an open context equal and an open context comma. The statement must begin with a keyword. DO is identified as that keyword. No other keywords can be present in the statement. 10 FORMAT(I1)=3 There is an open context equal, but rules (2) and (3) don't apply. By rule (1) the statement is an assignment. No keywords can be present. Notice this is entirely independent of whether the statement had a label. Sure, it's not permitted for a FORMAT not to be labelled. But it *is* permitted for an assignment to be labelled. So the choice can't depend on whether a label is present. DO 10 I1 = 1.3 There is an open context equal. There are no commas or parens. This is an assignment. No keywords are allowed. So, the compiler doesn't even look for one. Let's take the hardest of a previous article in the thread: if(if)read(*,*)print,integer No equal sign - so we already know the statement begins with keyword. Given that it's an IF, the first place where you have an open context letter preceeded by a close paren must also be a keyword (READ in this case). There are no other keywords, the algorithm doesn't even bother to look. Subtle yes, but not excessively so. The first published version of Sale's algorithm didn't replace the keywords, it merely identified statement type (including the "substatements" within IFs). It was about 500 lines of code and was written in F66. Given that much of the code was more complicated than necessary because it had to do character manipulation with hollerith (he was, after all, helping people that didn't yet have F77 prepare tools to support it). The proper place to do this analysis is at the same time you remove comments and stitch together continuations. There is one other case I remember off the top of my head (and I think there's yet another I don't remember, I'd have to read through the existing lierature on the problem again). This case can't be handled entirely by Sale's algorithm alone. Assuming that you are going to allow identifiers longer that six characters, what do you make of these: INTEGER FUNCTION F(N) or INTEGER FUNCTION F(5) Well, there is no equal sign in either, so the first token is the INTEGER keyword. But what to make of the rest? If it's not the first statement of a program unit, the next item *can't* be a keyword (and you don't bother to look). If it is the first statement of a program unit, the next token is FUNCTION for the first of the above, but it's still just the identifier FUNCTIONF in the second case. The decision depends on the syntactic category of the things between the parentheses. You just have to bite the bullet and special case the first line of program units (only if the first non-blank characters are a typename keyword followed by FUNCTION). Of course, if you don't allow identifiers longer than six characters, the second token must be a keyword, it's too long to be an identifier! All this stuff is from memory, so I may have made some minor errors somewhere. If I were to do it for real I'd have to review my old notes. But getting this stuff right is about this level of complexity. If Fortran really *was* hard to process correctly, then how did they do it with computers we wouldn't consider adequate to run a toaster oven? -- J. Giles "I conclude that there are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies." -- C. A. R. Hoare |