Fortran parse (Robert Corbett)


From: Robert.Corbett@Eng.Sun.COM (Robert Corbett)
Subject: Re: is lex useful?
Date: 30 Jun 1996

In article <96-06-129@comp.compilers>,
>[Yup, that's what I said.  Fortran needs a multi-pass lexer to correctly
>recognize that REAL*4HELLO doesn't contain the string constant 'ELLO'.  -John]

A poor example.  A lexer can recognize this case in a single
left-to-right scan with one character lookahead.  The sequence

    letter+
    *
    digit+

at the start of a statement can be followed only by an identifier.
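A minimal Python sketch of that single left-to-right scan follows.  It assumes the statement text has already had its insignificant blanks removed and is in upper case, and the function name `split_type_length_prefix` is hypothetical.  Each decision looks only at the next character: letters until `*`, digits until a non-digit, and then, because only an identifier may follow, a letter starts a name rather than the text of a Hollerith constant.

```python
from typing import Optional, Tuple


def split_type_length_prefix(stmt: str) -> Optional[Tuple[str, int, str, str]]:
    """Split a blank-stripped statement that starts with letter+ '*' digit+.

    Returns (type_word, length, identifier, rest), or None if the statement
    does not start with that sequence.  Each step peeks at one character only.
    """
    n = len(stmt)
    i = 0
    # letter+ : the type keyword, e.g. REAL or INTEGER
    while i < n and stmt[i].isalpha():
        i += 1
    if i == 0 or i == n or stmt[i] != "*":
        return None
    type_word = stmt[:i]
    i += 1  # skip '*'
    # digit+ : the length
    start = i
    while i < n and stmt[i].isdigit():
        i += 1
    if i == start:
        return None
    length = int(stmt[start:i])
    # Only an identifier can follow, so a letter here starts a name,
    # never the text of a Hollerith constant such as 4HELLO.
    start = i
    if i < n and stmt[i].isalpha():
        i += 1
        while i < n and stmt[i].isalnum():
            i += 1
    return type_word, length, stmt[start:i], stmt[i:]
```

For example, `split_type_length_prefix("REAL*4HELLO")` returns `('REAL', 4, 'HELLO', '')`, and `split_type_length_prefix("INTEGER*2I,J")` returns `('INTEGER', 2, 'I', ',J')`.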

A better example is

      DO10I = expr1, expr2

Since the length of expr1 is bounded only by the number of characters
allowed in a statement, either a multipass lexer or practically
unbounded lookahead is needed.

Because Fortran limits the maximum size of a statement, even that
lookahead is bounded, so a lexer for Fortran can analyze any Fortran
statement in constant time.

					Sincerely,
					Bob Corbett
[Right, thanks for the correction.  In the DO10I example, note that just
looking ahead for a comma isn't sufficient.  You have to look for a comma
not enclosed in parens, which lex can't do, because REs can't count. -John]
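The parenthesis-aware search can be sketched in Python as a scan that tracks nesting depth, which is the counting a regular expression cannot do.  This is a minimal sketch, assuming blanks are already removed and ignoring character constants (a comma inside quotes would fool it); `top_level_index` and `is_do_statement` are hypothetical names.  The search for the comma may have to run to the end of the statement, which is the practically unbounded lookahead Bob describes.

```python
def top_level_index(s: str, ch: str, start: int = 0) -> int:
    """Return the index of the first `ch` at parenthesis depth 0, or -1."""
    depth = 0
    for i in range(start, len(s)):
        c = s[i]
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
        elif c == ch and depth == 0:
            return i
    return -1


def is_do_statement(stmt: str) -> bool:
    """True if a blank-stripped statement beginning with DO is a DO loop.

    DO10I=1,10 is a loop; DO10I=1.10 assigns to the variable DO10I.
    The deciding comma must be outside parentheses, after a top-level '='.
    """
    if not stmt.startswith("DO"):
        return False
    eq = top_level_index(stmt, "=")
    if eq < 0:
        return False
    return top_level_index(stmt, ",", eq + 1) >= 0
```

With this, `is_do_statement("DO10I=A(1,2)")` is false, because its only comma is inside parentheses, while `is_do_statement("DO10I=A(1,2),3")` is true.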

From: Robert.Corbett@Eng.Sun.COM (Robert Corbett)
Subject: Re: Is Fortran90 LL(1)?
Date: 18 Apr 1996

>[To parse Fortran, you have to tell whether a statement is an
>assignment (or statement function) or something else.  First, if you
>accept the old 3Hfoo Hollerith constants, you strip them out, being
>careful not to be confused by REAL*4HELLO.  Then you look for an equal
>sign not protected by parentheses, and not followed by a comma which
>also must not be protected by parentheses.  If you find the equal
>sign, and no comma, it's an assignment or a statement function.  If
>not, it's something else.  Once you've decided that, the lexing and
>parsing are pretty straightforward, with the parser at each stage
>having to tell the lexer what kind of tokens to look for.  See my
>sample Fortran subset parser in the archives for an example of all
>this nonsense. -John]
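The first step John describes, stripping Hollerith constants, might look like the following Python sketch.  It replaces each `nH...` constant with a placeholder `#k` (a hypothetical marker, chosen because `#` is not in the Fortran 77 character set) and applies the REAL*4HELLO guard only at the start of the statement, since a repeat count such as the `3*` in `DATA X/3*2HAB/` is still followed by a genuine Hollerith constant.  It assumes the statement contains no insignificant blanks; Hollerith text is blank-sensitive, so a real lexer would have to skip blanks everywhere else.  The names `TYPE_LENGTH_PREFIX` and `strip_hollerith` are illustrative.

```python
import re
from typing import List, Tuple

# Statement-initial letter+ '*' digit+, e.g. the REAL*4 in REAL*4HELLO.
TYPE_LENGTH_PREFIX = re.compile(r"[A-Za-z]+\*[0-9]+")


def strip_hollerith(stmt: str) -> Tuple[str, List[str]]:
    """Replace each nH Hollerith constant with a placeholder '#k'.

    Returns the rewritten statement and the extracted constants in order.
    Assumes no insignificant blanks; character constants are copied as-is.
    """
    out: List[str] = []
    consts: List[str] = []
    n = len(stmt)
    i = 0
    # Guard: after a leading type length, the H starts an identifier.
    m = TYPE_LENGTH_PREFIX.match(stmt)
    if m:
        out.append(m.group())
        i = m.end()
    while i < n:
        c = stmt[i]
        if c == "'":
            # Copy a quoted character constant verbatim.
            j = stmt.find("'", i + 1)
            j = n if j < 0 else j + 1
            out.append(stmt[i:j])
            i = j
        elif c.isdigit() and (i == 0 or not stmt[i - 1].isalnum()):
            # A digit run not glued to a name or number may be a count.
            j = i
            while j < n and stmt[j].isdigit():
                j += 1
            if j < n and stmt[j] in "Hh":
                count = int(stmt[i:j])
                consts.append(stmt[j + 1:j + 1 + count])
                out.append("#%d" % (len(consts) - 1))
                i = j + 1 + count
            else:
                out.append(stmt[i:j])
                i = j
        else:
            out.append(c)
            i += 1
    return "".join(out), consts
```

For example, `strip_hollerith("CALLF(3HFOO,X)")` returns `('CALLF(#0,X)', ['FOO'])`, while `strip_hollerith("REAL*4HELLO")` leaves the statement unchanged.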

There are still some tricky points in writing a Fortran grammar.
One problem is distinguishing a complex constant from an implied
DO-loop in a WRITE statement.

      WRITE (*, *) (1.0, 0.0)

and

      WRITE (*, *) (1.0, I = 1, 10)

look identical up to and including the second comma (the one after 1.0).  A common way of dealing
with this case is to relax the restriction that the first part of a
complex constant must be a real or integer constant.  Semantic routines
can report the error later if desired.

					Sincerely,
					Bob Corbett
[Yeah, in my compiler I allowed (exp,exp) as a general complex constructor,
and enforced the constant restriction semantically in places where I had to,
e.g. data statements. -John]
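The relaxed approach both posts describe can be sketched as a small recursive-descent parser in Python.  It is deliberately simplified and hypothetical: list items are only numbers, names, or parenthesized lists, and the token pattern, class names, and functions are illustrations, not anyone's actual compiler.  Inside parentheses it accepts any two items as a complex constructor `(exp, exp)`, switches to an implied DO as soon as it sees `name =` (two tokens of lookahead), and leaves the check that both parts are constants to a later semantic pass, such as the one for DATA statements.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Simplified tokens: reals like 1.0, integers, names, and punctuation.
TOKEN = re.compile(r"\d+\.\d*|\d+|[A-Za-z]\w*|[(),=]")


@dataclass
class Complex:
    """(exp, exp): accepted for any two items; constness checked later."""
    real: object
    imag: object


@dataclass
class ImpliedDo:
    items: List[object]
    var: str
    bounds: List[object]


def expect_comma(toks: List[str], pos: int) -> int:
    if toks[pos] != ",":
        raise SyntaxError("expected ',' but found %r" % toks[pos])
    return pos + 1


def parse_item(toks: List[str], pos: int) -> Tuple[object, int]:
    """Parse one output-list item at toks[pos]; return (node, next_pos)."""
    if toks[pos] != "(":
        return toks[pos], pos + 1  # an atom: number or name
    pos += 1
    items: List[object] = []
    while True:
        # NAME '=' switches to the DO control of an implied DO.
        if toks[pos][0].isalpha() and toks[pos + 1] == "=":
            var, pos = toks[pos], pos + 2
            bounds: List[object] = []
            while True:
                bound, pos = parse_item(toks, pos)
                bounds.append(bound)
                if toks[pos] == ")":
                    return ImpliedDo(items, var, bounds), pos + 1
                pos = expect_comma(toks, pos)
        node, pos = parse_item(toks, pos)
        items.append(node)
        if toks[pos] == ")":
            break
        pos = expect_comma(toks, pos)
    if len(items) == 2:
        return Complex(items[0], items[1]), pos + 1
    if len(items) == 1:
        return items[0], pos + 1  # a parenthesized item
    raise SyntaxError("list in parentheses without DO control")


def is_complex_constant(node: object) -> bool:
    """Later semantic check, e.g. for a DATA statement."""
    def is_number(x: object) -> bool:
        return isinstance(x, str) and x[0].isdigit()
    return isinstance(node, Complex) and is_number(node.real) and is_number(node.imag)
```

With this sketch, `(1.0, 0.0)` parses as `Complex('1.0', '0.0')` and `(1.0, I = 1, 10)` as `ImpliedDo(['1.0'], 'I', ['1', '10'])`.  `(X, Y)` is also accepted as a `Complex` by the parser; only `is_complex_constant` rejects it, which is where an error would be reported in a context that requires a constant.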
