Can this language be described by a non-ambiguous BNF grammar?

It’s not always easy (or even possible) to demonstrate that a grammar is ambiguous, but if there is a short ambiguous sentence, then it can be found with brute-force enumeration, which is what I believe that tool does. And the output is revealing; the shortest ambiguous sentence is the empty string.

So it remains only to figure out why the empty string can be parsed in two (or more) ways, and a quick look at the productions reveals that FinalStmtSemiOpt has two nullable productions, which means that it has two ways of deriving the empty string. That’s evident by inspection, if you believe that every production whose name ends with Opt in fact describes an optional sequence, since FinalStmtSemiOpt has two productions each of which consist only of XOpts. A little more effort is required to verify that the optional non-terminals are, in fact, optional, which is as it should be.

So the solution is to rework FinalStmtSemiOpt without using two nullable right-hand sides. That shouldn’t be too hard.

Almost all the other questions you raise are answerable by inspection:

  • A grammar is context-free (by definition) iff every production has precisely one symbol on the left hand side. (If it had more than one symbol, then the extra symbols define a context in which the substitution is valid. If no production has a context, the grammar is context-free.)

  • A non-terminal is left-recursive if some production for the non-terminal either starts with the non-terminal itself (direct left-recursion), or starts with a sequence of nullable non-terminals followed by the non-terminal itself (indirect left-recursion). In other words, the non-terminal is or could be the left-most symbol in a production. Similarly, a non-terminal is right-recursive if some production ends with the non-terminal (again, possibly followed by nullable non-terminals).

  • There is no algorithm which can tell you in general if a language is inherently ambiguous, but in the particular case that a language is regular, then it is definitely not inherently ambiguous. A language with a regular grammar is regular, but a non-regular grammar could still describe a regular language. Your grammar is not regular, but it can be made regular without too much work (mostly refactoring). A hint that it is probably regular is that there is no recursively nested syntax (eg. parentheses or statement blocks). Most useful grammars are recursively nested, and nesting in no way implies ambiguity. So that doesn’t take you too far, but it happens to be the case here.

Leave a Comment