next up previous contents
Next: Linear Parsing of LL(k) Up: Grammars Previous: Building Parse Trees

Computing First Sets of Grammars

The previous examples show that XSB can process grammars efficiently, using them to determine language membership and the structure of strings. But XSB can also do other kinds of grammar processing. In the following, we use XSB to compute the FIRST sets of a grammar.

$FIRST(\alpha)$, for any string of terminals and nonterminals $\alpha$, is defined to be the set of terminals that begin strings derived from $\alpha$, and if $\alpha$ derives the empty string then the empty string is also in $FIRST(\alpha)$. $FIRST_k(\alpha)$ is the generalization to length k strings of terminals that are prefixes of strings derived from $\alpha$.

We will assume that a grammar is stored in a predicate ==>/2, with the head of a rule as an atomic symbol and the body of a rule as a list of symbols. Nonterminals are assumed to be those symbols for which there is at least one rule with it as its head. (==>/2 is declared as an infix operator.)

The predicate first(SF,K,L) is true if the list SF of grammar symbols derives a string whose first K terminal symbols are L.

% The definition of FIRST:
% first(SentForm,K,FirstList) computes firsts for a context-free grammar.
:- table first/3.
first(_,0,[]).
first([],K,[]) :- K>0.
first([S|R],K,L) :- K>0,
    (S ==> B),
    first(B,K,L1),
    length(L1,K1),
    Kr is K - K1,
    first(R,Kr,L2),
    append(L1,L2,L).
first([S|R],K,[S|L]) :- K>0,
    \+ (S ==> _),    % S is a terminal
    K1 is K-1,
    first(R,K1,L).

The first rule says that the empty string is in $FIRST_0(\alpha)$ for any $\alpha$. The second rule says that the empty string is in $FIRST_k(\alpha)$ for $\alpha$ being the empty sequence of grammar symbols. The third rule handles the case in which the sequence of grammar rules begins with a nonterminal, S. It takes a rule beginning with S and gets a string L1 (of length K1 $\leq$ K) generated by the body of that rule. It gets the remaining K-K1 symbols from the rest of the list of input grammar symbols. And the fourth rule handles terminal symbols.

This is a relatively simple declarative (and constructive) definition of FIRSTk. But without the table declaration it would not run for many grammars. Any left-recursive grammar would cause this definition to loop in Prolog.


next up previous contents
Next: Linear Parsing of LL(k) Up: Grammars Previous: Building Parse Trees
David S. Warren
1999-07-31