Section 2.3: Problem 1 Solution »

# Section 2.3: A Parsing Algorithm

The uniqueness of decomposition of terms and wffs is needed for arguments and definitions using recursion.

## Terms

For a string of variable, constant and function symbols we define a function $K$ as follows:
• $K(x)=1$ ($x$ is a term by itself, so its appearance in the string adds one complete term).
• $K(c)=1$ (same for constants).
• $K(f)=1-n$ where $f$ is an $n$ -place function symbol (a function symbol implies that there will be a complete term, $+1$ , once all $n$ terms, arguments of $f$ , are found in the string, $-n$ ).
• $K(s_{1}\ldots s_{n})=K(s_{1})+\ldots+K(s_{n})$ .
For a term $t$ , $K(t)=1$ . For any terminal segment $t_{1}$ of a term $t$ , $t_{1}$ is a concatenation of term, in particular, $K(t_{1})\ge1$ . For any proper initial segment $t_{0}$ of a term $t$ , $K(t_{0})\le0$ , in particular, $t_{0}$ is not a term.

### Parsing Algorithm

The following parsing algorithm for terms ensures that a string $s$ of symbols is parsed to a tree iff $s$ corresponds to a term, and in this case the tree is uniquely determined and represents the term.
1. Form a tree with the only vertex being the expression $s$ .
2. If all leaves of the tree are single symbol strings, then if they all are variable or constant symbols, stop(+), otherwise, stop(-). Otherwise select the left-most leaf $s'$ having at least two symbols.
3. The first symbol of $s'$ must be an $n$ -place function symbol $f$ , otherwise stop(-).
4. We form $n$ leaves below $s'$ . Scan $s'$ after the symbol $f$ until the initial segment $t$ such that $K(t)=1$ is found. Put it the leftmost leaf. Proceed with the remainder of the string the same way for all other leaves below $s'$ . If the string ended before all terms for $n$ leaves are found, or if the last term is found before the end of the string $s'$ , stop(-). Go to step 2.
stop(+) the tree represents the term corresponding to the initial expression
stop(-) the initial expression does not correspond to any term
• The algorithm is finite
• The algorithm attempts to construct the tree in the only possible way, as the choice of each term $t$ in step 4 is determined by the two facts: first, $K(t)$ must equal $1$ (and for every shorter string it was not), and, second, for any proper initial substring $t_{0}$ of a term $K(t_{0})<1$ , hence, anything longer than $t$ would not be a term.
• Hence, the tree is unique.
• If the algorithm fails, then $s$ does not represent a term, as the only possible way to try to understand (parse) it fails.
• If the algorithm stops in step 2, then the resulting tree represents the only possible term corresponding to $s$ .
In terms of Section 1.4, the uniqueness can be rephrased as follows.
(Unique Readability for Terms) The set of terms is freely generated from the set of variable and constant symbols by the $\mathcal{F}_{f}$ operations.

## Formulas

We can further define $K$ for other symbols:
• $K(()=-1$ .
• $K())=1$ .
• $K(\forall)=-1$ .
• $K(\neg)=0$ .
• $K(\rightarrow)=-1$ .
• $K(P)=1-n$ where $P$ is an $n$ -place predicate symbol.
• $K(=)=-1$ .
For a wff $\alpha$ , $K(\alpha)=1$ . For any proper initial segment $\alpha_{0}$ of a wff $\alpha$ , $K(\alpha_{0})\le0$ , in particular, $\alpha_{0}$ is not a wff.

### Parsing Algorithm

The following parsing algorithm ensures that a string $s$ of symbols is parsed to a tree iff $s$ corresponds to a wffs, and in this case the tree is uniquely determined and represents the wff.
1. Form a tree with the only vertex being the expression $s$ .
2. If all leaves of the tree are atomic formulas (an $n$ -place predicate symbol followed by $n$ terms), stop(+). Otherwise select the left-most leaf $s'$ that is not an atomic formula.
3. The first symbol(s) of $s'$ must be either $($ or $\forall v_{i}$ , followed by other symbols, if not, stop(-). If $s'$ is $\forall v_{i}s''$ , form one leaf below it with $s''$ , and go to step 2.
4. If the second symbol of $s'$ is $\neg$ , then there should be at least four symbols in $s'$ with the last symbol being $)$ , i.e. $s'=(\neg\mbox{s''})$ (otherwise stop(-)), form one leaf below $s'$ with $s''$ , and go to step 2.
5. Scan $s'$ from left until first reaching $(s''$ where $s''$ is a non-empty string having either $K(s'')=1$ or a balance of $($ and $)$ (both methods work!), if no such $s''$ found, then stop(-). There should be a binary connective symbol ($\rightarrow$ ) after $s''$ , followed by a non-empty $s'''$ and $)$ , otherwise stop(-). We form two leaves, the left one $s''$ and the right one $s'''$ , below $s'$ . Go to step 2.
stop(+) the tree represents the wff corresponding to the initial expression
stop(-) the initial expression does not correspond to any wff
• The algorithm is finite
• The algorithm attempts to construct the tree in the only possible way, as the choice of $s''$ in step 5 is determined by the two facts: first, $s''$ must have $K(s'')=1$ or be balanced (and everything less than $s''$ was not), and, second, any proper initial subexpression of a wff has $K$ less than 1 or unequal number of left and right parentheses, hence, anything longer than $s''$ would not be a proper wff expression.
• Hence, the tree is unique.
• If the algorithm fails, then $s$ does not represent a wff, as the only possible way to try to understand (parse) it fails.
• If the algorithm stops in step 2, then the resulting tree represents the only possible wff corresponding to $s$ .
In terms of Section 1.4, the uniqueness can be rephrased as follows.
(Unique Readability for Formulas) The set of wffs is freely generated from the set of atomic formulas by the operations $\mathcal{E}_{\neg}$ , $\mathcal{E}_{\rightarrow}$ and $\mathcal{Q}_{i}$ , $i=1,2,\ldots$ .