Previously, we have seen how to generate a language using regular
expressions. CFG is a different way of generating languages but is more
powerful in the sense that every language that can be generated by a
regular expression can also be generated by a CFG, but there are
languages that can be generated by CFGs and not by regular expressions,
namely {0n1n∣n≥0}.
CFGs are widely used in Computer Science. For example, they are used to
define the syntax of programming languages and markup languages (e.g.
XML and JSON).
In this subject, we have actually used CFGs to define regular
expressions. Well-formed formulas can also be defined using CFGs.
Think of production rules as substitution rules. In particular, a
production rule A→w tells us that we can substitute A
with w. We remark that CFGs allow recursion since the right-hand side
of a rule can contain the variable that is on the left-hand side of the
rule. In particular, the variable A can also occur in the string w.
This is evident in the upcoming example.
Given a grammar G, we can generate strings by starting with the start
variable and iteratively replacing variables using one of the production
rules[fn::Indeed, production rules are sometimes called substitution
rules.} until we end up with a string that is only terminals:
Initialize output string x=S, i.e. the output string x
consists of the start variable S
While the string x contains at least one variable, apply one of
the rules in R to replace a single occurrence of some
variable. In particular, applying the rule A→w to an
occurrence of A replaces it with the string w
The string at the end is the generated string
By making different choices of which production rule to apply and which
occurrence of the variable to apply it to, we can generate different
strings.
Let x,y,z be strings in (Σ∪V)∗, i.e. strings of terminals
and variables, and A be a variable in V. We say that applying the
rule A→y to the string xAzyieldsxyz, and use the
notation xAz⇒xyz.
We refer to strings in (Σ∪V)∗ as sentential forms and
strings in Σ∗ as sentences. Note that sentences are sentential
forms without any variables.
Parse trees let us visualize derivations of strings, and more
importantly, represent the syntactic structure of a string. For example,
parse trees tell us the order of operations of arithmetic expressions.
Just as for regular languages, we are interested in whether CFLs are
closed under the operations on languages that we have seen before, as
these operations let us define new languages. The closure properties
will be useful later on for proving that regular expressions are
context-free (Theorem 2).
Previously, we proved closure of regular languages under these
operations by considering finite automata. While there is an equivalent
automata model to CFGs, called pushdown automata, it is easy to prove
these closure properties using CFGs.