Skip to article frontmatterSkip to article content

5.1Context-Free Grammars

The University of Melbourne

Previously, we have seen how to generate a language using regular expressions. CFG is a different way of generating languages but is more powerful in the sense that every language that can be generated by a regular expression can also be generated by a CFG, but there are languages that can be generated by CFGs and not by regular expressions, namely {0n1nn0}\{0^n1^n \mid n \geq 0\}.

CFGs are widely used in Computer Science. For example, they are used to define the syntax of programming languages and markup languages (e.g. XML and JSON).

In this subject, we have actually used CFGs to define regular expressions. Well-formed formulas can also be defined using CFGs.

5.1.1Definition of CFG

Think of production rules as substitution rules. In particular, a production rule AwA \rightarrow w tells us that we can substitute AA with ww. We remark that CFGs allow recursion since the right-hand side of a rule can contain the variable that is on the left-hand side of the rule. In particular, the variable AA can also occur in the string ww. This is evident in the upcoming example.

Notational shorthand

Instead of listing each rule separately, we will group together rules with the same left-hand side in a single line separated by vertical bars.

We will also sometimes specify a grammar by specifying only its production rules.

5.1.2Generating strings and languages

Given a grammar GG, we can generate strings by starting with the start variable and iteratively replacing variables using one of the production rules[fn::Indeed, production rules are sometimes called substitution rules.} until we end up with a string that is only terminals:

  1. Initialize output string x=Sx = S, i.e. the output string xx consists of the start variable SS

  2. While the string xx contains at least one variable, apply one of the rules in RR to replace a single occurrence of some variable. In particular, applying the rule AwA \rightarrow w to an occurrence of AA replaces it with the string ww

  3. The string at the end is the generated string

By making different choices of which production rule to apply and which occurrence of the variable to apply it to, we can generate different strings.

Terminology and notation

Let x,y,zx,y,z be strings in (ΣV)(\Sigma \cup V)^*, i.e. strings of terminals and variables, and AA be a variable in VV. We say that applying the rule AyA \rightarrow y to the string xAzxAz yields xyzxyz, and use the notation xAzxyzxAz \Rightarrow xyz.

We refer to strings in (ΣV)(\Sigma \cup V)^* as sentential forms and strings in Σ\Sigma^* as sentences. Note that sentences are sentential forms without any variables.

5.1.3Parse trees

Parse trees let us visualize derivations of strings, and more importantly, represent the syntactic structure of a string. For example, parse trees tell us the order of operations of arithmetic expressions.

Ambiguity

Ambiguity in a grammar arises when there is more than one way to parse a string.[4]

5.1.4Closure Properties

Just as for regular languages, we are interested in whether CFLs are closed under the operations on languages that we have seen before, as these operations let us define new languages. The closure properties will be useful later on for proving that regular expressions are context-free (Theorem 2).

Previously, we proved closure of regular languages under these operations by considering finite automata. While there is an equivalent automata model to CFGs, called pushdown automata, it is easy to prove these closure properties using CFGs.

5.1.5Regular Languages are Context-Free

Footnotes
  1. This is the alphabet of the CFG.

  2. Similar to the state transition function of a finite automaton.

  3. Similar to the initial state of a finite automaton.

  4. See this Wikipedia article for a classic, humorous example of linguistic ambiguity.