Skip to article frontmatterSkip to article content

4.2Operations on Languages

The University of Melbourne

When designing an algorithm for a problem, it is often helpful to decompose the problem into simpler sub-problems. We can do something similar when designing algorithms for languages as well. In particular, we will now consider operations on languages that allow us to combine languages, similar to how propositional connectives (¬,,\neg, \vee, \wedge) let us combine propositions.

These operations are:

  1. complement

  2. intersection

  3. union

  4. concatenation

  5. Kleene star (aka Kleene closure)

The last 3 are called regular operations.

Given an alphabet Σ\Sigma, we define Σ\Sigma^* to be the set of all strings. For example, when Σ={0,1}\Sigma = \{0,1\}, then Σ={ϵ,0,1,00,01,10,11,}\Sigma^* = \{\epsilon, 0, 1, 00, 01, 10, 11, \ldots \}. A language LL over the alphabet Σ\Sigma is a subset of Σ\Sigma^*.

4.2.1Concatenation

Given two strings xx and yy, we denote their concatenation by xyxy.

Alternative Definition Attempt 1

Let’s first try to design an algorithm that constructs ABA \circ B. To be more precise, we want an algorithm that prints the strings in ABA \circ B, one by one.

If AA and BB are finite, it is easy to do it using the following algorithm:

for x in A:
  for y in B:
    print x+y

Note that the pseudocode uses the Python notation for string concatenation.

What about when ABA \circ B is infinite (e.g. if B=ΣB = \Sigma^*)? In this case, we say that the algorithm succeeds if every string in ABA \circ B is eventually printed by the algorithm.

Taking a closer look, we realize that when BB is finite, the algorithm succeeds even when AA is infinite. Unfortunately, it is unclear at this point how to modify the algorithm to be successful if both AA and BB are infinite. We leave this as an advanced exercise for now.

Exercise 1 (Advanced exercise)

Given a language LL, an algorithm MM is said to enumerate LL if every string in LL is eventually printed by MM.

Suppose that there are algorithms MAM_A and MBM_B that enumerate AA and BB, respectively. Design an algorithm that enumerates ABA \circ B. Your algorithm should work even when both AA and BB are infinite.

Alternative Definition Attempt 2

Let’s instead try to build on cases of ABA \circ B that are easier to understand.

Suppose AA consists of a single string aa, i.e. A={a}A = \{a\}. Then, ABA \circ B is easy to describe: it is the set of strings that are obtained by appending aa with a string in BB.

We can then define ABA \circ B iteratively as follows: ABA \circ B is the union of the sets {a}B\{a\} \circ B over all aAa \in A.[1]

4.2.2Kleene star

Consider the following algorithm that, roughly speaking, generates LL^*, given a subroutine concat" that computes the concatenation of two languages:

k = 0
initialise Lstar to contain only the empty string
while(true):
  for j = 1 up to k:
    Lstar = concat(Lstar, L)

Closure

The proof follows by taking the finite automata for LL, AA and BB, and modifying them appropriately.

For complement, this is easy: let MM be the finite automaton for LL. Then, the finite automaton NN for LcL^c is obtained by swapping accept and reject states in MM. The others seem difficult. We will need the power of non-determinism to deal with them.

Footnotes
  1. If you are comfortable with set notation, AB=aA({a}B)A \circ B = \bigcup_{a \in A} (\{a\} \circ B).