Discrete Structures for Computing
Notes 11
------------------------------------------------------------------------
Chapter 12: Modeling Computation
------------------------------------------------------------------------
Computation can be modeled with a variety of formalisms.
So far in this course, we have not gone into much detail and have
used pseudocode and relied on our intuition for how programs work.
But sometimes this approach is not sufficiently rigorous.
It is important to answer questions such as:
* is it possible to solve a given problem with a computer (i.e.,
with an algorithm)?
* what is the complexity of solving a problem?
This chapter presents a few of the most common, and useful,
formal models of computing:
* phrase-structured grammars
* finite state machines
* Turing machines
A common theme in the study of models of computation is how POWERFUL
a particular model is. The power of a model is the size of the class of
problems that can be solved in it. Ultimately, we would
like to have a model that is as powerful as the computers that we
build and use every day. But sometimes it is useful to have other,
weaker, models of computation. The course CPSC 433 goes into all this
in much more detail.
------------------------------------------------------------------------
12.1: Languages and Grammars
------------------------------------------------------------------------
DEF: A PHRASE-STRUCTURE GRAMMAR G = (V, T, S, P) has
* set V of SYMBOLS
* subset T of V called TERMINALS (the elements of V not in T are the
NONTERMINALS)
* distinguished nonterminal element S in V (called the START SYMBOL)
* set P of PRODUCTIONS (or RULES)
A production has the form alpha -> beta, where alpha and beta are both
strings of symbols, and alpha contains at least one nonterminal.
The idea is that we start with S, and then iteratively use the
productions to transform the current string of symbols into a new
string of symbols. If we ever get to a string that has only
terminals, then this procedure (called a DERIVATION) terminates.
The set of all terminal strings that can be derived from a grammar G
is called the LANGUAGE of G, denoted L(G).
Different kinds of grammars put different restrictions on what the
productions can look like.
DEF: In a REGULAR grammar, all productions are of the form
* S -> empty string, or
* A -> a, or
* A -> aB, where A and B are nonterminals and a is a terminal
EX: Consider the grammar G = (V,T,S,P) where
* V = {S,A,0,1}
* T = {0,1}
* S is the start symbol
* P has the rules (shorthand: "A -> x | y" abbreviates the two rules
A -> x and A -> y)
S -> 0S | 1A | 1 | empty-string
A -> 1A | 1
What kinds of terminal strings does this generate?
<<< do some derivations >>>
Every terminal string generated consists of some number of 0's followed
by some number of 1's. I.e., L(G) = {0^m 1^n : m and n are nonneg ints}.
This statement can be proved by induction on the length of the derivation.
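The derivations for this example can also be enumerated mechanically. The
following is a minimal Python sketch, not part of the textbook: the names
PRODUCTIONS and derive are made up for illustration, and the search relies on
the fact that every string arising in this grammar has at most one
nonterminal. It collects all terminal strings of length at most 4 and checks
them against the pattern 0*1*.

```python
import re
from collections import deque

# Productions of the example grammar; "" stands for the empty string.
PRODUCTIONS = {
    "S": ["0S", "1A", "1", ""],
    "A": ["1A", "1"],
}
NONTERMINALS = set(PRODUCTIONS)


def derive(max_len):
    """Return all terminal strings of length <= max_len derivable from S."""
    found = set()
    seen = {"S"}
    queue = deque(["S"])
    while queue:
        current = queue.popleft()
        # Locate the nonterminal in the current string, if there is one.
        pos = next((i for i, c in enumerate(current) if c in NONTERMINALS), None)
        if pos is None:
            found.add(current)  # only terminals left: the derivation is done
            continue
        for rhs in PRODUCTIONS[current[pos]]:
            nxt = current[:pos] + rhs + current[pos + 1:]
            # Terminals are never erased, so the terminal count is a lower
            # bound on the length of any string derived from nxt.
            terminals = sum(c not in NONTERMINALS for c in nxt)
            if terminals <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found


strings = derive(4)
print(sorted(strings, key=lambda s: (len(s), s)))
assert all(re.fullmatch(r"0*1*", s) for s in strings)
```

This is only evidence for small lengths; the induction argument is still what
proves the claim for all strings.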
Applications of regular grammars include:
* algorithms to search text for certain patterns
* the part of a compiler that transforms an input stream (i.e., the
characters of the input program) into a stream of tokens (i.e., groups of
characters that form entities with more meaning, such as "variable" or
"signed integer") for use by the next stage of the compiler (the parser)
Another notation for grammars is BACKUS-NAUR FORM (BNF). It is used for
context-free grammars, a class that includes the regular grammars.
EX: BNF for signed integers in decimal notation:
<signed integer> ::= <sign><integer>
<sign> ::= + | -
<integer> ::= <digit> | <digit><integer>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
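A BNF description translates fairly directly into a recognizer with one
function per nonterminal. The sketch below is an illustration only. It
assumes the usual reading of this example: a signed integer is a sign followed
by an integer, and an integer is one digit or a digit followed by an integer.
The function names are hypothetical.

```python
def parse_sign(s, i):
    """sign ::= + | -   Return the index after the sign, or None."""
    if i < len(s) and s[i] in "+-":
        return i + 1
    return None


def parse_digit(s, i):
    """digit ::= 0 | 1 | ... | 9   Return the index after the digit, or None."""
    if i < len(s) and s[i] in "0123456789":
        return i + 1
    return None


def parse_integer(s, i):
    """integer ::= digit | digit integer   Consume as many digits as possible."""
    j = parse_digit(s, i)
    if j is None:
        return None
    rest = parse_integer(s, j)      # try the second alternative first
    return j if rest is None else rest


def is_signed_integer(s):
    """signed integer ::= sign integer, and the whole string must be used."""
    j = parse_sign(s, 0)
    if j is None:
        return False
    return parse_integer(s, j) == len(s)


for text in ["+123", "-7", "123", "+", "-12a"]:
    print(repr(text), is_signed_integer(text))
```

Under this reading the sign is mandatory, so "123" is rejected.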
BNF is used extensively to describe the syntax of programming languages
(Java, LISP), database languages (SQL), markup languages (XML), etc.
------------------------------------------------------------------------
12.3: Finite-State Machines with No Output
------------------------------------------------------------------------
Finite state machines (aka finite automata) are used to model
processes in which there are a fixed number of different states, and we
can transition from one state to another based on the input characters
that are fed to the process. Certain states are designated as
"accepting" states: if the machine ends in an accepting state once it
has consumed its entire input, then we say the input string has been
accepted.
Finite state machines are the basis of algorithms for
* spell checking
* grammar checking (in compilers)
* indexing or searching texts
* speech recognition
* formatting text with markup languages such as HTML and XML
* network communication protocols
DEF: A finite automaton M = (S,I,f,s_0,F) consists of
* finite set of states S
* finite input alphabet I
* transition function f : S x I -> S (takes current state and current
input and produces next state)
* start state s_0 (drawn from S)
* subset F of S (the final, or accepting, states)
Easiest way to describe a finite automaton is with a state transition
diagram: a directed graph in which each vertex is a state and there is
an edge from vertex a to vertex b labeled with x if f(a,x) = b, i.e.,
there is a transition from state a to state b when the current input
character is x.
<<< Fig 1, p. 806 >>>
DEF: A string x is ACCEPTED by M if, starting in s_0, the computation
of M on x ends in an accepting state. The set of all strings accepted
by M is the LANGUAGE of M, L(M).
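The definition of acceptance can be sketched in a few lines of Python. The
transition function f is stored as a dictionary keyed by (state, input)
pairs. The example machine (even number of 1's) and all names are made up for
illustration and are not one of the textbook figures.

```python
def accepts(f, s0, F, x):
    """Run the automaton on x from start state s0; accept if it ends in F."""
    state = s0
    for symbol in x:
        state = f[(state, symbol)]   # exactly one next state: deterministic
    return state in F


# Hypothetical machine: accepts bit strings with an even number of 1's.
f = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}

for x in ["", "1001", "1011"]:
    print(repr(x), accepts(f, "even", {"even"}, x))
```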
EX: Describe the languages accepted by these finite automata:
<<< Fig 2, p. 807 >>>
EX: Construct finite automata for these languages:
(1) set of bit strings that begin with two 0's
(2) set of bit strings that contain two consecutive 0's
(3) set of bit strings that do not contain two consecutive 0's
(4) set of bit strings that end with two 0's
(5) set of bit strings that contain at least two 0's
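As one possible answer for (2) and (3), here is a sketch of a three-state
machine, checked against a direct substring test. The state names are made up.
Languages (2) and (3) are complements, so the same transitions work for both
and only the accepting set changes.

```python
from itertools import product

# States (names are made up for this sketch):
#   "start" = last symbol was not a 0 (or nothing has been read yet)
#   "one0"  = last symbol was a 0, but 00 has not occurred yet
#   "seen"  = 00 has occurred somewhere; never left once entered
f = {
    ("start", "0"): "one0", ("start", "1"): "start",
    ("one0", "0"): "seen",  ("one0", "1"): "start",
    ("seen", "0"): "seen",  ("seen", "1"): "seen",
}


def run(x, accepting):
    state = "start"
    for ch in x:
        state = f[(state, ch)]
    return state in accepting


def contains_00(x):      # language (2)
    return run(x, {"seen"})


def avoids_00(x):        # language (3): same machine, complemented accepting set
    return run(x, {"start", "one0"})


# Compare against a substring test on all bit strings of length <= 6.
for n in range(7):
    for bits in product("01", repeat=n):
        x = "".join(bits)
        assert contains_00(x) == ("00" in x)
        assert avoids_00(x) == ("00" not in x)
```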
Construct a finite automaton for the set of all strings over {a,b}
that contain abba.
<<< uses "backtracking" transitions >>>
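One way to build the abba machine is to let each state record how much of
abba has just been matched. The table below is a sketch under that
assumption; the numbering of states is made up. The "backtracking"
transitions are the ones that go to a lower-numbered state (or stay put)
because the input stopped matching but a shorter prefix of abba still fits.

```python
from itertools import product

# State k means: the longest suffix of the input read so far that is also a
# prefix of "abba" has length k. State 4 means abba has been seen.
f = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,   # on a: stay, the new a can still start abba
    (2, "a"): 1, (2, "b"): 3,   # on a: back to 1, since "aba" ends with "a"
    (3, "a"): 4, (3, "b"): 0,   # on b: back to 0, nothing of "abbb" is reusable
    (4, "a"): 4, (4, "b"): 4,   # once abba has been seen, stay accepting
}


def contains_abba(x):
    state = 0
    for ch in x:
        state = f[(state, ch)]
    return state == 4


# Compare against a substring test on all strings over {a,b} of length <= 8.
for n in range(9):
    for letters in product("ab", repeat=n):
        x = "".join(letters)
        assert contains_abba(x) == ("abba" in x)
```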
The type of finite state machine defined so far is DETERMINISTIC:
given the current state and the next input symbol, there is only one
possible next state to go to, according to the transition function.
We can also consider a variation on the finite state machine called
NONDETERMINISTIC: The transition function provides a SET of possible
next states, for a given current state and input symbol.
The notion of a string being accepted is DIFFERENT for a
nondeterministic finite state machine: a string is accepted if there
exists a computation that ends in an accepting state.
Even if there are other computations that don't accept, as long as
there is at least one that does accept, the string is considered to
be accepted.
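"There exists an accepting computation" can be checked without trying the
computations one at a time: keep track of the set of all states that some
computation could currently be in. The sketch below does this for a
hypothetical nondeterministic machine for the "contains abba" language (the
machine and all names are assumptions for illustration, not the textbook
figure).

```python
from itertools import product

# State 0 loops on every symbol and, on an a, may also guess that an
# occurrence of abba starts here. Missing entries mean "no next state".
delta = {
    (0, "a"): {0, 1}, (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
    (3, "a"): {4},
    (4, "a"): {4}, (4, "b"): {4},
}
START, ACCEPTING = 0, {4}


def nfa_accepts(x):
    current = {START}            # every state some computation could be in
    for ch in x:
        current = {t for s in current for t in delta.get((s, ch), set())}
    # Accepted if at least one computation ends in an accepting state.
    return bool(current & ACCEPTING)


for n in range(9):
    for letters in product("ab", repeat=n):
        x = "".join(letters)
        assert nfa_accepts(x) == ("abba" in x)
```

Note that this machine needs no "backtracking" transitions; the guess made in
state 0 replaces them.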
EX: <<< Fig 6, p. 812 >>>
--------
It may appear that nondeterministic FAs can accept more languages
than can deterministic FAs (for instance, there is a looser notion of
string acceptance). However, it turns out that this is not the case:
THEOREM: For every nondeterministic FA, there is an "equivalent"
deterministic FA (accepts the same language).
PROOF IDEA: Given an arbitrary nondeterministic FA M, we construct
a deterministic FA M' that accepts the same language. The state set
for M' is the powerset of the state set of M: that is, there is
a single state of M' for every subset of states of M. Any state of M'
that contains an accepting state of M is considered to be accepting
for M'. Transitions are constructed among the states of M' in such
a way as to ensure that every accepting computation of M is mimicked
by an accepting computation of M'.
QED
Note that the equivalent deterministic machine may have many more
states than the original nondeterministic machine.
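The construction in the proof idea can be sketched as follows. One deliberate
difference: instead of creating a state for every subset, this sketch only
creates the subsets that are reachable from the start state, which accepts the
same language. The example machine for "contains abba" and the function names
are assumptions for illustration.

```python
from collections import deque
from itertools import product


def determinize(delta, s0, F, alphabet):
    """Build a deterministic machine whose states are sets of NFA states."""
    start = frozenset({s0})
    dfa, accepting = {}, set()
    seen, queue = {start}, deque([start])
    while queue:
        S = queue.popleft()
        if S & F:                      # contains an accepting state of M
            accepting.add(S)
        for ch in alphabet:
            # Next subset: every state reachable from some member of S on ch.
            T = frozenset(t for s in S for t in delta.get((s, ch), ()))
            dfa[(S, ch)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    return dfa, start, accepting


def dfa_accepts(dfa, start, accepting, x):
    state = start
    for ch in x:
        state = dfa[(state, ch)]
    return state in accepting


# Hypothetical nondeterministic machine for "contains abba".
delta = {
    (0, "a"): {0, 1}, (0, "b"): {0},
    (1, "b"): {2}, (2, "b"): {3}, (3, "a"): {4},
    (4, "a"): {4}, (4, "b"): {4},
}
dfa, start, accepting = determinize(delta, 0, {4}, "ab")

for n in range(9):
    for letters in product("ab", repeat=n):
        x = "".join(letters)
        assert dfa_accepts(dfa, start, accepting, x) == ("abba" in x)
```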
Virtue of nondeterministic machines is that they are often simpler to
come up with. (Cf. the "abba" language from before, no need for the
"backtracking" transitions.) Then if you need to be deterministic,
you can run the conversion algorithm (more mechanical).
------------------------------------------------------------------------
12.4: Language Recognition
------------------------------------------------------------------------
It turns out that there is an interesting connection between the set of
languages that are generated by regular grammars and the set of languages
accepted by finite state machines. Namely, they are the same.
Furthermore, the set of languages represented by REGULAR EXPRESSIONS is
also the same.
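One direction of the grammar/machine connection is easy to sketch: a regular
grammar can be read as a nondeterministic machine whose states are the
nonterminals. The code below is an illustration under stated assumptions:
every symbol is a single character, rules have the form A -> aB, A -> a, or
A -> empty string, and FINAL is a made-up name for one extra accepting state.
It uses the grammar from the example in 12.1.

```python
import re
from itertools import product

FINAL = "FINAL"  # extra accepting state; the name is made up for this sketch


def grammar_to_nfa(productions):
    """Turn rules A -> aB, A -> a, A -> "" into NFA transitions."""
    delta, accepting = {}, {FINAL}
    for A, alternatives in productions.items():
        for rhs in alternatives:
            if rhs == "":
                accepting.add(A)                                  # A -> lambda
            elif len(rhs) == 1:
                delta.setdefault((A, rhs), set()).add(FINAL)      # A -> a
            else:
                delta.setdefault((A, rhs[0]), set()).add(rhs[1])  # A -> aB
    return delta, accepting


def nfa_accepts(delta, start, accepting, x):
    current = {start}
    for ch in x:
        current = {t for s in current for t in delta.get((s, ch), set())}
    return bool(current & accepting)


productions = {"S": ["0S", "1A", "1", ""], "A": ["1A", "1"]}
delta, accepting = grammar_to_nfa(productions)

# The machine should accept exactly the strings matching 0*1*.
for n in range(7):
    for bits in product("01", repeat=n):
        x = "".join(bits)
        expected = re.fullmatch(r"0*1*", x) is not None
        assert nfa_accepts(delta, "S", accepting, x) == expected
```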
Regular expressions are a notation for specifying sets of strings that can be
created by using concatenation, union, and a special "closure" operation,
called Kleene closure, starting with certain base objects.
DEF: KLEENE CLOSURE of a set A, denoted A*, is U_{k=0}^infty A^k.
I.e., it is the set consisting of concatenations of arbitrarily many
strings from A.
EX: Suppose A = {0,1}.
A^0 = {lambda} (set containing a single string, the empty string)
A^1 = A = {0,1}
A^2 = AA, the concatenation of A and A, which is {00,01,10,11}
A^3 = {000,001,010,011,100,101,110,111}
...
So A* is the set of all binary strings of any length.
EX: Suppose A = {ab,bc}
A^0 = {lambda}
A^1 = A = {ab,bc}
A^2 = AA = {abab, abbc, bcab, bcbc}
A^3 = AAA = {ababab, ababbc, abbcab, abbcbc,...}
...
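The sets A^k in these examples can be computed directly. A* itself is
infinite, so the sketch below only builds the finite union A^0 U ... U A^n.
The function names are made up for illustration, and "" plays the role of
lambda.

```python
def concat(A, B):
    """Concatenation of two sets of strings."""
    return {a + b for a in A for b in B}


def power(A, k):
    """A^k: A^0 = {lambda}, and A^(k+1) = A^k A."""
    result = {""}
    for _ in range(k):
        result = concat(result, A)
    return result


def kleene_up_to(A, n):
    """A^0 U A^1 U ... U A^n, a finite piece of A*."""
    out = set()
    for k in range(n + 1):
        out |= power(A, k)
    return out


print(sorted(power({"0", "1"}, 2)))
print(sorted(power({"ab", "bc"}, 2)))
print(sorted(kleene_up_to({"ab", "bc"}, 2), key=lambda s: (len(s), s)))
```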
Regular expressions are formally defined recursively:
DEF: Let I be a set.
Basis:
* emptyset-bold is a regular expression, denoting the empty set
* lambda-bold is a regular expression, denoting the set {lambda}
(lambda is the empty string)
* x-bold is a regular expression for each x in I, denoting the set {x}
Recursive: Suppose A and B are regular expressions.
Then so are
* AB, denoting the concatenation of the sets represented by A and B
* A U B, denoting the union of the sets represented by A and B
* A*, denoting the Kleene closure of the set represented by A
DEF: Sets represented by regular expressions are called REGULAR SETS.
EX:
* 10* is the set of all strings that start with a 1 which is followed
by zero or more 0's
* (10)* is the set of all strings consisting of zero or more copies of "10"
* 0 U 01 is the set consisting of the string 0 and the string 01
* (0*1)* is the set consisting of all binary strings that do not end with 0
EX: Find regular expressions for:
* set of bit strings with even length
((1 U 0)(1 U 0))* or (00 U 01 U 10 U 11)*
* set of bit strings that end with 0 and do not contain 11
(0 U 10)* (0 U 10) (second term is there to exclude the empty string)
* set of bit strings that contain an odd number of 0's
1*(01*01*)*01*
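These answers can be sanity-checked by brute force. The sketch below uses
Python's re module, which writes union as | instead of U (and supports more
than the regular expressions defined here; only concatenation, |, and * are
used). Each expression is compared with a direct test of the stated property
on all bit strings of length at most 8. The names CASES and agree are made up
for illustration.

```python
import re
from itertools import product

# (expression in Python notation, direct test of the intended property)
CASES = [
    (r"((1|0)(1|0))*", lambda s: len(s) % 2 == 0),
    (r"(0|10)*(0|10)", lambda s: s.endswith("0") and "11" not in s),
    (r"1*(01*01*)*01*", lambda s: s.count("0") % 2 == 1),
    (r"(0*1)*", lambda s: not s.endswith("0")),
]


def agree(pattern, predicate, max_len=8):
    """True if pattern and predicate agree on every bit string up to max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            s = "".join(bits)
            matched = re.fullmatch(pattern, s) is not None
            if matched != predicate(s):
                return False
    return True


for pattern, predicate in CASES:
    print(pattern, agree(pattern, predicate))
```

Agreement on short strings is a check, not a proof, but it catches most
mistakes in a proposed expression.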
KLEENE'S THEOREM: A set is regular (represented by a regular expression)
iff it is recognized by a finite state machine.
PROOF SKETCH:
(1) Show that any regular set is accepted by a finite state machine.
Strategy: By definition, every regular set is represented by a regular
expression. Show how to convert any regular expression into an "equivalent"
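finite state machine. Construction is recursive, to match the
recursive definition of regular expressions: build a machine for each
basis expression, then show how to combine machines for A and B into
machines for AB, A U B, and A*.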
(2) Show that the language accepted by any finite state machine
is a regular set.
Strategy: Look at the state transition diagram of the finite state machine
and go through a process of "condensing" it and changing the labels on the
edges until it is small enough that the equivalent regular expression can
be read off the edge labels.
QED