String matching based on repetition factors

Let yidenote the concatenation of string y with itself i times. For example, (ab)3= ababab. We say that a string x ∈Σ* has repetition factor r if x = yr for some string y ∈Σ* and some r > 0. Let p(x) denote the largest r such that x has repetition factor r.

a. Give an efficient algorithm that takes as input a pattern P [1..m]and computes the value ρ(Pi) for i = 1,2,…,m. What is the running time of your algorithm?

b. For any pattern P [1..m], let (P*) be defined as max1≤i≤m ρ(pi). Prove that if the pattern P is chosen randomly from the set of all binary strings of length m, then the expected value of ρ(P) is O(1).

c. Argue that the following string-matching algorithm correctly finds all occurrences of pattern P in a text T[1..n] in time O(ρ*(P) n + m):


1 m = P. length

2 n = T. length

3 k = 1 + ρ*(P)

4 q = 0

5 s = 0

6 while s ≤ n – m

7 if T [s + q + 1] = = P [q + 1]

8 q = q + 1

9 if q = = m

10 print “Pattern occurs with shift” s

11 if q = = m or T [s + q + 1] ≠ P[q + 1]

12 s = s C max(1, [q/k])

13 q = 0

This algorithm is due to Galil and Seiferas. By extending these ideas greatly, they obtained a linear-time string-matching algorithm that uses only O(1) storage beyond what is required for P and T .