(**EDITED** to include a deeper discussion, better code and more benchmarks)

# Summary

For raw speed and efficiency, one can use a Cython or Numba accelerated version (when the input is a Python sequence or a NumPy array, respectively) of one of the classical algorithms.

The recommended approaches are:

`find_kmp_cy()`

for Python sequences (`list`

, `tuple`

, etc.)
`find_kmp_nb()`

for NumPy arrays

Other efficient approaches, are `find_rk_cy()`

and `find_rk_nb()`

which, are more memory efficient but are not guaranteed to run in linear time.

If Cython / Numba are not available, again both `find_kmp()`

and `find_rk()`

are a good all-around solution for most use cases, although in the average case and for Python sequences, the naïve approach, in some form, notably `find_pivot()`

, may be faster. For NumPy arrays, `find_conv()`

(from @Jaime answer) outperforms any non-accelerated naïve approach.

(Full code is below, and here and there.)

# Theory

This is a classical problem in computer science that goes by the name of string-searching or string matching problem.
The naive approach, based on two nested loops, has a computational complexity of `O(n + m)`

on average, but worst case is `O(n m)`

.
Over the years, a number of alternative approaches have been developed which guarantee a better worst case performances.

Of the classical algorithms, the ones that can be best suited to generic sequences (since they do not rely on an alphabet) are:

This last algorithm relies on the computation of a rolling hash for its efficiency and therefore may require some additional knowledge of the input for optimal performance.
Eventually, it is best suited for homogeneous data, like for example numeric arrays.
A notable example of numeric arrays in Python is, of course, NumPy arrays.

# Remarks

- The naïve algorithm, by being so simple, lends itself to different implementations with various degrees of run-time speed in Python.
- The other algorithms are less flexible in what can be optimized via language tricks.
- Explicit looping in Python may be a speed bottleneck and several tricks can be used to perform the looping outside of the interpreter.
- Cython is especially good at speeding up explicit loops for generic Python code.
- Numba is especially good at speeding up explicit loops on NumPy arrays.
- This is an excellent use-case for generators, so all the code will be using those instead of regular functions.

# Python Sequences (`list`

, `tuple`

, etc.)

## Based on the Naïve Algorithm

`find_loop()`

, `find_loop_cy()`

and `find_loop_nb()`

which are the explicit-loop only implementation in pure Python, Cython and with Numba JITing respectively. Note the `forceobj=True`

in the Numba version, which is required because we are using Python object inputs.

```
def find_loop(seq, subseq):
n = len(seq)
m = len(subseq)
for i in range(n - m + 1):
found = True
for j in range(m):
if seq[i + j] != subseq[j]:
found = False
break
if found:
yield i
```

```
%%cython -c-O3 -c-march=native -a
#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True
def find_loop_cy(seq, subseq):
cdef Py_ssize_t n = len(seq)
cdef Py_ssize_t m = len(subseq)
for i in range(n - m + 1):
found = True
for j in range(m):
if seq[i + j] != subseq[j]:
found = False
break
if found:
yield i
```

```
find_loop_nb = nb.jit(find_loop, forceobj=True)
find_loop_nb.__name__ = 'find_loop_nb'
```

`find_all()`

replaces the inner loop with `all()`

on a comprehension generator

```
def find_all(seq, subseq):
n = len(seq)
m = len(subseq)
for i in range(n - m + 1):
if all(seq[i + j] == subseq[j] for j in range(m)):
yield i
```

`find_slice()`

replaces the inner loop with direct comparison `==`

after slicing `[]`

```
def find_slice(seq, subseq):
n = len(seq)
m = len(subseq)
for i in range(n - m + 1):
if seq[i:i + m] == subseq:
yield i
```

`find_mix()`

and `find_mix2()`

replaces the inner loop with direct comparison `==`

after slicing `[]`

but includes one or two additional short-circuiting on the first (and last) character which may be faster because slicing with an `int`

is much faster than slicing with a `slice()`

.

```
def find_mix(seq, subseq):
n = len(seq)
m = len(subseq)
for i in range(n - m + 1):
if seq[i] == subseq[0] and seq[i:i + m] == subseq:
yield i
```

```
def find_mix2(seq, subseq):
n = len(seq)
m = len(subseq)
for i in range(n - m + 1):
if seq[i] == subseq[0] and seq[i + m - 1] == subseq[m - 1] \
and seq[i:i + m] == subseq:
yield i
```

`find_pivot()`

and `find_pivot2()`

replace the outer loop with multiple `.index()`

call using the first item of the sub-sequence, while using slicing for the inner loop, eventually with additional short-circuiting on the last item (the first matches by construction). The multiple `.index()`

calls are wrapped in a `index_all()`

generator (which may be useful on its own).

```
def index_all(seq, item, start=0, stop=-1):
try:
n = len(seq)
if n > 0:
start %= n
stop %= n
i = start
while True:
i = seq.index(item, i)
if i <= stop:
yield i
i += 1
else:
return
else:
return
except ValueError:
pass
def find_pivot(seq, subseq):
n = len(seq)
m = len(subseq)
if m > n:
return
for i in index_all(seq, subseq[0], 0, n - m):
if seq[i:i + m] == subseq:
yield i
```

```
def find_pivot2(seq, subseq):
n = len(seq)
m = len(subseq)
if m > n:
return
for i in index_all(seq, subseq[0], 0, n - m):
if seq[i + m - 1] == subseq[m - 1] and seq[i:i + m] == subseq:
yield i
```

## Based on Knuth–Morris–Pratt (KMP) Algorithm

`find_kmp()`

is a plain Python implementation of the algorithm. Since there is no simple looping or places where one could use slicing with a `slice()`

, there is not much to be done for optimization, except using Cython (Numba would require again `forceobj=True`

which would lead to slow code).

```
def find_kmp(seq, subseq):
n = len(seq)
m = len(subseq)
# : compute offsets
offsets = [0] * m
j = 1
k = 0
while j < m:
if subseq[j] == subseq[k]:
k += 1
offsets[j] = k
j += 1
else:
if k != 0:
k = offsets[k - 1]
else:
offsets[j] = 0
j += 1
# : find matches
i = j = 0
while i < n:
if seq[i] == subseq[j]:
i += 1
j += 1
if j == m:
yield i - j
j = offsets[j - 1]
elif i < n and seq[i] != subseq[j]:
if j != 0:
j = offsets[j - 1]
else:
i += 1
```

`find_kmp_cy()`

is Cython implementation of the algorithm where the indices use C int data type, which result in much faster code.

```
%%cython -c-O3 -c-march=native -a
#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True
def find_kmp_cy(seq, subseq):
cdef Py_ssize_t n = len(seq)
cdef Py_ssize_t m = len(subseq)
# : compute offsets
offsets = [0] * m
cdef Py_ssize_t j = 1
cdef Py_ssize_t k = 0
while j < m:
if subseq[j] == subseq[k]:
k += 1
offsets[j] = k
j += 1
else:
if k != 0:
k = offsets[k - 1]
else:
offsets[j] = 0
j += 1
# : find matches
cdef Py_ssize_t i = 0
j = 0
while i < n:
if seq[i] == subseq[j]:
i += 1
j += 1
if j == m:
yield i - j
j = offsets[j - 1]
elif i < n and seq[i] != subseq[j]:
if j != 0:
j = offsets[j - 1]
else:
i += 1
```

## Based on Rabin-Karp (RK) Algorithm

`find_rk()`

is a pure Python implementation, which relies on Python's `hash()`

for the computation (and comparison) of the hash. Such hash is made rolling by mean of a simple `sum()`

. The roll-over is then computed from the previous hash by subtracting the result of `hash()`

on the just visited item `seq[i - 1]`

and adding up the result of `hash()`

on the newly considered item `seq[i + m - 1]`

.

```
def find_rk(seq, subseq):
n = len(seq)
m = len(subseq)
if seq[:m] == subseq:
yield 0
hash_subseq = sum(hash(x) for x in subseq) # compute hash
curr_hash = sum(hash(x) for x in seq[:m]) # compute hash
for i in range(1, n - m + 1):
curr_hash += hash(seq[i + m - 1]) - hash(seq[i - 1]) # update hash
if hash_subseq == curr_hash and seq[i:i + m] == subseq:
yield i
```

`find_rk_cy()`

is Cython implementation of the algorithm where the indices use the appropriate C data type, which results in much faster code. Note that `hash()`

truncates "the return value based on the bit width of the host machine."

```
%%cython -c-O3 -c-march=native -a
#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True
def find_rk_cy(seq, subseq):
cdef Py_ssize_t n = len(seq)
cdef Py_ssize_t m = len(subseq)
if seq[:m] == subseq:
yield 0
cdef Py_ssize_t hash_subseq = sum(hash(x) for x in subseq) # compute hash
cdef Py_ssize_t curr_hash = sum(hash(x) for x in seq[:m]) # compute hash
cdef Py_ssize_t old_item, new_item
for i in range(1, n - m + 1):
old_item = hash(seq[i - 1])
new_item = hash(seq[i + m - 1])
curr_hash += new_item - old_item # update hash
if hash_subseq == curr_hash and seq[i:i + m] == subseq:
yield i
```

## Benchmarks

The above functions are evaluated on two inputs:

```
def gen_input(n, k=2):
return tuple(random.randint(0, k - 1) for _ in range(n))
```

- (almost) worst inputs for the naïve algorithm

```
def gen_input_worst(n, k=-2):
result = [0] * n
result[k] = 1
return tuple(result)
```

The `subseq`

has fixed size (`32`

).
Since there are so many alternatives, two separate grouping have been done and some solutions with very small variations and almost identical timings have been omitted (i.e. `find_mix2()`

and `find_pivot2()`

).
For each group both inputs are tested.
For each benchmark the full plot and a zoom on the fastest approach is provided.

### Naïve on Random

### Naïve on Worst

### Other on Random

### Other on Worst

(Full code is available here.)

# NumPy Arrays

## Based on the Naïve Algorithm

`find_loop()`

, `find_loop_cy()`

and `find_loop_nb()`

which are the explicit-loop only implementation in pure Python, Cython and with Numba JITing respectively. The code for the first two are the same as above and hence omitted. `find_loop_nb()`

now enjoys fast JIT compilation. The inner loop has been written in a separate function because it can then be reused for `find_rk_nb()`

(calling Numba functions inside Numba functions does not incur in the function call penalty typical of Python).

```
@nb.jit
def _is_equal_nb(seq, subseq, m, i):
for j in range(m):
if seq[i + j] != subseq[j]:
return False
return True
@nb.jit
def find_loop_nb(seq, subseq):
n = len(seq)
m = len(subseq)
for i in range(n - m + 1):
if _is_equal_nb(seq, subseq, m, i):
yield i
```

`find_all()`

is the same as above, while `find_slice()`

, `find_mix()`

and `find_mix2()`

are almost identical to the above, the only difference is that `seq[i:i + m] == subseq`

is now the argument of `np.all()`

: `np.all(seq[i:i + m] == subseq)`

.

`find_pivot()`

and `find_pivot2()`

share the same ideas as above, except that now uses `np.where()`

instead of `index_all()`

and the need for enclosing the array equality inside an `np.all()`

call.

```
def find_pivot(seq, subseq):
n = len(seq)
m = len(subseq)
if m > n:
return
max_i = n - m
for i in np.where(seq == subseq[0])[0]:
if i > max_i:
return
elif np.all(seq[i:i + m] == subseq):
yield i
def find_pivot2(seq, subseq):
n = len(seq)
m = len(subseq)
if m > n:
return
max_i = n - m
for i in np.where(seq == subseq[0])[0]:
if i > max_i:
return
elif seq[i + m - 1] == subseq[m - 1] \
and np.all(seq[i:i + m] == subseq):
yield i
```

`find_rolling()`

express the looping via a rolling window and the matching is checked with `np.all()`

. This vectorizes all the looping at the expenses of creating large temporary objects, while still substantially appling the naïve algorithm. (The approach is from @senderle answer).

```
def rolling_window(arr, size):
shape = arr.shape[:-1] + (arr.shape[-1] - size + 1, size)
strides = arr.strides + (arr.strides[-1],)
return np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)
def find_rolling(seq, subseq):
bool_indices = np.all(rolling_window(seq, len(subseq)) == subseq, axis=1)
yield from np.mgrid[0:len(bool_indices)][bool_indices]
```

`find_rolling2()`

is a slightly more memory efficient variation of the above, where the vectorization is only partial and one explicit looping (along the expected shortest dimension -- the length of `subseq`

) is kept. (The approach is also from @senderle answer).

```
def find_rolling2(seq, subseq):
windows = rolling_window(seq, len(subseq))
hits = np.ones((len(seq) - len(subseq) + 1,), dtype=bool)
for i, x in enumerate(subseq):
hits &= np.in1d(windows[:, i], [x])
yield from hits.nonzero()[0]
```

## Based on Knuth–Morris–Pratt (KMP) Algorithm

`find_kmp()`

is the same as above, while `find_kmp_nb()`

is a straightforward JIT-compilation of that.

```
find_kmp_nb = nb.jit(find_kmp)
find_kmp_nb.__name__ = 'find_kmp_nb'
```

## Based on Rabin-Karp (RK) Algorithm

`find_rk()`

is the same as the above, except that again `seq[i:i + m] == subseq`

is enclosed in an `np.all()`

call.

`find_rk_nb()`

is the Numba accelerated version of the above. Uses `_is_equal_nb()`

defined earlier to definitively determine a match, while for the hashing, it uses a Numba accelerated `sum_hash_nb()`

function whose definition is pretty straightforward.

```
@nb.jit
def sum_hash_nb(arr):
result = 0
for x in arr:
result += hash(x)
return result
@nb.jit
def find_rk_nb(seq, subseq):
n = len(seq)
m = len(subseq)
if _is_equal_nb(seq, subseq, m, 0):
yield 0
hash_subseq = sum_hash_nb(subseq) # compute hash
curr_hash = sum_hash_nb(seq[:m]) # compute hash
for i in range(1, n - m + 1):
curr_hash += hash(seq[i + m - 1]) - hash(seq[i - 1]) # update hash
if hash_subseq == curr_hash and _is_equal_nb(seq, subseq, m, i):
yield i
```

`find_conv()`

uses a pseudo Rabin-Karp method, where initial candidates are hashed using the `np.dot()`

product and located on the convolution between `seq`

and `subseq`

with `np.where()`

. The approach is pseudo because, while it still uses hashing to identify probable candidates, it is may not be regarded as a rolling hash (it depends on the actual implementation of `np.correlate()`

. Also, it needs to create a temporary array the size of the input. (The approach is from @Jaime answer).

```
def find_conv(seq, subseq):
target = np.dot(subseq, subseq)
candidates = np.where(np.correlate(seq, subseq, mode='valid') == target)[0]
check = candidates[:, np.newaxis] + np.arange(len(subseq))
mask = np.all((np.take(seq, check) == subseq), axis=-1)
yield from candidates[mask]
```

## Benchmarks

Like before, the above functions are evaluated on two inputs:

```
def gen_input(n, k=2):
return np.random.randint(0, k, n)
```

- (almost) worst inputs for the naïve algorithm

```
def gen_input_worst(n, k=-2):
result = np.zeros(n, dtype=int)
result[k] = 1
return result
```

The `subseq`

has fixed size (`32`

).
This plots follow the same scheme as before, summarized below for convenience.

Since there are so many alternatives, two separate grouping have been done and some solutions with very small variations and almost identical timings have been omitted (i.e. `find_mix2()`

and `find_pivot2()`

).
For each group both inputs are tested.
For each benchmark the full plot and a zoom on the fastest approach is provided.

### Naïve on Random

### Naïve on Worst

### Other on Random

### Other on Worst

(Full code is available here.)

`x=''.join(str(x) for x in a)`

Then use the find method with the resulting strings? Or do they have to remain lists?