检测序列是否是Python中子序列的倍数

mat*_*ots 14 python algorithm sequence

我有一个零和一元组,例如:

(1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1)
Run Code Online (Sandbox Code Playgroud)

事实证明:

(1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1) == (1, 0, 1, 1) * 3
Run Code Online (Sandbox Code Playgroud)

我想要一个函数f,使得if s是一个零和一的非空元组,f(s)是最短的子元素r,s == r * n对于某些正整数n.

所以,例如,

f( (1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1) ) == (1, 0, 1, 1)
Run Code Online (Sandbox Code Playgroud)

f在Python中编写函数的方法是什么?

编辑:

我目前使用的天真方法

def f(s):
  for i in range(1,len(s)):
    if len(s)%i == 0 and s == s[:i] * (len(s)/i):
      return s[:i]
Run Code Online (Sandbox Code Playgroud)

Kno*_*the 5

我相信我有一个O(n)时间解决方案(实际上2n + r,n是元组的长度,r是子tuplle),它不使用后缀树,但使用字符串匹配算法(如KMP,你应该找到它-架子).

我们使用以下鲜为人知的定理:

If x,y are strings over some alphabet,

then xy = yx if and only if x = z^k and y = z^l for some string z and integers k,l.
Run Code Online (Sandbox Code Playgroud)

我现在声称,出于我们问题的目的,这意味着我们需要做的就是确定给定的元组/列表(或字符串)是否是自身的循环移位!

为了确定字符串是否是自身的循环移位,我们将它与自身连接起来(它甚至不必是真正的连接,只是虚拟连接)并检查子串匹配(与其自身).

为了证明这一点,假设字符串是自身的循环移位.

我们有给定的字符串y = uv = vu.由于uv = vu,我们必须从上述定理得到u = z ^ k和v = z ^ 1,因此y = z ^ {k + 1}.另一个方向很容易证明.

这是python代码.该方法称为powercheck.

def powercheck(lst):
    count = 0
    position = 0
    for pos in KnuthMorrisPratt(double(lst), lst):
        count += 1
        position = pos
        if count == 2:
            break

    return lst[:position]


def double(lst):
    for i in range(1,3):
        for elem in lst:
            yield elem

def main():
    print powercheck([1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1])

if __name__ == "__main__":
    main()
Run Code Online (Sandbox Code Playgroud)

这是我使用的KMP代码(由David Eppstein提供).

# Knuth-Morris-Pratt string matching
# David Eppstein, UC Irvine, 1 Mar 2002

def KnuthMorrisPratt(text, pattern):

    '''Yields all starting positions of copies of the pattern in the text.
Calling conventions are similar to string.find, but its arguments can be
lists or iterators, not just strings, it returns all matches, not just
the first one, and it does not need the whole text in memory at once.
Whenever it yields, it will have read the text exactly up to and including
the match that caused the yield.'''

    # allow indexing into pattern and protect against change during yield
    pattern = list(pattern)

    # build table of shift amounts
    shifts = [1] * (len(pattern) + 1)
    shift = 1
    for pos in range(len(pattern)):
        while shift <= pos and pattern[pos] != pattern[pos-shift]:
            shift += shifts[pos-shift]
        shifts[pos+1] = shift

    # do the actual search
    startPos = 0
    matchLen = 0
    for c in text:
        while matchLen == len(pattern) or \
              matchLen >= 0 and pattern[matchLen] != c:
            startPos += shifts[matchLen]
            matchLen -= shifts[matchLen]
        matchLen += 1
        if matchLen == len(pattern):
            yield startPos
Run Code Online (Sandbox Code Playgroud)

对于您的样本,此输出

[1,0,1,1]
Run Code Online (Sandbox Code Playgroud)

正如所料.

我将它与shx2的代码(不是numpy代码)进行比较,通过生成一个随机的50位字符串,然后复制使总长度为100万.这是输出(十进制数是time.time()的输出)

1362988461.75

(50, [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1])

1362988465.96

50 [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1]

1362988487.14
Run Code Online (Sandbox Code Playgroud)

上面的方法需要大约4秒,而shx2的方法需要大约21秒!

这是时间码.(shx2的方法叫做powercheck2).

def rand_bitstring(n):
    rand = random.SystemRandom()
    lst = []
    for j in range(1, n+1):
        r = rand.randint(1,2)
        if r == 2:
            lst.append(0)
        else:
            lst.append(1)

    return lst

def main():
    lst = rand_bitstring(50)*200000
    print time.time()
    print powercheck(lst)
    print time.time()
    powercheck2(lst)
    print time.time()
Run Code Online (Sandbox Code Playgroud)