What is the best way to find the period of a (repeating) list in Mathematica?

Arn*_*ing 19 algorithm wolfram-mathematica

What is the best way to find the period in a repeating list?

For example:

a = {4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2}

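This has the repeat {4, 5, 1, 2, 3}, with the remainder {4, 5, 1, 2} matching the start of the repeat but being incomplete.

The algorithm should be fast enough to handle longer cases, like so: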

b = RandomInteger[10000, {100}];
a = Join[b, b, b, b, Take[b, 27]]

The algorithm should return $Failed if there is no repeating pattern like the ones above.
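To pin down the expected behavior, a naive reference could look like the sketch below. It is not meant to be fast; it only states the specification in code. The name naivePeriod is hypothetical, it assumes at least two repeats (shifts up to half the length), and SelectFirst needs Mathematica 10 or later.

```mathematica
(* naivePeriod is a hypothetical name; it tries every shift p up to half the length *)
naivePeriod[list_List] := SelectFirst[
  Range[Quotient[Length[list], 2]],
  Drop[list, #] === Drop[list, -#] &,  (* list equals itself shifted by p *)
  $Failed]

naivePeriod[{4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2}]  (* 5 *)
naivePeriod[{1, 2, 3, 4}]                                (* $Failed *)
```

Each shift costs a full list comparison, so this is quadratic in the worst case, which is why something smarter is wanted for long lists.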

Sza*_*lcs 8

Please see the comments interspersed with the code for how it works.

(* True if a has period p *)
testPeriod[p_, a_] := Drop[a, p] === Drop[a, -p]

(* are all the list elements the same? *)
homogeneousQ[list_List] := Length@Tally[list] === 1
homogeneousQ[{}] := Throw[$Failed] (* yes, it's ugly to put this here ... *)

(* auxiliary for findPeriodOfFirstElement[] *)
reduce[a_] := Differences@Flatten@Position[a, First[a], {1}]

(* the first element occurs every ?th position ? *)
findPeriodOfFirstElement[a_] := Module[{nl},
  nl = NestWhileList[reduce, reduce[a], ! homogeneousQ[#] &];
  Fold[Total@Take[#2, #1] &, 1, Reverse[nl]]
  ]

(* the period must be a multiple of the period of the first element *)
period[a_] := Catch@With[{fp = findPeriodOfFirstElement[a]},
   Do[
    If[testPeriod[p, a], Throw[p]],
    {p, fp, Quotient[Length[a], 2], fp}
    ];
   $Failed (* no candidate period passed the test *)
   ]

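Please ask if findPeriodOfFirstElement[] is not clear. I did this independently (for fun!), but now I see that the principle is the same as in Verbeia's solution, except that the problem pointed out by Brett is fixed.

I was testing with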
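To make findPeriodOfFirstElement[] more concrete, here is a small worked trace. It uses built-ins only so that it runs on its own; the input list and the names gaps1 and gaps2 are illustrative and not part of the code above.

```mathematica
(* illustrative input: period 3, the first element occurs twice per period *)
list = {1, 1, 2, 1, 1, 2, 1, 1, 2};

(* gaps between successive occurrences of the first element *)
gaps1 = Differences@Flatten@Position[list, First[list], {1}]
(* {1, 2, 1, 2, 1}: not constant, so reduce once more *)

gaps2 = Differences@Flatten@Position[gaps1, First[gaps1], {1}]
(* {2, 2}: constant, so the gap pattern repeats every 2 gaps *)

(* unwind: 1 entry of gaps2 spans 2 gaps, and the first 2 gaps sum to 3 *)
Fold[Total@Take[#2, #1] &, 1, {gaps2, gaps1}]
(* 3 *)

(* the candidate is then confirmed by the shift test *)
Drop[list, 3] === Drop[list, -3]
(* True *)
```

The folded value is only the period with which the first element recurs; the true period is some multiple of it, which is why period[] steps through multiples and tests each one.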

b = RandomInteger[100, {1000}];
a = Flatten[{ConstantArray[b, 1000], Take[b, 27]}];

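(Note the low integer values: there will be lots of repeated elements within the same period.)


EDIT: Per Leonid's comment below, another 2-3x speedup (about 2.4x on my machine) is possible by using a custom position function, compiled specifically for lists of integers: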

(* Leonid's reduce[] *)

myPosition = Compile[
  {{lst, _Integer, 1}, {val, _Integer}}, 
  Module[{pos = Table[0, {Length[lst]}], i = 1, ctr = 0}, 
    For[i = 1, i <= Length[lst], i++, 
      If[lst[[i]] == val, pos[[++ctr]] = i]
    ]; 
    Take[pos, ctr]
  ], 
  CompilationTarget -> "C", RuntimeOptions -> "Speed"
]

reduce[a_] := Differences@myPosition[a, First[a]]

Compiling testPeriod gives a further ~20% speedup in a quick test, but I believe this will depend on the input data:

Clear[testPeriod]
testPeriod = 
 Compile[{{p, _Integer}, {a, _Integer, 1}}, 
  Drop[a, p] === Drop[a, -p]]


Dan*_*lau 7

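If you have no noise, the approaches above will be better. If your signal is only approximately periodic, then Fourier transform methods might be useful. I'll illustrate with a "parametrized" setup in which the length and number of repetitions of the base signal, the length of the trailing part, and a bound on the noise perturbation are all variables one can play with.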

noise = 20;
extra = 40;
baselen = 103;
base = RandomInteger[10000, {baselen}];
repeat = 5;
signal = Flatten[Join[ConstantArray[base, repeat], Take[base, extra]]];
noisysignal = signal + RandomInteger[{-noise, noise}, Length[signal]];

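We compute the absolute value of the FFT and adjoin a zero to each end. The goal is to threshold by comparing each element with its neighbors.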

sigfft = Join[{0.}, Abs[Fourier[noisysignal]], {0}];

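Now we create two 0-1 vectors. In one we threshold by placing a 1 for each element of the fft whose square exceeds twice the product of its two neighbors (that is, the element exceeds √2 times their geometric mean). In the other we compare against the sum of the two neighbors scaled by 3/4 (that is, 1.5 times their arithmetic mean). These bounds were based on some experimentation. We count the number of 1s in each case. Ideally we'd get 100 for each, as that would be the number of nonzeros in a "perfect" case with no noise and no tail part.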

In[419]:= 
thresh1 = 
  Table[If[sigfft[[j]]^2 > 2*sigfft[[j - 1]]*sigfft[[j + 1]], 1, 
    0], {j, 2, Length[sigfft] - 1}];
count1 = Count[thresh1, 1]
thresh2 = 
  Table[If[sigfft[[j]] > 3/4*(sigfft[[j - 1]] + sigfft[[j + 1]]), 1, 
    0], {j, 2, Length[sigfft] - 1}];
count2 = Count[thresh2, 1]

Out[420]= 114

Out[422]= 100

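Now we get our best guess for the value of "repeats" by dividing the total length by the average of our two counts and taking the floor.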

approxrepeats = Floor[2*Length[signal]/(count1 + count2)]
Out[423]= 5

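So we have found that the basic signal is repeated 5 times. This can give a start toward refining the estimate of the correct length (baselen, above). To that end we might try removing elements at the end and seeing when we get ffts closer to actually having runs of four 0s between nonzero values.

Something else that might work for estimating the number of repeats is finding the modal number of zeros in a run-length encoding of the thresholded ffts. While I have not actually tried that, it looks like it might be robust to bad choices in the details of how one does the thresholding (mine were just experiments that seem to work).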
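As a rough illustration of that run-length idea, here is a minimal sketch on a noiseless signal with no tail. Thresholding is done with Chop and Unitize, which is an assumption that only suits this clean case (a noisy signal would need neighbor-based thresholds like the ones above), and the names base, spectrum and zeroRuns are illustrative.

```mathematica
(* illustrative noiseless signal: 5 exact repeats, no trailing part *)
base = {3, 1, 4, 1, 5, 9, 2, 6};
signal = Flatten[ConstantArray[base, 5]];

(* 0-1 vector: 1 where the spectrum is numerically nonzero *)
spectrum = Unitize[Chop[Abs[Fourier[signal]], 10^-8]];

(* lengths of the runs of zeros between nonzero entries *)
zeroRuns = Length /@ Select[Split[spectrum], First[#] == 0 &];

(* modal run length plus one estimates the number of repeats *)
First[Commonest[zeroRuns]] + 1
(* 5 *)
```

With k exact repeats the nonzero coefficients sit k positions apart, so the runs of zeros have length k - 1; taking the mode rather than requiring every run to agree is what might give some tolerance to imperfect thresholding.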

Daniel Lichtblau


Ver*_*eia 5

The following assumes that the cycle starts at the first element, and gives the period length and the cycle.

findCyclingList[a_?VectorQ] :=
  Module[{repeats1, repeats2, cl, cLs, vec}, 
  repeats1 = Flatten@Differences[Position[a, First[a]]];
  repeats2 = Flatten[Position[repeats1, First[repeats1]]]; 
  If[Equal @@ Differences[repeats2] && Length[repeats2] > 2(* 
   is potentially cyclic - first element appears cyclically *),
   cl = Plus @@@ Partition[repeats1, First[Differences[repeats2]]];
   cLs = Partition[a, First[cl]];
   If[SameQ @@ cLs  (* candidate cycles all actually the same *), 
    vec = First[cLs];
    {Length[vec], vec}, $Failed], $Failed]  ]

Testing:

b = RandomInteger[50, {100}];
a = Join[b, b, b, b, Take[b, 27]];

findCyclingList[a]

{100, {47, 15, 42, 10, 14, 29, 12, 29, 11, 37, 6, 19, 14, 50, 4, 38, 
  23, 3, 41, 39, 41, 17, 32, 8, 18, 37, 5, 45, 38, 8, 39, 9, 26, 33, 
  40, 50, 0, 45, 1, 48, 32, 37, 15, 37, 49, 16, 27, 36, 11, 16, 4, 28,
   31, 46, 30, 24, 30, 3, 32, 31, 31, 0, 32, 35, 47, 44, 7, 21, 1, 22,
   43, 13, 44, 35, 29, 38, 31, 31, 17, 37, 49, 22, 15, 28, 21, 8, 31, 
  42, 26, 33, 1, 47, 26, 1, 37, 22, 40, 27, 27, 16}}

b1 = RandomInteger[10000, {100}]; 
a1 = Join[b1, b1, b1, b1, Take[b1, 23]];

findCyclingList[a1]

{100, {1281, 5325, 8435, 7505, 1355, 857, 2597, 8807, 1095, 4203, 
  3718, 3501, 7054, 4620, 6359, 1624, 6115, 8567, 4030, 5029, 6515, 
  5921, 4875, 2677, 6776, 2468, 7983, 4750, 7609, 9471, 1328, 7830, 
  2241, 4859, 9289, 6294, 7259, 4693, 7188, 2038, 3994, 1907, 2389, 
  6622, 4758, 3171, 1746, 2254, 556, 3010, 1814, 4782, 3849, 6695, 
  4316, 1548, 3824, 5094, 8161, 8423, 8765, 1134, 7442, 8218, 5429, 
  7255, 4131, 9474, 6016, 2438, 403, 6783, 4217, 7452, 2418, 9744, 
  6405, 8757, 9666, 4035, 7833, 2657, 7432, 3066, 9081, 9523, 3284, 
  3661, 1947, 3619, 2550, 4950, 1537, 2772, 5432, 6517, 6142, 9774, 
  1289, 6352}}

This case should fail because it isn't cyclical.

findCyclingList[Join[b, Take[b, 11], b]]

$Failed

I tried something with Repeated, e.g. a /. Repeated[t__, {2, 100}] -> {t}, but it just doesn't work for me.