什么是最快的算法来确定排序数组中的任何数字是否是`x`的倍数？

Question

什么是最快的算法来确定排序数组中的任何数字是否是`x`的倍数？

sho*_*ole 17 algorithm math

给定正整数x和排序正整数数组A

有没有比O(N)确定任何元素A是否是倍数更快的算法x？没有负面因素A.

A到目前为止,Naive循环是我唯一的想法,我不知道是否有任何方法可以利用A为加快速度而排序的事实.

Answer 1

tob*_*s_k 14

这似乎在很大程度上取决于其中x元素的大小和数量A,特别是x内部候选倍数的数量A.

二进制搜索内部的特定数字A需要O(log(n))时间(n是其中元素的数量A),因此如果在第一个元素和最后一个元素之间存在k可能的倍数,则需要对它们进行全部检查.如果该数字小于,则可以使用此算法,否则只需进行线性搜索.xAO(k * log(N))n

(此外,对于上述算法可能有一些小的优化.例如,一旦你检查x*i(并且没有找到它),你可以使用x*i在搜索时应该作为下限的位置x*(i+1)而不是第一个元素.数组.)

您也可以将其作为递归算法,一次搜索x的所有倍数,在每一轮中将它们分成上半部分和上半部分,并且每次再次检查k*logn <n是否仍然成立. (3认同)

Answer 2

Tat*_*ize 13

HT @ tobias_k的评论.

您可以在〜中解决它O(n/x)(实际上可能更新O(N*log(N)/X^2)).您同时对x的所有倍数进行二进制搜索.每次迭代细分每个搜索空间,当搜索空间不能包含x的倍数时,您中止该分支.因此,不是二元搜索每个值,而是二进制搜索所有值,但仅限于那些在其范围内仍包含有效倍数的分支.最好的是它完全阻止研究相同的空间两次,这使得最坏的情况x = 1或者O(n/1) O(n).在最好的情况下,它将知道范围不能包含O(1)中的倍数和中止.

因为你可以保证更糟糕的是O(n),你基本上会错过每个该死的缓存查找(请记住,在现实世界中,这可能最终比时间复杂度更重要,因此测试这些事情).你会得到理论上的时间复杂度,可能比O(n)更好,但绝不会比这更糟糕(保存数组的跳跃会遗漏缓存,因为这就是计算机最终实际在现实世界中工作的方式).

正如预测的那样,速度增加很大程度上取决于k(x)的值.

这开始比k = ~128的原始循环更快.(除法因子)

截断分支管理到现实世界超越原始循环.我假设n计数并不重要,因为它看起来大致相同,但也许直接检查更好.

注意:根据此代码的性质,它将跳过双精度数,这是计数的差异.

public class MultipleSearch {

    public static void main(String[] args) {
        Random random = new Random();
        int[] array = new int[500000000];
        for (int i = 0, m = array.length; i < m; i++) {
            array[i] = Math.abs(random.nextInt());
        }
        Arrays.sort(array);
        for (int k = 1; k < 16777216; k *= 2) {
            long time;
            time = System.currentTimeMillis();
            binaryFactorLocator(array, k);
            System.out.println("Factors found multi: " + (System.currentTimeMillis() - time) + " @" + k);
            time = System.currentTimeMillis();
            loopFactorLocator(array, k);
            System.out.println("Factors found loop: " + (System.currentTimeMillis() - time) + " @" + k);
        }
    }

    public static void loopFactorLocator(int[] array, int k) {
        int count = 0;
        for (int i = 0, m = array.length; i < m; i++) {
            if (array[i] % k == 0) {
                count++;
                //System.out.println("loop: " + array[i] + " value at index " + i + " is a proven multiple of " + k);
            }
        }
        System.out.println(count + " items found.");
    }

    public static void binaryFactorLocator(int[] array, int k) {
        int count = binaryFactorLocator(0, array, k, 0, array.length);
        System.out.println(count + " items found.");
    }

    public static int binaryFactorLocator(int count, int[] array, int k, int start, int end) {
        if (start >= end) { //contains zero elements. (End is exclusive)
            return count;
        }
        int startValue = array[start]; //first value
        int endValue = array[end - 1]; //last value;
        if (startValue / k == endValue / k) { //if both values are contained within the same factor loop.
            if (startValue % k == 0) { //check lower value for being perfect factor.
                //System.out.println("multi-binary: " + startValue + " value at index " + start + " is a proven multiple of " + k);
                return count + 1;
            }
            return count; //There can be no other factors within this branch.
        }
        int midpoint = (start + end) / 2; //subdivide
        count = binaryFactorLocator(count, array, k, start, midpoint); //recurse.
        count = binaryFactorLocator(count, array, k, midpoint, end); //recurse.
        return count;
    }
}

Run Code Online (Sandbox Code Playgroud)

这个实现应该非常可靠,因为它会在start/k == end/k元素中截断循环,它应该跳过double(有时,它可能会在两个doubled值之间切换).显然这样的递归可能不会是最优的,应该用较少的callstack堆栈重写.

474682772 items found.
Factors found multi: 21368 @1
500000000 items found.
Factors found loop: 5653 @1
236879556 items found.
Factors found multi: 21573 @2
250000111 items found.
Factors found loop: 7782 @2
118113043 items found.
Factors found multi: 19785 @4
125000120 items found.
Factors found loop: 5445 @4
58890737 items found.
Factors found multi: 16539 @8
62500081 items found.
Factors found loop: 5277 @8
29399912 items found.
Factors found multi: 12812 @16
31250060 items found.
Factors found loop: 5117 @16
14695209 items found.
Factors found multi: 8799 @32
15625029 items found.
Factors found loop: 4935 @32
7347206 items found.
Factors found multi: 5886 @64
7812362 items found.
Factors found loop: 4815 @64
3673884 items found.
Factors found multi: 3441 @128
3906093 items found.
Factors found loop: 4479 @128
1836857 items found.
Factors found multi: 2100 @256
1953038 items found.
Factors found loop: 4592 @256
918444 items found.
Factors found multi: 1335 @512
976522 items found.
Factors found loop: 4361 @512
459141 items found.
Factors found multi: 959 @1024
488190 items found.
Factors found loop: 4447 @1024
229495 items found.
Factors found multi: 531 @2048
243961 items found.
Factors found loop: 4114 @2048
114715 items found.
Factors found multi: 295 @4096
121964 items found.
Factors found loop: 3894 @4096
57341 items found.
Factors found multi: 195 @8192
61023 items found.
Factors found loop: 4061 @8192
28554 items found.
Factors found multi: 106 @16384
30380 items found.
Factors found loop: 3757 @16384
14282 items found.
Factors found multi: 65 @32768
15207 items found.
Factors found loop: 3597 @32768
7131 items found.
Factors found multi: 35 @65536
7575 items found.
Factors found loop: 3288 @65536
3678 items found.
Factors found multi: 17 @131072
3883 items found.
Factors found loop: 3281 @131072
1796 items found.
Factors found multi: 13 @262144
1900 items found.
Factors found loop: 3243 @262144
873 items found.
Factors found multi: 6 @524288
921 items found.
Factors found loop: 2970 @524288
430 items found.
Factors found multi: 3 @1048576
456 items found.
Factors found loop: 2871 @1048576
227 items found.
Factors found multi: 2 @2097152
238 items found.
Factors found loop: 2748 @2097152
114 items found.
Factors found multi: 1 @4194304
120 items found.
Factors found loop: 2598 @4194304
48 items found.
Factors found multi: 0 @8388608
51 items found.
Factors found loop: 2368 @8388608

Run Code Online (Sandbox Code Playgroud)

这种算法的复杂性不是"O(n/x)".它一般不依赖于`x`的值.在最坏的情况下,它是"O(n)".但这并不重要,因为它在实际数据上比在循环上要快得多. (4认同)

Answer 3

Zen*_*Zen 6

在之前的尝试中,我尝试了一个简单的二分法搜索,但正如所指出的那样,实际上并没有任何结果.

这是我最好的尝试.我怀疑这是值得的麻烦,对于现实世界的案例它甚至可能会更慢,但是这里也是如此.

如果你有一个排序正整数的数组A [0..n],你想检查A [i..j]中是否有正整数X的倍数,其中0≤i

If i>j then A[i..j] ist empty and thus contains no multiple of X
Else if A[i] % X = 0 or A[j] % X = 0 then A[i..j] contains a multiple of X
Else if A[i] / X = A[j] / X then A[i..j] contains no multiple of X
Else A[i..j] contains a multiple of X iff A[i+1..(i+j)/2] or A[(i+j)/2+1..j-1] contains a multiple of X

Run Code Online (Sandbox Code Playgroud)

我的猜测是,复杂性将是O(n/X)ish,因此,并不是对宏观方案的改进.

编辑添加: 如果您的数据非常特殊,这实际上可能会有所帮助.在许多情况下,它可能实际上有害:

有管理返回堆栈的开销(第一个递归调用是非终端)
因为我们在数据中来回跳过而不是遍历数据,所以我们破坏了缓存局部性
我们使处理器的分支预测更加困难

归档时间：	8 年，6 月前
查看次数：	2372 次
最近记录：	8 年，6 月前