Java数组,查找重复项

Sno*_*man 59 java arrays

我有一个数组,我正在寻找重复.

duplicates = false;
for(j = 0; j < zipcodeList.length; j++){
    for(k = 0; k < zipcodeList.length; k++){
        if (zipcodeList[k] == zipcodeList[j]){
            duplicates = true;
        }
    }
}
Run Code Online (Sandbox Code Playgroud)

但是,当没有重复时,此代码不起作用.为什么?

and*_*soj 139

在鼻子上回答..

duplicates=false;
for (j=0;j<zipcodeList.length;j++)
  for (k=j+1;k<zipcodeList.length;k++)
    if (k!=j && zipcodeList[k] == zipcodeList[j])
      duplicates=true;
Run Code Online (Sandbox Code Playgroud)

编辑后切换.equals()==自从我读到你正在使用的某个地方int,这在最初的问题中并不清楚.同样要设置k=j+1,将执行时间减半,但它仍然是O(n 2).

更快(极限)的方式

这是一种基于哈希的方法.你必须为自动装箱付费,但它是O(n)而不是O(n 2).一个有进取心的灵魂会找到一个原始的基于int的哈希集(Apache或Google Collections有这样的东西,可以解决这个问题.)

boolean duplicates(final int[] zipcodelist)
{
  Set<Integer> lump = new HashSet<Integer>();
  for (int i : zipcodelist)
  {
    if (lump.contains(i)) return true;
    lump.add(i);
  }
  return false;
}
Run Code Online (Sandbox Code Playgroud)

向HuyLe低头

请参阅HuyLe对或多或少O(n)解决方案的回答,我认为这需要几个额外的步骤:

static boolean duplicates(final int[] zipcodelist)
{
   final int MAXZIP = 99999;
   boolean[] bitmap = new boolean[MAXZIP+1];
   java.util.Arrays.fill(bitmap, false);
   for (int item : zipcodeList)
     if (!bitmap[item]) bitmap[item] = true;
     else return true;
   }
   return false;
}
Run Code Online (Sandbox Code Playgroud)

或者只是紧凑

static boolean duplicates(final int[] zipcodelist)
{
   final int MAXZIP = 99999;
   boolean[] bitmap = new boolean[MAXZIP+1];  // Java guarantees init to false
   for (int item : zipcodeList)
     if (!(bitmap[item] ^= true)) return true;
   return false;
}
Run Code Online (Sandbox Code Playgroud)

有关系吗?

好吧,所以我运行了一个小基准测试,这个地方都是iffy,但这里是代码:

import java.util.BitSet;

class Yuk
{
  static boolean duplicatesZero(final int[] zipcodelist)
  {
    boolean duplicates=false;
    for (int j=0;j<zipcodelist.length;j++)
      for (int k=j+1;k<zipcodelist.length;k++)
        if (k!=j && zipcodelist[k] == zipcodelist[j])
          duplicates=true;

    return duplicates;
  }


  static boolean duplicatesOne(final int[] zipcodelist)
  {
    final int MAXZIP = 99999;
    boolean[] bitmap = new boolean[MAXZIP + 1];
    java.util.Arrays.fill(bitmap, false);
    for (int item : zipcodelist) {
      if (!(bitmap[item] ^= true))
        return true;
    }
    return false;
  }

  static boolean duplicatesTwo(final int[] zipcodelist)
  {
    final int MAXZIP = 99999;

    BitSet b = new BitSet(MAXZIP + 1);
    b.set(0, MAXZIP, false);
    for (int item : zipcodelist) {
      if (!b.get(item)) {
        b.set(item, true);
      } else
        return true;
    }
    return false;
  }

  enum ApproachT { NSQUARED, HASHSET, BITSET};

  /**
   * @param args
   */
  public static void main(String[] args)
  {
    ApproachT approach = ApproachT.BITSET;

    final int REPS = 100;
    final int MAXZIP = 99999;

    int[] sizes = new int[] { 10, 1000, 10000, 100000, 1000000 };
    long[][] times = new long[sizes.length][REPS];

    boolean tossme = false;

    for (int sizei = 0; sizei < sizes.length; sizei++) {
      System.err.println("Trial for zipcodelist size= "+sizes[sizei]);
      for (int rep = 0; rep < REPS; rep++) {
        int[] zipcodelist = new int[sizes[sizei]];
        for (int i = 0; i < zipcodelist.length; i++) {
          zipcodelist[i] = (int) (Math.random() * (MAXZIP + 1));
        }
        long begin = System.currentTimeMillis();
        switch (approach) {
        case NSQUARED :
          tossme ^= (duplicatesZero(zipcodelist));
          break;
        case HASHSET :
          tossme ^= (duplicatesOne(zipcodelist));
          break;
        case BITSET :
          tossme ^= (duplicatesTwo(zipcodelist));
          break;

        }
        long end = System.currentTimeMillis();
        times[sizei][rep] = end - begin;


      }
      long avg = 0;
      for (int rep = 0; rep < REPS; rep++) {
        avg += times[sizei][rep];
      }
      System.err.println("Size=" + sizes[sizei] + ", avg time = "
            + avg / (double)REPS + "ms");
    }
  }

}
Run Code Online (Sandbox Code Playgroud)

使用NSQUARED:

Trial for size= 10
Size=10, avg time = 0.0ms
Trial for size= 1000
Size=1000, avg time = 0.0ms
Trial for size= 10000
Size=10000, avg time = 100.0ms
Trial for size= 100000
Size=100000, avg time = 9923.3ms
Run Code Online (Sandbox Code Playgroud)

使用HashSet

Trial for zipcodelist size= 10
Size=10, avg time = 0.16ms
Trial for zipcodelist size= 1000
Size=1000, avg time = 0.15ms
Trial for zipcodelist size= 10000
Size=10000, avg time = 0.0ms
Trial for zipcodelist size= 100000
Size=100000, avg time = 0.16ms
Trial for zipcodelist size= 1000000
Size=1000000, avg time = 0.0ms
Run Code Online (Sandbox Code Playgroud)

使用BitSet

Trial for zipcodelist size= 10
Size=10, avg time = 0.0ms
Trial for zipcodelist size= 1000
Size=1000, avg time = 0.0ms
Trial for zipcodelist size= 10000
Size=10000, avg time = 0.0ms
Trial for zipcodelist size= 100000
Size=100000, avg time = 0.0ms
Trial for zipcodelist size= 1000000
Size=1000000, avg time = 0.0ms
Run Code Online (Sandbox Code Playgroud)

BITSET赢了!

但只有一个头发...... 15毫秒是错误的currentTimeMillis(),我的基准测试中有一些漏洞.请注意,对于任何长于100000的列表,您只需返回,true因为会有重复.事实上,如果列表是随机的,你可以为更短的列表返回真正的WHP.什么是道德?在极限中,最有效的实现是:

 return true;
Run Code Online (Sandbox Code Playgroud)

而且你不会经常犯错.

  • 另一种选择:如果由于某种原因您在空间上紧张,您可以对原始列表进行排序,然后搜索重复的邻居,O(nlogn)时间和O(1)空间. (3认同)
  • 感觉......不优雅.可能使用基于散列的方法更好,特别是如果zipcodelist变大.这是一个O(n ^ 2),这是一个很好的定义"不能很好地扩展".通过阵列嵌套的死亡数据库通常像一个难以触及的书架上长期丢失的BigMac.如果设置`.contains()`下一个`ZipCode`,最好保持一个`Set <ZipCode>`并逐个测试. (2认同)
  • 使用BitSet可能稍好一点:http://download.oracle.com/javase/1.4.2/docs/api/java/util/BitSet.html (2认同)
  • 我认为对于"更快(极限)方式"部分和"Bow to HuyLe"部分,时间复杂度是相同的,即O(n).请看这个.. >> http://stackoverflow.com/questions/11166247/why-is-the-following-two-duplicate-finder-algorithms-have-different-time-complex (2认同)
  • 鉴于 k 的起始值为 `j+1`,没有必要检查 `k!=j`,这些值永远不会相等。 (2认同)

Lie*_*yan 13

让我们看看你的算法是如何工作的:

an array of unique values:

[1, 2, 3]

check 1 == 1. yes, there is duplicate, assigning duplicate to true.
check 1 == 2. no, doing nothing.
check 1 == 3. no, doing nothing.
check 2 == 1. no, doing nothing.
check 2 == 2. yes, there is duplicate, assigning duplicate to true.
check 2 == 3. no, doing nothing.
check 3 == 1. no, doing nothing.
check 3 == 2. no, doing nothing.
check 3 == 3. yes, there is duplicate, assigning duplicate to true.
Run Code Online (Sandbox Code Playgroud)

一个更好的算法:

for (j=0;j<zipcodeList.length;j++) {
    for (k=j+1;k<zipcodeList.length;k++) {
        if (zipcodeList[k]==zipcodeList[j]){ // or use .equals()
            return true;
        }
    }
}
return false;
Run Code Online (Sandbox Code Playgroud)


Huy*_* Le 13

您可以使用位图来获得更大的阵列性能.

    java.util.Arrays.fill(bitmap, false);

    for (int item : zipcodeList)
        if (!bitmap[item]) bitmap[item] = true;
        else break;
Run Code Online (Sandbox Code Playgroud)

更新:这是我当天的一个非常疏忽的回答,保留在这里仅供参考.你应该参考andersoj的优秀答案.


cod*_*ict 5

要检查重复项,您需要比较不同的对。

  • 我想这应该是一条评论..但它已经 6 岁了所以 w/e (3认同)