How do I find duplicate elements in an array using a for loop in Python?

Beg*_*ner 22 python duplicates

I have a list that contains duplicate elements:

list_a = [1, 2, 3, 5, 6, 7, 5, 2]
tmp = []

for i in list_a:
    if tmp.__contains__(i):
        print(i)
    else:
        tmp.append(i)

I have used the code above to find the duplicate elements in list_a. I don't want to remove any elements from the list.

But I want to use nested for loops here. In C/C++ I guess we would usually write something like this:

for (int i = 0; i < n; i++)            // n = number of elements in list_a
    for (int j = i + 1; j < n; j++)
        if (list_a[i] == list_a[j])
            printf("%d\n", list_a[i]);

How can we do the same thing in Python?

for i in list_a:
    for j in list_a[1:]:
    ....

I tried the code above, but it gives the wrong result. I don't know how to make j start after the current i.

YOU*_*YOU 56

FYI, in Python 2.7+ we can use Counter:

import collections

x=[1, 2, 3, 5, 6, 7, 5, 2]

>>> x
[1, 2, 3, 5, 6, 7, 5, 2]

>>> y=collections.Counter(x)
>>> y
Counter({2: 2, 5: 2, 1: 1, 3: 1, 6: 1, 7: 1})

Unique items:

>>> list(y)
[1, 2, 3, 5, 6, 7]

Items found more than once:

>>> [i for i in y if y[i]>1]
[2, 5]

Items found only once:

>>> [i for i in y if y[i]==1]
[1, 3, 6, 7]
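To get the duplicated items together with how often each one occurs, the same Counter can be filtered into a dict. A minimal sketch; `duplicate_counts` is a hypothetical helper name, and the dict follows first-occurrence order only on Python 3.7+.

```python
from collections import Counter

def duplicate_counts(items):
    """Return {item: count} for every hashable item that occurs more than once."""
    counts = Counter(items)
    # Keep only the entries whose count exceeds 1
    return {item: n for item, n in counts.items() if n > 1}

print(duplicate_counts([1, 2, 3, 5, 6, 7, 5, 2]))  # {2: 2, 5: 2}
```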

  • `[n for n, i in y.items() if i > 1]`, and `i == 1` instead for the items found only once. (2 upvotes)

小智 25

Use the `in` operator instead of calling `__contains__` directly.

You almost had it (but it is O(n**2)):

for i in range(len(list_a)):
  for j in range(i + 1, len(list_a)):
    if list_a[i] == list_a[j]:
      print("duplicate:", list_a[i])
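The attempt in the question, `for j in list_a[1:]`, always restarts at index 1. Slicing from the current position instead gives the same pairwise comparison without manual index arithmetic. A sketch that is still O(n**2); `find_duplicates_pairwise` is a hypothetical name, and the `break` stops after the first later match so each position reports at most once.

```python
def find_duplicates_pairwise(list_a):
    """Pairwise O(n**2) scan mirroring the C-style i/j loops."""
    dups = []
    for i, a in enumerate(list_a):
        # Compare a only with the elements after position i
        for b in list_a[i + 1:]:
            if a == b:
                if a not in dups:  # report each value only once
                    dups.append(a)
                break  # no need to look further for this position
    return dups

print(find_duplicates_pairwise([1, 2, 3, 5, 6, 7, 5, 2]))  # [2, 5]
```

Because it only relies on `==`, this version also works for unhashable items such as nested lists, which the set- and dict-based answers cannot handle.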

But using a set is much simpler (and roughly O(n) thanks to the hash table):

seen = set()
for n in list_a:
  if n in seen:
    print("duplicate:", n)
  else:
    seen.add(n)
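The loop above prints a value once for every extra occurrence, so a value that appears three times is printed twice. If each duplicate should be reported only once, a second set can track what has already been reported. A sketch assuming hashable items; `unique_duplicates` is a hypothetical name.

```python
def unique_duplicates(items):
    """Return each duplicated item once, in the order its second occurrence is seen."""
    seen = set()      # items encountered at least once
    reported = set()  # duplicates already added to the result
    result = []
    for n in items:
        if n in seen:
            if n not in reported:
                reported.add(n)
                result.append(n)
        else:
            seen.add(n)
    return result

print(unique_duplicates([1, 2, 3, 5, 6, 7, 5, 2, 5]))  # [5, 2]
```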

Or a dict, if you want to track where the duplicates are (also O(n)):

import collections
items = collections.defaultdict(list)
for i, item in enumerate(list_a):
  items[item].append(i)
for item, locs in items.items():
  if len(locs) > 1:
    print("duplicates of", item, "at", locs)
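The same position tracking also works with a plain dict via `dict.setdefault`, and returning the result instead of printing it makes it reusable. A sketch; `duplicate_positions` is a hypothetical name.

```python
def duplicate_positions(list_a):
    """Map each duplicated item to the list of indices where it occurs."""
    positions = {}
    for i, item in enumerate(list_a):
        # setdefault inserts an empty list the first time an item is seen
        positions.setdefault(item, []).append(i)
    # Keep only items that occur at more than one index
    return {item: locs for item, locs in positions.items() if len(locs) > 1}

print(duplicate_positions([1, 2, 3, 5, 6, 7, 5, 2]))  # {2: [1, 7], 5: [3, 6]}
```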

Or, if you only need to know whether there is a duplicate somewhere (also O(n)):

if len(set(list_a)) != len(list_a):
  print("duplicate")
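`len(set(list_a))` always builds the full set. A loop can stop at the first repeat, and it also works on one-shot iterators such as generators, where `len()` is not available. A sketch assuming hashable items; `has_duplicates` is a hypothetical name.

```python
def has_duplicates(iterable):
    """Return True as soon as any item is seen a second time."""
    seen = set()
    for item in iterable:
        if item in seen:
            return True  # stop early; the remaining items are never examined
        seen.add(item)
    return False

print(has_duplicates([1, 2, 3, 5, 6, 7, 5, 2]))  # True
print(has_duplicates(x * x for x in range(5)))   # False
```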


Eva*_*ark 17

You can always use a list comprehension:

dups = [x for x in list_a if list_a.count(x) > 1]

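  • This traverses the whole list once for every element (though the OP's code is also O(N**2)). (3 upvotes)
  • I think this is slightly more efficient: `[x for i, x in enumerate(list_a) if list_a[i:].count(x) > 1]` (2 upvotes)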
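The comprehension keeps every occurrence, so `dups` is `[2, 5, 5, 2]` for the example list. Passing it through `dict.fromkeys` removes the repeats while keeping first-seen order (dicts preserve insertion order from Python 3.7). It is still O(n**2) because of `count()`.

```python
list_a = [1, 2, 3, 5, 6, 7, 5, 2]

# Every occurrence of a repeated value, in list order
dups = [x for x in list_a if list_a.count(x) > 1]
print(dups)  # [2, 5, 5, 2]

# dict.fromkeys keeps the first occurrence of each key, in order
unique_dups = list(dict.fromkeys(dups))
print(unique_dups)  # [2, 5]
```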

e-s*_*tis 8

Before Python 2.3, use a dict:

>>> lst = [1, 2, 3, 5, 6, 7, 5, 2]
>>> stats = {}
>>> for x in lst:  # count occurrences of each item
...     stats[x] = stats.get(x, 0) + 1
...
>>> print(stats)
{1: 1, 2: 2, 3: 1, 5: 2, 6: 1, 7: 1}
>>> # keep the items appearing more than once
>>> duplicates = [dup for (dup, i) in stats.items() if i > 1]
>>> print(duplicates)
[2, 5]

So, as a function:

def getDuplicates(iterable):
    """
       Take an iterable and return a generator yielding its duplicate items.
       Items must be hashable.

       e.g :

       >>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
       [2, 5]
    """
    stats = {}
    for x in iterable : 
        stats[x] = stats.get(x, 0) + 1
    return (dup for (dup, i) in stats.items() if i > 1)

From Python 2.3 on there is a set type (the `sets` module in 2.3, and a built-in `set` from 2.4):

def getDuplicates(iterable):
    """
       Take an iterable and return a generator yielding its duplicate items.
       Items must be hashable.

       e.g :

       >>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
       [2, 5]
    """
    try: # try using built-in set
        found = set() 
    except NameError: # fallback on the sets module
        from sets import Set
        found = Set()

    for x in iterable:
        if x in found:  # a set cannot contain duplicates, so x was seen before
            yield x
        found.add(x)  # adding an item that is already present has no effect

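With Python 2.7 and later, the collections module (`Counter`) provides the same counting as the dict approach, so we can make the function shorter than solution 1 (and possibly faster, since parts of it may be implemented in C under the hood):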

import collections

def getDuplicates(iterable):
    """
       Take an iterable and return a generator yielding its duplicate items.
       Items must be hashable.

       e.g :

       >>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
       [2, 5]
    """
    return (dup for (dup, i) in collections.Counter(iterable).items() if i > 1)

I'd stick with solution 2.


HOT*_*HOT 6

def get_duplicates(arr):
    dup_arr = arr[:]
    for i in set(arr):
        dup_arr.remove(i)       
    return list(set(dup_arr))
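The function above copies the list and removes one occurrence of each distinct value; whatever is left occurs more than once. The same idea can be expressed with `collections.Counter`, whose `-` operator subtracts counts and drops results that are zero or negative. A sketch assuming hashable items; `get_duplicates_counter` is a hypothetical name.

```python
from collections import Counter

def get_duplicates_counter(arr):
    """Subtract one from each item's count; items with a positive remainder are duplicates."""
    # Counter(set(arr)) holds a count of 1 for every distinct item
    remaining = Counter(arr) - Counter(set(arr))
    return list(remaining)

print(get_duplicates_counter([1, 2, 3, 5, 6, 7, 5, 2]))  # [2, 5]
```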