HashSet与ArrayList的CPU使用率很高

Question

HashSet与ArrayList的CPU使用率很高

Sth*_*ita 8 java performance arraylist hashset

我有104k字符串值,其中89k是唯一的.我想检查一下该列表中是否存在字符串.

这是我的类及其保存所有这些记录的方法.

public class TestClass {
    private static TestClass singletonObj = null;
    private List<String> stringList= null;

    public static synchronized TestClass getInstance() {
        if(singletonObj == null) {
            singletonObj = new TestClass();
        }
        return singletonObj;
    }


    public boolean isValidString(String token) {
        if(stringList == null) {
            init();
        }
        if(stringList != null && token != null && !token.isEmpty())
            return stringList.contains(token.toLowerCase());
        return false;
    }

    private init() {
     stringList = new ArrayList<String>();
     // put all 104k values in this data structure.
    }
}

Run Code Online (Sandbox Code Playgroud)

我的应用程序尝试同时使用此isValidString()方法,每秒约20个请求.这工作正常但是当我尝试将数据结构更改为时HashSet,CPU使用率非常高.根据我的理解,Hashset应该比ArrayList [o(n)]执行更好的[o(1)].谁能解释我为什么会这样？

Answer 1

Lit*_*nti 1

我的猜测是，作为一个基于哈希的结构，从将每个字符串插入到 HashSet的那一刻起HashSet，即在方法中，就计算每个字符串的 hashCode 。这可能是 CPU 变高的时期，也是我们在迭代结构体值时获得更好吞吐量所付出的代价的一部分。init

如果我是对的，方法init结束后，CPU 应该会下降，程序的速度应该会大大提高，这就是使用 HashSet 的好处。

顺便说一句：一种可靠的优化方法是预先确定结构的大小：

ArrayList 的初始大小应等于将包含的最大元素数。

并且 HashSet 的初始大小比最大值大 1.7。

顺便说一句：标准哈希算法String.hash计算字符串的所有字符。例如，也许您可能满足于仅计算前 100 个字符（当然，具体取决于您正在处理的数据的性质）。然后，您可以将字符串封装到您自己的类中，使用hashCode您自己的哈希算法覆盖该方法，并覆盖该equals方法以执行严格的比较。

归档时间：	10 年，7 月前
查看次数：	1033 次
最近记录：	10 年，7 月前