如何加快数组搜索功能？

Question

如何加快数组搜索功能？

use*_*876 3 javascript arrays search react-native expo

我正在开发用react-native编写的字典应用程序。

当我想从搜索框中过滤数组时，我编写了以下函数。当我用 2000 个单词列表进行测试时，效果非常好。但是当单词列表达到数千个时，搜索速度就非常慢了。

那么，如何改进这个搜索功能呢？

//Filter array when input text (Search)

let filteredWords = []
if(this.state.searchField != null)
{
  filteredWords = this.state.glossaries.filter(glossary => {
    return glossary.word.toLowerCase().includes(this.state.searchField.toLowerCase());
  })
}

Run Code Online (Sandbox Code Playgroud)

Answer 1

Waz*_*ner 9

有多种因素导致此代码变慢：

您正在使用filter()lambda。这会增加每个正在搜索的项目的函数调用开销。
您toLowercase()在调用之前调用了两个字符串includes()。这将为每次比较分配两个新的字符串对象。
你在打电话includes。由于某种原因，该includes()方法在某些浏览器中的优化不如indexOf().

`for`循环 (-11%)

filter()我建议创建一个新方法Array并使用循环来填充它，而不是使用该方法for。

const glossaries = this.state.glossaries;
const searchField = this.state.searchField;
const filteredWords = [];   

for (let i = 0; i < glossaries.length; i++) {
  if (glossaries[i].toLowerCase().includes(searchField.toLowerCase())) {
    filteredWords.push(glossaries[i]);
  }
}

Run Code Online (Sandbox Code Playgroud)

toLowerCase 分配 (-45%)

由于 JavaScript 使用垃圾收集机制来释放已用内存，因此内存分配非常昂贵。当执行垃圾收集时，整个程序将暂停，同时尝试查找不再使用的内存。

toLowerCase()您可以通过在每次更新术语表时复制术语表来完全摆脱（在搜索循环内），我认为这种情况并不常见。

// When you build the glossary
this.state.glossaries = ...;
this.state.searchGlossaries = this.state.glossaries.map(g => g.toLowerCase());

Run Code Online (Sandbox Code Playgroud)

您还可以toLowerCase()通过在循环之前调用一次来删除 searchText 上的。经过这些更改后，代码将如下所示：

const glossaries = this.state.glossaries;
const searchGlassaries = this.state.searchGlossaries;
const searchField = this.state.searchField.toLowerCase();
const filteredWords = []; 

for (let i = 0; i < glossaries.length; i++) {
  if (searchGlassaries[i].includes(searchField)) {
    filteredWords.push(glossaries[i]);
  }
}

Run Code Online (Sandbox Code Playgroud)

`indexOf()`而不是`includes()`(-13%)

我不太确定为什么会出现这种情况，但测试表明这indexOf比includes.

const glossaries = this.state.glossaries;
const searchGlassaries = this.state.searchGlossaries;
const searchField = this.state.searchField.toLowerCase();
const filteredWords = []; 

for (let i = 0; i < glossaries.length; i++) {
  if (searchGlassaries[i].indexOf(searchField) !== -1) {
    filteredWords.push(glossaries[i]);
  }
}

Run Code Online (Sandbox Code Playgroud)

总体性能提高了 70%。我从https://jsperf.com/so-question-perf获得了性能百分比

优化算法

在评论中，您说您想要一个优化示例，当要求放宽到仅匹配以搜索文本开头的单词时，可以完成该优化。一种方法是二分查找。

让我们以上面的代码为起点。在将术语表存储到状态之前，我们会对术语表进行排序。为了不区分大小写排序，JavaScript 公开了Intl.Collator构造函数。它提供了compare(x, y)返回的方法：

negative value  | X is less than Y
zero            | X is equal to Y
positive value  | X is greater than Y

Run Code Online (Sandbox Code Playgroud)

以及生成的代码：

// Static in the file
const collator = new Intl.Collator(undefined, {
  sensitivity: 'base'
});

function binarySearch(glossaries, searchText) {
  let lo = 0;
  let hi = glossaries.length - 1;

  while (lo <= hi) {
    let mid = (lo + hi) / 2 | 0;
    let comparison = collator.compare(glossaries[mid].word, searchText);

    if (comparison < 0) {
      lo = mid + 1;
    }
    else if (comparison > 0) {
      hi = mid - 1;
    }
    else {
      return mid;
    }
  }

  return -1;
}

// When you build the glossary
this.state.glossaries = ...;
this.state.glossaries.sort(function(x, y) {
  return collator.compare(x.word, y.word);
});

// When you search
const glossaries = this.state.glossaries;
const searchField = this.state.searchField.toLowerCase();
const filteredWords = [];

const idx = binarySearch(glossaries, searchField);

if (idx != -1) {
  // Find the index of the first matching word, seeing as the binary search
  // will end up somewhere in the middle
  while (idx >= 0 && collator.compare(glossaries[idx].word, searchField) < 0) {
    idx--;
  }

  // Add each matching word to the filteredWords
  while (idx < glossaries.length && collator.compare(glossaries[idx].word, searchField) == 0) {
    filteredWords.push(glossaries[idx]);
  }
}

Run Code Online (Sandbox Code Playgroud)

归档时间：	7 年，6 月前
查看次数：	5793 次
最近记录：	7 年，6 月前

如何加快数组搜索功能？

for循环 (-11%)

toLowerCase 分配 (-45%)

indexOf()而不是includes()(-13%)

优化算法

`for`循环 (-11%)

`indexOf()`而不是`includes()`(-13%)