Fast loading of elements into an array/list with fixed indices and no duplicates

Question

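My requirement is to insert strings into an array only if they are not already in it. I also need to maintain fixed indices, because this array will be used alongside other data structures, with a one-to-one relationship at each index. At the moment I am using the ArrayList class: I check whether the string exists with the indexOf() method and, if it does not, add it to the list with the one-argument add() method. I am not familiar with the Java classes, so I could not work out how to implement this with a HashMap or something else (a trie or otherwise) that would make the loading process fast.

Does indexOf() in ArrayList perform a sequential search? My aim is to reduce the processing time when loading words into the array, without inserting duplicates and while keeping each element's index fixed. If the word being tested is already in the array, I need the index at which it was inserted, because that index is used to index into other structures and do some processing. Any suggestions for improving this process?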

UPDATE

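There is an array, and I have some documents in which I need to scan every word and find the unique words. I also need to count the duplicates. In other words, I need to compute the term frequency of each unique term that occurs in the documents. I keep the term frequencies in an ArrayList<Integer[]> (number of terms x number of documents). I take a word and use the indexOf() method to check whether it is in the word list. If it is not, I insert the word into the list, allocate a new row in the 2D array (the ArrayList<Integer[]>), and set the count of that term's element in the 2D array to 1. If the word is already in the word array, I use its index there to select the row of the ArrayList<Integer[]> matrix, use the number of the document currently being processed to select the cell, and increment the count.

My problem is to reduce the processing time of the indexOf() call I currently make for every word. If the word is already there, I need to get its index in the word array; if it is not, I need to insert it into the array dynamically.

Sample code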

import java.io.*;
import java.util.ArrayList;
import static java.lang.Math.log;


class DocumentRepresentation
{
  private String dirPath;
  private ArrayList<String> fileNameVector;
  private ArrayList<String> termVector;
  private ArrayList<Integer[]> tf; /* store it in natural 2d array */
  private Integer df[]; /* do normal 1d array */
  private Double idf[]; /* do normal 1d array */
  private Double tfIdf[][]; /* do normal 2d array */

  DocumentRepresentation (String dirPath)
  {
    this.dirPath = dirPath;
    fileNameVector = new ArrayList<String> ();
    termVector = new ArrayList<String> ();
    tf = new ArrayList<Integer[]> ();
  }

  /* Later separate the internal work */
  public int start ()
  {
    /* Load the files, and populate the fileNameVector string */
    File fileDir = new File (dirPath);
    int fileCount = 0;
    int index;

    if (fileDir.isDirectory () == false)
    {
      return -1;
    }

    File fileList[] = fileDir.listFiles ();

    for (int i=0; i<fileList.length; i++)
    {
      if (fileList[i].isFile () == true)
      {
        fileNameVector.add (fileList[i].getName ());
        //      System.out.print ("File Name " + (i + 1) + ": " + fileList[i].getName () + "\n");
      }
    }

    fileCount = fileNameVector.size ();
    for (int i=0;i<fileNameVector.size (); i++)
    {
      System.out.print ("Name " + (i+1) + ": " + fileNameVector.get (i) + "\n");
    }

    /* Bind the files with a buffered reader */
    BufferedReader fileReaderVector[] = new BufferedReader [fileCount];
    for (int i=0; i<fileCount; i++)
    {
      try
      {
        fileReaderVector[i] = new BufferedReader (new FileReader (fileList[i]));
      }
      /* Not handled */
      catch (FileNotFoundException e)
      {
        System.out.println (e);
      }
    }

    /* Scan the term frequencies in the tf 2d array */
    for (int i=0; i<fileCount; i++)
    {
      String line;

      try
      {
            /*** THIS IS THE PLACE OF MY QUESTION **/
        while ((line = fileReaderVector[i].readLine ()) != null)
        {
          String words[] = line.split ("[\\W]");

          for (int j=0;j<words.length;j++)
          { 
            if ((index = termVector.indexOf (words[j])) != -1)
            {
              tf.get (index)[i]++;
              /* increase the tf count */
            }
            else
            {
              termVector.add (words[j]);
              Integer temp[] = new Integer [fileCount];

              for (int k=0; k<fileCount; k++)
              {
                temp[k] = new Integer (0);
              }
              temp[i] = 1;
              tf.add (temp);
              index = termVector.indexOf (words[j]);
            }

            System.out.println (words[j]);
          }
        }
      }
      /* Not handled */
      catch (IOException e)
      {
        System.out.println (e);
      }
    }

    return 0;
  }
}

class DocumentRepresentationTest
{
  public static void main (String args[])
  {
    DocumentRepresentation docSet = new DocumentRepresentation (args[0]);
    docSet.start ();
    System.out.print ("\n");
  }
}

Note: the code has been trimmed to keep the focus on the question.

Answer 1

NPE*_*NPE 5

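A LinkedHashMap can satisfy all of your requirements at once, with good performance characteristics.

The keys would be your items and the values would be the indices. If you insert the elements in order of increasing index, then iterating over the map also returns the elements in order of increasing index.

Here is some sample code: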

LinkedHashMap<Item,Integer> map = new LinkedHashMap<Item,Integer>();

To get the index of an item:

Integer index = map.get(item);
if (index != null) {
  // already in the map; use `index'
} else {
  // not in the map
}

To add item with the next index:

if (!map.containsKey(item)) {
  map.put(item, map.size());
}

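
The lookup and the insert above can be folded into a single "get the index, or assign the next one" call, which is the operation the question performs for every word. The following is only a minimal sketch under that assumption; the class name TermIndex and the method name indexFor are made up for illustration and do not come from the question or from any library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: assigns each distinct term a fixed index on first sight.
class TermIndex {
    private final Map<String, Integer> indexByTerm = new LinkedHashMap<>();

    /** Returns the existing index of the term, or assigns it the next free index. */
    int indexFor(String term) {
        Integer index = indexByTerm.get(term);   // hash lookup instead of a linear scan
        if (index == null) {
            index = indexByTerm.size();          // next free index: 0, 1, 2, ...
            indexByTerm.put(term, index);
        }
        return index;
    }

    /** Number of distinct terms seen so far. */
    int size() {
        return indexByTerm.size();
    }
}
```

Because nothing is ever removed from the map, an index that has been handed out never changes, which keeps the one-to-one relationship with the other structures.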

To iterate over the elements in order of increasing index:

for (Entry<Item,Integer> e : map.entrySet()) {
  Item item = e.getKey();
  int index = e.getValue();
  ...
}

What this can't do efficiently is get the value at a specific index, but my reading of your question indicates that you don't actually need this.
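
If looking up a term by its index does turn out to be needed, one possible variation (a different technique from the LinkedHashMap approach above) is to keep a plain HashMap for term-to-index and an ArrayList for index-to-term side by side. The sketch below also applies the idea to the term-frequency counting described in the question. It is an illustration under assumptions: the class TermTable and its methods are hypothetical names, and the rows use int[] rather than the question's Integer[].

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class: term <-> index mapping plus a term-frequency row per term.
class TermTable {
    private final Map<String, Integer> indexByTerm = new HashMap<>();
    private final List<String> termByIndex = new ArrayList<>();
    private final List<int[]> tf = new ArrayList<>();   // one row per term, one column per document
    private final int docCount;

    TermTable(int docCount) {
        this.docCount = docCount;
    }

    /** Counts one occurrence of the term in document doc and returns the term's index. */
    int count(String term, int doc) {
        Integer index = indexByTerm.get(term);
        if (index == null) {
            index = termByIndex.size();       // new terms are appended, so old indices stay fixed
            indexByTerm.put(term, index);
            termByIndex.add(term);
            tf.add(new int[docCount]);        // a new int[] row starts out as all zeros
        }
        tf.get(index)[doc]++;
        return index;
    }

    String termAt(int index) {
        return termByIndex.get(index);
    }

    int frequency(int index, int doc) {
        return tf.get(index)[doc];
    }
}
```

In the question's inner loop this would replace the indexOf/add pair with a single call such as count(words[j], i), assuming a table created with the number of files.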
