Removing whitespace from a file in Java

Sou*_*uad 1 java whitespace file removing-whitespace

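I am trying to process text by first removing stop words and applying a stemming algorithm, then splitting the text into words and saving them to a file. All of that works; the problem is that the file contains blank entries among the words, like this: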

Hi
teacher

mother
sister
father .... and so on

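The problem is the blank entry between "teacher" and "mother". I want to remove it, but I cannot figure out what causes it.

Here is the relevant portion of the code.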

public void parseFiles(String filePath) throws FileNotFoundException, IOException {
    File[] allfiles = new File(filePath).listFiles();
    BufferedReader in = null;
    for (File f : allfiles) {
        if (f.getName().endsWith(".txt")) {
            fileNameList.add(f.getName());
            Reader fstream = new InputStreamReader(new FileInputStream(f),"UTF-8"); 
            in = new BufferedReader(fstream);
            StringBuilder sb = new StringBuilder();
            String s=null;
            String word = null;
            while ((s = in.readLine()) != null) {
                s = s.trim().replaceAll("[^A-Za-z0-9]", " ");  // remove all punctuation for English text
                Scanner input = new Scanner(s);
                while (input.hasNext()) {
                    word = input.next();
                    word = word.trim().toLowerCase();
                    if (stopword.isStopword(word) == true) {
                        word = word.replace(word, "");
                    }
                    String stemmed = stem.stem(word);
                    sb.append(stemmed + "\t");
                }
                //System.out.print(sb);
            }
            String[] tokenizedTerms = sb.toString().replaceAll("[\\W&&[^\\s]]", "").split("\\W+");  // to get individual terms (English)

            for (String term : tokenizedTerms) {
                if (!allTerms.contains(term)) {  // avoid duplicate entry
                    allTerms.add(term);
                    System.out.print(term + "\t");
                }
            }
            termsDocsArray.add(tokenizedTerms);
        }
    } 
    //System.out.print("file names="+fileNameList);
}

Please help. Thanks.

Chr*_*ian 5

Why not use an `if` to check whether the line is empty?

while ((s = in.readLine()) != null) {
  if (!s.trim().isEmpty()) {
  ...
  }
}
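
For illustration, here is a minimal, self-contained sketch of that check. The class name `BlankLineFilter` and the method `readWords` are made up for this example and are not part of the asker's code; stop-word removal and stemming are left out so that only the blank-line handling is shown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, used only to demonstrate the blank-line check.
public class BlankLineFilter {

    // Returns the lower-cased words found on the non-blank lines of the input.
    // Lines that are empty or contain only whitespace are skipped entirely.
    public static List<String> readWords(BufferedReader in) throws IOException {
        List<String> words = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // blank or whitespace-only line: nothing to tokenize
            }
            // trimmed has no leading whitespace, so split() yields no empty tokens
            for (String word : trimmed.split("\\s+")) {
                words.add(word.toLowerCase());
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Sample input with a whitespace-only line and an empty line in the middle
        String text = "Hi\nteacher\n   \n\nmother\nsister\n";
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            for (String word : readWords(in)) {
                System.out.println(word);
            }
        }
    }
}
```

In the asker's loop, the same `if` would wrap the body of the `while ((s = in.readLine()) != null)` block, so nothing is appended to the `StringBuilder` for a blank line.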

  • I would also add a `trim()`, because a string that consists only of whitespace can be treated as empty (2 upvotes)