Searching for a specific string in a large file

Question

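I'm writing a program in C that searches a large .txt file for a specific string, counts the occurrences, and prints the count. But something seems to be wrong, because my program's output differs from that of two text editors. I am searching the .txt file for the word "make": according to the text editors there are 3000 occurrences in total, but my program reports only 2970.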

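I can't find the problem in my program. So I'm curious how text editors search for a specific string so accurately. How do people implement this? Can anyone show me some code in C?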

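To be clear: this is a large .txt file, about 20 MB, containing a lot of characters, so I don't think reading it into memory all at once is a good idea. I implemented my program by splitting the file into pieces and then scanning each piece. However, it fails somehow.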

Maybe I should post the code here. Just a moment.

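The code is a bit long, about 70 lines. I've put it on my GitHub; if you're interested, please help: https://github.com/walkerlala/searchText Note that the only relevant files are wordCount.c and testfile.txt. wordCount.c is as follows: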

#include<stdio.h>
#include<stdlib.h>
#include<stdbool.h>
char arr[51];
int flag=0;
int flag2=0;
int flag3=0;
int flag4=0;
int pieceCount(FILE*);
int main()
{
    //the file in which I want to search for the word is testfile.txt
    //I have formatted the file so that it no longer contains any newlines
    FILE* fs=fopen("testfile.txt","r");
    int n=pieceCount(fs);
    printf("%d\n",n);           



    rewind(fs);         //refresh the file...

    static bool endOfPiece1=false,endOfPiece2=false,endOfPiece3=false;
    bool begOfPiece1,begOfPiece2,begOfPiece3;

    for(int start=0;start<n;++start){
            fgets(arr,sizeof(arr),fs);
            for(int i=0;i<=46;++i){
                if((arr[i]=='M'||arr[i]=='m')&&(arr[i+1]=='A'||arr[i+1]=='a')&&(arr[i+2]=='K'||arr[i+2]=='k')&&(arr[i+3]=='E'||arr[i+3]=='e')){
                    flag+=1;
                    //continue;
                }
        }


    //check the border
        begOfPiece1=((arr[1]=='e'||arr[1]=='E'));
        if(begOfPiece1==true&&endOfPiece1==true)
            flag2+=1;
        endOfPiece1=((arr[47]=='m'||arr[47]=='M')&&(arr[48]=='a'||arr[48]=='A')&&(arr[49]=='k'||arr[49]=='K'));

        begOfPiece2=((arr[1]=='k'||arr[1]=='K')&&(arr[2]=='e'||arr[2]=='E'));
        if(begOfPiece2==true&&endOfPiece2==true)
            flag3+=1;
        endOfPiece2=((arr[48]=='m'||arr[48]=='M')&&(arr[49]=='a'||arr[49]=='A'));

        begOfPiece3=((arr[1]=='a'||arr[1]=='A')&&(arr[2]=='k'||arr[2]=='K')&&(arr[3]=='e'||arr[3]=='E'));
        if(begOfPiece3==true&&endOfPiece3==true)
            flag4+=1;
        endOfPiece3=(arr[49]=='m'||arr[49]=='M');

} 
  printf("%d\n%d\n%d\n%d\n",flag,flag2,flag3,flag4);
    getchar();
    return 0;
}

//this function counts how many pieces the file has been split into
int pieceCount(FILE* file){
    static int count=0;
    char arr2[51]={'\0'};
  while(fgets(arr2,sizeof(arr),file)){
        count+=1;
        continue;
    }

    return count;
}


Answer 1

Wea*_*ane 5

You can do this with just a rolling buffer. You don't need to split the file into pieces.

#include <stdio.h>
#include <string.h>

int main(void) {

    char buff [4];                                  // word buffer
    int count = 0;                                  // occurrences
    FILE* fs=fopen("test.txt","r");                 // open the file
    if (fs != NULL) {                               // if the file opened
        if (4 == fread(buff, 1, 4, fs)) {           // fill the buffer
            do {                                    // if it worked
                if (strnicmp(buff, "make", 4) == 0) // check for target word (strnicmp is non-standard; POSIX has strncasecmp)
                    count++;                        // tally
                memmove(buff, buff+1, 3);           // shift the buffer down
            } while (1 == fread(buff+3, 1, 1, fs)); // fill the last position
        }                                           // end of file
        fclose(fs);                                 // close the file
    }
    printf("%d\n", count);                          // report the result
    return 0;
}


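For simplicity, I have not "soft-coded" the search word or allocated a buffer and sizes to match it, because that is not part of the question. I have to leave something for the OP to do.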
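For readers who want the generalization, here is a minimal sketch of the same rolling-buffer idea with the search word passed in as a parameter. The names `count_occurrences` and `equal_ignore_case` are hypothetical and do not come from the answer. The comparison is written with the standard `tolower` instead of `strnicmp`, on the assumption that portability matters. It counts overlapping matches, which makes no difference for a word like "make".

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: compare n bytes of a and b, ignoring ASCII case. */
static int equal_ignore_case(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Hypothetical function: count case-insensitive occurrences of word in
 * the stream fp, reading from the current position to end of file.
 * Only strlen(word) bytes of the file are held in memory at any time.
 * Returns -1 if the window cannot be allocated. */
long count_occurrences(FILE *fp, const char *word)
{
    assert(fp != NULL && word != NULL);

    size_t len = strlen(word);
    if (len == 0)
        return 0;                       /* nothing to search for */

    char *window = malloc(len);         /* rolling buffer, one word long */
    if (window == NULL)
        return -1;

    long count = 0;
    if (fread(window, 1, len, fp) == len) {         /* fill the window */
        for (;;) {
            if (equal_ignore_case(window, word, len))
                ++count;                            /* tally a match */
            int c = fgetc(fp);                      /* next byte, if any */
            if (c == EOF)
                break;
            memmove(window, window + 1, len - 1);   /* shift down by one */
            window[len - 1] = (char)c;              /* append new byte */
        }
    }

    free(window);
    return count;
}
```

To use it, open the file with `fopen`, call `count_occurrences(fs, "make")`, and print the result. Because stdio already buffers the underlying reads, fetching one byte at a time with `fgetc` is usually acceptable for a file of about 20 MB, though that is an expectation, not a measurement.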
