Reading from a text file and parsing lines into words in C

use*_*774 11 c io file-io file

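I am a beginner in C and systems programming. For a homework assignment, I need to write a program that reads input from stdin, parses the lines into words, and sends the words to the sort subprocesses using System V message queues (e.g., to count words). I am stuck at the input part. I am trying to process the input, remove non-alpha characters, convert all alpha words to lower case, and finally split a line of words into multiple words. So far I can print all the alpha words in lower case, but there are blank lines between some of the words, which I believe is not correct. Can someone take a look and give me some suggestions?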

Example input line: The Project Gutenberg EBook of The Iliad of Homer, by Homer

I think the correct output should be:

the
project
gutenberg
ebook
of
the
iliad
of
homer
by
homer

But my output is as follows:

project
gutenberg
ebook
of
the
iliad
of
homer
                         <------ there is a blank line here
by
homer

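I think the blank line is caused by the space between "," and "by". I tried things like "if isspace(c) then do nothing", but it doesn't work. My code is below. Any help or suggestion is appreciated.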

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>


//Main Function
int main (int argc, char **argv)
{
    int c;
    char *input = argv[1];
    FILE *input_file;

    input_file = fopen(input, "r");

    if (input_file == 0)
    {
        //fopen returns 0, the NULL pointer, on failure
        perror("Cannot open input file");
        exit(-1);
    }
    else
    {        
        while ((c =fgetc(input_file)) != EOF )
        {
            //if it's an alpha, convert it to lower case
            if (isalpha(c))
            {
                c = tolower(c);
                putchar(c);
            }
            else if (isspace(c))
            {
                ;   //do nothing
            }
            else
            {
                c = '\n';
                putchar(c);
            }
        }
    }

    fclose(input_file);

    printf("\n");

    return 0;
}

**EDIT**

I edited my code and finally got the correct output:

int main (int argc, char **argv)
{
    int c;
    char *input = argv[1];
    FILE *input_file;

    input_file = fopen(input, "r");

    if (input_file == 0)
    {
        //fopen returns 0, the NULL pointer, on failure
        perror("Cannot open input file");
        exit(-1);
    }
    else
    {
        int found_word = 0;

        while ((c =fgetc(input_file)) != EOF )
        {
            //if it's an alpha, convert it to lower case
            if (isalpha(c))
            {
                found_word = 1;
                c = tolower(c);
                putchar(c);
            }
            else {
                if (found_word) {
                    putchar('\n');
                    found_word=0;
                }
            }

        }
    }

    fclose(input_file);

    printf("\n");

    return 0;
}
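
A side note on the input handling: the code above passes argv[1] to fopen without checking argc, and the assignment text talks about reading from stdin. A minimal sketch of one way to cover both cases follows. The helper name open_input is hypothetical and not part of the original code; it assumes that "no file argument" should mean "read from stdin".

```c
#include <stdio.h>

/* Hypothetical helper (not from the original post): returns the stream
 * that the word loop should read from.
 *  - no file argument: fall back to stdin
 *  - file argument:    try to open it; on failure report the error
 *                      and return NULL */
FILE *open_input(int argc, char **argv)
{
    if (argc < 2)
        return stdin;

    FILE *f = fopen(argv[1], "r");
    if (f == NULL)
        perror(argv[1]);   /* prints "<name>: <reason>" to stderr */
    return f;
}
```

In main this could be used as `FILE *input_file = open_input(argc, argv);` followed by a NULL check. The stream should only be passed to fclose when it is not stdin.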

Rob*_*Rob 6

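I think you just need to ignore any non-alpha character (!isalpha(c)) and otherwise convert to lower case. To do that, you need to keep track of when you have found a word.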

int found_word = 0;

while ((c =fgetc(input_file)) != EOF )
{
    if (!isalpha(c))
    {
        if (found_word) {
            putchar('\n');
            found_word = 0;
        }
    }
    else {
        found_word = 1;
        c = tolower(c);
        putchar(c);
    }
}
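
Since the words eventually have to be handed to something other than putchar, it may help to see the same found_word idea writing into a buffer. This is only a sketch: the function name split_words, its parameters, and the truncation behaviour are assumptions made for illustration, not part of the question or the assignment.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copies the words of `text` into `out`, lower-cased,
 * one word per line. Returns the number of words found. Output that does
 * not fit into `out_size` bytes is silently truncated. */
size_t split_words(const char *text, char *out, size_t out_size)
{
    size_t n = 0;        /* bytes written to out so far */
    size_t words = 0;    /* words seen so far */
    int in_word = 0;     /* plays the role of found_word above */

    if (out_size == 0)
        return 0;

    for (; *text != '\0'; text++) {
        /* ctype functions expect an unsigned char value (or EOF) */
        unsigned char c = (unsigned char)*text;

        if (isalpha(c)) {
            if (!in_word) {
                in_word = 1;
                words++;
            }
            if (n + 1 < out_size)
                out[n++] = (char)tolower(c);
        } else if (in_word) {
            /* first non-letter after a word: end the line exactly once */
            if (n + 1 < out_size)
                out[n++] = '\n';
            in_word = 0;
        }
    }

    /* terminate the last word if the text ended in the middle of it */
    if (in_word && n + 1 < out_size)
        out[n++] = '\n';
    out[n] = '\0';
    return words;
}
```

Called on the sample line from the question with a large enough buffer, this yields the eleven words one per line, with no empty line between "homer" and "by", because the comma ends the word and the following space is then ignored.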

If you need to handle apostrophes within words such as "isn't", then this should do it.

int found_word = 0;
int found_apostrophe = 0;

while ((c = fgetc(input_file)) != EOF)
{
    if (!isalpha(c))
    {
        if (found_word) {
            if (!found_apostrophe && c == '\'') {
                /* possible in-word apostrophe: hold it until the next letter */
                found_apostrophe = 1;
            }
            else {
                /* any other non-letter ends the word */
                found_apostrophe = 0;
                putchar('\n');
                found_word = 0;
            }
        }
    }
    else {
        if (found_apostrophe) {
            /* a letter follows the apostrophe, so it belongs to the word */
            putchar('\'');
            found_apostrophe = 0;
        }
        found_word = 1;
        c = tolower(c);
        putchar(c);
    }
}
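
If the text is already in memory, the same rule (keep an apostrophe only when it sits between two letters) can be written by peeking at the next character instead of carrying a found_apostrophe flag. Note that this is a different technique from the streaming loop above, and the name split_words_apos and its buffer interface are assumptions made for this sketch.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: like a plain word splitter, but an apostrophe that
 * is directly preceded and followed by letters stays inside the word.
 * Writes lower-cased words into `out`, one per line, and returns the
 * number of words. Output beyond `out_size` bytes is truncated. */
size_t split_words_apos(const char *text, char *out, size_t out_size)
{
    size_t n = 0, words = 0;
    int in_word = 0;

    if (out_size == 0)
        return 0;

    for (size_t i = 0; text[i] != '\0'; i++) {
        unsigned char c = (unsigned char)text[i];

        /* text[i + 1] is at worst the terminating '\0', so the peek is safe */
        int inner_apostrophe = in_word && c == '\'' &&
                               isalpha((unsigned char)text[i + 1]);

        if (isalpha(c) || inner_apostrophe) {
            if (!in_word) {
                in_word = 1;
                words++;
            }
            if (n + 1 < out_size)
                out[n++] = (char)tolower(c);  /* tolower leaves '\'' as is */
        } else if (in_word) {
            if (n + 1 < out_size)
                out[n++] = '\n';
            in_word = 0;
        }
    }

    if (in_word && n + 1 < out_size)
        out[n++] = '\n';
    out[n] = '\0';
    return words;
}
```

With this sketch, quote marks around a word and trailing possessive apostrophes (as in "dogs'") are dropped, while "isn't" stays in one piece.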