xmz*_*xmz 2 c++ string parsing lexical-analysis ifstream
I want to read a text file word by word. Here is my C++ code:
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main(int argc, const char * argv[]) {
    ifstream file("./wordCount.txt");
    string word;
    while(file >> word){
        cout<<word<<endl;
    }
    return 0;
}
The text file contains the following sentence:
I don't have power, but he has power.
This is the output I get:
I
don\241\257t
have
power,
but
he
has
power.
Could you tell me how to get the output in the following format instead:
I
don't
have
power
but
he
has
power
Thanks.
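I understand that you want to get rid of the punctuation.
Unfortunately, extracting a string from a stream with >> treats only whitespace as a separator. So "don't" or "Hello,world" is read as one word, while "don' t" or "Hello, world" is read as two words.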
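To make the whitespace-only behaviour concrete, here is a minimal sketch that collects the tokens produced by >>. The helper name extract_tokens is hypothetical and only serves the illustration; it reads from a std::istringstream rather than a file so that the input is visible in the call.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect the tokens that operator>> extracts from `text`.
std::vector<std::string> extract_tokens(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    std::string token;
    while (in >> token) {        // extraction stops only at whitespace
        tokens.push_back(token);
    }
    return tokens;
}
```

With this helper, extract_tokens("Hello,world") yields a single token, while extract_tokens("Hello, world") yields "Hello," and "world". The comma stays attached in both cases because it is not whitespace.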
An alternative is to read the text line by line and use string::find_first_of() to jump from separator to separator:
string separator{" \t\r\n,.!?;:"};  // every character treated as a word separator
string line;
while(getline (cin, line)){  // read line by line 
    size_t e,s=0;            // s = offset of next word, e = end of next word 
    do {
        s = line.find_first_not_of(separator,s);  // skip leading separators
        if (s==string::npos)                  // stop if no word left
            break;
        e=line.find_first_of(separator, s);   // find next separator 
        string word(line.substr(s,e-s));      // construct the word
        cout<<word<<endl;
        s=e+1;                                // position after the separator
    } while (e!=string::npos);                // loop if end of line not reached
}
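The loop above reads from cin, while the question reads from a file. As a sketch of how the same technique could be packaged for any input stream, the version below moves the splitting into a function. The names split_words and print_words are hypothetical, and the default separator list is simply the one used above; extend it if other punctuation should be dropped.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper: split one line into words, treating every character
// in `separators` as a delimiter. The apostrophe is deliberately not in the
// default list, so "don't" stays a single word.
std::vector<std::string> split_words(const std::string& line,
                                     const std::string& separators = " \t\r\n,.!?;:") {
    std::vector<std::string> words;
    std::size_t start = 0;
    // Skip leading separators; stop when no word is left.
    while ((start = line.find_first_not_of(separators, start)) != std::string::npos) {
        std::size_t end = line.find_first_of(separators, start);  // next separator
        words.push_back(line.substr(start, end - start));  // npos takes the rest of the line
        if (end == std::string::npos) break;               // last word reached end of line
        start = end + 1;                                   // continue after the separator
    }
    return words;
}

// Hypothetical helper: print every word found in `in`, one per line.
void print_words(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        for (const std::string& word : split_words(line)) {
            out << word << '\n';
        }
    }
}
```

To use it with the file from the question, include <fstream>, open the file with ifstream file("./wordCount.txt"), and call print_words(file, std::cout). Taking a std::istream& keeps the function usable with cin, a file, or a string stream.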