xmz*_*xmz 2 c++ string parsing lexical-analysis ifstream
I want to read a text file word by word. Here is my C++ code:
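#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main(int argc, const char * argv[]) {
    ifstream file("./wordCount.txt");
    string word;
    while (file >> word) {   // extracts one whitespace-delimited token per pass
        cout << word << endl;
    }
    return 0;
}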
The text file contains the following sentence:
I don't have power, but he has power.
This is the result I get:
I
don\241\257t
have
power,
but
he
has
power.
Could you tell me how to get the result in the following format:
I
don't
have
power
but
he
has
power
Thanks.
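I understand that you want to get rid of the punctuation.
Unfortunately, extracting a string from a stream treats only whitespace as a separator. So "don't" or "Hello,world" is read as one word, while "don' t" or "Hello, world" is read as two words.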
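To make the whitespace-only behaviour concrete, here is a minimal sketch. The helper name extractTokens is hypothetical (it is not part of the question's code), and it reads from a std::istringstream instead of a file so it can be tried in isolation:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects tokens exactly as `stream >> word` yields them.
// Only whitespace separates tokens, so punctuation stays attached to the word.
std::vector<std::string> extractTokens(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    std::string word;
    while (in >> word) {
        tokens.push_back(word);
    }
    return tokens;
}
```

With this helper, extractTokens("Hello,world") yields a single token, while extractTokens("Hello, world") yields two, the first of which still carries the comma.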
An alternative is to read the text line by line and use string::find_first_of() to jump from separator to separator:
#include <iostream>
#include <string>
using namespace std;

int main() {
    const string separator{" \t\r\n,.!?;:"};
    string line;
    // Reads standard input; pass an ifstream instead of cin to read a file.
    while (getline(cin, line)) {                       // read line by line
        size_t e, s = 0;   // s = offset of next word, e = end of next word
        do {
            s = line.find_first_not_of(separator, s);  // skip leading separators
            if (s == string::npos)                     // stop if no word left
                break;
            e = line.find_first_of(separator, s);      // find next separator
            string word(line.substr(s, e - s));        // construct the word
            cout << word << endl;
            s = e + 1;                                 // position after the separator
        } while (e != string::npos);                   // loop if end of line not reached
    }
    return 0;
}