The*_*mer 310 c++ parsing split
I am parsing a string in C++ using the following code:
string parsed,input="text to be parsed";
stringstream input_stringstream(input);
if(getline(input_stringstream,parsed,' '))
{
// do some processing.
}
Parsing with a single char delimiter works fine. But what if I want to use a string as the delimiter?
Example: I want to split:
scott>=tiger
with >= as the delimiter, so that I get scott and tiger.
Vin*_*Pii 500
You can use the std::string::find() function to find the position of your string delimiter, then use std::string::substr() to get a token.
Example:
std::string s = "scott>=tiger";
std::string delimiter = ">=";
std::string token = s.substr(0, s.find(delimiter)); // token is "scott"
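The find(const string& str, size_t pos = 0) function returns the position of the first occurrence of str in the string, or npos if the string is not found.
The substr(size_t pos = 0, size_t n = npos) function returns a substring of the object, starting at position pos and of length n (with the default n = npos, everything up to the end of the string).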
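If you have multiple delimiters, then after you have extracted one token you can remove it (delimiter included) to proceed with subsequent extractions (if you want to preserve the original string, just use s = s.substr(pos + delimiter.length());):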
s.erase(0, s.find(delimiter) + delimiter.length());
This way you can easily loop to get each token.
std::string s = "scott>=tiger>=mushroom";
std::string delimiter = ">=";
size_t pos = 0;
std::string token;
while ((pos = s.find(delimiter)) != std::string::npos) {
token = s.substr(0, pos);
std::cout << token << std::endl;
s.erase(0, pos + delimiter.length());
}
std::cout << s << std::endl;
Output:
scott
tiger
mushroom
mos*_*ald 55
This method uses std::string::find without mutating the original string, by remembering the beginning and end of the previous substring token.
#include <iostream>
#include <string>
int main()
{
std::string s = "scott>=tiger";
std::string delim = ">=";
std::size_t start = 0; // size_t (not unsigned int) so that end + delim.length() is stored without narrowing
auto end = s.find(delim);
while (end != std::string::npos)
{
std::cout << s.substr(start, end - start) << std::endl;
start = end + delim.length();
end = s.find(delim, start);
}
std::cout << s.substr(start, end);
}
Svi*_*lav 31
You can use the following function to split a string:
vector<string> split(const string& str, const string& delim)
{
vector<string> tokens;
size_t prev = 0, pos = 0;
do
{
pos = str.find(delim, prev);
if (pos == string::npos) pos = str.length();
string token = str.substr(prev, pos-prev);
if (!token.empty()) tokens.push_back(token);
prev = pos + delim.length();
}
while (pos < str.length() && prev < str.length());
return tokens;
}
Rik*_*ika 23
You can also use a regular expression for this:
std::vector<std::string> split(const std::string str, const std::string regex_str)
{
std::regex regexz(regex_str);
std::vector<std::string> list(std::sregex_token_iterator(str.begin(), str.end(), regexz, -1),
std::sregex_token_iterator());
return list;
}
This is equivalent to:
std::vector<std::string> split(const std::string str, const std::string regex_str)
{
std::regex regexz(regex_str); // the regex object used by the iterator below
std::sregex_token_iterator token_iter(str.begin(), str.end(), regexz, -1);
std::sregex_token_iterator end;
std::vector<std::string> list;
while (token_iter != end)
{
list.emplace_back(*token_iter++);
}
return list;
}
And use it like this:
#include <iostream>
#include <string>
#include <regex>
#include <vector>
std::vector<std::string> split(const std::string str, const std::string regex_str)
{ // a yet more concise form!
std::regex re(regex_str); // named object: the token iterator cannot be built from a temporary regex
return { std::sregex_token_iterator(str.begin(), str.end(), re, -1), std::sregex_token_iterator() };
}
int main()
{
std::string input_str = "lets split this";
std::string regex_str = " ";
auto tokens = split(input_str, regex_str);
for (auto& item: tokens)
{
std::cout<<item <<std::endl;
}
}
You can simply use substrings, characters, etc. as usual, or use actual regular expressions to do the splitting.
It is also concise and C++11!
Ara*_*san 22
Split a string based on a string delimiter. For example, splitting "adsf-+qwret-+nvfkbdsj-+orthdfjgh-+dfjrleih" on the string delimiter "-+" gives {"adsf", "qwret", "nvfkbdsj", "orthdfjgh", "dfjrleih"}.
#include <iostream>
#include <string>
#include <vector>
using namespace std;
// for string delimiter
vector<string> split (string s, string delimiter) {
size_t pos_start = 0, pos_end, delim_len = delimiter.length();
string token;
vector<string> res;
while ((pos_end = s.find (delimiter, pos_start)) != string::npos) {
token = s.substr (pos_start, pos_end - pos_start);
pos_start = pos_end + delim_len;
res.push_back (token);
}
res.push_back (s.substr (pos_start));
return res;
}
int main() {
string str = "adsf-+qwret-+nvfkbdsj-+orthdfjgh-+dfjrleih";
string delimiter = "-+";
vector<string> v = split (str, delimiter);
for (auto i : v) cout << i << endl;
return 0;
}
Output:
adsf
qwret
nvfkbdsj
orthdfjgh
dfjrleih
Split a string based on a character delimiter. For example, splitting "adsf+qwer+poui+fdgh" on the delimiter "+" gives {"adsf", "qwer", "poui", "fdgh"}.
#include <iostream>
#include <sstream>
#include <vector>
using namespace std;
vector<string> split (const string &s, char delim) {
vector<string> result;
stringstream ss (s);
string item;
while (getline (ss, item, delim)) {
result.push_back (item);
}
return result;
}
int main() {
string str = "adsf+qwer+poui+fdgh";
vector<string> v = split (str, '+');
for (auto i : v) cout << i << endl;
return 0;
}
Output:
adsf
qwer
poui
fdgh
rya*_*ork 15
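strtok allows you to pass in multiple characters as delimiters. I bet that if you pass in ">=" your example string will be split correctly (even though > and = are counted as individual delimiters).
EDIT: if you don't want to use c_str() to convert from string to char*, you can use substr and find_first_of to tokenize.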
string token, mystring("scott>=tiger");
while(token != mystring){
token = mystring.substr(0,mystring.find_first_of(">="));
mystring = mystring.substr(mystring.find_first_of(">=") + 1);
printf("%s ",token.c_str());
}
Wil*_*vo 15
This code splits lines from a text and adds them all to a vector.
vector<string> split(char *phrase, string delimiter){
vector<string> list;
string s = string(phrase);
size_t pos = 0;
string token;
while ((pos = s.find(delimiter)) != string::npos) {
token = s.substr(0, pos);
list.push_back(token);
s.erase(0, pos + delimiter.length());
}
list.push_back(s);
return list;
}
Called as:
vector<string> listFilesMax = split(buffer, "\n");
Shu*_*wal 10
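The answer is already there, but the selected answer uses the erase function, which is very costly; think of a very large string (in MBs). Therefore I use the function below.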
vector<string> split(const string& i_str, const string& i_delim)
{
vector<string> result;
size_t found = i_str.find(i_delim);
size_t startIndex = 0;
while(found != string::npos)
{
result.push_back(string(i_str.begin()+startIndex, i_str.begin()+found));
startIndex = found + i_delim.size();
found = i_str.find(i_delim, startIndex);
}
if(startIndex != i_str.size())
result.push_back(string(i_str.begin()+startIndex, i_str.end()));
return result;
}
Nox*_*Nox 10
A way of doing it with C++20:
#include <iostream>
#include <ranges>
#include <string>
#include <string_view>
int main()
{
std::string hello = "text to be parsed";
auto split = hello
| std::ranges::views::split(' ')
| std::ranges::views::transform([](auto&& str) { return std::string_view(&*str.begin(), std::ranges::distance(str)); });
for (auto&& word : split)
{
std::cout << word << std::endl;
}
}
See also: https://stackoverflow.com/a/48403210/10771848 and https://en.cppreference.com/w/cpp/ranges/split_view
I would use boost::tokenizer. Here is documentation explaining how to make an appropriate tokenizer function: http://www.boost.org/doc/libs/1_52_0/libs/tokenizer/tokenizerfunction.htm
Here is one that works for your case.
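#include <algorithm> // std::search
#include <iostream>
#include <iterator> // std::advance
#include <string>
#include <boost/tokenizer.hpp>
struct my_tokenizer_func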
{
template<typename It>
bool operator()(It& next, It end, std::string & tok)
{
if (next == end)
return false;
char const * del = ">=";
auto pos = std::search(next, end, del, del + 2);
tok.assign(next, pos);
next = pos;
if (next != end)
std::advance(next, 2);
return true;
}
void reset() {}
};
int main()
{
std::string to_be_parsed = "1) one>=2) two>=3) three>=4) four";
for (auto i : boost::tokenizer<my_tokenizer_func>(to_be_parsed))
std::cout << i << '\n';
}
Here is my take on this. It handles the edge cases and takes an optional parameter to remove empty entries from the results.
bool endsWith(const std::string& s, const std::string& suffix)
{
return s.size() >= suffix.size() &&
s.substr(s.size() - suffix.size()) == suffix;
}
std::vector<std::string> split(const std::string& s, const std::string& delimiter, const bool& removeEmptyEntries = false)
{
std::vector<std::string> tokens;
for (size_t start = 0, end; start < s.length(); start = end + delimiter.length())
{
size_t position = s.find(delimiter, start);
end = position != std::string::npos ? position : s.length();
std::string token = s.substr(start, end - start);
if (!removeEmptyEntries || !token.empty())
{
tokens.push_back(token);
}
}
if (!removeEmptyEntries &&
(s.empty() || endsWith(s, delimiter)))
{
tokens.push_back("");
}
return tokens;
}
Examples
split("a-b-c", "-"); // [3]("a","b","c")
split("a--c", "-"); // [3]("a","","c")
split("-b-", "-"); // [3]("","b","")
split("--c--", "-"); // [5]("","","c","","")
split("--c--", "-", true); // [1]("c")
split("a", "-"); // [1]("a")
split("", "-"); // [1]("")
split("", "-", true); // [0]()
This should work perfectly for string (or single character) delimiters. Don't forget to #include <sstream>.
std::string input = "Alfa=,+Bravo=,+Charlie=,+Delta";
std::string delimiter = "=,+";
std::istringstream ss(input);
std::string token;
std::string::iterator it;
while(std::getline(ss, token, *(it = delimiter.begin()))) {
std::cout << token << std::endl; // Token is extracted using '='
it++;
// Skip the rest of delimiter if exists ",+"
while(it != delimiter.end() and ss.peek() == *(it)) {
it++; ss.get();
}
}
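The first while loop extracts a token using the first character of the string delimiter. The second while loop skips the rest of the delimiter and stops at the beginning of the next token.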
A very simple/naive approach:
vector<string> words_seperate(string s){
vector<string> ans;
string w="";
for(auto i:s){
if(i==' '){
ans.push_back(w);
w="";
}
else{
w+=i;
}
}
ans.push_back(w);
return ans;
}
Or you can use the Boost library's split function:
vector<string> result;
boost::split(result, input, boost::is_any_of("\t"));
Or you can try TOKEN or strtok:
char str[] = "DELIMIT-ME-C++";
char *token = strtok(str, "-");
while (token)
{
cout<<token;
token = strtok(NULL, "-");
}
Or you can do this:
char split_with=' ';
vector<string> words;
string token;
stringstream ss(our_string);
while(getline(ss , token , split_with)) words.push_back(token);
Just in case someone in the future wants an out-of-the-box function based on Vincenzo Pii's answer:
#include <vector>
#include <string>
std::vector<std::string> SplitString(
std::string str,
std::string delimeter)
{
std::vector<std::string> splittedStrings = {};
size_t pos = 0;
while ((pos = str.find(delimeter)) != std::string::npos)
{
std::string token = str.substr(0, pos);
if (token.length() > 0)
splittedStrings.push_back(token);
str.erase(0, pos + delimeter.length());
}
if (str.length() > 0)
splittedStrings.push_back(str);
return splittedStrings;
}
I also fixed some bugs so that the function won't return an empty string if there is a delimiter at the start or the end of the string.