Ash*_*ppa 2895 c++ string split
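I'm trying to iterate over the words of a string.
The string can be assumed to be composed of words separated by whitespace.
Note that I'm not interested in C string functions or that kind of character manipulation/access. Also, please give precedence to elegance over efficiency in your answer.
The best solution I have right now is: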
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main()
{
string s = "Somewhere down the road";
istringstream iss(s);
do
{
string subs;
iss >> subs;
cout << "Substring: " << subs << endl;
} while (iss);
}
Is there a more elegant way to do this?
Eva*_*ran 2395
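I use this to split a string by a delimiter. The first overload puts the results in a pre-constructed vector, the second returns a new vector.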
#include <string>
#include <sstream>
#include <vector>
#include <iterator>
template <typename Out>
void split(const std::string &s, char delim, Out result) {
std::istringstream iss(s);
std::string item;
while (std::getline(iss, item, delim)) {
*result++ = item;
}
}
std::vector<std::string> split(const std::string &s, char delim) {
std::vector<std::string> elems;
split(s, delim, std::back_inserter(elems));
return elems;
}
Note that this solution does not skip empty tokens, so the following will find 4 items, one of which is empty:
std::vector<std::string> x = split("one:two::three", ':');
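If empty tokens are not wanted, the same std::getline loop can simply drop them. The following is only a sketch under that assumption; the name split_skip_empty is hypothetical and not part of the answer above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant of split(): same std::getline loop, but items that
// come out empty (leading or adjacent delimiters) are not stored.
std::vector<std::string> split_skip_empty(const std::string &s, char delim) {
    std::vector<std::string> elems;
    std::istringstream iss(s);
    std::string item;
    while (std::getline(iss, item, delim)) {
        if (!item.empty()) {
            elems.push_back(item);
        }
    }
    return elems;
}
```

With this variant, splitting "one:two::three" on ':' should give three items instead of four.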
Zun*_*ino 1329
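For what it's worth, here's another way to extract tokens from an input string, relying only on standard library facilities. It's an example of the power and elegance behind the design of the STL.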
#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
int main() {
using namespace std;
string sentence = "And I feel fine...";
istringstream iss(sentence);
copy(istream_iterator<string>(iss),
istream_iterator<string>(),
ostream_iterator<string>(cout, "\n"));
}
Instead of copying the extracted tokens to an output stream, one could insert them into a container, using the same generic copy algorithm.
vector<string> tokens;
copy(istream_iterator<string>(iss),
istream_iterator<string>(),
back_inserter(tokens));
... or create the vector directly:
vector<string> tokens{istream_iterator<string>{iss},
istream_iterator<string>{}};
idi*_*dak 826
A possible solution using Boost might be:
#include <boost/algorithm/string.hpp>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));
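This approach might be even faster than the stringstream approach. And since this is a generic template function, it can be used to split other types of strings (wchar, etc. or UTF-8) using all kinds of delimiters.
See the documentation for details.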
kev*_*kev 353
#include <vector>
#include <string>
#include <sstream>
int main()
{
std::string str("Split me by whitespaces");
std::string buf; // Have a buffer string
std::stringstream ss(str); // Insert the string into a stream
std::vector<std::string> tokens; // Create vector to hold our words
while (ss >> buf)
tokens.push_back(buf);
return 0;
}
Mar*_*ius 180
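For those with whom it does not sit well to sacrifice all efficiency for code size and who see "efficient" as a type of elegance, the following should hit a sweet spot (and I think the template container class is an awesomely elegant addition):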
template < class ContainerT >
void tokenize(const std::string& str, ContainerT& tokens,
const std::string& delimiters = " ", bool trimEmpty = false)
{
std::string::size_type pos, lastPos = 0, length = str.length();
using value_type = typename ContainerT::value_type;
using size_type = typename ContainerT::size_type;
while(lastPos < length + 1)
{
pos = str.find_first_of(delimiters, lastPos);
if(pos == std::string::npos)
{
pos = length;
}
if(pos != lastPos || !trimEmpty)
tokens.push_back(value_type(str.data()+lastPos,
(size_type)pos-lastPos ));
lastPos = pos + 1;
}
}
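I usually choose to use std::vector<std::string> as my second parameter (ContainerT)... but list<> is way faster than vector<> when direct access is not needed, and you can even create your own string class and use something like std::list<subString>, where subString does not do any copies, for incredible speed increases.
It's more than twice as fast as the fastest tokenizer on this page and almost 5 times faster than some others. Also, with the perfect parameter types you can eliminate all string and list copies for additional speed increases.
Additionally, it does not do the (extremely inefficient) return of a result, but rather passes the tokens as a reference, thus also allowing you to build up tokens using multiple calls if you so wish.
Lastly, it allows you to specify whether to trim empty tokens from the results via a last optional parameter.
All it needs is std::string... the rest is optional. It does not use streams or the Boost library, but is flexible enough to be able to accept some of these foreign types naturally.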
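As a rough illustration of the points above (a different container type, accumulating tokens over several calls, and the trimEmpty flag), here is a small sketch. It repeats the tokenize function so that it is self-contained; the helper name collect_words is hypothetical.

```cpp
#include <list>
#include <string>

// tokenize() as in the answer above, repeated so the sketch compiles alone.
template <class ContainerT>
void tokenize(const std::string& str, ContainerT& tokens,
              const std::string& delimiters = " ", bool trimEmpty = false)
{
    std::string::size_type pos, lastPos = 0, length = str.length();
    using value_type = typename ContainerT::value_type;
    using size_type = typename ContainerT::size_type;
    while (lastPos < length + 1) {
        pos = str.find_first_of(delimiters, lastPos);
        if (pos == std::string::npos) pos = length;
        if (pos != lastPos || !trimEmpty)
            tokens.push_back(value_type(str.data() + lastPos,
                                        (size_type)pos - lastPos));
        lastPos = pos + 1;
    }
}

// Hypothetical helper: two calls append to the same std::list,
// each with its own delimiter set and with empty tokens trimmed.
std::list<std::string> collect_words() {
    std::list<std::string> words;
    tokenize("a b  c", words, " ", true);  // the double space yields no empty token
    tokenize("d,e", words, ",", true);     // appended after the first batch
    return words;
}
```

Because the container is passed by reference, the second call extends the result of the first rather than replacing it.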
Ale*_*mas 157
Here's another solution. It's compact and reasonably efficient:
std::vector<std::string> split(const std::string &text, char sep) {
std::vector<std::string> tokens;
std::size_t start = 0, end = 0;
while ((end = text.find(sep, start)) != std::string::npos) {
tokens.push_back(text.substr(start, end - start));
start = end + 1;
}
tokens.push_back(text.substr(start));
return tokens;
}
It can easily be templatised to handle string separators, wide strings, etc.
Note that splitting "" results in a single empty string, and splitting "," (i.e. sep) results in two empty strings.
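The two edge cases just mentioned can be checked directly. The sketch below repeats the function so that it stands alone; the helper name count_tokens is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// split() as in the answer above, repeated so the sketch compiles alone.
std::vector<std::string> split(const std::string &text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        tokens.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    tokens.push_back(text.substr(start));  // always adds the final piece
    return tokens;
}

// Hypothetical helper: number of tokens produced for a given input.
std::size_t count_tokens(const std::string &text, char sep) {
    return split(text, sep).size();
}
```

The unconditional push_back after the loop is what produces one token for an empty input and two for a lone separator.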
It can also be easily expanded to skip empty tokens:
std::vector<std::string> split(const std::string &text, char sep) {
std::vector<std::string> tokens;
std::size_t start = 0, end = 0;
while ((end = text.find(sep, start)) != std::string::npos) {
if (end != start) {
tokens.push_back(text.substr(start, end - start));
}
start = end + 1;
}
if (start < text.size()) { // end is npos here, so test start to skip an empty trailing token
tokens.push_back(text.substr(start));
}
return tokens;
}
If it is desired to split the string at multiple delimiters while skipping empty tokens, this version may be used:
std::vector<std::string> split(const std::string& text, const std::string& delims)
{
std::vector<std::string> tokens;
std::size_t start = text.find_first_not_of(delims), end = 0;
while((end = text.find_first_of(delims, start)) != std::string::npos)
{
tokens.push_back(text.substr(start, end - start));
start = text.find_first_not_of(delims, end);
}
if(start != std::string::npos)
tokens.push_back(text.substr(start));
return tokens;
}
gno*_*med 122
This is my favorite way to iterate through a string. You can do whatever you want per word.
string line = "a line of text to iterate through";
string word;
istringstream iss(line, istringstream::in);
while( iss >> word )
{
// Do something on `word` here...
}
Fer*_*cio 80
This is similar to the Stack Overflow question "How do I tokenize a string in C++?".
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
using namespace std;
using namespace boost;
int main(int argc, char** argv)
{
string text = "token test\tstring";
char_separator<char> sep(" \t");
tokenizer<char_separator<char>> tokens(text, sep);
for (const string& t : tokens)
{
cout << t << "." << endl;
}
}
Sha*_*531 67
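I like the following because it puts the results into a vector, supports a string as a delimiter and gives control over keeping empty values. But it doesn't look as good then.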
#include <ostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;
vector<string> split(const string& s, const string& delim, const bool keep_empty = true) {
vector<string> result;
if (delim.empty()) {
result.push_back(s);
return result;
}
string::const_iterator substart = s.begin(), subend;
while (true) {
subend = search(substart, s.end(), delim.begin(), delim.end());
string temp(substart, subend);
if (keep_empty || !temp.empty()) {
result.push_back(temp);
}
if (subend == s.end()) {
break;
}
substart = subend + delim.size();
}
return result;
}
int main() {
const vector<string> words = split("So close no matter how far", " ");
copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
}
Of course, Boost has a split() that works partially like that. And, if by 'white-space' you really do mean any type of white-space, using Boost's split with is_any_of() works great.
小智 53
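The STL does not have such a method available already.
However, you can either use C's strtok() function by using the std::string::c_str() member, or you can write your own. Here is a code sample I found after a quick Google search ("STL string split"):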
void Tokenize(const string& str,
vector<string>& tokens,
const string& delimiters = " ")
{
// Skip delimiters at beginning.
string::size_type lastPos = str.find_first_not_of(delimiters, 0);
// Find first "non-delimiter".
string::size_type pos = str.find_first_of(delimiters, lastPos);
while (string::npos != pos || string::npos != lastPos)
{
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters. Note the "not_of"
lastPos = str.find_first_not_of(delimiters, pos);
// Find next "non-delimiter"
pos = str.find_first_of(delimiters, lastPos);
}
}
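Taken from: http://oopweb.com/CPP/Documents/CPPHOWTO/Volume/C++Programming-HOWTO-7.html
If you have questions about the code sample, leave a comment and I will explain.
And just because it does not implement a typedef called iterator or overload the << operator does not mean it is bad code. I use C functions quite frequently. For example, printf and scanf are both faster than std::cin and std::cout (significantly), the fopen syntax is a lot more friendly for binary types, and they also tend to produce smaller EXEs.
Don't get sold on this "elegance over performance" deal.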
Mar*_* M. 42
Here is a split function that ignores empty tokens (this can easily be changed):
template<typename T>
vector<T>
split(const T & str, const T & delimiters) {
vector<T> v;
typename T::size_type start = 0;
auto pos = str.find_first_of(delimiters, start);
while(pos != T::npos) {
if(pos != start) // ignore empty tokens
v.emplace_back(str, start, pos - start);
start = pos + 1;
pos = str.find_first_of(delimiters, start);
}
if(start < str.length()) // ignore trailing delimiter
v.emplace_back(str, start, str.length() - start); // add what's left of the string
return v;
}
Usage example:
vector<string> v = split<string>("Hello, there; World", ";,");
vector<wstring> w = split<wstring>(L"Hello, there; World", L";,");
Rob*_*ert 36
Yet another flexible and fast way
template<typename Operator>
void tokenize(Operator& op, const char* input, const char* delimiters) {
const char* s = input;
const char* e = s;
while (*e != 0) {
e = s;
while (*e != 0 && strchr(delimiters, *e) == 0) ++e;
if (e - s > 0) {
op(s, e - s);
}
s = e + 1;
}
}
To use it with a vector of strings (Edit: since somebody pointed out not to inherit STL classes... hrmf ;) ):
template<class ContainerType>
class Appender {
public:
Appender(ContainerType& container) : container_(container) {;}
void operator() (const char* s, unsigned length) {
container_.push_back(std::string(s,length));
}
private:
ContainerType& container_;
};
std::vector<std::string> strVector;
Appender<std::vector<std::string> > v(strVector); // explicit template argument (needed before C++17)
tokenize(v, "A number of words to be tokenized", " \t");
That's it! And that's just one way to use the tokenizer, like how to just count words:
class WordCounter {
public:
WordCounter() : noOfWords(0) {}
void operator() (const char*, unsigned) {
++noOfWords;
}
unsigned noOfWords;
};
WordCounter wc;
tokenize(wc, "A number of words to be counted", " \t");
ASSERT( wc.noOfWords == 7 );
Limited by imagination ;)
rho*_*omu 36
I have a 2-line solution to this problem:
char sep = ' ';
std::string s="1 This is an example";
for(size_t p=0, q=0; p!=s.npos; p=q)
std::cout << s.substr(p+(p!=0), (q=s.find(sep, p+1))-p-(p!=0)) << std::endl;
Then instead of printing you can put it in a vector.
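A minimal sketch of that vector variant, keeping the same loop and replacing the output statement with push_back. The function name split_to_vector is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical wrapper around the two-line loop above: p is the position of
// the previous separator (or 0 at the start), q the position of the next one.
std::vector<std::string> split_to_vector(const std::string &s, char sep) {
    std::vector<std::string> out;
    for (std::size_t p = 0, q = 0; p != s.npos; p = q)
        out.push_back(s.substr(p + (p != 0),
                               (q = s.find(sep, p + 1)) - p - (p != 0)));
    return out;
}
```

Note that, as in the original loop, the search starts at index 1, so a separator at index 0 is not treated as a delimiter and stays in the first token.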
dk1*_*123 31
Here's a simple solution that uses only the standard regex library
#include <regex>
#include <string>
#include <vector>
std::vector<std::string> Tokenize( const std::string str, const std::regex regex )
{
using namespace std;
std::vector<string> result;
sregex_token_iterator it( str.begin(), str.end(), regex, -1 );
sregex_token_iterator reg_end;
for ( ; it != reg_end; ++it ) {
if ( !it->str().empty() ) //token could be empty:check
result.emplace_back( it->str() );
}
return result;
}
The regex argument allows checking for multiple arguments (spaces, commas, etc.)
I usually only check to split on spaces and commas, so I also have this default function:
std::vector<std::string> TokenizeDefault( const std::string str )
{
using namespace std;
regex re( "[\\s,]+" );
return Tokenize( str, re );
}
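The "[\\s,]+" pattern checks for whitespace (\\s) and commas (,).
Note, if you want to split wstring instead of string,
change all std::regex to std::wregex
change all sregex_token_iterator to wsregex_token_iterator
Note, you might also want to take the string argument by reference, depending on your compiler.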
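Applying the two substitutions listed above gives a wide-string version. This is only a sketch: the name TokenizeWide is hypothetical, and the arguments are taken by const reference as the last note suggests.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical wide-string counterpart of Tokenize():
// std::regex -> std::wregex, sregex_token_iterator -> wsregex_token_iterator.
std::vector<std::wstring> TokenizeWide(const std::wstring &str,
                                       const std::wregex &re) {
    std::vector<std::wstring> result;
    // -1 selects the text between matches, i.e. the tokens.
    std::wsregex_token_iterator it(str.begin(), str.end(), re, -1);
    std::wsregex_token_iterator reg_end;
    for (; it != reg_end; ++it) {
        if (it->length() != 0)  // a token could be empty: skip it
            result.emplace_back(it->str());
    }
    return result;
}
```

A call such as TokenizeWide(L"a, b,,c", std::wregex(L"[\\s,]+")) would then split on runs of whitespace and commas.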
KTC*_*KTC 26
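Using std::stringstream as you have works perfectly fine, and does exactly what you wanted. If you're just looking for a different way of doing things though, you can use std::find() / std::find_first_of() and std::string::substr().
Here's an example: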
#include <iostream>
#include <string>
int main()
{
std::string s("Somewhere down the road");
std::string::size_type prev_pos = 0, pos = 0;
while( (pos = s.find(' ', pos)) != std::string::npos )
{
std::string substring( s.substr(prev_pos, pos-prev_pos) );
std::cout << substring << '\n';
prev_pos = ++pos;
}
std::string substring( s.substr(prev_pos, pos-prev_pos) ); // Last word
std::cout << substring << '\n';
return 0;
}
zer*_*erm 26
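In case you like to use Boost, but want to use a whole string as delimiter (instead of single characters as in most of the previously proposed solutions), you can use the boost_split_iterator.
Example code including a convenient template: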
#include <iostream>
#include <vector>
#include <boost/algorithm/string.hpp>
template<typename _OutputIterator>
inline void split(
const std::string& str,
const std::string& delim,
_OutputIterator result)
{
using namespace boost::algorithm;
typedef split_iterator<std::string::const_iterator> It;
for(It iter=make_split_iterator(str, first_finder(delim, is_equal()));
iter!=It();
++iter)
{
*(result++) = boost::copy_range<std::string>(*iter);
}
}
int main(int argc, char* argv[])
{
using namespace std;
vector<string> splitted;
split("HelloFOOworldFOO!", "FOO", back_inserter(splitted));
// or directly to console, for example
split("HelloFOOworldFOO!", "FOO", ostream_iterator<string>(cout, "\n"));
return 0;
}
Pra*_*are 20
There is a function named strtok.
#include <string>
#include <vector>
#include <string.h> // strtok_r (POSIX)
using namespace std;
vector<string> split(char* str,const char* delim)
{
char* saveptr;
char* token = strtok_r(str,delim,&saveptr);
vector<string> result;
while(token != NULL)
{
result.push_back(token);
token = strtok_r(NULL,delim,&saveptr);
}
return result;
}
AJM*_*eld 19
Here is a regex solution that only uses the standard regex library. (I'm a little rusty, so there may be a few syntax errors, but this is at least the general idea)
#include <regex>
#include <string>
#include <vector>
using namespace std;
vector<string> split(const string& s){
regex r ("\\w+"); //regex matches whole words, (greedy, so no fragment words)
sregex_iterator rit ( s.begin(), s.end(), r );
sregex_iterator rend; //iterators to iterate thru words
vector<string> result;
for (; rit != rend; ++rit) //iterates through the matches to fill the vector
result.push_back(rit->str());
return result;
}
luk*_*mac 17
A stringstream can be convenient if you need to parse a string with non-space separators:
string s = "Name:JAck; Spouse:Susan; ...";
string dummy, name, spouse;
istringstream iss(s);
getline(iss, dummy, ':');
getline(iss, name, ';');
getline(iss, dummy, ':');
getline(iss, spouse, ';');
小智 14
So far I used the one in Boost, but I needed something that doesn't depend on it, so I came to this:
static void Split(std::vector<std::string>& lst, const std::string& input, const std::string& separators, bool remove_empty = true)
{
std::ostringstream word;
for (size_t n = 0; n < input.size(); ++n)
{
if (std::string::npos == separators.find(input[n]))
word << input[n];
else
{
if (!word.str().empty() || !remove_empty)
lst.push_back(word.str());
word.str("");
}
}
if (!word.str().empty() || !remove_empty)
lst.push_back(word.str());
}
A good point is that in separators you can pass more than one character.
Dan*_*nyK 13
I've rolled my own using strtok and used Boost to split a string. The best method I have found is the C++ String Toolkit Library. It is incredibly flexible and fast.
#include <iostream>
#include <vector>
#include <string>
#include <strtk.hpp>
const char *whitespace = " \t\r\n\f";
const char *whitespace_and_punctuation = " \t\r\n\f;,=";
int main()
{
{ // normal parsing of a string into a vector of strings
std::string s("Somewhere down the road");
std::vector<std::string> result;
if( strtk::parse( s, whitespace, result ) )
{
for(size_t i = 0; i < result.size(); ++i )
std::cout << result[i] << std::endl;
}
}
{ // parsing a string into a vector of floats with other separators
// besides spaces
std::string s("3.0, 3.14; 4.0");
std::vector<float> values;
if( strtk::parse( s, whitespace_and_punctuation, values ) )
{
for(size_t i = 0; i < values.size(); ++i )
std::cout << values[i] << std::endl;
}
}
{ // parsing a string into specific variables
std::string s("angle = 45; radius = 9.9");
std::string w1, w2;
float v1, v2;
if( strtk::parse( s, whitespace_and_punctuation, w1, v1, w2, v2) )
{
std::cout << "word " << w1 << ", value " << v1 << std::endl;
std::cout << "word " << w2 << ", value " << v2 << std::endl;
}
}
return 0;
}
The toolkit has much more flexibility than this simple example shows, but its utility in parsing a string into useful elements is incredible.
use*_*233 13
Short and elegant
#include <vector>
#include <string>
using namespace std;
vector<string> split(string data, string token)
{
vector<string> output;
size_t pos = string::npos; // size_t to avoid improbable overflow
do
{
pos = data.find(token);
output.push_back(data.substr(0, pos));
if (string::npos != pos)
data = data.substr(pos + token.size());
} while (string::npos != pos);
return output;
}
It can use any string as delimiter, and can also be used with binary data (std::string supports binary data, including nulls).
Usage:
auto a = split("this!!is!!!example!string", "!!");
Output:
this
is
!example!string
J. *_*lus 13
C++20 finally blesses us with a split function. Or rather, a range adapter. Godbolt link.
#include <iostream>
#include <ranges>
#include <string_view>
namespace ranges = std::ranges;
namespace views = std::views;
using str = std::string_view;
constexpr auto view =
"Multiple words"
| views::split(' ')
| views::transform([](auto &&r) -> str {
return {
&*r.begin(),
static_cast<str::size_type>(ranges::distance(r))
};
});
auto main() -> int {
for (str &&sv : view) {
std::cout << sv << '\n';
}
}
Ste*_*ell 11
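I made this because I needed an easy way to split strings and C-based strings... Hopefully somebody else can find it useful as well. Also, it doesn't rely on tokens and you can use fields as delimiters, which is another key feature I needed.
I'm sure there are improvements that can be made to further improve its elegance, so please do so by all means.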
StringSplitter.hpp:
#include <vector>
#include <iostream>
#include <string.h>
using namespace std;
class StringSplit
{
private:
void copy_fragment(char*, char*, char*);
void copy_fragment(char*, char*, char);
bool match_fragment(char*, char*, int);
int untilnextdelim(char*, char);
int untilnextdelim(char*, char*);
void assimilate(char*, char);
void assimilate(char*, char*);
bool string_contains(char*, char*);
long calc_string_size(char*);
void copy_string(char*, char*);
public:
vector<char*> split_cstr(char);
vector<char*> split_cstr(char*);
vector<string> split_string(char);
vector<string> split_string(char*);
char* String;
bool do_string;
bool keep_empty;
vector<char*> Container;
vector<string> ContainerS;
StringSplit(char * in)
{
String = in;
}
StringSplit(string in)
{
size_t len = calc_string_size((char*)in.c_str());
String = new char[len + 1];
memset(String, 0, len + 1);
copy_string(String, (char*)in.c_str());
do_string = true;
}
~StringSplit()
{
for (int i = 0; i < Container.size(); i++)
{
if (Container[i] != NULL)
{
delete[] Container[i];
}
}
if (do_string)
{
delete[] String;
}
}
};
StringSplitter.cpp:
#include <string.h>
#include <iostream>
#include <vector>
#include "StringSplit.hpp"
using namespace std;
void StringSplit::assimilate(char*src, char delim)
{
int until = untilnextdelim(src, delim);
if (until > 0)
{
char * temp = new char[until + 1];
memset(temp, 0, until + 1);
copy_fragment(temp, src, delim);
if (keep_empty || *temp != 0)
{
if (!do_string)
{
Container.push_back(temp);
}
else
{
string x = temp;
ContainerS.push_back(x);
}
}
else
{
delete[] temp;
}
}
}
void StringSplit::assimilate(char*src, char* delim)
{
int until = untilnextdelim(src, delim);
if (until > 0)
{
char * temp = new char[until + 1];
memset(temp, 0, until + 1);
copy_fragment(temp, src, delim);
if (keep_empty || *temp != 0)
{
if (!do_string)
{
Container.push_back(temp);
}
else
{
string x = temp;
ContainerS.push_back(x);
}
}
else
{
delete[] temp;
}
}
}
long StringSplit::calc_string_size(char* _in)
{
long i = 0;
while (*_in++)
{
i++;
}
return i;
}
bool StringSplit::string_contains(char* haystack, char* needle)
{
size_t len = calc_string_size(needle);
size_t lenh = calc_string_size(haystack);
while (lenh--)
{
if (match_fragment(haystack + lenh, needle, len))
{
return true;
}
}
return false;
}
bool StringSplit::match_fragment(char* _src, char* cmp, int len)
{
while (len--)
{
if (*(_src + len) != *(cmp + len))
{
return false;
}
}
return true;
}
int StringSplit::untilnextdelim(char* _in, char delim)
{
size_t len = calc_string_size(_in);
if (*_in == delim)
{
_in += 1;
return len - 1;
}
int c = 0;
while (*(_in + c) != delim && c < len)
{
c++;
}
return c;
}
int StringSplit::untilnextdelim(char* _in, char* delim)
{
int s = calc_string_size(delim);
int c = 1 + s;
if (!string_contains(_in, delim))
{
return calc_string_size(_in);
}
else if (match_fragment(_in, delim, s))
{
_in += s;
return calc_string_size(_in);
}
while (!match_fragment(_in + c, delim, s))
{
c++;
}
return c;
}
void StringSplit::copy_fragment(char* dest, char* src, char delim)
{
if (*src == delim)
{
src++;
}
int c = 0;
while (*(src + c) != delim && *(src + c))
{
*(dest + c) = *(src + c);
c++;
}
*(dest + c) = 0;
}
void StringSplit::copy_string(char* dest, char* src)
{
int i = 0;
while (*(src + i))
{
*(dest + i) = *(src + i);
i++;
}
}
void StringSplit::copy_fragment(char* dest, char* src, char* delim)
{
size_t len = calc_string_size(delim);
size_t lens = calc_string_size(src);
if (match_fragment(src, delim, len))
{
src += len;
lens -= len;
}
int c = 0;
while (!match_fragment(src + c, delim, len) && (c < lens))
{
*(dest + c) = *(src + c);
c++;
}
*(dest + c) = 0;
}
vector<char*> StringSplit::split_cstr(char Delimiter)
{
int i = 0;
while (*String)
{
if (*String != Delimiter && i == 0)
{
assimilate(String, Delimiter);
}
if (*String == Delimiter)
{
assimilate(String, Delimiter);
}
i++;
String++;
}
String -= i;
delete[] String;
return Container;
}
vector<string> StringSplit::split_string(char Delimiter)
{
do_string = true;
int i = 0;
while (*String)
{
if (*String != Delimiter && i == 0)
{
assimilate(String, Delimiter);
}
if (*String == Delimiter)
{
assimilate(String, Delimiter);
}
i++;
String++;
}
String -= i;
delete[] String;
return ContainerS;
}
vector<char*> StringSplit::split_cstr(char* Delimiter)
{
int i = 0;
size_t LenDelim = calc_string_size(Delimiter);
while(*String)
{
if (!match_fragment(String, Delimiter, LenDelim) && i == 0)
{
assimilate(String, Delimiter);
}
if (match_fragment(String, Delimiter, LenDelim))
{
assimilate(String,Delimiter);
}
i++;
String++;
}
String -= i;
delete[] String;
return Container;
}
vector<string> StringSplit::split_string(char* Delimiter)
{
do_string = true;
int i = 0;
size_t LenDelim = calc_string_size(Delimiter);
while (*String)
{
if (!match_fragment(String, Delimiter, LenDelim) && i == 0)
{
assimilate(String, Delimiter);
}
if (match_fragment(String, Delimiter, LenDelim))
{
assimilate(String, Delimiter);
}
i++;
String++;
}
String -= i;
delete[] String;
return ContainerS;
}
Examples:
int main(int argc, char*argv[])
{
StringSplit ss = "This:CUT:is:CUT:an:CUT:example:CUT:cstring";
vector<char*> Split = ss.split_cstr(":CUT:");
for (int i = 0; i < Split.size(); i++)
{
cout << Split[i] << endl;
}
return 0;
}
Will output:
This
is
an
example
cstring
int main(int argc, char*argv[])
{
StringSplit ss = "This:is:an:example:cstring";
vector<char*> Split = ss.split_cstr(':');
for (int i = 0; i < Split.size(); i++)
{
cout << Split[i] << endl;
}
return 0;
}
int main(int argc, char*argv[])
{
string mystring = "This[SPLIT]is[SPLIT]an[SPLIT]example[SPLIT]string";
StringSplit ss = mystring;
vector<string> Split = ss.split_string("[SPLIT]");
for (int i = 0; i < Split.size(); i++)
{
cout << Split[i] << endl;
}
return 0;
}
int main(int argc, char*argv[])
{
string mystring = "This|is|an|example|string";
StringSplit ss = mystring;
vector<string> Split = ss.split_string('|');
for (int i = 0; i < Split.size(); i++)
{
cout << Split[i] << endl;
}
return 0;
}
To keep empty entries (by default empties will be excluded):
StringSplit ss = mystring;
ss.keep_empty = true;
vector<string> Split = ss.split_string(":DELIM:");
The goal was to make it similar to C#'s Split() method, where splitting a string is as easy as:
String[] Split =
"Hey:cut:what's:cut:your:cut:name?".Split(new[]{":cut:"}, StringSplitOptions.None);
foreach(String X in Split)
{
Console.Write(X);
}
I hope someone else can find this as useful as I do.
小智 10
What about this:
#include <string>
#include <vector>
using namespace std;
vector<string> split(string str, const char delim) {
vector<string> v;
string tmp;
for(string::const_iterator i = str.begin(); i != str.end(); ++i) {
if(*i != delim) {
tmp += *i;
} else {
v.push_back(tmp);
tmp = "";
}
}
v.push_back(tmp); // last token; avoids dereferencing str.end()
return v;
}
Sam*_*m B 10
I cannot believe how overly complicated most of these answers were. Why didn't someone suggest something as simple as this?
#include <iostream>
#include <sstream>
std::string input = "This is a sentence to read";
std::istringstream ss(input);
std::string token;
while(std::getline(ss, token, ' ')) {
std::cout << token << std::endl;
}
小智 9
Here's another way of doing it..
void split_string(string text,vector<string>& words)
{
int i=0;
char ch;
string word;
while(ch=text[i++])
{
if (isspace(ch))
{
if (!word.empty())
{
words.push_back(word);
}
word = "";
}
else
{
word += ch;
}
}
if (!word.empty())
{
words.push_back(word);
}
}
I like to use the boost/regex methods for this task since they provide maximum flexibility for specifying the splitting criteria.
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int main() {
std::string line("A:::line::to:split");
const boost::regex re(":+"); // one or more colons
// -1 means find inverse matches aka split
boost::sregex_token_iterator tokens(line.begin(),line.end(),re,-1);
boost::sregex_token_iterator end;
for (; tokens != end; ++tokens)
std::cout << *tokens << std::endl;
}
Recently I had to split a camel-cased word into subwords. There are no delimiters, just upper-case characters.
#include <string>
#include <list>
#include <locale> // std::isupper
template<class String>
const std::list<String> split_camel_case_string(const String &s)
{
std::list<String> R;
String w;
for (typename String::const_iterator i = s.begin(); i < s.end(); ++i) {
if (std::isupper(*i, std::locale())) { // <locale> overload takes the locale explicitly
if (w.length()) {
R.push_back(w);
w.clear();
}
}
w += *i;
}
if (w.length())
R.push_back(w);
return R;
}
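For instance, this splits "AQueryTrades" into "A", "Query" and "Trades". The function works with narrow and wide strings. Because it respects the current locale, it splits "RaumfahrtÜberwachungsVerordnung" into "Raumfahrt", "Überwachungs" and "Verordnung".
Note that std::isupper should really be passed as a function template argument. A more generalized version of this function could then also split at delimiters such as ",", ";" or " ".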
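One way to read that suggestion is to pass the boundary test as a predicate. The sketch below is an assumption about what such a generalization could look like, not the author's code; split_at, split_camel_case and split_on_delims are hypothetical names, and the keep_boundary flag is an addition that distinguishes camel-case splitting (the upper-case letter starts the next word) from delimiter splitting (the delimiter is dropped).

```cpp
#include <cctype>
#include <list>
#include <string>

// Hypothetical generalization: is_boundary decides where a new part starts.
template <class String, class Pred>
std::list<String> split_at(const String &s, Pred is_boundary, bool keep_boundary) {
    std::list<String> parts;
    String current;
    for (typename String::const_iterator i = s.begin(); i != s.end(); ++i) {
        if (is_boundary(*i)) {
            if (!current.empty()) {
                parts.push_back(current);
                current.clear();
            }
            if (!keep_boundary) continue;  // drop delimiters such as ',' or ';'
        }
        current += *i;
    }
    if (!current.empty()) parts.push_back(current);
    return parts;
}

// Camel case: an upper-case letter opens a new part and is kept.
std::list<std::string> split_camel_case(const std::string &s) {
    return split_at(s, [](char c) {
        return std::isupper(static_cast<unsigned char>(c)) != 0;
    }, true);
}

// Delimiters: any character contained in delims separates parts and is dropped.
std::list<std::string> split_on_delims(const std::string &s, const std::string &delims) {
    return split_at(s, [&delims](char c) {
        return delims.find(c) != std::string::npos;
    }, false);
}
```

This sketch uses the <cctype> std::isupper for brevity, so unlike the function above it does not consult a std::locale object.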
#include<iostream>
#include<string>
#include<sstream>
#include<vector>
using namespace std;
vector<string> split(const string &s, char delim) {
vector<string> elems;
stringstream ss(s);
string item;
while (getline(ss, item, delim)) {
elems.push_back(item);
}
return elems;
}
int main() {
vector<string> x = split("thi is an sample test",' ');
unsigned int i;
for(i=0;i<x.size();i++)
cout<<i<<":"<<x[i]<<endl;
return 0;
}
This answer takes the string and puts it into a vector of strings. It uses the Boost library.
#include <boost/algorithm/string.hpp>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));
The code below uses strtok() to split a string into tokens and stores the tokens in a vector.
#include <iostream>
#include <algorithm>
#include <cstring> // strtok
#include <vector>
#include <string>
using namespace std;
char one_line_string[] = "hello hi how are you nice weather we are having ok then bye";
char seps[] = " ,\t\n";
char *token;
int main()
{
vector<string> vec_String_Lines;
token = strtok( one_line_string, seps );
cout << "Extracting and storing data in a vector..\n\n\n";
while( token != NULL )
{
vec_String_Lines.push_back(token);
token = strtok( NULL, seps );
}
cout << "Displaying end result in vector line storage..\n\n";
for ( int i = 0; i < vec_String_Lines.size(); ++i)
cout << vec_String_Lines[i] << "\n";
cout << "\n\n\n";
return 0;
}
Get Boost! :-)
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string.hpp>
#include <iostream>
#include <vector>
using namespace std;
using namespace boost;
int main(int argc, char**argv) {
typedef vector < string > list_type;
list_type list;
string line;
line = "Somewhere down the road";
split(list, line, is_any_of(" "));
for(int i = 0; i < list.size(); i++)
{
cout << list[i] << endl;
}
return 0;
}
This example gives the output -
Somewhere
down
the
road
小智 7
I use this simple function because our String class is "special" (i.e. not standard):
void splitString(const String &s, const String &delim, std::vector<String> &result) {
const int l = delim.length();
int f = 0;
int i = s.indexOf(delim,f);
while (i>=0) {
String token( i-f > 0 ? s.substring(f,i-f) : "");
result.push_back(token);
f=i+l;
i = s.indexOf(delim,f);
}
String token = s.substring(f);
result.push_back(token);
}
#include <iostream>
#include <regex>
using namespace std;
int main() {
string s = "foo bar baz";
regex e("\\s+");
regex_token_iterator<string::iterator> i(s.begin(), s.end(), e, -1);
regex_token_iterator<string::iterator> end;
while (i != end)
cout << " [" << *i++ << "]";
}
IMO this is the closest thing to Python's re.split(). See cplusplus.com for more information about regex_token_iterator. The -1 (4th argument in the regex_token_iterator ctor) selects the sections of the sequence that are not matched, using the match as separator.
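To make the role of that fourth argument more concrete, the sketch below contrasts -1 (the text between matches) with 0 (the matches themselves). The function names split_by_separator and match_words and the "\\S+" pattern are assumptions chosen for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Submatch -1: iterate over the pieces between matches of the separator.
std::vector<std::string> split_by_separator(const std::string &s) {
    static const std::regex sep("\\s+");
    std::sregex_token_iterator first(s.begin(), s.end(), sep, -1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}

// Submatch 0: iterate over the matches themselves (runs of non-whitespace).
std::vector<std::string> match_words(const std::string &s) {
    static const std::regex word("\\S+");
    std::sregex_token_iterator first(s.begin(), s.end(), word, 0);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

For an input without leading or trailing whitespace, both functions should return the same words; they differ in whether the pattern describes the separators or the tokens.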
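Although there was an answer providing a C++20 solution, some changes have been made since it was posted and applied to C++20 as Defect Reports. Because of that, the solution is a little bit shorter and nicer: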
#include <iostream>
#include <ranges>
#include <string_view>
namespace views = std::views;
using str = std::string_view;
constexpr str text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.";
auto splitByWords(str input) {
return input
| views::split(' ')
| views::transform([](auto &&r) -> str {
return {r.begin(), r.end()};
});
}
auto main() -> int {
for (str &&word : splitByWords(text)) {
std::cout << word << '\n';
}
}
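As of today it is still available only on the trunk branch of GCC (Godbolt link). It is based on two changes: the P1391 iterator constructor for std::string_view and the P2210 DR fixing std::views::split to preserve the range type.
In C++23 there won't be any transform boilerplate needed, since P1989 adds a range constructor to std::string_view: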
#include <iostream>
#include <ranges>
#include <string_view>
namespace views = std::views;
constexpr std::string_view text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.";
auto main() -> int {
for (std::string_view&& word : text | views::split(' ')) {
std::cout << word << '\n';
}
}
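(Godbolt link)
The following is a much better way to do this. It can take any character, and doesn't split lines unless you want it to. No special libraries needed (well, besides std, but who really considers that an extra library), no pointers, no references, and it's static. Just simple plain C++.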
#pragma once
#include <vector>
#include <sstream>
using namespace std;
class Helpers
{
public:
static vector<string> split(string s, char delim)
{
stringstream temp (stringstream::in | stringstream::out);
vector<string> elems(0);
if (s.size() == 0 || delim == 0)
return elems;
for(char c : s)
{
if(c == delim)
{
elems.push_back(temp.str());
temp = stringstream(stringstream::in | stringstream::out);
}
else
temp << c;
}
if (temp.str().size() > 0)
elems.push_back(temp.str());
return elems;
}
//Splits string s with a list of delimiters in delims (it's just a list, like if we wanted to
//split at the following letters, a, b, c we would make delims="abc".
static vector<string> split(string s, string delims)
{
stringstream temp (stringstream::in | stringstream::out);
vector<string> elems(0);
bool found;
if(s.size() == 0 || delims.size() == 0)
return elems;
for(char c : s)
{
found = false;
for(char d : delims)
{
if (c == d)
{
elems.push_back(temp.str());
temp = stringstream(stringstream::in | stringstream::out);
found = true;
break;
}
}
if(!found)
temp << c;
}
if(temp.str().size() > 0)
elems.push_back(temp.str());
return elems;
}
};
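I wrote the following piece of code. You can specify a delimiter, which can be a string. The result is similar to Java's String.split, with empty strings included in the result.
For example, if we call split("ABCPICKABCANYABCTWO:ABC", "ABC"), the result is as follows: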
0 <len:0>
1 PICK <len:4>
2 ANY <len:3>
3 TWO: <len:4>
4 <len:0>
Code:
vector <string> split(const string& str, const string& delimiter = " ") {
vector <string> tokens;
string::size_type lastPos = 0;
string::size_type pos = str.find(delimiter, lastPos);
while (string::npos != pos) {
// Found a token, add it to the vector.
cout << str.substr(lastPos, pos - lastPos) << endl;
tokens.push_back(str.substr(lastPos, pos - lastPos));
lastPos = pos + delimiter.size();
pos = str.find(delimiter, lastPos);
}
tokens.push_back(str.substr(lastPos, str.size() - lastPos));
return tokens;
}
Here's my solution using C++11 and the STL. It should be reasonably efficient:
#include <vector>
#include <string>
#include <cctype> // std::isspace
#include <iostream>
#include <algorithm>
#include <functional>
std::vector<std::string> split(const std::string& s)
{
std::vector<std::string> v;
const auto end = s.end();
auto to = s.begin();
decltype(to) from;
while((from = std::find_if(to, end,
[](char c){ return !std::isspace(c); })) != end)
{
to = std::find_if(from, end, [](char c){ return std::isspace(c); });
v.emplace_back(from, to);
}
return v;
}
int main()
{
std::string s = "this is the string to split";
auto v = split(s);
for(auto&& s: v)
std::cout << s << '\n';
}
Output:
this
is
the
string
to
split
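When dealing with whitespace as separator, the obvious answer of using std::istream_iterator<T> is already given and voted up a lot. Of course, elements may not be separated by whitespace but by some separator instead. I didn't spot any answer which just redefines the meaning of whitespace to be said separator and then uses the conventional approach.
To change what streams consider whitespace, you simply change the stream's std::locale (using std::istream::imbue()) to one with a std::ctype<char> facet that has its own definition of what whitespace means (it can be done for std::ctype<wchar_t>, too, but it is actually slightly different because std::ctype<char> is table-driven while std::ctype<wchar_t> is driven by virtual functions).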
#include <iostream>
#include <algorithm>
#include <iterator>
#include <sstream>
#include <locale>
struct whitespace_mask {
std::ctype_base::mask mask_table[std::ctype<char>::table_size];
whitespace_mask(std::string const& spaces) {
std::ctype_base::mask* table = this->mask_table;
std::ctype_base::mask const* tab
= std::use_facet<std::ctype<char>>(std::locale()).table();
for (std::size_t i(0); i != std::ctype<char>::table_size; ++i) {
table[i] = tab[i] & ~std::ctype_base::space;
}
std::for_each(spaces.begin(), spaces.end(), [=](unsigned char c) {
table[c] |= std::ctype_base::space;
});
}
};
class whitespace_facet
: private whitespace_mask
, public std::ctype<char> {
public:
whitespace_facet(std::string const& spaces)
: whitespace_mask(spaces)
, std::ctype<char>(this->mask_table) {
}
};
struct whitespace {
std::string spaces;
whitespace(std::string const& spaces): spaces(spaces) {}
};
std::istream& operator>>(std::istream& in, whitespace const& ws) {
std::locale loc(in.getloc(), new whitespace_facet(ws.spaces));
in.imbue(loc);
return in;
}
// everything above would probably go into a utility library...
int main() {
std::istringstream in("a, b, c, d, e");
std::copy(std::istream_iterator<std::string>(in >> whitespace(", ")),
std::istream_iterator<std::string>(),
std::ostream_iterator<std::string>(std::cout, "\n"));
std::istringstream pipes("a b c| d |e e");
std::copy(std::istream_iterator<std::string>(pipes >> whitespace("|")),
std::istream_iterator<std::string>(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
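Most of the code is there to package up a general-purpose tool providing soft delimiters: multiple delimiters in a row are merged, so there is no way to produce an empty token. When different delimiters are needed within one stream, you would probably use differently configured streams that share a stream buffer: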
void f(std::istream& in) {
std::istream pipes(in.rdbuf());
pipes >> whitespace("|");
std::istream comma(in.rdbuf());
comma >> whitespace(",");
std::string s0, s1;
if (pipes >> s0 >> std::ws // read up to first pipe and ignore sequence of pipes
&& comma >> s1 >> std::ws) { // read up to first comma and ignore commas
// ...
}
}
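To make the "soft delimiter" behaviour concrete, here is a compact sketch of the same facet technique wrapped in a function that returns a vector. The names delimiter_masks, delimiter_facet and split_soft are hypothetical, and I start from the classic "C" locale table rather than the global locale, which is an assumption on my part.

```cpp
#include <cstddef>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Holds the mask table in a base class so that it is fully built
// before the std::ctype<char> base receives a pointer to it.
struct delimiter_masks {
    std::ctype_base::mask masks[std::ctype<char>::table_size];
    explicit delimiter_masks(const std::string& delims) {
        const std::ctype_base::mask* classic =
            std::use_facet<std::ctype<char>>(std::locale::classic()).table();
        // Copy the classic table, clearing the "space" bit everywhere.
        for (std::size_t i = 0; i != std::ctype<char>::table_size; ++i)
            masks[i] = static_cast<std::ctype_base::mask>(
                classic[i] & ~std::ctype_base::space);
        // Mark only the requested delimiters as whitespace.
        for (unsigned char c : delims)
            masks[c] = static_cast<std::ctype_base::mask>(
                masks[c] | std::ctype_base::space);
    }
};

class delimiter_facet : private delimiter_masks, public std::ctype<char> {
public:
    explicit delimiter_facet(const std::string& delims)
        : delimiter_masks(delims), std::ctype<char>(this->masks) {}
};

// Splits text on any character in delims; consecutive delimiters are
// merged, so no empty tokens are produced.
std::vector<std::string> split_soft(const std::string& text,
                                    const std::string& delims) {
    std::istringstream in(text);
    // The locale takes ownership of the facet and deletes it.
    in.imbue(std::locale(in.getloc(), new delimiter_facet(delims)));
    return std::vector<std::string>(std::istream_iterator<std::string>(in),
                                    std::istream_iterator<std::string>());
}
```

Note that once only '|' counts as whitespace, ordinary spaces become part of the tokens, which is the same effect as in the pipes example of the answer.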
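As a hobbyist, this is the first solution that came to my mind. I'm kind of curious why I haven't seen a similar solution here yet. Is there something fundamentally wrong with how I did it?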
#include <iostream>
#include <string>
#include <vector>
std::vector<std::string> split(const std::string &s, const std::string &delims)
{
std::vector<std::string> result;
std::string::size_type pos = 0;
while (std::string::npos != (pos = s.find_first_not_of(delims, pos))) {
auto pos2 = s.find_first_of(delims, pos);
result.emplace_back(s.substr(pos, std::string::npos == pos2 ? pos2 : pos2 - pos));
pos = pos2;
}
return result;
}
int main()
{
std::string text{"And then I said: \"I don't get it, why would you even do that!?\""};
std::string delims{" :;\".,?!"};
auto words = split(text, delims);
std::cout << "\nSentence:\n " << text << "\n\nWords:";
for (const auto &w : words) {
std::cout << "\n " << w;
}
return 0;
}
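One behavioural point to be aware of (not necessarily a flaw): because the loop starts each token with find_first_not_of, runs of delimiters collapse and empty fields are never returned. If empty fields matter, for example in CSV-like data, a sketch along these lines keeps them. The name split_keep_empty is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical counterpart to the split() above: every delimiter ends a
// field, so adjacent delimiters yield empty strings.
std::vector<std::string> split_keep_empty(const std::string& s,
                                          const std::string& delims) {
    std::vector<std::string> result;
    std::string::size_type start = 0;
    while (true) {
        const auto end = s.find_first_of(delims, start);
        if (end == std::string::npos) {
            // No more delimiters: the rest of the string is the last field.
            result.push_back(s.substr(start));
            break;
        }
        result.push_back(s.substr(start, end - start));
        start = end + 1;  // step over the single delimiter character
    }
    return result;
}
```

Which of the two behaviours is right depends on the input: for splitting a sentence into words the original is probably what you want, while for delimited records the empty fields usually carry meaning.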
Using std::string_view and Eric Niebler's range-v3 library:
https://wandbox.org/permlink/kW5lwRCL1pxjp2pW
#include <iostream>
#include <string>
#include <string_view>
#include "range/v3/view.hpp"
#include "range/v3/algorithm.hpp"
int main() {
std::string s = "Somewhere down the range v3 library";
ranges::for_each(s
| ranges::view::split(' ')
| ranges::view::transform([](auto &&sub) {
return std::string_view(&*sub.begin(), ranges::distance(sub));
}),
[](auto s) {std::cout << "Substring: " << s << "\n";}
);
}
The same thing using a range-based for loop instead of the ranges::for_each algorithm:
#include <iostream>
#include <string>
#include <string_view>
#include "range/v3/view.hpp"
int main()
{
std::string str = "Somewhere down the range v3 library";
for (auto s : str | ranges::view::split(' ')
| ranges::view::transform([](auto&& sub) { return std::string_view(&*sub.begin(), ranges::distance(sub)); }
))
{
std::cout << "Substring: " << s << "\n";
}
}
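If pulling in range-v3 is not an option, the std::string_view part of this idea can be sketched with the C++17 standard library alone. This is my own illustration rather than anything from the library: split_views is a hypothetical name, and the returned views are only valid while the original string is alive and unmodified.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: returns views into text, one per field.
// No characters are copied; empty fields are kept.
std::vector<std::string_view> split_views(std::string_view text,
                                          char delim = ' ') {
    std::vector<std::string_view> out;
    std::size_t start = 0;
    while (start <= text.size()) {
        std::size_t end = text.find(delim, start);
        if (end == std::string_view::npos)
            end = text.size();  // last field runs to the end
        out.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    return out;
}
```

Each view can be streamed directly, e.g. for (auto sv : split_views(str)) std::cout << "Substring: " << sv << "\n";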
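Everyone has answered for a predefined string. I think this answer will help someone who needs to split input read at run time.
I use a vector called tokens to hold the string tokens. It is optional.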
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std ;
int main()
{
string str, token ;
getline(cin, str) ; // get the string as input
istringstream ss(str); // insert the string into tokenizer
vector<string> tokens; // vector tokens holds the tokens
while (ss >> token) tokens.push_back(token); // splits the tokens
for(auto x : tokens) cout << x << endl ; // prints the tokens
return 0;
}
Sample input:
port city international university
Sample output:
port
city
international
university
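Note that by default this only works with whitespace as the delimiter. To use a custom delimiter you have to change the code slightly. Say the delimiter is ','. Then use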
char delimiter = ',' ;
while(getline(ss, token, delimiter)) tokens.push_back(token) ;
instead of
while (ss >> token) tokens.push_back(token);
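The two loops do not behave identically, which is easy to miss. A small sketch wrapping the getline form in a function makes the difference testable; split_line is a hypothetical name. Unlike operator>>, std::getline keeps empty tokens between adjacent delimiters, but a delimiter at the very end of the input does not add a trailing empty token.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical wrapper around the getline loop shown above.
std::vector<std::string> split_line(const std::string& line, char delimiter) {
    std::istringstream ss(line);
    std::vector<std::string> tokens;
    std::string token;
    // Each call reads up to (and consumes) the next delimiter.
    while (std::getline(ss, token, delimiter))
        tokens.push_back(token);
    return tokens;
}
```

So for the input "a,,b," with ',' as the delimiter you get three tokens: "a", an empty string, and "b".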