Ear*_*own 6 c++ sorting r std rcpp
I can successfully sort Spanish words with accented vowels by passing a UTF-8 locale to std::sort:
// [[Rcpp::export]]
std::vector<std::string> sort_words(std::vector<std::string> x) {
std::sort(x.begin(), x.end(), std::locale("en_US.UTF-8"));
return x;
}
/*** R
words <- c("casa", "árbol", "zona", "árbol", "casa", "libro")
sort_words(words)
*/
returns (as expected):
[1] "árbol" "árbol" "casa" "casa" "libro" "zona"
But I can't figure out how to do the same with a map:
// slightly modified version of tableC on http://adv-r.had.co.nz/Rcpp.html
// [[Rcpp::export]]
std::map<String, int> table_words(CharacterVector x) {
std::setlocale(LC_ALL, "en_US.UTF-8");
// std::setlocale(LC_COLLATE, "en_US.UTF-8"); // also tried this instead of previous line
std::map<String, int> counts;
int n = x.size();
for (int i = 0; i < n; i++) {
counts[x[i]]++;
}
return counts;
}
/*** R
words <- c("casa", "árbol", "zona", "árbol", "casa", "libro")
table_words(words)
*/
returns:
casa libro zona árbol
2 1 1 2
but I want:
árbol casa libro zona
2 2 1 1
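Any ideas on how to make table_words put the accented "árbol" before "casa", either with Rcpp or even back out in R, with base::sort?
Also, std::sort(..., std::locale("en_US.UTF-8")) only sorts the words correctly on my Linux machine, with gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1). It doesn't work on Mac 10.10.3, with Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn). Any clues about what my Mac compiler is missing that my Linux compiler has?
Here is my script and my sessionInfo, for both machines: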
// [[Rcpp::plugins(cpp11)]]
#include <locale>
#include <clocale>
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
std::vector<std::string> sort_words(std::vector<std::string> x) {
std::sort(x.begin(), x.end(), std::locale("en_US.UTF-8"));
return x;
}
// [[Rcpp::export]]
std::map<String, int> table_words(CharacterVector x) {
// std::setlocale(LC_ALL, "en_US.UTF-8"); // tried this instead of next line
std::setlocale(LC_COLLATE, "en_US.UTF-8");
std::map<String, int> counts;
int n = x.size();
for (int i = 0; i < n; i++) {
counts[x[i]]++;
}
return counts;
}
/*** R
words <- c("casa", "árbol", "zona", "árbol", "casa", "libro")
sort_words(words)
table_words(words)
sort(table_words(words), decreasing = T)
output_from_Rcpp <- table_words(words)
sort(names(output_from_Rcpp))
*/
> words <- c("casa", "árbol", "zona", "árbol", "casa", "libro")
> sort_words(words)
[1] "árbol" "árbol" "casa" "casa" "libro" "zona"
> table_words(words)
casa libro zona árbol
2 1 1 2
> sort(table_words(words), decreasing = T)
casa árbol libro zona
2 2 1 1
> output_from_Rcpp <- table_words(words)
> sort(names(output_from_Rcpp))
[1] "árbol" "casa" "libro" "zona"
sessionInfo on linux machine:
R version 3.2.0 (2015-04-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04 LTS
locale:
[1] en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_3.2.0 Rcpp_0.11.6
sessionInfo on Mac:
R version 3.2.1 (2015-06-18)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.3 (Yosemite)
locale:
[1] en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] textcat_1.0-3 readr_0.1.1 rvest_0.2.0
loaded via a namespace (and not attached):
[1] httr_1.0.0 selectr_0.2-3 R6_2.1.0 magrittr_1.5 tools_3.2.1 curl_0.9.1 Rcpp_0.11.6 slam_0.1-32 stringi_0.5-5
[10] tau_0.0-18 stringr_1.0.0 XML_3.98-1.3
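Applying std::sort to a std::map makes no sense, because a map is always sorted, by definition. That ordering is part of the concrete type the template is instantiated with. std::map has a third, "hidden" type parameter for the comparison function used to order the keys, which defaults to std::less of the key type. See http://en.cppreference.com/w/cpp/container/map.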
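To make the role of that third template parameter concrete, here is a minimal sketch that builds the same counts with the default comparator and with std::greater. The names keys_in_order, make_counts, AscendingMap and DescendingMap are hypothetical, and the keys are ASCII only so the result does not depend on any locale.

```cpp
#include <functional>
#include <initializer_list>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: collect the keys of a map in its iteration order.
template <typename Map>
std::vector<std::string> keys_in_order(const Map& m) {
    std::vector<std::string> keys;
    for (const auto& entry : m) keys.push_back(entry.first);
    return keys;
}

// Third template argument omitted: defaults to std::less<std::string>.
using AscendingMap = std::map<std::string, int>;

// Same key and value types, but a different comparator, hence a different type.
using DescendingMap = std::map<std::string, int, std::greater<std::string>>;

// Hypothetical helper: count the same ASCII words into either map type.
template <typename Map>
Map make_counts() {
    Map m;
    for (const char* w : {"casa", "zona", "libro", "casa"}) ++m[w];
    return m;
}
```

Iterating make_counts<AscendingMap>() visits casa, libro, zona, while make_counts<DescendingMap>() visits them in reverse. The order comes from the map's type, not from any later call to a sort function.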
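In your case, you can use std::locale as the comparison type and pass std::locale("en-US") (or whatever is appropriate for your system) to the constructor.
Here is an example. It uses C++11, but you can easily use the same solution in C++03.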
#include <map>
#include <iostream>
#include <string>
#include <locale>
#include <exception>

// std::locale is the comparator: its operator() compares two strings
// using the locale's collation rules.
using Map = std::map<std::string, int, std::locale>;

int main()
{
    try
    {
        // The locale name is system-dependent; the constructor throws if it is unknown.
        Map map(std::locale("en-US"));
        map["casa"] = 1;
        map["árbol"] = 2;
        map["zona"] = 3;
        map["árbol"] = 4;
        map["casa"] = 5;
        map["libro"] = 6;

        for (auto const& map_entry : map)
        {
            std::cout << map_entry.first << " -> " << map_entry.second << "\n";
        }
    }
    catch (std::exception const& exc)
    {
        std::cerr << exc.what() << "\n";
    }
}

Output:
árbol -> 4
casa -> 5
libro -> 6
zona -> 3

Of course, you have to be aware that std::locale is highly implementation-dependent. You may be better off using Boost.Locale.
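Another problem is that this solution may look confusing, because std::locale is not exactly something many programmers associate with comparison functions. It is a bit too clever.
So here is a possibly more readable alternative: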
#include <map>
#include <iostream>
#include <string>
#include <locale>
#include <exception>

// A named comparator makes the intent explicit: keys are ordered by locale collation.
struct ComparisonUsingLocale
{
    std::locale locale{ "en-US" };

    bool operator()(std::string const& lhs, std::string const& rhs) const
    {
        return locale(lhs, rhs);
    }
};

using Map = std::map<std::string, int, ComparisonUsingLocale>;

int main()
{
    try
    {
        Map map;
        map["casa"] = 1;
        map["árbol"] = 2;
        map["zona"] = 3;
        map["árbol"] = 4;
        map["casa"] = 5;
        map["libro"] = 6;

        for (auto const& map_entry : map)
        {
            std::cout << map_entry.first << " -> " << map_entry.second << "\n";
        }
    }
    catch (std::exception const& exc)
    {
        std::cerr << exc.what() << "\n";
    }
}
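Closer to the counting task in the question, the same idea can be sketched as a plain C++ function that tallies words into a locale-ordered map. This is only a sketch under stated assumptions: count_words, locale_or_classic and LocaleMap are hypothetical names, the fallback to the classic "C" locale is one possible way to cope with a locale name the system does not provide, and returning such a map to R through Rcpp is not covered here.

```cpp
#include <locale>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// A std::locale object is usable as the comparator because its operator()
// compares two strings with the locale's collate facet.
using LocaleMap = std::map<std::string, int, std::locale>;

// Hypothetical helper: try a named locale and fall back to the classic "C"
// locale if the name is unknown (std::locale's constructor throws in that case).
std::locale locale_or_classic(const char* name) {
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Hypothetical helper: count each word; iteration order follows loc's collation.
LocaleMap count_words(const std::vector<std::string>& words, const std::locale& loc) {
    LocaleMap counts(loc);  // the comparator is passed to the map's constructor
    for (const std::string& w : words) ++counts[w];
    return counts;
}
```

A call such as count_words(words, locale_or_classic("en_US.UTF-8")) would then iterate in that locale's collation order where the locale is installed. Whether "árbol" actually lands before "casa" still depends on the platform's locale implementation, as the Linux versus Mac difference in the question suggests.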