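Posts by use*_*024

How do I control the number of exponent digits after 'e' in C printf %e?

I want to control the number of digits in the exponent that C's printf %e prints after the 'e'.

For example, printf("%e") produces 2.35e+03, but I want 2.35e+003. I need a three-digit exponent. How can I get that with printf?

Code: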

#include<stdio.h>
int main()
{
    double x=34523423.52342353;
    printf("%.3g\n%.3e",x,x);
    return 0;
}


Result: http://codepad.org/dSLzQIrn

3.45e+07
3.452e+07


What I want:

3.45e+007
3.452e+007


Interestingly, I do get the result I want on Windows with MinGW.
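Standard C only guarantees that %e prints at least two exponent digits and offers no flag to ask for more, so one possible workaround is to format into a buffer and pad the exponent afterwards. The sketch below assumes that approach; format_exp and its parameters are hypothetical names, and it is written to compile as either C99 or C++.

```cpp
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format x like "%.*e", then left-pad the exponent with
   zeros until it has at least min_exp_digits digits.
   Returns 0 on success, -1 if a buffer is too small. */
int format_exp(char *out, size_t out_size, double x, int precision,
               size_t min_exp_digits)
{
    char tmp[64];
    int n = snprintf(tmp, sizeof tmp, "%.*e", precision, x);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    char *e = strchr(tmp, 'e');
    if (e == NULL) {                      /* e.g. "inf" or "nan": nothing to pad */
        if ((size_t)n >= out_size)
            return -1;
        strcpy(out, tmp);
        return 0;
    }

    size_t head = (size_t)(e - tmp) + 2;  /* mantissa, 'e', and the sign */
    size_t digits = strlen(tmp + head);   /* exponent digits already printed */
    size_t pad = digits < min_exp_digits ? min_exp_digits - digits : 0;
    if (head + pad + digits >= out_size)
        return -1;

    memcpy(out, tmp, head);               /* copy up to and including the sign */
    memset(out + head, '0', pad);         /* insert the extra zeros */
    strcpy(out + head + pad, tmp + head); /* copy the original exponent digits */
    return 0;
}
```

With x from the question, format_exp(buf, sizeof buf, x, 3, 3) fills buf with 3.452e+007. For the %.3g line, precision 2 gives the same three significant digits, so format_exp(buf, sizeof buf, x, 2, 3) yields 3.45e+007. On a runtime that already prints three exponent digits, the helper adds no padding.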

c printf

use*_*024 · 2015-07-10 · 8 votes · 1 answer · 7,686 views

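How do I check whether an ifstream has reached the end of the file in C++?

I need to read all chunks of a large file (about 10 GB) sequentially. The file contains many floating-point numbers mixed with a few strings, like this (items are separated by '\n'):

6.292611
-1.078219E-266
-2.305673E+065
sod;eiwo
4.899747e-237
1.673940e+089
-4.515213

Each time, I read MAX_NUM_PER_FILE items, process them, and write them to another file, but I cannot tell when the ifstream ends. Here is my code: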

ifstream file_input(path_input);  //my file is a text file, but i tried  both text and binary mode, both failed.
if(file_input)
{
    file_input.seekg(0,file_input.end);
    unsigned long long length = file_input.tellg();    //get file size
    file_input.seekg(0,file_input.beg);

    char * buffer = new char [MAX_NUM_PER_FILE+MAX_NUM_PER_LINE];
    int i=1,j;
    char c,tmp[3];
    while(file_input.tellg()<length)
    {
        file_input.read(buffer,MAX_NUM_PER_FILE);
        j=MAX_NUM_PER_FILE;
        while(file_input.get(c)&&c!='\n')
            buffer[j++]=c;   //get a complete item

        //process with buffer...

        itoa(i++,tmp,10);    //int2char
        string out_name="out"+string(tmp)+".txt";
        ofstream file_output(out_name);
        file_output.write(buffer,j);
        file_output.close();
    }

    file_input.close();
    delete[] buffer; …


c++ line-endings file-handling ifstream seekg

use*_*024 · 2017-05-23 · 5 votes · 1 answer · about 10,000 views

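How can I efficiently retrieve the top-K most similar documents by cosine similarity in Python?

I am working with one hundred thousand (100,000) documents (average length about 500 terms). For each document, I want the top k (e.g. k = 5) most similar documents by cosine similarity. How can I do this efficiently in Python?

Here is what I did: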

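1. For each document, do text segmentation, remove stop words, and count term frequencies (tf).

2. This gives a tf matrix of about 100,000 documents × 600,000 terms.

3. Compute 1 - pairwise_distances(tf_matrix, metric="cosine").

4. For each document, take the top k similar documents.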
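For reference, the cosine-similarity step above can be written out as below. The symbols $x_i$ (row $i$ of the tf matrix), $\hat{x}_i$, $\hat{X}$, and $S$ are notation introduced here, and the memory figure assumes the result is stored densely as 8-byte floats.

```latex
\cos(x_i, x_j) = \frac{x_i \cdot x_j}{\lVert x_i \rVert_2 \, \lVert x_j \rVert_2}
               = \hat{x}_i \cdot \hat{x}_j,
\qquad \hat{x}_i = \frac{x_i}{\lVert x_i \rVert_2}

S = \hat{X} \hat{X}^{\top}, \qquad
S \in \mathbb{R}^{10^5 \times 10^5}
\;\Rightarrow\; 10^{10} \text{ entries} \times 8 \text{ bytes} = 80\ \text{GB}
```

Each row of $S$ depends on only one row of $\hat{X}$, so the product can be formed one block of rows at a time while keeping just the k largest entries per row.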

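I ran my code on an i5 at 2.5 GHz. Twelve hours have passed and it is still running, so I would like to know how to optimize my code or procedure.

Here are my thoughts:

1. For each document, do feature selection and keep only terms with tf > 1.

2. Cluster first, then compute cosine similarity only within each cluster.

3. Since I only need the top k similar documents, do I need to compute all pairwise cosine similarities?

4. Python GPU programming or parallel programming?

So, do you have any good ideas?

Many thanks.

I know there is a similar question, but it is not what I want.

Update 1

Thanks to @orange. After profiling, I found that step 2 is the bottleneck! Here is the sample code: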

def construct_dt_matrix():
    dt_matrix = pd.DataFrame(columns=['docid'])
    docid = 0
    for f in files:
        # text segmentation for f
        # remove stop words
        # word count store …

python algorithm tf-idf feature-selection cosine-similarity

use*_*024 · 2017-05-23 · 5 votes · 1 answer · 2,005 views

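How do I read a file with multiple threads in C++11?

I have a large file that I must read in chunks. Each time I read a chunk I have to do a time-consuming operation on it, so I think multithreaded reading might help: each thread reads one chunk at a time and then does the operation. Here is my code in C++11: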

#include<iostream>
#include<fstream>
#include <condition_variable>
#include <mutex>
#include <thread>
using namespace std;
const int CHAR_PER_FILE = 1e8;
const int NUM_THREAD = 2;
int order = -1;
bool is_reading = false;
mutex mtx;
condition_variable file_not_reading;
void partition(ifstream& is)
{
    while (is.peek() != EOF)
    {
        unique_lock<mutex> lock(mtx);
        while (is_reading)
            file_not_reading.wait(lock);
        is_reading = true;
        char *c = new char[CHAR_PER_FILE];
        is.read(c, CHAR_PER_FILE);
        order++;
        is_reading = false;
        file_not_reading.notify_all();
        lock.unlock();
        char oc[3];
        sprintf(oc, "%d", order);
        this_thread::sleep_for(chrono::milliseconds(2000));//some operations that take long time
        ofstream os(oc, ios::binary);
        …

c++ multithreading mutex c++11

use*_*024 · 2015-03-14 · 3 votes · 2 answers · 5,845 views

Tag statistics

c++ ×2

algorithm ×1

c ×1

c++11 ×1

cosine-similarity ×1

feature-selection ×1

file-handling ×1

ifstream ×1

line-endings ×1

multithreading ×1

mutex ×1

printf ×1

python ×1

seekg ×1

tf-idf ×1
