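I am writing code that reads a huge text file containing DNA bases, and I need to be able to extract specific sections of it. The file looks like this: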
TGTTCCAGGCTGTCAGATGCTAACCTGGGG
TCACTGGGGGTGTGCGTGCTGCTCCAGCCT
GTTCCAGGATATCAGATGCTCACCTGGGGG
...
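Each line has 30 characters.
I have a separate file that describes the sections, which means I have a start value and an end value for each one. For every start and end pair, I need to extract the corresponding string from the file. For example, if I have start = 10, end = 45, I need the string that begins at the 10th character of the first line (C) and ends at the 15th character of the second line (C), stored in a separate temporary file.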
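To make the arithmetic in this example concrete, here is a minimal sketch of the mapping from a 1-based base position to a line and column. It assumes exactly 30 bases on every line, as stated above; the name base_to_line_col and the LINE_WIDTH constant are hypothetical and not part of the original code.

```c
#include <assert.h>

#define LINE_WIDTH 30 /* bases per line, as stated in the question */

/* Hypothetical helper: convert a 1-based base position into a
   1-based line number and a 1-based column within that line. */
void base_to_line_col(long pos, long *line, long *col)
{
    assert(pos >= 1 && line != 0 && col != 0);
    *line = (pos - 1) / LINE_WIDTH + 1;
    *col  = (pos - 1) % LINE_WIDTH + 1;
}
```

With these assumptions, position 10 maps to line 1, column 10, and position 45 maps to line 2, column 15, which matches the example.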
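I tried using the fread function (shown below) on a test file containing the lines of letters above. The parameters were start = 1, end = 90, and the resulting file looked like this: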
TGTTCCAGGCTGTCAGATGCTAACCTGGGG
TCACTGGGGGTGTGCGTGCTGCTCCAGCCT
GTTCCAGGATATCAGATGCTCACCTGGG™eRV
Every run produces different random characters at the end.
Code:
FILE* fp;
fp=fopen(filename, "r");
if (fp==NULL) puts("Failed to open file");
int start=1, end=90;
char string[end-start+2]; //characters from start to end = end-start+1
fseek(fp, start-1, SEEK_SET);
fread(exon,1, end-start+1, fp);
FILE* tp;
tp=fopen("exon", "w");
if (tp==NULL) puts("Failed to make tmp file");
fprintf(tp, "%s\n", string);
fclose(tp);
I don't understand how fread handles \n characters, so I tried replacing it with the following:
int i=0;
char ch;
while (!feof(fp))
{
    ch=fgetc(fp);
    if (ch != '\n')
    {
        string[i]=ch;
        i++;
        if (i==end-start) break;
    }
}
string[end-start+1]='\0';
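It created the following file:
TGTTCCAGGCTGTCAGATGCTAACCTGGGGTCACTGGGGGTGTGCGTGCTGCTCCAGCCTGTTCCAGGATATCAGATGCTCACCTGGGGô
(There are no line breaks, which I don't mind.) On every run I get a different random character in place of the last "G".
What am I doing wrong? Is there a way to do this with fread or some other function?
Thanks in advance.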
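I modified your code and added comments to explain the changes.
Please take a look. You skipped error checking, and the code uses a few undefined variables.
I have simply returned from the if blocks that handle failures; goto would be more appropriate.
As for whether to add 1 or 2 characters to start and end, please refer to this comment.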
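The adjustment of start and end can be sketched as a pair of small helpers. This is only an illustration under stated assumptions: every line holds the same number of bases, and every line ending has the same length (1 byte for \n, 2 bytes for \r\n). The names base_offset and range_bytes are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: byte offset of the base at 1-based position pos,
   in a file with line_width bases per line and newline_len bytes per
   line ending (1 for "\n", 2 for "\r\n"). */
long base_offset(long pos, long line_width, long newline_len)
{
    assert(pos >= 1 && line_width > 0 && newline_len >= 0);
    long idx = pos - 1;                 /* 0-based base index */
    return idx + (idx / line_width) * newline_len;
}

/* Hypothetical helper: number of bytes that must be read to cover the
   bases start..end (1-based, inclusive), including the line endings
   that fall inside the range. */
long range_bytes(long start, long end, long line_width, long newline_len)
{
    assert(start >= 1 && end >= start);
    return base_offset(end, line_width, newline_len)
         - base_offset(start, line_width, newline_len) + 1;
}
```

For start = 1 and end = 90 with \n line endings this gives offsets 0 and 91, so 92 bytes are read: 90 bases plus the two line endings between them. These offsets are byte counts, so they are only dependable if the stream is byte-exact; the C standard restricts fseek on text streams, and opening the file with "rb" avoids that restriction.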
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void)
{
    // filename is undeclared in the question, so a hard-coded file name is used here
    FILE *fp = fopen("dna.txt", "r");
    // Nothing wrong in performing error checking
    if (fp == NULL) {
        puts("Failed to open file");
        return -1;
    }

    // start and end are 1-based, so make sure start is not 0
    int start = 1, end = 90;

    // The file offsets must also count the line endings that come before
    // each position. m is the length of one line ending: 1 for '\n' and
    // 2 for '\r\n'. The line length of 30 is hard-coded because every
    // line in the file has 30 chars.
    int m = 1;
    --start; --end;                      // adjust the indexes to be 0-based
    start = start + (start / 30) * m;    // start will be 0
    end = end + (end / 30) * m;          // end will be 91

    // Number of chars to read, including the line endings inside the range
    int char_to_read = end - start + 1;

    // Only 1 extra char is needed, for the terminating null char.
    // If start and end are going to change, I would suggest malloc instead
    // of a fixed-size buffer, because the compiler cannot know the size of
    // a buffer that depends on an external factor.
    char *string = malloc(char_to_read + 1);
    if (string == NULL) {
        printf("malloc failed\n");
        fclose(fp);
        return -2;
    }
    // zero the buffer
    memset(string, 0, char_to_read + 1);

    // fseek returns a nonzero value on failure
    if (fseek(fp, start, SEEK_SET) != 0) {
        printf("fseek failed\n");
        free(string);
        fclose(fp);
        return -1;
    }

    // exon is not defined in the question, and we want to read into string anyway.
    // fread returns a size_t count and never -1, so use ferror to detect errors.
    size_t bytes_read = fread(string, 1, char_to_read, fp);
    if (ferror(fp)) {
        printf("fread failed\n");
        free(string);
        fclose(fp);
        return -1;
    }

    // Now append the null char after the last byte actually read
    string[bytes_read] = 0;
    printf("%s\n", string);
    fclose(fp);

    // Now you can write it back to a file, before the buffer is freed.
    // FILE* tp;
    // tp=fopen("exon", "w");
    // if (tp==NULL) puts("Failed to make tmp file");
    // fprintf(tp, "%s\n", string);
    // fclose(tp);

    // free the memory once you are done with it
    free(string);
    return 0;
}
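If the line breaks are not wanted in the output, a character-by-character loop like the second attempt in the question can also work. That attempt stops at i == end-start, one base short, and writes the terminator at index end-start+1, so index end-start is never initialized; that would explain the random last character. In the fread attempt the buffer is never null-terminated, which would explain the garbage printed by %s. The sketch below is one possible corrected loop, not the only way to do it; the name read_bases is hypothetical, and it assumes the stream is already positioned at the first wanted base.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy up to count bases from fp into out,
   skipping '\n' and '\r', then null-terminate the result.
   out must have room for count + 1 chars.
   Returns the number of bases actually stored. */
size_t read_bases(FILE *fp, char *out, size_t count)
{
    assert(fp != NULL && out != NULL);
    size_t i = 0;
    int ch; /* int rather than char, so EOF can be told apart from data */
    while (i < count && (ch = fgetc(fp)) != EOF) {
        if (ch == '\n' || ch == '\r')
            continue;          /* line endings are not part of the sequence */
        out[i++] = (char)ch;
    }
    out[i] = '\0';             /* terminate right after the last stored base */
    return i;
}
```

Checking the return value of fgetc against EOF replaces the while (!feof(fp)) test, which only becomes true after a read has already failed. A caller would seek to the adjusted start offset, call read_bases(fp, string, end - start + 1) with the original 1-based start and end, and compare the return value with the requested count to detect a short file.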