使用fread,data.table包读取strand(+, - )列

Alv*_*lez 6 r data.table

我正在尝试使用fread将基因组对齐读入到data.tableR中.这是对齐文件的快照:

USI-EAS28:1:100:1786:674#0/1    +   1_maternal  68326824      CTCAATTATACTGAAAGAAACACAATATATCATA    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   
USI-EAS28:1:100:1786:940#0/1    +   16_maternal 11407541    CTATTAGTGACCTGCTGTGGGACCTTGGGATGGT  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   
USI-EAS28:1:100:1786:705#0/1    +   1_maternal  63849584    CTGAGGGTTTGTGTCAGGAAGGGGTGTGGAATTG  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   0:T>C
USI-EAS28:1:100:1786:1168#0/1   -   5_maternal  31381649    GCATCATTCATGAAACAATTTTCAAGAGAGGAAA  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   
 USI-EAS28:1:100:1787:582#0/1   +   10_maternal 54587781    CTACAATAATAATAGGGGACTAAAACACCCCACT  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   
 USI-EAS28:1:100:1787:62#0/1    +   10_maternal 70390747     CTATTTGCTACTGAATTGTTAATTTTAAAACAGT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   
 USI-EAS28:1:100:1788:573#0/1   -   7_maternal  92583837     CACTGTCAACATTAGACAGACCAATGAGACAAAG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   
 USI-EAS28:1:100:1788:854#0/1   +   7_maternal  129611206    GTTTGTTTTTTTTTTTGAGATGGAGTCTCATTTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   32:C>T
 USI-EAS28:1:100:1788:185#0/1   -   13_maternal 23694307    CAAACAAACTCAAAATGGACTATCGACTGAAAAA  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   
 USI-EAS28:1:100:1788:1339#0/1  -   13_maternal 33699510    TTAACTCTAGTTTTTAGGGATTGCAAATTAGACG  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   0:A>G
Run Code Online (Sandbox Code Playgroud)

第二列报告读取对齐的链(+正向,-反向).不幸的是,fread正在尝试将此列读取为整数,将值始终指定为0.此列应该作为字符读取,或者甚至是布尔值.试图玩参数sepsep2没有帮助.

Mat*_*wle 3

感谢您的报告。现在已在 v1.8.9 commit 849 中修复。现在+-读取为字符,添加了测试。

顺便说一句,我们还打算添加,colClasses以便您可以覆盖fread检测到的列类型。与此相关的待办事项列表fread位于源文件的顶部: https:
//r-forge.r-project.org/scm/viewvc.php/pkg/src/fread.c ?view=markup&root=datatable