tum*_*eed 2 python csv python-3.x pandas
I have a file.txt that contains a list of words like this:
5.91686268506 exclusively, catering, provides, arms, georgia, formal, purchase, choose
5.91560417296 hugh, senlis
5.91527936181 italians
5.91470429433 soil, cultivation, fertile
5.91468087491 increases, moderation
....
5.91440227412 farmers, descendants
I want to convert this data into a pandas table, which I then want to display in an html/bootstrap template, like this (*):
COL_A COL_B
5.91686268506 exclusively, catering, provides, arms, georgia, formal, purchase, choose
5.91560417296 hugh, senlis
5.91527936181 italians
5.91470429433 soil, cultivation, fertile
5.91468087491 increases, moderation
....
5.91440227412 farmers, descendants
So I tried the following with pandas:
import pandas as pd
df = pd.read_csv('file.csv',
sep = ' ', names=['Col_A', 'Col_B'])
df.head(20)
However, my table does not have the desired structure shown above:
COL_A COL_B
6.281426 engaged, chance, makes, meeting, nations, things, believe, tries, believing, knocked, admits, awkward
6.277438 sweden NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6.271190 artificial, ammonium NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6.259790 boats, prefix NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6.230612 targets, tactical, wing, missile, squadrons NaN NaN NaN NaN NaN NaN NaN
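Any ideas on how to get the data into the (*) table format?
Because there are also spaces between the words, specifying a space as the separator will naturally split them apart. To get what you need, you can set sep to the regular expression `(?<!,) ` (note the trailing space). `?<!` is the negative lookbehind syntax, which means the line is split on a space only when that space is not preceded by a comma. It should work for your case: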
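import pandas as pd

# A regex separator is handled by the Python parsing engine, so request it explicitly
pd.read_csv("~/test.csv", sep=r"(?<!,) ", names=['weight', 'topics'], engine="python")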
# weight topics
#0 5.916863 exclusively, catering, provides, arms, georgia...
#1 5.915604 hugh, senlis
#2 5.915279 italians
#3 5.914704 soil, cultivation, fertile
#4 5.914681 increases, moderation
#5 5.914402 farmers, descendants
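To see the lookbehind separator work end to end, here is a minimal, self-contained sketch. It makes a few assumptions that are not in the original post: the sample rows are held in an `io.StringIO` buffer instead of a real file, the column names follow the question's `COL_A`/`COL_B`, and the Bootstrap class names passed to `to_html` are only one possible styling choice.

```python
import io
import re

import pandas as pd

# The pattern splits on a space only when it is not preceded by a comma.
SEP = r"(?<!,) "

# Quick check of the pattern on a single line, outside pandas.
parts = re.split(SEP, "5.91560417296 hugh, senlis")
# parts == ['5.91560417296', 'hugh, senlis']

# A few rows copied from the question; StringIO stands in for file.txt.
sample = io.StringIO(
    "5.91686268506 exclusively, catering, provides, arms, georgia, formal, purchase, choose\n"
    "5.91560417296 hugh, senlis\n"
    "5.91527936181 italians\n"
    "5.91470429433 soil, cultivation, fertile\n"
)

# Regex separators need the Python parsing engine.
df = pd.read_csv(sample, sep=SEP, names=["COL_A", "COL_B"], engine="python")

# Render the frame as an HTML table; the class names are an assumed
# Bootstrap styling choice, not something the original post specifies.
html = df.to_html(index=False, classes="table table-striped")
```

The resulting `html` string can then be passed to the template as markup. How it is injected depends on the web framework in use, which the question does not specify.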