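I have read about the Python 2 limitations of Pandas' to_csv( ... etc ...). Have I hit one? I'm on Python 2.7.3.
The export produces garbage characters for ≥ and - when they appear in strings. Apart from that, the export is perfect.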
df.to_csv("file.csv", encoding="utf-8")
Is there a workaround?
df.head() looks like this:
demography Adults ?49 yrs Adults 18?49 yrs at high risk|| \
state
Alabama 32.7 38.6
Alaska 31.2 33.2
Arizona 22.9 38.8
Arkansas 31.2 34.0
California 29.8 38.8
The csv output is this:
state, Adults ≥49 yrs, Adults 18−49 yrs at high risk||
0, Alabama, 32.7, 38.6
1, Alaska, 31.2, 33.2
2, Arizona, 22.9, 38.8
3, Arkansas,31.2, 34
4, California,29.8, 38.8
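One possible explanation, offered as an assumption rather than a diagnosis: the file on disk is valid UTF-8, but the program used to open it decodes the bytes with a single-byte codec such as cp1252. The sketch below (written for Python 3, not the Python 2.7.3 in the question) reproduces that effect with the header text from the output above, and shows that the `utf-8-sig` codec prepends a byte order mark that some spreadsheet programs use to detect UTF-8.

```python
# Sketch: how correct UTF-8 bytes can look like "garbage" in a viewer.
# Assumption: the viewer decodes the file as cp1252 instead of UTF-8.
header = u"Adults \u226549 yrs, Adults 18\u221249 yrs"  # contains ≥ and −

utf8_bytes = header.encode("utf-8")

# Decoding with the wrong codec turns each 3-byte character into 3 symbols.
garbled = utf8_bytes.decode("cp1252")

# "utf-8-sig" writes the same text preceded by a BOM (EF BB BF).
with_bom = header.encode("utf-8-sig")

print(repr(garbled))
print(with_bom[:3] == b"\xef\xbb\xbf")
print(with_bom.decode("utf-8-sig") == header)
```

If this is the cause, one thing to try on a recent pandas is `df.to_csv("file.csv", encoding="utf-8-sig")`; whether that behaves the same way on Python 2.7.3 is not confirmed here.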
The whole code is like this:
import pandas
import xlrd
import csv
import json
df = pandas.DataFrame()
dy = …
Trying to scrape some HTML from something like this. Sometimes the data I need is in div[0], sometimes in div[1], and so on.
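Imagine everyone takes 3-5 classes. One of them is Biology. Their report card is always in alphabetical order. I want each person's Biology grade.
I have already got all of this HTML as text. How do I now pick out the Biology grades?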
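A minimal sketch of one approach, assuming every score div holds text of the form "Subject Grade" as in the sample that follows: match on the subject name rather than on the div's position, so it does not matter whether Biology is div[0] or div[2]. The `html` string is a shortened copy of the sample, and the regular expression is an illustration only; regular expressions are fragile on real-world HTML, so a proper HTML parser may be the safer choice.

```python
import re

# Shortened copy of the sample markup (two students only).
html = """
<div class = "student">
<div class = "score">Algebra C-</div>
<div class = "score">Biology A+</div>
<div class = "score">Chemistry B</div>
</div>
<div class = "student">
<div class = "score">Biology B</div>
<div class = "score">Chemistry A</div>
</div>
"""

# Find score divs whose text starts with "Biology" and capture the grade.
pattern = re.compile(
    r'<div class\s*=\s*"score">\s*Biology\s+([^<]+?)\s*</div>'
)
biology_grades = pattern.findall(html)

print(biology_grades)  # ['A+', 'B']
```

The grades come back in document order, one per student, which relies on the stated premise that every student takes Biology.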
<div class = "student">
<div class = "score">Algebra C-</div>
<div class = "score">Biology A+</div>
<div class = "score">Chemistry B</div>
</div>
<div class = "student">
<div class = "score">Biology B</div>
<div class = "score">Chemistry A</div>
</div>
<div class = "student">
<div class = "score">Alchemy D</div>
<div class = "score">Algebra A</div>
<div class = "score">Biology B</div>
</div>
<div class = "student">
<div class = "score">Algebra A</div>
<div class = "score">Biology B</div>
<div class = "score">Chemistry C+</div>
</div> …
I'm working on a d3.js graphic. My data is in a huge multi-tab .xls. I have to get data from each tab, so I decided to dump it all into Pandas and export some .json.
The raw data, spread across many tabs:
demography, area, state, month, rate
over 65, region2, GA, May, 23
over 65, region2, AL, May, 25
NaN, random_odd_data, mistake, error
18-65, region2, GA, 77
18-65, region2, AL, 75
Now, loaded into Pandas, merged and cleaned:
demography area state month rate
0 over 65 region2 GA May 23
1 over 65 region2 AL May 25
2 18-65 region2 GA May 50
3 18-65 region2 AL May 55
Now, group it:
group = df.groupby(['state', 'demography'])
which yields
<pandas.core.groupby.DataFrameGroupBy object at 0x106939610> …
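The object printed above is lazy: nothing is computed until it is iterated or aggregated. Iterating it yields `(key, sub-frame)` pairs, which is one way to build nested JSON. The sketch below rebuilds the cleaned frame by hand; the nested `state -> demography -> records` shape is a hypothetical target chosen for illustration, since the question's desired output is cut off.

```python
import json

import pandas as pd

# Hand-built copy of the cleaned frame shown earlier.
df = pd.DataFrame({
    "demography": ["over 65", "over 65", "18-65", "18-65"],
    "area": ["region2"] * 4,
    "state": ["GA", "AL", "GA", "AL"],
    "month": ["May"] * 4,
    "rate": [23, 25, 50, 55],
})

# Hypothetical target shape: {state: {demography: [row records]}}.
nested = {}
for (state, demography), rows in df.groupby(["state", "demography"]):
    # Round-trip through to_json so values are plain JSON types.
    records = json.loads(
        rows[["area", "month", "rate"]].to_json(orient="records")
    )
    nested.setdefault(state, {})[demography] = records

print(json.dumps(nested, indent=2, sort_keys=True))
```

Grouping on two columns produces tuple keys, which is why the loop unpacks `(state, demography)` directly.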