Without using groupby, how can I filter out the NaN data?
Suppose I have a matrix where some customers fill in 'N/A', 'n/a' or some other variation of it, and others leave it blank:
import pandas as pd
import numpy as np
df = pd.DataFrame({'movie': ['thg', 'thg', 'mol', 'mol', 'lob', 'lob'],
'rating': [3., 4., 5., np.nan, np.nan, np.nan],
'name': ['John', np.nan, 'N/A', 'Graham', np.nan, np.nan]})
nbs = df['name'].str.extract('^(N/A|NA|na|n/a)', expand=False)  # expand=False returns a Series, not a DataFrame
nms = df[(df['name'] != nbs)]
Output:
>>> nms
movie name rating
0 thg John 3
1 thg NaN 4
3 mol Graham NaN
4 lob NaN NaN
5 lob NaN NaN
How do I filter out the NaN values so that I get a result like this:
movie name rating
0 thg John 3
3 mol …
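One way to get that result, sketched under the assumption that every placeholder is some spelling of "n/a" or "na" (with or without the slash, in any case): turn the placeholders into real NaN first, then let dropna remove every row with a missing name in one step.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'movie': ['thg', 'thg', 'mol', 'mol', 'lob', 'lob'],
                   'rating': [3., 4., 5., np.nan, np.nan, np.nan],
                   'name': ['John', np.nan, 'N/A', 'Graham', np.nan, np.nan]})

# Assumed placeholder spellings: 'N/A', 'n/a', 'NA', 'na' (case-insensitive, optional spaces)
placeholder = r'(?i)^\s*n/?a\s*$'

# Replace the placeholder strings with real NaN, then drop every row whose name is missing
names = df['name'].replace(placeholder, np.nan, regex=True)
nms = df.assign(name=names).dropna(subset=['name'])
print(nms)
```

Because dropna only recognises real missing values, normalising the strings first avoids comparing against the output of str.extract at all.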

I have a DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame({'foo.aa': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
'foo.fighters': [0, 1, np.nan, 0, 0, 0],
'foo.bars': [0, 0, 0, 0, 0, 1],
'bar.baz': [5, 5, 6, 5, 5.6, 6.8],
'foo.fox': [2, 4, 1, 0, 0, 5],
'nas.foo': ['NA', 0, 1, 0, 0, 0],
'foo.manchu': ['NA', 0, 0, 0, 0, 0],})
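I want to select the rows that have the value 1 in any of the columns whose names start with 'foo.'. Is there a better way than this: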
df2 = df[(df['foo.aa'] == 1)|
(df['foo.fighters'] == 1)|
(df['foo.bars'] == 1)|
(df['foo.fox'] == 1)|
(df['foo.manchu'] == …
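A shorter alternative, sketched with the DataFrame above: select the 'foo.' columns once with str.startswith on the column index, compare them all to 1, and keep the rows where any of the comparisons is true.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'foo.aa': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                   'foo.fighters': [0, 1, np.nan, 0, 0, 0],
                   'foo.bars': [0, 0, 0, 0, 0, 1],
                   'bar.baz': [5, 5, 6, 5, 5.6, 6.8],
                   'foo.fox': [2, 4, 1, 0, 0, 5],
                   'nas.foo': ['NA', 0, 1, 0, 0, 0],
                   'foo.manchu': ['NA', 0, 0, 0, 0, 0]})

# Boolean mask over the column labels: True for every column named 'foo.<something>'
foo_cols = df.loc[:, df.columns.str.startswith('foo.')]

# Element-wise comparison ('NA' == 1 is simply False), then "any column matched" per row
df2 = df[(foo_cols == 1).any(axis=1)]
print(df2)
```

df.filter(regex=r'^foo\.') selects the same columns if a regular expression reads more naturally.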
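
I'm currently using Python with pandas and would like to know whether there is a way to pass data from pandas to Julia DataFrames and vice versa. (I think you can call Python from Julia with PyCall, but I'm not sure whether that works with DataFrames.) Is there a way to call Julia from Python and have it accept pandas DataFrames, without saving them to another file format such as CSV?
Apart from very large datasets and running code with many loops (such as neural networks), when is it advantageous to use Julia DataFrames instead of pandas?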

Is it possible to open a PDF and read it with Python pandas, or do I have to use the pandas clipboard function to do this?

Suppose I have defined a type Person in Julia:
type Person
name::String
male::Bool
age::Float64
children::Int
end
function describe(p::Person)
println("Name: ", p.name, " Male: ", p.male)
println("Age: ", p.age, " Children: ", p.children)
end
ted = Person("Ted",1,55,0)
describe(ted)
which prints:
Name: Ted Male: true
Age: 55.0 Children: 0
Then I modified the type Person by adding a new field, eyes:
type Person
name::String
male::Bool
age::Float64
children::Int
eyes::String
end
ted = Person("Ted",1,55,0,"brown")
If I now run it, I get this error:
Error evaluating REPL:
invalid redefinition of constant Person
in include_string at loading.jl:97
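When developing new code, what is the best way to deal with this, other than putting the code in a module as suggested in the Julia FAQ?

How do I add labels showing each value above the bars in this bar chart: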
import pandas as pd
import matplotlib.pyplot as plt
df=pd.DataFrame({'Users': [ 'Bob', 'Jim', 'Ted', 'Jesus', 'James'],
'Score': [10,2,5,6,7],})
df = df.set_index('Users')
df.plot(kind='bar', title='Scores')
plt.show()
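A sketch assuming matplotlib 3.4 or later, which added Axes.bar_label: df.plot returns the Axes, its containers attribute holds the bars of each plotted column, and bar_label writes each bar's height just above it.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Users': ['Bob', 'Jim', 'Ted', 'Jesus', 'James'],
                   'Score': [10, 2, 5, 6, 7]})
df = df.set_index('Users')

ax = df.plot(kind='bar', title='Scores', legend=False)

# Each plotted column is one BarContainer in ax.containers; label every bar with its height
labels = []
for container in ax.containers:
    labels.extend(ax.bar_label(container))

plt.show()
```

On older matplotlib versions the same effect needs a manual loop over ax.patches calling ax.annotate at each bar's top.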

I'm trying to create a stacked histogram from the data in two or more pandas DataFrames of unequal length. So far I can get them plotted on top of each other, but not stacked.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('dert.csv', encoding="ISO-8859-1", index_col=0)
df1 = df.dropna(subset=['five'])               # keep rows where column 'five' is not NaN
df2 = df.rename(columns={'text2': 'printed'})  # expose column 'text2' as 'printed'
ax = df1['text'].hist(bins=100, range=(1, 100), stacked=True, color='r')
ax = df2['printed'].hist(bins=100, range=(1, 100), stacked=True, color='g')
plt.setp(ax.get_xticklabels(), rotation=45)
plt.show()
How do I get them to stack?
I found a solution, but it doesn't use pandas DataFrames: "Matplotlib, creating stacked histogram from three unequal length arrays".
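Two separate .hist() calls each draw an independent histogram, so they overlap. A minimal sketch of the stacking approach: pass both datasets in a single ax.hist call with stacked=True; matplotlib accepts a list of arrays of different lengths. The two Series below are hypothetical stand-ins for df1['text'] and df2['printed'], since dert.csv is not available.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df1['text'] and df2['printed']: two Series of unequal length
rng = np.random.default_rng(0)
s1 = pd.Series(rng.integers(1, 100, size=300))
s2 = pd.Series(rng.integers(1, 100, size=120))

fig, ax = plt.subplots()
# One call with a list of datasets lets matplotlib stack them; NaNs are dropped first
counts, bin_edges, patches = ax.hist([s1.dropna().to_numpy(), s2.dropna().to_numpy()],
                                     bins=100, range=(1, 100), stacked=True,
                                     color=['r', 'g'], label=['text', 'printed'])
plt.setp(ax.get_xticklabels(), rotation=45)
ax.legend()
plt.show()
```

With stacked=True the last array in counts holds the cumulative totals, i.e. the height of the top of each stacked bar.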

Suppose I want to create an empty DataFrame and fill it with values in a loop.
import pandas as pd
import numpy as np
years = [2013, 2014, 2015]
dn=pd.DataFrame()
for year in years:
    df1 = pd.DataFrame({'Incidents': ['C', 'B', 'A'],
                        year: [1, 1, 1],
                        }).set_index('Incidents')
    print(df1)
    # DataFrame.append was removed in pandas 2.0; pd.concat([dn, df1]) gives the same result
    dn = dn.append(df1, ignore_index=False)
Even with ignore_index=False, append produces a diagonal matrix:
>>> dn
2013 2014 2015
Incidents
C 1 NaN NaN
B 1 NaN NaN
A 1 NaN NaN
C NaN 1 NaN
B NaN 1 NaN
A NaN 1 NaN
C NaN NaN 1
B NaN NaN 1
A NaN NaN 1 …
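The diagonal shape comes from append stacking rows: each year's frame adds a new column and repeats the index labels C, B, A, so the other years' columns are NaN in those rows. A minimal sketch of an alternative: collect the per-year frames in a list and concatenate them column-wise (axis=1), so they align on the shared Incidents index.

```python
import pandas as pd

years = [2013, 2014, 2015]
frames = []
for year in years:
    df1 = pd.DataFrame({'Incidents': ['C', 'B', 'A'],
                        year: [1, 1, 1]}).set_index('Incidents')
    frames.append(df1)

# Join the frames side by side on their common 'Incidents' index: one column per year
dn = pd.concat(frames, axis=1)
print(dn)
```

Building a list and concatenating once also avoids copying the growing DataFrame on every iteration.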

How do I plot the results of a linear regression done with pandas?
import pandas as pd
from pandas.stats.api import ols  # pandas.stats was removed in pandas 0.20; this import only works on older versions
df = pd.read_csv('Samples.csv', index_col=0)
control = ols(y=df['Control'], x=df['Day'])
one = ols(y=df['Sample1'], x=df['Day'])
two = ols(y=df['Sample2'], x=df['Day'])
I tried plot(), but it didn't work. I want to plot all three samples on one chart. Is there any pandas or matplotlib code that can handle data in the format of these summaries?
Either way, the results look like this:
Control
------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <x> + <intercept>
Number of Observations: 7
Number of Degrees of Freedom: 2
R-squared: 0.5642
Adj R-squared: 0.4770
Rmse: 4.6893
F-stat (1, 5): 6.4719, p-value: 0.0516
Degrees of Freedom: model 1, resid 5
-----------------------Summary of Estimated Coefficients------------------------
Variable Coef Std Err t-stat p-value …
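Because pandas.stats.api.ols is gone from current pandas, a sketch has to swap in a different fitter. The version below uses numpy.polyfit with deg=1 (an ordinary least-squares straight line) and draws the points and fitted line of all three samples on one Axes. The column names Day, Control, Sample1 and Sample2 come from the question; the numbers are hypothetical stand-ins for Samples.csv.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data standing in for pd.read_csv('Samples.csv', index_col=0)
df = pd.DataFrame({'Day': [1, 2, 3, 4, 5, 6, 7],
                   'Control': [3, 5, 7, 9, 11, 13, 15],
                   'Sample1': [10, 9, 8, 7, 6, 5, 4],
                   'Sample2': [1, 4, 7, 10, 13, 16, 19]})

fig, ax = plt.subplots()
fits = {}
for col in ['Control', 'Sample1', 'Sample2']:
    # Least-squares straight line: deg=1 returns [slope, intercept]
    slope, intercept = np.polyfit(df['Day'], df[col], deg=1)
    fits[col] = (slope, intercept)
    # Raw points first, then the fitted line drawn in the same colour
    points, = ax.plot(df['Day'], df[col], 'o', label=col)
    ax.plot(df['Day'], slope * df['Day'] + intercept, color=points.get_color())

ax.set_xlabel('Day')
ax.legend()
plt.show()
```

polyfit only returns the coefficients; statistics such as R-squared or p-values like those in the summary above would need a full regression package.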

How do you do group-by and pivot tables with Julia DataFrames?
Let's say I have this DataFrame:
using DataFrames
df =DataFrame(Location = [ "NY", "SF", "NY", "NY", "SF", "SF", "TX", "TX", "TX", "DC"],
Class = ["H","L","H","L","L","H", "H","L","L","M"],
Address = ["12 Silver","10 Fak","12 Silver","1 North","10 Fak","2 Fake", "1 Red","1 Dog","2 Fake","1 White"],
Score = ["4","5","3","2","1","5","4","3","2","1"])
I want to do the following:
1) Make a pivot table of Location against Class, which should output:
Class H L M
Location
DC 0 0 1
NY 2 1 0
SF 1 2 0
TX 1 2 0
2) Group by Location and count the number of records in each group, which should output:
Pop
DC 1
NY 3
SF 3
TX 3