ecj*_*cjb 3 python dataframe pandas julia julia-dataframe
I want to use PyJulia to speed up some parts of my code:
import numpy as np
import julia
import pandas as pd
import random
from julia import Base
from julia import Main
from julia import DataFrames
n = 100000
randomlist = []
for i in range(0,n):
    num = random.randint(1,100)
    randomlist.append(num)
data = {
    'Score': list(randomlist),
    'ScoreBin': list(np.zeros(n))
}
df = pd.DataFrame(data, columns = ['Score', 'ScoreBin'])
Main.dfj = df
Main.eval("""
for i = 1:10
    #println(i)
    if dfj.Score[i] >= 10
        println(dfj.Score[i])
    end
end
"""
)
However, I get the following error message:
JuliaError: Exception 'TypeError: non-boolean (PyObject) used in boolean context' occurred while calling julia code:
In addition, the following command:
Main.eval("""
println(dfj.Score[1])
"""
)
gives this output (which does not look like a Julia DataFrame):
PyObject 84
Is there a way to convert a pandas DataFrame to a Julia DataFrame?
Edit 1
Thanks to @PrzemyslawSzufel's answer, the following code now runs:
import numpy as np
import julia
import pandas as pd
import random
import copy
from julia import Base
from julia import Main
from julia import DataFrames
from julia import Pandas
#julia.install(DataFrame)
%load_ext julia.magic
n = 100000
randomlist = []
for i in range(0,n):
    num = random.randint(1,100)
    randomlist.append(num)
data = {
    'Score': list(randomlist),
    'ScoreBin': list(np.zeros(n))
}
df = pd.DataFrame(data, columns = ['Score', 'ScoreBin'])
Main.df = df;
Main.eval("""
dfj = df |> Pandas.DataFrame|> DataFrames.DataFrame;
""")
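However, although I put a ; at the end of the line, dfj always gets printed, which is unwanted, very long (100000 rows), and takes about a second. Is there a way to avoid the printout?
Moreover, if I now modify the DataFrame in Julia (which is much faster than doing it in Python, and is the whole point of this question) and want to convert it back to a Python pandas DataFrame, I also get an error: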
Main.eval("""
for i = 1:length(dfj[:, :Score])
    if dfj[i, :Score] > 50
        dfj[i, :ScoreBin] = 1
    end
end
"""
)
dfjpy = pd.DataFrame(Main.dfj)
dfjpy
RuntimeError: Julia exception: MethodError: no method matching iterate(::DataFrames.DataFrame)
Closest candidates are:
iterate(!Matched::Core.SimpleVector) at essentials.jl:568
iterate(!Matched::Core.SimpleVector, !Matched::Any) at essentials.jl:568
iterate(!Matched::ExponentialBackOff) at error.jl:199
...
Stacktrace:
[1] jlwrap_iterator(::DataFrames.DataFrame) at /Users/mymac/.julia/packages/PyCall/zqDXB/src/pyiterator.jl:144
[2] pyjlwrap_getiter(::Ptr{PyCall.PyObject_struct}) at /Users/mymac/.julia/packages/PyCall/zqDXB/src/pyiterator.jl:125
By the way, the command type(dfjpy) outputs PyCall.jlwrap.
Edit 2
To convert a Julia DataFrame to a Python pandas DataFrame, you first have to convert it to a Julia Pandas DataFrame. Here is the latest working code:
n = 100000
randomlist = []
for i in range(0,n):
    num = random.randint(1,100)
    randomlist.append(num)
data = {
    'Score': list(randomlist),
    'ScoreBin': list(np.zeros(n))
}
df = pd.DataFrame(data, columns = ['Score', 'ScoreBin'])
Main.df = df;
Main.eval("""
dfj = df |> Pandas.DataFrame|> DataFrames.DataFrame;
for i = 1:length(dfj[:, :Score])
    if dfj[i, :Score] > 50
        dfj[i, :ScoreBin] = 1
    end
end
dfjp = dfj |> Pandas.DataFrame;
"""
)
dfjpy = Main.dfjp
dfjpy
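As a quick sanity check on the round trip, the expected ScoreBin values can be computed independently on the Python side and compared with the column that comes back from Julia. The sketch below uses only the standard library; expected_score_bin and its threshold parameter are hypothetical names introduced for illustration, and the comparison against dfjpy is left out because the exact type returned through PyJulia is not confirmed here.

```python
import random


def expected_score_bin(scores, threshold=50):
    """Return 1.0 where score > threshold, else 0.0 (mirrors the Julia loop)."""
    return [1.0 if s > threshold else 0.0 for s in scores]


random.seed(0)  # fixed seed so the check is reproducible
scores = [random.randint(1, 100) for _ in range(100000)]
expected = expected_score_bin(scores)
```

With pandas alone, the same column can be produced without a loop as (df['Score'] > 50).astype(float), which may already be fast enough for simple thresholding like this.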
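You need to have Pandas.jl installed. This library lets you handle your Python pandas data frame from Julia, and you can then convert it to a DataFrames.jl DataFrame.
Here is the Julia code (assuming that dfj is your Python variable):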
import DataFrames
import Pandas
juliandf = dfj |> Pandas.DataFrame |> DataFrames.DataFrame;
Note that the last line could also be written as:
juliandf = DataFrames.DataFrame(Pandas.DataFrame(dfj));
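The two forms are equivalent because of how Julia's |> operator works: x |> f |> g applies f first and then g, i.e. g(f(x)). The Python sketch below mimics that with a hypothetical pipe helper and two stand-in functions; neither the helper nor the stand-ins are part of PyJulia, Pandas.jl, or DataFrames.jl.

```python
from functools import reduce


def pipe(value, *funcs):
    """Apply funcs left to right, like Julia's value |> f |> g."""
    return reduce(lambda acc, f: f(acc), funcs, value)


# Stand-ins for the two conversion steps (purely illustrative).
def to_wrapper(x):
    return ("wrapped", x)


def to_frame(x):
    return ("frame", x)


piped = pipe("dfj", to_wrapper, to_frame)   # dfj |> to_wrapper |> to_frame
nested = to_frame(to_wrapper("dfj"))        # to_frame(to_wrapper(dfj))
```

Both piped and nested hold the same value, which is the reason the piped and nested Julia lines can be used interchangeably.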
Converting back with Pandas.DataFrame(juliandf) should work.