How to convert a Python pandas DataFrame to a Julia DataFrame (using PyJulia) and back to Python pandas

ecj*_*cjb 3 python dataframe pandas julia julia-dataframe

I want to use PyJulia to speed up some parts of my code:

import numpy as np
import julia
import pandas as pd
import random
from julia import Base
from julia import Main
from julia import DataFrames

n = 100000
randomlist = []
for i in range(0,n):
    num = random.randint(1,100)
    randomlist.append(num)

data = {
    'Score': list(randomlist),
    'ScoreBin': list(np.zeros(n)),
}
df = pd.DataFrame(data, columns=['Score', 'ScoreBin'])
Main.dfj = df

Main.eval(""" 
for i = 1:10
    #println(i)
    if dfj.Score[i] >= 10
        println(dfj.Score[i])
    end
end
"""
)

However, I get the following error message:

JuliaError: Exception 'TypeError: non-boolean (PyObject) used in boolean context' occurred while calling julia code:

In addition, the following command:

Main.eval(""" 
println(dfj.Score[1])
"""
)

gives this output (which does not look like it comes from a Julia DataFrame):

PyObject 84

Is there a way to convert a pandas DataFrame to a Julia DataFrame?

Edit 1

Thanks to @PrzemyslawSzufel's answer, the following code now runs:

import numpy as np
import julia
import pandas as pd
import random
import copy
from julia import Base
from julia import Main
from julia import DataFrames
from julia import Pandas
#julia.install(DataFrame)
%load_ext julia.magic

n = 100000
randomlist = []
for i in range(0,n):
    num = random.randint(1,100)
    randomlist.append(num)

data = {
    'Score': list(randomlist),
    'ScoreBin': list(np.zeros(n)),
}
df = pd.DataFrame(data, columns=['Score', 'ScoreBin'])
Main.df = df;

Main.eval("""
dfj = df |> Pandas.DataFrame|> DataFrames.DataFrame;
""")

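However, although I put a ; at the end of the line, I always get a printout of dfj. It is unwanted, very long (100000 rows), and takes about a second. Is there a way to avoid the printout?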

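Also, if I now modify the data frame in Julia (which is much faster than doing it in Python, and is the whole point of this question) and want to convert it back to Python pandas, I get an error as well: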

Main.eval(""" 
for i = 1:length(dfj[:, :Score])
    if dfj[i, :Score] > 50
        dfj[i, :ScoreBin] = 1 
    end
end
"""
)

dfjpy = pd.DataFrame(Main.dfj)
dfjpy


RuntimeError: Julia exception: MethodError: no method matching iterate(::DataFrames.DataFrame)
Closest candidates are:
  iterate(!Matched::Core.SimpleVector) at essentials.jl:568
  iterate(!Matched::Core.SimpleVector, !Matched::Any) at essentials.jl:568
  iterate(!Matched::ExponentialBackOff) at error.jl:199
  ...
Stacktrace:
 [1] jlwrap_iterator(::DataFrames.DataFrame) at /Users/mymac/.julia/packages/PyCall/zqDXB/src/pyiterator.jl:144
 [2] pyjlwrap_getiter(::Ptr{PyCall.PyObject_struct}) at /Users/mymac/.julia/packages/PyCall/zqDXB/src/pyiterator.jl:125

By the way, the command type(dfjpy) gives PyCall.jlwrap as output.

Edit 2

To convert a Julia DataFrame to Python pandas, you first have to convert it to a Julia Pandas DataFrame. This is the latest working code:

n = 100000
randomlist = []
for i in range(0,n):
    num = random.randint(1,100)
    randomlist.append(num)

data = {
    'Score': list(randomlist),
    'ScoreBin': list(np.zeros(n)),
}
df = pd.DataFrame(data, columns=['Score', 'ScoreBin'])
Main.df = df;

Main.eval("""
dfj = df |> Pandas.DataFrame|> DataFrames.DataFrame;

for i = 1:length(dfj[:, :Score])
    if dfj[i, :Score] > 50
        dfj[i, :ScoreBin] = 1 
    end
end

dfjp = dfj |> Pandas.DataFrame;
"""
)

dfjpy = Main.dfjp
dfjpy

Prz*_*fel 6

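You need to have Pandas.jl installed. This library lets Julia work with your Python pandas data frame, which you can then convert to a DataFrames.jl DataFrame.

Here is the Julia code (assuming dfj is your Python variable):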

import DataFrames
import Pandas
juliandf = dfj |> Pandas.DataFrame |> DataFrames.DataFrame;

Note that the last line can also be written as:

juliandf = DataFrames.DataFrame(Pandas.DataFrame(dfj));
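
When this Julia snippet is driven from Python through PyJulia, a small wrapper along the following lines may help. It is only a sketch: build_conversion_code and to_julia_dataframe are hypothetical helper names, and it assumes PyJulia is installed and that Pandas.jl and DataFrames.jl are available in the Julia environment. Main.eval hands the value of the last Julia expression back to Python, so a notebook may echo the whole frame even with a trailing ;. Ending the snippet with nothing is one way that should avoid the printout.

```python
def build_conversion_code(py_name: str, jl_name: str) -> str:
    """Return Julia source that converts Main.<py_name> (a pandas object
    passed in from Python) into a DataFrames.jl DataFrame named <jl_name>."""
    return (
        "import DataFrames\n"
        "import Pandas\n"
        f"{jl_name} = {py_name} |> Pandas.DataFrame |> DataFrames.DataFrame\n"
        # The last expression is `nothing`, so eval should return None to Python
        # and a notebook has no large frame to display.
        "nothing\n"
    )


def to_julia_dataframe(df, py_name="df", jl_name="dfj"):
    """Hypothetical helper: hand `df` to Julia and convert it there."""
    from julia import Main  # imported lazily; requires PyJulia

    setattr(Main, py_name, df)  # same effect as `Main.df = df`
    return Main.eval(build_conversion_code(py_name, jl_name))


# The generated Julia source can be inspected without a Julia installation.
print(build_conversion_code("df", "dfj"))
```

Assigning the result to a throwaway variable on the Python side (for example _ = Main.eval(...)) is another option in a notebook, since only the value of a cell's last expression is displayed.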

Converting back with Pandas.DataFrame(juliandf) should work.
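
A rough way to check the round trip from the Python side is to compare the frame that comes back from Julia with the same binning done directly in pandas. The sketch below assumes the Score and ScoreBin columns from the question. The names expected_bins and matches_expected are hypothetical, and the comparison looks at values only because dtypes may differ after the conversion.

```python
import pandas as pd


def expected_bins(df: pd.DataFrame, threshold: int = 50) -> pd.DataFrame:
    """Pure-pandas version of the Julia loop: ScoreBin = 1 where Score > threshold."""
    out = df.copy()
    out.loc[out["Score"] > threshold, "ScoreBin"] = 1
    return out


def matches_expected(original: pd.DataFrame, returned: pd.DataFrame) -> bool:
    """Compare values only; dtypes may change across the Julia round trip."""
    want = expected_bins(original)[["Score", "ScoreBin"]].to_numpy()
    got = returned[["Score", "ScoreBin"]].to_numpy()
    return want.shape == got.shape and bool((want == got).all())


# Small stand-in for the 100000-row frame from the question
df = pd.DataFrame({"Score": [10, 51, 50, 99], "ScoreBin": [0.0, 0.0, 0.0, 0.0]})
print(expected_bins(df)["ScoreBin"].tolist())  # [0.0, 1.0, 0.0, 1.0]
```

With PyJulia set up as in the question, the check would presumably be matches_expected(df, Main.dfjp) after the Julia code has run.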