Resampling a DataFrame to hourly, 15-minute, and 5-minute periods in Julia

jbs*_*ssm 7 resampling dataframe julia

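I'm very new to Julia, but I'm giving it a try because benchmarks claim it is much faster than Python.

I'm trying to work with some stock price data in the format ["unixtime", "price", "amount"].

I managed to load the data and convert the unixtime to dates in Julia, but now I need to resample the data over specific periods (hourly, 15 minutes, 5 minutes, and so on), using OHLC (open, high, low, close) for the price and a sum for the amount: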

julia> head(btc_raw_data)
6x3 DataFrame:
                           date price  amount
[1,]    2011-09-13T13:53:36 UTC   5.8     1.0
[2,]    2011-09-13T13:53:44 UTC  5.83     3.0
[3,]    2011-09-13T13:53:49 UTC   5.9     1.0
[4,]    2011-09-13T13:53:54 UTC   6.0    20.0
[5,]    2011-09-13T14:32:53 UTC  5.95 12.4521
[6,]    2011-09-13T14:35:04 UTC  5.88   7.458

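I saw that there is a package called Resampling, but it doesn't seem to accept a time period, only the number of rows I want in the output.

Are there any other options?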

Fem*_*der 2

You can use https://github.com/femtotrader/TimeSeriesIO.jl to convert a DataFrame (from DataFrames.jl) to a TimeArray (from TimeSeries.jl):

using TimeSeriesIO: TimeArray
ta = TimeArray(df, colnames=[:price], timestamp=:date)

You can then resample the time series (a TimeArray from TimeSeries.jl) using TimeSeriesResampler https://github.com/femtotrader/TimeSeriesResampler.jl and TimeFrames https://github.com/femtotrader/TimeFrames.jl:

using TimeSeriesResampler: resample, mean, ohlc, sum, TimeFrame

# Define a sample timeseries (prices for example)
idx = DateTime(2010,1,1):Dates.Minute(1):DateTime(2011,1,1)
idx = idx[1:end-1]
N = length(idx)
y = rand(-1.0:0.01:1.0, N)
y = 1000 .+ cumsum(y)  # broadcast the scalar offset over the random walk
#df = DataFrame(Date=idx, y=y)
ta = TimeArray(collect(idx), y, ["y"])
println("ta=")
println(ta)

# Define how datetime should be grouped (timeframe)
tf = TimeFrame(dt -> floor(dt, Dates.Minute(15)))

# resample using OHLC values
ta_ohlc = ohlc(resample(ta, tf))
println("ta_ohlc=")
println(ta_ohlc)

# resample using mean values
ta_mean = mean(resample(ta, tf))
println("ta_mean=")
println(ta_mean)

# Define another sample timeseries (volume for example)
vol = rand(0:0.01:1.0, N)
ta_vol = TimeArray(collect(idx), vol, ["vol"])
println("ta_vol=")
println(ta_vol)

# resample using sum values
ta_vol_sum = sum(resample(ta_vol, tf))
println("ta_vol_sum=")
println(ta_vol_sum)

You should get:

julia> ta
525600x1 TimeSeries.TimeArray{Float64,1,DateTime,Array{Float64,1}} 2010-01-01T00:00:00 to 2010-12-31T23:59:00

                      y
2010-01-01T00:00:00 | 1000.16
2010-01-01T00:01:00 | 1000.1
2010-01-01T00:02:00 | 1000.98
2010-01-01T00:03:00 | 1001.38
⋮
2010-12-31T23:56:00 | 972.3
2010-12-31T23:57:00 | 972.85
2010-12-31T23:58:00 | 973.74
2010-12-31T23:59:00 | 972.8


julia> ta_ohlc
35040x4 TimeSeries.TimeArray{Float64,2,DateTime,Array{Float64,2}} 2010-01-01T00:00:00 to 2010-12-31T23:45:00

                      Open       High       Low        Close
2010-01-01T00:00:00 | 1000.16    1002.5     1000.1     1001.54
2010-01-01T00:15:00 | 1001.57    1002.64    999.38     999.38
2010-01-01T00:30:00 | 999.13     1000.91    998.91     1000.91
2010-01-01T00:45:00 | 1001.0     1006.42    1001.0     1006.42
⋮
2010-12-31T23:00:00 | 980.84     981.56     976.53     976.53
2010-12-31T23:15:00 | 975.74     977.46     974.71     975.31
2010-12-31T23:30:00 | 974.72     974.9      971.73     972.07
2010-12-31T23:45:00 | 972.33     973.74     971.49     972.8


julia> ta_mean
35040x1 TimeSeries.TimeArray{Float64,1,DateTime,Array{Float64,1}} 2010-01-01T00:00:00 to 2010-12-31T23:45:00

                      y
2010-01-01T00:00:00 | 1001.1047
2010-01-01T00:15:00 | 1001.686
2010-01-01T00:30:00 | 999.628
2010-01-01T00:45:00 | 1003.5267
⋮
2010-12-31T23:00:00 | 979.1773
2010-12-31T23:15:00 | 975.746
2010-12-31T23:30:00 | 973.482
2010-12-31T23:45:00 | 972.3427

julia> ta_vol
525600x1 TimeSeries.TimeArray{Float64,1,DateTime,Array{Float64,1}} 2010-01-01T00:00:00 to 2010-12-31T23:59:00

                      vol
2010-01-01T00:00:00 | 0.37
2010-01-01T00:01:00 | 0.67
2010-01-01T00:02:00 | 0.29
2010-01-01T00:03:00 | 0.28
⋮
2010-12-31T23:56:00 | 0.74
2010-12-31T23:57:00 | 0.66
2010-12-31T23:58:00 | 0.22
2010-12-31T23:59:00 | 0.47


julia> ta_vol_sum
35040x1 TimeSeries.TimeArray{Float64,1,DateTime,Array{Float64,1}} 2010-01-01T00:00:00 to 2010-12-31T23:45:00

                      vol
2010-01-01T00:00:00 | 7.13
2010-01-01T00:15:00 | 6.99
2010-01-01T00:30:00 | 8.73
2010-01-01T00:45:00 | 8.27
⋮
2010-12-31T23:00:00 | 6.11
2010-12-31T23:15:00 | 7.49
2010-12-31T23:30:00 | 5.75
2010-12-31T23:45:00 | 8.36
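To see what the timeframe is doing, here is a minimal sketch of the same idea using only the Dates standard library: each timestamp is floored to the start of its period, and the trades in each bucket are reduced to open, high, low, close, and summed amount. The six trades are the rows shown in the question; `ohlcv` is a hypothetical helper written for illustration and is not part of TimeSeriesResampler, TimeFrames, or TimeSeries.

```julia
using Dates

# The six trades from the question: (timestamp, price, amount)
trades = [
    (DateTime(2011, 9, 13, 13, 53, 36), 5.80, 1.0),
    (DateTime(2011, 9, 13, 13, 53, 44), 5.83, 3.0),
    (DateTime(2011, 9, 13, 13, 53, 49), 5.90, 1.0),
    (DateTime(2011, 9, 13, 13, 53, 54), 6.00, 20.0),
    (DateTime(2011, 9, 13, 14, 32, 53), 5.95, 12.4521),
    (DateTime(2011, 9, 13, 14, 35, 4),  5.88, 7.458),
]

# Hypothetical helper (an assumption, not a package API): group trades into
# buckets of length `period` and return one bar per bucket, sorted by time.
function ohlcv(trades, period::Dates.Period)
    bars = Dict{DateTime,NamedTuple}()
    # Process trades in time order so "open" is the first and "close" the last price.
    for (t, price, amount) in sort(trades; by = first)
        bucket = floor(t, period)  # start of the period containing t
        if haskey(bars, bucket)
            b = bars[bucket]
            bars[bucket] = (open = b.open,
                            high = max(b.high, price),
                            low = min(b.low, price),
                            close = price,
                            volume = b.volume + amount)
        else
            bars[bucket] = (open = price, high = price, low = price,
                            close = price, volume = amount)
        end
    end
    return sort(collect(bars); by = first)
end

# Hourly, 15-minute and 5-minute bars from the same trades.
for period in (Dates.Hour(1), Dates.Minute(15), Dates.Minute(5))
    println(period)
    for (bucket, bar) in ohlcv(trades, period)
        println("  ", bucket, " => ", bar)
    end
end
```

Only buckets that contain at least one trade appear in the result, so quiet periods are skipped rather than filled. Changing the period passed to `floor` is all it takes to switch between hourly, 15-minute, and 5-minute bars, which is the same role the function given to `TimeFrame` plays in the example above.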