Myk*_*tko 7 python time-series deep-learning tensorflow tensorflow-datasets
I have multiple time series, as shown below:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Time': np.tile(np.arange(5), 2),
                   'Object': np.concatenate([[i] * 5 for i in [1, 2]]),
                   'Feature1': np.random.randint(10, size=10),
                   'Feature2': np.random.randint(10, size=10)})
   Time  Object  Feature1  Feature2
0     0       1         3         3
1     1       1         9         2
2     2       1         6         6
3     3       1         4         0
4     4       1         7         7
5     0       2         4         8
6     1       2         3         7
7     2       2         1         1
8     3       2         7         5
9     4       2         1         7
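Each object (1 and 2) has its own data (the real data has about 2000 objects). I want to feed this data to an RNN/LSTM in chunks using tf.data.Dataset.window, but in a way that keeps data from different objects out of the same window, which is what happens in this example:
dataset = tf.data.Dataset.from_tensor_slices(df)
for w in dataset.window(3, shift=1, drop_remainder=True):
    print(list(w.as_numpy_iterator()))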
Output:
[array([0, 1, 3, 3]), array([1, 1, 9, 2]), array([2, 1, 6, 6])]
[array([1, 1, 9, 2]), array([2, 1, 6, 6]), array([3, 1, 4, 0])]
[array([2, 1, 6, 6]), array([3, 1, 4, 0]), array([4, 1, 7, 7])]
[array([3, 1, 4, 0]), array([4, 1, 7, 7]), array([0, 2, 4, 8])] # Mixed data from both objects
[array([4, 1, 7, 7]), array([0, 2, 4, 8]), array([1, 2, 3, 7])] # Mixed data from both objects
[array([0, 2, 4, 8]), array([1, 2, 3, 7]), array([2, 2, 1, 1])]
[array([1, 2, 3, 7]), array([2, 2, 1, 1]), array([3, 2, 7, 5])]
[array([2, 2, 1, 1]), array([3, 2, 7, 5]), array([4, 2, 1, 7])]
Expected output:
[array([0, 1, 3, 3]), array([1, 1, 9, 2]), array([2, 1, 6, 6])]
[array([1, 1, 9, 2]), array([2, 1, 6, 6]), array([3, 1, 4, 0])]
[array([2, 1, 6, 6]), array([3, 1, 4, 0]), array([4, 1, 7, 7])]
[array([0, 2, 4, 8]), array([1, 2, 3, 7]), array([2, 2, 1, 1])]
[array([1, 2, 3, 7]), array([2, 2, 1, 1]), array([3, 2, 7, 5])]
[array([2, 2, 1, 1]), array([3, 2, 7, 5]), array([4, 2, 1, 7])]
Maybe there is another way to do this. The main requirement is that my model sees chunks of unmixed data coming from different objects (possibly via an embedding).
Hmm, maybe just create two separate dataframes and concatenate the datasets after windowing. That way, you won't get any windows that overlap two objects:
import tensorflow as tf
import pandas as pd
import numpy as np
df = pd.DataFrame({'Time': np.tile(np.arange(5), 2),
                   'Object': np.concatenate([[i] * 5 for i in [1, 2]]),
                   'Feature1': np.random.randint(10, size=10),
                   'Feature2': np.random.randint(10, size=10)})
df1 = df[df['Object'] == 1]
df2 = df[df['Object'] == 2]
# Window each object separately, then concatenate the windowed datasets.
dataset1 = tf.data.Dataset.from_tensor_slices(df1).window(3, shift=1, drop_remainder=True)
dataset2 = tf.data.Dataset.from_tensor_slices(df2).window(3, shift=1, drop_remainder=True)
dataset = dataset1.concatenate(dataset2)
for w in dataset:
    print(list(w.as_numpy_iterator()))
[array([0, 1, 3, 3]), array([1, 1, 9, 2]), array([2, 1, 6, 6])]
[array([1, 1, 9, 2]), array([2, 1, 6, 6]), array([3, 1, 4, 0])]
[array([2, 1, 6, 6]), array([3, 1, 4, 0]), array([4, 1, 7, 7])]
[array([0, 2, 4, 8]), array([1, 2, 3, 7]), array([2, 2, 1, 1])]
[array([1, 2, 3, 7]), array([2, 2, 1, 1]), array([3, 2, 7, 5])]
[array([2, 2, 1, 1]), array([3, 2, 7, 5]), array([4, 2, 1, 7])]
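With about 2000 objects, writing out one dataframe per object by hand does not scale. The same split-per-object-then-concatenate idea could be sketched for any number of objects as below. Note that this sketch swaps Dataset.window for NumPy's sliding_window_view (NumPy 1.20 or later); the helper name windows_per_object and its parameters are hypothetical, and it assumes the rows of each object are already sorted by time.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def windows_per_object(df, size=3, shift=1, key="Object"):
    """Return an array of shape (num_windows, size, num_columns).

    Windows are built inside each group, so no window spans two objects.
    """
    chunks = []
    for _, group in df.groupby(key, sort=False):
        values = group.to_numpy()
        if len(values) < size:
            continue  # too short for a full window (mirrors drop_remainder=True)
        # sliding_window_view puts the window axis last: (n - size + 1, columns, size)
        view = sliding_window_view(values, window_shape=size, axis=0)
        # Reorder to (windows, size, columns) and keep every shift-th window.
        chunks.append(view.transpose(0, 2, 1)[::shift])
    if not chunks:
        return np.empty((0, size, df.shape[1]), dtype=df.to_numpy().dtype)
    return np.concatenate(chunks, axis=0)


df = pd.DataFrame({'Time': np.tile(np.arange(5), 2),
                   'Object': np.concatenate([[i] * 5 for i in [1, 2]]),
                   'Feature1': np.random.randint(10, size=10),
                   'Feature2': np.random.randint(10, size=10)})
windows = windows_per_object(df)
print(windows.shape)  # (6, 3, 4): three windows for each of the two objects
```

The resulting array can then be passed to tf.data.Dataset.from_tensor_slices, which yields one (3, 4) window per element.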
Update 1:
Another approach is to use tf.data.Dataset.filter, like this:
import tensorflow as tf
import pandas as pd
import numpy as np
df = pd.DataFrame({'Time': np.tile(np.arange(5), 2),
                   'Object': np.concatenate([[i] * 5 for i in [1, 2]]),
                   'Feature1': np.random.randint(10, size=10),
                   'Feature2': np.random.randint(10, size=10)})
objects = df['Object'].unique()
dataset = tf.data.Dataset.from_tensor_slices(df)
new_dataset = None
for o in objects:
    # Keep only the rows of object o (column 1 is 'Object'), then window them.
    temp_dataset = dataset.filter(lambda x: tf.math.equal(x[1], tf.constant(o))).window(3, shift=1, drop_remainder=True)
    if new_dataset is not None:
        new_dataset = new_dataset.concatenate(temp_dataset)
    else:
        new_dataset = temp_dataset
for w in new_dataset:
    print(list(w.as_numpy_iterator()))
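Update 2: Another option is to exclude/drop the overlapping sequences, that is, the windows that contain rows from more than one object. This gives you the flexibility to decide how to handle the overlaps:
import tensorflow as tf
import pandas as pd
import numpy as np
df = pd.DataFrame({'Time': np.tile(np.arange(5), 2),
                   'Object': np.concatenate([[i] * 5 for i in [1, 2]]),
                   'Feature1': np.random.randint(10, size=10),
                   'Feature2': np.random.randint(10, size=10)})
dataset = (tf.data.Dataset.from_tensor_slices(df)
           .window(3, shift=1, drop_remainder=True)
           .flat_map(lambda x: x.batch(3))
           # tf.unique(...)[1] holds each row's index into the unique object ids;
           # all zeros means the window contains a single object.
           .filter(lambda y: tf.reduce_all(tf.unique(y[..., 1])[1] == 0)))
for w in dataset:
    print(w)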
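The tf.unique predicate in Update 2 can be hard to read at first. A minimal sketch of what seems to be an equivalent check is shown below: it compares every object id in the window against the first one. The helper name is_single_object and the object_col parameter are hypothetical, and the five rows are copied from the example output in the question so that the result is deterministic.

```python
import tensorflow as tf


def is_single_object(window, object_col=1):
    """True when every row of a (size, columns) window has the same object id."""
    ids = window[:, object_col]
    return tf.reduce_all(tf.equal(ids, ids[0]))


# Last two rows of object 1 followed by the first three rows of object 2.
rows = tf.constant([[3, 1, 4, 0],
                    [4, 1, 7, 7],
                    [0, 2, 4, 8],
                    [1, 2, 3, 7],
                    [2, 2, 1, 1]], dtype=tf.int64)

dataset = (tf.data.Dataset.from_tensor_slices(rows)
           .window(3, shift=1, drop_remainder=True)
           .flat_map(lambda w: w.batch(3, drop_remainder=True))
           .filter(is_single_object))

kept = [w.numpy().tolist() for w in dataset]
print(kept)  # [[[0, 2, 4, 8], [1, 2, 3, 7], [2, 2, 1, 1]]]
```

Of the three windows over these rows, the first two mix objects 1 and 2 and are dropped; only the last one, which belongs entirely to object 2, is kept.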