Convert a Pandas DataFrame to a PyTorch tensor?

M. *_*bio 19 python dataframe pandas pytorch

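I want to train a simple neural network in PyTorch on a personal dataset. The dataset is imported from an Excel file and stored in df.

One of the columns is named "Target", and it is the target variable of the network. How can I use this DataFrame as input to a PyTorch neural network?

I tried this, but it doesn't work: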

target = pd.DataFrame(data = df['Target'])
train = data_utils.TensorDataset(df, target)
train_loader = data_utils.DataLoader(train, batch_size = 10, shuffle = True)

blu*_*nox 17

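I am referring to the question in the title, since you didn't really specify anything else in the text: simply converting a DataFrame to a PyTorch tensor.

Without any information about your data, I'm just using float values as example targets.

Converting a Pandas DataFrame to a PyTorch tensor: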

import pandas as pd
import torch
import random

# creating dummy targets (float values)
targets_data = [random.random() for i in range(10)]

# creating DataFrame from targets_data
targets_df = pd.DataFrame(data=targets_data)
targets_df.columns = ['targets']

# creating tensor from targets_df 
torch_tensor = torch.tensor(targets_df['targets'].values)

# printing out result
print(torch_tensor)

Output:

tensor([ 0.5827,  0.5881,  0.1543,  0.6815,  0.9400,  0.8683,  0.4289,
         0.5940,  0.6438,  0.7514], dtype=torch.float64)

Tested with PyTorch 0.4.0.

I hope this helps. If you have any further questions, please ask. :)

  • Using your code I wrote this: `train_target = torch.tensor(train['Target'].values) train = torch.tensor(train.drop('Target', axis = 1).values) train_tensor = data_utils.TensorDataset( train, train_target) train_loader = data_utils.DataLoader(dataset = train_tensor, batch_size = batch_size, shuffle = True)` Running the neural network model, I get this error: `RuntimeError: Expected object of type torch.FloatTensor but found type torch.DoubleTensor for argument #4 'mat1'` (2 upvotes)
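The error in the comment above most likely comes from a dtype mismatch: pandas stores floating-point columns as float64 by default, so `torch.tensor(...)` on `.values` yields a double tensor, while the parameters of layers such as `nn.Linear` are float32 by default. A minimal sketch of the difference, assuming a hypothetical two-column DataFrame (the column names and values are invented for illustration):

```python
import pandas as pd
import torch

# Hypothetical data for illustration only
df = pd.DataFrame({"x": [0.1, 0.2, 0.3], "Target": [1.0, 0.0, 1.0]})

# pandas floats are float64, so the resulting tensor is a double tensor
double_tensor = torch.tensor(df["x"].values)
print(double_tensor.dtype)  # torch.float64

# Requesting float32 matches the default dtype of nn.Module parameters
float_tensor = torch.tensor(df["x"].values, dtype=torch.float32)
print(float_tensor.dtype)  # torch.float32

# A float32 input passes through a default-initialized layer without a dtype error
layer = torch.nn.Linear(1, 1)
out = layer(float_tensor.unsqueeze(1))  # input shape (3, 1)
```

Calling `.float()` on an existing tensor has the same effect as passing `dtype=torch.float32` at construction.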

uke*_*emi 12

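You can take the df.values attribute (a NumPy array), convert it to tensors, and pass those to the TensorDataset constructor (TensorDataset expects tensors rather than raw NumPy arrays):

import torch
import torch.utils.data as data_utils

# Creating tensors from the underlying NumPy arrays (assumes all columns are numeric)
target = torch.tensor(df['Target'].values, dtype=torch.float32)
features = torch.tensor(df.drop('Target', axis=1).values, dtype=torch.float32)

# Passing to DataLoader
train = data_utils.TensorDataset(features, target)
train_loader = data_utils.DataLoader(train, batch_size=10, shuffle=True)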

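Note: in your code, the features (df) also contain the target variable (df['Target']), i.e. your network is "cheating" because it can see the target in its input. You need to remove this column from the feature set.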
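A self-contained sketch of the feature/target split described in the note, using a hypothetical DataFrame in place of the one loaded from Excel (the feature column names and all values are assumptions made for illustration):

```python
import pandas as pd
import torch
import torch.utils.data as data_utils

# Hypothetical stand-in for the DataFrame loaded from Excel
df = pd.DataFrame({
    "feat_a": [0.1, 0.2, 0.3, 0.4],
    "feat_b": [1.0, 2.0, 3.0, 4.0],
    "Target": [0.0, 1.0, 0.0, 1.0],
})

# Separate the target from the features so the network never sees it as input
target = torch.tensor(df["Target"].values, dtype=torch.float32)
features = torch.tensor(df.drop("Target", axis=1).values, dtype=torch.float32)

train = data_utils.TensorDataset(features, target)
train_loader = data_utils.DataLoader(train, batch_size=2, shuffle=True)

# Each batch is a (features, target) pair
for batch_features, batch_target in train_loader:
    print(batch_features.shape, batch_target.shape)
```

With four rows and `batch_size=2`, the loader yields two batches per epoch.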


Anh*_*INH 9

You can use the following functions to convert any numeric DataFrame or pandas Series to a PyTorch tensor:

import pandas as pd
import torch

# determine the supported device
def get_device():
    if torch.cuda.is_available():
        device = torch.device('cuda:0')
    else:
        device = torch.device('cpu') # don't have GPU 
    return device

# convert a df to tensor to be used in pytorch
def df_to_tensor(df):
    device = get_device()
    return torch.from_numpy(df.values).float().to(device)

df_tensor = df_to_tensor(df)
series_tensor = df_to_tensor(series)

  • @Luis: that is the pandas Series you want to convert. Replace it with your own. (2 upvotes)
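One caveat for a helper like `df_to_tensor`: `torch.from_numpy` only accepts numeric arrays, so a DataFrame containing string columns has to be filtered or encoded first. A minimal sketch, assuming a hypothetical DataFrame with mixed column types and simply dropping the non-numeric ones (whether dropping or encoding is appropriate depends on your data):

```python
import pandas as pd
import torch

# Hypothetical DataFrame mixing numeric and string columns
df = pd.DataFrame({
    "height": [1.7, 1.8],
    "age": [30, 40],
    "name": ["a", "b"],
})

# Keep only numeric columns; .values on the full frame would have dtype object,
# which torch.from_numpy cannot convert
numeric_df = df.select_dtypes(include="number")

# Mixed int/float columns are upcast to float64 by pandas, then cast to float32
tensor = torch.from_numpy(numeric_df.values).float()
print(tensor.shape)
print(tensor.dtype)
```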

Gau*_*ava 7

Simply convert pandas DataFrame -> NumPy array -> PyTorch tensor. An example is shown below:

import pandas as pd
import numpy as np
import torch
import torch.utils.data as data_utils

df = pd.read_csv('train.csv')
target = pd.DataFrame(df['target'])
del df['target']
train = data_utils.TensorDataset(torch.Tensor(np.array(df)), torch.Tensor(np.array(target)))
train_loader = data_utils.DataLoader(train, batch_size = 10, shuffle = True)

Hopefully this helps you create your own datasets with PyTorch (compatible with the latest version of PyTorch at the time of writing).
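One detail worth checking with this approach is the shape of the target tensor: wrapping the column in a DataFrame keeps a trailing dimension, while using the Series directly gives a flat vector. A small sketch, assuming hypothetical data in place of train.csv:

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical data standing in for train.csv
df = pd.DataFrame({"f1": [1.0, 2.0, 3.0], "target": [0.0, 1.0, 0.0]})

# Wrapping the column in a DataFrame keeps a trailing dimension: shape (3, 1)
target_2d = torch.Tensor(np.array(pd.DataFrame(df["target"])))

# Using the Series directly gives a flat vector: shape (3,)
target_1d = torch.Tensor(np.array(df["target"]))

# torch.Tensor(...) converts to the default float type (float32) in both cases
print(target_2d.shape, target_2d.dtype)
print(target_1d.shape, target_1d.dtype)
```

Which shape you want depends on the loss function: a regression model whose output has shape (N, 1) is usually paired with a target of the same shape, whereas `nn.CrossEntropyLoss` expects a flat vector of integer class indices.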


All*_*len 5

Maybe try this and see whether it fixes your problem (based on your sample code)?

import numpy as np
import torch
import torch.utils.data as data_utils

# Cast to float32 so the tensors match the model's default parameter dtype
train_target = torch.tensor(train['Target'].values.astype(np.float32))
train = torch.tensor(train.drop('Target', axis = 1).values.astype(np.float32))
train_tensor = data_utils.TensorDataset(train, train_target)
train_loader = data_utils.DataLoader(dataset = train_tensor, batch_size = batch_size, shuffle = True)


小智 5

#This works for me

target = torch.tensor(df['Targets'].values)
features = torch.tensor(df.drop('Targets', axis = 1).values)

train = data_utils.TensorDataset(features, target)
train_loader = data_utils.DataLoader(train, batch_size=10, shuffle=True)
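To tie the answers together, here is a rough end-to-end sketch of feeding such a loader into a model. Everything beyond the DataFrame-to-tensor conversion is an assumption made for illustration: the data is invented, and the single linear layer, MSE loss, SGD optimizer, and hyperparameters are placeholders rather than recommendations.

```python
import pandas as pd
import torch
import torch.nn as nn
import torch.utils.data as data_utils

torch.manual_seed(0)

# Hypothetical data; replace with your own DataFrame
df = pd.DataFrame({
    "feat_a": [0.0, 1.0, 2.0, 3.0],
    "feat_b": [1.0, 0.0, 1.0, 0.0],
    "Targets": [1.0, 2.0, 5.0, 6.0],
})

# float32 tensors; the target gets shape (N, 1) to match the model output
target = torch.tensor(df["Targets"].values, dtype=torch.float32).unsqueeze(1)
features = torch.tensor(df.drop("Targets", axis=1).values, dtype=torch.float32)

train = data_utils.TensorDataset(features, target)
train_loader = data_utils.DataLoader(train, batch_size=2, shuffle=True)

# Placeholder model and training setup
model = nn.Linear(features.shape[1], 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):
    for batch_features, batch_target in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(batch_features), batch_target)
        loss.backward()
        optimizer.step()
```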