Zet*_*eta 4 python machine-learning data-science pytorch
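This is my first time using PyTorch and PyTorch Geometric. I am trying to build a simple graph neural network with PyTorch Geometric. I created a custom dataset by following the PyTorch Geometric documentation and extending InMemoryDataset. I then split the dataset into training, validation, and test sets containing 3496, 437, and 439 graphs respectively. Here is my simple neural network: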
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 10)
        self.conv2 = GCNConv(10, dataset.num_classes)

    def forward(self, data):
        x, edge_index, batch = data.x, data.edge_index, data.batch
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)
        return F.log_softmax(x, dim=1)
I get the following error when training the model, which suggests a problem with my input dimensions. Could the cause be my batch size?
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
File "E:\Users\abc\Anaconda3\lib\site-packages\torch_scatter\scatter.py", line 22, in scatter_add
size[dim] = int(index.max()) + 1
out = torch.zeros(size, dtype=src.dtype, device=src.device)
return out.scatter_add_(dim, index, src)
~~~~~~~~~~~~~~~~ <--- HERE
else:
return out.scatter_add_(dim, index, src)
RuntimeError: index 13654 is out of bounds for dimension 0 with size 678
The error occurs specifically on this line of the network:
x = self.conv1(x, edge_index)
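Edit: added more information about edge_index and a more detailed explanation of the data I am using.
Here are the shapes and value ranges of the variables I am trying to pass: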
x: torch.Size([678, 43])
edge_index: torch.Size([2, 668])
torch.max(edge_index): tensor(541690)
torch.min(edge_index): tensor(1920)
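A quick sanity check makes the mismatch in these numbers explicit. With 678 rows in x, the only valid node indices are 0 to 677, yet edge_index ranges from 1920 to 541690. The helper below is a minimal sketch; `check_edge_index` is a hypothetical name, not part of PyTorch Geometric, and the example tensors are illustrative values built from the numbers reported above.

```python
import torch


def check_edge_index(edge_index: torch.Tensor, num_nodes: int) -> bool:
    """Return True if every entry of edge_index is a valid row index into x."""
    if edge_index.numel() == 0:
        return True  # a graph without edges has nothing to violate
    return bool(edge_index.min() >= 0) and bool(edge_index.max() < num_nodes)


num_nodes = 678  # x has shape [678, 43]

# Illustrative indices in the reported range (1920 to 541690): all out of bounds.
bad_edge_index = torch.tensor([[1920, 541690], [1921, 13654]])
# Illustrative indices that stay inside 0..677.
good_edge_index = torch.tensor([[1, 2, 677], [0, 1, 676]])

print(check_edge_index(bad_edge_index, num_nodes))   # False
print(check_edge_index(good_edge_index, num_nodes))  # True
```

Running such a check on every Data object before training would flag the problem before the scatter operation inside GCNConv fails.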
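I am using a list of Data(x=node_features, edge_index=edge_index, y=labels) objects. When I split the dataset into training, validation, and test sets, I get (3496, 437, 439) graphs in each, respectively. Initially I tried to create a single graph from my whole dataset, but I was not sure how that would work with DataLoader mini-batches.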
train_loader = DataLoader(train_dataset, batch_size=batch_size)
val_loader = DataLoader(val_dataset, batch_size=batch_size)
test_loader = DataLoader(test_dataset, batch_size=batch_size)
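On the mini-batch question: as the PyTorch Geometric documentation describes it, the DataLoader merges the graphs of a batch into one large disconnected graph. Node features are concatenated, each graph's edge_index is shifted by the number of nodes that precede it, and a batch vector records which graph every node belongs to. The plain-PyTorch sketch below only illustrates that idea; `collate_graphs` is a hypothetical helper, not the library's implementation.

```python
import torch


def collate_graphs(xs, edge_indices):
    """Merge several small graphs into one big disconnected graph."""
    offset = 0
    shifted, batch = [], []
    for graph_id, (x, edge_index) in enumerate(zip(xs, edge_indices)):
        # Shift local node indices so they point into the concatenated x.
        shifted.append(edge_index + offset)
        # Remember which graph each node came from.
        batch.append(torch.full((x.size(0),), graph_id, dtype=torch.long))
        offset += x.size(0)
    return torch.cat(xs, dim=0), torch.cat(shifted, dim=1), torch.cat(batch)


# Two toy graphs with 3 and 2 nodes, each using local indices starting at 0.
xs = [torch.zeros(3, 2), torch.zeros(2, 2)]
edge_indices = [torch.tensor([[1, 2], [0, 1]]), torch.tensor([[1], [0]])]

x, edge_index, batch = collate_graphs(xs, edge_indices)
print(edge_index.tolist())  # [[1, 2, 4], [0, 1, 3]]
print(batch.tolist())       # [0, 0, 0, 1, 1]
```

The shift is only correct when every graph numbers its own nodes from 0 to n-1, which is why each individual edge_index has to use local indices.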
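Here is the code that generates the graphs from a DataFrame. I tried to create simple graphs with just some vertices and some edges connecting them. I have probably overlooked something, which is why I am running into this problem. I tried to follow the PyTorch Geometric documentation when creating these graphs (PyTorch Geometric: Creating Your Own Datasets).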
def process(self):
    data_list = []

    grouped = df.groupby('EntityId')
    for id, group in grouped:
        node_features = torch.tensor(group.drop(['Labels'], axis=1).values)
        source_nodes = group.index[1:].values
        target_nodes = group.index[:-1].values
        labels = torch.tensor(group.Labels.values)
        edge_index = torch.tensor([source_nodes, target_nodes])

        data = Data(x=node_features, edge_index=edge_index, y=labels)
        data_list.append(data)

    if self.pre_filter is not None:
        data_list = [data for data in data_list if self.pre_filter(data)]

    if self.pre_transform is not None:
        data_list = [self.pre_transform(data) for data in data_list]

    data, slices = self.collate(data_list)
    torch.save((data, slices), self.processed_paths[0])
I would appreciate it if anyone could help me create a graph on any kind of data and use it with GCNConv.
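I agree with @TrialNerror: this is a data problem. Every entry of your edge_index should refer to a node in your data, that is, to a row of x, so its max should be below the number of nodes (678 here) and nowhere near 541690. Since you don't want to show us the data and ask for help to "create a graph on any kind of data", here is one.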
I kept your Net basically unchanged. You can play with the constants so that they match your data.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from torch_geometric.data import Data

num_node_features = 100
num_classes = 2
num_nodes = 678
num_edges = 1500
num_hidden_nodes = 128

# Random node features, random edges, and random node labels.
# Note that edge_index only contains values in [0, num_nodes).
x = torch.randn((num_nodes, num_node_features), dtype=torch.float32)
edge_index = torch.randint(low=0, high=num_nodes, size=(2, num_edges), dtype=torch.long)
y = torch.randint(low=0, high=num_classes, size=(num_nodes,), dtype=torch.long)


class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = GCNConv(num_node_features, num_hidden_nodes)
        self.conv2 = GCNConv(num_hidden_nodes, num_classes)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)
        return F.log_softmax(x, dim=1)


data = Data(x=x, edge_index=edge_index, y=y)
net = Net()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-2)

for i in range(1000):
    output = net(data)
    # output already holds log-probabilities, so this gives the same value
    # as F.nll_loss(output, data.y).
    loss = F.cross_entropy(output, data.y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if i % 100 == 0:
        print('Accuracy: ', (torch.argmax(output, dim=1)==data.y).float().mean())
Output:
Accuracy: tensor(0.5059)
Accuracy: tensor(0.8702)
Accuracy: tensor(0.9159)
Accuracy: tensor(0.9233)
Accuracy: tensor(0.9336)
Accuracy: tensor(0.9484)
Accuracy: tensor(0.9602)
Accuracy: tensor(0.9676)
Accuracy: tensor(0.9705)
Accuracy: tensor(0.9749)
(Yes, we can overfit random data.)
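Coming back to the process() method in the question: source_nodes and target_nodes are taken from group.index, which holds the DataFrame's global row labels rather than positions inside the group. That would explain values such as 541690 in a graph with only 678 nodes. The following is a minimal sketch of one possible fix, assuming the intended structure really is a chain linking each row of a group to the previous one; `chain_edge_index` is a hypothetical helper name.

```python
import torch


def chain_edge_index(num_nodes: int) -> torch.Tensor:
    """Connect node i+1 -> node i using local indices 0..num_nodes-1."""
    if num_nodes < 2:
        # A graph with fewer than two nodes has no chain edges.
        return torch.empty((2, 0), dtype=torch.long)
    source_nodes = torch.arange(1, num_nodes, dtype=torch.long)
    target_nodes = torch.arange(0, num_nodes - 1, dtype=torch.long)
    return torch.stack([source_nodes, target_nodes], dim=0)


edge_index = chain_edge_index(4)
print(edge_index.tolist())  # [[1, 2, 3], [0, 1, 2]]
```

Inside the loop over groups this could be called as chain_edge_index(len(group)), so that the largest index is always len(group) - 1. It may also be necessary to cast the node features to float32, since a tensor built from a float64 NumPy array keeps that dtype while the GCNConv weights are float32 by default.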