I am new to TensorBoard.
I am using fairly simple code running an experiment, and this is the output:
I don't remember asking for a hp_metric graph, yet here it is.
What is it and how do I get rid of it?
Full code to reproduce, using PyTorch Lightning (not that I think anyone should have to reproduce this to answer):
Please notice that the ONLY line referencing TensorBoard is:
self.logger.experiment.add_scalars("losses", {"train_loss": loss}, global_step=self.current_epoch)
import torch
from torch import nn
import …

I am using PyTorch Lightning version 1.4.0 and have defined the following class for the dataset:
class CustomTrainDataset(Dataset):
    '''
    Custom PyTorch Dataset for training
    Args:
        data (pd.DataFrame) - DF containing product info (and maybe also ratings)
        all_itemIds (list) - Python3 list containing all Item IDs
    '''
    def __init__(self, data, all_orderIds):
        self.users, self.items, self.labels = self.get_dataset(data, all_orderIds)

    def __len__(self):
        return len(self.users)

    def __getitem__(self, idx):
        return self.users[idx], self.items[idx], self.labels[idx]

    def get_dataset(self, data, all_orderIds):
        users, items, labels = [], [], []
        user_item_set = set(zip(train_ratings['CustomerID'], train_ratings['ItemCode']))
        num_negatives = 7
        for u, i in user_item_set: …

I have a model that I am trying to use with the Trainer in DDP mode.
from typing import Dict, Union

import pytorch_lightning as pl
import torch
import torchvision
from torch import Tensor
from torch.nn import CrossEntropyLoss
from torchmetrics import Accuracy

class Model(pl.LightningModule):
    def __init__(
        self,
        model_name: str,
        num_classes: int,
        model_hparams: Dict[str, Union[str, int]],
        optimizer_name: str,
        optimizer_hparams: Dict[str, Union[str, int]],
    ):
        super().__init__()
        self.save_hyperparameters()
        self.model = torchvision.models.resnet18(num_classes=num_classes, **model_hparams)
        self.loss_module = CrossEntropyLoss()
        self.example_input_array = torch.zeros((1, 3, 512, 512), dtype=torch.float32)
        # Trying to use in DDP mode
        self.test_accuracy = Accuracy(num_classes=num_classes)

    def forward(self, imgs) -> Tensor:
        return self.model(imgs)

    # <redacted training_*, val_*, etc. as they are not relevant>
    def test_step(self, …

I am trying to build a multi-input model with PyTorch and PyTorch Lightning, but I cannot figure out why the trainer gets stuck at epoch 0. I am trying to migrate this code from TensorFlow to PyTorch, but the PyTorch learning curve is a bit steep, and I do not know where to go from here.
RC_train_config = config.init_dataset_config(
    'RC',
    'GI4E',
    'label',
    16,
    lr = 0.001,
    epochs = 500,
    train_ratio = 0.8
)
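This is the model's configuration, including the hyperparameters and the dataset used. It is also used for data selection, since different datasets require different handling.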
class RCDataset(Dataset):
    def __init__(self, config_dataset):
        super().__init__()
        self.config_dataset = config_dataset
        # Image-handling
        if self.config_dataset['dataset'] == 'all':
            pass
        elif self.config_dataset['dataset'] == 'BIOID':
            if self.config_dataset['mode'] == 'label':
                pass
            elif self.config_dataset['mode'] == 'filter':
                pass
        elif self.config_dataset['dataset'] == 'GI4E':
            if self.config_dataset['mode'] == 'label':
                image1_noteye_paths = glob(C.WORKING_DATASETS['GI4E']['images_label'] + '/0/noteye/*')
                image1_eye_paths = glob(C.WORKING_DATASETS['GI4E']['images_label'] + '/0/left/*')
                image1_eye_paths += …

The official documentation only shows:
>>> from pytorch_lightning.metrics import ConfusionMatrix
>>> target = torch.tensor([1, 1, 0, 0])
>>> preds = torch.tensor([0, 1, 0, 0])
>>> confmat = ConfusionMatrix(num_classes=2)
>>> confmat(preds, target)
This does not show how to use the metric within the framework.
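For reference, the accumulation that a metric object of this kind performs can be sketched without the library. The following is a minimal pure-Python illustration, not the library's implementation: the class name RunningConfusionMatrix and its methods are hypothetical stand-ins, and it assumes that rows index the target class and columns index the predicted class. It reuses the preds and target values from the documentation snippet above, split into two batches.

```python
from typing import List, Sequence


class RunningConfusionMatrix:
    """Hypothetical stand-in that mimics an update()/compute() metric lifecycle."""

    def __init__(self, num_classes: int) -> None:
        self.num_classes = num_classes
        self.reset()

    def reset(self) -> None:
        # counts[t][p] = number of samples with target t that were predicted as p
        self.counts: List[List[int]] = [
            [0] * self.num_classes for _ in range(self.num_classes)
        ]

    def update(self, preds: Sequence[int], target: Sequence[int]) -> None:
        # Accumulate one batch; nothing is returned or logged here.
        for p, t in zip(preds, target):
            self.counts[t][p] += 1

    def compute(self) -> List[List[int]]:
        # Return a copy of the matrix accumulated since the last reset().
        return [row[:] for row in self.counts]


confmat = RunningConfusionMatrix(num_classes=2)
confmat.update(preds=[0, 1], target=[1, 1])  # first batch
confmat.update(preds=[0, 0], target=[0, 0])  # second batch
result = confmat.compute()
print(result)  # [[2, 0], [1, 1]]
```

The value returned by compute() is a full matrix rather than a single number, which may matter when it is passed to a logging call that expects a scalar.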
My attempt (the methods are incomplete and show only the relevant parts):
def __init__(...):
    self.val_confusion = pl.metrics.classification.ConfusionMatrix(num_classes=self._config.n_clusters)

def validation_step(self, batch, batch_index):
    ...
    log_probs = self.forward(orig_batch)
    loss = self._criterion(log_probs, label_batch)
    self.val_confusion.update(log_probs, label_batch)
    self.log('validation_confusion_step', self.val_confusion, on_step=True, on_epoch=False)

def validation_step_end(self, outputs):
    return outputs

def validation_epoch_end(self, outs):
    self.log('validation_confusion_epoch', self.val_confusion.compute())
After epoch 0, this gives:
Traceback (most recent call last):
File "C:\code\EPMD\Kodex\Templates\Testing\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 521, in train
self.train_loop.run_training_epoch()
File "C:\code\EPMD\Kodex\Templates\Testing\venv\lib\site-packages\pytorch_lightning\trainer\training_loop.py", …

I am trying to train my question-answering model with pytorch_lightning. However, when I run the command trainer.fit(model, data_module), I get the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-72-b9cdaa88efa7> in <module>()
----> 1 trainer.fit(model,data_module)
4 frames
/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py in _call_setup_hook(self)
1488
1489 if self.datamodule is not None:
-> 1490 self.datamodule.setup(stage=fn)
1491 self._call_callback_hooks("setup", stage=fn)
1492 self._call_lightning_module_hook("setup", stage=fn)
TypeError: setup() got an unexpected keyword argument 'stage'
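The TypeError in the traceback can be reproduced in isolation. The sketch below uses plain Python classes as hypothetical stand-ins for a data module (OldStyleDataModule, StageAwareDataModule and call_setup_hook are illustrative names, not Lightning APIs). It assumes, based on the traceback, that the trainer calls setup(stage=...), so a setup method defined without a stage parameter fails, while one that accepts it does not.

```python
class OldStyleDataModule:
    """Hypothetical data module whose setup() takes no arguments."""

    def setup(self):
        self.ready = True


class StageAwareDataModule:
    """Hypothetical data module whose setup() accepts the stage keyword."""

    def setup(self, stage=None):
        self.stage = stage
        self.ready = True


def call_setup_hook(datamodule, stage):
    # Mirrors the call shown in the traceback: setup(stage=...)
    datamodule.setup(stage=stage)


try:
    call_setup_hook(OldStyleDataModule(), "fit")
    error_message = None
except TypeError as exc:
    # The old signature cannot receive the keyword argument.
    error_message = str(exc)

print(error_message)

dm = StageAwareDataModule()
call_setup_hook(dm, "fit")
print(dm.stage)  # fit
```

If the BioQADataModule in question defines setup(self) without a stage parameter, changing the signature to setup(self, stage=None) is one possible fix under this assumption.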
I have installed and imported pytorch_lightning.
I have also defined data_module = BioQADataModule(train_df, val_df, tokenizer, batch_size = BATCH_SIZE), where BATCH_SIZE = 2 and N_EPOCHS = 6.
The model I am using is as follows:
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME, return_dict=True)
In addition, I have defined the model class as follows:
class BioQAModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME, return_dict=True)

    def …

I want to continue my model's training process using new data.
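I understand that you can resume training a PyTorch Lightning model with, e.g.,
pl.Trainer(max_epochs=10, resume_from_checkpoint='./checkpoints/blahblah.ckpt'), if, for example, your last checkpoint was saved at epoch 5. But is there a way to continue training by adding different data?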

I am trying to build a Temporal Fusion Transformer with the pytorch_forecasting module, but I get an error in the trainer.fit method: `model` must be a `LightningModule` or `torch._dynamo.OptimizedModule`, got `TemporalFusionTransformer`. I am simply copying the article from "towardsdatascience". Reference: https://towardsdatascience.com/temporal-fusion-transformer-time-series-forecasting-with-deep-learning-complete-tutorial-d32c1e51cd91#:~:text=T%20emporal%20F%20usion%20T,dynamics%20of%20multiple%20time%20sequences.
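
I have an existing model into which I load some pretrained weights and then run prediction (one image at a time) in PyTorch. I am trying to convert it, essentially, into a PyTorch Lightning module, and I am confused about a few things.
Currently, my model's __init__ method looks like this: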
self._load_config_file(cfg_file)
# just creates the pytorch network
self.create_network()
self.load_weights(weights_file)
self.cuda(device=0) # assumes GPU and uses one. This is probably suboptimal
self.eval() # prediction mode
From what I can gather from the Lightning documentation, I can do pretty much the same thing, except without the cuda() call. So something like:
self.create_network()
self.load_weights(weights_file)
self.freeze() # prediction mode
So my first question is whether this is the correct way to use Lightning. How would Lightning know whether it needs to use the GPU? I am guessing this needs to be specified somewhere.
Now, for prediction, I have the following setup:
def infer(self, frame):
    img = transform(frame)  # apply some transformation to the input
    img = torch.from_numpy(img).float().unsqueeze(0).cuda(device=0)
    with torch.no_grad():
        output = self.__call__(Variable(img)).data.cpu().numpy()
    return output
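This is the part that confuses me. Which functions do I need to override to make a Lightning-compatible prediction?
Also, at the moment the input is a numpy array. Is that possible with a Lightning module, or does one always have to use some sort of data loader?
At some point I want to extend this model implementation to do training as well, so I want to make sure I get it right. While most examples focus on training models, a simple example of predicting on a single image/data point at production time would be useful.
On a GPU with cuda 10.1, I am using 0.7.5 and pytorch …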
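
I have set up a transfer-learning ResNet in PyTorch Lightning. The structure is borrowed from this wandb tutorial: https://wandb.ai/wandb/wandb-lightning/reports/Image-Classification-using-PyTorch-Lightning--VmlldzoyODk1NzY
and from looking at the documentation: https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html
I am confused about the difference between the def forward() and def training_step() methods.
Initially, in the PL documentation, the model is not called in the training step, only in forward. But forward is not called in the training step either. I have been running the model on data and the outputs look sensible (I have an image callback, and I can see the model learning and eventually reaching good accuracy). But I am worried that, given the forward method is not being called, the model is somehow not being implemented?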
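In PyTorch, calling a module instance (self(x) or model(x)) runs its forward method, so whether a given training_step reaches forward depends on whether it calls self(...) or calls a submodule directly. The sketch below illustrates only that dispatch with a plain Python class: ToyModule is a hypothetical stand-in, not a torch.nn.Module or LightningModule, and the real __call__ does more than this (for example, it runs hooks).

```python
class ToyModule:
    """Hypothetical stand-in for a module whose __call__ delegates to forward()."""

    def __init__(self, weight: float) -> None:
        self.weight = weight
        self.forward_calls = 0  # counts how often forward() actually runs

    def forward(self, x: float) -> float:
        self.forward_calls += 1
        return self.weight * x

    def __call__(self, x: float) -> float:
        # Calling the object runs forward().
        return self.forward(x)

    def training_step(self, batch, batch_idx: int) -> float:
        x, y = batch
        prediction = self(x)          # reaches forward() via __call__
        return (prediction - y) ** 2  # squared-error loss for this toy example


module = ToyModule(weight=2.0)
loss = module.training_step((3.0, 5.0), batch_idx=0)
print(loss, module.forward_calls)  # 1.0 1
```

A training_step that instead calls an inner network directly (for example self.model(x)) would still train that network without ever going through the outer forward, which is one way the behaviour described above can arise.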
The model code is:
class TransferLearning(pl.LightningModule):
    "Works for Resnet at the moment"
    def __init__(self, model, learning_rate, optimiser = 'Adam', weights = [ 1/2288 , 1/1500], av_type = 'macro' ):
        super().__init__()
        self.class_weights = torch.FloatTensor(weights)
        self.optimiser = optimiser
        self.thresh = 0.5
        self.save_hyperparameters()
        self.learning_rate = learning_rate
        #add metrics for tracking
        self.accuracy = Accuracy()
        self.loss= nn.CrossEntropyLoss()
        self.recall = Recall(num_classes=2, threshold=self.thresh, average = av_type)
        self.prec = Precision( num_classes=2, …