Posts by use*_*571

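Connecting Colab to a paid TPU

I want to connect Colab to a paid TPU (upgrading from the free TPU). I created a JSON key following this guide: https://cloud.google.com/docs/authentication/production#auth-cloud-explicit-python, and then uploaded it to Colab. I can connect to my storage, but not to the TPU: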

%tensorflow_version 2.x
import os

import tensorflow as tf
from google.cloud import storage

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = './gcp-permissions.json'

# Authenticated API request: works.
storage_client = storage.Client.from_service_account_json(
    'gcp-permissions.json')
print(list(storage_client.list_buckets()))

# Accessing the TPU: does not work. The request times out.
cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu='My-TPU-Name',
    zone='us-central1-a',
    project='My-Project-Name'
)

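I also tried calling TPUClusterResolver with only the TPU name and 'credentials=gcp-permissions.json', with the same result. I have double-checked in the GCP console that my TPU is up and running. It is not preemptible. What am I missing?

Thanks!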
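
One way to narrow down a timeout like the one described above is to check whether the TPU's network endpoint is reachable from the notebook host at all, independently of TensorFlow. The following is a minimal sketch using only the Python standard library. The helper name `can_reach` is hypothetical, and the assumption that the TPU exposes its gRPC endpoint on port 8470 of the IP address shown in the GCP console should be verified against your own TPU's details.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical diagnostic helper: it only tests basic TCP reachability and
    says nothing about credentials or whether the TPU runtime is healthy.
    """
    try:
        # create_connection raises OSError (including timeouts and refusals) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_reach('<TPU internal IP>', 8470)` could be run in a Colab cell with the address taken from the GCP console. A `False` result would suggest a network-level problem between the notebook host and the TPU rather than a credentials problem, although this check alone does not prove the cause.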

google-cloud-platform google-colaboratory google-cloud-tpu tpu

6 votes · 1 answer · 725 views

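Huggingface BERT fine-tuning on TPU works in Colab but not on GCP

I am trying to fine-tune a Huggingface Transformers BERT model on a TPU. It works in Colab but fails when I switch to a paid TPU on GCP. The Jupyter notebook code is as follows: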

import tensorflow as tf
import transformers

# [1] Works.
model = transformers.TFBertModel.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

# [2] Also works. Got a bunch of startup messages from the TPU, all good.
cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu='[My TPU]',
    zone='us-central1-a',
    project='[My Project]'
)
tf.config.experimental_connect_to_cluster(cluster_resolver)
tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
tpu_strategy = tf.distribute.experimental.TPUStrategy(cluster_resolver)

# [3] Generates the error below (long). The same line works in Colab.
with tpu_strategy.scope():
    model = transformers.TFBertModel.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

Here is the error message:

NotFoundError                             Traceback (most recent call last)
<ipython-input-14-2cfc1a238903> in <module>
      1 with tpu_strategy.scope():
----> 2     model …

google-cloud-platform google-colaboratory google-cloud-tpu bert-language-model huggingface-transformers

5 votes · 0 answers · 1454 views