Access Azure blob storage from within an Azure ML experiment

Ste*_*org 14 python azure azure-machine-learning-studio cortana-intelligence

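Azure ML experiments provide ways to read and write CSV files to Azure blob storage through the Reader and Writer modules. However, I need to write a JSON file to blob storage. Since there is no module to do so, I'm trying to do it from within an Execute Python Script module.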

# Import the necessary items
from azure.storage.blob import BlobService

def azureml_main(dataframe1 = None, dataframe2 = None):
    account_name = 'mystorageaccount'
    account_key='mykeyhere=='
    json_string='{jsonstring here}'

    blob_service = BlobService(account_name, account_key)

    blob_service.put_block_blob_from_text("upload","out.json",json_string)

    # Return value must be of a sequence of pandas.DataFrame
    return dataframe1,

However, this results in an error: ImportError: No module named azure.storage.blob

This implies that the azure-storage Python package is not installed on Azure ML.
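One way to confirm this from inside the script is to attempt the import and report the result, rather than letting the whole module fail to load. This is only a minimal sketch; the helper name `is_importable` is hypothetical and not part of Azure ML or azure-storage.

```python
import importlib


def is_importable(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


# Report whether the package the script depends on is available.
print("azure.storage.blob available: %s" % is_importable("azure.storage.blob"))
```

The same check can be repeated after adding a bundled site-packages folder to sys.path to see whether the import problem is resolved.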

How can I write to Azure blob storage from inside my Azure ML experiment?

Here is the full error message:

Error 0085: The following error occurred during script evaluation, please view the output log for more information:
---------- Start of error message from Python interpreter ----------
data:text/plain,Caught exception while executing function: Traceback (most recent call last):
  File "C:\server\invokepy.py", line 162, in batch
    mod = import_module(moduleName)
  File "C:\pyhome\lib\importlib\__init__.py", line 37, in import_module
    __import__(name)
  File "C:\temp\azuremod.py", line 19, in <module>
    from azure.storage.blob import BlobService
ImportError: No module named azure.storage.blob

---------- End of error message from Python  interpreter  ----------
Start time: UTC 02/06/2016 17:59:47
End time: UTC 02/06/2016 18:00:00

Thanks, everyone!

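UPDATE: Thanks to Dan and Peter for the ideas below. This is the progress I've made using those recommendations. I created a clean Python 2.7 virtual environment (in VS 2005) and ran pip install azure-storage to get the dependencies into my site-packages directory. I then zipped the site-packages folder and uploaded it as the Zip file, as per Dan's note below. I then added a reference to the site-packages directory and successfully imported the required items. This resulted in a timeout error when writing to blob storage.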
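The packaging step described above can be sketched with the standard library. This is an illustration under assumptions, not the exact procedure used: `zip_directory` is a hypothetical helper, and the paths in the usage comment are placeholders. The archive keeps the folder itself at its root, so that a path like ".\Script Bundle\site-packages" resolves once the zip is unpacked.

```python
import os
import zipfile


def zip_directory(source_dir, zip_path):
    """Zip source_dir so the archive contains the folder itself at its root."""
    source_dir = os.path.abspath(source_dir)
    parent = os.path.dirname(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to the parent, e.g. "site-packages/azure/...".
                archive.write(full_path, os.path.relpath(full_path, parent))
    return zip_path


# Hypothetical usage; adjust both paths to your own virtual environment:
# zip_directory(r"env\Lib\site-packages", "site-packages.zip")
```

Only files are written, so empty directories are not preserved; that is normally harmless for Python packages.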

Failed to write to Blob storage

Here is my code:

# Get access to the uploaded Python packages    
import sys
packages = r".\Script Bundle\site-packages"  # raw string keeps the backslashes literal
sys.path.append(packages)

# Import the necessary items from packages referenced above
from azure.storage.blob import BlobService
from azure.storage.queue import QueueService

def azureml_main(dataframe1 = None, dataframe2 = None):
    account_name = 'mystorageaccount'
    account_key='p8kSy3F...elided...3plQ=='

    blob_service = BlobService(account_name, account_key)
    blob_service.put_block_blob_from_text("upload","out.txt","Test to write")

    # All of the following also fail
    #blob_service.create_container('images')
    #blob_service.put_blob("upload","testme.txt","foo","BlockBlob")

    #queue_service = QueueService(account_name, account_key)
    #queue_service.create_queue('taskqueue')

    # Return value must be of a sequence of pandas.DataFrame
    return dataframe1,

Here is the new error log:

Error 0085: The following error occurred during script evaluation, please view the output log for more information:
---------- Start of error message from Python interpreter ----------
data:text/plain,C:\pyhome\lib\site-packages\requests\packages\urllib3\util\ssl_.py:79: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
Caught exception while executing function: Traceback (most recent call last):   
  File "C:\server\invokepy.py", line 169, in batch
    odfs = mod.azureml_main(*idfs)
  File "C:\temp\azuremod.py", line 44, in azureml_main
    blob_service.put_blob("upload","testme.txt","foo","BlockBlob")
  File ".\Script Bundle\site-packages\azure\storage\blob\blobservice.py", line 883, in put_blob
    self._perform_request(request)
  File ".\Script Bundle\site-packages\azure\storage\storageclient.py", line 171, in _perform_request
    resp = self._filter(request)
  File ".\Script Bundle\site-packages\azure\storage\storageclient.py", line 160, in _perform_request_worker
    return self._httpclient.perform_request(request)
  File ".\Script Bundle\site-packages\azure\storage\_http\httpclient.py", line 181, in perform_request
    self.send_request_body(connection, request.body)
  File ".\Script Bundle\site-packages\azure\storage\_http\httpclient.py", line 143, in send_request_body
    connection.send(request_body)
  File ".\Script Bundle\site-packages\azure\storage\_http\requestsclient.py", line 81, in send
    self.response = self.session.request(self.method, self.uri, data=request_body, headers=self.headers, timeout=self.timeout)
  File "C:\pyhome\lib\site-packages\requests\sessions.py", line 464, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\pyhome\lib\site-packages\requests\sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "C:\pyhome\lib\site-packages\requests\adapters.py", line 431, in send
    raise SSLError(e, request=request)
SSLError: The write operation timed out

---------- End of error message from Python  interpreter  ----------
Start time: UTC 02/10/2016 15:33:00
End time: UTC 02/10/2016 15:34:18

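My current lead is the dependency of azure-storage on the requests Python package. requests has a known bug in Python 2.7 when calling newer SSL protocols. I'm not sure, but I'm digging around in that area now.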

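UPDATE 2: This code runs perfectly fine inside a Python 3 Jupyter notebook. Additionally, if I open the blob container to public access, I can read directly from the container through a URL. For instance: df = pd.read_csv("https://mystorageaccount.blob.core.windows.net/upload/test.csv") easily loads the file from blob storage. However, I cannot use azure.storage.blob.BlobService to read that same file.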


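UPDATE 3: Dan, in a comment below, suggested I try from the Jupyter notebooks hosted on Azure ML. I had been running it from a local Jupyter notebook (see Update 2 above). However, it fails when run from an Azure ML notebook, and the errors point to the requests package again. I'll need to find the known issues with that package, but from my reading, the known issue is in urllib3 and only affects Python 2.7, not any Python 3.x version. And this was run in a Python 3.x notebook. Grrr.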


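UPDATE 4: As Dan notes below, this may be an issue with Azure ML networking, since Execute Python Script is relatively new and only recently got networking support. However, I have also tested this on an Azure App Service webjob, which is on an entirely different Azure platform. (It is also an entirely different Python distribution and supports both Python 2.7 and 3.4/5, but only in 32 bit - even on 64-bit machines.) The code fails there as well, with an InsecurePlatformWarning message.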

[02/08/2016 15:53:54 > b40783: SYS INFO] Run script 'ListenToQueue.py' with script host - 'PythonScriptHost'
[02/08/2016 15:53:54 > b40783: SYS INFO] Status changed to Running
[02/08/2016 15:54:09 > b40783: INFO] test.csv
[02/08/2016 15:54:09 > b40783: ERR ] D:\home\site\wwwroot\env\Lib\site-packages\requests\packages\urllib3\util\ssl_.py:315: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
[02/08/2016 15:54:09 > b40783: ERR ]   SNIMissingWarning
[02/08/2016 15:54:09 > b40783: ERR ] D:\home\site\wwwroot\env\Lib\site-packages\requests\packages\urllib3\util\ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
[02/08/2016 15:54:09 > b40783: ERR ]   InsecurePlatformWarning
[02/08/2016 15:54:09 > b40783: ERR ] D:\home\site\wwwroot\env\Lib\site-packages\requests\packages\urllib3\util\ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
[02/08/2016 15:54:09 > b40783: ERR ]   InsecurePlatformWarning
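The two urllib3 warnings in this log refer to features of the interpreter's ssl module: InsecurePlatformWarning is raised when no real ssl.SSLContext is available, and SNIMissingWarning when SNI support is missing. A small sketch like the following, run in the same environment, shows what the interpreter reports. The function name `ssl_capabilities` is hypothetical, and this only diagnoses the platform; it does not fix the timeout.

```python
import ssl
import sys


def ssl_capabilities():
    """Collect the interpreter features that the urllib3 warnings refer to."""
    return {
        "python_version": sys.version.split()[0],
        "openssl_version": ssl.OPENSSL_VERSION,
        # A missing SSLContext is what triggers InsecurePlatformWarning.
        "has_sslcontext": hasattr(ssl, "SSLContext"),
        # Missing SNI support is what triggers SNIMissingWarning.
        "has_sni": bool(getattr(ssl, "HAS_SNI", False)),
    }


for key, value in sorted(ssl_capabilities().items()):
    print("%s: %s" % (key, value))
```

If has_sslcontext or has_sni comes back False, the warnings are expected on that platform regardless of which Azure service hosts the code.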

Ste*_*org 5

Bottom Line Up Front: Use HTTP instead of HTTPS to access Azure storage.

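When declaring BlobService, pass in protocol='http' to force the service to communicate over HTTP. Note that your container must be configured to allow requests over HTTP (which it does by default).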

client = BlobService(STORAGE_ACCOUNT, STORAGE_KEY, protocol="http")

History and credit:

I posted a query on this topic to @AzureHelps, and they opened a ticket on the MSDN forums: https://social.msdn.microsoft.com/Forums/azure/en-US/46166b22-47ae-4808-ab87-402388dd7a5c/

Sudarshan Raghunathan replied with the magic. Here are the steps, so that everyone can easily duplicate my fix:

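  1. Download azure.zip, which provides the required libraries: https://azuremlpackagesupport.blob.core.windows.net/python/azure.zip
  2. Upload it as a DataSet to Azure ML Studio
  3. Connect it to the Zip input on an Execute Python Script module
  4. Write your script as you normally would, being sure to create your BlobService object with protocol='http'
  5. Run the experiment - you should now be able to write to blob storage.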

Some example code can be found here: https://gist.github.com/drdarshan/92fff2a12ad9946892df

The code I used is below. It does not first write the output to the file system; it sends the JSON as a text stream.

from azure.storage.blob import BlobService

def azureml_main(dataframe1 = None, dataframe2 = None):
    account_name = 'mystorageaccount'
    account_key='p8kSy3FACx...redacted...ebz3plQ=='
    container_name = "upload"
    json_output_file_name = 'testfromml.json'
    json_orient = 'records' # Can be index, records, split, columns, values
    json_force_ascii = False  # keep non-ASCII characters unescaped in the output

    blob_service = BlobService(account_name, account_key, protocol='http')

    blob_service.put_block_blob_from_text(container_name,json_output_file_name,dataframe1.to_json(orient=json_orient, force_ascii=json_force_ascii))

    # Return value must be of a sequence of pandas.DataFrame
    return dataframe1,

Some thoughts:

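  1. I would prefer that the azure Python library be available by default. Microsoft imports hundreds of third-party libraries into Azure ML as part of the Anaconda distribution. They should also include those needed to work with Azure. We're in Azure, we've committed to Azure. Embrace it.
  2. I don't like that I have to use HTTP instead of HTTPS. Granted, this is internal Azure communication, so it's probably no big deal. However, most of the documentation recommends SSL/HTTPS when working with blob storage, so I'd prefer to be able to do that.
  3. I still get random timeout errors in the experiment. Sometimes the Python code executes in milliseconds; other times it runs for 60 or so seconds and then times out. This makes running it in experiments very frustrating at times. However, when it is published as a Web Service, I don't seem to have this problem.
  4. I would prefer that the experience with my local code matched Azure ML more closely. Locally, I can use HTTPS and never time out. It's blazing fast and easy to write. But moving to an Azure ML experiment means debugging nearly every single time.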
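For the intermittent timeouts in point 3, one possible mitigation is to retry the upload a few times before giving up. This is only a sketch under assumptions: `call_with_retry` and its parameters are hypothetical, it is not something suggested in the MSDN thread, and it has not been verified inside an Azure ML experiment.

```python
import time


def call_with_retry(operation, attempts=3, delay_seconds=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:  # in practice, narrow this to the timeout error seen
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds * attempt)  # wait a little longer after each failure
    raise last_error


# Hypothetical usage with the names from the script above:
# call_with_retry(lambda: blob_service.put_block_blob_from_text(
#     container_name, json_output_file_name, json_text))
```

Because each attempt may itself run until the client times out, retries can lengthen the experiment considerably, so a small number of attempts is probably sensible.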

Huge props to Dan, Peter, and Sudarshan, all from Microsoft, for their help in resolving this. I very much appreciate it!