我正在关注PowerShell Gallery入门,其中指出PowerShellGet模块存在于Windows 10中(我正在使用它 - build 14721).要确认,我正在运行PowerShell v5:
>$PSVersionTable
Name Value
---- -----
PSVersion 5.0.14271.1000
PSCompatibleVersions {1.0, 2.0, 3.0, 4.0...}
BuildVersion 10.0.14271.1000
CLRVersion 4.0.30319.42000
WSManStackVersion 3.0
PSRemotingProtocolVersion 2.3
SerializationVersion 1.1.0.1
Run Code Online (Sandbox Code Playgroud)
PowerShellGet还要求NuGet提供程序与PowerShell库一起使用.如果NuGet提供程序不在以下位置之一,则会在首次使用PowerShellGet时自动安装NuGet提供程序:•$ env:ProgramFiles\PackageManagement\ProviderAssemblies
•$ env:LOCALAPPDATA\PackageManagement\ProviderAssemblies
我在这些地方没有任何东西:
>ls $env:LOCALAPPDATA\PackageManagement\ProviderAssemblies
>ls $env:ProgramFiles\PackageManagement\ProviderAssemblies
ls : Cannot find path 'C:\Program Files\PackageManagement\ProviderAssemblies' because it does not exist.
At line:1 char:1
+ ls $env:ProgramFiles\PackageManagement\ProviderAssemblies
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (C:\Program File...viderAssemblies:String) [Get-ChildItem], ItemNotFoundException
+ FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetChildItemCommand
Run Code Online (Sandbox Code Playgroud)
或者,您可以运行Install-PackageProvider -Name NuGet …
我们在 Google Cloud Dataflow 中内置了一个数据管道,它使用来自 pubsub 主题的消息并将它们流式传输到 BigQuery 中。为了测试它是否成功运行,我们在 CI 管道中运行了一些测试,这些测试将消息发布到 pubsub 主题并验证消息是否成功写入 BigQuery。
这是发布到 pubsub 主题的代码:
from google.cloud import pubsub_v1
def post_messages(project_id, topic_id, rows)
futures = dict()
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(
project_id, topic_id
)
def get_callback(f, data):
def callback(f):
try:
futures.pop(data)
except:
print("Please handle {} for {}.".format(f.exception(), data))
return callback
for row in rows:
# When you publish a message, the client returns a future. Data must be a bytestring
# ...
# construct a message in …Run Code Online (Sandbox Code Playgroud) 我正在尝试使用异步客户端库(该库也是由 Google 提供)调用外部 API(由 Google 提供的 API)。
我尝试调用的异步方法是async list_featurestores()(文档)。它提供了以下示例代码:
from google.cloud import aiplatform_v1
async def sample_list_featurestores():
# Create a client
client = aiplatform_v1.FeaturestoreServiceAsyncClient()
# Initialize request argument(s)
request = aiplatform_v1.ListFeaturestoresRequest(
parent="parent_value",
)
# Make the request
page_result = client.list_featurestores(request=request)
# Handle the response
async for response in page_result:
print(response)
Run Code Online (Sandbox Code Playgroud)
(我已经从上面的链接页面复制/粘贴了该代码)。
为了运行该代码,我对其进行了稍微调整:
import asyncio
from google.cloud import aiplatform_v1
async def sample_list_featurestores():
client = aiplatform_v1.FeaturestoreServiceAsyncClient()
request = aiplatform_v1.ListFeaturestoresRequest(parent="projects/MY_GCP_PROJECT/locations/europe-west2",)
page_result = client.list_featurestores(request=request)
async for response in page_result:
print(response)
if …Run Code Online (Sandbox Code Playgroud) 我正在尝试使用Azure资源管理器(ARM)模板文件作为ASP.net网站进行部署,并且遇到了障碍.这是Azure的一个新生功能,所以网上没有太多关于它的专业知识,希望有人可以提供帮助.
我可以在新的资源组中成功创建一个新站点(即Microsoft.Web/sites资源),即当我在ARM模板中定义一个网站时,它可以正常工作:
{
"apiVersion": "2014-06-01",
"name": "[parameters('siteName')]",
"type": "Microsoft.Web/sites",
"location": "[parameters('siteLocation')]",
"tags": {
"[concat('hidden-related:', resourceGroup().id, '/providers/Microsoft.Web/serverfarms/', parameters('hostingPlanName'))]": "Resource",
"displayName": "Website"
},
"dependsOn": [
"[concat('Microsoft.Web/serverfarms/', parameters('hostingPlanName'))]"
],
"properties": {
"name": "[parameters('siteName')]",
"serverFarm": "[parameters('hostingPlanName')]"
}
}
Run Code Online (Sandbox Code Playgroud)
当我尝试将ASP.net网站部署到其中时,我的问题出现了.这是我添加到ARM模板的内容:
{
"apiVersion": "2014-06-01",
"name": "[parameters('siteName')]",
"type": "Microsoft.Web/sites",
"location": "[parameters('siteLocation')]",
"tags": {
"[concat('hidden-related:', resourceGroup().id, '/providers/Microsoft.Web/serverfarms/', parameters('hostingPlanName'))]": "Resource",
"displayName": "Website"
},
"dependsOn": [
"[concat('Microsoft.Web/serverfarms/', parameters('hostingPlanName'))]"
],
"properties": {
"name": "[parameters('siteName')]",
"serverFarm": "[parameters('hostingPlanName')]"
},
"resources": [
{
"apiVersion": "2014-06-01",
"type": "extensions",
"name": "MSDeploy",
"dependsOn": [ "[concat('Microsoft.Web/sites/', parameters('siteName'))]" ], …Run Code Online (Sandbox Code Playgroud) 为什么列1st_from_end包含null:
from pyspark.sql.functions import split
df = sqlContext.createDataFrame([('a b c d',)], ['s',])
df.select( split(df.s, ' ')[0].alias('0th'),
split(df.s, ' ')[3].alias('3rd'),
split(df.s, ' ')[-1].alias('1st_from_end')
).show()
Run Code Online (Sandbox Code Playgroud)
我在不可为空的数据框中有一个 StructField。简单的例子:
import pyspark.sql.functions as F
from pyspark.sql.types import *
l = [('Alice', 1)]
df = sqlContext.createDataFrame(l, ['name', 'age'])
df = df.withColumn('foo', F.when(df['name'].isNull(),False).otherwise(True))
df.schema.fields
Run Code Online (Sandbox Code Playgroud)
返回:
[StructField(name,StringType,true), StructField(age,LongType,true), StructField(foo,BooleanType,false)]
请注意,该字段foo不可为空。问题是(出于我不会讨论的原因)我希望它可以为空。我发现这篇文章Change nullable property of column in spark dataframe建议了一种方法,所以我将其中的代码调整为:
import pyspark.sql.functions as F
from pyspark.sql.types import *
l = [('Alice', 1)]
df = sqlContext.createDataFrame(l, ['name', 'age'])
df = df.withColumn('foo', F.when(df['name'].isNull(),False).otherwise(True))
df.schema.fields
newSchema = [StructField('name',StringType(),True), StructField('age',LongType(),True),StructField('foo',BooleanType(),False)]
df2 = sqlContext.createDataFrame(df.rdd, newSchema)
Run Code Online (Sandbox Code Playgroud)
失败了:
TypeError: StructField(name,StringType,true) 不是 JSON 可序列化的
我也在堆栈跟踪中看到了这一点:
raise ValueError("检测到循环引用")
所以我有点卡住了。任何人都可以修改这个例子,使我能够定义一个列可以 …
我正在使用Terraform archive_file 提供程序将多个文件打包成一个 zip 文件。当我像这样定义存档时,它工作正常:
data "archive_file" "archive" {
type = "zip"
output_path = "./${var.name}.zip"
source_dir = "${var.source_dir}"
}
Run Code Online (Sandbox Code Playgroud)
但是我不希望存档包含所有文件var.source_dir,我只想要其中的一个子集。我注意到 archive_file 提供程序有一个source_file属性,所以我希望我可以提供这些文件的列表并将它们打包到存档中,如下所示:
locals {
source_files = ["${var.source_dir}/foo.txt", "${var.source_dir}/bar.txt"]
}
data "archive_file" "archive" {
type = "zip"
output_path = "./${var.name}.zip"
count = "2"
source_file = "${local.source_files[count.index]}"
}
Run Code Online (Sandbox Code Playgroud)
但这不起作用,存档是为定义的每个文件构建的,local.source-files因此我有一个“最后一个获胜”场景,其中构建的存档文件仅包含 bar.txt。
我试过这个:
locals {
source_files = ["${var.source_dir}/main.py", "${var.source_dir}/requirements.txt"]
}
data "archive_file" "archive" {
type = "zip"
output_path = "./${var.name}.zip"
source_file = "${local.source_files}"
}
Run Code Online (Sandbox Code Playgroud)
但不出所料,失败了:
data.archive_file.archive:source_file …
给定PySpark DataFrame是否可以获得 DataFrame 引用的源列的列表?
也许一个更具体的例子可能有助于解释我所追求的。假设我有一个 DataFrame 定义为:
import pyspark.sql.functions as func
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
source_df = spark.createDataFrame(
[("pru", 23, "finance"), ("paul", 26, "HR"), ("noel", 20, "HR")],
["name", "age", "department"],
)
source_df.createOrReplaceTempView("people")
sqlDF = spark.sql("SELECT name, age, department FROM people")
df = sqlDF.groupBy("department").agg(func.max("age").alias("max_age"))
df.show()
Run Code Online (Sandbox Code Playgroud)
返回:
+----------+--------+
|department|max_age |
+----------+--------+
| finance| 23|
| HR| 26|
+----------+--------+
Run Code Online (Sandbox Code Playgroud)
引用的列df是[department, age]。是否可以通过编程方式获取引用列的列表?
感谢在 pyspark 中捕获explain()的结果,我知道我可以将计划提取为字符串:
+----------+--------+
|department|max_age |
+----------+--------+
| finance| 23|
| …Run Code Online (Sandbox Code Playgroud) apache-spark ×3
pyspark ×3
python ×3
archive-file ×1
azure ×1
google-oauth ×1
powershell ×1
terraform ×1