The code that actually writes each file works fine. The problem I'm having is that the data-validation part doesn't seem to do anything: no dropdown is created in the range I reference.
Thanks in advance for any and all help!
%%time
import pandas as pd
import xlsxwriter as ew
import csv as csv
import os
import glob
import openpyxl
#remove existing files from directory
files = glob.glob(#filename)
for f in files:
    os.remove(f)
pendpath = #filename
df = pd.read_sas(pendpath)
allusers = df.UserID_NB.unique()
listuserpath = #filename
listusers = pd.read_csv(listuserpath)
listusers = listusers['USER_ID'].apply(lambda x: str(x).strip())
for id in listusers:
    x = df.loc[df['UserID_NB'] == id]
    path = #filename
    x.to_excel(path, sheet_name=str(id), index=False)
    from openpyxl import load_workbook
    wb = openpyxl.load_workbook(filename=path)
    …
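The snippet above is cut off before the data-validation code itself, so for reference here is a minimal sketch of how a list-type dropdown is usually attached with openpyxl's DataValidation; the sheet lookup, the D2:D100 range, and the option list are placeholders rather than the original values:

from openpyxl.worksheet.datavalidation import DataValidation

# list-type validation; the option list must be a quoted, comma-separated string
dv = DataValidation(type="list", formula1='"Yes,No,Maybe"', allow_blank=True)

ws = wb[str(id)]              # sheet written by to_excel above
ws.add_data_validation(dv)    # register the validation on the worksheet
dv.add("D2:D100")             # placeholder range that should get the dropdown
wb.save(path)                 # the dropdown only shows up after saving

Without both ws.add_data_validation(dv) and a final wb.save(path), the dropdown silently never appears in the file.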
When I run the code below I get the error java.lang.AssertionError: assertion failed: Found duplicate rewrite attributes. It ran without a hitch before we updated our Databricks runtime.
top10_df is a dataframe whose unique key is the list of columns in groups.
res_df is an aggregation over those unique keys in top10_df, taking the min and max created dates.
Once res_df is created and persisted, it is joined back onto top10_df on the unique keys in groups.
import pyspark.sql.functions as fn
from pyspark.sql.window import Window as w

groups = ['col1','col2','col3','col4']
min_date_created = fn.min('date_created').alias('min_date_created')
max_date_created = fn.max('date_created').alias('max_date_created')
# min/max created date per unique key
res_df = (top10_df
          .groupBy(groups)
          .agg(min_date_created, max_date_created)
         )
res_df.persist()
print(res_df.count())
score_rank = fn.row_number().over(w.partitionBy(groups).orderBy(fn.desc('score')))
unique_issue_id = fn.row_number().over(w.orderBy(groups))
out_df = (top10_df.alias('t10')
          .join(res_df.alias('res'), groups, 'left')
          # keep only the rows created on the latest date for their group
          .where(fn.col('t10.date_created') == fn.col('res.max_date_created'))
          .drop(fn.col('t10.date_created'))
          .drop(fn.col('t10.date_updated'))
          # keep the highest-scoring row per group
          .withColumn('score_rank', score_rank)
          .where(fn.col('score_rank') == 1)
          .drop('score_rank',
                'latest_revision_complete_hash',
                'latest_revision_durable_hash')
          .withColumn('unique_issue_id', unique_issue_id)
          .withColumnRenamed('res.id', 'resource_id')
         )
out_df.persist()
print(out_df.count())
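Since res_df is built from top10_df, the join above is effectively a self-join, which is where this assertion seems to come from on newer runtimes. One workaround that is sometimes suggested is to rebuild one side of the join so the two plans no longer share attribute IDs; a minimal sketch (assuming a SparkSession named spark is in scope, and not presented as a confirmed fix):

# rebuild res_df from its rows and schema so it no longer shares lineage
# (and therefore attribute IDs) with top10_df before the join
res_df_detached = spark.createDataFrame(res_df.rdd, res_df.schema)

out_df = (top10_df.alias('t10')
          .join(res_df_detached.alias('res'), groups, 'left')
          .where(fn.col('t10.date_created') == fn.col('res.max_date_created'))
         )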