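I want to parse a git diff with Python code, and I am interested in getting the following information from the diff parser:
I am using unidiff 0.5.2 for this purpose, and I wrote the following code: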
from unidiff import PatchSet
import git
import os
commit_sha1 = 'b4defafcb26ab86843bbe3464a4cf54cdc978696'
repo_directory_address = '/my/git/repo'
repository = git.Repo(repo_directory_address)
commit = repository.commit(commit_sha1)
diff_index = commit.diff(commit_sha1+'~1', create_patch=True)
diff_text = reduce(lambda x, y: str(x)+os.linesep+str(y), diff_index).split(os.linesep)
patch = PatchSet(diff_text)
print patch[0].is_added_file
I am using GitPython to generate the Git diff. The code above fails with the following error:
current_file = PatchedFile(source_file, target_file,
UnboundLocalError: local variable 'source_file' referenced before assignment
I would appreciate any help in resolving this error.
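I want to write a large data stream to a Parquet file with Python. The data is too large to hold in memory and write in one go.
I found two Python libraries that can read and write Parquet files (Pyarrow and Fastparquet). Below is my attempt with Pyarrow, but I am happy to try another library if you know a solution that works: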
import pandas as pd
import random
import pyarrow as pa
import pyarrow.parquet as pq

def data_generator():
    # This is a simulation for my generator function
    # It is not allowed to change the nature of this function
    options = ['op1', 'op2', 'op3', 'op4']
    while True:
        dd = {'c1': random.randint(1, 10), 'c2': random.choice(options)}
        yield dd

result_file_address = 'example.parquet'
index = 0
try:
    dic_data = next(data_generator())
    df = pd.DataFrame(dic_data, [index])
    table = …