I am extracting XML data from 465 web pages and parsing and storing it in a ".csv" file using a Python dataframe. After running for about 30 minutes, the program saves a "200.csv" file and kills itself; the command line just shows "Killed". But when I run the extraction separately for the first 200 pages and then for the remaining 265 pages, it works fine. I have searched the internet thoroughly and found no proper answer to this problem. Can you tell me what the reason might be?
for i in list:
    addr = str(url + i + '?&$format=json')
    response = requests.get(addr, auth=(self.user_, self.pass_))
    # print (response.content)
    json_data = response.json()
    if ('d' in json_data):
        df = json_normalize(json_data['d']['results'])
        paginate = 'true'
        while paginate == 'true':
            if '__next' in json_data['d']:
                addr_next = json_data['d']['__next']
                response = requests.get(addr_next, auth=(self.user_, self.pass_))
                json_data = response.json()
                df = df.append(json_normalize(json_data['d']['results']))
            else:
                paginate = 'false'
        try:
            if(not df.empty):
                storage = '/usr/share/airflow/documents/output/' + i + …
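A "Killed" message with no traceback usually means the Linux out-of-memory killer terminated the process, and appending every paginated page to one DataFrame keeps all of it in memory at once. For comparison, here is a minimal sketch (the function name, the `pages` argument and the auth parameters are placeholders standing in for the original method's attributes; the output directory is taken from the snippet above) that writes each chunk to the CSV as soon as it is parsed, so memory use stays roughly constant:

import requests
from pandas.io.json import json_normalize  # older pandas; newer versions: from pandas import json_normalize

def extract_to_csv(url, pages, user_, pass_, out_dir='/usr/share/airflow/documents/output/'):
    for page in pages:
        addr = url + page + '?&$format=json'
        json_data = requests.get(addr, auth=(user_, pass_)).json()
        if 'd' not in json_data:
            continue
        first_chunk = True
        while True:
            chunk = json_normalize(json_data['d']['results'])
            # Append each chunk to the page's CSV instead of growing a single DataFrame.
            chunk.to_csv(out_dir + page + '.csv',
                         mode='w' if first_chunk else 'a',
                         header=first_chunk, index=False)
            first_chunk = False
            next_link = json_data['d'].get('__next')
            if not next_link:
                break
            json_data = requests.get(next_link, auth=(user_, pass_)).json()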
I created a Debezium connector on Docker using curl from the terminal, but I keep ending up modifying the existing connector.

My docker-compose file:
---
version: '3'
services:
  kafka-connect-02:
    image: confluentinc/cp-kafka-connect:latest
    container_name: kafka-connect-02
    ports:
      - 8083:8083
    environment:
      CONNECT_LOG4J_APPENDER_STDOUT_LAYOUT_CONVERSIONPATTERN: "[%d] %p %X{connector.context}%m (%c:%L)%n"
      CONNECT_CUB_KAFKA_TIMEOUT: 300
      CONNECT_BOOTSTRAP_SERVERS: "https://***9092"
      CONNECT_REST_ADVERTISED_HOST_NAME: 'kafka-connect-02'
      CONNECT_REST_PORT: 8083
      CONNECT_GROUP_ID: kafka-connect-group-01-v04
      CONNECT_CONFIG_STORAGE_TOPIC: _kafka-connect-group-01-v04-configs
      CONNECT_OFFSET_STORAGE_TOPIC: _kafka-connect-group-01-v04-offsets
      CONNECT_STATUS_STORAGE_TOPIC: _kafka-connect-group-01-v04-status
      CONNECT_KEY_CONVERTER: io.confluent.connect.avro.AvroConverter
      CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL: "https://***9092"
      CONNECT_KEY_CONVERTER_BASIC_AUTH_CREDENTIALS_SOURCE: "USER_INFO"
      CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_BASIC_AUTH_USER_INFO: "***:***"
      CONNECT_VALUE_CONVERTER: io.confluent.connect.avro.AvroConverter
      CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: "https://***9092"
      CONNECT_VALUE_CONVERTER_BASIC_AUTH_CREDENTIALS_SOURCE: "USER_INFO"
      CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_BASIC_AUTH_USER_INFO: "***:***"
      CONNECT_INTERNAL_KEY_CONVERTER: 'org.apache.kafka.connect.json.JsonConverter'
      CONNECT_INTERNAL_VALUE_CONVERTER: 'org.apache.kafka.connect.json.JsonConverter'
      CONNECT_LOG4J_ROOT_LOGLEVEL: 'INFO'
      CONNECT_LOG4J_LOGGERS: 'org.apache.kafka.connect.runtime.rest=WARN,org.reflections=ERROR'
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: '3'
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: '3'
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: '3'
      CONNECT_PLUGIN_PATH: '/usr/share/java,/usr/share/confluent-hub-components/'
      # Confluent Cloud config
      CONNECT_REQUEST_TIMEOUT_MS: "20000"
      CONNECT_RETRY_BACKOFF_MS: …
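For context on the create-versus-modify behaviour: the Kafka Connect REST API creates a brand-new connector with POST /connectors (which is rejected if the name already exists) and creates-or-updates one with PUT /connectors/<name>/config. A minimal sketch using Python's requests, assuming the worker is reachable on localhost:8083 (matching the published port above) and using a placeholder connector name and config:

import json
import requests

CONNECT_URL = 'http://localhost:8083'   # placeholder: the Connect worker's REST endpoint
name = 'my-debezium-connector'           # placeholder connector name
config = {
    'connector.class': 'io.debezium.connector.mysql.MySqlConnector',
    # ... the rest of the Debezium connector config goes here ...
}

# POST /connectors creates a new connector; it fails (409) if the name is already taken.
create = requests.post(f'{CONNECT_URL}/connectors',
                       headers={'Content-Type': 'application/json'},
                       data=json.dumps({'name': name, 'config': config}))
print(create.status_code, create.text)

# PUT /connectors/<name>/config creates the connector if missing, otherwise updates it in place.
update = requests.put(f'{CONNECT_URL}/connectors/{name}/config',
                      headers={'Content-Type': 'application/json'},
                      data=json.dumps(config))
print(update.status_code, update.text)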
class SFTPOperation(object):
    PUT = 'put'
    GET = 'get'

    operation=SFTPOperation.GET,

NameError: name 'SFTPOperation' is not defined
Here is where I have defined the operator, but I cannot find anything on the internet related to operation:
class sftpplugin(AirflowPlugin):
    name = "sftp_plugin"
    operators = [SFTPOperator]
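For what it's worth, SFTPOperation normally lives in the same module as SFTPOperator, so the NameError usually just means it was never imported. A minimal sketch, assuming Airflow 1.10's contrib layout (newer releases ship it in the apache-airflow-providers-sftp package instead); the task id, connection id and file paths are placeholders:

# Airflow 1.10.x layout (assumption); with Airflow 2.x and the SFTP provider it would be
# `from airflow.providers.sftp.operators.sftp import SFTPOperator, SFTPOperation`.
from airflow.contrib.operators.sftp_operator import SFTPOperator, SFTPOperation

# (shown outside a DAG definition for brevity)
fetch_file = SFTPOperator(
    task_id='fetch_file',                    # placeholder task id
    ssh_conn_id='sftp_default',              # placeholder connection id
    remote_filepath='/remote/path/file.csv',
    local_filepath='/tmp/file.csv',
    operation=SFTPOperation.GET,             # GET downloads; SFTPOperation.PUT uploads
)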
Any help would be appreciated!

Thanks,
I have found that there are many ways to store it: as a Variable, via hooks, and other approaches that use encryption. I would like to know which is the best way.
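As one illustration of the options being weighed, here is a minimal sketch contrasting an Airflow Variable with a Connection retrieved through a hook; the variable key "my_api_password" and connection id "my_api" are placeholders:

from airflow.hooks.base_hook import BaseHook   # Airflow 2.x also exposes airflow.hooks.base.BaseHook
from airflow.models import Variable

# Option 1: a Variable, stored in the metadata DB (encrypted at rest when a Fernet key is configured).
password = Variable.get("my_api_password")     # placeholder key

# Option 2: a Connection looked up through a hook; login/password live on the Connection object.
conn = BaseHook.get_connection("my_api")       # placeholder connection id
user, password = conn.login, conn.password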
I have two different customer dataframes and I want to match them based on a Jaccard distance matrix, or any other method.
df1

  Name   country     cost
0 raj    Kazakhstan  23
1 sam    Russia      243
2 kanan  Belarus     2
3 Nan    Nan         0
df2

  Name  country     DOB
0 rak   Kazakhstan  12-12-1903
1 sim   russia      03-04-1994
2 raj   Belarus     21-09-2003
3 kane  Belarus     23-12-1999
Output: if the string-comparison score is greater than 0.6, I want to merge the two rows into a new dataframe.
Df3

  Name   country     Name  country     cost  DOB
0 raj    Kazakhstan  rak   Kazakhstan  23    12-12-1903
1 sam    Russia      sim   russia      243   03-04-1994
2 kanan  Belarus     Kane  Belarus     2     23-12-1999
I have tried doing the calculation row by row, but that does not compare each row against every row of the other dataframe, does it?
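A minimal sketch of one common approach: build a full cross join, score every name pair, and keep pairs above the 0.6 threshold. The 0.6 threshold and column names follow the example above; SequenceMatcher is used as the string-comparison score here (a Jaccard similarity or any other pairwise scorer can be dropped in), and the exact pairs kept depend on the scorer chosen:

from difflib import SequenceMatcher
import pandas as pd

df1 = pd.DataFrame({'Name': ['raj', 'sam', 'kanan'],
                    'country': ['Kazakhstan', 'Russia', 'Belarus'],
                    'cost': [23, 243, 2]})
df2 = pd.DataFrame({'Name': ['rak', 'sim', 'raj', 'kane'],
                    'country': ['Kazakhstan', 'russia', 'Belarus', 'Belarus'],
                    'DOB': ['12-12-1903', '03-04-1994', '21-09-2003', '23-12-1999']})

# Cross join: every row of df1 paired with every row of df2 (needs pandas >= 1.2).
pairs = df1.merge(df2, how='cross', suffixes=('_1', '_2'))

# Score each name pair; any other string-similarity function could be substituted here.
pairs['score'] = [SequenceMatcher(None, a.lower(), b.lower()).ratio()
                  for a, b in zip(pairs['Name_1'], pairs['Name_2'])]

# Keep the best-scoring candidate per df1 row, provided it clears the 0.6 threshold.
df3 = (pairs[pairs['score'] > 0.6]
       .sort_values('score', ascending=False)
       .drop_duplicates('Name_1')
       .sort_index())
print(df3)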
import configparser

config = configparser.ConfigParser()
config.read(r'C:\Users\PycharmProjects\Integration\local.ini')
print(config.sections())
I am not sure what to do after that. I tried this code:
server = config.get('db','server')
It shows the output of the print statement and then throws an error:
['"db"', '"Auth"']
configparser.NoSectionError: No section: 'db'
The local.ini file contains:
["db"]
server=raj
log=ere2
["Auth"]
login=hi
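For reference, configparser keeps the quotation marks as part of the section name, which is why the printed list shows '"db"' rather than 'db'. A small sketch of both ways around that, assuming the same local.ini as above:

import configparser

config = configparser.ConfigParser()
config.read(r'C:\Users\PycharmProjects\Integration\local.ini')

# The section was parsed with the quotes included, so the lookup must include them too:
server = config.get('"db"', 'server')
print(server)  # -> raj

# Alternatively, write the ini headers without quotes ([db], [Auth]) and keep the original lookup:
# server = config.get('db', 'server')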
I am new to Power BI. I am trying to get a report of the users who have access to each dashboard. Any pointers would be helpful.

Thanks in advance!