Error when passing packages as an argument while converting Kafka messages to a DataFrame
from pyspark.sql import SparkSession, Row
from pyspark.context import SparkContext
from kafka import KafkaConsumer
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0.jar: org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:jar:2.1.1 pyspark-shell'
sc = SparkContext.getOrCreate()
spark = SparkSession(sc)
df = spark \
    .read \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("subscribe", "Jim_Topic") \
    .load()
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
Error:
::::::::::::::::::::::::::::::::::::::::::::::
:: UNRESOLVED DEPENDENCIES ::
::::::::::::::::::::::::::::::::::::::::::::::
:: org.apache.spark#spark-sql-kafka-0-10_2.11;2.2.0.jar: not found
::::::::::::::::::::::::::::::::::::::::::::::
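The unresolved dependency appears to come from the coordinate string itself. Spark's `--packages` option expects Maven coordinates of the form `groupId:artifactId:version`, separated by commas with no spaces, so `2.2.0.jar` is looked up as a version literally named "2.2.0.jar", and `...assembly_2.11:jar:2.1.1` has one field too many. Also, `format("kafka")` is provided by `spark-sql-kafka-0-10`; the `spark-streaming-kafka-0-8` assembly targets the older DStream API and is probably not needed here. A minimal sketch that builds the argument string and rejects malformed coordinates; `build_submit_args` and `PACKAGES` are hypothetical names, and the version `2.2.0` with Scala suffix `_2.11` is an assumption that must match your installed Spark build.

```python
import os

# Spark's --packages option expects Maven coordinates of the form
# groupId:artifactId:version, joined by commas with no spaces.
# Assumption: Spark 2.2.0 built for Scala 2.11; adjust to your installation.
PACKAGES = [
    "org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0",
]


def build_submit_args(packages):
    """Return a PYSPARK_SUBMIT_ARGS value, rejecting malformed coordinates."""
    for coord in packages:
        # "2.2.0.jar" would be treated as a version named "2.2.0.jar",
        # and "group:artifact:jar:2.1.1" has one field too many.
        if len(coord.split(":")) != 3 or coord.endswith(".jar") or " " in coord:
            raise ValueError(f"expected groupId:artifactId:version, got {coord!r}")
    # PySpark expects "pyspark-shell" as the last token.
    return "--packages " + ",".join(packages) + " pyspark-shell"


# Must be set before SparkContext.getOrCreate() launches the JVM.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(PACKAGES)
```

With the variable set this way, the existing `SparkContext.getOrCreate()` and `spark.read.format("kafka")` code should be able to resolve the package from Maven Central on first run, assuming network access.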
I have pipe-delimited data, and I need to replace an entire column with another value.
Example:
Given A|B|C, I want to replace the second column with "Z" to get A|Z|C.
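Splitting each line on the delimiter, replacing one field, and joining again does this; in awk it is the usual approach of setting both `FS` and `OFS` to "|" and assigning `$2`. The sketch below shows the same idea in Python. `replace_column` is a hypothetical helper name, and column numbers are 1-based like awk's `$2`.

```python
def replace_column(line, index, value, sep="|"):
    """Replace the 1-based column `index` of a delimited line with `value`.

    Raises IndexError if the line has fewer than `index` fields
    (awk would instead extend the record with empty fields).
    """
    fields = line.rstrip("\n").split(sep)
    fields[index - 1] = value
    return sep.join(fields)


rows = ["A|B|C", "D|E|F"]
for row in rows:
    print(replace_column(row, 2, "Z"))  # prints A|Z|C, then D|Z|F
```

To filter a whole file, apply the same function to each line read from the file or from `sys.stdin`.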
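I have producer code that sends messages to Kafka. Until yesterday I could send messages, but since today I can't. I'm not sure whether this is a version compatibility problem. There is no failure or error message: the code runs, but no messages are sent.
Here are the Python module versions: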
kafka-python==2.0.1
Python 3.8.2
Here is my code:
from kafka import KafkaProducer
import logging
logging.basicConfig(level=logging.INFO)
producer = KafkaProducer(bootstrap_servers='127.0.0.1:9092')
producer.send('Jim_Topic', b'Message from PyCharm')
producer.send('Jim_Topic', key=b'message-two', value=b'This is Kafka-Python')
I also tried enabling logging, but I don't know why the producer is being closed:
INFO:kafka.conn:<BrokerConnection node_id=bootstrap-0 host=127.0.0.1:9092 <connecting> [IPv4 ('127.0.0.1', 9092)]>: connecting to 127.0.0.1:9092 [('127.0.0.1', 9092) IPv4]
INFO:kafka.conn:<BrokerConnection node_id=bootstrap-0 host=127.0.0.1:9092 <connecting> [IPv4 ('127.0.0.1', 9092)]>: Connection complete.
INFO:kafka.producer.kafka:Closing the Kafka producer with 0 secs timeout.
INFO:kafka.producer.kafka:Proceeding to force close the producer since pending requests could not be completed within timeout 0.
INFO:kafka.producer.kafka:Kafka producer closed
Process …
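`KafkaProducer.send()` is asynchronous: it buffers the record and returns a future. The log shows the producer being closed with a 0-second timeout and then force-closed "since pending requests could not be completed", which suggests the script ends before the buffered records are delivered. A minimal sketch, assuming kafka-python 2.0.x: flush the producer and wait on each future before exiting. `send_all` and `main` are hypothetical helper names; `send_all` takes the producer as a parameter so it works with any object exposing `send()` and `flush()`.

```python
def send_all(producer, topic, messages, timeout=10.0):
    """Send (key, value) pairs and block until each one is acknowledged.

    Returns the result of each future's get(), which for kafka-python
    is a RecordMetadata instance.
    """
    futures = [producer.send(topic, key=key, value=value) for key, value in messages]
    # send() only buffers records; flush() pushes everything buffered to the broker.
    producer.flush(timeout=timeout)
    # get() raises if a record failed, instead of the failure passing silently.
    return [future.get(timeout=timeout) for future in futures]


def main():
    # Requires kafka-python and a broker at 127.0.0.1:9092, as in the question.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers="127.0.0.1:9092")
    try:
        results = send_all(
            producer,
            "Jim_Topic",
            [(None, b"Message from PyCharm"), (b"message-two", b"This is Kafka-Python")],
        )
        for meta in results:
            print(meta.topic, meta.partition, meta.offset)
    finally:
        # An explicit close() waits for pending requests rather than force-closing.
        producer.close()
```

Call `main()` at the bottom of the script (or under an `if __name__ == "__main__":` guard). Printing the partition and offset for each record gives positive confirmation that the broker accepted it.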
Problem extending base.html from other HTML pages
Exception Type: TemplateDoesNotExist
Exception Value: base.html
Here is my settings.py:
TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [TEMPLATE_DIR],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]
index.html
{% extends "base.html" %}
<h1>Django</h1>
{% block content %}
<h1>Django</h1>
{% endblock %}
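`TemplateDoesNotExist` means none of the configured loaders found `base.html`. With `APP_DIRS` set to `True` and no explicit `loaders` option, Django checks each entry in `DIRS` first and then the `templates/` folder of every installed app, so `base.html` must sit directly inside `TEMPLATE_DIR` (whose definition is not shown above) or inside `<app>/templates/`. The sketch lists those candidate locations so you can see which one, if any, actually contains the file. `candidate_template_paths` is a hypothetical helper, and the `BASE_DIR`, `TEMPLATE_DIR`, and `myapp` values are assumed placeholders for your real project layout.

```python
from pathlib import Path


def candidate_template_paths(name, dirs, app_dirs):
    """Return the paths Django's default loaders would check for `name`, in order.

    With APP_DIRS=True and no explicit 'loaders' option, Django uses the
    filesystem loader (each entry in DIRS) followed by the app_directories
    loader (the templates/ folder of each installed app).
    """
    paths = [Path(d) / name for d in dirs]
    paths += [Path(app) / "templates" / name for app in app_dirs]
    return paths


# Hypothetical project layout; replace with your real BASE_DIR and app folders.
BASE_DIR = Path.cwd()
TEMPLATE_DIR = BASE_DIR / "templates"

for path in candidate_template_paths("base.html", [TEMPLATE_DIR], [BASE_DIR / "myapp"]):
    print(path, "found" if path.exists() else "missing")
```

With `DEBUG = True`, the "Template-loader postmortem" section of Django's error page lists the locations it actually tried, which should match one of these paths.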
I have a string in dd-MON-yy format. When converting it to a date in Python, the two-digit year causes a problem.
import datetime
datetime.datetime.strptime('17-JUN-03', '%d-%m-%y')
The error is:
ValueError: time data '17-JUN-03' does not match format '%d-%m-%y'
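The two-digit year is not what fails here: `%m` expects a month number (01 to 12), so "JUN" cannot match. `%b` matches an abbreviated month name, and `%y` maps 00 to 68 onto 2000 to 2068 and 69 to 99 onto 1969 to 1999 (the POSIX convention), so "03" becomes 2003. A minimal sketch; `parse_dd_mon_yy` is a hypothetical helper, and matching "JUN" assumes English month names in the current locale (the default C locale unless `locale.setlocale` has been called).

```python
from datetime import date, datetime


def parse_dd_mon_yy(text):
    """Parse strings like '17-JUN-03' into a date.

    %b matches the abbreviated month name (case-insensitively in CPython),
    and %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999.
    """
    return datetime.strptime(text, "%d-%b-%y").date()


print(parse_dd_mon_yy("17-JUN-03"))  # 2003-06-17
```

If your data uses a different century cutoff, parse with `%b` first and then adjust the year yourself.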