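I'm trying to run a linear regression, but I think something is wrong with the data types. I tested the code line by line and everything works until the last line, where I get TypeError: invalid type promotion. From what I've read, I believe this is caused by the date format.
Here is my code: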
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
data = pd.read_excel(r'C:\Users\Proximo\PycharmProjects\Counts\venv\Counts.xlsx')
data['DATE'] = pd.to_datetime(data['DATE'])
data.plot(x = 'DATE', y = 'COUNT', style = 'o')
plt.title('Corona Spread Over the Time')
plt.xlabel('Date')
plt.ylabel('Count')
plt.show()
X=data['DATE'].values.reshape(-1,1)
y=data['COUNT'].values.reshape(-1,1)
X_train,X_test,Y_train,Y_test=train_test_split(X,y,test_size=.2,random_state=0)
regressor = LinearRegression()
regressor.fit(X_train,Y_train)
y_pre = regressor.predict(X_test)
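The suspicion about the date format can be illustrated without pandas or scikit-learn. The sketch below uses made-up dates and counts rather than the contents of Counts.xlsx, and turns each date into an integer day number with toordinal() before fitting a straight line. fit_line and predict are hypothetical helpers written for this illustration, not part of any library.

```python
from datetime import date

# Made-up sample standing in for the DATE and COUNT columns (an assumption).
dates = [date(2020, 3, 1), date(2020, 3, 2), date(2020, 3, 3), date(2020, 3, 4)]
counts = [10.0, 20.0, 30.0, 40.0]

# Convert each date to an integer day number so it is an ordinary numeric feature.
x = [d.toordinal() for d in dates]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(x, counts)


def predict(d):
    """Predict a count for a date by applying the same ordinal conversion."""
    return slope * d.toordinal() + intercept


prediction = predict(date(2020, 3, 5))
print(slope, prediction)
```

With the pandas code above, the analogous step would presumably be to convert the DATE column to numbers before the reshape, for example data['DATE'].map(dt.datetime.toordinal) after import datetime as dt, so that X holds integers rather than datetime64 values.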
When I run my code, this is the full error I get:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-21-c9e943251026> in <module>
----> 1 y_pre = regressor.predict(X_test)
2
c:\users\slavi\pycharmprojects\coronavirus\venv\lib\site-packages\sklearn\linear_model\_base.py in predict(self, X)
223 …

So I'm using the BeautifulSoup library to extract data from a table, with the following code:
if soup.find("table", {"class":"a-keyvalue prodDetTable"}) is not None:
    table = parse_table(soup.find("table", {"class":"a-keyvalue prodDetTable"}))
    df = pd.DataFrame(table)
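This works: I get the table and parse it into a DataFrame. However, I'm trying to do something similar on a different website using Selenium, and this is my code so far: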
driver = webdriver.Chrome()
i = "DCD710S2"
base_url = str("https://www.lowes.com/search?searchTerm=" + str(i))
driver.get(base_url)
table = driver.find_element_by_xpath("//*[@id='collapseSpecs']/div/div/div[1]/table/tbody")
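So I can reach the table, and I tried getAttribute(innerHTML) and a few other getAttribute calls, but I can't get the table into pandas as it is. Any suggestions on how to handle this with Selenium?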
selenium beautifulsoup python-3.x pandas selenium-chromedriver
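One possible route for the Selenium question is to take the element's markup as a string and parse the rows out of it. The sketch below assumes the string would come from table.get_attribute("outerHTML") (the Python binding spells the method get_attribute); since no browser is available here, a made-up tbody fragment stands in for it. TableRows is a hypothetical helper built on the standard library's HTMLParser, not something Selenium or pandas provides.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up markup standing in for table.get_attribute("outerHTML") (an assumption).
html = (
    "<tbody>"
    "<tr><th>Voltage</th><td>20</td></tr>"
    "<tr><th>Battery Type</th><td>Lithium Ion</td></tr>"
    "</tbody>"
)

parser = TableRows()
parser.feed(html)
rows = parser.rows
print(rows)
```

A list of rows like this can be handed to pd.DataFrame(rows). If a parser such as lxml is installed, pandas.read_html may be a shorter alternative, although it looks for a complete table element rather than a bare tbody.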
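I'm streaming data from a sensor in JSON format using the socket library, and I'm trying to parse it and load it into a database. When I print the stream, I get JSON in the following format: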
b'[{"metadata":{"timezone":{"location":"Etc/UTC"},"serial_number":"00:07:32:52:09:fc","device_type":"SPIDER"},"timestamp":"2019-08-29T13:53:05.895Z","framenumber":"2290718","tracked_objects":[{"id":2592,"is_at_border":true,"type":"PERSON","position":{"x":233,"y":262,"type":"FOOT","coordinate_system":"PROCESSING_IN_PIXEL"},"person_data":{"height":1728}}]}]'
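A payload in this format can be parsed with the standard json module alone. The following is a minimal sketch, assuming each datagram holds one complete JSON array. parse_frames is a hypothetical helper, the sample is a shortened copy of the payload above, and the output key names are chosen for illustration. It decodes the bytes without altering them, because changing the case of the raw message would also change the key names and turn the literal true into TRUE, which is not valid JSON.

```python
import json

# Shortened copy of the payload shown above.
raw = (
    b'[{"metadata":{"serial_number":"00:07:32:52:09:fc"},'
    b'"timestamp":"2019-08-29T13:53:05.895Z",'
    b'"tracked_objects":[{"id":2592,"is_at_border":true,"type":"PERSON",'
    b'"position":{"x":233,"y":262},"person_data":{"height":1728}}]}]'
)


def parse_frames(message: bytes) -> list:
    """Flatten every tracked object in a datagram into one dict per object."""
    frames = json.loads(message.decode("utf-8"))  # the top level is a list of frames
    rows = []
    for frame in frames:
        # A frame without tracked objects simply contributes no rows.
        for obj in frame.get("tracked_objects", []):
            rows.append({
                "object_id": obj["id"],
                "x_pos": obj["position"]["x"],
                "y_pos": obj["position"]["y"],
                "height": obj.get("person_data", {}).get("height"),
                "serial_number": frame["metadata"]["serial_number"],
                "timestamp": frame["timestamp"],
            })
    return rows


rows = parse_frames(raw)
print(rows)
```

A list of dicts like rows can be passed to pd.DataFrame(rows) if a DataFrame is wanted before loading the data into a database.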
From what I've read, the b prefix denotes the bytes type. So when I try to parse it with the code below:
while True:
    message, address = server_socket.recvfrom(1024)
    message = message.upper()
    # Loading the JSON payload.
    objs_json = json.loads(message)
    # Using an if check to prevent the script from trying to parse data without any object being tracked.
    if "tracked_objects" in objs_json:
        # Parsing the JSON with json_normalize
        objs_df = json_normalize(
            objs_json, record_path='tracked_objects',
            meta=[['metadata', 'serial_number'], 'timestamp']
        )
        # Renaming columns
        objs_df = objs_df.rename(
            columns={
                "id": "object_id", "position.x": "x_pos",
                "position.y": "y_pos", "person_data.height": "height",
                "metadata.serial_number": "serial_number",
                "timestamp": …