I want to take the logarithm of every value in a pandas DataFrame. I tried this, but it didn't work:
# Read data from Excel and round values to 2 decimal places
import math
import pandas as pd
data = pd.read_excel("DataSet.xls").round(2)
log_data= math.log10(data)
It gives me this error:
TypeError: must be real number, not DataFrame
Do you have any ideas?
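The error arises because `math.log10` accepts a single real number, not a DataFrame. A minimal sketch of one alternative, assuming every column is numeric and strictly positive, is NumPy's element-wise `np.log10`, which accepts a DataFrame directly. The small DataFrame below is made up for illustration and stands in for the contents of `DataSet.xls`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel("DataSet.xls").round(2)
data = pd.DataFrame({"a": [1.0, 10.0, 100.0], "b": [1000.0, 0.1, 0.01]})

# np.log10 is a NumPy ufunc, so it is applied to every cell and
# the result is again a DataFrame with the same index and columns.
log_data = np.log10(data)
print(log_data)
```

If the sheet also contains text columns, selecting the numeric ones first with `data.select_dtypes(include="number")` avoids a type error. Zeros produce `-inf` and negative values produce `NaN`, so those may need handling beforehand.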
I want to use IsolationForest to find outliers, and I want to find the best model parameters with GridSearchCV. The problem is that I always get the same error:
TypeError: If no scoring is specified, the estimator passed should have a 'score' method. The estimator IsolationForest(behaviour='old', bootstrap=False, contamination='legacy',
max_features=1.0, max_samples='auto', n_estimators=100,
n_jobs=None, random_state=None, verbose=0, warm_start=False) does not.
It seems the problem is that IsolationForest has no score method. Is there a way to fix this? Also, is there a way to get a score for an isolation forest? Here is my code:
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import GridSearchCV
df = pd.DataFrame({'first': [-112,0,1,28,5,6,3,5,4,2,7,5,1,3,2,2,5,2,42,84,13,43,13],
'second': [42,1,2,85,2,4,6,8,3,5,7,3,64,1,4,1,2,4,13,1,0,40,9],
'third': [3,4,7,74,3,8,2,4,7,1,53,6,5,5,59,0,5,12,65,4,3,4,11],
'result': [5,2,3,0.04,3,4,3,125,6,6,0.8,9,1,4,59,12,1,4,0,8,5,4,1]})
x = df.iloc[:,:-1]
tuned = {'n_estimators':[70,80,100,120,150,200], 'max_samples':['auto', 1,3,5,7,10],
'contamination':['legacy', 'auto'], 'max_features':[1,2,3,4,5,6,7,8,9,10,13,15],
'bootstrap':[True,False], 'n_jobs':[None,1,2,3,4,5,6,7,8,10,15,20,25,30], 'behaviour':['old', 'new'],
'random_state':[None,1,5,10,42], 'verbose':[0,1,2,3,4,5,6,7,8,9,10], 'warm_start':[True,False]}
isolation_forest = …
I have a large dataset with 3 columns: text, phrase, and topic. I want to find a way to extract key phrases (the phrase column) based on the topic. A key phrase can be part of the text value or the whole text value.
import pandas as pd
text = ["great game with a lot of amazing goals from both teams",
"goalkeepers from both teams made misteke",
"he won all four grand slam championchips",
"the best player from three-point line",
"Novak Djokovic is the best player of all time",
"amazing slam dunks from the best players",
"he deserved yellow-card for this foul",
"free throw points"]
phrase = ["goals", "goalkeepers", "grand slam championchips", "three-point line", "Novak Djokovic", "slam dunks", "yellow-card", "free …
I want to build a simple classifier with Keras to classify my data. The features are numeric, and the target is string/categorical data. I am predicting 15 different classes. This is what my code looks like:
model = Sequential()
model.add(Dense(16, input_dim = x_train.shape[1], activation = 'relu')) # input layer requires input_dim param
model.add(Dense(16, activation = 'relu'))
model.add(Dense(16, activation = 'relu'))
model.add(Dense(1, activation='relu'))
model.compile(loss="binary_crossentropy", optimizer= "adam", metrics=['accuracy'])
#es = EarlyStopping(monitor='loss', min_delta=0.005, patience=1, verbose=1, mode='auto')
model.fit(x_train, y_train, epochs = 100, shuffle = True, batch_size=128, verbose=2)
scores = model.evaluate(x_test, y_test)
print(model.metrics_names[0], model.metrics_names[1])
The problem is that I always get this error:
ValueError: could not convert string to float 'category1'
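What am I doing wrong?
When I replace my class names 'category1', 'category2', etc. with integers, my code runs, but it always gives me an accuracy of 0. I tried changing the number of nodes, the number of layers, and the activation functions, but the result is always 0. It looks like the model thinks I am doing regression rather than classification.
If my class values are not just 1 or 0, what is the correct way to do classification with the Keras library?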
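The ValueError comes from passing string labels straight to `model.fit`, and a single `relu` output trained with `binary_crossentropy` is set up for a two-class problem rather than 15 classes. A common setup for multi-class problems is to encode the labels as integers from 0 to n_classes - 1 (or as one-hot vectors), end the network with `Dense(n_classes, activation='softmax')`, and compile with `sparse_categorical_crossentropy` for integer labels or `categorical_crossentropy` for one-hot labels. The sketch below shows only the label-encoding step, using NumPy; the label values are made up for illustration.

```python
import numpy as np

# Hypothetical string labels standing in for y_train
y_train = np.array(["category1", "category3", "category2", "category1"])

# classes: sorted unique label names; y_int: index of each label in classes
classes, y_int = np.unique(y_train, return_inverse=True)

# One-hot encoding: row i has a 1 in column y_int[i]
y_onehot = np.eye(len(classes), dtype="float32")[y_int]

print(classes)
print(y_int)
print(y_onehot)

# Map predicted class indices back to the original names
predicted_idx = np.array([2, 0])
print(classes[predicted_idx])
```

With integer labels such as `y_int`, the network's predicted class for a sample is the index of the largest softmax output, which `classes[...]` turns back into a name.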
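I wrote simple linear regression and decision tree classifier code with Python's Scikit-learn library to predict outcomes. It works well.
My question is whether there is a way to do this in reverse: to predict the best combination of parameter values based on a given outcome (the parameters that give the highest accuracy).
Or, to put it another way: is there a classification, regression, or other type of algorithm (decision tree, SVM, KNN, logistic regression, linear regression, polynomial regression...) that can predict multiple outcomes based on one (or more) parameters?
I tried to do this by passing in a multi-variable outcome, but it raised an error:
ValueError: Expected 2D array, got 1D array instead: array=[101 905 182 182 268 646 624 465]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
This is the code I wrote for the regression: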
import pandas as pd
from sklearn import linear_model
from sklearn import tree
dic = {'par_1': [10, 30, 13, 19, 25, 33, 23],
'par_2': [1, 3, 1, 2, 3, 3, 2],
'outcome': [101, 905, 182, 268, 646, 624, 465]}
df = pd.DataFrame(dic)
variables = df.iloc[:,:-1]
results = df.iloc[:,-1]
regression = linear_model.LinearRegression()
regression.fit(variables, results)
input_values = [14, 2]
prediction = regression.predict([input_values])
prediction = round(prediction[0], 2)
print(prediction)
This is the code I wrote for the decision tree:
dic = {'par_1': [10, …
I wrote Python code that extracts text from PDF files, but for some files I get strange output. Here is my code:
import requests
from io import BytesIO
from pdfminer.high_level import extract_text, extract_pages
pdf_link = 'https://www.neerach.ch/public/upload/assets/1417/MTB0321.pdf'
response = requests.get(pdf_link)
with BytesIO(response.content) as data:
    num_of_pages = len(list(extract_pages(data)))
    print('number of pages', num_of_pages)
    #extract first 5 pages
    text = extract_text(data, password='', page_numbers = None, maxpages = 5, caching=True, codec='utf-8', laparams=None)
    text = str(text)
    text = text.replace('\n\n\n', '\n\n').strip()
    print(text)
The result I get:
(cid:3)
(cid:3)
(cid:3)
(cid:3)

(cid:3)
(cid:3)
(cid:3)

Nr. 3 | 2021

März 2021

(cid:3)
(cid:57)(cid:72)(cid:85)(cid:75)(cid:68)(cid:81)(cid:71)(cid:79)(cid:88)(cid:81)(cid:74)(cid:72)(cid:81)(cid:3)(cid:71)(cid:72)(cid:86)(cid:3)(cid:42)(cid:72)(cid:80)(cid:72)(cid:76)(cid:81)(cid:71)(cid:72)(cid:85)(cid:68)(cid:87)(cid:72)(cid:86)(cid:3)
(cid:3)
(cid:54)(cid:70)(cid:75)(cid:88)(cid:79)(cid:72)(cid:81)(cid:3)
(cid:3)
(cid:54)(cid:82)(cid:93)(cid:76)(cid:68)(cid:79)(cid:72)(cid:3)(cid:39)(cid:76)(cid:72)(cid:81)(cid:86)(cid:87)(cid:72)(cid:3)
(cid:3)
(cid:48)(cid:76)(cid:87)(cid:87)(cid:72)(cid:76)(cid:79)(cid:88)(cid:81)(cid:74)(cid:72)(cid:81)(cid:3)(cid:39)(cid:82)(cid:85)(cid:73)(cid:89)(cid:72)(cid:85)(cid:72)(cid:76)(cid:81)(cid:72)(cid:3)
(cid:3)
(cid:48)(cid:76)(cid:87)(cid:87)(cid:72)(cid:76)(cid:79)(cid:88)(cid:81)(cid:74)(cid:72)(cid:81)(cid:3)(cid:68)(cid:88)(cid:86)(cid:90)(cid:108)(cid:85)(cid:87)(cid:76)(cid:74)(cid:72)(cid:85)(cid:3)(cid:57)(cid:72)(cid:85)(cid:72)(cid:76)(cid:81)(cid:72)(cid:3)
(cid:3)
(cid:48)(cid:76)(cid:87)(cid:87)(cid:72)(cid:76)(cid:79)(cid:88)(cid:81)(cid:74)(cid:72)(cid:81)(cid:3)(cid:46)(cid:76)(cid:85)(cid:70)(cid:75)(cid:74)(cid:72)(cid:80)(cid:72)(cid:76)(cid:81)(cid:71)(cid:72)(cid:81)(cid:3)

(cid:20)(cid:3)

(cid:23)(cid:3)

(cid:20)(cid:21)(cid:3)

(cid:21)(cid:20)(cid:3)

(cid:21)(cid:24)(cid:3)

Mitteilungsblatt Neerach | Gemeindeverwaltung Neerach | Binzmühlestrasse 14 | 8173 Neerach
044 859 16 16 | einwohnerkontrolle@neerach.ch | www.neerach.ch

(cid:3)(cid:3)(cid:3)(cid:3)(cid:3)(cid:3)(cid:3)(cid:3)
(cid:3)
(cid:3)
(cid:3)
…
I wrote code to predict house prices. The problem is that my accuracy score is negative. I tried 5 different algorithms, and the accuracy scores are all over the place.
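The first problem I ran into is that I get a warning when using the .map function, but I don't think that is the issue.
The regression models run, but their training and test accuracy are all over the place. I also tried this: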
from sklearn.metrics import accuracy_score
...
score_train = regression.accuracy_score(variables_train, result_train)
...
But it gives me this: AttributeError: 'LinearRegression' object has no attribute 'accuracy_score'
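`accuracy_score` is a standalone function in `sklearn.metrics` intended for classification, not a method on the estimator, which is why the AttributeError appears. For scikit-learn regressors, the `score` method returns the coefficient of determination R², and R² is negative whenever the model fits worse than always predicting the mean of the targets. A minimal NumPy sketch of that definition, using made-up numbers rather than the housing data:

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Made-up prices, for illustration only
y_true = np.array([100.0, 200.0, 300.0, 400.0])

perfect = r2(y_true, y_true)                                # exact predictions
baseline = r2(y_true, np.full_like(y_true, y_true.mean()))  # always the mean
poor = r2(y_true, y_true[::-1])                             # worse than the mean
print(perfect, baseline, poor)
```

A negative score on the test set therefore suggests the model generalizes worse than a constant prediction, which is a modeling or preprocessing issue rather than a bug in the metric.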
You can download the dataset here:
https://www.sendspace.com/file/93nkdy
Here is the code:
import pandas as pd
from sklearn import linear_model
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
#pandas display options
pd.set_option('display.max_rows', 70)
pd.set_option('display.max_columns', 100)
pd.set_option('display.width', 1000)
data = pd.read_csv("validate.csv")
data = data.drop(columns = ["id"])
data = data.dropna(axis='columns')
data_for_pred = data[["bedrooms_total", "baths_total",
"sq_ft_tot_fn", "garage_capacity",
"city", "total_stories", "rooms_total",
"garage", "flood_zone","price_closed"]]
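#to …
I have started learning clustering with Python and the sklearn library. I wrote simple code to cluster text data. My goal is to find groups/classes of similar sentences. I tried to plot them, but failed.
The problem is the text data; I always get this error: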
ValueError: setting an array element with a sequence.
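The same approach works for numeric data but not for text data. Is there a way to plot the groups/clusters of similar sentences? Also, is there a way to see what these groups are, what they represent, and how to identify them? I printed labels = kmeans.predict(x), but these are just lists of numbers. What do they represent?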
import pandas as pd
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
x = ['this is very good show' , 'i had a great time on my school trip', 'such a boring movie', 'Springbreak was amazing', 'You are wrong', 'This food is so tasty', 'I had so much fun last night', 'This is crap', 'I had a …
I have a 20 GB .ndjson file that I want to open in Python. The file is too large, so I used an online tool to split it into 50 files. This is the tool: https://pinetools.com/split-files
Now I have a file with the extension .ndjson.000 (I don't know what that is).
I tried opening it as a JSON or CSV file to read it into pandas, but it doesn't work. Do you know how to solve this?
import json
import pandas as pd
First approach:
df = pd.read_json('dump.ndjson.000', lines=True)
Error: ValueError: Unmatched ''"' when when decoding 'string'
Second approach:
with open('dump.ndjson.000', 'r') as f:
    my_data = f.read()
    print(my_data)
Error: json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 104925061 (char 104925060)
I think the problem is that my file contains some emoji, and I don't know how to handle their encoding.
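One possible cause, offered as an assumption rather than a diagnosis: a tool that splits a file by byte count can cut a record in half, so the first and last lines of each chunk may be incomplete JSON, which could produce this kind of "unterminated string" error regardless of emoji. The sketch below reads a chunk line by line, parses each line with `json.loads`, and counts lines that fail to parse instead of stopping. The helper name `read_ndjson` and the file contents are made up for illustration.

```python
import json
import os
import tempfile

def read_ndjson(path):
    """Parse one JSON object per line; return (records, number_of_bad_lines)."""
    records, bad = [], 0
    # UTF-8 covers emoji; errors="replace" tolerates a character cut at a chunk edge
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                bad += 1  # e.g. a record cut in half at a chunk boundary
    return records, bad

# Made-up chunk whose last line was cut mid-record
sample = '{"id": 1, "text": "ok 😀"}\n{"id": 2, "text": "fine"}\n{"id": 3, "text": "trunc'
path = os.path.join(tempfile.mkdtemp(), "dump.ndjson.000")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

records, bad = read_ndjson(path)
print(len(records), "records parsed,", bad, "line(s) skipped")
```

The parsed records can be passed to `pd.DataFrame(records)`. An alternative that avoids splitting altogether is `pd.read_json(path, lines=True, chunksize=...)` on the original file, which returns an iterator of DataFrames.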
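I read an image with the OpenCV library in Python. I want to know how to change the background color to white. I want to keep only the person in the image, on a white background.
For example:
I want to change it to this:
How can I do something like this: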
import numpy as np
import cv2
my_image = r'C:\Users\Pc\Desktop\preklapanje4.jpg'
my_image = cv2.imread(my_image, 1)
cv2.imshow('img',my_image)
cv2.waitKey(0)
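Changing the background first requires deciding which pixels belong to the person, which is a segmentation problem; `cv2.grabCut` or a pretrained segmentation model are possible ways to obtain such a mask, though neither has been checked here against this particular image. Once a foreground mask exists, painting everything else white is a single NumPy operation. The sketch below assumes a mask is already available and uses a tiny synthetic image in place of the array returned by `cv2.imread`.

```python
import numpy as np

# Synthetic 4x4 BGR image standing in for the array returned by cv2.imread
image = np.full((4, 4, 3), 50, dtype=np.uint8)

# Hypothetical foreground mask: True where the person is (center 2x2 here)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

# Keep foreground pixels, set everything else to white (255, 255, 255)
white = np.full_like(image, 255)
result = np.where(mask[:, :, None], image, white)

print(result[:, :, 0])
```

The `mask[:, :, None]` indexing adds a channel axis so that the 2D mask broadcasts across all three color channels.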