I want to use something similar to sklearn.
The algorithm I found for PCA with k principal components is as follows:
import numpy as np

class MyPCA:
    def __init__(self, n_components):
        self.n_components = n_components

    def fit_transform(self, X):
        """
        Assumes observations in X are passed as rows of a numpy array.
        """
        # Translate the dataset so it's centered around 0
        translated_X = X - np.mean(X, axis=0)
        # Calculate the eigenvalues and eigenvectors of the covariance matrix
        e_values, e_vectors = np.linalg.eigh(np.cov(translated_X.T))
        # Sort eigenvalues and their eigenvectors in descending …

Suppose I have a DataFrame like the following:
+---+-----------+-----------+
| id| address1| address2|
+---+-----------+-----------+
| 1|address 1.1|address 1.2|
| 2|address 2.1|address 2.2|
+---+-----------+-----------+
I want to apply a custom function directly to the strings in the address1 and address2 columns, for example:
def example(string1, string2):
    name_1 = string1.lower().split(' ')
    name_2 = string2.lower().split(' ')
    intersection_count = len(set(name_1) & set(name_2))
    return intersection_count
I want to store the result in a new column, so that my final DataFrame looks like this:
+---+-----------+-----------+------+
| id| address1| address2|result|
+---+-----------+-----------+------+
| 1|address 1.1|address 1.2| 2|
| 2|address 2.1|address 2.2| 7|
+---+-----------+-----------+------+
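One possible way to get such a column is to wrap the plain Python function as a user-defined function (UDF), because the function works on Python strings rather than on Spark Column objects. The following is only a minimal sketch, assuming PySpark is installed and that df has the address1 and address2 columns shown above; the helper name add_result_column is hypothetical and not part of the original question.

```python
def example(string1, string2):
    # Count the lower-cased, space-separated tokens shared by both strings
    name_1 = string1.lower().split(' ')
    name_2 = string2.lower().split(' ')
    return len(set(name_1) & set(name_2))


def add_result_column(df):
    """Hypothetical helper: add a 'result' column computed by example()."""
    # Imported here so that example() can be tried without a Spark installation
    from pyspark.sql import functions as F
    from pyspark.sql.types import IntegerType

    # Wrap the Python function so Spark calls it once per row with plain strings
    example_udf = F.udf(example, IntegerType())
    return df.withColumn('result', example_udf(df.address1, df.address2))


# On plain strings, only the token "address" is shared by the first row's values
print(example('address 1.1', 'address 1.2'))  # -> 1
```

With a Spark DataFrame at hand, add_result_column(df).show() would display the new column. A UDF is usually slower than built-in column functions, so this sketch trades speed for reusing the function unchanged.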
I tried to do this the same way I used to apply built-in functions to whole columns, but I got an error:
>>> df.withColumn('result', example(df.address1, df.address2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in example
TypeError: …

I have all the data I want to plot in a single pandas DataFrame, for example:
         date flower_color  flower_count
0  2017-08-01         blue             1
1  2017-08-01          red             2
2  2017-08-02         blue             5
3  2017-08-02          red             2
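I need several different lines on one plot: the x values should be the dates from the first column, the y values should be flower_count, and which line a y value belongs to should depend on the flower_color given in the second column.
How can I do this without first filtering the original df and saving the result as a new object? My only idea is to create a data frame containing only the red flowers (and likewise for blue) and then specify it as the source: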
figure.line(x="date", y="flower_count", source=red_flower_ds)
figure.line(x="date", y="flower_count", source=blue_flower_ds)
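One possible approach is to let pandas do the splitting on the fly with groupby and draw one line per group in a loop, so no separately named filtered data frames have to be kept around. The following is a minimal sketch, assuming Bokeh is installed for the plotting part; the function names lines_by_color and plot_lines are hypothetical, and using the flower_color value itself as the line color is an assumption that only works when the values are valid color names such as "blue" and "red".

```python
import pandas as pd


def lines_by_color(df):
    """Return {flower_color: sub-frame sorted by date} without modifying df."""
    data = df.assign(date=pd.to_datetime(df["date"])).sort_values("date")
    return {color: group for color, group in data.groupby("flower_color")}


def plot_lines(df):
    """Draw one line per flower_color on a single Bokeh figure."""
    # Imported here so the grouping helper can be used without Bokeh installed
    from bokeh.models import ColumnDataSource
    from bokeh.plotting import figure, show

    p = figure(x_axis_type="datetime")
    for color, group in lines_by_color(df).items():
        # Each group gets its own temporary data source and its own line
        p.line(x="date", y="flower_count", source=ColumnDataSource(group),
               color=color, legend_label=str(color))
    show(p)


# Sample data matching the table in the question
df = pd.DataFrame({
    "date": ["2017-08-01", "2017-08-01", "2017-08-02", "2017-08-02"],
    "flower_color": ["blue", "red", "blue", "red"],
    "flower_count": [1, 2, 5, 2],
})
groups = lines_by_color(df)
```

Calling plot_lines(df) would then open the figure. The per-group frames still exist temporarily inside the loop, but the original df is left untouched and nothing has to be filtered by hand.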