我看到IPython Notebook有一个菜单项:File > Revert to Checkpoint但是这对于我的任何笔记本来说都不会包含多个条目.
有没有办法让这个菜单能容纳多个检查点?我无法在网络上的任何地方找到有关如何执行此操作的文档.谢谢.
另外,我把绿框作为亮点.

给定稀疏矩阵列表,计算矩阵中每列(或行)之间的余弦相似度的最佳方法是什么?我宁愿不迭代n次选择两次.
说输入矩阵是:
A=
[0 1 0 0 1
0 0 1 1 1
1 1 0 1 0]
Run Code Online (Sandbox Code Playgroud)
稀疏表示是:
A =
0, 1
0, 4
1, 2
1, 3
1, 4
2, 0
2, 1
2, 3
Run Code Online (Sandbox Code Playgroud)
在Python中,使用矩阵输入格式很简单:
import numpy as np
from sklearn.metrics import pairwise_distances
from scipy.spatial.distance import cosine
A = np.array(
[[0, 1, 0, 0, 1],
[0, 0, 1, 1, 1],
[1, 1, 0, 1, 0]])
dist_out = 1-pairwise_distances(A, metric="cosine")
dist_out
Run Code Online (Sandbox Code Playgroud)
得到:
array([[ 1. , 0.40824829, 0.40824829],
[ …Run Code Online (Sandbox Code Playgroud) 我有这样的日期字符串:
'January 11, 2010'
Run Code Online (Sandbox Code Playgroud)
我需要一个返回星期几的函数,比如
'mon', or 'monday'
Run Code Online (Sandbox Code Playgroud)
我在Python帮助中的任何地方都找不到这个.任何人?谢谢.
我需要知道如何以这样一种方式返回逻辑回归系数,即我可以自己生成预测概率.
我的代码看起来像这样:
lr = LogisticRegression()
lr.fit(training_data, binary_labels)
# Generate probabities automatically
predicted_probs = lr.predict_proba(binary_labels)
Run Code Online (Sandbox Code Playgroud)
我假设lr.coeff_值将遵循典型的逻辑回归,因此我可以返回预测的概率,如下所示:
sigmoid( dot([val1, val2, offset], lr.coef_.T) )
Run Code Online (Sandbox Code Playgroud)
但这不是合适的表述.有没有人有适当的格式来生成Scikit Learn LogisticRegression的预测概率?谢谢!
我想使用pyspark在本地计算机上运行Spark.从这里我使用命令:
sbt/sbt assembly
$ ./bin/pyspark
Run Code Online (Sandbox Code Playgroud)
安装完成,但pyspark无法运行,导致以下错误(完整):
138:spark-0.9.1 comp_name$ ./bin/pyspark
Python 2.7.6 |Anaconda 1.9.2 (x86_64)| (default, Jan 10 2014, 11:23:15)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
File "/Users/comp_name/Downloads/spark-0.9.1/python/pyspark/shell.py", line 32, in <module>
sc = SparkContext(os.environ.get("MASTER", "local"), "PySparkShell", pyFiles=add_files)
File "/Users/comp_name/Downloads/spark-0.9.1/python/pyspark/context.py", line 123, in __init__
self._jsc = self._jvm.JavaSparkContext(self._conf._jconf)
File "/Users/comp_name/Downloads/spark-0.9.1/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 669, in __call__
File "/Users/comp_name/Downloads/spark-0.9.1/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred …Run Code Online (Sandbox Code Playgroud) 简单地说,在B列='t3'的情况下,我想用新字符串替换A列中的NaN值。
我在下面的尝试都失败了。
d = pd.DataFrame({"A":[np.nan, 't2', np.nan, 't3', np.nan], "B":['t1', 't2', 't3', 't4', 't3']})
print "Original Dataframe:\n", d
# Does not work
d[d.B == 't3'].A = 'new_val'
# Does not work
d[d.B == 't3'].A.replace(np.nan, 'new_val')
# Does not work
d[d.B == 't3'].A.replace(np.nan, 'new_val', inplace=True)
print "Final Dataframe:\n", d
Run Code Online (Sandbox Code Playgroud)
这是输出:
Original Dataframe:
A B
0 NaN t1
1 t2 t2
2 NaN t3
3 t3 t4
4 NaN t3
[5 rows x 2 columns]
Final Dataframe:
A B
0 NaN t1 …Run Code Online (Sandbox Code Playgroud) 我在Sublime Text2中设置构建环境时遇到问题.我正在使用Macports进行Python和包安装.
我的python.sublime-build文件如下所示:
{
"cmd": ["python", "-u", "$file"],
"file_regex": "^[ ]*File \"(...*?)\", line ([0-9]*)",
"selector": "source.python"
}
Run Code Online (Sandbox Code Playgroud)
我认为(从搜索中)我需要修改"cmd"行以指向Macports版本.有人成功完成了吗?
从终端,一切都建立/运行良好,它只是Sublime Text2的构建,仍然抓住系统版本.
附加信息:
which python
/opt/local/bin/python
Run Code Online (Sandbox Code Playgroud)
谢谢你的帮助.
我有推荐人的用户/项目数据.我将它分成测试和训练数据,我需要确保在评估推荐者之前省略测试数据中的任何新用户或项目.我的方法适用于小型数据集,但是当它变大时,它永远都需要.有一个更好的方法吗?
# Test set for removing users or items not in train
te = pd.DataFrame({'user': [1,2,3,1,6,1], 'item':[16,12,19,15,13,12]})
tr = pd.DataFrame({'user': [1,2,3,4,5], 'item':[11,12,13,14,15]})
print "Training_______"
print tr
print "\nTesting_______"
print te
# By using two joins and selecting the proper indices, all 'new' members of test set are removed
b = pd.merge( pd.merge(te,tr, on='user', suffixes=['', '_d']) , tr, on='item', suffixes=['', '_d'])[['user', 'item']]
print "\nSolution_______"
print b
Run Code Online (Sandbox Code Playgroud)
得到:
Training_______
item user
0 11 1
1 12 2
2 13 3
3 14 …Run Code Online (Sandbox Code Playgroud) 我有一些日志数据,我想首先按user_id分组,然后选择第二个条目.这是在下面完成的.缺少的步骤是分组后每个条目相对于第一个条目的年龄.
dd = pd.DataFrame({'item_id': {0: 0, 1: 4, 2: 6, 3: 8, 4: 9, 5: 1}, 'date': {0: '2013-12-29T17:56:01Z', 1: '2013-12-29T19:44:09Z', 2: '2013-12-29T19:58:05Z', 3: '2013-12-29T20:00:09Z', 4: '2013-12-29T20:13:35Z', 5: '2013-12-29T20:19:56Z'}, 'user_id': {0: 6, 1: 8, 2: 3, 3: 3, 4: 6, 5: 6}})
print "Step 1: Original DataFrame, sorted by date:\n", dd
g = dd.groupby(by='user_id', sort=False)
print "\nStep 2: Grouped by User ID:\n", g.head()
# Print the 2nd entey (if it exists)
print "\nStep 3: The 2nd user for …Run Code Online (Sandbox Code Playgroud) python ×9
pandas ×4
apache-spark ×1
datetime ×1
macports ×1
numpy ×1
scikit-learn ×1
similarity ×1
sublimetext2 ×1