我有一个庞大的数据集,我希望PCA.我受限于RAM和PCA的计算效率.因此,我转而使用Iterative PCA.
数据集大小 - (140000,3504)
该文件指出,This algorithm has constant memory complexity, on the order of batch_size, enabling use of np.memmap files without loading the entire file into memory.
这真的很好,但不确定如何利用这一点.
我尝试加载一个memmap,希望它能以块的形式访问它,但是我的RAM已经爆炸了.我下面的代码最终使用了大量的RAM:
ut = np.memmap('my_array.mmap', dtype=np.float16, mode='w+', shape=(140000,3504))
clf=IncrementalPCA(copy=False)
X_train=clf.fit_transform(ut)
Run Code Online (Sandbox Code Playgroud)
当我说"我的RAM吹"时,我看到的Traceback是:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\sklearn\base.py", line 433, in fit_transfo
rm
return self.fit(X, **fit_params).transform(X)
File "C:\Python27\lib\site-packages\sklearn\decomposition\incremental_pca.py",
line 171, in fit
X = check_array(X, dtype=np.float)
File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 347, in
check_array
array …Run Code Online (Sandbox Code Playgroud) 如何打破 netlogo 中的询问?例如......特工正在移动,突然满足了某些条件。现在您不想要求代理执行(移动)并停止询问。并在询问后将控制权传递给下一行。
类似的东西存在于称为“停止”的过程中。但不是为了问。有办法解决吗?
以下代码在R中生成一个qudaratic回归.
lm.out3 = lm(listOfDataFrames1$avgTime ~ listOfDataFrames1$betaexit + I(listOfDataFrames1$betaexit^2) + I(listOfDataFrames1$betaexit^3))
summary(lm.out3)
Call:
lm(formula = listOfDataFrames1$avgTime ~ listOfDataFrames1$betaexit +
I(listOfDataFrames1$betaexit^2) + I(listOfDataFrames1$betaexit^3))
Residuals:
Min 1Q Median 3Q Max
-14.168 -2.923 -1.435 2.459 28.429
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 199.41 11.13 17.913 < 2e-16 ***
listOfDataFrames1$betaexit -3982.03 449.49 -8.859 1.14e-12 ***
I(listOfDataFrames1$betaexit^2) 32630.86 5370.27 6.076 7.87e-08 ***
I(listOfDataFrames1$betaexit^3) -93042.90 19521.05 -4.766 1.15e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual …Run Code Online (Sandbox Code Playgroud) 给定线 A 、 B 和 C 的系数。
Ax + By + C = 0
Run Code Online (Sandbox Code Playgroud)
我希望以顺时针或逆时针方向旋转x角度的线。我怎样才能实现这一点以便获得新的系数?(在NetLogo中)
编辑:抱歉,将点视为 x_0 和 y_0
如何在python中获得副词和形容词的相应动词和名词?似乎简单的继承和优先可能不是很准确.可能存在类似于例如的停用词.在我很高兴学习......
我不能将任何图书馆甚至问题陈述正式化.
现在代码.现在我想为句子中的每个形容词返回相应的副词和名词动词.请帮忙.
Code:
def pos_func(input_text):
#pos tagging code:
text=input_text
tokens=tokenize_words(text)
tagged=pos_tag(tokens)
pos_store(tagged)
def pos_store(tagged):
verbs=[]
adjectives=[]
adverbs=[]
nouns=[]
for tag in tagged:
pos=tag[1]
if pos[0]=='V':
verbs.append(tag[0])
elif pos[0]=='N':
nouns.append(tag[0])
elif pos[0]=='J':
adjectives.append(tag[0])
elif pos[0:2]=='RB':
adverbs.append(tag[0])
def tokenize_words(text):
tokens = TreebankWordTokenizer().tokenize(text)
contractions = ["n't", "'ll", "'m"]
fix = []
for i in range(len(tokens)):
for c in contractions:
if tokens[i] == c: fix.append(i)
fix_offset = 0
for fix_id in fix:
idx = fix_id - 1 - fix_offset
tokens[idx] = tokens[idx] + …Run Code Online (Sandbox Code Playgroud) 我有以下代码来绘制直方图.值time_new是发生某事的小时数.
time_new=[9, 23, 19, 9, 1, 2, 19, 5, 4, 20, 23, 10, 20, 5, 21, 17, 4, 13, 8, 13, 6, 19, 9, 14, 9, 10, 23, 19, 23, 20, 19, 6, 5, 24, 20, 19, 15, 14, 19, 14, 15, 21]
hour_list = time_new
print hour_list
numbers=[x for x in xrange(0,24)]
labels=map(lambda x: str(x), numbers)
plt.xticks(numbers, labels)
plt.xlim(0,24)
pdb.set_trace()
plt.hist(hour_list,bins=24)
plt.show()
Run Code Online (Sandbox Code Playgroud)
这会产生直方图,但是这些箱子没有按照我的意愿对齐.我希望小时位于垃圾箱的中心,而不是边缘.
我提到了这个问题/答案,但似乎也没有回答这个问题.
我尝试使用以下代码来代替直方图,但它没有为值绘制条形图 23
plt.hist(hour_list, bins=np.arange(24)-0.5)
Run Code Online (Sandbox Code Playgroud)
任何人都可以帮助我获得24个垃圾箱,每个小时位于每个垃圾箱的中心吗?
import os,pdb
import sys
print os.path.dirname(os.path.realpath(__file__))
pdb.set_trace()
os.chdir('../bci_framework')
Run Code Online (Sandbox Code Playgroud)
输出:
D:\baseline\BCI-Comparison-Framework\bci_framework
WindowsError: (2, 'The system cannot find the file specified', '../bci_framework')
Run Code Online (Sandbox Code Playgroud)
为什么?文件夹存在!
我有一个格式列表(浮点数,字符串).如何从列表中删除具有相同浮点值的重复项?
列表按浮点顺序排序.我想保留订单.
[(0.10507038451969995,
'Deadly stampede in Shanghai - Emergency personnel help victims.'),
(0.078586381821416265,
'Deadly stampede in Shanghai - Police and medical staff help injured people after the stampede.'),
(0.072031446647399661, '- Emergency personnel help victims.'),
(0.072031446647399661, 'Emergency personnel help victims.')]
Run Code Online (Sandbox Code Playgroud)
看看最后两个.
我正在编写一个程序来分割主题标签中包含的单词。
例如我想分割主题标签:
#Whatthehello #goback
Run Code Online (Sandbox Code Playgroud)
进入:
What the hello go back
Run Code Online (Sandbox Code Playgroud)
re.sub我在使用函数参数时遇到了麻烦。
我写的代码是:
import re,pdb
def func_replace(each_func):
i=0
wordsineach_func=[]
while len(each_func) >0:
i=i+1
word_found=longest_word(each_func)
if len(word_found)>0:
wordsineach_func.append(word_found)
each_func=each_func.replace(word_found,"")
return ' '.join(wordsineach_func)
def longest_word(phrase):
phrase_length=len(phrase)
words_found=[];index=0
outerstring=""
while index < phrase_length:
outerstring=outerstring+phrase[index]
index=index+1
if outerstring in words or outerstring.lower() in words:
words_found.append(outerstring)
if len(words_found) ==0:
words_found.append(phrase)
return max(words_found, key=len)
words=[]
# The file corncob_lowercase.txt contains a list of dictionary words
with open('corncob_lowercase.txt') as f:
read_words=f.readlines()
for read_word in read_words:
words.append(read_word.replace("\n","").replace("\r",""))
Run Code Online (Sandbox Code Playgroud)
例如,当使用这些函数时,如下所示: …
python ×6
netlogo ×3
string ×2
geometry ×1
hashtag ×1
histogram ×1
list ×1
math ×1
matplotlib ×1
nlp ×1
nltk ×1
numpy ×1
pca ×1
python-2.7 ×1
r ×1
regex ×1
replace ×1
scikit-learn ×1
stanford-nlp ×1
tuples ×1