I want to use the following pandas build, but I cannot import it at all.
https://github.com/pydata/pandas/releases/download/v0.15.0/pandas-0.15.0.win-amd64-py2.7.exe
The import fails as follows:
import pandas as pd
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    import pandas as pd
  File "C:\Python27\lib\site-packages\pandas\__init__.py", line 45, in <module>
    from pandas.io.api import *
  File "C:\Python27\lib\site-packages\pandas\io\api.py", line 15, in <module>
    from pandas.io.gbq import read_gbq
  File "C:\Python27\lib\site-packages\pandas\io\gbq.py", line 39, in <module>
    if LooseVersion(_GOOGLE_API_CLIENT_VERSION >= '1.2.0'):
  File "C:\Python27\lib\distutils\version.py", line 265, in __init__
    self.parse(vstring)
  File "C:\Python27\lib\distutils\version.py", line 274, in parse
    self.component_re.split(vstring))
TypeError: expected string or buffer
What is going wrong?
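The last pandas frame in the traceback, `LooseVersion(_GOOGLE_API_CLIENT_VERSION >= '1.2.0')`, appears to have its closing parenthesis in the wrong place: the comparison is evaluated first, so the version parser receives a bool instead of a version string, which would explain the `TypeError`. The sketch below reproduces only that pattern. `parse_version` and `installed` are hypothetical stand-ins for illustration, not pandas or distutils code.

```python
def parse_version(vstring):
    """Hypothetical stand-in for a version parser that accepts only strings."""
    if not isinstance(vstring, str):
        raise TypeError("expected string, got %s" % type(vstring).__name__)
    return tuple(int(part) for part in vstring.split("."))


installed = "1.3.1"  # assumed value, for illustration only

# Misplaced parenthesis: the comparison runs first, so a bool gets parsed.
try:
    parse_version(installed >= "1.2.0")
    buggy_error = None
except TypeError as exc:
    buggy_error = str(exc)

# Intended form: parse both sides, then compare the parsed versions.
is_new_enough = parse_version(installed) >= parse_version("1.2.0")

print(buggy_error)
print(is_new_enough)
```

If this is indeed the cause, the faulty check sits inside the installed pandas package, so trying a different pandas release is a reasonable first step.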

I want to select only the rows that contain no zero elements.
import numpy as np

data = np.array([[1, 2, 3, 4, 5],
                 [6, 7, 0, 9, 10],
                 [11, 12, 13, 14, 15],
                 [16, 17, 18, 19, 0]])
The expected result is:
array([[1, 2, 3, 4, 5],
       [11, 12, 13, 14, 15]])
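One way to do this, sketched here with a boolean mask, is to test each element against zero and keep the rows in which every test passes. The names `keep` and `result` are chosen for illustration.

```python
import numpy as np

data = np.array([[1, 2, 3, 4, 5],
                 [6, 7, 0, 9, 10],
                 [11, 12, 13, 14, 15],
                 [16, 17, 18, 19, 0]])

# True for each row in which every element is non-zero.
keep = (data != 0).all(axis=1)
result = data[keep]
print(result)
```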

I want to convert the HTML table obtained by the script below into a CSV file, but I get the following type error:
TypeError: sequence item 0: expected string, Tag found
from bs4 import BeautifulSoup
import urllib2

url = 'http://www.data.jma.go.jp/obd/stats/etrn/view/monthly_s3_en.php?block_no=47401&view=1'
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html)
# find_all returns a list of matching tables; the first one holds the data
table = soup.find_all('table', class_='data2_s')
rows = table[0].find_all('tr')
What is the easiest way to convert it to a CSV file? I tried:
fo = open('fo.txt','w')
for r in rows:
    fo.write(str(r.txt) + '\n')
fo.close()
But it only wrote 'None'.
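A `Tag` has no `txt` attribute; BeautifulSoup treats an unknown attribute as a search for a child tag of that name, so `r.txt` evaluates to `None`. The `TypeError` presumably comes from passing `Tag` objects, rather than their text, to `str.join`. The following is a minimal sketch written for Python 3, assuming the goal is one CSV row per `<tr>`. `sample_html` is a shortened stand-in for the downloaded page, and the output goes to an in-memory buffer instead of a file.

```python
import csv
import io

from bs4 import BeautifulSoup

# Shortened stand-in for the downloaded page (assumption for illustration).
sample_html = """
<table class="data2_s">
<tr><th scope="col">Year</th><th scope="col">Jan</th><th scope="col">Feb</th></tr>
<tr class="mtx"><td>1938</td><td class="data_0_0_0_0">-5.2</td><td class="data_0_0_0_0">-4.9</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
table = soup.find("table", class_="data2_s")

buffer = io.StringIO()  # swap for open('fo.csv', 'w', newline='') to write a file
writer = csv.writer(buffer)
for row in table.find_all("tr"):
    # get_text() returns the cell text as a string; the Tag itself is not a string.
    cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
    writer.writerow(cells)

csv_text = buffer.getvalue()
print(csv_text)
```

Using the `csv` module rather than joining strings by hand also takes care of quoting if a cell ever contains a comma.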
The HTML looks like this:
<table class="data2_s"><caption class="m">WAKKANAI   WMO Station ID:47401 Lat 45<sup>o</sup>24.9'N  Lon 141<sup>o</sup>40.7'E</caption><tr><th scope="col">Year</th><th scope="col">Jan</th><th scope="col">Feb</th><th scope="col">Mar</th><th scope="col">Apr</th><th scope="col">May</th><th scope="col">Jun</th><th scope="col">Jul</th><th scope="col">Aug</th><th scope="col">Sep</th><th scope="col">Oct</th><th scope="col">Nov</th><th scope="col">Dec</th><th scope="col">Annual</th></tr><tr class="mtx" style="text-align:right;"><td style="text-align:center">1938</td><td class="data_0_0_0_0">-5.2</td><td class="data_0_0_0_0">-4.9</td><td class="data_0_0_0_0">-0.6</td><td class="data_0_0_0_0">4.7</td><td class="data_0_0_0_0">9.5</td><td class="data_0_0_0_0">11.6</td><td class="data_0_0_0_0">17.9</td><td class="data_0_0_0_0">22.2</td><td class="data_0_0_0_0">16.5</td><td class="data_0_0_0_0">10.7</td><td class="data_0_0_0_0">3.3</td><td class="data_0_0_0_0">-4.7</td><td …

Can scikit-learn be used to remove highly correlated features when using multiple linear regression?
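To clear up my confusion, I have a few questions about the answer @behzad.nouri posted on capturing high multicollinearity in statsmodels.
In that answer, he tested for high multicollinearity among 5 columns (features) of independent variables, each with 100 rows of data, and found that w[0] is close to zero. Can I therefore say that the first column, i.e. the first independent variable, should be removed to avoid very high multicollinearity?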
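Not necessarily: `w[0]` is an eigenvalue of the correlation matrix, and its index does not correspond to a column index. The matching eigenvector `v[:, 0]` indicates which columns take part in the near-linear dependence. The sketch below uses synthetic data (an assumption, not the data from the referenced answer) in which column 4 is almost a combination of columns 0 and 2. It calls `np.linalg.eigh` so that the eigenvalues come back in ascending order; the 0.1 threshold is an arbitrary choice for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.standard_normal((100, 5))          # 100 rows, 5 features
noise = rng.standard_normal(100)
# Synthetic dependence (assumption): column 4 is nearly 2*col0 + 3*col2.
xs[:, 4] = 2 * xs[:, 0] + 3 * xs[:, 2] + 0.01 * noise

corr = np.corrcoef(xs, rowvar=False)
w, v = np.linalg.eigh(corr)                 # eigenvalues in ascending order

print(w[0])                                 # close to zero: near-dependence
loadings = np.abs(v[:, 0])                  # eigenvector of the smallest eigenvalue
involved = np.flatnonzero(loadings > 0.1)   # columns with non-negligible weight
print(involved)
```

Here the near-zero eigenvalue points at columns 0, 2 and 4 together, not at column 0 alone. Dropping any one of the involved columns breaks the dependence; which one to drop is a modelling decision.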

I have a numpy array as shown below.
import numpy as np

data = np.array([[0, 0, 0, 4],
                 [3, 0, 5, 0],
                 [8, 9, 5, 3]])
print(data)
I need to extract only those rows whose first three elements are not all zero. The expected result is:
result = np.array([[3, 0, 5, 0],
                   [8, 9, 5, 3]])
I tried:
res = [row for row in data if row[:3].sum() != 0]
print(res)
It gives the expected result, but I am looking for a better, more NumPy-idiomatic way.
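A vectorized alternative, sketched here with a boolean mask over the first three columns, avoids the Python-level loop. The names `mask` and `result` are chosen for illustration.

```python
import numpy as np

data = np.array([[0, 0, 0, 4],
                 [3, 0, 5, 0],
                 [8, 9, 5, 3]])

# Keep rows where at least one of the first three elements is non-zero.
mask = (data[:, :3] != 0).any(axis=1)
result = data[mask]
print(result)
```

Unlike the `sum() != 0` test, this also keeps a row such as `[1, -1, 0, 7]`, whose first three elements cancel out but are not all zero.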

I need to replace the oldnames in my program with newnames, as follows:
oldnames = ['apple','banana','sheep']
for oldname in oldnames:
    if oldname == 'apple':
        newname = 'monkey'
    if oldname == 'banana':
        newname = 'monkey'
    if oldname == 'sheep':
        newname = 'lion'
My program works fine, but I would like to know the most Pythonic way to do this.
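A common approach is to replace the chain of `if` statements with a dictionary lookup. The sketch below assumes the three mappings shown above are the complete set; the name `rename` is chosen for illustration.

```python
oldnames = ['apple', 'banana', 'sheep']

# Lookup table from old name to new name.
rename = {'apple': 'monkey', 'banana': 'monkey', 'sheep': 'lion'}

# .get(name, name) leaves a name unchanged when it has no entry (an assumption
# about the desired behaviour; use rename[name] to raise KeyError instead).
newnames = [rename.get(name, name) for name in oldnames]
print(newnames)
```

Adding another replacement then means adding one dictionary entry rather than another `if` branch.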

How can I merge two lists into one, element by element, as shown below?
list1 = ['a','c','e']
list2 = ['apple','carrot','elephant']
result = ['a', 'apple', 'c', 'carrot', 'e', 'elephant']
I tried:
result = [(x, y) for x, y in zip(list1, list2)]
print(result)
But the elements of the result are tuples. I am hoping for an easier solution...
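One option, sketched below, is to keep `zip` for the pairing and flatten the pairs with `itertools.chain.from_iterable`.

```python
from itertools import chain

list1 = ['a', 'c', 'e']
list2 = ['apple', 'carrot', 'elephant']

# zip pairs the elements up; chain.from_iterable flattens the pairs in order.
result = list(chain.from_iterable(zip(list1, list2)))
print(result)
```

Note that `zip` stops at the end of the shorter list, so any extra elements in the longer list are dropped.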

from collections import OrderedDict

l = [('Monkey', 71), ('Monkey', 78), ('Ostrich', 80), ('Ostrich', 96), ('Ant', 98)]
d = OrderedDict()
for i, j in l:
    d[i] = j
print(d)
OrderedDict([('Monkey', 78), ('Ostrich', 96), ('Ant', 98)])
The expected 'd' should be:
OrderedDict([('Monkey', (71,78)), ('Ostrich', (80,96)), ('Ant', 98)])
It is also fine if all the values are grouped into tuples or lists.
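The loop above overwrites the earlier value each time a key repeats. A minimal sketch that collects every value into a list instead, using `setdefault`, could look like this (every value becomes a list, including the single value for 'Ant'):

```python
from collections import OrderedDict

l = [('Monkey', 71), ('Monkey', 78), ('Ostrich', 80), ('Ostrich', 96), ('Ant', 98)]

d = OrderedDict()
for key, value in l:
    # Create an empty list the first time a key is seen, then append to it.
    d.setdefault(key, []).append(value)

print(list(d.items()))
```

If tuples are preferred, the lists can be converted afterwards, for example with `tuple(values)` for each entry.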

I have a full_list as follows:
full_list = [[[-180, 90], [-180, 80], [-175, 80], [-175, 90]], [[-180, 80], [-180, 70], [-175, 70], [-175, 80]], [[-180, 70], [-180, 60], [-175, 60], [-175, 70]], [[-180, 60], [-180, 50], [-175, 50], [-175, 60]]]
How can I generate a list named ans as follows?
ans = [[[-180, 90], [-180, 80], [-175, 80], [-175, 90]],
       [[-180, 80], [-180, 70], [-175, 70], [-175, 80]],
       [[-180, 70], [-180, 60], [-175, 60], [-175, 70]],
       [[-180, 60], [-180, 50], [-175, 50], [-175, 60]]]
The two lists are identical; the only difference is that the latter is split across rows.
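Since the two lists hold the same data, only the printed text differs. A minimal sketch, assuming the goal is text output with one inner list per line, builds the string by hand; `indent` and `text` are names chosen for illustration.

```python
full_list = [[[-180, 90], [-180, 80], [-175, 80], [-175, 90]],
             [[-180, 80], [-180, 70], [-175, 70], [-175, 80]],
             [[-180, 70], [-180, 60], [-175, 60], [-175, 70]],
             [[-180, 60], [-180, 50], [-175, 50], [-175, 60]]]

# One inner list per line, aligned under the first one.
indent = " " * len("ans = [")
text = "ans = [" + (",\n" + indent).join(repr(row) for row in full_list) + "]"
print(text)
```

The resulting text is still valid Python, so it can be pasted back into a script unchanged.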