I am using pydoop to read a file from HDFS. When I use:
import pydoop.hdfs as hd
with hd.open("/home/file.csv") as f:
    print f.read()
it prints the file to stdout.
Is there a way to read this file into a DataFrame? I tried pandas' read_csv("/home/file.csv"), but it tells me the file cannot be found. The exact code and error are:
>>> import pandas as pd
>>> pd.read_csv("/home/file.csv")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.7/site-packages/pandas/io/parsers.py", line 498, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/lib64/python2.7/site-packages/pandas/io/parsers.py", line 275, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/lib64/python2.7/site-packages/pandas/io/parsers.py", line 590, in __init__
self._make_engine(self.engine)
File "/usr/lib64/python2.7/site-packages/pandas/io/parsers.py", line 731, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/lib64/python2.7/site-packages/pandas/io/parsers.py", line 1103, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", …
I get the error
Cannot create directory /home/hadoop/hadoopinfra/hdfs/namenode/current
when trying to install Hadoop on my local Mac.
What could be causing this? For reference, my XML files are below:
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/namenode </value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/datanode </value>
</property>
</configuration>
core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
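I think the problem is in my hdfs-site.xml file, but I am not sure how to pinpoint or fix it.
I am following this tutorial, with "hadoop" in the file paths replaced by my username.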
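One way to pinpoint this kind of problem is to list the local directories that hdfs-site.xml points at and check whether they could be created at all. The sketch below is only an illustration: the helper names `local_dirs` and `nearest_existing_parent` are made up for this example, and the sample XML is copied from the question. Note that on macOS home directories normally live under /Users rather than /home, which is one possible reason a path under /home cannot be created.

```python
import os
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Sample copied from the hdfs-site.xml in the question (trailing spaces kept).
SAMPLE = """<configuration>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/namenode </value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopinfra/hdfs/datanode </value>
</property>
</configuration>"""


def local_dirs(hdfs_site_xml, names=("dfs.name.dir", "dfs.data.dir")):
    """Return {property name: local path} for the requested properties."""
    root = ET.fromstring(hdfs_site_xml)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", "").strip()
        if name in names:
            # Strip stray whitespace, then drop the file:// scheme if present.
            value = prop.findtext("value", "").strip()
            found[name] = urlparse(value).path or value
    return found


def nearest_existing_parent(path):
    """Walk up from path until a directory that exists is found."""
    while not os.path.isdir(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the top without finding a directory
            break
        path = parent
    return path


for name, path in local_dirs(SAMPLE).items():
    parent = nearest_existing_parent(path)
    # If the parent is not writable, the NameNode cannot create its directory.
    print(name, path, "nearest existing parent:", parent,
          "writable:", os.access(parent, os.W_OK))
```

If the nearest existing parent is reported as not writable, pointing the two properties at a directory under the current user's home is one thing to try.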
I am trying to pivot this data:
ID UserID
1 a1
1 a2
2 a1
2 a3
into a DataFrame like this:
UserID a1 a2 a3
1 1 1 0
2 1 0 1
I tried df = pd.pivot_table(df, index='UserID', columns='ID'), but it gives me a DataError: No numeric types to aggregate error. What can I do?
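A possible explanation is that pivot_table aggregates with a mean by default, and this frame has no numeric value column to average. A minimal sketch that counts occurrences instead uses pd.crosstab. It assumes the input columns are named ID (holding 1, 2) and UserID (holding a1, a2, a3), which is a guess from the sample data above.

```python
import pandas as pd

# Assumed layout of the sample data: one row per (ID, UserID) pair.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "UserID": ["a1", "a2", "a1", "a3"],
})

# Count how often each UserID value occurs for each ID; absent pairs become 0.
table = pd.crosstab(df["ID"], df["UserID"])
print(table)
```

If a pair can occur more than once and the table should still hold only 0 or 1, `table.clip(upper=1)` caps the counts.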
I am trying to use the hmmlearn library to predict the optimal state sequence given some data, but I get an error. My code is:
import numpy as np
from hmmlearn import hmm
trans_mat = np.array([[0.2,0.6,0.2],[0.4,0.0,0.6],[0.1,0.2,0.7]])
emm_mat = np.array([[0.2,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1],[0.1,0.1,0.1,0.1,0.2,0.1,0.1,0.1,0.1],[0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.2]])
start_prob = np.array([0.3,0.4,0.3])
X = [3,4,5,6,7]
model = hmm.GaussianHMM(n_components = 3, n_iter = 1000)
X = np.array(X)
model.startprob_ = start_prob
model.transmat_ = trans_mat
model.emissionprob_ = emm_mat
# Predict the optimal sequence of internal hidden state
x = model.fit([X])
print(model.decode([X]))
But I get an error message:
Traceback (most recent call last):
File "hmm_loyalty.py", line 55, in <module>
x = model.fit([X])
File "build/bdist.macosx-10.6-x86_64/egg/hmmlearn/base.py", line 421, in fit
File "build/bdist.macosx-10.6-x86_64/egg/hmmlearn/hmm.py", line 183, in _init
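File …
I have 100 nodes with label A and 2 nodes with label B. Every node with label A is related to at least one node with label B. How can I get all the nodes with label A that are related to both nodes with label B? I tried the following: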
MATCH p=(:A)-[:TYPE]->(b:B) where b.Name = 'XYZ' or b.Name = 'ABC'
RETURN p
This only gives me all the nodes that are related to either one of those two nodes.
Edit: I managed to do it with the following query:
MATCH (a:A)- [:TYPE] ->(t:Type) where t.Name = 'ABC'
MATCH (a:A)- [:TYPE] -> (u:Type) where u.Name = 'XYZ'
return a, t, u
Is there a way to optimize this?
I have a simple HTML template that looks like this:
<html>
<head> Sentiment Analysis Dataset</head>
<form method='POST'>
<b> Unclassified Text </b>
<input type='text' name='Text' value={{db.Entry}} readonly><br>
</form>
</html>
Here is my Flask code:
@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'GET':
        db={'Entry':data.next()}
        print db
        return render_template('index.html', db=db)
    elif request.method == 'POST':
        db={'Entry':data.next()}
        print db
        return render_template('index.html', db=db)
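The db dictionary looks like {'Entry': 'Worst thing I've ever seen'}. When I run the app, only the first word shows up in the HTML text box. Why does this happen, and what can I do to show the whole string in the text box?
Edit: I just wrapped {{db.Entry}} in quotes and it worked.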
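The likely reason the quotes help is that an unquoted HTML attribute value ends at the first whitespace, so every word after the first is read as a separate, valueless attribute. The sketch below illustrates this with Python's standard html.parser instead of a browser. The `ValueGrabber` class, the `input_value` helper, and the sample text are made up for this example.

```python
from html.parser import HTMLParser


class ValueGrabber(HTMLParser):
    """Collects the value attribute of every <input> tag it sees."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            self.values.append(dict(attrs).get("value"))


def input_value(markup):
    """Return the value attribute of the first <input> tag in markup."""
    parser = ValueGrabber()
    parser.feed(markup)
    parser.close()
    return parser.values[0]


text = "Worst thing ever"  # hypothetical entry, standing in for db.Entry

# What the template produces without quotes around {{db.Entry}}.
unquoted = "<input type='text' name='Text' value=%s readonly>" % text
# What it produces once the placeholder is wrapped in quotes.
quoted = "<input type='text' name='Text' value=\"%s\" readonly>" % text

print(input_value(unquoted))  # only the first word survives
print(input_value(quoted))    # the whole string survives
```

In the template itself this corresponds to writing value="{{db.Entry}}" rather than value={{db.Entry}}.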