I have this DataFrame:
X Y Z Value
0 18 55 1 70
1 18 55 2 67
2 18 57 2 75
3 18 58 1 35
4 19 54 2 70
I want to save it as a text file in this format:
X Y Z Value
18 55 1 70
18 55 2 67
18 57 2 75
18 58 1 35
19 54 2 70
I tried this code, but it did not work:
np.savetxt('xgboost.txt', a.values, delimiter ='\t')
TypeError: Mismatch between array dtype ('object') and format specifier ('%.18e %.18e %.18e')
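The error says that `a.values` is an object-dtype array, which the default float format of `np.savetxt` cannot render. A minimal sketch of one way around it, assuming the frame holds the five rows shown above: write the file with `DataFrame.to_csv`, using a tab separator and `index=False` so the index column is dropped. The temporary output directory is an assumption made only to keep the example self-contained.

```python
import os
import tempfile

import pandas as pd

# Rebuild the frame from the question (values copied from the sample above).
df = pd.DataFrame({
    "X": [18, 18, 18, 18, 19],
    "Y": [55, 55, 57, 58, 54],
    "Z": [1, 2, 2, 1, 2],
    "Value": [70, 67, 75, 35, 70],
})

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.mkdtemp(), "xgboost.txt")

# index=False drops the 0..4 index column; sep="\t" gives tab-separated text.
df.to_csv(out_path, sep="\t", index=False)

with open(out_path) as fh:
    text = fh.read()
print(text)
```

If `np.savetxt` has to be used, passing a string format such as `fmt='%s'` is the usual way to avoid the dtype mismatch, but the header line then has to be supplied separately.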

I have a large 3.5 GB CSV file, and I want to read it with pandas.
Here is my code:
import pandas as pd
tp = pd.read_csv('train_2011_2012_2013.csv', sep=';', iterator=True, chunksize=20000000, low_memory = False)
df = pd.concat(tp, ignore_index=True)
I get this error:
pandas/parser.pyx in pandas.parser.TextReader.read (pandas/parser.c:8771)()
pandas/parser.pyx in pandas.parser.TextReader._read_rows (pandas/parser.c:9731)()
pandas/parser.pyx in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:9602)()
pandas/parser.pyx in pandas.parser.raise_parser_error (pandas/parser.c:23325)()
CParserError: Error tokenizing data. C error: out of
My machine has 8 GB of RAM.
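Concatenating every chunk rebuilds the whole file in memory, so chunking alone does not lower the peak footprint. A hedged sketch of the alternative, assuming the failure is memory related (the error message above is cut off): read only the needed columns, reduce each chunk as it arrives, and keep just the running result. The function name `sum_by_key` and its column arguments are placeholders, not names from the original file.

```python
import pandas as pd


def sum_by_key(path, key, value, sep=";", chunksize=100_000):
    """Sum `value` per `key` without holding the whole CSV in memory.

    `key` and `value` are hypothetical column names supplied by the caller.
    """
    total = None
    reader = pd.read_csv(path, sep=sep, usecols=[key, value], chunksize=chunksize)
    for chunk in reader:
        # Reduce the chunk to one number per key, then merge into the running total.
        part = chunk.groupby(key)[value].sum()
        total = part if total is None else total.add(part, fill_value=0)
    return total
```

A call would look like `sum_by_key('train_2011_2012_2013.csv', 'some_key', 'some_value')`, with the two column names replaced by real ones from the file. If the full table really is needed, passing explicit `dtype` and `usecols` arguments to `read_csv` is another way to shrink it.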

I have to classify some sentiments. My DataFrame looks like this:
Phrase Sentiment
is it good movie positive
wooow is it very goode positive
bad movie negative
I did some preprocessing (tokenization, stop-word removal, stemming, etc.) and got:
Phrase Sentiment
[ good , movie ] positive
[wooow ,is , it ,very, good ] positive
[bad , movie ] negative
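I need to end up with a DataFrame in which each row is a text, the columns are the words, and the values are the TF-IDF scores, like this: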
good    movie   wooow   very    bad     Sentiment
tf_idf  tf_idf  tf_idf  tf_idf  tf_idf  positive
(and the same for the other two rows)

I am trying to train a convolutional neural network with Keras on TensorFlow. Here is my code. I get an error at the compile call, but I don't know why:
model = Sequential()  # or Graph or whatever
model.add(Embedding(input_dim=n_symbols + 1,
                    output_dim=vocab_dim,
                    input_length=maxlen,
                    dropout=0.2))
# we add a Convolution1D, which will learn nb_filter
# word group filters of size filter_length:
model.add(Convolution1D(nb_filter=nb_filter,
                        filter_length=filter_length,
                        border_mode='valid',
                        activation='relu',
                        subsample_length=1))
# we use max pooling:
model.add(GlobalMaxPooling1D())
# We add a vanilla hidden layer:
model.add(Dense(hidden_dims))
model.add(Dropout(0.2))
model.add(Activation('relu'))
# We project onto a single unit output layer, and squash it with a sigmoid:
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train, y_train,
          batch_size=batch_size, …

I have a DataFrame like this:
Phrase Sentiment
[ good , movie ] positive
[wooow ,is , it ,very, good ] positive
[] negative
[] pOSTIVE
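The Phrase column has dtype object, and I need to delete the rows that contain []. I don't know how to do this in Python.
The result should look like this: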
Phrase Sentiment
[ good , movie ] positive
[wooow ,is , it ,very, good ] positive
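A minimal sketch of the filtering step, assuming each `Phrase` cell is an actual Python list (an object column can also hold the string `"[]"`, in which case the comparison `df["Phrase"] != "[]"` would be the mask to use instead): build a boolean mask from the length of each list and keep the rows where it is greater than zero.

```python
import pandas as pd

# Sample frame modelled on the question; the last two rows hold empty lists.
df = pd.DataFrame({
    "Phrase": [
        ["good", "movie"],
        ["wooow", "is", "it", "very", "good"],
        [],
        [],
    ],
    "Sentiment": ["positive", "positive", "negative", "positive"],
})

# True for rows whose token list has at least one element.
non_empty = df["Phrase"].map(len) > 0

# Keep only those rows and renumber the index from 0.
cleaned = df[non_empty].reset_index(drop=True)
print(cleaned)
```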

My function returns 28 plots (figures), but I need to group them in a single figure. Here is the code that generates the 28 plots:
import matplotlib.pyplot as plt

# One plot per distinct assignment category
for cat in df.ASS_ASSIGNMENT.unique():
    a = df.loc[df['ASS_ASSIGNMENT'] == cat]
    dates = a['DATE']
    prediction = a['CSPL_RECEIVED_CALLS']
    plt.plot(dates, prediction)
    plt.ylabel("nmb_app")
    plt.legend([cat.decode('utf-8')], loc='best')
    plt.xlabel(cat.decode('utf-8'))
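One way to put all categories in a single figure is a grid of subplots created with `plt.subplots`, drawing each category on its own axes. The sketch below is an illustration under assumptions: the helper name `plot_by_category`, the four-column default and the small demo frame are invented for the example, and the category values are taken to be plain strings, so no `decode` call is needed.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_by_category(df, ncols=4):
    """Draw one subplot per ASS_ASSIGNMENT value inside a single figure."""
    cats = df["ASS_ASSIGNMENT"].unique()
    nrows = math.ceil(len(cats) / ncols)
    fig, axes = plt.subplots(nrows, ncols,
                             figsize=(4 * ncols, 3 * nrows), squeeze=False)
    flat = axes.ravel()
    for ax, cat in zip(flat, cats):
        sub = df.loc[df["ASS_ASSIGNMENT"] == cat]
        ax.plot(sub["DATE"], sub["CSPL_RECEIVED_CALLS"])
        ax.set_title(str(cat))
        ax.set_ylabel("nmb_app")
    # Hide grid cells that have no category to show.
    for ax in flat[len(cats):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig


# Tiny made-up frame with three categories, only to exercise the helper.
demo = pd.DataFrame({
    "ASS_ASSIGNMENT": ["A", "A", "B", "B", "C", "C"],
    "DATE": pd.date_range("2013-01-01", periods=6, freq="D"),
    "CSPL_RECEIVED_CALLS": [3, 5, 2, 4, 7, 1],
})
fig = plot_by_category(demo, ncols=2)
```

With 28 categories and the default `ncols=4`, the helper would produce a 7 by 4 grid. Remove the `matplotlib.use("Agg")` line to display the figure interactively.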

python ×6
pandas ×4
dataframe ×3
csv ×1
delete-row ×1
figure ×1
keras ×1
large-files ×1
matplotlib ×1
memory ×1
numpy ×1
plot ×1
tensorflow ×1
text ×1
text-mining ×1
tf-idf ×1