小编sar_iii的帖子

无法在intellij IDE中的scala中找到或加载主类

我已经搜索过这个错误,但答案是针对Java的,但我的情况是Scala.我试图在IntelliJ IDE sentimenAnalysis中运行这个项目 ,但它会抛出一个错误.这也是项目的结构.找不到类

更新1根据答案,在搜索类名称的末尾添加$

添加sbt任务后更新2:

更新3 我的问题通过导入项目解决,而不是直接从GitHub获取,可能是迈克艾伦说可以解决问题的手动配置,但我无法成功应用它.

scala intellij-idea sbt

21
推荐指数

4
解决办法

3万
查看次数

KeyError：“ [[”，“]]]都不在“ [列]”中

我想在我的数据框中切两列。

这是我这样做的代码：

import pandas as pd
df = pd.read_csv('source.txt',header=0)
cidf=df.loc[:,['vocab','sumCI']]
print(cidf)

Run Code Online (Sandbox Code Playgroud)

这是数据示例：

ID  vocab   sumCI   sumnextCI   new_diff
450      statu    3.0        0.0       3.0
391     provid    4.0        1.0       3.0
382  prescript    3.0        0.0       3.0
300   lymphoma    2.0        0.0       2.0
405      renew    2.0        0.0       2.0

Run Code Online (Sandbox Code Playgroud)

**首先我收到此错误：**

KeyError: “None of [['', '']] are in the [columns]”'

Run Code Online (Sandbox Code Playgroud)

我尝试过的

我在读取文件时尝试header用index 0
我尝试使用以下代码重命名列：
```
df.rename(columns=df.iloc[0],inplace=True)
```
Run Code Online (Sandbox Code Playgroud)
我也试过这个：
```
df.columns = df.iloc[1]
df=df.reindex(df.index.drop(0))
```
Run Code Online (Sandbox Code Playgroud)
在此链接中也尝试过评论

以上都不是解决问题的方法。

python slice dataframe pandas

8
推荐指数

1
解决办法

1万
查看次数

在pycharm中并行运行两个python脚本

我看过这个问题，但我不确定我是否正确理解。

我打开了 pycharm 和一个 python 脚本及其运行（它是主题建模）。

此外，我还有另一个 python 脚本，我在同一台服务器的另一个 pycharm 中打开了该脚本。我也运行它。

现在这两个程序在同一台服务器上运行，我应该提一下，我没有更改任何配置，无论是 server 还是 pycharm。

你觉得这样好吗？或者一个脚本在技术上不会运行（就进度而言，我的意思是它只显示它的运行但实际上不会运行）直到另一个脚本完成？

python parallels pycharm

7
推荐指数

3
解决办法

2万
查看次数

如何在keras中存储每个时期的操作结果（如TOPK）

我在 keras 中编写了一个自定义层。在这个自定义层的一部分中，可以说我有一个这样的矩阵：

c = tf.cast(tf.nn.top_k(tf.nn.top_k(n, tf.shape(n)[1])[1][:, ::-1], tf.shape(n)[1])[1][:, ::-1], dtype=tf.float32)

Run Code Online (Sandbox Code Playgroud)

我的问题是如何跟踪每个时期的结果值？

例如，如果我有 20 个纪元，我需要将这个矩阵的 20 个保存在一个csv文件中。

（我知道如何保存模型的权重，但这是中间层操作的结果，我需要跟踪这个矩阵）。

我做了什么：

这是我的层的结构：

class my_layer(Layer):
    def __init__(self, topk, ctype, **kwargs):
    self.x_prev = None
    self.topk_mat = None

   def call(self, x):
     'blah blah'

   def get_config(self):
      'blah blah'

   def k_comp_tanh(self,x, f=6):
     'blah blah'
      if self.topk_mat is None:
            self.topk_mat = self.add_weight(shape=(20, 25),
                                          initializer='zeros',
                                          trainable=False,
                                          # dtype=tf.float32,
                                          name='topk_mat')

     c = tf.cast(tf.nn.top_k(tf.nn.top_k(n, tf.shape(n)[1])[1][:, ::-1], tf.shape(n)[1])[1][:, ::-1], dtype=tf.float32)
     self.topk_mat.assign(c)

Run Code Online (Sandbox Code Playgroud)

用于构建模型和拟合数据的代码：

class AutoEncoder(object):
def __init__(self, input_size, dim, comp_topk=None, ctype=None, …

Run Code Online (Sandbox Code Playgroud)

python keras tensorflow

7
推荐指数

1
解决办法

364
查看次数

交叉验证时,键中的键错误不在索引中

我在我的数据集上应用了svm.我的数据集是多标签意味着每个观察都有多个标签.

虽然KFold cross-validation它引起了错误not in index.

它显示从601到6007的索引not in index(我有1 ... 6008个数据样本).

这是我的代码:

   df = pd.read_csv("finalupdatedothers.csv")
categories = ['ADR','WD','EF','INF','SSI','DI','others']
X= df[['sentences']]
y = df[['ADR','WD','EF','INF','SSI','DI','others']]
kf = KFold(n_splits=10)
kf.get_n_splits(X)
for train_index, test_index in kf.split(X,y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

SVC_pipeline = Pipeline([
                ('tfidf', TfidfVectorizer(stop_words=stop_words)),
                ('clf', OneVsRestClassifier(LinearSVC(), n_jobs=1)),
            ])

for category in categories:
    print('... Processing {} '.format(category))
    # train the model using X_dtm & y
    SVC_pipeline.fit(X_train['sentences'], y_train[category])

    prediction = …

Run Code Online (Sandbox Code Playgroud)

python scikit-learn cross-validation

6
推荐指数

1
解决办法

4317
查看次数

词云没有正确显示词频

我已经在词云中绘制了我的文本数据。这是我拥有的数据框

vocab   sumCI
aid      3
tinnitu  3
sudden   3
squamou  3
saphen   3
problem  3
prednison 3
pain    2
dysuria 3
cancer  2

Run Code Online (Sandbox Code Playgroud)

然后我将它转换为这样的字符串（实际上，我已经复制了每个单词在我的数据框中出现的次数，然后提供给函数）：

aid aid aid tinnitu tinnitu tinnitu sudden sudden sudden squamou squamou squamou

Run Code Online (Sandbox Code Playgroud)

然后我使用此代码来可视化文本数据：

def generate_wordcloud(text): # optionally add: stopwords=STOPWORDS and change the arg below
    wordcloud = WordCloud(
                          background_color="white",
                          width=1200, height=1000,
                          relative_scaling = 1.0,
                          collocations=False
                          ).generate(text)
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.show()

cidf=cidf.loc[cidf.index.repeat(cidf['sumCI'])].reset_index(drop=True)
strCI = ' '.join(cidf['vocab'])
print(strCI)
generate_wordcloud(strCI)

Run Code Online (Sandbox Code Playgroud)

然后结果是这样的：

如您所见，大多数单词会重复 2 或 3 次，但它们在词云中的大小并未显示这一点。即使是同样大小的单词，大小也有很大的不同！

例如：

例如看看这个数据框中的“tinnitu”和“dysuria”，它们的频率都是3，耳鸣很大，但排尿困难你会发现它很难，因为它很小。

谢谢：）

python visualization word-cloud pandas

6
推荐指数

1
解决办法

1578
查看次数

如何重塑文本数据以适合于Keras中的LSTM模型

更新1：

我指的代码就是本书中的代码，您可以在这里找到。

唯一的事情是我不想embed_size在解码器部分中拥有。这就是为什么我认为根本不需要具有嵌入层的原因，因为如果我放入嵌入层，则需要embed_size在解码器部分中放置（如果Im错误，请更正我）。

总的来说，我试图在不使用嵌入层的情况下采用相同的代码，因为我需要vocab_size在解码器部分中具有。

我认为评论中提供的建议可能是正确的（using one_hot_encoding）我如何面对此错误：

当我做的时候one_hot_encoding：

tf.keras.backend.one_hot(indices=sent_wids, classes=vocab_size)

Run Code Online (Sandbox Code Playgroud)

我收到此错误：

in check_num_samples you should specify the + steps_name + argument ValueError: If your data is in the form of symbolic tensors, you should specify the steps_per_epoch argument (instead of the batch_size argument, because symbolic tensors are expected to produce batches of input data)

我准备数据的方式是这样的：

sent_lensis的形状，(87716, 200)我想以可以将其输入LSTM的方式重塑形状。这里200代表sequence_lenght，87716是我拥有的样本数。

以下是代码LSTM …

python autoencoder lstm keras tensorflow

6
推荐指数

1
解决办法

274
查看次数

如何在没有分组依据的情况下连接pandas中数据帧的列的所有行

我有一个像这样的数据框

已经服用 12 天，同时服用 60 毫克百忧解 4 年多了。索引句

  1   I feel the best I have felt in years.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
  2   "I have taken for over 7 years.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
  3   I slept 2 hours".                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
  4   IT SAVED MY LIFE                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
  5   IT SAVED MY LIFE"

Run Code Online (Sandbox Code Playgroud)

然后我想将它们连接在一个数组中。问题是，可能有一些句子重复，但我仍然想保留所有句子，所以结果将是：

["I feel the best I have felt in years", "I have taken for over 7 years." , "I slept 2 hours." , "IT SAVED MY LIFE" , "IT SAVED MY LIFE"]

Run Code Online (Sandbox Code Playgroud)

我已经尝试过这个链接和这个都在r中。

我也尝试过这种方法：

dfsent.groupby(['sentences']).apply(','.join) …

Run Code Online (Sandbox Code Playgroud)

python concatenation dataframe pandas

5
推荐指数

1
解决办法

2万
查看次数

我如何基于张量流中的条件获得最高的最小张量值

我有一个这样的张量：

sim_topics = [[0.65 0.   0.   0.   0.42  0.   0.   0.51 0.   0.34 0.]
              [0.   0.51 0.   0.  0.52  0.   0.   0.   0.53 0.42 0.]
              [0.    0.32 0.  0.50 0.34  0.   0.   0.39 0.32 0.52 0.]
              [0.    0.23 0.37 0.   0.    0.37 0.37 0.   0.47 0.39 0.3 ]]

Run Code Online (Sandbox Code Playgroud)

和这样的一个布尔张量：

bool_t = [False  True  True  True]

Run Code Online (Sandbox Code Playgroud)

我想以某种方式sim_topics基于bool标志选择一部分，bool_t它仅选择top k smallest每行的值（如果该行为true，则不保留原样）。

所以预期的输出将是这样的：（在这里k=2）

[[0.65 0.   0.    0.   0.42  0.   0.   0.51 0.   0.34 0.]
 [0.   0.51 …

Run Code Online (Sandbox Code Playgroud)

python-3.x deep-learning tensorflow

5
推荐指数

1
解决办法

92
查看次数

如何将列名放入pandas中具有特定条件的数据框单元格中

我有这样的数据帧:

         ADR     WD      EF    INF    SSI   DI
0        1.0    NaN     NaN    NaN    NaN  NaN
1        NaN    NaN     1      1      NaN  NaN
2        NaN    NaN     NaN    NaN    1    NaN
3        NaN    1       1      1      NaN  NaN
4        NaN    1.0     NaN    NaN    NaN  NaN

Run Code Online (Sandbox Code Playgroud)

我希望结果是这样的:

[["ADR"],["EF","INF"],["SSI"],["WD","EF","INF"],["WD"]]

Run Code Online (Sandbox Code Playgroud)

如您所见,如果列中有列的名称已被替换1.所有都被放在另一个阵列中.

我看过这个帖子链接,但它没有帮助我,因为名称已经静态改变.

谢谢:)

python data-analysis dataframe pandas

4
推荐指数

2
解决办法

53
查看次数

如何在matplotlib pandas中将两个文件的两个条形图组合在一张图中

我有两个具有相同列但内容不同的数据框。我已经策划了dffinal data frame。现在我想dffinal_no在同一张图上绘制另一个数据框以进行比较。

例如，一个条形图blue colour，以及带有另一种颜色的相同条形图differentiating in y-axis。

这是我绘制第一个数据框的代码的一部分。

dffinal = df[['6month','final-formula','numPatients6month']].drop_duplicates().sort_values(['6month'])

ax=dffinal.plot(kind='bar',x='6month', y='final-formula')
import matplotlib.pyplot as plt
ax2 = ax.twinx()
dffinal.plot(ax=ax2,x='6month', y='numPatients6month')
plt.show()

Run Code Online (Sandbox Code Playgroud)

现在想象我有另一个dffinal_no具有相同列的数据框，我如何在同一个图表中绘制它？

这是我绘制的第一个图表，我希望该图表上的另一个条形图使用另一种颜色。

所以@Mohamed Thasin ah 的答案在某种程度上是我想要的，除了正确的 y 轴不正确。

我想both data frame基于(6month, final-formula)但右边y-axis只是显示患者数量，作为用户的信息。

实际上，我DO NOT希望第一个 df 基于，final-fomula第二个 df 基于NumberPatients.

Update1 jast 作为参考它看起来像我的数据框

dffinal = df[['6month','final-formula','numPatients6month']].drop_duplicates().sort_values(['6month'])
nocidffinal = nocidf[['6month','final-formula','numPatients6month']].drop_duplicates().sort_values(['6month'])
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = …

Run Code Online (Sandbox Code Playgroud)

group-by matplotlib data-analysis dataframe pandas

2
推荐指数

1
解决办法

6091
查看次数

从数据框的不同列中显示堆积条形图的每种颜色的值

这是我的数据框：

          6month  final-formula     Question Text  numPatients6month
286231       1       0.031730        CI_FINANCE                977
286270       1       0.147390            CI_MJO                977
286276       1       0.106448  CI_CONCENTRATING                977
286700       2       0.010323            CI_MJO                775
286323       2       0.018065        CI_FINANCE                775
286401       2       0.034839  CI_CONCENTRATING                775
286228       3       0.032020  CI_CONCENTRATING                812
286238       3       0.061576            CI_MJO                812
286292       3       0.008621        CI_FINANCE                812
286690       4       0.008097            CI_MJO                741
286342       4       0.005398        CI_FINANCE                741
286430       4       0.060729  CI_CONCENTRATING                741
286481       5       0.009840        CI_FINANCE                813
287441       5       0.008610            CI_MJO                813
286362       5       0.041820  CI_CONCENTRATING …

Run Code Online (Sandbox Code Playgroud)

python matplotlib data-analysis dataframe pandas

1
推荐指数

2
解决办法

3689
查看次数

标签统计

data-analysis ×3

autoencoder ×1

concatenation ×1

cross-validation ×1

deep-learning ×1

intellij-idea ×1

lstm ×1

sbt ×1

scikit-learn ×1

visualization ×1