Posts by lap*_*nio

Pandas concat ValueError: buffer dtype mismatch, expected 'Python object' but got 'long long'

I am trying to analyze the Gisette dataset from the feature-selection challenge.

When I try to concatenate the training DataFrame with the label Series, following the pandas examples, I get:

ValueError: buffer dtype mismatch, expected 'Python object' but got 'long long'

Code:

import string
import itertools

import pandas as pd

# Generate the 500 two-letter column names 'AA' through 'TF'
col_names = [a + b for a, b in itertools.product(string.ascii_uppercase, repeat=2)][:500]

trainData = pd.read_table(filepath_or_buffer='GISETTE/gisette_train.data',
                          delim_whitespace=True,
                          header=None,
                          names=col_names)
# print 'finished with train data'
trainLabel = pd.read_table(filepath_or_buffer='GISETTE/gisette_train.labels'
                           ,squeeze=True
                           ,names=['label']
                           ,delim_whitespace=True
                           ,header=None)
trainData.info()

Output:

    <class 'pandas.core.frame.DataFrame'>
    MultiIndex: 6000 entries   
    Columns: 500 entries, AA to TF   
    dtypes: int64(500)None



trainLabel.describe()

Output:

    count    6000.000000
    mean        0.000000
    std         1.000083
    min        -1.000000
    25%        -1.000000
    50%         0.000000
    75%         1.000000
    max         1.000000
    dtype: float64

readyToTrain = pd.concat([trainData, trainLabel], axis=1)

Full stack trace (truncated):

   File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 717, in concat  
     verify_integrity=verify_integrity) …
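This error usually means the two objects being concatenated have incompatible index types. The info() output above shows trainData ended up with a MultiIndex (likely from trailing whitespace in the data file creating a phantom column), while trainLabel has a plain integer index. One possible fix (a sketch with toy stand-ins, since the GISETTE files aren't available here) is to reset both indexes so concat aligns rows positionally:

```python
import pandas as pd

# Toy stand-ins for trainData / trainLabel (the real GISETTE files aren't loaded here)
trainData = pd.DataFrame({'AA': [1, 2, 3], 'AB': [4, 5, 6]})
trainLabel = pd.Series([-1, 1, -1], name='label')

# Reset to a plain RangeIndex on both sides so concat pairs rows by position
readyToTrain = pd.concat([trainData.reset_index(drop=True),
                          trainLabel.reset_index(drop=True)], axis=1)
```

Alternatively, passing index_col=False to read_table can keep pandas from promoting stray columns into the index in the first place.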

python-2.7 pandas

8 votes · 1 answer · 10k views

How to map variable names to features after a Pipeline

I modified the OneHotEncoder example to actually train a LogisticRegression. My question is: how do I map the resulting weights back to the categorical variables?

def oneHotEncoderExample(sqlContext: SQLContext): Unit = {

val df = sqlContext.createDataFrame(Seq(
    (0, "a", 1.0),
    (1, "b", 1.0),
    (2, "c", 0.0),
    (3, "d", 1.0),
    (4, "e", 1.0),
    (5, "f", 0.0)
)).toDF("id", "category", "label")
df.show()

val indexer = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")
  .fit(df)
val indexed = indexer.transform(df)
indexed.select("id", "categoryIndex").show()

val encoder = new OneHotEncoder()
  .setInputCol("categoryIndex")
  .setOutputCol("features")
val encoded = encoder.transform(indexed)
encoded.select("id", "features").show()


val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.01)

val pipeline = new Pipeline()
  .setStages(Array(indexer, encoder, lr))

// Fit the pipeline to …
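One possible approach (my assumption, not from the post): the fitted StringIndexerModel exposes the category order through its labels array, and OneHotEncoder with the default dropLast=true emits one feature column per label except the last, so weight i belongs to labels(i). The bookkeeping, sketched in plain Python with hypothetical stand-in values:

```python
# Hypothetical stand-ins for indexerModel.labels and lrModel.coefficients
labels = ['a', 'b', 'c', 'd', 'e', 'f']      # categories, most frequent first
coefficients = [0.9, -0.2, 0.4, 0.1, -0.7]   # one weight per encoded column

# With dropLast=True the final category is the implicit baseline
weight_by_category = dict(zip(labels[:-1], coefficients))
```

In Scala the same zip should work on the indexer model's labels and the logistic model's coefficients.toArray, with the model pulled out of the fitted pipeline's stages.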

scala apache-spark apache-spark-ml apache-spark-mllib

8 votes · 1 answer · 2346 views

PCA analysis in PySpark

Looking at http://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html, the examples only seem to cover Java and Scala.

Does Spark MLlib support PCA analysis in Python? If so, please point me to an example. If not, how can I combine Spark with scikit-learn?
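The Python API lagged here in early Spark 1.x; newer releases expose pyspark.ml.feature.PCA directly. For the scikit-learn route, one workaround (my assumption, not from the post) is to collect a manageable sample to the driver and run PCA locally; the computation reduces to an SVD of the centered data, sketched here with numpy alone:

```python
import numpy as np

def pca_scores(X, k):
    """Project rows of X onto the top-k principal components via SVD."""
    X_centered = X - X.mean(axis=0)      # center each feature
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered.dot(Vt[:k].T)      # coordinates in the k-dim subspace

X = np.array([[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]])
scores = pca_scores(X, 1)   # first component aligns with the x-axis here
```

sklearn.decomposition.PCA does the same centering-plus-SVD under the hood, so swapping it in on the collected sample is straightforward.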

python pca apache-spark apache-spark-ml apache-spark-mllib

7 votes · 1 answer · 10k views

Spark: Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space

Using spark-1.6.0-bin-hadoop2.6. According to http://spark.apache.org/docs/latest/configuration.html,

I can set the heap size with spark.executor.memory, which corresponds to --executor-memory in spark-submit.

When I run my job, the executor memory never exceeds the allocated amount, but I still get the error:

java.lang.OutOfMemoryError: Java heap space

The job I am submitting:

./bin/spark-submit \
  --class edu.gatech.cse8803.main.Main \
  --master spark://ec2-52-23-155-99.compute-1.amazonaws.com:6066 \
  --deploy-mode cluster \
  --executor-memory 27G \
  --total-executor-cores 100 \
  /root/final_project/phenotyping_w_anchors_161-assembly-1.0.jar \
  1000

I am using 2 m4.2xlarge instances (32.0 GB, 8 cores each).
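The dag-scheduler-event-loop thread runs in the driver JVM, not in the executors, so --executor-memory does not govern it. A hedged guess at a fix: raise the driver heap with --driver-memory (spark.driver.memory), which in cluster deploy mode sizes the JVM launched for the driver on a worker host. The 8G below is illustrative, not a recommendation:

```shell
./bin/spark-submit \
  --class edu.gatech.cse8803.main.Main \
  --master spark://ec2-52-23-155-99.compute-1.amazonaws.com:6066 \
  --deploy-mode cluster \
  --driver-memory 8G \
  --executor-memory 27G \
  --total-executor-cores 100 \
  /root/final_project/phenotyping_w_anchors_161-assembly-1.0.jar \
  1000
```

Note that with 27G executors on 32 GB machines there is little headroom left for a co-located driver, so the two values may need to be balanced.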

amazon-ec2 amazon-web-services apache-spark

6 votes · 1 answer · 5471 views

An efficient way to set multiple indicator variables per row?

Given an "empty" indicator data frame:

Index    Ind_A    Ind_B
  1        0        0
  2        0        0
  3        0        0
  4        0        0

and a data frame of values:

Index    Indicators
  1         Ind_A
  3         Ind_A
  3         Ind_B
  4         Ind_A

I would like to end up with:

Index    Ind_A    Ind_B
  1        1        0
  2        0        0
  3        1        1
  4        1        0

Is there a way to do this without a for loop?
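Since the question is tagged r, the idiomatic answer there is a cross-tabulation (e.g. table(values$Index, values$Indicators) or xtabs) rather than a loop. The same idea, sketched in pandas purely for illustration:

```python
import pandas as pd

values = pd.DataFrame({'Index': [1, 3, 3, 4],
                       'Indicators': ['Ind_A', 'Ind_A', 'Ind_B', 'Ind_A']})

# Cross-tabulate, then re-align to the full index so row 2 stays all zero
wide = pd.crosstab(values['Index'], values['Indicators'])
wide = wide.reindex([1, 2, 3, 4], fill_value=0)
```

The reindex step matters because the cross-tabulation only emits rows for indexes that actually appear in the value frame.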

r indicator dataframe

5 votes · 1 answer · 539 views

How do I convert a list of strings to a data frame?

I would like to convert a list of character vectors into a data frame. Given the structure:

lst <- list(NULL, "PSYC", c("PSYC", "PHIL"), "PHIL")

I would like to produce a data frame:

      Index     major_cd
       1            NULL
       2            PSYC
       3            PSYC
       3            PHIL
       4            PHIL

Note how the 3rd item in the list becomes two rows of the data frame.
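In R this is commonly done with stack() or tidyr::unnest() on a named list. As a language-neutral sketch, the same one-row-per-element expansion in pandas:

```python
import pandas as pd

lst = [None, 'PSYC', ['PSYC', 'PHIL'], 'PHIL']

# One row per element; the original 1-based position becomes the Index column
s = pd.Series(lst, index=range(1, len(lst) + 1), name='major_cd')
df = s.explode().rename_axis('Index').reset_index()
```

The explode step is what turns the two-element third item into two rows that share Index 3.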

r list dataframe

4 votes · 1 answer · 665 views

How do I retrieve the record with the minimum value in Spark?

Say I have an RDD of (String, Date, Int) tuples:

[("sam", 02-25-2016, 2), ("sam",02-14-2016, 4), ("pam",03-16-2016, 1), ("pam",02-16-2016, 5)]

I want to turn it into a list like:

[("sam", 02-14-2016, 4), ("pam",02-16-2016, 5)]

where each value is the record with the minimum date for its key. What is the best way to do this?
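In Spark the usual pattern is keyBy on the name followed by reduceByKey, keeping whichever record has the earlier date. The merge logic, sketched in plain Python (not Spark code, with the dates made concrete as datetime.date values):

```python
from datetime import date
from functools import reduce

records = [('sam', date(2016, 2, 25), 2), ('sam', date(2016, 2, 14), 4),
           ('pam', date(2016, 3, 16), 1), ('pam', date(2016, 2, 16), 5)]

def keep_earlier(acc, rec):
    """Same shape as a reduceByKey merge: keep the record with the smaller date."""
    name, d, _ = rec
    if name not in acc or d < acc[name][1]:
        acc[name] = rec
    return acc

earliest = reduce(keep_earlier, records, {})
result = list(earliest.values())
```

Because the merge function is associative and commutative, the equivalent reduceByKey runs correctly in parallel across partitions.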

scala apache-spark

3 votes · 1 answer · 4193 views

How do I invalidate a reactive observer from code?

Given pseudocode like:

dateRange <- reactive({
    input$select_dates #action button

    save_selected_date_range()
    isolate(input$dateRange)
})


customerId <- reactive({
    #check if customer has saved date range if so trigger
    saved_info <- saved_preferences(input$customerId)
    if(nrow(saved_info) > 0) {
      flog.info(saved_info)
      updateDateRangeInput(session, "dateRange", start = saved_info$start, end = saved_info$end)
    }

    input$customerId
})

Scenario:

Inputs: a date-range picker and a customer selector. The date range is registered when the action button is pressed.

Desired behavior: when a customer is selected, load their saved date range (if one is available).

Question: how can I trigger input$select_dates as if the action button had been pressed? Something like invalidateLater without the timer would be great, or a manual way to mark input$select_dates as invalidated.

shiny

2 votes · 1 answer · 1894 views