Posts by fig*_*ggy

Nested JSON to pandas DataFrame with a specific format

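I need to load the contents of a JSON file into a pandas DataFrame in a specific format, so that I can run pandasql to transform the data and then pass it through a scoring model.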

file = C:\scoring_model\json.js (the contents of 'file' are shown below)

{
"response":{
  "version":"1.1",
  "token":"dsfgf",
   "body":{
     "customer":{
         "customer_id":"1234567",
         "verified":"true"
       },
     "contact":{
         "email":"mr@abc.com",
         "mobile_number":"0123456789"
      },
     "personal":{
         "gender":"m",
         "title":"Dr.",
         "last_name":"Muster",
         "first_name":"Max",
         "family_status":"single",
         "dob":"1985-12-23"
     }
   }
 }
}

I need the DataFrame to look like this (all values on a single row, of course; I have tried to format this question as well as I can):

version | token | customer_id | verified | email      | mobile_number | gender |
1.1     | dsfgf | 1234567     | true     | mr@abc.com | 0123456789    | m      |

title | last_name | first_name |family_status | dob
Dr.   | Muster    | Max        | single       | 23.12.1985

I have looked at all the other questions on this topic and tried various ways of loading the JSON file into pandas:

`with open(r'C:\scoring_model\json.js', 'r') …
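One possible way to get the single-row layout described above is sketched here. It is not the asker's code (which is truncated) and not necessarily the accepted answer. It assumes pandas 1.0 or later for `pd.json_normalize`, and it embeds the sample JSON as a string (with the trailing comma after `dob` removed and the outer brace closed so that it parses) instead of reading `C:\scoring_model\json.js`. The variable names `raw`, `data`, and `df` are illustrative.

```python
import json

import pandas as pd

# Sample payload from the question, made syntactically valid.
raw = """
{
  "response": {
    "version": "1.1",
    "token": "dsfgf",
    "body": {
      "customer": {"customer_id": "1234567", "verified": "true"},
      "contact": {"email": "mr@abc.com", "mobile_number": "0123456789"},
      "personal": {
        "gender": "m",
        "title": "Dr.",
        "last_name": "Muster",
        "first_name": "Max",
        "family_status": "single",
        "dob": "1985-12-23"
      }
    }
  }
}
"""

data = json.loads(raw)

# Flatten everything under "response" into one row; nested keys become
# dotted column names such as "body.customer.customer_id".
df = pd.json_normalize(data["response"])

# Keep only the last path component so the columns match the desired header.
df.columns = [col.split(".")[-1] for col in df.columns]

# Reformat the date of birth from YYYY-MM-DD to DD.MM.YYYY.
df["dob"] = pd.to_datetime(df["dob"]).dt.strftime("%d.%m.%Y")

print(df.to_string(index=False))
```

Stripping the path prefix only works while the leaf key names are unique across sections, as they are in this sample. If two sections shared a key name, the dotted names would have to be kept or renamed explicitly.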

python format json nested pandas

31
votes
1
answer
30k
views

Python feature selection in a pipeline: how to determine the feature names?

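I use a pipeline and grid_search to select the best parameters, and then use those parameters to fit the best pipeline ('best_pipe'). However, because the feature_selection step (SelectKBest) is inside the pipeline, no fit has been applied to SelectKBest itself.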

I need to know the names of the 'k' selected features. Any ideas on how to retrieve them? Thank you in advance.

from sklearn import (cross_validation, feature_selection, pipeline,
                     preprocessing, linear_model, grid_search)
folds = 5
split = cross_validation.StratifiedKFold(target, n_folds=folds, shuffle = False, random_state = 0)

scores = []
for k, (train, test) in enumerate(split):

    X_train, X_test, y_train, y_test = X.ix[train], X.ix[test], y.ix[train], y.ix[test]

    top_feat = feature_selection.SelectKBest()

    pipe = pipeline.Pipeline([('scaler', preprocessing.StandardScaler()),
                                 ('feat', top_feat),
                                 ('clf', linear_model.LogisticRegression())])

    K = [40, 60, 80, 100]
    C = [1.0, 0.1, 0.01, 0.001, 0.0001, 0.00001]
    penalty = ['l1', 'l2']

    param_grid = [{'feat__k': K,
                  'clf__C': C,
                  'clf__penalty': penalty}]

    scoring …
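A minimal sketch of one way to recover the selected feature names follows. It is an illustration, not the asker's full code, which is truncated above. It assumes a recent scikit-learn, where `sklearn.model_selection` replaces the `cross_validation` and `grid_search` modules used in the question (both were removed in scikit-learn 0.20). The synthetic data from `make_classification`, the column names `f0` to `f19`, the smaller parameter grid, and the default scoring are all assumptions made for the example.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the asker's X and target (assumption).
X_arr, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(X_arr.shape[1])])

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("feat", SelectKBest()),
    ("clf", LogisticRegression()),
])

# Reduced grid for illustration; the step names match the question.
param_grid = {"feat__k": [5, 10], "clf__C": [1.0, 0.1]}

search = GridSearchCV(pipe, param_grid, cv=StratifiedKFold(n_splits=5))
search.fit(X, y)

# With the default refit=True, best_estimator_ is the whole pipeline
# refitted on all the data, so its SelectKBest step is already fitted.
best_pipe = search.best_estimator_
mask = best_pipe.named_steps["feat"].get_support()

# The boolean mask lines up with the original column order.
selected = X.columns[mask].tolist()
print(selected)
```

The key point is that the fitted selector is the one inside `best_estimator_`, reached through `named_steps`. The `top_feat` object passed into the pipeline is cloned during the search and stays unfitted, which would explain why querying it directly does not work.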

pipeline feature-selection scikit-learn

6
votes
3
answers
7472
views