Reading a CSV with JSON features

phy*_*ker 8 python csv json pandas

I am trying to read a large CSV that includes JSON features (located here). The first rows, say the first 100, look like this:

Time,location,labelA,labelB
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},nan,nan

I followed this question to parse the location column. The solution basically defines a helper:

def CustomParser(data):
    import json
    j1 = json.loads(data)
    return j1

and then

df=pd.read_csv('data.csv', nrows=100,converters={'location':CustomParser},header=0)

I get the following error, which is related to the JSON format:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

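Q1: How can I parse the location feature into new columns?

Q2 (general case): For nrows>100, the last features (labelA and labelB) are also in JSON format, with different keys and values. How can I read the whole CSV while parsing every feature that contains JSON (even if only in some rows)?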

test100v1.csv

Zeit,device,Text,Typ,Position,Data,Data1,Data2
2019-09-10T12:13:24.000Z,CO 5052994,Lifesign,cgmon_Lifesign,{"lng":12.9975201,"alt":413.0,"time":"2019-09-10T12:09:58Z","error":7.0,"lat":47.8258582},N/A,N/A,N/A
2019-09-10T12:13:23.000Z,CO 5050450,Lifesign,cgmon_Lifesign,{"lng":14.3195367,"alt":260.0,"time":"2019-09-10T12:12:37Z","error":10.0,"lat":48.2695571},N/A,N/A,N/A
2019-09-10T12:13:21.000Z,CO 5050903,Location updated,c8y_LocationUpdate,{"lng":15.2678846,"alt":494.0,"time":"2019-09-10T12:13:21Z","error":11.0,"lat":48.7477466},N/A,N/A,N/A
2019-09-10T12:13:20.000Z,CO 5051466,Location updated,c8y_LocationUpdate,{"lng":17.64815,"alt":106.0,"time":"2019-09-10T12:13:20Z","error":3.0,"lat":47.6851036},N/A,N/A,N/A
2019-09-10T12:13:20.000Z,CO 5050569,Location updated,c8y_LocationUpdate,{"lng":14.0582286,"alt":286.0,"time":"2019-09-10T12:13:20Z","error":14.0,"lat":48.1808019},N/A,N/A,N/A
2019-09-10T12:13:18.000Z,CO 5050666,Location updated,c8y_LocationUpdate,{"lng":14.5788998,"alt":25.0,"time":"2019-09-10T12:13:18Z","error":12.0,"lat":53.4233772},N/A,N/A,N/A
2019-09-10T12:13:17.000Z,CO 5051113,Location updated,c8y_LocationUpdate,{"lng":14.325237,"alt":254.0,"time":"2019-09-10T12:13:17Z","error":13.0,"lat":48.2600698},N/A,N/A,N/A
2019-09-10T12:13:10.000Z,CO 5050666,Lifesign,cgmon_Lifesign,{"lng":14.5788998,"alt":25.0,"time":"2019-09-10T12:13:18Z","error":12.0,"lat":53.4233772},N/A,N/A,N/A
2019-09-10T12:13:07.000Z,CO 5051887,Location updated,c8y_LocationUpdate,{"lng":13.8064589,"alt":510.0,"time":"2019-09-10T12:13:07Z","error":10.0,"lat":46.5672814},N/A,N/A,N/A
2019-09-10T12:12:58.000Z,CO 5051131,Lifesign,cgmon_Lifesign,{"lng":11.4933341,"alt":581.0,"time":"2019-09-10T12:08:43Z","error":13.0,"lat":47.2738262},N/A,N/A,N/A
2019-09-10T12:12:55.000Z,CO 5051696,Lifesign,cgmon_Lifesign,{"lng":14.3200391,"alt":249.0,"time":"2019-09-10T12:04:38Z","error":10.0,"lat":48.26912},N/A,N/A,N/A
2019-09-10T12:12:48.000Z,CO 5051326,Lifesign,cgmon_Lifesign,{"lng":9.7326865,"alt":403.0,"time":"2019-09-10T12:04:34Z","error":10.0,"lat":47.4595067},N/A,N/A,N/A
2019-09-10T12:12:47.000Z,CO 5052218,Lifesign,cgmon_Lifesign,{"lng":14.3197285,"alt":262.0,"time":"2019-09-10T12:10:11Z","error":9.0,"lat":48.2688562},N/A,N/A,N/A
2019-09-10T12:12:45.000Z,CO 5050405,Lifesign,cgmon_Lifesign,{"lng":14.2755301,"alt":253.0,"time":"2019-09-08T12:13:37Z","error":8.0,"lat":48.2468603},N/A,N/A,N/A
2019-09-10T12:12:44.000Z,CO 5050706,Lifesign,cgmon_Lifesign,{"lng":15.0519029,"alt":124.0,"time":"2019-09-10T12:07:07Z","error":13.0,"lat":59.0569164},N/A,N/A,N/A
2019-09-10T12:12:42.000Z,CO 5050903,Lifesign,cgmon_Lifesign,{"lng":15.2678846,"alt":494.0,"time":"2019-09-10T12:13:21Z","error":11.0,"lat":48.7477466},N/A,N/A,N/A
2019-09-10T12:12:38.000Z,CO 5051303,Lifesign,cgmon_Lifesign,{"lng":21.9561564,"alt":244.0,"time":"2019-09-10T09:04:08Z","error":11.0,"lat":42.9978861},N/A,N/A,N/A
2019-09-10T12:12:37.000Z,CO 5051558,Location updated,c8y_LocationUpdate,{"lng":13.806765,"alt":514.0,"time":"2019-09-10T12:12:37Z","error":6.0,"lat":46.5672868},N/A,N/A,N/A
2019-09-10T12:12:37.000Z,CO 5050450,Location updated,c8y_LocationUpdate,{"lng":14.3195367,"alt":260.0,"time":"2019-09-10T12:12:37Z","error":10.0,"lat":48.2695571},N/A,N/A,N/A
2019-09-10T12:12:37.000Z,CO 5050450,Location updated,c8y_LocationUpdate,{"lng":14.3195367,"alt":260.0,"time":"2019-09-10T12:12:37Z","error":10.0,"lat":48.2695571},N/A,N/A,N/A
2019-09-10T12:12:26.000Z,CO 5050408,Lifesign,cgmon_Lifesign,{"lng":14.2761472,"alt":280.0,"time":"2019-09-08T12:13:28Z","error":11.0,"lat":48.246868},N/A,N/A,N/A
2019-09-10T12:12:25.000Z,CO 5051418,Location updated,c8y_LocationUpdate,{"lng":15.5343521,"alt":550.0,"time":"2019-09-10T12:12:25Z","error":11.0,"lat":48.7483843},N/A,N/A,N/A
2019-09-10T12:12:24.000Z,CO 5050556,Location updated,c8y_LocationUpdate,{"lng":13.0783658,"alt":435.0,"time":"2019-09-10T12:12:24Z","error":6.0,"lat":47.7692905},N/A,N/A,N/A
2019-09-10T12:12:22.000Z,CO 5052730,Lifesign,cgmon_Lifesign,{"lng":14.3180816,"alt":251.0,"time":"2019-09-10T12:07:29Z","error":14.0,"lat":48.2771342},N/A,N/A,N/A
2019-09-10T12:12:11.000Z,CO 5051654,Location updated,c8y_LocationUpdate,{"lng":15.3298821,"alt":404.0,"time":"2019-09-10T12:12:11Z","error":13.0,"lat":47.1319909},N/A,N/A,N/A
2019-09-10T12:12:01.000Z,CO 5051400,Location updated,c8y_LocationUpdate,{"lng":13.4580769,"alt":306.0,"time":"2019-09-10T12:12:01Z","error":6.0,"lat":48.4494078},N/A,N/A,N/A
2019-09-10T12:11:25.000Z,CO 5050495,Location updated,c8y_LocationUpdate,{"lng":13.3380207,"alt":423.0,"time":"2019-09-10T12:11:25Z","error":14.0,"lat":48.6001935},N/A,N/A,N/A
2019-09-10T12:11:15.000Z,CO 5052483,Motion started,c8y_MotionDetected,{"lng":12.0622763,"alt":511.0,"time":"2019-09-10T12:11:04Z","error":5.0,"lat":47.4938857},N/A,N/A,N/A
2019-09-10T12:11:13.000Z,CO 5052999,Location updated,c8y_LocationUpdate,{"lng":13.06406,"alt":425.0,"time":"2019-09-10T12:11:13Z","error":5.0,"lat":47.8167399},N/A,N/A,N/A
2019-09-10T12:11:04.000Z,CO 5052483,Location updated,c8y_LocationUpdate,{"lng":12.0622763,"alt":511.0,"time":"2019-09-10T12:11:04Z","error":5.0,"lat":47.4938857},N/A,N/A,N/A
2019-09-10T12:11:01.000Z,CO 5051844,Location updated,c8y_LocationUpdate,{"lng":11.5022149,"alt":556.0,"time":"2019-09-10T12:11:01Z","error":6.0,"lat":47.2765674},N/A,N/A,N/A
2019-09-10T12:11:01.000Z,CO 5051920,Lifesign,cgmon_Lifesign,{"lng":15.0575633,"alt":619.0,"time":"2019-09-10T12:10:44Z","error":13.0,"lat":47.3821983},N/A,N/A,N/A
2019-09-10T12:10:59.000Z,CO 5051679,Location updated,c8y_LocationUpdate,{"lng":15.0565198,"alt":599.0,"time":"2019-09-10T12:10:59Z","error":14.0,"lat":47.3821768},N/A,N/A,N/A
2019-09-10T12:10:55.000Z,CO 5050630,Location updated,c8y_LocationUpdate,{"lng":15.0587754,"alt":596.0,"time":"2019-09-10T12:10:55Z","error":14.0,"lat":47.3820239},N/A,N/A,N/A
2019-09-10T12:10:52.000Z,CO 5051844,Lifesign,cgmon_Lifesign,{"lng":11.5022149,"alt":556.0,"time":"2019-09-10T12:11:01Z","error":6.0,"lat":47.2765674},N/A,N/A,N/A
2019-09-10T12:10:51.000Z,CO 5052999,Lifesign,cgmon_Lifesign,{"lng":13.06406,"alt":425.0,"time":"2019-09-10T12:11:13Z","error":5.0,"lat":47.8167399},N/A,N/A,N/A
2019-09-10T12:10:50.000Z,CO 5051921,Lifesign,cgmon_Lifesign,{"lng":15.0581282,"alt":606.0,"time":"2019-09-10T12:10:36Z","error":6.0,"lat":47.3817808},N/A,N/A,N/A
2019-09-10T12:10:49.000Z,CO 5051679,Lifesign,cgmon_Lifesign,{"lng":15.0565198,"alt":599.0,"time":"2019-09-10T12:10:59Z","error":14.0,"lat":47.3821768},N/A,N/A,N/A
2019-09-10T12:10:47.000Z,CO 5050630,Lifesign,cgmon_Lifesign,{"lng":15.0587754,"alt":596.0,"time":"2019-09-10T12:10:55Z","error":14.0,"lat":47.3820239},N/A,N/A,N/A
2019-09-10T12:10:44.000Z,CO 5051920,Location updated,c8y_LocationUpdate,{"lng":15.0575633,"alt":619.0,"time":"2019-09-10T12:10:44Z","error":13.0,"lat":47.3821983},N/A,N/A,N/A
2019-09-10T12:10:41.000Z,CO 5051088,Location updated,c8y_LocationUpdate,{"lng":16.6432683,"alt":161.0,"time":"2019-09-10T12:10:41Z","error":8.0,"lat":48.3200659},N/A,N/A,N/A
2019-09-10T12:10:41.000Z,CO 5050020,Location updated,c8y_LocationUpdate,{"lng":15.9287275,"alt":193.0,"time":"2019-09-10T12:10:41Z","error":8.0,"lat":48.3246395},N/A,N/A,N/A
2019-09-10T12:10:40.000Z,CO 5052681,Location updated,c8y_LocationUpdate,{"lng":16.4388427,"alt":173.0,"time":"2019-09-10T12:10:40Z","error":8.0,"lat":48.1359584},N/A,N/A,N/A
2019-09-10T12:10:36.000Z,CO 5051921,Location updated,c8y_LocationUpdate,{"lng":15.0581282,"alt":606.0,"time":"2019-09-10T12:10:36Z","error":6.0,"lat":47.3817808},N/A,N/A,N/A
2019-09-10T12:10:35.000Z,CO 5051406,Location updated,c8y_LocationUpdate,{"lng":19.0824957,"alt":108.0,"time":"2019-09-10T12:10:35Z","error":7.0,"lat":47.4680908},N/A,N/A,N/A
2019-09-10T12:10:33.000Z,CO 5052676,Location updated,c8y_LocationUpdate,{"lng":16.4368017,"alt":166.0,"time":"2019-09-10T12:10:33Z","error":7.0,"lat":48.1376442},N/A,N/A,N/A
2019-09-10T12:10:33.000Z,CO 5051767,Location updated,c8y_LocationUpdate,{"lng":14.3252332,"alt":266.0,"time":"2019-09-10T12:10:33Z","error":6.0,"lat":48.2598268},N/A,N/A,N/A
2019-09-10T12:10:32.000Z,CO 5050710,Location updated,c8y_LocationUpdate,{"lng":16.4767327,"alt":164.0,"time":"2019-09-10T12:10:32Z","error":5.0,"lat":48.2780685},N/A,N/A,N/A
2019-09-10T12:10:32.000Z,CO 5050565,Location updated,c8y_LocationUpdate,{"lng":15.0918659,"alt":544.0,"time":"2019-09-10T12:10:32Z","error":12.0,"lat":47.3648989},N/A,N/A,N/A
2019-09-10T12:10:31.000Z,CO 5051820,Location updated,c8y_LocationUpdate,{"lng":13.3525861,"alt":296.0,"time":"2019-09-10T12:10:31Z","error":12.0,"lat":48.5992175},N/A,N/A,N/A
2019-09-10T12:10:25.000Z,CO 5051464,Location updated,c8y_LocationUpdate,{"lng":14.3240624,"alt":271.0,"time":"2019-09-10T12:10:25Z","error":12.0,"lat":48.2607067},N/A,N/A,N/A
2019-09-10T12:10:22.000Z,CO 5050655,Lifesign,cgmon_Lifesign,{"lng":16.4315322,"alt":190.0,"time":"2019-09-10T12:01:19Z","error":13.0,"lat":48.1431609},N/A,N/A,N/A
2019-09-10T12:10:20.000Z,CO 5050581,Location updated,c8y_LocationUpdate,{"lng":13.045159,"alt":422.0,"time":"2019-09-10T12:10:20Z","error":11.0,"lat":47.8110246},N/A,N/A,N/A
2019-09-10T12:10:18.000Z,CO 5051496,Location updated,c8y_LocationUpdate,{"lng":14.3246911,"alt":271.0,"time":"2019-09-10T12:10:18Z","error":7.0,"lat":48.2602569},N/A,N/A,N/A
2019-09-10T12:10:17.000Z,CO 5051111,Location updated,c8y_LocationUpdate,{"lng":12.9975553,"alt":398.0,"time":"2019-09-10T12:10:17Z","error":11.0,"lat":47.8261238},N/A,N/A,N/A
2019-09-10T12:10:11.000Z,CO 5052218,Location updated,c8y_LocationUpdate,{"lng":14.3197285,"alt":262.0,"time":"2019-09-10T12:10:11Z","error":9.0,"lat":48.2688562},N/A,N/A,N/A
2019-09-10T12:10:11.000Z,CO 5052218,Location updated,c8y_LocationUpdate,{"lng":14.3197285,"alt":262.0,"time":"2019-09-10T12:10:11Z","error":9.0,"lat":48.2688562},N/A,N/A,N/A
2019-09-10T12:10:10.000Z,CO 5050889,Location updated,c8y_LocationUpdate,{"lng":15.2681143,"alt":526.0,"time":"2019-09-10T12:10:10Z","error":6.0,"lat":48.7494337},N/A,N/A,N/A
2019-09-10T12:10:06.000Z,CO 5050941,Location updated,c8y_LocationUpdate,{"lng":14.3259313,"alt":254.0,"time":"2019-09-10T12:10:06Z","error":12.0,"lat":48.2594256},N/A,N/A,N/A
2019-09-10T12:10:02.000Z,CO 5052698,Location updated,c8y_LocationUpdate,{"lng":16.4387847,"alt":155.0,"time":"2019-09-10T12:10:02Z","error":12.0,"lat":48.1361544},N/A,N/A,N/A
2019-09-10T12:09:58.000Z,CO 5052994,Location updated,c8y_LocationUpdate,{"lng":12.9975201,"alt":413.0,"time":"2019-09-10T12:09:58Z","error":7.0,"lat":47.8258582},N/A,N/A,N/A
2019-09-10T12:09:58.000Z,CO 5052994,Location updated,c8y_LocationUpdate,{"lng":12.9975201,"alt":413.0,"time":"2019-09-10T12:09:58Z","error":7.0,"lat":47.8258582},N/A,N/A,N/A
2019-09-10T12:09:53.000Z,CO 5050172,Location updated,c8y_LocationUpdate,{"lng":12.5073911,"alt":413.0,"time":"2019-09-10T12:09:53Z","error":6.0,"lat":48.2486859},N/A,N/A,N/A
2019-09-10T12:09:46.000Z,CO 5050036,Location updated,c8y_LocationUpdate,{"lng":15.5402195,"alt":546.0,"time":"2019-09-10T12:09:46Z","error":10.0,"lat":48.7482861},N/A,N/A,N/A
2019-09-10T12:09:42.000Z,CO 5051360,Location updated,c8y_LocationUpdate,{"lng":15.5412234,"alt":546.0,"time":"2019-09-10T12:09:42Z","error":14.0,"lat":48.7482963},N/A,N/A,N/A
2019-09-10T12:09:41.000Z,CO 5052254,Lifesign,cgmon_Lifesign,{"lng":14.1636504,"alt":497.0,"time":"2019-09-10T12:06:33Z","error":3.0,"lat":47.8020297},N/A,N/A,N/A
2019-09-10T12:09:36.000Z,CO 5051886,Location updated,c8y_LocationUpdate,{"lng":14.0586228,"alt":317.0,"time":"2019-09-10T12:09:36Z","error":4.0,"lat":48.1806919},N/A,N/A,N/A
2019-09-10T12:09:36.000Z,CO 5052270,Lifesign,cgmon_Lifesign,{"lng":14.1637559,"alt":497.0,"time":"2019-09-10T12:06:33Z","error":13.0,"lat":47.8015199},N/A,N/A,N/A
2019-09-10T12:09:35.000Z,CO 5050625,Location updated,c8y_LocationUpdate,{"lng":15.0918728,"alt":551.0,"time":"2019-09-10T12:09:35Z","error":14.0,"lat":47.3645485},N/A,N/A,N/A
2019-09-10T12:09:35.000Z,CO 5052165,Location updated,c8y_LocationUpdate,{"lng":13.8262713,"alt":535.0,"time":"2019-09-10T12:09:35Z","error":14.0,"lat":46.5696408},N/A,N/A,N/A
2019-09-10T12:09:32.000Z,CO 5051569,Location updated,c8y_LocationUpdate,{"lng":15.0962545,"alt":251.0,"time":"2019-09-10T12:09:32Z","error":9.0,"lat":48.1569883},N/A,N/A,N/A
2019-09-10T12:09:29.000Z,CO 5051886,Lifesign,cgmon_Lifesign,{"lng":14.0586228,"alt":317.0,"time":"2019-09-10T12:09:36Z","error":4.0,"lat":48.1806919},N/A,N/A,N/A
2019-09-10T12:09:26.000Z,CO 5050079,Location updated,c8y_LocationUpdate,{"lng":14.3260754,"alt":273.0,"time":"2019-09-10T12:09:26Z","error":12.0,"lat":48.259309},N/A,N/A,N/A
2019-09-10T12:09:24.000Z,CO 5051608,Lifesign,cgmon_Lifesign,{"lng":13.0620331,"alt":443.0,"time":"2019-09-10T12:01:33Z","error":4.0,"lat":47.8183534},N/A,N/A,N/A
2019-09-10T12:09:22.000Z,CO 5050636,Location updated,c8y_LocationUpdate,{"lng":15.7496359,"alt":214.0,"time":"2019-09-10T12:09:22Z","error":10.0,"lat":48.3474868},N/A,N/A,N/A
2019-09-10T12:09:13.000Z,CO 5051374,Lifesign,cgmon_Lifesign,{"lng":16.2192937,"alt":290.0,"time":"2019-09-10T12:00:44Z","error":11.0,"lat":47.7971662},N/A,N/A,N/A
2019-09-10T12:09:13.000Z,CO 5050449,Lifesign,cgmon_Lifesign,{"lng":14.5795362,"alt":1.0,"time":"2019-09-10T11:58:43Z","error":5.0,"lat":53.4248321},N/A,N/A,N/A
2019-09-10T12:09:09.000Z,CO 5052285,Location updated,c8y_LocationUpdate,{"lng":14.3242807,"alt":279.0,"time":"2019-09-10T12:09:09Z","error":11.0,"lat":48.2603765},{"cgmon_TrackVersion":1,"cgmon_AccumulatedTrackLength":22.6966869,"cgmon_PrivateMappingData":"{"MappedCoords":[{"OriginalCoords":{"Longitude":14.3242807,"Latitude":48.2603765},"MappedCoords":{"Longitude":14.324294807614198,"Latitude":48.260394023504993},"Distance":2.05000634,"MappedObject":380848,"Source":352093,"Target":355952,"Length":0.5924257},{"OriginalCoords":{"Longitude":14.3242807,"Latitude":48.2603765},"MappedCoords":{"Longitude":14.324331226025482,"Latitude":48.260439145193047},"Distance":7.32469217,"MappedObject":384713,"Source":355935,"Target":355945,"Length":0.7556776},{"OriginalCoords":{"Longitude":14.3242807,"Latitude":48.2603765},"MappedCoords":{"Longitude":14.324228797434349,"Latitude":48.260311767396061},"Distance":7.55397675,"MappedObject":304419,"Source":278400,"Target":278401,"Length":0.2397567}],"LastCoordUsed":{"OriginalCoords":{"Longitude":14.3242807,"Latitude":48.2603765},"MappedCoords":{"Longitude":14.324228797434349,"Latitude":48.260311767396061},"Distance":7.55397675,"MappedObject":304419,"Source":278400,"Target":278401,"Length":0.2397567}}","cgmon_TrackSegmentIDs":[384707,384708,555759,555760,380849,384723,304419],"cgmon_TrackLength":0.7655877,"time":"2019-09-10T12:13:48.8192688+00:00","cgmon_MappedPoint":{"lng":14.324294807614198,"offset":2.05000634,"lat":48.26039402350499}},N/A,N/A
2019-09-10T12:09:09.000Z,CO 5052731,Lifesign,cgmon_Lifesign,{"lng":14.3181143,"alt":252.0,"time":"2019-09-09T11:59:34Z","error":12.0,"lat":48.2771772},N/A,N/A,N/A
2019-09-10T12:09:08.000Z,CO 5051642,Lifesign,cgmon_Lifesign,{"lng":14.163689,"alt":477.0,"time":"2019-09-10T12:06:20Z","error":10.0,"lat":47.8022479},N/A,N/A,N/A
2019-09-10T12:09:07.000Z,CO 5052267,Lifesign,cgmon_Lifesign,{"lng":14.1631847,"alt":471.0,"time":"2019-09-10T12:06:42Z","error":11.0,"lat":47.80162},N/A,N/A,N/A
2019-09-10T12:09:07.000Z,CO 5051478,Lifesign,cgmon_Lifesign,{"lng":14.1641262,"alt":497.0,"time":"2019-09-10T12:06:15Z","error":7.0,"lat":47.8003779},N/A,N/A,N/A
2019-09-10T12:09:01.000Z,CO 5052393,Lifesign,cgmon_Lifesign,{"lng":13.0494004,"alt":428.0,"time":"2019-09-10T12:03:39Z","error":11.0,"lat":47.8189722},N/A,N/A,N/A
2019-09-10T12:08:57.000Z,CO 5051020,Lifesign,cgmon_Lifesign,{"lng":16.2196522,"alt":287.0,"time":"2019-09-10T12:01:08Z","error":4.0,"lat":47.7972928},N/A,N/A,N/A
2019-09-10T12:08:51.000Z,CO 5050301,Location updated,c8y_LocationUpdate,{"lng":2.9992244,"alt":-2.0,"time":"2019-09-10T12:08:51Z","error":17.0,"lat":43.1661339},N/A,N/A,N/A
2019-09-10T12:08:50.000Z,CO 5051365,Location updated,c8y_LocationUpdate,{"lng":22.169639,"alt":60.0,"time":"2019-09-10T12:08:50Z","error":14.0,"lat":48.3902318},{"cgmon_TrackVersion":1,"cgmon_AccumulatedTrackLength":2148.951632500001,"cgmon_PrivateMappingData":"{"MappedCoords":[{"OriginalCoords":{"Longitude":22.169639,"Latitude":48.3902318},"MappedCoords":{"Longitude":22.169632788779996,"Latitude":48.390239719391744},"Distance":0.92519023,"MappedObject":1387861,"Source":1210580,"Target":1236897,"Length":0.8962704},{"OriginalCoords":{"Longitude":22.169639,"Latitude":48.3902318},"MappedCoords":{"Longitude":22.169663000344876,"Latitude":48.390201166002555},"Distance":3.55078237,"MappedObject":1388168,"Source":1236896,"Target":1210581,"Length":0.8169932},{"OriginalCoords":{"Longitude":22.169639,"Latitude":48.3902318},"MappedCoords":{"Longitude":22.169597136212552,"Latitude":48.390285304388065},"Distance":6.21585648,"MappedObject":1388154,"Source":1236890,"Target":1236891,"Length":0.9307675}],"LastCoordUsed":{"OriginalCoords":{"Longitude":22.169639,"Latitude":48.3902318},"MappedCoords":{"Longitude":22.169663000344876,"Latitude":48.390201166002555},"Distance":3.55078237,"MappedObject":1388168,"Source":1236896,"Target":1210581,"Length":0.8169932}}","cgmon_TrackSegmentIDs":[1356958,1356952,1356950,1387850,1387851,1387852,1387853,1387860,1357049,1388168],"cgmon_TrackLength":2.4847659999999996,"time":"2019-09-10T12:11:51.8831079+00:00","cgmon_MappedPoint":{"lng":22.169632788779996,"offset":0.92519023,"lat":48.390239719391744}},N/A,N/A
2019-09-10T12:08:48.000Z,CO 5050995,Location updated,c8y_LocationUpdate,{"lng":22.1701667,"alt":99.0,"time":"2019-09-10T12:08:48Z","error":11.0,"lat":48.3905254},{"cgmon_TrackVersion":1,"cgmon_AccumulatedTrackLength":3214.932654,"cgmon_PrivateMappingData":"{"MappedCoords":[{"OriginalCoords":{"Longitude":22.1701667,"Latitude":48.3905254},"MappedCoords":{"Longitude":22.170165780943215,"Latitude":48.390526570555245},"Distance":0.14129331,"MappedObject":1357050,"Source":1236896,"Target":1210581,"Length":0.8176738},{"OriginalCoords":{"Longitude":22.1701667,"Latitude":48.3905254},"MappedCoords":{"Longitude":22.170200136954,"Latitude":48.3904827851519},"Distance":4.9398585,"MappedObject":1388164,"Source":1210575,"Target":1236894,"Length":0.7718482},{"OriginalCoords":{"Longitude":22.1701667,"Latitude":48.3905254},"MappedCoords":{"Longitude":22.17013252678472,"Latitude":48.390569018631112},"Distance":5.06730103,"MappedObject":1388168,"Source":1236896,"Target":1210581,"Length":0.8169932}],"LastCoordUsed":{"OriginalCoords":{"Longitude":22.1701667,"Latitude":48.3905254},"MappedCoords":{"Longitude":22.17013252678472,"Latitude":48.390569018631112},"Distance":5.06730103,"MappedObject":1388168,"Source":1236896,"Target":1210581,"Length":0.8169932}}","cgmon_TrackSegmentIDs":[1356958,1356952,1356950,1387850,1387851,1387852,1387853,1387860,1357049,1388168],"cgmon_TrackLength":2.4847659999999996,"time":"2019-09-10T12:11:03.6011894+00:00","cgmon_MappedPoint":{"lng":22.170165780943215,"offset":0.14129331,"lat":48.390526570555245}},N/A,N/A
2019-09-10T12:08:43.000Z,CO 5051131,Location updated,c8y_LocationUpdate,{"lng":11.4933341,"alt":581.0,"time":"2019-09-10T

Tre*_*ney 10

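Fix the file:

  • Unfortunately, the file is hard to read because each row contains a dict whose key-value pairs are separated by commas.
  • The easiest way to fix this is to change the separator outside of each dict from , to |
  • The following code reads the existing file
    • It assumes the first line is the header and uses .replace(',', '|')
    • The remaining lines use a regular expression to replace every , outside {}
    • Each line is written to a new file.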

Code:

Data:

Time,location,labelA,labelB
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},{"ack":123,"bar":456},{"foo":123,"bar":456}
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},nan,nan
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},{"ack":123,"bar":456},{"foo":123,"bar":456}
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},nan,nan
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},{"ack":123,"bar":456},{"foo":123,"bar":456}
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},nan,nan
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},{"ack":123,"bar":456},{"foo":123,"bar":456}
2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8},nan,nan

File fix:

import re
from pathlib import Path

p = Path.cwd() / 'test.csv'
p2 = Path.cwd() / 'test2.csv'

with p.open('r') as f:
    with p2.open('w') as f2:
        for cnt, line in enumerate(f):
            if cnt == 0:
                line = line.replace(',', '|')
            else:
                line = re.sub(r',(?=(((?!\}).)*\{)|[^\{\}]*$)', '|', line)
            f2.write(line)

New file:

Time|location|labelA|labelB
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|{"ack":123,"bar":456}|{"foo":123,"bar":456}
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|nan|nan
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|{"ack":123,"bar":456}|{"foo":123,"bar":456}
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|nan|nan
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|{"ack":123,"bar":456}|{"foo":123,"bar":456}
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|nan|nan
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|{"ack":123,"bar":456}|{"foo":123,"bar":456}
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|nan|nan
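
The regular expression above decides whether a comma is "outside" a dict by looking ahead for the next brace, so a comma inside an outer dict that is followed by a nested dict may be replaced as well. As a hedged alternative (not part of the original answer), the sketch below tracks brace depth and splits only where the depth is zero. The helper name split_outside_braces and the sample row are hypothetical, and the sketch assumes that braces are balanced and never appear inside string values.

```python
def split_outside_braces(line, sep=','):
    """Split a line on sep, ignoring separators nested inside {...}."""
    fields, current, depth = [], [], 0
    for ch in line.rstrip('\n'):
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth = max(depth - 1, 0)
        if ch == sep and depth == 0:
            # A top-level separator closes the current field.
            fields.append(''.join(current))
            current = []
        else:
            current.append(ch)
    fields.append(''.join(current))
    return fields


# Hypothetical row with a nested dict in the second field.
row = '2019-09-10,{"lng":12.9,"pos":{"lat":17.8,"alt":413.0}},nan,nan'
fields = split_outside_braces(row)
print('|'.join(fields))
```

Joining the returned fields with | gives a line in the same shape as the new file shown above.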

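Parse the new file:

  • Now .read_csv will separate the columns correctly
  • However, the location, labelA and labelB columns are str
    • Use ast.literal_eval to convert them to dict
    • literal_eval does not work on nan, so replace nan with {}
  • for col in df.columns[1:]: loops over each column and:
    • try-except catches any column that is not properly formed
    • converts the values from str to dict
    • splits them into columns by keys
    • concats the new columns to the existing dataframe
    • drops the old column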
import pandas as pd
from ast import literal_eval

df = pd.read_csv('test2.csv', sep='|')
print(df)

       Time                                                             location                 labelA                 labelB
 2019-09-10  {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}  {"ack":123,"bar":456}  {"foo":123,"bar":456}
 2019-09-10  {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}                    NaN                    NaN
 2019-09-10  {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}  {"ack":123,"bar":456}  {"foo":123,"bar":456}
 2019-09-10  {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}                    NaN                    NaN
 2019-09-10  {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}  {"ack":123,"bar":456}  {"foo":123,"bar":456}
 2019-09-10  {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}                    NaN                    NaN
 2019-09-10  {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}  {"ack":123,"bar":456}  {"foo":123,"bar":456}
 2019-09-10  {"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}                    NaN                    NaN


for col in df.columns[1:]:
    try:
        df[col] = df[col].fillna('{}')  # assign back; inplace on a column selection is unreliable in recent pandas
        df[col] = df[col].apply(literal_eval)
        df = pd.concat([df, df[col].apply(pd.Series)], axis=1)
        df.drop(columns=[col], inplace=True)
    except (SyntaxError, ValueError) as e:
        print(f'{col}: {e}')


print(df)

       Time   lng    alt        time  error   lat    ack    bar    foo    bar
 2019-09-10  12.9  413.0  2019-09-10    7.0  17.8  123.0  456.0  123.0  456.0
 2019-09-10  12.9  413.0  2019-09-10    7.0  17.8    NaN    NaN    NaN    NaN
 2019-09-10  12.9  413.0  2019-09-10    7.0  17.8  123.0  456.0  123.0  456.0
 2019-09-10  12.9  413.0  2019-09-10    7.0  17.8    NaN    NaN    NaN    NaN
 2019-09-10  12.9  413.0  2019-09-10    7.0  17.8  123.0  456.0  123.0  456.0
 2019-09-10  12.9  413.0  2019-09-10    7.0  17.8    NaN    NaN    NaN    NaN
 2019-09-10  12.9  413.0  2019-09-10    7.0  17.8  123.0  456.0  123.0  456.0
 2019-09-10  12.9  413.0  2019-09-10    7.0  17.8    NaN    NaN    NaN    NaN

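Notes on literal_eval:

  • Pandas has methods for importing data in a variety of forms, such as dict and list
  • However, read_csv does not interpret containers (e.g. dict) well; they are read as strings unless you specify the converters parameter (pd.read_csv('test3.csv', sep='|', converters={'a': literal_eval})).
  • literal_eval does not work on columns that mix containers with strings or NaN, unless the string is purely numeric (e.g. "8654")
  • The code above therefore first replaces every nan with {} so that literal_eval does not fail.
  • Given the following mixed column example: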
column_a
{"ack":123,"bar":456}
some string
{"ack":123,"bar":456}
some string
{"ack":123,"bar":456}
some string
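  • literal_eval will raise an error on the string rows: ValueError: malformed node or string: for a bare word, or SyntaxError for text containing spaces such as some string
  • The difference between the two solutions is that the other solution fixes one column, whereas this one fixes all columns and removes the need to read only the first 100 rows.
  • You can drop the loop that fixes all columns and fix only the location column (if it contains only dicts), using the following code: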
df['location'] = df['location'].apply(literal_eval)
df = pd.concat([df, df['location'].apply(pd.Series)], axis=1)
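
The literal_eval behaviour described in these notes can be checked with the standard library alone. The sketch below is only an illustration: parse_cell is a hypothetical helper (not from the answer) that returns an empty dict for anything that is not a dict literal, which is one possible way to handle mixed columns cell by cell instead of failing on the whole column.

```python
from ast import literal_eval


def parse_cell(value):
    """Return a dict for dict-like strings and {} for everything else."""
    try:
        parsed = literal_eval(value)
    except (SyntaxError, ValueError):
        # 'nan' raises ValueError, 'some string' raises SyntaxError.
        return {}
    # A numeric string such as '8654' parses fine but is not a dict.
    return parsed if isinstance(parsed, dict) else {}


print(parse_cell('{"ack":123,"bar":456}'))
print(parse_cell('nan'))
print(parse_cell('some string'))
print(parse_cell('8654'))
```

Such a helper could be passed to Series.apply in place of literal_eval; whether silently mapping bad cells to {} is acceptable depends on the data.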

Notes on the actual data, test100v1.csv

  • The location column is not properly formed
    • '{"lng":12.9975201,alt:413.0,"time:""2019-09-10T12:09:58Z""",error:7.0,lat:47.8258582}'
  • This is the expected form:
    • '{"lng":12.9975201,"alt":413.0,"time":"2019-09-10T12:09:58Z","error":7.0,"lat":47.8258582}'

Fix the location column:

  • The location column is called Position in the real data
def fix_pos(x):
    word_dict = {'alt': '"alt"',
                 '"time:"': '"time":',
                 '"",error:': ',"error":',
                 'lat': '"lat"'}
    for k, v in word_dict.items():
        x = x.replace(k, v)
    return x

df.Position = df.Position.apply(lambda x: fix_pos(x))
  • Use the following loop on the real data file.
  • Zeit, device, Text and Typ do not need to be processed
  • Position is at index 4.
for col in df.columns[4:]:
    try:
        df[col] = df[col].fillna('{}')  # assign back; inplace on a column selection is unreliable in recent pandas
        df[col] = df[col].apply(literal_eval)
        df = pd.concat([df, df[col].apply(pd.Series)], axis=1)
        df.drop(columns=[col], inplace=True)
    except (SyntaxError, ValueError) as e:
        print(f'{col}: {e}')
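  • The loop that applies literal_eval to all columns has been updated with try-except
    • If there is an exception, the column name and the error message are printed.
    • The real data has 64 columns, most of which are Furchtbar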

Errors:

  • These are the errors for all of the columns in the provided csv file
device: unexpected EOF while parsing (<unknown>, line 1)
Text: malformed node or string: <_ast.Name object at 0x00000203B8473C08>
Typ: malformed node or string: <_ast.Name object at 0x00000203BE217E08>
Data: unexpected EOF while parsing (<unknown>, line 1)
Data1: invalid syntax (<unknown>, line 1)
Data2: invalid syntax (<unknown>, line 1)
Unnamed: 8: invalid syntax (<unknown>, line 1)
Unnamed: 9: unexpected EOF while parsing (<unknown>, line 1)
Unnamed: 10: invalid syntax (<unknown>, line 1)
Unnamed: 11: unexpected EOF while parsing (<unknown>, line 1)
Unnamed: 12: invalid syntax (<unknown>, line 1)
Unnamed: 13: invalid syntax (<unknown>, line 1)
Unnamed: 14: invalid syntax (<unknown>, line 1)
Unnamed: 15: invalid syntax (<unknown>, line 1)
Unnamed: 16: invalid syntax (<unknown>, line 1)
Unnamed: 17: invalid syntax (<unknown>, line 1)
Unnamed: 18: invalid syntax (<unknown>, line 1)
Unnamed: 19: invalid syntax (<unknown>, line 1)
Unnamed: 20: invalid syntax (<unknown>, line 1)
Unnamed: 21: unexpected EOF while parsing (<unknown>, line 1)
Unnamed: 22: invalid syntax (<unknown>, line 1)
Unnamed: 23: invalid syntax (<unknown>, line 1)
Unnamed: 24: invalid syntax (<unknown>, line 1)
Unnamed: 25: invalid syntax (<unknown>, line 1)
Unnamed: 26: invalid syntax (<unknown>, line 1)
Unnamed: 27: invalid syntax (<unknown>, line 1)


pau*_*ult 8

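The problem here is that the commas inside the json string are being treated as delimiters. You should modify the input data (if you don't have direct access to the file, you can always read the contents into a list of strings with open first).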
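
To make the delimiter problem concrete, the sketch below splits the sample row with Python's built-in csv module. This is only an analogy: pandas uses its own tokenizer, whose handling of the quotes may differ, but the commas inside the braces are treated as field separators in the same way.

```python
import csv
import io
import json

raw = (
    'Time,location,labelA,labelB\n'
    '2019-09-10,{"lng":12.9,"alt":413.0,"time":"2019-09-10",'
    '"error":7.0,"lat":17.8},nan,nan\n'
)

header, first = list(csv.reader(io.StringIO(raw)))
print(len(header), len(first))  # the data row has more fields than the header
print(first[1])                 # only a fragment of the JSON object

error = None
try:
    json.loads(first[1])
except json.JSONDecodeError as exc:
    error = exc  # the fragment is not valid JSON on its own
print(error)
```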

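Here are a few modification options you can try:

Option 1: Quote the json string with single quotes

Use a single quote (or another character that does not otherwise appear in your data) as the quote character for the json string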

>> cat data.csv
Time,location,labelA,labelB
2019-09-10,'{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}',nan,nan

Then use quotechar="'" when reading the data:

import pandas as pd
import json

df=pd.read_csv('data.csv', converters={'location':json.loads}, header=0, quotechar="'")

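Option 2: Quote the json string with double quotes and escape

If the single quote can't be used, you can actually use the double quote as the quotechar, as long as you escape the quotes inside the json string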

>> cat data.csv
Time,location,labelA,labelB
2019-09-10,"{""lng"":12.9,""alt"":413.0,""time"":""2019-09-10"",""error"":7.0,""lat"":17.8}",nan,nan

Notice that this now matches the format from the question you linked.

df=pd.read_csv('data.csv', converters={'location':json.loads}, header=0, quotechar='"')
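
If you control how the file is produced, the escaping in Option 2 does not have to be done by hand. As a minimal sketch (an assumption about your pipeline, not something the question states), the standard csv writer wraps any field that contains a comma or a double quote in quotes and doubles the inner quotes, so the JSON text survives a round trip. An in-memory buffer stands in for data.csv here.

```python
import csv
import io
import json

location = {"lng": 12.9, "alt": 413.0, "time": "2019-09-10",
            "error": 7.0, "lat": 17.8}

buffer = io.StringIO()
writer = csv.writer(buffer)  # default dialect: quotechar '"', doubled quotes
writer.writerow(["Time", "location", "labelA", "labelB"])
writer.writerow(["2019-09-10", json.dumps(location), "nan", "nan"])
print(buffer.getvalue())

# Reading it back yields four fields, and the JSON parses cleanly.
buffer.seek(0)
header, row = list(csv.reader(buffer))
parsed = json.loads(row[1])
print(parsed)
```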

Option 3: Change the separator

Use a different character, for example |, as the separator

>> cat data.csv
Time|location|labelA|labelB
2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10","error":7.0,"lat":17.8}|nan|nan

Now use the sep argument to specify the new separator:

df=pd.read_csv('data.csv', converters={'location':json.loads}, header=0, sep="|")

Each of these methods produces the same output:

print(df)
#   Time        location                                            labelA  labelB
#0  2019-09-10  {u'lat': 17.8, u'lng': 12.9, u'error': 7.0, u'...   NaN     NaN

Once that's done, you can expand the location column using one of the methods described in "Flatten JSON column in Pandas DataFrame":

new_df = df.join(pd.json_normalize(df["location"])).drop(["location"], axis=1)
print(new_df)
#   Time        labelA  labelB  alt    error  lat   lng   time
#0  2019-09-10  NaN     NaN     413.0  7.0    17.8  12.9  2019-09-10
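
For reference, Option 3 and the flattening step can be combined into one self-contained sketch. It assumes a reasonably recent pandas (1.0 or later, where json_normalize is available at the top level) and uses an in-memory string in place of data.csv. The column order in the result follows the key order of the JSON object.

```python
import io
import json

import pandas as pd

# In-memory stand-in for a pipe-delimited data.csv.
raw = (
    'Time|location|labelA|labelB\n'
    '2019-09-10|{"lng":12.9,"alt":413.0,"time":"2019-09-10",'
    '"error":7.0,"lat":17.8}|nan|nan\n'
)

# Parse the location column as JSON while reading.
df = pd.read_csv(io.StringIO(raw), sep='|', converters={'location': json.loads})

# Expand the dicts into columns and attach them in place of location.
flat = pd.json_normalize(df['location'].tolist())
new_df = df.drop(columns=['location']).join(flat)
print(new_df)
```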