I am using pandas to read a bunch of CSVs. I pass a dict of options to the dtype parameter to tell pandas which columns to read as strings instead of the default:
dtype_dic= { 'service_id':str, 'end_date':str, ... }
feedArray = pd.read_csv(feedfile , dtype = dtype_dic)
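In my scenario, all columns except a few specific ones are to be read as strings. So instead of listing many columns as str in dtype_dic, I would like to set just my chosen few as int or float. Is there a way to do that?
This runs in a loop over various CSVs with differing columns, so converting columns directly after reading the whole CSV as strings (dtype=str) would not be easy, because I would not immediately know which columns a given CSV has. (I would rather spend that effort on defining all the columns in the dtype dict!)
Edit: However, if there is a way to process a list of column names to be converted to numbers without raising an error when a column is not present in that CSV, then yes, that would be a valid solution, if there is no other way to do this at the CSV reading stage itself.
Note: this sounds like a previously asked question, but the answers there went down a very different path (bool related) that does not apply to this question. Please do not mark it as a duplicate!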
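One way to approach this, offered as a minimal sketch rather than a definitive answer: read only the header of each CSV first, then build the dtype dict for that file, defaulting every column to str and overriding only the columns found in a numeric lookup. The names `read_csv_mostly_str` and `NUMERIC_COLUMNS`, and the column names inside the lookup, are hypothetical and chosen purely for illustration.

```python
import io

import pandas as pd

# Hypothetical lookup: columns to read as numbers when a file happens to have them.
NUMERIC_COLUMNS = {"stop_sequence": int, "shape_dist_traveled": float}


def read_csv_mostly_str(source, numeric_columns=NUMERIC_COLUMNS):
    """Read a CSV with every column as str, except those in numeric_columns."""
    # nrows=0 returns an empty frame that still carries the column names.
    header = pd.read_csv(source, nrows=0).columns
    # Columns missing from this file never enter the dict, so nothing errors.
    dtype_dic = {col: numeric_columns.get(col, str) for col in header}
    # A file-like object was advanced by the first pass; rewind it.
    if hasattr(source, "seek"):
        source.seek(0)
    return pd.read_csv(source, dtype=dtype_dic)


# Usage with an in-memory file standing in for feedfile.
sample = io.StringIO("stop_id,stop_sequence,service_id\n007,1,0012\n008,2,0013\n")
feedArray = read_csv_mostly_str(sample)
print(feedArray.dtypes)
```

Note that a plain int dtype fails on columns with missing values; float or the nullable "Int64" dtype may be a safer choice for such columns.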
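I am posting this as a self-answered Q&A because there is no post about this kind of error yet, and the other links from a web search lead to unresolved GitHub issues. This is me just updating a package in a Python virtual environment: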
(py36) $ pip install tornado -U
Collecting tornado
 Downloading https://files.pythonhosted.org/packages/03/3f/5f89d99fca3c0100c8cede4f53f660b126d39e0d6a1e943e95cc3ed386fb/tornado-6.0.2.tar.gz (481kB)
 100% |████████████████████████████████| 491kB 476kB/s
Building wheels for collected packages: tornado
 Building wheel for tornado (setup.py) ... done
 Stored in directory: /home/nikhil/.cache/pip/wheels/61/7e/7a/5e02e60dc329aef32ecf70e0425319ee7e2198c3a7cf98b4a2
Successfully built tornado
Installing collected packages: tornado
 Found existing installation: tornado 5.1.1
 Uninstalling tornado-5.1.1:
 Successfully uninstalled tornado-5.1.1
Could not install packages due to an EnvironmentError: [Errno 39] Directory not empty: '/mnt/STUFF/py36/lib/python3.6/site-packages/~ornado'
So how can this be resolved?
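The path in the error ends in `~ornado`, which looks like a leftover directory sitting next to the real package in site-packages. As a cautious first diagnostic, the sketch below only lists entries whose names start with `~` and deletes nothing. The helper name `find_tilde_leftovers` is hypothetical, and the demonstration runs on a throwaway directory rather than a real environment.

```python
import sysconfig
import tempfile
from pathlib import Path


def find_tilde_leftovers(site_packages):
    """Return entries in site_packages whose names start with '~', sorted."""
    return sorted(p for p in Path(site_packages).iterdir() if p.name.startswith("~"))


# Demonstration on a throwaway directory that mimics the layout in the error.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "~ornado").mkdir()
    (Path(tmp) / "tornado").mkdir()
    leftovers = [p.name for p in find_tilde_leftovers(tmp)]
    print(leftovers)

# For the running interpreter, the site-packages directory to inspect would be:
print(sysconfig.get_paths()["purelib"])
```

If such a directory shows up, deleting it by hand and re-running `pip install tornado -U` is a plausible next step. That is an assumption based on the path in the error message, not something the log itself confirms.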
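I have an application in which markers/features are loaded from multiple sources into layers/layerGroups (correct term?), and they are loaded dynamically (based on certain feature.properties attributes and other conditions). I want to be able to show on the side panel the number of markers currently loaded into a layer that is on display. Given only the layer's variable/identifier, how can I find the number of markers/features loaded into it?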
var layer1= L.layerGroup();
layerControl.addOverlay(layer1, 'Layer 1');
... // loading stuff into this layer from different sources
console.log(layer1.length); // doesn't work, gives "undefined"
console.log(JSON.stringify(layer1)); // doesn't work, "TypeError: cyclic object value"
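...so I guess layers cannot be treated like JSON objects.
I found a related question, but the answers there only address markers loaded from a single geoJson source, and suggest a simple counter++ in the onEachFeature function. I am working with a lot of layers in my application and would like not to have to add a separate counter variable for each one; I would rather just use the layer's variable/identifier to count. If we can add a layer to a map or a group, then we ought to be able to count what is in it, right?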
The to_sql() function in pandas is now producing an SADeprecationWarning.
df.to_sql(name=tablename, con=c, if_exists='append', index=False )
[..]/lib/python3.8/site-packages/pandas/io/sql.py:1430: SADeprecationWarning:The Connection.run_callable() method is deprecated and will be removed in a future release. Use a context manager instead. (deprecated since: 1.4)
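I was getting this even with the df.read_sql() command when running SQL select statements. Changing it to df.read_sql_query(), the original function that it wraps, got rid of it. I suspect the two are linked somehow.
So the question is: how do I write a DataFrame to a SQL table in a way that will not be deprecated in future versions? What does "use a context manager" mean, and how do I implement it?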
Versions:
pandas: 1.1.5 | SQLAlchemy: 1.4.0 | pyodbc: 4.0.30 | Python: 3.8.0
Using a mssql database.
OS: Linux Mint Xfce, 18.04. Using a Python virtual environment.
In case it matters, the connection is created as follows:
conn_str = r'mssql+pyodbc:///?odbc_connect={}'.format(dbString).strip()
sqlEngine = sqlalchemy.create_engine(conn_str,echo=False, pool_recycle=3600)
c = sqlEngine.connect()
And after the db operations,
c.close()
Doing it this way keeps the main sqlEngine "alive" between API calls and lets me use a pooled connection instead of having to connect anew each time.
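To make "use a context manager" concrete at the connection level, here is a minimal sketch in which a `with` block replaces the manual connect/close pair while the engine itself stays alive between calls. An in-memory SQLite engine stands in for the mssql+pyodbc URL, and the table name and data are hypothetical. Since the warning points at a call inside pandas/io/sql.py, whether it actually disappears may depend on the pandas version rather than on this pattern.

```python
import pandas as pd
import sqlalchemy

# Assumption: in-memory SQLite stands in for the mssql+pyodbc connection URL.
sqlEngine = sqlalchemy.create_engine("sqlite://", echo=False)

df = pd.DataFrame({"trip_id": ["t1", "t2"], "stop_sequence": [1, 2]})
tablename = "stop_times"  # hypothetical table name

# engine.begin() yields a connection, commits on success, rolls back on error,
# and releases the connection back to the pool when the block exits.
with sqlEngine.begin() as c:
    df.to_sql(name=tablename, con=c, if_exists="append", index=False)

# A later call borrows a connection from the same engine again.
with sqlEngine.connect() as c:
    result = pd.read_sql_query(sqlalchemy.text("SELECT * FROM stop_times"), c)

print(result)
```

The engine is created once and reused, so pooling still applies; only the explicit c.close() goes away, because leaving the `with` block releases the connection.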
I have a pandas DataFrame named df. With df.dtypes I can print on screen:
arrival_time object
departure_time object
drop_off_type int64
extra object
pickup_type int64
stop_headsign object
stop_id object
stop_sequence int64
trip_id object
dtype: object
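I want to save this information so that I can compare it with other data, use it for type-casting elsewhere, and so on. I want to save it to a local file and then restore it elsewhere, in another program. But I cannot figure out how to do it. Here are the results of the various conversions I tried: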
df.dtypes.to_dict()
{'arrival_time': dtype('O'),
'departure_time': dtype('O'),
'drop_off_type': dtype('int64'),
'extra': dtype('O'),
'pickup_type': dtype('int64'),
'stop_headsign': dtype('O'),
'stop_id': dtype('O'),
'stop_sequence': dtype('int64'),
'trip_id': dtype('O')}
----
df.dtypes.to_json()
'{"arrival_time":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"departure_time":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"drop_off_type":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"extra":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"pickup_type":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"stop_headsign":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"stop_id":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"stop_sequence":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"trip_id":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"}}'
----
json.dumps( df.dtypes.to_dict() )
...
TypeError: dtype('O') is not JSON serializable
----
list(df.dtypes)
[dtype('O'),
dtype('O'),
dtype('int64'),
dtype('O'),
dtype('int64'),
dtype('O'),
dtype('O'),
dtype('int64'),
dtype('O')]
How can I save and export/archive the dtype information of a pandas DataFrame?
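One possible approach, sketched here under the assumption that plain dtype names are enough for the use case: convert each dtype to its string name, which JSON can serialize, and pass the loaded mapping to astype() on the other side. The DataFrames and their values are made up for illustration.

```python
import json

import pandas as pd

df = pd.DataFrame(
    {"stop_id": ["A1", "B2"], "stop_sequence": [1, 2], "pickup_type": [0, 1]}
)

# str(dtype) gives a plain name such as 'int64', which JSON can store.
dtype_names = {col: str(dtype) for col, dtype in df.dtypes.items()}
saved = json.dumps(dtype_names)  # write this string to a file to archive it

# Elsewhere: load the names back and apply them to another DataFrame.
restored = json.loads(saved)
other = pd.DataFrame(
    {"stop_id": ["C3"], "stop_sequence": ["7"], "pickup_type": ["1"]}
)
other = other.astype(restored)
print(other.dtypes)
```

Dtypes that carry extra state, such as categoricals, may not round-trip fully through a bare name, so this is best suited to simple numeric and string columns like the ones shown in the question.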