I have a Python pandas DataFrame rpt:
rpt
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 47518 entries, ('000002', '20120331') to ('603366', '20091231')
Data columns:
STK_ID 47518 non-null values
STK_Name 47518 non-null values
RPT_Date 47518 non-null values
sales 47518 non-null values
I can filter the rows for stock ID '600809' like this:
rpt[rpt['STK_ID'] == '600809']
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 25 entries, ('600809', '20120331') to ('600809', '20060331')
Data columns:
STK_ID 25 non-null values
STK_Name 25 non-null values
RPT_Date 25 non-null values
sales 25 non-null values
I want to gather all the rows for several stocks at once, e.g. ['600809','600141','600329']. That means I want syntax like this:
stk_list = ['600809','600141','600329']
rst = rpt[rpt['STK_ID'] in stk_list] # …
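A minimal sketch of one way this could be written, assuming the goal is boolean row filtering on STK_ID: Series.isin plays the role that the in operator cannot, since in is not applied element-wise to a Series. The tiny frame below is a hypothetical stand-in for the real rpt:

import pandas as pd

# Hypothetical miniature frame standing in for rpt
rpt = pd.DataFrame({'STK_ID': ['600809', '600141', '000002'],
                    'sales': [1.0, 2.0, 3.0]})

stk_list = ['600809', '600141', '600329']
# isin returns a boolean Series with one True/False per row
rst = rpt[rpt['STK_ID'].isin(stk_list)]
print(rst)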
When calling
df = pd.read_csv('somefile.csv')
I get:
/Users/josh/anaconda/envs/py27/lib/python2.7/site-packages/pandas/io/parsers.py:1130: DtypeWarning: Columns (4,5,7,16) have mixed types. Specify dtype option on import or set low_memory=False.
Why is the dtype option related to low_memory, and why would setting low_memory=False help with this problem?
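For reference, a minimal sketch of the two routes the warning itself suggests, assuming 'somefile.csv' and the column names below are placeholders for the real file:

import pandas as pd

# Option 1: declare the dtypes of the offending columns up front
# ('col4' and 'col5' are hypothetical names)
df = pd.read_csv('somefile.csv', dtype={'col4': str, 'col5': str})

# Option 2: read the whole file in one pass so pandas infers a single
# dtype per column instead of guessing chunk by chunk
df = pd.read_csv('somefile.csv', low_memory=False)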
So I am trying to use unittest.mock to mock some of my methods in my unit tests. I do:
import subprocess
from unittest.mock import MagicMock

f = open("data/static/mock_ffprobe_response")
subprocess.check_output = MagicMock(return_value=f.read())
f.close()
But I get:
ImportError: No module named mock
I tried:
pip install mock
It still doesn't work.
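A minimal sketch of an import fallback, assuming the failure comes from running under Python 2: unittest.mock only exists in the Python 3.3+ standard library, and the pip-installed backport is imported under the top-level name mock.

try:
    # Python 3.3+: mock ships inside the standard library
    from unittest.mock import MagicMock
except ImportError:
    # Python 2: use the separately installed backport (pip install mock)
    from mock import MagicMock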
What is the best way to apply a function to the index of a pandas DataFrame? Currently I am using this verbose approach:
pd.DataFrame({"Month": df.reset_index().Date.apply(foo)})
where Date is the name of the index and foo is the name of the function I am applying.
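A minimal sketch of a shorter alternative, assuming df and foo are as described above; Index.map applies the function to the index values without the reset_index round trip:

# Apply foo directly to the index values and keep the result as a column
df["Month"] = df.index.map(foo)

# Or build a separate frame, mirroring the original snippet
months = pd.DataFrame({"Month": df.index.map(foo)})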
I have a small DataFrame, say this one:
Mass32 Mass44
12 0.576703 0.496159
13 0.576658 0.495832
14 0.576703 0.495398
15 0.576587 0.494786
16 0.576616 0.494473
...
I want a rolling mean of the column Mass32, so I do this:
x['Mass32s'] = pandas.rolling_mean(x.Mass32, 5).shift(-2)
It works, in the sense that I get a new column Mass32s containing what I want it to contain, but I also get the warning message:
A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I wonder if there is a better way to do this, in particular one that avoids getting this warning message.
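A minimal sketch of one way this might be written to sidestep the warning, assuming x was itself sliced out of a larger DataFrame (which is what the warning is about) and that the newer rolling API is acceptable in place of the deprecated pandas.rolling_mean:

# If x was sliced from a larger frame, take an explicit copy first so
# later column assignments are unambiguous
x = x.copy()

# Centered 5-point rolling mean via the modern rolling API
x['Mass32s'] = x['Mass32'].rolling(5).mean().shift(-2)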
I have the following code, which creates a table and a barplot via seaborn.
#Building a dataframe grouped by the # of Engagement Types
sales_type = sales.groupby('# of Engagement Types').sum()
#Calculating the % of people who bought the course by # engagement types
sales_type['% Sales per Participants'] = round(100*(sales_type['Sales'] / sales_type['Had an Engagement']), 2)
#Calculating the # of people who didn't have any engagements
sales_type.set_value(index=0, col='Had an Engagement', value=sales[sales['Had an Engagement']==0].count()['Sales'])
#Calculating the % of sales for those who didn't have any engagements
sales_type.set_value(index=0, col='% Sales per Participants',
value=round(100 * (sales_type.ix[0, 'Sales'] / …
I am iterating over many exported security event logs pulled from Windows hosts; a sample of the data looks like this:
"MachineName","EventID","EntryType","Source","TimeGenerated","TimeWritten","UserName","Message"
"mycompname","5156","SuccessAudit","Microsoft-Windows-Security-Auditing","4/26/2017 10:47:41 AM","4/26/2017 10:47:41 AM",,"The Windows Filtering Platform has permitted a connection. Application Information: Process ID: 4 Application Name: System Network Information: Direction: %%14592 Source Address: 192.168.10.255 Source Port: 137 Destination Address: 192.168.10.238 Destination Port: 137 Protocol: 17 Filter Information: Filter Run-Time ID: 83695 Layer Name: %%14610 Layer Run-Time ID: 44"
"mycompname","4688","SuccessAudit","Microsoft-Windows-Security-Auditing","4/26/2014 10:47:03 AM","4/26/2014 10:47:03 AM",,"A new process has been created. Subject: Security ID: S-1-5-18 Account Name: mycompname$ Account Domain: mydomain Logon ID: 0x3e7 Process Information: New Process …
I have a DataFrame like this:
d={}
d['z']=['Q8','Q8','Q7','Q9','Q9']
d['t']=['10:30','10:31','10:38','10:40','10:41']
d['qty']=[20,20,9,12,12]
I want to compare each row with the one before it.
The expected output is:
qty t z valid
0 20 2015-06-05 10:30:00 Q8 False
1 20 2015-06-05 10:31:00 Q8 True
2 9 2015-06-05 10:38:00 Q7 False
3 12 2015-06-05 10:40:00 Q9 False
4 12 2015-06-05 10:41:00 Q9 True
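A minimal sketch of one interpretation, assuming a row counts as valid when it repeats the z and qty of the row directly above it; shift(1) lines each row up with its predecessor:

import pandas as pd

df = pd.DataFrame(d)
# Sketch only: to_datetime on bare times uses today's date; the date part
# in the expected output presumably comes from elsewhere
df['t'] = pd.to_datetime(df['t'])

# A row is valid when both z and qty match the previous row
df['valid'] = (df['z'] == df['z'].shift(1)) & (df['qty'] == df['qty'].shift(1))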
I have a DataFrame with roughly 155,000 rows and 12 columns. If I export it to csv with dataframe.to_csv, the output is an 11MB file (produced instantly).
However, if I export to Microsoft SQL Server with the to_sql method, it takes between 5 and 6 minutes! No columns are text: only int, float, bool and dates. I have seen cases where ODBC drivers create nvarchar(max) fields and that slows the transfer down, but that is not the case here.
Any suggestions on how to speed up the export? Taking 6 minutes to export 11 MB of data makes the ODBC connection practically unusable.
Thanks!
My code is:
import pandas as pd
from sqlalchemy import create_engine, MetaData, Table, select
ServerName = "myserver"
Database = "mydatabase"
TableName = "mytable"
engine = create_engine('mssql+pyodbc://' + ServerName + '/' + Database)
conn = engine.connect()
metadata = MetaData(conn)
my_data_frame.to_sql(TableName,engine)
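A minimal sketch of two settings that often help in this situation, assuming SQLAlchemy 1.3+ with the pyodbc driver; fast_executemany batches the INSERT parameters on the driver side and chunksize bounds the size of each round trip:

from sqlalchemy import create_engine

# fast_executemany (SQLAlchemy 1.3+ with pyodbc) sends parameter batches
# instead of one INSERT round trip per row
engine = create_engine('mssql+pyodbc://' + ServerName + '/' + Database,
                       fast_executemany=True)

# chunksize bounds how many rows are sent per batch
my_data_frame.to_sql(TableName, engine, index=False, chunksize=1000)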
I am trying out a simple tensorflow demo code from a github link.
I am currently using Python version 3.5.2:
Z:\downloads\tensorflow_demo-master\tensorflow_demo-master>py
Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
I ran into this error when I tried board.py at the command line. I have installed all the dependencies needed to run it.
def _read32(bytestream):
    dt = numpy.dtype(numpy.uint32).newbyteorder('>')
    return numpy.frombuffer(bytestream.read(4), dtype=dt)

def extract_images(filename):
    """Extract the images into a 4D uint8 numpy array [index, y, x, depth]."""
    print('Extracting', filename)
    with gzip.open(filename) as bytestream:
        magic = _read32(bytestream)
        if magic != 2051:
            raise ValueError(
                'Invalid magic number %d in MNIST image file: %s' %
                (magic, filename))
        num_images …