I want to extract the rows whose feccandid value starts with H or S:
cid amount date catcode feccandid
0 N00031317 1000 2010 B2000 H0FL19080
1 N00027464 5000 2009 B1000 H6IA01098
2 N00024875 1000 2009 A5200 S2IL08088
3 N00030957 2000 2010 J2200 S0TN04195
4 N00026591 1000 2009 F3300 S4KY06072
5 N00031317 1000 2010 B2000 P0FL19080
6 N00027464 5000 2009 B1000 P6IA01098
7 N00024875 1000 2009 A5200 S2IL08088
8 N00030957 2000 2010 J2200 H0TN04195
9 N00026591 1000 2009 F3300 H4KY06072
I am using this code:
campaign_contributions.loc[campaign_contributions['feccandid'].astype(str).str.extractall(r'^(?:S|H)')]
It returns the error:
ValueError: pattern contains no capture groups …
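The error arises because `str.extractall` requires at least one capturing group, and `(?:S|H)` is a non-capturing group. Even with a group, `extractall` returns a DataFrame of matches rather than the boolean mask that `.loc` expects. A minimal sketch, using a few rows copied from the sample above (only the `cid` and `feccandid` columns), builds the mask with `str.contains` and an anchored character class instead:

```python
import pandas as pd

# A few rows copied from the sample data (other columns omitted)
campaign_contributions = pd.DataFrame({
    "cid": ["N00031317", "N00027464", "N00024875", "N00031317", "N00030957"],
    "feccandid": ["H0FL19080", "H6IA01098", "S2IL08088", "P0FL19080", "H0TN04195"],
})

# str.contains returns one True/False per row, which .loc can use directly;
# ^[HS] means "H or S at the very start of the string"
mask = campaign_contributions["feccandid"].astype(str).str.contains(r"^[HS]", regex=True)
hs_rows = campaign_contributions.loc[mask]
print(hs_rows)
```

`str.match(r"[HS]")` should work as well, since `match` anchors at the start of each string.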
I am trying to assign the output of value_counts to a new df. My code is as follows:
import pandas as pd
import glob
df = pd.concat((pd.read_csv(f, names=['date','bill_id','sponsor_id']) for f in glob.glob('/home/jayaramdas/anaconda3/df/s11?_s_b')))
column_list = ['date', 'bill_id']
df = df.set_index(column_list, drop = True)
df = df['sponsor_id'].value_counts()
df.columns=['sponsor', 'num_bills']
print (df)
The value counts are not assigned to the column headers I specified, 'sponsor' and 'num_bills'. I get the following output when printing the head:
1036 426
791 408
1332 401
1828 388
136 335
Name: sponsor_id, dtype: int64
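`value_counts()` returns a Series, so assigning to `.columns` does not give it column headers. One way to get a two-column DataFrame, sketched here on made-up `sponsor_id` values standing in for the concatenated CSV data, is to name the index and then turn it into a column with `reset_index`:

```python
import pandas as pd

# Made-up sponsor ids standing in for the real concatenated data
df = pd.DataFrame({"sponsor_id": [1036, 791, 1036, 1332, 1036, 791]})

# value_counts() gives a Series indexed by sponsor_id;
# rename_axis names that index, reset_index(name=...) names the counts column
counts = (
    df["sponsor_id"]
    .value_counts()
    .rename_axis("sponsor")
    .reset_index(name="num_bills")
)
print(counts)
```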
I have the following df:
tz.head()
state 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
0 AL 5.7 4.5 4.0 4.0 5.7 11.0 10.5 9.6 8.0 7.2 6.8 6.1
1 AK 7.5 6.9 6.6 6.3 6.7 7.7 7.9 7.6 7.1 6.9 6.9 6.5
2 AZ 5.0 4.7 4.2 3.9 6.2 9.9 10.4 9.5 8.3 7.7 6.8 6.1
3 AR 5.7 5.2 5.2 5.3 5.5 7.8 8.2 8.3 7.6 7.3 6.1 5.2
4 CA 6.2 5.4 4.9 5.4 7.3 11.2 12.2 …
I have the following file.txt (abridged):
SICcode Catcode Category SICname MultSIC
0111 A1500 Wheat, corn, soybeans and cash grain Wheat X
0112 A1600 Other commodities (incl rice, peanuts) Rice X
0115 A1500 Wheat, corn, soybeans and cash grain Corn X
0116 A1500 Wheat, corn, soybeans and cash grain Soybeans X
0119 A1500 Wheat, corn, soybeans and cash grain Cash grains, NEC X
0131 A1100 Cotton Cotton X
0132 A1300 Tobacco & Tobacco products Tobacco X
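A tab-delimited file like file.txt can be read by passing the tab character itself, `"\t"`, as `sep`. A multi-character `sep` such as `'Tab'` is treated as a regular expression and matched literally, so a file that never contains the word "Tab" ends up in a single column. In the sketch below, the in-memory sample, the leading byte-order mark and `encoding="utf-8-sig"` are assumptions (a stray BOM is one common reason a header appears as `?SICcode`); `dtype` keeps leading zeros such as `0111`:

```python
import io
import pandas as pd

# Hypothetical stand-in for file.txt: tab-separated, starting with a UTF-8 BOM
raw = (
    "\ufeffSICcode\tCatcode\tCategory\tSICname\tMultSIC\n"
    "0111\tA1500\tWheat, corn, soybeans and cash grain\tWheat\tX\n"
    "0112\tA1600\tOther commodities (incl rice, peanuts)\tRice\tX\n"
).encode("utf-8")

sic = pd.read_csv(
    io.BytesIO(raw),          # use 'file.txt' here for the real file
    sep="\t",                 # the tab character, not the word 'Tab'
    encoding="utf-8-sig",     # drops a leading BOM if one is present
    dtype={"SICcode": str},   # keep leading zeros in the codes
)
print(sic.columns.tolist())
```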
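I'm having some trouble reading file.txt into a pandas df. I tried pd.read_csv with the following specifications, engine='python', sep='Tab', but it returned the file in a single column: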
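?SICcode …
I'm working with a JSON file in Python. I'm trying to print objects nested in an array. I want to print selected objects (such as "name" and "thomas_id") from the following array (is an array considered a list of 'objects'? Would it be called the "cosponsors" array?):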
"cosponsors": [
{
"district": null,
"name": "Akaka, Daniel K.",
"sponsored_at": "2011-01-25",
"state": "HI",
"thomas_id": "00007",
"title": "Sen",
"withdrawn_at": null
},
.
.
.
{
"district": null,
"name": "Lautenberg, Frank R.",
"sponsored_at": "2011-01-25",
"state": "NJ",
"thomas_id": "01381",
"title": "Sen",
"withdrawn_at": null
}
]
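The problem is that I don't know the syntax for printing objects in an array (a list?). I've tried many variations inferred from what I found on Stack Overflow, i.e. variations of the following: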
print(data['cosponsors']['0']['thomas_id'])
I get the error "list indices must be integers or slices, not str".
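A JSON array loads as a Python list, so its elements are reached by integer position (`0`, not `'0'`), and each element here loads as a dict keyed by strings. The sketch below uses a trimmed, hypothetical stand-in for `s2_data.json` and shows both a single lookup and a loop that pulls the same fields from every cosponsor:

```python
import json

# Trimmed, hypothetical stand-in for s2_data.json
data = json.loads("""
{"cosponsors": [
  {"name": "Akaka, Daniel K.", "thomas_id": "00007", "state": "HI"},
  {"name": "Lautenberg, Frank R.", "thomas_id": "01381", "state": "NJ"}
]}
""")

cosponsors = data["cosponsors"]          # JSON array -> Python list
first_id = cosponsors[0]["thomas_id"]    # integer index, then a dict key
print(first_id)

# Pull the same two fields out of every element of the list
pairs = [(c["name"], c["thomas_id"]) for c in cosponsors]
for name, thomas_id in pairs:
    print(name, thomas_id)
```

For the 3000-file case, the same comprehension could run inside a loop over the file paths, loading each file with `json.load`.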
Background:
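I have more than 3000 JSON files, contained in a so-called master file. I only need the same specific parts of each file, which I will later export into a MySQL DB, but that's another topic (or is it, i.e. am I approaching this the wrong way?). So I'm writing code that I can run on all the files to get the data I need. Considering I have no programming experience, I've been doing fairly well. I've been using the following code in Python: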
import json
data = json.load(open('s2_data.json', 'r'))
print (data["official_title"], data["number"], data["introduced_at"],
data["bill_id"], data['subjects_top_term'], data['subjects'],
data['summary']['text'], data['sponsor']['thomas_id'],
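data['sponsor']['state'], data['sponsor']['name'], …
I get the following output from a pooled OLS regression in pandas. The only problem is that I'm not sure where the intercept is. A regression always has an intercept, usually listed before the exogenous variables, i.e. Y = a + β1*x1 + β2*x2 + error_term, but I don't see one in my regression. I used ayhan's suggestion, X = add_constant(X), but somehow I feel I've messed up the syntax (in an obvious way). I know this isn't rocket science. Can someone tell me what I'm missing?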
import numpy as np
import pandas as pd
from pandas import DataFrame, Series
import statsmodels.formula.api as sm
from sklearn.linear_model import LinearRegression
import scipy, scipy.stats
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')
from statsmodels.api import add_constant
X = add_constant(X)
Y = df['billsum_support']
X = df[['direct_expenditures','indirect_expenditures', 'years_exp', 'leg_totalbills',\
'log_diff_rgdp', 'unemployment', 'expendituresfor']]
result = sm.OLS( Y, X ).fit()
result.summary()
OLS Regression Results
Dep. Variable: billsum_support    R-squared: 0.663
Model: …