I have a dictionary that holds a DataFrame for each state in "data.csv":

import pandas as pd

df = pd.read_csv('data.csv')
dict_of_st = {k: v for k, v in df.groupby('Preferred State/Province')}
I want to write each DataFrame to a separate Excel sheet in an already existing workbook ('test.xlsx'). I tried a for loop together with load_workbook:
from openpyxl import load_workbook
book = load_workbook('test.xlsx')
writer = pd.ExcelWriter('test.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
for i in dict_of_st:
    i.to_excel(writer, sheet_name=i)

writer.save()
But Jupyter Notebook raises this error:
AttributeError                            Traceback (most recent call last)
<ipython-input-8-c1ba1b4d53d8> in <module>
      7
      8 for i in dict_of_st:
----> 9 i.to_excel(writer, sheet_name=i)
     10
     11 writer.save()

AttributeError: 'str' object has no attribute 'to_excel'
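The traceback points at the real problem: iterating over a dict yields its keys, so i is the state name (a string), and strings have no to_excel method. A minimal sketch of the likely fix, iterating over .items() so each DataFrame is written under its state name (the 31-character slice is an assumption to respect Excel's sheet-name limit):

from openpyxl import load_workbook
import pandas as pd

book = load_workbook('test.xlsx')
writer = pd.ExcelWriter('test.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

# .items() yields (key, value) pairs: the state name and its DataFrame
for state, frame in dict_of_st.items():
    # Excel caps sheet names at 31 characters
    frame.to_excel(writer, sheet_name=str(state)[:31])

writer.save()

Note that on recent pandas versions the writer.book assignment is no longer supported; opening the writer with pd.ExcelWriter('test.xlsx', engine='openpyxl', mode='a') appends to the existing workbook instead.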
I have created a simple crawler with Scrapy that starts from a given link and follows all links within a given DEPTH_LIMIT, which I adjust as a project parameter each time I run the spider. To keep things simple, the script just prints the response URL.

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from NONPROF.items import NonprofItem
from scrapy.http import Request
import re
class Nonprof(CrawlSpider):
    name = "my_scraper"
    allowed_domains = ["stackoverflow.com"]
    start_urls = ["https://stackoverflow.com"]

    rules = [
        Rule(LinkExtractor(allow=['.*']),
             callback='parse_item',
             follow=True)
    ]

    def parse_item(self, response):
        print(response.url)
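Because DEPTH_LIMIT changes on every run, it is convenient to override it from the command line rather than editing the project settings each time; Scrapy's -s flag sets any setting for that run (the value 2 below is just an example):

scrapy crawl my_scraper -s DEPTH_LIMIT=2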
My current goal is to parse all of the visible text within a given depth of the start URL and use that data for topic modeling. I have done something similar in the past with BeautifulSoup, and I would like to leverage the following parsing logic inside my crawler:
from bs4 import BeautifulSoup
import bs4 as bs
import urllib.request

def tag_visible(element):
    if element.parent.name in ['style', 'script', 'head', 'title', 'meta', '[document]']:
        return …
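Putting the two parts together, here is a minimal sketch of how that filter could drive the crawler, assuming the usual bs.Comment check completes the truncated tag_visible above and using a hypothetical extract_visible_text helper; parse_item would then hand the joined text to the topic-modeling step instead of just printing it:

from bs4 import BeautifulSoup
import bs4 as bs

def tag_visible(element):
    # Text nodes inside these containers, or inside HTML comments, are not rendered
    if element.parent.name in ['style', 'script', 'head', 'title', 'meta', '[document]']:
        return False
    if isinstance(element, bs.Comment):
        return False
    return True

def extract_visible_text(html):
    # Collect every text node, keep the visible ones, and join them into one string
    soup = BeautifulSoup(html, 'html.parser')
    texts = soup.find_all(string=True)
    return ' '.join(t.strip() for t in texts if tag_visible(t) and t.strip())

# Inside the spider, the callback becomes:
def parse_item(self, response):
    print(response.url)
    print(extract_visible_text(response.text))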