Posts by Mar*_*k K

Parsing web pages with BeautifulSoup: skipping 404 error pages

I use the code below to get the titles of websites.

from bs4 import BeautifulSoup
import urllib2

line_in_list = ['www.dailynews.lk','www.elpais.com','www.dailynews.co.zw']

for websites in line_in_list:
    url = "http://" + websites
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())
    site_title = soup.find_all("title")
    print site_title

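If the list contains a "bad" (non-existent) website or page, or a site returns some kind of error such as "404 page not found", the script breaks and stops.

How can I make the script ignore or skip the "bad" (non-existent) and problematic websites and pages?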
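
One common way to keep the loop going is to wrap the request in try/except and move on to the next site when it fails. Below is a minimal sketch under some assumptions: it targets Python 3 (where `urllib2` became `urllib.request` and `urllib.error`), and to stay dependency-free it reads the title with the standard library's `html.parser` rather than BeautifulSoup. The same try/except could wrap `urllib2.urlopen(url)` in the original loop, catching `urllib2.URLError`. The names `TitleParser`, `fetch_html`, and `site_titles` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.error import URLError
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self.in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def fetch_html(url, timeout=10):
    """Download a page and return its body as text."""
    with urlopen(url, timeout=timeout) as page:
        return page.read().decode("utf-8", errors="replace")


def site_titles(sites, fetch=fetch_html):
    """Return {site: title}, skipping every site that cannot be fetched."""
    titles = {}
    for site in sites:
        url = "http://" + site
        try:
            html = fetch(url)
        except (URLError, OSError) as exc:
            # HTTPError (404, 500, ...) is a subclass of URLError;
            # timeouts and connection failures are OSError subclasses.
            print("Skipping %s: %s" % (url, exc))
            continue
        parser = TitleParser()
        parser.feed(html)
        titles[site] = parser.title
    return titles
```

Calling `site_titles(['www.dailynews.lk', 'www.elpais.com', 'www.dailynews.co.zw'])` would return a dict holding only the titles that could be fetched; each failing site is reported and skipped instead of stopping the script.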

python beautifulsoup web-scraping

Score: 2, answers: 1, views: 3236

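R: free maps similar to ggmap

When doing network analysis, I want to plot a network graph on a map. ggmap seems to be the usual choice, but it requires API access.

Are there any free, equivalent alternatives to ggmap that do not require API access?

Thank you.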

r graph ggmap network-analysis

Score: 2, answers: 1, views: 4413

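Pandas: column contents into new columns, alongside the other original columns

I have a table like the one shown below and want to create a new table from it, using the values in the "Color" column.

[image of the table, not preserved]

I tried: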

import pandas as pd
import functools

data = {'Seller': ["Mike","Mike","Mike","Mike","David","David","Pete","Pete","Pete"], 
'Code' : ["9QBR1","9QBR1","9QBW2","9QBW2","9QD1X","9QD1X","9QEBO","9QEBO","9QEBO"],
'From': ["2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03"],
'Color_date' : ["2020-02-14","2020-02-14","2020-05-18","2020-05-18","2020-01-04","2020-01-04","2020-03-04","2020-03-13","2020-01-28"],
'Color' : ["Blue","Red","Red","Grey","Red","Grey","Blue","Orange","Red"],
'Delivery' : ["Nancy","Nancy","Kate","Kate","Lilly","Lilly","John","John","John"]}

df = pd.DataFrame(data)

df_1 = df.set_index([df.index, 'Color'])['Color_date'].unstack()
df_1['Code'] = df['Code']

final_df = functools.reduce(lambda left,right: pd.merge(left,right,on='Code'), [df, df_1])

"df_1" looks fine, but "final_df" is much longer than expected.

What went wrong, and how can I correct it? Thank you.
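
A likely reason `final_df` grows: `Code` is repeated in both `df` and `df_1`, so merging on it pairs every matching row on the left with every matching row on the right. A code that appears n times contributes n × n rows, which comes to 21 rows for this data. The sketch below assumes the goal is one row per `Code` with one column per color; the target table from the image is not available, so treat that shape as an assumption rather than the confirmed expected output.

```python
import pandas as pd

data = {
    'Seller': ["Mike", "Mike", "Mike", "Mike", "David", "David", "Pete", "Pete", "Pete"],
    'Code': ["9QBR1", "9QBR1", "9QBW2", "9QBW2", "9QD1X", "9QD1X", "9QEBO", "9QEBO", "9QEBO"],
    'From': ["2020-01-03"] * 9,
    'Color_date': ["2020-02-14", "2020-02-14", "2020-05-18", "2020-05-18", "2020-01-04",
                   "2020-01-04", "2020-03-04", "2020-03-13", "2020-01-28"],
    'Color': ["Blue", "Red", "Red", "Grey", "Red", "Grey", "Blue", "Orange", "Red"],
    'Delivery': ["Nancy", "Nancy", "Kate", "Kate", "Lilly", "Lilly", "John", "John", "John"],
}
df = pd.DataFrame(data)

# Row count of a merge on 'Code' when each Code appears n times on both sides.
rows_from_many_to_many = int((df["Code"].value_counts() ** 2).sum())

# One row per Code, one column per Color, holding the matching Color_date.
colors = df.pivot(index="Code", columns="Color", values="Color_date")

# The remaining columns, reduced to one row per Code.
base = df.drop(columns=["Color", "Color_date"]).drop_duplicates()

# Attach the color columns by looking up each Code in the pivoted index.
final_df = base.join(colors, on="Code").reset_index(drop=True)
print(final_df)
```

Note that `pivot` raises an error if a `Code` carries the same `Color` twice; in that case `pivot_table` with an explicit aggregation would be one option.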

python dataframe pandas

Score: 2, answers: 1, views: 49

Python: sorting two corresponding lists by the values of one of them

There are two lists with a one-to-one correspondence.

names = ["David", "Peter", "Kate", "Lucy", "Kit", "Jason", "Judy"]
scores = [1,1,0.8,0.2,0.4,0.1,0.6]

I want to show the people who scored more than 0.5, all on one line:

Peter (1 point), David (1 point), Kate (0.8 point), Judy (0.6 point)

What I tried:

import operator

names = ["David", "Peter", "Kate", "Lucy", "Kit", "Jason", "Judy"]
scores = [1,1,0.8,0.2,0.4,0.1,0.6]

dictionary = dict(zip(names, scores))

dict_sorted = sorted(dictionary.items(), key=operator.itemgetter(1), reverse=True)

print dict_sorted

It gives:

[('Peter', 1), ('David', 1), ('Kate', 0.8), ('Judy', 0.6), ('Kit', 0.4), ('Lucy', 0.2), ('Jason', 0.1)]

How can I get from there to the desired result? Note: the result must be sorted from highest to lowest.

Two longer lists for testing:

names = ["Olivia","Charlotte","Khaleesi","Cora","Isla","Isabella","Aurora","Amelia","Amara","Penelope","Audrey","Rose","Imogen","Alice","Evelyn","Ava","Irma","Ophelia","Violet"]
scores = [1.0, 1.0, 0.8, 0.2, 0.2, 0.4, …
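
One possible approach is to filter the zipped pairs first, sort them by score in descending order, and join the formatted pieces into a single line. This is only a sketch, written for Python 3 and using the short lists from the question; `format_top` and its `threshold` parameter are hypothetical names. Working on the pairs directly, rather than a dict, also keeps people who share a name.

```python
names = ["David", "Peter", "Kate", "Lucy", "Kit", "Jason", "Judy"]
scores = [1, 1, 0.8, 0.2, 0.4, 0.1, 0.6]


def format_top(names, scores, threshold=0.5):
    """Return 'Name (score point), ...' for scores above threshold, highest first."""
    # Keep only the (name, score) pairs above the threshold.
    pairs = [(name, score) for name, score in zip(names, scores) if score > threshold]
    # Sort by score, largest first; ties keep their original relative order.
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return ", ".join("%s (%s point)" % (name, score) for name, score in pairs)


line = format_top(names, scores)
print(line)
```

Because the sort is stable, people with equal scores appear in the order they have in `names`, so David is listed before Peter here.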

python dictionary list

Score: 1, answers: 1, views: 92

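Multiple conditions in an 'if' statement ('and' & 'or'?)

I want to filter the lines that meet all of the following conditions:

  1. The character '/' is in the line
  2. The character ';' is in the line
  3. The character 'e' is in the line
  4. The character 'k' is not in the line
  5. The character '@' is not in the line
  6. The line is no longer than 80 characters

What I have is: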

the_list = ['C  TEE edBore 1 1/4200;',
'Cylinder SingleVerticalB HHJ e 1 1/8Cooling 1',
'EngineBore 11/1; TDT 8Length 3Width 3',
'EngineCy HEE Inline2008Bore 1',
'Height 4TheChallen TET e 1Stroke 1P 305',
'Height 8C ;0;Wall15ccG QBG ccGasEngineJ 142',
'Height EQE C ;0150ccGas2007',
'Length 10Wid ETQ Length 10Width ',
'Stro EHT oke 1 1/8Length ',
'Stroke 1 1/4HP   JII Stroke 1 1/4HP  ',
'Stroke 1Cy QTH 7Weight ; 1/2LBS',
'Weight 18LBSLength 1 DQT …
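
All six conditions have to hold at once, so they can be chained with `and`, with `not in` covering the two exclusions. A minimal sketch, using only a few entries of `the_list` because the rest is cut off in the post; `keep` and `sample` are hypothetical names.

```python
# A few entries copied from the_list in the question (the full list is truncated there).
sample = [
    'C  TEE edBore 1 1/4200;',
    'Cylinder SingleVerticalB HHJ e 1 1/8Cooling 1',
    'EngineBore 11/1; TDT 8Length 3Width 3',
    'Stroke 1Cy QTH 7Weight ; 1/2LBS',
]


def keep(line):
    """True only if the line satisfies all six conditions."""
    return (
        '/' in line
        and ';' in line
        and 'e' in line
        and 'k' not in line
        and '@' not in line
        and len(line) <= 80
    )


filtered = [line for line in sample if keep(line)]
print(filtered)
```

The second entry is dropped because it has no ';', and the last one because 'Stroke' contains a 'k'.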

python if-statement

Score: -1, answers: 1, views: 278

Selecting elements of a list in Python 2.7.6

In Python 2.7.6, I have the list below. How can I pick the items that start with "4" and have a length of 4, i.e. 4646 and 4648?

aaa = [2013, 2014, 2002, 4646, 4648, 20, 456, 5623, 'abc']

So far I can only select the items of length 4, like this:

results = []

for number in aaa:
  if len(str(number)) == 4:
      results.append(number)

print results

Thanks.

All the answers are great. But I'm a beginner, so I picked the simplest one. :)
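
Both conditions can be checked on the string form of each item, which also handles the non-numeric 'abc' entry. A small sketch that should run unchanged on Python 2.7 and Python 3:

```python
aaa = [2013, 2014, 2002, 4646, 4648, 20, 456, 5623, 'abc']

# Keep items whose string form has four characters and begins with "4".
results = [item for item in aaa
           if len(str(item)) == 4 and str(item).startswith("4")]

print(results)
```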

python list

-2
推荐指数
1
解决办法
76
查看次数