Posts by use*_*999

HTML scraping with lxml and requests raises a Unicode error

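I am trying to use an HTML scraper like the one provided here. It works for the example they give. However, when I try it on my own web page, I get this error: "Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration." I tried searching Google but could not find a solution. I would really appreciate any help. I would also like to know whether there is a way to copy the page as HTML using Python.

Edit: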

from lxml import html
import requests
page = requests.get('http://cancer.sanger.ac.uk/cosmic/gene/analysis?ln=PTEN&ln1=PTEN&start=130&end=140&coords=bp%3AAA&sn=&ss=&hn=&sh=&id=15#')
tree = html.fromstring(page.text)


Thanks.
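The error message itself asks for bytes input, so one likely fix is to pass lxml the undecoded response body (`page.content`) instead of the decoded string (`page.text`). The sketch below assumes that is the cause here; `parse_page`, `fetch_tree`, and the sample document are hypothetical names and data used only for illustration.

```python
from lxml import html


def parse_page(raw_bytes):
    """Parse raw response bytes; lxml reads the encoding declaration itself."""
    return html.fromstring(raw_bytes)


def fetch_tree(url):
    """Download url and parse the undecoded body (page.content, not page.text)."""
    import requests  # imported lazily so the offline demo below needs only lxml
    page = requests.get(url)
    return parse_page(page.content)


# Offline stand-in for a response body that starts with an XML declaration.
sample = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
          b'<html><head><title>PTEN</title></head>'
          b'<body><p>mutations</p></body></html>')
tree = parse_page(sample)
title = tree.findtext('.//title')
print(title)
```

With the code from the question, this would mean replacing `html.fromstring(page.text)` with `html.fromstring(page.content)`.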

html python unicode lxml web-scraping

use*_*999

2014-07-30

21
votes

1
answer

7965
views

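Reverse complement of a DNA strand using Python

I have a DNA sequence and want to get its reverse complement using Python. It is in one column of a CSV file, and I want to write the reverse complement to another column of the same file. The tricky part is that some cells contain characters other than A, T, G and C. I was able to get the reverse complement with this code: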

def complement(seq):
    # Map each base to its Watson-Crick partner.
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    bases = list(seq)
    bases = [complement[base] for base in bases]
    return ''.join(bases)

def reverse_complement(s):
    # Reverse the sequence first, then complement it.
    return complement(s[::-1])

print("Reverse Complement:")
print(reverse_complement("TCGGGCCC"))


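However, when I try to detect items that are not in the complement dictionary, using the code below, I only get the complement of the last base. It does not iterate over the rest. I would like to know how to fix it.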

def complement(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'} 
    bases = list(seq) 
    for element in bases:
        if element not in complement:
            print(element)
        letters = [complement[base] for base in element] 
        return ''.join(letters)
def reverse_complement(seq):
    return complement(seq[::-1])

print("Reverse Complement:")
print(reverse_complement("TCGGGCCCCX"))
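
The loop above returns during its first pass, so only one base (the last one of the original sequence, since the input was reversed) is ever processed. A minimal sketch of one way around this is to build the whole result before returning and to look bases up with `dict.get`. Substituting `N` for unknown characters, the helper names, and the CSV column names `sequence` and `reverse_complement` are all assumptions made for illustration, not part of the original question.

```python
import csv
import io

# Watson-Crick pairs only; anything else is treated as "unknown".
COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}


def complement(seq, unknown='N'):
    """Complement every base; bases missing from COMPLEMENT become `unknown`."""
    # The join consumes the whole sequence before the function returns,
    # so it no longer exits after the first base.
    return ''.join(COMPLEMENT.get(base, unknown) for base in seq)


def reverse_complement(seq, unknown='N'):
    """Reverse the sequence, then complement it."""
    return complement(seq[::-1], unknown)


def unexpected_bases(seq):
    """Return the characters of seq that are not A, C, G or T, in order."""
    return [base for base in seq if base not in COMPLEMENT]


def add_reverse_complement_column(src, dst, seq_col='sequence',
                                  out_col='reverse_complement'):
    """Copy CSV rows from src to dst, appending out_col (hypothetical names)."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames) + [out_col])
    writer.writeheader()
    for row in reader:
        row[out_col] = reverse_complement(row[seq_col])
        writer.writerow(row)


print(reverse_complement("TCGGGCCC"))    # GGGCCCGA
print(reverse_complement("TCGGGCCCCX"))  # NGGGGCCCGA
print(unexpected_bases("TCGGGCCCCX"))    # ['X']

# In-memory stand-in for the CSV file mentioned in the question.
src = io.StringIO("id,sequence\n1,TCGGGCCC\n2,TCGGGCCCCX\n")
dst = io.StringIO()
add_reverse_complement_column(src, dst)
print(dst.getvalue())
```

For a real file, the same function could be given file objects opened with `newline=''`, writing to a new file rather than editing the input in place.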


python list bioinformatics dna-sequence biopython

use*_*999

2019-01-29

6
votes

4
answers

40k
views