Posts by Gan*_*alf

Invalid encoding in file names when calling wget from C

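I want to download web pages with wget from a C program. I have written the code below, but when I run it, each downloaded page is saved under a name that only begins with the name I gave it; the rest of the file name is reported as invalid encoding.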

The file names look like this:

test0L???i}?X?????L???????R?td]?{??+`??U{?@ (invalid encoding)

The relevant part of my program is this:

#define PAGE "http://deckbox.org/games/mtg/cards?p="

char *cat_url(char *s1, char *s2)
{
    char *tmp;
    tmp = (char*)malloc(sizeof(char*) * (strlen(s1) + strlen(s2)));
    strcat(tmp, s1);
    strcat(tmp, s2);
    return tmp;
}

void get_card_name(char *pg_name)
{
    int i;
    int fk;
    char *args[6], tmp;

    for (i = 0; i < 8; i++) {
        tmp = itoa(i);
        args[0] = "wget";
        args[1] = "-q";
        args[2] = cat_url(PAGE, &tmp);
        args[3] = "-O";
        args[4] = cat_url("test", &tmp);
        args[5] = NULL;

        if (fork()) {
            wait(&fk);
        } else …
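
One plausible reading of the symptom, offered as a sketch rather than as the accepted answer: `cat_url` calls `strcat` on a buffer that `malloc` has not initialized, and `&tmp` points at a single `char` rather than a NUL-terminated string, so the bytes after `test0` are likely whatever happened to follow in memory. The version below assumes the page number is formatted with `snprintf` (`itoa` is not part of standard C); `page_arg` is a hypothetical helper name that does not appear in the original code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Concatenate two NUL-terminated strings into a newly allocated buffer.
 * Returns NULL if allocation fails; the caller frees the result. */
char *cat_url(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);

    size_t len1 = strlen(s1);
    size_t len2 = strlen(s2);

    /* +1 for the terminating NUL; sizeof(char) is 1 by definition. */
    char *out = malloc(len1 + len2 + 1);
    if (out == NULL)
        return NULL;

    /* Copy to known offsets so the buffer never has to be pre-initialized. */
    memcpy(out, s1, len1);
    memcpy(out + len1, s2, len2 + 1); /* also copies the NUL that ends s2 */
    return out;
}

/* Hypothetical helper (not in the original): append a page number to a prefix. */
char *page_arg(const char *prefix, int page)
{
    char num[16]; /* enough for any 32-bit int, its sign, and the NUL */
    snprintf(num, sizeof num, "%d", page);
    return cat_url(prefix, num);
}
```

Under these assumptions, the loop body would set `args[2] = page_arg(PAGE, i);` and `args[4] = page_arg("test", i);` in place of the `itoa` and `&tmp` calls, and the parent process would `free` both strings after `wait` returns.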

c wget

Gan*_*alf

lucky-day

2
votes

1
answer

178
views

Extracting attributes from HTML with lxml

I am using lxml to retrieve tag attributes from an HTML page. The HTML page is structured like this:

<div class="my_div">
    <a href="/foobar">
        <img src="my_img.png">
    </a>
</div>

The Python script I use to retrieve the URL inside the <a> tag and the src value of the <img> tag inside the same <div> is this:

from lxml import html 

...
tree = html.fromstring(page.text)
for element in tree.xpath('//div[contains(@class, "my_div")]//a'):
    href = element.xpath('/@href')
    src = element.xpath('//img/@src')

Why don't I get the strings?

html python lxml

Gan*_*alf

2014 11-22

2
votes

2
answers

8056
views