Cis*_*idx 1 javascript node.js cheerio
I want to scrape only Jung Ho Kang and 5 from this website and turn them into an object. I want to exclude the (R) and the SS.
<td id="lineup-table-top">
<b class="text-muted pad-left-10">5</b>
Jung Ho Kang
<small class="text-muted">(R)</small>
<small class="text-muted">SS</small>
</td>
Here is my code:
var someObjArr = [];
$('td#lineup-table-top').each(function(i, element){
  // Get the text from cheerio.
  var text = $(this).text();
  // If undefined, create the object inside of our array.
  if(someObjArr[i] == undefined){
    someObjArr[i] = {};
  };
  // Update the name property of our object with the text value.
  someObjArr[i].name = text;
  $('b.pad-left-10').each(function(i, element){
    // Get the text from cheerio.
    var text = $(this).text();
    // If undefined, create the object inside of our array.
    if(someObjArr[i] == undefined){
      someObjArr[i] = {};
    };
    // Update the batting property of our object with the text value.
    someObjArr[i].batting = text;
  });
});
The exact output of the code is:
{ batting: '5',
name: '5 Jung Ho Kang (R) SS 3B' }
{ name: '5 Jung Ho Kang (R) SS' },
Expected output:
{ batting: '5',
name: 'Jung Ho Kang' }
I don't know why it seems to loop twice, and I can't figure out how to isolate the name when it has no class or id associated with it.
Any direction is greatly appreciated.
It looks like you just want to grab the text nodes in the markup.
https://github.com/cheeriojs/cheerio/issues/359
I'm not sure whether nodeType is supported, but you should try it first. (nodeType docs)
$('td#lineup-table-top').each(function(i, cell){
  someObjArr[i] = someObjArr[i] || {};
  $(cell).contents().each(function(j, element){
    // The <b class="pad-left-10"> element in #lineup-table-top is the batting order
    if ( $(element).hasClass('pad-left-10') ) {
      someObjArr[i].batting = $(element).text().trim();
    }
    // The raw, non-empty text directly inside #lineup-table-top is the player name
    if ( element.nodeType === 3 && $(element).text().trim() ) {
      someObjArr[i].name = $(element).text().trim();
    }
  });
});
If it isn't supported, you can fall back to element.type:
// Skip whitespace-only text nodes so they don't overwrite the name
if ( element.type === 'text' && $(element).text().trim() ) {
  someObjArr[i] = someObjArr[i] || {};
  someObjArr[i].name = $(element).text().trim();
}
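To make the text-node idea concrete without depending on a particular cheerio version, here is a minimal sketch that works on plain node objects. It assumes the shape cheerio's parser exposes (type, name, data, attribs, children). The helper parseLineupCell and the hand-built sampleCell tree are hypothetical names for illustration, not part of cheerio.

```javascript
// Hypothetical helper (not a cheerio API): reads one lineup <td> node.
// Assumes htmlparser2-style nodes: { type, name, data, attribs, children }.
function parseLineupCell(cell) {
  var player = {};
  (cell.children || []).forEach(function (node) {
    if (node.type === 'tag' && node.name === 'b') {
      // Batting order: join the text nodes inside the <b> element
      player.batting = (node.children || [])
        .filter(function (c) { return c.type === 'text'; })
        .map(function (c) { return c.data; })
        .join('')
        .trim();
    } else if (node.type === 'text' && node.data.trim()) {
      // The first non-empty text node directly inside the cell is the name
      if (player.name === undefined) {
        player.name = node.data.trim();
      }
    }
  });
  return player;
}

// Hand-built tree mirroring the <td> from the question (assumed shape)
var sampleCell = {
  type: 'tag', name: 'td', attribs: { id: 'lineup-table-top' },
  children: [
    { type: 'text', data: '\n' },
    { type: 'tag', name: 'b', attribs: { 'class': 'text-muted pad-left-10' },
      children: [{ type: 'text', data: '5' }] },
    { type: 'text', data: '\nJung Ho Kang\n' },
    { type: 'tag', name: 'small', attribs: { 'class': 'text-muted' },
      children: [{ type: 'text', data: '(R)' }] },
    { type: 'text', data: '\n' },
    { type: 'tag', name: 'small', attribs: { 'class': 'text-muted' },
      children: [{ type: 'text', data: 'SS' }] },
    { type: 'text', data: '\n' }
  ]
};

var player = parseLineupCell(sampleCell);
console.log(player);
```

With cheerio, the raw node for a cell should be available through $('td#lineup-table-top').get(0), so the same helper could be called on it. The (R) and SS values are skipped because they live inside <small> elements rather than directly in the cell.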
I've used this in the past to scrape the text from an entire page of markup.
// For each DOM element in the page
$('*').each(function(i, element) {
  // Scrape only the text nodes
  $(element).contents().each(function(i, element) {
    if (element.type === 'text') {
      // element.data holds the raw text of this node
    }
  });
});
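The same page-wide scrape can also be sketched as a recursive walk over the parsed tree, collecting every non-empty text node along the way. This is only an illustration: collectText and the small tree below are hypothetical, and the node shape (type, data, children) is assumed to match what cheerio's parser produces.

```javascript
// Hypothetical helper (not a cheerio API): depth-first walk that gathers
// the trimmed contents of every non-empty text node under `node`.
function collectText(node, out) {
  out = out || [];
  if (node.type === 'text') {
    var text = node.data.trim();
    if (text) {
      out.push(text);
    }
  }
  (node.children || []).forEach(function (child) {
    collectText(child, out);
  });
  return out;
}

// Small hand-built tree standing in for a parsed page (assumed shape)
var tree = {
  type: 'tag', name: 'td',
  children: [
    { type: 'tag', name: 'b', children: [{ type: 'text', data: '5' }] },
    { type: 'text', data: '\nJung Ho Kang\n' },
    { type: 'tag', name: 'small', children: [{ type: 'text', data: '(R)' }] },
    { type: 'text', data: '\n' },
    { type: 'tag', name: 'small', children: [{ type: 'text', data: 'SS' }] }
  ]
};

var texts = collectText(tree);
console.log(texts);
```

Unlike the $('*') version, this visits each text node exactly once and keeps document order. With cheerio, passing $.root().get(0) as the starting node should give the whole page, assuming the root node exposes children in the same way.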