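I have been trying to use Scrapy (XPath) to extract the data inside a script tag from Kbb's HTML, but my main problem is identifying the correct div and script tag. I am new to XPath and would greatly appreciate any help!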
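One possible approach, offered as a rough sketch rather than a definitive answer: select the inline `<script>` element by its text content instead of by a surrounding div, then read individual fields out of that text with a regular expression. The XPath in the comment assumes the target script still contains the string `PricingOverview.Buyers.setup`, as in the markup quoted in this question. `extract_fields` and `FIELD_RE` are hypothetical names introduced for illustration, and the sample text is a shortened copy of that markup so the sketch runs without Scrapy installed.

```python
import re

# In a Scrapy callback the script body could be selected by its content, e.g.
#   script_text = response.xpath(
#       '//script[contains(., "PricingOverview.Buyers.setup")]/text()').get()
# A shortened copy of that text stands in here so the sketch is self-contained.
script_text = '''
$(document).ready(function() {
KBB.Vehicle.Pages.PricingOverview.Buyers.setup({
vehicleId: "392396",
zipCode: "78701",
price: "17074",
manufacturer: "Nissan",
model: "Altima",
'''

# Matches lines of the form  key: "value"  (illustrative pattern, not a JS parser).
FIELD_RE = re.compile(r'^\s*(\w+):\s*"([^"]*)"', re.MULTILINE)


def extract_fields(text):
    """Return a dict of the simple string-valued fields found in the script."""
    return dict(FIELD_RE.findall(text))


fields = extract_fields(script_text)
print(fields["manufacturer"], fields["model"], fields["price"])
```

The regular expression only picks up lines whose value is a plain quoted string, so an entry such as `imageDir`, whose value is an expression, is skipped. A page whose script is laid out differently would need a different pattern.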
<script type="text/javascript" src="http://s1.kbb.com/combine/IncentivesPilotJs/949332058"></script>
<input type="hidden" id="ResaleValueUrl" value="/ymmt/resalevalue/?vehicleid=392396" />
<input type="hidden" id="Intent" value="buy-used" />
<!--[if lt IE 9]>
<script>
window.FlashCanvasOptions = {
swfPath: "/js/canvas/FlashCanvas/UCMarketMeter/"
};
</script>
<script type="text/javascript" src="http://s1.kbb.com/combine/YmmtMarketMeterFlashCanvasJs/795892638"></script>
<![endif]-->
<script type="text/javascript" src="http://s1.kbb.com/combine/YMMTOverview/1527402533"></script>
<script type="text/javascript" src="http://s1.kbb.com/combine/YmmtPricingOverviewBuyUsedJs/-1416499456"></script>
<script language="javascript" type="text/javascript">
$(document).ready(function() {
KBB.Vehicle.Pages.PricingOverview.Buyers.setup({
//Workaround until we get cross domain working for Flash
imageDir: window.FlashCanvasOptions ? "/Content/images" : "http://file.kelleybluebookimages.com/kbb/images/marketmeter",
vehicleId: "392396",
zipCode: "78701",
mileage: "10000",
intent: "buy-used",
priceType: "retail",
condition: "good",
options: "392396|53635|78701|100|10|",
price: "17074",
manufacturer: "Nissan",
model: "Altima", …

I am trying to split a single "character" variable in my data frame into multiple "factor" variables.
> sampledf=data.frame(vin=c('v1','v2','v3'),features=c('f1:f2:f3','f2:f4:f5','f1:f4:f5'))
> sampledf
vin features
1 v1 f1:f2:f3
2 v2 f2:f4:f5
3 v3 f1:f4:f5
> desireddf=data.frame(vin=c('v1','v2','v3'),f1=c(1,0,1),f2=c(1,1,0),f3=c(1,0,0),f4=c(0,1,1),f5=c(0,1,1))
> desireddf
vin f1 f2 f3 f4 f5
1 v1 1 1 1 0 0
2 v2 0 1 0 1 1
3 v3 1 0 0 1 1
I have already tried strsplit() to split the "features" column:
strsplit(as.character(sampledf$features), ":")
But I have had no luck turning the result into factor columns.