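I have a Python script that scrapes the src attribute of a <video> element from an HTML page. Using the browser inspector on the video on this page, I can see the video element I need to scrape, but viewing the page source directly shows only the Ember application's JavaScript files.
What do I need to do to access the "inner frame" markup that holds the <video> element, so that I can scrape the src attribute?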
Edit: narrowed the question so it is not so broad.
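There is no need to go the full browser/Selenium route. A little investigation shows how the page works:
For the Vine URL https://vine.co/v/i3pQ70vK3iv, you need the JSON file that describes the video.
So simply request the URL https://archive.vine.co/posts/i3pQ70vK3iv.json. It returns a file like this: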
{
"username": "Bleacher Report",
"userIdStr": "906307026416705536",
"postId": 1352573572862066700,
"verified": 1,
"description": "",
"created": "2016-06-09T06:14:43.000000",
"permalinkUrl": "https://vine.co/v/i3pQ70vK3iv",
"userId": 906307026416705500,
"profileBackground": "0x333333",
"vanityUrls": [
"BleacherReport"
],
"entities": [],
"postIdStr": "1352573572862066688",
"comments": 293,
"reposts": 2384,
"videoLowURL": "http://mtc.cdn.vine.co/r/videos_r2/DC69CF91B61352573549554077696_558739dd749.17.0.4126553130190094381.mp4?versionId=oVIxbcFKL5aaqsbMx_q.7wt4zEnhgQ0w",
"loops": 19182516,
"videoUrl": "http://mtc.cdn.vine.co/r/videos/DC69CF91B61352573549554077696_558739dd749.17.0.4126553130190094381.mp4?versionId=av0W8OaLWSzghq.9__iKdSU4y75FDNg.",
"videoDashUrl": "http://mtc.cdn.vine.co/r/videos_dashhd/DC69CF91B61352573549554077696_558739dd749.17.0.4126553130190094381.mp4?versionId=98zVYTYAx16DJka7Oa1yQu20utGrQch9",
"thumbnailUrl": "http://v.cdn.vine.co/r/thumbs/DC69CF91B61352573549554077696_558739dd749.17.0.4126553130190094381.mp4.jpg?versionId=7LmJNEI3C6bsHkF3t9jqu5k1O2xEHo9l",
"explicitContent": 0,
"likes": 6593
}
The URL of the video itself is the videoUrl property of the returned JSON.
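The steps above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: it assumes the archive endpoint follows the pattern of the single example URL above, that the post ID is the last path segment of the Vine permalink, and that the endpoint is still reachable. The function names are hypothetical helpers introduced for illustration, and only the standard library is used.

```python
import json
from urllib.parse import urlparse
from urllib.request import urlopen

# Assumption: every post follows the pattern of the one example URL in the answer.
ARCHIVE_TEMPLATE = "https://archive.vine.co/posts/{post_id}.json"


def post_id_from_url(vine_url):
    """Return the last path segment of a Vine permalink, e.g. 'i3pQ70vK3iv'."""
    path = urlparse(vine_url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


def metadata_url(vine_url):
    """Build the URL of the JSON file that describes the video."""
    return ARCHIVE_TEMPLATE.format(post_id=post_id_from_url(vine_url))


def video_url_from_metadata(metadata):
    """Pick the videoUrl property out of the decoded JSON document."""
    return metadata["videoUrl"]


def fetch_video_url(vine_url, timeout=10):
    """Download the JSON metadata and return the video's URL (needs network access)."""
    with urlopen(metadata_url(vine_url), timeout=timeout) as response:
        metadata = json.loads(response.read().decode("utf-8"))
    return video_url_from_metadata(metadata)
```

Calling fetch_video_url("https://vine.co/v/i3pQ70vK3iv") should return the videoUrl value from the JSON, provided the endpoint still responds. No HTML parsing is involved, so BeautifulSoup is not needed for this step.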