I have a nice HTML page, but I want to use XPath to select certain nodes:
<html>
........
<!-- begin content -->
<div>some text</div>
<div><p>Some more elements</p></div>
<!-- end content -->
.......
</html>
I can select the HTML after <!-- begin content --> using:
"//comment()[. = ' begin content ']/following::*"
I can also select the HTML before <!-- end content --> using:
"//comment()[. = ' end content ']/preceding::*"
But what XPath do I need to select all the HTML between the two comments?
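Testing Devise login with Capybara. Something seems to be wrong, because I cannot test the login using RSpec and Capybara. I use Factory Girl to define the user: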
FactoryGirl.define do
  factory :user do
    email 'admin@revol-tech.com.np'
    password 'bhaktapur'
    password_confirmation 'bhaktapur'
    admin true
    name 'admin'
    confirmation_sent_at "#{DateTime.now}"
    confirmation_token 'anupsumhikichiki'
    confirmed_at "#{DateTime.now}"
    username 'username'
  end
end
Here is my spec_helper.rb:
# This file is copied to spec/ when you run 'rails generate rspec:install'
ENV["RAILS_ENV"] ||= 'test'
require File.expand_path("../../config/environment", __FILE__)
require 'rspec/rails'
require 'rspec/autorun'
require 'capybara/rspec'
require 'database_cleaner'
# FactoryGirl.find_definitions
Capybara.current_driver = :selenium
# Requires supporting ruby files with custom matchers and macros, etc,
# in spec/support/ and its subdirectories.
Dir[Rails.root.join("spec/support/**/*.rb")].each {|f| require f} …
I have some HTML pages where the content to be extracted is marked with HTML comments, as shown below.
<html>
.....
<!-- begin content -->
<div>some text</div>
<div><p>Some more elements</p></div>
<!-- end content -->
...
</html>
I am using Nokogiri and trying to extract the HTML between the <!-- begin content --> and <!-- end content --> comments.
I want to extract the complete elements between these two HTML comments:
<div>some text</div>
<div><p>Some more elements</p></div>
I can get a plain-text version using this characters callback:
class TextExtractor < Nokogiri::XML::SAX::Document
  def initialize
    @interesting = false
    @text = ""
    @html = ""
  end

  def comment(string)
    case string.strip # strip leading and trailing whitespaces
    when /^begin content/ # match starting comment
      @interesting = true
    when /^end content/
      @interesting = false # …