If a page uses infinite scrolling, how can I scrape its HTML in Java? I am currently scraping the page like this:
URL url = new URL(stringUrl);
URLConnection con = url.openConnection();
InputStream in = con.getInputStream();
String encoding = con.getContentEncoding();
encoding = encoding == null ? "UTF-8" : encoding;
String html = IOUtils.toString(in, encoding);
Document document = Jsoup.parse(html);
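But this does not return any of the content from the infinite-scroll part of the page. How can I trigger the scrolling on the HTML page so that my Jsoup document includes that part?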
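Jsoup only parses the HTML it is given and does not execute JavaScript, so content that a page loads while scrolling never reaches the document. One possible approach, assuming the page's script fetches further content from a paged endpoint (an assumption to verify in the browser's network tab), is to request those pages directly and concatenate the fragments. The sketch below shows only the collection loop; `collectPages`, the `fetchPage` function, and the fake fetcher are hypothetical stand-ins for a real HTTP call.

```java
import java.util.function.IntFunction;

public class PagedScrape {
    // Collects HTML fragments page by page until the fetcher returns an
    // empty body or maxPages is reached. fetchPage is a hypothetical
    // stand-in for an HTTP request to the endpoint the page's own script
    // calls while the user scrolls.
    public static String collectPages(IntFunction<String> fetchPage, int maxPages) {
        StringBuilder html = new StringBuilder();
        for (int page = 1; page <= maxPages; page++) {
            String fragment = fetchPage.apply(page);
            if (fragment == null || fragment.isEmpty()) {
                break; // the endpoint has no more content
            }
            html.append(fragment);
        }
        return html.toString();
    }

    public static void main(String[] args) {
        // Fake fetcher for illustration: three pages of content, then nothing.
        IntFunction<String> fake = page -> page <= 3 ? "<div>item " + page + "</div>" : "";
        String html = collectPages(fake, 10);
        System.out.println(html);
    }
}
```

The combined string can then be passed to `Jsoup.parse(html)` as in the question. If the page offers no such endpoint, a browser automation tool that actually runs the page's script would be needed instead.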
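I get this error when unmarshalling a class. I am using Amazon's mTurks along with Spring and Maven, and (surprise, surprise) a xerces problem has reared its head.
I have tweaked the POM in many different ways to try to fix the problem, but I cannot find a solution.
I am using a mavenized version of mturks: https://github.com/tc/java-aws-mturk
I explicitly exclude the xerces artifacts from mturks: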
<dependency>
    <groupId>com.amazon</groupId>
    <artifactId>java-aws-mturk</artifactId>
    <version>1.2.2</version>
    <exclusions>
        <exclusion>
            <groupId>commons-lang</groupId>
            <artifactId>commons-lang</artifactId>
        </exclusion>
        <exclusion>
            <groupId>apache-xerces</groupId>
            <artifactId>xercesImpl</artifactId>
        </exclusion>
        <exclusion>
            <groupId>apache-xerces</groupId>
            <artifactId>resolver</artifactId>
        </exclusion>
        <exclusion>
            <groupId>apache-xerces</groupId>
            <artifactId>xml-apis</artifactId>
        </exclusion>
    </exclusions>
</dependency>
and explicitly include the xercesImpl and xml-apis dependencies:
<dependency>
    <groupId>xerces</groupId>
    <artifactId>xercesImpl</artifactId>
    <version>2.11.0</version>
</dependency>
<dependency>
    <groupId>xml-apis</groupId>
    <artifactId>xml-apis</artifactId>
    <version>1.4.01</version>
</dependency>
I have tried all four combinations of xercesImpl versions 2.9.1 and 2.11.0 with xml-apis versions 1.4.01 and 2.0.2, to no avail.
xercesImpl 2.11.0 with xml-apis 2.0.2 produces a different error:
java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal
How can I fix this?
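A `NoClassDefFoundError` for `org/w3c/dom/ElementTraversal` means the class was not visible on the classpath at run time, so a useful first step is to check which location, if any, actually supplies it. The following is a minimal diagnostic sketch, not a fix; `ClassLocator` and `locate` are hypothetical names, and it inspects only the system class loader, which may differ from the loader used inside a Spring web application.

```java
import java.net.URL;

public class ClassLocator {
    // Returns the URL of the .class resource for the given class name as
    // seen by the system class loader, or null if it cannot be found.
    public static URL locate(String className) {
        String resource = className.replace('.', '/') + ".class";
        return ClassLoader.getSystemResource(resource);
    }

    public static void main(String[] args) {
        String[] names = {
            "org.w3c.dom.ElementTraversal",
            "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl"
        };
        for (String name : names) {
            URL location = locate(name);
            // Prints the jar (or runtime image) that provides each class.
            System.out.println(name + " -> " + (location == null ? "not found" : location));
        }
    }
}
```

Running this with the application's classpath shows which jar provides each class. Comparing that with the output of `mvn dependency:tree` can reveal an unexpected xml-apis or xerces artifact pulled in transitively.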
I am using Kafka 1.1.0. Kafka Streams keeps throwing this exception (though with varying messages):
WARN o.a.k.s.p.i.RecordCollectorImpl@onCompletion:166 - task [0_0] Error sending record (key KEY value VALUE timestamp TIMESTAMP) to topic OUTPUT_TOPIC due to Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker.; No more records will be sent and no more offsets will be recorded for this task.
WARN o.a.k.s.p.i.AssignedStreamsTasks@closeZombieTask:202 - stream-thread [90556797-3a33-4e35-9754-8a63200dc20e-StreamThread-1] stream task 0_0 got migrated to another thread …
I am using Ansible's lookup feature to look up a value in an INI file. Here is the example from the documentation:
- debug: msg="User in integration is {{ lookup('ini', 'user section=integration file=users.ini') }}"
Here is my task:
- set_fact: aws_access_var = "{{ lookup('ini', 'AWS_ACCESS_KEY_ID section=Credentials file=/etc/boto.cfg') }}"
They look syntactically identical, but my task fails:
fatal: [localhost]: FAILED! => {"failed": true, "msg": "template error while templating string: unexpected char u\"'\" at 18. String: \"{{ lookup('ini', 'AWS_ACCESS_KEY_ID section"}
Any idea what is wrong with it?
My data is structured as follows:
{
    number: Integer
    letter: String
}
I want to do a group count by both of these properties:
g.V().values('number', 'letter').groupCount();
and see the data displayed like this:
[[1,A]:16, [1,B]:64, [2,A]:78, [2,B]:987]
Is there a way to do this in TinkerPop?
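To make the desired result concrete, here is the same grouping expressed over an in-memory list in plain Java. This is not Gremlin and not TinkerPop API; `Item`, `PairGroupCount`, and `countByPair` are hypothetical names used only to show the target shape, a map from a `[number, letter]` pair to a count.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PairGroupCount {
    // Hypothetical stand-in for a vertex with the two properties from the question.
    public static final class Item {
        final int number;
        final String letter;

        public Item(int number, String letter) {
            this.number = number;
            this.letter = letter;
        }
    }

    // Counts items per (number, letter) pair. Each key is a two-element
    // list such as [1, A], so both properties together form the group.
    public static Map<List<Object>, Long> countByPair(List<Item> items) {
        return items.stream().collect(Collectors.groupingBy(
                item -> Arrays.<Object>asList(item.number, item.letter),
                Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item(1, "A"), new Item(1, "A"), new Item(1, "B"), new Item(2, "A"));
        // Map iteration order is not guaranteed, so entries may print in any order.
        countByPair(items).forEach((pair, count) -> System.out.println(pair + ":" + count));
    }
}
```

The key point is that the grouping key must be the pair itself. In the traversal from the question, `values('number', 'letter')` emits each property value as a separate traverser, so the counts end up per individual value instead of per pair.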
I have a file, Pojo.avsc, that contains the following declaration:
{
    "namespace": "io.fama.pubsub.schema",
    "type": "record",
    "name": "Pojo",
    "fields": [
        {
            "name": "field",
            "type": "string"
        }
    ]
}
I have a file, PojoCollection.avsc, that contains only a collection of Pojo objects:
{
    "namespace": "io.fama.pubsub.schema",
    "type": "record",
    "name": "PojoCollection",
    "fields": [
        {
            "name": "collection",
            "type": {
                "type": "array",
                "items": {
                    "name": "pojo",
                    "type": "Pojo"
                }
            }
        }
    ]
}
My avro-maven-plugin configuration is as follows:
<plugin>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-maven-plugin</artifactId>
    <version>1.8.2</version>
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>schema</goal>
            </goals>
            <configuration>
                <imports>
                    <import>${basedir}/src/main/avro/Pojo.avsc</import>
                </imports>
            </configuration>
        </execution>
    </executions>
</plugin>
This results in the following exception:
Caused by: org.apache.avro.SchemaParseException: Type not supported: …