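According to gate.ac.uk, a gazetteer is:
A gazetteer consists of a set of lists containing names of entities such as cities, organisations, days of the week, etc. These lists are used to find occurrences of these names in text, e.g. for the task of named entity recognition. The word "gazetteer" is often used interchangeably for both the set of entity lists and the processing resource that uses those lists to find occurrences of the names in text.
How is this different from an "ontology"?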
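One way to see the difference is to look at what a list-based lookup can and cannot tell you. A gazetteer only records that a string appears in some list (in ANNIE each list is tagged with a majorType/minorType, e.g. a city list as location/city, and matches become `Lookup` annotations); it says nothing about how a city relates to a country or to a more general "location" class, which is exactly what an ontology models as classes and relations. The sketch below is a hypothetical, stdlib-only illustration, not GATE's gazetteer implementation: `TinyGazetteer`, `Lookup` and `annotate` are made-up names, matching is plain case-sensitive substring search with a word-boundary check, and overlapping matches are not resolved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TinyGazetteer {
    /** One match: [start, end) character offsets plus the name of the list it came from. */
    public static final class Lookup {
        public final int start;
        public final int end;
        public final String list;

        Lookup(int start, int end, String list) {
            this.start = start;
            this.end = end;
            this.list = list;
        }

        @Override
        public String toString() {
            return list + "[" + start + "," + end + ")";
        }
    }

    /** Finds every whole-word, case-sensitive occurrence of every list entry in the text. */
    public static List<Lookup> annotate(String text, Map<String, List<String>> lists) {
        List<Lookup> found = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : lists.entrySet()) {
            for (String entry : e.getValue()) {
                if (entry.isEmpty()) continue; // an empty entry would match everywhere
                int from = 0;
                int at;
                while ((at = text.indexOf(entry, from)) >= 0) {
                    int end = at + entry.length();
                    boolean startOk = at == 0 || !Character.isLetterOrDigit(text.charAt(at - 1));
                    boolean endOk = end == text.length() || !Character.isLetterOrDigit(text.charAt(end));
                    if (startOk && endOk) {
                        found.add(new Lookup(at, end, e.getKey()));
                    }
                    from = at + 1;
                }
            }
        }
        // Report matches in document order; overlaps between lists are kept as-is.
        found.sort((a, b) -> Integer.compare(a.start, b.start));
        return found;
    }

    public static void main(String[] args) {
        Map<String, List<String>> lists = new HashMap<>();
        lists.put("city", Arrays.asList("London", "New York"));
        lists.put("day", Arrays.asList("Monday"));
        String text = "I flew from London to New York on Monday.";
        for (Lookup l : annotate(text, lists)) {
            System.out.println(l + " " + text.substring(l.start, l.end));
        }
    }
}
```

All the lookup knows is "this span is in the city list"; an ontology would additionally let you ask whether every city is a kind of location, which plain lists encode only implicitly in their names.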
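I'm trying to extract the individual text values of the annotation sets generated by the default ANNIE processing resources.
When I iterate over the annotation set, each entry only gives me the annotation reference and its start and end positions, but there is no .value() method. Is there a simple way to get the value, or do I need to use a FileWriter or some equivalent to extract the values directly from the corpus I processed, using the annotations' start and end positions?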
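// Declaration assumed; it is not shown in the original snippet
Set<String> annotTypesRequired = new HashSet<String>();
annotTypesRequired.add("Location");
Set<Annotation> organization = new HashSet<Annotation>(
    defaultAnnotSet.get(annotTypesRequired));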
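A FileWriter should not be needed: an annotation's start and end offsets index directly into the document's own text. In GATE that text can be obtained with `gate.Utils.stringFor(doc, ann)`, or with `doc.getContent().getContent(ann.getStartNode().getOffset(), ann.getEndNode().getOffset()).toString()`. The stdlib-only sketch below models the same idea without GATE on the classpath; `CoveredText`, `Ann` and `coveredText` are hypothetical stand-ins, not GATE classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class CoveredText {
    /** Hypothetical stand-in for a GATE Annotation: a type plus [start, end) character offsets. */
    public static final class Ann {
        public final String type;
        public final long start;
        public final long end;

        public Ann(String type, long start, long end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /** Returns the text covered by each annotation whose type is in {@code types}, in list order. */
    public static List<String> coveredText(String content, List<Ann> anns, Set<String> types) {
        List<String> out = new ArrayList<>();
        for (Ann a : anns) {
            if (types.contains(a.type)) {
                // Offsets are longs (as in GATE); document strings are int-indexed.
                out.add(content.substring((int) a.start, (int) a.end));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String content = "Paris is in France.";
        List<Ann> anns = Arrays.asList(
                new Ann("Location", 0, 5),
                new Ann("Token", 6, 8),
                new Ann("Location", 12, 18));
        System.out.println(coveredText(content, anns, Collections.singleton("Location")));
    }
}
```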
I have a problem with the sentence splitter module in GATE. My text looks like this:
Social history. He drank a lot in his young age. He did
not attend a school. He was depressed of his condition.
whereas the sentences should clearly be:
Sentence 1: Social history.
Sentence 2: He drank a lot in his young age.
Sentence 3: He did not attend a school.
Sentence 4: He was depressed of his condition.
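The expected split above treats a single line break as ordinary whitespace. One possible workaround, sketched here rather than taken from GATE, is to make line breaks invisible to the sentence rule while keeping character offsets unchanged, since GATE annotations point into the original text. The sketch replaces each line-break character with exactly one space and then applies a deliberately naive rule ("a sentence runs to the next `.`, `!` or `?`"). `JoinLinesSplitter` and `sentenceSpans` are hypothetical names. GATE also ships a RegEx Sentence Splitter with configurable split patterns, which may be worth checking before writing your own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JoinLinesSplitter {
    // Naive rule: a sentence starts at a non-space character and runs up to and
    // including the next run of '.', '!' or '?', or to the end of the text.
    private static final Pattern SENTENCE = Pattern.compile("\\S[^.!?]*(?:[.!?]+|$)");

    /** Returns [start, end) offsets of each sentence, valid for the ORIGINAL text. */
    public static List<int[]> sentenceSpans(String text) {
        // Swap each line-break character for exactly one space: single line breaks
        // no longer end a sentence, and every offset still matches the original text.
        String flat = text.replace('\r', ' ').replace('\n', ' ');
        List<int[]> spans = new ArrayList<>();
        Matcher m = SENTENCE.matcher(flat);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Social history. He drank a lot in his young age. He did\n"
                + "not attend a school. He was depressed of his condition.";
        int n = 1;
        for (int[] span : sentenceSpans(text)) {
            // The original slice still contains the line break; flatten it for display only.
            String sentence = text.substring(span[0], span[1]).replace('\n', ' ');
            System.out.println("Sentence " + n++ + ": " + sentence);
        }
    }
}
```

Run on the example text, `main` prints the four sentences listed as expected above. The rule will mis-split abbreviations such as "Dr." and also joins across blank lines; a real grammar would probably keep a double line break as a boundary.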
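The ANNIE sentence splitter, however, treats text on different lines as belonging to different sentences, so it produces the following result: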
Sentence 1: Social history.
Sentence 2: He drank a lot in his young age.
Sentence 3: He did
Sentence 4: not attend a school.
Sentence …
Currently I'm using Groovy to create nested for loops that print the contents of objects to strings intended as rows of delimited data. I would like to write these strings to a CSV file instead of printing them.
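Writing the rows to a file instead of printing them mostly means swapping `println` for a writer. The script in this question is truncated, so the sketch below does not reproduce it; it is a stdlib-only Java illustration (the class `DelimitedWriter` and its methods are hypothetical names) that joins fields with a configurable delimiter, quotes any field containing the delimiter, a double quote or a line break in the style of RFC 4180, and writes one row per line. In Groovy the same idea applies by wrapping the loops in `new File(path).withWriter { w -> ... }` and calling `w.writeLine(...)` where the script currently calls `println`.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class DelimitedWriter {
    /** Quotes a field RFC 4180-style if it contains the delimiter, a quote, or a line break. */
    public static String escape(Object value, char delim) {
        String s = String.valueOf(value); // a null feature becomes "null", as with string concatenation
        if (s.indexOf(delim) >= 0 || s.indexOf('"') >= 0 || s.indexOf('\n') >= 0 || s.indexOf('\r') >= 0) {
            return '"' + s.replace("\"", "\"\"") + '"';
        }
        return s;
    }

    /** Joins the escaped fields with the delimiter. */
    public static String row(char delim, Object... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append(delim);
            sb.append(escape(fields[i], delim));
        }
        return sb.toString();
    }

    /** Writes one line per row (platform line separator), overwriting the file if it exists. */
    public static void writeRows(Path out, List<Object[]> rows, char delim) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (Object[] r : rows) {
                w.write(row(delim, r));
                w.newLine();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(row('|', "mat_amount", "1000", "note with | inside"));
    }
}
```

Quoting matters because feature values are concatenated without any checks; a value containing the delimiter would otherwise shift every later column. Note also that the four nested loops emit one line per combination of BondCounsel, DiscountRate, Row and PurchasePrice annotations, so a document with several of each yields their cross product.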
Here is the code:
for (doc in docs) {
AnnotationSet row = doc.getAnnotations("Final").get("Row")
AnnotationSet BondCounsel = doc.getAnnotations("Final").get("Bond_Counsel")
AnnotationSet PurchasePrice = doc.getAnnotations("Final").get("PurchasePrice")
AnnotationSet DiscountRate = doc.getAnnotations("Final").get("DiscountRate")
for (b in BondCounsel) {
for (d in DiscountRate) {
for (r in row) {
for (p in PurchasePrice) {
println(doc.getFeatures().get("gate.SourceURL") + "|"
+ "mat_amount|" + r.getFeatures().get("MatAmount") + "|"
+ "orig_price|" + p.getFeatures().get("VAL") + "|"
+ "orig_yield|" + r.getFeatures().get("Yield") + "|"
+ "orig_discount_rate|" + d.getFeatures().get("rate")+ "|"
+ "CUSIP|" + r.getFeatures().get("CUSIPVAL1") + r.getFeatures().get("CUSIPVAL2") + r.getFeatures().get("CUSIPVAL3") + "|"
+ …
If I use the Ant build script, it includes the JAPE files I've created as long as I put them in the right folder. But if I want to use GATE from Maven, how do I include my own JAPE files?
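A common approach with Maven, sketched here as an assumption rather than taken from GATE's documentation, is to keep the grammars under `src/main/resources`, which Maven copies onto the classpath (into `target/classes` and the jar), and to give the transducer a classpath URL instead of a file path. The helper below only resolves that URL; the resource path `/jape/main.jape` and the class name `GrammarLocator` are hypothetical. In GATE the resulting URL would then be passed as the JAPE transducer's `grammarURL` init parameter, e.g. through a `FeatureMap` from `Factory.newFeatureMap()` and `Factory.createResource("gate.creole.Transducer", params)`; check those names against your GATE version.

```java
import java.net.URL;

public class GrammarLocator {
    /**
     * Resolves a JAPE grammar packaged as a classpath resource (hypothetical layout:
     * src/main/resources/jape/main.jape is available as /jape/main.jape).
     * Fails fast with a clear message instead of letting a null URL reach GATE.
     */
    public static URL grammarUrl(String resourcePath) {
        URL url = GrammarLocator.class.getResource(resourcePath);
        if (url == null) {
            throw new IllegalStateException("JAPE grammar not found on classpath: " + resourcePath);
        }
        return url;
    }

    public static void main(String[] args) {
        // Hypothetical path; prints a file:... or jar:... URL when the resource exists.
        System.out.println(grammarUrl("/jape/main.jape"));
    }
}
```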
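I'm quite new to NLP and I'm using GATE. When I run the code on a large dataset (7K+ records), I get an OOM exception. Below is the code where the exception occurs.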
/**
* Run ANNIE
*
* @param controller
* @throws GateException
*/
public void execute(SerialAnalyserController controller)
throws GateException {
TestLogger.info("Running ANNIE...");
controller.execute(); /**** GateProcessor.java:217 ***/
// controller.cleanup();
TestLogger.info("...ANNIE complete");
}
Here is the log:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.addEntry(Unknown Source)
at java.util.HashMap.put(Unknown Source)
at java.util.HashMap.putAll(Unknown Source)
at gate.annotation.AnnotationSetImpl.<init>(AnnotationSetImpl.java:111)
at gate.jape.SinglePhaseTransducer.attemptAdvance(SinglePhaseTransducer.java:448)
at gate.jape.SinglePhaseTransducer.transduce(SinglePhaseTransducer.java:287)
at gate.jape.MultiPhaseTransducer.transduce(MultiPhaseTransducer.java:168)
at gate.jape.Batch.transduce(Batch.java:352)
at gate.creole.Transducer.execute(Transducer.java:116)
at gate.creole.SerialController.runComponent(SerialController.java:177)
at gate.creole.SerialController.executeImpl(SerialController.java:136)
at gate.creole.SerialAnalyserController.executeImpl(SerialAnalyserController.java:67)
at gate.creole.AbstractController.execute(AbstractController.java:42)
at in.co.test.GateProcessor.execute(GateProcessor.java:217)
I'd like to know what exactly happens inside the execute function and how to fix it. Thanks.
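The trace shows the heap running out inside a JAPE transducer (`SinglePhaseTransducer.attemptAdvance`) while it was copying an annotation set, i.e. while matching a grammar against a document. `controller.execute()` runs every processing resource over every document in the controller's corpus, so if all 7K+ documents are loaded into one corpus up front, all of their content and annotations stay on the heap at once. A common remedy, besides raising the heap limit with `-Xmx`, is to load, process and release one document at a time. The sketch below shows that loop in plain Java; `OneDocumentAtATime` and `processStreaming` are hypothetical, and the three callbacks stand in for GATE calls such as `Factory.newDocument(url)`, then `corpus.add(doc)` + `controller.execute()` + `corpus.clear()`, then `Factory.deleteResource(doc)`. It will not help if a single very large document or a very broad JAPE rule exhausts memory on its own.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class OneDocumentAtATime {
    /**
     * Streams sources through load -> process -> release so that at most one
     * loaded document is alive at a time. Returns the number processed successfully.
     */
    public static <S, D> int processStreaming(Iterator<S> sources,
                                              Function<S, D> load,
                                              Consumer<D> process,
                                              Consumer<D> release) {
        int processed = 0;
        while (sources.hasNext()) {
            D doc = load.apply(sources.next());
            try {
                process.accept(doc);
                processed++;
            } finally {
                // Always free the document, even if processing throws.
                release.accept(doc);
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("doc-1", "doc-2", "doc-3");
        int n = OneDocumentAtATime.<String, StringBuilder>processStreaming(ids.iterator(),
                id -> new StringBuilder("content of " + id),   // stand-in for loading a document
                doc -> System.out.println("processing " + doc), // stand-in for running the pipeline
                doc -> doc.setLength(0));                       // stand-in for deleting the resource
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println(n + " documents processed; max heap " + maxHeapMb + " MB");
    }
}
```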
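I've been reading about text classification and have found several Java tools that can be used for classification, but I'm still wondering: is text classification the same as sentence classification?
Are there any tools that focus on sentence classification?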
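Broadly, sentence classification is text classification in which each instance is a single sentence rather than a whole document, so the same algorithms, and most general-purpose Java toolkits, apply once the input has been split into sentences; the practical difference is that a sentence contains far fewer words, so the features are sparse. As an illustration only, here is a minimal multinomial Naive Bayes classifier with add-one smoothing, trained and applied at sentence level. It is a self-contained sketch, not a recommendation of any particular tool; `TinyNaiveBayes`, its lowercase alphanumeric tokenizer and the toy labels are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class TinyNaiveBayes {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // label -> word -> count
    private final Map<String, Integer> wordTotals = new HashMap<>();              // label -> words seen
    private final Map<String, Integer> sentenceCounts = new HashMap<>();          // label -> training sentences
    private final Set<String> vocabulary = new HashSet<>();
    private int totalSentences = 0;

    /** Hypothetical tokenizer: lowercase runs of ASCII letters and digits. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    public void train(String sentence, String label) {
        totalSentences++;
        sentenceCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : tokenize(sentence)) {
            counts.merge(w, 1, Integer::sum);
            wordTotals.merge(label, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    /** Returns the label maximizing log P(label) + sum of log P(word | label), add-one smoothed. */
    public String classify(String sentence) {
        List<String> tokens = tokenize(sentence);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : sentenceCounts.keySet()) {
            double score = Math.log(sentenceCounts.get(label) / (double) totalSentences);
            Map<String, Integer> counts = wordCounts.get(label);
            double denom = wordTotals.getOrDefault(label, 0) + vocabulary.size();
            for (String w : tokens) {
                score += Math.log((counts.getOrDefault(w, 0) + 1.0) / denom);
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TinyNaiveBayes nb = new TinyNaiveBayes();
        nb.train("He drank a lot of beer", "alcohol");
        nb.train("He drinks wine every day", "alcohol");
        nb.train("He was depressed and sad", "mood");
        nb.train("She feels sad and anxious", "mood");
        System.out.println(nb.classify("He drank wine")); // prints "alcohol"
        System.out.println(nb.classify("She is sad"));    // prints "mood"
    }
}
```

Nothing in the classifier is specific to sentences; what makes it a sentence classifier is only that each training and test instance is one sentence.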