I am using NLTK and MaltParser to extract dependency relations from natural-language sentences. I ran some experiments with the Stanford parser using this code:
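import os

sentence = '''I shot an elephant in my pajamas'''
# Write the sentence to a temporary file; close() waits for the shell command to finish.
os.popen("echo '" + sentence + "' > ~/stanfordtemp.txt").close()
# Run the Stanford lexicalized parser on that file and collect its output lines.
parser_out = os.popen("/usr/local/Cellar/stanford-parser/2.0.3/bin/lexparser.sh ~/stanfordtemp.txt").readlines()

for i, tag in enumerate(parser_out):
    if len(tag.strip()) > 0 and tag.strip()[0] == '(':
        # Lines that start with "(" belong to the bracketed parse tree.
        parse = " ".join(tag.strip())
        print(i, "Parse: ", tag)
    elif len(tag.strip()) > 0:
        # Any other non-empty line is reported as a typed dependency.
        print(i, "Typed dependencies: ", tag)

# Join all bracketed lines into a single-line parse string.
bracketed_parse = " ".join([tag.strip() for tag in parser_out if len(tag.strip()) > 0 and tag.strip()[0] == "("])
print(bracketed_parse)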
And got this nice result:
Parsing [sent. 1 len. 7]: I shot an elephant …

I have an RDF file, for example:
<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dbp="http://dbpedia.org/ontology/"
xmlns:dbprop="http://dbpedia.org/property/"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<rdf:Description rdf:about="http://dbpedia.org/page/Johann_Sebastian_Bach">
<dbp:birthDate>1685-03-21</dbp:birthDate>
<dbp:deathDate>1750-07-28</dbp:deathDate>
<dbp:birthPlace>Eisenach</dbp:birthPlace>
<dbp:deathPlace>Leipzig</dbp:deathPlace>
<dbprop:shortDescription>German composer and organist</dbprop:shortDescription>
<foaf:name>Johann Sebastian Bach</foaf:name>
<rdf:type rdf:resource="http://dbpedia.org/class/yago/GermanComposers"/>
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
</rdf:Description>
</rdf:RDF>
I only want to extract the text parts of this file, so in this case my output would be:
output_text = "Johann Sebastian Bach, German composer and organist,1685-03-21, 1750-07-28, Eisenach, Leipzig"
How can I get this result using RDFLib?