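I migrated a MySQL database to Neo4j and tested a simple query. I was surprised to find that the equivalent query in Neo4j takes 10 to 100 times longer than in MySQL. I am working with Neo4j 2.0.1.
In the original MySQL schema I have three tables: cities, theaters and countries.
Every column has an index. I want to display the number of theaters per city for a given continent, under several conditions. The query is: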
SELECT count(*) as nb, c.name
FROM `cities` c LEFT JOIN theaters t ON c.id = t.city_id
WHERE c.country_code IN
(SELECT code FROM countries WHERE selected is true AND continent_id = 4)
AND c.status=1 AND t.public = 1
GROUP BY c.name ORDER BY nb DESC
In Neo4j, the database schema is as follows:
(:Continent)-[:Include]->(:Country {selected:bool })-[:Include]->(:City {name:string,status:bool })-[:Include]->(:Theater {public:bool })
An index is also defined on each property. The Cypher query is:
MATCH (:Continent{code: 4})-[:Include]->(:Country{selected:true})-[:Include]->(city:City{status:true})-[:Include]->(:Theater{public: true})
RETURN city.name, count(*) AS nb ORDER BY nb DESC
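Each database holds about 70,000 cities and 140,000 theaters.
For the continent with ID 4, the MySQL query takes about 0.02 s while the Neo4j query takes 0.4 s. Moreover, if I introduce a variable-length relationship between Country and City in the Cypher query (...(:Country{selected:true})-[:Include*..3]->(city:City{status:true})...), because I want to be able to add intermediate levels such as Regions, the query takes more than 2 seconds.
I know that in this particular case there is no benefit in using Neo4j instead of MySQL, but I expected roughly comparable performance from the two technologies, and I want to take advantage of Neo4j for the geographic hierarchy.
Am I missing something, or is this a limitation of Neo4j?
Thanks for your answers.
Edit: First, you will find the database dump files here. The Neo4j server configuration is out of the box. I work in a Ruby environment and use the neography gem. Also, since I am not on JRuby, I run the Neo4j server separately, so the Cypher queries are sent through the REST API.
The database contains 244 countries, 69,000 cities and 138,000 theaters. For continent_id 4 there are 46,982 cities (37,210 with the status boolean set to true) and 74,420 theaters.
The query returns 2256 rows. On the third run it took 338 ms. Here is the query output with profiling information: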
profile MATCH (:Continent{code: 4})-[:Include]->(country:Country{selected:true})-[:Include*..1]->(city:City{status:true})-[:Include]->(theater:Theater{public: true}) RETURN city.name, count(*) AS nb ORDER BY nb DESC;
==> ColumnFilter(symKeys=["city.name", " INTERNAL_AGGREGATE85ca19f3-9421-4c18-a449-1097e3deede2"], returnItemNames=["city.name", "nb"], _rows=2256, _db_hits=0)
==> Sort(descr=["SortItem(Cached( INTERNAL_AGGREGATE85ca19f3-9421-4c18-a449-1097e3deede2 of type Integer),false)"], _rows=2256, _db_hits=0)
==> EagerAggregation(keys=["Cached(city.name of type Any)"], aggregates=["( INTERNAL_AGGREGATE85ca19f3-9421-4c18-a449-1097e3deede2,CountStar())"], _rows=2256, _db_hits=0)
==> Extract(symKeys=["city", " UNNAMED27", " UNNAMED7", "country", " UNNAMED113", "theater", " UNNAMED72"], exprKeys=["city.name"], _rows=2257, _db_hits=2257)
==> Filter(pred="(hasLabel(theater:Theater(3)) AND Property(theater,public(5)) == true)", _rows=2257, _db_hits=2257)
==> SimplePatternMatcher(g="(city)-[' UNNAMED113']-(theater)", _rows=2257, _db_hits=4514)
==> Filter(pred="(((hasLabel(city:City(2)) AND hasLabel(city:City(2))) AND Property(city,status(4)) == true) AND Property(city,status(4)) == true)", _rows=2257, _db_hits=74420)
==> TraversalMatcher(start={"label": "Continent", "query": "Literal(4)", "identifiers": [" UNNAMED7"], "property": "code", "producer": "SchemaIndex"}, trail="( UNNAMED7)-[ UNNAMED27:Include WHERE (((hasLabel(NodeIdentifier():Country(1)) AND hasLabel(NodeIdentifier():Country(1))) AND Property(NodeIdentifier(),selected(3)) == true) AND Property(NodeIdentifier(),selected(3)) == true) AND true]->(country)-[:Include*1..1]->(city)", _rows=37210, _db_hits=37432)
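A small sketch for reading a profile like the one above: it pulls the operator name, _rows and _db_hits out of each "==>" line so the most expensive step is easy to spot (here the Filter on city, with 74420 db hits, followed by the TraversalMatcher with 37432). The class and method names are made up for illustration, and the sample lines in main are abbreviated copies of two lines from the profile, so treat this as a reading aid, not a Neo4j tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProfileDbHits {
    /** One operator line of the textual profile (hypothetical helper type). */
    public static class Step {
        public final String operator;
        public final long rows;
        public final long dbHits;

        Step(String operator, long rows, long dbHits) {
            this.operator = operator;
            this.rows = rows;
            this.dbHits = dbHits;
        }

        @Override
        public String toString() {
            return operator + " rows=" + rows + " db_hits=" + dbHits;
        }
    }

    // "==> Operator(... _rows=N, _db_hits=M)" as printed by Neo4j 2.0 in the output above.
    private static final Pattern LINE =
        Pattern.compile("^==>\\s*(\\w+)\\(.*_rows=(\\d+),\\s*_db_hits=(\\d+)\\)\\s*$");

    /** Parses every line that looks like a plan operator; other lines are skipped. */
    public static List<Step> parse(String profileText) {
        List<Step> steps = new ArrayList<>();
        for (String line : profileText.split("\\R")) {
            Matcher m = LINE.matcher(line.trim());
            if (m.matches()) {
                steps.add(new Step(m.group(1),
                                   Long.parseLong(m.group(2)),
                                   Long.parseLong(m.group(3))));
            }
        }
        return steps;
    }

    /** Sum of db hits over all parsed operators. */
    public static long totalDbHits(List<Step> steps) {
        long total = 0;
        for (Step s : steps) {
            total += s.dbHits;
        }
        return total;
    }

    /** The operator with the most db hits, or null if the list is empty. */
    public static Step hottest(List<Step> steps) {
        Step best = null;
        for (Step s : steps) {
            if (best == null || s.dbHits > best.dbHits) {
                best = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Abbreviated copies of two profile lines; "..." stands for the omitted arguments.
        String sample = String.join("\n",
            "==> Filter(pred=\"...\", _rows=2257, _db_hits=74420)",
            "==> TraversalMatcher(start={...}, _rows=37210, _db_hits=37432)");
        List<Step> steps = parse(sample);
        System.out.println("hottest: " + hottest(steps));
        System.out.println("total db hits: " + totalDbHits(steps));
    }
}
```

Pasting the full profile text into parse(...) gives one Step per operator line, which makes it easier to compare runs before and after a query change.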
You are right. I tried it myself and could only get the query time down to about 100 ms.
MATCH (:Continent{code: 4})-[:Include]->
(country:Country{selected:true})-[:Include]->
(city:City{status:true})-[:Include]->
(theater:Theater{public: true})
RETURN city.name, count(*) AS nb
ORDER BY nb DESC;
| "Forbach" | 1 |
| "Stuttgart" | 1 |
| "Mirepoix" | 1 |
| "Bonnieux" | 1 |
| "Saint Cyprien Plage" | 1 |
| "Crissay sur Manse" | 1 |
+--------------------------------------+
2256 rows
85 ms
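Note that Cypher as of the 2.0.x releases has not yet been optimized for performance. That work starts with Neo4j 2.1 and will continue through 2.3. More performance work is also planned in the kernel, which will speed things up as well.
I also implemented the solution in Java and got it down to 19 ms. It is of course not as pretty, but this is also what we are aiming for with Cypher: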
// Pairs a city node with the number of matching theaters found for it.
class City {
    Node city;
    int count = 1; // an instance is created on the first matching theater

    public City(Node city) {
        this.city = city;
    }

    public void inc() { count++; }

    @Override
    public String toString() {
        return String.format("City{city=%s, count=%d}", city.getProperty("name"), count);
    }
}

// db, the label constants (CONTINENT, COUNTRY, CITY, THEATER), the Include
// relationship type and output(...) are defined elsewhere in the surrounding class.
private List<?> queryJava3() {
    long start = System.currentTimeMillis();
    // Index lookup for the starting continent.
    Node continent = IteratorUtil.single(db.findNodesByLabelAndProperty(CONTINENT, "code", 4));
    Map<Node,City> result = new HashMap<>();
    // Continent -> Country: keep only selected countries.
    for (Relationship rel1 : continent.getRelationships(Direction.OUTGOING, Include)) {
        Node country = rel1.getEndNode();
        if (!(country.hasLabel(COUNTRY) && (Boolean) country.getProperty("selected", false))) continue;
        // Country -> City: keep only cities whose status is true.
        for (Relationship rel2 : country.getRelationships(Direction.OUTGOING, Include)) {
            Node city = rel2.getEndNode();
            if (!(city.hasLabel(CITY) && (Boolean) city.getProperty("status", false))) continue;
            // City -> Theater: count public theaters per city.
            for (Relationship rel3 : city.getRelationships(Direction.OUTGOING, Include)) {
                Node theater = rel3.getEndNode();
                if (!(theater.hasLabel(THEATER) && (Boolean) theater.getProperty("public", false))) continue;
                City city1 = result.get(city);
                if (city1 == null) result.put(city, new City(city));
                else city1.inc();
            }
        }
    }
    // Sort by theater count, descending (the ORDER BY nb DESC of the Cypher query).
    List<City> list = new ArrayList<>(result.values());
    Collections.sort(list, new Comparator<City>() {
        @Override
        public int compare(City o1, City o2) {
            return Integer.compare(o2.count, o1.count);
        }
    });
    output("java", start, list.iterator());
    return list;
}
java time = 19ms
first = City{city=Val de Meuse, count=1} total-count 22561
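The question also asks about variable-length paths such as [:Include*..3], so that intermediate levels like Regions can sit between Country and City. A hand-written traversal can handle that with a depth-bounded breadth-first expansion instead of one nested loop per level. The sketch below shows the idea on a plain in-memory map, not on the Neo4j API; the class name, method name and node names are hypothetical. It collects distinct nodes, which matches path matching only when the hierarchy is a tree.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BoundedDepthTraversal {
    /**
     * Returns every node reachable from start in 1..maxDepth outgoing hops,
     * in breadth-first discovery order. Each node is expanded at most once.
     */
    public static Set<String> reachableWithin(Map<String, List<String>> children,
                                              String start, int maxDepth) {
        Set<String> found = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String node : frontier) {
                List<String> out = children.getOrDefault(node, Collections.<String>emptyList());
                for (String child : out) {
                    if (found.add(child)) { // first time this node is seen
                        next.add(child);
                    }
                }
            }
            frontier = next;
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical hierarchy: one city hangs directly off the country,
        // another sits below an intermediate region.
        Map<String, List<String>> children = new HashMap<>();
        children.put("Country:A", Arrays.asList("Region:R", "City:X"));
        children.put("Region:R", Arrays.asList("City:Y"));

        for (String node : reachableWithin(children, "Country:A", 3)) {
            if (node.startsWith("City:")) { // stands in for a label check such as hasLabel(CITY)
                System.out.println(node);
            }
        }
    }
}
```

In the Neo4j version, the map lookup would be replaced by getRelationships(Direction.OUTGOING, Include) and the prefix check by the label and property checks from the method above.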