dan*_*l__ 58 database relational-database nosql
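There are several types of database for different purposes, yet MySQL is usually used for everything because it is the best-known database. To give just one example: at my company, a big-data application started out on a MySQL database, which is unbelievable and will have serious consequences for the company. Why MySQL? Simply because nobody knows how (and when) another DBMS should be used.
So my question is not about vendors but about types of database. Can you give a practical example of a specific situation (or application) for each type of database where using it is highly recommended?
Example:
• A social network should use type X because of Y.
• MongoDB or CouchDB can't support transactions, so a document database is not a good fit for a banking or auction-site application.
And so on...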
Relational: MySQL, PostgreSQL, SQLite, Firebird, MariaDB, Oracle DB, SQL Server, IBM DB2, IBM Informix, Teradata
Object: ZODB, DB4O, Eloquera, Versant, Objectivity DB, VelocityDB
Graph databases: AllegroGraph, Neo4j, OrientDB, InfiniteGraph, graphbase, sparkledb, flockdb, BrightstarDB
Key-value stores: Amazon DynamoDB, Redis, Riak, Voldemort, FoundationDB, leveldb, BangDB, KAI, hamsterdb, Tarantool, Maxtable, HyperDex, Genomu, Memcachedb
Column family: Bigtable, HBase, Hypertable, Cassandra, Apache Accumulo
RDF stores: Apache Jena, Sesame
Multimodel databases: arangodb, Datomic, Orient DB, FatDB, AlchemyDB
Document: Mongo DB, Couch DB, Rethink DB, Raven DB, terrastore, Jas DB, Raptor DB, djon DB, EJDB, denso DB, Couchbase
Hierarchical: InterSystems Caché, GT.M (thanks to @Laurent Parenteau)
dan*_*l__ 66
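I found two impressive articles on this subject. All credit goes to highscalability.com. The information in this answer is transcribed from those articles.
If your application needs...
• complex transactions because you can't afford to lose data, or if you would like a simple transaction programming model, then look at a Relational or Grid database.
• Example: an inventory system that might want full ACID. I was very unhappy when I bought a product and they said later they were out of stock. I did not want a compensated transaction. I wanted my item!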
• to scale then NoSQL or SQL can work. Look for systems that support scale-out, partitioning, live addition and removal of machines, load balancing, automatic sharding and rebalancing, and fault tolerance.
• to always be able to write to a database because you need high availability then look at Bigtable Clones which feature eventual consistency.
• to handle lots of small continuous reads and writes, that may be volatile, then look at Document or Key-value or databases offering fast in-memory access. Also, consider SSD.
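• to implement social network operations then you may first want a Graph database or, second, a database like Riak that supports relationships. An in-memory relational database with simple SQL joins might suffice for small data sets. Redis' set and list operations could work too.
• to operate over a wide variety of access patterns and data types then look at a Document database; they are generally flexible and perform well.
• powerful offline reporting with large datasets then look at Hadoop first and, second, products that support MapReduce. Supporting MapReduce isn't the same as being good at it.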
• to span multiple data-centers then look at Bigtable Clones and other products that offer a distributed option that can handle the long latencies and are partition tolerant.
• to build CRUD apps then look at a Document database, they make it easy to access complex data without joins.
• built-in search then look at Riak.
• to operate on data structures like lists, sets, queues, publish-subscribe then look at Redis. Useful for distributed locking, capped logs, and a lot more.
• programmer friendliness in the form of programmer-friendly data types like JSON, HTTP, REST, Javascript then first look at Document databases and then Key-value Databases.
• transactions combined with materialized views for real-time data feeds then look at VoltDB. Great for data-rollups and time windowing.
• enterprise-level support and SLAs then look for a product that makes a point of catering to that market. Membase is an example.
• to log continuous streams of data that may have no consistency guarantees necessary at all then look at Bigtable Clones because they generally work on distributed file systems that can handle a lot of writes.
• to be as simple as possible to operate then look for a hosted or PaaS solution because they will do all the work for you.
• to be sold to enterprise customers then consider a Relational Database because they are used to relational technology.
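• to dynamically build relationships between objects that have dynamic properties then consider a Graph Database because they often do not require a schema and models can be built incrementally through programming.
• to support large media then look at storage services like S3. NoSQL systems tend not to handle large BLOBs, though MongoDB has a file service.
• to bulk upload lots of data quickly and efficiently then look for a product that supports that scenario. Most will not because they don't support bulk operations.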
• an easier upgrade path then use a fluid schema system like a Document Database or a Key-value Database because it supports optional fields, adding fields, and field deletions without the need to build an entire schema migration framework.
• to implement integrity constraints then pick a database that supports SQL DDL, implement them in stored procedures, or implement them in application code.
• a very deep join depth then use a Graph Database because they support blisteringly fast navigation between entities.
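• to move behavior close to the data so the data doesn't have to be moved over the network then look at stored procedures of one kind or another. These can be found in Relational, Grid, Document, and even Key-value databases.
• to cache or store BLOB data then look at a Key-value store. Caching can be for bits of web pages, or to save complex objects that were expensive to join in a relational database, to reduce latency, and so on.
• a proven track record like not corrupting data and just generally working then pick an established product and, when you hit scaling (or other) issues, use one of the common workarounds (scale-up, tuning, memcached, sharding, denormalization, etc.).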
• fluid data types because your data isn't tabular in nature, or requires a flexible number of columns, or has a complex structure, or varies by user (or whatever), then look at Document, Key-value, and Bigtable Clone databases. Each has a lot of flexibility in their data types.
• other business units to run quick relational queries so you don't have to reimplement everything then use a database that supports SQL.
• to operate in the cloud and automatically take full advantage of cloud features then we may not be there yet.
• support for secondary indexes so you can look up data by different keys then look at relational databases and Cassandra's new secondary index support.
• create an ever-growing set of data (really BigData) that rarely gets accessed then look at Bigtable Clone which will spread the data over a distributed file system.
• to integrate with other services then check if the database provides some sort of write-behind syncing feature so you can capture database changes and feed them into other systems to ensure consistency.
• fault tolerance then check how durable writes are in the face of power failures, partitions, and other failure scenarios.
• to push the technological envelope in a direction nobody seems to be going then build it yourself because that's what it takes to be great sometimes.
• to work on a mobile platform then look at CouchDB/Mobile couchbase.
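To make the inventory example near the top of this list concrete, here is a minimal sketch of the "full ACID" behaviour it asks for, using Python's built-in sqlite3 module as a stand-in for any relational database. The `stock` and `orders` tables and the `make_db` and `buy` helpers are made up for illustration; they are not from the articles. The order row and the stock decrement either both happen or neither does, so a customer is never told "sold" and then "out of stock".

```python
import sqlite3


def make_db():
    """Create an in-memory database with one widget in stock (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stock (product_id TEXT PRIMARY KEY, "
        "qty INTEGER NOT NULL CHECK (qty >= 0))"
    )
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id TEXT, qty INTEGER)"
    )
    conn.execute("INSERT INTO stock VALUES ('widget', 1)")
    conn.commit()
    return conn


def buy(conn, product_id, qty):
    """Record an order and decrement stock as one transaction.

    Returns True if the purchase went through, False if there was not
    enough stock (in which case the order row is rolled back as well).
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO orders (product_id, qty) VALUES (?, ?)",
                (product_id, qty),
            )
            cur = conn.execute(
                "UPDATE stock SET qty = qty - ? WHERE product_id = ? AND qty >= ?",
                (qty, product_id, qty),
            )
            if cur.rowcount == 0:
                raise ValueError("out of stock")
        return True
    except ValueError:
        return False
```

Calling `buy(conn, "widget", 1)` twice on a fresh `make_db()` connection succeeds the first time and fails the second time, and the failed attempt leaves no order row behind. Getting the same guarantee from a store without multi-statement transactions usually means writing the compensation logic yourself.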
General Use Cases (NoSQL)
• Bigness. NoSQL is seen as a key part of a new data stack supporting: big data, big numbers of users, big numbers of computers, big supply chains, big science, and so on. When something becomes so massive that it must become massively distributed, NoSQL is there, though not all NoSQL systems are targeting big. Bigness can be across many different dimensions, not just using a lot of disk space.
• Massive write performance. This is probably the canonical usage based on Google's influence. High volume. Facebook needs to store 135 billion messages a month (in 2010). Twitter, for example, has the problem of storing 7 TB of data per day (in 2010) with the prospect of this requirement doubling multiple times per year. This is the "data is too big to fit on one node" problem. At 80 MB/s it takes a day to store 7 TB, so writes need to be distributed over a cluster, which implies key-value access, MapReduce, replication, fault tolerance, consistency issues, and all the rest. For faster writes in-memory systems can be used.
• Fast key-value access. This is probably the second most cited virtue of NoSQL in the general mind set. When latency is important it's hard to beat hashing on a key and reading the value directly from memory or in as little as one disk seek. Not every NoSQL product is about fast access; some are more about reliability, for example. But what people have wanted for a long time was a better memcached, and many NoSQL systems offer that.
• Flexible schema and flexible datatypes. NoSQL products support a whole range of new data types, and this is a major area of innovation in NoSQL. We have: column-oriented, graph, advanced data structures, document-oriented, and key-value. Complex objects can be easily stored without a lot of mapping. Developers love avoiding complex schemas and ORM frameworks. Lack of structure allows for much more flexibility. We also have program- and programmer-friendly compatible datatypes like JSON.
• Schema migration. Schemalessness makes it easier to deal with schema migrations without so much worrying. Schemas are in a sense dynamic because they are imposed by the application at run-time, so different parts of an application can have a different view of the schema.
• Write availability. Do your writes need to succeed no matter what? Then we can get into partitioning, CAP, eventual consistency and all that jazz.
• Easier maintainability, administration and operations. This is very product specific, but many NoSQL vendors are trying to gain adoption by making it easy for developers to adopt them. They are spending a lot of effort on ease of use, minimal administration, and automated operations. This can lead to lower operations costs as special code doesn't have to be written to scale a system that was never intended to be used that way.
• No single point of failure. Not every product is delivering on this, but we are seeing a definite convergence on relatively easy to configure and manage high availability with automatic load balancing and cluster sizing. A perfect cloud partner.
• Generally available parallel computing. We are seeing MapReduce baked into products, which makes parallel computing something that will be a normal part of development in the future.
• Programmer ease of use. Accessing your data should be easy. While the relational model is intuitive for end users, like accountants, it's not very intuitive for developers. Programmers grok keys, values, JSON, Javascript stored procedures, HTTP, and so on. NoSQL is for programmers. This is a developer-led coup. The response to a database problem can't always be to hire a really knowledgeable DBA, get your schema right, denormalize a little, etc., programmers would prefer a system that they can make work for themselves. It shouldn't be so hard to make a product perform. Money is part of the issue. If it costs a lot to scale a product then won't you go with the cheaper product, that you control, that's easier to use, and that's easier to scale?
• Use the right data model for the right problem. Different data models are used to solve different problems. Much effort has been put into, for example, wedging graph operations into a relational model, but it doesn't work. Isn't it better to solve a graph problem in a graph database? We are now seeing a general strategy of trying to find the best fit between a problem and solution.
• Avoid hitting the wall. Many projects hit some type of wall in their project. They've exhausted all options to make their system scale or perform properly and are wondering what next? It's comforting to select a product and an approach that can jump over the wall by linearly scaling using incrementally added resources. At one time this wasn't possible. It took custom built everything, but that's changed. We are now seeing usable out-of-the-box products that a project can readily adopt.
• Distributed systems support. Not everyone is worried about scale or performance over and above that which can be achieved by non-NoSQL systems. What they need is a distributed system that can span datacenters while handling failure scenarios without a hiccup. NoSQL systems, because they have focussed on scale, tend to exploit partitions, tend not to use heavy strict consistency protocols, and so are well positioned to operate in distributed scenarios.
• Tunable CAP tradeoffs. NoSQL systems are generally the only products with a "slider" for choosing where they want to land on the CAP spectrum. Relational databases pick strong consistency which means they can't tolerate a partition failure. In the end, this is a business decision and should be decided on a case by case basis. Does your app even care about consistency? Are a few drops OK? Does your app need strong or weak consistency? Is availability more important or is consistency? Will being down be more costly than being wrong? It's nice to have products that give you a choice.
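The "At 80 MB/s it takes a day to store 7 TB" figure from the massive write performance point above is easy to check, and the same arithmetic gives a rough feel for how many nodes the writes have to be spread over. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and perfectly even distribution with no replication overhead. Both are simplifications, and the helper names are made up for illustration.

```python
import math

TB = 10**12  # decimal terabyte (assumption)
MB = 10**6   # decimal megabyte (assumption)


def seconds_to_write(total_bytes, bytes_per_second):
    """Time for a single node to absorb total_bytes at a sustained write rate."""
    return total_bytes / bytes_per_second


def nodes_needed(total_bytes, bytes_per_second, window_seconds):
    """Nodes required to absorb total_bytes within window_seconds,
    assuming writes are spread perfectly evenly and nothing is replicated."""
    return math.ceil(total_bytes / (bytes_per_second * window_seconds))


# One node at 80 MB/s needs 87,500 seconds, a little over 24 hours, for 7 TB.
single_node_hours = seconds_to_write(7 * TB, 80 * MB) / 3600
```

For instance, `nodes_needed(7 * TB, 80 * MB, 3600)` says that absorbing a day's worth of data in one hour takes 25 such nodes before replication is counted, which is why the bullet jumps straight to clusters.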
More Specific Use Cases
• Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, clickstreams, etc.
• Syncing online and offline data. This is a niche CouchDB has targeted.
• Fast response times under all loads.
• Avoiding heavy joins when the query load for complex joins becomes too large for an RDBMS.
• Soft real-time systems where low latency is critical. Games are one example.
• Applications where a wide variety of different write, read, query, and consistency patterns need to be supported. There are systems optimized for 50% reads 50% writes, 95% writes, or 95% reads. Read-only applications that need extreme speed and resiliency, run simple queries, and can tolerate slightly stale data. Applications requiring moderate performance, read/write access, simple queries, completely authoritative data. A read-only application with complex query requirements.
• Load balance to accommodate data and usage concentrations and to help keep microprocessors busy.
• Real-time inserts, updates, and queries.
• Hierarchical data like threaded discussions and parts explosion.
• Dynamic table creation.
• Two-tier applications where low latency data is made available through a fast NoSQL interface, but the data itself can be calculated and updated by high latency Hadoop apps or other low priority apps.
• Sequential data reading. The right underlying data storage model needs to be selected. A B-tree may not be the best model for sequential reads.
• Slicing off part of service that may need better performance/scalability onto its own system. For example, user logins may need to be high performance and this feature could use a dedicated service to meet those goals.
• Caching. A high performance caching tier for websites and other applications. Example is a cache for the Data Aggregation System used by the Large Hadron Collider.
• Voting.
• Real-time page view counters.
• User registration, profile, and session data.
• Document, catalog management and content management systems. These are facilitated by the ability to store complex documents as a whole rather than organized as relational tables. Similar logic applies to inventory, shopping carts, and other structured data types.
• Archiving. Storing a large continual stream of data that is still accessible on-line. Document-oriented databases with a flexible schema that can handle schema changes over time.
• Analytics. Use MapReduce, Hive, or Pig to perform analytical queries and scale-out systems that support high write loads.
• Working with heterogeneous types of data, for example, different media types at a generic level.
• Embedded systems. They don’t want the overhead of SQL and servers, so they use something simpler for storage.
• A "market" game, where you own buildings in a town. You want the building list of someone to pop up quickly, so you partition on the owner column of the building table, so that the select is single-partitioned. But when someone buys the building of someone else you update the owner column along with price.
• JPL is using SimpleDB to store rover plan attributes. References are kept to a full plan blob in S3. (source)
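The "market" game item above is a nice illustration of what choosing a partition key costs and buys. Here is a rough in-memory sketch, not any particular product's API: rows are placed by a hash of the owner, so listing someone's buildings touches one partition, but a sale changes the partition key and the row has to move. The class name, method names, and the use of CRC32 as the hash are all assumptions made for the example.

```python
import zlib


class PartitionedBuildings:
    """Toy 'building' table partitioned on the owner column (illustrative only)."""

    def __init__(self, num_partitions=4):
        # Each partition maps building_id -> row.
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition_for(self, owner):
        # Stable hash of the partition key; CRC32 is an arbitrary choice here.
        return zlib.crc32(owner.encode("utf-8")) % len(self.partitions)

    def insert(self, building_id, owner, price):
        part = self.partitions[self._partition_for(owner)]
        part[building_id] = {"owner": owner, "price": price}

    def buildings_of(self, owner):
        # Single-partition read: only the owner's partition is scanned.
        part = self.partitions[self._partition_for(owner)]
        return sorted(b for b, row in part.items() if row["owner"] == owner)

    def sell(self, building_id, seller, buyer, price):
        # The owner is the partition key, so a sale removes the row from the
        # seller's partition and re-inserts it under the buyer's.
        src = self.partitions[self._partition_for(seller)]
        row = src.get(building_id)
        if row is None or row["owner"] != seller:
            raise KeyError(f"{seller} does not own {building_id}")
        del src[building_id]
        self.insert(building_id, buyer, price)
```

In a real distributed store the two halves of `sell` can land on different machines, so the fast single-partition read is paid for with a cross-partition write that needs some form of coordination.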
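This question is almost impossible to answer because of its generality. I think you are looking for some easy answer of the form problem = solution. The trouble is that each "problem" becomes more and more unique as it becomes a business.
What do you call a social network? Twitter? Facebook? LinkedIn? Stack Overflow? They all use different solutions for different parts, and many solutions can exist that use a polyglot approach. Twitter has a graph-like concept, but there are only 1-degree connections: followers and following. LinkedIn, on the other hand, thrives on showing how people are connected beyond the first degree. These are two different processing and data needs, yet both are "social networks".
If you have a "social network" but don't do any discovery mechanisms, then you can most likely use any basic key-value store. If you need high performance and horizontal scale, and will have secondary indexes or full-text search, you could use Couchbase.
If you are doing machine learning on top of the log data you are gathering, you can integrate Hadoop with Hive or Pig, or Spark/Shark. Or you can do a lambda architecture and use many different systems with Storm.
If you are doing discovery via graph-like queries that go beyond 2nd-degree vertexes and also filter on edge properties, you will likely consider graph databases on top of your primary storage. However, graph databases aren't good choices for a session store or as general-purpose stores, so you will need a polyglot solution to be efficient.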
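To show what "beyond 2nd-degree vertexes, filtered on edge properties" means in practice, here is a small breadth-first search over an in-memory adjacency map. It is only a sketch of the query shape. The `within_degrees` function, the `SAMPLE_GRAPH` data, and the `kind` edge property are invented for the example, and a graph database would run this kind of traversal natively instead of in application code.

```python
from collections import deque

# Undirected sample graph: person -> {neighbor: edge properties}.
SAMPLE_GRAPH = {
    "ann": {"bob": {"kind": "colleague"}},
    "bob": {
        "ann": {"kind": "colleague"},
        "cat": {"kind": "colleague"},
        "dan": {"kind": "friend"},
    },
    "cat": {"bob": {"kind": "colleague"}, "eve": {"kind": "colleague"}},
    "dan": {"bob": {"kind": "friend"}},
    "eve": {"cat": {"kind": "colleague"}},
}


def within_degrees(graph, start, max_degree, edge_ok=lambda props: True):
    """Return {person: degree} for everyone reachable from start in at most
    max_degree hops, following only edges whose properties pass edge_ok."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_degree:
            continue  # do not expand past the requested depth
        for neighbor, props in graph.get(node, {}).items():
            if neighbor not in seen and edge_ok(props):
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]
    return seen
```

With `max_degree=1` this is the Twitter-style followers lookup that a key-value store handles fine. With `max_degree=3` and a filter such as `lambda p: p["kind"] == "colleague"` it becomes the LinkedIn-style query, and the number of vertices visited grows quickly with depth on real data.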
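What is the data velocity? The scale? How do you want to manage it? What expertise do you have available in the company or startup? There are a number of reasons why this is not a simple question and answer.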