从表创建嵌套数组的最佳方法:多个查询/循环VS单个查询/循环样式

Question

从表创建嵌套数组的最佳方法:多个查询/循环VS单个查询/循环样式

Cap*_*nch 6 php mysql sql server-side relational-database

假设我有2个表,我可以"合并"并表示在一个嵌套数组中.

考虑到:我正在徘徊这是最好的方法.

效率
最佳实践
数据库/服务器端使用权衡
你应该在现实生活中做些什么
3个,4个或更多表的相同情况,可以这样"合并"

问题是关于任何服务器端/关系数据库.

我想到的两种简单方法(如果你有其他人,请建议!注意我要求一个简单的SERVER-SIDE和RELATIONAL-DB,所以请不要浪费你的时间来解释为什么我不应该使用这种DB,使用MVC设计等等......):

2个循环,5个简单的"SELECT"查询
1个循环,1个"JOIN"查询

我试图给出一个简单而详细的例子,以便解释自己并更好地理解你的答案(虽然如何编写代码和/或发现可能的错误不是问题,所以尽量不要关注那个...... .)

用于创建数据并将数据插入表格的SQL脚本

CREATE TABLE persons
(
    id int NOT NULL AUTO_INCREMENT,
    fullName varchar(255),
    PRIMARY KEY (id)
);

INSERT INTO persons (fullName) VALUES ('Alice'), ('Bob'), ('Carl'), ('Dan');

CREATE TABLE phoneNumbers
(
    id int NOT NULL AUTO_INCREMENT,
    personId int,
    phoneNumber varchar(255),
    PRIMARY KEY (id)
);

INSERT INTO phoneNumbers (personId, phoneNumber) VALUES ( 1, '123-456'), ( 1, '234-567'), (1, '345-678'), (2, '456-789'), (2, '567-890'), (3, '678-901'), (4, '789-012');

Run Code Online (Sandbox Code Playgroud)

在我"合并"之后表格的JSON表示:

[
  {
    "id": 1,
    "fullName": "Alice",
    "phoneNumbers": [
      "123-456",
      "234-567",
      "345-678"
    ]
  },
  {
    "id": 2,
    "fullName": "Bob",
    "phoneNumbers": [
      "456-789",
      "567-890"
    ]
  },
  {
    "id": 3,
    "fullName": "Carl",
    "phoneNumbers": [
      "678-901"
    ]
  },
  {
    "id": 4,
    "fullName": "Dan",
    "phoneNumbers": [
      "789-012"
    ]
  }
]

Run Code Online (Sandbox Code Playgroud)

PSEUDO代码2种方式:

1.

query: "SELECT id, fullName FROM persons"
personList = new List<Person>()
foreach row x in query result:
    current = new Person(x.fullName)
    "SELECT phoneNumber FROM phoneNumbers WHERE personId = x.id"
    foreach row y in query result:
        current.phoneNumbers.Push(y.phoneNumber)
    personList.Push(current)        
print personList

Run Code Online (Sandbox Code Playgroud)

2.

query: "SELECT persons.id, fullName, phoneNumber FROM persons
            LEFT JOIN phoneNumbers ON persons.id = phoneNumbers.personId"
personList = new List<Person>()
current = null
previouseId = null
foreach row x in query result:
    if ( x.id !=  previouseId )
        if ( current != null )
            personList.Push(current)
            current = null
        current = new Person(x.fullName)
    current.phoneNumbers.Push(x.phoneNumber)
print personList

Run Code Online (Sandbox Code Playgroud)

PHP/MYSQL中的代码实现:

1.

/* get all persons */
$result = mysql_query("SELECT id, fullName FROM persons"); 
$personsArray = array(); //Create an array
//loop all persons
while ($row = mysql_fetch_assoc($result))
{
    //add new person
    $current = array();
    $current['id'] = $row['id'];
    $current['fullName'] = $row['fullName'];

    /* add all person phone-numbers */
    $id = $current['id'];
    $sub_result = mysql_query("SELECT phoneNumber FROM phoneNumbers WHERE personId = {$id}");
    $phoneNumbers = array();
    while ($sub_row = mysql_fetch_assoc($sub_result))
    {
        $phoneNumbers[] = $sub_row['phoneNumber']);
    }
    //add phoneNumbers array to person
    $current['phoneNumbers'] = $phoneNumbers;

    //add person to final result array
    $personsArray[] = $current;
}

echo json_encode($personsArray);

Run Code Online (Sandbox Code Playgroud)

2.

/* get all persons and their phone-numbers in a single query */
$sql = "SELECT persons.id, fullName, phoneNumber FROM persons
            LEFT JOIN phoneNumbers ON persons.id = phoneNumbers.personId";
$result = mysql_query($sql); 

$personsArray = array();
/* init temp vars to save current person's data */
$current = null;
$previouseId = null;
$phoneNumbers = array();
while ($row = mysql_fetch_assoc($result))
{
    /*
       if the current id is different from the previous id:
       you've got to a new person.
       save the previous person (if such exists),
       and create a new one
    */
    if ($row['id'] != $previouseId )
    {
        // in the first iteration,
        // current (previous person) is null,
        // don't add it
        if ( !is_null($current) )
        {
            $current['phoneNumbers'] = $phoneNumbers;
            $personsArray[] = $current;
            $current = null;
            $previouseId = null;
            $phoneNumbers = array();
        }

        // create a new person
        $current = array();
        $current['id'] = $row['id'];
        $current['fullName'] = $row['fullName'];
        // set current as previous id
        $previouseId = $current['id'];
    }

    // you always add the phone-number 
    // to the current phone-number list
    $phoneNumbers[] = $row['phoneNumber'];
    }
}

// don't forget to add the last person (saved in "current")
if (!is_null($current))
    $personsArray[] = $current);

echo json_encode($personsArray);

Run Code Online (Sandbox Code Playgroud)

PS这个链接是这里一个不同问题的例子,我试图建议第二种方式:表格单个json

Answer 1

Per*_*DBA 7

Preliminary

First, thank you for putting that much effort into explaining the problem, and for the formatting. It is great to see someone who is clear about what they are doing, and what they are asking.

But it must be noted that that, in itself, forms a limitation: you are fixed on the notion that this is the correct solution, and that with some small correction or guidance, this will work. That is incorrect. So I must ask you to give that notion up, to take a big step back, and to view (a) the whole problem and (b) my answer without that notion.

The context of this answer is:

all the explicit considerations you have given, which are very important, which I will not repeat
the two most important of which is, what best practice and what I would do in real life

This answer is rooted in Standards, the higher order of, or frame of reference for, best practice. This is what I have done in real life since 1990, meaning that since 1990, I have never had the need to write code such as yours. This is what the commercial Client/Server world does, or should be doing.

This issue, this whole problem space, is becoming a common problem. I will give a full consideration here, and thus answer another SO question as well. Therefore it might contain a tiny bit more detail that you require. If it does, please forgive this.

Consideration

The database is a server-based resource, shared by many users. In an online system, the database is constantly changing. It contains that One Version of the Truth (as distinct from One Fact in One Place, which is a separate, Normalisation issue) of each Fact.
- the fact that mickey mouse NONsqls do not have a server architecture, and that therefore the notion of server in such software is false and misleading, are separate but noted points.
As I understand it, JSON and JSON-like structures are required for "performance reasons", precisely because the "server" doesn't, cannot, perform as a server. The concept is to cache the data on each (every) client, such that you are not fetching it from the "server" all the time.
- This opens up a stinking can of worms. If you do not design and implement this properly, the worms will overrun the app and the stench will kill you.
- Such an implementation is a gross violation of the Client/Server Architecture, which allows simple code on both sides, and appropriate deployment of software and data components, such that implementation times are small, and efficiency is high.
- Further, such an implementation requires a substantial implementation effort, and it is complex, consisting of many parts. Each of those parts must be appropriately designed.
- The web, and the many books written in this subject area, provide a cesspool of methods, marketed on the basis of supposed simplicity; ease; anyone-can-do-anything; freeware-can-do-anything; etc. There is not scientific basis for any of those proposals.

Non-architecture & Sub-standard

如证据所示,您已经了解到这种营销神话是欺诈性的.您遇到了一个问题,一个是该建议错误的实例.一旦你解决了这个问题,下一个问题就会暴露出来,这个问题现在对你来说并不明显.这些概念是一组永无止境的问题.

我不会列举这些假装专家(实际上是对技术一无所知的马戏团怪)市场的所有错误观念.我相信,当你逐步完成我的回答时,你会注意到一个接一个推销的概念是假的.

两个底线是:

The notions violate Architecture and Design Standards, namely Client/Server Architecture; Open Architecture; Engineering Principles; and to a lesser in this particular problem, Database Design Principles.
Which leads to people like you, who are trying to do an honest job, being defrauded, tricked, seduced, into implementing simple notions, which turn into massive implementations. Implementations that will never quite work, so they require substantial ongoing maintenance, and will eventually be replaced, wholesale.

Architecture

The central principle being violated is, never duplicate anything. The moment you have a location where data is duplicated (due to caching or replication or two separate monolithic apps, etc), you create a duplicate that will go out of synch in an online situation. So the principle is to avoid doing that.

Sure, for serious third-party software, such as a gruntly report tool, by design, they may well cache server-based data in the client. But note that they have put hundreds of man years into implementing it correctly, with due consideration to the above. Yours is not such a piece of software.

Rather than providing a lecture on the principles that must be understood, or the evils and costs of each error, the rest of this answer provides the requested what would you do in real life, using the correct architectural method (a step above best practice).

Architecture 1

Do not confuse

the data
which must be Normalised

with

the result set
which, by definition, is the flattened ("de-normalised" is not quite correct) view of the data.

The data, given that it is Normalised, will not contain duplicate values; repeating groups. The result set will contain duplicate values; repeating groups. That is pedestrian.

Note that the notion of Nested Sets (or Nested Relations), which is being heavily marketed by the schizophrenics, is based on precisely this confusion.
For fortyfive years since the advent of the RM, they have been unable to differentiate base relations (for which Normalisation does apply) from derived relations (for which Normalisation does not apply).
Two of the freaks are currently mounting an assault on the definition of First Normal Form. This is an assault on the intellect. This would (if accepted) normalise insanity. 1NF is the foundation of the other NFs, if this insanity is accepted, all the NFs will damaged, demeaned, rendered value-less. The result would be that Normalisation itself (sparsely defined in mathematical terms, but clearly understood as a science by professionals) will be severely damaged, if not destroyed.

Architecture 2

There is a centuries-old scientific or engineering principle, that content (data) must be separated from control (program elements). This is because the analysis; design; and implementation of the two are completely different. This principle is no less important in the software sciences, where it has specific articulation.

In order to keep this brief (ha ha), instead of a discourse, I will assume that you understand:

That there is a scientifically demanded boundary between data and program elements. Mixing them up results in complex objects that are error-prone and hard to maintain.
- The confusion of this principle has reached epidemic proportions in the OO/ORM world, the consequences reach far and wide.
- Only educated professionals avoid this insanity. For the rest, the great majority, they accept this insanity as "normal", and they spend their lives fixing problems that we simply do not have.
The architectural superiority, the great value, of data being both stored and presented in Tabular Form per Dr E F Codd's Relational Model. That there are specific rules for Normalisation of data.
And importantly, you can determine when the people in the mad house, who write and market books, advise non-relational or anti-relational methods.

Architecture 3

If you cache data on the client:

Cache the absolute minimum.

That means cache only the data that does not change in the online environment. That means Reference and Lookup tables only, the tables that populate the higher level classifiers, the drop-downs, etc.
Currency

For every table that you do cache, you must have a method of (a) determining that the cached data has become stale, compared to the One Version of the Truth which exists on the server, and (b) refreshing it from the server, (c) on a table-by-table basis.

Typically, this involves a background process that executes every (e) five minutes, that queries the MAX updated DateTime for each cached table on the client vs the DateTime on the server, and if changed, refreshes the table, and all its child tables, those that dependent on the changed table.

That, of course, requires that you have an UpdatedDateTime column on every table. That is not a burden, because you need that for OLTP ACID Transactions anyway (if you have a real database, instead of a bunch of sub-standard files).

Which really means, never replicate, the coding burden is prohibitive.

Architecture 4

In the sub-commercial, non-server world, I understand that the freaks advise the reverse (insane people always contradict sanity), caching of "everything".

That is the only way the programs like PusGresQl, produced by their cohorts in the same asylum, can to the used in a multi-user system, the only way that they can spread their cancer.
You always get what you pay for: you pay peanuts, you get monkeys; you pay zero, you get zero.

The corollary to Architecture 3 is, if you do cache data on the client, do not cache tables that change frequently. These are the transaction and history tables. The notion of caching such tables, or all tables, on the client is completely bankrupt.

In a genuine Client/Server deployment, due to use of applicable standards, for each data window, the app should query only the rows that are required, for that particular need, at that particular time, based on context or filter values, etc. The app should never load the entire table.

If the same user using the same window inspected its contents, 15 minutes after the first inspection, the data would be 15 mins out of date.

For freeware/shareware/vapourware platforms, which define themselves by the absence of a server architecture, and thus by the result, that performance is non-existent, sure, you have to cache more than the minimum tables on the client.
If you do that, you must take all the above into account, and implement it correctly, otherwise your app will be broken, and the stench will drive the users to seek your termination. If there is more than one user, they will have the same cause, and soon form an army.

Architecture 5

Now we get to how you cache those carefully chosen tables on the client.

Note that databases grow, they are extended.

If the system is broken, a failure, it will grow in small increments, and require a lot of effort.
If the system is even a small success, it will grow exponentially.
If the system (each of the database, and the app, separately) is designed and implemented well, the changes will be easy, the bugs will be few.

Therefore, all the components in the app must be designed properly, to comply with applicable standards, and the database must be fully Normalised. This in turn minimises the effect of changes in the database, on the app, and vice versa.

The app will consist of simple, not complex, objects, which are easy to maintain and change.
For the data that you do cache on the client, you will use arrays of some form: multiple instances of a class in an OO platform; DataWindows (TM, google for it) or similar in a 4GL; simple arrays in PHP.

(Aside. Note gravely, that what people in situations such as yours produce in one year, professional providers such as I produce in one week, using a commercial SQL platform, a commercial 4GL, and complying with Architecture and Standards.)

Architecture 6

So let's assume that you understand all the above, and appreciate its value, particularly Architecture 1 & 2.

If you don't, please stop here and ask questions, do not proceed to the below.

Now that we have established the full context, we can address the crux of your problem.

In those arrays in the app, why on Earth would you store flattened views of data ?
- and consequently mess with, and agonise over, the problems
instead of storing copies of the Normalised tables ?

Answer

Never duplicate anything that can be derived. That is an Architectural Principle, not limited to Normalisation in a database.
Never merge anything.

If you do, you will be creating:
- data duplication, and masses of it, on the client. The client will not only be fat and slow, it will be anchored to the floor with the ballast of duplicated data.
- additional code, which is completely unnecessary
- complexity in that code
- code that is fragile, that will constantly have to change.
That is the precise problem you are suffering, a consequence of the method, which you know intuitively is wrong, that there must be a better way. You know it is a generic and common problem.

Note also that method (the poison that is marketed), that code, constitutes a mental anchor for you. Look at the way that you have formatted it and presented it so beautifully: it is of importance to you. I am reluctant to inform you of all this.
- Which reluctance is easily overcome, due to your earnest and forthright attitude, and the knowledge that you did not invent this method, that you followed "teachers" who are, as evidenced, totally ignorant of the relevant science, who market insanity; non-science; nonsense, as "science".
In each code segment, at presentation time, as and when required:

a. In the commercial Client/Server context
Execute a query that joins the simple, Normalised, unduplicated tables, and retrieves only the qualifying rows. Thereby obtaining current data values. The user never sees stale data. Here, Views (flattened views of Normalised data) are often used.

b. In the sub-commercial non-server context
Create a temporary result-set array, and join the simple, unduplicated, arrays (copies of tables that are cached), and populate it with only the qualifying rows, from the source arrays. The currency of which is maintained by the background process.
- Use the Keys to form the joins between the arrays, in exactly the same way that Keys are used to form the joins in the Relational tables in the database.
- Destroy those components when the user closes the window.
- A clever version would eliminate the result-set array, and join the source arrays via the Keys, and limit the result to the qualifying rows.

Separate to being an architectural insanity, Nested Arrays or Nested Sets or JSON or JSON-like structures are simply not required. This is the consequence of confusing the Architecture 1 Principle.

If you do choose to use such structures, then use them only for the temporary result-set arrays.

Last, I trust this discourse demonstrates that n tables is a non-issue. More important, that m levels deep in the data hierarchy, the "nesting", is a non-issue.

Answer 2

Now that I have given the full context (and not before), which removes the implications in your question, and makes it a generic, kernel one.

The question is about ANY server-side/relational-db. [Which is better]:

2 loops, 5 simple "SELECT" queries

1 loop, 1 "JOIN" query

The detailed examples you have given are not accurately described above. The accurate descriptions is:

Your Option 1 2 loops, each loop for loading each array 1 single-table SELECT query per loop (executed n x m times ... the outermost loop, only, is a single execution)
Your Option 2 1 Joined SELECT query executed once followed by 2 loops, each loop for loading each array

For the commercial SQL platforms, neither, because it does not apply.

The commercial SQL server is a set-processing engine. Use one query with whatever joins are required, that returns a result set. Never step through the rows using a loop, that reduces the set-processing engine to a pre-1970's ISAM system. Use a View, in the server, since it affords the highest performance and the code is in one place.

However, for the non-commercial, non-server platforms, where:

your "server" is not a set-processing engine ie. it returns single rows, therefore you have to fetch each row and fill the array, manually or
your "server" does not provide Client/Server binding, ie. it does not provide facilities on the client to bind the incoming result set to a receiving array, and therefore you have to step through the returned result set, row by row, and fill the array, manually,

as per your example then, the answer is, by a large margin, your option 2.

Please consider carefully, and comment or ask questions.

Response to Comment

Say i need to print this json (or other html page) to some STOUT (example: an http response to: GET /allUsersPhoneNumbers. it's just an example to clarify what i'm expecting to get), should return this json. i have a php function that got this 2 result sets (1). now it should print this json - how should i do that?. this report could be an employee month salary for a whole year, and so one. one way or anther, i need to gather this information and represent it in a "JOIN"ed representation

Perhaps I was not clear enough.

Basically, do not use JSON. Unless you absolutely have to. Which means sending to some system that requires it, which means that receiving system, and that demand, is very, very, stupid.
Make sure that your system doesn't make such demands on others.
Keep your data Normalised. Both in the database, and in whatever program elements that you write. That means (in this example) use one SELECT per table or array. That is for loading purposes, so that you can refer to and inspect them at any point in the program.
When you need a join, understand that it is:
- a result-set; a derived relation; a view
- therefore temporary, it exists for the duration of the execution of that element, only
a. For tables, join them in the usual manner, via Keys. One query, joining two (or more) tables.

b. For arrays, join arrays in the program, the same way you join tables in the database, via Keys.
For the example you have given, which is a response to some request, first understand that it is the category [4], and then fulfil it.

Why even consider JSON ???
What h

thnx详细的答案和回应.现在我明白你的答案 (2认同)

归档时间：	10 年，9 月前
查看次数：	4307 次
最近记录：	8 年前