Handling missing information when converting a list to a data frame or data table

Question

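Related to a previous question: is there a way to convert a list of named elements into a data table such that the NA values appear in the data table according to the order in which the elements occur in the list?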

For example, the list

testlist <- list("Blue", "405", "Truck", "400", "Car", "White", "500", "Truck")
testnames <- c("Color", "HP", "Type", "HP", "Type", "Color", "HP", "Type")
names(testlist) <- testnames

$Color
[1] "Blue"

$HP
[1] "405"

$Type
[1] "Truck"

$HP
[1] "400"

$Type
[1] "Car"

$Color
[1] "White"

$HP
[1] "500"

$Type
[1] "Truck"

can be converted to a data table with:

dcast(setDT(melt(testlist))[, N:=1:.N, L1], N~L1, value.var='value')

But the output looks like this:

  N Color  HP  Type
1 1  Blue 405 Truck
2 2 White 400   Car
3 3  <NA> 500 Truck

whereas I want:

  N Color  HP  Type
1 1  Blue 405 Truck
2 2  <NA> 400   Car
3 3 White 500 Truck

Does anyone have a suggestion for how to solve this? I would appreciate the help.

Answer 1

bgo*_*dst 9

One approach is to preallocate a table with the correct number of rows and the correct number, names, and types of columns, and then fill it in by index-assigning the cells covered by the original list.

cns <- c('Color','HP','Type');
lcis <- match(names(testlist),cns);
lris <- c(1L,cumsum(diff(lcis)<=0L)+1L);
df <- as.data.frame(testlist[match(1:length(cns),lcis)],stringsAsFactors=F)[0,];
df[max(lris),] <- NA;
df;
##   Color   HP Type
## 1  <NA> <NA> <NA>
## 2  <NA> <NA> <NA>
## 3  <NA> <NA> <NA>
for (ci in 1:length(cns)) { m <- lcis==ci; df[lris[m],ci] <- do.call(c,testlist[m]); };
df;
##   Color  HP  Type
## 1  Blue 405 Truck
## 2  <NA> 400   Car
## 3 White 500 Truck

In my solution I was careful to handle each column separately, which provides the potential benefit that if different columns in the output table (corresponding to different subsets of components in the input list) have different data types, then those data types will be preserved in the final table. This is why I opted for a for loop for the index-assignment. This is of course not necessary for your exact input list, which has only character types, but I thought it was a worthy goal anyway.
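
The type-preservation point can be illustrated with a small sketch. The list `mixed` below is a hypothetical variant of the question's `testlist` in which the HP components are numeric rather than character; it is not part of the original data. The same preallocate-and-fill steps are applied, with `unlist()` used in place of `do.call(c, ...)` as an assumed equivalent for same-typed components.

```r
# Hypothetical mixed-type input: HP is numeric, the other components are character.
mixed <- list(Color = "Blue", HP = 405, Type = "Truck",
              HP = 400, Type = "Car",
              Color = "White", HP = 500, Type = "Truck")

cns  <- c("Color", "HP", "Type")
lcis <- match(names(mixed), cns)                 # column index of each component
lris <- c(1L, cumsum(diff(lcis) <= 0L) + 1L)     # row index of each component

# Zero-row template built from the first occurrence of each column,
# so every column starts with the type of its own components.
out <- as.data.frame(mixed[match(seq_along(cns), lcis)],
                     stringsAsFactors = FALSE)[0, ]
out[max(lris), ] <- NA                           # extend to the final row count

# Fill one column at a time so each column keeps its own type.
for (ci in seq_along(cns)) {
  m <- lcis == ci
  out[lris[m], ci] <- unlist(mixed[m], use.names = FALSE)
}

str(out)
```

Under these assumptions, HP should come out as a numeric column while Color and Type stay character, with an NA in the second row of Color.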

Explanation of intermediate variables

cns: The column names in the output table.
lcis: The column index each input list component will have in the output table. This is computed simply by matching the names of the input list components against cns.
lris: The row index each input list component will have in the output table. The computation of this variable is somewhat interesting and central to the solution. Since column representation in the input list is incomplete (in other words, there can be "missing columns" in the input list), but you consider the input list components to be ordered with respect to their row-wise occurrence in the output table, we can't use regular indexing (such as taking every three components as a row), and we also can't use any single column name as a marker of each row, because any column can be missing in any row. To my thinking, the only correct approach is to identify when a lower-index (or equal-index, actually) column occurs immediately after a higher-index (or equal-index) column in the input list, and take those as row breaks. Hence, we can take diff(lcis)<=0L to get a logical vector representing the row breaks, then apply cumsum() and add 1 to get the row indexes, and we must also manually prepend 1 to complete the vector.
ci: A column index into the output table. Used in the for loop to iterate over each output column.
m: Computed for each ci inside the for loop. A logical vector indicating which input list components belong to the current column ci. Used to index both lris (to pull out the row indexes to assign to) and the input list itself (to pull out the actual values to assign).
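
As an illustration of how lris is derived, the sketch below walks through the computation one step at a time for the names in the question's test list. The intermediate names `steps` and `breaks` are introduced here only for readability and do not appear in the answer's code.

```r
testnames <- c("Color", "HP", "Type", "HP", "Type", "Color", "HP", "Type")
cns <- c("Color", "HP", "Type")

lcis   <- match(testnames, cns)       # 1 2 3 2 3 1 2 3
steps  <- diff(lcis)                  # 1 1 -1 1 -2 1 1
breaks <- steps <= 0L                 # TRUE wherever a new row must start
lris   <- c(1L, cumsum(breaks) + 1L)  # 1 1 1 2 2 3 3 3

# Side-by-side view of where each component will land
data.frame(name = testnames, col = lcis, row = lris)
```

The second row has no Color component, so nothing is ever assigned to that cell and it keeps the NA from the preallocated table.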

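Real data

I grabbed your real data from Dropbox and stored it as testlist. Here are my findings.

First, I examined the unique component names in order of appearance, treating them as cns: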

## first reasonable assumption about cns
cns <- unique(names(testlist));
cns;
##  [1] "Status"              "Make"                "Model"
##  [4] "Kilometres"          "Stock Number"        "Engine"
##  [7] "Number of Hours"     "Front axle"          "Rear axle"
## [10] "Suspension"          "Wheelbase"           "Transmission"
## [13] "Price"               "Style/Trim"          "Brakes"
## [16] "Mfg Exterior Colour" "Tires"               "Engine (HP)"
## [19] "Exterior Colour"

From which we can compute a new tentative lcis:

## examine lcis for ordering
lcis <- match(names(testlist),cns);
lcis;
##   [1]  1  2  3  4  5  6  7  8  9 10 11 12 13  1  2  3  4  5  6  7  8  9 10 11 12
##  [26] 13  1  2  3  4  5  6  7  8  9 10 11 12 13  1  2  3  4  5  6  7  8  9 10 11
##  [51] 12 13  1  2  3  4  5  6  7  8  9 10 11 12 13  1  2  3  4  5  6  7  8  9 10
##  [76] 11 12 13  1  2  3  4  5  6  7  8  9 10 11 12 13  1  2  3  4  5  6  7  8  9
## [101] 10 11 12 13  1  2  3  4  5  6  7  8  9 10 11 12 13  1  2  3  4  5  6  7  8
## [126]  9 10 11 12 13  1  2  3  4  5  6  7  8  9 10 11 12 13  1  2  3  4  5  6  7
## [151]  8  9 10 11 12 13  1  2  3  4 14 13  1  2  3  4  5  6  7  8  9 10 11 12 13
## [176]  1  2  3  4  5 15 16  6  8  9 10 17 11 18 12 19 13  1  2  3  4  5 15 16  6
## [201]  8  9 10 17 11 18 12 19 13

Looking carefully at the above vector, we can see that it begins with many regular repetitions of 1:13. In fact, only towards the end of the vector does it become irregular, where we see 14 followed by 13, and 16 followed by 6, 10-11-12 interleaved with 17-18-19, etc.

But one important observation we can make here is that the vector seems to consist of groups delineated by 1 and 13. In other words, for all extents that seem to have some regularity (even if there is also some irregularity), they seem to begin with 1 and end with 13. This observation agrees with your comment regarding disorder in the middle of vehicle data. Let's call this the 1/13 assumption.
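
An assumption like this one could also be checked programmatically rather than by eye. The following is a minimal sketch on a hypothetical stand-in vector, `lcis_demo`, made of two regular records and one irregular one; the names `first_col` and `last_col` are illustrative and would need to match the real boundary columns.

```r
# Hypothetical stand-in for lcis: two regular records, then an irregular one.
lcis_demo <- c(1:13, 1:13, 1, 2, 3, 4, 14, 13)
first_col <- 1L    # assumed index of the column that opens every record
last_col  <- 13L   # assumed index of the column that closes every record

# Start a new group at every occurrence of the opening column.
groups <- split(lcis_demo, cumsum(lcis_demo == first_col))

starts_ok <- vapply(groups, function(g) g[1] == first_col, logical(1))
ends_ok   <- vapply(groups, function(g) g[length(g)] == last_col, logical(1))
# The closing column should appear exactly once per group.
once_ok   <- vapply(groups, function(g) sum(g == last_col) == 1L, logical(1))

all(starts_ok & ends_ok & once_ok)
```

If the final expression is FALSE for real data, some record is missing one of its boundary columns and the assumption cannot be relied on to delimit rows.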

We can get a clearer view of the groups by splitting on this 1/13 boundary:

## recognizing 1/13 consistency, split on it to see how each (possible) row looks under this assumption
split(lcis,cumsum(lcis==1L));
## $`1`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`2`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`3`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`4`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`5`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`6`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`7`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`8`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`9`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`10`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`11`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`12`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`13`
## [1]  1  2  3  4 14 13
##
## $`14`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`15`
##  [1]  1  2  3  4  5 15 16  6  8  9 10 17 11 18 12 19 13
##
## $`16`
##  [1]  1  2  3  4  5 15 16  6  8  9 10 17 11 18 12 19 13

Now, if you look very carefully at the above groups, you can figure out that it is possible to reorder cns in such a way that all groups will be ordered ascending. They will not be contiguous, but contiguity is not required for the solution I devised for the original problem; all that is necessary is ascending order.

For instance, we need to order column 14 before 13, and we need to order columns 15 and 16 before 6, 8, 9, etc.:

## recognizing the possibility of reordering to achieve perfect within-row ascending order, reorder cns to cns2
cns2 <- cns[c(1,2,3,4,14,5,15,16,6,7,8,9,10,17,11,18,12,19,13)];
cns2;
##  [1] "Status"              "Make"                "Model"
##  [4] "Kilometres"          "Style/Trim"          "Stock Number"
##  [7] "Brakes"              "Mfg Exterior Colour" "Engine"
## [10] "Number of Hours"     "Front axle"          "Rear axle"
## [13] "Suspension"          "Tires"               "Wheelbase"
## [16] "Engine (HP)"         "Transmission"        "Exterior Colour"
## [19] "Price"

Now we can recalculate lcis, which I will now call lcis2, and demonstrate the new group orders:

## calculate lcis2 from cns2, and prove that we've successfully ordered each individual row under the 1/13 (now 1/19) break assumption
lcis2 <- match(names(testlist),cns2);
split(lcis2,cumsum(lcis2==1L));
## $`1`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`2`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`3`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`4`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`5`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`6`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`7`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`8`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`9`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`10`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`11`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`12`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`13`
## [1]  1  2  3  4  5 19
##
## $`14`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`15`
##  [1]  1  2  3  4  6  7  8  9 11 12 13 14 15 16 17 18 19
##
## $`16`
##  [1]  1  2  3  4  6  7  8  9 11 12 13 14 15 16 17 18 19
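
Whether a candidate column ordering really makes every group ascending can be tested instead of inspected. Below is a minimal sketch on a hypothetical miniature of the situation; the helper `is_ascending_within_rows` and the toy names are illustrative only. It assumes, as above, that every record starts with the first column.

```r
# Hypothetical miniature: "D" is first seen late but belongs between "B" and "C".
nms <- c("A", "B", "C", "A", "B", "D", "C")
cns_demo  <- unique(nms)               # "A" "B" "C" "D" (first-seen order)
cns2_demo <- cns_demo[c(1, 2, 4, 3)]   # "A" "B" "D" "C" (hand-reordered)

# TRUE when, with records split at column 1, every record is strictly ascending.
is_ascending_within_rows <- function(nms, cns) {
  idx <- match(nms, cns)
  groups <- split(idx, cumsum(idx == 1L))
  all(vapply(groups, function(g) all(diff(g) > 0L), logical(1)))
}

is_ascending_within_rows(nms, cns_demo)    # FALSE: second record is 1 2 4 3
is_ascending_within_rows(nms, cns2_demo)   # TRUE:  second record is 1 2 3 4
```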

And finally, we can run the entire solution, being careful to use the 2-suffixed variable names now:

## now we can apply the preallocate/fill-in solution using cns2 and lcis2
## will use lris2 and df2 just to be consistent
lris2 <- c(1L,cumsum(diff(lcis2)<=0L)+1L);
df2 <- as.data.frame(testlist[match(1:length(cns2),lcis2)],stringsAsFactors=F)[0,];
df2[max(lris2),] <- NA;
df2;
##    Status Make Model Kilometres Style.Trim Stock.Number Brakes Mfg.Exterior.Colour Engine Number.of.Hours Front.axle Rear.axle Suspension Tires Wheelbase Engine..HP. Transmission Exterior.Colour Price
## 1    <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 2    <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 3    <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 4    <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 5    <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 6    <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 7    <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 8    <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 9    <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 10   <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 11   <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 12   <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 13   <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 14   <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 15   <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 16   <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
for (ci in 1:length(cns2)) { m <- lcis2==ci; df2[lris2[m],ci] <- do.call(c,testlist[m]); };
df2;
##    Status          Make                                          Model Kilometres    Style.Trim Stock.Number Brakes Mfg.Exterior.Colour                  Engine Number.of.Hours                     Front.axle                      Rear.axle                     Suspension    Tires Wheelbase Engine..HP.                   Transmission Exterior.Colour    Price
## 1     New     Peterbilt                 367 Tri-Drive c/w 58'' Sleeper   3,360 km          <NA>        12949   <NA>                <NA> Cummins ISX15  (550 hp)              44  Dana Spicer D2000  (20,000lb) Dana T69-170    (wide track) t Peterbilt Air-Trak  (66,000lb)     <NA>     267''        <NA>  RTLO18918B  Fuller (18 speed)            <NA> $217,770
## 2     New      Kenworth                               T800 T/A Tractor  82,230 km          <NA>        10720   <NA>                <NA>   Cummins ISX15 (550hp)           2,712 Dana Spicer D2000  (20,000 lb) Dana D46-170HPW (46,000 lb) ta Neway ADZ252    (52,000lb) Air     <NA>     244''        <NA> Fuller 18 spd main AT1202 2 sp            <NA> $199,500
## 3     New      Kenworth            T800 Tandem Tractor w/ 38'' Sleeper  98,521 km          <NA>        10722   <NA>                <NA>   Cummins ISX15 (550hp)           2,790 Dana Spicer D2000  (20,000 lb) Dana D46-170HPW (46,000 lb) ta Neway ADZ252    (52,000lb) Air     <NA>     244''        <NA> Fuller 18 spd main AT1202 2 sp            <NA> $199,500
## 4    Used      Kenworth           W900 Tri-Drive Sleeper Truck Tractor 170,422 km          <NA>        13227   <NA>                <NA> Cummins ISX15  (600 hp)           4,925 Meritor FL941      (20,000 lb)  Meritor RZ-166    (69,000 lb)  Kenworth AG690 (69,000lb) Air     <NA>     259''        <NA> 18 speed main &     4 speed au            <NA> $197,750
## 5     New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   3,367 km          <NA>        12180   <NA>                <NA>  Cummins ISX15  (550hp)              38 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     <NA>     244''        <NA>  RTLO18918B  Fuller (18 speed)            <NA> $193,300
## 6     New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   3,421 km          <NA>        12179   <NA>                <NA>  Cummins ISX15  (550hp)              46 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     <NA>     244''        <NA>  RTLO18918B  Fuller (18 speed)            <NA> $193,300
## 7     New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   2,157 km          <NA>        12181   <NA>                <NA>  Cummins ISX15  (550hp)              64 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     <NA>     244''        <NA>  RTLO18918B  Fuller (18 speed)            <NA> $189,880
## 8     New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   3,444 km          <NA>        12954   <NA>                <NA>  Cummins ISX15  (550hp)              45 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     <NA>     244''        <NA>  RTLO18918B  Fuller (18 speed)            <NA> $189,880
## 9     New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   3,427 km          <NA>        12955   <NA>                <NA>  Cummins ISX15  (550hp)              43 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     <NA>     244''        <NA>  RTLO18918B  Fuller (18 speed)            <NA> $189,880
## 10    New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   3,982 km          <NA>        12182   <NA>                <NA>  Cummins ISX15  (550hp)              78 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     <NA>     244''        <NA>  RTLO18918B  Fuller (18 speed)            <NA> $189,880
## 11    New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper  23,293 km          <NA>        12953   <NA>                <NA>  Cummins ISX15  (550hp)             394 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     <NA>     244''        <NA>  RTLO18918B  Fuller (18 speed)            <NA> $189,880
## 12    New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper  27,215 km          <NA>        12509   <NA>                <NA>  Cummins ISX15  (550hp)             458 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     <NA>     244''        <NA>  RTLO18918B  Fuller (18 speed)            <NA> $186,600
## 13   Used         Volvo                                 VNL64T 780-730  72,000 km VNL64T780-730         <NA>   <NA>                <NA>                    <NA>            <NA>                           <NA>                           <NA>                           <NA>     <NA>      <NA>        <NA>                           <NA>            <NA> $185,000
## 14    New     Peterbilt 367 T/A Wet Kit Tractor c/w       58'' Sleeper  60,657 km          <NA>        10838   <NA>                <NA>  Cummins ISX15  (550hp)           1,822 Dana Spicer E14621  (14,600 lb Dana D46-170HP (46,000lb) tand Peterbilt Air-Trak  (46,000lb)     <NA>     244''        <NA>  RTLO18918B  Fuller (18 speed)            <NA> $171,800
## 15   Used International                                   ProStar +122  36,236 km          <NA>       463555    Air               White             Cummins ISX            <NA>         Arvin Meritor 13200 lb         Arvin Meritor 40000 lb                     Int'l IROS  11R22.5    228 in         450      Eaton Fuller D/O (18 spd)           White $168,750
## 16   Used International                                   ProStar +122  33,000 km          <NA>       463543    Air               White             Cummins ISX            <NA>         Arvin Meritor 13200 lb         Arvin Meritor 46000 lb                     Int'l IROS 11R/22.5    236 in         475      Eaton Fuller D/O (18 spd)           White $165,900

Now, I realized that it might be preferable to move away entirely from the "ascending-order assumption" (let's call it) to the 1/13 assumption, which we can do very simply by changing the lris calculation. This will absolve us of the need to reorder cns from the order we receive from the unique() call.

Below I demonstrate this, reverting back to the unsuffixed variable names, which will be useful, as will be seen in a moment:

## change lris calculation to depend directly on 1/13 assumption; don't bother reordering
cns <- unique(names(testlist));
lcis <- match(names(testlist),cns);
lris <- c(1L,cumsum(lcis[-1]==1L)+1L);
df <- as.data.frame(testlist[match(1:length(cns),lcis)],stringsAsFactors=F)[0,];
df[max(lris),] <- NA;
for (ci in 1:length(cns)) { m <- lcis==ci; df[lris[m],ci] <- do.call(c,testlist[m]); };
df;
##    Status          Make                                          Model Kilometres Stock.Number                  Engine Number.of.Hours                     Front.axle                      Rear.axle                     Suspension Wheelbase                   Transmission    Price    Style.Trim Brakes Mfg.Exterior.Colour    Tires Engine..HP. Exterior.Colour
## 1     New     Peterbilt                 367 Tri-Drive c/w 58'' Sleeper   3,360 km        12949 Cummins ISX15  (550 hp)              44  Dana Spicer D2000  (20,000lb) Dana T69-170    (wide track) t Peterbilt Air-Trak  (66,000lb)     267''  RTLO18918B  Fuller (18 speed) $217,770          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 2     New      Kenworth                               T800 T/A Tractor  82,230 km        10720   Cummins ISX15 (550hp)           2,712 Dana Spicer D2000  (20,000 lb) Dana D46-170HPW (46,000 lb) ta Neway ADZ252    (52,000lb) Air     244'' Fuller 18 spd main AT1202 2 sp $199,500          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 3     New      Kenworth            T800 Tandem Tractor w/ 38'' Sleeper  98,521 km        10722   Cummins ISX15 (550hp)           2,790 Dana Spicer D2000  (20,000 lb) Dana D46-170HPW (46,000 lb) ta Neway ADZ252    (52,000lb) Air     244'' Fuller 18 spd main AT1202 2 sp $199,500          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 4    Used      Kenworth           W900 Tri-Drive Sleeper Truck Tractor 170,422 km        13227 Cummins ISX15  (600 hp)           4,925 Meritor FL941      (20,000 lb)  Meritor RZ-166    (69,000 lb)  Kenworth AG690 (69,000lb) Air     259'' 18 speed main &     4 speed au $197,750          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 5     New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   3,367 km        12180  Cummins ISX15  (550hp)              38 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     244''  RTLO18918B  Fuller (18 speed) $193,300          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 6     New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   3,421 km        12179  Cummins ISX15  (550hp)              46 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     244''  RTLO18918B  Fuller (18 speed) $193,300          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 7     New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   2,157 km        12181  Cummins ISX15  (550hp)              64 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     244''  RTLO18918B  Fuller (18 speed) $189,880          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 8     New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   3,444 km        12954  Cummins ISX15  (550hp)              45 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     244''  RTLO18918B  Fuller (18 speed) $189,880          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 9     New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   3,427 km        12955  Cummins ISX15  (550hp)              43 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     244''  RTLO18918B  Fuller (18 speed) $189,880          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 10    New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   3,982 km        12182  Cummins ISX15  (550hp)              78 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     244''  RTLO18918B  Fuller (18 speed) $189,880          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 11    New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper  23,293 km        12953  Cummins ISX15  (550hp)             394 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     244''  RTLO18918B  Fuller (18 speed) $189,880          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 12    New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper  27,215 km        12509  Cummins ISX15  (550hp)             458 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     244''  RTLO18918B  Fuller (18 speed) $186,600          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 13   Used         Volvo                                 VNL64T 780-730  72,000 km         <NA>                    <NA>            <NA>                           <NA>                           <NA>                           <NA>      <NA>                           <NA> $185,000 VNL64T780-730   <NA>                <NA>     <NA>        <NA>            <NA>
## 14    New     Peterbilt 367 T/A Wet Kit Tractor c/w       58'' Sleeper  60,657 km        10838  Cummins ISX15  (550hp)           1,822 Dana Spicer E14621  (14,600 lb Dana D46-170HP (46,000lb) tand Peterbilt Air-Trak  (46,000lb)     244''  RTLO18918B  Fuller (18 speed) $171,800          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 15   Used International                                   ProStar +122  36,236 km       463555             Cummins ISX            <NA>         Arvin Meritor 13200 lb         Arvin Meritor 40000 lb                     Int'l IROS    228 in      Eaton Fuller D/O (18 spd) $168,750          <NA>    Air               White  11R22.5         450           White
## 16   Used International                                   ProStar +122  33,000 km       463543             Cummins ISX            <NA>         Arvin Meritor 13200 lb         Arvin Meritor 46000 lb                     Int'l IROS    236 in      Eaton Fuller D/O (18 spd) $165,900          <NA>    Air               White 11R/22.5         475           White

As you can see, the column order of df is different from df2, but we can prove the data is identical with the following:

## prove df2 and df are identical, ignoring the column order difference
identical(df,df2[names(df)]);
## [1] TRUE
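
The two row-break rules agree on this data set, but they rest on different assumptions and can disagree on other inputs. The sketch below wraps each rule in a small helper (the function names are illustrative) and applies both to two hypothetical index vectors.

```r
# Rule 1: a row break wherever the column index fails to increase.
row_index_ascending <- function(lcis) c(1L, cumsum(diff(lcis) <= 0L) + 1L)
# Rule 2: a row break wherever the first column appears.
row_index_first_col <- function(lcis) c(1L, cumsum(lcis[-1] == 1L) + 1L)

# Case A: the first column is missing from the middle record (1 2 3 | 2 3 | 1 2 3).
a <- c(1L, 2L, 3L, 2L, 3L, 1L, 2L, 3L)
row_index_ascending(a)   # 1 1 1 2 2 3 3 3  (three rows, as intended)
row_index_first_col(a)   # 1 1 1 1 1 2 2 2  (middle record merged into row 1)

# Case B: columns out of order within the first record (1 3 2 | 1 2 3).
b <- c(1L, 3L, 2L, 1L, 2L, 3L)
row_index_ascending(b)   # 1 1 2 3 3 3      (first record split in two)
row_index_first_col(b)   # 1 1 1 2 2 2      (two rows, as intended)
```

So the ascending-order rule tolerates missing columns but needs a consistent column order, while the first-column rule tolerates disorder but needs the first column to be present in every record.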

Answer 2

Kha*_*haa 5

The best solution I could come up with:

library(data.table)
listnames <- names(testlist) 
# "Color" "HP"    "Type"  "HP"    "Type"  "Color" "HP"    "Type" 

unames <- unique(listnames)
# "Color" "HP"    "Type"

a <- setNames(1:length(unames), unames)
# Color    HP  Type 
# 1     2     3 

d <- unname(a[listnames])
# [1] 1 2 3 2 3 1 2 3

splitted_list <- split(testlist, cumsum(shift(d, fill=0)>d))
# splits testlist into runs over which d is increasing:
# (1,2,3), (2,3), (1, 2, 3)
# You can impose a different splitting condition here, for instance, 
# if each entry begins with 1, then cumsum(d==1) is adequate 

# and the last step is pretty much self explanatory
rbindlist(lapply(splitted_list, data.frame), fill=TRUE) 
#    Color  HP  Type
# 1:  Blue 405 Truck
# 2:    NA 400   Car
# 3: White 500 Truck

Hope it solves your problem.
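
To see what the splitting condition does, the sketch below reproduces it in base R, using `c(0L, head(d, -1))` as an assumed stand-in for `shift(d, fill=0)` (each element's predecessor, with 0 in front). It also contrasts the strict comparison `>` with a non-strict `>=` on a hypothetical input, `d2`, in which the same column occurs twice in a row.

```r
d <- c(1L, 2L, 3L, 2L, 3L, 1L, 2L, 3L)

# Previous element of d, with 0 standing in before the first one.
prev <- c(0L, head(d, -1))
grp_strict <- cumsum(prev > d)      # 0 0 0 1 1 2 2 2: a new group at each drop

# Hypothetical input where column 2 occurs twice in a row.
d2 <- c(1L, 2L, 2L, 3L)
prev2 <- c(0L, head(d2, -1))
cumsum(prev2 > d2)                  # 0 0 0 0: the repeat stays in one group
cumsum(prev2 >= d2)                 # 0 0 1 1: the repeat starts a new group
```

With the strict comparison, a repeated column ends up twice in the same group, so a non-strict comparison may be the safer choice if such repeats can occur in the data.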

When applied to your test data from Dropbox with the splitting condition cumsum(d==1), the result is

structure(list(Status = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L), .Label = c("New", "Used"
), class = "factor"), Make = structure(c(1L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 4L, 4L), .Label = c("Peterbilt", 
"Kenworth", "Volvo", "International"), class = "factor"), Model = structure(c(1L, 
2L, 3L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 7L, 8L, 8L), .Label = c("367 Tri-Drive c/w 58'' Sleeper", 
"T800 T/A Tractor", "T800 Tandem Tractor w/ 38'' Sleeper", "W900 Tri-Drive Sleeper Truck Tractor", 
"367 T/A Wet-Kit Tractor c/w 58'' Sleeper", "VNL64T 780-730", 
"367 T/A Wet Kit Tractor c/w       58'' Sleeper", "ProStar +122"
), class = "factor"), Kilometres = structure(1:16, .Label = c("3,360 km", 
"82,230 km", "98,521 km", "170,422 km", "3,367 km", "3,421 km", 
"2,157 km", "3,444 km", "3,427 km", "3,982 km", "23,293 km", 
"27,215 km", "72,000 km", "60,657 km", "36,236 km", "33,000 km"
), class = "factor"), Stock.Number = structure(c(1L, 2L, 3L, 
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, NA, 13L, 14L, 15L), .Label = c("12949", 
"10720", "10722", "13227", "12180", "12179", "12181", "12954", 
"12955", "12182", "12953", "12509", "10838", "463555", "463543"
), class = "factor"), Engine = structure(c(1L, 2L, 2L, 3L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, NA, 4L, 5L, 5L), .Label = c("Cummins ISX15  (550 hp)", 
"Cummins ISX15 (550hp)", "Cummins ISX15  (600 hp)", "Cummins ISX15  (550hp)", 
"Cummins ISX"), class = "factor"), Number.of.Hours = structure(c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, NA, 13L, NA, NA
), .Label = c("44", "2,712", "2,790", "4,925", "38", "46", "64", 
"45", "43", "78", "394", "458", "1,822"), class = "factor"), 
    Front.axle = structure(c(1L, 2L, 2L, 3L, 4L, 4L, 4L, 4L, 
    4L, 4L, 4L, 4L, NA, 4L, 5L, 5L), .Label = c("Dana Spicer D2000  (20,000lb)", 
    "Dana Spicer D2000  (20,000 lb)", "Meritor FL941      (20,000 lb)", 
    "Dana Spicer E14621  (14,600 lb", "Arvin Meritor 13200 lb"
    ), class = "factor"), Rear.axle = structure(c(1L, 2L, 2L, 
    3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, NA, 5L, 6L, 7L), .Label = c("Dana T69-170    (wide track) t", 
    "Dana D46-170HPW (46,000 lb) ta", "Meritor RZ-166    (69,000 lb)", 
    "Dana D46-170     (46,000lb) ta", "Dana D46-170HP (46,000lb) tand", 
    "Arvin Meritor 40000 lb", "Arvin Meritor 46000 lb"), class = "factor"), 
    Suspension = structure(c(1L, 2L, 2L, 3L, 4L, 4L, 4L, 4L, 
    4L, 4L, 4L, 4L, NA, 4L, 5L, 5L), .Label = c("Peterbilt Air-Trak  (66,000lb)", 
    "Neway ADZ252    (52,000lb) Air", "Kenworth AG690 (69,000lb) Air", 
    "Peterbilt Air-Trak  (46,000lb)", "Int'l IROS"), class = "factor"), 
    Wheelbase = structure(c(1L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, NA, 2L, 4L, 5L), .Label = c("267''", "244''", 
    "259''", "228 in", "236 in"), class = "factor"), Transmission = structure(c(1L, 
    2L, 2L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 4L, 4L
    ), .Label = c("RTLO18918B  Fuller (18 speed)", "Fuller 18 spd main AT1202 2 sp", 
    "18 speed main &     4 speed au", "Eaton Fuller D/O (18 spd)"
    ), class = "factor"), Price = structure(c(1L, 2L, 2L, 3L, 
    4L, 4L, 5L, 5L, 5L, 5L, 5L, 6L, 7L, 8L, 9L, 10L), .Label = c("$217,770", 
    "$199,500", "$197,750", "$193,300", "$189,880", "$186,600", 
    "$185,000", "$171,800", "$168,750", "$165,900"), class = "factor"), 
    Style.Trim = structure(c(NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, 1L, NA, NA, NA), .Label = "VNL64T780-730", class = "factor"), 
    Brakes = structure(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, 1L, 1L), .Label = "Air", class = "factor"), 
    Mfg.Exterior.Colour = structure(c(NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, 1L, 1L), .Label = "White", class = "factor"), 
    Tires = structure(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, 1L, 2L), .Label = c("11R22.5", "11R/22.5"
    ), class = "factor"), Engine..HP. = structure(c(NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, 2L), .Label = c("450", 
    "475"), class = "factor"), Exterior.Colour = structure(c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, 1L
    ), .Label = "White", class = "factor")), .Names = c("Status", 
"Make", "Model", "Kilometres", "Stock.Number", "Engine", "Number.of.Hours", 
"Front.axle", "Rear.axle", "Suspension", "Wheelbase", "Transmission", 
"Price", "Style.Trim", "Brakes", "Mfg.Exterior.Colour", "Tires", 
"Engine..HP.", "Exterior.Colour"), row.names = c(NA, -16L), class = "data.frame")
