Create a new table from an existing table and a list

Dil*_*ila 2 python pandas

I am reading a tabular data file that looks like this (shortened for the purposes of this question):

  ID Ah Am  RAs  Ed Em  DEs   Vmag    U-B    B-V    V-I    e_    e_    e_    e_ _ _ _ _ mb n_ 2MASS
   1 10 42 57.6 -59 47 22.6 18.681         1.105  1.461 0.002       0.103 0.053 2 0 1 2       10425765-5947229
   2 10 42 57.7 -59 44 22.2 18.303                2.764 0.012             0.013 2 0 0 2
   3 10 42 57.7 -59 46 58.0 18.610                1.573 0.038             0.039 2 0 0 2       10425776-5946583
   4 10 42 57.8 -59 47 49.5 12.870         0.764  0.799 0.009       0.009 0.009 3 0 1 3       10425773-5947495
   5 10 42 57.8 -59 44 03.4 18.815         1.072  1.433 0.017       0.110 0.043 2 0 1 2
   6 10 42 57.8 -59 48 29.3 18.697                1.304 0.014             0.019 2 0 0 2       10425778-5948293
   7 10 42 57.8 -59 44 08.5 17.817         1.700  2.384 0.011       0.108 0.013 2 0 1 2       10425786-5944083
   8 10 42 57.9 -59 43 11.1 18.621         0.925  1.322 0.014       0.084 0.014 2 0 1 2
   9 10 42 58.0 -59 41 34.4 16.993         0.998  1.742 0.003       0.027 0.003 3 0 1 3       10425799-5941342
  10 10 42 58.0 -59 49 23.3 16.981         0.656  1.043 0.023       0.034 0.023 3 0 1 3       10425796-5949235
  11 10 42 58.1 -59 48 20.2 17.047         0.926  1.003 0.009       0.034 0.017 3 0 1 3
  12 10 42 58.1 -59 47 51.5 17.535         0.879  1.197 0.008       0.071 0.035 2 0 1 2
  13 10 42 58.2 -59 47 16.9 15.982         0.854  1.146 0.006       0.011 0.008 3 0 1 3       10425820-5947169
  14 10 42 58.2 -59 36 10.2 18.855                1.376 0.051             0.069 2 0 0 2
  15 10 42 58.2 -59 49 29.5 17.959         0.830  1.229 0.027       0.060 0.027 2 0 1 2       10425821-5949297
  16 10 42 58.2 -59 45 39.7 18.556         1.114  1.520 0.001       0.103 0.007 2 0 1 2
  17 10 42 58.3 -59 48 59.5 18.659         1.252  2.013 0.000       0.126 0.018 2 0 1 2       10425824-5948595
  18 10 42 58.3 -59 48 17.9 15.417         0.707  0.874 0.002       0.010 0.002 3 0 1 3       10425825-5948180
  19 10 42 58.3 -59 39 51.6 16.899         1.050  1.204 0.009       0.026 0.010 3 0 1 3       10425833-5939512
  20 10 42 58.3 -59 42 39.3 18.011         1.016  1.452 0.002       0.068 0.014 2 0 1 2       10425834-5942390
df = pd.read_fwf('Hur_et_al_2012_catalog/table1.dat', infer_nrows=1001)

I also have a list that looks like this (also shortened):

total_sources = ['7', '9', '19']

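The numbers in total_sources correspond to the ID column of the first table. Is there a way to create a second table that contains only the rows whose IDs are listed in total_sources? For this example, the new table would show only the information for IDs 7, 9 and 19.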

小智 5

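import pandas as pd

# Read the fixed-width catalog; infer_nrows=1001 samples more rows when inferring column boundaries
df = pd.read_fwf('table1.dat', infer_nrows=1001)

total_sources = ['7', '9', '19']
# The list holds strings, but read_fwf parses the ID column as integers, so convert first
rows = [int(x) for x in total_sources]
# Boolean mask: keep only the rows whose ID appears in rows
df_filtered = df[df['ID'].isin(rows)]
df_filtered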

The result is:

    ID  Ah  Am  RAs     Ed  Em  DEs     Vmag    U-B     B-V     ...     e_.1    e_.2    e_.3    _   _.1     _.2     _.3     mb  n_  2MASS
6   7   10  42  57.8    -59     44  8.5     17.817  NaN     1.700   ...     NaN     0.108   0.013   2   0   1   2   NaN     NaN     10425786-5944083
8   9   10  42  58.0    -59     41  34.4    16.993  NaN     0.998   ...     NaN     0.027   0.003   3   0   1   3   NaN     NaN     10425799-5941342
18  19  10  42  58.3    -59     39  51.6    16.899  NaN     1.050   ...     NaN     0.026   0.010   3   0   1   3   NaN     NaN     10425833-5939512
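The int() conversion in the answer matters because read_fwf parses the ID column as integers, while total_sources holds strings. The sketch below illustrates this on a hypothetical four-row miniature of the catalog (only ID and Vmag kept, read from an in-memory string rather than table1.dat). It compares three options: matching the raw strings, converting the list to int, and converting the column to str instead.

```python
import io

import pandas as pd

# Hypothetical miniature of table1.dat: only the ID and Vmag columns of four sample rows
sample = io.StringIO(
    "ID   Vmag\n"
    " 7 17.817\n"
    " 9 16.993\n"
    "12 17.535\n"
    "19 16.899\n"
)
df = pd.read_fwf(sample)

total_sources = ['7', '9', '19']

# read_fwf parses the ID column as an integer dtype ...
print(df['ID'].dtype)

# ... so matching it against the raw strings finds no rows
n_as_strings = df['ID'].isin(total_sources).sum()

# Either convert the list to int (as in the answer) ...
n_as_ints = df['ID'].isin([int(x) for x in total_sources]).sum()

# ... or convert the column to str instead; both select the same three rows
n_col_as_str = df['ID'].astype(str).isin(total_sources).sum()

print(n_as_strings, n_as_ints, n_col_as_str)
```

Converting the short list is usually cheaper than converting the whole column, but either approach works as long as both sides end up with the same type.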
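Note that the filtered table keeps the original row labels (6, 8, 18) and the file's row order. The sketch below is one way to renumber the rows, or to return them in the order of the list. It uses a hypothetical stand-in DataFrame with only ID and Vmag, and the list is deliberately reordered for illustration. Indexing with set_index("ID").loc[rows] raises a KeyError if any requested ID is missing from the table.

```python
import pandas as pd

# Hypothetical stand-in for the catalog: ID and Vmag of four sample rows
df = pd.DataFrame({
    "ID": [7, 9, 12, 19],
    "Vmag": [17.817, 16.993, 17.535, 16.899],
})

total_sources = ['19', '7', '9']  # deliberately not in file order
rows = [int(x) for x in total_sources]

# Boolean mask: rows come back in file order with their original index labels
by_mask = df[df["ID"].isin(rows)]

# Renumber the index from 0 if the old row labels are not wanted
by_mask = by_mask.reset_index(drop=True)

# Label lookup: rows come back in list order; .loc raises KeyError if an ID is absent
by_list = df.set_index("ID").loc[rows].reset_index()

print(by_mask)
print(by_list)
```

The isin mask is the safer choice when some IDs in the list may not exist in the catalog. The .loc lookup is useful when the order of total_sources itself is meaningful.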