Building a transition matrix from words in Python / NumPy

Question

I am trying to build a 3x3 transition matrix from this data:

days=['rain', 'rain', 'rain', 'clouds', 'rain', 'sun', 'clouds', 'clouds', 
  'rain', 'sun', 'rain', 'rain', 'clouds', 'clouds', 'sun', 'sun', 
  'clouds', 'clouds', 'rain', 'clouds', 'sun', 'rain', 'rain', 'sun',
  'sun', 'clouds', 'clouds', 'rain', 'rain', 'sun', 'sun', 'rain', 
  'rain', 'sun', 'clouds', 'clouds', 'sun', 'sun', 'clouds', 'rain', 
  'rain', 'rain', 'rain', 'sun', 'sun', 'sun', 'sun', 'clouds', 'sun', 
  'clouds', 'clouds', 'sun', 'clouds', 'rain', 'sun', 'sun', 'sun', 
  'clouds', 'sun', 'rain', 'sun', 'sun', 'sun', 'sun', 'clouds', 
  'rain', 'clouds', 'clouds', 'sun', 'sun', 'sun', 'sun', 'sun', 'sun', 
  'clouds', 'clouds', 'clouds', 'clouds', 'clouds', 'sun', 'rain', 
  'rain', 'rain', 'clouds', 'sun', 'clouds', 'clouds', 'clouds', 'rain', 
  'clouds', 'rain', 'sun', 'sun', 'clouds', 'sun', 'sun', 'sun', 'sun',
  'sun', 'sun', 'rain']

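At the moment I do this with some temporary dictionaries and lists that compute the probabilities for each type of weather separately. It is not a pretty solution. Could someone point me to a more reasonable approach?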

self.transitionMatrix=np.zeros((3,3))

#the columns are today
sun_total_count = 0
temp_dict={'sun':0, 'clouds':0, 'rain':0}
total_runs = 0
for (x, y), c in Counter(zip(data, data[1:])).items():
    #if column 0 is sun
    if x == 'sun':
        #find the sum of all the numbers in this column
        sun_total_count +=  c
        total_runs += 1
        if y == 'sun':
            temp_dict['sun'] = c
        if y == 'clouds':
            temp_dict['clouds'] = c
        if y == 'rain':
            temp_dict['rain'] = c

        if total_runs == 3:
            self.transitionMatrix[0][0] = temp_dict['sun']/sun_total_count
            self.transitionMatrix[1][0] = temp_dict['clouds']/sun_total_count
            self.transitionMatrix[2][0] = temp_dict['rain']/sun_total_count

return self.transitionMatrix

For each type of weather, I need to calculate the probability of each type of weather on the following day.

Answer 1

Her*_*van 11

If you don't mind using pandas, there is a one-liner for extracting the transition probabilities:

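import pandas as pd

# normalize=1 normalizes over each column, so every 'Today' column sums to 1
pd.crosstab(pd.Series(days[1:], name='Tomorrow'),
            pd.Series(days[:-1], name='Today'), normalize=1)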

Output:

Today      clouds      rain       sun
Tomorrow                             
clouds    0.40625  0.230769  0.309524
rain      0.28125  0.423077  0.142857
sun       0.31250  0.346154  0.547619

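The (forward) probability that it will be sunny tomorrow, given that it rained today, is found in the 'rain' column, 'sun' row. If you would like backward probabilities instead (what the weather probably was yesterday, given today's weather), switch the first two arguments.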

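If you would like the probabilities stored in rows rather than columns, set normalize=0, but note that doing so directly on this example gives you the backward probabilities stored as rows. To get the same result as above but transposed, you can either a) transpose it, or b) swap the order of the first two arguments and set normalize to 0.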

If you just want the result as a 2-D NumPy array (rather than a pandas DataFrame), append .values after the closing parenthesis.
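
The variants described above can be sketched on a short sequence. This is only a minimal sketch under a few assumptions: the `sample` list is made up for illustration, the string values 'columns' and 'index' are used for `normalize` in place of 1 and 0, and `.to_numpy()` stands in for `.values`.

```python
import numpy as np
import pandas as pd

# hypothetical short sequence, for illustration only
sample = ['rain', 'rain', 'sun', 'clouds', 'sun', 'rain']

today = pd.Series(sample[:-1], name='Today')
tomorrow = pd.Series(sample[1:], name='Tomorrow')

# forward probabilities stored in columns: each 'Today' column sums to 1
by_column = pd.crosstab(tomorrow, today, normalize='columns')

# the same probabilities stored in rows: swap the arguments
# and normalize over the index instead
by_row = pd.crosstab(today, tomorrow, normalize='index')

# plain 2-D NumPy array, rows = today, columns = tomorrow
matrix = by_row.to_numpy()
print(by_row)
```

Here `by_row` holds the same numbers as `by_column`, only transposed, so each row of `matrix` sums to 1.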

Answer 2

Bra*_*mon 5

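I like a combination of pandas and itertools for this. The code block is a bit longer than the one above, but don't conflate verbosity with speed. (The window function should be very fast; the pandas part will admittedly be slower.)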

First, write a "window" function. This one is from the itertools recipes. It gives you a list of transition tuples (state1 to state2).

from itertools import islice

def window(seq, n=2):
    """Sliding window width n from seq.  From old itertools recipes."""
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

# list(window(days))
# [('rain', 'rain'),
#  ('rain', 'rain'),
#  ('rain', 'clouds'),
#  ('clouds', 'rain'),
#  ('rain', 'sun'),
# ...
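
For the pairwise case (n=2) used here, the same transitions can also be produced with `zip`, as the question's own code does. A short sketch comparing the two on a made-up `sample` list (the function is repeated so the snippet runs on its own):

```python
from itertools import islice

def window(seq, n=2):
    """Sliding window width n from seq.  From old itertools recipes."""
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

# hypothetical short sequence, for illustration only
sample = ['rain', 'rain', 'sun', 'clouds']

pairs_window = list(window(sample))
pairs_zip = list(zip(sample, sample[1:]))

print(pairs_window)
# [('rain', 'rain'), ('rain', 'sun'), ('sun', 'clouds')]
print(pairs_window == pairs_zip)
# True
```

The generator version also works on iterators that cannot be sliced and generalizes to wider windows through `n`.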


Then use a pandas groupby + value counts operation to get the transition matrix from each state1 to each state2. Passing normalize=True divides each count by the total for its state1, so every row sums to 1:

import pandas as pd

pairs = pd.DataFrame(window(days), columns=['state1', 'state2'])
# normalize=True: divide by the per-state1 total, not the grand total
probs = pairs.groupby('state1')['state2'].value_counts(normalize=True).unstack()

Your result looks like this:

print(probs.round(2))
state2  clouds  rain   sun
state1                    
clouds    0.41  0.28  0.31
rain      0.23  0.42  0.35
sun       0.31  0.14  0.55
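
Since the question asks for a NumPy matrix, the same pair counting can also be sketched without pandas. This is a minimal sketch rather than a definitive implementation: the function name `transition_matrix` and the short `sample` list are made up for illustration, and states are ordered alphabetically because that is what `np.unique` returns.

```python
import numpy as np

def transition_matrix(seq):
    """Return (states, probs) where probs[i, j] estimates
    P(next == states[j] | current == states[i])."""
    # states is sorted; idx maps every element of seq to its position in states
    states, idx = np.unique(seq, return_inverse=True)
    counts = np.zeros((len(states), len(states)))
    # count each (current, next) pair; np.add.at accumulates repeated index pairs
    np.add.at(counts, (idx[:-1], idx[1:]), 1)
    # divide each row by its total; rows with no outgoing transition stay all zero
    totals = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    return states, probs

# hypothetical short sequence, for illustration only
sample = ['rain', 'rain', 'sun', 'clouds', 'sun', 'rain']
states, probs = transition_matrix(sample)
print(states)
print(probs)
```

Rows correspond to today's state and columns to tomorrow's, so each row with at least one observed transition sums to 1.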
