Lon*_*hai 12 statistics machine-learning probability-theory
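I have read some papers about non-iid data. Based on Wikipedia, I know what iid (independent and identically distributed) data is, but I am still confused about non-iid. I did some research but could not find a clear definition or examples. Can anyone help me?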
gre*_*ess 16
From the Wikipedia article on iid:
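"Independent and identically distributed" implies that an element in the sequence is independent of the random variables that came before it. In this way, an IID sequence is different from a Markov sequence, where the probability distribution for the nth random variable is a function of the previous random variable in the sequence (for a first-order Markov sequence).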
As a simple synthetic example, assume you have a special die with 6 faces. If the previous throw showed 1, the next throw shows 1 with probability 0.5 and each of 2, 3, 4, 5, 6 with probability 0.1. However, if the previous throw did not show 1, every face is equally likely. E.g.,
p(face(0) = k) = 1/6, k = 1,2,3,4,5,6 --> initial probability at time 0.
p(face(t) = 1 | face(t-1) = 1) = 0.5, p(face(t) = 1 | face(t-1) != 1) = 1/6
p(face(t) = 2 | face(t-1) = 1) = 0.1, p(face(t) = 2 | face(t-1) != 1) = 1/6
p(face(t) = 3 | face(t-1) = 1) = 0.1, p(face(t) = 3 | face(t-1) != 1) = 1/6
p(face(t) = 4 | face(t-1) = 1) = 0.1, p(face(t) = 4 | face(t-1) != 1) = 1/6
p(face(t) = 5 | face(t-1) = 1) = 0.1, p(face(t) = 5 | face(t-1) != 1) = 1/6
p(face(t) = 6 | face(t-1) = 1) = 0.1, p(face(t) = 6 | face(t-1) != 1) = 1/6
face(t) stands for the face value of the t-th throw.
This is an example where the probability distribution of the nth random variable (the result of the nth throw) is a function of the previous random variable in the sequence, so the throws are not independent.
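To make the dependence concrete, here is a minimal Python sketch that simulates the special die and estimates the conditional frequency of throwing a 1. The function names (next_face, simulate, conditional_freq_of_one), the seed, and the number of throws are my own choices for illustration; only the transition probabilities come from the example above.

```python
import random

FACES = [1, 2, 3, 4, 5, 6]

def next_face(prev, rng):
    """Sample the next face given the previous one (prev is None at time 0)."""
    if prev == 1:
        # After a 1: stay on 1 with probability 0.5, others 0.1 each.
        weights = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]
    else:
        # At time 0, or after any face other than 1: uniform.
        weights = [1 / 6] * 6
    return rng.choices(FACES, weights=weights, k=1)[0]

def simulate(n_throws, seed=0):
    """Return a list of n_throws face values from the Markov die."""
    rng = random.Random(seed)
    faces, prev = [], None
    for _ in range(n_throws):
        prev = next_face(prev, rng)
        faces.append(prev)
    return faces

def conditional_freq_of_one(faces):
    """Estimate P(face(t)=1 | face(t-1)=1) and P(face(t)=1 | face(t-1)!=1)."""
    pairs = list(zip(faces, faces[1:]))
    after_one = [cur for prev, cur in pairs if prev == 1]
    after_other = [cur for prev, cur in pairs if prev != 1]
    return (after_one.count(1) / len(after_one),
            after_other.count(1) / len(after_other))

faces = simulate(200_000)
p_after_one, p_after_other = conditional_freq_of_one(faces)
print("P(1 | previous was 1)     ~", round(p_after_one, 3))
print("P(1 | previous was not 1) ~", round(p_after_other, 3))
```

If the throws were independent, the two estimates would be about equal. Here they should come out near 0.5 and 1/6, which is exactly the dependence on the previous throw.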
I see non-identically distributed and non-independent (e.g., Markovian) data in some machine learning scenarios, which can be thought of as non-iid examples:
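Online learning with streaming data, when the distribution of the incoming examples changes over time: the examples are not identically distributed. Suppose you have a learning module for predicting the click-through rate of online ads; the distribution of query terms coming from users changes over the year with seasonal trends. Query terms in summer and during the Christmas season should have different distributions.
Active learning, where the learner requests labels for specific data: the independence assumption is also violated.
Learning/making inference with graphical models: the variables are connected through dependency relations.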
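For the first scenario (streaming data whose distribution changes with the season), a rough Python sketch of "independent but not identically distributed" could look like the following. The query terms and their probabilities are made-up numbers purely for illustration, and sample_queries and term_frequencies are hypothetical helper names.

```python
import random
from collections import Counter

# Hypothetical query terms with made-up, season-dependent probabilities.
TERMS = ["sunscreen", "gift", "laptop"]
WEIGHTS = {
    "summer": [0.60, 0.10, 0.30],
    "christmas": [0.05, 0.65, 0.30],
}

def sample_queries(season, n, rng):
    """Draw n queries independently from the given season's distribution."""
    return rng.choices(TERMS, weights=WEIGHTS[season], k=n)

def term_frequencies(queries):
    """Return the empirical frequency of each term in the sample."""
    counts = Counter(queries)
    return {term: counts[term] / len(queries) for term in TERMS}

rng = random.Random(0)
summer_freq = term_frequencies(sample_queries("summer", 10_000, rng))
christmas_freq = term_frequencies(sample_queries("christmas", 10_000, rng))

print("summer:   ", summer_freq)
print("christmas:", christmas_freq)
```

Each query here is drawn independently, but the summer and Christmas samples clearly do not share one distribution, so a model trained on one season sees non-iid data once the other season arrives.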