Adding Gaussian noise to a float dataset and saving it (Python)

sar*_*ara 3 classification machine-learning noise python-3.x

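I am working on a classification problem. I need to add different levels of Gaussian noise to my dataset and run classification experiments until my ML algorithm can no longer classify the data. Unfortunately, I have no idea how to do that. Any advice or coding tips on how to add Gaussian noise?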

Moh*_*OUI 9

You can follow these steps:

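  • Load the data into a pandas DataFrame: clean_signal = pd.read_csv("data_file_name")
  • Use numpy to generate Gaussian noise with the same dimensions as the dataset.
  • Add the noise to the clean signal: signal = clean_signal + noise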

Here is a reproducible example:

import numpy as np
import pandas as pd
# create a sample dataset with dimension (2,2)
# in your case you need to replace this with
# clean_signal = pd.read_csv("your_data.csv")
clean_signal = pd.DataFrame([[1,2],[3,4]], columns=list('AB'), dtype=float)
print(clean_signal)
"""
print output:
     A    B
0  1.0  2.0
1  3.0  4.0
"""
mu, sigma = 0, 0.1
# create noise with the same dimension as the dataset, here (2,2)
noise = np.random.normal(mu, sigma, clean_signal.shape)
print(noise)

"""
print output (values differ on every run):
[[-0.11114313  0.25927152]
 [ 0.06701506 -0.09364186]]
"""
signal = clean_signal + noise
print(signal)
"""
print output:
          A         B
0  0.888857  2.259272
1  3.067015  3.906358
"""

The whole code without the comments and print statements:

import numpy as np
import pandas as pd
# clean_signal = pd.read_csv("your_data.csv")
clean_signal = pd.DataFrame([[1,2],[3,4]], columns=list('AB'), dtype=float)
mu, sigma = 0, 0.1
noise = np.random.normal(mu, sigma, clean_signal.shape)
signal = clean_signal + noise
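The three steps can also be wrapped in a small helper so the noise always matches the shape of whatever data is loaded. This is only a sketch: the function name `add_gaussian_noise` and its `seed` parameter are illustrative and not part of the original answer, and it assumes every column of the DataFrame is numeric.

```python
import numpy as np
import pandas as pd


def add_gaussian_noise(df, sigma, mu=0.0, seed=None):
    """Return a new DataFrame: df plus Gaussian noise with mean mu and std sigma.

    Hypothetical helper; assumes all columns of df are numeric.
    """
    rng = np.random.default_rng(seed)  # pass a seed to get repeatable noise
    noise = rng.normal(mu, sigma, size=df.shape)  # same shape as the data
    return df + noise


clean_signal = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"), dtype=float)
noisy = add_gaussian_noise(clean_signal, sigma=0.1, seed=0)
print(noisy)
```

Passing the same `seed` reproduces the same noisy dataset, which helps when comparing classifiers across runs.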

To save the result back to a CSV file:

signal.to_csv("output_filename.csv", index=False)

  • `mu` is the mean and `sigma` is the standard deviation, so vary these parameters to change the noise level (2 upvotes)
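Since the question asks for several noise levels, one possible way to act on the comment above is to loop over a list of `sigma` values and write one CSV per level. The sketch below makes assumptions that do not come from the original answer: the helper name `save_noisy_versions`, the `noisy_sigma_<value>.csv` file naming, and a class-label column called `label` that is left untouched so that only the features receive noise.

```python
from pathlib import Path
import tempfile

import numpy as np
import pandas as pd


def save_noisy_versions(df, sigmas, out_dir, label_col=None, seed=0):
    """Write one noisy CSV per sigma; label_col (if given) is copied unchanged.

    Hypothetical helper; assumes all non-label columns are numeric.
    """
    rng = np.random.default_rng(seed)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    feature_cols = [c for c in df.columns if c != label_col]
    paths = []
    for sigma in sigmas:
        noisy = df.copy()
        # zero-mean noise, one value per feature cell
        noise = rng.normal(0.0, sigma, size=(len(df), len(feature_cols)))
        noisy[feature_cols] = df[feature_cols] + noise
        path = out_dir / f"noisy_sigma_{sigma}.csv"
        noisy.to_csv(path, index=False)
        paths.append(path)
    return paths


# Toy data: two float features plus a hypothetical "label" column.
data = pd.DataFrame({"A": [1.0, 3.0], "B": [2.0, 4.0], "label": [0, 1]})
out_dir = tempfile.mkdtemp()  # replace with your own output folder
paths = save_noisy_versions(data, sigmas=[0.1, 0.5, 1.0],
                            out_dir=out_dir, label_col="label")
print(paths)
```

Each saved file can then be passed to the classifier in turn, increasing `sigma` until the accuracy drops to an unacceptable level.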