san*_*jha 7 python random dataframe python-3.x pandas
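I have a sample schema with 12 columns, each of which takes values from a fixed set of categories. I need to simulate this data as a DataFrame of about 1000 rows. How can I do that?
I use the code below to generate a value for each column: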
import random

Location = ['USA','India','Prague','Berlin','Dubai','Indonesia','Vienna']
Location = random.choice(Location)
Age = ['Under 18','Between 18 and 64','65 and older']
Age = random.choice(Age)
Gender = ['Female','Male','Other']
Gender = random.choice(Gender)
and so on.
I need output like the following:
Location  Age           Gender
Dubai     Under 18      Female
India     65 and older  Male
...
You can create the columns one at a time with np.random.choice:
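import numpy as np
import pandas as pd

# Location, Age and Gender must be the category lists from the question,
# not the single values drawn with random.choice.
df = pd.DataFrame()
N = 1000  # number of rows to simulate
df["Location"] = np.random.choice(Location, size=N)
df["Age"] = np.random.choice(Age, size=N)
df["Gender"] = np.random.choice(Gender, size=N)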
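If the categories should not be equally likely, np.random.choice also accepts a p argument with one probability per category. The following is a minimal sketch for the Age column only; the weights in age_weights are made up for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

# Category list from the question.
Age = ['Under 18', 'Between 18 and 64', '65 and older']

# Hypothetical weights, one per category, summing to 1 (an assumption for illustration).
age_weights = [0.2, 0.6, 0.2]

N = 1000
df_weighted = pd.DataFrame({"Age": np.random.choice(Age, size=N, p=age_weights)})

# Share of rows per category; with 1000 rows this should land near the weights above.
print(df_weighted["Age"].value_counts(normalize=True))
```

Without p, every category is drawn with equal probability, which is what the snippets in this answer do.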
Or do it with a list comprehension:
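column_to_choice = {"Location": Location, "Age": Age, "Gender": Gender}
# Draw one array of N values per column (N = 1000 as above),
# then transpose so that each array becomes a column.
df = pd.DataFrame(
    [np.random.choice(column_to_choice[c], N) for c in column_to_choice]
).T
df.columns = list(column_to_choice.keys())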
Result:
>>> print(df.head())
    Location                Age  Gender
0      India       65 and older  Female
1     Berlin  Between 18 and 64  Female
2        USA  Between 18 and 64    Male
3  Indonesia           Under 18    Male
4      Dubai           Under 18   Other
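Because the values are random, each run gives different rows. For a repeatable simulation, one option is NumPy's seeded Generator API. The sketch below is self-contained and only restates the three columns from the question; the helper name simulate and its seed parameter are hypothetical, and the remaining nine columns of the schema would be added to the categories dict in the same way.

```python
import numpy as np
import pandas as pd

# Category lists from the question; extend this dict with the remaining columns.
categories = {
    "Location": ['USA', 'India', 'Prague', 'Berlin', 'Dubai', 'Indonesia', 'Vienna'],
    "Age": ['Under 18', 'Between 18 and 64', '65 and older'],
    "Gender": ['Female', 'Male', 'Other'],
}

def simulate(categories, n_rows, seed=None):
    """Return a DataFrame with n_rows rows, one uniformly sampled column per key."""
    rng = np.random.default_rng(seed)  # seeded generator, so results can be repeated
    return pd.DataFrame(
        {col: rng.choice(values, size=n_rows) for col, values in categories.items()}
    )

df = simulate(categories, n_rows=1000, seed=42)
print(df.shape)  # (1000, 3)
```

Building the frame from a dict of arrays avoids the transpose and the separate column assignment, and passing the same seed reproduces the same rows.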