I have the following 3 tables:
AggData <- structure(list(Path = c("NonBrand", "Brand", "NonBrand,NonBrand",
"Brand,Brand", "NonBrand,NonBrand,NonBrand", "Brand,Brand,Brand",
"Brand,NonBrand", "NonBrand,Brand", "NonBrand,NonBrand,NonBrand,NonBrand",
"Brand,Brand,Brand,Brand", "NonBrand,NonBrand,NonBrand,NonBrand,NonBrand",
"Brand,Brand,Brand,Brand,Brand", "Brand,Brand,NonBrand", "NonBrand,Brand,Brand",
"Brand,NonBrand,NonBrand", "NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand",
"NonBrand,NonBrand,Brand", "Brand,NonBrand,Brand", "NonBrand,Brand,NonBrand",
"NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand",
"Brand,Brand,Brand,Brand,Brand,Brand", "NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand",
"NonBrand,Brand,Brand,Brand", "NonBrand,NonBrand,NonBrand,Brand",
"Brand,Brand,Brand,NonBrand", "Brand,Brand,Brand,Brand,Brand,Brand,Brand",
"Brand,NonBrand,NonBrand,NonBrand", "NonBrand,NonBrand,Brand,Brand",
"Brand,Brand,NonBrand,NonBrand", "Brand,NonBrand,Brand,Brand",
"NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand",
"Brand,Brand,NonBrand,Brand", "NonBrand,Brand,NonBrand,NonBrand",
"Brand,Brand,Brand,Brand,Brand,Brand,Brand,Brand", "NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand",
"NonBrand,NonBrand,Brand,NonBrand", "Brand,NonBrand,NonBrand,Brand",
"NonBrand,Brand,Brand,Brand,Brand", "NonBrand,NonBrand,NonBrand,NonBrand,Brand",
"Brand,NonBrand,Brand,NonBrand", "NonBrand,Brand,Brand,NonBrand",
"Brand,Brand,Brand,Brand,NonBrand", "Brand,NonBrand,NonBrand,NonBrand,NonBrand",
"Brand,Brand,Brand,Brand,Brand,Brand,Brand,Brand,Brand", "NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand",
"Brand,NonBrand,Brand,Brand,Brand", "NonBrand,Brand,NonBrand,Brand",
"Brand,Brand,Brand,NonBrand,Brand", "NonBrand,NonBrand,Brand,Brand,Brand",
"NonBrand,NonBrand,NonBrand,Brand,Brand", "Brand,Brand,NonBrand,Brand,Brand",
"Brand,Brand,Brand,NonBrand,NonBrand", "Brand,Brand,Brand,Brand,Brand,Brand,Brand,Brand,Brand,Brand",
"NonBrand,NonBrand,NonBrand,Brand,NonBrand", "Brand,Brand,NonBrand,NonBrand,NonBrand",
"NonBrand,Brand,Brand,Brand,Brand,Brand", "NonBrand,Brand,NonBrand,NonBrand,NonBrand",
"NonBrand,NonBrand,Brand,NonBrand,NonBrand", "NonBrand,NonBrand,NonBrand,NonBrand,NonBrand,Brand",
"Brand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand", "Brand,Brand,Brand,Brand,Brand,NonBrand",
"NonBrand,Brand,Brand,NonBrand,NonBrand", "Brand,NonBrand,NonBrand,Brand,Brand",
"NonBrand,NonBrand,NonBrand,NonBrand,Brand,Brand", "NonBrand,NonBrand,Brand,Brand,Brand,Brand",
"NonBrand,NonBrand,NonBrand,NonBrand,Brand,NonBrand", "NonBrand,NonBrand,Brand,NonBrand,Brand",
"Brand,NonBrand,NonBrand,Brand,NonBrand", "NonBrand,NonBrand,NonBrand,Brand,Brand,Brand",
"NonBrand,Brand,Brand,NonBrand,Brand", "Brand,NonBrand,NonBrand,NonBrand,NonBrand,Brand",
"Brand,Brand,NonBrand,NonBrand,NonBrand,NonBrand,NonBrand", "Brand,Brand,Brand,Brand,NonBrand,NonBrand,NonBrand"
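), click_count …

Lately I often find myself asking Pandas questions that depend on the data I am working with, and so far it takes me a long time to build a data frame resembling my data (a reproducible data frame) so that SO users can easily copy it to their machines.
I would rather find a convenient way to print my small DF in my question, so that other users can easily grab it and recreate it with minimal effort.
In R I am used to calling the dput function in the console on a small sample of my data and then pasting the output into my question (example):
Getting the error "level sets of factors are different" when running a for loop
I have already seen this explanation, but I don't think it is suitable for printing a data sample for other SO users: Python equivalent of R's dput() function
Is there an equivalent way to do this in Pandas?
Thanks in advance!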
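
One possible approach, sketched here under assumptions rather than offered as the accepted answer: print the frame as a plain dict literal with `DataFrame.to_dict()`, paste that text into the question, and let readers rebuild the frame from it. The sample columns are borrowed from the `group1` data further down this page (without the date column), and the names `snippet` and `rebuilt` are made up for illustration.

```python
import ast

import pandas as pd

# Small sample frame (columns borrowed from the group1 example on this page).
df = pd.DataFrame({"element_id": [122, 122, 122],
                   "VALUE": ["8.0", "2.0", "5.0"]})

# Text to paste into a question: a plain Python dict literal.
snippet = repr(df.to_dict())
print(snippet)

# What a reader would do with the pasted text.
rebuilt = pd.DataFrame(ast.literal_eval(snippet))
print(rebuilt)
```

This only round-trips values that are plain Python literals (ints, strings, and so on). Columns holding dates or NaN would need a different route, because `ast.literal_eval` rejects them.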
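
I want to try a faster version of R on Windows. pqR/Riposte have no Windows builds. The Renjin website offers a Renjin Studio GUI (for all platforms), which opens a console where R commands can be run, but that is not very useful. I know Renjin is still under development, but I would like to ask: is it possible to use Renjin inside RStudio, i.e. to set the "R version" in RStudio to Renjin?

I am working on a project that needs to run a ctree and then plot it interactively, like a 'D3.js' tree layout. My main obstacle is converting the ctree output to JSON format for later use by JavaScript.
Here is what I need (for example, from the iris data):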
> library(party)
> irisct <- ctree(Species ~ .,data = iris)
> irisct
Conditional inference tree with 4 terminal nodes
Response: Species
Inputs: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
Number of observations: 150
1) Petal.Length <= 1.9; criterion = 1, statistic = 140.264
2)* weights = 50
1) Petal.Length > 1.9
3) Petal.Width <= 1.7; criterion = 1, statistic = 67.894
4) Petal.Length <= 4.8; criterion = 0.999, statistic = 13.865
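5)* weights = …

After struggling with the amazing facebookresearch/PyTorch-BigGraph project and its impossible API, I managed to figure out how to run it (thanks to the stand-alone simple example).
My system limitations do not allow me to train dense (embedding) representations of all edges at once, so from time to time I need to load past embeddings and train the model on new edges together with existing nodes. Note that the node lists of the past and new edges do not necessarily overlap.
I tried to understand how to do this from here: see the context section, so far without success.
The following is a stand-alone PBG script that turns batch_edges into a list of node embeddings; however, I need it to use the pre-trained node list past_trained_nodes.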
import os
import shutil
from pathlib import Path
from torchbiggraph.config import parse_config
from torchbiggraph.converters.importers import TSVEdgelistReader, convert_input_data
from torchbiggraph.train import train
from torchbiggraph.util import SubprocessInitializer, setup_logging
DIMENSION = 4
DATA_DIR = 'data'
GRAPH_PATH = DATA_DIR + '/output1.tsv'
MODEL_DIR = 'model'
raw_config = dict(
entity_path=DATA_DIR,
edge_paths=[DATA_DIR + '/edges_partitioned', ],
checkpoint_path=MODEL_DIR,
entities={"n": {"num_partitions": 1}},
relations=[{"name": "doesnt_matter", "lhs": "n", "rhs": "n", "operator": "complex_diagonal", }], …

I am trying to create a 30-day forecast using auto.arima from the forecast package. I want to capture the long-term trend, so I pass it in through the xreg argument.
Data:
dput(data)
structure(list(TKDate = structure(c(15706, 15707, 15708, 15709,
15710, 15711, 15712, 15713, 15714, 15715, 15716, 15717, 15718,
15719, 15720, 15721, 15722, 15723, 15724, 15725, 15726, 15727,
15728, 15729, 15730, 15731, 15732, 15733, 15734, 15735, 15736,
15737, 15738, 15739, 15740, 15741, 15742, 15743, 15744, 15745,
15746, 15747, 15748, 15749, 15750, 15751, 15752, 15753, 15754,
15755, 15756, 15757, 15758, 15759, 15760, 15761, 15762, 15763,
15764, 15765, 15766, 15767, 15768, 15769, 15770, 15771, 15772,
15773, 15774, …

I want to apply padding to each group of my data frame.
Note that for a single group ('element_id') I have no problem with padding:
First group (group1):
{'date': {88: datetime.date(2017, 10, 3), 43: datetime.date(2017, 9, 26), 159: datetime.date(2017, 11, 8)}, u'element_id': {88: 122, 43: 122, 159: 122}, u'VALUE': {88: '8.0', 43: '2.0', 159: '5.0'}}
So I apply padding to it (which works well):
print(group1.set_index('date').asfreq('D', method='pad').head())
I want to apply this logic across several groups using groupby.
Another group (group2):
{'date': {88: datetime.date(2017, 10, 3), 43: datetime.date(2017, 9, 26), 159: datetime.date(2017, 11, 8)}, u'element_id': {88: 122, 43: 122, 159: 122}, u'VALUE': {88: '8.0', 43: '2.0', 159: '5.0'}}
group_data=pd.concat([group1,group2],axis=0)
group_data.groupby(['element_id']).set_index('date').resample('D').asfreq()
I get the following error:
AttributeError: Cannot access callable attribute 'set_index' of 'DataFrameGroupBy' objects, try using …

I have a pandas DF where each column represents a node and the two columns together represent an edge, as follows:
import pandas as pd
df = pd.DataFrame({'node1': ['2', '4','17', '17', '205', '208'],
'node2': ['4', '13', '25', '38', '208', '300']})
The graph is undirected, i.e. you can travel between the two nodes of an edge in either direction (undirected_graph).
I want to group them into all connected groups (connectivity), as follows:
df = pd.DataFrame({'node1': ['2', '4','17', '17', '205', '208'],
'node2': ['4', '13', '25', '38', '208', '300']
,'desired_group': ['1', '1', '2', '2', '3', '3']})
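For example, the first two rows are grouped together because node 13 can be reached from node 2 (via 4).
The closest question I managed to find is this one: pandas - reshape dataframe to edge list according to column values, but as far as I understand it is a different question.
Any help on this would be great, thanks in advance.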
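
The grouping described above is the connected-components problem, and one way to sketch it without extra dependencies is a small union-find over the edge list. This is an illustrative sketch, not the answer given on the original thread: `label_components` is a hypothetical helper name, and it returns integer labels rather than the string labels shown in `desired_group`.

```python
def label_components(edges):
    """Return one 1-based group label per edge; edges in the same
    connected component of the undirected graph share a label."""
    edges = list(edges)
    parent = {}

    def find(node):
        # Follow parent links to the representative, shortening the path.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Merge the two endpoints of every edge into one set.
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Number the components in order of first appearance.
    labels = {}
    result = []
    for a, _ in edges:
        root = find(a)
        labels.setdefault(root, len(labels) + 1)
        result.append(labels[root])
    return result


# Same edges as the node1/node2 columns in the question.
edges = [("2", "4"), ("4", "13"), ("17", "25"),
         ("17", "38"), ("205", "208"), ("208", "300")]
groups = label_components(edges)
print(groups)
```

With the DataFrame from the question, the labels could presumably be attached with `df['desired_group'] = label_components(zip(df['node1'], df['node2']))`.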

I am creating a ggplot with a geom_vline at a specific position on the x axis. I want the x axis to display that specific value.
Here is my data + code:
dput(agg_data)
structure(list(latency = structure(c(0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 24, 26, 28,
29, 32, 36, 37, 40, 43, 46, 47, 48, 49, 54, 64, 71, 72, 75, 87,
88, 89, 93, 134, 151), class = "difftime", units = "days"), count = c(362,
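11, 8, 5, 4, 2, 8, 6, 4, 2, 2, 1, 5, 1, 2, 2, 2, 1, …

This question simply asks for an R implementation of the following question: Find the longest common starting substring in a set of strings (JavaScript)
"This problem is a more specific case of the longest common substring problem. I only need to find the longest common starting substring in an array."
So I am just looking for an R implementation of this question (preferably not the for/while loop approach suggested in the JavaScript version). If possible, I would like to wrap it in a function so I can apply it to many groups in a data table.
After some searching I could not find an R example, hence this question.
Example data: I have the following character vector: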
dput(data)
c("ADA4417-3ARMZ-R7", "ADA4430-1YKSZ-R2", "ADA4430-1YKSZ-R7",
"ADA4431-1YCPZ-R2", "ADA4432-1BCPZ-R7", "ADA4432-1BRJZ-R2")
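I want to run an algorithm in R that will find the following output: ADA44.
From what I have seen in the accepted JavaScript answer, the idea is to first sort the vector, extract the first and last elements (for example: "ADA4417-3ARMZ-R7" and "ADA4432-1BRJZ-R2"), split them into single characters, and loop through them until one of the characters does not match (I hope I got that right).
Any help on this would be great!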
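
To check the reading of the JavaScript answer described above, here is a minimal sketch of the same idea. It is written in Python purely for illustration, since the question itself asks for R, and `longest_common_prefix` is a made-up name. It relies on the fact that the lexicographically smallest and largest strings bound every other string, so only those two need to be compared.

```python
def longest_common_prefix(strings):
    """Longest starting substring shared by every string in the list."""
    if not strings:
        return ""
    # min/max give the first and last elements of the sorted order
    # without sorting the whole list.
    first, last = min(strings), max(strings)
    length = 0
    for a, b in zip(first, last):
        if a != b:
            break
        length += 1
    return first[:length]


parts = ["ADA4417-3ARMZ-R7", "ADA4430-1YKSZ-R2", "ADA4430-1YKSZ-R7",
         "ADA4431-1YCPZ-R2", "ADA4432-1BCPZ-R7", "ADA4432-1BRJZ-R2"]
print(longest_common_prefix(parts))  # ADA44
```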