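I drew this figure with the layout.circle algorithm from the igraph package.

Because they are drawn behind the edges between nodes, some of the labels on the self-loops on the left are not clearly visible. Is there any adjustment I can apply to improve the readability of the plot without changing the distance of the labels? (I suppose that drawing the loops along the radial vectors of the circle is not possible without recoding the whole thing...)
Here is the code: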
par(mar=c(0,0,0,0))
plot(g,
layout=layout.circle,
vertex.label.family="Palatino",
edge.label.family="Palatino",
edge.label.cex=0.7,
vertex.size=log(V(g)$community_size)+7,
vertex.label=V(g)$community_size,
edge.width=log(E(g)$weight),
edge.label=E(g)$weight)

I have this vector:
vector <- c("www.one","www.two","www.one","www.three")
I want to find all duplicates, including the first occurrence of each duplicated value. If I run
dup <- duplicated(vector)
I get
dup
# [1] FALSE FALSE TRUE FALSE
whereas I need to get
# [1] TRUE FALSE TRUE FALSE

How can I define a text style (size and family) uniformly for plot text elements and other annotations?
The following MWE
library(ggplot2)
data1.df <- data.frame(Plant = c("Plant1", "Plant1", "Plant1", "Plant2", "Plant2",
"Plant2"), Type = c(1, 2, 3, 1, 2, 3), Axis1 = c(0.2, -0.4, 0.8, -0.2, -0.7,
0.1), Axis2 = c(0.5, 0.3, -0.1, -0.3, -0.1, -0.8))
theme_set(theme_bw() + theme(text=element_text(family="Palatino", size=10)))
ggplot(data1.df, aes(x = Axis1, y = Axis2, shape = Plant, color = Type)) + geom_point(size = 5) + annotate("text", x=0.4, y=0.0, label="Label", fontface="italic") + theme(legend.position="none")
produces a plot in which the "Label" annotation is not consistent with the theme's element_text() definition.

I have two datasets, each with two continuous variables: duration and waiting.
library("MASS")
data(geyser)
geyser1 <- geyser[1:150,]
geyser2 <- geyser[151:299,]
geyser2$duration <- geyser2$duration - 1
geyser2$waiting <- geyser2$waiting - 20
For each dataset I produce a two-dimensional density plot:
ggplot(geyser1, aes(x = duration, y = waiting)) +
xlim(0.5, 6) + ylim(40, 110) +
stat_density2d(aes(alpha=..level..),
geom="polygon", bins = 10)
ggplot(geyser2, aes(x = duration, y = waiting)) +
xlim(0.5, 6) + ylim(40, 110) +
stat_density2d(aes(alpha=..level..),
geom="polygon", bins = 10)
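I now want to produce a single plot that shows the regions where the two plots have the same density (white), negative differences (a gradient from white to blue, where the density of geyser2 is greater than that of geyser1), and positive differences (a gradient from white to red, where the density of geyser1 is greater than that of geyser2).
How can I compute and plot the difference in density?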

It seems easy to retrieve the number of rows SELECTed by an SQL query:
cursor.execute("SELECT COUNT(*) from ...")
result=cursor.fetchone()
But how should I retrieve the number of rows deleted by a DELETE query?
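One possible approach, sketched under assumptions: the question does not name the database driver, so this example assumes a Python DB-API 2.0 driver and uses the standard-library sqlite3 module with an in-memory database. The table name `actions` and its contents are hypothetical. DB-API cursors expose a `rowcount` attribute that reports how many rows the last DELETE, UPDATE, or INSERT affected; some drivers may return -1 when they cannot determine the count.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE actions (who TEXT, what TEXT)")
cursor.executemany(
    "INSERT INTO actions VALUES (?, ?)",
    [("John", "walk"), ("Paul", "eat"), ("Paul", "sleep"), ("Ringo", "run")],
)

# After a DELETE, cursor.rowcount holds the number of rows removed.
cursor.execute("DELETE FROM actions WHERE who = ?", ("Paul",))
deleted = cursor.rowcount
conn.commit()

# Cross-check with the SELECT COUNT(*) pattern from the question.
cursor.execute("SELECT COUNT(*) FROM actions")
remaining = cursor.fetchone()[0]

print(deleted, remaining)
```

Reading `rowcount` immediately after `execute` matters, because the next statement run on the same cursor overwrites it.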
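
I have a long table with 97M rows. Each row contains information about an action taken by a person and the timestamp of that action, in the following format: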
actions <- c("walk","sleep", "run","eat")
people <- c("John","Paul","Ringo","George")
timespan <- seq(1000,2000,1)
set.seed(28100)
df.in <- data.frame(who = sample(people, 10, replace=TRUE),
what = sample(actions, 10, replace=TRUE),
when = sample(timespan, 10, replace=TRUE))
df.in
# who what when
# 1 Paul eat 1834
# 2 Paul sleep 1295
# 3 Paul eat 1312
# 4 Ringo eat 1635
# 5 John sleep 1424
# 6 George run 1092
# 7 Paul walk 1849
# 8 John run 1854
# 9 George sleep 1036
# …

I have a vector "nameAlpha", e.g. c("Mark Twain", "Phil Hall", "Michael P. O'Connor", "", ...). I want to extract the first name from each element into another vector, "nameAlpha_first". I run this:
nameAlpha_first <- sapply(strsplit(nameAlpha, "\\s+"), "[[", 1)
but I get
Error in FUN(X[[12L]], ...) : subscript out of bounds
Is it because a few elements of the vector are empty? How can I fix it?
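
I have been reading different questions/answers (especially here and here) without managing to apply them to my case.
I have a matrix of 11,390 rows with the attributes id, author, and text, for example: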
library(tm)
m <- cbind(c("01","02","03","04","05","06"),
c("Author1","Author2","Author2","Author3","Author3","Auhtor4"),
c("Text1","Text2","Text3","Text4","Text5","Text6"))
I want to create a tm corpus from it. I can create my corpus quickly with
tm_corpus <- Corpus(VectorSource(m[,3]))
which finishes executing on my 11,390-row matrix in
user system elapsed
2.383 0.175 2.557
But when I try to add metadata to the corpus with
meta(tm_corpus, type="local", tag="Author") <- m[,2]
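the execution runs for more than 15 minutes and counting (at which point I stopped it).
According to the discussion here, processing the corpus with tm_map might greatly reduce the time; something like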
tm_corpus <- tm_map(tm_corpus, addMeta, m[,2])
I still don't know how to do it. It would probably be something like
addMeta <- function(text, vector) {
meta(text, tag="Author") = vector[??]
text
}
First, how do I pass tm_map a vector of values to assign to each text of the corpus? Should I call the function in a loop? Should I wrap the tm_map call in vapply?

I have this function, which I used to load SQLite tables:
sqLiteConnect <- function(database, table) {
library(DBI)
library(RSQLite)
con <- dbConnect("SQLite", dbname = database)
query <- dbSendQuery(con, paste("SELECT * FROM ", table, ";", sep=""))
result <- fetch(query, n = -1, encoding="utf-8")
dbClearResult(query)
dbDisconnect(con)
return(result)
}
but now it seems to produce an error:
album <- sqLiteConnect("~/Downloads/ChinookDatabase1.3_Sqlite/Chinook_Sqlite.sqlite","Album")
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘dbConnect’ for signature ‘"character"’
Called from: stop(gettextf("unable to find an inherited method for function %s for signature %s",
sQuote(fdef@generic), sQuote(cnames)), domain = NA)
(I downloaded the database from here) …
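
I have a data.table with a list of actors, identified by id, who do things on a given date. There is no limit on the number of things an actor can do on a specific date.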
require(data.table)
set.seed(28100)
df.in <- data.table(id = sample(1:10, 100, replace=TRUE),
date = sample(2001:2012, 100, replace=TRUE))
Now I want to summarize my dataset and find the number of occurrences in each interval of the following sequence:
sequence <- seq(2000, 2012, 4)
df.out1 <- as.data.frame(table(cut(df.in$date, breaks = sequence)))
df.out1
# Var1 Freq
# 1 (2000,2004] 35
# 2 (2004,2008] 27
# 3 (2008,2012] 38
All good. But now I want to count not the number of events but the number of actors active in each interval, i.e. those with one or more occurrences in it.