Problem description
I have a formula and a data frame, and I want to extract the model.matrix(). However, I need the resulting matrix to include the NAs found in the original dataset. If I were using model.frame(), I would simply pass it na.action=NULL. However, the output I need is in the model.matrix() format. Specifically, I need only the right-hand side variables, I need the output to be a matrix (not a data frame), and I need factors to be converted to a series of dummy variables.
I'm sure I could hack something together using loops, but I was wondering if anyone could suggest a cleaner and more efficient workaround. Thanks a lot for your time!
Here is an example:
dat <- data.frame(matrix(rnorm(20),5,4), gl(5,2))
dat[3,5] <- NA
names(dat) <- c(letters[1:4], 'fact')
ff <- a ~ b + fact
# This omits the row with a missing observation on the factor
model.matrix(ff, dat)
# This keeps the NA, but it gives me a data frame and does not dichotomize the factor
model.frame(ff, dat, na.action=NULL)
Here is what I would like to obtain:
(Intercept) b fact2 fact3 fact4 fact5
1 1 0.7266086 0 0 0 0
2 1 -0.6088697 0 0 0 0
3 NA 0.4643360 NA NA NA NA
4 1 -1.1666248 1 0 0 0
5 1 -0.7577394 0 1 0 0
6 1 0.7266086 0 1 0 0
7 1 -0.6088697 0 0 1 0
8 1 0.4643360 0 0 1 0
9 1 -1.1666248 0 0 0 1
10 1 -0.7577394 0 0 0 1
Recommended answer
You can work on the model.matrix object a little, based on its row names:
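MM <- model.matrix(ff, dat)                      # drops row 3 (NA in fact)
MM <- MM[match(rownames(dat), rownames(MM)), ]   # re-expand; unmatched rows become NA
MM[, "b"] <- dat$b                               # restore b, which is observed in every row
rownames(MM) <- rownames(dat)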
This gives:
> MM
(Intercept) b fact2 fact3 fact4 fact5
1 1 0.9583010 0 0 0 0
2 1 0.3266986 0 0 0 0
3 NA 1.4992358 NA NA NA NA
4 1 1.2867461 1 0 0 0
5 1 0.5024700 0 1 0 0
6 1 0.9583010 0 1 0 0
7 1 0.3266986 0 0 1 0
8 1 1.4992358 0 0 1 0
9 1 1.2867461 0 0 0 1
10 1 0.5024700 0 0 0 1
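To see why the match() step works, it can help to look at the row names directly. The sketch below rebuilds the question's data (the set.seed() call is an addition, only there to make the run reproducible) and assumes R's default na.action of na.omit. The index vector idx is an illustrative name, not part of the original answer.

```r
set.seed(1)  # added for reproducibility; not in the original example
dat <- data.frame(matrix(rnorm(20), 5, 4), gl(5, 2))
dat[3, 5] <- NA
names(dat) <- c(letters[1:4], "fact")
ff <- a ~ b + fact

MM <- model.matrix(ff, dat)

# model.matrix() keeps the original row names, so the dropped row "3" is absent
rownames(MM)

# match() returns NA for the dropped row; indexing a matrix with an NA row
# index yields a row filled with NA
idx <- match(rownames(dat), rownames(MM))
idx
```

Because idx has one entry per row of dat, MM[idx, ] has the same number of rows as the original data frame.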
Alternatively, you can use contrasts() to do the work for you. Constructing the matrix by hand would be:
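cont <- contrasts(dat$fact)[as.numeric(dat$fact), ]       # one contrast row per observation; NA codes give NA rows
colnames(cont) <- paste("fact", colnames(cont), sep = "")
out <- cbind(1, dat$b, cont)                              # intercept, b, dummy columns
out[is.na(dat$fact), 1] <- NA                             # blank the intercept where fact is missing
colnames(out)[1:2] <- c("Intercept", "b")
rownames(out) <- rownames(dat)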
This gives:
> out
Intercept b fact2 fact3 fact4 fact5
1 1 0.2534288 0 0 0 0
2 1 0.2697760 0 0 0 0
3 NA -0.8236879 NA NA NA NA
4 1 -0.6053445 1 0 0 0
5 1 0.4608907 0 1 0 0
6 1 0.2534288 0 1 0 0
7 1 0.2697760 0 0 1 0
8 1 -0.8236879 0 0 1 0
9 1 -0.6053445 0 0 0 1
10 1 0.4608907 0 0 0 1
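The contrasts() indexing trick is easier to follow on a tiny factor. This is a minimal sketch on made-up data (the factor f and the names cm, codes and dummies are illustrative), assuming the default treatment contrasts for unordered factors.

```r
f <- factor(c("a", "b", NA, "c"))

cm <- contrasts(f)       # 3 x 2 contrast matrix with columns "b" and "c"
codes <- as.numeric(f)   # integer level codes; the missing value stays NA

# Pick one contrast row per observation; the NA code produces an all-NA row
dummies <- cm[codes, ]
dummies
```

The first level ("a") is the baseline, so its row is all zeros, which mirrors the fact2 to fact5 columns in the answer.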
In any case, both methods can be incorporated in a function that can deal with more complex formulae. I leave the exercise to the reader (how I loathe that sentence when I meet it in a paper ;-) )
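One possible shape for such a function, based on the row-name approach, is sketched below. The name model_matrix_keep_na is hypothetical, and the sketch assumes the default na.action of na.omit. Unlike the hand-written version above, it leaves the whole row as NA for any incomplete observation rather than restoring the columns that were observed.

```r
# Hypothetical helper: build the model matrix on the complete cases, then
# expand it back to one row per row of `data`.
model_matrix_keep_na <- function(formula, data) {
  mm <- model.matrix(formula, data)            # incomplete rows are dropped here
  idx <- match(rownames(data), rownames(mm))   # NA for every dropped row
  out <- mm[idx, , drop = FALSE]               # NA index -> all-NA row
  rownames(out) <- rownames(data)
  out
}

# Usage on the question's data, with an interaction term in the formula
set.seed(1)  # added for reproducibility; not in the original example
dat <- data.frame(matrix(rnorm(20), 5, 4), gl(5, 2))
dat[3, 5] <- NA
names(dat) <- c(letters[1:4], "fact")

out <- model_matrix_keep_na(a ~ b * fact, dat)
dim(out)
```

Subsetting drops the "assign" and "contrasts" attributes that model.matrix() attaches, so code that relies on them would need to copy them over separately.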