Monday, 26 August 2013

R data.table syntax for subsetting and summarising

This is probably quite simple, but I would like to summarise some data
(mean and median) for a randomly selected column, grouped by a different
column.
Please see below:
library(data.table)
DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
ww <- sample(c("y","v"),1)
DT[,list(avg=mean(ww),med=median(ww)),by="x"]
   x avg med
1: a  NA   y
2: b  NA   y
3: c  NA   y
Warning messages:
1: In `[.data.table`(DT, , list(avg = mean(ww), med = median(ww)), :
  argument is not numeric or logical: returning NA
2: In `[.data.table`(DT, , list(avg = mean(ww), med = median(ww)), :
  argument is not numeric or logical: returning NA
3: In `[.data.table`(DT, , list(avg = mean(ww), med = median(ww)), :
  argument is not numeric or logical: returning NA
If, for example, ww happened to equal "v", I would expect the following
output:
   x avg med
1: a   2   2
2: b   5   5
3: c   8   8
I think it is just the syntax that I need to adjust, but I am unsure how
to adjust it. Any help would be greatly appreciated.
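
One possible adjustment, offered as a sketch rather than a definitive answer: ww holds the column name as a character string, so mean(ww) is applied to the string itself and not to the column. Looking the column up with get(), or restricting .SD to it through .SDcols, should give the per-group summary. In the code below ww is fixed to "v" (an assumption made only so the result is reproducible), and the names res_get and res_sd are illustrative.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

# Assumption for reproducibility: fix the choice instead of sampling it.
# In the question this would be: ww <- sample(c("y", "v"), 1)
ww <- "v"

# Option 1: get() resolves the column whose name is stored in ww
res_get <- DT[, list(avg = mean(get(ww)), med = median(get(ww))), by = "x"]

# Option 2: limit .SD to the chosen column and take its first (only) column
res_sd <- DT[, list(avg = mean(.SD[[1]]), med = median(.SD[[1]])),
             by = "x", .SDcols = ww]

print(res_get)
print(res_sd)
```

With ww equal to "v", both calls should return avg and med of 2, 5 and 8 for groups a, b and c. The .SDcols form extends more naturally if several columns are selected at once.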
