March 24, 2006
Visualization of large datasets
Gregor Gorjanc writes,
Gentleman et al. published a paper on visualizing genomic data. Quite a few of its ideas can be applied to other areas of data visualization. I particularly like the scatterplot examples on page 17. I [Gregor] often have massive datasets, and it is hard to see anything in an ordinary scatterplot of them. smoothScatter from the geneplotter R package can help a lot in producing more informative and more attractive graphs. Try the following (from the smoothScatter help page). And here are my examples (unfortunately not in English, but the graphs show some context).

    library("geneplotter")  ## you additionally need annotate and Biobase
                            ## from BioC, and RColorBrewer
    if (interactive()) {
      ## two overlapping point clouds, 5000 points each
      x1 <- matrix(rnorm(1e4), ncol=2)
      x2 <- matrix(rnorm(1e4, mean=3, sd=1.5), ncol=2)
      x  <- rbind(x1, x2)

      ## 2 x 2 grid of plots
      layout(matrix(1:4, ncol=2, byrow=TRUE))
      smoothScatter(x, nrpoints=0)   ## smoothed density only, no points
      smoothScatter(x)               ## density plus points in sparse regions
      smoothScatter(x, nrpoints=Inf,
                    colramp=colorRampPalette(RColorBrewer::brewer.pal(9, "YlOrRd")),
                    bandwidth=40)    ## all points, custom palette
      colors <- densCols(x)          ## color each point by local density
      plot(x, col=colors, pch=20)
    }
Posted by Andrew at March 24, 2006 12:53 AM
Comments
Gregor,
you might be interested in the book "Graphics of Large Data Sets", which is due this summer. (I hate advertising my own work, but ...) It contains many real-world examples that might be helpful when dealing with large data.
There are also some slides from a talk I gave some years ago that you might like.
Posted by: Martin Theus at March 24, 2006 11:57 AM.
Posted by: Gregor at March 24, 2006 5:31 PM.