March 24, 2006

Visualization of large datasets

Gregor Gorjanc writes,

Gentleman et al. published a paper on visualizing genomic data. Quite a few of the issues it raises apply to other areas of data visualization. I particularly like the scatterplot examples on page 17. I [Gregor] often have massive datasets, and it is hard to see anything in an ordinary scatterplot of them. smoothScatter from the geneplotter R package can help a lot in producing more informative and eye-catching graphs. Try the following (from the smoothScatter help page). And my examples: unfortunately not in English, but the graphs show some context.
library("geneplotter")  ## you need additionally annotate and Biobase
                        ## from BioC and RColorBrewer
if(interactive()) {
  x1 <- matrix(rnorm(1e4), ncol=2)
  x2 <- matrix(rnorm(1e4, mean=3, sd=1.5), ncol=2)
  x  <- rbind(x1,x2)

  layout(matrix(1:4, ncol=2, byrow=TRUE))
  smoothScatter(x, nrpoints=0)
  smoothScatter(x)
  smoothScatter(x, nrpoints=Inf,
                colramp=colorRampPalette(RColorBrewer::brewer.pal(9, "YlOrRd")),
                bandwidth=40)
  colors <- densCols(x)
  plot(x, col=colors, pch=20)
}
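
To give a rough feel for why a density display helps with massive datasets, here is a minimal base-R sketch that bins the same simulated points into a grid and draws the counts as an image. This is plain 2D binning, swapped in for illustration; it is not how smoothScatter itself works (that function smooths with a kernel density estimate). The grid size nbin and the color ramp are arbitrary assumptions, not values from the help page.

```r
## Sketch only: 2D binning as a crude stand-in for a smoothed scatterplot.
set.seed(1)                      # assumption: fixed seed for reproducibility
x1 <- matrix(rnorm(1e4), ncol = 2)
x2 <- matrix(rnorm(1e4, mean = 3, sd = 1.5), ncol = 2)
x  <- rbind(x1, x2)              # 10000 points in two overlapping clouds

nbin <- 64                       # hypothetical grid resolution
bx <- seq(min(x[, 1]), max(x[, 1]), length.out = nbin + 1)  # cell edges, x axis
by <- seq(min(x[, 2]), max(x[, 2]), length.out = nbin + 1)  # cell edges, y axis

## Assign every point to a grid cell and count the points per cell.
ix <- cut(x[, 1], breaks = bx, include.lowest = TRUE)
iy <- cut(x[, 2], breaks = by, include.lowest = TRUE)
counts <- matrix(as.vector(table(ix, iy)), nrow = nbin, ncol = nbin)

if (interactive()) {
  ## Darker cells hold more points; overplotting no longer hides the structure.
  ramp <- colorRampPalette(c("white", "orange", "red"))(32)
  image(bx, by, counts, col = ramp, xlab = "x[,1]", ylab = "x[,2]")
}
```

Each point lands in exactly one cell, so the counts add up to the number of rows in x. A kernel smoother replaces these hard cell boundaries with a smooth surface, which is why the smoothScatter panels look less blocky than a binned image would.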


Posted by Andrew at March 24, 2006 12:53 AM

Comments

Gregor,

you might be interested in the book "Graphics of Large Data Sets", which is due this summer. (I hate advertising my own work, but ...) There are many real-world examples which might be helpful when dealing with large data.


There are also some slides of a talk I gave some years ago you might like.

Posted by: Martin Theus at March 24, 2006 11:57 AM.

Martin, thanks!

Any pointers to good works are welcome.

Posted by: Gregor at March 24, 2006 5:31 PM.
