General Data Mining Links

ACM SIGKDD. The premier professional organization for data mining.

Proceedings of KDD-2000

Knowledge Discovery and Data Mining. The main Web page for KDD-related information.

STATLIB. STATLIB is a major online source of information for statistical computing.

Machine Learning Resources. Very extensive list of information and pointers on machine learning research and applications.

Pattern Recognition. Information and resources about pattern recognition methods and applications.

SVM applet (thanks to Pai-Hsi for the link)


General Purpose Data Banks

The UC Irvine KDD Archive. A new archive of large data sets collected specifically for researchers in data mining and KDD to serve as a benchmark collection (developed under sponsorship of the National Science Foundation). There are some excellent data sets here to consider for class projects.

STATLIB data sets. STATLIB is the major online archive for statistical research. Its data sets section contains many data sets that have been analyzed in the statistical literature (mostly smaller data sets, many for regression, classification, and other statistical methods).

UC Irvine Machine Learning Data Archive. The ICS department's widely known original archive of benchmark data sets (mostly smallish data sets, mostly for classification problems). I would prefer you look first at the larger data sets from the UCI KDD archive (above) for class projects. Nonetheless, there are some interesting data sets here too.

Computer Vision Test Images. Very extensive set of pointers to online collections of sets of image data.

The DELVE benchmark collection. A collection of data sets and software focused on the systematic evaluation of learning algorithms.

A Casebook for a First Course in Statistics and Data Analysis. A very nice collection of "smallish" data sets based on a text of the same name, including many classic data sets such as Iris and Old Faithful.

Time Series Data Sets. Very extensive collection of over 500 time series data sets from a very broad range of applications.

DASL: The Data and Story Library. Very nice collection of "small" data sets with background descriptions, used mainly for teaching.

NIST Benchmark Image Data Sets. Benchmark data sets for optical character recognition and face recognition problems.

CHANCE Data Sets. Small collection of data sets used in the CHANCE teaching framework (including Dow Jones and some sports-related data).


Application-Specific Data Banks

Economic Time Series Data. Contains virtually every economic time series (for the United States) that you can imagine.

Asteroid Elements Database. A data set containing detailed information on several thousand asteroids in the solar system. The problem of interest here is one of clustering or grouping the asteroids into natural groups.

Climate Data Sources. Very extensive list of pointers to climate-, atmosphere-, and ocean-related data sets.

Comet Library. Large collection of images of observed comets.

El Nino and Internet usage data (not related!). Two nicely documented data sets intended as benchmarks for the 1999 American Statistical Association meeting.


Pointers to Data Banks

Knowledge Discovery and Data Mining Data Sets. Pointers to data sets used in KDD research.

Pattern Recognition Archives. Pointers to data archives used in the pattern recognition, machine learning, and neural network research literature.

University of Nevada at Reno. Extensive list of pointers to various online data archives.

UCLA Statistics Department. Pointers to data sets from various texts and national studies and collections.