Developing methodology for analyzing data is an interest of several communities. While there are many, I have wandered around statistics, machine learning and data mining. Every community has its own priorities, its own problems, its own preferred approaches. It is somewhat difficult for truly interdisciplinary work to emerge, but it is possible to examine what other communities are working on, and adopt good techniques from different areas. This posting will attempt to describe the motivations and interests of statisticians, machine learners and data miners.
There do not need to be many wanderers, but there are areas with relatively few. There was an interesting discussion in the blogosphere last year started by Eszter Hargittai and Cosma Shalizi about the clustering in the network of social networkers: physicists cite other physicists and sociologists cite other sociologists, without much intercitation. With the overproduction of scientific publications, researchers in the field of scientometrics like Loet Leydesdorff are trying to create maps of the scientific enterprise, using the citation links as indicators of connectedness. A good example is his Visualization of the Citation Impact Environments of Scientific Journals.
In the succeeding paragraphs, I will describe the different disciplines.
Applied Statistics
The primary goal in applied statistics is to apply formal data analysis to a particular problem. This involves preparing an experiment, deciding on the structure of the data, building the model, verifying it and then providing conclusions and recommendations. The level of detail and precision is very high, the level of automation is relatively low, and the flexibility in modelling is quite high unless one just uses prefabricated analytical tools. Generally, there is not much data, so this field pays much attention to showing that the fitted model has not merely captured random characteristics of the data.
Machine Learning
Machine learning developed from the artificial intelligence stream in computer science. The ambition of artificial intelligence was to develop a higher level of sophistication in software, so as to enable software to deal with problems that involved "thinking" (although E. Dijkstra joked something like "asking whether computers can think is like asking whether submarines can swim").
A lot of early work was based on logic, especially knowledge representation and reasoning that would result in expert systems. But it turned out that expert systems were expensive to design, so people focused on the problem of "learning from data". In the 1980s the initial attempts were inspired by CART and similar methods for "induction of decision trees": the results of learning could be represented easily in the familiar form of logical rules. Over the 1990s, machine learning assimilated many of the researchers who previously worked with neural networks, especially those following a more formal approach in pattern recognition and computer vision. Moreover, the success of probabilistic models (such as belief nets developed in AI and statistics, and the naive Bayes method from the old days of pattern recognition) created substreams of its own.
The usual approach in machine learning involves a large number of different data sets. Generally, a custom model built for a single data set is not considered particularly interesting. Instead, people are impressed by a model that can be applied to a variety of data, that works efficiently and that does not "overfit". The model itself is rarely examined, and people are comfortable with black-box models as long as objective measures of fit are fine. What matters is the cross-validated accuracy of a particular algorithm on a set of benchmark data sets. Competitions or challenges are quite popular: different groups compete for the highest score on data that was hidden from them. Where statisticians work with declarative "models", machine learners tend to work with "algorithms", or procedures for building a model.
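To make "cross-validated accuracy of an algorithm on benchmark data sets" concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the names (majority_learner, nearest_neighbour_learner, cross_validated_accuracy), the choice of a majority-class baseline and a 1-nearest-neighbour learner as stand-in "algorithms", and the made-up toy data standing in for a real benchmark. It is not taken from any particular library or benchmark suite.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], str]      # (feature vector, class label)
Model = Callable[[Sequence[float]], str]   # maps features to a predicted label
Learner = Callable[[List[Example]], Model] # an "algorithm": builds a model from data


def majority_learner(train: List[Example]) -> Model:
    """Baseline: always predict the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbour_learner(train: List[Example]) -> Model:
    """1-nearest-neighbour using squared Euclidean distance."""
    def predict(x: Sequence[float]) -> str:
        def dist(example: Example) -> float:
            return sum((a - b) ** 2 for a, b in zip(example[0], x))
        return min(train, key=dist)[1]
    return predict


def cross_validated_accuracy(learner: Learner, data: List[Example], k: int = 5) -> float:
    """Mean accuracy over k folds; fold i holds out examples i, i+k, i+2k, ..."""
    scores = []
    for i in range(k):
        test = data[i::k]
        train = [ex for j, ex in enumerate(data) if j % k != i]
        model = learner(train)  # the model is rebuilt from scratch for every fold
        correct = sum(model(x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / k


# Hypothetical toy "benchmark": two well-separated groups of points on a line.
toy_data: List[Example] = (
    [((float(i), 0.0), "low") for i in range(10)]
    + [((float(i) + 100.0, 0.0), "high") for i in range(10)]
)

if __name__ == "__main__":
    for name, learner in [("majority", majority_learner),
                          ("1-NN", nearest_neighbour_learner)]:
        print(name, cross_validated_accuracy(learner, toy_data, k=5))
```

Note that what gets scored is the learner (a procedure that builds a model), not any single fitted model; each fold trains a fresh model and measures it only on the examples that were held out. On this deliberately easy toy data the nearest-neighbour learner classifies every held-out point correctly, while the majority baseline gets half of them right. A real comparison would repeat this over several benchmark data sets.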
The traditional problems mostly addressed in machine learning are classification, probability estimation, regression, clustering and, more recently, dimension reduction. Machine learning has been successful in developing rather robust tools for all these tasks, and these tools then get applied in software that everyone uses. Methodologically, people are quite relaxed. Statistics is an accepted approach, but "statistical learning", logic and other theories are used as well. At the same time, machine learning is always looking for new types of problems that involve "learning" across the board of possible applications. This is perhaps the most exciting aspect of it: the successful application of machine learning techniques to various tasks in the analysis of text was one of the hot topics in the past few years.
The primary conferences in this area seem to be ICML, NIPS and ECML.
Data Mining
Data mining itself developed from a fusion of database researchers and machine learners. It is a highly pragmatic field which pays a lot of attention to commercial applications and problem solving. The main methodological concerns are scalability, efficiency and robustness. It is quite usual to do modelling on several million cases. Lately, there has been a slight disconnect between machine learning research, which is becoming increasingly theoretical, and the applications that should support business objectives. The community is very interested in new applications of data mining. But in general, data mining focuses on tools and techniques for the analysis of large quantities of diverse data: "knowledge discovery in databases".
The primary conferences in this area are KDD, ICDM, PAKDD and PKDD.

Thanks for the very nice overview of several fields. Just one note: machine learning goes way back before the 1980s and CART. People were constructing perceptron learning algorithms back in the 1950s. But it sounds like you're defining machine learning to not include neural networks, so maybe I'm misunderstanding what you mean. A great overview of the history of machine learning can be found in the standard AI textbook Artificial Intelligence: A Modern Approach, by Stuart Russell and Peter Norvig.
Brendan, I agree, the matter of machine learning has been around for a long time. But the chain of "machine learning" conferences appeared later on and I think it initially did not include neural networks: in fact, neural networks seemed to have been a somewhat separate field - there still are conferences specific to neural networks. A while ago we did some digging around the past, finding reports from machine learning workshops circa 1983. As for neural network models - many of them have parallels in conventional statistical models, as this paper by Warren S. Sarle nicely shows.
To add more confusion to your discussion/history, the first usage of "machine learning" according to the OED is this 1959 citation:
A. L. Samuel. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development, vol. 3, no. 3, 1959.
"We have at our command computers with adequate data-handling ability and with sufficient computational speed to make use of *machine-learning* techniques."
"As for neural network models - many of them have parallels in conventional statistical models, as this paper by Warren S. Sarle nicely shows"
Neural networks are nonparametric and parsimonious regression or classification tools. Why do people from the statistical community disregard them? Because they perform well in real-world applications (high number of predictors, nonlinearity, non-Gaussian noise, outliers, etc.)?
May I suggest that everybody read Breiman's article: http://www.cis.upenn.edu/datamining/ReadingGroup/papers/breiman2001.pdf (published in an international, peer-reviewed statistical journal, while Sarle's article was presented at a SAS conference...)
Thanks to Anonymous for providing the link to Breiman's paper. I guess the key difference is that the statistics community really cares about what the model "means", whereas neural network work predominantly pursues predictive performance regardless of what the model is like. In many cases, the purpose of modeling is to understand what is going on, to decide and to act, and not merely to predict. It's two cultures indeed.
Breiman tries to bridge the two cultures by showing how visualization tools can be used to pull "meaning" out of an otherwise black box model. But often this meaning is not a complete and sufficient description of the data.
BTW: Sarle wrote a neural network library for the SAS system, and authored a large part of the comp.ai.neural-nets FAQ.
I surely agree that neural networks are very prediction-oriented, but prediction can nevertheless be used to decide and to act. For instance, in chemometrics applications, where Partial Least Squares regression is often used and where neural networks are beginning to enter the market, it is mostly about prediction and the subsequent decision. A typical example in food control is to allow or deny the marketing of a sample of processed meat based on a prediction of the fat content of the meat from NIR spectra analysis.
As for the comment on the fact that Sarle's paper was presented at a SAS conference, it was not meant to disregard Sarle's qualifications and/or expertise; it was meant to emphasize that the paper's intended audience (potential buyers?) probably biased the writing of the paper. Breiman's paper, by contrast, was published for a (more neutral?) scientific audience.
(P.S. sorry for previously posting as anonymous)