Workshop description

What is Data Mining?

Data Mining for Research

While data mining is useful in practice, it is also a powerful tool for testing existing theories and developing new models (Shmueli & Koppius, 2011). Many empirical research fields are dominated by statistical methods for analyzing data, mostly because of how researchers are trained and because data mining is less widely known. Yet data mining offers a distinct and complementary approach to deriving knowledge from data. Moreover, predictive analytics helps bridge gaps between theoretical work and practice. For these reasons, data mining has enormous potential to further research in engineering fields that have large datasets.

Data mining provides semi-automated algorithms that learn from historical data how to combine a set of input information (X variables) to accurately predict a response of interest (Y variable). Statistical models typically focus on quantifying a relationship in the sample and drawing inferences about that relationship in the population. In contrast, predictive analytics focuses on predictive power: how well the input information can be used to predict new individual observations. Think of the difference between quantifying the relationship between the amount of smoking and the probability of cancer in a population (statistical methods) and predicting the probability of cancer for one or more individuals, given their smoking habits (predictive analytics). The two goals are different and call for different modeling (Shmueli, 2010).
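The contrast between the two goals can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are simulated rather than real, the "true" relationship is arbitrary, and the helper names (simulate, fit_logistic, predict_proba, holdout_accuracy) are hypothetical. The same simple logistic model is used in two ways: once to estimate the X-Y relationship from the full sample, and once to predict new observations that were held out from fitting.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, seed):
    """Simulated (x, y) pairs; x is a made-up exposure level, y is a 0/1 outcome."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 3.0)
        p = sigmoid(-2.0 + 1.2 * x)  # arbitrary relationship, for illustration only
        y = 1 if rng.random() < p else 0
        data.append((x, y))
    return data


def fit_logistic(data, lr=0.3, epochs=2000):
    """Fit intercept b0 and slope b1 by batch gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in data:
            err = sigmoid(b0 + b1 * x) - y
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def predict_proba(model, x):
    """Predicted probability of the outcome for one individual with input x."""
    b0, b1 = model
    return sigmoid(b0 + b1 * x)


def holdout_accuracy(model, data):
    """Share of observations classified correctly at a 0.5 probability cutoff."""
    hits = sum((predict_proba(model, x) >= 0.5) == (y == 1) for x, y in data)
    return hits / len(data)


data = simulate(600, seed=42)
train, holdout = data[:400], data[400:]

# Explanatory use: estimate the relationship from the whole sample.
full_model = fit_logistic(data)
print("estimated slope (log-odds per unit of x):", round(full_model[1], 3))

# Predictive use: fit on one part, judge the model by how it predicts the rest.
train_model = fit_logistic(train)
acc = holdout_accuracy(train_model, holdout)
p_new = predict_proba(train_model, 2.0)
print("holdout accuracy:", round(acc, 3))
print("predicted probability for a new individual with x = 2.0:", round(p_new, 3))
```

In the explanatory use, the quantity of interest is the estimated slope. In the predictive use, the model is judged by its performance on observations it has not seen, which is why part of the data is set aside before fitting.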

Data mining algorithms tend to be more data-driven: they “learn” from data with far fewer assumptions than statistical methods. With large datasets, such tools provide an opportunity to discover new relationships. They are also semi-automated and can be deployed in real time on large amounts of data.

Workshop objectives

During the “data mining for engineering research” workshop, participants will

Workshop topics

Pre-requisites

The workshop is intended for graduate-level students and faculty who have taken at least one statistics course.

References

Grossman R L, Kamath C, Kegelmeyer P, Kumar V, & Namburu R (Eds.) (2001) Data Mining for Scientific and Engineering Applications. Series: Massive Computing, Vol. 2, Springer. ISBN: 978-1-4020-0033-1

Shmueli G, Patel N R, & Bruce P (2010) Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner, 2nd edition, John Wiley and Sons Inc. ISBN: 978-0-470-52682-8

Shmueli G & Koppius O (2011) “Predictive Analytics in Information Systems Research”, MIS Quarterly, forthcoming.

Shmueli G (2010) “To Explain or To Predict?”, Statistical Science, vol 25(3) pp. 289-310.