Date and time
Monday 9 November, 13:30 - 17:00.
Description
This workshop gives an introductory overview of the field of data mining. Data mining is about discovering patterns in large data sets and converting data into information or learning. Data sets can be examined systematically from different perspectives in search of regularities and patterns; some of these may be spurious or trivial, but others may be interesting and potentially useful "nuggets".
Data mining is a relatively recent multi-disciplinary field that combines techniques from statistics, computing and machine learning. One of the outputs of e-research is the gathering of large data sets. This workshop is aimed at helping participants understand whether data mining is applicable and has the potential for unlocking useful information from their online repositories.
The workshop surveys the field, highlighting the key ideas: descriptive data exploration, data mining methodologies and the different forms of algorithms. We step through selected representative algorithms to illuminate the underlying principles. There is a hands-on component using the open-source R package to demonstrate its graphical and statistical capabilities.
Outline
- Overview (about 60 minutes): What is data mining? Introduction, descriptive data exploration, data mining methodologies, the importance of data integrity, and an overview of the different forms of algorithms.
- Selected representative algorithms in depth (60 minutes): Basket analysis, classification and clustering. We select representative techniques and walk through them in just enough detail to provide a background understanding of how they work. This illuminates the computational principles and the statistics that inform these techniques.
- Hands-on with R, part 1 (45 minutes): In this part we introduce the R package and demonstrate some of its graphical capabilities.
- Hands-on with R, part 2 (30 minutes): In this part we show users how to use some of the statistical and package capabilities of R.
Who should attend
Researchers (scientists) and students who are interested in developing knowledge about data mining and the latest developments in data description and exploration techniques.
Attendees should bring a laptop capable of connecting to the Internet or a USB drive for downloading sample data files. The R package should be downloaded and installed from http://www.r-project.org/ prior to the workshop. R is a free software package for statistical computing and graphics (with versions for Windows, MacOS and Unix).
About the presenters
Dr Ayse Bilgin is a lecturer in the Department of Statistics at Macquarie University. She teaches undergraduate and postgraduate students in topics such as Operations Research, Data Mining and Decision Support Systems. Her research interests include statistics education and applied statistics, especially in the health sciences.
Associate Professor Julian Leslie is from the Department of Statistics at Macquarie University, where he has developed an interest in data-mining methods as part of his research in forensic statistics. It has been apparent for some time that there is considerable demand for data-mining expertise amongst those who encounter large datasets as part of their work. In response to this, Dr Bilgin and Dr Leslie offer a postgraduate unit in data-mining at Macquarie University.
Gillian Miller is affiliated with the Department of Computing at Macquarie University and has taught a variety of software engineering units including Web services, XML technologies, databases and advanced information systems development. She has a career background in online information systems dissemination and information retrieval systems for government and industry. Her interests include semi-structured data, enhanced information modeling and, more recently, data mining.