Statistical Principles for Data Mining

Date/Time: October 22, 2014, from 6:30 PM to 8:30 PM, with food and drinks (membership is not required to attend)


Data mining is a juggernaut. Unfortunately, an unacceptable number of automated methods stumble on the usual landmines. The results are invalid inferences, uncalibrated predictions, and proposed interventions that fail to produce the expected benefits. To avoid these pitfalls, we encourage adoption of four essential principles:

· Always combine expert judgment with automated algorithms.

· Always account for data snooping.

· Always check the model against the observed data.

· Always calibrate statistical procedures.

The talk gives simple examples to illustrate the importance of following these mandates.
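
As one illustration of the data-snooping principle (a minimal sketch, not taken from the talk itself), the Python simulation below generates pure-noise predictors and a pure-noise response, keeps the predictor with the largest absolute correlation, and counts how often that "best" predictor passes a nominal 5% significance test. The function names, sample sizes, and trial counts are assumptions chosen for illustration. The threshold 0.361 is the standard two-sided 5% critical value for a Pearson correlation with 30 observations.

```python
import random
import statistics


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def best_abs_corr(n_obs, n_features, rng):
    """Largest |correlation| between a noise response and noise predictors."""
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_features):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(x, y)))
    return best


def snooping_false_positive_rate(n_obs=30, n_features=50, n_trials=200,
                                 threshold=0.361, seed=0):
    """Fraction of trials in which the selected predictor looks 'significant'.

    Every predictor is independent of the response, so any hit is a
    false positive produced by selecting the best of many candidates.
    """
    rng = random.Random(seed)
    hits = sum(best_abs_corr(n_obs, n_features, rng) > threshold
               for _ in range(n_trials))
    return hits / n_trials


if __name__ == "__main__":
    print("1 candidate predictor:  ", snooping_false_positive_rate(n_features=1))
    print("50 candidate predictors:", snooping_false_positive_rate(n_features=50))
```

With a single candidate, the false-positive rate should stay near the nominal 5%. With 50 independent candidates, theory gives about 1 - 0.95^50, or roughly 92%, which is why a test applied after a search has to account for the search.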


Joe R. Hill is an HP Fellow. He joined EDS R&D in 1986 after receiving a Ph.D. in Mathematics (with an emphasis in Statistics) from the University of Texas at Austin. He has had many roles over the years and is currently Chief Technologist for Business Process Services in the Office of the CTO of HP Enterprise Services. In addition to being an expert in modern service-oriented and business process architectures, he is passionate about statistics being done correctly. Here are a few papers that describe his perspective:

· “A General Framework for Model-Based Statistics” (1990) Biometrika 77, 115-126.

· “Outlier Tests for Logistic Regression: A Conditional Approach” (with Ed Bedrick) (1990) Biometrika 77, 815-827.

· “A Generalized Bootstrap” (with Ed Bedrick) (1992) in Exploring the Limits of Bootstrap, edited by Raoul LePage and Lynne Billard, New York: John Wiley and Sons, 319-326. Based on the IMS Special Meeting on Bootstrap, East Lansing, 1990.

· “Properties and Applications of the Generalized Likelihood as a Summary Function for Prediction Problems” (with Ed Bedrick) (1999) Scandinavian Journal of Statistics 26, 593-609.


THE ADVISORY BOARD - BUILDING 7 (map - http://bit.ly/PA804c)

Room Number: Suite 100

12357-C Riata Trace Parkway

Bldg 7, Suite 100

Austin, Texas 78727

United States

Meeting Agenda:

6:30 p.m. Networking and Gathering (with free food and drinks)

6:50 p.m. Call to Order, Announcement

7:00 p.m. Presentation, with Q/A

8:30 p.m. Meeting Evaluation, Adjourn