Layered critical values: a powerful direct-adjustment approach to discovering significant patterns

Webb, Geoffrey I.

doi:10.1007/s10994-008-5046-x

Layered critical values: a powerful direct-adjustment approach to discovering significant patterns

Technical Note
Published: 08 March 2008

Volume 71, pages 307–323, (2008)
Cite this article

Download PDF

Machine Learning Aims and scope Submit manuscript

Layered critical values: a powerful direct-adjustment approach to discovering significant patterns

Download PDF

Geoffrey I. Webb¹

766 Accesses
41 Citations
3 Altmetric
Explore all metrics

Abstract

Standard pattern discovery techniques, such as association rules, suffer an extreme risk of finding very large numbers of spurious patterns for many knowledge discovery tasks. The direct-adjustment approach to controlling this risk applies a statistical test during the discovery process, using a critical value adjusted to take account of the size of the search space. However, a problem with the direct-adjustment strategy is that it may discard numerous true patterns. This paper investigates the assignment of different critical values to different areas of the search space as an approach to alleviating this problem, using a variant of a technique originally developed for other purposes. This approach is shown to be effective at increasing the number of discoveries while still maintaining strict control over the risk of false discoveries.

References

Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining associations between sets of items in massive databases. In Proceedings of the 1993 ACM-SIGMOD international conference on management of data (pp. 207–216). Washington, DC, May 1993.
Aumann, Y., & Lindell, Y. (1999). A statistical theory for quantitative association rules. In Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining (KDD-99) (pp. 261–270).
Bastide, Y., Pasquier, N., Taouil, R., Stumme, G., & Lakhal, L. (2000). Mining minimal non-redundant association rules using frequent closed itemsets. In First international conference on computational logic—CL 2000 (pp. 972–986). Berlin: Springer.
Google Scholar
Bay, S. D., & Pazzani, M. J. (2001). Detecting group differences: Mining contrast sets. Data Mining and Knowledge Discovery, 5(3), 213–246.
Article MATH Google Scholar
Bayardo, R. J. Jr., Agrawal, R., & Gunopulos, D. (2000). Constraint-based rule mining in large, dense databases. Data Mining and Knowledge Discovery, 4(2/3), 217–240.
Article Google Scholar
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A new and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B, 57, 289–300.
MATH MathSciNet Google Scholar
Brijs, T., Swinnen, G., Vanhoof, K., & Wets, G. (1999). Using association rules for product assortment decisions: A case study. In Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 254–260). New York: ACM.
Chapter Google Scholar
Brin, S., Motwani, R., & Silverstein, C. (1997). Beyond market baskets: Generalizing association rules to correlations. In J. Peckham (Ed.), SIGMOD 1997, Proceedings ACM SIGMOD international conference on management of data, May 1997 (pp. 265–276). New York: ACM.
Chapter Google Scholar
Dong, G., & Li, J. (1999). Efficient mining of emerging patterns: Discovering trends and differences. In Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining (KDD-99) (pp. 15–18). New York: ACM.
Google Scholar
DuMouchel, W., & Pregibon, D. (2001). Empirical Bayes screening for multi-item associations. In KDD-2001: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, August 2001 (pp. 76–76). New York: ACM.
Google Scholar
Gionis, A., Mannila, H., Mielikainen, T., & Tsaparas, P. (2006). Assessing data mining results via swap randomization. In 12th international conference on knowledge discovery and data mining (KDD) (pp. 167–176).
Han, J., Wang, J., Lu, Y., & Tzvetkov, P. (2002). Mining top-K frequent closed patterns without minimum support. In International conference on data mining (pp. 211–218).
Hettich, S., & Bay, S. D. (2007). The UCI KDD archive. Department of Information and Computer Science, University of California, Irvine, CA. http://kdd.ics.uci.edu.
Holland, B. S., & Copenhaver, M. D. (1988). Improved Bonferroni-type multiple testing procedures. Psychological Bulletin, 104(1), 145–149.
Article Google Scholar
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65–70.
MathSciNet Google Scholar
International Business Machines (1996). IBM intelligent miner user’s guide, version 1, release 1.
Jaroszewicz, S., & Simovici, D. A. (2004). Interestingness of frequent itemsets using Bayesian networks as background knowledge. In R. Kohavi, J. Gehrke, & J. Ghosh (Eds.), KDD-2004: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, August 2004 (pp. 178–186). New York: ACM.
Chapter Google Scholar
Klösgen, W. (1996). Explora: A multipattern and multistrategy discovery assistant. In U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, & R. Uthurusamy (Eds.), Advances in knowledge discovery and data mining (pp. 249–271). Menlo Park: AAAI.
Google Scholar
Liu, B., Hsu, W., & Ma, Y. (1999). Pruning and summarizing the discovered associations. In Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining (KDD-99), August 1999 (pp. 125–134). New York: AAAI.
Chapter Google Scholar
Megiddo, N., & Srikant, R. (1998). Discovering predictive association rules. In Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98) (pp. 27–78). Menlo Park: AAAI.
Google Scholar
Možina, M., Demšar, J., Žabkar, J., & Bratko, I. (2006). Why is rule learning optimistic and how to correct it. In Machine learning: ECML 2006 (pp. 330–340). Berlin: Springer.
Chapter Google Scholar
Newman, D. J., Hettich, S., Blake, C., & Merz, C. J. (2007). UCI repository of machine learning databases (Machine-readable data repository). Department of Information and Computer Science, University of California, Irvine, CA.
Piatetsky-Shapiro, G. (1991). Discovery, analysis, and presentation of strong rules. In G. Piatetsky-Shapiro & J. Frawley (Eds.), Knowledge discovery in databases (pp. 229–248). Menlo Park: AAAI/MIT Press.
Google Scholar
Scheffer, T. (1995). Finding association rules that trade support optimally against confidence. Intelligent Data Analysis, 9(4), 381–395.
Google Scholar
Scheffer, T., & Wrobel, S. (2002). Finding the most interesting patterns in a database quickly by using sequential sampling. Journal of Machine Learning Research, 3, 833–862.
Article MathSciNet Google Scholar
Webb, G. I. (1995). OPUS: An efficient admissible algorithm for unordered search. Journal of Artificial Intelligence Research, 3, 431–465.
MATH MathSciNet Google Scholar
Webb, G. I. (2001). Discovering associations with numeric variables. In Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining (KDD-2001) (pp. 383–388). New York: ACM.
Chapter Google Scholar
Webb, G. I. (2002). Magnum opus, version 1.3. Software. Melbourne: G.I. Webb & Associates.
Google Scholar
Webb, G. I. (2006). Discovering significant rules. In L. Ungar, M. Craven, D. Gunopulos, & T. Eliassi-Rad (Eds.), Proceedings of the twelfth ACM SIGKDD international conference on knowledge discovery and data mining, KDD-2006 (pp. 434–443). New York: ACM.
Chapter Google Scholar
Webb, G. I. (2007). Discovering significant patterns. Machine Learning, 68(1), 1–33.
Article MathSciNet Google Scholar
Webb, G. I., & Zhang, S. (2005). K-optimal rule discovery. Data Mining and Knowledge Discovery, 10(1), 39–79.
Article MathSciNet Google Scholar
Zaki, M. J. (2004). Mining non-redundant association rules. Data Mining and Knowledge Discovery, 9(3), 223–248.
Article MathSciNet Google Scholar
Zhang, H., Padmanabhan, B., & Tuzhilin, A. (2004). On the discovery of significant statistical quantitative rules. In Proceedings of the tenth international conference on knowledge discovery and data mining (KDD-2004), August 2004 (pp. 374–383). New York: ACM.
Chapter Google Scholar
Zheng, Z., Kohavi, R., & Mason, L. (2001). Real world performance of association rule algorithms. In Proceedings of the seventh international conference on knowledge discovery and data mining (KDD-2001), August 2001 (pp. 401–406). New York: ACM.
Chapter Google Scholar

Download references

Author information

Authors and Affiliations

Faculty of Information Technology, Monash University, Clayton Campus, Wellington Road, Clayton, Vic, Australia
Geoffrey I. Webb

Authors

Geoffrey I. Webb
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Geoffrey I. Webb.

Additional information

Editor: Johannes Fuernkranz

Rights and permissions

Reprints and permissions

About this article

Cite this article

Webb, G.I. Layered critical values: a powerful direct-adjustment approach to discovering significant patterns. Mach Learn 71, 307–323 (2008). https://doi.org/10.1007/s10994-008-5046-x

Download citation

Received: 15 July 2007
Revised: 06 February 2008
Accepted: 07 February 2008
Published: 08 March 2008
Issue Date: June 2008
DOI: https://doi.org/10.1007/s10994-008-5046-x

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Layered critical values: a powerful direct-adjustment approach to discovering significant patterns

Abstract

Article PDF

Similar content being viewed by others

A tutorial on statistically sound pattern discovery

A new causal discovery heuristic

Sets of Robust Rules, and How to Find Them

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Layered critical values: a powerful direct-adjustment approach to discovering significant patterns

Abstract

Article PDF

Similar content being viewed by others

A tutorial on statistically sound pattern discovery

A new causal discovery heuristic

Sets of Robust Rules, and How to Find Them

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation