PT - JOURNAL ARTICLE
AU - David J. Leinweber
TI - Stupid Data Miner Tricks
AID - 10.3905/joi.2007.681820
DP - 2007 Feb 28
TA - The Journal of Investing
PG - 15-22
VI - 16
IP - 1
4099 - https://pm-research.com/content/16/1/15.short
4100 - https://pm-research.com/content/16/1/15.full
AB - This article originated over ten years ago as a set of joke slides showing silly spurious correlations. These statistically appealing relationships between the stock market and dairy products or third world livestock populations have been cited often, in Business Week, the Wall Street Journal, the book “A Mathematician Plays the Stock Market,” and elsewhere. Students from Bill Sharpe's classes at Stanford seem to be familiar with them. The slides were expanded to include some actual content about data mining and reissued as an academic working paper in 2001. Requests for it occasionally arrive from distant corners of the world, so I thank the editors of the Journal of Investing for publishing this article. I have not taken a hatchet to the original; the advice it offers remains valuable, perhaps even more so now that there is so much more data to mine. Monthly data arrives as a single data point, once a month, so it's hard to avoid data mining sins if you look twice. Ticks, quotes, and executions arrive by the millions per minute, and many of the practices that fail the statistical sniff tests for low-frequency data can now be used responsibly. Nevertheless, fooling yourself remains an occupational hazard in quantitative trading.
TOPICS: Big data/machine learning, security analysis and valuation