Workshops

by Joshua

INTRODUCTORY MACHINE LEARNING: DECISION TREES, RANDOM FORESTS, AND OTHER DENDROLOGICAL METHODS

James Chen - Justin Smith Morrill Professor of Law, Michigan State University

A summary of the workshop can be viewed here

Perhaps no task is more prevalent, and more useful, in economic analysis than the prediction of a numerical value through its relationship with other variables. By far the most popular tool for regression is the multivariable generalization of ordinary least squares. Every spreadsheet and statistics package performs linear regression. In addition, the results of linear regression are widely and readily understood. The scale and sign of coefficients, along with p-values and t-statistics, communicate valuable information within the language of economics.

Despite all of these benefits, linear regression may not be the most effective way to describe relationships among economic variables or to predict as yet unseen instances of a phenomenon. Machine learning and artificial intelligence have dramatically expanded the range of tools available in economics. Open-source software and a burgeoning coding community have made these methods more accessible to a broader audience.

This workshop introduces the “dendrological” family of machine-learning methods. These methods, at their root, depend on the use of decision trees to divide data, variable by variable. Statistically informed extensions, such as bagging and pasting, strengthen the explanatory power of decision trees. Ensembles of decision trees harness the Delphic wisdom of potentially thousands of miniature regressors.
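As a rough illustration of these ideas, and not the workshop's own material, the sketch below fits a one-split regression tree (a "stump") by searching every variable and threshold for the split with the lowest squared error, then averages many stumps fitted on resampled data. Sampling with replacement corresponds to bagging; sampling without replacement corresponds to pasting. The function names, the toy data, and the use of stumps instead of full trees are all assumptions made for brevity; in practice a library such as scikit-learn would do this work.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(rows, targets):
    """Search every variable and threshold for the split with the lowest
    total squared error. Returns (feature, threshold, left_mean, right_mean)."""
    best = None
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[j] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[j] > threshold]
            if not left or not right:
                continue  # a split must put data on both sides
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, threshold, mean(left), mean(right))
    if best is None:  # no valid split exists: predict the overall mean
        m = mean(targets)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    j, threshold, left_mean, right_mean = stump
    return left_mean if row[j] <= threshold else right_mean


def fit_ensemble(rows, targets, n_estimators=25, sample_size=None,
                 replace=True, seed=0):
    """replace=True is bagging (bootstrap samples); replace=False is pasting."""
    rng = random.Random(seed)
    n = len(rows)
    k = sample_size or n
    ensemble = []
    for _ in range(n_estimators):
        if replace:
            idx = [rng.randrange(n) for _ in range(k)]
        else:
            idx = rng.sample(range(n), k)
        ensemble.append(
            fit_stump([rows[i] for i in idx], [targets[i] for i in idx])
        )
    return ensemble


def predict_ensemble(ensemble, row):
    """Average the predictions of all the miniature regressors."""
    return mean(predict_stump(s, row) for s in ensemble)


# Hypothetical toy data: the second variable is constant and carries no signal.
rows = [[1, 5], [2, 5], [3, 5], [10, 5], [11, 5], [12, 5]]
targets = [1.0, 1.0, 1.0, 9.0, 9.0, 9.0]

stump = fit_stump(rows, targets)  # splits variable 0 at threshold 3
bagged = fit_ensemble(rows, targets)
pasted = fit_ensemble(rows, targets, sample_size=4, replace=False)
low, high = predict_ensemble(bagged, [2, 5]), predict_ensemble(bagged, [11, 5])
```

A single stump recovers the obvious break in the toy data. The ensemble's prediction is an average over stumps that each saw a slightly different sample, which is what makes ensembles less sensitive to any one observation.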

Trees and forests do lack the overt interpretability of linear regression. Machine-learning packages often compensate for the opacity of these “black-box” techniques by scoring the relative importance of dataset features. This workshop will also cover the theoretical tradeoff between bias and variance, as well as the importance of training, cross-validation, and reserving a holdout dataset for testing.
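The training, cross-validation, and holdout workflow mentioned above can be sketched in a few lines. This is a minimal standard-library illustration under stated assumptions, not the workshop's code: the helper names (`holdout_split`, `k_folds`, `cross_val_mse`), the toy data, and the mean-only baseline model are all hypothetical. Libraries such as scikit-learn provide tested equivalents.

```python
import random
from statistics import mean


def holdout_split(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and reserve a holdout set never used for fitting."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n_rows * test_fraction))
    return idx[n_test:], idx[:n_test]  # (training indices, holdout indices)


def k_folds(indices, k=5):
    """Yield (train, validation) index lists; each index validates exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return mean((a - b) ** 2 for a, b in zip(y_true, y_pred))


def cross_val_mse(fit, predict, rows, targets, indices, k=5):
    """Average validation error over k folds drawn from the training indices."""
    scores = []
    for train, validation in k_folds(indices, k):
        model = fit([rows[i] for i in train], [targets[i] for i in train])
        preds = [predict(model, rows[i]) for i in validation]
        scores.append(mse([targets[i] for i in validation], preds))
    return mean(scores)


# Hypothetical baseline model: always predict the training mean.
def fit_mean(rows, targets):
    return mean(targets)


def predict_mean(model, row):
    return model


# Hypothetical toy data with a single explanatory variable.
rows = [[x] for x in range(20)]
targets = [2.0 * x for x in range(20)]

train_idx, test_idx = holdout_split(len(rows))
cv_score = cross_val_mse(fit_mean, predict_mean, rows, targets, train_idx, k=4)

# Only after model selection is finished is the holdout set scored, once.
final_model = fit_mean([rows[i] for i in train_idx],
                       [targets[i] for i in train_idx])
test_score = mse([targets[i] for i in test_idx],
                 [predict_mean(final_model, rows[i]) for i in test_idx])
```

Any `fit` and `predict` pair with the same signatures, such as a tree or an ensemble of trees, could be passed to `cross_val_mse` in place of the baseline to compare models before the holdout set is touched.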

This workshop presumes that participants have little to no experience with machine learning. We will explore the Boston housing study as an iconic instance of economic regression and a proving ground for machine-learning methods.

Further reading:

Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (O’Reilly, 2nd edition 2019), https://amzn.to/31CDBPc

Thomas W. Miller, Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python (Pearson Education, 2015) (including an overview of the Boston housing study), https://amzn.to/2CebMnx

Andreas C. Müller & Sarah Guido, Introduction to Machine Learning with Python: A Guide for Data Scientists (O’Reilly, 2017), https://amzn.to/3fITkBd

HEALTH, EDUCATION, AND WELFARE: RESULTS FROM, AND TRAINING IN USING, DATA FROM THE CONSUMER EXPENDITURE SURVEYS

U.S. Bureau of Labor Statistics, Washington, DC

Geoffrey Paulin, Ph.D.
Leknath Chalise
Julie Sullivan

The Consumer Expenditure Surveys (CE) are the most detailed source of expenditures collected directly from consumers by the Federal government. They also collect information on demographics, income, assets, and liabilities. The CE program publishes the results of the surveys in both tabular and microdata formats for free download by economists at all levels of experience: from students (undergraduate and graduate) to long-time professionals. Staff of the CE program present current research, followed by training in use of the data.

Attendees will receive an introduction to the CE and its component surveys, the Interview and Diary Surveys. This includes a description of the CE mission, survey methods, and free data provided for public use, both tables and microdata.

Then, following the examples of research in progress, attendees will take a tour of the CE website to learn the location of tables, microdata files, and published articles using CE data. This will be followed by instruction in the use of an online tool that is particularly useful for those who want to track trends in expenditures for general or specific items in either table or chart formats. Participation in this session does not require any special skills, software, or detailed knowledge beyond an understanding of basic economic terms and statistical techniques.

Research in progress to be presented in this session includes: “Let’s See the Charts: Trends in Healthcare Expenditures from 2004-2018”, “Comparing Characteristics and Selected Expenditures of Dual and Single Income Households with Children”, “How Student Loans Relate to Income, Expenditures, Asset Accumulation, and Other Liabilities: A Matter of Life in Debt?”, and “CE Data and Tools: How to Find and Use Them”.

To learn more about the CE in advance, please visit the CE homepage (https://www.bls.gov/cex/).

Links to tables (https://www.bls.gov/cex/tables.htm) and microdata (https://www.bls.gov/cex/pumd.htm) are available on this page.

Examples of published research using or describing CE data include:

Monthly Labor Review:

Beyond the Numbers:

Spotlight on Statistics: