Gather all formal demographic fertility and population projection methods into a single open source R package for academic, policy, and popular use

Longer description of your proposed project

The ongoing global fertility collapse has generated remarkably little attention. A handful of new Twitter accounts and blog posts opining on the issue, a $10 million donation toward fertility research from Elon Musk, and the “Natal Conference” held in Austin this year simply do not do justice to the social consequences of the rapid aging and eventual vanishing of Europe, South Korea, Japan, and China. Leading news magazines like The Atlantic cover fertility perhaps once every six months (a recent example, on how falling fertility is eroding cousin relationships, can be found here: https://www.theatlantic.com/family/archive/2023/12/cousin-relationships-fertility-rate/676892/).

Part of the gap between events and public attention stems from the limited funding given to social scientists who study fertility compared with those who study mortality. For instance, within the National Institutes of Health, the National Institute on Aging receives over 200% more funding than the National Institute of Child Health and Human Development. Even this comparison badly understates the gap, because the overwhelming majority of disease-specific funding, e.g. from the National Cancer Institute or the National Heart, Lung, and Blood Institute, is also targeted at the elderly. A ballpark calculation (available upon request) suggests that over 1000% more public funding goes to the study of population aging and the elderly than to fertility. As a result of this neglect, low-hanging, high-impact fruit abounds.

What are the predictors of fertility at the group and individual level? How have these predictors evolved over time? Which areas of America will depopulate the fastest between now and 2050? Of Europe? South Korea? Japan? Where will schools need to consolidate? How much migration will be needed to replicate today’s age and sex structure? Will sub-Saharan African countries, most of which have only recently begun their fertility decline, be the birthplace of 20% of humanity in 2050? Rather than answer these questions directly, I am applying to ACX Grants to develop a software toolkit for demographic estimation, projection, and forecasting (working title: “DemoProj”). It will focus on fertility analysis and small area forecasting in R (2 million global users, concentrated in the statistics community) and, if funding permits, Python (8 million global users, concentrated in web development and data analytics).

Population projections are used by policy researchers in the actuarial, military, climate, and health sciences; economics, epidemiology, sociology, and education; and environmental, migration, and urban studies. Despite their centrality to public administration and the future of humanity, there is no standard demographic software for population projections or fertility analysis. Instead, municipal planners, civil engineers, national Census bureaus, the U.N., university departments, school districts, defense, healthcare, and labor force planners, and environmental organizations each conduct their own population projections using ad hoc assumptions and computational routines. As a result, findings are often not reproducible across research reports. A flexible, public-facing, open-source R package enabling reproducible analysis would facilitate all of the routine policy analytics work described above. It could also greatly enhance public discourse on fertility by allowing anyone with a working knowledge of scripting to investigate demographic questions on their own. The package will include national and small area population projections, a wrapper for all fertility-related data APIs, and classic model fertility analysis routines.

Why has this not been done already? In fact, it has been attempted. The U.N. Population Division recognized both the gap I have described and the global benefits of filling it, and in 2019 it began developing a “DemoTools” package in R. DemoTools was originally intended to include all of the fertility and population projection tools used by demographers and policy analysts. However, the COVID-19 pandemic has indefinitely paused development of its fertility analysis, population projection, and forecasting components in favor of work on mortality. Other demographic software for population projection exists, but it tends to be inaccessible to the public, incomplete, or inappropriate for policy use. SAS, a commercial statistical programming language, contains extensive demographic estimation and projection libraries. It costs thousands of dollars per year, however, putting it out of reach for independent researchers, those in the developing world, and most students. Other R packages focus on animal populations or on specific statistical modeling approaches, e.g. bayesPop. This project would fill an important need and make it easier to answer questions of high importance for the future of humanity.

Describe why you think you're qualified to work on this

The U.S. produces only a small number of demography Ph.D.s each year (around 10), from Princeton University, U.C. Berkeley, Penn State, and the University of Pennsylvania. Demography Ph.D. students spend about two years learning the mathematical core of formal demography and statistics. Only then do they conduct original research and enter the academic or public-sector job market. I am one of these demographers in training and have reached the “All But Dissertation” stage of my studies. I have published peer-reviewed work in Demographic Research, Econ Journal Watch, the International Review of Financial Analysis, and the Journal of Human Capital. I also have papers under review at the American Sociological Review, Social Forces, and RSF: The Russell Sage Foundation Journal of the Social Sciences. My collaborators include:

- Xi Song of the University of Pennsylvania, a pioneering researcher in multigenerational inequality
- UCLA sociologist Jennie Brand
- Harvard economist George Borjas
- asset pricing expert Frank Fabozzi
- Jason Anastasopoulos, Assistant Professor of Public Administration and Policy

Beyond the knowledge and time to do this project, I bring several additional advantages to the table.

First, before my graduate studies I worked at the U.S. Census Bureau, and I am familiar with both U.S. and U.N. demographic and fertility data. I know the U.N. DemoTools team and can therefore ensure that the proposed software will be compatible with emerging global standards for the analysis of demographic data. As an individual without a particular mandate, I can also focus on high-value development opportunities that generally fall outside the U.N.’s purview, e.g. small area population projections and forecasting. If the project were funded and the package developed, my regular attendance at academic and industry demography conferences would advertise it naturally. These include the Population Association of America (PAA) annual conference and the PAA Applied Demography Conference. I am currently a fellow at the International Max Planck Institute for Population Health and Data Science, so I also know many members of the next generation of Europe-based demographers.

Second, during my graduate studies, my classmates and I wrote many R scripts that enable analysts to produce human population and small area population projections. Much of the code that would make up the core of the proposed package has already been written through coursework, paper drafts, and attempts to understand course material. This code already exists, albeit scattered across multiple folders and files. Building an R package from it now would therefore be substantially cheaper than developing one from scratch in the distant future. Academic researchers in demography (and social science generally) have not been especially interested in developing open-source software that would bring their methods into wider use.

In part, this is because two groups have historically placed little value on public goods: foundations supporting the global public health community (read: the Bill & Melinda Gates Foundation) and the tenure committees that evaluate academic social scientists. Unlike an early-stage graduate student or a tenure-track professor, I am at a point in my career where I know a number of valuable demographic methods, both well known and esoteric. I also have the time and energy to push them into the public domain for wider use. In this way, I hope to serve as a bridge between two generations. The older generation of demographers made little effort to maintain their code or data. The new generation of public-facing social scientists embraces open-science methods.

My classmate Eugenio Paglino is also likely to assist with this project if it is funded. He is a fellow doctoral candidate in demography and sociology and a leader in the rapidly growing field of Bayesian demography. He has published in Science Advances, JAMA Network Open, PLOS One, PLOS Global Public Health, Population Research and Policy Review, PNAS, and Demographic Research.

Other ways I can learn about you

My Twitter account is: https://twitter.com/regressiondisco

My Google Scholar account is: https://scholar.google.com/citations?hl=en&user=Eem7q2sAAAAJ&view_op=list_works&sortby=pubdate

A long interview with me can be found here:

https://www.youtube.com/watch?v=Z45AKuoTwJM

Unfortunately, it is the norm among sociologists not to upload preprints, but I have several unpublished working papers that I am happy to share upon request.

How much money do you need?

With $20,000, I could build out a “minimum viable package” ready for submission to the Comprehensive R Archive Network (CRAN). It would contain small area projections (based on cohort change ratios) and fertility projections. I have set a minimum of $9,500, the least I would need to give this project my attention in the coming summer.
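For readers unfamiliar with cohort change ratios, the sketch below illustrates the idea behind the Hamilton-Perry method, a standard small area technique built on them. Each age group's size at one census is divided by the size of the same cohort, five years younger, at the previous census. That ratio is carried forward to project the next five years. The youngest age group, which has no earlier cohort, is filled in with a child-woman ratio.

It is written in Python for illustration only. The function names (`cohort_change_ratios`, `hamilton_perry`) are hypothetical and are not part of DemoProj or DemoTools. The sketch assumes:

- a single-sex (female) population in 5-year age groups, with an open-ended last group
- censuses exactly five years apart
- women aged 15-44 (indices 3 to 8 by default) as the childbearing population

```python
from typing import List, Sequence

# Hypothetical sketch of the Hamilton-Perry cohort change ratio method.
# Assumptions: one sex (female) only, 5-year age groups (0-4, 5-9, ...),
# an open-ended last group (e.g. 85+), and censuses exactly 5 years apart.


def cohort_change_ratios(pop_prev: Sequence[float],
                         pop_curr: Sequence[float]) -> List[float]:
    """Return CCRs by age group; index 0 (ages 0-4) has no prior cohort."""
    n = len(pop_curr)
    if len(pop_prev) != n or n < 3:
        raise ValueError("need two censuses with the same >= 3 age groups")
    ccr = [float("nan")] * n
    for i in range(1, n - 1):
        # The cohort aged i-1 at the previous census is aged i now.
        ccr[i] = pop_curr[i] / pop_prev[i - 1]
    # Open-ended group: survivors of the last two groups at the prior census.
    ccr[n - 1] = pop_curr[n - 1] / (pop_prev[n - 2] + pop_prev[n - 1])
    return ccr


def hamilton_perry(pop_prev: Sequence[float], pop_curr: Sequence[float],
                   cb_first: int = 3, cb_last: int = 8) -> List[float]:
    """Project the population 5 years past the current census.

    cb_first..cb_last (inclusive) index the childbearing ages; the defaults
    3..8 correspond to women aged 15-44 in 5-year groups.
    """
    n = len(pop_curr)
    if not 1 <= cb_first <= cb_last <= n - 1:
        raise ValueError("childbearing-age indices out of range")
    ccr = cohort_change_ratios(pop_prev, pop_curr)
    proj = [0.0] * n
    for i in range(1, n - 1):
        # Age the current cohort forward one group using its CCR.
        proj[i] = ccr[i] * pop_curr[i - 1]
    proj[n - 1] = ccr[n - 1] * (pop_curr[n - 2] + pop_curr[n - 1])
    # Youngest group: current child-woman ratio times projected women 15-44.
    cwr = pop_curr[0] / sum(pop_curr[cb_first:cb_last + 1])
    proj[0] = cwr * sum(proj[cb_first:cb_last + 1])
    return proj


if __name__ == "__main__":
    # Toy five-group population; childbearing ages set to groups 1-2.
    prev = [100, 100, 100, 100, 100]
    curr = [110, 90, 95, 98, 150]
    print([round(x, 2) for x in hamilton_perry(prev, curr, 1, 2)])
```

With real census data one would pass 18 groups (0-4 through 85+) and keep the default childbearing indices. A fuller implementation would likely also handle both sexes, unequal census intervals, and controlling small area results to an independent national projection, all of which this sketch omits.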

I have extensive experience coding in R and have contributed to R packages (including DemoTools itself). As a result, I recognize the chasm between small scripts written for problem sets and textbook examples and the world of messy, unpredictable real data. There are a number of challenges in going from a functional and useful set of scripts to a publicly available R package. I am seeking funding in part to work with a professional software engineer to clean, harmonize, generalize, and standardize the code. The goal is a package that can be maintained with minimal future burden on the social science software community. For instance, code, data, examples, and functions should be written in a unified syntax that complies with CRAN package requirements. Every function should have clean documentation and multiple usage examples. Additional costs include website development and post-upload testing.

With $50,000, I believe I could replicate the package in Python and Stata, which I expect would more than double DemoProj's audience.

Ultimately, I see DemoProj as one step toward a comprehensive, unified set of packages for demographic analysis (working title: “demoverse”, after the beloved “tidyverse” collection of R packages). The demoverse would reduce most of the tedium of demographic analysis to a few lines of code. Independent analysts and scholars could then focus on the substance of their research (“what they want to say”) rather than tedious programming (“how to say it”). Completing the demoverse and keeping it on CRAN would probably cost considerably more. It could, however, save governments and policy analysts around the world millions of dollars each.

Estimate your probability of succeeding if you get the amount of money you asked for

80%+ for the minimum viable package. 60% for the stretch goal, because I have not developed in Python for a long time and would depend on external software engineers. Still, I think success is more likely than not. The code and data are routine; the main challenges are organizational.