Pre-Mortem: Metaculus' AI Progress Tournament (...Sort Of)
In my most recent post I laid out my reasons for wanting to try to forecast AI progress and painted a rough picture of my next steps. Since then I’ve poked around the space and settled on an initial target for my efforts: Metaculus’ Forecasting AI Progress Tournament. The only problem is that the tournament ended about two years ago.
Luckily, 23 questions from the pool are still open on the platform because their resolution dates fall in 2025 or so. While there’s no possibility of prize money or accolades, I think this is still the perfect set of initial AI questions for me to dig into. They were curated by experts in the space as likely to be relevant, and they mostly concern individual, very tangible trends extending into the near-term future. This contrasts with questions like the chance of an AI catastrophe, which I feel mostly require a particular worldview (or a probability distribution across worldviews). I would much rather start with the ground-level questions so I can build an accurate model of the current state of AI progress before I start trying to build my own AI worldview. The research required to forecast these questions well should do just that.
Even though there are already many forecasts and comments on these aging questions, I want to artificially enforce a structure similar to the x-risk tournament I participated in previously. I really value thinking through a question, doing my own research, and coming to my own conclusion before seeing the work of others and then deciding whether or not to update my original forecast. This is doubly true when my objective is to learn rather than to produce the most accurate forecast, as it is here.
Accordingly, my planned approach for each question is to…
Review the question and its resolution criteria. I can hide the community forecast to avoid biasing myself.
Do my own research on the factors I think are critical to the question to see what information I can find. I’m aware of some organizations and individuals creating resources that will be useful, but I fully expect to discover many more through this effort.
Generate my initial forecast with an associated rationale.
Review the community prediction and the associated rationales/comments. How different is the consensus from where I ended up? Can I tell why? Where individuals disagreed with me, was it on the basis of intuitions or of different information? What relevant information sources did people find that I missed?
Generate my “final” forecast with an associated rationale. This is what I plan to publish here, and I’ll try to make it explicit where my independent reasoning changed based on what I learned from the other forecasters. I will also post this forecast to Metaculus. I don’t think it will do me much good since these questions don’t even resolve until 2025, but if anyone else is referencing them for something, maybe my rationales will be useful. It will also only take me ten seconds of copying and pasting once I’ve gotten to this point.
I expect to spend an average of 1-2 hours per question, though I’ll spend more if I feel I’m still learning a lot from the associated research at that point. The above protocol is also liable to change as I see how it actually plays out with the first few questions.
I’ve also sequenced the full list of 23 questions into the order I expect to make the most sense as I work through them. This ordering is fuzzy and based mainly on intuition, but the reasoning goes roughly: understand the state of research on different topics and trends, then forecast performance, then forecast commercial impact. Within that, I’m grouping highly similar questions together, with nearer-term timelines first. The question topics, in the order I plan to forecast them, are:
Number of "Natural Language Processing" Papers
Number of "Few-Shot Learning" Papers
Number of "Reinforcement Learning" Papers (2021-2027)
Number of "Reinforcement Learning" Papers (2021-2030)
Number of "Multi-Modal Learning" Papers
Number of "AI Safety, Interpretability or Explainability" Papers (2021-2026)
Number of "AI Safety, Interpretability or Explainability" Papers (2021-2030)
Supercomputer Performance
Total Performance of Top 500 Supercomputers
Maximum Training Compute
Industrial Production Index for Semiconductors and PCBs (2026)
Industrial Production Index for Semiconductors and PCBs (2030)
Image Classification Performance (miniImageNet)
Image Classification Performance (Custom Index)
Language Model Performance
Object Detection Performance
Number of Computer and Information Research Scientists in the US
Wage for Computer and Information Research Scientists in the US
Alphabet's Market Capitalization (2026)
Alphabet's Market Capitalization (2030)
IGM Price
Software and IT contribution to US GDP (Q3 2030)
Software and IT contribution to US GDP (Q4 2030)
I will link to the actual Metaculus questions when I publish my rationales. As with my protocol for answering each question, this sequencing is liable to change once I get to work and find out all my intuitions are wrong. Such is the nature of a pre-mortem.
At the pace I expect, I’ll get through maybe 5 questions per week, so this task will keep me busy for almost 5 weeks of my budgeted time. Looking at the question topics above, that seems well worth it for a decent foundation in the current state of AI and its impacts, as well as opinions about where it’s headed. I’ll be much more confident digging into the subsequent, juicier questions about catastrophic and existential AI risks that until now I haven’t felt qualified to really attempt.
I’ll publish my rationales here on this blog in weekly bundles, so they’ll be naturally grouped similarly to the order I laid out above. Once I’m done, I plan to write a post-mortem which will look back at this plan for how I wanted things to go and compare that to what actually happened. By that time, maybe I’ll even have a picture of my next steps in AI forecasting.