Forecast: AI Progress, Batch 1 - Academic Papers
You can find my pre-mortem for these forecasts here. This post is the first of what I expect to be three batches of forecasts for the remaining open questions from the “Forecasting AI Progress” tournament by Metaculus. I published these forecasts on Metaculus as I completed them, so the first part of this post will be those forecasts copied here along with a link to the Metaculus question they’re answering. They’re in the order that I wrote them, and they tend to reference each other so hopefully there’s not too much redundancy in this format.
I have some initial takeaways just from forecasting this set of questions, but I think I’ll save them until I’ve forecasted all the questions and publish a full post-mortem so that I can consider it all in context.
As always, feedback or discussion on these forecasts is most welcome.
How many Natural Language Processing e-prints will be published on arXiv over the 2021-01-14 to 2030-01-14 period?
Question Link - My Forecast Link
A linear fit on the annual publication counts from 2017-2022 projects about 103k total papers in this time period, with about 39k already published.
Reasons the future might differ from the pretty consistent linear increase we've seen so far:
A different term or set of terms comes to mean the kind of research we currently expect to call NLP
Maybe a 2% chance of a 50% reduction in publication volume?
arXiv as a platform changes in popularity / shuts down
Looks like it launched in 1991 and has had the same leadership since 2011. Going to ignore this risk.
Everyone dies
My previous forecasts have this at 0.3% by 2030. Going to ignore this risk
Some sort of relevant research moratorium or taboo. If "AI" comes to dominate NLP, as it seems to be doing, this could be relevant.
Maybe 5% chance of a 50% reduction?
A discontinuity in number of researchers or the speed at which they can publish.
I don't think it's plausible we'd see a large discontinuity in the number of researchers in this time period, but AI tools could greatly enhance publication speed both via topic generation and the actual writing process for papers.
This random summary of a survey by Nature has researchers (across all fields?) spending about 25% of their time on writing.
A 25% boost in the publication speed of researchers due to AI tools in the next few years seems very reasonable to me, so let's call this roughly a 12% boost in publication volume. I expect this effect to dominate any fluctuations in topic popularity or researcher population from the already observed trend.
If NLP is "solved" in some sense or at least becomes boring, this could significantly reduce publication volume (insight from reading the comments)
It doesn't seem obvious to me that this is possible. Chess is cited in the comments as precedent, but this seems fundamentally different, as the same type of research is still going on, just on different games. NLP is a broader topic, and even if the application side of things is totally "solved" by black box LLMs in the near future, shouldn't there still be research into finding its limits and trying to understand how it works? I'm going to disregard this impact for now as I'm not confident in either magnitude or direction.
Adjusting for the above factors makes me want to target a median of ~112k. There are already ~39k publications in this time period, so I want to keep my left tail tight. I think there's a possibility that AI enhancement of publication efficiency greatly increases volume beyond what I've described above, so I'm ok with a longer right tail. It seems like I'm shifted slightly higher than the community prediction, but I'm sure most of those predictions were made in 2021.
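The arithmetic behind that ~112k target can be sketched as below. This is a rough reconstruction, assuming the adjustments are simply added together (the 12% efficiency boost minus the probability-weighted reductions); the function name and the additive combination are my own illustration and not something stated explicitly above.

```python
def net_adjustment(boost, risks):
    """First-order additive adjustment: boost minus the sum of p * reduction."""
    expected_loss = sum(p * reduction for p, reduction in risks)
    return boost - expected_loss

# Inputs taken from the reasoning above.
trend_total = 103_000            # linear-fit total for the question window
risks = [
    (0.02, 0.50),                # NLP gets renamed: 2% chance of a 50% reduction
    (0.05, 0.50),                # moratorium/taboo: 5% chance of a 50% reduction
]
adjustment = net_adjustment(0.12, risks)   # 12% boost from AI writing tools
median_target = trend_total * (1 + adjustment)

print(f"net adjustment: {adjustment:+.1%}")
print(f"target median:  {median_target:,.0f}")
```

Rounded to the nearest thousand, this lands on the ~112k target median.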
How many e-prints on Few-Shot Learning will be published on arXiv over the 2020-01-01 to 2027-01-01 period?
Question Link - My Forecast Link
Looks like the sliders aren't really set up to accommodate my forecast... so this explanation should be more valuable than those numbers.
I'm adapting my approach from forecasting NLP papers, but a 2nd degree polynomial fit the 2017-2022 data much better than a linear fit. Extrapolating this through the date range gives a total of about 23,000 papers, with more than 6,400 already published (the lower bound of the distribution).
I believe that the possibility of LLMs increasing the efficiency of publication (as I described in my NLP papers forecast) holds well across topics, so again I'll increase my forecast by ~12% to account for this.
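For anyone wanting to reproduce the linear vs. 2nd degree comparison, here is a minimal sketch of the idea in plain Python. The annual counts in it are HYPOTHETICAL placeholders chosen to curve upward, not the real arXiv numbers, and the helper names (polyfit, predict, r_squared) are my own; a library routine such as NumPy's polyfit would do the same job.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    n = degree + 1
    # Normal equations (V^T V) c = V^T y, where V is the Vandermonde matrix.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs

def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))

def r_squared(xs, ys, coeffs):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# HYPOTHETICAL annual counts for 2017-2022 (x = years since 2017).
xs = [0, 1, 2, 3, 4, 5]
ys = [100, 180, 300, 460, 660, 900]

linear = polyfit(xs, ys, 1)
quadratic = polyfit(xs, ys, 2)
print("linear R^2:   ", r_squared(xs, ys, linear))
print("quadratic R^2:", r_squared(xs, ys, quadratic))
```

Summing predict(quadratic, x) over the years in the question window, and swapping in the actual counts for years that have already happened, would give the extrapolated total.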
Relative to NLP, I think this search term is more likely to both:
Be affected by some sort of AI research moratorium or taboo
Fall off in popularity because the dominant AI paradigm has shifted, or this particular technique just isn't useful anymore.
Together I give maybe a 20% chance of a 50% reduction in publication volume, or a 10% reduction in my forecast. Combined with the ~12% efficiency boost, this nets out to a 2% increase, or an expected 23,500 publications.
Reading through the comments, it seems like the large gap between my forecast and the community prediction is mostly from the other predictions being made a while ago when there was more time for the increasing popularity of this term to stall out. There is also an anecdote from 2020 about arXiv being outcompeted by other platforms which I found interesting. I then found data of it surging in popularity in 2021, and so decided not to update off of this.
How many Reinforcement Learning e-prints will be published on arXiv over the 2021-01-14 to 2027-01-01 period?
Question Link - My Forecast Link
Drawing from the reasoning I used in forecasting NLP and Few-Shot Learning papers, a linear trend for annual paper totals from 2017-2022 provides a pretty good fit (r^2=0.969). Extrapolating this gives a total of ~36,600 papers by 2027 with about 15,000 already published.
Reinforcement learning as a concept seems more likely than NLP to someday decrease in popularity/relevance, but less likely than Few-Shot Learning. Few-Shot Learning represents a more specific technique within the reinforcement learning "paradigm". In the Few-Shot Learning papers forecast I gave a 20% chance of a 50% reduction in publication volume over this time period, and in the NLP papers forecast about a 7% chance of a 50% reduction over a slightly longer time period (to 2030). For this forecast, I'll estimate a 10% chance of a 50% reduction (-5% to forecasted volume of papers).
My previous arguments about AI increasing the efficiency of paper writing/publication still hold, which I approximated as a 12% increase in publication volume over this same time period. This nets out to +7% on the forecast, giving ~39,000 as the target median, with a left tail that should bottom out at about 15,000 and a long right tail.
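One way to picture the shape I'm aiming for (a hard floor at the already-published count, the median at the target, and a long right tail) is a shifted lognormal. To be clear, this is only an illustrative sketch: the distribution family, the sigma value, and the function name are all assumptions of mine, and the Metaculus sliders work differently.

```python
import math
import random
import statistics

def sample_forecast(floor, median, sigma, n, seed=0):
    """Draw n totals from floor + lognormal, placing the overall median at `median`."""
    rng = random.Random(seed)
    mu = math.log(median - floor)  # the median of a lognormal is exp(mu)
    return [floor + rng.lognormvariate(mu, sigma) for _ in range(n)]

# floor and median come from the forecast above; sigma is a HYPOTHETICAL spread.
samples = sample_forecast(floor=15_000, median=39_000, sigma=0.5, n=20_000)

print("min:   ", round(min(samples)))
print("median:", round(statistics.median(samples)))
print("mean:  ", round(statistics.mean(samples)))
```

Every sample sits above the 15,000 floor, and the mean comes out above the median, which is the right skew described above.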
How many Reinforcement Learning e-prints will be published on arXiv over the 2020-12-14 to 2031-01-01 period?
Question Link - My Forecast Link
The slider settings make it practically impossible to enter a forecast in line with my reasoning.
All the same reasoning as my forecast for the shorter time period, found here.
Same linear fit on this longer time horizon predicts a total of ~68,200 papers with about 15,000 already published.
The longer time period in question means there is more opportunity for things to diverge from this predicted trend. I have been considering both the positive impacts of AI efficiency gains and the possible negative impacts of an AI research moratorium/taboo or a term becoming less relevant to the research being done. I'm going to estimate that the increased time period has a symmetrical effect on both these positive and negative possibilities and keep my net adjustment at +7% of my estimated trend to get an expected 73,000 papers, with a left tail that bottoms out above 15,000 and a right tail that extends far to the right.
How many e-prints on multi-modal learning will be published on ArXiv over the 2021-02-14 to 2031-02-14 period?
Question Link - My Forecast Link
This forecast is using similar reasoning to my forecasts of NLP, Few-Shot Learning, and Reinforcement Learning publications.
Fitting a 2nd degree polynomial to the annual publication totals from 2017-2022 gives an R^2 of 0.987 (vs. 0.956 for a linear fit). Using this trend, the expected total number of publications in the specified time period is about 8,100 with about 1,050 having already been published.
Due to the combined risks of AI Research moratoriums/taboos and just general obsolescence of particular research terms, I had given few-shot learning a 20% chance of a 50% reduction in publication volume, NLP about a 7% chance of a 50% reduction, and reinforcement learning a 10% chance of a 50% reduction.
I associate multi-modal learning with the trend of increasing generality of AI models, and this seems like a more robust concept/trend than few-shot learning and maybe close to reinforcement learning. I'm estimating a 12% chance of a 50% reduction (-6% impact on forecast).
My previous arguments about AI increasing the efficiency of paper writing/publication still hold, which I approximated as a 12% increase in publication volume over this same period.
These two factors net out to +6% in publication volume for a median of about 8,600, a left tail that bottoms out above 1,050, and a longer right tail.
How many e-prints on AI Safety, Interpretability or Explainability will be published on arXiv from 2021 through 2026?
Question Link - My Forecast Link
I'm using the same structure of forecast/reasoning as I applied to the similar NLP, Few-Shot Learning, Reinforcement Learning, and Multi-Modal Learning questions. See those for more detail on assumptions.
A linear fit on 2017-2022 annual publication data gives an R^2 of 0.983 (same as a 2nd degree polynomial, but trend looks more linear to me). This trend predicts a total of 5800 papers in this time period with about 2300 already published.
Relative to the other paper topics I've forecasted, I expect this to be much less vulnerable to a research moratorium/taboo, and more robust to obsolescence. I'm estimating a 2% chance of a 50% reduction in publication volume (-1% to forecast).
My same arguments about LLMs increasing publication efficiency hold, updating my forecast upwards by 12%.
A net +11% update gives a target median of 6400 with a left tail bottoming out around 2300 and a long right tail.
How many e-prints on AI Safety, interpretability or explainability will be published on ArXiv over the 2021-02-14 to 2031-02-14 period?
Question Link - My Forecast Link
Same reasoning and updates as my forecast for the same question with a shorter time period.
Extrapolating the same trendline through this longer timeline gives a total of 11500 papers with 2300 already published. The same net +11% update gives a target median of 12800 with a left tail bottoming out around 2300 and a long right tail.