Application: Connect Metaculus to the EA Forum to Incentivize Better Research
A hypothetical case study in how forecasting accuracy and citations can be used to reward the most useful research
This is an Application post, one of many post categories found on this blog. To see what to expect from posts in each category, along with categorized lists of past posts, check out the Table of Posts. As always, feedback or collaboration of any kind is greatly appreciated either here in the comments or privately on Twitter (@Damien_Laird).
Summary
I describe, at a high level, a hypothetical feature connecting Metaculus and the EA Forum so that users can see which pieces of written content influenced accurate forecasts. This would make it possible to reward the most useful research, which would in turn enable higher-quality, more accurate forecasts. I am not in a position to implement any aspect of this proposal, but I’m surfacing the concept for community consideration and feedback because I personally believe it would be both extremely valuable and feasible.
Introduction
I believe that improving the creation and curation of relevant knowledge in forms useful to generalist forecasters is a key lever in improving our ability to forecast Global Catastrophic Risks (GCRs), as I describe here. Furthermore, I think current incentive schemes for rewarding research are pretty weak. Within academia, the focus ends up on the number of citations and the number of publications. On the internet, rewards typically take the form of upvotes/karma or engagement. Both have a circular aspect to them, where the research that other researchers think is useful gets rewarded. This effect can spiral into traps like extensive and unnecessary jargon, negative-sum competition in the form of adversarial peer review, or simply a lack of recognition and collaboration. Online research targeted at a more general audience, combined with deliberate norms, can circumvent some of these failure modes. Then the rewards shift from what researchers think is good research to what a general audience thinks is good research.
But was that research actually useful? Did it make our model of the world more or less accurate? Maybe there was a critical error in the calculations (or just fraudulent data underlying the analysis), such that an otherwise beautifully polished paper is actually moving our knowledge base in the wrong direction. To some extent errors can be caught by careful readers, or we can attempt to replicate studies to confirm their findings, but not all research lends itself to empirical validation. Less dramatically, research can be chock-full of jargon, otherwise poorly written, or about an irrelevant topic, and still end up functionally useless.
What if we connected research to forecasting? Forecasts, especially in prediction polling environments, often have rationales attached that explain the reasoning and sources behind a particular prediction. If those rationales contained citations for their sources, you could actually track over time which sources of information tended to be used in more accurate forecasts. If those sources themselves contained legible citations of their sources, this information could flow along the citation chains, and you could see which scaffolds of knowledge were being built in a direction that improved our model of reality and which weren’t. Where research negatively impacted accuracy, you’d have reason to look for errors in data or reasoning that could produce this effect. In such a system, it would also be trivial to show how the research of a given author/publisher stands up relative to others, allowing us to place some trust in their work even before it has been used in forecasts.
Furthermore, these forecast rationales are really just additional pieces of research that could themselves be cited, further enriching the relevant knowledge base.
Reality
To make this new research paradigm a reality, we would need to implement the following:
1. A lasting link between forecasts and their informational sources
2. A lasting link between sources and their sources
3. A reward that flows along these links to sources that are cited in accurate forecasts
Metaculus seems like the best existing platform for implementing a link between forecasts and citations. As I mention above, I think this almost has to be implemented in the context of prediction polling rather than prediction markets. In prediction polling, forecasters have to be explicit about their confidence level in a claim, and are free to set this confidence level however they want. In a prediction market, forecaster confidence is somewhat encoded in the volume of a given trade, but this can be confounded by a lack of available funds or low trading skill. I also believe the incentive systems of prediction polling platforms can be more accepting of information sharing between participants, a critical prerequisite if we want forecasters to thoroughly and honestly explain their reasoning at the time of a forecast. The default for prediction markets is information hoarding to maintain a competitive edge over other traders in the zero-sum environment. It’s possible that this is not intrinsic to the format and that a play-money market system like Manifold could get around it, but if so, I haven’t seen a proposed mechanism for doing so in my review of the research. Regardless, this seems like a harder journey than the one Metaculus would have to make, so I’m limiting my hypothetical implementation details to Metaculus.
Metaculus already has a fairly robust comment section on each forecastable question. Users are prompted to share a rationale in this section with each of their forecasts, and there is even a simple markdown language for formatting these rationales. A small but critical gap is that there is no markdown feature for citations/footnotes. Implementing this well should both encourage citations and make them easy to track, and I expect doing so is a prerequisite to the rest of this proposal.
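To make the tracking half of this concrete, here is a minimal sketch of how citations could be pulled out of a rationale, assuming a hypothetical footnote syntax modeled on common markdown footnote extensions (the `[^1]` / `[^1]: <url>` convention). Metaculus has nothing like this today, and the function name and URL are mine:

```python
import re

# Hypothetical footnote definitions of the form "[^label]: <url>",
# borrowed from common markdown footnote extensions.
FOOTNOTE_DEF = re.compile(r"^\[\^(\w+)\]:\s*(\S+)\s*$", re.MULTILINE)

def extract_citations(rationale_markdown: str) -> list[str]:
    """Return every URL declared as a footnote in a forecast rationale."""
    return [url for _label, url in FOOTNOTE_DEF.findall(rationale_markdown)]

rationale = """I estimate 70%, mostly from the base rates in this analysis.[^1]

[^1]: https://forum.effectivealtruism.org/posts/abc123/example-post
"""
print(extract_citations(rationale))
# ['https://forum.effectivealtruism.org/posts/abc123/example-post']
```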
If this is done, Metaculus forecasters would have the ability to cite sources in their rationales. These citations would presumably take the form of links to webpages… but how do we associate those with particular researchers? What if multiple links lead to the same study? Attribution is necessary to score useful research; it is straightforward for citations of other forecast rationales hosted on Metaculus, but otherwise seems painful.
My initial idea to solve this problem was to make Metaculus more friendly for posting longer-form research. There are already discussion threads and forums implemented on the site in some cases, though these are really just forecasting questions without anything to forecast, and they use the same commenting mechanisms. This means that almost all of the written content on Metaculus is hidden under individual questions/discussions/forums and can’t be easily discovered or searched. Obviously this could be remedied, but it seems like a big lift that would greatly increase the complexity of the site.
Eventually, I realized that I was trying to recreate the EA Forum. The forum is already focused almost exclusively on the kinds of information that I expect forecasters to find useful, has pre-existing norms for clear communication, honesty, and reasoning transparency, and even has a space dedicated to the topic of forecasting. In fact, when I participated in a tournament that heavily incentivized well-cited forecasting rationales, I found many posts on the EA Forum worth using. It even has a very strong set of formatting options, including easy citations.
Instead of trying to build Metaculus into something it currently isn’t, I’m proposing a novel connection between Metaculus and the EA Forum.
This simplifies the attribution problem for Metaculus. It only needs to worry about two kinds of links: those that point to comments on Metaculus, and those that point to posts on the EA Forum. These both already have unique identifiers and are intrinsically connected to individual authors. On the EA Forum, tracking the citation of forum posts within other forum posts should be straightforward.
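As a rough illustration of how that attribution step might work, here is a sketch that maps a cited URL to a platform and a unique identifier. The URL shapes below are assumptions for illustration, not a description of either site’s actual routing:

```python
import re
from urllib.parse import urlparse

# Assumed URL shapes (illustrative only):
#   EA Forum post:     https://forum.effectivealtruism.org/posts/<post_id>/<slug>
#   Metaculus comment: https://www.metaculus.com/questions/<q_id>/<slug>/#comment-<comment_id>

def attribute(url: str):
    """Map a cited URL to (platform, unique_id), or None if untracked."""
    parsed = urlparse(url)
    if parsed.netloc == "forum.effectivealtruism.org":
        match = re.match(r"/posts/([^/]+)", parsed.path)
        if match:
            return ("ea_forum_post", match.group(1))
    if parsed.netloc.endswith("metaculus.com"):
        match = re.match(r"comment-(\d+)", parsed.fragment)
        if match:
            return ("metaculus_comment", match.group(1))
    return None  # outside the two platforms: no attribution, no points

print(attribute("https://forum.effectivealtruism.org/posts/abc123/example-post"))
# ('ea_forum_post', 'abc123')
```

Because both kinds of identifiers already resolve to a single author on their home platform, everything downstream (scoring, user profiles) could key off these tuples.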
All of the above should cover items #1 and #2 from my initial checklist in this section. We now have a way (that seems quite technically feasible to me, a layman on the outside) to capture the linkages between forecasts and their sources and between sources and their sources, given the self-imposed constraint that we only consider content posted on Metaculus or the EA Forum. This seems like a reasonable outcome to me, given that the EA Forum already encourages link posting from other sources with summaries.
This leaves an open question for #3, the actual reward that flows along citation chains to both indicate and incentivize research that is used in accurate forecasts.
It seems obvious to me that this should primarily be some form of “internet points” analogous to the existing “points” currency awarded for accuracy on Metaculus or the “karma” rewarded for upvotes on the EA Forum. I want to be careful with this proposal to minimize the chance of degrading the existing function of either platform, as I think they’re both already doing important things well. To that end, I believe a separate imaginary currency is warranted. This seems worth the slight increase in complexity on both platforms to preserve the integrity of their current systems.
As a stand-in, let’s call these “research points” (to be clear, I think this is a terrible name). Metaculus users will now earn both their current points for forecasting accurately and “research points” for their rationales being cited by other forecasters who in turn forecast accurately, or for their rationales being cited by other rationales that are in turn cited in accurate forecasts, and so on. This creates a novel incentive for forecasters to actually write clear and accurate rationales for why they believe what they believe, whereas currently they’re just prompted with no possibility of additional reward. If a forecaster only cares about accumulating the original points and being accurate without having to show their work, they can continue doing so with no real change to their experience. In my mind, this will create a new niche for a kind of forecaster dedicated to clear communication in addition to accuracy. I think we see this kind of forecasting done now by dedicated individuals or top forecasting teams, but it’s unfortunately rare.
Similarly, on the EA Forum, users will now have an ability to accumulate “research points” alongside their existing karma. Karma will indicate what other forum users think is worth rewarding, while “research points” indicate some genuinely manifested value in increasing the accuracy of someone’s beliefs. I expect this metric to be exciting in the EA/rationalist community, but forum posters will be free to ignore this new metric if they want and continue to post content as they did before, not specifically targeted at being useful for forecasters.
How will this metric be calculated? I expect others (especially those who work at these platforms) to have better ideas on this than I do, but just as a concrete example (sketched in code after this list):
A forecast generates a number of “research points” equal to its traditional accuracy points.
These are divided equally among its cited sources that are from either Metaculus or the EA Forum.
The “research points” allocated to a source are then divided in half, with half going to that source itself and the other half divided equally among its cited sources from either Metaculus or the EA Forum.
So on and so forth, to the end of each citation chain. Maybe rounding each division up to the nearest whole number of points to preserve the value of long chains?
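Here is a minimal sketch of that allocation rule, assuming points enter at a resolved forecast and halve at each hop down the citation chain. The function and data shapes are mine, and a real implementation would need more careful handling of rounding and citation cycles:

```python
import math
from collections import defaultdict

def round_away(x: float) -> int:
    """Round away from zero, so long chains (and negative flows) keep value."""
    return math.ceil(x) if x >= 0 else math.floor(x)

def award_research_points(points, cited, citation_graph, totals=None):
    """Distribute research points along citation chains.

    points:         points entering this hop; equal to the forecast's
                    accuracy points at the top, and possibly negative,
                    so counterproductive sources lose points
    cited:          the eligible (Metaculus/EA Forum) sources cited here
    citation_graph: dict mapping source id -> list of ids it cites
    totals:         running per-source totals
    """
    if totals is None:
        totals = defaultdict(int)
    # If nothing eligible is cited, the pass-through half simply lapses
    # in this sketch (one possible design choice among several).
    if not cited or points == 0:
        return totals
    share = round_away(points / len(cited))   # equal split among citations
    for source in cited:
        kept = round_away(share / 2)          # half stays with the source...
        totals[source] += kept
        award_research_points(                # ...half flows to *its* sources
            share - kept,
            citation_graph.get(source, []),
            citation_graph,
            totals,
        )
    return totals

graph = {"comment-1": ["post-A"], "post-A": ["post-B"], "post-B": []}
print(dict(award_research_points(100, ["comment-1"], graph)))
# {'comment-1': 50, 'post-A': 25, 'post-B': 13}
```

Rounding away from zero is one way to implement the “round up to preserve long chains” idea, and it also lets negative points from inaccurate forecasts propagate down the chain without getting stuck.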
These “research points” are continually updated as the traditional accuracy points for Metaculus forecasts are continually updated (per my understanding), and displayed in aggregate in two places:
On each source itself, whether a Metaculus comment or an EA Forum post, alongside traditional upvotes/karma.
On user profiles, on either platform, for the total of all the content they’ve generated that’s been cited.
Note that the traditional points on Metaculus can be negative in order to disincentivize inaccurate forecasts, which allows a signal to flow through the system when sources are counterproductive.
I believe the above proposal represents a sort of minimum viable system that creates the incentives I’m talking about. It also leaves the door open for a lot of future possibilities, many of which I’m sure haven’t even occurred to me. Some that have include…
Different ways to display the information conveyed by “research points” rather than raw point totals. Leaderboards of the most positively influential sources/posters? Percentile scores alongside or instead of point totals? Does it make sense to break these up by topic?
Visualizations of citation chains/graphs. These already exist for academic citations, but my intuition is that they get a lot more interesting/useful when you can see which threads of research have actually made forecasting models more accurate.
Cash incentives or other rewards tied to “research points”. Does this enable tournaments or grants to create the most “useful” research, in a way that I expect can be judged much less subjectively than the status quo?
Red Teaming
What could go significantly wrong with the above proposal?
These features could be impossible or much more expensive/difficult to implement as described. If this is the case, I’m not going to figure it out from my armchair, so I’m hoping to get this proposal to people who know much more. (~20% chance this is the case?)
I could be wrong about this being useful. Maybe high quality information actually isn’t a bottleneck for important forecasting, or maybe no one would be interested in writing the kinds of rationales or content intended to be incentivized by this system. (~10% chance this is the case?)
Maybe the points system is trivially gameable/exploitable in some way and won’t incentivize the behaviors I expect. I believe it is gameable, but I think the existing karma and points systems of the two platforms already are to a large extent, and the reason they are mostly useful is due more to the nature of the associated communities and their objectives than to some inherent reliability of their point systems. (~20% chance this is the case? In which case I think the proposal is likely recoverable with mechanism tweaks.)
It could degrade the existing functionality of these platforms in some way. I’ve tried my best to minimize this possibility, but again, I don’t think I can do much better than this from my armchair and will endeavor to get this idea to people with more relevant insights. (~15% chance this is the case? But may still be recoverable with tweaks)