
Comparing forecasting platform accuracy


Jack

$1,500 raised
$800 valuation

Project description

Different forecasting platforms often show substantially different predictions for the same questions. I want to collect and analyze data to compare prediction accuracy between different platforms, and help understand how different methods of eliciting and aggregating forecasts compare - e.g. prediction markets vs prediction polls, and real money vs play money prediction markets. How similar or different are their accuracies? Are there different types of questions where different platforms perform better?

I have identified a few categories of forecasting questions that offer identical or near-identical questions that can be directly compared across multiple forecasting platforms - world events, politics, elections, sports, and the ACX prediction contest all offer opportunities for comparisons between some subsets of platforms. Platforms I expect to look at include Metaculus, Manifold, Polymarket, PredictIt, Good Judgment, and possibly others such as Insight, Betfair, etc. if data and time permit. I would also like to compare these to forecasts from sources such as 538.

I want to analyze prediction accuracy scores for these comparisons (see the post mentioned in the next section for an example of this type of analysis), and if possible try to understand where the differences come from, what edge the different platforms may have, and to what extent one platform is pricing in the predictions of another platform. I also want to experiment with methods of aggregating the forecasts of different platforms together to produce a meta-forecast.
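To give a concrete sense of the kind of scoring and aggregation I have in mind, here is a minimal sketch. The platform names, probabilities, and the simple unweighted-average meta-forecast are purely illustrative, not results from this project:

```python
import numpy as np

def brier_score(p, outcome):
    """Brier score for a probability forecast p of a binary outcome (0 or 1).
    Lower is better; 0.25 corresponds to an uninformative 50% forecast."""
    return (p - outcome) ** 2

def log_score(p, outcome):
    """Negative log score (log loss). Lower is better."""
    p = np.clip(p, 1e-6, 1 - 1e-6)  # avoid log(0) on extreme forecasts
    return -(outcome * np.log(p) + (1 - outcome) * np.log(1 - p))

# Hypothetical forecasts for one question from three platforms, plus a simple
# unweighted-average meta-forecast across them.
forecasts = {"platform_a": 0.70, "platform_b": 0.62, "platform_c": 0.81}
outcome = 1
meta = np.mean(list(forecasts.values()))

for name, p in {**forecasts, "meta": meta}.items():
    print(f"{name}: Brier={brier_score(p, outcome):.3f}, log={log_score(p, outcome):.3f}")
```

A real analysis would aggregate these scores over many questions and experiment with weighting schemes beyond a plain average.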

What is your track record on similar projects?

Last fall I did an analysis comparing election forecast accuracy on the 2022 US elections: https://firstsigma.substack.com/p/midterm-elections-forecast-comparison-analysis. As I point out there, a single election cycle gives a set of highly correlated questions with very limited information value. This project will extend that to more types of forecasts. Additionally, that previous work only looked at the forecasts the night before the election - this project will examine forecasts over different spans of time.

How will you spend your funding?

The funding is to provide incentive/reward for me to do this project - I've been thinking about this project for a while but haven't been able to prioritize it.

At the minimum funding valuation, I anticipate being able to collect data for 2-3 of the platforms listed above and run broad analyses without too many cross-tabs.

With a higher funding valuation, I expect to personally spend more time on the project and/or bring on a collaborator to extend the project further (I am already discussing this project with them and have worked with them on other forecasting projects). I anticipate collecting data for more platforms and diving deeper into some questions, e.g.

  • Investigating how well different platforms perform at different time horizons before question resolution - e.g. I would expect to observe the impact of discount rates on prediction markets (see the sketch after this list).

  • Analyzing predictive performance on different question categories, e.g. politics vs sports.

  • Analyzing the impact of the number of forecasters and of liquidity (for prediction markets), and controlling for them (e.g. comparing questions with similar numbers of forecasters).
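As a rough illustration of the horizon analysis in the first bullet, accuracy could be bucketed by time remaining until resolution. This is a sketch under assumed data; the column names, bucket edges, and values are placeholders, not the project's actual dataset:

```python
import pandas as pd

# Assumed long-format data: one row per (platform, question, timestamp) with the
# forecast probability, days remaining until resolution, and the 0/1 outcome.
df = pd.DataFrame({
    "platform": ["A", "A", "B", "B"],
    "prob": [0.80, 0.60, 0.75, 0.55],
    "days_to_resolution": [2, 60, 2, 60],
    "outcome": [1, 1, 1, 1],
})

# Per-row Brier score against the eventual outcome.
df["brier"] = (df["prob"] - df["outcome"]) ** 2

# Bucket by horizon: e.g. under a week, a week to a month, over a month out.
df["horizon"] = pd.cut(df["days_to_resolution"], bins=[0, 7, 30, 365],
                       labels=["<1w", "1w-1m", ">1m"])

# Mean Brier score per platform per horizon bucket.
print(df.groupby(["platform", "horizon"], observed=True)["brier"].mean())
```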


Jack

over 1 year ago

Update:

How is the project going?

We collected data from Polymarket, Metaculus, and Manifold by writing scripts to retrieve time-series data from their APIs and assembling a dataset of comparable questions with matching resolution criteria (43 questions so far). We then ran comparisons between the platforms by calculating time-weighted accuracy scores (log and Brier), restricted to the time intervals each pair of platforms has in common.
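For reference, the time-weighted scoring works roughly like this: each forecast is treated as a step function that holds until the next update, its score is weighted by how long it was the live forecast, and only the window where both platforms have data is counted. The sketch below is a simplified illustration of that idea, not our exact pipeline, and the example series is made up:

```python
import numpy as np

def time_weighted_brier(times, probs, outcome, start, end):
    """Time-weighted Brier score for a step-function forecast series.

    times: sorted timestamps at which the forecast changed
    probs: forecast probability in effect after each timestamp
    outcome: 0 or 1
    start, end: the comparison window (e.g. the overlap between two platforms)
    """
    total, weighted = 0.0, 0.0
    for i, (t, p) in enumerate(zip(times, probs)):
        t_next = times[i + 1] if i + 1 < len(times) else end
        lo, hi = max(t, start), min(t_next, end)  # clip this segment to the window
        if hi <= lo:
            continue
        duration = hi - lo
        weighted += duration * (p - outcome) ** 2
        total += duration
    return weighted / total if total else np.nan

# Hypothetical series: forecast was 0.6 from t=0, updated to 0.8 at t=5; the
# question resolved YES; score only over the overlapping window [2, 10].
print(time_weighted_brier([0.0, 5.0], [0.6, 0.8], 1, start=2.0, end=10.0))
```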

In general the scripts and collected results look good, but we're still validating the data - we already found and fixed a number of data bugs, so there's a good chance some remain, and we aren't ready to report results yet.


How much money have you spent so far? Have you gotten more funding from other sources? Do you need more funding?

We did not get any additional funding, and no additional funding is needed.

How well has your project gone compared to where you expected it to be at this point? (Score from 1-10, 10 = Better than expected)

3/10 - We were able to collect a lot of good data on a broad range of questions, including extras like counts of participants and predictions at each point in time, and analysis is in progress, but I was hoping to have at least basic results published at this point.

Are there any remaining ways you need help, besides more funding?

Nothing is strictly needed, but one nice-to-have: a better data source for Polymarket, to get more accurate price data. What we currently have uses transaction prices; precise historical order-book mid prices would be more useful.

Having more questions in common between different platforms would of course be helpful as well. There will be a large batch of these at the end of the year, so I plan to add them to the analysis then.

Any other thoughts or feedback?

I will have a block of time to work more on forecasting-related projects this fall, so I expect to be able to complete this analysis by then and possibly work on additional follow-up investigations.


Thanks to Michael Wheatley who worked with me on this project and did most of the data collection work.