Active Grant
$3,764 raised
$10,000 funding goal

Project summary

I’ve been working on Calibration City, a site for prediction market calibration and accuracy analysis. I want the site to be useful for experienced prediction market users as well as for people who have never heard of them before.

Example user questions we aim to answer include:

  • I'm interested in sports, how good is Manifold at predicting games a week in advance? Do other sites have a better track record?

  • This PredictIt market is trading at 90¢ but has fewer than 2,000 shares in volume. How often does a market like that end up being wrong?

  • I’m worried about the accuracy of markets that won’t resolve for a long time. What is the typical accuracy of a market over a year away from resolution?

What have you done so far?

Calibration City is currently live! We completed the MVP in January 2024, with additional features landing in February and March. We integrate data from Kalshi, Manifold, Metaculus, and Polymarket, covering over 130,000 markets, and the site has had over 300 visitors in the past month.

There are currently two main visualizations: calibration and accuracy. The calibration page shows a standard calibration plot for each supported platform. The user can choose how markets are sorted into bins along the x-axis (by the market probability at a specific point, or a time-weighted average). They can also apply weighting to each market based on values such as the market volume, length, or number of traders. Users can filter the total set of markets used for analysis based on keyword, category, duration, volume, or other features. Is Polymarket consistently overconfident? Underconfident? What about on long-term markets?
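The binning step described above can be sketched in a few lines. This is a minimal illustration of how a calibration plot could be computed, assuming each market reduces to a (probability, resolution) pair; the function name and data shape are illustrative, not Calibration City's actual schema, and the real site also supports time-weighted averages and per-market weighting.

```python
def calibration_bins(markets, n_bins=10):
    """Bin markets by predicted probability and return, per non-empty
    bin, the average prediction and the observed resolution rate.
    `markets` is a list of (probability, resolved_yes) pairs."""
    bins = [[] for _ in range(n_bins)]
    for prob, resolved_yes in markets:
        # Clamp p == 1.0 into the top bin.
        idx = min(int(prob * n_bins), n_bins - 1)
        bins[idx].append((prob, resolved_yes))
    results = []
    for contents in bins:
        if not contents:
            continue
        avg_pred = sum(p for p, _ in contents) / len(contents)
        freq = sum(r for _, r in contents) / len(contents)
        results.append((avg_pred, freq))
    return results
```

A perfectly calibrated platform would have `avg_pred` close to `freq` in every bin; points above or below the diagonal indicate under- or overconfidence.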

The accuracy plot allows users to directly compare different factors’ effects on market accuracy. In addition to the standard filters and binning options, the user can select a factor such as the market date, total trade volume, market length, or number of traders. With this additional axis, users can learn how (or if) those factors actually impact market accuracy. Does higher trade volume really increase accuracy? If so, by how much? What about more recent markets?
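As a rough sketch of that factor comparison: group markets into bins along the chosen factor (here, hypothetical bin edges over trade volume) and compute each bin's mean Brier score. The field names and bin edges are assumptions for illustration, not the site's implementation.

```python
def brier(prob, outcome):
    """Squared error of a probabilistic forecast (lower is better)."""
    return (prob - outcome) ** 2

def accuracy_by_factor(markets, edges):
    """Group markets into factor bins defined by consecutive `edges`
    and return the mean Brier score of each non-empty bin.
    `markets` is a list of (probability, outcome, factor) triples."""
    scores = {}
    for prob, outcome, factor in markets:
        for lo, hi in zip(edges, edges[1:]):
            if lo <= factor < hi:
                scores.setdefault((lo, hi), []).append(brier(prob, outcome))
                break
    return {b: sum(s) / len(s) for b, s in scores.items()}
```

Plotting mean Brier score against factor bins is one direct way to see whether, say, higher-volume markets actually score better.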

The beginner-friendly introduction page is a Socratic-style dialog introducing the reader to basic concepts of forecasting before introducing the premise of the site. The resources page lists the current capabilities of the site, answers common questions about the data gathering, and lists a few community resources for further reading. A simple list page displays all markets in the sample, useful for locating outliers or trends over similar markets.

Calibration City was awarded $3500 from the Manifold Community Fund, the highest of any project submitted. It was recently mentioned in Nuño Sempere’s forecasting newsletter for June 2024.

What do you have planned next?

My next big goal is to address one of the biggest problems with naive calibration comparison: different platforms predict different things. Some platforms automatically create dozens of markets in the style of “Will X metric be in range Y at time Z?” every day, while other platforms have far fewer markets with longer timespans and more uncertainty. The analysis you currently see on Calibration City can be very useful, but it’s unfair to calculate the calibration score of each platform and compare them directly.

In order to address this, we need to classify markets into narrow questions, such as “Who will win the 2024 US presidential election?” or “Will a nuclear weapon be detonated in 2024?”. We can find all markets across all platforms that predict the relevant outcome, check the resolution criteria to make sure they’re essentially equivalent, and then compare those with a relative Brier score that rewards markets that were correct earlier. Once we have a corpus of these questions and their constituent markets, we can calculate a score for each platform in each category and fairly compare them.
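One way such a relative, time-sensitive score could work is to average each market's daily Brier score over the question's lifetime, so a platform that reached the right answer earlier accumulates a lower score, then report each platform relative to the best one. This scoring rule is a hedged illustration of the idea, not the site's final method; all names and the uniform prior are assumptions.

```python
def time_avg_brier(snapshots, outcome, horizon):
    """Average daily Brier score over `horizon` days, carrying the
    latest probability forward between (day, probability) snapshots.
    Days before the first snapshot use an uninformative 0.5 prior."""
    total, prob = 0.0, 0.5
    snaps = dict(snapshots)
    for day in range(horizon):
        prob = snaps.get(day, prob)
        total += (prob - outcome) ** 2
    return total / horizon

def relative_scores(markets_by_platform, outcome, horizon):
    """Score one market per platform on the same question; 0.0 marks
    the best platform, larger values mean later/less accurate."""
    scores = {p: time_avg_brier(s, outcome, horizon)
              for p, s in markets_by_platform.items()}
    best = min(scores.values())
    return {p: s - best for p, s in scores.items()}
```

Under this rule, a platform that sat at 90% from day one beats one that only reached 90% halfway through, even though both ended at the same probability.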

I plan to do this classification primarily with GPT-4, starting with smaller samples and building a corpus from there. A fair amount of human effort will still be necessary to identify variations in resolution criteria and other edge cases. Once we have the dataset I can build a scorecard or dashboard that fairly compares each platform in each category, allowing users to definitively answer which market platform is most accurate in each field.
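Before sending anything to GPT-4, a cheap pre-grouping pass could cut the problem down: cluster market titles by shared keywords so the model only has to verify small candidate groups rather than compare 130,000 markets pairwise. Everything here (the stopword list, tokenization, and greedy clustering) is a hypothetical sketch of that pre-filter, not the project's actual pipeline.

```python
# Illustrative stopword list; a real pass would use a fuller one.
STOPWORDS = {"will", "the", "a", "in", "by", "be", "of", "at", "to"}

def keywords(title):
    """Lowercase, strip basic punctuation, drop stopwords."""
    cleaned = title.lower().replace("?", "").replace(":", "")
    return {w for w in cleaned.split() if w not in STOPWORDS}

def candidate_groups(titles, min_overlap=2):
    """Greedily cluster titles sharing at least `min_overlap` keywords;
    each cluster becomes one candidate question for LLM verification."""
    groups = []
    for title in titles:
        kw = keywords(title)
        for g in groups:
            if len(kw & g["kw"]) >= min_overlap:
                g["titles"].append(title)
                g["kw"] |= kw
                break
        else:
            groups.append({"kw": kw, "titles": [title]})
    return [g["titles"] for g in groups]
```

The human-review step mentioned above would then focus on clusters the model flags as having divergent resolution criteria.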

Some of my other planned features for this project include:

  • Integrate data from more sites, such as PredictIt, Futuur, and Insight Predictions

  • Get more data from the sites we do monitor, such as market volume from Polymarket

  • Easily share visualizations with a link or export a summary card for social sharing

  • Natively support advanced market types such as multiple-choice or numeric/date markets

  • Generate individual user calibration plots with the same methodology that we use for platforms

  • Create an easy-to-use cross-platform bot framework for arbitrage or reactive betting

  • Have a dashboard of live markets with comparisons/discrepancies across platforms

  • Provide an estimated probability spread for live markets based on similar past markets

How will this funding be used?

The primary use of this funding will be as compensation for my time. In addition, some planned features will incur direct costs:

  • Classifying over 130,000 markets with GPT-4 in order to find matches

  • VPN connections for platforms that restrict users based on location

  • Additional compute server capacity for increased load

Who is on your team?

I’m wasabipesto; you may recognize me from the Manifold Discord. You can find my contact information and other projects over at my website.

I have a full-time job but I enjoy working on projects like this in my spare time. I am not typically paid for hobby projects so I work on whatever interests me at the moment. Funding from this grant would compensate me for my time and incentivize me to work on additional features when I would otherwise be unproductive or working on other projects.

Calibration City is fully open-source on GitHub and open to community contribution. You can see the live data used by the site for your own analysis at https://api.calibration.city/

What other funding are you or your project getting?

I received retroactive funding for this project from the Manifold Community Fund. I don’t receive any ongoing funding for this project.

donated $2,000

Ryan Kidd

2 months ago

Main points in favor of this grant

  1. I think prediction markets are a great forecasting mechanism and accurate forecasts are an essential component of good decision-making. I regularly consult Manifold, Metaculus, etc. for decision-relevant forecasts. Establishing the accuracy of these platforms seems crucial for widespread adoption of prediction markets in institutional decision-making.

  2. I’m excited by the potential for Calibration City to track the accuracy of AI-specific forecasts, to aid AI safety and improve planning for transformative AI. I strongly encourage wasabipesto to create an interface tracking the accuracies of predictions about AI capabilities and AGI company developments.

Donor's main reservations

  1. It’s possible that this tool doesn’t increase trust in or uptake of prediction markets in decision-making because the interface or the underlying concepts are too abstract. However, even so, it might prove useful to some individual decision-makers or research projects.

  2. It’s possible that the AI questions I am most interested in calibrating on belong to a class of long-horizon predictions that is not well-represented by the calibration of short-horizon, closed markets.

Process for deciding amount

I decided to fund this project $2k somewhat arbitrarily. I wanted to leave room for other donors, and I didn’t view it as being as impactful in expectation as other $5k+ projects I’ve funded.

Conflicts of interest

I don't believe there are any conflicts of interest to declare.


wasabipesto

2 months ago

@RyanKidd Thank you for the contribution and the kind words! I agree AI forecasting is very important, and it is therefore one of the primary topic areas I intend to feature on the site. I also think that the most important questions in that area will be long-horizon, and future accuracy may not be reflected by past performance, but I'm sure there is still plenty to learn.

donated $40

Sasha Cooper

3 months ago

My partner and I made notes on all of the projects in the EACC initiative, and thought this was one of the more convincing among some really strong competition.

Our quick and dirty notes:

They: Something distinct in the prediction market field

He: Product ready = big plus, and doing something distinct (much more a fan of this than making new forecasting alternative tools)

donated $10

Nathan Young

3 months ago

I asked @wasabipesto to sign up here because Calibration City is something I wanted to exist and they built it! So I wanted to reward them for that.

donated $20

David Glidden

3 months ago

The world needs to better understand the importance of calibration - let's help it go mainstream!

donated $50

nikki

3 months ago

The Manifold community yearns for per-user calibration!

donated $50

Osnat Katz

3 months ago

I think this is cool and beautiful, and I look forward to seeing more of Calibration City.