What progress have you made since your last update?
See our recent update, "Timaeus in 2024," for a high-level overview of our research progress in 2024.
Because this Manifund proposal was not fully funded, and because progress in other research projects opened up new possibilities, we decided to turn immediately to the last of the projects described in this proposal: understanding-based evals, under the heading of "Singular Psychometrics."
We've been working on this project in partnership with the UK AISI and are on track to finish it by the end of March 2025. As described in the update linked above, we have overcome the engineering obstacles to scaling local learning coefficient (LLC) estimation to models with billions of parameters. This removes the primary hurdle to seeing the project through to completion.
What are your next steps?
We're currently working on the final stage of the singular psychometrics project. Our hope is to use metrics derived from singular learning theory (SLT) to differentiate the ways in which models achieve the same level of performance. Can we use the local learning coefficient to distinguish a model that has memorized a given benchmark from one that truly generalizes on it?
Is there anything others could help you with?
Not currently. We're looking forward to sharing the final update in April.