
Introductory resources for Singular Learning Theory

ActiveGrant
$10,650 raised
$10,530 funding goal
Fully funded and not currently accepting donations.

Project summary

6-week salary to contribute to foundational resources in the nascent field of "Singular Learning Theory x AI Alignment"

Project goals

Produce a literature review on Singular Learning Theory, as a foundational resource to help orient newcomers to the field.

How will this funding be used?

Salary for Matthew Farrugia-Roberts during the 6-week period (annualized $91,260/year; 6/52 of that is $10,530, the funding goal).

What is the recipient's track record on similar projects?

Matthew has already completed a detailed survey of the literature as part of his MS thesis, but it has yet to be written up. This substantial preparation will enable the work to be completed relatively quickly.

Matthew has published a joint first-author theoretical ML paper at ICML, a top-tier venue, and completed an MS thesis at the University of Melbourne with a mark of 95+, a grade reserved for students 'typically encountered only a handful of times throughout an academic career'.

How could this project be actively harmful?

Singular Learning Theory provides a potential path to better understanding ML systems. Although better understanding of systems can be helpful for safety, it could also yield insights that improve the efficiency of ML training procedures, potentially enabling more powerful systems to be trained sooner without a corresponding improvement in alignment. This risk holds for science of deep learning and interpretability methods in general; on balance, the benefits seem to outweigh the risks, but it is important to at least remain aware of the downside.

Singular Learning Theory is a speculative research direction. Foundational resources will enable more people to onboard to it. However, there is a possibility it is a dead end, and that these people would have spent their time better elsewhere. On balance, Singular Learning Theory seems worth exploring, and enabling newcomers to onboard more rapidly should decrease the overall cost of exploring this direction, provided resources are allocated efficiently.

What other funding is this person or project getting?

No other funding during this period. Matthew received an RA salary for a prior project, and will receive an RA salary for a new project after this six-week project completes.

Progress update

What progress have you made since your last update?

The 6 weeks were mostly spent on three projects:

  1. Planning out the literature review mentioned in the grant.

  2. Contributing to SLT research related to validating the developmental interpretability research agenda.

  3. Learning to use TPUs and PyTorch/XLA, and teaching this to the developmental interpretability research community.

Project (1) is not yet complete. I made some progress recruiting co-authors and planning the literature review, but we didn't make much progress writing the review itself: all we have is a sketch. I consider this an overall failure. (I do still think I can complete the review in the near future without additional funding; more below.)

Project (2) was much more successful. Over the 6 weeks I successfully replicated an in-context learning experiment from prior work, and this became the foundation for a collaborative research project leading to the paper "The Developmental Landscape of In-Context Learning" (under review; arXiv preprint; associated LessWrong post).

Project (3) was also successful. It culminated in me delivering a tutorial for the local alignment research community on how to accelerate research experiments with free TPUs from Google (tutorial, recording). Since then I have built on this experience by learning JAX, and I am planning a JAX course for my research group later this year.
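For a flavor of what the tutorial covers, here is a minimal sketch of running a PyTorch computation on a TPU via torch_xla. This is an illustration of the workflow rather than an excerpt from the tutorial, and it assumes a Google Cloud TPU VM with the torch_xla package installed:

```python
# Minimal sketch (not from the tutorial): a PyTorch matrix product on a TPU.
# Assumes a Google Cloud TPU VM with the torch_xla package installed.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()                    # the TPU, exposed as a torch device
x = torch.randn(1024, 1024, device=device)  # allocated directly on the TPU
y = x @ x                                   # recorded lazily into an XLA graph
xm.mark_step()                              # compile and execute the pending graph
print(y.sum().item())                       # .item() forces a transfer to host
```

The lazy-tracing model, where operations accumulate into a graph that only runs at `mark_step()`, is the main conceptual difference from ordinary eager PyTorch, and the usual source of surprises when first porting experiments.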

In addition to these outcomes, I kept a weekly research journal for the first few weeks of the project, which contains some more detailed commentary.

Immediately after the 6 weeks, I transitioned to full-time research work at Krueger AI Safety Lab and have since been working on other projects (while continuing to collaborate on project (2) with the developmental interpretability research community).

Overall evaluation

My main personal goal for the project was to fill a 6-week gap between paid work opportunities and to contribute in some positive ways to alignment research. I feel this broader goal was achieved through projects (2) and (3) described above.

This broader motivation was part of my initial pitch to Adam Gleave, the regrantor who awarded my funding. However, Adam was most excited about the literature review aspect of my proposal, and chose to emphasise that aspect of the project in his writeup for this Manifund proposal.

Since I didn't manage to produce a literature review in the 6 weeks, and also haven't managed to produce one since then, I consider this project to have failed.

What are your next steps?

However, I still think the project is recoverable. Since I set out to produce the literature review, three publications have appeared that partially fill the need for this resource in the SLT/alignment research community.

These resources are enough to tide the community over, but there is still a need for a comprehensive, accessible technical introduction to the theory. Moreover, the SLT/developmental interpretability community has made some progress establishing the viability of this research direction, so the need for such a resource within the wider alignment community remains.

Working with some of the authors of the latter two articles mentioned above, I still plan to write up the literature review we have planned. I am waiting for an appropriate opportunity to do so; I expect to have one in the latter half of 2024, when my KASL project concludes, in the few months before and after the start of my DPhil.

Is there anything others could help you with?

No.

In particular, I do not currently require more funding to find time to work on this.

However, if anyone is interested in collaborating on the literature review, feel free to reach out.


Austin Chen

over 1 year ago

(and approved!)


Austin Chen

over 1 year ago

Thanks for the writeup, Adam! I like that the grant rationale is understandable even for myself (with little background in the field of alignment), and that you've pulled out comparison points for this salary ask.

Quoting Adam's note below: "I generally would advocate for independently conducted research to receive lower compensation than at alignment organizations, as I usually expect people to be significantly more productive in an organization where they can receive mentorship (and many of these organizations are at least partially funding constrained)."

I share the instinct that "working as an independent researcher is worse than in an org/team", but hadn't connected that to "and thus funders should set higher salaries for work at orgs", so thanks for mentioning it.

Tangent: I hope one side effect of our public grant process is that "how much salary should I ask for in my application" becomes easier for grantees. (I would love to establish something like Levels.fyi for alignment work.)

donated $10,530

Adam Gleave

over 1 year ago

Main points in favor of this grant

There's been an explosion of interest in Singular Learning Theory lately in the alignment community, and good introductory resources could save people a lot of time. A scholarly literature review also has the benefit of making this area more accessible to the ML research community more broadly. Matthew seems well placed to conduct this review, having already familiarized himself with the field during his MS thesis and collected a database of papers. He also has extensive teaching experience and has written publications aimed at the ML research community.

Donor's main reservations

I'm unsure how useful Singular Learning Theory is going to be for alignment. I'm most unsure whether it'll actually deliver on the promise of better understanding deep networks. The positive case is that traditional statistical learning theory has some serious limitations, making predictions that contradict empirical results on deep networks, so we need some replacement. But grandiose theories pop up now and again (the neural tangent kernel was hot last year, for example) yet rarely pan out. Singular learning theory has been around for several decades, so the fact that it only recently gained popularity in ML should also give some pause for thought. It seems plausible enough, and enough people are excited by it, that I'm willing to give it a shot for a relatively small grant like this; but this grant is definitely not me endorsing singular learning theory -- I'd need to understand it a lot better to really give an inside-view evaluation.
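As a rough sketch of the technical contrast at stake (my gloss on Watanabe's standard result, under the usual SLT assumptions): the Bayes free energy of a model with $d$ parameters expands asymptotically as

$$F_n \approx n L_n(w_0) + \lambda \log n,$$

where $\lambda$ is the real log canonical threshold. For regular models $\lambda = d/2$, recovering the classical BIC penalty; for singular models such as neural networks, $\lambda$ can be much smaller, which is one concrete place where classical predictions and SLT predictions diverge.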

Conditional on singular learning theory actually enabling deeper understanding of neural networks, there's still a question of whether it's actually useful for alignment. I feel reasonably confident that it would be a positive development: generally, having theoretical frameworks to engage with (even if approximate) seems a key component of engineering systems with strong guarantees, whereas just making something that works well most of the time is much more tractable via a trial-and-error approach. So, understanding seems to differentially help in building reliable systems rather than just systems that mostly work. But understanding does accelerate both -- so there is a non-trivial backfire risk.

Process for deciding amount

Fully funded Matthew's ask, which amounts to $91,260/year annualized. The salary seems reasonable given his experience level. It's higher than US PhD stipends (~$50k/year), but below that of most alignment research non-profits in the SF Bay Area (LCA filings from Redwood show at least $140k/year for an ML Researcher; FAR AI's pay scale is $80k-$175k/year for Research Engineers) and significantly below for-profit tech jobs. Matthew will be working from Australia, where tech salaries are lower; Levels.fyi gives a median of $54k/year USD total comp, but short-term contractor positions often pay up to 2x salaried rates, so I still consider the ask to be in a reasonable range.

Not directly relevant in this grant, but I generally would advocate for independently conducted research to receive lower compensation than at alignment organizations, as I usually expect people to be significantly more productive in an organization where they can receive mentorship (and many of these organizations are at least partially funding constrained).

Conflicts of interest


I supervised Matthew for an internship in 2021 at CHAI; I have continued collaborating with him (although relatively light-touch) to see that project through to publication.


donated $10,530

Adam Gleave

over 1 year ago

Typo: salary is $91,260 annualized not $92,260.