🥥
Praveen Selvaraj

@pravsels

Comments

Praveen Selvaraj

10 months ago

@tsmythe It's not just that text-to-video models require a lot of compute. There are also a lot of things about them that are still hazy. How long does each generation take? How would we edit the videos?

It's possible that's where we're eventually headed, though.

Praveen Selvaraj

10 months ago

@AntonMakiievskyi I set things up and then found out that I can't edit.

Praveen Selvaraj

10 months ago

@agucova I guess my goal is to create these tools and make them widely accessible in the easiest-to-use format, which would be a platform like Substack/Medium where the tools are built into the post creation flow. I have no idea how I'd get folks to use these tools or if things will get to the 'building of a platform' stage.

It's still an open question whether a SOTA LLM + RAG would be better or worse than a custom LLM that's finetuned on as many manim resources as there are out there + a bunch of synthetically generated prompts/answers + a reward loop where the animations generated per prompt are rated (either by humans or by a multimodal LLM).

While working on the repo at a recent hackathon (more info here), I saw on X that Claude 3 was already good at generating manim code, so I figured I should build that workflow first to see how good it is. It's possible that this wrapper + RAG approach might be good enough.

Regarding the diagram workflow, the goal is to build a sketch-to-image flow whose output the user can then edit further by providing mask + text prompts. Kinda similar to this.