Detailed balance in large language model-driven agents
Posted by Anon84 3 days ago
Comments
Comment by gwern 5 hours ago
I wonder if this can be interpreted as consistent with that 'meta-learned descent' PoV? If the system is fixed and is just cycling through fixed strategies, that is what you'd expect from that: the descent will thrash around the nearest pre-learned tasks but won't change the overall system or create new solved tasks.
Comment by Mathnerd314 7 hours ago
But I have used prompts like this a fair amount, and it is more like stochastic gradient descent - most of the time, once it is close to the target, the model will take a small incremental change, but when it is really close the model will sort of say "this is not improveable as it is" and it will take a large leap to a completely different configuration. And then this will do the incremental optimizations and so on. This could be an artifact of the sampling algorithm, but I think it is also an issue that the model has this potential function encoded, but the prompt and the structure of the model do not actually minimize this potential. So, a real lesson here is that there is actually a lot of work still left to do in terms of smarter sampling. Beam search like is used today is sort of the tip of the iceberg. If we could start doing optimization with the transformer model as a component, like optimizing pipelines of reasoning rather than always generating inputs and outputs sequentially, that is where you could start using this potential function directly and then you would see orders of magnitude smarter AI. There is stuff about prompt optimization, but it is still based on treating models as black boxes rather than the piles of math they are.