Why wouldn't something like an RL environment allow them to specialize in poker playing, gaining those skills as necessary to increase score in that environment?
E.g., given a small code execution environment, it could use a secure random number generator to pick between options and a calculator for whatever math it decides it can't do 'mentally'. LLMs are also already very capable of deception, even more so when the RL training target encourages it.
I'm not sure why you couldn't train an LLM to play poker quite well with a relatively simple training harness.
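To make the tool idea concrete, here's a minimal sketch of the two helpers mentioned above that a model could call from a code sandbox. The function names (`sample_action`, `pot_odds`, `should_call`) are hypothetical, and the pot-odds convention (`pot` already includes the opponent's bet, integer chip counts) is an assumption. The randomness comes from Python's `secrets.SystemRandom`, so the model never has to "mentally" randomize a mixed strategy.

```python
import secrets
from fractions import Fraction

# OS-backed CSPRNG (random.SystemRandom); it cannot be seeded, so draws are not reproducible.
_rng = secrets.SystemRandom()


def sample_action(strategy):
    """Draw one action from a mixed strategy such as {"fold": 0.2, "call": 0.5, "raise": 0.3}."""
    actions = list(strategy)
    weights = [strategy[a] for a in actions]
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    return _rng.choices(actions, weights=weights, k=1)[0]


def pot_odds(pot, to_call):
    """Break-even equity for a call: to_call / (pot + to_call).

    Hypothetical convention: `pot` already includes the opponent's bet,
    and both arguments are integer chip counts.
    """
    return Fraction(to_call, pot + to_call)


def should_call(equity, pot, to_call):
    """Calling is profitable (ignoring future betting) when equity meets the pot odds."""
    return equity >= pot_odds(pot, to_call)


if __name__ == "__main__":
    print(sample_action({"fold": 0.2, "call": 0.5, "raise": 0.3}))
    print(pot_odds(150, 50))           # 1/4
    print(should_call(0.30, 150, 50))  # True
```

The point of the split is that the policy (what the weights should be) is what RL would train, while the arithmetic and the random draw are delegated to exact tools.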
> Why wouldn't something like an RL environment allow them to specialize in poker playing, gaining those skills as necessary to increase score in that environment?
I think an RL environment is needed to solve poker with an ML model. I also think that, as with chess, you need the model to do some approximate work. General-purpose LLMs trained on text corpora are bad at math, imprecise, and struggle to stay on task while exploring.
So a purpose-built model with a purpose-built exploration harness is likely needed. I've built the basis of an RL-like environment for poker, along with the basis of learning agents, in Rust. Next steps to come.
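For a sense of what a minimal purpose-built learning loop for poker can look like, here's a sketch of vanilla counterfactual regret minimization (CFR) on Kuhn poker, a three-card toy game (J, Q, K encoded as 0, 1, 2; one-chip antes; one bet size). This is not the Rust code described above: CFR is swapped in as a standard, well-known poker technique, and names like `Node`, `cfr`, and `train` are hypothetical. It enumerates all six deals per iteration rather than sampling them.

```python
import itertools
from collections import defaultdict

ACTIONS = "pb"  # p = pass (check/fold), b = bet (bet/call)


class Node:
    """Regret and strategy accumulators for one information set."""

    def __init__(self):
        self.regret_sum = [0.0, 0.0]
        self.strategy_sum = [0.0, 0.0]

    def strategy(self, reach):
        # Regret matching: play actions in proportion to positive cumulative regret.
        positive = [max(r, 0.0) for r in self.regret_sum]
        total = sum(positive)
        strat = [p / total for p in positive] if total > 0 else [0.5, 0.5]
        # Accumulate the current strategy, weighted by this player's reach probability.
        for i in range(2):
            self.strategy_sum[i] += reach * strat[i]
        return strat

    def average_strategy(self):
        # The average strategy (not the current one) is what converges to equilibrium.
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [0.5, 0.5]


def terminal_payoff(cards, history):
    """Payoff for the player to act at a terminal history, or None if play continues."""
    if len(history) < 2:
        return None
    player = len(history) % 2
    wins_showdown = cards[player] > cards[1 - player]
    if history.endswith("p"):
        # "pp" is check-check (showdown for the antes); "bp"/"pbp" is a fold,
        # and the player to act is the bettor, who wins the opponent's ante.
        if history == "pp":
            return 1 if wins_showdown else -1
        return 1
    if history.endswith("bb"):  # bet-call: showdown for ante + bet
        return 2 if wins_showdown else -2
    return None


def cfr(nodes, cards, history, p0, p1):
    """Return the expected utility for the player to act at `history`."""
    payoff = terminal_payoff(cards, history)
    if payoff is not None:
        return payoff
    player = len(history) % 2
    node = nodes[str(cards[player]) + history]  # info set: own card + public actions
    strat = node.strategy(p0 if player == 0 else p1)
    utils = [0.0, 0.0]
    node_util = 0.0
    for i, action in enumerate(ACTIONS):
        nxt = history + action
        # Zero-sum: the child's value is from the opponent's view, so negate it.
        if player == 0:
            utils[i] = -cfr(nodes, cards, nxt, p0 * strat[i], p1)
        else:
            utils[i] = -cfr(nodes, cards, nxt, p0, p1 * strat[i])
        node_util += strat[i] * utils[i]
    # Counterfactual regret: weight by the opponent's probability of reaching here.
    opp_reach = p1 if player == 0 else p0
    for i in range(2):
        node.regret_sum[i] += opp_reach * (utils[i] - node_util)
    return node_util


def train(iterations):
    """Run CFR, enumerating all 6 deals of the 3-card deck each iteration."""
    nodes = defaultdict(Node)
    deals = list(itertools.permutations(range(3), 2))  # (player 0 card, player 1 card)
    total = 0.0
    for _ in range(iterations):
        for cards in deals:
            total += cfr(nodes, cards, "", 1.0, 1.0)
    return nodes, total / (iterations * len(deals))


if __name__ == "__main__":
    nodes, value = train(5000)
    # Kuhn poker's known game value for player 0 is -1/18 (about -0.0556).
    print(f"average game value for player 0: {value:.4f}")
    for key in sorted(nodes):
        print(f"{key:>4}: P(bet/call) = {nodes[key].average_strategy()[1]:.3f}")
```

Kuhn poker is tiny enough to enumerate; full hold'em is not, which is where sampling, abstraction, and function approximation (and a fast environment, e.g. in Rust) come in.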