It looks like the images from the server and the control input to the server are sent over the VNC protocol. Other information, such as the reward signal from the environment server, is sent over a WebSocket connection as JSON:
You should be able to implement this protocol for your environment and run a VNC server for the rest. A new client class representing your environment can be based on this:
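To make the reward side channel a bit more concrete, here's a rough sketch of the client end. It isn't taken from the Universe source. The class name `RewardChannelClient`, the helper `make_reward_message`, the `"env.reward"` method string and the `method`/`body`/`reward`/`done` field names are all my own assumptions for illustration. It only shows the general idea of decoding JSON text frames and accumulating reward; the actual WebSocket transport and the VNC side are left out.

```python
import json


class RewardChannelClient:
    """Hypothetical handler for a JSON reward side channel (not the real Universe client)."""

    def __init__(self):
        self.total_reward = 0.0  # reward accumulated since the last pop_reward()
        self.done = False        # becomes True once any message reports episode end

    def handle_message(self, raw):
        # Assumption: each WebSocket text frame carries exactly one JSON object.
        message = json.loads(raw)
        # Assumption: "method" names the message type and "body" holds the payload.
        if message.get("method") != "env.reward":
            return  # ignore message types this sketch does not model
        body = message.get("body", {})
        self.total_reward += float(body.get("reward", 0.0))
        self.done = self.done or bool(body.get("done", False))

    def pop_reward(self):
        # Return the reward collected so far and reset the accumulator.
        reward, self.total_reward = self.total_reward, 0.0
        return reward


def make_reward_message(reward, done):
    # What a hypothetical environment server might send for one reward event.
    return json.dumps({"method": "env.reward", "body": {"reward": reward, "done": done}})
```

In a real setup you'd call `handle_message` from whatever WebSocket library delivers the frames, and call `pop_reward` once per agent step so rewards that arrive between steps aren't lost.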
If that's true, I believe we'll have to wait for the OpenAI team to build new gym environments before we can train on new games.
I only briefly poked around because it's nearing midnight here - maybe you can open up the included examples and work out how to rewire them to work on new games, maybe not. Either way, I've got a particular use case I'd like to make a gym for, so I'm interested in finding out.