Trying to get GPT to generate comments at a particular level really highlights its limitations in my experience. For instance, I couldn't get it to focus its comments on the programming-language aspects of the code (or only in a crude way). There's some depth it's lacking; maybe it comes from RLHF, I don't know, but its commenting is like its writing.
Have you tried getting it to write a high-level description before reproducing the code with comments (via either few-shot examples or instructions)? Most of the reasoning ability in LLMs comes from the model rambling about something first, with the attention then picking up on that rambling when it needs to generate the conclusion. If you skip that step, the output will probably be much less coherent.
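To make the two-pass idea concrete, here's a minimal sketch: first ask for a prose summary, then feed that summary back in when asking for the commented code, so the model can attend to its own earlier "rambling". The `call_llm` stub and both prompt-builder functions are hypothetical names, and the messages follow the common chat-completion `role`/`content` shape; swap in whatever client and wording you actually use.

```python
# Two-pass prompting sketch: summarize first, then comment with the
# summary in context. call_llm is a placeholder, not a real client.

def call_llm(messages):
    # Placeholder: substitute a real chat-completion call here.
    raise NotImplementedError

def build_summary_prompt(source: str) -> list[dict]:
    # Pass 1: ask for a high-level description only, no code.
    return [
        {"role": "system", "content": "You are a careful code reviewer."},
        {"role": "user", "content": (
            "Describe, at a high level, what this code does and how its "
            "parts fit together. Do not reproduce the code.\n\n" + source)},
    ]

def build_comment_prompt(source: str, summary: str) -> list[dict]:
    # Pass 2: include the pass-1 summary so the model can draw on it
    # while writing comments at the level you actually want.
    return [
        {"role": "system", "content": "You are a careful code reviewer."},
        {"role": "user", "content": (
            "High-level summary of the code:\n" + summary +
            "\n\nNow reproduce the code below with comments that focus on "
            "the programming-language constructs in use.\n\n" + source)},
    ]
```

The point of the split is that the conclusion (the comments) is generated after, and conditioned on, the explicit reasoning (the summary), rather than both being produced in one shot.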
We played around with this for a bit actually. One idea we had was to generate a PlantUML diagram to show how the different components of a file, or even a repository, connect with one another. However, given GPT's current context-window limits, even with GPT-4, this quickly became impractical for large files. We would need a model with a much larger context length.
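The context problem shows up as soon as you try to size a file against the window. As a rough sketch (assuming ~4 characters per token as a crude stand-in for a real tokenizer, and 8192 tokens as the budget), splitting a source file into window-sized chunks looks something like:

```python
# Rough sketch of fitting a large file into a fixed context budget by
# splitting it line-by-line into chunks. The ~4 chars/token estimate is
# a crude heuristic, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude heuristic

def chunk_source(source: str, budget_tokens: int = 8192) -> list[str]:
    chunks, current, current_tokens = [], [], 0
    for line in source.splitlines(keepends=True):
        line_tokens = estimate_tokens(line)
        # Flush the current chunk if adding this line would exceed the budget.
        if current and current_tokens + line_tokens > budget_tokens:
            chunks.append("".join(current))
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += line_tokens
    if current:
        chunks.append("".join(current))
    return chunks
```

The catch, of course, is that a diagram of how components connect needs cross-chunk information, which is exactly what chunking throws away; that's what pushed us toward wanting a larger window in the first place.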
That said, perhaps if the entire repository is embedded into a vector database, a high-level overview would be possible? Just thinking aloud right now, and am happy to collaborate with anyone interested in exploring this further!
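The retrieval side of that idea can be sketched without any real database: embed each repository chunk, then pull back only the nearest chunks for a given question and hand those to the model. Everything below is a toy stand-in; in particular the bag-of-words "embedding" substitutes for a real embedding model, and the function names are mine.

```python
# Toy retrieval sketch: rank repository chunks by cosine similarity to a
# query. The bag-of-words embed() is a stand-in for a real embedding model.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedding: lowercase word counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_chunks(query: str, chunks: list[str], k: int = 3) -> list[str]:
    # Return the k chunks most similar to the query.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

With a real embedding model in place of `embed`, the high-level overview would come from summarizing only the retrieved chunks, which sidesteps the context limit at the cost of possibly missing connections between chunks that were never retrieved together.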