Show HN: File-by-file AI-generated comments for your codebase (swiftstart.vercel.app)
51 points by swiftstart on May 23, 2023 | hide | past | favorite | 52 comments
My friends and I were complaining about having to decipher incomprehensible code one day and decided to pass the code through GPT to see if it could write easily understandable comments to help us out. It turns out that GPT can, but it was still a hassle to generate comments for large files.

So we decided to develop a basic web application that automatically integrates with your GitHub repository, generates comments, creates a pull request, and sends you an email when it is all done.

There is definitely a lot more that can be done but we wanted to gain feedback on whether this is a problem that you face too. Do you often find it challenging to understand complex code? Do you have difficulties in writing informative comments? And if so, would you find value in a tool that can automatically generate comments for your code?

Really appreciate any feedback and suggestions! Thanks in advance!



If they can be generated from the source they probably shouldn't be in the source. Maybe it should be an IDE plugin that displays comments for code as you hover over it.


I used to think that too, even as recently as the last time I saw someone posting how they made GPT-3 write commit messages automatically.

However

There are a couple of reasons why I think this may be a valid use case:

- Code is a static serialization artifact. It's unfortunate we still work with it directly, but that's another conversation; fact is, we are working with it directly, and if there is commentary relevant to the code, it's best placed in the code, so it remains if the tools generating them dynamically (like your "on-hover" IDE plugin idea) become unavailable or stop working (or start generating different outputs after some update).

- It's true that the comments should talk about "why" much more than "how", as the "why" is often not apparent in the code itself. However, "not apparent" doesn't mean "independent" - the code structures and the reasons for their existence are correlated. GPT-4 may have seen enough and be smart enough to actually spot those - as if it was a developer familiar with the problem space, going all "yeah, I've seen this type of code before - it's likely trying to ${high level goal}". Generating those comments would be valuable.

Of course, generated comments should be reviewed and edited at the point of generation, to make sure they're accurate and reference actual documents/tickets/design decisions. Whether or not an average developer will do that - that's another topic.


For what it's worth, from limited testing (one example), GPT-3.5 does have enough knowledge to determine the why.

I was curious what was going on in a sine/cosine lookup table that was doing linear interpolation between the points.

Feeding it into ChatGPT added the appropriate comments for each step.

It was fairly obvious, but it had taken me 10 minutes to track down and realize what was going on.
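For readers unfamiliar with the pattern, here's a minimal sketch of that kind of code (a hypothetical reconstruction, not the actual file): a sine lookup table with linear interpolation between samples.

```python
import math

N = 256  # table resolution; larger N means smaller interpolation error
TABLE = [math.sin(2 * math.pi * i / N) for i in range(N)]

def sin_lut(x):
    """Approximate sin(x) by linear interpolation between table entries."""
    pos = (x / (2 * math.pi)) * N       # fractional table index
    i = int(math.floor(pos)) % N        # index of the lower sample
    frac = pos - math.floor(pos)        # position between samples, in [0, 1)
    a, b = TABLE[i], TABLE[(i + 1) % N]
    return a + (b - a) * frac           # linear interpolation
```

With comments like these on each step, the "what" is obvious at a glance; the "why" (avoiding a slow `sin()` call on constrained hardware) still has to come from somewhere else.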


> Code is a static serialization artifact. It's unfortunate we still work with it directly, but that's another conversation

Could you hint at what it is we should be working with instead?


Multiple different views and representations at every level - e.g. syntactic (coding to AST nodes, not characters), structural (imagine having editable class/module outline), semantic (editable views optimized for specific abstractions that allow you to e.g. edit anything resembling a state machine in a state-machine-specific view).

Views that give you a vertical slice through code, auto-inlining function calls when you need it, allowing you to read and edit a large block of code, and propagating changes to their respective source locations - this is the solution for the "lots of tiny functions vs. few large ones" "clean code" pseudo-problem.

Views that let you control focus. No more constant dealing with exceptions vs. Result&lt;T, E&gt;, 50 shades of async, or how to mix logging into it all - problems to which the modern solution seems to be all kinds of monadic bullshit that makes code completely unreadable, unless the language itself gives you some arcane syntax and semantics to hide it all. Instead, have your IDE hide all the things you don't care about - aka. cross-cutting concerns - from your view.

For example, are you working on the business logic, and focusing on what the code is trying to achieve, aka. the golden/success path? Have your IDE hide all error handling for you. Turn all the Result<T, E> return types into just T, making it look as if the code was using exception handling (and doing the handling somewhere else). Then do some vertical-slice auto-inlining to make a specific functionality more apparent. Too noisy with logging code? Turn display of that off. Conversely, if you're interested in error propagation, turn display of all the business code off.

(Think of it as Aspect-Oriented Programming on steroids, in an interactive form.)

This and much, much, more. It starts with a simple idea though: stop thinking in terms of source code as text in files. Start looking at it as semantic units (classes, functions, statements, expressions) in a database of some kind. Instead of opening a file and editing its text, you would query the database to get an abstract code graph, and feed it to a view that renders it the way you need it. "SELECT Foo Bar from Classes, JOIN Fields, JOIN Methods", feed it to an editable outline view. "SELECT" whatever else you care about, feed it through some transformer, to a different custom view. Edit it, and have it automatically apply changes/"refactorings" to affected code.
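As a toy illustration of the idea (using Python's stdlib `ast` module as a stand-in for a real code database), the "editable class outline" query above is already expressible today - parse once, then query semantic units instead of grepping text:

```python
import ast

source = """
class Foo:
    def bar(self): ...
    def baz(self): ...

def standalone(): ...
"""

tree = ast.parse(source)

# Roughly: "SELECT methods FROM Classes" over the code graph
outline = {
    node.name: [m.name for m in node.body if isinstance(m, ast.FunctionDef)]
    for node in ast.walk(tree)
    if isinstance(node, ast.ClassDef)
}
# outline == {'Foo': ['bar', 'baz']}
```

The missing piece, of course, is the write path: editing the outline and having the changes propagate back to the source, which is where the real engineering effort lives.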

And yes, editing raw plaintext is something that's often very efficient and we have well-optimized tools for this. But this doesn't mean the plaintext in question has to correspond 1:1 with source code. Instead, you could have the class/module outline view be editable plaintext, so you could regex-replace half of it in 5 seconds, and then press a button, and it would rename and move methods and classes across the codebase, making it conform to your edited outline. Basically what dired mode does to filesystem in Emacs.

Etc.


Ah, thank you. Interesting.

This sounds like it would enable more complex systems. Is that the goal?

The skeptic in me thinks these are fancy bandaids for failure to keep complexity under control.

The optimist in me thinks this sounds like a fabulously interesting development experience.


> This sounds like it would enable more complex systems. Is that the goal?

Enabling more complex systems, making it much easier and faster to create safe, stable and efficient systems at current complexity levels - both are really the same goal. Making current complexity level easier to deal with also means you can increase complexity level to the point the work is as difficult as it was before. Your favorite cake suddenly costing half as much means you can save half the cost, or... just buy two.

> The skeptic in me thinks these are fancy bandaids for failure to keep complexity under control.

To me, most of the recent programming language trends are such fancy bandaids. You can't optimize for every possible concern simultaneously in a single plaintext format, but $deity, people try. That's how you get special syntax for Result&lt;T, E&gt; handling (e.g. ?, ?!), or increasingly impenetrable abstractions at the intersection of typing and monads - all because you'd like to represent error handling and logging and futures and a few other things in a maximally easy/readable way, in the same text, at the same time.

You're fighting two limits here - "in the same text" and "at the same time". IMHO, we should give up on both, and accept that the final "single source of truth" form will become some sort of unholy blend between C and Haskell, serving the role of assembly above assembly. Expressing everything in one place, but not casually readable. For day to day work, you would use many specialized representations, each focused on its specific concern, and free from constraints of a single common text format.

> The optimist in me thinks this sounds like a fabulously interesting development experience.

That's what I think too. It's about raising the tooling to meet us at the level we think at, making it work the way we think about code and systems - instead of trying to project every possible way of thinking into a single programming syntax directly.

Note: there is prior art for this, mostly in Smalltalk world (including, recently, the Glamorous Toolkit). The short time I spent playing with those tools tells me this approach has great potential, but could use a lot larger dev community giving it prolonged focus, to improve and streamline the tooling.


The way I see it, I'd expect that the generated comments would often get some human attention immediately afterward. Even if they don't get edited but there's just a crude "filter" where the developer keeps some comments and throws out those which are undesirable in some way, that's very useful signal which can't be re-generated from the source and thus needs to be stored somewhere along with the code.


Depends on your comment philosophy.

Under mine, the AI should sign and date the comments, something like so

AI 20230524

Comments are an odd asset. Some of the most useful ones I've seen when debugging are rotted ones that describe the code that used to be there.

You'll then turn back the clock and find the bozo that "fixed" something by removing important code but for some reason left the comment behind. It's pretty common.

This AI tactic could protect against that so when the next bozo comes in with their wrecking ball fingers and leaves the ai comments behind, you can go in there with your mop and broom and recover things quicker.

What would be really nice is my favorite commenting technique which is where I write obvious but also counterintuitively very wrong code and I aggressively comment it like

"Hello, if you're here you may be asking yourself why it's not like this

(Wrong code)

I did too! That's what #231 and #302 are about! I know I know, I fell for it too. If you're going to change it, open a ticket or something because you're probably going to break it as well"

If the AI can do that then we're in gold territory


A more robust and less passive aggressive way to do this is to write a unit test that fails under the "wrong code".


That's another thing that could get removed/modified/disabled and it's action at a distance

Proximity is really fucking important but of course is only a proxy for other practices and anti pattern avoidance

Regardless, if you need to engage, do so at the obvious point of engagement, not off in some test suite where you cross your fingers, betting the future on the diligence of its upkeep.

I mean write the test, do your rain dance. It's not going to protect you against the stupidity you have to worry about, X years after you've left the building.

Your code will live longer than you think and be modified more times, by more people you will never meet, than you realize.

You don't have to feel responsible for that. But I do and that's one of the examples of how I practice it. You're leaving notes for future archaeologists to remove their guesswork.

Again, tests are great, linters are fine, but your nth generational successors may not agree with you or how you did them or how often they should be run or... and there goes your hardwork. Don't rely on them for assuring protection past your tenure.

Future coders are probably 10 times more likely to curse you as a nuisance than appreciate your diligence. Assume they'll hate you.


I actually completely agree with you in your approach, but also feel you/we should not take upon our shoulders quite so much responsibility to make the things we build perfectly resilient to bad decisions made by successors in the future. Put another way, I love and also do the "Note: do not use '86400 seconds' - fails 2x per year" comment - however, I have not ever had the "there goes my hard work" thought after I have quit. If they don't hire equally or more competent developers than me, and they mess things up, that's actually kind of funny! (Note: if I worked in the medical, military, or nuclear fields, I'd feel very differently!)
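For anyone who hasn't hit the "86400 seconds" trap: here's a minimal Python sketch of the failure mode around a DST transition (the dates are for US Eastern's 2023 spring-forward; the point generalizes).

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

eastern = ZoneInfo("America/New_York")
start = datetime(2023, 3, 11, 12, 0, tzinfo=eastern)  # noon, day before spring-forward

# Wall-clock arithmetic: naive +24h on the local time, so only 23 real
# hours actually elapse across the DST jump.
wall = start + timedelta(seconds=86400)

# Absolute arithmetic: convert to UTC first, so truly 86400 seconds elapse.
absolute = (start.astimezone(timezone.utc)
            + timedelta(seconds=86400)).astimezone(eastern)

print(wall.hour)      # 12 -- looks like "one day later" on the clock
print(absolute.hour)  # 13 -- a full 86400 s later lands an hour off
```

The two answers disagree by exactly one hour, twice a year, which is precisely what the comment warns about.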


Those industries are fundamentally different. They don't build software by the same rules.

It's like comparing say, clothing you buy at the mall to the clothing of a hazmat suit - sewing is involved, materials, you need something that fits over a body, it's roughly the same but they're fundamentally different.

There's extensive compliance and regulation. I've done avionics/automobile software. It's not fun or sexy and it isn't supposed to be.

If you try to play by the same rules as the rest of software you get Theranos. It doesn't work.


Agreed this makes much more sense, from both a UX perspective as well as a code-cleanliness perspective.

There's also the AI evolution question. When we have GPT-10 next year, do we go through and regenerate all the comments? That would introduce a lot of noise into the repo's commit history and `git blame`, which I think is another indicator that the repo is not the right place to store this sort of thing. (And it'd have to be done again every time the AI got smarter...)

Having the AI perched on your shoulder and just analyzing the code as you look at it seems much simpler. Like a friendly, modern version of the pirate's parrot.


> There's also the AI evolution question. When we have GPT-10 next year, do we go through and regenerate all the comments? That would introduce a lot of noise into the repo's commit history and `git blame`, which I think is another indicator that the repo is not the right place to store this sort of thing.

Counterpoint: comments in code reflect what the writer thought at the time of their generation - whether they were a human or an AI (and, excepting automated stupidity, there would be a human reviewing and accepting AI-generated comments). "Having the AI perched on your shoulder" is like re-reading and re-interpreting what the code means. You get the benefit of experience (and improved AI models), but you'll also miss the context, long lost to time since the code in question was written.

I'd say we should do both. And code cleanliness... we won't make much more progress here than we've already made, not until we stop coding directly in the final, plaintext source form. There are too many conflicting concerns wrt. readability, and you can't have them individually optimized at the same time in a single piece of text.


Wow, thanks for the insightful discussion and feedback! This is definitely something that we will take into consideration and ideally, provide as an option.


Would it be so bad on git blame? Assuming all/nearly all comments are on their own lines, I would not expect that part to be a problem. The main problem I would see would be finding a way to merge in such a huge PR with lots of people actively working in these files, so there would be a lot of “merge conflicts” each time people tried to land their branches after one of these mega-comment-PRs went in.


I can see the argument and wouldn’t want to go overboard with generated comments, but it’s nice to have some in the source for now, since IDEs have tooling to display source comments in various contexts (eg hover over a function and get its docstring).

I can definitely see the utility of a “tell me more about this method” button that gets descriptions from GPT.

I also like the “ChatGDB” style of interface where all the local UI context is added to your GPT session (eg “what is this code doing” will answer about what you have selected, in the context of the whole file, and perhaps with the ability to retrieve other files too if needed for the explanation).


Exactly. If something can be generated relatively quickly, don't persist it.


OK, then keep the generated comments and delete the code.


I can actually imagine a stage, past where we are now but before AGIs just writing all the software, where a repo consists of the prompts describing each module in a way that an AI would be able to generate it. Update the software by editing the prompts, or more likely, by asking an AI to make the necessary changes to all the prompts to add a particular feature.


Cool idea, but for me the difficult part is understanding the author's intention rather than the syntax. And that is what needs solving. Not just at a high level but in detail.


Fair enough, that is a problem that we are facing too but we have no idea how to solve it at this point to be honest. Perhaps connecting one's GitHub repository and commit history into a vectorised database for additional context? Hmmm


My advice for developers has been to write comments before writing the code. That way they describe the intention followed by an implementation. Even if there was a bug somewhere or the code was poorly written, at least I knew what the intention was.

So maybe training a model against some really well documented codebases might help?

Perhaps unit tests are also a must, since proper unit tests force granularity; thus, in theory, you'd have a good description of what the intention was + small chunks of code to correlate with.

The issue tho is that most open source code is technical code and code that relies on it serves a business function.

Either way I think the key is finding well-documented, well-TDD'd business-function code. Even if what you wish to explain is not clean and tidy, certain bits may fit patterns that make sense when individually matched against a model. If that makes sense.


"My advice for developers has been to write comments before writing the code." Haha yeah, however in our experience thus far, quite a number of developers really only explain their intentions verbally so nothing ever makes it into the codebase unfortunately. And after a few months, they themselves may not understand why the code was written in a particular manner.

"The issue tho is that most open source code is technical code and code that relies on it serves a business function." Wow yeah never thought about it in that way before!


This is really cool, but somewhat misses the point of comments. If the comments can be generated from the code, then they are just restating what the code is doing. That's great for docstrings! But code comments are far more useful when they explain to you why the code is a certain way. That is something that is learned by the person implementing the code, and can't be learned by looking at it directly. For example, "Uses a generator here to avoid fetching all results into memory".
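To make that concrete, here's a minimal sketch of such a "why" comment in context (the helper and the stand-in cursor are hypothetical):

```python
def iter_rows(cursor, batch_size=1000):
    # Uses a generator here to avoid fetching all results into memory:
    # callers can stop early, and we hold at most one batch at a time.
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield from batch

# Tiny stand-in cursor for demonstration (real code would use a DB cursor):
class FakeCursor:
    def __init__(self, rows):
        self._rows = list(rows)
    def fetchmany(self, n):
        batch, self._rows = self._rows[:n], self._rows[n:]
        return batch
```

An AI reading only the function body could plausibly restate *what* it does, but "we chose a generator because result sets can be huge" is knowledge that lived in the author's head.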


Thanks for the feedback! We can't figure out how to hack together something like that just yet (and if it is even something that should / can be solved by a tech product) but if we do, we'll definitely share that as an update! :)


I really like the direct GitHub repo integration! I've thought about doing something similar as well.

But keep in mind, this should be easy to do from the command line with a number of tools as long as you have a gpt-4 api key. I would probably trust gpt-3.5-turbo with this task in a pinch, but I think there would be more risk of it disrupting the original code.

Here it is with aichat [1]:

  $ curl -s https://raw.githubusercontent.com/leachim6/hello-world/main/p/Python%203.py | aichat --model="gpt-4" -p "emit this exact code, but with helpful comments; don't put any comments before a #!shebang line if present"

  #!/usr/bin/env python3

  # This is a simple Python script that prints "Hello World" to the console.

  # The first line, called the shebang, tells the operating system how to execute the script.
  # In this case, it specifies that the script should be run using the Python 3 interpreter.

  # The print function is used to output text to the console.
  # Here, it is used to print the string "Hello World".
  print("Hello World")
Or with my own tool aider [2]:

  $ git clone https://github.com/leachim6/hello-world.git
  $ cd hello-world
  $ aider "p/Python 3.py"

  Added p/Python 3.py to the chat
  Using git repo: .git

  > add helpful comments

   p/Python 3.py
   <<<<<<< ORIGINAL
   #!/usr/bin/env python3
   print("Hello World")
   =======
   #!/usr/bin/env python3
   # This is a simple Python script that prints "Hello World" to the console
   print("Hello World")  # Print "Hello World" to the console
   >>>>>>> UPDATED

  Applied edit to p/Python 3.py
  Commit aad4afc aider: Added helpful comments to Python script.

[1] https://github.com/sigoden/aichat

[2] https://github.com/paul-gauthier/aider


Interesting! We are personally not the most comfortable with editing things directly from the terminal, especially when GPT hallucinates, but we can definitely see how this would provide users with more flexibility. Thanks for sharing!


You can easily extend the PR workflow to local git: just check that it's run inside a git repo and error out if there are any unstaged changes. Add a --dangerous flag for non-git repo use cases where data might be lost. You can use the git API directly and commit to a new branch without editing the active user branch on disk.
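A rough sketch of that pre-flight check (the `repo_state` helper and its return values are illustrative, not aider's actual API):

```python
import subprocess

def repo_state(run=subprocess.run):
    """Return 'no-repo', 'dirty', or 'clean' for the current directory.

    `run` is injectable so the check can be tested without a real repo.
    """
    try:
        run(["git", "rev-parse", "--git-dir"], check=True, capture_output=True)
    except (subprocess.CalledProcessError, FileNotFoundError):
        return "no-repo"
    out = run(["git", "status", "--porcelain"], capture_output=True, text=True)
    return "dirty" if out.stdout.strip() else "clean"

# A wrapper could refuse to proceed unless repo_state() == "clean",
# with a --dangerous flag to override when there is no repo at all.
```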


Absolutely! Aider does most of this.

It notices if your local repo is dirty and asks if you'd like to commit before proceeding with the GPT chat. It will even provide a suggestion for the commit message.

You can run aider with --no-auto-commits if you don't want it to commit to the repo. This is similar to your suggested --dangerous flag.

I have considered various magic/automatic branching strategies. But I suspect they would be too confusing. And people probably have their own preferred git workflows. I feel like it's probably better to let folks explicitly manage their branches and PRs however they like.


I agree, sometimes you need to carefully review the changes that GPT suggests.

My aider tool tries to make this easy by leveraging git. While it automatically commits the edits from GPT, it also provides in-chat commands like /diff and /undo. These commands let you quickly check exactly what edits GPT made, and undo them if they're not correct.

Aider will notify GPT if you /undo its changes, and GPT will probably ask why and then try again with your concerns in mind.

To manage a longer chat that includes a sequence of changes, you can also use your preferred standard git workflows like branches, PRs, etc.


Trying to get GPT to generate comments at a particular level really highlights its limitations in my experience. For instance, I couldn't get it to focus on commenting on programming language aspects of the code (or only in a crude way). There's some depth it's lacking - it might be from RLHF, I don't know - but its commenting is like its writing.


Yeah, we had to do a lot of prompt engineering and even then, we still had to clean up the files quite a bit programmatically.

Perhaps GPT-5 and beyond would make this entire process 10X easier :)


Have you tried getting it to write a high level description before reproducing the code with comments? (via either FSL or instructions) Most of the reasoning ability in LLMs comes from them rambling about something and then the attention picking up on the rambling when it needs to generate the conclusion. If you skip that then the output will probably be much less coherent.
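One way to structure that (the prompt wording here is an assumption, not the product's actual prompts) is to build the request in two phases, feeding the model's own description back in before asking for the commented code:

```python
def build_messages(code, description=None):
    """Two-phase prompting: get a high-level description first, then
    feed it back so the comment pass has that context to attend to."""
    ask = f"Describe, at a high level, what this code does:\n\n{code}"
    if description is None:
        # Phase 1: request the description only.
        return [{"role": "user", "content": ask}]
    # Phase 2: replay the description, then ask for commented code.
    return [
        {"role": "user", "content": ask},
        {"role": "assistant", "content": description},
        {"role": "user",
         "content": "Now emit this exact code with helpful comments "
                    "reflecting that description."},
    ]
```

The message dicts follow the common chat-completion shape; swapping in whatever client library you use is straightforward.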


We played around with this for a bit actually. One idea we had was to generate a PlantUML diagram to show how the different components of a file or even a repository connected with one another. However, given the current limitations with GPT context, even when using GPT-4, this quickly became impractical for large files. We would need to leverage an AI with a much larger context length.

That said, perhaps if the entire repository is fed into a vectorised database, a high-level overview would be possible? Just thinking aloud right now and am happy to collaborate with anyone interested in exploring this further!


>> My friends and I were complaining about having to decipher incomprehensible code one day

Your raw sample files [1] are already highly descriptive: filenames, component names, variables, etc. The auto-generated comments add only noise (increasing file size). I would use complex, hard-to-decipher, incomprehensible code to show how these comments bring sense forth. Perhaps let it loose on the Linux kernel source and see what it does - an idea.

Personally, I prefer the doc-as-you-code approach, especially using context-aware naming. When code's cryptic & large, i try to visualise it e.g. Sequence Diagram from code (intellij).

[1] https://swiftstart.notion.site/Sample-files-acfb097bbf214210...


Thanks for the feedback! That's a really good idea and we will definitely give that a shot!

If it provides more clarity, we are trying to tackle this in stages. Can the AI produce helpful and descriptive comments for: 1. Basic + short (less than 100 lines) files? 2. Basic + long (more than 100 lines) files? 3. Complex/vague + long files?

After stage 3, which I believe is what you are referring to, we then hope to explore stage 4: can the AI incorporate programmers' intentions into the comments?


100% I've had this problem in the past, and the functions were way more obscure than something called "get_doc_from_code".

Can it even deobfuscate code?


I entered my email address, selected a file, clicked "Upload", and a popup told me "Please select a file." Nothing I did would get it to ingest a Python file. Tried with both Firefox and Chrome with the same results. Sample Python file was only 29 lines long. What am I missing?


Yep, the tool only seems to be accepting .zip files.

I managed to upload a file successfully by zipping a .js file. The file uploaded successfully, but then when I received the results a few minutes later, the zip file containing the results was empty.


Hey, yeah, this is an issue that we were aware of, and it has since been rectified.

If you are still experiencing this issue, please feel free to reach out to us at swiftstarters@gmail.com

Thanks!


Hey, could you reach out to us at swiftstarters@gmail.com?

Will be happy to help you debug this. Thanks!


I love the idea of using this to more quickly understand otherwise badly-commented or uncommented code.

I'd hate the idea of this leading anyone to avoid commenting their code ("just let the AI do it!"), since comments also need to be about the why and how to use and how not to use -- not just the what.


It would be funny if comments are code in the future, as an AI just translates intentions to code that is invisible.


Hey, that is an interesting lens on the possible future. There is definitely talk about the "singularity" of programming languages, similar to how you can translate from one language to another using the meaning behind the words. In fact, Copilot seems to be a glimpse into that future. Thank you for the comment!


That's kind of my biggest issue with Co-pilot at the moment... it can turn comments into code, but the amount of effort required to produce the comments is greater than the effort required to write the code.


It's both distressing and amusing to me that so many people think software engineers spend most of their time coding.


In that case, would our web application help to solve your problem? :) Specifically, you write the code, we write the comments for you.


It might be… I would say 95% of code doesn’t need comments. But then you encounter that one complicated function and you go wtf? So I think I would prefer something more targeted. Like a vs code extension that lets you summarize any code snippet.


Landing page should show example input/output. You're asking for too much up front. All the best!


Thanks for the feedback! The web application has been updated with a link to some sample files accordingly!



