Show HN: GitHub "Lines Viewed" extension to keep you sane reviewing long AI PRs (chromewebstore.google.com)
40 points by somesortofthing 23 days ago | hide | past | favorite | 47 comments
I was frustrated with how bad a signal of progress through a big PR "Files viewed" was, so I made a "Lines viewed" indicator to complement it.

Designed to look like a stock GitHub UI element - it even respects light/dark theme. Runs fully locally, no API calls.

Splits insertions and deletions by default, but you can also merge them into a single "lines" figure in the settings.



Sure, it looks neat, but why would you ever want this? What happened to closing PRs like this with a short and simple "This is unreadable. Split it into smaller self-contained commits, and write proper commit messages explaining what they do and why" comment?

Massive walls of code have always been rejected simply for being unreviewable. Why would you suddenly allow this for AI PRs - where you should be even more strict with your reviews?


I'm on the fence about this. Sometimes a new feature needs maybe 2k lines of code split over 10-20 or so files. Sure, you could split it up, but you can't necessarily run the parts in isolation and if they get split over multiple reviewers it might even be that no one reviewer gets the whole context.

So, I'm kind of okay with large PRs as long as they're one logical unit. Or, maybe rather, if it would be less painful to review as one PR rather than several.


I think the particular problem is if AI is just producing large volumes of code which are unnecessary, because the LLM is simply not able to create a more concise solution. If this is the case, it suggests these LLM-generated solutions are likely bringing about tech debt faster than anyone is ever likely to be able to resolve it. Maybe people are banking on LLMs themselves one day being sophisticated enough to do the cleanup - though that would also be the perfect time to price-gouge them.


Agree. We've seen cowboy developers who move fast by producing unreadable code and cutting every corner. And sometimes that's ok - say you want a proof of concept to validate demand and iterate on feedback. But we want maintainable, reliable production code we can reason about and grasp quickly. Tech debt has a price, and it looks like LLM abusers are on a path to waking up with a heavy hangover :)


We hired some LLM cowboy developer externals that were pushing out a plethora of PRs daily and a large portion of our team's time at one point was dedicated entirely to just doing PR reviews. Eventually we let them go, and the last few months for us has been dedicated to cleaning up vast quantities of unmaintainable LLM code that's entered our codebase.

I think it's still early days, and it's probably the case that a lot of software development teams have yet to realize that a team basically just doing PR reviews is a strong indication that a codebase is very quickly trending away from maintainability. Our team is still heavily using LLMs and coding agents, but our PR backlog recently has been very manageable.

I suspect we'll start seeing a lot of teams realize they're inundated with tech debt as soon as it becomes difficult for even LLMs to maintain their codebases. The "go fast and spit out as much code as humanly possible" trend that I think has infected software development will eventually come back to bite quite a few companies.


Yep, it's the early days. Eventually we'll work out something like Design Patterns for Hybrid Development, where humans are responsible for software architecture, breaking requirements into maintainable SOLID components, and defining pass/fail criteria. Armed with that, LLMs will do the actual boilerplate implementation and serve as our Rubber Ducky Council for Brainstorming :)


I'm okay with long PRs, but please split them up into shorter commits. One big logical unit can nearly always be split into a few smaller units - tests, helpers, preliminary refactoring, handling the happy vs error path, etc. can be separated out from the rest of the code.


There really is no benefit to splitting functionality from its tests. Then you just have a commit in the history that is not covered by tests.

Splitting "handling the happy vs error path" sounds even worse. Now I first have to review something that's obviously wrong (lack of error handling), and that would commit code that is just wrong.

What is next, separating the idea from making it typecheck?

One should split commits into the minimum size that makes sense, not smaller.

"Makes sense" should be "passes tests, is useful for git bisect", etc., not "has fewer lines than some arbitrary number I personally like to review" - use a proper review tool to help with long reviews.


Depends entirely on your workflow - we squash PRs into a single commit, so breaking a PR into pieces is functionally identical to not doing so for the purposes of the commit history. It does, however, make it easier to follow from the reviewer's perspective.

Don't give me 2000 lines unless you've made an honest good-faith attempt to break it up, and if it really can't be broken up into smaller units that make sense, at least break it up into units that let me see the progression of your thought as you solve the problem.


> Sometimes a new feature needs maybe 2k lines of code split over 10-20 or so files

I still disagree. Why was the feature not broken down into more low-level pieces? I don't trust that kind of project management to really know what it's doing either.

I am not promoting micromanagement, but any large code review means the dev is having to make a lot of independent decisions. These may be the right decisions, but there's still a lack of communication happening.

Hands off management can be good for creativity and team trust, but ultimately still bad for the outcome. I'm speaking from my own experience here. I would never go back to working somewhere not very collaborative.


This is very much my take. As long as the general rule is a lack of long PRs, I think we get into a good place. Blueskying, scaffolding, all sorts of things reasonably end up in long PRs.

But, it becomes incumbent on the author to write a guide for reviewing the request, to call the reviewer's attention to areas of interest, perhaps even to outline decisions made.


> on the author to write a guide for reviewing the request

I'm not saying that doesn't work, but writing a guide means the author is now doing all the planning too.

A successful "guide" then becomes more about convincing the reviewer. The outcome is either a lot of friction, or the reviewer is just going through the motions and trust is eroding.


That's fine, but such a PR doesn't need to (and actually can't) be reviewed. Or at least it can only be reviewed broadly: does it change files that shouldn't change, does it have appropriate tests, etc.


Maybe AI should not author big and complex features that can't be split up into parts and thus easily reviewed.


I'm reviewing PRs I wrote myself. Valid concern in a real org though.


I don’t understand. Are they AI PRs (as in the title), or did you write them yourself?


This is fundamentally a scaling problem, not a tooling problem. When AI generates PRs that no single person can fully grasp, the question isn't "how do we make reviewing 5,000 lines more comfortable" – it's "who is actually vouching for this code?"

The answer is already deeply embedded in Git's tooling: every commit carries both an author and a committer field. The author wrote the code; the committer is the person who put it into the codebase. With git blame you always know who is to blame – in both senses.

In the age of AI-generated code, this distinction matters more than ever: the author might be an LLM, but the committer is the human who vouches for it.

Disclosure: non-native English speaker, used AI to help articulate these thoughts – the ideas are my own.
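The author/committer split is stock git, visible with plain commands. A throwaway-repo sketch (the names are made-up placeholders):

```shell
# Make a commit where author and committer differ, as when a human
# applies (and vouches for) a change they didn't write themselves.
dir=$(mktemp -d) && cd "$dir" && git init -q .
git -c user.name='Human Committer' -c user.email='human@example.com' \
    commit --allow-empty -q \
    --author='LLM Author <llm@example.com>' -m 'feat: generated change'
# Both fields are recorded; log/blame can show who wrote vs. who merged.
git log -1 --format='author: %an | committer: %cn'
# -> author: LLM Author | committer: Human Committer
```

Note that most merge-button workflows set the committer to GitHub itself, so in practice the "vouching" signal often lives in the merger/approver metadata rather than the raw committer field.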


So who authored your comment?

What would you put into the commit message fields if it were a git commit?


Currently you'd read quite a lot of: "Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>"


Ah, co-authored - but that's a different thing, isn't it?

Tbh I only know it from squashed PRs.

However:

> "Co-author" is not a git concept. It is a convention in commit messages used by some services, including GitHub. So, the solution is to edit the actual commit message with git commit --amend and add a line to the end:

https://stackoverflow.com/a/64311381


It's a door opened by git (interpret-trailers), walked through by GitHub with the Co-authored-by key and UI support; GitLab followed.
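For reference, the git side really is just trailer plumbing - a minimal sketch:

```shell
# git interpret-trailers appends RFC-822-style trailers to a commit
# message; GitHub/GitLab merely assign meaning to the Co-authored-by
# key and render it as a co-author in their UIs.
printf 'Fix race in worker pool\n' |
  git interpret-trailers \
    --trailer 'Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>'
```

The same command works with any key (Signed-off-by, Reviewed-by, ...); nothing in git itself treats Co-authored-by specially.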


But why would you want to review long AI PRs in the first place? Why don't we apply the same standards we apply to humans? Doesn't matter if it was AI-generated, outsourced to Upwork freelancers or handcrafted in Notepad. Either submit well-structured, modular, readable, well-tested code or PR gets rejected.


Related to this, how do you get your comments that you add in the review back into your agent (Claude Code, Cursor, Codex etc.)? Everybody talks about AI doing the code review, but I want a solution for the inverse - I review AI code and it should then go away and fix all the comments, and then update the PR.


What you do is actually read the comments, think about how you can improve the code, and then improve it - whether by telling the agent to do that or by doing it yourself.


There’s a bunch of versions of this out there. This one’s mine, but it’s based on other ones. It works really well. It assesses the validity and importance of each comment, then handles it appropriately, creating issues, fixing the code, adding comments, updating the GH Copilot instructions file, etc.

https://github.com/cboone/cboone-cc-plugins/blob/main/plugin...


I tell claude code “review the comments on this PR” and give it the url, and that’s enough. It then uses the gh cli tool and fetches the PR and individual comments.


I suspect you don't need anything special for this. The GH API has support for reading comments from PRs. Maybe have it maintain a small local store to remember the IDs of the comments it's already read so it doesn't try to re-implement already-implemented fixes. Another similar thing you can do is a hook that reminds it to start a subagent to monitor the CI/autofix errors after it creates/updates a PR.


The GitHub API is actually quite tricky here because there is a difference between "comment" and "review" and "review comment" (paraphrasing, I don't remember the details). So it's not as simple as one API call that grabs the markdown. Of course you can write a creative one-liner to extract what you need, though.


I don't use it, but you can tag @copilot on GitHub comments and it will do so.

I don't do it because the chances of me reviewing vomited code are close to 0.


Reviewing large volumes of code is a problem. In the pre-LLM era, as a workaround to occasionally review large PRs, I used to check out the PR, reset the commits, and stage code as I reviewed it. In the first pass I would stage the trivial changes, leaving the "meat" of the PR that needed deeper thinking for later passes.
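That workflow, sketched as commands in a throwaway repo (file names are placeholders; in a real review you'd reset against origin/main and stage hunk by hunk with `git add -p`):

```shell
# Review-by-staging: reset the PR branch onto its base so every change
# becomes unstaged, then stage changes as you review them.
dir=$(mktemp -d) && cd "$dir" && git init -q .
git -c user.name=dev -c user.email=dev@example.com \
    commit --allow-empty -qm 'base'
base=$(git rev-parse HEAD)            # stand-in for origin/main
git checkout -q -b pr-branch
echo 'renamed helper' > trivial.txt
echo 'new scheduler'  > meat.txt
git add . && git -c user.name=dev -c user.email=dev@example.com \
    commit -qm 'big PR'
git reset -q "$base"    # commits gone, all changes back in the worktree
git add trivial.txt     # first pass: stage the easy stuff as "reviewed"
git status --short      # whatever is left still needs a deeper look
```

What remains unstaged (`git diff`, plus any untracked files) is exactly the part of the PR you haven't signed off on yet.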

With the increased volume of code from agentic coding, what was once occasional is now a daily occurrence. I would like to see new kinds of code review that deal with larger volumes of code. The current GitHub review interface does not scale well, and I am not sure Microsoft has the organizational capacity to come up with creative UI/UX solutions to this problem.


GitHub has seemed entirely uninterested in improving the code review experience (except maybe the stacked-PRs thing, if that ends up shipping) for well over a decade now.

Things that I’d consider table stakes that Phabricator had in 2016 - code movement/copying gutter indicators and code coverage gutters - are still missing, and their UI (even the brand new review UI that also renders suggestion comment diffs incorrectly) still hides the most important large file changes by default.

And the gutter "moved" indicators would be more useful than ever, as I used to be able to trust that a hand-written PR that moves a bunch of code around generally didn't change it, but LLM refactors will sometimes "rewrite from memory" instead of actually moving, changing the implementation or comments along the way.


You review long PRs by checking out the branch, git reset, then stage hunks/files as you review them. Reviewing long PRs in GitHub UI is never sane.


Or you just view each commit separately, assuming the author made reasonable commits.


Can I get an AI that automatically nitpicks AI PRs with the goal of rejecting them?


I built (using AI) a small CLI that provides a breakdown of the changes in a PR across docs, source, tests, etc.

https://github.com/jbonatakis/differ

It helps when there’s a massive AI PR and it’s intimidating…seeing that it’s 70% tests, docs, and generated files can make it a bit more approachable. I’ve been integrating it into my CI pipelines so I get that breakdown as a comment on the PR
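Not the linked tool, but a rough git-only approximation of that kind of breakdown (the bucket patterns are guesses you'd tune per repo):

```shell
# Classify "git diff --numstat" lines (added<TAB>deleted<TAB>path) into
# rough buckets. Feed it real data with:
#   git diff --numstat main...HEAD | classify
classify() {
  awk '
    $3 ~ /(^|\/)tests?\//  { tests += $1 + $2; next }
    $3 ~ /\.(md|rst|txt)$/ { docs  += $1 + $2; next }
                           { src   += $1 + $2 }
    END { printf "src:%d tests:%d docs:%d\n", src+0, tests+0, docs+0 }'
}
# Sample numstat output standing in for a real diff:
printf '120\t4\tsrc/worker.go\n800\t0\ttests/worker_test.go\n40\t0\tREADME.md\n' | classify
# -> src:124 tests:800 docs:40
```

Even this crude split makes a 2,000-line PR feel different when you can see most of it is tests and docs.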


What about data security - does it send code to any servers, or does it work entirely client-side?


It makes same-origin requests to GitHub's frontend to fetch info about line counts (line-count figures are only sometimes loaded into app state) - those are the only network calls it makes.


Cool thanks for the clarification


Care to open source it? I'd like to use it in Firefox, will send a PR.



the lines-viewed thing is cool for tracking progress but honestly the bigger problem I keep running into is that AI PRs look perfectly reasonable line by line. like the code reads fine, passes linting, tests pass - and then you realize it introduced a dependency you didn't need or there's a subtle auth bypass because the LLM pattern-matched from some tutorial that didn't handle edge cases. splitting into smaller commits helps but doesn't fully solve it because each commit also looks fine in isolation. I think we need better tooling around semantic review - not just did you read every line but did anyone actually verify the security properties didn't change. been spending a lot of time on this problem lately and tbh the existing static analysis tools weren't built for this pattern at all, they assume a human wrote something wrong not that an AI wrote something plausible but subtly broken


[dead]


I think you're right about a chunk of these cases but honestly I've also seen experienced devs do the same thing. like senior people who absolutely can write the code themselves but use AI to go faster and then skip the careful review because they trust their own judgment - they figure if they prompted it right the output is probably fine. and sometimes it is fine. but the failure mode is different from what you're describing, it's not illiteracy it's overconfidence. they know enough to think they'd catch a problem but the AI generates something that passes their mental model without triggering any alarms. the auth bypass example I mentioned - that was from someone who'd been writing auth code for years, they just didn't expect the LLM to quietly drop a check that was in the original code they were refactoring. so yeah the desperate-to-hide-illiteracy crowd is real and a problem but I think the more dangerous version is competent people who stopped being paranoid because the code looks right


[dead]


I don't think it proves the rule though, I think it's two completely separate failure modes that happen to look similar in code review. the illiterate crowd submits AI code because they can't write it themselves - sure. but the experienced crowd submits AI code because they wrote a good prompt and the output looked reasonable and they moved on to the next ticket. the second group is harder to catch because their PRs have the right structure, reasonable variable names, comments that make sense. you're not gonna flag it the way you'd flag someone who clearly doesn't understand what a middleware chain does. idk maybe I'm wrong about the proportions but in the codebases I've worked on the scary bugs came from people who should have known better, not from people who never knew in the first place. the illiterate ones get caught in review. the competent ones get a rubber stamp because everyone trusts them


[dead]


yeah I think we actually agree on the volume part, I'm not disputing that. my point is more about which group causes the bugs that make it to production. the garbage PRs from the illiterate crowd - those get caught. someone submits a PR where the error handling is clearly copy pasted from a chatbot and the variable names are arg1 arg2, that's an easy reject. but when your senior engineer submits something that looks clean because they prompted well and skimmed the output, that sails through review. I've literally seen a race condition introduced this way that sat in prod for weeks because the PR looked like something that person would write. so yeah the volume problem is real but I think it's a distraction from the harder problem


Was this vibe coded? Did you test it on itself?


Just autoclose any AI PRs.


LGTM



