Hacker News | adamzwasserman's comments

I enjoyed the deep dive. A lot of sensible advice. Few articles get as much of this right as this one does.

Anyone who also enjoyed it would probably get a kick out of my article on the same subject that goes into the regex (which has some valid use cases): https://hackernoon.com/on-the-practicality-of-regex-for-emai...


Do you need photos or paper?

pretext alleged bid to fund the war accustomed to constant unpredictability hoarding stashing useless bricks odyssey record high

Hide your kids, hide your wife, and hide your husband.

Nice idea, but most of the hiring managers I know have very little say over who gets in the door.


No, it is not too soon, but it is contingent upon understanding how factories actually work, and what distinguishes industrial production from artisanal production. Even AI-generated code is artisanal.

As far as I know, my book "The Chaos Factory" is the only book that does this.


This answer seems very vague, buzzwordy and promotional. Not sure if a book from 2019 could explain the issues from the post.


It is obviously promotional, I wrote the book and I want people to read it.

The reason why I wrote a book is that it is a very long and complex answer. I will see if I can summarize it for you adequately in a HN post.

I start by building the case that corporate IT failure is a real and measurable long-running crisis. I cite Standish Group's 1994 Chaos Report and KPMG's Report on IT Runaway Systems: only a third of corporate IT projects are deemed successful by the executives who fund them. PMI estimated $300B was lost in the US to failed IT in 1999–2001 alone. Decades of methodologies and magic bullets haven't moved the number.

I explain how enterprise software is still produced by hand. Every successful project depends on master programmers mentoring less-experienced ones. Per Bob Martin, programmer headcount doubles every five years, so half the industry always has under five years' experience: "perpetual inexperience."

I explain why prior fixes failed: Project management responded to the 1994 reports by trying to freeze requirements, which killed IT's ability to track changing business needs. The COTS (Commercial Off The Shelf) wave collapsed against unique corporate processes. Industrialization attempts—CASE, CORBA, COM/.NET, MDA—all targeted the wrong thing: writing code.

I move to making the case that the correct target is assembly of code. Per Alexander Stepanov, only about 10% of a developer's time goes to creative coding; the other 90% is "glue, fitting, and assembly" — wiring up libraries, reconciling versions, plumbing data between layers. Auto design is still artisanal; what got industrialized was assembly, via machine tools, interchangeable parts, and semi-skilled labor.

Then I explain how to industrialize the assembly, not the coding. Compilers and JIT VMs are the robots IT already has; class libraries and open-source packages are its interchangeable parts. The FANGs already build internal mass-production tooling on top of these. Corporate IT must do the same so master programmers can leverage semi-skilled assemblers, because we are not going to magically, exponentially increase the number of insanely great programmers.

The book was written eight years ago, so I will address AI coding here: LLMs are trained using a technique called gradient descent, which is a way of finding the most mid possible point of anything, and we then assign the highest weights to that. LLMs write mid code by design and this is unchangeable. Any attempt to write insanely great code using an LLM is an unholy uphill battle of fighting the gradient descent every step of the way.
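
To make the "most mid possible point" intuition concrete, here is a toy sketch. It assumes a one-parameter model and a squared-error loss, both chosen purely for illustration; real LLM training uses a next-token loss over billions of parameters, so this shows only the simplest case, where gradient descent pulls a single constant toward the average of the data. The function name `fit_constant` and the sample values are made up.

```python
def fit_constant(values, lr=0.1, steps=200):
    """Fit one constant c to minimize mean squared error by gradient descent."""
    c = 0.0
    n = len(values)
    for _ in range(steps):
        # Gradient of (1/n) * sum((c - v)^2) with respect to c.
        grad = 2.0 * sum(c - v for v in values) / n
        # Step against the gradient to reduce the loss.
        c -= lr * grad
    return c


samples = [1.0, 2.0, 3.0, 10.0]  # hypothetical data with one outlier
fitted = fit_constant(samples)
mean = sum(samples) / len(samples)
print(fitted, mean)
```

In this toy setting the fitted constant ends up at the mean of the samples, outlier included. Whether that carries over to what a full-size model writes is the argument above, not something the sketch proves.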


Too bad the pope frames modern slavery almost entirely through the lens of the digital economy. Chattel slavery is alive and thriving in Mauritania, the Gulf, Xinjiang, and elsewhere.

None of it dependent on AI, none of it apparently worth its own treatment in a 38,000-word document.


It is very disappointing that the article does not point out that decommissioning old centers in favor of new ones will improve energy efficiency by 2x and compute-per-watt at the silicon level by 20–30x.

Even the most fervent eco-warrior should be demanding we build new data centers and decommission old ones as fast as we can.

So tired of all the bullshit around this.


It's a good point, but one I deal with in a follow-up article. The key here is to start the conversation, not hit every point ;-)


Ah, thx for the reply.


People need to cope with the fact that no thought is original. Even Newton and Leibniz were having the same thoughts at the same time. Get over it.


When did the last original thought happen then? Clearly thoughts must have been original at some point, or there wouldn't be any at all


When did the first homo sapiens exist? Ideas, like species, evolve. Saying there are no original ideas seems to me an attempt to glibly capture something quite fundamental.


Hi dmoose, your handle looks familiar to me. The non-glib answer is that we should give some very serious consideration to the possibility that language either functions like, or possibly is the same as, Jung's collective unconscious: the organically created repository of all of humankind's cognition and reason, accumulated over vast periods of time, deposited by billions of humans.

My way of "giving this serious attention" is through pre-registered, falsifiable, repeatable experimentation, which anyone can look up on osf.io because I use my real name. I'll bet you that none of the randos in this thread do as much.

To all of the randos: unless you have data... it is just an opinion.


> unless you have data... it is just an opinion

Glib as well, but this one hits home a lot harder. Well said.


I don't disagree with your premise, but I'd argue that saying "there are no original ideas" in the context of a discussion of plagiarism is needlessly reductive. Even though I think I mostly agree with the author here, I think there are legitimate counterarguments that can be made; equating all of the ways someone can cite or build upon an idea with copying something word-for-word and claiming it's your own is not one of them though.


No offense, but you sound like someone who has never built a language model. Anyone who has actually built one understands that there is no copying going on. Just predicting words (tokens actually).

The problem is that people's words are MUCH more predictable than they would like to believe. And that truth upsets them.

In addition to having created models, I also write books and articles. Probably more than most people commenting here. I have a firm grip on what actual copyright law is and the pros and the cons of it.


> No offense, but you sound like someone who has never built a language model. Anyone who has actually built one understands that there is no copying going on. Just predicting words (tokens actually).

> The problem is that people's words are MUCH more predictable than they would like to believe. And that truth upsets them.

I'm not offended. I do think it's a little weird that you seem to think "training on a bunch of stuff that includes a set of words" and then "predicting" those words exactly is somehow okay because theoretically it might be extrapolating the exact same words from combining other ones. I'd argue that if a model trains on data, and then reproduces exactly a large subset of that data, the bar should be pretty high to prove that it's not copying, and "you don't understand because you didn't implement this" is not a good basis for law.

> In addition to having created models, I also write books and articles. Probably more than most people commenting here. I have a firm grip on what actual copyright law is and the pros and the cons of it.

I'm not convinced you have a firm grip on the idea that no matter how smart you may be, "just trust me bro" is a pretty terrible strategy if you're actually intending to convince anyone of anything. If that's not what your goal is here, it's not clear why it's worth your time to respond to other people's comments when you clearly have so many other productive ways to spend your time.


You seem to discount the possibility that ideas are emergent and as they emerge, multiple people at once become aware of them.

I am asserting it is Charles Fort's "steam engine time". Far from a crank position. It is one that bears serious consideration.


No, I'm saying that simultaneous discovery and plagiarism are not philosophically incompatible, and treating them as equivalent is hard to take seriously.


Did those original thoughts not build upon all the original thoughts that came before them?


Is my house a copy of the dirt it's on top of? Did the people who built my house build the dirt? There's a difference between "building upon" an idea and trying to claim you built the idea itself


Sure they build upon them, you still need to add your 1% of original insight. There was a first person to realise that you could make fire by rubbing two sticks together.


Technically one of {Newton, Leibniz} was first, but you're missing GP's point


No, I think I just find it reductive. The fact that some ideas are independently thought by multiple people does not feel like a compelling argument for normalizing copying someone else's work verbatim and trying to pass it off as your own.


I'm not sure where I land - is it just information compiled like humans do at scale or is it different? But I am sympathetic to your idea.

I think it points to an interesting trend either way. People are less tolerant of machines. Failures of machines are reviled because of their nature, even when the overall harm is less than with humans. For example, self-driving cars. If self-driving cars halve traffic deaths from reckless driving but occasionally mow over a family of four in broad daylight for no apparent reason, society will overwhelmingly reject the technology.

Basically, I don't think people will ever be satisfied even if we prove "it's just doing the same thing we are." It's going to be held to a higher standard.


Having an original thought is in no way related to breaking copyright laws.

I don't think we should "get over" the fact that modern SOTA models couldn't exist without being trained on protected works.


I'm trained on protected works. Do I need to pay royalties?


If you produce them verbatim or in significant enough portions, yes.


> I'm trained on protected works.

That someone, at some point, paid for.

I'd like to understand why I can't use a song in one of my videos without permission/payment, but an AI company can train models using that song without having either.

I'm not anti-AI. I'd just like to see companies play by the rules everyone else has to follow.


> I'd like to understand why I can't use a song in one of my videos without permission/payment, but an AI company can train models using that song without having either.

Because training isn't redistribution.

You can also listen to the song and make a new one that sounds similar, just like the AI can.


To do that training, you must first obtain the item with the content you require. Did OpenAI purchase a copy of every book they trained their models on?

Answer: They did not. That is literally why there are dozens of lawsuits in progress.


For songs, it's not that hard to legally get access to them, I think. I'm not sure if Spotify can legally prevent you from using songs for AI training, for example.


> I'd like to understand why I can't use a song in one of my videos without permission/payment, but an AI company can train models using that song without having either.

Because when you say you are “using” the song, what you mean is that you are distributing copies of the song, which is protected by copyright.

When AI companies train on the song, the model is learning from it. Outside of the rare cases of memorisation, this is not distributing copies and so copyright doesn’t have any say in the matter.

Learning isn’t copying, so copyright doesn’t get involved at all.


I appreciate your comment, but you answered as if this question had been answered legally. It has not.

The New York Times is suing both OpenAI and Microsoft for copyright infringement. The Authors Guild is suing OpenAI. Getty Images is suing Stability AI. Disney is suing Midjourney. Universal Music Group and Sony have filed suits against multiple AI companies.

> so copyright doesn’t get involved at all.

The dozens of ongoing cases discredit that statement.


Which statement of mine do you think is not settled law? Which law do you think is being broken and how?

Your objection doesn’t make sense. In the event that an AI company loses a lawsuit for copyright infringement based on simply training on copyrighted works, the answer to you saying you’d like to understand why they can do it and you can’t is simply “your premise is wrong; neither of you can”.


> Which statement of mine do you think is not settled law?

I object to your statement that "copyright doesn’t get involved at all" when that is objectively untrue. If that was true, many of the world's largest companies wouldn't be spending tens of millions of dollars to have that question answered in court. Go to any law-focused forum, and you will find attorneys arguing over these questions.

To train a model using a book, you must first obtain a copy of that book. Did OpenAI purchase a copy of every book not already in the public domain used during training? They did not.

Some of the suits I mentioned claim that OpenAI literally stole copies of books to train its models.

My point is that the copyright question has not been answered. If the NYT et al. win, it will be a watershed moment for how AI companies pay for training data moving forward.


> I'd like to understand why I can't use a song in one of my videos without permission/payment, but an AI company can train models using that song without having either.

You're right, it's an unjust situation. And you may note that no one else besides the AI companies has made any progress at all towards changing it.

Copyright will soon die, having outlived its usefulness to society. Whether the knife is held by someone named Stallman or someone named Altman is of little consequence.


I submit most of the replies to my original reply as proof that there are no original thoughts.


Should we hold machines to the same standards as humans though?


OK, and the AI labs are open sourcing their frontier models since those are not original either. Right? RIGHT?


Why post comments then?


same reason we do anything else - sweet, sweet dopamine


For funsies


Why post comments then?


Because some thoughts can, actually, be original? Or relatively original enough? Or simply pertinent and timely?


to bring attention to certain ideas


reiteration is still important


I've noticed that AI has caused this narrative to become more popular. "Nothing is original anyway, so why bother?" That's pure cope and you know it. A deep insecurity masked as bold truthtelling.


I think you're right, the ease with which AI can do tasks that we previously considered unique to human creativity does force us to further rethink and acknowledge how creativity is in large part about "remixing" prior works, although of course we've had discourse about this since at least as early as Richard Simon's 1678 "Critical History of the Old Testament", which identified it as being a remix of earlier sources [0].

[0] https://archive.org/details/hisyo00simo/page/n1/mode/2up


Nono, actually there are no thoughts. Every utterance is just a copy of a previous utterance plus a slight random mutation. (somewhat /s)


I know I did done:

https://news.ycombinator.com/item?id=46240221

I built a Chrome extension to solve a problem I kept having: losing track of conversations on HN. The threads page is a firehose. Someone replies to your comment, you miss it, the conversation dies. Or you revisit a thread and can't remember which comments you've already read.

HN Reader does three things:

1. Hides stories you've seen – Checkbox next to each story. Check it to dim. Helps filter the front page to stuff you haven't looked at yet.
2. Collapsible comments that remember – Click the arrow to collapse a comment thread. Come back later and it stays collapsed, unless someone added a new reply, then it auto-expands with a "NEW" badge.
3. Highlights your conversations – On your threads page, badges show "you", "replied to you", and "you replied" so you can instantly spot active conversations.
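
For the curious, the "collapse and remember" rule in item 2 can be sketched roughly as below. This is a language-neutral illustration written in Python, not code from the repo: the names `CollapseRecord` and `render_state` are hypothetical, a plain dict stands in for the browser's local storage, and comparing reply counts is only one assumed way to detect a new reply.

```python
from dataclasses import dataclass


@dataclass
class CollapseRecord:
    # Number of replies under the comment when the user collapsed it.
    reply_count: int


def render_state(saved, comment_id, current_reply_count):
    """Return (collapsed, show_new_badge) for one comment on page load."""
    record = saved.get(comment_id)
    if record is None:
        # Never collapsed by the user: show it normally.
        return (False, False)
    if current_reply_count > record.reply_count:
        # A reply arrived since the collapse: expand and flag it as NEW.
        return (False, True)
    # Nothing new: keep it collapsed.
    return (True, False)


# Hypothetical stored state: comment "c1" was collapsed with 2 replies.
saved = {"c1": CollapseRecord(reply_count=2)}
print(render_state(saved, "c1", 2))
print(render_state(saved, "c1", 3))
print(render_state(saved, "c2", 0))
```

The point of storing a count at collapse time is that the check needs no server: everything it compares is already on the page or in local storage.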

That last one is what made me build this. I was missing replies buried in long threads. Now I just glance at my threads page and the blue "replied to you" badges jump out.

Everything stays in local storage. No server, no account, no tracking. Auto-cleans old data when storage gets full.

GitHub: https://github.com/adamzwasserman/hnreader

Works in Chrome, Arc, and any Chromium browser. Load it unpacked from the repo.

Feedback welcome – especially on what other HN reading problems you'd want solved.

