Hacker News | new | past | comments | ask | show | jobs | submit | blt's comments

What makes this different from linking to a random zip file somewhere?


Microsoft could have used any dataset for their blog; they could even have chosen actual public domain novels. Instead, they opted to use copyrighted works that JK hasn't released into the public domain (unless user "Shubham Maindola" is JK's alter ego).


Rowling is known for using pseudonyms. Maybe she got tired of writing and decided to break into LLM tech.


The licensing: If I steal something and tell you it's free and yours for the taking, that feels different from a fence (knowingly) buying stolen goods. It's obviously semantics and there should have been better judgement from MS, but downloading a dataset (stated as public domain) from Kaggle feels spiritually different from piracy. (E.g., if someone uploads a lesser-known, copyrighted dataset to Kaggle/Hugging Face under an incorrect license, are tutorials that use this dataset a 'guide to pirating' it? To me, that feels like a wrong use of the term.)


The licence?

If it comes from a site claiming it was under a licence when it was not, the misdeed is done by the person who provided the version carrying the licence.


Just because it says "CC0" does not make it CC0. If you upload a dataset you don't have the rights to, any license declaration you make is null and void, and anyone using it as if it had that license is violating copyright.

Even if MS could claim that they were acting in good faith, there really isn't much legal wiggle room for that. But it doesn't even come to that, because I don't think anyone would buy that they really thought the Harry Potter books were under CC0.


If you buy a pirated book on Amazon, you get to keep the book and the pirate printer is the one prosecuted.

Same thing applies here.

Up to 80% of all works that are within their copyright terms are accidentally in the public domain. A well-known example is Night of the Living Dead. It is not your job to check that the copyright on a work you use is the correct one.


The only reason you get to keep the book is because nobody bothers to enforce the law; this doesn't make it legal.

And it is your job to check that you have the rights to use other people's work. Ignorance is not a defence.


>the law

Which ones? As far as I was aware, it's a crime to redistribute copyrighted works, not to receive them.


Copyright Act 1968 (Australia), s 116.


Section 116 (2) A plaintiff is not entitled by virtue of this section to any damages or to any other pecuniary remedy, other than costs, if it is established that, at the time of the conversion or detention:

(a) the defendant was not aware, and had no reasonable grounds for suspecting, that copyright subsisted in the work or other subject-matter to which the action relates;

(b) where the articles converted or detained were infringing copies--the defendant believed, and had reasonable grounds for believing, that they were not infringing copies; or

(c) where an article converted or detained was a device used or intended to be used for making articles--the defendant believed, and had reasonable grounds for believing, that the articles so made or intended to be made were not or would not be, as the case may be, infringing copies.

Does this not mean the opposite of your claim? It sounds to me that if you unwittingly bought a dodgy copy of something, the law thinks the copyright owner can get you to pay for a legit copy, but not punish you for your mistake.

In the specific case of the Harry Potter works, their fame might meet the threshold of reasonable grounds for suspecting, but noosphr's argument that "up to 80% of all works that are in copyright terms are accidentally in the public domain" could grant reasonable grounds for believing they are not infringing copies.

This is one of those things that causes interesting court cases, because reasonable grounds for believing X is not the same thing as no reasonable grounds for believing not-X. Reasonable grounds for suspicion probably carries more weight here than reasonable grounds for the absence of suspicion, but cases have hung on things like this before, like the presence or absence of an Oxford comma.


Australia doesn't have fair use either. Who cares what a country smaller than California in population and economy does?


Oh come on. The licence was obviously incorrect, and you can't escape culpability because of that.


The 'artwork' they generated and the text on the blog post?


And their predictions about Go were wrong, because they thought the algorithm would forever be α-β pruning with a weak value heuristic


An underrated aspect of Matlab is its call-by-value semantics: function arguments are (semantically) copied by default. Python+NumPy passes object references instead, so in-place mutations of array arguments are visible to the caller. This creates a big class of bugs that is hard for non-programmers to understand.
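A minimal sketch of the difference (function names here are my own, hypothetical examples):

```python
import numpy as np

def scale_in_place(a):
    # NumPy arrays are passed by reference: this mutates the caller's array.
    a *= 2
    return a

def scale_copy(a):
    # Copy first to get Matlab-like call-by-value behavior.
    a = a.copy()
    a *= 2
    return a

x = np.array([1.0, 2.0, 3.0])
scale_in_place(x)
print(x)  # [2. 4. 6.] -- the caller's array changed

y = np.array([1.0, 2.0, 3.0])
scale_copy(y)
print(y)  # [1. 2. 3.] -- the caller's array is untouched
```

In Matlab, the second behavior is the default (with copy-on-write under the hood), which is exactly what makes it forgiving for non-programmers.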


Clarity in writing comes mostly from the logical structure of ideas presented. Writing can have grammar/style errors but still be clear. If the structure is bad after translation, then it was bad before translation too.


Yes. Generative AI is bad. Most of the general population already realizes it. Only the tech and computer science bubble remains optimistic.


The (non-tech) industry I am in generates an enormous amount of text that it's fairly certain nobody reads past the executive summary.

My workmates love it. Amongst the tech community, I see a divide very similar to the crypto one - everybody who has a stake in it succeeding is very optimistic. Everybody working in other areas seems dubious at best.


Yeah, overall it feels somewhat useful in a work context. Nothing transformative though.


Generating huge amounts of stylistically awful ultra-verbose nonsense isn’t actually useful in any industry other than the ultra-low-end journalism/blogspam space. Either no-one reads it anyway, in which case it’s more or less a wash, or someone actually has to read it, in which case it’s a productivity drain.


It's a productivity boost for the people who have to generate the text that nobody reads. At an organisational level it's a wash, caused by requiring that text in the first place.


Literally every conversation that I am in or hear about AI outside of tech circles (and most of them inside) is negative.

Is there any technology in recent memory where the general public has turned so universally against it?


I don't think any tech has ever been pushed so hard in front of people so fast. People hated Facebook for ages, but you could just not use it, while these AI features are shoved in your face constantly. Google sticks them at the top of every search, Reddit sticks generated slop on every page, and every SaaS has rebranded itself as an AI tool with constant popups telling you to use the new AI feature.

Most other crappy tech you could just choose to not use.


Google+ was like this, in that they shoved that stuff into your face constantly with no opt-out, but just on Google.


3D tv maybe? It was never as prominent, but people exposed to it mostly strongly disliked it.

Maybe ‘the metaverse’.


Not that I can think of, but the reason I think it happened for AI so quickly:

AI enshittified way, WAY too quickly.

The thing is that all these tech companies are really just innovating new ways to scam consumers into adopting something that's worse for them. They just subsidize the bad stuff and, eventually, have to start bleeding consumers dry.

Uber is now more expensive than taxis, AirBNB is more expensive than hotels, placing an order online is more inconvenient than calling, and on and on. But it took decades for this to transpire. For a long time, these new things were actually better.

But AI was pushed so hard, so severely, that it became enshittified way too quick. And consumers are already on guard after seeing tech A-Z slowly make their life worse.


Web3/crypto?

Ad tech?


What bubble are you living in where it's universal and not mostly apathetic?


I don't think that this is true. I'm a high school student, and I've overheard quite a few conversations among our admins about whether they prefer Microsoft Copilot or ChatGPT.


That may not be because they like it but because they're required to use it. The teachers in my area, at least, are mandated to use AI themselves and integrate it into their curriculum.


I think people in the cs bubble will be optimistic up until it directly affects them by destroying software engineering as a (good) career.


That's already happening for juniors: a combination of AI and less funding.


It's already happening for all the seniors who got hit by the layoffs ("because AI make code go brrrr") in these past two years, too.


Yet non-tech people are paying for ChatGPT which blows my mind.


What do you mean by tech and compsci bubble? Many of the software engineers I interact with don’t seem all that optimistic or positive about the AI tools. There are bubbles for either side I think.

But I’m one of those who hasn’t had great experiences using them for anything beyond toy projects.. so maybe my bubble falls more on the AI skeptic side.


I think it's the money-generating part of "tech".. they see this "glorified spellcheck", assume endless possibilities that everyone will want to buy, and are busy placing "buy AI now" buttons everywhere.. (well, more like "Try it, try it, you'll be amazed, and if you give us money for the premium version you'll be even more amazed!" buttons)


This needs to be refined: f(x) is O(g(x)) if there exists some X >= 0 such that f(x)/g(x) is bounded for all x > X.

Otherwise, we cannot say that 1 is O(x), for example.
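Written out (my formalization, assuming g(x) > 0 for sufficiently large x), the refined definition is:

```latex
f(x) = O(g(x)) \iff \exists\, C > 0,\ \exists\, X \ge 0 \ \text{such that} \ |f(x)| \le C\, g(x) \ \text{for all } x > X.
```

With g(x) = x, taking C = 1 and X = 1 shows that 1 is indeed O(x), since 1 <= x for all x > 1.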


The definition of big O notation is pure math - there is nothing specific to analysis of algorithms.

For example: "the function x^2 is O(x^3)" is a valid sentence in big-O notation, and is true.

Big O is commonly used in other places besides analysis of algorithms, such as when truncating the higher-order terms in a Taylor series approximation.
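The Taylor-series usage is easy to check numerically (a sketch of my own, not from the thread): for sin(x) = x + O(x^3), halving x should shrink the truncation error by about a factor of 2^3 = 8.

```python
import math

def taylor_error(x):
    # Error of the first-order Taylor approximation sin(x) ~ x.
    return abs(math.sin(x) - x)

# Halving x shrinks the error by roughly 8x, consistent with O(x^3).
ratio = taylor_error(0.1) / taylor_error(0.05)
print(ratio)  # close to 8
```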

Another example is in statistics and learning theory, where we see claims like "if we fit the model with N samples from the population, then the expected error is O(1/sqrt(N))." Notice the word expected - this is an average-case, not worst-case, analysis.
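That O(1/sqrt(N)) scaling shows up even in the simplest estimation problem, the sample mean (a hypothetical sketch, not tied to any particular model class):

```python
import random
import statistics

def mean_estimate_error(n, trials=2000, seed=0):
    # Average absolute error of the sample mean of n Uniform(0,1) draws,
    # estimated over many independent trials.
    rng = random.Random(seed)
    errs = []
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        errs.append(abs(sum(xs) / n - 0.5))
    return statistics.mean(errs)

# Quadrupling n should roughly halve the expected error: O(1/sqrt(n)).
print(mean_estimate_error(100) / mean_estimate_error(400))  # roughly 2
```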


I should have probably been a lot more clear about who the target audience of my post was.


Yes, in most cases the reduction of supervised learning to optimal control is not interesting.

We can also reduce supervised learning to reinforcement learning, but that doesn't mean we should use RL algorithms to do supervised learning.

We can also reduce sorting a list of integers to SAT, but that doesn't mean we should use a SAT solver to sort lists of integers.


Awaiting the rocket engine equivalent of the K20.


Tangentially, does anyone know a good way to limit web searches to the "low-background" era that integrates with the address bar, OS right-click menus, etc.? I often add a pre-2022 filter on searches manually in reaction to LLM junk results, but I'd prefer to have it on every search by default.

