theamk's comments (Hacker News)

Wrong link? That's the AWS front page, and it has no references to space for me.

> Unless this are enterprise disks with capacitors anything can happen when it suddenly looses power. Not the FSes fault.

Most filesystems just get a few files/directories damaged, though. ZFS is famous for handling totally crazy things like broken hardware that damages data in transit. ext4 has no data checksums, but at least fsck will drop things into the lost+found directory.

The "making all data inaccessible" part is pretty unique to btrfs, and lets not pretend nothing can be done about this.


What kind of "education overhaul" do you have in mind? Some things can be easily verified in class (run a mile), but some require effort (a written exam in class / testing center), and some require too much effort to be practical (a multi-day research or programming project).

Unfortunately, at the high school level the materials are not that complex, and there are a lot of ways to cheat: answer keys for textbooks, graphing calculators (or CAS systems), reports copied wholesale from websites. AI just made all of this significantly worse.


That's an answer I don't have, and am not qualified to give. I'd defer the decision to teachers and those who already work well with children; I just don't think this current iteration works.

If I had to guess, it would look something like software that confines the student to the software and provides interactive lessons and exams... but I'm a computer guy, so my answer will always be "use a computer".


"confines the student to the software and provides interactive lessons and example" - this already exists. It is also useless without continuous supervision, as students will simply take a 2nd device (cell phone or tablet), start LLM app on it, then point to locked-down device's screen and ask to solve the problem. Yes, it slows down the process a bit since the students have to actually re-type the LLM answers instead of copy-pasting them, but it does not eliminate the problem.

"That's an answer I don't have .. I'd defer the decision to teachers" - you are really sounding right now like someone who comes to a town's discussion of whether to get more solar panels, and starts saying how nice it would be if the fusion were solved, and we all had an near-infinite source of cheap and clean energy. Yes, it would be nice, but unless you have a good idea on how to achieve this, please don't distract people from the real problems they have.

AI-in-education is the same way: there is a crisis right now, and it seems the only way out is to lean heavily on proctored exams - which students hate, and which are more expensive for schools too. Saying "there should be a better way, I have no idea what this better way is, but meanwhile what you are doing is bad" really does not help much.


Another example of AI Psychosis[0]?

https://news.ycombinator.com/item?id=45027072


Original source of their troubles:

"Delve – Fake Compliance as a Service" https://news.ycombinator.com/item?id=47444319


As long as you clearly document that the incoming data is going to be modified, it's not a problem. And in a lot of cases, the data either comes from the network or is read from a file - so the buffer is going to be discarded at the end anyway... why not reuse it?

And yes, today it would be easier to make a copy of the data... but remember we are talking about the '90s, when RAM was measured in megabytes and your L1 cache might have been only 8KB or so.


x86 had 6 general-purpose working registers total. Using length + pointer pairs would have caused a lot of extra register spills.

“Sure your software crashes and your machines get owned, but at least they’re not-working very fast!”

Right. This is so often the excuse for terrible designs in C and C++. "But it's faster" - no, it's just wrong; being faster only matters for correct answers. If just any answer were fine, there'd be no need to write any of this software.

The first common 32-bit system was Win 95, which required 4MB of RAM (not GB!). The 4-byte prefix would have been considered extremely wasteful in those times - maybe not for a single string, but any time a list of strings was involved, such as a constants list. (As a point of reference, Turbo Pascal's default strings still had a 1-byte length field.)

Plus, C-style strings allow a lot of optimizations - if you have a mutable buffer with data, you can make strings out of it with zero copies and zero allocations. strtok(3) is an example of such an approach, and I've implemented plenty of similar parsers back in the day. INI, CSV, JSON, XML - query file size, allocate buffer once, read it into the buffer, drop some NULL's into strategic positions, maybe shuffle some bytes around for that rare escape case, and you have a whole bunch of C strings, ready to use, and with no length limits.

Compared to this, Pascal strings would be incredibly painful to use... So you query the file size, allocate, read it, and then what? A 1-byte length is too short, and for a 2+ byte length you need a secondary buffer to copy each string into. And how big should that buffer be? Are you going to dynamically resize it, or waste some space?

And sure, _today_ I no longer write code like that. I don't mind dropping std::string into my code; it's just a meg or so of libraries and 3x overhead for short strings - nothing, these days. But back when those conventions were established, it was really, really important.


> First common 32 bit system was Win 95

We're just going to ignore Amigas, and any Unix workstations?


> query file size, allocate buffer once, read it into the buffer, drop some NULL's into strategic positions, maybe shuffle some bytes around for that rare escape case, and you have a whole bunch of C strings, ready to use, and with no length limits.

I have also done this, but I would argue that, even at the time, the design was very poor. A much better solution would have been fat pointers: pass around the length of the string separately from the pointer, much like string_view or Rust's &str. Then you could skip the NUL-writing part.

Maybe C strings made sense on even older machines with severely limited registers: if you have an accumulator and one register usable as a pointer, you want to minimize the number of variables involved in a computation.


> zero copy and zero allocations

This is a red herring, because when you actually read the strings out, you still need to iterate through each string to find its length: zero copies, zero allocations, but still linear work.

> query file size, allocate buffer once, read it into the buffer, drop some NULL's into strategic positions, maybe shuffle some bytes around for that rare escape case, and you have a whole bunch of C strings, ready to use, and with no length limits.

I write parsers in a very different way—I keep the file buffer around as read-only until the end of the pipeline, prepare string views into the buffer, and pipe those along to the next step.


I don't see what's "red herring" about it - for a reasonable format, any parsing will normally be O(n), so all we can do is decrease the constant factor.

_Today_ I write parsers in a very different way as well; copying strings is very cheap (today) and avoiding it is not worth the extra complexity.

But remember, we are talking about the past, when those conventions were being established. Back in the '90s, zero copies and zero allocations were a real advantage - not in the theoretical CS sense, but in a very practical one. Remember there was _no_ "dynamically resizing vector" in C's (or Pascal's) stdlib; it's just raw malloc() and realloc(), and it is up to you to assemble a vector from them as needed. And free()/malloc() overhead was non-trivial, so you had to reuse and grow buffers yourself. And if you wanted to store the parsed data, storing a separate length would double your index size! So a parse-in-place + null-terminated-strings approach would give you both smaller code and a smaller runtime, at the expense of a few sharp corners. But we were all running with scissors back then.


I think the concern was conserving memory (which was scarce back then), not iterating through each substring.

I am very sceptical about that. Much safer and cleaner languages like ML and Lisp were contemporaries of C, and were equally developed on memory-scarce hardware.

Maybe on the high-end machines in some fancy lab somewhere?

All I saw were 386s and 486s, and I am pretty sure every piece of software I ever used was written in either C, Turbo Pascal, or straight assembly. In the mid-90s Java appeared, and I remember how horribly slow those Java apps were compared to C/Pascal code.


They were also comparatively slow, no? And their runtimes used up much more of that scarce memory than a C program did.

But does it even conserve memory? Copying a string when you have the length is 2 bytes of machine code on x86 (rep movsb).

Remember, code takes up memory too.


How do you drop nulls in the middle of a string without requiring O(N) extra space to restore the original characters?

Besides my DS/Algo classes in college, I've never used C seriously. And you know, it's semantics like this that really make me go WTF lol...

From strtok man page... "The first time that strtok() is called, str should be specified; subsequent calls, wishing to obtain further tokens from the same string, should pass a null pointer instead."

Really? A null pointer... This is valid code:

  char str[] = "C is fucking weird, ok? I said it, sue me.";
  char *result = strtok(str, ",");
  char *res = strtok(NULL, ",");
Why is that ok?

You have to understand the context and the time period. Memory and CPU cycles were precious. Computers being networked 24/7 wasn't a thing yet, so security wasn't much of a concern. API design tended to reflect that.

Not mentioned in my initial comment, but yeah, I'm viscerally aware of the effect the time period and the resources of the time had on API design in C and other languages from that era.

The null pointer in place of the operand here just seemed like a really good quirk to point out.


It's like this because the 1970s C programmer, typically a single individual, was expected to maintain absolute knowledge of the full context of everything at all times. So these functions (the old non-re-entrant C functions) just assume you - that solo programmer - will definitely know which string you're currently tokenizing and would never have, say, a subroutine which also needs to tokenize strings.

All of this was designed before C11, which means that, hilariously, it was actually always Undefined Behaviour to write multi-threaded code in C. There were no memory ordering rules in the language yet, and if you write a data race (how could you not, in multi-threaded code) then the Sequentially-Consistent-if-Data-Race-Free proof (SC/DRF) does not apply, and in C all bets are off if you lose Sequential Consistency.† So in this world that's enough: absolute mastery, and a single individual keeping track of everything. Does it work? Not very well, but hey, it was cheap.

† This is common and you should assume you're fucked in any concurrent language which doesn't say otherwise. In safe Rust you can't write a data race so you're always SC, in Java losing SC is actually guaranteed safe (you probably no longer understand your program, but it does have a meaning and you could reason about it) but in many languages which say nothing it's game over because it was game over in C and they can't do better.


GNU Parallel's citation request was always weird - the author IMHO goes too far in requiring each user to type "will cite" the first time they use the app, even if the context is entirely non-academic. Of course the author has the right to ask for anything, but I think that's too much; plus, parallel is not nearly as complex as other tools people use, like compilers, math libraries, or the Linux kernel - and those do not require citations.

I suspect that the main reason people use "parallel" is that it was the first to grab this very nice, very obvious package name. "rush" and "gargs" and "pexec" simply do not have the same obviousness to them.

If you think asking for citations is too much (and I do), I recommend using something else. Yes, "parallel" does have some nice features, and sometimes you might need to use an extra command or two, but it is worth it for the peace of mind - plus it respects the author's wishes too.


Yes, I agree. It's been too small of an issue for me to care much (I have `~/.parallel/will-cite` set by my dotfiles repo, so I wouldn't even see the prompt on a new machine), but now I've switched to `rust-parallel`.

I picked that one because it was supposedly the fastest, I liked the GitHub page, and I will remember the name :) And I guess I was hoping for it to be a drop-in replacement for the `parallel` interface, which it turned out not to be, but my needs are quite minimal. I used to do:

ls | parallel 'echo {} && git -C {} fetch --prune --all'

Now I do:

( for i in $(ls); do echo git -C $i fetch --prune --all; done ) | rust-parallel -p


There is a lot of wishful thinking here that I do not believe to be true.

For Amazon retail, the intent is not to "return what someone paid to show you" but rather to "maximize income for Amazon". I am sure they run A/B tests with different amounts of ads vs. organic placement. The moment profits from placement ads displace profits from fulfillment/referral/closing fees by too much, the ads will be dialed down.

(And that's why Amazon search is so crap - yes, maybe you really want that specific part, but A/B tests showed that displaying random, vaguely related matches sells more stuff, perhaps because of impulse buyers, perhaps because people get the wrong thing and don't return it.)

And no, the Grainger model is not a good fit for general shopping. Not everyone is a purchasing manager. Who wants to get no good results because they typed "picture hanging bolt" instead of "eye wood screw"?

And for AWS, bandwidth is only one of the moats. There is institutional knowledge - in existing systems, docs, people's heads, code on disk. There is common knowledge - there is a ton of info on the web. The unified org/billing is huge (maybe PlanetScale is so much better than RDS, but setting up that relationship requires half a year of negotiations with finance and security, while RDS is 3 clicks away). The support is also pretty good, at least at the higher tiers.

Sorry OP, I am sure that Amazon will die one day, but not soon, and not from the problems you are describing.


> Who wants to get no good results because they typed "picture hanging bolt" instead of "eye wood screw"?

Honestly, I'd prefer that over having to look through pages of barely related search results to find the pearl hiding in there somewhere. But then I'm clearly not in Amazon's target market, since I ditched them a few years back and was better off for it.

