
> But then at the end it added a "Fun fact" that unicode actually does have a seahorse emoji, and proceeded to melt down in the usual way.

To be fair, most developers I’ve worked with will have a meltdown if I try to start a conversation about Unicode.

E.g. if during a job interview the interviewer asks you to check if a string is a palindrome, try explaining why that isn’t technically possible in Python (at least during an interview) without using a third-party library.



Just slap an "assert foo.isascii()" at the beginning and proceed? It's just an interview.
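
Roughly, as a minimal sketch (the name `is_ascii_palindrome` is made up for illustration, and `str.isascii()` needs Python 3.7 or later):

```python
def is_ascii_palindrome(s: str) -> bool:
    # Refuse non-ASCII input up front, which sidesteps combining marks,
    # emoji sequences, and the rest of the grapheme-cluster problem.
    assert s.isascii(), "only ASCII input is supported"
    # For plain ASCII text, reversing the code points is good enough.
    return s == s[::-1]
```

It doesn't solve the Unicode problem, it just states the assumption out loud instead of leaving it implicit.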


> try explaining why that isn’t technically possible in Python (at least during an interview) without using a third-party library.

I'm actually vaguely surprised that Python doesn't have extended-grapheme-cluster segmentation as part of its included batteries.

Every other language I tend to work with these days either bakes UAX #29 support directly into its stdlib (Ruby, Elixir, Java, JS, ObjC/Swift) or provides it in its "extended first-party" stdlib (e.g. Golang with golang.org/x/text).


> try explaining why that isn’t technically possible in Python (at least during an interview) without using a third-party library.

You're more likely to impress the interviewer by asking questions like "should I assume the input is only ASCII characters or the complete possible UTF-8 character set?"

A job interview is there to prove you can do the job, not prove your knowledge and intellect. It's valuable to know the intricacies of Python and strings for sure, but it's mostly irrelevant for a job interview or the job itself (unless the job involves heavy UTF-8 shenanigans, but those are very rare).


Don’t leave me in suspense! Why isn’t it possible?


At a guess, there's nothing in Python stdlib which understands graphemes vs code points - you can palindrome the code points but that's not necessarily a palindrome of what you "see" in the string.

(Same goes for Go, it turns out, as I discovered this morning.)
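
To make that concrete, here is a rough stdlib-only sketch. The helper names (`rough_clusters`, `is_palindrome_naive`, `is_palindrome_rough`) are invented for illustration, and the grouping rule is only an approximation that glues combining marks onto the preceding character. It is not UAX #29 segmentation.

```python
import unicodedata

def rough_clusters(s: str) -> list:
    """Group each base character with the combining marks that follow it.

    Approximation only: this ignores ZWJ emoji sequences, flags,
    Hangul jamo, and everything else UAX #29 actually specifies.
    """
    clusters = []
    for ch in s:
        # unicodedata.combining() returns a non-zero class for combining marks.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

def is_palindrome_naive(s: str) -> bool:
    # Reverses code points, which tears marks away from their base letters.
    return s == s[::-1]

def is_palindrome_rough(s: str) -> bool:
    clusters = rough_clusters(s)
    return clusters == clusters[::-1]

# "aba" with an acute accent (U+0301) on each "a": reads the same both ways.
word = "a\u0301ba\u0301"
print(is_palindrome_naive(word))   # False
print(is_palindrome_rough(word))   # True

# The polar bear emoji is a ZWJ sequence, which this sketch still splits
# into several pieces instead of treating as one cluster.
polar_bear = "\U0001f43b\u200d\u2744\ufe0f"
print(len(rough_clusters(polar_bear)) > 1)   # True
```

The last line is the point: getting emoji sequences right needs the real segmentation rules, and for that you end up reaching for a third-party package.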


It's a scream how easy it is in PHP of all things:

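    // grapheme_str_split() comes from the intl extension (PHP 8.4+).
    function is_palindrome(string $str): bool {
        // Split into grapheme clusters, reverse, and compare to the original.
        return $str === implode('', array_reverse(grapheme_str_split($str)));
    }

    $palindrome = 'satanoscillatemymetallicsonatas';
    // Polar bear: bear + ZWJ + snowflake + variation selector, one grapheme.
    $polar_bear = "\u{1f43b}\u{200d}\u{2744}\u{fe0f}";
    // str_replace() takes (search, replace, subject): swap the middle 'y'.
    $palindrome = str_replace('y', $polar_bear, $palindrome);
    var_dump(is_palindrome($palindrome));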


Are you trying to start a conversation about Unicode, or intentionally pretending you don't understand what the interviewer asked for with the "string is a palindrome" question?

Cause if you are being intentionally obtuse, it is not a meltdown to conclude you are being intentionally obtuse.


These sorts of questions are what I call “Easter eggs”. If someone understands the actual complexity of the question being asked, they’ll be able to give a good answer. If not, they’ll be able to give the naive answer. Either way, it’s an Easter egg, and not useful on its own since the rest of the interview will be representative. The thing they are useful for is amplifying the justification. You can say “they demonstrated a deeper understanding of Unicode by pointing out that a naive approach could be incorrect”.


E.g. Can you completely parse HTML with regex?


You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML. Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML.

etc. https://stackoverflow.com/a/1732454


If by "parse" you mean "match", the answer is yes because you can express a context-free language in PCRE.

If you mean "parse" then it's probably annoying, as all parser generators are, because they're bad at error messages when something has invalid syntax.
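
For the Python side of this, a small sketch: the stdlib `re` module has no recursive patterns (unlike PCRE), so a lazy match on nested tags cuts the outer element short, while `html.parser` can simply count depth. `DepthTracker` and the sample `doc` string are made up for illustration.

```python
import re
from html.parser import HTMLParser

doc = "<div>outer<div>inner</div>tail</div>"

# The lazy match stops at the first closing tag, so the "outer" element
# is cut off in the middle of its nested child.
match = re.search(r"<div>(.*?)</div>", doc)
print(match.group(1))  # outer<div>inner

class DepthTracker(HTMLParser):
    """Record the maximum nesting depth of <div> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1

tracker = DepthTracker()
tracker.feed(doc)
tracker.close()
print(tracker.max_depth)  # 2
```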


Is this true, in practice, given the lenient parsing requirements of the real world?


Technically, no

Practically, yes


> To be fair, most developers I’ve worked with will have a meltdown if I try to start a conversation about Unicode.

Why are we being "fair" to a machine? It's not a person.

We don't say, "Well, to be fair, most people I know couldn't hammer that nail with their hands, either."

An LLM is a machine, and a tool. Let's not make excuses for it.


> Why are we being "fair" to a machine?

We aren't, that turn of phrase is only being used to set up a joke about developers and about Unicode.

It's actually a pretty popular form these days:

A does something patently unreasonable, so you say "To be fair to A, B is also a patently unreasonable thing", where B is some specific detail of the circumstances that is clearly not the only or primary reason A was unreasonable.


I think people are making explanations for it because it's effectively a digital black box, so all we can do is try to explain what it's doing. Saying "to be fair" is more of a colloquial expression in this sense. And the reason he's comparing it to developers and Unicode is a funny aside about the state of things with Unicode. Besides that, LLMs only emit what they emit because they're trained on all those same people.



