This was previously submitted to HN 23 hours ago. In fact, there's even a link to the previous submission at the bottom of the Matrix article, which is how I found it.
So could HN detect dupes based on content hashes instead? I understand the HTML could differ while the content stays the same, but this would be an extra check that helps.
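A rough sketch of that idea, assuming you strip the markup and hash the remaining text (the `content_fingerprint` helper and its exact rules are mine, not anything HN actually does):

```python
import hashlib
import re

def content_fingerprint(html: str) -> str:
    """Hash the visible text of a page so markup-only differences don't matter."""
    # Crude tag stripping; a real system would want a proper HTML parser.
    text = re.sub(r"<[^>]+>", " ", html)
    # Collapse whitespace and case so trivial formatting changes don't change the hash.
    text = " ".join(text.split()).lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```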
Then you'll have one of these "fancy" modern websites that load their content entirely through js and it'll fail because the HTML is the same on all pages.
A URL match is probably good enough the vast majority of the time. Maybe it could also support a bit of fuzzing, such as matching with and without the leading www and both http and https. Beyond that it's probably asking for trouble.
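For what it's worth, a minimal sketch of that kind of fuzzing (the `normalize_url` helper is hypothetical, not how HN actually matches URLs):

```python
from urllib.parse import urlsplit

def normalize_url(url: str) -> str:
    """Reduce a URL to a canonical key for duplicate matching."""
    parts = urlsplit(url.strip())
    # Treat http and https as equivalent.
    scheme = "https" if parts.scheme in ("http", "https") else parts.scheme
    # Lowercase the host and strip a leading "www."
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Drop a trailing slash so /foo and /foo/ match.
    path = parts.path.rstrip("/") or "/"
    return f"{scheme}://{host}{path}" + (f"?{parts.query}" if parts.query else "")

# All of these normalize to the same key.
assert normalize_url("http://www.example.com/post/") == \
       normalize_url("https://example.com/post")
```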
> one of these "fancy" modern websites that load their content entirely through js [will] fail because the HTML is the same on all pages.
That's a feature, not a bug. (Although it would admittedly be better to block those explicitly rather than relying on coincidental interactions with something that doesn't seem directly related.)
> Then you'll have one of these "fancy" modern websites that load their content entirely through js and it'll fail because the HTML is the same on all pages.
What are you talking about? It would be done entirely on the backend.
Mind you, I've brought it up with dang multiple times and he says it would be a hassle and too brittle to be effective (fair enough), but nothing about it would require JavaScript.
I believe the parent is talking about submitting links to websites that render their content via client-side JavaScript, and how that would break the hash-based dupe detection. They aren't suggesting that the functionality would need to be implemented in JS by HN.
Regardless, hashing the content to detect dupes is an idea that wouldn't work, for a lot of reasons.
We tried that kind of thing and it was a nightmare. Trying to make general content-processing things on the web is a full-time job and more.
I do think it's practical for us to make use of <link rel='canonical'>, though. But many pages don't include that. The OP doesn't, for example, so it wouldn't have helped here.
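For illustration, pulling that tag out of a page with Python's stdlib html.parser could look like this (the `CanonicalFinder` class is just a sketch, not HN's actual code):

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Grab the href of the first <link rel="canonical"> tag, if any."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "link" and self.canonical is None:
            attrs = dict(attrs)
            if (attrs.get("rel") or "").lower() == "canonical":
                self.canonical = attrs.get("href")

def find_canonical(html: str):
    parser = CanonicalFinder()
    parser.feed(html)
    return parser.canonical

# Prints "https://example.com/post"
print(find_canonical('<head><link rel="canonical" href="https://example.com/post"></head>'))
```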
Hashes of what, though? Without accessing the content, the URL is essentially the only thing to go on. Plus, having the same post make it to the front page pretty much means the system is working exactly as intended: content that HN users are interested in reaches the widest audience (not everyone checks HN multiple times a day =)
THANK YOU. I have never understood why folks on Reddit, etc. are so vehemently opposed to reposts. If people are upvoting it, that means they like it. If they like it, that means either a) it's new to them, or b) they enjoy seeing it again.
For what it's worth, my comment wasn't meant as a complaint that it was posted again. I just wanted to make sure that folks saw the previous submission, since that one has comments by the Matrix lead.
https://news.ycombinator.com/item?id=24826951