[dupe] Combating abuse in Matrix – without backdoors (matrix.org)
222 points by jeltz on Oct 20, 2020 | 18 comments


This was previously submitted to HN 23 hours ago. In fact, there's even a link to the previous submission at the bottom of the Matrix article, which is how I found it:

https://news.ycombinator.com/item?id=24826951


I'm surprised the site doesn't convert a dupe post to an upvote (if the user hasn't already upvoted).


It does, based on a URL match. One of the links included a "www." hostname prefix in the URL and the other did not, so the URLs don't match.


So HN could detect dupes based on hashes instead? I understand the HTML could be different while the content is the same, but this is an extra step that helps.


Then you'll have one of these "fancy" modern websites that load their content entirely through js and it'll fail because the HTML is the same on all pages.

A URL match is probably good enough the vast majority of the time. Maybe it could also support a bit of fuzzing, such as matching with and without the leading www and both http and https. Beyond that it's probably asking for trouble.
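A minimal sketch of that kind of fuzzy URL match, assuming it is enough to ignore the scheme, a leading "www.", and a trailing slash. The function name `dedupe_key` and the example.org URLs are made up for illustration; this is not how HN actually does it.

```python
from urllib.parse import urlsplit


def dedupe_key(url: str) -> str:
    """Return a comparison key that ignores http/https and a leading 'www.'.

    Hypothetical helper: the port and fragment are dropped as well, which
    may or may not be what a real dupe detector wants.
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Treat "/post/" and "/post" as the same path.
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


a = dedupe_key("https://www.example.org/blog/post/")
b = dedupe_key("http://example.org/blog/post")
print(a == b)
```

Two submissions would then count as dupes when their keys are equal. Anything fuzzier than this (dropping query strings, rewriting mobile hostnames) starts to merge pages that really are different.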


> one of these "fancy" modern websites that load their content entirely through js [will] fail because the HTML is the same on all pages.

That's a feature, not a bug. (Although it would admittedly be better to block those explicitly rather than relying on coincidental interactions with something that doesn't seem directly related.)


> Then you'll have one of these "fancy" modern websites that load their content entirely through js and it'll fail because the HTML is the same on all pages.

What are you talking about? It would be done entirely on the backend.

Mind you, I've brought it up with dang multiple times and he says it would be a hassle and too brittle to be effective (fair enough), but nothing about it would require javascript.


I believe the parent is talking about submitting links to websites that render their content via clientside JavaScript and how that would break the hash dupe detection. They aren't suggesting that the functionality would need to be implemented in JS by HN.

Regardless, hashing the content to detect dupes is just an idea that wouldn't work for a lot of reasons.


We tried that kind of thing and it was a nightmare. Trying to make general content-processing things on the web is a full time job and more.

I do think it's practical for us to make use of <link rel='canonical'> though. But many pages don't include that. The OP doesn't for example, so that wouldn't have helped here.
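For illustration, here is roughly what reading `<link rel='canonical'>` could look like with Python's standard-library HTML parser. It assumes the page HTML has already been fetched; `CanonicalFinder` and `canonical_url` are hypothetical names, and the fallback to the submitted URL covers pages (like the OP) that have no canonical link.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel can hold several space-separated tokens.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonical = attr["href"]


def canonical_url(html: str, submitted_url: str) -> str:
    """Return the page's canonical URL, or the submitted URL if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical or submitted_url


page = "<html><head><link rel='canonical' href='https://example.org/post'></head></html>"
print(canonical_url(page, "https://www.example.org/post?ref=hn"))
```

This only helps when the site sets the tag, and it means fetching every submitted page, which is the part that tends to get messy.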


Matching against the title and domain would probably catch most duplicates, including this one. Although that's still work.
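A rough sketch of a title-plus-domain key, assuming titles are compared after lowercasing and dropping punctuation. The name `title_domain_key` is invented for this example, and the normalization only absorbs trivial differences such as a dash variant or a "www." prefix, not reworded titles.

```python
import re
from typing import Tuple
from urllib.parse import urlsplit


def title_domain_key(title: str, url: str) -> Tuple[str, str]:
    """Return (domain, normalized title) for comparing two submissions."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Keep only ASCII letters and digits so punctuation and dash style don't matter.
    words = re.findall(r"[a-z0-9]+", title.lower())
    return host, " ".join(words)


first = title_domain_key("Combating abuse in Matrix – without backdoors", "https://www.matrix.org/")
second = title_domain_key("Combating abuse in Matrix - without backdoors", "https://matrix.org/")
print(first == second)
```

Because the URL path is ignored entirely, two different posts from the same site with near-identical titles would also collide.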


There are so many small variations in titles that one probably couldn't rely on that alone, but it's a good idea.


Hashes of what, though? Without accessing the content, the URL is essentially the only thing to go on. Plus, having the same post make it to the front page pretty much means the system is working exactly as intended: content that HN users are interested in makes it to the widest audience (not everyone checks HN multiple times a day =)


THANK YOU. I have never understood why folks on Reddit, etc. are so vehemently opposed to reposts. If people are upvoting it, that means they like it. If they like it, that means either a) it's new to them, or b) they enjoy seeing it again.


For what it's worth, my comment wasn't meant as a complaint that it was posted again. I just wanted to make sure that folks saw the previous submission, since that one has comments by the Matrix lead.


Understood, and I (for one) interpreted your comment as you intended it (and - thank you for providing that context!).


Who is going to crawl all posted links? Also, HTML keeps changing. Unnecessary (and very bad) way to check.


It does, but only within a certain window (I believe 12 hours). After that, dang and other admins can merge duplicate threads (and have done so).


Comments merged thither. Thanks!

other than these of course



