We treat a submission as a duplicate if the story has had significant attention in the last year or so. This is in the FAQ: https://news.ycombinator.com/newsfaq.html.
If a story hasn't had significant attention in the last year or so, then we don't treat it as a dupe, because it's important for good articles to get multiple chances at getting attention. Otherwise the randomness of what gets noticed on /newest would be even more dominant than it already is.
I like it the way it is currently. Suppose you add some smartness to the website: on what basis do you decide which duplicate to remove? HN is the last website where I want AI added to the recommendation system. We are already being fed so much on every other platform.
Let the community moderate it itself. It's old school and maybe dumb, but not everything needs smart-ass AI.
I propose automatic de-dupe: whatever the title says, if the exact same URL has already been submitted, just count it as an upvote on the existing story... optional: if the submitted caption is different, then add a comment that says "also submitted by X with the caption `Y`."
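A minimal sketch of that proposal, assuming an in-memory store; the normalization rules and all names here are my own choices, not HN's actual logic:

```python
from urllib.parse import urlsplit

def normalize_url(url):
    """Collapse trivial URL variants (scheme, www., trailing slash,
    fragment) into one comparison key. A heuristic, not HN's rules."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key

class Submissions:
    """Toy store implementing the proposal: resubmitting a known URL
    counts as an upvote; a different caption becomes an auto-comment."""
    def __init__(self):
        self.stories = {}  # normalized URL -> story dict

    def submit(self, url, title, user):
        key = normalize_url(url)
        story = self.stories.get(key)
        if story is None:
            self.stories[key] = {"title": title, "points": 1, "comments": []}
            return "new"
        story["points"] += 1  # duplicate URL -> upvote the existing story
        if title != story["title"]:
            story["comments"].append(
                f'also submitted by {user} with the caption "{title}"')
        return "upvoted"
```

With this, `https://example.com/post/` and `http://www.example.com/post#top` land on the same story.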
People want upvotes so they submit it. Setting up scanning for recent articles is probably harder than it seems. Can’t think of any other forum that does it.
> Setting up scanning for recent articles is probably harder than it seems. Can’t think of any other forum that does it.
The vast majority of dupes posted here are the same domain, site and title. Catching that would be as easy as a call to the Algolia API. I'm using the Hacker News enhancement suite add-on for Firefox and it does that to generate a list of prior threads. That leaves edge cases, but human curation should be enough for that.
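For reference, a rough sketch of such a lookup against the public Algolia HN search API; the helper names and the fields I pull out of the response are assumptions about how a checker might use it, not anything HN runs:

```python
import json
import urllib.parse
import urllib.request

ALGOLIA_SEARCH = "https://hn.algolia.com/api/v1/search"

def prior_threads_url(title):
    """Build a query against the public Algolia HN API for earlier
    stories matching a submitted title."""
    params = urllib.parse.urlencode({"query": title, "tags": "story"})
    return ALGOLIA_SEARCH + "?" + params

def fetch_prior_threads(title):
    """Fetch and decode the search response (makes a network call)."""
    with urllib.request.urlopen(prior_threads_url(title)) as resp:
        return json.load(resp)

def summarize_hits(response):
    """Reduce a search response to (id, title, url, points) tuples."""
    return [(h["objectID"], h.get("title"), h.get("url"), h.get("points", 0))
            for h in response.get("hits", [])]
```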
Then again, HN already checks for canonical URLs; it could check for canonical title, author, and excerpt text in metadata as well. But just a bare title match would catch almost everything, especially during high-velocity periods. Then again again, I think I mentioned this in the past and dang said they tried to find more robust solutions in software, but false negatives and edge cases made it infeasible.
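Extracting those metadata fields is straightforward with the stdlib; a sketch, where the particular fields compared (`og:title`, `author`, `description`) are my assumption about what would be useful:

```python
from html.parser import HTMLParser

class CanonicalMeta(HTMLParser):
    """Collect <link rel="canonical"> plus a few metadata fields a
    dupe check could compare across submissions."""
    def __init__(self):
        super().__init__()
        self.canonical = None
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "canonical":
            self.canonical = a.get("href")
        elif tag == "meta":
            key = a.get("property") or a.get("name")
            if key in ("og:title", "author", "description"):
                self.meta[key] = a.get("content")
```

Feed a page to `CanonicalMeta().feed(html)` and compare `canonical` and `meta` against the prior submission's values.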
HN could prompt submitters to reply to an existing open thread instead of posting a duplicate, while still allowing the option to post anyway. That should remove the possibility of the software rejecting a legit post due to a bad match. Having the process be entirely automated would probably be a bad idea.
Some sites have multiple versions of the same page. Like there would be a GitHub repo, and then a GitHub page residing on site.github.io that is usually the same thing. Also some projects eventually get an entire dedicated domain after residing on GitHub for a number of years.
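The repo/project-page pairing at least follows a predictable pattern; a heuristic sketch (dedicated domains adopted later can't be predicted this way, and the function name is my own):

```python
import re

def github_aliases(url):
    """Guess URL variants that likely point at the same project: a
    GitHub repo and its matching *.github.io project page."""
    aliases = {url}
    # github.com/user/repo -> user.github.io/repo
    m = re.match(r"https?://github\.com/([^/]+)/([^/]+?)/?$", url)
    if m:
        user, repo = m.groups()
        aliases.add(f"https://{user}.github.io/{repo}")
    # user.github.io/repo -> github.com/user/repo
    m = re.match(r"https?://([^./]+)\.github\.io/([^/]+?)/?$", url)
    if m:
        user, repo = m.groups()
        aliases.add(f"https://github.com/{user}/{repo}")
    return aliases
```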
> If a story hasn't had significant attention in the last year or so, then we don't treat it as a dupe, because it's important for good articles to get multiple chances at getting attention. Otherwise the randomness of what gets noticed on /newest would be even more dominant than it already is.
https://news.ycombinator.com/front
I kind of wish you could do hourly for the past day or so.
Presumably an exact URL match, and maybe within a timescale?
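That rule, as the FAQ describes it, could be sketched like this; `POINTS_THRESHOLD` is a made-up proxy for "significant attention", not a number HN has published:

```python
from datetime import datetime, timedelta

DUPE_WINDOW = timedelta(days=365)   # "a year or so", per the FAQ
POINTS_THRESHOLD = 20               # invented stand-in for "significant attention"

def is_dupe(prev_submitted_at, now, prev_points):
    """A same-URL resubmission counts as a dupe only if the earlier
    story is both recent and got significant attention."""
    recent = (now - prev_submitted_at) <= DUPE_WINDOW
    return recent and prev_points >= POINTS_THRESHOLD
```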
> Let the community moderate it itself. It's old school and maybe dumb, but not everything needs smart-ass AI.
> I propose automatic de-dupe: whatever the title says, if the exact same URL has already been submitted, just count it as an upvote on the existing story... optional: if the submitted caption is different, then add a comment that says "also submitted by X with the caption `Y`."
I have seen my own articles submitted more than once.
> The vast majority of dupes posted here are the same domain, site and title. Catching that would be as easy as a call to the Algolia API. I'm using the Hacker News enhancement suite add-on for Firefox and it does that to generate a list of prior threads. That leaves edge cases, but human curation should be enough for that.

> Then again, HN already checks for canonical URLs; it could check for canonical title, author, and excerpt text in metadata as well. But just a bare title match would catch almost everything, especially during high-velocity periods. Then again again, I think I mentioned this in the past and dang said they tried to find more robust solutions in software, but false negatives and edge cases made it infeasible.

> HN could prompt submitters to reply to an existing open thread instead of posting a duplicate, while still allowing the option to post anyway. That should remove the possibility of the software rejecting a legit post due to a bad match. Having the process be entirely automated would probably be a bad idea.
At least now I know what I'm going to do this weekend.