That's because the posters chose to submit them anyway after getting a duplicate warning on submission.
That duplication issue dropped off immediately after I went back to using source links as the post link and putting the archive link in the post body. Now if I see an occasional duplicate, it's usually the same article posted in a different sub. So I've been posting that way ever since; it seemed to be the way the code detected duplicates best.
Now if I see an occasional duplicate, it's usually just the same article, but posted in a different sub.
Like I said, the posters received a duplicate warning but decided to post anyway. Since it's a different sub, that's totally fine, unless someone floods the same links across several subs.
The duplicate detection works both ways, by comparing the link in the post content and the URL field in the database.
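To make the "both ways" comparison concrete, here is a minimal sketch, assuming each stored post has a URL field and a content field and that any shared URL counts as a duplicate. It is not the site's actual code: `URL_RE`, `urls_in`, `find_duplicates`, and the `{"id", "url", "content"}` record layout are all hypothetical, and the regex is a simplification that catches both bare URLs and markdown `[text](url)` links.

```python
import re

# Hypothetical pattern: an http(s) URL runs until whitespace or a bracket/paren,
# so both bare URLs and markdown links like [text](url) are matched.
URL_RE = re.compile(r"https?://[^\s()\[\]]+")

def urls_in(text: str) -> set[str]:
    """Return every http(s) URL found anywhere in a block of text."""
    return set(URL_RE.findall(text))

def find_duplicates(link: str, body: str, posts: list[dict]) -> list[int]:
    """Return ids of earlier posts that share any URL with a new post.

    `posts` is a hypothetical list of {"id", "url", "content"} records
    standing in for the site's database.
    """
    new_urls = urls_in(body)
    if link:
        new_urls.add(link)
    hits = []
    for post in posts:
        # Gather the stored URL field plus any links inside the stored content.
        old_urls = urls_in(post.get("content", ""))
        if post.get("url"):
            old_urls.add(post["url"])
        # Any shared URL, in either direction, counts as a duplicate.
        if new_urls & old_urls:
            hits.append(post["id"])
    return hits

posts = [{"id": 1, "url": "https://archive.example/abc",
          "content": "[source](https://news.example/story)"}]
print(find_duplicates("https://news.example/story", "", posts))
```

Under this model, an archive-link post with the source URL in its body would still flag a later post that links the source directly.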
I get all that.
The duplicate detection works both ways, by comparing the link in the post content
From what you said, I assume that in a link post, the post URL is captured and all URLs contained in the post content, whether in hyperlink form or not, are also stored in the db.
I've been under the impression this was not the case, because there was a marked improvement once I went back to using source URLs as post links. When I used archived URLs as the post link, I would add the source URL as a hyperlink in the post content, but those seemed to be ignored, judging by the volume of duplicates I was seeing at the time. Now that I usually use the source URL as the post link, it's down to maybe a couple of duplicates a week. That's fantastic! And that's why, over the past year, I've usually posted source URLs (with the archive link inside).
If the post content URLs ARE being picked up for the db as you say, then a couple of people here must have been screwing with me by posting duplicates while I was using archive URLs as my post links, and then they suddenly stopped at about the same time I switched to source URLs. That seems like too strange a coincidence to be unrelated. I think they started getting the duplicate notice after I made the change (they were also using source URLs as their post links). IIRC, I got fed up one day before changing and asked the poster. They told me they didn't get any notice when posting, even though I had the source URL in a hyperlink within the post content. That event is what pointed me in this direction: that content links weren't being captured in the URL db. I didn't save the PMs. Maybe they were bullshitting, but it didn't seem like it.
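For contrast, here is a sketch of the hypothesis described here, in which only the post URL field is compared and links in the post content are ignored. This is an assumption for illustration, not the site's confirmed behavior (the earlier reply says content links are compared too); `link_only_duplicates` and the record layout are hypothetical.

```python
def link_only_duplicates(link: str, posts: list[dict]) -> list[int]:
    """Hypothetical check that compares only the stored post URL field."""
    return [p["id"] for p in posts if p.get("url") == link]

# An earlier post that links the archive and keeps the source only in its body.
posts = [{"id": 1, "url": "https://archive.example/abc",
          "content": "[source](https://news.example/story)"}]

# Under this model a new post linking the source URL is NOT flagged...
print(link_only_duplicates("https://news.example/story", posts))  # []
# ...while one that reuses the archive URL as its link is.
print(link_only_duplicates("https://archive.example/abc", posts))  # [1]
```

If the check really worked this way, switching to source URLs as post links would explain the drop in duplicates, since other posters also link the source.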
That's about as well as I can describe what I have seen in my user experience.
I can't explain the reason for the change; that code has been this way for years now.