Now if I see an occasional duplicate, it's usually just the same article posted in a different sub.
Like I said, the posters received a duplicate warning but decided to post it anyway; since it's a different sub, that's totally fine, unless someone floods the same links across several subs.
The duplicate detection works both ways, by comparing the link in the post content and the URL field in the database.
I get all that.
"The duplicate detection works both ways, by comparing the link in the post content..."
From what you said, I assume that in a link post the post URL is captured, and all URLs contained in the post content, whether hyperlinked or not, are also captured in the db.
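To pin down what I'm assuming, here's a rough Python sketch of that behaviour. Everything in it (`URL_RE`, `extract_urls`, `UrlIndex`) is made up for illustration. I haven't seen the site's actual code, the regex is simplified, and the real thing probably normalizes URLs differently.

```python
from __future__ import annotations

import re

# Any http(s) URL, whether it sits bare in the text or inside a markdown
# hyperlink like [text](https://example.com/a). Deliberately simplified:
# it stops at whitespace, brackets, parentheses, angle brackets and quotes.
URL_RE = re.compile(r"https?://[^\s()\[\]<>\"']+")


def extract_urls(content: str) -> set[str]:
    """Return every URL in a post body, hyperlinked or plain."""
    # Drop sentence punctuation that the regex swallowed at the end.
    return {url.rstrip(".,;:!?") for url in URL_RE.findall(content)}


class UrlIndex:
    """Hypothetical stand-in for the site's URL table."""

    def __init__(self) -> None:
        self.urls: set[str] = set()

    def add_post(self, post_link: str | None, content: str = "") -> None:
        # Record the post's link field (if it has one) and every body URL.
        if post_link:
            self.urls.add(post_link)
        self.urls |= extract_urls(content)

    def is_duplicate(self, post_link: str) -> bool:
        # A new post's link is flagged if any earlier post used it either
        # as its link or anywhere in its content.
        return post_link in self.urls
```

For example, after `index.add_post("https://example.com/source", "Archived: [here](https://archive.ph/abc123)")`, calling `index.is_duplicate("https://archive.ph/abc123")` returns True under this model (both URLs are placeholders).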
I'd been under the impression that post content URLs weren't captured, because there was a marked improvement once I went back to using source URLs as post links. When I used archived URLs as the post link, I would add the source URL as a hyperlink in the post content, but those seemed to be ignored, given the volume of duplicates I was seeing at the time. Now that I usually use the source URL as the post link, it's down to maybe a couple of duplicates a week. That's fantastic! And that's why I've mostly posted source URLs over the past year (with the archive link inside).
If post content URLs ARE being picked up for the db as you say, then a couple of people here must have been screwing with me by making duplicate posts while I was posting archive URLs as my post links, and then they suddenly stopped at about the same time I switched to source URLs for my post links. That seems like too strange a coincidence to be unrelated. I think they started getting the duplicate notice after I made the change (they were also using source URLs as their post links). IIRC, one day before the change I got fed up and asked one of them. They told me they didn't get any notice when posting, even though I had the source URL in a hyperlink within my post content, and that exchange is what pointed me in this direction: that content URLs weren't being captured in the URL db. I didn't save the PMs. Maybe they were bullshitting, but it didn't seem like it.
That's about as well as I can describe what I have seen in my user experience.
I can't explain the reason for the change; that code has been this way for years now.
A quick experiment I never got around to running could use 3 posts and 2 different URLs in the post content, to test whether both hyperlinked and plain URLs are captured from post content.
The first post's content has one URL set up as a hyperlink and one as a plain URL.
The second post uses the hyperlinked URL from the first post's content as its post link; if that URL was captured in the db, it should get a duplicate notice.
The third post uses the plain URL from the first post's content as its post link; if that URL was captured in the db, it should get a duplicate notice.
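The three-post test can be written down as a small simulation, mainly to make the expected outcomes explicit before trying it on the live site. It models the URL db as a plain set and uses two hypothetical flags for whether hyperlinked and plain content URLs get indexed. The example.com/example.org URLs and the function name are placeholders, not the site's actual code.

```python
from __future__ import annotations

import re

# Markdown hyperlink: [text](url). Captures just the url.
MD_LINK_RE = re.compile(r"\[[^\]]*\]\((https?://[^\s)]+)\)")
# Plain URL not directly preceded by "(", so hyperlink targets are skipped.
BARE_RE = re.compile(r"(?<!\()https?://[^\s()\[\]<>]+")


def run_experiment(capture_hyperlinks: bool, capture_plain: bool) -> tuple[bool, bool]:
    """Simulate the 3-post test; return (post2_notice, post3_notice)."""
    linked = "https://example.com/hyperlinked"  # hypothetical test URL
    plain = "https://example.org/plain"  # hypothetical test URL

    # Post 1: body holds one hyperlinked URL and one plain URL.
    post1_body = f"Read [this]({linked}) and also {plain} for context."

    # The URL db, modeled as a set, filled according to the hypothesis.
    url_db: set[str] = set()
    if capture_hyperlinks:
        url_db.update(MD_LINK_RE.findall(post1_body))
    if capture_plain:
        url_db.update(u.rstrip(".,;:!?") for u in BARE_RE.findall(post1_body))

    # Post 2 uses the hyperlinked URL as its link, post 3 the plain one.
    return linked in url_db, plain in url_db


if __name__ == "__main__":
    for hyper in (True, False):
        for bare in (True, False):
            print(f"hyperlinks={hyper}, plain={bare} -> notices {run_experiment(hyper, bare)}")
```

If posts 2 and 3 both get a notice on the real site, that's the (True, True) row, i.e. both kinds of content URL are indexed as you describe; no notice on either would back up what I thought I was seeing.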
I'm too tired tonight, it's 2am here. Maybe I'll try it tomorrow... or feel free!