I now have one podcast that gets duplicate entries all the time. Even the daily scheduled updates seem to cause the problem, because new duplicates appear every day.
I also now know why.
What is happening is that a numeric prefix is inserted near the beginning of the URL. Here's an example from the database:
(NOTE that I've modified the URLs to avoid comments on the podcast content itself instead of focusing on the problem. They are only representative of the entries, not the actual entries.)
When you look at the three entries above, you can see that every day there is a new numerical identifier at the beginning of the URL, but other than that, the episode content is exactly the same. Right now, Madsonic downloads the new episode right away, but what I haven't tried is to see whether an older episode with an older URL can still be downloaded. I will let the duplicates add up to see if that's the case.
What I don't have is an easy solution to propose. I need to fully understand the implications of using the older URLs (maybe that's why the account was shut down in the past... trying to use the older URLs may have triggered something). My approach, before any further analysis, would be to compare the latest RSS feed with the database and update the existing URL entries to the latest ones if the other fields match. This would probably require an option on the podcast screen to allow "deeper analysis" of the RSS feed beyond the existing method (which seems to work most of the time so far), so that not all podcasts need the extra scrutiny. However, even that might not work if the URL changes again before the RSS feed is re-checked.
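To make the idea concrete, here is a rough sketch in Python of the comparison I have in mind. It is not how Madsonic works internally; the `Episode` fields, the `reconcile` function, and the choice of title plus publish date as the matching key are all my own assumptions for illustration, and the URLs in the usage example are made up.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Hypothetical fields; the real database columns may differ.
    title: str
    publish_date: str
    url: str


def reconcile(stored, feed):
    """Refresh stored URLs from matching feed entries.

    Entries are matched on (title, publish_date), which is an assumption.
    Stored episodes are updated in place; feed entries with no match are
    returned as genuinely new episodes.
    """
    by_key = {(e.title, e.publish_date): e for e in stored}
    new_episodes = []
    for item in feed:
        key = (item.title, item.publish_date)
        match = by_key.get(key)
        if match is None:
            # Nothing stored with these fields, so treat it as new.
            new_episodes.append(item)
            by_key[key] = item  # guards against repeats within one feed
        elif match.url != item.url:
            # Same episode, only the URL differs: update instead of duplicating.
            match.url = item.url
    return new_episodes


# Usage with made-up data: the same episode appears under a new prefix.
stored = [Episode("Episode 1", "2014-01-01", "http://example.com/111/ep1.mp3")]
feed = [
    Episode("Episode 1", "2014-01-01", "http://example.com/222/ep1.mp3"),
    Episode("Episode 2", "2014-01-02", "http://example.com/222/ep2.mp3"),
]
new_episodes = reconcile(stored, feed)
```

With this, the stored "Episode 1" keeps a single entry whose URL is refreshed, and only "Episode 2" comes back as new. If the feed supplies a stable `<guid>` per item, that would probably be a safer matching key than title and date, but I haven't checked whether this particular feed does.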
I'll report back as I analyze this more.