Jackett: [TorrentLeech.pl] Returning irrelevant results & returning no results when the release year is entered

Created on 1 Mar 2020 · 23 Comments · Source: Jackett/Jackett

Environment

OS: Windows 10 64-bit

.Net Runtime: N/A

.Net Version: N/A

Jackett Version: 0.13.280.0

Last Working Jackett Version: N/A

Are you using a proxy or VPN? No

Description

Searching TorrentLeech.pl via Jackett returns a large number of irrelevant results. For example, try searching the indexer for 'The Morning Show': the tracker returns 14 results, none of which is relevant. Would 'andmatch' solve this?

Logged Error Messages

N/A

Screenshots

N/A

Enhanced Log

N/A

All 23 comments

Jackett 0.13.405

@garfield69 I just realized that this tracker does not return any results when the year is entered in the search string. This is actually consistent with the search behavior via the website. Is this fixable?

yes, we could use the keywords filter to strip out 4-digit strings.

but it's not specific, so it will do this for all titles, be they movies, series or music.
still want to strip it?

I guess it'll work well most of the time? I can see it being a problem for something like the movie 1917, where the whole title consists of only 4 digits, but for the most part, I think it'll work well.

Is there a way to only strip the 4 digits if they're preceded by text? This way titles like 1917 and 2001: A Space Odyssey will still work fine.

What do you think?

so don't strip the year if it's the first keyword? yes I think we can do that... I'll do some research.

so don't strip the year if it's the first keyword?

Exactly. This way, in titles like 2019: After the Fall of New York and 1990: The Bronx Warriors the digits won't be affected, and in titles like The Exterminators of the Year 3000, removing 3000 won't impact the search results much since the rest remains intact.

@ngosang Can something similar be done for AHD Issue A in https://github.com/Jackett/Jackett/issues/7829#issue-586750395? At least temporarily until it's fixed in the API.

re_replace can use ^ to indicate beginning of the text. So something like ^\D.*(\d{4}) would say at least one non-digit character has to come between the beginning of the string and 4 digits in a row.
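
To see which queries that pattern would pick up, here is a minimal sketch that runs it against a few titles mentioned in this thread. It uses Python's re module as a stand-in for Jackett's re_replace filter, which is an assumption (the regex engines differ, although this particular pattern should behave the same way), and find_year_candidate is a hypothetical helper name.

```python
import re

# Pattern from the comment above: a non-digit first character,
# then any text, then 4 digits in a row (captured).
PATTERN = re.compile(r"^\D.*(\d{4})")

def find_year_candidate(query):
    """Return the 4-digit group the pattern captures, or None if there is no match."""
    match = PATTERN.search(query)
    return match.group(1) if match else None

for query in ["Little Women 2019", "1917", "2001: A Space Odyssey", "The 4400"]:
    print(query, "->", find_year_candidate(query))
```

Queries that begin with a digit are left alone, while any later run of 4 digits is treated as a candidate year, whether or not it really is one.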

The only issue I can think of, which I guess is very rare, is when the user searches for something like The 4400 which is a television series.

Here at least one non-digit character comes before 4 digits in a row, so the 4 digits are mistaken for a year and removed, and the search is then only for the word The. The solution, if at all possible, is to remove the 4 digits only if they are not the first keyword (as we mentioned before) AND they start with either 19 or 20, as virtually all release years in cinema history do.
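
The two conditions proposed here (not the first keyword, and starting with 19 or 20) can be sketched as a small keyword filter. This is only an illustration in Python, not how Jackett's filters are written; strip_year_keywords is a hypothetical name, and splitting the query on whitespace is an assumption.

```python
import re

# A keyword counts as a likely release year only if it is exactly 19xx or 20xx.
YEAR = re.compile(r"^(?:19|20)\d{2}$")

def strip_year_keywords(query):
    """Drop year-like keywords unless they are the first keyword of the query."""
    keywords = query.split()
    kept = [kw for i, kw in enumerate(keywords) if i == 0 or not YEAR.match(kw)]
    return " ".join(kept)

for query in ["The 4400", "Little Women 2019", "1917", "1917 2019"]:
    print(repr(query), "->", repr(strip_year_keywords(query)))
```

Under this rule The 4400 survives because 4400 does not start with 19 or 20, and 1917 survives because it is the first keyword. A title whose later keyword merely looks like a year would still lose it.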

I'm bringing this up in case this needs to be extrapolated and implemented for other trackers that do not like the year in the search string, and from my experience, the only other one I know of is Awesome-HD.

when we go looking for the year we can use a mask to narrow down the 4 digits to those most likely to be a year
((?:20|1[7-9])\\d{2})

Is there a need for the 7 & 8 in the range? 'Cause this might lead to stripping non-release year 4 digits.

I just grabbed an example out of an existing indexer to illustrate the possibility.
it could just as easily be ((?:19|20)\\d{2})
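
The difference between the wider mask (17xx to 20xx) and one limited to 19xx and 20xx can be checked with a short Python sketch. The doubled backslash in the snippets above is presumably string escaping in the indexer definition, so it is written once in the Python raw strings below; the names WIDE_MASK, NARROW_MASK and looks_like_year are hypothetical.

```python
import re

WIDE_MASK = re.compile(r"((?:20|1[7-9])\d{2})")  # 17xx, 18xx, 19xx, 20xx
NARROW_MASK = re.compile(r"((?:19|20)\d{2})")    # 19xx and 20xx only

def looks_like_year(mask, keyword):
    """True if the whole keyword matches the given year mask."""
    return mask.fullmatch(keyword) is not None

for keyword in ["1776", "1917", "2019", "3000", "4400"]:
    print(keyword, looks_like_year(WIDE_MASK, keyword), looks_like_year(NARROW_MASK, keyword))
```

Only the wider mask treats numbers such as 1776 as a year, which is the extra stripping risk raised in the question about the 7 and 8.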

so give me an example of titles that you believe torrentleech.pl is choking on when the year is included?

Little Women 2019 - Returns 0 instead of 8
Coco 2017 - Returns 2 instead of 10
Avengers: Endgame 2019 - Returns 0 instead of 36
1917 2019 - Returns 6 instead of 15

On the contrary:
Titanic 1997 - All 4 results are returned

I can fix everything but 1917 2019 since that is two sets of years and so the filter will leave them both in since the query does not begin with a non-digit.
still, better than nothing for the rest.

That would be good enough, but if I understand correctly, only 1917 will be left in; 2019 will be removed because it is preceded by a numeric string. @cadatoiva mentioned that the check can be done/restricted only for non-numeric characters... Unless I misunderstood that last part (blame it on the lack of sleep) :D

no, re-read
#7424 (comment)

Got it. Is it possible to instruct it that in case two sets of 4-digit numbers are found in the string, AND both of which are possibly years (i.e. 19xx or 20xx), remove only the latter set? Is it worth doing that, or will it be overly complicated?

I'll mull it over in the next few days and see if I can improve it.

@garfield69 try this: [" +(?:19|20)\\d{2} *$", ""]
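
As a rough check of that suggestion, the sketch below applies the same pattern to the example queries from this thread. Python's re.sub stands in for Jackett's re_replace here (an assumption), the backslash is written once because a raw string is used, and strip_trailing_year is a hypothetical name.

```python
import re

# Pattern from the suggestion above: one or more spaces, a 19xx/20xx number,
# optional trailing spaces, then the end of the query.
TRAILING_YEAR = re.compile(r" +(?:19|20)\d{2} *$")

def strip_trailing_year(query):
    """Remove a trailing 19xx/20xx keyword, but only when something precedes it."""
    return TRAILING_YEAR.sub("", query)

for query in ["Little Women 2019", "Coco 2017", "Avengers: Endgame 2019",
              "1917 2019", "1917", "The 4400", "2001: A Space Odyssey"]:
    print(repr(query), "->", repr(strip_trailing_year(query)))
```

Because the pattern requires a leading space and is anchored to the end of the query, a lone 1917 is untouched and 1917 2019 is reduced to 1917. A title that genuinely ends in a 19xx or 20xx number that is not a year would still be trimmed.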

https://en.wikipedia.org/wiki/List_of_television_programs:_numbers
https://en.wikipedia.org/wiki/List_of_films:_numbers

I'll use some titles from these lists to test later when I get the chance for both TL.pl & AHD in #8342. Thought you guys might find it handy.

Jackett 0.16.127
