Jackett: [TorrentLeech.pl] Returning irrelevant results & returning no results when the release year is entered

Created on 1 Mar 2020 · 23 Comments · Source: Jackett/Jackett

Environment

OS: Windows 10 64-bit

.Net Runtime: N/A

.Net Version: N/A

Jackett Version: 0.13.280.0

Last Working Jackett Version: N/A

Are you using a proxy or VPN? No

Description

Searching TorrentLeech.pl via Jackett returns a large number of irrelevant results. For example, try searching the indexer for 'The Morning Show'. The tracker will return 14 results, none of which is relevant. Would 'andmatch' solve this?

Logged Error Messages

N/A

Screenshots

N/A

Enhanced Log

N/A

Source

RoloSoze

All 23 comments

Jackett 0.13.405

garfield69 on 2 Mar 2020

@garfield69 I just realized that this tracker does not return any results when the year is entered in the search string. This is actually consistent with the search behavior via the website. Is this fixable?

RoloSoze on 23 Apr 2020

yes, we could use the keywords filter to strip out 4-digit strings.

garfield69 on 23 Apr 2020

but it's not specific, so it will do this for all titles, be they movies, series or music.
still want to strip it?

garfield69 on 23 Apr 2020

> but it's not specific, so it will do this for all titles, be they movies, series or music.
> still want to strip it?

I guess it'll work well most of the time? I can see it being a problem for something like the movie 1917, where the whole title consists of only 4 digits, but for the most part, I think it'll work well.

Is there a way to only strip the 4 digits if they're preceded by text? This way titles like 1917 and 2001: A Space Odyssey will still work fine.

What do you think?

RoloSoze on 23 Apr 2020

so don't strip the year if it's the first keyword? yes I think we can do that... I'll do some research.

garfield69 on 23 Apr 2020

> so don't strip the year if it's the first keyword?

Exactly. This way, in titles like 2019: After the Fall of New York and 1990: The Bronx Warriors, the digits won't be affected, and in titles like The Exterminators of the Year 3000, removing 3000 won't impact the search results much since the rest remains intact.

RoloSoze on 23 Apr 2020

@ngosang Can something similar be done for AHD Issue A in https://github.com/Jackett/Jackett/issues/7829#issue-586750395? At least temporarily until it's fixed in the API.

RoloSoze on 23 Apr 2020

re_replace can use ^ to indicate beginning of the text. So something like ^\D.*(\d{4}) would say at least one non-digit character has to come between the beginning of the string and 4 digits in a row.

cadatoiva on 23 Apr 2020
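
A minimal Python sketch of how the pattern in the comment above behaves on titles mentioned in this thread. It assumes Python's `re` module treats this simple pattern the same way as the regex engine behind re_replace; the helper name `captured_digits` is hypothetical and only for illustration.

```python
import re

# Pattern from the comment above: a non-digit first character, then anything,
# then four digits in a row (captured).
PATTERN = re.compile(r"^\D.*(\d{4})")

def captured_digits(query):
    """Return the four-digit group the pattern captures, or None if there is no match."""
    match = PATTERN.search(query)
    return match.group(1) if match else None

for query in ["Little Women 2019", "1917", "2001: A Space Odyssey", "The 4400"]:
    print(f"{query!r}: {captured_digits(query)}")
```

Queries that begin with a digit never match because of the leading `^\D`, so their digits are left alone, while any later run of four digits is captured whether or not it is a year.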

The only issue I can think of, which I guess is very rare, is when the user searches for something like The 4400 which is a television series.

Here at least one non-digit character comes before 4 digits in a row, so the 4 digits are mistaken for a year and removed, and the search is now only for the word The. The solution, if at all possible, is to remove the 4 digits only if they are not the first keyword, as we mentioned before, AND they start with either 19 or 20, as all release years in cinema history do.

I'm bringing this up in case this needs to be extrapolated and implemented for other trackers that do not like the year in the search string, and from my experience, the only other one I know of is Awesome-HD.

RoloSoze on 23 Apr 2020

when we go looking for the year we can use a mask to narrow down the 4 digits to those most likely to be a year
((?:20|1[7-9])\\d{2})

garfield69 on 23 Apr 2020

Is there a need for the 7 & 8 in the range? 'Cause this might lead to stripping non-release year 4 digits.

RoloSoze on 23 Apr 2020

I just grabbed an example out of an existing indexer to illustrate the possibility.
it could just as easily be (?:19|20)\\d{2}

garfield69 on 23 Apr 2020
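
For comparison, a short Python sketch of the two year masks discussed above: the one quoted from an existing indexer, and a grouped form of the narrower 19xx/20xx alternative. The single backslashes are the raw-string equivalent of the doubled backslashes in the indexer definition. The names `WIDE_MASK`, `NARROW_MASK` and `looks_like_year` are hypothetical, and Python's `re` is assumed to agree with Jackett's regex engine on these patterns.

```python
import re

# Mask quoted from an existing indexer: accepts 17xx-19xx and 20xx.
WIDE_MASK = re.compile(r"(?:20|1[7-9])\d{2}")
# Narrower alternative: only 19xx and 20xx. The group makes \d{2} apply to both prefixes.
NARROW_MASK = re.compile(r"(?:19|20)\d{2}")

def looks_like_year(token, mask):
    """True when the whole token is accepted by the given mask."""
    return mask.fullmatch(token) is not None

for token in ["2019", "1917", "1750", "4400", "3000"]:
    print(token, looks_like_year(token, WIDE_MASK), looks_like_year(token, NARROW_MASK))
```

Both masks reject 4400 and 3000; only the wider one accepts 17xx and 18xx values.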

so give me an example of titles that you believe torrentleech.pl is choking on when the year is included?

garfield69 on 23 Apr 2020

> so give me an example of titles that you believe torrentleech.pl is choking on when the year is included?

Little Women 2019 - Returns 0 instead of 8
Coco 2017 - Returns 2 instead of 10
Avengers: Endgame 2019 - Returns 0 instead of 36
1917 2019 - Returns 6 instead of 15

On the contrary:
Titanic 1997 - All 4 results are returned

RoloSoze on 23 Apr 2020

I can fix everything but 1917 2019 since that is two sets of years and so the filter will leave them both in since the query does not begin with a non-digit.
still, better than nothing for the rest.

garfield69 on 23 Apr 2020

> I can fix everything but 1917 2019 since that is two sets of years and so the filter will leave them both in since the query does not begin with a non-digit.
> still, better than nothing for the rest.

That would be good enough, but if I understand correctly, only 1917 will be left in; 2019 will be removed because it is preceded by a numeric string. @cadatoiva mentioned that the check can be done/restricted only for non-numeric characters... Unless I misunderstood that last part (blame it on the lack of sleep) :D

RoloSoze on 23 Apr 2020

no, re-read
https://github.com/Jackett/Jackett/issues/7424#issuecomment-618169741

garfield69 on 23 Apr 2020

> no, re-read
> #7424 (comment)

Got it. Is it possible to instruct it that in case two sets of 4-digit numbers are found in the string, AND both of which are possibly years (i.e. 19xx or 20xx), remove only the latter set? Is it worth doing that, or will it be overly complicated?

RoloSoze on 23 Apr 2020

I'll mull it over in the next few days and see if I can improve it.

garfield69 on 23 Apr 2020

@garfield69 try this: [" +(?:19|20)\\d{2} *$", ""]
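
A rough Python sketch of what the filter above would do to the queries raised in this thread. It assumes the filter is applied as a plain regex substitution (here `re.sub`) and that Python's `re` matches this pattern the same way as Jackett's engine. The function name `strip_trailing_year` is hypothetical, and the doubled backslash from the definition becomes a single one in a raw string.

```python
import re

# Same pattern as the filter above: one or more spaces, a 19xx/20xx year,
# optional trailing spaces, anchored at the end of the query.
TRAILING_YEAR = re.compile(r" +(?:19|20)\d{2} *$")

def strip_trailing_year(query):
    """Remove a trailing 19xx/20xx year that is preceded by at least one space."""
    return TRAILING_YEAR.sub("", query)

queries = [
    "Little Women 2019",
    "Avengers: Endgame 2019",
    "1917 2019",
    "1917",
    "The 4400",
    "2001: A Space Odyssey",
]
for query in queries:
    print(f"{query!r} -> {strip_trailing_year(query)!r}")
```

Because the pattern requires a leading space and is anchored at the end, a year that is the whole query or that opens the title is left alone, and only the last year is removed from 1917 2019.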