Sonarr: Indexer: animetosho

Created on 16 Jul 2016 · 29 Comments · Source: Sonarr/Sonarr

Hey,

I was wondering if there is a chance to incorporate https://animetosho.org/ as a new indexer for anime, since Fanzub and its fork seem to be gone for good.

indexer

All 29 comments

This probably won't get added to Sonarr because it does not have an API and is based around data from AniDB, which Sonarr does not support.

The fact that it doesn't have an API definitely reduces the chances of us implementing it, but I also didn't see anything for RSS either, which would be the bare minimum that we'd require to support it. Assuming I missed it, if someone points it out we can consider it, otherwise we'll close this out.

Hi, sorry for posting in a closed issue, but I thought this might be related.

A user has been using AnimeTosho's RSS feed in Sonarr, but has mentioned a few issues. Unfortunately I haven't been able to get much information about it, so I was wondering whether you guys would be able to help. I have tried doing some research myself, but haven't spent too long on it, so please pardon me if there's something I should've known.

  1. He mentions that the "file size" isn't being displayed in the RSS. I believe that the RSS specification only allows specifying the size of the 'enclosure' (i.e. .torrent file, not the files within). Would this "file size" actually refer to the size of the torrent file?
  2. He asks about a search API (which I assume is related to the API you speak of here), however I am unable to find any specifications on how such an API should be implemented. Is there any such specification available?

A few other things:

  • AT's RSS includes both NZB and torrent files as enclosures, but it appears that Sonarr only allows for torrent RSS. Is there an NZB RSS option I've missed?
  • Unfortunately most traditional info sources you may be using generally don't cover anime particularly well. AniDB does link to several other sources, though unfortunately not to general-purpose, non-anime-specific ones. What info sources are supported in Sonarr?

    • Regardless, is there any way to specify an info source in the API mentioned above? Or does one use tags in the ATOM feed or similar?

  • AT may not exactly be an ideal indexer. For torrents, Nyaa is probably more complete (does lack some TokyoTosho sourced files, but you could combine with TT's RSS). For NZB, it only goes back a few months for now, as AT doesn't go back and re-post historical content; I'm sure there's better usenet indexers around.

Thanks for creating the issue!

At work atm but I'll go a bit more in depth when I get home.

Ideally you would implement part of the newznab/torznab api, which is basically an rss-based api with filter/query options.
Also, rss allows for additional elements in the feed. We have several example feeds available, I'll link them later.

I appreciate your interest, so I've reopened the issue.

Ok, first, rss feeds:

  • There's only a Torrent Rss provider, mostly because almost all usenet indexers support newznab as standard.
  • Example rss feed: as I mentioned, an element like <size>123456</size> will be recognized as the size.
  • In your current rss feed the size in the description field gets parsed. However, since it has 3 decimals, our regex parses it wrongly. I'll get that fixed. The description field is not very reliable though. <size> is preferred.
  • The double enclosure isn't something we support, but it happens to work coz the torrent enclosure is the first one in the feed. I'll look into adjusting our code to be more resilient.
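
The size handling described in this list can be sketched roughly as follows. This is an illustration only, not Sonarr's actual parser: the function name `item_size`, the regex, and the choice of binary multiples (1 MB = 1024 * 1024 bytes) are all assumptions. It prefers a `<size>` element and falls back to a human-readable size in the description, accepting any number of decimals.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical unit table; treating the units as binary multiples is an assumption.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
# Accepts any number of decimals, e.g. "351.125 MB".
_SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([KMGT]?B)\b", re.IGNORECASE)


def item_size(item):
    """Return the size in bytes for an RSS <item>, or None if unknown."""
    size = item.findtext("size")
    if size and size.strip().isdigit():
        # A dedicated <size> element is the preferred source.
        return int(size)
    # Fall back to the less reliable human-readable description.
    match = _SIZE_RE.search(item.findtext("description") or "")
    if match is None:
        return None
    return int(float(match.group(1)) * _UNITS[match.group(2).upper()])
```

Calling `item_size(ET.fromstring("<item><size>123456</size></item>"))` returns the element value directly, and the description is only consulted when no usable `<size>` is present.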

Second, newznab/torznab:
https://github.com/Sonarr/Sonarr/wiki/Implementing-a-Torznab-indexer (I really should update that page)
You can make it as easy or as complicated as you want. The minimum newznab/torznab api needs to support ?t=caps and ?t=search&q={wildcard query}. Using wildcard queries isn't ideal, but it's what we use for anime atm.
As I mentioned earlier, it's basically an rss feed with filtering capabilities.
Some example feeds: newznab and torznab. As you can see, the output is basically rss with some additional attributes and elements.
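
As a rough sketch of the two minimum requests mentioned above, a client might build the URLs like this. The helper name `nab_url` and the host `indexer.example` are hypothetical, and appending `/api` to the configured base URL is an assumption about how clients typically behave.

```python
from urllib.parse import urlencode


def nab_url(base_url, **params):
    """Build a newznab/torznab request URL (appending /api is an assumption)."""
    return base_url.rstrip("/") + "/api?" + urlencode(params)


# Capabilities request and a keyword search, the two calls named as the minimum.
caps_url = nab_url("https://indexer.example", t="caps")
search_url = nab_url("https://indexer.example", t="search", q="Dragon Ball Super")
```

`urlencode` encodes spaces as `+`, so the search URL ends in `t=search&q=Dragon+Ball+Super`.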

We would prefer to one day support ?t=tvsearch&q={title}&abs={absolute episode number}, but we first have to agree on a spec with the devs behind newznab.

Even better would be searches based on tvdbid/anidbid, but that's not feasible atm. The benefit of these kinds of searches is that they can use database indexes, which makes searching much more efficient than wildcard text searches. I did notice that AnimeTosho has some AniDB references, so it may be interesting to explore in the future.

Thanks a lot for the information!
I probably should've made it clearer that I was only looking for information for now - changes may or may not ever actually get implemented. In other words, I can't guarantee anything will come out of this, but I appreciate all the help you've provided anyway.

RSS feeds: thanks for explaining that. I can add the additional elements, though it doesn't look standard from the RSS specifications as they're not in a namespace, but probably doesn't really matter.
This has the size and info_hash parameters added.

The API, on the other hand, seems like it could require some work, particularly the wildcard search feature (if possible at all). I couldn't find any information on the wildcard specifications though, so I can't say whether it would work - is there documentation around that? Newznab doesn't seem to mention anything about wildcards, so maybe that's easier to implement.
Again, not sure if this will get implemented. AT probably isn't the best indexer after all.

Thanks again for the help.

I just wanted to be sure you got all the information, in case you wanted to implement an api.
I already fixed the parsing of the animetosho rss feed, but haven't pushed that fix out yet.

It is true the rss spec doesn't really allow a <size> element. However, it does allow additional elements as long as they have their own namespace, such as <ext:size> (where ext is a custom namespace), which is why newznab has <newznab:attr ... />. This is similar to what kickass did, based on the ezrss specification.
You could even add em as <tosho:size> and define your own xmlns:tosho="..." namespace, Sonarr doesn't care much for namespaces and will pick that up too.
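
To illustrate what "doesn't care much for namespaces" could look like in a client, here is a minimal sketch that matches child elements by local name only. It is not Sonarr's code; the namespace URI and the helper `find_local` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical item using a custom "tosho" namespace; the URI is a placeholder.
ITEM = """<item xmlns:tosho="https://example.invalid/ns/tosho">
  <title>Example release</title>
  <tosho:size>123456</tosho:size>
</item>"""


def find_local(element, local_name):
    """Return the text of the first child whose tag matches, ignoring namespace."""
    for child in element:
        # ElementTree spells namespaced tags as "{uri}local".
        if child.tag.rsplit("}", 1)[-1] == local_name:
            return child.text
    return None


size = find_local(ET.fromstring(ITEM), "size")
```

The same lookup finds a plain `<size>` and a `<tosho:size>` alike, which is why either form would be picked up by a client written this way.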

The API definitely requires a bit of work and it depends entirely on whether you want to have search capability on your site. That's your decision, not ours.
If you want to add search capability, then using newznab/torznab has the benefit that several apps (Sonarr, Sickrage and potentially some others) support that api already.

'wildcard search' is fairly simple: q=Dragon+Ball+Super returns all entries with that in the name. Sonarr currently uses q=Dragon+Ball+Super+-+053 for example, to query for specific anime episodes. Implementing it is more tricky coz it requires full text searches on the database.
_(aid=11210&abs=53 would be better for db performance. The devs behind newznab are open to the idea of adding abs/absnr as query param, we just didn't get around to actually doing that.)_
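
The matching rule described here (every word of q must appear in the release name) can be sketched in a few lines. This is a naive in-memory version under my own assumptions: a real indexer would use a full-text index rather than substring checks, and the function names are hypothetical.

```python
def keyword_match(query, title):
    """True if every whitespace-separated keyword occurs in the title."""
    haystack = title.lower()
    return all(word in haystack for word in query.lower().split())


def search(query, titles):
    """Return the titles that contain all keywords of the query."""
    return [title for title in titles if keyword_match(query, title)]
```

Note that plain substring matching is looser than a tokenizing full-text search: a query word `53` would also match a title containing `753`.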

If you decide to add an api in the future, please feel free to contact me, I'd be happy to answer any questions.

> I already fixed the parsing of the animetosho rss feed

Awesome!

> Sonarr doesn't care much for namespaces and will pick that up too.

Ah, didn't know that, thank you for the info and example.

> 'wildcard search' is fairly simple

Ah, that sounds more like keyword based search. I thought that wildcards would include characters with special meanings like partial* or similar. Thanks for the clarification.
Some full text indexers do have a minimum word length limit though, so episode numbers may get completely ignored during search. Scrap that, I don't think AT has this limit any more.

Keywords seem doable. I do have a problem handling search + paging though, but that's something for me to figure out.

Yes, 'keyword search' is a better term to describe it :smile:

PS: One important detail is the /api?t=caps endpoint: it returns an xml document describing the capabilities of the indexer as well as its categories.
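
For orientation, a server could generate a very small caps document along these lines. The element and attribute names follow my recollection of the newznab caps layout and should be checked against the spec; the flat category list and the function name `build_caps` are simplifications of mine.

```python
import xml.etree.ElementTree as ET


def build_caps(categories):
    """Return a minimal caps document for a {category_id: name} mapping."""
    caps = ET.Element("caps")
    searching = ET.SubElement(caps, "searching")
    # Advertise keyword search with the q parameter only.
    ET.SubElement(searching, "search", available="yes", supportedParams="q")
    cats = ET.SubElement(caps, "categories")
    for cat_id, name in categories.items():
        ET.SubElement(cats, "category", id=str(cat_id), name=name)
    return ET.tostring(caps, encoding="unicode")


caps_xml = build_caps({5070: "Anime"})
```

A real response would usually carry more (server info, limits, other search modes); this only shows the two parts named above, capabilities and categories.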

Maybe we can use https://www.tokyotosho.info/rss.php?filter=1 at least for RSS in the meantime.

EDIT:
LOL, I commented on the wrong issue. Well, not completely wrong.

I haven't had the opportunity to test it yet, but to keep this issue updated, I've added a Newznab/Torznab API here.

@animetosho I did some tests.

Here is some feedback:

  • Could you make sure the url ends with /api? Most clients tack that on after the specified url. So if the user specifies http://animetosho.org/feed/nab, the app will query http://animetosho.org/feed/nab/api?t=search.
    I've made some changes in Sonarr to allow the user to override the endpoint to /nabapi, but having the api on /api offers better compatibility and is less user error prone. (http://animetosho.org/api would be ideal, but something like http://animetosho.org/feed/nab/api is ok)
  • Consider adding 5070 as a category and adding the releases to that category as well. 5070 is the nzedb category for Anime. You can safely add multiple <...:attr name="category" ... elements. Obviously, the system should then include the release in the response if &cat=5070 is specified during the api call.
  • Similarly, if I query with &cat=1234, the response includes all releases. It should return an empty list.
    cat is a comma-separated list of categories that should be included. No cat query param means everything. So if I query with &cat=1234,100001, I expect to see all results since those releases are in cat 100001.
  • the caps supportedParams does not need limit,offset, coz support for those is implied and mandatory.
  • I assume you have the item elements item.size and item.info_hash for backward compatibility with the existing rss feed, but strictly speaking it's against the rss specification to include custom non-namespaced elements.
  • The <...:attr name="guid" ... isn't needed. Remove it in favor of the already present rss item.guid element.
  • Consider including the media size as length value in the enclosures. Sonarr will favor the separate attr size, but some clients might care.
  • The rss contains a commented out <!-- <category></category> --> line, even though category is supplied separately.
  • Hint: You can choose to hide certain attributes unless the &extended=1 query param is supplied, it's a flag indicating that the user wants all available attributes.
    Base attributes are size and category, and should be supplied in all cases.
    But magneturl,infohash,seeds,peers,files don't really need to be included for a basic response, and leaving them out might help cut the size down or simplify the database call. That's your choice, if it doesn't matter feel free to leave them all included.
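
The cat semantics from the list above (comma-separated, "or" between categories, no cat means everything, unknown category means no results) might be implemented server-side roughly like this. The release dictionaries and function names are hypothetical, and the sketch does not handle non-numeric input.

```python
def parse_cats(cat_param):
    """Parse '5070,100001' into a set of ints; None or '' means no filter."""
    if not cat_param:
        return None
    return {int(c) for c in cat_param.split(",") if c.strip()}


def filter_by_cat(releases, cat_param):
    """Keep releases that are in at least one of the requested categories."""
    wanted = parse_cats(cat_param)
    if wanted is None:
        return list(releases)
    # "Or" semantics: one matching category is enough.
    return [r for r in releases if wanted & set(r["categories"])]
```

With a release in categories 100001 and 5070, `cat=1234` yields an empty list while `cat=1234,100001` returns it.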

Some issues in Sonarr that I'll fix:

  • Some releases have no enclosure with nzb links. This is a scenario that Sonarr wasn't accounting for, so it would reject the feed. I've made the necessary changes but didn't push a release yet.
  • Manual Search worked, but grabbing an nzb download resulted in a torrent, since the guid for both releases was the same and the UI used that to select the release. I've fixed that too in Sonarr.

Thanks a lot for the feedback! I've implemented most of your suggestions.
By the way, is there a guide/specification which mentions all of that? Much of the information you've given me, I couldn't find in any of the specification documents listed above, and I wouldn't be surprised if other implementors aren't aware of all that.

I've changed the URL to https://animetosho.org/feed/api

> Similarly, if I query with &cat=1234, the response includes all releases. It should return an empty list.

I've done this, but does the specification mention what should happen if a client queries a category that isn't listed in the ?t=caps response? I was under the impression that clients shouldn't be doing this, and hence, undefined behaviour is okay...

> the caps supportedParams does not need limit,offset, coz support for those is implied and mandatory.

Is there such a list of mandatory items available? For one, I presume that support for the cat parameter is also mandatory (above point).

> I assume you have the item elements item.size and item.info_hash for backward compatibility with the existing rss feed, but strictly speaking it's against the rss specification to include custom non-namespaced elements.

I didn't want to have to define a custom xmlns, but I suppose those two can just be removed completely.

> Consider including the media size as length value in the enclosures. Sonarr will favor the separate attr size, but some clients might care.

I thought that the size there refers to the size of the enclosure, as opposed to the media size, or is that not the case?

Thanks again for the suggestions and changes!

@animetosho Thank you very much for implementing a torznab api. I wanted to make you aware that recently, in the course of implementing torznab for nyaa.si, there has been an addition to the torznab spec. It now supports tags, which can be used to reflect the "hide remake" and "trusted" filters.

Please check out @Taloth's comment here https://github.com/nyaadevs/nyaa/pull/108#issuecomment-302964725. He's probably soon gonna tell you more about it, because it's not written down yet in the docs.

Tnx for the url change, that'll make it easier for users to configure.

Unknown categories: Clients shouldn't do it, yes, but in these cases the Robustness Principle applies. So it's wise to handle the case, especially since the default configured category is not 100001.
And, I regretfully admit, Sonarr doesn't actually check if the cat is in the caps response. I suspect many clients don't coz it involves additional api calls. We've been meaning to add an interactive dropdown to the UI so the user can select the desired category, instead of having to type in the number.

Required attributes & parameters: The newznab spec says that size & category are always returned. But it only specifies which query params are optional _for the client_ to use. This is because the newznab spec was written for clients, not servers, since the newznab indexer backend obviously supports all the parameters from its own specification.
I really should write a more comprehensive document explaining the specification from both client and server perspective. That would make it easier for others to implement it.

size & info_hash elements: So they're a left-over from the rss feed code. You can leave em if you like, but I don't think it's useful.

Enclosure length: I think, strictly speaking it should be the size of the torrent file, not the media. It's a difference between newznab and the rss specification.
I guess your choice of keeping it zero is the most accurate option. It's not really a problem.
I think for random rss feeds, Sonarr actually rejects enclosure lengths < 4 MB and tries to parse the description instead (many rss feeds include the size as human readable in the description)

@xelra I've not actually 'added' the query param tag or its attribute yet.

My idea was to have stuff like:

<torznab:attr name="tag" value="remake" />
<torznab:attr name="tag" value="trusted" />
multiple allowed, like category.

And query param tag=!remake,trusted (filter out any torrent with a remake tag _and_ only include those with a trusted tag). Unlike category this is an 'and' operator.
The idea is that tags like that are more generic than something like a quality_filter={number}. The query param comma separated and negation logic should translate easily enough to indexable db columns.
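
Since the tag parameter was only a proposal at this point, the following is purely a sketch of the described logic: comma-separated tags, a leading `!` for negation, and "and" semantics across all of them. The function names are hypothetical.

```python
def parse_tags(tag_param):
    """Split a value like '!remake,trusted' into (required, excluded) sets."""
    required, excluded = set(), set()
    for raw in (tag_param or "").split(","):
        tag = raw.strip()
        if not tag:
            continue
        if tag.startswith("!"):
            excluded.add(tag[1:])
        else:
            required.add(tag)
    return required, excluded


def matches_tags(release_tags, tag_param):
    """'And' semantics: all required tags present, no excluded tag present."""
    required, excluded = parse_tags(tag_param)
    tags = set(release_tags)
    return required <= tags and not (excluded & tags)
```

For `tag=!remake,trusted`, a release tagged only `trusted` matches, while one tagged both `trusted` and `remake` does not. Each set maps naturally onto indexable "has tag" columns.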

Certain newznab sites have added their own query params specific for their site. It's not disallowed. I just hope we can do something that makes sense enough to be usable for multiple sites.
Similarly, you're free to add query param source to filter on that, but no application will automatically handle it (In Sonarr the user can specify additional query params to append). Any custom query param should be chosen such that it doesn't interfere with other ones.

BTW, I've tried reaching the api by entering https://animetosho.org/feed into Sonarr, but it doesn't work. Is that because of the url base?

I know that the develop version of Sonarr has some fixes, I just thought that the previous would have worked, after the change to https://animetosho.org/feed/api.

@xelra It simply won't work without develop. The url will work without develop, tnx to the nabapi->api change, but there are some things Sonarr didn't handle about the response that would likely get it rejected.
See my list of 'Sonarr issues' in the earlier post.

Thanks again @Taloth for the answers/info - it makes a lot more sense to me now!

Seems like what I have should be fine for now then. I may revisit if something more is developed with tagging.

@animetosho I'm just trying out your feed and it works nicely. I noticed that on the site you have seeder/leecher stats, but in the feed there are none.

It would be really great if you could add those to the feed too.

I've added them as Torznab attributes. Unsure whether it'll show up for you though.

Yup it's working. However, not all releases have those attributes. It looks like the info is pulled from the 'bold' trackers on the site, so if the bold tracker has 'Scrape failed' there is no info despite other trackers having plenty of seeders.
Not sure if that was intentional.

Yes, the stats are pulled from the primary tracker listed in the torrent.
I'm not sure what tracker should be selected if the primary isn't working - do you know what's the typical solution for that?

@animetosho I would show the ceiling of all trackers available. Doesn't that make sense? Let's say it's {151,nil,nil,213}, then it should show 213.

What if the maximum value for seeders/leechers are on different trackers? Show max for each, even if scrape times could be completely different?

Yes, max(trackers.seeders) & max(trackers.leechers) separately, if none of the trackers have a value (scrape failed) then omit the related torznab:attr element from the result.

It's not entirely accurate anyway given DHT. It's just to provide an indication of how well the torrent is seeded, and more importantly _if_ it's seeded. So we're mostly interested in the order of magnitude, so a max seems to cover that nicely.
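
The aggregation suggested above (take the maximum per statistic, and omit it when every scrape failed) could look like this. The tracker dictionaries and the names `best_stat` and `aggregate` are assumptions for the example; how the result maps onto torznab attributes is left out.

```python
def best_stat(values):
    """Highest known value across trackers, or None if every scrape failed."""
    known = [v for v in values if v is not None]
    return max(known) if known else None


def aggregate(trackers):
    """Max seeders and max leechers, computed separately; unknown stats are omitted."""
    stats = {}
    for key in ("seeders", "leechers"):
        best = best_stat(t.get(key) for t in trackers)
        if best is not None:
            stats[key] = best
    return stats
```

For seeders of 151, unknown, unknown and 213 this gives 213, matching the example given earlier in the thread; the two maxima may come from different trackers.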

Btw. Would you be ok if we add animetosho to the preset list in Sonarr for easier selection? This doesn't mean it's enabled for all Sonarr instances, it's only easier to select by the user. But it'll allow us to set the proper api url and categories.

PS: Sonarr will honor the http status code 429 Retry-After, in case things get too hot. Not that I'm expecting significant loads, but you never know.

Okay thanks for the clarification, I've changed the feed to do that now (just does require a second query to calculate the max :/).

Feel free to add links wherever you please, it doesn't bother me. AT never sends back 429 responses (and it's not clear how you'd judge when it's appropriate to do such a thing). Generally if things aren't working, you'll get a 502, 503 or 504 HTTP response, or just none at all.

If it takes an extra query, then feel free to leave it at the original logic. It's not really worth the additional db load.

429 is mostly used to rate-limit particular clients/IP addresses if they make a disproportionate number of queries. Sonarr also responds to other errors by temporarily backing off, but with 429 you have more control over how long it waits.
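
On the client side, honoring a 429 with Retry-After can be sketched with the standard library as below. This is not Sonarr's implementation: the function names, the default delay, and the retry count are my assumptions, and only the delay-in-seconds form of the header is parsed.

```python
import time
import urllib.error
import urllib.request


def retry_delay(headers, default=60.0):
    """Seconds to wait, taken from a Retry-After header given in seconds."""
    value = headers.get("Retry-After")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        # Missing header, or the HTTP-date form, which this sketch does not parse.
        return default


def fetch(url, attempts=3):
    """Fetch a URL, sleeping as instructed whenever the server answers 429."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=30) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise
            time.sleep(retry_delay(err.headers))
    raise RuntimeError("still rate limited after %d attempts" % attempts)
```

The server picks the wait time through the header, which is the extra control over back-off mentioned above.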

Nah it's fine, the server's overpowered anyway =)

Thanks for the info though!
