Arctos: Force refresh of WoRMS (via Arctos)

Created on 11 Mar 2021 · 13 Comments · Source: ArctosDB/arctos

@dustymc
Are you able to force a refresh of a genus or family in WoRMS (via Arctos) at my request? The genus Alycaeus is in the family Alycaeidae, but an unknown portion of the WoRMS (via Arctos) classifications of this genus (and Dicharax, etc.) are still in Cyclophoridae. The only way I've been able to get them all into Alycaeidae is to manually refresh the ones I'm using, which leaves WoRMS (via Arctos) inconsistent. I didn't know about it until I printed labels and they showed up in two different families.

Here's an example.

Alycaeus conformis - WoRMS

Alycaeus conformis WoRMS (via Arctos)

I don't think the hierarchical tool works for externally managed sources. Can I force a refresh of a family or genus with some type of taxon bulkload without having to manually enter all the aphiaIDs and names?

This relates to #3311: whether to use WoRMS (via Arctos), which is slow to update, or WoRMS (via GlobalNames), which is only updated every 60 days and cannot be updated directly from WoRMS with an aphiaID. But in the meantime, it would be helpful to not have to manually refresh (and add) WoRMS names and classifications.

Enhancement

All 13 comments

It should be understood that #3311 and talking directly to WoRMS' API are not incompatible. We can do both.

3311 could have two impacts on this.

  1. If we can find a way to get WoRMS to give, and GN to accept, whatever you need then this - and maybe the other few hundred sources that go through GN - is trivial to handle in the future. Going through GN rather than talking directly to WoRMS seems a lot more sustainable to me, and so I think we should if we can, but if we can't make GN work then we can still go straight to the source (but it does require additional time, CPU, code, etc.).

  2. Some of the complexity is "translating." We have to figure out how to skip that if #3311 is going to work, and whatever we do there should also work for any other "non-local" source, wherever it comes from. Even if we keep talking directly to WoRMS, 3311 is likely to simplify doing so by eliminating the need to interpret.

The names returned by the query below should be set to refresh; that should happen in the next 800 minutes or so if nothing else pops up.

    select
      ap.term,
      f.term,
      scientific_name
    from
      taxon_name
      inner join taxon_term ap on taxon_name.taxon_name_id=ap.taxon_name_id and ap.source='WoRMS (via Arctos)' and ap.term_type='aphiaid'
      inner join taxon_term g on taxon_name.taxon_name_id=g.taxon_name_id and g.source='WoRMS (via Arctos)' and g.term_type='genus'
      inner join taxon_term f on taxon_name.taxon_name_id=f.taxon_name_id and f.source='WoRMS (via Arctos)' and f.term_type='family'
    where
      g.term='Alycaeus'
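
For anyone who wants to try the same select for another genus without editing the SQL by hand, here is a minimal sketch with the genus passed as a bound parameter. It uses Python's sqlite3 and an in-memory database only so that it runs anywhere. The table and column names come from the query above. The toy rows, the aphiaID values, the "exampleus" names, and the helpers `names_for_genus` and `make_toy_db` are all made up for illustration, and Arctos itself is not SQLite. Like the SQL above, this only reads; it does not request a refresh.

```python
import sqlite3

# Same joins as the query above, with source and genus as named parameters.
QUERY = """
select ap.term, f.term, scientific_name
from taxon_name
  inner join taxon_term ap on taxon_name.taxon_name_id = ap.taxon_name_id
    and ap.source = :source and ap.term_type = 'aphiaid'
  inner join taxon_term g on taxon_name.taxon_name_id = g.taxon_name_id
    and g.source = :source and g.term_type = 'genus'
  inner join taxon_term f on taxon_name.taxon_name_id = f.taxon_name_id
    and f.source = :source and f.term_type = 'family'
where g.term = :genus
order by scientific_name
"""


def names_for_genus(conn, genus, source="WoRMS (via Arctos)"):
    """Return (aphiaid, family, scientific_name) rows for one genus."""
    return conn.execute(QUERY, {"source": source, "genus": genus}).fetchall()


def make_toy_db():
    """Build a tiny in-memory stand-in; the rows are invented, not Arctos data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table taxon_name (taxon_name_id integer primary key, scientific_name text);
        create table taxon_term (taxon_name_id integer, source text, term_type text, term text);
    """)
    conn.executemany("insert into taxon_name values (?, ?)", [
        (1, "Alycaeus conformis"),
        (2, "Alycaeus exampleus"),   # hypothetical name
        (3, "Dicharax exampleus"),   # hypothetical name
    ])
    src = "WoRMS (via Arctos)"
    conn.executemany("insert into taxon_term values (?, ?, ?, ?)", [
        (1, src, "aphiaid", "1001"), (1, src, "genus", "Alycaeus"), (1, src, "family", "Alycaeidae"),
        (2, src, "aphiaid", "1002"), (2, src, "genus", "Alycaeus"), (2, src, "family", "Cyclophoridae"),
        (3, src, "aphiaid", "1003"), (3, src, "genus", "Dicharax"), (3, src, "family", "Cyclophoridae"),
    ])
    return conn


if __name__ == "__main__":
    for row in names_for_genus(make_toy_db(), "Alycaeus"):
        print(row)
```

Binding the genus as a parameter rather than pasting it into the string keeps the query reusable for Dicharax, Chamalycaeus, and so on.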

I'll check it later today. Thanks for the SQL. @Jegelewicz can we add this to our cheat sheet?

@dustymc I'm not sure I understand what "translating" is so I leave that to your magic.

not sure I understand what "translating" is

I have code that tries to make WoRMS data align with https://arctos.database.museum/info/ctDocumentation.cfm?table=cttaxon_term. There are lots of gaps in various places in that process, and it's all hard-coded - when we add or change something the API call needs to be rebuilt, which often gets skipped. #3311 would (somehow) only require "our" terms for "locally-managed" taxonomy - it would stop you from typing "speeceez," but it would still accept that from "remote" sources, however we pull their data in.
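
To make "translating" a bit more concrete, here is a rough sketch of what a hard-coded mapping step could look like. This is not the actual Arctos code. The `REMOTE_TO_LOCAL` table, the `translate` function, and the sample record are hypothetical, and only 'aphiaid', 'genus' and 'family' are term types that actually appear in this thread. The point is just to show where gaps come from: any remote field that is missing from the table is silently left behind unless someone remembers to extend it.

```python
# Hypothetical hard-coded map from remote field names to local term types.
# Only 'aphiaid', 'genus' and 'family' are taken from this thread.
REMOTE_TO_LOCAL = {
    "AphiaID": "aphiaid",
    "genus": "genus",
    "family": "family",
}


def translate(remote_record):
    """Split a remote record into (mapped local terms, skipped remote fields)."""
    mapped, skipped = [], []
    for field, value in remote_record.items():
        if value is None or value == "":
            continue  # nothing to carry over for empty fields
        local = REMOTE_TO_LOCAL.get(field)
        if local is None:
            skipped.append(field)  # no local term type known for this field
        else:
            mapped.append((local, str(value)))
    return mapped, skipped


if __name__ == "__main__":
    # Invented record, for illustration only.
    record = {"AphiaID": 1001, "genus": "Alycaeus", "family": "Alycaeidae", "subgenus": "Examplea"}
    mapped, skipped = translate(record)
    print(mapped)
    print(skipped)
```

Returning the skipped fields alongside the mapped ones at least makes the gaps visible, which a mapping that quietly drops unknown fields would not.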

https://github.com/ArctosDB/arctos/issues/3498 is a first-pass attempt at making those less-predictable data available from the catalog record.

Added to cheat sheet as

Update WoRMS (via Arctos) for a single genus

@Jegelewicz Thanks for adding to the cheat sheet. It might be better understood as "refresh" as that's the (current) term.

Changed!

For clarity, that's just a select. Writing to cf_worms_refreshed calls for a refresh. These seem to have caught up.

For clarity, that's just a select. Writing to cf_worms_refreshed calls for a refresh.

Sooooo - mere mortals can't really DO anything with it?

For clarity, that's just a select. Writing to cf_worms_refreshed calls for a refresh. These seem to have caught up.

So I just tried the SQL with Dicharax and it gave me a nice list of 73 records. I'll give them the same 800 minutes. But are you saying that I still have to manually refresh them or will this nudge them?

Yup, no mortals. That's still just a select - nudging requires more access. That could probably be made more accessible, but it's just a symptom of some other problem so not sure how I feel about that....

Check back in 140 minutes.

I think we're done here?

No. We may be done with this specific issue since there doesn't seem to be a way to refresh more than one taxon name at a time. The bigger issue remains - WoRMS (via Arctos) is not what is in WoRMS (marinespecies.org) and what is in WoRMS (via Global Names) is not what is in WoRMS.

Evidently we don't have the processing power to actually have the WoRMS database available within a reasonable time frame (1 week? 1 month?) in WoRMS (via Arctos) and using Global Names doesn't catch us up. I have the time to manage the taxonomy and it's not nearly as much work as before we added WoRMS (via Arctos), but we shouldn't promote Arctos as having the WoRMS taxonomy "built-in."

Right now, for the genus Chamalycaeus, we have 47 names in the family Cyclophoridae (former), 59 in Alycaeidae (correct) and total of 106. WoRMS has 225 because they include those with a subgenus and we don't get any of them. Global names hasn't caught up with the change in Dicharax from Cyclophoridae to Alycaeidae. The last change was made in November 2020.
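
A per-family tally like the one above can be pulled with a group by over the same taxon_term table used in the earlier select. The sketch below is a minimal, read-only example under a few assumptions: it runs against an in-memory SQLite database with invented rows, the helper names `family_counts` and `make_toy_terms` are made up, and it assumes each name has a single genus term and a single family term per source. The counts it prints are for the toy data, not the real Chamalycaeus numbers.

```python
import sqlite3

# Count names per family for one genus within one classification source.
COUNT_BY_FAMILY = """
select f.term as family, count(*) as n
from taxon_term g
  inner join taxon_term f on g.taxon_name_id = f.taxon_name_id
    and f.source = g.source and f.term_type = 'family'
where g.source = ? and g.term_type = 'genus' and g.term = ?
group by f.term
order by n desc, f.term
"""


def family_counts(conn, genus, source="WoRMS (via Arctos)"):
    """Return (family, number_of_names) rows; more than one row means a split genus."""
    return conn.execute(COUNT_BY_FAMILY, (source, genus)).fetchall()


def make_toy_terms():
    """Build a tiny in-memory taxon_term table; the rows are invented."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table taxon_term (taxon_name_id integer, source text, term_type text, term text)"
    )
    src = "WoRMS (via Arctos)"
    rows = []
    for name_id, family in [(1, "Alycaeidae"), (2, "Alycaeidae"), (3, "Cyclophoridae")]:
        rows.append((name_id, src, "genus", "Chamalycaeus"))
        rows.append((name_id, src, "family", family))
    conn.executemany("insert into taxon_term values (?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    for family, n in family_counts(make_toy_terms(), "Chamalycaeus"):
        print(family, n)
```

If the query returns more than one row for a genus, its classifications are split across families in that source, which is the situation that showed up on the printed labels.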

If there's not much we can do about any of this, then, yes, we can close this issue.

Screen Shot 2021-03-17 at 11 10 07 AM


Screen Shot 2021-03-17 at 12 22 56 PM

Thanks, reopening. I think much of this needs dedicated Issues - it's scattered around, we hit on symptoms here and there, but I don't think we really have a place to get at the core of the issue.

don't have the processing power

It's more than CPU - it needs time, probably ongoing.

Global Names

A central question is if we can spend whatever we need on or through GN (which Arctos can easily talk to, and which would address related Issues in the future), or if we're just going to have to figure out how to maintain the connection ourselves, or ??? I don't think it's so much if but how we can best do this.

Chamalycaeus

Set to refresh. (That could probably be an app, but again I'd really like to get at the core of the problem instead of just treating symptoms.)

subgenus

There's an old issue somewhere that could be revived. From here this looks like a "research grade" problem - doing crazy "traditional" things to names and providing "research grade" data do not seem compatible to me. That doesn't mean we can't do something, but I'm not (yet, I hope) sure what that might be. "Get GlobalNames to figure it out" would be pretty cool....
