Managing all animals in the "Arctos" classification is often problematic (e.g., http://arctos.database.museum/name/Cepolidae), and paleo collections have reintroduced a bunch of plants-and-stuff that will surely find a way to clash sooner or later.
Managing classifications in much smaller chunks avoids taxonomy-at-scale weirdness, but
1) Most collections have cataloged a few outliers and need taxonomy for them
2) I think everyone wants to pull expertise, which involves sharing a Source with the experts, which involves huge cumbersome groups of classifications
https://github.com/ArctosDB/arctos/issues/1852 would fix this: it doesn't matter which source a classification is in if you can select it individually. But I don't think we're realistically going to compile the data or use a taxon concepts system.
From https://github.com/ArctosDB/arctos/issues/1852#issuecomment-484545346
Potential not-concepts solution [to the perceived homonym problem]: create "dynamic" sources which are based on collection-defined criteria and auto-refresh themselves periodically. Selection could cross sources, include things like taxon_status or various ranks, etc. Data would be managed in the shared (eg, "Arctos") Source(s) and the dynamic source would be refreshed from updates.
Dynamic sources would address the idea that the scale at which taxonomy is best managed and the scale at which taxonomy is used are not necessarily the same.
Simplest case, a teaching collection might pull relevant animals from "Arctos" and relevant plants from "Arctos Plants."
DMNS:Inv could
Someone or some coalition could manage any group (species+subspecies, family, phylum, 'stuff we need that isn't in some other source' eg land snails, etc.) in the system of their choosing (including Arctos, the Arctos Hierarchical Editor, a desktop app, a remote system like WoRMS, etc.), then anyone else could pull those data or parts of them into their "preferred" classification.
There are no real barriers to this; it will fit in the current structure, we just need some (complicated and expensive, probably) SQL-or-something to build and maintain the merged classifications.
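To make the "build and maintain the merged classifications" step a bit more concrete, here is a minimal sketch in Python rather than SQL. Everything in it is an assumption for illustration: the `Classification` record, the `build_dynamic_source` function, and the idea of expressing collection-defined criteria as (source, predicate) rules are hypothetical and do not reflect actual Arctos tables or code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Classification:
    name: str      # scientific name
    source: str    # shared source it is managed in, e.g. "Arctos"
    terms: dict    # rank -> term, e.g. {"kingdom": "Animalia"}


# Hypothetical rule shape: "take classifications from this source when the
# predicate accepts them". Rules are listed in priority order.
Rule = tuple[str, Callable[[Classification], bool]]


def build_dynamic_source(shared: list[Classification],
                         rules: list[Rule]) -> dict[str, Classification]:
    """Return name -> classification; the first rule that claims a name wins."""
    merged: dict[str, Classification] = {}
    for source, accepts in rules:
        for c in shared:
            if c.source == source and accepts(c) and c.name not in merged:
                merged[c.name] = c
    return merged


# Illustrative data only, not real Arctos holdings.
shared = [
    Classification("Sorex cinereus", "Arctos", {"kingdom": "Animalia"}),
    Classification("Aphlebia", "Arctos", {"kingdom": "Animalia"}),
    Classification("Aphlebia", "Arctos Plants", {"kingdom": "Plantae"}),
    Classification("Picea glauca", "Arctos Plants", {"kingdom": "Plantae"}),
]

# Teaching-collection case: animals from "Arctos", plants from "Arctos Plants".
rules = [
    ("Arctos", lambda c: c.terms.get("kingdom") == "Animalia"),
    ("Arctos Plants", lambda c: c.terms.get("kingdom") == "Plantae"),
]

dynamic = build_dynamic_source(shared, rules)
for name, c in sorted(dynamic.items()):
    print(name, "<-", c.source)
```

Re-running the function after the shared sources change would stand in for the periodic refresh. Note that a homonym present in two sources resolves to whichever rule comes first, so rule order matters.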
Nobody would be forced into this; the capability would not necessarily change anything for any existing collection, it would just add the possibility of combining existing data.
There are potentially consistency problems - maybe the Murids source classification will include superfamily and the Cricetids source classification will not, resulting in inconsistent rodents - but I suspect that would still be more overall consistent than the current data (in which individual names are often outliers).
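The superfamily example could be checked mechanically. A minimal sketch, assuming each contributing source can be reduced to a list of (source, rank-to-term) pairs; the function names and the sample data are hypothetical, not anything that exists in Arctos:

```python
def rank_profiles(classifications):
    """Map each source to the set of ranks it uses anywhere."""
    profiles = {}
    for source, terms in classifications:
        profiles.setdefault(source, set()).update(terms)
    return profiles


def inconsistent_ranks(classifications):
    """Ranks used by at least one contributing source but not by all of them."""
    profiles = rank_profiles(classifications)
    if not profiles:
        return set()
    used_somewhere = set().union(*profiles.values())
    used_everywhere = set.intersection(*profiles.values())
    return used_somewhere - used_everywhere


# Illustrative merged rodents: one source records superfamily, the other does not.
merged = [
    ("Murids", {"order": "Rodentia", "superfamily": "Muroidea", "family": "Muridae"}),
    ("Cricetids", {"order": "Rodentia", "family": "Cricetidae"}),
]

print(inconsistent_ranks(merged))
```

A report like this would not fix anything, but it would tell a collection which ranks to expect gaps in before they combine sources.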
It is worth comparing the scale of taxonomy in Arctos with the scale of taxonomy used by collections here; dynamic classifications could result in much smaller datasets, which might support more discovery methods.
UAM@ARCTOS> select count(*) numberOfNamesInArctos from taxon_name;
NUMBEROFNAMESINARCTOS
---------------------
3,408,094
UAM@ARCTOS> select count(distinct(classification_id)) numberOfClassifnsInArctos from taxon_term;
NUMBEROFCLASSIFNSINARCTOS
-------------------------
15,842,709
UAM@ARCTOS> select count(distinct(taxon_name_id)) numberManagedTaxa from taxon_term where source='Arctos';
NUMBERMANAGEDTAXA
-----------------
1,707,718
UAM@ARCTOS> select count(distinct(taxon_name_id)) numberUsedTaxa from identification_taxonomy;
NUMBERUSEDTAXA
--------------
109,575
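For a sense of proportion, the counts above work out to roughly 3.2% of all names being used in identifications. A quick check of the arithmetic (the variable names are mine; the second ratio is only a size comparison, since used names are not necessarily a subset of names managed in the "Arctos" source):

```python
# Counts copied from the query output above.
names_in_arctos = 3_408_094
managed_in_arctos_source = 1_707_718
used_in_identifications = 109_575

share_of_all_names = used_in_identifications / names_in_arctos
relative_to_managed = used_in_identifications / managed_in_arctos_source

print(f"used / all names:      {share_of_all_names:.1%}")
print(f"used / 'Arctos' names: {relative_to_managed:.1%}")
```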
select
guid_prefix,
to_char(count(distinct(taxon_name_id)),'999,999,999,999') numberUsedNames
from
collection,
cataloged_item,
identification,
identification_taxonomy
where
collection.collection_id=cataloged_item.collection_id and
cataloged_item.collection_object_id=identification.collection_object_id and
identification.identification_id=identification_taxonomy.identification_id
group by
guid_prefix
order by
count(distinct(taxon_name_id))
;
GUID_PREFIX NUMBERUSEDNAMES
-------------------- ------------------------------------------------
BYU:Herp 1
NMU:Para 1
MSBObs:Mamm 1
UAM:Art 1
UAM:Env 1
UCSC:Herp 1
UTEPObs:Ento 1
CHAS:Herp 2
UWYMV:Egg 2
DGR:Ento 3
NBSB:Bird 4
OWU:Para 5
KNWRObs:Fish 9
UAMObs:Fish 10
COA:Ento 13
MLZ:Herb 14
MVZObs:Mamm 16
MVZObs:Herp 19
COA:Herp 20
OWU:Fish 20
KNWR:Inv 22
MLZ:Egg 23
OWU:Bird 25
UTEP:Zoo 25
CHAS:Fish 26
UAMObs:Mamm 27
UAM:EH 28
DMNS:Herp 36
ASNHC:Mamm 39
UTEP:Arc 40
UWYMV:Herp 45
UAM:Herp 49
UTEP:Fish 51
WNMU:Fish 60
UCSC:Mamm 61
DMNS:Para 61
COA:Mamm 61
UWYMV:Fish 62
COA:Egg 79
OWU:Mamm 79
OWU:Rept 89
DGR:Mamm 92
UNR:Bird 99
UCM:Obs 115
UTEPObs:Herp 122
CHAS:EH 125
UCSC:Bird 128
UAM:Arc 141
UTEP:Teach 145
NMU:Bird 147
CHAS:Herb 148
NMU:Mamm 149
UNR:Herp 151
UAMObs:Bird 163
MLZ:Mamm 194
UNR:Mamm 211
UNR:Fish 232
COA:Bird 281
UWYMV:Mamm 287
MVZ:Hild 310
UTEP:Mamm 318
APSU:Herp 319
UTEP:HerpOS 330
USNPC:Para 343
OWU:ES 353
WNMU:Bird 422
MSB:Herp 453
UNM:ES 454
DGR:Bird 491
MSB:Fish 506
WNMU:Mamm 518
MVZObs:Bird 560
UWBM:Herp 574
ALMNH:ES 589
UCM:Egg 591
UTEP:Bird 595
UWYMV:Bird 599
UAM:Fish 627
UMZM:Mamm 639
UTEP:ES 665
CHAS:Mamm 675
UAM:Mamm 698
UMNH:Mamm 720
UAM:Alg 723
MSB:Para 730
DMNS:Egg 746
MSB:Host 802
UCM:Fish 804
UMNH:Herp 819
KWP:Ento 880
KNWR:Herb 895
CHAS:Teach 957
UTEP:Ento 963
UMZM:Bird 967
UWBM:Mamm 973
UAM:ES 986
DMNS:Mamm 991
CHAS:Egg 991
UMNH:Bird 1,074
UTEP:Herp 1,077
UTEP:Inv 1,191
KNWR:Ento 1,409
UCM:Bird 1,437
UCM:Mamm 1,448
CHAS:Bird 2,335
MLZ:Bird 2,411
DMNS:Bird 2,473
UAM:Bird 2,774
MVZ:Egg 2,965
UCM:Herp 3,160
MSB:Bird 3,183
UAM:Inv 3,199
MSB:Mamm 3,513
UAMb:Herb 4,064
HWML:Para 4,766
MVZ:Herp 5,284
CHAS:Inv 5,472
MVZ:Mamm 5,547
UAM:Ento 5,989
CHAS:Ento 6,774
UAM:Herb 8,240
UAMObs:Ento 9,739
DMNS:Inv 10,171
MVZ:Bird 11,137
UTEP:Herb 21,842
--- same data, different sort
select
guid_prefix,
to_char(count(distinct(taxon_name_id)),'999,999,999,999') numberUsedNames
from
collection,
cataloged_item,
identification,
identification_taxonomy
where
collection.collection_id=cataloged_item.collection_id and
cataloged_item.collection_object_id=identification.collection_object_id and
identification.identification_id=identification_taxonomy.identification_id
group by
guid_prefix
order by
guid_prefix
;
GUID_PREFIX NUMBERUSEDNAMES
-------------------- ------------------------------------------------
ALMNH:ES 589
APSU:Herp 319
ASNHC:Mamm 39
BYU:Herp 1
CHAS:Bird 2,335
CHAS:EH 125
CHAS:Egg 991
CHAS:Ento 6,774
CHAS:Fish 26
CHAS:Herb 148
CHAS:Herp 2
CHAS:Inv 5,472
CHAS:Mamm 675
CHAS:Teach 957
COA:Bird 281
COA:Egg 79
COA:Ento 13
COA:Herp 20
COA:Mamm 61
DGR:Bird 491
DGR:Ento 3
DGR:Mamm 92
DMNS:Bird 2,473
DMNS:Egg 746
DMNS:Herp 36
DMNS:Inv 10,171
DMNS:Mamm 991
DMNS:Para 61
HWML:Para 4,766
KNWR:Ento 1,409
KNWR:Herb 895
KNWR:Inv 22
KNWRObs:Fish 9
KWP:Ento 880
MLZ:Bird 2,411
MLZ:Egg 23
MLZ:Herb 14
MLZ:Mamm 194
MSB:Bird 3,183
MSB:Fish 506
MSB:Herp 453
MSB:Host 802
MSB:Mamm 3,513
MSB:Para 730
MSBObs:Mamm 1
MVZ:Bird 11,137
MVZ:Egg 2,965
MVZ:Herp 5,284
MVZ:Hild 310
MVZ:Mamm 5,547
MVZObs:Bird 560
MVZObs:Herp 19
MVZObs:Mamm 16
NBSB:Bird 4
NMU:Bird 147
NMU:Mamm 149
NMU:Para 1
OWU:Bird 25
OWU:ES 353
OWU:Fish 20
OWU:Mamm 79
OWU:Para 5
OWU:Rept 89
UAM:Alg 723
UAM:Arc 141
UAM:Art 1
UAM:Bird 2,774
UAM:EH 28
UAM:ES 986
UAM:Ento 5,989
UAM:Env 1
UAM:Fish 627
UAM:Herb 8,240
UAM:Herp 49
UAM:Inv 3,199
UAM:Mamm 698
UAMObs:Bird 163
UAMObs:Ento 9,739
UAMObs:Fish 10
UAMObs:Mamm 27
UAMb:Herb 4,064
UCM:Bird 1,437
UCM:Egg 591
UCM:Fish 804
UCM:Herp 3,160
UCM:Mamm 1,448
UCM:Obs 115
UCSC:Bird 128
UCSC:Herp 1
UCSC:Mamm 61
UMNH:Bird 1,074
UMNH:Herp 819
UMNH:Mamm 720
UMZM:Bird 967
UMZM:Mamm 639
UNM:ES 454
UNR:Bird 99
UNR:Fish 232
UNR:Herp 151
UNR:Mamm 211
USNPC:Para 343
UTEP:Arc 40
UTEP:Bird 595
UTEP:ES 665
UTEP:Ento 963
UTEP:Fish 51
UTEP:Herb 21,842
UTEP:Herp 1,077
UTEP:HerpOS 330
UTEP:Inv 1,191
UTEP:Mamm 318
UTEP:Teach 145
UTEP:Zoo 25
UTEPObs:Ento 1
UTEPObs:Herp 122
UWBM:Herp 574
UWBM:Mamm 973
UWYMV:Bird 599
UWYMV:Egg 2
UWYMV:Fish 62
UWYMV:Herp 45
UWYMV:Mamm 287
WNMU:Bird 422
WNMU:Fish 60
WNMU:Mamm 518
Let's talk about how GloBI does taxonomy. See Enhydra lutris, which links to all of the various taxonomic sources. This is done through a resolver (Zenodo).
Could we free ourselves from managing taxonomy in Arctos by using a tool like this?
Alternate approach which might be mostly functionally identical but require less development, processors, and sorta everything else:
collection.preferred_taxonomy_source's datatype is currently FKEY-->classification_source, which is interpreted as "use classification data from SOURCE, else fail with no cached classification data."
Converting to ordered array (supported by PG) would be interpreted as "use SourceA if exists, else use SourceB if exists, else use SourceC if exists, else fail with no cached classification data."
So for example a collection could...
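The difference between the current single-source setting and the ordered list can be sketched like this. It is an illustration under assumptions, not the Arctos implementation: `classify_single`, `classify_ordered`, and the `available` lookup (source name to the set of names it classifies) are all hypothetical, and the sample holdings are made up apart from Achatinella bryonii being absent from WoRMS, which is mentioned later in this thread.

```python
from typing import Optional


def classify_single(name: str, source: str,
                    available: dict[str, set[str]]) -> Optional[str]:
    """Current behaviour: use the one preferred source, else nothing is cached."""
    return source if name in available.get(source, set()) else None


def classify_ordered(name: str, sources: list[str],
                     available: dict[str, set[str]]) -> Optional[str]:
    """Proposed behaviour: first source in the ordered list that has the name."""
    for source in sources:
        if name in available.get(source, set()):
            return source
    return None  # end of the list: no cached classification data


# Illustrative holdings only.
available = {
    "WoRMS (via Arctos)": {"Enhydra lutris"},
    "Arctos": {"Enhydra lutris", "Achatinella bryonii"},
}
preferred = ["WoRMS (via Arctos)", "Arctos"]

print(classify_single("Achatinella bryonii", "WoRMS (via Arctos)", available))
print(classify_ordered("Achatinella bryonii", preferred, available))
print(classify_ordered("Enhydra lutris", preferred, available))
```

With a single source the land snail gets nothing; with the ordered list it falls through to "Arctos", while names that WoRMS does have keep coming from WoRMS.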
Un-wishlisting this; this approach should be comparatively trivial to implement and would have significant impacts.
DMNS:Inv could just use (and perhaps help improve) the Arctos classification for things not in WoRMS.
Animal-centric paleo collections could fall back to Arctos Plants for plant material, which would stop the continual reintroduction of plants to the "Arctos" classification. Problems caused by homonyms in the same classification - and there are many thousands of them - are what caused us to split classifications in the first place.
Suggest prioritization; the single classification per collection is actively introducing potentially-problematic data.
I support this, with high priority.
DMNS:Inv could just use (and perhaps help improve) the Arctos classification for things not in WoRMS.
Totally agree this would be better than mucking up WoRMS (via Arctos) with names they don't have.
This is mostly functional and should be out tonight or possibly tomorrow. It will need documentation. I can demonstrate whatever you'd like to see in test, but https://github.com/ArctosDB/internal/issues/65 makes it difficult to see for yourself. There are two changes:
Manage collection looks like this:

which is interpreted as "if all taxa used in an identification have at least one Arctos classification then use that; if not, check Arctos Plants; if not there, check WoRMS; if not there either, we're at the end of the list, so do nothing."
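The "all taxa used in an identification" part is the piece worth spelling out, since an identification can involve more than one name. A minimal sketch of that rule as I read it from the description above; `source_for_identification` and the sample holdings are hypothetical and this is not the deployed code:

```python
from typing import Optional


def source_for_identification(taxa: list[str], preferred: list[str],
                              available: dict[str, set[str]]) -> Optional[str]:
    """Return the first preferred source that classifies every taxon, else None."""
    if not taxa:
        return None  # nothing to classify
    for source in preferred:
        names = available.get(source, set())
        if all(taxon in names for taxon in taxa):
            return source
    return None  # end of the list: do nothing


# Order as in the Manage Collection example; holdings are illustrative only.
preferred = ["Arctos", "Arctos Plants", "WoRMS (via Arctos)"]
available = {
    "Arctos": {"Sorex cinereus", "Sorex monticolus"},
    "Arctos Plants": {"Picea glauca"},
    "WoRMS (via Arctos)": {"Enhydra lutris"},
}

print(source_for_identification(["Sorex cinereus", "Sorex monticolus"], preferred, available))
print(source_for_identification(["Picea glauca"], preferred, available))
print(source_for_identification(["Sorex cinereus", "Picea glauca"], preferred, available))
```

Under this reading, an identification that mixes names held in different sources gets no classification at all, because no single source covers every taxon.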
I hope this will lead to more smaller and cleaner classifications. CollectionA has a shrew taxonomist, so they start a "Soricidae according to us" classification and do cool things with a manageable number of taxa, CollectionB has a bat taxonomist so they do the same, all mammal-having collections then use
If CollectionC doesn't like what CollectionB has done with some bats, they can just create a "Phyllostomidae" classification and use...
This is a different viewpoint than originally laid out, but I believe it leads to about the same place - collections can "prefer" bits and pieces of multiple classifications, managers can deal with the 50 rabbits they really care about without being force-fed a million insects which are in the same classification for some reason, and then collections can use those well-curated rabbit data without also needing to somehow munge aardvarks in with it.
That also means the rabbit-manager cannot possibly "oops" those million insects, which are in a different compartment, so this could open up the possibility of a hierarchical (or otherwise simplified) editor which writes directly to the core tables.
This makes documenting sources - https://github.com/ArctosDB/arctos/issues/3019 - even more important.
Yay everybody?
If I understand correctly, for DMNS:Inv, we would first choose Source WoRMS (via Arctos) then Source Arctos.
Next, I would copy any classifications that I've created without an aphiaid in WoRMS (via Arctos) to Arctos and delete them from WoRMS (via Arctos). I would still be the person listed as "managed by." There would be no classifications in WoRMS (via Arctos) without an aphiaid.
If - and it has happened to about 500 names since we started using WoRMS (via Arctos) - WoRMS adds a new name, my identification would automatically switch to WoRMS (via Arctos) and show the new classification. Once a year or so, you could probably give me a list of names that list me as the "managed by" that now have a WoRMS aphiaid, so I could remove my name.
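The yearly list could be a simple comparison of the two sources. A rough sketch, assuming we can get the set of names a person manages in the fallback source and a name-to-aphiaid lookup for WoRMS; the function name, the placeholder names, and the aphiaid values are all made up for illustration:

```python
def names_now_in_worms(managed_by_me, worms_aphiaids):
    """Names managed locally that now have an aphiaid in WoRMS, sorted for reading."""
    return sorted(name for name in managed_by_me
                  if worms_aphiaids.get(name) is not None)


# Placeholder names and IDs, not real data.
managed_by_me = {"Genus alpha", "Genus beta", "Genus gamma"}
worms_aphiaids = {"Genus beta": 1, "Genus delta": 2}

print(names_now_in_worms(managed_by_me, worms_aphiaids))
```

Anything on the resulting list is already being served from WoRMS (via Arctos) by the fallback order, so the local copy is just cleanup.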
Sounds perfectly awesome and I'm on board. Will need to do a lot of documentation updating, probably at the same time as all the changes we're consolidating per your request #2695.
Yes, YAY!
Yes, essentially.
Falling back to "Arctos" isn't necessary - you can do that, or create something new, or whatever, but not being limited to one classification source is the big picture.
I'm advocating getting rid of "managed by" as a term altogether now that there's less reason to have giant all-encompassing classifications but whatever, it's not hurting anything, if it makes you happy then rock on!
I might eventually get around to advocating for the WoRMS classification to be purely service-managed, but we can talk about that when/if we get there.
identification would automatically switch
Yup.
Help. I tried to change our source selection by making WoRMS (via Arctos) 1 and adding Arctos as 2.

When I save it, it reverses the order

Neato, thanks!
I applied duct tape, should be doing what you want but I'll think about that form some more.
Thanks. I'll test out a few records and see if anything else needs taping.
I might eventually get around to advocating for the WoRMS classification to be purely service-managed, but we can talk about that when/if we get there.
Once this is working - I vote we do as Dusty suggests
As a test, this morning I took _Achatinella bryonii_, which isn't in WoRMS, so there isn't an aphiaid for it. I copied the entire classification that I had created in WoRMS (via Arctos) into Arctos and deleted the WoRMS (via Arctos) classification. It appears that the catalog record is able to find the correct classification, but the taxonomy page doesn't yet show that the Source for DMNS:Inv for this particular name has changed to Arctos. Should that happen, or will it take a while for it to change?

I'll update that. It's just a view of collection settings, nothing's broken....
@Nicole-Ridgwell-NMMNHS with this in place - I think we should set up a separate taxonomy source for geology stuff - I'll propose in a new issue once we have our data ready.
FWIW I sort of expect any diverse+active paleo collection is going to end up with about 20 taxonomy sources, assuming this is FINALLY the thing that gets people to managing taxonomy in Arctos.....
I don't see any problems with geology collections or mineral taxonomy or etc., but I suspect we're missing some tools - would be good to get that fleshed out ASAP, and of course real data always forges better tools.
We have a working set of data and a plan that we are putting before a few geologists before we put it up for more community discussion. Should be a new issue soon...
Yay! I am excited about this. This will be great for our minerals and I'm looking forward to eventually building up a phylocode classification!
building up a phylocode classification!
If that means what I think it does, it's going to make us think about tools. A few examples of all the complexity that might be needed by any record would give me something to think about, should you happen to have some data hanging around....
@dustymc this is what we tentatively have for minerals, rocks and chemical elements. Have a blast.
Geology Taxonomy.zip
...
Excellent, thanks!
Here is a download of data for Ornithischia from the Paleobiology Database, excluding genus/species; it is a mix of ranked and unranked terms:
PBDB Ornithischia.zip (https://github.com/ArctosDB/arctos/files/5097457/PBDB.Ornithischia.zip)
I think having something like the hierarchical taxonomy editor that would work for unranked terms would be essential.
Well, this is well timed! I'll let you all mention as needed in today's
discussion.
That seems to be purely hierarchical - there's a term with zero or one parents (and some metadata, sometimes). Those data could be managed in some hierarchical tool, and as long as we don't have a need to flatten them then writing back to Arctos should be fairly straightforward. I think it could even take the shape of a built-in editor, as long as there's some mechanism to prevent adding inconsistent data (eg by disallowing access to the single-record editor - https://github.com/ArctosDB/arctos/issues/1698).
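As a sketch of what "a term with zero or one parents" looks like as data, and of the kind of consistency check an editor would need, here is a parent-pointer map with a lineage walk that refuses cycles. The `lineage` function is hypothetical and the three terms are an illustrative fragment, not rows taken from the PBDB download:

```python
def lineage(term, parent_of):
    """Walk parent pointers from term up to the root; raise if a cycle is found."""
    path, seen = [], set()
    current = term
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        path.append(current)
        current = parent_of.get(current)  # None means no parent (root)
    return path


# Each term has zero or one parent; rank is optional metadata and not needed here.
parent_of = {
    "Ornithischia": None,
    "Genasauria": "Ornithischia",
    "Thyreophora": "Genasauria",
}

print(lineage("Thyreophora", parent_of))
```

Nothing here depends on ranks, which is why unranked terms are not a problem as long as the data never have to be flattened into fixed rank columns.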
OK, I have found a flaw in the system (maybe).
Check out https://arctos.database.museum/name/Aphlebia
The insect usage of the name has been declared a synonym, so is not "valid" but the plant usage is valid. I have cloned in both classifications from GBIF (insect to the Arctos source and plant to Arctos Plants) and created the synonym relationship. Here's the rub. ALMNH:ES uses Arctos as the preferred source, with WoRMS (via Arctos) and Arctos Plants in succession. This means that they are going to wind up with the Arctos classification (insect) even if they really mean the plant and in this crazy scenario, they could potentially have both in their collection. Also, the plant version is not a synonym with Phyllodromica, but it is going to look that way now.
Sigh.
You found a flaw in taxonomy, not Arctos....
potentially have both
There's not much of a taxonomy solution for that. Split the collection, use taxon concepts to clarify, ....
plant version is not a synonym with Phyllodromica, but it is going to look that way now.
Relationships help search. If you want to do more, then we need relationships between classifications (which means we need a completely different approach to how we treat classification data, which is hard to imagine happening without dedicated funding).
I figured we could create an ALMNH:ES source for stupid one-offs like this but only if their collection includes only insects OR plants...
There is a part of me that wants to say - taxonomists can't get their act together and I shouldn't have to fix that....