Arctos: Taxon Concepts (again)

Created on 18 Sep 2019 · 30 Comments · Source: ArctosDB/arctos

Ref: https://github.com/ArctosDB/arctos/issues/1852#issuecomment-532857238

Goal

Add taxon concepts as an enhancement to, rather than a replacement for, the current identification-->taxon (by way of preferred classification) pathway.

Definition

For the initial purposes of this project, taxon concepts are defined as the intersection of names and publications (plus relationships between concepts).

The Immediate Plan

  • add a way to manage taxon concepts to Arctos. It may or may not get a UI at this time.
  • add a new foreign key-->taxon_concept_id to table identification

Longer-term

  • sprinkle concept data around; maybe smoosh them into classifications for display?
  • maybe do more with concepts, hopefully by plugging in to some project that's done more with concepts.

Usability Implications

Everybody

There will be a new field, which can be ignored, in identifications

People who want to use taxon concepts

There will be a new field which can reference taxon concepts from identifications, and some way to select concepts for it.

Open Question

Is this fundamentally different from id_sensu (identification.publication_id FKEY-->publication)? Both involve publications and taxa, albeit less directly in the case of id_sensu. Perhaps id_sensu can eventually be merged into taxon_concepts; build them in parallel to test the idea, and don't lose sight of this.

Failing merge, we need to develop disambiguating documentation.

Who's using id_sensu?

select
  guid_prefix,
  count(distinct(identification.collection_object_id)) numberSpecimens,
  count(*) numberIdentifications
from
  collection,
  cataloged_item,
  identification
where
  collection.collection_id=cataloged_item.collection_id and
  cataloged_item.collection_object_id=identification.collection_object_id and
  identification.publication_id is not null
group by
  guid_prefix
order by
  guid_prefix
;

GUID_PREFIX  NUMBERSPECIMENS NUMBERIDENTIFICATIONS
------------ --------------- ---------------------
ALMNH:ES                   4                     5
CHAS:Bird                  1                     1
CHAS:Ento                  4                     4
CHAS:Mamm                 17                    17
DGR:Bird                   6                     6
DGR:Mamm                   1                     1
DMNS:Bird                 17                    26
DMNS:Mamm                127                   130
KNWR:Ento                250                   283
KWP:Ento                  96                    96
MLZ:Bird                1134                  1231
MSB:Bird                  10                    10
MSB:Fish                   3                     3
MSB:Host                  24                    24
MSB:Mamm                4830                  6010
MSB:Para                 282                   297
MVZ:Bird                 308                   308
MVZ:Herp                3604                  3724
MVZ:Hild                   1                     1
MVZ:Mamm                3014                  3276
MVZObs:Herp                1                     1
UAM:Bird                 153                   156
UAM:ES                   749                   811
UAM:Ento                5153                  5154
UAM:Fish                   1                     1
UAM:Herb                 416                   416
UAM:Herp                   3                     3
UAM:Inv                   15                    15
UAM:Mamm                6027                  7217
UAMObs:Ento             1715                  1716
UAMObs:Mamm                7                     7
UAMb:Herb                  5                     5
UCM:Herp                   2                     2
UMNH:Mamm                171                   171
UTEP:ES                  856                   897
UTEP:Ento                 21                    21
UTEP:Herp                369                   464
UTEP:HerpOS                3                     3
UTEP:Mamm                 61                    62

39 rows selected.
Function-Taxonomy/Identification  Grant funded  Priority-Normal

All 30 comments

Great summary @dustymc, thanks. Here is a sketch of my thoughts for the additional two tables and the link to ID. The asterisk indicates additions.

                                              =======================
                                              identification
                                              -----------------------
  ===================                         id (PK)
  taxon_concept *                             agent_id (FK)           ---->
  -------------------                         name_id (FK)            ---->
  id (PK)              <============+==+----  taxon_concept_id * (FK)
  publication_id (FK)  ---->        |  |      =======================
  name_id (FK)         ---->        |  |  
  ===================               |  |
                                    |  |
                                    ^  ^
      =======================       |  |
      taxon_concept_rel *           |  |
      --------------------          |  |
      id (PK)                       |  |
      from_tc_id (FK)           ----+  |
      to_tc_id (FK)             -------+
      relationship
      according_to_pub_id (FK)  ---->
      ========================

The taxon_concept_rel.relationship field could be an enum that includes set relationships: ‘includes’, ‘is included in’, ‘is congruent with’, ‘overlaps’, ‘intersects’, ‘is disjunct with’. Another field could possibly be created with the more vague synonymy terms: ‘is synonym of’, ‘is pro parte synonym of’, ‘is heterotypic synonym of’, ‘is homotypic synonym of’, etc.
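As a rough illustration of the two starred tables in the diagram, here is a minimal sketch using Python's built-in sqlite3 module rather than the production database. The column names follow the diagram; the CHECK constraint standing in for an enum, the helper name create_concept_tables, and the omission of the publication and name tables are all assumptions made for the sake of a small runnable example.

```python
import sqlite3

# Set-relationship terms quoted in the comment above; the final vocabulary is undecided.
SET_RELATIONSHIPS = (
    "includes",
    "is included in",
    "is congruent with",
    "overlaps",
    "intersects",
    "is disjunct with",
)


def create_concept_tables(conn):
    """Create illustrative taxon_concept and taxon_concept_rel tables (hypothetical helper)."""
    # Enforce foreign keys; SQLite leaves them off by default.
    conn.execute("PRAGMA foreign_keys = ON")
    allowed = ", ".join("'{}'".format(term) for term in SET_RELATIONSHIPS)
    conn.executescript(f"""
        CREATE TABLE taxon_concept (
            id             INTEGER PRIMARY KEY,
            publication_id INTEGER NOT NULL,  -- would reference publication
            name_id        INTEGER NOT NULL   -- would reference the taxon name
        );
        CREATE TABLE taxon_concept_rel (
            id                  INTEGER PRIMARY KEY,
            from_tc_id          INTEGER NOT NULL REFERENCES taxon_concept(id),
            to_tc_id            INTEGER NOT NULL REFERENCES taxon_concept(id),
            relationship        TEXT NOT NULL CHECK (relationship IN ({allowed})),
            according_to_pub_id INTEGER           -- would reference publication
        );
    """)
```

With this layout a relationship row can only point at existing concepts and can only use one of the listed set terms; the vaguer synonymy terms would need their own column or a wider vocabulary.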

See here for ongoing discussion of these terms by the @tdwg Taxon Names and Concepts group. The group is creating a new taxonomic names and usages standard, based on the older Taxon Concept Transfer Schema.

See here for an example of the kind of data that could go in the taxon_concept and taxon_concept_rel tables.

@camwebb do you have existing data or is that something that'll be created in Arctos?

If you do have data could you pass it along? If possible, building this around real data will probably stop some problems before they come to exist. I can provide a transfer site if it can't be attached here or emailed.

The model seems right, and I think the relationships (possibly in conjunction with a 'local' publication) could be used to merge "mini-concepts" into more complex concepts ("these 50 pubs all agree....").
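One way to picture merging "mini-concepts" is to group every concept that is linked, directly or through a chain, by an 'is congruent with' relationship. The sketch below does that with a small union-find. It is only an illustration: the function name congruent_groups is hypothetical, and treating congruence as transitive regardless of which publication asserted it is an assumption that curators may not accept.

```python
def congruent_groups(concept_ids, relationships):
    """Group concept ids connected by 'is congruent with' relationships.

    relationships is an iterable of (from_id, to_id, relationship) tuples whose
    ids are all assumed to appear in concept_ids.
    """
    parent = {cid: cid for cid in concept_ids}

    def find(cid):
        # Walk up to the group's representative, shortening the path as we go.
        while parent[cid] != cid:
            parent[cid] = parent[parent[cid]]
            cid = parent[cid]
        return cid

    for from_id, to_id, relationship in relationships:
        if relationship == "is congruent with":
            parent[find(from_id)] = find(to_id)

    groups = {}
    for cid in concept_ids:
        groups.setdefault(find(cid), set()).add(cid)
    # Order groups by their smallest member so the result is stable.
    return sorted(groups.values(), key=min)
```

Other relationship terms ('includes', 'overlaps', and so on) are ignored here, so concepts linked only by those terms stay in separate groups.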

I will need definitions for the relationship terms. I think they'll all be in one code table, and it'll be up to the user to use them properly (eg, not use something vague when you know something specific). Reasonable?

@dustymc At first we'll be generating data locally in our DB, and importing it into Arctos (pre-aligned with the names in Arctos). The best test data would be from the example above. I've substituted numeric keys and excluded all the names that don't match 'Arctos plants names' and pasted the 4 tables as plain text here. Let me know if this will not work and I'll wrangle it into another format.

@dustymc any progress?

Yea I've been playing around a bit when I can't stand to look at PG triggers and such any longer - I think the basics are more or less there, but I need some arctosified data to really know. http://arctos.database.museum/name/Claytonia%20koliana should now have some new toys that you can play with.

@dustymc Great to see this test/demo!

On the Taxonomy Committee phone call just now I discovered that in Arctos there is not a single occurrence of a name + author_string combination. That combination potentially lives in many places in the classification table, no? So there's not yet a name+author_string_id to link to, as per the diagram above.

It seems from the demo that you've dealt with this by using the name + original_publication as a 'place marker' in place of name + author_string and name + later_publication as place marker for subsequent taxon concepts dealing with the same original name. Seems like a good solution given the other tables, but... I'm not sure it'll do what we need it to do. The author strings in botany can get quite complex (e.g., Pulsatilla dahurica (Fisch. ex DC.) Spreng.) and can't be simply captured by a place marker publication (e.g., the Spreng. publication in this example). Also, the data won't come in this publication-centric form. We need a way to refer to the author string.

My suggestion is to form the concept out of a triplet: name_id, publication_id, "author_string". What do you think?

lives in many places in the classification table

For not-us data, yes. For local data, author + infraspecific rank auth + name should all be rolled up in display_name (which is autogenerated and Code-aware) when those data are available. http://arctos.database.museum/name/Pulsatilla%20dahurica#ArctosPlants

Yes I added "concept label" as a band-aid to display what I think you want to display. It could be pulled (and maintained, I think) from the collection's preferred classification?

What exactly do you mean by "publication-centric form"?

This is essentially implemented as a triplet (with a pkey)

UAM@ARCTOS> desc taxon_concept;
 Name                                  Null?    Type
 ----------------------------------------------------------------- -------- --------------------------------------------
 TAXON_CONCEPT_ID                          NOT NULL NUMBER
 TAXON_NAME_ID                             NOT NULL NUMBER
 PUBLICATION_ID                            NOT NULL NUMBER
 CONCEPT_LABEL                             NOT NULL VARCHAR2(255)

it's easy enough to relabel (or change the table at this point) CONCEPT_LABEL->author_string if that does what you're suggesting.

What exactly do you mean by "publication-centric form"?

Moot point now, since I see what you are doing, but to be clear, I think the data will be coming in in this form:

1. Concept = tmp_concept_id (int) + 
   name ( = arctos_name_id (int) +  author_string (string) ) + 
   publication (long string)

2. ConceptRelationship = from_tmp_concept_id (int) + 
   to_tmp_concept_id (int) + 
   relationship (string) +  
   according_to_publication (long string)
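The two incoming record shapes described above could be modelled as plain data classes before anything is matched against Arctos. This is a sketch only: the class names and the dangling_relationships check are hypothetical, and the field names simply mirror the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncomingConcept:
    tmp_concept_id: int
    arctos_name_id: int
    author_string: str
    publication: str  # long free-text citation, not yet matched to a publication record


@dataclass(frozen=True)
class IncomingConceptRelationship:
    from_tmp_concept_id: int
    to_tmp_concept_id: int
    relationship: str
    according_to_publication: str  # long free-text citation


def dangling_relationships(concepts, relationships):
    """Return relationships whose endpoints are missing from the supplied concepts."""
    known = {concept.tmp_concept_id for concept in concepts}
    return [
        rel
        for rel in relationships
        if rel.from_tmp_concept_id not in known or rel.to_tmp_concept_id not in known
    ]
```

Running a check like this before import would catch relationship rows that refer to concepts dropped earlier, for example names excluded because they had no match in Arctos.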

I like the solution you give above. Are you storing the complete name+author_string in CONCEPT_LABEL or just the author_string? CONCEPT_LABEL is a good choice of name.

The hard part of importing seems to be: 1) matching the incoming author_string to any existing display_name in Arctos plants (many will not match exactly), and 2) matching incoming publication strings to existing publication records in Arctos.
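For the first matching problem, a fuzzy comparison can at least shortlist candidates for a human to confirm. The sketch below uses difflib from the Python standard library; the function name, the 0.8 cutoff, and the assumption that display names are available as plain strings (without HTML markup) are all illustrative choices, not anything Arctos does today.

```python
import difflib


def normalize(text):
    """Lower-case and collapse runs of whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def suggest_display_names(incoming, display_names, limit=3, cutoff=0.8):
    """Return up to `limit` existing display names that closely resemble `incoming`."""
    by_normalized = {normalize(name): name for name in display_names}
    hits = difflib.get_close_matches(
        normalize(incoming), list(by_normalized), n=limit, cutoff=cutoff
    )
    # Map the normalized hits back to the strings as stored.
    return [by_normalized[hit] for hit in hits]
```

A suggestion list like this still leaves the decision to a person, which seems appropriate given how complex botanical author strings can be.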

CONCEPT_LABEL

It's just free-text for now. I can probably do more if it makes something easier.

matching the incoming author_string to any existing display_name in Arctos plants (many will not match exactly),

Yes! It would be at least another proposal's work, but at some point we might consider hooking into Agents so we don't have to deal exclusively with strings.

matching incoming publication strings

Yea that's going to be sort of a nuisance too. Recent (and increasingly not-so-recent) publications should have DOIs so we might find some magic in Crossref's API or similar. Older publications are probably going to be a little ugly, but I think that's OK too - the cleanup has to start somewhere.
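As a starting point for the Crossref idea, the sketch below only builds a lookup URL from a free-text citation and leaves the HTTP call and the choice of best hit to the caller. It assumes the public Crossref REST endpoint at api.crossref.org/works with its query.bibliographic and rows parameters; the function name is hypothetical, and how well old citations match is untested.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_lookup_url(citation, rows=3):
    """Build a Crossref works query URL for a free-text citation string."""
    params = {"query.bibliographic": citation, "rows": rows}
    return CROSSREF_WORKS + "?" + urlencode(params)
```

Any DOI that comes back would still need a human check before being tied to a publication record.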

@ArctosDB/taxonomy asks: possible to relate concepts to classifications when they are different?

Huh? "Concepts" as currently defined in Arctos are the intersection of classifications and publications.

As discussed on the call today, Dusty's additions for Taxon Concepts are working well. See this example for Claytonia arctica (http://arctos.database.museum/name/Claytonia%20arctica) and scroll down to Concepts.

I asked Dusty to add the link from Identification to Concept (as a new field in Identification). Please weigh in here if you have any reservations.

Cam,

Nice !

But...
Claytonia arctica Adams sunsu Yurtsev 1981

What is sunsu? Shouldn't that be sensu?

-D



Oops - typo. But I guess this shows up a minor limitation. The concept label is an HTML string that the user creates. E.g., <i>Claytonia arctica</i> Adams <i>sensu</i> Porsild 1974. So prone to errors.

Identification

If it looks like the core model is working I'll go ahead and add that to PG with the other ID changes I need to make. If necessary we can talk about patching it back in to production - hopefully we'll be in PG soon and won't have to....

typo

I'm up for clever ideas, both on the name/label itself and how things get displayed/work/whatever. The most obvious generated "concept name" - <i>Claytonia arctica</i> Adams <i>sensu</i> Porsild 1974 - isn't necessarily unique, and I'm running under the vague idea that "labels" would be cleverly named by Curators. I don't know how that'll line up with reality.

Can we force taxon concepts to be entered into a template form with the <i> and sensu and spaces pre-filled? That will at least eliminate those potential typos. And then pull the name and pub from an Arctos drop-down, as in data entry?


form with the <i> and sensu and spaces pre-filled?

We can do WHATEVER, including not use that format, or not use any consistent format, or not use any label at all, or ....

pull the name and pub

Those are data objects, they can't be a problem here.

I notice a lot of inconsistencies in the name "Adams".

pull the name
a lot of inconsistencies in the name

Err - maybe I'm lost. The taxon name is a data object and so easy/unambiguous/etc. The author name is a string. I could do magic with author_text, but...

name "Adams".

It's a string. Inconsistency is what strings do. If that's undesirable, string is the wrong datatype. We certainly don't have the resources to do anything about that, other than temper our expectations....

Unless we create all authorities as agents . . .


authorities as agents

https://github.com/ArctosDB/arctos/issues/2267#issuecomment-542927279

I've been hoping to shame someone who's had the $$ (and taxonomic focus) into doing that for quite some time. Maybe that's the wrong outlook and we should write a grant to do something ourselves. I'm still not crazy about the idea of inadvertently becoming a "taxonomic authority!"

Back to concepts, I can't add the ID link (=change structure) without breaking my migration scripts, so I'll hold off until directed otherwise and hope we quickly find a way to get to PG.

@campmlc: Can we force taxon concepts to be entered into a template form with the <i> and sensu and spaces pre-filled?

I agree here. The taxon name in the string could be auto-filled from the taxon name from the page, and the _sensu_ publication could be filled from the selected publication. This leaves only the author string 'free'. I think it's important to leave this free and _not_ import it from (and link it to) a classification, because there may be obscure authors that don't appear in any classification.

So the fields for a concept record would be, e.g., concept_id, taxon_name_id, author_string, publication_id, from which a generated label would be displayed: <i>taxon_name</i> author_string <i>sensu</i> publication
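A generated label along those lines could be assembled as in the sketch below. The function name concept_label is hypothetical, and escaping the inputs is an added assumption so that stray markup in a name or citation cannot break the display string.

```python
from html import escape


def concept_label(taxon_name, author_string, publication):
    """Build '<i>taxon_name</i> author_string <i>sensu</i> publication'."""
    parts = ["<i>{}</i>".format(escape(taxon_name.strip()))]
    author = author_string.strip()
    if author:
        # The author string is the one free-text part; skip it when empty.
        parts.append(escape(author))
    parts.append("<i>sensu</i>")
    parts.append(escape(publication.strip()))
    return " ".join(parts)
```

Called with "Claytonia arctica", "Adams" and "Porsild 1974", this yields the same string as the hand-typed example earlier in the thread, with the italics and the word sensu no longer typed by hand.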

@dustymc: Back to concepts, I can't add the ID link (=change structure) without breaking my migration scripts, so I'll hold off until directed otherwise and hope we quickly find a way to get to PG.

Sounds good. No hurry.

I'm still not 100% comfortable with the idea that labels can be fully autogenerated - we can easily revisit that if I'm wrong - so I...

  • added an "author" field to create concept
  • set up a script to pull authors from classifications (and strings within them) and suggest them
  • added a "generate" button to the label

Hopefully that'll add up to two clicks and a publication pick most of the time.

Again, happy to revisit any or all of that if I'm being too paranoid.

Added taxon_concept_id references taxon_concept(taxon_concept_id) to identification
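To show what that structural change amounts to, here is a small sketch against an in-memory SQLite database rather than the real Arctos schema. The taxon_concept columns follow the desc output earlier in the thread; the cut-down identification table and both helper names are assumptions for the example.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory database with cut-down, illustrative tables."""
    conn = sqlite3.connect(":memory:")
    # Enable foreign key enforcement before any transaction starts.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE taxon_concept (
            taxon_concept_id INTEGER PRIMARY KEY,
            taxon_name_id    INTEGER NOT NULL,
            publication_id   INTEGER NOT NULL,
            concept_label    TEXT NOT NULL
        );
        CREATE TABLE identification (
            identification_id INTEGER PRIMARY KEY
        );
    """)
    return conn


def add_concept_link(conn):
    """Add the optional identification -> taxon_concept foreign key."""
    conn.execute(
        "ALTER TABLE identification "
        "ADD COLUMN taxon_concept_id INTEGER "
        "REFERENCES taxon_concept(taxon_concept_id)"
    )
```

Because the new column is nullable, existing identifications are left untouched and simply carry no concept, which matches the "new field, which can be ignored" promise in the original proposal.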

Rebuilt editidentification to use concepts

Added concepts to specimendetail

added concepts to "previousidentifications"

@dustymc I tried out the Taxon Concept Creator, and the Identify to Taxon Concept and they work well. Thank you! I think we now have full functionality to record any imported TCs and TCRels from our Flora of Alaska project, and to edit/manage them in Arctos.

Would you like me to edit the user manual to reflect this new functionality?

Please, and YAY!

@dustymc Finally made some edits to the Documentation wiki, but I seem to have lost write access to the Github repo. Could you please re-authorize me? Thanks.

@mkoo help?

@dustymc, @mkoo... ping :-)

@camwebb can you access the docs now?

@dustymc Yup. Changes pushed and appearing in handbook. Thanks.
