Arctos: JSONify everything in FLAT

Created on 2 Jul 2019 · 4 Comments · Source: ArctosDB/arctos

FLAT (=specimen summary) and everything that uses it (primarily DarwinCore files) contain complex data as strings. We now have an "Arctos JSON Standard" (http://handbook.arctosdb.org/documentation/json.html). We should use it to make various concatenated things more digestible and less ambiguous to more users.

Some things that are currently strings in FLAT

collectors
preparators
genbank numbers (but https://terms.tdwg.org/wiki/dwc:associatedSequences)
other IDs
relationships (https://github.com/ArctosDB/internal/issues/11)
citations
attributes
encumbrances (probably best left as-is)
media
previousidentifications (but https://terms.tdwg.org/wiki/dwc:previousIdentifications)

Priority-Low

All 4 comments

whoa- that's cool.

@tucotuco @dbloom DWC has collectors and preparators - Arctos has more, and much more information than the strings we're sharing now.

Would for example something like....

select concatCollectors_JSONTEST(60674) from dual;
[
   {
      "AN":"Thomas V. Schumacher",
      "CR":"collector",
      "CO":"1",
      "MI":"http://arctos.database.museum/agent.cfm?agent_id=2775"
   },
   {
      "AN":"K. Rutledge",
      "CR":"collector",
      "CO":"2",
      "MI":"http://arctos.database.museum/agent.cfm?agent_id=3606"
   },
   {
      "AN":"Damon R. Bender",
      "CR":"preparator",
      "CO":"3",
      "MI":"http://arctos.database.museum/agent.cfm?agent_id=310"
   }
]

in collectors (lacking better ideas...) instead of the current...

UAM@ARCTOS> select collectors,preparators from flat where collection_object_id=60674;
Thomas V. Schumacher, K. Rutledge
Damon R. Bender

... melt DWC?

In general, is this a dumb idea - eg, would it make things less accessible since everyone else is just sending (and perhaps storing) strings?
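On the accessibility worry: a consumer who only wants the old strings could rebuild them from the JSON. Here is a minimal sketch, assuming `CR` holds the role and `CO` is an integer-like ordering (both inferred from the sample output, not confirmed against the Arctos JSON Standard). The helper name `flatten_agents` is made up for illustration.

```python
import json

# Sample payload copied from the concatCollectors_JSONTEST(60674) output above.
payload = """
[
   {"AN": "Thomas V. Schumacher", "CR": "collector", "CO": "1",
    "MI": "http://arctos.database.museum/agent.cfm?agent_id=2775"},
   {"AN": "K. Rutledge", "CR": "collector", "CO": "2",
    "MI": "http://arctos.database.museum/agent.cfm?agent_id=3606"},
   {"AN": "Damon R. Bender", "CR": "preparator", "CO": "3",
    "MI": "http://arctos.database.museum/agent.cfm?agent_id=310"}
]
"""


def flatten_agents(json_text, role):
    """Rebuild a comma-separated name string for one role (hypothetical helper)."""
    agents = [a for a in json.loads(json_text) if a["CR"] == role]
    # Assumption: "CO" is the display order, stored as a numeric string.
    agents.sort(key=lambda a: int(a["CO"]))
    return ", ".join(a["AN"] for a in agents)


collectors = flatten_agents(payload, "collector")
preparators = flatten_agents(payload, "preparator")
print(collectors)
print(preparators)
```

For this record the two strings come out the same as the current `collectors` and `preparators` values in FLAT, so the JSON form carries at least what the strings do.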

If we are going to be slinging URLs around in data, maybe we should check into buying a short domain name or grabbing some ARKs or something for that purpose - http://arctos.database.museum/agent.cfm?agent_id= repeated 12 times for http://arctos.database.museum/guid/CHAS:Herp:1998.3.1 would be about 600 bytes.

UAM@ARCTOS> select concatCollectors_JSONTEST(collection_object_id) from flat where guid='CHAS:Herp:1998.3.1';
[
   {
      "AN":"Paul G. Heltne",
      "CR":"collector",
      "CO":"1",
      "MI":"http://arctos.database.museum/agent.cfm?agent_id=21296753"
   },
   {
      "AN":"Jean Linsner",
      "CR":"collector",
      "CO":"2",
      "MI":"http://arctos.database.museum/agent.cfm?agent_id=21319390"
   },
   {
      "AN":"Kayla Barlow",
      "CR":"preparator",
      "CO":"3",
      "MI":"http://arctos.database.museum/agent.cfm?agent_id=21319391"
   },
   {
      "AN":"Tom Collins",
      "CR":"preparator",
      "CO":"4",
      "MI":"http://arctos.database.museum/agent.cfm?agent_id=21317418"
   },
   {
      "AN":"Anastasia DeMaio",
      "CR":"preparator",
      "CO":"5",
      "MI":"http://arctos.database.museum/agent.cfm?agent_id=21317512"
   },
   {
      "AN":"Annamarie Fadorsen",
      "CR":"preparator",
      "CO":"6",
      "MI":"http://arctos.database.museum/agent.cfm?agent_id=21292439"
   },
   {
      "AN":"Carlos Molina",
      "CR":"preparator",
      "CO":"7",
      "MI":"http://arctos.database.museum/agent.cfm?agent_id=21316660"
   },
   {
      "AN":"Elizabeth Nelson",
      "CR":"preparator",
      "CO":"8",
      "MI":"http://arctos.database.museum/agent.cfm?agent_id=21319393"
   },
   {
      "AN":"Endora Roberts",
      "CR":"preparator",
      "CO":"9",
      "MI":"http://arctos.database.museum/agent.cfm?agent_id=21311704"
   },
   {
      "AN":"Kristian Williams",
      "CR":"preparator",
      "CO":"10",
      "MI":"http://arctos.database.museum/agent.cfm?agent_id=21319392"
   },
   {
      "AN":"Jensen Wong",
      "CR":"preparator",
      "CO":"11",
      "MI":"http://arctos.database.museum/agent.cfm?agent_id=21316663"
   },
   {
      "AN":"Ellie Zahedi",
      "CR":"preparator",
      "CO":"12",
      "MI":"http://arctos.database.museum/agent.cfm?agent_id=21316932"
   }
]
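The size estimate for the repeated URL prefix can be checked with a few lines of arithmetic. This sketch only counts the shared prefix for the 12 agents in the record above; the variable names are illustrative.

```python
# URL prefix repeated once per agent in the JSON above.
prefix = "http://arctos.database.museum/agent.cfm?agent_id="
agent_count = 12  # agents listed for CHAS:Herp:1998.3.1

prefix_bytes = len(prefix.encode("utf-8"))
overhead_bytes = prefix_bytes * agent_count

print(prefix_bytes, overhead_bytes)
```

That works out to 49 bytes per agent and 588 bytes for the record, which is where "about 600 bytes" comes from.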

Would it melt DwC? No. Would people be able to understand it? Partially. The keys are a bit esoteric. If those are spelled out with apt labels it would help. Full-text searches on the recordedBy field would still recover relevant records, just as now. It doesn't exactly fit the semantics of recordedBy, but it does contain everything that recordedBy is meant to contain. For botanical specimens, would you fill the name with the official botanical collector name? The URL to the agent is pretty cool - Arctos has its own Agent authority with lots of info. If orcIDs and foaf entries were in that authority (accessible via API) you'd have the beginnings of linked open data that could participate more broadly than just Arctos.

> The keys are a bit esoteric.

https://github.com/ArctosDB/arctos/issues/2131 - I'm totally open to better ideas.
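To make the "spelled out with apt labels" suggestion concrete, here is a rough sketch of expanding the short keys on export. The long labels are guesses based only on the sample values in this thread (a name, a role, an order, an agent URL); they are not taken from the Arctos JSON Standard, and `expand_keys` is a hypothetical helper.

```python
import json

# Hypothetical labels, guessed from the sample values in this thread.
# They are NOT taken from the Arctos JSON Standard.
ASSUMED_LABELS = {
    "AN": "agent_name",
    "CR": "role",
    "CO": "order",
    "MI": "agent_url",
}


def expand_keys(records, labels=ASSUMED_LABELS):
    """Return copies of the records with short keys replaced by longer labels.

    Keys without an entry in `labels` are passed through unchanged.
    """
    return [{labels.get(key, key): value for key, value in rec.items()}
            for rec in records]


sample = json.loads(
    '[{"AN": "K. Rutledge", "CR": "collector", "CO": "2", '
    '"MI": "http://arctos.database.museum/agent.cfm?agent_id=3606"}]'
)
expanded = expand_keys(sample)
print(json.dumps(expanded, indent=3))
```

Longer keys cost more bytes per record, so this would trade some size for readability.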

> For botanical specimens, would you fill the name with the official botanical collector name?

We could easily include that for the few we have.

UAM@ARCTOS> select count(*) from agent_name where agent_name_type='Kew abbr.';

  COUNT(*)
----------
       849

> orcIDs

Those (and Wikidata) are just addresses - we could share them. (All 10 of them; maybe this is the tipping point users need to provide those data...)

> API

That would be easy enough to build, although finding the processors to run it might be less trivial.

> could participate more broadly than just Arctos.

That would be the point - this isn't really doing anything for "us" (at least not until we embarrass someone else into sharing more than strings, THEN it might get interesting).
