Arctos: locality attributes: site description

Created on 10 Jul 2020 · 26 Comments · Source: ArctosDB/arctos

@Jegelewicz @Nicole-Ridgwell-NMMNHS breaking this out of https://github.com/ArctosDB/arctos/issues/2498

Site Found By and Site Found Date will be merged into a free-text attribute... (right?)

    attribute_type-->"site description"
    determined_by_agent_id-->getAgentID( Site Found By)
    attribute_value-->"no information available"  (very up for better ideas here!)
    attribute_units-->NULL
    attribute_remark-->NULL
    determination_method-->NULL
    determined_date-->Site Found Date

Please feel free to correct any of that.

Some data (in production) can't be converted and will need to be cleaned up before the final migration. (In test, I can just drop anything that doesn't fit.)

arctosprod@arctos>>  select geo_att_value from geology_attributes where geology_attribute='Site Found By' and getAgentID(geo_att_value) is null group by geo_att_value;
     geo_att_value     
-----------------------
 Barrick
 Benvie
 Bobrow
 Colpits
 Corwall
 Costas X. Tsentas
 Cotton
 de Abreu
 Drake
 Graffham
 Hester
 Jason S. Silviria
 Kazur
 Machin
 Massingill
 Merewether
 M. Schillaci
 Natal V. D'Andrea
 Otis
 Pence
 Phillip B. Huber
 Pittman
 Soskin
 Stachura
 Tatham
 Tedford
 Torreon
 Traeger
 verbatim agent
 Winnett
 Yeck
(31 rows)




arctosprod@arctos>>  select geo_att_value from geology_attributes where geology_attribute='Site Found Date' and is_iso8601(geo_att_value)!='valid' group by geo_att_value;
 geo_att_value 
---------------
 5/2/1977
 7/25/1986
 10/4/1986
 1900-01-00
 2/7/1987

All 26 comments

attribute_type-->"site description"
determined_by_agent_id-->getAgentID( Site Found By)
attribute_value-->"site found" (this is what NMMNH staff and other paleo people I have worked with call this - would be nice if we could have a table from which to get these values to keep them consistent)
attribute_units-->NULL
attribute_remark-->NULL
determination_method-->NULL
determined_date-->Site Found Date

new table site_description

term | definition
-- | --
site found | for a paleontological site, the date and the agent(s) who originally surveyed a site.

arctosprod@arctos>> select geo_att_value from geology_attributes where geology_attribute='Site Found By' and getAgentID(geo_att_value) is null group by geo_att_value;

geo_att_value

Current value | Change to
-- | --
Barrick | unknown
Benvie | George Benvie
Bobrow | unknown
Colpits | unknown
Corwall | unknown
Costas X. Tsentas | Costas Tsentas
Cotton | unknown
de Abreu | unknown
Drake | unknown
Graffham | unknown
Hester | unknown
Jason S. Silviria | Jason Silviria
Kazur | unknown
Machin | unknown
Massingill | unknown
Merewether | unknown
M. Schillaci | Michael Schillaci
Natal V. D'Andrea | Natal V. D'Andrea
Otis | unknown
Pence | unknown
Phillip B. Huber | Philip B. Huber
Pittman | unknown
Soskin | unknown
Stachura | unknown
Tatham | unknown
Tedford | unknown
Torreon | unknown
Traeger | unknown
verbatim agent | unknown
Winnett | unknown
Yeck | unknown

For all "unknown", put the original name in the attribute remark.
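
A rough Python sketch of that rule, assuming the lookup table above is held in a dictionary. `resolve_agent` and the returned field names are hypothetical, and only a few rows of the table are reproduced here.

```python
# A few rows copied from the "Current value | Change to" table above.
AGENT_LOOKUP = {
    "Benvie": "George Benvie",
    "Costas X. Tsentas": "Costas Tsentas",
    "M. Schillaci": "Michael Schillaci",
    "Barrick": "unknown",
    "Torreon": "unknown",
}

def resolve_agent(current_value: str) -> dict:
    """Return the agent name to use and any remark to keep.

    When the replacement is "unknown", the original string is kept in the
    remark so the verbatim name is not lost. Values missing from the lookup
    are passed through unchanged (assumed to resolve to an agent already).
    """
    agent = AGENT_LOOKUP.get(current_value, current_value)
    remark = current_value if agent == "unknown" else None
    return {"agent_name": agent, "attribute_remark": remark}
```

Keeping the verbatim string in the remark means a later cleanup could still match "Barrick" to a real agent if one turns up.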

arctosprod@arctos>> select geo_att_value from geology_attributes where geology_attribute='Site Found Date' and is_iso8601(geo_att_value)!='valid' group by geo_att_value;

geo_att_value

Current value | Change to
-- | --
5/2/1977 | 1977-05-02
7/25/1986 | 1986-07-25
10/4/1986 | 1986-10-04
1900-01-00 | 1900-01-01
2/7/1987 | 1987-02-07
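
The date fixes in the table above follow a simple pattern, sketched here in Python under two assumptions: slash-delimited values are month/day/year (consistent with 7/25/1986, which can't be day/month), and a day of `00` becomes `01` as in the table. `to_iso8601` is a hypothetical helper, not an Arctos function.

```python
import re
from datetime import date

def to_iso8601(value: str) -> str:
    """Normalize a Site Found Date value to YYYY-MM-DD.

    Assumes slash-delimited input is M/D/YYYY and that a "00" day
    should become "01", as in the mapping table above.
    """
    value = value.strip()
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", value)
    if m:
        month, day, year = (int(g) for g in m.groups())
        return date(year, month, day).isoformat()
    m = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", value)
    if m:
        year, month, day = (int(g) for g in m.groups())
        # A zero day is not a valid calendar date; bump it to the 1st.
        return date(year, month, max(day, 1)).isoformat()
    raise ValueError(f"unrecognized date format: {value!r}")
```

Anything that matches neither pattern raises an error, so unexpected formats surface for manual review instead of being guessed at.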

site found

That takes me back to the point, probably buried in the other issue, that you're never actually going to know that. I suppose "surveyed" has some implications, but it still seems at least awkward to imply that some paleontologist necessarily got there first. I suspect this concept will be picked up by archeologists and herpetologists and etc. as well, which is even weirder.

new table site_description

I don't think there's a new table involved. "site description" would be an entry in ctlocality_attribute_type (makes it available as a locality attribute), and NOT an entry in ctlocality_att_att (makes it free-text).
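
To illustrate the distinction, here is a minimal Python sketch of how such a check could behave. It assumes the two code tables can be modeled as a set of allowed attribute types and a mapping from type to controlled values. `validate_locality_attribute` and the "example controlled type" entry are made up for contrast, so this is not Arctos's actual validation code.

```python
# Stand-in for ctlocality_attribute_type: types usable as locality attributes.
ATTRIBUTE_TYPES = {"site description", "example controlled type"}

# Stand-in for ctlocality_att_att: types listed here have controlled values.
# "site description" is deliberately absent, which makes it free-text.
CONTROLLED_VALUES = {"example controlled type": {"value a", "value b"}}

def validate_locality_attribute(attribute_type: str, attribute_value: str) -> bool:
    """Return True if the type is known and the value is acceptable for it."""
    if attribute_type not in ATTRIBUTE_TYPES:
        return False
    allowed = CONTROLLED_VALUES.get(attribute_type)
    if allowed is None:
        # No controlling entry: any non-empty text is accepted.
        return bool(attribute_value.strip())
    return attribute_value in allowed
```

Under this model, "site found" is just one free-text value of "site description", which is why it is easy to change later.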

In any case, it seems like we're agreed on the idea, and free-text values are ~easy to update if we have a reason to. I'll go update production with your lookups and then check back here.

some paleontologist necessarily got there first.

But that is the idea and sometimes a publication is the confirmation of it.

@Nicole-Ridgwell-NMMNHS can chime in.

I suspect this concept will be picked up by archeologists and herpetologists and etc. as well, which is even weirder.

I would imagine that archaeologists have something like this and I could also see how a "type locality" would make use of this concept...

FWIW - type localities could be associated with taxa....

I know of a couple where a farmer on a dozer "found" the site, I suspect hundreds of them were found by some Native kid then "found" by an archeologist 500 years later, etc., etc., etc. I'm just not seeing any reason other than tradition to use that terminology. Not a technical problem, I'm mostly OK with doing whatever, this just seems like a very easy way for Arctos to be a lot more inclusive.

OK, how about we make it less "first-y"? That way, there could be multiple assertions.

term | definition
-- | --
site found | the date and the agent(s) who surveyed a site.

less "first-y"

I like.

found

Still "first-y"

Sheesh - my brain is melting. "found" is what the paleo people are used to. Other possible terms:

site surveyed
site developed
site visited

@Nicole-Ridgwell-NMMNHS

I don't think "found" has quite the same implications as "discover." But maybe we should get some more people to weigh in on this. I don't want to generalize the term and lose meaning here either. "Surveyed" and "developed" both have separate meanings.

How about definition: "the date and the agent(s) who originally found and reported the locality to a collecting institution"? I'm trying to make sure this covers things like 5 year old finds fossil in hillside, parents call museum and museum collects fossil.

site found - the date and the agent(s) who originally found and reported the locality to a collecting institution.

I like that better and I think we are stuck with found....

I'm trying to make sure this covers things like 5 year old finds fossil in hillside, parents call museum and museum collects fossil.

Yes, that's much of the idea I'm trying to get across. Put their name in a cool database, and they'll call you the next time as well. If the rock (invasive plant, rare snake, whatever) disappears into some bureaucracy then the next one might end up on their shelf.

I think the definition mostly addresses the rest of my concerns.

I'll run with "site found"; we can revisit later if necessary. This does seem like a place where a small change could have big impacts in how the public views their relationship to "us," and Arctos doesn't limit us to one "field," so this all might be worth some more discussion.

I made the data updates @Jegelewicz mapped above in production, but had to drop some indexes to accommodate. Please avoid doing anything to the geology code table, and be super-extra-double-careful if you absolutely must.

I found a wall, possibly the sort I'd like to avoid with TRS data. There are multiple dates, no dates, multiple agents, no agents, and various combinations thereof. I'm not sure how to merge that.

The easy path from here is to just make multiple attributes with at least one of (determiner, date) NULL (same as we have now) and deal with them later, but I'm not crazy about ignoring the inconsistency either. I did this (in production); data attached, advise please.

-- Staging table: one row per locality that has either attribute.
create table temp_loc_sfbd (locality_id bigint, fd_by varchar, fd_dt varchar);

insert into temp_loc_sfbd (locality_id) (
    select distinct locality_id from (
        select locality_id from geology_attributes where geology_attribute='Site Found By'
        union
        select locality_id from geology_attributes where geology_attribute='Site Found Date'
    ) x
);

-- Concatenate every Site Found By value for the locality into one string.
update temp_loc_sfbd set fd_by=(
    select string_agg(x,'; ') from (
        select geo_att_value x from geology_attributes
        where geology_attributes.locality_id=temp_loc_sfbd.locality_id
        and geology_attribute='Site Found By'
    ) als
);

-- Same for Site Found Date; localities with no date are left NULL.
update temp_loc_sfbd set fd_dt=(
    select string_agg(x,'; ') from (
        select geo_att_value x from geology_attributes
        where geology_attributes.locality_id=temp_loc_sfbd.locality_id
        and geology_attribute='Site Found Date'
    ) als
);

temp_loc_sfbd.csv.zip
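
For readers less familiar with `string_agg`, the same staging step can be sketched in Python. The input rows mimic `(locality_id, geology_attribute, geo_att_value)` tuples. The sample rows are invented for illustration, and `stage_site_found` is a hypothetical name.

```python
from collections import defaultdict

def stage_site_found(rows):
    """Build {locality_id: {"fd_by": ..., "fd_dt": ...}} from attribute rows.

    Values of the same kind are joined with "; "; a missing kind stays None,
    matching what string_agg returns when there are no rows to aggregate.
    """
    found_by = defaultdict(list)
    found_date = defaultdict(list)
    for locality_id, attribute, value in rows:
        if attribute == "Site Found By":
            found_by[locality_id].append(value)
        elif attribute == "Site Found Date":
            found_date[locality_id].append(value)

    staged = {}
    for locality_id in sorted(set(found_by) | set(found_date)):
        by = found_by.get(locality_id)
        dt = found_date.get(locality_id)
        staged[locality_id] = {
            "fd_by": "; ".join(by) if by else None,
            "fd_dt": "; ".join(dt) if dt else None,
        }
    return staged

# Invented sample rows, for illustration only.
rows = [
    (1, "Site Found By", "A. Collector"),
    (1, "Site Found By", "B. Collector"),
    (1, "Site Found Date", "1984-06-01"),
    (2, "Site Found Date", "1977-05-02"),
]
staged = stage_site_found(rows)
```

One difference worth noting: the Python version keeps input order within each locality, while `string_agg` without an `order by` makes no ordering guarantee.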

Suggest replace

NULL Found by with "unknown"
NULL Found Date with "1901-01-01" (this is what was used for a bunch of them where apparently the date was unknown)

@Nicole-Ridgwell-NMMNHS

Multiple names for a location can be two (or however many) attributes - it indicates that a group of people "found" the site.

Multiple dates:

This one is just entered twice

11030595 | Kenneth K. Kietzke | 1984-06-01; 1984-06-01

as far as I can tell, that's the only multiple date.

NULL Found by with "unknown"
NULL Found Date with "1901-01-01"

If both are null, just don't add the attribute. Is there a reason to use a filler date rather than just leaving the date blank?

Multiple names for a location can be two (or however many) attributes - it indicates that a group of people "found" the site.

Yes, create multiple attributes for multiple names.

There aren't any in the list where both are NULL.

I assume we are making the determined date required. Possible to NULL?

making the determined date required

No, it isn't required; I've left it nullable (no NOT NULL constraint):

arctosprod@arctosutf>> \d locality_attributes
                                                     Table "public.locality_attributes"
         Column         |         Type          | Collation | Nullable |                              Default                               
------------------------+-----------------------+-----------+----------+--------------------------------------------------------------------
 locality_attribute_id  | integer               |           | not null | nextval('locality_attributes_locality_attribute_id_seq'::regclass)
 locality_id            | bigint                |           | not null | 
 determined_by_agent_id | bigint                |           |          | 
 attribute_type         | character varying(60) |           | not null | 
 attribute_value        | character varying     |           | not null | 
 attribute_units        | character varying(60) |           |          | 
 attribute_remark       | character varying     |           |          | 
 determination_method   | character varying     |           |          | 
 determined_date        | character varying(22) |           |          | 
Indexes:
    "ix_u_locality_attributes_id" UNIQUE, btree (locality_attribute_id)

Does look like only one with multiple dates, so I can zap one of those and be left with...

date without determiner
determiner without date
multiple determiners on the same (maybe-null) date
1:1
all of which works
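
Those cases can be handled uniformly. A minimal Python sketch, assuming a staged row with `fd_by` and `fd_dt` strings joined by `; ` as in `temp_loc_sfbd`. `expand_site_found` and the output keys are hypothetical, and resolving a name to an agent ID is left out.

```python
def expand_site_found(locality_id, fd_by, fd_dt):
    """Return one "site found" attribute per determiner.

    Duplicate dates such as "1984-06-01; 1984-06-01" collapse to one date.
    More than one distinct date is ambiguous and is rejected for manual review.
    If both inputs are empty, no attribute is created.
    """
    if not fd_by and not fd_dt:
        return []
    agents = [a.strip() for a in (fd_by or "").split(";") if a.strip()] or [None]
    dates = sorted({d.strip() for d in (fd_dt or "").split(";") if d.strip()}) or [None]
    if len(dates) > 1:
        raise ValueError(f"locality {locality_id}: multiple distinct dates {dates}")
    return [
        {
            "locality_id": locality_id,
            "attribute_type": "site description",
            "attribute_value": "site found",
            "determiner": agent,
            "determined_date": dates[0],
        }
        for agent in agents
    ]
```

For example, `expand_site_found(11030595, "Kenneth K. Kietzke", "1984-06-01; 1984-06-01")` produces a single attribute, and a row with two names and no date produces two attributes with a null date.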

Yay?

YAY!

Sounds good!

Bah. Some of these have remarks and methods. The data in prod are a lot more complex than what I have to test with, the prod UIs are overloaded and not very useful, and I'm just less confident than I'd like to be about this. I tentatively propose to create a site found attribute for every existing Site Found By and Site Found Date attribute, and then we can revisit a merger when the dust has settled a bit. Is that OK? If not I'll put this aside and come back next week.
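
The one-attribute-per-existing-row approach proposed here could look roughly like the following Python sketch. The `remark` and `method` keys are assumed names for whatever columns hold the remarks and methods, and `to_site_found` is hypothetical.

```python
def to_site_found(row):
    """Convert one Site Found By / Site Found Date row to a locality attribute.

    `row` is a dict with keys locality_id, geology_attribute, geo_att_value,
    plus optional "remark" and "method" (assumed names, carried over as-is).
    """
    attribute = {
        "locality_id": row["locality_id"],
        "attribute_type": "site description",
        "attribute_value": "site found",
        "determiner": None,
        "determined_date": None,
        "attribute_remark": row.get("remark"),
        "determination_method": row.get("method"),
    }
    kind = row["geology_attribute"]
    if kind == "Site Found By":
        attribute["determiner"] = row["geo_att_value"]
    elif kind == "Site Found Date":
        attribute["determined_date"] = row["geo_att_value"]
    else:
        raise ValueError(f"unexpected geology_attribute: {kind!r}")
    return attribute
```

Nothing is merged at this stage, so each result has either a determiner or a date but never both. Pairing them up is the later cleanup step.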

That should be OK - I think it would be easy to remove duplicates, then we can mess around with the rest.

Yes, I think that is fine.

This is done but I'll leave it open - there are a lot of internal inconsistencies, might need to pull/clean/reload or something when we're in the more-capable system.
