Arctos: units standardization

Created on 20 Feb 2018 · 35 Comments · Source: ArctosDB/arctos

From @Jegelewicz on https://github.com/ArctosDB/arctos/issues/1403#issuecomment-358690383

Perhaps a discussion across the community to change all measurement units to the unambiguous spelled-out versions? It makes it difficult in the data entry process when you have to go figure out which spelling you need to use for three or four different unit of measurement fields. It would be best to have these consistent across ALL units of measurement.

Data attached.

1) Did I miss any code tables?
2) Strong objections to spelling everything out? µL-->microliters, ft-->feet, etc.

temp_all_units.csv.zip

Enhancement Function-CodeTables Priority-High

All 35 comments

I have no objections to fully spelling out units. Would listing the unit with its abbreviation be useful? e.g., microliters (µL)

No objections. Agree with adding abbreviation.

On Fri, Jun 12, 2020, 3:51 PM Emily Braker notifications@github.com wrote:

I have no objections to fully spelling out units. Would listing the unit
with its abbreviation be useful? e.g., microliters (µL)

Agree with abbreviation. That would be helpful.

Parenthetical abbreviations would make it much harder to combine these data with other data, use them to talk to webservices, etc., etc., etc. They'd certainly break a bunch of my scripts that convert things to common units, and they'd probably make bulkloading less fun than it is now (was that "feet" or "feet (ft)" or "foot (ft)" or "f(oo)t" or ??????????????). The closer we can stay to "normal" or "expected" the happier everyone's going to be.
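
To make the concern above concrete, here is a minimal sketch of how a script that expects plain unit terms can reject decorated ones. The vocabulary and the `normalize_unit` helper are hypothetical illustrations, not Arctos code, and the sketch assumes the consuming script matches unit strings exactly.

```python
# Hypothetical controlled vocabulary; the real code table values may differ.
CONTROLLED_UNITS = {"feet", "meters", "microliters"}


def normalize_unit(value: str) -> str:
    """Return the unit if it exactly matches the vocabulary, else raise."""
    cleaned = value.strip().lower()
    if cleaned not in CONTROLLED_UNITS:
        # "feet (ft)" or "foot (ft)" land here: no exact match exists.
        raise ValueError(f"unrecognized unit: {value!r}")
    return cleaned
```

Under these assumptions `normalize_unit("feet")` succeeds, while `normalize_unit("feet (ft)")` raises, so every consumer would need its own parsing rules for the parenthetical form.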

So easier to just have the full word, and it is the () that is the problem?
Could we create a code table that people could reference if they want to double-check that the measurement means the same as the abbreviation they generally use?

See for example https://exceljet.net/excel-functions/excel-convert-function

Excel can apparently do magic with liters, microliters, uL (but not µL - ugh!), etc.

microliters (uL), or anything else not on everyone's normal list of units, is going to break Excel, and R, and Arctos, and any chance of comparing to anything else in GBIF, and.....

> create a code table

We already have them - eg https://arctos.database.museum/info/ctDocumentation.cfm?table=cttissue_volume_units. We could certainly do better with the definitions; including all the variants there could be useful, for example.

Looks like we have to add to the code table too.

I am going to advocate for:

term | definition
microliters | (uL, µL) - one millionth of a liter.

I prefer that our terms are the full name of the thing. We can list as many abbreviations as we want in the definition without breaking anything, but it seems like there is always a potential for whatever abbreviation we decide to use as a term to break something.

I like it! Good workaround.

I agree, spell it out!

Issue Summary:

Change all measurement units to the unambiguous spelled-out terms. It makes it difficult in the data entry process when you have to go figure out which spelling you need to use for three or four different unit of measurement fields. It would be best to have these consistent across ALL units of measurement. Abbreviations are not easy for machines to handle so our terms will be the full name of the thing. We can list as many abbreviations as we want in the definition without breaking anything.

Summary of changes suggested is here.

AWG suggests changing all to abbreviations except fathom and microliter. Changes made in document.

@campmlc @ccicero @mkoo @ewommack do we need more discussion or can we proceed?

Are you proposing to use statute mile instead of mi?

On Thu, Jul 2, 2020 at 1:58 PM Teresa Mayfield-Meyer <
[email protected]> wrote:

AWG suggests change all to abbreviations except fathom and microliter.
Changes made in document
https://docs.google.com/spreadsheets/d/16KcU3l7JJIH-4RWLulzCFni1G2Oig6_PUPMtz29UqQk/edit#gid=0
.

@campmlc https://github.com/campmlc @ccicero
https://github.com/ccicero @mkoo https://github.com/mkoo @ewommack
https://github.com/ewommack do we need more discussion or can we
proceed?

I don't know who added that or what it means.

If you mean the column I just added, that's how Excel interprets "mi."

This seems, at best, a step sideways, not forwards towards better communication with more data.

So the abbreviations are problematic?

I think so. They're not good for communicating with machines, or likely anyone whose primary language isn't English. Adding consistency is an improvement, but this is going to be a fair bit of work (it's scattered around in a bunch of places) and probably somewhat disruptive as users adapt, and it seems a waste not to get more out of it.

is "mile" any better? How does Excel interpret that?

On Thu, Jul 2, 2020 at 2:25 PM dustymc notifications@github.com wrote:

I think so. They're not good for communicating with machines, or likely
anyone whose primary language isn't English. Adding consistency is an
improvement, but this is going to be a fair bit of work (it's scattered
around in a bunch of places) and probably somewhat disruptive as users
adapt, and it seems a waste not to get more out of it.

How hard will it be to use the fully spelled out terms in the code table (thus bulkloader, API, DwC downloads, etc.) but have them display in Arctos as the abbreviations that are a column in the code table?

Globally forever, probably impossible.

On a page or two is possible, but I'm less than enthusiastic about intentionally making the UI confusing.

> that are a column in the code table?

That would require rebuilding all of the code tables and all of the code that talks to them.

Excel does not have a "mile" handler.

So, assuming we want to move away from abbreviations as they are currently potentially problematic, what is more evil?

1 feet (which is wrong, but works for every other number except zero - which shouldn't be a problem)

or

2 foot? (which would be wrong for everything but 0 and one)

And how would machines interpret "feet" vs "foot"?

Found The Unified Code for Units of Measure

https://unitsofmeasure.org/ucum.html#section-Base-Units

All of these use the singular form with a symbol for "printing" which avoids the singular/plural problem. Unfortunately, there is no standard "print" symbol for the "Customary" terms, which include foot and so on. The base units can be combined with prefixes to get the rest of the terms we would probably want.
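
As a rough sketch of the prefix-plus-base-unit idea described above, the snippet below derives the size of a few metric length codes from the base unit meter. The `length_code_to_meters` helper and the short prefix list are illustrative assumptions, not part of UCUM tooling or Arctos; the only UCUM detail relied on is that the case-sensitive code for micro is "u".

```python
# A few metric prefixes with their power-of-ten factors. UCUM writes the
# case-sensitive code for micro as "u"; this list is deliberately incomplete.
PREFIX_FACTORS = {"k": 1e3, "c": 1e-2, "m": 1e-3, "u": 1e-6, "n": 1e-9}

BASE_UNIT = "m"  # meter, the base unit of length


def length_code_to_meters(code: str) -> float:
    """Return how many meters one unit of a prefixed metric length code is."""
    if code == BASE_UNIT:
        return 1.0
    # Split into a candidate prefix and the final character as the base unit.
    prefix, base = code[:-1], code[-1:]
    if base != BASE_UNIT or prefix not in PREFIX_FACTORS:
        raise ValueError(f"not a recognized metric length code: {code!r}")
    return PREFIX_FACTORS[prefix]
```

Customary units such as "ft" are rejected here on purpose: they are not built from a prefix and a base unit, so they would need their own lookup.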

@dustymc it is weird to me that machines don't like the metric abbreviations (cm, m, kg, etc.). What gives there? Do you know what the machines want?

Suggest that for now, we proceed with the AWG likes column, but that we eventually consider the following:

Science works in the metric system, but we also need to maintain verbatim information. Would it be possible to record all elevation, depth, and coordinate data entry as "verbatim" (see DwC verbatim elevation, verbatim depth), with a conversion to metric using the prefixes and base units, so that we could use the appropriate "print symbol" and pass the DwC minimum elevation in meters, maximum elevation in meters, minimum depth in meters, and maximum depth in meters terms to aggregators?

Surprisingly, there is no verbatim coordinate error in DwC - maybe this should be proposed as we definitely have some old stuff with error calculated in feet or miles.

> with a symbol for "printing"

That's more complexity than I'm very willing to embrace.

> weird to me ... what gives

Without some useful standard, random people do random things and then act all surprised when they can't talk to each other!

> m

Would be interesting to know how many unit-like things have used that abbreviation over the years!

> AWG likes column

I suggest we table this until the work can do something in return.

> record all elevation, depth, and coordinate data entry as "verbatim"

Doesn't seem like a problem with locality attributes existing. That would also get rid of a denormalizer - there are a LOT of ways to say "this long" and Arctos treats them all as unique.

> using the prefixes and base units

MAYBE that's necessary if we're mixing nanometers and meters or something, but for place-error a straight to_meters() conversion isn't going to lose anything of value and would be a great simplification.
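
A straight conversion of the kind mentioned above could look roughly like the sketch below. The unit keys are illustrative, since the thread had not settled on terms at this point, and this `to_meters` is a hypothetical helper rather than an existing Arctos function. The foot, inch, yard, and statute mile factors are the exact international definitions; the fathom is assumed to be six feet.

```python
# Meters per unit. Keys are illustrative unit terms, not confirmed Arctos
# code table values. The fathom is assumed to be six feet (1.8288 m); other
# values have been used historically.
METERS_PER_UNIT = {
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mi": 1609.344,
    "fathom": 1.8288,
}


def to_meters(value: float, unit: str) -> float:
    """Convert a length in the given unit to meters."""
    if unit not in METERS_PER_UNIT:
        raise ValueError(f"unknown length unit: {unit!r}")
    return value * METERS_PER_UNIT[unit]
```

Keeping the verbatim value and unit alongside the converted number would preserve what was originally recorded while still giving aggregators a single metric figure.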

> no verbatim coordinate error in DwC

Arctos is a click away; I can't see how that would add much value for us.

> I suggest we table this until the work can do something in return.

It will. Definitions will be provided and we will no longer have to deal with "feet" in depth and "ft" in elevation.

> no verbatim coordinate error in DwC

> Arctos is a click away; I can't see how that would add much value for us.

How many people do you believe click through from GBIF or iDigBio?

Surely Arctos can handle abbreviations for the SI system. If there are
people out there that don't understand that m means meter, that's their
problem. I agree with Teresa.

On Mon, Jul 27, 2020 at 11:19 AM Teresa Mayfield-Meyer <
[email protected]> wrote:

I suggest we table this until the work can do something in return.

It will. Definitions will be provided and we will no longer have to deal
with "feet" in depth and "ft" in elevation.

no verbatim coordinate error in DwC

Arctos is a click away; I can't see how that would add much value for us.

How many people do you believe click through from GBIF or iDigBio?

For Pete's sake - can we do something about this?

[image]

My (halfhearted, because this has really limited impact as long as we're refusing to settle on something that lets us talk to other stuff) suggestion would be one (new) "ctdistance_units" code table to replace

https://arctos.database.museum/info/ctDocumentation.cfm?table=ctlat_long_error_units
https://arctos.database.museum/info/ctDocumentation.cfm?table=ctlength_units
https://arctos.database.museum/info/ctDocumentation.cfm?table=ctorig_elev_units
https://arctos.database.museum/info/ctDocumentation.cfm?table=ctdepth_units

and whatever else I've missed.

Measuring fleas in fathoms or giving elevation in femtometers might be weird, but it's better than having a big pile of arbitrary length-stuff hanging around.

Combining the code tables sounds ok to me. I don't think having to navigate around fathoms would bother me too much.

I like it - simple, easy to remember.

OK with me!

On Tue, Feb 23, 2021 at 2:51 PM Teresa Mayfield-Meyer <
[email protected]> wrote:

I like it - simple, easy to remember.

AWG says GO.

I don't think we need a new table here, we can just reuse https://arctos.database.museum/info/ctDocumentation.cfm?table=ctlength_units - going with that unless someone has an immediate reason not to....

Here's the final (I hope) length units code table.

arctosprod@arctosutf>> select * from ctlength_units order by length_units;
 length_units |                                              description                                              
--------------+-------------------------------------------------------------------------------------------------------
 cm           | Centimetre or centimeter, one hundredth of a metre.
 fathom       | A fathom is generally understood to be six feet, but other values have been used in various contexts.
 ft           | Foot or feet; 0.3048 m.
 in           | Inch; 25.4 mm.
 km           | Kilometre or kilometer; equal to one thousand meters.
 m            | Meter or metre; base unit of length in the International System of Units (SI).
 mi           | Mile; generally assumed to be statute mile, 5,280 feet or about 1609 meters.
 mm           | Millimetre or millimeter; one thousandth of a meter.
 yd           | Yard; 3 feet or 0.9144 meters.
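
Moving existing records onto the terms in the table above implies some mapping from older spellings. The sketch below shows one way that mapping might be expressed; it is not the migration that was actually run. "feet" is a legacy value mentioned earlier in this thread, while the other legacy spellings and the `migrate_unit` helper are hypothetical.

```python
# Terms from the ctlength_units table shown above.
LENGTH_UNITS = {"cm", "fathom", "ft", "in", "km", "m", "mi", "mm", "yd"}

# Legacy spellings mapped to final terms. "feet" appears in the thread; the
# other entries are hypothetical examples of what older tables might hold.
LEGACY_TO_FINAL = {
    "feet": "ft",
    "meters": "m",
    "miles": "mi",
    "fathoms": "fathom",
}


def migrate_unit(old_value: str) -> str:
    """Map a legacy unit string to a term in the final code table."""
    if old_value in LENGTH_UNITS:
        return old_value  # already a final term
    try:
        return LEGACY_TO_FINAL[old_value]
    except KeyError:
        # Fail loudly so unmapped values get reviewed instead of guessed.
        raise ValueError(f"no mapping for legacy unit: {old_value!r}") from None
```

Raising on unmapped values, rather than passing them through, keeps anything unexpected out of the consolidated table until someone has looked at it.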

"fathom" is obviously the outlier there (and I had to change the table structure to let it fit), but https://en.wikipedia.org/wiki/Talk%3AFathom#Abbreviation_%22ftm%22 suggests this is the most agreeable "symbol."

I like the simplicity.

done
