Arctos: treat spatial data as spatial data: locality meta issue

Created on 11 May 2020 · 31 Comments · Source: ArctosDB/arctos

Issue Documentation is http://handbook.arctosdb.org/how_to/How-to-Use-Issues-in-Arctos.html

Is your feature request related to a problem? Please describe.

We store spatial data as plain numbers and strings (which will get slightly weirder in Postgres). We then use elaborate math, JavaScript libraries, and the like to attempt spatial operations on those data, plus triggers to keep them properly typed, etc.

Describe what you're trying to accomplish

Treat spatial data as spatial data.

Describe the solution you'd like

  1. Install PostGIS
  2. Convert things that can be represented as spatial data to spatial data
  3. Treat them as spatial data
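A minimal sketch of what steps 2 and 3 could look like, assuming PostGIS is installed; the `geom` column name is hypothetical, though `locality.dec_lat`/`locality.dec_long` are named later in this thread:

```sql
-- Hypothetical sketch: store coordinates as a real geometry column.
ALTER TABLE locality ADD COLUMN geom geometry(Point, 4326);

-- ST_MakePoint takes (x, y), i.e. (longitude, latitude).
UPDATE locality
   SET geom = ST_SetSRID(ST_MakePoint(dec_long, dec_lat), 4326)
 WHERE dec_lat IS NOT NULL AND dec_long IS NOT NULL;

-- Index so spatial queries don't need application-layer math.
CREATE INDEX locality_geom_idx ON locality USING gist (geom);
```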

Describe alternatives you've considered

Struggle on as we have been.

Priority

I think high, but firmly post-migration. Should be done in conjunction with other locality issues so we're not rebuilding forms and then immediately rebuilding the same forms again.

https://github.com/ArctosDB/arctos/issues/2659
https://github.com/ArctosDB/arctos/issues/2658
https://github.com/ArctosDB/arctos/issues/2498
https://github.com/ArctosDB/arctos/issues/2274
https://github.com/ArctosDB/arctos/issues/2271
https://github.com/ArctosDB/arctos/issues/2102
https://github.com/ArctosDB/arctos/issues/2068

PLEASE TAG THIS ISSUE IN ANY LOCALITY STRUCTURE RELATED ISSUES!

Labels: Enhancement, Function-Locality/Event, Georeferencing, Priority-High

Most helpful comment

TACC has elaborated on "not a huge deal" by turning postgis on in test.

All 31 comments

From https://github.com/ArctosDB/arctos/issues/919

Use case: searching for specimens that intersect a polygon like Yosemite NP using the footprint or WKT polygon even if specimen does not reference Yosemite in the locality description. Can we leverage Google Maps API or BerkeleyMapper perhaps?

Do we need to leverage GMA or BM? Why not just send polygon data as a bounding box search like we do on the Arctos search page? Aren't there PG libraries for that? Those WKTs can be converted to other formats too.


Do we need to leverage GMA or BM?

That might solve some issues, but I don't think it's a complete solution.

just send polygon data

Also only part of the problem.

PG libraries

That's (part of) what I'm looking for.

  • core use case: more appropriate datatype for locality.dec_lat,locality.dec_long.
  • random incidental coolness

    • convert UTM to DD.dd and DD.dd to UTM so @Nicole-Ridgwell-NMMNHS can better interact with agencies, so we can continue (or resume) accepting UTM as "as-entered", etc.

    • consider datum when converting 'as-entered' to locality.coordinates==>more accurate mapping

    • do more cool stuff with spatial data - convert stuff, query WITHIN, INTERSECTS, etc.

    • do more with elevation

    • do more with TRS? Not sure what's possible....

    • 19784 other cool things that I haven't thought of yet because I'm dealing with numeric data instead of spatial
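Several of the "incidental coolness" items above reduce to one-line PostGIS queries once a geometry column exists. A hedged sketch of the Yosemite use case (the `geom` column and the polygon coordinates, a rough box around the park, are illustrative):

```sql
-- Hypothetical: find localities intersecting a park footprint supplied as WKT.
SELECT l.locality_id
  FROM locality l
 WHERE ST_Intersects(
         l.geom,
         ST_GeomFromText(
           'POLYGON((-119.9 37.5, -119.2 37.5, -119.2 38.2, -119.9 38.2, -119.9 37.5))',
           4326));
```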

This is really important for NMMNHS spatial data, which, because we use UTMs, is currently basically not functional in Arctos. Aside from NMMNHS need, as dusty says, there is a ton of neat stuff we could do with this functionality.

Sounds like a good idea!


What and who do we need to make this happen?

TACC says PostGIS is "not a huge deal." I don't think we're going to do anything too radical to the locality model, but it's perhaps worth a quick AWG discussion; otherwise I think this just needs to be prioritized. I'll make it my next task.

TACC has elaborated on "not a huge deal" by turning postgis on in test.

As long as that can translate to production!

production

No problem, but will require a restart.

AWG Issues Group: Go, figure out how to UTM-->DD.dd as a start.

one note about UTM to DD conversion, we might want to ensure we can do southern hemisphere transformations too, which will require a utm latitude band letter as well as the zone.

so for UTM, we need these fields: UTM zone, band letter (if northern hemisphere maybe we can assume this and not require it), northing, easting, datum (but maybe we state it is assumed to be WGS84?)

It seems like what I actually need is an SRID, but I'm not yet sure there isn't a more human-readable proxy for it.

32633 is 33N / WGS84, for example.

and I have a definition for it

arctosprod@arctosutf>> select * from spatial_ref_sys where srid=32633;
 srid  | auth_name | auth_srid |                                                                                                                                                                                                                                                                                                        srtext                                                                                                                                                                                                                                                                                                         |                     proj4text                      
-------+-----------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------
 32633 | EPSG      |     32633 | PROJCS["WGS 84 / UTM zone 33N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",15],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32633"]] | +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs 
(1 row)

I haven't been brave enough to try, but I think I can add to my definitions - eg I should be able to convert to/from anything I can find on spatialreference.org/, in theory.
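Given that definition, the conversion itself should be a single ST_Transform call. A hedged sketch using the 32633 example above (the point is arbitrary; easting 500000 is zone 33's central meridian):

```sql
-- Convert a UTM zone 33N / WGS84 point (SRID 32633) to decimal degrees.
SELECT ST_Y(pt) AS dec_lat,   -- latitude
       ST_X(pt) AS dec_long   -- longitude
  FROM (SELECT ST_Transform(
                 ST_SetSRID(ST_MakePoint(500000, 4649776), 32633),
                 4326) AS pt) s;
```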

Below is what I'm starting with. I'm not sure how many of these are valid; I think most of them are something someone made up for some reason. It might be faster (and certainly more accurate) to fire up a utm_zone code table (with SRID) and manually translate them once. That is almost certainly a requirement going forward: values like "not" or "0.5" should be preemptively rejected, and I'm not sure how that could happen without a CT.

I think we'll need to add SRID to ctdatum as well, but that can be Step Two (probably).

arctosprod@arctosutf>> select distinct utm_zone from collecting_event where utm_zone is not null order by utm_zone;
 utm_zone 
----------
 0.5
 06
 1
 10
 10N
 10S
 10T
 11
 11N
 11S
 12
 125
 12N
 12R
 12S
 12s
 13
 135
 13N
 13R
 13S
 13T
 13s
 14
 15
 15T
 15U
 16
 16N
 16T
 16U
 17
 17P
 17T
 17U
 18
 185
 18T
 19
 200
 27
 28
 36K
 36L
 37K
 37L
 48
 48P
 5
 5W
 6
 6N
 6V
 6W
 83
 AG7
 BG8
 BH6
 BH8
 BJ1
 BJ4
 BP1
 BV4
 CB1
 CB2
 CC3
 CG1
 CH0
 CH2
 CH5
 CH6
 CH7
 CJ0
 CJ9
 CK6
 CK9
 CN1
 CN2
 CP1
 CP3
 CP9
 CQ0
 CS4
 CT8
 DA8
 DG1
 DH2
 DH3
 DH7
 DK5
 DK6
 DK7
 DL7
 DM0
 DN4
 DP3
 DQ0
 DQ3
 DR5
 DR6
 DR9
 DS3
 DS5
 DS6
 DS8
 DS9
 DT4
 DT8
 EA0
 EE8
 EH8
 EJ4
 EN1
 EN6
 EP0
 EP2
 EP3
 EP4
 ER3
 ES0
 ET0
 ET4
 FC7
 FH3
 FH4
 FL8
 FN6
 FR2
 FS4
 GC5
 GK6
 GK7
 GL7
 GN8
 GQ4
 GS9
 GU3
 GU9
 KC5
 KC9
 KD3
 KE2
 KE4
 KE5
 KE6
 KF8
 KH0
 KK4
 KK8
 LC5
 LC6
 LC9
 LD6
 LE6
 LF1
 LG5
 LJ2
 LJ3
 LN5
 LP0
 MD1
 MD3
 MD5
 MD8
 ME3
 MF3
 MF7
 MH2
 MH8
 MK7
 MP2
 MP8
 MQ9
 MT1
 MU4
 MUL
 ND4
 ND5
 NE1
 NE2
 NE9
 NF8
 NG6
 NG7
 NG9
 NH1
 NH8
 NJ0
 NK2
 NK7
 NK9
 NL0
 NL4
 NL9
 NM2
 NM7
 NN4
 NP1
 NQ8
 NQ9
 NR9
 NS0
 Orl
 PB4
 PC0
 PE3
 PE4
 PE6
 PF0
 PF2
 PF3
 PF5
 PG7
 PJ0
 PJ2
 PJ7
 PJ9
 PK9
 PL5
 PM0
 PM7
 PN2
 PP9
 PQ0
 PQ2
 PQ6
 PQ9
 PR5
 PR9
 PS5
 PT2
 PT4
 PT6
 PT9
 QE4
 QF7
 QH3
 QH8
 QJ0
 QJ7
 QN3
 QP4
 QQ0
 QQ3
 QQ7
 QS0
 QS3
 QS7
 QT0
 QT1
 QT2
 QU3
 QU6
 QU8
 QU9
 QV6
 QV7
 QV9
 RV0
 RV1
 RV2
 S13
 SE6
 SE7
 SE8
 SE9
 SF6
 SF7
 SG9
 SQ8
 T13
 TB7
 TC9
 TD0
 TD6
 TD9
 TE0
 TE3
 TE4
 TE5
 TE6
 TE7
 TE8
 TF5
 TF9
 TG6
 TJ0
 TN3
 TN4
 TN9
 TP1
 TQ1
 TS9
 TT9
 TU0
 TU1
 TU2
 TU3
 TU4
 TU6
 TU7
 TV1
 TV5
 UA6
 UA7
 UB3
 UB6
 UB7
 UC3
 UF2
 UF4
 UG1
 UH2
 UH3
 UH4
 UK5
 UL5
 UL8
 UM0
 UN1
 UN7
 UP5
 UQ6
 UQ9
 US0
 US2
 US3
 UT0
 UT1
 UT2
 UT5
 UU0
 UU1
 UU2
 UU3
 UU4
 UV8
 VA3
 VA6
 VD9
 VE6
 VF0
 VF2
 VF4
 VF5
 VF9
 VG4
 VJ0
 VJ6
 VJ8
 VL9
 VM5
 VP1
 VR3
 VR4
 VS5
 VU2
 VU3
 VV3
 WA6
 WC2
 WF1
 WF2
 WG6
 WH3
 WR3
 XB3
 XG5
 XG8
 XH0
 XH8
 XJ4
 XK5
 XL3
 XM6
 XN5
 XN7
 XN8
 XR9
 XU6
 YG8
 YH4
 YH6
 YM3
 YM9
 YN7
 YR3
 YT0
 ZD1
 ZM0
 ZM1
 ZM3
 km
 not
(394 rows)

It might be faster (certainly more accurate) to fire up a utm_zone code table (with SRID) and manually translate them once. That is almost certainly a requirement going forward, "not" or "0.5" should be preemptively rejected and I'm not sure how that could happen without a CT.

Agreed, most of that list of zones is nonsense.

band letter (if northern hemisphere maybe we can assume this and not require it)

It would be very nice if we could assume it for northern hemisphere. Is band letter required for southern hemisphere or do we just need to know that it is southern hemisphere?

assume it for northern hemisphere.

I'm no utmologist, but https://upload.wikimedia.org/wikipedia/commons/b/b7/Universal_Transverse_Mercator_zones.svg suggests the 'letter bands' are universal, and I don't seem to have any "bare" zones without the letter suffix in my definition file.

Everything I ever thought I knew about UTM seems to be wrong, and I'm not sure I'm done with that yet.....

We definitely need a zone code table with SRID, BUT in order to not confound datum it will be somewhat arbitrary. I'll just assume everything is WGS84 in the code table (and therefore in the initial conversion), convert to coordinate-like strings (I think using geography at this stage would be problematic, not sure yet), then convert from the original datum to WGS84 (after we add an SRID there...).

PG does not need latitude bands (which I think may not be "official" UTM after all??), but does need hemisphere. I think that just means we'll use the same SRID for lots of entities (eg, all northern zone10 - 10N,10P,..,10X - will use https://spatialreference.org/ref/epsg/32610/).

We could potentially drop the redundancy and reduce that to eg "10 north" and "10 south" but the latitude bands do serve as a sort of checksum, and I think may be necessary in some UTM-like systems.
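The zone-to-SRID mapping described above can be sketched as a function for the proposed code table; EPSG's WGS84 UTM codes follow 32600 + zone (north) and 32700 + zone (south), and bands N-X are northern, C-M southern. The function name is hypothetical:

```sql
-- Sketch of a zone+band -> SRID mapping for a utm_zone code table.
-- Assumes WGS84; bands I and O are never assigned, so no clause for them.
CREATE OR REPLACE FUNCTION utm_srid(zone int, band char) RETURNS int AS $$
  SELECT CASE
           WHEN upper(band) BETWEEN 'N' AND 'X' THEN 32600 + zone  -- northern
           WHEN upper(band) BETWEEN 'C' AND 'M' THEN 32700 + zone  -- southern
         END;
$$ LANGUAGE SQL IMMUTABLE;

-- utm_srid(10, 'T') -> 32610, matching spatialreference.org/ref/epsg/32610/
```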

We have a fair bit of "zone S" in the original data, and I'd guess some of it means "Zone S" and some means "South." (The intended meaning of "N" doesn't matter to me, I think.) I think I can continue ignoring that for now, but at some point we're going to have to deal with our existing not-quite-UTM data (perhaps by tossing it into some 'verbatim...' collecting event attribute).

Please let me know ASAP if any of that sounds wildly wrong.

@tucotuco you got any extra fairy dust to sprinkle around? For now, I'm just trying to convert UTM to DD with postgis, but do feel free to stop me doing anything else I'll regret.

@Nicole-Ridgwell-NMMNHS do you have some wild UTM data handy? A small batch of UTM stuff plus state-or-so (just as a sanity check) might be handy in the maybe-not-so-distant future.

We have a fair bit of "zone S" in the original data, and I'd guess some of it means "Zone S" and some means "South."

Yep, this was something that totally confused me as a student; even in my GIS courses it wasn't really explained. I'm going to guess most of the S's mean "Zone S", and probably most of the N's mean North.

do you have some wild UTM data handy?

Sure, here is some private land (not federally protected) data you can work with:
Sample UTM Set.zip

wasn't really explained

Maybe NOBODY really knows! "S for South" certainly seems to be a common thing in PostGIS lingo.

I would be forced to reject the sample UTM data you sent - I don't think I can do this without something that gets me to the right hemisphere. You can easily fake it by choosing N or greater (these are likely almost all S, maybe some R) - I think all valid northern values will be processed the same way - but then your verbatim won't be very verbatim, which might find a way to bite someone in the future. I assume that will be common - suggestions for how to deal with it? I'm tempted to suggest we include eg "Zone 10, somewhere north of the equator" but I also kinda wanna go wash out my brain with bleach for considering such a thing....

We'll also need to get rid of the N and S suffixes in your value columns, but that should be trivial.

I don't think any of this is stopping me at the moment, just things we need to consider going forward.

I assume that will be common - suggestions for how to deal with it? I'm tempted to suggest we include eg "Zone 10, somewhere north of the equator" but I also kinda wanna go wash out my brain with bleach for considering such a thing....

Our data is easy, just assume northern hemisphere for everything, but that won't necessarily work for other collections' data if we're trying to just take what is currently in UTM verbatim and convert. Definitely moving forward, data entry needs to require Northern hemisphere/Southern hemisphere, written out so it's less likely to be confused with a band letter.

Is that a vote for '10N-X' (=north) and '10C-M' (=south) "zones"?

Sure, but if we're going to use that assumption for converting existing values in verbatim, we might want to run a check first to see if anyone has any southern hemisphere data that might break that rule.

I seem to have a functional UTM2WGS84 script.

Todo ASAP:

Todo next:

  • [ ] allow UTM in anything that loads collecting events, directly or otherwise

Todo less-urgently:

Todo eventually:

  • [ ] datum-correct everything, figure out a place to store coordinates::geography
  • [ ] do magical spatial stuff with data

@Nicole-Ridgwell-NMMNHS I ran your data through my function (https://github.com/ArctosDB/PG_DDL/blob/master/function/UTM2WGS84.sql), results attached. Please sanity-check a few, and note the missing values in https://docs.google.com/spreadsheets/d/12XIyXXfuhh6pHI9TyvuuKUdgKZpg8EDeQLn4zu1cf1M/edit#gid=94044881 - adding them in now (before I pull those data to production) is OK, or we can deal with them through the normal code table path after that goes to prod.

I'll pop open a new issue for the zone thing; I think that deserves its own discussion.

Here's the code I used.

drop table temp_utmtest;
create table temp_utmtest as select * from temp_cache.temp_dlm_uptbl;

select distinct datum from temp_utmtest;
update temp_utmtest set datum='World Geodetic System 1984' where datum='WGS 84';
update temp_utmtest set datum='North American Datum 1983' where datum='NAD 83';
update temp_utmtest set datum='North American Datum 1927' where datum='NAD 27';

select distinct utmzone from temp_utmtest;
-- not enough information; these are mostly SWUS so fake it
update temp_utmtest set utmzone=utmzone||'S';

select distinct utmn from temp_utmtest;
-- this should be numeric, it ain't
update temp_utmtest set utmn=replace(utmn,'N','');

select distinct utme from temp_utmtest;
-- ditto
update temp_utmtest set utme=replace(utme,'E','');

-- place for results
alter table temp_utmtest add lat double precision;
alter table temp_utmtest add lng double precision;
alter table temp_utmtest add convmsg varchar;

-- make this easy
alter table temp_utmtest add key int;
update temp_utmtest set key=nextval('somerandomsequence');
-- now can actually do stuff

CREATE OR REPLACE FUNCTION tempbutm() RETURNS void AS $body$
DECLARE
    r record;
    j json;
begin
  for r in (select * from temp_utmtest) loop
    select UTM2WGS84(
      r.utmzone,
      r.utmn::int,
      r.utme::int,
      r.datum)
    into j;
    if j::jsonb->>'status'='OK' then
      update temp_utmtest set lat=(j::jsonb->>'lat')::double precision,lng=(j::jsonb->>'lng')::double precision where key=r.key;
    else
      update temp_utmtest set convmsg=concat_ws('::',j::jsonb->>'status',j::jsonb->>'message') where key=r.key;
    end if;
  end loop;
END;
$body$
LANGUAGE PLPGSQL
SECURITY DEFINER
volatile;

select tempbutm();

temp_utmtest.csv.zip

I ran your data through my function

Looks good.

adding them in now (before I pull those data to production) is OK, or we can deal with them through the normal code table path after that goes to prod.

Depends, I think, on what we do in #3415; if we go a route where we don't include band zones (other than north/south), then I think we could just add them all at the start.

Putting this on pause until I have postgis at production - TACC is working on that.

Are there any updates on this?

I'll ping Chris.

It would be helpful to know whether this will still happen in the near future or whether it is a more lengthy process than we thought for TACC to turn on postgis. If the latter then we can work on a stopgap of creating a structured UTM locality attribute. @dustymc @mkoo

see also https://github.com/ArctosDB/arctos/issues/2309#issuecomment-812559005

I pinged tacc last week, still waiting for them to schedule something. You'll know more when I know more, unless @mkoo has some magic.

From @mkoo

Actually, if we could move away from WKTs I'd be super happy. It's great for storing, say, a SQUARE, but it gets challenging with coastlines and anything more complicated than an octagon (probably). I just know it's extra steps to convert from GIS data, PLUS it stores no metadata! (That's the killer when comparing to KML or GeoJSON.)

WKT is also a pain for me and now we (will soon, I hope...) have spatial tools, so

  • add polygon(geography) to geog_auth_rec and locality, move the WKT data in, lose the media_ids
  • convert to WKT or other formats as necessary/useful.

I think the biggest drawback is backups - this will make them larger. Seems a price well worth paying.

This will probably mean it's not possible to use media IDs in the catalog record bulkloader, but I don't think anyone ever has and it's possible (and introduces much less chance for confusion) to pre-create localities and use them.
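The bullet plan above could look something like this; the `footprint` and `geom` columns are hypothetical, and MultiPolygon is assumed so coastline-style footprints fit:

```sql
-- Hypothetical sketch: move footprints from WKT-in-media to geography columns.
ALTER TABLE geog_auth_rec ADD COLUMN footprint geography(MultiPolygon, 4326);
ALTER TABLE locality ADD COLUMN footprint geography(MultiPolygon, 4326);

-- Then "search within a higher geography" is a direct spatial predicate:
SELECT l.locality_id
  FROM locality l
  JOIN geog_auth_rec g ON g.geog_auth_rec_id = l.geog_auth_rec_id
 WHERE ST_CoveredBy(l.geom::geography, g.footprint);

-- WKT (or GeoJSON, KML, ...) becomes an export format, not storage:
SELECT ST_AsText(footprint) FROM geog_auth_rec WHERE footprint IS NOT NULL;
```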
