Does anyone mind if I clean up the higher geography for the Republic of Palau? Some entries are in the Caroline Islands (my preference) and some are in the Palau Islands. Only a few of the states are listed, and several islands aren't assigned to a state.
The inconsistency probably hampers searches. The changes would mostly affect MVZ records and our DMNS:Inv records.
Is this the best way to notify people that changes are needed that could impact their records, or, if I have good Wikipedia sources, do we just go ahead and do it? I have seen similar issues in other Pacific Ocean island countries.
Medium priority
@PaulaBarteau any comment?
Is this the best way to notify people that changes are needed that could impact their records, or, if I have good Wikipedia sources, do we just go ahead and do it?
I think a notification is a good best practice.
YES PLEASE!!!
notify
Agreed, some communication would be appreciated. As long as you're not changing the meaning of the data, I think asking for forgiveness rather than permission is just fine. Here's who uses Palau:
-- Count records per collection whose locality's higher geography mentions Palau.
-- count(*) counts specimen_event rows, so an item with several Palau events is counted more than once.
select guid_prefix, count(*)
from collection
  join cataloged_item on collection.collection_id = cataloged_item.collection_id
  join specimen_event on cataloged_item.collection_object_id = specimen_event.collection_object_id
  join collecting_event on specimen_event.collecting_event_id = collecting_event.collecting_event_id
  join locality on collecting_event.locality_id = locality.locality_id
  join geog_auth_rec on locality.geog_auth_rec_id = geog_auth_rec.geog_auth_rec_id
where higher_geog like '%Palau%'
group by guid_prefix;
GUID_PREFIX   COUNT(*)
------------  --------
UAMb:Herb            1
CHAS:Inv            10
UMNH:Mamm            1
MVZ:Bird            78
UCM:Herp             2
MVZ:Mamm            10
MVZ:Herp            21
DMNS:Inv           128
Hi, many of our molluscs are from Palau, and we're in the middle of cleaning them up to upload into Arctos. I think cleaning up the Palau higher geography is a great idea, but it may create more work for us, since it would change geography I've already formatted for uploading. I believe our data will be ready to upload next week or the week after. Could you hold off until then?
@lin-fred - What do you think?
So how about a common place/procedure for these suggestions - or is there already one? I'm sure I'm not the only one who comes up with ideas for improvement, e.g., in Belize, two of the six districts use the word District. The Belize District doesn't, but Wikipedia does (https://en.wikipedia.org/wiki/Belize_District) probably to disambiguate it from the country Belize, and it would clarify the string Central America, Belize, Belize. At the least, shouldn't we be consistent within a country?
Can the committee make these decisions monthly or regularly and we submit suggestions via a dedicated GitHub issue, etc.?
And, of course, we can hold off on Palau. I just happen to be working on our geolocation while I'm under shelter-in-your-house-with-your-computer orders.
I'm all for cleaning it up even more, but we are close to uploading our batch of data with the current organization, so waiting until after the upload would definitely help. I can post in this thread when it's done, and then you can all move forward with these changes as you like. Does that work?
Thanks Phyllis, we appreciate you waiting.
There are tons of inconsistencies like that in the higher geography. Working from home would be a great time to make a concerted effort to catalog as many of them as possible in one place so they can be addressed systematically.
@Jegelewicz, is this something we could suggest delegating to colleagues across the street who need tasks?
So how about a common place/procedure for these suggestions - or is there already one?
This is it - just add them to the Geography in Arctos Project. In fact, there are already some suggested clean-ups in there....
I'm sure I'm not the only one who comes up with ideas for improvement, e.g., in Belize, two of the six districts use the word District. The Belize District doesn't, but Wikipedia does (https://en.wikipedia.org/wiki/Belize_District) probably to disambiguate it from the country Belize, and it would clarify the string Central America, Belize, Belize. At the least, shouldn't we be consistent within a country?
We have guidelines already, and yes, we should be consistent within a country. Changing those for consistency would definitely be a good thing, but checking around first is NICE. Had you made the Palau changes without warning, Paula and Lindsey would be completely baffled as to why their data wouldn't load and possibly think they had lost it, because they KNOW they checked all that higher geography while they were preparing their data!
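As a rough illustration of why a silent change to an authority string can stop a load, here is a minimal Python pre-load check. It assumes, based only on the comment above, that the bulkloader accepts a row only when its higher_geog exactly matches a current authority string. The function name unloadable_rows, the column name higher_geog, and the 0.6 similarity cutoff are hypothetical choices, not part of Arctos; difflib.get_close_matches from the Python standard library supplies a suggested replacement that a person would still need to confirm.

```python
import difflib
from typing import Dict, Iterable, List, Optional, Tuple


def unloadable_rows(
    rows: Iterable[Dict[str, str]],
    authority: List[str],
    column: str = "higher_geog",  # hypothetical bulkloader column name
) -> List[Tuple[int, str, Optional[str]]]:
    """Flag rows whose higher geography is not an exact current authority string.

    Returns (row number, value, closest current string or None) for each flagged row.
    """
    valid = set(authority)
    flagged = []
    for number, row in enumerate(rows, start=1):
        value = row[column]
        if value not in valid:
            # Suggest the most similar current string; a human still has to confirm it.
            close = difflib.get_close_matches(value, authority, n=1, cutoff=0.6)
            flagged.append((number, value, close[0] if close else None))
    return flagged
```

Run before an upload, a check like this turns "my data won't load" into a short list of rows whose geography changed, each with a likely replacement.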
Can the committee make these decisions monthly or regularly
That's for the @ArctosDB/geo-group to determine, but it seems like an excellent idea!
and we submit suggestions via a dedicated GitHub issue, etc.?
Not a dedicated issue, but many issues assigned to the Geography in Arctos Project would be the way to go.
Is this something we could suggest delegating to colleagues who need tasks across the street?
Unfortunately, no. I am not inclined to give access to editing higher geography to someone who has never used Arctos. Too much could go wrong.
Oops, I did change two entries, for Babelthuap Island and Koror: I added the district each one is in. Then I realized that a lot more editing was needed and posted this issue. I hope that doesn't cause trouble for Paula and Lindsey. I can always reverse the changes if that's easier. Let me know.
Yes some formal procedure would be most welcome. Documentation would be useful no matter what ultimately happens here, and new tools often come from that sort of thing.
FWIW, this issue doesn't make much sense to me. Someone willing and able to clean up data is rare and valuable, while making slight changes to already-consistent data is trivial using Arctos tools or most any editor. It's also an easy SQL update that I'm happy to help with. Finding them may not quite meet the "trivial" bar, but there are only 19 possible items involved.
@sharpphyl, could you put the old and new higher_geog values into something that can be made into a CSV? That would make the bulkloader a non-issue. I could even get that from the changelogs, although there's a not-quite-zero chance that some previous version would map to an unrelated current entry. Perhaps that (probably with a confirmation step) should be built into Arctos as a cleanup tool?
Let me take a look over the weekend at some of the higher geography that I thought was inconsistent or inaccurate and try to create a draft CSV for further discussion.
@sharpphyl I meant as you make changes, so bulkloader data could be synced up to the new higher_geog, but again I'm up for anything (or nothing!).
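A minimal sketch of that sync in Python, assuming a hand-built old/new CSV with hypothetical old_higher_geog and new_higher_geog columns and a bulkloader file with a higher_geog column (the real bulkloader's column names may differ). Only exact matches are rewritten, so a value can never be mapped onto an unrelated entry, which sidesteps the changelog risk mentioned above.

```python
import csv
import io


def load_mapping(mapping_csv: str) -> dict:
    """Build {old higher_geog: new higher_geog} from old/new CSV text.

    The column names old_higher_geog and new_higher_geog are hypothetical.
    """
    reader = csv.DictReader(io.StringIO(mapping_csv))
    return {row["old_higher_geog"]: row["new_higher_geog"] for row in reader}


def sync_bulkload(bulkload_csv: str, mapping: dict, column: str = "higher_geog"):
    """Return (rewritten CSV text, number of rows changed).

    Only exact string matches are replaced; every other value passes through untouched.
    """
    reader = csv.DictReader(io.StringIO(bulkload_csv))
    out = io.StringIO()
    # Keep the original column order; csv quotes values containing commas automatically.
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    changed = 0
    for row in reader:
        new_value = mapping.get(row[column], row[column])
        if new_value != row[column]:
            changed += 1
            row[column] = new_value
        writer.writerow(row)
    return out.getvalue(), changed
```

Because higher_geog strings contain commas, the sketch relies on the csv module's quoting rather than plain find-and-replace on lines.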
There's a tool that uses search terms to translate geography. I wonder if it makes sense to add the "old" higher_geog to the search terms, either as part of a procedure or automatically, so it's available to that tool. The tool would almost certainly figure out the changes without an explicit translation, but with one there's essentially no path to the wrong conclusion. And of course the tool could be more exposed or integrated, or the pre-bulkloader could be more integrated into the 'normal' process, or whatever.
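One way to picture the search-term idea, as a hypothetical Python sketch rather than a description of how the existing Arctos tool works: index each current higher_geog under itself and under any retired strings recorded as its search terms, then accept a translation only when exactly one current entry matches. Anything unknown or ambiguous comes back as None for a person to review, which is what keeps the lookup off the path to a wrong conclusion. The names build_index and resolve, the input shape, and the case-insensitive matching via str.casefold are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


def build_index(search_terms: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each case-folded term to the set of current higher_geog strings that carry it.

    `search_terms` is a hypothetical {current higher_geog: [search terms]} dict; each
    current string is also indexed under itself so unchanged values still resolve.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for current, terms in search_terms.items():
        for term in [current, *terms]:
            index[term.casefold()].add(current)
    return dict(index)


def resolve(value: str, index: Dict[str, Set[str]]) -> Optional[str]:
    """Return the one current higher_geog matching `value`, or None if unknown or ambiguous."""
    matches = index.get(value.casefold(), set())
    if len(matches) == 1:
        return next(iter(matches))
    return None  # zero or several candidates: leave it for a human to decide
```

For example, recording "Pacific Ocean, France, French Polynesia, Clipperton Island" as a search term on the corrected Clipperton entry would let older data resolve to it without any separate translation table.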
Is it possible to use any pause-time to find or create WKT for these things? That can be added at any time without affecting the bulkloader in any way. @mkoo thoughts? Some docs on that would be useful too - I don't really know where WKT comes from....
Adding a good source (I haven't looked at these, but inconsistent data tends not to have good links) doesn't change any functionality either, although I'm not sure how much sense it makes to change that without getting everything.
Is it possible to use any pause-time to find or create WKT for these things? That can be added at any time without affecting the bulkloader in any way. @mkoo thoughts? Some docs on that would be useful too - I don't really know where WKT comes from....
Same - WKT is like a unicorn to me, but WKT for all HG would be the bomb.
as you make changes, so bulkloader data could be synced up to the new higher_geog, but again I'm up for anything (or nothing!).
That would be a nice feature, but I'm not sure we should spend a lot of resources on it. This probably doesn't happen very often, so maybe my request for an issue to warn people is overkill. It just seems like a nice thing to do, even if the announcement comes after the updates, so that anyone with data in process can make the necessary adjustments.
I got the authority change notification for the edits that @sharpphyl made - perhaps a log of such changes for the last month somewhere would be enough to help someone caught between the old and new values?
spend a lot of resources
In general, I agree: magicky tools need to be discussed and prioritized, and implementing them isn't something I'd be very excited to tackle pre-PG.
For a specific use case that we know is going to cause problems, given a manually-built old/new CSV I could very quickly make updates (or Arctos tools can be used, or find-replace in a text file, or ....).
There's currently no temporal control on the viewer, so this may eat your browser, but
https://arctos.database.museum/info/ctchange_log.cfm?tbl=GEOG_AUTH_REC
is all changes since we started tracking them. I'm actually not sure there's a link to that one anywhere in the UI - geog is a weird kind of "code table." New Issue....
@dustymc, I can certainly send you a CSV if and when I make changes to the higher geography, but I've done that very, very infrequently, and usually only for HG that affects only DMNS:Inv. I hadn't thought of new collections in the middle of building their bulkload, which would certainly run into problems. I'll hold off until Paula or Lindsey say they're done with their upload.
I think a notification is a good best practice.
Do we want a separate issue for every correction? That would seem overwhelming, but this request may get lost in this issue, which has wandered a bit off course. I can move it to a separate issue if need be, but it's not a big change if it's OK with MVZ and not in anyone's bulkloader.
https://arctos.database.museum/Locality.cfm?Action=editGeog&geog_auth_rec_id=10003440 is listed as
Pacific Ocean, France, French Polynesia, Clipperton Island
It is no longer part of French Polynesia; however, it was administered from French Polynesia until 2007, which may be where this administrative structure came from. Per Wikipedia (https://en.wikipedia.org/wiki/Clipperton_Island), it is an overseas state private property of France under the direct authority of the Minister of Overseas France.
According to the handbook, we are to use current definitions of geographical units. (By the way, the handbook page for HG, https://handbook.arctosdb.org/documentation/higher-geography.html, cites Baja California Norte as a good example of a state, but the state's name has been Baja California since 1952; see https://en.wikipedia.org/wiki/Baja_California. I can try to change it if that's OK with everyone.)
Eliminating French Polynesia from this string will affect:
1 DMNS:Inv specimen
1 MVZ:Bird specimen
2 MVZ:Egg specimens
I agree that advance notification is the best practice, but does a committee eventually make the change? Or, @dustymc, is this the kind of change you were suggesting I make and then send you a CSV so you can magic the before and after together? What about the impact on existing (MVZ) records? Personally, this update wouldn't matter to me, but other updates might.
Your report, https://arctos.database.museum/info/ctchange_log.cfm?tbl=GEOG_AUTH_REC, came up OK. The changes I made are toward the very bottom, marked "updating," but it doesn't highlight what changed.
Lastly, I amended one entry in French Polynesia which may impact Lindsey and Paula's bulkloader but no other collection currently uses it. Pacific Ocean, France, French Polynesia, Society Islands, Rangiroa has been corrected to be Pacific Ocean, France, French Polynesia, Tuamotu Archipelago, Rangiroa. I can change it back until you finish your bulk upload if you're using this HG.
Do we want a separate issue for every correction? That would seem overwhelming,
Yes is my answer. Issues should be closed once the updates are complete and should only include ONE geography clean-up so that stuff doesn't get lost. BUT, making a change to update just one HG probably doesn't need an issue, just requests for clean-ups of a country or an island group or a major overhaul to how some group of HG is presented.
then send you a csv so you can magic the before and after together
Only if we KNOW it's going to interrupt an ongoing process, and even that's not much problem in my experience. If it is a problem we should discuss automation - I have the old data, we could use them, no CSV necessary.
doesn't highlight what changed.
That's just a table-dump. We could wrap UI around it if necessary - issue....
What about the impact on existing (MVZ) records?
They become easier to find. There's an archive, there's a notification, nobody who'd change the nature of the data or introduce inconsistencies has access (right!?!), all you can do is make things better.
separate issue for every correction
My answer is about the opposite of @Jegelewicz's. I started these kinds of issues as a plea for help. I can slog around on Wikipedia for a month and STILL might not get Kenya right; someone who actually knows something about the place is going to do a better job than I can, and I don't have the month to kill anyway. If that person shows up and volunteers to help, I want to get the contents of their brain into Arctos, not scare them off with bureaucracy. Similarly, if someone finds an easy-to-correct mistake or outlier, I want them to spend the minute fixing it, not ignore it to avoid the bureaucracy.
It seems clear that the procedures need to be formalized across the community. AWG agenda?
Here's my take from the above.
making a change to update just one HG probably doesn't need an issue, just requests for clean-ups of a country or an island group or a major overhaul to how some group of HG is presented.
I'll go ahead and modify Clipperton Island, since it has minor impact on other collections and everyone will get notice of the change anyway.
I'll wait until the bulkload with Palau entries is complete (let me know here) before making any further adjustments.
I'll substitute Baja California Sur for Baja California Norte in the HG documentation.
The AWG will clarify procedures for notification and approvals needed before making corrections and improvements to existing higher geography. It's obvious we have a variety of opinions that need to be resolved.
I just prefer consistency, which isn't always possible for political terms.
How did we end up with the Gulf of Panama as a Sea? https://en.wikipedia.org/wiki/List_of_seas
@PaulaBarteau Do you have the Pearl Islands, Panama in your bulkloader? I just added the Balboa district to the higher geography since we're the only collection using it.
Pacific Ocean, Gulf of Panama, Panama, Balboa, Pearl Islands
I also want to make the Balboa entry consistent. Right now, it's in Central America rather than the Pacific Ocean. It's an island district, so Pacific Ocean is OK, right, @dustymc? UTEP:Inv is using it. @mvzhuang, is this change OK?
We don't seem to have any localities with the Pearl Islands