Documentation is at http://handbook.arctosdb.org/documentation/encoding.html
Suggest we move the NOPRINT check to a function and add "does not contain �" to all possible free-text fields. The data below will need to be cleaned up first.
select guid_prefix, count(*) c
from collection
inner join cataloged_item on collection.collection_id = cataloged_item.collection_id
inner join specimen_event on cataloged_item.collection_object_id = specimen_event.collection_object_id
inner join collecting_event on specimen_event.collecting_event_ID = collecting_event.collecting_event_ID
inner join locality on collecting_event.locality_id = locality.locality_id
where spec_locality like '%�%'
group by guid_prefix
order by count(*);
GUID_PREFIX C
------------------------------------------------------------ ----------
UMNH:Teach 1
ALMNH:ES 1
UTEP:Mamm 1
HWML:Para 1
MVZ:Egg 1
UCM:Bird 2
DMNS:Bird 3
NMMNH:Ento 3
UTEP:Teach 3
UAM:Herb 4
UTEPObs:Herp 5
CHAS:Mamm 9
UTEP:Herp 9
UAM:Alg 9
UTEP:Inv 10
UAM:Inv 11
CHAS:Bird 12
UWBM:Herp 16
MSB:Para 17
MSB:Bird 21
MSB:Host 22
UNR:Herp 41
UTEP:Ento 43
MSB:Fish 75
MSB:Mamm 110
NMMNH:Mamm 110
UAMb:Herb 128
ASNHC:Herp 143
MSB:Herp 159
ASNHC:Mamm 1312
I have attempted to assign people who are responsible for these collections.
Put %�% in specific locality and search your collection to find what needs fixing.
UTEP:Teach corrected
ALMNH:ES corrected
Yup, or let me know if you need some other query.
Need help fixing stuff? See http://handbook.arctosdb.org/documentation/encoding.html
Find replacement codes at https://en.wikipedia.org/wiki/List_of_Unicode_characters; use the HTML option.
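For a character you can already identify, the lookup that the Wikipedia table provides can also be done locally. A minimal sketch using Python's standard `html.entities.codepoint2name`, falling back to a numeric entity when a character has no named one; `html_entity` is a hypothetical helper, not part of Arctos.

```python
import html
from html.entities import codepoint2name


def html_entity(ch: str) -> str:
    """Return the named HTML entity for one character, or a numeric entity if it has no name."""
    code_point = ord(ch)
    name = codepoint2name.get(code_point)
    return f"&{name};" if name else f"&#{code_point};"


for ch in "ñí°üᵒ":
    print(ch, html_entity(ch))
# ñ &ntilde;
# í &iacute;
# ° &deg;
# ü &uuml;
# ᵒ &#7506;

# Round trip: the entity decodes back to the same character.
assert html.unescape(html_entity("ñ")) == "ñ"
```

As noted later in the thread, HTML-encoded values are acceptable but not searchable, so the plain character is usually the better choice when you can enter it.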
Is there a faster way of editing the localities? Or is it one by one?
The same problem also appears in verbatim locality and locality remarks.
Editing is one by one, unfortunately....
We can make a list and work on them together. Or I can give Paula access and she can help - as long as she knows exactly what to do.
If there's some pattern, I can mass-update (post-Postgres). "Everything" is my ideal filter for that, but I can do it for smaller sets as well; I just need to know how to find them and what to replace them with.
The main problem I'm seeing for us is the TRS data, which we won't know what it's supposed to be unless we go through and edit each one individually. Does this have to get fixed now, for transfer into Postgres, or is it something that could be fixed along with georeferencing?
MSB will need to wait until after PG because we have a couple hundred of these.
I have 611 just for NMMNH:Ento when I put %�% into Any Geography Term
Yes, it's fine to wait for PG, and I'm going to be extremely hesitant to do anything in Oracle anyway.
These will still be OK as data; it just won't be possible to save them, which will eventually break some script, and I'll be forced to replace them with "[funky unicode fail diamond thingee was here]" or something equally annoying....
611 just for NMMNH:Ento
You'll only need to fix localities, not each specimen. Looks like you fixed it while I was typing?
select guid_prefix, count(*) c
from collection
inner join cataloged_item on collection.collection_id = cataloged_item.collection_id
inner join specimen_event on cataloged_item.collection_object_id = specimen_event.collection_object_id
inner join collecting_event on specimen_event.collecting_event_ID = collecting_event.collecting_event_ID
inner join locality on collecting_event.locality_id = locality.locality_id
where spec_locality like '%�%'
group by guid_prefix
order by guid_prefix;
GUID_PREFIX C
------------------------------------------------------------ ----------
ASNHC:Herp 143
ASNHC:Mamm 1312
CHAS:Bird 12
CHAS:Mamm 9
DMNS:Bird 3
HWML:Para 1
MSB:Bird 21
MSB:Fish 75
MSB:Herp 159
MSB:Host 22
MSB:Mamm 110
MSB:Para 17
MVZ:Egg 1
NMMNH:Inv 25
NMMNH:Mamm 110
UAM:Alg 9
UAM:Herb 4
UAM:Inv 11
UAMb:Herb 128
UCM:Bird 2
UMNH:Teach 1
UNR:Herp 41
UTEP:Ento 43
UTEP:Herp 9
UTEP:Inv 10
UTEP:Mamm 1
UTEPObs:Herp 5
UWBM:Herp 16
Oh.
Any Geography Term
Here's verbatim locality - we should get those too.
GUID_PREFIX C
------------------------------------------------------------ ----------
ASNHC:Herp 27
ASNHC:Mamm 2296
CHAS:Bird 1
CHAS:Mamm 8
DMNS:Bird 161
DMNS:Inv 3
HWML:Para 1
MSB:Bird 87
MSB:Fish 53
MSB:Herp 80
MSB:Host 9
MSB:Mamm 156
MSB:Para 2
MVZ:Bird 2
NMMNH:Ento 606
NMMNH:Inv 375
NMMNH:Mamm 156
UAM:Alg 9
UAM:Herb 4
UAM:Inv 11
UAM:Mamm 2
UAMObs:Ento 1
UAMb:Herb 163
UCM:Bird 2
UCM:Mamm 14
UCM:Obs 2
UMNH:Teach 1
UNR:Fish 1
UNR:Herp 41
UTEP:Bird 1
UTEP:ES 2520
UTEP:Ento 133
UTEP:Herb 29
UTEP:Herp 104
UTEP:HerpOS 7
UTEP:Inv 104
UTEP:Mamm 4
UTEP:Teach 1
UTEP:Zoo 1
UTEPObs:Herp 29
UWBM:Herp 17
I'll try to at least get the specific locality ones done now, as that field doesn't contain county and TRS data, which is where a lot of the errors are coming from. But there are a lot of these issues in Verbatim Locality and Locality Remarks.
Locality Remarks
select guid_prefix, count(*) c
from collection
inner join cataloged_item on collection.collection_id = cataloged_item.collection_id
inner join specimen_event on cataloged_item.collection_object_id = specimen_event.collection_object_id
inner join collecting_event on specimen_event.collecting_event_ID = collecting_event.collecting_event_ID
inner join locality on collecting_event.locality_id = locality.locality_id
where locality_remarks like '%�%'
group by guid_prefix
order by guid_prefix;
GUID_PREFIX C
------------------------------------------------------------ ----------
ALMNH:ES 1
MSB:Mamm 558
NMMNH:Ento 20
NMMNH:Herb 8
NMMNH:Mamm 557
UAM:Alg 186
UCM:Bird 3
UCM:Herp 18
UCM:Mamm 3
UMZM:Mamm 1
UTEP:Ento 289
UTEP:Herb 19
UTEP:Herp 520
UTEP:HerpOS 5
UTEP:Inv 14
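The counts above come from one query per field. As a sketch only, the following uses an in-memory SQLite database as a stand-in for the Oracle one, with the tables trimmed to the columns that appear in this thread's queries and a few invented sample rows, to count all three fields in a single pass. Like the original queries, it counts specimens rather than distinct localities, so a locality shared by several specimens is counted once per specimen.

```python
import sqlite3

BAD = "\ufffd"  # the replacement character being hunted

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE collection (collection_id INTEGER PRIMARY KEY, guid_prefix TEXT);
CREATE TABLE cataloged_item (collection_object_id INTEGER PRIMARY KEY, collection_id INTEGER);
CREATE TABLE specimen_event (collection_object_id INTEGER, collecting_event_id INTEGER);
CREATE TABLE collecting_event (collecting_event_id INTEGER PRIMARY KEY,
                               locality_id INTEGER, verbatim_locality TEXT);
CREATE TABLE locality (locality_id INTEGER PRIMARY KEY,
                       spec_locality TEXT, locality_remarks TEXT);
""")

# Invented sample rows: two MSB specimens share one damaged locality.
conn.executemany("INSERT INTO collection VALUES (?, ?)", [(1, "MSB:Mamm"), (2, "UTEP:ES")])
conn.executemany("INSERT INTO cataloged_item VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])
conn.executemany("INSERT INTO specimen_event VALUES (?, ?)", [(10, 100), (11, 100), (12, 101)])
conn.executemany("INSERT INTO collecting_event VALUES (?, ?, ?)", [
    (100, 1000, f"8 mi. S, 3 mi. W La Pur{BAD}sima"),
    (101, 1001, f"Do{BAD}a Ana Co."),
])
conn.executemany("INSERT INTO locality VALUES (?, ?, ?)", [
    (1000, f"8 mi. S, 3 mi. W La Pur{BAD}sima", None),
    (1001, "Organ Mts.", f"near Do{BAD}a Ana"),
])

# CASE yields NULL unless instr() finds the character, and COUNT skips NULLs.
QUERY = """
SELECT c.guid_prefix,
       COUNT(CASE WHEN instr(l.spec_locality, :bad) > 0 THEN 1 END),
       COUNT(CASE WHEN instr(ce.verbatim_locality, :bad) > 0 THEN 1 END),
       COUNT(CASE WHEN instr(l.locality_remarks, :bad) > 0 THEN 1 END)
FROM collection c
JOIN cataloged_item ci ON c.collection_id = ci.collection_id
JOIN specimen_event se ON ci.collection_object_id = se.collection_object_id
JOIN collecting_event ce ON se.collecting_event_id = ce.collecting_event_id
JOIN locality l ON ce.locality_id = l.locality_id
GROUP BY c.guid_prefix
ORDER BY c.guid_prefix
"""
rows = conn.execute(QUERY, {"bad": BAD}).fetchall()
for row in rows:
    print(row)
# ('MSB:Mamm', 2, 2, 0)
# ('UTEP:ES', 0, 1, 1)
```

The columns are spec_locality, verbatim_locality, and locality_remarks, in that order. The real Arctos schema has many more columns; only the join path shown in the queries above is reproduced here.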
A lot of these are duplicates (13 specimens share a locality and collecting event), so there are fewer than you think, but it's still a lot of work. For these kinds of things, I like to tackle 5 a day until they are done, although I often get on a roll and end up spending an hour to finish the 56 from some specific collection.
The only issue is that until we fix what is there and Dusty can change data validation, people can continue to create more.....
So until everyone fixes everything in specific locality, verbatim locality, and locality remarks, we can't change data validation?
I believe that the NMMNH and MSB Mamm localities are the same. These are all likely Dave Hafner's Mexico material, probably all enyes. So we only have to fix 557 between us.
Probably best to have the exact text characters in verbatim locality but not in specific locality, for search reasons. I think there is a mix of letters with accents.
Jonathan L. Dunnum Ph.D.
Senior Collection Manager
Division of Mammals, Museum of Southwestern Biology
University of New Mexico
Albuquerque, NM 87131
(505) 277-9262
Fax (505) 277-1351
MSB Mammals website: http://www.msb.unm.edu/mammals/index.html
Facebook: http://www.facebook.com/MSBDivisionofMammals
Shipping Address:
Museum of Southwestern Biology
Division of Mammals
University of New Mexico
CERIA Bldg 83, Room 204
Albuquerque, NM 87131
From: Mariel Campbell notifications@github.com
Sent: Tuesday, May 19, 2020 12:44 PM
To: ArctosDB/arctos arctos@noreply.github.com
Cc: Jonathan Dunnum jldunnum@unm.edu; Assign assign@noreply.github.com
Subject: Re: [ArctosDB/arctos] Locality character conversion issues (#2675)
For the Hafner material, we at least have an original spreadsheet with the actual values. What about pulling these from the global bulkload file archive? If we have everything that was ever bulkloaded, we should be able to find the original locality values and replace the invalid characters with the originals.
letters with accents.
See http://handbook.arctosdb.org/documentation/encoding.html
Accents (hieroglyphics, Cyrillic, kanji, whatever) are fine.
All of that stuff HTML-encoded is acceptable, but not searchable.
This problem - some character or characters replaced by the Unicode replacement character (U+FFFD, the "I have no idea what you mean" character) - comes about when you have those characters in some non-UTF encoding and your editor doesn't properly convert them to UTF-8 before they're loaded to Arctos.
@campmlc I can look, but I think it's unlikely this happened after bulkloading.
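A minimal Python sketch of that failure, assuming the source file was Windows-1252 (cp1252), which is typical of CSV files saved by Excel on Windows in Western locales; the thread doesn't establish which non-UTF encoding was actually involved. Decoding those bytes as UTF-8 with replacement turns ñ into U+FFFD, and different originals collapse to the same damaged value, which is why the damaged text alone can't tell you what it used to say.

```python
original = "Doña Ana"
raw = original.encode("cp1252")  # assumption: file saved as cp1252; ñ is the single byte 0xF1
loaded = raw.decode("utf-8", errors="replace")  # 0xF1 isn't valid UTF-8 here, so it becomes U+FFFD
print(loaded)  # Do�a Ana

# Different originals collapse to the same damaged value, so the damage can't be reversed.
a = "Purísima".encode("cp1252").decode("utf-8", errors="replace")
b = "Purúsima".encode("cp1252").decode("utf-8", errors="replace")
print(a == b)  # True
```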
It happened during bulkloading. I have also gone in and made various edits based on fixes Dave sent me directly after he found the georeferencing problems from Steven's (volunteer) re-georeferencing. He had various locality changes based on the data from his catalogs and said he wasn't sure when the errors were introduced, but most were there when we got the original data from Patty.
Just looked at the original spreadsheet used for bulkloading.
This is a locality from that file: 8 mi. S, 3 mi. W La PurÃsima
This is the same locality after being bulkloaded into Arctos and redownloaded. There is a ? in the blank space: 8 mi. S, 3 mi. W La Pur sima
That is from MSB:Mamm:274283
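The "PurÃsima" seen in the spreadsheet looks like a different, recoverable kind of damage. One possible explanation, which is an assumption since the thread doesn't establish the file's encoding: the text was stored as UTF-8 but displayed as cp1252, so í (bytes C3 AD) shows as "Ã" followed by an invisible soft hyphen (U+00AD). Unlike U+FFFD, this round-trips, which is what makes recovering values from the original bulkload files plausible.

```python
# Assumption: the spreadsheet text was UTF-8 but was displayed as cp1252.
utf8_bytes = "La Purísima".encode("utf-8")  # í is the two bytes C3 AD
misread = utf8_bytes.decode("cp1252")        # C3 -> "Ã", AD -> soft hyphen U+00AD (usually invisible)
print(repr(misread))  # 'La PurÃ\xadsima'

# This kind of damage is reversible, unlike U+FFFD:
restored = misread.encode("cp1252").decode("utf-8")
print(restored)  # La Purísima
```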
@campmlc is that CSV or something proprietary?
In either case, what character encoding is used?
UAM@ARCTOS> select spec_locality from bulkloader_deletes where spec_locality like '8 mi. S, 3 mi. W La Pur%';
SPEC_LOCALITY
------------------------------------------------------------------------------------------------------------------------
8 mi. S, 3 mi. W La Pur�sima
8 mi. S, 3 mi. W La Pur�sima
8 mi. S, 3 mi. W La Pur�sima
8 mi. S, 3 mi. W La Pur�sima
8 mi. S, 3 mi. W La Pur�sima
8 mi. S, 3 mi. W La Pur�sima
8 mi. S, 3 mi. W La Pur�sima
7 rows selected.
@dustymc Here are some you can bulk edit:
In all verbatim localities for events in locality nickname UTEP:ES:Site 21,
replace "Do�a Ana" with "Doña Ana".
Is that doable?
Easily, but let's wait until we're in a less-meltable environment.
CSV.
There are likely georeferencing problems.
NMMNH:Inv; NMMNH:Mamm; NMMNH:Ento specific localities done
Do�a Ana
I also got all of the unverified events while I was in there. Here's the original, if someone wants me to undo something.
update collecting_event
set verbatim_locality = replace(verbatim_locality, 'Do�a Ana', 'Doña Ana')
where verbatim_locality like '%Do�a Ana%'
  and collecting_event_id not in (
    select collecting_event_id
    from specimen_event
    where VERIFICATIONSTATUS = 'verified and locked'
  );
198 rows updated.
YASSSSSS! Thanks!
OK, here is another possible batch, for all localities whose nickname starts with UTEP:ES:Site:
where verbatim locality contains 34�, replace with 34°
where verbatim locality contains 106�, replace with 106°
Wow, you even got the right character!
I try to read and pick the right thing - it doesn't always work, but dang it, I do try!
So say we all!
FYI my usual go-to for that is http://www.fileformat.info/info/unicode/char/search.htm?q=%C2%B0&preview=entity
And now for the rest of that thought: I wonder if we can and/or should block some of those? Do we really need to accept ᵒ and ° and ° and <sup>o</sup> and the bajillion other ways to make something that sorta looks like a degree symbol? If we should filter, is blocking them worth the investment? Does it MATTER that "34o" is slightly less searchable in a field that's fundamentally not searchable, or is that vastly outweighed by the work to clean data? If you've found the specimen-or-whatever, it still adequately conveys the idea to humans - is doing more worth the effort?
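If the answer were to normalize rather than block, a sketch might look like the following. The look-alike list is only the forms raised in this thread (ᵒ, HTML entities such as &deg; and &#176;, and <sup>o</sup>), not an Arctos rule, and `normalize_degrees` is a hypothetical name. Note that `html.unescape` decodes every HTML entity in the string, not just degree signs, and a bare "34o" is deliberately left alone because a plain o is ambiguous.

```python
import html

DEGREE = "\u00b0"
# Look-alikes named in this thread (assumption: not an official Arctos list).
LOOKALIKES = ["\u1d52", "<sup>o</sup>"]  # modifier letter small o, superscript markup


def normalize_degrees(text: str) -> str:
    """Hypothetical cleanup: fold degree-sign look-alikes into U+00B0."""
    # Decodes &deg;, &#176;, &#xB0; and every other HTML entity in the string.
    text = html.unescape(text)
    for alt in LOOKALIKES:
        text = text.replace(alt, DEGREE)
    return text


print(normalize_degrees("34&deg; 106&#176; 12<sup>o</sup> 5\u1d52"))  # 34° 106° 12° 5°
```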
I tried using the Unicode and it failed to do anything. I just ended up with "34U+00B0" so I used the HTML instead. If we are going to select one, let's make sure it is one that ends up being readable even if it isn't searchable.
Unicode and it failed to do anything.
In what context?
http://test.arctos.database.museum/editLocality.cfm?locality_id=84325

ends up being readable
Also depends on context - e.g., in many views (CSV is probably the most relevant here) the HTML shows up as raw entity code or <sup>o</sup> rather than as a degree symbol.

Changed to "ñ" and now it's

@dustymc here are some bulk replacements you can make.
In Specific locality:
M�zquiz = Múzquiz
38�#0' = 38°#0'
26� ENE = 26° ENE
Do�a Ana = Doña Ana
130� = 130°
Ca�oncito = Cañoncito
Volc�n Po�s = Volcán Poás
Ca�on = Cañon
Mayag�ez = Mayagüez
In verbatim locality
130� = 130°
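The list above is a literal find-and-replace table. A minimal sketch of applying it, in Python rather than the SQL `replace()` used for Doña Ana earlier; `REPLACEMENTS` and `apply_replacements` are illustrative names, and the table merges the specific-locality and verbatim-locality lists. Longer keys are applied first so a shorter pattern can't pre-empt a longer one that contains it.

```python
# Illustrative table copied from the request above; keys contain U+FFFD.
REPLACEMENTS = {
    "M\ufffdzquiz": "Múzquiz",
    "38\ufffd#0'": "38°#0'",
    "26\ufffd ENE": "26° ENE",
    "Do\ufffda Ana": "Doña Ana",
    "130\ufffd": "130°",
    "Ca\ufffdoncito": "Cañoncito",
    "Volc\ufffdn Po\ufffds": "Volcán Poás",
    "Ca\ufffdon": "Cañon",
    "Mayag\ufffdez": "Mayagüez",
}


def apply_replacements(text: str) -> str:
    """Apply every literal replacement, longest key first."""
    for bad in sorted(REPLACEMENTS, key=len, reverse=True):
        text = text.replace(bad, REPLACEMENTS[bad])
    return text


print(apply_replacements("Volc\ufffdn Po\ufffds, 2 km N"))  # Volcán Poás, 2 km N
print(apply_replacements("Ca\ufffdoncito"))                 # Cañoncito
```

Any value that still contains U+FFFD after this pass needs a human decision.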
Can we deal with this systematically in https://github.com/ArctosDB/arctos/issues/2678 instead of sniping away at problems which immediately return if we let them?
I fixed Do�a Ana on May 20; it's apparently back.
it's apparently back.
Yeah - I thought these were getting caught at data entry - not true?
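Whether these get caught at data entry depends on the check applied to each field. The first suggestion in this thread was to move the NOPRINT check into a function and add "does not contain �" to all free-text fields. The thread doesn't show how the existing NOPRINT check works, so the following is only a sketch, in Python, of what such a combined check might test; `is_clean_free_text` is a hypothetical name, and treating control characters other than tab, newline, and carriage return as "non-printing" is an assumption.

```python
import unicodedata

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, shown as "�" after a failed decode
ALLOWED_CONTROLS = "\t\n\r"  # assumption: tabs and line breaks are acceptable in free text


def is_clean_free_text(value: str) -> bool:
    """Return True if value contains no U+FFFD and no non-printing control characters.

    Hypothetical helper: "non-printing" is approximated as Unicode category Cc
    (control characters) other than tab, newline, and carriage return.
    """
    if REPLACEMENT_CHAR in value:
        return False
    return not any(
        unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS
        for ch in value
    )


print(is_clean_free_text("8 mi. S, 3 mi. W La Purísima"))       # True
print(is_clean_free_text("8 mi. S, 3 mi. W La Pur\ufffdsima"))  # False
```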
I've been working on these - should I stop or keep going?
Keep going!
OK, I have edited all of the localities with the � that I can. The remaining 61 I cannot determine what the replacement character(s) should be. I vote that we replace with [?] and STOP THIS MADNESS.
agree.
I agree and am ok with the replacement
STOP THIS MADNESS.
See https://github.com/ArctosDB/arctos/issues/2678
Two possibilities for madness-stopping:
1. Replace � with something else everywhere and update the function. This would be an entirely consistent solution - � would be banned from all of Arctos, yay everybody. We'd also end up with a lot of replacement "something" ([?] is nice) in various places.
2. Clean up one field, then swap that field, and only that field, to a new check that disallows � (plus whatever the current check does). This would NOT be consistent, would be confusing, would drag at least part of this problem out indefinitely, and would mean I have two functions to get confused by. Not so yay, but maybe tolerable.
I'm a big fan of (1) but reality might not be.
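Option 1 boils down to a blanket substitution. A minimal sketch, with Python dicts standing in for table rows (an assumption for illustration; the real change would be SQL against Arctos) and `[?]` as the placeholder suggested in this thread; `scrub_records` is a hypothetical name. Running it a second time changes nothing, which makes it safe to repeat after new bad data shows up.

```python
REPLACEMENT_CHAR = "\ufffd"
PLACEHOLDER = "[?]"  # the placeholder suggested in this thread


def scrub_records(records, fields):
    """Option 1 sketch: replace U+FFFD with PLACEHOLDER in the given fields.

    records is a list of dicts (a stand-in for table rows); returns how many
    field values were changed.
    """
    changed = 0
    for record in records:
        for field in fields:
            value = record.get(field)
            if value and REPLACEMENT_CHAR in value:
                record[field] = value.replace(REPLACEMENT_CHAR, PLACEHOLDER)
                changed += 1
    return changed


rows = [
    {"spec_locality": "8 mi. S, 3 mi. W La Pur\ufffdsima", "locality_remarks": None},
    {"spec_locality": "Organ Mts.", "locality_remarks": "near Do\ufffda Ana"},
]
print(scrub_records(rows, ["spec_locality", "locality_remarks"]))  # 2
print(rows[0]["spec_locality"])  # 8 mi. S, 3 mi. W La Pur[?]sima
```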
Sheesh - where else do we have �?
I'm fine with the replacement. If you send a list, maybe I can fix a few more.
where else do we have �?
https://github.com/ArctosDB/arctos/issues/2678#issuecomment-729226025
From Excel, save your bulkload CSV files as UTF-8! Spread the word and document!
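Saving as UTF-8 is the fix; a pre-flight check can catch files that weren't saved that way. A minimal sketch that assumes nothing about Arctos's own bulkloader: `check_utf8` is a hypothetical helper that reports lines failing strict UTF-8 decoding (the likely sign of a non-UTF-8 save) or already containing U+FFFD. The demo file and its contents are invented.

```python
import os
import tempfile


def check_utf8(path):
    """List (line_number, problem) pairs for lines that aren't valid UTF-8 or contain U+FFFD."""
    problems = []
    with open(path, "rb") as fh:
        # Reading bytes line by line is safe: multi-byte UTF-8 sequences never contain b"\n".
        for line_number, raw in enumerate(fh, start=1):
            try:
                text = raw.decode("utf-8")
            except UnicodeDecodeError as exc:
                problems.append((line_number, f"not UTF-8 at byte {exc.start}"))
                continue
            if "\ufffd" in text:
                problems.append((line_number, "contains U+FFFD"))
    return problems


# Demo with an invented file: line 3 is cp1252 (0xF1 for ñ), line 4 already holds U+FFFD.
sample = (
    b"guid,spec_locality\n"
    b"1,La Pur\xc3\xadsima\n"
    b"2,Do\xf1a Ana\n"
    b"3,Pur\xef\xbf\xbdsima\n"
)
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
try:
    sample_problems = check_utf8(tmp.name)
finally:
    os.remove(tmp.name)
print(sample_problems)  # [(3, 'not UTF-8 at byte 4'), (4, 'contains U+FFFD')]
```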
@ebraker will add an end screen to the tutorials, and @Jegelewicz will go through the documentation to add the above.