The problem
Conducting searches using Any Taxon or Any Geography is resulting in timeout errors, flaming computers (#1564 ), and general curator crankiness. Majority of users (at MVZ anyway) are simply searching on a taxon and something (like Tissue flag, or county or island etc). They dont expand all the search options and dont get any results since the search times out. (we have already have eliminated browser, network, 3rd app interference as issues).
Describe the solution you'd like
Unlimited resources would help ( #857) but since we dont have that, Id like to suggest some high-priority fix to the interface for the short-term
Alternatives?
Open to suggestions-- anything that will reduce timeout errors. I am not sure what to do with Any Geography at all frankly (and how frequent this is the issue)
Thoughts on this since it would be a change to the default search interface?!
Priority
I would like to have this resolved by date: ASAP
The flip side of this is that identification is unpredictable.
(200 woodfrogs) and
(500 woodfrogs) are not discoverable by any one ID search, where
http://arctos.database.museum/SpecimenResults.cfm?taxon_name=Rana%20sylvatica
finds them all - when it doesn't time out.
Locality data are very similar (perhaps even less predictable) - the "any geog" term matches anything (including data from external sources) but is slow, specific fields are unpredictable (less so for Curators, possibly) but fast. Something I was looking at this AM looks like this...

so isn't very discoverable by locality data.
We moved the 'anytaxon' field to the top so the general public would be more likely to find what they were looking for - many would type in a higher taxon like a family name (eg "Silphidae") into what was the scientific name/identification field and be surprised that they were only finding specimens that were identified to family and no members of that family (id'd to species for example).
Solution: have a different looking search form for logged-in users than the search form that the public gets when they aren't logged in?
I think the user that spawned this was "us" and I recently dealt with something similar (but completely different!) involving a "not-us" grad student who was looking for a genus in "any taxon."
Not to hijack this Issue TOO much, but the obvious solution is to throw hardware at this. I've also been dealing with various things, both internal (eg, part display, media access) and external (our "volunteers" at VertNet are restless) that could greatly benefit from an Oracle upgrade. Is there some way we can swing a major infrastructure upgrade?
Yes, this was supposed to make public searches easier, but not if it times
out. I had a complaint from a local researcher about this last week: "any
taxon" = mammalia, "any geog" = california" timed out.
Is this something we need more disk space for at TACC?
Just saw Dusty's reply . . .
How much? Grant proposal with TACC?
On Tue, Oct 23, 2018 at 1:59 PM DerekSikes notifications@github.com wrote:
We moved the 'anytaxon' field to the top so the general public would be
more likely to find what they were looking for - many would type in a
higher taxon like a family name (eg "Silphidae") into what was the
scientific name/identification field and be surprised that they were only
finding specimens that were identified to family and no members of that
family (id'd to species for example).Solution: have a different looking search form for logged-in users than
the search form that the public gets when they aren't logged in?โ
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/ArctosDB/arctos/issues/1755#issuecomment-432396528,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AOH0hGCrizy8OiCjqjeJ7P7nmUvHwftdks5un3UQgaJpZM4X2e0F
.
What's the hardware needed and let's get some estimates! Should we get Chris/TACC on this discussion?
Otherwise if we leave Any Taxon as the default, are there tweaks to what it is searching on to avoid the timeout error. Because that is the issue and not doing anything isnt an option
Space isn't the problem, processors are.
The answer to "how much?" is always "more!" CA + Mammalia is the intersection of two ridiculously large caches. There are some organizational things we could do to make that faster (stripe both the data and the indexes across multiple independent disks that can be read simultaneously, for example), but it's mostly a matter or raw processors - we ultimately still have to sort through a ~half-billion rows to come up with an answer.
There's also a bit of a positive feedback loop here. With more processors I could spend less time dealing with performance issues and more doing cool stuff with data, which would all take more resources. I'm fairly confident that I can find a use for just about any number of processors!
So is this a proposal we can write with TACC?
On Tue, Oct 23, 2018 at 2:56 PM dustymc notifications@github.com wrote:
Space isn't the problem, processors are.
The answer to "how much?" is always "more!" CA + Mammalia is the
intersection of two ridiculously large caches. There are some
organizational things we could do to make that faster (stripe both the data
and the indexes across multiple independent disks that can be read
simultaneously, for example), but it's mostly a matter or raw processors -
we ultimately still have to sort through a ~half-billion rows to come up
with an answer.There's also a bit of a positive feedback loop here. With more processors
I could spend less time dealing with performance issues and more doing cool
stuff with data, which would all take more resources. I'm fairly confident
that I can find a use for just about any number of processors!โ
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/ArctosDB/arctos/issues/1755#issuecomment-432416505,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AOH0hBHHd5co9SDyC8rcVwsBsIg6Ytwyks5un4KPgaJpZM4X2e0F
.
with TACC?
Possibly even to TACC?? I don't know how any of that works.
How many processors is Arctos limited to now?
What's a lot more? give me a number that makes you giddy. I need some
numbers to crunch....
On Tue, Oct 23, 2018 at 2:23 PM dustymc notifications@github.com wrote:
with TACC?
Possibly even to TACC?? I don't know how any of that works.
โ
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
https://github.com/ArctosDB/arctos/issues/1755#issuecomment-432425062,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ACZ_0eZBCqFu-a6Nw7RfstlseLIEcNscks5un4i0gaJpZM4X2e0F
.
I have no idea how to answer that. The box I can see is running 8 E5-2660s, but I don't know if that's what's actually available to Oracle or not - I think there's a bunch of VM magic somewhere in the stack.
I suppose "giddy" would be something like Amazon's auto scaling, where it can pull from some ridiculous number of processors as needed. TACC is probably in the best position to provide an answer to what's possible/realistic.
Per email 18 Dec 2018: Simple (temp?) solution: Move the "Search Any..." fields for geography and identification to the bottom of the set of fields in that block, and add a warning message "May time out - try a more specific search in the above fields"
There is significant functionality in any_geog (and any_taxon) that is not accessible in any other way. I'm hesitant to hide them, even if they do cause a flurry of "Arctos is broken!!" emails 3 or 4 times per year.
Less than 10% of registered users (and 0% of public users) have the "Locality" pane expanded by default. I think moving it and removing it are about equivalent for "not-us," and "we" (who created these data) are the only ones who have much of a chance of finding anything without the 'anything' option.
@mkoo proposed replacing any_taxon with scientificname (eg, current identification).
What is the proposed replacement for any_geog?
There is significant functionality in any_geog (and any_taxon) that is not accessible in any other way.
I use the any_geography field ALL the time when searching for ethnological and historical items as we rarely have our collections geolocated (getting more each day) - this is the main way I cast a wide net. If I get too many, then I move to Specific Locality. Our users would certainly have trouble finding the right way to look up Alaska village names if this was dropped in the hierarchy.
I am getting timeouts all too often, just now I can't even get a result for
all Alaska north slope Vulpes using a bounding box on the map. This should
be a couple hundred specimens. I eliminated some columns, chose tissues
only, etc. I can get a red fox result but not arctic fox. I am having to go
to vert net to find my specimens, not good.
On Tue, Dec 18, 2018 at 8:18 AM dustymc notifications@github.com wrote:
There is significant functionality in any_geog (and any_taxon) that is not
accessible in any other way. I'm hesitant to hide them, even if they do
cause a flurry of "Arctos is broken!!" emails 3 or 4 times per year.Less than 10% of registered users (and 0% of public users) have the
"Locality" pane expanded by default. I think moving it and removing it are
about equivalent for "not-us," and "we" (who created these data) are the
only ones who have much of a chance of finding anything without the
'anything' option.@mkoo https://github.com/mkoo proposed replacing any_taxon with
scientificname (eg, current identification).What is the proposed replacement for any_geog?
โ
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/ArctosDB/arctos/issues/1755#issuecomment-448298968,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AIDLepaq5e__E72yyjP7CF9PXtilq3_uks5u6SNRgaJpZM4X2e0F
.
--
Aren Gunderson
Mammal Collection Manager
University of Alaska Museum of the North http://www.uaf.edu/museum
1962 Yukon Drive
Fairbanks, AK 99775
[email protected]
907-474-6947
@amgunderson I dug one of your queries (Vulpes and coordinates) out of the logs.
As I found it, execution time was 00:01:21.31 which will time the UI out.
With last_part_location removed it runs in 00:00:01.12 which should easily complete.
I don't think we can cache part detail due to datatype limitations, which are currently 4KB. Supposedly Oracle 12 ups that to 32KB, which probably is enough for a cache. 12 also allegedly "speaks" JSON, so we could potentially do more with the cache, and it's probably cheaper to build it.
This seems to be one of a growing list of things that an upgrade would make easy to solve, even if it doesn't directly solve it by just performing better.
While we figure this out, maybe we should think about changing the explanation for the field, or adding suggestions?
Something about if your query produces and error message, please try adding more specific info to the field. Or if using both the Any Taxon and Any Geographic Feature fields returns an error message, please try using one of the more specific geography fields for your search?
I tried some stuff from the recent timeout logs, there seem to be two major categories.
Many of the queries complete well under the timeout limit - the performance of the DB is not the only factor here, it probably has to do with other queries/activity. This seems like something an Oracle performance consultant could fix - eg, it likely depends on things like the physical arrangement of disks.
There are a several timeouts caused by lots of attributes and similarly-arranged data (https://github.com/ArctosDB/arctos/issues/1755#issuecomment-448379783). We could reduce the complexity of the data public users view (https://github.com/ArctosDB/arctos/issues/857), reduce the number of attributes (https://github.com/ArctosDB/arctos/issues/1623), cache attributes individually in FLAT (would be a fairly significant maintenance load increase, I'm not 100% sure it'll fit anyway), or possibly exploit Oracle 12's increased VARCHAR size and JSON capabilities (if we ever get migrated to 12!) to cache all attributes as a data object.
@ewommack there's some discussion of the timeout error message (and other relevant stuff) at https://github.com/ArctosDB/arctos/issues/1564. A fair number of these queries do seem to be "us" and "we," very much unlike public users, should know how we've organized the data and therefore how to find them without using the "any" options.
I agree with @ewommack https://github.com/ewommack that having an
explanation in the interface would be a big help. I am especially concerned
because public users who don't know to click on see more options get
timeouts and give up, and then have a bad impression of our interface. But
even collection managers, especially from incoming collections, can use
some pointers with a helpful message. Another thing that can be done is to
go into a previous search and reduce the number of fields returned in
add/remove data fields. But that is a very obscure solution and it would
help to explain that as an option.
I talked with Jim Stuart, and suggested this may be related to their state
firewall. He is going to do the same search at home to see if he gets the
same timeout. He did successfully search on a smaller subset.
Dusty, which of the things on your list are actionable? Can we get a list
of specific things the AWG and steering committee can work towards? What do
we need to do to migrate to Oracle 12?
On Wed, Feb 27, 2019 at 11:01 AM dustymc notifications@github.com wrote:
I tried some stuff from the recent timeout logs, there seem to be two
major categories.Many of the queries complete well under the timeout limit - the
performance of the DB is not the only factor here, it probably has to do
with other queries/activity. This seems like something an Oracle
performance consultant could fix - eg, it likely depends on things like the
physical arrangement of disks.There are a several timeouts caused by lots of attributes and
similarly-arranged data (#1755 (comment)
https://github.com/ArctosDB/arctos/issues/1755#issuecomment-448379783).
We could reduce the complexity of the data public users view (#857
https://github.com/ArctosDB/arctos/issues/857), reduce the number of
attributes (#1623 https://github.com/ArctosDB/arctos/issues/1623),
cache attributes individually in FLAT (would be a fairly significant
maintenance load increase, I'm not 100% sure it'll fit anyway), or possibly
exploit Oracle 12's increased VARCHAR size and JSON capabilities (if we
ever get migrated to 12!) to cache all attributes as a data object.@ewommack https://github.com/ewommack there's some discussion of the
timeout error message (and other relevant stuff) at #1564
https://github.com/ArctosDB/arctos/issues/1564. A fair number of these
queries do seem to be "us" and "we," very much unlike public users, should
know how we've organized the data and therefore how to find them without
using the "any" options.โ
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/ArctosDB/arctos/issues/1755#issuecomment-467966500,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AOH0hLG5eBV0gsKLZN8Q56jTRnBbj-FDks5vRsf0gaJpZM4X2e0F
.
public users who don't know to click on see more options get timeouts and give up, and then have a bad impression of our interface.
I completely agree, but that's only half the story: we have to balance that against "public users find half of our woodfrogs (none of our Silphidae, etc., etc., etc.) and give up, and then have a bad impression of our interface."
a very obscure solution
See https://github.com/ArctosDB/arctos/issues/1564 - I don't think it's particularly obscure, but I'm definitely up for ideas re: how to make it more obvious. I can do that at any time.
Maybe something like Emily's awesome tutorial (http://handbook.arctosdb.org/how_to/How-to-Create-Agents.html#tutorial-video) would be useful here.
Oracle 12
TACC is working on migrating TEST to 12. I won't completely know how we might exploit those capabilities until I have a working copy.
actionable
We could restrict certain "results fields" to Operators - it would take some development, but I don't think it would be overly difficult. I dislike this - making the same data available to everyone should be a priority. Perhaps we could somewhat mitigate that by just providing all attributes as JSON instead of individual "columns" but that will fail for some records. (Oracle 12 will fix much, maybe even all, of that as well.)
We can discuss the number of attributes=="result columns" - https://github.com/ArctosDB/arctos/issues/1623.
We could cache individual attributes in columns, but I'd prefer to play with Oracle12 first.
We could pay for an Oracle consultant at any time.
We could discuss general performance with TACC - eg, maybe the queries that time out in the UI and work for me are related to network/webserver/sunspots and not Oracle as all. (Seems a bit unlikely, but still worth investigating.)
Sometimes specific queries can be tuned - I'm happy to take a look if you want to pass any known problems on.
Expensive columns on for >200 users are:
That is not necessarily most traffic - users with rarely-used columns may perform most queries, public users may turn things on, etc - but I think it's reasonably close.
Adding only those to FLAT might offload enough processing to made a difference, and shouldn't be too much work/maintenance.
Dusty, can you explain a bit as to what you mean by offloading to FLAT? Is
FLAT indexed data? I'm trying to understand all this.
On Wed, Feb 27, 2019 at 3:14 PM dustymc notifications@github.com wrote:
Expensive columns on for >200 users are:
- weight
- age
- age class
- total length
- tail length
- reproductive data
- hind foot with claw
That is not necessarily most traffic - users with rarely-used columns may
perform most queries, public users may turn things on, etc - but I think
it's reasonably close.Adding only those to FLAT might offload enough processing to made a
difference, and shouldn't be too much work/maintenance.โ
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/ArctosDB/arctos/issues/1755#issuecomment-468052414,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AOH0hNky8ghM5ElTsJRCfUDyMk8Cam5lks5vRwNUgaJpZM4X2e0F
.
Yes, essentially that. Attributes can't be smooshed into spreadsheets very readily (there might be 20 sex determinations) so they're pulled into results by calling a procedure which smooshes them into something that'll fit in a table cell. That's fairly easy to write code to and there's not much complexity involved, but the procedure is expensive and runs for every attribute in every row of every search result (that includes an attribute).
FLAT caches those data as text when they're created/changed. The computational costs are essentially pre-paid. The remaining costs are (mostly) just in maintenance and storage - the cache uses ~3 times as much disk (which I don't think is much of a problem), and there's a bunch of code to maintain. Adding a couple hundred attributes that change all the time to the cache just isn't something that I think we can afford. Adding the half-dozen that are used the most and probably won't ever change seems more realistic.
It looks to me like most of the problem queries are "us." Geography-cache queries usually work, taxonomy-cache queries usually work, combinations of them sorta-usually work, any of that combined with a bunch of curatorial-type attributes fails. State (or county or whatever) and/or identification and a bunch of curatorial-type attributes usually works. Can we come up with a "welcome to Arctos, don't use the 'any' things" tutorial to send out with operator invitations or something??
I just dug through a bunch of timeouts from the last ~48 hours. Data from the logs and some comments below.
The attribute thing does not seem to be a major factor in this. There may be other reasons to be more aggressive in caching it, but that won't fix most of these.
Some (maybe most) of these seem perfectly appropriate to me - the user did exactly what I'd have done (eg, because the taxonomy is waffly or it's a "minor" geography term that might be anywhere).
Others would have been better able to answer the question (or what I guess the question was) using "normal" terms. any geog "colorado"
select higher_geog from geog_auth_rec where GEOG_AUTH_REC_ID in (select GEOG_AUTH_REC_ID from cache_anygeog where GEOSTRING like '%COLORADO%') group by higher_geog order by higher_geog;
returns 415 rows from places like...
North America, United States, Wyoming, Sweetwater County, Colorado River
North America, United States, Wyoming, Uinta County, Colorado River
South America
South America, Argentina
South America, Argentina, Cordoba
South America, Argentina, Formosa
South America, Argentina, Neuquen
South America, Argentina, Rio Negro
South America, Argentina, Salta
South America, Argentina, Salta, Cafayate Department
South America, Bolivia, Potosi
South America, Brazil, Distrito Federal
South America, Brazil, Rondonia
South America, Chile, Araucania
South America, Colombia, Boyaca
South America, Ecuador
South America, Ecuador, Pichincha
South America, Ecuador, Santo Domingo
West Indies, Netherlands, Netherlands Antilles, Aruba Island Territory, Lesser Antilles, Aruba
West Indies, Trinidad and Tobago, Saint Patrick Parish, Lesser Antilles, Trinidad
West Indies, Trinidad and Tobago, Victoria Parish, Lesser Antilles, Trinidad
no higher geography recorded
for example - "stuff about which someone at some time used the string 'colorado'" - which is probably not what the user was looking for. We definitely need better documentation, and I'll play with more specific suggestions. ("that timed out - click here to try state_prov=colorado instead"). No clue how realistic that is yet.
Almost all of these completed in well under a minute. SOMETHING other than direct DB performance is a factor here. I'll ask TACC to check the webserver and such, but I suspect it's other things simultaneously going on in the DB and the solution comes back to an Oracle tuneup. (wild baseless guess: we need a disk dedicated to the temp tables these queries generate)
rawipaddress: 129.108.202.164,129.114.52.171
source: UT-EL-PASO
User: public
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'GREAT BLUE HERON%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'GREAT BLUE HERON%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'GREAT BLUE HERON%' )
Execution Time: Elapsed: 00:00:02.08
Suggestion: none, if the query was for "things that have been called great blue heron"
Problem: other stuff was busy??
rawipaddress: 50.236.76.57,129.114.52.171
Comcast
username: (academic non-operator)
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters,concatAttributeValue(filtered_flat.collection_object_id,'total length') total_length,concatAttributeValue(filtered_flat.collection_object_id,'weight') weight FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'TAMIAS MINIMUS%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'TAMIAS MINIMUS%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'TAMIAS MINIMUS%' ) AND filtered_flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('DMNS:MAMM') ) AND cache_anygeog.geostring like '%COLORADO%'
Execution Time: Elapsed: 00:00:44.29
Suggestion: use state_prov (if that's the Colorado in question) and identified as (if that's the scope)
Problem: 2 expensive terms + other stuff was busy??
rawipaddress: 164.64.199.11,129.114.52.171
NM-STATE-GOVERNMENT
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL AND cache_anygeog.geostring like '%VALENCIA%'
;
Elapsed: 00:00:10.23
Suggestion: none, it's hard to say what "Valencia" is or where we might have entered it
problem: other stuff was busy??
rawipaddress: 136.152.143.22,129.114.52.171
University of California at Berkeley (UCAB-1)
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'MULTIFASCIATA%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'MULTIFASCIATA%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'MULTIFASCIATA%' ) AND filtered_flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('MVZ:HERP') ) AND cache_anygeog.geostring like '%CHINA%'
Elapsed: 00:00:30.68
suggestion:
1) use country
2) not clear what 'MULTIFASCIATA' refers to - probably needs either a beginning % or more in identified-as?
rawipaddress: 129.108.81.21,129.114.52.171
UT-EL-PASO
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL AND filtered_flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('UTEP:HERB') ) AND cache_anygeog.geostring like '%IMPERIAL COUNTY%'
Elapsed: 00:00:07.61
suggestion: use county
rawipaddress: 187.189.107.207,129.114.52.171
MX-TPTE-LACNIC (mexico)
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'APHELOCOMA WOODHOUSEII%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'APHELOCOMA WOODHOUSEII%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'APHELOCOMA WOODHOUSEII%' )
Elapsed: 00:00:02.16
suggestion: MAYBE use identifiedas, but this is probably what I'd have done
rawipaddress: 201.108.154.176,129.114.52.171
MX-USCV4-LACNIC (mexico)
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'PUMA%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'PUMA%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'PUMA%' ) AND cache_anygeog.geostring like '%MEXICO %'
Elapsed: 00:00:16.77
suggestion:
1) MAYBE use identifiedas, but this is probably what I'd have done
2) use country
rawipaddress: 164.64.199.11,129.114.52.171
NM-STATE-GOVERNMENT
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'MEPHITIS MACROURA%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'MEPHITIS MACROURA%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'MEPHITIS MACROURA%' ) AND cache_anygeog.geostring like '%NEW MEXICO%';
Elapsed: 00:00:11.54
suggestion: use identifiedas and state
rawipaddress: 164.64.199.11,129.114.52.171
NM-STATE-GOVERNMENT
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'NYCTINOMOPS%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'NYCTINOMOPS%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'NYCTINOMOPS%' ) AND cache_anygeog.geostring like '%NEW MEXICO%'
Elapsed: 00:00:12.10
suggestion: use identifiedas and state
rawipaddress: 129.72.142.82,129.114.52.171
UWYO
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'FALCO%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'FALCO%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'FALCO%' ) AND cache_anygeog.geostring like '%WYOMING%'
Elapsed: 00:00:16.91
suggestion: use identifiedas and state
rawipaddress: 192.12.184.6,129.114.52.171
Los Alamos National Laboratory (LANL-2)
username: D***[email protected]
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.county county,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,getMonthCollected(filtered_flat.began_date,filtered_flat.ended_date) month_collected,filtered_flat.year year_collected,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters,concatAttributeValue(filtered_flat.collection_object_id,'sex') sex,concatAttributeValue(filtered_flat.collection_object_id,'weight') weight FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) WHERE filtered_flat.collection_object_id IS NOT NULL AND filtered_flat.cataloged_item_type='specimen' AND filtered_flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('ALMNH:ES','ASNHC:MAMM','ASNHC:HERB','STAR:ALG','ASUMZ:HERP','ASUMZ:INVERT','ASUMZ:BIRD','ASUMZ:BIVALVE','ASUMZ:FISH','STAR:HERB','ASUMZ:ENTO','ASUMZ:MAMM','APSU:HERP','APSU:BIRD','APSU:FISH','APSU:MAMM','UWBM:HERP','UWBM:MAMM','CHAS:HERP','CHAS:EGG','CHAS:BIRD','CHAS:ES','CHAS:EH','CHAS:FISH','CHAS:ENTO','CHAS:MAMM','CHAS:INV','CHAS:HERB','CHAS:TEACH','COA:HERP','COA:EGG','COA:BIRD','COA:ENTO','COA:MAMM','DMNS:HERP','DMNS:EGG','DMNS:BIRD','DMNS:MAMM','DMNS:INV','DMNS:PARA','HWML:PARA','KNWROBS:FISH','KNWR:ENTO','KNWR:INV','KNWROBS:MAMM','KNWROBS:HERB','KNWR:HERB','KWP:ENTO','UCSC:HERP','UCSC:BIRD','UCSC:FISH','UCSC:MAMM','MLZ:EGG','MLZ:BIRD','MLZ:MAMM','MSB:HERP','MSB:BIRD','MSB:FISH','MSB:HOST','MSBOBS:MAMM','MSB:MAMM','MSB:PARA','MVZOBS:HERP','MVZ:HERP','MVZ:HILD','MVZ:EGG','MVZOBS:BIRD','MVZ:BIRD','MVZOBS:MAMM','MVZ:MAMM','UMNH:HERP','UMNH:BIRD','UMNH:MAMM','NMU:HERP','NMU:BIRD','NMU:MAMM','NMU:PARA','OWU:AMPH','OWU:BIRD','OWU:ENTO','OWU:EH','OWU:FISH','OWU:GEOL','OWU:HERB','OWU:INV','OWU:MAMM','OWU:ES','OWU:PARA','OWU:REPT','NBSB:BIRD','USNPC:PARA','UAM:ALG','UAM:HERP','UAM:ARC','UAMOBS:BIRD','UAM:BIRD','UAMB:HERB','UAM:ES','UAM:ENV','UAM:EH','UAMOBS:FISH','UAM:FISH','UAMOBS:ENTO','UAM:ENTO','UAM:INV','UAMOBS:MAMM','UAM:MAMM','UAM:HERB','UC-SCFS:BIRD','UC-SCFS:MAMM','UCM:HERP','UCM:EGG','UCM:BIRD','UCM:FISH','UCM:MAMM','UCM:OBS','UMZM:BIRD','UMZM:MAMM','UNR:HERP','UNR:EGG','UNR:BIRD','UNR:FISH','UNR:MAMM','UNM:ES','UTEPOBS:HERP','UTEP:HERPOS','UTEP:HERP','UTEP:ARC','UTEP:BIRD','UTEP:ES','UTEP:FISH','UTEPOBS:ENTO','UTEP:ENTO','UTEP:INV','UTEP:MAMM','UTEP:HERB','UTEP:TEACH','UTEP:ZOO','UWYMV:HERP','UWYMV:BIRD','UWYMV:FISH','UWYMV:MAMM','WNMU:BIRD','WNMU:FISH','WNMU:MAMM') ) AND upper(filtered_flat.phylorder) like 'RODENTIA%' AND UPPER(filtered_flat.state_prov) LIKE '%NEW MEXICO%'
89159 rows selected.
Elapsed: 00:01:04.39
suggestion: use class and state
NOTE: has attributes
rawipaddress: 98.238.128.19,129.114.52.171
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'MUS%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'MUS%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'MUS%' )
DLM (successfully!) trying to break it
rawipaddress: 136.152.143.41,129.114.52.171
University of California at Berkeley (UCAB-1)
username: operator
SELECT distinct flat.collection_object_id,flat.guid guid,flat.scientific_name scientific_name,flat.country country,flat.state_prov state_prov,flat.spec_locality spec_locality,flat.verbatim_date verbatim_date,flat.dec_lat dec_lat,flat.dec_long dec_long,flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM flat WHERE flat.collection_object_id IS NOT NULL and flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'AGELAIUS PHOENICEUS MALLIARDORUM%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'AGELAIUS PHOENICEUS MALLIARDORUM%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'AGELAIUS PHOENICEUS MALLIARDORUM%' ) AND flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('MVZ:BIRD') ) AND upper(PARTS) LIKE '%SKIN%'
no rows selected
Elapsed: 00:01:06.63
suggestion: use identifiedas
rawipaddress: 131.216.46.100,129.114.52.171
Nevada System of Higher Education (NEVAD-9)
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN fake_coordinate_error ON (filtered_flat.locality_id = fake_coordinate_error.locality_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'MAMMAL%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'MAMMAL%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'MAMMAL%' ) AND ( ( 36.23007635975191 between fake_coordinate_error.swlat and fake_coordinate_error.nelat OR 34.953528210949415 between fake_coordinate_error.swlat and fake_coordinate_error.nelat OR fake_coordinate_error.swlat between 34.953528210949415 and 36.23007635975191 OR fake_coordinate_error.nelat between 34.953528210949415 and 36.23007635975191 ) AND ( -115.437744140625 between fake_coordinate_error.swlong and fake_coordinate_error.nelong OR -114.49374389648438 between fake_coordinate_error.swlong and fake_coordinate_error.nelong OR fake_coordinate_error.swlong between -115.437744140625 and -114.49374389648438 OR fake_coordinate_error.nelong between -115.437744140625 and -114.49374389648438 ) )
1186 rows selected.
Elapsed: 00:00:04.02
suggestion: use class
Thanks Dusty!
I can see my time out where I was playing around in the system yesterday after the conversation started. I was trying out different combos of options to see what might cause it to time out.
cheers,
Beth
On Thu, Feb 28, 2019 at 1:17 PM dustymc <[email protected]notifications@github.com> wrote:
โ This message was sent from a non-UWYO address. Please exercise caution when clicking links or opening attachments from external sources.
I just dug through a bunch of timeouts from the last ~48 hours. Data from the logs and some comments below.
The attribute thing does not seem to be a major factor in this. There may be other reasons to be more aggressive in caching it, but that won't fix most of these.
Some (maybe most) of these seem perfectly appropriate to me - the user did exactly what I'd have done (eg, because the taxonomy is waffly or it's a "minor" geography term that might be anywhere).
Others would have been better able to answer the question (or what I guess the question was) using "normal" terms. any geog "colorado"
select higher_geog from geog_auth_rec where GEOG_AUTH_REC_ID in (select GEOG_AUTH_REC_ID from cache_anygeog where GEOSTRING like '%COLORADO%') group by higher_geog order by higher_geog;
returns 415 rows from places like...
North America, United States, Wyoming, Sweetwater County, Colorado River
North America, United States, Wyoming, Uinta County, Colorado River
South America
South America, Argentina
South America, Argentina, Cordoba
South America, Argentina, Formosa
South America, Argentina, Neuquen
South America, Argentina, Rio Negro
South America, Argentina, Salta
South America, Argentina, Salta, Cafayate Department
South America, Bolivia, Potosi
South America, Brazil, Distrito Federal
South America, Brazil, Rondonia
South America, Chile, Araucania
South America, Colombia, Boyaca
South America, Ecuador
South America, Ecuador, Pichincha
South America, Ecuador, Santo Domingo
West Indies, Netherlands, Netherlands Antilles, Aruba Island Territory, Lesser Antilles, Aruba
West Indies, Trinidad and Tobago, Saint Patrick Parish, Lesser Antilles, Trinidad
West Indies, Trinidad and Tobago, Victoria Parish, Lesser Antilles, Trinidad
no higher geography recorded
for example - "stuff about which someone at some time used the string 'colorado'" - which is probably not what the user was looking for. We definitely need better documentation, and I'll play with more specific suggestions. ("that timed out - click here to try state_prov=colorado instead"). No clue how realistic that is yet.
Almost all of these completed in well under a minute. SOMETHING other than direct DB performance is a factor here. I'll ask TACC to check the webserver and such, but I suspect it's other things simultaneously going on in the DB and the solution comes back to an Oracle tuneup. (wild baseless guess: we need a disk dedicated to the temp tables these queries generate)
rawipaddress: 129.108.202.164,129.114.52.171
source: UT-EL-PASO
User: public
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'GREAT BLUE HERON%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'GREAT BLUE HERON%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'GREAT BLUE HERON%' )
Execution Time: Elapsed: 00:00:02.08
Suggestion: none, if the query was for "things that have been called great blue heron"
Problem: other stuff was busy??
rawipaddress: 50.236.76.57,129.114.52.171
Comcast
username: (academic non-operator)
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters,concatAttributeValue(filtered_flat.collection_object_id,'total length') total_length,concatAttributeValue(filtered_flat.collection_object_id,'weight') weight FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'TAMIAS MINIMUS%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'TAMIAS MINIMUS%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'TAMIAS MINIMUS%' ) AND filtered_flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('DMNS:MAMM') ) AND cache_anygeog.geostring like '%COLORADO%'
Execution Time: Elapsed: 00:00:44.29
Suggestion: use state_prov (if that's the Colorado in question) and identified as (if that's the scope)
Problem: 2 expensive terms + other stuff was busy??
rawipaddress: 164.64.199.11,129.114.52.171
NM-STATE-GOVERNMENT
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL AND cache_anygeog.geostring like '%VALENCIA%'
;
Elapsed: 00:00:10.23
Suggestion: none, it's hard to say what "Valencia" is or where we might have entered it
problem: other stuff was busy??
rawipaddress: 136.152.143.22,129.114.52.171
University of California at Berkeley (UCAB-1)
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'MULTIFASCIATA%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'MULTIFASCIATA%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'MULTIFASCIATA%' ) AND filtered_flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('MVZ:HERP') ) AND cache_anygeog.geostring like '%CHINA%'
Elapsed: 00:00:30.68
suggestion:
rawipaddress: 129.108.81.21,129.114.52.171
UT-EL-PASO
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL AND filtered_flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('UTEP:HERB') ) AND cache_anygeog.geostring like '%IMPERIAL COUNTY%'
Elapsed: 00:00:07.61
suggestion: use county
rawipaddress: 187.189.107.207,129.114.52.171
MX-TPTE-LACNIC (mexico)
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'APHELOCOMA WOODHOUSEII%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'APHELOCOMA WOODHOUSEII%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'APHELOCOMA WOODHOUSEII%' )
Elapsed: 00:00:02.16
suggestion: MAYBE use identifiedas, but this is probably what I'd have done
rawipaddress: 201.108.154.176,129.114.52.171
MX-USCV4-LACNIC (mexico)
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'PUMA%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'PUMA%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'PUMA%' ) AND cache_anygeog.geostring like '%MEXICO %'
Elapsed: 00:00:16.77
suggestion:
rawipaddress: 164.64.199.11,129.114.52.171
NM-STATE-GOVERNMENT
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'MEPHITIS MACROURA%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'MEPHITIS MACROURA%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'MEPHITIS MACROURA%' ) AND cache_anygeog.geostring like '%NEW MEXICO%';
Elapsed: 00:00:11.54
suggestion: use identifiedas and state
rawipaddress: 164.64.199.11,129.114.52.171
NM-STATE-GOVERNMENT
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'NYCTINOMOPS%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'NYCTINOMOPS%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'NYCTINOMOPS%' ) AND cache_anygeog.geostring like '%NEW MEXICO%'
Elapsed: 00:00:12.10
suggestion: use identifiedas and state
rawipaddress: 129.72.142.82,129.114.52.171
UWYO
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN specimen_event segc ON (filtered_flat.collection_object_id = segc.collection_object_id) INNER JOIN cache_anygeog ON (segc.specimen_event_id = cache_anygeog.specimen_event_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'FALCO%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'FALCO%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'FALCO%' ) AND cache_anygeog.geostring like '%WYOMING%'
Elapsed: 00:00:16.91
suggestion: use identifiedas and state
rawipaddress: 192.12.184.6,129.114.52.171
Los Alamos National Laboratory (LANL-2)
username: D***[email protected]1@gmail.com
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.county county,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,getMonthCollected(filtered_flat.began_date,filtered_flat.ended_date) month_collected,filtered_flat.year year_collected,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters,concatAttributeValue(filtered_flat.collection_object_id,'sex') sex,concatAttributeValue(filtered_flat.collection_object_id,'weight') weight FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) WHERE filtered_flat.collection_object_id IS NOT NULL AND filtered_flat.cataloged_item_type='specimen' AND filtered_flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('ALMNH:ES','ASNHC:MAMM','ASNHC:HERB','STAR:ALG','ASUMZ:HERP','ASUMZ:INVERT','ASUMZ:BIRD','ASUMZ:BIVALVE','ASUMZ:FISH','STAR:HERB','ASUMZ:ENTO','ASUMZ:MAMM','APSU:HERP','APSU:BIRD','APSU:FISH','APSU:MAMM','UWBM:HERP','UWBM:MAMM','CHAS:HERP','CHAS:EGG','CHAS:BIRD','CHAS:ES','CHAS:EH','CHAS:FISH','CHAS:ENTO','CHAS:MAMM','CHAS:INV','CHAS:HERB','CHAS:TEACH','COA:HERP','COA:EGG','COA:BIRD','COA:ENTO','COA:MAMM','DMNS:HERP','DMNS:EGG','DMNS:BIRD','DMNS:MAMM','DMNS:INV','DMNS:PARA','HWML:PARA','KNWROBS:FISH','KNWR:ENTO','KNWR:INV','KNWROBS:MAMM','KNWROBS:HERB','KNWR:HERB','KWP:ENTO','UCSC:HERP','UCSC:BIRD','UCSC:FISH','UCSC:MAMM','MLZ:EGG','MLZ:BIRD','MLZ:MAMM','MSB:HERP','MSB:BIRD','MSB:FISH','MSB:HOST','MSBOBS:MAMM','MSB:MAMM','MSB:PARA','MVZOBS:HERP','MVZ:HERP','MVZ:HILD','MVZ:EGG','MVZOBS:BIRD','MVZ:BIRD','MVZOBS:MAMM','MVZ:MAMM','UMNH:HERP','UMNH:BIRD','UMNH:MAMM','NMU:HERP','NMU:BIRD','NMU:MAMM','NMU:PARA','OWU:AMPH','OWU:BIRD','OWU:ENTO','OWU:EH','OWU:FISH','OWU:GEOL','OWU:HERB','OWU:INV','OWU:MAMM','OWU:ES','OWU:PARA','OWU:REPT','NBSB:BIRD','USNPC:PARA','UAM:ALG','UAM:HERP','UAM:ARC','UAMOBS:BIRD','UAM:BIRD','UAMB:HERB','UAM:ES','UAM:ENV','UAM:EH','UAMOBS:FISH','UAM:FISH','UAMOBS:ENTO','UAM:ENTO','UAM:INV','UAMOBS:MAMM','UAM:MAMM','UAM:HERB','UC-SCFS:BIRD','UC-SCFS:MAMM','UCM:HERP','UCM:EGG','UCM:BIRD','UCM:FISH','UCM:MAMM','UCM:OBS','UMZM:BIRD','UMZM:MAMM','UNR:HERP','UNR:EGG','UNR:BIRD','UNR:FISH','UNR:MAMM','UNM:ES','UTEPOBS:HERP','UTEP:HERPOS','UTEP:HERP','UTEP:ARC','UTEP:BIRD','UTEP:ES','UTEP:FISH','UTEPOBS:ENTO','UTEP:ENTO','UTEP:INV','UTEP:MAMM','UTEP:HERB','UTEP:TEACH','UTEP:ZOO','UWYMV:HERP','UWYMV:BIRD','UWYMV:FISH','UWYMV:MAMM','WNMU:BIRD','WNMU:FISH','WNMU:MAMM') ) AND upper(filtered_flat.phylorder) like 'RODENTIA%' AND UPPER(filtered_flat.state_prov) LIKE '%NEW MEXICO%'
89159 rows selected.
Elapsed: 00:01:04.39
suggestion: use class and state
NOTE: has attributes
rawipaddress: 98.238.128.19,129.114.52.171
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'MUS%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'MUS%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'MUS%' )
DLM (successfully!) trying to break it
rawipaddress: 136.152.143.41,129.114.52.171
University of California at Berkeley (UCAB-1)
username: operator
SELECT distinct flat.collection_object_id,flat.guid guid,flat.scientific_name scientific_name,flat.country country,flat.state_prov state_prov,flat.spec_locality spec_locality,flat.verbatim_date verbatim_date,flat.dec_lat dec_lat,flat.dec_long dec_long,flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM flat WHERE flat.collection_object_id IS NOT NULL and flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'AGELAIUS PHOENICEUS MALLIARDORUM%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'AGELAIUS PHOENICEUS MALLIARDORUM%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'AGELAIUS PHOENICEUS MALLIARDORUM%' ) AND flat.collection_id IN ( select collection_id from collection where upper(guid_prefix) IN ('MVZ:BIRD') ) AND upper(PARTS) LIKE '%SKIN%'
no rows selected
Elapsed: 00:01:06.63
suggestion: use identifiedas
rawipaddress: 131.216.46.100,129.114.52.171
Nevada System of Higher Education (NEVAD-9)
SELECT distinct filtered_flat.collection_object_id,filtered_flat.guid guid,filtered_flat.scientific_name scientific_name,filtered_flat.country country,filtered_flat.state_prov state_prov,filtered_flat.spec_locality spec_locality,filtered_flat.verbatim_date verbatim_date,filtered_flat.dec_lat dec_lat,filtered_flat.dec_long dec_long,filtered_flat.coordinateuncertaintyinmeters coordinateuncertaintyinmeters FROM filtered_flat INNER JOIN cataloged_item ON (filtered_flat.collection_object_id = cataloged_item.collection_object_id) INNER JOIN fake_coordinate_error ON (filtered_flat.locality_id = fake_coordinate_error.locality_id) WHERE filtered_flat.collection_object_id IS NOT NULL and filtered_flat.COLLECTION_OBJECT_ID in ( select collection_object_id from identification where upper(scientific_name) LIKE 'MAMMAL%' union select collection_object_id from identification,identification_taxonomy,taxon_term where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=taxon_term.taxon_name_id and upper(term) LIKE 'MAMMAL%' union select collection_object_id from identification,identification_taxonomy,common_name where identification.identification_id=identification_taxonomy.identification_id and identification_taxonomy.taxon_name_id=common_name.taxon_name_id and upper(common_name) LIKE 'MAMMAL%' ) AND ( ( 36.23007635975191 between fake_coordinate_error.swlat and fake_coordinate_error.nelat OR 34.953528210949415 between fake_coordinate_error.swlat and fake_coordinate_error.nelat OR fake_coordinate_error.swlat between 34.953528210949415 and 36.23007635975191 OR fake_coordinate_error.nelat between 34.953528210949415 and 36.23007635975191 ) AND ( -115.437744140625 between fake_coordinate_error.swlong and fake_coordinate_error.nelong OR -114.49374389648438 between fake_coordinate_error.swlong and fake_coordinate_error.nelong OR fake_coordinate_error.swlong between -115.437744140625 and -114.49374389648438 OR fake_coordinate_error.nelong between -115.437744140625 and -114.49374389648438 ) )
1186 rows selected.
Elapsed: 00:00:04.02
suggestion: use class
โ
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHubhttps://github.com/ArctosDB/arctos/issues/1755#issuecomment-468419646, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ARs_RLDWONj0jT-ehA4-wk3va2pbH8ZTks5vSDemgaJpZM4X2e0F.
--
Elizabeth Wommack, PhD
Curator and Collections Manager of Vertebrates
University of Wyoming Museum of Vertebrates
Berry Biodiversity Conservation Center
University of Wyoming,
Laramie, WY 82071
ewommack@ewommack@berkeley.eduuwyo.eduhttp://uwyo.edu/
www.uwymv.http://www.uwymv.edu/org
I'll play with more specific suggestions. ("that timed out - click here to try state_prov=colorado instead")
YAY!!!
NM-STATE-GOVERNMENT - was probably looking for Valencia County. Using the county field does it. Not sure how to direct people there other than what we do already?
WOW - LANL is searching our stuff? Wonder what they are doing with the data? Interesting!
NM STATE Government would be James Stuart, who contacted us about the
timeout. Likely a firewall issue on their end, but how would they know
that? How can we suggest they check locally?
On Thu, Feb 28, 2019 at 4:28 PM Teresa Mayfield-Meyer <
[email protected]> wrote:
I'll play with more specific suggestions. ("that timed out - click here to
try state_prov=colorado instead")YAY!!!
NM-STATE-GOVERNMENT - was probably looking for Valencia County. Using the
county field does it. Not sure how to direct people there other than what
we do already?WOW - LANL is searching our stuff? Wonder what they are doing with the
data? Interesting!โ
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/ArctosDB/arctos/issues/1755#issuecomment-468481098,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AOH0hGy7asFMmADzTbKL4pfzVk8dWOr4ks5vSGYOgaJpZM4X2e0F
.
It's not the magic I was hoping to find, but I did this:


It's a start!
Could we write a grant for increased resources, an Oracle audit, that would help with this? Plan for this for grants going forward.
AWG meeting 3/7/2019: Under Identification and Taxonomy, move field for "Identification (Scientic Name)" to the top in place of the more complex term. @dustymc
I'm getting "execution time out expired" when trying to search for all photographs in the collection. I think there are 2000ish photographs in the collection.
I get this time out even when media is removed from the search results page.
Is this normal or is something wrong with Arctos right now?
We are still getting these "execution timeout expired" errors, for searches that used to work just fine. At one point I was able to search for everything in the collection (over 4000) with no errors, but now searches that should result in about 500 or more are failing. Is this normal? What is the maximum number of specimen search results that should be expected to work? We really need to be able to do searches that yield results of more than a few hundred.
From the logs it looks like you're trying to search part remarks - is that's what timing out? That forces Oracle to sort through a massive amount of data (and probably doesn't find what you need anyway, because it's uncontrolled text).
Yes, I'm working on a media project where I have used a text string to flag what needs media in the media part remarks field. I paused this project for the last month or so - before that, this search was working fine. When the search was failing this morning, I tried some other searches that yield 500+ results that used to work and they are not working either. Is there something I can do differently to be able to search part remarks like this successfully or is there a way for me to search for everything in the collection that does not have media associated? We also need to be able to search for the collection object classes (paintings, drawings, prints, photographs, sculpture) in the identification field. Searches for these used to work, but now I only get timeouts.
Actually it looks like sorting through ~80M remarks isn't the bottleneck. Your original query runs in ~1:40 for me.
Without the Attributes, it runs in about 3 seconds.
There's a warning at the top of the customize form:

I think getLatestPartLocation is the really critical thing here - your query consistently runs in under 5 seconds for me without that, which may be in part due to Oracle "learning" from repetitive queries (which is awesome, but it makes diagnostics complicated). Wild guess, you've turned that on recently?
search for the collection object classes (paintings, drawings, prints, photographs, sculpture) in the identification field.
That also seems like massive overloading of a text field to me. Perhaps we should be doing more with parts or http://arctos.database.museum/info/ctDocumentation.cfm?table=CTCATALOGED_ITEM_TYPE or something.
Yep you're right, I had LastPartLocation turned on for search results, thanks! Just turned it off and the searching works perfectly again. However, LastPartLocation is really useful, at times necessary, information to have in our search results. Is there a way to make our searches work with LastPartLocation?
That also seems like massive overloading of a text field to me. Perhaps we should be doing more with parts or http://arctos.database.museum/info/ctDocumentation.cfm?table=CTCATALOGED_ITEM_TYPE or something.
Not sure I follow. We modeled what EH does for classifying the Art collection in Arctos. We follow the Getty Art & Architecture Thesaurus http://www.getty.edu/research/tools/vocabularies/aat/ for classification of the Art collection and use a very limited number of classifications. For identification in Arctos, we use the object class (and sub-class if there is one) followed by the object title in quotations, which is what was suggested we do.
Being able to search for "photograph" or "painting" is the art equivalent of a scientist needing to easily search for "mammal" or "bird".
I'm absolutely open to totally re-thinking how we currently have things organized if there is a better solution.
Issue #1732 (https://github.com/ArctosDB/arctos/issues/1732) discusses adding the AASLH Nomenclature but that doesn't work for our collection. As Karinna explained the standard for art collections and what we based our structure off of was the Getty Vocabularies.
Here is an example of the difference between Nomenclature 4.0 and the Getty AAT
Nomenclature 4.0 doesn't list photographs as a form of art and the only categories for paintings are bark/ceiling/miniature/panel/wall : http://www.nomenclature.info/parcourir-browse.app?id=12925&lang=en&ws=INT&wo=I
The Getty has 128 some types of classifications for paintings but we narrowed it down to visual works by material or technique- Here is an example for painting: http://www.getty.edu/vow/AATHierarchy?find=&logic=AND¬e=&subjectid=300033701
I see that the Getty Vocabularies are now linked as open data:
http://www.getty.edu/research/tools/vocabularies/lod/index.html
What would it take to use these in Arctos?
Is this a solution that would work for some items in Ethnology and History too?
Is there a way to make our searches work with LastPartLocation?
An Oracle tune-up might find some magic, or the next major upgrade may provide a way to do more with JSON+cache. I'll take a look at the function, but I'm not sure I'll get very far with current tools. File an Issue if it becomes a major problem - this is likely to get lost in here.
Your "identifications" do look pretty clean (I scrolled through quickly, a few obvious outliers below), but ultimately it's text - you are going to make mistakes, and the data are (probably, eventually) going to contain your metadata search terms, etc. If "find all the photographs" is something you want to do, you need a way that does so without confounding other stuff and doesn't let you enter 'photo' or "photograf." On the surface, that looks like parts to me - it's a THING (eg, something to which one might stick a barcode). Probably best dealt with in a new Issue.
Nomenclature 4.0
Have you talked to them - eg, are they receptive to making you an editor, and are you interested in that sort of thing if so?
Getty
Looks like their data are fundamentally hierarchical, which easily fits in our taxonomy model. Looks like that can be re-created from their LOD data as well.
That would let you enter something as eg, "acrylic paintings (visual works) {whatever the artist called the thing}" (from http://handbook.arctosdb.org/documentation/bulkloader.html#taxonomy) and find it by "whatever the artist called the thing" or any of...

or any metadata that any of those terms might bear (eg, anything from http://www.getty.edu/vow/AATFullDisplay?find=&logic=AND¬e=&subjectid=300033618)
New Issue - sounds like fun!
Here's IDs that jumped out at me:
Photograph , "Purple (Spot) Laser Painting"
Photograph, Archival pigment print, " Mile 75, Alfred"
Sculpture , "Untitled"
"photograph" or "painting" is the art equivalent of a scientist needing to easily search for "mammal" or "bird".
If so, then Getty should fix this (as long as we all agree with them on what is and is not a bird - something that seemingly never happens in Linnean taxonomy-land!).
If it's actually more (or also) like a scientist looking for bird skulls, then parts may play a role as well.
I'm excited that this might be a possibility! I'll chat some with Angie. We would need a way to see the title of the artwork easily in search results as well as have it prominently displayed on the object record page. Otherwise, that core identification information is buried in the attributes box. Could those two things be solved? Right now our solution has been to include the title in quotation marks after the classification which sort of works but is not ideal.
I did what I can for getLatestPartLocation - I doubt you'll see a real difference, but it's slightly more efficient than it used to be....
Thanks Dusty! I didn't realize I closed this issue by accident. Still feeling pretty new to GitHub.
As we are going to be presenting on Arctos next month at the SPNHC meeting, this is now becoming an even more critical issue in need of resolution. We absolutely must get rid of the any taxon element and any geographic element search fields, or hide them somewhere very deep and put big red warnings on them. Right now, anyone going to our search page will use those two boxes, and they will get a visually nasty and discouraging red letter message of failure. This is not how we reach the public or serve our own members. I just searched on a single species name and a single locality term for a record I know exists, and got the timeout. I put the same search terms in the identification and specific locality boxes and got results within a second. Replace those boxes with Identification and Locality fields, please.
And, to clarify, I was searching as a public user.
@dustymc
Removal is not mitigation; it's just removing a powerful tool capable of piercing the veil of our amazingly inconsistent data.
If we must shoot ourselves in the foot, hopefully temporarily until we can find some actual way to supply the resources required by these hugely powerful options, can we limit it to one foot? I'd vote fairly enthusiastically for keeping the taxonomy option.
1) It performs better/times out less than geography. From a recent timeout, Branta hutchinsii works in about 3 seconds, Colorado takes almost a minute.
2) It gets more usage than geography. I can't immediately quantify that, but I think it does...
3) Chances of finding what you're looking for are really good with it, and really bad without it. There are a finite-ish number of ways of saying "goose" and Arctos contains most of them. There are an infinite number of ways of describing places, and the webservices that drive anygeog only find some of them.
4) Without it, there's little point in proceeding with https://github.com/ArctosDB/arctos/issues/2057 or https://github.com/ArctosDB/arctos/issues/1732, and not much benefit from all the cleanup/WoRMS/Paleobiodb/Species+/etc. data we've recently drug in.
And from that last point, maybe we can shoot ourselves in that one foot with a slightly smaller gun by implementing a "any curatorially-asserted geography" option, which would somewhat alleviate the issue @AJLinn brought up? That's maybe 10% of the functionality of the current thing, but at least it'll find "fairbanks" spread across specloc/verbloc/quad/county without searching 4 times.
This morning I searched on Falco sparverius from Roswell, using those
fields, and got a timeout. I just tried again with the taxon only and it
works.
So perhaps if we prevent people from trying to use both at the same time,
by getting rid of the geography search or burying it?
Maybe we need a "search locality" that can pull only from specific or
verbatim locality terms? Not sure if that is any better - probably would
confuse people who think that means "California".
Make Specific Locality more visible?
What does MCZ do? Their interface is simple and clear.
On Thu, Apr 25, 2019 at 8:49 AM dustymc notifications@github.com wrote:
Removal is not mitigation; it's just removing a powerful tool capable of
piercing the veil of our amazingly inconsistent data.If we must shoot ourselves in the foot, hopefully temporarily until we can
find some actual way to supply the resources required by these hugely
powerful options, can we limit it to one foot? I'd vote fairly
enthusiastically for keeping the taxonomy option.
- It performs better/times out less than geography. From a recent
timeout, Branta hutchinsii works in about 3 seconds, Colorado takes almost
a minute.- It gets more usage than geography. I can't immediately quantify
that, but I think it does...- Chances of finding what you're looking for are really good with it,
and really bad without it. There are a finite-ish number of ways of saying
"goose" and Arctos contains most of them. There are an infinite number of
ways of describing places, and the webservices that drive anygeog only find
some of them.- Without it, there's little point in proceeding with #2057
https://github.com/ArctosDB/arctos/issues/2057 or #1732
https://github.com/ArctosDB/arctos/issues/1732, and not much benefit
from all the cleanup/WoRMS/Paleobiodb/Species+/etc. data we've recently
drug in.And from that last point, maybe we can shoot ourselves in that one foot
with a slightly smaller gun by implementing a "any curatorially-asserted
geography" option, which would somewhat alleviate the issue @AJLinn
https://github.com/AJLinn brought up
https://github.com/ArctosDB/arctos/issues/1755#issuecomment-448325464?
That's maybe 10% of the functionality of the current thing, but at least
it'll find "fairbanks" spread across specloc/verbloc/quad/county without
searching 4 times.โ
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/ArctosDB/arctos/issues/1755#issuecomment-486704929,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ADQ7JBDUAR2VPYN457BZIETPSHAHVANCNFSM4F6Z5UCQ
.
I'll throw together a "local place-stuff" search and see what happens.
only from specific or verbatim locality terms
That still leaves you guessing what we've done with "Fairbanks." (Actual answer: little of this, little of that, ...)
confuse people who think that means "California".
Yes, that's part of where the "any" fields came from. Our internal inconsistencies don't much help.
What does MCZ do?
Use local data, of which they have much less - eg, they have 342,441 "taxonomy rows" and we have 235,653,300.
AFAIK they also have a limited locality model - their data are capable of being flattened, ours are not.
And I think they overall have much more DBA support; I don't know details, but I suspect if they think they need a tune-up they just get a tune-up.
Their interface is simple and clear.
It's our interface from a few years ago???
I tried something fairly unorthodox, and the 'any' options seem to be working and performing reasonably well.
Assuming nothing melts as people start making updates, I think this is functional for now. We can talk about reintroducing service-derived terms after we get to 12, assuming we get a functional GMaps API key.
Feel free to try to break it, and let me know details if you do.
This does nothing for adding attributes-and-such, it's only the 'any' search options.
So the combined ANY TAXON and ANY GEOGRAPHY search that timed out yesterday works today within a couple seconds, using an exact specific locality and an exact taxon name ("Falco sparverius and Roswell"). Changing the geography parameters to "New Mexico" as higher geog takes much longer but eventually yields the 201 results on public view. Changing to a common name "rabbit" also worked, although I got "rabbitfootgrass". I guess we need to work on our common names.
Logged in with my original two searches also worked, although both took about 12 seconds. Can we add a pop-up that says "Search in Progress" or something?
using an exact specific locality and an exact taxon name
Maybe we need to do better with the tooltips? If you KNOW what you're looking for and where it is, the 'any' options are both inefficient and likely to find some stuff you don't care about. Roswell (AFAIK) is just a placename that'll reliably end up in specloc, where Fairbanks is at least 3 things county, quad, specloc). I have no idea how to make that clear in something short enough a user might actually read it.
I'm not sure if I can use the current approach to search for "is" or not - it's converting a bunch of stuff to a big concatenation and that to a document which is then indexed, which for some inexplicable reason the Oracle-fairies find easier to navigate than any 'normal' solution. I can look closer if you want....
Right now, the taxonomy search options are limited to 'anything' and 'ranked terms in the collection's preferred classification." There's no reason (other than trying to get users to understand it) that we couldn't add the ability to search WHATEVER - 'family according to GenBank' or 'family according to someone' or 'common name is exactly' or WHATEVER. New Issue...
Search in Progress"
New Issue as well - short answer is 'yes' but it may be less-trivial than I'd like.
This issue is continuing to cause problems. Any hope that migration to Postgres and collaboration with TACC will solve the timeouts using Any Geography and Any Taxon?
If not, I ask again if we can please delete these search boxes or hide them from the public.
Name: Kirsten Nicholson
Email: kirsten.[email protected]
Message: No matter what search I do, it kicks me out and gives me the following error message: [Macromedia][Oracle JDBC Driver]Execution timeout expired. Please help! We need the database for our teaching labs!!
hope that migration to Postgres
Yes
Yah I responded to Kirten at length about this today. Even conducting
searches on known locations (country is Canada), all collections, for genus
is Ursus gave her time outs.
So not even using the Any Geography and Any taxa made everything die but
having large searches caused crashing.
Could this be because we have more collections now then previously, or just
more people using Arctos at one time?
Anyway, I agree, remove these 2 boxes, or put a disclaimer next to them!!
On Mon, Oct 7, 2019 at 4:38 PM Mariel Campbell notifications@github.com
wrote:
This issue is continuing to cause problems. Any hope that migration to
Postgres and collaboration with TACC will solve the timeouts using Any
Geography and Any Taxon?
If not, I ask again if we can please delete these search boxes or hide
them from the public.Name: Kirsten Nicholson
Email: kirsten.[email protected]
Message: No matter what search I do, it kicks me out and gives me the
following error message: [Macromedia][Oracle JDBC Driver]Execution timeout
expired. Please help! We need the database for our teaching labs!!โ
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/ArctosDB/arctos/issues/1755?email_source=notifications&email_token=ABCJF4N3WTIAGIGVVGOQ4QLQNPCBDA5CNFSM4F6Z5UC2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEASE2MA#issuecomment-539249968,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABCJF4MLE4JGMWFFPFGD7E3QNPCBDANCNFSM4F6Z5UCQ
.
--
Carol L. Spencer, Ph.D.
Staff Curator of Herpetology & Researcher
Museum of Vertebrate Zoology
3101 Valley Life Sciences Building
University of California, Berkeley, CA, USA 94720-3160
[email protected] or [email protected]
510-643-5778
http://mvz.berkeley.edu/
Little of all that, and some other stuff.


My arguments for keeping the 'any...' options are the same as always: we have inconsistent data and they provide the only realistic pathway to it, as frustrating as the timeouts can be. PG+TACC, at least in theory, should fix the performance issues.
Could we move the 'any' option boxes to the bottom of the Taxonomy and Locality sections and push up the Identification option for Taxonomy, and perhaps push up three options in the collapsed form: State/Province, County, and Specific Locality for Locality?
The form is pretty easy to manipulate, we can move whatever wherever.
Whether we should or not is the bigger question. I see compelling arguments for both cases from here.
The vast majority of complaints come from users who also have things like Attributes turned on. Maybe we should instead limit results for non-Operators to only columns in FLAT?
I don't like limiting search options because I need to be able to get
attributes from collections aI don't have access too. Also, we are trying
to promote Arctos as a source of trait data and a powerful research grade
search tool, which it is not if put limits on public searches. Otherwise,
just go to GBIF.
I strongly support moving the boxes to less visible. We look bad Everytime
people search and the search fails. I keep promoting search tools at
meetings, and then no one can make them work without a personal Arctos
coach.
On Tue, Oct 8, 2019, 12:53 PM dustymc notifications@github.com wrote:
The form is pretty easy to manipulate, we can move whatever wherever.
Whether we should or not is the bigger question. I see compelling
arguments for both cases from here.The vast majority of complaints come from users who also have things like
Attributes turned on. Maybe we should instead limit results for
non-Operators to only columns in FLAT?โ
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/ArctosDB/arctos/issues/1755?email_source=notifications&email_token=ADQ7JBCVITHJK3HZPK4KWBDQNTJLPA5CNFSM4F6Z5UC2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEAVHDFY#issuecomment-539652503,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ADQ7JBFPY7DPSMM7WH7V3GDQNTJLPANCNFSM4F6Z5UCQ
.
I agree with Mariel.
Objections? @DerekSikes ? Does anyone have a way to partially mitigate Derek's concerns?
If we do move those out as default, I'll need a replacement. In light of the (internal??) 'Arctos is so complicated' thing I think a single replacement, so we don't go over 8 fields, is necessary.
Default should be (or at least the top row)
Identification
Include previous IDs? Current ID only Include all IDs Match Type starts
with is (case insensitive) does not contain contains comma-list comma-list
(substring)
and modified from Elizabeth's request
Country, State/Province, County, and Specific Locality for Locality
Why can MCZbase do the Any options but we can't?
On Wed, Oct 9, 2019 at 8:27 AM dustymc notifications@github.com wrote:
Objections? @DerekSikes https://github.com/DerekSikes ? Does anyone
have a way to partially mitigate Derek's concerns?If we do move those out as default, I'll need a replacement. In light of
the (internal??) 'Arctos is so complicated' thing I think a single
replacement, so we don't go over 8 fields, is necessary.โ
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/ArctosDB/arctos/issues/1755?email_source=notifications&email_token=ADQ7JBFGTQP6KS4AEVQEZ7TQNXS6ZA5CNFSM4F6Z5UC2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEAYCMDY#issuecomment-540026383,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ADQ7JBDZMT6J6LDIJYRTN6DQNXS6ZANCNFSM4F6Z5UCQ
.
Mariel - were you having timeout issues today? So was I.
I was trying to search for all records I just loaded in an accession, 1615 records so not that huge. Timed out multiple times before finally working.
@ccicero I see that in the logs, the query is being slowed down by a lot of attributes and part location and such. If you can trim that up, it'll make things work better. It starts returning data in under a second, but then takes just over a minute to complete. There's also still a fair bit of Chinese traffic today - you might have been competing with that.
As far as I know MCZbase has MUCH simpler data - I don't think they consume globalnames or locality webservices.
Actually, I added a lot of those attributes to my custom results AFTER I got it working because I needed to download them. Will remove in the future, but I don't think that was it. I'll blame it on the Chinese :)
Thanks - a cheap query failing but then working in a more expensive form later is definitely a sign of things being momentarily overwhelmed.
FYI your you+montana+perisoreous query with a ton of attributes on completes in 1:30 in Oracle - the system for which it's tuned, right now when there's basically only one other user (and they're not using much) - or 13 seconds in PG. We do have an actual solution, I just need some more focused time to see it through.
13 seconds is good! Yay, Postgres!
On Wed, Nov 20, 2019, 7:55 AM dustymc notifications@github.com wrote:
Thanks - a cheap query failing but then working in a more expensive form
later is definitely a sign of things being momentarily overwhelmed.FYI your you+montana+perisoreous query with a ton of attributes on
completes in 1:30 in Oracle - the system for which it's tuned, right now
when there's basically only one other user (and they're not using much) -
or 13 seconds in PG. We do have an actual solution, I just need some more
focused time to see it through.โ
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/ArctosDB/arctos/issues/1755?email_source=notifications&email_token=ADQ7JBBL54UGI3YQQEYHKT3QUVFV3A5CNFSM4F6Z5UC2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEESHSUI#issuecomment-556038481,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/ADQ7JBAN7GEYDMXDI37XS7LQUVFV3ANCNFSM4F6Z5UCQ
.
Yes, yay!!! Thanks.
13 seconds
One order of magnitude is perhaps a more useful way to look at that.
Note also and that's an un-tuned single-box development server with the front-end running on a solar-powered calculator (OK not quite but still...). It's faster than our current setup, but more importantly it's much more scalable than our current setup. If we find our way back here again, we can do something about it that doesn't first involve many tens of thousands of dollars in licensing.