Arctos: Request to add more media relationships (current loader maxes out at 5 relationships)

Created on 21 Dec 2020 · 17 Comments · Source: ArctosDB/arctos

I'm trying to link bird records to individually scanned ledger pages hosted online by our library. The current media metadata bulkloader maxes out at 5 relationships; however, some of our ledger pages relate to 43 cataloged items, though most have around 12 entries (see https://ark.colorado.edu/ark:/47540/j59s8741819t). In an ideal world, a media record that I create would relate to all specimens recorded on a scanned ledger page (as "documents cataloged item" relationships).

Current error:
image

I've prepared the attached bulkloader. Any way to push it in or augment the metadata handler so that I can load it myself?

bird_ledger_media_load_2020-12-21.zip

FYI there is more of this coming down the pipeline, as the libraries have scanned all of our vertebrate ledgers. I still have 22 ledgers-worth of media to load, representing roughly 90K specimens (the library doesn't have all of this indexed yet, but it will be ~5000 more media records, representing individually scanned pages from each ledger, with 5-40+ "documents cataloged item" relationships per page).

Labels: Bug, Priority-Normal, Tool - Bulkload Media Metadata

All 17 comments

This:

Screen Shot 2020-12-22 at 8 59 18 AM

should handle any number of relationships

Ok thanks. Oof, I didn't realize there was a separate process...I will look for a somewhat efficient way to transform 100+ columns into one to create a bulkload media relationships file.

Will the media_IDs for the 850 media records I am currently creating (via bulkload media metadata) show up in the My Stuff table so I can at least grab and download them somewhat easily to put in this second bulkloader?

Let me know if you find that efficient way to transform columns. We have
way too many tools that require swapping between those two formats, and
this should just not happen as transforming data in that way will
inevitably lead to error for the average user.
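
One possible way to do that reshaping is a short Python script using only the standard library. This is a minimal sketch, not an Arctos tool: it assumes the wide file has a media_id column plus numbered column pairs named media_related_term_N and media_relationship_N (the names that appear in the SQL later in this thread). The function name and the output column names are hypothetical and would need to match whatever the relationships bulkloader template actually expects.

```python
import csv
import io
import re

# Assumed column naming: media_related_term_1, media_relationship_1, ...
TERM_COLUMN = re.compile(r"^media_related_term_(\d+)$")


def wide_to_long(wide_file, long_file):
    """Write one output row per non-empty relationship slot; return the row count."""
    reader = csv.DictReader(wide_file)
    writer = csv.DictWriter(
        long_file,
        fieldnames=["media_id", "media_relationship", "related_term"],
        lineterminator="\n",
    )
    writer.writeheader()
    written = 0
    for row in reader:
        for column, term in row.items():
            match = TERM_COLUMN.match(column or "")
            if not match or not (term or "").strip():
                continue  # not a related-term column, or an empty slot
            # Pick up the relationship column that shares the same number.
            relationship = row.get(f"media_relationship_{match.group(1)}") or ""
            writer.writerow(
                {
                    "media_id": row["media_id"],
                    "media_relationship": relationship.strip(),
                    "related_term": term.strip(),
                }
            )
            written += 1
    return written


if __name__ == "__main__":
    # Tiny in-memory demonstration; the media_id values are made up.
    wide = io.StringIO(
        "media_id,media_related_term_1,media_relationship_1,"
        "media_related_term_2,media_relationship_2\n"
        "101,UCM:Bird:13944,documents cataloged item,"
        "UCM:Bird:13945,documents cataloged item\n"
        "102,UCM:Bird:13987,documents cataloged item,,\n"
    )
    long_out = io.StringIO()
    print(wide_to_long(wide, long_out), "relationships written")
    print(long_out.getvalue())
```

For real files, the two arguments can be file objects opened with open(path, newline="", encoding="utf-8"). Because the script discovers the numbered columns from the header, it does not need to know in advance whether a sheet has 5 or 43 relationship slots.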

@dustymc - I'm getting an "invalid" error in my Bulkload Media Metadata status column for all of my preview URIs. Any ideas as to why? They don't seem all that large (~22KB)...

Here's a couple examples:
https://cudl.colorado.edu/MediaManager/srvr?mediafile=/Size1/CUB~8~8/4692/musm_zooLed_birds_v5_13734-13775.jpg
https://cudl.colorado.edu/MediaManager/srvr?mediafile=/Size1/CUB~8~8/4692/musm_zooLed_birds_v5_13987-14028.jpg

https://handbook.arctosdb.org/how_to/How-to-Create-Media-Images.html#preview-uri

Preview filesize should be well under 10K and scale to ~1200x, previews larger than 48K will NOT be displayed. TIP: If you have difficulty creating an appropriately sized thumbnail you might try http://makethumbnails.com/#dropzone

Ok, I'm still getting the invalid error and the preview links are now ~5.7 KB (96x79 pixels)...

Example:
https://cudl.colorado.edu/MediaManager/srvr?mediafile=/Size0/CUB~8~8/4692/musm_zooLed_birds_v5_13944-13986.jpg
https://cudl.colorado.edu/MediaManager/srvr?mediafile=/Size0/CUB~8~8/4692/musm_zooLed_birds_v5_13692-13733.jpg

That's beyond me. @dustymc

Here's the test:

<!--- Send a HEAD request to the preview URI and dump the response --->
<cfset pf_puri="https://cudl.colorado.edu/MediaManager/srvr?mediafile=/Size1/CUB~8~8/4692/musm_zooLed_birds_v5_13734-13775.jpg">
<cfhttp url="#pf_puri#" charset="utf-8" method="head" />
<cfdump var="#cfhttp#">

and the result:

Screen Shot 2020-12-29 at 10 34 54 AM

For whatever reason, those URIs are returning 404 to Arctos.

curl is returning something different, but still nothing that Arctos would accept as valid.

curl -I "https://cudl.colorado.edu/MediaManager/srvr?mediafile=/Size1/CUB~8~8/4692/musm_zooLed_birds_v5_13734-13775.jpg"
HTTP/1.1 403 403
Date: Tue, 29 Dec 2020 18:33:08 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips
Strict-Transport-Security: max-age=15768000
Content-disposition: filename="musm_zooLed_birds_v5_13734-13775.jpg"
Content-Type: text/html;charset=utf-8
Content-Language: en
Content-Length: 22244

So what does that mean? The preview URLs work when I paste them in my browser...

https://cudl.colorado.edu/MediaManager/srvr?mediafile=/Size0/CUB~8~8/4692/musm_zooLed_birds_v5_13692-13733.jpg

I've set up various things to prevent situations that users will interpret as "Arctos is broken." This one is intended to prevent things like trying to use your local drive to host the previews, but it does so by making a HEAD request which that server doesn't like for some reason. There's nothing terribly fatal about dropping that and just accepting whatever you type in; I'd expect it to result in more broken links and such, but it could be argued that that's your problem, not something Arctos needs to deal with.

But, there's a similar check before the thumbs are displayed, and I don't think it would be so trivial to bypass without actually breaking Arctos.

I'm up for about anything, but I don't have any great suggestions other than asking them to fix their server.
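
For anyone who wants to check how a media host answers HEAD versus GET before bulkloading preview URIs, here is a minimal Python sketch using only the standard library. It is not the check Arctos runs (that is the cfhttp call shown earlier); the function names are hypothetical, and it only reports the status codes so the difference can be shown to whoever administers the server.

```python
import urllib.error
import urllib.request


def http_status(url, method="HEAD", timeout=10):
    """Return the HTTP status code the server sends for `method` on `url`."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions; the code is still useful.
        return error.code


def compare_head_and_get(url):
    """Report whether a server treats HEAD differently from GET."""
    head = http_status(url, "HEAD")
    get = http_status(url, "GET")
    return {"HEAD": head, "GET": get, "head_matches_get": head == get}
```

Calling compare_head_and_get() on one of the preview URLs would show whether the server only rejects HEAD requests. A browser issues GET, which may be why a link can open fine in a browser and still fail a HEAD-based check.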

Ok, ah well. To keep it simple I ended up just loading a generic image of a catalog cover for all the ledger page media.

I have a csv with 12,000 media relationships to add. I loaded a test of 130 and received the following error:

An error occurred while processing this page!
Message: Error invoking external process
Detail: psql:/usr/local/webroot/temp/excopy_ebraker_20210109040102655_522.sql:130: ERROR: invalid input syntax for type bigint: "UCM:Bird:13944" CONTEXT: COPY cf_temp_media_relations_ldr, line 2, column related_key: "UCM:Bird:13944"

It seems as though the bulk media relationships loader does not like GUIDs as related_keys?

media_relationships_test.zip

No, the key is an integer; the loader doesn't yet support what you're trying to do.
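
Until the loader can resolve strings, a quick pre-flight check can catch this before upload. The Python sketch below is an illustration only: it assumes the relationships CSV has a related_key column (the column named in the error message above), and the function name is hypothetical. It lists every row whose key is not a plain non-negative integer.

```python
import csv
import io


def non_integer_keys(csv_file, column="related_key"):
    """Return (line_number, value) pairs whose `column` value is not a plain integer."""
    problems = []
    reader = csv.DictReader(csv_file)
    for row in reader:
        value = (row.get(column) or "").strip()
        # Empty values and anything other than ASCII digits are flagged.
        if not (value.isascii() and value.isdigit()):
            problems.append((reader.line_num, value))
    return problems


if __name__ == "__main__":
    # Made-up sample rows: one GUID (rejected) and one integer key (accepted).
    sample = io.StringIO(
        "media_id,media_relationship,related_key\n"
        "101,documents cataloged item,UCM:Bird:13944\n"
        "101,documents cataloged item,12345\n"
    )
    for line_number, value in non_integer_keys(sample):
        print(f"line {line_number}: {value!r} is not an integer key")
```

Line numbers count the header as line 1, so the first data row is reported as line 2.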

I can get the key for you if you want to send a CSV of your GUIDs, or I should be able to add a guid-handler to the loader in the next few days (or both).

Thanks! I'm fine with either option, though I would imagine a GUID handler is more widely applicable to users.

csv attached just in case (though I still need to figure out how to semi-gracefully transform this dataset into three columns....)
bird_ledger_media_relationships_2020-12-22_v2.zip

I'll try to get a guid/string handler patched in within the next few days. I did this - results attached.

-- Copy the source table into a working table (wide format: numbered column pairs).
create table temp_m as select * from temp_cache.temp_dlm_uptbl;

-- Long-format target: one row per media relationship.
create table temp_m2 (
  media_id bigint,
  related_term varchar,
  relationship varchar
);

-- Unpivot media_related_term_N / media_relationship_N (N = 1..43) into temp_m2.
CREATE OR REPLACE function tttemp() returns void AS $body$
declare
    mtt varchar;
    mrt varchar;
    s varchar;
BEGIN
    -- The integer loop variable c is declared implicitly by the FOR statement.
    for c in 1..43 loop
      mtt := 'media_related_term_' || c;
      mrt := 'media_relationship_' || c;
      s := 'insert into temp_m2 (media_id,related_term,relationship) (select media_id::bigint,' || mtt || ',' || mrt || ' from temp_m)';
      execute s;
    end loop;
end;
$body$
LANGUAGE PLPGSQL
SECURITY DEFINER
VOLATILE;

select tttemp();
-- Drop the unused slots, then resolve each GUID to its integer key.
delete from temp_m2 where related_term is null;
alter table temp_m2 add related_key bigint;
update temp_m2 set related_key=(select collection_object_id from flat where flat.guid=temp_m2.related_term);

temp_m2.csv.zip

@dustymc sweet relief!!! Thank you so much for reformatting the spreadsheet! Magic indeed.

Next release has string-resolution for cataloged_item (and a mechanism to add it for other relationships).
