ImportBot does not seem to be choosing covers correctly from archive.org. It seems to be using the title page even when a good cover is available. I wonder if this is happening in:
This is using:
which I think is wrong. We should be using e.g. https://archive.org/download/greatdebatesback0000unse/page/cover_t.jpg which gives the cover _or_ the title page (if the cover is not useful).

e.g. https://openlibrary.org/books/OL26968796M/Guan_li_cheng_jiu_sheng_huo
Should display e.g. https://archive.org/download/guanlichengjiush0002fred/page/cover_t.jpg
@mekarpeles @hornc
@mekarpeles @hornc Is there anything special that needs to be done to deploy this, or do we just have to fix the line above?
Note also that for your example, https://archive.org/download/greatdebatesback0000unse/page/title.jpg (the current form) doesn't resolve at all, but in that case the title page would arguably be better than a plain blank green cover.
In addition to fixing the current code, we'll also need to figure out which editions need to have their covers fixed.
It seems that there are four conflated issues here.
First, both the cover and the title page should be captured into the coverstore, not just one or the other. Capturing TP verso would also be helpful.
Second, the better of the two (by some metric) should be presented in search results and carousels. I would argue that when there is not a minimum amount of legible text on the cover (to identify the author and title), seeing the title page is often essential to confirming the edition is correctly described.
Third, absent a good identifying image for the edition, should a useful default cover for the work be presented instead?
Fourth, are all useful sources for cover images being exploited?
@tfmorris could you create a new issue (probably on https://github.com/internetarchive/openlibrary-client ) for cleaning up the incorrect covers?
@LeadSongDog Trying to keep the scope of this issue small. Baby steps :)
covers. This would require labeling the images as "front cover", "title page", etc. Could you create a new issue to investigate redesigning our cover storage schema? @cdrini
I'm not sure we can fully rely on cover being set to the title page for books where that is appropriate.
It seems to be the case on many items, e.g. https://archive.org/download/hesiodtheognis00daviuoft/page/cover.jpg
But it looks like title and cover are independent things, and there is no logic that redirects to the other if one is not set.
archive.org's logic appears to prefer title for pre-1923 books and cover otherwise, but that is explicitly for choosing a preview image to display. We'll need to make a choice ourselves.
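A rough sketch of that heuristic, in case it helps the discussion. The function name, the `cutoff` parameter, and the fallback for a missing year are my own assumptions for illustration; this is not archive.org's actual code.

```python
from typing import Optional


def preferred_page(publish_year: Optional[int], cutoff: int = 1923) -> str:
    """Pick which page image to prefer for an item.

    Mirrors the behaviour described above: 'title' for books published
    before the cutoff year, 'cover' otherwise. Falling back to 'cover'
    when the year is unknown is an assumption, not observed behaviour.
    """
    if publish_year is not None and publish_year < cutoff:
        return "title"
    return "cover"
```

Whatever rule we settle on for Open Library could replace the year check here; the point is only that the choice would be explicit on our side.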
Also, the correct URL is https://archive.org/download/guanlichengjiush0002fred/page/cover.jpg
cover_s4.jpg would scale the image, but the _t is not meaningful and is stripped.
Also, in the current code the archive.org id is duplicated, which does not seem necessary :man_shrugging:
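For reference, a minimal sketch of building these URLs with the id used only once. The helper name `ia_page_image_url` and its parameters are hypothetical, and the `_s<N>` suffix is based only on the `cover_s4.jpg` form mentioned above; this is not the existing import code.

```python
from typing import Optional

IA_DOWNLOAD = "https://archive.org/download"


def ia_page_image_url(ocaid: str, page: str = "cover", scale: Optional[int] = None) -> str:
    """Build a page image URL for an archive.org item.

    `page` is 'cover' or 'title'. `scale`, if given, is appended as
    `_s<N>` (e.g. cover_s4.jpg); that suffix form is assumed from the
    example in this thread.
    """
    if page not in ("cover", "title"):
        raise ValueError(f"unsupported page type: {page!r}")
    suffix = f"_s{scale}" if scale else ""
    return f"{IA_DOWNLOAD}/{ocaid}/page/{page}{suffix}.jpg"


# The item id appears exactly once in the resulting URL.
print(ia_page_image_url("guanlichengjiush0002fred"))
```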
I'm not sure cover.jpg is absolutely better than title.jpg. It _seems_ like it will be better in this case, and I can't find a concrete example where it is _worse_. It seems very dependent on the source data rather than on any system smarts.
We could:
A. Make the change and see if we notice any issues
B. Investigate further to make a stronger case
Assigning @hornc per Slack discussions because this issue is import-related.
Also removing the Good First Issue label as this seems a little involved for a newcomer.
I'm not sure there is a clear action to take here. Closing.