Openlibrary: Stop importing records with "bad authors" from Amazon

Created on 2 May 2020 · 9 Comments · Source: internetarchive/openlibrary

Many records imported from Amazon have invalid authors. These records should be blocked from import.

Evidence / Screenshot (if possible)

  • [ ] A

  • [ ] a

  • [ ] B

  • [ ] C
  • [ ] D
  • [ ] E
  • [ ] F
  • [ ] G
  • [ ] H
  • [ ] I
  • [ ] J
  • [ ] K
  • [ ] L
  • [ ] M
  • [ ] N
  • [ ] O
  • [ ] P
  • [ ] Q
  • [ ] R
  • [ ] S
  • [ ] T
  • [ ] U
  • [ ] V
  • [ ] W
  • [ ] X
  • [ ] Y
  • [ ] Z

  • [ ] Large

  • [ ] Xlarge
  • [ ] XLarge

  • [ ] Imagesoft Sony

  • [ ] Sound Editions

  • [ ] RH Value Publishing

  • [ ] Aerie Mm

  • [ ] Collection

  • [ ] 1stworld Library

  • [ ] Outlet

  • [ ] X pre 1970

  • [ ] Original Soundtrack

Relevant url?

Do an author search for any of the above to find the relevant profile.

Proposal & Constraints

Block these records from being imported: they are frequently inappropriate items outside the scope of Open Library (e.g. clothing), or they lack enough information to be useful.
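As a rough illustration of the proposal, an import-time check could compare each author name against a blocklist. This is only a sketch under stated assumptions, not Open Library's actual import code: the names `BLOCKED_AUTHOR_NAMES`, `is_bad_author`, and `should_import`, and the record shape (a dict with an `authors` list of `{"name": ...}` entries), are all hypothetical. The blocklist is seeded from the names listed in this issue, and single letters A to Z are handled by a separate rule.

```python
import re

# Hypothetical blocklist seeded from the author names reported in this issue.
# Entries are stored normalized (lowercase, single spaces).
BLOCKED_AUTHOR_NAMES = {
    "large",
    "xlarge",
    "imagesoft sony",
    "sound editions",
    "rh value publishing",
    "aerie mm",
    "collection",
    "1stworld library",
    "outlet",
    "x pre 1970",
    "original soundtrack",
}


def normalize(name: str) -> str:
    """Lowercase the name, trim it, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", name).strip().lower()


def is_bad_author(name: str) -> bool:
    """Return True if the name is a single ASCII letter or is on the blocklist."""
    n = normalize(name)
    if len(n) == 1 and n.isascii() and n.isalpha():
        return True
    return n in BLOCKED_AUTHOR_NAMES


def should_import(record: dict) -> bool:
    """Reject a record if any of its authors is flagged.

    Assumed (hypothetical) record shape: {"authors": [{"name": "..."}, ...]}.
    Records with no authors are allowed through here; whether to block
    those as well is a separate decision.
    """
    authors = record.get("authors") or []
    return not any(is_bad_author(a.get("name", "")) for a in authors)
```

Matching on normalized names catches both "Xlarge" and "XLarge" with one entry. Exact matching is deliberate: a substring match on something like "Collection" would also reject legitimate authors whose names merely contain that word.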

Related files

Stakeholders

@seabelis @leadsongdog @tfmorris

Data AmazonAPI @hornc Import 3 Bug

All 9 comments

I am worried that if you block those, books that exist only there and nowhere else won't be added. However, I don't know how this process works. Would that happen if these get blocked?

Also, since this is a list that might keep growing, shouldn't we have a location on the librarians repository for it? Like a file or something?

Blocking must happen at the import level so this issue is in the appropriate place. There are already issues for cleanup of the existing data in the librarians repo.

The author search suggested by @seabelis does not work, but see as one example https://openlibrary.org/authors/OL2754228A/RH_Value_Publishing

These bogus authors are a huge drag on cleanup efforts. If an author is seen only at one online bookseller then no edition can be independently confirmed to exist. We’ve taken their click bait and amplified it. Shame on them. Shame on us. Such crap should be nuked from orbit.

I seriously would leave a note on OpenLibrary.org for people to not waste their time on these authors.

@BrittanyBunk Can you be a little more specific than “on the website” please?

@LeadSongDog He means ignore it, basically.

@TimmyTheHelper No, I don’t mean ignore it, I mean that we need mechanisms that actively check for these deficient source records and remove their pollution from the pool. Amazon has not shown any interest in cleaning up their own mess. In the meantime not importing more would be a small step in the right direction.

@LeadSongDog I love that idea! That saves so much trouble.

@LeadSongDog My bad.
