Openlibrary: What's the difference between a book and a document?

Created on 1 Jan 2021 · 6 Comments · Source: internetarchive/openlibrary

I asked this in the librarians GitHub but got no clear answer. I looked around and didn't find much. Maybe it can be explained here and then added to the OL website to help others understand too. I bet I'm not the only one out there who doesn't get this.

Triage Question

All 6 comments

All books are documents. Few documents are books. Letters, memoranda, certificates, currency, tweets, newspaper articles, even passports are examples of non-book documents. Edge cases such as bound volumes containing several issues of a published serial can be debated, but normally a book has a text, a publisher, a place and year of publication, and an author(s), editor(s), or compiler(s) who takes responsibility for the creative content.

@LeadSongDog I see. Not every book has an author, though: sometimes only the publishing company is credited, or the author is anonymous. Even 'publisher' is a little iffy, as many books these days are self-published rather than put out by a publisher. What if it was never published, like an unfinished book? Is that still a book, or just a 'work'? 'Place' is off too, since a lot is published online, and good luck locating that. Some works were made in a particular year but have no year listed on them. It doesn't even have to have a 'text'; it can be all pictures.

Normally I think of a book as something with a title, a cover, contents that relate to the title, some entity that wrote it, some location, and the year it was finished or published: something that can be a file (I wouldn't count a tweet as that). I would even call an unfinished book a 'book'; it's just an incomplete one. There's usually a back cover, and the contents have a start and an end, but those aren't really required. Even a table of contents isn't. I mean, you have to ask: if a book's identifying details were physically ripped off it, what components would remain to tell someone that it's a book?

See, the issue is that Lisa said that https://www1.maine.gov/dhhs/mecdc/population-health/odh/documents/tasty-treats-teeth.pdf isn't a book, but by both of our definitions it is. What am I missing? I mean, by your definition a passport could kind of be a document if it's filled up; it just doesn't match mine, because the contents don't match the title.

@BrittanyBunk
At the OL the terms Book and Edition are rather too fungible (some would say sloppy), but that’s not likely to change any time soon. To clarify our thinking, it helps to understand the FRBR concepts of Work, Expression, Manifestation (roughly an edition), and Item (a single copy, or “instance”), even if they are not well applied here. Each of these has some conceptual range, but nowhere near as wide a range as OL’s Book. Here we treat two scans from the same dog-eared copy as distinct editions, while FRBR sees them as instances of a single expression. An argument could be made that if the two scan files contain different image resolutions or OCR, then they are distinct expressions of that edition, but again that is far too nuanced for OL’s terminology.

I’m unsure what @seabelis concern might be. The born-digital cookbook in PDF format, for which you linked the URL, would be one instance in FRBR terms, as would the subtly distinct forms that might be displayed by each browser or PDF reader software or printed by each printer firmware version. All these rendered instances would have the same words and pagination, but page layout and annotation might vary a bit. It could even be argued that the bitwise-identical copies of that PDF file on different machines constitute separate instances solely by virtue of having different local download time stamps, but not even in OL-speak would they be considered distinct editions.
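For anyone who finds the hierarchy easier to follow as a data structure, here is a minimal Python sketch. It uses the standard FRBR entity names (Work, Expression, Manifestation, Item), with Item standing in for what this thread calls an "instance". The class names, fields, and sample values are assumptions made up for illustration; this is not OL's actual schema, and where a given scan belongs in the hierarchy is itself debatable, as noted above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A single copy (an "instance" in this thread), e.g. one scan file."""
    identifier: str


@dataclass
class Manifestation:
    """One physical or digital embodiment, roughly an edition."""
    label: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """One realization of a work, e.g. a particular text or translation."""
    label: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract creative content."""
    title: str
    expressions: List[Expression] = field(default_factory=list)


def count_items(work: Work) -> int:
    """Count every individual copy recorded anywhere under a work."""
    return sum(
        len(m.items)
        for e in work.expressions
        for m in e.manifestations
    )


# Hypothetical data: two scans recorded under a single manifestation.
scans = [Item("scan-a"), Item("scan-b")]
edition = Manifestation("1950 printing", items=scans)
text = Expression("original English text", manifestations=[edition])
work = Work("Example Title", expressions=[text])

print(count_items(work))                         # 2
print(len(work.expressions[0].manifestations))   # 1
```

In this model the two scans are two items of one manifestation, whereas OL would create two separate "edition" records for them.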

@LeadSongDog It's strange that two identical copies of a book are considered two different books in the OL; that shouldn't happen. You lost me in the second paragraph. To me, it's an ebook. I wouldn't call it a document, whatever the format. I just don't see why it's not an ebook. The OL terms just aren't making sense: the issue is that they don't give enough to determine what's what.

“Two identical copies of a book” in FRBR-speak is “two instances of a single manifestation”. OL sloppily refers to them as two editions, which causes no end of confusion. Sadly OL can’t handle more than one IA record link per OL “edition” record, forcing us to have distinct OL pseudo-“edition” records for each instance of popular editions to avoid over-long waiting lists.

A PDF is a pretty lame excuse for an e-book: fonts, pagination, and page layout are rigid rather than user-adaptive and platform-sensitive. OCR is optional, so many PDFs are simply multi-page images that can’t even be searched. It's clearer to just call it a PDF.

@LeadSongDog thank you for getting to the nitty gritty on this. It would help if OL did put these 'instances' together. Would make life much better.
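As a rough idea of what putting these 'instances' together could look like, here is a small Python sketch that groups one-scan-per-record "edition" entries under a shared manifestation key. Everything in it is hypothetical: the record layout, the field names (`ol_edition`, `ia_id`, `manifestation`), and the identifiers are invented for illustration and are not OL's real data model or API.

```python
from collections import defaultdict

# Hypothetical records: each OL "edition" record links to exactly one IA scan.
# The "manifestation" key (title, publisher, year) is an assumed grouping key.
records = [
    {"ol_edition": "OL-A", "ia_id": "scan-1",
     "manifestation": ("Example Title", "Example Press", 1950)},
    {"ol_edition": "OL-B", "ia_id": "scan-2",
     "manifestation": ("Example Title", "Example Press", 1950)},
    {"ol_edition": "OL-C", "ia_id": "scan-3",
     "manifestation": ("Example Title", "Other Press", 1972)},
]


def group_by_manifestation(records):
    """Collect the IA scan ids of all records that share one manifestation."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["manifestation"]].append(rec["ia_id"])
    return dict(groups)


grouped = group_by_manifestation(records)
for key, scan_ids in grouped.items():
    print(key, scan_ids)
```

With this sample data, the first two records collapse into one group with two scans, and the third stays on its own.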

Maybe this can be called a dissertation or something. Maybe both of our definitions are too encompassing, as I bet scientific articles could fit under them, maybe even magazines! I checked Wikipedia, and it draws the line between monographs and serial works: https://en.wikipedia.org/wiki/Book#Non-published_books. I'm thinking I will go with that too, on top of the criteria I mentioned.

I think we're getting somewhere.
