This GitHub issue is about making sure that search engines display what's in the description field when displaying datasets and files, similar to the work done for making sure that search engines display dataverse descriptions (https://github.com/IQSS/dataverse/issues/1802).
_Google uses the meta.description tag (<meta name="description" content="" />) to display dataverse descriptions in search results:_

_Datasets don't have meta.description tags but do have dc.description tags (<meta name="DC.description" content="" />). But it seems that Google does not use the dc.description tags to display dataset descriptions in search results:_

There are also remaining questions about dataverses and using other types of tags, some brought up in https://github.com/IQSS/dataverse/issues/4468:
Should we consider other meta tags (e.g. Twitter seems to ignore meta tags but uses the standard Open Graph tags og:description or its own Twitter cards twitter:description)?
I was just looking at this because Dataverse links on Twitter are sad. I think adding Open Graph tags for Twitter and Facebook would be great.
I don't have answers or even opinions on most of the other questions except that, if there is no description given e.g. for a file, the metadata for description should be empty. I think it's bad practice to put some default text in the description instead.
@adam3smith can you please post a screenshot of a sad Dataverse link on Twitter? https://twitter.com/BrianAmos/status/1034081533227937793 doesn't look bad but maybe they added an image separately? Here's a screenshot:

yeah, that looks like an image to me (it is indeed: https://pbs.twimg.com/media/DlnLTFsW4AAvIBj.jpg:large ).
Example from Harvard DV (contrast with Sage link in tweet above): https://twitter.com/gboeing/status/1037007558924484609
Screenshot:

Example from QDR - Tweet: https://twitter.com/qdrepository/status/1046796917013975040 (contrast with the CUP article below).
Screenshot:

@adam3smith thanks. Good examples.
You and @qqmyers might want to look at pull request #4879 to see how small the fix was for #4468 and I wonder if a fix for this issue would be similarly small.
@lmaylein made that excellent contribution by the way. Thanks again!
Thanks! I think to do this properly we would want to create a ui:insert name="open-graph", analogous to DC and JSON-LD, that provides full OG metadata (via http://ogp.me/).
Required would be:
og:title - The title of your object as it should appear within the graph, e.g., "The Rock". --> Dataset title
og:type - The type of your object, e.g., "video.movie". Depending on the type you specify, other properties may also be required. --> article (that's the best option of those available)
og:image - An image URL which should represent your object within the graph. --> favicon
og:url - The canonical URL of your object that will be used as its permanent ID in the graph, e.g., "http://www.imdb.com/title/tt0117500/". --> URLified DOI or just URL?
In addition, I think we should add
og:description --> same as meta name="description"
og:site_name --> same as dataset publisher
and the article specific
article:author --> Dataset creator
article:published_time --> Date published.
Still seems pretty straightforward, as all of this already exists in DC, so we just need to duplicate it with the new meta names. The duplication is a bit silly, but I guess that's modern web development for you? If you're generally open to that, we can put together a PR.
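To make the mapping above concrete, here is a minimal sketch in Python of how those properties could be rendered as tags. It is not the Dataverse implementation: the `dataset` dict and its keys (`title`, `url`, `description`, `publisher`, `date_published`, `creators`) are hypothetical names chosen for illustration, and empty values are skipped rather than filled with default text, as suggested earlier in the thread.

```python
from html import escape


def open_graph_tags(dataset, favicon_url):
    """Build Open Graph <meta> tags for a dataset page.

    `dataset` is a hypothetical dict; its keys are illustrative and are
    not real Dataverse field names.
    """
    properties = [
        ("og:title", dataset.get("title")),
        ("og:type", "article"),
        ("og:image", favicon_url),
        ("og:url", dataset.get("url")),
        ("og:description", dataset.get("description")),
        ("og:site_name", dataset.get("publisher")),
        ("article:published_time", dataset.get("date_published")),
    ]
    # Emit one article:author tag per dataset creator.
    properties += [("article:author", name) for name in dataset.get("creators", [])]
    # Skip missing or empty values instead of inventing default text.
    return [
        f'<meta property="{name}" content="{escape(str(value), quote=True)}" />'
        for name, value in properties
        if value
    ]
```

Calling `open_graph_tags({"title": "My dataset", "url": "https://example.org/dataset/1"}, "https://example.org/favicon.ico")` would return tags for og:title, og:type, og:image and og:url only, since the other values are absent.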
FYI - PR #5600 has Open Graph metadata for datasets. @adam3smith created the basics and I added a 150 char truncation of long descriptions and a pointer to the favicon. I think it would be straightforward to add similar info to the dataverse and file pages (if similar metadata can be found).
The one thing we didn't do was to use a thumbnail of the dataset as the image - Open Graph requires a full URL for links and I didn't find any API/URL for the dataset thumbnail. Does such a thing exist and I just missed it? If not, is there a reason not to have one to use in this way?
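For readers wondering what a 150-character truncation might look like, here is a rough Python sketch. The thread does not show the rule used in the PR, so the whitespace collapsing, the word-boundary cut and the trailing "..." are all assumptions made for illustration; `truncate_description` is a hypothetical helper name.

```python
def truncate_description(text, limit=150):
    """Collapse whitespace and cut `text` to at most `limit` characters.

    Assumption: cut at a word boundary and append "..."; the actual
    rule used in the PR is not shown in this thread.
    """
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    # Reserve three characters for the ellipsis.
    cut = text[: limit - 3]
    # Back up to the last space so a word is not split in half.
    if " " in cut:
        cut = cut[: cut.rfind(" ")]
    return cut + "..."
```

Short descriptions pass through unchanged, so an empty description stays empty rather than gaining placeholder text.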
@qqmyers 4 years ago when I created https://github.com/IQSS/dataverse-android over the winter break we had downloadable thumbnail images via the Search API that I could use in that Android app. I opened #3616 about this but ultimately closed it due to lack of interest. Thumbnails are actually downloadable via a different mechanism these days but I don't believe it's documented. Please see
I mostly added that code so I could test thumbnails back when I worked on pull request #3703 for #3559. There be dragons. Help wanted. I went ahead and moved pull request #5600 but please let me know if you'd like to poke at thumbnails before it's tested by QA. Thanks!
@pdurbin - thanks. My initial try was to just put the base64 encoded data:image/png;base64,... info that's used in the page as the URL for Open Graph, but it doesn't support that. So it seems I need a real, full URL. I was thinking that just having a .../dataset/
I guess if there are dragons, we shouldn't hold up the current PR. If it is just adding an API call, I could probably try that quickly...
@qqmyers hopefully the dragons are small and can be slain. 🐉 ⚔️ If you restore real downloadable thumbnails, maybe I'll crack open that old Android code (but no promises). 😄 Are you thinking you'd want to add that into this issue or a new issue? Small chunks are usually better for us. And in our experience, whenever we touch thumbnails the chunk is not small.
@pdurbin - let's do it in pieces so this can get out as is...
@qqmyers I just spoke with @matthew-a-dunlap and he said we played around with showing dataset thumbnails on the custom homepage for Harvard Dataverse. The API for showing thumbnails is not documented but you can see it being called from some HTML at db87186
@pdurbin - Thanks for the info - I missed that call. I've updated the logic to use the thumbnail in the OG metadata, iff it exists, favicon otherwise. I'll submit a PR...
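The fallback described here (thumbnail if it exists, favicon otherwise) can be sketched in a few lines of Python. This is an illustration only, not the code in the PR: `og_image_url` is a hypothetical helper, and treating inline `data:` values as unusable reflects the earlier observation that Open Graph needs a real, full URL.

```python
def og_image_url(thumbnail_url, favicon_url):
    """Return the thumbnail if it is a full http(s) URL, else the favicon."""
    if thumbnail_url and thumbnail_url.startswith(("http://", "https://")):
        return thumbnail_url
    # Covers a missing thumbnail as well as inline data:image/...;base64
    # values, which Open Graph consumers do not accept as an image URL.
    return favicon_url
```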
@jggautier I moved the Open Graph support to other issues (#5640 and #5641) and crossed it out on the list above. I'm going to move this issue to the inbox because I think it needs further discussion before implementation, but let me know if you think it should be prioritized sooner and want to discuss.
FWIW, to me it looks like Google does index our dataset descriptions just fine. See the screenshots of the two recent datasets - both are showing parts of the description:


Met with @scolapasta and @mheppler to discuss how to make search engines show dataset and file descriptions. We agreed on next steps:
- Datasets and files don't have <meta name=description /> tags because Dataverse only adds the description metadata to the tag if the page is a dataverse page. So change the metadata description tag on the dataverse_template page to accept a parameter.
- <meta name=description /> tag in dataverse pages.
- <meta name=description /> tag in dataset pages.
- <meta name=description /> tag in file pages. (Harvard Dataverse's robots.txt is telling Google not to index file pages for now, but that may not be the case for all installations.)

@scolapasta mentioned that doing this might be a performance gain with page load times. The value of doing this in order to make data more _findable_ is debatable, since search engines are still indexing and displaying the descriptions on the dataverses/datasets/file pages, and that content is in a favored place, near the top of the page.
What to do when there is no metadata in the dataverse/dataset/file description fields is discussed in https://github.com/IQSS/dataverse/issues/5672.
The Google results for IMDB pages were mentioned as an example of metadata passed in the source code using Schema.org JSON formatting. I took a look at the source code for one of their pages and found the description was included as:
<meta name="description" content="Directed by Luis Valdez. With Lou Diamond Phillips, Esai Morales, Rosanna DeSoto, Elizabeth Peña. Biographical story of the rise from nowhere of early rock and roll singer Ritchie Valens who died at age 17 in a plane crash with Buddy Holly and the Big Bopper." />
<meta property="og:description" content="Directed by Luis Valdez. With Lou Diamond Phillips, Esai Morales, Rosanna DeSoto, Elizabeth Peña. Biographical story of the rise from nowhere of early rock and roll singer Ritchie Valens who died at age 17 in a plane crash with Buddy Holly and the Big Bopper." />
<script type="application/ld+json">{
  "@context": "http://schema.org",
...
"description": "La Bamba is a movie starring Lou Diamond Phillips, Esai Morales, and Rosanna DeSoto. Biographical story of the rise from nowhere of early rock and roll singer Ritchie Valens who died at age 17 in a plane crash with Buddy Holly and..."
...
}</script>
Note: the Schema.org version is truncated at 230 characters, with the full descriptions in the other metadata tags coming in at 260 characters.
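If anyone wants to repeat this comparison on a Dataverse page, a small Python sketch using only the standard library could collect the three descriptions and report their lengths. `DescriptionCollector` and `description_lengths` are hypothetical names, and the sketch assumes the JSON-LD block is a single JSON object with a top-level "description" key, as in the IMDB example.

```python
import json
from html.parser import HTMLParser


class DescriptionCollector(HTMLParser):
    """Collect meta description, og:description and JSON-LD description."""

    def __init__(self):
        super().__init__()
        self.found = {}
        self._in_jsonld = False
        self._jsonld_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            if key in ("description", "og:description"):
                self.found[key] = attrs.get("content") or ""
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_jsonld = True

    def handle_data(self, data):
        if self._in_jsonld:
            self._jsonld_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            doc = json.loads("".join(self._jsonld_parts))
            self._jsonld_parts = []
            # Assumption: a single JSON object with a top-level description.
            if isinstance(doc, dict) and "description" in doc:
                self.found["json-ld"] = doc["description"]


def description_lengths(html_text):
    """Return the character length of each description found in the page."""
    collector = DescriptionCollector()
    collector.feed(html_text)
    collector.close()
    return {key: len(value) for key, value in collector.found.items()}
```

Feeding it the saved source of a page returns a dict such as `{"description": ..., "og:description": ..., "json-ld": ...}`, with a key left out when that tag is not present.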
Hey @jggautier - can we estimate the first four bullets in https://github.com/IQSS/dataverse/issues/4894#issuecomment-474073181 ? Seems like a discrete, clear chunk.
We should implement these:
FYI, the performance gains mentioned above are because the template is instantiating the DataversePage backing bean every time, even for other pages:
See line 26 of:
https://github.com/IQSS/dataverse/blob/develop/src/main/webapp/dataverse_template.xhtml
<meta name="description" content="#{MarkupChecker:stripAllTags(DataversePage.dataverse.description)}"/>
Added a new meta_header param to dataverse_template, and to the dataverse, dataset and file pages.
Example from dataset.xhtml:
<ui:define name="meta_header">
<meta name="description" content="#{DatasetPage.description}"/>
</ui:define>
