data.table: [R-Forge #2197] A simple labels attribute, like in the Hmisc package, for variable descriptions

Created on 8 Jun 2014 · 6 comments · Source: Rdatatable/data.table

Submitted by: Griffith Rees; Assigned to: Nobody; R-Forge link

One data-management feature of Stata that R lacks is descriptions of variables within the standard data frame. The Hmisc package handles this in a simple way: http://www.statmethods.net/input/variablelables.html. Although this seems like a trivial change, it would make large social science datasets with opaque variable names (look at the US Census, for example) manageable in R without spending hours hand-coding mappings from variable abbreviations to descriptive names. If it were implemented, nicely written variable names (with spaces and special characters) could appear in tables and plots output straight to LaTeX, without post-processing.

An example of how this could be used with the existing Stata importer, foreign::read.dta():

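library(foreign)     # read.dta() for Stata .dta files
library(data.table)  # data.table(), setattr()

dta2data.table <- function(path) {
  dta <- read.dta(path)
  d <- data.table(dta)
  # read.dta() stores Stata variable labels in attr(dta, "var.labels")
  # ("val.labels" holds value-label names). setlabel() is the proposed API;
  # setattr() attaches the same vector by reference today.
  setattr(d, "labels", attr(dta, "var.labels"))
  d
}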

Thanks again for an excellent and supremely useful project :)

Labels: beginner-task, feature request


All 6 comments

I second this feature. Relatedly, some of the functionality in Morpho (http://bit.ly/1Tzc7Nj) looks interesting and closely related to what Griffith describes above.

I think this issue needs a minimal reproducible example (MRE) of the use case.

If this were implemented, nicely written variable names (with spaces and special characters) could appear in tables and plots that are output straight to latex, without post-processing.

This seems to be quite a deep request, probably better suited to an add-on package, as it would likely require S3 or S4 methods on columns to auto-replace their names with their labels.
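Swapping names for labels at output time does not necessarily need S3/S4 methods on the columns. A minimal sketch, assuming each column stores its description in a "label" attribute (the convention Hmisc and haven use): the hypothetical helper relabel_for_output() returns a copy whose names are the labels, falling back to the column name when a column has none. It uses base R only, and the census-style column names are invented for illustration.

```r
# Hypothetical helper: return a copy of a data frame whose column names are
# replaced by each column's "label" attribute, falling back to the original
# name when a column has no label. The input is left unchanged.
relabel_for_output <- function(df) {
  labels <- vapply(
    names(df),
    function(nm) {
      lab <- attr(df[[nm]], "label", exact = TRUE)
      if (is.null(lab)) nm else as.character(lab)[1]
    },
    character(1)
  )
  out <- df
  names(out) <- unname(labels)
  out
}

# Example: opaque census-style names with human-readable labels.
census <- data.frame(P0010001 = c(120L, 85L), P0020002 = c(60L, 40L))
attr(census$P0010001, "label") <- "Total population"
attr(census$P0020002, "label") <- "Urban population"

pretty <- relabel_for_output(census)
print(names(pretty))
```

Because it returns a copy, the short names remain available for code, while the relabelled copy goes to a table or LaTeX export step.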

without spending hours hand coding variable abbreviations to complicated variable names

I'm not seeing why this is the case. In the example there were no "hours spent hand coding" (unless that was already done upstream, in which case the point is moot); we simply copy the labels attribute onto the data.table, either onto the object itself or onto the columns individually.

This is, and always has been, possible in base R and hence in data.table (though, I agree, poorly documented). So, barring a more specific example of the anticipated workflow/API, I vote to close.
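For concreteness, both storage options can be sketched in base R. data.table inherits from data.frame, so attr() reads these attributes there too, and data.table's setattr() can set them by reference. The attribute names "label" and "labels" are conventions assumed for this sketch. The last lines show why retention during common operations is the real work: subsetting an atomic vector keeps only names, dim and dimnames.

```r
# Base R only. Two ways to attach variable descriptions to a table.
temps <- data.frame(celsius = c(20, 25), fahrenheit = c(68, 77))

# Option A: one attribute per column (the convention Hmisc and haven use).
attr(temps$celsius, "label") <- "Temperature (C)"
attr(temps$fahrenheit, "label") <- "Temperature (F)"

# Option B: one attribute on the table itself, keyed by column name.
attr(temps, "labels") <- c(celsius = "Temperature (C)",
                           fahrenheit = "Temperature (F)")

attr(temps$celsius, "label")            # "Temperature (C)"
attr(temps, "labels")[["fahrenheit"]]   # "Temperature (F)"

# The catch: subsetting a plain vector drops every attribute except
# names, dim and dimnames, so the per-column label is lost here.
first_celsius <- temps$celsius[1]
is.null(attr(first_celsius, "label"))   # TRUE
```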

Today, beyond Hmisc, other packages such as haven, labelled and sjlabelled help manage labels within the tidyverse family of packages.

I am starting to learn data.table, but not being able to manage labels could discourage me from continuing with it. The same may be true for many other people, since variable labels and value labels for categorical variables are very useful.

Thank you anyway for the great package.
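On value labels specifically, base R's factor() already maps coded values to readable categories, and a variable label can ride along as an attribute. A minimal sketch with invented survey codes; it does not reproduce the dedicated labelled-vector classes provided by haven or labelled.

```r
# Coded survey responses, as they often arrive from Stata or SPSS files
# (codes and labels here are invented for illustration).
sex_code <- c(1L, 2L, 2L, 1L, 2L)

# Value labels: factor() maps each code to a readable category.
sex <- factor(sex_code, levels = c(1L, 2L), labels = c("Male", "Female"))

# Variable label: a free-text description kept as an attribute.
attr(sex, "label") <- "Sex of respondent"

levels(sex)                 # "Male" "Female"
as.integer(table(sex))      # 2 3
attr(sex, "label")          # "Sex of respondent"
```

One trade-off: a factor stores level positions rather than the original codes, so codes are only recoverable directly when they happen to equal those positions, as they do here.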

This is even more important because non-native encodings in column names cannot be handled reliably everywhere, so we may have to force users to change their column names in some cases. Labels could then still carry the intended column names in any encoding.
@iago-pssjd could you link a manual page that describes how labels are used in some of the packages you mentioned?

Yes, here are pages for both labelled and sjlabelled, although the second overlaps the first a bit:

Thank you!

I had a brief look at the links, and they seem to take a much broader approach.
AFAIU, what we really need is just an extra attribute that is retained/handled during common operations:

library(data.table)
d = data.table(celsius = 20, fahrenheit = 68)
# attach one label per column, by reference, as an attribute of the table
setlabels = function(x, labels) {
  setattr(x, "labels", labels)
}
setlabels(d, labels = c("°C", "°F"))

and then handle it nicely in print.data.table and fwrite(yaml=TRUE), so that printing would give something like:

print(d)
#        °C         °F
#   celsius fahrenheit
#1:      20         68
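Until print.data.table supports something like this, the proposed layout can be approximated outside the package. A minimal sketch, assuming labels are stored as a named character vector in a table-level "labels" attribute, keyed by column name rather than by position; print_with_labels() is a hypothetical helper, not data.table API. It uses a plain data.frame, and the \u00b0 escapes keep the source ASCII while the labels carry the degree sign, which also illustrates the encoding point raised above.

```r
# Hypothetical helper (not part of data.table): print a table with a row of
# labels above the usual column names, roughly matching the proposed layout.
# Labels come from a named character vector in the table-level "labels"
# attribute; columns without a label get an empty header cell.
print_with_labels <- function(x) {
  labels <- attr(x, "labels", exact = TRUE)
  cols <- names(x)
  lab_row <- ifelse(cols %in% names(labels), labels[cols], "")

  # Format every column as text so all cells fit in one character matrix.
  cells <- as.matrix(format(as.data.frame(x)))

  # First row holds the column names; the labels become the header.
  out <- rbind(cols, cells)
  rownames(out) <- c("", paste0(seq_len(nrow(x)), ":"))
  colnames(out) <- lab_row
  print(out, quote = FALSE, right = TRUE)
  invisible(out)
}

d <- data.frame(celsius = 20, fahrenheit = 68)
attr(d, "labels") <- c(celsius = "\u00b0C", fahrenheit = "\u00b0F")
print_with_labels(d)
```

Keying labels by name, unlike the positional c("°C", "°F") above, keeps each label attached to the right column if columns are later reordered or dropped.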