Sf: stringsAsFactors

Created on 9 Aug 2016 · 13 Comments · Source: r-spatial/sf

Would it be best to set this to FALSE in ST_read?

df = data.frame(row.names = ids, apply(do.call(rbind, f), 2, unlist), stringsAsFactors = FALSE)

Arguably this could be user-controlled, but factors are an R feature, so it seems best to stick to plain character. Happy to do the PR if you want the user-option control.

All 13 comments

When using R, what is wrong with R features? I added it, user controllable, but with the default of data.frame.

Well it is an R feature, so there's almost no chance that the behaviour is correct for an external data source, and so straight away there is a problem for writing the very same data out. Should it coerce safely to character or force an error? I would prefer a different default, and it's helpful to know that you really do not. I am happy to argue for it if you are interested but it probably doesn't belong here.

Thanks for the update!

As factor is R, I assume all export would coerce first to character. Do you have an example where character -> factor -> character loses information?

No, but I also don't understand why you would want that to need to happen. I'll post examples of problems if I encounter them.
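For reference, the round trip asked about above can be checked in base R. This is only a minimal sketch with made-up values (not data read through sf), and the variable names are illustrative:

```r
# Made-up attribute values, including a missing value and an empty string.
original <- c("b", "a", "b", NA, "")

as_factor  <- factor(original)         # character -> factor
round_trip <- as.character(as_factor)  # factor -> character

# The values themselves survive the round trip unchanged.
values_preserved <- identical(original, round_trip)

# What the factor adds is R-side structure: a sorted set of levels
# (and integer codes) that does not exist in the source strings.
factor_levels <- levels(as_factor)
```

Under these assumptions the values round-trip intact, which is consistent with the answer above; the disagreement in this thread is about whether the conversion should happen at all.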

The default in base R is there for historical reasons: previously factors used less memory and were easier to work with than character vectors. Now, there is no reason to prefer factors over characters, and indeed, it is incorrect in general to assume that a vector of strings has a predefined set of possible values (i.e. is a factor).

Most people find the default stringsAsFactors = TRUE to be substantially annoying.
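The point about a "predefined set of possible values" can be illustrated with a small base R sketch. The colour values are invented for illustration; the behaviour shown is that of base R factors, not anything specific to sf:

```r
# A character vector accepts any new string.
colours_chr <- c("red", "green")
colours_chr[3] <- "blue"

# A factor only accepts values already among its levels. Assigning an
# unseen value produces NA (R also emits an "invalid factor level" warning).
colours_fct <- factor(c("red", "green"))
colours_fct <- suppressWarnings(replace(colours_fct, 3, "blue"))

new_value_kept_chr <- colours_chr[3] == "blue"
new_value_lost_fct <- is.na(colours_fct[3])
```

Here the warning is suppressed only to keep the example quiet; in interactive use it is the first sign that a string column was silently treated as a closed set.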

For them, there is options(stringsAsFactors = FALSE), or the option to create sf objects from tibbles.
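Besides the global option, the argument can be passed per call, which makes the result independent of any session-wide setting. A minimal base R sketch with an invented column (note that since R 4.0.0 the data.frame default is already FALSE, so the global option matters mainly on older R versions):

```r
# Invented example column; only the stringsAsFactors argument differs.
df_chr <- data.frame(name = c("a", "b"), stringsAsFactors = FALSE)
df_fct <- data.frame(name = c("a", "b"), stringsAsFactors = TRUE)

class_chr <- class(df_chr$name)  # column kept as character
class_fct <- class(df_fct$name)  # column converted to factor
```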

Oh perfect, tibble option is excellent! Hadn't thought of that . . .

Dr. Michael Sumner
Software and Database Engineer
Australian Antarctic Division
203 Channel Highway
Kingston Tasmania 7050 Australia

My general preference is that stringsAsFactors = FALSE is the more sensible default, but a bigger issue I see is that stringsAsFactors is not currently a primary argument for st_read - it seems like stringsAsFactors=default.stringsAsFactors() is buried all the way in the st_sf constructor.

It seems like it is important enough that it should be a primary argument (and included in the documentation) to bring it in line with readOGR, read.table, etc.

I agree with @rundel: st_read now takes stringsAsFactors as an explicit argument rather than through ..., and both st_read and st_sf now document it, with the text taken from data.frame, which points you to a way to override the default: options(stringsAsFactors = FALSE).
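A hedged sketch of what the explicit argument looks like in use. It assumes a current sf installation and uses the nc.shp example file that ships with the package; the block is skipped if sf is not available, and the variable names are illustrative:

```r
# Assumes the sf package is installed; the body is skipped otherwise.
if (requireNamespace("sf", quietly = TRUE)) {
  # Example shapefile shipped with sf.
  nc_path <- system.file("shape/nc.shp", package = "sf")

  # Same file, read with each setting of the explicit argument.
  nc_chr <- sf::st_read(nc_path, stringsAsFactors = FALSE, quiet = TRUE)
  nc_fct <- sf::st_read(nc_path, stringsAsFactors = TRUE, quiet = TRUE)

  class_chr <- class(nc_chr$NAME)  # county names as character
  class_fct <- class(nc_fct$NAME)  # county names as factor
}
```

Passing the argument at the call site keeps the behaviour visible in the script rather than depending on a global option set elsewhere.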

stringsAsFactors = TRUE is really annoying

That's what brought me to this page!

Can you elaborate on that? I am serious, it's helpful to have specific details on what you did and what you expected and what you experienced.

It tends to drive my decision making in reverse but it's hard to capture the process in the act and explain why the existing behaviour causes problems.

@mdsumner I was not talking about sfr (I just realised this comment page is for sfr)

I just wanted to say somewhere on the internet that R's read.table default (stringsAsFactors = default.stringsAsFactors(), which often evaluates to TRUE in my R) is really annoying. I googled it and was led to this page!

I'm sorry about the confusion caused!

That "fix" was meant to refer to https://github.com/r-spatial/lwgeom/issues/14 , not here.
