Pydantic: StrictBool

Created on 5 Jun 2019 · 34Comments · Source: samuelcolvin/pydantic

Thank you for bringing types into python -- it makes daily life so much easier :)

However, I think there's a pretty serious bug with the bool validator.

Minimal working example:

from pydantic import BaseModel

class Model(BaseModel):
    a: bool

Model(a=1) # should cause exception
Model(a="snap") # should cause exception

a should be a bool and only a bool. If a user provides any other value to the endpoint (using fastapi), the database throws because it's the wrong type.

I cannot think of a single type that can be passed to bool(X) that won't return a boolean, and yet bool_validator returns that coercion. e.g. bool({"moo": 1}) == True. This makes the validator completely useless :(

Is there a reason behind this? And if it needs to stay as is, do you think adding a strict mode to the validator is an option?

Also, as an aside, there are no unit tests for the function.

I'm very happy to make a PR (adding some unit tests as well!) because it shouldn't take long, but I need to understand if there's a reason for this first.

Thanks!

Feedback Wanted feature request help wanted

Source

cazgp

Most helpful comment

I agree that pydantic has been a parsing library. I just don't see the following as a parsing API:

A(foo="true")

That looks like straight forward in-app (e.g. not at IO boundary) model creation to me. Wondering now why pydantic cannot be both:

A replacement for dataclasses
A parsing library

parsing will invariable come from input sources like http requests, stdin, ... and will be passed to e.g. parse_obj?

jasonkuhrt on 10 Jul 2019

👍3

All 34 comments

Yes there's a reason for this.

Please read #578, #284 and #360. You are not the first person to ask about this - pydantic is a parsing library and not a validation library.

There definitely are tests for this, hence the 100% coverage, here are 13 at least.

If you do not want this behaviour:

you can implement a validator with pre=True that only accepts a set list of things
you can implement your own StrictBool type.

Regarding your examples, since bool in python are a sub-type of int, I think most people would except 1 (and therefore probably '1') to be valid as a boolean value.

Other strings are trickier, in python bool('any string with length > 0') == True, but that's awkward since bool('false') == True, hence it's not trivial to decide which values are truey, eg. DRF and yaml accept values like yes.

I'd accept a PR to add support for StrictBool.

samuelcolvin on 5 Jun 2019

There's also the case of non strings, most people would (I imagine) expect any positive int to be coersed to True, "special" values like -1 are trickier.

What about dictionaries and lists? Same behaviour as python? Or maybe same as javascript if we've parsed JSON - it's not that simple.

What about arbitrary class? I imagine most people would expect most of them to be truey like in python, but that can get tricky.

samuelcolvin on 5 Jun 2019

Thanks for responding so quickly!

Thanks for pointing out the tests. I grepped an awful lot and couldn't find anything, and then was in test_validators for a very long time.

StrictBool sounds like it could be useful, however I still think that the coercion at the end is not expected behaviour. I cannot imagine that someone would expect {"test": 1} to be parsed as a bool. It really seems like that's user error which should be caught.

So, in your examples, I would say throw an error for all of them! Unless the value is a bool, or it matches a particular set of strings, I say reject it.

cazgp on 5 Jun 2019

Oh and 1 and 0 should probably be exceptions too

cazgp on 5 Jun 2019

I can of understand that {"test": "john doe"} being parsed to {'test': False} would be confusing for a lot of people.

However this is a breaking change, and people don't like breaking changes.

So what do other people think?

In particular:

which strings should be true/false?
what should we do with numbers other than 0/1?
what should we do with lists and dicts, what about if they're empty? true/false/error?
what should we do with other classes? true/false/empty? Should we look for the __bool__ method?

samuelcolvin on 5 Jun 2019

which strings should be true/false?

We could parse based on Yaml's boolean syntax spec;

Regexp:
 y|Y|yes|Yes|YES|n|N|no|No|NO
|true|True|TRUE|false|False|FALSE
|on|On|ON|off|Off|OFF

what should we do with numbers other than 0/1?

Error, or, if going with the above, not integers whatsoever

what should we do with lists and dicts, what about if they're empty? true/false/error?

Error

what should we do with other classes?

Error

jasonkuhrt on 11 Jun 2019

Oh and 1 and 0 should probably be exceptions too

Not 100% sure about this, but I get it. I think for a StrictBool type it is less acceptable than a bool type.

jasonkuhrt on 11 Jun 2019

Definitely 0 and 1 should be allowed.

And also the strings of those values.

samuelcolvin on 11 Jun 2019

Sure, and yeah the string variants too then

jasonkuhrt on 11 Jun 2019

My two cents regarding how I'd personally like to see these classes implemented:

For StrictBool:
- bools pass validation (duh)
- The only strings that don't raise validation errors are the yaml strings and "0" and "1"
- The only ints that don't raise validation errors are 0 and 1
- Everything else is a validation error -- if you use the StrictBool type, then you take responsibility for explicitly casting to bool.
  - I think checking for __bool__ would be okay, but if used, should probably require it to be explicitly implemented by the user (rather than inherited). But I don't know if there is a clean way to accomplish this, and would probably rather not have to worry about it.

For regular bool:
- Follow standard python bool behavior except parse "0" and the yaml false strings as False.
- (If this differs from the current implementation, e.g., other special cases are handled, I'm fine keeping it as is.)

I can of understand that {"test": "john doe"} being parsed to {'test': False} would be confusing for a lot of people.

Yeah, I would definitely vote that this case be treated as a validation error before it get parsed as False. True would be okay if parsed as a regular bool.

dmontagu on 11 Jun 2019

Follow standard python bool behavior except parse "0" and the yaml false strings as False.

Standard python behaviour is to parse literally anything as True, so I really don't think that's sensible.

class Test:
    pass

def test():
    pass

bool(Test())       # true
bool({"test": 1})  # true
bool(test)         # true

It all but renders this type as useless IMHO.

cazgp on 12 Jun 2019

It all but renders this type as useless IMHO.

Happy to have a discussion and hear opposing views, but please let's not resort to hysterics. 99.9% of pydantic usage is in parsing data from "dumb" formats, eg. JSON, YAML, form data, URL arguments where classes and functions can't be defined, so parsing the above as True would manifestly NOT render the type "useless".

In answer to the question, I think @jasonkuhrt's answer plus 0, 1, '0', '1' is probably the best option.

Have a look here for how DRF does this.

samuelcolvin on 12 Jun 2019

hysterics

That wasn't a wildly exaggerated and emotional reaction. I believe that a bool parser which permits nearly anything to be coerced into a boolean (particularly one named after a portmanteau of "pedantic") is not that useful. I think the behaviour is unexpected and leads to problems for people relying on the library to act as a type guard between their python functions. In my case, a non-boolean value was hitting the database, causing it to throw. Luckily it was caught in development.

Unfortunately github is unhappy with the DRF link.

Happy to go with @jasonkuhrt + 0, 1 as it seems straightforward and expected.

cazgp on 12 Jun 2019

Weird, the link works fine for me. It was line 636 and below in /rest_framework/fields.py.

samuelcolvin on 12 Jun 2019

Sweet, that works now!

Yeah that looks sound.

I might not get a chance to do it this week, but happy to have a pop next week :)

cazgp on 12 Jun 2019

It all but renders this type as useless IMHO.

My reasoning here was that if StrictBool were an alternative option, then I would expect the non-strict version to just follow python's built-in behavior (seems like the least surprising option, and is consistent with how str vs StrictStr works). If StrictBool were an option, and I cared to be strict, I would just use that.

Validation isn't pydantic's only capability -- I would still want to annotate fields as bool for type hinting / schema building, etc., even if I didn't care much about how it was validated (more complex validation logic may incur unwanted overhead; if that were the case I'd be fine with minimal validation in most cases). And I may care more about validating _other_ fields, while still wanting type hints, etc. So I do think this implementation of this type could be useful.

But this is a weakly held opinion, and I'm okay with a stricter implementation.

dmontagu on 12 Jun 2019

Here we have a bool that accepts _any_ value but only calls bool on that value in a limited number of cases. We also have a str validator that accepts a variety of types but _converts_ all of them to strs. We have ints that only accept ints but not bools though bool derives from int. It's not surprising that people are taken by surprise by Pydantic's handling of built-in types!

My opinion is this is scope creep. bool is used as a type hint. The semantics is straightforward: you can't subclass bool so anything other than True and False is not a bool. All of this back and forth about type casting is irrelevant. If you want to have a bool-like type that accepts all possible values in YAML, you should not call it a bool - the alternative is to smash everyone's type checkers to smithereens.

layday on 10 Jul 2019

@layday I don't think you're taking BaseSettings into account?

jasonkuhrt on 10 Jul 2019

@jasonkuhrt It makes sense that you'd want to convert env vars in BaseSettings (it's strings all the way down) but in routine BaseModels I don't think we should lump type hints together with preprocessing - at least not when using built-in types. Depending on your requirements you might find this behaviour very unintituive.

layday on 10 Jul 2019

@layday I'm not sure what your proposal is, though. Examples? My take on what you're saying:

class A(BaseModel):
    foo: bool

A(foo="true") # ERR
A.parse_obj(dict(foo="true")) # OK A<foo=True>

This seems intuitive, to me.

jasonkuhrt on 10 Jul 2019

@layday what about parsing url queries? bool has to be able to accept strings on many scenarios.

samuelcolvin on 10 Jul 2019

@layday

the alternative is to smash everyone's type checkers to smithereens.

What issues do you see/foresee with type checking? The value will be True or False once parsed (so will have the specified type), and pre-parsed values are already not expected to be the correct type in many other reasonable cases; e.g., annotating a type as int will cause '3' to be cast to 3. I am not aware of any issues that this is currently causing with type checking.

It's not surprising that people are taken by surprise by Pydantic's handling of built-in types!

I view pydantic's goal _not_ as maintaining 100% consistency with how python handles casting, but rather to be as broadly useful for parsing applications as possible. Sometimes, I think it is justified to have parsing logic that differs from standard python, especially when

the logic is more-or-less standard in other contexts (yaml, web APIs)
the python-default parsing behavior is surprising / not useful ("false" would be cast to True by default), and
it improves/simplifies the day-to-day usage of the library (rather than needing to import some CustomBool type every time I want to have a reasonably-parsed bool type)

This thread's discussion has convinced me that each of these points is met by this change.

While this approach to bool parsing may be unintuitive* for pydantic newcomers, I personally would much prefer the default behavior to be more useful in most cases over marginally more intuitive, and I believe this is what the discussed modification to default bool parsing achieves.

* For what it's worth, I think this thread has already highlighted that python's standard bool parsing behavior is viewed as __less__ intuitive by many users than the proposed bool parsing anyway.

dmontagu on 10 Jul 2019

👍1

@dmontagu thank you. You've said all the I was thinking.

samuelcolvin on 10 Jul 2019

What issues do you see/foresee with type checking?

Pydantic encourages the following pattern:

class Foo(BaseModel):
    some_url: URL = 'http://example.com/'

... where the string will be cast to URL on init.

I view pydantic's goal not as maintaining 100% consistency with how python handles casting ...

Type casting is orthogonal to type _checking_.

... but rather to be as broadly useful as possible for parsing applications as possible. ... While this approach to bool parsing may be unintuitive* for pydantic newcomers, I personally would much prefer the default behavior to be more useful in most cases over marginally more intuitive, and I believe this is what the discussed modification to default bool parsing achieves.

The premise here doesn't really hold. The behaviour is not unintuive to Pydantic newcomers; it's unintuitive in _specific contexts_. Let's take a real-world example - suppose you want to implement JSON-RPC in Pydantic. JSON-RPC says the value of id can be a number, a string or null. So in Python you write:

class JsonRpcRequest(BaseModel):
    id: Union[None, int, str]


class JsonRpcResponse(BaseModel):
    id: Union[None, int, str]


request = JsonRpcRequest(id='1')
response = JsonRpcResponse(id=request.id)
send(response.json())    # '{"id": 1}'

... and your client can no longer reconcile their request with your response. To me it'd make more sense to default to a simple isinstance check for built-in types and have something akin to:

from pydantic.types import LooseInt

class DataMungingModel(BaseModel):
    id: LooseInt

... for type casting. This would not only be easier than using custom validators to reverse quirks in Pydantic's handling of built-in types but it would also encourage people to use the appropriate type/validator for their use case.

layday on 10 Jul 2019

I respect your opinion but this is simply not what pydantic is about, switching everything to isinstance checks would not only break almost every single existing use of pydantic, but it would also IMHO yield a less useful library.

You might get a better understanding from #578 and the issues referenced from there, particularly the "strict mode" referenced in #284.

The only option I can think of to accommodate your view of what pydantic should be, would be a hook in validators.get_validators which allows you to provide your own validators for builtin types? If you'd like that, please create a new issue.

samuelcolvin on 10 Jul 2019

@layday

... where the string will be cast to URL on init.
Your example (adapted to use the pydantic type UrlStr) raises a mypy error:

from pydantic import BaseModel, UrlStr

class Foo(BaseModel):
    some_url: UrlStr = 'http://example.com/'
# error: Incompatible types in assignment (expression has type "str", variable has type "UrlStr")

While pydantic _would_ cast the string to a UrlStr, I would argue that the right way to accomplish this in a type-checked way would be to replace it with:

from pydantic import BaseModel, UrlStr

class Foo(BaseModel):
    some_url: UrlStr = UrlStr('http://example.com/')

(which does not raise a mypy error).

I don't view pydantic as opinionated either way here; if you want your code to pass static type checking, there's no obstruction. If not using mypy, I don't see much benefit to the cast; if using mypy, you'll be made aware of the problem immediately.

... and your client can no longer reconcile their request with your response.

I think this example is a valid issue with how the Union type is handled -- I think it is totally reasonable to expect that if you provide a value with a type in the Union that it prefer to retain its original type, rather than casting to the first type it can. (While I could imagine arguments in favor of alternatives, I think this would be less surprising in almost all cases.) This is probably worth a separate issue.

dmontagu on 10 Jul 2019

The only option I can think of to accommodate your view of what pydantic should be, would be a hook in validators.get_validators which allows you to provide your own validators for builtin types? If you'd like that, please create a new issue.

Fair enough. A get_validators hook sounds like a nice compromise solution.

I think this example is a valid issue with how the Union type is handled ...

I'm not sure if special-casing Union would be better than using a strict variant of the type. The presence of Union implies to me that you require types to be parsed 'strictly,' which you'd have an easier time accomplishing universally with the hook suggested above.

layday on 10 Jul 2019

I agree the Union handling is not ideal, maybe a hook for processing Union too would help?

samuelcolvin on 10 Jul 2019

What do people think about:

class A(BaseModel):
    foo: bool

A(foo="true") # ERR
A.parse_obj(dict(foo="true")) # OK A<foo=True>

jasonkuhrt on 10 Jul 2019

These two cases are exactly the same and should behave the same as they do now.

samuelcolvin on 10 Jul 2019

To be fair, I think the union issue is explicitly addressed in https://github.com/samuelcolvin/pydantic/issues/436:

In the case of pydantic the order is not ignored, we could better document this, but I don't think it's worth fixing since it would require lots more checking and would slow things down.

Given the recent string of github issues related to Union handling, it might be worth revisiting.

@jasonkuhrt I think your suggestion is potentially related to https://github.com/samuelcolvin/pydantic/issues/650 as well -- I think I personally could get on board with that sort of change, though it would obviously break the majority of pydantic code currently in existence.

dmontagu on 10 Jul 2019

I agree that pydantic has been a parsing library. I just don't see the following as a parsing API:

A(foo="true")

That looks like straight forward in-app (e.g. not at IO boundary) model creation to me. Wondering now why pydantic cannot be both:

A replacement for dataclasses
A parsing library

parsing will invariable come from input sources like http requests, stdin, ... and will be passed to e.g. parse_obj?

jasonkuhrt on 10 Jul 2019

👍3

fixed by #617

samuelcolvin on 12 Aug 2019

🎉1

Sorry, @jasonkuhrt that ship has sailed I'm afraid. :ship:

Maybe A(foo="true") shouldn't have done parsing/validation (don't think I agree, but I can understand the argument), but that was how I designed pydantic back when it was single class in a file somewhere parsing some http headers. It has stayed the same ever since and most people seem quite happy with it that way, this isn't going to change now.

samuelcolvin on 12 Aug 2019

👍1

Was this page helpful?

0 / 5 - 0 ratings