Often we want complex environment variables that are not represented as JSON. One example is a list of items. It's not uncommon to comma-delimit lists like this in bash:
```python
import os
from typing import List

from pydantic import BaseSettings

os.environ['options'] = "a,b,c"


class Settings(BaseSettings):
    options: List


s = Settings()
```
This results in a `JSONDecodeError`. Writing a list of items as valid JSON is error-prone and not human-friendly to read in the context of environment variables.
One workaround is to store the variable as valid json, which is tricky to type correctly in lots of systems where you have to enter environment variables:
```
OPTIONS='["a","b","c"]'
```
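For reference, the JSON form does round-trip cleanly through `json.loads`, which is what `BaseSettings` uses under the hood for complex fields:

```python
import json

# The env var value written as valid JSON, e.g. OPTIONS='["a","b","c"]'
raw = '["a","b","c"]'
options = json.loads(raw)
# options is now the list ['a', 'b', 'c']
```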
Another (simplified) workaround is to set `json_loads`, but it's not very elegant: `json_loads` doesn't know which field it is parsing, which could be error-prone:
```python
import json
import os
from typing import List

from pydantic import BaseSettings

os.environ['options'] = "a,b,c"


def list_parse_fallback(v):
    try:
        return json.loads(v)
    except Exception:
        return v.split(",")


class Settings(BaseSettings):
    options: List

    class Config:
        json_loads = list_parse_fallback


s = Settings()
```
I can see a couple of options for implementing the fix. One option is to pass a custom parsing function via `Field`:
```python
parse_func = lambda x: x.split(",")


class Settings(BaseSettings):
    options: List = Field(..., env_parse=parse_func)
```
If we take this approach, I think that we can update this branch:
https://github.com/samuelcolvin/pydantic/blob/master/pydantic/env_settings.py#L60-L62
Adding something like the following:

```python
if field.is_complex():
    if field.extra.get("env_parse", None) is not None:
        try:
            env_val = field.extra["env_parse"](env_val)  # type: ignore
        except ValueError as e:
            raise SettingsError(f'error with custom parsing function for "{env_name}"') from e
    else:
        try:
            env_val = self.__config__.json_loads(env_val)  # type: ignore
        except ValueError as e:
            raise SettingsError(f'error parsing JSON for "{env_name}"') from e
d[field.alias] = env_val
```
Another implementation option is to add a new property like `Settings.Config.parse_env_var`, which takes the field and the value so that it can be overridden to dispatch to different parsing methods for different names/properties of the field. (Currently, just overriding `json_loads` means you are passed a value without knowing where it will be stored, so you have to test for and handle all of the possible settings values.)
```python
class BaseSettings:
    ...

    class Config:
        ...

        @classmethod
        def parse_env_var(cls, field, raw_val):
            return cls.json_loads(raw_val)
```
Then the following line is the only change that is needed:
https://github.com/samuelcolvin/pydantic/blob/master/pydantic/env_settings.py#L62
It changes to `self.__config__.parse_env_var(field, env_val)`.
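With that hook in place, a subclass could dispatch per field. Here is a hedged, plain-Python sketch of the idea (the `Field` dataclass below is a stub standing in for pydantic's field object, since `parse_env_var` is only a proposal at this point):

```python
import json
from dataclasses import dataclass


@dataclass
class Field:
    """Stub for pydantic's ModelField, just for illustration."""
    name: str


class Config:
    json_loads = staticmethod(json.loads)

    @classmethod
    def parse_env_var(cls, field, raw_val):
        # Dispatch on the field name: comma-split for the list field,
        # fall through to JSON decoding for everything else.
        if field.name == 'options':
            return raw_val.split(",")
        return cls.json_loads(raw_val)


Config.parse_env_var(Field('options'), 'a,b,c')       # ['a', 'b', 'c']
Config.parse_env_var(Field('mapping'), '{"k": "v"}')  # {'k': 'v'}
```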
Alternatively, change the same line to:

```python
env_val, ee = field.validate(env_val)
# collect ee and raise errors
```
Let fields implement custom serialization/deserialization methods. Currently there is `json_encoders` but no equivalent `json_decoders` for use per-field.
There's some discussion of this here: https://github.com/samuelcolvin/pydantic/issues/951
Other ideas? Happy to implement a different suggestion.
Output of `python -c "import pydantic.utils; print(pydantic.utils.version_info())"`:

```
pydantic version: 1.5.1
pydantic compiled: True
install path: /Users/bull/miniconda3/envs/sandbox/lib/python3.7/site-packages/pydantic
python version: 3.7.6 (default, Jan 8 2020, 13:42:34) [Clang 4.0.1 (tags/RELEASE_401/final)]
platform: Darwin-19.4.0-x86_64-i386-64bit
optional deps. installed: []
```
I think we should remove the `if field.is_complex()` bit of `BaseSettings` and replace it with a universal validator `validator('*', pre=True)` which tries decoding JSON, but in the case of lists also supports comma-separated values.
That way you could easily override the validator if you so wished.
I think this should be backwards compatible so could be done before v2. PR welcome.
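For illustration, the decoding rule such a universal validator would apply can be sketched in plain Python (this is an assumption-level sketch, not pydantic code; `field_type` stands in for the field's annotation):

```python
import json
from typing import List


def decode_complex(raw, field_type):
    """Decode a raw env string: try JSON first, fall back to
    comma-splitting when the target annotation is a list."""
    try:
        return json.loads(raw)
    except ValueError:
        # __origin__ is `list` for List[...] annotations on Python 3.7+
        if getattr(field_type, '__origin__', None) is list or field_type is list:
            return raw.split(",")
        raise


decode_complex('["a", "b", "c"]', List[str])  # valid JSON wins when it parses
decode_complex('a,b,c', List[str])            # falls back to comma splitting
```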
Hi @samuelcolvin, I took a stab at your suggested implementation of moving the JSON decoding out of the `_build_environ` method and into a universal validator. Here is what that looks like (before trying to add in any parsing of comma-separated lists):
```python
class BaseSettings(BaseModel):
    ...

    @validator('*', pre=True)
    def validate_env_vars(cls, v, config, field):
        if isinstance(v, str) and field.is_complex():
            try:
                return config.json_loads(v)  # type: ignore
            except ValueError as e:
                raise SettingsError(f'error parsing JSON for "{field}"') from e
        return v
```
Unfortunately, this implementation ran into a problem with two tests: `test_nested_env_with_basemodel` and `test_nested_env_with_dict`.
These two tests have a field `top` that is an inner mapping with keys `apple` and `banana`. These two keys are specified in different places and stitched together: the dict `{'apple': 'value'}` is passed into the constructor, and the string `'{"banana": "secret_value"}'` is set as an environment variable. In the previous implementation, everything happens inside `__init__`: `_build_environ` runs first, decoding the `banana` JSON into a dict, and then the two sources are stitched together with the `deep_update` function.
Moving the JSON decoding into a validator breaks this functionality because validators run after `__init__` has merged the sources. Because the `banana` mapping is still a string when `__init__` runs, `deep_update` isn't able to merge the two and instead replaces the `banana` dict with the `apple` dict.
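The stitching behaviour can be illustrated with a simplified, stand-alone `deep_update` (an illustrative stand-in for pydantic's internal helper, not its actual code):

```python
def deep_update(base, update):
    """Recursively merge `update` into a copy of `base`.

    Nested dicts are merged key by key; any other value in `update`
    simply replaces the corresponding value in `base`.
    """
    result = dict(base)
    for key, value in update.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = deep_update(result[key], value)
        else:
            result[key] = value
    return result


# Env source already decoded: the two halves of `top` merge cleanly.
env = {'top': {'banana': 'secret_value'}}
init = {'top': {'apple': 'value'}}
merged = deep_update(env, init)
# merged == {'top': {'apple': 'value', 'banana': 'secret_value'}}

# Env source still a raw string (decoding deferred to a validator):
# the init dict simply replaces it, and `banana` is lost.
raw = {'top': '{"banana": "secret_value"}'}
clobbered = deep_update(raw, init)
# clobbered == {'top': {'apple': 'value'}}
```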
One option would be to drop support for merging nested objects. Complex environment variables seem like an unusual use case to start with, and needing to merge partially specified ones from different sources seems even more unusual.
I discussed this with @pjbull, and this is the option we like the most: it would simplify the code and remove the need for a deep update (a shallow update would be enough).
Currently, input streams are merged in `__init__` before the validators run. We could try an approach where the universal decoding validator runs on each of the input streams first, and then they get merged afterwards. This doesn't really fit into the existing flow very neatly and would involve making some part of the validation flow more complex, as would the other approaches that @pjbull brainstormed.
I faced this issue today, as well. I wanted to parse something like this
```
REDIS_SENTINELS=192.168.0.1 192.168.0.2
```
using a settings class like this:
```python
class S(BaseSettings):
    sentinels: List[str] = Field(..., env='REDIS_SENTINELS')

    @validator('sentinels', pre=True)
    def validate(cls, val):
        return val.split(' ')
```
Ditto; I'd like to validate a string env var and transform it into a valid submodel. For example, `MY_DB=user:pass@server?max_connections=100` -> a settings model containing a DB submodel with a valid DSN and other settings. Currently, it seems I can only pass `MY_DB` as a stringified JSON object.
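A pre-validator could do that split with the standard library. Here's a hedged sketch (the `MY_DB` format and the output field names are just this example's assumptions):

```python
from urllib.parse import parse_qs, urlsplit


def parse_db_env(raw):
    """Split 'user:pass@server?max_connections=100' into its parts."""
    parts = urlsplit('//' + raw)  # prepend '//' so urlsplit treats it as a netloc
    query = parse_qs(parts.query)
    return {
        'user': parts.username,
        'password': parts.password,
        'host': parts.hostname,
        'max_connections': int(query['max_connections'][0]),
    }


parse_db_env('user:pass@server?max_connections=100')
# {'user': 'user', 'password': 'pass', 'host': 'server', 'max_connections': 100}
```

A `validator('my_db', pre=True)` returning a dict like this would then let pydantic build the submodel from it.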
Given the complexity of these potential use cases, I'm kind of liking something like proposal (1) in the first comment in this thread. I may have some time on Friday to implement it if that approach is interesting.
Hi all,
I also ran into this issue today.
Taking this example into account, after almost a day of digging I realized that the validator is not fired when the class attribute is of a List type. If the attribute type is `str`, for example, the attribute validator works just fine.
I used this code:
```python
import os
from typing import List

from pydantic import BaseSettings, validator

os.environ['test_var'] = 'test_val'


class S1(BaseSettings):
    test_var: str

    @validator('test_var', pre=True)
    def val_func(cls, v):
        print('this validator is called: {}'.format(v))
        return v


class S2(BaseSettings):
    test_var: List[str]

    @validator('test_var', pre=True)
    def val_func(cls, v):
        print('this validator is called: {}'.format(v))
        return [v]
```
Instantiating `s1 = S1()` prints `this validator is called: test_val`, while `s2 = S2()` throws an error and prints nothing:
```
>>> s2 = S2()
Traceback (most recent call last):
  File "pydantic/env_settings.py", line 118, in pydantic.env_settings.BaseSettings._build_environ
  File "/home/den/anaconda3/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/den/anaconda3/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/den/anaconda3/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "pydantic/env_settings.py", line 35, in pydantic.env_settings.BaseSettings.__init__
  File "pydantic/env_settings.py", line 48, in pydantic.env_settings.BaseSettings._build_values
  File "pydantic/env_settings.py", line 120, in pydantic.env_settings.BaseSettings._build_environ
pydantic.env_settings.SettingsError: error parsing JSON for "test_var"
```
Are there any errors in my example code?
@zdens per some of the discussion earlier in the thread (kind of mixed in there with the proposed changes), the reason you don't see the validator firing is that the failing code is the part that parses the environment variables, and that happens before the validators run.
Your variable that is the list needs to be valid JSON, like this:

```
OPTIONS='["a","b","c"]'
```
That's why you see a `JSONDecodeError`.
@jayqi, thank you for your reply!
Now I understand why this thread was started.