Pydantic: Settings: Custom parsing of environment variables (non-JSON)

Created on 30 Apr 2020 · 8 Comments · Source: samuelcolvin/pydantic


Often we want complex environment variables that are not represented as JSON. One example is a list of items. It's not uncommon to comma-delimit lists like this in bash:

import os
from typing import List

from pydantic import BaseSettings

os.environ['options'] = "a,b,c"

class Settings(BaseSettings):
    options: List

s = Settings()

This raises a SettingsError caused by an underlying JSONDecodeError. Writing a list of items as valid JSON is error-prone and not human-friendly to read in the context of environment variables.

Workaround

One workaround is to store the variable as valid JSON, which is tricky to type correctly in many systems where environment variables have to be entered by hand:

OPTIONS='["a","b","c"]'
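
For reference, with the value written as valid JSON the original example works unchanged (same Settings class as above; the print is just a sanity check):

import os
from typing import List

from pydantic import BaseSettings

os.environ['options'] = '["a","b","c"]'  # valid JSON instead of "a,b,c"

class Settings(BaseSettings):
    options: List

s = Settings()
print(s.options)  # ['a', 'b', 'c']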

Another (simplified) workaround is to set json_loads, but it's not very elegant since json_loads doesn't know which field it is parsing, which could be error-prone:

import json
import os
from typing import List

from pydantic import BaseSettings

os.environ['options'] = "a,b,c"

def list_parse_fallback(v):
    # Try JSON first; fall back to comma-separated parsing.
    try:
        return json.loads(v)
    except ValueError:
        return v.split(",")

class Settings(BaseSettings):
    options: List

    class Config:
        json_loads = list_parse_fallback

s = Settings()

I can see a couple of options for implementing a fix:

1. Store the parsing method in the field info extra:

parse_func = lambda x: x.split(",")

class Settings(BaseSettings):
    # env_parse is the new Field keyword being proposed here (it does not exist today)
    options: List = Field(..., env_parse=parse_func)

If we take this approach, I think that we can update this branch:
https://github.com/samuelcolvin/pydantic/blob/master/pydantic/env_settings.py#L60-L62

Adding something like the following:

if field.is_complex():
    if field.field_info.extra.get("env_parse") is not None:
        try:
            env_val = field.field_info.extra["env_parse"](env_val)  # type: ignore
        except ValueError as e:
            raise SettingsError(f'error with custom parsing function for "{env_name}"') from e
    else:
        try:
            env_val = self.__config__.json_loads(env_val)  # type: ignore
        except ValueError as e:
            raise SettingsError(f'error parsing JSON for "{env_name}"') from e
d[field.alias] = env_val

2. Add a new config option just for Settings for overriding how env vars are parsed

Another implementation option is to add a new property like Settings.Config.parse_env_var, which takes the field and the raw value so that it can be overridden to dispatch to different parsing methods based on the field's name or other properties. (Currently, overriding json_loads means you are passed a value without knowing where it will be stored, so you have to detect and handle all of the possible settings values.)

class BaseSettings:
    ...
    class Config:
        ...
        @classmethod
        def parse_env_var(cls, field, raw_val):
            return cls.json_loads(raw_val)

Then the following line is the only change that is needed:
https://github.com/samuelcolvin/pydantic/blob/master/pydantic/env_settings.py#L62

It would change to self.__config__.parse_env_var(field, env_val).
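
From the user side, overriding the hook might then look like this (a sketch only; parse_env_var does not exist today and its exact signature is part of this proposal):

from typing import List

from pydantic import BaseSettings

class Settings(BaseSettings):
    options: List[str]

    class Config:
        @classmethod
        def parse_env_var(cls, field, raw_val):
            # Hypothetical hook from this proposal: dispatch per field instead
            # of one json_loads call for every complex value.
            if field.name == 'options':
                return raw_val.split(',')
            return cls.json_loads(raw_val)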

3. Call field validators on the raw string instead of trying to load from json first

Change the same line to:

env_val, ee = field.validate(env_val)

# collect ee and raise errors

Pros:

  • Adding validators for fields is well documented / understood

Cons:

  • Breaks existing JSON functionality if those fields already have validators
  • Mixes up the abstractions in that the functions would now do parsing and validation
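
Under this option, user code would just be an ordinary pre-validator that receives the raw environment string (a sketch of the intended usage; it does not work this way in current pydantic, where JSON decoding happens first):

from typing import List

from pydantic import BaseSettings, validator

class Settings(BaseSettings):
    options: List[str]

    @validator('options', pre=True)
    def split_commas(cls, v):
        # Under option 3, v would be the raw string from the environment.
        if isinstance(v, str):
            return v.split(',')
        return v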

4. Custom (de)serialization

Let fields implement custom serialization/deserialization methods. Currently there is json_encoders but not an equivalent json_decoders for use per-field.

There's some discussion of this here: https://github.com/samuelcolvin/pydantic/issues/951
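
Purely for illustration (neither the name json_decoders nor this shape exists in pydantic today), a per-field decoder config might look like:

from typing import List

from pydantic import BaseSettings

class Settings(BaseSettings):
    options: List[str]

    class Config:
        # Hypothetical: per-field decoders keyed by field name,
        # mirroring json_encoders. Not a real pydantic option.
        json_decoders = {
            'options': lambda raw: raw.split(','),
        }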

5. Something else

Other ideas? Happy to implement a different suggestion.


Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":

             pydantic version: 1.5.1
            pydantic compiled: True
                 install path: /Users/bull/miniconda3/envs/sandbox/lib/python3.7/site-packages/pydantic
               python version: 3.7.6 (default, Jan  8 2020, 13:42:34)  [Clang 4.0.1 (tags/RELEASE_401/final)]
                     platform: Darwin-19.4.0-x86_64-i386-64bit
     optional deps. installed: []

Labels: feature request, help wanted

All 8 comments

I think we should remove the if field.is_complex() bit of BaseSettings and replace it with a universal validator('*', pre=True) which tries decoding JSON, but in the case of lists also supports comma-separated values.

That way you could easily override the validator if you so wished.

I think this should be backwards compatible, so it could be done before v2. PR welcome.
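
Roughly something like this (untested sketch; the is_complex() decoding in _build_environ would be removed, and the comma fallback for list-shaped fields shown here is only illustrative):

import json

from pydantic import BaseModel, validator
from pydantic.fields import SHAPE_LIST

class BaseSettings(BaseModel):

    ...

    @validator('*', pre=True)
    def _decode_env_value(cls, v, field, config):
        if isinstance(v, str) and field.is_complex():
            try:
                return config.json_loads(v)
            except ValueError:
                # Fall back to comma-separated values for list-shaped fields.
                if field.shape == SHAPE_LIST:
                    return [item.strip() for item in v.split(',')]
                raise
        return v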

Hi @samuelcolvin, I took a stab at your suggested implementation of moving the json decoding out of the _build_environ method and into a universal validator. Here is what that looks like (before trying to add in any parsing of comma-separated lists):

class BaseSettings(BaseModel):

    ...

    @validator('*', pre=True)
    def validate_env_vars(cls, v, config, field):
        if isinstance(v, str) and field.is_complex():
            try:
                return config.json_loads(v)  # type: ignore
            except ValueError as e:
                raise SettingsError(f'error parsing JSON for "{field}"') from e
        return v

Unfortunately, this implementation ran into a problem with two tests: test_nested_env_with_basemodel and test_nested_env_with_dict.

These two tests have a field top that is an inner mapping with fields apple and banana. These two fields are specified in different places and stitched together: the dict {'apple': 'value'} is passed into the constructor and the string '{"banana": "secret_value"}' is set as an environment variable. In the previous implementation, everything happens inside __init__: _build_environ runs first, which decodes the banana JSON into a dict, and then the two are stitched together with the deep_update function.

Moving the JSON decoding into a validator breaks this functionality because validators run after the input sources have already been merged. Because the banana mapping is still a raw string at merge time, deep_update isn't able to combine the two values, and the apple dict simply replaces it.
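
Here is a minimal illustration with pydantic.utils.deep_update (the merge order follows _build_values, where init kwargs update the decoded environment):

from pydantic.utils import deep_update

# Old flow: _build_environ has already decoded the env var, so both values
# for 'top' are dicts and deep_update merges them.
deep_update({'top': {'banana': 'secret_value'}}, {'top': {'apple': 'value'}})
# -> {'top': {'banana': 'secret_value', 'apple': 'value'}}

# With decoding moved into a validator, the env value is still a raw string
# when the sources are merged, so the init kwarg replaces it outright.
deep_update({'top': '{"banana": "secret_value"}'}, {'top': {'apple': 'value'}})
# -> {'top': {'apple': 'value'}}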

Option A: Drop support for merging nested objects

One option would be to drop support for merging nested objects. Complex environment variables seem like an unusual use case to start with, and needing to merge partially specified ones from different sources seems even more unusual.

I discussed this with @pjbull and this is the option we like most: it would simplify the code and remove the need for a deep update (a shallow update would be enough).

Option B: Merging input streams after the universal decoding validator

Currently, input streams are merged in __init__ before the validators run. We could try an approach where the universal decoding validator runs on each of the input streams first, and they are then merged afterwards.

This doesn't really fit into the existing flow very neatly and would involve making some part of the validation flow more complex.

Option C: Keeping decoding in _build_environ and use a different approach

Such as the approaches that @pjbull brainstormed.

I faced this issue today, as well. I wanted to parse something like this

REDIS_SENTINELS=192.168.0.1 192.168.0.2

using such a settings class:

from typing import List

from pydantic import BaseSettings, Field, validator

class S(BaseSettings):
    sentinels: List[str] = Field(..., env='REDIS_SENTINELS')

    # Named split_sentinels rather than validate to avoid shadowing BaseModel.validate.
    @validator('sentinels', pre=True)
    def split_sentinels(cls, val):
        return val.split(' ')

Ditto, I'd like to validate a string env var and transform it into a valid submodel. For example, MY_DB=user:pass@server?max_connections=100 -> a settings model containing a DB submodel with valid DSN and other settings. Currently, it seems I can only pass MY_DB as a stringified JSON object.
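
For concreteness, the shape I'm after is something like this (names are illustrative, and the DSN parsing itself is exactly the part that currently has no hook):

from pydantic import BaseModel, BaseSettings, Field

class DBSettings(BaseModel):
    dsn: str
    max_connections: int = 10

class Settings(BaseSettings):
    # Desired: MY_DB=user:pass@server?max_connections=100 parsed into DBSettings.
    # Today MY_DB would have to be a stringified JSON object such as
    # '{"dsn": "user:pass@server", "max_connections": 100}'.
    my_db: DBSettings = Field(..., env='MY_DB')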

Given the complexity of these potential use cases, I'm kind of liking something like proposal (1) in the first comment of this thread. I may have some time on Friday to implement it if that approach is interesting.

Hi all,
I also ran into this issue today.

Taking the example above into account: after almost a day of digging I realized that the validator is not fired when the class attribute is a List type. If the attribute type is str, for example, the validator works just fine.

I used this code:

import os
from typing import List
from pydantic import BaseSettings, validator

os.environ['test_var'] = 'test_val'


class S1(BaseSettings):
    test_var: str

    @validator('test_var', pre=True)
    def val_func(cls, v):
        print('this validator is called: {}'.format(v))
        return v

class S2(BaseSettings):
    test_var: List[str]

    @validator('test_var', pre=True)
    def val_func(cls, v):
        print('this validator is called: {}'.format(v))
        return [v]

Then, instantiating s1 = S1() prints this validator is called: test_val, while s2 = S2() throws an error and prints nothing:

>>> s2 = S2()
Traceback (most recent call last):
  File "pydantic/env_settings.py", line 118, in pydantic.env_settings.BaseSettings._build_environ
  File "/home/den/anaconda3/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/den/anaconda3/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/den/anaconda3/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "pydantic/env_settings.py", line 35, in pydantic.env_settings.BaseSettings.__init__
  File "pydantic/env_settings.py", line 48, in pydantic.env_settings.BaseSettings._build_values
  File "pydantic/env_settings.py", line 120, in pydantic.env_settings.BaseSettings._build_environ
pydantic.env_settings.SettingsError: error parsing JSON for "test_var"

Are there any errors in my example code?

@zdens, per some of the discussion earlier in the thread (kind of mixed in with the proposed changes): the reason you don't see the validator firing is that the failing code is the part that parses the environment variables, and that runs before the validators.

The variable that holds the list needs to be valid JSON, like this:

OPTIONS='["a","b","c"]'

That's why you see a JSONDecodeError.
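
Applied to your example, making the value valid JSON lets the pre-validator fire on the already-decoded list (note that it receives a list, so there is no need to wrap it again):

import os
from typing import List

from pydantic import BaseSettings, validator

os.environ['test_var'] = '["test_val"]'  # valid JSON

class S2(BaseSettings):
    test_var: List[str]

    @validator('test_var', pre=True)
    def val_func(cls, v):
        print('this validator is called: {}'.format(v))
        return v  # v is already a list here

s2 = S2()  # prints: this validator is called: ['test_val']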

@jayqi, thank you for your reply!
Now I understand why this thread was started.

