Kedro: [KED-1708] Versioned datasets throw filename syntax error on Windows-latest

Created on 27 May 2020 · 9 Comments · Source: quantumblacklabs/kedro

Description

While developing a Kedro plugin, my matrix tests started failing on Windows for versioned datasets because of how Windows paths are handled. For now I have turned off versioning, and the tests pass.

Error

Full traceback further down 👇

[WinError 123] The filename, directory name, or volume label syntax is incorrect: 'C:/Users/runneradmin/AppData/Local/Temp/pytest-of-runneradmin/pytest-0/test_after_cleaned_None_None_0/raw/cars.csv/2020-05-25T21.15.02.127Z/C:'

Full report of the failing test:

https://github.com/WaylonWalker/steel-toes/runs/707423172?check_suite_focus=true

Context

This is not greatly impacting me, but I wanted to raise it as an issue I discovered: I am not able to matrix test versioned datasets. I suspect it affects some Windows users more seriously.

Steps to Reproduce

  1. Run the tests in https://github.com/WaylonWalker/steel-toes/blob/master/tests/cli/context/test_context.py

Longer snippet of a single failure

2020-05-25T21:15:02.9888576Z c:\hostedtoolcache\windows\python\3.7.7\x64\lib\os.py:223: OSError
2020-05-25T21:15:02.9888752Z 
2020-05-25T21:15:02.9888962Z The above exception was the direct cause of the following exception:
2020-05-25T21:15:02.9889130Z 
2020-05-25T21:15:02.9889405Z branched_dummy_context = <tests.cli.context.conftest.DummyContext object at 0x000002334A1D7048>
2020-05-25T21:15:02.9889623Z dummy_dataframe =    col1  col2  col3
2020-05-25T21:15:02.9889810Z 0     1     4     5
2020-05-25T21:15:02.9890007Z 1     2     5     6
2020-05-25T21:15:02.9890164Z 
2020-05-25T21:15:02.9890347Z     @pytest.fixture
2020-05-25T21:15:02.9890629Z     def ready_branched_dummy_context(branched_dummy_context, dummy_dataframe):
2020-05-25T21:15:02.9890919Z         "gets dummy ready by placing a dummy dataframe at every input edge node"
2020-05-25T21:15:02.9891140Z         print("ready branched")
2020-05-25T21:15:02.9891354Z         for dataset in branched_dummy_context.pipeline.inputs():
2020-05-25T21:15:02.9891566Z             d = getattr(branched_dummy_context.catalog.datasets, dataset)
2020-05-25T21:15:02.9891780Z >           d.save(dummy_dataframe)
2020-05-25T21:15:02.9891985Z 
2020-05-25T21:15:02.9892183Z tests\cli\context\conftest.py:299: 
2020-05-25T21:15:02.9892381Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2020-05-25T21:15:02.9892642Z c:\hostedtoolcache\windows\python\3.7.7\x64\lib\site-packages\kedro\io\core.py:625: in save
2020-05-25T21:15:02.9892885Z     super().save(data)
2020-05-25T21:15:02.9893516Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2020-05-25T21:15:02.9893714Z 
2020-05-25T21:15:02.9894400Z self = <kedro.extras.datasets.pandas.csv_dataset.CSVDataSet object at 0x000002334A6BEE08>
2020-05-25T21:15:02.9894659Z data =    col1  col2  col3
2020-05-25T21:15:02.9894865Z 0     1     4     5
2020-05-25T21:15:02.9895092Z 1     2     5     6
2020-05-25T21:15:02.9895240Z 
2020-05-25T21:15:02.9895469Z     def save(self, data: Any) -> None:
2020-05-25T21:15:02.9895708Z         """Saves data by delegation to the provided save method.
2020-05-25T21:15:02.9895905Z     
2020-05-25T21:15:02.9896197Z         Args:
2020-05-25T21:15:02.9896409Z             data: the value to be saved by provided save method.
2020-05-25T21:15:02.9896643Z     
2020-05-25T21:15:02.9896843Z         Raises:
2020-05-25T21:15:02.9897082Z             DataSetError: when underlying save method raises error.
2020-05-25T21:15:02.9897288Z     
2020-05-25T21:15:02.9897522Z         """
2020-05-25T21:15:02.9897717Z     
2020-05-25T21:15:02.9897916Z         if data is None:
2020-05-25T21:15:02.9898170Z             raise DataSetError("Saving `None` to a `DataSet` is not allowed")
2020-05-25T21:15:02.9898381Z     
2020-05-25T21:15:02.9898591Z         try:
2020-05-25T21:15:02.9898836Z             self._logger.debug("Saving %s", str(self))
2020-05-25T21:15:02.9899051Z             self._save(data)
2020-05-25T21:15:02.9899279Z         except DataSetError:
2020-05-25T21:15:02.9899486Z             raise
2020-05-25T21:15:02.9899687Z         except Exception as exc:
2020-05-25T21:15:02.9899938Z             message = "Failed while saving data to data set {}.\n{}".format(
2020-05-25T21:15:02.9900160Z                 str(self), str(exc)
2020-05-25T21:15:02.9900384Z             )
2020-05-25T21:15:02.9900583Z >           raise DataSetError(message) from exc
2020-05-25T21:15:02.9901990Z E           kedro.io.core.DataSetError: Failed while saving data to data set CSVDataSet(filepath=C:\Users\runneradmin\AppData\Local\Temp\pytest-of-runneradmin\pytest-0\test_after_cleaned_None_None_0\raw\cars.csv, protocol=file, save_args={'index': False}, version=Version(load=None, save='2020-05-25T21.15.02.127Z')).
2020-05-25T21:15:02.9902834Z E           [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'C:/Users/runneradmin/AppData/Local/Temp/pytest-of-runneradmin/pytest-0/test_after_cleaned_None_None_0/raw/cars.csv/2020-05-25T21.15.02.127Z/C:'

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

You can find the full workflow here https://github.com/WaylonWalker/steel-toes/blob/master/.github/workflows/test.yml

  • Kedro version used (pip show kedro or kedro -V): 0.16.1
  • Python version used (python -V): 3.7
  • Operating system and version: Windows-latest
Bug Report


All 9 comments

Thanks Waylon, I've logged a ticket so the team can track this.

Adding some thoughts from my investigation, for the record:

In conftest.py, this line constructs the path, which will be a WindowsPath. Its string form is then passed to PurePosixPath in the CSVDataSet constructor. PurePosixPath doesn't split the path (because it contains no / separators) and treats the whole thing as one huge filename. As a result, PurePosixPath.name, called inside AbstractVersionedDataSet, returns the whole path instead of just the filename.
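The mechanism can be reproduced with the standard library alone, without Kedro. This is a minimal sketch: the sample path and version string are illustrative, and `versioned_path` is a hypothetical helper that only mimics the `<filepath>/<version>/<filename>` layout visible in the error message above; it is not Kedro's actual code.

```python
from pathlib import PurePosixPath, PureWindowsPath

# A Windows-style absolute path, as str(tmp_path / ...) produces on Windows.
win_path = r"C:\Users\runneradmin\raw\cars.csv"
version = "2020-05-25T21.15.02.127Z"


def versioned_path(path: PurePosixPath, version: str) -> PurePosixPath:
    """Hypothetical helper: builds <filepath>/<version>/<filename>,
    mirroring the layout seen in the WinError 123 message."""
    return path / version / path.name


posix = PurePosixPath(win_path)
# Backslashes are not separators under POSIX rules, so there is one component.
print(posix.parts)
# ...and .name is therefore the entire Windows path, not "cars.csv".
print(posix.name == win_path)

# The versioned path ends with the full Windows path appended a second time,
# which is where the invalid "C:" component in the error comes from.
print(versioned_path(posix, version))

# Interpreting the same string with Windows rules splits it as intended.
print(PureWindowsPath(win_path).name)
```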

As for the solutions:

  1. For your use case, the immediate solution might be to change the line in conftest.py to something like pathlib.PurePosixPath(tmp_path) / layer / f"{dataset}.csv"
  2. On the Kedro side, we should give some love to Windows paths, which are not really supported as of now.
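One way to apply the idea behind option 1 is to hand Kedro a forward-slash path string. The sketch below swaps in `PureWindowsPath(...).as_posix()` rather than wrapping `tmp_path` in `PurePosixPath` directly, so it is a variant of the suggestion, not the exact line proposed. `posix_filepath` is a hypothetical helper, and the `layer` and `dataset` values are illustrative.

```python
from pathlib import PurePosixPath, PureWindowsPath


def posix_filepath(base: str, layer: str, dataset: str) -> str:
    """Hypothetical helper: build '<base>/<layer>/<dataset>.csv' using
    forward slashes, so POSIX-style path handling can split it."""
    return (PureWindowsPath(base) / layer / f"{dataset}.csv").as_posix()


# Example base directory in the shape pytest's tmp_path takes on Windows.
base = r"C:\Users\runneradmin\AppData\Local\Temp\pytest-0"
filepath = posix_filepath(base, "raw", "cars")
print(filepath)  # C:/Users/runneradmin/AppData/Local/Temp/pytest-0/raw/cars.csv

# With forward slashes, POSIX rules now recover the real filename.
print(PurePosixPath(filepath).name)  # cars.csv
```

In a real test on Windows, `tmp_path` is a concrete `WindowsPath`, so `(tmp_path / layer / f"{dataset}.csv").as_posix()` should give the same string without the helper.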

While running the Kedro Spaceflights tutorial under a similar configuration (Kedro 0.16.2, up-to-date Windows 10.0.17763), I hit the same error whenever I enabled versioning on the pickle.PickleDataSet regressor or the pandas.CSVDataSet master_table.

kedro.io.core.DataSetError: Failed while saving data to data set PickleDataSet(backend=pickle, filepath=C:\Users\cgaydon\Documents\Working Materials\Kedro Tutorials\kedro-tutorial\data\06_models\regressor.pickle, protocol=file, version=Version(load=None, save='2020-06-17T12.34.35.775Z')).
[WinError 123] The filename, directory name, or volume label syntax is incorrect: 'C:/Users/cgaydon/Documents/Working Materials/Kedro Tutorials/kedro-tutorial/data/06_models/regressor.pickle/2020-06-17T12.34.35.775Z/C:'

I'm not sure the issue can be solved on the Windows user's side, since path handling happens under Kedro's hood. As of now, I think Windows users cannot complete the Spaceflights tutorial with versioning enabled. More love for Windows paths would definitely be appreciated!

@CharlesGaydon Can you try setting C:/Users/cgaydon/Documents/Working Materials/Kedro Tutorials/kedro-tutorial/data/06_models/regressor.pickle as the filepath for this dataset in your catalog.yml? I suspect the error is due to how Kedro expands relative paths on Windows.

@DmitriiDeriabinQB I tried this, and it worked like a charm. Thank you very much 😄

@WaylonWalker Thank you for reporting this issue. It was fixed in 390c02fbaf3a2801f06b5bbab3e7ec3650785c56 commit.

Is the latest code (commit 390c02f) available in the latest version, 0.16.2? I have the latest version of Kedro installed (in a conda environment) and do not see these code changes in it.
I believe pip install kedro -U will only run pip install -r requirements.txt.

@abishtcca This fix will be available in the upcoming 0.16.3 release.

@abishtcca We have just released Kedro 0.16.3 :)
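To check whether an installed Kedro already includes the fix, the installed version can be compared against 0.16.3, the release named in this thread. This is a minimal sketch: `release_tuple` and `has_windows_path_fix` are hypothetical helpers, and `importlib.metadata` requires Python 3.8+ (on 3.7, pip show kedro or kedro -V gives the same information).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Release that ships the fix from commit 390c02f, per this thread.
FIXED_IN = (0, 16, 3)


def release_tuple(ver: str) -> tuple:
    """Parse the leading numeric release segment, e.g. '0.16.2' -> (0, 16, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_windows_path_fix(ver: str) -> bool:
    """True if the version is at or after the release containing the fix."""
    return release_tuple(ver) >= FIXED_IN


if __name__ == "__main__":
    try:
        installed = version("kedro")
    except PackageNotFoundError:
        print("kedro is not installed")
    else:
        print(installed, "includes the Windows path fix:", has_windows_path_fix(installed))
```

Python compares tuples element by element, so (0, 16, 2) sorts before (0, 16, 3) and (0, 17, 0) after it.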


Related issues

tamsanh · 3 comments

WaylonWalker · 3 comments

f-istvan · 3 comments

WaylonWalker · 3 comments

yetudada · 3 comments