While developing a Kedro plugin, my matrix tests started failing on versioned datasets because of Windows path handling. For now I have turned off versioning, and the tests pass.
Full traceback further down 👇
[WinError 123] The filename, directory name, or volume label syntax is incorrect: 'C:/Users/runneradmin/AppData/Local/Temp/pytest-of-runneradmin/pytest-0/test_after_cleaned_None_None_0/raw/cars.csv/2020-05-25T21.15.02.127Z/C:'
https://github.com/WaylonWalker/steel-toes/runs/707423172?check_suite_focus=true
This is not greatly impacting me, but I wanted to raise the issue since I discovered it. I am not able to matrix test versioned datasets. I suspect there are Windows users whom this would affect in a bigger way.
2020-05-25T21:15:02.9888576Z c:\hostedtoolcache\windows\python\3.7.7\x64\lib\os.py:223: OSError
2020-05-25T21:15:02.9888752Z
2020-05-25T21:15:02.9888962Z The above exception was the direct cause of the following exception:
2020-05-25T21:15:02.9889130Z
2020-05-25T21:15:02.9889405Z branched_dummy_context = <tests.cli.context.conftest.DummyContext object at 0x000002334A1D7048>
2020-05-25T21:15:02.9889623Z dummy_dataframe = col1 col2 col3
2020-05-25T21:15:02.9889810Z 0 1 4 5
2020-05-25T21:15:02.9890007Z 1 2 5 6
2020-05-25T21:15:02.9890164Z
2020-05-25T21:15:02.9890347Z @pytest.fixture
2020-05-25T21:15:02.9890629Z def ready_branched_dummy_context(branched_dummy_context, dummy_dataframe):
2020-05-25T21:15:02.9890919Z "gets dummy ready by placing a dummy dataframe at every input edge node"
2020-05-25T21:15:02.9891140Z print("ready branched")
2020-05-25T21:15:02.9891354Z for dataset in branched_dummy_context.pipeline.inputs():
2020-05-25T21:15:02.9891566Z d = getattr(branched_dummy_context.catalog.datasets, dataset)
2020-05-25T21:15:02.9891780Z > d.save(dummy_dataframe)
2020-05-25T21:15:02.9891985Z
2020-05-25T21:15:02.9892183Z tests\cli\context\conftest.py:299:
2020-05-25T21:15:02.9892381Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2020-05-25T21:15:02.9892642Z c:\hostedtoolcache\windows\python\3.7.7\x64\lib\site-packages\kedro\io\core.py:625: in save
2020-05-25T21:15:02.9892885Z super().save(data)
2020-05-25T21:15:02.9893516Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2020-05-25T21:15:02.9893714Z
2020-05-25T21:15:02.9894400Z self = <kedro.extras.datasets.pandas.csv_dataset.CSVDataSet object at 0x000002334A6BEE08>
2020-05-25T21:15:02.9894659Z data = col1 col2 col3
2020-05-25T21:15:02.9894865Z 0 1 4 5
2020-05-25T21:15:02.9895092Z 1 2 5 6
2020-05-25T21:15:02.9895240Z
2020-05-25T21:15:02.9895469Z def save(self, data: Any) -> None:
2020-05-25T21:15:02.9895708Z """Saves data by delegation to the provided save method.
2020-05-25T21:15:02.9895905Z
2020-05-25T21:15:02.9896197Z Args:
2020-05-25T21:15:02.9896409Z data: the value to be saved by provided save method.
2020-05-25T21:15:02.9896643Z
2020-05-25T21:15:02.9896843Z Raises:
2020-05-25T21:15:02.9897082Z DataSetError: when underlying save method raises error.
2020-05-25T21:15:02.9897288Z
2020-05-25T21:15:02.9897522Z """
2020-05-25T21:15:02.9897717Z
2020-05-25T21:15:02.9897916Z if data is None:
2020-05-25T21:15:02.9898170Z raise DataSetError("Saving `None` to a `DataSet` is not allowed")
2020-05-25T21:15:02.9898381Z
2020-05-25T21:15:02.9898591Z try:
2020-05-25T21:15:02.9898836Z self._logger.debug("Saving %s", str(self))
2020-05-25T21:15:02.9899051Z self._save(data)
2020-05-25T21:15:02.9899279Z except DataSetError:
2020-05-25T21:15:02.9899486Z raise
2020-05-25T21:15:02.9899687Z except Exception as exc:
2020-05-25T21:15:02.9899938Z message = "Failed while saving data to data set {}.\n{}".format(
2020-05-25T21:15:02.9900160Z str(self), str(exc)
2020-05-25T21:15:02.9900384Z )
2020-05-25T21:15:02.9900583Z > raise DataSetError(message) from exc
2020-05-25T21:15:02.9901990Z E kedro.io.core.DataSetError: Failed while saving data to data set CSVDataSet(filepath=C:\Users\runneradmin\AppData\Local\Temp\pytest-of-runneradmin\pytest-0\test_after_cleaned_None_None_0\raw\cars.csv, protocol=file, save_args={'index': False}, version=Version(load=None, save='2020-05-25T21.15.02.127Z')).
2020-05-25T21:15:02.9902834Z E [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'C:/Users/runneradmin/AppData/Local/Temp/pytest-of-runneradmin/pytest-0/test_after_cleaned_None_None_0/raw/cars.csv/2020-05-25T21.15.02.127Z/C:'
Include as many relevant details about the environment in which you experienced the bug:
You can find the full workflow here https://github.com/WaylonWalker/steel-toes/blob/master/.github/workflows/test.yml
Kedro version (pip show kedro or kedro -V): 0.16.1
Python version (python -V): 3.7
Thanks Waylon, I've logged a ticket so the team can track this.
Will add some thoughts after some investigation for the record:
In conftest.py this line constructs the path, which on Windows will be a WindowsPath. Its string form is then passed to PurePosixPath in the CSVDataSet constructor. PurePosixPath doesn't split the path (because there are no / in it) and treats everything as one huge filename. As a result, PurePosixPath.name, called inside AbstractVersionedDataSet, returns the whole path rather than just the filename.
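A minimal sketch of the behaviour described above, using only the standard library. The path below is a shortened, hypothetical stand-in for the one in the traceback, and the snippet does not call Kedro at all; it only shows how the two pure path flavours parse the same backslash-separated string.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical, shortened Windows-style path (not the real one from the CI run).
windows_style = r"C:\Users\runneradmin\raw\cars.csv"

# Parsed as a POSIX path, the string has no "/" separators,
# so the entire string is treated as a single path component.
posix_view = PurePosixPath(windows_style)
print(posix_view.parts)  # one component holding the whole string
print(posix_view.name)   # the whole string, not just "cars.csv"

# Parsed as a Windows path, backslashes are recognised as separators.
windows_view = PureWindowsPath(windows_style)
print(windows_view.name)        # cars.csv
print(windows_view.as_posix())  # C:/Users/runneradmin/raw/cars.csv
```

Pure paths never touch the filesystem, so this runs the same way on any operating system.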
As for the solutions:
pathlib.PurePosixPath(tmp_path) / layer / f"{dataset}.csv"
While running the Kedro Spaceflights tutorial under a similar configuration (Kedro 0.16.2, up-to-date Windows 10.0.17763), I encountered the same error whenever I tried versioning the pickle.PickleDataSet regressor or the pandas.CSVDataSet master_table.
kedro.io.core.DataSetError: Failed while saving data to data set PickleDataSet(backend=pickle, filepath=C:\Users\cgaydon\Documents\Working Materials\Kedro Tutorials\kedro-tutorial\data\06_models\regressor.pickle, protocol=file, version=Version(load=None, save='2020-06-17T12.34.35.775Z')).
[WinError 123] La syntaxe du nom de fichier, de répertoire ou de volume est incorrecte: 'C:/Users/cgaydon/Documents/Working Materials/Kedro Tutorials/kedro-tutorial/data/06_models/regressor.pickle/2020-06-17T12.34.35.775Z/C:'
I am not sure the issue can be solved from the user side on Windows, as path handling happens under Kedro's hood. I think that, as of now, the Spaceflights tutorial cannot be completed with versioning by a Windows user. More love for Windows paths would definitely be appreciated!
@CharlesGaydon Can you try putting C:/Users/cgaydon/Documents/Working Materials/Kedro Tutorials/kedro-tutorial/data/06_models/regressor.pickle as a filepath for this dataset into your catalog.yml? I have a suspicion the error is due to how Kedro expands relative paths on Windows.
@DmitriiDeriabinQB I tried this, and it worked like a charm, thank you very much 😄
@WaylonWalker Thank you for reporting this issue. It was fixed in commit 390c02fbaf3a2801f06b5bbab3e7ec3650785c56.
Is the latest code (commit 390c02f) available in the latest version, 0.16.2? I have the latest version of Kedro (installed in a conda environment) and do not see the above code changes in it.
I believe Kedro pip install kedro -U will only run pip install -r requirements.txt
@abishtcca This fix will be available in the upcoming 0.16.3 release.
@abishtcca We have just released Kedro 0.16.3 :)