User reported that our xml_to_tsv.py fails on python2 because of encoding issues in open().
https://discordapp.com/channels/485586884165107732/485586884165107734/558296626381324324
Reminder: the code for the tutorial is stored at https://github.com/iterative/dvc-doc-tutorial and should be deployed from there.
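For reference, here is a minimal sketch of one common way to make file I/O behave the same on both interpreters. On Python 2 the built-in open() does not accept an `encoding` keyword, while io.open does and matches the Python 3 built-in. The helper names `read_text` and `write_text` are hypothetical and this is not the actual code from xml_to_tsv.py, just an illustration of the general technique.

```python
# -*- coding: utf-8 -*-
# Hypothetical helpers for illustration; not taken from xml_to_tsv.py.
import io
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a text file as unicode on both Python 2 and Python 3.

    io.open accepts `encoding` on both versions. On Python 2 the built-in
    open() raises TypeError when given that keyword.
    """
    with io.open(path, "r", encoding=encoding) as fd:
        return fd.read()


def write_text(path, text, encoding="utf-8"):
    """Write unicode text with an explicit encoding on both versions."""
    with io.open(path, "w", encoding=encoding) as fd:
        fd.write(text)


if __name__ == "__main__":
    # Round-trip a non-ASCII string through a temporary file.
    tmp_dir = tempfile.mkdtemp()
    sample = os.path.join(tmp_dir, "sample.txt")
    write_text(sample, u"caf\u00e9")
    print(repr(read_text(sample)))
```

Note that on Python 2 a file opened with io.open in text mode expects unicode objects, which is why the sample string uses a `u""` literal.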
IIRC I ran into the same issue and did not say anything. On my systems python2 is the default, so my scripts sometimes start with:
PYTHON=python3.6
PIP=pip3
And then in the script to set up the corresponding tutorial I have this:
${PIP} install -r code/requirements.txt
...
dvc run -d data/Posts.xml -d code/xml_to_tsv.py -d code/conf.py \
-o data/Posts.tsv \
${PYTHON} code/xml_to_tsv.py
Hey @efiop, I forked and cloned the repo, ran xml_to_tsv.py on my system, and got this error:
Input file data\Posts.xml does not exist
Usage:
python posts_to_tsv.py
Is there anything else I need to provide? Please let me know!
@AJ-54 Do you have data\Posts.xml though?
No @efiop, I cloned https://github.com/iterative/dvc-doc-tutorial and it does not contain a data\Posts.xml file.
@AJ-54 You can't use that code as a standalone script. It is for https://dvc.org/doc/tutorial. https://github.com/iterative/dvc-doc-tutorial is just the repo that the code is deployed from.
Thanks @efiop, I understand now. I searched for posts.xml throughout https://github.com/iterative/dvc.org but could not find the file. Can you help me out here?
@AJ-54 It is described in the tutorial itself https://dvc.org/doc/tutorial/define-ml-pipeline :slightly_smiling_face: The point of this ticket is to go through the full tutorial on python2 to make sure that it works.
Thanks for the help @efiop. I downloaded posts.xml, created the required data folder, and placed the file in it. I then ran xml_to_tsv.py with Python 2, and it completed without any errors, producing a file named Posts.tsv in the same data folder. So I think there is no error anywhere in the code.
@AJ-54 Thanks for trying it out! :slightly_smiling_face: But the point of this ticket is to go through the _full tutorial_ on python 2, not only that single script.
@efiop yes, I am on my way to do that.
Can I work on this?
@kurianbenoy sure! thanks for your help! :)
I have been trying to run the tutorial on Python 2. I have encountered some issues so far:
$ dvc run -d code/featurization.py -d code/conf.py \
-d data/Posts-train.tsv -d data/Posts-test.tsv \
-o data/matrix-train.p -o data/matrix-test.p \
python code/featurization.py
$ git clone https://github.com/dmpetrov/classify.git fails because the repo does not exist. After looking around, I think this is the new repo. @dmpetrov can you please confirm?
Hi! Is this still something we care about? I think Python 2 will be deprecated in 2 months. Just checking.
@jorgeorpinel yep, let's close it. Maybe update our docs to specify that we haven't tested on Python 2 and do not recommend using it?
OK, adding notes in a new PR that will close this issue.
I guess I'll just close it now.