dvc.org: build interactive lessons with Katacoda

Created on 10 Aug 2019 · 17 Comments · Source: iterative/dvc.org

Katacoda allows you to build interactive tutorials like these: https://www.katacoda.com/courses/machine-learning

It may be useful to build some basic tutorials for DVC. For example get-started may be covered by 2 or 3 such interactive tutorials (data tracking, pipeline, experiments). Other more advanced tutorials may be added as well, which demonstrate other features of DVC.

These interactive tutorials give new users quick hands-on access to DVC, so that they can try it and get a feel for it and its features without going through the trouble of installing it. This also simplifies the get-started tutorial, because installation instructions no longer have to be part of it. (Installation instructions can be moved to their own page under the user-guide.)

Katacoda also allows embedding a pre-configured interactive environment in a web page. This may be very useful for trying examples on command-reference pages.

doc-content

This sounds great! We definitely need something like this.

I have started with Katacoda scenarios, but it is still WIP: https://katacoda.com/dvc/

@shcheklein I am not done yet with the interactive lessons/tutorials/examples, but the lessons about the basic concepts are almost finished. You can review them if you wish.

I am planning to add one more basics lesson about get/import, and then some tutorials and interactive examples. They may take me another week or two.

A few comments so far:

  1. You are not using or mentioning virtualenv/pyenv, which encourages bad practices. You also don't mention that there are other ways to install DVC.
  2. I don't really like the --no-scm approach. It's a really rare edge case, yet you start with it instead of explaining the actual basics, of which Git is an essential part.
  3. sed -i file2.txt.dvc -e s/file1.txt/file2.txt/ is super advanced stuff.
  4. dvc move: again, I believe it's better to explain what a DVC-file is and that it can be edited
    (and I'm not sure that moving DVC and/or data files should come at the very beginning).
  5. dvc add -R is a super advanced command that you usually need only if you understand why you need it; in 99% of cases you need dvc add dir.
  6. In Tracking Data Versions, why not introduce it with Git? I'm not sure why you would complicate it with --no-scm.

@shcheklein Maybe you are missing the whole picture, or maybe I have not explained it well.

I think that for installation we should have a separate top-level page that explains all the possible ways of installation, including virtualenv / pyenv (https://github.com/iterative/dvc.org/issues/656).
By the way, virtualenv / pyenv is suitable when you install DVC just for testing, or just for trying a tutorial on your computer. Since we are using the virtual environment of Katacoda, we can install it for real.

I have explained in the introduction of the first lesson why I am using the --no-scm approach:

DVC and Git are not exclusive to each other; DVC is usually integrated with Git so that while Git keeps track of the code and configuration files, DVC keeps track of the data. However, DVC can also work independently of Git, and this is how we will use it in this scenario (for the sake of simplicity, so that we can focus on the basic features of DVC). In later scenarios we will also see how DVC works together with Git and how it takes advantage of Git's versioning features to keep track of the data versions.

The audience of these introductory lessons is DVC beginners, who don't have to be Linux beginners or ML beginners (they might be Linux experts or ML experts). Anyway, the sed command is part of core utils, a basic command that is available on any Linux system and is explained or mentioned in any Linux book or tutorial. So, it is not advanced at all. Even if it was, you just see it, you try it, and you understand what it does. That's all, no big science about it.

I am trying to be consistent and complete in my explanations and to follow a methodical approach. So, when I explain dvc init I also explain dvc destroy. When I explain dvc add, I also explain dvc move, dvc add -R, dvc remove, etc. That's all there is to it.

When explaining data tracking I try to explain it first with --no-scm so that the readers can understand first the basic mechanism of data tracking. I have struggled to understand this properly myself, so I have tried to fill this gap and make it easier for the readers.
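As a rough illustration of the "basic mechanism" referred to here, the sketch below imitates content-addressed tracking using only standard shell tools. It is a simplified stand-in, not DVC's actual implementation: the `cache/` directory, the `data.txt.meta` file name, and the metadata layout are assumptions made for this example, and `md5sum` is assumed to be available (as on most Linux systems).

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p cache

printf 'some training data\n' > data.txt

# Hash the content; the hash becomes the identity of this data version.
sum=$(md5sum data.txt | cut -d' ' -f1)

# Keep a copy under cache/<first two hex chars>/<remaining chars>.
prefix=$(printf '%s' "$sum" | cut -c1-2)
rest=$(printf '%s' "$sum" | cut -c3-)
mkdir -p "cache/$prefix"
cp data.txt "cache/$prefix/$rest"

# Record hash and path in a small text file (hypothetical name and layout).
printf 'outs:\n- md5: %s\n  path: data.txt\n' "$sum" > data.txt.meta

# The data can now be deleted and restored from the cache using the metadata.
rm data.txt
cp "cache/$prefix/$rest" data.txt
cat data.txt
```

The small metadata file is the only thing that needs to be versioned, which is where Git would come in once --no-scm is dropped.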

You will notice that this is a pattern. I try to explain first the basic mechanism and then I tell the readers the best or recommended practices. I sometimes even make mistakes on purpose and correct them later, because you can also learn from making mistakes, and you also need to know how to correct mistakes.

I think that for installation we should have a separate top-level page that explains all the possible ways of installation, including virtualenv / pyenv (#656).

I addressed that in #656. That ticket, though, does not answer my concern: it's better to mention that pip without virtualenv is used for simplicity only, and to link to the proper installation page (e.g. get started/install).

I also created a ticket to simplify the existing get-started/install. It has become too overloaded, and it neither mentions virtualenv/pyenv nor recommends other ways to install DVC instead of pip.

By the way, virtualenv / pyenv is suitable when you install DVC just for testing

Why is that? virtualenv/pyenv should be used whenever you would otherwise install something globally. I would recommend never installing any package with pip into the system environment.

I have explained in the introduction of the first lesson why I am using the --no-scm approach:

I think this is not the right approach. As I mentioned - this is confusing. The way Git and DVC work together is the core piece of the workflow. I don't see any reasons to delay the explanation and spend time on the option that is very rarely needed.

So, it is not advanced at all. Even if it was, you just see it, you try it, and you understand what it does

It is advanced. My recommendation is to explain what a DVC-file is, that there is nothing magical about it, and that one can edit it, remove it, etc. manually. And that we use sed magic only because of scripting.
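A minimal sketch of what that sed step amounts to, for readers who want to see it in isolation. The file content below is a hand-written stand-in for a DVC-file (the hash is a placeholder, and the fields of a real DVC-file may differ by DVC version); no dvc command is run. The sed invocation assumes GNU sed as found on Linux; BSD/macOS sed expects `-i ''` instead.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical DVC-file content, written by hand for illustration only.
cat > file1.txt.dvc <<'EOF'
outs:
- md5: 11111111111111111111111111111111
  path: file1.txt
EOF

# Copy it, then point the copy at file2.txt, as in the scripted lesson step.
cp file1.txt.dvc file2.txt.dvc
sed -i -e 's/file1\.txt/file2.txt/' file2.txt.dvc

# Show the edited line; the original file is left unchanged.
grep 'path:' file2.txt.dvc
```

The same one-word change could be made by opening file2.txt.dvc in any text editor; sed is only a convenience when the step has to run unattended.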

I also explain dvc move, dvc add -R, dvc remove, etc

Same here. I believe dvc move and dvc remove should be removed, since they are broken right now. It's way easier to explain what a DVC-file is about. dvc add -R is similar to --no-scm: a very advanced command that 99% of users don't need. They just need dvc add dir. Please read the dvc add command reference carefully, or ask the community about -R.

I have struggled to understand this properly myself, so I have tried to fill this gap and make it easier for the readers.

Make it DVC-file centric from the very beginning. It is the key to understanding DVC. I haven't tried to do this with Get Started yet, but I think the next iteration will look like this.

I try to explain first the basic mechanism and then I tell the readers the best or recommended practices

And that's fine. It's just that in certain cases you are actually explaining advanced things instead of focusing on simple, basic commands and options.

Hi! Haven't read everything above. I just want to ask what's the summary of what's missing here? Perhaps we could open a dummy PR for Ivan and myself to start reviewing the Katacoda examples (or do it in that repo). Other than reviewing, is there much more left for a first phase here? Thanks

is there much more left for a first phase here?

I am done with the Get Started, Basics, and Tutorials:

I am still working on adding some more Examples:

By the way, if you have not noticed, these are already linked from the page of Interactive Tutorials: https://dvc.org/doc/tutorials/interactive

@dashohoxha thanks for the update. There was a good question about reviewing. I think it's a valid one.

There was a good question about reviewing. I think it's a valid one.

I think that we should find some beginners to review them, if possible (maybe students).
I remember some people have followed the Basics lessons and they had good impressions about them (maybe I can find a reference about this on the chat, I am not sure).

@dashohoxha I think that's where some of the misunderstanding comes from. While we do care about the zero-day experience, our primary target audience is experienced ML practitioners, data scientists, and engineers. We should think about them first, and then about students.

The idea of an independent review is good, but I think it can be complementary to the regular review and discussion process we have with the team.

Also, you should keep in mind that running a meaningful focus group review is a process that requires some skills and preparation. It's not enough to just ask whether it was good or bad. You need to evaluate the result: was the information understood or not, etc.

@shcheklein I agree with everything that you say. Let's find some non-student beginners to review them.

I am afraid that you and Jorge are biased (and so am I, after 3 months of being involved with DVC). For example, 3 months ago I could say what I found difficult to understand and what not; now I cannot tell the difference anymore.

For example, look at this review of Get Started that I made 3 months ago: https://github.com/iterative/dvc.org/issues/545
If I go through Get Started now everything seems normal to me and I can't see any problems. But that was not so at the beginning.

Agree, we're all biased here. Still we can help review your work and double check the examples run and make sense, and we should do this anyway. My Q is where/how to do this. I believe there's a separate GitHub repo with the Katacoda examples? Sorry, I know I've asked this before but lost track of that conversation, wherever it happened.

Agree, we're all biased here. Still we can help review your work and double check the examples run and make sense, and we should do this anyway.

Thanks @jorgeorpinel, this makes sense. For example, the reading time for each tutorial is a very rough estimate, and having gone through each tutorial many times, I am not able to measure it accurately. A first-time reader should be able to do it better than me (of course, a DVC beginner would be even better).

My Q is where/how to do this. I believe there's a separate GitHub repo with the Katacoda examples?

You can open issues on https://github.com/iterative/katacoda-scenarios/issues

I think this issue can be closed.
If needed the discussion can continue here: https://github.com/iterative/katacoda-scenarios/issues
