Docs: Clustering Iris tutorial is missing

Created on 13 Jun 2018 · 12 Comments · Source: dotnet/docs

Where is the Clustering Iris tutorial? I found it on dot.net/ml, but it has no explanation of the code like the Sentiment Analysis and Taxi Fare Predictor tutorials have.


Document Details

Area - ML.NET Guide

All 12 comments

That's where that specific tutorial lives and then from there, you can continue with the additional tutorials we created for Docs. Is there anything you'd like to see? Would you like a more in-depth explanation of what the code is doing like we're doing in the tutorials here @Amine-Smahi?

/cc @asthana86 @OliaG

@mairaw yes please

@Amine-Smahi do you mean this tutorial? What confuses me is that it's an example of multiclass classification, while you call it clustering. So maybe you meant some other tutorial.

If the link above goes to the correct example, I would like to make the full tutorial out of it. @mairaw if it's not yet in the pipeline of the ML team, please assign me.

By the way, @OliaG, there might be a minor bug in the example. The IrisData class contains the following lines:

[Column("4")]
[ColumnName("Label")]
public string Label;

and then it's used like

pipeline.Add(new TextLoader(dataPath).CreateFrom<IrisData>(separator: ','));

As far as I remember, the CreateFrom method doesn't check the ColumnName attribute and the existing code works only because the field is called Label. I'll double-check that, and if that's true, report an issue in the ML.NET repo.

I don't think it's in their pipeline to build the full tutorial out of it. But I think it's something worth discussing with the ML.NET team. I wouldn't get started on that without their blessing that it's the direction they wanna go though.

/cc @asthana86 @OliaG

@pkulikov, @Amine-Smahi "What confuses me is that it's an example of multiclass classification, while you call it clustering."
We use the same iris dataset for two different types of problem: multiclass classification and clustering. In multiclass classification we use the label column (the type of the flower), so it is supervised learning (learning with known answers). In clustering we ignore the label column and let the algorithm divide the flowers into groups without telling it the right answer (unsupervised learning).
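
To make the distinction above concrete, here is a minimal sketch in plain C# with no ML.NET dependency. It assumes plain k-means as a stand-in for whatever clustering learner the sample uses, and the names IrisRow and ClusteringSketch are hypothetical, not types from ML.NET or the samples repo. The point it illustrates is that the clustering code reads only the feature values and never touches the label.

```csharp
using System;
using System.Linq;

// Hypothetical row type for this sketch; it is NOT the IrisData class from the samples.
public sealed class IrisRow
{
    public float[] Features { get; }
    public string Label { get; }   // present in the data, but unused by clustering

    public IrisRow(float[] features, string label)
    {
        Features = features;
        Label = label;
    }
}

public static class ClusteringSketch
{
    // Squared Euclidean distance between two feature vectors of equal length.
    public static double SquaredDistance(float[] a, float[] b) =>
        a.Zip(b, (x, y) => (double)(x - y) * (x - y)).Sum();

    // Index of the centroid closest to the given feature vector (lowest index wins ties).
    public static int NearestCentroid(float[] features, float[][] centroids) =>
        Enumerable.Range(0, centroids.Length)
                  .OrderBy(i => SquaredDistance(features, centroids[i]))
                  .First();

    // Plain k-means (an assumption for this sketch): only row.Features is read.
    public static int[] Cluster(IrisRow[] rows, int k, int iterations = 10)
    {
        // Seed the centroids with the first k rows; a simplification, not a recommendation.
        var centroids = rows.Take(k).Select(r => (float[])r.Features.Clone()).ToArray();
        var assignment = new int[rows.Length];

        for (int iter = 0; iter < iterations; iter++)
        {
            // Assignment step: attach every row to its nearest centroid.
            for (int i = 0; i < rows.Length; i++)
                assignment[i] = NearestCentroid(rows[i].Features, centroids);

            // Update step: move each centroid to the mean of its members.
            for (int c = 0; c < k; c++)
            {
                var members = rows.Where((_, i) => assignment[i] == c).ToArray();
                if (members.Length == 0) continue; // keep the old centroid for an empty cluster
                for (int d = 0; d < centroids[c].Length; d++)
                    centroids[c][d] = members.Average(m => m.Features[d]);
            }
        }
        return assignment;
    }
}
```

A supervised learner would take both Features and Label from each row during training. Here the Label property could hold anything, or be dropped entirely, and Cluster would return the same group indices.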

We do not have a doc for clustering yet as we just recently added this feature. @JRAlexander was going to work on this doc but if he has not started @pkulikov you're welcome to pick up this task, really appreciate your help! @JRAlexander please let @pkulikov know if you're on it or would like Petr's help.

@pkulikov you're right about [ColumnName("Label")]: TextLoader doesn't use it any more, so we will remove that line.
In version 0.2 you either need to name the field "Label" in your data type (IrisData), or you can create a "Label" column in code and copy the desired column into it with the following transform:

pipeline.Add(new ColumnCopier(("IrisType", "Label")));

@OliaG thank you for the clarification, that's clear now. So we use the same data set as in the linked intro tutorial, but apply a clustering technique to it. That would be the first tutorial on unsupervised learning in these docs, which is cool. @JRAlexander if you are not working on this yet, I'm happy to do it.

@OliaG

In version 0.2 you either need to name the field "Label" in your data type (IrisData), or you can create a "Label" column in code and copy the desired column into it

There is one more option: to use ColumnAttribute like it's done in the tutorial on the sentiment analysis:

[Column(ordinal: "0", name: "Label")]
public float Sentiment;

However, I liked the example of ColumnCopier as a transform step, so I left it in the taxi fare tutorial.

@pkulikov

There is one more option: to use ColumnAttribute like it's done in the tutorial on the sentiment analysis:

You're absolutely right :)

@pkulikov Go ahead with writing a doc for clustering, we really appreciate your help!!!
cc @JRAlexander

@OliaG then, I'll use the sample from ML.NET samples repo as the foundation for the tutorial here?

@pkulikov yes, perfect!

Thank you, guys!
