Python ML/plotting libraries provide great ways to support Exploratory Data Analysis (EDA) with some awesome charting capabilities - any plans in this area for ML.NET?
I totally agree that charting and visualization are essential tools for ML practitioners. Currently, the .NET ecosystem lacks convenient tools for them, and that is a problem.
I think ML.NET is not the right library to have this functionality though.
@CESARDELATORRE
We're exploring possibilities for supporting Exploratory Data Analysis (EDA) in .NET as a whole, since it is useful well beyond ML.NET.
There are multiple third-party .NET libraries available, but they might not be what we need in .NET Core:
oxyplot: https://github.com/oxyplot
Live-Charts: https://github.com/beto-rodriguez/Live-Charts
ScatterplotBox in Accord.net: http://accord-framework.net/docs/html/T_Accord_Controls_ScatterplotBox.htm
A possible approach could be to create a higher-level API in a .NET Standard library based on PLPlot.
There is actually a library named PLPlotNet that provides .NET bindings for PLPlot, but it is not "official", and the main issue is that it is a fairly low-level API. Plotting with those bindings is not as easy as with Matplotlib or seaborn in Python.
A possible approach from Microsoft could be to provide a library with a higher-level API that simplifies plotting with PLPlot and integrates with the ML.NET DataView (or IList collections), so plotting would be simpler than with the low-level API in PLPlotNet.
@mzhukovs - QUESTION: Do you think PLPlot would meet your needs for Exploratory Data Analysis (EDA)?
Please share more feedback and needs around EDA, as that will help us decide on priorities.
Thanks for your feedback! 👍
Sounds like a good potential option as long as it's in C# and, to your main point, is higher level, so that EDA can be done quickly and efficiently without spending hours just to configure a few views. A dendrogram, for example, is a pretty specialized chart and would likely be very difficult to set up on your own if the library doesn't provide it, especially one with a heatmap included :)
I have no experience with PLPlot, just quickly browsed the samples, it _appears_ to have a lot of flexibility, so if that higher level API you describe could be developed to more or less match matplotlib and seaborn out of the box then I am sure it would be a big hit, and make it easier for people already familiar with those packages.
Hi, @CESARDELATORRE
I also totally support the OP.
The tools you listed are definitely good, but Python and C++ already have mature frameworks that provide everything needed for data analysis out of the box. Moreover, ROOT was created by scientists at CERN and GSL by people from Los Alamos. I can't name an equally powerful organisation behind SciPy, but it is widely supported by the community.
Could you point out which of the tools on your list has any of the advantages I mentioned?
As far as I can see, people in data analysis don't use .NET, so they already have a background in other tools, and it's hard to imagine what .NET could offer to make them start using it. For newcomers, though, it makes sense. On one side we have C++ with ROOT and gnuplot, or Python with pandas, TensorFlow, matplotlib and so on, which are easily combined into one toolset for all data science work and were created by well-known people. On the other side we have ML.NET as a core for machine learning only, though from MS, plus Math.NET or Accord.NET for math and OxyPlot for visualisation. I don't know about the quality and performance of these tools, but I think how widely they are used in the community speaks for itself.
*I don't mean to say anything bad about the frameworks you named, and especially not about their authors. I have used Math.NET and Accord.NET a few times. They are really good projects, but I'm not sure I would use them in production. Perhaps that is my mistake, but in that case these frameworks should be getting support from the .NET community, especially its official members.*
OK, perhaps ML.NET should stay a core for ML, like Keras or scikit-learn, but .NET definitely needs an open-source data analysis framework. I mean extensions in the direction of pure math, such as statistics, fitting, and some numerical methods. Together with contemporary data visualisation tools based on SVG (built on a web engine, like D3 or Highcharts), it could become a new, modern data analysis framework and an alternative to TensorFlow, pandas, and ROOT. Maybe it's only my pain that I like both .NET and data analysis, but it's really hard to use one with the other.