Machine learning: A few questions

Created on 7 May 2018 · 14 comments · Source: dotnet/machinelearning

Hey,

It's great to see a machine learning framework for .NET 👍

I have a couple of questions about the rationale behind the project:

  • Apart from the difference between a pure .NET implementation and mixed C++/.NET bindings, what are the advantages of and differences from CNTK?
  • What training algorithms are supported (e.g., CNN, RNN)?
  • What is the plan for multi-machine, multi-CPU, GPU (and multi-GPU) support?
  • Is it aimed at providing an abstract API (and a default implementation) that could plug into any backend (e.g., CNTK)?

Thanks!

question

All 14 comments

In addition to @xoofx 's questions, is there a project road map?

Thank you! @xoofx

What is the plan for multi-machine, multi-CPU, GPU (and multi-GPU) support?

Additionally, what is the plan for vectorizing the fundamental computations via .NET Core hardware intrinsics (e.g., SSE, AVX, FMA)?

Related to https://github.com/dotnet/machinelearning/issues/56
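To make the intrinsics question concrete, here is a minimal sketch of the kind of vectorized kernel being asked about. It assumes .NET Core 3.0+ (where `System.Runtime.Intrinsics.X86` is available) and compiling with unsafe code enabled; the `VectorSum` type and method names are illustrative, not part of ML.NET.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class VectorSum
{
    // Sums a float array using AVX when the CPU supports it,
    // falling back to scalar code otherwise.
    public static unsafe float Sum(float[] values)
    {
        float total = 0f;
        int i = 0;
        if (Avx.IsSupported)
        {
            var acc = Vector256<float>.Zero;
            fixed (float* p = values)
            {
                // Process 8 floats per iteration.
                for (; i <= values.Length - 8; i += 8)
                    acc = Avx.Add(acc, Avx.LoadVector256(p + i));
            }
            // Horizontal reduction of the 8 accumulated lanes.
            for (int lane = 0; lane < 8; lane++)
                total += acc.GetElement(lane);
        }
        // Scalar tail, and full fallback when AVX is absent.
        for (; i < values.Length; i++)
            total += values[i];
        return total;
    }
}
```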

@GalOshri

Thank you for the feedback and questions! ML.NET is an extensible framework and we plan to explore how to integrate with libraries like CNTK, TensorFlow, and Accord.NET to enable them as part of ML.NET. Take a look at this blog post with a few more details. One of our goals is to provide consistent APIs that can cover a variety of different components/libraries/approaches.

We've highlighted some of the components that are available in the 0.1 release in the release notes. The API docs also have a list of learners. More learners are coming soon. Please let us know if there are any specific learners that you need.

Distributed training is on our roadmap, but we'd love to understand your requirements.

@asthana86 will have more details regarding #56.

Hope to work with you when I finish my PhD. I'm intrigued.

Will there be more documentation on the components? I would like to better understand the purpose of each component in the pipeline (e.g., why is a file mandatory?).

While trying to fix the tests, I noticed there is matrix and vector code interspersed in some of the functions. I don't know the plan, or the way this project works, but would it be OK to contemplate refactoring things like matrices and vectors into components of their own? How about larger issues, while the project is still young?

Some things in no particular order, to make the discussion more concrete:

  • Especially in a distributed setting, it might be prudent to adopt a policy of removing DateTime.Now from the codebase.
  • The code would be easier to handle in production if logging data directly to streams were replaced with the tracing and logging facilities used in other .NET frameworks (at least the newer ones, e.g., MEL and EventId).
  • The code in the tests is a bit difficult to follow: there is quite a lot of inheritance that makes the debugger jump around (code like this also makes parallelization more difficult), it is not straightforward to trace from logging output back to the source of a problem, and there are some data-format changes (from double/float to string and so on). I have ideas (and opinions), but I'm hesitant to share them, as I'm not that familiar with this area or with the codebase. I would hope IProgress and the like aren't too entrenched in the codebase this early on. :)
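On the DateTime.Now point, the usual remedy is to inject a clock abstraction so time-dependent code is testable and consistent across machines. A minimal sketch; the `ISystemClock` and `TrainingTimer` names here are hypothetical, not ML.NET types:

```csharp
using System;

// Abstraction over the system clock, so callers never read DateTime.Now directly.
public interface ISystemClock
{
    DateTimeOffset UtcNow { get; }
}

// Production implementation backed by the real clock.
public sealed class SystemClock : ISystemClock
{
    public DateTimeOffset UtcNow => DateTimeOffset.UtcNow;
}

// A component receives the clock as a dependency; tests can supply a fake
// clock with a fixed UtcNow to make time-dependent behavior deterministic.
public sealed class TrainingTimer
{
    private readonly ISystemClock _clock;

    public TrainingTimer(ISystemClock clock) => _clock = clock;

    public DateTimeOffset Start() => _clock.UtcNow;
}
```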

The code would be easier to handle in production if logging data directly to streams were replaced with the tracing and logging facilities used in other .NET frameworks (at least the newer ones, e.g., MEL and EventId).

@eerhardt this is something you are looking into, aren't you?

Especially in a distributed setting, it might be prudent to adopt a policy of removing DateTime.Now from the codebase.

filed #110 (updated) to do a scrub

The code would be easier to handle in production if logging data directly to streams were replaced with the tracing and logging facilities used in other .NET frameworks (at least the newer ones, e.g., MEL and EventId).

@eerhardt this is something you are looking into, aren't you?

Yes, I've been working on and thinking about ways we can improve this area. The first line of thinking is to try DiagnosticSource. MEL may be appropriate in some cases, but it may be a dependency we don't want to take.
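For readers unfamiliar with it, DiagnosticSource lets a library emit structured, in-process events that consumers can subscribe to, without the library taking a logging-framework dependency. A minimal sketch, assuming the System.Diagnostics.DiagnosticSource package; the listener name "MLNet.Training" and the `TrainingDiagnostics` type are hypothetical, not actual ML.NET names:

```csharp
using System.Diagnostics;

public static class TrainingDiagnostics
{
    // Consumers discover this source via DiagnosticListener.AllListeners.
    private static readonly DiagnosticSource Source =
        new DiagnosticListener("MLNet.Training");

    public static void ReportIteration(int iteration, double loss)
    {
        // The IsEnabled guard keeps the payload allocation off the hot path
        // when nobody is subscribed to this event.
        if (Source.IsEnabled("IterationCompleted"))
        {
            Source.Write("IterationCompleted", new { iteration, loss });
        }
    }
}
```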

filed #57 to do a scrub

I think you meant #110 😉

@eerhardt I refer to https://github.com/aspnet/Logging/issues/612 for background discussion in the larger .NET sphere. That particular thread links into a deeper and quite long discussion about the state of logging and tracing (technical traces, business logs, and distributed settings). The Orleans maintainers have worked on this with the community as well, so you might want to ask about their experiences at https://gitter.im/dotnet/orleans. At least @galvesribeiro and @jdom have been involved.

@GalOshri I forgot to link https://gitter.im/Microsoft/CNTK?at=59887b6b614889d475275902 earlier. I see a good opportunity to combine Orleans and ML.NET, which is partially why I was wondering about the host question. :) Orleans now has heterogeneous silos, and it would be great to combine actors in a live, production system with a machine learning framework suited to the task (as a production system, with build tooling on Orleans).

Adding here feedback discussion on logging and tracing and paging @eerhardt just in case: https://github.com/Microsoft/ApplicationInsights-dotnet-server/issues/913.

Closing this issue, as the questions have been answered.
