Hi,
I am trying to use TensorBoard to cluster unlabelled data (one column of sentences) so that I can label each cluster with its category for further use. Right now I have to extract the sentences manually (copy-pasting the nearest neighbours), which is tedious. Can someone suggest a way to do this, or could the TensorBoard developers provide a button (similar to the one in the Scalar Dashboard) that exports the clusters formed by the embeddings after multiple iterations to a file (CSV or Excel), so that the clusters can then be labelled?
Clusters formed after dimensionality reduction by t-SNE are useful for visualization, but I need to document them too. I believe many developers might need this feature. Can someone please take this request?
Any help is truly appreciated :)
@nfelt I would be grateful if you could suggest something on this.
@Sumit217 I'm actually not very familiar with the embedding projector myself, and unfortunately we may not have time to address this feature request in the near future.
cc @dsmilkov @nsthorat @francoisluus in case anyone has suggestions for workarounds or has an idea for how this might be easily implemented.
Actually you already can. If you use the bookmarks bar in the bottom right, and save the bookmark, the computed t-SNE embeddings will be downloaded in JSON!
Let me know if you have trouble with that.
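For anyone who wants the downloaded coordinates in a spreadsheet, here is a minimal sketch of one way to flatten the bookmark JSON into a CSV. It assumes the file holds a list of saved states, each with a `projections` list of per-point dictionaries keyed like `tsne-0`, `tsne-1`, `tsne-2` (those keys appear in the excerpt posted in this thread; the enclosing layout is an assumption, so check your own file). It also assumes the points are in the same order as your sentences. The function names and file paths are hypothetical.

```python
import csv
import json


def projections_to_rows(state, sentences=None):
    """Turn one saved projector state into flat rows, one per data point."""
    rows = []
    for index, point in enumerate(state.get("projections", [])):
        row = {"index": index}
        # Assumption: point i in the bookmark corresponds to sentence i.
        if sentences is not None and index < len(sentences):
            row["sentence"] = sentences[index]
        # Keep only the t-SNE coordinates (keys such as "tsne-0").
        for key, value in point.items():
            if key.startswith("tsne-"):
                row[key] = value
        rows.append(row)
    return rows


def write_rows(rows, csv_path):
    """Write the rows to a CSV file, using the union of all keys as columns."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def convert_bookmark_file(json_path, csv_path, sentences=None):
    """Load a bookmark file and export the first saved state to CSV."""
    with open(json_path, encoding="utf-8") as handle:
        data = json.load(handle)
    # Assumption: the file is a list of states; fall back to a single state.
    state = data[0] if isinstance(data, list) else data
    write_rows(projections_to_rows(state, sentences), csv_path)
```

Calling `convert_bookmark_file("state.json", "tsne_points.csv", sentences)` with your own paths would then produce one row per sentence, with its t-SNE coordinates alongside.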
" {"pca-0":0.2748771905899048,"pca-1":-0.37543976306915283,"pca-2":-0.2921373248100281,"pca-3":0.14536359906196594,"pca-4":-0.1794687956571579,"pca-5":-0.38032111525535583,"pca-6":0.14102117717266083,"pca-7":0.3352908492088318,"pca-8":0.41407617926597595,"pca-9":-0.1911829113960266,"tsne-0":11.142189254026334,"tsne-1":-8.650362966257541,"tsne-2":-9.99883919837444},{"pca-0":0.23078849911689758,"pca-1":0.4652230143547058,"pca-2":-0.1820177137851715,"pca-3":-0.33192309737205505,"pca-4":-0.2976536452770233,"pca-5":0.35050126910209656,"pca-6":0.053142569959163666,"pca-7":-0.0807952731847763,"pca-8":0.1354132741689682,"pca-9":-0.206165465465 "
@nfelt Thank you so much for tagging these guys.
@nsthorat Thank you so much for your response. I have tried that feature, but I cannot figure out how to use the output to efficiently collect the data points (sentences) behind each cluster into a file for labeling. If my embeddings form 5 clusters for my corpus, it would be great to get 5 separate datasets (of sentences) in files. Can you suggest something along these lines, or maybe something entirely different that could solve this?
@francoisluus @dsmilkov @nsthorat
Hm... if you are looking for clusters for a non-visual representation, you may want to just do something like K-means in Python instead of using our tool. Our tool is very useful for visualization, but it sounds like you want to do something a little more custom and analytic.
sklearn has everything you'd need to do something like that.
Am I missing some reason you want to use the embedding projector?
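As a rough illustration of the K-means route, here is a minimal sketch using scikit-learn. The TF-IDF features, the cluster count of 5, and the function and file names are all assumptions made for the example; if you already have embedding vectors, you could pass those to `KMeans` instead of the TF-IDF matrix.

```python
import csv
from collections import defaultdict

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_sentences(sentences, n_clusters=5, seed=0):
    """Group sentences by K-means cluster id.

    TF-IDF features are an assumption; any numeric vectors would do.
    """
    features = TfidfVectorizer().fit_transform(sentences)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(features)
    groups = defaultdict(list)
    for sentence, label in zip(sentences, labels):
        groups[int(label)].append(sentence)
    return dict(groups)


def write_cluster_files(groups, prefix="cluster"):
    """Write one CSV per cluster, e.g. cluster_0.csv (hypothetical names)."""
    for label, members in groups.items():
        path = f"{prefix}_{label}.csv"
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(["sentence"])
            writer.writerows([member] for member in members)
```

With a list of sentences, `write_cluster_files(cluster_sentences(sentences, n_clusters=5))` would produce one file per cluster, ready for manual labeling.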
@nsthorat
Yes, you're right, Nikhil. That was one of the first options I explored, but unfortunately I did not have much luck with the output, as my data has too much variance.
The problems are similar to those mentioned here: https://stats.stackexchange.com/questions/133656/how-to-understand-the-drawbacks-of-k-means
You could say I am trying to use TensorBoard to extract clusters, and I am looking to build a dataset that can later be fed to ML algorithms.
Is there any option to download the nearest neighbors? Also, the slider for nearest neighbors is not working.
Why not have an option to download the coordinates of the points after t-SNE or PCA? It would help a lot beyond visualizing the data.
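Until something like that exists in the UI, nearest neighbors can be computed offline from the embedding vectors themselves. The following is only a sketch in plain Python: cosine distance is an assumed choice of metric, the function names are hypothetical, and for large datasets a vectorized library would be far faster.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors (1 - cosine similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Treat a zero vector as maximally dissimilar to avoid dividing by zero.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbors(vectors, query_index, k=5):
    """Return the k closest points to vectors[query_index] as (index, distance) pairs."""
    query = vectors[query_index]
    distances = [
        (cosine_distance(query, vector), index)
        for index, vector in enumerate(vectors)
        if index != query_index
    ]
    distances.sort()
    return [(index, distance) for distance, index in distances[:k]]
```

The returned indices can be mapped back to the original sentences and written to a file in the same way as any other list.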