Joss-reviews: [REVIEW]: GEM: A Python package for graph embedding methods

Created on 5 Aug 2018 · 58Comments · Source: openjournals/joss-reviews

Submitting author: @palash1992 (Palash Goyal)
Repository: https://github.com/palash1992/GEM
Version: v1.0.0
Editor: @danielskatz
Reviewer: @jsgalan, @rougier
Archive: 10.5281/zenodo.1407178

Status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/efb9242db91adee8c8265f000f26ef5a"><img src="http://joss.theoj.org/papers/efb9242db91adee8c8265f000f26ef5a/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/efb9242db91adee8c8265f000f26ef5a/status.svg)](http://joss.theoj.org/papers/efb9242db91adee8c8265f000f26ef5a)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@jsgalan & @rougier, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

Make sure you're logged in to your GitHub account
Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.theoj.org/about#reviewer_guidelines. Any questions/concerns please let @danielskatz know.

✨ Please try and complete your review in the next two weeks ✨

Review checklist for @jsgalan

Conflict of interest

[x] As the reviewer I confirm that I have read the JOSS conflict of interest policy and that there are no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Version: Does the release version given match the GitHub release (v1.0.0)?
[x] Authorship: Has the submitting author (@palash1992) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Authors: Does the paper.md file include a list of authors with their affiliations?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

Review checklist for @rougier

Conflict of interest

[x] As the reviewer I confirm that I have read the JOSS conflict of interest policy and that there are no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Version: Does the release version given match the GitHub release (v1.0.0)?
[x] Authorship: Has the submitting author (@palash1992) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Authors: Does the paper.md file include a list of authors with their affiliations?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

accepted published recommend-accept review

Source

whedon

All 58 comments

Hello human, I'm @whedon. I'm here to help you with some common editorial tasks. @jsgalan, it looks like you're currently assigned as the reviewer for this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

whedon on 5 Aug 2018

Attempting PDF compilation. Reticulating splines etc...

whedon on 5 Aug 2018

--> Check article proof :page_facing_up: <--

whedon on 5 Aug 2018

👋 @jsgalan & @rougier - thanks for agreeing to review this - please do so in this issue as described in the checklist and the second comment above. Let me know if you have any questions or concerns.

danielskatz on 5 Aug 2018

@palash1992 - in paper.md, you reference [goyal2017graph] which doesn't compile, rather than the correct [@goyal2017graph] -- i.e. the @ is missing and should be added.

danielskatz on 5 Aug 2018

@whedon generate pdf

palash1992 on 5 Aug 2018

Attempting PDF compilation. Reticulating splines etc...

whedon on 5 Aug 2018

--> Check article proof :page_facing_up: <--

whedon on 5 Aug 2018

@danielskatz Thanks for pointing it out. I have modified the manuscript to reflect the change.

palash1992 on 5 Aug 2018

👍1

Hi all,
I just finished reviewing this interesting and novel package.

After installing and revising, a few comments mostly Unrelated on the implementation

I tested it under OSX, Python 2.7.9 on the Enthought Canopy Environment
All incompatibilities were cleared and all dependencies through PiP in the Python Canopy Terminal
Thean, gave me a problem importing "_gof_", so following forum instructions had to uninstall and re-installed (The Bleeding Edge version) through cloning the git repository

Even though is a Theano maybe should be added as a warning of the package

From the example, i had an error on this block
> edge_f = 'gem/data/karate.edgelist'
> isDirected = True

I just got the file and place it in the working directory. Should this dataset be placed inside the package itself under gem.datasets.karate?

Also, Ifound the following errors:

when trying node2vec method

_[Errno 2] No such file or directory
Exception Traceback (most recent call last)
/JOSS/GEM/test1gem.py in ()
42 t1 = time()
43 # Learn embedding - accepts a networkx graph or file with edge list
---> 44 Y, t = embedding.learn_embedding(graph=G, edge_f=None, is_weighted=True, no_python=True)
45 print (embedding._method_name+':\n\tTraining time: %f' % (time() - t1))
46 # Evaluate on graph reconstruction

/Users/jsgalan/Library/Python/2.7/lib/python/site-packages/gem-1.0.0-py2.7.egg/gem/embedding/node2vec.pyc in learn_embedding(self, graph, edge_f, is_weighted, no_python)
81 except Exception as e:
82 print(str(e))
---> 83 raise Exception('./node2vec not found. Please compile snap, place node2vec in the path and grant executable permission')
84 self._X = graph_util.loadEmbedding('tempGraph.emb')
85 t2 = time()

Exception: ./node2vec not found. Please compile snap, place node2vec in the path and grant executable permission_

when trying SDNE
_

---> 14 from gem.embedding.sdne import SDNE
15
16 # File that contains the edges. Format: source target

/Users/jsgalan/Library/Python/2.7/lib/python/site-packages/gem-1.0.0-py2.7.egg/gem/embedding/sdne.py in ()
26 from keras import backend as KBack
27
---> 28 from theano.printing import debugprint as dbprint, pprint
29 from time import time
30

/Users/jsgalan/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/theano/__init__.py in ()
86
87
---> 88 from theano.configdefaults import config
89 from theano.configparser import change_flags
90

/Users/jsgalan/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/theano/configdefaults.py in ()
45 convert=floatX_convert,),
46 # TODO: see gh-4466 for how to remove it.
---> 47 in_c_key=True
48 )
49

/Users/jsgalan/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/theano/configparser.pyc in AddConfigVar(name, doc, configparam, root, in_c_key)
279 if hasattr(root, name):
280 raise AttributeError('This name is already taken',
--> 281 configparam.fullname)
282 configparam.doc = doc
283 configparam.in_c_key = in_c_key

AttributeError: ('This name is already taken', 'floatX')

@palash1992 can you give me pointers to circumvent this errors?

Best

jsgalan on 5 Aug 2018

@jsgalan , thanks for the review! I have revised the library to clarify the above errors. 1. I changed the specification to 'karate.edgelist' so that you can load it from the current directory. 2. Original instructions to run node2vec successfully were in gem/c_exe/readme. I have now merged them with readme on the front page. 3. I have removed the theano.printing import line making it compatible with Tensorflow as well.

Let me know if you have any more issues.

palash1992 on 6 Aug 2018

I'm trying to run the test file but without much success yet:

Using TensorFlow backend.
/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.6 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.7
  return f(*args, **kwds)
Traceback (most recent call last):
  File "test.py", line 23, in <module>
    G = graph_util.loadGraphFromEdgeListTxt(edge_f, directed=isDirected)
  File "/Users/rougier/tmp/GEM/gem/utils/graph_util.py", line 146, in loadGraphFromEdgeListTxt
    with open(file_name, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'gem/data/TEST_50M.edgelist'

Not sure what it means. Also, I find it rather difficult to understand what the test.py is supposed to do since there's no documentation (or did I miss the link?). Could you provide some documentation for the software as well as some API description (how to call the library) ?

Also, in order to run the test file, I had to install keras and tensorflow but it is not clear if those are dependencies of the test file or the actual library. If the latter is true, you will need to update your dependencies.

rougier on 6 Aug 2018

@rougier Thanks for taking the time to review! The readme clarifies some of your questions. On the repository documentation page (readme.md), under dependencies I have specified that the package in general doesn't require keras and tensorflow/theano, but if you need to run SDNE model, you will need them. Also, the purpose of test.py is mentioned in usage section of the readme. I have also now added it to the test.py file.

The error on the dataset can be overcome by copying the gem/data/karate.edgelist to the working directory. I have modified the readme and test.py to clarify this. I also changed TEST_50M.edgelist to karate.edgelist in test.py.

Let me know if there are still any issues.

palash1992 on 6 Aug 2018

I still have some issues but I'll move to your issuer tracker. But again, it is not clear what the test is doing and what kind of output is expected. I think you should definitely add a toy example to make it easier for people to rapidly get a sense of what the software is doing.

rougier on 7 Aug 2018

@rougier : I have replied to the issue on Github. I completely agree that a toy example would make it easier to understand the purpose. I will add it soon. Thanks for the suggestion!

palash1992 on 7 Aug 2018

👍1

And also a documentation (even a small one). Sorry to insist on that but it's very difficult to know what this software is all about when you land on the first page. An explanation, a toy example (with the expected output) would really help.

rougier on 7 Aug 2018

@rougier : I will add that by the end of the day. I can add more examples as well.

palash1992 on 7 Aug 2018

Hi all,

I agree with comments made by @rougier regarding the documentation, even though the referenced papers are explanatory of the methods, it is useful to have some documentation of the parameters of the methods.

Otherwise, I am satisfied with the review.

Best

jsgalan on 8 Aug 2018

👍1

@rougier and @jsgalan Thanks for your insightful reviews! I have added explanations of model hyperparameters and more toy examples with sample execution results in the library. Please have a look and let me know if anything is missing. The library certainly benefited from your suggestions!

palash1992 on 9 Aug 2018

I still got some problems with missing files (filed an issue on the repo). I think you might want to have an example directory where you can put the test_karate.py and test_sbm.py files with the relevant data. Is there any reason why you put some data under gem/data ? Overall, I think you might need to think of a better structure for your library. Everything's here but it is now mostly a matter of organization. If you look at the criterion for the review (at the start of this thread) it might help you in deciding what's the best structure.

rougier on 9 Aug 2018

@rougier I have now created a "tests" folder with the required files following the instructions. This should resolve the issues.

palash1992 on 9 Aug 2018

👋 @rougier - can we get an update on your status on this?

danielskatz on 13 Aug 2018

Sorry, I'm attending a conference until Friday and I don't have too much time to re-test right now. The latest change are still not working for me (I opened several issues on the repo). It is a bit frustrating since it is always minor bugs and the core of the software seems to be ok. However, I think also there are room for improvements concerning the documentation (usage + API) and the overall structure.

Also, there is not release yet but @palash1992 can make oneonce everything is ok.

rougier on 13 Aug 2018

@rougier : Based on my replies, were you able to make it work? I have also added more examples in the readme along with sample executions. Also, I added parameter descriptions for all methods in the readme. Let me know where else can I make improvements.

palash1992 on 16 Aug 2018

It still does not work for me. But maybe it's my installation. @jsgalan Did you manage to run the two tests?

rougier on 17 Aug 2018

@rougier : while we wait for @jsgalan 's reply, can you tell me what error you are getting? I asked some of my friends to install it on different machines and they were able to do it. So I am not sure exactly what the problem could be. Perhaps it's a specific Mac issue. Any feedback would be helpful in making the package more robust to platforms.

palash1992 on 22 Aug 2018

Here is what I get from test_karate.py:

 $ python test_karate.py
Using TensorFlow backend.
/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.6 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.7
  return f(*args, **kwds)
Num nodes: 34, num edges: 77
Could not import C++ module for Graph Factorization. Reverting to python implementation. Please recompile graphFac_ext from graphFac.cpp using bjam
        Iter id: 0, Objective: 77.0069, f1: 76.9998, f2: 0.00712866
        Iter id: 10000, Objective: 76.9998, f1: 76.9979, f2: 0.00183901
        Iter id: 20000, Objective: 76.9995, f1: 76.9978, f2: 0.00164567
        Iter id: 30000, Objective: 76.9991, f1: 76.9973, f2: 0.00175163
        Iter id: 40000, Objective: 76.9988, f1: 76.997, f2: 0.00183327
graph_factor_sgd:
    Training time: 33.110936
    MAP: 0.606258636714048   preccision curve: [1.0, 0.5, 0.3333333333333333, 0.25, 0.4]



----------------------------------------------------------------------------------------------------
Num nodes: 34, num edges: 77
SVD error (low rank): 0.053622
hope_gsvd:
    Training time: 0.020297
    MAP: 0.12345399411723433     preccision curve: [1.0, 1.0, 0.6666666666666666, 0.5, 0.4]



----------------------------------------------------------------------------------------------------
Embedding dimension greater than 2, use tSNE to reduce it to 2
Num nodes: 34, num edges: 77
Laplacian matrix recon. error (low rank): 6.293280
lap_eigmap_svd:
    Training time: 0.018129
    MAP: 0.42051116510581815     preccision curve: [0.0, 0.0, 0.0, 0.25, 0.2]



----------------------------------------------------------------------------------------------------
/usr/local/lib/python3.7/site-packages/numpy/core/numeric.py:544: ComplexWarning: Casting complex values to real discards the imaginary part
  return array(a, dtype, copy=False, order=order, subok=True)
/usr/local/lib/python3.7/site-packages/numpy/core/numeric.py:492: ComplexWarning: Casting complex values to real discards the imaginary part
  return array(a, dtype, copy=False, order=order)
/usr/local/lib/python3.7/site-packages/matplotlib/text.py:844: ComplexWarning: Casting complex values to real discards the imaginary part
  x = float(self.convert_xunits(self._x))
/usr/local/lib/python3.7/site-packages/matplotlib/text.py:845: ComplexWarning: Casting complex values to real discards the imaginary part
  y = float(self.convert_yunits(self._y))
/usr/local/lib/python3.7/site-packages/matplotlib/text.py:711: ComplexWarning: Casting complex values to real discards the imaginary part
  posx = float(textobj.convert_xunits(textobj._x))
/usr/local/lib/python3.7/site-packages/matplotlib/text.py:712: ComplexWarning: Casting complex values to real discards the imaginary part
  posy = float(textobj.convert_yunits(textobj._y))
Num nodes: 34, num edges: 77
/usr/local/lib/python3.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py:1849: RuntimeWarning: invalid value encountered in sqrt
  s = np.sqrt(eigvals)
lle_svd:
    Training time: 0.023433
    MAP: 0.4528783377992382      preccision curve: [0.0, 0.0, 0.0, 0.25, 0.2]



----------------------------------------------------------------------------------------------------
Num nodes: 34, num edges: 77
[Errno 2] No such file or directory: 'node2vec': 'node2vec'
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/gem-1.0.0-py3.7.egg/gem/embedding/node2vec.py", line 80, in learn_embedding
    call(args)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/subprocess.py", line 304, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/subprocess.py", line 756, in __init__
    restore_signals, start_new_session)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/subprocess.py", line 1499, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'node2vec': 'node2vec'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_karate.py", line 47, in <module>
    Y, t = embedding.learn_embedding(graph=G, edge_f=None, is_weighted=True, no_python=True)
  File "/usr/local/lib/python3.7/site-packages/gem-1.0.0-py3.7.egg/gem/embedding/node2vec.py", line 83, in learn_embedding
    raise Exception('./node2vec not found. Please compile snap, place node2vec in the system path and grant executable permission')
Exception: ./node2vec not found. Please compile snap, place node2vec in the system path and grant executable permission

rougier on 22 Aug 2018

@rougier: So the program runs other models and is unable to locate node2vec. As per the installation instructions on the repository page, to install node2vec as part of the package, recompile from https://github.com/snap-stanford/snap and add node2vec executable to system path. To grant executable permission, run: chmod +x node2vec. It should solve the issue.
To test the above, alternatively you can try running node2vec from a terminal.

palash1992 on 22 Aug 2018

Ok. You'll need to add SNAP in the list of dependencies (and it might be good to add links for each of the dependency).

By the way, can't we use https://github.com/aditya-grover/node2vec as a fallback?

rougier on 22 Aug 2018

@rougier: My idea for not adding node2vec to dependencies was based on the premise that some users may not need to run node2vec and thus should not be restricted by having library dependency. Adding the fallback is a good alternative. At this point I can make one of the below three modifications:

Add an argument in test.py specifying the method to be tested. Thus, the program would be run as python test.py -m all (or) python test.py -m hope,lap,gf. If the user is interested in node2vec, as specified in the installation instructions he/she can specify it in the argument.
Add SNAP as the dependency.
Add https://github.com/aditya-grover/node2vec as the fallback and dependency of gensim in the library. If the user installs SNAP, the library will automatically take the installation.

Please let me know which of these are you in favor of.

palash1992 on 22 Aug 2018

This is where I get lost and I think the culprit is the lack of documentation ;) Is node2vec necessary only to run the tests or is it necessary to run the software? If this is only for the test, then it's fine for me. I can run the karate test and probably @jsgalan can run the other one. However, it would be good in the documentation to explain (for example) the karate example (what is the expected output, what does it mean, etc) and to explain that second example requires the SNAP library. In the end, it would be only a matter of changing the README.

Also, maybe the tests directory could be renamed examples since content is actually illustrating usage.

While checking the paper, I think you're missing a DOI for Belkin et al. 2002.

rougier on 23 Aug 2018

@whedon generate pdf

palash1992 on 24 Aug 2018

Attempting PDF compilation. Reticulating splines etc...

whedon on 24 Aug 2018

--> Check article proof :page_facing_up: <--

whedon on 24 Aug 2018

@rougier: Yes, node2vec is only required for tests. I have now made the changes you specified and also added an additional argument named node2vec to use if you need to run node2vec. I have clarified this in readme.

Hope it is fine now.

palash1992 on 24 Aug 2018

👍1

Thanks. The README looks better. I'm still unable to run the test_sbm.py due to the _node problem. @jsgalan Did you manage to run it?

Apart from that, I think it is ready for publication (@danielskatz) even though you could improve a little bit the README (for example, by re-using the summary of the paper.md as an introduction for your README).

Don't forget to make a release on GitHub, I think it is mandatory for publication.

rougier on 24 Aug 2018

@rougier: I have now added a release and modified readme.

To solve the _node issue, did you try uninstalling networkx and installing using pip install networkx==1.11?

palash1992 on 24 Aug 2018

No, I did not try actually.

rougier on 24 Aug 2018

Hi i've cloned the git a few times now. I cloned and compiled SNAP and added node2vec to my system's path.

This is what i got after trying the tests:

test_karate works (with warnings)
test_karate.txt
test_sbm works (with warnings)
test_sbm.txt

attached both runs, @palash1992 could you look at the warnings?.

Best

jsgalan on 25 Aug 2018

@rougier after the pip install networkx==1.11 some parts of the tests where smooth. you might want to try.

jsgalan on 26 Aug 2018

@rougier : Let me know if the issue still persists

palash1992 on 29 Aug 2018

@palash1992 Just did and it works (with some warnings though)

rougier on 30 Aug 2018

@rougier : Perfect! I also made a release. Let me know if there is anything else needed from my side.

palash1992 on 30 Aug 2018

@rougier - I see there are still quite a few checkboxes left empty. Please go through them when you have a chance.

danielskatz on 30 Aug 2018

@danielskatz Just did. The Functionality documentation and Community guidelines could be improved a bit but it does not prevent publication. Thanks @palash1992 for taking all my comments into account.

rougier on 31 Aug 2018

great, thanks @rougier !

@palash1992 - please make an archive of the current state of the repo in Zenodo or similar, and let me know the DOI of the archive, so that can move forward in publishing this submission.