Go-ipfs: [community contribution] Remove `UnixfsNode` from the trickle builder

Created on 18 Jun 2018 · 23 comments · Source: ipfs/go-ipfs

Community contribution opportunity

Area: IPFS files.
Estimated time: 10-20 hours (depends on previous knowledge of the Files API).
Difficulty: Moderate. This is not a good first issue, but the problem has already been solved in another part of the code base, so it's not an open-ended question; there is a clear template to apply.
What you can learn: interaction with files in IPFS, how a DAG is constructed, how a file is laid out inside the DAG, and the DAG and UnixFS layers.
Mentor: @schomatis (meaning I commit to guide the contributor throughout the entire resolution of this issue answering any question he/she may have).

P3 · dif/hard · help wanted · topic/files · topic/technical debt


All 23 comments

The UnixfsNode structure is being removed from the balanced builder in https://github.com/ipfs/go-ipfs/pull/5118 (see https://github.com/ipfs/go-ipfs/issues/5106 for background). The same task can be done for the trickle DAG builder to be able to completely eliminate the UnixfsNode structure from the code base.

This isn't a trivial task, but it has already been solved for the balanced case, and most of the necessary functions and structures have already been created; they just need to be applied in the trickle case, replacing the analogous UnixfsNode-related functions. The most challenging part is understanding the implementation differences between the balanced and trickle builders (conceptually they are very similar) to check whether some more functions need to be added to completely replace UnixfsNode, especially in the append family of functions, which are not present in the balanced case.
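For orientation, the sketch below shows (in rough form) the pattern that #5118 introduced in the balanced builder and that should be transplanted here: internal nodes are built through FSNodeOverDag instead of UnixfsNode, leaves come from NewLeafDataNode, the filled node is Commit()ed, and the root is persisted with AddNodeAndClose. This is a deliberately simplified, single-level illustration, not the actual balanced-builder code; the real builder recurses and also caps the number of children per node, and the helper signatures are assumed to match the ones used later in this thread.

package sketch

import (
    ft "github.com/ipfs/go-unixfs"
    h "github.com/ipfs/go-unixfs/importer/helpers"
    ipld "github.com/ipfs/go-ipld-format"
)

// layoutSketch builds a single internal node holding all the leaves,
// purely to illustrate the FSNodeOverDag-based flow.
func layoutSketch(db *h.DagBuilderHelper) (ipld.Node, error) {
    // Internal nodes are FSNodeOverDag instead of UnixfsNode.
    root := db.NewFSNodeOverDag(ft.TFile)

    // Pull leaf chunks from the splitter until the data runs out
    // (the real code also limits the number of children per node).
    for !db.Done() {
        child, childFileSize, err := db.NewLeafDataNode()
        if err != nil {
            return nil, err
        }
        // AddChild links the leaf into the DAG node and records its
        // file size in the UnixFS metadata.
        if err := root.AddChild(child, childFileSize, db); err != nil {
            return nil, err
        }
    }

    // Commit encodes the accumulated UnixFS data into the ProtoNode.
    rootNode, err := root.Commit()
    if err != nil {
        return nil, err
    }

    // Persist the root and close the builder.
    return db.AddNodeAndClose(rootNode)
}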


I'm not taking this on myself because it takes time and careful analysis to get right, and there's much more to be done in the Files API milestone; the trickle builder is used in only a minor part of the code, so there's no rush here. I'm also not adding it to the milestone, since the milestone's main objective is to clarify how Unix files work in IPFS, and that is accomplished through the balanced case; the trickle builder won't add much new information for the user. Personally I've learned (and suffered) a lot from studying this part of the code base, so I think this is a great opportunity for a (relatively) new IPFS developer who wants to dig deeper into the code (and hopefully the suffering part will be diminished by the work done on the milestone).

I have not been following this discussion 100%, but this is something I could probably do if it's wanted.

@whyrusleeping what are your thoughts on this (if they are expressed elsewhere I apologize, like I said I have not really been following this discussion very carefully).

Hey @kevina, thanks for volunteering, but I created this issue as an opportunity to provide some training to a new IPFS developer (you actually have much more experience than me on this part of the code), so if you don't mind I'd rather have someone less experienced take this.

@schomatis okay, I am always looking for straightforward tasks to do that don't require a lot of back and forth with other developers, and this looked like a good fit, which is why I volunteered, but I can leave it for someone else.

I am always looking for straightforward tasks to do that don't require a lot of back and forth with other developers

Good to know, I could use your help with some of the issues of the Files API milestone that actually aren't beginner friendly and would fit someone with your experience.

@schomatis most of those issues are labeled as documentation. I can help with writing what I know, but I am not the ultimate authority on it; for that I think only @whyrusleeping, and possibly Juan or one of the other early developers who worked on that code, would qualify.

Hi @schomatis, I read the above comments and patches, and I am interested in this work.

Hey @bjzhang, that's great! The first steps would be to:

  1. Take a look at the balanced builder to understand the format of the file DAG.
  2. Take a look at the previous implementation of the builder to understand how the UnixfsNode was used there (which is the structure that should be replaced in the trickle builder).
  3. Study the trickle builder to first understand the topology differences of a trickle file DAG compared to the balanced one (a toy sketch of this topology is included at the end of this comment).
  4. Draft a plan/issue/PR of how we could use the https://github.com/ipfs/go-ipfs/pull/5118 PR to apply a similar mechanism in the trickle builder, especially for the append functions, which are not present in the balanced builder. (Be warned, I am going to bother you about documenting the code as much as possible :))

(As you said, you may have already performed some of these steps.) Use this issue to ask any questions about the code (that will be great feedback on how we could make it more accessible to new IPFS developers), or feel free to open new issues/PRs if you think the complexity of the issue deserves it.
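Regarding point 3, here is a small self-contained toy (plain Go, not go-unixfs code; all names and constants here are invented for illustration) that mimics the shape of the trickle recursion: each node first fills its own layer of direct leaves, then appends layerRepeat subtrees of depth 1, then layerRepeat subtrees of depth 2, and so on, whereas the balanced builder keeps all leaves at the same depth.

package main

import (
    "fmt"
    "strings"
)

// Toy constants standing in for the real builder's parameters.
const (
    layerRepeat = 4 // same role as layerRepeat in trickledag.go
    maxLeaves   = 2 // stands in for the builder's maxlinks
)

type node struct {
    leaves   int
    children []*node
}

// buildTrickle consumes leaves into a trickle-shaped tree rooted at n,
// mirroring the recursion of fillTrickleRec: fill this node's own leaf
// layer first, then add layerRepeat subtrees per depth level.
func buildTrickle(n *node, remaining *int, maxDepth int) {
    for n.leaves < maxLeaves && *remaining > 0 {
        n.leaves++
        *remaining--
    }
    for depth := 1; *remaining > 0 && (maxDepth <= 0 || depth < maxDepth); depth++ {
        for layer := 0; layer < layerRepeat && *remaining > 0; layer++ {
            child := &node{}
            buildTrickle(child, remaining, depth)
            n.children = append(n.children, child)
        }
    }
}

func dump(n *node, indent int) {
    fmt.Printf("%snode: %d leaves, %d subtrees\n", strings.Repeat("  ", indent), n.leaves, len(n.children))
    for _, c := range n.children {
        dump(c, indent+1)
    }
}

func main() {
    remaining := 30 // pretend the file splits into 30 chunks
    root := &node{}
    buildTrickle(root, &remaining, -1)
    dump(root, 0)
}

Running it with 30 chunks prints a root with 2 direct leaves, four depth-1 subtrees and two depth-2 subtrees, which is the characteristic trickle shape.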

Hi @schomatis, thanks for the above guidelines. I read the balanced builder, #5118, and #5106, and I am now reading the code in the trickle builder.

IIUC, I should make use of db.NewFSNodeOverDag() and db.NewLeafDataNode() to avoid using UnixfsNode, and fillTrickleRec and other functions need to be rewritten to replace UnixfsNode with FSNodeOverDag.

Given that there are several functions that need to be rewritten, I will start with Layout and fillTrickleRec.

IIUC, I should make use of db.NewFSNodeOverDag() and db.NewLeafDataNode() to avoid using UnixfsNode, and fillTrickleRec and other functions need to be rewritten to replace UnixfsNode with FSNodeOverDag.

Yes, exactly! You understood it perfectly.

Given that there are several functions that need to be rewritten, I will start with Layout and fillTrickleRec.

Sounds about right. I would strongly advise you not to write a huge commit (as I did in #5118). The advantage of this scenario is that at this point the trickle builder is the only consumer of the UnixfsNode structure, so you can progressively dismember it as you see fit. (The same criteria apply to the DagBuilderHelper structure.) For example, you could replace the ProtoNode and FSNode attributes with the new FSNodeOverDag structure and adapt the AddChild function (and related ones) to be able to process it (FSNodeOverDag is ideal to act as an internal node, which is by definition what AddChild is adding a child to).
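For reference, FSNodeOverDag (as introduced in #5118) pairs the two layers that UnixfsNode used to juggle, which is why it slots in naturally as an internal node. The sketch below is a paraphrase from memory rather than the exact go-unixfs code (in particular, the persistence of the child through the DagBuilderHelper is elided), but it shows how AddChild keeps the DAG layer and the UnixFS layer in sync:

package sketch

import (
    ipld "github.com/ipfs/go-ipld-format"
    merkledag "github.com/ipfs/go-merkledag"
    unixfs "github.com/ipfs/go-unixfs"
    helpers "github.com/ipfs/go-unixfs/importer/helpers"
)

// Rough paraphrase of the structure: the ProtoNode carries the links,
// the FSNode carries the UnixFS metadata (type and block sizes).
type FSNodeOverDag struct {
    dag  *merkledag.ProtoNode
    file *unixfs.FSNode
}

// AddChild links a child into the DAG layer and records its file size
// in the UnixFS layer (the real method also stores the child through
// the builder's DAG service, omitted here).
func (n *FSNodeOverDag) AddChild(child ipld.Node, fileSize uint64, db *helpers.DagBuilderHelper) error {
    if err := n.dag.AddNodeLink("", child); err != nil {
        return err
    }
    n.file.AddBlockSize(fileSize)
    return nil
}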

That is just an illustrative example off the top of my head; this is the part where you should take your time to think this through and find a way to slowly modify the code so you can organize the PR into small commits that can each pass the tests individually (so you know you're on the right track and aren't breaking anything).

In the meantime I'm going to take a look at the current state of the DAG diff command (ipfs object diff), which can be very useful here: most of the errors you'll be getting are of the hash-mismatch type, meaning your modified version of the builder will create a DAG with a (probably really small) difference that will propagate up to the root, changing its hash, and the test will just report a generic "the hash of the DAG generated with your builder doesn't match the one I have hard-coded here" error. The diff command can be very helpful to debug that situation and find exactly where the actual difference is (I should have used it myself in #5118, now that I think about it).

@bjzhang Just a heads-up to let you know that the importer and unixfs packages (which concentrate pretty much all the code related to this issue) will be moving (in the near future) to https://github.com/ipfs/go-unixfs (#5316). This won't affect how the code works, but if at some point in the future you update the master branch and can't find the code, go to that repo.

I am trying to understand the code in the trickle DAG builder, and here is my trivial patch of Layout. I can add and get a file successfully, but trickle_test.go reports the wrong ProtoNode type:

--- FAIL: TestSizeBasedSplit (0.00s)
    --- FAIL: TestSizeBasedSplit/leaves=ProtoBuf (0.00s)
        /go/src/github.com/ipfs/go-ipfs/unixfs/importer/trickle/trickle_test.go:93: expected file as branch node, got: Raw
    --- FAIL: TestSizeBasedSplit/leaves=Raw (0.00s)
        /go/src/github.com/ipfs/go-ipfs/unixfs/importer/trickle/trickle_test.go:93: expected file as branch node, got: Raw

I guess there is some misunderstanding of these types, but I could not find a clue. Meanwhile I saw pull request #5120, which tried to remove the Raw type but ran into backward compatibility issues. Is that a related issue?

diff --git a/unixfs/importer/helpers/dagbuilder.go b/unixfs/importer/helpers/dagbuilder.go
index 85e7aa7bc..123d9a173 100644
--- a/unixfs/importer/helpers/dagbuilder.go
+++ b/unixfs/importer/helpers/dagbuilder.go
@@ -257,6 +257,25 @@ func (db *DagBuilderHelper) FillNodeLayer(node *UnixfsNode) error {
    return nil
 }

+// FillFSNodeLayer do the same thing as FillNodeLayer.
+func (db *DagBuilderHelper) FillFSNodeLayer(node *FSNodeOverDag) error {
+
+   // while we have room AND we're not done
+   for node.NumChildren() < db.maxlinks && !db.Done() {
+       //TODO size
+       child, childFileSize, err := db.NewLeafDataNode()
+       if err != nil {
+           return err
+       }
+
+       if err := node.AddChild(child, childFileSize, db); err != nil {
+           return err
+       }
+   }
+
+   return nil
+}
+
 // GetNextDataNode builds a UnixFsNode with the data obtained from the
 // Splitter, given the constraints (BlockSizeLimit, RawLeaves) specified
 // when creating the DagBuilderHelper.
diff --git a/unixfs/importer/trickle/trickledag.go b/unixfs/importer/trickle/trickledag.go
index 55e9b849d..41d9a3661 100644
--- a/unixfs/importer/trickle/trickledag.go
+++ b/unixfs/importer/trickle/trickledag.go
@@ -40,23 +40,13 @@ const layerRepeat = 4
 // DagBuilderHelper. See the module's description for a more detailed
 // explanation.
 func Layout(db *h.DagBuilderHelper) (ipld.Node, error) {
-   root := db.NewUnixfsNode()
-   log.Infof("Bamvor: root: %p.", &root)
-
-   if err := fillTrickleRec(db, root, -1); err != nil {
-       return nil, err
-   }
-
-   out, err := db.Add(root)
+   newRoot := db.NewFSNodeOverDag(ft.TFile)
+   root, _, err := fillTrickleRecFSNode(db, newRoot, -1)
    if err != nil {
        return nil, err
    }

-   if err := db.Close(); err != nil {
-       return nil, err
-   }
-
-   return out, nil
+   return db.AddNodeAndClose(root)
 }

 // fillTrickleRec creates a trickle (sub-)tree with an optional maximum specified depth
@@ -91,6 +81,47 @@ func fillTrickleRec(db *h.DagBuilderHelper, node *h.UnixfsNode, maxDepth int) er
    }
 }

+// fillTrickleRecFSNode creates a trickle (sub-)tree with an optional maximum specified depth
+// in the case maxDepth is greater than zero, or with unlimited depth otherwise
+// (where the DAG builder will signal the end of data to end the function).
+func fillTrickleRecFSNode(db *h.DagBuilderHelper, node *h.FSNodeOverDag, maxDepth int) (filledNode ipld.Node, nodeFileSize uint64, err error) {
+   // Always do this, even in the base case
+   if err := db.FillFSNodeLayer(node); err != nil {
+       return nil, 0, err
+   }
+
+   for depth := 1; ; depth++ {
+       // Apply depth limit only if the parameter is set (> 0).
+       if db.Done() || (maxDepth > 0 && depth == maxDepth) {
+           break
+       }
+       for layer := 0; layer < layerRepeat; layer++ {
+           if db.Done() {
+               break
+           }
+
+           nextChild := db.NewFSNodeOverDag(ft.TFile)
+           childNode, childFileSize, err := fillTrickleRecFSNode(db, nextChild, depth)
+           if err != nil {
+               return nil, 0, err
+           }
+
+           if err := node.AddChild(childNode, childFileSize, db); err != nil {
+               return nil, 0, err
+           }
+       }
+   }
+   nodeFileSize = node.FileSize()
+
+   // Get the final `dag.ProtoNode` with the `FSNode` data encoded inside.
+   filledNode, err = node.Commit()
+   if err != nil {
+       return nil, 0, err
+   }
+
+   return filledNode, nodeFileSize, nil
+}
+
 // Append appends the data in `db` to the dag, using the Trickledag format
 func Append(ctx context.Context, basen ipld.Node, db *h.DagBuilderHelper) (out ipld.Node, errOut error) {
    base, ok := basen.(*dag.ProtoNode)

Thanks.

Hey @bjzhang, yes, #5120 was just a simple test to prove that the balanced builder was using the UnixFS types incorrectly: internal nodes should be of type TFile and leaf nodes (when using ProtoNode) should be of type TRaw, but the balanced builder uses TFile for all nodes. I kept it like that to avoid changing all of the test hashes, but the trickle builder should continue to comply with this standard:

https://github.com/ipfs/go-unixfs/blob/master/importer/helpers/dagbuilder.go#L217

So, NewLeafDataNode (and the NewLeafNode function it calls) should be adapted to accept the type of leaf node as an argument; right now it always creates TFile leaf nodes. The balanced builder should call it with a TFile argument and the trickle builder with TRaw.
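One possible shape for that adaptation of dagbuilder.go is sketched below; the new pb.Data_DataType parameter is the suggested change, while the surrounding structure mirrors the existing helper as far as I can tell, so treat the details as an assumption rather than the final API:

// Sketch of the suggested change to the helpers package: let the caller
// choose the UnixFS type of the leaves instead of hard-coding TFile.
func (db *DagBuilderHelper) NewLeafDataNode(fsNodeType pb.Data_DataType) (node ipld.Node, dataSize uint64, err error) {
    // Fetch the next chunk of file data from the splitter.
    fileData, err := db.Next()
    if err != nil {
        return nil, 0, err
    }
    dataSize = uint64(len(fileData))

    // Build the leaf with the requested type instead of assuming TFile
    // (NewLeafNode would gain the same parameter).
    node, err = db.NewLeafNode(fileData, fsNodeType)
    if err != nil {
        return nil, 0, err
    }

    return node, dataSize, nil
}

Call sites would then pick the type explicitly: ft.TFile from the balanced builder and ft.TRaw from the trickle builder.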

That being said, the error you're getting doesn't seem to be about that,

https://github.com/ipfs/go-unixfs/blob/master/importer/trickle/trickledag.go#L333

verifyTDagRec is complaining that an internal node ("branch node", as it's called there) should have been of type TFile but instead was of type TRaw (this is incorrect for both the balanced and the trickle builders), although from what I'm seeing in your patch you're always using TFile when creating new internal nodes (NewFSNodeOverDag), so at first sight I can't spot the problem.


The importer directory (the code you're working on) has already been moved to the go-unixfs repo. Feel free to keep working here with an old version of go-ipfs, but eventually (when you pull from master) you'll have to move your current work there. I'm cc'ing @whyrusleeping so he can paste here a simple example (or point to an existing one) of how a new developer can work across repositories using the gx tool (I'm assuming with gx link, although the paths here are already fixed to a gx version; I don't know if that's a problem), that is, so you can work on go-unixfs, submit a PR there, and use the go-ipfs tests here to check if it's working.

Also, it seems that by default an unspecified unixfs.FSNode will return a Raw type (which is not really nice), so keep an eye out for a possible bug that would leave a node without a type specified,

https://github.com/ipfs/go-unixfs/blob/master/pb/unixfs.pb.go#L89

You can also temporarily hack verifyTDagRec to print more information about the DAG to pinpoint exactly at which level, node, etc., the inconsistency is taking place.
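For example, a throwaway helper along these lines could be called from the verification code while debugging (it assumes trickledag.go's existing ft and dag imports plus "fmt", and the decoding call is my assumption of the unixfs API of that era; drop it before submitting):

// Throwaway debugging aid (sketch): print the UnixFS type and link count
// of a ProtoNode at a given depth to pinpoint where the unexpected Raw
// node shows up while verifyTDagRec walks the DAG.
func dumpNode(depth int, pn *dag.ProtoNode) {
    fsn, err := ft.FromBytes(pn.Data())
    if err != nil {
        fmt.Printf("depth=%d: cannot decode UnixFS data: %v\n", depth, err)
        return
    }
    fmt.Printf("depth=%d type=%s links=%d\n", depth, fsn.GetType(), len(pn.Links()))
}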

Hey @bjzhang, we're having some technical difficulties with our package manager gx right now, and go-unixfs won't build against this repo, so for now keep using the old master branch of go-ipfs where you've been working so far to figure out the TFile/TRaw problem. I'll ping you when we have this sorted out, sorry.

Hey @bjzhang, you should now be able to build go-ipfs against your cloned version of go-unixfs by doing:

GO_UNIXFS_HASH=QmWdTRLi3H7ZJQ8s7NYo8oitz5JHEEPKLn1QPMsJVWg2Ew
# Hash of the current `go-unixfs` version (1.0.2) used in `go-ipfs`

go get github.com/whyrusleeping/gx
go get github.com/whyrusleeping/gx-go

go get -d github.com/ipfs/go-unixfs
gx-go link $GO_UNIXFS_HASH
# This will create a symbolic link from `$GOPATH/src/gx/$GO_UNIXFS_HASH/go-unixfs`
# to the git clone `$GOPATH/src/github.com/ipfs/go-unixfs`.

# Inside the `go-ipfs` repo:
make install

This will allow you to work on github.com/ipfs/go-unixfs and test your code through the go-ipfs tests. Bear in mind that gx-go link will rewrite the generic imports in go-unixfs to the installed gx versions, e.g.,

diff --git a/importer/trickle/trickledag.go b/importer/trickle/trickledag.go
index 30c9618..8caff99 100644
--- a/importer/trickle/trickledag.go
+++ b/importer/trickle/trickledag.go
@@ -20,12 +20,12 @@ import (
    "errors"
    "fmt"

-   ft "github.com/ipfs/go-unixfs"
-   h "github.com/ipfs/go-unixfs/importer/helpers"
+   ft "gx/ipfs/QmWdTRLi3H7ZJQ8s7NYo8oitz5JHEEPKLn1QPMsJVWg2Ew/go-unixfs"
+   h "gx/ipfs/QmWdTRLi3H7ZJQ8s7NYo8oitz5JHEEPKLn1QPMsJVWg2Ew/go-unixfs/importer/helpers"

-   cid "github.com/ipfs/go-cid"
-   ipld "github.com/ipfs/go-ipld-format"
-   dag "github.com/ipfs/go-merkledag"
+   cid "gx/ipfs/QmYVNvtQkeZ6AKSwDrjQTs432QtL6umrrK41EBq3cu7iSP/go-cid"
+   ipld "gx/ipfs/QmZtNq8dArGfnpCZfx2pUNY7UcjGhVp5qqwQ4hH6mpTMRQ/go-ipld-format"
+   dag "gx/ipfs/Qma2BR57Wqp8w9vPreK4dEzoXXk8DFFRL3LresMZg4QpzN/go-merkledag"
 )

When submitting a PR in go-unixfs you should restore the generic import version with

go-unixfs$ gx-go rewrite --undo --fix

@schomatis Thanks for your above method. After reading the code, I found the issue is that NewLeafNode defaults to TFile. I should add a pb.Data_DataType parameter to it and then specify TRaw for the leaf nodes of the trickle builder. I can now add the trickle nodes correctly, and go test passes for importer/trickle and importer/balanced. My current patches:
dag: add data type in NewLeafDataNode
dag: remove unixfs node in Layout of trickledag

That's great @bjzhang!!

Could you submit those patches as a PR in go-unixfs to start the process of review and get that merged?

@schomatis I just created the above pull request in the go-unixfs package. Just to be clear, I am still working on functions other than Layout in trickledag. I want to know if I am going in the right direction. Thanks.

When I read the code in go-unixfs/importer/helpers/helpers.go, I found that the comments for roughLinkSize and DefaultLinksPerBlock differ from their actual values. Could I send a patch to remove the comments?

Could I send a patch to remove the comments?

Instead of removing the comments, I would like them to correctly reflect the current values. But yes, open an issue about it and tag @kevina, who might know how those values are estimated.
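For context, the relationship between those values looks roughly like this (a paraphrase of importer/helpers/helpers.go from around that time, so check the source for the exact numbers); DefaultLinksPerBlock is computed from rough byte-cost estimates, which is exactly how a hard-coded number in a comment can drift from the computed value:

// Paraphrase of helpers.go (illustrative, not verbatim): the default
// number of links per block is derived from an estimated link cost.
var (
    roughLinkBlockSize   = 1 << 13    // approximate target size of a node full of links: 8 KiB
    roughLinkSize        = 34 + 8 + 5 // multihash + size + protobuf framing, roughly
    DefaultLinksPerBlock = roughLinkBlockSize / roughLinkSize // currently about 174
)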
