Go-ipfs: Core API needs to take flexible inputs

Created on 9 Dec 2016  路  32Comments  路  Source: ipfs/go-ipfs

We've layed the groundwork for the Core API in go-ipfs by implementing parts of the Unixfs API. Its functions properly return CIDs, but they only accept path strings as arguments, which requires unneccessary conversions in most situations.

Core API functions should accept:

  • CIDs
  • Strings
  • Multihashes
  • Paths
  • ipfs.io URIs

The go-cid package in use provides Parse() which takes care of it.

topicore-api

All 32 comments

I don't know if I understand this very well.

I don't like APIs which take interface{} but where interface{} means you can pass different representations of the same *cid.Cid thing but not absolutely anything. I find it very anti-Go in spirit. It makes go behave like Python. Helper methods like cid.Parse() are always available to facilitate that the user adapts to a c *cid.Cid parameter type, but we shouldn't make it an c interface{} type just because it saves a line somewhere else. Is that what you mean?

Note also that methods definitions are pretty much all the documentation we provide in many places, so they should be very meaningful (specially for a Core API).

I agree with Hector, behaviour of CoreAPI should be as strict and as well defined, as possible (so we don't replicate mistakes from HTTP API). CID as being able to describe all of those protocols seems the best.

I don't know if we should limit it only to /ipfs/CID which would define the behaviour even better but would add additional resolve step for consumer of API but it means it is explicit and the consumer can decide if he wants to resolve IPNS for example.

You're both raising fair points.

CID as being able to describe all of those protocols seems the best.

No, we also need to somehow accept full-fledged paths like /ipfs/Qmfoobar/one/two/three and IPNS paths like /ipns/Qmfoo/bar and /ipns/example.com/foo/bar. I guess js-ipfs-api didn't have to deal with IPNS paths yet? cc @diasdavid


What do you think about the following? A separate (*CoreAPI).Resolve() func which tries its best to turn anything into a cid.Cid, and then have the actual funcs in UnixfsAPI and other following APIs accept cid.Cid and nothing else. This could be broken up into ResolveString, ResolvePath, ResolveMultihash, etc., but would always come up with either a cid.Cid, or an error.

The current specs in ipfs/interface-ipfs-core just assume multihashes everywhere, but on the other hand js-ipfs has neither a gateway nor IPNS.

That sounds better @lgierth. I wonder if those resolve functions qualify to be part of core API or should be in other modules though (or part of cid).

I wonder if those resolve functions qualify to be part of core API or should be in other modules though (or part of cid).

We need the Core API to be able to resolve full /ipfs and /ipns paths. In the case of minimal implementations without a namesys, this means at least returning proper errors for this (e.g. ErrNoNamesys).

Lets make sure to keep the robustness principle in mind. I think that our 'nice' interface that we're expecting users to interact with frequently should be very easy to use.

It makes go behave like Python.

Python is quite nice to use and hack things together with, as is javascript. We should make sure to consider who this api is intended for when designing it, more advanced users can create an adder themselves and plumb things through more 'correctly'.

I'd be interested to hear @jbenet's thoughts on the matter.

In my opinion robustness principle applies to user facing interfaces, not really dev facing interfaces, we don't want to replicate what browsers do with broken HTML.

In my opinion robustness principle applies to user facing interfaces, not really dev facing interfaces, we don't want to replicate what browsers do with broken HTML.

Eeeh the robustness principle goes down the whole stack :)

TCP implementations should follow a general principle of robustness: be conservative in what you do, be liberal in what you accept from others.

But let's stay on topic:


The difference between what I proposed in the #3493, and what I proposed with Resolve*() above, is where we place the being-liberal-in-what-we-accept. The former is implicit, the latter is explicit.

We have a similar separation of concerns on the HTTP API and CLI: there's all the functions that do stuff and expect direct ipfs paths (direct as in: just multihash, no links), and there's resolve, which always comes up with either a direct ipfs path , or with an error or a timeout. For user convenience there, we imply resolve in all other functions too, so that e.g. ipfs cat /ipns/ipfs.io/theme/image/logo.png works.

So on the HTTP API and CLI we're being implicit.

We should make sure to consider who this api is intended for when designing it, more advanced users can create an adder themselves and plumb things through more 'correctly'.

Yep! We should still make it super easy though to plumb things.

I feel like the explicit version is okay and easy enough, and I'm seeing another advantage: having all functions accept CIDs-only tells something to users of the Core API: "CIDs are the thing!!!"


I'd be interested to hear @jbenet's thoughts on the matter.

Me too! I feel like we've had a similar discussion a long while ago.

We should make sure to consider who this api is intended for when designing it

The users of the Core API want to:

  • Plumb something together quickly, i.e. what go-ipfs-api (aka ipfs-shell) is provides. Super simple interfaces with not a lot of surface.
  • Build proper applications embedding go-ipfs or parts of it

The first point is why it's critical to accept as many kinds of paths/CIDs/multihashes/URIs as reasonably possible. The consumer might be getting these values from ctulhu-knows-where.

Anyhow, I think it's cool to be explicit, and have Resolve() be the liberal component.

I will respond with more in a bit, but quick thoughts (since others are thinking about it now):

  • i'm not convinced cid.CID should be the type the core IPFS functions operate on. It should be our path.Path (really ipfs.path.Path if it was more qualified). Think about it like a file system.
  • My instinct here is that we should not be asking people to resolve to CID because what we're operating on is not hashes/content-addresses, but rather the identifier of an object in the graph (a path to it). CIDs happen to be a path because /<cid> makes sense in IPFS and we're saving people the extra cognitive load of (/ip{fs,ld}/<cid>).
  • Forcing to a CID in the core implies that ALL functions will behave exactly the same no matter what path to an object is used. I'm not certain that ALL functions should behave exactly the same if two paths to the exact same object are tried. There may be cases where _the access path_ implies something.
  • The _robustness principle_ i think is a great thing to follow in general, all over the stack. (and yes, it comes from TCP.
  • I would be in favor of moving the "liberality" of the main thing we take into a function (like core.Resolve as @lgierth suggests, or some other function like ParseAndResolveToCID). But what should this function output? im not sure that a cid.CID is the right thing, as described above. It may have to output a path, at which point we may be back to calling Resolve internally.

an interesting example:

> ipfs pin add foo
added Qme4u9HfFqYUhH4i34ZFBKi1ZsW7z4MYHtLxScQGndhgKE foo/bar/baz/abc
added QmR4waSammpkmN3xe9Vx34SRSa8L396ZFqrF2aTjXwsZUr foo/bar/baz/def
added QmYud4MjiKLmmZtzPbCBhKS9cRRnmSH6eEyEfaGCxVceBe foo/bar/baz
added QmSTGbf64dkWLDXRPcdoFyKVwbDeE2WRpE3nGFvGd1hg5Z foo/bar
added QmPWCKsKi66kea1GenAygQuGTPf7ANB6Rio6TLLg69UVwy foo

> ipfs cat /ipfs/QmPWCKsKi66kea1GenAygQuGTPf7ANB6Rio6TLLg69UVwy/bar/baz/abc
abc

> ipfs pin add -r /ipfs/QmPWCKsKi66kea1GenAygQuGTPf7ANB6Rio6TLLg69UVwy/bar
pinned QmSTGbf64dkWLDXRPcdoFyKVwbDeE2WRpE3nGFvGd1hg5Z recursively

Today, this means the following:

> ipfs repo gc
# QmPWCKsKi66kea1GenAygQuGTPf7ANB6Rio6TLLg69UVwy should be gone

> ipfs cat /ipfs/QmPWCKsKi66kea1GenAygQuGTPf7ANB6Rio6TLLg69UVwy/bar/baz/abc
error: QmPWCKsKi66kea1GenAygQuGTPf7ANB6Rio6TLLg69UVwy not found

A different semantic of the statement:

ipfs pin add -r /ipfs/QmPWCKsKi66kea1GenAygQuGTPf7ANB6Rio6TLLg69UVwy/bar

would pin QmSTGbf64dkWLDXRPcdoFyKVwbDeE2WRpE3nGFvGd1hg5Z recursively, and pin QmPWCKsKi66kea1GenAygQuGTPf7ANB6Rio6TLLg69UVwy directly "as a dependency".

(this may or may not be better)

I say all this because im not 100% sure that all functions should be ok just taking CIDs. i think Paths is more accurate.

Personally, I need to think about this more deeply before making up my mind here. it's definitely tricky.

I think what this question boils down to is: what does IPFS operate on? Roots of trees? or Subtrees?

"subtrees" (paths) includes "roots of trees" (implied by a CID).

(btw, i _can be_ convinced cid.CID is the right abstraction, i just need to see more cases).

What is the case for cid.CID and against path.Path?

Or _truly_, what is the case for ipld.CID and against ipfs.Path?

Huh, the pinning example is very very tricky. Would be best if we could define those semantics away :)

But what should this function output? im not sure that a cid.CID is the right thing, as described above. It may have to output a path, at which point we may be back to calling Resolve internally.

The main advantage I see in preferring CIDs is efficency. No need to build a path just to take it apart again right away. I'm thinking of e.g. the use case of building plumbing tools here. There could be GC strategies implemented as external tools, and they'd deal with toooons of items.

I have the fear that preferring paths all over the Core API will make it slow by design in some regard.

Btw, happy about a great discussion so far, and within only a few hours! :+1:

I think that path could be made into an interface, and the object could either be a string, or a cid+subpath, and the subpath could be zero length.

Huh, the pinning example is very very tricky. Would be best if we could define those semantics away :)

Yeah maybe. that pinning example could be accomplished with a selector, too (if we extend pinning to be IPLD selector aware), so maybe it's not the best example.

I'd like to see the change by consdiering the whole API. We really need a nice ipfs core package with all the operations there. right now so much is split into the commands themselves...


We could make paths efficient if we need to. for example (this wouldn't work just like this, but gives the idea)

type Path interface {
   String() string 
   // whatever else paths need
}

type cidPath cid.CID

func (p *cidPath) String() string {
    return "/" + cid.CID(p).String() // or something.
}

Question: isn't a path like <hash>/a/b/c very constrained by IPLD specs (or idea)? Is IPLD dictating that hash is a Cid and that a/b/c are subtrees of that object, that / is a reserved char, and that a, b and c have certain charset/length limitations?

In that case, having ipld.ID would be the appropiate. If you would prefer it, it can be an interface which both ipfs.Path and cid.Cid implement. But it can also be a regular Type with ParseCid/ParsePath methods (even if it's an interface it would be nice to have a Type somewhere).

I'm all for the robustness principle, but let's be clear: our specs are strict. There are no different ways to represent a Cid. Our parsing (even if we do it for the user) does not take into account cases where the meaning is clear, despite of a malformed input. Therefore, being explicit about exactly which input is not going to come back as an error is a service to the user (to make their life easier), rather than to us, IMHO. I should add that we do this already everywhere and that it is great: i.e. libp2p works with Multiaddresses, not with generic strings which are then parsed into them for the user.

The only reason for my insisting on CIDs playing some role is efficiency and preventing superfluous conversions and resolutions all over the place. But by now I do agree that paths are pretty important.

How about an enhanced Path object which can hold the resolved Cid? I'll come up with a bit of code in a few.

Okay, so here's a very minimal coreapi.Path which is capable of retaining the CID once it has been resolved, and c6dc74b4c7a6a843484a5ebd9f0f575d5714c262 wires it with the rest.

type path struct {
    path string
    cid  *cid.Cid
}

func ParsePath(p string) (coreiface.Path, error) {
    pp, err := ipfspath.ParsePath(p)
    if err != nil {
        return nil, err
    }
    return &path{path: pp.String()}, nil
}
func NewResolvedPath(p string, c *cid.Cid) coreiface.Path { return &path{path: p, cid: c} }
func NewPathFromCid(c *cid.Cid) coreiface.Path            { return &path{path: "/ipfs/" + c.String(), cid: c} }
func (p *path) String() string                            { return p.path }
func (p *path) Cid() *cid.Cid                             { return p.cid }
func (p *path) Resolved() bool { return p.cid != nil }

And the ResolvePath function:

func (api *CoreAPI) ResolvePath(ctx context.Context, p coreiface.Path) (coreiface.Path, error) {
    if p.Resolved() {
        return p, nil
    }

    c, err := core.ResolveToCid(ctx, api.node, ipfspath.FromString(p.String()))
    if err != nil {
        return nil, err
    }
    return NewResolvedPath(p.String(), c), nil
}

I would make the String method deffered to create path on access, there might be cases that his allocation and conversion are completely unnecessary and the cid.String()method includes base58 conversion which has quite hight cost.

Hrm... is the cid in the path object the 'final' object? Or the 'root' object? I was thinking that it should be the root.

So in the case of /ipfs/QmFooBar/a/b/c/d, the path object would contain /a/b/c/d and Cid(QmFooBar)

Hrm... is the cid in the path object the 'final' object? Or the 'root' object? I was thinking that it should be the root.

So in the case of /ipfs/QmFooBar/a/b/c/d, the path object would contain /a/b/c/d and Cid(QmFooBar)

What do you have in mind? My thinking was that cid and path represent the same node, so that they can be used interchangeably.

Another thing to consider: IPNS. should ipns names be resolved beforehand by the user? the cli should resolve for the user, but the api may or may not

CoreAPI.ResolvePath() currently also resolves /ipns paths. I think this counts as "resolving beforehand by the user" :)

Currently as in, the implementation of the most recent proposal above, in #3493.

@lgierth can i bump this from 0.4.5?

Yes

@lgierth When do you think we can prioritize this? I've moved it to 0.4.8, but if we arent going to do it then i'd like to drop it from the 'next' milestone

committed for docs sprint as part of the general core-api theme

Was this page helpful?
0 / 5 - 0 ratings

Related issues

magik6k picture magik6k  路  3Comments

JesseWeinstein picture JesseWeinstein  路  4Comments

kallisti5 picture kallisti5  路  3Comments

emelleme picture emelleme  路  3Comments

funkyfuture picture funkyfuture  路  3Comments