Please answer these questions before submitting your issue. Thanks!
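What version of Go are you using (go version)? 1.6.3
What operating system and processor architecture are you using (go env)? darwin / amd64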
Tried to access _all_ cookies stored in a http.CookieJar.
A function for accessing the []*http.Cookie data. The cookiejar struct is missing a way to access all the cookie structs saved in it. You have to know all the matching URLs (as url.URL structs) in order to pull cookies out of it.
Only a method for reading cookies from pre-known url.URL structs.
What will be a good way to implement such a function?
I thought about a generator-style function that returns a channel to a custom type containing the key, subkey and the entry, so you could call something like:
for currentEntry := range jar.AllCookies() { ... }
I'd gladly do it, if you guys think my solution is good.
By the way, what is an actual use case for this function?
Isn't it risky, considering same-origin policy?
The use case in general (from looking around on the web and even a couple of GitHub projects) is to persist the cookiejar for later reuse. Some people (like me) want to restart their services while keeping state.
I am currently manually keeping up a second copy of everything in the cookiejar (by checking the http client response each time), but since cookiejar is already doing this it is a waste of resources.
The same-origin policy was added because of the need for sandboxing JavaScript in the browser and requests to different, third-party servers. Unknown/unauthorized code isn't really a problem in most Go applications.
Is an opposite function, that accepts all the cookies and puts them in the jar also needed?
How do you put all the saved cookies in the cookie jar?
I'd be happy to give this a shot.
While nice to have, a function to set all the cookies back on the cookie jar isn't required since you can just range over the map/struct and call cookiejar.SetCookies(u *url.URL, cookies []*http.Cookie).
The main thing is simply getting a JSON/map/struct dump of the cookies inside the cookiejar so it can be persisted.
I would like this in the stdlib, too. For now, since CookieJar is an interface, I just wrapped it and keep track of stuff.
import (
	"net/http"
	"net/http/cookiejar"
	"net/url"
	"sync"
)

/*
Implements the normal http cookie jar interface but also usefully
allows you to dump all the stored cookies without
having to know any of the domains involved, which helps a lot
*/
func NewExportableCookieJar() http.CookieJar {
	realJar, _ := cookiejar.New(nil)
	e := &ExportableCookieJar{
		jar:        realJar,
		allCookies: make(map[url.URL][]*http.Cookie),
	}
	return e
}

func (jar *ExportableCookieJar) SetCookies(u *url.URL, cookies []*http.Cookie) {
	jar.Lock()
	defer jar.Unlock()
	jar.allCookies[*u] = cookies
	jar.jar.SetCookies(u, cookies)
}

func (jar *ExportableCookieJar) Cookies(u *url.URL) []*http.Cookie {
	return jar.jar.Cookies(u)
}

func (jar *ExportableCookieJar) ExportAllCookies() map[url.URL][]*http.Cookie {
	jar.RLock()
	defer jar.RUnlock()
	copied := make(map[url.URL][]*http.Cookie)
	for u, c := range jar.allCookies {
		copied[u] = c
	}
	return copied
}

type ExportableCookieJar struct {
	jar        *cookiejar.Jar
	allCookies map[url.URL][]*http.Cookie
	sync.RWMutex
}
The function should return a slice instead of a channel. Channels are far too heavy an iterable, and I cannot see any need for a channel specifically.
@dprime A couple questions on that proposed ExportableCookieJar:
Keying on url.URL could become inefficient, because it essentially means storing ALL unique request URLs, even though under the cookie rules they may share the same cookies. For example, the cookies may just have path /abc, but the full URL is stored for each of:
https://domain.com/abc/1
https://domain.com/abc/2
https://domain.com/abc/3
Given allCookies map[url.URL][]*http.Cookie, how would you go about restoring a cookie jar from a map[url.URL][]*http.Cookie?
Is a jar.RLock() needed in Cookies, even just for reading those values?
From what I can tell rummaging around the code, it'd be ideal if there were a way to serialize the entries tree that is built up in the standard library's cookie jar.
type entries map[string]map[string]entry

// entry is the internal representation of a cookie.
//
// This struct type is not used outside of this package per se, but the exported
// fields are those of RFC 6265.
type entry struct {
	Name       string
	Value      string
	Domain     string
	Path       string
	Secure     bool
	HttpOnly   bool
	Persistent bool
	HostOnly   bool
	Expires    time.Time
	Creation   time.Time
	LastAccess time.Time

	// seqNum is a sequence number so that Cookies returns cookies in a
	// deterministic order, even for cookies that have equal Path length and
	// equal Creation time. This simplifies testing.
	seqNum uint64
}
This data structure is what preserves the cookies and provides efficient lookup of the ones associated with a request URL.
In that case maybe the ExportableCookieJar just needs to duplicate the logic for building the entries, but then you might as well fork the implementation or provide a public method to export the entries.
This stuff might be a bit more tricky than one would expect.
The issue is named "add way to access all cookies in CookieJar" but accessing the content is only the first half of what a new API would need to provide: After accessing all stored cookies one may serialise them to disk and probably wants to recreate a Jar from such a serialisation. So we not only need a way to access all cookies (including all attributes) but also a way to refill the Jar. This cannot be done through Jar.SetCookies (as this doesn't handle the Created/Accessed attributes).
So we need _at least_ the following API:
type Entry struct { ... } // Export Entry type as container for all cookie attributes.
func (j *Jar) AllEntries() []Entry // Return a copy of all (non-expired) cookies stored in j.
func (j *Jar) LoadEntries(e []Entry) error // Load cookies from e into j.
This allows accessing all stored cookies and re-populating a Jar from that information. Deleting cookies from the jar can be simulated by specially crafted Jar.SetCookies calls.
This is sufficient for the simple use case: dump everything at program exit and reload it all at the next startup.
During the original discussion about a persistent cookiejar (starting points for more might be https://groups.google.com/forum/#!topic/golang-dev/7Jcbgrbah2s/discussion) we found out that such a minimal API might not be suitable: Think about a browser on a mobile device which wants to persist cookies while not generating garbage (to keep GC work low) and minimizing disk access. In such a use case reading jar.AllEntries() after each request (to update LastAccess) would be far too heavyweight. In such a scenario some kind of notification mechanism would be preferable: The Jar informs some subscriber each time some change happens; the subscriber filters this stream of notifications and acts on some subset, e.g. ignoring changes to LastAccess and expired cookies and handling only new/deleted/changed cookies. To make this work we would need an API to subscribe to changes to the Jar and to query only a subset of all cookies stored in the Jar:
type Notification uint32

const (
	ReadAccess   Notification = 1 << iota // LastAccess modified in Cookies().
	Expiration                            // Deleted because expiration detected in Cookies().
	DeleteCookie                          // Deleted during SetCookies.
	SetCookie                             // Set or updated during SetCookies.
)

type Options struct {
	PublicSuffixList cookiejar.PublicSuffixList // As existing already

	// Notify is a channel on which changes to the jar are announced.
	Notify chan<- string // TODO: element type of channel

	// NotifyOn is the bitmap which determines which changes to the
	// jar will result in a notification message on the Notify channel.
	NotifyOn Notification
}

func (j *Jar) GetEntry(domain, path, name string) (Entry, error) // Retrieve single cookie.
This starts to become a pretty large API change.
I think it will be hard to find the right tradeoff between an API which is convenient to use in the simple use case (dump everything at program exit, reload all at next startup) while enabling to partially act on changes to the Jar content.
I expect the following API to be a sensible compromise:
https://godoc.org/github.com/vdobler/ht/cookiejar
jar.ETLDsPlus1(nil)
jar.LoadEntries()
The reason for the coarse notification and subset is as follows:
Question to @bradfitz and @nigeltao: Are such large API changes okay? Should I prepare a design document? Or is the whole issue something which can be delegated to a third-party package?
Large API changes are OK, but it is indeed tricky, as we've discussed before, and unfortunately, I don't have a lot of spare bandwidth to think about cookie jars at the moment. I think the way to start is to work in a 3rd party package.
Does this really need to reside in the stdlib? For example, implementing a cookie jar that immediately persists changes in a database is easiest if you know the database you're talking to. You might even use a per-use wrapper that communicates an open SQL transaction to the thing implementing http.CookieJar.
I think instead of wanting "100% all of the features cookiejar implementation" in stdlib, might it not be more useful to focus on this: Why is implementing your own CookieJar difficult? Is it because marshaling a http.Cookie usefully is hard? Is it because of innate complexity of HTTP cookie behavior? Can stdlib provide helpers for these tasks, without having to find a one-size-fits-everyone CookieJar implementation?
@tv42
Is it because of innate complexity of HTTP cookie behavior?
This.
Which cookies are allowed by the spec, and which are so common in the wild that one has to allow them, is ugly, and the logic behind domains and paths is complicated.
But I agree: There is not much need for this in the stdlib. Taking the stdlib Jar implementation and adding a few functions/methods is not much work. net/http/cookiejar does not see many commits, actually just a handful during the last 5 years. So forking and maintaining the fork is pretty simple.
Adding helper functions to the stdlib is also complicated: http.Cookie is not really suitable to store all information needed in a cookiejar, e.g. how to distinguish domain-cookies from host-cookies?
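A small sketch of why that distinction is lost, using only the existing public API. The jar keeps track of host-only versus domain cookies internally (the second lookup below behaves differently for the two), but the cookies handed back by Cookies carry no Domain, so a caller cannot reconstruct it. The mustParse and names helpers and the example hosts are made up for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"net/url"
)

// mustParse parses raw or panics; acceptable for a short demo.
func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}

// names lists the names of the cookies the jar would send to raw.
func names(jar http.CookieJar, raw string) []string {
	var out []string
	for _, c := range jar.Cookies(mustParse(raw)) {
		out = append(out, c.Name)
	}
	return out
}

// newDemoJar stores one host-only cookie and one domain cookie,
// both set by www.example.com.
func newDemoJar() http.CookieJar {
	jar, err := cookiejar.New(nil)
	if err != nil {
		panic(err)
	}
	jar.SetCookies(mustParse("https://www.example.com/"), []*http.Cookie{
		{Name: "host", Value: "1"},                          // no Domain attribute: host-only
		{Name: "domain", Value: "2", Domain: "example.com"}, // domain cookie
	})
	return jar
}

func main() {
	jar := newDemoJar()

	// Both cookies go back to the host that set them.
	fmt.Println(names(jar, "https://www.example.com/"))
	// Only the domain cookie is sent to a sibling host.
	fmt.Println(names(jar, "https://api.example.com/"))

	// The returned cookies do not say which kind they are.
	for _, c := range jar.Cookies(mustParse("https://www.example.com/")) {
		fmt.Printf("%s: Domain=%q\n", c.Name, c.Domain)
	}
}
```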